Oide Rascal, a 2001 Game Boy Color game, contains these strings:
- デバックメニューしゅうりょう
- アニメーションテスト (Animation Test)
- モーションテスト (Motion Test)
- バックグラウンドテスト (Background Test)
- ＢＧＭテスト (BGM Test)
- ＳＥテスト (SE Test, i.e. sound effects)
I know zero Japanese, but thankfully it was pointed out that the first line translates to “Debug Menu End”, and so another hunt begins…
We have all the tools we need:
- Disassembler / Decompiler: Ghidra + Game Boy loader (identifies sections in the memory map, creates overlays for each bank…)
- Debugger: BGB (conditional breakpoints, read/write memory watches…) + binjgb (instruction trace logging)
While binjgb includes a debugger, it has far fewer features than BGB’s. It does have some interesting visualizations, such as a heatmap that colors addresses as they are read, giving you an idea of code and data coverage.
I also knew zero Game Boy internals before reversing this game, so I took a more black-box approach, guided by some of these insights:
- In Japanese games, text is often encoded in Shift-JIS rather than some Unicode flavour. The strings above consist of simple-looking characters, which usually belong to the Hiragana or Katakana writing systems. Looking at some Shift-JIS tables, we can come up with a close-enough regex to match these strings.
- Our 8-bit CPU can only address a limited amount of memory:
The SM83 keeps an 8-bit data bus and a 16-bit address bus, so up to 64 KB of memory can be addressed. […] Games are written in assembly and they have a maximum size of 32 KB, this is due to the limited address space available. However, with the use of a Memory Bank Controller (mapper), games can reach bigger sizes.
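For illustration, here is one way such a Shift-JIS regex could look in Python. It is our own approximation, not necessarily the pattern used for this search: it accepts the Shift-JIS byte ranges for Hiragana (82 9f..82 f1), Katakana (83 40..83 96), the long vowel mark ー (81 5b) and full-width capitals such as ＢＧＭ (82 60..82 79), and requires at least four such characters in a row. The function name find_kana_strings is hypothetical.

```python
import re

# Approximate Shift-JIS (JIS X 0208) byte ranges:
#   81 5b          long vowel mark "ー"
#   82 60..82 79   full-width capitals "Ａ".."Ｚ"
#   82 9f..82 f1   Hiragana "ぁ".."ん"
#   83 40..83 96   Katakana "ァ".."ヶ"
SJIS_KANA_CHAR = rb"(?:\x81\x5b|\x82[\x60-\x79\x9f-\xf1]|\x83[\x40-\x96])"

# At least four such characters in a row, to reduce false positives.
SJIS_KANA_RUN = re.compile(SJIS_KANA_CHAR + rb"{4,}")


def find_kana_strings(rom: bytes):
    """Yield (file_offset, text) for every run of kana-like Shift-JIS characters."""
    for match in SJIS_KANA_RUN.finditer(rom):
        yield match.start(), match.group().decode("shift_jis", errors="replace")
```

Feeding it the whole ROM (e.g. find_kana_strings(open("Oide Rascal (Japan).gbc", "rb").read())) yields (offset, text) pairs; expect some noise from data bytes that happen to fall into these ranges.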
This game does use a mapper: it contains more than 60 banks (another name for overlays?). Looking at the memory map, we see that a fixed bank occupies the address range 0000..3fff, while the other (switchable) banks are mapped, one at a time, into 4000..7fff.
Now, let’s say you are interested in a memory access to a string residing in, e.g., bank 8. The reference won’t point inside that bank’s file range 8*4000..8*4000+3fff, but inside the bank window 4000..7fff. So, how do we tell which bank an address refers to? Well… we need to somehow keep track of which bank was loaded when a given piece of code runs! Also, since addresses are only 2 bytes long, any 2-byte value shows up often by chance, so you are likely to get more false positives than if you had 4 or more bytes to match.
First, let’s find the banked address matching file offset 0xacde9, which is where the debug menu text starts[1]:
>>> 0xacde9 // 0x4000  # bank number
43
>>> hex(0xacde9 - 0x4000 * (0xacde9 // 0x4000 - 1))  # rom43::4de9
'0x4de9'
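The same arithmetic can be wrapped in a pair of helpers (hypothetical names; a minimal sketch assuming 16 KiB banks, with bank 0 fixed at 0000..3fff and every other bank seen through 4000..7fff). For banks 1 and up it is equivalent to the formula above; bank 0 needs its own case, since its addresses are simply the file offsets:

```python
BANK_SIZE = 0x4000  # 16 KiB per ROM bank


def offset_to_banked(offset: int) -> tuple:
    """Map a ROM file offset to (bank, address as seen by the CPU)."""
    bank, within = divmod(offset, BANK_SIZE)
    if bank == 0:
        return 0, within  # fixed bank: 0000..3fff
    return bank, BANK_SIZE + within  # switchable window: 4000..7fff


def banked_to_offset(bank: int, address: int) -> int:
    """Inverse mapping: (bank, CPU address) back to a ROM file offset."""
    if address < BANK_SIZE:
        return address  # fixed bank, independent of the selected bank
    if address >= 2 * BANK_SIZE:
        raise ValueError(f"{address:#06x} is not in ROM")
    return bank * BANK_SIZE + (address - BANK_SIZE)
```

For example, offset_to_banked(0xacde9) gives (43, 0x4de9), i.e. rom43::4de9.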
Now we have an address likely to be referenced in code (4de9). The file offset has some null bytes around it, so it’s worth also searching for off-by-one addresses (e.g. 4dea). And in case the string is referenced relative to 4000, also try those values with 4000 subtracted (e.g. 0dea). All of these must be encoded in little-endian, since that is the SM83’s byte order (4de9 becomes the bytes e9 4d).
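A minimal sketch of building these search patterns in Python, assuming the candidate set described above (exact, off-by-one, and both relative to 4000); candidate_patterns and find_all are hypothetical helper names. struct’s "<H" format packs an unsigned 16-bit value in little-endian order:

```python
import struct


def candidate_patterns(address: int, base: int = 0x4000) -> dict:
    """Little-endian search patterns for a banked address and a few variants.

    Assumes address > base, so the relative values stay non-negative.
    """
    values = {
        "exact": address,
        "off-by-one": address + 1,
        "relative": address - base,
        "relative off-by-one": address + 1 - base,
    }
    # "<H" = unsigned 16-bit, little-endian (the SM83 byte order).
    return {name: struct.pack("<H", value) for name, value in values.items()}


def find_all(data: bytes, pattern: bytes) -> list:
    """Every offset at which pattern occurs, overlapping matches included."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets
```

With rom holding the file contents, {name: find_all(rom, p) for name, p in candidate_patterns(0x4de9).items()} lists the hits per candidate; expect plenty of them, for the false-positive reason given earlier.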
Of all those candidates, 4de9 itself ended up being the most promising:
LANG=C grep -Poba '\xe9\x4d' Oide\ Rascal\ \(Japan\).gbc
# 49378
# Alternatively:
binwalk -R '\xe9\x4d' Oide\ Rascal\ \(Japan\).gbc
# 0xC0E2
>>> 0xc0e2 // 0x4000  # bank number
3
>>> hex(0xc0e2 - 0x4000 * (0xc0e2 // 0x4000 - 1))  # rom3::40e2
'0x40e2'
rom3::40ca fa d7 c4   LD A,(DAT_c4d7)
rom3::40cd 87         ADD A
rom3::40ce 5f         LD E,A
rom3::40cf 16 00      LD D,0x0
rom3::40d1 21 e2 40   LD HL,0x40e2
rom3::40d4 19         ADD HL,DE
rom3::40d5 2a         LD A,(HL+)=>PTR_DAT_rom3__4de9_rom3__40e2       = rom3::4de9
rom3::40d6 66         LD H,(HL=>PTR_DAT_rom3__4de9_rom3__40e2+1)
rom3::40d7 6f         LD L,A
rom3::40d8 3e 01      LD A,0x1
rom3::40da e0 ea      LDH (offset DAT_ffea),A
rom3::40dc 3e 2b      LD A,0x2b
rom3::40de cd f5 14   CALL FUN_14f5
rom3::40e1 c9         RET
                      PTR_DAT_rom3__4de9_rom3__40e2     XREF[1,1]: FUN_rom3__40ca:40d5(R),
                      PTR_DAT_rom3__4de9_rom3__40e2+1              FUN_rom3__40ca:40d6(R)
rom3::40e2 e9 4d      addr DAT_rom3__4de9
rom3::40e4 08 4e      addr DAT_rom3__4e08
rom3::40e6 1f 4e      addr DAT_rom3__4e1f
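We can see 0x2b (43), the number of the bank that contains the debug strings, being loaded into A before calling FUN_14f5, although it isn’t yet clear what FUN_14f5 does with it (0x2b could also be an unrelated value). Right after FUN_rom3__40ca ends, we seem to have a table of addresses, the first of which is the one we matched: the function doubles the byte at c4d7, uses it as an index into this table, and loads the selected address into HL.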
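Read this way, the lookup performed by FUN_rom3__40ca can be sketched in Python as follows. This is our interpretation of the listing: lookup_pointer is a hypothetical name, and we assume the byte at c4d7 is a menu index. Since ADD A is an 8-bit addition, the doubled index wraps at 256, which the sketch mimics with & 0xFF:

```python
def lookup_pointer(table: bytes, index: int) -> int:
    """Mimic FUN_rom3__40ca: fetch the little-endian word at table[2 * index]."""
    offset = (index * 2) & 0xFF   # LD A,(c4d7) / ADD A: 8-bit doubling; LD D,0 zero-extends
    low = table[offset]           # LD A,(HL+)
    high = table[offset + 1]      # LD H,(HL)
    return (high << 8) | low      # LD L,A -> HL = pointer
```

Applied to the three table entries at rom3::40e2 (e9 4d 08 4e 1f 4e), indices 0, 1 and 2 give 4de9, 4e08 and 4e1f.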
After disassembling the instructions in range 4000..40d1 of bank 3, we can infer the following call stack leading to FUN_rom3__40ca, so we likely want to end up calling one of these functions:
FUN_rom3__4000
    rom3::4039 cd 3f 40   CALL FUN_rom3__403f
FUN_rom3__403f
    rom3::405d cd ca 40   CALL FUN_rom3__40ca
FUN_rom3__40ca
Now that we have a candidate bank for the debug menu code, let’s figure out how bank switching works.
Writing to this address space selects the lower 5 bits of the ROM Bank Number (in range 01-1Fh). When 00h is written, the MBC translates that to bank 01h also.
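To make the quoted behaviour concrete, here is a minimal Python model of MBC1-style ROM banking. It assumes the bank register sits at 2000..3fff as on MBC1, leaves out RAM banking and the upper bank bits, and the class name is hypothetical. Note that bank 43 (0x2b) doesn’t fit in 5 bits, so reaching it takes more than this register alone; the model only illustrates the quoted text.

```python
class Mbc1RomBanking:
    """Minimal model of MBC1-style ROM bank selection (no RAM, no upper bank bits)."""

    BANK_SIZE = 0x4000

    def __init__(self, rom: bytes):
        self.rom = rom
        self.bank = 1  # the switchable window initially shows bank 1

    def write(self, address: int, value: int) -> None:
        # Assumed MBC1 layout: writes to 2000..3fff select the ROM bank.
        if 0x2000 <= address <= 0x3FFF:
            bank = value & 0x1F     # only the lower 5 bits are used
            self.bank = bank or 1   # 00h is translated to bank 01h

    def read(self, address: int) -> int:
        if address < self.BANK_SIZE:
            return self.rom[address]  # fixed bank at 0000..3fff
        if address < 2 * self.BANK_SIZE:
            return self.rom[self.bank * self.BANK_SIZE + address - self.BANK_SIZE]
        raise ValueError(f"{address:#06x} is outside ROM")
```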
We don’t know exactly how these writes are being done in assembly, so let’s try setting a memory watch on this address using BGB:
- Open the debugger window, right-click anywhere on the ROM hex dump pane, and select “Set access breakpoint”;
- Add an entry with “addr range” set to
- Check “on write”.
We get a hit on an ld (2000),a instruction, encoded as the bytes ea 00 20. Let’s check how many of these instructions are present:
binwalk -R '\xea\x00\x20' Oide\ Rascal\ \(Japan\).gbc
# 0x27D
# 0x28C
# 0x796
The last match has several xrefs, making it a better candidate to investigate:
0795 f3         DI
0796 ea 00 20   LD (LAB_2000),A
0799 e0 98      LDH (offset DAT_ff98),A
079b fb         EI
079c c9         RET
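Our reading of this routine: it writes the bank number both to the MBC register and to ff98, with interrupts disabled in between. Since the MBC register is write-only, the copy at ff98 is presumably how code finds out which bank is current, so it can save it before switching (e.g. with LDH A,(ff98) / PUSH AF). A hypothetical Python model of that save/switch/restore pattern; the restore step is an assumption, since we haven’t looked at where the saved bank gets popped:

```python
class BankTracker:
    """Hypothetical model of switch_bank (0795) and its HRAM copy at ff98."""

    def __init__(self):
        self.mbc_bank = 1   # value last written to the (write-only) MBC register
        self.ff98 = 1       # the game's own copy of the current bank

    def switch_bank(self, a: int) -> None:
        # DI / LD (2000),A / LDH (ff98),A / EI: both stores happen with
        # interrupts disabled, so no interrupt can observe them out of sync.
        self.mbc_bank = a
        self.ff98 = a

    def call_in_bank(self, bank: int, func):
        """LDH A,(ff98) / PUSH AF / LD A,bank / CALL switch_bank, then func.

        Restoring the saved bank afterwards is an assumption on our part.
        """
        saved = self.ff98
        self.switch_bank(bank)
        try:
            return func()
        finally:
            self.switch_bank(saved)
```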
At first, I just set a breakpoint at 0796 and, when it was hit, manually changed the value of register pair af so that its high byte (a) was 03, i.e. bank 3. Then I continued without breaking (press “Shift + F9”).
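As a reminder of how the pair is laid out: af is a 16-bit register pair with the accumulator a in the high byte and the flags register f in the low byte, so changing only a means editing the first two hex digits. A tiny sketch (set_a is a hypothetical helper):

```python
def set_a(af: int, a: int) -> int:
    """Return AF with its high byte (register A) replaced; F (flags) is kept."""
    return ((a & 0xFF) << 8) | (af & 0xFF)
```

So, for example, af = 01c0 with a set to 03 becomes 03c0.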
When the breakpoint was hit with af = 1000, these attempts resulted in a crash screen (Google-translated as “This game software seems like Game Boy Color / Play with Game Boy Color!”):
When it was hit with af = 0470 or af = 01c0, they resulted in some of the debug text being loaded:
But we can mess around some more… When the breakpoint is hit with af = 1000, set it to af = 3000 and then, instead of continuing:
- Step until we reach 1d66, step over to 1d69, and step into it;
- Set pc = 4000;
- Continue and hit the previous breakpoint at 0796 a few more times.
Eventually these graphics are loaded before the crash screen:
We can see that these are tiles used for menus, such as in this sound test screen:
Also, the first button in these menus is used for returning to the previous screen. So it makes sense that “Debug Menu End” is the first string stored for the debug menu, which suggests that some of the loading code is shared across these menus!
Activating the debug menu
At this point, we turn to finding functions specific to menu loading. We can dump one trace log up until the main menu loads, then another in which we also select the menu entry that opens the sound menu. Diffing these two logs should make it evident which additional functions were called.
Let’s build binjgb and capture these traces:
# compile
mkdir -p build && (cd build && cmake -DTESTER_DEBUGGER=ON .. && make)
# run up to main menu
./bin/binjgb-debugger -t Oide\ Rascal\ \(Japan\).gbc > trace1.log
# run up to sound menu
./bin/binjgb-debugger -t Oide\ Rascal\ \(Japan\).gbc > trace2.log
# filter by control-flow modifying instructions,
# but also include the next instruction;
# cut removes unneeded info to reduce changes in the diff
for i in trace1.log trace2.log; do
  grep -A1 '\(call\|jp\|jr\|rst\) ' "$i" \
    | cut -d' ' -f11- \
    | sort -u \
    > filtered-"$i"
done
# compare
diff -auw filtered-trace1.log filtered-trace2.log
The diff now shows several blocks of additional calls. Since the lines are sorted by address, it’s easy to pick the first call of each block as a candidate to investigate, as nearby instructions likely belong to the same function.
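For anyone who’d rather not rely on the shell pipeline, here is a rough Python equivalent (a sketch; function names are hypothetical). It assumes, like the cut -f11- above, that the part worth comparing starts at the 11th space-separated field of each trace line, and it reports only lines new in the second trace instead of a full diff. It streams the files line by line, which matters for traces around 1 GB:

```python
import re

# Same idea as the grep pattern: a control-flow mnemonic followed by a space.
CONTROL_FLOW = re.compile(r"(call|jp|jr|rst) ")


def filtered_lines(path: str, skip_fields: int = 10) -> set:
    """Control-flow lines plus the line after each (like grep -A1), minus the
    first skip_fields space-separated fields (like cut -d' ' -f11-)."""
    keep = set()
    take_next = False
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:  # streamed, never loaded whole
            line = line.rstrip("\n")
            matched = CONTROL_FLOW.search(line) is not None
            if matched or take_next:
                keep.add(" ".join(line.split(" ")[skip_fields:]))
            take_next = matched
    return keep


def new_in_second(first: str, second: str) -> list:
    """Filtered lines present only in the second trace, sorted."""
    return sorted(filtered_lines(second) - filtered_lines(first))
```

Calling new_in_second("trace1.log", "trace2.log") then lists the additions corresponding to the "+" lines of the diff.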
Eventually, we come across these calls:
 |0x1d96: cd 6a 03 call $036a
 |0x1d9a: cd 95 07 call $0795
 |0x1da1: c3 12 1d jp $1d12
+|0x1db4: cd 95 07 call $0795
+|0x1db7: cd 98 45 call $4598
1daf f0 98      LDH A,(offset DAT_ff98)
1db1 f5         PUSH AF
1db2 3e 01      LD A,0x1
1db4 cd 95 07   CALL switch_bank
1db7 cd 98 45   CALL SUB_4598
1dba fa 8f c4   LD A,(DAT_c48f)
1dbd c7         RST rst00
                -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
1dbe ab 43      addr DAT_43ab
We also find 43ab exclusively in the sound menu trace log:
 |0x4219: 18 01 jr +1
 |0x4226: 18 0b jr +11
 |0x4233: c3 12 1d jp $1d12
+|0x4251: cd 03 05 call $0503
+|0x4254: 20 47 jr nz,+71
+|0x4276: 20 25 jr nz,+37
+|0x429d: c3 12 1d jp $1d12
+|0x43ab: cd b1 17 call $17b1
By running grep -B100 1db7 trace2.log (a bit more manageable than parsing the whole ~1 GB log), we confirm that 1daf and the following instructions get executed:
[...]
0x0892: e9 jp hl
0x1daf: f0 98 ldh a,[$ff98]
0x1db1: f5 push af
0x1db2: 3e 01 ld a,1
0x1db4: cd 95 07 call $0795
0x0795: f3 di
0x0796: ea 00 20 ld [$2000],a
0x0799: e0 98 ldh [$ff98],a
0x079b: fb ei
0x079c: c9 ret
0x1db7: cd 98 45 call $4598
Now, if the debug menu is loaded like the sound menu, maybe there’s a routine that also ends with an rst00, but followed by one of the addresses we identified as candidate calls in bank 3 (e.g. 4000). Let’s check:
binwalk -R '\xc7\x00\x40' Oide\ Rascal\ \(Japan\).gbc
# 0x3C0A
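The same search generalizes to all three candidate entry points. A sketch, assuming (from the sound menu trace, where execution reaches 43ab right after the rst00 at 1dbd) that rst00 continues at the little-endian word stored right after it; we haven’t disassembled the rst00 handler itself, and find_rst00_callers is a hypothetical name:

```python
import re
import struct

RST_00 = b"\xc7"  # opcode of RST 00h


def find_rst00_callers(rom: bytes, targets) -> dict:
    """For each target address, offsets where 'RST 00h' is followed by that
    address as an inline little-endian word."""
    found = {}
    for target in targets:
        pattern = RST_00 + struct.pack("<H", target)
        # re.escape keeps bytes such as 5b ('[') from being read as regex syntax.
        found[target] = [m.start() for m in re.finditer(re.escape(pattern), rom)]
    return found
```

Run over the whole ROM with targets [0x4000, 0x403f, 0x40ca], the entry for 4000 should contain the same 0x3c0a offset binwalk reported.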
A single match, and we do find a very similar routine, which even loads bank 3:
3bff f0 98      LDH A,(offset DAT_ff98)
3c01 f5         PUSH AF
3c02 3e 03      LD A,0x3
3c04 cd 95 07   CALL switch_bank
3c07 fa eb c4   LD A,(DAT_c4eb)
3c0a c7         RST rst00
                -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
3c0b 00 40      addr SUB_4000
You know what, let’s jump to this routine instead of following through the sound menu:
-1daf f0 98      LDH A,(offset DAT_ff98)
-1db1 f5         PUSH AF
+1daf c3 ff 3b   JP SUB_3bff
 1db2 3e 01      LD A,0x1
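The same patch can also be written into a copy of the ROM file. A sketch in Python (apply_patch and JP_3BFF are hypothetical names): since 1daf lies in the fixed bank 0, its file offset equals its address, and JP nn is the opcode c3 followed by the target 3bff in little-endian order (ff 3b). The helper refuses to write unless the original bytes match:

```python
def apply_patch(rom: bytearray, offset: int, expected: bytes, new: bytes) -> None:
    """Overwrite rom at offset with new, but only if the bytes there are `expected`."""
    if len(new) != len(expected):
        raise ValueError("patch must keep the same length")
    current = bytes(rom[offset : offset + len(expected)])
    if current != expected:
        raise ValueError(f"unexpected bytes at {offset:#x}: {current.hex()}")
    rom[offset : offset + len(new)] = new


# JP nn: opcode c3, then the 16-bit target in little-endian order.
JP_3BFF = bytes([0xC3]) + (0x3BFF).to_bytes(2, "little")
```

Usage: read the ROM into a bytearray, call apply_patch(rom, 0x1daf, bytes.fromhex("f098f5"), JP_3BFF), and write the result to a new file.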
We can apply this patch manually in BGB (right-click on the code pane and select “Modify code/data”). Alternatively, we could use these Game Genie codes:
C3D-AFE-2A9
FFD-B0E-808
3BD-B1E-3BD
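To see what these codes encode, here is a sketch of a decoder following the commonly documented 9-digit Game Boy Game Genie layout ABC-DEF-GHI (decode_gb_game_genie is a hypothetical name, and the H digit isn’t interpreted). Each code replaces one byte at an address, and only while the original byte equals the compare value:

```python
def decode_gb_game_genie(code: str) -> tuple:
    """Decode 'ABC-DEF-GHI' into (address, new_value, compare_value)."""
    digits = code.replace("-", "")
    if len(digits) != 9:
        raise ValueError("expected a 9-digit code such as ABC-DEF-GHI")
    a, b, c, d, e, f, g, h, i = (int(x, 16) for x in digits)  # h is not used here
    new_value = (a << 4) | b
    # Address nibbles are shuffled, and the top one is XOR-ed with 0xF.
    address = ((f ^ 0xF) << 12) | (c << 8) | (d << 4) | e
    # Compare byte: G and I form a byte that is rotated right by 2, then XOR-ed with 0xBA.
    gi = (g << 4) | i
    rotated = ((gi >> 2) | (gi << 6)) & 0xFF
    return address, new_value, rotated ^ 0xBA
```

Decoding the three codes gives (0x1daf, 0xc3, 0xf0), (0x1db0, 0xff, 0x98) and (0x1db1, 0x3b, 0xf5): together they replace the original bytes f0 98 f5 at 1daf with c3 ff 3b, i.e. the JP SUB_3bff patch.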
Now when we select the sound menu, we are instead greeted by another screen:
That’s the first string of the debug menu text, and the other buttons match the other identified strings. Looks like the real deal!
BTW, let’s take a look at one of the “Background Test” screens:
The last entry “おまけ” means extra / bonus, so it’s probably just some unlockable that isn’t shown by default. Maybe the debug menu wasn’t activated in the main menu but in some other manner, perhaps a button code…
[1] If we didn’t know the file offset, we could take a string, convert it to Shift-JIS, and search for the resulting bytes:
LANG=C grep -Poba "$(iconv -f utf-8 -t shift-jis \
    <(printf '%s' 'デバックメニューしゅうりょう') | \
    xxd -p | \
    sed 's/\(..\)/\\x\1/g')" Oide\ Rascal\ \(Japan\).gbc
# 708075 (0xACDEB)
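The same lookup in Python, as a minimal sketch (find_sjis is a hypothetical name) that relies on Python’s built-in shift_jis codec instead of iconv and xxd. Escaping the needle matters here: ー encodes as 81 5b, and 5b is the regex metacharacter “[”.

```python
import re


def find_sjis(rom: bytes, text: str) -> list:
    """File offsets of every occurrence of text encoded as Shift-JIS."""
    needle = text.encode("shift_jis")
    return [m.start() for m in re.finditer(re.escape(needle), rom)]
```

Called on the ROM contents with 'デバックメニューしゅうりょう', it should report the same offset as the grep above.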