8-bit needles
Oide Rascal, a 2001 Game Boy Color game, contains these strings:
デバックメニューしゅうりょう
* アニメーションテスト
* モーションテスト
* バックグラウンドテスト
* BGMテスト
* SEテスト
I know zero Japanese, but thankfully it was pointed out that the first line translates to “Debug Menu End”, and so another hunt begins…
Tooling
We got all we need:
- Disassembler / Decompiler: Ghidra + Game Boy loader (Identifies sections in the memory map, creates overlays for each bank…)
- Debugger: BGB (conditional breakpoints, read/write memory watches…) + binjgb (instruction trace logging)
While binjgb includes a debugger, it has far fewer features than BGB’s. It does have some interesting visualizations, such as a heatmap that gets colored when addresses are read, giving you an idea of code and data coverage.
Idiosyncrasies
I also knew zero Game Boy internals before reversing this game, so I took a more black-box approach, guided by some of these insights:
- In Japanese games, text can be encoded in Shift-JIS rather than some Unicode flavour. The above text includes simple-looking characters, which usually fall into either the Hiragana or Katakana lettering systems. Looking at some Shift-JIS tables, we are able to come up with a close-enough regex to match these strings:
([\x81-\x83][\x3f-\xff])+
- Our 8-bit CPU features some constrained hardware accesses:
The SM83 keeps an 8-bit data bus and a 16-bit address bus, so up to 64 KB of memory can be addressed. […] Games are written in assembly and they have a maximum size of 32 KB, this is due to the limited address space available. However, with the use of a Memory Bank Controller (mapper), games can reach bigger sizes.
Indeed, this game contains more than 60 banks (another name for overlays?). Looking at the memory map, we see that a fixed bank occupies the address range 0000..3fff, while the other (switchable) banks occupy 4000..7fff.
Now, let’s say you are interested in a memory access of a string residing in e.g. bank 8. The actual reference won’t be represented by its file offset, inside range 8*4000..8*4000+3fff, but inside the bank range 4000..7fff. So, how do we distinguish addresses between banks? Well… we need to somehow keep track of what bank was loaded at a given code address! Also, since addresses are only 2 bytes long, you are likely to get more false positives than if you had 4 or more bytes to match, due to this reduced range of values.
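As a rough illustration of the Shift-JIS insight, the regex can be applied directly to raw ROM bytes. This is a minimal Python sketch rather than the tooling used in this post: `find_kana_runs` and its `min_chars` threshold are made-up names, and the sample bytes only stand in for a real ROM.

```python
import re

def find_kana_runs(data: bytes, min_chars: int = 4):
    """Yield (file_offset, decoded_text) for runs of Shift-JIS kana-like pairs.

    Lead bytes 0x81-0x83 cover symbols, hiragana and katakana; the trail
    byte range is deliberately loose, as in the regex above.
    """
    pattern = re.compile(rb'(?:[\x81-\x83][\x3f-\xff]){%d,}' % min_chars)
    for match in pattern.finditer(data):
        text = match.group().decode('shift_jis', errors='replace')
        yield match.start(), text

# Stand-in for ROM contents: padding, one encoded string, padding.
sample = b'\x00\x00' + 'デバックメニュー'.encode('shift_jis') + b'\x00'
hits = list(find_kana_runs(sample))
print(hits)
```

Requiring a minimum run length is an assumption added here to cut down on noise, since two-byte patterns match a lot of unrelated data.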
Finding references
First, let’s find the bank address matching file offset 0xacde9, which is where the debug menu text starts [1]:
0xacde9 // 0x4000 = 43 # bank 43
hex(0xacde9 - 0x4000 * (0xacde9 // 0x4000 - 1)) = 0x4de9 # rom43::4de9
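The same arithmetic can be wrapped in a pair of helpers. This is a small sketch assuming the 16 KB bank layout described earlier; `offset_to_banked` and `banked_to_offset` are hypothetical names, not part of any tool mentioned here.

```python
BANK_SIZE = 0x4000  # each ROM bank is 16 KB

def offset_to_banked(offset: int) -> tuple:
    """Map a ROM file offset to (bank, CPU address)."""
    bank = offset // BANK_SIZE
    if bank == 0:
        # The fixed bank is mapped at 0000..3fff, so the offset is the address.
        return 0, offset
    # Switchable banks all appear in the 4000..7fff window.
    return bank, BANK_SIZE + offset % BANK_SIZE

def banked_to_offset(bank: int, addr: int) -> int:
    """Inverse mapping: (bank, CPU address) back to a ROM file offset."""
    if addr < BANK_SIZE:
        return addr
    return bank * BANK_SIZE + (addr - BANK_SIZE)

bank, addr = offset_to_banked(0xacde9)
print(f"rom{bank}::{addr:04x}")
```

Having the inverse around is handy when going from an address seen in the disassembler back to a position in the file.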
Now we have an address likely to be referenced in code (4de9). The file offset has some null bytes around it, so it’s better to also do some off-by-one searches (4de8, 4dea). And just in case it was being referenced relative to 4000, also try searching with that value subtracted (0de8, 0de9, 0dea). We need to encode all of these in little-endian, since the SM83 uses that endianness.
Of all those candidates, 4de9 ended up being the most promising:
LANG=C grep -Poba '\xe9\x4d' Oide\ Rascal\ \(Japan\).gbc
# 49378
# Alternatively:
binwalk -R '\xe9\x4d' Oide\ Rascal\ \(Japan\).gbc
# 0xC0E2
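The whole candidate sweep could also be scripted instead of run one pattern at a time. The following is a sketch under the assumptions stated above (off-by-one variants, plus the same values relative to 4000); `find_pointer_candidates` is an illustrative name and `fake_rom` is a tiny stand-in, not the actual game data.

```python
import struct

def find_pointer_candidates(rom: bytes, addr: int) -> dict:
    """Search for little-endian encodings of addr and its near variants."""
    candidates = []
    for base in (addr, addr - 0x4000):      # absolute, and relative to 4000
        for delta in (-1, 0, 1):            # off-by-one variants
            candidates.append(base + delta)
    hits = {}
    for value in candidates:
        needle = struct.pack('<H', value)   # e.g. 0x4de9 -> b'\xe9\x4d'
        offsets = []
        pos = rom.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = rom.find(needle, pos + 1)
        if offsets:
            hits[value] = offsets
    return hits

# Tiny stand-in for a ROM: 16 padding bytes followed by three pointers.
fake_rom = bytes(0x10) + bytes.fromhex('e94d084e1f4e')
result = find_pointer_candidates(fake_rom, 0x4de9)
print({hex(k): [hex(o) for o in v] for k, v in result.items()})
```

On a real ROM each candidate would likely return many offsets, so the hits still need to be checked by hand in the disassembler.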
Bank address:
0xc0e2 // 0x4000 = 3 # bank 3
hex(0xc0e2 - 0x4000 * (0xc0e2 // 0x4000 - 1)) = 0x40e2 # rom3::40e2
Disassembly:
rom3::40ca fa d7 c4 LD A,(DAT_c4d7)
rom3::40cd 87 ADD A
rom3::40ce 5f LD E,A
rom3::40cf 16 00 LD D,0x0
rom3::40d1 21 e2 40 LD HL,0x40e2
rom3::40d4 19 ADD HL,DE
rom3::40d5 2a LD A,(HL+)=>PTR_DAT_rom3__4de9_rom3__40e2 = rom3::4de9
rom3::40d6 66 LD H,(HL=>PTR_DAT_rom3__4de9_rom3__40e2+1)
rom3::40d7 6f LD L,A
rom3::40d8 3e 01 LD A,0x1
rom3::40da e0 ea LDH (offset DAT_ffea),A
rom3::40dc 3e 2b LD A,0x2b
rom3::40de cd f5 14 CALL FUN_14f5
rom3::40e1 c9 RET
PTR_DAT_rom3__4de9_rom3__40e2+1 XREF[1,1]: FUN_rom3__40ca:40d5(R),
PTR_DAT_rom3__4de9_rom3__40e2 FUN_rom3__40ca:40d6(R)
rom3::40e2 e9 4d addr DAT_rom3__4de9
rom3::40e4 08 4e addr DAT_rom3__4e08
rom3::40e6 1f 4e addr DAT_rom3__4e1f
We can see bank 43 (0x2b) being loaded, the same bank that contains the debug strings, although it isn’t yet clear what FUN_14f5 does with it (it could just be some unrelated 0x2b value). After FUN_rom3__40ca ends, we seem to have a list of valid addresses, of which the first is the one we matched.
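To make the table lookup in that routine more concrete: the value read from DAT_c4d7 is doubled, added to 40e2, and a 16-bit little-endian pointer is read from there. Here is a hedged Python sketch of that indexing, assuming the value acts as a table index; `lookup_pointer` is a made-up helper and only the six bytes shown in the listing are used.

```python
import struct

def lookup_pointer(table: bytes, index: int) -> int:
    """Mimic the lookup: HL = table + index * 2, then read a 16-bit LE pointer."""
    (pointer,) = struct.unpack_from('<H', table, index * 2)
    return pointer

# Bytes listed at rom3::40e2..40e7 in the disassembly.
table_at_40e2 = bytes.fromhex('e94d084e1f4e')
for i in range(3):
    print(i, hex(lookup_pointer(table_at_40e2, i)))
```

Index 0 yields 4de9, the address we searched for, which is consistent with it being the first entry of a pointer table.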
After disassembling the instructions inside range 4000..40d1 in bank 3, we can infer this call stack up to FUN_rom3__40ca. It’s likely we want to end up calling one of these functions:
FUN_rom3__4000
rom3::4039 cd 3f 40 CALL FUN_rom3__403f
FUN_rom3__403f
rom3::405d cd ca 40 CALL FUN_rom3__40ca
FUN_rom3__40ca
Now that we have a candidate bank for the debug menu code, let’s figure out how bank switching works.
Looking at some documentation on memory bank controllers, we have a section about addresses in range 2000..3fff:
Writing to this address space selects the lower 5 bits of the ROM Bank Number (in range 01-1Fh). When 00h is written, the MBC translates that to bank 01h also.
We don’t know exactly how these writes are being done in assembly, so let’s try setting a memory watch on this address using BGB:
- Open the debugger window, right-click anywhere on the rom hex dump pane, select “Set access breakpoint”;
- Add an entry with “addr range” set to 0:2000;
- Check “on write”.
We get a hit on an ld (2000),a instruction, with corresponding bytes ea 00 20. Let’s check how many of these instructions are present:
binwalk -R '\xea\x00\x20' Oide\ Rascal\ \(Japan\).gbc
# 0x27D
# 0x28C
# 0x796
The last match has several xrefs, making it a better candidate to investigate:
0795 f3 DI
0796 ea 00 20 LD (LAB_2000),A
0799 e0 98 LDH (offset DAT_ff98),A
079b fb EI
079c c9 RET
At first, I just set a breakpoint at 0796 and, when it was hit, manually changed the value of register af so that the high part (a) had value 03, then continued without breaking (press “Shift + F9”).
On af = 1000, these attempts resulted in a crash screen (Google-translated as “This game software seems like Game Boy Color / Play with Game Boy Color!”):
On af = 0470 or af = 01c0, they resulted in some of the debug text being loaded:
But we can mess around some more… On af = 1000, after setting it to af = 3000, instead of continuing:
- Step until we reach the ret instruction;
- On 1d66, step over to 1d69, and step inside;
- Set pc = 4000;
- Continue and hit the previous breakpoint at 0796 a few more times.
Eventually these graphics are loaded before the crash screen:
We can see that these are tiles used for menus, such as in this sound test screen:
Also, the first button in these menus is used for returning to the previous screen. It makes sense that “Debug Menu End” would then be the first string stored for the debug menu, suggesting some of the loading code to be identical across these menus!
Activating the debug menu
At this point, we now turn to finding functions specific to menu loading. We can dump a trace log up until the main menu loads, then another trace log where we select the menu entry that opens the sound menu. Diffing these two logs should make evident which additional functions were called.
Let’s build binjgb and capture these traces:
# compile
mkdir -p build && (cd build && cmake -DTESTER_DEBUGGER=ON .. && make)
# run up to main menu
./bin/binjgb-debugger -t Oide\ Rascal\ \(Japan\).gbc > trace1.log
# run up to sound menu
./bin/binjgb-debugger -t Oide\ Rascal\ \(Japan\).gbc > trace2.log
# filter by control-flow modifying instructions,
# but also include the next instruction
for i in trace1.log trace2.log; do
  # cut removes unneeded info to reduce changes in diff
  grep -A1 '\(call\|jp\|jr\|rst\) ' "$i" \
    | cut -d' ' -f11- \
    | sort -u \
    > filtered-"$i"
done
# compare
diff -auw filtered-trace1.log filtered-trace2.log
We now get several blocks of additional calls in the diff. Sorting them makes it easier to find the first candidate call of each block to investigate, as it’s likely that closer instructions are under the same function.
Eventually, we come across these calls:
|[00]0x1d96: cd 6a 03 call $036a
|[00]0x1d9a: cd 95 07 call $0795
|[00]0x1da1: c3 12 1d jp $1d12
+|[00]0x1db4: cd 95 07 call $0795
+|[00]0x1db7: cd 98 45 call $4598
Disassembly:
1daf f0 98 LDH A,(offset DAT_ff98)
1db1 f5 PUSH AF
1db2 3e 01 LD A,0x1
1db4 cd 95 07 CALL switch_bank
1db7 cd 98 45 CALL SUB_4598
1dba fa 8f c4 LD A,(DAT_c48f)
1dbd c7 RST rst00
-- Flow Override: CALL_RETURN (CALL_TERMINATOR)
1dbe ab 43 addr DAT_43ab
And we also find 43ab to be exclusive to the sound menu trace log:
|[01]0x4219: 18 01 jr +1
|[01]0x4226: 18 0b jr +11
|[01]0x4233: c3 12 1d jp $1d12
+|[01]0x4251: cd 03 05 call $0503
+|[01]0x4254: 20 47 jr nz,+71
+|[01]0x4276: 20 25 jr nz,+37
+|[01]0x429d: c3 12 1d jp $1d12
+|[01]0x43ab: cd b1 17 call $17b1
With grep -B100 1db7 trace2.log (a bit more manageable than parsing the whole ~1G log), we confirm that 1daf and the next instructions get executed:
[...]
[00]0x0892: e9 jp hl
[00]0x1daf: f0 98 ldh a,[$ff98]
[00]0x1db1: f5 push af
[00]0x1db2: 3e 01 ld a,1
[00]0x1db4: cd 95 07 call $0795
[00]0x0795: f3 di
[00]0x0796: ea 00 20 ld [$2000],a
[00]0x0799: e0 98 ldh [$ff98],a
[00]0x079b: fb ei
[00]0x079c: c9 ret
[00]0x1db7: cd 98 45 call $4598
Now, if the debug menu were loaded like the sound menu, maybe there’s a routine that also ends with an rst00, but is followed by one of the addresses we identified as candidate calls from bank 3 (e.g. 4000). Let’s check:
binwalk -R '\xc7\x00\x40' Oide\ Rascal\ \(Japan\).gbc
# 0x3C0A
A single match, and we do find a very similar routine, which even loads bank 3:
3bff f0 98 LDH A,(offset DAT_ff98)
3c01 f5 PUSH AF
3c02 3e 03 LD A,0x3
3c04 cd 95 07 CALL switch_bank
3c07 fa eb c4 LD A,(DAT_c4eb)
3c0a c7 RST rst00
-- Flow Override: CALL_RETURN (CALL_TERMINATOR)
3c0b 00 40 addr SUB_4000
You know what, let’s jump to this routine instead of following through the sound menu:
-1daf f0 98 LDH A,(offset DAT_ff98)
-1db1 f5 PUSH AF
+1daf c3 ff 3b JP SUB_3bff
1db2 3e 01 LD A,0x1
We can manually apply this patch under BGB (right-click on the code pane, select “Modify code/data”). Alternatively, we could apply this Game Genie code:
C3D-AFE-2A9
FFD-B0E-808
3BD-B1E-3BD
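For the curious, these codes can be decoded back into the patch they describe. The sketch below assumes the commonly documented 9-digit Game Boy Game Genie layout (new value, scrambled address, and a compare byte that is rotated and XORed with 0xBA); `decode_game_genie` is a hypothetical helper, and the eighth digit is simply ignored here.

```python
def decode_game_genie(code: str):
    """Decode a 9-digit Game Boy Game Genie code into (address, new, old)."""
    digits = [int(c, 16) for c in code.replace('-', '')]
    if len(digits) != 9:
        raise ValueError('expected a code shaped like ABC-DEF-GHI')
    a, b, c, d, e, f, g, _h, i = digits
    new_value = (a << 4) | b
    # Address digits are stored out of order, with the top nibble inverted.
    address = ((f ^ 0xF) << 12) | (c << 8) | (d << 4) | e
    raw = (g << 4) | i
    rotated = ((raw >> 2) | (raw << 6)) & 0xFF   # rotate right by 2 bits
    old_value = rotated ^ 0xBA
    return address, new_value, old_value

for code in ('C3D-AFE-2A9', 'FFD-B0E-808', '3BD-B1E-3BD'):
    address, new, old = decode_game_genie(code)
    print(f'{code}: {address:04x} {old:02x} -> {new:02x}')
```

The three codes decode to addresses 1daf, 1db0 and 1db1, replacing f0 98 f5 with c3 ff 3b, which lines up with the patch shown above.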
Now when we select the sound menu, we are instead greeted by another screen:
That’s the first string of the debug menu text, and the other buttons match the other identified strings. Looks like the real deal!
BTW, let’s take a look at one of the “Background Test” screens:
The last entry “おまけ” means extra / bonus, so it’s probably just some unlockable that isn’t shown by default. Maybe the debug menu wasn’t activated in the main menu but in some other manner, perhaps a button code…
[1] In case we didn’t know the file offset, we could take a string and convert it to Shift-JIS, using the corresponding bytes for matching:
LANG=C grep -Poba "$(iconv -f utf-8 -t shift-jis \
    <(printf '%s' 'デバックメニューしゅうりょう') | \
  xxd -p | \
  sed 's/\(..\)/\\x\1/g')" Oide\ Rascal\ \(Japan\).gbc
# 708075 (0xACDEB)