Oide Rascal, a 2001 Game Boy Color game, contains these strings:

デバックメニューしゅうりょう
* アニメーションテスト
* モーションテスト
* バックグラウンドテスト
* BGMテスト
* SEテスト

I know zero Japanese, but thankfully it was pointed out that the first line translates to “Debug Menu End”, and so another hunt begins…

Tooling

We have everything we need:

  • Disassembler / Decompiler: Ghidra + Game Boy loader (identifies sections in the memory map, creates overlays for each bank…)
  • Debugger: BGB (conditional breakpoints, read/write memory watches…) + binjgb (instruction trace logging)

While binjgb includes a debugger, it has far fewer features than BGB’s. It does have some interesting visualizations, such as a heatmap that gets colored as addresses are read, giving you an idea of code and data coverage.

Idiosyncrasies

I also knew zero Game Boy internals before reversing this game, so I took a more black-box approach, guided by some of these insights:

  • In Japanese games, text can be encoded in Shift-JIS rather than some Unicode flavour. The text above consists of simple-looking characters, which usually belong to either the Hiragana or the Katakana writing system. Looking at some Shift-JIS tables, we can come up with a close-enough regex to match these strings: ([\x81-\x83][\x3f-\xff])+.
  • Our 8-bit CPU also has a constrained address space:

    The SM83 keeps an 8-bit data bus and a 16-bit address bus, so up to 64 KB of memory can be addressed. […] Games are written in assembly and they have a maximum size of 32 KB, this is due to the limited address space available. However, with the use of a Memory Bank Controller (mapper), games can reach bigger sizes.

Indeed, this game contains more than 60 banks (another name for overlays?). Looking at the memory map, we see that a fixed bank occupies the address range 0000..3fff, while the other (switchable) banks occupy 4000..7fff.

Now, let’s say you are interested in a memory access to a string residing in, e.g., bank 8. The reference won’t point inside that bank’s file offset range 8*4000..8*4000+3fff, but inside the bank range 4000..7fff. So, how do we tell addresses from different banks apart? Well… we need to somehow keep track of which bank was loaded at a given code address! Also, since addresses are only 2 bytes wide, you are likely to get more false positives than if you had 4 or more bytes to match, due to this reduced range of values.
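To put a rough number on that last point: if the ROM bytes were uniformly random (they aren’t, so treat this as an order-of-magnitude estimate only), a fixed n-byte pattern would show up about (size - n + 1) / 256^n times by chance. The ROM size is an assumption here: 64 banks of 16 KiB (1 MiB), a common cartridge size that fits “more than 60 banks”.

```python
# Rough estimate of chance matches for a fixed n-byte pattern,
# assuming (unrealistically) uniformly random ROM bytes.
BANK_SIZE = 0x4000          # 16 KiB per ROM bank
ROM_SIZE = 64 * BANK_SIZE   # assumption: 64 banks (1 MiB)


def expected_chance_matches(rom_size: int, pattern_len: int) -> float:
    """Expected number of positions where a fixed pattern matches by chance."""
    positions = rom_size - pattern_len + 1
    return positions / 256 ** pattern_len


for n in (2, 4):
    print(f"{n}-byte pattern: ~{expected_chance_matches(ROM_SIZE, n):.4f} chance matches")
```

Under that assumption a 2-byte value would match by chance roughly 16 times, while a 4-byte value would almost never do so, which is why each candidate hit still needs to be checked against the disassembly.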

Finding references

First, let’s find the bank address matching file offset 0xacde9, which is where the debug menu text starts[1]:

>>> 0xacde9 // 0x4000  # bank 43
43
>>> hex(0xacde9 - 0x4000 * (0xacde9 // 0x4000 - 1))  # rom43::4de9
'0x4de9'
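The same arithmetic can be wrapped in two small helpers (the names `to_bank_addr` and `to_file_offset` are made up for this sketch). Note that `offset - 0x4000 * (bank - 1)` simplifies to `0x4000 + offset % 0x4000`, and that bank 0 needs a special case, since the fixed bank is mapped at 0000..3fff rather than 4000..7fff.

```python
BANK_SIZE = 0x4000  # each ROM bank is 16 KiB


def to_bank_addr(offset: int) -> tuple[int, int]:
    """Map a ROM file offset to (bank, CPU address)."""
    bank = offset // BANK_SIZE
    if bank == 0:
        return 0, offset                          # fixed bank: 0000..3fff
    return bank, BANK_SIZE + offset % BANK_SIZE   # switchable: 4000..7fff


def to_file_offset(bank: int, addr: int) -> int:
    """Inverse of to_bank_addr."""
    if bank == 0:
        if not 0 <= addr < BANK_SIZE:
            raise ValueError("bank 0 is mapped at 0000..3fff")
        return addr
    if not BANK_SIZE <= addr < 2 * BANK_SIZE:
        raise ValueError("switchable banks are mapped at 4000..7fff")
    return bank * BANK_SIZE + (addr - BANK_SIZE)


bank, addr = to_bank_addr(0xACDE9)
print(f"rom{bank}::{addr:04x}")  # rom43::4de9
```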

Now we have an address likely to be referenced in code (4de9). There are some null bytes around the file offset, so it’s better to also do some off-by-one searches (4de8, 4dea). And just in case the string is referenced relative to 4000, also try searching with that value subtracted (0de8, 0de9, 0dea). All of these need to be encoded in little-endian, since that is the SM83’s byte order.
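A sketch that builds the six candidate values from the paragraph above, encodes each as a little-endian 16-bit word with `struct.pack("<H", ...)`, and lists every file offset where it occurs. The helper names are made up, and the demo runs on a tiny synthetic buffer rather than the real ROM.

```python
import struct


def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset where pattern occurs (overlapping hits included)."""
    hits, start = [], 0
    while (i := data.find(pattern, start)) != -1:
        hits.append(i)
        start = i + 1
    return hits


def candidate_hits(rom: bytes, target: int) -> dict[int, list[int]]:
    """Search target, its off-by-one neighbours, and those three values minus 0x4000."""
    values = [target + d for d in (-1, 0, 1)]
    values += [v - 0x4000 for v in values]
    return {v: find_all(rom, struct.pack("<H", v)) for v in values}


# Synthetic demo: bytes e9 4d (little-endian 0x4de9) placed at offset 3.
rom = bytes([0x00, 0x11, 0x22, 0xE9, 0x4D, 0x33])
for value, offsets in candidate_hits(rom, 0x4DE9).items():
    print(f"{value:04x}: {[hex(o) for o in offsets]}")
```

On the real ROM, `candidate_hits(open("Oide Rascal (Japan).gbc", "rb").read(), 0x4DE9)` should include the 0xc0e2 hit that grep and binwalk report.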

Of all those candidates, 4de9 itself turned out to be the most promising:

LANG=C grep -Poba '\xe9\x4d' Oide\ Rascal\ \(Japan\).gbc
# 49378

# Alternatively:
binwalk -R '\xe9\x4d' Oide\ Rascal\ \(Japan\).gbc
# 0xC0E2

Bank address:

>>> 0xc0e2 // 0x4000  # bank 3
3
>>> hex(0xc0e2 - 0x4000 * (0xc0e2 // 0x4000 - 1))  # rom3::40e2
'0x40e2'

Disassembly:

rom3::40ca fa d7 c4        LD         A,(DAT_c4d7)
rom3::40cd 87              ADD        A
rom3::40ce 5f              LD         E,A
rom3::40cf 16 00           LD         D,0x0
rom3::40d1 21 e2 40        LD         HL,0x40e2
rom3::40d4 19              ADD        HL,DE
rom3::40d5 2a              LD         A,(HL+)=>PTR_DAT_rom3__4de9_rom3__40e2 = rom3::4de9
rom3::40d6 66              LD         H,(HL=>PTR_DAT_rom3__4de9_rom3__40e2+1)
rom3::40d7 6f              LD         L,A
rom3::40d8 3e 01           LD         A,0x1
rom3::40da e0 ea           LDH        (offset DAT_ffea),A
rom3::40dc 3e 2b           LD         A,0x2b
rom3::40de cd f5 14        CALL       FUN_14f5
rom3::40e1 c9              RET
                       PTR_DAT_rom3__4de9_rom3__40e2+1    XREF[1,1]: FUN_rom3__40ca:40d5(R),
                       PTR_DAT_rom3__4de9_rom3__40e2                 FUN_rom3__40ca:40d6(R)
rom3::40e2 e9 4d           addr       DAT_rom3__4de9
rom3::40e4 08 4e           addr       DAT_rom3__4e08
rom3::40e6 1f 4e           addr       DAT_rom3__4e1f

We can see 0x2b (43) being loaded into A before calling FUN_14f5: that’s the bank containing the debug strings, although it isn’t yet clear what FUN_14f5 does with it (it could just be an unrelated 0x2b value). Right after FUN_rom3__40ca ends, we seem to have a table of pointers, the first of which is the one we matched.
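Read as a whole, FUN_rom3__40ca looks like a table lookup: the index in DAT_c4d7 is doubled (`ADD A`), added to the table base 40e2, and the 16-bit little-endian pointer stored there is loaded into HL. A minimal Python model of that lookup, using the three table entries from the listing (the index values are hypothetical inputs):

```python
# Bytes of the pointer table at rom3::40e2, copied from the listing above.
TABLE_BASE = 0x40E2
TABLE_BYTES = bytes.fromhex("e94d084e1f4e")


def lookup(index: int) -> int:
    """Mimic LD A,(c4d7); ADD A; ... LD A,(HL+); LD H,(HL); LD L,A."""
    offset = (index * 2) & 0xFF   # ADD A doubles the index (8-bit wraparound)
    lo = TABLE_BYTES[offset]      # LD A,(HL+) -> low byte
    hi = TABLE_BYTES[offset + 1]  # LD H,(HL)  -> high byte
    return (hi << 8) | lo         # LD L,A     -> HL holds the pointer


for i in range(3):
    print(f"index {i} @ {TABLE_BASE + 2 * i:04x} -> rom3::{lookup(i):04x}")
```

If the entries follow the order of the strings listed at the top, the gaps between consecutive pointers (0x1f = 31 and 0x17 = 23 bytes) are 3 bytes longer than the Shift-JIS lengths of the first two strings (28 and 20 bytes). That would be consistent with, say, a 2-byte prefix plus a terminator per entry, and the footnote’s grep hit at 0xACDEB, two bytes past 0xACDE9, fits that guess. This is only an inference from the numbers, not something confirmed in the code.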

After disassembling the instructions in range 4000..40d1 of bank 3, we can infer the following call chain leading to FUN_rom3__40ca. It’s likely that we want to end up calling one of these functions:

FUN_rom3__4000
  rom3::4039 cd 3f 40      CALL  FUN_rom3__403f
FUN_rom3__403f
  rom3::405d cd ca 40      CALL  FUN_rom3__40ca
FUN_rom3__40ca

Now that we have a candidate bank for the debug menu code, let’s figure out how bank switching works.

Looking at some documentation on memory bank controllers, we find a section about addresses in range 2000..3fff:

Writing to this address space selects the lower 5 bits of the ROM Bank Number (in range 01-1Fh). When 00h is written, the MBC translates that to bank 01h also.
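The quoted rule is small enough to model directly. This sketch (the function name is made up) covers only the quoted text: writes in 2000..3fff, lower 5 bits, 00h treated as 01h. It ignores any upper bank bits and the behaviour of other controller types.

```python
def select_rom_bank(addr: int, value: int, current: int) -> int:
    """Bank selected after writing value to addr, per the quoted 2000..3fff rule only."""
    if not 0x2000 <= addr <= 0x3FFF:
        return current            # not the bank-number register: no change
    bank = value & 0x1F           # only the lower 5 bits are used
    return bank if bank else 1    # 00h is translated to bank 01h


print(select_rom_bank(0x2000, 0x03, current=1))  # 3
print(select_rom_bank(0x2000, 0x00, current=3))  # 1
```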

We don’t know exactly how these writes are being done in assembly, so let’s try setting a memory watch on this address using BGB:

  1. Open the debugger window, right-click anywhere on the rom hex dump pane, select “Set access breakpoint”;
  2. Add an entry with “addr range” set to 0:2000;
  3. Check “on write”.

We get a hit on an ld (2000),a instruction, with corresponding bytes ea 00 20. Let’s check how many of these instructions are present:

binwalk -R '\xea\x00\x20' Oide\ Rascal\ \(Japan\).gbc
# 0x27D
# 0x28C
# 0x796
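`ea 00 20` is the `LD (a16),A` opcode (0xEA) followed by the little-endian address 2000. Since, per the documentation quoted above, any write into 2000..3fff reaches the bank register, a slightly wider scan could catch variants such as `ld (2100),a` that the exact search misses. A sketch with made-up helper names; the scan is purely byte-based, so operand or data bytes can produce false hits:

```python
BANK_SIZE = 0x4000
LD_A16_A = 0xEA  # SM83 opcode for LD (a16),A


def bank_register_writes(rom: bytes) -> list[tuple[int, int]]:
    """Return (file offset, target address) for every EA lo hi with target in 2000..3fff."""
    hits = []
    for off in range(len(rom) - 2):
        if rom[off] != LD_A16_A:
            continue
        target = rom[off + 1] | (rom[off + 2] << 8)  # little-endian operand
        if 0x2000 <= target <= 0x3FFF:
            hits.append((off, target))
    return hits


def describe(off: int) -> str:
    """Format a file offset as romN::addr."""
    bank = off // BANK_SIZE
    addr = off if bank == 0 else BANK_SIZE + off % BANK_SIZE
    return f"rom{bank}::{addr:04x}"


# Synthetic demo: ld (2000),a at offset 1, ld (2100),a at 5, ld (4000),a at 8.
rom = bytes([0xF3, 0xEA, 0x00, 0x20, 0x00, 0xEA, 0x00, 0x21, 0xEA, 0x00, 0x40])
for off, target in bank_register_writes(rom):
    print(describe(off), f"ld ({target:04x}),a")
```

All three binwalk hits (0x27D, 0x28C, 0x796) are below 0x4000, i.e. in the fixed bank, where the file offset and the CPU address coincide; that is why 0796 can be used directly as a breakpoint address.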

The last match has several xrefs, making it a better candidate to investigate:

0795 f3              DI
0796 ea 00 20        LD         (LAB_2000),A
0799 e0 98           LDH        (offset DAT_ff98),A
079b fb              EI
079c c9              RET
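The later listings call this routine switch_bank. Our reading of it: with interrupts disabled, it writes the bank number in A to the MBC register at 2000 and keeps a copy in HRAM at ff98. The copy is presumably there because the bank register can’t be read back (reading 2000 returns ROM data), so callers can read ff98 to save the current bank before switching, as the routine at 1daf does further on. A toy Python model of that behaviour; the class, its fields, and the restore step in `call_in_bank` are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class ToyGameBoy:
    """Just enough state to model switch_bank (0795) and a caller."""
    rom_bank: int = 1                                   # bank mapped at 4000..7fff
    hram: dict[int, int] = field(default_factory=dict)
    ime: bool = True                                    # interrupt master enable

    def switch_bank(self, a: int) -> None:
        self.ime = False        # DI: no interrupt handler runs mid-switch
        self.rom_bank = a       # LD (2000),A: the MBC register is write-only
        self.hram[0xFF98] = a   # LDH (ff98),A: readable shadow copy
        self.ime = True         # EI

    def call_in_bank(self, bank: int, func) -> None:
        """Save the current bank, switch, call, restore (assumed caller pattern)."""
        saved = self.hram.get(0xFF98, self.rom_bank)  # LDH A,(ff98); PUSH AF
        self.switch_bank(bank)
        func(self)
        self.switch_bank(saved)                       # assumed POP AF + switch_bank


gb = ToyGameBoy()
gb.call_in_bank(3, lambda g: print("running with bank", g.rom_bank))
print("restored bank", gb.rom_bank)
```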

At first, I just set a breakpoint at 0796 and, when it was hit, manually changed the value of register af so that its high byte (a) had value 03. Then we continue without breaking (press “Shift + F9”).

On af = 1000, these attempts resulted in a crash screen (Google translated as “This game software seems like Game Boy Color / Play with Game Boy Color!”):

On af = 0470 or af = 01c0, they resulted in some of the debug text being loaded:

But we can mess around some more… On af = 1000, after setting it to af = 3000, instead of continuing:

  1. Step until we reach the ret instruction;
  2. On 1d66, step over to 1d69, and step inside;
  3. Set pc = 4000;
  4. Continue and hit the previous breakpoint at 0796 a few more times.

Eventually these graphics are loaded before the crash screen:

We can see that these are tiles used for menus, such as in this sound test screen:

Also, the first button in these menus is used for returning to the previous screen. So it makes sense that “Debug Menu End” would be the first string stored for the debug menu, which suggests that some of the loading code is shared across these menus!

Activating the debug menu

At this point, we turn to finding the functions specific to menu loading. We can dump a trace log up until the main menu loads, then another trace log where we also select the menu entry that opens the sound menu. Diffing these two logs should reveal which additional functions were called.

Let’s build binjgb and capture these traces:

# compile
mkdir -p build && (cd build && cmake -DTESTER_DEBUGGER=ON .. && make)

# run up to main menu
./bin/binjgb-debugger -t Oide\ Rascal\ \(Japan\).gbc > trace1.log
# run up to sound menu
./bin/binjgb-debugger -t Oide\ Rascal\ \(Japan\).gbc > trace2.log

# filter by control-flow modifying instructions,
# but also include the next instruction;
# cut removes unneeded info to reduce changes in the diff
for i in trace1.log trace2.log; do
   grep -A1 '\(call\|jp\|jr\|rst\) ' "$i" \
      | cut -d' ' -f11- \
      | sort -u \
      > filtered-"$i"
done

# compare
diff -auw filtered-trace1.log filtered-trace2.log

We now get several blocks of additional calls in the diff. Having the lines sorted makes it easier to find the first candidate call of each block to investigate, as nearby instructions likely belong to the same function.
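For reference, the same filter-and-compare idea as a Python sketch. It assumes, as the `cut -f11-` step suggests, that the part of each trace line worth comparing starts at the `|[bank]0xaddr:` column seen in the diff output; the regex is a guess at that format. Instead of a textual diff it computes a set difference, which directly yields the lines present only in the second trace, and it streams the input so a ~1 GB log isn’t loaded into memory at once.

```python
import re

# Comparable part of a trace line, e.g. "|[00]0x1db4: cd 95 07  call $0795" (assumed format).
STABLE = re.compile(r"\|\[[0-9a-f]{2}\]0x[0-9a-f]{4}:.*")
CONTROL_FLOW = re.compile(r"\b(call|jp|jr|rst) ")


def control_flow_lines(lines) -> set[str]:
    """Like grep -A1 | cut | sort -u: control-flow lines plus the line after each."""
    keep, take_next = set(), False
    for line in lines:
        m = STABLE.search(line)
        if not m:
            continue
        text = m.group(0).rstrip()
        is_cf = CONTROL_FLOW.search(text) is not None
        if take_next or is_cf:
            keep.add(text)
        take_next = is_cf  # grep -A1: also keep the following instruction
    return keep


def new_in_second(path1: str, path2: str) -> list[str]:
    """Sorted lines that appear only in the second trace."""
    with open(path1) as f1, open(path2) as f2:
        return sorted(control_flow_lines(f2) - control_flow_lines(f1))
```

Usage would be along the lines of `for line in new_in_second("trace1.log", "trace2.log"): print(line)`.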

Eventually, we come across these calls:

 |[00]0x1d96: cd 6a 03  call $036a
 |[00]0x1d9a: cd 95 07  call $0795
 |[00]0x1da1: c3 12 1d  jp $1d12
+|[00]0x1db4: cd 95 07  call $0795
+|[00]0x1db7: cd 98 45  call $4598

Disassembly:

1daf f0 98           LDH        A,(offset DAT_ff98)
1db1 f5              PUSH       AF
1db2 3e 01           LD         A,0x1
1db4 cd 95 07        CALL       switch_bank
1db7 cd 98 45        CALL       SUB_4598
1dba fa 8f c4        LD         A,(DAT_c48f)
1dbd c7              RST        rst00
                 -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
1dbe ab 43           addr       DAT_43ab

And we also find 43ab to be exclusive to the sound menu trace log:

 |[01]0x4219: 18 01     jr +1
 |[01]0x4226: 18 0b     jr +11
 |[01]0x4233: c3 12 1d  jp $1d12
+|[01]0x4251: cd 03 05  call $0503
+|[01]0x4254: 20 47     jr nz,+71
+|[01]0x4276: 20 25     jr nz,+37
+|[01]0x429d: c3 12 1d  jp $1d12
+|[01]0x43ab: cd b1 17  call $17b1
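Ghidra marks the `RST rst00` as a call terminator followed by an inline address, and the trace shows 43ab running afterwards. We haven’t disassembled the rst00 handler here, but that layout matches a common Game Boy jump-table idiom: the handler uses its return address, which points at a table of 16-bit pointers right after the RST, indexes it with A, and jumps to the selected entry. A sketch of that assumed behaviour (including the assumption that A is a plain entry index), not a decoding of this game’s actual handler:

```python
def rst00_dispatch(memory: bytes, rst_addr: int, a: int) -> int:
    """Assumed jump-table semantics: return the jump target for index A."""
    table = rst_addr + 1      # return address pushed by RST = the byte after it
    entry = table + 2 * a     # 16-bit entries
    return memory[entry] | (memory[entry + 1] << 8)  # little-endian pointer


# Bytes from the listing: c7 (RST 00) at 1dbd, then ab 43 (first table entry).
mem = bytearray(0x4000)
mem[0x1DBD:0x1DC0] = bytes.fromhex("c7ab43")
print(hex(rst00_dispatch(mem, 0x1DBD, 0)))  # 0x43ab
```

Under this reading, the address right after an RST 00 is a function entry point, which is what makes searching for an RST 00 followed by a candidate address worthwhile.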

With grep -B100 1db7 trace2.log (a bit more manageable than going through the whole ~1 GB log), we confirm that 1daf and the following instructions get executed:

[...]
[00]0x0892: e9        jp hl
[00]0x1daf: f0 98     ldh a,[$ff98]
[00]0x1db1: f5        push af
[00]0x1db2: 3e 01     ld a,1
[00]0x1db4: cd 95 07  call $0795
[00]0x0795: f3        di
[00]0x0796: ea 00 20  ld [$2000],a
[00]0x0799: e0 98     ldh [$ff98],a
[00]0x079b: fb        ei
[00]0x079c: c9        ret
[00]0x1db7: cd 98 45  call $4598

Now, if the debug menu is loaded like the sound menu, maybe there’s a routine that also ends with an rst00 followed by one of the addresses we identified as candidate calls in bank 3 (e.g. 4000). Let’s check:

binwalk -R '\xc7\x00\x40' Oide\ Rascal\ \(Japan\).gbc
# 0x3C0A

A single match, and we do find a very similar routine, which even loads bank 3:

3bff f0 98           LDH        A,(offset DAT_ff98)
3c01 f5              PUSH       AF
3c02 3e 03           LD         A,0x3
3c04 cd 95 07        CALL       switch_bank
3c07 fa eb c4        LD         A,(DAT_c4eb)
3c0a c7              RST        rst00
                 -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
3c0b 00 40           addr       SUB_4000

You know what, let’s jump to this routine instead of following through the sound menu:

-1daf f0 98           LDH        A,(offset DAT_ff98)
-1db1 f5              PUSH       AF
+1daf c3 ff 3b        JP         SUB_3bff
 1db2 3e 01           LD         A,0x1
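The patch replaces `LDH A,(ff98)` (2 bytes) and `PUSH AF` (1 byte) with a 3-byte `JP a16`: opcode C3 followed by the little-endian target 3bff, so the instruction at 1db2 stays aligned (and is simply never reached). Since 1daf lies in the fixed bank, its file offset equals its address, so the same patch could also be applied to a copy of the ROM file. A sketch; the output file name is made up, and the original bytes are checked first so the script refuses to patch an unexpected file:

```python
import pathlib
import struct

JP_A16 = 0xC3  # SM83 opcode for JP a16


def jp(target: int) -> bytes:
    """Encode JP target: opcode followed by the little-endian 16-bit address."""
    return bytes([JP_A16]) + struct.pack("<H", target)


def patch(rom: bytearray, offset: int, expected: bytes, new: bytes) -> None:
    """Overwrite bytes at offset, refusing if the current bytes aren't the expected ones."""
    if len(expected) != len(new):
        raise ValueError("replacement must have the same length as the original")
    actual = bytes(rom[offset:offset + len(expected)])
    if actual != expected:
        raise ValueError(f"unexpected bytes at {offset:#x}: {actual.hex(' ')}")
    rom[offset:offset + len(new)] = new


ORIGINAL = bytes.fromhex("f098f5")  # LDH A,(ff98) ; PUSH AF
PATCHED = jp(0x3BFF)                # JP 3bff -> c3 ff 3b

if __name__ == "__main__":
    src = pathlib.Path("Oide Rascal (Japan).gbc")  # assumed local ROM copy
    if src.exists():
        rom = bytearray(src.read_bytes())
        patch(rom, 0x1DAF, ORIGINAL, PATCHED)      # fixed bank: offset == address
        src.with_name("Oide Rascal (Japan) patched.gbc").write_bytes(rom)
```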

We can apply this patch manually in BGB (right-click on the code pane, select “Modify code/data”). Alternatively, we could apply these Game Genie codes:

C3D-AFE-2A9
FFD-B0E-808
3BD-B1E-3BD
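Each Game Genie code patches a single byte, hence three codes for the three-byte JP. This sketch decodes them using the Game Boy Game Genie layout as it is commonly documented: in ABC-DEF-GHI, AB is the new byte, the address is the hex number FCDE XORed with F000, and the compare byte is GI rotated right by 2 and XORed with BA (the H digit is not checked here). Decoding the three codes recovers the address, the original byte, and the new byte of each patched location:

```python
def ror8(value: int, n: int) -> int:
    """Rotate an 8-bit value right by n bits."""
    return ((value >> n) | (value << (8 - n))) & 0xFF


def decode_gg(code: str) -> tuple[int, int, int]:
    """Decode a 9-digit Game Boy Game Genie code to (address, new, compare)."""
    digits = code.replace("-", "").upper()
    if len(digits) != 9:
        raise ValueError("expected ABC-DEF-GHI")
    a, b, c, d, e, f, g, _h, i = (int(ch, 16) for ch in digits)
    new = (a << 4) | b
    address = ((f << 12) | (c << 8) | (d << 4) | e) ^ 0xF000
    compare = ror8((g << 4) | i, 2) ^ 0xBA
    return address, new, compare


for code in ("C3D-AFE-2A9", "FFD-B0E-808", "3BD-B1E-3BD"):
    address, new, compare = decode_gg(code)
    print(f"{code}: {address:04x} {compare:02x} -> {new:02x}")
```

As we understand it, the compare byte makes the Game Genie substitute the value only when the original byte is read, which matters for addresses inside switchable banks; all three addresses here are in the fixed bank.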

Now when we select the sound menu, we are instead greeted by another screen:

That’s the first string of the debug menu text, and the other buttons match the other identified strings. Looks like the real deal!


BTW, let’s take a look at one of the “Background Test” screens:

The last entry “おまけ” means extra / bonus, so it’s probably just some unlockable that isn’t shown by default. Maybe the debug menu wasn’t activated in the main menu but in some other manner, perhaps a button code…

  1. In case we didn’t know the file offset, we could take a string, convert it to Shift-JIS, and use the resulting bytes for matching:

      LANG=C grep -Poba "$(iconv -f utf-8 -t shift-jis \
         <(printf '%s' 'デバックメニューしゅうりょう') | \
         xxd -p | \
         sed 's/\(..\)/\\x\1/g')" Oide\ Rascal\ \(Japan\).gbc
      # 708075 (0xACDEB)
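The same idea in Python, whose standard `shift_jis` codec makes the `iconv`/`xxd`/`sed` round-trip unnecessary. The sketch also applies the rough `[\x81-\x83][\x3f-\xff]` regex from the Idiosyncrasies section to list Shift-JIS-looking runs. The helper names and the minimum run length are assumptions, and the demo uses a synthetic buffer instead of the ROM.

```python
import re

# Loose Shift-JIS matcher from earlier: lead bytes 81..83, trail bytes 3f..ff.
KANA_RUN = re.compile(rb"(?:[\x81-\x83][\x3f-\xff])+")


def find_string(rom: bytes, text: str) -> int:
    """Offset of text encoded as Shift-JIS, or -1 if absent."""
    return rom.find(text.encode("shift_jis"))


def kana_runs(rom: bytes, min_chars: int = 4):
    """Yield (offset, decoded text) for Shift-JIS-looking runs."""
    for m in KANA_RUN.finditer(rom):
        raw = m.group(0)
        if len(raw) // 2 < min_chars:
            continue
        try:
            yield m.start(), raw.decode("shift_jis")
        except UnicodeDecodeError:
            pass  # the regex is loose, so not every match is valid Shift-JIS


title = "デバックメニューしゅうりょう"
rom = b"\x00\x00" + title.encode("shift_jis") + b"\x00"
print(find_string(rom, title))  # 2
print(list(kana_runs(rom)))
```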