SimCity 2000 had a port made for the Saturn in 1995. Some magazine coverage mentioned a cameo of Sonic the Hedgehog:

However, there has been no evidence of it ever showing up ingame. We will parse files for tiles and animations, to see if at least the graphics are still there.

From previous versions of the game, we know that tiles are compressed with a custom run-length encoding. However, the metadata in the Saturn port seemed to be significantly different, so we’ll look into how it’s parsed in the disassembly.

Last time I worked on reversing a compression format I did the whole “meticulously step through each instruction” approach. This time I went with a lazier approach: Identify the decompression function, prepare state, then run in a CPU emulator, iterating through all tiles. Usually this is where I would pick up Unicorn for scripting, but at the time of writing, it didn’t support SH-2. So I picked the next best thing: writing a skeleton driver for MAME. Although a GDB remote server is available, it only supports a subset of emulated CPUs. Other alternatives such as PCode emulation could work here as well, but MAME has other advantages. If you are here just to read this part, feel free to jump to the corresponding section, otherwise, read on for the full journey.

Tooling

  • Ghidra loader: Implements its own ISO9660 parsing due to the “Initial Program” being stored in the first 15 sectors of the CD. As an alternative, memory dumps can also be loaded. Besides that, it does the usual memory mapping and entrypoint disassembly;
  • Emulator: There’s a variety of them, but with distinct feature sets:
    • Mednafen: Debugger works, but the memory view isn’t implemented, which renders memory write and read breakpoints useless. However, I did use it for instruction tracing, since I was already familiar with that code. As usual, the goal was to maximize disassembled code and visually identify unreached blocks;
    • BizHawk: Memory view works, but no debugger implemented, so it was dropped;
    • Yabause: Despite reports of less compatibility with games, at least it does include a debugger and memory editor! Well, almost… I picked the Kronos fork, but instruction breakpoints were not working, even after configuring a debugging interpreter. Turns out that this fork was missing a call to the breakpoint hook, which is called on the uoYabause fork. It was just a matter of adding the call again;
    • MAME: Also has debugging covered, and would be my next choice if Kronos didn’t work out. Still, it was used for scripting;

First impressions

Under directory ./tiles/ we find distinct tile sets that are loaded depending on which ingame year was reached: 1900, 1950, or 2050. Headers are now split into their own files (.hed) instead of being bundled together with chunks (.dat).

But the header structure doesn’t match the “Sprite File Specification” section in the previously mentioned docs. Here’s a snippet from ImHex of the beginning and the end of file y2050ini.hed:

On these 0x10 sized entries, we can already identify some patterns (curiously packed as little-endian, despite the SH-2 being big-endian):

  • 0x00..0x02 is an index, likely also defined by another value at 0x02..0x04;
  • 0x0c..0x10 is the running sum of the values at 0x08..0x0a from the previous entries, so it is likely the start offset of the corresponding chunk, with 0x08..0x0a being the chunk’s size;
    • File y2050ini.dat has size 0xc20e1, which matches the values on the last header entry (0x0c12fb + 0x0de6 = 0xc20e1);
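
The running-sum pattern can be checked mechanically over a whole .hed file. Here is a minimal Python sketch, assuming the layout guessed above (little-endian, a 16-bit size at 0x08 and a 32-bit offset at 0x0c) and assuming the first chunk starts at offset 0. The function name is made up for illustration:

```python
import struct

ENTRY_SIZE = 0x10

def check_offsets(hed: bytes, dat_size: int) -> bool:
    """Check that each entry's offset equals the sum of all previous sizes,
    and that the last offset + size lands exactly at the end of the .dat."""
    expected = 0
    for pos in range(0, len(hed) - ENTRY_SIZE + 1, ENTRY_SIZE):
        (size,) = struct.unpack_from("<H", hed, pos + 0x08)
        (offset,) = struct.unpack_from("<I", hed, pos + 0x0c)
        if offset != expected:
            return False
        expected += size
    return expected == dat_size

# Hypothetical usage against the real files:
# hed = open("tiles/y2050ini.hed", "rb").read()
# print(check_offsets(hed, 0xc20e1))
```

If the check fails on the real files, the guessed field layout (or the assumption about the first offset) is what needs revisiting.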

However, no sign of “Sprite Count”. Neither do these entries match “Sprite Metadata”.

What about the chunks? In y2050ini.dat, if we assume the first 2 bytes would be the first “Pixel Data Chunk”, we get Count = 0x9, which is fine, but Mode = 0x6 doesn’t match any known mode:

Let’s try something else.

Identifying how tiles are loaded

(All the following addresses are for the US version unless mentioned otherwise)

Yabause has a convenient VRAM viewer under “Debug > VDP1”, where we get info on loaded textures in this video processor, such as the memory address where they are placed:

We can also recognize the (wrapped) shape in the corresponding bytes by opening the “Memory View” and jumping to 0x05c00000 + 0x00071980 = 0x05c71980:

After placing a memory write breakpoint at one of these addresses (e.g. 0x05c719b0), we hit it at 0x0605173e:

06051734 62 e3           mov        dst_i,r2
06051736 32 4c           add        vdp1_dst,r2
06051738 63 73           mov        src_i,r3
0605173a 33 5c           add        chunk_src,r3
0605173c 61 30           mov.b      @r3,r1
0605173e 22 10           mov.b      r1,@r2 ; store tile byte in VDP1 VRAM

Which is contained in function 0x060516ba (named chunk_dcx_to_vdp1):

void chunk_dcx_to_vdp1(int vdp1_dst,int chunk_src,uint dcx_size)
{
  undefined uVar1;
  bool bVar2;
  byte bVar3;
  int src_i;
  int iVar4;
  uint dst_i;

  dst_i = 0;
  src_i = 0;
  do {
    while( true ) {
      if (dcx_size <= dst_i) {
        return;
      }
      bVar3 = *(byte *)(chunk_src + src_i) & 3;
      if (bVar3 != 0) break;
      iVar4 = (((int)*(char *)(src_i + chunk_src) & 0xffU) >> 2) + 1;
      while( true ) {
        src_i = src_i + 1;
        bVar2 = iVar4 < 1;
        iVar4 = iVar4 + -1;
        if (bVar2) break;
                    /* put tile byte */
        *(undefined *)(dst_i + vdp1_dst) = *(undefined *)(src_i + chunk_src);
        dst_i = dst_i + 1;
      }
    }
    if (bVar3 == 1) {
      iVar4 = (((int)*(char *)(src_i + chunk_src) & 0xffU) >> 2) + 1;
      while (bVar2 = 0 < iVar4, iVar4 = iVar4 + -1, bVar2) {
                    /* clear previous tile bytes */
        *(undefined *)(dst_i + vdp1_dst) = 0;
        dst_i = dst_i + 1;
      }
    }
    else {
      if (bVar3 != 2) {
        return;
      }
      iVar4 = (((int)*(char *)(src_i + chunk_src) & 0xffU) >> 2) + 1;
      src_i = src_i + 1;
      uVar1 = *(undefined *)(chunk_src + src_i);
      while (bVar2 = 0 < iVar4, iVar4 = iVar4 + -1, bVar2) {
        *(undefined *)(dst_i + vdp1_dst) = uVar1;
        dst_i = dst_i + 1;
      }
    }
    src_i = src_i + 1;
  } while( true );
}
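
A Python transcription of the loop above makes the format easier to see. This is a sketch based only on the decompiled output, not code from the game: each control byte holds a mode in its low 2 bits and a count (minus one) in its high 6 bits. The function name is made up:

```python
def decompress_chunk(src: bytes, dcx_size: int) -> bytes:
    """Decode one chunk the way chunk_dcx_to_vdp1() appears to.

    Mode 0: copy the next `count` bytes verbatim.
    Mode 1: emit `count` zero bytes.
    Mode 2: repeat the next byte `count` times.
    Mode 3: stop.
    """
    out = bytearray()
    i = 0
    # Like the original, the size is only checked before each control byte,
    # so the output may overshoot dcx_size if the last run is too long.
    while len(out) < dcx_size:
        ctrl = src[i]
        mode = ctrl & 3
        count = (ctrl >> 2) + 1
        i += 1
        if mode == 0:
            out += src[i:i + count]
            i += count
        elif mode == 1:
            out += bytes(count)
        elif mode == 2:
            out += bytes([src[i]]) * count
            i += 1
        else:
            break
    return bytes(out)
```

For example, the bytes `08 01 02 03 0d 06 aa` decode to three literal bytes, four zeros, and two copies of `aa`.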

Note that we can deduce argument dcx_size from the width and height reported in “Debug > VDP1”, and the “Memory View” bytes. It’s visually evident how many bytes the tile should take (32 * 17 = 0x220), and we can confirm that it matches the third argument’s value (passed in register r6) when we hit the breakpoint.

Although chunk_dcx_to_vdp1() doesn’t have any direct references to it, we can place a breakpoint at the return instruction rts at 0x06051776, then when we hit it, step into the caller function. Or, since we’re using SH-2, just look at the value of the “Procedure Register” (pr), since it will contain the return address used by jsr / bsr / rts instructions. In this case, it’s 0x0603d50a, which belongs to function 0x0603d46e (named chunk_dcx):

int chunk_dcx(short tile_i)
{
  undefined *puVar1;
  int dst_offset;
  uint uVar2;
  uint dcx_size;
  char *tile_entry;

  puVar1 = PTR_DAT_0603d554;
  if (((tile_i < 0) || (DAT_0603d498 <= tile_i)) || (PTR_w_hdr_meta_0603d49c[tile_i * 0x10 + 6] == '\x01')) {
    dst_offset = 0;
  }
  else {
    tile_entry = PTR_w_hdr_meta_0603d54c + tile_i * 0x10;
    dcx_size = ((int)tile_entry[1] & 0xffU) * ((int)*tile_entry & 0xffU) * 8 & 0xffff;
    uVar2 = (dcx_size + 0x1f & (uint)PTR_DAT_0603d550 & 0xffff) >> 5;
    if ((int)*(short *)PTR_DAT_0603d558 < (int)((int)*(short *)PTR_DAT_0603d554 + uVar2)) {
      *(undefined2 *)(tile_entry + 2) = 0;
    }
    else {
      *(short *)PTR_DAT_0603d554 = *(short *)PTR_DAT_0603d554 + (short)uVar2;
      *(short *)(tile_entry + 2) = DAT_0603d546 - *(short *)puVar1;
          /* Indirect call to handler, e.g. chunk_dcx_to_vdp1() */
      (**(code **)(tile_entry + 0xc))(PTR_vdp1_vram_cache_0603d55c + ((int)*(short *)(tile_entry + 2) & 0xffffU) * 0x20,*(undefined4 *)(tile_entry + 8),dcx_size);
    }
    dst_offset = (int)*(short *)(tile_entry + 2);
  }
  return dst_offset;
}

We now have more context on the arguments passed to the handler. But first, let’s look at the referenced data used to read tile_entry. w_hdr_meta is at address 0x06098ca8, which is the start of the following table of 0x10 sized entries:

Oh, so there’s more than one VDP1 VRAM handler that can be called. Entries with the same handler are packed together, so if we scroll down, we see the entries for chunk_dcx_to_vdp1() starting at 0x0609a688:

Let’s compare those 3 highlighted entries (named tile_entry) with the 3 entries (named hed_entry) in y2050.hed starting at 0x30:

Going back to the decompiled function, we can now piece together all parsed tile data:

  • The current .dat is loaded at 0x225560, and each chunk’s start offset is given by tile_entry[0x00..0x04] = 0x225560 + hed_entry[0x0c..0x10];
  • Number of bytes to decompress per chunk (dcx_size) is given by height * width * 8 = tile_entry[1] * tile_entry[0] * 8 = hed_entry[4] * 2**hed_entry[2] * 8;
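
As a sanity check, the size formula can be evaluated for the 32 * 17 tile seen earlier. This is a small Python sketch with made-up function names. The width exponent of 2 is my assumption for that tile (2**2 * 8 = 32 bytes per row), and the extra mask applied in chunk_dcx() before the shift is left out since its value isn’t shown:

```python
def dcx_size_from_hed(width_exp: int, height: int) -> int:
    """height * 2**width_exp * 8, truncated to 16 bits as in chunk_dcx()."""
    return (height * (2 ** width_exp) * 8) & 0xffff

def vram_blocks(dcx_size: int) -> int:
    """Number of 0x20-byte VDP1 VRAM blocks needed, rounding up."""
    return (dcx_size + 0x1f) >> 5

size = dcx_size_from_hed(2, 17)
print(hex(size), vram_blocks(size))
```

The block count lines up with the `* 0x20` multiplier used when computing the destination address passed to the handler.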

Not shown above but can also be deduced:

  • File SC2000.DAT is an archive that contains MAIN.PAL at offset 0x13c0, which holds several palette sets, and each tile uses one of them. The set index is stored at tile_entry[0xc] = hed_entry[0x6]. The palette format is RGB555 (i.e. 5 bits per channel): I first guessed it from the repeated values of 0x8000 (suggesting 2 bytes per entry), and it also happens to be a standard format on the Saturn. We can convert it to the more conventional ARGB8888;
  • The destination address is given as 0x25c00000, which is the VDP1 VRAM cache that is mirrored at 0x05c00000. There is an optional displacement added to this address, stored at tile_entry[0xa..0xc];
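
The palette conversion can be sketched in a few lines of Python. This assumes the usual Saturn channel order (red in bits 0..4, green in bits 5..9, blue in bits 10..14, bit 15 being the flag that shows up as 0x8000) and treats every entry as fully opaque. The function name is made up:

```python
def rgb555_to_argb8888(value: int) -> int:
    """Convert one 16-bit palette entry to 0xAARRGGBB."""
    def expand(c5: int) -> int:
        # Scale 5 bits to 8 bits by replicating the top bits into the low bits
        return (c5 << 3) | (c5 >> 2)

    r = expand(value & 0x1f)
    g = expand((value >> 5) & 0x1f)
    b = expand((value >> 10) & 0x1f)
    return 0xff000000 | (r << 16) | (g << 8) | b
```

Under these assumptions, the repeated 0x8000 entries map to opaque black (0xff000000).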

With all requirements to call chunk_dcx() identified, we place a breakpoint at the first instruction (0x0603d46e), run until we hit it, then dump the memory (on Kronos’ “Memory Editor”, by selecting tab “All” and pressing “Save Tab”), and also take note of the register values to load on a MAME driver that, for a given tile_i:

  1. Initializes all memory and registers as captured above, with tile_i passed as argument;
  2. Runs until it returns from handler chunk_dcx_to_vdp1();
  3. Stores the decompressed bytes in VDP1 VRAM to a file;
  4. Sets tile_i+=1 and goes back to step 1 until we iterated through all entries starting at w_hdr_meta;

After finishing these steps, we should have all tiles decompressed for a given .dat.

Writing a skeleton driver

Disclaimer: This implementation might not follow best practices, it’s just what I put together for hooking CPU emulation in a quick and dirty fashion!

Nevertheless, I wanted to minimize modifications on existing code, and also provide some flexibility on the state to emulate by passing different memory dumps when running MAME. So the overall approach should be:

  • Write a new CPU device that extends the existing SH-2 device used by the Saturn driver, so that we can run our own code at each instruction, just like UC_HOOK_CODE in Unicorn;
  • Allow specifying the memory dump as an argument when running MAME, to initialize both code and data. I decided to be very lazy here, storing the full memory range of 0x00000000..0x07ffffff from Kronos. Yeah, that’s 128M of mostly unneeded state, but we can load it in one go onto the CPU’s memory map using the cartridge loader.

Luckily for us, MAME has a very modular architecture, allowing us to simply declare various pre-defined devices through C++ templates and macros. Compared to plain CPU emulators, it brings these advantages:

  • Built-in graphical debugger;
  • Easier to recreate the target system’s context (e.g. reading/writing to memory regions or hardware registers that are handled by an already emulated device).

To figure out what’s actually required in a driver, it’s worth looking at a simple system, such as the Micom Mahjong.

Basic necessities

The MAMEDEV Wiki already points out 2 requirements:

  • An additional entry on the model list at ./src/mame/mame.lst:
    @source:sega/saturn_test.cpp
    saturn_test                     //
    
  • At the very bottom of our driver’s source file (./src/mame/sega/saturn_test.cpp), the system driver structure macro CONS:
    //    year, name,        parent,  compat, machine,     input,      class,      init,       company, fullname, flags
    CONS( 2000, saturn_test, 0,       0,      test_config, test_input, test_state, empty_init, "Test",  "Test",   MACHINE_NO_SOUND_HW )
    

Our class, along with the machine config function, declares the devices we’ll use:

class test_state : public driver_device
{
public:
    test_state(const machine_config &mconfig, device_type type, const char *tag)
        : driver_device(mconfig, type, tag)
        , m_maincpu(*this, "maincpu")
        , m_cart(*this, "cartslot")
        , m_cart_region(nullptr)
        , m_bank(*this, "cartbank")
    { }

    void test_config(machine_config &config);

private:
    virtual void machine_start() override;

    DECLARE_DEVICE_IMAGE_LOAD_MEMBER(cart_load);

    void test_map(address_map &map);

    required_device<cpu_device> m_maincpu;
    required_device<generic_slot_device> m_cart;
    memory_region *m_cart_region;
    required_memory_bank m_bank;
};

void test_state::test_config(machine_config &config)
{
    TEST_CPU(config, m_maincpu, XTAL(14'318'181)*2); // 28.6364 MHz
    m_maincpu->set_addrmap(AS_PROGRAM, &test_state::test_map);

    GENERIC_CARTSLOT(config, m_cart, generic_plain_slot, "test_cart");
    m_cart->set_width(GENERIC_ROM32_WIDTH);
    m_cart->set_endian(ENDIANNESS_BIG);
    m_cart->set_device_load(FUNC(test_state::cart_load));
    m_cart->set_must_be_loaded(true);
}

The CPU has an assigned memory region, which is initialized through a memory bank. This is required since the dumped state is loaded as a cart, which has its own dedicated memory region that needs to be copied to the other one when the machine starts (this can be checked in the debugger, which lists all existing memory regions):

DEVICE_IMAGE_LOAD_MEMBER(test_state::cart_load)
{
    uint32_t size = m_cart->common_get_size("cartrom");
    m_cart->rom_alloc(size, GENERIC_ROM32_WIDTH, ENDIANNESS_BIG);
    // FIXME: How to load as 32bit BE?
    m_cart->common_load_rom(m_cart->get_rom_base(), size, "cartrom");
    u8 *p = m_cart->get_rom_base();
    for (int i = 0; i < size; i += 4) {
        u8 p3 = p[i + 3];
        u8 p2 = p[i + 2];
        u8 p1 = p[i + 1];
        u8 p0 = p[i + 0];
        p[i + 0] = p3;
        p[i + 1] = p2;
        p[i + 2] = p1;
        p[i + 3] = p0;
    }

    return image_init_result::PASS;
}

void test_state::machine_start() {
    std::string region_tag;
    m_cart_region = memregion(region_tag.assign(m_cart->tag()).append(GENERIC_ROM_REGION_TAG).c_str());
    m_bank->configure_entry(0, m_cart_region->base());
    m_bank->set_entry(0);
}
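
The loop in cart_load() reverses the bytes of every 32-bit word. The same transform is easy to reproduce offline, e.g. to compare a dump against what the emulated CPU ends up seeing. Here is a Python sketch with a made-up function name, which, like the driver, assumes the length is a multiple of 4:

```python
def swap32(data: bytes) -> bytes:
    """Reverse the byte order within each 32-bit word."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    out = bytearray(len(data))
    # Byte 0 of each word takes byte 3, byte 1 takes byte 2, and so on
    out[0::4] = data[3::4]
    out[1::4] = data[2::4]
    out[2::4] = data[1::4]
    out[3::4] = data[0::4]
    return bytes(out)
```

Applying it twice gives back the original bytes, which is a quick way to check that nothing else was altered.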

As seen above, the address map is given by test_map(): one map for the dumped state’s full range, and some other default maps copied from the Saturn driver:

void test_state::test_map(address_map &map)
{
    map(0x00000000, 0x07ffffff).ram().bankr("cartbank").share("maincpu_share");
    map(0x20000000, 0x27ffffff).ram();
    map(0x40000000, 0x46ffffff).nopw(); // associative purge page
    map(0x60000000, 0x600003ff).nopw(); // cache address array
    map(0xc0000000, 0xc0000fff).ram(); // cache data array, Dragon Ball Z sprites relies on this
}

We don’t care about user input:

static INPUT_PORTS_START( test_input )
INPUT_PORTS_END

Neither do we load any BIOS or other pre-defined ROMs:

ROM_START( saturn_test )
ROM_END

Doesn’t look like much? That’s because we’ll encapsulate all our scripting logic in the CPU device. It’s listed with the other SH devices:

diff --git a/scripts/src/cpu.lua b/scripts/src/cpu.lua
index e40c49cc94f..46be249ee37 100644
--- a/scripts/src/cpu.lua
+++ b/scripts/src/cpu.lua
@@ -879,6 +879,8 @@ if CPUS["SH"] then
                MAME_DIR .. "src/devices/cpu/sh/sh4dmac.cpp",
                MAME_DIR .. "src/devices/cpu/sh/sh4dmac.h",
                MAME_DIR .. "src/devices/cpu/sh/sh4regs.h",
+               MAME_DIR .. "src/devices/cpu/sh/test_cpu.cpp",
+               MAME_DIR .. "src/devices/cpu/sh/test_cpu.h",
        }
 end

Instruction hook

While memory hooks can be set on memory maps, unfortunately running a hook on each executed instruction isn’t directly exposed. We can however modify device_execute_interface::debugger_instruction_hook() to allow our CPU device to override it with whatever logic we want:

diff --git a/src/emu/diexec.h b/src/emu/diexec.h
index 5d3fb93ed59..9bf664e9b5f 100644
--- a/src/emu/diexec.h
+++ b/src/emu/diexec.h
@@ -226,7 +226,7 @@ protected:

        // debugger hooks
        bool debugger_enabled() const { return bool(device().machine().debug_flags & DEBUG_FLAG_ENABLED); }
-       void debugger_instruction_hook(offs_t curpc)
+       virtual void debugger_instruction_hook(offs_t curpc)
        {
                if (device().machine().debug_flags & DEBUG_FLAG_CALL_HOOK)
                        device().debug()->instruction_hook(curpc);

This hook seems to be called by all CPU devices on their execute_run() method, in our case, for sh2_device:

void sh2_device::execute_run()
{
    if ( m_isdrc )
    {
        execute_run_drc();
        return;
    }
    // ...
    do
    {
        debugger_instruction_hook(m_sh2_state->pc);

        const uint16_t opcode = m_decrypted_program->read_word(m_sh2_state->pc >= 0x40000000 ? m_sh2_state->pc : m_sh2_state->pc & SH12_AM);

        if (m_sh2_state->m_delay)
        {
            m_sh2_state->pc = m_sh2_state->m_delay;
            m_sh2_state->m_delay = 0;
        }
        else
            m_sh2_state->pc += 2;

        execute_one(opcode);

        if(m_test_irq && !m_sh2_state->m_delay)
        {
            CHECK_PENDING_IRQ("mame_sh2_execute");
            m_test_irq = 0;
        }
        m_sh2_state->icount--;
    } while( m_sh2_state->icount > 0 );
}

Other options that could be considered from the above snippet:

  • execute_run_drc() would execute instructions under the dynamic recompiler, so they wouldn’t pass through the interpreter. It didn’t seem very straightforward to me in terms of changes required, so our driver disables it;
  • execute_one() is not virtual, and also specific to sh_common_execution, so not as interesting as a general hook;

Initializing state

As mentioned above, we start by disabling the dynamic recompiler:

DEFINE_DEVICE_TYPE(TEST_CPU,  test_cpu_device,  "testcpu",  "TestCPU")

test_cpu_device::test_cpu_device(const machine_config &mconfig, const char *tag, device_t *owner, uint32_t clock)
: sh2_device(mconfig, TEST_CPU, tag, owner, clock, CPU_TYPE_SH2, address_map_constructor(FUNC(sh2_device::sh7604_map), this), 32)
{
    // FIXME: Pass as config option
    set_force_no_drc(true);
    m_isdrc = allow_drc();
}

Now for the registers set at the decompression function’s entry. We can see how they are represented in ./cpu/sh/sh.h:

enum
{
    SH4_PC = 1, SH_SR, SH4_PR, SH4_GBR, SH4_VBR, SH4_DBR, SH4_MACH, SH4_MACL,
    SH4_R0, SH4_R1, SH4_R2, SH4_R3, SH4_R4, SH4_R5, SH4_R6, SH4_R7,
    SH4_R8, SH4_R9, SH4_R10, SH4_R11, SH4_R12, SH4_R13, SH4_R14, SH4_R15, SH4_EA, SH4_SP
};

void sh_common_execution::device_start()
{
    // ...
    state_add(SH4_PC, "PC", m_sh2_state->pc).formatstr("%08X").callimport();
    state_add(SH_SR, "SR", m_sh2_state->sr).formatstr("%08X").callimport();
    state_add(SH4_PR, "PR", m_sh2_state->pr).formatstr("%08X");
    // ...
};

So when our device resets, we can modify that state, ensuring that the first argument (passed in register r4) contains the chunk / tile index we want to decompress:

void test_cpu_device::state_reset(u64 test_i) {
    osd_printf_verbose("test_i=0x%08X\n", test_i);
    set_state_int(SH4_R0,   0x00000000);
    set_state_int(SH4_R1,   0x00000008);
    set_state_int(SH4_R2,   0x00000000);
    set_state_int(SH4_R3,   0x06098CA8);
    set_state_int(SH4_R4,   test_i);
    set_state_int(SH4_R5,   0x06001EDC);
    set_state_int(SH4_R6,   0x00000000);
    set_state_int(SH4_R7,   0x00000000);
    set_state_int(SH4_R8,   0x0000002B);
    set_state_int(SH4_R9,   0x00000020);
    set_state_int(SH4_R10,  0x0000030B);
    set_state_int(SH4_R11,  0x060A6100);
    set_state_int(SH4_R12,  0x060A1630);
    set_state_int(SH4_R13,  0x0000030B);
    set_state_int(SH4_R14,  0x0609BD58);
    set_state_int(SH4_R15,  0x06001EB0);
    set_state_int(SH_SR,    0x00000101);
    set_state_int(SH4_GBR,  0x00000000);
    set_state_int(SH4_VBR,  0x00000000);
    set_state_int(SH4_DBR,  0x00000000);
    set_state_int(SH4_MACH, 0x00000000);
    set_state_int(SH4_MACL, 0x000007D0);
    set_state_int(SH4_PR,   0x0603DDD8);
    set_pc(0x0603d46e);
}

void test_cpu_device::device_reset() {
    sh2_device::device_reset();
    state_reset(0);
}

The state will also be reset by our instruction hook when:

  • We run chunk_dcx() up to the VDP1 handler call, but it’s not the handler we want;
  • Or the decompression finishes;

In either case, we move to the next table entry. Our stop condition is hardcoded to a maximum of 0x800 entries.

void test_cpu_device::debugger_instruction_hook(offs_t curpc)
{
    //osd_printf_verbose("PC: 0x%08X\n", pc());
    if (pc() == 0x0603d506) { // jsr r2 (jump to VDP1 handler)
        if (state_int(SH4_R2) != 0x060516ba) { // not the handler we want
            test_i++;
            state_reset(test_i);
        }
    }
    else if (pc() == 0x060516ba) { // mov.l r14,@-r15=>local_4 (chunk_dcx_to_vdp1() entry)
        vdp1_dst  = state_int(SH4_R4);
        chunk_src = state_int(SH4_R5);
        dcx_size  = state_int(SH4_R6);
        osd_printf_verbose("0x%08X 0x%08X 0x%08X\n", vdp1_dst, chunk_src, dcx_size);
    }
    else if (pc() == 0x0603d51c) { // rts (chunk_dcx() end)
        // FIXME: How to load as 32bit BE?
        // FIXME: Assuming sizes are multiple of 4...
        memory_share *shr = machine().root_device().memshare("maincpu_share");
        u8 *p = reinterpret_cast<uint8_t *>(shr->ptr());

        // Dump decompressed chunk
        std::stringstream ss;
        ss << "0x" << std::hex << chunk_src;
        std::ofstream f("chunks/" + ss.str(), std::ios::binary | std::ios::out);
        u8 m[4];
        for (int i = 0; i < dcx_size; i += 4) {
            // Destination is cache address, convert to mirrored address
            u8 p3 = p[vdp1_dst - 0x20000000 + i + 3];
            u8 p2 = p[vdp1_dst - 0x20000000 + i + 2];
            u8 p1 = p[vdp1_dst - 0x20000000 + i + 1];
            u8 p0 = p[vdp1_dst - 0x20000000 + i + 0];
            m[0] = p3;
            m[1] = p2;
            m[2] = p1;
            m[3] = p0;
            f.write(reinterpret_cast<const char*>(m), 4);
        }
        f.close();

        if (test_i < 0x800) {
            test_i++;
            state_reset(test_i);
        } else {
            exit(123);
        }
    }
    device_execute_interface::debugger_instruction_hook(curpc);
}

It lives!

After compiling the driver, we prepare the dumped state to contain a given .dat file’s contents at 0x225560 and its corresponding table entries at w_hdr_meta. A shell script glues everything together.

That’s it for the driver. You can also find it on a branch. Now, let’s check the decompressed tiles…

Where are you?

Unfortunately, we don’t find any Sonic statue on these tiles, just the boring statue:

Well, at least on the US and EU versions, which have the exact same checksums on .dat files. However, the Japan version did have differences. Sure, why not give it a try:

Interesting, the boring statue was replaced! We can confirm it by looking up a cheat code to unlock the statue. The Japan version’s code can be deduced by looking up byte signatures at the modified addresses in the US version, since those bytes are the same across versions:

  • Enable Rewards = \x00\x00\x00\x00\x05\x00\x00\x05\xff\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x1a\x00\x00\x00\x00\x00\x00\x08\x99
  • Build statue = \x00\x00\x00\x00\x00\x00\x00\x1c\xff\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x6e\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x06\x00\x00\x04\xbc\x00\x00\x08\xd1

We end up with these GameShark codes:

f6000914 c305
b6002800 0000
1606f0ee 0001
1606fa7e 00ff

Here’s how it looks ingame:

Parsing .ani

There’s another set of graphics related to tiles under directory ./anim/, which are pre-rendered animations played on the query window for a selected tile. They have their own file format, which is simple enough to deduce from looking at the contents (for implementation details, see the parsing script):

Yes, they are stored uncompressed. Yes, all tile animations take up around 146M.

The reason may be linked to another property: each palette and frame is aligned to the next offset that is a multiple of 0x800 = 2048, padded with null bytes, as shown above in black. This happens to be the size of the user data area in a “Yellow Book Mode 1” sector, the same format used by the data track of a Saturn CD.
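
The padding rule boils down to rounding offsets up to the sector data size. Here is a tiny Python sketch with made-up names, which says nothing about how the original tooling actually did it:

```python
SECTOR_DATA_SIZE = 0x800  # 2048 bytes of user data per Mode 1 sector

def align_up(offset: int, alignment: int = SECTOR_DATA_SIZE) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def padding_after(length: int) -> int:
    """Number of null bytes needed after a block of the given length."""
    return align_up(length) - length
```

With this, a parser can skip from the end of one palette or frame straight to the start of the next one.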

If we do the same trick of placing a memory write breakpoint on the VDP1 address containing a given frame, we don’t actually find that frame loaded anywhere else in memory. So, are animations transferred directly from sectors? Maybe decompressing frames on-the-fly would have been too slow.

Let’s confirm it by looking up the animation for the “Suspension Bridge”, which is the first one stored in the file with a few duplicates:

In Kronos, CD block functions are logged with macro CDLOG. For some reason debug logging wasn’t being enabled during build, so I just did it in the source:

diff --git a/yabause/src/sys/memory/src/cs2.c b/yabause/src/sys/memory/src/cs2.c
index 34c494e6..25c0fc17 100644
--- a/yabause/src/sys/memory/src/cs2.c
+++ b/yabause/src/sys/memory/src/cs2.c
@@ -69,6 +69,7 @@
 #define CDB_PLAYTYPE_FILE       0x02

 // #define CDLOG YuiMsg
+#define CDLOG printf

 extern void resetSyncVideo(void);

With the query window active, we continuously get output for sector reads, here’s a snippet:

cs2     : Command: getSectorNumber 0bc4 5100 0000 0200 0000
cs2     : ret: 0bc5 0400 0000 0000 00c6
cs2     : Command: calculateActualSize 0b84 5200 0000 0200 0001
Cs2Area->calcsize = 1024
cs2     : Command: getActualSize
cs2     : ret: 0bc5 0400 0400 0000 0000
cs2     : Command: getSectorData 0b44 6100 0000 0200 0001
cs2     : ret: 0bc7 0400 4101 0100 ac9d
cs2     : Command: endDataTransfer 0bc4 0600 0000 0000 0000
cs2     : ret: 0bc5 0400 0400 0000 0000
cs2     : Command: deleteSectorData 0b44 6200 0000 0200 0001
Free Block
cs2     : ret: 0bc5 0400 4101 0100 ac9d
cs2     : Command: getSectorNumber 0bc4 5100 0000 0200 0000
cs2     : ret: 0bc5 0400 0000 0000 00c5
Effective Read ac9d
partition number = 2 blocks = 198 blockfreespace = 2 fad = ac9e playpartition->size = 63000 isbufferfull = 0 IRQMAsk 0
Effective Read ac9e
partition number = 2 blocks = 199 blockfreespace = 1 fad = ac9f playpartition->size = 63800 isbufferfull = 0 IRQMAsk 0
Effective Read ac9f
partition number = 2 blocks = 200 blockfreespace = 0 fad = aca0 playpartition->size = 64000 isbufferfull = 1 IRQMAsk 0
BUFFER IS FULL

Ok, so how do we correlate this to a file in the CD image’s data track? Well, first let’s see how these values are computed. One of the output lines comes from ./yabause/src/sys/memory/src/cs2.c:

void Cs2Exec(u32 timing) {
    // ...
    case CDB_STAT_PLAY:
    {
       partition_struct * playpartition;
       CDLOG("Effective Read %x \n", Cs2Area->FAD);
       int ret = Cs2ReadFilteredSector(Cs2Area->FAD, &playpartition);
       // ...
    }
    // ...
}

The “Frame Address” (FAD) is initialized for the data track on ./yabause/src/utils/src/cdbase.c:

static int LoadISO(FILE *iso_file)
{
   // ...
   disc.session[0].fad_start = 150;
   // ...
   fseek(iso_file, 0, SEEK_END);
   track->file_size = ftell(iso_file);
   // ...
   if (0 == (track->file_size % 2048))
      track->sector_size = 2048;
   else if (0 == (track->file_size % 2352))
      track->sector_size = 2352;
   // ...
   disc.session[0].fad_end = track->fad_end =
       disc.session[0].fad_start + (track->file_size / track->sector_size);
   // ...
}

In “Disc Format Standards Specification Sheet” (ST-040-R4-051795), we see how FAD relates to the different areas of a CD image:

We also get the expression to calculate the “Logical Sector Number” (LSN), which is LSN = FAD - 150. This is where our data starts, so if we pick one of the logged FAD values, such as 0xac40, we have (0xac40 - 150) * 2352 = 0x62929e0. Note that we multiply with 2352 since our CD image is in BIN/CUE format, so it doesn’t contain just the 2048 bytes per sector of an ISO image.
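
The same arithmetic as a small Python sketch, assuming the .bin file starts with the data track and uses raw 2352-byte Mode 1 sectors (12 bytes of sync and 4 bytes of header before the user data). The function names are made up:

```python
FAD_LSN_DELTA = 150      # LSN = FAD - 150
RAW_SECTOR_SIZE = 2352   # bytes per sector in a BIN/CUE image
SYNC_SIZE = 12
HEADER_SIZE = 4

def fad_to_sector_offset(fad: int, sector_size: int = RAW_SECTOR_SIZE) -> int:
    """File offset where the raw sector for this FAD begins."""
    return (fad - FAD_LSN_DELTA) * sector_size

def fad_to_data_offset(fad: int) -> int:
    """File offset of the sector's user data, past sync and header."""
    return fad_to_sector_offset(fad) + SYNC_SIZE + HEADER_SIZE

print(hex(fad_to_sector_offset(0xac40)))  # 0x62929e0
```

For a plain ISO image, the same helper works with `sector_size=2048` and no sync or header to skip.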

Let’s check that offset:

So far so good, we can skip the first 12 bytes of the “Sector Synchronization” and the 4 bytes of the “Header”, and then we get some repeated values that are found on .ani files. Let’s scroll down to pick something more unique to use as a byte signature:

Searching for that signature with binwalk -R '\xB7\xCB\x35\xCB\xC8\x96\x63\xAE\x35\x29\x35\x35\x35\xC6\xAF\xBB\x36\x74\x74\x74' ./anim/anim2050.ani outputs these offsets:

  • 0xC80
  • 0x3D480
  • 0x79C80
  • 0xB6480
  • 0xF2C80
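
If binwalk isn’t at hand, the same search takes a few lines of Python. This is a sketch with a made-up function name. The signature is the one used above, and the commented-out path is the file searched above:

```python
SIGNATURE = bytes.fromhex(
    "B7CB35CBC89663AE3529353535C6AFBB36747474"
)

def find_all(haystack: bytes, needle: bytes) -> list:
    """Return every offset at which needle occurs, including overlaps."""
    offsets = []
    pos = haystack.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = haystack.find(needle, pos + 1)
    return offsets

# Hypothetical usage:
# data = open("anim/anim2050.ani", "rb").read()
# print([hex(o) for o in find_all(data, SIGNATURE)])
```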

We can pick the last offset and view it in Binxelview, confirming that the stored frame matches the one transferred to the query window:

Animation placeholder

Once again, no animation for a Sonic statue in the US or EU versions, but we don’t leave empty-handed: in between traffic animations there’s this placeholder:

This time we can just follow a reference to string ANIM%04d.ANI, ending up on function 0x0604ce70 (named loads_ani):

int loads_ani(short map_x,short map_y)
{
    // ...
    map_4x = map_x * 4;
    map_alti_val = *(ushort *)(map_y * 2 + *(int *)(PTR_map_altitude_0604cfd0 + map_4x)) & 0x1f;
    map_1y = (int)map_y;
    ani_val_pre1 = (short)*(char *)(map_1y + *(int *)(PTR_map_tile_0604cfd4 + map_4x)) & 0xff;
    map_zone_val = (short)*(char *)(map_1y + *(int *)(PTR_map_zone_0604cfd8 + map_4x)) & 0xf;
    // ...
    if (ani_val_pre1 == 0) {
      ani_val = *(ushort *)(PTR_entity_idx_0604d0d0 + (short)((short)*(char *)(*(int *)(map_entity + map_x * 4) + map_1y) & 0xff) * 2);
    }
    else {
      ani_val = ani_val_pre1;
    }
    // ...
    ani_idx = (*(code *)PTR_parse_ani_idx_0604d170)((int)(short)ani_val);
    if (ani_idx == -1) {
      while (uVar6 = uVar6 & 0xff, uVar6 != 0) {
        (*(code *)PTR_FUN_0604d174)();
        map_1y = (*(code *)PTR_FUN_0604d178)(0,auStack120);
        if (map_1y == 9) {
          uVar6 = 0;
        }
      }
    }
    else {
      // ...
      (*(code *)PTR_vfprintf_0604d258)(ani_fmtstr);
      // ...
    }
    // ...
}

By placing a breakpoint at the first instruction, we confirm it’s always hit whenever we open the query window. The 2 arguments are the tile’s map coordinates. There are a few tables containing tile data, but for now, our focus is on map_tile. For most cases, ani_val_pre1 != 0, so it’s passed as argument to function 0x0604e290 (named parse_ani_idx):

int parse_ani_idx(short param_1)
{
  int iVar1;
  if ((param_1 < 0x6a) || (ani_idx_0xff < param_1)) {
    if (param_1 == 0xd) {
      iVar1 = (int)ani_penul_idx;
    }
    else if (param_1 == ani_idx_0x17c) {
      iVar1 = (int)ani_last_idx;
    }
    else if ((param_1 < 0x51) || (0x5b < param_1)) {
      iVar1 = -1;
    }
    else {
      iVar1 = (int)(short)(param_1 + -0x51);
    }
  }
  else {
    iVar1 = (int)(short)(param_1 + -0x51);
  }
  return iVar1;
}

For example, on map “Happyland”, we’ll look for a railroad, since its animation comes right before the placeholder. The expected tile value ani_val_pre1 is 10 + 0x51 = 0x5b, given by:

ani_val_pre1 = (short)*(char *)(map_1y 
    + *(int *)(PTR_map_tile_0604cfd4 + map_4x)) 
    & 0xff;

At address 0x0607b5c0 we find table map_tile. It’s a pointer table, where each entry corresponds to an x-coordinate, and its referenced list contains all y-coordinate values. A railroad tile can be found at coordinates (0x56,0x0d). Let’s follow some references:

0x0607b5c0 + (0x56 * 4) = 0x0607b718 => 0x060ccec0 # map_tile[map_x * 4]
0x060ccec0 + 0x0d = 0x060ccecd                     # map_tile[map_x * 4][map_y]
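
The same two-step lookup can be written as a small helper. This is a sketch under stated assumptions: read_u32 and fake_memory are hypothetical stand-ins for whatever memory access the emulator or a memory dump provides, and only the single pointer quoted above is populated.

```python
import struct

MAP_TILE = 0x0607B5C0  # base of the pointer table, from the text


def tile_address(read_u32, table: int, x: int, y: int) -> int:
    """Return the address of the byte for (x, y) in a column pointer table."""
    column = read_u32(table + x * 4)  # one 32-bit pointer per x-coordinate
    return column + y                 # one byte per y-coordinate


# Hypothetical stand-in for emulator memory: only the pointer quoted above.
# SH-2 is big-endian, hence ">I".
fake_memory = {0x0607B718: struct.pack(">I", 0x060CCEC0)}


def read_u32(addr: int) -> int:
    """Read a big-endian 32-bit value from the stand-in memory."""
    return struct.unpack(">I", fake_memory[addr])[0]


addr = tile_address(read_u32, MAP_TILE, 0x56, 0x0D)
print(hex(addr))  # 0x60ccecd
```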

Indeed, there are some 0x5b values:

060ccec0: 0000 0000 0000 0000 0033 2c3e 315b 5b5a  .........3,>1[[Z
060cced0: 5b5b 2f3c 2c35 0000 0000 0000 0000 0000  [[/<,5..........

If we replace them with 5f 5e 5d 5c:

Probably they considered having animations for more traffic tiles at some point, but didn’t make them, so they added the check (0x5b < param_1) to return -1 instead of a valid animation index.

Let’s consider replacing the last branch to is_invalid with a no-op instruction:

                     LAB_0604e2c0
                     XREF[1]:     0604e2b8(j)
0604e2c0 65 4f           exts.w     r4,r5
0604e2c2 e2 51           mov        #0x51,r2
0604e2c4 35 23           cmp/ge     r2,r5
0604e2c6 8b 06           bf         is_invalid
0604e2c8 e2 5b           mov        #0x5b,r2
0604e2ca 35 27           cmp/gt     r2,r5
0604e2cc 89 03           bt         is_invalid
0604e2ce 65 4f           exts.w     r4,r5
0604e2d0 75 af           add        -0x51,r5
0604e2d2 a0 01           bra        LAB_0604e2d8
0604e2d4 65 5f           _exts.w    r5,r5
                     is_invalid
                     XREF[2]:     0604e2c6(j), 0604e2cc(j)
0604e2d6 e5 ff           mov        #-0x1,r5
                     LAB_0604e2d8
                     XREF[4]:     0604e2a2(j), 0604e2ae(j),
                                  0604e2bc(j), 0604e2d2(j)
0604e2d8 00 0b           rts
0604e2da 60 53           _mov       r5,r0
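
As a sanity check on the listing, both conditional branches can be decoded by hand: SH-2 bt (0x89) and bf (0x8b) carry a signed 8-bit displacement, and the target is the instruction address + 4 + displacement * 2. The sketch below decodes them and swaps the second one for a nop (00 09). The helper names and the idea of patching a local byte buffer are illustrative assumptions; in practice the write would go through the emulator's memory.

```python
NOP = bytes.fromhex("0009")  # SH-2 nop opcode


def cond_branch_target(pc: int, opcode: bytes) -> int:
    """Decode the target of an SH-2 bt (0x89) or bf (0x8b) instruction."""
    if opcode[0] not in (0x89, 0x8B):
        raise ValueError("not a bt/bf instruction")
    disp = opcode[1] - 0x100 if opcode[1] & 0x80 else opcode[1]  # signed 8-bit
    return pc + 4 + disp * 2


def patch(image: bytearray, base: int, addr: int, data: bytes) -> None:
    """Overwrite bytes at a memory address in an image loaded at `base`."""
    off = addr - base
    image[off:off + len(data)] = data


# Bytes 0x0604e2c0..0x0604e2db copied from the listing.
BASE = 0x0604E2C0
code = bytearray.fromhex(
    "654f e251 3523 8b06 e25b 3527 8903"
    "654f 75af a001 655f e5ff 000b 6053"
)

bf_target = cond_branch_target(0x0604E2C6, bytes(code[6:8]))
bt_target = cond_branch_target(0x0604E2CC, bytes(code[12:14]))
print(hex(bf_target), hex(bt_target))  # 0x604e2d6 0x604e2d6 (is_invalid)

# Drop the upper-bound check: values above 0x5b now fall through to add -0x51.
patch(code, BASE, 0x0604E2CC, NOP)
```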

By applying patch 0x0604e2cc = 00 09, then querying one of the highway entrances, we load the placeholder animation:

Tile glitches

Keeping up with the theme of my previous writeup, here’s a fun oversight: apparently some tiles were corrupted during compression, e.g. the sail boat is mixed with the Loch Ness monster:

This isn’t observed ingame since the sail boat isn’t rendered when we zoom out. Probably they couldn’t be bothered with a proper fix…

However, it’s possible to still load these zoomed out tiles by reusing the same trick we used for the animation, although we’ll modify table map_entity, because when the tile value is 0 (i.e. water), ani_val is derived from an entry in table entity_idx:

int loads_ani(short map_x,short map_y)
{
    // ...
    if (ani_val_pre1 == 0) {
      ani_val = *(ushort *)(PTR_entity_idx_0604d0d0
              + (short)((short)*(char *)
                  (*(int *)(map_entity + map_x * 4) + map_1y) & 0xff)
              * 2);
    }
    // ...
}

                     entity_idx
                     XREF[16]:    loads_ani:0604d0aa(R), ...
0606d970 01 00           dw         100h
0606d972 01 01           dw         101h
0606d974 01 02           dw         102h
; ...
0606db06 01 7c           dw         17Ch

We want the value in map_entity to be (0x0606db06 - 0x0606d970) // 2 = 0xcb, so that ani_val = 0x17c, since ani_last_idx = 0xb0 = 176 is the animation index assigned for the sail boat:

int parse_ani_idx(short param_1)
{
    // ...
    else if (param_1 == ani_idx_0x17c) {
      iVar1 = (int)ani_last_idx;
    }
    // ...
}
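
The index arithmetic and the water-tile path can be modelled in a few lines. This is a sketch only: the entity_idx dictionary holds just the entries visible in the listing (the rest is elided there and is not reconstructed), and the function names are made up for illustration.

```python
ENTITY_IDX_BASE = 0x0606D970  # address of the first entity_idx entry

# Only the entries visible in the listing; everything else is elided there.
entity_idx = {0x00: 0x100, 0x01: 0x101, 0x02: 0x102, 0xCB: 0x17C}


def entry_index(entry_addr: int) -> int:
    """Index of a 16-bit entity_idx entry, given its address."""
    return (entry_addr - ENTITY_IDX_BASE) // 2


def select_ani_val(tile: int, entity: int) -> int:
    """Water tiles (0) take ani_val from entity_idx; others use the tile value."""
    return entity_idx[entity & 0xFF] if tile == 0 else tile


idx = entry_index(0x0606DB06)
print(hex(idx))                     # 0xcb
print(hex(select_ani_val(0, idx)))  # 0x17c
```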

For example, on map “Bahamas”, for the tile at coordinates (0x6f,0x1f), we follow these references:

0x0607b9c0 + 0x6f * 4 = 0x607bb7c => 0x060d5b60
0x060d5b60 + 0x1f = 0x60d5b7f

Then apply patch 0x60d5b7f = cb. This adds the sail boat graphics, but the entity itself isn’t actually initialized: the boat doesn’t move around, the query window’s title still says “Fresh Water”, but crucially, the zoom out check isn’t applied, so we get to see those green lines from the Loch Ness monster:

Conclusion

At this point I decided to throw in the towel, since we no longer have any hints to follow. Sure, there are some interesting code paths related to cheats that are worth checking (e.g. ingame button actions are handled at 0x060102fa, which calls 0x06010f80 to check the “City Ordinances” options bitmap at 0x0607af30 that activates the slot machine cheat), but our chances are slim:

  • If there’s an easter egg for the Sonic statue in US and EU versions, it was not considered for the Japan version, and they decided to not include the data along with all the other tiles + animations;
  • If we take a byte signature for the statue’s tile (e.g. in y2050ini.dat the offset is 0x2a0e2e - 0x225560 = 0x7b8ce) and search in the CD image’s data track, we get 3 results (1 for each tile set) on the Japan version, none on the other versions;
  • In y*upg.dat we find a subset of tiles from y*ini.dat, so nothing new there;
  • Other files in ./anim/ and ./bitmaps/ don’t have relevant differences;