Decrushing DOS Doodads
Reading about unused content in video games led me to look into one that I played many years ago. We will explore the file formats it used for storing graphics and text, starting with black-box approaches, then tackling the trickier parts by disassembling the game’s executable.
How it started
Mission Critical was a sci-fi adventure game for MS-DOS. The box cover featured this pre-release screenshot, which was also available in a slideshow demo:
One of the more striking differences is in two items: there’s a welding torch and a spray that were redesigned in the final game.
I always wondered: could these and other unused graphics still be in the game?
Identifying the right files
There are multiple files with unfamiliar extensions, such as .PIC, .FNT, .Q. Probably .PIC stands for “picture”, but which one would contain the items? Tools such as file or binwalk do not recognize these formats, and in most cases there are no apparent magic byte sequences to aid us.
We can start by checking which files are opened by the game at a given time. Filesystem operations under MS-DOS are similar to what we would find in Windows or Linux. Instead of using procmon or strace to trace these calls, we use a debugger. A straightforward way to run MS-DOS executables is to emulate them under DOSBox, which comes with its own debugger. By default, it logs open file operations. After starting the game and loading a save file:
608656: FILES:file open command 0 file C:\DOS\MISSION\MISSION.EXE
9428268: FILES:Special file open command 10 file AUTORUN.LOC
9454641: FILES:file open command 0 file LEGEND.INI
9552144: FILES:file open command 0 file MDI.INI
9578906: FILES:file open command 0 file SBPRO2.MDI
9588505: FILES:file open command 0 file SBPRO2.MDI
9597085: FILES:file open command 0 file SBPRO2.MDI
9648105: FILES:file open command 0 file SAMPLE.OPL
9972960: FILES:file open command 0 file SF.XMI
10134070: FILES:file open command 0 file DIG.INI
10160307: FILES:file open command 0 file SB16.DIG
10170579: FILES:file open command 0 file SB16.DIG
10179039: FILES:file open command 0 file SB16.DIG
10225938: FILES:file open command 0 file D:\MISSION\MC002.VOC
15273849: FILES:file open command 0 file object.dat
15287650: FILES:file open command 0 file MCSTR.DAT
15318919: FILES:file open command 0 file MC001.FNT
15554514: FILES:file open command 0 file C:\DOS\MISSION\MC001.PIC
19935397: FILES:file open command 1 file C:\DOS\MISSION\RESTART.DAT
38381383: FILES:file open command 0 file C:\DOS\MISSION\MC000.SAV
38404344: FILES:file open command 0 file C:\DOS\MISSION\MC001.SAV
38429677: FILES:file open command 0 file C:\DOS\MISSION\MC002.SAV
38452868: FILES:file open command 0 file C:\DOS\MISSION\MC003.SAV
38476938: FILES:file open command 0 file C:\DOS\MISSION\MC005.SAV
99316233: FILES:file open command 0 file C:\DOS\MISSION\MC003.SAV
99482009: FILES:file open command 0 file D:\MISSION\MC002.VOC
126593126: FILES:file open command 0 file MC010.FNT
126746260: FILES:file open command 0 file C:\DOS\MISSION\MC001.PIC
126769437: FILES:file open command 0 file MC002.FNT
130058967: FILES:file open command 0 file MC003.RGN
130265260: FILES:file open command 0 file C:\DOS\MISSION\MC003.PIC
145612958: FILES:file open command 0 file MC001.FNT
Let’s rule out sound files (due to their magic bytes and extensions): .DIG, .MDI, .OPL; config files (plaintext): .LOC, .INI; and save files (they match how many saved games I had): .SAV.
There’s an elevator in-game that takes you to several floors. It helps to see what gets loaded when you arrive at each one:
- Floor 2: MC002.PIC, MC003.PIC, MC001.PIC;
- Floor 3: MC002.PIC, MC004.PIC, MC001.PIC;
- Floor 4: MC002.PIC, MC005.PIC, MC001.PIC.
Some common .PIC files are opened on all of these floors. Now we have some candidates to inspect.
Visualizing byte clusters
Faced with unknown file formats, if they are:
- archives, strings could give us matches for filename entries, with small offsets between them. Otherwise, we could try identifying increasing values in metadata: offsets of file blocks;
- bitmaps, we can observe in a hex dump sequences of 1 byte (monochrome, each pixel encoded per bit), 3 bytes (RGB), or 4 bytes (RGBA), which would have the same or close values for regions of an image that are colored the same or with gradients. We should have as many of these sequences as there are pixels in the image (i.e. width * height pixels);
- compressed / encrypted, there is a high entropy of byte values, as these algorithms shouldn’t generate sequences of bytes with the same values.
These patterns can be visualized with binvis.io, an online tool that not only colors distinct byte ranges, but also arranges them in a Hilbert space-filling curve, which, by preserving locality, makes clusters evident.
MC001.PIC:
MC003.PIC:
The following clusters can be observed in order:
- Sparse values at the beginning (most likely metadata);
- Padding (null bytes);
- Groups of:
- Low valued blocks;
- Padding;
- High entropy blocks.
In the hex dump of MC001.PIC, we can see 3-byte sequences in the low valued block. It’s much smaller than the compressed block, so these aren’t bitmaps. Maybe it’s a palette? We can confirm that by setting a max value for blue on each sequence (diff against the original file shown with dhex):
Which does result in a blue-tinted dialog:
Parsing .PIC
We can tell that these files are archives, since most of them contain multiple palettes followed by compressed chunks, so the metadata should contain offsets to data chunks.
However, we don’t know if they are absolute offsets or relative to the metadata entries. There’s also endianness to account for. To reduce the guessing involved, we can handle the aforementioned cases with some shell scripting. I took the first palette offsets in MC003.PIC (they all seem to start with 00 00 00 03 00 00 00, but I wasn’t sure if the first bytes were padding, so I picked offsets for 03 00 00 00): 0x1403, 0x1282b, 0x2d573, 0x415e9, and ran:
for i in 1403 01282b 02d573 0415e9; do
# iterate offset range
for j in $(seq $((0x$i-3)) $((0x$i+3))); do
# zero-pad odd sized hex values
j=$(printf '%X\n' "$j" | sed 's/^\(.\(..\)*\)$/0\1/g')
# iterate endianness
for k in "$j" "$(printf '%s' "$j" | tac -rs ..)"; do
binwalk -R "$(printf '%s' "$k" | sed 's/\(..\)/\\x\1/g')" MC003.PIC
done
done
done | sort -V | awk '/0x[0-9A-Fa-f]+/{printf "%8s %s\n", $2, $5}'
Which returned these matches (filtering out those that extended beyond the metadata, i.e. after 0x1400):
0x0 (\x00\x14)
0x1 (\x14\x00)
0x28 (\x28\x28\x01)
0x50 (\x70\xD5\x02)
0x8D (\x14\x04)
0xB4 (\xE6\x15\x04)
0x2BE (\x14\x00)
0x6B1 (\x00\x14)
0x6B2 (\x14\x00)
0x997 (\x01\x14)
0x998 (\x14\x01)
We can tell the first match starts at 0x0, since the second offset at 0x28 doesn’t have the additional off-by-one match at 0x29. Let’s also rule out matches for the same pattern if they already appeared before:
0x0 (\x00\x14)
0x28 (\x28\x28\x01)
0x50 (\x70\xD5\x02)
0xB4 (\xE6\x15\x04)
All offsets were found! It seems they do start with 00 00 00 03 (due to matching 3 bytes before the given offsets), and are encoded in little-endian.
After highlighting these offsets (magenta) and some common patterns (blue) in a hex dump:
The entries are somewhat scattered. However, we can assume they have a constant length of 0x14, since strictly increasing offsets can still be observed in the first 4 bytes of each entry. This means multiple compressed blocks can use the same palette entry.
Decompressing graphics
By taking the differences between the offsets we discovered, we know each compressed block’s length (if a palette is expected, we discount 0x304 bytes, since next_offset = start_offset + 0x304 + compressed_size, where compressed_size is the value at entry[0x4:0x8]).
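In Python terms, that size derivation might look like this (a sketch; which blocks carry a palette is taken as an input flag, and the names are mine):

```python
PALETTE_SIZE = 0x304  # discounted when a palette precedes the block

def compressed_sizes(offsets, palette_flags):
    """Derive each block's compressed size from consecutive offsets:
    next_offset = start_offset + (0x304 if palette else 0) + compressed_size."""
    sizes = []
    for (start, has_palette), next_start in zip(
            zip(offsets, palette_flags), offsets[1:]):
        size = next_start - start
        if has_palette:
            size -= PALETTE_SIZE
        sizes.append(size)
    return sizes
```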
But what was used to generate these blocks? Running strings on MISSION.EXE gives us a hint:
Crusher! Data Compression Toolkit Version 3.0
Luckily, I came across a compatible interface for this library. After extending it with some additional functions, I wrote a bare-bones CLI to easily compress and decompress in this format.
The decompression API takes both the compressed and decompressed sizes. To identify the latter, we can use the same approach as before: search for metadata offsets that contain either width * height values or separate values for width and height. Let’s take note of some image sizes from an in-game screenshot:
Going back to the hex dump above, highlighted in blue there are little-endian values 0x280 and 0x120, which happen to match the width and height of the background image. With all variables identified, we can decompress a chunk, and here’s part of the hex dump of the first one in MC003.PIC:
00000000: 4848 4848 4848 4848 4848 4848 4a63 b244 HHHHHHHHHHHHJc.D
00000010: 7444 b263 446a 4c4a 554a 4c4a 4c4a 4c6b tD.cDjLJUJLJLJLk
00000020: 4a4c 6b4a 6b4a 6b4a 6b4a 6b06 6b6b 066b JLkJkJkJkJk.kk.k
00000030: 6b06 536b 5306 5353 5353 7663 53f2 534c k.SkS.SSSSvcS.SL
Repeated bytes suggest the decompression worked. Using gimp, we can open this file as Raw Image Data, and get a recognizable background image:
Since we have palettes, these decompressed bytes must be indexed palette values. Converting them directly results in an image that is too dark, because the palette contains VGA colors, which are encoded in only 6 bits. After converting them to 8 bits, the values get larger, resulting in a brighter, accurate image:
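A common way to scale 6-bit VGA DAC values to 8 bits is to shift left and replicate the top bits, so that 63 maps to 255 rather than 252 (whether the game uses exactly this variant is my assumption):

```python
def vga6_to_rgb8(value: int) -> int:
    """Scale a 6-bit VGA palette component (0-63) to 8 bits (0-255),
    replicating the top bits into the bottom to cover the full range."""
    return (value << 2) | (value >> 4)

print([vga6_to_rgb8(v) for v in (0x00, 0x15, 0x2A, 0x3F)])  # [0, 85, 170, 255]
```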
Unused items
So, the big question: are the early icons present? Sadly not, however…
This item never appears in-game! Which reminds me of this scene:
Fits like a glove; however, we need to figure out how to create a masked image.
Masking
Testing with the first item (we already know how it should look), one of the blend modes that somewhat fit was difference. While shadows looked fine, some parts should have been added, such as the reflection on the surface, which is lighter in the expected image. So either signed addition or xor could work, and the latter was the best fit. Still, some colors were incorrect (like the bottom red detail):
It seemed odd that xor would work fine with greyscale values but not colors… unless we should xor palette indexes. It made sense from a computational point of view: why waste resources converting the indexes to colors for both pictures and then applying xor to a larger number of bytes, when you could just xor the indexes and then convert the result? Indeed that was the case, and after parsing all the needed masks and positions (the latter were contained in the 0x14-sized metadata entries), we get an accurate recreation of the cabinet with the third unused item:
After wasting a lot of time looking at these graphics, I noticed an unintended glitch in the cabinet surface:
This was introduced by the mask of the middle item, which for some reason includes lines for the surface, causing a misalignment in the dithering when applied:
Parsing .FNT
Satisfied with this discovery, I moved on to other file formats. The fonts were straightforward, since they were encoded as bitmaps:
Note that each line is encoded in 2 bytes (2 * 8 bits). We can get the metadata size (from 0x0 to the start of the exclamation point, so hex(2 * 8 * 69 // 8) = 0x8a) and the data size for each character (hex(2 * 8 * 11 // 8) = 0x16). The metadata includes kerning data encoded as width values for each character (they matched the number of characters and were always in range [0x0..0x10]).
To better illustrate this structure, here’s a highlighted hex dump, with the following colors applied by field:
- magenta: metadata header
- brown: character widths
- green: 1st character
- cyan: 2nd character
Putting all this together allows us to render these characters with kerning applied:
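As a sketch, decoding one glyph under this layout could look like the following (the 0x8a and 0x16 constants come from the calculations above; the bit order and the '#'/'.' rendering are my choices):

```python
def render_glyph(font: bytes, index: int, width: int) -> list[str]:
    """Decode one glyph: 11 rows of 2 bytes each (16 pixels, MSB first),
    truncated to the character's kerning width."""
    METADATA_SIZE, GLYPH_SIZE, ROWS = 0x8A, 0x16, 11
    offset = METADATA_SIZE + index * GLYPH_SIZE
    rows = []
    for r in range(ROWS):
        bits = int.from_bytes(font[offset + r*2:offset + r*2 + 2], "big")
        rows.append("".join("#" if bits & (0x8000 >> c) else "."
                            for c in range(width)))
    return rows
```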
Parsing MCSTR.DAT
I was much more interested in figuring out how text was stored, to see if there was some description for the unused item.
Clusters:
- Sparse values at the beginning (again, metadata);
- Some mixture of low and high (sometimes 0xff) values;
- ASCII valued block (plaintext words);
- High entropy blocks.
Here’s where I got stuck. Some of the metadata entries seemed to contain sizes and offsets to compressed blocks, but I just couldn’t get decompression to work. The parsing process needed to be closely inspected, so it was time to reverse the executable.
Disassembly
Due to the particular executable format at hand (a LE variant of linear executable, extended with DOS/4GW), we need to follow these steps:
- Unbind and discard the extender with SB /U MISSION.EXE, retrieving the original executable;
- Disassemble with IDA 4.1, then import the database with IDA 5.0.
To identify the right subroutines to analyze, let’s go back to the debugger.
We want to break whenever an open file operation is called. In MS-DOS, instead of syscalls, we directly call interrupts, such as OPEN EXISTING FILE. Nevertheless, they follow the same architecture calling conventions: registers are used to pass arguments and store results.
The above reference can be looked up to get the interrupt number and AH register value to set a breakpoint: BPINT 21 3D. Now we continue and refresh the data view with Alt-X (according to the interrupt reference, DS:DX contains the address of the filename) until we break on the interrupt for MCSTR.DAT:
---(Register Overview )---
EAX=00003D00 ESI=001DA873 DS=0188 ES=0188 FS=0000 GS=0020 SS=0188 Pr32
EBX=00000000 EDI=FFFFFFFF CS=0180 EIP=00255B28 C0 Z1 S0 O0 A0 P1 D0 I1 T0
ECX=002969E6 EBP=002B69E4 IOPL0 CPL0
EDX=001DA873 ESP=002B6858 14981947
---(Data Overview Scroll: page up/down)---
0188:001DA873 4D 43 53 54 52 2E 44 41 54 00 00 00 00 48 BB 28 MCSTR.DAT....H.(
[...]
---(Code Overview Scroll: up/down )---
0180:255B28 CD21 int 21
DOSBox doesn’t implement any call stack view, so we have to continue until the next RET instruction, then step into the next caller’s instruction, repeating this until we arrive at the subroutine that loaded the filename address.
At this point we want to match the addresses in the debugger with the ones in our disassembly. Supposedly you can directly convert from DOSBox offsets to file offsets, but I ended up just taking the hex values of a few instructions until I got a single match in the file:
0180:255B28 CD21 int 21
0180:255B2A D1D0 rcl eax,1
0180:255B2C D1C8 ror eax,1
0180:255B2E 89442404 mov [esp+0004],eax
0180:255B32 85C0 test eax,eax
0180:255B34 7C07 jl 00255B3D ($+7)
binwalk -R "$(printf cd21d1d0d1c88944240485c07c07 | sed 's/\(..\)/\\x\1/g')" MISSION.LE.EXE
# 0xB38D8
Then we take that file offset, check the first offset in the disassembly (0x10000), along with the corresponding file offset reported by IDA (0x31db0), to arrive at the target IDA offset 1:
hex(0xb38d8 - 0x31db0 + 0x10000) = 0x91b28
MISSION.EXE was originally compiled from C/C++, not written in assembly, since we can find the string:
WATCOM C/C++32 Run-Time system.
So we expect to see symbols for the usual libc functions that wrap these filesystem interrupts (we can also look up which ones exactly in the library reference).
By following cross-references (xrefs), we get this subroutine hierarchy:
sopen_ (0x91af6) < __doopen_ (0x9ae79) < _fsopen_ (0x9af41) < fopen_ (0x9af5c) < sub_911e0 (0x911e0)
sub_911e0 has a large number of xrefs, and is already user code, not library code. Seems like a good candidate for a subroutine that would be called to load different files at several points. Our current caller (sub_144b4) loads the offset for the string MCSTR.DAT (dword_16873), along with an error message if the open failed (return code = 0):
000144C4 mov eax, offset dword_16873
[...]
000144C9 call sub_911E0
000144CE mov dword_E34B8, eax
000144D3 test eax, eax
000144D5 jnz short loc_144EE
000144D7 push offset dword_16873
000144DC push offset aCanTOpenFileS ; "Can't open file %s"
00016873 dword_16873 dd 5453434Dh, 41442E52h, 54h ; DATA XREF: sub_144B4+10o
[hex(ord(x)) for x in 'MCSTR.DAT']
# ['0x4d', '0x43', '0x53', '0x54', '0x52', '0x2e', '0x44', '0x41', '0x54']
Let’s give this subroutine a name (wrap_open_mcstr) and check the next calls with Graph view:
Seems like they do file reads, due to the error message that is loaded afterwards. Another call takes the result of multiplying ebx and edx:
And does getchar() if that result is 1, otherwise read():
Going back to the caller, we can infer ecx is the file pointer (fp_mcstr_dat, its value comes after the standard streams + the number of previously open files), ebx is the number of times to read edx sized bytes, and eax contains the address where the read bytes are stored (num_entries) 2. Afterwards, 6 bytes are read in a loop, as many times as the previously read value, and stored in an array (entries), while accumulating sizes read from those 6 bytes (sum_entry_head):
The next instructions do similar parsing of sections in the file, allocating pointer tables to hold their data. Eventually we reach a point where an offset for the start of the compressed blocks is stored (start_cx_block; note that ftell() returns the current position of the file pointer, which comes after the previous sections were all parsed). This offset is confirmed in the debugger to be 0x1e91, matching the start of the first high entropy block:
Memory allocations are attempted with a value of 0xc00, and if that fails, successively smaller amounts are tried (maybe_c00), until one succeeds (actual_mem_c00) or allocation fails due to not enough free memory. This means our decompressed block size has an upper bound of 0xc00.
To sum it up, wrap_open_mcstr parses the following sections:
- Section 0, [0x0..0x2]: number of entries (n = 0x24)
- Section 1, [0x2..0x2 + n * 0x6 = 0xda]: entry descriptions
  - 2 bytes: number of blocks in entry
  - 4 bytes: total size of blocks
- Section 2, [0xda..0x16a2]: block sizes
  - array of 2 bytes per value
- Section 3, [0x16a2..0x1a08]: lookup table
  - 2 bytes: number of lookup values
  - array of 2 bytes per value
- Section 4, [0x1a08..0x1b0a]: plaintext word indexes
  - 2 bytes: number of words
  - array of 2 bytes per value (first = 0x0)
- Section 4 (cont.), [0x1b0a..0x1e91]: plaintext words
  - 2 bytes: total size of words (n = 0x387)
  - 0x387 bytes: words
- Section 5, [0x1e91..]: compressed block data
  - array of variable bytes per block
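Assuming that layout, the first sections can be parsed with struct (a sketch; the names are mine, and I’m assuming the per-entry block counts sum to the number of Section 2 values):

```python
import struct

def parse_mcstr_header(data: bytes):
    """Parse Sections 0-2 of MCSTR.DAT; all values are little-endian."""
    pos = 0
    num_entries, = struct.unpack_from("<H", data, pos); pos += 2
    entries = []  # Section 1: (number of blocks, total size of blocks)
    for _ in range(num_entries):
        entries.append(struct.unpack_from("<HI", data, pos)); pos += 6
    total_blocks = sum(n for n, _ in entries)
    # Section 2: one 2-byte size per block
    block_sizes = list(struct.unpack_from(f"<{total_blocks}H", data, pos))
    pos += total_blocks * 2
    return entries, block_sizes, pos
```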
Highlighted hex dump, with the following colors applied per section:
- magenta: element counts
- brown: sizes / indexes
- green: 1st value in section
- cyan: 2nd value in section
The value at 0x1a0a tells us that the first plaintext word starts at byte index 0 of the plaintext word table (not included above). The first compressed block spans [0x1e91..0x1ee6] (size 0x55, read from 0xda), while the second compressed block spans [0x1ee6..0x1ef2] (size 0x0c, read from 0xdc).
Although we don’t know the purpose of the lookup table, the compressed blocks have been identified.
While testing my Crusher CLI, I noticed that the library is pretty tolerant to unexpected values:
- If trailing data is added to a block, decompression still works fine;
- If a decompression size smaller than the original file size is provided, this results in the decompressed output being truncated, but still matching the original bytes;
- If a decompression size larger than the original file size is provided, this results in null bytes being appended to the decompressed output.
Therefore, even if we don’t know the exact decompression sizes, we can still try to decompress these blocks… except it still doesn’t work. To be fair, some block sizes were suspiciously small (e.g. 0xc), and the minimal compressed block I could generate in my tests was larger than that. Could it be… another compression algorithm? Only one way to be sure: reversing the decompression subroutine, wherever it is.
Luckily, there were only 2 xrefs for start_cx_block: the function we saw before and another one, which also loaded offsets to data structures used to hold the previously parsed sections.
Basically, this subroutine takes an entry index and a block index as input, and traverses the previous pointer tables to get the corresponding size and arrive at the right offset (if the target block index at si doesn’t match start_cx_block, it moves forward as many block sizes read from entries_sums as needed):
Later on, the allocated space at actual_mem_c00 is passed as argument eax to a subroutine (wrap_s3_read), along with the address of the first lookup table value (mem_s3e1) as ecx, the file pointer at the position of the target compressed block offset as ebx, and the compressed size to read as edx:
Whenever a saved game is restored, a label for the current room you are in is displayed. By placing a breakpoint before the call to wrap_s3_read, then loading a saved game, we can see that after the subroutine is called, the allocated space now contains the decompressed label (“Communications Center.”), with the total number of decompressed characters returned in eax:
---(Register Overview )---
EAX=00000017 ESI=000700D3 DS=0188 ES=0188 FS=0000 GS=0020 SS=0188 Pr32
EBX=0009FFFF EDI=000700D3 CS=0180 EIP=001D8CB7 C0 Z0 S0 O0 A1 P1 D0 I1 T0
ECX=00000009 EBP=00FD95B0 IOPL0 CPL0
EDX=00000005 ESP=002B68F0 350756628
---(Data Overview Scroll: page up/down)---
0180:00FD95B0 43 6F 6D 6D 75 6E 69 63 61 74 69 6F 6E 73 20 43 Communications C
0180:00FD95C0 65 6E 74 65 72 2E 00 B0 EA B0 B0 EA EA EA E0 EE enter...........
0180:00FD95D0 B0 EA B0 EE B0 EA B0 B0 EA B0 B0 B0 EA B0 EE B0 ................
0180:00FD95E0 B0 B0 B0 C1 B0 B0 B6 C1 B0 B6 C1 B6 F2 B6 F2 B6 ................
0180:00FD95F0 C0 C8 F2 43 C8 43 C8 43 43 B8 43 F7 43 6E 43 F7 ...C.C.CC.C.CnC.
0180:00FD9600 F7 F7 F7 F7 F7 F7 F7 F7 F1 F7 F7 F1 F7 F7 F7 F7 ................
0180:00FD9610 F7 F7 F7 43 F7 43 43 43 43 43 43 43 B8 43 43 B8 ...C.CCCCCCC.CC.
0180:00FD9620 43 C8 C0 43 43 43 C0 43 C0 C8 C0 C0 B6 43 46 F2 C..CCC.C.....CF.
---(Code Overview Scroll: up/down )---
0180:1D8CB2 E879FCFFFF call 001D8930 ($-387)
Clearly this is the subroutine with the decompression logic! Let’s dig deeper…
si is initialized with the compressed size to read, and an unsigned test is done for si > 0 (i.e. do we still have bytes to read):
If so, then some variables whose purpose we don’t yet know are also checked (we’ll skip them for now). The first lookup table value to be considered (s2_current_value) is not a lookup value, but the total number of lookup table entries minus 2 (s2_sizes_minus2; not the best name, but “sizes” was my hunch at the time):
Eventually, we read the first compressed byte (stored at s3_current_byte), and initialize dl with 8, a counter for the following loop (notice the green branch that goes up, back to the variable checks):
If the loop ends because the counter reached 0, we read the next compressed byte and reset the counter.
Finally, we get to see the lookup table (s2_values) taking part in some arithmetic and dereferencing operations. Note how previous lookup values are used to get the next value (s2_current_value). These operations can be simplified as:
loop_counter = 8
s2_current_value = s2_sizes_minus2
while loop_counter > 0:
    first_bit = s3_current_byte & 1
    s3_current_byte >>= 1
    lookup_index = (s2_current_value | first_bit) - s2_sizes
    s2_current_value = s2_values[lookup_index]
    loop_counter -= 1
    if s2_current_value < 0:
        # [...]
Note that the table is accessed from the end, since lookup_index is always negative (considering Python array indexing). If you go back to the highlighted hex dump, you can verify that the lookup table values are signed: some values are positive (e.g. 0x2, 0x4), while others are negative (e.g. 0xffdc, 0xffd9). When the value is negative, we arrive here:
The previous lookup value is negated and decremented (s3_v_neg). Then:
- If the result is < 0x80, it is stored as a decompressed character at edi, incrementing the number of decompressed characters so far (num_dcx_chars);
- Else, it is decremented by 0x80, and some pointer arithmetic is done to access one of the plaintext words (ptr_plaintext). If the word size is > 0, it takes the branch to loc_14A23, where each character of the word is stored at incrementing addresses of edi, also incrementing the number of decompressed characters so far (num_dcx_chars).
When this second loop ends, we go back to the variable checks, which can now be understood:
If there are no more characters to read (si) or the transformed lookup value (s3_v_neg) is 0, we have finished decompressing.
This word-based compression algorithm seems to be based on a dictionary coder: a concordance of the full text was built (i.e. the sorted plaintext words with high frequency), where indexes are most likely represented via Huffman coding as negative values, while less common words and non-words (i.e. punctuation) are encoded character by character as positive values.
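Ignoring the Huffman layer, the literal-versus-word split can be illustrated with a toy decoder (the 0x80 threshold comes from the disassembly above; the stream and concordance here are made up):

```python
def decode_symbols(symbols, words):
    """Values < 0x80 are literal characters; values >= 0x80 index into
    the concordance after subtracting 0x80."""
    return "".join(chr(v) if v < 0x80 else words[v - 0x80] for v in symbols)

words = ["Communications", "Center"]  # toy concordance
print(decode_symbols([0x80, 0x20, 0x81, 0x2E], words))  # Communications Center.
```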
We can verify the concordance by creating our own from the decompressed full text, with some rough cleanup on terminators and whitespaces, then counting the top word frequencies, and taking the corresponding words:
cat * \
| sed 's/[ \t\n]\+/\n/g; s/[^ \t\na-zA-Z0-9_'"'"'-]//g; s/[^[:alpha:]]*$//g' \
| sed 's/^\s*..\?\s*$//g' \
| sort \
| uniq -c \
| sort -n \
| tail -n125 \
| awk '{print $2}'
After sorting and comparing with the original using diff, most entries match.
Sadly, I couldn’t find a matching description for the unused item (perhaps it was in one of the empty blocks; a few entries contain some). However, there are some curious debug messages:
MAV *** REPORT THIS TO MIKE ASAP - THERE SHOULDN'T BE ANY MORE MAVS*** It's the Autodoc System.
MAV *** REPORT THIS TO MIKE ASAP - THERE SHOULDN'T BE ANY MORE MAVS: It's a control servo module.
MAV *** REPORT THIS TO MIKE ASAP - THERE SHOULDN'T BE ANY MORE MAVS: th_oxygen_feeds
MAV *** REPORT THIS TO MIKE ASAP - THERE SHOULDN'T BE ANY MORE MAVS: ct_data_collection_system.
MAV *** REPORT THIS TO MIKE ASAP - THERE SHOULDN'T BE ANY MORE MAVS: th_frequency_Module.
MAV *** REPORT THIS TO MIKE ASAP - THERE SHOULDN'T BE ANY MORE MAVS: th_yellow_sticky.
These are all contained in blocks of the same entry (0xa); maybe they were used to hunt down a bug in a particular scene.
Save game patching
We can actually load these debug messages by modifying values related to items in save game files. First, let’s save the game after these atomic inventory actions:
- Open cabinet in science lab, two items available to take (inventory unchanged);
- Take the first item (added to inventory);
- Take the second item (added to inventory).
Then, we can compare the first two, disregarding the first byte since it’s just the save game filename:
At 0x2b, two bytes are updated, and the first one is updated again after taking the second item:
If we try increasing that byte:
Our inventory holds what appears to be the room’s objects, including the only pickable item, and an item that references one of those debug messages:
All unpickable items use an icon we never get to see during normal gameplay! This functionality makes an interesting debug mode, maybe developers used it to check if all expected objects were loaded, without having to hover around the room with the mouse.
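Bumping that byte can be scripted for experimentation (a sketch; 0x2b comes from the diff above, the path is just an example, and I’m assuming saves carry no checksum):

```python
from pathlib import Path

ITEM_BYTE = 0x2B  # first of the two bytes that changed between saves

def bump_item_byte(path: str, delta: int = 1) -> None:
    """Increase the inventory-related byte at 0x2b in a save file."""
    save = bytearray(Path(path).read_bytes())
    save[ITEM_BYTE] = (save[ITEM_BYTE] + delta) & 0xFF
    Path(path).write_bytes(bytes(save))

# bump_item_byte("MC000.SAV")  # example path
```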
TODO
- Data that links backgrounds with masked images needs to be parsed.
- There are also some file formats that weren’t explored, such as .Q, which contain audio and animations.
Feel free to join the fun.
Credits
- Some previous work was done on a similar game. The accompanying notes helped to get an idea of what metadata to expect when processing .PIC, although that version of the format had significant differences (e.g. no compression was used).
- The same thread that featured a Crusher interface also described a .PIC structure much closer to the one parsed here.
- We could also take the difference between IDA and DOSBox offsets: hex(0x255b28 - 0x91b28) = 0x1c4000, and use that to rebase our disassembly (Edit > Segments > Rebase program...), so that offsets in both apps are calculated from the same base address (0x1c4000). However, IDA 4.1 doesn’t have rebase… is it move? What? Let’s check ida.hlp…
  “Moving a segment means moving its beginning. So, the proper name for this command would be ‘Expand/Shrink a Segment’ (due to historical reasons the name is ‘move segment’).”
  This doesn’t seem equivalent at all. Well, if we try rebasing in IDA 5.0:
  lx.ldw: can't load file (error code 126)
  Any luck just copying over lx.ldw from IDA 4.1?
  Access violation at address 7220656C. Read of address 7220656C.
  So much for that… ↩
- As an alternative to reversing callee subroutines, figuring out where results are stored is a matter of following register changes in the debugger, and setting the Data overview to the same address a register was set to before a call. This way, we can see how many bytes were read and stored at that address, comparing them against the hex dump. ↩