Title blatantly stolen from SUDDEN・DESU’s Twitter thread, which brought attention to a particular file present in a few Japanese Sega Mega CD disc images, but apparently unused.

The filename changes across games, some using the suggestive name “warning”, but the contents are always encoded in the same elusive manner:

It does not contain any M68000 code. It does not contain any Mega Drive VDP-compatible graphics. It does not have any ASCII, Shift JIS or EUC-JP strings. It does not seem to have any common MD compressed chunks (Nemesis, Kosinski, etc.).

The only other “warning” I’m aware of is stated in the documentation, in which games that do not use CD audio must have one CD-DA track indicating to the user that the disc is intended for the game console. But our mystery message file does not seem to have any PCM data.

Let’s give it a try

It starts with random-ish bytes:

00000000: 2028 331e 9108 4306 55fd f381 649f 8fd7   (3...C.U...d...
00000010: 96e1 5877 eb99 2255 c580 3f1f dbb6 e0c8  ..Xw.."U..?.....
00000020: ebe8 8df0 3af9 b87e eb1c 64b5 2bcb 8a6b  ....:..~..d.+..k
00000030: a113 d631 1e28 1463 4055 c582 bddd d4e4  ...1.(.c@U......
00000040: 7df7 207b b29f 6b55 a680 a25d e586 d9df  }. {..kU...]....

To check whether it’s really random, we can use binvis’ entropy mode. A section around the middle shows lower-entropy chunks:
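For readers without binvis at hand, a rough stand-in is to compute the Shannon entropy of each fixed-size block. This is only a sketch: binvis works differently under the hood, and both the function name `block_entropy` and the 0x800 block size are our own choices.

```python
import math
from collections import Counter

def block_entropy(data: bytes, block_size: int = 0x800) -> list[float]:
    """Shannon entropy, in bits per byte, of each block_size chunk of data."""
    result = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        total = len(block)
        counts = Counter(block)
        # 0.0 means a single repeated byte value, 8.0 means a uniform spread.
        result.append(-sum(c / total * math.log2(c / total) for c in counts.values()))
    return result

# Usage idea (the file name is whatever the game calls it):
#   with open("warning", "rb") as f:
#       for i, h in enumerate(block_entropy(f.read())):
#           print(f"{i * 0x800:#010x} {h:.3f}")
```

Blocks that score noticeably lower than their neighbours are the ones worth a closer look in a hex editor.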

The same pattern occurs at the end as well. In fact, it seems the file is composed of two identical halves:

dd if=warning of=warning1 bs=4M iflag=skip_bytes,count_bytes skip=$((0x0)) count=$((0x232800))
dd if=warning of=warning2 bs=4M iflag=skip_bytes,count_bytes skip=$((0x232800)) count=$((0x232800))
diff -u <(xxd warning1) <(xxd warning2)
# no output

By hovering over each of those low-entropy chunks, we can see that these byte patterns are repeated. Searching in a hex editor for one of the sequences shown above (e.g. “K47WV”) finds a match every 0x800 bytes. These sequences only appear at the end of each half of the file. Maybe padding?
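The same observation can be scripted instead of eyeballed. The sketch below groups a buffer into 0x800-byte blocks and reports those that occur more than once. The helper name `repeated_blocks` is hypothetical, and it should be fed only one half of the file, since otherwise every block trivially repeats in the other half.

```python
def repeated_blocks(data: bytes, block_size: int = 0x800) -> dict[bytes, list[int]]:
    """Map each block that occurs more than once to the offsets where it starts."""
    offsets: dict[bytes, list[int]] = {}
    # Only whole blocks are considered; a short tail is ignored.
    for start in range(0, len(data) - block_size + 1, block_size):
        offsets.setdefault(data[start:start + block_size], []).append(start)
    return {block: offs for block, offs in offsets.items() if len(offs) > 1}

# Usage idea:
#   half = open("warning1", "rb").read()
#   for offs in repeated_blocks(half).values():
#       print([hex(o) for o in offs])
```

If the repeats are padding, we would expect a single block value whose offsets form one contiguous run at the end of the half.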

Nevertheless, repetitions ruled out compression being applied over the whole file, and it seemed unlikely to be an archive, since there was no apparent metadata or offsets to chunks.

At this point, we could try finding other files with some of these byte sequences. If it was padding, we could expect it to be common across files of the same format. Searching arbitrary bytes is always a crapshoot, but this time, we hit something in the dark: “K47WV” and its surrounding bytes appeared in a CD dump log of a Sega Saturn game. It seems there were two dumps of the same disc contents in different formats:

<rom name="spike.scm" size="2123856" crc="889ec726" md5="9fb0d9487777505c2bf499b983e33f64" sha1="b67ec2f2aa46fad8ff40ab53309d3f151fba2cbf"/>
<rom name="spike.img" size="2123856" crc="dc477cb0" md5="b49484b6b2fd79fda995480baee0a5d8" sha1="de7b9a82a87b6cb857498cc3a0c13d30c715d54e"/>

Searching for rip "scm" "img" cd leads us to DiscImageCreator, a disc dumper which describes .scm files as the scrambled image of .img files.

Implementing the scrambler

The encoding is neatly explained in CD Cracking Uncovered:

Before writing the data to the disc, sector contents undergo a scrambling operation. Scrambling means that the data is transformed into a pseudo-random sequence that resembles “white noise” in its characteristics.

The scrambler uses a bit-shift register, which is reset for each sector. The generated pseudo-random values are then xor’d with input data to produce scrambled bytes. A description is included in the ECMA-130 specification, but it’s more straightforward to just take the code from the CD Cracking Uncovered book and adapt it for our own script.
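For reference, here is a minimal sketch of that scrambler as ECMA-130 Annex B describes it: a 15-bit register preset to 1, polynomial x^15 + x + 1, least significant bit out first. It is not the book's code, and the names `scramble_keystream` and `scramble_sector` are ours.

```python
def scramble_keystream(length: int = 2340) -> bytes:
    """Pseudo-random bytes that get XORed with bytes 12..2351 of a sector."""
    reg = 0x0001  # 15-bit register, reset to 1 at the start of every sector
    out = bytearray()
    for _ in range(length):
        out.append(reg & 0xFF)
        for _ in range(8):
            feedback = (reg ^ (reg >> 1)) & 1  # taps for x^15 + x + 1
            reg = (reg >> 1) | (feedback << 14)
    return bytes(out)

def scramble_sector(sector: bytes) -> bytes:
    """Scramble (or descramble) a raw 2352-byte sector, leaving the sync alone."""
    if len(sector) != 2352:
        raise ValueError("expected a raw 2352-byte sector")
    key = scramble_keystream(2340)
    return sector[:12] + bytes(b ^ k for b, k in zip(sector[12:], key))

print(scramble_keystream(8).hex(" "))  # 01 80 00 60 00 28 00 1e
```

Because the operation is a plain XOR with a fixed sequence, the same function both scrambles and descrambles.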

Oh, there’s just one tiny but essential detail to consider in our implementation: given our assumed padding byte sequences, we need to apply scrambling in 0x800 chunks, otherwise we would produce different values after the first 0x800 bytes. However…

Scrambling is applied to all fields of a sector, except for the 12-byte sync group at its start. […] In total, this operation will produce 2,340 bytes of data.

ECMA-130 says the same. So why is this file scrambled in 2048-byte chunks instead? That matches the user data size of a typical Yellow Book Mode 1 sector, but would a CD drive recognize it correctly? Hold that thought.

Descrambling the mystery

To confirm if we really have padding, we can assume null bytes and generate a file full of padding:

dd if=/dev/zero of=0s bs=4M iflag=count_bytes count=$((0x2000))

Indeed, running our script on this file produces 0x800 chunks that are identical to the ones in the warning file.
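A sketch of what such a script could look like follows. One detail is our own assumption rather than something stated above: each 0x800 chunk is XORed with the scrambler sequence starting 4 bytes in, which is where user data sits relative to the end of the 12-byte sync. The first 16 bytes of the dump at least agree with that choice, since they come out as small 16-bit little-endian values. `descramble_file` and the constants are hypothetical names.

```python
def scramble_keystream(length: int) -> bytes:
    """ECMA-130 scrambler sequence: 15-bit LFSR, x^15 + x + 1, preset to 1."""
    reg = 0x0001
    out = bytearray()
    for _ in range(length):
        out.append(reg & 0xFF)
        for _ in range(8):
            feedback = (reg ^ (reg >> 1)) & 1
            reg = (reg >> 1) | (feedback << 14)
    return bytes(out)

CHUNK = 0x800      # user data bytes per Mode 1 sector
HEADER_SKIP = 4    # assumption: skip the part of the sequence covering the header

def descramble_file(data: bytes) -> bytes:
    """XOR every 0x800 chunk with the same slice of the scrambler sequence."""
    key = scramble_keystream(HEADER_SKIP + CHUNK)[HEADER_SKIP:]
    return bytes(b ^ key[i % CHUNK] for i, b in enumerate(data))

first16 = bytes.fromhex("2028331e9108430655fdf381649f8fd7")  # start of the dump
print(descramble_file(first16).hex(" "))
# 20 00 33 00 11 00 23 00 fd ff 0d 00 e4 ff ef ff
```

Feeding it the all-zero file simply returns the key slice repeated once per chunk, which is what the padding comparison relies on.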

Running the script on the warning file produces a lower-entropy file. We can import it as raw data in Audacity as 2-channel, 44100 Hz, signed 16-bit PCM and get an audible message!
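As an alternative to the raw import, the descrambled bytes can be wrapped in a WAV header with Python's standard `wave` module. This is a sketch assuming little-endian samples (the byte order WAV uses); `pcm_to_wav` is a name we made up.

```python
import wave

def pcm_to_wav(pcm: bytes, path: str, channels: int = 2, rate: int = 44100) -> None:
    """Write raw signed 16-bit little-endian PCM to a WAV file."""
    frame_size = channels * 2                  # 2 bytes per sample
    usable = len(pcm) - len(pcm) % frame_size  # drop any incomplete trailing frame
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(pcm[:usable])

# Usage idea, with a hypothetical file holding the descrambled output:
#   pcm_to_wav(open("warning.descrambled", "rb").read(), "warning.wav")
```

Any player can then open the result without having to guess the sample format.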

About that elephant in the room

Ok, but why was this stored scrambled at all?

It seems like there’s a very similar situation with the CD-i Warning Message:

All CD-i discs are required by the Green Book specification to have a warning message encoded at the beginning of the data track. This warning message is necessary for older audio CD players that would mistakenly attempt to play the CD-i track on the disc, resulting in possible damage to the audio system.

Although we aren’t dealing with Green Book discs, there’s audio stored on a data track.

Furthermore:

All data, with the exception of audio (CD-DA), on a CD is scrambled.

When recorders write in track-at-once mode (almost all do today), they want to scramble the data as it is written to the CD. It is up to the recording software to make sure the data passed to the recorder is unscrambled.

It’s my understanding that this audio would be stored scrambled on the data track, but since it goes through the scrambling process again when recorded, it ends up stored as unscrambled, as expected by audio CD players. Scrambling is an XOR with a fixed sequence, so applying it twice cancels out.
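The cancellation can be checked with a toy sector. In this sketch the audio is stand-in data, the EDC/ECC area is zeroed for brevity, and `recorder_scramble` is our own name for the scrambling pass over everything after the sync.

```python
def scramble_keystream(length: int = 2340) -> bytes:
    """ECMA-130 scrambler sequence: 15-bit LFSR, x^15 + x + 1, preset to 1."""
    reg = 0x0001
    out = bytearray()
    for _ in range(length):
        out.append(reg & 0xFF)
        for _ in range(8):
            feedback = (reg ^ (reg >> 1)) & 1
            reg = (reg >> 1) | (feedback << 14)
    return bytes(out)

def recorder_scramble(sector: bytes) -> bytes:
    """The scrambling pass applied on writing: XOR everything after the sync."""
    key = scramble_keystream()
    return sector[:12] + bytes(b ^ k for b, k in zip(sector[12:], key))

SYNC = bytes.fromhex("00ffffffffffffffffffff00")
header = bytes.fromhex("00022401")
pcm = bytes(range(256)) * 8  # stand-in for 0x800 bytes of audio

# What the file on the data track would hold: audio already XORed with the
# part of the sequence that lines up with the user data area.
stored = bytes(b ^ k for b, k in zip(pcm, scramble_keystream()[4:]))

sector = SYNC + header + stored + bytes(0x120)  # EDC/ECC left as zeros here
on_disc = recorder_scramble(sector)
assert on_disc[16:16 + 0x800] == pcm  # the second pass undoes the first
```

Only the user data comes out as plain audio; the header and parity bytes of each sector stay scrambled on the disc.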

Let’s check if it all adds up with an example. Here’s where the warning file starts in the data track of Jangou World Cup, right after the 16 bytes for sync + header:

0000dc80: 00ff ffff ffff ffff ffff ff00 0002 2401  ..............$.
0000dc90: 2028 331e 9108 4306 55fd f381 649f 8fd7   (3...C.U...d...
0000dca0: 96e1 5877 eb99 2255 c580 3f1f dbb6 e0c8  ..Xw.."U..?.....
0000dcb0: ebe8 8df0 3af9 b87e eb1c 64b5 2bcb 8a6b  ....:..~..d.+..k
0000dcc0: a113 d631 1e28 1463 4055 c582 bddd d4e4  ...1.(.c@U......

Seems pretty close to the beginning of the track, since it’s preceded by the Directories table. The next sync happens after the expected 2352 bytes of the sector, and is preceded by 0x120 bytes dedicated to error detection and correction. These Yellow Book Mode 1 fields fall right at the 0x800 boundary in the warning file:

; bytes @ 0x7f0 in warnmsg.bin
0000e480: 004a 9377 8ee7 24cb c555 0afc ff42 1f32  .J.w..$..U...B.2
; EDC + Intermediate + ECC (P-Parity & Q-Parity)
0000e490: 9606 0036 0000 0000 0000 0000 c837 7f49  ...6.........7.I
0000e4a0: 18d1 e2a7 b137 f3ac 6df1 c611 6081 976b  .....7..m...`..k
0000e4b0: 8b3f bb07 edff 8315 1c5e 090a 7c60 ad47  .?.......^..|`.G
0000e4c0: 4cdb c30d 7524 74f0 a5b3 c951 3016 75af  L...u$t....Q0.u.
0000e4d0: e6a0 e1a9 d351 5142 5d73 04d1 82fb 6a46  .....QQB]s....jF
0000e4e0: ad14 75ae 694a 4636 a5d7 18e8 0ae6 551c  ..u.iJF6......U.
0000e4f0: 32ef 7b20 4386 51d7 8e22 57af 8ebe 29bf  2.{ C.Q.."W...).
0000e500: 3722 6ad3 c952 83ea c169 72a4 5468 8df4  7"j..R...ir.Th..
0000e510: 3230 b832 5fc2 53c5 8282 d77b 6e71 a0d4  20.2_.S....{nq..
0000e520: 2bce 49f3 32e0 1d60 5681 458a bc24 f4ea  +.I.2..`V.E..$..
0000e530: af06 8b69 ccd0 90ce f789 6ca4 e127 aa18  ...i......l..'..
0000e540: 13da 04c5 2ff1 0e20 d27c f999 00d1 2766  ..../.. .|....'f
0000e550: a7c7 f70b 2f62 dd78 db31 1307 4561 f3c0  ..../b.x.1..Ea..
0000e560: 8a75 5cbd 4653 29a1 2fe9 107c f9b1 3b6b  .u\.FS)./..|..;k
0000e570: 9d82 b444 069b 201c f983 62a4 fd9f 7209  ...D.. ...b...r.
0000e580: a29c 496d 2e90 ec3a 1276 e577 73f0 82a1  ..Im...:.v.ws...
0000e590: 24d7 8f0f e18c 68c8 0939 2305 2912 e917  $.....h..9#.)...
0000e5a0: 1316 67de 9b20 de2e 232a a917 8bd6 5815  ..g.. ..#*....X.
; next sync + header
0000e5b0: 00ff ffff ffff ffff ffff ff00 0002 2501  ..............%.
; bytes @ 0x800 in warnmsg.bin
0000e5c0: a22a d01c 160a 9004 df00 0483 cc62 892a  .*...........b.*

So it seems that it should run fine in audio CD players, at least concerning the Yellow Book Mode 1 format. The fact that our script had to descramble in 0x800 chunks is just an artifact of how the file needs to be stored so that it ends up correctly spread across the actual sectors.
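The arithmetic behind that layout can be written down as a small sketch. The field sizes are the Mode 1 ones discussed above, `FIRST_SECTOR` is the image offset of the sync seen in the dump, and the function names are ours.

```python
SYNC, HEADER, USER_DATA, EDC_ECC = 12, 4, 0x800, 0x120
SECTOR = SYNC + HEADER + USER_DATA + EDC_ECC  # 2352 bytes

FIRST_SECTOR = 0xDC80  # image offset of the sector where the warning file starts

def image_offset(file_offset: int) -> int:
    """Where a byte of the warning file lands in the raw disc image."""
    sector_index, within = divmod(file_offset, USER_DATA)
    return FIRST_SECTOR + sector_index * SECTOR + SYNC + HEADER + within

def keystream_position(file_offset: int) -> int:
    """Index into the 2340-byte scrambler sequence that covers this byte."""
    return HEADER + file_offset % USER_DATA

print(hex(image_offset(0x7F0)))  # 0xe480
print(hex(image_offset(0x800)))  # 0xe5c0
```

Both results match the annotated offsets in the dump, and `keystream_position` wraps around every 0x800 bytes of the file, which is why the descrambling has to restart at that interval.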