Reverse-engineering a proprietary sound sample format: A detective story

Andrew Bulhak

http://dev.null.org/acb/

The Problem

• You make electronic music with softsynth plugins
• Your drum machine plugin uses a proprietary format for its samples
• You want to move your samples to an open format, like AIFF

Why?

• To use your samples (or presets) with other software
• To use samples with hardware devices
• Because open formats are always better than proprietary ones
• For the challenge

The Drum Plugin

Linplug RMIV

The Drum Plugin (2)

Linplug RMIV

• A VST/AudioUnit plugin that works with sequencer software
• Can play both sample-based and synthesised sounds
• Stores samples in a custom format named .D4T
• Can import sounds in AIFF and WAV formats, converting them to .D4T

Examining the D4T format

Looking at the same sounds in both WAV and D4T format:

% ls -l
-rw-r--r-- 1 acb staff 20586 26 May 2000 606bd.wav
-rw-r--r-- 1 acb staff 40904 11 Apr 14:06 606bd.D4T
-rw-r--r-- 1 acb staff 15182 26 May 2000 606ch.wav
-rw-r--r-- 1 acb staff 30096 11 Apr 14:06 606ch.D4T
-rw-r--r-- 1 acb staff 33900 26 May 2000 606ht.wav
-rw-r--r-- 1 acb staff 67536 11 Apr 14:06 606ht.D4T
-rw-r--r-- 1 acb staff 31426 26 May 2000 606lt.wav
-rw-r--r-- 1 acb staff 62588 11 Apr 14:06 606lt.D4T

Do you notice a pattern?

The D4T Format (2)

• Each D4T file is roughly twice the size of its corresponding WAV
• The WAVs use 16-bit samples
• Therefore, D4T uses 32-bit samples
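The "roughly twice" observation can be checked with a few lines of Python. This is only a sketch: the sizes are copied from the ls -l listing above, and the names `sizes` and `ratios` are illustrative.

```python
# File sizes in bytes, copied from the ls -l listing above: (WAV, D4T).
sizes = {
    "606bd": (20586, 40904),
    "606ch": (15182, 30096),
    "606ht": (33900, 67536),
    "606lt": (31426, 62588),
}

# Ratio of D4T size to WAV size for each sound.
ratios = {name: float(d4t) / wav for name, (wav, d4t) in sizes.items()}

for name in sorted(ratios):
    print("%s: D4T/WAV = %.3f" % (name, ratios[name]))
```

Every ratio lands just under 2, which is what you would expect if the D4T spends twice as many bytes per sample and the two formats have differently sized headers.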

Hypothesis: D4T uses 32-bit float samples, between -1.0 and 1.0 (as that makes more sense than 32-bit ints).

Testing the Hypothesis

Using hexdump(1), examine the first nonzero samples of a WAV and a D4T. Then see if they're equivalent.

I.e., if W is the WAV sample and D the D4T one, check whether W = int(D*0x8000).

Testing the Hypothesis (2)

% hexdump -C 606bd.wav |head
00000000 52 49 46 46 62 50 00 00 57 41 56 45 66 6d 74 20 |RIFFbP..WAVEfmt |
00000010 10 00 00 00 01 00 01 00 44 ac 00 00 88 58 01 00 |........D?...X..|
00000020 02 00 10 00 64 61 74 61 d0 4f 00 00 77 03 96 0b |....data?O..w...|

% hexdump 606bd.D4T|head
0000000 00 00 00 00 00 00 00 00 00 04 08 40 01 04 29 00
0000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000020 00 00 00 00 00 00 00 00 00 c0 dd 3c 00 60 b9 3d

Thus our first samples are 0x0377 (the little-endian bytes 77 03 right after the WAV's data chunk header) and the float represented by 00 c0 dd 3c.

Testing the Hypothesis (3)

We can use Python's struct module to unpack binary floats.

>>> import struct
>>> struct.unpack('<f', b'\x00\xc0\xdd\x3c')
(0.027069091796875,)

Aha, a float between -1.0 and 1.0.

>>> int(struct.unpack('<f', b'\x00\xc0\xdd\x3c')[0] * 0x8000)
887
>>> '%x' % 887
'377'

Eureka!
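The same check can be run on more than one sample. The sketch below uses the first two sample values visible in the hexdumps above (77 03 and 96 0b in the WAV, 00 c0 dd 3c and 00 60 b9 3d in the D4T). The helper names are illustrative, and it assumes both files store their samples little-endian.

```python
import struct

# (WAV bytes, D4T bytes) for the first two samples shown in the hexdumps.
pairs = [
    (b"\x77\x03", b"\x00\xc0\xdd\x3c"),
    (b"\x96\x0b", b"\x00\x60\xb9\x3d"),
]

def wav_sample(raw):
    """Decode one signed 16-bit little-endian WAV sample."""
    return struct.unpack("<h", raw)[0]

def d4t_sample(raw):
    """Decode one 32-bit little-endian float, as hypothesised for D4T."""
    return struct.unpack("<f", raw)[0]

# Scale each float by 0x8000 and compare it with the 16-bit integer.
results = [(wav_sample(w), int(d4t_sample(d) * 0x8000)) for w, d in pairs]

for w, d in results:
    print(w, d, w == d)
```

Both pairs agree, which makes a coincidence on the first sample unlikely.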

A First Attempt

We can now write a simple Python script for converting sound files. Our script will:

• Open a .D4T file
• Skip 40 bytes
• Read the remainder of the file, treating it as floats
• Write all that to an AIFF file (using Python's aifc module).
  • You could just as easily write WAV
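The steps above might look something like the following. This is a sketch, not the author's script: it writes WAV through the standard wave module rather than AIFF, because aifc was removed from the standard library in Python 3.13. The function names and the clamping of out-of-range values are assumptions made here.

```python
import struct
import wave

HEADER_SIZE = 40      # bytes skipped before the sample data
ASSUMED_RATE = 44100  # the naive script does not read the real rate

def floats_to_pcm16(data):
    """Convert packed little-endian 32-bit floats to 16-bit PCM bytes."""
    count = len(data) // 4
    floats = struct.unpack("<%df" % count, data[:count * 4])
    # Scale to the 16-bit range; clamp so that 1.0 does not overflow.
    ints = [max(-32768, min(32767, int(f * 0x8000))) for f in floats]
    return struct.pack("<%dh" % count, *ints)

def convert_naive(d4t_path, wav_path):
    """Skip the header and write everything else as mono 16-bit audio."""
    with open(d4t_path, "rb") as f:
        f.seek(HEADER_SIZE)
        data = f.read()
    with wave.open(wav_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(ASSUMED_RATE)
        out.writeframes(floats_to_pcm16(data))
```

Usage would be along the lines of convert_naive("606bd.D4T", "606bd.wav").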

A First Attempt (2)

Our script has a few limitations:

• It assumes a default sample rate (44100, in this case)
• It assumes samples have one channel
• If a D4T has two channels, the resulting AIFF will be mono, with one channel's samples played after the other's.

As such, we need to extract more information from our D4T files.

The D4T Header

The secrets must be encoded in the nonzero bytes at the top of the file.


But what do they mean?

Number of Channels

We examine two D4T files, one mono and one stereo:

% hexdump BgBeatSnare.D4T|head -2
0000000 00 00 00 00 00 00 00 00 00 1d 2c 00 02 04 29 00
0000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Aha! We've found our channel count: the byte at offset 12 is 02 here, where the mono 606bd.D4T had 01.
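One way to spot such a field is to diff two headers byte by byte. The sketch below compares the first 16 bytes of 606bd.D4T (mono, taken from the earlier hexdump) with those of BgBeatSnare.D4T. The names are illustrative, and treating offset 12 as the channel count is the hypothesis being tested.

```python
# First 16 header bytes, copied from the hexdumps in this talk.
mono_header = bytes.fromhex("00 00 00 00 00 00 00 00 00 04 08 40 01 04 29 00")
stereo_header = bytes.fromhex("00 00 00 00 00 00 00 00 00 1d 2c 00 02 04 29 00")

CHANNEL_OFFSET = 12  # hypothesised position of the channel count

# Offsets at which the two headers differ.
diff = [i for i in range(16) if mono_header[i] != stereo_header[i]]

print(diff)
print(mono_header[CHANNEL_OFFSET], stereo_header[CHANNEL_OFFSET])
```

The differing offsets are 9 to 12. Offset 12 holds 1 and 2 respectively, which leaves offsets 9 to 11 as the remaining unexplained field.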

File Size

The group of bytes immediately before the channel count is proportional to the file size, though read as a plain binary integer it doesn't translate into a sensible number of either bytes or samples.

% ls -l
-rw-r--r-- 1 acb wheel 40904 11 Apr 14:06 606bd.D4T
-rwxr-xr-x 1 acb wheel 697976 8 Jul 00:42 808Kick.D4T
-rwxr-xr-x 1 acb wheel 8152 8 Jul 00:42 GenericSynBass.D4T
% hexdump 606bd.D4T | head -1
0000000 00 00 00 00 00 00 00 00 00 04 08 40 01 04 29 00
% hexdump 808Kick.D4T | head -1
0000000 00 00 00 00 00 00 00 00 00 45 4f 24 02 04 29 00
% hexdump GenericSynBass.D4T | head -1
0000000 00 00 00 00 00 00 00 00 00 00 51 0c 01 01 63 4d

File Size (2)

We create a few AIFFs of specific sizes, get RMIV to convert them into .D4Ts, and examine those:

% ls -l
-rw-r--r-- 1 acb admin 132 8 Jul 14:18 test_23x1.D4T
-rw-r--r-- 1 acb admin 136 8 Jul 14:18 test_24x1.D4T
-rw-r--r-- 1 acb admin 140 8 Jul 14:18 test_25x1.D4T
% hexdump test_23x1.D4T |head -1
0000000 00 00 00 00 00 00 00 00 00 00 00 5c 01 04 29 00
% hexdump test_24x1.D4T |head -1
0000000 00 00 00 00 00 00 00 00 00 00 00 60 01 04 29 00
% hexdump test_25x1.D4T |head -1
0000000 00 00 00 00 00 00 00 00 00 00 01 00 01 04 29 00

At small sizes, this looks like a byte count, though after 96 bytes, there's a discontinuity.

File Size - The Answer

It appears that each byte in the file size can only contain a value under 100 (0x64).

From this, we can conclude that the size is encoded in binary-coded centimal.

• Each byte contains a base-100 digit, or two decimal digits, as a number from 0 to 99.
• I.e., 10632 would be 0x01 0x06 0x20
• Why? I have no idea.
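Binary-coded centimal is easy to decode: treat each byte as one base-100 digit, most significant first. The sketch below uses illustrative function names, with test values taken from the hexdumps above. It assumes the 40-byte header from the first attempt, so the decoded length should equal the file size minus 40.

```python
def decode_bcc(data):
    """Decode big-endian base-100 digits, one digit per byte."""
    value = 0
    for byte in bytearray(data):
        if byte > 99:
            raise ValueError("not a base-100 digit: %d" % byte)
        value = value * 100 + byte
    return value

def encode_bcc(value, length):
    """Encode a non-negative integer as `length` base-100 digit bytes."""
    digits = []
    for _ in range(length):
        value, digit = divmod(value, 100)
        digits.append(digit)
    if value:
        raise ValueError("value does not fit in %d bytes" % length)
    return bytes(bytearray(reversed(digits)))

# Length fields from the test files: 23, 24 and 25 samples of 4 bytes each.
print(decode_bcc(b"\x00\x00\x00\x5c"))  # test_23x1.D4T
print(decode_bcc(b"\x00\x00\x00\x60"))  # test_24x1.D4T
print(decode_bcc(b"\x00\x00\x01\x00"))  # test_25x1.D4T

# 606bd.D4T: the decoded length plus the 40-byte header gives the file size.
print(decode_bcc(b"\x00\x04\x08\x40") + 40)
```

Decoded this way, the three test files give 92, 96 and 100, so the discontinuity disappears, and 606bd.D4T comes out at exactly 40904 bytes.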

Sample Rate

Now that we know about binary-coded centimal, the rest of the puzzle falls into place.

The remaining 3 bytes after the channel count are the sample rate in BCC, i.e.:

% hexdump 606bd.D4T|head
0000000 00 00 00 00 00 00 00 00 00 04 08 40 01 04 29 00

0x04 0x29 0x00 = 04 41 00 = 44100

Summary: The D4T Format

We now know the structure of a .D4T:

• 8 zero bytes
• 4-byte total length (L) of sample data, in bytes, BCC-encoded
• 1-byte channel count C
• 3-byte sample rate, BCC-encoded
• 24 zero bytes
• for i in 1..C:
  • (L/C)/4 samples (32-bit little-endian floats), comprising that channel
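The layout above can be turned into a small parser. This is a sketch under the assumptions in the summary, not the author's script: the function names are made up here, and the interleaving step is one possible way to prepare the channel-after-channel data for a frame-based format such as WAV or AIFF.

```python
import struct

HEADER_SIZE = 40  # 8 + 4 + 1 + 3 + 24 bytes

def decode_bcc(data):
    """Decode big-endian base-100 digits, one digit per byte."""
    value = 0
    for byte in bytearray(data):
        value = value * 100 + byte
    return value

def parse_d4t(blob):
    """Return (sample_rate, channels), where channels is a list of tuples."""
    blob = bytes(blob)
    length = decode_bcc(blob[8:12])   # total sample data length in bytes
    count = bytearray(blob)[12]       # channel count C
    rate = decode_bcc(blob[13:16])    # sample rate in Hz
    if count == 0:
        raise ValueError("channel count is zero")
    per_channel = (length // count) // 4
    total = per_channel * count
    body = blob[HEADER_SIZE:HEADER_SIZE + total * 4]
    samples = struct.unpack("<%df" % total, body)
    # Channels are stored one after the other, not interleaved.
    channels = [samples[i * per_channel:(i + 1) * per_channel]
                for i in range(count)]
    return rate, channels

def interleave_pcm16(channels):
    """Interleave channels into 16-bit little-endian PCM frames."""
    out = []
    for frame in zip(*channels):
        for sample in frame:
            out.append(max(-32768, min(32767, int(sample * 0x8000))))
    return struct.pack("<%dh" % len(out), *out)
```

The result of interleave_pcm16 can be handed to a writer such as the standard wave module's writeframes, with the channel count and rate taken from the header rather than assumed.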

Conclusion

Armed with this knowledge, it is possible to write a Python script to convert D4T files to AIFF.

That script lives at http://dev.null.org/code/dermiv/
