Understanding Deflate
I’m trying to understand how Deflate works, so I decided to compress a simple string, TOBEORNOTTOBEORTOBEORNOT, using GZIP and then decode the resulting file by hand.
Compressing the data
Pretty simple here, text in, bytes out:
$ echo -n 'TOBEORNOTTOBEORTOBEORNOT' | gzip -n | xxd -ps -u
1F8B08000000000000030BF17772F50FF2F30F09013342605C00F14E3D2D
18000000
Reading the GZIP data
Even though I’m really only interested in the compressed data, I have to decode the GZIP “wrapper” first in order to get at it. Fortunately, the Wikipedia page has the necessary details:
1F8B      Magic number.
08        Compression method. Must be 8 (Deflate).
00        Flags. 0: no flags.
00000000  Unix time when the file was last modified. 0 means no timestamp is available.
00        Extra flags. 0: None (default value).
03        Filesystem on which compression occurred. 3: Unix.
0BF17772  The compressed data.
F50FF2F3  The compressed data.
0F090133  The compressed data.
42605C00  The compressed data.
F14E3D2D  CRC-32 (ISO 3309) of the uncompressed data.
18000000  Size (in bytes) of the uncompressed data = 24.
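The same breakdown can be checked mechanically. The following is a minimal Python sketch, not part of the original walkthrough, that splits the hex dump into header, Deflate payload and trailer using the standard `struct` module. The variable names are my own, and it assumes no optional header fields are present, which holds here because the flags byte is 0.

```python
import struct

# Hex dump copied from the gzip output above (two xxd lines joined).
GZIP_HEX = (
    "1F8B08000000000000030BF17772F50FF2F30F09013342605C00F14E3D2D"
    "18000000"
)
data = bytes.fromhex(GZIP_HEX)

# Fixed 10-byte header: magic, method, flags, mtime, extra flags, OS.
# "<" selects little-endian with no padding, which is how gzip stores
# its multi-byte fields.
magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", data[:10])

# 8-byte trailer: CRC-32 and uncompressed size, both little-endian.
crc32, isize = struct.unpack("<II", data[-8:])

# Assumption: flags == 0, so there are no optional fields and everything
# between header and trailer is the raw Deflate stream.
deflate = data[10:-8]

print(magic.hex(), method, flags, mtime, xfl, os_id)
print(deflate.hex().upper())
print(f"crc32=0x{crc32:08X} isize={isize}")
```

The little-endian unpacking is why the trailing bytes 18000000 come out as 24 rather than some enormous number.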
Decoding the compressed data
Here we have to refer directly to the DEFLATE Compressed Data Format Specification version 1.3 (RFC 1951).
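Before decoding by hand, it helps to have something to check the result against. The sketch below is my own addition, written in Python: it reads the 3-bit block header that the specification describes (BFINAL, then BTYPE, packed starting from the least significant bit of the first byte) and then lets `zlib` inflate the raw stream as a cross-check. The helper name `read_bits` is hypothetical, not something from the spec or from zlib.

```python
import zlib

# The 16 bytes of Deflate data between the gzip header and trailer.
deflate = bytes.fromhex("0BF17772F50FF2F30F09013342605C00")

def read_bits(buf, pos, count):
    """Read `count` bits starting at bit offset `pos`.

    Bits are taken LSB-first within each byte, and the first bit read
    becomes the least significant bit of the result.
    """
    value = 0
    for i in range(count):
        byte = buf[(pos + i) // 8]
        bit = (byte >> ((pos + i) % 8)) & 1
        value |= bit << i
    return value

bfinal = read_bits(deflate, 0, 1)  # 1 if this is the last block
btype = read_bits(deflate, 1, 2)   # 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman

# Cross-check: a negative wbits value tells zlib to expect a raw Deflate
# stream with no zlib or gzip wrapper around it.
text = zlib.decompress(deflate, -15)

print(bfinal, btype, text)
```

For this stream the header reads as a single final block using the fixed Huffman codes, so no code-length tables need to be decoded before the data itself.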