Only 17% of all 64-bit Integers are products of two 32-bit integers

In software programming, the product between two integers is often computed to a fixed number of bits with overflow. Consider 8-bit integers. If you multiply 127 by 127, you get back the number 1 as an 8-bit unsigned integer, with an overflow. The actual full product is 16129. To represent 16129, you typically use 16 bits of precision.

Thus we have the notion of the full product. The full product of two 32-bit integers is typically represented using 64 bits. The question that preoccupied me is what fraction of all 64-bit integers can be written as the product of two 32-bit integers.

You might wonder why you would care?

We often design hash functions: they are special functions that take an input and generate a random-looking output. Several years ago I designed a very fast hash function called clhash. It is a super-fast hash function for strings having a few hundred bytes or more. If you don’t know about clhash, check it out. It is interesting in its own right.

This clhash hash function uses a type of multiplication typical of cryptographic applications. I was trying to argue that our approach had benefits compared with techniques based on standard multiplications. Let me illustrate. A simple hash function for 32-bit integers could take the least significant bits and multiply them with the most significant bits.

// simpleHighLowHash is a simple (and weak) 32-bit hash // that multiplies the high 16 bits by the low 16 bits. func simpleHighLowHash ( x uint32 ) uint32 { high : = uint16 ( x >> 16 ) low : = uint16 ( x & 0xFFFF ) return uint32 ( high ) * uint32 ( low ) }

Maybe you’d want the hash function to be uniform: all possible 32-bit hash values should be equally probable. It is only possible in this instance if the hash function can produce all 32-bit hash values, which is not the case.

The great mathematician Erdös showed that the proportion of all 2n -bit values that can be generated by the product of two n-bit values goes to zero as n becomes large. This means that if you have, say, 10000000-bit integers multiplying 10000000-bit integers, you’d expect relatively few 100000000000000-bit integers to be produced. But what about practical cases like 32-bit integers or 64-bit integers?

You can just brute-force the problem easily up to the multiplication of 16-bit integers into 32-bit products. At that point, slightly one out of five 32-bit numbers is a product between two 16-bit integers. About 80% of all 32-bit integers are never produced by this hash. However, the running time grows exponentially, and brute force won’t scale all the way to 32 bits.

So what do we do about the 32-bit case? That is, what do you do when you multiply two 32-bit integers to produce a 64-bit product? What fraction of 64-bit values can the following function produce?

... continue reading