C++ Seeding Surprises (2015)

Properly seeding random number generators doesn't always get the attention it deserves. Quite often, people do a terrible job, supplying low-quality seed data (such as the system time or process id) or combining multiple poor sources in a half-baked way. C++11 provides std::seed_seq as a way to encourage the use of better seeds, but if you haven't thought about what's really going on when you use it, you may be in for a few surprises.

In contrast to C++11, some languages, such as popular scripting languages like JavaScript, Python, or Perl, take care of good seeding for you (provided you're using their built-in RNGs). Today's operating systems have a built-in source of high-quality randomness (typically derived from the sequencing of unpredictable external and internal events), and so the implementations of these languages simply lean on the operating system to produce seed data.

C++11 provides access to operating-system–provided randomness via std::random_device , but, strangely, it isn't easy to use it directly to initialize C++'s random number generators. C++'s supplied generators only allow seeding with a std::seed_seq or a single integer, nothing else. This interface is, in many respects, a mistake, because it means that we are forced to use seed_seq (the “poor seed fixer”) even when it's not actually necessary.

In this post, we'll see two surprising things:

Low-quality seeding is harder to “fix” than you might think. When std::seed_seq tries to “fix” high-quality seed data, it actually makes it worse.

Perils of Using a Single 32-Bit Integer

If you look online for how to initialize a C++11 random number generator, you'll see code that looks like

std::mt19937 my_rng(std::random_device{}());

or possibly the more explicit (but equivalent) version,

std::random_device rdev; uint32_t random_seed = rdev(); std::seed_seq seeder{random_seed}; std::mt19937 my_rng(seeder);

... continue reading