Introduction to Random Number Generation

Random numbers are essential for many programs - especially games, statistical simulations, and cryptography. Without randomness, games would be predictable and boring: enemies would always attack the same way, treasure would appear in the same locations, and puzzles would have identical solutions every time.

In the physical world, we generate randomness through dice rolls, coin flips, or card shuffling. These aren't truly random, but involve so many physical variables (gravity, friction, air resistance, momentum) that they're unpredictable and effectively random.

Computers can't flip coins or roll dice. They exist in a controlled digital environment where everything is binary (0 or 1) and deterministic. When you tell a computer to calculate 5 + 7, you want it to always return 12, not occasionally return 11 or 13.

Consequently, computers cannot generate truly random numbers through software. Instead, programs simulate randomness using algorithms.

Algorithms and state

An algorithm is a sequence of instructions that solves a problem or produces a result.

For example, here's an algorithm for sorting names alphabetically:

  • Create an empty sorted list
  • Find the alphabetically first name in the unsorted list
  • Move that name to the sorted list
  • Repeat until the unsorted list is empty

Algorithms are reusable - you can apply the same sorting algorithm to different lists.

Here's a simple algorithm that generates sequential numbers:

#include <iostream>

int generateNext()
{
    static int state{ 10 }; // initialized only once

    ++state;      // modify state
    return state; // generate next number from new state
}

int main()
{
    std::cout << generateNext() << '\n';
    std::cout << generateNext() << '\n';
    std::cout << generateNext() << '\n';

    return 0;
}

Output:

11
12
13

The first time generateNext() is called, state is initialized to 10. Then the next number is generated and returned.

This algorithm is stateful - it retains information (in state) across calls. A stateless algorithm doesn't store information and must be given everything it needs each time. The term state refers to the current values held in stateful variables.

To generate the next number, our algorithm:

  1. Modifies the current state to produce new state
  2. Generates the next number from the new state

This algorithm is deterministic - for a given starting value, it always produces the same sequence.

Pseudo-random number generators (PRNGs)

To simulate randomness, programs use a pseudo-random number generator (PRNG) - an algorithm that generates a sequence of numbers whose properties simulate randomness.

Here's a basic PRNG that generates 100 16-bit pseudo-random numbers:

#include <iostream>

// illustrative only - don't use this
unsigned int simplePRNG()
{
    static unsigned int state{ 0 }; // initialized once

    // modify state using large constants and intentional overflow
    // to make the next number hard to predict casually

    state = 8253729 * state + 2396403; // modify state
    return state % 32768; // generate next number
}

int main()
{
    for (int count{ 1 }; count <= 100; ++count)
    {
        std::cout << simplePRNG() << '\t';

        if (count % 10 == 0)
            std::cout << '\n';
    }

    return 0;
}

Output:

4339	838	25337	15372	6783	2642	6021	19992	14859	26462
25105	13860	28567	6762	17053	29744	15139	9078	14633	2108
7343	642	17845	29256	5179	14222	26689	12884	8647	17050
8397	18528	17747	9126	28505	13420	32479	23218	21477	30328
20075	26558	20081	3716	13303	19146	24317	31888	12163	982
1417	16540	16655	4834	16917	23208	26779	30702	5281	19124
9767	13050	32045	4288	31155	17414	31673	11468	25407	11026
4165	7896	25291	26654	15057	26340	30807	31530	31581	1264
9187	25654	20969	30972	25967	9026	15989	17160	15611	14414
16641	25364	10887	9050	22925	22816	11795	25702	2073	9516

Each number appears fairly random relative to the previous one.

Notice the similarity to our generateNext() function! Both use state, modify it, and generate output from the new state. PRNGs just use more complex mathematical operations to make the sequence harder to predict.

This particular PRNG isn't very good (notice how values alternate between even and odd - not very random!). Better PRNGs use more state variables and more sophisticated operations.

Seeding a PRNG

The sequence generated by a PRNG isn't random at all. Like generateNext(), simplePRNG() is deterministic. Given the same initial state (0), it generates the same sequence every time. Run the program three times and you'll get identical output.

To generate different sequences, we must vary the initial state. The value(s) used to set initial state is called a random seed (or seed). When initial state is set from a seed, the PRNG has been seeded.

Key Concept
All values a PRNG produces are deterministically calculated from the seed.

Here's a program that requests a seed from the user:

```cpp #include

unsigned int globalState{ 0 };

void seedGenerator(unsigned int seed) { globalState = seed; }

unsigned int simplePRNG() { globalState = 8253729 * globalState + 2396403; return globalState % 32768; }

void printTenNumbers() { for (int count{ 1 }; count <= 10; ++count) { std::cout << simplePRNG() << '\t'; }

std::cout << '\n';

}

int main() { unsigned int userSeed{}; std::cout << "Enter a seed: "; std::cin >> userSeed;

seedGenerator(userSeed);
printTenNumbers();

return 0;

}


Three sample runs:

Enter a seed: 7 10458 3853 16032 17299 10726 32153 19116 7455 242 549

Enter a seed: 7 10458 3853 16032 17299 10726 32153 19116 7455 242 549

Enter a seed: 9876 24071 18138 27917 23712 8595 18406 23449 26796 31519 7922


Same seed = same sequence. Different seed = different sequence.

## Seed quality and underseeding

For different results each run, we need different seeds. Asking users for seeds isn't practical - they'll just enter the same value. The program needs to generate randomized seeds automatically.

We can't use a PRNG to generate seeds (we need a seed to generate random numbers!). Instead, we use a seed generation algorithm designed specifically for producing seeds.

The theoretical maximum unique sequences a PRNG can generate depends on its state size. A PRNG with 128 bits of state can theoretically generate 2^128 unique sequences.

However, the actual number of sequences depends on how many unique seeds the seed generator can provide. If the seed generator only produces 4 different seeds, the PRNG can only generate 4 different sequences.

If a PRNG doesn't get enough quality seed data, it's **underseeded**. An underseeded PRNG produces lower quality randomness:

- Consecutive runs may produce correlated sequences
- Some values may never be generated at certain positions
- Someone might guess the seed from initial outputs and predict all future values

**Advanced note:** An ideal seed should have at least as many bits as the PRNG's state, have independently randomized bits, have a good mix of 0s and 1s across all bits, have no "stuck bits" (always 0 or always 1), and have low correlation with previously generated seeds. In practice, these are difficult to achieve perfectly. PRNGs with huge states (like Mersenne Twister with 19937 bits) are often designed to tolerate smaller seeds. However, using a single 32-bit or 64-bit value to seed a PRNG results in significant underseeding. Seeding with 64 bytes of quality data is typically sufficient for non-sensitive uses (not statistical simulations or cryptography).


## What makes a good PRNG?

A good PRNG should have several properties:

**Distribution uniformity** - Each number should be generated with approximately equal probability.

We can check this with a histogram. For a PRNG generating numbers 1-6, generating 36 numbers should produce roughly:

1|****** 2|****** 3|****** 4|****** 5|****** 6|******


A biased PRNG might produce:

1|*** 2|****** 3|****** 4|****** 5|****** 6|*********


If your monster-drop code expects a 1-in-6 chance of rare items (rolling 6), but the PRNG generates way more 6s, players will get too many rare items, breaking game balance.

**Unpredictability** - The next number shouldn't be predictable.

For example, `return ++num` is perfectly uniform but completely predictable.

Even seemingly random sequences like `simplePRNG()` can be predicted. By examining a few outputs, someone could determine the constants being used and calculate all future values. If running a betting website, predictable random numbers would let users game the system and always win.

**Good dimensional distribution** - The PRNG should return values across the entire range at random.

A PRNG that generates all low numbers then all high numbers might be uniform and unpredictable, but still produces biased results.

**High period** - The sequence shouldn't repeat quickly.

All PRNGs eventually repeat. The **period** is the sequence length before repetition begins.

Here are 100 numbers from a PRNG with poor periodicity:

112 9 130 97 64 31 152 119 86 53 20 141 108 75 42 9 130 97 64 31 152 119 86 53 20 141 108 75 42 9 130 97 64 31 152 119 86 53 20 141 ...


Notice 9 appears as the 2nd number, then the 16th, then every 14 numbers - it's stuck in a loop.

This happens because PRNGs are deterministic. Once state matches a previous state, the same sequence repeats.

Good PRNGs have long periods for all seeds, but designing such algorithms is extremely difficult.

**Efficiency** - The PRNG should be fast and use reasonable memory.

State size is usually under 4096 bytes, so memory isn't usually a concern. However, larger states are more likely to be underseeded and slower to initialize.

Generating the next number requires mathematical operations on the state. Performance varies by PRNG and architecture. This matters most when you need many random numbers quickly.

## PRNGs in C++

C++ provides randomization capabilities in the `<random>` header. As of C++20, there are 6 PRNG families:

| Type | Family | Period | State | Performance | Quality | Recommended? |
|------|--------|--------|-------|-------------|---------|--------------|
| minstd_rand, minstd_rand0 | Linear congruential | 2^31 | 4 bytes | Bad | Awful | No |
| mt19937, mt19937_64 | Mersenne Twister | 2^19937 | 2500 bytes | Decent | Decent | Probably |
| ranlux24, ranlux48 | Subtract and carry | 10^171 | 96 bytes | Awful | Good | No |
| knuth_b | Shuffled linear congruential | 2^31 | 1028 bytes | Awful | Bad | No |
| default_random_engine | Implementation defined | Varies | Varies | ? | ? | No |
| rand() | Linear congruential | 2^31 | 4 bytes | Bad | Awful | No |

Never use `knuth_b`, `default_random_engine`, or `rand()` (provided for C compatibility).

As of C++20, Mersenne Twister is the only built-in PRNG with decent performance and quality.

**Advanced note:** Tests like PracRand, SmallCrush, Crush, and BigCrush are used to assess PRNG quality and detect biases.


## Should we use Mersenne Twister?

Probably. For most applications, Mersenne Twister is fine.

However, by modern standards, Mersenne Twister is somewhat outdated. Its main weakness: results can be predicted after seeing 624 generated numbers, making it unsuitable for applications requiring unpredictability.

For applications needing highest quality, fastest performance, or unpredictability (cryptography), use a third-party library:

Popular choices:
- **Xoshiro family** and **Wyrand** for non-cryptographic PRNGs
- **Chacha family** for cryptographic (unpredictable) PRNGs

In the next lesson, we'll learn how to actually use Mersenne Twister to generate random numbers in C++.

## Summary

**Random numbers** are essential for games, simulations, and cryptography, providing unpredictability that makes programs interesting and useful.

**True randomness** cannot be generated by computers through software alone because computers operate deterministically—the same inputs always produce the same outputs.

**Algorithms** are sequences of instructions that solve problems, and can be either stateless (requiring all inputs each time) or stateful (maintaining information across calls).

**Pseudo-random number generators (PRNGs)** are algorithms that simulate randomness by generating sequences whose properties approximate true randomness, though they're actually deterministic.

**Seeding** initializes a PRNG's state with a starting value (the seed), determining the entire sequence the PRNG will generate. The same seed always produces the same sequence.

**Underseeding** occurs when a PRNG doesn't receive enough quality seed data relative to its state size, reducing randomness quality and potentially making sequences predictable or correlated.

**Distribution uniformity** means each possible value should be generated with approximately equal probability, preventing bias toward certain values.

**Unpredictability** ensures the next number cannot be easily predicted from previous numbers, important for applications requiring security or fairness.

**Period** is the sequence length before a PRNG repeats itself—longer periods are better, with good PRNGs having extremely long periods for all seeds.

**Mersenne Twister** is C++'s recommended built-in PRNG, offering decent performance and quality for most applications, though it's somewhat outdated by modern standards.

**Third-party PRNGs** like Xoshiro, Wyrand, and Chacha families offer better performance, quality, or cryptographic security than built-in options, recommended for demanding applications.

Understanding PRNG fundamentals helps you choose appropriate random number generation strategies and avoid common pitfalls like underseeding or using poor-quality generators.