Human random numbers are not random
Numbers generated by humans contain unconscious patterns and biases that make them predictable. Computer systems overcome these limitations using algorithms or physical sources of true entropy.
Patterns Detected in Human-Generated Numbers
60 of the 149 sequences start with the number 1 (40.3%)
In a truly random distribution, every digit from 0 to 9 is equally likely to appear first (10% each). Human-generated sequences deviate sharply from this: people gravitate toward favorite starting digits (here 1 dominates; 7 and 3 are also common favorites) while neglecting others.
Numbers repeat consecutively 817 times (9.3%)
In a truly random sequence, a digit repeats immediately about 10% of the time (for example ...33..., ...77..., ...00...). Humans tend to avoid these repetitions because they seem "non-random", which pushes the observed rate below 10%.
Average distance between consecutive numbers is 3.31
For independent uniform digits from 0 to 9, the expected average absolute difference between consecutive numbers is about 3.3. Humans often exaggerate jump sizes to make a sequence look more "random", so a systematic deviation from this value in either direction is a sign of non-random behavior.
Number 7 (human favorite) appears 1084 times (12.3%)
In a truly random distribution, each digit from 0 to 9 should appear approximately 10% of the time. 7 is the favorite number of humans and tends to appear more frequently.
Number 0 (avoided by humans) appears 629 times (7.1%)
In a truly random distribution, each digit from 0 to 9 should appear approximately 10% of the time. Humans tend to unconsciously avoid 0, so it tends to appear less.
There are 2661 large jumps of 5+ positions (30.2% of transitions)
For independent uniform digits from 0 to 9, a difference of 5 or more between consecutive numbers occurs in roughly 30% of transitions. Humans trying to appear "random" sometimes overuse large jumps to avoid visible patterns, so this rate is another useful check.
Detected 903 ascending and 875 descending sequences (total: 1778)
In a truly random sequence, ascending sequences (like 1-2-3) and descending sequences (like 8-7-6) occur naturally. Humans tend to avoid these patterns because they seem "too ordered", resulting in fewer sequences than would appear randomly.
Most used number is 1 with 1154 occurrences (13.1%)
In a truly random distribution, all numbers from 0 to 9 should appear with approximately the same frequency (10% each). Humans have unconscious preferences for particular numbers (in this dataset 1 dominates, and 3, 5, and 7 are also common favorites).
Least used number is 0 with 629 occurrences (7.1%)
In a truly random distribution, all numbers from 0 to 9 should appear with approximately the same frequency (10% each). Humans tend to neglect certain numbers, 0 in particular, so those digits appear less often than chance would predict.
Central numbers (3-7) represent 50.3%
In a truly random distribution, central numbers (3, 4, 5, 6, 7) should represent exactly 50% and extremes (0, 1, 2, 8, 9) the other 50%. Humans tend to prefer numbers from the middle of the range, resulting in percentages above 50% for central numbers.
In 40 sequences the first and last numbers are the same (26.8% of 149)
In a truly random distribution, the first and last numbers of a sequence should match approximately 10% of the time (1 out of every 10 sequences). A rate far from 10% in either direction indicates non-random behavior; here the match rate is well above chance.
Double pairs like 00, 11, 22: 817 times (9.3%)
In a truly random distribution, doubled digits (00, 11, 22, 33, etc.) should appear in approximately 10% of transitions. Humans actively avoid these repetitions because they do not feel random, which pushes the observed rate below 10%.
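The checks above amount to simple counting over the collected digit sequences. As a rough illustration, here is a minimal Python sketch (the data shape, 149 sequences of digits 0-9, is only assumed for the example); run on machine-generated digits, it shows what "close to chance" looks like for the same metrics:

```python
import random
from collections import Counter

def analyze(sequences):
    # Recompute a few of the checks above for a list of digit sequences (digits 0-9).
    digits = [d for seq in sequences for d in seq]
    pairs = [(a, b) for seq in sequences for a, b in zip(seq, seq[1:])]
    freq = Counter(digits)

    print("digit frequencies:", {d: round(freq[d] / len(digits), 3) for d in range(10)})
    print("first digit of each sequence:", Counter(seq[0] for seq in sequences))
    print("immediate repeats:", sum(a == b for a, b in pairs) / len(pairs))                  # ~0.10 if random
    print("mean |jump| between neighbors:", sum(abs(a - b) for a, b in pairs) / len(pairs))  # ~3.3
    print("jumps of 5+ positions:", sum(abs(a - b) >= 5 for a, b in pairs) / len(pairs))     # ~0.30

# Hypothetical data shape: 149 sequences of 60 machine-generated digits each.
analyze([[random.randrange(10) for _ in range(60)] for _ in range(149)])
```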
Frequently Asked Questions
Why can't humans generate reliable random numbers?
Humans are incapable of generating true randomness due to the architecture of our brain, which is optimized to recognize and create patterns. When trying to choose numbers randomly, our mind introduces a series of unconscious cognitive biases that violate the mathematical principles of randomness (uniformity and independence):
- Pattern Avoidance Bias: We avoid sequences that, although perfectly probable in true randomness, "seem" intentional (for example, 2, 4, 6, 8). By avoiding patterns, we create a predictable pattern of anti-patterns.
- Non-uniform Distribution: We tend to favor central numbers in a range and avoid extremes, resulting in a distribution that is not flat.
- Memory Dependence: The current choice is not independent; the brain remembers what was chosen before and tries to "compensate" to make the sequence seem fair, avoiding repeating a number immediately.
Why do machines generate reliable random numbers?
Machines overcome human limitations in two reliable ways: through mathematical determinism (pseudorandomness) or through physical unpredictability (true randomness).
- Pseudorandom Number Generators (PRNG): They use complex mathematical algorithms that, from a single initial seed, generate very long sequences that possess excellent statistical properties of uniformity and independence, efficiently simulating randomness.
- True Random Number Generators (TRNG): These systems resort to physical phenomena that are intrinsically unpredictable and non-deterministic, leveraging natural entropy. Examples include measuring thermal noise in resistors, quantum variations, or jitter (tiny fluctuations) in hardware interrupt times.
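As a brief illustration of the two families (a sketch only; Python is an assumed choice of language): the standard `random` module implements a deterministic PRNG (the Mersenne Twister), while `secrets` reads from the operating system's entropy-backed generator, which the kernel seeds with hardware events.

```python
import random   # PRNG: deterministic Mersenne Twister
import secrets  # reads from the OS entropy-backed generator

# PRNG: a single seed is expanded into a long stream with good statistical properties.
prng = random.Random(12345)
print([prng.randint(0, 9) for _ in range(10)])

# OS-backed source: the kernel mixes hardware events (interrupt timing, device noise, ...)
# into an entropy pool; secrets/os.urandom draw from it.
print([secrets.randbelow(10) for _ in range(10)])
```

For anything security-sensitive, the second interface is the appropriate one.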
Are numbers generated by machines really random?
Not always. The authenticity of randomness depends directly on the source used:
- Pseudorandom Numbers (PRNG): They are not really random. They are deterministic; if the algorithmic formula and the initial value of the seed are known, the complete sequence is predictable and reproducible. However, their statistical behavior is so good that for most simulations and non-critical applications, they are sufficient.
- Truly Random Numbers (TRNG): They are considered truly random. They are based on measuring non-deterministic physical phenomena (e.g., quantum mechanics), which means the result cannot be predicted with certainty even knowing all inputs, making them essential for cryptography and high-level security.
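The difference is easy to demonstrate in a short sketch (Python assumed): reseeding a PRNG with the same seed replays exactly the same sequence, whereas an entropy-backed source has no seed to replay.

```python
import random
import secrets

# Deterministic: two generators with the same seed produce identical sequences.
a = random.Random(2024)
b = random.Random(2024)
assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]

# Entropy-backed: every call yields a fresh, non-reproducible value.
print(secrets.token_hex(16))
print(secrets.token_hex(16))  # different from the previous line, and cannot be replayed
```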
How does a machine generate a random number?
Machines use two main approaches:
- Mathematical Algorithms (PRNG): An algorithm takes an initial number (seed) and repeatedly applies a complex mathematical function. Each output becomes the input (or part of it) of the next iteration, producing a sequence of numbers with no easily detectable correlation. The general form is $X_{n+1} = f(X_n, \dots)$; a minimal sketch follows after this list.
- Physical Sources (TRNG): The operating system or dedicated hardware collects entropy from unpredictable physical sources. This involves measuring and quantifying noise (e.g., Zener diode noise, mouse movements, exact time between network requests or voltage fluctuations), then processing this chaotic data through an entropy extractor that transforms it into a uniform and usable stream of bits.
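A minimal sketch of the recurrence mentioned above, using a classic linear congruential generator $X_{n+1} = (aX_n + c) \bmod m$ with well-known textbook constants (illustrative only; production libraries use stronger generators such as the Mersenne Twister or PCG, and none of this is suitable for cryptography):

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    # Linear congruential generator: X_{n+1} = (a*X_n + c) mod m.
    # The constants are the classic "Numerical Recipes" parameters.
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m  # normalize the integer state to a float in [0, 1)

gen = lcg(seed=42)
print([round(next(gen), 4) for _ in range(5)])
```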
How can a truly random number be generated?
A truly random number (TRNG) can only be generated through exploitation of fundamentally unpredictable physical phenomena, known as entropy sources. These include:
- Thermal Noise: The chaotic movement of electrons in a conductor or semiconductor at room temperature, which generates a measurable random voltage.
- Quantum Phenomena: Measurement of atomic-level processes, such as radioactive decay or vacuum fluctuations, which are intrinsically non-deterministic according to the laws of physics.
- Environmental Sources: Very precise fluctuations in the time between hardware interrupts, key presses or mouse movements; these events are unpredictable at the nanosecond level and are used to feed the operating system's entropy pool.
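As a toy illustration of the timing idea (a sketch with no cryptographic value; real systems use dedicated hardware and vetted extractors, and applications should simply read from the OS pool via os.urandom): the low bit of high-resolution timing measurements is noisy but biased, and a classic von Neumann extractor reduces that bias.

```python
import time

def raw_timing_bits(n):
    # Harvest the least significant bit of how long a tiny workload takes.
    # Scheduling, cache and interrupt jitter make these bits noisy,
    # but they are biased and correlated, so they are NOT usable directly.
    bits = []
    while len(bits) < n:
        t0 = time.perf_counter_ns()
        sum(range(100))  # tiny workload whose duration fluctuates slightly
        t1 = time.perf_counter_ns()
        bits.append((t1 - t0) & 1)
    return bits

def von_neumann(bits):
    # Classic extractor: over non-overlapping pairs, keep 01 -> 0 and 10 -> 1,
    # discard 00 and 11. This removes bias from independent raw bits.
    return [a for a, b in zip(bits[0::2], bits[1::2]) if a != b]

raw = raw_timing_bits(4000)
clean = von_neumann(raw)
print(sum(raw) / len(raw))              # fraction of ones in the raw bits
print(sum(clean) / max(len(clean), 1))  # extracted bits should sit closer to 0.5
```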
What is a pseudorandom number?
A pseudorandom number (PRN) is a value generated by a deterministic algorithmic process designed to simulate the statistical behavior of randomness.
- Nature: It is the result of a mathematical formula and, therefore, is predictable if the formula and initial value (seed) are known.
- Properties: Despite being deterministic, a PRN sequence passes most statistical randomness tests. It is designed to have an extremely long period, i.e., the length of the sequence before it starts to repeat (often trillions of numbers or far more); a tiny illustration of the period concept follows after this list.
- Use: It is the basis for most computer applications, including Monte Carlo simulations, complex system modeling, video games and statistical tests, where speed and the ability to reproduce the sequence (debuggability) are more important than absolute unpredictability.
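The notion of a period can be made concrete with a deliberately tiny generator (the constants below are arbitrary toy values chosen for the illustration): with only 16 possible states, the output is forced to cycle within at most 16 steps, whereas production PRNGs keep enough internal state that their cycles are astronomically long.

```python
def tiny_lcg(seed, a=5, c=3, m=16):
    # A deliberately tiny generator: with only 16 possible states,
    # the sequence must revisit an earlier state within 16 steps and then cycle.
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = tiny_lcg(seed=7)
print([next(gen) for _ in range(20)])  # after 16 values, the same cycle starts again
```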
Why is human randomness insufficient in science and technology?
The insufficiency of human randomness lies in the fact that the patterns and biases we introduce, although subtle, can have a catastrophic impact on the integrity and validity of results:
- Cryptography: The security of an encryption key depends on it being completely unpredictable. If a key is based on a human sequence, an attacker can drastically reduce the search space of the key, compromising data confidentiality.
- Statistics and Sampling: If sample selection is not truly random, a sampling bias is introduced. This means the sample is not representative of the population, making scientific conclusions invalid or misleading.
- Simulations: In complex models, using non-uniformly distributed numbers would introduce patterns in the model, leading to results that do not faithfully reflect the physical or statistical phenomenon being simulated.
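A small sketch of the sampling point, using made-up "human-like" digit weights (hypothetical values for the demo only): estimating the mean of the digits 0-9 from biased draws systematically misses the true value of 4.5, while uniform draws recover it.

```python
import random

true_mean = sum(range(10)) / 10  # 4.5

# Hypothetical human-like weights: 7 over-picked, 0 under-picked (illustrative only).
weights = [0.06, 0.09, 0.10, 0.11, 0.11, 0.11, 0.10, 0.13, 0.10, 0.09]
biased = random.choices(range(10), weights=weights, k=100_000)
uniform = random.choices(range(10), k=100_000)

print("true mean:       ", true_mean)
print("uniform estimate:", sum(uniform) / len(uniform))  # close to 4.5
print("biased estimate: ", sum(biased) / len(biased))    # systematically shifted upward (~4.7)
```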
Can the brain be trained to be more random?
Yes, obvious patterns can be mitigated, but subconscious cognitive biases can never be completely eliminated. Training and practice can slightly improve human performance. However:
- The underlying problem persists: The brain, by design, will always seek order or avoid "excessive disorder".
- Perfect randomness is unattainable: Even the most trained human will fail the rigorous statistical randomness tests designed for quality PRNGs.
How is the quality of a random number measured?
The quality of a random sequence is measured through a rigorous set of standardized statistical tests, such as those published by NIST (National Institute of Standards and Technology). These tests seek to detect any deviation from pure random behavior. Main metrics include:
- Frequency (Monobit Test): Verifies whether the proportions of 0s and 1s in a binary sequence are statistically balanced (a minimal sketch of this test appears after this list).
- Runs Test: Analyzes the length of sequences of identical consecutive bits.
- Entropy: Measures unpredictability or "information content" per bit. High entropy indicates greater randomness.
- Autocorrelation: Measures whether a number in the sequence has any statistical relationship with previously generated numbers. The correlation should be indistinguishable from zero.
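As an example, the first of these tests is short enough to sketch directly, following the frequency (monobit) test definition from NIST SP 800-22 (Python assumed as the illustration language):

```python
import math
import secrets

def monobit_p_value(bits):
    # NIST SP 800-22 frequency (monobit) test: are 0s and 1s balanced?
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)   # map 0 -> -1 and 1 -> +1, then sum
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))  # p-values >= 0.01 are considered passing

# 100,000 bits from the OS entropy-backed generator should pass comfortably.
data = secrets.token_bytes(12_500)
bits = [(byte >> i) & 1 for byte in data for i in range(8)]
print(monobit_p_value(bits))
```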
Are there risks if we use human "random" numbers in cryptography?
Absolutely. Using human-generated numbers as a source of randomness in cryptography creates a critical weakness that any sophisticated attacker can exploit.
- Prediction Risk: Since human randomness is biased and contains subtle patterns, it becomes a predictable system. Attackers can use statistical analysis or artificial intelligence to model these biases and guess or reduce the range of possible encryption keys or passwords.
- Data Compromise: A successful prediction compromises the confidentiality and integrity of data. Instead of having to test billions of combinations, the attacker only needs to test those that fit the known human pattern.
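To put a rough number on that reduction (using a hypothetical bias profile and the simplifying assumption that digits are chosen independently): the Shannon entropy per digit falls below the uniform log2(10) ≈ 3.32 bits, so the effective search space for a 6-digit code shrinks accordingly.

```python
import math

uniform = [0.1] * 10
# Hypothetical human-like digit probabilities (illustrative only).
biased = [0.04, 0.08, 0.10, 0.12, 0.11, 0.12, 0.10, 0.15, 0.10, 0.08]

def entropy_bits(p):
    # Shannon entropy in bits per symbol: H = -sum(p_i * log2(p_i))
    return -sum(x * math.log2(x) for x in p if x > 0)

for name, p in [("uniform", uniform), ("biased", biased)]:
    h = entropy_bits(p)
    # 2 ** (6 * H) approximates the number of "equally likely" 6-digit codes an attacker must try.
    print(f"{name}: {h:.2f} bits/digit, ~{2 ** (6 * h):.3e} effective 6-digit codes")
```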