Why 1 − (1 − p)ⁿ? · A-level explainer

1. The question

Imagine throwing darts at a wall. Somewhere on the wall there is a small target. Each throw lands at a completely random spot on the wall.

If you throw n darts, what is the probability that at least one hits the target?

This is the same question as: if random insertions are scattered along a DNA segment, what is the probability a given gene gets hit at least once?

A caveat on the analogy. Real darts aren't thrown randomly — even a poor player aims at the board, so their throws cluster around an intended point rather than spreading evenly across the wall. The maths below only works for genuinely uniform-random landings; a truer mental picture is raindrops falling on a roof, or buckshot fired blindfolded. The same idealisation reappears in the biology: most insertional mutagens (Sleeping Beauty, many retroviruses, Agrobacterium T-DNA) have target-site preferences, so "uniform random insertion" is a working approximation rather than a literal description.

2. One throw

Let the target cover a fraction p of the wall's area. Then for a single random throw:

P(hit) = p

P(miss) = 1 − p

Fraction of wall covered by the target = p.
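
The claim that P(hit) equals the target's share of the wall can be checked by brute force. The sketch below assumes, purely for illustration, a unit-square wall and a target shaped like a vertical strip of width p; the function name `estimate_hit_probability` is made up for this example.

```python
import random

def estimate_hit_probability(p, throws=100_000, seed=0):
    """Estimate P(hit) for one uniform-random throw at a unit-square wall.

    Assumption for illustration: the target is a vertical strip of width p,
    so it covers a fraction p of the wall's area.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(throws):
        x, y = rng.random(), rng.random()  # uniform landing spot on the wall
        if x < p:                          # the dart landed inside the strip
            hits += 1
    return hits / throws

# The estimate should land close to p itself.
print(estimate_hit_probability(0.05))
```

Only the target's area matters here, not its shape: any region covering a fraction p of the wall would give the same answer under uniform-random landings.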

3. Throwing n darts — the independence trick

The throws are independent, and each one misses with probability 1 − p. So the probability that all n darts miss is n copies of the miss-probability multiplied together:

P(all n miss) = (1 − p) × (1 − p) × … × (1 − p) = (1 − p)ⁿ

Now use the complement rule: "at least one hit" is the opposite of "every single one missed", so

P(at least one hit in n throws) = 1 − (1 − p)ⁿ

That's it — the whole formula. It's the CDF of the geometric distribution, which asks: "by throw number n, have we had our first success?"
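
As a rough check on the derivation above, the sketch below compares the formula with a simulation and with the geometric distribution summed term by term. The function names, trial count and seed are choices made for this example, not part of the original page.

```python
import random

def p_at_least_one_hit(p, n):
    """The formula: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

def simulate_at_least_one_hit(p, n, trials=20_000, seed=1):
    """Fraction of simulated rounds of n throws that contain at least one hit."""
    rng = random.Random(seed)
    rounds_with_hit = 0
    for _ in range(trials):
        # Each throw hits independently with probability p.
        if any(rng.random() < p for _ in range(n)):
            rounds_with_hit += 1
    return rounds_with_hit / trials

def geometric_cdf_by_sum(p, n):
    """Add up P(first hit is on throw k) = (1 - p)^(k - 1) * p for k = 1..n."""
    return sum((1 - p) ** (k - 1) * p for k in range(1, n + 1))

p, n = 0.05, 20
print("formula:      ", p_at_least_one_hit(p, n))
print("simulation:   ", simulate_at_least_one_hit(p, n))
print("geometric sum:", geometric_cdf_by_sum(p, n))
```

The formula and the geometric sum should agree to within floating-point rounding, while the simulation should only be close, since it is a random estimate.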

4. Play with it

Interactive demo: two sliders set p (target fraction, shown at 0.050) and n (throws, shown at 20).

Panel 1: darts landing randomly. Red = hit, grey = miss.

Panel 2: P(at least one hit) as n increases. Dot = current n.
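
Without the sliders to hand, the same curve can be explored in a few lines of code. The sketch below evaluates the dot for p = 0.05 and n = 20, then reads the curve the other way round: how many throws are needed before the probability reaches a chosen level? The helper `throws_needed` is hypothetical and simply steps n upwards until the formula crosses the level.

```python
def p_at_least_one_hit(p, n):
    """P(at least one hit in n independent throws, each hitting with probability p)."""
    return 1 - (1 - p) ** n

def throws_needed(p, target):
    """Smallest n for which P(at least one hit) reaches `target`.

    Hypothetical helper for this example, not taken from the original page.
    """
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    if not 0 <= target < 1:
        raise ValueError("target must satisfy 0 <= target < 1")
    n = 0
    while p_at_least_one_hit(p, n) < target:
        n += 1
    return n

p, n = 0.05, 20
print(f"dot at n = {n}: P = {p_at_least_one_hit(p, n):.4f}")
for target in (0.5, 0.9, 0.99):
    print(f"P >= {target}: n = {throws_needed(p, target)}")
```

Each extra step towards certainty costs more throws than the last, which is the flattening of the curve seen in the plot.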

5. A worked example

Suppose a gene covers 2% of a DNA segment, so p = 0.02. Then after n random insertions:

n (insertions)	(1 − p)ⁿ	P(gene knocked out)

Notice how quickly the probability climbs — but it only approaches 1, it never quite reaches it. There is always a tiny chance a gene survives, even after hundreds of insertions.
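
The rows of such a table can be generated directly from the formula. In the sketch below the insertion counts are illustrative choices, not necessarily the ones used on the original page, and `knockout_table` is a name invented for this example.

```python
def knockout_table(p, insertion_counts):
    """Return rows (n, P(gene survives), P(gene knocked out)) for each n."""
    rows = []
    for n in insertion_counts:
        survive = (1 - p) ** n          # every one of the n insertions misses
        rows.append((n, survive, 1 - survive))
    return rows

p = 0.02                                        # gene covers 2% of the segment
illustrative_counts = [1, 10, 50, 100, 200, 500]  # assumed values of n

print(f"{'n':>5}  {'(1 - p)^n':>10}  {'P(knocked out)':>14}")
for n, survive, knocked_out in knockout_table(p, illustrative_counts):
    print(f"{n:>5}  {survive:>10.5f}  {knocked_out:>14.5f}")
```

The middle column is the survival probability: it shrinks by a factor of 0.98 with every insertion, so it gets small quickly but never reaches zero.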

6. From one gene to thirty

The simulation has 30 genes with different sizes s₁, s₂, …, s₃₀, on DNA of total length L. For gene i, its "target fraction" is

p_i = s_i / L

so the probability that gene i has been knocked out by insertion number n is

P(gene i KO) = 1 − (1 − s_i/L)ⁿ

Let K(n) be the number of genes knocked out after n insertions. Its expected value is just the sum of these probabilities (by linearity of expectation, which holds even though the knock-out events for different genes are not independent):

E[K(n)] = Σ_{i=1}^{30} [ 1 − (1 − s_i/L)ⁿ ]

That sum is exactly what the green dashed line on the simulation plots. Each gene contributes its own geometric CDF; because smaller genes have smaller p_i, they take longer to saturate, which is why the last genes to be knocked out are almost always the tiny ones.
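
The sum can be compared with a direct simulation. The simulation's actual gene sizes and DNA length are not given here, so the sketch below makes them up: 30 genes of sizes 100, 200, …, 3000 laid end to end on DNA of length 100,000. The function names and the layout are likewise assumptions made for this example.

```python
import bisect
import itertools
import random

def expected_knockouts(sizes, total_length, n):
    """E[K(n)]: sum over genes of 1 - (1 - s_i / L)^n."""
    return sum(1 - (1 - s / total_length) ** n for s in sizes)

def simulate_knockouts(sizes, total_length, n, trials=2000, seed=2):
    """Average number of distinct genes hit by n uniform-random insertions.

    Layout assumption: genes sit end to end starting at position 0, and the
    rest of the DNA (if any) is non-genic.
    """
    rng = random.Random(seed)
    ends = list(itertools.accumulate(sizes))  # right-hand end of each gene
    total_hit = 0
    for _ in range(trials):
        hit = set()
        for _ in range(n):
            x = rng.uniform(0, total_length)   # insertion position
            i = bisect.bisect_right(ends, x)   # index of the gene containing x
            if i < len(sizes):                 # otherwise x fell outside every gene
                hit.add(i)
        total_hit += len(hit)
    return total_hit / trials

sizes = [100 * (i + 1) for i in range(30)]  # hypothetical gene sizes
L = 100_000                                 # hypothetical total DNA length

for n in (10, 50, 200):
    print(n, expected_knockouts(sizes, L, n), simulate_knockouts(sizes, L, n))
```

The two columns should agree closely even though a single insertion can never land in two genes at once, which is the sense in which linearity of expectation does not need independence.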

7. Sanity checks

n = 0: formula gives 1 − (1 − p)⁰ = 1 − 1 = 0. ✓ No throws, no hits.
p = 0: formula gives 1 − 1ⁿ = 0. ✓ Zero-size target, impossible to hit.
p = 1: formula gives 1 − 0 = 1 for any n ≥ 1. ✓ Target covers the whole wall.
n → ∞ (with p > 0): (1 − p)ⁿ → 0, so probability → 1. ✓ Given enough throws, a hit becomes all but certain.
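
The same checks can be written as assertions, which is a handy habit when coding up any formula. This is a minimal sketch; the test values of p and n are arbitrary choices.

```python
def p_at_least_one_hit(p, n):
    """P(at least one hit in n independent throws, each hitting with probability p)."""
    return 1 - (1 - p) ** n

# n = 0: no throws, no hits.
assert p_at_least_one_hit(0.3, 0) == 0

# p = 0: a zero-size target cannot be hit.
assert p_at_least_one_hit(0.0, 25) == 0

# p = 1: the target covers the whole wall, so any n >= 1 guarantees a hit.
assert all(p_at_least_one_hit(1.0, n) == 1 for n in range(1, 10))

# Large n with p > 0: the probability gets arbitrarily close to 1.
# (In floating point it may round to exactly 1.0; mathematically it stays below 1.)
assert p_at_least_one_hit(0.02, 10_000) > 0.999999

# More throws never lower the probability.
values = [p_at_least_one_hit(0.02, n) for n in range(200)]
assert all(a <= b for a, b in zip(values, values[1:]))

print("all sanity checks passed")
```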
