## Coin Flipping, Stopping Time, Monte Carlo Simulation and Importance Weighting

This post discusses a classic coin flipping puzzler and explores Monte Carlo simulation techniques.

Consider a game of two players taking turns flipping a coin. The first player that flips a head wins. The question is what is the probability of winning the game for each player, and what is the expected number of turns the game goes on until someone wins.

To make the problem more general we assume the coin is not necessary fair, meaning the probability of getting a head is not equal to the probability of getting a tail.

Needless to say this type of problem has applications beyond the simple coin setup, for example we might be interested at a stopping time, how long it takes for a stock to cross a certain price level, or looking to price a knock-out option.

One solution to this problem is to add up the probabilities of each scenario to calculate the overall probabilities. Another arguably the more elegant way is to use a symmetry to get to the answer quicker. Let’s do that first.

# Stopping time

Assume the probability of a head to be p, and the game has been going on for a number of steps. We are at time step i, the game has not yet ended and we are about to flip the coin again. We like to compute the expected number of time steps for the game to end.

Let’s define E(t|i) as the expected number of additional time steps for the game to take, not counting the past turns, provided we already got to the time step i.

The expected number of additional steps is composed out of two terms:

• First is the case of throwing a win (head). The probability of this is p. The game ends at that point. In this case just one step concludes the game. The contribution of this case is the product of probability p multiplied by a single step 1.
• Second is the case of a tail with the probability (1-p) at the next step. The game does not stop there, but is expected to continue for a yet unknown number of E(t|i+1) steps from there on. The contribution of this case is the product of the probability (1-p) and the sum of the number 1 for the single step just taken and the expected number of future steps E(t|i+1).

E(t|i) = p*1 + (1-p)*(1+E(t|i+1))

In the limit of p=100%, a certain head, the expected number of steps is just one step. In the other limit p=0%, the game just goes on, and the expectation at time i is just the expectation at the next step plus one, with the limit of a never ending game and infinite stopping time.

The trick of solving the equation above is to recognize the translational or time symmetry in the game. The expectation E(t|i) being defined as the number of expected future steps does not actually depend on the time step i. As long as the game has not ended the chances for the outcome of the game is exactly the same for each player on his turn. With this we can set

E(t) = E(t|i) = E(i|i+1)

and find the solution of E(t) = p+(1-p)*(1+E(t)) to be

E(t) = 1/p

which is the expectation of a geometric distribution.

A game with a fair coin p=50% is expected to last an average of 2 steps. This average is composed out of the chance of the first player winning  at the first flip with a 50% chance, the first player loosing and the second player winning at the second flip with a 25% chance, and so on

2 = 1*50%+2*25%+3*12.5%+4*6.25% …

# Winning probability

The next question is the probability for the first player to win the game.  Let’s start by looking at the probability or expectation E[loose] of a player to loose, assuming it is his or hers turn. It is compose out of the chance to not win at the current step by throwing a tail with probability (1-p), and for the opposing player to win on his turn at he next step:

E[loose|player-1, step 1]=(1-p)*E[win|player-2, step 2]

However, the sum of probabilities of loosing and winning following your turn turn adds up to one, so that

E[win|player-1, step 1] = 1-E[loose|player-1, step 1]

Again we use translational symmetry by equating the expectation of player-1 to win on step 1 to the expectation of player-2 to win the game on his turn on step 2, with the condition that the game has progressed to his turn:

E[win|player-2, step 2] = E[win|player-1, step 1]

In other words once its your turn and you have not lost, your chances of winning are just the same as for player-1 one day one. Every day is fresh in this game and brings a equal chance of winning!

With this we get

1-E[win] = (1-p)E[win]

or

E[win] = 1/(2-p)

In case of a fair coin with p=50% the chances of winning are about  67%. Again this can be expressed of a series of probabilities for winning at the first step with 50% chance, winning at the third step after successive losses by both players with a 12.5% chance, and so on

2/3 = 50%+12.5%+3.125%+…

# Geometric series solution

This leads to yet another way to solve this puzzles by series expansion. The total expected stopping time is a sum of time steps multiplied by products of successive tails with probability q=1-p followed by a single head p

$E(t) = 1p+2qp+3q^2p+4q^3p+...$

$E(t) = p(1+2q+3q^2+4q^3+...)$

The above series has just the structure of a derivative in respect to q, so that we can write it as

$E(t) = p \frac{d}{d q}(q+q^2+q^3+q^4+...)$

Identifying a geometric sum by adding in the missing 1

$E(t) = p \frac{d}{d q}(-1+ \frac{d}{d q} \sum_{i=0}^{\infty} q^i )$

which has a well known formula

$E(t) = p \frac{d}{d q}(-1+\frac{1}{1-q})$

Taking the derivative leads to the result for the expected stopping time

$E(t) = p \frac{1}{(1-q)^2} = 1/p$

Similarly the can also compute the winning probability by summing up the probability series. The probability to win is the sum over odd time steps composed out a single head and even number of tails

$E(win) = p + q^2p+q^4p+q^6p+...$

Again identifying a geometric sum in terms of the squared probability of a tail, we get the result

$E(win) = p \sum_{i=0}^{\infty}q^2i = p/(1-q^2)=1/(2-p)$

# Monte Carlo simulation and convergence

In Monte Carlo simulation one does attempt to compute expectations such as the stopping time or winning probability by summing over scenarios. It might seem redundant to apply simulations to this simple coin game since we already computed the solution in closed form, but one could imagine variations of the game that do not lead to an as simple theoretical solution. Maybe the payout to the winner is made dependent on the number of steps taken before the game ends. A financial example could be a structured equity product that stops paying coupons upon a reference index crossing a barrier. Here, the greater the stopping time the greater the total amount of coupons collected.

Let’s compare the case of an biased and fair coin, with the biased head probability of $p=11/36$, and the fair head probability $p_0 = 1/2$. The biased coin example is taken as a proxy to an equivalent bet of winning a dice game consisting of two dices with getting at least one six being the winning play. The following Matlab code generates a graph that depicts the expected stopping time for the fair and biased coin.

 plot(
1:25,36/11+0*(1:25),
1:25,11/36*cumsum((1:25).*(25/36).^((1:25)-1)),
1:25,2+0*(1:25),
1:25,0.5*cumsum((1:25).*0.5.^((1:25)-1))
)
title("Theoretical and approximated stopping times vs. number of steps included in estimation.")
xlabel("Number of  steps in summation")
ylabel("Expected number of flips (stopping time)")
axis([1 25 0 3.5])


It compares the exact theoretical expectation to the estimated probability of including an increasing number of steps into the summation of probabilities using the series expansion computed above.

As the graph shows we need to consider about 10 steps in order to recover the maturity of the expectation for the fair coin, while the biased coin requires about 25 steps. The biased coin in our example has a smaller probability to yield a head and stop the game, thus requiring a greater number of steps to recover the total overall expectation.

The same kind of picture unveils for the expected probability for the first player to win the game. Using the following Matlab code

plot(
2*(1:13)-1,36/61+0*(2*(1:13)-1),
2*(1:13)-1,11/36*cumsum( (25/36).^(2*(1:13)-2)),
2*(1:13)-1,2/3+0*(2*(1:13)-1),
2*(1:13)-1,0.5*cumsum( (0.5).^(2*(1:13)-2))
)
title("Theoretical and approximate expectation of winning vs. number of steps included in estimation.")
xlabel("Number of  steps in summation")
ylabel("Probability of winning")
axis([1 25 0.3 0.7])


plots the theoretical and estimated expectations for the fair and biased coins

# Simulating coin flips

Next we intend to estimate the stopping time and winning probabilities by Monte Carlo simulation. As straight forward approach would be to simulate individual coin flips with the appropriate probabilities for the fair or biased coin.

In the following example random imaginary coin flips are simulated by sampling a uniform distribution of a total of n_bins_total. Any sample above the number n_bins_head is counted towards a tail, otherwise they are counted towards a head. A choice of  n_bins_total=36 and n_bins_head=11 simulates the biased coin in our example, while n_bins_total=36 and n_bins_head=18 corresponds to the fair coin.

However the random Monte Carlo path generated by the choice of n_bins_head and n_bins_total does not need to follow the same distribution as the actual coin we intend to  simulate, provided we adjust the likelihood ratios accordingly. The simulation below computes expectations for the fair and the biased coins simultaneously along the same path, by importance weighting each outcome by the ratio of the fair or biased respective probabilities and the weights used to generate the path. For example if the Monte Carlo generation produces a head and we are in the process of computing the expectation over the biased coin, the resulting term is multiplied by the ratio of p_head_biased/(n_bins_head/n_bins_total)).

% number of simulations m
m = 100000;
n_bins_total = 36;
stopping_time = zeros(1,m);
weights = zeros(1,m);
weights_fair = zeros(1,m);

for i = 1:m
step = 1;
weight_biased = 1;
weight_fair = 1;

# coin flip resulted in tail -> game continues
step = step + 1;
weight_biased= weight_biased*p_tail_biased/w_tail;
weight_fair = weight_fair*p_tail_fair/w_tail;
end

stopping_time(i)=step;
end

t_stop_fair = mean(stopping_time.*weights_fair);
t_stop_biased = mean(stopping_time.*weights_biased);
pwin_fair = sum(weights_fair(mod(stopping_time,2)==1)) /length(stopping_time);
pwin_biased = sum(weights_biased(mod(stopping_time,2)==1)) /length(stopping_time);

fprintf( “Stopping time for fair coin:      simulated: %f, calculated %f \n” , t_stop_fair, 2);
fprintf( “Stopping time for biased coin:    simulated: %f, calculated %f \n”, t_stop_biased, 36/11);
fprintf( “First player win for fair coin:   simulated: %f, calculated %f \n”, pwin_fair, 1/(2-0.5) );
fprintf( “First player win for biased coin: simulated: %f, calculated %f \n”, pwin_biased,1/(2-11/36) );


Stopping time for fair coin: simulated: 1.999439, calculated 2.000000
Stopping time for biased coin: simulated: 3.277148, calculated 3.272727
First player win for fair coin: simulated: 0.665160, calculated 0.666667
First player win for biased coin: simulated: 0.590150, calculated 0.590164

We can visualize the progress of the Monte Carlo simulation by plotting the running winning expectation versus the number of paths generated

plot( (1:length(stopping_time) )(mod(stopping_time,2)==1)  , cumsum(weights_biased(mod(stopping_time,2)==1))./(   (1:length(stopping_time) )(mod(stopping_time,2)==1)  ) , (1:length(stopping_time) )(mod(stopping_time,2)==1) , 1/(2-11/36)+0*(1:length(stopping_time) )(mod(stopping_time,2)==1) ,
(1:length(stopping_time) )(mod(stopping_time,2)==1)  , cumsum(weights_fair(mod(stopping_time,2)==1))./(   (1:length(stopping_time) )(mod(stopping_time,2)==1)  ) , (1:length(stopping_time) )(mod(stopping_time,2)==1) , 1/(2-18/36)+0*(1:length(stopping_time) )(mod(stopping_time,2)==1) )
title(’Simulated probability of winning vs. number of simulations’)
xlabel(’Number of simulations’)
ylabel(’Probability of winning’)
axis([1 m .55 .7])


The technique of keeping the random number generator measure separate from the actual coin process allows to compute the expectation of both the fair and biased coins along the same path. It also gives the flexibility to emphasize shorter or longer paths. A large n_bins_head would create shorter paths within the simulation as it would be more likely to throw a head, likewise a small n_bins_head would create more tails leading to longer paths.

A possible application to this is a situation where a Monte Carlo simulation using the actual price measure would not naturally sample important areas of the phase space. For example pricing a far out of the money option with few price paths sampling the interesting high gamma region around the strike. Similarly an option might be either too far or to close to the barrier in order for sufficient number of Monte Carlo paths to sample the knock out or reach a payout.

# Simulating stopping times

We can also simulate the effective path length directly. From our initial convergence study is seems a path length of 25 should be sufficient. The following code generates path up to 25 steps with from a uniform sample space and weights each path according to its expectation as given by the actual properties of the fair or biased coins:

% simulating path length
% number of simulations m
m = 100000;
n_bins_total = 25;
w = 1/n_bins_total;
stopping_time = zeros(1,m);
weights_biased = zeros(1,m);
weights_fair = zeros(1,m);

for i = 1:m
number_of_tails = unidrnd(n_bins_total) - 1;
stopping_time(i)=number_of_tails+1;
end

t_stop_fair = mean(stopping_time.*weights_fair);
t_stop_biased = mean(stopping_time.*weights_biased);
pwin_fair = sum(weights_fair(mod(stopping_time,2)==1)) /length(stopping_time);
pwin_biased = sum(weights_biased(mod(stopping_time,2)==1)) /length(stopping_time);

fprintf( "Stopping time for fair coin flip: simulated: %f, calculated %f   \n" , t_stop_fair, 2);
fprintf( "Stopping time for biased coin:    simulated: %f, calculated %f   \n", t_stop_biased, 36/11);
fprintf( "First player win for fair coin:   simulated: %f, calculated %f   \n", pwin_fair,   1/(2-18/36));
fprintf( "First player win for biased coin: simulated: %f, calculated %f   \n", pwin_biased,   1/(2-11/36));


Stopping time for fair coin: simulated: 1.994821, calculated: 2.000000
Stopping time for biased coin: simulated: 3.264265, calculated: 3.272727
First player win for fair coin: simulated: 0.669453, calculated: 0.666667
First player win for biased coin: simulated: 0.590975, calculated: 0.590164

Using the same number of 100,000 Monte Carlo trials the direct simulation of paths is much faster than generating individual coin flips. However as depict in the graph above we get a slower convergence and greater variance. This is possibly a consequence of uniformly sampling short and long paths, while the actual coin process visits short paths much more frequently.

# Minimizing variance

This leads to the question of what is the optimal choice of the measure w we use to generate the simulation? Let’s take a step back and generally consider the expectation of a function f(x) over the sample space of a random variable x. In the actual probability measure it is the integral or sum over all possible samples of x applied to the observable f(x) and weighted by the corresponding probability p(x)

$E_p[f(x)] = \sum f(x_i) p(x_i)$.

In Monte Carlo simulation we take the expectation over simulated paths, where the simulation is generated by a different measure w. In order to recover the identical expectation in the new measure averaging must be taken over f(x) multiplied by the likelihood ratio p(x)/w(x).

$E_p[f(x)] = E_w[f(x)(p(x)/w(x)) ] = \sum f(x_i)(p(x_i)/w(x_i)) w(x_i) = \sum f(x_i) p(x_i)$

The measure w represents the characteristics of the random scenario generator of the Monte Carlo implementation. In the example above it seems that choosing w to be uniform for directly generating stopping times did lead to an increased variance and poorer convergence. What choice for would be optimal? Let’s investigate what choice of w would minimize the variance of the simulation.

The variance of the expectation of f(x) in the measure w is given by

$Var_w(f(x)) = E_w[(f(x)*p(x)/w(x))^2] - E_w[f(x)*p(x)/w(x)]^2$
$= \sum (f_i \frac{p_i}{w_i})^2 w_i - ( \sum f_i \frac{p_i}{w_i} w_i )^2$
$= \sum f_i^2 \frac{p_i^2}{w_i} - (\sum f_i p_i)^2$

We are looking to minimize the variance. Introducing a Lagrange multiplier $\lambda$ to take into to account that the w_i sum up to 1, the condition for the minimum variance can be expressed as

$\frac{\partial}{\partial w_i} \left( \sum f_i^2 \frac{p_i^2}{w_i} - (\sum f_i p_i)^2 + \lambda (\sum w_i -1 ) \right) = 0$

Taking partial derivatives of the variance with respect to the weights w_i  and solving for the weights

$w_i = \frac{f_i p_i}{ \lambda^{1/2} }$

Summing over the weights leads to

$\sum w_i = 1 = \lambda^{-1/2} \sum { f_i p_i }$

$\lambda = \left( \sum { f_i p_i } \right)^2$

and finally

$w_i = \frac{f_i p_i}{ \sum { f_i p_i } }$

We find that the optimum choice of simulation measure is the real measure modified by the observable f.

Substituting this choice of weights back into the expression for the variance

$Var_w(f(x)) = \sum f_i^2 \frac{p_i^2}{w_i} - (\sum f_i p_i)^2$
$= \sum \frac{f_i^2 p_i^2}{f_i p_i} \sum { f_i p_i } - (\sum f_i p_i)^2$
$= \sum {f_i p_i} \sum { f_i p_i } - (\sum f_i p_i)^2 = 0$

we find it reduces it to zero.

Let’s go back to the problem of estimating the wining probability. Here the winning indicator is only 1 for odd steps, meaning it would be ideal to not simulate paths with even number of steps, and normalize the weight according to the missing probability of even paths. In our case we want to estimate simultaneously the stopping time which needs also the even paths. As an compromise we use weights according to f=1 in order to cover most of the probability space.

% simulating path length
% number of simulations m
m = 100000;
n_bins_total = 25;
w = (25/36).^((1:n_bins_total )-1)/sum((25/36).^((1:n_bins_total )-1));
cw = cumsum(w);
stopping_time = zeros(1,m);
weights_biased = zeros(1,m);
weights_fair = zeros(1,m);

for i = 1:m
number_of_tails = min((1:n_bins_total)(cw-rand(1)>0))-1;
stopping_time(i)=number_of_tails+1;
end

t_stop_fair = mean(stopping_time.*weights_fair);
t_stop_biased = mean(stopping_time.*weights_biased);
pwin_fair = sum(weights_fair(mod(stopping_time,2)==1)) /length(stopping_time);
pwin_biased = sum(weights_biased(mod(stopping_time,2)==1)) /length(stopping_time);

fprintf( "Stopping time for fair coin flip: simulated: %f, calculated %f   \n" , t_stop_fair, 2);
fprintf( "Stopping time for biased coin:    simulated: %f, calculated %f   \n", t_stop_biased, 36/11);
fprintf( "First player win for fair coin:   simulated: %f, calculated %f   \n", pwin_fair,   1/(2-18/36));
fprintf( "First player win for biased coin: simulated: %f, calculated %f   \n", pwin_biased,   1/(2-11/36));


Stopping time for fair coin flip: simulated: 2.000213, calculated 2.000000
Stopping time for biased coin: simulated: 3.274630, calculated 3.272727
First player win for fair coin: simulated: 0.666232, calculated 0.666667
First player win for biased coin: simulated: 0.590285, calculated 0.590164

Starting from a simple flipping coin game we investigated theoretical and numerical solution techniques to compute expectations. This post did not discuss other popular variance reduction methods like the use of antithetic paths or control variates, which could be a topic of future writings.