Coin Flipping, Stopping Time, Monte Carlo Simulation and Importance Weighting


This post discusses a classic coin flipping puzzler and explores Monte Carlo simulation techniques.

Consider a game of two players taking turns flipping a coin. The first player that flips a head wins. The question is what is the probability of winning the game for each player, and what is the expected number of turns the game goes on until someone wins.

To make the problem more general we assume the coin is not necessary fair, meaning the probability of getting a head is not equal to the probability of getting a tail.

Needless to say this type of problem has applications beyond the simple coin setup, for example we might be interested at a stopping time, how long it takes for a stock to cross a certain price level, or looking to price a knock-out option.

One solution to this problem is to add up the probabilities of each scenario to calculate the overall probabilities. Another arguably the more elegant way is to use a symmetry to get to the answer quicker. Let’s do that first.

Stopping time

Assume the probability of a head to be p, and the game has been going on for a number of steps. We are at time step i, the game has not yet ended and we are about to flip the coin again. We like to compute the expected number of time steps for the game to end.

Let’s define E(t|i) as the expected number of additional time steps for the game to take, not counting the past turns, provided we already got to the time step i.

The expected number of additional steps is composed out of two terms:

  • First is the case of throwing a win (head). The probability of this is p. The game ends at that point. In this case just one step concludes the game. The contribution of this case is the product of probability p multiplied by a single step 1.
  • Second is the case of a tail with the probability (1-p) at the next step. The game does not stop there, but is expected to continue for a yet unknown number of E(t|i+1) steps from there on. The contribution of this case is the product of the probability (1-p) and the sum of the number 1 for the single step just taken and the expected number of future steps E(t|i+1).

E(t|i) = p*1 + (1-p)*(1+E(t|i+1))

In the limit of p=100%, a certain head, the expected number of steps is just one step. In the other limit p=0%, the game just goes on, and the expectation at time i is just the expectation at the next step plus one, with the limit of a never ending game and infinite stopping time.

The trick of solving the equation above is to recognize the translational or time symmetry in the game. The expectation E(t|i) being defined as the number of expected future steps does not actually depend on the time step i. As long as the game has not ended the chances for the outcome of the game is exactly the same for each player on his turn. With this we can set

E(t) = E(t|i) = E(i|i+1)

and find the solution of E(t) = p+(1-p)*(1+E(t)) to be

E(t) = 1/p

which is the expectation of a geometric distribution.

A game with a fair coin p=50% is expected to last an average of 2 steps. This average is composed out of the chance of the first player winning  at the first flip with a 50% chance, the first player loosing and the second player winning at the second flip with a 25% chance, and so on

2 = 1*50%+2*25%+3*12.5%+4*6.25% …

Winning probability

The next question is the probability for the first player to win the game.  Let’s start by looking at the probability or expectation E[loose] of a player to loose, assuming it is his or hers turn. It is compose out of the chance to not win at the current step by throwing a tail with probability (1-p), and for the opposing player to win on his turn at he next step:

E[loose|player-1, step 1]=(1-p)*E[win|player-2, step 2]

However, the sum of probabilities of loosing and winning following your turn turn adds up to one, so that

E[win|player-1, step 1] = 1-E[loose|player-1, step 1]

Again we use translational symmetry by equating the expectation of player-1 to win on step 1 to the expectation of player-2 to win the game on his turn on step 2, with the condition that the game has progressed to his turn:

E[win|player-2, step 2] = E[win|player-1, step 1]

In other words once its your turn and you have not lost, your chances of winning are just the same as for player-1 one day one. Every day is fresh in this game and brings a equal chance of winning!

With this we get

1-E[win] = (1-p)E[win]

or

E[win] = 1/(2-p)

In case of a fair coin with p=50% the chances of winning are about  67%. Again this can be expressed of a series of probabilities for winning at the first step with 50% chance, winning at the third step after successive losses by both players with a 12.5% chance, and so on

2/3 = 50%+12.5%+3.125%+…

Geometric series solution

This leads to yet another way to solve this puzzles by series expansion. The total expected stopping time is a sum of time steps multiplied by products of successive tails with probability q=1-p followed by a single head p

E(t) = 1p+2qp+3q^2p+4q^3p+...

E(t) = p(1+2q+3q^2+4q^3+...)

The above series has just the structure of a derivative in respect to q, so that we can write it as

E(t) = p \frac{d}{d q}(q+q^2+q^3+q^4+...)

Identifying a geometric sum by adding in the missing 1

E(t) = p \frac{d}{d q}(-1+ \frac{d}{d q} \sum_{i=0}^{\infty} q^i )

which has a well known formula

E(t) = p \frac{d}{d q}(-1+\frac{1}{1-q})

Taking the derivative leads to the result for the expected stopping time

E(t) = p \frac{1}{(1-q)^2} = 1/p

Similarly the can also compute the winning probability by summing up the probability series. The probability to win is the sum over odd time steps composed out a single head and even number of tails

E(win) = p + q^2p+q^4p+q^6p+...

Again identifying a geometric sum in terms of the squared probability of a tail, we get the result

E(win) = p \sum_{i=0}^{\infty}q^2i = p/(1-q^2)=1/(2-p)

Monte Carlo simulation and convergence

In Monte Carlo simulation one does attempt to compute expectations such as the stopping time or winning probability by summing over scenarios. It might seem redundant to apply simulations to this simple coin game since we already computed the solution in closed form, but one could imagine variations of the game that do not lead to an as simple theoretical solution. Maybe the payout to the winner is made dependent on the number of steps taken before the game ends. A financial example could be a structured equity product that stops paying coupons upon a reference index crossing a barrier. Here, the greater the stopping time the greater the total amount of coupons collected.

Let’s compare the case of an biased and fair coin, with the biased head probability of p=11/36, and the fair head probability p_0 = 1/2. The biased coin example is taken as a proxy to an equivalent bet of winning a dice game consisting of two dices with getting at least one six being the winning play. The following Matlab code generates a graph that depicts the expected stopping time for the fair and biased coin.

 plot(
1:25,36/11+0*(1:25),
1:25,11/36*cumsum((1:25).*(25/36).^((1:25)-1)),
1:25,2+0*(1:25),
1:25,0.5*cumsum((1:25).*0.5.^((1:25)-1))
)
title("Theoretical and approximated stopping times vs. number of steps included in estimation.")
xlabel("Number of  steps in summation")
ylabel("Expected number of flips (stopping time)")
axis([1 25 0 3.5])

It compares the exact theoretical expectation to the estimated probability of including an increasing number of steps into the summation of probabilities using the series expansion computed above.

As the graph shows we need to consider about 10 steps in order to recover the maturity of the expectation for the fair coin, while the biased coin requires about 25 steps. The biased coin in our example has a smaller probability to yield a head and stop the game, thus requiring a greater number of steps to recover the total overall expectation.

The same kind of picture unveils for the expected probability for the first player to win the game. Using the following Matlab code

plot(
2*(1:13)-1,36/61+0*(2*(1:13)-1),
2*(1:13)-1,11/36*cumsum( (25/36).^(2*(1:13)-2)),
2*(1:13)-1,2/3+0*(2*(1:13)-1),
2*(1:13)-1,0.5*cumsum( (0.5).^(2*(1:13)-2))
)
title("Theoretical and approximate expectation of winning vs. number of steps included in estimation.")
xlabel("Number of  steps in summation")
ylabel("Probability of winning")
axis([1 25 0.3 0.7])

plots the theoretical and estimated expectations for the fair and biased coins

Simulating coin flips

Next we intend to estimate the stopping time and winning probabilities by Monte Carlo simulation. As straight forward approach would be to simulate individual coin flips with the appropriate probabilities for the fair or biased coin.

In the following example random imaginary coin flips are simulated by sampling a uniform distribution of a total of n_bins_total. Any sample above the number n_bins_head is counted towards a tail, otherwise they are counted towards a head. A choice of  n_bins_total=36 and n_bins_head=11 simulates the biased coin in our example, while n_bins_total=36 and n_bins_head=18 corresponds to the fair coin.

However the random Monte Carlo path generated by the choice of n_bins_head and n_bins_total does not need to follow the same distribution as the actual coin we intend to  simulate, provided we adjust the likelihood ratios accordingly. The simulation below computes expectations for the fair and the biased coins simultaneously along the same path, by importance weighting each outcome by the ratio of the fair or biased respective probabilities and the weights used to generate the path. For example if the Monte Carlo generation produces a head and we are in the process of computing the expectation over the biased coin, the resulting term is multiplied by the ratio of p_head_biased/(n_bins_head/n_bins_total)).

% number of simulations m
m = 100000;
n_bins_total = 36;
n_bins_head = 11;
w_head =n_bins_head/n_bins_total;
w_tail = 1-w_head;
p_head_biased = 11/36;
p_tail_biased = 1-p_head_biased;
p_head_fair = 0.5;
p_tail_fair = 1-p_head_fair;
stopping_time = zeros(1,m);
weights = zeros(1,m);
weights_fair = zeros(1,m);

for i = 1:m
    step = 1;
    weight_biased = 1;
    weight_fair = 1;

    while (unidrnd(n_bins_total)> n_bins_head)
          # coin flip resulted in tail -> game continues
          step = step + 1;
          weight_biased= weight_biased*p_tail_biased/w_tail;
          weight_fair = weight_fair*p_tail_fair/w_tail;
     end

     weights_biased(i) = weight_biased*p_head_biased/w_head;
     weights_fair(i) = weight_fair*p_head_fair/w_head;
     stopping_time(i)=step;
end

t_stop_fair = mean(stopping_time.*weights_fair);
t_stop_biased = mean(stopping_time.*weights_biased);
pwin_fair = sum(weights_fair(mod(stopping_time,2)==1)) /length(stopping_time);
pwin_biased = sum(weights_biased(mod(stopping_time,2)==1)) /length(stopping_time);

fprintf( “Stopping time for fair coin:      simulated: %f, calculated %f \n” , t_stop_fair, 2);
fprintf( “Stopping time for biased coin:    simulated: %f, calculated %f \n”, t_stop_biased, 36/11);
fprintf( “First player win for fair coin:   simulated: %f, calculated %f \n”, pwin_fair, 1/(2-0.5) );
fprintf( “First player win for biased coin: simulated: %f, calculated %f \n”, pwin_biased,1/(2-11/36) );

Stopping time for fair coin: simulated: 1.999439, calculated 2.000000
Stopping time for biased coin: simulated: 3.277148, calculated 3.272727
First player win for fair coin: simulated: 0.665160, calculated 0.666667
First player win for biased coin: simulated: 0.590150, calculated 0.590164

We can visualize the progress of the Monte Carlo simulation by plotting the running winning expectation versus the number of paths generated

plot( (1:length(stopping_time) )(mod(stopping_time,2)==1)  , cumsum(weights_biased(mod(stopping_time,2)==1))./(   (1:length(stopping_time) )(mod(stopping_time,2)==1)  ) , (1:length(stopping_time) )(mod(stopping_time,2)==1) , 1/(2-11/36)+0*(1:length(stopping_time) )(mod(stopping_time,2)==1) ,
(1:length(stopping_time) )(mod(stopping_time,2)==1)  , cumsum(weights_fair(mod(stopping_time,2)==1))./(   (1:length(stopping_time) )(mod(stopping_time,2)==1)  ) , (1:length(stopping_time) )(mod(stopping_time,2)==1) , 1/(2-18/36)+0*(1:length(stopping_time) )(mod(stopping_time,2)==1) )
title(’Simulated probability of winning vs. number of simulations’)
xlabel(’Number of simulations’)
ylabel(’Probability of winning’)
axis([1 m .55 .7])

The technique of keeping the random number generator measure separate from the actual coin process allows to compute the expectation of both the fair and biased coins along the same path. It also gives the flexibility to emphasize shorter or longer paths. A large n_bins_head would create shorter paths within the simulation as it would be more likely to throw a head, likewise a small n_bins_head would create more tails leading to longer paths.

A possible application to this is a situation where a Monte Carlo simulation using the actual price measure would not naturally sample important areas of the phase space. For example pricing a far out of the money option with few price paths sampling the interesting high gamma region around the strike. Similarly an option might be either too far or to close to the barrier in order for sufficient number of Monte Carlo paths to sample the knock out or reach a payout.

Simulating stopping times

We can also simulate the effective path length directly. From our initial convergence study is seems a path length of 25 should be sufficient. The following code generates path up to 25 steps with from a uniform sample space and weights each path according to its expectation as given by the actual properties of the fair or biased coins:

% simulating path length
% number of simulations m
m = 100000;
n_bins_total = 25;
w = 1/n_bins_total;
p_head_fair = 1/2;
p_tail_fair = 1-p_head_fair;
p_head_biased = 11/36;
p_tail_biased = 1-p_head_biased;
stopping_time = zeros(1,m);
weights_biased = zeros(1,m);
weights_fair = zeros(1,m);

for i = 1:m
  number_of_tails = unidrnd(n_bins_total) - 1;
  weights_biased(i) = 1/w*p_head_biased*p_tail_biased^number_of_tails;
  weights_fair(i) = 1/w*p_head_fair*(p_tail_fair)^(number_of_tails);
  stopping_time(i)=number_of_tails+1;
end

t_stop_fair = mean(stopping_time.*weights_fair);
t_stop_biased = mean(stopping_time.*weights_biased);
pwin_fair = sum(weights_fair(mod(stopping_time,2)==1)) /length(stopping_time);
pwin_biased = sum(weights_biased(mod(stopping_time,2)==1)) /length(stopping_time);

fprintf( "Stopping time for fair coin flip: simulated: %f, calculated %f   \n" , t_stop_fair, 2);
fprintf( "Stopping time for biased coin:    simulated: %f, calculated %f   \n", t_stop_biased, 36/11);
fprintf( "First player win for fair coin:   simulated: %f, calculated %f   \n", pwin_fair,   1/(2-18/36));
fprintf( "First player win for biased coin: simulated: %f, calculated %f   \n", pwin_biased,   1/(2-11/36));

Stopping time for fair coin: simulated: 1.994821, calculated: 2.000000
Stopping time for biased coin: simulated: 3.264265, calculated: 3.272727
First player win for fair coin: simulated: 0.669453, calculated: 0.666667
First player win for biased coin: simulated: 0.590975, calculated: 0.590164


Using the same number of 100,000 Monte Carlo trials the direct simulation of paths is much faster than generating individual coin flips. However as depict in the graph above we get a slower convergence and greater variance. This is possibly a consequence of uniformly sampling short and long paths, while the actual coin process visits short paths much more frequently.

Minimizing variance

This leads to the question of what is the optimal choice of the measure w we use to generate the simulation? Let’s take a step back and generally consider the expectation of a function f(x) over the sample space of a random variable x. In the actual probability measure it is the integral or sum over all possible samples of x applied to the observable f(x) and weighted by the corresponding probability p(x)

E_p[f(x)] = \sum f(x_i) p(x_i).

In Monte Carlo simulation we take the expectation over simulated paths, where the simulation is generated by a different measure w. In order to recover the identical expectation in the new measure averaging must be taken over f(x) multiplied by the likelihood ratio p(x)/w(x).

E_p[f(x)] = E_w[f(x)(p(x)/w(x)) ] = \sum f(x_i)(p(x_i)/w(x_i)) w(x_i) = \sum f(x_i) p(x_i)

The measure w represents the characteristics of the random scenario generator of the Monte Carlo implementation. In the example above it seems that choosing w to be uniform for directly generating stopping times did lead to an increased variance and poorer convergence. What choice for would be optimal? Let’s investigate what choice of w would minimize the variance of the simulation.

The variance of the expectation of f(x) in the measure w is given by

Var_w(f(x)) = E_w[(f(x)*p(x)/w(x))^2] - E_w[f(x)*p(x)/w(x)]^2
= \sum (f_i \frac{p_i}{w_i})^2 w_i - ( \sum f_i \frac{p_i}{w_i} w_i )^2
= \sum f_i^2 \frac{p_i^2}{w_i} - (\sum f_i p_i)^2

We are looking to minimize the variance. Introducing a Lagrange multiplier \lambda to take into to account that the w_i sum up to 1, the condition for the minimum variance can be expressed as

\frac{\partial}{\partial w_i} \left( \sum f_i^2 \frac{p_i^2}{w_i} - (\sum f_i p_i)^2 + \lambda (\sum w_i -1 ) \right) = 0

Taking partial derivatives of the variance with respect to the weights w_i  and solving for the weights

w_i = \frac{f_i p_i}{ \lambda^{1/2} }

Summing over the weights leads to

\sum w_i = 1 = \lambda^{-1/2} \sum { f_i p_i }

\lambda = \left( \sum { f_i p_i } \right)^2

and finally

w_i = \frac{f_i p_i}{ \sum { f_i p_i } }

We find that the optimum choice of simulation measure is the real measure modified by the observable f.

Substituting this choice of weights back into the expression for the variance

Var_w(f(x)) = \sum f_i^2 \frac{p_i^2}{w_i} - (\sum f_i p_i)^2
= \sum \frac{f_i^2 p_i^2}{f_i p_i}  \sum { f_i p_i } - (\sum f_i p_i)^2
= \sum {f_i p_i}  \sum { f_i p_i } - (\sum f_i p_i)^2 = 0

we find it reduces it to zero.

Let’s go back to the problem of estimating the wining probability. Here the winning indicator is only 1 for odd steps, meaning it would be ideal to not simulate paths with even number of steps, and normalize the weight according to the missing probability of even paths. In our case we want to estimate simultaneously the stopping time which needs also the even paths. As an compromise we use weights according to f=1 in order to cover most of the probability space.

Implementing the above leads to

% simulating path length
% number of simulations m
m = 100000;
n_bins_total = 25;
w = (25/36).^((1:n_bins_total )-1)/sum((25/36).^((1:n_bins_total )-1));
cw = cumsum(w);
p_head_fair = 1/2;
p_tail_fair = 1-p_head_fair;
p_head_biased = 11/36;
p_tail_biased = 1-p_head_biased;
stopping_time = zeros(1,m);
weights_biased = zeros(1,m);
weights_fair = zeros(1,m);

for i = 1:m
 number_of_tails = min((1:n_bins_total)(cw-rand(1)>0))-1;
 weights_biased(i) = 1/w(1+number_of_tails)*p_head_biased*p_tail_biased^number_of_tails;
 weights_fair(i) = 1/w(1+number_of_tails)*p_head_fair*(p_tail_fair)^(number_of_tails);
 stopping_time(i)=number_of_tails+1;
end

t_stop_fair = mean(stopping_time.*weights_fair);
t_stop_biased = mean(stopping_time.*weights_biased);
pwin_fair = sum(weights_fair(mod(stopping_time,2)==1)) /length(stopping_time);
pwin_biased = sum(weights_biased(mod(stopping_time,2)==1)) /length(stopping_time);

fprintf( "Stopping time for fair coin flip: simulated: %f, calculated %f   \n" , t_stop_fair, 2);
fprintf( "Stopping time for biased coin:    simulated: %f, calculated %f   \n", t_stop_biased, 36/11);
fprintf( "First player win for fair coin:   simulated: %f, calculated %f   \n", pwin_fair,   1/(2-18/36));
fprintf( "First player win for biased coin: simulated: %f, calculated %f   \n", pwin_biased,   1/(2-11/36));

Stopping time for fair coin flip: simulated: 2.000213, calculated 2.000000
Stopping time for biased coin: simulated: 3.274630, calculated 3.272727
First player win for fair coin: simulated: 0.666232, calculated 0.666667
First player win for biased coin: simulated: 0.590285, calculated 0.590164

Starting from a simple flipping coin game we investigated theoretical and numerical solution techniques to compute expectations. This post did not discuss other popular variance reduction methods like the use of antithetic paths or control variates, which could be a topic of future writings.

Posted in Uncategorized | Tagged , , , , , , | 1 Comment

Volatility and Correlation

The implied option volatility reflects the price premium an option commands. A trader’s profit and loss ‘P&L’ from hedging option positions is driven to a large extend by the actual historical volatility of the underlying assets. Thus as option premiums are driven by supply and demand they should reflect to some extend the expectation of future realized asset volatility by market participants.

Bloomberg GV function displays option implied vs. realized volatility. Below for example for the Dow Jones INDU index both realized and implied volatility measures are reasonably aligned

Another interesting aspect in the context of dispersion trading is implied correlation. Implied correlation is a measure on how index volatility compares to the volatility of a basket of the individual index components. A trader can take a position in correlation for example by selling options of the individual component stocks and buying index options. Instead of taking position in options a similar strategy can also be executed using variance swaps.

The Dow Jones index I is comprised of 30 individual stocks Si
I = \sum_{i=1}^{N}{w_i S_i}
with basket weights wi.
The price volatility of the index is determined by the aggregation of individual stock volatilities according to its correlation structure. Let’s consider two extreme cases. In the situation the movement of all basket components are aligned the pairwise correlation between assets is 1. The correlation 1 basket variance V1 is given by the squared sum of weighted asset volatilities
V_1 = (\sum_{i=1}^{N}{w_i \sigma_i})^2
In case all asset movement disperse independent of each other the pairwise correlation between assets is 0. The correlation 0 basket variance V0 is given by the sum of variances
V_0 = \sum_{i=1}^{N}{w_i^2 \sigma_i^2}
The relation of the actual index variance VI to these extreme cases is reflected by the implied correlation ρ
\rho = \frac{V_I-V_0}{V_1-V_0}
The graph below depicts the Dow Jones index implied volatility in the band between the dispersed correlation zero and extreme aligned correlation one cases.

The implied volatility is close to the upper range, indicating a high implied correlation. Plotting implied correlation directly confirms this

The graphs above are generated by downloading historical index and stock option implied volatility data into Excel using the Bloomberg specific functions,

=BDS(IndexTicker,"INDX_MWEIGHT","cols=2;rows=30")
=BDH("AA US EQUITY","PX_LAST",DATE(2010,6,30),DATE(2010,6,30)) 
=BDH("AA US EQUITY","30DAY_IMPVOL_100.0%MNY_DF",DATE(2010,6,30),DATE(2010,6,30))

for index composition, prices and implied option volatilities.

For calculation on the static data refer to the following google-docs spreadsheet, for better visibility right click the below and select ‘Open Frame in New Window’.

Finally let’s look at volatilities and correlations from raw realized asset prices

retrieved using the tawny and quantmod R packages

require(tawny)
indx <- getIndexComposition('^DJI')
for(stk in indx) { getSymbols(stk) }

widthNrBusDays <- 90
dayBasis <- 260
indx.ts <- xts()
var0.ts <- xts()
var1.ts <- xts()
divisor <- 0.13213
for(stk in indx) 
{
 	stk.ts<-Cl(get(stk)['2008-01::'])
        var.ts=apply.rolling(diff(log(stk.ts)),width=widthNrBusDays,FUN="var")*dayBasis
if(length(indx.ts) == 0)
{
        indx.ts <- stk.ts/divisor
        var0.ts <- var.ts*(stk.ts/divisor)^2
        var1.ts <- sqrt(var.ts)*stk.ts/divisor
} else {
       indx.ts = indx.ts + stk.ts/divisor
       var0.ts = var0.ts + var.ts*(stk.ts/divisor)^2
       var1.ts = var1.ts + sqrt(var.ts)*stk.ts/divisor
}
}
var0.ts = var0.ts/(indx.ts*indx.ts)
var1.ts = (var1.ts/indx.ts)^2

indx.var.ts <- apply.rolling(diff(log(indx.ts)),width=widthNrBusDays,FUN="var")*dayBasis

indx.corr.ts <- (indx.var.ts - var0.ts)/(var1.ts - var0.ts)
indx.vol.ts <-  sqrt(indx.var.ts)

par(mfrow=c(2,1))
chart_Series(indx.corr.ts, name='DJI 30 day implied correlation')
chart_Series(indx.vol.ts, name='DJI 30 day realized volatility')
Posted in Uncategorized | Tagged , , , , , , | 3 Comments

Nonlinear systems

There is a long standing debate if financial systems are truly random or contain some structure. From the study of non-linear dynamical systems and chaos one finds it is possible that even perfectly deterministic systems can appear to be random. Those systems tend to be predictable on short time scales but neighboring states quickly diverge, leaving the operator incapable of forecasting future outcomes beyond a characteristic time scale. Still, the study of these systems provides insights and technics for forecasting, state classification and control. For machine learning applications data from chaotic model systems can be a benchmark for tuning and selection of learning algorithms.

In this post I will briefly explore the ‘R’ package “tseriesChaos” and demonstrate its usage to generate chaotic sample data. In the following example mapping out the time-delay embedded signal reveals the simple structure that governs this particular system.

#load the library tseriesChaos and scatterplot3d for graphing
library(tseriesChaos)
library(scatterplot3d)
#generate and plot times series data for the Rössler system
rossler.ts <- sim.cont(rossler.syst, start=0, end=650, dt=0.1, start.x=c(0,0,0), parms=c(0.15, 0.2, 10))
plot(rossler.ts)

Represent and plot the times series as a 3 dimensional delayed vector [x(t), x(t-d), x(t-2*d)]

n = length(rossler.ts)
d = 10
xyz <- data.frame(x=rossler.ts[1:(n-2*d)],y=rossler.ts[(1+d):(n-d)],z=rossler.ts[(1+2*d):(n)])
scatterplot3d(xyz, type="l")

Taking the Poincaré section, which is to read a slice off the continuous attractor if it crosses a lower dimensional subspace. Here the z dimension is recorded whenever the x dimension crosses the value 0 from above.

pc<-xyz[1000+which(sapply( 1000:(length(xyz[,1])-1) , function(i) ( (xyz[i+1,1]<xyz[i,1]) * (xyz[i,1]*xyz[i+1,1])  ) )<0),3]

Plotting the return map reveals the law that governs the evolution of the states. In particular one can identify a fixed-point that where the 45 degree line crosses the return map. If the system were brought to this state it would always remain on it.

plot(pc[1:(length(pc)-1)],pc[2:length(pc)],pch=16,cex=.4,xlab="current state", ylab="next state",main="Return Map")
abline(0,1)

Similarly plotting states two iterations apart reveals the location of period-2 orbits.

plot(pc[1:(length(pc)-2)],pc[3:length(pc)],pch=16,cex=.4,xlab="previous state", ylab="next state",main="Return Map")
abline(0,1)

An area of research in this type of systems is to dynamically identify and control such special states. The ability of learning algorithms to recover the hidden model structure can be benchmarked against clean test data as generated in the example above and also against chaotic data with added noise. In addition, in order to avoid over-fitting (high variance) learning of real model data can also be compared against surrogate data. Here surrogate data refers to a generated random time series that shares the same frequency power spectrum and probability distribution with the original signal, but contains no deterministic structure. See also D. Prichard and J. Theiler in “Generating surrogate data for time series with several simultaneously measured variables” and J.D Farmer and others in “Testing for nonlinearity in time series: the method of surrogate data”.

Posted in Uncategorized | Tagged , , , , , , | 3 Comments

Simulation and resampling

In financial applications one frequently comes across the need to draw samples according to an assumed distribution. This could be because one wants to simulate stock prices for a Monte Carlo simulation, to price an option payout or to generate risk scenarios. Applications arise also in the context of machine learning and partial filters.

To illustrate this let’s look at a simple example. Say we assume a total 30% drop with a chance of 20%, a 20% gain with a chance of 30% and a chance of 50% for no change.

Programming this in ‘R’, this could be expressed as vectors:

returns <- c(-0.3,0.0,0.2)
probabilities <- c(0.2,0.5,0.3)

Next we normalize the probabilities to 1. and work out the cumulative distribution, which in this example works out to the vector cw = [0.2,0.7, 1.0]

probabilities <- probabilities/sum(probabilities)
cw <- cumsum(w)      # cumulative distribution

Finally to generate Nsimulations=100,000 samples, we generate a uniform random number between 0 and 1 using the function ‘runif’, and look up the corresponding returns for ‘which’  the the first element of the cumulative distribution function cw exceeds the just generated uniform number.

Nsimulations <- 100000
simulated_returns <- sapply(runif(Nsimulations,0,1), function(x) (returns[min(which(cw-x>0))]))

We can check the output by plotting an histogram to verify that the frequency of the simulated returns is according to our targeted probabilities, comparing the mean an variance of the simulation to their theoretical expectations.

simulated vs. theoretical
mean: 0.00022 0.00000
var: 0.02983 0.03000
hist(simulated_returns)
theoretical_mean <- sum(returns*probabilities)
theoretical_var <- sum(returns*returns*probabilities) - theoretical_mean^2

simulated_mean <- mean(simulated_returns)
simulated_var <- var(simulated_returns)

sprintf("simulated vs. theoretical")
sprintf("mean: %2.5f %2.5f",simulated_mean, theoretical_mean)
sprintf("var: %2.5f %2.5f",simulated_var, theoretical_var)

An interesting approach to sampling the distribution is the implementation below that follows Sebastian Thrun‘s stochastic particle filter. It effectively samples off the cumulative distribution. The first algorithm used the range between 0 and 1 for its uniform numbers to look up the corresponding return from the full range of the distribution function.  The algorithm below picks a random starting point from the distribution function and then just takes twice the maximum of the probability buckets as a step size to pick the next simulation from that neighborhood. In the case of very large data sets the advantage is there is less time spend to seek over the full distribution function to get to and to pick the next sample.

Nsimulations <- 100000
returns <- c(-0.3,0.0,0.2)
probabilities <- c(0.2,0.5,0.3)
probabilities <- probabilities/sum(probabilities)
simulated_returns <- c()
max_probability <- max(probabilities)
b <- 0.
L = length(probabilities)
i = floor(runif(1,1,L+1))
for( j in 1:Nsimulations)
{
b =  runif(1,0,1+2*max_probability)
while(probabilities[i]&lt;b)
{
b = b - probabilities[i]
i = (i ) %% L+1
}
simulated_returns <- c(simulated_returns,returns[i])
}
hist(simulated_returns)

Using this algorithm we get similar results
simulated vs. theoretical
mean: -0.00006 0.00000
var: 0.03006 0.03000

Posted in Uncategorized | Tagged , , , , , , , , | Leave a comment