Simulation and resampling

In financial applications one frequently comes across the need to draw samples according to an assumed distribution. This could be because one wants to simulate stock prices for a Monte Carlo simulation, to price an option payout, or to generate risk scenarios. Applications also arise in the context of machine learning and particle filters.

To illustrate this let’s look at a simple example. Say we assume a 30% drop with a probability of 20%, a 20% gain with a probability of 30%, and no change with a probability of 50%.

In ‘R’, this can be expressed as a pair of vectors:

returns <- c(-0.3,0.0,0.2)
probabilities <- c(0.2,0.5,0.3)

Next we normalize the probabilities so they sum to 1 and work out the cumulative distribution, which in this example is the vector cw = [0.2, 0.7, 1.0]:

probabilities <- probabilities/sum(probabilities)
cw <- cumsum(probabilities)      # cumulative distribution

Finally, to generate Nsimulations = 100,000 samples, we draw a uniform random number between 0 and 1 using the function ‘runif’, and use ‘which’ to look up the return at the first element of the cumulative distribution cw that exceeds the uniform draw.

Nsimulations <- 100000
# for each uniform draw, pick the first bucket whose cumulative weight exceeds it
simulated_returns <- sapply(runif(Nsimulations,0,1), function(x) (returns[min(which(cw-x>0))]))
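
As an aside (not part of the original snippet), base R can do this lookup directly: ‘sample’ accepts a prob argument, and ‘findInterval’ performs the same cumulative-distribution lookup as a vectorized binary search. A minimal sketch of both equivalents; simulated_returns2 and simulated_returns3 are names of my choosing:

# equivalent draw using base R's built-in sampler
simulated_returns2 <- sample(returns, Nsimulations, replace=TRUE, prob=probabilities)
# equivalent vectorized inverse-CDF lookup: findInterval binary-searches cw
simulated_returns3 <- returns[findInterval(runif(Nsimulations), cw) + 1]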

We can check the output by plotting a histogram to verify that the frequency of the simulated returns matches our targeted probabilities, and by comparing the mean and variance of the simulation to their theoretical expectations.

hist(simulated_returns)

theoretical_mean <- sum(returns*probabilities)
theoretical_var <- sum(returns*returns*probabilities) - theoretical_mean^2

simulated_mean <- mean(simulated_returns)
simulated_var <- var(simulated_returns)

sprintf("simulated vs. theoretical")
sprintf("mean: %2.5f %2.5f",simulated_mean, theoretical_mean)
sprintf("var: %2.5f %2.5f",simulated_var, theoretical_var)

The output:

simulated vs. theoretical
mean: 0.00022 0.00000
var: 0.02983 0.03000

An interesting alternative for sampling the distribution is the implementation below, which follows the resampling step in Sebastian Thrun‘s stochastic particle filter. It also effectively samples off the cumulative distribution, but differently: the first algorithm drew uniform numbers over the whole range between 0 and 1 and looked up the corresponding return anywhere in the full range of the distribution function. The algorithm below instead picks a random starting point in the distribution and then takes twice the maximum of the probability buckets as a step size, picking the next sample from that neighborhood. For very large data sets the advantage is that less time is spent seeking over the full distribution function to pick each next sample.

Nsimulations <- 100000
returns <- c(-0.3,0.0,0.2)
probabilities <- c(0.2,0.5,0.3)
probabilities <- probabilities/sum(probabilities)
simulated_returns <- numeric(Nsimulations)
max_probability <- max(probabilities)
L <- length(probabilities)
i <- floor(runif(1,1,L+1))                # random starting bucket
for( j in 1:Nsimulations)
{
  b <- runif(1,0,1+2*max_probability)     # random step along the wheel
  while(probabilities[i] < b)
  {
    b <- b - probabilities[i]
    i <- (i %% L) + 1                     # advance to the next bucket, wrapping around
  }
  simulated_returns[j] <- returns[i]
}
hist(simulated_returns)

Using this algorithm we get similar results:
simulated vs. theoretical
mean: -0.00006 0.00000
var: 0.03006 0.03000
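
Since the post mentions particle filters, a natural use of this routine is resampling particles by weight. Below is a minimal sketch of the same wheel wrapped as a reusable function; ‘resample_wheel’ and the particle variables are hypothetical names of my choosing, not from the original:

# hypothetical helper: draw n indices from 1..length(weights) proportional to weight
resample_wheel <- function(weights, n)
{
  weights <- weights/sum(weights)
  L <- length(weights)
  idx <- integer(n)
  i <- floor(runif(1,1,L+1))              # random starting bucket
  for( j in 1:n)
  {
    b <- runif(1,0,1+2*max(weights))      # random step along the wheel
    while(weights[i] < b)
    {
      b <- b - weights[i]
      i <- (i %% L) + 1
    }
    idx[j] <- i
  }
  idx
}

# e.g. resample a particle set according to its weights (hypothetical variables):
# new_particles <- particles[resample_wheel(particle_weights, length(particle_weights))]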
