Portfolio Selection and Optimization

Let’s look at the task of selecting a portfolio of stocks that optimizes a particular measure of performance. In the structured product setting, one might want to compose a portfolio to be used as a reference index for a derivative, with the objective that the index has a particularly high or low correlation structure, since the overall correlation affects the volatility of the final product. Similarly, an individual investor might like to select a number of stocks in order to compose a portfolio that historically had a low volatility and a high growth rate. Yet another objective could be to find a portfolio with a specific correlation behavior with respect to a particular target.

To show a particular example, let’s try to select a portfolio of four assets out of a larger universe of around 5,000 NYSE and NASDAQ names that would have had the highest possible Sharpe ratio in 2011. The Sharpe ratio, as used here, is the ratio of the average daily return to the standard deviation of the daily returns (annualized in the code by a factor of sqrt(252)); it is a widely used performance measure.
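As a minimal sketch of the measure just described, the snippet below computes an annualized Sharpe ratio from a vector of daily returns in base R. The function name, the price path and the 252-day annualization factor are assumptions for illustration; no risk-free rate is subtracted.

```r
# Annualized Sharpe ratio from a vector of daily simple returns.
# Assumptions: no risk-free adjustment, 252 trading days per year.
sharpe_annualized <- function(daily_returns, periods_per_year = 252) {
  sqrt(periods_per_year) * mean(daily_returns) / sd(daily_returns)
}

# Hypothetical price path, for illustration only
prices  <- c(100, 101, 100.5, 102, 103, 102.5, 104)
returns <- prices[-1] / prices[-length(prices)] - 1

sharpe_annualized(returns)
```

With real data the returns would come from adjusted closing prices rather than a hand-written vector.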

The attached R code runs the problem against a small universe of DJI stocks, so as not to require downloading a large amount of market data. For this exercise I don’t correct the stock growth for the risk-free rate, and I consider long-only stock combinations.

For this task one must first be able to calculate the relative weights that optimize the Sharpe ratio for a given set of four stocks. The example code uses the optimizer solnp from the R package Rsolnp, instead of a specialized function from a financial package, in order to retain the flexibility of optimizing other non-standard measures.
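A quick way to sanity-check an optimizer on this problem is the textbook closed form for the maximum Sharpe weights without the long-only constraint, which are proportional to the inverse covariance matrix times the mean vector. This is not the solnp approach used in this post, only a cross-check that applies when all resulting weights come out positive. The values of mu and V below are made up for illustration.

```r
# Hypothetical daily mean returns and covariance matrix for four assets
mu <- c(0.0008, 0.0005, 0.0003, 0.0006)
V  <- diag(c(0.02, 0.015, 0.01, 0.018)^2)
V[1, 2] <- V[2, 1] <- 0.5 * 0.02 * 0.015   # correlation of 0.5 between assets 1 and 2

# Annualized Sharpe ratio of a weight vector w
sharpe <- function(w) sqrt(252) * sum(w * mu) / sqrt(drop(t(w) %*% V %*% w))

# Unconstrained maximum Sharpe weights: proportional to V^{-1} mu,
# rescaled so that they sum to one
w_raw <- solve(V, mu)
w_tan <- w_raw / sum(w_raw)

w_tan
c(closed_form = sharpe(w_tan), equal_weight = sharpe(rep(1/4, 4)))
```

If any closed-form weight were negative, the long-only solution would sit on the boundary and the two answers would no longer agree.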

There are a number of approaches that could be used for stock selection. The brute force method would be to iterate over all possible combinations, which quickly becomes a very large problem to solve with limited computational resources. For example, checking all four-name portfolios out of a universe of N stocks would require evaluating N*(N-1)*(N-2)*(N-3)/24 combinations.
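To get a feeling for the size of the brute force problem, the count can be evaluated directly in R with the built-in choose(). The helper name n_portfolios is only illustrative.

```r
# Number of distinct k-name portfolios in a universe of N stocks
n_portfolios <- function(N, k = 4) choose(N, k)

n_portfolios(30)     # 30 DJI names: 27405 combinations
n_portfolios(5000)   # universe of about 5,000 names: roughly 2.6e13 combinations
```

Each combination would also need its own weight optimization, which is why the exhaustive scan is impractical for the large universe.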

At the other end, one can use a bootstrapping method that starts by selecting the stock with the single best Sharpe ratio. Finding this stock requires N computations. Next, checking all two-stock portfolios that are composed of the best stock and each remaining name from the universe adds N-1 calculations. Repeating this sequence of adding a single name at a time leads first to a three-name and finally to a four-name portfolio after a total of N+(N-1)+(N-2)+(N-3) = 4*N-6 steps. Having arrived at an already half-way optimized four-name portfolio, one can then try to improve it further by rotating out individual names against the universe of stocks.
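The step-by-step selection can be sketched in a few lines of base R. To keep the example self-contained it uses simulated returns and scores each candidate by the Sharpe ratio of the equal-weighted portfolio; both are simplifications, and the function names are hypothetical rather than part of the code further down.

```r
set.seed(1)
N <- 10
# Simulated daily returns for N hypothetical tickers S1..SN
rets <- matrix(rnorm(250 * N, mean = 0.0005, sd = 0.01), ncol = N,
               dimnames = list(NULL, paste0("S", 1:N)))

# Score: annualized Sharpe ratio of the equal-weighted portfolio
score <- function(names) {
  p <- rowMeans(rets[, names, drop = FALSE])
  sqrt(252) * mean(p) / sd(p)
}

# Add one name at a time, always keeping the best addition so far
greedy_select <- function(universe, k = 4) {
  chosen <- character(0)
  n_eval <- 0
  for (step in seq_len(k)) {
    candidates <- setdiff(universe, chosen)
    scores <- sapply(candidates, function(s) score(c(chosen, s)))
    n_eval <- n_eval + length(candidates)
    chosen <- c(chosen, candidates[which.max(scores)])
  }
  list(portfolio = chosen, evaluations = n_eval)
}

res <- greedy_select(colnames(rets))
res$portfolio
res$evaluations   # N + (N-1) + (N-2) + (N-3) = 4*N - 6
```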

A systematic variation of the trial-and-error bootstrapping leads to the tree-based search method illustrated in the following.

The picture below, for the universe of 30 DJI stocks, starts off at the bottom node with the portfolio of the stocks “JPM”, “HPQ”, “AA”, “BAC”, which has a low Sharpe ratio of -0.42. From this the tree search generates four sub-portfolios, (“HPQ”, “AA”, “BAC”), (“JPM”, “AA”, “BAC”), (“JPM”, “HPQ”, “BAC”) and (“JPM”, “HPQ”, “AA”), by removing one member each. Then it searches linearly through the universe of stocks for the best name to add to each sub-portfolio in order to arrive at a new set of four-name portfolios. The one with the best Sharpe ratio is then selected. In the graph below this was the combination (“AA”, “BAC”, “HPQ”, “MCD”). It becomes the root node of the new search tree branch. Eventually the tree search ends when no new combination with a higher Sharpe ratio can be found along the deepest search path.

Image
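The remove-one, add-one step of the tree search can be sketched on simulated data as follows. This is a stripped-down illustration: it scores portfolios at equal weights instead of optimizing the weights, and it leaves out the bookkeeping of visited sub-portfolios and the tree plotting. All names are hypothetical.

```r
set.seed(2)
N <- 12
rets <- matrix(rnorm(250 * N, mean = 0.0005, sd = 0.01), ncol = N,
               dimnames = list(NULL, paste0("S", 1:N)))

# Score: annualized Sharpe ratio of the equal-weighted portfolio
score <- function(names) {
  p <- rowMeans(rets[, names, drop = FALSE])
  sqrt(252) * mean(p) / sd(p)
}

# One search step: drop each member in turn, try every name from the
# universe as a replacement, and keep the best strictly improving candidate.
search_step <- function(portfolio, universe) {
  best <- list(portfolio = portfolio, score = score(portfolio))
  for (i in seq_along(portfolio)) {
    sub <- portfolio[-i]
    for (s in setdiff(universe, sub)) {
      cand <- sort(c(sub, s))
      sc <- score(cand)
      if (sc > best$score) best <- list(portfolio = cand, score = sc)
    }
  }
  best
}

portfolio   <- sort(colnames(rets)[1:4])
start_score <- score(portfolio)
repeat {
  nxt <- search_step(portfolio, colnames(rets))
  if (identical(nxt$portfolio, portfolio)) break   # no improvement: stop
  portfolio <- nxt$portfolio
}
portfolio
c(start = start_score, final = score(portfolio))
```

Because the score must strictly increase at every step, the loop cannot cycle and ends at a portfolio that no single swap can improve.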

The following shows a similar search tree, now using the full universe of around 5,000 NYSE and NASDAQ stocks for which we had sufficient historical data without missing dates.

Image

The graphs below depict the historical cumulative performance of the optimized portfolio against the performance of the initial portfolio. Note that the search among the 5000 stock universe yielded a growth portfolio with much smaller variance than just selecting among the 30 DJI stocks.

Further optimization approaches are Monte Carlo-like tactics of selecting random starting points for multiple searches. Other extensions are genetic algorithms that create new trial portfolios out of the genetic crossings of two parent portfolios. Overall, the tree approach here seemed to be robust against varying the selection of the starting root portfolio, and it converged after a few search nodes.
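Both ideas need only small building blocks. The sketch below shows one possible way to draw a random starting portfolio and to cross two parent portfolios; the ticker names are placeholders, and the crossover rule (sampling from the parents' combined names) is just one of many conceivable choices.

```r
set.seed(4)
universe <- paste0("S", 1:30)   # hypothetical tickers

# Monte Carlo style restart: a random four-name starting portfolio
random_start <- function(universe, k = 4) sort(sample(universe, k))

# One possible crossover: draw k distinct names from the parents' combined names
crossover <- function(parent_a, parent_b, k = 4) {
  pool <- union(parent_a, parent_b)
  sort(sample(pool, k))
}

a <- random_start(universe)
b <- random_start(universe)
child <- crossover(a, b)
child
```

Each random start or child portfolio could then be handed to the tree search as a new root.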

This post is mainly to demonstrate the optimization technique rather than to advertise a particular portfolio selection. In reality a portfolio that performed particularly well in one year is not guaranteed to have similar characteristics in the following year. A more complete study would include back-testing and cross-validation of the selected portfolios against data that was not used during optimization.
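A very reduced form of such a check is to select on one period and evaluate on the next. The sketch below does this on simulated returns, picking the four names with the best in-sample Sharpe ratios as a stand-in for the tree search and holding them at equal weights. The data and names are made up.

```r
set.seed(3)
rets <- matrix(rnorm(500 * 8, mean = 0.0004, sd = 0.01), ncol = 8,
               dimnames = list(NULL, paste0("S", 1:8)))
in_sample  <- rets[1:250, ]     # period used for the selection
out_sample <- rets[251:500, ]   # held-out period

sharpe <- function(x) sqrt(252) * mean(x) / sd(x)

# Pick the four names with the best in-sample Sharpe ratios
picked <- names(sort(apply(in_sample, 2, sharpe), decreasing = TRUE))[1:4]

# Compare the equal-weighted portfolio on both periods
c(in_sample  = sharpe(rowMeans(in_sample[, picked])),
  out_sample = sharpe(rowMeans(out_sample[, picked])))
```

Since the simulated columns are all drawn from the same distribution, any in-sample edge here is selection bias, which is exactly what an out-of-sample comparison is meant to expose.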

To run the tree search, first create the following functions in R by copying and pasting:

require(tawny)
require(quantmod)
require(ggplot2)
require(ape)
#################################
# load list of symbols via getSymbols()
# enclose fetching symbols within try clause in case of faulty/non-existing tickers
# disregard symbols without a complete history
# return list of successfully loaded names
#################################
symbol.exists <- function(stk)
{
     tryCatch( typeof(get(stk,env=global.env))=="double", error = function(e) FALSE )
}
symbols.load <- function(indx.master)
{
  indx <- c() # list of loaded tickers
  for(stk in indx.master)
  {
     cc <- 0
     # try each ticker at most twice
     while( symbol.exists(stk) == FALSE && cc < 2 )
     {
       print(stk)
       cc <- cc +1
       tryCatch( getSymbols(stk, from="2011-01-03", to="2011-12-30",env=global.env), error = function(e) Sys.sleep(1) )
       # look the symbol up in global.env, where getSymbols() stored it
       if(exists(stk, envir=global.env)==TRUE){
       if(NROW(get(stk,env=global.env)['2011-01-03::2011-12-30'])==252){
         indx<-c(indx,stk)  }}
      }
   }
  invisible(indx) # return the tickers with a complete 2011 history
}

#########################
# check master symbol list
# return clean list of symbols that are successfully loaded
#########################
symbols.clean <- function(indx.master)
{
  indx <- c()
  for(stk in indx.master){ if(symbol.exists(stk)==TRUE) {
      if(NROW(get(stk,env=global.env)['2011-01-03::2011-12-30'])==252){
            indx<-c(indx,stk) }}}
  indx
}

##################
# compute list of Sharpe ratios for each name in the indx list
##################
sharp.ratios <-function(indx)
{
 results<-matrix(nrow=length(indx),ncol=4)
 stks.dailyret.ts <- xts()
 cc <-1
 for(stk in indx){
   stk.ts <- Ad(get(stk,env=global.env)['2011-01-03::2011-12-30'])
   stk.cumret.ts <- stk.ts/coredata(stk.ts[1])[1]
   stk.dailyret.ts <- tail(stk.cumret.ts/lag(stk.cumret.ts,k=1)-1,-1)
   stks.dailyret.ts <- merge(stks.dailyret.ts, stk.dailyret.ts)
   stk.mean <- mean(stk.dailyret.ts)
   stk.var <- var(stk.dailyret.ts)
   stk.sharp <- sqrt(252)*stk.mean/sqrt(stk.var)
   results[cc,] <-  c(stk, stk.mean, sqrt(stk.var), stk.sharp)
  cc <- cc+1
  }
 results
}


######################
# compute Sharpe ratio and optimum weight combination for names in stks
######################
require(Rsolnp)
sharp.ratio <- function(stks)
{
  stks.dailyret.ts <- xts()
  for(stk in stks ){
   stk.ts <- Ad(get(stk,global.env)['2011-01-03::2011-12-30'])
   stk.cumret.ts <- stk.ts/coredata(stk.ts[1])[1]
   stk.dailyret.ts <- tail(stk.cumret.ts/lag(stk.cumret.ts,k=1)-1,-1)
   stks.dailyret.ts <- merge(stks.dailyret.ts, stk.dailyret.ts)
 }
 dcount <- sqrt(252)
 mu<-colMeans(stks.dailyret.ts)
 V <- cov(stks.dailyret.ts)
  res <- solnp(
  rep(1/length(mu), length(mu)),
  function(w) - dcount*t(w) %*% mu / sqrt( t(w) %*% V %*% w ),
  eqfun = function(w) sum(w),
  eqB   = 1,
  LB = rep(0, length(mu))
)
  res$mu <-  t(res$pars) %*% mu
  res$std <-  sqrt( t(res$pars) %*% V %*% res$pars )
  res$sharp <- -tail( res$values,1)
  res
}

sharp.simple.ratio <- function(stks)
{
 stks.cumret.ts <- xts()
 for(stk in stks ){
  stk.ts <- Ad(get(stk,global.env)['2011-01-03::2011-12-30'])
  stk.cumret.ts <- stk.ts/coredata(stk.ts[1])[1]
  stks.cumret.ts <- merge(stks.cumret.ts, stk.cumret.ts)
  }
dcount <- sqrt(252)
res <- solnp(
  rep(1/length(stks), length(stks)),
  function(w) {
   stks.dailyret.ts <-  stks.cumret.ts %*% w
   n <- length(stks.dailyret.ts)
   stks.dailyret.ts <- stks.dailyret.ts[2:n]/stks.dailyret.ts[1:(n-1)]-1
   - dcount*  mean(stks.dailyret.ts)/  sqrt(var(stks.dailyret.ts))
   },
  eqfun = function(w) sum(w),
  eqB   = 1,
   LB = rep(0, length(stks))
)
stks.dailyret.ts <-  stks.cumret.ts %*% res$pars
n <- length(stks.dailyret.ts)
stks.dailyret.ts <- stks.dailyret.ts[2:n]/stks.dailyret.ts[1:(n-1)]-1
res$mu <- mean(stks.dailyret.ts)
res$std <-  sqrt(var(stks.dailyret.ts))
res$sharp <- -tail( res$values,1)
res
}


# scan universe for best stk to add to portfolio
portfolio.add <- function(portfolio, universe)
{
  n<-NROW(universe)
  # check how many of the portfolio stocks are also part of the universe
  noverlap <- sum(sapply(1:length(portfolio), function(i) any(portfolio[i]==universe)))
  # allocate result matrix
  results<-matrix(nrow=n-noverlap,ncol=length(portfolio)+4)
  cc <- 1
  for (i in 1:n)   if(any(portfolio==universe[i])==FALSE)
  {
   stks<-c(portfolio,universe[i])
   #sr<-sharp.simple.ratio(stks)
   sr<-sharp.ratio(stks)
   res <- c(sprintf("%s",stks), sr$mu, sr$std, sr$sharp )
   results[cc,] = res
   cc <- cc+1
 }
results.ordered <- results[order(as.numeric(results[,ncol(results)]),decreasing=TRUE),]
results.ordered
}
node.id <- function(portfolio)
{
  ratio <- sprintf("%2.2f",as.numeric(sharp.ratio(portfolio)$sharp))
  paste(portfolio[1],portfolio[2],portfolio[3],portfolio[4],ratio,sep="_")
}

portfolio.visited <- function(portfolio, visited)
{
  any(portfolio[1]==visited[,1] & portfolio[2]==visited[,2] & portfolio[3]==visited[,3] )
}
portfolio.visited4 <- function(portfolio, visited)
{
 any(portfolio[1]==visited[,1] & portfolio[2]==visited[,2] & portfolio[3]==visited[,3] & portfolio[4]==visited[,4] )
}

and then execute the following:

global.env <- new.env()
indx.master <- getIndexComposition('^DJI')
symbols.load(indx.master)
universe <- symbols.clean(indx.master)
portfolio <- c("JPM", "HPQ", "AA", "BAC")

mytree <- (read.tree(text = paste("(:1,:1,:1,",node.id(portfolio),":1):1;")))
mytree$node.label <- "ROOT"

visited <- c()
sharp.max <- -999999
search.results <- c()
cont <- TRUE
while(cont == TRUE)
{
    # create the four sub-portfolios with 3 names each
    portfolios <- list( c(portfolio[2],portfolio[3],portfolio[4]),
                         c(portfolio[1],portfolio[3],portfolio[4]),
                         c(portfolio[1],portfolio[2],portfolio[4]),
                         c(portfolio[1],portfolio[2],portfolio[3]) )
    # order each sub-portfolio alphabetically
    portfolios <- lapply(1:4, function(i) {
           portfolios[[i]] [ order( portfolios[[i]]     ) ]  }  )

i.portfolio.best <- 0
my.portfolio.best <- c()
mytree.txt <- "("
visited4 <- c()
for(i in 1:4)
{
        if(portfolio.visited(portfolios[[i]],visited)==FALSE)
        {
         visited <-  rbind(visited,portfolios[[i]])
         my.results  <- portfolio.add(portfolios[[i]], universe)    
         my.results.best <- my.results[1,1:7]
         my.sharp <- as.numeric(my.results[1,7])  # results matrix is character; compare numerically
         search.results <- rbind( search.results , my.results.best)
         
         # check if we got this node already
         my.results.best.ordered <-  my.results.best[1:4] [ order( my.results.best[1:4]  ) ]
         if(portfolio.visited(my.results.best.ordered,visited4)==FALSE){
           visited4 <-  rbind(visited4, my.results.best.ordered)
           if(i>1) { mytree.txt <- paste(mytree.txt,",") }
           mytree.txt <- paste(mytree.txt, node.id(my.results.best[1:4]),":1")

           if(my.sharp > sharp.max){
              sharp.max <- my.sharp
              i.portfolio.best <- i
              my.portfolio.best <- my.results[1,1:4]}
          }
       }
}

mytree.txt <- paste(mytree.txt,"):1;")
if( i.portfolio.best > 0 )
{
    mytree.sub <-  read.tree(text = mytree.txt)
    mytree.sub$node.label <- node.id( portfolio)
    mytree <- bind.tree(mytree, mytree.sub,
    where=which(mytree$tip.label == node.id(portfolio) ), position=0)

    plot(mytree,cex=1,srt=0,direction="u")
    nodelabels(mytree$node.label,frame="r",cex=1,srt=0)

    portfolio <- my.portfolio.best
    print(portfolio)
    flush.console()
}

if(i.portfolio.best == 0 )
{
    cont <- FALSE
    print(search.results)
    flush.console()
}

}
mytree.drop <- drop.tip(mytree,1:2)
plot(mytree.drop,cex=1,srt=0,direction="u",root.edge=TRUE)
nodelabels(mytree.drop$node.label,frame="r",cex=1,srt=0)