Wednesday, April 30, 2014

What is a good volatility model?

While conditional volatility modeling has evolved substantially in the 30+ years since it was first documented in financial data, there is little consensus as to which volatility model is “best”.

Historical Volatility

The simplest model for volatility is simple historical volatility (HV), $$\sigma_t = \sqrt{\frac{1}{t}\sum _{j=1}^t r_j^2}.$$ Historical volatility is very easy to compute and involves only 1 parameter – the length of the window used (or none if using all available data).  Of course, historical volatility doesn’t react to news and so is unlikely to be satisfactory in practice.
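
As a minimal sketch (the function name and the assumption of a numpy array of daily returns are mine, not part of the original), the rolling version can be computed as:

```python
import numpy as np

def historical_volatility(returns, window=None):
    """Rolling historical volatility; uses all past data when window is None."""
    returns = np.asarray(returns, dtype=float)
    sigma = np.full(returns.shape, np.nan)
    for t in range(1, len(returns)):
        start = 0 if window is None else max(0, t - window)
        sigma[t] = np.sqrt(np.mean(returns[start:t] ** 2))
    return sigma
```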

Exponentially Weighted Moving Averages

EWMA volatility goes beyond HV to reflect more recent information.  The basic EWMA can be expressed in one of two forms, $$\sigma_t^2 = (1-\lambda)r_{t-1}^2 + \lambda \sigma_{t-1}^2$$ or equivalently $$\sigma_t^2 = (1-\lambda)\sum_{i=0}^{\infty} \lambda^i r_{t-1-i}^2.$$
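
A sketch of the recursive form; the default \(\lambda=0.94\) is the familiar RiskMetrics-style choice and the initialization is one simple possibility, not something dictated by the model:

```python
import numpy as np

def ewma_volatility(returns, lam=0.94):
    """EWMA volatility via sigma2_t = (1 - lam) * r_{t-1}^2 + lam * sigma2_{t-1}."""
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty(returns.shape)
    sigma2[0] = returns[0] ** 2  # simple initialization; other choices are possible
    for t in range(1, len(returns)):
        sigma2[t] = (1 - lam) * returns[t - 1] ** 2 + lam * sigma2[t - 1]
    return np.sqrt(sigma2)
```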
I’ve often found EWMA-based volatility (and covariance/correlation/beta) to be very popular choices among practitioners.  This popularity is often attributed to a few features of the EWMA model:
  • Simplicity: The entire model is summarized by 1 parameter which is easily interpretable in terms of memory of the process.
  • Robustness: The model, at least when using common values for \(\lambda\), is very unlikely to produce absurd values.  This is particularly important in practice when volatility is required for hundreds of assets.

Combining: GARCH

EWMAs are far less popular among academics, primarily since they have some undesirable statistical properties (random walk forecasts and no guarantee of non-negative conditional variances), and ARCH-type models are commonly used to address these issues.  The conditional variance in the standard GARCH model is $$\sigma_{t+1}^2 = \omega + \alpha r_{t}^2 + \beta \sigma_t^2$$ and so has 3 free parameters.  However, a GARCH model can be equivalently expressed as a convex combination of HV and EWMA where $$\sigma^2_{t+1} = (1-w) \sigma^2_{t+1,HV} + w \sigma^2_{t+1,EWMA}. $$
In this strange reformulation, \(w=\alpha+\beta\) and the EWMA smoothing parameter is \(\lambda = \beta/(\alpha+\beta)\), which is essentially the \(\beta\) in the usual GARCH specification whenever \(\alpha+\beta\) is close to one.  When I hear GARCH models dismissed while EWMA and MA models are lauded, I find this to be somewhat paradoxical.  Of course, the combination model still has 3 parameters, which may be two too many.
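
A minimal sketch of the GARCH(1,1) recursion; the parameter values below are illustrative assumptions, not estimates, and setting \(\omega=0\) with \(\alpha=1-\beta\) recovers the EWMA as a special case:

```python
import numpy as np

def garch_variance(returns, omega=0.05, alpha=0.05, beta=0.90):
    """One-step GARCH(1,1) variances: sigma2_{t+1} = omega + alpha * r_t^2 + beta * sigma2_t."""
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(returns) + 1)
    sigma2[0] = np.var(returns)  # initialize at the sample variance
    for t in range(len(returns)):
        sigma2[t + 1] = omega + alpha * returns[t] ** 2 + beta * sigma2[t]
    return sigma2
```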

Assessing Volatility Models

The standard method to assess volatility models is to evaluate the forecast using a volatility proxy such as the squared return.  The non-observability of volatility poses some problems, and so only a subset of loss functions will consistently select the best forecast in the sense that, if the true model were included, it would necessarily be selected.  The two leading examples from this class are the Mean Square Error (MSE) loss function and the QLIK loss function, the latter defined as

\[L\left(r_{t+1}^2,\hat{\sigma}^2_{t+1}\right) = \ln\left(\hat{\sigma}^2_{t+1}\right)+ \frac{r_{t+1}^2}{\hat{\sigma}^2_{t+1}}\]

The name QLIK is derived from the similarity to the (negative) Gaussian log-likelihood and its use as a quasi-likelihood in mis-specified models. 
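
As a sketch, assuming aligned arrays of realized squared returns and variance forecasts (both hypothetical inputs), the average QLIK loss is simply:

```python
import numpy as np

def qlik_loss(r2, sigma2_hat):
    """Average QLIK loss: log(sigma2_hat) + r^2 / sigma2_hat."""
    r2 = np.asarray(r2, dtype=float)
    sigma2_hat = np.asarray(sigma2_hat, dtype=float)
    return np.mean(np.log(sigma2_hat) + r2 / sigma2_hat)
```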

A New Criterion for Volatility Model Assessment

In some discussions with practitioners, I recently stumbled across a new consideration for volatility model assessment - one that is driven by the desire to avoid noise-induced trading.  Trading strategies involve both conditional mean predictions (or \(\alpha\)) as well as forecasts of the quantities required for risk management.  Volatility is almost always one component of the risk management forecast.  A simple strategy aims to maximize the return subject to a volatility limit - say 20% per year.  In this type of strategy, the role of the mean forecast is primarily to determine the direction of the position - long or short.  The volatility is used to scale the position.  As a result, a substantial amount of trading is induced by changes in volatility since the sign of the mean forecast is typically persistent.

Optimizing a volatility forecast for this scenario requires a different criterion, and one simple method to formalize this idea is to include a term that penalizes changes in the volatility forecast.  This could be measured using the absolute or squared difference in forecasts, so that a modified QLIK criterion could be constructed as 

\[QLIK + \gamma\left|\hat{\sigma}^2_{t+1}-\hat{\sigma}^2_{t}\right|\]

where \(\gamma\) is a weight used to control the sensitivity of the loss function to the smoothness penalty.
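
A sketch of the penalized criterion under the same alignment assumption as above; the default \(\gamma\) is an arbitrary illustrative choice:

```python
import numpy as np

def penalized_qlik(r2, sigma2_hat, gamma=0.1):
    """QLIK plus a penalty on period-to-period changes in the variance forecast."""
    r2 = np.asarray(r2, dtype=float)
    sigma2_hat = np.asarray(sigma2_hat, dtype=float)
    qlik = np.log(sigma2_hat[1:]) + r2[1:] / sigma2_hat[1:]
    smoothness = gamma * np.abs(np.diff(sigma2_hat))  # |sigma2_t - sigma2_{t-1}|
    return np.mean(qlik + smoothness)
```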

Monday, April 21, 2014

Scaling up vs. scaling out

Simulations have a long history in econometrics; one of the most influential was Granger and Newbold (1974), who demonstrated the dangers of spurious regression. These results are now 40 years old, and while the entire simulation can be readily replicated in less than a minute on a modern computer, it was state-of-the-art when originally conducted.
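
A minimal version of that kind of simulation (a sketch, not the original design): regress one independent random walk on another and record how often the slope t-statistic exceeds the usual 5% critical value.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_sims, rejections = 100, 1000, 0
for _ in range(n_sims):
    y = np.cumsum(rng.standard_normal(n_obs))  # two independent random walks
    x = np.cumsum(rng.standard_normal(n_obs))
    X = np.column_stack([np.ones(n_obs), x])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n_obs - 2)
    se_slope = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    rejections += abs(beta[1] / se_slope) > 1.96
print(f"Spurious rejection rate: {rejections / n_sims:.2f}")  # far above 0.05
```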

Scale Out

Scale out has traditionally been the model used to enable realistic simulation designs. Many academic institutions have clusters available for researchers, and anyone who is willing to invest some time can run their own cluster on Amazon Web Services (AWS) using the StarCluster toolkit. The scale out model allows for nearly unbounded scaling of computational resources for most simulations since most fall into the class of embarrassingly parallel problems.

However, the scale out model, while providing a highly scalable environment, comes with one important cost - researcher time. Most econometricians are not overly dependent on cluster environments, and so the incentive to learn a substantially different programming environment is low. Most programming is done in a relatively interactive manner using MATLAB, GAUSS, Ox or R. The process of moving this code to a cluster environment is non-trivial and requires porting to a batch system - where simulations are submitted to a queue - which is substantially different from the desktop environment used to develop the code.

Scale Up and a Missing Market

A simpler method to enable researchers to conduct complex simulations is to use a scale up model, which allows researchers to make use of a bigger version of their desktop. Moderately large single computers can be configured with 24 to 30 cores in a single machine, and any code that runs well on a 4 core desktop can be trivially ported to run on a 24 core machine. More importantly, it is simple to replicate the environment that was used to develop the code in the scaled up environment so that the time costs, and the risk associated with changing environments, are minimized.
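
A sketch of why the porting cost is low for embarrassingly parallel simulations: the same code scales from a 4-core desktop to a 24-core machine simply by changing the worker count. The replication function here is a placeholder, not a real study.

```python
import multiprocessing as mp
import numpy as np

def one_replication(seed):
    """Placeholder for a single simulation replication."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(1000).mean()

if __name__ == "__main__":
    n_workers = mp.cpu_count()  # 4 on a desktop, 24+ on a scaled-up machine
    with mp.Pool(n_workers) as pool:
        results = pool.map(one_replication, range(10000))
    print(np.mean(results))
```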

The challenge with the scale up model is the cost of provisioning a large computer. A capable scale up computer costs north of $10,000, which may be difficult for research budgets to accommodate. The other side of the cost is that the machine will probably not be highly utilized by a single researcher. Most simulation studies I've performed were designed to provide results within a few weeks, and even with multiple simulation studies per year, the machine would sit idle most of the time.

I am surprised that the "cloud" has not provided this mode of operation. Virtually all of the cloud has developed around the cluster model, and the largest instances currently available on either Google or AWS contain 16 physical cores. A single virtual desktop that doubled this core count would be very attractive to researchers in many fields, and so this seems like a missing market.