### Historical Volatility

The simplest model for volatility is historical volatility (HV), $$\sigma_t = \sqrt{\frac{1}{t}\sum _{j=1}^t r_j^2}.$$ Historical volatility is easy to compute and involves only 1 parameter: the length of the window used (or none if using all available data). Of course, historical volatility doesn’t react to news and so is unlikely to be satisfactory in practice.

### Exponentially Weighted Moving Averages

EWMA volatility goes beyond HV by giving more weight to recent information. The basic EWMA variance can be expressed in one of two forms, $$\sigma_t^2 = (1-\lambda)r_{t-1}^2 + \lambda \sigma_{t-1}^2$$ or equivalently $$\sigma_t^2 = (1-\lambda)\sum_{i=0}^{\infty} \lambda^i r_{t-1-i}^2.$$ I’ve often found EWMA-based volatility (and covariance/correlation/beta) to be very popular choices among practitioners. This popularity is often attributed to a few features of the EWMA model:

- Simplicity: The entire model is summarized by 1 parameter which is easily interpretable in terms of the memory of the process.
- Robustness: The model, at least when using common values for \(\lambda\), is very unlikely to produce absurd values. This is particularly important in practice when volatility is required for hundreds of assets.
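To make the HV and EWMA formulas concrete, here is a minimal NumPy sketch of both. The function names, the simulated returns, and the starting value for the recursion are my own illustrative choices; \(\lambda = 0.94\) is only a commonly quoted value for daily data, not a recommendation.

```python
import numpy as np


def historical_variance(returns, window=None):
    """Average squared return over the last `window` observations (all if None)."""
    r = np.asarray(returns, dtype=float)
    if window is not None:
        r = r[-window:]
    return float(np.mean(r ** 2))


def ewma_variance(returns, lam=0.94, init=None):
    """Run sigma2[t+1] = (1 - lam) * r[t]**2 + lam * sigma2[t].

    Returns an array one element longer than `returns`; the last element is
    the forecast for the period after the sample. The starting value is an
    assumption: the full-sample mean of squared returns unless `init` is given.
    """
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty(r.size + 1)
    sigma2[0] = np.mean(r ** 2) if init is None else init
    for t in range(r.size):
        sigma2[t + 1] = (1.0 - lam) * r[t] ** 2 + lam * sigma2[t]
    return sigma2


# Simulated daily returns, purely for illustration.
rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_normal(500)

hv = np.sqrt(historical_variance(returns, window=250))
ewma_vol = np.sqrt(ewma_variance(returns, lam=0.94)[-1])
print(f"HV (250 obs): {hv:.4f}, EWMA: {ewma_vol:.4f}")
```

Both functions work with variances; taking the square root at the end gives a volatility in the same units as the returns.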

### Combining: GARCH

EWMAs are far less popular among academics, primarily since they have some undesirable statistical properties (they produce random walk forecasts and do not always generate non-negative conditional variances), and ARCH-type models are commonly used to address these issues. The conditional variance in the standard GARCH model is $$\sigma_{t+1}^2 = \omega + \alpha r_{t}^2 + \beta \sigma_t^2$$ and so has 3 free parameters. However, a GARCH model can be equivalently expressed as a convex combination of HV and EWMA where $$\sigma^2_{t+1} = (1-w) \sigma^2_{t+1,HV} + w \sigma^2_{t+1,EWMA}. $$ In this strange reformulation, \(\lambda\) in the EWMA is the same as the \(\beta\) in the usual GARCH specification, the weight is \(w=\alpha/(1-\beta)\), and the HV term plays the role of the long-run variance \(\omega/(1-\alpha-\beta)\). The weight lies between 0 and 1 whenever \(\alpha+\beta<1\). When I hear GARCH models dismissed while EWMA and MA models are lauded, I find this somewhat paradoxical. Of course, the combination model still has 3 parameters, which may be two too many.
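The equivalence is easy to check numerically. The sketch below assumes the EWMA uses \(\lambda=\beta\) and is run on its own past values, and that the HV term is replaced by the long-run variance \(\omega/(1-\alpha-\beta)\); under those assumptions the weight that makes the identity exact is \(w=\alpha/(1-\beta)\). The parameter values and simulated returns are hypothetical.

```python
import numpy as np

# Hypothetical GARCH(1,1) parameters with alpha + beta < 1.
omega, alpha, beta = 0.05, 0.08, 0.90

long_run = omega / (1.0 - alpha - beta)  # long-run variance, standing in for HV
lam = beta                               # EWMA smoothing parameter
w = alpha / (1.0 - beta)                 # weight on the EWMA component

# Any return series works for checking the algebra; this one is simulated.
rng = np.random.default_rng(1)
r = rng.standard_normal(2000)

garch = np.empty(r.size + 1)
ewma = np.empty(r.size + 1)
garch[0] = ewma[0] = long_run            # same starting value for both recursions
for t in range(r.size):
    garch[t + 1] = omega + alpha * r[t] ** 2 + beta * garch[t]
    ewma[t + 1] = (1.0 - lam) * r[t] ** 2 + lam * ewma[t]

# Convex combination of the long-run variance and the EWMA.
combo = (1.0 - w) * long_run + w * ewma

max_gap = np.max(np.abs(garch - combo))
print(f"w = {w:.2f}, largest difference = {max_gap:.2e}")
```

The two paths agree up to floating-point error because \((1-w)(1-\beta)\) times the long-run variance equals \(\omega\).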

### Assessing Volatility Models

The standard method to assess volatility models is to evaluate the forecast using a volatility proxy such as the squared return. The non-observability of volatility poses some problems, and so only a subset of loss functions will consistently select the best forecast, in the sense that if the true model were included, it would necessarily be selected. The two leading examples from this class are the Mean Square Error (MSE) loss function, \(\left(r_{t+1}^2-\hat{\sigma}^2_{t+1}\right)^2\), and the QLIK loss function, defined as

\[L\left(r_{t+1}^2,\hat{\sigma}^2_{t+1}\right) = \ln\left(\hat{\sigma}^2_{t+1}\right)+ \frac{r_{t+1}^2}{\hat{\sigma}^2_{t+1}}\]

The name QLIK is derived from the similarity to the (negative) Gaussian log-likelihood and its use as a quasi-likelihood in mis-specified models.
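As a rough illustration, both losses can be averaged over an evaluation sample in a few lines. This is a sketch with hypothetical function names and simulated data; MSE is taken to be the usual squared error between the proxy \(r_{t+1}^2\) and the forecast.

```python
import numpy as np


def mse_loss(r2, sigma2_hat):
    """Average squared error between the proxy r^2 and the variance forecast."""
    r2 = np.asarray(r2, dtype=float)
    s = np.asarray(sigma2_hat, dtype=float)
    return float(np.mean((r2 - s) ** 2))


def qlik_loss(r2, sigma2_hat):
    """Average of ln(sigma2_hat) + r^2 / sigma2_hat."""
    r2 = np.asarray(r2, dtype=float)
    s = np.asarray(sigma2_hat, dtype=float)
    return float(np.mean(np.log(s) + r2 / s))


# Simulated example: constant true variance of 2, squared returns as the proxy.
rng = np.random.default_rng(2)
true_var = 2.0
r2 = true_var * rng.standard_normal(100_000) ** 2

for scale in (0.5, 1.0, 2.0):
    forecast = np.full(r2.size, scale * true_var)
    print(f"forecast = {scale:.1f} x truth: "
          f"MSE = {mse_loss(r2, forecast):.3f}, QLIK = {qlik_loss(r2, forecast):.3f}")
```

In this simulation the correctly scaled forecast has the lowest average loss under both criteria, even though the squared return is a very noisy proxy.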

### A New Criterion for Volatility Model Assessment

In some discussions with practitioners, I recently stumbled across a new consideration for volatility model assessment, one that is driven by the desire to avoid noise-induced trading. Trading strategies involve both conditional mean predictions (or \(\alpha\)) as well as forecasts of the quantities required for risk management. Volatility is almost always one component of the risk management forecast. A simple strategy aims to maximize the return subject to a volatility limit, say 20% per year. In this type of strategy, the role of the mean forecast is primarily to determine the direction of the position, long or short, while the volatility forecast is used to scale the position. As a result, a substantial amount of trading is induced by changes in volatility since the sign of the mean forecast is typically persistent.
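A toy version of this kind of strategy shows where the trading comes from. The sketch assumes the position is simply the sign of the mean forecast times the volatility target divided by the forecast volatility; the function names and numbers are hypothetical, and a real strategy would have many more moving parts.

```python
import numpy as np


def target_vol_positions(mean_forecast, sigma_hat, target=0.20):
    """Position = sign of the mean forecast, scaled to hit the volatility target.

    `sigma_hat` and `target` must be in the same units (here, annualized).
    """
    direction = np.sign(np.asarray(mean_forecast, dtype=float))
    sigma = np.asarray(sigma_hat, dtype=float)
    return direction * target / sigma


def turnover(positions):
    """Total absolute change in position from one period to the next."""
    return float(np.sum(np.abs(np.diff(positions))))


# The mean forecast is always positive, so the direction never changes.
mean_forecast = np.full(5, 0.03)
noisy_vol = np.array([0.10, 0.20, 0.10, 0.20, 0.10])
smooth_vol = np.array([0.14, 0.15, 0.14, 0.15, 0.14])

for name, vol in (("noisy", noisy_vol), ("smooth", smooth_vol)):
    pos = target_vol_positions(mean_forecast, vol)
    print(f"{name} forecast: turnover = {turnover(pos):.2f}")
```

With the direction held fixed, all of the turnover here is produced by movements in the volatility forecast.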

Optimizing a volatility forecast for this scenario requires a different criterion, and one simple method to formalize this idea is to include a term that penalizes changes in the volatility forecast. This could be measured using the absolute or squared difference in forecasts, so that a modified QLIK criterion could be constructed as

\[QLIK + \gamma\left|\hat{\sigma}^2_{t+1}-\hat{\sigma}^2_{t}\right|\]

where \(\gamma\) is a weight used to control the sensitivity of the loss function to the smoothness penalty.
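A minimal sketch of the penalized criterion, using the absolute-difference version and averaging the per-period loss over the sample. The function name, the choice to drop the first observation so that every period has a previous forecast, and the EWMA forecasts used in the comparison are all my own assumptions.

```python
import numpy as np


def penalized_qlik(r2, sigma2_hat, gamma=0.0):
    """Average of QLIK plus gamma * |change in the variance forecast|.

    The first observation is dropped so that each remaining period has a
    previous forecast to difference against.
    """
    r2 = np.asarray(r2, dtype=float)
    s = np.asarray(sigma2_hat, dtype=float)
    qlik = np.log(s[1:]) + r2[1:] / s[1:]
    penalty = np.abs(np.diff(s))
    return float(np.mean(qlik + gamma * penalty))


def ewma_forecasts(r, lam):
    """One-step EWMA variance forecasts aligned with r (forecast made before r[t])."""
    s = np.empty(r.size)
    s[0] = np.mean(r ** 2)  # assumed starting value
    for t in range(1, r.size):
        s[t] = (1.0 - lam) * r[t - 1] ** 2 + lam * s[t - 1]
    return s


# Simulated returns, for illustration only.
rng = np.random.default_rng(3)
r = rng.standard_normal(5000)
r2 = r ** 2

for lam in (0.90, 0.98):
    s = ewma_forecasts(r, lam)
    plain = penalized_qlik(r2, s, gamma=0.0)
    smooth = penalized_qlik(r2, s, gamma=1.0)
    print(f"lambda = {lam:.2f}: QLIK = {plain:.4f}, penalized = {smooth:.4f}")
```

The gap between the two columns is the average absolute forecast change, so larger values of \(\gamma\) tilt the comparison toward smoother forecasts such as an EWMA with a higher \(\lambda\).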