Monday, April 21, 2014

Scaling up vs. scaling out

Simulations have a long history in econometrics; one of the most influential was Granger and Newbold (1974), who demonstrated the dangers of spurious regression. Those results are now 40 years old, and while the entire simulation can be readily replicated in less than a minute on a modern computer, it was state-of-the-art when originally conducted.
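To give a sense of how small that computation is by modern standards, here is a minimal sketch in Python (purely illustrative - the original study used neither this language nor these function names) of the spurious regression experiment: regress one independent random walk on another and record how often the slope appears significant at the nominal 5% level.

    import numpy as np

    def spurious_regression(nobs=100, nrep=10000, seed=0):
        """Fraction of replications where the slope t-test rejects at 5%."""
        rng = np.random.default_rng(seed)
        rejections = 0
        for _ in range(nrep):
            # Two independent, driftless random walks
            y = np.cumsum(rng.standard_normal(nobs))
            x = np.cumsum(rng.standard_normal(nobs))
            # OLS of y on a constant and x
            X = np.column_stack([np.ones(nobs), x])
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ beta
            s2 = resid @ resid / (nobs - 2)
            se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
            # Nominal 5% two-sided t-test on the slope
            rejections += abs(beta[1] / se) > 1.96
        return rejections / nrep

    print(spurious_regression())  # far above the nominal 5% level

The rejection rate is far above 5% even though the two series are unrelated, which is exactly the spurious regression result.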

Scale Out

Scaling out has traditionally been the model used to enable realistic simulation designs. Many academic institutions have clusters available to researchers, and anyone willing to invest some time can run their own cluster on Amazon Web Services (AWS) using the StarCluster toolkit. The scale-out model allows nearly unbounded scaling of computational resources for most simulations, since most fall into the class of embarrassingly parallel problems, as the sketch below illustrates.
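The embarrassingly parallel structure is easy to see: each replication is independent, so the work can be split into chunks that share nothing. The sketch below (again hypothetical Python, reusing the spurious_regression function from the earlier sketch) maps chunks across local processes, but the same decomposition is what makes it possible to submit each chunk as a separate job on a cluster.

    from concurrent.futures import ProcessPoolExecutor

    def run_chunk(chunk):
        nrep, seed = chunk
        # Each chunk is fully independent given its own seed
        return spurious_regression(nobs=100, nrep=nrep, seed=seed)

    if __name__ == "__main__":
        n_workers = 4  # whatever the machine (or cluster node) provides
        chunks = [(2500, seed) for seed in range(n_workers)]
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            rates = list(pool.map(run_chunk, chunks))
        # Combine chunk results; here a simple average of rejection rates
        print(sum(rates) / len(rates))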

However, the scale-out model, while providing a highly scalable environment, comes with one important cost - researcher time. Most econometricians are not heavily dependent on cluster environments, so the incentive to learn a substantially different programming environment is low. Most programming is done in a relatively interactive manner using MATLAB, GAUSS, Ox or R. Moving this code to a cluster is non-trivial and requires porting it to a batch system - where simulations are submitted to a queue - which is substantially different from the desktop environment used to develop the code.

Scale Up and a Missing Market

A simpler way to enable researchers to conduct complex simulations is to use a scale-up model. This is a far simpler solution since it allows researchers to simply use a bigger version of their desktop. Moderately large single computers can be configured with 24 to 30 cores in a single machine, and any code that runs well on a 4-core desktop can be trivially ported to run on a 24-core machine. More importantly, it is simple to replicate the environment that was used to develop the code on the scaled-up machine, so that the time costs, and the risk associated with changing environments, are minimized.
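Under the scale-up model, the only thing that changes when moving the sketch above from a 4-core desktop to a 24-core machine is the worker count, which can simply be detected at run time (an illustrative assumption about how the script is written, not a prescription):

    import os

    # Use every core the machine exposes: 4 on a typical desktop, 24 or
    # more on a scaled-up workstation, with no other changes to the code.
    n_workers = os.cpu_count() or 1
    nrep_total = 10000
    # Divide the replications evenly across the available workers
    chunks = [(nrep_total // n_workers, seed) for seed in range(n_workers)]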

The challenge with the scale-up model is the cost of provisioning a large computer. A capable scale-up machine costs north of $10,000, which may be difficult for research budgets to accommodate. The other side of the cost is that the machine will probably not be highly utilized by a single researcher. Most simulation studies I've performed were designed to provide results within a few weeks, and even with multiple simulation studies per year, the machine would sit idle most of the time.

I am surprised that the "cloud" has not provided this mode of operation. Virtually all of the cloud has developed around the cluster model, and the largest instances currently available on either Google or AWS contain 16 physical cores. A single virtual desktop that doubled this core count would be very attractive to researchers in many fields, so this seems like a missing market.
