Saturday, March 8, 2014

Upcoming SoFiE Conference:
Skewness, Heavy Tails, Market Crashes, and Dynamics

The submission deadline for the upcoming SoFiE-sponsored conference on extreme risks is next week.

Date: April 28 & 29, 2014
Submission Deadline: March 15, 2014
Location: Cambridge, UK

Topics include:
  • Estimation and inference in dynamic asset pricing models 
  • Characterization of financial risk in the presence of skewness and fat tails 
  • Modelling Bubbles and Crashes 
  • Multivariate non-Gaussian densities 
  • Measures of dependences – co-skewness 
  • Conditional Skewness Models 

Invited Speakers

Paul Embrechts (ETH Zürich), Andrew Harvey (Cambridge), Eric Ghysels (UNC - Chapel Hill), Peter Christoffersen (Toronto)

Program Chairs

O. Linton and E. Renault


Papers can only be submitted electronically via e-mail to, with the subject line “SoFiE Fac 2014 Submission” and must consist of a single PDF file. No other formats will be accepted. Submissions must be received by March 15, 2014.

More details are available on the conference website.

Friday, March 7, 2014

Referee Reports (Lost in Translation)

I’ve come across a wide variety of referee report styles, ranging from holistic short essays to simple lists of short bullet points.  Unlike general academic writing, there is little guidance when someone first starts writing reports, often as a graduate student.  I recall asking the person who requested the report for some guidance and being given one of their recent reports as a template. I have little doubt that this induced extreme path dependence and that my default template today still reflects this initial observation.

Internationalization and Report Language

I recently came across an HBR article on the difference, across cultures, between what is said and what is heard, and am wondering how sensitive this issue is when reading referee reports.  It might explain why I’ve heard complaints that the editorial decision did not match the (perceived) reports.

Popular Fiction as Academic Writing

If Harry Potter Was An Academic Work is a light-hearted take on the peer review process.  I found the following to be particularly insightful.
Dear Dr. Rowling 
I am pleased to say that the reviewers have returned their reports on your submission Harry Potter and the Half-Blood Prince and we are able to make an editorial decision, which is ACCEPT WITH MAJOR REVISION.
Reviewer 1 felt that the core point of your contribution could be made much more succinctly, and recommended that you remove the characters of Ron, Hermione, Draco, Hagrid and Snape. I concur with his assessment that the final version will be tighter and stronger for these cuts, and am confident that you can make them in a way that does not compromise the plot. 
Reviewer 2 was positive over all, but did not like being surprised by the ending, and felt that it should have been outlined in the abstract. She also felt that citation of earlier works including Lewis (1950, 1951, 1952, 1953, 1954, 1955, 1956) and Pullman (1995, 1997, 2000) would be appropriate, and noted an over-use of constructions such as “… said Hermione, warningly”.

Thursday, March 6, 2014

Time for WRDS 2.0?

In the beginning...

Managing financial data was very painful. Using CRSP required either using a clunky program to extract data or compiling some FORTRAN when more control was needed. Using TAQ meant spending a day rotating CDs through a reader (and also either using a clunky GUI or writing your own code to read a binary format). Wharton Research Data Services (WRDS) dramatically simplified the process of accessing financial data, whether it was simply extracting a large set of return data or accessing quarterly report information. WRDS has grown considerably in scope and now both covers a wide range of proprietary databases and offers a warehouse for free-to-use datasets.

The good, the bad and the SAS

The WRDS infrastructure seems to be built mostly on SAS, one of the granddaddies of statistical software.  SAS was one of the first statistical packages I used as an undergraduate (along with Shazam, which I didn’t realize still exists and which possibly has the best domain name).  Back in those dark days it took 30 minutes to run a cross-section regression with 800,000 observations and a dozen or so variables on the shared Sun server.  Of course, the 800,000 observations had actually been read off of a 10.5 inch tape.  But this environment was revolutionary since it could run the regression at all, and so was very valuable.

A short 20 years after I ran my first regressions, I have no use for SAS – well, I would have no use for SAS were it not the only practical method to make non-trivial queries on WRDS.  I am sympathetic to the idea that SAS provides a simple abstraction for a wide range of data, from the (now) tiny Monthly CRSP dataset to the large TAQ dataset.  However, this is a decidedly dated approach, especially for larger datasets.  I know a wide range of practitioners who work with high-frequency data and I am not aware of any who use SAS as an important component of their financial modeling toolkit.  Commercial products such as kdb exploit the structure of the dataset to be both faster and require less storage for the same dataset.  A former computer science colleague recently introduced me to an alternative, HDF, a widely used open data storage format that can also achieve fantastic compression while providing direct access from MATLAB (or R, Python, C, Java, C#, inter alia). It has been so successful at managing TAQ-type data that the entire TAQ database (1993-2013) can be stored on a $200 desktop hard drive.

The deep issue is that the SAS data file format is not readily usable in many software packages, which creates an unnecessary cycle of using SAS to export data (possibly aggregating at some level) before re-importing into the native format for a more appropriate statistical analysis package designed for rapidly iterating between model and results.

WRDS Cloud

In 2012 WRDS introduced the Cloud, which provided a much needed speed boost to the now aging main server. The Cloud operates as a batch processor where jobs – mostly SAS programs – are submitted and run in an orderly, balanced fashion. This is far superior, both in terms of fairness and long-run potential for growth, since it follows the scale-out model: as the use of WRDS increases, or as new, large datasets are introduced, new capacity can be brought on-line to reflect demand. The limitation of the Cloud is that it is mostly still running SAS jobs, just on faster hardware, and so the deep issues about access remain. The WRDS Cloud does also support R for a small minority of datasets which have been exported to text. Text is also not a good format: conversion from text to binary is slow, text files are verbose (although this can be mitigated using compression) and, if not carefully done, the conversion may not perfectly preserve fidelity.
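
To make the fidelity and verbosity points concrete, here is a minimal sketch in Python using only the standard library. The three prices are made up for illustration and are not taken from any WRDS dataset; the point is only that a fixed-decimal text export silently drops digits, while a binary export (or a carefully written text export) round-trips exactly.

```python
import struct

# Hypothetical prices, invented purely for illustration.
prices = [101.123456789012, 0.1 + 0.2, 1e-7 / 3.0]

# Binary export: 8 bytes per IEEE 754 double, little-endian.
binary = struct.pack(f"<{len(prices)}d", *prices)
from_binary = list(struct.unpack(f"<{len(prices)}d", binary))

# Careless text export: a fixed 6-decimal format discards digits.
careless_text = ",".join(f"{p:.6f}" for p in prices)
from_careless = [float(s) for s in careless_text.split(",")]

# Careful text export: repr() emits enough digits to recover each double.
careful_text = ",".join(repr(p) for p in prices)
from_careful = [float(s) for s in careful_text.split(",")]

print("binary round-trips exactly:       ", from_binary == prices)
print("careless text round-trips exactly:", from_careless == prices)
print("careful text round-trips exactly: ", from_careful == prices)
print("bytes (binary vs careful text):   ", len(binary), len(careful_text))
```

The careful text version preserves every value, but only by writing more characters than the binary version needs bytes, which is the verbosity cost mentioned above.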

Expectations of a modern data provider

What changes would I really like to see in WRDS? A brief list:

  • Use of more open data formats that support a wide range of tools, especially those which are free (e.g. R or Python, but also Octave, Julia, Java or C++ if needed) or free for academic use (e.g. Ox).
  • The ability to submit queries directly using a Web API. This is how the Oxford-Man Realized dB operates using Thomson Reuters Tick History: a C# program manages submission requests, checks completion queues and downloads data, all using a Web Service.
  • The ability to execute small, short running queries directly from leading, popular software packages. MATLAB, for example, has an add-on that allows Bloomberg data to be directly pulled into MATLAB with essentially no effort and especially no importing code.
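
The submit, check and download workflow in the second item can be sketched in a few lines. This is only an illustration of the pattern, not the actual Oxford-Man C# program or any real WRDS or Tick History API: `FakeDataService`, its methods, `run_query` and the query string are all hypothetical names, and the "service" is an in-memory stand-in so the example runs without a network.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDataService:
    """Hypothetical in-memory stand-in for a remote data Web Service."""
    polls_until_done: int = 2
    _jobs: Dict[int, dict] = field(default_factory=dict, init=False, repr=False)

    def submit(self, query: str) -> int:
        """Queue a query and return its job id."""
        job_id = len(self._jobs) + 1
        self._jobs[job_id] = {"query": query, "polls": 0}
        return job_id

    def status(self, job_id: int) -> str:
        """Report 'running' until the job has been polled enough times."""
        job = self._jobs[job_id]
        job["polls"] += 1
        return "done" if job["polls"] >= self.polls_until_done else "running"

    def download(self, job_id: int) -> List[str]:
        """Return the (fake) result rows of a finished job."""
        job = self._jobs[job_id]
        if job["polls"] < self.polls_until_done:
            raise RuntimeError("job has not finished")
        return [f"result row for: {job['query']}"]


def run_query(service, query: str, poll_interval: float = 0.0,
              max_polls: int = 10) -> List[str]:
    """Submit a query, poll until it completes, then download the result."""
    job_id = service.submit(query)
    for _ in range(max_polls):
        if service.status(job_id) == "done":
            return service.download(job_id)
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


rows = run_query(FakeDataService(), "TAQ trades IBM 2013-01-02")
print(rows)
```

With a real provider, the three methods would become HTTP calls; the client loop would stay much the same, which is what makes this style of access easy to script from almost any language.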