## Tuesday, February 25, 2014

### Data silos and the regulatory burden

Carol Clark at the Chicago Fed has recently written about some of the issues surrounding high-speed trading (HST).  Three of her five policy questions raise important, unanswered econometric issues:
• Are market participants underpricing the risks of HST?
• Do they have the real-time controls they need to manage these risks?
• Do regulators have the proper incentives and tools to identify and control market manipulation?
Two of these three are basic questions about risk, which is arguably the fundamental question underlying much of financial econometrics. The final question does not directly require risk measurement but it does require econometric tools to identify market manipulation.
I don’t think that many would quibble that these are important questions that need answers.  Unfortunately, many of them cannot be answered using market data alone.  For example, it is possible to characterize the short-term risk of an asset price using a wide range of volatility or other measures.  It is not, however, possible to determine whether episodes of elevated risk were characterized by unusual HST activity.
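
To illustrate the half of the problem that market data can handle, here is a minimal Python sketch of one common short-term risk measure, realized volatility (the square root of the sum of squared log returns). The function name and the prices are hypothetical and chosen only for illustration. Note that nothing in the input says who was trading.

```python
import math

def realized_volatility(prices):
    """Square root of the sum of squared log returns over the sample."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in log_returns))

# Hypothetical intraday prices for a single asset (not real data).
prices = [100.0, 100.2, 99.9, 100.1, 100.4]
rv = realized_volatility(prices)
```

The measure needs only a price series, which is exactly why it cannot attribute a volatile episode to HST activity.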

## Regulatory Burden and Risk

Currently, only regulators have access to data at the granularity required to answer these questions.  This data is usually considered highly proprietary since studying it requires some form of trader identification, which may, for example, allow proprietary trading algorithms to be reverse engineered.  This leaves the entire burden of addressing these challenging questions to the regulators, who are often stretched dealing with more mundane tasks (e.g. Dodd-Frank rule writing).
On occasion, this data has been shared with outsiders under various arrangements.  Most recently, this has not gone well.  The ambiguity surrounding the data, even for a visiting academic, creates a non-trivial risk: the more interesting the research, the less likely it is to be publishable.

## Creative Solutions Needed

Simply releasing the raw data will clearly never satisfy all parties.  However, there are a number of solutions which could be used to improve access:
• Scramble trader identification information on a fixed schedule.  If traders have ids $0,\ldots,n_i$, then each day $i$ the ids are randomly reassigned from the same set.  This would make reverse engineering more difficult, since splicing together data across days becomes non-trivial.
• Add a rule-based flag for HST trader-initiated trades.  This could be computed by the regulator based on some guidelines about what constitutes an HST profile (order rate, cancellation rate, resting time, net position, end-of-day position).  This would make it impossible to study individual trader activity, but would allow the higher-level questions to be addressed.
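
Both ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposal for an actual release format: the record layout, the field names (`orders_per_sec`, `cancel_rate`, `median_resting_secs`, `eod_position`) and every threshold are hypothetical, and net position is omitted for brevity.

```python
import random

def scramble_ids(trades_by_day, n_traders, seed=None):
    """Relabel trader ids with an independent random permutation each day."""
    rng = random.Random(seed)
    scrambled = {}
    for day, trades in trades_by_day.items():
        new_ids = list(range(n_traders))
        rng.shuffle(new_ids)  # fresh mapping drawn from the same id set
        scrambled[day] = [{**t, "trader": new_ids[t["trader"]]} for t in trades]
    return scrambled

def is_hst(stats, min_orders_per_sec=10.0, min_cancel_rate=0.9,
           max_resting_secs=1.0, max_abs_eod_position=0):
    """Rule-based classification; all thresholds are made-up placeholders."""
    return (stats["orders_per_sec"] >= min_orders_per_sec
            and stats["cancel_rate"] >= min_cancel_rate
            and stats["median_resting_secs"] <= max_resting_secs
            and abs(stats["eod_position"]) <= max_abs_eod_position)

def flag_trades(trades, stats_by_trader):
    """Drop the trader id and keep only an HST flag for the initiator."""
    return [{"price": t["price"], "size": t["size"],
             "hst_initiated": is_hst(stats_by_trader[t["trader"]])}
            for t in trades]

# Hypothetical records: "trader" is the id of the trade initiator.
trades_by_day = {
    "day1": [{"trader": 0, "price": 10.0, "size": 100},
             {"trader": 2, "price": 10.1, "size": 50},
             {"trader": 0, "price": 10.2, "size": 75}],
    "day2": [{"trader": 1, "price": 10.3, "size": 20}],
}
stats_by_trader = {
    0: {"orders_per_sec": 250.0, "cancel_rate": 0.97,
        "median_resting_secs": 0.05, "eod_position": 0},
    1: {"orders_per_sec": 0.1, "cancel_rate": 0.20,
        "median_resting_secs": 45.0, "eod_position": 300},
    2: {"orders_per_sec": 0.5, "cancel_rate": 0.35,
        "median_resting_secs": 12.0, "eod_position": -150},
}

scrambled = scramble_ids(trades_by_day, n_traders=3, seed=1)
flagged = flag_trades(trades_by_day["day1"], stats_by_trader)
```

Within a day the scrambled ids remain consistent, so intraday behavior can still be studied, while the flagged release carries no identifier at all. The two outputs correspond to the two levels of access described above.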