Saturday, August 20, 2016

Financial Markets and Deep Learning Methods

Financial markets modeling is a rather imprecise, non-scientific discipline, resembling alchemy and other futile human endeavors. We know by now that it is next to impossible to fully model and completely predict how markets will behave. Many factors affect security prices and financial markets in general, with human reaction being the biggest unknown and one that is impossible to model and thus predict.
What we can strive to do is model smaller, relatively limited domains - an approach not completely dissimilar to the split between classical and quantum physics, for example.
Supervised learning gives us predictive power, but it is limited because we rely on history to predict the future. That all but invalidates it for this purpose, as financial market history (and the future) rhymes but doesn't repeat.
Unsupervised learning, on the other hand, does not have the training set (history) problem, but it offers a limited set of options for what we can actually do with it. Aside from running clustering algorithms (k-means etc.) to discover groupings in the data, there isn't much that can be done with it in the proactive department.
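To make the clustering idea concrete, here is a minimal k-means sketch on synthetic data. The two "regimes", the feature choices and all the numbers are illustrative assumptions, not a real market feed:

```python
# Minimal k-means sketch: group "securities" by mean daily return and
# volatility. The synthetic data and feature choices are illustrative
# assumptions, not real market data.
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic regimes: low-volatility and high-volatility names.
features = np.concatenate([
    rng.normal([0.001, 0.01], 0.002, size=(50, 2)),  # low-vol group
    rng.normal([0.003, 0.05], 0.002, size=(50, 2)),  # high-vol group
])

def kmeans(x, k, iters=50, seed=0):
    """Plain k-means: assign each point to its nearest centroid,
    recompute centroids, repeat."""
    init = np.random.default_rng(seed)
    centroids = x[init.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centroids) ** 2).sum(-1), axis=1)
        # Keep the old centroid if a cluster happens to end up empty.
        centroids = np.array([
            x[labels == j].mean(axis=0) if (labels == j).any() else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

labels, centroids = kmeans(features, k=2)
# Each label now identifies which volatility regime a security falls into.
```

This is the "find groupings" step; on its own it only describes the data, which is exactly the limitation noted above.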
A combination of supervised and unsupervised methods (semi-supervised learning) is one approach that could potentially be successful, as is ensemble learning, i.e. combining multiple models into one. There are dozens of algorithms already implemented in many frameworks and languages that can be combined to form a useful model.
We could, for example, run an unsupervised deep learning algorithm on massive volumes of raw data to automatically discover areas of interest (clusters of data), then perform further analysis via targeted supervised learning methods to find out how the data is correlated. This approach would give us a tactical advantage, i.e. useful, actionable information.
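A hedged sketch of that two-stage pipeline: an unsupervised regime assignment feeding a simple per-cluster supervised model. The data, the regimes and the choice of ordinary least squares are all illustrative assumptions, not a production trading model:

```python
# Two-stage sketch: cluster first (unsupervised), then fit a supervised
# model within each cluster. Everything here is synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "market regimes": two clusters of feature vectors with
# different linear relationships to the target (e.g. next-period return).
X1 = rng.normal(0.0, 1.0, size=(200, 3))
y1 = X1 @ np.array([0.5, -0.2, 0.1]) + rng.normal(0, 0.01, 200)
X2 = rng.normal(5.0, 1.0, size=(200, 3))
y2 = X2 @ np.array([-0.3, 0.4, 0.0]) + rng.normal(0, 0.01, 200)
X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])

# Stage 1 (unsupervised): assign each point to the nearest cluster center.
# The centers are computed from the known synthetic groups purely for
# brevity; in practice they would come from a clustering step (e.g. k-means).
centers = np.array([X[:200].mean(axis=0), X[200:].mean(axis=0)])
labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)

# Stage 2 (supervised): fit an ordinary least-squares model per cluster.
models = {}
for j in range(2):
    mask = labels == j
    coef, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    models[j] = coef

def predict(x):
    """Route a new observation to its cluster's model."""
    j = np.argmin(((x - centers) ** 2).sum(-1))
    return x @ models[j]
```

The point of the split is that each regime gets its own, simpler supervised model instead of one global model that averages away the regime differences.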
The whole process can be automated and performed on massive volumes of data (sample = ALL) quite inexpensively. The latest wave of technology makes it possible to store and process EVERYTHING - all indexes, stock prices, derivatives, currencies, commodity prices - as much data as we can get or buy - and store it cheaply on a Hadoop cluster, preprocess it with the Spark framework, then model it with TensorFlow or Spark MLlib. It can all be done in the cloud, and it is all open source software.
Amazon AWS even offers GPU instances to boost processing power (Google Cloud doesn't offer GPUs yet).
TensorFlow can take advantage of distributed GPUs, or any combination of CPUs and GPUs.
The final result is an automated system that will react and recommend (and possibly act automatically) on new insights.
We are aware that HFT and systematic trading shops do something similar already, but our point of interest is not short-term arbitrage (which seems to be running out of steam anyway) - it is to exploit deeper, longer-lasting knowledge about market direction. This is not a rule-based, hardcoded system that acts on predefined insights. It is a live system that automatically learns, gains insights and changes behavior. We could think of it as an attempt to create an AlphaGo or Deep Blue for financial markets.
Renaissance Technologies, Two Sigma and other hedge fund high fliers probably already use or are building similar systems. What is new is the commoditization of this approach - suddenly almost everybody can do it. The open source nature of the latest advances in AI and Deep Learning, together with the growing affordability and power of parallel processing, levels the playing field, nullifying decades of incumbent advantage. The race is on to transplant the latest relevant Deep Learning advances from Silicon Valley to active segments of the financial industry. This could further stir the already shaken hedge fund industry.