Friday, January 27, 2017

Enterprise Grade Risk Modeling Using Machine Learning

All the pieces of the puzzle are now in place for productive, successful large-scale risk modeling using Machine Learning. A slew of recent software and hardware announcements means that we finally have a full, brand-new stack of components for an enterprise-class Machine Learning risk modeling exercise. Google's TensorFlow makes it possible to quickly train, test and run predictive models on a variety of target devices (CPU, GPU). The latest TensorFlow releases incorporate the tf.learn library, which makes it much easier to extract features and pass datasets on to the train, test and predict modules. This trend towards ease of use will continue with the announced incorporation of Keras into the TensorFlow build. On the hardware side, IBM just released the PowerAI platform/appliance, which, aside from TensorFlow, also incorporates Nvidia hardware and software (GPUs, CUDA, NVLink).

Monday, January 02, 2017

A Comical Break: Moody's Prefers Intuition or Economic Theory to Machine Learning Because It Is Important to Have Theoretical Underpinnings

Here is Moody's (2012) take on variable (feature) selection in the context of risk analysis (Methodology for Forecasting and Stress-Testing U.S. Vehicles ABS Deals):

A key aspect of model development is variable selection - identifying which credit and economic variables best explain the dynamic behavior of the dependent variable in question. Aligned with principles of modern econometrics, we prefer to choose the variables based on a combination of economic theory or intuition, together with a consideration of the statistical properties of the estimated model.
We believe models built using pure data-mining techniques or principles such as machine learning, though they may fit the existing data well, are more likely to fail in a changing external environment because they lack theoretical underpinnings. The best prediction models employ a combination of statistical rigor with a healthy dose of economic principle. Models built this way enjoy the additional benefit of ease of interpretation.

I am not sure how they can claim the above with a straight face. Moody's is one of the agencies that completely failed to predict the 2008 housing-originated crash. There are no known, scientific, or even commonly agreed upon "theoretical principles" or economic theory. It is now clearer why the agencies have a problem with prediction, which is hard, especially about the future. The FCIC found that the agencies' credit ratings were influenced by "flawed computer models, ...". Yet they stick to the same practices. Continuing:

Adding each economic variable helps the model improve predictive power. Generally speaking, the economic variables should be useful in both producing accurate out-of-sample forecasts and providing good in-sample fit. However, we sometimes have to make tradeoff decisions to balance out between these two goals when they are conflicting. If the discrepancy is unavoidable and very significant, we prioritize forecast accuracy rather than in-sample fit, as forecasts are end results of our models.

Translated: when the above practice fails, we fudge by taking whatever works better - exactly the approach they (Moody's) dismissed earlier.
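The tension between in-sample fit and out-of-sample accuracy that Moody's describes is easy to reproduce on a toy problem. The sketch below is my own illustration, not anything from the Moody's document: the data are synthetic (a straight line plus a fixed alternating noise pattern, chosen only so the run is reproducible), and the two "models" (`fit_line` and `fit_memorizer`) are hypothetical stand-ins for a simple model and one that fits the existing data perfectly.

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def fit_line(xs, ys):
    """Ordinary least squares for a single explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Nearest-neighbour lookup: reproduces the training data exactly."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict


# Synthetic data (an assumption for illustration): y = 2x plus fixed noise.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + e for x, e in zip(train_x, [1.0, -1.0] * 5)]
test_x = [i + 0.5 for i in range(10)]
test_y = [2 * x + e for x, e in zip(test_x, [-1.0, 1.0] * 5)]

models = {
    "line": fit_line(train_x, train_y),
    "memorizer": fit_memorizer(train_x, train_y),
}

# In-sample fit: error on the data the models were built from.
in_sample = {
    name: rmse([model(x) for x in train_x], train_y)
    for name, model in models.items()
}
# Out-of-sample accuracy: error on data the models never saw.
out_of_sample = {
    name: rmse([model(x) for x in test_x], test_y)
    for name, model in models.items()
}

for name in models:
    print(name, "in-sample:", round(in_sample[name], 3),
          "out-of-sample:", round(out_of_sample[name], 3))
```

The memorizer wins on in-sample fit and loses on the held-out points. In other words, "prioritize forecast accuracy" amounts to picking the model by its hold-out error, which is standard machine learning practice.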

Here they finally convince us that it is actually an alchemy approach, based on art and intuition (which doesn't prevent them from sprinkling in some scary-looking math, just for artistic impression).

And here Moody's finally leaves no shade of doubt that we are dealing with artists, entertainers and illusionists:

Variable selection is more art than science. The criteria mentioned above are not black or white. The bottom line is to build a theoretically sound and empirically workable model and get reasonable and consistent forecasts that are supported by both economic intuition and statistical significance.

To their credit, and unlike many in-house modeling practices, Moody's actually checks how the model performs, but they rarely admit the model is wrong:

The consistency check is the comparison of model performance across different production runs. We keep track of the model performance by comparing the forecast statistics over time. The results of the analysis may suggest revisions to the model. However, differences do not necessarily indicate that the model is in error. We should look into what causes the discrepancy and how this affects the end results. If the statistics get really worse and fall into an unacceptable range, we should modify the original model to accommodate revised performance data and changing economic conditions and make sure that the model reflects the most recent development in the auto ABS market.
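For what it is worth, the consistency check described above is simple to mechanize. The following is a minimal sketch under my own assumptions: the error statistic (mean absolute error), the thresholds (`investigate_ratio`, `unacceptable_mae`), the labels, and the sample numbers are all hypothetical, and none of them come from the Moody's document.

```python
def mean_absolute_error(forecasts, actuals):
    """Average absolute gap between forecast and realized values."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


def consistency_check(run_errors, investigate_ratio=1.5, unacceptable_mae=0.01):
    """Label each production run by comparing its error to the first run.

    Both thresholds are illustrative assumptions, not published values.
    """
    baseline = run_errors[0]
    labels = []
    for error in run_errors:
        if error > unacceptable_mae:
            # Statistics fell into an unacceptable range: modify the model.
            labels.append("revise")
        elif error > baseline * investigate_ratio:
            # A difference, but not necessarily a model error: look into it.
            labels.append("investigate")
        else:
            labels.append("ok")
    return labels


# Made-up (forecast, realized) loss rates for three production runs.
runs = [
    ([0.020, 0.022, 0.025], [0.021, 0.023, 0.024]),
    ([0.020, 0.022, 0.025], [0.022, 0.024, 0.027]),
    ([0.020, 0.022, 0.025], [0.030, 0.034, 0.040]),
]

errors = [mean_absolute_error(forecasts, actuals) for forecasts, actuals in runs]
labels = consistency_check(errors)

for run_number, (error, label) in enumerate(zip(errors, labels), start=1):
    print("run", run_number, "MAE:", round(error, 4), "->", label)
```

The middle label is the interesting one: it is the "differences do not necessarily indicate that the model is in error" zone, and how wide that zone is depends entirely on the thresholds someone chooses.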