I just read an interview with David Aronson over at the CssAnalytics blog

Wow, what an amazing book! I have now read this book cover to cover 3 times over the last month while getting my hands dirty with TSSB.

This book gives implementation-level detail on how to create and test predictive models for any class of market (equities, forex, etc.). The implementation is done using the free software TSSB. Using TSSB requires learning its associated command-line language, which is very simple to learn yet extremely powerful. As a point of comparison, programming the same models in R would require perhaps 10 or 100 times more lines of code (I know this because I programmed similar models in R before trying TSSB).

Upon first use, it took me all of 1 hour to create my first model in TSSB, and it was off to the races from there: I have since computed over 900 predictive indicators and over 100 models. A great feature is the ability to combine models into committees or oracles, a common practice for achieving the best results.
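The committee idea can be illustrated with its simplest form: an equal-weight average of several models' forecasts. This is a minimal Python sketch. It assumes each model's forecasts are already aligned on the same bars. The function names are hypothetical, and it does not reproduce TSSB's own committee or oracle types.

```python
import numpy as np


def committee_forecast(model_forecasts):
    """Equal-weight committee (hypothetical helper): average several models.

    model_forecasts: 2-D array-like, one row per model, one column per bar.
    """
    return np.mean(np.asarray(model_forecasts, dtype=float), axis=0)


def mse(predictions, targets):
    """Mean squared error between predictions and realized targets."""
    diff = np.asarray(predictions, dtype=float) - np.asarray(targets, dtype=float)
    return float(np.mean(diff ** 2))


# Toy example: three models whose individual errors partly cancel out.
targets = np.zeros(4)
forecasts = [
    [1.0, -1.0, 1.0, -1.0],
    [-1.0, 1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0, -1.0],
]
individual = [mse(f, targets) for f in forecasts]
committee = mse(committee_forecast(forecasts), targets)
print(individual, committee)
```

In this toy case each model has an MSE of 1.0, while the committee's MSE is about 0.11. Plain averaging can never produce a squared error worse than the models' average squared error, because squaring is convex. It helps most when the models' errors are not strongly correlated.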

One of the best features is the ability to perform a multitude of tests (bootstrap, Monte Carlo, and permutation) that greatly increase one's confidence that a proposed model will in fact generalize to unseen data. In particular, the permutation test separates a model's skill from its bias and from luck due to trend, and measures each. It is my hypothesis that the majority of hedge funds do not do such tests, and that this is the reason why so many fail so quickly.
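The basic idea behind a permutation test can be sketched in a few lines. It uses a generic Monte Carlo permutation test, not TSSB's implementation. The sketch assumes a rule that takes a position of +1, -1 or 0 on each bar, and assumes that the bar returns can be treated as exchangeable. The helper name `permutation_test` and its parameters are hypothetical. The test shuffles the returns many times and asks how often a shuffled series does at least as well as the real one.

```python
import numpy as np


def permutation_test(positions, returns, n_perm=1000, seed=0):
    """Monte Carlo permutation test for a trading rule (hypothetical helper).

    positions: +1 (long), -1 (short) or 0 (flat) for each bar
    returns:   the market return earned over each bar, aligned with positions
    Returns (observed mean return, p-value).
    """
    rng = np.random.default_rng(seed)
    positions = np.asarray(positions, dtype=float)
    returns = np.asarray(returns, dtype=float)
    observed = np.mean(positions * returns)

    count = 0
    for _ in range(n_perm):
        # Shuffling the returns destroys any real link between signal and
        # outcome but keeps the same positions and the same set of returns.
        shuffled = rng.permutation(returns)
        if np.mean(positions * shuffled) >= observed:
            count += 1

    # Counting the unpermuted ordering as one of the trials keeps p > 0.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


# Example: a rule with perfect foresight on simulated returns.
rng = np.random.default_rng(1)
market = rng.normal(0.0, 0.01, 500)
obs, p = permutation_test(np.sign(market), market, n_perm=999)
print(obs, p)
```

Each shuffled run keeps the same positions and the same returns. As a result, a rule that is simply long during a rising market earns about as much in the shuffled runs as in the real one, so that trend-driven profit does not register as skill. According to the description above, TSSB's test goes further and breaks the result into skill, bias and trend components; this sketch does not attempt that. Note also that plain shuffling assumes the returns are not serially correlated.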

Another great thing about the book's focus on implementation detail is that much of the theory and many best practices can be abstracted from it. For example, it is one thing to read about stationarity (which I have read about extensively, through complex Greek-lettered equations and proofs in other books), yet it was still not completely clear until I actually saw its impact with my own eyes in the performance of models I created. Sometimes (often, in my case), getting down to implementation-level detail (developing, testing and reviewing results) is the best way to understand a concept. Additionally, I learned exactly how and why serial correlation can dupe someone into thinking a lucky model has skill.
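The serial correlation point can be made concrete with a rough sketch. Overlapping targets, such as a 10-bar forward return computed on every bar, are strongly autocorrelated. The usual t-statistic therefore treats them as far more independent observations than they really are. The sketch below assumes an AR(1)-style correction for the effective sample size, n_eff = n(1 - rho)/(1 + rho), where rho is the lag-1 autocorrelation. This is a common textbook approximation, not necessarily the book's method, and the helper names are hypothetical.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))


def naive_and_adjusted_t(x):
    """t-statistic of the mean, with and without an AR(1) sample-size correction."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    se = np.std(x, ddof=1) / np.sqrt(n)
    naive_t = np.mean(x) / se
    rho = lag1_autocorr(x)
    n_eff = n * (1 - rho) / (1 + rho)  # AR(1) approximation (assumption)
    adjusted_t = naive_t * np.sqrt(n_eff / n)
    return naive_t, adjusted_t, n_eff


# Overlapping 10-bar sums of pure noise mimic a 10-bar-ahead target.
rng = np.random.default_rng(42)
noise = rng.normal(0.0, 1.0, 2000)
overlapping = np.convolve(noise, np.ones(10), mode="valid")
print(lag1_autocorr(overlapping), naive_and_adjusted_t(overlapping))
```

The target here is pure noise, so any apparent "significance" in the naive t-statistic is luck. The correction shrinks the t-statistic by the square root of n_eff/n, which is small when the lag-1 autocorrelation is high.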

There are also subtle hints given throughout the book, for example:

- Committees often perform better than individual models.

- Specializing each model in a particular regime often performs better.

- Models tend to perform better when they specialize in long-only or short-only contexts and are then combined into a portfolio or committee.

- More esoteric indicators (e.g. non-trend types) are better used in more complex (non-linear) models.

- Using more than 3 indicators often overfits a model (a big time saver too!).

Other great things about TSSB (and the book): most of the indicators are scaled and normalized to an extent, which takes care of the little things that must be done before getting to the actual modelling. There are also specific functions that help create one model for multiple markets (market regression, cross-market normalization, pooling variables, use of indexes, etc.). Such implementation complexities are often overlooked in theoretical books.
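A common approach to this kind of indicator scaling is to compare each new value with the indicator's own recent history and squash the result into a fixed range. The sketch below is an illustration under stated assumptions, not TSSB's exact formula. It computes a rolling z-score over a hypothetical 250-bar lookback, using past bars only to avoid look-ahead, and maps the z-score through the normal CDF onto the range -50 to +50. The function name is hypothetical.

```python
import math

import numpy as np


def normalize_indicator(values, lookback=250):
    """Map each value to [-50, 50] relative to its own trailing history.

    Uses only the previous `lookback` bars for the mean and standard
    deviation, so no future information leaks into the result.
    Bars without a full history (or with zero variance) are NaN.
    """
    x = np.asarray(values, dtype=float)
    out = np.full(len(x), np.nan)
    for i in range(lookback, len(x)):
        window = x[i - lookback:i]
        sd = window.std(ddof=1)
        if sd > 0:
            z = (x[i] - window.mean()) / sd
            normal_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
            out[i] = 100.0 * normal_cdf - 50.0
    return out


# A steadily rising raw series: each new value sits well above its recent mean.
raw = np.arange(300, dtype=float)
scaled = normalize_indicator(raw, lookback=250)
print(scaled[250:255])
```

Values near 0 mean "typical for recent history", while values near either end mean unusually high or low. Because the output is bounded, a single wild raw value cannot dominate a model's inputs.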

Indicator selection is very clearly explained, using tools such as chi-square tests, Non-Redundant Indicator scanning, visual examination of indicator-target relationships, find groups, exclusion groups, and stepwise model selection.
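The chi-square idea behind indicator screening can be sketched generically. The indicator is binned into equal-count groups and cross-tabulated against whether the target went up. Pearson's chi-square statistic is then computed, and a larger value means the target's distribution differs more across indicator bins. This is a plain Python illustration with a hypothetical function name. TSSB's own chi-square test may bin and report its results differently.

```python
import numpy as np


def chi_square_score(indicator, target, n_bins=3):
    """Pearson chi-square of (indicator bin) x (target up/down) counts.

    indicator: numeric values; target: booleans (True = up).
    Bins are equal-count groups based on the indicator's quantiles.
    """
    indicator = np.asarray(indicator, dtype=float)
    target = np.asarray(target, dtype=bool)
    edges = np.quantile(indicator, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.searchsorted(edges, indicator, side="right")  # 0 .. n_bins-1

    table = np.zeros((n_bins, 2))
    for b, t in zip(bins, target):
        table[b, int(t)] += 1

    # Expected counts if the target were independent of the indicator bin.
    expected = (table.sum(axis=1, keepdims=True)
                * table.sum(axis=0, keepdims=True) / table.sum())
    mask = expected > 0
    return float((((table - expected) ** 2)[mask] / expected[mask]).sum())


x = np.arange(90, dtype=float)
print(chi_square_score(x, x >= 60))                  # prints 90.0
print(chi_square_score(x, np.arange(90) % 2 == 0))   # prints 0.0
```

In the first case the target is fully determined by the top indicator bin, so the score is large. In the second case the target alternates regardless of the indicator, so the score is zero.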

It is worth mentioning that TSSB is currently only a research tool; the models created cannot simply be exported to a trading program (e.g. TradeStation), though apparently this is planned for a future release. This is perhaps my biggest gripe: I fear I will create an amazing system but be unable to implement it in real life. In the meantime, with some time and help, it should be possible to re-create much of the functionality in R (and use a trading program that integrates with R), or to re-create it directly in a trading program that supports predictive functions.

I initially found out about the book over at the CssAnalytics blog and bought it shortly after. Here is why: the book fills a HUGE gap.

Over the last 6 months, I have taken up predictive modelling as a hobby and applied it to the Forex market. During those months (outside of my day job), I learned the basics of the data mining process, how to program and implement predictive models in R, and how to apply them to Forex. It's been a lot of fun, and I have gotten somewhat successful results so far.

I learned from the following materials in order:

- "Data Science for Business" by Foster Provost & Tom Fawcett, from my MBA alma mater NYU Stern. This is a fantastic, clear book that describes the data mining process and various models from a conceptual and logical perspective.

- An online course at Stanford from Professors Hastie and Tibshirani, pioneers in the field, which blends theory with practice to show how to implement various classification and regression models using R.

- "Applied Predictive Modeling" by Max Kuhn: an amazing book on implementing predictive models in R, mostly using the R caret package, which is a wrapper around over 100 predictive models in R that essentially automates resampling (bootstrapping, cross-validation, data splitting) as well as model evaluation and comparison.

- White papers that build predictive models for forex by generating a bunch of technical indicators and then running predictive models (SVM, random forest, etc.) on them.

As you can see from the above, I have learned from some really smart people how to:

1) Create and evaluate predictive models
2) Program in R
3) Apply them to markets such as Forex

Of the above, I have gotten a fairly good grasp of #1 and #2. This book fills the gap on #3!

The following points in the blog interview caught my initial interest in the book:

"It is possible for a model to have poor error reduction across the entire range of its forecasts while being profitable for trading because when its forecasts are extreme they carry useful information. It is more appropriate to use financial measures such as the profit factor which are all included as objective functions within TSSB."

I literally had that realization about the effectiveness of EXTREME forecasts just this last week: plotting the residuals against the predictions made it very clear. If I had read the book earlier, I probably wouldn't have needed 6 months to discover that point! Though the process of discovering it myself was fun too.

"Even the best conventional technical indicators have only a small amount of predictive information. The vast majority is noise. Thus the task is to model that tiny amount of useful information in each indicator."

Wow, is that true! I'm super interested in learning how to model just the useful information in each indicator. Cool concept.

"In my opinion, the way to differentiate or uncover real opportunities currently lies in the clever engineering of new features, such as better indicators."

I've been using R's TTR package as my sole source of indicators. While there are a LOT of indicators in the TTR package, I'm very interested in the 100 or so you mentioned in the software.
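Returning to the first quoted point about extreme forecasts and the profit factor, here is a small illustration. The profit factor (gross profit divided by gross loss) can be computed once for all forecasts and once for the extreme forecasts only. The minimal sketch below assumes a simple rule. It goes long when the forecast is above a hypothetical threshold, short when it is below minus that threshold, and stays flat otherwise. The toy numbers are made up purely for illustration, and this is not TSSB's implementation.

```python
import numpy as np


def profit_factor(forecasts, returns, threshold=0.0):
    """Gross profit / gross loss of a threshold rule (hypothetical helper).

    Long when forecast > threshold, short when forecast < -threshold,
    flat otherwise.
    """
    f = np.asarray(forecasts, dtype=float)
    r = np.asarray(returns, dtype=float)
    positions = np.where(f > threshold, 1.0, np.where(f < -threshold, -1.0, 0.0))
    trade_returns = positions * r
    gross_profit = trade_returns[trade_returns > 0].sum()
    gross_loss = -trade_returns[trade_returns < 0].sum()
    return float("inf") if gross_loss == 0 else float(gross_profit / gross_loss)


# Toy data: the small forecasts are mostly noise, the large ones are informative.
forecasts = [0.9, 0.1, -0.2, -0.8, 0.7, 0.15]
returns = [2.0, -1.0, 1.0, -2.0, -0.5, 0.5]
print(profit_factor(forecasts, returns, threshold=0.0))  # prints 1.8 (every forecast)
print(profit_factor(forecasts, returns, threshold=0.5))  # prints 8.0 (extreme only)
```

In this toy data, the rule is only mildly profitable when it acts on every forecast but much more profitable when it acts only on the extreme ones, which is the effect the quote describes.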

--------------------

From an excited modeler. Cheers!
