Domain Expertise and Unstructured Data

William D. MacMillan and Evan A. Schnidman

Open Data Science Conference, Boston 2015 @opendatasci


Post on 12-Aug-2015



▶ Everyone seems to love collecting and mining unstructured data.
▶ How to make decisions based on it?

Big Data -> Consequential Decisions?

Tools to Find Paths

Machine Learning Structural Methods

Forest or Trees?

Data Without Domain Expertise

Domain Expertise Without Data

▶Expertise allows us to impose structure on otherwise messy results.

Imposing Structure

▶ Data is not limited to numerical.

▶ Information not Data

▶ How to analyze:

-Corporate Communications?

-Central Bank Communications?

▶ Need to know things not easily vectorized.

▶ Dimension reduction by applying information.

Data is Everywhere

▶Good Buzzword minus Bad Buzzword == Sentiment

Traditional Sentiment Analysis
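The "good buzzword minus bad buzzword" idea can be sketched in a few lines. This is a minimal illustration only: the word lists and the function name `buzzword_sentiment` are hypothetical and do not come from the talk.

```python
import re

# Hypothetical word lists, chosen only for illustration (not from the talk).
POSITIVE = {"strong", "growth", "improve", "gain"}
NEGATIVE = {"weak", "decline", "risk", "loss"}


def buzzword_sentiment(text: str) -> int:
    """Return (count of positive words) minus (count of negative words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    good = sum(token in POSITIVE for token in tokens)
    bad = sum(token in NEGATIVE for token in tokens)
    return good - bad


# Two positive hits ("strong", "growth") and one negative hit ("risk").
print(buzzword_sentiment("Strong growth, but downside risk remains."))  # 1
```

Every word counts the same and context is ignored, which is the limitation the following slides point to.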

▶Domain expertise allows for much more refined analysis

▶Not a pure data science solution

▶Time for experts to embrace tech and data science to utilize experts!

▶ Central Bank communications are complex and important
▶ Focus today is the Federal Reserve

Example: Central Banks

The Briefcase Watch

Traditional Fed Watching

Not Much has Changed

Failed Attempts

▶ Experts are biased and fail to be comprehensive
▶ Simple text analysis dictionaries don’t work for Fed Speak and other complex language
▶ Ex. “modest” v. “moderate”

Necessary Components

▶ Must use expertise to train the system based on whole communications
▶ Market response matters (Hawkish v. Dovish)

Experts in “Fed Speak”
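The “modest” v. “moderate” example can be made concrete with a small sketch. It assumes a general-purpose word list in which neither word appears; the lexicon, the sentences, and the function name `lexicon_score` are all hypothetical.

```python
import re

# Hypothetical general-purpose lexicon (an assumption for illustration).
# Neither "modest" nor "moderate" is listed, as is typical of generic lists.
GENERIC_LEXICON = {"growth": 1, "expanded": 1, "weak": -1, "declined": -1}


def lexicon_score(text: str, lexicon: dict) -> int:
    """Sum the lexicon weights of every token in the text (unknown words = 0)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


# Two made-up sentences that differ only in the one word a Fed watcher reads closely.
statement_a = "Economic activity expanded at a modest pace."
statement_b = "Economic activity expanded at a moderate pace."

print(lexicon_score(statement_a, GENERIC_LEXICON))
print(lexicon_score(statement_b, GENERIC_LEXICON))
```

Both sentences receive the same score, so the distinction an expert would draw between them is invisible to the dictionary.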

Scaling Data


Enough documents can eliminate bias

Expertise allows scaling based on whole documents

End result is whole communications scored in orderly fashion

Resulting Data:
▶ Comprehensive
▶ Unbiased
▶ Quantitative
▶ Fast

Many Possible Uses
▶ Eliminate post-hoc hedging on CB policy
▶ Forecast based on established correlations
▶ Add as a signal in multifactor model

Qual Turned Quant

Trend matters more than value!
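One way to read "trend matters more than value" is to compare the latest score with the average of the scores just before it, rather than looking at its level. The sketch below assumes that reading; the function name `trend_signal`, the window length, and the sample scores are hypothetical, and the talk does not specify how the trend is actually computed.

```python
def trend_signal(scores, window=3):
    """Return +1, -1, or 0 depending on whether the latest score is above,
    below, or equal to the mean of the `window` scores that precede it."""
    if len(scores) < window + 1:
        raise ValueError("need at least window + 1 scores")
    baseline = sum(scores[-window - 1:-1]) / window
    change = scores[-1] - baseline
    if change > 0:
        return 1
    if change < 0:
        return -1
    return 0


# Hypothetical score histories (not real data).
falling_but_positive = [0.8, 0.7, 0.6, 0.4]     # level > 0, trend down
rising_but_negative = [-0.5, -0.4, -0.3, -0.1]  # level < 0, trend up

print(trend_signal(falling_but_positive))  # -1
print(trend_signal(rising_but_negative))   # 1
```

In both cases the sign of the signal is the opposite of the sign of the latest value, which is the distinction the slide draws.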

▶ Alpha across asset classes, not just Fixed Income
▶ Mitigates downside risk, especially with Equities.
▶ Beats Buy and Hold and Trend Following
▶ Low correlation to commonly used strategies
▶ Better performance with FOREX because both sides of currency pair trade.

Backtested Data

Graph Courtesy of Mavenomics

▶ Method translates across a wide variety of financially important texts
▶ Regulatory and shareholder documents for individual equities
▶ Other regulatory information (Dodd-Frank, FDA, EPA, etc.)

Other Applications

Data + Domain Expertise

Prattle Analytics
Tradable Data From Market Chatter

QUESTIONS?

Evan A. Schnidman [email protected]


EXTRA SLIDES


List of Central Banks

U.S. Federal Reserve
European Central Bank
Bank of England
Bank of Canada
Bank of Japan
Reserve Bank of Australia
Bank of Korea
Reserve Bank of India
Swedish Riksbank
Reserve Bank of New Zealand
Central Bank of Mexico
Central Bank of Brazil
Central Bank of Russia
South African Reserve Bank
Bank of Israel
Central Bank of Turkey
Central Bank of Taiwan
Swiss National Bank

Tree Maps

Tree Maps Courtesy of Mavenomics

Forecasting

Backtesting

Independent Backtesting Results

The following results are from a fund that independently tested the Fed Playbook data in January 2015. The fund primarily used a standard return-to-volatility futures trading strategy, based on a common risk parity model, to test the FPSI data from January 2000 to December 2014. All transaction costs are built into the testing. Its findings indicated the following:

• The FPSI is a superior trade signal to both of the most common trading strategies, “Trend Following” and “Buy and Hold.”

EQUITIES
• Using a simple portfolio of the S&P 500, both Trend Following and Buy and Hold generate returns of roughly 27% over the testing period.
• The FPSI generates risk-adjusted returns of 58%, more than double the most commonly used trading strategies.
• FPSI returns were generated with almost perfect long/short balance.
• The FPSI has only a 0.3 correlation to Trend Following and just a 0.1 correlation to Buy and Hold, so the FPSI can be used in concert with these established strategies to generate even higher returns.
• The FPSI also proved to be a superb indicator of downside risk, even beating Trend Following.
• The optimal holding period for an equity portfolio traded on FPSI data is 2-3 months.

FOREX
• Examining only the U.S. Dollar and Euro, based on just U.S. data, indicates that the FPSI outperforms existing currency trading models.
• Trend Following tends to dominate the currency trading space because over the sample period it generated a 55% return.
• Over the same period the FPSI generates over 70% returns.
• The FPSI has only a 0.17 correlation to Trend Following, so these two strategies could be used in concert to generate even higher returns.
• The optimal holding period for a currency trade based on the FPSI data is 10-15 days.
• These returns take into account only Prattle Analytics’ data on the U.S. Federal Reserve. Because Prattle also has data on the European Central Bank (along with more than a dozen other central banks), that information could be used to better understand the other side of the currency pair trade and generate even greater returns.
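The correlation figures quoted above are the kind of number a Pearson correlation between two strategies' return series produces. The sketch below shows that calculation under stated assumptions: the function name `pearson` and the return series are made up for illustration, and the fund's actual methodology is not described in the slides.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up monthly returns for two strategies (not results from the backtest).
signal_returns = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]
trend_returns = [0.01, 0.02, -0.01, 0.01, -0.02, 0.00]

print(round(pearson(signal_returns, trend_returns), 2))
```

A value near zero means the two return streams rarely move together, which is why a low-correlation signal can be combined with an established strategy.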


Using Domain Expertise To Improve Text Analysis
