20121108 sntmnt data_sciencenl

Post on 27-Jan-2015


the prevailing attitude of investors as to anticipated price development in a market.

< sen·ti·ment >

Tim Harbers, CTO SNTMNT

DataScienceNL Meetup, November 8th 2012

Tim Harbers

Background

BSc Computer Science

MSc Computer Science

Researcher

Data Miner

Technical Consultant

Co-Founder and COO

Co-Founder and CTO

Vincent van Leeuwen, Customer Development

Kees van Nunen, Product Development

Durk Kingma, Data Mining Expert

Tim Harbers, Machine Learning Expert

The Rockstars

‣ Balanced multidisciplinary team

‣ Two machine learning experts in predictive analysis and large datasets

‣ Academic degrees in Behavioral Finance, Portfolio Finance, Strategic Management & Artificial Intelligence

‣ Strong network in (Dutch) financial industry

‣ Young, enthusiastic team with a proven entrepreneurial mindset

How to select the right stock to invest in?

Our solution:

Predicting stock price movement based on online buzz

Engineered based on academic research:

Bollen et al. (2010)

Sprenger and Welpe (2010)

Van Leeuwen (2011)

Sehgal and Song (2007)

Why would this work?

‣ Very different from traditional indicators

‣ News travels faster via social than traditional media

‣ Tremendous amount of data

‣ (Almost) nobody uses it yet

Why focus on Twitter?

‣ Public data & easily accessible

‣ Structured language

‣ 400M tweets per day

Historic Research

Bollen (2010): Created a model based on Twitter mood states, which was 86% accurate on the DJI.

Sprenger and Welpe (2011): Analyzed correlation of the stock market and micro blogs.

Financial Sentiment vs Brand Sentiment

Financial Sentiment

‣ Tweets relating to stocks

‣ Written by traders

‣ Trader mumbo jumbo

‣ More relevant

‣ Shorter term

Brand Sentiment

‣ Tweets relating to brands

‣ Written by consumers

‣ Any language

‣ Larger dataset

‣ Longer term

Data setup

Period: June 2010 to April 2012

Stocks: Top 15 most tweeted stocks in S&P 500

Tweets: Financial dataset Timm Sprenger (4 million tweets), Topsy brand tweets (100+ million tweets)

Other: Klout, PeerIndex

Sentiment Scoring

Financial tweets

Commercial tweets

Sentiment analysis:

Enabling computers to derive sentiment from natural language

Naive Approach: Dictionaries

‣ Use a dictionary of common positive and negative terms

‣ Count the number of positive and negative terms

‣ Use the difference between the two.
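
The dictionary approach can be sketched in a few lines of Python. The word lists below are tiny, made-up stand-ins for a real sentiment dictionary, and the function name `dictionary_score` is hypothetical; the slides do not say which dictionary was used.

```python
import re

# Hypothetical mini-lexicons; a real dictionary would hold thousands of terms.
POSITIVE = {"good", "great", "up", "bullish", "buy", "gain", "strong"}
NEGATIVE = {"bad", "down", "bearish", "sell", "loss", "weak", "miss"}


def dictionary_score(tweet):
    """Return (number of positive terms) - (number of negative terms)."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


print(dictionary_score("Strong earnings, $AAPL looks bullish"))  # 2
print(dictionary_score("Weak guidance, time to sell"))           # -2
print(dictionary_score("Not good at all"))                       # 1, negation is ignored
```

The last call shows why the approach is naive: "not good" still counts as positive because only individual terms are looked up.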

SNTMNT’s approach: machine learning

‣ Label a training set of tweets (target)

‣ Use preprocessing techniques

‣ Use several feature extractors

‣ Create a sparse dataset.

‣ Use supervised learning to train a machine learning model.
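
A minimal sketch of these steps, not SNTMNT's actual pipeline: the slides do not name the feature extractors or the model, so this example assumes unigram and bigram counts as sparse features and a small multinomial Naive Bayes classifier written from scratch. The training tweets and the names `extract_features` and `NaiveBayes` are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(tweet):
    """Sparse feature vector: unigram and bigram counts stored in a Counter."""
    tokens = re.findall(r"[a-z$#@']+", tweet.lower())
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, tweets, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tweet, label in zip(tweets, labels):
            features = extract_features(tweet)
            self.feature_counts[label].update(features)
            self.vocabulary.update(features)
        return self

    def predict(self, tweet):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.feature_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for feature, value in extract_features(tweet).items():
                score += value * math.log((counts[feature] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training tweets standing in for a hand-labeled set.
train_tweets = [
    "$AAPL breaking out, going long",
    "great quarter, $GOOG to the moon",
    "$MSFT looks weak, going short",
    "terrible guidance, selling $IBM",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_tweets, train_labels)
print(model.predict("going long $GOOG"))     # positive
print(model.predict("selling, looks weak"))  # negative
```

Only the features that actually occur in a tweet are stored, which is what keeps the dataset sparse even when the vocabulary grows to millions of n-grams.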

Labeling

• 25K Financial tweets hand labeled

• 30K Commercial tweets hand labeled

• 1M #happy vs. #sad

Difficulties in sentiment analysis

‣ Authors / URLs

‣ Foreign languages

‣ Slang: aykm lol tgsttttptct

‣ Negation

‣ Target Sentiment Analysis
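
Some of these difficulties are commonly handled in preprocessing. The sketch below is an assumption about what such a step could look like, not the method from the talk: it collapses URLs and author mentions into placeholder tokens and prefixes words that follow a negation with `NOT_` until the next punctuation mark. The negation list and the function name `preprocess` are hypothetical, and slang and foreign languages are not addressed.

```python
import re

# Hypothetical, deliberately short negation list.
NEGATIONS = {"not", "no", "never", "don't", "can't", "won't", "isn't"}
PUNCTUATION = set(".,!?;")


def preprocess(tweet):
    """Normalize a tweet into tokens, marking words that follow a negation."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " URL ", text)  # collapse links
    text = re.sub(r"@\w+", " USER ", text)         # collapse author mentions
    tokens = re.findall(r"[A-Za-z$#']+|[.,!?;]", text)

    result, negated = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False  # punctuation ends the negation scope
            continue
        result.append("NOT_" + token if negated else token)
        if token in NEGATIONS:
            negated = True
    return result


print(preprocess("@trader I don't like $AAPL today, buying http://t.co/x"))
# ['USER', 'i', "don't", 'NOT_like', 'NOT_$aapl', 'NOT_today', 'buying', 'URL']
```

With this marking, "like" and "NOT_like" become separate features, so a classifier can learn opposite weights for them.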

Results

Financial tweets

‣ 84.3% accurate on 2-point scale (baseline: 60.4%)

‣ 76.8% accurate on 3-point scale (baseline: 65.0%)

‣ Beat Lexalytics (84.3% vs. 70.3%)

Commercial tweets

‣ 84.7% accurate on 2-point scale (baseline: 61.0%)

‣ 86.9% accurate on 3-point scale (baseline: 81.1%)
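
The slides do not define the baseline. A common choice, assumed here, is the accuracy obtained by always predicting the most frequent label. The sketch below computes both numbers on made-up labels.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the hand-assigned labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def majority_baseline(actual):
    """Accuracy of always predicting the most frequent label."""
    return Counter(actual).most_common(1)[0][1] / len(actual)


# Made-up labels for ten tweets on a 2-point scale.
actual = ["pos", "pos", "pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
predicted = ["pos", "pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]

print(accuracy(predicted, actual))  # 0.8
print(majority_baseline(actual))    # 0.6
```

Under that assumption, a high baseline simply reflects an unbalanced label distribution, so the gap between accuracy and baseline is more informative than the accuracy alone.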

Stock Regression

Input:

‣ Sentiment scores

‣ Mood states

‣ Meta data

‣ Stock

Output:

‣ Trading indication

‣ Confidence

Many dimensions

‣ Tweet period

‣ Trading period

‣ Financial tweets or commercial tweets

‣ Tweet crunchers

‣ Models

‣ Trading strategy

Tweet Aggregation Problem

‣ Tweet volume

‣ Volume positive tweets

‣ Avg sentiment

‣ Sentiment growth

‣ Etc.
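
One way to picture the aggregation problem: many scored tweets per period have to be collapsed into a single feature row. The sketch below assumes daily periods and sentiment scores between -1 and 1; the data, the field names, and the function `aggregate_by_day` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# (day, sentiment score) pairs; values are made up.
scored_tweets = [
    ("2012-03-01", 0.6), ("2012-03-01", -0.2), ("2012-03-01", 0.4),
    ("2012-03-02", 0.8), ("2012-03-02", 0.5),
]


def aggregate_by_day(tweets):
    """Collapse scored tweets into one feature row per day."""
    by_day = defaultdict(list)
    for day, score in tweets:
        by_day[day].append(score)

    rows, previous_avg = [], None
    for day in sorted(by_day):
        scores = by_day[day]
        avg = mean(scores)
        rows.append({
            "day": day,
            "volume": len(scores),
            "positive_volume": sum(s > 0 for s in scores),
            "avg_sentiment": avg,
            # Day-over-day change; undefined for the first day.
            "sentiment_growth": None if previous_avg is None else avg - previous_avg,
        })
        previous_avg = avg
    return rows


for row in aggregate_by_day(scored_tweets):
    print(row)
```

Each choice here (period length, which statistics to keep) is one of the dimensions that has to be searched when building the regression input.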

Machine Learning Models

‣ Linear Regression

‣ Bayesian Approaches

‣ Decision Trees

‣ Neural Nets

‣ Support Vector Machines

Results

‣ R² < 0.01

‣ Not usable as an independent trading model after transaction costs.

‣ Still usable as an extra indicator to be used by proven trading models.
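
For readers unfamiliar with the metric: R² is the share of variance in the target that the model explains, so a value below 0.01 means less than 1% of the variance is explained. The sketch below fits a one-variable least-squares line and computes R². The numbers are made up and are not meant to reproduce the reported result, and the single-feature setup is an assumption for brevity.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope


def r_squared(y, predictions):
    """1 - (residual sum of squares / total sum of squares)."""
    y_bar = mean(y)
    ss_res = sum((b - p) ** 2 for b, p in zip(y, predictions))
    ss_tot = sum((b - y_bar) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Made-up daily average sentiment and next-day returns (in percent).
sentiment = [0.1, 0.4, -0.2, 0.3, 0.0, -0.1]
returns = [0.5, -0.3, 0.2, 0.1, -0.4, 0.3]

intercept, slope = fit_line(sentiment, returns)
predictions = [intercept + slope * s for s in sentiment]
print(round(r_squared(returns, predictions), 3))
```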

Stock Dashboard (B2B2C)

Sentiment APIs (B2B)

Trading Indicator API (B2B)

Products - next steps:

‣ Extend scope to further niche domains and languages.

‣ Market leader and thought leader in financial sentiment analysis.

‣ Gaining more insight into the added value of the SNTMNT algorithm as an indicator next to fundamental and technical analysis.

Any questions?

For more info, visit:

www.SNTMNT.com
