(MED302) Leveraging Cloud-Based Predictive Analytics to Strengthen Audience Engagement | AWS…
DESCRIPTION
To improve audience engagement, media companies must deal with vast amounts of raw data from web, social media, devices, catalogs, and back-channel sources. This session dives into predictive analytics solutions on AWS: we present architecture patterns for optimizing media delivery and tuning the overall user experience based on representative data sources (video player clickstream, web logs, CDN logs, user profiles, social media sentiment, etc.). We dive into concrete implementations of cloud-based machine learning services and show how they can be leveraged for profiling audience demand, cueing content recommendations, and prioritizing delivery of related media. Services covered include Amazon EC2, Amazon S3, Amazon CloudFront, and Amazon EMR.
TRANSCRIPT
Michael Limcaco, Amazon Web Services
Content discovery … and the conversation around it … matter!
[1] http://www.slideshare.net/AmazonWebServices/maximizing-audience-engagement-in-media-delivery-med303-aws-reinvent-2013-28622676
[2] http://www.nielsen.com/content/corporate/us/en/press-room/2013/new-nielsen-research-indicates-two-way-causal-influence-between-.html
[3] http://www.google.com.au/think/research-studies/quantifying-movie-magic.html
Engagement signals: Search, Watch, Listen, Play, Download, Purchase, Contact sales, Subscribe, Contact support, Cancel, Rate It, Review It, Upgrade It, Sharing, Tagging, Bookmarking, Social Sentiment
• Descriptive
– Retrospective
– What happened or is happening
– Simple aggregations and counters
• Predictive
– Statistical forecast
– Predict a value in a dataset
– Machine learning
• Prescriptive (emergent)
– What should I do about it?
[Diagram: Descriptive → Predictive → Prescriptive; machine learning turns incoming signals into predictions such as recommendations, clustering, and classification]
Tooling landscape (single node vs. big data):
Visualization & analysis (single node): R, Octave, MATLAB, Excel, KNIME, WEKA, Python kits
Visualization & analysis (big data): GraphLab, Mahout, Spark MLlib, H2O
Storage (single node): RDBMS, SAN/NAS, DAS
Storage (big data): HDFS, HBase
Use Case 1: Recommendations
[Stack: recommendation, clustering, and classification on top of math libraries (Spark, H2O) running on Hadoop MapReduce]
Estimate similar users and items
http://www.slideshare.net/tdunning/recommendation-techn
Interaction logs, (user, item) pairs:
User1 → Thing1, Thing2, Thing3
User2 → Thing2, Thing4
User3 → Thing3
User5 → Thing1
(example users: Mike, Jon, Mary, Phil, Kris)

Logs → history matrix (user × item) → item-item cooccurrence matrix → LLR (log-likelihood ratio) test → indicators ("items similar to this")
[Matrix figures: history matrix and item-item matrix with example cooccurrence counts 2, 8, 2, 4]
Items similar to this (by title):
Superman → Highlander, Dune
Star Wars → Raiders, Minority Report
Highlander → Superman
Mulan → Home Alone, Mermaid
Star Trek → …
… → …

The same table by item ID:
4587 → 223, 5234
748 → 5345, 235
12 → 8234
245 → 9543, 7673
3456 → 4587
… → …
Search index documents with an indicator field:
748 Star Wars → 45, 235
12 Highlander → 8234
245 Mulan → 9543, 7673
4587 Superman → 12, 5234
3456 Star Trek → 2458 …

Query "12" → results: 5345, 3456, 12
[Architecture: users on mobile devices interact with media platforms (search, play, buy, rate); recommendations are served back to them]
https://github.com/apache/mahout
Indicators ("items similar to this") sample output:
movie-b   movie-c:2.772588722239781 movie-a:2.772588722239781
movie-d   …

% mahout spark-itemsimilarity \
    -i input-folder/data.txt \
    -o output-folder/ \
    --filter1 buy -fc 1 -ic 2 \
    --filter2 view
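The Mahout job above computes these indicators at scale. As a minimal local sketch of the same cooccurrence-plus-LLR idea, assuming the slide's (user, item) log as input (the thresholding choice here is illustrative, not from the talk):

```python
from collections import defaultdict
from itertools import combinations
from math import log

# Interaction log shaped like the slide's example: (user, item) pairs.
logs = [("User1", "Thing1"), ("User2", "Thing2"), ("User3", "Thing3"),
        ("User2", "Thing4"), ("User5", "Thing1"), ("User1", "Thing2"),
        ("User1", "Thing3")]

def xlogx(x):
    return 0.0 if x == 0 else x * log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy, as in Mahout's LLR implementation.
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    # Dunning's log-likelihood ratio for a 2x2 cooccurrence table:
    # k11 = users with both items, k12/k21 = one item only, k22 = neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

# Logs -> history (item -> set of users who touched it).
item_users = defaultdict(set)
for user, item in logs:
    item_users[item].add(user)
n_users = len({user for user, _ in logs})

# Item-item cooccurrence -> LLR-scored indicators ("items similar to this").
indicators = defaultdict(list)
for a, b in combinations(sorted(item_users), 2):
    both = len(item_users[a] & item_users[b])
    only_a = len(item_users[a]) - both
    only_b = len(item_users[b]) - both
    neither = n_users - both - only_a - only_b
    score = llr(both, only_a, only_b, neither)
    if score > 0:  # keep only anomalously strong cooccurrence
        indicators[a].append((b, score))
        indicators[b].append((a, score))
```

The score 2.772588722239781 in the sample output above is exactly what this LLR gives for a pair seen together by one user in a two-user population, which is why the same constant appears twice in Mahout's toy output.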
Use Case 2: Sentiment Classification
Classify (estimate) as Positive | Negative
"I thought Star Wars Episode 28 was not without merit"
https://github.com/cyhex/streamcrab
[Architecture: users on mobile devices interact with media platforms (search, play, buy, rate, recommend); social media activity feeds in as an additional signal]
[Pipeline: social media text → extract features → classify (Positive | Negative); the model is trained on labeled examples, e.g. "I adored this movie" contributes the feature "adore" = POSITIVE]
http://www.nltk.org/book/ch06.html
TextBlob + Natural Language Toolkit (NLTK)
from textblob.classifiers import NaiveBayesClassifier

training_data = [('I love this movie', 'Positive'),
                 ('This makes me mad', 'Negative'), …]
my_classifier = NaiveBayesClassifier(training_data)

my_classifier.classify("I thought Star Wars Episode 29 was not without merit")
# → "Positive"
from amazon_kclpy import kcl
import json, base64

class RecordProcessor(kcl.RecordProcessorBase):
    def process_records(self, records, checkpointer):
        for record in records:
            # …
            inbound_tweet = base64.b64decode(record.get('data'))
            sentiment = my_classifier.classify(inbound_tweet)
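TextBlob's NaiveBayesClassifier wraps a bag-of-words Naive Bayes model. A self-contained sketch of what that train-then-classify step does, with add-one smoothing; the training sentences and tokenizer here are illustrative, not from the talk:

```python
from collections import Counter, defaultdict
from math import log

# Illustrative training set in the spirit of the slide.
training_data = [("I love this movie", "Positive"),
                 ("What a great film, I adored it", "Positive"),
                 ("This makes me mad", "Negative"),
                 ("Terrible plot, I hated it", "Negative")]

def tokens(text):
    # Crude tokenizer: lowercase words, punctuation stripped.
    return [w.strip(",.!?").lower() for w in text.split()]

# Train: per-label word counts plus label priors.
word_counts = defaultdict(Counter)
label_counts = Counter()
vocab = set()
for text, label in training_data:
    label_counts[label] += 1
    for w in tokens(text):
        word_counts[label][w] += 1
        vocab.add(w)

def classify(text):
    # Naive Bayes: prior + summed log-likelihoods, add-one smoothing.
    best_label, best_score = None, float("-inf")
    total = sum(label_counts.values())
    for label in label_counts:
        score = log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens(text):
            score += log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify("I adored this movie"))  # → Positive
```

The word "adored" only ever appears under the Positive label in the training data, so it pulls the score toward Positive, which is exactly the "adore" = POSITIVE feature idea from the slide.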
Example feature vectors (four observations of the same title):
Mulan: [12, 2, 7, 85, 1, 997]
Mulan: [1, 5, 99, 85, 50, 4]
Mulan: [1, 2, 3, 4, 5, 6]
Mulan: [3, 1, 4, 6, 7, 9]
Use Case 3: Clustering
This is a form of unsupervised learning
Segaran, Toby. Programming Collective Intelligence. Sebastopol: O’Reilly, 2009. Print.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6374152&isnumber=6374097
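The feature-vector rows above are what an unsupervised grouping step consumes. A minimal k-means sketch of that step; the titles other than Mulan and all the numbers are invented for illustration:

```python
from math import dist          # Euclidean distance, Python 3.8+
from random import seed, sample

# Hypothetical per-title feature vectors like the slide's "Mulan" rows.
vectors = {
    "Mulan":      [12, 2, 7, 85, 1, 997],
    "Mermaid":    [10, 5, 9, 80, 3, 950],
    "Highlander": [90, 70, 2, 5, 60, 10],
    "Superman":   [85, 75, 4, 8, 55, 12],
}

def kmeans(points, k, iters=20):
    seed(42)                          # deterministic init for the demo
    centroids = sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[i].append(p)
        # Move each centroid to the mean of its cluster.
        centroids = [[sum(col) / len(cluster) for col in zip(*cluster)]
                     if cluster else centroids[i]
                     for i, cluster in enumerate(clusters)]
    return centroids

centroids = kmeans(list(vectors.values()), k=2)
cluster_of = {t: min(range(2), key=lambda j: dist(v, centroids[j]))
              for t, v in vectors.items()}
```

With these well-separated toy vectors the two family-animation titles land in one cluster and the two action titles in the other, regardless of which points are drawn as initial centroids.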
R + H2O: a data science desktop (R front end) driving a machine learning cluster (H2O)

% java -jar h2o.jar
Use Case 4: Churn Prediction
Customer | Geo | Account Type | Account Age | Support Tickets | Minutes Streamed | Churn?
Mike     | CA  | Premium      | 120         | 10              | 240              | TBD
John     | CA  | Basic        | 240         | 1               | 140              | TBD
Ingrid   | WA  | Premium      | 60          | 5               | 1800             | TBD
Mark     | WA  | Basic        | 30          | 0               | 0                | TBD
Usman    | WA  | Basic        | 720         | 0               | 360              | TBD
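The session points to hosted tools such as BigML for filling in the Churn? column. As a tiny local sketch of the idea, here is a nearest-neighbour classifier over the table's numeric features; the labeled history it learns from is entirely invented for illustration:

```python
from math import dist

# Feature order: account age (days), support tickets, minutes streamed.
# Hypothetical labeled history (did the customer churn?).
history = [
    ((100, 8, 200), True),    # heavy ticket load, light usage
    ((300, 0, 2000), False),  # long-tenured heavy streamer
    ((40, 6, 100), True),
    ((600, 1, 500), False),
    ((25, 0, 10), True),      # barely used the service
]

# The unlabeled customers from the table above.
customers = {
    "Mike":   (120, 10, 240),
    "John":   (240, 1, 140),
    "Ingrid": (60, 5, 1800),
    "Mark":   (30, 0, 0),
    "Usman":  (720, 0, 360),
}

# Scale each feature to the training range so minutes streamed
# doesn't dominate the distance calculation.
cols = list(zip(*[f for f, _ in history]))
lo = [min(c) for c in cols]
hi = [max(c) for c in cols]

def scale(f):
    return [(v - l) / (h - l) if h > l else 0.0
            for v, l, h in zip(f, lo, hi)]

def predict_churn(features):
    # 1-nearest-neighbour vote over the labeled history.
    nearest = min(history, key=lambda ex: dist(scale(features), scale(ex[0])))
    return nearest[1]

for name, feats in customers.items():
    print(name, "churn?", predict_churn(feats))
```

A real deployment would use a trained model (decision trees, logistic regression) rather than 1-NN, but the shape of the problem is the same: labeled historical rows in, a Churn? prediction per customer out.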
http://www.bigml.com
AWS Marketplace
Software
• BigML
• Revolution R Enterprise
• PredictionIO
• Yhat
• Mortar
• Zementis
http://bit.ly/awsevals