(MED302) Leveraging Cloud-Based Predictive Analytics to Strengthen Audience Engagement | AWS…
DESCRIPTION
To improve audience engagement, media companies must deal with vast amounts of raw data from web, social media, devices, catalogs, and back-channel sources. This session dives into predictive analytics solutions on AWS: we present architecture patterns for optimizing media delivery and tuning the overall user experience based on representative data sources (video player clickstream, web logs, CDN logs, user profiles, social media sentiment, etc.). We dive into concrete implementations of cloud-based machine learning services and show how they can be leveraged for profiling audience demand, cueing content recommendations, and prioritizing delivery of related media. Services covered include Amazon EC2, Amazon S3, Amazon CloudFront, and Amazon EMR.
TRANSCRIPT
Michael Limcaco, Amazon Web Services
Content discovery … and the conversation around it … matter!
[1] http://www.slideshare.net/AmazonWebServices/maximizing-audience-engagement-in-media-delivery-med303-aws-reinvent-2013-28622676
[2] http://www.nielsen.com/content/corporate/us/en/press-room/2013/new-nielsen-research-indicates-two-way-causal-influence-between-.html
[3] http://www.google.com.au/think/research-studies/quantifying-movie-magic.html
Engagement signals: Search, Watch, Listen, Play, Download, Purchase, Contact sales, Subscribe, Contact support, Cancel, Rate It, Review It, Upgrade It, Sharing, Tagging, Bookmarking, Social Sentiment
• Descriptive
– Retrospective
– What happened or is happening
– Simple aggregations and counters
• Predictive
– Statistical forecast
– Predict a value in a dataset
– Machine learning
• Prescriptive (emergent)
– What should I do about it?
[Diagram: Descriptive → Predictive → Prescriptive; machine learning turns incoming signals into predictions such as recommendations, clustering, and classification]
Tooling landscape (single node vs. big data):
Visualization & analysis (single node): R, Octave, MATLAB, Excel, KNIME, WEKA, Python kits
Visualization & analysis (big data): GraphLab, Mahout, Spark MLlib, H2O
Storage (single node): RDBMS, SAN/NAS, DAS
Storage (big data): HDFS, HBase
Use Case 1: Recommendations
[Stack: recommendation, clustering, and classification on top of math libraries (Spark, H2O) running on Hadoop MapReduce]
Estimate similar users and items
http://www.slideshare.net/tdunning/recommendation-techn
Interaction logs, (user, item) pairs:
User1 → Thing1, Thing2, Thing3
User2 → Thing2, Thing4
User3 → Thing3
User5 → Thing1
(example users: Mike, Jon, Mary, Phil, Kris)

Logs → history matrix (user × item) → item-item cooccurrence matrix → LLR (log-likelihood ratio) test → indicators ("items similar to this")
[Matrix figures: history matrix and item-item matrix with example cooccurrence counts 2, 8, 2, 4]
Items similar to this (by title):
Superman → Highlander, Dune
Star Wars → Raiders, Minority Report
Highlander → Superman
Mulan → Home Alone, Mermaid
Star Trek → …
… → …

The same table by item ID:
4587 → 223, 5234
748 → 5345, 235
12 → 8234
245 → 9543, 7673
3456 → 4587
… → …
Search index documents with an indicator field:
748 Star Wars → 45, 235
12 Highlander → 8234
245 Mulan → 9543, 7673
4587 Superman → 12, 5234
3456 Star Trek → 2458 …

Query "12" → results: 5345, 3456, 12
[Architecture: users on mobile devices interact with media platforms (search, play, buy, rate); recommendations are served back to them]
https://github.com/apache/mahout
Indicators ("items similar to this") sample output:
movie-b   movie-c:2.772588722239781 movie-a:2.772588722239781
movie-d   …

% mahout spark-itemsimilarity \
    -i input-folder/data.txt \
    -o output-folder/ \
    --filter1 buy -fc 1 -ic 2 \
    --filter2 view
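The Mahout job above computes these indicators at scale. As a minimal local sketch of the same cooccurrence-plus-LLR idea, assuming the slide's (user, item) log as input (the thresholding choice here is illustrative, not from the talk):

```python
from collections import defaultdict
from itertools import combinations
from math import log

# Interaction log shaped like the slide's example: (user, item) pairs.
logs = [("User1", "Thing1"), ("User2", "Thing2"), ("User3", "Thing3"),
        ("User2", "Thing4"), ("User5", "Thing1"), ("User1", "Thing2"),
        ("User1", "Thing3")]

def xlogx(x):
    return 0.0 if x == 0 else x * log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy, as in Mahout's LLR implementation.
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    # Dunning's log-likelihood ratio for a 2x2 cooccurrence table:
    # k11 = users with both items, k12/k21 = one item only, k22 = neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

# Logs -> history (item -> set of users who touched it).
item_users = defaultdict(set)
for user, item in logs:
    item_users[item].add(user)
n_users = len({user for user, _ in logs})

# Item-item cooccurrence -> LLR-scored indicators ("items similar to this").
indicators = defaultdict(list)
for a, b in combinations(sorted(item_users), 2):
    both = len(item_users[a] & item_users[b])
    only_a = len(item_users[a]) - both
    only_b = len(item_users[b]) - both
    neither = n_users - both - only_a - only_b
    score = llr(both, only_a, only_b, neither)
    if score > 0:  # keep only anomalously strong cooccurrence
        indicators[a].append((b, score))
        indicators[b].append((a, score))
```

The score 2.772588722239781 in the sample output above is exactly what this LLR gives for a pair seen together by one user in a two-user population, which is why the same constant appears twice in Mahout's toy output.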
Use Case 2: Sentiment Classification
Classify (estimate) as Positive | Negative
"I thought Star Wars Episode 28 was not without merit"
https://github.com/cyhex/streamcrab
[Architecture: users on mobile devices interact with media platforms (search, play, buy, rate, recommend); social media activity feeds in as an additional signal]
[Pipeline: social media text → extract features → classify (Positive | Negative); the model is trained on labeled examples, e.g. "I adored this movie" contributes the feature "adore" = POSITIVE]
http://www.nltk.org/book/ch06.html
TextBlob + Natural Language Toolkit (NLTK)
from textblob.classifiers import NaiveBayesClassifier

training_data = [('I love this movie', 'Positive'),
                 ('This makes me mad', 'Negative'), …]
my_classifier = NaiveBayesClassifier(training_data)

my_classifier.classify("I thought Star Wars Episode 29 was not without merit")
# → "Positive"
from amazon_kclpy import kcl
import json, base64

class RecordProcessor(kcl.RecordProcessorBase):
    def process_records(self, records, checkpointer):
        for record in records:
            # …
            inbound_tweet = base64.b64decode(record.get('data'))
            sentiment = my_classifier.classify(inbound_tweet)
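TextBlob's NaiveBayesClassifier wraps a bag-of-words Naive Bayes model. A self-contained sketch of what that train-then-classify step does, with add-one smoothing; the training sentences and tokenizer here are illustrative, not from the talk:

```python
from collections import Counter, defaultdict
from math import log

# Illustrative training set in the spirit of the slide.
training_data = [("I love this movie", "Positive"),
                 ("What a great film, I adored it", "Positive"),
                 ("This makes me mad", "Negative"),
                 ("Terrible plot, I hated it", "Negative")]

def tokens(text):
    # Crude tokenizer: lowercase words, punctuation stripped.
    return [w.strip(",.!?").lower() for w in text.split()]

# Train: per-label word counts plus label priors.
word_counts = defaultdict(Counter)
label_counts = Counter()
vocab = set()
for text, label in training_data:
    label_counts[label] += 1
    for w in tokens(text):
        word_counts[label][w] += 1
        vocab.add(w)

def classify(text):
    # Naive Bayes: prior + summed log-likelihoods, add-one smoothing.
    best_label, best_score = None, float("-inf")
    total = sum(label_counts.values())
    for label in label_counts:
        score = log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens(text):
            score += log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify("I adored this movie"))  # → Positive
```

The word "adored" only ever appears under the Positive label in the training data, so it pulls the score toward Positive, which is exactly the "adore" = POSITIVE feature idea from the slide.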
Example feature vectors (four observations of the same title):
Mulan: [12, 2, 7, 85, 1, 997]
Mulan: [1, 5, 99, 85, 50, 4]
Mulan: [1, 2, 3, 4, 5, 6]
Mulan: [3, 1, 4, 6, 7, 9]
Use Case 3: Clustering
This is a form of unsupervised learning
Segaran, Toby. Programming Collective Intelligence. Sebastopol: O’Reilly, 2009. Print.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6374152&isnumber=6374097
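The feature-vector rows above are what an unsupervised grouping step consumes. A minimal k-means sketch of that step; the titles other than Mulan and all the numbers are invented for illustration:

```python
from math import dist          # Euclidean distance, Python 3.8+
from random import seed, sample

# Hypothetical per-title feature vectors like the slide's "Mulan" rows.
vectors = {
    "Mulan":      [12, 2, 7, 85, 1, 997],
    "Mermaid":    [10, 5, 9, 80, 3, 950],
    "Highlander": [90, 70, 2, 5, 60, 10],
    "Superman":   [85, 75, 4, 8, 55, 12],
}

def kmeans(points, k, iters=20):
    seed(42)                          # deterministic init for the demo
    centroids = sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[i].append(p)
        # Move each centroid to the mean of its cluster.
        centroids = [[sum(col) / len(cluster) for col in zip(*cluster)]
                     if cluster else centroids[i]
                     for i, cluster in enumerate(clusters)]
    return centroids

centroids = kmeans(list(vectors.values()), k=2)
cluster_of = {t: min(range(2), key=lambda j: dist(v, centroids[j]))
              for t, v in vectors.items()}
```

With these well-separated toy vectors the two family-animation titles land in one cluster and the two action titles in the other, regardless of which points are drawn as initial centroids.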
R + H2O: a data science desktop (R front end) driving a machine learning cluster (H2O)

% java -jar h2o.jar
Use Case 4: Churn Prediction
Customer | Geo | Account Type | Account Age | Support Tickets | Minutes Streamed | Churn?
Mike     | CA  | Premium      | 120         | 10              | 240              | TBD
John     | CA  | Basic        | 240         | 1               | 140              | TBD
Ingrid   | WA  | Premium      | 60          | 5               | 1800             | TBD
Mark     | WA  | Basic        | 30          | 0               | 0                | TBD
Usman    | WA  | Basic        | 720         | 0               | 360              | TBD
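The session points to hosted tools such as BigML for filling in the Churn? column. As a tiny local sketch of the idea, here is a nearest-neighbour classifier over the table's numeric features; the labeled history it learns from is entirely invented for illustration:

```python
from math import dist

# Feature order: account age (days), support tickets, minutes streamed.
# Hypothetical labeled history (did the customer churn?).
history = [
    ((100, 8, 200), True),    # heavy ticket load, light usage
    ((300, 0, 2000), False),  # long-tenured heavy streamer
    ((40, 6, 100), True),
    ((600, 1, 500), False),
    ((25, 0, 10), True),      # barely used the service
]

# The unlabeled customers from the table above.
customers = {
    "Mike":   (120, 10, 240),
    "John":   (240, 1, 140),
    "Ingrid": (60, 5, 1800),
    "Mark":   (30, 0, 0),
    "Usman":  (720, 0, 360),
}

# Scale each feature to the training range so minutes streamed
# doesn't dominate the distance calculation.
cols = list(zip(*[f for f, _ in history]))
lo = [min(c) for c in cols]
hi = [max(c) for c in cols]

def scale(f):
    return [(v - l) / (h - l) if h > l else 0.0
            for v, l, h in zip(f, lo, hi)]

def predict_churn(features):
    # 1-nearest-neighbour vote over the labeled history.
    nearest = min(history, key=lambda ex: dist(scale(features), scale(ex[0])))
    return nearest[1]

for name, feats in customers.items():
    print(name, "churn?", predict_churn(feats))
```

A real deployment would use a trained model (decision trees, logistic regression) rather than 1-NN, but the shape of the problem is the same: labeled historical rows in, a Churn? prediction per customer out.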
http://www.bigml.com
AWS Marketplace
Software
• BigML
• Revolution R Enterprise
• PredictionIO
• Yhat
• Mortar
• Zementis
http://bit.ly/awsevals