(ARC303) Panning for Gold: Analyzing Unstructured Data | AWS re:Invent 2014
TRANSCRIPT
[Chart: Global Data in Zettabytes by Year, 2005 to 2020; vertical axis 0 to 45 ZB]
1 ZB = 10²¹ bytes = 1,000,000,000,000,000,000,000 bytes = 1,000 exabytes (EB)
In binary units, 1 ZiB = 2⁷⁰ bytes = 1,024 EiB, or about 1.18 × 10²¹ bytes
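The decimal (SI) and binary (IEC) conventions for these units are easy to mix up. The short sketch below works through the arithmetic with exact integers; the constant and function names are illustrative and not taken from the slides.

```python
# Decimal (SI) and binary (IEC) storage units as exact integers.
ZB = 10**21    # zettabyte
EB = 10**18    # exabyte
ZIB = 2**70    # zebibyte
EIB = 2**60    # exbibyte


def zb_to_eb(zettabytes: float) -> float:
    """Convert decimal zettabytes to decimal exabytes."""
    return zettabytes * ZB / EB


print(ZB // EB)      # decimal ratio: 1000
print(ZIB // EIB)    # binary ratio: 1024
print(f"{ZIB:,}")    # 1,180,591,620,717,411,303,424
```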
About 85% is unstructured data
Limited View of Customer
Internal Data (Server Logs, Data Center, Database)
Structured
• Customer Profile
• Product Purchase Statistics
• Product Catalog & Inventory
Unstructured
• Surveys & Customer Reviews
• Emails & Support Requests
• Audio & Video Discussions*
Objective: To personalize and improve online user experience
Complete View of Customer
Internal Data (Server Logs, Data Center, Database)
Structured
• Customer Profile
• Product Purchase Statistics
• Product Catalog & Inventory
Unstructured
• Surveys & Customer Reviews
• Emails & Support Requests
• Audio & Video Discussions*
External Data (Social, Reports)
Structured
• Social & Professional Profile
• Data from External APIs
• User Location Details
Unstructured
• External Panel Data, Webpages
• Blogs, Reviews, Social Activity
• Likes, Connections, Videos
From: Device/Form Factors*
• Mobile
• PC
• Tablet
*Across Browsers/Apps
How: Data Collection
• APIs
• Third Party Data Providers
• Client Data
• Social
• Chat
What: Data Variants
• User Profile
• HTML & Images
• Location & Time
• Surveys & Reviews
• Feeds
• Reports
[Diagram: parallel processing fabric]
Components: Base EC2 Node, Input Configuration, Amazon SQS, Amazon S3 Input, Amazon S3 Code & Input List, Amazon S3 Output, Fabric, Alarms, Email & Notifications, Logs
Steps
• Launch Instances
• Send Job Messages
• Pull Job Messages in Parallel
• Read Input Files from Amazon S3 in Parallel
• Process
• Write Output Files to Amazon S3 in Parallel
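The fabric pattern (a base node sends one job message per input file, and workers pull messages and read and write files in parallel) can be sketched locally without any AWS calls. This is a minimal simulation under stated assumptions, not the presenters' implementation: a `queue.Queue` stands in for Amazon SQS, plain dicts stand in for the Amazon S3 input and output buckets, threads stand in for worker EC2 instances, and `process` is a hypothetical per-file step.

```python
import queue
import threading

# In-memory stand-ins (assumptions): dicts play the role of the S3 input and
# output buckets, and queue.Queue plays the role of the Amazon SQS job queue.
s3_input = {f"input/{i}.txt": f"record {i}" for i in range(8)}
s3_output = {}
job_queue = queue.Queue()
output_lock = threading.Lock()


def process(text: str) -> str:
    """Hypothetical per-file processing step."""
    return text.upper()


def worker() -> None:
    """Pull job messages until the queue is empty, one input file per message."""
    while True:
        try:
            key = job_queue.get_nowait()
        except queue.Empty:
            return
        result = process(s3_input[key])                 # read the input file
        out_key = key.replace("input/", "output/", 1)
        with output_lock:
            s3_output[out_key] = result                 # write the output file
        job_queue.task_done()


def run_fabric(num_workers: int = 4) -> dict:
    """Send one job message per input file, then run the workers in parallel."""
    for key in s3_input:
        job_queue.put(key)
    # Threads stand in for the launched worker instances.
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return s3_output


if __name__ == "__main__":
    out = run_fabric()
    print(len(out), "output files written")
```

In a real deployment each worker would also need to handle retries and message visibility, which this sketch leaves out.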
[Diagram: HTML DOM tree. The HTML root branches into DIV 1, DIV 2, DIV 3, and DIV 4, which contain <UL>, <OL>, <LI>, <A>, and IMG nodes; selected nodes are labeled Feature 1 through Feature 6.]
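Mapping DOM nodes to features can be illustrated with Python's standard-library `html.parser`. The sketch below is one possible approach, not the parser described in the session: the `FeatureParser` class, the `price` and `star` class names, and the sample markup are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class FeatureParser(HTMLParser):
    """Collect (type, detail) features from <img>, <a>, and class-tagged elements."""

    def __init__(self):
        super().__init__()
        self.features = []
        self._pending = None  # feature type waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            size = f"{attrs.get('width', '?')}*{attrs.get('height', '?')}"
            self.features.append(("Image", size))
        elif tag == "a":
            self.features.append(("Link", attrs.get("href", "")))
        elif attrs.get("class") in ("price", "star"):
            # Hypothetical class names marking price and rating elements.
            self._pending = attrs["class"].capitalize()

    def handle_data(self, data):
        if self._pending and data.strip():
            self.features.append((self._pending, data.strip()))
            self._pending = None


# Made-up sample page, not data from the talk.
SAMPLE = """
<html><body>
<div><img src="p.png" width="600" height="400"></div>
<div><ul><li><a href="#">Details</a></li></ul></div>
<div><span class="price">$200</span> <span class="star">3.5</span></div>
</body></html>
"""


def extract_features(html: str) -> list:
    parser = FeatureParser()
    parser.feed(html)
    return parser.features


for feature_type, detail in extract_features(SAMPLE):
    print(feature_type, detail)
```

Real pages rarely carry such convenient class names, so a production rules engine would need per-site selectors or heuristics.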
Social
• Tweets
• Comments
• Likes
• Shares
• Blogs
• Reviews
Panel & Web Logs
• Clickstream
• HTML
• Images
• Audio*
• Video*
Processing: Rules Engine, Data Parser
Feature    Type   Detail
Feature 1  Image  600*400
Feature 2  Link   #
Feature 3  Price  $200
Feature 4  Star   3.5

Tweet    Time   View
Tweet 1  12:00  Positive
Tweet 2  12:05  Neutral
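A rules engine that tags each tweet with a view (Positive, Neutral, or Negative) can be approximated with a small word-list scorer. This is a deliberately simple sketch under stated assumptions: the word lists, the scoring rule, and the sample tweet texts are hypothetical and are not taken from the session.

```python
# Hypothetical word lists; a production lexicon would be far larger.
POSITIVE = {"love", "great", "excellent", "good"}
NEGATIVE = {"hate", "bad", "terrible", "poor"}


def classify(text: str) -> str:
    """Score a text by counting positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def parse_tweets(tweets):
    """Turn (id, time, text) tuples into (id, time, view) rows."""
    return [(tweet_id, time, classify(text)) for tweet_id, time, text in tweets]


# Made-up sample tweets for illustration only.
tweets = [
    ("Tweet 1", "12:00", "I love this product, great value!"),
    ("Tweet 2", "12:05", "Ordered the product yesterday."),
]

for row in parse_tweets(tweets):
    print(*row)
```

Word counting ignores negation and sarcasm, so it only hints at the kind of output a real sentiment parser would produce.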
AWS technology                                    Use
AWS Identity and Access Management (IAM)          Security and access
Amazon CloudWatch                                 Monitoring infrastructure
Auto Scaling                                      Rule-based dynamic scaling
Amazon Simple Email Service (Amazon SES)          Notification and emails
Amazon Simple Notification Service (Amazon SNS)   Alarms and notification
AWS CloudTrail                                    User activity and change tracking
AWS CloudFormation                                Deployment templates
AWS Trusted Advisor                               Cloud optimization
On Demand (66%)
Spot (33%)
[Diagram: end-to-end architecture]
Data sources: Data Provider, Client, Devices
Inputs: API, Social, Chat, Reviews, Surveys
Ingestion modes: Real Time, Batch
AWS services: Amazon Kinesis, Amazon DynamoDB, Amazon Redshift (Unified Data Store), Amazon S3, Amazon EMR, Amazon EC2, Auto Scaling, IAM, AWS CloudFormation, Amazon CloudWatch
Amazon EC2 roles: Scaled-Up Input Data Receiver, Content Crawlers, Lexical Analyzer Workers, Sentiment Parser Workers, Indexers
Operations: Operations Tracking for SLA, Operations Logs, Alarms, Notification
“Platform we have built has given business teams the muscle and insight that they have never seen before”
“This unique user view has given Product teams an excellent lens into what drives user behaviour and how they can positively impact it!”