Big Data on AWS
Paul Duffy
Big Data on the Cloud
In the Real World
How the Cloud Is
Big Data’s Best Friend
Characteristics of
Big Data
Characteristics of Big Data
The cost of data generation is falling rapidly
Dramatic increase in volume, velocity and
variety of data
BIG DATA
A collection of tools, techniques and technologies that
allow you to work productively with data at any scale.
Big Data is Getting Bigger
2.7 Zettabytes in 2012
Over 90% will be
unstructured
Data spread across a wide
array of silos
Features driven by MapReduce
Variable data structures and sources
Computer Generated
• Application server logs (web sites, games)
• Sensor data (weather, water, smart grids)
• Images/videos (traffic, security cameras)
Human Generated
• Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year
• Blogs/Reviews/Emails/Pictures
• Social Graphs: Facebook, LinkedIn, Contacts
The Role of Data
is Changing
Traditional analytics required a fixed data model,
based on pre-known questions
Big Data promotes data exploration and experimentation which leads to innovation
Generation
Collection & storage
Computation & analytics
Collaboration & sharing
Lower costs,
faster throughput
Increased pressure on traditional IT and tools
Require tools designed for data
collection and computation at
any volume, velocity or format.
Software
• Designed for distribution
• Easy programming models
• Flexible language choice
• Platform for abstraction and ecosystem
• Good example: Hadoop
Infrastructure
• Designed for distribution
• Easy programming models
• Flexible language choice
• Platform for abstraction and ecosystem
• Good example: Cloud computing
Software
Infrastructure
How the Cloud Is
Big Data’s Best Friend
How do we define the cloud?
By Benefits!
Cloud
Elasticity
Fast Time to Market
Focus on core competency
Pay Per Use
No Cap Ex
Why is the Cloud
Big Data’s Best Friend?
We know we want to collect, store, organize, analyze and share it.
But we have limited resources.
The Cloud Optimizes
Precious IT Resources
i.e. Skilled People
“Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x, while the pool of IT staff available to manage them will grow only slightly, at 1.5x.”
- 2011 IDC Digital Universe Study
Deploying a Hadoop cluster is hard
Diagram: two bars, each split 70% / 30%.
The Old IT World: Using Big Data versus Managing All of the “Undifferentiated Heavy Lifting”
Cloud-Based Infrastructure: Analyzing and Using Big Data versus Configuring Cloud Assets
Reusability
Managed Services
Scale
Innovation
The Cloud Optimizes
Capacity Resources
On and Off
Fast Growth
Variable peaks
Predictable peaks
Elastic Compute Capacity
(Chart: capacity against time, comparing your IT needs with traditional IT capacity and elastic cloud capacity, with areas labeled WASTE and CUSTOMER DISSATISFACTION.)
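The difference between fixed and elastic capacity can be made concrete with a small calculation. The sketch below is illustrative only: the demand figures are hypothetical, and it assumes that WASTE corresponds to capacity provisioned but unused and CUSTOMER DISSATISFACTION to demand that exceeds capacity.

```python
def capacity_gap(demand, capacity):
    """Return (waste, unmet) summed over all time periods.

    waste = capacity that was provisioned but not needed
    unmet = demand that exceeded the provisioned capacity
    """
    waste = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    unmet = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    return waste, unmet


# Hypothetical demand per period, in servers needed (made-up numbers).
demand = [2, 3, 10, 4, 2, 12, 3]

# Traditional IT: one fixed capacity level chosen in advance.
fixed = [8] * len(demand)

# Elastic capacity: provision exactly what each period needs.
elastic = list(demand)

print("fixed   (waste, unmet):", capacity_gap(demand, fixed))
print("elastic (waste, unmet):", capacity_gap(demand, elastic))
```

With these made-up numbers, the fixed level is both too high for most periods and too low for the peaks, while the idealized elastic case has no gap in either direction. Real elastic capacity would lag demand somewhat, which this sketch ignores.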
The Cloud Empowers Users
to Balance Cost and Time
1 instance for 500 hours = 500 instances for 1 hour
I like this!
I scale
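The equivalence above is simple arithmetic under pay-per-use pricing. As a rough sketch, assuming a flat hourly price (the rate below is hypothetical, not an AWS price) and a job that parallelizes perfectly:

```python
def cluster_cost_cents(instances, hours, cents_per_instance_hour):
    """Total cost in cents when billing is per instance-hour."""
    return instances * hours * cents_per_instance_hour


RATE_CENTS = 10  # hypothetical price per instance-hour, in cents

slow = cluster_cost_cents(instances=1, hours=500, cents_per_instance_hour=RATE_CENTS)
fast = cluster_cost_cents(instances=500, hours=1, cents_per_instance_hour=RATE_CENTS)

print("1 instance for 500 hours:", slow, "cents")
print("500 instances for 1 hour:", fast, "cents")
print("same cost:", slow == fast)
```

The cost is the same in both cases; what changes is how long you wait for the result. In practice, coordination overhead and work that cannot be split would reduce the gain.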
The Cloud
Reduces Cost
For Experimentation
The Cloud
Enables Collection and Storage
of Big Data
Storage Costs are Declining
Simple Storage Service
1 Trillion
750k+ peak transactions per second
Global Accessibility
Region
US-WEST (N. California)
EU-WEST (Ireland)
ASIA PAC (Tokyo)
ASIA PAC (Singapore)
US-WEST (Oregon)
SOUTH AMERICA (Sao Paulo)
US-EAST (Virginia)
GOV CLOUD
Amazon DynamoDB
Managed NoSQL database service
Unlimited size
Unlimited scale
Flexible key/value store
Consistent, low latencies (single digit milliseconds, SSD)
Robust, durable data storage
Integrated analytics with Elastic MapReduce
Amazon Elastic MapReduce
On-demand, managed analytics platform
Powered by Hadoop
Integrated with Spot instances to lower costs
Vibrant ecosystem of tools
Elastic clusters
Flexible programming model (Java, Python, Ruby etc)
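To give a feel for the programming model that Hadoop exposes, here is a minimal local sketch of the map, shuffle and reduce steps in Python. It does not call Elastic MapReduce or Hadoop; the function names (`mapper`, `reducer`, `run_local`) and the sample log lines are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (token, 1) pair for every token in a line."""
    for token in line.lower().split():
        yield token, 1


def reducer(key, values):
    """Reduce step: combine all values seen for one key."""
    return key, sum(values)


def run_local(lines):
    """Simulate map, shuffle (group by key) and reduce in one process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Hypothetical application server log lines.
logs = ["GET /home", "GET /cart", "POST /cart"]
result = run_local(logs)
print(result)
```

On a real cluster the map and reduce functions would run on many nodes in parallel, with the framework handling the grouping between them. The appeal of the model is that the functions themselves stay this small.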
Big Data on the Cloud
In the Real World
Big Data Verticals
Media/Advertising
Targeted Advertising
Image and Video Processing
Oil & Gas
Seismic Analysis
Retail
Recommend
Transactions Analysis
Life Sciences
Genome Analysis
Financial Services
Monte Carlo Simulations
Risk Analysis
Security
Anti-virus
Fraud Detection
Image Recognition
Social Network/Gaming
User Demographics
Usage analysis
In-game metrics
Visualizations
Bank – Monte Carlo Simulations
“The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
23 Hours to 20 Minutes
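Monte Carlo workloads tend to suit elastic capacity because the trials are independent of one another. The sketch below is a toy illustration of that property and not Bankinter's model: the loss threshold, batch sizes and function names are all hypothetical. Each batch uses its own seed, so batches could in principle run on separate instances and be combined afterwards.

```python
import random


def simulate_batch(seed, trials):
    """Run one independent batch and count simulated loss events."""
    rng = random.Random(seed)  # per-batch generator keeps batches independent
    hits = 0
    for _ in range(trials):
        # Toy model: a loss event occurs when a standard normal draw
        # falls below a hypothetical threshold (roughly the 5% tail).
        if rng.gauss(0.0, 1.0) < -1.645:
            hits += 1
    return hits


def estimate(batches, trials_per_batch):
    """Combine batch results into one estimated loss probability."""
    total = sum(simulate_batch(seed, trials_per_batch) for seed in range(batches))
    return total / (batches * trials_per_batch)


p = estimate(batches=20, trials_per_batch=5000)
print("estimated tail probability:", p)
```

Here the batches run one after another in a single process. Because no batch depends on another, adding more workers shortens the wall-clock time without changing the combined estimate.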
etsy.com/gifts
Recommendations
Gift Ideas for Facebook Friends
Targeted Ad
User recently purchased a sports movie and is searching for video games (1.7 Million per day)
Click Stream Analysis
Big Data on the Cloud
In the Real World
How the Cloud Is
Big Data’s Best Friend
Characteristics of
Big Data
Thank you…