Big Data and Hadoop in the Cloud
DESCRIPTION
Big Data and Hadoop in the Cloud - Presentation given at the Colombia 3.0 conference in Bogotá, Colombia
TRANSCRIPT
![Page 1: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/1.jpg)
José Papo
Amazon Evangelist
@josepapo
![Page 2: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/2.jpg)
HANDS-ON DEMOS AFTER THE BIG DATA SESSION
![Page 3: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/3.jpg)
The cloud is the driver of new technology trends
![Page 4: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/4.jpg)
Accelerating the startup boom
![Page 5: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/5.jpg)
Optimizing the corporate world
![Page 6: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/6.jpg)
#1 ●○○○○
![Page 7: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/7.jpg)
We are sincerely eager to
hear your feedback on this
presentation and on re:Invent.
Please fill out an evaluation
form when you have a
chance.
We are constantly producing more data
![Page 8: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/8.jpg)
From all types of industries
![Page 9: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/9.jpg)
Collect,
Store,
Organize,
Analyze &
Share
![Page 10: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/10.jpg)
3Vs
![Page 11: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/11.jpg)
27 TB per day Large Hadron Collider – CERN
![Page 12: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/12.jpg)
![Page 13: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/13.jpg)
![Page 14: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/14.jpg)
The Role of Data
is Changing
![Page 15: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/15.jpg)
Until now, the questions you asked drove the data model
The new model is to collect as much data as possible: the “Data-First Philosophy”
![Page 16: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/16.jpg)
Data is the new raw material for any business, on par with capital, people, and labor
![Page 17: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/17.jpg)
Data
Actionable Information
![Page 18: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/18.jpg)
Generated
data
Available for analysis
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
![Page 19: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/19.jpg)
Data Strategist
![Page 20: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/20.jpg)
1.1M peak requests/sec
![Page 21: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/21.jpg)
lunch hours last year?
![Page 22: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/22.jpg)
select productId, count(*) from page_hits where hour in (12,13) group by productId order by count(*) desc
cat *-1[23] | cut -f3 | sort | uniq -c > out
Hit <enter>?
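The SQL query and the shell pipeline above both count page hits per product during the lunch hours. A minimal Python sketch of the same aggregation follows. It assumes hypothetical tab-separated records of the form `hour<TAB>userId<TAB>productId`; that layout and the name `top_products` are illustrative and do not come from the talk.

```python
from collections import Counter

def top_products(lines, hours=(12, 13)):
    """Count hits per product for records whose hour is in `hours`."""
    counts = Counter()
    for line in lines:
        # Assumed record layout: hour, user id, product id (tab-separated).
        hour, _user, product_id = line.rstrip("\n").split("\t")
        if int(hour) in hours:
            counts[product_id] += 1
    # most_common() sorts by descending count, like ORDER BY count(*) DESC.
    return counts.most_common()

sample = [
    "12\tu1\tp7",
    "13\tu2\tp7",
    "12\tu3\tp9",
    "09\tu1\tp9",
]
print(top_products(sample))
```

A single-process scan like this is fine for small logs; the question the slide raises is what happens when the same scan has to cover a year of traffic at very large scale.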
![Page 23: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/23.jpg)
1 PB = 10^15 (1,000,000,000,000,000) bytes
1 PB = 231 days at 50 MB/s
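The 231-day figure can be checked with a few lines of arithmetic. The sketch below uses decimal units (1 MB = 10^6 bytes) to match the slide's definition of a petabyte. The `readers` parameter is an added assumption that models idealized, perfectly linear parallel scaling, which real clusters only approximate.

```python
PETABYTE = 10**15            # bytes, as defined on the slide
THROUGHPUT = 50 * 10**6      # 50 MB/s, in bytes per second
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_read(total_bytes, bytes_per_second, readers=1):
    """Days needed to scan total_bytes with `readers` ideal parallel readers."""
    return total_bytes / (bytes_per_second * readers) / SECONDS_PER_DAY

single = days_to_read(PETABYTE, THROUGHPUT)
parallel_hours = days_to_read(PETABYTE, THROUGHPUT, readers=1000) * 24

print(f"1 reader:     {single:.1f} days")
print(f"1000 readers: {parallel_hours:.1f} hours")
```

Under that idealized assumption, a thousand readers bring the scan down from months to hours, which is the motivation for the parallel approach on the next slide.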
![Page 24: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/24.jpg)
Solution: Massively Parallel Processing
![Page 25: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/25.jpg)
#2 ○●○○○
![Page 26: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/26.jpg)
![Page 27: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/27.jpg)
HDFS: reliable storage
MapReduce: data analysis
![Page 28: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/28.jpg)
Very large log (e.g., TBs)
![Page 29: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/29.jpg)
Very large log (e.g., TBs)
Lots of actions by John
![Page 30: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/30.jpg)
Very large log (e.g., TBs)
Lots of actions by John
Split into small pieces
![Page 31: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/31.jpg)
Very large log (e.g., TBs)
Lots of actions by John
Split into small pieces
Process in a Hadoop cluster
![Page 32: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/32.jpg)
Very large log (e.g., TBs)
Lots of actions by John
Split into small pieces
Process in a Hadoop cluster
Aggregate the results
John’s history
![Page 33: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/33.jpg)
Worker node: Input file → map → reduce → Output file
![Page 34: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/34.jpg)
Worker node: Input file → map → reduce → Output file
Worker node: Input file → map → reduce → Output file
Worker node: Input file → map → reduce → Output file
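The split, map, and reduce steps shown in these slides can be sketched in plain Python. This is a single-process illustration of the idea, not Hadoop's API: the record format `user,action` and the function names (`split_log`, `map_piece`, `shuffle`, `reduce_user`) are hypothetical, and in a real cluster each piece would be handled by a different worker node.

```python
from collections import defaultdict

def split_log(lines, piece_size):
    """Split the log into small pieces, one per map task."""
    return [lines[i:i + piece_size] for i in range(0, len(lines), piece_size)]

def map_piece(piece):
    """Map: emit a (user, action) pair for every record in one piece."""
    for record in piece:
        user, action = record.split(",", 1)
        yield user, action

def shuffle(mapped_pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_user(user, actions):
    """Reduce: aggregate one user's actions into a history summary."""
    return user, {"count": len(actions), "actions": actions}

log = ["john,view:p1", "mary,view:p2", "john,buy:p1", "john,view:p3"]
pieces = split_log(log, piece_size=2)
pairs = [pair for piece in pieces for pair in map_piece(piece)]
history = dict(reduce_user(u, a) for u, a in shuffle(pairs).items())
print(history["john"])
```

Because each piece is mapped independently and each key is reduced independently, both phases can be spread over as many worker nodes as are available.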
![Page 35: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/35.jpg)
How can we help John?
Very large log (e.g., TBs)
Actionable Insight
![Page 36: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/36.jpg)
Deploying a Hadoop Cluster is Hard
![Page 37: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/37.jpg)
![Page 38: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/38.jpg)
#3 ♥
○○●○○
![Page 39: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/39.jpg)
![Page 40: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/40.jpg)
Elastic On Demand
Pay as you go
Focus on
YOUR
business
![Page 41: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/41.jpg)
![Page 42: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/42.jpg)
November
![Page 43: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/43.jpg)
Provisioned capacity
November
![Page 44: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/44.jpg)
76%
24%
Provisioned capacity
November
![Page 45: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/45.jpg)
November
![Page 46: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/46.jpg)
On and Off Fast Growth
Variable Peaks Predictable Peaks
![Page 47: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/47.jpg)
On and Off Fast Growth
Predictable Peaks Variable Peaks
WASTE
CUSTOMER DISSATISFACTION
![Page 48: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/48.jpg)
Fast Growth On and Off
Predictable peaks Variable peaks
![Page 49: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/49.jpg)
#4 ○○○●○
![Page 50: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/50.jpg)
EMR is Hadoop in the Cloud
![Page 51: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/51.jpg)
![Page 52: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/52.jpg)
![Page 53: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/53.jpg)
Media/Advertising
Targeted Advertising
Image and Video
Processing
Oil & Gas
Seismic Analysis
Retail
Recommendations
Transactions Analysis
Life Sciences
Genome Analysis
Financial Services
Monte Carlo Simulations
Risk Analysis
Security
Anti-virus
Fraud Detection
Image Recognition
Social Network/Gaming
User Demographics
Usage analysis
In-game metrics
![Page 54: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/54.jpg)
[Chart: vertical axis from 0 to 6,000,000]
![Page 55: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/55.jpg)
Versions
1.0.3
0.20.205
0.20
0.18
Distributions
Apache Hadoop
![Page 56: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/56.jpg)
Job Flows
Custom JAR
Cascading
Streaming
Ruby, Perl, Python, PHP, R, Bash, C++
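Streaming job flows let the mapper and reducer be ordinary scripts that read lines on standard input and write tab-separated `key<TAB>value` lines on standard output, with the framework sorting by key in between. The sketch below shows that contract as Python functions over lists of lines so it can run locally. The word-count task and the helper `run_local` are illustrative assumptions; a real streaming script would read `sys.stdin` and `print` its output.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Sum the counts for each key; input must arrive sorted by key."""
    keyed = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"

def run_local(lines):
    """Local stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))

print(run_local(["Big data", "big cluster"]))
```

The reducer relies only on its input being sorted, which is why the same logic can be written in any of the languages listed on the slide.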
![Page 57: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/57.jpg)
Hive
Data warehouse for Hadoop
SQL-like query language
![Page 58: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/58.jpg)
Pig
High-level programming
Ideal for data flow / ETL
![Page 59: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/59.jpg)
HBase
Near-real-time key/value store for structured data
![Page 60: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/60.jpg)
Ganglia
Distributed monitoring of cluster and nodes
![Page 61: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/61.jpg)
Statistical computing
and graphics
Machine learning library
Discover Value in Data
![Page 62: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/62.jpg)
Unknown Unknowns
![Page 63: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/63.jpg)
Elastic On Demand
Pay as you go
Focus on
YOUR
business
![Page 64: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/64.jpg)
Undifferentiated
Heavy Lifting
Focus on
YOUR
business
![Page 65: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/65.jpg)
![Page 66: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/66.jpg)
![Page 67: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/67.jpg)
elastic-mapreduce \
--create \
--key-pair micro \
--region eu-west-1 \
--name MyJobFlow \
--num-instances 5 \
--instance-type m2.4xlarge \
--alive \
--log-uri s3n://mybucket/EMR/log
Instance type/count
![Page 68: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/68.jpg)
elastic-mapreduce \
--create \
--key-pair micro \
--region eu-west-1 \
--name MyJobFlow \
--num-instances 5 \
--instance-type m2.4xlarge \
--alive \
--pig-interactive --pig-versions latest \
--hive-interactive --hive-versions latest \
--hbase \
--log-uri s3n://mybucket/EMR/log
Adding Hive, Pig and HBase to the job flow
![Page 69: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/69.jpg)
Elastic On Demand
Pay as you go
Focus on
YOUR
business
![Page 70: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/70.jpg)
1 instance for 1000 hours
=
1000 instances for 1 hour
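The equality on this slide is about cost under per-instance-hour billing. A minimal sketch, assuming a hypothetical price of $0.50 per instance-hour (the price and the function name `cluster_cost` are illustrative) and a job that parallelizes perfectly:

```python
def cluster_cost(instances, hours, price_per_instance_hour):
    """Pay-as-you-go cost: billed per instance-hour used."""
    return instances * hours * price_per_instance_hour

PRICE = 0.50  # hypothetical hourly price in USD, for illustration only

serial = cluster_cost(1, 1000, PRICE)      # one instance for 1000 hours
parallel = cluster_cost(1000, 1, PRICE)    # 1000 instances for one hour

print(serial, parallel, serial == parallel)
```

The bill is the same in both cases; what changes is that the answer arrives in one hour instead of roughly six weeks.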
![Page 71: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/71.jpg)
![Page 72: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/72.jpg)
…to Thousands
![Page 73: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/73.jpg)
![Page 74: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/74.jpg)
![Page 75: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/75.jpg)
Turn Off the Resources and Stop Paying
![Page 76: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/76.jpg)
Elastic On Demand
Pay as you go
Focus on
YOUR
business
![Page 77: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/77.jpg)
Source: IDC Whitepaper, sponsored by Amazon, “The Business Value of Amazon Web Services Accelerates Over Time.” July 2012
70% lower 5-year TCO per app
On-premises: $3.01M
AWS: $0.90M
50% reduction in analytics costs
![Page 78: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/78.jpg)
Save more money by using Spot Instances
![Page 79: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/79.jpg)
14 hrs
Without Spot 4 instances * 14 hrs * $0.50 = $28
EMR with Spot Instances
![Page 80: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/80.jpg)
14 hrs
Without Spot 4 instances * 14 hrs * $0.50 = $28
EMR with Spot Instances
14 hrs
![Page 81: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/81.jpg)
14 hrs
Without Spot 4 instances * 14 hrs * $0.50 = $28
7 hrs
EMR with Spot Instances
![Page 82: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/82.jpg)
With Spot 4 instances * 7 hrs * $0.50 = $14 +
14 hrs
Without Spot 4 instances * 14 hrs * $0.50 = $28
EMR with Spot Instances
7 hrs
![Page 83: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/83.jpg)
With Spot 4 instances * 7 hrs * $0.50 = $14 + 5 instances * 7 hrs * $0.25 = $8.75
Total = $22.75
14 hrs
Without Spot 4 instances * 14 hrs * $0.50 = $28
EMR with Spot Instances
7 hrs
![Page 84: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/84.jpg)
EMR with Spot Instances
Without Spot (14 hrs): 4 instances * 14 hrs * $0.50 = $28
With Spot (7 hrs): 4 instances * 7 hrs * $0.50 = $14 + 5 instances * 7 hrs * $0.25 = $8.75
Total = $22.75
Time -50%, Cost -19%
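The Spot comparison can be reproduced directly from the figures on the slide: four on-demand instances at $0.50 per hour, plus five Spot instances at $0.25 per hour that halve the run time. The function name `cost` is illustrative. With these inputs the cost saving works out to 18.75%.

```python
def cost(instances, hours, hourly_price):
    """Cost of running `instances` machines for `hours` at `hourly_price`."""
    return instances * hours * hourly_price

on_demand_only = cost(4, 14, 0.50)
with_spot = cost(4, 7, 0.50) + cost(5, 7, 0.25)

time_saving = 1 - 7 / 14
cost_saving = 1 - with_spot / on_demand_only

print(f"without Spot: ${on_demand_only:.2f}")
print(f"with Spot:    ${with_spot:.2f}")
print(f"time -{time_saving:.0%}, cost -{cost_saving:.2%}")
```

Spot prices vary over time, so the $0.25 rate is a snapshot used for the example and the actual saving depends on the market price during the run.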
![Page 85: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/85.jpg)
#5 ○○○○●
![Page 86: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/86.jpg)
“What kind of movies do people like?”
![Page 87: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/87.jpg)
More than 25 million streaming members
50 billion events per day
30 million plays every day
2 billion hours of video in 3 months
4 million ratings per day
3 million searches
Device location, time, day, week, etc.
Social data
![Page 88: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/88.jpg)
10 TB of streaming data per day
![Page 89: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/89.jpg)
~1 PB of data stored in Amazon S3
S3
![Page 90: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/90.jpg)
Wide range of processing languages used
S3
EMR
Prod Cluster (EMR)
![Page 91: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/91.jpg)
Data consumed in multiple ways
S3
EMR
Prod Cluster (EMR)
Recommendation
Engine
Ad-hoc
Analysis Personalization
![Page 92: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/92.jpg)
S3
Prod Cluster (EMR)
Query Cluster (EMR)
Additional EMR clusters
![Page 93: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/93.jpg)
![Page 94: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/94.jpg)
Durability
![Page 95: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/95.jpg)
Versioning
![Page 96: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/96.jpg)
![Page 97: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/97.jpg)
![Page 98: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/98.jpg)
Foursquare…
33 million users
1.3 million businesses
…generates a lot of data
3.5 billion check-ins
15M+ venues
Terabytes of log data
![Page 99: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/99.jpg)
Uses EMR for:
Evaluation of new features
Machine learning
Exploratory analysis
Daily customer usage reporting
Long-term trend analysis
![Page 100: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/100.jpg)
Benefits of EMR
Ease of use: “We have decreased the processing time for urgent data analysis”
Flexibility: to deal with changing requirements and dynamically expand reporting clusters
Costs: “We have reduced our analytics costs by over 50%”
![Page 101: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/101.jpg)
Application Stack
Scala/Liftweb: API machines, WWW machines, batch jobs
Scala: application code
Mongo/Postgres/flat files: databases, logs
Export to the data stack: mongoexport, postgres dump, Flume
Data Stack
Amazon S3: database dumps, log files
Hadoop: Elastic MapReduce
Hive/Ruby/Mahout: analytics dashboard, MapReduce jobs
![Page 102: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/102.jpg)
![Page 103: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/103.jpg)
![Page 104: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/104.jpg)
![Page 105: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/105.jpg)
[Charts: distribution by gender (Female, Male) and by age (0 to 80)]
![Page 106: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/106.jpg)
[Chart: Gorilla Coffee, Gray's Papaya, and Amorino, from Thursday to Sunday]
![Page 107: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/107.jpg)
![Page 108: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/108.jpg)
![Page 109: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/109.jpg)
![Page 110: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/110.jpg)
![Page 111: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/111.jpg)
Python library
https://github.com/Yelp/mrjob
![Page 112: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/112.jpg)
Log files
250 EMR clusters spun up
and down every week
![Page 113: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/113.jpg)
Common Crawl
1000 Genomes Project
Census Data
54 other datasets
http://aws.amazon.com/publicdatasets/
![Page 114: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/114.jpg)
Challenge: Large amounts of computing resources needed for short periods of time; significant data storage costs
Solution: Clusters of 100s of nodes on EMR running 4-5 hours at a time. Leverages the 1000 Genomes Public Data Set on AWS: free access to ~200 TB of genomes for over 2,600 people from 26 populations around the world.
![Page 115: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/115.jpg)
Challenge: Volatile weather is deadly to crops like grapes
Solution: Built a predictive model based on freely available data: 60 years of crop data, 14 TB of soil data, and 1M government Doppler radar points. 50 EMR clusters process new data as it comes into S3 each day, continuously updating the model.
![Page 116: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/116.jpg)
150B soil observations
3M daily weather measurements
850K precision rainfall grids tracked
200 TB in Amazon S3
![Page 117: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/117.jpg)
Big Data and AWS Cloud
![Page 118: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/118.jpg)
Elastic and scalable + No upfront CapEx + Pay per use + On demand = Remove constraints
![Page 119: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/119.jpg)
Remove constraints = More experimentation
![Page 120: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/120.jpg)
More experimentation = More innovation
![Page 121: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/121.jpg)
Focus on your business
Leave undifferentiated heavy lifting to us
![Page 122: Big Data and Hadoop in the Cloud](https://reader033.vdocument.in/reader033/viewer/2022042714/54b6b0ea4a7959ad7b8b463b/html5/thumbnails/122.jpg)
THANK YOU!
slideshare.net/AmazonWebServicesLATAM
http://aws.amazon.com/es/big-data/
José Papo
AWS Tech Evangelist
@josepapo