![Page 1: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/1.jpg)
Karan Bhatia, PhD
Introducing Elastic MapReduce
Big Data Solutions Practice
![Page 2: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/2.jpg)
Tutorials, training, and mentoring in Portuguese
Sign up now!
http://awshub.com.br
![Page 3: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/3.jpg)
![Page 4: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/4.jpg)
4 bytes x 1,000,000 households x 1 measurement/month x 10 years
480 MBytes
![Page 5: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/5.jpg)
4 bytes x 1,000,000 households x 1 measurement/min x 10 years
≈ 21 TBytes
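As a quick check of the arithmetic on these two slides, the sketch below evaluates both products as written. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and 365-day years; the constant and function names are illustrative and not from the talk.

```python
BYTES_PER_MEASUREMENT = 4
HOUSEHOLDS = 1_000_000
YEARS = 10


def total_bytes(measurements_per_year: int) -> int:
    """Raw storage for one fixed-size reading per household per interval."""
    return BYTES_PER_MEASUREMENT * HOUSEHOLDS * measurements_per_year * YEARS


monthly = total_bytes(12)                # one reading per month
per_minute = total_bytes(365 * 24 * 60)  # one reading per minute, leap days ignored

print(f"monthly:    {monthly / 1e6:.0f} MB")     # monthly:    480 MB
print(f"per minute: {per_minute / 1e12:.1f} TB")  # per minute: 21.0 TB
```

Moving from monthly to per-minute readings multiplies the volume by 43,800, which is the point of the comparison: the sampling rate, not the record size, drives the jump from megabytes to terabytes.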
![Page 6: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/6.jpg)
Big Data as Business Transformation
![Page 7: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/7.jpg)
[Chart: data volume, generated data versus data available for analysis]
Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
![Page 8: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/8.jpg)
AWS Elastic MapReduce
MapReduce
HDFS
![Page 9: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/9.jpg)
Thousands of customers, 2 million+ clusters in 2012
![Page 10: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/10.jpg)
EMR Sample Use Cases
![Page 11: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/11.jpg)
Apontador, MapLink,
and AWS
Supported by:
![Page 12: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/12.jpg)
• What do I know about the user?
{"BaseLogId":"RmlpbjZkWVhCM0NxckNjYjF3eFU0dGNTYnhJPQ","TrackUserId":"a18e0672-ad07-4f28-b447-fc0cba90ee17","SiteId":"apto-dv01","SessionId":"1369827720327:f52c5b","ExternalId":"1933510381","Hostname":"integra01.apontador.lan","Path":"/local/sp/sao_paulo/bares_e_casas_noturnas/QYN7825H/","Referer":null,"PageTitle":"Locais, Eventos, Endereços, Mapas - Apontador.com","IpAddress":"200.150.177.249","AgentInfo":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36","Position":"{ \"lat\": -23.5934691, \"lon\": -46.6882606, \"acc\": 36}","SearchInfo":null,"RawRequestInfo":”RawRequest”: ","CreateAt":"2013-06-24T14:39:46.7082358Z"}
• What else?
Actions, clicks, searches
HOW do we bring the best to the user?
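To make the shape of such a tracking record concrete, here is a minimal sketch of how one log line might be parsed. It uses a trimmed copy of the record above (the `RawRequestInfo` field is left out), and the `parse_event` helper is hypothetical. Note that `Position` is itself a JSON document stored as a string, so it needs a second decoding pass.

```python
import json

# Trimmed copy of the sample record; a raw string keeps the \" escapes intact.
RAW_EVENT = r'''{"TrackUserId":"a18e0672-ad07-4f28-b447-fc0cba90ee17","SessionId":"1369827720327:f52c5b","Path":"/local/sp/sao_paulo/bares_e_casas_noturnas/QYN7825H/","Position":"{ \"lat\": -23.5934691, \"lon\": -46.6882606, \"acc\": 36}","SearchInfo":null,"CreateAt":"2013-06-24T14:39:46.7082358Z"}'''


def parse_event(line: str) -> dict:
    """Flatten one tracking record into the fields an analysis job might use."""
    event = json.loads(line)
    # Position holds JSON encoded as a string; decode it separately if present.
    position = json.loads(event["Position"]) if event.get("Position") else {}
    return {
        "user": event["TrackUserId"],
        "session": event["SessionId"],
        "path": event["Path"],
        "lat": position.get("lat"),
        "lon": position.get("lon"),
        "acc": position.get("acc"),
        "created_at": event["CreateAt"],
    }


event = parse_event(RAW_EVENT)
print(event["user"], event["lat"], event["lon"])
```

Flattening records this way is one common preparation step before loading them into a tabular tool such as Hive; whether the presenters did it this way is not stated in the slides.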
![Page 13: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/13.jpg)
• What do we receive to determine traffic?
<Route><Category>1</Category><DateTime>0001-01-01T00:00:00</DateTime><Destination xmlns:a="http://schemas.datacontract.org/2004/07/SwissKnife.Spatial"><a:Lat>-8.150483</a:Lat><a:Lng>-35.420284</a:Lng></Destination><Origin xmlns:a="http://schemas.datacontract.org/2004/07/SwissKnife.Spatial"><a:Lat>-8.149973</a:Lat><a:Lng>-35.41825</a:Lng></Origin>
HOW do we figure out the traffic?
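As an illustration of reading such a route message, the sketch below extracts the origin and destination coordinates with Python's standard `xml.etree.ElementTree`. The sample in the slide is cut off, so the closing `</Route>` tag is added here as an assumption, and the `read_point` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "a:" prefix in the sample message.
NS = {"a": "http://schemas.datacontract.org/2004/07/SwissKnife.Spatial"}

ROUTE_XML = (
    '<Route><Category>1</Category><DateTime>0001-01-01T00:00:00</DateTime>'
    '<Destination xmlns:a="http://schemas.datacontract.org/2004/07/SwissKnife.Spatial">'
    '<a:Lat>-8.150483</a:Lat><a:Lng>-35.420284</a:Lng></Destination>'
    '<Origin xmlns:a="http://schemas.datacontract.org/2004/07/SwissKnife.Spatial">'
    '<a:Lat>-8.149973</a:Lat><a:Lng>-35.41825</a:Lng></Origin>'
    '</Route>'  # assumed closing tag; the original sample is truncated
)


def read_point(route: ET.Element, tag: str) -> tuple:
    """Return (lat, lng) for the <Origin> or <Destination> child of a route."""
    node = route.find(tag)
    lat = float(node.find("a:Lat", NS).text)
    lng = float(node.find("a:Lng", NS).text)
    return lat, lng


route = ET.fromstring(ROUTE_XML)
origin = read_point(route, "Origin")
destination = read_point(route, "Destination")
print(origin, destination)
```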
![Page 14: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/14.jpg)
Bayes' theorem:
The statistical MODEL
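For reference, the standard statement of Bayes' theorem is given below, with $A$ and $B$ as generic events. How the presenters mapped these events onto user or traffic data is not stated in the slide text, so no such mapping is assumed here.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```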
![Page 15: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/15.jpg)
• Hive (~40 m3.large Spot instances)
90% - Daily utility jobs
• Streaming
10% - Solr, more complex MapReduce jobs (e.g., MCMC, fast Fourier transform)
• Infrastructure used
Hive (~40 m3.large Spot instances), Elastic MapReduce; S3 (approximately 7 TB of structured data across several buckets); RDS (metadata describing how the S3 data is organized)
WHAT do we use?
![Page 16: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/16.jpg)
• Chaordic is the leading e-commerce personalization company in Brazil, with 9 of the country's 15 largest players as customers.
• The products developed by Chaordic integrate with the largest Brazilian e-commerce sites and need infrastructure that is reliable, fast, scalable, and low-cost.
“With AWS we were able to build a single system to meet the demand of the largest e-commerce sites in Brazil at a relatively low cost.”
“Building our own data center to meet our demand would be economically unfeasible” - João Bosco, CTO
![Page 17: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/17.jpg)
The Challenge
• Serve tens of millions of unique users per month;
• Process Big Data;
• Respond in under 100 ms;
• Scale well during peaks in traffic;
• All of this at an affordable cost.
![Page 18: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/18.jpg)
The Role of AWS and Benefits Achieved
• 4 billion requests per month;
• 300,000+ requests per minute;
• 200+ million recommendations every day;
• Spot Instances: 20% lower AWS cost.
![Page 19: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/19.jpg)
MapReduce
![Page 20: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/20.jpg)
![Page 21: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/21.jpg)
Map Shuffle Reduce
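The three phases named on this slide can be sketched in a few lines of single-process Python. This is only an illustration of the data flow using the classic word-count example; it is not Hadoop code, and all function names are made up for the sketch.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    return reduce_phase(shuffle_phase(map_phase(documents)))


counts = word_count(["big data", "big clusters"])
print(counts)  # {'big': 2, 'data': 1, 'clusters': 1}
```

In a real cluster the map and reduce functions run on many nodes in parallel and the shuffle moves data across the network, but the contract between the phases is the same as in this sketch.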
![Page 22: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/22.jpg)
AWS Elastic MapReduce
![Page 23: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/23.jpg)
Managed Hadoop analytics
![Page 24: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/24.jpg)
Input data: S3, DynamoDB, Redshift
![Page 25: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/25.jpg)
[Diagram labels: Elastic MapReduce; Code; Input data (S3, DynamoDB, Redshift)]
![Page 26: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/26.jpg)
[Diagram labels: Elastic MapReduce; Code; Name node; Input data (S3, DynamoDB, Redshift)]
![Page 27: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/27.jpg)
[Diagram labels: Elastic MapReduce; Code; Name node; Elastic cluster; S3/HDFS; Input data (S3, DynamoDB, Redshift)]
![Page 28: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/28.jpg)
[Diagram labels: Elastic MapReduce; Code; Name node; Elastic cluster; S3/HDFS; Queries + BI (via JDBC, Pig, Hive); Input data (S3, DynamoDB, Redshift)]
![Page 29: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/29.jpg)
[Diagram labels: Elastic MapReduce; Code; Name node; Elastic cluster; S3/HDFS; Queries + BI (via JDBC, Pig, Hive); Output; Input data (S3, DynamoDB, Redshift)]
![Page 30: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/30.jpg)
[Diagram labels: Output; Input data (S3, DynamoDB, Redshift)]
![Page 31: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/31.jpg)
![Page 32: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/32.jpg)
![Page 33: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/33.jpg)
![Page 34: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/34.jpg)
![Page 35: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/35.jpg)
Instance Types
[Chart: memory (GB), ticks from 1 to 256, versus EC2 Compute Units, ticks from 1 to 128]
Families: Standard, 2nd Gen Standard, Micro, High-Memory, High-CPU, Cluster Compute, Cluster GPU, High I/O, High-Storage, Cluster High-Mem

| Instance | Memory | EC2 Compute Units | Instance storage | Platform | Notes |
|---|---|---|---|---|---|
| hi1.4xlarge | 60.5 GB | 35 | 2 x 1024 GB SSD | 64-bit | |
| cc1.4xlarge | 23 GB | 33.5 | 1690 GB | 64-bit | |
| c1.xlarge | 7 GB | 20 | 1690 GB | 64-bit | |
| m1.small | 1.7 GB | 1 | 160 GB | 32-bit or 64-bit | |
| m1.medium | 3.75 GB | 2 | 410 GB | 32-bit or 64-bit | |
| m1.large | 7.5 GB | 4 | 850 GB | 64-bit | EBS Optimizable |
| m1.xlarge | 15 GB | 8 | 1690 GB | 64-bit | EBS Optimizable |
| m2.xlarge | 17.1 GB | 6.5 | 420 GB | 64-bit | |
| m2.2xlarge | 34.2 GB | 13 | 850 GB | 64-bit | |
| m2.4xlarge | 68.4 GB | 26 | 1690 GB | 64-bit | EBS Optimizable |
| t1.micro | 613 MB | Up to 2 | EBS storage only | 32-bit or 64-bit | |
| c1.medium | 1.7 GB | 5 | 350 GB | 32-bit or 64-bit | |
| cg1.4xlarge | 22 GB | 33.5 | 1690 GB | 64-bit | 2 x NVIDIA Tesla “Fermi” M2050 GPUs |
| cc2.8xlarge | 60.5 GB | 88 | 3370 GB | 64-bit | |
| m3.xlarge | 15 GB | 13 | not listed | not listed | |
| m3.2xlarge | 30 GB | 26 | not listed | not listed | EBS Optimizable |
| hs1.8xlarge | 117 GB | 35 | 24 x 2 TB | 64-bit | |
| cr1.8xlarge | 244 GB | 88 | 2 x 120 GB SSD | 64-bit | |
![Page 36: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/36.jpg)
![Page 37: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/37.jpg)
![Page 38: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/38.jpg)
![Page 39: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/39.jpg)
![Page 40: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/40.jpg)
![Page 41: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/41.jpg)
![Page 42: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/42.jpg)
1. Elastic clusters
![Page 43: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/43.jpg)
10 hours
![Page 44: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/44.jpg)
5 hours
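One way to read the "10 hours" and "5 hours" slides is that doubling the cluster halves the run time while the total instance-hours, and so the cost, stay about the same. The sketch below works through that arithmetic under two loudly stated assumptions: the job scales perfectly linearly with node count, and the workload size and node counts are hypothetical figures chosen to reproduce the two run times.

```python
def run_time_hours(total_instance_hours: float, nodes: int) -> float:
    """Wall-clock time for a job, assuming perfectly linear scaling."""
    return total_instance_hours / nodes


WORK = 500  # hypothetical job size in instance-hours (assumption)

small_nodes, large_nodes = 50, 100  # hypothetical cluster sizes (assumption)
small_time = run_time_hours(WORK, small_nodes)
large_time = run_time_hours(WORK, large_nodes)

# Instance-hours consumed is nodes x hours, which is what hourly billing charges for.
small_cost_units = small_nodes * small_time
large_cost_units = large_nodes * large_time

print(small_time, large_time)              # 10.0 5.0
print(small_cost_units, large_cost_units)  # 500.0 500.0
```

Real jobs rarely scale perfectly, so the larger cluster would usually consume somewhat more instance-hours in practice; the sketch only shows the idealized trade-off.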
![Page 45: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/45.jpg)
Peak capacity
![Page 46: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/46.jpg)
2. Rapid, tuned provisioning
![Page 47: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/47.jpg)
Tedious.
![Page 48: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/48.jpg)
Remove undifferentiated
heavy lifting.
![Page 49: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/49.jpg)
3. Hadoop all the way down
![Page 50: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/50.jpg)
Robust ecosystem. Databases, machine learning, segmentation,
clustering, analytics, metadata stores,
exchange formats, and so on...
![Page 51: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/51.jpg)
4. Agility for experimentation
![Page 52: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/52.jpg)
Instance choice. Stay flexible on instance type & number.
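To illustrate what staying flexible on instance type and number can mean, the sketch below sizes a cluster for a target amount of memory and compute using memory and EC2 Compute Unit figures for four of the instance types listed earlier in the deck. The target values and the `nodes_needed` helper are hypothetical, and prices are deliberately left out because the slides do not give any.

```python
import math

# (memory in GB, EC2 Compute Units) per instance, taken from the instance-type slide.
INSTANCE_SPECS = {
    "m1.xlarge": (15.0, 8.0),
    "m2.4xlarge": (68.4, 26.0),
    "cc2.8xlarge": (60.5, 88.0),
    "cr1.8xlarge": (244.0, 88.0),
}


def nodes_needed(instance_type: str, memory_gb: float, compute_units: float) -> int:
    """Smallest node count that covers both the memory and the compute target."""
    mem, ecu = INSTANCE_SPECS[instance_type]
    return max(math.ceil(memory_gb / mem), math.ceil(compute_units / ecu))


# Hypothetical requirement: 240 GB of memory and 100 compute units in total.
plan = {name: nodes_needed(name, 240, 100) for name in INSTANCE_SPECS}
print(plan)  # {'m1.xlarge': 16, 'm2.4xlarge': 4, 'cc2.8xlarge': 4, 'cr1.8xlarge': 2}
```

The same requirement can be met by 16 small nodes or 2 large ones, which is why trying several instance types against a real workload is worth the effort.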
![Page 53: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/53.jpg)
5. Cost optimizations
![Page 54: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/54.jpg)
Built for Spot. Name-your-price supercomputing.
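As a rough sketch of what a Spot-backed cluster request can look like, the function below builds a request dictionary in the shape accepted by the `run_job_flow` operation of the EMR API (for example through boto3). Nothing is sent to AWS here. The layout, with on-demand master and core nodes plus a Spot task group, is one common pattern and an assumption of this sketch, not something the slides prescribe; the cluster name, node counts, and bid price are placeholders.

```python
def spot_cluster_request(name, release_label, instance_type, task_nodes, bid_price_usd):
    """Build (but do not submit) an EMR cluster request with a Spot task group."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # caller supplies a valid EMR release label
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": 2},
                # Task nodes hold no HDFS data, so losing them to a Spot
                # interruption slows the job down without losing stored blocks.
                {"Name": "spot-tasks", "InstanceRole": "TASK", "Market": "SPOT",
                 "BidPrice": f"{bid_price_usd:.3f}",  # the API expects a string
                 "InstanceType": instance_type, "InstanceCount": task_nodes},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = spot_cluster_request("nightly-hive", "<release-label>", "m3.large", 40, 0.05)
print(request["Instances"]["InstanceGroups"][2])
```

With AWS credentials configured, a dictionary like this could be passed to `boto3.client("emr").run_job_flow(**request)` after replacing the placeholder release label; that call is not executed in this sketch.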
![Page 55: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/55.jpg)
1. Elastic clusters
2. Rapid, tuned provisioning
3. Hadoop all the way down
4. Agility for experimentation
5. Cost optimizations
![Page 56: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/56.jpg)
Data, data, everywhere... Data is stored in silos.
![Page 57: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/57.jpg)
[Diagram labels: S3; DynamoDB; EMR; HBase on EMR; RDS; Redshift; On-premises]
![Page 58: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/58.jpg)
![Page 59: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/59.jpg)
![Page 60: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/60.jpg)
![Page 61: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/61.jpg)
![Page 62: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/62.jpg)
AWS Data Pipeline
Announced in November, available now.
Orchestration for data-intensive workloads.
![Page 63: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/63.jpg)
AWS Data Pipeline
Data-intensive orchestration and automation
Reliable and scheduled
Easy to use, drag and drop
Execution and retry logic
Map data dependencies
Create and manage temporary compute resources
![Page 64: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/64.jpg)
Anatomy of a pipeline
![Page 65: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/65.jpg)
Additional checks and notifications
![Page 66: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/66.jpg)
Arbitrarily complex pipelines
![Page 67: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/67.jpg)
aws.amazon.com/datapipeline
![Page 68: Introducing Elastic MapReduce](https://reader034.vdocument.in/reader034/viewer/2022051412/54b6c76b4a795996608b45e4/html5/thumbnails/68.jpg)
aws.amazon.com/big-data