big data architectural patterns and best practices on aws · pdf filereference architecture...
Post on 17-Mar-2018
241 Views
Preview:
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Antoine Généreux, Solutions Architect, AWS
April 4th , 2017
Big Data Architectural Patterns and Best Practices on AWSBigDataMontréal(BDM52)
What to Expect from the Session
Big data challengesArchitectural principlesHow to simplify big data processingWhat technologies should you use?
• Why?• How?
Reference architectureDesign patterns
Ever-Increasing Big Data
Volume
Velocity
Variety
Big Data Evolution
Batch processing
Streamprocessing
Machinelearning
Cloud Services Evolution
Virtual machines
Managed services
Serverless
Plethora of Tools
Amazon Glacier
S3 DynamoDB
RDS
EMR
Amazon Redshift
Data PipelineAmazon Kinesis
Amazon Kinesis Streams app
Lambda Amazon ML
SQS
ElastiCache
DynamoDBStreams
Amazon ElasticsearchService
Amazon Kinesis Analytics
Big Data Challenges
Why?
How?
What tools should I use?
Is there a reference architecture?
Architectural Principles
Build decoupled systems• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job• Data structure, latency, throughput, access patterns
Leverage AWS managed services• Scalable/elastic, available, reliable, secure, no/low admin
Use log-centric design patterns• Immutable logs, materialized views
Be cost-conscious• Big data ≠ big cost
Simplify Big Data Processing
COLLECT STORE PROCESS/ANALYZE CONSUME
Time to answer (Latency)Throughput
Cost
COLLECT
Types of DataCOLLECT
Mobile apps
Web apps
Data centersAWS Direct
Connect
RECORDS
Appl
icat
ions In-memory data structures
Database records
AWS Import/ExportSnowball
Logging
Amazon CloudWatch
AWS CloudTrail
DOCUMENTS
FILES
Logg
ing
Tran
spor
t
Search documents
Log files
MessagingMessage MESSAGES
Mes
sagi
ng
Messages
Devices
Sensors & IoT platforms
AWS IoT STREAMS
IoT Data streams
Transactions
Files
Events
What Is the Temperature of Your Data ?
Hot Warm ColdVolume MB–GB GB–TB PB–EBItem size B–KB KB–MB KB–TBLatency ms ms, sec min, hrsDurability Low–high High Very highRequest rate Very high High LowCost/GB $$-$ $-¢¢ ¢
Hot data Warm data Cold data
Data Characteristics: Hot, Warm, Cold
Store
STORE
Devices
Sensors & IoT platforms
AWS IoT STREAMS
IoT
COLLECT
AWS Import/ExportSnowball
Logging
Amazon CloudWatch
AWS CloudTrail
DOCUMENTS
FILES
Logg
ing
Tran
spor
t
MessagingMessage MESSAGES
Mes
sagi
ngAp
plic
atio
ns
Mobile apps
Web apps
Data centersAWS Direct
Connect
RECORDS
Types of Data Stores
Database SQL & NoSQL databases
Search Search engines
File store File systems
Queue Message queues
Streamstorage
Pub/sub message queues
In-memory Caches, data structure servers
In-memory
Amazon Kinesis Firehose
Amazon KinesisStreams
Apache Kafka
Amazon DynamoDB Streams
Amazon SQS
Amazon SQS• Managed message queue service
Apache Kafka• High throughput distributed streaming
platform
Amazon Kinesis Streams• Managed stream storage + processing
Amazon Kinesis Firehose• Managed data delivery
Amazon DynamoDB• Managed NoSQL database• Tables can be stream-enabled
Message & Stream Storage
Devices
Sensors & IoT platforms
AWS IoT STREAMS
IoT
COLLECT STORE
Mobile apps
Web apps
Data centersAWS Direct
Connect
RECORDSDatabaseAp
plic
atio
ns
AWS Import/ExportSnowball
Logging
Amazon CloudWatch
AWS CloudTrail
DOCUMENTS
FILES
Search
File store
Logg
ing
Tran
spor
t
MessagingMessage MESSAGES
Mes
sagi
ng
Mes
sage
Stre
am
Why Stream Storage?
Decouple producers & consumersPersistent bufferCollect multiple streams
Preserve client orderingParallel consumptionStreaming MapReduce
4 4 3 3 2 2 1 14 3 2 1
4 3 2 1
4 3 2 1
4 3 2 14 4 3 3 2 2 1 1
shard 1 / partition 1
shard 2 / partition 2
Consumer 1Count of red = 4
Count of violet = 4
Consumer 2Count of blue = 4
Count of green = 4
DynamoDB stream Amazon Kinesis stream Kafka topic
What About Amazon SQS? • Decouple producers & consumers• Persistent buffer• Collect multiple streams• No client ordering (Standard)
• FIFO queue preserves client ordering
• No streaming MapReduce• No parallel consumption
• Amazon SNS can publish to multiple SNS subscribers (queues or ʎ functions)
Publisher
Amazon SNS
topic
function
ʎ
AWS Lambdafunction
Amazon SQSqueue
queue
Subscriber
Con
sum
ers
4 3 2 1
12344 3 2 1
1234
2134
13342
Standard
FIFO
Which Stream/Message Storage Should I Use?AmazonDynamoDBStreams
AmazonKinesisStreams
AmazonKinesis Firehose
ApacheKafka
AmazonSQS (Standard)
Amazon SQS (FIFO)
AWS managed Yes Yes Yes No Yes Yes
Guaranteed ordering Yes Yes No Yes No Yes
Delivery (deduping) Exactly-once At-least-once At-least-once At-least-once At-least-once Exactly-once
Data retention period 24 hours 7 days N/A Configurable 14 days 14 days
Availability 3 AZ 3 AZ 3 AZ Configurable 3 AZ 3 AZ
Scale / throughput
No limit /~ table IOPS
No limit /~ shards
No limit /automatic
No limit /~ nodes
No limits /automatic
300 TPS / queue
Parallel consumption Yes Yes No Yes No No
Stream MapReduce Yes Yes N/A Yes N/A N/A
Row/object size 400 KB 1 MB Destination row/object size
Configurable 256 KB 256 KB
Cost Higher (table cost)
Low Low Low (+admin) Low-medium Low-medium
Hot Warm
New
In-memory
COLLECT STORE
Mobile apps
Web apps
Data centersAWS Direct
Connect
RECORDSDatabase
AWS Import/ExportSnowball
Logging
Amazon CloudWatch
AWS CloudTrail
DOCUMENTS
FILES
Search
MessagingMessage MESSAGES
Devices
Sensors & IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon KinesisStreams
Amazon Kinesis Firehose
Amazon DynamoDB Streams
Hot
Stre
am
Amazon S3
Amazon SQS
Mes
sage
Amazon S3File
Logg
ing
IoT
Appl
icat
ions
Tran
spor
tM
essa
ging
File Storage
Why Is Amazon S3 Good for Big Data?
• Natively supported by big data frameworks (Spark, Hive, Presto, etc.) • No need to run compute clusters for storage (unlike HDFS)• Can run transient Hadoop clusters & Amazon EC2 Spot Instances• Multiple & heterogeneous analysis clusters can use the same data• Unlimited number of objects and volume of data• Very high bandwidth – no aggregate throughput limit• Designed for 99.99% availability – can tolerate zone failure• Designed for 99.999999999% durability• No need to pay for data replication• Native support for versioning• Tiered-storage (Standard, IA, Amazon Glacier) via life-cycle policies• Secure – SSL, client/server-side encryption at rest• Low cost
What About HDFS & Data Tiering?
• Use HDFS for very frequently accessed (hot) data
• Use Amazon S3 Standard for frequently accessed data
• Use Amazon S3 Standard – IA for less frequently accessed data
• Use Amazon Glacier for archiving cold data
In-memory
COLLECT STORE
Mobile apps
Web apps
Data centersAWS Direct
Connect
RECORDS Database
AWS Import/ExportSnowball
Logging
Amazon CloudWatch
AWS CloudTrail
DOCUMENTS
FILES
Search
MessagingMessage MESSAGES
Devices
Sensors & IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon KinesisStreams
Amazon Kinesis Firehose
Amazon DynamoDB Streams
Hot
Stre
am
Amazon SQS
Mes
sage
Amazon S3File
Logg
ing
IoT
Appl
icat
ions
Tran
spor
tM
essa
ging
In-memory, Database, Search
Anti-Pattern
Data Tier
Applications
RDBMS
Best Practice: Use the Right Tool for the Job
Data TierSearch
Amazon Elasticsearch Service
In-memory
Amazon ElastiCacheRedisMemcached
SQL
Amazon AuroraAmazon RDS
MySQLPostgreSQLOracleSQL Server
NoSQL
Amazon DynamoDBCassandraHBaseMongoDB
Applications
COLLECT STORE
Mobile apps
Web apps
Data centersAWS Direct
Connect
RECORDS
AWS Import/ExportSnowball
Logging
Amazon CloudWatch
AWS CloudTrail
DOCUMENTS
FILES
MessagingMessage MESSAGES
Devices
Sensors & IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon KinesisStreams
Amazon Kinesis Firehose
Amazon DynamoDB Streams
Hot
Stre
am
Amazon SQS
Mes
sage
Amazon Elasticsearch Service
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
Sear
ch
SQL
N
oSQ
L C
ache
File
Logg
ing
IoT
Appl
icat
ions
Tran
spor
tM
essa
ging
Amazon ElastiCache• Managed Memcached or Redis service
Amazon DynamoDB• Managed NoSQL database service
Amazon RDS• Managed relational database service
Amazon Elasticsearch Service• Managed Elasticsearch service
Which Data Store Should I Use?
Data structure → Fixed schema, JSON, key-value
Access patterns → Store data in the format you will access it
Data characteristics → Hot, warm, cold
Cost → Right cost
Data Structure and Access PatternsAccess Patterns What to use?Put/Get (key, value) In-memory, NoSQLSimple relationships → 1:N, M:N NoSQLMulti-table joins, transaction, SQL SQLFaceting, search Search
Data Structure What to use?Fixed schema SQL, NoSQLSchema-free (JSON) NoSQL, Search(Key, value) In-memory, NoSQL
In-memory SQL
Request rateHigh Low
Cost/GBHigh Low
LatencyLow High
Data volumeLow High
Amazon Glacier
Stru
ctur
e
NoSQL
Hot data Warm data Cold data
Low
High
Amazon ElastiCache AmazonDynamoDB
AmazonRDS/Aurora
AmazonES
Amazon S3 Amazon Glacier
Average latency
ms ms ms, sec ms,sec ms,sec,min(~ size)
hrs
Typicaldata stored
GB GB–TBs(no limit)
GB–TB(64 TB max)
GB–TB MB–PB(no limit)
GB–PB(no limit)
Typicalitem size
B-KB KB(400 KB max)
KB(64 KB max)
B-KB(2 GB max)
KB-TB(5 TB max)
GB(40 TB max)
Request Rate
High – very high Very high(no limit)
High High Low – high(no limit)
Very low
Storage costGB/month
$$ ¢¢ ¢¢ ¢¢ ¢ ¢4/10
Durability Low - moderate Very high Very high High Very high Very high
Availability High2 AZ
Very high 3 AZ
Very high3 AZ
High2 AZ
Very high3 AZ
Very high3 AZ
Hot data Warm data Cold data
Which Data Store Should I Use?
Cost-Conscious Design
Example: Should I use Amazon S3 or Amazon DynamoDB?
“I’m currently scoping out a project. The design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…”
Request rate (Writes/sec)
Object size(Bytes)
Total size(GB/month)
Objects per month
300 2048 1483 777,600,000
https://calculator.s3.amazonaws.com/index.html
Simple Monthly Calculator
Cost-Conscious Design
Example: Should I use Amazon S3 or Amazon DynamoDB?
Request rate (Writes/sec)
Object size(Bytes)
Total size(GB/month)
Objects per month
300 2,048 1,483 777,600,000
Amazon S3 orDynamoDB?
Request rate (Writes/sec)
Object size(Bytes)
Total size(GB/month)
Objects per month
Scenario 1300 2,048 1,483 777,600,000
Scenario 2300 32,768 23,730 777,600,000
Amazon S3
Amazon DynamoDB
use
use
PROCESS / ANALYZE
BatchTakes minutes to hours Example: Daily/weekly/monthly reportsAmazon EMR (MapReduce, Hive, Pig, Spark)
InteractiveTakes secondsExample: Self-service dashboardsAmazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark)
MessageTakes milliseconds to secondsExample: Message processingAmazon SQS applications on Amazon EC2
StreamTakes milliseconds to secondsExample: Fraud alerts, 1 minute metricsAmazon EMR (Spark Streaming), Amazon Kinesis Analytics, KCL, Storm, AWS Lambda
Machine LearningTakes milliseconds to minutesExample: Fraud detection, forecast demandAmazon ML, Amazon EMR (Spark ML)
Analytics Types & Frameworks PROCESS / ANALYZE
Amazon Machine LearningM
LM
essa
ge
Amazon SQS appsAmazon EC2
Streaming
Amazon Kinesis Analytics
KCLapps
AWS Lambda
Stre
am
Amazon EC2
Amazon EMR
Fast
Amazon Redshift
Presto
AmazonEMR
Fast
Slow
Amazon Athena
Batc
hIn
tera
ctiv
e
Which Stream & Message Processing Technology Should I Use?Amazon EMR (Spark Streaming)
Apache Storm
KCL Application Amazon Kinesis Analytics
AWS Lambda
Amazon SQS Application
AWS managed
Yes (Amazon EMR)
No (Do it yourself)
No ( EC2 + AutoScaling)
Yes Yes No (EC2 + AutoScaling)
Serverless No No No Yes Yes No
Scale / throughput
No limits /~ nodes
No limits /~ nodes
No limits /~ nodes
Up to 8 KPU / automatic
No limits /automatic
No limits /~ nodes
Availability Single AZ Configurable Multi-AZ Multi-AZ Multi-AZ Multi-AZ
Programminglanguages
Java, Python, Scala
Almost any language viaThrift
Java, others via MultiLangDaemon
ANSI SQL with extensions
Node.js, Java, Python
AWS SDK languages (Java, .NET, Python, …)
Uses Multistage processing
Multistage processing
Single stage processing
Multistage processing
Simple event-based triggers
Simple event based triggers
Reliability KCL and Spark checkpoints
Framework managed
Managed by KCL Managed by Amazon Kinesis Analytics
Managed by AWS Lambda
Managed by SQS Visibility Timeout
Which Analysis Tool Should I Use?Amazon Redshift Amazon Athena Amazon EMR
Presto Spark Hive
Use case Optimized for data warehousing
Ad-hoc Interactive Queries
Interactive Query
General purpose (iterative ML, RT, ..)
Batch
Scale/throughput ~Nodes Automatic / No limits ~ Nodes
AWS ManagedService
Yes Yes, Serverless Yes
Storage Local storage Amazon S3 Amazon S3, HDFS
Optimization Columnar storage, data compression, and zone
maps
CSV, TSV, JSON, Parquet, ORC, Apache
Web log
Framework dependent
Metadata Amazon Redshift managed Athena Catalog Manager
Hive Meta-store
BI tools supports Yes (JDBC/ODBC) Yes (JDBC) Yes (JDBC/ODBC & Custom)
Access controls Users, groups, and access controls
AWS IAM Integration with LDAP
UDF support Yes (Scalar) No Yes
Slow
What About ETL?
https://aws.amazon.com/big-data/partner-solutions/
ETLSTORE PROCESS / ANALYZE
Data Integration PartnersReduce the effort to move, cleanse, synchronize, manage, and automatize data related processes. AWS Glue (Preview)
AWSGlueisafullymanagedETLservicethatmakesiteasytounderstandyourdatasources,preparethedata,andmoveitreliablybetweendatastores
CONSUME
COLLECT STORE CONSUMEPROCESS / ANALYZE
Amazon Elasticsearch Service
Apache Kafka
Amazon SQS
Amazon KinesisStreams
Amazon Kinesis Firehose
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
Amazon DynamoDB Streams
Hot
Hot
War
m
File
Mes
sage
Stre
am
Mobile apps
Web apps
Devices
MessagingMessage
Sensors & IoT platforms
AWS IoT
Data centersAWS Direct
Connect
AWS Import/ExportSnowball
Logging
Amazon CloudWatch
AWS CloudTrail
RECORDS
DOCUMENTS
FILES
MESSAGES
STREAMS
Logg
ing
IoT
Appl
icat
ions
Tran
spor
tM
essa
ging
ETL
Sear
ch
SQL
N
oSQ
L C
ache
Streaming
Amazon Kinesis Analytics
KCLapps
AWS Lambda
Fast
Stre
am
Amazon EC2
Amazon EMR
Amazon SQS apps
Amazon Redshift
Amazon Machine Learning
Presto
AmazonEMR
Fast
Slow
Amazon EC2
Amazon Athena
Batc
hM
essa
geIn
tera
ctiv
eM
L
STORE CONSUMEPROCESS / ANALYZE
Amazon QuickSight
Apps & Services
Anal
ysis
& v
isua
lizat
ion
Not
eboo
ks
IDE
API
Applications & API
Analysis and visualization
Notebooks
IDE
Business users
Data scientist, developers
COLLECT ETL
Putting It All Together
Streaming
Amazon Kinesis Analytics
KCLapps
AWS Lambda
COLLECT STORE CONSUMEPROCESS / ANALYZE
Amazon Elasticsearch Service
Apache Kafka
Amazon SQS
Amazon KinesisStreams
Amazon Kinesis Firehose
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
Amazon DynamoDB Streams
Hot
Hot
War
m
Fast
Stre
am
Sear
ch
SQL
N
oSQ
L C
ache
File
Mes
sage
Stre
am
Amazon EC2
Mobile apps
Web apps
Devices
MessagingMessage
Sensors & IoT platforms
AWS IoT
Data centersAWS Direct
Connect
AWS Import/ExportSnowball
Logging
Amazon CloudWatch
AWS CloudTrail
RECORDS
DOCUMENTS
FILES
MESSAGES
STREAMS
Amazon QuickSight
Apps & Services
Anal
ysis
& v
isua
lizat
ion
Not
eboo
ksID
EAP
I
Reference architecture
Logg
ing
IoT
Appl
icat
ions
Tran
spor
tM
essa
ging
ETL
Amazon EMR
Amazon SQS apps
Amazon Redshift
Amazon Machine Learning
Presto
AmazonEMR
Fast
Slow
Amazon EC2
Amazon Athena
Batc
hM
essa
geIn
tera
ctiv
eM
L
Design Patterns
Primitive: Decoupled Data Bus
Storage decoupled from processingMultiple stages
Store Process Store Process
processstore
Primitive: Pub/Sub
Amazon Kinesis
AWS Lambda
Amazon Kinesis Connector Library
processstore
ApacheSpark
Parallel stream consumption/processing
process store
Primitive: Materialized Views
Amazon EMR
Amazon Kinesis
AWS Lambda
Amazon S3
Amazon DynamoDB
Spark Streaming
Amazon Kinesis Connector Library
Spark SQL
Analysis framework reads from or writes to multiple data stores
Spark Streaming Apache StormAWS Lambda
KCL appsAmazon Redshift
AmazonRedshift
Hive
Spark Presto
Amazon Kinesis Amazon DynamoDB Amazon S3data
Hot ColdData temperature
Proc
essi
ng s
peed
Fast
Slow Answers
Hive
Native appsKCL appsAWS Lambda
Amazon Athena
Amazon EMR
Real-time Analytics
AmazonKinesis
KCL app
AWS Lambda
SparkStreaming
Amazon SNS
AmazonML
Notifications
AmazonElastiCache
(Redis)
AmazonDynamoDB
AmazonRDS
AmazonES
Alerts
App state
Real-time prediction
KPI
processstore
Amazon KinesisAnalytics
AmazonS3
Log
Amazon KinesisFan out
Interactive & BatchAnalytics
Amazon S3
Amazon EMR
Hive
Pig
Spark
AmazonML
processstore
Consume
Amazon Redshift
Amazon EMR
PrestoSpark
Batch
Interactive
Batch prediction
Real-time predictionAmazon Kinesis
Firehose
Amazon Athena
Files
Amazon KinesisAnalytics
Interactive&Batch
Amazon S3
Amazon Redshift
Amazon EMRPresto
Hive
Pig
Spark
AmazonElastiCache
AmazonDynamoDB
AmazonRDS
AmazonES
AWS Lambda
Storm
Spark Streaming on Amazon EMR
Ap
plic
atio
ns
Amazon Kinesis
App state
KCL
AmazonML
Real-time
AmazonDynamoDB
AmazonRDS
Change Data Capture
Transactions
Stream
Files
Data Lake
Amazon KinesisAnalytics
Amazon Athena
Amazon Kinesis
Firehose
Summary
Build decoupled systems• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job• Data structure, latency, throughput, access patterns
Leverage AWS managed services• Scalable/elastic, available, reliable, secure, no/low admin
Use log-centric design patterns• Immutable log, batch, interactive & real-time views
Be cost-conscious• Big data ≠ big cost
Amazon SQS apps
Streaming
KCLapps
Amazon Redshift
COLLECT STORE CONSUMEPROCESS / ANALYZE
Amazon Machine Learning
Presto
AmazonEMR
Amazon Elasticsearch Service
Apache Kafka
Amazon SQS
Amazon KinesisStreams
Amazon Kinesis Firehose
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
Amazon DynamoDB Streams
Hot
Hot
War
m
Fast
Slow
Fast
Sear
ch
SQ
L
NoS
QL
Cac
heFi
leM
essa
geSt
ream
Amazon EC2
Amazon EC2
Mobile apps
Web apps
Devices
MessagingMessage
Sensors & IoT platforms
AWS IoT
Data centersAWS Direct
Connect
AWS Import/ExportSnowball
Logging
Amazon CloudWatch
AWS CloudTrail
RECORDS
DOCUMENTS
FILES
MESSAGES
STREAMS
Amazon QuickSight
Apps & Services
Anal
ysis
& v
isua
lizat
ion
Not
eboo
ksID
EAP
I
Logg
ing
IoT
Appl
icat
ions
Tran
spor
tM
essa
ging
ETL
Batc
hM
essa
geIn
tera
ctiv
eSt
ream
ML
Amazon EMR
AWS Lambda
Amazon Kinesis Analytics
Amazon Athena
AWS Free Tier (aws.amazon.com/free)
We’re Hiring! (jobs.amazon.com)Montreal• Solutions Architect (486946)
Toronto• Solutions Architect (493577,493576,493575,493574)• Security Architect (480504)• Cloud Infrastructure Architect (456134)• Senior Solutions Architect (416514)• Technical Account Manager (403410)
Vancouver• Solutions Architect (493572,493571)• SI Solutions Architect (473414)• Senior Solutions Architect (458043)
Thank you! Questions?
Antoine Généreux, genereux@amazon.com
Resources
Blogs, Whitepapers and Video Sessions(Blog) Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learninghttps://aws.amazon.com/blogs/big-data/powering-amazon-redshift-analytics-with-apache-spark-and-amazon-machine-learning/(Blog) Implement Serverless Log Analytics Using Kinesis Analyticshttps://aws.amazon.com/blogs/big-data/implement-serverless-log-analytics-using-amazon-kinesis-analytics/(Blog) Harmonize, Search and Analyze Loosely Coupled Datasets on AWShttps://aws.amazon.com/blogs/big-data/harmonize-search-and-analyze-loosely-coupled-datasets-on-aws/(Blog) Build an Event-Based Analytics Pipeline for Amazon Game Studios’ Breakawayhttps://aws.amazon.com/blogs/big-data/building-an-event-based-analytics-pipeline-for-amazon-game-studios-breakaway/(Whitepaper) Big Data Analytics Options on AWShttps://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf
re:Invent 2016 Breakout Sessions | Big Data Mini Con (13 videos)https://www.youtube.com/playlist?list=PLhr1KZpdzukdi1XNNdkDwQqOWObl5FDsV
re:Invent 2016 Breakout Sessions | Databases (26 videos)https://www.youtube.com/playlist?list=PLhr1KZpdzukc97mkL2-TOE-vJtp22c1yc
re:Invent 2016 Breakout Sessions | Machine Learning Mini Con (14 videos)https://www.youtube.com/playlist?list=PLhr1KZpdzukexYSNcIj9iBbmn9jYKu2pu
Code Samples (https://github.com/awslabs)Harmonize, Search and Analyze Loosely Coupled Datasets on AWS (code related to blog post)https://github.com/awslabs/harmonize-search-analyzeAWS Data Lake Solutionhttps://github.com/awslabs/aws-data-lake-solutionServerless MapReduce Reference Architecturehttps://github.com/awslabs/lambda-refarch-mapreduceStreaming Analytics Pipelinehttps://github.com/awslabs/streaming-analytics-pipelineReal-time analytics using Kinesis and Spark Streaminghttps://github.com/awslabs/real-time-analytics-spark-streamingServerless Reference Architecture for Real-time Stream Processinghttps://github.com/awslabs/lambda-refarch-streamprocessingAmazon Kinesis Data Generatorhttps://github.com/awslabs/amazon-kinesis-data-generatorMachine Learning Sampleshttps://github.com/awslabs/machine-learning-samplesCFN Cluster for Deep Learning AMIshttps://github.com/awslabs/deeplearning-cfnAmazon Simple Beer Servicehttps://github.com/awslabs/simplebeerservice
top related