data lake bestpractices - amazon web...
TRANSCRIPT
![Page 1: Data Lake BestPractices - Amazon Web Servicesaws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_B… · What is a data lake? • It is an architecture that allows you to collect](https://reader031.vdocument.in/reader031/viewer/2022030500/5aacf3c97f8b9a2e088d9eb6/html5/thumbnails/1.jpg)
Data Lake Best Practices
Agenda
• Why Data Lake
• Key Components of a Data Lake
• Modern Data Architecture
• Some Best Practices
• Case Study
• Summary
• Takeaways
What is a Data Lake?
What, why, etc.
What is a data lake?
• It is an architecture that allows you to collect, store, process, analyze and consume all data that flows into your organization.
Why data lake?
• Leverage all data that flows into your organization
• Customer centricity
• Business agility
• Better predictions via Machine Learning
• Competitive advantage
Comparison of a Data Lake to an Enterprise Data Warehouse

| Data lake (EMR, S3) | Enterprise DW |
|---|---|
| Complementary to EDW (not replacement) | Data lake can be source for EDW |
| Schema on read (no predefined schemas) | Schema on write (predefined schemas) |
| Structured/semi-structured/unstructured data | Structured data only |
| Fast ingestion of new data/content | Time consuming to introduce new content |
| Data Science + Prediction/Advanced Analytics + BI use cases | BI use cases only (no prediction/advanced analytics) |
| Data at low level of detail/granularity | Data at summary/aggregated level of detail |
| Loosely defined SLAs | Tight SLAs (production schedules) |
| Flexibility in tools (open source/tools for advanced analytics) | Limited flexibility in tools (SQL only) |
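The schema-on-read row in the comparison can be illustrated with a small sketch. Raw JSON lines are kept exactly as ingested, and a schema is applied only when a consumer reads them. The field names, the `ORDER_SCHEMA` mapping and the `read_with_schema` helper are hypothetical, chosen only for illustration.

```python
import json

# Raw events stored as-is in the lake; records need not share a shape.
RAW_LINES = [
    '{"user": "a1", "amount": "19.90", "ts": "2017-05-01T10:00:00"}',
    '{"user": "b2", "amount": 5, "channel": "web"}',
]

# Hypothetical schema chosen by one consumer at read time: field -> caster.
ORDER_SCHEMA = {"user": str, "amount": float}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and project/cast them to `schema` on read."""
    rows = []
    for line in lines:
        record = json.loads(line)
        # Fields absent from a record become None instead of blocking ingestion.
        rows.append({
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        })
    return rows


rows = read_with_schema(RAW_LINES, ORDER_SCHEMA)
print(rows)
```

A second consumer could read the same raw lines with a different schema (for example one that keeps `channel`), which is the flexibility the comparison attributes to the data lake.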
Key Concepts Associated with a Data Lake
[Figure: one STORAGE block and several COMPUTE blocks]
Components of a Data Lake
Components: Storage & Streams, Catalogue & Search, Entitlements, API & UI
Data Storage
• High durability
• Stores raw data from input sources
• Support for any type of data
• Low cost
Streaming
• Streaming ingest of feed data
• Provides the ability to consume any dataset as a stream
• Facilitates low latency analytics
Catalogue
• Metadata lake
• Used for summary statistics and data classification management
Search
• Simplified access model for data discovery
Entitlements system
• Encryption
• Authentication
• Authorisation
• Chargeback
• Quotas
• Data masking
• Regional restrictions
API & User Interface
• Exposes the data lake to customers
• Programmatically query catalogue
• Expose search API
• Ensures that entitlements are respected
The Modern Data Architecture
Why Is Amazon S3 the Fabric of Data Lake?
• Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
• Decouple storage and compute
  • No need to run compute clusters for storage (unlike HDFS)
  • Can run transient Hadoop clusters & Amazon EC2 Spot Instances
  • Multiple & heterogeneous analysis clusters can use the same data
• Virtually unlimited number of objects and volume of data
• Very high bandwidth – no aggregate throughput limit
• Designed for 99.99% availability – can tolerate zone failure
• Designed for 99.999999999% durability
• No need to pay for data replication
• Native support for versioning
• Tiered-storage (Standard, IA, Amazon Glacier) via life-cycle policies
• Use HDFS for very frequently accessed (hot) data
• Secure – SSL, client/server-side encryption at rest
• Low cost
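As a rough sketch of the tiered-storage bullet, the snippet below builds a lifecycle configuration that moves objects under a prefix to Standard-IA and later to Glacier. The prefix, day counts, bucket name and the `build_lifecycle_configuration` helper are assumptions made for illustration. The dictionary follows the shape accepted by the S3 `PutBucketLifecycleConfiguration` operation.

```python
import json


def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days):
    """Return a lifecycle configuration dict for tiering objects under `prefix`."""
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Hypothetical values: raw data cools after 30 days and is archived after a year.
config = build_lifecycle_configuration("raw/", 30, 365)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dict could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-data-lake", LifecycleConfiguration=config)
```

Keeping the rule builder separate from the API call makes the policy easy to review and test before it touches a real bucket.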
Indexing and Searching using Metadata
[Diagram: ObjectCreated/ObjectDeleted events → AWS Lambda → PutItem → Metadata Index (DynamoDB) → Update Stream → AWS Lambda (Extract Search Fields) → Update Index → Search Index (Amazon Elasticsearch Service or Amazon CloudSearch)]
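The first hop of this indexing flow could look roughly like the following Lambda-style handler. It turns an S3 event notification into put/delete actions for a metadata index. The item attributes and the `write` callback (a stand-in for a DynamoDB `PutItem`/`DeleteItem` call) are assumptions, not part of the slides. Note that S3 names its delete events `ObjectRemoved`.

```python
from urllib.parse import unquote_plus


def to_index_actions(event):
    """Translate an S3 event notification into metadata-index actions."""
    actions = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if record["eventName"].startswith("ObjectCreated"):
            actions.append({
                "op": "put",
                "item": {
                    "bucket": bucket,
                    "key": key,
                    "size": record["s3"]["object"].get("size", 0),
                    "event_time": record.get("eventTime"),
                },
            })
        elif record["eventName"].startswith("ObjectRemoved"):
            actions.append({"op": "delete", "item": {"bucket": bucket, "key": key}})
    return actions


def handler(event, context, write=print):
    """Lambda-style entry point; `write` stands in for the index write."""
    actions = to_index_actions(event)
    for action in actions:
        write(action)
    return len(actions)


# Hypothetical event with the standard S3 notification layout.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "eventTime": "2017-05-01T10:00:00.000Z",
            "s3": {
                "bucket": {"name": "example-data-lake"},
                "object": {"key": "raw/orders+2017.json", "size": 1024},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-data-lake"},
                "object": {"key": "raw/old.json"},
            },
        },
    ]
}
sample_actions = to_index_actions(sample_event)
```

In the diagram, a second function would consume the index's update stream and push the extracted search fields to the search service. That step is omitted here.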
Identity & Access Management
• Manage users, groups, and roles
• Identity federation with OpenID
• Temporary credentials with Amazon Security Token Service (Amazon STS)
• Stored policy templates
• Powerful policy language
• Amazon S3 bucket policies
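To make the policy-language and bucket-policy bullets concrete, here is a minimal sketch that generates a bucket policy from a template function. The bucket name, prefix, role ARN and the `read_only_prefix_policy` helper are hypothetical. The policy grants one role read access to a single prefix and denies any request that does not use TLS.

```python
import json


def read_only_prefix_policy(bucket, prefix, role_arn):
    """Build a bucket policy letting one role read objects under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOnlyPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                # Reject any request that is not made over SSL/TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


# Hypothetical names, for illustration only.
policy = read_only_prefix_policy(
    "example-data-lake",
    "curated/",
    "arn:aws:iam::123456789012:role/analyst",
)
print(json.dumps(policy, indent=2))
```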
Data Encryption
AWS CloudHSM
• Dedicated Tenancy SafeNet Luna SA HSM Device
• Common Criteria EAL4+, NIST FIPS 140-2
AWS Key Management Service
• Automated key rotation & auditing
• Integration with other AWS services
AWS server side encryption
• AWS managed key infrastructure
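A small sketch of how server-side encryption might be requested when writing an object. The helper only builds the keyword arguments for an S3 `PutObject` call, so it runs without AWS access. The bucket, key and KMS key alias are hypothetical, and the `encrypted_put_kwargs` helper is an illustration rather than an AWS API.

```python
def encrypted_put_kwargs(bucket, key, body, kms_key_id=None):
    """Build keyword arguments for an S3 PutObject call with SSE enabled."""
    kwargs = {"Bucket": bucket, "Key": key, "Body": body}
    if kms_key_id:
        # SSE-KMS: the object is encrypted under a key managed in AWS KMS.
        kwargs["ServerSideEncryption"] = "aws:kms"
        kwargs["SSEKMSKeyId"] = kms_key_id
    else:
        # SSE-S3: Amazon S3 manages the keys.
        kwargs["ServerSideEncryption"] = "AES256"
    return kwargs


sse_s3 = encrypted_put_kwargs("example-data-lake", "raw/a.json", b"{}")
sse_kms = encrypted_put_kwargs(
    "example-data-lake", "raw/a.json", b"{}", kms_key_id="alias/example-lake-key"
)

# With boto3 installed and credentials configured:
#   boto3.client("s3").put_object(**sse_kms)
```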
Data Lake API & UI
Exposes the Metadata API, search, and Amazon S3 storage services to customers
Can be based on TVM/STS Temporary Access for many services, and a bespoke API for Metadata
Drive all UI operations from API?
Introducing Amazon API Gateway
Host multiple versions and stages of APIs
Create and distribute API keys to developers
Leverage AWS SigV4 to authorize access to APIs
Throttle and monitor requests to protect the backend
Leverages AWS Lambda
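Request throttling of the kind mentioned above is commonly implemented with a token bucket. The sketch below is a generic illustration of that idea, not API Gateway's own code. The `TokenBucket` class, its rate and burst values, and the explicit `now` argument (used so the example is deterministic) are assumptions.

```python
class TokenBucket:
    """Allow bursts up to `burst` requests, refilled at `rate` tokens/second."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        # Refill according to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limits: 2 requests/second steady state, bursts of 3.
bucket = TokenBucket(rate=2, burst=3)
burst_results = [bucket.allow(0.0) for _ in range(4)]
later_results = [bucket.allow(0.5), bucket.allow(0.5)]
```

Requests that are refused would typically be answered with an HTTP 429 so that clients can back off and retry.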
https://aws.amazon.com/big-data/partner-solutions/
Data Integration Partners
Reduce the effort to move, cleanse, synchronize, manage, and automate data-related processes.
Putting it all together
Building a Data Lake on AWS
[Diagram: numbered steps 1–10; labelled components: Kinesis Firehose, Athena (Query Service), Glue (Batch)]
Processing Data for Analytics on your data lake
Processing & Analytics
[Diagram: services grouped by processing type]
• Real-time: AWS Lambda; Apache Storm on EMR; Apache Flink on EMR; Spark Streaming on EMR; Elasticsearch Service; Kinesis Analytics, Kinesis Streams
• Batch: EMR (Hadoop, Spark, Presto); Redshift (Data Warehouse); Athena (Query Service)
• AI & Predictive: Amazon Lex (Speech recognition); Amazon Rekognition; Amazon Polly (Text to speech); Machine Learning (Predictive analytics)
• BI & Data Visualization
• Transactional & RDBMS: DynamoDB (NoSQL DB); Aurora (Relational Database)
• Kinesis Streams & Firehose
Important considerations
Data Temperature

| | Hot data | Warm data | Cold data |
|---|---|---|---|
| Volume | MB–GB | GB–TB | PB–EB |
| Item size | B–KB | KB–MB | KB–TB |
| Latency | ms | ms, sec | min, hrs |
| Durability | Low–high | High | Very high |
| Request rate | Very high | High | Low |
| Cost/GB | $$-$ | $-¢¢ | ¢ |
Which Stream/Message Storage Should I Use?

| | Amazon DynamoDB Streams | Amazon Kinesis Streams | Amazon Kinesis Firehose | Apache Kafka | Amazon SQS (Standard) | Amazon SQS (FIFO) |
|---|---|---|---|---|---|---|
| AWS managed | Yes | Yes | Yes | No | Yes | Yes |
| Guaranteed ordering | Yes | Yes | No | Yes | No | Yes |
| Delivery (deduping) | Exactly-once | At-least-once | At-least-once | At-least-once | At-least-once | Exactly-once |
| Data retention period | 24 hours | 7 days | N/A | Configurable | 14 days | 14 days |
| Availability | 3 AZ | 3 AZ | 3 AZ | Configurable | 3 AZ | 3 AZ |
| Scale/throughput | No limit / ~table IOPS | No limit / ~shards | No limit / automatic | No limit / ~nodes | No limits / automatic | 300 TPS/queue |
| Parallel consumption | Yes | Yes | No | Yes | No | No |
| Stream MapReduce | Yes | Yes | N/A | Yes | N/A | N/A |
| Row/object size | 400 KB | 1 MB | Destination row/object size | Configurable | 256 KB | 256 KB |
| Cost | Higher (table cost) | Low | Low | Low (+admin) | Low-medium | Low-medium |
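One way to use this comparison is to encode a few of its rows and filter on the features a workload needs. The sketch below copies the yes/no values for four rows of the slide. The `StreamStore` type and the `candidates` helper are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamStore:
    name: str
    aws_managed: bool
    guaranteed_ordering: bool
    exactly_once: bool
    parallel_consumption: bool


# Values transcribed from the comparison table on the slide.
STORES = [
    StreamStore("Amazon DynamoDB Streams", True, True, True, True),
    StreamStore("Amazon Kinesis Streams", True, True, False, True),
    StreamStore("Amazon Kinesis Firehose", True, False, False, False),
    StreamStore("Apache Kafka", False, True, False, True),
    StreamStore("Amazon SQS (Standard)", True, False, False, False),
    StreamStore("Amazon SQS (FIFO)", True, True, True, False),
]


def candidates(*required_features):
    """Return names of stores for which every required feature is True."""
    return [
        store.name
        for store in STORES
        if all(getattr(store, feature) for feature in required_features)
    ]


ordered_parallel = candidates("aws_managed", "guaranteed_ordering", "parallel_consumption")
exactly_once = candidates("exactly_once")
print(ordered_parallel)
print(exactly_once)
```

The remaining rows (retention, item size, cost) are not boolean and would need thresholds of their own, so they are left out of this sketch.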
Analytics Types & Frameworks (PROCESS / ANALYZE)
Batch
• Takes minutes to hours
• Example: Daily/weekly/monthly reports
• Amazon EMR (MapReduce, Hive, Pig, Spark)
Interactive
• Takes seconds
• Example: Self-service dashboards
• Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark)
• Subsecond: ElastiCache (Redis 3.2TiB, MemCache), SAP HANA
Message
• Takes milliseconds to seconds
• Example: Message processing
• Amazon SQS applications on Amazon EC2
Stream
• Takes milliseconds to seconds
• Example: Fraud alerts, 1 minute metrics
• Amazon EMR (Spark Streaming), Amazon Kinesis Analytics, KCL, Storm, AWS Lambda
Artificial Intelligence
• Takes milliseconds to minutes
• Example: Fraud detection, forecast demand, text to speech
• Amazon AI (Lex, Polly, ML, Rekognition), Amazon EMR (Spark ML), Deep Learning AMI (MXNet, TensorFlow, Theano, Torch, CNTK and Caffe)
[Diagram: PROCESS/ANALYZE stage, ordered from Fast to Slow: Message (Amazon SQS apps on Amazon EC2), Streaming (Amazon Kinesis Analytics, KCL apps, AWS Lambda, Amazon EC2, Amazon EMR), Interactive (Amazon Redshift, Presto on EMR, Amazon Athena), Batch (Amazon EMR), AI (Amazon AI)]
Which Analysis Tool Should I Use?

| | Amazon Redshift | Amazon Athena | Amazon EMR (Presto, Spark, Hive) |
|---|---|---|---|
| Use case | Optimized for data warehousing | Ad-hoc interactive queries | Presto: interactive query; Spark: general purpose (iterative ML, RT, ..); Hive: batch |
| Scale/throughput | ~Nodes | Automatic / no limits | ~Nodes |
| AWS managed service | Yes | Yes, serverless | Yes |
| Storage | Local storage | Amazon S3 | Amazon S3, HDFS |
| Optimization | Columnar storage, data compression, and zone maps | CSV, TSV, JSON, Parquet, ORC, Apache Weblog | Framework dependent |
| Metadata | Amazon Redshift managed | Athena Catalog Manager | Hive Meta-store |
| BI tools support | Yes (JDBC/ODBC) | Yes (JDBC) | Yes (JDBC/ODBC & Custom) |
| Access controls | Users, groups, and access controls | AWS IAM | Integration with LDAP |
| UDF support | Yes (Scalar) | No | Yes |
Case Study
“For our market surveillance systems, we are looking at about 40% [savings with AWS], but the real benefits are the business benefits: We can do things that we physically weren’t able to do before, and that is priceless.”
- Steve Randich, CIO
Case Study: Re-architecting Compliance
What FINRA needed
• Infrastructure for its market surveillance platform
• Support of analysis and storage of approximately 75 billion market events every day
Why they chose AWS
• Fulfillment of FINRA’s security requirements
• Ability to create a flexible platform using dynamic clusters (Hadoop, Hive, and HBase), Amazon EMR, and Amazon S3
Benefits realized
• Increased agility, speed, and cost savings
• Estimated savings of $10-20m annually by using AWS
Fraud Detection
FINRA uses Amazon EMR and Amazon S3 to process up to 75 billion trading events per day and securely store over 5 petabytes of data, attaining savings of $10-20mm per year.
Summary
• AWS enables you to build sophisticated data lakes and related analytics applications
• Retrospective, Real-time, Predictive
• You can build incrementally, adding use cases and increasing scale as you go
• AWS provides a broad range of security and auditing features to enable you to meet your security requirements
https://aws.amazon.com/big-data/
Takeaways
• Prescriptive guidance and rapidly deployable solutions to help you store, analyze, and process big data on the AWS Cloud
• Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight
• Deploying a Data Lake on AWS - March 2017 AWS Online Tech Talks
• Harmonize, Search, and Analyze Loosely Coupled Datasets on AWS
• Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly Webinar Series - YouTube
http://bit.ly/2qiElYx
http://amzn.to/2mzGppL
http://bit.ly/2qipA8h
http://amzn.to/2qpiFaK
http://amzn.to/2lpbc8p
?