Big Data on AWS
Johann Romefort
Agenda
• What is Big Data?
• What is AWS?
• Presenting the tools: How Big Data and AWS fit together
What is Big Data?
• It sits at the intersection of data's three Vs:
• Velocity (batch / real time / streaming)
• Volume (terabytes / petabytes)
• Variety (structured / semi-structured / unstructured)
Why is everybody talking about it?
• The cost of generating data has gone down
• By 2015, 3 billion people will be online, pushing the volume of data created to 8 zettabytes
• More data = More insights = Better decisions
• Processing is getting easier and cheaper thanks to cloud platforms
Data flow and constraints
Generate
Ingest / Store
Process
Visualize / Share
The three Vs introduce heterogeneity, which makes each of these steps hard to achieve
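The four steps of the data flow can be pictured as a chain of functions, each consuming the previous step's output. This is a minimal local sketch: the stage functions, the sensor records, and the "max temperature per sensor" aggregation are all hypothetical, and on AWS each stage would be backed by one or more of the services presented next.

```python
from typing import Iterable


def generate() -> list[dict]:
    # Hypothetical stand-in for sensors, logs, or click streams.
    return [
        {"sensor": "s1", "temp": 21.5},
        {"sensor": "s2", "temp": 48.0},
        {"sensor": "s1", "temp": 22.1},
    ]


def ingest_store(records: Iterable[dict]) -> list[dict]:
    # Persist raw records unchanged so they can be reprocessed later.
    return list(records)


def process(stored: list[dict]) -> dict[str, float]:
    # Aggregate: maximum temperature seen per sensor.
    result: dict[str, float] = {}
    for r in stored:
        result[r["sensor"]] = max(result.get(r["sensor"], float("-inf")), r["temp"])
    return result


def visualize_share(summary: dict[str, float]) -> str:
    # Render a tiny text report, sorted by sensor name.
    return "\n".join(f"{k}: {v:.1f}" for k, v in sorted(summary.items()))


report = visualize_share(process(ingest_store(generate())))
print(report)
```

Keeping the raw records in the ingest/store step, rather than only the aggregate, is what allows a later, different analysis to be run over the same data.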
What is AWS?
• AWS is a cloud computing platform
• On-demand delivery of IT resources
• Pay-as-you-go pricing model
Cloud Computing
Compute + Storage + Networking
Adapts dynamically to ever-changing needs, closely matching the requirements of user infrastructure and applications
How does AWS help with Big Data?
• Removes constraints on the ingestion, storage, and processing layers and adapts closely to demand
• Provides a collection of integrated tools that address the three Vs of Big Data
• Virtually unlimited storage capacity and processing power fit changing data storage and analysis requirements
Computing Solutions for Big Data on AWS
• Kinesis
• EC2
• EMR
• Redshift
Computing Solutions for Big Data on AWS
EC2
All-purpose computing instances
Dynamic provisioning and resizing
Lets you scale your infrastructure at low cost
Use Case: Well suited for running custom or proprietary applications (e.g., SAP HANA, Tableau…)
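Dynamic provisioning boils down to a sizing decision that is re-evaluated as load changes. The sketch below is a back-of-the-envelope calculation, not the EC2 Auto Scaling algorithm: `per_instance_capacity` is a hypothetical throughput you would measure for your own instance type, and the min/max bounds are illustrative.

```python
import math


def instances_needed(events_per_sec: float, per_instance_capacity: float,
                     min_instances: int = 1, max_instances: int = 20) -> int:
    """Return how many instances cover the load, clamped to [min, max].

    per_instance_capacity is a hypothetical, measured throughput for one
    instance of the chosen type; it is not an AWS-provided figure.
    """
    if per_instance_capacity <= 0:
        raise ValueError("capacity must be positive")
    raw = math.ceil(events_per_sec / per_instance_capacity)
    return max(min_instances, min(max_instances, raw))


# Load rises and falls over a day; the fleet size follows it.
for load in (500, 4_200, 12_000, 90_000):
    print(load, "->", instances_needed(load, per_instance_capacity=1_000))
```

The upper bound caps spending when load spikes beyond what the budget allows; the lower bound keeps the service available when load drops to zero.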
Computing Solutions for Big Data on AWS
EMR
‘Hadoop in the cloud’
Adapts to the complexity of the analysis and the volume of data to process
Use Case: Offline processing of very large volumes of data, possibly unstructured (high Variety)
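EMR runs Hadoop, whose core programming model is MapReduce: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. The sketch below runs the three phases in a single local process on hypothetical log lines; on an EMR cluster the same map and reduce logic would be distributed across nodes (for example through Hadoop Streaming), which is not shown here.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    # Emit (word, 1) for every word, as a Hadoop mapper would.
    return [(w.lower(), 1) for w in line.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    # Group values by key; Hadoop performs this between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Combine each group's values into a single count.
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical unstructured input: free-text log lines.
lines = ["error disk full", "ok", "error fan failure", "ok"]
counts = reduce_phase(shuffle(chain.from_iterable(map(map_phase, lines))))
print(counts)
```

Because each mapper call looks at one line and each reducer call looks at one key, both phases can be split across as many machines as the data volume requires.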
Computing Solutions for Big Data on AWS
Kinesis
Stream Processing
Real-time data
Scales to match the flow of inbound data
Use Case: Complex event processing, click streams, sensor data, computations over time windows
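Kinesis scales by splitting a stream into shards. Each record carries a partition key; Kinesis hashes that key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. The sketch below reproduces that mapping locally under one assumption: the shards split the hash space into equal ranges (as they typically do on a freshly created stream, before any resharding). The `sensor-00N` keys are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash key range contains the key's MD5 hash.

    Assumes shard_count shards splitting [0, 2**128) into equal ranges,
    with the last shard absorbing any remainder.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    return min(hashed // range_size, shard_count - 1)


for key in ("sensor-001", "sensor-002", "sensor-003"):
    print(key, "-> shard", shard_for(key, shard_count=4))
```

Since the mapping is deterministic, all readings from one sensor land on the same shard, so they stay in order relative to each other while different sensors spread across shards.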
Computing Solutions for Big Data on AWS
Redshift
Data warehouse in the cloud
Scales to petabytes
Supports SQL querying
Start small for just $0.25/hour
Use Case: BI analysis; using legacy ODBC/JDBC software to analyze or visualize data
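To put the hourly price in perspective, here is the node-hour arithmetic for a cluster left running all month. It uses the $0.25/hour figure quoted on the slide; actual prices depend on node type and region and have changed since, so treat the numbers as illustrative.

```python
HOURS_PER_MONTH = 730  # average month: 24 * 365 / 12


def monthly_cost(hourly_rate: float, nodes: int = 1,
                 hours: float = HOURS_PER_MONTH) -> float:
    """On-demand node-hour cost only; other charges are not modeled."""
    return round(hourly_rate * nodes * hours, 2)


print(monthly_cost(0.25))           # 182.5  (one node, always on)
print(monthly_cost(0.25, nodes=4))  # 730.0
```

A single always-on node therefore comes to roughly $180 per month at the quoted rate, and the cost grows linearly as nodes are added.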
Storage Solutions for Big Data on AWS
• DynamoDB
• Redshift
• S3
• Glacier
Storage Solutions for Big Data on AWS
DynamoDB
NoSQL database
Consistent, low-latency access
Flexible, schemaless data model (key-value and document)
Use Case: Low-latency reads and writes at any scale, e.g., user profiles, session state, or device metadata
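DynamoDB items are addressed by a partition key and, optionally, a sort key; a Query names exactly one partition key and can restrict the sort key to a range. The toy class below mimics only that access pattern in memory. It is not the DynamoDB API, and the sensor IDs and timestamp sort keys are hypothetical.

```python
from collections import defaultdict
from typing import Optional


class ToyTable:
    """In-memory stand-in for a table keyed by (partition key, sort key).

    Mimics the access pattern only: a query touches one partition and can
    filter the sort key to a range. Not the DynamoDB API.
    """

    def __init__(self) -> None:
        self._partitions = defaultdict(dict)  # pk -> {sk: item}

    def put_item(self, pk: str, sk: str, item: dict) -> None:
        # Writing the same (pk, sk) again overwrites the previous item.
        self._partitions[pk][sk] = item

    def query(self, pk: str, sk_from: Optional[str] = None,
              sk_to: Optional[str] = None) -> list[dict]:
        # Only one partition is read; items come back ordered by sort key.
        items = self._partitions.get(pk, {})
        return [items[sk] for sk in sorted(items)
                if (sk_from is None or sk >= sk_from)
                and (sk_to is None or sk <= sk_to)]


table = ToyTable()
table.put_item("sensor-001", "2015-01-05T10:00", {"temp": 21.5})
table.put_item("sensor-001", "2015-01-05T10:05", {"temp": 22.0})
table.put_item("sensor-002", "2015-01-05T10:00", {"temp": 48.0})
print(table.query("sensor-001", sk_from="2015-01-05T10:01"))
```

Choosing the device ID as partition key and an ISO timestamp as sort key makes "latest readings for one device" a cheap single-partition range read, which is the kind of lookup where consistent low latency matters.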
Storage Solutions for Big Data on AWS
S3
Use Case: Backups and disaster recovery, media storage, storage for data analysis
Versatile storage system
Low cost
Fast data retrieval
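When S3 serves as storage for data analysis, the way object keys are laid out matters. One common convention is Hive-style partition prefixes (`year=…/month=…/day=…`), which Hive on EMR can use to read only the partitions a query needs. The sketch builds such a key; the `raw/` prefix, the `source=` level, and the file naming are hypothetical choices.

```python
from datetime import datetime, timezone


def raw_object_key(source: str, ts: datetime, batch_id: int) -> str:
    """Build an S3 object key using Hive-style date partitions.

    The 'raw/' prefix, source name, and file naming are hypothetical.
    """
    ts = ts.astimezone(timezone.utc)
    return (f"raw/source={source}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
            f"part-{batch_id:05d}.json.gz")


key = raw_object_key("factory-a",
                     datetime(2015, 1, 5, 10, 30, tzinfo=timezone.utc), 7)
print(key)
```

A job that needs a single day of data can then list and read one prefix instead of scanning the whole bucket.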
Storage Solutions for Big Data on AWS
Glacier
Use Case: Storing raw data logs, storing media archives, replacing magnetic tape
Archive storage for cold data
Extremely low cost
Optimized for infrequently accessed data
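Cold data usually does not need to be copied to Glacier by hand: an S3 lifecycle rule can transition objects to the `GLACIER` storage class after a set number of days. The function below builds such a configuration as a plain dictionary in the shape S3 lifecycle configurations use (`Rules`, `Filter`, `Status`, `Transitions`). The `raw/` prefix and the 30-day delay are hypothetical, and the sketch stops short of calling AWS; applying it would go through something like boto3's `put_bucket_lifecycle_configuration`.

```python
import json


def glacier_lifecycle(prefix: str, days_until_glacier: int) -> dict:
    """S3 lifecycle configuration moving objects under `prefix` to Glacier.

    Prefix and delay are illustrative values, not recommendations.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}-to-glacier",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


config = glacier_lifecycle("raw/", days_until_glacier=30)
print(json.dumps(config, indent=2))
```

Raw logs stay in S3 for the first month, when they are most likely to be reprocessed, and are then archived automatically.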
What makes AWS different when it comes to big data?
Given the three Vs, a collection of tools is usually needed for data processing and storage.
Integrated Environment for Big Data
AWS Big Data solutions come already integrated with each other
AWS Big Data solutions also integrate with the whole AWS ecosystem (security, identity management, logging, backups, Management Console…)
Example of products interacting with each other.
A tightly integrated, rich environment of tools
+
On-demand scaling that closely follows processing requirements
=
An extremely cost-effective, easy-to-deploy solution for big data needs
Use Case: Real-time IoT Analytics
Gathering data in real time from sensors deployed in a factory and sending it for immediate processing
• Error detection: real-time detection of hardware problems
• Optimization and energy management
First Version of the infrastructure
• On customer site: aggregate sensor data
• Node.js stream processor: evaluate rules over a time window
• In-house Hadoop cluster and MongoDB
• Data flows: feed the algorithm, write raw data for further processing, backup
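"Evaluate rules over a time window" means each new reading is judged together with the readings that arrived shortly before it. The first version did this in a Node.js stream processor; the Python sketch below only illustrates the idea with a sliding window. The rule itself (alert when the mean over the last 60 seconds exceeds a threshold) and the readings are hypothetical.

```python
from collections import deque


class WindowRule:
    """Sliding time-window rule: fires when the mean of the readings from
    the last `window_s` seconds exceeds `threshold`.

    The rule, window length, and threshold are hypothetical examples.
    """

    def __init__(self, window_s: float, threshold: float) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self._events = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, ts: float, value: float) -> bool:
        """Add a reading (timestamps assumed non-decreasing).

        Returns True if the rule fires for the current window.
        """
        self._events.append((ts, value))
        self._total += value
        # Evict readings that have fallen out of the window (ts - window_s, ts].
        while self._events and self._events[0][0] <= ts - self.window_s:
            _, old = self._events.popleft()
            self._total -= old
        return self._total / len(self._events) > self.threshold


rule = WindowRule(window_s=60, threshold=40.0)
readings = [(0, 30.0), (20, 35.0), (40, 50.0), (70, 55.0)]
fired = [rule.add(ts, v) for ts, v in readings]
print(fired)
```

Keeping a running total means each reading costs constant work plus the evictions it triggers, instead of re-summing the whole window every time.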
Second Version of the infrastructure
• On customer site: aggregate sensor data
• Kinesis: evaluate rules over a time window
• Redshift for BI analysis
• Glacier: write raw data for archiving
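In the second version, the same stream feeds two destinations: Redshift for BI analysis and Glacier for archiving. Both favor receiving data in batches rather than record by record (Redshift, for instance, loads bulk data most efficiently with `COPY`). The slide does not say how the batching is done, so the buffer below is only one plausible pattern: it flushes when either a record count or an age limit is reached. Both limits, the record shape, and the list-based "destinations" are hypothetical.

```python
from typing import Callable, Optional


class BatchBuffer:
    """Collects records and hands them to `flush_fn` in batches.

    Flushes when `max_records` is reached or when the oldest buffered record
    is at least `max_age_s` older than the newest. Limits are hypothetical.
    """

    def __init__(self, flush_fn: Callable[[list], None],
                 max_records: int, max_age_s: float) -> None:
        self.flush_fn = flush_fn
        self.max_records = max_records
        self.max_age_s = max_age_s
        self._records: list = []
        self._first_ts: Optional[float] = None

    def add(self, ts: float, record: dict) -> None:
        if not self._records:
            self._first_ts = ts
        self._records.append(record)
        if (len(self._records) >= self.max_records
                or ts - self._first_ts >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        if self._records:
            self.flush_fn(self._records)
            self._records = []  # rebind so the flushed batch is left untouched
            self._first_ts = None


# Hypothetical destinations: lists standing in for a Redshift load and an archive.
warehouse_batches: list = []
archive_batches: list = []
to_redshift = BatchBuffer(warehouse_batches.append, max_records=3, max_age_s=300)
to_archive = BatchBuffer(archive_batches.append, max_records=5, max_age_s=3600)

for ts in range(7):
    record = {"sensor": "s1", "ts": ts}
    to_redshift.add(ts, record)  # fan out: every record goes to both sinks
    to_archive.add(ts, record)

print(len(warehouse_batches), len(archive_batches))
```

Giving each destination its own limits lets the warehouse receive small, frequent batches for fresh BI queries while the archive accumulates larger, cheaper ones.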