Practical Tips on Handling Big Data
Dr. Brian J. Spiering
Hi, I'm Brian.
Data Science Faculty @GalvanizeU
@BrianSpiering
Roadmap
1. Defining “Big Data” (aka, you probably don’t have Big Data)
2. How to avoid Big Data (and associated problems)
3. Okay, I really have Big Data. What should I do?
1. Defining “Big Data” (aka, you probably don’t have Big Data)
What is Big Data?
“Data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable amount of time.”
In short: data that doesn’t fit on a single machine.
What is not Big Data?
2. How to avoid Big Data (and associated problems)
Handling Medium Data
Cache → RAM → Disk → Data Center (fastest and smallest to slowest and largest)
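One way to keep medium data on the fast side of this hierarchy is to stream it from disk instead of loading it whole into RAM. A minimal sketch, assuming the data is a CSV file with a numeric column; the function name, the `price` column, and `big.csv` are hypothetical, not from the talk.

```python
import csv
import io
from typing import TextIO


def streaming_column_mean(f: TextIO, column: str) -> float:
    """Mean of one numeric CSV column, read one row at a time.

    Only the current row is held as a Python object, so memory use stays
    roughly flat however large the file on disk is.
    """
    total = 0.0
    count = 0
    for row in csv.DictReader(f):
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


if __name__ == "__main__":
    # In-memory stand-in for a large file; in practice you would pass
    # open("big.csv", newline="") (a hypothetical file name).
    sample = io.StringIO("id,price\n1,2.0\n2,4.0\n3,6.0\n")
    print(streaming_column_mean(sample, "price"))
```

The same pattern (read a piece, update a running result, discard the piece) works for sums, counts, and min/max, and lets a laptop work through files far larger than its RAM.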
Big Data Gotcha!
Scaling Out
1. Single Local Machine: < 10s of GB*
2. Single Cloud Machine: < 2 TB*
3. Cloud Cluster of Machines: > 2 TB*
* As of Summer 2016
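Picking a tier starts with a back-of-the-envelope size estimate. A minimal sketch, assuming a dense numeric table of 8-byte floats (real footprints add index, string, and library overhead on top) and treating 64 GB as a hypothetical stand-in for the slide's "10s of GB"; the 2 TB cut-off is the slide's Summer 2016 figure.

```python
# Hypothetical cut-offs: 64 GB stands in for "10s of GB";
# 2,000 GB is the slide's 2 TB (Summer 2016).
LOCAL_LIMIT_GB = 64
CLOUD_LIMIT_GB = 2_000


def estimated_size_gb(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Rough in-memory size of a dense numeric table (8 bytes = float64)."""
    return n_rows * n_cols * bytes_per_value / 1e9


def scaling_tier(size_gb: float) -> str:
    """Map an estimated size onto the three tiers from the slide."""
    if size_gb < LOCAL_LIMIT_GB:
        return "single local machine"
    if size_gb < CLOUD_LIMIT_GB:
        return "single cloud machine"
    return "cloud cluster"


if __name__ == "__main__":
    size = estimated_size_gb(1_000_000_000, 10)  # 1 billion rows x 10 columns
    print(size, scaling_tier(size))
```

A billion rows of ten float64 columns comes to about 80 GB, which is past a typical laptop but well within a single large cloud machine.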
Matrix Multiplication
Matrix Multiplication: Imperative Implementation
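The slide's code isn't in the transcript. A plain-Python sketch of an imperative version (nested loops accumulating into a preallocated result) might look like this; the names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul_imperative(a: Matrix, b: Matrix) -> Matrix:
    """Multiply a (n x k) by b (k x m) with explicit nested loops."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    result = [[0.0] * m for _ in range(n)]  # preallocated output, mutated below
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i][p] * b[p][j]
            result[i][j] = total
    return result
```

Every step updates shared state (`result`, `total`), which is fine on one machine but awkward to split across many.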
Matrix Multiplication: Functional Implementation
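A functional counterpart, again a sketch rather than the slide's own code: nothing is mutated, and each output cell is an independent dot product of one row of `a` and one column of `b`.

```python
from typing import List

Matrix = List[List[float]]


def matmul_functional(a: Matrix, b: Matrix) -> Matrix:
    """Multiply a by b without mutation: each cell is a pure dot product."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not match")
    columns = list(zip(*b))  # transpose b so each column is a tuple
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]
```

Because no cell depends on another, the cells can be computed in any order, or on different machines, which is the property behind "Big Data is functional".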
Matrix Multiplication
Head, Torso, Tail: Separate models (and hardware)
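The slide doesn't say how the split is made. One common reading is by popularity: a few very frequent items (head), a middle band (torso), and a long tail of rare items. A sketch, assuming the cut-offs are cumulative shares of all occurrences; the 50% and 90% defaults and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def head_torso_tail(items: Iterable[str], head_share: float = 0.5,
                    torso_share: float = 0.9) -> Dict[str, str]:
    """Label each distinct item 'head', 'torso' or 'tail' by cumulative frequency.

    Items are visited from most to least frequent. An item is labeled by the
    share of all occurrences already covered *before* it, so the item that
    crosses a boundary stays in the more frequent segment.
    """
    counts = Counter(items)
    total = sum(counts.values())
    labels: Dict[str, str] = {}
    running = 0
    for item, count in counts.most_common():
        share_before = running / total
        if share_before < head_share:
            labels[item] = "head"
        elif share_before < torso_share:
            labels[item] = "torso"
        else:
            labels[item] = "tail"
        running += count
    return labels
```

Each segment can then get its own model and hardware, for example keeping the small, hot head entirely in memory while the large, sparse tail is handled separately.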
3. Okay, I really have Big Data. What should I do?
“But my data is more than 5TB! (and I need it in memory)”
Your life sucks now… You are stuck with distributed computing.
map reduce
Big Data is functional
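MapReduce frameworks (Hadoop MapReduce, Spark) run user-written map and reduce functions across a cluster. The single-process word count below only imitates the three phases (map, shuffle, reduce) to show why pure functions matter; it is a sketch, not any framework's API.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values that share one key."""
    return key, reduce(lambda acc, v: acc + v, values, 0)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map(map_phase, documents))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
```

Because `map_phase` and `reduce_phase` read only their inputs and change nothing else, a framework can run many copies of them on different machines and merge the results.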
What to do:
1. Learn some math tricks (linear algebra)
2. Learn how to optimize your code
3. Learn how to use cloud compute
4. Learn a Big Data framework
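The slide doesn't list which linear algebra tricks it means. One standard example: matrix multiplication is associative, so the order in which you evaluate a chain changes the work required but not the answer. A sketch that counts scalar multiplications for the two orderings of A @ B @ C, with A: n × k, B: k × m, C: m × p (function names are illustrative):

```python
from typing import Dict


def matmul_cost(rows: int, inner: int, cols: int) -> int:
    """Scalar multiplications for a (rows x inner) @ (inner x cols) product."""
    return rows * inner * cols


def chain_costs(n: int, k: int, m: int, p: int) -> Dict[str, int]:
    """Compare the two evaluation orders of A @ B @ C."""
    left_first = matmul_cost(n, k, m) + matmul_cost(n, m, p)   # (A @ B) @ C
    right_first = matmul_cost(k, m, p) + matmul_cost(n, k, p)  # A @ (B @ C)
    return {"(AB)C": left_first, "A(BC)": right_first}


if __name__ == "__main__":
    # Two 10,000 x 10,000 matrices times a 10,000 x 1 vector.
    print(chain_costs(10_000, 10_000, 10_000, 1))
```

For two 10,000 × 10,000 matrices and a 10,000 × 1 vector, multiplying the vector first needs 2 × 10^8 multiplications instead of roughly 10^12.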
Where have we been?
1. Defining “Big Data” (aka, you probably don’t have Big Data)
2. How to avoid Big Data (and associated problems)
3. Okay, I really have Big Data. What should I do?
Thank You!
Questions?