MOVING THE ELEPHANT IN THE ROOM
Data Migration at Scale

WHO AM I?
· BDPA Los Angeles Chapter
  · 4-year HSCC participant
· Columbia University, CC ’14
· Conductor, Inc.
· linkedin.com/in/calltyrone

CONDUCTOR, INC.
· Web Presence Management
· SaaS
· Big data
· Collects 6 TB of raw web data a week
· Scalable collection & ETL pipelines
· Final product: reports
· 6 years running
· Tons of data!

WHY WE CARE ABOUT SCALABILITY
· Growth
  · More users
  · More data
· Systems have to keep up!

HORIZONTAL SCALING
· Add more machines and spread the load across them

VERTICAL SCALING
· Replace the machine with a bigger, more powerful one

SCALABILITY IN THE REAL WORLD
· Yesterday’s solution is tomorrow’s problem
· Under-prioritized
· It’s hard!
  · Can require massive changes
  · No cure-all

WHY REPLACE AN UNSCALABLE SYSTEM?
· Save money
· Improve performance
· Clear the way for progress

WHY NOT?
· If it ain’t broke…
· Significant resource investment
  · Time
  · Money
· Software downtime
· Data quality concerns

YOUR TASK, AT A GLANCE
1. Identify an unscalable system
2. Discover and vet a suitable successor
3. Replace the legacy system with the new system
   · while minimizing risk and cost
Simple, no???

TALKING ABOUT THE ELEPHANT
Identifying an Unscalable System

CASE STUDY: LEGACY REPORTING DATABASE
Overview
· MySQL
· Normalized data model
  · Helpful for initial modeling of our problem space
· Hosted by a single, very powerful machine
Talking about the Elephant: Diagnosing an Unscalable System

CASE STUDY: LEGACY REPORTING DATABASE
Unsustainable
· Powerful hardware isn’t cheap.
· Scales only vertically
· Obsolete schema
· Difficult to back up
· Queries aren’t getting any faster.

SEE FOR YOURSELF
· If your solution…
  · Scales vertically
  · Prevents progress
  · Can’t perform at scale
  · Is difficult/slow/expensive to upgrade
…It’s time for a change!

FINDING A BIGGER ROOM
Vetting Scalable Alternatives

WHAT TO LOOK FOR
· Price-efficient
· Easy to maintain
· Scales horizontally
Finding a Bigger Room: Vetting Scalable Alternatives

CASE STUDY: AWS S3 DATASTORE
Our Use Case
· Write once, read many
· De-normalized reports
· High storage capacity
· High availability

CASE STUDY: AWS S3 DATASTORE
Technical Overview
· Write once, read many
  · Decent write performance, great read performance
· De-normalized reports
  · Flat files
· High storage capacity
  · No defined space limit
· High availability
  · Configurable file replication
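
The “de-normalized reports / flat files” pairing can be sketched as follows: rows from a normalized model are joined once, at write time, into self-contained records, serialized as JSON Lines, and grouped under a predictable object key so that reads never need a join. This is a minimal sketch under assumptions: the tables, fields, sample values, and the `reports/<account>/<period>.jsonl` key layout are hypothetical rather than Conductor's actual schema, and the code only builds file contents and keys; it does not upload anything to S3.

```python
import json
from collections import defaultdict

# Hypothetical normalized data: a lookup table plus a fact table that references it.
KEYWORDS = {1: "data migration", 2: "horizontal scaling"}
RANKINGS = [
    # (account_id, period, keyword_id, rank)
    (42, "2014-06", 1, 3),
    (42, "2014-06", 2, 7),
    (7, "2014-06", 1, 12),
]


def denormalize(keywords, rankings):
    """Join rankings to keywords and group rows into one flat file per account and period.

    Returns a dict mapping a hypothetical object key to JSON Lines text.
    """
    files = defaultdict(list)
    for account_id, period, keyword_id, rank in rankings:
        record = {
            "account_id": account_id,
            "period": period,
            "keyword": keywords[keyword_id],  # the join happens once, at write time
            "rank": rank,
        }
        key = f"reports/{account_id}/{period}.jsonl"
        files[key].append(json.dumps(record, sort_keys=True))
    return {key: "\n".join(lines) + "\n" for key, lines in files.items()}


if __name__ == "__main__":
    for key, body in sorted(denormalize(KEYWORDS, RANKINGS).items()):
        print(key)
        print(body, end="")
```

Each resulting file is written once and then read as-is, which suits a write-once, read-many workload; the trade-off is that changing a keyword's text means rewriting every file that embeds it.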

CASE STUDY: AWS S3 DATASTORE
Benefits
· Cheap
· Cloud-based
· Architecture facilitates testing
· Easy to back up

CASE STUDY: AWS S3 DATASTORE
Caveats
· “Eventual Consistency”
· Switching to non-relational storage is nontrivial
  · Application code must change
  · Migration path gets complicated
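
One common way application code copes with eventual consistency is to retry a read with exponential backoff until a freshly written object becomes visible. This is a sketch under assumptions: `read_with_retry` takes any `read` callable that raises `KeyError` while a new object is not yet visible, and the in-memory `LaggyStore` is a hypothetical stand-in that merely simulates that lag. The caveat reflects S3 as it behaved at the time of this talk; since December 2020, Amazon S3 has provided strong read-after-write consistency, so the pattern now matters mainly for other eventually consistent stores.

```python
import time


class LaggyStore:
    """Hypothetical in-memory store whose writes become visible only after `lag` reads."""

    def __init__(self, lag):
        self.lag = lag
        self._data = {}
        self._pending = {}  # key -> (value, reads remaining before the write is visible)

    def put(self, key, value):
        self._pending[key] = (value, self.lag)

    def get(self, key):
        if key in self._pending:
            value, remaining = self._pending[key]
            if remaining <= 0:
                self._data[key] = value
                del self._pending[key]
            else:
                self._pending[key] = (value, remaining - 1)
        # Raises KeyError for a new key whose write is not yet visible.
        return self._data[key]


def read_with_retry(read, key, attempts=5, base_delay=0.01):
    """Call read(key), retrying with exponential backoff while it raises KeyError."""
    for attempt in range(attempts):
        try:
            return read(key)
        except KeyError:
            if attempt == attempts - 1:
                raise  # give up and surface the error after the last attempt
            time.sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    store = LaggyStore(lag=2)
    store.put("reports/42/2014-06.jsonl", "report body")
    print(read_with_retry(store.get, "reports/42/2014-06.jsonl"))
```

Retrying only masks a missing new object; an overwrite can still return the old value, so data that must never be read stale needs a different design, for example writing each version under a new key.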

MOVING THE ELEPHANT
Migrating Legacy Data to the New System

INITIAL CONSIDERATIONS
· Time Frame
  · Scheduling Constraints
· Operational Cost
  · Resource Constraints
· Standards for data parity
Moving the Elephant: Migrating Legacy Data to the New System
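
The time-frame and operational-cost considerations can be roughed out with a back-of-the-envelope model before any data moves. This is a minimal sketch, not Conductor's actual COGS model: it assumes the data splits into equal partitions, that each worker handles one partition at a time at a fixed throughput, and that every worker is billed per hour for the whole run. All names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MigrationPlan:
    total_tb: float              # total data to migrate, in terabytes
    partitions: int              # number of independent, equal-sized partitions
    workers: int                 # partitions processed concurrently
    tb_per_worker_hour: float    # assumed throughput of one worker
    cost_per_worker_hour: float  # assumed hourly price of one worker

    def wall_clock_hours(self) -> float:
        per_partition_hours = (self.total_tb / self.partitions) / self.tb_per_worker_hour
        # Partitions run in "waves" of `workers` at a time.
        waves = math.ceil(self.partitions / self.workers)
        return waves * per_partition_hours

    def cost(self) -> float:
        # Every worker is billed for the full duration, even if idle during the last wave.
        return self.wall_clock_hours() * self.workers * self.cost_per_worker_hour


if __name__ == "__main__":
    for workers in (10, 20, 25):
        plan = MigrationPlan(total_tb=100, partitions=50, workers=workers,
                             tb_per_worker_hour=0.5, cost_per_worker_hour=2.0)
        print(workers, plan.wall_clock_hours(), plan.cost())
```

With these made-up inputs, 10 workers take 20 hours and 400 cost units; 20 workers take 12 hours but cost 480, because half of them sit idle in the third wave; 25 workers take 8 hours for the same 400. The trade-off between time and cost is rarely linear, which is one reason to model it before committing to a schedule.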

CASE STUDY: OUR UPFRONT PLANNING
· Two-month finish line
· Developed cost-of-goods-sold (COGS) models
· Built data validation software
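
A standard for data parity can be made concrete with an order-insensitive comparison between a legacy export and the migrated copy. The slides do not describe the actual validation software, so this is a hypothetical sketch: it assumes both sides can be read as lists of records with a unique `id` field, hashes each record in a canonical JSON form, and reports keys that are missing, extra, or mismatched.

```python
import hashlib
import json


def record_digest(record):
    """Stable SHA-256 of a record, independent of dict key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def parity_report(legacy, migrated, key="id"):
    """Compare two record collections (keys assumed unique) and summarize differences."""
    legacy_digests = {r[key]: record_digest(r) for r in legacy}
    migrated_digests = {r[key]: record_digest(r) for r in migrated}
    shared = legacy_digests.keys() & migrated_digests.keys()
    return {
        "missing": sorted(legacy_digests.keys() - migrated_digests.keys()),
        "extra": sorted(migrated_digests.keys() - legacy_digests.keys()),
        "mismatched": sorted(k for k in shared if legacy_digests[k] != migrated_digests[k]),
    }


def in_parity(report):
    """True when no record is missing, extra, or different."""
    return not (report["missing"] or report["extra"] or report["mismatched"])


if __name__ == "__main__":
    legacy = [{"id": 1, "rank": 3}, {"id": 2, "rank": 7}, {"id": 3, "rank": 12}]
    migrated = [{"rank": 3, "id": 1}, {"id": 2, "rank": 8}, {"id": 4, "rank": 1}]
    print(parity_report(legacy, migrated))
```

Hashing canonical JSON keeps memory proportional to the number of keys rather than record size; for very large partitions the same idea could be applied per partition so that a mismatch points to a specific slice of data.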

IDEAL MIGRATION SOFTWARE CHARACTERISTICS
· Can be scaled up or down
  · Speed up to save time
  · Slow down to save resources
· Can be run in a testing capacity
  · Configurable data sources/sinks
  · Configurable hardware resource use
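
These characteristics mostly come down to keeping the migration's knobs outside the code. A minimal sketch, assuming hypothetical setting names and environment variables (`MIGRATION_SOURCE`, `MIGRATION_SINK`, `MIGRATION_MAX_WORKERS`, `MIGRATION_DRY_RUN`) and placeholder URIs: the same job can be pointed at QA or production endpoints, sped up or slowed down by changing the worker count, and run in a dry-run mode. The defaults deliberately point at QA with dry-run enabled.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationConfig:
    source: str       # legacy data source URI (hypothetical format)
    sink: str         # destination prefix (hypothetical format)
    max_workers: int  # raise to finish sooner, lower to use fewer resources
    dry_run: bool     # read and validate, but skip writes

    @classmethod
    def from_env(cls, env=None):
        """Build a config from environment variables, falling back to safe QA defaults."""
        env = os.environ if env is None else env
        workers = int(env.get("MIGRATION_MAX_WORKERS", "2"))
        if workers < 1:
            raise ValueError("MIGRATION_MAX_WORKERS must be at least 1")
        return cls(
            source=env.get("MIGRATION_SOURCE", "mysql://qa-legacy/reports"),
            sink=env.get("MIGRATION_SINK", "s3://qa-bucket/reports/"),
            max_workers=workers,
            dry_run=env.get("MIGRATION_DRY_RUN", "true").lower() == "true",
        )


if __name__ == "__main__":
    print(MigrationConfig.from_env())
```

Making the safe configuration the default means a forgotten setting leads to a harmless QA dry run rather than an accidental production write.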

OUR MIGRATION SOFTWARE
· Oozie and Hive
· Controllable time/resource tradeoff
· Testable in a QA environment

AN INCREMENTAL MIGRATION: PARTITIONING DATA
· Easy to track progress
· Enables concurrency
· Dilutes failure risks
· E.g., Conductor “Time Periods”
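
Partitioning by time period is what makes these three benefits possible. A minimal sketch, assuming a hypothetical `migrate_partition` callable that copies one period and raises on failure: partitions run concurrently on a bounded thread pool, progress is a count of finished partitions, and a failure affects only its own partition, which can then be retried without redoing the rest.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_incremental(partitions, migrate_partition, max_workers=4):
    """Migrate each partition independently; return sorted (succeeded, failed) lists."""
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(migrate_partition, p): p for p in partitions}
        for done, future in enumerate(as_completed(futures), start=1):
            partition = futures[future]
            try:
                future.result()
                succeeded.append(partition)
            except Exception as exc:  # the failure stays isolated to this partition
                failed.append(partition)
                print(f"partition {partition} failed: {exc}")
            print(f"progress: {done}/{len(futures)} partitions finished")
    return sorted(succeeded), sorted(failed)


if __name__ == "__main__":
    periods = [f"2014-{m:02d}" for m in range(1, 7)]

    def flaky_migrate(period):
        # Hypothetical stand-in for copying one time period.
        if period == "2014-03":
            raise RuntimeError("source timeout")

    ok, bad = run_incremental(periods, flaky_migrate, max_workers=2)
    # Retry only the failed partitions.
    ok_retry, bad_retry = run_incremental(bad, lambda period: None)
    print(ok, bad, ok_retry, bad_retry)
```

Because each partition is independent, `max_workers` doubles as the speed/resource dial, and the returned failure list is a natural checkpoint for resuming a long migration.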

AN INCREMENTAL RELEASE
· Limit client exposure to subtler bugs
· Incorporate customer feedback
· Demonstrate progress early
· E.g., Conductor Searchlight 3.0 Beta Program
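
An incremental release can be gated per customer, so that only beta participants read from the new datastore while everyone else stays on the legacy system. This is a hypothetical sketch, not how the Searchlight beta was necessarily implemented: the beta account set and the two reader callables stand in for whatever enrollment mechanism and data-access code a real system uses.

```python
from typing import Callable, Dict, Set


def make_report_reader(
    beta_accounts: Set[int],
    read_legacy: Callable[[int], Dict],
    read_new: Callable[[int], Dict],
) -> Callable[[int], Dict]:
    """Return a reader that routes beta accounts to the new store and all others to legacy."""

    def read_report(account_id: int) -> Dict:
        if account_id in beta_accounts:
            try:
                return read_new(account_id)
            except Exception:
                # Fall back so a hard failure in the new path never reaches the customer.
                return read_legacy(account_id)
        return read_legacy(account_id)

    return read_report


if __name__ == "__main__":
    reader = make_report_reader(
        {42},
        read_legacy=lambda a: {"source": "legacy", "account": a},
        read_new=lambda a: {"source": "new", "account": a},
    )
    print(reader(42), reader(7))
```

The fallback only covers outright failures; subtler bugs in the new data still have to be caught by validation and by the beta customers' feedback, which is the point of a limited release.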
YOU CAN DO IT!

QUESTIONS?
Thanks for Listening!
(We’re Hiring!)