todd - london 2 - brining you data together in the cloud · 2019-12-01 · bringing your data...
TRANSCRIPT
![Page 1: Todd - London 2 - Brining You Data Together in the Cloud · 2019-12-01 · Bringing Your Data Together in the Cloud Todd Beauchene Global Alliances Architect, Snowflake Computing](https://reader033.vdocument.in/reader033/viewer/2022050412/5f88db669d4add039f765447/html5/thumbnails/1.jpg)
@SnowflakeDB #CloudAnalytics17
LONDON
Bringing Your Data Together in the Cloud
Todd Beauchene
Global Alliances Architect, Snowflake Computing
“Data! Data! Data! I can't make bricks without clay.” - Sherlock Holmes
Agenda
• Cloud Data Ecosystem
• Data Sources
• Methodologies
• Data Integration Solutions
• Conclusion
Cloud Data Ecosystem
[Diagram labels: Data Sources, Enterprise apps, Corporate, Web, Mobile, IoT, Data Integration, Data Warehouse, Business Intelligence & Analytics]
Data Sources
Data Sources
On-Premises
• Typically backed by a local transactional database
• All data lives within the firewall
• Customer has full access to all data and systems
Cloud
• Typically backed by a cloud database (e.g. RDS)
• Can run in customer VPC
• Typically offers fewer options than on-premises
SaaS
• Typically data is only available via API
• Outside of customer firewall or VPC
• Customer has very little control over handling of data
Real World Example: Consolidated Dashboard
Challenges
• Long-term project with high-level goals
• Diverse data sources
• Different refresh cycles
• Inconsistent results
Solutions
• Agile project with focused, short-term goals
• Dedicated schema in EDW
• Daily ETL process
• Data quality checks within ETL
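
The data quality checks listed under Solutions could look roughly like the sketch below. It is an illustration only: the function name `quality_issues`, the field names `id` and `amount`, and the two rules (required fields present, keys unique) are assumptions, not details from the talk.

```python
def quality_issues(rows, key="id", required=("id", "amount")):
    """Return a list of human-readable problems found in one ETL batch.

    Hypothetical rules: every required field must be present,
    and the key column must be unique within the batch.
    """
    issues = []
    seen = set()
    for index, row in enumerate(rows):
        for field in required:
            if row.get(field) is None:
                issues.append(f"row {index}: missing {field}")
        value = row.get(key)
        if value is not None:
            if value in seen:
                issues.append(f"row {index}: duplicate {key} {value}")
            seen.add(value)
    return issues


# Made-up sample batch: one duplicate key and one missing amount.
batch = [
    {"id": 1, "amount": 10.0},
    {"id": 1, "amount": 5.0},
    {"id": 2, "amount": None},
]
issues = quality_issues(batch)
```

A daily ETL job might refuse to publish to the dashboard schema whenever `issues` is non-empty, which is one way to address the inconsistent results listed under Challenges.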
Methodologies
Methodologies
Bulk Loading – Trunc and Load
• Runs at regular intervals
• Full data set loaded during each run and existing data is purged
• Least efficient option, but very simple to manage
• High data volumes every run
• More commonly used for dimension tables
Daily Differentials
• Runs during nightly ETL window
• Requires change data capture to identify changed rows
• Generally consists of a series of steps where each depends on the previous steps
• Must include logic to handle slowly changing dimensions
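
The two approaches on this slide can be contrasted in a small sketch. It uses Python's built-in `sqlite3` as a stand-in for the warehouse. The table names (`dim_region`, `dim_customer`), the columns, and the choice of a Type 2 slowly changing dimension are assumptions made for illustration; the slide does not say which SCD type was meant. Change data capture is assumed to have happened upstream, so `changed_rows` already holds only the rows that changed.

```python
import sqlite3


def trunc_and_load(conn, rows):
    """Bulk load: purge the target table, then reload the full data set."""
    with conn:  # purge and reload commit together, or not at all
        conn.execute("DELETE FROM dim_region")
        conn.executemany(
            "INSERT INTO dim_region (region_id, name) VALUES (?, ?)", rows
        )


def apply_differential(conn, changed_rows, load_date):
    """Daily differential, handled as SCD Type 2 (an assumed choice):
    close the current version of a changed row and insert a new version."""
    with conn:
        for customer_id, city in changed_rows:
            current = conn.execute(
                "SELECT city FROM dim_customer "
                "WHERE customer_id = ? AND valid_to IS NULL",
                (customer_id,),
            ).fetchone()
            if current is not None and current[0] == city:
                continue  # value unchanged, nothing to record
            if current is not None:
                conn.execute(
                    "UPDATE dim_customer SET valid_to = ? "
                    "WHERE customer_id = ? AND valid_to IS NULL",
                    (load_date, customer_id),
                )
            conn.execute(
                "INSERT INTO dim_customer "
                "(customer_id, city, valid_from, valid_to) VALUES (?, ?, ?, NULL)",
                (customer_id, city, load_date),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_region (region_id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE dim_customer "
    "(customer_id INTEGER, city TEXT, valid_from TEXT, valid_to TEXT)"
)

# Two full reloads: the second run replaces everything from the first.
trunc_and_load(conn, [(1, "EMEA"), (2, "APAC")])
trunc_and_load(conn, [(1, "EMEA"), (2, "APAC"), (3, "AMER")])

# Two nightly differentials: customer 10 moves, customer 11 is new.
apply_differential(conn, [(10, "London")], "2017-06-01")
apply_differential(conn, [(10, "Leeds"), (11, "Paris")], "2017-06-02")
```

The differential path needs noticeably more logic than the full reload, which matches the slide's trade-off between simplicity and efficiency.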
Methodologies
Insert-only – Date-based
• Extracts data by date range to eliminate need for CDC
• Simplified processing
• Commonly used for fact tables
• Changes to data from previous periods require deletion of all data for the given range
Database Replication
• Generally runs in near-real-time
• Requires a tool that is tightly integrated with the source database
• Schemas must match between source and destination
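
The date-based, insert-only pattern can be sketched as a "delete the range, then re-extract it" step. Again `sqlite3` stands in for the warehouse, and the table `fact_sales`, its columns, and the helper name `reload_date_range` are hypothetical.

```python
import sqlite3


def reload_date_range(conn, start, end, extracted_rows):
    """Replace every fact row dated within [start, end] with a fresh extract."""
    with conn:  # delete and insert commit as one unit
        conn.execute(
            "DELETE FROM fact_sales WHERE sale_date BETWEEN ? AND ?",
            (start, end),
        )
        conn.executemany(
            "INSERT INTO fact_sales (sale_date, amount) VALUES (?, ?)",
            extracted_rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, amount REAL)")

# Normal daily runs: each run only touches its own date.
reload_date_range(conn, "2017-06-01", "2017-06-01", [("2017-06-01", 100.0)])
reload_date_range(conn, "2017-06-02", "2017-06-02", [("2017-06-02", 50.0)])

# A late change to 1 June: the whole day is deleted and extracted again.
reload_date_range(
    conn,
    "2017-06-01",
    "2017-06-01",
    [("2017-06-01", 100.0), ("2017-06-01", 25.0)],
)
```

No row-level change tracking is needed here, at the cost of rewriting the full range whenever an earlier period changes.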
Methodologies
Batch Processing
• Generally used when data is being pushed from the source
• Batch frequency depends on the volume and velocity of the data
• Requires automated process to load batches into the data warehouse
Streaming
• Generally used for high volume data
• Event-based rather than row-based
• Often requires micro-batching of data for load into relational database
• Raw data must usually be transformed to support analytics
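
Micro-batching and the transformation of raw events into rows can be sketched as follows. The JSON event shape (`device_id`, `metric`, `value`), the batch size, and both helper names are assumptions for illustration; a real pipeline would read from a stream rather than a list.

```python
import json
from itertools import islice


def micro_batches(events, batch_size):
    """Group an event iterator into lists of at most batch_size events."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def to_row(event):
    """Flatten one raw JSON event into a (device_id, metric, value) row."""
    payload = json.loads(event)
    return (payload["device_id"], payload["metric"], float(payload["value"]))


# Five made-up events standing in for a high-volume stream.
raw_events = [
    json.dumps({"device_id": f"d{i % 2}", "metric": "temp", "value": 20 + i})
    for i in range(5)
]

# Each inner list is one load into the relational target.
batches = [
    [to_row(event) for event in batch]
    for batch in micro_batches(raw_events, batch_size=2)
]
```

Each element of `batches` could then be written with a single bulk insert, so the database sees a few larger loads instead of one write per event.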
Data Integration Solutions
Data Integration Solutions
Custom Code
• Flexible but complex
• Leverages in-database processing
• Challenging to manage and maintain
ETL
• Simplified data transformation with no code
• Built-in dependency and error handling
• Reduces data volumes within EDW
ELT
• Leverages benefits of ETL while shifting data processing to EDW
• Requires tight integration between Data Integration and EDW
• Raw and transformed data in one place
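
The ELT idea of loading raw data first and transforming it inside the warehouse can be sketched with `sqlite3` as a stand-in for the EDW. The tables `raw_orders` and `orders_by_country`, the columns, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw data lands in the warehouse untouched.
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, country TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "uk", 1250), (2, "UK", 750), (3, "fr", 2000)],
)

# Transform: the database engine does the work, driven by SQL.
conn.execute(
    """
    CREATE TABLE orders_by_country AS
    SELECT UPPER(country) AS country,
           SUM(amount_cents) / 100.0 AS total_amount
    FROM raw_orders
    GROUP BY UPPER(country)
    """
)

summary = conn.execute(
    "SELECT country, total_amount FROM orders_by_country ORDER BY country"
).fetchall()
```

Both `raw_orders` and `orders_by_country` remain queryable afterwards, which is the "raw and transformed data in one place" point. In an ETL tool the cleanup and aggregation would instead run before the load, so only the summary would reach the warehouse.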
Data Integration Solutions
On-Premises
• Customer owns hardware and software install/configuration
• Don’t have to deal with firewall to access local sources
Cloud
• Customer owns software install/configuration but not hardware
• Can run in customer VPC to provide direct access to data within VPC or behind firewall
SaaS
• Fully managed by service provider
• Configurable options vary by solution
• Must find secure ways to access data not stored inside firewall
Conclusion
Cloud Data Warehousing Best Practices
• Leverage the scalable compute layer to do the bulk of the data processing
• Isolate load and transform jobs from queries to prevent resource contention
• Eliminate physical data marts by leveraging a scalable data platform
• QA is key, make sure all changes made to data integration tasks are tested before they roll to production
• When migrating it is important to convert one source at a time