Making Big Data Projects Successful - Data Science Pop-up Seattle
TRANSCRIPT
![Page 1: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/1.jpg)
#datapopupseattle
AARON CORDOVA
CTO and Co-Founder, Koverse
aaroncordova
Making Big Data Projects Successful
koverse
![Page 2: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/2.jpg)
UNSTRUCTURED
Data Science POP-UP in Seattle
www.dominodatalab.com
Produced by Domino Data Lab
Domino’s enterprise data science platform is used by leading analytical organizations to increase productivity, enable collaboration, and publish
models into production faster.
![Page 3: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/3.jpg)
Keys to making successful big data projects repeatable
![Page 4: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/4.jpg)
© Koverse | Company Confidential
Intro
Aaron Cordova
CTO, co-founder at Koverse Inc.
Built successful big data systems for DOD, Intelligence, Finance
![Page 5: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/5.jpg)
Big Data Projects
How it tends to be
How it should be
![Page 6: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/6.jpg)
Big Data Projects
Interesting part
![Page 7: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/7.jpg)
Big Data Projects
Interesting part
![Page 8: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/8.jpg)
Big Data Projects
Interesting part
More propellant
Support infrastructure
Propellant
Launch platform
Utilities
![Page 9: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/9.jpg)
Step 1: Import
Bring the data to the data scientist
From where?
![Page 10: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/10.jpg)
Step 1: Security
Sensitive data requires access controls
Using more than 1 data set requires fine-grained access controls
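One way to picture fine-grained access control is a label check on every record, so that several data sets can sit side by side while each user sees only what they are cleared for. The sketch below is illustrative only: the `Record` class, the label names, and `visible_records` are hypothetical and are not taken from the talk or from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A record from any data set, tagged with the labels needed to read it (hypothetical model)."""
    dataset: str
    fields: dict
    required_labels: frozenset


def visible_records(records, user_labels):
    """Return only the records whose required labels are all held by the user."""
    user_labels = frozenset(user_labels)
    return [r for r in records if r.required_labels <= user_labels]


# Three co-located data sets with different sensitivity (made-up sample data).
records = [
    Record("sales", {"customer_id": 1, "total": 40.0}, frozenset({"sales"})),
    Record("hr", {"employee_id": 7, "salary": 90000}, frozenset({"hr", "pii"})),
    Record("web", {"page": "/home", "hits": 12}, frozenset()),
]

# An analyst holding only the "sales" label sees sales and unlabeled records.
analyst_view = visible_records(records, {"sales"})
```

Checking labels per record rather than per data set is what lets a single query span multiple data sets without exposing the sensitive ones wholesale.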
![Page 11: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/11.jpg)
Step 1: Security
![Page 12: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/12.jpg)
Step 2: Data Assumptions
Need to find out
1. Structure of the data (field names, types)
2. Data semantics (is CustomerID in data set A equal to CID from data set B?)
Initial assumptions are almost certainly wrong.
Need to see actual data samples.
Go back, get more data sets; normalize, clean up data
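Both questions can be checked against real samples rather than assumed. The following is a minimal sketch, assuming small in-memory samples as lists of dictionaries; `infer_schema`, `key_overlap`, and the sample rows are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def infer_schema(samples):
    """Map each field name to the set of Python type names seen in the samples."""
    schema = defaultdict(set)
    for row in samples:
        for name, value in row.items():
            schema[name].add(type(value).__name__)
    return dict(schema)


def key_overlap(rows_a, field_a, rows_b, field_b):
    """Fraction of distinct field_a values (compared as strings) that also occur in field_b."""
    values_a = {str(r[field_a]).strip() for r in rows_a if field_a in r}
    values_b = {str(r[field_b]).strip() for r in rows_b if field_b in r}
    if not values_a:
        return 0.0
    return len(values_a & values_b) / len(values_a)


# Made-up samples: note the mixed types and the odd "C-77" identifier.
dataset_a = [
    {"CustomerID": 1001, "name": "Ann"},
    {"CustomerID": 1002, "name": "Bob"},
    {"CustomerID": "1003", "name": None},
]
dataset_b = [
    {"CID": "1001", "total": 12.5},
    {"CID": "1003", "total": 7.0},
    {"CID": "C-77", "total": 3.0},
]

schema_a = infer_schema(dataset_a)  # structure: field names and observed types
overlap = key_overlap(dataset_a, "CustomerID", dataset_b, "CID")  # semantics hint
```

A field that shows more than one type, or a key overlap well below 1.0, is a signal that the initial assumptions need revisiting before any analytics are written. High overlap is only a hint that two fields mean the same thing, not proof.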
![Page 13: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/13.jpg)
Step 2: Data Assumptions
If primary analytical system can’t handle discovery, need another system for sampling, viewing, cleaning up, normalizing data
![Page 14: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/14.jpg)
Step 3: Interesting Part!
Run analytics!
Need some sort of system for running analytics:
R
Python
Spark
MLLib
MapReduce
SAS
![Page 15: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/15.jpg)
Step 4: Delivering Results
Reports are relatively easy to deliver – run once a day, small output
Some results are large, need to stay in the system
Indexing makes results searchable for a large number of consumers
Results can be embedded in interactive decision-making apps with an API
![Page 16: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/16.jpg)
Step 4: Delivering Results
Find some system for indexing analytical results – possibly copying data, address consistency issues
Apply some solution for making results available via an API so they can be embedded in applications
… Then build applications
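To make the indexing step concrete, here is a minimal sketch of an inverted index over analytical results, with a `search` function standing in for the API an application would call. All names and the sample results are hypothetical; a real deployment would more likely use a dedicated search system.

```python
import re
from collections import defaultdict


def build_index(results):
    """Build an inverted index: token -> set of result ids whose values contain it."""
    index = defaultdict(set)
    for result_id, record in results.items():
        for value in record.values():
            for token in re.findall(r"\w+", str(value).lower()):
                index[token].add(result_id)
    return index


def search(index, results, query):
    """Return the results that match every token in the query, ordered by id."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return []
    matching = set.intersection(*(index.get(t, set()) for t in tokens))
    return [results[i] for i in sorted(matching)]


# Made-up analytical output keyed by result id.
results = {
    1: {"customer": "Acme Corp", "segment": "high churn risk"},
    2: {"customer": "Globex", "segment": "low churn risk"},
    3: {"customer": "Initech", "segment": "high growth"},
}
index = build_index(results)
hits = search(index, results, "high churn")
```

Note that the index here holds only ids and points back at the stored results. An index that keeps its own copy of the data is where the consistency issues mentioned on the slide come from.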
![Page 17: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/17.jpg)
Scalability
Even if original data sets are small, multiple data sets need to be co-located
Original data is transformed into derivatives
Indexed data requires more space
Scalability becomes a problem eventually
![Page 18: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/18.jpg)
Scalability
Migrate original solution to a scalable system.
Rewrite analytics, data flow for the scalable system.
![Page 19: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/19.jpg)
Repeatability
System works! Now what?
As new data arrives, the whole process needs to be re-run, or run on all the available data
If any assumptions or structure of the data change, need to be able to re-process data
Live updates need to be scheduled, resource demands need to be balanced
Oh yeah, and go back and address security… if possible
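The choice between processing only new data and re-processing everything can be sketched as a small planning step. This is an illustration under stated assumptions, not a method described in the talk: the assumptions about the data are captured in a hypothetical `config` dictionary, and a change in its fingerprint triggers a full re-run.

```python
import hashlib
import json


def config_fingerprint(config):
    """Stable hash of the assumptions (field mappings, cleanup rules) currently in use."""
    encoded = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def plan_run(all_records, processed_count, last_fingerprint, config):
    """Decide which records to process on this run.

    Returns ("full", all records) when the assumptions changed since the last run,
    otherwise ("incremental", only the records not yet processed).
    """
    if config_fingerprint(config) != last_fingerprint:
        return "full", list(all_records)
    return "incremental", list(all_records[processed_count:])


# Made-up state: three of four records were already processed under config_v1.
records = ["r1", "r2", "r3", "r4"]
config_v1 = {"id_field": "CustomerID"}
config_v2 = {"id_field": "CID"}

mode_same, todo_same = plan_run(records, 3, config_fingerprint(config_v1), config_v1)
mode_changed, todo_changed = plan_run(records, 3, config_fingerprint(config_v1), config_v2)
```

With unchanged assumptions only the newly arrived record is processed. Once the assumed identifier field changes, every record is queued again.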
![Page 20: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/20.jpg)
Working backwards
![Page 21: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/21.jpg)
Working backwards
Want to provide value from data but first have to:
Address data discovery, security, scalability, repeatability…
![Page 22: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/22.jpg)
Avoid Yak Shaving
![Page 23: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/23.jpg)
Recommended approach
1. Start with scalable technologies
2. Build in security from the start
3. Admit that data is messy, make it possible to address data quality issues within the system
4. Integrate with whatever analytical tools data scientists want to use
5. Integrate indexing and search into the system, avoid copying data
6. Allow for prototyping new data flows, analytics, apps in production system. Going live is a matter of configuration, not a rewrite
![Page 24: Making Big Data Projects Successful - Data Science Pop-up Seattle](https://reader034.vdocument.in/reader034/viewer/2022050613/58ad900a1a28ab662a8b61a7/html5/thumbnails/24.jpg)
Recommended approach
Go from 2-3 successful projects per year to 20-30