Transcript
Page 1: The Myria Big Data Management and Analytics System and ... (cidrdb.org/cidr2017/slides/p37-wang-cidr17-slides.pdf)

The Myria Big Data Management and Analytics System and Cloud Service

Jingjing Wang, Tobin Baker, Magdalena Balazinska, Daniel Halperin, Brandon Haynes, Bill Howe, Dylan Hutchison, Shrainik Jain, Ryan Maas, Parmita Mehta, Dominik Moritz, Brandon Myers, Jennifer Ortiz, Dan Suciu, Andrew Whitaker, Shengliang Xu

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, UNIVERSITY OF WASHINGTON

http://myria.cs.washington.edu

Page 2:

Acknowledgments

The Myria Team!

Our science collaborators!!
• Andrew Connolly, Tom Quinn, Sarah Loebman, Ariel Rokem, Ginger Armbrust, Yejin Choi

Our sponsors!!!
• National Science Foundation, Moore & Sloan Foundations, Washington Research Foundation, eScience Institute, ISTC Big Data, Petrobras, EMC, Amazon, and Facebook

Magdalena Balazinska - University of Washington

Page 3:

[Diagram labels: Big Data; Management; Analytics; Efficient; Easy; Science Apps]

Page 4:

Goals of the Myria stack
• Advance state-of-the-art in big data systems
• Focus on efficiency and productivity
• Test on real applications and support real users

Deliverables:
• Built a new big data mgmt & analytics system
• Deployed and operate Myria as a service
• Source code and demo service: http://myria.cs.washington.edu

Page 5:

Myria has been developed and is operated by
• Database Group in the Computer Science & Engineering Department at UW
• UW eScience Institute

Co-PIs: Dan Suciu and Bill Howe

Page 6:

Myria Demo

Page 7:

Myria Cloud Service

Service available through project website

Page 8:

Analysis in the Browser with Myria

Declarative-imperative analysis with MyriaL and Python

Page 9:

Myria Operates Directly on Data in S3

For efficient processing, caches query results internally in cluster

Page 10:

MyriaL is Imperative + Declarative with Iterations

Page 11:

Myria Provides Details of Query Execution

Page 12:

Myria Service includes Jupyter Notebook

Jupyter notebook available directly with Myria service

Page 13:

Myria Supports Python User-Defined Functions

Data from the Human Connectome project

MRI data analysis

Python UDFs enable running legacy code and complex analytics beyond SQL/MyriaL

Page 14:

Users Can Deploy Own Service

pip install myria-cluster

myria-cluster create [OPTIONS] CLUSTER_NAME

myria-cluster stop/start/destroy […]

Page 15:

Example Myria Applications

Neuroscience, Astronomy, Natural Language Processing, Oceanography, Bibliometrics

Image credits: picture from Leila Zilles; MyMergerTree screenshot; data from the Human Connectome project

[Figure: oceanography scatter plots with log-scale axes from 10^0 to 10^4 (FSC, RED fluorescence); population labels: Picoplankton, Nanoplankton, Ultraplankton, Phytoplankton, Prochlorococcus]

Page 16:

Myria Internals

Page 17:

Myria Polystore Stack

[Diagram labels]
• Browser, Python/Jupyter, Specialized Services, MyMergerTree
• RACO: Query Translation, Optimization, and Orchestration
• MyriaX: Parallel, Iterative, and Elastic Query Execution
• MPI, SciDB, Graphs, NoSQL

Page 18:

Myria’s Data Model and Query Interface
• Relational Algebra Compiler (RACO)
  – Myria’s query optimizer and federator
• RACO core: relational algebra extended with
  – Iterations for multi-pass algorithms
  – Flatmap to explode non-1NF attribute values into many tuples
  – Stateful apply for windowed and neighborhood functions
• Query language: MyriaL (Imperative + Declarative)
  – Each statement is declarative (SQL, comprehensions, function calls)
  – Statements are combined with imperative constructs
    • Variable assignment
    • Iteration
• Python UDFs/UDAs
  – Minimize barriers to adoption and run legacy code
• Python API
  – Fluent API with Python lambda functions
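The three RACO extensions listed on this slide can be illustrated outside Myria in plain Python. This is a minimal sketch of the operator semantics only, not RACO's implementation: the names `flatmap`, `stateful_apply`, and `iterate_until_fixpoint`, and the sample data, are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")
S = TypeVar("S")


def flatmap(rows: Iterable[T], explode: Callable[[T], Iterable[U]]) -> Iterator[U]:
    """Emit zero or more output tuples per input tuple (un-nests non-1NF values)."""
    for row in rows:
        yield from explode(row)


def stateful_apply(rows: Iterable[T], init: S,
                   step: Callable[[S, T], Tuple[S, U]]) -> Iterator[U]:
    """Thread a state through the rows in order, emitting one value per row."""
    state = init
    for row in rows:
        state, out = step(state, row)
        yield out


def iterate_until_fixpoint(start: T, body: Callable[[T], T], max_rounds: int = 1000) -> T:
    """Re-apply `body` until its output stops changing (a multi-pass algorithm)."""
    current = start
    for _ in range(max_rounds):
        nxt = body(current)
        if nxt == current:
            return current
        current = nxt
    raise RuntimeError("no fixpoint reached within max_rounds")


# Flatmap: explode a list-valued attribute into one tuple per element.
docs = [("d1", ["a", "b"]), ("d2", ["c"])]
words = list(flatmap(docs, lambda r: ((r[0], w) for w in r[1])))

# Stateful apply: a running sum, the simplest windowed function.
def running_sum(total: int, x: int) -> Tuple[int, int]:
    total += x
    return total, total

sums = list(stateful_apply([1, 2, 3, 4], 0, running_sum))

# Iteration: grow the set of vertices reachable from vertex 1 until it is stable.
edges = {(1, 2), (2, 3), (3, 4)}
reachable = iterate_until_fixpoint(
    frozenset({1}),
    lambda seen: seen | frozenset(dst for src, dst in edges if src in seen),
)
```

Each helper is a generator or a loop over an ordinary iterable, so the three can be composed freely, which mirrors how the slide presents them as extensions to a single relational algebra.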

Page 19:

Polystore Optimization
• Rule-based opt. with three types of rules
  – Optimize logical Myria algebra plans
  – Translate logical plans into back-end specific physical plans
  – Optimize back-end specific physical plans
• To add a new back-end, developer must specify
  – Tree representation of query language
  – Rules that translate Myria algebra into this representation
  – Administrative functions including one to submit queries
• Data model independence
  – Myria hides the existence of various back-ends
  – Users write MyriaL scripts assuming relational model
  – Back-ends include select array, graph, and key-value systems
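The three rule types on this slide can be sketched as three passes of one generic tree rewriter. The code below is an illustration under stated assumptions, not RACO's code: plans are nested tuples, the back-end is a made-up one whose operators carry an `x_` prefix, and the rule names (`merge_selects`, `to_backend`, `fuse_select_into_scan`) are hypothetical.

```python
from typing import Callable, List

Plan = tuple                      # a plan node: (operator, *args); children are nested tuples
Rule = Callable[[Plan], Plan]     # a rule returns the node unchanged when it does not apply


def rewrite(plan: Plan, rules: List[Rule]) -> Plan:
    """Rewrite children first, then apply rules to this node until none fires."""
    plan = tuple(rewrite(p, rules) if isinstance(p, tuple) else p for p in plan)
    for rule in rules:
        new_plan = rule(plan)
        if new_plan != plan:
            return rewrite(new_plan, rules)
    return plan


def merge_selects(plan: Plan) -> Plan:
    """Logical rule: collapse two stacked selections into one."""
    if plan[0] == "select" and isinstance(plan[2], tuple) and plan[2][0] == "select":
        _, outer_pred, (_, inner_pred, child) = plan
        return ("select", f"({outer_pred}) and ({inner_pred})", child)
    return plan


LOGICAL_OPS = {"scan", "select"}


def to_backend(plan: Plan) -> Plan:
    """Translation rule: map a logical operator to the hypothetical back-end's operator."""
    if plan[0] in LOGICAL_OPS:
        return ("x_" + plan[0],) + plan[1:]
    return plan


def fuse_select_into_scan(plan: Plan) -> Plan:
    """Physical rule: push a back-end selection into the scan beneath it."""
    if plan[0] == "x_select" and isinstance(plan[2], tuple) and plan[2][0] == "x_scan":
        _, pred, (_, relation) = plan
        return ("x_filtered_scan", relation, pred)
    return plan


def optimize(plan: Plan) -> Plan:
    plan = rewrite(plan, [merge_selects])           # 1. optimize the logical plan
    plan = rewrite(plan, [to_backend])              # 2. translate to a physical plan
    return rewrite(plan, [fuse_select_into_scan])   # 3. optimize the physical plan


logical = ("select", "a > 1", ("select", "b < 2", ("scan", "R")))
physical = optimize(logical)
```

Under this framing, supporting another back-end means supplying a new set of translation and physical rules while the logical rules stay shared, which is the division of labor the slide describes.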

Page 20:

Federated Query Execution

Federated plans require fast data movement

[Diagram: Source DBMS (Worker 1 ... Worker n, script submitted by User) and Target DBMS (Worker 1 ... Worker m, script submitted by User or Opt.), with steps labeled [1] to [4]]

Source DBMS script:
t = scan(data)
x = distances(t,t)
export(x,'db://Target')

Target DBMS script:
x = import('db://Source')
u = cluster(x)

Worker Directory:
source.w1 → target.wm
source.wn → target.w1
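The worker directory on this slide pairs each source worker with a target worker so that partitions move directly between workers. The sketch below shows one way such a directory could be built and used. It is an assumption-laden illustration: the round-robin pairing policy and the function names `build_worker_directory` and `transfer` are hypothetical, and the slide does not say how Myria assigns pairs.

```python
from typing import Dict, List


def build_worker_directory(source_workers: List[str],
                           target_workers: List[str]) -> Dict[str, str]:
    """Pair every source worker with a target worker (round-robin; an assumed policy)."""
    return {
        src: target_workers[i % len(target_workers)]
        for i, src in enumerate(source_workers)
    }


def transfer(partitions: Dict[str, list], directory: Dict[str, str]) -> Dict[str, list]:
    """Move each source worker's partition straight to its paired target worker."""
    received: Dict[str, list] = {target: [] for target in directory.values()}
    for src, rows in partitions.items():
        received[directory[src]].extend(rows)
    return received


directory = build_worker_directory(
    ["source.w1", "source.w2", "source.w3"],
    ["target.w1", "target.w2"],
)
received = transfer(
    {"source.w1": [1, 2], "source.w2": [3], "source.w3": [4, 5]},
    directory,
)
```

Because every pair transfers independently, no single node has to relay the full dataset, which is why federated plans can move data in parallel.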

Page 21:

Data Movement with PipeGen

[Diagram: DBMS Bytecode and Unit Tests are the inputs to PipeGen, which produces a PipeGen-Enabled DBMS]

[Diagram: DBMS bytecode goes in and a DBMS with optimized data pipe comes out. Components: IORedirect: I/O Redirector; FormOpt: Format Optimizer; PipeVerify: Verification. Step labels: Identify File Open Expressions, Inject Conditional Redirection, Instrument Unit Tests, Data Flow Analysis, Type Substitution. Other labels: Data Pipe Type, Augmented Types]

PipeGen: Data Pipe Generator for Hybrid Analytics. Brandon Haynes, Alvin Cheung, and Magdalena Balazinska. SOCC 2016.

Page 22:

PipeGen’s Performance

16-node cluster with 16 workers/tasks

Transfer 10^9 tuples with 4 ints and 3 doubles
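For a sense of scale, the transfer workload on this slide can be sized with simple arithmetic. The widths below (4-byte ints, 8-byte doubles, packed binary with no per-tuple overhead) and the even split across workers are assumptions; the slide states neither the field widths nor the wire format.

```python
NUM_TUPLES = 10**9
INTS_PER_TUPLE = 4
DOUBLES_PER_TUPLE = 3
WORKERS = 16

INT_BYTES = 4      # assumption: 32-bit integers
DOUBLE_BYTES = 8   # assumption: 64-bit IEEE 754 doubles

bytes_per_tuple = INTS_PER_TUPLE * INT_BYTES + DOUBLES_PER_TUPLE * DOUBLE_BYTES
total_bytes = NUM_TUPLES * bytes_per_tuple
per_worker_bytes = total_bytes // WORKERS   # assumption: data is spread evenly

print(f"{bytes_per_tuple} bytes per tuple")
print(f"{total_bytes / 1e9:.1f} GB in total")
print(f"{per_worker_bytes / 1e9:.1f} GB per worker")
```

Under these assumptions the raw payload is 40 bytes per tuple, 40 GB overall, and 2.5 GB per worker. A text format such as CSV would usually be larger, which is one reason the encoding matters when moving data between systems.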

Page 23:

Myria Polystore Stack

[Diagram repeated from Page 17]

Page 24:

MyriaX Engine and Cloud Deployment

[Diagram: JSON query plans & API calls reach the Coordinator through its REST Interface. The Coordinator and the Workers run in YARN Containers on Amazon EC2 Instances, and each instance has an RDBMS. Storage labels: HDFS, Amazon EBS Volumes and/or Local Storage, Amazon S3]

Page 25:

MyriaX Overview

• Data storage
  – Read data from S3, HDFS, local files
  – Parse CSV, TSV, and various scientific file formats
  – Store data in local relational DBMS instances
    • Fast storage with physical tuning (indexing, hash-partitioning)
• Query execution
  – Fundamentally a parallel DBMS
    • Fast, pipelined query execution
  – But scheduling more flexible to support elasticity
  – Novel features: Multiway joins and iterations
• Resource management
  – Executes on top of the YARN resource manager
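Hash-partitioning, mentioned above as part of physical tuning, can be sketched in a few lines. This is a generic illustration of the technique, not MyriaX code: the helper names and the choice of CRC32 as the hash function are assumptions.

```python
import zlib
from typing import List, Sequence, Tuple


def partition_of(key: object, num_workers: int) -> int:
    """Map a key to a worker with a hash that is stable across processes."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_workers


def hash_partition(rows: Sequence[Tuple], key_index: int, num_workers: int) -> List[List[Tuple]]:
    """Split rows into one list per worker according to the hash of the key column."""
    parts: List[List[Tuple]] = [[] for _ in range(num_workers)]
    for row in rows:
        parts[partition_of(row[key_index], num_workers)].append(row)
    return parts


def local_join(left: Sequence[Tuple], right: Sequence[Tuple]) -> List[Tuple]:
    """Nested-loop equi-join on column 0 of both inputs."""
    return [l + r[1:] for l in left for r in right if l[0] == r[0]]


R = [(1, "a"), (2, "b"), (3, "c"), (1, "d")]
S = [(1, "x"), (3, "y"), (4, "z")]
NUM_WORKERS = 4

# When both relations are partitioned on the join key, each worker can join its
# own partitions without exchanging tuples with any other worker.
r_parts = hash_partition(R, 0, NUM_WORKERS)
s_parts = hash_partition(S, 0, NUM_WORKERS)
distributed = [row for rp, sp in zip(r_parts, s_parts) for row in local_join(rp, sp)]
```

Sorting `distributed` gives the same rows as `local_join(R, S)` on the unpartitioned inputs, because equal keys always hash to the same worker.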

Page 26:

Efficient Iterative Processing

• User specifies query declaratively
  – Subset of Datalog with aggregation
• Generate efficient, shared-nothing query plan
  – Small extensions to existing shared-nothing systems
• Plan amenable to runtime optimizations
  – Synchronous vs asynchronous
  – Different processing priorities
• Optimizations significantly affect performance

Asynchronous and Fault-Tolerant Recursive Datalog Evaluation in Shared-Nothing Engines. Jingjing Wang, Magdalena Balazinska, and Daniel Halperin. PVLDB 8(12):1542-1553 (2015)

Page 27:

Myria’s Optimized Iterations Example

Declarative Query

E = scan(jwang:cc:graph);
V = select distinct E.$0 from E;
do
  CC := [$0, MIN($1)] <-
    [from V emit V.$0 as x, V.$0 as y] +
    [from E, CC where E.$0 = CC.$0 emit E.$1, CC.$1];
until convergence;
store(CC, CC);

// Can have multiple relations with recursive dependencies

Compiled to a Distributed Query Plan

[Diagram: query plan with operators IDBController(CC), Join, and Scan(Edges)]

Asynchronous and Fault-Tolerant Recursive Datalog Evaluation in Shared-Nothing Engines. Jingjing Wang, Magdalena Balazinska, and Daniel Halperin. PVLDB 8(12):1542-1553 (2015)
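The connected-components query on this slide can be read as a fixpoint computation: every vertex starts labeled with its own id, labels flow along edges, and each vertex keeps the minimum label it has seen. The single-machine Python sketch below mirrors that reading with synchronous rounds. It is an illustration of the query's semantics, not of Myria's distributed or asynchronous plan, and the function name and sample graph are hypothetical.

```python
from typing import Dict, List, Tuple


def connected_components(edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Label each vertex with the smallest vertex id that can reach it."""
    # V = select distinct E.$0 from E
    vertices = {src for src, _ in edges}
    # Before the first round CC holds only the base case: each vertex labels itself.
    cc = {v: v for v in vertices}
    while True:
        # Base case: [from V emit V.$0 as x, V.$0 as y]
        new_cc = {v: v for v in vertices}
        # Recursive case: [from E, CC where E.$0 = CC.$0 emit E.$1, CC.$1],
        # combined with the base case by MIN($1) grouped on $0.
        for src, dst in edges:
            if src in cc:
                label = cc[src]
                if dst not in new_cc or label < new_cc[dst]:
                    new_cc[dst] = label
        if new_cc == cc:          # until convergence
            return cc
        cc = new_cc


# An undirected graph stored as edges in both directions: {1, 2, 3} and {4, 5}.
graph = [(1, 2), (2, 1), (2, 3), (3, 2), (4, 5), (5, 4)]
components = connected_components(graph)
```

On this graph the loop converges after two label-changing rounds: vertices 1, 2, and 3 end with label 1, and vertices 4 and 5 end with label 4.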

Page 28:

Performance Comparison with Spark

[Diagram labels: Declarative Query (subset of Datalog with agg.); Shared-Nothing Query Plan, In-Memory Processing; Synchronous; Asynchronous; Prioritize New Data; Prioritize Base Data]

[Chart: Query Time (Seconds), 0 to 250, against # of Workers (8, 16, 32, 64); series: Spark (GraphX), Myria Sync, Myria Async]

Connected Components: Twitter subgraph with 221 million edges and 5 million vertices

Page 29:

Myria Polystore Stack

[Diagram repeated from Page 17]

Page 30:

Cloud Operation in Myria

Or point to data in Amazon S3

Page 31:

Myria’s Personalized Service Level Agreements

[Diagram: Myria’s SLA generation. Labels: Workload Generation, Runtime Prediction, Workload Compression into PSLA, Query Clustering, Template Generation, Cross-Tier Pruning, PSLA Schema]

Changing the Face of Database Cloud Services with Personalized Service Level Agreements. Jennifer Ortiz, Victor T. Almeida, and Magdalena Balazinska. CIDR 2015

Page 32:

Myria’s PerfEnforce Subsystem

PerfEnforce Demonstration: Data Analytics with Performance Guarantees. Jennifer Ortiz, Brendan Lee, and Magdalena Balazinska. SIGMOD 2016.

Page 33:

Myria’s PerfEnforce Subsystem

Cluster size changes during query session

PerfEnforce Demonstration: Data Analytics with Performance Guarantees. Jennifer Ortiz, Brendan Lee, and Magdalena Balazinska. SIGMOD 2016.

Page 34:

Myria’s Innovations Summary

[Diagram with three areas: Myria Polystore; Efficient Processing & Complex Analytics with MyriaX; Myria Cloud Operation. Item labels: Federated Analytics, Automatic Data Pipes, Efficient Multi-Join, Iterative Queries, Image Processing, Perf. Debugging, Data Summaries, Cloud PSLAs, Performance Guarantees, Elastic Memory]

Page 35:

Conclusion
• Highly expressive
  – MyriaL (RA + iterations) & Python
• Polystore with hybrid analytics
• High performance on variety of queries
• Available as a service
  – Focus on low barrier to entry
  – And turning users into self-sufficient experts
  – Also focus on the service provider: Operate Myria
• Source code and more info (includes videos) http://myria.cs.washington.edu/

