adaptive schema databases - cidrdb.orgcidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf ·...

Post on 26-Oct-2019

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

AdaptiveSchemaDatabases

WilliamSpothb,Bahareh SadatArabi,EricS.Chano,DieterGawlicko,AdelGhoneimyo,BorisGlavici,BedaHammerschmidto,OliverKennedyb,

Seokki Leei,ZhenHua Liuo,XingNiui,YingYangb

b:UniversityatBuffalo i:IllinoisInst.Tech. o:Oracle

1

AdaptiveSchemaDatabases

2

Classicrelationaldatabase• Navigationalandorganizationalpurposeretaindiscovery,goodperformanceandspace,reusable.

3

Classicrelationaldatabase• But...Highupfrontcostandinflexible

4

BigData/NOSQL• Datacanbeusedimmediately.

5

BigData/NOSQL• But...SacrificenavigationalandPerformancebenefitandmayendupwithduplicateofwork

6

AdaptiveSchemaDatabases

Queriesandfeedback...

eventually

• BridgethegapbetweenrelationaldatabaseandNoSQl.

7

AdaptiveSchemaDatabases

Queriesandfeedback...

eventually

• BridgethegapbetweenrelationaldatabaseandNoSQl.

8

AdaptiveSchemaDatabases

Input:

Queries:SELECTnameFROMUndergradUNIONSELECTnameFROMGrad

SELECTdeg FROMGrad

SELECTnameFROMStudent

9

Outline

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback• Extractionanddiscovery• Adaptive,personalizedschemas

fromqueries• Explanationsandfeedback• Adaptiveorganization• Conclusionsandfuturework

10

Extraction

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

11

• ASDextractsschemacandidateset

Giveninput:

12

Extraction

• ASDextractsschemacandidateset

Giveninput:

13

Extraction

• ASDextractsschemacandidateset

Giveninput:

14

Extraction

Extraction• ASDextractsschemacandidateset

Giveninput:

15

• ASDextractsschemacandidateset

schemacandidatesetCext={Sext,Pext},whereSext isasetofcandidateschemas,Pext isaprobabilitydistributionovertheseschemas.

16

Discovery

• ASDextractsschemacandidateset

Smax:thebestguessschema

17

Discovery

Adaptive,personalizedschemasfromqueries

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

18

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

Initially,W={}

19

Adaptive,personalizedschemas

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

20

Query1:SELECTname FROMUndergrad UNIONSELECTname FROMGrad

FindingSchemasfromQueries

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

21

FindingSchemasfromQueries

Query1:SELECTname FROMUndergrad UNIONSELECTname FROMGrad

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

22

Query2:SELECTdeg FROMGrad

FindingSchemasfromQueries

• ASDmaintainsasetofschemaworkspacesW={W1,...,Wn}.

Query3:SELECTnameFROMStudent

W1 =(S1={Undergrad(name)},P1=0.27),(S1={Grad(name)},P1=0.23),(S1={Undergrad(name), Grad(name)},P1=0.5)

23

SynthesizingTables

Explanationsandfeedback

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

24

Extractionerrorsappearinthreeforms:(1)AqueryincompatiblewithSmax

(2)AnupdatewithdatathatviolatesSmax(3)Anextractionerrorpresentedtouser

Weprovide:(1)explanationofresults(2)provenance(3)Warn theanalystwithambiguity(4)Explain theambiguity(5)Evaluate themagnitudeofambiguity(6)Assisttheanalysttoresolve theambiguity

Whatmightgowrong

25

ASDinteractswiththeoutsideworld:Schema,Data,andUpdate.

Schemainteractions:WhenaqueryincompatiblewithSmax andtheworkspace

Datainteractions:provenanceforattributeandrowlevelambiguity.

Updateinteractions:• representschemamismatchesasmissingvalues.• resolvedataerrorswithaprobabilisticrepair.• upgradeherschematomatchthechanges.• checkpointherworkspaceandignorenewupdates.

Typesoferrors

26

Explanationsandfeedback

Explanations:WematchStudentwith

bothGradandUndergrad

27

Condition2:Queryfromunknown schemaelements:SELECTnameFROMStudent

W1 =(S1={Undergrad(name)},P1=0.27),(S1={Grad(name)},P1=0.23),(S1={Undergrad(name), Grad(name)},P1=0.5)

Adaptiveorganization

Unstructured Data Semi-structed Data (e.g., JSON)

Extraction workflow

SchemaWorkspace

Schema Matching

Extraction workflow Extraction workflow

Extraction Schema Candidates

SchemaWorkspace

SchemaWorkspace

SchemaWorkspace

Queries + Feedback

28

AdaptiveorganizationTrade-offbetweenstoringdatainitsnativeformatandbasedonaspecificschema.

Whatisthechallenge?Manyworkspaces,addtabletotheschema,….

ChallengesandPossibleSolutions:• Wewantmultiplepersonalizedschemas

1.Relationalworkspaceschemaisessentiallyaviewoverrawdata.Materializingviewcanbeused.

2.Useexistingadaptivephysicaldesign andcaching techniques.• Sharedmaterializations

1.Incrementalmaterializedviewmaintenance.Leveragetechniquesfromrevisioncontrolsystems.

2.Viewselectionproblem.29

Conclusionsandfuturework

ASDbridgesthegapbetweenrelationaldatabasesandNoSQL.

• Discovery:Helpuserexploreandunderstandnewdatabyprovidinganoutlineoftheavailableinformation.Done

• Materialization:Adoptworkonadaptivedatastructures.Partiallydone• DataSynthesis:Synthesisnewtablesandattributesfromexistingdata.

Done• ConflictResponse:

– Versioningorbranchingtheschema.– Loganalysistohelpusersassesstheimpactofschemarevisions.

30

top related