The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in just a Few Seconds

Post on 19-Mar-2016



The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in just a Few Seconds

Martin L. Kersten
Stratos Idreos
Stefan Manegold
Erietta Liarou

(and members of the CWI database group)

Science Feb’11 Data

http://www.sciencemag.org/site/special/data/

Science Feb’11 Data

… We have recently passed the point where more data is being collected than we can physically store. This storage gap will widen rapidly in data-intensive fields. Thus, decisions will be needed on which data to archive and which to discard. A separate problem is how to access and use these data. Many data sets are becoming too large to download. Even fields with well-established data archives, such as genomics, are facing new and growing challenges in data volume and management. And even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used. …


Database research vision

• Throwing away data before harvesting is the worst ROI one can imagine.

• LSST budget is 100 M$
– During its ten-year survey, LSST will acquire 5.6 million 15-second images, spread over 2.8 million pointings.
– 20 billion rows in the Object table, 3 trillion rows in the Source table

Database technology is not designed for the challenges

All sizes don’t fit

The Dawn of a new Database Era

Capture the query intent !

FIVE STEPS INTO THE FUTURE

• One-minute DBMS for real-time performance.

• Multi-scale query processing for gradual exploration.

• Post processing for conveying meaningful data.

• Query morphing to adjust for proximity results.

• Query alternatives to cope with lack of providence.

One-minute database kernels

Step 1: Do the BEST you can within a given time frame !

• Research how to …
– organize query evaluation around what is available at low cost
– redesign algorithms and operators such that they adaptively avoid expensive steps normally needed for correctness and completeness
– stop process after agreed upon time
– ensure continuation upon request.
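To make the "stop after an agreed-upon time, continue upon request" idea concrete, here is a minimal Python sketch. It is not the CWI group's implementation: the function name `scan_with_budget`, its parameters, and the position-based continuation are assumptions made for illustration only. The clock is injectable so the behavior can be checked deterministically.

```python
import time
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def scan_with_budget(
    rows: Sequence[T],
    predicate: Callable[[T], bool],
    budget_seconds: float,
    start: int = 0,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[T], int, bool]:
    """Filter rows[start:] until the time budget is used up (hypothetical helper).

    Returns (matches, next_position, done). Passing next_position back in
    as `start` continues the scan where it stopped.
    """
    deadline = clock() + budget_seconds
    matches: List[T] = []
    position = start
    while position < len(rows):
        # Checking the clock per row keeps the sketch simple; a real kernel
        # would presumably check once per block of rows.
        if clock() >= deadline:
            return matches, position, False  # budget exhausted, partial answer
        if predicate(rows[position]):
            matches.append(rows[position])
        position += 1
    return matches, position, True  # scanned everything within the budget
```

A caller would show the partial `matches` immediately and, if the user asks for more, call the function again with `start` set to the returned position.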

Multi-scale query processing

Step 2: Use a staging scheme for query evaluation !

• Research how to …
– partition the database for producing incremental valuable results
  D => D1 union (D2.1 union (D2.2 union (D2.3 union ..
– avoid harmful SELECT * FROM table queries
– break a query into a converging query sequence
  Q => Q1 union Q2 => Q1 union Q2.1 union Q2.2 => Q1 union Q2.1 union Q2.2.1 union Q2.2.2 …….
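One possible reading of the staging scheme D => D1 union (D2.1 union (D2.2 union … is a sequence of disjoint partitions, coarsest sample first, so that an aggregate converges as stages are added. The Python sketch below assumes a simple stride-halving partitioning; `staged_partitions` and `progressive_mean` are hypothetical names, and the slides do not prescribe this particular scheme.

```python
from typing import Iterator, List, Sequence


def staged_partitions(rows: Sequence[float], levels: int) -> Iterator[List[float]]:
    """Yield disjoint partitions of rows, coarsest sample first.

    The first stage takes every 2**levels-th row. Each later stage halves
    the stride and takes only rows not delivered before, so the union of
    all stages is the whole input.
    """
    stride = 2 ** levels
    yield list(rows[::stride])  # D1: a thin, cheap sample
    while stride > 1:
        half = stride // 2
        yield list(rows[half::stride])  # next refinement stage
        stride = half


def progressive_mean(values: Sequence[float], levels: int) -> Iterator[float]:
    """Yield a running mean that is refined after every stage."""
    total, count = 0.0, 0
    for part in staged_partitions(values, levels):
        total += sum(part)
        count += len(part)
        if count:
            yield total / count
```

On the values 0 to 15 with `levels=3`, the estimates move from a two-row sample towards the exact mean once the last stage has been merged in.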

Result-set post processing

Step 3: Use meaningful compression to convey more !

• Research how to …
– post-process result sets statistically
– prepare for faceted query answers
– show sort for boundaries first
• Min/max domain enclosures for all attributes
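As an illustration of conveying a result set through a compact summary instead of raw rows, the sketch below computes min/max enclosures for numeric attributes and facet counts for categorical ones. The function `summarize` and the row-as-dictionary layout are assumptions for this example; the attribute names are borrowed from the PhotoObj queries in this deck.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def summarize(
    rows: List[dict],
    numeric: Sequence[str],
    categorical: Sequence[str],
) -> Tuple[Dict[str, Tuple[float, float]], Dict[str, Counter]]:
    """Return (min/max enclosure per numeric attribute, value counts per facet)."""
    if not rows:
        return {}, {}  # nothing to summarize
    enclosure = {
        attr: (min(r[attr] for r in rows), max(r[attr] for r in rows))
        for attr in numeric
    }
    facets = {attr: Counter(r[attr] for r in rows) for attr in categorical}
    return enclosure, facets
```

Showing the enclosure first tells the user where the answers lie before any individual tuple is shipped.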

Query morphing

Step 4: Bend the search towards interesting areas !

• Research how to …
– explore the query expression space?
– transform a query with small result set such that it produces relevant, nearby answers
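One simple way to transform a query with a small result set into one with nearby answers is to widen a range predicate step by step. The sketch below does only that; `morph_range` and its parameters are hypothetical, and real query morphing as envisioned in the slides would explore a much larger space of query rewrites.

```python
from typing import List, Tuple


def morph_range(
    rows: List[dict],
    attr: str,
    low: float,
    high: float,
    min_results: int,
    step: float,
    max_rounds: int = 10,
) -> Tuple[List[dict], Tuple[float, float]]:
    """Widen [low, high] on attr until at least min_results rows match.

    Returns the matching rows and the bounds actually used. If the target
    is never reached, the widest attempt is returned.
    """
    for round_no in range(max_rounds + 1):
        lo = low - round_no * step
        hi = high + round_no * step
        hits = [r for r in rows if lo <= r[attr] <= hi]
        if len(hits) >= min_results:
            break
    return hits, (lo, hi)
```

Returning the bounds alongside the rows lets the interface tell the user how far the original question was bent.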

Query alternatives

Step 5: Ignore stupid questions, give hints instead !

• Research how to …
– find alternative queries in terms of expressiveness + performance
– Better exploit the query log for hints
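A very small sketch of exploiting the query log for hints: suggest the most frequently logged queries on the same table whose result size was manageable. The log layout (query text, table name, result size) and the function `suggest_alternatives` are assumptions for illustration, not a description of any existing system.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical log entry: (query text, table name, number of result tuples).
LogEntry = Tuple[str, str, int]


def suggest_alternatives(
    query_log: Sequence[LogEntry],
    table: str,
    max_rows: int,
    top: int = 3,
) -> List[str]:
    """Return the most frequent past queries on `table` that produced a
    non-empty result of at most max_rows tuples."""
    counts = Counter(
        sql
        for sql, logged_table, n_rows in query_log
        if logged_table == table and 0 < n_rows <= max_rows
    )
    return [sql for sql, _ in counts.most_common(top)]
```

When a user submits an unbounded scan of a large table, the system could answer with these suggestions instead of starting the scan.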

-- Q1: Using the time budget. (36291322 tuples)
SELECT ra, dec, band1, intensity1, type FROM PhotoObj;

-- Q2: Using data statistics. (879300 tuples)
SELECT * FROM PhotoObj
WHERE ra BETWEEN 53 AND 54 AND dec BETWEEN 80 AND 82;

-- Q3: Using query statistics. (899 tuples)
SELECT * FROM PhotoObj
WHERE ra BETWEEN 53 AND 54 AND dec BETWEEN 80 AND 82
  AND distance(ra,dec,radius) < 10;

SELECT * FROM PhotoObj

The Dawn of a new Database Era

Brought to you by the CWI database research group
