Astronomy, Petabytes, and MySQL
MySQL Conference, Santa Clara, CA, April 16, 2008
Kian-Tat Lim, Stanford Linear Accelerator Center
MySQL Conference, April 16, 2008, Santa Clara, CA
Outline
LSST
LSST Database
LSST Database + MySQL
LSST
What Is It?
Why Build It?
LSST: What Is It?
Telescope
Proposed telescope to be built in Chile
Large
3.2 gigapixel camera
8.4 meter diameter mirror
Synoptic Survey
Wide
Deep
Fast
LSST: Why Build It?
Dark Matter and Energy
Photo: J. A. Tyson, W. Colley, E. L. Turner, and NASA
Variable Objects
Transient Objects
Moving Objects
Photo: D. Roddy, Lunar and Planetary Institute
LSST Database
What’s In It?
How Big?
How Often?
What Queries?
Unusual Needs
LSST Database: What’s In It?
Database: Components
Image Metadata
Moving Objects Catalog
Object Catalog
Source Catalog
Difference Image Source Catalog
Provenance
Statistics
Summaries
Calibration
Engineering and Facility Database
Astronomical Objects
Sources
Changes
Image Metadata
Calibration and Facility
LSST Database: How Big?
Sagans of Rows
49 billion objects
2.8 trillion sources
Lots of Columns
308 columns for objects
56 columns for sources
(for now)
Database Size
Grows to >14 PB
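As a rough cross-check of the row and column counts above, here is a back-of-envelope sketch of raw row storage for the two largest catalogs. The average of 8 bytes per column is an assumption for illustration, not a figure from the talk, and the slides do not break down how the >14 PB total is composed, so this covers only one uncompressed copy of the rows.

```python
# Back-of-envelope size of the two largest catalogs.
# Row and column counts come from the slides; BYTES_PER_COLUMN is an
# assumed average (not from the talk).
BYTES_PER_COLUMN = 8  # assumption: roughly one double-precision value per column

CATALOGS = {
    "Object": {"rows": 49e9, "columns": 308},
    "Source": {"rows": 2.8e12, "columns": 56},
}


def raw_size_bytes(rows, columns, bytes_per_column=BYTES_PER_COLUMN):
    """Uncompressed row data only: no indexes, overhead, or extra copies."""
    return rows * columns * bytes_per_column


def estimate_tb(catalogs=CATALOGS):
    """Return {catalog name: size in terabytes (10**12 bytes)}."""
    return {
        name: raw_size_bytes(c["rows"], c["columns"]) / 1e12
        for name, c in catalogs.items()
    }


if __name__ == "__main__":
    for name, tb in estimate_tb().items():
        print(f"{name}: about {tb:,.0f} TB")
```

Under this assumption the Source rows come to roughly 1.25 PB and the Object rows to roughly 0.12 PB per copy, so the Source catalog dominates the row data.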
LSST Database: How Often?
Frequency
Nightly updates
Semi-annual data releases
LSST Database: What Queries?
Queries
• All about an object
• All objects meeting criteria
• All objects near objects meeting criteria
• All objects with interesting time series
• All pairs of objects with similar time series
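The first three query types above can be sketched against a toy catalog. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table layout, the column names (objectId, ra, decl, rMag, mjd, flux), and the sample rows are all hypothetical, not the LSST schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory catalog with a hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Object (objectId INTEGER PRIMARY KEY,
                             ra REAL, decl REAL, rMag REAL);
        CREATE TABLE Source (sourceId INTEGER PRIMARY KEY,
                             objectId INTEGER, mjd REAL, flux REAL);
    """)
    db.executemany("INSERT INTO Object VALUES (?, ?, ?, ?)", [
        (1, 10.00, -30.00, 18.5),
        (2, 10.01, -30.01, 22.0),
        (3, 45.00, -10.00, 17.0),
    ])
    db.executemany("INSERT INTO Source VALUES (?, ?, ?, ?)", [
        (100, 1, 54500.1, 3.2),
        (101, 1, 54503.2, 3.9),
        (102, 3, 54500.1, 9.1),
    ])
    return db


def all_about_object(db, object_id):
    """All detections (sources) of one object, in time order."""
    return db.execute(
        "SELECT sourceId, mjd, flux FROM Source "
        "WHERE objectId = ? ORDER BY mjd", (object_id,)).fetchall()


def objects_meeting_criteria(db, max_mag):
    """Objects brighter than a magnitude limit (smaller magnitude = brighter)."""
    return [row[0] for row in db.execute(
        "SELECT objectId FROM Object WHERE rMag < ? ORDER BY objectId",
        (max_mag,))]


def objects_near_matches(db, max_mag, box_deg):
    """Objects inside a small ra/decl box around objects meeting the criterion.

    A box on raw coordinates ignores the cos(decl) factor and RA wrap-around,
    so this is only a stand-in for a real spatial join.
    """
    return [row[0] for row in db.execute("""
        SELECT DISTINCT n.objectId
        FROM Object AS o JOIN Object AS n
          ON n.objectId != o.objectId
         AND ABS(n.ra - o.ra) <= ? AND ABS(n.decl - o.decl) <= ?
        WHERE o.rMag < ?
        ORDER BY n.objectId""", (box_deg, box_deg, max_mag))]
```

The time-series queries in the last two bullets would need per-object analysis of the Source rows and are not sketched here.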
LSST Database: Unusual Needs
Unusual Needs
Flexibility
Provenance
LSST Database + MySQL
Why MySQL?
Scalability?
Performance?
LSST Database + MySQL: Why MySQL?
MySQL
Relational database management system
Open Source
Vibrant community
Strong company support
Hardware
Runs on commodity hardware
In-Memory Tables
Needed for near-real-time processing
LSST Database + MySQL: Scalability?
“MySQL Grid”
Partitioning
Large tables partitioned spatially
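One simple way to picture spatial partitioning is a fixed grid on the sky. The sketch below is an assumption for illustration, not the scheme LSST chose: the function name, the stripe and chunk counts, and the grid layout are all hypothetical.

```python
def partition_id(ra_deg, decl_deg, num_stripes=18, chunks_per_stripe=36):
    """Map a sky position to a partition number on a fixed ra/decl grid.

    Hypothetical scheme: declination is cut into equal-height stripes and
    each stripe into equal-width right-ascension chunks.
    """
    if not -90.0 <= decl_deg <= 90.0:
        raise ValueError("declination must be within [-90, 90] degrees")
    ra = ra_deg % 360.0  # wrap right ascension into [0, 360)
    # min() keeps decl = +90 in the last stripe instead of one past the end
    stripe = min(int((decl_deg + 90.0) * num_stripes / 180.0), num_stripes - 1)
    chunk = min(int(ra * chunks_per_stripe / 360.0), chunks_per_stripe - 1)
    return stripe * chunks_per_stripe + chunk


if __name__ == "__main__":
    # Two nearby positions land in the same partition
    print(partition_id(15.0, -25.0), partition_id(15.2, -25.1))
```

Rows with the same partition number would live in the same table partition or on the same node, so a query limited to a small sky region touches only a few partitions. Equal-width RA chunks cover less area near the poles, which a production scheme would likely correct for.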
Replication
Dimension tables likely replicated
Needs: Distributor/Combiner
LSST will build prototype
Need long-term support
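The distributor/combiner idea can be sketched as a scatter-gather over partitions. This is a minimal illustration under stated assumptions, not the LSST prototype: in-memory SQLite databases stand in for MySQL nodes, the nodes are queried one after another rather than in parallel, and the Object table with its rMag column is hypothetical.

```python
import sqlite3


def make_node(rows):
    """One stand-in node: an in-memory SQLite database holding one partition."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Object (objectId INTEGER PRIMARY KEY, rMag REAL)")
    db.executemany("INSERT INTO Object VALUES (?, ?)", rows)
    return db


def distribute(nodes, sql, params=()):
    """Distributor: send the same query to every node, collect partial results."""
    return [node.execute(sql, params).fetchall() for node in nodes]


def combine_count_and_mean(partials):
    """Combiner: merge per-node (count, sum) pairs into a global count and mean."""
    total_count = sum(rows[0][0] for rows in partials)
    # SUM() is NULL (None) on a node with no matching rows
    total_sum = sum(rows[0][1] or 0.0 for rows in partials)
    mean = total_sum / total_count if total_count else None
    return total_count, mean


def bright_object_stats(nodes, max_mag):
    """Global count and mean magnitude of objects brighter than max_mag."""
    # Nodes return COUNT and SUM because per-node means cannot simply be averaged
    sql = "SELECT COUNT(*), SUM(rMag) FROM Object WHERE rMag < ?"
    return combine_count_and_mean(distribute(nodes, sql, (max_mag,)))
```

The point of the sketch is that the combiner must understand the query: an average has to be rewritten into per-node counts and sums before the partial results can be merged correctly.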
LSST Database + MySQL: Performance?
Per-Column Indexing
2X data size
Needs: Optimizer
Efficient use of multiple (20-30) indexes
Needs: Indexes
Bitmap/compressed indexes
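To show why bitmap indexes suit queries that combine many single-column conditions, here is a minimal sketch. It is not a MySQL feature or API: the BitmapIndex class and rows_of helper are hypothetical, the bitmaps are uncompressed Python integers, and a real engine would compress them.

```python
from collections import defaultdict


class BitmapIndex:
    """Uncompressed bitmap index over one column.

    Each distinct value maps to an int whose bit i is set when row i
    holds that value.
    """

    def __init__(self, values):
        self.bitmaps = defaultdict(int)
        for row, value in enumerate(values):
            self.bitmaps[value] |= 1 << row

    def equals(self, value):
        """Bitmap of rows where the column equals `value` (0 if none)."""
        return self.bitmaps.get(value, 0)


def rows_of(bitmap):
    """Expand a bitmap back into a sorted list of row numbers."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


if __name__ == "__main__":
    band = BitmapIndex(["g", "r", "r", "i", "r"])
    variable = BitmapIndex([False, True, True, False, False])
    # Combining two indexes is a single bitwise AND
    print(rows_of(band.equals("r") & variable.equals(True)))
```

Each extra condition adds one more bitwise AND or OR, so combining twenty or thirty per-column indexes stays cheap compared with intersecting that many B-tree lookups.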
Needs: Storage Engine
“Shared scan” for long-running full-table queries
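The shared-scan idea is that several full-table queries ride along on one pass over the data instead of each reading the table separately. The sketch below is a simplified illustration, not a storage engine design: the shared_scan function and its (predicate, reducer, initial value) query format are hypothetical.

```python
def shared_scan(rows, queries):
    """Run several full-table queries in a single pass over `rows`.

    `queries` maps a name to (predicate, reducer, initial_value); every row
    is read once and offered to each query.
    """
    results = {name: init for name, (_, _, init) in queries.items()}
    for row in rows:  # the only pass over the table
        for name, (predicate, reducer, _) in queries.items():
            if predicate(row):
                results[name] = reducer(results[name], row)
    return results


if __name__ == "__main__":
    table = [{"objectId": i, "rMag": m}
             for i, m in enumerate([18.0, 22.0, 17.0, 25.0])]
    queries = {
        "bright_count": (lambda r: r["rMag"] < 20.0,
                         lambda acc, r: acc + 1, 0),
        "faintest": (lambda r: True,
                     lambda acc, r: max(acc, r["rMag"]), float("-inf")),
    }
    print(shared_scan(table, queries))
```

A storage engine would additionally need to let a new query attach to a scan that is already in progress; that is not modeled here.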
Summary
Building a petabyte DB
MySQL can be a core component