Non-Traditional Databases


Page 1: Non-Traditional Databases. Reading 1. Scientific data management at the Johns Hopkins institute for data intensive engineering and science Yanif Ahmad,



Reading

1. Scientific data management at the Johns Hopkins institute for data intensive engineering and science, Yanif Ahmad, Randal Burns, Michael Kazhdan, Charles Meneveau, Alex Szalay, Andreas Terzis, February 2011, SIGMOD Record, Volume 39 Issue 3, http://dl.acm.org/citation.cfm?id=1942776.1942782&coll=DL&dl=ACM&CFID=66206057&CFTOKEN=48992457

2. Migrating a (large) science database to the cloud, Ani Thakar, Alex Szalay, June 2010, HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, http://dl.acm.org/citation.cfm?id=1851539&bnc=1

Farkas, CSCE 824 - Spring 2011


Reading

3. M. Stonebraker, U. Cetintemel, "One Size Fits All": An Idea Whose Time Has Come and Gone, in ICDE '05: Proceedings of the 21st International Conference on Data Engineering, IEEE Computer Society, Washington, DC, USA, 2005, http://www.computer.org/portal/web/csdl/abs/proceedings/icde/2005/2285/00/22850002abs.htm


Traditional Database Management Systems

Focus on business data management
Provide uniform capabilities regardless of the data characteristics
Need: capabilities to meet new application requirements


Examples of New Needs

Stream Data Processing
Large scale scientific databases
Data warehousing


Streaming Data

Sensor-based applications
– Real-time systems: sophisticated alerting, location-based services
– Historical data
Financial applications
– Support applications, such as electronic trading, legal compliance, real-time market analysis, etc.
Performance requirements


Performance SDMS vs. RDMS

Empirical results (see reference paper #3)
Issues:
– Inbound processing model
– Correct primitives for stream processing (aggregates, "timeout," "slack")
– Seamless integration of DBMS processing with application processing (client-server vs. embedded applications)
– Transactional behavior (weaker notion of recovery, tolerance, no ACID requirements)
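
The stream primitives listed above (aggregates, "timeout," "slack") can be illustrated with a small windowed aggregate. This is a minimal sketch, not the design from reference paper #3: the class name `TumblingWindowAverage`, its parameters, and the exact semantics are assumptions made for illustration. Here "slack" is taken to mean how far past a window's end the stream may advance before that window is closed, and "timeout" is taken to mean forcing all open windows to emit.

```python
from collections import defaultdict


class TumblingWindowAverage:
    """Average per fixed-size window, tolerating late tuples up to `slack`.

    Hypothetical illustration: names and semantics are assumptions.
    """

    def __init__(self, size, slack):
        self.size = size                # window length in timestamp units
        self.slack = slack              # extra time a window stays open
        self.open = defaultdict(list)   # window start -> buffered values
        self.watermark = float("-inf")  # largest timestamp seen so far

    def push(self, ts, value):
        """Add one tuple; return (window_start, average) for windows it closes."""
        start = (ts // self.size) * self.size
        # Drop the tuple if its window has already been closed and emitted.
        if start + self.size + self.slack <= self.watermark:
            return []
        self.open[start].append(value)
        self.watermark = max(self.watermark, ts)
        return self._close(
            lambda s: s + self.size + self.slack <= self.watermark
        )

    def timeout(self):
        """Emit every open window, e.g. when no input has arrived for a while."""
        return self._close(lambda s: True)

    def _close(self, ready):
        closed = []
        for s in sorted(self.open):     # sorted() copies the keys, so pop is safe
            if ready(s):
                values = self.open.pop(s)
                closed.append((s, sum(values) / len(values)))
        return closed
```

With `size=10` and `slack=2`, a tuple stamped 8 that arrives after a tuple stamped 11 is still counted in window 0, while a tuple for window 0 that arrives after timestamp 12 has been seen is dropped. Results are emitted as data flows in, which is the inbound processing model the slide contrasts with storing first and querying later.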


Security for Streaming Data?

What is the difference between the security needs of streaming vs. traditional (e.g., relational) data?
How to enforce security?
– Security punctuation
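
The slide only names security punctuation, so the following is a rough sketch under stated assumptions rather than a description of any particular system. It assumes a punctuation is a marker embedded in the stream that carries an access policy for the data tuples that follow it; `SecurityPunctuation`, `allowed_roles`, and `authorized_view` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityPunctuation:
    """Hypothetical policy marker: roles allowed to read the tuples that follow."""
    allowed_roles: frozenset


def authorized_view(stream, role):
    """Yield only the data tuples that `role` may read.

    Assumption: a punctuation replaces the policy in force, and nothing
    is readable before the first punctuation arrives (deny by default).
    """
    allowed = frozenset()
    for item in stream:
        if isinstance(item, SecurityPunctuation):
            allowed = item.allowed_roles   # policy changes mid-stream
        elif role in allowed:
            yield item


stream = [
    SecurityPunctuation(frozenset({"doctor", "nurse"})),
    {"patient": 7, "heart_rate": 72},
    SecurityPunctuation(frozenset({"doctor"})),
    {"patient": 7, "heart_rate": 180},
]
```

Here `list(authorized_view(stream, "nurse"))` returns only the first reading, while the `"doctor"` role sees both. One difference from relational access control shows up in the sketch: the policy travels with the data and can change between tuples, instead of being attached to a stored table.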


Scientific Databases

Massive amount of data
Heterogeneous data
– Sensor data, satellite, scientific simulation data, etc.
Goal: better understanding of physical phenomena
– Genomic database, geological exploration, astronomy, etc.


Scientific Databases

Need efficient analysis and querying capabilities
– Multi-dimensional indexing (e.g., genomic sequence indexing)
– Specific applications (e.g., visualization of seismic data)
– Specific aggregations (e.g., data mining for biological correlation)
– Efficient data archiving, staging, lineage, and error propagation techniques
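
To make the multi-dimensional indexing item concrete, here is a minimal sketch of a uniform grid index over 2-D points. It is one simple technique chosen for illustration, not the index used by any system in the readings; the class `GridIndex` and its methods are hypothetical.

```python
from collections import defaultdict


class GridIndex:
    """Uniform grid over 2-D points (illustrative sketch only)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)   # (cell_x, cell_y) -> points in that cell

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x, y, payload):
        self.cells[self._cell(x, y)].append((x, y, payload))

    def range_query(self, x_min, y_min, x_max, y_max):
        """Return payloads of points inside the inclusive query rectangle."""
        cx0, cy0 = self._cell(x_min, y_min)
        cx1, cy1 = self._cell(x_max, y_max)
        hits = []
        # Visit only the cells that overlap the rectangle, not every point.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for x, y, payload in self.cells.get((cx, cy), ()):
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        hits.append(payload)
        return hits
```

A one-dimensional index on `x` alone would still have to scan every point in the matching `x` range and test `y` afterwards. The grid prunes on both coordinates at once, which is the property range queries over spatial or other multi-attribute scientific data depend on.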


Example Scientific Data Management

Reference #1
Basic research:
1. formation of hypotheses and theories
2. designing experiments for their validation
3. collecting data by experimentation
4. analyzing data to guide new insights for further research


Scientific Computing

Steps 3 and 4 are data intensive
Need to improve computational power
– Parallel processing
– Grid and supercomputers
– Special application logic
– Preservation of scientific data
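
As a small illustration of the parallel processing item, the sketch below splits a dataset into chunks, computes a partial result per chunk, and combines the partials. The function names are hypothetical, and a thread pool is used only to keep the example self-contained; CPU-bound analysis in Python would more likely use processes or a cluster framework.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Per-partition work: return (sum, count) for one chunk."""
    return sum(chunk), len(chunk)


def parallel_mean(values, n_chunks=4):
    """Mean computed from per-chunk partial sums (partition, then combine)."""
    if not values:
        raise ValueError("values must not be empty")
    size = -(-len(values) // n_chunks)   # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count
```

The mean works here because it decomposes into sums and counts that can be merged in any order. Aggregates without that property, such as an exact median, need a different strategy.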


Current Technologies and Scientific Databases

Reference #2: How to migrate a large-scale scientific database to a cloud environment?
Difficult engineering process
Limited capabilities of database user
Based on commercial cloud


Data Warehousing

Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
– Data mart (single subject area)
– Enterprise data warehouse (integrated data marts)
– Metadata


Data Warehousing

Difference between OLTP and OLAP
Data management: updates, indexing, dependencies, etc.
OLAP: needs read-optimized storage
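
One common way to make storage read-optimized is a column-oriented layout. The sketch below is an illustration under that assumption and does not describe a specific product; the sample data and the function names `to_columns` and `total_by_region` are made up for the example.

```python
# Row-oriented layout: each record is stored together, which suits OLTP
# workloads that insert or update one whole record at a time.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 50.0},
    {"order_id": 3, "region": "east", "amount": 80.0},
]


def to_columns(rows):
    """Pivot row records into one list per attribute (column-oriented layout)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_by_region(columns):
    """OLAP-style aggregate that reads only the two columns it needs."""
    totals = {}
    for region, amount in zip(columns["region"], columns["amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


columns = to_columns(rows)
totals = total_by_region(columns)
```

The aggregate never touches `order_id`. In a real column store that means those values are not read from disk at all, at the cost of more expensive single-record updates, which is the OLTP versus OLAP trade-off the slide points to.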


Next Class

Geographical Databases