
AMADEUS (Analysis of MAssive Data in Earth and Universe Sciences)

C. Surace, CeSAM - LAM

S. Maabout (LaBRI), N. Novelli (LIF), P.Y. Chabaud (LAM)

AMADEUS - MASTODONS - 23/01/2014


AMADEUS Goals - I

• Define the characteristics of stars hosting planet(s)

• Identify new targets for future exoplanet observations

Using the CoRoT « Exodat » database for exoplanet data mining



AMADEUS Goals - II

Data mining techniques under consideration:

• Pattern detection, including functional dependencies (article in press)

• Incremental approaches

• Approximation techniques

• Distributed approaches

• Parallel techniques

• Outlier detection (article in press)

• Multidimensional ranking: skyline approach (article in press)

• Summarisation techniques (in test)

• Descriptive approaches/clustering (in test)

• Active learning/semi-supervised techniques (in test)

• Techniques for streaming data (not yet implemented)

• Predictive techniques (not yet implemented)
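The skyline approach listed above keeps only the objects that are not dominated by any other object. An object is dominated when another is at least as good on every ranking criterion and strictly better on one, which is why skyline selection is "selection with compromise". The following is a minimal, naive pairwise-comparison sketch in Python. It assumes that every criterion is to be maximised and that all values are present. The candidate values are invented for illustration and do not come from Exodat.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b: a is at least as good as b on every
    criterion and strictly better on at least one (larger is better here)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates.

    A point never dominates itself, so no self-exclusion test is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates scored on two criteria to maximise; the values
# are invented for illustration and are not taken from Exodat.
candidates = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9), (0.5, 0.1)]
print(skyline(candidates))  # [(0.9, 0.2), (0.5, 0.5), (0.1, 0.9)]
```

Each skyline point is a compromise: moving to another skyline point improves one criterion only at the cost of another. A criterion to be minimised can be handled by negating it. For large catalogues, sort-based or partitioning skyline algorithms avoid comparing every pair of objects.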


AMADEUS Data

• 150,000 stars

• 11,500 with spectral classification

• 95,000 with precise photometry

• 104,000 with stellar activity data

• 531 transits

• 30 planets



Observation fields

• Towards the Galactic center

• Towards the Galactic anti-center



General issues

Astrophysical data display several characteristics that are of interest from a data-mining viewpoint, including:

• missing data,

• errors associated with the measurements,

• multiple measurements for the same object over time,

• heterogeneous data,

• bias in the sample selection,

• and imbalanced data.

For the particular dataset under investigation, the most relevant issues to be addressed are the missing values and the extreme imbalance in the data.
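The counts on the "AMADEUS Data" slide make both issues concrete. The following is a minimal Python sketch that reads each count as the number of stars for which that measurement exists; this reading is an assumption. It computes the fraction of stars missing each measurement. It also computes the class imbalance between planet hosts and the other stars, assuming for illustration that each of the 30 planets marks a distinct host star.

```python
# Counts reported on the "AMADEUS Data" slide.
TOTAL_STARS = 150_000
measured = {
    "spectral_classification": 11_500,
    "precise_photometry": 95_000,
    "stellar_activity": 104_000,
}
PLANETS = 30

# Fraction of stars for which each measurement is available / missing
# (assumes each count is a number of distinct stars).
coverage = {name: n / TOTAL_STARS for name, n in measured.items()}
missing = {name: 1.0 - c for name, c in coverage.items()}

# Assumption: each planet marks a distinct host star, so the positive
# class has PLANETS members and the negative class has the rest.
negatives = TOTAL_STARS - PLANETS
imbalance_ratio = negatives / PLANETS

for name in measured:
    print(f"{name}: {coverage[name]:.1%} available, {missing[name]:.1%} missing")
print(f"non-hosts per host: {imbalance_ratio:.0f}")  # non-hosts per host: 4999
```

Several planets can orbit the same star, so there are at most 30 host stars. The true ratio is therefore at least the printed value. A classifier that always answers "no planet" would be right about 99.98% of the time, so plain accuracy says little on such data.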



AMADEUS Exoplanet Data Analysis: First Tests

Jordi Nin, Marc Sole, Dino Ienco, …

Results

Future work



Data Correlation

E. Garnaud, N. Hanusse, S. Maabout, N. Novelli,…

Goals

• Focus on the extraction of exact, approximate and conditional functional dependencies

• Focus on visualisation, using the TULIP software

• Focus on the use of the skyline approach (selection with compromise)

Questions

• How to deal with missing values (constraint: do NOT replace missing values)

• How to deal with experimental data (precision, accuracy, repeatability)

• How to scale up to massive data

• How to deal with distributed Virtual Observatory (VO) data

Future work

• Scale-up studies with massive data: data reduction using functional dependencies (in test)

• Solutions for extracting functional dependencies in the presence of missing values (publication in prep.)
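The constraint of extracting functional dependencies without replacing missing values can be illustrated with a minimal Python sketch. The sketch assumes one simple semantics, chosen for illustration and not necessarily the one used in the authors' forthcoming publication: a row is ignored whenever an attribute of the dependency X → Y is missing. On the remaining rows it computes the g3 error, the minimum fraction of rows to delete so that X → Y holds exactly. An exact dependency has g3 = 0, and an approximate one is accepted when g3 ≤ ε. The column names, the values and the helper names `g3_error` and `holds` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[Any]]  # None marks a missing value


def g3_error(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> float:
    """g3 error of the functional dependency lhs -> rhs.

    Rows with a missing value in lhs or rhs are skipped, never imputed.
    Returns the minimum fraction of the remaining rows that must be removed
    for the dependency to hold exactly (0.0 means it holds exactly).
    """
    groups: Dict[Tuple[Any, ...], Counter] = defaultdict(Counter)
    considered = 0
    for row in rows:
        key = tuple(row.get(attr) for attr in lhs)
        value = row.get(rhs)
        if value is None or any(v is None for v in key):
            continue  # incomplete row: leave it out rather than fill it in
        groups[key][value] += 1
        considered += 1
    if considered == 0:
        return 0.0
    # For each lhs value, keep the rows carrying its most frequent rhs value.
    kept = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return 1.0 - kept / considered


def holds(rows: Sequence[Row], lhs: Sequence[str], rhs: str, epsilon: float = 0.0) -> bool:
    """Exact FD check when epsilon == 0, approximate FD check otherwise."""
    return g3_error(rows, lhs, rhs) <= epsilon


# Hypothetical catalogue rows (not real Exodat values).
rows: List[Row] = [
    {"spectral_type": "G2", "teff_bin": "5500-6000"},
    {"spectral_type": "G2", "teff_bin": "5500-6000"},
    {"spectral_type": "G2", "teff_bin": None},         # skipped: rhs missing
    {"spectral_type": "K0", "teff_bin": "5000-5500"},
    {"spectral_type": "K0", "teff_bin": "4500-5000"},
    {"spectral_type": None, "teff_bin": "5000-5500"},  # skipped: lhs missing
]
print(g3_error(rows, ["spectral_type"], "teff_bin"))  # 0.25
print(holds(rows, ["spectral_type"], "teff_bin"))  # False
print(holds(rows, ["spectral_type"], "teff_bin", epsilon=0.25))  # True
```

When a dependency holds exactly, the right-hand attribute is redundant given the left-hand ones; this is the idea behind using functional dependencies for data reduction. Skipping incomplete rows can make a dependency look valid merely because the conflicting rows are incomplete, so other treatments of missing values remain possible.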



Visualisation with TULIP

Objectives: provide visual representations to the experts so that they can check and validate new hypotheses

In the AMADEUS project:

• Visualisation of the overall raw data using interactive data visualisation

• Data cleaning and staging, using visualisation to set initial parameters

• Visualisation of data mining results to check outliers, regions of interest, correlations, clustering…

Application to Astrophysics and Climatology

R. Bourqui, A. Sallaberry, N. Novelli,…



2013: What else?

Organisation

• Workshop co-organised with Gaia and PetaSky (with the BDA meeting)

• Workshop co-organised with Gaia and PetaSky (performance/visualisation)

• Participation in the "Indexation" meeting of 15 January, a joint meeting with Gaia and PetaSky

• Invitation of researcher Sabine McConnell (Trent University)

• Invitation of researcher Jordi Nin (Universitat Politècnica de Catalunya)

• Collaboration with Universitat Politècnica de Catalunya (LIRMM)

• Teaching in the summer school "Masses de données distribuées" (distributed massive data), June 2014

AMADEUS and extra financial support

• Participation in a COST project

• Financial support for an invited researcher

• Financial support from « Investissement d’avenir » for a « Big Data » project

• Financial support for a PhD thesis (Conseil Régional d’Aquitaine)

• Financial support for an engineer (INRIA ADT) for Hadoop testing



2014: What’s next?

Improving Astrophysical Data

• More data, more complete data; include other surveys

Scaling Data mining techniques

• Pattern detection, including functional dependencies

• Outlier detection

Optimizing Data mining techniques

• Summarisation techniques

• Descriptive approaches/clustering

• Active learning/semi-supervised techniques

Starting

• Techniques for streaming data

• Predictive techniques

• Astrophysical analysis of the dependencies


2014: What’s next?

• Joint venture AMADEUS - PETASKY - GAIA: share hardware costs, share engineer time, share data sets, compare query optimisations, differentiate use cases for astrophysical queries, and co-organise workshops

• Focus on Astrophysical data

• Focus on visualisation