AMADEUS (Analysis of MAssive Data in Earth and Universe Sciences)
C. Surace,
CeSAM - LAM
S. Maabout (LaBRI), N. Novelli (LIF), P.Y. Chabaud (LAM)
AMADEUS - MASTODONS - 23/01/2014
AMADEUS Goals - I
Define the characteristics of planet-hosting stars
Try to guess the new targets for the next observations of exoplanets
Use the CoRoT « Exodat » database for exoplanet data mining
AMADEUS Goals - II
Data mining techniques under consideration
• Pattern detection, including functional dependencies (article in press)
• Incremental approaches
• Approximation techniques
• Distributed approaches
• Parallel techniques
• Outlier detection (article in press)
• Multi dimensional ranking : Skyline approach (article in press)
• Summarisation Techniques (in test)
• Descriptive approaches/ clustering (in test)
• Active learning / Semi-supervised techniques (in test)
• Techniques for streaming data (not yet implemented)
• Predictive techniques (not yet implemented)
AMADEUS Data
150000 stars
11500 spectral classifications
95000 precise photometry
104000 stellar activities
531 transits
30 planets
Observation fields
Towards Galactic center
Towards Galactic anti-center
General issues
Astrophysical data displays various characteristics interesting from a data-mining viewpoint, including :
• missing data,
• errors associated with the measurements,
• multiple measurements for the same object over time,
• heterogeneous data,
• bias in the sample selection,
• and imbalanced data.
For the particular dataset under investigation, the most relevant issues to be addressed are the missing values and the extreme imbalance in the data.
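As a rough illustration of these two issues, the sketch below measures the fraction of missing values in a column and the class imbalance of a label. The toy catalogue, the column names and the helper functions are hypothetical and are not taken from the Exodat database.

```python
# Hypothetical toy catalogue: None marks a missing measurement.
# Column names are illustrative only, not the Exodat schema.
stars = [
    {"magnitude": 12.1, "spectral_type": "G2", "has_planet": False},
    {"magnitude": None, "spectral_type": "K0", "has_planet": False},
    {"magnitude": 13.4, "spectral_type": None, "has_planet": False},
    {"magnitude": 11.8, "spectral_type": "F8", "has_planet": True},
]


def missing_fraction(rows, column):
    """Fraction of rows whose value in `column` is missing (None)."""
    return sum(1 for r in rows if r[column] is None) / len(rows)


def imbalance_ratio(rows, label):
    """Number of negative examples per positive example for a boolean label."""
    positives = sum(1 for r in rows if r[label])
    negatives = len(rows) - positives
    return negatives / positives if positives else float("inf")


print(missing_fraction(stars, "magnitude"))
print(imbalance_ratio(stars, "has_planet"))
```

If the 30 planets listed on the data slide are counted against the 150000 stars, the same ratio is roughly 5000 to 1, which gives a sense of how extreme the imbalance is.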
AMADEUS Exoplanet Data Analysis
Jordi Nin, Marc Sole, Dino Ienco, …
First tests
Results
Future work
Data Correlation
E. Garnaud, N. Hanusse, S. Maabout, N. Novelli,…
Goals
• Focus on extraction of exact, approximate and conditional Functional Dependencies
• Focus on visualisation
• Use of TULIP software
• Focus on usage of skyline approach (selection with compromise)
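The skyline approach ("selection with compromise") keeps every object that no other object beats on all criteria at once. The sketch below is a naive quadratic version on hypothetical two-criterion tuples where smaller is better; it is meant only to illustrate the idea and is not the project's implementation.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` on every criterion and
    strictly better on at least one (smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates described by two criteria to minimise.
candidates = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(skyline(candidates))
```

Here (3, 3) and (4, 4) are dropped because (2, 2) is better on both criteria, while (1, 5), (2, 2) and (5, 1) each represent a different compromise and are kept.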
Questions
• How to deal with missing values (constraint : Do NOT replace missing values)
• How to deal with experimental data (precision, accuracy, repeatability)
• How to scale up to massive data
• How to deal with distributed VO data
Future work
• Scale up studies with massive data : data reduction with use of Functional Dependencies (in test)
• Solutions to extract Functional Dependencies in case of missing values (publication in prep)
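To make the notions of exact and approximate Functional Dependencies concrete, the sketch below scores a candidate dependency lhs -> rhs with an error in the spirit of the g3 measure: the smallest fraction of rows that would have to be removed for the dependency to hold exactly. Rows with a missing value on either side are skipped rather than imputed, which is only one possible convention and is an assumption here, not the solution referred to in the slides. The function name, attribute names and toy rows are hypothetical.

```python
from collections import Counter, defaultdict


def fd_error(rows, lhs, rhs):
    """Smallest fraction of usable rows to remove so that lhs -> rhs holds.

    Rows with a missing (None) value in any lhs attribute or in rhs are
    skipped, never replaced. Returns 0.0 for an exact dependency.
    """
    groups = defaultdict(Counter)
    used = 0
    for row in rows:
        key = tuple(row.get(a) for a in lhs)
        value = row.get(rhs)
        if value is None or any(k is None for k in key):
            continue  # assumption: ignore incomplete rows instead of imputing
        groups[key][value] += 1
        used += 1
    if used == 0:
        return 0.0
    # For each lhs value, keep only the rows carrying the most frequent rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return (used - kept) / used


# Hypothetical rows; attribute names are illustrative only.
rows = [
    {"spectral_type": "G", "colour": "yellow"},
    {"spectral_type": "G", "colour": "yellow"},
    {"spectral_type": "G", "colour": "white"},
    {"spectral_type": "M", "colour": "red"},
    {"spectral_type": None, "colour": "red"},
]
print(fd_error(rows, ["spectral_type"], "colour"))
print(fd_error(rows, ["colour"], "spectral_type"))
```

An error of 0.0 corresponds to an exact dependency, and a small non-zero error to an approximate one. A conditional dependency could be checked with the same function by first filtering the rows on the condition.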
Visualisation with TULIP
Objectives : provide visual representations to the experts in order to check and validate new hypotheses
In the AMADEUS project :
Visualisation of overall raw data using interactive data visualisation
Data Cleaning and Staging using visualisation to fix initial parameters
Visualisation of data mining results to check outliers, regions of interest, correlation, clustering…
Application to Astrophysics and Climatology
R. Bourqui, A. Sallaberry, N. Novelli,…
2013 : What else ?
Organisation
• Workshop co-organisation with Gaia and PetaSky (with BDA meeting)
• Workshop co-organisation with Gaia and PetaSky (performance/visualisation)
• Participation in the "Indexation" meeting of 15 January, a joint meeting with Gaia and PetaSky
• Invitation of researcher Sabine Mc Connell (University of Trent)
• Invitation of researcher Jordi Nin (Universitat Politècnica de Catalunya)
• Collaboration with Universitat Politècnica de Catalunya (LIRMM)
• Teaching in the Summer school "masses de données distribuées" (June 2014)
AMADEUS and extra financial support
• Participation in a COST project
• Financial support for invited researcher
• Financial support « Investissement d’avenir » for a « Big Data » project
• Financial support for a PhD thesis (conseil Régional d’Aquitaine)
• Financial support for Engineer ADT (INRIA) for Hadoop testing
2014 : What’s next ?
Improving Astrophysical Data
• More data, more complete data, inclusion of other surveys
Scaling Data mining techniques
• Pattern detection, including functional dependencies
• Outlier detection
Optimizing Data mining techniques
• Summarisation Techniques
• Descriptive approaches/ clustering
• Active learning / Semi-supervised techniques
Start
• Techniques for streaming data
• Predictive techniques
• Astrophysical analysis of the dependencies
2014 : What’s next ?
• Joint Venture : AMADEUS - PETASKY - GAIA
• share hardware cost,
• share engineer time,
• share data sets,
• compare query optimisation,
• differentiate use cases for astrophysical queries
• co-organise workshops
• Focus on Astrophysical data
• Focus on visualisation