2008-5-20 IVOA Interoperability Meeting, Trieste
Mining data using MATLAB through AstroBox
Chao LIU, Chenzhou CUI
Presented by: Chenzhou CUI
National Astronomical Observatory, China
The Chinese VIRTUAL OBSERVATORY
China-VO
• Chinese Virtual Observatory (China-VO) is the national VO project in China, initiated in 2002 by the Chinese astronomical community and led by the National Astronomical Observatories, Chinese Academy of Sciences.
• It focuses its research and development on VO science and applications.
• R&D focuses:
  – China-VO Platform
  – Unified Access to On-line Astronomical Resources and Services
  – VO-ready Projects and Facilities
  – VO-based Astronomical Research Activities
  – VO-based Public Education
An active IVOA member
IVOA 2007, Beijing
1st Small projects meeting, 2003
Our products
• VOFilter
  – an XML filter for OpenOffice.org Calc to open VOTable files
• SkyMouse
  – a smart on-line astronomical information collector
• FitHAS
  – FITS Header Archiving System
• VO-DAS
  – an OGSA-DAI based data access service system providing unified access to astronomy data, including catalogs, images and spectra
• AstroBox
  – Coming soon
  – ...
http://services.china-vo.org
First Science Paper from China-VO
• SDSS DR5 photometric data were searched for new Milky Way companions or substructures in the Galactic halo.
• Data analysis procedures were based on VO-DAS.
• Five candidates were identified as over-densities of faint stellar sources with color-magnitude diagrams similar to those of known globular clusters or dwarf spheroidal galaxies.
  – (Liu et al., 2008, A&A)
AstroBox: Goals
• To provide an astronomical data mining application service, supporting VO protocols and tools
• To provide a network environment for time-consuming astronomical data mining computations
• To offer a high-level data analysis environment, NOT a raw data analysis tool such as IRAF
General procedures of data mining
• Data Accessing
  – query database
  – high volume of data
• Data Pre-processing
  – select qualified data
  – eliminate BAD data
• Data Mining
  – try multiple times and find a way to extract unknown knowledge from a specific data set
• Data Analysis and Interpretation
  – visualization
  – comparisons with different data sources
  – associate results with physical meaning
An introduction to MATLAB
• MATLAB is a popular numerical computation package used in various fields.
• It provides dozens of toolboxes for different purposes, e.g. statistics, pattern recognition, optimization, neural networks etc., as well as a number of ways to access data from either local or remote sites.
• It also offers visualization through flexible 2D and 3D graphics routines.
• It supports Java, C, and Fortran as well as its own M-language.
• It can access URL resources and parse XML, which is necessary for embedding web services.
• In its latest release, refined parallel computation is ready.
• We conclude that MATLAB is one of the best platforms on which astronomical data mining tools can be developed.
AstroBox
• AstroBox is a plug-in package for MATLAB to be used for astronomical computing and data mining
• It comprises:
  – PLASTIC
  – VOTable
  – Local DB
  – VO-DAS client
  – Astronomical algorithms
[Diagram: AstroBox architecture. Components shown: VO-DAS, MATLAB VO-DAS Client, MATLAB Database Toolbox, Local DB, Java Libraries, VOTables, PLASTIC, VO Tools (Aladin, TOPCAT), AstroBox]
VO utilities in AstroBox
• VOTable access and conversion
  – integrates the STILS package
• PLASTIC availability
  – embeds a Java subroutine that connects to a PLASTIC Hub, through which data and messages are exchanged with third-party applications, e.g. Aladin and TOPCAT
  – SAMP support next...
• VO-DAS client interface
  – embeds a VO-DAS command line client that sends an ADQL query to the VO-DAS server and waits for the query result
  – also capable of asynchronous queries, which can access millions of rows of data (ongoing)
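To make the VOTable access step concrete, here is a minimal sketch of reading a TABLEDATA-serialized VOTable into rows. It is written in Python with only the standard library, not in MATLAB, and it is not AstroBox code. The sample table, its column names, and the helper names `local_name` and `read_votable` are all assumptions made for illustration, and every column is assumed to be numeric.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-row VOTable standing in for a query result.
SAMPLE_VOTABLE = """<?xml version="1.0"?>
<VOTABLE version="1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <FIELD name="g_mag" datatype="float" unit="mag"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>150.10</TD><TD>2.20</TD><TD>21.3</TD></TR>
          <TR><TD>150.25</TD><TD>2.31</TD><TD>22.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_votable(xml_text):
    """Return the table as a list of {field name: float} dictionaries.

    Assumes a single TABLE with TABLEDATA serialization and numeric columns.
    """
    root = ET.fromstring(xml_text)
    fields = [el.get("name") for el in root.iter()
              if local_name(el.tag) == "FIELD"]
    rows = []
    for tr in root.iter():
        if local_name(tr.tag) != "TR":
            continue
        cells = [float(td.text) for td in tr if local_name(td.tag) == "TD"]
        rows.append(dict(zip(fields, cells)))
    return rows


if __name__ == "__main__":
    for row in read_votable(SAMPLE_VOTABLE):
        print(row)
```

Matching on local tag names keeps the sketch independent of the VOTable schema version declared in the namespace.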
Data mining support
• Regressions
  – linear regression
    • inherited from MATLAB
  – nonlinear regression
    • provides common astronomical regression functions, e.g. the King model for the density profile of a dwarf galaxy
  – kernel regression
• Fitting
  – provides specific algorithms for non-analytic expressions, such as complicated observational datasets or user-defined functions
  – several times faster than existing MATLAB functions
• Spherical surface projection functions
  – Equatorial projection & Galactic projection
  – equal-area Lambert projection, in particular for density measurement on a spherical surface
  – Aitoff projection for overall viewing
• Visualization functions
  – 2-D plotting
  – 3-D plotting
  – adapted from existing MATLAB functions
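The two named projections can be sketched with the standard textbook formulas. The Python below is an illustration only, not the AstroBox (MATLAB) implementation. The function names are hypothetical, and it assumes the Lambert projection meant on the slide is the azimuthal equal-area form and that "Aitoff" means the classic Aitoff projection rather than the Hammer-Aitoff variant.

```python
import math


def lambert_equal_area(lon_deg, lat_deg, lon0_deg=0.0, lat0_deg=0.0):
    """Lambert azimuthal equal-area projection on the unit sphere.

    (lon0_deg, lat0_deg) is the projection centre. A point at angular
    distance c from the centre lands at radius 2*sin(c/2), which is what
    preserves areas and makes the map suitable for density measurements.
    """
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    lon0, lat0 = math.radians(lon0_deg), math.radians(lat0_deg)
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(lon - lon0))
    if cos_c <= -1.0 + 1e-12:
        raise ValueError("the antipode of the centre cannot be projected")
    k = math.sqrt(2.0 / (1.0 + cos_c))
    x = k * math.cos(lat) * math.sin(lon - lon0)
    y = k * (math.cos(lat0) * math.sin(lat)
             - math.sin(lat0) * math.cos(lat) * math.cos(lon - lon0))
    return x, y


def aitoff(lon_deg, lat_deg):
    """Aitoff all-sky projection; longitude is expected in [-180, 180]."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    alpha = math.acos(math.cos(lat) * math.cos(lon / 2.0))
    # alpha / sin(alpha) tends to 1 as alpha tends to 0 (map centre).
    scale = 1.0 if alpha == 0.0 else alpha / math.sin(alpha)
    x = 2.0 * math.cos(lat) * math.sin(lon / 2.0) * scale
    y = math.sin(lat) * scale
    return x, y


if __name__ == "__main__":
    print(lambert_equal_area(90.0, 0.0))  # 90 degrees from the centre
    print(aitoff(180.0, 0.0))             # right-hand edge of the all-sky map
```

Counting sources in equal-sized cells of the Lambert plane gives surface densities that are comparable across the map, whereas the Aitoff map is intended for an overall view of the whole sky.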
Other functions
• High-level functions aimed at specific research topics, most of which are currently Milky Way related
  – Kurucz stellar model
  – Girardi stellar population model
  – isochrone fitting of stellar populations
  – Galactic star count model with disk and halo components
  – chemical evolution model for stellar populations (ongoing)
• Most commonly used utilities
  – Monte Carlo methods
  – coordinate transformations
  – magnitude system transformations
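As an illustration of the coordinate transformation utilities listed above, the following Python sketch converts J2000 equatorial coordinates to Galactic coordinates using the usual spherical-trigonometry relations. It is not taken from AstroBox, which is MATLAB-based. The function name is hypothetical, and the pole constants are the commonly quoted J2000 values for the north Galactic pole and the Galactic longitude of the north celestial pole.

```python
import math

# Commonly quoted J2000 constants (degrees).
RA_NGP = 192.85948    # right ascension of the north Galactic pole
DEC_NGP = 27.12825    # declination of the north Galactic pole
L_NCP = 122.93192     # Galactic longitude of the north celestial pole


def equatorial_to_galactic(ra_deg, dec_deg):
    """Convert J2000 (ra, dec) in degrees to Galactic (l, b) in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra_g, dec_g = math.radians(RA_NGP), math.radians(DEC_NGP)
    d_ra = ra - ra_g

    sin_b = (math.sin(dec) * math.sin(dec_g)
             + math.cos(dec) * math.cos(dec_g) * math.cos(d_ra))
    # Clamp against rounding so asin never sees a value just outside [-1, 1].
    b = math.asin(max(-1.0, min(1.0, sin_b)))

    y = math.cos(dec) * math.sin(d_ra)
    x = (math.sin(dec) * math.cos(dec_g)
         - math.cos(dec) * math.sin(dec_g) * math.cos(d_ra))
    l = (L_NCP - math.degrees(math.atan2(y, x))) % 360.0
    return l, math.degrees(b)


if __name__ == "__main__":
    # Approximate J2000 position of the Galactic centre direction.
    print(equatorial_to_galactic(266.40499, -28.93617))
```

Longitude is wrapped into [0, 360), so a point very close to l = 0 may be reported as a value just below 360.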
Demo 1
• PLASTIC implementation
Demo 2
• Special regression
  – using a hyperbolic relationship between independent and dependent variables
• Model fitting
  – density profiles of a candidate dwarf galaxy
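The model-fitting demo can be illustrated with the empirical King (1962) surface density profile, which is zero beyond the tidal radius. The Python sketch below is an assumption-laden stand-in, not the AstroBox fitting algorithm, which the slides do not describe. It uses a plain grid search over the core and tidal radii with the amplitude solved in closed form, and the function names and synthetic data are hypothetical.

```python
import math


def king_profile(r, k, r_core, r_tidal):
    """Empirical King (1962) surface density at projected radius r."""
    if r >= r_tidal:
        return 0.0
    inner = 1.0 / math.sqrt(1.0 + (r / r_core) ** 2)
    edge = 1.0 / math.sqrt(1.0 + (r_tidal / r_core) ** 2)
    return k * (inner - edge) ** 2


def fit_king(radii, densities, core_grid, tidal_grid):
    """Least-squares fit by grid search over (r_core, r_tidal).

    For each grid point the amplitude k enters linearly, so its best value
    has a closed form. Returns (chi2, k, r_core, r_tidal) for the best point.
    """
    best = None
    for r_core in core_grid:
        for r_tidal in tidal_grid:
            shape = [king_profile(r, 1.0, r_core, r_tidal) for r in radii]
            norm = sum(s * s for s in shape)
            if norm == 0.0:
                continue  # every data point lies outside this tidal radius
            k = sum(s * d for s, d in zip(shape, densities)) / norm
            chi2 = sum((d - k * s) ** 2 for s, d in zip(shape, densities))
            if best is None or chi2 < best[0]:
                best = (chi2, k, r_core, r_tidal)
    return best


if __name__ == "__main__":
    # Synthetic, noise-free profile used only to exercise the fit.
    radii = [0.5 + i for i in range(10)]
    densities = [king_profile(r, 100.0, 2.0, 10.0) for r in radii]
    print(fit_king(radii, densities, [1.0, 1.5, 2.0, 2.5, 3.0], [8.0, 10.0, 12.0]))
```

A grid search is slow but robust for a three-parameter model. A real analysis would weight each bin by its counting error and subtract a background density first.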
Demo 3
• Isochrone fitting
  – observed data are accessed from either the local database or the VO-DAS server
  – reference data are queried from the Girardi database to fit theoretical isochrones
Demo 4
• Visualization
Demo 5
• Parallel computation
  – fitting a 9-parameter star count model on an 8-core server
  – faster than on a single-core computer by a factor of ~8
Future work
• Release as a tool to the community
• Extend cosmology methods
• Establish a distributed parallel computation environment
• Deploy an on-line data mining service
Q & A