2008-5-20 IVOA Interoperability Meeting, Trieste

Mining data using MATLAB through AstroBox

Chao LIU, Chenzhou CUI
Presented by: Chenzhou CUI

National Astronomical Observatory, China

The Chinese VIRTUAL OBSERVATORY

China-VO

• The Chinese Virtual Observatory (China-VO) is the national VO project in China, initiated in 2002 by the Chinese astronomical community and led by the National Astronomical Observatories, Chinese Academy of Sciences.

• It focuses its research and development on VO science and applications.

• R&D focuses:
  – China-VO Platform
  – Unified Access to On-line Astronomical Resources and Services
  – VO-ready Projects and Facilities
  – VO-based Astronomical Research Activities
  – VO-based Public Education

An active IVOA member

IVOA 2007, Beijing

1st Small projects meeting, 2003

Our products

• VOFilter
  – an XML filter for OpenOffice.org Calc to open VOTable files
• SkyMouse
  – a smart on-line astronomical information collector
• FitHAS
  – FITS Header Archiving System
• VO-DAS
  – an OGSA-DAI based data access service system providing unified access to astronomy data, including catalogs, images and spectra
• AstroBox
  – coming soon
  – ...

http://services.china-vo.org

First Science Paper from China-VO

• SDSS DR5 photometric data were searched for new Milky Way companions or substructures in the Galactic halo.
• Data analysis procedures were based on VO-DAS.
• Five candidates were identified as over-densities of faint stellar sources whose color-magnitude diagrams are similar to those of known globular clusters or dwarf spheroidal galaxies.
  – (Liu et al., 2008, A&A)

AstroBox: Goals

• To provide an astronomical data mining application service, supporting VO protocols and tools

• To provide a network environment for time-consuming astronomical data mining computations

• A high-level data analysis environment, NOT a raw data analysis tool such as IRAF

General procedures of data mining

• Data Accessing
  – query database
  – high volume of data
• Data Pre-processing
  – select qualified data
  – eliminate BAD data
• Data Mining
  – try multiple times and find a way to get unknown knowledge from a specific data set
• Data Analysis and Interpretation
  – visualization
  – comparisons with different data sources
  – associate results with physical meaning
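
As a rough illustration of the pre-processing step listed above, the following Python sketch drops rows flagged as bad before any mining is attempted. The column names (g, r), the sentinel value -9999 and the magnitude limit are assumptions made for this example only; the slides do not say how AstroBox selects qualified data.

```python
# Hypothetical pre-processing step: keep only rows with usable photometry.
BAD_VALUE = -9999.0  # assumed sentinel marking a missing measurement


def select_qualified(rows, mag_limit=22.0):
    """Return rows whose g and r magnitudes are valid and not fainter than mag_limit."""
    good = []
    for row in rows:
        mags = (row["g"], row["r"])
        if any(m == BAD_VALUE for m in mags):
            continue  # eliminate BAD data
        if any(m > mag_limit for m in mags):
            continue  # fainter than the assumed quality limit
        good.append(row)
    return good


catalog = [
    {"id": 1, "g": 19.2, "r": 18.7},
    {"id": 2, "g": BAD_VALUE, "r": 20.1},
    {"id": 3, "g": 23.5, "r": 22.9},
]
qualified = select_qualified(catalog)
print([row["id"] for row in qualified])
```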

An introduction to MATLAB

• MATLAB is a popular numerical computation package used in a variety of fields.
• It provides dozens of toolboxes for different purposes, e.g. statistics, pattern recognition, optimization, neural networks etc., as well as a number of ways to access data from either local or remote sites.
• It also offers visualization through flexible 2D and 3D graphics routines.
• It supports Java, C, and Fortran as well as its own M-language.
• It can access URL resources and parse XML, which is necessary for embedding web services.
• In its latest release, refined parallel computation is ready.
• We conclude that MATLAB is one of the best platforms on which astronomical data mining tools can be developed.

AstroBox

• AstroBox is a plug-in package for MATLAB to be used for astronomical computing and data mining

• It comprises:
  – PLASTIC
  – VOTable
  – Local DB
  – VO-DAS client
  – Astronomical algorithms

[Figure: AstroBox architecture diagram. Labels: VO-DAS, MATLAB VO-DAS Client, Local DB, MATLAB Database Toolbox, VOTables, Java Libraries, VO Tools (Aladin, TOPCAT), PLASTIC, AstroBox]

VO utilities in AstroBox

• VOTable access and conversion
  – integrate STILS package
• PLASTIC availability
  – embed a Java subroutine to connect to a PLASTIC Hub, through which data and messages are exchanged with third-party applications, e.g. Aladin and TOPCAT
  – SAMP support next...
• VO-DAS client interface
  – embed a VO-DAS command-line client to send an ADQL query to the VO-DAS server and wait for the query result
  – it is also capable of asynchronous queries, which can access millions of rows of data (ongoing)
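
To make the "VOTable access and conversion" item concrete, here is a minimal sketch of reading a VOTable into rows. It is a stand-in written with Python's standard library, not the Java package that AstroBox integrates. The embedded table, its field names and the assumption that every column is numeric are all invented for the example, and only the plain TABLEDATA serialization is handled.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up VOTable used only for this illustration.
VOTABLE = """<?xml version="1.0"?>
<VOTABLE version="1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.5</TD><TD>-2.25</TD></TR>
          <TR><TD>11.0</TD><TD>-2.50</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_votable(text):
    """Parse a TABLEDATA VOTable into a list of {field name: float} dicts."""
    root = ET.fromstring(text)
    fields = [el.get("name") for el in root.iter() if local_name(el.tag) == "FIELD"]
    rows = []
    for tr in root.iter():
        if local_name(tr.tag) != "TR":
            continue
        # Assumes every cell is numeric; real tables need per-field datatypes.
        cells = [float(td.text) for td in tr if local_name(td.tag) == "TD"]
        rows.append(dict(zip(fields, cells)))
    return rows


rows = read_votable(VOTABLE)
print(rows)
```

Matching on the local tag name keeps the sketch independent of which VOTable version namespace a file declares.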

Data mining support

• Regressions
  – linear regression
    • inherited from MATLAB
  – nonlinear regression
    • provides common astronomical regression functions, e.g. the King model for the density profile of a dwarf galaxy
  – kernel regression
• Fitting
  – provides specific algorithms for non-analytic expressions such as complicated observational datasets or user-defined functions
  – several times faster than the existing MATLAB functions
• Spherical surface projection functions
  – Equatorial projection & Galactic projection
  – equal-area Lambert projection, in particular for density measurement on a spherical surface
  – Aitoff projection for overall viewing
• Visualization functions
  – 2-D plotting
  – 3-D plotting
  – modified from existing MATLAB functions
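
The two projections named above can be sketched with their textbook formulas. This is an illustration in Python, not AstroBox code: the function names are hypothetical, the sphere has unit radius, and the Lambert version is the polar aspect centred on latitude +90 degrees. Because that projection preserves area, counting sources per unit area on the plane gives the surface density on the sky.

```python
import math


def lambert_equal_area(lon_deg, lat_deg):
    """Polar Lambert azimuthal equal-area projection, centred on lat = +90."""
    lon = math.radians(lon_deg)
    colat = math.radians(90.0 - lat_deg)
    rho = 2.0 * math.sin(colat / 2.0)  # radial distance that preserves area
    return rho * math.sin(lon), -rho * math.cos(lon)


def aitoff(lon_deg, lat_deg):
    """Aitoff projection for lon in [-180, 180] and lat in [-90, 90]."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    alpha = math.acos(math.cos(lat) * math.cos(lon / 2.0))
    if alpha == 0.0:
        return 0.0, 0.0  # map centre; avoids dividing by sin(0)
    scale = alpha / math.sin(alpha)  # reciprocal of the unnormalised sinc
    x = 2.0 * math.cos(lat) * math.sin(lon / 2.0) * scale
    y = math.sin(lat) * scale
    return x, y


print(lambert_equal_area(0.0, 0.0))
print(aitoff(180.0, 0.0))
```

In the Lambert case the equator maps to a circle of radius sqrt(2), whose area of 2*pi equals the area of the hemisphere it encloses.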

Other functions

• High-level functions aimed at specific research topics, most of which are currently Milky Way related
  – Kurucz stellar model
  – Girardi stellar population model
  – isochrone fitting of the stellar population
  – Galactic star count model with disk and halo components
  – chemical evolution model for stellar population (ongoing)
• Most commonly used utilities
  – Monte Carlo methods
  – coordinate transformations
  – magnitude system transformations
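
As an example of the coordinate transformation utilities mentioned above, the sketch below converts J2000 equatorial coordinates to Galactic coordinates using the standard spherical-trigonometry relations. The function name is hypothetical and the code is Python, not the AstroBox implementation; the pole constants are the commonly quoted J2000 values.

```python
import math

# J2000 orientation of the Galactic coordinate frame.
RA_NGP = math.radians(192.85948)   # right ascension of the north Galactic pole
DEC_NGP = math.radians(27.12825)   # declination of the north Galactic pole
L_NCP = math.radians(122.93192)    # Galactic longitude of the north celestial pole


def equatorial_to_galactic(ra_deg, dec_deg):
    """Convert J2000 (ra, dec) in degrees to Galactic (l, b) in degrees."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    d_ra = ra - RA_NGP
    sin_b = (math.sin(dec) * math.sin(DEC_NGP)
             + math.cos(dec) * math.cos(DEC_NGP) * math.cos(d_ra))
    b = math.asin(max(-1.0, min(1.0, sin_b)))  # clamp against rounding error
    # cos(b)*sin(L_NCP - l) and cos(b)*cos(L_NCP - l), respectively.
    y = math.cos(dec) * math.sin(d_ra)
    x = (math.sin(dec) * math.cos(DEC_NGP)
         - math.cos(dec) * math.sin(DEC_NGP) * math.cos(d_ra))
    lon = (L_NCP - math.atan2(y, x)) % (2.0 * math.pi)
    return math.degrees(lon), math.degrees(b)


# The north celestial pole should land at l = 122.93192, b = 27.12825.
print(equatorial_to_galactic(0.0, 90.0))
```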

Demo 1

• PLASTIC implementation

Demo 2

• Special regression
  – using a hyperbolic relationship between independent and dependent variables
• Model fitting
  – density profiles of candidate dwarf galaxy
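
The slides do not describe the fitting algorithm used in this demo. As a hedged illustration of fitting a density profile, the Python sketch below uses the empirical King (1962) profile and a brute-force grid search over the core and tidal radii, solving for the amplitude in closed form because the model is linear in it. All function names and the synthetic data are assumptions for the example.

```python
import math


def king_profile(r, k, r_core, r_tidal):
    """Empirical King (1962) surface density at projected radius r."""
    if r >= r_tidal:
        return 0.0
    inner = 1.0 / math.sqrt(1.0 + (r / r_core) ** 2)
    edge = 1.0 / math.sqrt(1.0 + (r_tidal / r_core) ** 2)
    return k * (inner - edge) ** 2


def best_amplitude(radii, densities, r_core, r_tidal):
    """Least-squares amplitude k for fixed r_core and r_tidal."""
    shape = [king_profile(r, 1.0, r_core, r_tidal) for r in radii]
    # Assumes at least one radius lies inside r_tidal, so the sum is non-zero.
    return sum(d * s for d, s in zip(densities, shape)) / sum(s * s for s in shape)


def fit_king(radii, densities, core_grid, tidal_grid):
    """Grid search minimising the sum of squared residuals; returns (k, r_core, r_tidal)."""
    best = None
    for rc in core_grid:
        for rt in tidal_grid:
            k = best_amplitude(radii, densities, rc, rt)
            resid = sum((d - king_profile(r, k, rc, rt)) ** 2
                        for r, d in zip(radii, densities))
            if best is None or resid < best[0]:
                best = (resid, k, rc, rt)
    return best[1:]


# Synthetic, noise-free profile generated from known parameters.
radii = [0.5 * i for i in range(1, 31)]
densities = [king_profile(r, 100.0, 2.0, 20.0) for r in radii]
print(fit_king(radii, densities, [1.0, 2.0, 3.0], [15.0, 20.0, 25.0]))
```

A grid search is slow but has no convergence issues, which makes it a convenient baseline before switching to a proper nonlinear optimizer.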

Demo 3

• Isochrone fitting
  – observed data are accessed from either a local database or the VO-DAS server
  – reference data are queried from the Girardi database to fit theoretical isochrones

Demo 4

• Visualization

Demo 5

• Parallel computation
  – fitting a 9-parameter star count model on an 8-core server
  – faster than on a single-core computer by a factor of ~8
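
One way to read the reported factor is through Amdahl's law, which the slides do not mention and which is used here only as an interpretive aid. A speedup close to the core count implies that almost all of the work runs in parallel. The Python sketch below states the law and its inverse; the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: speedup when a fraction p of the work runs on n cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


def parallel_fraction_from_speedup(speedup, n_cores):
    """Invert Amdahl's law: p = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cores)


# A measured speedup of 7.5 on 8 cores would already imply p of about 0.99.
print(parallel_fraction_from_speedup(7.5, 8))
```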

Future work

• Release as a tool to the community
• Extend cosmology methods
• Establish a distributed parallel computation environment
• Deploy an on-line data mining service

Q & A