2008-5-20, IVOA Interoperability Meeting, Trieste

Mining data using MATLAB through AstroBox

Chao LIU, Chenzhou CUI

Presented by: Chenzhou CUI

National Astronomical Observatories, China

The Chinese VIRTUAL OBSERVATORY

China-VO

• Chinese Virtual Observatory (China-VO) is the national VO project in China, initiated in 2002 by the Chinese astronomical community and led by the National Astronomical Observatories, Chinese Academy of Sciences.

• It focuses its research and development on VO science and applications.

• R&D focuses:
  – China-VO Platform
  – Unified Access to On-line Astronomical Resources and Services
  – VO-ready Projects and Facilities
  – VO-based Astronomical Research Activities
  – VO-based Public Education

An active IVOA member

IVOA 2007, Beijing

1st Small projects meeting, 2003

Our products

• VOFilter
  – an XML filter for OpenOffice.org Calc to open VOTable files
• SkyMouse
  – a smart on-line astronomical information collector
• FitHAS
  – FITS Header Archiving System
• VO-DAS
  – an OGSA-DAI based data access service system providing unified access to astronomy data, including catalogs, images and spectra
• AstroBox
  – coming soon
  – ...

http://services.china-vo.org

First Science Paper from China-VO

• SDSS DR5 photometric data were searched for new Milky Way companions or substructures in the Galactic halo.

• Data analysis procedures were based on VO-DAS.

• Five candidates were identified as overdensities of faint stellar sources whose color-magnitude diagrams resemble those of known globular clusters or dwarf spheroidal galaxies.
  – (Liu et al., 2008, A&A)

AstroBox: Goals

• To provide an astronomical data mining application service, supporting VO protocols and tools

• To provide a network environment for time-consuming astronomical data mining computations

• A high-level data analysis environment, NOT a raw data analysis tool such as IRAF

General procedures of data mining

• Data Accessing
  – query database
  – high volume of data
• Data Pre-processing
  – select qualified data
  – eliminate BAD data
• Data Mining
  – try multiple times and find a way to get unknown knowledge from a specific data set
• Data Analysis and Interpretation
  – visualization
  – comparisons with different data sources
  – associate results with physical meaning
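
The pre-processing step can be illustrated with a few lines of code. This is a minimal sketch in Python rather than MATLAB and is not AstroBox code: the column names, the -9999 sentinel for missing values and the error threshold are all assumptions made for the example.

```python
# Hypothetical catalog rows; column names and values are made up for illustration.
rows = [
    {"g": 19.2, "r": 18.7, "g_err": 0.03},
    {"g": -9999.0, "r": 18.1, "g_err": 0.02},  # missing g magnitude (assumed sentinel)
    {"g": 21.5, "r": 21.0, "g_err": 0.45},     # large photometric error
    {"g": 20.1, "r": 19.8, "g_err": 0.05},
]

BAD_VALUE = -9999.0  # assumed sentinel for missing measurements
MAX_ERR = 0.2        # assumed quality cut on the magnitude error


def is_qualified(row):
    """Return True if both magnitudes are present and the error is small enough."""
    has_data = row["g"] != BAD_VALUE and row["r"] != BAD_VALUE
    return has_data and row["g_err"] <= MAX_ERR


def preprocess(catalog):
    """Drop BAD rows, then add the g - r color used by later mining steps."""
    clean = [row for row in catalog if is_qualified(row)]
    return [{**row, "g_r": row["g"] - row["r"]} for row in clean]


clean_rows = preprocess(rows)
```

Only the first and last rows survive the cuts; the mining step would then operate on `clean_rows`.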

An introduction to MATLAB

• MATLAB is a popular numerical computing environment used in a wide variety of fields.
• It provides dozens of toolboxes for different purposes, e.g. statistics, pattern recognition, optimization and neural networks, as well as a number of ways to access data from local or remote sites.
• It also offers visualization through flexible 2D and 3D graphics routines.
• It supports Java, C and Fortran as well as its own M-language.
• It can access URL resources and parse XML, which is necessary for embedding web services.
• Its latest release includes refined parallel computation support.
• We conclude that MATLAB is one of the best platforms on which astronomical data mining tools can be developed.

AstroBox

• AstroBox is a plug-in package for MATLAB, used for astronomical computing and data mining
• It comprises:
  – PLASTIC
  – VOTable
  – Local DB
  – VO-DAS client
  – Astronomical algorithms

[Figure: AstroBox architecture. Diagram labels: VO-DAS; MATLAB VO-DAS Client; MATLAB Database Toolbox; Local DB; Java Libraries; VOTables; PLASTIC; VO Tools (Aladin, TOPCAT); AstroBox.]

VO utilities in AstroBox

• VOTable access and conversion
  – integrates the STILS package
• PLASTIC availability
  – embeds a Java subroutine to connect to a PLASTIC Hub, through which data and messages are exchanged with third-party applications, e.g. Aladin and TOPCAT
  – SAMP support next...
• VO-DAS client interface
  – embeds a VO-DAS command line client to send an ADQL query to the VO-DAS server and wait for the query result
  – also capable of asynchronous queries, which can access millions of rows of data (ongoing)
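
To make "VOTable access and conversion" concrete, the sketch below reads a tiny VOTable into a list of records. It is an illustration in Python using only the standard library, not the Java-based route AstroBox takes. The sample document is hand-written, omits the XML namespace, and uses only the TABLEDATA serialization; a real reader would also have to handle namespaces, BINARY and FITS serializations, and null values.

```python
import xml.etree.ElementTree as ET

# A minimal, hand-written VOTable (no namespace, TABLEDATA serialization only).
VOTABLE_XML = """<?xml version="1.0"?>
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <FIELD name="name" datatype="char" arraysize="*"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.5</TD><TD>-3.25</TD><TD>star A</TD></TR>
          <TR><TD>11.0</TD><TD>2.0</TD><TD>star B</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

FLOAT_TYPES = {"float", "double"}
INT_TYPES = {"short", "int", "long"}


def convert(cell, datatype):
    """Convert a TD string to a Python value according to the FIELD datatype."""
    if cell == "":
        return None
    if datatype in FLOAT_TYPES:
        return float(cell)
    if datatype in INT_TYPES:
        return int(cell)
    return cell


def read_votable(text):
    """Return the first table of a simple VOTable as a list of dicts."""
    root = ET.fromstring(text)
    table = root.find("./RESOURCE/TABLE")
    fields = [(f.get("name"), f.get("datatype")) for f in table.findall("FIELD")]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [(td.text or "").strip() for td in tr.findall("TD")]
        rows.append({name: convert(cell, dt) for (name, dt), cell in zip(fields, cells)})
    return rows


records = read_votable(VOTABLE_XML)
```

Each record maps a FIELD name to a typed value, which is the kind of structure a numerical environment needs before any regression or plotting can start.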

Data mining support

• Regressions
  – linear regression
    • inherited from MATLAB
  – nonlinear regression
    • provides common astronomical regression functions, e.g. the King model for the density profile of a dwarf galaxy
  – kernel regression
• Fitting
  – provides specific algorithms for non-analytic expressions, such as complicated observational datasets or user-defined functions
  – several times faster than existing MATLAB functions
• Spherical surface projection functions
  – Equatorial projection & Galactic projection
  – equal-area Lambert projection, in particular for density measurement on a spherical surface
  – Aitoff projection for overall viewing
• Visualization functions
  – 2-D plotting
  – 3-D plotting
  – modified from existing MATLAB functions
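
The two named projections can be written down compactly. The following Python sketch uses the standard textbook formulas and is not taken from AstroBox. It assumes that "equal-area Lambert projection" refers to the Lambert azimuthal equal-area projection; the function names and the default projection center are choices made for this example.

```python
import math


def aitoff(lon_deg, lat_deg):
    """Aitoff projection of (lon, lat) in degrees, lon in [-180, 180].

    Returns (x, y) with x in [-pi, pi] and y in [-pi/2, pi/2].
    """
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    alpha = math.acos(math.cos(phi) * math.cos(lam / 2.0))
    # sin(alpha) / alpha tends to 1 as alpha tends to 0.
    sinc = math.sin(alpha) / alpha if alpha > 1e-12 else 1.0
    x = 2.0 * math.cos(phi) * math.sin(lam / 2.0) / sinc
    y = math.sin(phi) / sinc
    return x, y


def lambert_equal_area(lon_deg, lat_deg, lon0_deg=0.0, lat0_deg=0.0):
    """Lambert azimuthal equal-area projection centered on (lon0, lat0).

    Equal areas on the unit sphere map to equal areas on the plane, so
    counts per unit map area are proportional to counts per unit solid angle.
    """
    lam = math.radians(lon_deg - lon0_deg)
    phi = math.radians(lat_deg)
    phi0 = math.radians(lat0_deg)
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(lam))
    if 1.0 + cos_c < 1e-12:
        raise ValueError("the point antipodal to the center cannot be projected")
    k = math.sqrt(2.0 / (1.0 + cos_c))
    x = k * math.cos(phi) * math.sin(lam)
    y = k * (math.cos(phi0) * math.sin(phi)
             - math.sin(phi0) * math.cos(phi) * math.cos(lam))
    return x, y
```

The equal-area property is what makes the Lambert projection suitable for density measurement: binning projected points on a regular grid gives counts that can be compared directly across the map.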

Other functions

• High-level functions aimed at specific research topics, most of which are currently Milky Way related
  – Kurucz stellar model
  – Girardi stellar population model
  – isochrone fitting of stellar populations
  – Galactic star count model with disk and halo components
  – chemical evolution model for stellar populations (ongoing)
• Most commonly used utilities
  – Monte Carlo methods
  – coordinate transformations
  – magnitude system transformations
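
As an example of a coordinate transformation utility, the sketch below converts equatorial (J2000) coordinates to Galactic coordinates with the standard spherical-trigonometry formulas. It is written in Python for illustration and is not AstroBox code; the function name is hypothetical, and the pole constants are the commonly quoted J2000 values.

```python
import math

# J2000 orientation of the Galactic frame, in degrees (commonly quoted values).
RA_NGP = 192.85948   # right ascension of the north Galactic pole
DEC_NGP = 27.12825   # declination of the north Galactic pole
L_NCP = 122.93192    # Galactic longitude of the north celestial pole


def equatorial_to_galactic(ra_deg, dec_deg):
    """Convert J2000 (ra, dec) in degrees to Galactic (l, b) in degrees."""
    dec = math.radians(dec_deg)
    dec_g = math.radians(DEC_NGP)
    d_ra = math.radians(ra_deg - RA_NGP)

    sin_b = (math.sin(dec) * math.sin(dec_g)
             + math.cos(dec) * math.cos(dec_g) * math.cos(d_ra))
    sin_b = max(-1.0, min(1.0, sin_b))  # guard against rounding just outside [-1, 1]
    b = math.asin(sin_b)

    # cos(b) * sin(L_NCP - l) and cos(b) * cos(L_NCP - l), respectively.
    y = math.cos(dec) * math.sin(d_ra)
    x = (math.sin(dec) * math.cos(dec_g)
         - math.cos(dec) * math.sin(dec_g) * math.cos(d_ra))
    l = (L_NCP - math.degrees(math.atan2(y, x))) % 360.0
    return l, math.degrees(b)
```

A quick sanity check is that the north celestial pole (dec = 90°) maps to l ≈ 122.93°, b ≈ 27.13°, and that the north Galactic pole maps to b ≈ 90°.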

Demo 1

• PLASTIC implementation

Demo 2

• Special regression
  – using a hyperbolic relationship between independent and dependent variables
• Model fitting
  – density profiles of candidate dwarf galaxies
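
To illustrate what fitting a density profile involves, here is a minimal Python sketch built on the King (1962) empirical surface density profile. It is not the AstroBox fitting routine: the grid search over the core radius, the fixed normalization and tidal radius, and the synthetic data are all simplifying assumptions made for the example.

```python
import math


def king_profile(r, k, r_c, r_t):
    """King (1962) surface density at radius r.

    k is the normalization, r_c the core radius and r_t the tidal radius;
    the density is zero at and beyond r_t.
    """
    if r >= r_t:
        return 0.0
    inner = 1.0 / math.sqrt(1.0 + (r / r_c) ** 2)
    outer = 1.0 / math.sqrt(1.0 + (r_t / r_c) ** 2)
    return k * (inner - outer) ** 2


def fit_core_radius(radii, densities, k, r_t, candidates):
    """Grid search: return the candidate r_c with the smallest squared residuals."""
    def sse(r_c):
        return sum((king_profile(r, k, r_c, r_t) - d) ** 2
                   for r, d in zip(radii, densities))
    return min(candidates, key=sse)


# Synthetic, noise-free "observations" generated from known parameters.
radii = [0.5 * i for i in range(1, 20)]
densities = [king_profile(r, k=100.0, r_c=2.0, r_t=12.0) for r in radii]

candidates = [0.5 + 0.1 * i for i in range(40)]  # trial core radii from 0.5 to 4.4
best_r_c = fit_core_radius(radii, densities, k=100.0, r_t=12.0, candidates=candidates)
```

With noise-free input the search recovers the core radius used to generate the data. A real fit would estimate all three parameters at once, include a background term, and weight the residuals by the measurement errors.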

Demo 3

• Isochrone fitting
  – observed data are accessed from either a local database or the VO-DAS server
  – reference data are queried from the Girardi database to fit theoretical isochrones
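
One simple way to picture isochrone fitting is as a search for the theoretical track that lies closest to the observed stars in the color-magnitude plane. The Python sketch below does exactly that and nothing more. It is not the method used in the demo: the isochrone points and star positions are made up, and a realistic fit would also account for distance modulus, reddening and photometric errors.

```python
def mismatch(stars, isochrone):
    """Sum over stars of the squared distance to the nearest isochrone point."""
    total = 0.0
    for color, mag in stars:
        total += min((color - c) ** 2 + (mag - m) ** 2 for c, m in isochrone)
    return total


def best_isochrone(stars, isochrones):
    """Return the label of the isochrone with the smallest mismatch."""
    return min(isochrones, key=lambda label: mismatch(stars, isochrones[label]))


# Made-up isochrones as lists of (color, magnitude) points; not real model output.
isochrones = {
    "old": [(0.4, 20.0), (0.5, 19.0), (0.7, 18.0), (0.9, 17.0)],
    "young": [(0.0, 20.0), (0.1, 19.0), (0.2, 18.0), (0.3, 17.0)],
}

# Made-up observed stars scattered around the "old" track.
stars = [(0.45, 19.6), (0.55, 18.9), (0.72, 18.1), (0.88, 17.2)]

label = best_isochrone(stars, isochrones)
```

Here the stars sit much closer to the redder track, so `label` is `"old"`.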

Demo 4

• Visualization

Demo 5

• Parallel computation
  – fitting a 9-parameter star count model on an 8-core server
  – faster than on a single-core computer by a factor of ~8
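
A near-linear speedup is plausible when a fit consists of many independent model evaluations. The Python sketch below shows that structure with a grid search whose evaluations are dispatched through a replaceable map function. It is an illustration only: the two-parameter toy model stands in for the 9-parameter star count model, and none of the names come from AstroBox or from MATLAB's parallel tools.

```python
import math
from functools import partial
from itertools import product


def toy_model(x, amplitude, scale_length):
    """Toy exponential profile standing in for a real star count model."""
    return amplitude * math.exp(-x / scale_length)


def chi_square(params, xs, ys):
    """Sum of squared residuals for one (amplitude, scale_length) pair."""
    amplitude, scale_length = params
    return sum((toy_model(x, amplitude, scale_length) - y) ** 2
               for x, y in zip(xs, ys))


def grid_search(grid, xs, ys, map_fn=map):
    """Evaluate every grid point with map_fn and return the best parameters.

    Each evaluation is independent of the others, so map_fn can be a
    parallel map that spreads the grid points over several workers.
    """
    scores = list(map_fn(partial(chi_square, xs=xs, ys=ys), grid))
    best_index = min(range(len(grid)), key=scores.__getitem__)
    return grid[best_index]


# Synthetic data generated from known parameters.
xs = [0.5 * i for i in range(20)]
ys = [toy_model(x, 50.0, 3.0) for x in xs]

grid = list(product([30.0, 40.0, 50.0, 60.0], [1.0, 2.0, 3.0, 4.0]))
best = grid_search(grid, xs, ys)  # serial run using the built-in map
```

To use several cores, create a `concurrent.futures.ProcessPoolExecutor` inside an `if __name__ == "__main__":` guard and pass its `map` method as `map_fn`. Because the grid points do not depend on each other, the work divides evenly among workers, which is the situation in which a speedup close to the core count can be expected.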

Future work

• Release as a tool to the community
• Extend cosmology methods
• Establish a distributed parallel computation environment
• Deploy an on-line data mining service

Q & A
