Bringing order to chaos: New data-mining techniques for new surveys

Peter-Christian Zinn | Bringing order to chaos | SKANZ 2012 | Auckland, New Zealand

Bringing order to chaos
New data-mining techniques for new surveys
Peter-Christian Zinn
Astronomical Institute of Ruhr-University, Bochum, Germany
CSIRO Astronomy & Space Science, Sydney, Australia
RUHR-UNIVERSITÄT BOCHUM


TRANSCRIPT

Page 1: Bringing order to chaos New data-mining techniques for new surveys

Peter-Christian Zinn | Bringing order to chaos | SKANZ 2012 | Auckland, New Zealand

Bringing order to chaos
New data-mining techniques for new surveys

Peter-Christian Zinn
Astronomical Institute of Ruhr-University, Bochum, Germany
CSIRO Astronomy & Space Science, Sydney, Australia

RUHR-UNIVERSITÄT BOCHUM

Page 2: Bringing order to chaos New data-mining techniques for new surveys

Why new data handling techniques?

New radio surveys will produce lots of data!
− ASKAP/EMU ~ 70 million objects
− LOFAR/Tier 1 ~ 7 million objects
− WSRT/WODAN ~ 10 million objects

New optical/NIR surveys will produce even more data!
− Pan-STARRS/PS1-3π ~ 5-30 billion objects
− LSST/Galaxy "gold sample" ~ 10 billion objects

Astronomers go wild!
− Tera-, peta-, exabyte scale

∑ ~ 70 million

Norris et al. (2011)

LSST Science Book

Mostly no spectra available

Page 3: Bringing order to chaos New data-mining techniques for new surveys

Implications for survey science

There are no spectroscopic redshifts
− Redshift information must be obtained in other ways → photometric (better: statistical) redshifts

There are no spectral classifications
− The classification of an object must be inferred in other ways → flux ratios or SED fitting (better: kNN classification) become more important

There are no spectroscopically derived parameters
− Classic parameters such as metallicity must be derived in other ways → scaling relations (better: kNN regression) must be utilized

Page 4: Bringing order to chaos New data-mining techniques for new surveys

Common approaches

Define plain color criteria
Model SEDs
Look for morphology, scaling relations, ...

− PROs:
◦ Physically motivated
◦ Easy to reproduce in 2d diagrams
◦ High completeness

− CONs:
◦ Global model
◦ Does not work for high dimensions
◦ Many false positive candidates

Fan et al. 2001

Page 5: Bringing order to chaos New data-mining techniques for new surveys

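Our approach: k nearest neighbors

Use k-Nearest Neighbours
− local model
− works fine in high dimensions
− does not require physical assumptions
− good reference samples available

[Figure: kNN illustration for a query object (marked "?") whose nearest neighbors carry the values 4, 0, 0 and 0: mean = 1, median = 0]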
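The kNN idea on this slide can be sketched in a few lines. This is a minimal brute-force illustration, not the authors' implementation: the function name `knn_predict`, the 2-d toy coordinates and the Euclidean metric are assumptions made here; only the neighbor values 4, 0, 0, 0 and the resulting mean and median come from the slide.

```python
import math
import statistics

def knn_predict(query, references, targets, k, reducer=statistics.mean):
    """Estimate a value for `query` from its k nearest reference objects."""
    # Brute force: rank all reference objects by Euclidean distance to the query.
    order = sorted(range(len(references)),
                   key=lambda i: math.dist(query, references[i]))
    nearest_values = [targets[i] for i in order[:k]]
    # Combine the neighbors' known values (mean or median).
    return reducer(nearest_values)

# Hypothetical 2-d feature space: four close neighbors plus one distant object.
references = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0), (10.0, 10.0)]
targets = [4, 0, 0, 0, 7]   # known values of the reference objects
query = (0.0, 0.0)          # the "?" object

mean_estimate = knn_predict(query, references, targets, k=4)
median_estimate = knn_predict(query, references, targets, k=4,
                              reducer=statistics.median)
print(mean_estimate, median_estimate)
```

The median is less sensitive to the single outlying neighbor (value 4) than the mean, which is why both are shown on the slide.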

Page 6: Bringing order to chaos New data-mining techniques for new surveys

Example 1: statistical redshifts

stat-z for ATLAS
− ATLAS has spec-z for ~30% of all objects
− Training with 12-band data (ugriz, IRAC, MIPS24, 13cm, 20cm)

Advantages of statistical redshifts
− No assumptions must be made (no template SEDs, luminosity range, dust reddening, flux homogenization, ...)
− Computation much faster than for classical photo-z ($t_{\text{stat-z}} \sim n \log_2 n$ | $t_{\text{photo-z}} \sim n^{\alpha}$, $\alpha > 2$)

Comparison: Cardamone et al. (2010) 14-band photo-z: 0.026

PCZ, Polsterer & Gieseke (subm.)
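To get a feeling for the two scaling laws quoted above, the following sketch compares them for an EMU-sized catalogue of 70 million objects. The prefactors are set to 1 and α is set to its limiting value 2; both are assumptions made only for illustration, so the ratio is a rough lower bound and not a measured run time.

```python
import math

def stat_z_cost(n):
    """Scaling of the kNN-based statistical redshifts: n * log2(n)."""
    return n * math.log2(n)

def photo_z_cost(n, alpha=2.0):
    """Scaling of classical photo-z: n ** alpha (the slide quotes alpha > 2)."""
    return n ** alpha

n = 70_000_000  # roughly the number of objects expected from ASKAP/EMU
ratio = photo_z_cost(n) / stat_z_cost(n)
print(f"n*log2(n) = {stat_z_cost(n):.2e}")
print(f"n**2      = {photo_z_cost(n):.2e}")
print(f"ratio     = {ratio:.2e}")
```

Even with α fixed at 2, the operation counts differ by a factor of a few million at this catalogue size.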

Page 7: Bringing order to chaos New data-mining techniques for new surveys

Redshift estimation for SDSS quasars

kNN regression model + selected reference set
− 77,000 references reduced to 1,100 objects
− optimized for z > 4.8
− 4 colors used

Laurino et al. (2011)
Polsterer, PCZ & Gieseke (2012)

Page 8: Bringing order to chaos New data-mining techniques for new surveys

Example 2: object classification

SF / AGN separation
− Classical tool: BPT diagram (requires spectroscopy)
− Alternative: MIR color-color selection (not very reliable)
− SED fitting (work-intensive)

− kNN-based classification of the ATLAS test sample yields a combined false classification rate of 9%

− Smolcic et al. (2008) achieve contamination rates between 15% and 20% using a highly sophisticated photometric method

PCZ et al. (in prep)
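A kNN classifier of the kind described above can be sketched as a majority vote among the nearest labelled reference objects. Everything in this example is hypothetical: the two-feature toy data, the labels, the function names and the choice k = 3. The "false classification rate" is taken here to mean the fraction of test objects assigned the wrong class, which is an assumption about the slide's definition.

```python
import math
from collections import Counter

def knn_classify(query, references, labels, k):
    """Assign the majority label among the k nearest reference objects."""
    order = sorted(range(len(references)),
                   key=lambda i: math.dist(query, references[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def false_classification_rate(predicted, true):
    """Fraction of test objects whose predicted class differs from the true one."""
    wrong = sum(p != t for p, t in zip(predicted, true))
    return wrong / len(true)

# Hypothetical reference sample in a 2-d feature space (e.g. two colors).
references = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
              (1.0, 1.1), (1.2, 0.9), (0.9, 1.0)]
labels = ["SF", "SF", "SF", "AGN", "AGN", "AGN"]

# Hypothetical test sample with known classes.
test_points = [(0.2, 0.2), (1.1, 1.0), (0.4, 0.4), (0.8, 0.8)]
test_labels = ["SF", "AGN", "SF", "SF"]

predicted = [knn_classify(p, references, labels, k=3) for p in test_points]
rate = false_classification_rate(predicted, test_labels)
print(predicted, rate)
```

In this toy sample the last test object lies closer to the AGN references than to the SF ones, so it is misclassified and one of four test objects ends up in the wrong class.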

Page 9: Bringing order to chaos New data-mining techniques for new surveys

Example 3: metallicity

Metallicity from L-Z relation
− Spectroscopic input: SDSS metallicities as derived by Brinchmann et al. (2004)
− L_r-Z relation calibrated by the 2dF survey (Lamareille et al. 2004) applied to Galactic extinction-corrected fluxes
− No other assumptions made

Metallicity from kNN regression
− Spectroscopic input: SDSS metallicities as derived by Brinchmann et al. (2004)
− kNN regression with respect to the 90 nearest neighbors
− No other assumptions made

PCZ, Polsterer & Gieseke (subm.)
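The contrast between a global scaling relation and a local kNN regression can be illustrated on synthetic data. The sketch below is not based on the SDSS or 2dF measurements: the saturating `relation`, the regular grid and k = 4 (instead of the 90 neighbors used in the talk, which would be too many for 100 toy reference points) are all assumptions chosen to keep the example small.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line y = a*x + b, standing in for a global scaling relation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def knn_regress(x, xs, ys, k):
    """Local model: mean value of the k reference points closest to x."""
    order = sorted(range(len(xs)), key=lambda i: abs(x - xs[i]))
    return sum(ys[i] for i in order[:k]) / k

def rms(errors):
    """Root-mean-square of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def relation(x):
    """Synthetic, saturating relation (purely illustrative)."""
    return math.tanh(3.0 * (x - 1.0))

grid = [i / 100 for i in range(200)]
train_x, test_x = grid[0::2], grid[1::2]   # reference sample / test sample
train_y = [relation(x) for x in train_x]
test_y = [relation(x) for x in test_x]

a, b = fit_line(train_x, train_y)
linear_rms = rms([a * x + b - y for x, y in zip(test_x, test_y)])
knn_rms = rms([knn_regress(x, train_x, train_y, k=4) - y
               for x, y in zip(test_x, test_y)])
print(f"global linear fit RMS error: {linear_rms:.3f}")
print(f"kNN regression RMS error:    {knn_rms:.3f}")
```

Because the synthetic relation is not a straight line, the single global fit leaves systematic residuals, while the local averages follow the curve closely. With real, noisy data the choice of k trades noise suppression against locality.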

Page 10: Bringing order to chaos New data-mining techniques for new surveys

Summary

We presented the first results of utilizing advanced machine-learning techniques to classify/analyze large data sets.

Dealing with large data sets will become increasingly important because of the enormous amounts of data that forthcoming (radio) surveys will produce.

A k nearest neighbor-based approach was tested on available data from ATLAS, COSMOS and the SDSS.

The results for redshifts, object classifications and the regression-based estimation of astrophysical quantities (e.g. metallicity) are all promising.

Data mining will already play an important role in upcoming projects, e.g. ASKAP/EMU.