Sparse Representation and Efficient Inference for SDSS Spectra

Joseph Richards
[email protected]

Department of Statistics
Carnegie Mellon University

INternational Computational Astrostatistics
www.incagroup.org

ADA V Presentation 09/05/2008 – p.1



Motivation

Astronomical spectra are data in a very high-dimensional space.
  Sloan Digital Sky Survey (SDSS) spectra: 3850 wavelength bins per spectrum.

[Figure: three example SDSS spectra. x-axis: Wavelength (4000 to 8500); y-axis: flux ($10^{-17}$ erg/cm$^2$/s/Ang).]

Our goal: estimate physically important parameters for a large database of galaxies through their spectra.

Outline

Methods
  Principal Components Analysis
  Diffusion Map
  Adaptive Regression
Applications
  Redshift prediction
  Age and Metallicity estimation
Future Work

Intro: Dimensionality Reduction

In high dimensions, determining relationships between data points is difficult (curse of dimensionality).
Explore whether there is a simple, lower-dimensional representation of the data.
Make statistical inferences with this lower-dimensional parameterization of the data.
In reducing dimensionality, what properties of the original data set do we wish to retain?

Methods: Principal Components Analysis (PCA)

For data X (n × p), compute Σ = Cov(X). Perform the spectral decomposition of Σ:

$$U^T \Sigma U = \Lambda$$

Λ is the diagonal matrix of eigenvalues of Σ.
U is a matrix of the corresponding eigenvectors.
Project the data X onto the eigenvectors corresponding to the largest m ≪ p eigenvalues.
PCA projects the original data onto the m-dimensional hyperplane that maximizes the variance of the new data.
The low-dimensional PCA representation optimally captures all Euclidean distances between the original data points.
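The PCA steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the talk: the function name `pca_project` and its return values are assumptions made here, and the data are centered before projection.

```python
import numpy as np

def pca_project(X, m):
    """Project the rows of X (n x p) onto the top-m eigenvectors of Cov(X).

    Hypothetical helper for illustration; returns the m-dimensional scores,
    all eigenvalues (descending), and the eigenvector matrix U.
    """
    Xc = X - X.mean(axis=0)               # center each column
    Sigma = np.cov(Xc, rowvar=False)      # p x p sample covariance
    evals, U = np.linalg.eigh(Sigma)      # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]       # reorder to descending
    evals, U = evals[order], U[:, order]
    scores = Xc @ U[:, :m]                # coordinates on the top-m eigenvectors
    return scores, evals, U
```

With this convention, the sample variance of the j-th score column equals the j-th largest eigenvalue, which is the "maximizes variance" property stated above.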

PCA Example

[Figure from Elements of Statistical Learning, Hastie, Tibshirani, and Friedman, pg. 488]

Drawbacks to PCA

PCA is a global technique:
  1) fails to capture local heterogeneous features within a datapoint
  2) preserves all (Euclidean) distances between datapoints
It is linear; it projects data onto a hyperplane.
It is not robust to noise or outliers.

Better strategy: capture the intrinsic geometry of the data set (i.e. the connectivity of the data).

Methods: Diffusion Maps

Goal: learn the intrinsic (lower-dimensional) geometry of a dataset in $\mathbb{R}^p$.
In high dimensions, distance measures are usually only relevant locally.
Diffusion Map Idea:
  Take a distance measure that makes sense locally, e.g. $s(x,y) = \|x - y\|_2$.
  Use s to define a distance metric, D(x,y), that reflects the connectivity of data points x and y.
  Preserve this distance when reducing dimensionality.

Diffusion Maps: Formulation

Start with a local weight matrix,

$$w(x,y) = \exp\left(-\frac{s^2(x,y)}{\varepsilon}\right)$$

where ε is a tuning parameter.
The diffusion kernel is

$$p_1(x,y) = \frac{w(x,y)}{\sum_z w(x,z)}$$

$p_1(x,y)$ can be interpreted as the transition probability of a Markov random walk on the data.
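As a rough sketch of the weight matrix and diffusion kernel defined above, assuming s is the Euclidean distance and that the data set is small enough to hold all pairwise distances in memory (the function name `transition_matrix` is made up for this example):

```python
import numpy as np

def transition_matrix(X, eps):
    """Return (W, P) for the rows of X (n x p).

    W[i, j] = exp(-||x_i - x_j||^2 / eps)   local weights w(x, y)
    P[i, j] = W[i, j] / sum_z W[i, z]        one-step kernel p_1(x, y)
    """
    # Squared Euclidean distances s^2(x, y) between all pairs of rows.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / eps)
    # Row-normalize so each row is a probability distribution.
    P = W / W.sum(axis=1, keepdims=True)
    return W, P
```

Each row of P sums to 1, which is what allows it to be read as the transition probabilities of a random walk over the data points.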

Diffusion Maps: Formulation

Running this Markov chain forward in time, t, is equivalent to propagating the local influence of each data point with its neighbors.
Define the diffusion distance (Coifman and Lafon [2006]) $D_t$ for time t > 0 as

$$D_t^2(x,y) = \|p_t(x,\cdot) - p_t(y,\cdot)\|_2^2 = \sum_{z \in \Omega} \left(p_t(x,z) - p_t(y,z)\right)^2$$

$D_t(x,y)$ is closely related to $p_t(x,y)$.
The diffusion distance, $D_t$, between two points is related to their connectivity.
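The diffusion distance as written on this slide can be computed directly from the t-step transition matrix. The sketch below assumes an integer t and a row-stochastic matrix P; the name `diffusion_distance` is illustrative only.

```python
import numpy as np

def diffusion_distance(P, t):
    """Pairwise diffusion distances D_t(x, y) for a row-stochastic matrix P.

    Uses the unweighted L2 form on the slide:
    D_t^2(x, y) = sum_z (p_t(x, z) - p_t(y, z))^2, with p_t the rows of P^t.
    Assumes t is a non-negative integer.
    """
    Pt = np.linalg.matrix_power(P, t)          # t-step transition probabilities
    diff = Pt[:, None, :] - Pt[None, :, :]     # row differences for every pair
    return np.sqrt((diff ** 2).sum(axis=-1))

# Tiny three-state chain: states 0 and 2 are linked only through state 1.
P_chain = np.array([[0.50, 0.50, 0.00],
                    [0.25, 0.50, 0.25],
                    [0.00, 0.50, 0.50]])
D1 = diffusion_distance(P_chain, 1)
```

In this toy chain the two end states are farther apart in diffusion distance than either is from the middle state, and all distances shrink toward zero as t grows and the walk forgets its starting point.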

Ex: Diffusion vs. Euclidean Distances

[Figure: two-dimensional point cloud with two marked points, A and B.]

Diffusion distance more accurately describes the amount of dissimilarity between A and B (Lafon and Lee [2006]).

Diffusion Maps: Dimension Reduction

Perform the spectral decomposition of $P^t$:

$$p_t(x,y) = \sum_{j \ge 0} \lambda_j^t \psi_j(x) \phi_j(y)$$

where, as in PCA, we retain the m eigenvectors corresponding to the m largest nontrivial eigenvalues.
The m-dimensional diffusion map of the data is

$$\Psi_t : x \mapsto \left(\lambda_1^t \psi_1(x), \lambda_2^t \psi_2(x), \cdots, \lambda_m^t \psi_m(x)\right)$$

where $D_t^2(x,y) \simeq \|\Psi_t(x) - \Psi_t(y)\|^2$.
Instead of capturing Euclidean distances (PCA), the low-dimensional representation captures diffusion distances.
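One way to obtain the eigenvalues and eigenvectors in this decomposition is through the symmetric matrix $D^{-1/2} W D^{-1/2}$, which has the same eigenvalues as P. The sketch below takes that route; it is an assumption about implementation, not necessarily how the results in this talk were computed, and the function name `diffusion_map`, the Gaussian kernel on Euclidean distances, and the integer t are choices made for this example.

```python
import numpy as np

def diffusion_map(X, eps, m, t=1):
    """m-dimensional diffusion map of the rows of X (n x p).

    Returns (coords, lam, psi, phi): the map coordinates lam_j^t * psi_j(x)
    for j = 1..m, all eigenvalues (descending), and the right (psi) and
    left (phi) eigenvectors of the transition matrix P.
    """
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / eps)
    d = W.sum(axis=1)                          # row sums (degrees)
    A = W / np.sqrt(np.outer(d, d))            # D^{-1/2} W D^{-1/2}, symmetric
    lam, V = np.linalg.eigh(A)
    order = np.argsort(lam)[::-1]              # descending eigenvalues
    lam, V = lam[order], V[:, order]
    psi = V / np.sqrt(d)[:, None]              # right eigenvectors of P
    phi = V * np.sqrt(d)[:, None]              # left eigenvectors of P
    # Skip j = 0: its eigenvalue is 1 and psi_0 is constant (the trivial one).
    coords = (lam[1:m + 1] ** t) * psi[:, 1:m + 1]
    return coords, lam, psi, phi
```

With these conventions, `psi @ np.diag(lam**t) @ phi.T` reproduces $P^t$, which is the spectral decomposition written above.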

Results: z prediction

Sample of 3846 SDSS galaxy spectra
Colored by log(zSDSS)

[Figure, left panel (PC Map): PC Map, SDSS spectra, L2 dists, log(redshift) in color. Axes: PC1, PC2, PC3.]
[Figure, right panel (Diffusion Map): Diffusion Map, SDSS spectra, log(redshift) in color. Axes: λ1ψ1, λ2ψ2, λ3ψ3.]

Methods: Adaptive Regression

Any smooth regression function r can be written as

$$r(x) = \sum_{j=1}^{\infty} \beta_j \psi_j(x)$$

where $\psi_1, \psi_2, \ldots$ form an orthonormal basis and $x \in \mathbb{R}^p$.
Generally, the choice of $\psi_j$ is arbitrary (e.g. high-dimensional cosine basis, wavelets, curvelets, etc.).
Regression function estimate: $\hat{r}(x) = \sum_{j=1}^{J} \hat{\beta}_j \psi_j(x)$
Adaptive framework: use the learned orthogonal function estimates $\psi_1, \ldots, \psi_m$ for $X \subset \mathbb{R}^p$ from PCA or diffusion maps.
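A minimal sketch of the truncated-series regression and of choosing J by cross-validation is given below. It assumes the basis coordinates (PCA scores or diffusion map coordinates) are already stored column-wise in a matrix `Z`, fits the coefficients by ordinary least squares with an intercept, and keeps the basis fixed across folds. The function names and these choices are illustrative assumptions, not the talk's implementation.

```python
import numpy as np

def cv_risk(Z, y, J, n_folds=10, seed=0):
    """K-fold CV estimate of squared-error risk using the first J columns of Z."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    sq_err = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        # Design matrices: intercept plus the first J basis coordinates.
        A_tr = np.column_stack([np.ones(len(train)), Z[train, :J]])
        A_te = np.column_stack([np.ones(len(test)), Z[test, :J]])
        beta, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        sq_err += ((y[test] - A_te @ beta) ** 2).sum()
    return sq_err / n

def choose_J(Z, y, J_max, **kwargs):
    """Return (best J, list of CV risks for J = 1..J_max)."""
    risks = [cv_risk(Z, y, J, **kwargs) for J in range(1, J_max + 1)]
    return int(np.argmin(risks)) + 1, risks
```

Calling `choose_J(Z, y, J_max)` on a coordinate matrix and a response such as log(1 + z) returns the number of basis functions with the lowest estimated risk, together with the full risk curve.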

Results: z prediction

CV Pred. Risk versus J

[Figure: CV prediction risk vs. # of basis functions (0 to 200), for PC and Diffusion Map.]

Find PC and diffusion map coefficients for all 3846 spectra.
Regress only on spectra with zSDSS CL > 0.99 (2796, 73%).
Choose the diffusion map ε that minimizes CV prediction risk.

Results: z prediction

Predicted log(1 + z) vs. log(1 + zSDSS)
Optimal model: 20 diffusion map coordinates

[Figure, left panel (zSDSS CL > 0.99): Redshift predictions for SDSS z CL > 0.99, 2796 galaxies. x-axis: SDSS log(1+z); y-axis: 10-fold CV predicted log(1+z).]
[Figure, right panel (zSDSS CL ≤ 0.99): Redshift predictions for SDSS z CL < 0.99. x-axis: SDSS log(1+z); y-axis: predicted log(1+z). Legend: 0.95<CL<0.99 (663 galaxies), 0.9<CL<0.95 (246 galaxies), CL<0.9 (141 galaxies).]

Problem 2: Age & Metallicity

Our main goal: estimate age and metallicity of SDSS galaxies.
Training set: 1146 synthetic spectra from Bruzual and Charlot [2003] population synthesis models (Cid Fernandes et al. [2005], www.starlight.ufsc.br).
For each synthetic spectrum, we know age & Z.

Maps for synthetic spectra

Maps for simulated spectra, color is log(age)

[Figure, left panel (PC Map): PC Map, SDSS spectra, L2 dists, log(age) in color. Axes: PC1, PC2, PC3; color scale 6.5 to 10.]
[Figure, right panel (Diffusion Map): Diffusion Map, STARLIGHT spectra, L2 dists, log10(age) in color. Axes: λ1ψ1, λ2ψ2, λ3ψ3; color scale 6.5 to 10.]

Results: Age, Z prediction

[Figure, left panel (Age): Starlight Actual vs. Fitted log(Age), best fit (eps=0.02, 188 vars. by CV). x-axis: fitted log(Age); y-axis: log(Age).]
[Figure, right panel (Metallicity): Starlight Actual vs. Fitted log(Z), best fit (eps=0.02, 188 vars. by CV). x-axis: fitted log(Z); y-axis: log(Z).]

Future Work

Redshift Prediction
  Apply redshift prediction techniques to more SDSS data
  Predict redshift using continuum-subtracted spectra
Age & Metallicity Estimation
  Predict ages and metallicities for SDSS galaxies
  Compare results to template-matching method of Cid Fernandes et al. [2005]

Conclusions

1. The diffusion map is an effective and efficient dimensionality reduction technique.
2. It outperforms linear methods (PCA) in regression tasks on real astronomical data sets.
3. Adaptive regression is used to perform statistical inference and choose the optimal dimensionality.
4. The methods are applicable to a wide variety of astronomical data sets (e.g. spectra, images, multi-wavelength data sets) and to a variety of tasks (e.g. regression, classification, clustering).

References

G. Bruzual and S. Charlot. Stellar population synthesis at the resolution of 2003. MNRAS, 344:1000–1028, Oct. 2003.

R. Cid Fernandes, A. Mateus, L. Sodré, G. Stasinska, and J. M. Gomes. Semi-empirical analysis of Sloan Digital Sky Survey galaxies - I. Spectral synthesis method. MNRAS, 358:363–378, Apr. 2005.

R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, July 2006. URL http://dx.doi.org/10.1016/j.acha.2006.04.006.

S. Lafon and A. B. Lee. Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9), September 2006.

Appendix: Epsilon Selection

[Figure: Epsilon selection. x-axis: log10(Epsilon); y-axis: min CV prediction risk.]

Appendix: Eigenvalue Dropoff

[Figure, left panel (PCA): Eigenvalue dropoff, PC Map, SDSS spectra. y-axis: eigenvalue.]
[Figure, right panel (Diffusion Mapping): Eigenvalue dropoff, Diffusion Map, SDSS spectra. y-axis: eigenvalue.]

Appendix: Kernel Regression

[Figure, panel a: Linear regression, 3 diffusion coords. (CV risk=36.6). x-axis: CV predicted log(z); y-axis: SDSS log(z).]
[Figure, panel b: Kernel regression, 3 diffusion coords. (CV risk=36.2). x-axis: CV predicted log(z); y-axis: SDSS log(z).]

Appendix: SDSS-SL calibration

The following steps are performed to compare synthetic and SDSS spectra:
1. SDSS spectra are brought to rest-frame wavelengths (based on the SDSS estimate of z).
2. Emission lines in SDSS spectra are masked out.
3. SL spectra are rebinned to match SDSS binning.
4. SL and SDSS spectra are normalized to sum to 1 in the overlapping wavelength range.
5. L2 distance is computed between all SL and SDSS spectra:
   a. SDSS flux errors accounted for, or
   b. SDSS spectra smoothed first
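Steps 1 to 5 can be sketched for a single SDSS/SL pair as follows. This is only an illustration under stated assumptions: rebinning is done with plain linear interpolation (`np.interp`), the emission-line mask is supplied by the caller as a boolean array, and neither the flux-error weighting (5a) nor the smoothing (5b) is included. The function names are hypothetical.

```python
import numpy as np

def rest_frame(wave_obs, z):
    """Step 1: shift observed wavelengths to the rest frame, lambda / (1 + z)."""
    return wave_obs / (1.0 + z)

def sl_sdss_distance(wave_obs, flux_sdss, z, wave_sl, flux_sl, line_mask=None):
    """L2 distance between one SDSS spectrum and one SL (synthetic) spectrum.

    line_mask: optional boolean array, True where an SDSS bin falls on an
    emission line and should be ignored (step 2). wave_sl must be increasing.
    """
    wave_rest = rest_frame(wave_obs, z)
    # Restrict to the wavelength range covered by both spectra.
    lo = max(wave_rest.min(), wave_sl.min())
    hi = min(wave_rest.max(), wave_sl.max())
    keep = (wave_rest >= lo) & (wave_rest <= hi)
    if line_mask is not None:
        keep &= ~line_mask
    # Step 3: put the SL spectrum on the SDSS rest-frame grid (assumed linear).
    sl_rebinned = np.interp(wave_rest[keep], wave_sl, flux_sl)
    # Step 4: normalize both spectra to sum to 1 over the retained bins.
    f_sdss = flux_sdss[keep] / flux_sdss[keep].sum()
    f_sl = sl_rebinned / sl_rebinned.sum()
    # Step 5: unweighted L2 distance.
    return np.sqrt(((f_sdss - f_sl) ** 2).sum())
```

Because both spectra are normalized before comparison, an SDSS spectrum that is a rescaled, redshifted copy of an SL spectrum has distance close to zero from it.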