Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State

Post on 17-Dec-2015


Page 1

Codes for astrostatistics: StatCodes & VOStat

Eric Feigelson, Penn State

Page 2

Vast range of statistical problems in modern astronomy

• Poisson processes: point processes, time series analysis

• Image analysis: MLE deconvolution, adaptive smoothing, wavelet analyses

• Multivariate analysis & classification (w/ measurement errors)

• Survival analysis (censoring & truncation w/ measurement errors)

• Parametric models: model selection, non-linear regression

• Non-parametric methods

• Confidence limits: bootstrap resampling

• Prior knowledge: Bayesian inference

(see talk at PhysStat 2003 conference)
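
As one concrete instance of the items listed on this slide, bootstrap resampling for confidence limits can be sketched in a few lines. This is an illustrative Python sketch, not code from the talk: the function name `bootstrap_ci`, its parameters, and the sample values are all assumptions made up for the example, and it uses the simple percentile method rather than any refinement.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2000, level=0.90, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Recompute the statistic on many resamples drawn with replacement.
    estimates = sorted(
        stat([rng.choice(data) for _ in range(n)]) for _ in range(n_resamples)
    )
    # Read the interval off the tails of the resampled distribution.
    alpha = (1.0 - level) / 2.0
    lo = estimates[int(alpha * n_resamples)]
    hi = estimates[int((1.0 - alpha) * n_resamples) - 1]
    return lo, hi


# Invented values, not real measurements.
sample = [12.1, 11.8, 12.6, 13.0, 11.5, 12.2, 12.9, 11.9, 12.4, 12.3]
low, high = bootstrap_ci(sample)
print(f"median = {statistics.median(sample):.2f}, 90% interval = [{low:.2f}, {high:.2f}]")
```

Passing a different `stat` (for example `statistics.mean`) gives an interval for that statistic instead, without any distributional assumption beyond the sample being representative.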

Page 3

The problem

Astronomers are insufficiently trained in modern applied statistics …

but even if they knew what to do, they have inadequate access to computer codes.

Page 4

• Astronomers never use large commercial statistical packages like SAS, SPSS, Statistica

• Some astronomers sometimes use UNIX-based command-line systems like MATLAB or S-Plus.

• Astronomers like mini-codes in Numerical Recipes & often write their own codes. Many like IDL which has simple statistics.

• NASA/NSF observatories produce huge data analysis codes (IRAF, AIPS, CIAO, …) which by policy avoid proprietary codes

• A few specialized stand-alone astrostat codes were written under NASA funding: ROSTAT, ASURV, SLOPES, StatPy

Altogether this is a very bad situation: vast statistical needs with very inadequate codes

Page 5

The rise of the Virtual Observatory

Vast collections of calibrated data (images, spectra, time series), extracted catalogs (rows=sources, columns=properties), and source bibliographies emerged during the 1990s.

NASA Science Archive Centers (MAST, HEASARC, IRSA, LAMDA), bibliographic databases (ADS, SIMBAD, NED), & more are being transformed into a federated (though still distributed & heterogeneous) system. XML metadata (VOTable), SOAP protocols, … for data mining & extraction.

But originally there was no plan for visualization & statistical analysis of extracted datasets

Page 6

StatCodes: A partial solution

• In the late 1990s, the Penn State group created a Web metasite with annotated links to ~200 open source packages & codes of utility to astronomers.

• Quite successful: 50-100 hits/day for 7 years.

• Multivariate & time series methods most popular.

But the collection of on-line codes was very inhomogeneous and incomplete

Page 7

R

Finally a broad public-domain statistical software system emerges

Based on the successful commercial UNIX-based S/S-Plus, R has an interactive command-line feel (like IDL), flexible data I/O, acceptable graphics, integration with C/Fortran/Python/…, and quite a lot of sophisticated statistical methods.

Core R: 2000-page manual with ~200 functionalities, some very complex & advanced

CRAN: 300 add-on packages, dozens useful to astronomers. Some are themselves full systems.

Page 8

VOStat: A Web service

1. Web form interface providing simple statistical R functions with VOTable inputs

2. Same R functions provided through a more sophisticated Java-based grid-computing mode.

[Diagram: User, VOStat server, and dispersed VO databases. Labels: Requests, Answers, Heavy data, Heavy statistical computation.]
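
To make the first mode concrete, here is a rough sketch of what "a simple statistical function with VOTable input" amounts to: read the columns of a VOTable, then return summary statistics for each. VOStat itself wraps R functions; this is a Python stand-in, not VOStat code. The XML fragment is hand-written with invented values and omits the XML namespace that real VOTable files usually declare, and the helper names `read_columns` and `summarize` are hypothetical.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical, hand-written VOTable fragment (no namespace, invented values).
VOTABLE_XML = """<VOTABLE version="1.0">
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="flux" datatype="double"/>
      <FIELD name="mag" datatype="double"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>1.0</TD><TD>15.2</TD></TR>
          <TR><TD>2.0</TD><TD>14.6</TD></TR>
          <TR><TD>4.0</TD><TD>13.9</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def read_columns(xml_text):
    """Return {field name: list of floats} from the first TABLE element."""
    table = ET.fromstring(xml_text).find(".//TABLE")
    names = [field.get("name") for field in table.findall("FIELD")]
    columns = {name: [] for name in names}
    # Each TR is a row (one source); each TD is the value for one FIELD, in order.
    for row in table.iter("TR"):
        for name, cell in zip(names, row.findall("TD")):
            columns[name].append(float(cell.text))
    return columns


def summarize(values):
    """Basic summary statistics of the kind a simple web form might return."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


columns = read_columns(VOTABLE_XML)
for name, values in columns.items():
    print(name, summarize(values))
```

In the architecture described on this slide, the parsing and computation would run on the server next to the data, so only the small summary travels back to the user.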

Page 9

VOStat may be a big improvement but …

• Generic Web-based services are inherently inflexible & limited. VOStat may serve to entice the astronomer to download R & perform the real analysis at home.

• Astronomers need training in advanced methods before using them with R. Penn State has just created a Center for Astrostatistics to develop curriculum, conduct tutorials, provide template R code, etc.

• R/CRAN does not serve huge VO datasets or some special astrostat needs. New methodological/code development underway (CMU, Cornell, PSU, UCIrv,…)