Codes for astrostatistics: StatCodes & VOStat, Eric Feigelson, Penn State
![Page 1: Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State](https://reader035.vdocument.in/reader035/viewer/2022072006/56649d025503460f949d5dcd/html5/thumbnails/1.jpg)
Codes for astrostatistics: StatCodes & VOStat
Eric Feigelson, Penn State
![Page 2: Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State](https://reader035.vdocument.in/reader035/viewer/2022072006/56649d025503460f949d5dcd/html5/thumbnails/2.jpg)
Vast range of statistical problems in modern astronomy
• Poisson processes: point processes, time series analysis
• Image analysis: MLE deconvolution, adaptive smoothing, wavelet analyses
• Multivariate analysis & classification (w/ meas errors)
• Survival analysis (censoring & truncation w/ meas errors)
• Parametric models: Model selection, non-linear regression
• Non-parametric methods
• Confidence limits: bootstrap resampling
• Prior knowledge: Bayesian inference
(see talk at PhysStat 2003 conference)
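As one illustration of the "Confidence limits: bootstrap resampling" item above, here is a minimal sketch of a percentile bootstrap interval for a sample median. It is not taken from the talk: the function name `bootstrap_ci`, its parameters, and the sample values are all hypothetical, and only the Python standard library is used.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration; the seed is fixed so runs repeat.
    """
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled statistic.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up sample values (assumption: any list of floats works here).
sample = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 14.4, 15.8, 14.7, 15.0]
low, high = bootstrap_ci(sample)
print(f"median = {statistics.median(sample):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

The percentile method is the simplest bootstrap variant; for small or skewed samples other variants may be preferable.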
![Page 3: Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State](https://reader035.vdocument.in/reader035/viewer/2022072006/56649d025503460f949d5dcd/html5/thumbnails/3.jpg)
The problem
Astronomers are insufficiently trained in modern applied statistics … but even if they knew what to do, they have inadequate access to computer codes.
![Page 4: Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State](https://reader035.vdocument.in/reader035/viewer/2022072006/56649d025503460f949d5dcd/html5/thumbnails/4.jpg)
• Astronomers never use large commercial statistical packages like SAS, SPSS, Statistica
• Some astronomers sometimes use UNIX-based command-line systems like MATLAB or S-Plus.
• Astronomers like mini-codes in Numerical Recipes & often write their own codes. Many like IDL which has simple statistics.
• NASA/NSF observatories produce huge data analysis codes (IRAF, AIPS, CIAO, …) which by policy avoid proprietary codes
• A few specialized stand-alone astrostat codes written under NASA funding: ROSTAT, ASURV, SLOPES, StatPy
Altogether this is a very bad situation: vast statistical needs with very inadequate codes
![Page 5: Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State](https://reader035.vdocument.in/reader035/viewer/2022072006/56649d025503460f949d5dcd/html5/thumbnails/5.jpg)
The rise of the Virtual Observatory
Vast collections of calibrated data (images, spectra, time series), extracted catalogs (rows=sources, columns=properties), and source bibliographies emerged during the 1990s.
NASA Science Archive Centers (MAST, HEASARC, IRSA, LAMBDA), bibliographic databases (ADS, SIMBAD, NED), & more are being transformed into a federated (though still distributed & heterogeneous) system. XML metadata (VOTable), SOAP protocols, … for data mining & extraction.
But originally there was no plan for visualization & statistical analysis of extracted datasets.
![Page 6: Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State](https://reader035.vdocument.in/reader035/viewer/2022072006/56649d025503460f949d5dcd/html5/thumbnails/6.jpg)
StatCodes: A partial solution
• In the late 1990s, the Penn State group created a Web metasite with annotated links to ~200 open source packages & codes of utility to astronomers.
• Quite successful: 50-100 hits/day for 7 years.
• Multivariate & time series methods most popular.
But the collection of on-line codes was very inhomogeneous and incomplete
![Page 7: Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State](https://reader035.vdocument.in/reader035/viewer/2022072006/56649d025503460f949d5dcd/html5/thumbnails/7.jpg)
R
Finally a broad public-domain statistical software system emerges
Based on the successful commercial UNIX-based S/S-Plus, R has an interactive command-line feel (like IDL), flexible data I/O, acceptable graphics, integration with C/Fortran/Python/…, and quite a lot of sophisticated statistical methods.
Core R: 2000-page manual with ~200 functionalities, some very complex & advanced
CRAN: 300 add-on packages, dozens useful to astronomers. Some are themselves full systems.
![Page 8: Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State](https://reader035.vdocument.in/reader035/viewer/2022072006/56649d025503460f949d5dcd/html5/thumbnails/8.jpg)
VOStat: A Web service
1. Web form interface providing simple statistical R functions with VOTable inputs
2. Same R functions provided through a more sophisticated Java-based grid-computing mode.
[Diagram labels: User; Requests; Answers; VOStat server; Heavy statistical computation; Heavy data; Dispersed VO databases]
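To make the idea of "simple statistical functions with VOTable inputs" concrete, the following sketch reads one numeric column from a tiny VOTable document and computes summary statistics. It is only an illustration in Python, not the VOStat implementation (which the slides say is built on R). The table contents, the helper `read_column`, and the column names are hypothetical. It also assumes the plain TABLEDATA serialization with no XML namespace, whereas real VOTable files usually declare a namespace and may use binary encodings.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical, hand-written VOTable fragment (no namespace, TABLEDATA only).
VOTABLE_XML = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="sources">
      <FIELD name="id" datatype="char" arraysize="*"/>
      <FIELD name="flux" datatype="double"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>src1</TD><TD>1.5</TD></TR>
          <TR><TD>src2</TD><TD>2.5</TD></TR>
          <TR><TD>src3</TD><TD>4.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_column(votable_text, column):
    """Return one column of the first table as floats (illustrative helper)."""
    root = ET.fromstring(votable_text)
    table = root.find("./RESOURCE/TABLE")
    # FIELD order defines the order of the TD cells in each row.
    names = [field.get("name") for field in table.findall("FIELD")]
    idx = names.index(column)
    rows = table.findall("./DATA/TABLEDATA/TR")
    return [float(tr.findall("TD")[idx].text) for tr in rows]


flux = read_column(VOTABLE_XML, "flux")
summary = {
    "n": len(flux),
    "mean": statistics.mean(flux),
    "median": statistics.median(flux),
}
print(summary)
```

A production reader would more likely rely on a dedicated VOTable library than on hand-rolled XML parsing; the point here is only the shape of the data flow from table to statistic.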
![Page 9: Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State](https://reader035.vdocument.in/reader035/viewer/2022072006/56649d025503460f949d5dcd/html5/thumbnails/9.jpg)
VOStat may be a big improvement but …
• Generic Web-based services are inherently inflexible & limited. VOStat may serve to entice the astronomer to download R & perform the real analysis at home.
• Astronomers need training in advanced methods before using them with R. Penn State has just created a Center for Astrostatistics to develop curriculum, conduct tutorials, provide template R code, etc.
• R/CRAN does not serve huge VO datasets or some special astrostat needs. New methodological/code development underway (CMU, Cornell, PSU, UCIrv,…)