The VAO is operated by the VAO, LLC.

Ashish Mahabal (aam@astro.caltech.edu)

Ciro Donalek, Matthew Graham, Ray Plante, George Djorgovski

Data 2 Knowledge study project

VAO-LSST Meeting, NOAO, 24 March 2011

March 23, 2011, Ashish Mahabal

Goals

• Feasibility study
• What is out there
• What is needed
• Milestones
• What can be done

Exploration of observable parameter spaces and searches for rare or new types of objects

Djorgovski

Overview – many connections

• Astroinformatics (next meeting in Sep. 2011)
• VOStat and other R/Statistics tools
• Data challenges
• Various sky surveys

Related issues:
• Semantics
• Classification/characterization
• Distributed data
• GPUs

Focus on time domain

Focus on time-domain

Expertise, and it encompasses all aspects of data mining (save one)
Plus, real-time forces us to be fast.

• Portfolio building – growing columns of tables
• Bayesian networks utilizing auxiliary information
• Lightcurve techniques for characterizing objects
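
As an illustration of what characterizing an object from its lightcurve can look like in practice, here is a minimal Python sketch that reduces a list of (time, magnitude) measurements to a few summary statistics. The function name `lightcurve_features` and the particular features chosen are assumptions made for this example; the slides do not say which features the project uses.

```python
import statistics


def lightcurve_features(times, mags):
    """Summarize a light curve with a few simple statistics.

    Hypothetical helper for illustration only; the feature set is an
    assumption, not the one used by the project.
    """
    if len(times) != len(mags) or len(mags) < 2:
        raise ValueError("need at least two (time, magnitude) pairs")

    mean = statistics.fmean(mags)
    std = statistics.stdev(mags)  # sample standard deviation
    amplitude = (max(mags) - min(mags)) / 2.0  # half the peak-to-peak range

    # Largest absolute rate of change between consecutive epochs.
    pairs = sorted(zip(times, mags))
    max_slope = max(
        (abs((m2 - m1) / (t2 - t1))
         for (t1, m1), (t2, m2) in zip(pairs, pairs[1:])
         if t2 != t1),
        default=0.0,
    )

    # Fraction of points lying more than one standard deviation from the mean.
    beyond1std = sum(abs(m - mean) > std for m in mags) / len(mags)

    return {
        "mean": mean,
        "std": std,
        "amplitude": amplitude,
        "max_slope": max_slope,
        "beyond1std": beyond1std,
    }


# Usage: a flat light curve with a single two-magnitude excursion.
features = lightcurve_features([0, 1, 2, 3, 4], [15.0, 15.0, 17.0, 15.0, 15.0])
```

Each such statistic becomes one more column in an object's row, which is one way to read the "growing columns of tables" item above.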

Missing stat and CS tools

• Bootstrap aggregating
• Mixture of experts
• Boosting
• Simulated annealing
• Semi-supervised learning
• …

From IVOA KDD User guide for Data Mining (Nick Ball)
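
To make the first item in that list concrete, the following is a minimal sketch of bootstrap aggregating (bagging) in plain Python. It assumes a toy one-dimensional, two-class problem and uses a simple threshold classifier ("stump") as the base learner. All function names are hypothetical, and nothing here is taken from the IVOA guide or from any of the tools discussed in this talk.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D threshold classifier that predicts 1 when x > threshold.

    `points` is a list of (x, label) pairs with label in {0, 1}.
    """
    xs = sorted({x for x, _ in points})
    # Candidate thresholds: below the minimum, midpoints, above the maximum.
    candidates = (
        [xs[0] - 1.0]
        + [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
        + [xs[-1] + 1.0]
    )

    def errors(threshold):
        return sum((x > threshold) != bool(y) for x, y in points)

    return min(candidates, key=errors)


def bagged_stumps(points, n_models=25, seed=0):
    """Train one stump per bootstrap resample of the training set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(points) examples with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def predict(models, x):
    """Aggregate the individual stumps by majority vote."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]


# Usage on a toy, well-separated data set (labels are made up for illustration).
data = [(x, 0) for x in (0.0, 1.0, 2.0, 3.0)] + [(x, 1) for x in (7.0, 8.0, 9.0, 10.0)]
models = bagged_stumps(data)
label_low, label_high = predict(models, 1.0), predict(models, 10.0)
```

The same resample-train-vote pattern applies to any base learner; an odd `n_models` avoids tied votes in the two-class case.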

Science goal: to close the growing gap between the huge volume of data being generated and our understanding of it

Data Gathering (e.g., new generation instruments …)

Data Farming:
• Storage/Archiving
• Indexing, Searchability
• Data Fusion, Interoperability, ontologies, etc.

Data Mining (or Knowledge Discovery in Databases):
• Pattern or correlation search
• Clustering analysis, automated classification
• Outlier / anomaly searches
• Hyperdimensional visualization

Data visualization and understanding:
• Computer aided understanding
• KDD
• Etc.

New Knowledge

Data storage: Pbytes
Data access: > 10^3 access

Scalability: Petaflops, Exaflops
Computing power (multicore)
Algorithm: parallelism
Visualization: N-dimensional
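
The outlier / anomaly searches listed under Data Mining can be illustrated with a very small example. The sketch below flags values whose modified z-score, based on the median and the median absolute deviation (MAD), exceeds a cutoff. The cutoff of 3.5 and the 0.6745 scaling factor are a common convention rather than anything specified in the talk, and `mad_outliers` is a hypothetical name.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return the indices of values flagged as outliers by a modified z-score.

    Assumption for illustration: a 1-D list of measurements and the
    conventional 3.5 cutoff; real searches work in many dimensions.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # At least half the values equal the median; the score is undefined,
        # so this sketch flags nothing.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a Gaussian z-score.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


# Usage: one measurement far from the rest (made-up numbers).
flagged = mad_outliers([10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 25.0])
```

Using the median and MAD rather than the mean and standard deviation keeps the outlier itself from inflating the scale against which it is judged.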

Currently on the plate

• DAME
• Knime (Konstanz Information Miner)
• Orange (Visual/Python)
• Weka (ML/Java)
• Rapidminer (standalone)

Comparison matrix for DM/Viz tools

Accuracy, Scalability, Interpretability, Usability, Robustness, Versatility, Speed, Popularity
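
One possible way to hold such a comparison matrix in code is sketched below, with one row per tool and one cell per criterion. The tool and criterion names come from the slides; the cells are deliberately left empty because the transcript contains no ratings, and the `weighted_score` helper and its weighting scheme are assumptions made for this example.

```python
CRITERIA = [
    "Accuracy", "Scalability", "Interpretability", "Usability",
    "Robustness", "Versatility", "Speed", "Popularity",
]
TOOLS = ["DAME", "Knime", "Orange", "Weka", "Rapidminer"]


def empty_matrix(tools=TOOLS, criteria=CRITERIA):
    """One row per tool, one cell per criterion; None means 'not yet rated'."""
    return {tool: {criterion: None for criterion in criteria} for tool in tools}


def weighted_score(row, weights):
    """Weighted mean over the rated criteria only; None if nothing is rated.

    Hypothetical scoring rule: `weights` maps criterion name to importance.
    """
    rated = [(row[c], w) for c, w in weights.items() if row.get(c) is not None]
    total_weight = sum(w for _, w in rated)
    if total_weight == 0:
        return None
    return sum(score * w for score, w in rated) / total_weight


# Usage: an unfilled matrix, ready to be rated criterion by criterion.
matrix = empty_matrix()
equal_weights = {criterion: 1 for criterion in CRITERIA}
score = weighted_score(matrix["Weka"], equal_weights)  # None until ratings exist
```

Leaving unrated cells as None rather than zero keeps a missing evaluation from counting against a tool.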

Related activities

• Skyalert integration (Graham) – adding data and methods
• Solicitation of examples from community
  • WD, Blazars’ example
• Making R more astronomy friendly
• Various datasets
  • Differing number of rows, columns
  • For supervised/unsupervised classification
• TA on GPUs – incorporate in pipeline

Slide from Budavari

CUDA zone, PyCUDA, …

VAO People working on this

• Ashish Mahabal, Ciro Donalek, Matthew Graham, George Djorgovski (Caltech)

• Ray Plante (NCSA)

• But we are in touch with many others in astro/CS/stats and are relying on many groups, including the LSST transients and informatics working groups
