10-June-2004 Andy Lawrence : PPARC data curation panel meeting 1
AstroGrid, Data Centres, & Edinburgh
What is curation ?
Data Centres in the VO era
Data curation at WFAU
Edinburgh e-Science
what is curation ?
not just preservation
Data product creation
Documentation
Physical storage, organisation, migration
Release control
Revision & Annotation
Services attached to holdings
Data Services
Browsing
Download
Queries
Analysis
not just a cupboard
Plain archive = organised repository
Science archive = system for doing science
repository + access + services
The Virtual Observatory
VObs : Generic Science Drivers
data growth : volume and richness
desire to work on-line
multi-archive science
large database science
empowerment
--> professional data management
Data re-use : a market fact
HST : more retrieval than ingest
VObs : Technical Drivers
data rate, storage and flops : x1000/decade
but device bandwidth : x10/decade
    search engines next to the data

backbone network bandwidth ~Gbps
but end-to-end bandwidth ~10 Mbps
    analyse in situ : shift results not data
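
The rates above can be turned into rough numbers. The following is a back-of-envelope sketch only: the 1 TB dataset size is a hypothetical figure chosen for illustration, decimal units are assumed, and links are assumed to run at their full nominal rate with no protocol overhead.

```python
# Back-of-envelope arithmetic for the technical drivers.
# Assumptions (not from the slides): a 1 TB dataset, decimal units,
# and links running at their nominal rate with no overhead.

SECONDS_PER_DAY = 86_400


def annual_factor(factor_per_decade: float) -> float:
    """Convert a growth factor per decade into the equivalent factor per year."""
    return factor_per_decade ** (1 / 10)


def transfer_days(size_bytes: float, rate_bits_per_s: float) -> float:
    """Days needed to move size_bytes over a link of the given bit rate."""
    return size_bytes * 8 / rate_bits_per_s / SECONDS_PER_DAY


# Growth rates quoted on the slide.
data_growth = annual_factor(1000)      # close to a doubling every year
device_bw_growth = annual_factor(10)   # about 26% per year
gap_per_decade = 1000 / 10             # data outgrows device bandwidth 100-fold

# Hypothetical 1 TB dataset over the two network rates quoted on the slide.
one_tb = 1e12
days_end_to_end = transfer_days(one_tb, 10e6)  # ~10 Mbps end-to-end
days_backbone = transfer_days(one_tb, 1e9)     # ~1 Gbps backbone

print(f"data growth per year:        x{data_growth:.2f}")
print(f"device bandwidth per year:   x{device_bw_growth:.2f}")
print(f"gap after one decade:        x{gap_per_decade:.0f}")
print(f"1 TB at 10 Mbps end-to-end:  {days_end_to_end:.1f} days")
print(f"1 TB at 1 Gbps backbone:     {days_backbone * 24:.1f} hours")
```

Under these assumptions a terabyte takes about nine days end-to-end but only a couple of hours on the backbone, which is the arithmetic behind "shift results not data".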
network development
higher level protocols ==> transparency
TCP/IP : message exchange
HTTP : doc sharing (web)
grid suite : CPU sharing
XML/SOAP : data exchange
data centre primacy
these developments all point the same way
data centres take a central role
they become active service centres
they need to present a common front
the VObs concept
web : all docs in the world inside your PC
VObs : all databases in the world inside your PC
VObs geometry
not a warehouse
not a hierarchy
not a peer-to-peer system
small set of service centres
and large population of end users
consistent with market model or centrally planned model for DCs
the VObs
[diagram labels: application; job; results; webservice (x6); anything; Registry; Workflow; GLUE; Certification; MySpace; standard semantics; publish WSDL; grid connected]
work needed
[diagram labels: application; job; results; gridservice (x6); anything; Registry; Workflow; GLUE; AstroPass; MySpace; pooled resource; standard semantics]
[diagram labels: TOOLS; STANDARDS; INFRASTRUCTURE; TECHNOLOGY RESEARCH; DATA SERVICES (access and analysis); INF. UPTAKE; DATA PIPELINES; ontology; PHYSICAL GRID]
who does what ?
pipelines : DCs
physical grid : DCs but we help
infrastructure uptake : DCs but we help
data services : DCs but we help
technology research : AstroGrid
infrastructure : AstroGrid
standards : IVOA including AstroGrid
tools : various but we collaborate
publishing metaphor
facilities are authors
data centres are publishers
VObs portals are shops
end-users are readers
VObs infrastructure is the distribution system.
Data Centre Alliance (DCA)
AstroGrid-2 and Euro-VO proposals both propose forming a DCA
AG2 requested money for baseline DC support VO uptake
    not funded
VEGA proposed joint pipeline/archive development for VISTA, Eddington, GAIA
    partly funded
data curation at WFAU
WFAU approach
science products not raw data
survey or large project datasets
live data sets not historical archive
not a one-stop shop
focus on improved service
    query interface, data mining
work with local computer scientists
    e.g. junk detection algorithms, XML compression methods
WFAU now
Legacy photographic atlas
    17,000 plates
SuperCOSMOS Science Archive
    all-sky pixel map and object catalogue
    SQL interface
6dF redshift survey
    spectra and image thumbnails
    SQL interface
SDSS mirror
WFAU next
WFCAM Science Archive : 2005, 20TB/yr
    deep IR sky survey : pixels and catalogues
    collaboration with CASU and JAC
    2MASS, USNO-B, SDSS attached
VISTA : 2007, 100TB/yr
    IR and maybe optical
considering : RAVE, JSA, GAIA
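
To put the yearly volumes in context, the sketch below estimates the sustained bit rate needed to ship one year of output within one year. It is illustrative arithmetic only, assuming decimal terabytes, a 365-day year, and a perfectly steady transfer; the `sustained_mbps` helper is a name introduced here, not something from the talk.

```python
# Sustained bit rate needed to move one year of survey output in one year.
# Assumptions: decimal terabytes, 365-day year, perfectly steady transfer.

SECONDS_PER_YEAR = 365 * 86_400


def sustained_mbps(terabytes_per_year: float) -> float:
    """Average rate in Mbps needed to transfer the given yearly volume."""
    bits_per_year = terabytes_per_year * 1e12 * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e6


# Yearly volumes quoted on the slide.
rates = {
    "WFCAM": sustained_mbps(20),
    "VISTA": sustained_mbps(100),
}

for survey, mbps in rates.items():
    print(f"{survey}: {mbps:.1f} Mbps sustained")
```

Under these assumptions WFCAM needs roughly 5 Mbps around the clock and VISTA roughly 25 Mbps, against the ~10 Mbps end-to-end bandwidth quoted earlier in the talk.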
Edinburgh e-Science
Edinburgh activities
School of Informatics
    AI, databases, DM algorithms, ...
National e-Science Centre
EPCC
National Digital Curation Centre
Digital Curation Centre
funded by JISC and EPSRC
partners : Edinburgh, Glasgow, CLRC, UKOLN
not a warehouse ...
research in curation technology
development of standard protocols and policies
advice, support, training, best practice
Digital Curation Structure
[diagram labels: Producer; Consumer; Management; Ingest; Archival Storage; Data Management; Access; Administration; Preservation Planning]
From CCSDS, 2001
Lord, Macdonald
. . . + curation
[diagram labels: Scientist; Research Process; Primary data; Secondary (derived) data; Tertiary data for publication; Patent data; Web Content; Research based on data; Publication Process; Pre-print; Peer Review; Primary publication; Secondary publication; Tertiary publication; Publication Archives; Library - Peers - Public - Industry; Curation; Curator; Curation Process; Archived data; Data repositories; Metadata]