11 Sep 2006 NVO Summer School 2006 1 Managing data in the VO Matthew J. Graham CACR/Caltech THE US NATIONAL VIRTUAL OBSERVATORY


11 Sep 2006

NVO Summer School 2006 1

Managing data in the VO

Matthew J. Graham, CACR/Caltech

THE US NATIONAL VIRTUAL OBSERVATORY


The importance of data

• Data is the raison d’être of the VO
• LSST is the data source nonpareil
  – data rates of 540 MB/s, ~16 TB in 8 hrs
  – final archive > 3 PB of data
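As a quick check of the figures above, this small sketch multiplies the quoted rate by an 8-hour night. It assumes decimal units (1 TB = 1,000,000 MB), which the slide does not state.

```python
# Check of the quoted LSST figures: 540 MB/s sustained over 8 hours.
# Assumption: decimal units, 1 TB = 1,000,000 MB (not stated on the slide).
RATE_MB_PER_S = 540
HOURS = 8

seconds = HOURS * 3600
total_mb = RATE_MB_PER_S * seconds
total_tb = total_mb / 1_000_000

print(f"{total_tb:.2f} TB per {HOURS}-hour night")
```

Under that assumption the total is a little over 15.5 TB, in line with the rounded ~16 TB on the slide.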

VO Wheel™

• Well-established ways of handling distributed data:
  – SRB
  – PVFS
  – OGSA-DAI


Requirements

• A distributed storage mechanism that allows easy reference to data without concerns about physical location.

• Primary use cases:
  – User wants to easily publish and share own data
  – Data need to reside close to computation nodes
• Data use cases:
  – Client has data:
    • stored locally: transfers it to service
    • stored locally: service retrieves it
    • stored elsewhere: service retrieves it
  – Service generates data:
    • stores it locally: notifies client of location
    • transfers it to the client’s local store
    • transfers it to a client-designated store


Logical architecture

• User view
• Logical namespace
• Physical storage


VOSpace

• Provides a uniform interface to existing or new data storage locations (Facade pattern)

• Structured/unstructured data both first level
• A peer network of VOSpace servers
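The Facade idea can be illustrated with a minimal sketch: one object exposes a uniform put/get interface over several storage backends, and callers refer to data by logical name only. All class and method names here (`StorageBackend`, `InMemoryBackend`, `SpaceFacade`) are hypothetical and are not part of any VOSpace interface.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface that each wrapped storage system implements."""

    @abstractmethod
    def read(self, name: str) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a real store (local disk, SRB, a database, ...)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def read(self, name: str) -> bytes:
        return self._blobs[name]

    def write(self, name: str, data: bytes) -> None:
        self._blobs[name] = data


class SpaceFacade:
    """Single entry point: callers use logical names, never backend details."""

    def __init__(self) -> None:
        self._backends: dict[str, StorageBackend] = {}
        self._location: dict[str, str] = {}  # logical name -> backend label

    def register(self, label: str, backend: StorageBackend) -> None:
        self._backends[label] = backend

    def put(self, name: str, data: bytes, label: str) -> None:
        self._backends[label].write(name, data)
        self._location[name] = label

    def get(self, name: str) -> bytes:
        # The caller never learns which backend holds the data.
        return self._backends[self._location[name]].read(name)


space = SpaceFacade()
space.register("disk", InMemoryBackend())
space.register("archive", InMemoryBackend())
space.put("mydata1", b"<VOTABLE/>", label="archive")
print(space.get("mydata1"))
```

The `_location` map plays the role of the logical namespace: moving data between backends would only change that map, not the name the user sees.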


Data structures - I

• Each data object is represented as a node:
    <node/>
• Nodes are identified by a vos://[service]/[name] identifier:
    <node uri="vos://nvo.caltech!vospace/mydata1"/>
  – Why not ivo://nvo.caltech/vospace/mydata1?
  – RFC 2396: "/" in a URI implies hierarchy
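A minimal sketch of splitting such an identifier into its service and name parts, using the standard-library URL parser. The function name `split_vos_uri` is hypothetical, and the mapping of "!" back to "/" to recover an ivo:// service identifier is an assumption suggested by the question on the slide.

```python
from urllib.parse import urlsplit


def split_vos_uri(uri: str) -> tuple[str, str]:
    """Return (service identifier, node name) for a vos:// URI.

    Assumption: the "!" in the authority stands in for the "/" of the
    service's ivo:// identifier.
    """
    parts = urlsplit(uri)
    if parts.scheme != "vos":
        raise ValueError(f"not a vos:// URI: {uri!r}")
    service = "ivo://" + parts.netloc.replace("!", "/")
    name = parts.path.lstrip("/")
    return service, name


service, name = split_vos_uri("vos://nvo.caltech!vospace/mydata1")
print(service)
print(name)
```

Because the service part contains no "/", everything after the first "/" is unambiguously the node name.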


Data structures - II

• Each node contains a map of key:value properties:
    <node uri="vos://nvo.caltech!vospace/mydata1">
      <properties>
        <property uri="ivo://net.ivoa.vospace/properties/create.date">2006-09-11T13:35:51Z</property>
      </properties>
    </node>
• There are currently four types of node:
    <node/>
    <node xsi:type="vos:DataNode"/>
    <node xsi:type="vos:UnstructuredDataNode"/>
    <node xsi:type="vos:StructuredDataNode"/>

[Diagram: node types Node, DataNode, UnstructuredDataNode, StructuredDataNode; annotation readonly="true"]
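The property map on this slide can be read with the standard-library XML parser. This is a sketch only: the helper name `node_properties` is hypothetical, and the sample document omits any namespace declarations a real service response might carry.

```python
import xml.etree.ElementTree as ET

NODE_XML = """
<node uri="vos://nvo.caltech!vospace/mydata1">
  <properties>
    <property uri="ivo://net.ivoa.vospace/properties/create.date">2006-09-11T13:35:51Z</property>
  </properties>
</node>
"""


def node_properties(xml_text: str) -> dict[str, str]:
    """Map each property URI to its text value."""
    node = ET.fromstring(xml_text.strip())
    return {
        prop.get("uri"): (prop.text or "").strip()
        for prop in node.findall("./properties/property")
    }


props = node_properties(NODE_XML)
for uri, value in props.items():
    print(uri, "=", value)
```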


Data structures - III

• Data nodes contain a list of data views (formats) that the node can accept and provide:
    <node xsi:type="vos:UnstructuredDataNode"
          uri="vos://nvo.caltech!vospace/mydata1">
      …
      <views>
        <accepts>
          <view uri="ivo://net.ivoa.vospace/views/any"/>
        </accepts>
        <provides>
          <view uri="ivo://net.ivoa.vospace/views/votable-1.1"/>
        </provides>
      </views>
    </node>
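A client could check a node's views before sending data. The sketch below parses the accepts/provides lists and treats the `views/any` URI as a wildcard; that wildcard reading, the helper names `view_uris` and `accepts`, and the added `xmlns:xsi` declaration (needed so the sample parses) are assumptions of this example.

```python
import xml.etree.ElementTree as ET

# Treated as "accepts every format" in this sketch (assumption).
ANY_VIEW = "ivo://net.ivoa.vospace/views/any"

NODE_XML = """
<node xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:type="vos:UnstructuredDataNode"
      uri="vos://nvo.caltech!vospace/mydata1">
  <views>
    <accepts><view uri="ivo://net.ivoa.vospace/views/any"/></accepts>
    <provides><view uri="ivo://net.ivoa.vospace/views/votable-1.1"/></provides>
  </views>
</node>
"""


def view_uris(node: ET.Element, direction: str) -> list[str]:
    """List the view URIs under <accepts> or <provides>."""
    return [v.get("uri") for v in node.findall(f"./views/{direction}/view")]


def accepts(node: ET.Element, view_uri: str) -> bool:
    """True if the node lists the view, or lists the wildcard view."""
    offered = view_uris(node, "accepts")
    return ANY_VIEW in offered or view_uri in offered


node = ET.fromstring(NODE_XML.strip())
print(accepts(node, "ivo://net.ivoa.vospace/views/votable-1.0"))
print(view_uris(node, "provides"))
```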


Data structures - IV

    <node xsi:type="vos:StructuredDataNode"
          uri="vos://nvo.caltech!vospace/mydata1">
      …
      <views>
        <accepts>
          <view uri="ivo://net.ivoa.vospace/views/votable-1.1"/>
        </accepts>
        <provides>
          <view uri="ivo://net.ivoa.vospace/views/votable-1.1" original="true"/>
          <view uri="ivo://net.ivoa.vospace/views/votable-1.0"/>
        </provides>
      </views>
    </node>
  – Why not use MIME type?
    • Easier to define new astronomy-specific data types


Data structures - V

• Data transfers are represented by transfers:
    <transfer/>
• The format of the data transfer is specified by a view:
    <transfer>
      <view uri="ivo://net.ivoa.vospace/views/votable-1.1"/>
    </transfer>
• The protocol of the data transfer is specified by a protocol:
    <transfer>
      …
      <protocols>
        <protocol uri="ivo://net.ivoa.vospace/protocols/http-get">
          <endpoint>http://192.168.1.33:7007/vospace</endpoint>
        </protocol>
      </protocols>
    </transfer>
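A transfer description of this shape can be assembled programmatically. The sketch below uses the standard-library ElementTree; the helper name `build_transfer` is hypothetical, and representing the endpoint as a child element with text content is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def build_transfer(view_uri: str, protocol_uri: str, endpoint: str) -> ET.Element:
    """Build a <transfer> with one view and one protocol endpoint."""
    transfer = ET.Element("transfer")
    ET.SubElement(transfer, "view", uri=view_uri)
    protocols = ET.SubElement(transfer, "protocols")
    protocol = ET.SubElement(protocols, "protocol", uri=protocol_uri)
    # Assumption: the endpoint URL is carried as element text.
    ET.SubElement(protocol, "endpoint").text = endpoint
    return transfer


transfer = build_transfer(
    view_uri="ivo://net.ivoa.vospace/views/votable-1.1",
    protocol_uri="ivo://net.ivoa.vospace/protocols/http-get",
    endpoint="http://192.168.1.33:7007/vospace",
)
print(ET.tostring(transfer, encoding="unicode"))
```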


Data structures - VI

• The space has a list of which protocols the service can accept to fetch data and what protocol endpoints it provides:
    <protocols>
      <accepts>
        <protocol uri="ivo://net.ivoa.vospace/protocols/ftp-get"/>
        <protocol uri="ivo://net.ivoa.vospace/protocols/ftp-put"/>
        <protocol uri="ivo://net.ivoa.vospace/protocols/http-get"/>
        <protocol uri="ivo://net.ivoa.vospace/protocols/http-put"/>
      </accepts>
      <provides>
        <protocol uri="ivo://net.ivoa.vospace/protocols/http-get"/>
        <protocol uri="ivo://net.ivoa.vospace/protocols/http-get"/>
      </provides>
    </protocols>
• Why not use protocol schemes?
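Given such a list, a client and service need to agree on a protocol. One simple approach, sketched here, is to take the first protocol in the client's preference order that the space accepts. The function name `first_accepted` and the `gridftp-get` URI are hypothetical and used only for illustration.

```python
from typing import Optional

PREFIX = "ivo://net.ivoa.vospace/protocols/"

# The <accepts> list from the slide.
SPACE_ACCEPTS = [PREFIX + p for p in ("ftp-get", "ftp-put", "http-get", "http-put")]


def first_accepted(requested: list[str], accepted: list[str]) -> Optional[str]:
    """Return the first requested protocol that the space accepts, else None."""
    allowed = set(accepted)
    for uri in requested:
        if uri in allowed:
            return uri
    return None


# Client prefers a protocol the space does not list, then falls back to HTTP.
wanted = [PREFIX + "gridftp-get", PREFIX + "http-get"]  # gridftp-get is made up
print(first_accepted(wanted, SPACE_ACCEPTS))
```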


Operations - I

• Service metadata:
  – getProtocols(): <protocols>
  – getViews(): <accepts>, <provides>
  – getProperties(): <accepts>, <provides>, <contains>
• Creating and manipulating nodes
  – createNode(<node>): <node>
  – deleteNode(uri): -
  – listNodes(token, limit, detail, <nodes>): token, limit, <nodes>
  – moveNode(uri, <node>): <node>
  – copyNode(uri, <node>): <node>
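To make the node operations concrete, here is a toy in-memory model. It is not the service interface: the class `InMemorySpace`, the snake_case method names, and the use of an integer offset as the paging token are all assumptions made for illustration, and there is no network layer, storage, or access control.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    uri: str
    properties: dict = field(default_factory=dict)


class InMemorySpace:
    """Toy model of the node operations listed above."""

    def __init__(self) -> None:
        self._nodes: dict[str, Node] = {}

    def create_node(self, node: Node) -> Node:
        if node.uri in self._nodes:
            raise ValueError(f"node already exists: {node.uri}")
        self._nodes[node.uri] = node
        return node

    def delete_node(self, uri: str) -> None:
        del self._nodes[uri]

    def list_nodes(self, offset: int = 0, limit: int = 10):
        """Return (next_offset_or_None, nodes); the offset acts as the token."""
        uris = sorted(self._nodes)
        page = [self._nodes[u] for u in uris[offset:offset + limit]]
        next_offset = offset + limit if offset + limit < len(uris) else None
        return next_offset, page

    def copy_node(self, uri: str, target: Node) -> Node:
        # Copy the source's properties onto the new node, then register it.
        target.properties = dict(self._nodes[uri].properties)
        return self.create_node(target)

    def move_node(self, uri: str, target: Node) -> Node:
        moved = self.copy_node(uri, target)
        self.delete_node(uri)
        return moved


base = "vos://nvo.caltech!vospace/"
space = InMemorySpace()
for short_name in ("a", "b", "c"):
    space.create_node(Node(base + short_name))
space.move_node(base + "a", Node(base + "d"))

token, page = space.list_nodes(limit=2)
print(token, [n.uri for n in page])
```

Passing the returned token back as `offset` fetches the next page, which mirrors the token/limit pair in the `listNodes` signature.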


Operations - II

• Manipulating node metadata
  – getNode(uri): <node>
  – setNode(<node>): <node>
• Transferring data
  – pushToVoSpace(<node>, <transfer>): <node>, <transfer>
  – pullToVoSpace(<node>, <transfer>): <node>
  – pushFromVoSpace(uri, <transfer>): -
  – pullFromVoSpace(uri, <transfer>): <transfer>
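The four transfer operations differ in two respects: whether data flows into or out of the space, and which side moves the bytes. The lookup below is one reading of the operation names and return values, not a statement from the slides; the function name `choose_operation` is hypothetical.

```python
# Key: (data flows into the space?, service moves the data itself?)
OPERATIONS = {
    (True, False): "pushToVoSpace",     # client sends data to an endpoint the space returns
    (True, True): "pullToVoSpace",      # space fetches data from a location the client names
    (False, False): "pullFromVoSpace",  # client fetches data from an endpoint the space returns
    (False, True): "pushFromVoSpace",   # space sends data to a location the client names
}


def choose_operation(into_space: bool, service_moves_data: bool) -> str:
    """Pick the transfer operation for a given direction and initiator."""
    return OPERATIONS[(into_space, service_moves_data)]


# Client holds a local file and uploads it itself.
print(choose_operation(into_space=True, service_moves_data=False))
# Data lives on a remote server and the space should fetch it.
print(choose_operation(into_space=True, service_moves_data=True))
```

Read this way, the two operations that return a `<transfer>` are the ones where the client needs an endpoint from the service before it can move data.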


Authentication and authorization

• WS-Security
• Access policies:
  – No access control
  – No authorization but authentication
  – Clients may not create or change nodes
  – Nodes are considered to be owned by the user who created them.


Forthcoming attractions

• Containers
• Links
• Asynchronous transfers
• Querying
• Replicas


Federation by links