11 Sep 2006
NVO Summer School 2006 1
Managing data in the VO
Matthew J. Graham
CACR/Caltech
THE US NATIONAL VIRTUAL OBSERVATORY
The importance of data
• Data is the raison d’être of the VO
• LSST is the data source nonpareil
– data rates of 540 MB/s (~16 TB in 8 hrs)
– final archive > 3 PB of data
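A quick check of the quoted LSST figures, as a sketch: it assumes decimal units (1 TB = 10^6 MB) and a constant rate over the 8 hours. The function name is illustrative only.

```python
# Back-of-envelope check of the nightly LSST data volume quoted on the slide.
RATE_MB_PER_S = 540   # data rate from the slide
NIGHT_HOURS = 8       # observing time from the slide


def nightly_volume_tb(rate_mb_per_s: float, hours: float) -> float:
    """Return the volume in decimal terabytes (assumes 1 TB = 1e6 MB)."""
    seconds = hours * 3600
    return rate_mb_per_s * seconds / 1e6


volume_tb = nightly_volume_tb(RATE_MB_PER_S, NIGHT_HOURS)
print(f"{volume_tb:.2f} TB per night")  # prints "15.55 TB per night"
```

This gives about 15.6 TB, consistent with the ~16 TB on the slide.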
VO Wheel™
• Well-established ways of handling distributed data:
– SRB
– PVFS
– OGSA-DAI
Requirements
• A distributed storage mechanism that allows easy reference to data without concerns about physical location.
• Primary use cases:
– User wants to easily publish and share own data
– Data need to reside close to computation nodes
• Data use cases:
– Client has data:
    • stored locally: transfers it to service
    • stored locally: service retrieves it
    • stored elsewhere: service retrieves it
– Service generates data:
    • stores it locally: notifies client of location
    • transfers it to the client’s local store
    • transfers it to a client-designated store
Logical architecture
• User view
• Logical namespace
• Physical storage
VOSpace
• Provides a uniform interface to existing or new data storage locations (Facade pattern)
• Structured/unstructured data both first level
• A peer network of VOSpace servers
Data structures - I
• Each data object is represented as a node:
<node/>
• Nodes are identified by a vos://[service]/[name] identifier:
<node uri="vos://nvo.caltech!vospace/mydata1"/>
– Why not ivo://nvo.caltech/vospace/mydata1?
– RFC 2396: “/” denotes hierarchy in a URI
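A minimal sketch of taking a vos:// identifier apart with the Python standard library. The function names are hypothetical. The mapping from the service part back to an ivo:// identifier by swapping "!" for "/" is an assumption suggested by the two identifiers on the slide, not something the slide states.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_vos_uri(uri: str) -> Tuple[str, str]:
    """Split vos://[service]/[name] into (service, name)."""
    parts = urlsplit(uri)
    if parts.scheme != "vos":
        raise ValueError(f"not a vos:// identifier: {uri}")
    return parts.netloc, parts.path.lstrip("/")


def service_ivo_id(uri: str) -> str:
    """Assumed mapping: '!' in the service part stands in for '/'."""
    service, _name = split_vos_uri(uri)
    return "ivo://" + service.replace("!", "/")


service, name = split_vos_uri("vos://nvo.caltech!vospace/mydata1")
print(service, name)
print(service_ivo_id("vos://nvo.caltech!vospace/mydata1"))
```

Because the service part contains no "/", a generic URI parser treats it as a single authority and leaves the path free for node names.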
Data structures - II
• Each node contains a map of key:value properties:
<node uri="vos://nvo.caltech!vospace/mydata1">
  <properties>
    <property uri="ivo://net.ivoa.vospace/properties/create.date">2006-09-11T13:35:51Z</property>
  </properties>
</node>
• There are currently four types of node:
<node/>
<node xsi:type="vos:DataNode"/>
<node xsi:type="vos:UnstructuredDataNode"/>
<node xsi:type="vos:StructuredDataNode"/>
[Figure: node type hierarchy (Node, DataNode, UnstructuredDataNode, StructuredDataNode), with the annotation readonly="true"]
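A short sketch of reading the property map out of a node document with Python's standard XML parser. It assumes the un-namespaced markup shown on the slide (a real service response would likely carry XML namespaces), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Node document copied from the slide, without namespaces.
NODE_XML = """
<node uri="vos://nvo.caltech!vospace/mydata1">
  <properties>
    <property uri="ivo://net.ivoa.vospace/properties/create.date">2006-09-11T13:35:51Z</property>
  </properties>
</node>
"""


def node_properties(xml_text: str) -> dict:
    """Return the node's properties as a {property URI: value} dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.get("uri"): (prop.text or "").strip()
        for prop in root.iter("property")
    }


props = node_properties(NODE_XML)
print(props["ivo://net.ivoa.vospace/properties/create.date"])
```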
Data structures - III
• Data nodes contain a list of data views (formats) that the node can accept and provide:
<node xsi:type="vos:UnstructuredDataNode"
      uri="vos://nvo.caltech!vospace/mydata1">
  …
  <views>
    <accepts>
      <view uri="ivo://net.ivoa.vospace/views/any"/>
    </accepts>
    <provides>
      <view uri="ivo://net.ivoa.vospace/views/votable-1.1"/>
    </provides>
  </views>
</node>
Data structures - IV
<node xsi:type="vos:StructuredDataNode" uri="vos://nvo.caltech!vospace/mydata1">
  …
  <views>
    <accepts>
      <view uri="ivo://net.ivoa.vospace/views/votable-1.1"/>
    </accepts>
    <provides>
      <view uri="ivo://net.ivoa.vospace/views/votable-1.1" original="true"/>
      <view uri="ivo://net.ivoa.vospace/views/votable-1.0"/>
    </provides>
  </views>
</node>
– Why not use MIME type?
    • Easier to define new astronomy specific data types
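One way a client might use the accepts/provides lists, sketched under assumptions: the function names are hypothetical, and treating the "any" view as a wildcard is my reading of its name rather than something stated on the slides.

```python
from typing import List, Optional

ANY_VIEW = "ivo://net.ivoa.vospace/views/any"
VOTABLE_10 = "ivo://net.ivoa.vospace/views/votable-1.0"
VOTABLE_11 = "ivo://net.ivoa.vospace/views/votable-1.1"


def node_accepts(accepts: List[str], view: str) -> bool:
    """True if data in `view` can be written to the node (assumed rule)."""
    return ANY_VIEW in accepts or view in accepts


def choose_view(provides: List[str], preferred: List[str]) -> Optional[str]:
    """Return the client's most preferred view that the node provides."""
    for view in preferred:
        if view in provides:
            return view
    return None


# The structured node above provides both VOTable versions.
provides = [VOTABLE_11, VOTABLE_10]
print(choose_view(provides, preferred=[VOTABLE_10, VOTABLE_11]))
print(node_accepts([ANY_VIEW], VOTABLE_11))
```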
Data structures - V
• Data transfers are represented by transfer elements:
<transfer/>
• The format of the data transfer is specified by a view:
<transfer>
  <view uri="ivo://net.ivoa.vospace/views/votable-1.1"/>
</transfer>
• The protocol of the data transfer is specified by a protocol:
<transfer>
  …
  <protocols>
    <protocol uri="ivo://net.ivoa.vospace/protocols/http-get">
      <endpoint>http://192.168.1.33:7007/vospace</endpoint>
    </protocol>
  </protocols>
</transfer>
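A sketch of assembling such a transfer document in Python. The helper name is hypothetical, namespaces are omitted as on the slide, and placing the endpoint as a child element of the protocol is an assumption about the intended markup.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_transfer(view_uri: str, protocol_uri: str,
                   endpoint: Optional[str] = None) -> ET.Element:
    """Build a <transfer> with one view and one protocol."""
    transfer = ET.Element("transfer")
    ET.SubElement(transfer, "view", uri=view_uri)
    protocols = ET.SubElement(transfer, "protocols")
    protocol = ET.SubElement(protocols, "protocol", uri=protocol_uri)
    if endpoint is not None:
        # Assumed layout: the endpoint URL is the text of a child element.
        ET.SubElement(protocol, "endpoint").text = endpoint
    return transfer


transfer = build_transfer(
    "ivo://net.ivoa.vospace/views/votable-1.1",
    "ivo://net.ivoa.vospace/protocols/http-get",
    endpoint="http://192.168.1.33:7007/vospace",
)
print(ET.tostring(transfer, encoding="unicode"))
```

A client that only requests a transfer would leave the endpoint out and let the service fill it in.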
Data structures - VI
• The space lists the protocols the service can use to fetch data (accepts) and the protocols for which it provides endpoints (provides):
<protocols>
  <accepts>
    <protocol uri="ivo://net.ivoa.vospace/protocols/ftp-get"/>
    <protocol uri="ivo://net.ivoa.vospace/protocols/ftp-put"/>
    <protocol uri="ivo://net.ivoa.vospace/protocols/http-get"/>
    <protocol uri="ivo://net.ivoa.vospace/protocols/http-put"/>
  </accepts>
  <provides>
    <protocol uri="ivo://net.ivoa.vospace/protocols/http-get"/>
    <protocol uri="ivo://net.ivoa.vospace/protocols/http-get"/>
  </provides>
</protocols>
• Why not use protocol schemes?
Operations - I
• Service metadata:
– getProtocols(): <protocols>
– getViews(): <accepts>, <provides>
– getProperties(): <accepts>, <provides>, <contains>
• Creating and manipulating nodes
– createNode(<node>): <node>
– deleteNode(uri): -
– listNodes(token, limit, detail, <nodes>): token, limit, <nodes>
– moveNode(uri, <node>): <node>
– copyNode(uri, <node>): <node>
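To make the node operations concrete, here is a toy in-memory stand-in, not the actual web service interface. Nodes are plain dicts, and the integer paging token in list_nodes is an assumption. The slides do not say how the token is defined.

```python
import copy


class ToySpace:
    """In-memory illustration of the node operations; hypothetical class."""

    def __init__(self):
        self._nodes = {}  # maps node uri -> node dict

    def create_node(self, node):
        uri = node["uri"]
        if uri in self._nodes:
            raise KeyError(f"node already exists: {uri}")
        self._nodes[uri] = copy.deepcopy(node)
        return copy.deepcopy(self._nodes[uri])

    def delete_node(self, uri):
        del self._nodes[uri]

    def list_nodes(self, token=0, limit=10):
        """Return (next_token, uris); next_token is None on the last page."""
        uris = sorted(self._nodes)
        page = uris[token:token + limit]
        next_token = token + limit if token + limit < len(uris) else None
        return next_token, page

    def copy_node(self, uri, target):
        node = copy.deepcopy(self._nodes[uri])
        node["uri"] = target["uri"]
        return self.create_node(node)

    def move_node(self, uri, target):
        moved = self.copy_node(uri, target)
        self.delete_node(uri)
        return moved


space = ToySpace()
for name in ("a", "b", "c"):
    space.create_node({"uri": f"vos://nvo.caltech!vospace/{name}"})
space.move_node("vos://nvo.caltech!vospace/a", {"uri": "vos://nvo.caltech!vospace/z"})
print(space.list_nodes(token=0, limit=2))
```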
Operations - II
• Manipulating node metadata
– getNode(uri): <node>
– setNode(<node>): <node>
• Transferring data
– pushToVoSpace(<node>, <transfer>): <node>, <transfer>
– pullToVoSpace(<node>, <transfer>): <node>
– pushFromVoSpace(uri, <transfer>): -
– pullFromVoSpace(uri, <transfer>): <transfer>
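The four transfer operations differ in the direction of the data and in which side moves the bytes. The sketch below encodes my reading of the operation names and return types (the two operations that return a <transfer> hand the client an endpoint to use). The type and helper are illustrative only.

```python
from typing import NamedTuple


class TransferOp(NamedTuple):
    name: str
    data_flows: str  # "into" or "out of" the space
    moved_by: str    # "client" or "service"


# Assumed interpretation of the four operations listed above.
OPS = [
    TransferOp("pushToVoSpace", "into", "client"),
    TransferOp("pullToVoSpace", "into", "service"),
    TransferOp("pushFromVoSpace", "out of", "service"),
    TransferOp("pullFromVoSpace", "out of", "client"),
]


def pick_operation(data_flows: str, moved_by: str) -> str:
    """Return the operation name for a direction and a data mover."""
    for op in OPS:
        if (op.data_flows, op.moved_by) == (data_flows, moved_by):
            return op.name
    raise ValueError(f"no operation for {data_flows!r}, {moved_by!r}")


# Client has data stored elsewhere and the service retrieves it:
print(pick_operation("into", "service"))
```

Under this reading, the "service retrieves it" use cases map to pullToVoSpace and "transfers it to a client-designated store" maps to pushFromVoSpace.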
Authentication and authorization
• WS-Security
• Access policies:
– No access control
– No authorization but authentication
– Clients may not create or change nodes
– Nodes are considered to be owned by the user who created them.
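The access policies could be modelled as alternatives a service chooses between. The sketch below is one possible interpretation, with hypothetical names. It covers only the question "may this user change this node?" and does not involve WS-Security itself.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    NO_ACCESS_CONTROL = 1     # anyone may change nodes
    AUTHENTICATION_ONLY = 2   # any authenticated user may change nodes
    READ_ONLY_CLIENTS = 3     # clients may not create or change nodes
    OWNER_BASED = 4           # only the creator of a node may change it


def may_change(policy: Policy, user: Optional[str], owner: Optional[str]) -> bool:
    """Decide whether `user` (None = unauthenticated) may change a node."""
    if policy is Policy.NO_ACCESS_CONTROL:
        return True
    if policy is Policy.AUTHENTICATION_ONLY:
        return user is not None
    if policy is Policy.READ_ONLY_CLIENTS:
        return False
    if policy is Policy.OWNER_BASED:
        return user is not None and user == owner
    raise ValueError(f"unknown policy: {policy}")


print(may_change(Policy.OWNER_BASED, user="alice", owner="alice"))
print(may_change(Policy.OWNER_BASED, user="bob", owner="alice"))
```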
Forthcoming attractions
• Containers
• Links
• Asynchronous transfers
• Querying
• Replicas
Federation by links