Getting Started with the Talis Platform
Developer training session providing an overview of the core features and services of the Talis Platform. Includes a basic overview of REST and RDF.
Leigh Dodds
Platform Programme Manager
Talis
December 2008
http://creativecommons.org/licenses/by/2.0/uk/
shared innovation
Agenda
• Platform Overview
• Core Concepts
• Review of the RDF Model
• Managing binary data
• Managing structured metadata
• Exploring RDF data with SPARQL
• Extra Features
• Store Administration
• Summary
Platform Overview
Software as a Service
Multi-Tenant Data Storage Service
Unstructured Data Storage
e.g. binary files, including images, documents, etc
Structured Data Storage
RDF metadata
Access Control
All data is open (to read) by default
Configurable access options
Full-Text Searching and Querying
Standards Compliance
RDF, SPARQL, HTTP
Platform Architecture
Web API
Metabox
Contentbox
REST, RDF
Authentication & Authorization
Content Negotiation
Core Concepts
aka “The Science Bit”
REST
Representational State Transfer
Correct Use of HTTP
Resource-Centric API
Everything has a unique URI
Interact with resources using HTTP
GET = read
PUT = write
POST = update/modify
DELETE = delete
Use HTTP Response Codes
200 = OK
201 = Created (new resource)
202 = Accepted (for processing)
400 = Bad Request
500 = Server Error
Mime Types
Used to identify the content and meaning of the request and response body
Content Negotiation
Majority of services support multiple output options, list varies by resource
Accept header
output parameter
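A minimal sketch of header-based content negotiation from a Python client, using only the standard library. The store URL reuses the `http://api.talis.com/stores/space` base that appears in the configuration examples in these slides; the helper name `negotiated_request` is hypothetical, and the request is only built, not sent. The values accepted by the `output` parameter are not listed in the slides, so only the `Accept` header is shown.

```python
from urllib.request import Request

# Assumed store base URL (taken from the xml:base used in the slides).
STORE = "http://api.talis.com/stores/space"

def negotiated_request(path, mime_type):
    """Build (but do not send) a GET request asking for one representation."""
    return Request(STORE + path, headers={"Accept": mime_type}, method="GET")

req = negotiated_request("/meta", "application/rdf+xml")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```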
Our Service Checklist
Consistent URI structure
Every service has a human interface
Plain text error messages for easy debugging
Cacheable
…etc
Authentication
HTTP Digest Authentication
Authentication Example
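A minimal sketch of HTTP Digest authentication from a Python client, using the standard library's `HTTPDigestAuthHandler`. The store URL and the credentials are placeholders, and `digest_opener` is a hypothetical helper name; nothing is sent over the network here.

```python
import urllib.request

# Assumed store base URL (taken from the xml:base used in the slides).
STORE = "http://api.talis.com/stores/space"

def digest_opener(username, password):
    """Return an opener that answers HTTP Digest challenges for the store."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None means "use these credentials for any realm at this URL".
    passwords.add_password(None, STORE, username, password)
    handler = urllib.request.HTTPDigestAuthHandler(passwords)
    return urllib.request.build_opener(handler)

# Placeholder credentials for illustration only.
opener = digest_opener("example-user", "example-password")
```

Calling `opener.open(...)` on a store URL would then retry with a Digest `Authorization` header when the server replies with a 401 challenge.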
Authorization
By default, stores are world-readable and writable only by the Store owner
Customisable roles and privileges per-Store
Review of the RDF Model
Apollo 11 was launched from Cape Canaveral
Subject Predicate Object
<http://purl.org/net/schemas/space/spacecraft/apollo-11> <http://purl.org/net/schemas/space/launchsite>
<http://purl.org/net/schemas/space/launchsite/capecanaveral>.
space:spacecraft/apollo-11 space:launchsite
space:launchsite/capecanaveral.
space:spacecraft/apollo-11 space:launchsite space:launchsite/capecanaveral.
space:spacecraft/apollo-11 rdfs:label "Apollo 11".
space:launchsite/capecanaveral rdfs:label "Cape Canaveral".
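The statements above can be modelled as plain (subject, predicate, object) tuples. The following is a minimal sketch that is not tied to any RDF library; the tuple layout and the helper name `to_ntriples` are assumptions made for illustration, and the URIs simply expand the `space:` and `rdfs:` prefixes used on the slides.

```python
SPACE = "http://purl.org/net/schemas/space/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Each triple is (subject URI, predicate URI, object); the object is tagged
# as either ("uri", value) or ("literal", value).
TRIPLES = [
    (SPACE + "spacecraft/apollo-11", SPACE + "launchsite",
     ("uri", SPACE + "launchsite/capecanaveral")),
    (SPACE + "spacecraft/apollo-11", RDFS + "label", ("literal", "Apollo 11")),
    (SPACE + "launchsite/capecanaveral", RDFS + "label", ("literal", "Cape Canaveral")),
]

def to_ntriples(triple):
    """Serialize one triple as an N-Triples line (plain literals only)."""
    subject, predicate, (kind, value) = triple
    if kind == "uri":
        obj = f"<{value}>"
    else:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj} ."

for t in TRIPLES:
    print(to_ntriples(t))
```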
Benefits of RDF?
Good for Semi-structured Data
“Schema-Free”
Very Flexible
Extensible
New properties
New resources
New types of resource
New statements
Encourages Convergence
Reuse of vocabularies (i.e. properties)
Reuse of identifiers (i.e. talk about the same things)
Simplifies Data Integration and Aggregation
Shared identifiers
Common data model
Common query language
Common data formats
Several Different Ways to Serialize RDF
Optimized for different purposes
Turtle
Simple to read and hand-author
Used in SPARQL query language
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix space: <http://purl.org/net/schemas/space/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://purl.org/net/schemas/space/spacecraft/1969-059A>
    rdf:type <http://purl.org/net/schemas/space/Spacecraft> ;
    dc:description "Apollo 11 was…" ;
    space:agency "United States" .
RDF/XML
Best for data interchange
Harder to read
<rdf:RDF xmlns:j.0="http://xmlns.com/foaf/0.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:space="http://purl.org/net/schemas/space/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xml:base="http://purl.org/net/schemas/space">
  <rdf:Description rdf:about="http://purl.org/net/schemas/space/spacecraft/1969-059A">
    <dc:description>Apollo 11 was…</dc:description>
    <rdf:type rdf:resource="http://purl.org/net/schemas/space/Spacecraft"/>
    <space:agency>United States</space:agency>
  </rdf:Description>
</rdf:RDF>
The Content Box
Managing unstructured, binary data
Store any stream of binary data
Images, documents, Javascript, etc
Full HTTP Caching Support
ETags
Efficient retrieval
Conditional updates
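A minimal sketch of how ETags are typically used for efficient retrieval and conditional updates over HTTP. The `If-None-Match` and `If-Match` headers are standard HTTP; whether the Platform honours `If-Match` on writes is an assumption here, as are the item id and the helper names. The requests are only built, not sent.

```python
from urllib.request import Request

# Hypothetical item URL; the /items/<id> pattern follows the Contentbox URLs.
ITEM = "http://api.talis.com/stores/space/items/example-id"

def conditional_get(url, etag):
    """GET that lets the server reply 304 Not Modified if the ETag still matches."""
    return Request(url, headers={"If-None-Match": etag}, method="GET")

def conditional_put(url, body, content_type, etag):
    """PUT that should only succeed if the stored item still has this ETag (assumed)."""
    headers = {"Content-Type": content_type, "If-Match": etag}
    return Request(url, data=body, headers=headers, method="PUT")

read = conditional_get(ITEM, '"abc123"')
write = conditional_put(ITEM, b"new bytes", "text/plain", '"abc123"')
```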
Server or Client Assignment of Identifiers
Provides full control over how URIs are assigned
ContentBox URLs
• /storename/items– The Contentbox container
• /storename/items/<id>– An individual item
Adding Content
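A minimal sketch of adding an item to the Contentbox from Python. It assumes that a POST to the `/items` container lets the server assign the identifier and that a PUT to `/items/<id>` lets the client choose it; the slides mention both assignment styles but do not spell out the HTTP methods, so treat that mapping, the store URL, and the helper name `add_item` as assumptions. The requests are built but not sent.

```python
from urllib.request import Request

# Assumed store base URL (taken from the xml:base used in the slides).
STORE = "http://api.talis.com/stores/space"

def add_item(data, content_type, item_id=None):
    """Build a request that stores `data` in the Contentbox."""
    headers = {"Content-Type": content_type}
    if item_id is None:
        # Assumed: server-assigned identifier.
        return Request(STORE + "/items", data=data, headers=headers, method="POST")
    # Assumed: client-assigned identifier.
    return Request(f"{STORE}/items/{item_id}", data=data, headers=headers, method="PUT")

server_named = add_item(b"hello", "text/plain")
client_named = add_item(b"hello", "text/plain", item_id="greeting.txt")
```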
Deleting Content
Metadata for Contentbox Resources
Minimum is URI and ETag
Extract height & width of images
…more metadata extraction in future
The Meta Box
Managing structured metadata
Full RDF Data Storage
Create, read, update, delete RDF resources
Query RDF data
Configurable Full Text Indexing of RDF
Indexes updated whenever new metadata added
Versioned and Un-Versioned Updates
By submitting data to separate resources
Maintain audit trail
Can be Divided into Sub-Graphs
Separate access control options
Metabox URLs
• /storename/meta– The metabox
• /storename/meta/changesets– The collection of changesets associated with this metabox
• /storename/meta/graphs – The collection of sub-graphs
• /storename/meta/graphs/{id}– A sub-graph
• /storename/meta/graphs/{id}/changesets – The collection of changesets associated with a sub-graph
• /storename/services/sparql– SPARQL endpoint for metabox
• /storename/services/multisparql– SPARQL endpoint for querying across all sub-graphs
Storing RDF
POST application/rdf+xml
Changes saved immediately
Search indexing asynchronous
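A minimal sketch of storing RDF by POSTing an `application/rdf+xml` document to the metabox. The `/meta` path comes from the Metabox URLs; the store base URL and the tiny sample document are illustrative assumptions, and the request is built but not sent.

```python
from urllib.request import Request

# Assumed store base URL (taken from the xml:base used in the slides).
STORE = "http://api.talis.com/stores/space"

# Small illustrative RDF/XML payload describing one resource.
RDF_XML = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:space="http://purl.org/net/schemas/space/">
  <rdf:Description rdf:about="http://purl.org/net/schemas/space/spacecraft/1969-059A">
    <space:agency>United States</space:agency>
  </rdf:Description>
</rdf:RDF>"""

store_request = Request(
    STORE + "/meta",
    data=RDF_XML,
    headers={"Content-Type": "application/rdf+xml"},
    method="POST",
)
```

Because search indexing is asynchronous, a client written this way should not expect newly stored literals to be searchable immediately after the POST returns.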
Triples are Merged into Store
Can catch out the unwary
Updates happen through separate mechanism
Retrieving Metadata
/meta?about=…URI…
Can select RDF serialization
Updating Resources
POST application/vnd.talis.changeset+xml
ChangeSets
Vocabulary that specifies removals/additions to an RDF graph
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cs="http://purl.org/vocab/changeset/schema#">
  <cs:ChangeSet rdf:about="http://example.com/changesets#change">
    <cs:subjectOfChange rdf:resource="http://purl.org/net/schema/space/launch/1969-059"/>
    <cs:createdDate>2008-12-08T00:00:00Z</cs:createdDate>
    <cs:creatorName>Leigh Dodds</cs:creatorName>
    <cs:changeReason>More accurate launch time</cs:changeReason>
    <cs:removal>
      <rdf:Statement>
        <rdf:subject rdf:resource="http://purl.org/net/schema/space/launch/1969-059"/>
        <rdf:predicate rdf:resource="http://purl.org/net/schema/space/launched"/>
        <rdf:object>1969-07-16</rdf:object>
      </rdf:Statement>
    </cs:removal>
    <cs:addition>
      <rdf:Statement>
        <rdf:subject rdf:resource="http://purl.org/net/schema/space/launch/1969-059"/>
        <rdf:predicate rdf:resource="http://purl.org/net/schema/space/launched"/>
        <rdf:object>1969-07-16T13:32:00</rdf:object>
      </rdf:Statement>
    </cs:addition>
  </cs:ChangeSet>
</rdf:RDF>
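A changeset like the one above can be generated programmatically. The following is a minimal sketch using Python's `xml.etree.ElementTree`; the helper names are hypothetical, only literal-valued objects are handled, and the optional `rdf:about`, `cs:createdDate` and `cs:creatorName` properties are left out for brevity. Whether the Platform requires those properties is not stated in the slides.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CS = "http://purl.org/vocab/changeset/schema#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("cs", CS)

def _statement(parent, kind, subject, predicate, literal):
    """Append a cs:removal or cs:addition wrapping one reified statement."""
    wrapper = ET.SubElement(parent, f"{{{CS}}}{kind}")
    st = ET.SubElement(wrapper, f"{{{RDF}}}Statement")
    ET.SubElement(st, f"{{{RDF}}}subject", {f"{{{RDF}}}resource": subject})
    ET.SubElement(st, f"{{{RDF}}}predicate", {f"{{{RDF}}}resource": predicate})
    ET.SubElement(st, f"{{{RDF}}}object").text = literal

def build_changeset(subject, predicate, old_value, new_value, reason):
    """Return RDF/XML that replaces one literal value with another."""
    root = ET.Element(f"{{{RDF}}}RDF")
    cs = ET.SubElement(root, f"{{{CS}}}ChangeSet")
    ET.SubElement(cs, f"{{{CS}}}subjectOfChange", {f"{{{RDF}}}resource": subject})
    ET.SubElement(cs, f"{{{CS}}}changeReason").text = reason
    _statement(cs, "removal", subject, predicate, old_value)
    _statement(cs, "addition", subject, predicate, new_value)
    return ET.tostring(root, encoding="unicode")

changeset_xml = build_changeset(
    "http://purl.org/net/schema/space/launch/1969-059",
    "http://purl.org/net/schema/space/launched",
    "1969-07-16",
    "1969-07-16T13:32:00",
    "More accurate launch time",
)
print(changeset_xml)
```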
Versioned Updates
POST to /meta/changesets
Applies the update and stores the changeset for later retrieval
Batch Updates
Combine several changesets into a single POST
Linked together to define ordering
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cs="http://purl.org/vocab/changeset/schema#">
  <cs:ChangeSet rdf:about="http://example.com/changesets/1">
    <cs:subjectOfChange rdf:resource="http://purl.org/net/schema/space/launch/1969-059"/>
    <cs:changeReason>More accurate launch time</cs:changeReason>
    <cs:precedingChangeset rdf:resource="http://example.com/changesets/2"/>
    <!-- changes -->
  </cs:ChangeSet>
  <cs:ChangeSet rdf:about="http://example.com/changesets/2">
    <cs:subjectOfChange rdf:resource="http://purl.org/net/schema/space/launch/1969-059"/>
    <cs:precedingChangeset rdf:resource="http://example.com/changesets/3"/>
    <!-- changes -->
  </cs:ChangeSet>
  <cs:ChangeSet rdf:about="http://example.com/changesets/3">
    <cs:subjectOfChange rdf:resource="http://purl.org/net/schema/space/spacecraft/1969-059D"/>
    <!-- changes -->
    ...
  </cs:ChangeSet>
</rdf:RDF>
Data Extraction & Exploration with SPARQL
SPARQL
RDF query language; HTTP protocol; Results format
4 different forms of query
ASK
Test whether the graph contains some data of interest
# Was there a launch on 16th July 1969?
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
ASK WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date .
}
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
  </head>
  <boolean>true</boolean>
</sparql>
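A minimal sketch of issuing an ASK query and reading the boolean result from Python. The `query` parameter is defined by the SPARQL Protocol and the `/services/sparql` path comes from the Metabox URLs; the store base URL and the helper names `ask_url` and `parse_ask` are assumptions. The sample response is the one shown above, so the parsing runs without a network call.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed store base URL (taken from the xml:base used in the slides).
SPARQL_ENDPOINT = "http://api.talis.com/stores/space/services/sparql"
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

ASK_QUERY = """
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
ASK WHERE { ?launch space:launched "1969-07-16"^^xsd:date . }
"""

def ask_url(query):
    """Return a GET URL carrying the query in the standard `query` parameter."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})

def parse_ask(xml_text):
    """Extract the boolean from a SPARQL XML results document."""
    value = ET.fromstring(xml_text).findtext("sr:boolean", namespaces=NS)
    return value is not None and value.strip() == "true"

SAMPLE = (
    '<?xml version="1.0"?>'
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    "<head></head><boolean>true</boolean></sparql>"
)
print(parse_ask(SAMPLE))  # True
```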
DESCRIBE
Generate an RDF description of one or more resources
# Describe launch(es) that occurred on 16th July 1969
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DESCRIBE ?launch WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date .
}
# Describe spacecraft launched on 16th July 1969
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DESCRIBE ?spacecraft WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date .
  ?spacecraft space:launch ?launch .
}
CONSTRUCT
Create a custom RDF graph based on query criteria
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
  ?spacecraft foaf:name ?name ;
              space:agency ?agency ;
              space:mass ?mass .
}
WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date .
  ?spacecraft space:launch ?launch ;
              foaf:name ?name ;
              space:agency ?agency ;
              space:mass ?mass .
}
SELECT
SQL style result set retrieval
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?agency ?mass
WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date .
  ?spacecraft space:launch ?launch ;
              foaf:name ?name ;
              space:agency ?agency ;
              space:mass ?mass .
}
<?xml version="1.0"?>
<sparql xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="agency"/>
    <variable name="mass"/>
  </head>
  <results>
    <result>
      <binding name="name">
        <literal>Apollo 11 Command and Service Module (CSM)</literal>
      </binding>
      <binding name="agency">
        <literal>United States</literal>
      </binding>
      <binding name="mass">
        <literal>28801.0</literal>
      </binding>
    </result>
    <!-- more results -->
  </results>
</sparql>
…as XML
{
  "head": { "vars": [ "name", "agency", "mass" ] },
  "results": {
    "bindings": [
      {
        "name": { "type": "literal", "value": "Apollo 11 Command and Service Module (CSM)" },
        "agency": { "type": "literal", "value": "United States" },
        "mass": { "type": "literal", "value": "28801.0" }
      },
      {
        "name": { "type": "literal", "value": "Apollo 11 SIVB" },
        "agency": { "type": "literal", "value": "United States" },
        "mass": { "type": "literal", "value": "13300.0" }
      },
      {
        "name": { "type": "literal", "value": "Apollo 11 Lunar Module / EASEP" },
        "agency": { "type": "literal", "value": "United States" },
        "mass": { "type": "literal", "value": "15065.0" }
      }
    ]
  }
}
…as JSON
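A minimal sketch of turning a SPARQL JSON result set into plain Python dictionaries. The helper name `rows` is hypothetical; the sample data is a two-row excerpt of the JSON shown above.

```python
import json

def rows(results_json):
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    # A variable may be unbound in a given row, so check before reading it.
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in doc["results"]["bindings"]
    ]

SAMPLE = """
{ "head": { "vars": [ "name", "agency", "mass" ] },
  "results": { "bindings": [
    { "name":   { "type": "literal", "value": "Apollo 11 Command and Service Module (CSM)" },
      "agency": { "type": "literal", "value": "United States" },
      "mass":   { "type": "literal", "value": "28801.0" } },
    { "name":   { "type": "literal", "value": "Apollo 11 SIVB" },
      "agency": { "type": "literal", "value": "United States" },
      "mass":   { "type": "literal", "value": "13300.0" } }
  ] } }
"""

for row in rows(SAMPLE):
    print(row["name"], float(row["mass"]))
```

All values arrive as strings in this format, so numeric fields such as `mass` need an explicit conversion on the client side.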
Tour of Extra Features
Searching, browsing, augmentation
Searching
Full text index over RDF literals
Configurable indexing options
/items?query=[query]
  &max=[10]
  &offset=[0]
  &sort=[comma-separated fieldnames]
  &xsl=[XSLT stylesheet]
  &content-type=[mimetype for XSLT results]
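A minimal sketch of assembling a search URL from the parameters listed above. The parameter names come from the slide; the store base URL and the helper name `search_url` are assumptions, and the `xsl` and `content-type` parameters are left out for brevity.

```python
from urllib.parse import urlencode

# Assumed store base URL (taken from the xml:base used in the slides).
STORE = "http://api.talis.com/stores/space"

def search_url(query, max_results=10, offset=0, sort=None):
    """Return a full-text search URL; `sort` is an optional list of field names."""
    params = {"query": query, "max": max_results, "offset": offset}
    if sort:
        params["sort"] = ",".join(sort)
    return f"{STORE}/items?{urlencode(params)}"

print(search_url("lunar OR apollo"))
print(search_url("name:apollo", max_results=5, sort=["name", "agency"]))
```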
Query Syntax
• lunar
• luna*
• "apollo 11"
• lunar OR apollo
• name:apollo
• (lunar OR apollo) AND agency:united states
Query Results
RSS 1.0 feed
OpenSearch extensions (paging, relevance)
Full description of each resource
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:relevance="http://a9.com/-/opensearch/extensions/relevance/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
         xmlns:space="http://purl.org/net/schemas/space/">
  <channel rdf:about="…">
    <title>lunar</title>
    <link>…</link>
    <description>Results of a search for lunar on space</description>
    <items>
      <rdf:Seq rdf:about="urn:uuid:eae4ead8-ca6a-4b12-b714-fe631d38e447">
        <rdf:li resource="http://purl.org/net/schemas/space/spacecraft/LUNAR-A" />
      </rdf:Seq>
    </items>
    <os:startIndex>0</os:startIndex>
    <os:itemsPerPage>10</os:itemsPerPage>
    <os:totalResults>118</os:totalResults>
  </channel>
  <item rdf:about="http://purl.org/net/schemas/space/spacecraft/LUNAR-A">
    <title>Item</title>
    <link>http://purl.org/net/schemas/space/spacecraft/LUNAR-A</link>
    <relevance:score>1.0</relevance:score>
    <foaf:name>Lunar-A</foaf:name>
    <space:mass>520.0</space:mass>
    <space:internationalDesignator>LUNAR-A</space:internationalDesignator>
  </item>
</rdf:RDF>
Facetted Search
Similar to Amazon product search, etc
Group search results by specific fields
/services/facet?query=[query]
  &fields=[comma-separated fieldnames]
  &top=[10]
  &format=[xml|html]
<facet-results xmlns="http://schemas.talis.com/2007/facet-results#">
  <head>
    <query>name:luna*</query>
    <fields>agency</fields>
    <top>10</top>
    <output>xml</output>
  </head>
  <fields>
    <field name="agency">
      <term value="U.S.S.R" number="25" facet-uri="…" search-uri="…"/>
      <term value="United States" number="9" facet-uri="…" search-uri="…"/>
      <term value="Japan" number="1" facet-uri="…" search-uri="…"/>
      <term value="India" number="1" facet-uri="…" search-uri="…"/>
    </field>
  </fields>
</facet-results>
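A minimal sketch of reading facet counts out of a response shaped like the one above. The helper name `facet_counts` is hypothetical, and the sample omits the `facet-uri` and `search-uri` attributes, whose values are elided in the slides.

```python
import xml.etree.ElementTree as ET

NS = {"f": "http://schemas.talis.com/2007/facet-results#"}

def facet_counts(xml_text):
    """Return {field name: {term value: hit count}} from a facet-results document."""
    root = ET.fromstring(xml_text)
    return {
        field.get("name"): {
            term.get("value"): int(term.get("number"))
            for term in field.findall("f:term", NS)
        }
        # Only the top-level <fields> element holds results; the one inside
        # <head> just echoes the request.
        for field in root.findall("f:fields/f:field", NS)
    }

SAMPLE = """
<facet-results xmlns="http://schemas.talis.com/2007/facet-results#">
  <head><query>name:luna*</query><fields>agency</fields></head>
  <fields>
    <field name="agency">
      <term value="U.S.S.R" number="25"/>
      <term value="United States" number="9"/>
    </field>
  </fields>
</facet-results>
"""

print(facet_counts(SAMPLE))
```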
Augmentation
Annotate an RSS 1.0 feed against a store
Automatically add a description of each referenced resource
Store Administration
Job Control, Store Configuration
Field Predicate Map
Associate a short name with an RDF property
Properties in the field predicate map are indexed for searching
Short name used in query syntax, sort order, etc
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#"
         xmlns:frm="http://schemas.talis.com/2006/frame/schema#"
         xml:base="http://api.talis.com/stores/space">
  <bf:FieldPredicateMap rdf:about="/indexes/default/fpmaps/default">
    <frm:mappedDatatypeProperty>
      <rdf:Description rdf:about="/indexes/default/fpmaps/default#agency">
        <frm:property rdf:resource="http://purl.org/net/schema/space/agency"/>
        <frm:name>agency</frm:name>
      </rdf:Description>
    </frm:mappedDatatypeProperty>
  </bf:FieldPredicateMap>
</rdf:RDF>
Query Profile
Assign weightings to fields for searching
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#"
         xmlns:frm="http://schemas.talis.com/2006/frame/schema#"
         xml:base="http://api.talis.com/stores/space">
  <bf:QueryProfile rdf:about="">
    <bf:fieldWeight>
      <rdf:Description rdf:about="/indexes/default/queryprofiles/default#name">
        <bf:weight>10.0</bf:weight>
        <frm:name>name</frm:name>
      </rdf:Description>
    </bf:fieldWeight>
    <bf:fieldWeight>
      <rdf:Description rdf:about="/indexes/default/queryprofiles/default#agency">
        <bf:weight>5.0</bf:weight>
        <frm:name>agency</frm:name>
      </rdf:Description>
    </bf:fieldWeight>
  </bf:QueryProfile>
</rdf:RDF>
Job Control
Reindex, Reset, Snapshot, Restore
POST Job Request to /jobs
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#">
  <bf:JobRequest>
    <rdfs:label>Reset the data in my store</rdfs:label>
    <bf:jobType rdf:resource="http://schemas.talis.com/2006/bigfoot/configuration#ResetDataJob"/>
    <bf:startTime>2008-12-01T15:10:00Z</bf:startTime>
  </bf:JobRequest>
</rdf:RDF>
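A minimal sketch of building a job request like the one above and wrapping it in a POST to `/jobs`. The store base URL, the assumption that `/jobs` is relative to the store, the `application/rdf+xml` content type, and the helper name `job_request` are all assumptions; the request is built but not sent.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

# Assumed store base URL (taken from the xml:base used in the slides).
STORE = "http://api.talis.com/stores/space"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BF = "http://schemas.talis.com/2006/bigfoot/configuration#"
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("bf", BF)):
    ET.register_namespace(prefix, uri)

def job_request(label, job_type, start_time):
    """Return the RDF/XML bytes for a bf:JobRequest of the given type."""
    root = ET.Element(f"{{{RDF}}}RDF")
    job = ET.SubElement(root, f"{{{BF}}}JobRequest")
    ET.SubElement(job, f"{{{RDFS}}}label").text = label
    ET.SubElement(job, f"{{{BF}}}jobType", {f"{{{RDF}}}resource": BF + job_type})
    ET.SubElement(job, f"{{{BF}}}startTime").text = start_time
    return ET.tostring(root, encoding="utf-8")

body = job_request("Reset the data in my store", "ResetDataJob", "2008-12-01T15:10:00Z")
request = Request(
    STORE + "/jobs",
    data=body,
    headers={"Content-Type": "application/rdf+xml"},  # assumed content type
    method="POST",
)
```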
Jobs
Each job is a resource, with a URI
GET to monitor status, DELETE to remove
Summing Up
Summary, Additional Resources
The Talis Platform…
• Provides a standards compliant storage infrastructure for structured and unstructured metadata
• Uses RDF to support widest possible variety of data models and integration options
• Allows management of data assets through simple web APIs
• Offers a range of data extraction options including full-text searching, SPARQL, RSS augmentation
• Can be tailored to individual applications using the API
• Can be driven by scheduling jobs to perform data management tasks
• Is constantly evolving…
Additional Resources
• API Reference– http://n2.talis.com/wiki/Platform_API
• Mailing List– http://groups.google.com/group/n2-dev
• Blog– http://blogs.talis.com/n2/
Client Libraries (in various states of development)
• Moriarty– http://code.google.com/p/moriarty/
• Javascript/JQuery– http://n2.talis.com/wiki/Talis_jQuery_plugin
• Ruby Client– http://rubyforge.org/projects/talis-platform/
• Java Client– http://code.google.com/p/penry/