Getting Started with the Talis Platform

Post on 13-May-2015

Category: Education

DESCRIPTION

Developer training session providing an overview of the core features and services of the Talis Platform. Includes a basic overview of REST and RDF.

TRANSCRIPT

Getting Started with the Talis Platform

Leigh Dodds

Platform Programme Manager

Talis

December 2008

http://creativecommons.org/licenses/by/2.0/uk/

shared innovation

Agenda

• Platform Overview
• Core Concepts
• Review of the RDF Model
• Managing binary data
• Managing structured metadata
• Exploring RDF data with SPARQL
• Extra Features
• Store Administration
• Summary

Platform Overview

Software as a Service

Multi-Tenant Data Storage Service

Unstructured Data Storage

e.g. binary files, including images, documents, etc

Structured Data Storage

RDF metadata

Access Control

All data is open (to read) by default
Configurable access options

Full-Text Searching and Querying

Standards Compliance

RDF, SPARQL, HTTP

Platform Architecture

Web API

Metabox

Contentbox

REST, RDF
Authentication & Authorization
Content Negotiation

Core Concepts

aka “The Science Bit”

REST

Representational State Transfer

Correct Use of HTTP

Resource-Centric API

Everything has a unique URI

Interact with resources using HTTP

GET = read
PUT = write
POST = update/modify
DELETE = delete

Use HTTP Response Codes

200 = OK
201 = Created (new resource)
202 = Accepted (for processing)
400 = Bad Request
500 = Server Error

Mime Types

Used to identify content & meaning of request and response body

Content Negotiation

Majority of services support multiple output options, list varies by resource

Accept header
output parameter

Our Service Checklist

Consistent URI structure
Every service has human interface
Plain text error messages for easy debugging
Cacheable
…etc

Authentication

HTTP Digest Authentication

Authentication Example
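
A minimal sketch of how a client might prepare for HTTP Digest authentication using only Python's standard library. The store URI, user name and password are placeholders chosen for illustration; nothing is sent over the network here.

```python
from urllib.request import (
    HTTPDigestAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)

# Placeholder store URI and credentials (assumptions, not real values).
STORE_URI = "http://api.talis.com/stores/space"

def make_digest_opener(store_uri, user, password):
    """Build an opener that can answer HTTP Digest challenges for one store."""
    passwords = HTTPPasswordMgrWithDefaultRealm()
    # realm=None means: use these credentials for any realm under this URI.
    passwords.add_password(None, store_uri, user, password)
    return build_opener(HTTPDigestAuthHandler(passwords))

opener = make_digest_opener(STORE_URI, "example-user", "example-password")
# Calling opener.open(request) would retry with a Digest Authorization
# header if the server answers 401 with a Digest challenge.
```

Requests that only read from a world-readable store would not normally need this opener; it matters for writes and for stores with restricted access.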

Authorization

By default stores are world-readable, Store owner writable

Customisable roles and privileges per-Store

Review of the RDF Model

Apollo 11 was launched from Cape Canaveral

Subject Predicate Object

<http://purl.org/net/schemas/space/spacecraft/apollo-11> <http://purl.org/net/schemas/space/launchsite>

<http://purl.org/net/schemas/space/launchsite/capecanaveral>.

space:spacecraft/apollo-11 space:launchsite

space:launchsite/capecanaveral.

space:spacecraft/apollo-11 space:launchsite space:launchsite/capecanaveral.

space:spacecraft/apollo-11 rdfs:label "Apollo 11".

space:launchsite/capecanaveral rdfs:label "Cape Canaveral".
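
To make the subject/predicate/object model concrete, here is a small sketch that holds the three statements above as plain Python tuples. The helper names `expand` and `objects` are invented for this illustration; a real application would more likely use an RDF library.

```python
# Prefix table for the two namespaces used in the statements above.
PREFIXES = {
    "space": "http://purl.org/net/schemas/space/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand(name):
    """Expand a prefixed name such as 'space:launchsite' to a full URI."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local

# Each triple is (subject, predicate, object); labels are plain literals.
triples = [
    (expand("space:spacecraft/apollo-11"), expand("space:launchsite"),
     expand("space:launchsite/capecanaveral")),
    (expand("space:spacecraft/apollo-11"), expand("rdfs:label"), "Apollo 11"),
    (expand("space:launchsite/capecanaveral"), expand("rdfs:label"),
     "Cape Canaveral"),
]

def objects(subject, predicate):
    """Return every object stated for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

apollo = expand("space:spacecraft/apollo-11")
site = objects(apollo, expand("space:launchsite"))[0]
print(objects(site, expand("rdfs:label")))
```

Following the launch site URI from one statement to its label in another is the kind of graph traversal that shared identifiers make possible.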

Benefits of RDF?

Good for Semi-structured Data

“Schema-Free”
Very Flexible

Extensible

New properties
New resources
New types of resource
New statements

Encourages Convergence

Reuse of vocabularies (i.e. properties)
Reuse of identifiers (i.e. talk about the same things)

Simplifies Data Integration and Aggregation

Shared identifiers
Common data model
Common query language
Common data formats

Several Different Ways to Serialize RDF

Optimized for different purposes

Turtle

Simple to read and hand-author
Used in SPARQL query language

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix space: <http://purl.org/net/schemas/space/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://purl.org/net/schemas/space/spacecraft/1969-059A>
  rdf:type <http://purl.org/net/schemas/space/Spacecraft> ;
  dc:description "Apollo 11 was…" ;
  space:agency "United States" .

RDF/XML

Best for data interchange
Harder to read

<rdf:RDF xmlns:j.0="http://xmlns.com/foaf/0.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:space="http://purl.org/net/schemas/space/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xml:base="http://purl.org/net/schemas/space">
  <rdf:Description rdf:about="/spacecraft/1969-059A">
    <dc:description>Apollo 11 was…</dc:description>
    <rdf:type rdf:resource="http://purl.org/net/schemas/space/Spacecraft"/>
    <space:agency>United States</space:agency>
  </rdf:Description>
</rdf:RDF>

The Content Box

Managing unstructured, binary data

Store any stream of binary data

Images, documents, Javascript, etc

Full HTTP Caching Support

ETags
Efficient retrieval
Conditional updates

Server or Client Assignment of Identifiers

Provides full control over how URIs are assigned

ContentBox URLs

• /storename/items – The Contentbox container
• /storename/items/<id> – An individual item

Adding Content

Deleting Content
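
The sketch below shows one plausible shape for adding and deleting Contentbox items, built from the URL patterns and HTTP verbs described earlier. It assumes that a POST to the container lets the server assign the identifier and that a PUT to a chosen URI lets the client assign it; the store name `space` and the item id `logo.png` are illustrative. The requests are constructed but not sent.

```python
from urllib.request import Request

# Contentbox container for a store named "space" (store name is illustrative).
ITEMS_URI = "http://api.talis.com/stores/space/items"

def add_item(data, mime_type):
    """POST to the container; assumed to let the server assign the item URI."""
    return Request(ITEMS_URI, data=data, method="POST",
                   headers={"Content-Type": mime_type})

def put_item(item_id, data, mime_type):
    """PUT to a chosen URI; assumed to let the client assign the identifier."""
    return Request(f"{ITEMS_URI}/{item_id}", data=data, method="PUT",
                   headers={"Content-Type": mime_type})

def delete_item(item_id):
    """DELETE an individual item."""
    return Request(f"{ITEMS_URI}/{item_id}", method="DELETE")

# The bytes below stand in for real image data.
upload = put_item("logo.png", b"\x89PNG...", "image/png")
removal = delete_item("logo.png")
print(upload.get_method(), upload.full_url)
print(removal.get_method(), removal.full_url)
```

Per the response codes listed earlier, a successful create would be expected to return 201 Created.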

Metadata for Contentbox Resources

Minimum is URI and ETag
Extract height & width of images
…more metadata extraction in future

The Meta Box

Managing structured metadata

Full RDF Data Storage

Create, read, update, delete RDF resources
Query RDF data

Configurable Full Text Indexing of RDF

Indexes are updated whenever new metadata is added

Versioned and Un-Versioned Updates

By submitting data to separate resources
Maintain audit trail

Can be Divided into Sub-Graphs

Separate access control options

Metabox URLs

• /storename/meta – The metabox
• /storename/meta/changesets – The collection of changesets associated with this metabox
• /storename/meta/graphs – The collection of sub-graphs
• /storename/meta/graphs/{id} – A sub-graph
• /storename/meta/graphs/{id}/changesets – The collection of changesets associated with a sub-graph
• /storename/services/sparql – SPARQL endpoint for metabox
• /storename/services/multisparql – SPARQL endpoint for querying across all sub-graphs

Storing RDF

POST application/rdf+xml
Changes saved immediately
Search indexing asynchronous

Triples are Merged into Store

Can catch out the unwary
Updates happen through separate mechanism

Retrieving Metadata

/meta?about=…URI…
Can select RDF serialization
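
As a rough sketch, the two content negotiation routes mentioned earlier (Accept header and output parameter) could be applied to a metabox lookup as follows. The store name `space`, the Apollo 11 resource URI and the `json` value for the output parameter are assumptions made for illustration; the requests are built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Metabox for a store named "space" (store name is illustrative).
META_URI = "http://api.talis.com/stores/space/meta"
ABOUT = "http://purl.org/net/schemas/space/spacecraft/1969-059A"

def via_accept_header(mime_type):
    """Select the serialization with the standard HTTP Accept header."""
    url = META_URI + "?" + urlencode({"about": ABOUT})
    return Request(url, headers={"Accept": mime_type})

def via_output_parameter(fmt):
    """Select the serialization with the 'output' query parameter."""
    url = META_URI + "?" + urlencode({"about": ABOUT, "output": fmt})
    return Request(url)

by_header = via_accept_header("application/rdf+xml")
by_param = via_output_parameter("json")  # "json" is an assumed value
print(by_header.get_header("Accept"))
print(by_param.full_url)
```

Note that the resource URI must be percent-encoded when passed as the `about` value, which `urlencode` handles here.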

Updating Resources

POST application/vnd.talis.changeset+xml

ChangeSets

Vocabulary that specifies removals/additions to an RDF graph

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cs="http://purl.org/vocab/changeset/schema#">
  <cs:ChangeSet rdf:about="http://example.com/changesets#change">
    <cs:subjectOfChange rdf:resource="http://purl.org/net/schema/space/launch/1969-059"/>
    <cs:createdDate>2008-12-08T00:00:00Z</cs:createdDate>
    <cs:creatorName>Leigh Dodds</cs:creatorName>
    <cs:changeReason>More accurate launch time</cs:changeReason>
    <cs:removal>
      <rdf:Statement>
        <rdf:subject rdf:resource="http://purl.org/net/schema/space/launch/1969-059"/>
        <rdf:predicate rdf:resource="http://purl.org/net/schema/space/launched"/>
        <rdf:object>1969-07-16</rdf:object>
      </rdf:Statement>
    </cs:removal>
    <cs:addition>
      <rdf:Statement>
        <rdf:subject rdf:resource="http://purl.org/net/schema/space/launch/1969-059"/>
        <rdf:predicate rdf:resource="http://purl.org/net/schema/space/launched"/>
        <rdf:object>1969-07-16T13:32:00</rdf:object>
      </rdf:Statement>
    </cs:addition>
  </cs:ChangeSet>
</rdf:RDF>
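
Writing ChangeSet documents by hand is error-prone, so a client would probably generate them. The following is a minimal sketch, using Python's standard `xml.etree.ElementTree`, that produces a document of the same shape as the one above for the common case of replacing one literal value. The function names are invented for this example, and the optional `cs:createdDate`, `cs:creatorName` and `rdf:about` details are left out for brevity.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CS = "http://purl.org/vocab/changeset/schema#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("cs", CS)

def statement(parent, kind, subject, predicate, value):
    """Append a cs:removal or cs:addition holding one reified triple."""
    wrapper = ET.SubElement(parent, f"{{{CS}}}{kind}")
    stmt = ET.SubElement(wrapper, f"{{{RDF}}}Statement")
    ET.SubElement(stmt, f"{{{RDF}}}subject", {f"{{{RDF}}}resource": subject})
    ET.SubElement(stmt, f"{{{RDF}}}predicate", {f"{{{RDF}}}resource": predicate})
    ET.SubElement(stmt, f"{{{RDF}}}object").text = value

def replace_literal(subject, predicate, old, new, reason):
    """Build a changeset that swaps one literal value for another."""
    root = ET.Element(f"{{{RDF}}}RDF")
    change = ET.SubElement(root, f"{{{CS}}}ChangeSet")
    ET.SubElement(change, f"{{{CS}}}subjectOfChange",
                  {f"{{{RDF}}}resource": subject})
    ET.SubElement(change, f"{{{CS}}}changeReason").text = reason
    statement(change, "removal", subject, predicate, old)
    statement(change, "addition", subject, predicate, new)
    return ET.tostring(root, encoding="unicode")

body = replace_literal(
    "http://purl.org/net/schema/space/launch/1969-059",
    "http://purl.org/net/schema/space/launched",
    "1969-07-16", "1969-07-16T13:32:00",
    "More accurate launch time",
)
print(body)
```

The resulting string would be sent as the body of a POST with the `application/vnd.talis.changeset+xml` content type.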

Versioned Updates

POST to /meta/changesets
Applies the update and stores the changeset for later retrieval

Batch Updates

Combine several changesets into single POST
Linked together to define ordering

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cs="http://purl.org/vocab/changeset/schema#">
  <cs:ChangeSet rdf:about="http://example.com/changesets/1">
    <cs:subjectOfChange rdf:resource="http://purl.org/net/schema/space/launch/1969-059"/>
    <cs:changeReason>More accurate launch time</cs:changeReason>
    <cs:precedingChangeset rdf:resource="http://example.com/changesets/2"/>
    <!-- changes -->
  </cs:ChangeSet>
  <cs:ChangeSet rdf:about="http://example.com/changesets/2">
    <cs:subjectOfChange rdf:resource="http://purl.org/net/schema/space/launch/1969-059"/>
    <cs:precedingChangeset rdf:resource="http://example.com/changesets/3"/>
    <!-- changes -->
  </cs:ChangeSet>
  <cs:ChangeSet rdf:about="http://example.com/changesets/3">
    <cs:subjectOfChange rdf:resource="http://purl.org/net/schema/space/spacecraft/1969-059D"/>
    <!-- changes -->...
  </cs:ChangeSet>
</rdf:RDF>

Data Extraction & Exploration with SPARQL

SPARQL

RDF query language; HTTP protocol; Results format
4 different forms of query

ASK

Test whether the graph contains some data of interest

# Was there a launch on 16th July 1969?
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ASK WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date.
}

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head> </head>
  <boolean>true</boolean>
</sparql>
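
A client usually only needs the true/false answer from an ASK response. The sketch below pulls it out with the standard library; `parse_ask` is a name invented for this example and the sample document is the response shown above.

```python
import xml.etree.ElementTree as ET

SPARQL_NS = "http://www.w3.org/2005/sparql-results#"

def parse_ask(xml_text):
    """Return the boolean answer from a SPARQL XML results document."""
    root = ET.fromstring(xml_text)
    answer = root.find(f"{{{SPARQL_NS}}}boolean")
    if answer is None:
        raise ValueError("not an ASK result: no <boolean> element")
    return answer.text.strip() == "true"

response = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head> </head>
  <boolean>true</boolean>
</sparql>"""

print(parse_ask(response))
```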

DESCRIBE

Generate an RDF description of one or more resources

# Describe launch(es) that occurred on 16th July 1969
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

DESCRIBE ?launch WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date.
}

# Describe spacecraft launched on 16th July 1969
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

DESCRIBE ?spacecraft WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date.
  ?spacecraft space:launch ?launch.
}

CONSTRUCT

Create a custom RDF graph based on query criteria

PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT {
  ?spacecraft foaf:name ?name;
              space:agency ?agency;
              space:mass ?mass.
}
WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date.
  ?spacecraft space:launch ?launch;
              foaf:name ?name;
              space:agency ?agency;
              space:mass ?mass.
}

SELECT

SQL style result set retrieval

PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name ?agency ?mass
WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date.
  ?spacecraft space:launch ?launch;
              foaf:name ?name;
              space:agency ?agency;
              space:mass ?mass.
}
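
Under the SPARQL protocol a query is typically sent as the `query` parameter of a GET request. The sketch below encodes the SELECT query above into such a URL; it assumes a store named `space` and uses the `/services/sparql` endpoint pattern from the Metabox URLs. Nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# SPARQL endpoint for a store named "space" (store name is illustrative).
ENDPOINT = "http://api.talis.com/stores/space/services/sparql"

QUERY = """PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?agency ?mass
WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date .
  ?spacecraft space:launch ?launch ;
              foaf:name ?name ;
              space:agency ?agency ;
              space:mass ?mass .
}"""

def sparql_url(endpoint, query):
    """Encode a query into a GET URL using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = sparql_url(ENDPOINT, QUERY)
# Decoding the URL again should give back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

Percent-encoding matters here because the query text contains characters such as `#`, `?` and `<` that have their own meaning in a URL.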

<?xml version="1.0"?>
<sparql xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="agency"/>
    <variable name="mass"/>
  </head>
  <results>
    <result>
      <binding name="name">
        <literal>Apollo 11 Command and Service Module (CSM)</literal>
      </binding>
      <binding name="agency">
        <literal>United States</literal>
      </binding>
      <binding name="mass">
        <literal>28801.0</literal>
      </binding>
    </result>
    <!-- more results -->
  </results>
</sparql>

…as XML

{
  "head": { "vars": [ "name", "agency", "mass" ] },
  "results": {
    "bindings": [
      {
        "name": { "type": "literal", "value": "Apollo 11 Command and Service Module (CSM)" },
        "agency": { "type": "literal", "value": "United States" },
        "mass": { "type": "literal", "value": "28801.0" }
      },
      {
        "name": { "type": "literal", "value": "Apollo 11 SIVB" },
        "agency": { "type": "literal", "value": "United States" },
        "mass": { "type": "literal", "value": "13300.0" }
      },
      {
        "name": { "type": "literal", "value": "Apollo 11 Lunar Module / EASEP" },
        "agency": { "type": "literal", "value": "United States" },
        "mass": { "type": "literal", "value": "15065.0" }
      }
    ]
  }
}

…as JSON
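
The JSON form is convenient to consume directly. As a small sketch, the function below flattens a results document of the shape shown above into one plain dictionary per row; `rows` is a name chosen for this example and the sample is a trimmed copy of the response (first result only).

```python
import json

def rows(results_json):
    """Flatten SPARQL JSON results into one plain dict per result row."""
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    return [
        # A variable may be unbound in a given row, so check before reading.
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in doc["results"]["bindings"]
    ]

sample = """{
  "head": {"vars": ["name", "agency", "mass"]},
  "results": {"bindings": [
    {"name": {"type": "literal",
              "value": "Apollo 11 Command and Service Module (CSM)"},
     "agency": {"type": "literal", "value": "United States"},
     "mass": {"type": "literal", "value": "28801.0"}}
  ]}
}"""

for row in rows(sample):
    print(row["name"], row["agency"], float(row["mass"]))
```

Values arrive as strings, so numeric fields such as the mass need converting before any arithmetic.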

Tour of Extra Features

Searching, browsing, augmentation

Searching

Full text index over RDF literals
Configurable indexing options

/items?query=[query] &max=[10] &offset=[0] &sort=[comma-separated fieldnames] &xsl=[XSLT stylesheet] &content-type=[mimetype for XSLT results]
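
A short sketch of assembling a search URL from the parameters listed above. The store name `space` is illustrative, only a subset of the parameters is covered, and the defaults mirror the bracketed values in the pattern.

```python
from urllib.parse import urlencode

# Search is exposed on the items URI of a store named "space" (illustrative).
ITEMS_URI = "http://api.talis.com/stores/space/items"

def search_url(query, max_results=10, offset=0, sort=None):
    """Build a search URL using the query, max, offset and sort parameters."""
    params = {"query": query, "max": max_results, "offset": offset}
    if sort:
        # 'sort' takes comma-separated field names.
        params["sort"] = ",".join(sort)
    return ITEMS_URI + "?" + urlencode(params)

print(search_url("lunar"))
print(search_url('"apollo 11"', max_results=5, sort=["name"]))
```

Paging through a large result set would then be a matter of increasing `offset` by `max_results` on each request.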

Query Syntax

• lunar

• luna*

• “apollo 11”

• lunar OR apollo

• name:apollo

• (lunar OR apollo) AND agency:united states

Query Results

RSS 1.0 feed
OpenSearch extensions (paging, relevance)
Full description of each resource

<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:relevance="http://a9.com/-/opensearch/extensions/relevance/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
         xmlns:space="http://purl.org/net/schemas/space/">
  <channel rdf:about="…">
    <title>lunar</title>
    <link>…</link>
    <description>Results of a search for lunar on space</description>
    <items>
      <rdf:Seq rdf:about="urn:uuid:eae4ead8-ca6a-4b12-b714-fe631d38e447">
        <rdf:li resource="http://purl.org/net/schemas/space/spacecraft/LUNAR-A" />
      </rdf:Seq>
    </items>
    <os:startIndex>0</os:startIndex>
    <os:itemsPerPage>10</os:itemsPerPage>
    <os:totalResults>118</os:totalResults>
  </channel>
  <item rdf:about="http://purl.org/net/schemas/space/spacecraft/LUNAR-A">
    <title>Item</title>
    <link>http://purl.org/net/schemas/space/spacecraft/LUNAR-A</link>
    <relevance:score>1.0</relevance:score>
    <foaf:name>Lunar-A</foaf:name>
    <space:mass>520.0</space:mass>
    <space:internationalDesignator>LUNAR-A</space:internationalDesignator>
  </item>
</rdf:RDF>

Facetted Search

Similar to Amazon product search, etc
Group search results by specific fields

/services/facet?query=[query] &fields=[comma-separated fieldnames] &top=[10] &format=[xml|html]

<facet-results xmlns="http://schemas.talis.com/2007/facet-results#">
  <head>
    <query>name:luna*</query>
    <fields>agency</fields>
    <top>10</top>
    <output>xml</output>
  </head>
  <fields>
    <field name="agency">
      <term value="U.S.S.R" number="25" facet-uri="…" search-uri="…"/>
      <term value="United States" number="9" facet-uri="…" search-uri="…"/>
      <term value="Japan" number="1" facet-uri="…" search-uri="…"/>
      <term value="India" number="1" facet-uri="…" search-uri="…"/>
    </field>
  </fields>
</facet-results>
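
To drive a faceted navigation UI, a client mostly needs the term counts per field. The sketch below extracts them from a response of the shape shown above; `facet_counts` is a name invented here, and the sample is trimmed (the `facet-uri` and `search-uri` attributes are omitted).

```python
import xml.etree.ElementTree as ET

FACET_NS = "http://schemas.talis.com/2007/facet-results#"

def facet_counts(xml_text, field_name):
    """Return {term value: count} for one field of a facet response."""
    root = ET.fromstring(xml_text)
    counts = {}
    for field in root.iter(f"{{{FACET_NS}}}field"):
        if field.get("name") == field_name:
            for term in field.findall(f"{{{FACET_NS}}}term"):
                counts[term.get("value")] = int(term.get("number"))
    return counts

sample = """<facet-results xmlns="http://schemas.talis.com/2007/facet-results#">
  <fields>
    <field name="agency">
      <term value="U.S.S.R" number="25"/>
      <term value="United States" number="9"/>
    </field>
  </fields>
</facet-results>"""

print(facet_counts(sample, "agency"))
```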

Augmentation

Annotate an RSS 1.0 feed against a store
Automatically add a description of each referenced resource

Store Administration

Job Control, Store Configuration

Field Predicate Map

Associate a short name with an RDF property
Properties in the field predicate map are indexed for searching
Short name used in query syntax, sort order, etc

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#"
         xmlns:frm="http://schemas.talis.com/2006/frame/schema#"
         xml:base="http://api.talis.com/stores/space">
  <bf:FieldPredicateMap rdf:about="/indexes/default/fpmaps/default">
    <frm:mappedDatatypeProperty>
      <rdf:Description rdf:about="/indexes/default/fpmaps/default#agency">
        <frm:property rdf:resource="http://purl.org/net/schema/space/agency"/>
        <frm:name>agency</frm:name>
      </rdf:Description>
    </frm:mappedDatatypeProperty>
  </bf:FieldPredicateMap>
</rdf:RDF>

Query Profile

Assign weightings to fields for searching

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#"
         xmlns:frm="http://schemas.talis.com/2006/frame/schema#"
         xml:base="http://api.talis.com/stores/space">
  <bf:QueryProfile rdf:about="">
    <bf:fieldWeight>
      <rdf:Description rdf:about="/indexes/default/queryprofiles/default#name">
        <bf:weight>10.0</bf:weight>
        <frm:name>name</frm:name>
      </rdf:Description>
    </bf:fieldWeight>
    <bf:fieldWeight>
      <rdf:Description rdf:about="/indexes/default/queryprofiles/default#agency">
        <bf:weight>5.0</bf:weight>
        <frm:name>agency</frm:name>
      </rdf:Description>
    </bf:fieldWeight>
  </bf:QueryProfile>
</rdf:RDF>

Job Control

Reindex, Reset, Snapshot, Restore

POST Job Request to /jobs

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#">
  <bf:JobRequest>
    <rdfs:label>Reset the data in my store</rdfs:label>
    <bf:jobType rdf:resource="http://schemas.talis.com/2006/bigfoot/configuration#ResetDataJob"/>
    <bf:startTime>2008-12-01T15:10:00Z</bf:startTime>
  </bf:JobRequest>
</rdf:RDF>

Jobs

Each job is a resource, with a URI

GET to monitor status, DELETE to remove

Summing Up

Summary, Additional Resources

The Talis Platform…

• Provides a standards-compliant storage infrastructure for structured and unstructured data

• Uses RDF to support widest possible variety of data models and integration options

• Allows data assets to be managed through simple web APIs

• Offers a range of data extraction options including full-text searching, SPARQL, RSS augmentation

• Can be tailored to individual applications using the API

• Can be driven by scheduling jobs to perform data management tasks

• Is constantly evolving…

Additional Resources

• API Reference – http://n2.talis.com/wiki/Platform_API
• Mailing List – http://groups.google.com/group/n2-dev
• Blog – http://blogs.talis.com/n2/

Client Libraries (in various states of development)

• Moriarty – http://code.google.com/p/moriarty/
• Javascript/JQuery – http://n2.talis.com/wiki/Talis_jQuery_plugin
• Ruby Client – http://rubyforge.org/projects/talis-platform/
• Java Client – http://code.google.com/p/penry/
