LDS3: Applying Preservation Principals to Linked Data Systems
David Tarrant (@davetaz), Open Planets Foundation / University of Southampton
iPres2012, Toronto, October 2012
This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).


Page 1: LDS3: Applying Preservation Principals to Linked Data Systems

David Tarrant @davetaz

Open Planets Foundation / University of Southampton
iPres2012, Toronto, October 2012

This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).

Page 2: Present Day

Page 3: Presenting the REF

The Results Evaluation Framework

• 5 Tools (DROID, FITS, file, fido, Tika)
• 65 Versions (from 2008 to now)
• 1 Govdocs Corpus
• 1 Question…

Page 4:

How accurate are file format identification tools historically?

Page 5: PDF 1.4

Page 6: DOCX

Page 7: 9 Months Ago

Page 8: Why is Data Important?

• Data and Metadata are knowledge.
• Knowledge is power.
• Knowledge enables decision.
• Knowledge enables process.
• Knowledge empowers action.
• Knowledge enables us to say "because…"

Page 9: Processes

A Classic Flow Chart: Data is key to making decisions

[Flow chart labels: Process, Decision, DATA, DATA, DATA]

Page 10: Policy

A Preservation Flow Chart: Data is key to informing policy

[Flow chart labels: Process, Policy, DATA, DATA, DATA]

Page 11: Policy Data - Generated

• When?
• Who?
• What it affects?
• What action is taken?
• Why?

[Diagram label: Policy]

Page 12: Why?

• Because something said so?

• When?
• Who?
• What it affects?
• What action is taken?
• Why?

[Diagram labels: DATA, DATA, DATA]

Page 13: Case Study Example (Opinion)

• Due to format obsolescence, all Flash video files are to be migrated to H.264/AAC.
• Input data: a study on the proliferation of Flash and evidence of lacking support from the rights holder, Adobe.
• File B was created from File A a year ago because File A was identified as a Flash video file.
• Today, File A is identified as an Ogg video file.
• What has changed? Why? Does it affect me? Who generated the wrong information? Did they generate any other wrong information?

Page 14:

I Don’t Know!

Page 15: 6 Months Ago

Page 16: A Fact?

File#1 -- hasIdentification --> application/zip

Page 17: Provenance

• Tarrant, David and Carr, Leslie (2012) LDS3: Applying Digital Preservation Principals to Linked Data Systems. In: Ninth International Conference on Digital Preservation (iPres2012), Toronto, Canada.

Tim Berners-Lee -- Provides --> 5-Star Linked Data Guide

Page 18: Data!!!

• One fact.
• One document the fact comes from.
• One citation about the document's place of publication.
• Who, What, When and Where.
• Who they worked for and with.

Page 19: Named-Graph

• In Linked Data a document is called a named-graph.
• But these also get used for two purposes!!

File#1 -- hasIdentification --> application/zip

Page 20: The two uses of the named-graph

No. 1 – Data Publication

[Diagram labels: DATA, DATA, DATA]
Named-Graph: File#1 -- hasIdentification --> application/zip

Page 21: The two uses of the named-graph

No. 2 – Data Discovery/Query

[Diagram labels: DATA, DATA, DATA]
Named-Graph: File#1 -- hasIdentification --> application/zip
File#1 -- hasIdentification --> application/msword

Page 22: The two uses of the named-graph

No. 2 – Data Discovery/Query

[Diagram labels: Works For, Works For]
Named-Graph: File#1 -- hasIdentification --> application/zip
Named-Graph: File#1 -- hasIdentification --> application/zip
File#1 -- hasIdentification --> application/msword

Page 23: Quads

[Diagram labels: Query Graph, Source Graph 1, Source Graph 2]
File#1 -- hasIdentification --> application/zip
File#1 -- hasIdentification --> application/msword

After all, RDF is a graph model (RDF the spec, not the RDF/XML serialization).
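The quad idea on this slide can be sketched in a few lines of Python. This is only an illustration, not LDS3 code: the graph names `source-graph-1` and `source-graph-2`, the assignment of each identification to a graph, and the helper `match` are all assumptions, and plain tuples stand in for a real RDF store.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """A triple plus the name of the graph (document) it came from."""
    subject: str
    predicate: str
    obj: str
    graph: str


# Two source graphs make conflicting claims about the same file.
# Graph names and the claim-to-graph pairing are made up for this sketch.
QUADS = [
    Quad("File#1", "hasIdentification", "application/zip", "source-graph-1"),
    Quad("File#1", "hasIdentification", "application/msword", "source-graph-2"),
]


def match(quads, subject=None, predicate=None, graph=None):
    """Return the quads that match every field that is not None."""
    return [
        q for q in quads
        if (subject is None or q.subject == subject)
        and (predicate is None or q.predicate == predicate)
        and (graph is None or q.graph == graph)
    ]


# Querying across all graphs returns both claims, each still tagged with its source.
all_claims = match(QUADS, subject="File#1", predicate="hasIdentification")
# Restricting the query to one graph returns only what that document asserted.
one_claim = match(QUADS, subject="File#1", graph="source-graph-2")
```

The fourth element is what keeps the two contradictory identifications apart: without it, the query graph would simply hold two facts with no way to tell where each came from.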

Page 24: Quads

[Diagram labels: Query Graph, Source Graph 1, Source Graph 2]
File#1 -- hasIdentification --> application/zip
File#1 -- hasIdentification --> application/msword
-- usesTool --> File 5.04
-- usesTool --> File 5.07
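If the tool is recorded against the graph rather than against the file, asking "which tool said what?" becomes a join on the graph name. A minimal sketch under stated assumptions: the graph names are hypothetical, the pairing of File 5.04 with application/zip and File 5.07 with application/msword is assumed rather than read off the slide, and plain tuples stand in for a quad store.

```python
# Quads as (subject, predicate, object, graph). Graph names are hypothetical.
QUADS = [
    ("File#1", "hasIdentification", "application/zip", "graph-1"),
    ("File#1", "hasIdentification", "application/msword", "graph-2"),
    # Statements about the graphs themselves: which tool produced each one.
    # Storing them inside the graph they describe is an assumption of this sketch.
    ("graph-1", "usesTool", "File 5.04", "graph-1"),
    ("graph-2", "usesTool", "File 5.07", "graph-2"),
]


def identifications_by_tool(quads, file_id):
    """Map each tool to the identification it produced for file_id."""
    tool_for_graph = {s: o for s, p, o, g in quads if p == "usesTool"}
    return {
        tool_for_graph.get(g, "unknown"): o
        for s, p, o, g in quads
        if s == file_id and p == "hasIdentification"
    }


by_tool = identifications_by_tool(QUADS, "File#1")
```

With this layout, a changed identification can be traced to a specific tool version instead of being an unexplained contradiction.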

Page 25: Still with me…

• OK, so what about versioning?

File1/Identification/tool/file/version/5.03: File#1 -- hasIdentification --> University of Southampton
File1/Identification/tool/file/version/5.07: File#1 -- hasIdentification --> application/msword

Page 26:

Latest: /File1/Identification/tool/file/

File1/Identification/tool/file/version/5.03: File#1 -- hasIdentification --> University of Southampton
File1/Identification/tool/file/version/5.07: File#1 -- hasIdentification --> application/msword

[Link label: previous version]
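The versioning scheme on this slide, an unversioned "latest" URI plus previous-version links between versioned named graphs, can be sketched as a linked chain. This is an illustration only: the dictionary layout and the `history` helper are assumptions, and the URIs are normalised to a leading slash for consistency.

```python
# URIs follow the pattern shown on the slide.
BASE = "/File1/Identification/tool/file/"

# Each versioned named graph records the graph it replaced (None for the first).
PREVIOUS_VERSION = {
    BASE + "version/5.03": None,
    BASE + "version/5.07": BASE + "version/5.03",
}

# The unversioned URI always resolves to the newest version.
LATEST = {BASE: BASE + "version/5.07"}


def history(unversioned_uri):
    """Walk previous-version links from the latest graph back to the first."""
    chain = []
    current = LATEST[unversioned_uri]
    while current is not None:
        chain.append(current)
        current = PREVIOUS_VERSION[current]
    return chain


versions = history(BASE)
```

Publishing a new version then only requires adding one entry to `PREVIOUS_VERSION` and repointing `LATEST`; older graphs stay addressable at their own URIs.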

Page 27: 3 Months Ago

Page 28: www.LDS3.org

• A technical solution to all the complexity, with automatic:
  • Versioning
  • Linking
  • Annotation
  • Named-Graph Management
  • Query Management

Page 29: Demo

Page 30: www.LDS3.org

• CRUD
• SWORDv2 (Based Upon)
• OAuth Authentication

Page 31: In the paper

• Links between P2-Registry, PRONOM and LDS3
• Description of the LDS3 specification
• Overview of software in the LDS3 stack (hardly any of it is new)
• How LDS3 relates to Amazon S3
• More on named-graph versioning
• More on information and non-information resources

Page 32: 2 Months Ago

Page 35: Present Day

Page 36: Presenting the REF

The Results Evaluation Framework

• 5 Tools (DROID, FITS, file, fido, Tika)
• 65 Versions (from 2008 to now)
• 1 Govdocs Corpus
• 1 Question…

Page 37:

How accurate are file format identification tools historically?

Page 39: DOCX

http://data.openplanetsfoundation.org/ref/docx/

Page 40: Back To The Future

Page 41: The Future

• Get me the identification for a file as it would have been on 3rd October 2010.

GET /ref/?query="SELECT ?identification WHERE file = X" HTTP/1.1
Accept-Datetime: Sun, 03 Oct 2010 12:00:00 GMT
Accept: text/plain

application/zip
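The `Accept-Datetime` header used here is the one defined by the Memento framework for datetime negotiation. How a server might resolve it against a version history can be sketched as follows; this is not the REF or LDS3 implementation, the `HISTORY` entries and their dates are invented for illustration, and `identification_at` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical version history: (time the identification was recorded, value).
# The dates are invented for this sketch; the list must stay sorted by time.
HISTORY = [
    (datetime(2010, 1, 1, tzinfo=timezone.utc), "application/zip"),
    (datetime(2011, 6, 1, tzinfo=timezone.utc), "application/msword"),
]


def identification_at(accept_datetime_header):
    """Return the identification that was current at the requested time."""
    when = parsedate_to_datetime(accept_datetime_header)
    times = [recorded for recorded, _ in HISTORY]
    # Index of the last version recorded at or before the requested time.
    index = bisect_right(times, when) - 1
    if index < 0:
        return None  # nothing had been recorded yet
    return HISTORY[index][1]


result = identification_at("Sun, 03 Oct 2010 12:00:00 GMT")
```

Under these made-up dates, a request for 3 October 2010 falls between the two recorded versions, so the earlier identification is returned, matching the `application/zip` response on the slide.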


Page 42: LDS3: Applying Preservation Principals to Linked Data Systems

David Tarrant @davetaz

Open Planets Foundation / University of Southampton
iPres2012, Toronto, October 2012

This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).