Autodiscovery or The long tail of open data
![Page 1: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/1.jpg)
Autodiscovery
or
The long tail of open data
Christopher Gutteridge
University of Southampton
& data.ac.uk
![Page 2: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/2.jpg)
Bragsheet
Christopher Gutteridge - @cgutteridge
• Previously: Lead Developer of EPrints
(Open access research repository software).
• “Linked Open Data Architect” for University of Southampton.
(or whatever we’re currently calling doing LOD stuff for an organisation)
• Benevolent technical dictator of data.ac.uk (recently deposed)
• Webmaster WWW2006
• Assistant Webmaster WWW2007, WWW2009
![Page 3: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/3.jpg)
Image Attributions:
• Backgrounds:
– http://www.fansshare.com/gallery/photos/14646865/abstract-background-brown-and-blue-circles/
– http://www.pptback.com/old-machine-gears-pptbackground.html
• Cliff leap pic: Justin De La Ornellas @ Flickr
• Train tracks: duncanh1 @ Flickr
• Lego bricks: rawdonfox @ Flickr
• Meccano Box: Lady alys @ Wikipedia
• Stickle Bricks: Simon Jobling @ Flickr
• Free Universal Construction Kit: F.A.T. Lab + Sy-Lab.
• Telescope: Brongaeh @ Flickr
• Pinata: Peasap @ Flickr
• Containers: l2f1 @ Flickr
![Page 4: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/4.jpg)
![Page 5: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/5.jpg)
Why don’t organisations
share data?
(and what stops them)
![Page 6: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/6.jpg)
We early adopters shared data because it’s cool.
We were not 100% clear on the benefits, but it looked like fun and might gain us reputation.
![Page 7: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/7.jpg)
Fear. Uncertainty. Doubt.
![Page 8: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/8.jpg)
Open Data Excuse Bingo
• Terrorists will use it
• We'll get spam
• It's too big
• It's not very interesting
• Thieves will use it
• I don't mind, but someone else might
• We will get too many enquiries
• Lawyers want a custom License
• There's no API
• Poor Quality
• There's already a project to...
• We might want to use it in a paper
• It's too complicated
• Data Protection
• People may misinterpret the data
• What if we want to sell it later
Don’t get depressed! Go here for antidotes: http://is.gd/odbingo
![Page 9: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/9.jpg)
Menu
Burger ….. £3.50
Chips ….. £1.50
≠
![Page 10: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/10.jpg)
Greater than the sum of
its parts
![Page 11: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/11.jpg)
Interoperable datasets
allow results that are
greater than the sum
of the parts…
![Page 12: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/12.jpg)
http://bus.southampton.ac.uk/
![Page 13: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/13.jpg)
![Page 14: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/14.jpg)
![Page 15: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/15.jpg)
![Page 16: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/16.jpg)
![Page 17: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/17.jpg)
![Page 18: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/18.jpg)
http://www.minecraftworldmap.com/worlds/xO3X4/full#/4469/64/-1806/-3/0/0
![Page 19: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/19.jpg)
data.southampton.ac.uk
![Page 20: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/20.jpg)
Discrete Facts
Statistics
![Page 21: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/21.jpg)
What I want from data
• Where am I going?
• How can I get there?
• Where can I get a coffee en route?
![Page 22: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/22.jpg)
![Page 23: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/23.jpg)
Why aren’t they using
our data?
![Page 24: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/24.jpg)
“If you build it, they will come.”
![Page 26: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/26.jpg)
Probability of open dataset reuse =
Value of dataset to audience
× Potential audience size
× Ease of discovery
× Ease of grasping the value of the dataset
× Ease of exploiting dataset
![Page 27: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/27.jpg)
Probability of open dataset reuse =
Value of dataset to audience
× Potential audience size
× Ease of discovery
× Ease of grasping the value of the dataset
× Ease of exploiting dataset
× Perceived quality & reliability
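The product form matters because a single near-zero factor drags the whole result down, however good the other factors are. A minimal Python sketch of that point, assuming each factor is scored on a 0 to 1 scale (the scale and the example scores are illustrative assumptions, not figures from the talk):

```python
from math import prod

# Factor names follow the slide; the 0-to-1 scoring scale is an assumption.
FACTORS = (
    "value_to_audience",
    "potential_audience_size",
    "ease_of_discovery",
    "ease_of_grasping_value",
    "ease_of_exploiting",
    "perceived_quality_and_reliability",
)


def reuse_score(scores: dict[str, float]) -> float:
    """Multiply all factor scores together; every factor must be present."""
    missing = [name for name in FACTORS if name not in scores]
    if missing:
        raise ValueError(f"missing factors: {missing}")
    return prod(scores[name] for name in FACTORS)


# Illustrative scores only: a valuable dataset that nobody can find,
# compared with the same dataset made easy to discover.
hidden = dict.fromkeys(FACTORS, 0.9) | {"ease_of_discovery": 0.0}
findable = dict.fromkeys(FACTORS, 0.9)

print(reuse_score(hidden))                          # 0.0
print(reuse_score(findable) > reuse_score(hidden))  # True
```

Improving the weakest factor therefore pays off more than polishing one that is already strong.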
![Page 28: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/28.jpg)
…Autodiscoverable
and interoperable data
can massively increase
the potential audience
![Page 29: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/29.jpg)
$ ./generate-world Demo --postcode PO381NL --size 250
![Page 31: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/31.jpg)
data.ac.uk
![Page 32: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/32.jpg)
• Automatically discovers equipment data from all .ac.uk sites
– 2769 websites
– 42 providing data
– 11,028 records
• Automation massively reduces staffing costs
• Low effort for institutions
– A third just provide a well-structured spreadsheet!
• Not a single-point-of-failure
![Page 33: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/33.jpg)
![Page 34: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/34.jpg)
UK National Equipment Portal
http://equipment.data.ac.uk
![Page 35: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/35.jpg)
UNIQUIP
| Column Heading | Required |
|---|---|
| Type | No |
| Name, Description | At least one of these fields must be completed. |
| Related Facility ID | No |
| Technique (:cpv) or (:N8) | No |
| Location | No |
| Contact Name | No |
| Contact Telephone, Contact URL, Contact Email | At least one of these fields must be completed. |
| Secondary Contact Name | No |
| Secondary Contact Telephone, Secondary Contact URL, Secondary Contact Email | At least one of these fields must be completed with second contact name. |
| ID | No |
| Photo | No |
| Department | No |
| Site Location | Yes |
| Building | No |
| Service Level | No |
| Web Address | No |
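Because the format is just a spreadsheet with a few "at least one of" rules, a provider can check a row before publishing it. The following is a rough Python sketch, not an official UNIQUIP validator: the column names come from the slide, while the reading of the secondary-contact rule (details are only demanded once a secondary contact name is given) is an assumption.

```python
# Field names are taken from the UNIQUIP column headings on the slide.
# The grouping rules below are one reading of the "Required" column.
ANY_OF_GROUPS = [
    ("Name", "Description"),
    ("Contact Telephone", "Contact URL", "Contact Email"),
]
SECONDARY_NAME = "Secondary Contact Name"
SECONDARY_DETAILS = (
    "Secondary Contact Telephone",
    "Secondary Contact URL",
    "Secondary Contact Email",
)
ALWAYS_REQUIRED = ("Site Location",)


def _filled(row: dict[str, str], field: str) -> bool:
    """A cell counts as completed if it holds non-whitespace text."""
    return bool((row.get(field) or "").strip())


def validate_row(row: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the row passes."""
    problems = []
    for field in ALWAYS_REQUIRED:
        if not _filled(row, field):
            problems.append(f"{field} is required")
    for group in ANY_OF_GROUPS:
        if not any(_filled(row, f) for f in group):
            problems.append("need at least one of: " + ", ".join(group))
    # Assumption: secondary contact details are only demanded when a
    # secondary contact name has been given.
    if _filled(row, SECONDARY_NAME) and not any(
        _filled(row, f) for f in SECONDARY_DETAILS
    ):
        problems.append("need at least one of: " + ", ".join(SECONDARY_DETAILS))
    return problems


# Made-up example rows, not real equipment records.
good = {"Name": "Mass spectrometer", "Contact Email": "lab@example.org",
        "Site Location": "Main campus"}
bad = {"Description": "Microscope", "Secondary Contact Name": "A. N. Other"}

print(validate_row(good))  # []
for problem in validate_row(bad):
    print(problem)
```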
![Page 36: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/36.jpg)
![Page 37: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/37.jpg)
Doin’ it on the cheap
![Page 39: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/39.jpg)
Ensuring a sustainable
service through
autodiscovery
![Page 40: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/40.jpg)
Sustainability via Autodiscovery
• How do we add new datasets?
• How are changes made?
• How do we know the data is open data?
![Page 41: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/41.jpg)
Sustainability via Autodiscovery
• Have a machine-readable document describing the institution and any open datasets (with licences)
• Place a link to it on the institution’s homepage
![Page 42: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/42.jpg)
/.well-known/openorg
http://www.soton.ac.uk/.well-known/openorg
or
<link rel="openorg" href="http://id.southampton.ac.uk/dataset/profile/latest">
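A crawler can check both locations with very little code. The sketch below is one possible approach, not the data.ac.uk implementation: it tries the well-known path first and falls back to scanning the homepage for the link element. The lookup order, the `fetch` helper and the example.ac.uk URLs are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin

WELL_KNOWN_PATH = "/.well-known/openorg"  # path shown on the slide


class _OpenorgLinkFinder(HTMLParser):
    """Collects the href of the first <link rel="openorg"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "openorg" in rels and attr.get("href"):
            self.href = attr["href"]


def find_openorg_link(html: str, base_url: str) -> Optional[str]:
    """Return the absolute OPD URL advertised in the page, if any."""
    finder = _OpenorgLinkFinder()
    finder.feed(html)
    return urljoin(base_url, finder.href) if finder.href else None


def discover_opd(homepage: str,
                 fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the URL of the organisation's profile document, or None.

    `fetch` is a hypothetical helper: it returns the body of a URL as text,
    or None if the request fails. The lookup order is an assumption.
    """
    well_known = urljoin(homepage, WELL_KNOWN_PATH)
    if fetch(well_known) is not None:
        return well_known
    page = fetch(homepage)
    return find_openorg_link(page, homepage) if page else None


# Offline demonstration with canned responses instead of real HTTP requests.
pages = {
    "http://www.example.ac.uk/":
        '<html><head><link rel="openorg" href="/opd/profile.ttl"></head></html>',
}
print(discover_opd("http://www.example.ac.uk/", pages.get))
```

Passing `fetch` in as a parameter keeps the discovery logic testable without network access; a real crawler would supply a function that performs the HTTP request.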
![Page 44: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/44.jpg)
What is an Organisation Profile Document?
An RDF document that describes the organisation:
– General information provided:
• Official name, postal address, contact phone number, the correct logo, physical location
– Links to the parts of the organisation:
• Admissions, Alumni, Freedom of Information, Complaints
– A semantic sitemap:
• Key pages such as jobs, news, events…
– Links to the organisation’s discoverable open data sets and APIs:
• The equipment dataset
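To make the shape of that information concrete, here is a small Python sketch that models it as plain dataclasses. It is not RDF and does not use the OPD vocabulary: every class and attribute name is hypothetical, chosen only to mirror the list above, and the example values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    title: str
    url: str
    licence: str  # URL or short name of the open licence; empty if undeclared


@dataclass
class OrganisationProfile:
    # All attribute names here are illustrative, not OPD vocabulary terms.
    name: str
    postal_address: str = ""
    phone: str = ""
    logo_url: str = ""
    location: str = ""
    parts: dict[str, str] = field(default_factory=dict)      # "Admissions" -> URL
    key_pages: dict[str, str] = field(default_factory=dict)  # "jobs" -> URL
    datasets: list[Dataset] = field(default_factory=list)

    def open_datasets(self) -> list[Dataset]:
        """Datasets that declare a licence, so a consumer knows they are open."""
        return [d for d in self.datasets if d.licence]


profile = OrganisationProfile(
    name="Example University",
    key_pages={"jobs": "http://www.example.ac.uk/jobs"},
    datasets=[
        Dataset("Equipment", "http://www.example.ac.uk/equipment.csv", "OGL"),
        Dataset("Draft data", "http://www.example.ac.uk/draft.csv", ""),
    ],
)
print([d.title for d in profile.open_datasets()])  # ['Equipment']
```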
![Page 45: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/45.jpg)
![Page 46: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/46.jpg)
![Page 47: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/47.jpg)
![Page 50: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/50.jpg)
Autodiscovery
• Dataset publicly available on website.
– Dataset has to be added manually, along with all the institution’s details, contacts etc.
– Requires staff time (especially if any dataset changes location).
• Organisation has an OPD linking to dataset.
– The OPD has to be added manually, but the dataset location and institution info are consumed directly from the OPD.
– Requires less staff time (as any changes made to the OPD are picked up automatically).
• Link to OPD from organisation’s home page.
– OPD autodiscovered, so the dataset is automatically added to the service.
– Requires no staff time (as data is autodiscovered).
![Page 51: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/51.jpg)
Never appeal to a man’s “better nature.” He may not have one.
Invoking his “self-interest” gives you more leverage.
- Robert Heinlein, “The Notebooks of Lazarus Long”
![Page 52: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/52.jpg)
Status Report – Contributors and data statistics
![Page 53: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/53.jpg)
| Requirement | Bronze | Silver | Gold |
|---|---|---|---|
| Data is on the internet and in an acceptable format. | ✔ | ✔ | ✔ |
| Description of dataset is provided by a remotely hosted OPD. | | ✔ | ✔ |
| The OPD is discovered via autodiscovery. | | | ✔ |
| The OPD/dataset has a recognised and supported open licence (e.g. CC0, ODCA or OGL). | | | ✔ |
| All items in the dataset are assigned an ID code which is unique within the assigning organisation. | | | ✔ |
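The criteria above are simple enough to score automatically. As a hedged sketch in Python (the field names are illustrative, and it assumes that two ticks mean Silver and Gold and a single tick means Gold only):

```python
from dataclasses import dataclass


@dataclass
class Submission:
    # One boolean per criterion listed above; the names are illustrative.
    online_in_acceptable_format: bool
    described_by_remote_opd: bool
    opd_autodiscovered: bool
    open_licence: bool
    unique_item_ids: bool


def medal(s: Submission) -> str:
    """Return the highest level whose requirements are all met."""
    if not s.online_in_acceptable_format:
        return "None"
    if not s.described_by_remote_opd:
        return "Bronze"
    if s.opd_autodiscovered and s.open_licence and s.unique_item_ids:
        return "Gold"
    return "Silver"


print(medal(Submission(True, False, False, False, False)))  # Bronze
print(medal(Submission(True, True, True, False, True)))     # Silver
print(medal(Submission(True, True, True, True, True)))      # Gold
```

Each level builds on the one below it, so a contributor can see exactly which missing step is holding them back.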
![Page 55: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/55.jpg)
![Page 56: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/56.jpg)
![Page 57: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/57.jpg)
Exploiting profile
documents
![Page 58: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/58.jpg)
Exploiting profile documents
• We’ve barely begun
• Let’s try a live demo…
![Page 59: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/59.jpg)
![Page 60: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/60.jpg)
![Page 61: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/61.jpg)
![Page 62: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/62.jpg)
Warning:
Metaphor mixing detected
![Page 63: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/63.jpg)
Needless heterogeneity means research doesn’t join up.
Aligning datasets every time costs too much.
Tools can’t be reused.
![Page 64: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/64.jpg)
So what do we do about it?
![Page 65: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/65.jpg)
![Page 66: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/66.jpg)
Building easy-to-use tools to cross between formats, platforms and paradigms is very specialist work.
The solutions need to be discoverable.
Just putting it on GitHub does not make a tool discoverable!
https://github.com/cgutteridge/
![Page 70: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/70.jpg)
Organisation Datasets
Well known formats available for:
• Events
• Publications
• News headlines
Nothing in common use for:
• Staff Expertise
• Programmes of Events
• Vacancies
• Organisational Structure
• Buildings, Rooms
• Points of service
• Products
– Food Menus
![Page 71: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/71.jpg)
![Page 72: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/72.jpg)
RDF or XML Vocabularies don’t solve the problem
by themselves.
You need:
Examples to copy.
Tools which consume and produce the format.
Online checking tools.
![Page 73: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/73.jpg)
A dataset should solve at least one use case.
Over-modelling is fun.
Stop it.
![Page 74: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/74.jpg)
• TODO:
• OPD DOCUMENTATION
![Page 75: Autodiscovery or The long tail of open data](https://reader034.vdocument.in/reader034/viewer/2022042907/5876bbaf1a28abad1a8b6ed3/html5/thumbnails/75.jpg)