exploring museum collections online: the quantitative method
DESCRIPTION
A slightly adjusted (better described) version of the presentation given at Museums and the Web 2008.
TRANSCRIPT
Exploring Museum Collections Online: The Quantitative Method
Frankie Roberto, Science Museum
It all started at a ‘Mashup workshop’ here...
If you want to mash up museum object data, you’ll come across 3 problems:
1. Getting It - museum data isn’t always accessible.
2. Structure - the format of the data varies across museums.
3. Dodgy Data - data is often full of errors, typos and incomplete fields.
There are 3 traditionally-advocated solutions to these problems:
1. Getting It = APIs
2. Structure = Metadata standards
3. Dodgy Data = Hard Work (data entry)
These may be good solutions, but they’re all hypothetical, and rely on other people doing things sometime in the future.
I’m not interested in perfection, I just want data that’s...
Good Enough (Assez bon)
...and I’ll return to this slide if we start to get bogged down in a search for perfection.
So here are my alternative solutions to the 3 problems...
1. Getting It = Screen Scraping
Or...
1. Getting It = making a Freedom of Information request
2. Structure = Crude data mapping
[Diagram: Museum X, Museum Y and Museum Z each pass through (some logic) into A Simple Format]
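The "(some logic)" boxes can be sketched as one small mapping function per museum, each turning that museum's rows into the same flat record. This is only a rough illustration: the column names (`ObjectCategory`, `AcqYear`, `date_acquired`, `place_made` as used for Museum Y) and the sample rows are invented, and the target fields loosely follow the what/where/when questions used later in this talk.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]

# Hypothetical raw rows: each museum names and shapes its columns differently.
museum_x_rows: List[Row] = [{"ObjectCategory": "Firearm", "AcqYear": "1905"}]
museum_y_rows: List[Row] = [
    {"category": "Glass", "date_acquired": "1935-04", "place_made": "France"}
]


def map_museum_x(row: Row) -> Row:
    """'Some logic' for Museum X: rename columns, leave missing fields empty."""
    return {
        "museum": "Museum X",
        "what": row.get("ObjectCategory", ""),
        "where": "",  # no place column in this invented example
        "when": row.get("AcqYear", ""),
    }


def map_museum_y(row: Row) -> Row:
    """'Some logic' for Museum Y: rename columns and keep only the year."""
    return {
        "museum": "Museum Y",
        "what": row.get("category", ""),
        "where": row.get("place_made", ""),
        "when": row.get("date_acquired", "")[:4],
    }


def to_simple_format(sources: List[Tuple[List[Row], Callable[[Row], Row]]]) -> List[Row]:
    """Run every museum's rows through its own mapper into one flat list."""
    records: List[Row] = []
    for rows, mapper in sources:
        records.extend(mapper(row) for row in rows)
    return records


records = to_simple_format(
    [(museum_x_rows, map_museum_x), (museum_y_rows, map_museum_y)]
)
```

Adding another museum then means writing one more mapper rather than agreeing a shared standard first.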
3. Dodgy Data = Just say...
Good Enough (Assez bon)
I’m also interested in how we’re displaying museum collections data online.
This is the usual approach:
Or like this:
Or this...
...basically, an image with a description.
Which is all well and good, but doesn’t give you much of a sense of this...
The objects as a collection.
So I sent an FOI request to a bunch of museums.
...and this was the response:
Museum                      Granted?  Response
British Museum              No        “3-4 days and over £1000”
Imperial War Museum         No        none
Museum of London            No        none
National Gallery            No        none
National Maritime Museum    Yes       50 Excel spreadsheets
National Museums Liverpool  No        “More than 2.5 days”
National Portrait Gallery   No        “Information on website”
Natural History Museum      No        “70m specimens”
Royal Armouries             Yes       50MB CSV file
Sir John Soane’s Museum     Yes       Word document
Tate Galleries              No        none
Victoria & Albert Museum    Yes       2.9GB XML file on DVD
Wallace Collection          No        none
This is the data I was after:
Who?    collector/curator
What?   object type
Where?  country of origin
When?   year of acquisition
How?    acquisition method
And these are some examples of the data I got...
Who?
hoped-for data: “Henry Wellcome”, “curator x”
actual response: no data
What?
target data: tag-style descriptors in the form of “this is an x”
actual data: categories in the style of ‘Horological Instruments’, ‘Coins & Commemorative Medals’, ‘Jewellery’
What? Mapping process
Category                   Object-type tag
‘Firearm’                  firearm
‘Edged weapon’             weapon
‘Furniture’                item of furniture
‘Glass’                    piece of glassware
‘Merchant Ship Plans’      ship plan
‘Miscellaneous Antiques’   antique
Would it be better to try and parse the title?
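The mapping on this slide is small enough to write down as a lookup table. A minimal sketch, using only the six pairs from the slide; the function name, the case-insensitive matching and the choice to return `None` for unmapped categories are my own assumptions rather than how the prototype necessarily works.

```python
from typing import Optional

# Category -> object-type tag, taken from the slide.
CATEGORY_TO_TAG = {
    "Firearm": "firearm",
    "Edged weapon": "weapon",
    "Furniture": "item of furniture",
    "Glass": "piece of glassware",
    "Merchant Ship Plans": "ship plan",
    "Miscellaneous Antiques": "antique",
}

# Case-insensitive index so 'FURNITURE' and 'Furniture' land on the same tag.
_LOOKUP = {name.casefold(): tag for name, tag in CATEGORY_TO_TAG.items()}


def object_type_tag(category: str) -> Optional[str]:
    """Return the tag for a museum category, or None if it is not mapped yet."""
    key = category.strip().strip("‘’'\"").strip()
    return _LOOKUP.get(key.casefold())


def describe(category: str) -> str:
    """Render the 'this is an x' style descriptor, falling back to the raw category."""
    tag = object_type_tag(category)
    return f"this is a {tag}" if tag else f"uncategorised ({category})"
```

Unmapped categories are reported rather than guessed at, so the table can grow one museum at a time.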
Where?
target data: country-level location (list of countries from ISO website)
Where?
Some examples of the actual data:
[Bar chart: number of objects (axis from 0 to 2,250) for each of these place strings]
“Netherlands”
“Clydebank, Cumbria, England”
“Barrow, Cumbria, England”
“Germany”
“Glasgow, Strathclyde, Scotland”
“Newcastle upon Tyne, Tyne and Wear, England”
“Birkenhead, Merseyside, England”
“France”
“Sheernet Dockyard, Isle of Sheppey, Kent, England”
“Dockyard, Chatham, Kent, England”
“England”
“Dockyard, Portsmouth, Hampshire, England”
“Dockyard, Devonport, Devon, England”
“London, England”
Where?
National Maritime Museum: 1,496 unique place_made strings
Tricky cases:
“probably Germany”             “Asia”
“possibly: Chatham Dockyard”   “USSR”
“Far East”                     “Continental Europe”
“Arabia”                       “Italy or England”
“Persia”                       “Czechoslovakia”
This is where you have to ignore some data and say...
Good Enough (Assez bon)
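In code, "good enough" here might mean taking the last comma-separated part of a place string, looking it up in a country list, and simply skipping whatever doesn't match. A rough sketch, assuming a tiny hand-written lookup (a real one would be built from the ISO country list), and assuming that England and Scotland are rolled up to "United Kingdom" and that a "probably"/"possibly" prefix is dropped; those choices are mine and are not stated on the slides.

```python
import re
from typing import Optional

# Tiny illustrative lookup, keyed in lower case.
COUNTRY_LOOKUP = {
    "england": "United Kingdom",
    "scotland": "United Kingdom",
    "netherlands": "Netherlands",
    "germany": "Germany",
    "france": "France",
    "italy": "Italy",
}

# Leading hedges such as "probably " or "possibly: ".
HEDGE = re.compile(r"^(probably|possibly)\s*:?\s*", re.IGNORECASE)


def country_of(place_made: str) -> Optional[str]:
    """Return a country for a place string, or None when it is not good enough."""
    text = place_made.strip().strip("“”\"").strip()
    if " or " in text.lower():
        return None  # e.g. "Italy or England": ambiguous, so ignore it
    last_part = text.split(",")[-1].strip()
    last_part = HEDGE.sub("", last_part)
    return COUNTRY_LOOKUP.get(last_part.lower())
```

Regions ("Asia", "Far East") and former states ("USSR", "Czechoslovakia") fall through to `None` and are left out of the country counts.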
When?
target data: year of acquisition
(this should be easy)
actual data    year
1905           1905
1935-04        1935
12/09/06       2006 (possibly)
13/04/09       1909 (probably)
28/06/96       1996? 1986?
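The ambiguity in those two-digit years can be made explicit by returning every plausible year instead of forcing one. A minimal sketch, assuming only the date shapes shown on the slide (a four-digit year with an optional month, or dd/mm/yy) and assuming nothing was acquired after 2008; the function name and the cut-off are hypothetical.

```python
import re
from typing import List

LATEST_YEAR = 2008  # assumption: no acquisitions later than the year of this talk


def candidate_years(raw: str) -> List[int]:
    """Return the plausible acquisition years for a raw date string."""
    text = raw.strip()

    # Four-digit year, optionally followed by a month: "1905", "1935-04".
    match = re.fullmatch(r"(\d{4})(-\d{2})?", text)
    if match:
        return [int(match.group(1))]

    # dd/mm/yy: the century is unknown, so offer each one that is not in the future.
    match = re.fullmatch(r"\d{1,2}/\d{1,2}/(\d{2})", text)
    if match:
        yy = int(match.group(1))
        return [century + yy for century in (1900, 2000) if century + yy <= LATEST_YEAR]

    return []  # unrecognised shape: ignore it and move on
```

One candidate means the year is usable as-is; two means it needs a guess or a "(possibly)" flag; none means the value gets ignored.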
How?
anticipated: donation / purchase / loan
real data: gift / purchase / bequest / transfer / transfer from MOD / transfer; gift / purchase; transfer / loan / Sale / deposit / Acceptance in Lieu of Tax / Acquisition / Exchange / presented / museum copy / Allocated by the Naval War Trophies Committee
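One way to get from the real values back to the anticipated three is to bucket the obvious ones and lump the rest together as "other". The grouping below is purely an assumption on my part (for instance, treating "bequest" and "presented" as donations); the slides don't say how the values were actually grouped.

```python
from collections import Counter
from typing import Iterable, List

# Assumed grouping of raw values into the anticipated buckets.
BUCKETS = {
    "gift": "donation",
    "bequest": "donation",
    "presented": "donation",
    "purchase": "purchase",
    "loan": "loan",
}


def acquisition_buckets(raw: str) -> List[str]:
    """Split a value such as 'transfer; gift' and bucket each part."""
    parts = [part.strip().lower() for part in raw.split(";")]
    return [BUCKETS.get(part, "other") for part in parts if part]


def tally(values: Iterable[str]) -> Counter:
    """Count how often each bucket occurs across many raw values."""
    counts: Counter = Counter()
    for value in values:
        counts.update(acquisition_buckets(value))
    return counts
```

For example, `tally(["gift", "purchase; transfer", "Bequest"])` counts two donations, one purchase and one "other".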
Prior art?
http://england.prm.ox.ac.uk/englishness-analysis-overview.html
Lessons from politics
This site uses publicly-available House of Commons data. Would someone create a MuseumsCollectForYou.com?
Issues
• All objects are counted equally.
• How can we add photographs?
• Should we incorporate user interactions and annotations?
Where next?
• Prototype site online: http://www.museum-collections.org
• Get data from more museums?
• I’m happy to share the data
• Whose role is it to do this stuff?
• Would it also work for private collections (eBay addicts)?
Thanks!
(presentation originally given at Museums and the Web 2008, Montreal)
See also my written paper: http://www.archimuse.com/mw2008/papers/roberto/roberto.html