Digital Entomology at the NHM, London · 07.09.2016
TRANSCRIPT
Digital Entomology at the NHM, London
Digitisation, Crowdsourcing, Portals, Linked data, Apps, & Computer vision
Digital Science & Innovation Group, Natural History Museum, London
Vince Smith & NHM DCP team
International Congress of Entomology (ICE), Orlando, FL, 26 Sept. 2016
“To collate, organise and make available to global scientific &
public audiences one of the world’s most important natural
history collections”
NHM Digital Collections Programme (2014-2024)
Digitisation Target = 20 million / 5 Years = >10,000 digitised per day
DCP exists to tackle digitisation at this scale, albeit working up to it.
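The daily rate quoted above follows directly from the target. A minimal sketch of the arithmetic, where the 250 working days per year is an illustrative assumption and not a figure from the slides:

```python
# Figures from the slide: 20 million specimens in 5 years.
TARGET_SPECIMENS = 20_000_000
PROGRAMME_YEARS = 5


def required_daily_rate(target, years, days_per_year=365):
    """Average number of specimens to digitise per day to hit the target."""
    return target / (years * days_per_year)


# Every calendar day counted: this gives the ">10,000 per day" on the slide.
calendar_rate = required_daily_rate(TARGET_SPECIMENS, PROGRAMME_YEARS)

# Assumption (not from the slides): 250 working days per year.
working_rate = required_daily_rate(TARGET_SPECIMENS, PROGRAMME_YEARS, days_per_year=250)

print(f"{calendar_rate:,.0f} per calendar day")
print(f"{working_rate:,.0f} per working day")
```

Counting only working days raises the required rate considerably, which is consistent with the need to work up to this scale.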
NHM Collection - 2014
Collection area     No of objects  No of type specimens  Physical register  Digital data
Palaeontology           6,919,207                43,146          2,364,232       340,636
Mineralogy                423,563                   615            425,000       402,727
Botany                  5,863,000               172,750            127,200       645,222
Entomology             33,753,257               612,796             57,197       255,000
Zoology                27,501,350               325,000          1,986,000     1,160,216
Library & archives      5,460,000                     -                  -             -
TOTAL                  79,920,377             1,154,307          4,959,629     2,803,801
Insect specimens are the least digitised of our collections
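The claim can be checked against the 2014 table. The sketch below assumes the "Digital data" column counts digital records that are directly comparable to the object counts, which the slides do not state explicitly:

```python
# (number of objects, digital data records) per collection area, from the 2014 table.
collections = {
    "Palaeontology": (6_919_207, 340_636),
    "Mineralogy": (423_563, 402_727),
    "Botany": (5_863_000, 645_222),
    "Entomology": (33_753_257, 255_000),
    "Zoology": (27_501_350, 1_160_216),
}


def digitised_share(objects, digital):
    """Digital records as a percentage of physical objects."""
    return 100.0 * digital / objects


shares = {name: digitised_share(*counts) for name, counts in collections.items()}
least = min(shares, key=shares.get)

# Print the collections from least to most digitised.
for name, pct in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{name:<14}{pct:6.2f}%")
```

Under that assumption Entomology has the lowest share, at under 1% of its objects.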
NHM 2016 Collections Assessment
• Designed to aid strategic development of the collection (and inform collections digitisation)
• Developed by the NMNH, with minor modifications for the NHM to criteria, ranks & data processes
• 14 assessment criteria scored 0-5 (condition, importance, information and outreach)
• 1,436 collections assessments (70 assessors, 20k ranking assessments)
Mega Orders
  Coleoptera        8,515,173
  Diptera           2,278,510
  Hymenoptera       5,175,117
  Lepidoptera      11,394,790
Other Orders
  Ephemeroptera        15,010
  Hemiptera           356,960
  Neuroptera           37,700
  Odonata             106,374
  Orthopteroids       802,797
  Psocodea             98,618
  Siphonaptera        220,153
  Sternorrhyncha       60,000
  Thysanoptera         80,000
  Trichoptera          66,621
Small Orders           33,189
Accessions          2,460,000
GRAND TOTAL        31,701,012
Rank    Condition  Importance & Significance  Information  Outreach
Rank 0         0%                        12%           9%        9%
Rank 1         1%                         1%          10%        7%
Rank 2        14%                         1%          10%        9%
Rank 3        16%                         4%          18%       13%
Rank 4        30%                        22%          19%       13%
Rank 5        39%                        60%          34%       49%
Large-scale pinned insect digitisation:
• High-throughput digitisation workflows
• Informatic pipelines
• Computer-assisted object recognition
Environmental Change
• UK butterflies & moths
• 800k specimens
• 2 mins per specimen
• £1 per specimen
Bulk insect digitisation:
• High resolution whole drawer images
• Computer vision object recognition
• Reduces specimen handling
Associated software
• 5 minutes per drawer scan
• Circa 500+ MB images
• Automated specimen recognition, barcode reading & rapid annotation
Inselect: digitisation workflow software
Edge detection software for collections
• Automatically detects specimens
• Creates bounding boxes for cropping
• Batch processing (annotation & export)
• Embedded in 10+ institutional digitisation workflows
• Multiple use cases
Pitfall trap samples (Skinks)
Insect soups (Diptera)
Pinned Insects (Lucanidae)
Microscope slides (Ephemeroptera)
https://naturalhistorymuseum.github.io/inselect/
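To illustrate the idea of turning detected specimens into bounding boxes for cropping, here is a minimal sketch. It is not Inselect's code or algorithm: it swaps in a plain connected-component search over an already thresholded mask (1 = specimen pixel, 0 = background), and the function name and toy data are hypothetical.

```python
from collections import deque


def bounding_boxes(mask):
    """Return one (top, left, bottom, right) box per 4-connected blob of 1s.

    `mask` is a non-empty list of equal-length rows of 0/1 values.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel,
            # growing the box to cover every pixel reached.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy "drawer" with two specimens (illustrative data only).
drawer = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
boxes = bounding_boxes(drawer)
print(boxes)
```

Each box could then be used to crop one specimen image out of the whole-drawer scan.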
Parasites & Vectors
• ~100k microscope slides
• 2 mins per specimen
• <£0.50 per slide
• Computer vision pipelines
• Low res. specimen images
• Crowdsourcing transcription
• Still too slow!
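A rough sense of why this is "still too slow" comes from multiplying out the slide's own figures. In the sketch below, the slide count is approximate (the slide says ~100k) and the 7-hour working day is an illustrative assumption:

```python
# Figures from the slide (the slide count is approximate).
SLIDES = 100_000
MINUTES_PER_SLIDE = 2
MAX_COST_PER_SLIDE_GBP = 0.50  # the slide gives "<£0.50", so this is a ceiling

# Assumption (not from the slides): 7 productive hours per working day.
HOURS_PER_WORKING_DAY = 7

total_hours = SLIDES * MINUTES_PER_SLIDE / 60
person_days = total_hours / HOURS_PER_WORKING_DAY
cost_ceiling_gbp = SLIDES * MAX_COST_PER_SLIDE_GBP

print(f"{total_hours:,.0f} hours of digitisation time")
print(f"{person_days:,.0f} person-days at {HOURS_PER_WORKING_DAY} hours per day")
print(f"under £{cost_ceiling_gbp:,.0f} in total")
```

At two minutes per slide, a single digitiser would need several hundred working days for this one collection.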
Rapid slide digitisation hardware
• Compact, cheap & simple
• £500+, including 3D printed parts, plus two Canon SLR cameras (£3k each)
• Combined low-res. & high-res. workflow, as a one- or two-step process
• Ergonomic, 2/5 seconds for low- & high-res. images
• Open source software, Canon Software Development Kit, file & data orchestration (workflow, file processing, barcode reader & annotation)
Overview camera
Area-of-interest camera
One-click operation
Light panel
Laptop (camera & workflow control)
Phthiraptera slide digitisation
• Trial on 78k louse slides
• Starting Oct. 2016 (8 months)
• Low res. + high res. types
• Crowdsourcing annotations
• High res. type imagery
• New host-parasite associations
• Unidentified accessions
• Including Meinertzhagen material
• Validate Meinertzhagen bird collection
Low res. High res. Richard Meinertzhagen
Crowdsourcing platforms & projects
• Partnership with Notes from Nature
• Part of a wider community of over 1 million users interested in citizen science projects
• First project launched in August 2016, based on microscope slides digitised in DCP
• Many more projects planned in coordination with other partners
https://www.notesfromnature.org/
http://www.nhm.ac.uk/mlm
Data Access: NHM Data Portal
• Discovery of NHM collections & research data
• Easy access & reuse to promote collaboration
• 4m data records (+ images, sound, video & 3D)
• Integrates with our collection management system & DAMS
• Traffic light data quality indicators
• Stable, citable (DataCite) identifiers on datasets & GUIDs on records to measure impact
• Technically sustainable & scalable
• Default open licensing (CC-Zero, CC-BY)
http://data.nhm.ac.uk
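As a sketch of programmatic access, the snippet below builds a record-search URL. It assumes the portal exposes a CKAN-style `datastore_search` action, which the slides do not state, and the resource identifier is a hypothetical placeholder to be replaced with a real dataset resource ID from the portal. No request is sent.

```python
from urllib.parse import urlencode

PORTAL = "http://data.nhm.ac.uk"


def build_search_url(resource_id, query, limit=10):
    """Build a full-text search URL for one dataset resource.

    Assumes a CKAN-style action API; the endpoint path is an assumption.
    """
    params = urlencode({"resource_id": resource_id, "q": query, "limit": limit})
    return f"{PORTAL}/api/3/action/datastore_search?{params}"


# "RESOURCE-ID-PLACEHOLDER" is hypothetical, not a real resource identifier.
url = build_search_url("RESOURCE-ID-PLACEHOLDER", "Hesperia comma", limit=5)
print(url)
```

The resulting URL could be fetched with any HTTP client and the JSON response parsed for records.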
Not just collections data
BioAcoustica: online repository and analysis platform for scientific recordings of wildlife sounds
NHM interactions bank: serving our data through GloBI (~500k interactions)
Many datasets managed through Scratchpads, a research community platform (1k communities, 10k users)
50+ additional datasets uploaded / ingested from across NHM
http://bio.acousti.ca/
http://data.nhm.ac.uk/dataset/nhm-ib
http://scratchpads.eu/
App & data visualisations driven by API
App (field guide to British fossils)
• Location based mobile app
• Fossil data/images & BGS stratigraphy
• iOS (16 Sept.) & Android (Oct.)
• Prototype for broader science app
– citizen science, communities, digital collections
Data visualisations
• Rich API, R-package, RDF & LOD
• Easy to analyse & build data visualisations
• Heat maps & point maps
• Collection intensity
• Interactive globe
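A collection-intensity heat map reduces to counting records per grid cell. The sketch below shows one way to do that binning; the function names, the 1-degree cell size and the coordinates are all illustrative assumptions, not portal data or the team's implementation.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=1.0):
    """Return the south-west corner of the grid cell containing a point."""
    return (math.floor(lat / cell_deg) * cell_deg,
            math.floor(lon / cell_deg) * cell_deg)


def collection_intensity(points, cell_deg=1.0):
    """Count how many collecting localities fall in each grid cell."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)


# Made-up (latitude, longitude) pairs for illustration only.
points = [(51.50, -0.18), (51.75, -0.34), (52.20, 0.12), (51.10, -0.90)]
intensity = collection_intensity(points)

for cell, count in intensity.most_common():
    print(cell, count)
```

The per-cell counts can then be mapped to a colour scale to draw the heat map.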
Computer vision research
Label Detection
Reconstruction
Source
Morphometrics
Measuring specimens
Measuring rulers
Colour Palette Analysis
Trait detection
♂ ♀ Sex brands in Hesperia comma Hyles euphorbiae (Linnaeus, 1758)
High speed, automated data collection from large sets of images
Acknowledgements
Collections Survey: Smithsonian NMNH, C. Valentine, M. Woodburn & 70 assessors
iCollections: G. Paterson, D. Siebert, S. Brooks, P. Wing, F. Toloni & digitiser team
Inselect Software: L. Hudson, V. Blagoderov, A. Heaton, P. Holtzhausen, L. Livermore, B. Price, S. van der Walt
Parasite slide digitisation: E. Sherlock, V. Blagoderov, A. Ball, P. Ward, A. Purvis, R. Summerfield, L. Hudson, B. Price
Slide digitisation hardware: S. Dupont, V. Blagoderov
Crowdsourcing (part of NfN): J. Lauren Cawthray, L. Livermore, L. Robinson, NfN Team (R. Guralnick, M. Denslow, A. Mast)
Data Portal & data visualisation: B. Scott, A. Heaton, M. Woodburn
BioAcoustica & Interactions bank: E. Baker (GloBI: J. Poelen)
British Fossil App: K. Johnson, B. Atkinson, I. Teage
Computer vision research: J. Durrant, L. Hudson
DCP Programme Board (strategy & funding): H. Hardy (DCPT2), V. Smith, I. Owens, C. Valentine, C. Smith, B. Atkinson (DCPT1), A. Purvis, G. Paterson
We are hiring!
Entomology
• Principal Curator in Charge
– Contract: Permanent role
– Closing date: 9am, Monday 17 October 2016
• Senior Curators in Charge
– (TBC multiple orders/groups)
Digital Science
• Research Software Engineer
– Contract: Permanent role
• Data Portal Developer
• Scratchpads Developer
• Data Analyst
http://www.nhm.ac.uk/about-us/careers.html
Queries - contact [email protected]