oboyski ecn2013

Notes from Nature: Citizen Science data transcription. Peter Oboyski, Jun Ying Lim, Joyce Gross, Chris Snyder*, Arfon Smith*, Joanie Ball, Kip Will, Rosemary Gillespie. Essig Museum of Entomology. * Zooniverse Citizen Science Alliance

Upload: ecnofficer

Post on 10-May-2015

TRANSCRIPT

Page 1: Oboyski ecn2013

Notes from Nature
Citizen Science data transcription

Peter Oboyski, Jun Ying Lim, Joyce Gross, Chris Snyder*, Arfon Smith*, Joanie Ball, Kip Will, Rosemary Gillespie

Essig Museum of Entomology

* Zooniverse Citizen Science Alliance

Page 2: Oboyski ecn2013
Page 3: Oboyski ecn2013

How does it work?

• Introduction to CalBug
• What is Zooniverse?
• What do we provide?
• What happens online?
• What do we get back?
• Technical issues
• Maintaining interest
• How can you get involved?

Page 4: Oboyski ecn2013

What is CalBug?

NSF - ADBC grant

Collaboration among the eight major entomology collections in California

Digitize 1.2 million specimens

Essig Museum of Entomology

California Academy of Sciences

California State Collection of Arthropods

Bohart Museum, UC Davis

Entomology Research Museum, UC Riverside

San Diego Natural History Museum

Santa Barbara Museum of Natural History

LA County Museum

Page 5: Oboyski ecn2013

Stephen Dowlan

CalPhotos

MySQL database

Berkeley Mapper

http://calbug.berkeley.edu

Page 6: Oboyski ecn2013

• In development
– Integrating point data (specimen records) with habitat, range maps, elevation, climate, etc.
– Historical recreation of the environment
– Predict potential impacts of environmental change
– Facilitate land use/management decisions

Berkeley Natural History Museums

Page 7: Oboyski ecn2013

Digitization workflow

• Handling & Imaging
– (Optional) Sort by locality, date, sex, etc.
– Remove labels, add unique identifier
– Take digital image, name and save file
– Replace labels, return to collection

• Data Capture
– Manually enter data into MySQL database
– Online crowd-sourcing of manual data entry
– Optical Character Recognition (OCR) & automated data parsing

• Data Manipulation
– Error checking
– Geographic referencing
– Aggregate data in online cache
– Temporospatial analyses

Page 8: Oboyski ecn2013

Why Image Labels?

• Magnify difficult-to-read labels
• Verbatim archive of label data
– Essential for proofing data
– Useful for taxonomists interested in label data
• Data capture can be done remotely

Page 9: Oboyski ecn2013

Digital camera tethered to computer

Average 50-55 images per hour, including imaging, file renaming, and upload

Filename = EMEC218958 Paracotalpa ursina.jpg
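The example filename suggests a convention of catalog number, a space, then the taxon name. A minimal Python sketch of splitting such a name back into its parts follows; the pattern is inferred from this single example, and the function name `parse_image_filename` and the returned keys are hypothetical, not part of the CalBug tooling.

```python
import re

# Assumed convention, inferred from one example on the slide:
# "<letters><digits> <taxon name>.jpg", e.g. "EMEC218958 Paracotalpa ursina.jpg"
FILENAME_RE = re.compile(
    r"^(?P<catalog>[A-Z]+\d+) (?P<taxon>.+)\.jpe?g$", re.IGNORECASE
)


def parse_image_filename(filename):
    """Split an image filename into catalog number and taxon name."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"filename does not follow the assumed convention: {filename!r}")
    return {
        "catalog_number": match.group("catalog"),
        "taxon": match.group("taxon"),
    }


record = parse_image_filename("EMEC218958 Paracotalpa ursina.jpg")
print(record["catalog_number"], "|", record["taxon"])
```

Keeping the unique identifier at the start of the filename means the image can be tied back to its specimen record even if the taxon name is later revised.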

Page 10: Oboyski ecn2013

Slide Scanning

Average 150 slides per hour, including scan, file renaming, and upload

Page 11: Oboyski ecn2013

400 DPI

Seems to provide high enough resolution for difficult-to-read labels while keeping file size relatively small

Page 12: Oboyski ecn2013

But the resolution is not high enough for taxonomic work

Page 13: Oboyski ecn2013

Using Citizen Scientists to transcribe label data

Page 14: Oboyski ecn2013
Page 15: Oboyski ecn2013

http://www.notesfromnature.org/ Launched April 22, 2013

Page 16: Oboyski ecn2013
Page 17: Oboyski ecn2013
Page 18: Oboyski ecn2013
Page 19: Oboyski ecn2013
Page 20: Oboyski ecn2013
Page 21: Oboyski ecn2013
Page 22: Oboyski ecn2013
Page 23: Oboyski ecn2013
Page 24: Oboyski ecn2013

Images in

• We supply JPEG images
– 400 DPI (300 DPI good)
– Deposited as zip file
– Stored in Amazon Cloud

• In development
– Automated service to upload images to Amazon Cloud
– Be able to prioritize image set

Transcriptions out

• Zooniverse provides
– MongoDB data dump
– 1 record = 1 transcription
– 4 transcriptions / image

• In development
– Automated daily dump
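Because each record in the dump is a single transcription and each image is transcribed four times, a first processing step is to group records by image. The Python sketch below assumes the dump has already been loaded as a list of dictionaries; the field name `image_id` and both function names are hypothetical, since the actual dump schema is not shown here.

```python
from collections import defaultdict

EXPECTED_PER_IMAGE = 4  # from the slide: 4 transcriptions per image


def group_by_image(records, key="image_id"):
    """Collect transcription records under the image they describe."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record)
    return dict(grouped)


def incomplete_images(grouped, expected=EXPECTED_PER_IMAGE):
    """Return image ids that have fewer transcriptions than expected."""
    return sorted(image for image, recs in grouped.items() if len(recs) < expected)


# Made-up records for illustration only.
dump = [
    {"image_id": "EMEC1", "county": "Alameda"},
    {"image_id": "EMEC1", "county": "Alameda"},
    {"image_id": "EMEC2", "county": "Marin"},
    {"image_id": "EMEC1", "county": "Alameda"},
    {"image_id": "EMEC1", "county": ""},
]
grouped = group_by_image(dump)
print(incomplete_images(grouped))
```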

Page 25: Oboyski ecn2013

Reconciling transcriptions

• Drop-down lists (Country, State, County, Date) are compared for exact match
– Occasionally missing, sometimes wrong
– Majority rule

• Free-form text fields (Locality, Collectors) are much more problematic
– Transcribers asked to record label data verbatim
– Punctuation, capitalization, spacing between words
– Misspelling, expanding abbreviations, interpretations
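The majority rule for drop-down fields can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CalBug implementation: missing answers are represented as empty strings and ignored, a value wins only with a strict majority of the remaining answers, and the function name `majority_value` is hypothetical.

```python
from collections import Counter


def majority_value(values):
    """Return (winning value or None, share of non-missing answers for the top value)."""
    present = [v for v in values if v]  # drop missing answers (assumed to be "" or None)
    if not present:
        return None, 0.0
    value, count = Counter(present).most_common(1)[0]
    support = count / len(present)
    if count * 2 <= len(present):  # no strict majority, e.g. a 2-2 split
        return None, support
    return value, support


# Four transcriptions of the State field for one image (made-up data).
value, support = majority_value(["California", "California", "", "Nevada"])
print(value, round(support, 2))
```

Fields where no value reaches a majority come back as `None`, which gives a natural list of records to send for manual review.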

Page 26: Oboyski ecn2013

Reconciling transcriptions

• Developing scripts in R to reconcile free-form text

• Text matching for maximum correspondence among multiple transcriptions (cf. DNA alignment methods)

• Final result = 1 transcription in our database, marked as a Citizen Science transcribed record, with links to the 4 original transcriptions

• Vetting by CalBug personnel still necessary, but we can prioritize based on record-matching confidence scores
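The idea of matching free-form text across transcriptions and attaching a confidence score can be shown in miniature. The sketch below is in Python, not the R scripts mentioned on this slide, and it is a simpler stand-in for alignment: it normalizes case, punctuation, and spacing, scores every pair with the standard library's `difflib.SequenceMatcher`, and picks the transcription that agrees best with the others. The function names `normalize` and `reconcile` and the use of mean similarity as the confidence score are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def reconcile(transcriptions):
    """Return (most representative transcription, mean similarity to the others)."""
    if len(transcriptions) < 2:
        raise ValueError("need at least two transcriptions to compare")
    norm = [normalize(t) for t in transcriptions]
    best_index, best_score = 0, -1.0
    for i, a in enumerate(norm):
        ratios = [
            SequenceMatcher(None, a, b).ratio()
            for j, b in enumerate(norm)
            if j != i
        ]
        score = sum(ratios) / len(ratios)
        if score > best_score:  # ties keep the earliest transcription
            best_index, best_score = i, score
    return transcriptions[best_index], best_score


# Made-up locality transcriptions for one label.
locality, confidence = reconcile([
    "Berkeley, Alameda Co.",
    "berkeley  alameda co",
    "Berkeley, Alameda Co.",
    "Berkley Alameda County",
])
print(locality, round(confidence, 2))
```

Records with a low score are the ones where transcribers disagreed most, so sorting by this score is one way to prioritize vetting.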

Page 27: Oboyski ecn2013

Generating & Maintaining Interest

Number of Notes from Nature transcriptions for CalBug

Page 28: Oboyski ecn2013

Generating & Maintaining Interest

Page 29: Oboyski ecn2013

Generating & Maintaining Interest

• Popular media, social media, and press releases
– Only so many occasions for a press release

• Campaigns
– Highlight particular taxa, habitats, geographic regions

• Education
– High quality, high resolution photo of species transcribed
– Create links to other services to learn more about species

• Competitions
– Prizes are worth more than badges
– However, need to watch for bad data in pursuit of prize

Page 30: Oboyski ecn2013

How can you get involved?

• Right now you cannot
• iDigBio is interested in getting involved
• iDigBio hosting a hackathon in December

• Begin building up collections of images

Page 31: Oboyski ecn2013

Thank you

And a HUGE thank you to the CalBug Army who image our specimens

Chris Amy, Maritess Aristorenas, Jazmin Calderon, Alex Carolina, Sonia Castillo, Matthew Chan, Sabina Cook, Alex Darwish, John Davie, Jesson Go, Nick Grady-Grote, Ginger Haight, Laura Hayes, Dennis Ho, Aubrey Huey, Leah Humphreys, Veronica Hurd, Hanna Huynh, Eseosa Igbinedion, Ilona Istenes, Emma Kohlsmith, Asia Kwan, Tiffany Kyo, Jerry Lee, Ken Lee, Christina Lew, Maggie Lewis, Alex Lim, Derick Matano, Christian Munevar, Frank Ngo, Kent Nguyen,

Minh Nguyen, Riley O'Brien, Marielle Pinheiro, Rammonhan Reddy, Jessica Rothery, Stacey Rutherford, Anna Szendrenyi, Anni Sheh, Hannah Shin, Erika So, Mee Thao, Cindy Truong, Darleen Tu, Skyler Valle, Daug Vaughn, Hayden Wong, Yiu Kei Wong, Keane Yang, Kevin Yao, Frances Zhang