tips for taxonomic cleaning with the tnrs · 2019. 11. 18. · web interface • script:...

Tips for taxonomic cleaning with the TNRS
Brad Boyle, University of Arizona
9 January 2016

Post on 24-Feb-2021

Taxonomic cleaning

• Why bother?
• Taxonomic scrubbing applications
• General glitches and gotchas
• TNRS glitches and gotchas
• Pre-processing
• Post-processing
• Understanding the output

Taxonomic cleaning: why bother?

Hieronyma oblonga: widespread tropical tree

Hieronyma poasana: synonym of Hieronyma oblonga, once thought to be endemic to Costa Rica

Hyeronima oblonga, Hieronima oblonga: common misspellings of Hieronyma oblonga

Why bother?

10% “bad” names

Why bother?

Overlap between databases only 3%!

Why bother?

400% increase in overlap

Taxonomic cleaning applications

• TNRS (http://tnrs.iplantcollaborative.org/index.html)
• TaxonStand (http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2012.00232.x/full)
• Global Names Resolver (http://resolver.globalnames.org/)
• PlantMiner (http://www.plantminer.com/)
• Many others…

General architecture

• Name parser – breaks up and classifies name components

Hieronima poasana Standley
   Genus: Hieronima
   Specific epithet: poasana
   Authority: Standley
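A minimal sketch of the parsing step, assuming a plain binomial followed by an optional authority. The function `parse_name` and its splitting rule are hypothetical illustrations, not how the TNRS parser works; real parsers also handle infraspecific ranks, hybrids, cf./aff. qualifiers and much more.

```python
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    genus: str
    specific_epithet: Optional[str]
    authority: Optional[str]


def parse_name(name: str) -> ParsedName:
    """Split 'Genus epithet Authority' into its components.

    Hypothetical, simplified rule: the first token is the genus, a
    following all-lowercase token is the specific epithet, and whatever
    remains is treated as the authority.
    """
    tokens = name.split()
    if not tokens:
        raise ValueError("empty name")
    genus = tokens[0].capitalize()
    rest = tokens[1:]
    epithet = None
    if rest and rest[0].islower():
        epithet = rest[0]
        rest = rest[1:]
    authority = " ".join(rest) or None
    return ParsedName(genus, epithet, authority)


print(parse_name("Hieronima poasana Standley"))
# ParsedName(genus='Hieronima', specific_epithet='poasana', authority='Standley')
```

Note that the parser does not judge spelling: "Hieronima" passes through unchanged, which is why resolution is a separate step.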

General architecture

• Name resolver
  – Matches the name to a reference database
  – Tries fuzzy matching if exact match fails

Hieronima poasana Standley (misspelled)
→ Hieronyma poasana Standl. (correct spelling, as published)
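The exact-then-fuzzy logic can be sketched with Python's standard `difflib`. This is a stand-in similarity measure, not the fuzzy-matching algorithm TNRS uses; the two-name reference list and the 0.8 cutoff are illustrative assumptions.

```python
import difflib
from typing import Optional, Tuple

# Hypothetical miniature reference list; a real resolver checks names
# against a full reference database such as Tropicos.
REFERENCE_NAMES = ["Hieronyma oblonga", "Hieronyma poasana"]


def resolve(name: str, cutoff: float = 0.8) -> Tuple[Optional[str], str]:
    """Return (matched_name, match_type), trying an exact match first."""
    if name in REFERENCE_NAMES:
        return name, "exact"
    # Fall back to the most similar reference name above the cutoff
    # (cutoff is an assumed threshold, tune it for your data).
    candidates = difflib.get_close_matches(name, REFERENCE_NAMES, n=1, cutoff=cutoff)
    if candidates:
        return candidates[0], "fuzzy"
    return None, "no match"


print(resolve("Hieronima poasana"))  # ('Hieronyma poasana', 'fuzzy')
```

Reporting the match type alongside the name matters: fuzzy matches are the ones worth inspecting by hand.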

General architecture

• Taxonomic status & synonym conversion
  – Some applications do not do this last step

Hieronyma poasana Standl. (synonym)
→ Hieronyma oblonga (Tul.) Müll. Arg. (currently accepted name)
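Synonym conversion amounts to a lookup from the matched name to its status and accepted name. A minimal sketch, assuming a hypothetical in-memory table filled with the Hieronyma example; a real application draws this from its taxonomic source.

```python
from typing import Dict, Tuple

# Hypothetical status table: matched name -> (taxonomic status, accepted name).
# Entries reproduce the Hieronyma example above.
STATUS_TABLE: Dict[str, Tuple[str, str]] = {
    "Hieronyma oblonga (Tul.) Müll. Arg.": ("Accepted", "Hieronyma oblonga (Tul.) Müll. Arg."),
    "Hieronyma poasana Standl.": ("Synonym", "Hieronyma oblonga (Tul.) Müll. Arg."),
}


def to_accepted(matched_name: str) -> str:
    """Replace a matched name with its currently accepted name."""
    status, accepted = STATUS_TABLE[matched_name]
    if status in ("Accepted", "Synonym"):
        return accepted
    # Any other status needs a human decision rather than a silent swap.
    raise ValueError(f"{matched_name!r} has status {status!r}; research it")


print(to_accepted("Hieronyma poasana Standl."))
```

An application that stops before this step would leave Hieronyma poasana in the data as if it were a distinct Costa Rican endemic.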

Example workflow with TNRS API

• Script: tnrs_api_example.R
• Steps:
  1. Extract the names
  2. Turn into a string separated by commas
  3. URL-encode and send to the TNRS API
  4. Convert the returned JSON to a data frame
  5. Update your names
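Steps 2 to 4 could look roughly like this in Python (the deck's own script is in R). The base URL, the `names` query parameter and the `items` key in the response are placeholders, not the real TNRS API; check the TNRS documentation for the actual endpoint and response shape. No request is sent here, so the sketch runs offline.

```python
import json
from typing import Dict, List
from urllib.parse import quote


def build_request_url(base_url: str, names: List[str]) -> str:
    """Steps 2-3: join names with commas and URL-encode them.

    The 'names' query parameter is a hypothetical placeholder.
    """
    joined = ",".join(n.strip() for n in names)
    return f"{base_url}?names={quote(joined)}"


def json_to_rows(payload: str) -> List[Dict[str, str]]:
    """Step 4: turn a JSON response into a list of row dicts.

    Assumes (hypothetically) a JSON object with an 'items' array of
    flat objects; adapt this to the real response.
    """
    data = json.loads(payload)
    return [dict(item) for item in data["items"]]


print(build_request_url("https://example.org/tnrs", ["Hieronima poasana", "Hyeronima oblonga"]))
```

Joining with commas assumes no submitted name contains a comma itself; `quote` encodes the commas as `%2C` along with the spaces.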

Pros and cons of TNRS API

• Advantages
  – Fast, simple, fully automated
• Disadvantages
  – Can’t adjust all settings available in web interface
  – Uses Tropicos as only source
  – Can’t take advantage of web interface to inspect results, choose alternative matches and research names
  – Can’t access download options available in web interface
  – Parse-only option not available

Example basic workflow with TNRS web interface

• Script: tnrs_gui_example.R
• Steps:
  1. Extract names to CSV file with two columns: Unique ID & names
  2. Upload to TNRS using bulk “Upload and Submit List” tab, checking the box “My file contains an identifier as first column”
  3. Adjust name processing settings and submit
  4. Inspect results online, selecting alternate matches if appropriate
  5. Download results, using options: Best matches only, Detailed results, UTF-8 format
  6. Import TNRS results as tab-delimited file
  7. Remaining processing as for API
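Steps 1 and 6 are plain file handling. A minimal Python sketch using the standard `csv` module; writing no header row is an assumption, and the results reader takes its column names from the download's own header rather than assuming any.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO, Tuple


def write_names_csv(records: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Step 1: one row per name, unique ID first, then the name.

    No header row is written (an assumption; add one if needed).
    """
    writer = csv.writer(out)
    for record_id, name in records:
        writer.writerow([record_id, name])


def read_tnrs_results(tsv: TextIO) -> List[Dict[str, str]]:
    """Step 6: read a tab-delimited UTF-8 results download into row dicts.

    Keys come from the file's own header row; none are assumed here.
    """
    return list(csv.DictReader(tsv, delimiter="\t"))


if __name__ == "__main__":
    # With real files, open them with newline="" and encoding="utf-8".
    buf = io.StringIO()
    write_names_csv([("1", "Hieronima poasana"), ("2", "Hyeronima oblonga")], buf)
    print(buf.getvalue(), end="")
```

Keeping the unique ID in the first column is what lets results be joined back to the original records later.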

Pros and cons of TNRS web interface

• Disadvantages
  – Not fully automated
• Advantages
  – Can adjust name resolution settings
  – More name resolution sources
  – Use web interface to inspect results, choose alternative matches and research names
  – Select and download alternative matches on the fly
  – More download options, including “All matches” (useful if you don’t like how TNRS chooses the best match and want to script it yourself)
  – Parse-only mode (useful for comparing parts of the original name with the matched name)
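Scripting your own choice from an “All matches” download might look like the sketch below. The column names `ID`, `Overall_score` and `Taxonomic_status` are assumptions about the download format, and the rule (highest score wins, ties go to an Accepted name) is only one hypothetical policy.

```python
from typing import Dict, List, Tuple


def _rank(row: Dict[str, str]) -> Tuple[float, bool]:
    # Hypothetical policy: score first, then prefer an Accepted status.
    return float(row["Overall_score"]), row["Taxonomic_status"] == "Accepted"


def choose_best_matches(rows: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Keep one match per submitted-name ID from an 'All matches' download.

    Column names (ID, Overall_score, Taxonomic_status) are assumptions.
    """
    best: Dict[str, Dict[str, str]] = {}
    for row in rows:
        current = best.get(row["ID"])
        if current is None or _rank(row) > _rank(current):
            best[row["ID"]] = row
    return best
```

Swapping in a different `_rank` is the point of doing this yourself: the selection rule becomes explicit and reproducible.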

TNRS tips & gotchas

• Tip: Prepend the family to each name to prevent matching similar names in different families

• Gotcha: If you want to use The Plant List, *always* select TPL + ILDIS + GCC together

• Tip: Research any name whose Taxonomic Status is neither Accepted nor Synonym

• Gotcha: Even accepted names can be wrong!
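The first tip is a simple pre-processing step. A minimal sketch, assuming the family is available per record and that the “Family Genus epithet” form is what gets submitted; `prepend_family` is a hypothetical helper.

```python
from typing import Optional


def prepend_family(name: str, family: Optional[str]) -> str:
    """Return 'Family Genus epithet ...' for submission.

    Collapses extra whitespace, leaves the name unchanged when no family
    is known, and avoids adding the family twice.
    """
    name = " ".join(name.split())
    if not family:
        return name
    family = family.strip().capitalize()
    if name.lower().startswith(family.lower() + " "):
        return name
    return f"{family} {name}"


print(prepend_family("Henriettea succosa", "Melastomataceae"))
```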

Taxonomic Status

Taxonomic Status refers to the Matched Name.

• Accepted: Good to go!
• Synonym: Good to go, as long as an accepted name is supplied
• No opinion: Could be a good or bad name. RESEARCH IT
• Invalid: Never validly published. DON’T USE
• Illegitimate: Violates nomenclatural rules. DON’T USE
• Rejected name: Rejected by nomenclatural committee. DON’T USE
• Misapplied name: Commonly misapplied to the wrong species. May or may not be correct. RESEARCH IT
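The status list translates directly into a triage rule for post-processing. A sketch that transcribes the mapping above; the exact status strings in a TNRS download are an assumption to verify, and any unlisted status falls back to research, following the tip to research anything that is neither Accepted nor Synonym.

```python
from enum import Enum


class Action(Enum):
    USE = "good to go"
    RESEARCH = "research it"
    DO_NOT_USE = "don't use"


# Transcribed from the status list above; the exact strings used in a
# TNRS download are an assumption to check.
STATUS_ACTION = {
    "Accepted": Action.USE,
    "Synonym": Action.USE,
    "No opinion": Action.RESEARCH,
    "Misapplied name": Action.RESEARCH,
    "Invalid": Action.DO_NOT_USE,
    "Illegitimate": Action.DO_NOT_USE,
    "Rejected name": Action.DO_NOT_USE,
}


def triage(status: str, accepted_name: str = "") -> Action:
    """Map a Taxonomic Status to what to do with the matched name."""
    status = status.strip()
    # A synonym is only usable if an accepted name was supplied.
    if status == "Synonym" and not accepted_name.strip():
        return Action.RESEARCH
    # Unknown statuses default to research rather than silent acceptance.
    return STATUS_ACTION.get(status, Action.RESEARCH)
```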

Even accepted names can be wrong!

Name submitted             Tropicos                     The Plant List
Henriettea fascicularis    = Henriettella fascicularis  = Henriettella fascicularis
Henriettea ramiflora       Accepted                     Accepted
Henriettea succosa         Accepted                     Accepted
Henriettella fascicularis  Accepted                     Accepted
Henriettella tuberculosa   Accepted                     = Henriettea tuberculosa

Actually, all belong in Henriettea.
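One partial safeguard is to compare sources and flag names they resolve differently. The sketch below encodes the table's opinions; the `OPINIONS` structure and helper names are hypothetical. It catches only Henriettella tuberculosa: the rows where both sources agree, yet the name still belongs in Henriettea, pass unflagged, which is exactly why accepted names still need expert review.

```python
from typing import Dict, List

# Opinions transcribed from the table above. "Accepted" means the source
# accepts the submitted name; any other value is the name the source
# treats as correct (the "=" entries).
OPINIONS: Dict[str, Dict[str, str]] = {
    "Henriettea fascicularis": {
        "Tropicos": "Henriettella fascicularis",
        "The Plant List": "Henriettella fascicularis",
    },
    "Henriettea ramiflora": {"Tropicos": "Accepted", "The Plant List": "Accepted"},
    "Henriettea succosa": {"Tropicos": "Accepted", "The Plant List": "Accepted"},
    "Henriettella fascicularis": {"Tropicos": "Accepted", "The Plant List": "Accepted"},
    "Henriettella tuberculosa": {
        "Tropicos": "Accepted",
        "The Plant List": "Henriettea tuberculosa",
    },
}


def resolved_name(submitted: str, opinion: str) -> str:
    """Name a source would have you use for `submitted`."""
    return submitted if opinion == "Accepted" else opinion


def disagreements(opinions: Dict[str, Dict[str, str]]) -> List[str]:
    """Submitted names that the sources resolve to different names."""
    return sorted(
        name
        for name, by_source in opinions.items()
        if len({resolved_name(name, op) for op in by_source.values()}) > 1
    )


print(disagreements(OPINIONS))  # ['Henriettella tuberculosa']
```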