![Page 1: Untangling Names Lessons learned (so far) from the linking of IPNI and TROPICOS Julius Welby RBG Kew j.welby@kew.org](https://reader035.vdocument.in/reader035/viewer/2022062717/56649e2d5503460f94b1c608/html5/thumbnails/1.jpg)
Untangling Names
Lessons learned (so far) from the linking of IPNI and TROPICOS
Julius Welby, RBG Kew
TROPICOS + IPNI
Why match?
Why is this difficult?
Variation
Calophyllum kiong K.Schum. & Lauterb.
Fl. Deutsch. Sudsee, 450.
Calophyllum kiong Lauterb. & K.Schum.
Die Flora der Deutschen Schutzgebiete in der Sudsee 1900
Duplication
• Poa annua L. -- Sp. Pl. 68. 1753 (GCI)
• Poa annua L. -- Species Plantarum 2 1753 (APNI)
• Poa annua L. -- Sp. Pl. 68. (IK)
Duplication
• Calophyllum microphyllum Scheff in Tijdschr. Nederl. Ind. xxxii. (1871) 406. (IK)
• Calophyllum microphyllum Planch. & Triana in Ann. Sc. Nat. Ser. IV. xv. (1861) 282. (IK)
• Calophyllum microphyllum T.Anders. Fl. Brit. Ind. (J. D. Hooker). i. 272. (IK)
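Grouping records by a key shows why these two kinds of duplication need different handling. A minimal sketch, assuming each record has been reduced to a hypothetical (name, authors, source) tuple taken from the entries above: keying on name plus authors collapses the three Poa annua records into one name, while keying on the name alone wrongly merges the three Calophyllum microphyllum homonyms.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape: (name, authors, source index)
Record = Tuple[str, str, str]

RECORDS: List[Record] = [
    ("Poa annua", "L.", "GCI"),
    ("Poa annua", "L.", "APNI"),
    ("Poa annua", "L.", "IK"),
    ("Calophyllum microphyllum", "Scheff", "IK"),
    ("Calophyllum microphyllum", "Planch. & Triana", "IK"),
    ("Calophyllum microphyllum", "T.Anders.", "IK"),
]


def group_by(records: List[Record],
             key: Callable[[Record], Tuple[str, ...]]) -> Dict[Tuple[str, ...], List[str]]:
    """Group records by a key, listing the source of each record in a group."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[2])
    return dict(groups)


by_name = group_by(RECORDS, lambda r: (r[0],))
by_name_and_authors = group_by(RECORDS, lambda r: (r[0], r[1]))

if __name__ == "__main__":
    print(len(by_name))              # 2: the homonyms are wrongly merged
    print(len(by_name_and_authors))  # 4: Poa annua once, C. microphyllum three times
```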
Matching
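Fields

| # | Field | First record | Second record |
|---|---|---|---|
| 1 | Genus | Calophyllum | Calophyllum |
| 2 | Species | kiong | kiong |
| 3 | Authors | K.Schum. & Lauterb. | Lauterb. & K.Schum. |
| 4 | Publication | Fl. Deutsch. Sudsee | Die Flora der Deutschen… |
| 5 | Page / year | 450. | 1900 |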
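Field 3 differs only in the order of the author team. One way to let such pairs through, sketched here as an assumption rather than the framework's actual rule, is to compare author teams as unordered sets of names split on "&". The sketch does not handle "ex" or "in" citations, and ignoring order could itself produce false positives.

```python
from typing import FrozenSet


def author_team(authors: str) -> FrozenSet[str]:
    """Split an author string on '&' into an unordered set of names.

    Assumption: only '&' separates co-authors; 'ex' and 'in' are not handled.
    """
    return frozenset(part.strip() for part in authors.split("&") if part.strip())


def same_team(a: str, b: str) -> bool:
    """True if both strings name the same co-authors, in any order."""
    return author_team(a) == author_team(b)


if __name__ == "__main__":
    print(same_team("K.Schum. & Lauterb.", "Lauterb. & K.Schum."))  # True
    print(same_team("K.Schum. & Lauterb.", "K.Schum."))             # False
```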
Lesson 1
Speed matters
Speed matters
2,500 records by 2,000 records by 4 fields
20,000,000 comparisons
~5.5 hours at 1 ms per comparison
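The arithmetic above as a small Python sketch (the function names are illustrative, not taken from the framework). The cost of an exhaustive match grows with the product of the dataset sizes, so doubling both inputs quadruples the run time.

```python
def naive_comparisons(n_a: int, n_b: int, n_fields: int) -> int:
    """Field comparisons needed to compare every record pair on every field."""
    return n_a * n_b * n_fields


def hours_at(comparisons: int, ms_per_comparison: float) -> float:
    """Wall-clock hours if each comparison takes a fixed number of milliseconds."""
    return comparisons * ms_per_comparison / 1000 / 3600


if __name__ == "__main__":
    n = naive_comparisons(2_500, 2_000, 4)
    print(f"{n:,} comparisons")             # 20,000,000 comparisons
    print(f"{hours_at(n, 1.0):.2f} hours")  # 5.56 hours
```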
Be lazy
• Do as little as possible
• Do easy things if possible
• Do hard things only if necessary
• Only expend effort when it’s worth it
Be lazy
• Do as little as possible
  – Specify fields as ‘must match’
  – If a ‘must match’ field fails:
    • Mark the match as failed
    • Stop comparing fields
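A minimal sketch of the ‘must match’ rule, assuming records are plain dicts of already standardised strings and that each field is described by a hypothetical (name, comparison function, must_match) tuple. Fields are compared in the order given; the first ‘must match’ failure ends the comparison, so later fields are never touched. Failures on other fields are recorded rather than fatal, which keeps them available for reporting.

```python
from typing import Callable, List, Mapping, Sequence, Tuple

# Hypothetical field spec: (field name, comparison function, must_match flag)
FieldSpec = Tuple[str, Callable[[str, str], bool], bool]


def exact(a: str, b: str) -> bool:
    return a == b


def match_records(rec_a: Mapping[str, str], rec_b: Mapping[str, str],
                  fields: Sequence[FieldSpec]) -> Tuple[bool, List[str]]:
    """Compare two records field by field, in the order given.

    Returns (matched, failed_fields). A failure on a 'must match' field
    marks the match as failed and stops comparing further fields; other
    failures are recorded and comparison continues.
    """
    failed: List[str] = []
    for name, compare, must_match in fields:
        if not compare(rec_a.get(name, ""), rec_b.get(name, "")):
            failed.append(name)
            if must_match:
                return False, failed  # be lazy: skip the remaining fields
    return not failed, failed


FIELDS: List[FieldSpec] = [
    ("genus", exact, True),
    ("species", exact, True),
    ("authors", exact, False),
]
```

For example, comparing Poa annua L. with Poa trivialis L. fails on species, so the author strings are never compared.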
Parameterised matching
• species
• infragenus
• infraspecies
• authors
• rank
• …
How lazy?
Optimising
• The order of field matching is important
  – Choose suitable fields to match first
  – Aim to fail matches early
• Significant speed-up
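Why order matters can be made concrete with a simple cost model. This sketch assumes each field has a known relative comparison cost and failure rate, and that failures on different fields are independent; under those assumptions, sorting fields by cost divided by failure rate gives the lowest expected cost per pair. The figures in STATS are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FieldStats:
    name: str
    cost: float       # relative cost of one comparison on this field
    fail_rate: float  # fraction of candidate pairs that fail on this field


def order_fields(stats: Sequence[FieldStats]) -> List[FieldStats]:
    """Put cheap, frequently failing fields first (ascending cost / fail_rate).

    Assumes comparison stops at the first failure and that failures on
    different fields are independent. Fields that never fail go last.
    """
    return sorted(stats, key=lambda s: s.cost / s.fail_rate if s.fail_rate > 0 else float("inf"))


def expected_cost(ordered: Sequence[FieldStats]) -> float:
    """Expected comparison cost per pair when stopping at the first failure."""
    total, p_reach = 0.0, 1.0
    for s in ordered:
        total += p_reach * s.cost     # only compared if every earlier field passed
        p_reach *= 1.0 - s.fail_rate
    return total


# Hypothetical figures, for illustration only
STATS = [
    FieldStats("publication", cost=10.0, fail_rate=0.2),
    FieldStats("authors", cost=5.0, fail_rate=0.3),
    FieldStats("genus", cost=1.0, fail_rate=0.9),
    FieldStats("species", cost=1.0, fail_rate=0.95),
]
```

With these numbers the cheap, highly selective name fields move to the front and the expensive publication comparison is reached by only a small fraction of pairs.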
Also, for speed
• Do as little as possible
  – Do escaping or standardisation once
  – Done on import for each dataset
  – Keep field matching functions clean
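A sketch of standardising once on import, assuming that HTML unescaping, accent stripping, whitespace collapsing and lower-casing are all acceptable for the fields being matched (the function names are hypothetical). Because the work is done up front, the matching function itself is a bare comparison.

```python
import html
import re
import unicodedata
from typing import Dict, Iterable, List


def standardise(value: str) -> str:
    """One-off clean-up applied to every field as a dataset is imported."""
    value = html.unescape(value)                   # '&amp;' -> '&'
    value = unicodedata.normalize("NFKD", value)   # split accents from base letters
    value = "".join(ch for ch in value if not unicodedata.combining(ch))  # drop accents
    value = re.sub(r"\s+", " ", value).strip()     # collapse runs of whitespace
    return value.lower()


def import_dataset(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Standardise every field once, up front."""
    return [{field: standardise(v) for field, v in row.items()} for row in rows]


def exact(a: str, b: str) -> bool:
    """A 'clean' matching function: no escaping or standardisation inside."""
    return a == b
```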
More speed optimisation
• Do easy things if possible
  – Define cascading tests
  – Do easy tests first, if practical
  – Length comparisons
  – Composition comparisons
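A cascade of this kind might look like the following, assuming an edit-distance threshold is the expensive final test (the threshold of 2 is arbitrary). The composition check is safe to use as a filter: each edit changes the character surplus and the character deficit by at most one apiece, so the larger of the two is a lower bound on the edit distance.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (the expensive test), by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similar(a: str, b: str, max_edits: int = 2) -> bool:
    """Cascade: cheapest tests first, full edit distance only if still needed."""
    if a == b:                                # 1. identical strings
        return True
    if abs(len(a) - len(b)) > max_edits:      # 2. length comparison
        return False
    diff = Counter(a)                         # 3. composition comparison
    diff.subtract(Counter(b))
    surplus = sum(n for n in diff.values() if n > 0)
    deficit = sum(-n for n in diff.values() if n < 0)
    if max(surplus, deficit) > max_edits:     # lower bound on edit distance
        return False
    return edit_distance(a, b) <= max_edits   # 4. expensive comparison
```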
Speed Lessons
• Speed matters
• Minimise comparisons made
  – ‘Must match’ parameters
  – Match fields in an efficient order
• Do data cleaning once, up front
• Look for ways to fail matches cheaply
Accuracy
Accuracy
[Diagram: false positives (False +), false negatives (False -) and correct matches (OK)]
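Counting these three categories needs a set of links that has been checked by hand. A minimal sketch, assuming links are represented as pairs of hypothetical record identifiers, one from each dataset:

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (record id in dataset A, record id in dataset B)


def accuracy(predicted: Set[Pair], checked: Set[Pair]) -> Dict[str, int]:
    """Compare proposed links with a manually checked set of true links."""
    return {
        "ok": len(predicted & checked),         # correct links
        "false_pos": len(predicted - checked),  # linked, but should not be
        "false_neg": len(checked - predicted),  # should be linked, but missed
    }
```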
Strict match
[Diagram: correct matches (OK) and false negatives (F-)]
Fuzzy match
[Diagram: correct matches (OK) and false positives (F+)]
Doughnut of uncertainty
Lesson 2: Look at near misses
Near misses are checkable
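One way to find the near misses is to band pairs by a similarity score: strict matches inside, clear non-matches outside, and the uncertain ring between them set aside for checking. This sketch swaps in Python's difflib.SequenceMatcher ratio as the score and uses an arbitrary 0.85 threshold; neither is taken from the framework.

```python
from difflib import SequenceMatcher

# Hypothetical threshold; in practice it would be tuned by inspecting output.
NEAR_MISS_MIN = 0.85


def classify(a: str, b: str, near_miss_min: float = NEAR_MISS_MIN) -> str:
    """Sort a pair into 'match', 'near_miss' (check by hand) or 'no_match'."""
    if a == b:
        return "match"  # strict match
    ratio = SequenceMatcher(None, a, b).ratio()
    return "near_miss" if ratio >= near_miss_min else "no_match"
```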
One approach
• Currently, to get best results:
  – Tend towards strictness
  – Handle false negatives
One approach
• Currently, best results come from:
  – Tending towards strictness
  – Handling false negatives
• Failures on ‘rightmost’ fields can be written to a report
• The report is checked and fed back in as escapes
• The match is rerun
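The report-and-rerun loop can be sketched as follows, assuming a hypothetical field order in which authors is the rightmost field and escapes are stored as confirmed (field, value in A, value in B) equivalences. A pair that fails only on the last field goes into the report; once someone confirms it, the escape lets the strict comparison pass on the rerun.

```python
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, str]
Escape = Tuple[str, str, str]            # (field, value in A, value in B), confirmed equivalent
FIELDS = ["genus", "species", "authors"]  # hypothetical order; 'authors' is rightmost


def first_failure(a: Record, b: Record, escapes: Set[Escape]) -> Optional[str]:
    """Strict comparison; returns the first failing field, or None on a match."""
    for field in FIELDS:
        va, vb = a[field], b[field]
        if va != vb and (field, va, vb) not in escapes:
            return field
    return None


def near_miss_report(pairs: List[Tuple[Record, Record]],
                     escapes: Set[Escape]) -> List[Escape]:
    """Pairs that passed every field except the rightmost one, for checking."""
    last = FIELDS[-1]
    return [(last, a[last], b[last]) for a, b in pairs
            if first_failure(a, b, escapes) == last]
```

In practice the report would be written to a file for checking; here it is simply a list whose confirmed rows become escapes.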
Lesson 3: Remove predictable variation
Predictable variation
• Gendered endings
• Common alternatives
  – Endings:
    • ii, i
    • iae, ae
• Dataset-specific quirks:
  – `&amp;`, `&`
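Predictable variation can be removed by comparing a normalised key rather than the raw value. A sketch under stated assumptions: the ending rules below are illustrative, apply only to epithets, and are lossy, so a key match could itself be a false positive worth reporting. Ampersands stored as the HTML entity are simply unescaped.

```python
import html
import re

# Hypothetical rules for building an epithet matching key; the stored name
# is never changed. Order matters: genitive endings first, then gender.
ENDING_RULES = [
    (re.compile(r"ii$"), "i"),           # -ii vs -i
    (re.compile(r"iae$"), "ae"),         # -iae vs -ae
    (re.compile(r"(?:us|um|a)$"), "a"),  # gendered endings: -us / -a / -um
]


def epithet_key(epithet: str) -> str:
    """Lower-case the epithet and apply each ending rule in turn."""
    key = epithet.lower()
    for pattern, replacement in ENDING_RULES:
        key = pattern.sub(replacement, key)
    return key


def clean_authors(authors: str) -> str:
    """Dataset-specific quirk: HTML-escaped ampersands ('&amp;' -> '&')."""
    return html.unescape(authors)
```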
The framework
• Python
• Psyco
• Modular
• Extensible
• In progress
• More details will be available on the TDWG website
• Source code availability
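Modularity and extensibility could be achieved in several ways; one common Python pattern is a registry of named matching functions, sketched here as an illustration rather than the framework's own design. Psyco was a specialising compiler for Python 2 and was typically switched on with `psyco.full()`; the guarded import lets the same code run where Psyco is not installed.

```python
from typing import Callable, Dict

Matcher = Callable[[str, str], bool]
MATCHERS: Dict[str, Matcher] = {}  # hypothetical registry of field matchers


def register(name: str) -> Callable[[Matcher], Matcher]:
    """Decorator that adds a matching function to the registry by name."""
    def wrap(fn: Matcher) -> Matcher:
        MATCHERS[name] = fn
        return fn
    return wrap


@register("exact")
def exact(a: str, b: str) -> bool:
    return a == b


@register("same_length")
def same_length(a: str, b: str) -> bool:
    return len(a) == len(b)


try:               # Psyco only supported Python 2 on 32-bit x86;
    import psyco   # on a modern interpreter this import simply fails
    psyco.full()
except ImportError:
    pass
```

New matchers can then be added in separate modules without touching the matching loop, which looks them up by name.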
The framework
• Some results (HTML)
Thanks to
• Bob Magill
• Sally Hinchcliffe
• The Moore Foundation
• Contact:
  • j.welby@kew.org
  • or after Jan 2007: