A generic data import layer for the Berlin Taxonomic Information Model
Anton Güntsch, Andreas Müller & Walter G. Berendsohn
Botanic Garden and Botanical Museum Berlin-Dahlem
Dept. of Biodiversity Informatics and Laboratories
A. Güntsch: A generic data import layer for the Berlin Model
The Berlin Taxonomic Information Model
Diagram elements: Name, Concept, Reference, Relation, "Factual Data"
• Concepts as name-reference pairs
• Explicit representation of relations between concepts
• Mechanisms for calculating factual data
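The first two bullets can be illustrated with a minimal sketch. The class and field names (`Name`, `Reference`, `Concept`, `Relation`, `rel_type`), the second reference and the relation label are assumptions chosen for illustration; they are not the actual Berlin Model entities or relation types, and the calculation of factual data is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    genus: str
    epithet: str
    authors: str = ""


@dataclass(frozen=True)
class Reference:
    title: str


@dataclass(frozen=True)
class Concept:
    """A concept is a name as it is used in one particular reference."""
    name: Name
    reference: Reference


@dataclass(frozen=True)
class Relation:
    """An explicit, typed link between two concepts."""
    source: Concept
    target: Concept
    rel_type: str  # hypothetical label, not a Berlin Model relation type


name = Name("Acrodon", "bellidiflorus", "N.E.Br.")
in_database = Concept(name, Reference("Aizoaceae"))
in_checklist = Concept(name, Reference("Some other checklist"))  # hypothetical source

# The same name in two references yields two distinct concepts.
distinct = in_database != in_checklist

# The relation between the two concepts is stored explicitly.
link = Relation(in_checklist, in_database, "is congruent to")
```

Keeping the reference inside the concept is what allows two treatments of the same name to be compared instead of silently merged.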
Berlin Model used by
• Euro+Med
• Med-Checklist
• IOPI Species Plantarum Initiative
• Algaterra
• Dendroflora of El Salvador
• German Standard List of Vascular Plants and Ferns
• Reference List of the German Mosses
• EDIT WP6
Data imports (1)
Heterogeneous sources (e.g. text files, printer-formatted data, spreadsheets, databases)
Complex target model
Imports consume a substantial fraction of project costs, and this share is often substantially underestimated.
Data imports (2)
Analyse source → Identify semantic units → Transform into appropriate processable format → Parse to format close to target model → Duplicate detection and import → Testing
Needs a great deal of human input
Can be automated
Step-by-step transformation of taxonomic information: preparation
XML Source → XML Soft Schema → XML Strict Schema → Target Berlin Model Database (with feedback)
Phase I: largely not automatable
Phase II: largely automated
Phase III: fully automated
• Identify patterns
• Communicate problems
• Export to simple XML
Step-by-step transformation of taxonomic information: preparation
<Aizoaceae xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <AcceptedTaxa>
    <Taxon>
      <ID>7814</ID>
      <Genus>Acrodon</Genus>
      <Epithet>bellidiflorus</Epithet>
      <AllAuthorsString>N.E.Br.</AllAuthorsString>
      <SubSpeciesEpi>v</SubSpeciesEpi>
      <AllAuthorsStringSubSpecies/>
      <SpeciesName>Acrodon bellidiflorus</SpeciesName>
    </Taxon>
    <Taxon>
      <ID>8566</ID>
      <Genus>Acrodon</Genus>
      <Epithet>subulatus</Epithet>
      <AllAuthorsString>(Miller) N.E.Br.</AllAuthorsString>
      <AllAuthorsStringSubSpecies/>
      <SpeciesName>Acrodon subulatus</SpeciesName>
    </Taxon>
  </AcceptedTaxa>
  <SynonymTaxa> […] </SynonymTaxa>
</Aizoaceae>
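A source that arrives as a spreadsheet or text export could be brought into simple XML of this kind roughly as follows. This is a sketch under assumptions: the CSV layout, its column names and the function `rows_to_simple_xml` are hypothetical, and only a subset of the elements shown above is produced.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV export of the source table; the column names are chosen
# to mirror the element names of the simple XML shown above.
SOURCE_CSV = """ID,Genus,Epithet,AllAuthorsString
7814,Acrodon,bellidiflorus,N.E.Br.
8566,Acrodon,subulatus,(Miller) N.E.Br.
"""


def rows_to_simple_xml(csv_text, root_tag="Aizoaceae"):
    """Write every CSV row as one <Taxon> element, one child per column."""
    root = ET.Element(root_tag)
    accepted = ET.SubElement(root, "AcceptedTaxa")
    for row in csv.DictReader(io.StringIO(csv_text)):
        taxon = ET.SubElement(accepted, "Taxon")
        for column, value in row.items():
            ET.SubElement(taxon, column).text = value
        # Derived element: genus and epithet joined by a space.
        species = f"{row['Genus']} {row['Epithet']}"
        ET.SubElement(taxon, "SpeciesName").text = species
    return root


root = rows_to_simple_xml(SOURCE_CSV)
xml_text = ET.tostring(root, encoding="unicode")
```

No taxonomic interpretation happens at this stage; the export only makes the source structure explicit so that patterns and problems become visible.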
Step-by-step transformation of taxonomic information: phase I
• Transform into soft schema XML
• Re-arrange, lump and split elements
• Don't check "taxonomic integrity"
• Tools: XSLT, Taxonomic Transformation Library (TTL), and others
Step-by-step transformation of taxonomic information: phase I
<BMIDataSource xmlns="http://www.bgbm.org/schemas/BMI/s0.7"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.bgbm.org/schemas/BMI/s0.7 P:\XMLSchema\ImportSchicht\BMISoft0.7.xsd">
  <MetaData> […] </MetaData>
  <ConceptReference>
    <RefCategory>database</RefCategory>
    <RefString>Aizoaceae</RefString>
  </ConceptReference>
  <PotentialTaxa>
    <PTaxon>
      <TaxonName>
        <Rank>species</Rank>
        <GenusEpi>Acrodon</GenusEpi>
        <SpeciesEpi>bellidiflorus</SpeciesEpi>
        <AllAuthors>N.E.Br.</AllAuthors>
      </TaxonName>
      <TaxonStatus>Accepted</TaxonStatus>
      <IdInSource>7814</IdInSource>
      <RelatedTaxon ref="21" relType="basionym"/>
    </PTaxon>
    […]
  </PotentialTaxa>
</BMIDataSource>
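The re-arrangement from simple XML into soft schema XML might look like the sketch below. The slides name XSLT and TTL as the tools; Python's `xml.etree.ElementTree` is used here only as a stand-in. The helpers `q` and `to_soft_schema` are hypothetical, the rank is hard-coded to "species" as a simplifying assumption, and `MetaData` and `RelatedTaxon` are left out.

```python
import xml.etree.ElementTree as ET

BMI_NS = "http://www.bgbm.org/schemas/BMI/s0.7"  # namespace from the example
ET.register_namespace("", BMI_NS)  # serialise it as the default namespace

SOURCE = """<Aizoaceae>
  <AcceptedTaxa>
    <Taxon>
      <ID>7814</ID><Genus>Acrodon</Genus><Epithet>bellidiflorus</Epithet>
      <AllAuthorsString>N.E.Br.</AllAuthorsString>
    </Taxon>
  </AcceptedTaxa>
</Aizoaceae>"""


def q(tag):
    """Qualify a tag name with the soft schema namespace."""
    return f"{{{BMI_NS}}}{tag}"


def to_soft_schema(source_root):
    """Re-arrange accepted taxa from simple XML into soft schema elements."""
    out = ET.Element(q("BMIDataSource"))
    ref = ET.SubElement(out, q("ConceptReference"))
    ET.SubElement(ref, q("RefCategory")).text = "database"
    ET.SubElement(ref, q("RefString")).text = source_root.tag
    taxa = ET.SubElement(out, q("PotentialTaxa"))
    for taxon in source_root.findall("./AcceptedTaxa/Taxon"):
        ptaxon = ET.SubElement(taxa, q("PTaxon"))
        name = ET.SubElement(ptaxon, q("TaxonName"))
        ET.SubElement(name, q("Rank")).text = "species"  # assumption
        ET.SubElement(name, q("GenusEpi")).text = taxon.findtext("Genus")
        ET.SubElement(name, q("SpeciesEpi")).text = taxon.findtext("Epithet")
        authors = taxon.findtext("AllAuthorsString")
        ET.SubElement(name, q("AllAuthors")).text = authors
        # Every record under AcceptedTaxa gets the status "Accepted".
        ET.SubElement(ptaxon, q("TaxonStatus")).text = "Accepted"
        ET.SubElement(ptaxon, q("IdInSource")).text = taxon.findtext("ID")
    return out


soft = to_soft_schema(ET.fromstring(SOURCE))
soft_xml = ET.tostring(soft, encoding="unicode")
```

In line with the phase I bullets, the sketch copies and regroups values but makes no attempt to judge whether they are taxonomically sound.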
Step-by-step transformation of taxonomic information: phase II
• Transform into strict schema XML
• Check data integrity
• Report malformed data
• Tool: TTL
Step-by-step transformation of taxonomic information: phase II
<BMIDataSource xmlns="http://www.bgbm.org/schemas/BMI/0.7" […]>
  <MetaData> […] </MetaData>
  <ConceptReference>
    <RefCategoryAbbrev>BK</RefCategoryAbbrev>
    <RefString>refString</RefString>
    <DatabaseID>4</DatabaseID>
  </ConceptReference>
  <PotentialTaxa>
    <PTaxon>
      <TaxonName>
        <SpeciesName>
          <GenusEpi>Acrodon</GenusEpi>
          <SpeciesEpi>bellidiflorus</SpeciesEpi>
          <AuthorTeam>
            <AuthorTeamCache>N.E.Br.</AuthorTeamCache>
          </AuthorTeam>
        </SpeciesName>
      </TaxonName>
      <TaxonStatusAbbrev>A</TaxonStatusAbbrev>
      <IdInSource>7814</IdInSource>
      <RelatedTaxa> […] </RelatedTaxa>
    </PTaxon>
  </PotentialTaxa>
</BMIDataSource>
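The "check data integrity" and "report malformed data" steps of phase II could be sketched as below. This is not TTL: the three rules, the `STATUS_ABBREV` lookup (only "Accepted" → "A" is visible in the examples) and the second, deliberately malformed record with ID 9999 are assumptions made for illustration. The conversion into strict schema XML itself is not shown.

```python
import xml.etree.ElementTree as ET

SOFT_NS = {"s": "http://www.bgbm.org/schemas/BMI/s0.7"}

# Hypothetical lookup; only this one mapping appears in the examples.
STATUS_ABBREV = {"Accepted": "A"}

SOFT = """<BMIDataSource xmlns="http://www.bgbm.org/schemas/BMI/s0.7">
  <PotentialTaxa>
    <PTaxon>
      <TaxonName>
        <Rank>species</Rank><GenusEpi>Acrodon</GenusEpi>
        <SpeciesEpi>bellidiflorus</SpeciesEpi>
      </TaxonName>
      <TaxonStatus>Accepted</TaxonStatus><IdInSource>7814</IdInSource>
    </PTaxon>
    <PTaxon>
      <TaxonName><Rank>species</Rank><GenusEpi>Acrodon</GenusEpi></TaxonName>
      <TaxonStatus>Unknown</TaxonStatus><IdInSource>9999</IdInSource>
    </PTaxon>
  </PotentialTaxa>
</BMIDataSource>"""


def check_ptaxon(ptaxon):
    """Return a list of problems found in one soft schema PTaxon."""

    def text(path):
        value = ptaxon.findtext(path, default="", namespaces=SOFT_NS)
        return (value or "").strip()

    problems = []
    if not text("s:TaxonName/s:GenusEpi"):
        problems.append("missing GenusEpi")
    is_species = text("s:TaxonName/s:Rank") == "species"
    if is_species and not text("s:TaxonName/s:SpeciesEpi"):
        problems.append("species without SpeciesEpi")
    if text("s:TaxonStatus") not in STATUS_ABBREV:
        problems.append("unknown TaxonStatus")
    return problems


def validate(soft_root):
    """Map IdInSource to its problems; well-formed records are omitted."""
    report = {}
    for ptaxon in soft_root.iterfind("s:PotentialTaxa/s:PTaxon", SOFT_NS):
        problems = check_ptaxon(ptaxon)
        if problems:
            key = ptaxon.findtext("s:IdInSource", default="?", namespaces=SOFT_NS)
            report[key] = problems
    return report


report = validate(ET.fromstring(SOFT))
```

Keying the report by `IdInSource` lets a malformed record be traced back to the original data, which is where the feedback to the data provider would start.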
Step-by-step transformation of taxonomic information: phase III
• Import into database
• Duplicate detection and resolution
• No user interaction required
• Tool: Berlin Model Object Layer (BMOL)
Berlin Model Object Layer (BMOL)
• Hides the database key system
• Duplicate detection
• Core-Module provides objects corresponding to database entities
• Mapper-Module interfaces with database
• Persistence-Module manages data flow between core-module and mapper-module
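The division of labour between the three modules might be sketched as follows. None of this is BMOL code: the class names, the single `name` table, the use of SQLite and the exact-match duplicate criterion are all assumptions for illustration, since the slides do not describe the real schema or matching rules.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonName:
    """Core module: a plain object that carries no database key."""
    genus: str
    epithet: str
    authors: str


class SqliteNameMapper:
    """Mapper module: the only place that knows tables and keys."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS name ("
            "id INTEGER PRIMARY KEY, genus TEXT, epithet TEXT, authors TEXT)"
        )

    def find_id(self, name):
        row = self.conn.execute(
            "SELECT id FROM name "
            "WHERE genus = ? AND epithet = ? AND authors = ?",
            (name.genus, name.epithet, name.authors),
        ).fetchone()
        return row[0] if row else None

    def insert(self, name):
        cursor = self.conn.execute(
            "INSERT INTO name (genus, epithet, authors) VALUES (?, ?, ?)",
            (name.genus, name.epithet, name.authors),
        )
        return cursor.lastrowid


class Persistence:
    """Persistence module: passes core objects to a mapper."""

    def __init__(self, mapper):
        self.mapper = mapper

    def save(self, name):
        """Insert the name unless an identical row exists; True if inserted."""
        if self.mapper.find_id(name) is not None:
            return False  # duplicate detected, nothing written
        self.mapper.insert(name)
        return True


conn = sqlite3.connect(":memory:")
store = Persistence(SqliteNameMapper(conn))
name = TaxonName("Acrodon", "bellidiflorus", "N.E.Br.")
first = store.save(name)   # inserted
second = store.save(name)  # recognised as a duplicate
row_count = conn.execute("SELECT COUNT(*) FROM name").fetchone()[0]
```

Because the import code only talks to `Persistence`, supporting another target database would mean writing another mapper class rather than changing the import logic.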
Outlook
• Method has been successfully tested for import of Med Checklist I, II & IV
• Further imports planned for 2006
• Programming of additional mapper modules desirable
www.bgbm.org/biodivinf/