Requirements of a Taxonomy Database Tcl-DB a Prototype

Upload: turi

Post on 05-Jan-2016


Requirements of a Taxonomy Database

Tcl-DB a Prototype

Outline

1. Requirements
• Hierarchy
• Alternative Search Terms: Synonyms and Vernaculars
• Alternative Spellings
• Alternative Classifications

2. Tcl-DB Prototype System
• Tcl-DB Structure
• 2NF

3. Extensible: Adding a new data source, e.g. NCBI

4. Tcl-DB: UID Tracking

5. Tcl-DB: Stats

6. Utility and Further Work

1. Hierarchy

2. Alternative Search Terms: Synonyms and Vernaculars

3. Alternative Spellings: Caenorabditis elegans, C elegans and Caenorhabditis elegans

4. Alternative Classifications:

Tcl-DB Prototype System. Proposed Architecture

Tcl-DB: Logical Structure

Tcl-DB Physical Database Structure

Assertion: Resolving the M:M relationship with an association entity
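
As a rough illustration of the association-entity idea, here is a minimal sketch using Python's built-in sqlite3 module rather than the prototype's own database. The identifiers name_id, name_text, source_id and assertion_id echo the slides, but the table layout, the source_name column and the sample rows are assumptions made up for this example. A name can be asserted by many sources and a source asserts many names, so each pairing becomes one row in the assertion table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name   (name_id   INTEGER PRIMARY KEY, name_text   TEXT NOT NULL);
CREATE TABLE source (source_id INTEGER PRIMARY KEY, source_name TEXT NOT NULL);

-- Association entity: one row per (name, source) pairing.
-- Layout is hypothetical; only the column names come from the slides.
CREATE TABLE assertion (
    assertion_id INTEGER PRIMARY KEY,
    name_id      INTEGER NOT NULL REFERENCES name(name_id),
    source_id    INTEGER NOT NULL REFERENCES source(source_id),
    UNIQUE (name_id, source_id)
);

-- Illustrative rows only, not real database contents.
INSERT INTO name   VALUES (1, 'Caenorhabditis elegans'), (2, 'Homo sapiens');
INSERT INTO source VALUES (1, 'ITIS'), (2, 'NCBI');
INSERT INTO assertion (name_id, source_id) VALUES (1, 1), (1, 2), (2, 2);
""")


def sources_for(name_text):
    """Return the names of all sources that assert the given taxon name."""
    rows = conn.execute(
        """
        SELECT s.source_name
        FROM assertion a
        JOIN name   n ON n.name_id   = a.name_id
        JOIN source s ON s.source_id = a.source_id
        WHERE n.name_text = ?
        ORDER BY s.source_name
        """,
        (name_text,),
    ).fetchall()
    return [row[0] for row in rows]


print(sources_for("Caenorhabditis elegans"))
```

The UNIQUE constraint keeps a source from asserting the same name twice; whether the prototype enforces this is not stated in the slides.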

Node: Hierarchical Queries
Nested Set, Path and Connect by

>select count(name_id) from node
start with name_id = '100891'
connect by prior name_id = parent_name_id;

>select count(name_id) from node
where path like '/%';

>select count(name_id) from node
where left_id between 1 and 9290;
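
The three query styles above can be compared on a toy tree. The sketch below uses Python's sqlite3 module, which has no CONNECT BY, so a recursive common table expression stands in for the Oracle hierarchical query. The five-node tree, the right_id column and the function names are assumptions for illustration; name_id, parent_name_id, path and left_id come from the queries on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE node (
        name_id        INTEGER PRIMARY KEY,
        parent_name_id INTEGER,   -- adjacency list
        path           TEXT,      -- materialized path
        left_id        INTEGER,   -- nested set bounds
        right_id       INTEGER
    )
    """
)
# Toy tree (hypothetical data): 1 -> (2 -> (4, 5), 3)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "/1/", 1, 10),
        (2, 1, "/1/2/", 2, 7),
        (3, 1, "/1/3/", 8, 9),
        (4, 2, "/1/2/4/", 3, 4),
        (5, 2, "/1/2/5/", 5, 6),
    ],
)


def count_adjacency(root_id):
    """Subtree size via a recursive CTE (stand-in for START WITH / CONNECT BY)."""
    return conn.execute(
        """
        WITH RECURSIVE subtree(name_id) AS (
            SELECT name_id FROM node WHERE name_id = ?
            UNION ALL
            SELECT n.name_id
            FROM node n JOIN subtree s ON n.parent_name_id = s.name_id
        )
        SELECT count(name_id) FROM subtree
        """,
        (root_id,),
    ).fetchone()[0]


def count_path(root_id):
    """Subtree size via a prefix match on the materialized path."""
    prefix = conn.execute(
        "SELECT path FROM node WHERE name_id = ?", (root_id,)
    ).fetchone()[0]
    return conn.execute(
        "SELECT count(name_id) FROM node WHERE path LIKE ?", (prefix + "%",)
    ).fetchone()[0]


def count_nested_set(root_id):
    """Subtree size via the nested set interval of the root."""
    left, right = conn.execute(
        "SELECT left_id, right_id FROM node WHERE name_id = ?", (root_id,)
    ).fetchone()
    return conn.execute(
        "SELECT count(name_id) FROM node WHERE left_id BETWEEN ? AND ?",
        (left, right),
    ).fetchone()[0]


for root in (1, 2, 3):
    print(root, count_adjacency(root), count_path(root), count_nested_set(root))
```

All three functions count the root itself as part of its subtree, matching the slide's queries, which start from the root row.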

synonym_name and vernacular: subtypes, multi-valued attributes or weak entities

Tcl-DB: 2NF

Kingdom: KINGDOM_ID; NAME_ID, NAME_TEXT, SOURCE_ID

Rank: RANK_ID; RANK_NAME, SOURCE_ID

ASSERTION: PK ASSERTION_ID; NAME_ID (I2, I1), SOURCE_ID (I1), DBSOURCE_ID (I1), AID, NID, RANK_ID, KINGDOM_ID

ASSERTION: PK ASSERTION_ID; NAME_ID (I1, I2), SOURCE_ID (I1), DBSOURCE_ID (I1), AID, NID, RANK, KINGDOM

Adding a new data source, e.g. NCBI

Tcl-DB: Procedures, Packages and Functions:

Step 1: Build views: which names are already in the database?

Step 2: Move names from view to Tcl schema

Step 3: Fill the nodes table in tcl schema

Step 4: Fill synonym_name table in tcl schema

Step 5: Fill vernacular table in tcl schema

Tcl-DB: UID Tracking

After name data load:

1. Run two joins on name and nids_mv
• NIDs – name_id when the name_text exists
• Null – name_id when the name_text does not exist

2. Update name and give all new names a NID

3. Update name and give all names their original NID

4. Refresh the NID_view
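
The four steps above can be sketched as follows, again with Python's sqlite3 module. Several things here are assumptions: nids_mv is modeled as a plain table mapping name_text to NID (SQLite has no materialized views), a single left join replaces the slide's two joins, new NIDs are taken as the next integers above the current maximum, and the "refresh" simply rebuilds the mapping from the name table. The function name track_uids and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name (name_id INTEGER PRIMARY KEY, name_text TEXT UNIQUE, nid INTEGER);
CREATE TABLE nids_mv (name_text TEXT PRIMARY KEY, nid INTEGER);

-- NIDs recorded before the reload (illustrative rows).
INSERT INTO nids_mv VALUES ('Homo sapiens', 1), ('Mus musculus', 2);

-- Freshly loaded names: name_id values changed, nid not yet assigned.
INSERT INTO name (name_id, name_text) VALUES
    (10, 'Mus musculus'), (11, 'Homo sapiens'), (12, 'Danio rerio');
""")


def track_uids(conn):
    # Step 1: join name against nids_mv; nid is NULL for unseen name_text.
    joined = conn.execute(
        """
        SELECT n.name_id, m.nid
        FROM name n LEFT JOIN nids_mv m ON m.name_text = n.name_text
        ORDER BY n.name_id
        """
    ).fetchall()
    new_name_ids = [name_id for name_id, nid in joined if nid is None]

    # Step 2: give every new name a fresh NID above the current maximum.
    next_nid = conn.execute(
        "SELECT COALESCE(MAX(nid), 0) FROM nids_mv"
    ).fetchone()[0] + 1
    for offset, name_id in enumerate(new_name_ids):
        conn.execute(
            "UPDATE name SET nid = ? WHERE name_id = ?",
            (next_nid + offset, name_id),
        )

    # Step 3: restore the original NID for every previously known name.
    conn.execute(
        """
        UPDATE name
        SET nid = (SELECT m.nid FROM nids_mv m WHERE m.name_text = name.name_text)
        WHERE name_text IN (SELECT name_text FROM nids_mv)
        """
    )

    # Step 4: "refresh" the mapping so the next load sees the new NIDs.
    conn.execute("DELETE FROM nids_mv")
    conn.execute("INSERT INTO nids_mv SELECT name_text, nid FROM name")
    conn.commit()


track_uids(conn)
print(dict(conn.execute("SELECT name_text, nid FROM name")))
```

The point of the exercise is that a NID stays attached to a name_text across reloads even though name_id may change.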

Tcl-DB: Utility and Further Work

Computing Interesting Stats:
• How much overlap is there between ITIS and NCBI?
• How many names are unique to NCBI?
• How many of these are binomials vs. 'environmental sample 256'?
• How many of these names can be matched allowing for 1 to 3 letter mismatches?
• NCBI taxonomy: data quality, integrity and usability?

Transitively closing the Synonyms Table and Vernacular Table

Building an interface.
• Spell checkers
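
A small sketch of how the overlap questions above could be explored in Python. It reads "letter mismatches" as Levenshtein edit distance, which is one possible interpretation, and uses a deliberately simple regular expression for binomials. The function names and the two toy name sets are made up for illustration and do not reflect actual ITIS or NCBI contents.

```python
import re


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Simplistic binomial pattern (assumption): "Genus species".
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+$")


def compare_sources(itis, ncbi, max_mismatch=3):
    """Return shared names, NCBI-only names, NCBI-only binomials, near matches."""
    shared = itis & ncbi
    ncbi_only = ncbi - itis
    binomials = {n for n in ncbi_only if BINOMIAL.match(n)}
    near = {}
    for n in binomials:
        candidates = sorted(
            m for m in itis if 0 < edit_distance(n, m) <= max_mismatch
        )
        if candidates:
            near[n] = candidates
    return shared, ncbi_only, binomials, near


# Toy data only.
itis = {"Caenorhabditis elegans", "Homo sapiens"}
ncbi = {"Caenorabditis elegans", "Homo sapiens", "environmental sample 256"}

shared, ncbi_only, binomials, near = compare_sources(itis, ncbi)
print(len(shared), len(ncbi_only), len(binomials), near)
```

Comparing every pair of names is quadratic, so at the scale of full taxonomies this would need indexing or blocking; the sketch only shows the shape of the computation.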

Lots of Questions?
How do we use this to build taxonomically aware databases?
How about updates to the data?
Database links, Web services, Simple DB Cross References?
Use GenBank Model?
Open to Suggestions/Ideas!

Do we need to think about:
PhyloCode?
Type Specimens?