EVS Data Curation
The processing and publication of data for web browsing and programmatic access
Data Curation Flowchart
Gene Ontology and Zebrafish
Downloaded as OBO from web sites
Processed with C++ program into Ontylog XML (OBO2TDE.exe)
Processed with C++ program into OWL (ontyxToOWL.exe)
Loaded using LoadNCIThesOWL.sh
Metadata loaded using LoadMetadata
Hierarchy and Sources manually edited
HL7 and VA_NDFRT
Retrieved from sources
Processed by Apelon into Ontylog XML
Loaded into LexBIG using LoadNCIThesOwl and manifest
Metadata loaded using LoadMetadata
MGED
OWL file downloaded from source web site
Loaded into Protégé
Classified
Inferred version exported as OWL file
Loaded into LexBIG using LoadNCIThesOwl
Metadata loaded using LoadMetadata
Hierarchy and Sources manually edited
SNOMED, MedDRA and LOINC
Extracted from the UMLS into RRF files
Loaded into LexBIG using LoadUMLSFiles
Metadata loaded using LoadMetadata
UMLS Semnet
Downloaded from UMLS Semnet web site
Loaded using LoadUMLSSemnet
Metadata loaded using LoadMetadata
Metathesaurus
Loaded from UMLS into MEME
NCI Thesaurus imported monthly
Other vocabularies added or removed
NCI-specific edits made to data and relations
Exported as RRF
Imported to LexBIG using LoadNCIMeta
Metadata loaded using LoadMetadata
Preparing TDE Thesaurus for MEME
Thesaurus Ontylog XML baseline is processed through C++ app publishMEME.exe
Current baseline compared to previous to get summary of new properties or roles
Summary used to create import configuration file
Baseline imported into MEME
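The comparison step above can be sketched as a set difference over property and role names. This is only an illustration: the slides do not show the Ontylog XML layout or the format of the import configuration file, so the inputs here are plain Python sets and every name in the sample data (including `summarize_new_names`) is hypothetical.

```python
def summarize_new_names(previous, current):
    """Return, per kind, the names present in the current baseline only."""
    return {
        kind: sorted(set(current.get(kind, ())) - set(previous.get(kind, ())))
        for kind in ("properties", "roles")
    }

# Hypothetical baselines; real names would be read from the Ontylog XML files.
previous = {
    "properties": {"Example_Property_A", "Example_Property_B"},
    "roles": {"Example_Role_A"},
}
current = {
    "properties": {"Example_Property_A", "Example_Property_B", "Example_New_Property"},
    "roles": {"Example_Role_A", "Example_New_Role"},
}

summary = summarize_new_names(previous, current)
for kind, names in summary.items():
    print(kind, names)
```

A summary of this shape lists exactly the names that would need an entry in the import configuration before the baseline is imported.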
NCI Thesaurus from TDE
Edited in TDE and exported to Ontylog XML by name
Run through publishTDE to remove unpublishable properties
Run through OntyxToOwl.exe to create OWL file by code
Loaded into LexBIG using LoadNCIThesOWL
Metadata loaded using LoadMetadata
History generated from TDE baseline
History loaded using LoadNCIHistory
NCI Thesaurus from Protégé
Run OWL through application to get Ontylog XML by name
Run Ontylog XML through publishTDE to remove unpublishable properties
Run through OntylogtoOWL to get OWL by code
Generate history from the Ontylog XML
NCI Thesaurus History Processing
evs_history records concept modifications made in editor
These records are extracted monthly to consolidate and to remove identifying information
Cleaned records are loaded into concept_history
Full concept_history loaded into LexBIG for NCI Thesaurus
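The cleaning step (removing identifying information) might look like the sketch below. The field order is an assumption inferred from the pipe-delimited sample records in log.out, and the output layout borrows the shape of the DTS_history_out.txt lines; neither is confirmed by the slides. `clean_record` is a hypothetical name.

```python
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def clean_record(line):
    """Drop the user and host fields from one evs_history record.

    Assumed field order: id|code|name|action|timestamp|user|host|reference|flag
    """
    rec_id, code, _name, action, stamp, _user, _host, reference, _flag = (
        line.strip().split("|")
    )
    # Timestamps look like "2008-03-06 11:03:49.0"; keep only the date part.
    year, month, day = stamp.split(" ")[0].split("-")
    edit_date = f"{day}-{MONTHS[int(month) - 1]}-{year[2:]}"
    return "|".join([rec_id, code, action.lower(), edit_date, reference])

# Sample record taken from log.out.
raw = ("668629|C3824|Lesion|Modify|2008-03-06 11:03:49.0|remennik|"
       "6116otsaremennl.nci.nih.gov|(null)|0")
cleaned = clean_record(raw)
print(cleaned)  # 668629|C3824|modify|06-MAR-08|(null)
```

Unpacking into exactly nine names makes the function fail loudly on a record with an unexpected number of fields instead of silently shifting columns.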
log.out
New concepts created through Create or Split actions:
C72675|Feet_First
.
Concepts merged into other concepts:
C17841|Oncologic_Surgeon
.
Retired concepts (including merged):
C17841|Oncologic_Surgeon
.
New concepts not found in BSLN2:
C73140|Ethaverine_
.
Retired concepts not found in BSLN2
C73401|Maqui_Berry_Flavor
.
Modify records correponding to Retired_Kind are discarded:
667487|C62920|Medical_Device_Unsafe_to_Use|Modify|2008-03-05 …
.
Modify records correponding to new codes are discarded:
666753|C72831|Pramiracetam_Hydrochloride|Modify|2008-02-29 …
.
Modify records correponding to merged codes are discarded:
668629|C3824|Lesion|Modify|2008-03-06 11:03:49.0|remennik|6116otsaremennl.nci.nih.gov|(null)|0
.
Records correponding to codes not found in BSLN2 are discarded:
671933|C73140|Ethaverine_|New|2008-03-19 12:03:01.0|shaiu|MSDCorp-Mesh001.inside.msdinc.com|(null)|0
.
WARNING: New codes created, then retired, but still found in BSLN2: (to be edited manually)
C72675|Feet_First
.
List of all remaining records
.
List of all discarded records:
666753|C72831|Pramiracetam_Hydrochloride|Modify|2008-02-29 09:02:56.0|shaiu|MSDCorp-Mesh001.inside.msdinc.com|(null)|0
.
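The discard rules named in log.out can be sketched as a single filter. This is a reading of the log headings, not the actual program: it assumes the code is the second field and the action the fourth, and it treats "Retired_Kind" as a set of retired codes. The function name and the last sample record are hypothetical; the other three records are copied from the log.

```python
def filter_history(records, baseline_codes, new_codes, merged_codes, retired_codes):
    """Split records into (kept, discarded) following the log.out headings."""
    skip_modify = set(new_codes) | set(merged_codes) | set(retired_codes)
    baseline = set(baseline_codes)
    kept, discarded = [], []
    for line in records:
        fields = line.split("|")
        code, action = fields[1], fields[3]
        if code not in baseline:
            discarded.append(line)   # code not found in the baseline
        elif action == "Modify" and code in skip_modify:
            discarded.append(line)   # Modify on a new, merged, or retired code
        else:
            kept.append(line)
    return kept, discarded

records = [
    "666753|C72831|Pramiracetam_Hydrochloride|Modify|2008-02-29 09:02:56.0|shaiu|MSDCorp-Mesh001.inside.msdinc.com|(null)|0",
    "668629|C3824|Lesion|Modify|2008-03-06 11:03:49.0|remennik|6116otsaremennl.nci.nih.gov|(null)|0",
    "671933|C73140|Ethaverine_|New|2008-03-19 12:03:01.0|shaiu|MSDCorp-Mesh001.inside.msdinc.com|(null)|0",
    # Hypothetical record that passes every rule.
    "1|C00001|Example_Concept|Modify|2008-03-01 00:00:00.0|user|host|(null)|0",
]
kept, discarded = filter_history(
    records,
    baseline_codes={"C72831", "C3824", "C00001"},
    new_codes={"C72831"},
    merged_codes={"C3824"},
    retired_codes=set(),
)
print(len(kept), len(discarded))  # 1 3
```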
tde_history_report.txt
Spilanthes_oleracea (Code: C72446)
Number of modelers: 3
Modeler: shaiu
Modeler: thomas
Modeler: creech
Modeler: shaiu
Action: modify time: 2008-03-05 05:03:58.0
Modeler: thomas
Action: modify time: 2008-03-06 02:03:05.0
Action: modify time: 2008-03-14 10:03:06.0
Modeler: creech
Action: modify time: 2008-03-06 02:03:06.0
------------------------------------------------------------------
.
Edited actions for the following concepts are discarded:
Concept codes requiring manual review:
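A report of this shape can be produced by grouping modify records by concept and then by modeler. The sketch below is an assumption about how such a grouping could be done, not the tool that wrote tde_history_report.txt; the input tuples are taken from the sample entry above, and `modelers_by_concept` is a hypothetical name.

```python
from collections import defaultdict

def modelers_by_concept(edits):
    """Map concept code -> {modeler: [modify timestamps]}."""
    report = defaultdict(lambda: defaultdict(list))
    for code, modeler, stamp in edits:
        report[code][modeler].append(stamp)
    return report

edits = [
    ("C72446", "shaiu", "2008-03-05 05:03:58.0"),
    ("C72446", "thomas", "2008-03-06 02:03:05.0"),
    ("C72446", "thomas", "2008-03-14 10:03:06.0"),
    ("C72446", "creech", "2008-03-06 02:03:06.0"),
]

report = modelers_by_concept(edits)
for code, by_modeler in report.items():
    print(f"(Code: {code})")
    print(f"Number of modelers: {len(by_modeler)}")
    for modeler, stamps in by_modeler.items():
        print(f"Modeler: {modeler}")
        for stamp in stamps:
            print(f"Action: modify time: {stamp}")
```

Concepts touched by more than one modeler are the natural candidates for the manual-review list.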
DTS_history
DTS_history_script.sql
insert into concept_history(concept, editaction, editdate, reference)
values ('C72675', 'create', '28-MAR-08', null);
insert into concept_history(concept, editaction, editdate, reference)
values ('C72676', 'create', '28-MAR-08', null);
.
.
DTS_history_out.txt
666540|C72675|create|28-MAR-08|(null)
666541|C72676|create|28-MAR-08|(null)
666542|C62171|modify|28-MAR-08|(null)
.
.
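Statements like those in DTS_history_script.sql could be generated from (code, action, date, reference) tuples as sketched here. The table and column names come from the sample above; the helper names `sql_literal` and `to_insert` are hypothetical, and a real loader would more likely use bind parameters than string formatting.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal, with None as null."""
    if value is None:
        return "null"
    return "'" + str(value).replace("'", "''") + "'"

def to_insert(code, action, edit_date, reference=None):
    """Build one concept_history insert statement in the layout shown above."""
    values = ", ".join(sql_literal(v) for v in (code, action, edit_date, reference))
    return (
        "insert into concept_history(concept, editaction, editdate, reference)\n"
        f"values ({values});"
    )

rows = [
    ("C72675", "create", "28-MAR-08", None),
    ("C72676", "create", "28-MAR-08", None),
]
script = "\n".join(to_insert(*row) for row in rows)
print(script)
```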
DTS_history_out.out
Lists complete contents of both baselines.
Number of codes in {baseline A} : 65265
Number of codes in {baseline B} : 66022
Concepts found in {baseline B}: but not in {baseline A}
C72675
C72676
.
Concepts found in {baseline A}: but not in {baseline B} (should be empty)
.
Verify DTS_history_out.txt against baseline data.
New Concepts: 757
(1) C72675
(2) C72676
.
Concepts created through Split: 0
Split Concepts: 0
Retired Concepts: 4
(1) C20920
(2) C62920
Concepts retired through Merge: 5
(1) C14142
Merge Concepts: 5
(1) C1363
Modified Concepts: 1364
Invalid actions: 0
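The baseline comparison in this report amounts to two set differences. The sketch below uses tiny stand-in baselines built from codes that appear in the samples; how the real program reads the baselines is not shown in the slides, and `compare_baselines` is a hypothetical name.

```python
def compare_baselines(codes_a, codes_b):
    """Return codes that are new in B and codes that are missing from B."""
    a, b = set(codes_a), set(codes_b)
    return {"new": sorted(b - a), "missing": sorted(a - b)}

# Stand-in baselines; the real ones hold 65265 and 66022 codes.
baseline_a = ["C62171"]
baseline_b = ["C62171", "C72675", "C72676"]

diff = compare_baselines(baseline_a, baseline_b)
print("New Concepts:", len(diff["new"]))
print("Missing (should be empty):", diff["missing"])
```

When the "missing" list is empty, the number of new concepts equals the difference in baseline sizes, which matches the figures above: 66022 - 65265 = 757.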
Tiered Deployments
NCICB uses 4-tiered deployments
Dev tier – used internally by EVS team to test software and data
QA tier – used by QA and other software teams to test against new EVS software or data
Stage tier – used to test software deployments in a near-production environment
Production – available to outside users