![Page 1: Finding Functional Gene Relationships Using the Semantic Gene Organizer (SGO) Kevin Heinrich Master’s Defense July 16, 2004](https://reader036.vdocument.in/reader036/viewer/2022062422/56649f1e5503460f94c35c68/html5/thumbnails/1.jpg)
Finding Functional Gene Relationships Using the Semantic Gene Organizer (SGO)
Kevin Heinrich
Master’s Defense
July 16, 2004
Outline
• Problem / Goals
• Related Work
• Information Retrieval
  – Vector Space Model
  – Latent Semantic Indexing (LSI)
• Biological Databases
• SGO Use & Results
Problem
• Biological tools are creating vast amounts of data.
• Current techniques are time-consuming and expensive.
• Want to know phenotype (function) from genotype (structure/sequence).
Goals
• Develop a tool to aid researchers in finding and understanding functional gene relationships.
• Use information that covers the whole genome, e.g., the literature.
Related Work
• Jenssen et al. (2001) developed PubGene.
  – Literature network
  – Assigns a functional association if gene symbols co-occur
• Wilkinson and Huberman (2004) expanded this idea to find communities of related genes.
• Yandell and Majoros (2002) use natural language processing techniques to identify the nature of relationships.
Related Work
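• Almost all literature-based techniques rely on term co-occurrence.
• What about gene aliases, where the same gene appears under different names?
• Solution: Apply a more robust technique that does not depend on exact term co-occurrence.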
Information Retrieval: Vector Space Model
• Documents are parsed into tokens.
• Each token is assigned a weight, $w_{ij}$, for the ith token in the jth document.
• An m × n term-by-document matrix, $A = [w_{ij}]$, is created, where
  – documents are m-dimensional vectors (the columns of A), and
  – tokens are n-dimensional vectors (the rows of A).
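A minimal sketch of building the matrix of raw frequencies $f_{ij}$ (before any weighting), assuming naive lowercase/whitespace tokenization. The function name `term_document_matrix` and the toy documents are illustrative only, not SGO's actual parser.

```python
import numpy as np


def term_document_matrix(docs):
    """Return (F, vocab): F[i, j] is the raw count f_ij of term i in document j.

    Rows are terms, columns are documents, so each document is an
    m-dimensional column and each term an n-dimensional row.
    """
    # Naive lowercase/whitespace tokenization (an assumption; a real parser
    # would also strip punctuation, drop stop words, etc.).
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    row = {term: i for i, term in enumerate(vocab)}
    F = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for tok in toks:
            F[row[tok], j] += 1
    return F, vocab


docs = ["Reelin binds VLDLR", "Dab1 acts downstream of Reelin", "ApoE binds VLDLR"]
F, vocab = term_document_matrix(docs)
print(F.shape)  # (8, 3): 8 distinct terms, 3 documents
```

Weights $w_{ij}$ are then obtained by rescaling the entries of F, so the sparsity pattern of A is fixed at this stage.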
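Information Retrieval: Term Weights
• Term weights are the product of a local component $l_{ij}$ and a global component $g_i$, scaled by a document normalization factor $d_j$:
  $$w_{ij} = l_{ij}\, g_i\, d_j$$
  where $f_{ij}$ is the frequency of term i in document j, n is the number of documents, and $\hat{f}_{ij} = 1$ if $f_{ij} > 0$ (0 otherwise).
• tf: $l_{ij} = f_{ij}$
• idf: $g_i = \log_2\left(\frac{n}{\sum_j \hat{f}_{ij}}\right) + 1$
• idf2: $g_i = \frac{\sum_j f_{ij}}{\sum_j \hat{f}_{ij}}$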
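The tf local weight combined with either global weight can be sketched as below. This assumes the reading $g_i = \log_2(n / \sum_j \hat{f}_{ij}) + 1$ for idf and $g_i = \sum_j f_{ij} / \sum_j \hat{f}_{ij}$ for idf2, takes $d_j = 1$, and uses the hypothetical helper name `tf_global_weights`; it is an illustration, not SGO's code.

```python
import numpy as np


def tf_global_weights(F, scheme="idf"):
    """Return W with w_ij = l_ij * g_i, using the tf local weight l_ij = f_ij.

    F is an m x n matrix of raw counts; every term is assumed to occur in at
    least one document. The document factor d_j is taken as 1 here.
    """
    n = F.shape[1]
    doc_freq = np.count_nonzero(F, axis=1)      # sum_j fhat_ij
    if scheme == "idf":
        g = np.log2(n / doc_freq) + 1
    elif scheme == "idf2":
        g = F.sum(axis=1) / doc_freq            # sum_j f_ij / sum_j fhat_ij
    else:
        raise ValueError(f"unknown global weighting scheme: {scheme}")
    return F * g[:, np.newaxis]                  # scale row i by g_i


F = np.array([[1.0, 0.0, 2.0],    # term 0 appears in 2 of 3 documents
              [1.0, 1.0, 1.0]])   # term 1 appears in every document
print(tf_global_weights(F, "idf"))
print(tf_global_weights(F, "idf2"))
```

Because of the "+ 1" in idf, a term that occurs in every document keeps weight 1 rather than being zeroed out.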
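Information Retrieval: Term Weights (cont’d)
• log-entropy
• Goal is to give distinguishing terms more weight.
  $$l_{ij} = \log(1 + f_{ij}), \qquad g_i = 1 + \frac{\sum_j p_{ij} \log_2 p_{ij}}{\log_2 n}, \qquad p_{ij} = \frac{f_{ij}}{\sum_j f_{ij}}$$
  A term spread evenly over all n documents gets $g_i = 0$; a term that occurs in only one document gets $g_i = 1$.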
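A small sketch of log-entropy weighting following the formulas for $l_{ij}$, $g_i$ and $p_{ij}$; the base-2 logarithm in the local weight is an assumption (the base only rescales $l_{ij}$), and `log_entropy_weights` is a hypothetical name. The toy counts show the two extreme cases of the global weight.

```python
import numpy as np


def log_entropy_weights(F):
    """w_ij = l_ij * g_i with l_ij = log2(1 + f_ij) (base 2 assumed) and
    g_i = 1 + sum_j p_ij log2(p_ij) / log2(n), where p_ij = f_ij / sum_j f_ij.

    Assumes n > 1 documents and that every term occurs at least once.
    """
    n = F.shape[1]
    local = np.log2(1.0 + F)
    p = F / F.sum(axis=1, keepdims=True)
    safe_p = np.where(p > 0, p, 1.0)        # log2(1) = 0, so 0 * log 0 is treated as 0
    entropy_term = (p * np.log2(safe_p)).sum(axis=1)
    g = 1.0 + entropy_term / np.log2(n)
    return local * g[:, np.newaxis]


F = np.array([[1.0, 1.0, 1.0, 1.0],   # spread evenly: g = 0, weight vanishes
              [4.0, 0.0, 0.0, 0.0]])  # concentrated in one document: g = 1
print(log_entropy_weights(F))
```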
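Information Retrieval: Query & Similarity
• Queries are represented by a pseudo-document vector $q = (g_1, g_2, \ldots, g_m)$, where $g_k$ is the weight of term k in the query.
• Similarity is the cosine of the angle between the query vector and each document vector $d_j$:
  $$\mathrm{sim}(q, d_j) = \cos\theta_j = \frac{q^T d_j}{\|q\|_2\, \|d_j\|_2} = \frac{\sum_{k=1}^{m} w_{kj}\, g_k}{\sqrt{\sum_{k=1}^{m} w_{kj}^2}\, \sqrt{\sum_{k=1}^{m} g_k^2}}$$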
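Cosine ranking of all documents against one query vector can be sketched as follows; `cosine_scores` is a hypothetical name, and mapping zero-norm vectors to a score of 0 is a design choice of this sketch.

```python
import numpy as np


def cosine_scores(W, q):
    """cos(theta_j) between query q (length m) and each column d_j of W (m x n)."""
    norms = np.linalg.norm(W, axis=0) * np.linalg.norm(q)
    norms[norms == 0] = 1.0          # empty document or query: score 0 instead of NaN
    return (q @ W) / norms


W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
q = np.array([1.0, 0.0])             # query containing only term 0
scores = cosine_scores(W, q)
ranking = np.argsort(-scores)        # document indices, best match first
print(scores, ranking)
```

Document 1 shares no term with the query and scores exactly 0; in the plain vector space model, only term overlap contributes to the score.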
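Information Retrieval: Latent Semantic Indexing (LSI)
LSI performs a truncated singular value decomposition (SVD) of A, where
$$A = U \Sigma V^T$$
• U is the m × r matrix whose columns are eigenvectors of $AA^T$.
• $V^T$ is the r × n matrix whose rows are eigenvectors of $A^TA$.
• Σ is the r × r diagonal matrix containing the r nonnegative singular values of A.
• r is the rank of A.
A rank-k approximation (k ≤ r) is given by $A_k = U_k \Sigma_k V_k^T$, where $U_k$ and $V_k$ keep the first k columns of U and V, and $\Sigma_k$ the k largest singular values.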
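A sketch of the rank-k truncation with NumPy's dense SVD; the random matrix is only a stand-in for a weighted term-by-document matrix, and `rank_k_approximation` is a hypothetical name. For large sparse collections, a routine that computes only the k leading singular triplets (e.g. `scipy.sparse.linalg.svds`) would be the natural substitute.

```python
import numpy as np


def rank_k_approximation(A, k):
    """Return (U_k, s_k, Vt_k, A_k) from the thin SVD A = U Sigma V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s sorted in descending order
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    Ak = Uk @ np.diag(sk) @ Vtk                       # A_k = U_k Sigma_k V_k^T
    return Uk, sk, Vtk, Ak


rng = np.random.default_rng(0)
A = rng.random((6, 4))                                # stand-in for a weighted term-by-document matrix
Uk, sk, Vtk, Ak = rank_k_approximation(A, 2)
s = np.linalg.svd(A, compute_uv=False)
# Eckart-Young: the Frobenius error of the best rank-k approximation equals
# the root-sum-square of the discarded singular values.
print(np.linalg.norm(A - Ak), np.sqrt(np.sum(s[2:] ** 2)))
```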
Information Retrieval: LSI (cont’d)
• Document-to-document similarity is computed from
  $$A_k^T A_k = (V_k \Sigma_k)(V_k \Sigma_k)^T$$
• Queries are projected into the low-rank approximation space:
  $$\hat{q} = q^T U_k \Sigma_k^{-1}$$
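Both operations can be sketched as below, assuming the reading $A_k^T A_k = (V_k\Sigma_k)(V_k\Sigma_k)^T$ and $\hat{q} = q^T U_k \Sigma_k^{-1}$. Comparing $\hat{q}$ against the rows of $V_k$ is one common folding-in convention and an assumption here; all function names are hypothetical.

```python
import numpy as np


def lsi_factors(A, k):
    """Return U_k, singular values s_k, V_k, and scaled document vectors V_k Sigma_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T
    scaled_docs = Vk * sk            # row j = document j in the k-dimensional space
    return Uk, sk, Vk, scaled_docs


def project_query(q, Uk, sk):
    """q_hat = q^T U_k Sigma_k^{-1}."""
    return (q @ Uk) / sk


def cosine_to_rows(X, y):
    """Cosine between y and every row of X."""
    return (X @ y) / (np.linalg.norm(X, axis=1) * np.linalg.norm(y))


rng = np.random.default_rng(1)
A = rng.random((8, 5))
k = 3
Uk, sk, Vk, scaled_docs = lsi_factors(A, k)
Ak = Uk @ np.diag(sk) @ Vk.T

# Document-to-document: cosines between rows of V_k Sigma_k equal cosines
# between the columns of A_k, because A_k^T A_k = (V_k Sigma_k)(V_k Sigma_k)^T.
doc_sims = cosine_to_rows(scaled_docs, scaled_docs[0])

# Query: fold q into the k-dimensional space and compare it with rows of V_k.
q = A[:, 2]                          # use document 2 itself as the query
q_hat = project_query(q, Uk, sk)
query_sims = cosine_to_rows(Vk, q_hat)
```

Folding in an existing column of A reproduces that document's row of $V_k$ exactly, which is a quick sanity check on the projection.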
Information Retrieval: LSI (cont’d)
• Scaled document vectors can be computed once and stored for quick retrieval.
• The lower-dimensional space forces queries and documents to be compared in a more conceptual manner and saves storage.
• The choice of the number of factors, k, is an open question.
• Net effect: LSI can find similarities between documents that have no term co-occurrence.
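A toy illustration of a latent match, using an invented four-term, three-document matrix and the hypothetical helper `lsi_doc_cosine`: documents 0 and 2 share no terms, so their raw cosine is 0, but each shares a term with document 1, and in the rank-1 space they become similar.

```python
import numpy as np


def cos(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def lsi_doc_cosine(A, i, j, k):
    """Cosine between documents i and j using the rows of V_k Sigma_k."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    scaled = Vt[:k, :].T * s[:k]
    return cos(scaled[i], scaled[j])


# Toy term-by-document counts (rows: terms, columns: documents).
# Documents 0 and 2 share no terms; each shares one term with document 1.
A = np.array([[1.0, 0.0, 0.0],   # term a
              [1.0, 1.0, 0.0],   # term b
              [0.0, 1.0, 1.0],   # term c
              [0.0, 0.0, 1.0]])  # term d

print(cos(A[:, 0], A[:, 2]))          # 0.0: no term co-occurrence
print(lsi_doc_cosine(A, 0, 2, k=1))   # close to 1.0
print(lsi_doc_cosine(A, 0, 2, k=2))   # about -0.08
```

On this toy matrix the same pair looks strongly related with k = 1 and slightly dissimilar with k = 2, which is one concrete reason the number of factors matters.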
Information Retrieval: Evaluation Measures
• Precision – ratio of relevant returned documents to the total number of returned documents.
• Recall – ratio of relevant returned documents to the total number of relevant documents.
• Goal is to have high precision at all levels of recall.
• Systems are often evaluated by average precision (AP), the average of the 11 interpolated precision values at recall levels 0%, 10%, …, 100%.
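A sketch of precision, recall, and 11-point interpolated AP for one ranked list, assuming the usual definition of interpolated precision at recall r as the maximum precision at any recall ≥ r. The function names and the document IDs are hypothetical.

```python
def precision_recall(ranked, relevant, cutoff):
    """Precision and recall over the top `cutoff` returned documents."""
    retrieved = ranked[:cutoff]
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved), hits / len(relevant)


def eleven_point_ap(ranked, relevant):
    """Average of interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    hits = 0
    points = []  # (recall, precision) each time a relevant document is returned
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for level in range(11):
        r = level / 10
        # Interpolated precision: best precision at any recall >= r (0 if never reached).
        total += max((p for rec, p in points if rec >= r - 1e-12), default=0.0)
    return total / 11


ranked = ["a", "x", "b"]     # "x" is not relevant
relevant = {"a", "b"}
print(precision_recall(ranked, relevant, 2), eleven_point_ap(ranked, relevant))
```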
Biological Databases: MEDLINE
• MEDLINE (NLM)
  – Contains 14+ million references to journal articles, with a concentration in medicine
  – Spans over 4,600 journals worldwide
  – 1966 to present
  – ~500,000 citations added annually
  – Each citation is manually indexed with MeSH terms.
Biological Databases: PubMed
• PubMed
  – Retrieves articles from MEDLINE and other journals.
  – Can be queried via any combination of attributes.
Biological Databases: LocusLink
• NCBI human-curated database
• Single query interface to a comprehensive directory of genes and gene reference sequences for key genomes.
• Provides links to related records in PubMed and other citations when applicable.
• Provides a RefSeq summary of gene function and links to key MEDLINE citations relevant to each gene.
Biological Databases: Overview
• MEDLINE has lots of information
  – Not all articles relate to genes
  – Gene terminology problem
• LocusLink does not cover all relevant citations, only a representative few.
Biological Databases: Gene Document Construction
• Concatenate the titles and abstracts of MEDLINE citations cross-referenced in Human, Rat, and Mouse LocusLink entries.
• Sequencing abstracts are included, which adds noise.
• LocusLink references are not comprehensive, so recall of all relevant abstracts is not guaranteed.
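The concatenation step can be sketched as below. The input shapes are assumptions: `gene_to_pmids` stands in for the cross-references collected from the Human, Rat, and Mouse LocusLink entries (assumed already grouped under one gene key), `citations` maps a PubMed ID to its title and abstract, and the IDs in the example are made up.

```python
def build_gene_documents(gene_to_pmids, citations):
    """Map each gene to one text: the concatenated titles and abstracts of its citations.

    gene_to_pmids: gene -> PubMed IDs cross-referenced by its LocusLink entries
                   (hypothetical input; species entries assumed already grouped).
    citations:     PubMed ID -> (title, abstract)
    """
    documents = {}
    for gene, pmids in gene_to_pmids.items():
        parts = []
        # dict.fromkeys drops repeated IDs (e.g. a citation linked from two species)
        # while keeping first-seen order; this de-duplication is an assumption.
        for pmid in dict.fromkeys(pmids):
            if pmid in citations:        # citations without text are skipped
                title, abstract = citations[pmid]
                parts.append(f"{title} {abstract}")
        documents[gene] = " ".join(parts)
    return documents


citations = {"111": ("Title one.", "Abstract one."),
             "222": ("Title two.", "Abstract two.")}
docs = build_gene_documents({"RELN": ["111", "222", "111"]}, citations)
print(docs["RELN"])
```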
SGO
• Primarily uses LSI to rank genes.
• Enables the user to specify query options
  – Gene query
  – Keyword query
  – Number of factors
  – Show latent matches
• Saves previous query sessions.
SGO: Interface
SGO: Interface (cont’d)
SGO: Trees
• Unfortunately, ranked lists mean little to biologists.
• Pairwise distances can be formed into a matrix $D = [d_{ij}]$, with
  $$d_{ij} = 1 - \cos\theta_{ij}$$
  where $\cos\theta_{ij}$ is the similarity between documents i and j.
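Forming D from a set of document vectors can be sketched as follows; using the rows of $V_k \Sigma_k$ as input is an assumption consistent with the LSI document vectors, and `distance_matrix` is a hypothetical name.

```python
import numpy as np


def distance_matrix(X):
    """D[i, j] = 1 - cos(theta_ij) for the rows of X (e.g. rows of V_k Sigma_k)."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos = np.clip(unit @ unit.T, -1.0, 1.0)   # guard against round-off outside [-1, 1]
    D = 1.0 - cos
    np.fill_diagonal(D, 0.0)                  # each document is at distance 0 from itself
    return D


X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
D = distance_matrix(X)
print(np.round(D, 3))
```

Distances fall in [0, 2]; 0 means identical direction and 1 means orthogonal (no similarity).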
SGO: Trees (cont’d)
• The Fitch-Margoliash (1967) method in PHYLIP is applied to D to generate hierarchical trees.
• Thresholds can be applied to the self-similarity matrix to produce graphs.
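A sketch of both steps' inputs, with hypothetical helper names: writing D as a square PHYLIP distance matrix (taxon count on the first line, then each row as a 10-character name field followed by the distances; PHYLIP programs such as `fitch` conventionally read such a file named `infile`), and listing the gene pairs whose similarity meets a threshold `tau`. Building the tree itself is left to PHYLIP.

```python
import numpy as np


def phylip_distance_infile(names, D):
    """Format a square distance matrix in PHYLIP's distance-matrix layout."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, D):
        label = name[:10].ljust(10)   # PHYLIP reads a 10-character name field
        lines.append(label + " " + " ".join(f"{d:.6f}" for d in row))
    return "\n".join(lines) + "\n"


def threshold_edges(S, names, tau):
    """Graph edges (gene_i, gene_j, s_ij) for pairs with similarity s_ij >= tau."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if S[i, j] >= tau:
                edges.append((names[i], names[j], float(S[i, j])))
    return edges


names = ["RELN", "DAB1", "APOE"]
S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.5],
              [0.1, 0.5, 1.0]])    # made-up cosine similarities
print(phylip_distance_infile(names, 1.0 - S))
print(threshold_edges(S, names, tau=0.5))
```

Raising `tau` thins the graph toward only the strongest relationships; the similarity values above are invented for illustration.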
SGO: Hierarchical Tree
SGO: Graph or Nodal Tree
SGO: Coding Issues
• Web interface must be interactive
  – Queries are processed on click
  – Document collections are parsed offline
  – Trees are constructed offline
• Storage will eventually become an issue.
Results: Test Data Set
• A 50-gene test data set was constructed.
  – Alzheimer’s disease
  – Cancer
  – Development
• The Reelin signaling pathway was used as the basis for evaluation
  – 5 primary genes (directly associated)
  – 7 secondary genes (indirectly associated)
Results: Primary AP
• AP for the 5 primary genes
  – 61% for 5 factors
  – 84% for 25 factors
  – 84% for 50 factors
Results: Secondary AP
• AP for all 12 genes (5 primary and 7 secondary)
  – 53% for 5 factors
  – 59% for 25 factors
  – 61% for 50 factors
Results: Comparison
• LSI is comparable to tf-idf for the 5 primary genes.
• LSI is far superior to tf-idf for all 12 genes (primary and secondary).
  – PubMed co-citation identifies 2 of the 7 indirectly related genes.
  – Abstract overlap of LocusLink citations fails to identify any indirectly related genes.
• tf-idf fails on many keyword queries.
• Tested on Gene Ontology classifications (not shown)
  – Similar tendencies are observed.
Results: Abstract Representation
• To simulate scaling up, the representation of reelin-related genes was decreased.
• AP of 47% on 20,856 Human LocusLink abstracts
Results: Hierarchical Tree
Conclusions
• SGO allows genes to be compared to each other and to keywords (function).
• SGO identifies latent relationships with promising accuracy.
• SGO is not meant to replace existing technologies, but to assist researchers
  – Verify current results
  – Direct future exploration
Future Work
• Scale up to entire genome
• Document construction
• Incorporate structural or other information for multi-modal similarity
• Test other models, e.g., NMF and QR
• Interactive tree building
• Keep collections current