kbd_poster.pptx

Upload: bishop-duhon

Post on 14-Apr-2017

Knowledge Based Discovery: Through Text Mining and Graph Theory

UNC Charlotte Department of Bioinformatics and Genomics

Christina Stylianou, Bishop Duhon, Walter Clements

Workflow diagram (labels recovered from the poster):
• Agricola
• Ontologies: MeSH, ChEBI, NALT, NCBI Entrez Gene
• Termite
• Text Mining Software: NLP, ontology based, query development, co-occurrence
• JSON (Text Mining Results)
• Extract Data
• CSV (Intermediate Files)
• Load Data
• Graph Database
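
Co-occurrence, one of the text mining approaches listed in the workflow, counts how often two terms appear in the same sentence or abstract. The following is a minimal sketch under simplifying assumptions: it uses lowercase substring matching against two tiny hypothetical term lists, the example sentences are invented, and it does not represent how Linguamatics I2E or Termite work internally.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(sentences, terms_a, terms_b):
    """Count sentences in which a term from each list appears together."""
    counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        found_a = [t for t in terms_a if t in text]
        found_b = [t for t in terms_b if t in text]
        # Every (a, b) pair found in the same sentence counts once.
        for pair in product(found_a, found_b):
            counts[pair] += 1
    return counts


# Invented example sentences, not quotations from PubMed or Agricola.
sentences = [
    "Broccoli intake was studied in patients with diabetes.",
    "Blueberry extracts were examined in hepatic encephalopathy models.",
    "Diabetes outcomes were compared between broccoli and blueberry diets.",
]
plants = ["broccoli", "blueberry"]                 # hypothetical term list
diseases = ["diabetes", "hepatic encephalopathy"]  # hypothetical term list

counts = cooccurrence_counts(sentences, plants, diseases)
for (plant, disease), n in sorted(counts.items()):
    print(plant, disease, n)
```

Plain co-occurrence only signals that two terms are mentioned together; it says nothing about the kind of relationship, which is why pattern-based extraction of explicit relationships is the more specific approach.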

Introduction

Society has become more aware of the connection between diet and human health.¹ Diagnoses of diseases linked to lifestyle choices, such as type 2 diabetes, are increasing at an alarming rate: the number of Americans diagnosed with diabetes rose from 5.6 million in 1980 to 20.9 million in 2011.² PubMed, a database of scientific literature, holds an extensive body of published work on these diseases. However, PubMed comprises over 24 million citations for scientific articles, with a new article uploaded every minute.³ We aim to make full use of this literature through text mining, a method of literature-based discovery. We used Linguamatics I2E, a natural language processing (NLP) based text mining platform, to extract information. This lets us extract explicit relationships that describe the interactions between phytochemicals and genes at the molecular level. Ultimately, our work generates a graph database that can be queried to further investigate the effects of diet on human health.

References

1. Jensen, K., Panagiotou, G., & Kouskoumvekaki, I. (2014). Integrated Text Mining and Chemoinformatics Analysis Associates Diet to Health Benefit at Molecular Level. PLoS Computational Biology, 10(1). http://doi.org/10.1371/journal.pcbi.1003432
2. Number (in Millions) of Civilian, Noninstitutionalized Persons with Diagnosed Diabetes, United States, 1980-2011. (2014, May). Retrieved July 22, 2015, from http://www.cdc.gov/diabetes/statistics/prev/national/figpersons.htm
3. Using PubMed. Retrieved July 22, 2015, from http://www.ncbi.nlm.nih.gov/pubmed/
4. Brand, E., & Sandberg, M. (1926). The Lability of the Sulfur in Cystine Derivatives and its Possible Bearing on the Constitution of Insulin.
5. Inflammation Is Necessary for Long-Term but Not Short-Term High-Fat Diet–Induced Insulin Resistance. (2011). Diabetes.
6. Laville M, N. J. (2009). Diabetes, insulin resistance and sugars. Diabetes.
7. Holecek, M. (2015). Ammonia and amino acid profiles in liver cirrhosis: effects of variables leading to hepatic encephalopathy. Nutrition.
8. Xu, C. (2015). High expression of NQO1 is associated with poor prognosis in serous ovarian carcinoma. BMC Cancer.

Graph Statistics

Number of Nodes: 36,743
Number of Relationships: 979,656
Number of Properties: 2,975,438
Diameter: 4
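
The diameter listed in Graph Statistics is the length of the longest shortest path between any two nodes. The poster does not say how the value was computed; the sketch below shows one common way to obtain it for a small undirected graph, by running a breadth-first search from every node. The toy edge list and the function names are illustrative assumptions, not data from the database.

```python
from collections import deque


def bfs_distances(adj, start):
    """Return the hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(edges):
    """Longest shortest path in an undirected graph given as edge pairs.

    For a disconnected graph this returns the largest value found within
    any single connected component.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return max(max(bfs_distances(adj, node).values()) for node in adj)


# Toy chain with one node per entity type (illustrative only).
toy_edges = [
    ("plant", "chemical"),
    ("chemical", "gene"),
    ("gene", "pathway"),
    ("pathway", "disease"),
]
toy_diameter = diameter(toy_edges)
print(toy_diameter)
```

Running a search from every node costs time proportional to the number of nodes times the number of nodes plus relationships, so on a graph of the size reported here a built-in graph algorithm or a sampling estimate may be more practical.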

Entity Statistics

Plants: 9,920
Chemicals: 12,557
Genes: 11,331
Pathways: 2,631
Diseases: 304
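
As a quick consistency check, the five entity counts should add up to the node count given in Graph Statistics. The short sketch below uses only the figures reported on the poster; the variable names are illustrative, and the average degree is a derived value that assumes each relationship connects exactly two nodes.

```python
# Entity counts as reported in Entity Statistics.
entity_counts = {
    "Plants": 9920,
    "Chemicals": 12557,
    "Genes": 11331,
    "Pathways": 2631,
    "Diseases": 304,
}

# Totals as reported in Graph Statistics.
reported_nodes = 36743
reported_relationships = 979656

total_nodes = sum(entity_counts.values())

# Each relationship touches two nodes, so it contributes twice to the degree sum.
average_degree = 2 * reported_relationships / reported_nodes

print(total_nodes == reported_nodes)
print(round(average_degree, 1))
```

The counts sum to 36,743, which matches the reported number of nodes, and the derived average is roughly 53 relationships per node.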

Results

Initial queries on the graph database of plant and disease relationships show how the two are interrelated. A shortest path query revealed a pathway linking diabetes to broccoli (Fig. 3). Broccoli contains sulfur, which is beneficial for the production of insulin in the human body. Sulfur is a component of the insulin protein, which regulates the uptake of glucose from the bloodstream.⁴ The RAG1 gene is known to aid in the control and production of insulin.⁵ Furthermore, insulin plays a key part in the development of diabetes mellitus type 2.⁶ Our query found a relationship between the sulfur atom and the RAG1 gene, both of which are known to influence insulin levels in the body.
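
The kind of shortest path query behind Figure 3 can be illustrated with a breadth-first search over a small in-memory graph. This is a minimal sketch: the first three edges use only the entities named in Figure 3, with the link order assumed from its caption, and the `placeholder_*` nodes are hypothetical and exist only to provide a longer alternative route. The poster does not state which database engine or query language was used.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Return one shortest path as a list of nodes, or None if unreachable."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    if source not in adj or target not in adj:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


edges = [
    # Entities from Figure 3; link order assumed from the caption.
    ("broccoli", "sulfur atom"),
    ("sulfur atom", "RAG1"),
    ("RAG1", "diabetes mellitus"),
    # Hypothetical longer route, present only to show that the shorter one wins.
    ("broccoli", "placeholder_1"),
    ("placeholder_1", "placeholder_2"),
    ("placeholder_2", "placeholder_3"),
    ("placeholder_3", "diabetes mellitus"),
]

path = shortest_path(edges, "broccoli", "diabetes mellitus")
print(" -> ".join(path))
```

Breadth-first search treats every relationship as one hop, so it finds the path with the fewest intermediate entities rather than the one with the strongest literature support.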

Hepatic encephalopathy is a brain disorder associated with liver failure and high ammonia concentrations in the liver, which result from the breakdown of glutamine. Because the liver is the body's main site of detoxification, the disorder is linked to the detoxification pathway. An integral gene in this pathway is NQO1. NQO1, a quinone oxidoreductase, is a flavoprotein responsible for the removal of radicals and the detoxification of quinones. High ammonia levels result in oxidative stress. Vaccinium corymbosum (blueberry) is an active producer of polyphenolic compounds that act as antioxidants. The NQO1 gene and its antioxidant enzyme have been found to be activated by polyphenols. How polyphenols affect disease outcome remains uncertain.

Discussion

Problems that we faced include having to curate results manually in order to refine our patterns and resolve duplicate information and false positives. In addition, gathering the proper ontologies and converting them to the format required by the text mining software was tedious and time-consuming. Our research generates a graph-based system mapping the connections between dietary phytochemicals and genes associated with nutritional and metabolic diseases. Through continued research and refined querying, we can improve the results to better elucidate the molecular pathways. Future research goals include widening the scope of diseases beyond nutritional and metabolic disorders.

Table 1: Statistics describing the graph database (Graph Statistics).

Table 2: Statistics describing node types in the graph (Entity Statistics).

Methods

To explore the pathways by which foods benefit human health, we search the scientific literature by developing patterns that find relationships linking disease to diet. The process is described in the workflow diagram above.
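
The extract and load steps of the workflow can be sketched as a conversion from JSON text mining results into intermediate CSV files, one for nodes and one for relationships. Everything about the data layout here is an assumption made for illustration: the JSON structure, the field names (`source`, `target`, `relation`, `name`, `type`), the relation labels, and the CSV columns are hypothetical, since the actual export format is not shown on the poster.

```python
import csv
import io
import json


def json_to_csv(json_text):
    """Convert relationship records to node and relationship CSV text."""
    records = json.loads(json_text)
    nodes = {}    # entity name -> entity type
    rels = set()  # a set drops exact duplicate relationships
    for rec in records:
        for end in ("source", "target"):
            entity = rec[end]
            nodes[entity["name"]] = entity["type"]
        rels.add((rec["source"]["name"], rec["relation"], rec["target"]["name"]))

    node_buf = io.StringIO()
    writer = csv.writer(node_buf, lineterminator="\n")
    writer.writerow(["name", "type"])
    for name in sorted(nodes):
        writer.writerow([name, nodes[name]])

    rel_buf = io.StringIO()
    writer = csv.writer(rel_buf, lineterminator="\n")
    writer.writerow(["source", "relation", "target"])
    for row in sorted(rels):
        writer.writerow(row)

    return node_buf.getvalue(), rel_buf.getvalue()


# Hypothetical records; the second one is an exact duplicate of the first.
broccoli_sulfur = {
    "source": {"name": "broccoli", "type": "Plant"},
    "relation": "contains",
    "target": {"name": "sulfur atom", "type": "Chemical"},
}
sample = json.dumps([
    broccoli_sulfur,
    broccoli_sulfur,
    {
        "source": {"name": "sulfur atom", "type": "Chemical"},
        "relation": "associated_with",
        "target": {"name": "RAG1", "type": "Gene"},
    },
])

nodes_csv, rels_csv = json_to_csv(sample)
print(nodes_csv)
print(rels_csv)
```

Keeping nodes and relationships in separate files mirrors the way many graph databases expect bulk imports, and collecting relationships in a set removes exact duplicates before loading. Near-duplicates, such as synonyms for the same entity, would still need curation.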

Node types shown in the figures: Disease, Pathway, Gene, Chemical, Plant

Figure 3: Relationships between Brassica oleracea var. italica (broccoli), the sulfur atom, the RAG1 gene, and diabetes mellitus. The graph is the result of a shortest path query between broccoli and diabetes mellitus.

Figure 1: Associations between diseases (Iron Overload & Amyloidosis), biological pathways, genes, chemicals, and plants. The query randomly selected two diseases and searched for all associated genes and chemicals.

Figure 2: Relationships between various diseases (red) and broccoli, including the pathways the diseases correspond with. The query specified broccoli as the plant of interest and limited the number of diseases to those shown. Only the pathways, genes, and chemicals shared between entities are displayed.