Literature Review of Text Mining and Pharmacogenomics

Text mining is the process of analyzing naturally occurring text for the purpose of discovering and capturing semantic information for insertion and storage in a knowledge organization structure that enables knowledge discovery via textual or visual access for a range of applications [5]. It is a computational process of extracting useful information from massive amounts of digital data and then mapping the meaningful patterns implicitly present in that data [1]. This process is useful for discovering patterns and trends within scientific literature and for confirming novel hypotheses within specific domains. Text mining shifts the burden of retrieving information from the researcher to the computer through four main methods: natural language processing (NLP), named entity recognition (NER), text classification, and information retrieval (IR).

Natural language processing is a set of methods for translating between human language and representations a computer can work with, which automates the interpretation of text that would otherwise have to be read by a person. Named entity recognition is the process by which the computer scans the text of scientific literature to identify every mention of entities of a given type, such as genes, proteins, or chemical compounds. For example, in a scientific article, a molecular compound found in broccoli called “Sulforaphane” would be tagged as a chemical each time it is named. If the article later refers to the compound as “it,” a closely related step, coreference resolution, allows the computer to understand that “it” has the same semantic meaning as “Sulforaphane” in this instance. Text classification is the ability of the computer to determine whether a document discusses a certain topic or contains specific information that the user is requesting [10].
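As a rough illustration of text classification as defined above, the sketch below assigns a document to the topic whose keywords it mentions most often. This is a deliberately simple keyword-count approach, not the method of any cited paper; the topic names, the keyword lists, and the function names `tokenize` and `classify` are assumptions made up for this example.

```python
import re

# Hypothetical topic vocabularies, chosen only for illustration.
TOPIC_KEYWORDS = {
    "pharmacogenomics": {"drug", "gene", "genotype", "dose", "response"},
    "nutrition": {"broccoli", "diet", "vitamin", "intake"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text, topic_keywords=TOPIC_KEYWORDS):
    """Return the topic with the most keyword hits, or None if nothing matches."""
    tokens = tokenize(text)
    scores = {
        topic: sum(1 for token in tokens if token in keywords)
        for topic, keywords in topic_keywords.items()
    }
    best_topic = max(scores, key=scores.get)
    return best_topic if scores[best_topic] > 0 else None


if __name__ == "__main__":
    print(classify("Genotype affects drug response and dose."))
    print(classify("The weather was sunny."))
```

A practical classifier would usually learn its vocabulary from labeled documents rather than rely on hand-written keyword lists, but the decision it makes has the same shape: a document goes in, a topic label (or no label) comes out.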

A familiar example of text classification in use is a search engine such as Google. When a user types in a word or phrase, Google searches for documents or web pages in the same domain as the query. Information retrieval aims to identify text segments in documents that pertain to specific topics. Ideally, information retrieval would recognize related sentences with 100% accuracy. In practice it does not, because false positives are retrieved alongside the information that directly pertains to the specified domain. Text mining also relies on both syntax and semantics. Syntax refers to the orderly manner in which words are put together to form phrases and sentences, while semantics refers to the meaning implied by words according to the sentence in which they are placed [4]. Text mining allows the user to extract information from unstructured data sets and store the newly found information in a structured form such as a database or graph [6].

About 85% of data on the internet is in the form of free text, a substantial body of knowledge that has yet to be fully exploited. In addition, about 300 scientific articles are added to PubMed every hour, which makes it very hard for researchers to keep up to date with new literature that pertains to their area of research. This is why text mining is essential: its clustering capabilities allow a program to group texts according to similarity of content [9].
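To make the idea of grouping texts by similarity of content concrete, here is a minimal sketch that represents each text as a bag of words, compares texts with cosine similarity, and groups them with a greedy single-pass rule. The sample sentences, the 0.3 threshold, and the function names are assumptions chosen for illustration; real clustering tools use more robust weighting and algorithms.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(texts, threshold=0.3):
    """Greedy single-pass clustering.

    Each text joins the first cluster whose seed (first member) is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `texts`.
    """
    vectors = [bag_of_words(text) for text in texts]
    clusters = []
    for index, vector in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(vector, vectors[members[0]]) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


# Made-up sample sentences, used only to exercise the functions.
DOCS = [
    "genetic variation impacts drug response",
    "drug response depends on genetic variation in patients",
    "broccoli contains the compound sulforaphane",
    "sulforaphane is a compound found in broccoli",
]

if __name__ == "__main__":
    print(cluster(DOCS))
```

With these four sentences, the two about drug response end up in one group and the two about sulforaphane in another, because each pair shares most of its vocabulary while the pairs share none with each other.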

Text mining is now moving into new areas and growing at a rapid rate. This is readily apparent in the biological literature, because biological research is shifting from individual genes and proteins to entire biological systems. Text mining is especially useful for studying protein-protein interactions. Proteins are the molecules that facilitate most of the biological processes in a cell. Most known proteins are characterized by a unique function, yet many of them act in coordination with others, forming protein networks that deliver complex actions [2]. This type of interaction plays a key role in pharmaceutical development, because a drug is likely to interact with numerous proteins in the body (transporters, carriers, enzymes) and to bind to multiple receptors.

These proteins determine the absorption, distribution, metabolism, and excretion of drugs taken into the body. As a result, multiple single-nucleotide polymorphisms (SNPs) within a gene could affect a drug's overall response in the body. In addition, there are thousands of receptor genes in the body, many of which are closely related to one another through evolution by gene duplication. In summary, text mining is important in drug development because researchers are now well aware that a drug will most likely bind to multiple receptors in the body, not just one. Text mining allows researchers to find connections between drugs and the proteins they interact with. Text mining can also help identify biomarkers, which help assess the phenotypic states of cells, correlated with the genotypes of diseases, from large biological data sets. This is the future of predictive and personalized medicine. The era of “one drug fits all” is steadily shifting to individualized therapy, matching the patient’s unique genetic makeup with an optimally effective drug. Another growing field that is making increasing use of text mining is pharmacogenomics.

Pharmacogenomics is a field that studies how human genetic variation impacts drug response [3]. It involves the use of biological markers (DNA, RNA, or proteins) to predict the efficacy of a drug and the likelihood of an adverse event in individual patients [7]. Text mining in this field is mainly applied to pharmaceutical drugs and their properties, such as dosage, and has recently expanded to drug-drug interactions. This application is new to the pharmaceutical field, and it comes at a time when research and development costs for new pharmaceuticals are at an all-time high. Creating new drugs requires a great deal of time, money, and patience, because most simple drugs involving a single gene interaction have already been created, and drugs that interact with multiple genes are harder to develop. This is where text mining in pharmacogenomics comes into play. It tries to identify drugs and other chemicals that are functionally important in treating and causing medically significant phenotypes in the course of treatment [8]. It finds interrelationships that occur at the phenotype-drug level and can be traced back to possible genetic traits.
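One simple way to surface candidate phenotype-drug relationships of the kind described above is to count how often a drug name and a phenotype term appear in the same sentence. The sketch below does exactly that and nothing more; it is not the method of the cited work. The placeholder drug names (`drug_a`, `drug_b`), the phenotype terms, the sample text, and the function names are all hypothetical.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical term lists; a real system would draw on curated vocabularies.
DRUGS = {"drug_a", "drug_b"}
PHENOTYPES = {"rash", "nausea"}


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?'."""
    return [part.strip() for part in re.split(r"[.!?]+", text) if part.strip()]


def cooccurrences(text, drugs=DRUGS, phenotypes=PHENOTYPES):
    """Count (drug, phenotype) pairs that are mentioned in the same sentence."""
    counts = Counter()
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z0-9_]+", sentence.lower()))
        found_drugs = sorted(tokens & drugs)
        found_phenotypes = sorted(tokens & phenotypes)
        for pair in product(found_drugs, found_phenotypes):
            counts[pair] += 1
    return counts


# Made-up sample text, used only to exercise the functions.
SAMPLE = (
    "Patients on drug_a reported rash. "
    "Drug_a was also linked to rash and nausea. "
    "Drug_b showed no adverse events."
)

if __name__ == "__main__":
    for pair, count in cooccurrences(SAMPLE).most_common():
        print(pair, count)
```

Co-occurrence only suggests that two terms are discussed together; it does not say whether the drug treats or causes the phenotype, so pairs found this way are leads for a researcher to check rather than conclusions.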

In a way, text mining is the future of medicine: it will not only pave the way for personalized medicine, but also allow researchers to find different uses for drugs that are already on the market. This will save pharmaceutical companies time and money in research and development, since they can give existing drugs a new purpose instead of creating new ones. Text mining will allow researchers in the pharmaceutical field to find implicit relationships between existing drugs and different diseases and disorders. Text mining can be applied to many domains, including biology, pharmacology, and medicine. With the constant and rapid integration of these fields, we can predict that individualized patient care and therapy will arrive sooner than anticipated.

WORKS CITED

1.) Li, H., & Liu, C. (2012). Biomarker Identification Using Text Mining.

Computational and Mathematical Methods in Medicine, 2012, e135780.

http://doi.org/10.1155/2012/135780

2.) Papanikolaou, N., Pavlopoulos, G. A., Theodosiou, T., & Iliopoulos, I. (2015).

Protein–protein interaction predictions using text mining methods. Methods, 74,

47–53. http://doi.org/10.1016/j.ymeth.2014.10.026

3.) Sadée, W. (1999). Pharmacogenomics. BMJ: British Medical Journal, 319(7220), 1286.

4.) Hirschman, L., Burns, G. A. P. C., Krallinger, M., Arighi, C., Cohen, K. B.,

Valencia, A., … Winter, A. G. (2012). Text mining for the biocuration workflow.

Database, 2012, bas020. http://doi.org/10.1093/database/bas020

5.) Nagarkar, S. P., & Kumbhar, R. (2015). Text mining: An analysis of research

published under the subject category “Information Science Library Science” in

Web of Science Database during 1999-2013. Library Review, 64(3), 248–262.

http://doi.org/10.1108/LR-08-2014-0091

6.) Nasreen, S., Azam, M. A., Shehzad, K., Naeem, U., & Ghazanfar, M. A. (2014).

Frequent Pattern Mining Algorithms for Finding Associated Frequent Patterns for

Data Streams: A Survey. Procedia Computer Science, 37, 109–116.

http://doi.org/10.1016/j.procs.2014.08.019

7.) Lambert, D. G. (2010). Pharmacogenomics. Anaesthesia & Intensive Care

Medicine, 11(9), 374–376. http://doi.org/10.1016/j.mpaic.2010.06.003

8.) Kumar, V. D., & Tipney, H. J. (Eds.). (2014). Text Mining for Drug–Drug Interaction. Springer New York. Retrieved from http://link.springer.com.prox.lib.ncsu.edu/protocol/10.1007%2F978-1-4939-0709-0_4

9.) Denkena, B., Schmidt, J., & Krüger, M. (2014). Data Mining Approach for

Knowledge-based Process Planning. Procedia Technology, 15, 406–415.

http://doi.org/10.1016/j.protcy.2014.09.095

10.) Ventura, J., & Silva, J. (2012). Mining Concepts from Texts. Procedia

Computer Science, 9, 27–36. http://doi.org/10.1016/j.procs.2012.04.004
