Literature Review of Text Mining and Pharmacogenomics
Text mining is the process of analyzing naturally occurring text for the purpose of
discovering and capturing semantic information for insertion and storage in a knowledge
organization structure that enables knowledge discovery via textual or visual access for
a range of applications [5]. It is a computational process of extracting useful
information from massive amounts of digital data and then mapping the meaningful
patterns implicitly present in that data [1]. This process is useful for discovering patterns
and trends within scientific literature and for confirming novel hypotheses
within specific domains. Text mining shifts the burden of retrieving information from the
researcher to the computer by four main methods: natural language processing,
named entity recognition, text classification, and information
retrieval (IR).
Natural language processing is a set of methods that let computers analyze and
interpret human language, automating much of the translation between what humans
write and what computers can process.
Named entity recognition is the process by which the computer scans the text of
scientific literature to identify all mentions of entities in a specific domain, such as
chemical compounds. For example, in a scientific article, a molecular compound in
broccoli called “sulforaphane” may be referred to as “it” later in the text. Named entity
recognition, together with the related task of coreference resolution, allows the computer
to understand that “it” has the same semantic meaning as “sulforaphane” in this
instance. Text classification is the ability of the computer to determine whether a document
discusses a certain topic or contains specific information that the user is requesting [10].
A familiar example is a search engine such as Google. When a user types
in a word or phrase, Google searches for documents or web pages
in the same domain as the query.
Information retrieval aims to identify text segments in documents that pertain
to specific topics. Ideally, information retrieval would recognize related sentences
with 100% accuracy. In practice it does not, because
false positives are retrieved alongside information that directly
pertains to the specified domain. Text mining also operates through semantics and
syntax. Syntax refers to the orderly manner in which words are arranged to form phrases and
sentences, while semantics refers to the meaning that words take on in the
sentence in which they are placed [4]. Text mining allows the user to extract information from
unstructured data sets and store the newly found information in a structured form such as
a database or graph [6]. About 85% of data on the internet is in the form of free text.
This is a substantial amount of knowledge waiting to be exploited. In
addition, about 300 scientific articles are published to PubMed every hour, which makes it
very hard for researchers to keep up to date with new literature that pertains to their
area of research. This is why text mining is essential: among other things, its clustering
capabilities allow a program to group texts according to similarity of content [9].
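
Grouping texts by similarity of content can be illustrated with a minimal sketch. It is not
the method of any cited work: it assumes a simple bag-of-words representation, cosine
similarity, and a greedy grouping rule. The stop-word list, the threshold of 0.3, the function
names, and the four sample sentences are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "with", "for"}


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count the non-stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 means no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(texts, threshold=0.3):
    """Greedy single-pass grouping: add each text to the first group whose first
    member is similar enough, otherwise start a new group."""
    vectors = [bag_of_words(t) for t in texts]
    groups = []  # each group is a list of indices into `texts`
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Made-up sentences standing in for article abstracts.
abstracts = [
    "Sulforaphane is a compound found in broccoli.",
    "Broccoli contains the compound sulforaphane.",
    "Genetic variation impacts drug response in patients.",
    "Drug response varies with genetic variation.",
]
groups = group_by_similarity(abstracts)
print(groups)
```

Here the two sentences about broccoli end up in one group and the two about drug response in
another. Production systems typically replace raw word counts with weighted or learned
representations, but the underlying idea of comparing texts by shared content is the same.
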
Text mining is now moving into new areas and growing rapidly. This is
most evident in the biological literature, because biological research is shifting from
individual genes and proteins to entire biological systems. Text mining is especially useful for
studying protein-protein interactions. Proteins are the molecules that facilitate
most of the biological processes in a cell. Most known proteins are
characterized by a unique function, yet many of them act in coordination with others,
forming protein networks in order to deliver complex actions [2]. This
type of interaction plays a key role in pharmaceutical development because a
drug is likely to interact in the body with numerous proteins (transporters,
carriers, enzymes) and to bind to multiple receptors.
These proteins determine the absorption, distribution, and excretion of drugs
in the body. As a result, multiple single-nucleotide polymorphisms (SNPs, pronounced
“snips”) within a gene could affect a drug's overall response in the body. In addition,
there are thousands of receptor genes
in the body, many of which are closely related to one another through evolution by
gene duplication. In summary, text mining is important in drug development
because researchers are now well aware that a drug will most likely bind to multiple
receptors in the body, not just one. Text mining allows researchers to find
connections in how certain drugs and proteins interact with one another. Text mining is
also able to identify biomarkers. Identifying biomarkers helps assess phenotypic states
of cells correlated to the genotypes of diseases from large biological data. This is the
future for predictive and personalized medicine. The era of “one drug fits all” is steadily
shifting to individualized therapy, matching the patient’s unique genetic makeup with an
optimally effective drug. Another field of study that is growing and making increasing
use of text mining is pharmacogenomics.
Pharmacogenomics is a field that studies how human genetic variation impacts
drug response [3]. This involves the use of biological markers (DNA, RNA, or proteins)
to predict the efficacy of a drug and the likelihood of an adverse
event in individual patients [7]. Text mining in this field is mainly applied to
pharmaceutical drugs and their properties. These properties include information such as
dosage, and the focus has recently extended to drug-drug interactions. This is a new
direction in the pharmaceutical field, driven in part by the fact that research and
development costs for new pharmaceuticals are at an all-time high. Creating new drugs
requires a great deal of time, money, and patience, because most simple drugs
involving a single gene interaction have already been created. Drugs that interact with
multiple genes are harder to create and require large amounts of time and money. This
is where text mining in pharmacogenomics comes into play. It
tries to identify drugs and other chemicals that are functionally
important in treating and causing medically significant phenotypes in the course of
treatment [8]. It finds interrelationships at the phenotype-drug level that can
be traced back to possible genetic traits.
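
One simple way to surface candidate phenotype-drug relationships is to count how often a
drug name and a phenotype term appear in the same sentence. The sketch below assumes that
approach; it is not the method described in [8]. The two small dictionaries, the function
names, and the sample sentences are invented for illustration and are not data.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical mini-dictionaries; real systems draw on curated vocabularies.
DRUGS = {"warfarin", "codeine"}
PHENOTYPES = {"bleeding", "sedation"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_terms(sentence, vocabulary):
    """Return the vocabulary terms that appear as whole words in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return words & vocabulary


def cooccurrence_counts(text):
    """Count the sentences in which each (drug, phenotype) pair appears together."""
    counts = Counter()
    for sentence in split_sentences(text):
        drugs = find_terms(sentence, DRUGS)
        phenotypes = find_terms(sentence, PHENOTYPES)
        for pair in product(sorted(drugs), sorted(phenotypes)):
            counts[pair] += 1
    return counts


# Made-up sentences, written only to exercise the code.
text = (
    "Warfarin was associated with bleeding in several patients. "
    "Codeine caused sedation in one patient. "
    "Bleeding risk increased when warfarin dosage was raised. "
    "Codeine was well tolerated otherwise."
)
counts = cooccurrence_counts(text)
for (drug, phenotype), n in counts.most_common():
    print(drug, phenotype, n)
```

Co-occurrence only suggests that two terms are discussed together; it does not say whether
the drug treats or causes the phenotype, so pairs found this way are candidates for closer
reading rather than conclusions.
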
In a way, text mining is the future of medicine: it will not only
pave the way for personalized medicine but also allow researchers to find new
uses for drugs that are already on the market. This will save pharmaceutical companies
time and money in research and development, because they
can repurpose drugs they have already created instead of creating new ones. Text mining
will allow researchers in the pharmaceutical field to find implicit relationships between
existing drugs and different diseases and disorders. Text mining can
be applied to many different domains, including biology, pharmacology, and
medicine. With the constant and rapid integration of these fields, we can predict that
individualized patient care and therapy will arrive sooner than anticipated.
WORKS CITED
1.) Li, H., & Liu, C. (2012). Biomarker Identification Using Text Mining.
Computational and Mathematical Methods in Medicine, 2012, e135780.
http://doi.org/10.1155/2012/135780
2.) Papanikolaou, N., Pavlopoulos, G. A., Theodosiou, T., & Iliopoulos, I. (2015).
Protein–protein interaction predictions using text mining methods. Methods, 74,
47–53. http://doi.org/10.1016/j.ymeth.2014.10.026
3.) Sadée, W. (1999). Pharmacogenomics. BMJ: British Medical Journal, 319(7220),
1286.
4.) Hirschman, L., Burns, G. A. P. C., Krallinger, M., Arighi, C., Cohen, K. B.,
Valencia, A., … Winter, A. G. (2012). Text mining for the biocuration workflow.
Database, 2012, bas020. http://doi.org/10.1093/database/bas020
5.) Nagarkar, S. P., & Kumbhar, R. (2015). Text mining: An analysis of research
published under the subject category “Information Science Library Science” in
Web of Science Database during 1999-2013. Library Review, 64(3), 248–262.
http://doi.org/10.1108/LR-08-2014-0091
6.) Nasreen, S., Azam, M. A., Shehzad, K., Naeem, U., & Ghazanfar, M. A. (2014).
Frequent Pattern Mining Algorithms for Finding Associated Frequent Patterns for
Data Streams: A Survey. Procedia Computer Science, 37, 109–116.
http://doi.org/10.1016/j.procs.2014.08.019
7.) Lambert, D. G. (2010). Pharmacogenomics. Anaesthesia & Intensive Care
Medicine, 11(9), 374–376. http://doi.org/10.1016/j.mpaic.2010.06.003
8.) Kumar, V. D., & Tipney, H. J. (Eds.). (2014). Text Mining for Drug–Drug
Interaction. Springer New York. Retrieved from
http://link.springer.com.prox.lib.ncsu.edu/protocol/10.1007%2F978-1-4939-0709-0_4
9.) Denkena, B., Schmidt, J., & Krüger, M. (2014). Data Mining Approach for
Knowledge-based Process Planning. Procedia Technology, 15, 406–415.
http://doi.org/10.1016/j.protcy.2014.09.095
10.) Ventura, J., & Silva, J. (2012). Mining Concepts from Texts. Procedia
Computer Science, 9, 27–36. http://doi.org/10.1016/j.procs.2012.04.004