
MiR-155 Target Prediction and Validation in Nasopharyngeal Carcinoma

ILQAR ABDULLAYEV

Master of Science Thesis Stockholm, Sweden 2010


Master’s Thesis in Biomedical Engineering (30 ECTS credits)
at the Computational and Systems Biology Master Programme
Royal Institute of Technology, year 2010
Supervisor at CSC was Erik Aurell
Examiner was Anders Lansner

TRITA-CSC-E 2010:164
ISRN-KTH/CSC/E--10/164--SE
ISSN-1653-5715

Royal Institute of Technology
School of Computer Science and Communication
KTH CSC
SE-100 44 Stockholm, Sweden
URL: www.kth.se/csc

MiR-155 target prediction and validation in nasopharyngeal carcinoma

Abstract

MicroRNAs (miRNAs) play an important role in controlling gene expression in eukaryotes. They target many mRNAs and either degrade them or inhibit their translation into protein. Finding the targets of miRNAs has therefore been a hot topic since miRNAs were first discovered. Many tools have been designed for target prediction. Different tools use different approaches and hence predict different targets, so finding the best-performing tool or combination of tools is important. MicroRNA-155 (miR-155) is a well-studied miRNA that is associated with (mostly upregulated in) numerous diseases, including nasopharyngeal carcinoma (NPC), one of the most common malignancies in certain areas of southern China and Africa. This project aims to find the best-scoring miRNA target prediction tool, apply it to miR-155, compare its predictions with the results of a microarray experiment, and in this way shed some light on NPC.

Target prediction and validation of miR-155 in nasopharyngeal carcinoma

Summary

MicroRNAs (miRNAs) play a major role in the regulation of gene expression in eukaryotes. A considerable proportion of cellular mRNAs is affected by such miRNAs, either through degradation or through inhibition of their translation into proteins. Searching for the targets of different miRNAs has therefore been a hot topic ever since miRNAs were first discovered. Many different tools have been designed and developed for this purpose, and finding the best tool is therefore important. MicroRNA-155 (miR-155) is a well-studied miRNA associated with several different diseases, for example nasopharyngeal carcinoma (NPC), one of the most common malignant cancers in certain parts of southern China and Africa. The aim of this project is to find the best tool for miRNA prediction, apply it to miR-155, and then correlate the result with existing results from microarray experiments, thereby increasing the understanding of NPC.

Acknowledgements

There are several people I would like to thank. First of all, I would like to thank my supervisor Erik Aurell for taking me into his group and introducing me to his collaborators, with whom I ended up doing my thesis. He also helped me learn how to do good research and improve my writing skills. I would also like to thank Aymeric for his valuable contributions, especially to the computational prediction part of my thesis.

Secondly, I am grateful to my supervisor at the Department of Microbiology, Tumor and Cell Biology at Karolinska Institute, Professor Ingemar Ernberg, for providing me with this thesis project and for sustaining a suitable scientific environment and experimental platform. I am deeply thankful to his doctoral student, Ziming Du, for helping me with the wet-lab experiments. I learned a lot from Ziming.

I would like to thank my friends Rustam, Rasim, Emre, Alejandro, James and Shaghayegh, who supported and motivated me throughout the entire process. I would also like to thank Ann Bengston for her coordination during the administrative processes. My special thanks go to my family, who have believed in and supported me throughout my entire life, whatever the conditions. I am and always will be deeply grateful to them. Finally, all my heart goes to my lovely wife, Aysegul.

Table of Contents

1. Introduction
1.1 General Information about miRNAs
1.1.1 Biogenesis
1.1.2 Plant miRNA target prediction works perfectly
1.1.3 Animal miRNAs
1.2. Target Prediction of miRNAs
1.2.1 Features/Parameters for miRNA target prediction
1.2.2 Target prediction software packages
1.3. Gene set analysis
1.3.1 Gene Ontology Enrichment Analysis Software Toolkit (GOEAST)
2 Methodology
2.1 Target prediction - Gathering and handling data
2.2 Database of experimentally validated genes
2.3 Comparison
2.4 Microarray experiment set-up
2.4.1 Cell lines and tissue samples
2.4.2 MiRNA transfections
2.5 Polymerase Chain Reaction (PCR) assays
2.5.1 Real-time polymerase chain reaction (qPCR)
2.5.2 PCR
2.6 Microarray Analysis
2.6.1 Defining parameters used:
2.7 How to use of Microarray data and target predictions
3 Results
Result I: The comparison between predicted targets and experimentally validated
Part 1 of Result I: The precision test by using manually constructed database
Part 2 of Result I: The precision test by using Mirwalk
Result II: Amalgamation of predicted targets of top 4 software packages
Result III: Quality control of microarray data
Result IV: Elucidating microarray data
Result V: GOEAST analysis
Result VI: Validation of microarray results by qPCR
4 Discussion
5 References
Appendices
Appendix 1
Appendix 2
Appendix 3

Abbreviations:

miRNA: MicroRNA
miR-155: MicroRNA-155
3' UTR: 3' untranslated region
RISC: RNA-induced silencing complex
mRNA: Messenger RNA
Ago: Argonaute protein
CDS: Coding sequence
qPCR: Quantitative (real-time) polymerase chain reaction
DAVID: Database for Annotation, Visualization and Integrated Discovery
RNA Pol II: RNA polymerase II
Pri-miRNA: Primary microRNA
Pre-miRNA: Precursor microRNA
FOXO3A: Forkhead box O3A
GO: Gene Ontology
KEGG: Kyoto Encyclopedia of Genes and Genomes
GOEAST: Gene Ontology Enrichment Analysis Software Toolkit
MAMI: Meta Mir:Target Inference
ENG: Ensembl gene ID
WC: Watson-Crick
kb: kilobase


1. Introduction

1.1 General Information about miRNAs

MicroRNAs (miRNAs) are short (19–24 nucleotides), endogenously expressed RNA molecules that regulate gene expression by binding, directly and preferentially, to the 3' untranslated regions (UTRs) of protein-coding genes [1]. It is estimated that miRNAs regulate up to 60% of all mammalian genes [22]. MiRNAs are well conserved across species, which indicates that they are evolutionarily important [22].

The first miRNA was discovered in 1993 during the study of the gene lin-4 in the nematode Caenorhabditis elegans [2]. It was found that lin-4 does not encode a protein; instead, it produces a small RNA that regulates the translation of the LIN-14 protein by pairing with the 3'UTR of the lin-14 mRNA. This endogenous RNA, also called lin-4, acted as a post-transcriptional regulator, and at that time this was thought to be a unique property of nematodes [2].

Plant miRNAs are usually highly complementary to sites in the coding regions of their target mRNAs, which promotes cleavage of the mRNA. In contrast, animal miRNAs pair only partially with their targets and mainly inhibit translation of the target mRNA; this mechanism also exists in plants but is less common. MiRNAs that are partially complementary to their target can also speed up deadenylation (shortening of the poly(A) tail of the mRNA), causing the mRNA to be degraded sooner.

A single miRNA is thought to have hundreds of targets. At the time of writing, the miRBase database reported 14,197 miRNAs in 133 species [26].

1.1.1 Biogenesis

Mature miRNAs are processed from longer transcripts called primary miRNAs (pri-miRNAs), which are usually transcribed by RNA polymerase II (RNA Pol II). Pri-miRNAs are further processed in the nucleus into ~70-nucleotide stem-loop structures referred to as precursor miRNAs (pre-miRNAs) (see Figure 1).

Pre-miRNAs are then exported to the cytoplasm, where the endonuclease Dicer cleaves them into a short double-stranded RNA duplex. One strand of the duplex is incorporated into the RNA-induced silencing complex (RISC) and guides the whole complex to a target messenger RNA (mRNA). In other words, the miRNA provides the specificity that selects individual gene targets through (partially) complementary base pairing between the miRNA and the mRNA transcript of its target gene (see Figure 1).


Figure 1: Regulation of gene expression by miRNAs. Adapted from [25]. Pri-miRNAs are first processed by the Drosha/Pasha complex into 60–70 nt pre-miRNAs in the nucleus. These pre-miRNAs are transported into the cytoplasm by Exportin 5. Dicer then cleaves the pre-miRNAs into duplexes, and only one strand of each duplex is incorporated into RISC. By binding to the target mRNA, the final complex functions both in mRNA cleavage and in translational repression.

Target selection then brings the mRNA transcripts within the acting range of the RISC effector proteins, the principal components of which are a miRNA-loaded Argonaute protein (Ago) and the scaffold protein GW182 [27]. Purification of RISC has shown that it contains at least one member of the Ago protein family. Furthermore, mutagenesis studies suggest that Ago2 in particular is responsible for the cleavage activity of RISC [25].

Figure 2: A speculative model showing the roles of each miRNA region and the way the miRNA binds to the Ago protein. (A) The microRNA (red) is bound to Argonaute (AGO). The first nucleotide is twisted away from the helix and permanently unavailable for pairing. Nucleotides 2–8 are bound to Ago in a way that preorganizes them for efficient pairing. Nucleotides 9–11 face away from an incoming mRNA and are unavailable for binding; the remainder of the miRNA is bound in a configuration that is not preorganized for efficient pairing. (B) An 8mer site has been recognized by the complex. (C) Conformational accommodation of extensively paired sites allows the miRNA and mRNA to wrap around each other. (D) This pairing is suitable for mRNA cleavage: Ago locks the paired duplex in place so that the active site (black arrow) cleaves the mRNA. (E) 3′-supplementary pairing, in which the message pairs with nucleotides 13–16; in this model, the miRNA and mRNA are not wrapped around each other. Adapted from [1].

MiRNAs are involved in diverse biological functions, such as development, proliferation, differentiation and apoptosis [9, 10]. Accumulating evidence suggests that deregulated microRNAs contribute to the pathogenesis of tumors. Approximately 50% of all miRNAs are physically located in cancer-associated regions of the genome [11], and several miRNAs function as tumor suppressors or as oncogenes [11].

Individual miRNAs have been studied much more extensively than the cooperativity of multiple miRNAs. MiRNAs may act synergistically, but this is still largely unexplored [12], which makes target prediction very complicated. Microarray studies do not reveal complete information about miRNA targets, because they capture only mRNA degradation and not the effect of translational inhibition. Proteomics studies, on the other hand, uncover more information because they yield data at the protein level. However, very few large proteomics studies exist because of their cost, so proteomics is also expected to produce less data overall [8, 13].

A final track for miRNA target prediction could be to examine pathways. Since particular miRNAs could act on particular pathways, this systems-level approach might yield a better understanding of the target prediction problem and help solve it.


1.1.2 Plant miRNA target prediction works perfectly

Plant miRNAs are involved in various aspects of plant growth and development, including root formation, leaf morphology and polarity, molecular signaling, diverse phase transitions, flowering time and floral organ identity. Plant miRNAs also help plants cope with stress through post-transcriptional regulation of their target genes. Plant miRNA genes are transcribed by RNA polymerase II [34].

Plant miRNA target prediction is highly successful at finding direct targets: simply checking for high complementarity between a miRNA and potential mRNA coding sequences (CDS) reveals the most probable targets [3].
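The complementarity check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of [3]: sequences are written 5'→3' in the RNA alphabet, only plain mismatches are counted (real plant tools may, for example, penalize G:U wobbles separately), and the function names, the toy sequence and the cutoff of three mismatches are hypothetical.

```python
# Hypothetical sketch: plant-style search for near-perfect miRNA complementarity.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_plant_like_sites(mirna: str, target: str, max_mismatches: int = 3) -> list[tuple[int, int]]:
    """Return (start, mismatches) for every window of `target` that pairs with
    `mirna` over its full length with at most `max_mismatches` mismatches."""
    # A perfectly complementary target site equals the miRNA's reverse complement.
    site = reverse_complement(mirna)
    hits = []
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    toy_mirna = "ACGUACGUACGUACGUACGUAC"  # toy sequence, not a real miRNA
    toy_cds = "GGG" + reverse_complement(toy_mirna) + "CCC"
    print(find_plant_like_sites(toy_mirna, toy_cds))
```

Scanning every window is quadratic in the worst case, which is acceptable for single transcripts; genome-wide scans would normally rely on an indexed aligner instead.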

Since in silico plant miRNA target prediction is already so successful, there is little need for novel prediction software or for combinations of different tools.

1.1.3 Animal miRNAs

Genetics is important to identify animal miRNA targets. In contrast to plant miRNAs, it has

been found that lin-4 and let-7 regulate their gene targets by loose complementarity to the

3'UTRs of those targets. It has been established that animal miRNAs do not generally show

extensive complementarity to any endogenous transcripts [4, 5].

There are numerous target prediction software packages that try to shed some light on the animal miRNA targeting problem. Different prediction tools take different approaches by introducing various parameters, resulting in different sets of predicted targets. The challenge is to identify which prediction tool (or combination of tools) works best. The goal of this study was to find the best-performing prediction tool(s) and, with the help of that finding, to validate some of the predicted targets of microRNA-155 (miR-155) in nasopharyngeal carcinoma.

1.1.3.1 MiR-155

MiR-155 is encoded by Bic, a non-coding gene on chromosome 21; the miR-155 locus lies at chromosome 21: 25868163–25868227 (see Figure 3). The primary miRNA transcript is transcribed from Bic and processed into pre-miR-155, which is 62 nucleotides long, whereas mature miR-155 is 22 nucleotides long. According to miRBase [26], miR-155 has been reported in 16 species, among them the well-studied Homo sapiens, Mus musculus, Gallus gallus, Danio rerio, Ciona savignyi and Ciona intestinalis. The miR-155 gene is present in a single copy, and miR-155 does not share significant sequence similarity with other reported miRNAs [26, 35].

MiR-155 is involved in various biological processes, including immunity, haematopoiesis and inflammation. MiR-155 is highly expressed in Hodgkin's lymphoma and in large B cell lymphomas, and this overexpression suggests that it acts as an oncogene. MiR-155-null mice have serious defects in both adaptive and innate immunity [35].


Figure 3: Genome Browser representation of the precursor miR-155 sequence (65 bp), located at chromosome 21: 25868163–25868227 (+ strand). Adapted from [16].

Accumulating evidence indicates that miR-155 is an oncogenic miRNA. Many profiling studies have shown that miR-155 is upregulated in various types of human malignancies [23, 24], including B cell lymphoma and breast, nasopharyngeal, colon, lung and kidney carcinomas. For instance, in breast cancer miR-155 promotes cell survival and has a role in chemoresistance [24]. Its anti-apoptotic function is mediated by direct inhibition of FOXO3a, a member of the forkhead family of transcription factors that has been associated with acute leukemia. Furthermore, elevated miR-155 levels have recently been observed in patients with late-stage disease and poor overall survival in several different types of malignancy. Knock-down of miR-155 has been associated with impaired immune activity, and miR-155 has also been linked to inflammation [24].


Figure 4: The secondary structure of precursor miR-155 predicted by MirnaMap. Adapted from [17].

1.2. Target Prediction of miRNAs

1.2.1 Features/Parameters for miRNA target prediction

Determining which parameters are crucial for target prediction has been quite challenging, mainly because of the limited pairing between miRNAs and their target mRNAs. To address this problem, many computational and experimental approaches have been combined. The widely proposed parameters (features) fall into six categories: 'seed site' pairing, site location, conservation, site accessibility, multiple sites and expression profile.

1.2.1.1 ‘Seed site’ is the most important feature for target recognition

MiRNA targets contain at least one region with Watson-Crick (WC) pairing (in which adenine (A) pairs with uracil (U) in RNA, or with thymine (T) in DNA, and cytosine (C) pairs with guanine (G)) to the 5′ end of the miRNA. Specifically, the corresponding region of the miRNA, located at positions 2–7 from its 5′ end, is known as the 'seed'. RISC uses the seed as a nucleation signal for recognizing target mRNAs.
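As an illustration of the seed definition above, the following Python sketch extracts nucleotides 2–7 of a miRNA and lists every perfect (6mer) seed match in a 3'UTR. Because the target site is antiparallel to the miRNA, the sequence searched for in the mRNA is the reverse complement of the seed. Sequences are assumed to be 5'→3' RNA strings and all names are hypothetical; real prediction tools add many further features on top of this step.

```python
# Hypothetical sketch: find 6mer seed matches (miRNA positions 2-7) in a 3'UTR.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed(mirna: str) -> str:
    """Return the seed: nucleotides 2-7 (1-based) from the miRNA 5' end."""
    return mirna[1:7]


def seed_match_positions(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of perfect seed matches in `utr`."""
    site = reverse_complement(seed(mirna))  # mRNA sequence that pairs with the seed
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]
```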

A stringent seed site has perfect Watson–Crick pairing and can be divided into four 'seed' types, 8mer, 7mer-m8, 7mer-A1 and 6mer, which differ in the nucleotide at position 1 of the target site and in pairing at position 8. An 8mer has both an adenine at position 1 of the target site and base pairing at position 8. A 7mer-A1 has an adenine at position 1 but no base pairing at position 8. Conversely, a 7mer-m8 has base pairing at position 8 but no adenine at position 1. Finally, a 6mer has neither an adenine at position 1 nor base pairing at position 8 [14]. The adenine at position 1 matters because it increases the efficiency of target recognition [8]. The efficacy hierarchy of the stringent seed types is 8mer > 7mer-m8 > 7mer-A1 > 6mer [14].
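The four stringent site types can be distinguished mechanically once a seed match is found. In the mRNA (read 5'→3'), the nucleotide paired with miRNA position 8 lies immediately 5' of the 6mer seed match, and the nucleotide opposite position 1 lies immediately 3' of it. The Python sketch below applies the definitions above on that basis; it is a hypothetical illustration with made-up names, not code from [14] or from any prediction package.

```python
# Hypothetical sketch: classify stringent seed sites as 8mer, 7mer-m8, 7mer-A1 or 6mer.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (start of 6mer seed match, site type) for every seed match in `utr`."""
    seed_site = reverse_complement(mirna[1:7])  # pairs with miRNA positions 2-7
    results = []
    for i in range(len(utr) - len(seed_site) + 1):
        if utr[i:i + len(seed_site)] != seed_site:
            continue
        # miRNA position 8 faces the nucleotide just 5' of the seed match.
        m8 = i > 0 and utr[i - 1] == COMPLEMENT[mirna[7]]
        # miRNA position 1 faces the nucleotide just 3' of the seed match;
        # the 'A1' criterion asks for an adenine there, whatever the miRNA base.
        j = i + len(seed_site)
        a1 = j < len(utr) and utr[j] == "A"
        if m8 and a1:
            site_type = "8mer"
        elif m8:
            site_type = "7mer-m8"
        elif a1:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"
        results.append((i, site_type))
    return results
```

Sorting the hits by the hierarchy 8mer > 7mer-m8 > 7mer-A1 > 6mer would then give a crude ranking of candidate sites.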

In addition, moderately stringent seed matches are also functional, because RISC can tolerate small mismatches or G:U wobble pairs within the seed region. Moderately stringent seed matching comprises five 'seed' types, GUM, GUT, BM, BT and LP, defined according to the mismatch type [14].
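To make the idea of a moderately stringent seed concrete, the sketch below pairs miRNA positions 2–7 with a 6-nt target segment and counts Watson–Crick pairs, G:U wobbles and mismatches. It deliberately ignores bulges and loops, so it cannot reproduce the full set of GUM, GUT, BM, BT and LP types from [14]; the function names and the tolerance of one wobble are assumptions for illustration.

```python
# Hypothetical sketch: seed pairing that tolerates G:U wobbles (bulges not modeled).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def seed_pairing(mirna: str, target_segment: str) -> tuple[int, int, int]:
    """Pair miRNA positions 2-7 with a 6-nt target segment (both 5'->3').

    Returns counts of (Watson-Crick pairs, G:U wobbles, mismatches).
    """
    seed = mirna[1:7]
    # The duplex is antiparallel, so reverse the target to line each base up
    # with the seed nucleotide it would pair with.
    opposite = target_segment[::-1]
    wc = wobble = mismatch = 0
    for m, t in zip(seed, opposite):
        if (m, t) in WATSON_CRICK:
            wc += 1
        elif (m, t) in WOBBLE:
            wobble += 1
        else:
            mismatch += 1
    return wc, wobble, mismatch


def seed_class(mirna: str, target_segment: str, max_wobbles: int = 1) -> str:
    """Label a segment 'stringent', 'wobble-tolerant' or 'no seed match'."""
    wc, wobble, mismatch = seed_pairing(mirna, target_segment)
    if wc == 6:
        return "stringent"
    if mismatch == 0 and wobble <= max_wobbles:
        return "wobble-tolerant"
    return "no seed match"
```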

The preferred number of matches in the 3′ part of the miRNA differs between sites with stringent seed pairing and sites with moderately stringent seed pairing: stringent seeds require 3–4 matches at positions 13–16, whereas moderately stringent seeds require 4–5 matches at positions 13–19. Sites with this additional 3′ pairing are called 3′-supplementary sites.
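The 3′-supplementary criterion can be sketched as a count of Watson–Crick matches at miRNA positions 13–16 (or 13–19). For simplicity, the Python sketch assumes contiguous pairing with no bulge or loop between the seed and the 3′ region, although real 3′-supplementary sites often have one; the parameter `pos1_index` (the index of the mRNA nucleotide opposite miRNA position 1), the function names and the use of the lower bounds (3 and 4 matches) as thresholds are hypothetical choices.

```python
# Hypothetical sketch: count 3'-supplementary matches under a no-gap assumption.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def supplementary_matches(mirna: str, utr: str, pos1_index: int,
                          first: int = 13, last: int = 16) -> int:
    """Count WC matches for miRNA positions first..last (1-based).

    `pos1_index` is the 0-based index in `utr` (5'->3') of the nucleotide
    opposite miRNA position 1. With contiguous antiparallel pairing,
    miRNA position k faces utr[pos1_index - (k - 1)].
    """
    count = 0
    for k in range(first, min(last, len(mirna)) + 1):
        j = pos1_index - (k - 1)
        if 0 <= j < len(utr) and utr[j] == COMPLEMENT[mirna[k - 1]]:
            count += 1
    return count


def has_3prime_supplementary(mirna: str, utr: str, pos1_index: int,
                             stringent_seed: bool) -> bool:
    """Apply the thresholds from the text, using their lower bounds."""
    if stringent_seed:
        return supplementary_matches(mirna, utr, pos1_index, 13, 16) >= 3
    return supplementary_matches(mirna, utr, pos1_index, 13, 19) >= 4
```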

The advantage of using a broader set of seed types is increased sensitivity. On the other hand, higher specificity is obtained when only stringent seed types are considered, but some targets may then be missed (for example, sites with tolerated mismatches or wobbles).

Figure 5: Types of miRNA target sites and multiple sites. (a) Stringent seed site, 7mer-A1. Vertical lines indicate Watson–Crick pairing. (b) Moderately stringent seed site, showing BM as an example. (c) 3′-supplementary site, in which pairing of three to four or more additional nucleotides is required. (d) Optimal distance between two miRNA target sites. Adapted from [15].

1.2.1.2 Site location

Most miRNA target sites are located in the 3'UTRs of target genes; for reasons that are only partly understood, RISC prefers to act on the 3'UTR. Target sites are not uniformly distributed within 3'UTRs; instead, they tend to cluster near the two ends when the 3'UTR is longer than 2 kb. Some genes, e.g. housekeeping genes, have comparatively short 3'UTRs, which is believed to help them avoid interference from miRNAs. In short 3'UTRs, binding sites (if any) are usually located 15–20 nucleotides away from the stop codon [15].

Alternative splicing and alternative polyadenylation make miRNA targets difficult to predict, because they produce unexpected or hard-to-calculate target features. Consequently, software packages predict many false-positive targets. More specifically, alternative polyadenylation can shorten the 3'UTR, while alternative splicing produces different potential targets [15].

Although most known miRNA target sites are located in the 3'UTR, some have also been reported in the 5'UTR and the CDS [19]. Acting on the CDS or the 5'UTR is presumably more difficult for RISC than acting on the 3'UTR, since there it may have to compete with ribosomes, translation initiation factors and many other regulatory proteins. This is believed to be one of the reasons why RISC prefers the 3'UTR [15].

1.2.1.3 Conservation: Targets and miRNAs are conserved among related species

MiRNAs that share the same seed belong to the same miRNA family and are well conserved among related species. In addition, the targets of miRNA families are conserved among related species [9]. Applying conservation filters decreases the false-positive rate and is especially effective for conserved miRNAs. On the other hand, it has been reported that 30% of all experimentally validated miRNA target genes may not be well conserved.

1.2.1.4 Accessibility

The secondary structure of the mRNA significantly affects target accessibility. Target sites have to be accessible, meaning that they have to be open and must not interact with other sites within the mRNA. After the initial interaction, RISC may disrupt the mRNA secondary structure at the binding site to extend the hybridization [15].


Figure 6: Accessibility of mRNA. To bind the miRNA, the target site has to be accessible, meaning that it has to be open and must not interact with other sites within the mRNA. Opening costs a certain amount of energy, $\Delta G_{\mathrm{open}}$. The total free energy change is $\Delta\Delta G = \Delta G_{\mathrm{duplex}} - \Delta G_{\mathrm{open}}$, which serves as a score for the accessibility of the target site and for the probability of a miRNA–target interaction. Adapted from [15].

A higher AU content around the site is favorable: because an A:U pair forms fewer hydrogen bonds than a G:C pair, AU-rich regions form weaker secondary structures, so the mRNA is easier to access and bind. In particular, the AU content surrounding the binding site can be used as a significant parameter for calculating accessibility; efficient target sites preferentially have an AU-rich context within ~30 nucleotides upstream and downstream of the seed site [14].
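The AU-context feature can be approximated by the fraction of A and U nucleotides within ~30 nt on either side of the site. This Python sketch is a simplified, unweighted version; published scoring schemes may weight flanking nucleotides by their distance from the site, and the function name and default flank length are assumptions.

```python
# Hypothetical sketch: AU content in the ~30 nt flanking a candidate site.


def flanking_au_fraction(utr: str, site_start: int, site_end: int, flank: int = 30) -> float:
    """Fraction of A/U in up to `flank` nt on each side of utr[site_start:site_end].

    The site itself is excluded; flanks are truncated at the sequence ends.
    """
    upstream = utr[max(0, site_start - flank):site_start]
    downstream = utr[site_end:site_end + flank]
    context = upstream + downstream
    if not context:
        return 0.0
    return sum(base in "AU" for base in context) / len(context)
```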


1.2.1.5 Multiple sites in single target

Multiple binding sites may exist in the same 3'UTR. This can result in cooperativity, which may enhance overall miRNA function, since miRNAs can act on their targets synergistically. Two target sites within an optimal distance of each other have been shown to enhance target site efficacy [14]; the optimal distance is often between 17 and 35 nucleotides [13, 14].
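A sketch of the spacing rule follows, assuming that the distance between two sites is measured between their start positions (the text does not fix the reference point) and using the 17–35 nt window quoted above; the function name is hypothetical.

```python
# Hypothetical sketch: find pairs of sites within the cooperative spacing window.


def cooperative_pairs(site_starts: list[int], min_gap: int = 17,
                      max_gap: int = 35) -> list[tuple[int, int]]:
    """Return pairs of site start positions that are min_gap..max_gap nt apart."""
    starts = sorted(site_starts)
    pairs = []
    for a in range(len(starts)):
        for b in range(a + 1, len(starts)):
            gap = starts[b] - starts[a]
            if gap > max_gap:
                break  # positions are sorted, so later sites are even farther away
            if gap >= min_gap:
                pairs.append((starts[a], starts[b]))
    return pairs
```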

1.2.1.6 Expression profile: miRNA:mRNA pairs are negatively correlated in expression profiles

A single miRNA is capable of regulating many genes; thus the expression profiles of mRNAs may vary considerably depending on miRNA expression levels. In addition, many miRNAs are expressed differently in different tissues. Consequently, if the expression values of a miRNA:mRNA pair are negatively correlated across different tissue profiles, the mRNA is probably targeted by the miRNA [15]. This approach effectively reduces false positives. The majority of miRNA targets appear to be regulated at both the mRNA and the protein level, but some targets show an effect only at the protein level [32].
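The negative-correlation filter can be illustrated with a plain Pearson correlation across tissues. This Python sketch is hypothetical: the threshold of -0.5, the gene names and the profiles are placeholders, and with only a handful of tissues such correlations are noisy.

```python
# Hypothetical sketch: flag mRNAs whose expression anti-correlates with a miRNA.
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def anticorrelated_mrnas(mirna_profile: list[float],
                         mrna_profiles: dict[str, list[float]],
                         threshold: float = -0.5) -> list[str]:
    """Names of mRNAs whose correlation with the miRNA profile is <= threshold."""
    return [name for name, profile in mrna_profiles.items()
            if pearson(mirna_profile, profile) <= threshold]
```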

1.2.2 Target prediction software packages

1.2.2.1 Mirtarget2

Mirtarget2 is a machine-learning tool that was developed by analyzing thousands of genes downregulated by miRNAs. Its database of miRNA target predictions covers five species: human, mouse, rat, dog and chicken. Mirtarget2 incorporates four parameters: moderately stringent seeds, site position, site accessibility and a conservation filter [6, 7].

1.2.2.2 TargetScan

TargetScan offers several approaches for predicting microRNA target sites in several species. The first version of TargetScan was designed to search for seed pairing and ranked sites by the thermodynamic stability of the binding site. Furthermore, the predicted targets for multiple species were combined to obtain predictions of conserved target sites [18].

The context score for a specific site is the sum of the contributions of the following four features:

i. Site (seed) contribution

ii. 3' pairing contribution

iii. Local AU content

iv. Positional contribution

Imperfect seed matching with additional 3' compensatory pairing was later incorporated into the TargetScan algorithm, and the efficacy of each site is calculated from the 3'UTR context of the target mRNA site. The TargetScan web server provides miRNA target predictions for human, dog, chimpanzee, rat, mouse, chicken, rhesus, cow, frog, opossum, worm and fly. TargetScan quantifies conservation carefully through the probability of conserved targeting (PCT) of each site. Combining the PCT values of multiple sites gives the aggregate PCT [22]:

$\text{Aggregate } P_{\mathrm{CT}} = 1 - (1 - P_{\mathrm{CT},1})(1 - P_{\mathrm{CT},2})(1 - P_{\mathrm{CT},3})\cdots$
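The aggregate PCT formula translates directly into code. The Python sketch below is an illustration only: it takes per-site PCT values as given (computing PCT itself requires TargetScan's conservation model, which is not reproduced here), and the function name is hypothetical. Mathematically, the formula is the probability that at least one site is conserved-targeted when the sites are treated as independent.

```python
# Sketch of the aggregate PCT formula: 1 - prod_i (1 - PCT_i).
from math import prod


def aggregate_pct(site_pcts: list[float]) -> float:
    """Combine per-site PCT values, treating the sites as independent."""
    if any(not 0.0 <= p <= 1.0 for p in site_pcts):
        raise ValueError("each PCT must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in site_pcts)
```

For example, two sites with PCT = 0.5 each give an aggregate PCT of 1 - 0.5 × 0.5 = 0.75; a transcript with no sites gets 0.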

Figure 7: Snapshot from the TargetScan web server while searching for putative miR-155 targets. TargetScan gives a clear overview of the predicted targets: both the gene symbol and the gene name are reported, and the website also shows the number of sites of each seed type, the conservation class (conserved or poorly conserved), the total context score and the aggregate PCT.

1.2.2.3 DIANA-MicroT v3.0

The DIANA-microT algorithm searches for stringent seed pairing with target mRNAs, defined as at least 7 consecutive WC pairs. In addition, 6mer sites and seeds with a G:U wobble are also accepted if the 3' end of the miRNA has compensatory pairing with the target [21].


The performance of various target prediction programs was assessed using the targets identified by pSILAC, an experimental method described in [13]. DIANA-microT v3.0 achieved the highest score, with 66% of all its predicted targets being correct [21].

The DIANA microT web server is very user-friendly; prediction results are organized in expandable tabs (see Fig 8). Predictions for human and mouse are available at http://diana.cslab.ece.ntua.gr/microT/. DIANA makes it possible to search both for the targets of a specific miRNA and for the miRNA(s) targeting a specific mRNA (target gene). Furthermore, DIANA microT v3.0 provides a signal-to-noise ratio (SNR), a miTG score and a precision score. Results are ranked by miTG score, and the user defines a threshold miTG score. Official gene symbols and Ensembl gene IDs are used as identifiers. Finally, results can be downloaded as a spreadsheet for independent further analysis.

Figure 8: Snapshot from the DIANA microT web server while predicting miR-155 targets. The expandable tab shows almost all necessary information about a predicted target (in this case BACH1). One of the most important items is the seed type (shown on the far left); here, the 3'UTR of BACH1 contains 4 miR-155 target sites. The number of species in which each binding site is conserved is also given. On the far right, one can see whether the prediction is confirmed by other well-known software packages.


1.2.2.4 PicTar

PicTar (probabilistic identification of combinations of target sites) is an algorithm for predicting miRNA targets. It takes a different approach, ranking targets by also considering whether the mRNA is a target of combinations of other miRNAs.

The PicTar algorithm requires a perfect 7mer of WC pairing at either nucleotides 1–7 or 2–8. Imperfect seed pairing is also allowed, but it does not increase the overall score. PicTar uses RNAhybrid to calculate the free energy of the miRNA:mRNA hybrid and filters potential targets with a free-energy cutoff. Additionally, PicTar uses a conservation filter to reduce the number of false positives. Finally, all inputs are combined in the PicTar sequence scoring algorithm, which uses a hidden Markov model (HMM) to compute a maximum-likelihood score (MLS). The MLS expresses the likelihood of a gene being a target of a specific miRNA. It is calculated separately for each species and then combined into the final PicTar score, which is used to rank the potential targets. Typical MLS values for top predicted targets range from 5 to 10.

At http://pictar.mdc-berlin.de/ precompiled predictions for vertebrates, flies, mice and

nematodes are available.

1.2.2.5 MAMI

MAMI (Meta Mir:Target Inference) is a software/database that combines precompiled target lists from other software packages to increase the reliability of predictions. MAMI also allows users to choose their preferred sensitivity and specificity values.

Sensitivity = True positives / (True positives + False negatives)

Specificity = True negatives / (True negatives + False positives)
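The two definitions above, together with precision (the fraction of all predicted targets that are correct, which is the measure quoted for DIANA-microT v3.0 in Section 1.2.2.3), can be computed from confusion-matrix counts as in this short Python sketch. The function names and the counts in the usage note are illustrative; each function raises ZeroDivisionError when its denominator is zero.

```python
# Sketch: sensitivity, specificity and precision from confusion-matrix counts.


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true targets that are predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-targets correctly left out: TN / (TN + FP)."""
    return tn / (tn + fp)


def precision(tp: int, fp: int) -> float:
    """Fraction of predictions that are true targets: TP / (TP + FP)."""
    return tp / (tp + fp)
```

For instance, 66 correct predictions out of 100 give precision(66, 34) = 0.66.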

Sensitivity and specificity can easily be tuned to the user's needs, at five different levels, to best suit the experimental goals.

The internal cutoff values, which were used to generate each performance in the validated

set, were applied to all human miR-target predictions. Aim was to calculate the percentile

of predictions that satisfy these cutoffs.
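The following Python sketch is a generic illustration of the two formulas above. It is not MAMI's actual procedure, which is not described here. The sketch computes sensitivity and specificity at one score cutoff on a labelled validation set. Sweeping the cutoff is one way to trade sensitivity against specificity. The scores and labels are hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """(sensitivity, specificity) when every prediction with score >= cutoff is called a target.

    labels: True for validated targets, False for validated non-targets.
    """
    tp = fn = tn = fp = 0
    for score, is_target in zip(scores, labels):
        predicted = score >= cutoff
        if is_target and predicted:
            tp += 1
        elif is_target:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical scores for two validated targets and two validated non-targets.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [True, False, True, False]
for cutoff in (0.85, 0.5, 0.35):
    print(cutoff, sensitivity_specificity(scores, labels, cutoff))
```

In this example, lowering the cutoff raises sensitivity and lowers specificity.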

1.2.2.6 Other prediction tools

Other prediction tools include PITA, EIMMO, Miranda, RNAhybrid, TargetRank and RNA22.


Table 2: List of miRNA prediction tools and their features. Adapted from [15].

a Seed pairing. ●: stringent seeds; ○: moderately stringent seeds; blank: seed sites not considered.
b Site location. ●: target positions considered; blank: target positions not considered.
c Conservation. ●: with/without conservation filter; ○: with conservation filter; blank: conservation not considered.
d Site accessibility. ●: site accessibility with minimum free energy considered; ○: A:U-rich flanking considered; blank: site accessibility not considered.
e Multiple sites in a single mRNA. ●: multiple sites considered; ○: number of putative sites considered; blank: cooperativity of multiple sites not considered.
f Expression profile. ●: expression profiles used; blank: expression profiles not used.

1.3 Gene set analysis

Several methods have been developed for gene set analysis of microarray data. These methods calculate the differential expression patterns of groups of functionally related genes rather than of individual genes. The basic goal is to discover gene sets whose expression patterns are associated with phenotypes of interest. Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) are good examples of resources that collect genes into functional groups.

1.3.1 Gene Ontology Enrichment Analysis Software Toolkit (GOEAST)

GOEAST is a web-based software toolkit that provides an easy way to analyze high-throughput experimental results, e.g. microarray data. Its user-friendly interface makes it easy to visualize extensive data and perform GO analysis. The main function of GOEAST is to identify significantly enriched GO terms among given lists of genes, using the chosen statistical methods [31].


2 Methodology

2.1 Target prediction - Gathering and handling data

First, all miR-155-related predictions were obtained from each tool's website. The websites of the target prediction tools are listed below:

Table 1: List of target prediction software packages/databases and their corresponding websites:

PicTar http://pictar.mdc-berlin.de/

TargetScan 5.1 www.targetscan.org

DIANA-MicroT 3.0 http://diana.cslab.ece.ntua.gr/microT/

MAMI http://mami.med.harvard.edu/

EIMMO 3 www.mirz.unibas.ch/ElMMo3/

MirTarget2 http://mirdb.org/miRDB/

PITA http://genie.weizmann.ac.il/pubs/mir07/

TargetRank http://genes.mit.edu/targetrank/

RNA22 http://cbcsrv.watson.ibm.com/rna22.html

Prediction tools do not use a common gene identifier. DIANA-MicroT 3.0 reports gene symbols and Ensembl gene IDs (ENG), TargetScan 5.1 and MirTarget2 report gene symbols and gene names, PicTar reports gene names and RefSeq IDs, and MAMI reports only gene symbols. The results were therefore mapped to a single identifier, the Ensembl gene ID, because most genes have a unique Ensembl gene ID.
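The mapping step can be sketched in Python as a lookup from each tool's native identifier to an Ensembl gene ID. The lookup tables below are hypothetical placeholders. In practice they would come from an annotation source such as Ensembl BioMart. The function reports the identifiers it cannot map instead of guessing them.

```python
# Hypothetical lookup tables from native identifiers to Ensembl gene IDs.
# The "ENSG_..." values and "NM_EXAMPLE1" are placeholders, not real database IDs.
SYMBOL_TO_ENSG = {"KRAS": "ENSG_KRAS", "ETS1": "ENSG_ETS1", "BACH1": "ENSG_BACH1"}
REFSEQ_TO_ENSG = {"NM_EXAMPLE1": "ENSG_KRAS"}


def to_ensembl(identifiers, lookup):
    """Map identifiers to Ensembl gene IDs; return (set of mapped IDs, list of unmapped inputs)."""
    mapped, unmapped = set(), []
    for ident in identifiers:
        ensg = lookup.get(ident.strip())
        if ensg is None:
            unmapped.append(ident)
        else:
            mapped.add(ensg)
    return mapped, unmapped


# Gene symbols (as reported by MAMI) and RefSeq IDs (as reported by PicTar) end up in one ID space.
mami, _ = to_ensembl(["KRAS", "ETS1", "UNKNOWN1"], SYMBOL_TO_ENSG)
pictar, _ = to_ensembl(["NM_EXAMPLE1"], REFSEQ_TO_ENSG)
print(sorted(mami & pictar))  # ['ENSG_KRAS']
```

Once all tools share one identifier space, their predictions can be intersected with plain set operations.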

2.2 Database of experimentally validated genes

The sets of experimentally validated genes were constructed using Tarbase [28] and Mirwalk [29]. These databases report downregulation at both the mRNA and the protein level. This study therefore considered, as a separate set, only downregulations at the mRNA level (validated by luciferase reporter assay). These were identified by manually checking Tarbase [28] and the corresponding publications. In this way, 37 experimentally validated genes downregulated at the mRNA level were obtained (see Table 3). These targets can only be used to study mRNA degradation, because translation inhibition is not detectable in a luciferase reporter assay.

The second database, Mirwalk [29], includes all the targets in Tarbase. It was also used as a validation source, with the caveat that Mirwalk's validated targets are derived from online publications reporting any kind of miRNA-target interaction. From Mirwalk [29], 528 “DIRECT” and “INDIRECT” miR-155 targets were collected. These come from studies that did and did not include a luciferase reporter assay, respectively.

Gene_symbol Gene_name

AGTR1 Angiotensin II receptor, type 1

AGTRAP Angiotensin II receptor-associated protein

AID Activation-induced cytidine deaminase

ARID2 AT rich interactive domain 2 (ARID, RFX-like)

ARNTL Aryl hydrocarbon receptor nuclear translocator-like

AT1R angiotensin II receptor 1B

BACH1 BTB and CNC homology 1, basic leucine zipper transcription factor 1

BCL2L13 BCL2-like 13 (apoptosis facilitator)

BIRC4BP XIAP associated factor 1

CEBPB CCAAT/enhancer binding protein (C/EBP), beta

CSF1R Colony stimulating factor 1 receptor

CUTL1 Cut-like homeobox 1

Ets-1 v-ets erythroblastosis virus E26 oncogene homolog 1

FGF7 Fibroblast growth factor 7 (keratinocyte growth factor)

FOS FBJ murine osteosarcoma viral oncogene homolog

HIF1A Hypoxia inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor)

HIVEP2 Human immunodeficiency virus type I enhancer binding protein 2

IKBKE Inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase epsilon

JARID2 Jumonji, AT rich interactive domain 2

MAF V-maf musculoaponeurotic fibrosarcoma oncogene homolog (avian)

MAP3K10 Mitogen-activated protein kinase kinase kinase 10

MEIS1 Meis homeobox 1

PDCD6 Programmed cell death 6

PICALM Phosphatidylinositol binding clathrin assembly protein

PU.1 Spleen focus forming virus (SFFV) proviral integration oncogene spi1

RFK Riboflavin kinase

RHOA Ras homolog gene family, member A

RPS6KA3 Ribosomal protein S6 kinase, 90kDa, polypeptide 3

SAMHD1 SAM domain and HD domain 1

SHIP1 inositol polyphosphate-5-phosphatase

SLA Src-like-adaptor

SMAD5 SMAD family member 5

TAB2 TGF-beta activated kinase 1/MAP3K7 binding protein 2


TP53INP1 Tumor protein p53 inducible nuclear protein 1

ZIC3 Zic family member 3 (odd-paired homolog, Drosophila)

ZNF537 Zinc finger protein 537

ZNF652 Zinc finger protein 652

Table 3: The 37 experimentally validated genes.

2.3 Comparison

The 37 validated genes were compared with the predicted targets of each software package/database. For each software package, the results list the total number of predicted targets and the number of validated targets among them. Precision, the percentage of predicted targets that are validated, was calculated for each software package/database. This parameter reflects the combined effect of sensitivity and specificity. Because the pre-compiled results were obtained directly from the websites of the different tools, specificity could not be calculated (the number of true negatives is unknown) unless the tool reports it (e.g., MAMI). Sensitivity, on the other hand, could be calculated with respect to the validated targets, since the numbers of true positives (TP) and false negatives (FN) are known.

Precision = True positives / Total predicted targets
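The precision and sensitivity calculations can be sketched in Python as follows. Precision is expressed as a percentage, as in Tables 5 and 6. The gene sets in the first usage line are hypothetical. The last line reproduces the DIANA-microT 3.0 row of Table 5 from its counts: 12 validated targets among 166 predictions, out of 37 validated genes.

```python
def precision_pct(true_positives: int, n_predicted: int) -> float:
    """Percentage of predicted targets that are validated."""
    return 100.0 * true_positives / n_predicted


def sensitivity(true_positives: int, n_validated: int) -> float:
    """TP / (TP + FN); every validated gene is either predicted (TP) or missed (FN)."""
    return true_positives / n_validated


def benchmark(predicted: set, validated: set) -> tuple:
    """Return (TP, precision in %, sensitivity) for one tool's predicted gene set."""
    tp = len(predicted & validated)
    return tp, precision_pct(tp, len(predicted)), sensitivity(tp, len(validated))


# Hypothetical toy gene sets.
print(benchmark({"A", "B", "C", "D"}, {"A", "B", "X"}))  # (2, 50.0, 0.6666666666666666)

# DIANA-microT 3.0 counts from Table 5.
print(round(precision_pct(12, 166), 2), round(sensitivity(12, 37), 2))  # 7.23 0.32
```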

2.4 Microarray experiment set-up

The microarray experiment was designed at the Microbiology, Tumor and Cell Biology (MTC) department of Karolinska Institute with the help of doctoral student Ziming Du, under the supervision of Prof. Ingemar Ernberg. The whole experimental procedure, from harvesting cells to extracting RNA, took place in March 2010. The microarray experiment was performed on the Affymetrix platform at the core facility for Bioinformatics and Expression Analysis (BEA), located at the Department of Biosciences and Nutrition at Novum, Huddinge.

2.4.1 Cell lines and tissue samples

Cells of the human NPC cell line TW03 were cultured in IMEM (Gibco, USA) containing 10% fetal calf serum (FCS). The immortalized nasopharyngeal epithelial cell line NP69 was cultured in keratinocyte serum-free medium (Invitrogen) supplemented with 5% FCS, 25 μg/ml bovine pituitary extract and 0.2 ng/ml recombinant epidermal growth factor, as suggested by the manufacturer. All cell lines were grown in a humidified incubator at 37 °C with 5% CO2.

2.4.2 MiRNA transfections

Before transfection, 2 × 10⁵ cells per well were plated in 6-well plates and grown for one day in antibiotic-free medium containing 10% FCS. When the cells reached 40% to 60% confluence, they were transfected with one of four molecules, using Lipofectamine 2000 (Invitrogen, USA) according to the manufacturer's instructions:

- miR-155 Pre-miR™ miRNA Precursor Molecules (miR-155 mimic; Cat#: PM12601, Ambion, USA)
- Pre-miR™ miRNA Precursor Molecules-Negative Control #1 (Cat#: AM17110, Ambion, USA)
- miR-155 Anti-miR™ miRNA Inhibitor (Cat#: AM12601, Ambion, USA)
- Anti-miR™ miRNA Inhibitors-Negative Control #1 (Cat#: AM17010, Ambion, USA)

Transfected cells (miR-155 mimic 100nM, miR-155 mimic 50nM, miR-155 control 50nM) were grown at 37 °C for 6 h and then incubated in complete medium. For the miR-155 assay and Western blot analysis, cells were harvested for RNA and protein after 48 h.

2.5 Polymerase Chain Reaction (PCR) assays

The PCR assays were performed at the Microbiology, Tumor and Cell Biology (MTC) department of Karolinska Institute with the help of doctoral student Ziming Du, under the supervision of Prof. Ingemar Ernberg. The whole experimental procedure took place in June 2010.

Gene_symbol mimic_100nM mimic_50nM Control_50nM NP69 TW03 LOG2_100 LOG2_50 LOG2_TW03 Prediction
C9orf5 884.29 828.1 1656.07 791.85 975.8 -0.91 -1 0.3 TargetScan
PERP 1531.4 594.97 1328.54 2120.2 1098.3 0.21 -1.16 -0.95 DIANA-MicroT
TP53INP1 48.38 50.02 164.9 7.19 111.41 -1.77 -1.72 3.95 TargetScan
TERF1 422.37 350.07 553.02 312.8 449.22 -0.39 -0.66 0.52 DIANA-MicroT+TargetScan
BCLAF1 530.82 455.01 691.37 748.89 453.08 -0.38 -0.6 -0.72 DIANA-MicroT
E2F2 95.39 101.94 129.26 168.52 142.3 -0.44 -0.34 -0.24 DIANA-MicroT

Table 4: Six genes selected for validation experiments. Each was predicted as a potential target by at least one software package. In addition, its microarray expression is downregulated compared with control_50nM or with NP69 normal tissue.

2.5.1 Real-time polymerase chain reaction (qPCR)

For the qPCR assay, total RNA was isolated from the cell lines using TRIzol reagent (Invitrogen) according to the manufacturer's instructions. It was then treated with RNase-free DNase I (Cat#: 04716728001, Roche). The miR-155 qPCR assay was performed with TaqMan® MicroRNA Assays (Cat#: 4373124, Applied Biosystems, USA), with RNU6B (Cat#: 4373381, Applied Biosystems, USA) as the internal control. The relative expression level was determined as $2^{-\Delta\Delta C_t}$.

Data are presented as the expression level relative to the calibrator, with the standard error

of the mean of triplicate measures for each test sample.
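The $2^{-\Delta\Delta C_t}$ calculation and the standard error of the mean of triplicates can be sketched in Python as below. The Ct values are hypothetical. The reference gene would be RNU6B for miR-155 and GAPDH for the mRNA targets. Using a single calibrator pair for all replicates is a simplifying assumption.

```python
import math
import statistics


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Livak 2^(-ddCt): target expression in a sample relative to the calibrator sample.

    ct_ref and ct_ref_cal are Ct values of the internal control (e.g. RNU6B or GAPDH).
    """
    delta_sample = ct_target - ct_ref              # normalise the sample to its internal control
    delta_calibrator = ct_target_cal - ct_ref_cal  # normalise the calibrator the same way
    ddct = delta_sample - delta_calibrator
    return 2.0 ** (-ddct)


def mean_and_sem(values):
    """Mean and standard error of the mean (sample standard deviation / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical triplicate (target Ct, reference Ct) pairs and one calibrator pair.
sample = [(25.0, 20.0), (25.2, 20.1), (24.9, 19.9)]
calibrator = (24.0, 20.0)
levels = [relative_expression(t, r, *calibrator) for t, r in sample]
print(mean_and_sem(levels))
```

In this example, a ΔΔCt of 1 (the first replicate) corresponds to half the calibrator's expression level.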

After reverse transcription of the total RNA, the first-strand cDNA was used as the template to detect PERP, TP53INP1, TERF1, BCLAF1 and E2F2 expression. Detection was by quantitative real-time PCR (qPCR) with SYBR Green I chemistry (Power SYBR Green PCR Master Mix, CAT#: 4367659, ABI Inc., USA). GAPDH was used as the internal control.


The genes selected from the microarray data for further validation are listed below with their corresponding primers:

qRT-Primers for ZDHHC2 (NM_016353)

ZDHHC2 Forward: TCTTAGGCGAGCAGCCAAGGAT

ZDHHC2 Reverse: CAGTGATGGCAGCGATCTGGTT

qRT-Primers for KDM5B (NM_006618)

KDM5B Forward: AGCCAGAGACTGGCTTCAGGAT

KDM5B Reverse: AGCCTGAACCTCAGCTACTAGG

qRT-Primers for E2F2 (NM_004091)

E2F2 Forward: CTCTCTGAGCTTCAAGCACCTG

E2F2 Reverse: CTTGACGGCAATCACTGTCTGC

qRT-Primers for BCLAF1 (NM_014739)

BCLAF1 Forward: CCTAAACGAGCGGTTCACTTCG

BCLAF1 Reverse: GCTAAACGGGTATGCTTCCTCAG

qRT-Primers for TERF1 (NM_017489)

TERF1 Forward: CATGGAACCCAGCAACAAGACC

TERF1 Reverse: CTGCTTTCAGTGGCTCTTCTGC

qRT-Primers for TP53INP1 (NM_033285)

TP53INP1 Forward: TGATGAATGGATTCTTGTTGACTTC

TP53INP1 Reverse: TGAAGGGTGCTCAGTAGGTGAC

qRT-Primers for PERP (NM_022121)

PERP Forward: CCAGATGCTTGTCTTCCTGAGAG

PERP Reverse: AGTGACAGCAGGGTTGGCATGA

2.5.2 PCR

For the conventional PCR assay, total RNA was extracted from the cell lines using TRIzol reagent (Invitrogen). This was done as a quality check before running qPCR.

2.6 Microarray Analysis

Microarray analysis was done at the Department of Computational Biology at KTH (http://www.csc.kth.se/forskning/cb/) with the help of doctoral student Aymeric Fouquier d'Hérouel, under the supervision of Prof. Erik Aurell. Annotations were obtained from the Affymetrix probe set annotation file HuGene-1_0-st-v1.r3.cdf. The whole analysis took place in June 2010. The PLIER algorithm was used for gene expression analysis. The primary analysis includes the following individual operations:

1) Image correction

2) Global and local background correction

3) Feature normalization

4) Spatial normalization

5) Global normalization

2.6.1 Defining parameters used:

Analyzing large microarray data sets requires parameters that filter out noise. Gene expression values range from approximately 0.01 to 10000. The following criteria were chosen to eliminate noise without losing useful information:

1. Expression values > 30 (applied to all samples simultaneously) AND
2. Log2 (miR-155 mimic 100nM / miR-155 control 50nM) < -0.5 AND
3. Log2 (miR-155 mimic 50nM / miR-155 control 50nM) < -0.5 AND
4. 2 < Log2 (miR-155 control 50nM / NP69) < 0.5
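The list above can be expressed as a Python filter over per-gene expression values. This is a sketch, not the analysis code used in this study. The sample keys are hypothetical. The lower and upper bounds for criterion 4 are passed in as arguments. For reference, a log2 ratio of -0.5 corresponds to a ratio of 2^-0.5 ≈ 0.71, i.e. roughly a 29% decrease.

```python
import math

MIN_EXPRESSION = 30.0   # criterion 1, applied to every sample
MAX_LOG2_RATIO = -0.5   # criteria 2 and 3; 2**-0.5 is about 0.71 (~29% decrease)


def log2_ratio(numerator: float, denominator: float) -> float:
    return math.log2(numerator / denominator)


def passes_filter(expr: dict, band_low: float, band_high: float) -> bool:
    """expr maps (hypothetical) sample names to PLIER expression values for one gene."""
    if min(expr.values()) <= MIN_EXPRESSION:
        return False
    control = expr["control_50nM"]
    return (
        log2_ratio(expr["mimic_100nM"], control) < MAX_LOG2_RATIO          # criterion 2
        and log2_ratio(expr["mimic_50nM"], control) < MAX_LOG2_RATIO       # criterion 3
        and band_low < log2_ratio(control, expr["NP69"]) < band_high        # criterion 4
    )


# The C9orf5 values from Table 4 reproduce its LOG2_100 and LOG2_50 columns.
c9orf5 = {"mimic_100nM": 884.29, "mimic_50nM": 828.1, "control_50nM": 1656.07, "NP69": 791.85}
print(round(log2_ratio(c9orf5["mimic_100nM"], c9orf5["control_50nM"]), 2))  # -0.91
print(round(log2_ratio(c9orf5["mimic_50nM"], c9orf5["control_50nM"]), 2))   # -1.0
```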

2.7 How to use microarray data and target predictions

Microarray data show the change in mRNA expression in vitro, whereas target prediction predicts miRNA-mRNA interactions in silico. The targeting mechanism was investigated by combining these two types of data.


3 Results

Result I: The comparison between predicted targets and experimentally validated targets

The predicted targets for PicTar were obtained from the online database at http://pictar.mdc-berlin.de/ in February 2010; in total, 199 miR-155 target genes were obtained. The predicted targets for TargetScan 5.1 were obtained from the online database at www.targetscan.org in February 2010; in total, 281 miR-155 target genes were obtained. The predicted targets for DIANA-MicroT 3.0 were obtained from the online database at http://diana.cslab.ece.ntua.gr/microT/ in February 2010; in total, 166 miR-155 target genes were obtained. The predicted targets for MAMI were obtained from the online database at http://mami.med.harvard.edu/ in February 2010; in total, 205 miR-155 target genes were obtained.

The manually constructed database was created using Tarbase [28] and various publications. In total, 37 genes were identified as experimentally validated miR-155 targets. These genes were used to check the precision of the software packages in the downstream analysis.

Mirwalk [29] was used to construct the second database. In total, 528 genes were identified as direct and indirect miR-155 targets. These genes were also used to check the precision of the software packages in the downstream analysis.

Eleven software packages/databases were tested using the manually constructed database (37 targets) and the Mirwalk [29] database (528 genes). Their reliability was assessed by checking their precision scores against these 2 different sets of validated targets. The tools that showed the highest precision and sensitivity at the same time were chosen for further predictions.


Part 1 of Result I: The precision test using manually constructed database

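The software benchmark was carried out using the 37 direct targets. The precision and sensitivity scores of the eleven software packages/databases were checked and ranked. The four tools with the highest precision were selected for further analysis: PicTar (8.54%), TargetScan (7.83%), MAMI (7.8%) and DIANA-microT 3.0 (7.23%).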

Software/Database TRUE_POSITIVES Total_#_of_targets Precision(%) Sensitivity
DIANA-microT 3.0 12 166 7.23 0.32
Targetscan 22 281 7.83 0.59
Pictar 17 199 8.54 0.46
MAMI 16 205 7.8 0.43
EIMMO 31 2955 1.05 0.84
Miranda 26 1952 1.33 0.7
PITA 29 1266 2.29 0.78
RNA22 1 332 0.3 0.03
Targetrank 27 682 3.96 0.73
Mirgator 28 723 3.87 0.76
Mirbase 15 854 1.75 0.40

Table 5: Software benchmark using the 37 direct targets. The precision and sensitivity scores of eleven software packages/databases were checked and ranked. The top four were selected for further analysis.

Part 2 of Result I: The precision test using Mirwalk

The software benchmark was repeated using the 528 direct and indirect targets from Mirwalk. The precision and sensitivity scores of the eleven software packages/databases were checked and ranked. The same top four tools were obtained as in the previous test case (see Table 5), now ordered MAMI (16.1%), DIANA-microT (14.46%), TargetScan (13.52%) and PicTar (12.06%). Therefore, these four software packages/databases were selected for further analysis.

Software TRUE_POSITIVES Total_#_of_targets Precision(%) Sensitivity
DIANA-microT 24 166 14.46 0.05
Targetscan 38 281 13.52 0.07
Pictar 24 199 12.06 0.05
MAMI 33 205 16.1 0.06
EIMMO 123 2955 4.16 0.23
Miranda 118 1952 6.05 0.22
PITA 82 1266 6.48 0.16
RNA22 18 332 5.42 0.03
Targetrank 53 682 7.77 0.1
Mirgator 57 723 7.88 0.11
Mirbase 35 854 4.1 0.07

Table 6: Software benchmark using the 528 direct and indirect targets from Mirwalk, a database of experimentally validated miRNA targets. The precision and sensitivity scores of eleven software packages/databases were checked and ranked. The top four were selected for further analysis.

Result II: Amalgamation of predicted targets of top 4 software packages yielded 9 miR-155 target candidates

Each top-scoring software package predicted its own gene set. Without combining the predictions of all four packages, it is not obvious which genes are likely targets. The predicted targets of the four packages were therefore amalgamated into the list below, which shows which package predicted each gene:

Gene_Symbol DIANA TargS MAMI Pictar TOTAL Gene_name

NUFIP2 + + + + 4 nuclear fragile X mental retardation protein interacting protein 2

MAP3K7IP2 + + + + 4 mitogen-activated protein kinase kinase kinase 7 interacting protein 2

SGK3 + + + + 4 serum/glucocorticoid regulated kinase family, member 3

TSHZ3 + + + + 4 teashirt zinc finger homeobox 3

SEMA5A + + + + 4 sema domain, seven thrombospondin repeats (type 1 and type 1-like), transmembrane

RAB11FIP2 + + + + 4 RAB11 family interacting protein 2 (class I)


SEPT11 + + + + 4 septin 11

FAR1 + + + + 4 fatty acyl CoA reductase 1

KRAS + + + + 4 v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog

ETS1 + + + + 4 v-ets erythroblastosis virus E26 oncogene homolog 1 (avian)

BACH1 + + + + 4 BTB and CNC homology 1, basic leucine zipper transcription factor 1

ZNF236 + + + + 4 zinc finger protein 236

DCUN1D3 - + + + 3 DCN1, defective in cullin neddylation 1, domain containing 3

ETNK2 - + + + 3 ethanolamine kinase 2

DNAJB7 - + + + 3 DnaJ (Hsp40) homolog, subfamily B, member 7

IKBKE - + + + 3 inhibitor of kappa light polypeptide gene enhancer in B-cells

HDAC4 + - + + 3 histone deacetylase 4

FBXO11 + + - + 3 F-box protein 11

CACNA1C - + + + 3 hypothetical protein LOC100131098;

C3orf18 - + + + 3 chromosome 3 open reading frame 18

UBQLN1 + - + + 3 ubiquilin 1

CSF1R - + + + 3 colony stimulating factor 1 receptor

CD47 - + + + 3 CD47 molecule

CARHSP1 - + + + 3 calcium regulated heat stable protein 1, 24kDa

YWHAE - + + + 3 similar to 14-3-3 protein epsilon (14-3-3E)

MIDN + + + - 3 midnolin

MAP3K14 + + - + 3 mitogen-activated protein kinase kinase kinase 14

MAP3K10 - + + + 3 mitogen-activated protein kinase kinase kinase 10

NFAT5 - + + + 3 nuclear factor of activated T-cells 5, tonicity-responsive

N4BP1 - + + + 3 NEDD4 binding protein 1

MYO10 - + + + 3 myosin X

KPNA1 + + - + 3 karyopherin alpha 1 (importin alpha 5)

KIAA1274 - + + + 3 KIAA1274

JARID2 + + + - 3 jumonji, AT rich interactive domain 2

LRRC59 - + + + 3 leucine rich repeat containing 59

Table 7: Intersection of the targets predicted by the four software packages. Genes shown in blue are validated DIRECT targets of miR-155. The complete list is given in Appendix 1.

In test case I, the precision of each of the four packages is ~8%. After combining the predictions of the four packages, precision increases to ~25% for genes predicted by all four: 3 of the 12 genes predicted by all four software packages are experimentally validated direct miR-155 targets. This suggests that the other 9 genes (see Table 8) are strong potential miR-155 targets that could be checked in further validation experiments.
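The amalgamation amounts to counting, for each gene, how many of the four top-scoring tools predict it, and then measuring precision among the genes with a given number of hits. A minimal Python sketch with hypothetical toy gene sets (not the actual prediction lists):

```python
from collections import Counter


def hit_counts(predictions: dict) -> Counter:
    """Count how many tools predict each gene; predictions maps tool name -> set of genes."""
    counts = Counter()
    for genes in predictions.values():
        counts.update(genes)  # each tool contributes at most one hit per gene
    return counts


def precision_at_hits(counts: Counter, validated: set, min_hits: int) -> float:
    """Percentage of genes with at least min_hits predictions that are validated targets."""
    selected = {gene for gene, n in counts.items() if n >= min_hits}
    return 100.0 * len(selected & validated) / len(selected) if selected else float("nan")


# Hypothetical toy predictions for the four tools.
predictions = {
    "DIANA": {"G1", "G2", "G3", "G5"},
    "TargetScan": {"G1", "G2", "G4"},
    "MAMI": {"G1", "G2", "G3"},
    "PicTar": {"G1", "G2", "G4"},
}
counts = hit_counts(predictions)
print(sorted(g for g, n in counts.items() if n == 4))                 # ['G1', 'G2']
print(precision_at_hits(counts, validated={"G1", "G5"}, min_hits=4))  # 50.0
```

With the numbers reported above (3 validated genes among the 12 predicted by all four tools), the same calculation gives 25%.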

Gene_Symbol DIANA TargS MAMI Pictar TOTAL Gene_name
NUFIP2 + + + + 4 nuclear fragile X mental retardation protein interacting protein 2

MAP3K7IP2 + + + + 4 mitogen-activated protein kinase kinase kinase 7 interacting protein 2

SGK3 + + + + 4 serum/glucocorticoid regulated kinase family, member 3

SEMA5A + + + + 4 sema domain, seven thrombospondin repeats (type 1 and type 1-like), transmembrane

RAB11FIP2 + + + + 4 RAB11 family interacting protein 2 (class I)

SEPT11 + + + + 4 septin 11

FAR1 + + + + 4 fatty acyl CoA reductase 1

KRAS + + + + 4 v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog

Table 8: The list of 9 miR-155 target candidates. All of those genes have been predicted by all 4

top scoring target prediction software packages.

Result III: Quality control of microarray data

The reproducibility of the microarray experiment was checked with a scatter plot. The two samples compared are not exact replicates: miR-155 mimic 50nM was intended to serve as a rough duplicate of miR-155 mimic 100nM. Even so, the plot shows that the data are reproducible.
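The scatter-plot check can be complemented by a correlation coefficient. The following Python sketch computes the Pearson correlation of log2-transformed expression values from two samples. The expression values are hypothetical. Log-transforming before correlating is an assumption and is not necessarily what was plotted in Figure 9.

```python
import math


def pearson_log2(x, y):
    """Pearson correlation between the log2-transformed expression values of two samples."""
    lx = [math.log2(v) for v in x]
    ly = [math.log2(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression values for the same five genes in two samples.
mimic_100 = [120.0, 35.0, 900.0, 60.0, 4000.0]
mimic_50 = [110.0, 40.0, 850.0, 75.0, 3600.0]
print(round(pearson_log2(mimic_100, mimic_50), 3))
```

A value close to 1 indicates that the two samples rank and scale the genes similarly.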


Figure 9: Scatter plot of miR-155 mimic 100nM versus miR-155 mimic 50nM. The figure suggests a rough correlation between the miR-155 mimic 100nM and miR-155 mimic 50nM data.

Result IV: Elucidating microarray data revealed some potential miR-155 target genes

Part I: Using DAVID revealed two candidate genes: WEE1 and DPY19L1

Microarray analysis with the filtering parameters described in the Methods section yielded 395 genes (not shown). Only 363 of the 395 genes were annotated in the Database for Annotation, Visualization and Integrated Discovery v6.7 (DAVID), and these were chosen for further functional analysis [36, 37]. The human genome was used as the background for the functional annotation. The results are sorted by p-value (see Figure 10).

The most significant functionally annotated group was “urinary bladder tumor_disease_3rd”, which belongs to the “UNIGENE_EST_QUARTILE” database. The 105 genes in “urinary bladder tumor_disease_3rd” are listed in Appendix 2.

Other enriched “UNIGENE_EST_QUARTILE” datasets include:

- tumor/disease datasets: adrenal tumor_disease_3rd, oral tumor_disease_3rd, thyroid tumor_disease_3rd, esophageal tumor_disease_3rd, laryngeal cancer_disease_3rd and pharyngeal tumor_disease_3rd
- normal tissue datasets: ear_normal_3rd, tongue_normal_3rd, pharynx_normal_3rd, mammary gland_normal_3rd, larynx_normal_3rd and esophagus_normal_3rd

These datasets list genes related to the corresponding tissues. Their strong enrichment in this study means that the miR-155 mimic downregulated genes belonging to these datasets. In addition, many of these tissues belong to the digestive or respiratory tract and are anatomically close to the nasopharyngeal tissue. For example, 68 of the 363 annotated genes belong to pharynx_normal_3rd (see Appendix 3). According to the “UNIGENE_EST_QUARTILE” database, these genes are related to normal pharynx tissue. Since this study concerns nasopharyngeal tissue, these 68 genes were considered highly relevant for further validation. A quick lookup in the DIANA MicroT prediction results found 2 genes from the pharynx_normal_3rd dataset:

DPY19L1, dpy-19-like 1 (C. elegans); similar to hCG1645499 [36]

WEE1, WEE1 homolog (S. pombe) [36]

WEE1 is predicted by three top-scoring software packages: DIANA MicroT, TargetScan and PicTar. According to DIANA MicroT, it has a 9mer site (a 9-nucleotide match at the seed region), which increases the likelihood that WEE1 is a miR-155 target. DPY19L1 is also predicted by DIANA MicroT, with two 8mer sites.


Figure 10: Functional annotation of 395 genes that were obtained by microarray data.

DAVID [36, 37] is used to perform the functional annotation.

Part II: Comparing predictions to microarray data slightly increased the accuracy and revealed potential miR-155 target candidate genes, such as KRAS, SGK3, MAP3K7IP2 and FAR1

Among the genes predicted by at least one top-scoring software package, the most downregulated genes were also predicted by more than one package. In addition, experimentally validated direct targets were enriched among them, and precision increased slightly, to ~30% (9 of the 30 genes in Table 9). Moreover, four genes shown in red in Table 9 are predicted by all of the top-scoring software packages: KRAS, SGK3, MAP3K7IP2 and FAR1. They are also downregulated by at least 25% in the microarray experiment. These 4 genes are strong miR-155 candidate target genes that could be considered for further validation.


Gene_Symbol DIANA TargS MAMI Pictar TOTAL LOG2_100 LOG2_50 LOG2_TW03

p53DINP1 - + + + 3 -1.77 -1.72 3.95

Myo1d - + + + 3 0.41 -1.59 -1.85

VAV3 + + - - 2 0.26 -1.28 -0.94

KRAS + + + + 4 -0.69 -1.18 -1.08

ADD3 + - + + 3 -1.39 -1.15 1.6

ETNK2 - + + + 3 -0.77 -1.01 3.15

PICALM + - + - 2 -0.62 -0.83 -1.41

BCAT1 + + - + 3 0.15 -0.73 -2.1

ZNF652 + + + - 3 -0.66 -0.71 -0.55

TSGA14 - + + + 3 -0.97 -0.66 1.37

ETS1 + + + + 4 -0.34 -0.66 0.29

CARHSP1 - + + + 3 -0.39 -0.65 -0.48

JARID2 + + + - 3 0.17 -0.65 -0.88

SDCBP - + + + 3 0.03 -0.6 -0.54

USP48 - + + + 3 -0.47 -0.57 0.27

SMAD1 + + - - 2 0.08 -0.55 -0.22

MEIS1 - + - + 2 0.08 -0.54 1.55

kcip-1 + + - + 3 -0.1 -0.54 -0.42

MYO10 - + + + 3 0.14 -0.53 -1.88

SGK3 + + + + 4 -0.32 -0.5 -0.62

WWC1 - + + + 3 0 -0.48 -1.1

CSNK1G2 - + + + 3 -0.04 -0.47 -0.86

HIF1A - - + + 2 -0.18 -0.46 -0.22

UBQLN1 + - + + 3 -0.19 -0.46 -0.71

YWHAE - + + + 3 -0.1 -0.38 -0.37

ARID2 + + - - 2 -0.29 -0.33 -0.01

MAP3K7IP2 + + + + 4 -0.17 -0.32 -0.53

KPNA1 + + - + 3 0.1 -0.29 -1.49

FAR1 + + + + 4 -0.2 -0.28 -1.32

SLA - + + + 3 -0.06 -0.28 -0.16

Table 9: Combination of microarray data with prediction data. The microarray data are incorporated into the list of targets in Appendix 1. Genes shown in blue on the left have been validated by wet-lab experiments.
LOG2_100: log2(miR-155 mimic 100 nM / miR-155 control 50 nM).
LOG2_50: log2(miR-155 mimic 50 nM / miR-155 control 50 nM).
LOG2_TW03: log2(TW03 / NP69).


Result V: GOEAST analysis revealed the importance of protein and nucleotide binding related genes via Gene Ontologies

The GOEAST analysis of the 395 genes highlighted the importance of genes related to protein and nucleotide binding. This suggests that a significant portion of the 395 genes may be transcription factors (GO:0000166, nucleotide binding).

Another significantly enriched GO term is GO:0005072, transforming growth factor beta receptor, cytoplasmic mediator activity. This molecular function term describes the activity of any molecule that transmits the signal from a TGF-beta receptor through the cytoplasm to the nucleus [40]. As seen in Figure 11, GO:0005072 contains 10 genes in total (see Table 10), 4 of which are enriched in the gene list analyzed here.

Table 10: The 10 genes in GO:0005072 - transforming growth factor beta receptor, cytoplasmic mediator activity [40].

Parameters chosen in GOEAST:

Statistical test method: Hypergeometric

Multi-test adjustment method: Yekutieli (FDR under dependency)

Significance Level of Enrichment: 0.001

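Database ID Gene_Symbol Reference Evidence Gene_name
UniProtKB O15105 SMAD7 PMID:9256479 IDA Mothers against decapentaplegic homolog 7
UniProtKB O15198 SMAD9 PMID:19018011 TAS Mothers against decapentaplegic homolog 9
UniProtKB O43541 SMAD6 PMID:9256479 IDA Mothers against decapentaplegic homolog 6
UniProtKB P17813 ENG PMID:12015308 IDA Endoglin
UniProtKB P46527 CDKN1B PMID:8033212 TAS Cyclin-dependent kinase inhibitor 1B
UniProtKB P84022 SMAD3 PMID:9111321 IDA Mothers against decapentaplegic homolog 3
UniProtKB Q13485 SMAD4 PMID:9389648 IDA Mothers against decapentaplegic homolog 4
UniProtKB Q15796 SMAD2 PMID:9256479 IDA Mothers against decapentaplegic homolog 2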
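The enrichment test named in the GOEAST parameters above can be sketched in Python: a hypergeometric test, then the Yekutieli (Benjamini-Yekutieli) FDR adjustment under dependency. This is an illustration, not GOEAST's code. The background size of 20000 genes is hypothetical. The term of 10 genes with 4 of them in a 395-gene list mirrors GO:0005072.

```python
import math


def hypergeom_sf(x: int, N: int, K: int, n: int) -> float:
    """P(X >= x): probability of at least x term genes in a list of n genes,
    when K of the N background genes carry the GO term."""
    total = math.comb(N, n)
    upper = min(K, n)
    return sum(math.comb(K, k) * math.comb(N - K, n - k) for k in range(x, upper + 1)) / total


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p-values (FDR control under arbitrary dependency)."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # dependency correction factor
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted


# 4 of the 10 term genes appear in a 395-gene list; the 20000-gene background is hypothetical.
p = hypergeom_sf(4, N=20000, K=10, n=395)
print(p, benjamini_yekutieli([p, 0.02, 0.3]))
```

A GO term would be reported as enriched when its adjusted p-value falls below the chosen significance level (0.001 above).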

Figure 11: GOEAST analysis of the 395 genes obtained from the microarray data. The intensity of the yellow color indicates the significance of the corresponding gene ontology term: the more intense the yellow, the lower the p-value and the higher the significance.

Result VI: Validation of microarray results by qPCR revealed that ZDHHC2 and TP53INP1 genes are significantly downregulated

The qPCR experiment quantified the selected genes, which made it possible to determine accurately which genes are downregulated in the 5 different samples. ZDHHC2 and TP53INP1 showed significant downregulation in both miR-155 mimic 100nM and miR-155 mimic 50nM compared with miR-155 control 50nM (see Figures 11 and 12).


Figure 11: The qPCR results of the E2F2, KDM5B and ZDHHC2 genes. ZDHHC2 showed significant downregulation in NP69 mimic 50nM and NP69 mimic 100nM compared with NP69 control 50nM, NP69 parental and TW03. The other genes did not show significant downregulation.

Figure 12: The qPCR results of bclaf1, terf1 and tp53inp1 genes. The tp53inp1 gene

showed significant downregulation when considering NP69 mimic 50nM and NP69 mimic

100nM compared with NP69 control 50nM, NP69 parental and TW03.


Figure 13: The qPCR results of the gene perp. This gene did not show significant

downregulation.


4 Discussion

Finding the best software packages for miRNA target prediction is complicated for several reasons. Different software packages use different parameters as well as different 3'UTRs: some consider only the longest 3'UTR, while others consider all possible 3'UTRs. These differences result in different sets of targets for a particular miRNA. Another complication is that different software packages have different output formats. These need to be converted into a common identifier, and it is sometimes difficult to find the proper identifier.

The biological difference between the animal and plant miRNA targeting mechanisms remains largely unknown. The obvious difference is the site of miRNA binding, which is the CDS in plants and the 3'UTR in animals. In theory, miRNAs can bind to the CDS in animals, too. The difference may arise because binding to the CDS is more difficult in animals than in plants, since ribosomes or other translation factors occupy the mRNA. Binding to the CDS might be difficult to avoid in plants. Another hypothesis concerns a difference in the action of RISC: in plants, RISC might bind the corresponding miRNA in a way that favors complete complementarity. This would be consistent with miRNA target prediction being easier in plants.

The first conclusion of this study is that four miRNA target prediction tools, namely TargetScan 5.1, PicTar 5, MAMI and DIANA-MicroT 3.0, could be used for further investigation. Incorporating microarray data into the analysis slightly strengthened the results obtained from these tools. Overall, however, combining computational predictions with microarray data did not have a dramatic effect.
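Appendix 1 ranks candidate genes by how many of the four tools predict them (the last number in each row). A minimal sketch of such a vote count follows, using toy gene sets rather than the actual predictions; each tool contributes at most one vote per gene.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def consensus(predictions: Dict[str, Iterable[str]],
              min_tools: int = 1) -> List[Tuple[str, int]]:
    """Count how many tools predict each gene and rank genes by that count."""
    votes: Counter = Counter()
    for genes in predictions.values():
        votes.update(set(genes))   # set(): duplicate rows from one tool give one vote
    ranked = [(gene, n) for gene, n in votes.items() if n >= min_tools]
    ranked.sort(key=lambda item: (-item[1], item[0]))   # most votes first, then by name
    return ranked


# Toy input: gene names are placeholders, not results from this study.
toy = {
    "TargetScan 5.1": ["GENE_A", "GENE_B", "GENE_C"],
    "PicTar 5": ["GENE_A", "GENE_B"],
    "MAMI": ["GENE_A", "GENE_D"],
    "DIANA-MicroT 3.0": ["GENE_A", "GENE_B", "GENE_B"],
}
print(consensus(toy, min_tools=2))  # [('GENE_A', 4), ('GENE_B', 3)]
```

Raising `min_tools` typically trades recall for precision, so the threshold remains a judgment call.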

Another contribution of this study is the identification of two potentially strong miR-155 target genes, tp53inp1 and zdhhc2, by combining computational predictions with microarray data. These two targets will be considered for further validation, in particular with a luciferase reporter assay.

Using the 395 genes for functional enrichment analysis could be considered another way of finding potential targets. As described in Methods, these 395 genes are downregulated by at least 25% (log2 fold change below -0.5). One might consider 25% downregulation insignificant or noise, but many published studies indicate that, for miRNAs, downregulation of this size also matters. Therefore, the candidate target genes found with DAVID, DPY19L1 and WEE1, also merit further validation experiments. WEE1 is a nuclear tyrosine kinase whose level decreases in M phase, when it is hyperphosphorylated; this is consistent with the idea that it acts as a negative regulator of entry into mitosis. A possible storyline would therefore be: by downregulating WEE1, miR-155 would keep cells entering mitosis, resulting in proliferation, which is favored by almost all cancer tissues.
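The relation between the log2 fold-change cut-off and percentage downregulation can be checked numerically. A log2 fold change x corresponds to expression 2^x times control, so the cut-off of -0.5 means expression below 2^-0.5 ≈ 0.71 of control (about 29% lower), while exactly 25% downregulation corresponds to log2(0.75) ≈ -0.415. The fold-change values in the filtering example are hypothetical.

```python
import math
from typing import Dict, List


def percent_change(log2_fc: float) -> float:
    """Percentage change in expression implied by a log2 fold change."""
    return (2 ** log2_fc - 1) * 100


def log2_fc_for_decrease(percent_down: float) -> float:
    """log2 fold change corresponding to a given percentage decrease."""
    return math.log2(1 - percent_down / 100)


def downregulated(log2_fcs: Dict[str, float], cutoff: float = -0.5) -> List[str]:
    """Genes whose log2 fold change (mimic vs control) is below the cut-off."""
    return sorted(gene for gene, fc in log2_fcs.items() if fc < cutoff)


print(round(percent_change(-0.5), 1))       # -29.3
print(round(log2_fc_for_decrease(25), 3))   # -0.415
print(downregulated({"GENE_A": -0.8, "GENE_B": -0.45, "GENE_C": 0.2}))  # ['GENE_A']
```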

The target prediction and validation procedure could be improved by using alternative technologies. An alternative to microarrays is RNA-Seq, a technique that quantifies the transcriptome of cells using deep sequencing. A considerable number of publications report that RNA-Seq reveals more information than microarrays because, unlike microarrays, it does not depend on hybridization, which makes it difficult to distinguish between isoforms. Since multi-exon genes can produce different isoforms, and microarrays mostly do not distinguish them, microarray data can be misleading. A hypothetical example: a specific miRNA binds a specific isoform of a gene (alternative splicing events can give a single gene several different 3'UTRs) and downregulates it. The microarray probe is specific to the gene, but it does not necessarily bind the 3'UTR (it can bind anywhere on the mRNA). Thus, while one isoform decreases, the other isoforms remain and still contribute to the total amount of mRNA in the sample. Consequently, the microarray will not show strong downregulation, which can lead to misinterpretation of the data. For these reasons, RNA-Seq could be used for miRNA studies.
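The dilution argument can be made quantitative with a toy calculation. Suppose the targeted isoform makes up a fraction f of the transcripts that a gene-level probe detects, and that isoform drops to a fraction r of its control level while the other isoforms are unchanged. The probe then reports (1 - f) + f·r relative to control. The sketch below assumes all isoforms hybridize to the probe equally well; the values of f and r are illustrative.

```python
import math


def observed_log2_fc(isoform_fraction: float, isoform_remaining: float) -> float:
    """Gene-level log2 fold change reported by a probe that detects all isoforms.

    isoform_fraction: share of the gene's transcripts that carry the targeted 3'UTR (0-1).
    isoform_remaining: level of that isoform after miRNA action, relative to control (0-1].
    Assumes the other isoforms are unchanged and hybridize to the probe equally well.
    """
    if not (0 <= isoform_fraction <= 1 and 0 < isoform_remaining <= 1):
        raise ValueError("fractions must lie in the stated ranges")
    total = (1 - isoform_fraction) + isoform_fraction * isoform_remaining
    return math.log2(total)


print(round(observed_log2_fc(1.0, 0.5), 2))  # -1.0: the only isoform is halved
print(round(observed_log2_fc(0.3, 0.5), 2))  # -0.23: same halving, diluted by other isoforms
```

In the second case the targeted isoform is halved, yet the gene-level value stays above the -0.5 cut-off used for the 395-gene list, so such a gene would be missed.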

Another part of the experiment was to validate the microarray results by measuring the relative expression of selected "significant" genes by qPCR. Here, a gene was considered significant if it has a tumor-suppressing function. The following 5 genes are related to tumor suppression, negative regulation of mitosis or positive regulation of apoptosis. This means that in the absence of these genes, there is a risk that the cell becomes highly proliferative and turns into a tumor cell.

TERF1, BCLAF1: negative regulation of mitosis (GO:0045839).

PERP, TP53INP1: positive regulation of apoptosis (GO:0043065).

E2F2: plays a crucial role in the control of the cell cycle and the action of tumor suppressor proteins, and is also a target of the transforming proteins of DNA tumor viruses.
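Relative expression from qPCR is commonly computed with the 2^-ΔΔCt (Livak) method. This section does not state which calculation was used, so the following Python sketch is only an illustration under that assumption. It also assumes roughly 100% amplification efficiency for both the target gene and the reference (housekeeping) gene. The Ct values are hypothetical, and the condition names follow the figure captions.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_sample: Sequence[float], ref_sample: Sequence[float],
                        target_control: Sequence[float], ref_control: Sequence[float]) -> float:
    """Fold change of a target gene in a sample versus a control by 2^-ddCt.

    Each argument holds replicate Ct values. Assumes ~100% PCR efficiency for both assays.
    """
    d_ct_sample = mean(target_sample) - mean(ref_sample)     # normalize to reference gene
    d_ct_control = mean(target_control) - mean(ref_control)
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** -dd_ct


# Hypothetical Ct triplicates: target vs reference gene, NP69 mimic vs NP69 control 50nM.
fold = relative_expression([26.0, 26.2, 26.1], [18.0, 18.1, 17.9],
                           [25.1, 25.0, 25.2], [18.0, 18.0, 18.0])
print(round(fold, 2))  # 0.5, i.e. about half the control level
```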

A recent study [41] reported that SMAD2 is a direct target of miR-155. SMAD2 was also enriched in this study in the GO analysis performed with GOEAST. SMAD2 belongs to GO:0005072, a term with 10 genes, mostly members of the SMAD family, which act as mediators of TGF-β signaling (TGF-β is a pleiotropic cytokine with important effects on processes such as fibrosis, angiogenesis and immunosuppression). Upregulation of miR-155 altered the response to TGF-β by changing the expression of target genes involved in inflammation, fibrosis and angiogenesis. This suggests that the other SMAD family genes enriched in our study could be examined in further validation experiments.
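GO enrichment tools such as GOEAST [31] and DAVID [36, 37] test whether a term is over-represented in a gene list. Such tests are usually based on the hypergeometric distribution or closely related Fisher-type tests, although the exact statistic and multiple-testing correction each tool applies may differ. A minimal sketch of a one-sided hypergeometric test follows. The counts are hypothetical: a background of 20,000 annotated genes, the 10-gene term GO:0005072, a list of 395 genes, and 3 of them in the term.

```python
from math import comb


def enrichment_p(background: int, term_size: int, list_size: int, hits: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits) for over-representation.

    background: annotated genes in the reference set (N)
    term_size:  genes annotated to the GO term (K)
    list_size:  genes in the query list (n)
    hits:       query genes annotated to the term (k)
    """
    top = min(term_size, list_size)
    tail = sum(comb(term_size, i) * comb(background - term_size, list_size - i)
               for i in range(hits, top + 1))
    return tail / comb(background, list_size)   # exact integer arithmetic until here


# Hypothetical counts; expected overlap by chance is 395 * 10 / 20000, about 0.2 genes.
p = enrichment_p(background=20000, term_size=10, list_size=395, hits=3)
print(f"{p:.2e}")
```

No correction for testing many GO terms is applied here. In practice, enrichment p-values are normally adjusted (for example with the Benjamini-Hochberg procedure) before a term is called enriched.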


5 References

[1] Bartel DP: MicroRNAs: Target Recognition and Regulatory Functions. Cell 2009, 136(2):215-233.

[2] Lee RC, Feinbaum RL, Ambros V: The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 1993, 75:843-854.

[3] Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, Bartel DP: Prediction of plant microRNA targets. Cell 2002, 110:513-520.

[4] Reinhart BJ, Slack F, Basson M, Pasquinelli A, Bettinger J, Rougvie A, Horvitz HR, Ruvkun G: The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 2000, 403:901-906.

[5] Lee RC, Feinbaum RL, Ambros V: The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 1993, 75:843-854.

[6] Wang X, El Naqa IM: Prediction of both conserved and nonconserved microRNA targets in animals. Bioinformatics 2008, 24(3):325-332.

[7] Wang X: miRDB: a microRNA target prediction and functional annotation database with a wiki interface. RNA 2008, 14(6):1012-1017.

[8] Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 2005, 120:15-20.

[9] Ambros V: The functions of animal microRNAs. Nature 2004, 431:350-355.

[10] Bushati N, Cohen SM: microRNA functions. Annu Rev Cell Dev Biol 2007, 23:175-205.

[11] Sevignani C, Calin GA, Nnadi SC, Shimizu M, Davuluri RV, Hyslop T, Demant P, Croce CM, Siracusa LD: MicroRNA genes are frequently located near mouse cancer susceptibility loci. Proc Natl Acad Sci USA 2007, 104:8017-8022.

[12] Asangani IA, Rasheed SA, Nikolova DA, Leupold JH, Colburn NH, Post S, Allgayer H: MicroRNA-21 (miR-21) post-transcriptionally downregulates tumor suppressor Pdcd4 and stimulates invasion, intravasation and metastasis in colorectal cancer. Oncogene 2008, 27:2128-2136.

[13] Selbach M, Schwanhausser B, Thierfelder N, Fang Z, Khanin R, Rajewsky N: Widespread changes in protein synthesis induced by microRNAs. Nature 2008, 455:58-63.

[14] Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP: MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 2007, 27(1):91-105.

[15] Saito T, Sætrom P: MicroRNAs – targeting and target prediction (2010).

[16] UCSC Genome Browser on Human Mar. 2006 (NCBI36/hg18) Assembly. Retrieved on 04.04.2010 from http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&position=chr21:25868163-25868227&hgt.customText=http://mirnamap.mbc.nctu.edu.tw/cache/bed/hsa-mirna.bed

[17] The pre-miRNA of MI0000681. Retrieved on 04.04.2010 from http://mirnamap.mbc.nctu.edu.tw/php/mirna_entry.php?acc=MI0000681

[18] Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB: Prediction of mammalian microRNA targets. Cell 2003, 115:787-798.

[19] Lytle JR et al.: Target mRNAs are repressed as efficiently by microRNA-binding sites in the 5' UTR as in the 3' UTR. Proc Natl Acad Sci USA 2007, 104:9667-9672.

[20] Arvey A, Larsson E, Sander C, Leslie CS, Marks DS: Target mRNA abundance dilutes microRNA and siRNA activity. Mol Syst Biol 2010, 6:363.

[21] Maragkakis M, Reczko M, Simossis VA, Alexiou P, Papadopoulos GL, Dalamagas T, Giannopoulos G, Goumas G, Koukis E, Kourtis K, Vergoulis T, Koziris N, Sellis T, Tsanakas P, Hatzigeorgiou AG: DIANA-microT web server: elucidating microRNA functions through target prediction. Nucleic Acids Res 2009, 37(Web Server issue):W273-W276.

[22] Friedman RC, Farh KK, Burge CB, Bartel DP: Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 2009, 19:92-105.

[23] Chen K, Rajewsky N: Natural selection on human microRNA binding sites inferred from SNP data. Nat Genet 2006, 38:1452-1456.

[24] Kong W, He L, Coppola M, Guo J, Esposito NN, Coppola D, Cheng JQ: MicroRNA-155 regulates cell survival, growth and chemosensitivity by targeting FOXO3a in breast cancer. J Biol Chem 2010 Apr 6.

[25] Brown JR, Sanseau P: A computational view of microRNAs and their targets. Drug Discov Today 2005, 10:595-601.

[26] Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic Acids Res 2008, 36:D154-D158.

[27] Hammell CM: The microRNA-argonaute complex: a platform for mRNA modulation. RNA Biol 2008, 5(3):123-127.

[28] Papadopoulos GL, Reczko M, Simossis VA, Sethupathy P, Hatzigeorgiou AG: The database of experimentally supported targets: a functional update of TarBase. Nucleic Acids Res 2009, 37(Database issue):D155-D158.

[29] miRWalk, http://www.ma.uni-heidelberg.de/apps/zmf/mirwalk/index.html

[30] Uniprot, http://www.uniprot.org/keywords/?query=name:"Phosphoprotein"

[31] Zheng Q, Wang XJ: GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res 2008 May 16. PMID: 18487275.

[32] Lim LP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, Castle J, Bartel DP, Linsley PS, Johnson JM: Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 2005, 433:769-773.

[33] Eulalio A, Huntzinger E, Nishihara T, Rehwinkel J, Fauser M, Izaurralde E: Deadenylation is a widespread effect of miRNA regulation. RNA 2009, 15(1):21-32. doi:10.1261/rna.1399509. PMID: 19029310.

[34] Sunkar R, Jagadeeswaran G: In silico identification of conserved microRNAs in large number of diverse plant species. BMC Plant Biol 2008, 8:37.

[35] Moffett HF, Novina CD: A small RNA makes a Bic difference. Genome Biol 2007, 8:221. doi:10.1186/gb-2007-8-7-221.

[36] Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nat Protoc 2009, 4(1):44-57.

[37] Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 2003, 4(5):P3.

[38] GeneCards, http://www.genecards.org. Retrieved in June 2010.

[39] Millar AA, Waterhouse PM: Plant and animal microRNAs: similarities and differences. Funct Integr Genomics 2005, 5(3):129-135. doi:10.1007/s10142-005-0145-2.

[40] Retrieved from www.geneontology.org

[41] Louafi F, Martinez-Nunez RT, Sanchez-Elsner T: MicroRNA-155 (miR-155) targets SMAD2 and modulates the response of macrophages to transforming growth factor-β. J Biol Chem 2010.


Appendices

Supplementary Figure 1

Supplementary Figure 1: Scatter plot comparing NP69 tissue with NP69 control 50nM; there was no significant correlation between them.


Appendix 1

DIANA-MicroT 3.0 http://diana.cslab.ece.ntua.gr/microT/

TargetScan 5.1 http://www.targetscan.org/vert_50/

Pictar 5.0 http://pictar.mdc-berlin.de/

MAMI http://mami.med.harvard.edu/

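Genes marked in blue are validated direct targets of miR-155.

Columns: gene, a 1 for each of the four tools above that predicts the gene, number of tools predicting the gene, gene description.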


NUFIP2 1 1 1 1 4 nuclear fragile X mental retardation protein interacting protein 2

MAP3K7IP2 1 1 1 1 4 mitogen-activated protein kinase kinase kinase 7 interacting protein 2

SGK3 1 1 1 1 4 serum/glucocorticoid regulated kinase family, member 3

TSHZ3 1 1 1 1 4 teashirt zinc finger homeobox 3

SEMA5A 1 1 1 1 4 sema domain, seven thrombospondin repeats (type 1 and type 1-like)

RAB11FIP2 1 1 1 1 4 RAB11 family interacting protein 2 (class I)

SEPT11 1 1 1 1 4 septin 11

FAR1 1 1 1 1 4 fatty acyl CoA reductase 1

KRAS 1 1 1 1 4 v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog

ETS1 1 1 1 1 4 v-ets erythroblastosis virus E26 oncogene homolog 1 (avian)

BACH1 1 1 1 1 4 BTB and CNC homology 1, basic leucine zipper transcription factor 1

ZNF236 1 1 1 1 4 zinc finger protein 236

DCUN1D3 1 1 1 3 DCN1, defective in cullin neddylation 1, domain containing 3 (S. cerevisiae)

ETNK2 1 1 1 3 ethanolamine kinase 2

DNAJB7 1 1 1 3 DnaJ (Hsp40) homolog, subfamily B, member 7

IKBKE 1 1 1 3 inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase epsilon

HDAC4 1 1 1 3 histone deacetylase 4

FBXO11 1 1 1 3 F-box protein 11

CACNA1C 1 1 1 3 hypothetical protein LOC100131098; calcium channel

C3orf18 1 1 1 3 chromosome 3 open reading frame 18

UBQLN1 1 1 1 3 ubiquilin 1

CSF1R 1 1 1 3 colony stimulating factor 1 receptor

CD47 1 1 1 3 CD47 molecule

CARHSP1 1 1 1 3 calcium regulated heat stable protein 1, 24kDa

YWHAE 1 1 1 3 similar to 14-3-3 protein epsilon (14-3-3E)

MIDN 1 1 1 3 midnolin

MAP3K14 1 1 1 3 mitogen-activated protein kinase kinase kinase 14


MAP3K10 1 1 1 3 mitogen-activated protein kinase kinase kinase 10

NFAT5 1 1 1 3 nuclear factor of activated T-cells 5, tonicity-responsive

N4BP1 1 1 1 3 NEDD4 binding protein 1

MYO10 1 1 1 3 myosin X

KPNA1 1 1 1 3 karyopherin alpha 1 (importin alpha 5)

KIAA1274 1 1 1 3 KIAA1274

JARID2 1 1 1 3 jumonji, AT rich interactive domain 2

LRRC59 1 1 1 3 leucine rich repeat containing 59

ZIC3 1 1 1 3 Zic family member 3 (odd-paired homolog, Drosophila)

KPNA4 1 1 1 3 karyopherin alpha 4 (importin alpha 3)

SLA 1 1 1 3 Src-like-adaptor

SKV 1 1 1 3 v-ski sarcoma viral oncogene homolog (avian)

RNF123 1 1 1 3 ring finger protein 123

ZNF652 1 1 1 3 zinc finger protein 652

Nova1 1 1 1 3 neuro-oncological ventral antigen 1

FGF7 1 1 1 3 hypothetical LOC100132771; fibroblast growth factor 7

CEBPB 1 1 1 3 CCAAT/enhancer binding protein (C/EBP), beta

Myo1d 1 1 1 3 myosin ID

ZNF703 1 1 1 3 zinc finger protein 703

kcip-1 1 1 1 3 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein

p53DINP1 1 1 1 3 tumor protein p53 inducible nuclear protein 1

LRP1B 1 1 1 3 low density lipoprotein-related protein 1B (deleted in tumors)

C1QL2 1 1 1 3 complement component 1, q subcomponent-like 2

ARL5B 1 1 1 3 ADP-ribosylation factor-like 5B

AICDA 1 1 1 3 activation-induced cytidine deaminase

ADD3 1 1 1 3 adducin 3 (gamma)

BOC 1 1 1 3 Boc homolog (mouse)

BCAT1 1 1 1 3 branched chain aminotransferase 1, cytosolic

ASTN2 1 1 1 3 astrotactin 2

c-myb 1 1 1 3 v-myb myeloblastosis viral oncogene homolog (avian)

ZNF198 1 1 1 3 zinc finger, MYM-type 2

CSNK1G2 1 1 1 3 casein kinase 1, gamma 2

TLE4 1 1 1 3 transducin-like enhancer of split 4 (E(sp1) homolog, Drosophila)

EHD1 1 1 1 3 EH-domain containing 1

Olfml3 1 1 1 3 olfactomedin-like 3

PSKH1 1 1 1 3 protein serine kinase H1

USP48 1 1 1 3 ubiquitin specific peptidase 48


SOX1 1 1 1 3 SRY (sex determining region Y)-box 1

TOMM20 1 1 1 3 similar to translocase of outer mitochondrial membrane

WIT1 1 1 1 3 Wilms tumor upstream neighbor 1

UPP2 1 1 1 3 uridine phosphorylase 2

SOCS1 1 1 1 3 suppressor of cytokine signaling 1

SPI1 1 1 1 3 spleen focus forming virus (SFFV)

TSGA14 1 1 1 3 testis specific, 14

SDCBP 1 1 1 3 syndecan binding protein (syntenin)

WWC1 1 1 1 3 WW and C2 domain containing 1

TRIM2 1 1 1 3 tripartite motif-containing 2

SUFU 1 1 1 3 suppressor of fused homolog (Drosophila)

SMARCA4 1 1 2 SWI/SNF related, matrix associated

AKAP10 1 1 2 A kinase (PRKA) anchor protein 10

Itk 1 1 2 IL2-inducible T-cell kinase

VAV3 1 1 2 vav 3 guanine nucleotide exchange factor

SMNDC1 1 1 2 survival motor neuron domain containing 1

RCN2 1 1 2 reticulocalbin 2, EF-hand calcium binding domain

RREB1 1 1 2 ras responsive element binding protein 1

SPIN3 1 1 2 spindlin family, member 3

JMJD1A 1 1 2 lysine (K)-specific demethylase 3A

CSNK1A1 1 1 2 casein kinase 1, alpha 1

SOX10 1 1 2 SRY (sex determining region Y)-box 10

SOS1 1 1 2 son of sevenless homolog 1 (Drosophila)

SMAD1 1 1 2 SMAD family member 1

ATP2B1 1 1 2 ATPase, Ca++ transporting, plasma membrane 1

ANTXR2 1 1 2 anthrax toxin receptor 2

SMAD2 1 1 2 SMAD family member 2

SLC39A10 1 1 2 solute carrier family 39 (zinc transporter), member 10

BCL11A 1 1 2 B-cell CLL/lymphoma 11A (zinc finger protein)

ZNF642 1 1 2 zinc finger protein 642

BAG5 1 1 2 BCL2-associated athanogene 5

TSPAN14 1 1 2 tetraspanin 14


BRD1 1 1 2 bromodomain containing 1

SOCS6 1 1 2 suppressor of cytokine signaling 6

TRPS1 1 1 2 trichorhinophalangeal syndrome I

ABHD2 1 1 2 abhydrolase domain containing 2

ACTA1 1 1 2 actin, alpha 1, skeletal muscle


RRP15 1 1 2 ribosomal RNA processing 15 homolog (S. cerevisiae)

MLCK 1 1 2 myosin light chain kinase

KBTBD2 1 1 2 kelch repeat and BTB (POZ) domain containing 2

FLJ90013 1 1 2 transmembrane anterior posterior transformation 1

BSN2 1 1 2 basonuclin 2

SP3 1 1 2 Sp3 transcription factor

PSIP1 1 1 2 PC4 and SFRS1 interacting protein 1

WDFY3 1 1 2 WD repeat and FYVE domain containing 3

INPP5D 1 1 2 inositol polyphosphate-5-phosphatase, 145kDa

TYRP1 1 1 2 tyrosinase-related protein 1

ARID2 1 1 2 AT rich interactive domain 2 (ARID, RFX-like)

ZFYVE14 1 1 2 ankyrin repeat and FYVE domain containing 1

PELI1 1 1 2 pellino homolog 1 (Drosophila)

WDR45 1 1 2 WD repeat domain 45


LCORL 1 1 2 ligand dependent nuclear receptor corepressor-like

SP1 1 1 2 Sp1 transcription factor

NR2F2 1 1 2 nuclear receptor subfamily 2, group F, member 2

Mon1a 1 1 2 MON1 homolog A (yeast)

RAB1A 1 1 2 RAB1A, member RAS oncogene family

cab39 1 1 2 calcium binding protein 39

TMEM178 1 1 2 transmembrane protein 178

TFDP2 1 1 2 transcription factor Dp-2 (E2F dimerization partner 2)

SSH2 1 1 2 slingshot homolog 2 (Drosophila)

NDFIP1 1 1 2 Nedd4 family interacting protein 1

EHF 1 1 2 ets homologous factor

STRN3 1 1 2 striatin, calmodulin binding protein 3

DNCI1 1 1 2 dynein, cytoplasmic 1, intermediate chain 1

AHCYL2 1 1 2 adenosylhomocysteinase-like 2

TRIM32 1 1 2 tripartite motif-containing 32

H3.3B 1 1 2 H3 histone, family 3B (H3.3B);

MEIS1 1 1 2 Meis homeobox 1

KCNN3 1 1 2 potassium intermediate/small conductance channel

LOC389458 1 1 2 hypothetical LOC389458; RB-associated KRAB zinc finger

RBMS3 1 1 2 RNA binding motif, single stranded interacting protein

HIF1A 1 1 2 hypoxia inducible factor 1, alpha subunit

RNF146 1 1 2 ring finger protein 146

IRF2BP2 1 1 2 interferon regulatory factor 2 binding protein 2


HIVEP2 1 1 2 human immunodeficiency virus type I enhancer binding protein 2

HNRPA3 1 1 2 heterogeneous nuclear ribonucleoprotein A3

KCNA1 1 1 2 potassium voltage-gated channel

KIAA1267 1 1 2 KIAA1267

RICTOR 1 1 2 RPTOR independent companion of MTOR,

JHDM1D 1 1 2 jumonji C domain containing histone demethylase 1

SGCB 1 1 2 sarcoglycan, beta

GPR85 1 1 2 G protein-coupled receptor 85

GTF2A1L 1 1 2 stonin 1

GDF6 1 1 2 growth differentiation factor 6

GOLPH3L 1 1 2 golgi phosphoprotein 3-like

RSPO2 1 1 2 R-spondin 2 homolog (Xenopus laevis)

HERC4 1 1 2 hect domain and RLD 4

H3F3B 1 1 2 H3 histone, family 3B (H3.3B)

HBP1 1 1 2 HMG-box transcription factor 1

MECP2 1 1 2 methyl CpG binding protein 2 (Rett syndrome)

PKN2 1 1 2 protein kinase N2

NAV3 1 1 2 neuron navigator 3; similar to neuron navigator 3

PLEKHK1 1 1 2 rhotekin 2

PLAG1 1 1 2 pleiomorphic adenoma gene 1

PEA15 1 1 2 phosphoprotein enriched in astrocytes 15

PHF17 1 1 2 PHD finger protein 17

PKIA 1 1 2 protein kinase (cAMP-dependent, catalytic) inhibitor alpha

PICALM 1 1 2 phosphatidylinositol binding clathrin assembly protein

REPS2 1 1 2 RALBP1 associated Eps domain containing 2

RC3H2 1 1 2 ring finger and CCCH-type zinc finger domains 2

RBM47 1 1 2 RNA binding motif protein 47

KIAA1715 1 1 2 KIAA1715

RCOR1 1 1 2 REST corepressor 1

RAB34 1 1 2 RAB34, member RAS oncogene family

ZBTB41 1 1 2 zinc finger and BTB domain containing 41

LOC646270 1 1 2 elongation factor, RNA polymerase II, 2

LOC646438 1 1 2 H3 histone, family 3B (H3.3B);

CDC73 1 1 2 cell division cycle 73

CHD7 1 1 2 chromodomain helicase DNA binding protein 7

SF3B1 1 1 2 splicing factor 3b, subunit 1, 155kDa

CBL 1 1 2 Cas-Br-M (murine) ecotropic retroviral seq

COL21A1 1 1 2 collagen, type XXI, alpha 1


COL7A1 1 1 2 collagen, type VII, alpha 1

CKAP5 1 1 2 cytoskeleton associated protein 5

CNTN4 1 1 2 contactin 4

BCORL1 1 1 2 BCL6 co-repressor-like 1

C10orf26 1 1 2 chromosome 10 open reading frame 26


SLC12A6 1 1 2 solute carrier family 12




SIM2 1 1 2 single-minded homolog 2 (Drosophila)

SHOX 1 1 2 short stature homeobox

FAM134C 1 1 2 family with sequence similarity 134, member C

FBXO33 1 1 2 F-box protein 33

FLJ37543 1 1 2 hypothetical protein FLJ37543

FAM135A 1 1 2 family with sequence similarity 135, member A

S100PBP 1 1 2 S100P binding protein

GABRA1 1 1 2 gamma-aminobutyric acid (GABA) A receptor, alpha 1

GCN5L2 1 1 2 K(lysine) acetyltransferase 2A

FOS 1 1 2 v-fos FBJ murine osteosarcoma viral oncogene homolog

FZD5 1 1 2 frizzled homolog 5 (Drosophila)

COPS3 1 1 2 COP9 constitutive photomorphogenic homolog subunit 3

DNAJB1 1 1 2 DnaJ (Hsp40) homolog, subfamily B, member 1

SCG2 1 1 2 secretogranin II (chromogranin C)

SEC14L5 1 1 2 SEC14-like 5 (S. cerevisiae)

DET1 1 1 2 de-etiolated homolog 1 (Arabidopsis)

SATB1 1 1 2 SATB homeobox 1

SALL1 1 1 2 sal-like 1 (Drosophila)

E2F2 1 1 2 E2F transcription factor 2

EDG1 1 1 2 sphingosine-1-phosphate receptor 1


Appendix 2:

ID Gene Name

HMGCS1 3-hydroxy-3-methylglutaryl-Coenzyme A synthase 1 (soluble)

AAK1 AP2 associated kinase 1

ATP6V1C1 ATPase, H+ transporting, lysosomal 42kDa, V1 subunit C1

CAP2 CAP, adenylate cyclase-associated protein, 2 (yeast)

CNOT1 CCR4-NOT transcription complex, subunit 1

CDC42BPA CDC42 binding protein kinase alpha (DMPK-like)

COMMD2 COMM domain containing 2

F11R F11 receptor

H3F3A H3 histone, family 3B (H3.3B); H3 histone, family 3A pseudogene; H3 histone, family 3A; similar to H3 histone, family 3B; similar to histone H3.3B

HBS1L HBS1-like (S. cerevisiae)

KIAA1671 KIAA1671 protein

LASS6 LAG1 homolog, ceramide synthase 6

LRBA LPS-responsive vesicle trafficking, beach and anchor containing

MLF1IP MLF1 interacting protein

PERP PERP, TP53 apoptosis effector

ARHGAP5 Rho GTPase activating protein 5



SMEK2 SMEK homolog 2, suppressor of mek1 (Dictyostelium)

ST6GALNAC2 ST6 (alpha-N-acetyl-neuraminyl-2,3-beta-galactosyl-1,3)-N-acetylgalactosaminide alpha-2,6-sialyltransferase 2

TAF9B TAF9B RNA polymerase II, TATA box binding protein (TBP)-associated factor, 31kDa

WEE1 WEE1 homolog (S. pombe)

XIAP X-linked inhibitor of apoptosis

ACOX1 acyl-Coenzyme A oxidase 1, palmitoyl

ANLN anillin, actin binding protein

ANXA2P2 annexin A2 pseudogene 2

ANXA2P1, ANXA2 annexin A2 pseudogene 3; annexin A2; annexin A2 pseudogene 1

ATL3 atlastin GTPase 3

CREBL2 cAMP responsive element binding protein-like 2

CDH1 cadherin 1, type 1, E-cadherin (epithelial)

CHP calcium binding protein P22

CREG1 cellular repressor of E1A-stimulated genes 1

CENPF centromere protein F, 350/400ka (mitosin)

CBX5 chromobox homolog 5 (HP1 alpha homolog, Drosophila)

C18orf10 chromosome 18 open reading frame 10

CSDE1 cold shock domain containing E1, RNA-binding

COL12A1 collagen, type XII, alpha 1

CBFB core-binding factor, beta subunit

DLGAP5 discs, large (Drosophila) homolog-associated protein 5

ENAH enabled homolog (Drosophila)

ERMP1 endoplasmic reticulum metallopeptidase 1

EGFR epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)


ANKRD36B similar to KIAA1641; similar to ankyrin repeat domain 26; ankyrin repeat domain 36B

SNRNP200 similar to U5 snRNP-specific protein, 200 kDa; small nuclear ribonucleoprotein 200kDa (U5)

HMGB3 similar to high mobility group box 3; high-mobility group box 3

PRKDC similar to protein kinase, DNA-activated, catalytic polypeptide; protein kinase, DNA-activated, catalytic polypeptide

TOMM20 similar to translocase of outer mitochondrial membrane 20 homolog; similar to mitochondrial outer membrane protein 19; translocase of outer mitochondrial membrane 20 homolog (yeast)

SLC35B4 solute carrier family 35, member B4

SLC9A6 solute carrier family 9 (sodium/hydrogen exchanger), member 6

FAM173B family with sequence similarity 173, member B

SKA2 family with sequence similarity 33, member A; similar to Spindle and kinetochore-associated protein 2

GLTP glycolipid transfer protein; glycolipid transfer protein pseudogene 1

GPC1 glypican 1

GNAQ guanine nucleotide binding protein (G protein), q polypeptide

HSPB1 heat shock 27kDa protein-like 2 pseudogene; heat shock 27kDa protein 1

HELZ helicase with zinc finger

HP1BP3 heterochromatin protein 1, binding protein 3

HNRNPA3 heterogeneous nuclear ribonucleoprotein A3

HIST1H1B histone cluster 1, H1b

HIP1 huntingtin interacting protein 1

ID1 inhibitor of DNA binding 1, dominant negative helix-loop-helix protein

ID3 inhibitor of DNA binding 3, dominant negative helix-loop-helix protein

ITGAV integrin, alpha V (vitronectin receptor, alpha polypeptide, antigen CD51)

IL13RA1 interleukin 13 receptor, alpha 1

KPNA6 karyopherin alpha 6 (importin alpha 7)

KRT17 keratin 17; keratin 17 pseudogene 3

KTN1 kinectin 1 (kinesin receptor)

KREMEN1 kringle containing transmembrane protein 1

LNX2 ligand of numb-protein X 2

MANEA mannosidase, endo-alpha

MID1 midline 1 (Opitz/BBB syndrome)

MSN moesin

MYH9 myosin, heavy chain 9, non-muscle

MARCKS myristoylated alanine-rich protein kinase C substrate

NCAPD2 non-SMC condensin I complex, subunit D2

PAK2 p21 protein (Cdc42/Rac)-activated kinase 2

PPL periplakin

PICALM phosphatidylinositol binding clathrin assembly protein

PKP1 plakophilin 1 (ectodermal dysplasia/skin fragility syndrome); similar to plakophilin 1 isoform 1a

PABPC1 poly(A) binding protein, cytoplasmic pseudogene 5; poly(A) binding protein, cytoplasmic 1

PMEPA1 prostate transmembrane protein, androgen induced 1

PCMTD2 protein-L-isoaspartate (D-aspartate) O-methyltransferase domain containing 2

RIPK4 receptor-interacting serine-threonine kinase 4

RPL5 ribosomal protein L5 pseudogene 34; ribosomal protein L5 pseudogene 1; ribosomal protein L5

RPLP0 ribosomal protein, large, P0 pseudogene 2; ribosomal protein, large, P0 pseudogene 3; ribosomal protein, large, P0 pseudogene 6; ribosomal protein, large, P0

SEMA3C sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C


SYNE2 spectrin repeat containing, nuclear envelope 2

SGPL1 sphingosine-1-phosphate lyase 1

SKAP2 src kinase associated phosphoprotein 2

STON2 stonin 2

SMC4 structural maintenance of chromosomes 4

SNAP23 synaptosomal-associated protein, 23kDa

TNKS tankyrase, TRF1-interacting ankyrin-related ADP-ribose polymerase

TLL1 tolloid-like 1

TOP2A topoisomerase (DNA) II alpha 170kDa

TOP2B topoisomerase (DNA) II beta 180kDa

TOB2 transducer of ERBB2, 2

TBL1XR1 transducin (beta)-like 1 X-linked receptor 1

TM7SF3 transmembrane 7 superfamily member 3

TMEM14A transmembrane protein 14A

TMEM56 transmembrane protein 56

TWF1 twinfilin, actin-binding protein, homolog 1 (Drosophila)

UBE4A ubiquitination factor E4A (UFD2 homolog, yeast)

ZFAT zinc finger and AT hook domain containing

ZBTB41 zinc finger and BTB domain containing 41


Appendix 3:

ID Gene Name

HMGCS1 3-hydroxy-3-methylglutaryl-Coenzyme A synthase 1 (soluble)

OXCT1 3-oxoacid CoA transferase 1

AHNAK AHNAK nucleoprotein

AHNAK2 AHNAK nucleoprotein 2

ATP6V1D ATPase, H+ transporting, lysosomal 34kDa, V1 subunit D

AGAP1 ArfGAP with GTPase domain, ankyrin repeat and PH domain 1

DIP2A DIP2 disco-interacting protein 2 homolog A (Drosophila)

ELAVL2 ELAV (embryonic lethal, abnormal vision, Drosophila)-like 2 (Hu antigen B)

F11R F11 receptor

GPSM2 G-protein signaling modulator 2 (AGS3-like, C. elegans)

IQGAP1 IQ motif containing GTPase activating protein 1

KIAA1671 KIAA1671 protein

KLF7 Kruppel-like factor 7 (ubiquitous)

LFNG LFNG O-fucosylpeptide 3-beta-N-acetylglucosaminyltransferase

MLF1IP MLF1 interacting protein

NDRG1 N-myc downstream regulated 1

WEE1 WEE1 homolog (S. pombe)

XIAP X-linked inhibitor of apoptosis

ANXA2P1, ANXA2 annexin A2 pseudogene 3; annexin A2; annexin A2 pseudogene 1

CDH1 cadherin 1, type 1, E-cadherin (epithelial)

C18orf10 chromosome 18 open reading frame 10

PSAT1 chromosome 8 open reading frame 62; phosphoserine aminotransferase 1

CIT citron (rho-interacting, serine/threonine kinase 21)

CLOCK clock homolog (mouse)

COL12A1 collagen, type XII, alpha 1

DLG1 discs, large homolog 1 (Drosophila)

DPY19L1 dpy-19-like 1 (C. elegans); similar to hCG1645499

ENAH enabled homolog (Drosophila)

EGFR epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)

FAM173B family with sequence similarity 173, member B

GGCX gamma-glutamyl carboxylase

GLTP glycolipid transfer protein; glycolipid transfer protein pseudogene 1

GTDC1 glycosyltransferase-like domain containing 1

GNG12 guanine nucleotide binding protein (G protein), gamma 12

HDLBP high density lipoprotein binding protein


HIP1 huntingtin interacting protein 1

KRT17 keratin 17; keratin 17 pseudogene 3

MAN1A2 mannosidase, alpha, class 1A, member 2

MBOAT2 membrane bound O-acyltransferase domain containing 2

MMGT1 membrane magnesium transporter 1

MSN moesin

MYO5A myosin VA (heavy chain 12, myoxin)

MYH9 myosin, heavy chain 9, non-muscle

PAK2 p21 protein (Cdc42/Rac)-activated kinase 2

PALLD palladin, cytoskeletal associated protein

PPL periplakin

PLD1 phospholipase D1, phosphatidylcholine-specific

PKP1 plakophilin 1 (ectodermal dysplasia/skin fragility syndrome); similar to plakophilin 1 isoform 1a

PABPC1 poly(A) binding protein, cytoplasmic pseudogene 5; poly(A) binding protein, cytoplasmic 1

PRKCA protein kinase C, alpha

PTPN11 protein tyrosine phosphatase, non-receptor type 11; similar to protein tyrosine phosphatase, non-receptor type 11

PTPRK protein tyrosine phosphatase, receptor type, K

PHTF2 putative homeodomain transcription factor 2

RIPK4 receptor-interacting serine-threonine kinase 4

KDM5B similar to Jumonji, AT rich interactive domain 1B (RBP2-like); lysine (K)-specific demethylase 5B

PRKDC similar to protein kinase, DNA-activated, catalytic polypeptide; protein kinase, DNA-activated, catalytic polypeptide

SLC39A9 solute carrier family 39 (zinc transporter), member 9

SPTBN1 spectrin, beta, non-erythrocytic 1

SGPL1 sphingosine-1-phosphate lyase 1

SREBF2 sterol regulatory element binding transcription factor 2

SVIL supervillin

TBL1XR1 transducin (beta)-like 1 X-linked receptor 1

TGFBI transforming growth factor, beta-induced, 68kDa

TNRC6B trinucleotide repeat containing 6B

TUFT1 tuftelin 1

TWF1 twinfilin, actin-binding protein, homolog 1 (Drosophila)

VPS13B vacuolar protein sorting 13 homolog B (yeast)

ZNF185 zinc finger protein 185 (LIM domain)

TRITA-CSC-E 2010:164 ISRN-KTH/CSC/E--10/164-SE

ISSN-1653-5715

www.kth.se
