IDENTIFICATION OF miRNAs AND THEIR TARGET GENES IN STEM CELL DERIVED CARDIOMYOCYTES



Stem cell research, especially research on human embryonic stem cells, is a major topic today. In the last few years, studies of human embryonic stem cell derived cardiomyocytes have highlighted their importance, as their characteristics (e.g. their contraction) are almost identical to those of the cardiomyocytes in the heart. These studies concentrate on the use of cardiomyocytes in drug development for cardiac diseases, in regenerative medicine and in cell replacement therapies. Other researchers, in contrast, concentrate on microRNAs (miRNAs) as regulators of cardiomyocyte development. This study combines both research topics, as it deals with stem cells and miRNAs (as well as their target mRNAs). A main objective is to find differentially expressed genes using Significance Analysis of Microarrays (SAM). Furthermore, miRNA target prediction is applied and the identified targets are compared with the differentially expressed mRNAs found by SAM. With an intersection approach we derived 41 up-regulated targets of down-regulated miRNAs and 25 down-regulated targets of up-regulated miRNAs, which can be the basis for further studies (e.g. knock-out experiments).

Introduction

Stem Cells

Although the first stem cells were found in the 1980s (in human cord blood), it took researchers some time to reach further milestones (creating stem cell lines from different organisms, cloning the cells and isolating them from early embryos). Recently the interest, especially in human stem cell research, has grown rapidly due to the promising capabilities of these cells for various applications.

Mammalian stem cells can be divided into two types: embryonic stem cells (ESCs) and somatic or adult stem cells. For this master thesis human ESCs (hESCs) are of particular interest. Human ESCs are isolated from the inner cell mass of a blastocyst and then plated on a mouse embryonic fibroblast (MEF) feeder layer, where the cells form colonies [1].

James A. Thomson and his co-workers isolated the first hESCs in 1998 at the University of Wisconsin [2]. According to the University of California San Francisco (UCSF), 100 to 200 hESC lines are currently available worldwide, but only a few of them (those of higher quality) are used for research.

Two capabilities make embryonic stem cells particularly interesting for research: self-renewal and pluripotency. Self-renewal means that the cells can divide and form new stem cells, and pluripotency means that the cells can differentiate into every specialized cell type in the organism (from all three germ layers).

Because stem cells can become nearly any cell type, they are usable for basic research, drug discovery and stem cell therapies. In drug development, new drugs can be tested on human stem cells before they are tested in animal models and/or in clinical trials on humans. In the future, these cells also have great potential for curing diseases by cell transplantation.

Cardiomyocytes

Cardiomyocytes (CMs) are muscle cells in the heart or, more precisely, in the myocardium (one of the three layers that form the wall of the heart). The contraction of these cells drives the blood flow in the body (from the chambers into the blood vessels of the circulatory system). The heart is one of the least regenerative organs in the body, which means that a major injury can lead to the loss of many functional cardiomyocytes [1]. It was therefore an important milestone when Kehat et al. [3] published their study describing contracting cells with cardiomyocyte-like properties that were derived from hESCs. From then on, scientists realised the great potential of these cells: an unlimited source of human CMs.

miRNA

miRNAs are short (19-25 nucleotides), non-coding, single-stranded RNAs. They regulate the translation of mRNA into protein by binding to the 3’ untranslated regions (UTRs) of the mRNAs. The biogenesis of a miRNA starts in the nucleus, where the miRNA genes are transcribed into primary transcripts (pri-miRNAs) by RNA polymerase II. Pri-miRNAs are then processed by the nuclear RNase III Drosha to form short stem-loop structures, the precursor miRNAs (pre-miRNAs). The pre-miRNAs are transported from the nucleus to the cytoplasm, where the Dicer enzyme (another RNase III) generates miRNA duplexes. Generally only one strand of a duplex is functional; this strand is the mature miRNA, which is incorporated into the RNA-induced silencing complex (RISC). The other strand (known as miRNA*) is usually degraded [4]. One miRNA can have many mRNA targets and one mRNA can be a target of many miRNAs, which makes target prediction extremely complicated. Nevertheless, studies of miRNAs are important, especially for cardiac research, since miRNAs are proposed as regulators of the growth, development, function and stress responsiveness of the heart [5].


Bioinformatic Methods for Identification of Differentially Expressed Transcripts

Various methods can be used to identify differentially expressed genes (DEGs). P-values, Significance Analysis of Microarrays (SAM) and fold change are examples of methods that have commonly been used for this purpose. A p-value is associated with a statistical test (e.g. a t-test) and represents the probability of obtaining a result at least as extreme as the observed one by random sampling alone, i.e. if the null hypothesis of no difference is true. Before testing, a threshold for the p-value is defined. If the p-value for a gene is below this threshold, the null hypothesis is rejected and the gene is considered statistically significantly differentially expressed between the compared samples. If the p-value is above the threshold, the gene is not considered significantly different in expression between the compared samples.

If many tests are performed on the same data, the number of false positives (FPs) increases, because the expected number of FPs is proportional to the number of tests. This is commonly referred to as the multiple testing problem and is highly relevant for microarray data, where thousands of genes are tested. The most common correction method is the Bonferroni correction, in which the p-value threshold is divided by the number of performed tests (equivalently, each p-value is multiplied by the number of tests). However, it has generally been shown to be too strict a criterion to be useful. An alternative method is SAM, which corrects for multiple testing by using permutation tests to estimate the false discovery rate (FDR), the "percentage of genes identified by chance" [6].

Several researchers [7, 8, 9] have analyzed different methods (statistical and fold-change based) for selecting DEGs, and the conclusion from their studies is that SAM and fold-change (FC) ranking should be preferred over p-value ranking. The authors furthermore suggest that FC is not as crude as many researchers think. For example, Kadota et al. showed that, in terms of the percentages of overlapping genes (POGs), FC-based methods are overall more reliable than methods that use the t-statistic [7]. The MicroArray Quality Control (MAQC) study also concluded that results are more reproducible when FC-based methods are used to identify DEGs than when other methods are applied [9]. Shi et al. showed that, with respect to cross-platform concordance, fold-change ranking with prior data filtering results in more DEGs than p-value ranking or SAM [8], whereas SAM is suggested as the best method when compared with pure fold-change ranking (without prior data filtering) and with p-value ranking.

The identified differentially expressed miRNAs can then be used to predict their targets with a miRNA target prediction tool. Such tools are based on specific rules of target recognition, in which the 5’ region of the miRNA is much more important than the 3’ end. Lewis et al. [10] described that a perfect match of the seed (nucleotides 2-8 at the 5’ end of the miRNA, complementary to a site in the 3’ UTR of the target) matters more than pairing at the 3’ end, because the seed contains the core elements of a miRNA and miRNAs are better conserved at the 5’ than at the 3’ end [10]. If the seed at the 5’ end of a miRNA does not match perfectly, extensive complementarity at the 3’ end can compensate for that [11, 12].
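The seed rule described above can be illustrated with a short Python sketch. It is not the algorithm of any particular prediction tool: the function names, the default seed window (nucleotides 2-8) and the toy sequences are assumptions made for illustration, and only perfect Watson-Crick pairing is checked.

```python
def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna.upper()))


def seed_sites(mirna, utr, start=2, end=8):
    """Return 0-based positions in the 3' UTR that pair perfectly with the seed.

    The seed is taken as nucleotides start..end (1-based, inclusive) counted
    from the 5' end of the miRNA. Only perfect Watson-Crick matches are found.
    """
    seed = mirna.upper()[start - 1:end]
    site = reverse_complement(seed)  # sequence the UTR must contain, 5'->3'
    utr = utr.upper()
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Toy sequences (illustrative only, not taken from the data set of this study).
toy_mirna = "UGGAAUGUAAAGAAGUAUGUAU"
toy_utr = "GCUAACAUUCCAGGA"
print(seed_sites(toy_mirna, toy_utr))
```

A real prediction tool would additionally score site conservation and compensatory 3’ pairing, which this sketch deliberately leaves out.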

Related work

In 2003 Mummery et al. published their study on the cardiomyocyte differentiation of human embryonic stem cells [13]. It was the first study in which hESCs were induced to differentiate into CMs. Mummery et al. started from visceral-endoderm (VE)-like cell lines, which had caused mouse embryonic stem cells to aggregate with mouse embryonal carcinoma cells and to differentiate into contracting muscle cells. The mouse endoderm-like cell line (END-2) was then used in co-culture with hESCs to induce differentiation into two cell types. Mummery et al. showed that one of these cell types is cardiomyocytes. Furthermore, Mummery and her co-workers investigated the electrophysiology of the hESC-derived CMs and of human fetal CMs and showed that hESC-derived CMs have different action potentials. Non-beating cells that had adopted the morphology of the surrounding beating cells could be induced to contract rhythmically by repeated action potentials [13].

Many researchers work on miRNAs: Gangaraju and Lin [14] concentrated on miRNAs in stem cells in general, whereas van Rooij and Olson [5] looked at miRNAs that play a role in heart disease. van Rooij and Olson [5] focused on cardiac hypertrophy and the miRNAs that function as key regulators. They showed that cardiac hypertrophy is accompanied by the expression of genes that are active in fetal cardiac development, are responsible for hypertrophic growth and are normally replaced postnatally by adult cardiac genes. Both van Rooij and Olson [5] and Gangaraju and Lin demonstrated that the miRNAs miR-1 and miR-133 are involved in myogenesis and cardiogenesis.

Problem Description and Motivation

As already discussed in the introduction, congestive heart failure is a major concern today. Notably, Habib et al. [1] stated that it is the most common cause of hospitalization for people over 65 in the US. Doss et al. [15] described that "hESC derived cardiomyocytes theoretically fulfil most, if not all, of the properties of an ideal donor", which makes these cells and their study particularly important. With the knowledge gained about efficient differentiation of hESCs towards CMs, patients with cardiac diseases can hopefully undergo ESC-derived cell replacement therapies in the future. More research is needed before this becomes a reality, and finding genes that are related to cardiac development, and hence may be responsible for certain cardiac diseases, will be one big step forward within the field of cardiac research.

In this thesis miRNA and mRNA expression data have been analyzed with respect to the question "Can we identify a correlation between miRNA data and mRNA data on a global scale?". To come closer to the answer, miRNAs and mRNAs that are significantly up- or down-regulated in the stem cell derived CMs compared to the adult heart (AH) and fetal heart (FH) were identified. As figure 1 shows, the identified miRNAs were the starting point for target prediction. For example, the targets predicted for the up-regulated miRNAs are mRNAs that are expected to be down-regulated. An intersection of the previously identified down-regulated mRNAs with the target prediction result therefore shows, in this context, a "correlation" between miRNA and mRNA data.

While stem cell research raises hope for regenerative medicine, miRNA studies give insight into the biological processes of regulatory networks. Taken together, the two research fields can reveal more information, such as which miRNAs affect embryonic stem cell development. Furthermore, finding miRNA targets in this context can open up new possible strategies for cardiology. According to a search in NCBI PubMed [16], there are currently many more studies about stem cells, cardiomyocytes and mRNAs (82) than about stem cells, cardiomyocytes and miRNAs (4). This study can therefore show whether investigating miRNAs can be an additional way to direct stem cells towards cardiomyocyte differentiation. We assumed that many genes have a lower expression in stem cell derived CM samples than in the control tissues (AH and FH samples) because they are repressed by miRNAs, which are globally more highly expressed in the CM samples. In comparison to the AH and FH samples, the CM samples are immature; identifying the miRNAs that differ between those samples might therefore suggest that these miRNAs are involved in the developmental processes of cardiomyocytes. The results are of great interest for the research group at Cellartis AB, because they may be able to use these miRNAs as candidates for future knock-out studies to validate their functions in vitro.


Figure 1: Flowchart of how the given miRNA and mRNA data are processed in this study. For simplicity, the complete workflow is shown only for the up-regulated miRNAs and down-regulated mRNAs. An intersection was also made between the targets of down-regulated miRNAs and the up-regulated mRNAs.

Materials and Methods

Data Sources

The miRNA and mRNA expression data sets for this thesis were kindly provided by Cellartis AB, Göteborg, Sweden. The expression profiles were generated by hybridizing samples of the human embryonic stem cell line SA002 to Affymetrix microarrays. To be able to control for biological variation in the data, three repeated experiments were performed. The samples cover three time points: undifferentiated (UD) cells and cardiomyocyte clusters 3 weeks and 7 weeks after onset of differentiation (CM3w, CM7w). Samples from fetal heart (FH) and adult heart (AH) are used as controls (see figure 2). Earlier experiments have shown that cardiomyocytes start to beat after around 4-6 days and that after 3 weeks they already contract synchronously and can be regarded as mature. By keeping the cells in culture for another 4 weeks, the researchers wanted to see whether they become more mature. As there was no large difference between 3 and 7 weeks, the cardiomyocytes after 3 and 7 weeks are treated as one group in this study.

Figure 2: Structure of the datasets.

Figure 3: Different combinations of the cardiomyocyte (CM), fetal heart (FH) and adult heart (AH) samples. CM samples after 3 and 7 weeks, treated as one group, were compared to AH and FH separately. FH and AH were also taken as one group and compared to the CM samples.


Methods

Software

All statistical tests and analyses were implemented and applied with the statistical software package R [17]. For the analysis of genomic data in particular there is a complementary open source project based on R, named Bioconductor [18]; it was used in this study for its "siggenes" package [19] and the SAM [6] function that this package provides.

To answer the question "Can we identify a correlation between miRNA data and mRNA data on a global scale?", the first step was to use SAM (with an FDR of 3%) and FC (with different thresholds) to identify the differentially expressed mRNAs and miRNAs separately. Different combinations of the samples (see figure 3) were used in this analysis. First the CM samples from three and seven weeks were combined into one group and compared to the AH and FH samples separately. To gain more statistical power, the control tissues (AH and FH) were also combined into one group, which was then compared to the combined group of CM samples.

The differentially expressed miRNAs and mRNAs were the starting point for the prediction of target genes using miRecords [20], a resource for miRNA-target interactions. The next step was to investigate the overlap between the predicted target genes and the previously identified differentially expressed mRNAs by calculating what percentage of the predicted target genes were among the up- and down-regulated mRNAs. Finally, Gene Ontology was used to investigate in which cellular components the target genes of the identified miRNAs occur and in which biological processes and molecular functions they are involved, in order to show that these specific miRNAs are important for cardiac development.

Results

Identification of differentially expressed mRNAs and miRNAs by SAM

In the first step SAM was applied to the mRNA as well as the miRNA data to identify differentially expressed transcripts. The SAM plot in figure 4 illustrates differentially expressed genes in the mRNA expression data. The more a gene deviates from the "observed = expected" line, the more likely it is to be significant. The high number of significant genes (up- and down-regulated) suggests that three replicates are enough to test this data set with an FDR of 3%.


Figure 4: SAM plot showing differentially expressed genes. Up-regulated genes in green and down-regulated genes in red. The black line (where observed = expected) represents the set cutoffs with a chosen delta value.

Tables 1 and 2 show the number of up- and down-regulated mRNAs and miRNAs identified with the different sample combinations. An intersection was made to select the transcripts that were identified with all sample combinations. The statistical power gained clearly varies substantially between the combinations, and as we did not want to risk losing information, the first combination (cardiomyocyte samples as one group and the adult and fetal heart samples as the other group) was used for further analysis. The fact that the number of differentially expressed mRNAs for the combination CM vs AH + FH is more than double the number obtained with adult and fetal heart separately (CM vs AH and CM vs FH) is caused by the number of replicates and how SAM handles them. For example, in the combination CM vs AH there are only three replicates of adult heart; even a little variation in a specific mRNA then leads to a high standard deviation, and SAM excludes that mRNA from the result. With adult and fetal heart combined there are more replicates, the standard deviation changes little, and SAM no longer excludes that mRNA. This effect does not occur for the miRNAs: adding more replicates with the same standard deviation gives more statistical power, but as the results hardly differ between the combinations, there is no need for much more statistical power.
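The influence of the number of replicates can be made concrete with the relative difference that SAM computes per gene, d = (mean_a - mean_b) / (s + s0), where s is the pooled standard error and s0 a small positive constant. The following Python sketch only computes this statistic; it is not the siggenes implementation, it omits the permutation step SAM uses to estimate the FDR, and the function name and expression values are invented for illustration.

```python
from statistics import mean


def sam_d(group_a, group_b, s0=0.0):
    """Relative difference d = (mean_a - mean_b) / (s + s0) for one gene.

    s is the pooled standard error of a two-sample comparison; s0 is the small
    positive constant SAM adds so that genes with tiny variance do not dominate.
    """
    n_a, n_b = len(group_a), len(group_b)
    m_a, m_b = mean(group_a), mean(group_b)
    ss = sum((x - m_a) ** 2 for x in group_a) + sum((x - m_b) ** 2 for x in group_b)
    s = ((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2)) ** 0.5
    return (m_a - m_b) / (s + s0)


# Invented log-scale expression values for a single gene.
cm = [8.0, 8.1, 7.9, 8.2, 8.0, 8.1]              # six CM samples (3 and 7 weeks)
ah_only = [6.0, 7.0, 5.0]                        # three noisy control replicates
ah_and_fh = [6.0, 7.0, 5.0, 6.1, 5.9, 6.0]       # six pooled control replicates

print(sam_d(cm, ah_only))    # smaller score with three controls
print(sam_d(cm, ah_and_fh))  # larger score with six controls
```

With the same mean difference, the pooled control group gives a larger score, which is one way to read the higher number of differentially expressed mRNAs found for CM vs AH + FH.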

Table 1: Differentially expressed mRNAs identified by SAM.

Combination        up     down
CM vs AH + FH      4528   3115
CM vs AH           1888   1351
CM vs FH           2112   1679
Intersection (⋂)   1431   1122

Table 2: Differentially expressed miRNAs identified by SAM.

Combination        up     down
CM vs AH + FH      123    101
CM vs AH           103    93
CM vs FH           121    99
Intersection (⋂)   85     79

DE mRNAs and miRNAs by fold change

The next selection method to be tested was FC with three different thresholds (FC = 2, 3 and 4). This method was applied only to the first sample combination (cardiomyocytes against heart samples), and the result was compared to the result from SAM (see table 3). The problem of high within-group variation (as described in the Methods) did not occur in this study, and even with a low threshold such as FC = 2, SAM identified more transcripts as DE than FC did. To keep as much information as possible, the SAM results were used for further analysis.
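Fold-change selection with a threshold can be sketched as follows. How the fold change was computed in this study is not stated, so the ratio of group means on linear-scale intensities used here is an assumption, as are the function name, the sample names and the toy values.

```python
from statistics import mean


def fold_change_calls(expr, group_a, group_b, threshold=2.0):
    """Classify transcripts as up- or down-regulated in group_a versus group_b.

    expr maps a transcript ID to {sample_name: linear-scale intensity}.
    A transcript is "up" if mean(a) / mean(b) >= threshold and "down" if the
    ratio is <= 1 / threshold (assumed definition, see the lead-in).
    """
    up, down = [], []
    for transcript, values in expr.items():
        fc = mean(values[s] for s in group_a) / mean(values[s] for s in group_b)
        if fc >= threshold:
            up.append(transcript)
        elif fc <= 1.0 / threshold:
            down.append(transcript)
    return up, down


# Toy intensities (illustrative only).
expr = {
    "geneA": {"CM1": 400, "CM2": 420, "AH1": 100, "AH2": 110},
    "geneB": {"CM1": 50, "CM2": 55, "AH1": 210, "AH2": 200},
    "geneC": {"CM1": 100, "CM2": 105, "AH1": 100, "AH2": 98},
}
for t in (2.0, 3.0, 4.0):
    print(t, fold_change_calls(expr, ["CM1", "CM2"], ["AH1", "AH2"], t))
```

Raising the threshold can only shrink the two lists, which matches the decreasing counts from FC = 2 to FC = 4 in table 3.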

Table 3: Differentially expressed genes identified by SAM and different fold change thresholds.

         SAM           FC=2          FC=3         FC=4
         up     down   up     down   up    down   up    down
mRNA     4528   3115   1192   1241   441   501    234   300
miRNA    123    101    82     77     28    37     15    18


miRNA target prediction

After identifying differentially expressed mRNAs and miRNAs, a miRNA target prediction was carried out. To achieve reliable results, only validated targets from miRecords were considered. Unfortunately miRecords does not provide a batch function; therefore the latest version (as of May 5, 2010) of the miRNA-target interactions was downloaded and used for the analysis. For the identified up-regulated (123) and down-regulated (101) miRNAs, miRNA-target interactions were collected wherever validated interactions were stored in the database. The target prediction was performed separately for the up- and down-regulated miRNAs. Of the 123 up-regulated miRNAs only 13 had at least one target, and targets could be found for only 37 of the 101 down-regulated miRNAs. In total, 254 targets of up-regulated miRNAs and 385 targets of down-regulated miRNAs were found. Tables 4 and 5 show how many miRNAs had interactions and how many targets were found per miRNA.

Table 4: Number of targets that interacted with up-regulated miRNAs.

# of targets   count miRNAs   % miRNAs
1              5              38.46
2              2              15.38
3              2              15.38
5              1              7.69
9              1              7.69
27             1              7.69
198            1              7.69

Correlation between mRNA and miRNA data

The results of the target prediction were then intersected with the previously identified differentially expressed mRNAs, which resulted in 41 up-regulated and 25 down-regulated mRNAs. Of the 385 validated targets of down-regulated miRNAs, only 41 (10.65%) were found among the up-regulated mRNAs; the 254 validated targets of up-regulated miRNAs overlap even less, with only 25 (9.84%) found among the down-regulated mRNAs. The derived results are shown in figure 5, and the complete target lists are given in tables 6 and 7.
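The intersection approach and the reported percentages can be reproduced in a few lines of Python. The helper name and the toy gene lists are illustrative assumptions; the counts 41 of 385 and 25 of 254 are those reported in this study.

```python
def overlap(predicted_targets, differential_mrnas):
    """Return the shared genes and the percentage of predicted targets they make up."""
    predicted = set(predicted_targets)
    shared = predicted & set(differential_mrnas)
    percent = 100.0 * len(shared) / len(predicted) if predicted else 0.0
    return shared, percent


# Toy example with gene symbols (GENE_X and GENE_Y are placeholders).
shared, percent = overlap(["TWF1", "HMGA2", "GENE_X", "GENE_Y"],
                          ["TWF1", "HMGA2", "LIN28"])
print(sorted(shared), percent)

# Percentages reported in the text, recomputed from the counts.
print(round(100 * 41 / 385, 2))  # targets of down-regulated miRNAs among up-regulated mRNAs
print(round(100 * 25 / 254, 2))  # targets of up-regulated miRNAs among down-regulated mRNAs
```

Using sets makes the result independent of duplicate entries, which matters when one mRNA is a validated target of several miRNAs.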


Table 5: Number of targets that interacted with down-regulated miRNAs.

# of targets   count miRNAs   % miRNAs
1              13             35.14
2              4              10.81
3              4              10.81
4              4              10.81
5              6              16.22
6              1              2.70
10             1              2.70
41             1              2.70
65             1              2.70
74             1              2.70
110            1              2.70

Figure 5: Intersection approach representing the overlapping targets of down-regulated (left) and up-regulated (right) miRNAs. Predicted/validated targets are shown in orange, targets identified by SAM in blue.

GO Annotation of identified target genes

The target genes have different biological functions and take part in different biological processes. To investigate the functions of the identified target genes, a Gene Ontology analysis was performed. The Functional Annotation Tool in DAVID [21] was used to map the regulated target genes to biological functions described in Gene Ontology. Tables 8, 9 and 10 show the top terms for the up-regulated genes in the categories biological process, molecular function and cellular component. The Gene Ontology analysis for the down-regulated genes is shown in tables 11, 12 and 13.


Table 6: List of identified up-regulated targets (from intersection).

RefSeq         Gene symbol   Gene name
NM_024674      LIN28         lin-28 homolog A (C. elegans)
NM_003483      HMGA2         high mobility group AT-hook 2
NM_005207      CRKL          v-crk sarcoma virus CT10 oncogene homolog (avian)-like
NM_005544      IRS1          insulin receptor substrate 1
NM_002822      TWF1          twinfilin, actin-binding protein, homolog 1 (Drosophila)
NM_001660      ARF4          ADP-ribosylation factor 4
NM_012329      MMD           monocyte to macrophage differentiation-associated
NM_006148      LASP1         LIM and SH3 protein 1
NM_004520      KIF2A         kinesin heavy chain member 2A
NM_014937      INPP5F        inositol polyphosphate-5-phosphatase F
NM_004396      DDX5          DEAD (Asp-Glu-Ala-Asp) box polypeptide 5
NM_006367      CAP1          CAP, adenylate cyclase-associated protein 1 (yeast)
NM_018413      CHST11        carbohydrate (chondroitin 4) sulfotransferase 11
NM_018448      CAND1         cullin-associated and neddylation-dissociated 1
NM_014918      CHSY1         chondroitin sulfate synthase 1
NM_014408      TRAPPC3       trafficking protein particle complex 3
NM_024792      FAM57A        family with sequence similarity 57, member A
NM_032865      TNS4          tensin 4
NM_172020      POM121        POM121 membrane glycoprotein
NM_017958      PLEKHB2       pleckstrin homology domain containing, family B (evectins) member 2
NM_001655      ARCN1         archain 1
NM_001659      ARF3          ADP-ribosylation factor 3
NM_001358      DHX15         DEAH (Asp-Glu-Ala-His) box polypeptide 15
NM_005477      HCN4          hyperpolarization activated cyclic nucleotide-gated potassium channel 4
NM_177438      DICER1        dicer 1, ribonuclease type III
NM_005524      HES1          hairy and enhancer of split 1, (Drosophila)
NM_002524      NRAS          neuroblastoma RAS viral (v-ras) oncogene homolog
NM_018451      CENPJ         centromere protein J
NM_005113      GOLGA5        golgin A5
NM_002687      PNN           pinin, desmosome associated protein
NM_003299      HSP90B1       heat shock protein 90kDa beta (Grp94), member 1
NM_003359      UGDH          UDP-glucose 6-dehydrogenase
NM_001882      CRHBP         corticotropin releasing hormone binding protein
NM_005736      ACTR1A        ARP1 actin-related protein 1 homolog A, centractin alpha (yeast)
NM_006546      IGF2BP1       insulin-like growth factor 2 mRNA binding protein 1
NM_001001890   RUNX1         runt-related transcription factor 1
NM_052910      SLITRK1       SLIT and NTRK-like family, member 1
NM_004302      ACVR1B        activin A receptor, type IB
NM_018930      PCDHB10       protocadherin beta 10
NM_003901      SGPL1         sphingosine-1-phosphate lyase 1
NM_002644      PIGR          polymeric immunoglobulin receptor


Table 7: List of identified down-regulated targets (from intersection).

RefSeq         Gene symbol   Gene name
NM_005618      DLL1          delta-like 1 (Drosophila)
NM_000633      BCL2          B-cell CLL/lymphoma 2
NM_014888      FAM3C         family with sequence similarity 3, member C
NM_014454      SESN1         sestrin 1
NM_005596      NFIB          nuclear factor I/B
NM_000362      TIMP3         TIMP metallopeptidase inhibitor 3
NM_000314      PTEN          phosphatase and tensin homolog
NM_005924      MEOX2         mesenchyme homeobox 2
NM_004817      TJP2          tight junction protein 2 (zona occludens 2)
NM_000183      HADHB         hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit
NM_024092      TMEM109       transmembrane protein 109
NM_005327      HADH          hydroxyacyl-CoA dehydrogenase
NM_015055      SWAP70        SWAP switching B-cell complex 70kDa subunit
NM_006754      SYPL1         synaptophysin-like 1
NM_022152      TMBIM1        transmembrane BAX inhibitor motif containing 1
NM_001010875   SLC25A30      solute carrier family 25, member 30
NM_175866      UHMK1         U2AF homology motif (UHM) kinase 1
NM_014320      HEBP2         heme binding protein 2
NM_001654      ARAF          v-raf murine sarcoma 3611 viral oncogene homolog
NM_153186      KANK1         KN motif and ankyrin repeat domains 1
NM_003909      CPNE3         copine III
NM_000696      ALDH9A1       aldehyde dehydrogenase 9 family, member A1
NM_004099      STOM          stomatin
NM_004815      ARHGAP29      Rho GTPase activating protein 29
NM_003848      SUCLG2        succinate-CoA ligase, GDP-forming, beta subunit

Table 8: Top 10 biological processes in which the up-regulated genes are involved.

term                                    %       p-value
vesicle-mediated transport              19.51   4.51E-04
cellular process                        85.37   4.63E-04
developmental process                   43.90   5.48E-04
localization                            39.02   0.00316895
anatomical structure development        34.15   0.005486553
cellular developmental process          26.83   0.006876471
embryonic development                   14.63   0.012390708
pre-microRNA processing                 4.88    0.014787427
macromolecule localization              19.51   0.014938027
multicellular organismal development    34.15   0.015868619


Table 9: Top 10 molecular functions in which the up-regulated genes are involved.

term                                                                                  %       p-value
nucleotide binding                                                                    36.59   6.85E-04
purine ribonucleotide binding                                                         31.71   0.001272363
ribonucleotide binding                                                                31.71   0.001272363
purine nucleotide binding                                                             31.71   0.00186583
protein tyrosine kinase activity                                                      9.76    0.008238424
nucleoside-triphosphatase activity                                                    17.07   0.008957694
protein binding                                                                       70.73   0.00976464
adenyl ribonucleotide binding                                                         24.39   0.010213171
pyrophosphatase activity                                                              17.07   0.010746842
hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides    17.07   0.010945536

Table 10: Top 10 cell components in which the up-regulated genes are involved.

term                         %       p-value
cytoplasmic part             51.22   0.004025231
Golgi apparatus              19.51   0.004121384
cytoplasm                    65.85   0.004384796
ribonucleoprotein complex    14.63   0.007246225
intracellular organelle      73.17   0.008919468
organelle                    73.17   0.009143177
stress granule               4.88    0.009521654
Golgi membrane               9.76    0.009825613
endomembrane system          17.07   0.009944993
macromolecular complex       36.59   0.011095484

Discussion & Analysis

Differentially expressed mRNAs and miRNAs were identified with SAM and FC. The SAM results were used instead of the FC results because of the disadvantages of FC compared with SAM and to avoid losing too many plausibly significant genes. Furthermore, different sample set combinations were tested, and the one that identified the most genes (CM samples against heart samples) was selected. In the end, 4528 up-regulated and 3115 down-regulated mRNAs as well as 123 up-regulated and 101 down-regulated miRNAs were identified. The identified miRNAs were used to perform target prediction with validated data from miRecords [20], which yielded 254 down-regulated and 385 up-regulated target mRNAs. In this study we were interested in finding a correlation between the expression data of miRNAs and mRNAs; therefore, the predicted/validated target mRNAs were intersected with the previously identified differentially expressed mRNAs. This approach yielded 41 up-regulated and 25 down-regulated mRNAs. The overlaps are small because only validated targets were used for the target prediction, which generally leads to fewer results. In addition, the targets were based on only a few miRNAs (13 up-regulated and 37 down-regulated miRNAs), which also affected the final overlap.

Table 11: The top 10 biological processes in which the down-regulated genes are involved.

term                                       %      p-value
somite specification                       8.00   0.005373671
response to folic acid                     8.00   0.005373671
response to estrogen stimulus              12.00  0.008629415
aging                                      12.00  0.009437187
response to organic cyclic substance       12.00  0.011328537
negative regulation of cell proliferation  16.00  0.011857406
regulation of cell adhesion                12.00  0.014352243
segment specification                      8.00   0.014711829
response to nutrient                       12.00  0.014954214
central nervous system development         16.00  0.018347457

Table 12: The molecular functions in which the down-regulated genes are involved.

term                                       %   p-value
lipid binding                              20  0.00209148
3-hydroxyacyl-CoA dehydrogenase activity   8   0.007505891
diacylglycerol binding                     8   0.082004757
kinase activity                            16  0.083593677

Table 13: The cell components in which the down-regulated genes are involved.

term                     %   p-value
mitochondrion            32  8.42E-04
cytoplasm                80  9.51E-04
intracellular            92  0.009398626
mitochondrial part       20  0.011346087
organelle envelope       20  0.013048406
envelope                 20  0.013191331
cytoplasmic part         56  0.014324442
intracellular part       88  0.020391748
mitochondrial membrane   16  0.020749881
mitochondrial envelope   16  0.024361413

Furthermore, the biological functions of those genes were identified with Gene Ontology [21]. Of the 41 up-regulated genes, 85.37% are involved in cellular processes, 70.73% can be associated with protein binding, and 78.05% belong to the intracellular part, the cellular component that contains most of the up-regulated genes. The Gene Ontology analysis is supported by the identified up-regulated target gene CAP, adenylate cyclase-associated protein 1 (CAP1), which plays an important role in cell morphogenesis and motility [22]. Another example is twinfilin (TWF1, also known as PTK9), which is a target of miR-1. A reduction of this miRNA leads to an up-regulation of twinfilin, which results in cardiac hypertrophy [23].

For the down-regulated genes, the Gene Ontology terms show that the majority (92%) are expressed intracellularly and that the largest share (24%) is involved in system process. Examples are B-cell CLL/lymphoma 2 (Bcl-2) and phosphatase and tensin homolog (PTEN). A repression of Bcl-2 (caused by an over-expression of miR-1) leads to reduced protein levels of two anti-apoptotic molecules and therefore to a loss of their activity [24]. PTEN is a 3'-lipid phosphatase and therefore one representative of the 20% of genes with "lipid binding" as molecular function. In addition, PTEN regulates cell survival, hypertrophy, and contractility of cardiomyocytes [25]. All in all, the findings support the hypothesis that there is a correlation between miRNA and mRNA expression data in hESC-derived cardiomyocytes.
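The intersection approach described in this discussion can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `intersect_targets` and the placeholder symbols `GENE_X`, `GENE_Y` and `GENE_Z` are hypothetical, and the toy sets merely stand in for the SAM results and the validated miRecords targets.

```python
def intersect_targets(validated_targets, de_genes):
    """Return the validated miRNA targets that are also differentially expressed."""
    return validated_targets & de_genes


# Toy data (illustrative only): validated targets of the selected miRNAs ...
validated_targets = {"TWF1", "CAP1", "BCL2", "PTEN", "GENE_X"}
# ... and differentially expressed mRNAs, split by direction of regulation.
up_regulated = {"TWF1", "CAP1", "GENE_Y"}
down_regulated = {"BCL2", "PTEN", "GENE_Z"}

up_overlap = intersect_targets(validated_targets, up_regulated)
down_overlap = intersect_targets(validated_targets, down_regulated)

print(sorted(up_overlap))    # validated targets that are up-regulated
print(sorted(down_overlap))  # validated targets that are down-regulated
```

Keeping the two directions separate mirrors the way the result is reported, as one overlap for up-regulated and one for down-regulated mRNAs.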

Future work

In future work, it would be interesting to examine the other sample combinations (cardiomyocytes against adult heart, cardiomyocytes against fetal heart) and to repeat the whole analysis (identification of differentially expressed genes, miRNA target prediction, and the check for correlation) with these data. Because the adult heart (AH) and fetal heart (FH) groups each contain fewer samples than the combined heart group used in this study, fewer up- and down-regulated targets are expected. When the targets identified for the different combinations are compared, it would be interesting to check which targets occur in all analyses and to examine those more closely.

Acknowledgements

This thesis would not have been possible without Jane Synnergren, who gave me the great opportunity to work on this topic. I would like to thank her for her supervision and her helping hand whenever it was necessary. I would like to thank Angelica Lindlof for teaching me how to carry out a project and which important steps it includes. Last but not least, I would like to thank my family and friends for supporting and motivating me to study abroad and to finish this project.

References

[1] M. Habib, O. Caspi, and L. Gepstein. Human embryonic stem cells for cardiomyogenesis. Journal of Molecular and Cellular Cardiology, 45:462–474, 2008.

[2] JA Thomson et al. Embryonic stem cell lines derived from human blastocysts. Science, 282(5391):1145–1147, 1998.

[3] I. Kehat, D. Kenyagin-Karsenti, M. Snir, H. Segev, M. Amit, A. Gepstein, et al. Human embryonic stem cells can differentiate into myocytes with structural and functional properties of cardiomyocytes. Journal of Clinical Investigation, 108(3):407–414, 2001.

[4] T. Du and PD Zamore. microPrimer: the biogenesis and function of microRNA. Development, 132(21):4645–4652, 2005.

[5] E. van Rooij and EN Olson. MicroRNAs: powerful new regulators of heart disease and provocative therapeutic targets. The Journal of Clinical Investigation, 117(9):2369–2376, 2007.

[6] VG Tusher, R. Tibshirani, and G. Chu. Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98(9):5116–5121, 2001.

[7] K. Kadota, Y. Nakai, and K. Shimizu. Ranking differentially expressed genes from Affymetrix gene expression data: methods with reproducibility, sensitivity, and specificity. Algorithms for Molecular Biology, 4:7, 2009.

[8] L. Shi et al. Cross-platform comparability of microarray technology: Intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics, 6(Suppl 2):12, 2005.

[9] RD Pearson. A comprehensive re-analysis of the golden spike data: Towards a benchmark for differential expression methods. BMC Bioinformatics, 9:164, 2008.

[10] BP Lewis, I. Shih, MW Jones-Rhoades, DP Bartel, and CB Burge. Prediction of mammalian microRNA targets. Cell, 115:787–798, 2003.

[11] EC Lai. Predicting and validating microRNA targets. Genome Biology, 5(9):115, 2004.

[12] J. Brennecke, A. Stark, RB Russell, and SM Cohen. Principles of microRNA–target recognition. PLoS Biology, 3(3):e85, 2005.

[13] C. Mummery et al. Differentiation of human embryonic stem cells to cardiomyocytes: Role of coculture with visceral endoderm-like cells. Circulation, 107:2733–2740, 2003.

[14] VK Gangaraju and H. Lin. MicroRNAs: key regulators of stem cells. Nature Reviews Molecular Cell Biology, 10:116–125, 2009.

[15] MX Doss, A. Sachinidis, and J. Hescheler. Human ES cell derived cardiomyocytes for cell replacement therapy: A current update. Chinese Journal of Physiology, 51(4):226–229, 2008.

[16] NCBI PubMed. URL http://www.ncbi.nlm.nih.gov/pubmed/. Accessed 19 June 2010.

[17] R. Ihaka and R. Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314, 1996.

[18] RC Gentleman et al. Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology, 5:R80, 2004.

[19] H. Schwender, A. Krause, and K. Ickstadt. Identifying interesting genes with siggenes. R News, 6(5):45–50, 2006.


[20] F. Xiao, Z. Zuo, G. Cai, S. Kang, X. Gao, and T. Li. miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Research, 37:D105–D110, 2009.

[21] DW Huang, BT Sherman, and RA Lempicki. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols, 4(1):44–57, 2009.

[22] E. Bertling, P. Hotulainen, PK Mattila, T. Matilainen, M. Salminen, and P. Lappalainen. Cyclase-associated protein 1 (CAP1) promotes cofilin-induced actin dynamics in mammalian nonmuscle cells. Molecular Biology of the Cell, 2004.

[23] Q. Li, XW Song, J. Zou, N. Zhu, XQ Li, P. Lappalainen, WJ Yuan, YW Qin, and Q. Jing.

[24] Y. Tang, J. Zheng, Y. Sun, Z. Wu, Z. Liu, and G. Huang. MicroRNA-1 regulates cardiomyocyte apoptosis by targeting Bcl-2. International Heart Journal, 50(3):377–387, 2009.

[25] GY Oudit and JM Penninger. Cardiac regulation by phosphoinositide 3-kinases and PTEN. Cardiovascular Research, 82:250–260, 2009.