Introduction to Analysis of Genomic Data Using R
Lab 2: Introduction to R
Dr. Yen-Yi Ho (hoyen@stat.sc.edu)
Jan 17, 2018
R computing & graphics package
- R is a powerful, free statistical computing and graphics package.
- Popular with many researchers due to contributed packages: R functions to do specialized, advanced, and often complex statistical analysis.
- R can also do many important, routine calculations and analyses, and provide common graphical displays used in this course.
- Installed in several of the computing labs across campus, e.g. Sloan 108 & 109, Gambrell 003.
- You can download and install it from CRAN: http://cran.r-project.org/
R: Pros and Cons
Pros
+ Free
+ Available for all major platforms
+ Powerful graphics
+ Comprehensive
+ Easy interface with other languages (such as C, Fortran)
+ Well-designed programming language (object-oriented)
+ Unlimited extensibility
+ Widely used by statisticians
+ Increasingly used for genomic analyses (Bioconductor)
Cons
- No dedicated support
- Complex syntax
- Not point-and-click
- No warranty
- Relatively slow
Bioconductor: a collection of R packages for genomic data analysis
- Started by Robert Gentleman to house R packages for genomic data analysis.
Bioconductor installation
- Use the biocLite.R script.
- Installing a specific package from Bioconductor:
source("http://www.bioconductor.org/biocLite.R")
biocLite("limma")
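The biocLite.R script shown above was the standard installer when these slides were written. Since Bioconductor 3.8 it has been replaced by the BiocManager package from CRAN, so on a current R installation the equivalent steps would look roughly like the sketch below. The wrapper name install_bioc_package is a hypothetical helper introduced only for illustration; limma is the package used on the slide.

```r
# Hypothetical helper (not part of Bioconductor): installs one Bioconductor
# package through BiocManager, the replacement for biocLite().
install_bioc_package <- function(pkg) {
  # BiocManager itself is distributed on CRAN, so install it first if missing.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkg)
}

# Usage (needs an internet connection, so it is left commented out):
# install_bioc_package("limma")
# library(limma)
```

Wrapping the two steps in a function keeps the check for BiocManager and the package install together, so the same call works on a fresh machine and on one that is already set up.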
Online resources: genome browser and public data repositories
- UCSC Genome Browser: hosts genomic annotation data for many species.
Public high-throughput data repositories
- GEO: Gene Expression Omnibus.
  - Funded by NCBI.
  - Hosts array- and sequencing-based data.
- ArrayExpress: European version of GEO.
  - Better curated than GEO but has less data.
- SRA: Sequence Read Archive.
  - Designed for hosting large-scale high-throughput sequencing data (high-speed file transfer).
Other public data resources
- TCGA (The Cancer Genome Atlas)
  - Hosts data generated by TCGA, a big consortium to study cancer genomics.
  - Huge collection of cancer-related data: different types of genomic, genetic, and clinical data for many different types of cancers.
- ICGC (International Cancer Genome Consortium): similar to TCGA but has a larger collection of studies.
- ENCODE (the ENCyclopedia Of DNA Elements) data coordination center
  - Hosts data generated by ENCODE, a big consortium to study functional elements of the human genome.
  - Rich collection of genomic and epigenomic data.
- Many others ...
Next Lab: R Topics Outline
- Getting started
- R as a calculator
- Vectors
- Matrices, arrays, factors, lists, data frames
- Importing and exporting data
- R graphics
- Random number generation
- Writing R functions
- for loops
- rep, seq, which, match
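As a preview of the topics listed for the next lab, a few of them can be sketched in base R as follows. The variable names (x, m, df and so on) and the helper sum_of_squares are made up for illustration and are not taken from the lab material.

```r
# R as a calculator
calc <- 2 + 3 * 4                 # 14: multiplication binds tighter than addition

# Vectors
x <- c(5, 12, 8)

# Matrices and data frames
m  <- matrix(1:6, nrow = 2)       # 2 x 3 matrix, filled column by column
df <- data.frame(gene = c("g1", "g2", "g3"), expr = x)

# rep and seq
reps <- rep(1:2, times = 3)       # 1 2 1 2 1 2
grid <- seq(0, 1, by = 0.25)      # 0.00 0.25 0.50 0.75 1.00

# which and match
big <- which(x > 6)                          # positions 2 and 3
pos <- match(c("b", "d"), c("a", "b", "c"))  # 2 NA ("d" is not found)

# Random number generation (set a seed so the draw is reproducible)
set.seed(1)
draws <- rnorm(3)

# Writing a function that uses a for loop
sum_of_squares <- function(v) {
  total <- 0
  for (value in v) {
    total <- total + value^2
  }
  total
}
ss <- sum_of_squares(x)           # 25 + 144 + 64 = 233
```

The explicit loop is only there to show the syntax; in everyday R the same result is usually written as sum(x^2).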
To do list after this class
- Review slides.
- Read the Wiki pages for DNA, gene, genome, DNA microarray, and DNA sequencing.
- Install R and Bioconductor on your computer.
- Start to learn R by reading Applied Statistics for Bioinformatics Using R: https://cran.r-project.org/doc/contrib/Krijnen-IntroBioInfStatistics.pdf