Workshop Sarah Pendergrass, PhD MS Research Associate Center for Systems Genomics

Post on 21-Dec-2015

Workshop

Sarah Pendergrass, PhD MS
Research Associate

Center for Systems Genomics

Outline

• ggplot2
• Cytoscape
• PhenoGram

ggplot2

• ggplot2
  • Plotting system for R
  • Flexible, accessible visualization of data
• You must have R installed
• You must have ggplot2 installed:
  • install.packages("ggplot2")
  • library(ggplot2)

ggplot2

• ggplot2
  • Developed by Hadley Wickham
  • Grammar of graphics: a formal, structured perspective on describing data graphics
    • Data properties: typically numerical or categorical values
    • Visual properties: x and y positions of points, colors of lines, heights of bars
  • Once you have your code, you can reuse it again and again
• Benefits compared to other R packages
  • Structure of the data can remain the same while making very different types of plots
  • Standard format for generating plots

ggplot2

• ggplot2
  • Plotting system for R
  • Flexible, accessible visualization of data
• We will walk through some examples. Great references:
  • “R Graphics Cookbook”
  • “ggplot2” by Hadley Wickham
• Many more examples also exist online
  • Worth doing image searches when you have a new set of data to plot, to get ideas and basic code to modify

ggplot2 Input/Output

• A little about file input
• Input: a data frame
  • data <- read.table("datafile.txt", header=TRUE)
  • data <- read.csv("datafile.txt", header=FALSE)
  • There are also commands for importing Excel spreadsheets
• Windows:
  • windows() will open a new figure window
• Mac:
  • quartz() will open a new figure window
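The effect of the header argument can be seen with a small round trip. This is a minimal sketch using only base R; the file contents and column names are made up for illustration and are not the workshop's data.

```r
# Hypothetical tab-delimited file, written to a temporary location for illustration.
path <- tempfile(fileext = ".txt")
writeLines(c("SampleID\tPC1\tBMI",
             "S1\t0.12\t24.5",
             "S2\t0.34\t31.0"), path)

# header=TRUE: the first line supplies the column names.
with_header <- read.table(path, header = TRUE, sep = "\t")

# header=FALSE: the first line is read as data and columns get default names.
no_header <- read.table(path, header = FALSE, sep = "\t")

names(with_header)
names(no_header)
str(with_header)
```

If a file has no header line, header=FALSE followed by names(dat) <- c(...) is the usual way to label the columns.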

ggplot2 Input/Output

• A little about file output
• Raster/bitmap
  • Array of pixels
  • Can go high resolution (e.g., 600 dpi)
• Vector
  • Infinitely “zoomable”
  • Can be modified more easily in Adobe Illustrator
  • Slow to render with thousands of graphical objects
• ggsave(file = "output.pdf")
• Or:
  • pdf(file = "output.pdf")
  • Then the ggplot command
  • dev.off()
• Raster: png(), jpeg(), bitmap(), tiff()
• Vector: pdf(), postscript()

ggplot2 Input/Output

• A little about file output
  • Illustrator: svg
  • LaTeX: ps
  • MS Office: png (600 dpi)
  • Open Office: png (600 dpi)
  • pdfLaTeX: pdf, png (600 dpi)
  • Web: png (72 dpi)
  • Or whatever your journal specifies!
• Can specify dpi by adding it to the command
  • ggsave(file = "output.png", dpi=600)
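Both output routes can be sketched side by side. The example below is illustrative only: it uses R's built-in mtcars data rather than the workshop data, writes to a temporary directory, and the width and height values are arbitrary choices.

```r
library(ggplot2)

# Built-in mtcars data stands in for the workshop data here.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

out_dir  <- tempdir()
png_file <- file.path(out_dir, "output.png")
pdf_file <- file.path(out_dir, "output.pdf")

# Raster output: size in inches, resolution set with dpi.
ggsave(filename = png_file, plot = p, width = 5, height = 4, dpi = 600)

# Vector output through a graphics device.
# print() is needed when this runs inside a script or function.
pdf(file = pdf_file, width = 5, height = 4)
print(p)
dev.off()
```

Passing plot = p explicitly avoids relying on whichever plot happened to be displayed last.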

ggplot2 vocabulary

• Data: what we want to visualize
  • Consisting of variables in a data frame

• Data frame: primary data structure in R with properties of matrices

• Geoms: geometric objects drawn to represent the data

• Aesthetics (aes): visual properties of geoms such as defining X, defining Y, line color, point shapes, etc.

• Mappings: mapping from data values to aesthetics

• Scales: control mapping from data space to aesthetic space

• Guides: show viewer how to map visual properties back to data space: tick marks and labels, etc
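The difference between mapping a variable to an aesthetic and setting an aesthetic to a constant can be sketched with a toy data frame. The data frame and its column names below are hypothetical.

```r
library(ggplot2)

# Toy data, invented for this illustration.
toy <- data.frame(x = 1:6,
                  y = c(2, 4, 3, 6, 5, 7),
                  grp = rep(c("a", "b"), 3))

# Mapping: colour depends on the data, so a scale picks the colours
# and a guide (legend) is drawn.
p_map <- ggplot(toy, aes(x = x, y = y)) + geom_point(aes(colour = grp))

# Setting: colour is a constant outside aes(), so no scale or legend is involved.
p_set <- ggplot(toy, aes(x = x, y = y)) + geom_point(colour = "blue")

unique(layer_data(p_map)$colour)  # colours chosen by the scale, one per group
unique(layer_data(p_set)$colour)  # the constant that was set
```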

ggplot2 basics

• ggplot2
  • Data has to be saved in a data frame
  • Each type of variable mapped to an aesthetic must be stored in a separate column (your x, y variables)
• Basic ggplot2 specification:
  • ggplot(dat, aes(x=xval, y=yval))
  • x=xval maps the column xval to the x position
  • y=yval maps the column yval to the y position
• Now you need to add geometric objects… let’s go make a first plot!
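The workshop's datafile.txt is not part of this transcript. To try the commands without it, a stand-in data frame with the same column names can be simulated. Everything in this sketch is an assumption: the values, the group labels, and the sample count are invented and carry no scientific meaning.

```r
library(ggplot2)

# Hypothetical stand-in for the workshop's datafile.txt: simulated values only.
set.seed(1)
n <- 40
dat <- data.frame(
  SampleID = sprintf("S%02d", 1:n),
  PC1      = runif(n, 0, 10),
  PC2      = rnorm(n),
  Ancestry = sample(c("A", "B"), n, replace = TRUE),
  Site     = sample(c("Site1", "Site2", "Site3"), n, replace = TRUE),
  Platform = sample(c("P1", "P2"), n, replace = TRUE),
  Sex      = sample(c("F", "M"), n, replace = TRUE),
  BMI      = round(rnorm(n, mean = 27, sd = 4), 1)
)

# The basic specification: a data frame plus aesthetic mappings, no geoms yet.
p <- ggplot(dat, aes(x = PC1, y = PC2))

# Adding a geom turns the specification into a drawable plot.
p + geom_point()
```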

ggplot2 Example 1: Scatter Plot

• ggplot2
• Load example data frame 1:
  • dat <- read.table("datafile.txt", header=FALSE, sep="\t")
• Name the columns:
  • names(dat) <- c("SampleID","PC1","PC2","Ancestry","Site","Platform","Sex","BMI")
• Type dat to check your data frame
  • And remember, the up and down arrows are your friends in R!

ggplot2 Example 1: Scatter Plot

• ggplot2 example 1
  • ggplot(dat, aes(x=PC1, y=PC2))
  • Indicates the data (our data frame)
  • PC1 column values are mapped to the x position and PC2 values to the y position. But we still need geometric objects such as points, so we add:
• Command:
  • ggplot(dat,aes(x=PC1,y=PC2)) + geom_point()
• We can map a group to the color of the points by specifying aesthetics for that particular geom
• Command:
  • ggplot(dat,aes(x=PC1,y=PC2)) + geom_point(aes(color=Ancestry))
• And if you want to save the plot:
  • ggsave(file="output1.pdf")

ggplot2 Example 1: Scatter Plot

• ggplot2 example 1
• Command:
  • ggplot(dat,aes(x=PC1, y=PC2)) + geom_point()
• We can map a group to the color of the points by specifying aesthetics for that particular geom; last_plot() returns the most recent plot so it can be extended
  • last_plot() + geom_point(aes(color=Ancestry))

ggplot2 Example 1: Scatter Plot

• ggplot2 example 1
• How about changing the axes?
• Command:
  • ggplot(dat,aes(x=PC1,y=PC2)) + geom_point()
• Modify the scale:
  • ggplot(dat,aes(x=PC1,y=PC2)) + geom_point() + scale_x_continuous(limits=c(0,8))
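One detail worth knowing about scale limits: values outside the limits are treated as missing and dropped from the plot, whereas coord_cartesian() only zooms the view. The sketch below uses simulated data (not the workshop file) to show the difference; coord_cartesian() is not used in the slides and is offered here as a comparison.

```r
library(ggplot2)

# Simulated data, invented for this comparison.
set.seed(2)
toy <- data.frame(PC1 = runif(50, 0, 12), PC2 = rnorm(50))

# Scale limits: PC1 values outside [0, 8] become NA and are dropped
# when the plot is drawn (ggplot2 warns about removed rows).
p_scale <- ggplot(toy, aes(x = PC1, y = PC2)) + geom_point() +
  scale_x_continuous(limits = c(0, 8))

# Coordinate limits: every row is kept and the view is zoomed instead.
p_zoom <- ggplot(toy, aes(x = PC1, y = PC2)) + geom_point() +
  coord_cartesian(xlim = c(0, 8))

# Number of points with a usable x position in each version:
sum(!is.na(layer_data(p_scale)$x))
sum(!is.na(layer_data(p_zoom)$x))
```

The distinction matters most once statistics such as regression lines are added, because they are computed only from the data that survives the scale limits.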

ggplot2 Example 1: Scatter Plot

• ggplot2 example 1
• Change points
  • ggplot(dat,aes(x=PC1,y=PC2,color=Ancestry)) + geom_point(shape=1) + scale_colour_hue(l=50)  # Open circles and a slightly darker palette than normal
• Add regression lines
  • ggplot(dat,aes(x=PC1,y=PC2)) + geom_point(shape=1) + scale_colour_hue(l=50) + geom_smooth(method=lm,se=FALSE)  # Add a linear regression line, without the shaded confidence region
  • ggplot(dat,aes(x=PC1,y=PC2,color=Ancestry)) + geom_point(shape=1) + scale_colour_hue(l=50) + geom_smooth(method=lm,se=FALSE)  # One regression line per Ancestry group

ggplot2 Example 1: Scatter Plot

• Set shape based on a condition
  • ggplot(dat,aes(x=PC1,y=PC2,shape=Ancestry)) + geom_point()
• Set shape and color based on separate conditions
  • ggplot(dat,aes(x=PC1, y=PC2, color=Platform,shape=Ancestry)) + geom_point()
• Same, but use hollow circles and triangles
  • ggplot(dat,aes(x=PC1,y=PC2,shape=Ancestry, color=Platform)) + geom_point() + scale_shape_manual(values=c(1,2))

ggplot2 Example 1: Scatter Plot

• Same, but increase shape size and text size
  • theme_set(theme_gray(base_size = 15))
  • ggplot(dat,aes(x=PC1,y=PC2,shape=Ancestry, color=Platform)) + geom_point(size=8)

ggplot2 Example 2: Histograms

• Histogram
  • ggplot(dat, aes(x=BMI)) + geom_histogram(binwidth=.5, colour="black", fill="white")
• Histogram adding the mean
  • ggplot(dat, aes(x=BMI)) + geom_histogram(binwidth=.5, colour="black", fill="white") + geom_vline(aes(xintercept=mean(BMI, na.rm=TRUE)), color="red", linetype="dashed", size=1)
• Tip: you can use binwidth to adjust bin size (wider bins, more items in each bin)
  • ggplot(dat, aes(x=BMI)) + geom_histogram(binwidth=5, colour="black", fill="white") + geom_vline(aes(xintercept=mean(BMI, na.rm=TRUE)), color="red", linetype="dashed", size=1)

ggplot2 Example 2: Histogram and Density Graphs
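The code for the density graph did not survive in the transcript. One common way to overlay a density curve on a histogram is sketched below; it is an assumption about what such a plot could look like, not the presenter's code. The BMI values are simulated, and after_stat() requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Simulated BMI values, invented for this illustration.
set.seed(3)
toy <- data.frame(BMI = rnorm(200, mean = 27, sd = 4))

p <- ggplot(toy, aes(x = BMI)) +
  # Scale the histogram to density so both layers share the y axis.
  geom_histogram(aes(y = after_stat(density)), binwidth = 1,
                 colour = "black", fill = "white") +
  # Semi-transparent density curve drawn on top of the bars.
  geom_density(alpha = 0.2, fill = "#FF6666")
```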

ggplot2 Example 4: Bar Graph

• Making a bar graph:
  • ggplot(data=dat, aes(x=SampleID, y=BMI)) + geom_bar(stat="identity")
• Colors
  • ggplot(data=dat, aes(x=SampleID, y=BMI, fill=Ancestry)) + geom_bar(stat="identity")

ggplot2: Bar Graph

• The space below the top line on a bar chart is usually meaningless; it only represents the distance between the start value and the plotted value
• The information in the bar plot can actually be represented with single dots
• This can cut down on visual clutter and make a more visually meaningful plot
• One way to show the trends of the points, but it needs more:
  • ggplot(data=dat, aes(x=SampleID, y=BMI)) + geom_point()

ggplot2: Bar Graph

• One way to show the trends of the points better; however, this might make it seem like nearby points are related via proximity
  • ggplot(data=dat, aes(x=SampleID, y=BMI, group=1)) + geom_line() + geom_point()
  • group=1 is needed because SampleID is discrete; without it each sample forms its own group and no line is drawn

ggplot2: Cleveland Dot Plot

• Another way to show the trends of the points better: Cleveland dot plot
  • ggplot(data=dat, aes(x=SampleID, y=BMI)) + geom_segment(aes(xend=SampleID), yend=0, color="grey") + geom_point()
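Cleveland dot plots are often easier to read when the items are sorted by value and drawn horizontally. The sketch below shows that variation with invented sample IDs and BMI values; the sorting and the horizontal layout are additions for illustration and do not appear in the slides.

```r
library(ggplot2)

# Invented values for illustration.
toy <- data.frame(
  SampleID = paste0("S", 1:8),
  BMI      = c(24.1, 31.5, 27.3, 22.8, 35.0, 29.4, 26.2, 30.1)
)

# Order the factor levels by BMI so the dots appear sorted.
toy$SampleID <- reorder(toy$SampleID, toy$BMI)

p <- ggplot(toy, aes(x = BMI, y = SampleID)) +
  # Grey guide line from zero to each point.
  geom_segment(aes(yend = SampleID), xend = 0, colour = "grey") +
  geom_point()
```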

ggplot2 Example 5: Creating Boxplots

• When comparing the distributions of groups of data, boxplots are a great alternative to bar charts
• Command:
  • ggplot(dat,aes(x=Sex,y=BMI)) + geom_boxplot()

ggplot2 Example 5: Creating Boxplots

• When comparing the distributions of groups of data, boxplots are a great alternative to bar charts
• Adding color to the box plot
  • ggplot(dat,aes(x=Sex,y=BMI,fill=Sex)) + geom_boxplot()

ggplot2 Example 5: Creating Boxplots

• When comparing the distributions of groups of data, boxplots are a great alternative to bar charts
• Add a summary such as the mean to the box plot (here the mean is drawn as a diamond shape)
  • ggplot(dat,aes(x=Sex,y=BMI)) + geom_boxplot() + stat_summary(fun=mean, geom="point", shape=5, size=4)
  • In ggplot2 versions before 3.3.0 the argument is named fun.y instead of fun

ggplot2 Example 5: Creating Boxplots

• When comparing the distributions of groups of data, boxplots are a great alternative to bar charts
• Adding individual data points to the box plot
  • ggplot(dat, aes(x=Sex,y=BMI,fill=Sex)) + geom_boxplot() + geom_point()
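With geom_point() all points in a group sit on one vertical line and can hide each other. A possible refinement, not taken from the slides, is to jitter the points horizontally. The data below are simulated and the jitter width is an arbitrary choice.

```r
library(ggplot2)

# Simulated data, invented for this illustration.
set.seed(4)
toy <- data.frame(Sex = rep(c("F", "M"), each = 30),
                  BMI = rnorm(60, mean = 27, sd = 4))

p <- ggplot(toy, aes(x = Sex, y = BMI, fill = Sex)) +
  # Hide the boxplot's own outlier markers so no point is drawn twice.
  geom_boxplot(outlier.shape = NA) +
  # Spread points sideways only; height = 0 leaves the BMI values untouched.
  geom_jitter(width = 0.15, height = 0, alpha = 0.6)
```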

ggplot2 Example 5: Creating Boxplots

• When comparing the distributions of groups of data, boxplots are a great alternative to bar charts
• What about adding a title?
  • ggplot(dat,aes(x=Sex,y=BMI,fill=Sex)) + geom_boxplot() + geom_point() + ggtitle("BMI for each Sex")

ggplot2 Example 5: Creating Boxplots

• When comparing the distributions of groups of data, boxplots are a great alternative to bar charts
• What about modifying the axis titles?
  • ggplot(dat, aes(x=Sex, y=BMI,fill=Sex)) + geom_boxplot() + geom_point() + ggtitle("BMI for each Sex") + xlab("Sex") + ylab("Body Mass Index")

ggplot2 Example 6: Facets

• Facets allow you to split up your data by one or more variables and plot the subsets of data together:
  • ggplot(dat,aes(x=PC1,y=PC2)) + geom_point(aes(color=Ancestry)) + facet_grid(Sex ~ .)
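The formula given to facet_grid() is rows ~ columns, so Sex ~ . gives one row of panels per sex. The sketch below shows two variations on simulated data with the same column names; the data and the choice of Platform as a second facet variable are assumptions for illustration.

```r
library(ggplot2)

# Simulated data, invented for this illustration.
set.seed(5)
n <- 80
toy <- data.frame(
  PC1      = rnorm(n),
  PC2      = rnorm(n),
  Ancestry = sample(c("A", "B"), n, replace = TRUE),
  Sex      = sample(c("F", "M"), n, replace = TRUE),
  Platform = sample(c("P1", "P2"), n, replace = TRUE)
)

base <- ggplot(toy, aes(x = PC1, y = PC2)) + geom_point(aes(color = Ancestry))

p_rows <- base + facet_grid(Sex ~ .)         # one row of panels per Sex
p_grid <- base + facet_grid(Sex ~ Platform)  # rows by Sex, columns by Platform
p_wrap <- base + facet_wrap(~ Platform)      # panels wrapped into a grid
```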

ggplot2 A Note on Colors

• In the examples, we used mostly ggplot2 default colors
• There are lots of options for using different colors in ggplot2
  • http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/
• Example
  • Using scale_fill_manual, you can use hexadecimal color codes (you can get these from Color Brewer, http://colorbrewer2.org/)
  • ggplot(dat,aes(x=Sex,y=BMI,fill=Sex)) + geom_bar(stat="identity") + scale_fill_manual(values=c("#CC6666","#9999CC"))

ggplot2 Other Notes

• Not covered here, but so many options!
  • Color of background
  • Grid line modification
  • Font choice
  • Other kinds of plots such as heatmaps, and using the techniques here to make Manhattan plots or to color maps with information

ggplot2 Other Notes

• Examples and code are EVERYWHERE!!
• This was just a Google Image search on “ggplot2”!

Cytoscape

• Introduction to Cytoscape

PhenoGram

• Chromosomal ideogram
  • Can add lines, shapes, and text
  • Can add cytogenetic banding patterns
• Web version here:
  • http://visualization.ritchielab.psu.edu/phenograms/plot
• Example files here:
  • http://visualization.ritchielab.psu.edu/phenograms/examples
• Currently only human chromosomal information is available; mouse will be added soon, followed by other model organisms

PhenoGram

• Chromosomal ideogram
  • Can add lines, shapes
  • Can add cytogenetic banding patterns
• Download: phenogram-groups-sample.txt
• Go to:
  • http://visualization.ritchielab.psu.edu/phenograms/plot

PhenoGram

• Chromosomal ideogram
  • Can plot just lines
• Download: phenogram-groups-poscolor.txt
• Go to:
  • http://visualization.ritchielab.psu.edu/phenograms/plot