Post on 18-Jan-2018

regioneR: an R/Bioconductor package for the management and comparison of genomic regions

Anna Díez, Bernat Gel, Roberto Malinverni

regioneR aims

Practical to use. Easy to understand.

Generic and useful

Efficient

Customizable

Something we would like to use

regioneR

Basic management of genomic regions

Statistical evaluation

Helper functions making our lives easier

The Basics

Statistics

Customization

Helper Functions

THE BASICS

joinRegions

[Figure: region set A and the distance min.dist]

joinRegions(A, min.dist)
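A minimal base-R sketch of what joining regions can look like, assuming regions are rows of a plain data frame with chr, start and end columns, and assuming two regions are joined when fewer than min.dist bases separate them. The helper name join_regions_sketch is hypothetical and this is not regioneR's implementation.

```r
# Hypothetical helper: joins regions that are fewer than min.dist bases apart.
# Works on a plain data frame (chr, start, end), not on GRanges.
join_regions_sketch <- function(regions, min.dist = 1) {
  regions <- regions[order(regions$chr, regions$start), ]
  joined <- list()
  for (chr in unique(regions$chr)) {
    r <- regions[regions$chr == chr, ]
    cur.start <- r$start[1]
    cur.end <- r$end[1]
    for (i in seq_len(nrow(r))[-1]) {
      gap <- r$start[i] - cur.end - 1  # bases strictly between the two regions
      if (gap < min.dist) {
        cur.end <- max(cur.end, r$end[i])  # extend the current joined region
      } else {
        joined[[length(joined) + 1]] <- data.frame(chr = chr, start = cur.start,
                                                   end = cur.end,
                                                   stringsAsFactors = FALSE)
        cur.start <- r$start[i]
        cur.end <- r$end[i]
      }
    }
    joined[[length(joined) + 1]] <- data.frame(chr = chr, start = cur.start,
                                               end = cur.end,
                                               stringsAsFactors = FALSE)
  }
  do.call(rbind, joined)
}

A <- data.frame(chr = c("chr1", "chr1", "chr1"),
                start = c(2000, 5000, 10000),
                end = c(4000, 5500, 12000),
                stringsAsFactors = FALSE)

# The first two regions are 999 bases apart, so they are joined;
# the third is 4499 bases away and stays separate.
joined <- join_regions_sketch(A, min.dist = 1500)
```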

subtractRegions

[Figure: region sets A and B]

subtractRegions(A, B)

splitRegions

[Figure: region sets A and B]

splitRegions(A, B, min.size=1, track.original=TRUE)

mergeRegions

commonRegions

extendRegions

Any other? flankingRegions? …

overlapRegions

[Figure: region sets A and B]

overlapRegions(A, B, colA, colB, type, min.bases, min.pctA, min.pctB, get.pctA, get.pctB, get.bases, only.boolean, only.count, ...)
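As an illustration of the min.bases idea only, here is a base-R sketch that counts how many regions of A share at least min.bases bases with some region of B. It assumes plain data frames with chr, start and end columns; the helper name count_overlapping_sketch is hypothetical, and none of the other arguments in the signature above are modelled.

```r
# Hypothetical helper; a base-R illustration, not regioneR's overlapRegions.
count_overlapping_sketch <- function(A, B, min.bases = 1) {
  hits <- vapply(seq_len(nrow(A)), function(i) {
    b <- B[B$chr == A$chr[i], ]
    if (nrow(b) == 0) return(FALSE)
    # Number of shared bases with each region of B (<= 0 means no overlap)
    shared <- pmin(A$end[i], b$end) - pmax(A$start[i], b$start) + 1
    any(shared >= min.bases)
  }, logical(1))
  sum(hits)
}

A <- data.frame(chr = c("chr1", "chr1", "chr2"),
                start = c(100, 300, 100),
                end = c(200, 400, 200),
                stringsAsFactors = FALSE)
B <- data.frame(chr = c("chr1", "chr2"),
                start = c(150, 500),
                end = c(350, 600),
                stringsAsFactors = FALSE)

n.any <- count_overlapping_sketch(A, B)                  # any overlap counts
n.big <- count_overlapping_sketch(A, B, min.bases = 60)  # require 60 shared bases
```

In this toy data the two chr1 regions of A each share 51 bases with B, so they count with the default threshold but not with min.bases = 60.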

Example: annotateRegions

regAnnotation(regions, annot.tab, ann.names, strands, descr, peak.point, gap3, gap5)

STATISTICS

overlapPermTest

[Figure: region sets A and B]

[Figure: region set A compared with B and with a randomized set B'; numeric labels shown in the figure: 4, 4, 3, 5, 4, 5, 2, 4, 0.33, 1]

Example: TIs

TIs over: 81

TIs under: 66

SCNA gains: 60

SCNA losses: 53

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 81
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  37.00   47.00   50.00   50.43   53.00   67.00
Standard score: 6.8117
P-value: 0.000999000999000999 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

overlapPermTest(TIs_over, SCNA.gains, alternative="g", genome="hg19", ntimes=1000)

Gains vs Overexpression

~800s (~13min)

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 66
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  37.00   48.00   50.00   50.18   53.00   60.00
Standard score: 4.4942
P-value: 0.000999000999000999 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

overlapPermTest(TIs_under, SCNA.losses, alternative="g", genome="hg19", ntimes=1000)

Losses vs Underexpression

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 25
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.00   39.00   41.00   41.25   44.00   52.00
Standard score: -4.2739
P-value: 1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

overlapPermTest(TIs_under, SCNA.gains, alternative="g", genome="hg19", ntimes=1000)

Gains vs Underexpression

recomputePermTest(gains.under, alternative="l")

Number of permutations: 1000
Alternative: less
Evaluation of the original region set: 25
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.00   39.00   41.00   41.25   44.00   52.00
Standard score: -4.2739
P-value: 0.000999000999000999 ***

overlapPermTest(TIs_under, SCNA.gains, alternative="l", genome="hg19", ntimes=1000)

Gains vs Underexpression
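The P-values printed above are consistent with the usual permutation estimate (k + 1) / (ntimes + 1), where k is the number of permuted evaluations at least as extreme as the observed one: 1/1001 ≈ 0.000999 when none is, and 1 when all of them are. The sketch below works under that assumption, which is inferred from the printed numbers rather than taken from the package source. The helper names are hypothetical and the permuted values are made up.

```r
# Hypothetical helpers illustrating the arithmetic behind the printed output.
perm_pvalue_sketch <- function(observed, permuted,
                               alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  extreme <- if (alternative == "greater") {
    sum(permuted >= observed)
  } else {
    sum(permuted <= observed)
  }
  (extreme + 1) / (length(permuted) + 1)
}

standard_score_sketch <- function(observed, permuted) {
  (observed - mean(permuted)) / sd(permuted)
}

# Made-up permuted evaluations covering the 30..52 range printed above
permuted <- rep(30:52, length.out = 1000)

# Observed overlap of 25, as in the gains vs underexpression example
p.greater <- perm_pvalue_sketch(25, permuted, "greater")  # all permuted values are >= 25
p.less <- perm_pvalue_sketch(25, permuted, "less")        # no permuted value is <= 25
z <- standard_score_sketch(25, permuted)                  # negative: fewer overlaps than expected
```

Switching the alternative changes only which tail is counted, which is why the same permutations can be reused without running them again.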

overlapPermTest(`10KrandomA`, `10KrandomB`, alternative="g", genome="hg19", ntimes=1000)

Random Region Sets

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 68
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  42.00   57.00   62.00   62.16   67.00   89.00
Standard score: 0.7488
P-value: 0.215784215784216
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

~1850s (~30min) (Single core)

~800s (~13min) (Parallel 4 cores)

permTest

overlapPermTest

overlap

randomRegions

distance

resampling

value of a function

permTest(A, ntimes=1000, randomize.function, evaluate.function, alternative, min.parallel=1000, force.parallel=NULL, ...)

overlapPermTest <- permTest(A, randomize.function=randomizeRegions, evaluate.function=countOverlaps)
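The plug-in structure can be sketched in base R as follows. This is a toy illustration of the idea, not the package's code: perm_test_sketch, shift_randomly and count_overlapping are hypothetical names, regions are plain data frames on a single made-up 100000-base chromosome, and parallel execution is left out entirely.

```r
# Hypothetical generic permutation test: evaluate A once, then evaluate
# ntimes randomized versions of A and compare.
perm_test_sketch <- function(A, ntimes = 1000, randomize.function,
                             evaluate.function,
                             alternative = c("greater", "less"), ...) {
  alternative <- match.arg(alternative)
  observed <- evaluate.function(A, ...)
  permuted <- vapply(seq_len(ntimes), function(i) {
    evaluate.function(randomize.function(A, ...), ...)
  }, numeric(1))
  extreme <- if (alternative == "greater") {
    sum(permuted >= observed)
  } else {
    sum(permuted <= observed)
  }
  list(observed = observed, permuted = permuted,
       pval = (extreme + 1) / (ntimes + 1))
}

# Toy randomization: move each region to a random position, keeping its width
shift_randomly <- function(A, chr.length = 100000, ...) {
  width <- A$end - A$start
  start <- floor(runif(nrow(A), min = 1, max = chr.length - width + 1))
  data.frame(start = start, end = start + width)
}

# Toy evaluation: number of regions in A overlapping at least one region in B
count_overlapping <- function(A, B, ...) {
  sum(vapply(seq_len(nrow(A)), function(i) {
    any(A$start[i] <= B$end & A$end[i] >= B$start)
  }, logical(1)))
}

set.seed(42)
B <- data.frame(start = seq(1000, 9000, by = 2000),
                end = seq(1000, 9000, by = 2000) + 500)
A <- B  # A coincides with B, so every region of A overlaps B

res <- perm_test_sketch(A, ntimes = 200,
                        randomize.function = shift_randomly,
                        evaluate.function = count_overlapping,
                        alternative = "greater", B = B)
```

Both plug-ins accept `...`, so extra arguments such as B can be passed through the test to whichever function needs them.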

Example: Genes & ALUs

1,175,329 ALUs

9,111 overexpressed genes

51,796 genes

Are overexpressed genes closer to ALUs than expected by chance?

Resampling

Mean Distance

permTest(A=expressed, B=alus, ntimes=1000, randomize.function=resampleRegions, universe=genes2, evaluate.function=meanDistance, alternative="less")

Number of permutations: 1000
Alternative: less
Evaluation of the original region set: 353.371858193393
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  912.1   992.8  1010.0  1011.0  1028.0  1095.0
Standard score: -25.0275
P-value: 0.000999000999000999 ***

CUSTOMIZATION

Available functions

Evaluation
countOverlaps
meanDistance
meanInRegions
GC content, TF binding sites, Encode classification …

Randomization
randomizeRegions
resampleRegions
GC aware randomization …

Custom functions

randomize.function(A,...)

Randomization

resampleRegions <- function(A, universe, ...) {
  # Draw as many regions as there are in A from the universe, without replacement
  resample <- universe[sample(seq_along(universe), length(A))]
  return(resample)
}

evaluate.function(A,...)

Evaluation

meanDistance <- function(A, B, ...) {
  # Distance from each region in A to its nearest region in B
  d <- distanceToNearest(A, B, ...)
  return(mean(mcols(d)$distance))
}
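The meanDistance function above relies on distanceToNearest from GenomicRanges. For readers without Bioconductor at hand, the same kind of evaluation function can be sketched in base R on plain data frames. The name mean_distance_sketch is hypothetical, and the distance is defined here as the number of bases strictly between two regions (0 if they overlap or touch), which is an assumption of this sketch.

```r
# Hypothetical base-R evaluation function: mean distance from each region
# in A to its nearest region in B on the same chromosome.
mean_distance_sketch <- function(A, B, ...) {
  d <- vapply(seq_len(nrow(A)), function(i) {
    b <- B[B$chr == A$chr[i], ]
    if (nrow(b) == 0) return(NA_real_)  # no region of B on this chromosome
    # Bases between the two regions; negative values mean they overlap
    gap <- pmax(b$start - A$end[i], A$start[i] - b$end) - 1
    min(pmax(gap, 0))
  }, numeric(1))
  mean(d, na.rm = TRUE)
}

A <- data.frame(chr = c("chr1", "chr1"),
                start = c(100, 1000),
                end = c(200, 1100),
                stringsAsFactors = FALSE)
B <- data.frame(chr = c("chr1", "chr1"),
                start = c(250, 900),
                end = c(300, 950),
                stringsAsFactors = FALSE)

# Each region of A has 49 bases between it and its nearest region of B
md <- mean_distance_sketch(A, B)
```

Because it takes the region set first and accepts `...`, a function of this shape fits the evaluate.function(A, ...) contract shown on the slide.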

HELPER FUNCTIONS

toGRanges & toDataframe

chr   start  end
chr1  2000   4000
chr1  5000   5500
chr1  10000  12000

GRanges with 3 ranges and 0 elementMetadata values
    seqnames         ranges strand |
       <Rle>      <IRanges>  <Rle> |
[1]     chr1 [ 2000,  4000]      * |
[2]     chr1 [ 5000,  5500]      * |
[3]     chr1 [10000, 12000]      * |

Seqlengths
 chr1
   NA

Genomes & Masks

getGenome(genome)

getMask(genome)

getGenomeAndMask(genome, mask)

characterToBSGenome(genome.id)

maskFromBSGenome(bsgenome)

emptyCache()

Randomization

randomizeRegions(A, genome="hg19", mask=NULL, non.overlapping=FALSE, per.chromosome=FALSE, ...)

createRandomRegions(nregions=100, length.mean=250, length.sd=20, genome="hg19", mask=NULL, non.overlapping=FALSE)
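To make the nregions, length.mean and length.sd arguments concrete, here is a base-R sketch that draws region lengths from a normal distribution and places them uniformly on a toy genome. The function name create_random_regions_sketch and the chr.lengths argument are hypothetical stand-ins; real genomes, masks and the non.overlapping option are not handled.

```r
# Hypothetical helper: random regions on a toy genome given as a named
# vector of chromosome lengths. Regions may overlap each other.
create_random_regions_sketch <- function(nregions = 100, length.mean = 250,
                                         length.sd = 20,
                                         chr.lengths = c(chr1 = 100000,
                                                         chr2 = 50000)) {
  # Normally distributed widths, rounded and forced to be at least 1 base
  widths <- pmax(1, round(rnorm(nregions, mean = length.mean, sd = length.sd)))
  # Pick chromosomes in proportion to their length
  chr <- sample(names(chr.lengths), nregions, replace = TRUE,
                prob = chr.lengths)
  # Last start position that keeps the region inside its chromosome
  max.start <- unname(chr.lengths[chr]) - widths + 1
  start <- floor(runif(nregions, min = 1, max = max.start + 1))
  data.frame(chr = chr, start = start, end = start + widths - 1,
             stringsAsFactors = FALSE)
}

set.seed(7)
chr.lengths <- c(chr1 = 100000, chr2 = 50000)
regions <- create_random_regions_sketch(nregions = 50,
                                        chr.lengths = chr.lengths)
```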

resampleRegions(A, universe, per.chromosome=FALSE, ...)

Aaaaaalmost finished

Anyone with experience in packaging for Bioconductor?

Suggestions? Requests? Improvements?

Beta Testers Wanted