Viewing the data: UCSC Genome Browser and its possibilities

Robert Kuhn, UC Santa Cruz. Variant Prediction Training Course, Johor, Malaysia. August 27-30, 2018. Viewing the data: UCSC Genome Browser and its possibilities. @GenomeBrowser


Page 1: Viewing the data: UCSC Genome Browser and its possibilities

Robert Kuhn

UC Santa Cruz

Variant Prediction Training Course

Johor, Malaysia

August 27-30, 2018

Viewing the data: UCSC Genome Browser and its possibilities

@GenomeBrowser

Page 2: Viewing the data: UCSC Genome Browser and its possibilities

Disclosures

Royalties from Browser licenses

Bioinformatics contract, Regeneron, Inc

Page 3: Viewing the data: UCSC Genome Browser and its possibilities

Funding:

National Human Genome Research Institute (NHGRI)

California Institute for Regenerative Medicine (CIRM)

QB3 (UCBerkeley, UCSF, UCSC)

Chan Zuckerberg Initiative

Howard Hughes Medical Institute

Page 4: Viewing the data: UCSC Genome Browser and its possibilities

UCSC Browser team

– David Haussler – co-PI

– Jim Kent – Browser Concept, BLAT, Team Leader, PI

Engineering: Angie Hinrichs, Kate Rosenbloom, Hiram Clawson, Galt Barber, Brian Raney, Max Haeussler, Jonathan Casper, Christopher Lee

QA, Docs, Support: Brian Lee, Matt Speir, Jairo Navarro, Chris Villarreal, Lou Nassar, Daniel Smelter, Conner Powell

KiloKluster, Sys-admin: Jorge Garcia, Erich Weiler, Haifang Talc

Management: Ann Zweig

Page 5: Viewing the data: UCSC Genome Browser and its possibilities

UCSC = UC Santa Cruz

UCSC != USC, USCS, UCSF, UCSD ...

Page 6: Viewing the data: UCSC Genome Browser and its possibilities
Page 7: Viewing the data: UCSC Genome Browser and its possibilities

How do you visualize your Next-Gen Sequencing data?

Page 8: Viewing the data: UCSC Genome Browser and its possibilities

120 assemblies

30,000 genes

54,000,000 SNPs

71,000,000 mRNAs

3,000,000,000 nucleotides

80 organisms

DECIPHER

OMIM

CNVs

array probesets

Microarray expression

7,000,000,000 people

1000 Genomes Project

Page 9: Viewing the data: UCSC Genome Browser and its possibilities

http://genome-asia.ucsc.edu

http://bit.ly/ucscMalaysia2018

>>>>>

Page 10: Viewing the data: UCSC Genome Browser and its possibilities

UCSC Genome Browser

Display engine for genomic annotations.

Consistent interface across genomes.

A tool for inquiry-driven discovery.

Page 11: Viewing the data: UCSC Genome Browser and its possibilities
Page 12: Viewing the data: UCSC Genome Browser and its possibilities

YouTube training channel: bit.ly/ucscVideos

On-site workshops (training link on main page): http://genome.ucsc.edu/training/

Page 13: Viewing the data: UCSC Genome Browser and its possibilities

Variation

Watson: 2.06 million SNPs

Venter: 3.32 million SNPs

1.19 million in common

In 20 kb myoglobin region on chr22, Watson and Venter share 20 SNPs. Watson has 9 unique SNPs, Venter 6.
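
The counts on this slide can be turned into per-genome unique and combined totals with simple set arithmetic. The sketch below only reuses the numbers quoted above; the variable names are illustrative, and rounding to two decimals is an assumption made to match the precision on the slide.

```python
# Genome-wide SNP counts quoted on the slide, in millions.
watson_total = 2.06
venter_total = 3.32
shared = 1.19

# SNPs found in only one of the two genomes, and in either genome.
watson_only = round(watson_total - shared, 2)
venter_only = round(venter_total - shared, 2)
union = round(watson_only + venter_only + shared, 2)

# Same bookkeeping for the 20 kb myoglobin region on chr22.
region_shared, region_watson_only, region_venter_only = 20, 9, 6
region_watson_total = region_shared + region_watson_only
region_venter_total = region_shared + region_venter_only

print(f"Genome-wide: Watson-only {watson_only} M, Venter-only {venter_only} M, either {union} M")
print(f"Myoglobin region: Watson {region_watson_total} SNPs, Venter {region_venter_total} SNPs")
```

With these inputs, the myoglobin region holds 29 SNPs for Watson and 26 for Venter, 35 distinct sites in total.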

Page 14: Viewing the data: UCSC Genome Browser and its possibilities

Variation

Watson: 2.06 million SNPs

Venter: 3.32 million SNPs

1.19 million in common

Page 15: Viewing the data: UCSC Genome Browser and its possibilities

Variation

Watson: 2.06 million SNPs

Venter: 3.32 million SNPs

1.19 million in common

Page 16: Viewing the data: UCSC Genome Browser and its possibilities

support for (some) HGVS:

NM_198056.2:c.1654G>T

NP_002993.1:p.Asp92Glu

NP_002993.1:p.D92E

BRCA1 Ala744Cys / BRCA1 p744

http://genome.soe.ucsc.edu/goldenPath/help/query.html#HGVS

= bit.ly/ucscHGVS
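
To show what the accession-based terms above encode, here is a minimal Python sketch that splits simple substitution terms into their parts. The regular expressions and the `parse_hgvs` helper are hypothetical and cover only the substitution forms listed on this slide. They are not the Browser's own parser, and the full HGVS grammar is much richer.

```python
import re

# Simplified patterns for illustration only (assumption: substitutions only).
CODING_SUBSTITUTION = re.compile(
    r"^(?P<accession>N[MR]_\d+\.\d+):c\.(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)
PROTEIN_SUBSTITUTION = re.compile(
    r"^(?P<accession>NP_\d+\.\d+):p\.(?P<ref>[A-Z][a-z]{2}|[A-Z])"
    r"(?P<position>\d+)(?P<alt>[A-Z][a-z]{2}|[A-Z])$"
)


def parse_hgvs(term):
    """Return a dict describing a simple HGVS substitution, or None if unrecognized."""
    patterns = (("coding", CODING_SUBSTITUTION), ("protein", PROTEIN_SUBSTITUTION))
    for kind, pattern in patterns:
        match = pattern.match(term.strip())
        if match:
            fields = match.groupdict()
            fields["kind"] = kind
            fields["position"] = int(fields["position"])
            return fields
    return None


for term in ("NM_198056.2:c.1654G>T", "NP_002993.1:p.Asp92Glu", "NP_002993.1:p.D92E"):
    print(term, "->", parse_hgvs(term))
```

Gene-symbol shorthand such as "BRCA1 Ala744Cys" is deliberately left unrecognized by this sketch and returns `None`.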

Page 17: Viewing the data: UCSC Genome Browser and its possibilities

And now: HGVS output from Variant Annotation Integrator:

Page 18: Viewing the data: UCSC Genome Browser and its possibilities

Data pipeline

100K genomes (rare disease or cancer)

BAM track

bamToBedGraph

bedGraph track

bedGraphToBigWig

bamToBigWig

Wiggle track

Slide modified from: Tim Hubbard, King’s College, London
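
The BAM to bedGraph to bigWig route on this slide can be sketched as a list of shell commands. This is one possible realization under stated assumptions: `bedtools genomecov -bg` stands in for the slide's "bamToBedGraph" step, the bedGraph is sorted before conversion, and `bedGraphToBigWig` is the UCSC utility named on the slide. The helper name `coverage_commands` and all file names are hypothetical. The function only builds the command strings and does not run them.

```python
import shlex


def coverage_commands(bam_path, chrom_sizes_path, prefix):
    """Build shell commands turning a BAM file into a bigWig coverage track."""
    q = shlex.quote
    bedgraph = f"{prefix}.bedGraph"
    sorted_bedgraph = f"{prefix}.sorted.bedGraph"
    bigwig = f"{prefix}.bw"
    return [
        # Per-base coverage in bedGraph format (stand-in for "bamToBedGraph").
        f"bedtools genomecov -ibam {q(bam_path)} -bg > {q(bedgraph)}",
        # bedGraphToBigWig expects input sorted by chromosome, then start.
        f"LC_ALL=C sort -k1,1 -k2,2n {q(bedgraph)} > {q(sorted_bedgraph)}",
        # Convert to the indexed binary bigWig format.
        f"bedGraphToBigWig {q(sorted_bedgraph)} {q(chrom_sizes_path)} {q(bigwig)}",
    ]


for command in coverage_commands("sample.bam", "hg19.chrom.sizes", "sample"):
    print(command)
```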

Page 19: Viewing the data: UCSC Genome Browser and its possibilities

Data pipeline

100K genomes (rare disease or cancer)

BAM track

VCF track

Variant Annotation Integrator

Slide modified from: Tim Hubbard, King’s College, London

Page 20: Viewing the data: UCSC Genome Browser and its possibilities

How do you visualize your Next-Gen Sequencing data? -- BAM file

millions of short reads

files too large to upload (timeout)

mismatches to reference are in red

Custom Track

Raw Reads -> (aligner / SAMtools) -> SAM -> BAM -> (index / SAMtools) -> .bai
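
The SAM to BAM to .bai steps can be sketched with two SAMtools calls. The helper `bam_index_commands` and the file names are hypothetical, and the sketch assumes a SAMtools release (1.3 or later) in which `samtools sort -o` accepts SAM input and writes BAM based on the output extension. It builds the argument lists without executing them.

```python
def bam_index_commands(sam_path, bam_path):
    """Build SAMtools commands for a coordinate-sorted BAM and its .bai index."""
    return [
        # Sort alignments by coordinate and write them as BAM.
        ["samtools", "sort", "-o", bam_path, sam_path],
        # Index the sorted BAM; this writes <bam_path>.bai next to it.
        ["samtools", "index", bam_path],
    ]


commands = bam_index_commands("reads.sam", "reads.sorted.bam")
for argv in commands:
    print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` on a machine where SAMtools is installed.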

Custom Track: Make the BAM and .bai files available on the web (http:, https: or ftp:)

Upload only the location of the data.

track name=trackName type=bam bigDataUrl=http://path/file.bam
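
A small helper can assemble the track line above and reject locations that are not reachable over the protocols the slide lists. The function name `bam_track_line` is hypothetical. Quoting names that contain spaces is an assumption based on general custom-track line conventions.

```python
from urllib.parse import urlparse

# Protocols listed on the slide for hosting BAM and .bai files.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def bam_track_line(name, big_data_url):
    """Return a custom track line pointing the Browser at a remote BAM file."""
    scheme = urlparse(big_data_url).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"bigDataUrl must use http, https or ftp: {big_data_url!r}")
    # Quote the name if it contains whitespace so it stays a single value.
    if any(ch.isspace() for ch in name):
        name = f'"{name}"'
    return f"track name={name} type=bam bigDataUrl={big_data_url}"


print(bam_track_line("trackName", "http://path/file.bam"))
```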

The Browser fetches only a tiny portion of the file
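
Fetching a small slice of a remote file is commonly done with an HTTP byte-range request. The sketch below builds such a request with Python's standard library. It is a generic illustration and assumes nothing about the Browser's internal code; the helper name `range_request` and the byte offsets are hypothetical.

```python
from urllib.request import Request


def range_request(url, start, length):
    """Build an HTTP request asking for `length` bytes beginning at byte `start`."""
    end = start + length - 1  # Range headers use inclusive end offsets.
    return Request(url, headers={"Range": f"bytes={start}-{end}"})


request = range_request("http://path/file.bam", 0, 65536)
print(request.get_header("Range"))
```

The request is only constructed here. Sending it with `urllib.request.urlopen(request)` would return just the requested bytes when the server supports range requests.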

http://samtools.sourceforge.net/

sample ChIP-seq data courtesy Charles Nicolet, UC Davis

drag / zoom

Page 21: Viewing the data: UCSC Genome Browser and its possibilities

zoom to base level

view alignment details

possible heterozygotes

lower quality scores shown in lighter color

SNP track

homozygous mismatch to reference is same as a known non-synonymous SNP

UCSC genes

Page 22: Viewing the data: UCSC Genome Browser and its possibilities

Display

Read alignments: BAM, CRAM

Coverage: (BAM), wiggle

Variants: pgSNP, VCF, (HGVS, rs#)

Predictive

Variant Annotation Integrator

SIFT, PolyPhen, Mutation{Taster,Assessor} …

Data tracks

Benign, Pathogenic

CNVs, SNPs

Page 23: Viewing the data: UCSC Genome Browser and its possibilities

live demo:

http://genome-asia.ucsc.edu

FGF2 FGFR1 FGFR2 FGFR3 FGFR4