Page 1: Integrated sequence analysis of pancreatic cancerlabsergen.langebio.cinvestav.mx/bioinformatics/jacob/meetings/tippi… · In-House software & analysis pipelines In-House software

Tipping Point Meeting - 1st December 2010

Integrated sequence analysis of pancreatic cancer

Queensland Centre for Medical Genomics

Sequencing Group

Dr Brooke Gardiner

Page 2

Acknowledgements

QCMG: Sean Grimmond, Peter Wilson

Genome Biology: Nicole Cloonan, Karin Kassahn, Nic Waddell, Anita Steptoe, Shivangi Wani, Keerthana Krishnan, Mellissa Brown, Nick Matigan, Rathi Thiagarajan

Bioinformatics: John Pearson, Darrin Taylor, David Tang, Conrad Leonard, Jason Steen, Christina Xu, Matt Anderson, David Wood

Genome Sequencing: Brooke Gardiner, David Miller, Craig Nourse, Suzanne Manning, Ehsan Nourbakhsh, Ivon Harliwong, Senel Idrisoglu

Life Technologies: Gabriel Kolle, John Davis, John Sheppard, Kevin McKernan, Yongming Sun, Beverly R&D, Foster City R&D

Array Facility (IMB): Katia Nones, Rebecca Foale

HPC (UQ): Lutz Pross, Ziping Fang, David Green

Garvan Institute: Andrew Biankin, Amber Johns, Chris Scarlett, Mark Pinese, David Chang, Michelle Thomas, Chris Toon, Mary-Anne Brancato, Cathy Axford, Emily Colvin, Amanda Mawson, Johana Susanto, Rob Sutherland, Sue Henshall, Liz Musgrove, Roger Daly

Page 3

QCMG sequencing and analysis facilities

Computing: 3 servers, 400 cores, 3 TB RAM, 1 PB storage, 10G network connectivity

Workflow: Manual library prep; automated emulsion PCR & enrichment; robotic library assembly & enrichment (Agilent Bravo, Beckman SPRI)

Sequencers: 11 SOLiD Genome Sequencers V4

Technology development: SOLiD HQ (250 Gb Q4-09, 500 Gb Q1-10), Ion Torrent, … ? ….

Laboratories: 1200 m2 dedicated laboratory space (5th and 6th floors, IMB)

Personnel: 41 bioinformaticians, genomics experts & genome biologists

[Photos: Laboratories; Informatics Personnel]

Page 4

ICGC Global Participants

ICGC Goal: To obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical significance across the globe

Page 5

Cancer is driven by the accumulation of genetic & epigenetic changes

Page 6

Pancreatic Cancer Study

Patient timeline: Presentation, Diagnosis, Treatment Plan, Surgery, Adjuvant Therapy, Recurrence, Death

Recruitment: Patient consent; sample collection (blood); recording of serum markers; recording of pathological data

Pancreatic resection surgery: Sample collection; operative data recording; xenograft generation

Repeated at two further points on the timeline: Sample collection (blood); recording of serum markers; recording of pathological data

Death: Date and cause

Page 7

Cancer Genome Workflow

Workflow stages: Surgery; Pathological Review; Tumour Dissection; Sample Submission; Library Preparation; Sequencing Preparation; Sequencing; Tag Mapping; Microarray; Data Generation; Data Collation; Data Analysis; Data Submission

Pathological review: Aus, USA, Italy; estimation of tumour content

Tumour dissection: Macro-dissection of tumour tissue; xenograft & cell line generation

Sample submission: Agilent Bioanalyzer, NanoDrop 1000, Qubit, electrophoretic fractionation

Laboratory tracking: LIMS (Geneus)

Library preparation:
DNA: whole genome; methylome; exome (Fragment; BC Fragment; Mate Pair)
RNA: miRNA; whole transcriptome (Total Transcriptome BC Fragment)

Sequencing preparation: Emulsion PCR; bead enrichment

Sequencing: Applied Biosystems SOLiD Analyzer

Tag mapping: Applied Biosystems BioScope; in-house software & analysis pipelines

Data analysis: In-house software & analysis pipelines

Microarray:
CNV/SNP: Illumina HumanOmni1-Quad SNP
Expression: Illumina HumanHT-12

1. Independent pathological review in Australia, Italy and USA.

2. 5 mm3 frozen blocks of tumour are sectioned to locate non-tumour tissue.

3. Tumour-rich regions are dissected from the block and DNA/RNA is extracted.

Page 8

Sample availability from participating patients (DNA/RNA submitted to QCMG)

Patients represented in QCMG collection: Pancreatic 63; Ovarian 60

[Figure: sample types available per patient, grouped under Direct Sample Processing, Enrichment Processing, and Validation & surveillance]

Sample type key: A = Adjacent Normal; N = Normal; T = Tumour; X = Xenograft; C = Cell line; S = Serum

Page 9

Cancer Genome Workflow

Sample submission: Agilent Bioanalyzer, NanoDrop 1000, Qubit, electrophoretic fractionation

Laboratory tracking: LIMS (Geneus)

[Figure: traces for samples ICGC-PICI-20100225.01-TD and ICGC-PICI-20100225.02-TR; labels: small RNAs; 18S & 28S rRNA; high molecular weight DNA]

Page 10

Cancer Genome Workflow

Microarray:
CNV/SNP: Illumina HumanOmni1-Quad SNP
Expression: Illumina HumanHT-12

Tumour vs Xenograft

[Figure panels: Human cells isolated from xenograft; Primary tumour (25% cellularity)]

Require: >80% sensitivity & 95% accuracy
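A rough sketch of why 25% cellularity makes the sensitivity requirement demanding. It assumes a heterozygous somatic mutation in a diploid region with no copy-number change, and models read sampling as binomial. The depths and the minimum-read threshold are hypothetical values chosen for illustration, not figures from the slides.

```python
from math import comb


def expected_variant_fraction(cellularity: float) -> float:
    """Expected fraction of reads carrying a heterozygous somatic variant.

    Assumption: tumour and normal cells are both diploid and the locus has
    no copy-number change, so one of two alleles in each tumour cell is mutant.
    """
    return cellularity / 2.0


def detection_probability(depth: int, variant_fraction: float, min_reads: int) -> float:
    """P(at least `min_reads` variant reads at `depth`) under a binomial model."""
    p_fewer = sum(
        comb(depth, k) * variant_fraction**k * (1 - variant_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


vaf = expected_variant_fraction(0.25)  # 25% cellularity -> 0.125
for depth in (30, 100):  # hypothetical sequencing depths
    print(depth, round(detection_probability(depth, vaf, min_reads=4), 3))
```

Under these assumptions only about one read in eight carries the mutation, so the chance of seeing enough supporting reads depends strongly on depth.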

Page 11

Cancer Genome Workflow

Library preparation:
DNA: whole genome; methylome; exome (Fragment; BC Fragment; Mate Pair)
RNA: miRNA; whole transcriptome (Total Transcriptome BC Fragment)

Sequencing targets:

Whole Genome: Tumor tissue & normal; 30-40x (LMP / PE)

Exome: Tumor tissue & normal; >>100x (PE)

mRNA & miRNA: Tumor tissue & adjacent normal; 100 million (PE) / 10 million reads (SE)

Methylome (Methyl-capture): Tumor & adjacent normal; ~20 million reads (1 Gb) (PE)
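The coverage targets above translate into read counts with simple arithmetic. The sketch below takes the methylome figure at face value (~20 million reads for 1 Gb) to infer bases per read, then estimates reads needed for 30-40x whole-genome coverage. The genome size of 3.1 Gb is an assumption, and the function names are illustrative only.

```python
def bases_per_read(total_bases: float, n_reads: float) -> float:
    """Average number of sequenced bases contributed by each read."""
    return total_bases / n_reads


def reads_for_coverage(target_size: float, fold: float, read_len: float) -> float:
    """Reads needed so that total sequenced bases = fold x target size."""
    return target_size * fold / read_len


# Methylome line: ~20 million reads, ~1 Gb of sequence.
read_len = bases_per_read(1e9, 20e6)

GENOME_SIZE = 3.1e9  # assumed human genome size in bases (not from the slides)
for fold in (30, 40):
    n = reads_for_coverage(GENOME_SIZE, fold, read_len)
    print(f"{fold}x whole genome: about {n / 1e9:.2f} billion reads of {read_len:.0f} bases")
```

This ignores unmapped and filtered reads, so real runs would need more raw reads than the estimate.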

Page 12

Exome SNV calling

Whole Genome; SureSelect Exome

1. Map tags to genome
2. Filter sequence tags
3. Identification of SNVs
4. Annotate SNVs (e.g. dbSNP, non-synonymous, somatic)
5. Verify SNVs (e.g. SNP chip, RNASeq, Sanger sequencing)

Tag mapping: Applied Biosystems BioScope; in-house software & analysis pipelines

Page 13

Raw data generation and mapping pipelines

Total Raw ~ 7.5 Tb

Exome capture coverage rates (post filter, high quality)

Exome – 100 Patient (ND/TD) paired LMP – 500 Patient (ND/TD/+) sets

Page 14

Detecting variations by sequencing

Structural Variation; Copy Number Variation

Substitutions, Insertions, Deletions

Page 15

BAM processing for variant/mutation calling

PCR duplicates are marked using Picard

In-house tools for manipulating and profiling .bams (qbamMerge, qbamSplit, qbamFilter, qProfiler)

Pre-filter: alignment length > 34 or (F5 and in proper pair), mapping quality > 14, fewer than 3 mismatches

Variant caller: diBayes (Bioscope 1.2)

Post-filter: coverage, > 3 novel starts supporting mutation/variant, mutation/variant not in pileup of matched normal, not a germline variant in another patient, review in IGV (non-syn, stop)

All variants/mutations called are retained in the in-house database, even if they failed a filter (classA vs classB)
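The pre-filter and post-filter rules above can be sketched as plain predicates. This is not the QCMG implementation (qbamFilter and the in-house database are not shown in the slides); the `Read` record and function names are hypothetical. Two readings are assumed: the length test is grouped as "(length > 34 or (F5 and proper pair))", and classA means "passed every filter" while classB means "failed at least one". The coverage filter and manual IGV review are left out.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical minimal view of one aligned tag."""
    start: int                 # leftmost mapped position
    aligned_length: int
    mapq: int
    mismatches: int
    is_f5: bool = False        # tag comes from the F5 end of a paired run
    proper_pair: bool = False


def passes_prefilter(r: Read) -> bool:
    """Pre-filter as read from the slide (grouping of 'or' is an assumption)."""
    length_ok = r.aligned_length > 34 or (r.is_f5 and r.proper_pair)
    return length_ok and r.mapq > 14 and r.mismatches < 3


def novel_starts(reads) -> int:
    """Number of distinct start positions among the supporting reads."""
    return len({r.start for r in reads})


def classify(supporting, seen_in_normal: bool, germline_elsewhere: bool) -> str:
    """Label a candidate variant; failing calls are kept, only labelled differently."""
    kept = [r for r in supporting if passes_prefilter(r)]
    ok = novel_starts(kept) > 3 and not seen_in_normal and not germline_elsewhere
    return "classA" if ok else "classB"


reads = [Read(start=s, aligned_length=50, mapq=30, mismatches=0) for s in (100, 101, 102, 103)]
print(classify(reads, seen_in_normal=False, germline_elsewhere=False))
```

Counting distinct start positions rather than raw reads keeps a pile of identical tags from being treated as independent evidence.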

Page 16

Exome SNV calling

Counts are given as whole-genome shotgun / exome-capture.

Total number of SNVs in a patient: ~3,200,000 / ~73,000

Germline SNVs: ~2,400,000 / ~36,000
Number within an ORF: ~18,000 / ~13,000
SNV consequence: Splice site ~2,100 / ~1,300; Non-synonymous ~9,000 / 6,600; Stop gained 129 / 84

Somatic SNVs: ~7,000 / 124
Number within an ORF: 57 / 66
SNV consequence: Splice site 8 / 7; Non-synonymous 43 / 48; Stop gained 3 / 2

Page 17

Exome SNV calling

Possibly damaging genes from class A: CDON, EXT1, HOXB2, KIAA1199, KRAS, NPC1, TMEM74, XKR3

Probably damaging genes from class A: BIRC6, CGN, CTTNBP2NL, DDX56, ERC1, MPDZ, MPP4, MYH11, OR6A2, PPFIBP2, RASSF5, SLC4A11, SNX13, SPOCK1, TLR7, TMEM22, TSHZ3, ZC3H11A, ZNF318

SNV | Total | Class A
Non-synonymous | 1462 | 48
Annotated in PolyPhen | 1250 | 42
Benign | 514 | 13
Possibly damaging | 224 | 8
Probably damaging | 437 | 20
Unknown | 75 | 1
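As a quick consistency check on the PolyPhen table, the four prediction categories should add up to the "Annotated in PolyPhen" row. The sketch below only re-tallies numbers from the slide; the variable names are illustrative.

```python
# (total, class A) counts copied from the slide
polyphen = {
    "Benign": (514, 13),
    "Possibly damaging": (224, 8),
    "Probably damaging": (437, 20),
    "Unknown": (75, 1),
}
annotated = (1250, 42)
non_synonymous = (1462, 48)

# Column sums over the four PolyPhen categories
category_sums = tuple(sum(column) for column in zip(*polyphen.values()))

# Non-synonymous SNVs without a PolyPhen annotation
unannotated = tuple(n - a for n, a in zip(non_synonymous, annotated))

# Share of annotated class A SNVs predicted possibly or probably damaging
damaging_class_a = polyphen["Possibly damaging"][1] + polyphen["Probably damaging"][1]
damaging_share = damaging_class_a / annotated[1]

print(category_sums == annotated, unannotated, round(damaging_share, 2))
```

The categories sum to the annotated totals in both columns, which suggests the table was transcribed intact.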

Page 18

Commonly targeted pathways: Jones et al

Pathway | # genes | Representative altered genes
Apoptosis | 9 | CASP10, VCP, CAD, HIP1
DNA damage control | 9 | ERCC4, ERCC6, EP300, RANBP2, TP53
Regulation of G1/S phase transition | 19 | CDKN2A, FBXW7, CHD1, APC2
Hedgehog signaling | 19 | TBX5, SOX3, LRP2, GLI1, GLI3, BOC, BMPR2, CREBBP
Homophilic cell adhesion | 30 | CDH1, CDH10, CDH2, CDH7, FAT, PCDH15, PCDH17, PCDH18, PCDH9, PCDHB16, PCDHB2, PCDHGA1, PCDHGA11, PCDHGC4
Integrin signaling | 24 | ITGA4, ITGA9, ITGA11, LAMA1, LAMA4, LAMA5, FN1, ILK
c-Jun N-terminal kinase signaling | 9 | MAP4K3, TNF, ATF2, NFATC3
KRAS signaling | 5 | KRAS, MAP2K4, RASGRP3
Regulation of invasion | 46 | ADAM11, ADAM12, ADAM19, ADAM5220, ADAMTS15, DPP6, MEP1A, PCSK6, APG4A, PRSS23
Small GTPase–dependent signaling (other than KRAS) | 33 | AGHGEF7, ARHGEF9, CDC42BPA, DEPDC2, PLCB3, PLCB4, RP1, PLXNB1, PRKCG
TGF-β signaling | 37 | TGFBR2, BMPR2, SMAD4, SMAD3
Wnt/Notch signaling | 29 | MYC, PPP2R3A, WNT9A, MAP2, TSC2, GATA6, TCF4

Page 19

Commonly targeted pathways: Jones et al; Overlap with QCMG

[Table repeated from Page 18 (pathway, # genes, representative altered genes). The slide marks which genes overlap with QCMG; that marking is not recoverable from the transcript.]

Page 20

Commonly targeted pathways: Overlap with QCMG; New QCMG

Pathway | # genes | Representative altered genes
Apoptosis | 9 | CASP10, VCP, CAD, HIP1, CASP3, IRAK4, PIK3CD, TNFRSF1A
DNA damage control | 9 | ERCC4, ERCC6, EP300, RANBP2, TP53
Regulation of G1/S phase transition | 19 | CDKN2A, FBXW7, CHD1, APC2, RB
Hedgehog signaling | 19 | TBX5, SOX3, LRP2, GLI1, GLI3, BOC, BMPR2, CREBBP, DYRK1A
Homophilic cell adhesion | 30 | CDH1, CDH10, CDH2, CDH7, FAT, PCDH15, PCDH17, PCDH18, PCDH9, PCDHB16, PCDHB2, PCDHGA1, PCDHGA11, PCDHGC4, PCDHGB7, THBS4, FARP2
Integrin signaling | 24 | ITGA4, ITGA9, ITGA11, LAMA1, LAMA4, LAMA5, FN1, ILK
c-Jun N-terminal kinase signaling | 9 | MAP4K3, TNF, ATF2, NFATC3
KRAS signaling | 5 | KRAS, MAP2K4, RASGRP3, AKAP9, PDE1C
Regulation of invasion | 46 | ADAM11, ADAM12, ADAM19, ADAM5220, ADAMTS15, DPP6, MEP1A, PCSK6, APG4A, PRSS23
Small GTPase–dependent signaling (other than KRAS) | 33 | AGHGEF7, ARHGEF9, CDC42BPA, DEPDC2, PLCB3, PLCB4, RP1, PLXNB1, PRKCG
TGF-β signaling | 37 | TGFBR2, BMPR2, SMAD4, SMAD3, TGFB1, EP300
Wnt/Notch signaling | 29 | MYC, PPP2R3A, WNT9A, MAP2, TSC2, GATA6, TCF4, NCSTN, NOTCH2, CAMK2D, SENP2
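Because the slide's own marking of "New QCMG" genes is lost in the transcript, one way to recover candidates is to subtract the Jones et al gene lists (Page 18) from the extended lists on this page. The sketch below does that for four pathways. It assumes every gene added on this page is a "New QCMG" gene, which the transcript does not confirm.

```python
# Representative genes from the Jones et al table (Page 18)
jones = {
    "Apoptosis": {"CASP10", "VCP", "CAD", "HIP1"},
    "Regulation of G1/S phase transition": {"CDKN2A", "FBXW7", "CHD1", "APC2"},
    "KRAS signaling": {"KRAS", "MAP2K4", "RASGRP3"},
    "TGF-β signaling": {"TGFBR2", "BMPR2", "SMAD4", "SMAD3"},
}

# Same pathways as listed on this page (Page 20)
with_qcmg = {
    "Apoptosis": jones["Apoptosis"] | {"CASP3", "IRAK4", "PIK3CD", "TNFRSF1A"},
    "Regulation of G1/S phase transition": jones["Regulation of G1/S phase transition"] | {"RB"},
    "KRAS signaling": jones["KRAS signaling"] | {"AKAP9", "PDE1C"},
    "TGF-β signaling": jones["TGF-β signaling"] | {"TGFB1", "EP300"},
}

# Genes present on Page 20 but absent from the Page 18 list, per pathway
new_qcmg = {pathway: sorted(with_qcmg[pathway] - jones[pathway]) for pathway in jones}

for pathway, genes in new_qcmg.items():
    print(f"{pathway}: {', '.join(genes)}")
```

Note that EP300 is new under TGF-β signaling here but already appears under DNA damage control in the Jones et al list, so a per-pathway difference and a whole-table difference would give different answers.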

Page 21

Transcriptome analysis

Gene-centric analysis:

• Count reads mapping to exons
• Normalize/scale counts
• Analyze differential expression
• Array correlation 0.75-0.8
• Sensitivity < 1 RNA/cell
• Arrays: 8,500 genes active
• RNAseq: 12,008 genes active
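The first three gene-centric steps can be sketched as follows. The slide does not say which normalisation was used; RPKM (reads per kilobase of exon per million mapped reads) is swapped in here as one common choice, and the log2 fold change with a pseudocount stands in for a real differential-expression test. All counts are made-up illustration values.

```python
import math


def rpkm(exon_read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads."""
    return exon_read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def log2_fold_change(a: float, b: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of two expression values; the pseudocount avoids division by zero."""
    return math.log2((a + pseudocount) / (b + pseudocount))


# Hypothetical gene: 2 kb of exon sequence, two libraries of different depth
tumour = rpkm(exon_read_count=1500, exon_length_bp=2000, total_mapped_reads=10_000_000)
normal = rpkm(exon_read_count=900, exon_length_bp=2000, total_mapped_reads=12_000_000)
print(round(tumour, 1), round(normal, 1), round(log2_fold_change(tumour, normal), 2))
```

A fold change alone says nothing about significance; a real analysis would use replicates and a statistical model.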

Nucleotide resolution:

• Identify expressed variants
• Split variants into expressed germline vs somatic events
• Powerful validation tool for mutations predicted by WGS and exome-seq
• Potential for studying allele-specific expression and RNA editing

Page 22

Expression of somatic variants

[Figure panels: Germline; Tumour; Expression]

KRAS activating mutation C>T (G12D)
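A short sketch of how a C>T change can correspond to G12D. It assumes the slide reports the change on the forward genomic strand; KRAS is transcribed from the reverse strand, and codon 12 is GGT (Gly) in the coding sequence. The helper names are illustrative.

```python
# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def consequence(ref_codon: str, alt_codon: str) -> str:
    """Coarse SNV consequence label based on the encoded amino acids."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    return "non-synonymous"  # also covers stop-lost in this simplified sketch


coding_ref = "GGT"                                    # KRAS codon 12, Gly
forward_ref = reverse_complement(coding_ref)          # codon as seen on the forward strand
forward_alt = forward_ref[0] + "T" + forward_ref[2]   # C>T at the middle base
coding_alt = reverse_complement(forward_alt)          # back to the coding strand

print(forward_ref, "->", forward_alt, "|", coding_ref, "->", coding_alt)
print(CODON_TABLE[coding_ref], "->", CODON_TABLE[coding_alt], consequence(coding_ref, coding_alt))
```

The same `consequence` helper gives the non-synonymous and stop-gained labels used when annotating SNVs.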

Page 23

Cancer Genome Report: APGI-1959

Cancer Research Program

Page 24

Acknowledgements (same credits as on Page 2)