1 cse dept., university of connecticut 2 immunology dept., uchc

Analysis Pipeline

[Figure: pipeline flowchart. Data: genome sequence, consensus coding sequences (CCDS), mRNA reads, known SNPs, genome mapped reads, CCDS mapped reads, merged mapped reads, expressed SNPs, immunogenic mutations. Steps: genome mapping, CCDS mapping, read merging, SNP calling, epitope prediction.]

Mapping Reads

• We maximize the number of accurately mapped reads by using Maq to map them against both the reference genome and the reference transcripts (from the CCDS database)

• To combine the read mapping results we implemented two approaches, called hard merging and soft merging

– Hard merging throws away reads that are mapped uniquely by one procedure and to multiple places by the other, while soft merging keeps the unique alignment for these reads

– Both merging methods keep reads mapped uniquely to the same place by both mapping procedures and reads mapped by one procedure and not mapped by the other

– Both methods throw away reads mapped uniquely to different places by both mapping procedures and reads mapped multiple times by both procedures
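The hard and soft merging rules can be illustrated with a minimal Python sketch. It is not the poster's implementation: the function name `merge_read`, the `MULTI` marker, and the representation of a hit are hypothetical. It assumes transcript alignments have already been converted to genome coordinates so that "the same place" can be tested by equality, and it assumes that a read mapped to multiple places by one procedure and unmapped by the other is discarded, a case the rules above do not spell out.

```python
from typing import Optional, Tuple, Union

Position = Tuple[str, int]        # (chromosome, position) in genome coordinates
MULTI = "multi"                   # hypothetical marker: read mapped to multiple places
Hit = Union[None, str, Position]  # None = unmapped, MULTI, or a unique position


def merge_read(genome_hit: Hit, transcript_hit: Hit, soft: bool = False) -> Optional[Position]:
    """Return the alignment kept for one read, or None if the read is discarded."""
    hits = (genome_hit, transcript_hit)
    unique = [h for h in hits if h is not None and h != MULTI]

    if len(unique) == 2:
        # Unique in both procedures: keep only if both agree on the place.
        return unique[0] if unique[0] == unique[1] else None

    if len(unique) == 1:
        # Unique in one procedure; the other is either unmapped or multiply mapped.
        if MULTI in hits and not soft:
            return None           # hard merging discards unique-vs-multiple reads
        return unique[0]          # soft merging (or unmapped partner) keeps it

    # Multiple in both, multiple plus unmapped (assumption), or unmapped in both.
    return None
```

For example, `merge_read(("chr1", 100), MULTI)` discards the read, while `merge_read(("chr1", 100), MULTI, soft=True)` keeps the unique alignment.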

SNP Calling Methods

• Binomial: binomial test used in [3,7] for calling SNPs from genomic DNA

– Binomial probability based on the two highest allele counts under the null hypothesis that the genotype is heterozygous

• Posterior: uses base quality scores and read mapping probabilities

– The conditional probability of observing the read data given each possible genotype G is computed as a product of read contributions, assuming independence between reads. A base b mapped with error probability e_b contributes as follows:

   • For G = XX: 1 - e_b if X = b, e_b/3 otherwise

   • For G = XY, X ≠ Y: (1 - e_b)/2 + e_b/6 if X = b or Y = b, e_b/3 otherwise

– Maq mapping probabilities are taken into account by raising the corresponding term to the probability that the read is mapped correctly at this location

– The posterior probability of each genotype is then evaluated assuming uniform priors. A variant is called if the genotype with the highest posterior probability is different from homozygous reference and its posterior probability exceeds a user-specified threshold
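Both calling rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the poster's code: all function names are hypothetical. For the binomial method the text does not give the exact statistic, so the sketch assumes it is the probability of the lower of the two highest allele counts under Binomial(n, 0.5), reported both as a point ("exact") and as a lower-tail ("cumulative") value. The posterior method follows the per-base contributions listed above, with each term raised to the read's mapping probability and uniform priors over the ten unordered genotypes.

```python
from itertools import combinations_with_replacement
from math import comb, exp, log

BASES = "ACGT"


def binomial_probs(count1, count2):
    """Point and lower-tail probabilities of the smaller of the two highest
    allele counts under the heterozygous null Binomial(n, 0.5) (assumed form)."""
    n = count1 + count2
    k = min(count1, count2)
    exact = comb(n, k) * 0.5 ** n
    cumulative = sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return exact, cumulative


def genotype_posteriors(observations):
    """observations: list of (base, base_error_prob, mapping_prob) at one site.
    Returns a dict mapping each genotype (e.g. "AG") to its posterior."""
    log_lik = {}
    for x, y in combinations_with_replacement(BASES, 2):
        total = 0.0
        for b, e, m in observations:
            e = min(max(e, 1e-10), 1 - 1e-10)  # keep log() finite (assumption)
            if x == y:
                p = 1 - e if b == x else e / 3
            else:
                p = (1 - e) / 2 + e / 6 if b in (x, y) else e / 3
            total += m * log(p)  # term raised to the mapping probability
        log_lik[x + y] = total
    # Uniform priors cancel, so the posterior is the normalized likelihood.
    top = max(log_lik.values())
    weights = {g: exp(v - top) for g, v in log_lik.items()}
    z = sum(weights.values())
    return {g: w / z for g, w in weights.items()}


def call_variant(observations, ref_base, threshold=0.9):
    """Return (genotype, posterior) if a variant is called, else None."""
    post = genotype_posteriors(observations)
    best = max(post, key=post.get)
    if best != ref_base * 2 and post[best] > threshold:
        return best, post[best]
    return None
```

With ten error-free-looking reads, five showing A and five showing G at a site whose reference base is A, `call_variant` returns the heterozygous genotype `"AG"`; the default `threshold=0.9` is only a placeholder for the user-specified value.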

Results on Cancer Cell Line Reads

Alt. Coverage 1

Threshold        0.1     0.9    0.95   0.99  0.999
Transcripts   102874   29623   20006   4121   2777
Genome         93499   25333   16712   2924   1879
Hard merged   100215   28532   19045   3393   2204
Soft merged   102567   29561   19874   3859   2591

Alt. Coverage 3

Threshold        0.1     0.9    0.95   0.99  0.999
Transcripts     7638    5351    4661   3376   2508
Genome          6108    4093    3505   2422   1727
Hard merged     6775    4609    3973   2773   2001
Soft merged     7371    5114    4447   3186   2351

Results on Hapmap Reads

[Figure: four plots with False Positives on the x-axis. The first pair compares the Posterior, Binomial Exact, and Binomial Cumulative SNP calling methods; the second pair compares the Transcripts, Genome, Hard Merge, and Soft Merge mappings. In each pair, the first plot is for alt. coverage 1 and the second for alt. coverage 3.]

Validation Results

• Predicted immunogenic mutations are currently being validated by Sanger sequencing

• For confirmed mutations, peptide immunogenicity will be validated experimentally

Conclusions & Ongoing Work

• The proposed pipeline detects mutations more accurately than previous methods

• Ongoing work includes refining the posterior method to make mutation detection more robust in the presence of differential allelic expression, and detecting short immunogenic indels

Experimental Setup

• We tested the performance of the implemented methods on 63 million Illumina mRNA reads generated from blood cell tissue of Hapmap individual NA12878 [2] (NCBI SRA database accession number SRX000566)

• We included in the evaluation Hapmap SNPs in known exons for which there was at least one mapped read by any method

• Hapmap genotypes for these SNPs: 22,362 homozygous reference, 7,893 heterozygous or homozygous variant

• True positives: called SNPs for which the Hapmap genotype is heterozygous or homozygous variant

• False positives: called SNPs for which the Hapmap genotype is homozygous reference

• We also ran our analysis pipeline on 6.75 million Illumina reads from mRNA isolated from a mouse cancer tumor cell line
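The true positive and false positive definitions above amount to a simple count, sketched here in Python. The function name `evaluate_calls`, the genotype labels, and the use of positions as dictionary keys are hypothetical choices for illustration; calls at sites outside the evaluated Hapmap SNP set are assumed to be ignored.

```python
def evaluate_calls(called_positions, hapmap_genotypes):
    """Count true and false positives among called SNPs.

    called_positions: iterable of site identifiers where a SNP was called.
    hapmap_genotypes: dict mapping evaluated sites to one of
        "hom_ref", "het", or "hom_var" (hypothetical labels).
    """
    true_pos = false_pos = 0
    for pos in called_positions:
        genotype = hapmap_genotypes.get(pos)
        if genotype is None:
            continue              # not an evaluated Hapmap SNP (assumption)
        if genotype == "hom_ref":
            false_pos += 1        # called, but Hapmap says homozygous reference
        else:
            true_pos += 1         # heterozygous or homozygous variant
    return true_pos, false_pos
```

Sweeping the calling threshold and recording the resulting pair for each setting gives the points needed to compare methods by their false positive counts.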

Introduction

• Immunotherapy is a promising cancer treatment approach that relies on awakening the immune system to the presence of antigens associated with tumor cells

• The success of this approach depends on the ability to reliably detect immunogenic cancer mutations, the vast majority of which are expected to be tumor-specific [6]

• In this poster we present a bioinformatics pipeline for detecting immunogenic cancer mutations from high throughput mRNA sequencing data

• Immunogenic mutations predicted by our pipeline from Illumina mRNA reads generated from a mouse cancer tumor cell line are currently under experimental validation

Bioinformatics pipeline for detection of immunogenic cancer mutations by high throughput mRNA sequencing

Jorge Duitama¹, Ion Mandoiu¹, and Pramod Srivastava²

¹ CSE Dept., University of Connecticut

² Immunology Dept., UCHC
