1 cse dept., university of connecticut 2 immunology dept., uchc

Analysis Pipeline

Genome sequence

mRNA reads

Consensus coding

sequences

CCDSMapping

Genome Mapping

Read merging

EpitopePrediction

Merged mapped

reads

Expressed SNPsCCDS

mapped reads

Genome mapped

reads

Immunogenic mutations

Known SNPs

SNP calling

Approaches to Phasing

• We maximize the number of accurately mapped reads by using Maq to map them against both the reference genome and reference transcripts (from CCDS database)

• For combining read mapping results we implemented two approaches called hard merging and soft merging– Hard merging throws away reads that are mapped uniquely by one

procedure and to multiple places by the other while soft merging keeps the unique alignment for these reads

– Both merging methods keep reads mapped uniquely to the same place by both mapping procedures and reads mapped by one procedure andnot mapped by the other

– Both methods throw away reads mapped uniquely to different places by both mapping procedures and reads mapped multiple times by both procedures.

Mapping Reads

SNP Calling Methods•Binomial: binomial test used in [3,7] for calling SNPs from genomic DNA

– Binomial probability based on the two highest allele counts under the null hypothesis that the genotype is heterozygous

•Posterior: uses base quality scores and read mapping probabilities

– Conditional probability of observing read data given each possible genotype G is computed as a product of read contributions, assuming independence between reads. A base b mapped with error probability eb contributes as follows:• For G=XX, 1-eb if X = b, eb/3 otherwise• For G=XY, X≠Y, (1-eb)/2+eb/6 if X=b or Y=b, eb/3 otherwise

– Maq mapping probabilities taken into account by raising the corresponding term to the probability that the read is mapped correctly at this location

– The posterior probability of each genotype is then evaluated assuming uniform priors. A variant is called in this approach if the genotype with highest posterior probability is different than homozygous reference and exceeds a user specified threshold

Results on Cancer Cell Line Reads

259138591987429561102567softmerged

220433931904528532100215hardmerged

18792924167122533393499genome

277741212000629623102874transcripts

0.9990.990.950.90.1Threshold

23513186444751147371softmerged

20012773397346096775hardmerged

17272422350540936108genome

25083376466153517638transcripts

0.9990.990.950.90.1Threshold

Alt. Coverage 1

Alt. Coverage 3

Results on Hapmap Reads

0

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

0 200 400 600 800 1000 1200

False Positives

Tru

e P

osi

tive

s

Posterior

Binomial Exact

Binomial Cumulative

Maq

0

500

1000

1500

2000

2500

3000

3500

0 20 40 60 80

False Positives

Tru

e P

osi

tive

s

Posterior

Binomial Exact

Binomial Cumulative

0

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

0 100 200 300 400

False Positives

Tru

e P

osi

tive

s

Transcripts

Genome

Hard Merge

Soft Merge

0

500

1000

1500

2000

2500

3000

3500

0 5 10 15 20 25 30 35

False Positives

Tru

e P

osi

tive

s

Transcripts

Genome

Hard Merge

Soft Merge

Alt. coverage 1 Alt. coverage 3

Validation Results

•Predicted immunogenic mutations are currently being validated by Sanger sequencing

•For confirmed mutations, peptide immunogenicity will be validated experimentally

• Proposed pipeline has improved accuracy for detecting mutations compared to previous methods

• Ongoing work includes refining the posterior method to increase mutation detection robustness in the presence of differential allelic expression and detecting short immunogenic indels

Conclusions & Ongoing Work

Experimental Setup•We tested the performance of implemented methods on 63 million Illumina mRNA reads generated from blood cell tissue of Hapmap individual NA12878 [2] (NCBI SRA database accession number SRX000566)

•We included in evaluation Hapmap SNPs in known exons for which there was at least one mapped read by any method

•Hapmap genotypes for these SNPs: 22,362 homozygous reference, 7,893 heterozygous or homozygous variant

•True positives: called SNPs for which Hapmap genotype is heterozygous or homozygous variant

•False positives: called SNPs for which Hapmap genotype is homozygous reference

•We also ran our analysis pipeline on 6.75 million Illumina reads from mRNA isolated from a mouse cancer tumor cell line

• Immunotherapy is a promising cancer treatment approach that relies on awakening the immune system to the presence of antigens associated with tumor cells

• The success of this approach depends on the ability to reliablydetect immunogenic cancer mutations, the vast majority of which are expected to be tumor-specific [6]

• In this poster we present a bioinformatics pipeline for detecting immunogenic cancer mutations from high throughput mRNA sequencing data

• Immunogenic mutations predicted by our pipeline from IlluminamRNA reads generated from a mouse cancer tumor cell line are currently under experimental validation

Introduction

Bioinformatics pipeline for detection of immunogenic cancer mutations by high throughput mRNA sequencing

Jorge Duitama1, Ion Mandoiu1, and Pramod Srivastava2

1CSE Dept., University of Connecticut2Immunology Dept., UCHC

1 cse dept., university of connecticut 2 immunology dept., uchc

Documents

mapped reads

mapping snps

mapping procedures

read mapping results

mapping readssnp

possible genotype g

highest posterior probability

base b