bc-cancer chimerascan presentation
ChimeraScan: Chimeric transcript discovery by paired-end transcriptome sequencing.
AGENDA
• Overview: What is ChimeraScan?
• ChimeraScan Method (Algorithm)
• How to run ChimeraScan?
• ChimeraScan Results
• Limitations: What could be done better?
• Comparison with current software (deFuse, Trans-ABySS)
WHAT IS CHIMERASCAN?
• A tool for discovering chimeric transcripts or fusions in sequencing data.
ChimeraScan Method
● ChimeraScan differs from other fusion finders (e.g. deFuse) in that it adds a fragmentation step on top of the paired-end approach that deFuse also uses.
Tell me more!!!!
ChimeraScan Algorithm
Fragmentation
ChimeraScan Algorithm
Step 1: Prepare reads for alignment
ChimeraScan parses FASTQ
1) converts all quality scores to Sanger format
(Phred + 33)
2) converts the qname for the reads from an arbitrarily long
string to a number (1/1, 1/2 for PE reads)
Reduces storage requirements for intermediate steps
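A minimal sketch of the two conversions in Step 1. It assumes the input qualities are Illumina Phred+64 (the slide does not say which encodings are handled), and the function names are hypothetical, not ChimeraScan's own.

```python
def to_sanger(qual, offset_in=64):
    """Re-encode a quality string from Phred+offset_in to Sanger (Phred+33)."""
    return "".join(chr(ord(c) - offset_in + 33) for c in qual)


def rename_reads(records):
    """Replace each long read name with a running number plus mate suffix.

    records: iterable of (qname, mate, seq, qual); mate is 1 or 2 and both
    mates of a pair share the same original qname.
    """
    ids = {}  # original qname -> small integer id
    for qname, mate, seq, qual in records:
        num = ids.setdefault(qname, len(ids) + 1)
        yield "%d/%d" % (num, mate), seq, to_sanger(qual)


# Hypothetical read pair with Phred+64 qualities.
reads = [
    ("HWI-ST1234:8:1101:1234:5678", 1, "ACGT", "hhhh"),
    ("HWI-ST1234:8:1101:1234:5678", 2, "TTGA", "hhfB"),
]
converted = list(rename_reads(reads))
```

Short numeric names are what makes the intermediate files smaller: the name is repeated in every alignment record.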
ChimeraScan Algorithm
Pysam package is used.
Step 3: Create a sorted/indexed BAM file
Enables fast lookup of original read alignments by genomic coordinates.
Step 4: Estimate insert size distribution
Only uniquely mapping reads are used to sample the insert size distribution (used in future steps to help localize fusion
breakpoints).
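A rough sketch of Step 4, assuming the insert size and the number of alignments are already known for each read pair. The input layout and function name are illustrative only; ChimeraScan reads these values from the BAM file.

```python
from collections import Counter
from statistics import mean, pstdev


def insert_size_distribution(pairs):
    """Build an empirical insert-size histogram from uniquely mapped pairs.

    pairs: iterable of (insert_size, num_hits); only pairs with exactly one
    alignment (num_hits == 1) contribute to the sample.
    """
    sizes = [size for size, hits in pairs if hits == 1]
    return Counter(sizes), mean(sizes), pstdev(sizes)


# Toy data: the (800, 3) pair is multi-mapping and is ignored.
pairs = [(200, 1), (210, 1), (190, 1), (800, 3), (200, 1)]
hist, mu, sigma = insert_size_distribution(pairs)
```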
ChimeraScan Algorithm
Step 5: Realign initially unmapped reads (Fragmentation)
All of the initially unmapped reads are treated as single reads and realigned.
Additionally, the reads are trimmed such that only the sequences at the ends of the
fragment are aligned (default=25bp).
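The trimming in Step 5 can be sketched as below. This reads "the ends of the fragment" as the first 25 bases of each mate, since the 5' end of each mate sits at one end of the sequenced fragment; that reading and the function name are assumptions.

```python
def fragment_pair(mate1, mate2, segment_length=25):
    """Trim each mate to its first segment_length bases.

    The two trimmed segments are then realigned independently, as if they
    were single-end reads.
    """
    return mate1[:segment_length], mate2[:segment_length]


# Toy 36 bp mates.
mate1 = "ACGT" * 9
mate2 = "TTGACC" * 6
seg1, seg2 = fragment_pair(mate1, mate2)
```

Short end segments are less likely to cross a fusion junction than the full read, so a pair that failed to align as a whole can still place each end in a different gene.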
Step 6: Discover discordant reads
ChimeraScan Algorithm
Step 7: Nominate chimeras (fragment size distribution used)
Step 8: Extract chimeric breakpoint sequences (from genome FASTA file)
The Bowtie indexer is used to create a new alignment index of these breakpoint sequences.
Step 9: Nominate reads that could span breakpoints
ChimeraScan Algorithm
Step 10: Align against breakpoint sequence database
(Created in step 8)
Step 11: Assess breakpoint spanning alignment results
(min anchor > number of homologous bases between 5'->3' at the breakpoint + number of mismatches allowed)
Reads that align to the breakpoint sequence index are discarded if they overlap the breakpoint by fewer than anchor_min bases, or if they overlap by more than that but contain mismatches (red reads).
Reads overlapping the breakpoint by more than anchor_length bases are retained (green read).
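One way to read the anchor rule in Step 11, as a sketch. The coordinate convention, the function name and the default values (4 and 8) are illustrative assumptions, and mismatches are only checked on the short (anchor) side of the read.

```python
def keep_spanning_read(read_start, read_end, breakpoint, mismatch_positions,
                       anchor_min=4, anchor_length=8):
    """Decide whether a read aligned to a breakpoint sequence is kept.

    Coordinates are 0-based, half-open [read_start, read_end) on the
    breakpoint sequence; breakpoint is the index of the first base after
    the junction.
    """
    left = breakpoint - read_start   # bases on the 5' side of the junction
    right = read_end - breakpoint    # bases on the 3' side of the junction
    anchor = min(left, right)        # the shorter side is the anchor
    if anchor < anchor_min:
        return False                 # overlap too small
    if anchor > anchor_length:
        return True                  # long anchor: retained
    # Short anchor: keep only if the anchor side has no mismatches.
    if left <= right:
        anchor_span = range(read_start, breakpoint)
    else:
        anchor_span = range(breakpoint, read_end)
    return not any(pos in anchor_span for pos in mismatch_positions)
```

For example, with the junction at position 50, a read covering 20 to 53 has a 3-base anchor and is dropped, while a read covering 30 to 70 has a 20-base anchor and is kept.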
ChimeraScan Algorithm
Step 12: Filter chimeras
The user can specify many filters to minimize the number of false positives (known-false-positives, filter-size-distribution, supporting reads).
Step 13: Produce a text output file (BEDPE file)
How to run ChimeraScan
STEP 1: Generate paired-read FASTQ files from merged BAM files
'bash /projects/trans_scratch/software/deFUSE/scripts/bam2fastq.converter.sh'
INPUT(S):
BAM_FILE_PATH(ABSOLUTE)
LIBRARY_ID
OUTPUT_DIRECTORY
How to run ChimeraScan
STEP 2: Submit ChimeraScan to the cluster
'python /projects/trans_scratch/chimerascan/chimerascan-0.4.5/bin/chimerascan_run.py'
INPUT(S):
-v: verbose (for logging and debugging)
-p: processors (tested with -p = 8)
chimerascan_index (generated during ChimeraScan installation)
Fastq_1, Fastq_2 (both generated in step 1)
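A small sketch of how the call could be assembled from the inputs listed on this slide. The FASTQ file names are hypothetical, and the trailing output directory argument is an assumption that the slide does not mention.

```python
import shlex

SCRIPT = "/projects/trans_scratch/chimerascan/chimerascan-0.4.5/bin/chimerascan_run.py"


def build_command(index_dir, fastq_1, fastq_2, output_dir, processors=8):
    """Assemble the chimerascan_run.py call as an argument list."""
    return ["python", SCRIPT, "-v", "-p", str(processors),
            index_dir, fastq_1, fastq_2, output_dir]


# Hypothetical file and directory names for library A37098.
cmd = build_command("chimerascan_index", "A37098_1.fastq",
                    "A37098_2.fastq", "A37098_out")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to subprocess.run(cmd, check=True) or written into a cluster submission script.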
How to run ChimeraScan
Combine steps 1 & 2:
'bash /projects/trans_scratch/chimerascan/chimerascan_setup.sh'
INPUT(S):
PATIENT_ID
LIBRARY_ID
BAM_FILE_PATH
PROJECT_DIRECTORY
OUTPUT(S):
'qsub_all_chimerascan.sh': a script that submits both steps 1 & 2 to the cluster. Jobs are run serially (the FASTQ files are created before the ChimeraScan job is submitted).
ChimeraScan Results
OUTPUT(S):
ChimeraScan outputs a tabular chimeras.bedpe file.
The chimeras.bedpe file contains information about the chromosomal regions, transcript ids, genes, and statistics for each chimera. The file follows the BEDPE format for representing paired intervals (courtesy Aaron Quinlan and the BEDTools project).
The chimeras.bedpe file also reports spanning and supporting reads (total score) for each reported event.
Other intermediate files are also created during the run, but they are not needed to interpret the results and can be deleted after the run is complete.
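A minimal reader for the output file, assuming chimeras.bedpe starts with the ten standard BEDPE columns and marks header lines with '#'. Any further ChimeraScan-specific columns are kept unparsed in `extra`; the example record is made up.

```python
from collections import namedtuple

BedpeRecord = namedtuple(
    "BedpeRecord",
    "chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2 extra",
)


def parse_bedpe(lines):
    """Yield one BedpeRecord per data line, skipping blanks and '#' headers."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield BedpeRecord(f[0], int(f[1]), int(f[2]),
                          f[3], int(f[4]), int(f[5]),
                          f[6], float(f[7]), f[8], f[9], f[10:])


# Hypothetical two-line file: one header, one event.
example = [
    "#chrom1\tstart1\tend1\tchrom2\tstart2\tend2\tname\tscore\tstrand1\tstrand2\n",
    "chr17\t100\t200\tchr20\t300\t400\tCLUSTER1\t12\t+\t-\tgeneA\tgeneB\n",
]
records = list(parse_bedpe(example))
interchromosomal = [r for r in records if r.chrom1 != r.chrom2]
```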
ChimeraScan Results
PROJECT     LIBRARY_ID   TOTAL TIME   TOTAL SPACE
MCF7        A37098       ~23 HRS      178 GB
UHR         Z01229       ~21 HRS      132 GB
COLO-829    A36972       ~20 HRS      157 GB
OUR VALIDATION:
Run settings: 8 cores, 8 parallel jobs
Limitations: What could be better?
Lack of an injective (one-to-one) mapping from chimeras.bedpe event types to our current set of event types:
Translocation ---> {interchromosomal}
Duplication ---> {intrachromosomal_complex, adjacent_complex}
Deletion ---> {intrachromosomal, intrachromosomal_diverging, intrachromosomal_complex}
Inversion ---> {intrachromosomal_diverging}
Relies on an annotated set of genes (found in the reference index).
High sensitivity, but also a high number of false positives (a tradeoff?).
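The mapping problem can be made concrete by inverting the table above: any ChimeraScan type that lands on more than one of our event types is ambiguous. A short sketch, with lower-cased names used only for illustration:

```python
from collections import defaultdict

# Our event type -> ChimeraScan event types, as listed on the slide.
EVENT_TYPE_MAP = {
    "translocation": {"interchromosomal"},
    "duplication": {"intrachromosomal_complex", "adjacent_complex"},
    "deletion": {"intrachromosomal", "intrachromosomal_diverging",
                 "intrachromosomal_complex"},
    "inversion": {"intrachromosomal_diverging"},
}

# Invert: ChimeraScan event type -> set of our event types.
inverse = defaultdict(set)
for ours, chimerascan_types in EVENT_TYPE_MAP.items():
    for cs_type in chimerascan_types:
        inverse[cs_type].add(ours)

# A ChimeraScan type is ambiguous if it maps to more than one of ours.
ambiguous = sorted(t for t, ours in inverse.items() if len(ours) > 1)
```

With the table as given, `ambiguous` holds intrachromosomal_complex and intrachromosomal_diverging, so those two types cannot be translated without extra information.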
Comparison with current software
MCF7 LIBRARY                 ChimeraScan   deFuse (filtered)   Trans-ABySS (1.4.8)
Total events                 629           503                 161
Validated events found       32/89         35/89               33/89
Validated events not found   57            54                  56
Novel events found           2             3                   4
89 events were listed in the publications
18 events out of 89 were novel events
71 events out of 89 were previously known events
Note: The events for trans-abyss were taken from the 'sense_fusions.tsv' tabular file.
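The MCF7 numbers above can be turned into two simple ratios per tool: the share of the 89 published events that was recovered, and the share of each tool's calls that matches a published event. The second ratio is not a true precision, because unmatched calls are not necessarily false.

```python
VALIDATED_TOTAL = 89  # events listed in the publications

found = {"ChimeraScan": 32, "deFuse": 35, "Trans-ABySS": 33}
total_calls = {"ChimeraScan": 629, "deFuse": 503, "Trans-ABySS": 161}

# Fraction of published events each tool recovered.
recall = {tool: n / VALIDATED_TOTAL for tool, n in found.items()}

# Fraction of each tool's calls that matches a published event.
matched_fraction = {tool: found[tool] / total_calls[tool] for tool in found}

for tool in found:
    print("%-12s recall=%.3f matched=%.3f"
          % (tool, recall[tool], matched_fraction[tool]))
```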
Comparison with current software (Validated Events)
Library: MCF7 (A37098)
Total Events Found: 45/89
Events unique to ChimeraScan: 2/89
Events unique to deFuse: 5/89
Events unique to Trans-ABySS: 5/89
[Venn diagram: Trans-ABySS, deFuse, ChimeraScan. 22 events found by all three tools; pairwise-only overlaps of 3, 3 and 5 events]
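The unique and shared counts on these comparison slides are set operations over each tool's event list. A minimal sketch with made-up event identifiers (f1 to f7 are placeholders, not real fusions):

```python
def venn_regions(a, b, c):
    """Count the seven regions of a three-set Venn diagram."""
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_b": len((a & b) - c),   # in a and b, not in c
        "a_c": len((a & c) - b),   # in a and c, not in b
        "b_c": len((b & c) - a),   # in b and c, not in a
        "all": len(a & b & c),
    }


# Placeholder event sets for the three tools.
chimerascan = {"f1", "f2", "f3", "f4"}
defuse = {"f2", "f3", "f5"}
transabyss = {"f3", "f4", "f6", "f7"}

regions = venn_regions(chimerascan, defuse, transabyss)
total_found = sum(regions.values())  # equals the size of the union
```

How two calls are judged to be the same event (gene pair, breakpoint window, orientation) decides these counts and has to be fixed before the sets are built.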
Comparison with current software (All Events)
Library: MCF7 (A37098)
Total Events Found: 10,160
Events unique to ChimeraScan: 587/10,160
Events unique to deFuse: 502/10,160
Events unique to Trans-ABySS: 8,857/10,160
[Venn diagram: Trans-ABySS, deFuse, ChimeraScan. Overlap region counts: 15, 172, 8 and 19]
Comparison with current software
UHR LIBRARY                  ChimeraScan   deFuse (filtered)   Trans-ABySS (1.4.8)
Total events                 1304          192                 78
Validated events found       21/68         14/68               21/68
Validated events not found   47            54                  47
68 events were listed in the publications
14 events out of 68 were externally verified events
44 events out of 68 were previously known events
Note: The events for trans-abyss were taken from the 'sense_fusions.tsv' tabular file.
Comparison with current software (Validated Events)
Library: UHR (Z01229)
Total Events Found: 28/68
Events unique to ChimeraScan: 4/68
Events unique to deFuse: 0/68
Events unique to Trans-ABySS: 5/68
[Venn diagram: Trans-ABySS, deFuse, ChimeraScan. 9 events found by all three tools; pairwise-only overlaps of 5, 3 and 2 events]
Comparison with current software (All Events)
Library: UHR (Z01229)
Total Events Found: 18,015
Events unique to ChimeraScan: 1,279/18,015
Events unique to deFuse: 154/18,015
Events unique to Trans-ABySS: 16,558/18,015
[Venn diagram: Trans-ABySS, deFuse, ChimeraScan. Overlap labels as extracted: 69, 9, 24]
Comparison with current software (All Events)
Library: COLO-829 (A36972)
Total Events Found: 3,361
Events unique to ChimeraScan: 458/3,361
Events unique to deFuse: 225/3,361
Events unique to Trans-ABySS: 2,668/3,361
[Venn diagram: Trans-ABySS, deFuse, ChimeraScan. Overlap region counts: 0, 1, 6 and 3]
What's Next???
• Improve Runtime
• Find an injective mapping from chimeras.bedpe event types to our current set of event types
Reference(s)
• Iyer MK, Chinnaiyan AM, Maher CA. ChimeraScan: a tool for identifying chimeric transcription in sequencing data. Bioinformatics. 2011;27(20):2903-2904. doi:10.1093/bioinformatics/btr467.
• Weirather JL, Afshar PT, Clark TA, et al. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing. Nucleic Acids Research. 2015;43(18):e116. doi:10.1093/nar/gkv562.
ACKNOWLEDGEMENTS
Karen Mungall, Caleb Choo