Post on 17-Dec-2015
Bioinformatics at Molecular Epidemiology: new tools for identifying indels in sequencing data
Kai Ye
k.ye@lumc.nl
Data collection for osteoarthritis, cardiovascular disease and longevity
• Serum parameters
• Cellular characteristics (biobank)
• Skin ageing
• Glycosylation
• Metabonomic
• Transcriptomic
• Genetic (GWAS/sequence)
• Epigenetic
• Data Integration
[Figure: chromatogram of sample "350 612 #68 6 dec B4" (FLU signal in mV against retention time in min, 0 to 80 min), with 18 numbered peaks between 36.281 and 76.407 min. Labelled sugars: N-Acetylglucosamine, Galactose, Mannose, Sialic acid, Fucose.]
Genetic & Epigenetic analyses
Biochem analyses
Expression analysis
Metabonomic analysis
Glycosylation
Cell responses
Joost Kok, Erik vd Akker, Kai Ye
Statistical analysis
About me
• 1995 – 2003 B.S. and M.S. in biology and pharmaceutical science
• 2004 – 2008 PhD cum laude at Leiden University. Thesis title: Novel algorithms for protein sequence analysis
• 2008 – 2009 Postdoc at the European Bioinformatics Institute, collaborating with scientists at the Sanger Institute
• Currently assistant professor at MolEpi
A Pindel approach for identifying indels in Next-Gen sequencing data
• Paired-end reads in Next-gen sequencing
• Indel detection algorithms
• Pindel
• Cancer genome project
• 1000 genomes project
SNP
Mapping paired-end reads
CNVs: copy number variations; INDELs: insertions and deletions; SVs: Structural variations
Gapped alignment for small indels
ATCCGTATCACGGTCA-CAGATCAGTCCAGT
ATCCGTATCACGGTCAGCAGATCAGTCCAGT
indel
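The gapped alignment on this slide can be read off programmatically. The sketch below is only an illustration: the function name `indels_from_alignment` is hypothetical, and it assumes the first aligned row is the sample read and the second row is the reference, which the slide does not state.

```python
def indels_from_alignment(aligned_sample, aligned_ref):
    """List indels in a gapped pairwise alignment.

    Both strings must have equal length and use '-' for gaps.
    Returns tuples (kind, reference_position, bases), where the
    position is 0-based on the ungapped reference.
    """
    if len(aligned_sample) != len(aligned_ref):
        raise ValueError("aligned rows must have the same length")

    events = []
    ref_pos = 0  # position on the ungapped reference
    i = 0
    n = len(aligned_ref)
    while i < n:
        s, r = aligned_sample[i], aligned_ref[i]
        if s == "-" and r != "-":
            # Gap in the sample: bases are missing relative to the reference.
            j = i
            while j < n and aligned_sample[j] == "-" and aligned_ref[j] != "-":
                j += 1
            events.append(("deletion", ref_pos, aligned_ref[i:j]))
            ref_pos += j - i
            i = j
        elif r == "-" and s != "-":
            # Gap in the reference: the sample carries extra bases.
            j = i
            while j < n and aligned_ref[j] == "-" and aligned_sample[j] != "-":
                j += 1
            events.append(("insertion", ref_pos, aligned_sample[i:j]))
            i = j
        else:
            if r != "-":
                ref_pos += 1
            i += 1
    return events


# The two rows shown on the slide (row order is an assumption).
sample_row = "ATCCGTATCACGGTCA-CAGATCAGTCCAGT"
reference_row = "ATCCGTATCACGGTCAGCAGATCAGTCCAGT"
slide_events = indels_from_alignment(sample_row, reference_row)
print(slide_events)
```

Under that assumption the slide's example contains a single 1 bp deletion of `G` at reference position 16 (0-based).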
Read-pair approach for SVs
[Figure: three panels (No indel, Deletion, Insertion), each showing a Sample read pair mapped against the Reference.]
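The read-pair idea can be sketched as a comparison between the distance at which a pair maps on the reference and the distance expected from the library. This is a minimal illustration, not a tool's actual logic: the function name, the parameters, and the single `tolerance` threshold are assumptions made for the example.

```python
def classify_read_pair(observed_span, expected_span, tolerance):
    """Classify one read pair by its mapped span on the reference.

    observed_span: distance between the two mapped mates on the reference
    expected_span: typical distance for this sequencing library
    tolerance:     allowed deviation before an event is called (assumed)
    """
    difference = observed_span - expected_span
    if difference > tolerance:
        # Mates map farther apart than expected: the sample lacks
        # sequence that the reference has.
        return "deletion"
    if difference < -tolerance:
        # Mates map closer together than expected: the sample carries
        # extra sequence between them.
        return "insertion"
    return "no indel"


# Hypothetical numbers, for illustration only.
print(classify_read_pair(observed_span=200, expected_span=200, tolerance=30))
print(classify_read_pair(observed_span=900, expected_span=200, tolerance=30))
print(classify_read_pair(observed_span=120, expected_span=200, tolerance=30))
```

A real caller would estimate the expected span and its spread from the data and require several supporting pairs before reporting a variant.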
18 April 2023 16
ref
Pindel: Deletions
Anchor
2 x average distance
Expected maximum deletion size + read length (36)
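The two search windows named on this slide can be illustrated with a simplified split-read search. The sketch below is not Pindel's implementation: it replaces any specialised string matching with plain `str.find`, and the function name, the `min_part` parameter, and the example sequences are all hypothetical. It assumes the unmapped mate lies downstream of the anchor on the same strand.

```python
def find_deletion(ref, read, anchor_pos, avg_dist, max_del, min_part=3):
    """Look for a deletion that splits `read` into two mapped parts.

    The left part of the read is searched within 2 x avg_dist of the
    anchor; the right part is then searched within max_del + read length
    downstream of where the left part ends.
    Returns (deletion_start, deletion_length) or None.
    """
    read_len = len(read)
    window_end = min(len(ref), anchor_pos + 2 * avg_dist)

    # Try every split point, preferring the longest left part.
    for split in range(read_len - min_part, min_part - 1, -1):
        left, right = read[:split], read[split:]
        left_pos = ref.find(left, anchor_pos, window_end)
        if left_pos == -1:
            continue
        left_end = left_pos + split
        right_limit = min(len(ref), left_end + max_del + read_len)
        # Start one base later so the gap is at least 1 bp long.
        right_pos = ref.find(right, left_end + 1, right_limit)
        if right_pos != -1:
            return left_end, right_pos - left_end
    return None


# Hypothetical reference with a 5 bp segment that the sample lacks.
reference = "GATTACAGGCT" + "AAAAA" + "CCGATAGCAAG"
split_read = "CAGGCT" + "CCGATA"  # spans the deletion breakpoint
result = find_deletion(reference, split_read, anchor_pos=0, avg_dist=10, max_del=10)
print(result)
```

Restricting both searches to windows near the anchor keeps the search space small, which is the point of using the mapped mate as an anchor.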
African male: NA18507
• Bentley et al., Nature 2008
• 135 Gb of sequence
• ~4 billion paired 35-base reads
• After preprocessing: 56,161,333 pairs of one-end mapped reads
• Pindel
– 142,908 1-16 bp insertions
– 162,068 1 bp-10 kb deletions
Cancer genome
• COLO-829 cells
• Normal ~30x paired-end 100 bp reads
• Tumor ~40x paired-end 100 bp reads
• Search for somatic (tumor specific) indels
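One simple way to picture the search for tumor-specific indels is as a set difference between the calls made in the tumor and those made in the matched normal. The sketch below is an assumption-laden illustration: the tuple layout `(chromosome, position, kind, length)`, the function name, and the example calls are hypothetical and do not come from the COLO-829 data.

```python
def somatic_indels(tumor_calls, normal_calls):
    """Return indel calls present in the tumor but absent in the normal.

    Each call is a tuple (chromosome, position, kind, length).
    Calls are compared by exact equality, which is a simplification.
    """
    return sorted(set(tumor_calls) - set(normal_calls))


# Hypothetical calls, for illustration only.
tumor = [
    ("chr1", 1000, "deletion", 3),
    ("chr2", 5000, "insertion", 1),
    ("chr7", 42000, "deletion", 120),
]
normal = [
    ("chr1", 1000, "deletion", 3),  # germline: seen in both samples
]
tumor_specific = somatic_indels(tumor, normal)
print(tumor_specific)
```

In practice breakpoints reported for the same event can differ by a few bases between samples, so exact matching would need to be relaxed.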
1000 Genomes Project
• Pilot 1: 180 people from 3 major geographic groups (YRI, CEU, and CHB plus JPT) at low coverage (~4x)
• Pilot 2: the genomes of two families (CEU and YRI, both parents and an adult child) with deep coverage (20x per genome)
• Pilot 3: sequencing the coding regions (exons) of 1,000 genes in 1,000 people with deep coverage (20x).