Genome sequencing
Post on 22-Dec-2015
Vocabulary
• BAC: Bacterial Artificial Chromosome: cloning vector for E. coli (the yeast counterpart is the YAC, Yeast Artificial Chromosome)
• PAC, cosmid, fosmid, plasmid: other cloning vectors for E. coli
• Library: collection of fragments of a genome in cloning vectors
• Draft: crude first-generation sequence assembly
• Scaffold: sequences which are anchored to a genetic map
Vocabulary 2
• Minimal tiling path: Minimal set of overlapping clones that together provides complete coverage across a genomic region
• Coverage: The number of times a genomic region is represented in a collection of clones or sequence reads
• Contig: Alignment of overlapping reads
• N50 length: the largest length L such that 50% of all nucleotides are contained in contigs of size at least L
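The definitions of minimal tiling path and coverage can be made concrete with a small sketch. It assumes clones are already placed on a common coordinate system as hypothetical `(name, start, end)` tuples with half-open coordinates; the function names and the example clones are illustrative and not taken from any particular tool. The tiling path is chosen with a standard greedy interval-cover strategy, which is one simple way to obtain a smallest covering set under these assumptions.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy minimal set of clones covering [region_start, region_end).

    clones: iterable of (name, start, end), half-open coordinates (assumed).
    Raises ValueError if the clones leave a gap inside the region.
    """
    ordered = sorted(clones, key=lambda c: c[1])  # sort by start position
    path = []
    pos = region_start
    i = 0
    while pos < region_end:
        best = None
        # Among clones starting at or before pos, keep the one reaching furthest.
        while i < len(ordered) and ordered[i][1] <= pos:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= pos:
            raise ValueError(f"gap in clone coverage at position {pos}")
        path.append(best[0])
        pos = best[2]
    return path


def coverage_at(clones, position):
    """Number of clones that contain the given position."""
    return sum(1 for _, start, end in clones if start <= position < end)


# Made-up clone placements (coordinates in kb) for illustration only.
clones = [
    ("A", 0, 100),
    ("B", 50, 180),
    ("C", 90, 160),
    ("D", 150, 300),
    ("E", 170, 260),
]

print(minimal_tiling_path(clones, 0, 300))  # ['A', 'B', 'D']
print(coverage_at(clones, 120))             # 2 (clones B and C)
```

Clones C and E are redundant here: every position they cover is also covered by a clone already on the path, so leaving them out saves sequencing effort without losing coverage.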
N50
[Figure: cumulative contig content (in % of genome, 0 to 100) plotted against contig size (in kb)]
• Order contigs according to size
• Compute cumulative size
• N50 = contig size (sequence length) which marks 50% of genome content
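The three steps for N50 translate almost directly into code. The following is a minimal sketch, assuming contig lengths are given as a plain list of numbers; the function name `n50` and the optional `genome_size` parameter are illustrative choices. By default the 50% mark is taken relative to the total assembled length, which stands in for genome content when the true genome size is not supplied.

```python
def n50(contig_lengths, genome_size=None):
    """Return the N50 contig length, or None if half the target is never reached.

    contig_lengths: contig sizes in any consistent unit (bases, kb, ...).
    genome_size: optional reference total; defaults to the sum of all contigs.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = genome_size if genome_size is not None else sum(contig_lengths)
    running = 0
    # Step 1: order contigs by size, largest first.
    for length in sorted(contig_lengths, reverse=True):
        # Step 2: accumulate the cumulative size.
        running += length
        # Step 3: the contig at which the running sum reaches 50% is the N50.
        if 2 * running >= total:
            return length
    return None  # contigs cover less than half of the supplied genome size


# Made-up contig sizes in kb, total 300 kb.
contigs_kb = [100, 80, 60, 40, 20]
print(n50(contigs_kb))  # 80
```

In the example the largest contig alone holds 100 of 300 kb, which is below half; adding the 80 kb contig brings the running sum to 180 kb, so the N50 is 80 kb.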
Human genome
• 2001: 2 draft sequences published
• Public BAC-by-BAC sequence
• Celera's WGSA
– 90% of euchromatic sequence
– 150,000 gaps
– N50: 81 kb
– Error rate: 1:10,000
• 2004: Finished public sequence
– 99% of euchromatic sequence
– 341 gaps
– N50: 38,500 kb
– Error rate: 1:100,000
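To put the draft and finished figures side by side, the short sketch below turns the quoted error rates into expected errors per megabase and computes how much N50 and gap count changed. It uses only the numbers listed above; the dictionary layout and the function name `errors_per_megabase` are illustrative assumptions.

```python
def errors_per_megabase(one_in):
    """Expected miscalled bases per 1,000,000 bases at an error rate of 1:one_in."""
    return 1_000_000 / one_in


# Figures as quoted for the 2001 draft and the 2004 finished sequence.
draft = {"gaps": 150_000, "n50_kb": 81, "error_one_in": 10_000}
finished = {"gaps": 341, "n50_kb": 38_500, "error_one_in": 100_000}

for label, stats in (("2001 draft", draft), ("2004 finished", finished)):
    rate = errors_per_megabase(stats["error_one_in"])
    print(f"{label}: {rate:g} expected errors per Mb")

# Fold changes between the draft and the finished sequence.
n50_fold = finished["n50_kb"] / draft["n50_kb"]
gap_fold = draft["gaps"] / finished["gaps"]
print(f"N50 increased about {n50_fold:.0f}-fold")
print(f"Gap count dropped about {gap_fold:.0f}-fold")
```

This gives 100 expected errors per Mb for the draft against 10 for the finished sequence, with N50 growing roughly 475-fold and the number of gaps shrinking roughly 440-fold.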
The problem with complex genomes
• Gaps
• Orientation of contigs not known
• Near identical repeats hard to resolve