From DNA to genomics: the rise of bioinformatics
Catherine Abbott
2 December, BioInfoSummer 2013
NB: Most images in this presentation are via Google Images.
Outline of talk
• Introduce bioinformatics
• Very, very basic introduction to
– DNA and molecular biology
– Genes
– Genomes
• Human Genome Project
• Genomics
• The challenges and the future
What is Bioinformatics?
• Bioinformatics is the field of science in which biology meets informatics: computer science, information technology, mathematics, statistics and other sciences
• The “Infancy” Period: 1996–2001
• The “Adolescence” Period: 2002–2006
• The “Adulthood” Period: 2007–2013
Ouzounis CA. Rise and demise of bioinformatics? Promise and progress. PLoS Comput Biol. 2012;8(4):e1002487. doi: 10.1371/journal.pcbi.1002487. Epub 2012 Apr 26.
Central paradigm of Molecular Biology
DNA → RNA → Protein → Phenotype
DNA bases: Guanine (G), Adenine (A), Thymine (T), Cytosine (C)
RNA bases: Guanine (G), Adenine (A), Uracil (U), Cytosine (C)
G Glycine Gly
P Proline Pro
A Alanine Ala
V Valine Val
20 amino acids
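The DNA → RNA → protein flow can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names and the example sequence are made up, the codon table is deliberately partial (start, stop, and the four amino acids listed above), and the DNA is assumed to be the coding strand so that transcription reduces to swapping T for U.

```python
# Minimal sketch of the central paradigm: DNA -> RNA -> protein.
# Assumption: the input is the coding strand, so transcription is T -> U.
# Assumption: only a few of the 64 codons are listed here, for illustration.
CODON_TABLE = {
    "AUG": "M",                                      # Methionine (start)
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # Glycine
    "CCU": "P", "CCC": "P", "CCA": "P", "CCG": "P",  # Proline
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # Alanine
    "GUU": "V", "GUC": "V", "GUA": "V", "GUG": "V",  # Valine
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop codons
}


def transcribe(dna):
    """Return the RNA version of a coding-strand DNA sequence."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate RNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "X")  # X = not in this partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGGTCCAGCTGTTTAA"   # hypothetical example sequence
rna = transcribe(dna)
protein = translate(rna)
print(rna, protein)
```

The example sequence encodes Met, Gly, Pro, Ala, Val and then a stop codon, using the one-letter amino acid codes shown on the slide.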
What is a gene?
• The gene is the basic physical unit of inheritance
http://www.bbc.co.uk/schools/gcsebitesize/science/add_aqa_pre_2011/celldivision/celldivision1.shtml
DNA sequences: three bases and stop codons
http://www.genome.gov/EdKit/bio2b.html
Reading frames
http://www.genome.gov/EdKit/bio2e.html
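To make the idea of reading frames concrete, here is a small Python sketch. It is an illustration only: the sequence and helper names are hypothetical, and only the three forward frames of one strand are considered. The same DNA sequence is split into three-base codons starting at offset 0, 1 or 2, and each frame is checked for the stop codons TAA, TAG and TGA.

```python
# Sketch: the three forward reading frames of a DNA sequence.
# Assumption: the reverse strand is ignored to keep the example short.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq, frame):
    """Split seq into complete three-base codons, starting at offset frame."""
    shifted = seq[frame:]
    usable = len(shifted) - len(shifted) % 3  # drop any trailing partial codon
    return [shifted[i:i + 3] for i in range(0, usable, 3)]


def stops_in_frame(seq, frame):
    """Return the stop codons found in the given reading frame."""
    return [c for c in codons(seq, frame) if c in STOP_CODONS]


seq = "ATGGCCTAAGTGA"  # hypothetical example sequence
for frame in range(3):
    print(frame, codons(seq, frame), stops_in_frame(seq, frame))
```

Shifting the start position by a single base changes every codon, which is why the same stretch of DNA can contain a stop codon in one frame and none in another.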
Exons and Introns
http://www.genome.gov/EdKit/bio2i.html
Further information
• http://www.dnai.org/
• http://www.dnalc.org/websites/dnaftb.html
1977: Sanger Sequencing
• Used chemically altered "dideoxy" bases to terminate newly synthesized DNA fragments at specific bases (either A, C, T, or G).
• Sanger was awarded two Nobel Prizes, in 1958 and 1980 (the latter shared with Gilbert and Berg)
Evolution of Sequencing Technology
1865 : Mendel shows inheritance in peas
1953 : Watson and Crick describe the structure of DNA
1977 : Era of sequencing begins
1980 : Shotgun sequencing coined
1982 : GenBank founded
1983 : Kary Mullis and colleagues develop PCR
1986 : First commercial ABI sequencer launched
1990 : BLAST algorithm developed at NCBI
1991 : EST strategy developed
Evolution of Sequencing Technology
1995 : Cycle Sequencing by Amersham; Applied Biosystems releases capillary electrophoresis system Prism 310; output 5,000 bases per day
1997 : MegaBACE 1000 capillaries; output 250,000-500,000 bases per day
1998 : Pyrosequencing developed
2001 : Draft human sequence
2005 : Launch of Genome Sequencer 20 System by 454 Life Sciences based on Pyrosequencing technology; output 20 million bases per run
Filho JS, Breast Cancer Research 2009
Traditional Sequencing vs 454 Technology
GenBank:
Nucleic Acids Res. 2011 Jan;39(Database issue):D32-7.
What is Genomics?
• An organism's complete set of DNA is called its genome
• Genomics is the study of the entire genome of an organism
• It involves investigating the structure and function of very large numbers of genes simultaneously.
The Race
20th July 1969 26th June 2000
President Clinton 26th June 2000
• “We are here to celebrate the completion of the first survey of the entire human genome. Without a doubt, this is the most important, most wondrous map ever produced by humankind.”
Prime Minister Blair 26 June 2000
• “…a revolution in medical science whose implications far surpass even the discovery of antibiotics… And every so often in the history of human endeavor there comes a breakthrough that takes humankind across a frontier and into a new era. …a breakthrough that opens the way for massive advances in the treatment of cancer and hereditary diseases, and that is only the beginning.”
February 2001
$2.7 billion US (public effort) vs $300 million US (private effort)
Cost of private effort, 13 years ago
• 300 machines running night and day for over a year
• $30,000,000 to buy
• $2 M a month in electricity
• $4 M a month in chemicals
• Fits on 5 CDs
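As a rough sanity check on the "5 CDs" figure, the arithmetic can be sketched as follows. The genome size of about 3.2 billion bases is the figure quoted in this talk; storing one byte per base and a CD capacity of 700 MB are assumptions made for this example, not numbers from the slides.

```python
import math

# Figure quoted in this talk for the human genome.
GENOME_BASES = 3_200_000_000
# Assumption: each base is stored as one plain-text character (1 byte).
BYTES_PER_BASE = 1
# Assumption: a standard CD holds about 700 MB.
CD_CAPACITY_BYTES = 700 * 10**6

genome_bytes = GENOME_BASES * BYTES_PER_BASE
cds_needed = math.ceil(genome_bytes / CD_CAPACITY_BYTES)
print(genome_bytes / 10**9, "GB ->", cds_needed, "CDs")
```

Under these assumptions the sequence takes about 3.2 GB, which is consistent with the slide. A denser encoding (for example 2 bits per base) would need fewer discs.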
Human Genome Project
• The biggest bioinformatics project of its time
• So what have we learned so far?
– 3.2 billion bases in the human genome
– Just over 20,000 protein-coding genes
– Humans vary at about 1 in 1,000 bp
• 3.2 million differences between non-relatives
• Almost as much information as in the entire genome of E. coli (4.6 million bases)
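The numbers on this slide follow from simple arithmetic, sketched below using only the figures quoted above (3.2 billion bases, one difference per 1,000 bp, and 4.6 million bases for E. coli). The variable names are chosen for this example.

```python
# All three constants are the figures quoted on the slide.
GENOME_BASES = 3_200_000_000     # bases in the human genome
BP_PER_DIFFERENCE = 1_000        # humans vary at about 1 in 1,000 bp
ECOLI_GENOME_BASES = 4_600_000   # size of the E. coli genome

# Expected number of differences between two unrelated people.
differences = GENOME_BASES // BP_PER_DIFFERENCE

# How that compares with the size of a whole bacterial genome.
fraction_of_ecoli = differences / ECOLI_GENOME_BASES

print(differences, round(fraction_of_ecoli, 2))
```

So the differences between two unrelated people amount to roughly 70% of the length of the E. coli genome, which is the sense of "almost as much information" on the slide.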
Completed Human Genomes
Bishop Desmond Tutu 2010
Craig Venter 2001-2003
James D Watson 2007
James D. Watson
• June 2007
• 454 Sequencer
• Took 4 months
• Cost <$1 million
Richard Carson/Reuters
19 August 2011
• Baylor College of Medicine Human Genome Sequencing Center and the AGRF in Melbourne, Australia
• WGS and Sanger sequencing
• 2x coverage
• 5.9x coverage on ABI SOLiD
• 2,574 megabases
Renfree et al. Genome Biology 2011, 12:R81
Complete Genomes, Nov 2010
• http://www.ncbi.nlm.nih.gov/Genomes/
• There are now over 1000 complete prokaryotic genomes available in Entrez Genome
• All three main domains of life (bacteria, archaea and eukaryotes) are represented, as well as many viruses and organelles
• Humans, mice, rats, worms and flies have been completed
http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/org.html
http://www.1000genomes.org/about
http://www.icgc.org/
DIY genomics
Summary and Challenges Ahead
• DNA sequencing is becoming faster and cheaper at a pace far outstripping Moore’s law (the rate at which computing gets faster and cheaper).
• The ability to determine DNA sequences is starting to outrun the ability of researchers to store, transmit and especially analyze the data.
http://infoproc.blogspot.com/2011/11/dna-data-deluge.html
Summary and Challenges Ahead
• Data handling is now the bottleneck
• It costs more to analyze a genome than to sequence a genome.
• The cost of sequencing a human genome (all three billion bases of DNA in a set of human chromosomes) plunged from $8.9 million in July 2007 to $10,500 in July 2011
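The cost figures can be turned into a halving time and compared with Moore's law. The sketch below makes two assumptions that are not stated on the slides: the lower price is taken to date from July 2011 (48 months after July 2007), and Moore's law is approximated as a halving of cost every 24 months.

```python
import math

# Figures quoted on the slide.
COST_JULY_2007 = 8_900_000   # US dollars
COST_LATER = 10_500          # US dollars
# Assumption: the later figure is from July 2011, i.e. 48 months on.
MONTHS_ELAPSED = 48
# Assumption: Moore's law approximated as a halving every 24 months.
MOORE_HALVING_MONTHS = 24

# How many times cheaper sequencing became over the period.
fold_drop = COST_JULY_2007 / COST_LATER

# Number of halvings needed for that drop, and the time per halving.
halvings = math.log2(fold_drop)
sequencing_halving_months = MONTHS_ELAPSED / halvings

# The fold drop a Moore's-law pace would give over the same period.
moore_fold_drop = 2 ** (MONTHS_ELAPSED / MOORE_HALVING_MONTHS)

print(round(fold_drop), round(sequencing_halving_months, 1), moore_fold_drop)
```

Under these assumptions the sequencing cost fell by a factor of roughly 850, a halving about every five months, while a Moore's-law pace would have given only a 4-fold drop over the same four years.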
Summary and Challenges Ahead
• Storage of and access to data cause issues
– Not all data is in GenBank or in a format that can be easily accessed
• Demand from non-scientists for tools to visualize, understand and interpret their own genomic data
http://www.missionmassimo.com/?page_id=8
Personalized Medicine: the future
Ouzounis CA. Rise and demise of bioinformatics? Promise and progress. PLoS Comput Biol. 2012;8(4):e1002487. doi: 10.1371/journal.pcbi.1002487. Epub 2012 Apr 26.
Fig 1. The use of the term “bioinformatics” in Google Trends
http://www.indeed.com/jobanalytics/jobtrends?q=bioinformatics&l=
BioInfoSummer 2013 program
• Monday: Background to Biology and Statistics
• Tuesday: Evolutionary Biology
• Wednesday: Systems Biology
• Thursday: Next Generation Sequencing (NGS)
• Friday: Programming for Bioinformatics
Thank You!