bme280/cse277/cse377: bioinformatics spring 2006
Post on 18-Dec-2015
BME280/CSE277/CSE377: Bioinformatics Spring 2006
Administrivia
• Lecture time: TTh 12:30-1:45pm
• Lecture place: Engineering II, Room 322
• Instructor: Ion Mandoiu
– Office: ITEB 261
– Tel: 6-3784
– E-mail: [email protected]
– Office hours: MW 1-2pm
Textbooks
• Neil C. Jones and Pavel A. Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004. Textbook website: http://bioalgorithms.info/. (REQUIRED)
• D. Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press, 1997 (OPTIONAL)
Grading
• 30% homework assignments
– Bi-weekly
• 30% programming projects
– Individual, 3-4 projects
• 40% final project
– Individual or teams of 2
– Written report + short presentation
– Possible topics
  • Algorithm implementation + empirical study
  • In-depth survey of a topic not covered in class
  • Progress on open research problems
  • Propose your own!
What is Bioinformatics?
• Bioinformatics is generally defined as the analysis, prediction, and modeling of biological data with the help of computers
Why Bioinformatics?
• DNA sequencing technologies have created massive amounts of information that can only be efficiently analyzed with computers
• Hundreds of species sequenced
– Human, rat, chimp, chicken, …
• As the information becomes ever larger and more complex, more computational tools are needed to sort through the data.
– Biology is becoming an information science!
– Slowly, we are learning how cells work through comparative genomics -- not unlike comparative linguistics
Bioinformatics Tools
• Bioinformatics problems involve multiple aspects
– Example: Sequence Comparison
  • Biology: How are genes evolving? How is gene function related to gene sequence?
  • Learning/AI: How do we define “similar”? Can we learn from examples?
  • Algorithms: How can we efficiently find all similar sequences?
  • Statistics: How do we distinguish a random match from a true one?
Course Description
• Course emphasis
– Modeling computational problems arising in biology as graph-theoretic, statistical, or mathematical optimization problems
– Design, analysis, and implementation of efficient algorithms
• Algorithmic techniques to be covered
– Exhaustive search
– Integer programming
– Greedy algorithms
– Dynamic programming
– Divide-and-conquer
– Graph algorithms
– Combinatorial pattern matching
– Clustering
– Hidden Markov models
– Randomized algorithms
Course Description
• Biological applications
– Restriction mapping
– DNA sequencing
– Motif finding
– Pairwise sequence alignment
– Gene prediction
– Evolutionary trees
– Genome rearrangements
Complete and return the survey!
Basic Molecular Biology
The Cell
Source: D. Geiger
All cells of an organism contain the same DNA, yet there are many types of cells!
Mendel and his Genes
• Genes -- physical and functional traits passed on from one generation to the next
• Discovered by Gregor Mendel in the 1860s while he was experimenting with the pea plant. He asked the question:
The Pea Plant Experiments
• Mendel discovered that genes were passed on to offspring by both parents in two forms: dominant and recessive.
• The dominant form would be the phenotypic characteristic of the offspring
DNA: The Code of Life
• The structure and the four genomic letters code for all living organisms
• Adenine, Guanine, Thymine, and Cytosine, which pair A-T and C-G on complementary strands.
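The A-T and C-G pairing rule can be sketched in a few lines of Python. This is only an illustration: the names `PAIR`, `complement`, and `reverse_complement` are made up for this example, and the input is assumed to contain only the letters A, C, G, and T.

```python
# Base pairing as stated on the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand):
    """Return the complementary strand read in the opposite direction."""
    return complement(strand)[::-1]


print(complement("ACGT"))             # TGCA
print(reverse_complement("ATACGGA"))  # TCCGTAT
```

The reverse complement is the form usually wanted in practice, because the two strands of a duplex run in opposite directions.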
DNA Components
Source: D. Geiger
The Human Genome
Source: D. Geiger
DNA Organization
Source: D. Geiger
Genome Sizes
• E. coli (bacteria) 4.7 Mb (Mega bases)
• Yeast (simple fungi) 15 Mb
• Nematode (C. elegans) 100 Mb
• Mouse 2 Gb (Giga bases)
• Human 3 Gb
• Wheat 16.5 Gb
• Lily 32-48 Gb
Genes
• DNA strings contain:
– Coding regions (genes)
– Control regions
– “Junk” DNA (unknown function)
• Estimated number of genes:
– E. coli (bacteria) 4,000
– Yeast (simple fungi) 6,000
– Nematode (C. elegans) 13,000
– Human 32,000 (?)
Central Dogma
• Cells express different subsets of genes under different environments
Gene → (Transcription) → mRNA → (Translation) → Protein
Gene Transcription
Source: D. Geiger
RNA: similar to DNA, but has
• slightly different backbone
• Uracil (U) instead of Thymine (T)
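As a rough sketch of the T-to-U difference, transcription can be modeled as a string substitution. The function name `transcribe` is hypothetical, and the sketch assumes the input is the coding (sense) strand, so the mRNA has the same sequence with U in place of T.

```python
def transcribe(coding_strand):
    """Model transcription as replacing Thymine (T) with Uracil (U).

    Assumption: the input is the coding strand of the gene, so the
    mRNA reads the same as the input apart from the T -> U change.
    """
    return coding_strand.replace("T", "U")


print(transcribe("ATGGCTTAA"))  # AUGGCUUAA
```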
RNA Roles
Source: D. Geiger
Translation
• Catalyzed by the ribosome
• Using two different sites, the ribosome continually binds tRNA, joins the amino acids together, and moves to the next location along the mRNA
• ~10 codons/second, but multiple translations can occur simultaneously
http://wong.scripps.edu/PIX/ribosome.jpg
Genetic Code
Source: D. Geiger
• Human cells produce approx. 100,000 proteins
• Proteins are polypeptides consisting of 70-3,000 amino acids

• There are 20 different amino acids; every 3 nucleotides in a gene encode 1 amino acid (or the STOP signal)
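The triplet code can be illustrated with a small translation routine. This is a minimal sketch: it uses the standard genetic code written in one-letter amino acid symbols with `*` for STOP, reads DNA codons rather than mRNA codons, and assumes the reading frame starts at the first letter. The names `CODON_TABLE` and `translate` are chosen for this example only.

```python
from itertools import product

# Standard genetic code, listed with codon positions cycling through T, C, A, G
# (the third position varies fastest). "*" marks the three STOP codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence codon by codon until a STOP codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # STOP signal: end of the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCTTAA"))  # MA
```

The table has 64 entries for 20 amino acids plus STOP, so most amino acids are encoded by more than one codon.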
Protein Folding
• Proteins are not linear structures, though they are built that way
• The amino acids have very different chemical properties; they interact with each other after the protein is built
– This causes the protein to start folding and adopt its functional structure
– Proteins may fold in reaction to some ions, and several separate chains of peptides may join together through their hydrophobic and hydrophilic amino acids to form a polymer
Protein Folding (cont’d)
• The structure that a protein adopts is vital to its chemistry

• Its structure determines which of its amino acids are exposed to carry out the protein’s function
• Its structure also determines what substrates it can react with
Protein Structure
Source: D. Geiger
Basic Molecular Biotechnology
How is information accessed at the molecular level?
Operations on DNA/RNA
• Amplification (making many copies)
• Cutting into shorter fragments
• Reading fragment lengths
• Reading DNA sequence
• Probing presence of specific fragments
Why we need so many copies
• Biologists needed to find a way to read DNA codes.
• How do you read base pairs that are angstroms in size?
– It is not possible to directly look at it due to DNA’s small size.
– Need to use chemical techniques to detect what you are looking for.
– To read something so small, you need a lot of it, so that you can actually detect the chemistry.
• Need a way to make many copies of the base pairs, and a method for reading the pairs.
Polymerase Chain Reaction
• Problem: Modern instrumentation cannot easily detect single molecules of DNA, making amplification a prerequisite for further analysis
• Solution: PCR doubles the number of DNA fragments at every iteration
1… 2… 4… 8…
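The doubling can be made concrete with a short calculation. This sketch assumes ideal PCR, where every fragment is copied in every cycle; real reactions fall short of perfect doubling. The function name `pcr_copies` is hypothetical.

```python
def pcr_copies(cycles, start=1):
    """Number of DNA fragments after a given number of ideal PCR cycles."""
    return start * 2 ** cycles


for n in range(4):
    print(n, pcr_copies(n))  # 1, 2, 4, 8 copies after 0, 1, 2, 3 cycles

print(pcr_copies(30))  # 1073741824, about a billion copies from one molecule
```

Exponential growth is why a few dozen cycles are enough to turn a single molecule into a detectable quantity.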
Denaturation
Raise temperature to 94°C to separate the duplex form of DNA into single strands
Design primers
• To perform PCR, a 10-20 bp sequence on either side of the sequence to be amplified must be known, because DNA polymerase requires a primer to synthesize a new strand of DNA
Annealing
• Anneal primers at 50-65°C
Extension
• Extend primers: raise temp to 72°C, allowing Taq polymerase to attach at each priming site and extend a new DNA strand
Repeat
• Repeat the Denaturation, Annealing, and Extension steps at their respective temperatures…
Polymerase Chain Reaction
Restriction Enzymes
• Discovered in the early 1970s
– Used as a defense mechanism by bacteria to break down the DNA of attacking viruses.
– They cut the DNA into small fragments.
• Can also be used to cut the DNA of organisms.
– This breaks the DNA sequence into more manageable bite-size pieces.
• It is then possible using standard purification techniques to single out certain fragments and duplicate them to macroscopic quantities.
Molecular Scissors
Molecular Cell Biology, 4th edition, fig. 9-10
Discovering Restriction Enzymes
• HindII: first restriction enzyme discovered by Hamilton Smith in 1970
– From bacterium Haemophilus influenzae
– Discovered accidentally while studying how the bacterium Haemophilus influenzae takes up DNA from the phage virus P22
– Recognizes and cuts DNA at sequences:
• GTGCAC
• GTTAAC
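Cutting at recognition sites can be sketched as a string search. The example below uses the two sites exactly as listed on the slide. The function names `find_sites` and `digest` are hypothetical, and the cut position (the middle of the 6-letter site, `offset=3`) is an assumption made for illustration.

```python
SITES = ("GTGCAC", "GTTAAC")  # recognition sequences as listed on the slide


def find_sites(dna, sites=SITES):
    """Return the sorted start positions of every recognition site in dna."""
    hits = []
    for site in sites:
        start = dna.find(site)
        while start != -1:
            hits.append(start)
            start = dna.find(site, start + 1)
    return sorted(hits)


def digest(dna, sites=SITES, offset=3):
    """Cut dna at every site and return the resulting fragments.

    Assumption: the enzyme cuts `offset` letters into the site
    (offset=3 is the middle of a 6-letter site).
    """
    fragments, previous = [], 0
    for position in find_sites(dna, sites):
        cut = position + offset
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments


print(digest("AAGTGCACTTTGTTAACCC"))  # ['AAGTG', 'CACTTTGTT', 'AACCC']
```

The fragment lengths (here 5, 9, and 5) are the kind of data that gel electrophoresis measures.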
Recognition Sites of Restriction Enzymes
Separating DNA by Size
• Gel electrophoresis is a process for separating DNA by size
• Can separate DNA fragments that differ in length by only 1 nucleotide, for fragments up to 500 nucleotides long
Gel Electrophoresis
• DNA fragments are injected into a gel positioned in an electric field
• DNA is negatively charged near neutral pH
– The ribose phosphate backbone of each nucleotide is acidic; DNA has an overall negative charge
• DNA molecules move towards the positive electrode
Gel Electrophoresis (cont’d)
• DNA fragments of different lengths are separated according to size
– Smaller molecules move through the gel matrix more readily than larger molecules
• The gel matrix restricts random diffusion so molecules of different lengths separate into bands
Detecting DNA: Autoradiography
• One way to visualize separated DNA bands on a gel is autoradiography:
• The DNA is radioactively labeled
• The gel is laid against a sheet of photographic film in the dark, exposing the film at the positions where the DNA is present.
Detecting DNA: Fluorescence
• Another way to visualize DNA bands in gel is fluorescence:
• The gel is incubated with a solution containing the fluorescent dye ethidium
• Ethidium binds to the DNA
• The DNA lights up when the gel is exposed to ultraviolet light.
Gel Electrophoresis: Example
Direction of DNA movement
Smaller fragments travel farther
Sequencing
• Biologists can reliably find the sequence of A/C/T/G for short strings (few hundred nucleotides)
• Chain termination
– Single strand template
– Complementary strand synthesis blocked with small probability at particular nucleotides
– Lengths of fragments read for each class of strings
[Figure: gel with one lane per terminating nucleotide (A, C, T, G); the fragment labels appear to read ATACGGA, ATACGG, ATACG, ATAC, ATA, AT, A]
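Reading a sequence from chain-termination fragment lengths can be sketched as follows. This is a simplified model: each lane is assumed to report the exact lengths of the fragments ending in its nucleotide, with no missing or ambiguous bands. The names `run_lanes` and `read_gel` are hypothetical.

```python
def run_lanes(sequence):
    """Simulate the four lanes: for each nucleotide, the lengths of the
    fragments that terminate at that nucleotide."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(sequence, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Recover the sequence by reading bands from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = run_lanes("ATACGGA")
print(lanes)            # {'A': [1, 3, 7], 'C': [4], 'G': [5, 6], 'T': [2]}
print(read_gel(lanes))  # ATACGGA
```

Because every length from 1 to n shows up in exactly one lane, sorting the bands by length spells out the synthesized strand.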
Sequencing
DNA Hybridization
• Single-stranded DNA will naturally bind to complementary strands
• Hybridization is used to locate genes, regulate gene expression, and determine the degree of similarity between DNA from different sources
• Hybridization is also referred to as annealing or renaturation
Microarray Technologies
• Oligonucleotide arrays
– Short (20-60 bp) synthetic DNA strands
• Arrays of cDNAs
– Obtained by reverse transcription from Expressed Sequence Tags (ESTs)
DNA Array Hybridization Experiment
Images courtesy of Affymetrix.
Tagged RNA fragments flushed over array
Laser activation of fluorescent tags
Optical scanning of hybridization intensities
Two-Color Technique
• Sample labeled RED
• Control labeled GREEN
• YELLOW probes hybridize to both sample and control
• BLACK probes hybridize to neither
[Figure: RNA 1 and RNA 2, extracted from cell type 1 and cell type 2, are labeled with Cy3 and Cy5 dyes to produce target 1 and target 2]
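The color rule above can be sketched as a simple classification of the two channel intensities at one probe. This is an illustration only: the function name `spot_color` and the single fixed `threshold` are assumptions, and real microarray analysis typically works with normalized intensity ratios rather than a hard cutoff.

```python
def spot_color(red, green, threshold=100.0):
    """Classify one probe from its red (sample) and green (control) intensity.

    Assumption: a channel counts as hybridized when its intensity reaches
    `threshold`, a hypothetical cutoff chosen for this sketch.
    """
    sample_bound = red >= threshold
    control_bound = green >= threshold
    if sample_bound and control_bound:
        return "YELLOW"  # hybridizes to both sample and control
    if sample_bound:
        return "RED"     # hybridizes to the sample only
    if control_bound:
        return "GREEN"   # hybridizes to the control only
    return "BLACK"       # hybridizes to neither


print(spot_color(850.0, 790.0))  # YELLOW
print(spot_color(850.0, 12.0))   # RED
```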
Sequencing by Hybridization
• Exploits parallel hybridization capabilities offered by DNA arrays
• ALL probes of a certain length k (k=8 to 10) are synthesized on the array
• Target DNA hybridizes at locations which store probes complementary to its k-substrings
• Sequencing by Hybridization (SBH) Problem: Reconstruct target DNA given its k-length substrings (spectrum)
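The SBH problem can be sketched for tiny inputs with a backtracking search: order the k-length substrings so that each one overlaps the next in k-1 letters, then spell out the result. This is a minimal illustration under stated assumptions, not an efficient algorithm. The spectrum is treated as a multiset with no errors, the function names `spectrum` and `reconstruct` are hypothetical, and in general more than one reconstruction may fit the same spectrum.

```python
def spectrum(dna, k):
    """Return the sorted list of all k-length substrings of dna."""
    return sorted(dna[i:i + k] for i in range(len(dna) - k + 1))


def reconstruct(kmers):
    """Find one string whose spectrum is exactly `kmers`, or None.

    Exhaustive backtracking: extend the current chain with any unused
    k-mer whose first k-1 letters match the last k-1 letters of the chain.
    """
    def extend(path, remaining):
        if not remaining:
            return path
        for i, kmer in enumerate(remaining):
            if path[-1][1:] == kmer[:-1]:
                result = extend(path + [kmer], remaining[:i] + remaining[i + 1:])
                if result:
                    return result
        return None

    for i, start in enumerate(kmers):
        path = extend([start], kmers[:i] + kmers[i + 1:])
        if path:
            # Spell the string: first k-mer plus the last letter of each later one.
            return path[0] + "".join(kmer[-1] for kmer in path[1:])
    return None


probes = spectrum("TATGGTGC", 3)
print(probes)               # ['ATG', 'GGT', 'GTG', 'TAT', 'TGC', 'TGG']
print(reconstruct(probes))  # TATGGTGC
```

The search tries every ordering in the worst case, so it is only practical for very small spectra.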
Operations on Proteins
• Cloning in expression vectors
• 2-Dimensional gel electrophoresis: separate proteins by molecular weight/pH gradient
• Antibody techniques (immunoprecipitation, antibody arrays,…)
• Mass spectrometry (e.g., MALDI-TOF)
• …
Active research problems
• Genome projects have already given draft genome sequences for hundreds of species, but lots of questions remain to be answered
– Create a complete “parts” list: gene sequences (including intron/exon structure), transcription factors, …
– Understand function of each part, e.g., protein structure, protein/DNA and protein/protein interactions
– Understand mechanisms, e.g., pathways
– Understand how everything fits together: systems biology