CSCI6904
Genomics and Biological Computing
Instructor: Christian Blouin
Schedule:
- Monday 14:30 – 13:55
- Wednesday 14:30 – 13:55
Contact : [email protected] rm.: 321 CS building ph: 6702
Genomics
Analysis of biological data within the context of the genetic content of an entire organism.
Computational Molecular Biology
Modeling and problem solving using computational techniques.
Bioinformatics
Using computational techniques to perform data analysis on biological datasets.
Possible misconceptions about Bioinformatics
Bioinformatics is about large datasets!
I need a biological degree to do bioinformatics.
Biologists don’t know anything about computation. Trivial applications of CS can make a breakthrough.
CSCI6904 midterms are hard!
Why should CS people do biology?
The nature of science is changing rapidly
Sir A. Fleming discovered penicillin by designing experiments (although the actual discovery was itself an anecdote).
Rosalind Franklin generated X-ray diffraction patterns by developing methods and instrumentation.
Why should CS people do biology?
The nature of science is changing rapidly
Again, more research in biology and chemistry boils down to the design of a clever analysis.
The quantitative skills required to navigate biology/chemistry are highly sought by:
-- Industrial sector: Pharmaceutical, Environment, Agriculture, Food Science
-- Government labs
-- Academic labs
Role of Computer Scientists in future developments in the field
Accessibility to data
The availability of a rapidly growing mass of information has been a cliché one-liner already for a while. It is nonetheless true. The researchers interested in biological questions cannot be bothered with database issues. Computer scientists are needed to make this connection and in the process generate more general and portable methodology.
Annotation, curation, query, maintenance…
Role of Computer Scientists in future developments in the field
Accessibility to computation
Even if it is easy to get all the relevant data, the appropriate tools to do the job are rarely available. There is a need for flexible and powerful computational platforms that allow biologists and chemists to get the information they want, when they want it.
Toolkits, APIs, Interfaces, Visual Programming, Education
Role of Computer Scientists in future developments in the field
Accessibility to Knowledge
Biological systems did not evolve in complexity with regard to human limitations. In the not-so-distant future, knowledge rather than data will become the more useful commodity. By knowledge, I refer to the inference of conceptual relationships between data and statements present in the literature.
Knowledge mining, natural language processing
Role of Computer Scientists in future developments in the field
Statistical Mechanics
As the base of data gets bigger and the bias in the nature of the data fades, the assumptions made by statistical mechanics are increasingly satisfied. Statistical mechanics has the potential to rid complex problems of the convoluted models used to represent them.
Computational chemistry, pattern detection, design.
Role of Computer Scientists in future developments in the field
Nanotechnology
Molecular biology presents a pre-fabricated framework for a microscopic platform. Proteins and nucleotides can be used as machines, for computing. The limiting factor to this is the inadequate quality of the models used for molecular design. Modeling evolution and molecules may just be what we need for the next biggest thing since running water, electricity and the internet.
Integrate all of the above.
Role of Computer Scientists in future developments in the field
Never mind technology!
Whatever takes too long to run today will run slowly tomorrow, and will probably run in real time off your video card in five years. High performance computing should be seen as an open door to smarter rather than just faster computing. A great example is the massively parallel algorithm behind Folding@home.
Distributed computing and algorithms, data structures.
Academic activities
Lectures (Partial examinations I and II) 2 * 10%
Identify problems, relate computational techniques to biological problems, apply bioinformatic techniques to unrelated issues.
Journal club content when relevant to class content.
2 Paper Reviews (30 min critical presentation) (10%, 15%)
Present and discuss a paper on a topic of your choice. All are expected to read the papers ahead of the presentation.
Project (A clear question, a brief answer) 55%
The main activity for this course will be a small project on a relevant issue in Bioinformatics.
(5% will be peer reviews)
The Lectures
Objectives
1. Proficiency in the general applications of Bioinformatics.
2. Focus on Genomics and Evolutionary Biology.
3. Learn the minimum necessary in Biology, Chemistry and Medicine to understand current problems in the field.
4. Stimulate the generation of ideas for the course’s project.
The Seminars
Objectives
1. Read current papers.
2. Identify current issues in Bioinformatics.
3. Learn about new applications of CS to the field.
4. Personalize the course to your own interests.
Workshops?
I would be glad to swap one or a few seminar sessions for workshops and first-hand work if you are interested and enrollment leaves us with free seminar sessions.
The Project
Objectives
An excuse for you to get first-hand experience in a field you may never have touched before.
Address an issue of general interest in bioinformatics:
Application of your favorite techniques.
Application of general methodologies.
Feasibility studies.
Straightforward implementations applied to bioinformatics.
Biologically-inspired computing, which requires more biology from you than you’d want to get into.
The Project
Format
Milestone 1 – Definition of an area.
Milestone 2 – Definition of a problem.
Milestone 3 – Design of an experiment.
Milestone 4 – Journal discussion of the problem.
Milestone 5 – Discussing the results in the form of a short paper.
The Project
Format II
Can be a group project. However, as the team size increases, so will the expectations!
The Project
Grade Meaning
A+ This work could be published in the bioinformatics literature, provided more polishing.
A This work could be published in the bioinformatics literature, provided more experiments.
A- This work was well done, although it would be unlikely to be publishable in its current form.
B+ This work is neat, although it replicates things that have been done before.
B This work is sound, although straightforward.
… …
Plagiarism
The university has guidelines http://plagiarism.dal.ca
Contact Gwendolyn MacNairn, our Librarian, if in doubt.
This should not be an issue anymore for graduate students!
As part of the project I will offer to proofread each of the term papers. This proofreading will aim at pointing out logical or scientific errors and omissions. A bit like a mini peer review. However, please note the following:
1. It doesn’t make the instructor a co-author: I don’t want to be responsible if you don’t get a perfect mark even if you implement all of my comments.
2. If there is a suspicion of plagiarism, although the review isn’t graded, a manuscript WILL be sent out for disciplinary action.
Recommended Readings
Fundamental Concepts of Bioinformatics, Krane and Raymer, 92$, University Library
Discovering Genomics, Proteomics and Bioinformatics, Campbell and Heyer, 92$ (amazon <- pay no mind to these “ z” characters…)
Inferring Phylogenies, Felsenstein, 75$ (online)
Unfortunately, each covers only part of what we are going to talk about. However, the first one is a rather comprehensive overview of the field.
Content
Genomic data
Alphabet in biology
Statistical mechanics
Physical Simulations
Classic/Modern Genetics
Evolutionary theory
Cellular Processing
Functional Genomics
Sequence alignments
Structure alignments
Phylogeny
Protein Folding
Machine learning methods
Conceptual Biology
DNA computing
Parallel history
Life
Life – Origins
Quick glance at life forms
Eukaryotes
We are! Nucleus, linear chromosomes and extensive control machinery.
Quick glance at life forms
Archaea
Bacteria look-alikes. Apparently more closely related to us than to bacteria. Many are known to live in exotic environments.
Quick glance at life forms
Bacteria
Single cell, one circular genome, “omnipresent” life forms.
Quick glance at self-replicative entities
Virus
Sole purpose is to replicate; usually doesn’t do much more.
Quick glance at self-replicative entities
Transposons
Pieces of DNA that jump from one location in the genome to another.
Indian corn
Transposon disrupts a pigmentation-related gene.
Quick glance at self-replicative entities
Prions
Not even genetically encoded. Responsible for “Mad cow” disease. The same principle is at work in neurodegenerative diseases such as Alzheimer’s and Parkinson’s.
What is Cellular biology ?
http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookTOC.html
Real World players
Sugars
Nucleotides
Amino-Acids
Lipids
What is molecular biology ?
http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookTOC.html
Complex systems are usually modeled well using a graph approach.
Graph terminology isn’t in the biological culture, yet.
Edges and vertices
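As a rough illustration of the graph view, the sketch below stores the first steps of glycolysis as a directed graph: metabolites are vertices and reactions are edges. The dictionary layout and the `shortest_path` helper are assumptions made for this example, not something taken from the slides.

```python
from collections import deque

# Vertices are metabolites; a directed edge means "is converted into".
# Only the first three steps of glycolysis are included (toy-sized on purpose).
PATHWAY = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate"],
    "fructose-6-phosphate": ["fructose-1,6-bisphosphate"],
    "fructose-1,6-bisphosphate": [],
}


def shortest_path(graph, source, target):
    """Breadth-first search; returns the list of vertices from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(shortest_path(PATHWAY, "glucose", "fructose-6-phosphate"))
```

The same adjacency-list layout would work for other biological networks (regulation, protein interactions), with only the meaning of an edge changing.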
Lucky us, this encoding is 1-dimensional (and thus can be represented as strings).
http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookTOC.html
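Because the encoding is one-dimensional, a DNA sequence can be handled with ordinary string operations. A minimal sketch, assuming the four-letter alphabet A, C, G, T and function names chosen only for this example:

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(dna):
    """Return the RNA spelling of a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def gc_content(dna):
    """Fraction of positions that are G or C (0.0 for an empty string)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


print(reverse_complement("ATGC"), transcribe("ATGC"), gc_content("ATGC"))
```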
What kind of information ?
Bergeron, Bioinformatics Computing, pp:45-46
Sequences
http://www.ncbi.nlm.nih.gov
GenBank
http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html
• DNA sequences.
• Primary data generators submit to GenBank.
• Annotation issues.
• Heart of most genomics projects.
Structures
http://www.ncbi.nlm.nih.gov
Protein Data Bank
http://www.rcsb.org/pdb/
• Models of 3D structures
• X-ray crystallography
• NMR spectroscopy
Microarray
http://www.ncbi.nlm.nih.gov
Gene expression
• Identify which genes are expressed under a given set of conditions.
• Microchips require only a small amount of sample for a full analysis.
What can we do with sequences?
http://www.ncbi.nlm.nih.gov
Multiple sequence alignments
Principle
Characters in sequences can be substituted randomly.
An alignment places homologous positions together.
Unlikely that an ultimate alignment tool will ever be made.
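To make the principle concrete, here is a minimal sketch of the pairwise case using Needleman-Wunsch global alignment. It is a simplification of multiple alignment, and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration rather than values from the course.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

Each column of the two returned strings is a claim that those positions are homologous; a `-` marks an insertion or deletion. Different scoring choices can give different alignments, which is one reason no single tool settles the question.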
What can we do with sequences?
http://www.mbio.ncsu.edu/BioEdit/bioedit.html
Multiple sequence alignment tools
BioEdit (Windows)
Free. All-inclusive functions.
SeaView (Unix)
Free. Unstable. Few alternatives that I know of.
What can we do with sequences?
http://www.the-scientist.com/images/yr2001/oct29/y.gif
Whole genome analysis
Look for genes
Look for regulation mechanisms
Look for drug targets (exclusive pathway)
Predict the function of unknown sequences
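As a hedged illustration of "look for genes", the sketch below scans the three forward reading frames for open reading frames: an ATG start codon followed in-frame by a stop codon. Real gene finders are far more involved. The function name and the `min_codons` threshold are assumptions for this example, and the reverse strand is ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    dna[start:end] begins with ATG and ends with a stop codon.
    min_codons is the minimum number of codons before the stop (hypothetical default).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None                     # close it and keep scanning
    return sorted(orfs)


sequence = "CCATGAAATTTTAGGG"  # toy sequence made up for the example
for begin, end in find_orfs(sequence):
    print(begin, end, sequence[begin:end])
```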
What can we do with sequences?
Tell a story
Relationship amongst sequences
Origins of systems
Horizontal transfer of information between sequences
Understand evolution
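One simple way to quantify the relationship amongst sequences is the proportion of differing sites between two aligned sequences (the p-distance), which distance-based tree methods can take as input. The sketch below assumes the sequences are already aligned with `-` as the gap symbol; the sequence names and data are made up for the example.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue                 # ignore columns with a gap in either sequence
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


def distance_matrix(alignment):
    """Map each unordered pair of names to its p-distance."""
    names = sorted(alignment)
    return {
        (p, q): p_distance(alignment[p], alignment[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Hypothetical toy alignment, not real data.
toy = {"seq_a": "ACGTACGT", "seq_b": "ACGTACGA", "seq_c": "AC-TTCGA"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```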
What can we do with structures?
What is its function?
What is the mechanism?
Does it relate to other known structures?
Can we design a drug to enhance/suppress its function?
Predict the structure of related proteins.
http://www.ks.uiuc.edu/Research/vmd/