

CSCI6904

Genomics and Biological Computing

Instructor: Christian Blouin

Schedule:
- Monday 14:30 – 13:55
- Wednesday 14:30 – 13:55

Contact: cblouin@cs.dal.ca, rm. 321 CS building, ph: 6702


Genomics
Analysis of biological data within the context of the genetic content of an entire organism.

Computational Molecular Biology
Modeling and problem solving using computational techniques.

Bioinformatics
Using computational techniques to perform data analysis on biological datasets.


Possible misconceptions about Bioinformatics

Bioinformatics is about large datasets!

I need a biological degree to do bioinformatics.

Biologists don’t know anything about computation. Trivial applications of CS can make a breakthrough.

CSCI6904 midterms are hard!


Why should CS people do biology?

The nature of science is changing rapidly

Sir A. Fleming discovered penicillin by designing experiments (although the actual discovery was itself an anecdote).

Rosalind Franklin generated X-ray diffraction patterns by developing methods and instrumentation.


Why should CS people do biology?

The nature of science is changing rapidly

Again, more and more research in biology and chemistry boils down to the design of a clever analysis.

The quantitative skills required to navigate biology/chemistry are highly sought by:

-- Industrial sector: Pharmaceutical, Environment, Agriculture, Food Science
-- Government labs
-- Academic labs


Role of Computer Scientists in future developments in the field

Accessibility to data

The availability of a rapidly growing mass of information has been a cliché for a while. It is nonetheless true. Researchers interested in biological questions cannot be bothered with database issues. Computer scientists are needed to make this connection and, in the process, to generate more general and portable methodology.

Annotation, curation, query, maintenance…


Role of Computer Scientists in future developments in the field

Accessibility to computation

Even if it is easy to get all the relevant data, the appropriate tools to do the job are rarely available. There is a need for flexible and powerful computational platforms that allow biologists/chemists to get the information they want, when they want it.

Toolkits, APIs, Interfaces, Visual Programming, Education


Role of Computer Scientists in future developments in the field

Accessibility to Knowledge

Biological systems did not evolve in complexity with regard to human limitations. In the not-so-distant future, knowledge rather than data will become the more useful commodity. By knowledge, I refer to the inference of conceptual relationships between data and statements present in the literature.

Knowledge mining, natural language processing


Role of Computer Scientists in future developments in the field

Statistical Mechanics

As the base of data gets bigger and the bias in the nature of the data fades, the assumptions made by statistical mechanics are increasingly satisfied. Statistical mechanics has the potential to rid complex problems of the convoluted models used to represent them.

Computational chemistry, pattern detection, design.


Role of Computer Scientists in future developments in the field

Nanotechnology

Molecular biology presents a pre-fabricated framework for a microscopic platform. Proteins and nucleotides can be used as machines for computing. The limiting factor is the inadequate quality of the models used for molecular design. Modeling evolution and molecules may be just what we need for the biggest thing since running water, electricity and the internet.

Integrate all of the above.


Role of Computer Scientists in future developments in the field

Never mind technology!

Whatever takes too long to run today will run slowly tomorrow, and will probably run in real time off your video card in five years. High performance computing should be seen as an open door to smarter rather than just faster computing. A great example is the massively parallel algorithm behind Folding@home.

Distributed computing and algorithms, data structures.


Academic activities

Lectures (Partial examinations I and II) 2 * 10%
Identify problems, relate computational techniques to biological problems, apply bioinformatic techniques to unrelated issues.
Journal club content when relevant to class content.

2 Paper Reviews (30 min critical presentation) (10%, 15%)
Present and discuss a paper on a topic of your choice. All are expected to read the papers ahead of the presentation.

Project (A clear question, a brief answer) 55%
The main activity for this course will be a small project on a relevant issue in Bioinformatics.
(5% will be peer reviews)


The Lectures

Objectives

1. Proficiency in the general applications of Bioinformatics.
2. Focus on Genomics and Evolutionary Biology.
3. Learn the minimum necessary in Biology, Chemistry and Medicine to understand current problems in the field.
4. Stimulate the generation of ideas for the course’s project.


The Seminars

Objectives

1. Read current papers.
2. Identify current issues in Bioinformatics.
3. Learn about new applications of CS to the field.
4. Personalize the course to your own interests.


Workshops?

I would be glad to swap one or a few seminar sessions for workshops and hands-on work if you are interested and the enrollment is such that we have free seminar sessions.


The Project

Objectives

An excuse for you to get first-hand experience in a field that you may never have touched before.

Address an issue of general interest in bioinformatics:

Application of your favorite techniques.

Application of general methodologies.

Feasibility studies.

Straightforward implementations applied to bioinformatics.

Biologically-inspired computing, which requires of you more biology than you’d want to get into.


The Project

Format

Milestone 1 – Definition of an area.

Milestone 2 – Definition of a problem.

Milestone 3 – Design of an experiment.

Milestone 4 – Journal discussion of the problem.

Milestone 5 – Discussing the results in the form of a short paper.


The Project

Format II

Can be a group project. However, as the team size increases, so will the expectations!


The Project

Grade  Meaning

A+     This work could be published in the bioinformatics literature, given more polishing.
A      This work could be published in the bioinformatics literature, given more experiments.
A-     This work was well done, although it would be unlikely to be publishable in its current form.
B+     This work is neat, although it replicates things that have been done before.
B      This work is sound, although straightforward.
…      …


Plagiarism

The university has guidelines: http://plagiarism.dal.ca

Contact Gwendolyn MacNairn, our Librarian, if in doubt.

This should not be an issue anymore for graduate students!

As part of the project I will offer to proofread each of the term papers. This proofreading will aim at pointing out logical or scientific errors and omissions, a bit like a mini peer review. However, please note the following:

1. It doesn’t make the instructor a co-author: I don’t want to be responsible if you don’t get a perfect mark even if you implement all of my comments.

2. If there is a suspicion of plagiarism, the manuscript WILL be sent out for disciplinary action, even though the review isn’t graded.


Recommended Readings

Fundamental Concepts of Bioinformatics, Krane and Raymer, 92$, University Library

Discovering Genomics, Proteomics and Bioinformatics, Campbell and Heyer, 92$ (Amazon)

Inferring Phylogenies, Felsenstein, 75$ (online)

Unfortunately, each covers only part of what we are going to talk about. However, the first one is a rather comprehensive overview of the field.


CSCI6904

Genomics and Biological Computing

Content

Genomic data

Alphabet in biology

Statistical mechanics

Physical Simulations

Classic/Modern Genetics

Evolutionary theory

Cellular Processing

Functional Genomics

Sequence alignments

Structure alignments

Phylogeny

Protein Folding

Machine learning methods

Conceptual Biology

DNA computing


Parallel history


Life


Life – Origins


Quick glance at life forms

Eukaryotes
We are! Nucleus, linear chromosomes and extensive control machinery.


Quick glance at life forms

Archaea
Bacteria look-alikes. Apparently more closely related to us than to bacteria. Many are known to live in exotic environments.


Quick glance at life forms

Bacteria
Single cell, one circular genome, “omnipresent” life forms.


Quick glance at self-replicative entities

Virus
Sole purpose is to replicate; they usually don’t do much more.


Quick glance at self-replicative entities

Transposons
Pieces of DNA that jump from one location in the genome to another.

Indian corn

Transposon disrupts a pigmentation-related gene.


Quick glance at self-replicative entities

Prions
Not even genetically encoded. Responsible for “Mad cow” disease. A similar principle is at work in neurodegenerative diseases such as Alzheimer’s and Parkinson’s.


What is cellular biology?

http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookTOC.html


Real World players

Sugars

Nucleotides Amino-Acids

Lipids


What is molecular biology?

http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookTOC.html


Complex systems are usually modeled well using a graph approach.

Graph terminology isn’t in the biological culture, yet.


Edges and vertices
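
As a minimal sketch of the graph view, assuming a made-up toy interaction network (the molecule names and edges below are illustrative only, not data from the lecture), vertices can stand for molecules and edges for interactions between them:

```python
from collections import defaultdict

# Hypothetical toy network: vertices are molecules, edges are interactions.
# The names and edges are illustrative assumptions, not real data.
edges = [
    ("geneA", "proteinA"),
    ("proteinA", "proteinB"),
    ("proteinB", "geneC"),
    ("proteinA", "geneC"),
]


def build_graph(edge_list):
    """Build an undirected graph as a dict of vertex -> set of neighbours."""
    graph = defaultdict(set)
    for u, v in edge_list:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def degree(graph, vertex):
    """Number of edges incident to a vertex (0 if the vertex is unknown)."""
    return len(graph.get(vertex, ()))


graph = build_graph(edges)
```

An adjacency-set layout is one reasonable choice here because neighbour lookups are cheap; a directed or weighted graph would be needed for regulation or reaction rates.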


Lucky us, this encoding is 1-dimensional (and thus can be represented as strings).

http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookTOC.html
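
Because the encoding is one-dimensional, ordinary string operations already do useful work. The sketch below assumes a DNA sequence written with the uppercase letters A, C, G and T; the example sequence and the function names are hypothetical.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def gc_content(dna):
    """Fraction of positions that are G or C."""
    return (dna.count("G") + dna.count("C")) / len(dna)


seq = "ATGGCGTAA"  # hypothetical example sequence
print(reverse_complement(seq))  # TTACGCCAT
```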


What kind of information?

Bergeron, Bioinformatics Computing, pp:45-46


What kind of information?

http://www.ncbi.nlm.nih.gov


Sequences

http://www.ncbi.nlm.nih.gov

GenBank
http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html

• DNA sequences.
• Primary data generators submit to GenBank.
• Annotation issues.
• Heart of most genomics projects.


Structures

http://www.ncbi.nlm.nih.gov

Protein Data Bank
http://www.rcsb.org/pdb/

• Models of 3D structures
• X-ray crystallography
• NMR spectroscopy


Microarray

http://www.ncbi.nlm.nih.gov

Gene expression

• Identify which genes are expressed under a given set of conditions.
• Microchips require a small amount of sample for a full analysis.
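
One simple way to ask which genes respond to a condition is to compare intensities between two samples. The sketch below is only an illustration: the gene names, the intensity values and the two-fold cut-off (`threshold=1.0` on a log2 scale) are assumptions, and a real analysis would also need normalization and statistical testing.

```python
import math

# Hypothetical expression intensities (arbitrary units) for two conditions.
# Gene names and values are made up for illustration.
control = {"gene1": 100.0, "gene2": 50.0, "gene3": 200.0}
treated = {"gene1": 400.0, "gene2": 55.0, "gene3": 50.0}


def log2_fold_change(a, b):
    """log2 ratio of condition b over condition a for every shared gene."""
    return {g: math.log2(b[g] / a[g]) for g in a if g in b}


def differentially_expressed(a, b, threshold=1.0):
    """Genes whose absolute log2 fold change is at least the threshold."""
    lfc = log2_fold_change(a, b)
    return sorted(g for g, v in lfc.items() if abs(v) >= threshold)


print(differentially_expressed(control, treated))  # ['gene1', 'gene3']
```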


What can we do with sequences?

http://www.ncbi.nlm.nih.gov

Multiple sequence alignments

Principle

Characters in sequences can be substituted randomly.

An alignment places homologous positions together.

It is unlikely that an ultimate alignment tool will ever be made.
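
To make "placing homologous positions together" concrete, here is a minimal sketch of global alignment for just two sequences using the classic Needleman-Wunsch dynamic programming scheme. It is a pairwise simplification, not a multiple alignment method, and the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Several alignments can share the same optimal score, and the traceback returns only one of them, which is part of why no single tool is the final word.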


What can we do with sequences?

http://www.mbio.ncsu.edu/BioEdit/bioedit.html

Multiple sequence alignments Tools

BioEdit (Windows)
Free. All-inclusive functions.

Seaview (Unix)
Free. Unstable. Few alternatives that I know of.


What can we do with sequences?

http://www.the-scientist.com/images/yr2001/oct29/y.gif

Whole genome analysis

Look for genes

Look for regulation mechanisms

Look for drug targets (exclusive pathway)

Predict the function of unknown sequences
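
As a rough illustration of "look for genes", the sketch below scans the forward strand for open reading frames: an ATG start followed, in the same reading frame, by a stop codon. This is a deliberately naive assumption about what a gene looks like (no reverse strand, no introns, no statistical model), and the function name and `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon.
    min_codons counts the codons before the stop, including the ATG.
    """
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


print(find_orfs("CCATGAAATGGTAACC"))  # [(2, 14, 'ATGAAATGGTAA')]
```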


What can we do with sequences?

Tell a story

Relationship amongst sequences

Origins of systems

Horizontal transfer of information between sequences

Understand evolution
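
A first, crude measure of the relationship amongst sequences is the proportion of aligned positions at which two sequences differ (often called the p-distance). The sketch below assumes the sequences are already aligned to the same length and contain at least one ungapped column in common; the three sequences are made up for illustration.

```python
def p_distance(a, b):
    """Proportion of ungapped aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = sorted(aligned)
    return {(s, t): p_distance(aligned[s], aligned[t]) for s in names for t in names}


# Hypothetical aligned sequences, made up for illustration.
aligned = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTACGA",
    "seqC": "TCGAACGA",
}
distances = distance_matrix(aligned)
```

In this toy data seqA and seqB are the closest pair, which is the kind of signal a phylogeny method builds on; real methods correct for multiple substitutions at the same site.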


What can we do with structures?

What is its function?

What is the mechanism?

Does it relate to other known structures?

Can we design a drug to enhance/suppress its function?

Predict the structure of related proteins.

http://www.ks.uiuc.edu/Research/vmd/