biol 6385 / math6343/ bmen 6389prr105020/biol6385/2019/lecture/lectu… · computational biology...

34
Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15 – May. 7), 2019 The University of Texas at Dallas

Upload: others

Post on 25-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Computational BiologyBIOL 6385 / MATH6343/ BMEN 6389

Instructor: Prof. Michael Q. Zhang

Spring (Jan. 15 – May. 7), 2019The University of Texas at Dallas

Page 2: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

What the course teaches

• Computational and statistical methods for analyzing biological data and understanding the biological systems.

• Introduces computational aspects of :– Genomics– Evolution & phylogenetics– Gene regulation & gene networks

• Focus on generic methods and algorithms, NOT on specific protocols or tools

Page 3: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Course resources

• Instructors ( contact details on website ) :

Michael Zhang Pradipta Ray Qin Zhou

Instructor Associate Instructor Teaching Assistant

Page 4: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Course resources

• Mailing list :– [email protected]

• Please email the instructors your convenient email ( UTD email preferred ) to join.

• This is a broadcast email list– Only instructors post– For students, it is best to directly email the

instructors (email early, not late)– Email is the preferred mode of communication

Page 5: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Course resources• Website : home page ( dates, contacts, hours,

news ) : http://utdallas.edu/~prr105020/biol6385/

PHY 1.103

MATH6343

Page 6: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Course resources• Website : course info tab ( course policy )

Page 7: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Course resources• Website : schedule tab ( schedule, handouts, HW, solns)

MATH6343

Page 8: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Course policy

• Attendance and participation :– Active participation in class room discussion is

expected. Attendance is mandatory except special permission from the instructor.

Page 9: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Grading• Grading :

– midterm and final exams, and 3 problem sets– HWs (50%)– Midterm (25%)– Final exam (25%)– This is a graduate course : don’t focus on grades :

the goal is to understand the subject matter !– Final letter grades will depend on clustering and

relative quantile profiles, not on direct translation of numerical grades.

Page 10: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Examinations

• Exams– 75 minutes in duration.– open book and open notes. No Computers or

communication devices allowed.– Mid term exam date: March 7, class hours, in

class– Final exam date: May 7, class hours, in class– It is impossible for us to accommodate individual

requests to reschedule the exams.

Page 11: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Homework• To be done individually. Show all intermediate steps.

• Late homework: Homework is worth full credit at the beginning of class on the due date, It is worth 75% for the next 24 hours, 50% credit from 24 to 96 hours after the due date, 0% credit after that.

• Turn in all 3 HWs, even if for no credit, to pass the course. Late HW assignments must be turned in to the instructors.

Page 12: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

• For how to access online, or from a library near you, check the class website.

PRIMARY

TextbooksSECONDARY TERTIARY

Page 13: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Reference books

• For how to access online, or from a library near you, check the class website.

http

://w

ork.

calte

ch.e

du/le

ctur

es.h

tm

Page 14: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

5 sections•Unit 1: Modelling Uncertainty in Biology

– How to build a framework to rationally deal with uncertainty : probability (deductive/logical inference)

– How to estimate and infer parameters associated with such uncertainty : statistics (inductive learning, generalization/abstraction)

– How to proceed when there are many sources of uncertainty in a BIG DATA : bayes nets / deep neural networks

sketchup.google.com

Page 15: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

5 sections

• Unit 2: Molecular Sequence Analysis– Searching and alignment of sequences– Modelling composition of sequences and guessing

their functionality : classification of subsequencesand annotation

– Integrative analysis : how to combine evidence from multiple and extra-sequential sources when analyzing sequences

comm

ons.wikimedia.org

Page 16: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

5 sections

• Unit 3: Markovian models– Markov chains: The Markov condition among

random variables, factoring the joint– Hidden Markov Models: What happens when the

state of the system is unobserved ?– Supervised and unsupervised inference : Forward-

Backward type of algorithms, Baum-Welch / Expectation Maximization algorithm

– Pair and profile HMMs : Engineering Markovian models to solve computational biological problems

Statpics.blogspot.com

Page 17: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

5 sections• Unit 4: Evolution & Comparative

Genomics– Evolutionary dynamics : how DNA may change by

mutations– Multiple sequence alignment : comparing

sequences across individuals or species– Phylogenetic trees : clustering based on

sequences, explicitly modelling evolution of sequences

tolweb.org

Page 18: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

5 sections• Unit 5: Generic Machine Learning

Approaches for Comp Biologists– Optimization techniques : greedy and

more systematic optimization strategies– Markov Chain Monte Carlo: Algorithms to sample from

probability distributions– Classification : identifying classes of observation, category

prediction– Regression : estimating quantitative relationships among

multiple variables, forecasting– Structure learning : how to learn the structure of data– Ensemble learning : combining learning machines– Neural Networks and Deep Learning: feature free learning

comm

ons.wikimedia.org

Page 19: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

What’s computational biology?

Bioinformatics applies principles of information sciences and technologies to make the vast, diverse, and complex life sciences data more understandable and useful. Computational biology uses mathematical models and computational approaches to address theoretical and experimental questions in biology. Although bioinformatics and computational biology are distinct, there is also significant overlapand activity at their interface.[1] Wikipedia

Learning: Information Knowledge, but what’s more important than Knowledge?

Page 20: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

" It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material. " Watson&Crick

"Information is any difference that makes a difference.“ Shannon/Turing/Bateson

Digital revolution

Page 21: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

The Human Genome Project (1990-2005)(http://www.nhgri.nih.gov/HGP/)

Mapped Human Genes

“The new paradigm now emerging, is that all thegenes will be known (in the sense of beingresident in databases available electronically),and that the starting point of a biologicalinvestigation will be theoretical”

W. Gilbert (1991) 22

Page 22: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Gene finding and structure/function prediction (Sequence → Structure → Function)

23Human β-Globin

A typical vertebrate gene

E1 E2 E3 E4 E5 E6 E7

I1 I2 I3 I4 I5 I6DNA

mRNA Splicing

N a m e S i z e ( k b ) M R N A ( k b ) I n t r o n sβ - G l o b i n 1 . 5 0 . 6 2I n s u l i n 1 . 7 0 . 4 2P r o t e i n k i n a s e C 1 1 1 . 4 7A l b u m i n 2 5 2 . 1 1 4C a t a l a s e 3 4 1 . 6 1 2L D L r e c e p t o r 4 5 5 . 5 1 7F a c t o r V I I I 1 8 6 9 2 5T h y r o g l o b u l i n 3 0 0 8 . 7 3 6D y s t r o p h i n > 2 0 0 0 1 7 > 5 0

Some sizes of human genes

1 2 31 2 3

1 3

Example: alternative splicing of the fly sex determination gene

Page 23: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

CF Gene Discovery (1989)

24

Positional cloning:•Linkage analysis•Physical mapping•cDNA selection•Sequencing•Database search (alignment)

Page 24: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Single gene regulation

(enhancer)

(promoter)

CTCF

(insulator/boundary)

Page 25: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

GRN: Respiration Module

Regulation program

Module genes

Hap4+Msn4 known to regulate module genes

Predicted regulator

(Segal et al., Nature Genetics ’03)• Module genes known targets of predicted regulators?

Spellman et all, Mol. Biol. Cell, (1998)

Page 26: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

2008 - 2013

highlowmediumsilent

Page 27: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

amazon.com

Page 28: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

(Synthetic Biology)

Yeast (S. cerevisiae, 12.1Mb, 2017)

DNA predict:• Face• Age• Behavior• Make new species• etc.

Page 29: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

EnvironmentComp. Biol.

Syn. Biol.

What can be more interesting than understanding ourselves?Modified from ebookbrowse

The Omics-cascade, nature is unity

Page 30: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Two levels of modeling• Statistical (Macroscopic) and Population models

– Simple correlation: Y ~ X– Probabilistic/Predictive:

• P(Y,X), P(Y|X)• Ῡ=f(x, α) = E[Y|x] = Σ y P(Y=y | X=x)e.g. f = a x + b (linear regression);Boyel’s law: V p = CT , Kinetic theory (Boltzmann);Brownian motion: , (Einstein).

• Biophysical/Biochemical (Microscopic)and Evolutionary (dynamical) models

D2t 6πηrN

= RTx2=

∂t ∂x2

∂ρ = D ∂2ρ

Page 31: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Chance-Life: Statistical Learning• Probabilistic Graphical (chains/trees/DAGs) Models

– Directed (Bayesian Networks, Phylogeny),Undirected (Markov Networks:HMM/generative, CRF/discriminative)

– Representation (Conditional independence, H-C Thm: MN=Gibbs), Inference (DP/VP), Learning (MLE/BE, EM/MCMC, Sparsity, Regularization)

http://www.pgm-class.org/

• Machine Learning & Learning Machines–ANN, GA, Perceptron, SVM, Boosting, Boltzman Machine https://www.coursera.org/learn/machine-learning

Belief, behavior, Boosting (Efron)

Page 32: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

IBM's supercomputer Deep Blue (May 1997) beat chess master Garry Kasparov in a six-game match, in a dramatic reversal of their battle the previous year.Machine: extension of human being, replacing or beating man in specific functional task.

On March 15, 2016, the distributed version of AlphaGo won 4-1 against Lee Sedol, whose Elo rating is now estimated at 3,520. The distributed version of AlphaGo is now estimated at 3,586. It is unlikely that AlphaGo would have won against Lee Sedol if it had not improved since 2015.

Machine Brain convergence

AlphaGo vs Ke Jie

AlphaGo vs Lee Sedo

AlphaGo Zero Plays itself

Page 33: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Cognitive Computing

Page 34: BIOL 6385 / MATH6343/ BMEN 6389prr105020/biol6385/2019/lecture/lectu… · Computational Biology BIOL 6385 / MATH6343/ BMEN 6389 Instructor: Prof. Michael Q. Zhang Spring (Jan. 15

Welcome new and talented collaborating

research studentsworking on BigData, get

strong recommendation!