Page 1: INFORMS 2004

INFORMS 2004

Hyun-suk Yoon

Joel Sokol

School of Industrial and Systems Engineering

Georgia Institute of Technology

Optimization Approaches to HP Lattice Protein Folding

Page 2: INFORMS 2004

Table of contents

Introduction to Protein Folding

Integer Programming (IP) Approach

Introduction to Constraint Programming (CP)

CP Approach

Discussion

Page 3: INFORMS 2004

Protein

• Sequence of amino acids

• Size: 30 to 10,000 amino acids; a few hundred on average

• Folds quickly into a compact 3-D structure in its minimum-energy state.

• Exponential number of possible 3-D structures.

Page 4: INFORMS 2004

Problem description

How can we find the 3-D structure of a protein, given its sequence of amino acids?

Page 5: INFORMS 2004

Motivation

1. Design drugs

• Most drugs work by attaching themselves to a protein.

• Knowing the 3-D shapes of proteins will help to design drugs.

2. Detect misfolding

• Proteins occasionally do not take the correct 3-D shape.

• Misfolded proteins are known causes of a number of diseases, e.g., Alzheimer’s disease and Parkinson’s disease.

Page 6: INFORMS 2004

Protein folding

How to figure out protein folding

• Experimental techniques: X-ray crystallography and NMR spectroscopy

• Computational techniques: e.g., Folding@Home

Protein Data Bank (PDB)

• http://www.rcsb.org/pdb

• Worldwide repository for 3-D structure data of large molecules: proteins and nucleic acids.

Page 7: INFORMS 2004

HP model and Lattice model

HP model

• Each amino acid is either Hydrophobic (H) or Polar (P).

• 20 types of amino acids: 8 H’s and 12 P’s

Lattice model

• Locate each amino acid on a point of a cubic lattice.

• Parity problem on the cubic lattice: alternatives are triangular or diagonal lattice models.

Page 8: INFORMS 2004

HP lattice model

• HP model + Lattice model: the simplest protein model

- Advantage: enumeration techniques can be used to locate amino acids.

- Disadvantage: low resolution, no explicit local interactions, equal bond length

• Lau and Dill (1989): minimizing total energy in the HP lattice model = maximizing the number of H-H contacts.

Page 9: INFORMS 2004

Example of HP lattice model

[Figure: an example fold on the lattice. Legend: hydrophobic amino acid, polar amino acid, peptide bond, H-H contact.]

Number of H-H contacts = number of adjacencies between hydrophobic amino acids (except for peptide bonds)
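The counting rule above can be made concrete with a minimal sketch. It assumes a 2-D square lattice, a sequence written as a string of `H` and `P`, and one `(x, y)` coordinate per amino acid; the function name `count_hh_contacts` is hypothetical and not taken from the talk.

```python
def count_hh_contacts(sequence, coords):
    """Count H-H contacts of a fold on the 2-D square lattice.

    sequence: string over {'H', 'P'}; coords: list of (x, y), one per residue.
    """
    position_of = {pos: k for k, pos in enumerate(coords)}
    contacts = 0
    for k, (x, y) in enumerate(coords):
        if sequence[k] != "H":
            continue
        # Look only right and up so that each lattice edge is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            other = position_of.get((x + dx, y + dy))
            if other is None or sequence[other] != "H":
                continue
            if abs(other - k) > 1:  # neighbors in the sequence are peptide bonds
                contacts += 1
    return contacts


# A 4-residue chain folded around a unit square: residues 0 and 3 touch.
print(count_hh_contacts("HPPH", [(0, 0), (0, 1), (1, 1), (1, 0)]))  # 1
```

Scanning only two of the four directions avoids counting the same contact from both of its endpoints.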

Page 10: INFORMS 2004

Literature review

Protein topology

• Levitt and Chothia (1976) represent 2D structural topology of protein in a diagrammatic form.

• Richardson (1977) shows the first systematic survey of protein topology.

HP lattice model

• Lau and Dill (1989) study an HP model on the square and cubic lattices.

• Berger and Leighton (1998) and Crescenzi et al. (1998) prove that folding in the HP lattice model is NP-complete.

Page 11: INFORMS 2004

Table of contents

Introduction to Protein Folding

Integer Programming (IP) Approach

Introduction to Constraint Programming (CP)

CP Approach

Discussion

Page 12: INFORMS 2004

General model

Max The number of H-H contacts

s.t.

1. (Assignment) Each amino acid must occupy one lattice point.

2. (Non-overlapping) No two amino acids may share the same lattice point.

3. (Connectivity) Every two amino acids that are consecutive in the protein's sequence must also occupy adjacent lattice points.
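For very short sequences, the general model can be solved by plain enumeration, which makes the three constraints tangible. The sketch below is an exhaustive search over self-avoiding walks on the 2-D square lattice; it is not the IP or CP formulation from the talk, and the names `best_fold` and `hh_contacts` are hypothetical.

```python
MOVES = ((0, 1), (0, -1), (-1, 0), (1, 0))  # Up, Down, Left, Right


def hh_contacts(sequence, coords):
    """Objective: adjacent H-H pairs that are not consecutive in the sequence."""
    index_at = {p: k for k, p in enumerate(coords)}
    total = 0
    for k, (x, y) in enumerate(coords):
        for dx, dy in ((1, 0), (0, 1)):  # each lattice edge is visited once
            m = index_at.get((x + dx, y + dy))
            if m is not None and abs(m - k) > 1 and sequence[k] == sequence[m] == "H":
                total += 1
    return total


def best_fold(sequence):
    """Enumerate all self-avoiding walks; return (max contacts, coordinates)."""
    best = (-1, None)
    coords = [(0, 0)]      # assignment: one lattice point per amino acid
    occupied = {(0, 0)}

    def extend():
        nonlocal best
        if len(coords) == len(sequence):
            score = hh_contacts(sequence, coords)
            if score > best[0]:
                best = (score, list(coords))
            return
        x, y = coords[-1]
        for dx, dy in MOVES:            # connectivity: step to an adjacent point
            nxt = (x + dx, y + dy)
            if nxt in occupied:         # non-overlapping: skip used points
                continue
            coords.append(nxt)
            occupied.add(nxt)
            extend()
            coords.pop()
            occupied.remove(nxt)

    extend()
    return best


print(best_fold("HPHPPH")[0])  # 2
```

The running time grows exponentially with the sequence length, so this is only practical for toy instances.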

Page 13: INFORMS 2004

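Two IP models

• Model IP-1: Uses the coordinates of each amino acid.

• Model IP-2: Uses the direction (Up, Down, Left, Right).

[Figure: a three-amino-acid example. In IP-1, amino acids 1, 2, 3 sit at (0,0), (0,1), (1,1). In IP-2, the same fold is the sequence of moves Up, Right.]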
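The two representations carry the same information once the first amino acid is pinned to a starting point. A minimal sketch of the conversion, assuming single-letter direction codes `U`, `D`, `L`, `R` and a start at (0,0) (both are illustrative choices, not notation from the talk):

```python
STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def directions_to_coords(directions, start=(0, 0)):
    """Direction encoding (as in IP-2) -> coordinate encoding (as in IP-1)."""
    coords = [start]
    for d in directions:
        dx, dy = STEP[d]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def coords_to_directions(coords):
    """Coordinate encoding -> direction encoding; assumes unit steps."""
    inverse = {v: k for k, v in STEP.items()}
    return [inverse[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(coords, coords[1:])]


# The three-amino-acid example from the slide: Up, then Right.
print(directions_to_coords("UR"))  # [(0, 0), (0, 1), (1, 1)]
```

A direction encoding satisfies connectivity by construction but still needs an explicit non-overlapping check; a coordinate encoding needs both.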

Page 14: INFORMS 2004

2-D vs 3-D

• Researchers often use a 2-D model instead of 3-D and then attempt to extend 2-D into 3-D.

• Our models extend easily from 2-D to 3-D:

- Model IP-1: (x,y) → (x,y,z)

- Model IP-2: add two more directions (forward, backward).

Page 15: INFORMS 2004

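Solving IP Models

Defining decision variables → Formulating the problem → Preprocessing → Running it with CPLEX

\[
\begin{aligned}
\max\ & \sum_{i,j} \sum_{d} y_{ijd} \\
\text{s.t.}\ & \sum_{i,j} x_{ijk} = 1 && \forall k && \text{(Assignment)} \\
& \sum_{k} x_{ijk} \le 1 && \forall i,j && \text{(Non-overlapping)} \\
& x_{(i+1)j(k+1)} + x_{(i-1)j(k+1)} + x_{i(j+1)(k+1)} + x_{i(j-1)(k+1)} - x_{ijk} \ge 0 && \forall i,j,k && \text{(Connectivity)} \\
& y_{ijd} \le \sum_{k} h_k x_{ijk}, \quad y_{ijd} \le \sum_{k} h_k x_{((i,j)+d)k} && \forall i,j,d && \text{(Define } y\text{)} \\
& x_{ijk},\ y_{ijd}\ \text{binary}
\end{aligned}
\]

x_{ijk} = 1 if the kth amino acid is located at (i,j), 0 otherwise.

y_{ijd} = 1 if the amino acids at (i,j) and at the adjacent point (i,j)+d are both hydrophobic, 0 otherwise.

h_k is not defined on the slide; from the (Define y) constraints it is presumably 1 if the kth amino acid is hydrophobic, 0 otherwise.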
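To see how the x variables encode a fold, the following sketch builds x_{ijk} from a list of coordinates and checks the assignment, non-overlapping, and connectivity constraints directly. It assumes a square grid of side `size` with 0-based indices, and treats off-grid points as 0; the function names are hypothetical, and no solver is involved.

```python
from itertools import product

DIRS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def x_from_fold(coords, size):
    """x[i, j, k] = 1 if the k-th amino acid sits at lattice point (i, j)."""
    n = len(coords)
    x = {(i, j, k): 0 for i, j, k in product(range(size), range(size), range(n))}
    for k, (i, j) in enumerate(coords):
        x[i, j, k] = 1
    return x


def satisfies_constraints(x, size, n):
    """Check assignment, non-overlapping, and connectivity for a 0/1 dict x."""
    cells = list(product(range(size), repeat=2))

    def get(i, j, k):
        return x.get((i, j, k), 0)  # points outside the grid count as 0

    assignment = all(sum(x[i, j, k] for i, j in cells) == 1 for k in range(n))
    non_overlap = all(sum(x[i, j, k] for k in range(n)) <= 1 for i, j in cells)
    # If amino acid k is at (i, j), amino acid k + 1 must be on a neighbor.
    connectivity = all(
        sum(get(i + di, j + dj, k + 1) for di, dj in DIRS) >= x[i, j, k]
        for i, j in cells
        for k in range(n - 1)
    )
    return assignment and non_overlap and connectivity


fold = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(satisfies_constraints(x_from_fold(fold, 2), 2, len(fold)))  # True
```

The connectivity check is applied only for k up to the second-to-last amino acid, since the last one has no successor.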

Page 16: INFORMS 2004

Computational results

Instance: 1PSV

• 28 amino acids: one of the smallest human proteins.

• Obtained data from PDB.

• Truncate to different sizes: 12, 18, 23, 28.

• Optimal solution:

Page 17: INFORMS 2004

Computational results (cont)

• CPLEX running times (seconds)

- IP does not work well.

- It takes a long time to solve the 23- and 28-amino-acid instances.

        N = 12   N = 18   N = 23   N = 28
IP-1    9.32     72.61    30000+   30000+
IP-2    13.85    30000+   30000+   30000+

Page 18: INFORMS 2004

IP did not work well

• Why?

- High degeneracy: many structures have the same minimum energy.

- Symmetry: the IP formulation contains much symmetry.

• CP is known to perform better than IP when the formulation contains much symmetry.

• So we move on to CP.

Page 19: INFORMS 2004

Table of contents

Introduction to Protein Folding

Integer Programming (IP) Approach

Introduction to Constraint Programming (CP)

CP Approach

Discussion

Page 20: INFORMS 2004

Concepts of CP

Constraint programming (CP)

• Study of modeling and solving a system of logical constraints using search techniques.

• Began in the 1980s as part of artificial intelligence research.

• Two main procedures: domain reduction and constraint propagation
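A toy illustration of the two procedures, assuming only "not equal" constraints between pairs of variables: whenever a variable's domain shrinks to a single value, that value is removed from the domains of its partners (domain reduction), and the removal may in turn trigger further removals (propagation). This is a generic sketch, not the behavior of any particular CP solver; `propagate_not_equal` is a hypothetical name.

```python
def propagate_not_equal(domains, constraints):
    """Reduce domains under pairwise 'a != b' constraints.

    domains: dict mapping variable -> set of candidate values.
    constraints: list of (a, b) pairs meaning a != b.
    Returns the reduced domains, or None if some domain becomes empty.
    """
    domains = {var: set(values) for var, values in domains.items()}
    changed = True
    while changed:                      # repeat until a fixed point is reached
        changed = False
        for a, b in constraints:
            for src, dst in ((a, b), (b, a)):
                if len(domains[src]) != 1:
                    continue
                (value,) = domains[src]
                if value in domains[dst]:
                    domains[dst].discard(value)   # domain reduction
                    changed = True                # may enable more reductions
                    if not domains[dst]:
                        return None               # infeasible
    return domains


result = propagate_not_equal(
    {"A": {1}, "B": {1, 2}, "C": {1, 2, 3}},
    [("A", "B"), ("B", "C"), ("A", "C")],
)
print(result)  # {'A': {1}, 'B': {2}, 'C': {3}}
```

In a full CP solver, propagation of this kind is interleaved with search: when no more values can be removed, the solver branches on a variable and propagates again.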

Page 21: INFORMS 2004

CP vs IP

• Advantages and disadvantages

      Advantages                        Disadvantages
CP    More expressive;                  Less predictable;
      more effective in some cases      a lower bound may not exist
IP    A lower bound always exists       Less expressive

• Unified methodologies combining CP and IP have been designed in recent years.

Page 22: INFORMS 2004

CP previous research

• Smith (1996) shows environments where CP may work better than IP.

• Barták (1999), Smith (1995), and the ILOG Solver 5.0 manual (2000) show CP’s successes in many applications.

• Easton (2003) and Milano (2004) deal with combining CP and IP.

Page 23: INFORMS 2004

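Three CP models

• Models CP-1, CP-2: Use the direction (Up, Down, Left, Right).

• Model CP-3: Uses a single combined value for the coordinates.

[Figure: a three-amino-acid example with moves Up, Right. In CP-3, each point (x,y) is encoded as 3x + y: amino acid 1 at (0,0) → 0×3+0 = 0, amino acid 2 at (0,1) → 0×3+1 = 1, amino acid 3 at (1,1) → 1×3+1 = 4.]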
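The combined-coordinate encoding in the example can be sketched as follows, assuming a grid of width 3 as in the figure and 0 ≤ y < width. With one integer per amino acid, the non-overlapping requirement reduces to a single "all values differ" check. The helper names are hypothetical.

```python
def encode(x, y, width):
    """Map a lattice point to one integer; requires 0 <= y < width."""
    return x * width + y


def decode(code, width):
    """Inverse of encode: recover (x, y) from the combined value."""
    return divmod(code, width)


def all_different(values):
    """True if no value repeats (the non-overlapping requirement)."""
    return len(set(values)) == len(values)


coords = [(0, 0), (0, 1), (1, 1)]           # amino acids 1, 2, 3 from the slide
codes = [encode(x, y, 3) for x, y in coords]
print(codes)                  # [0, 1, 4]
print(all_different(codes))   # True
```

Collapsing two coordinates into one variable is what allows a single alldifferent constraint to replace the pairwise non-overlapping constraints.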

Page 24: INFORMS 2004

Models        Description
Model CP-1    Similar to the IP models, but uses the max function and the if-then function.
Model CP-2    Similar to CP-1, but makes the formulation simpler using a Boolean function and absolute value.
Model CP-3    Uses the alldifferent function.

Page 25: INFORMS 2004

How to solve the problem faster

CP strategies to solve the problem faster

• Use a known solution.

• Fix the direction from the first amino acid to the next.

• Any two amino acids that are an even distance apart in the sequence cannot be adjacent on the lattice.

• The lattice distance between two amino acids has an upper bound (at most their separation along the sequence).

• Variable ordering: choose first the variables with the smallest domain.
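The parity and distance rules can be turned into a simple pre-filter on which H-H pairs are even worth considering. The sketch below assumes a square or cubic lattice and 0-based sequence positions; `candidate_contacts` and the other names are hypothetical.

```python
def may_be_adjacent(k, m):
    """Parity rule: lattice adjacency requires an odd separation in the sequence."""
    return abs(k - m) % 2 == 1


def max_lattice_distance(k, m):
    """Each bond moves one lattice step, so the distance is at most |k - m|."""
    return abs(k - m)


def candidate_contacts(sequence):
    """H-H pairs that pass the parity rule and are not peptide-bonded."""
    hs = [k for k, residue in enumerate(sequence) if residue == "H"]
    return [
        (k, m)
        for a, k in enumerate(hs)
        for m in hs[a + 1:]
        if may_be_adjacent(k, m) and m - k >= 3
    ]


print(candidate_contacts("HPHPPH"))  # [(0, 5), (2, 5)]
```

The length of this list is also a crude upper bound on the number of H-H contacts, since every contact must be one of the listed pairs.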

Page 26: INFORMS 2004

Computational results

• Same instance as IP (1PSV): 12, 18, 23, 28 amino acids.

• Use ILOG Solver to run CP.

[Figure panels: N = 23, N = 28]

Page 27: INFORMS 2004

Computational result - IP vs CP

• IP vs CP best running times (seconds)

- Models used: IP-1 for IP, CP-1 (with strategies) for CP.

- CP is faster than IP with our models.

          IP (CPLEX)   CP (Solver)
N = 12    9.32         0.18
N = 18    72.61        18.83
N = 23    30,000+      7,347.74 (≈ 2 hrs)
N = 28    30,000+      209,127.89 (≈ 58 hrs)
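For the two sizes where both solvers finished, the ratio of running times and the conversion of the CP times to hours can be checked with a few lines of arithmetic. The numbers are taken from the table above; the helper names are hypothetical.

```python
def speedup(ip_seconds, cp_seconds):
    """How many times faster CP was than IP on the same instance."""
    return ip_seconds / cp_seconds


def hours(seconds):
    """Convert a running time in seconds to hours."""
    return seconds / 3600


# (IP-1 time, CP-1 time) in seconds, for the instances both solvers completed.
finished = {12: (9.32, 0.18), 18: (72.61, 18.83)}
for n, (ip, cp) in finished.items():
    print(f"N = {n}: CP is {speedup(ip, cp):.1f}x faster")

# CP running times for the two larger instances.
for n, seconds in ((23, 7347.74), (28, 209127.89)):
    print(f"N = {n}: {hours(seconds):.1f} hours")
```

No ratio is computed for N = 23 and N = 28 because the IP runs only report a lower limit of 30,000 seconds.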

Page 28: INFORMS 2004

Proposed research

1. Try other CP approaches such as dual modeling and dynamic variable ordering.

2. Consider a unified methodology of IP and CP

- Decompose the problem, and apply IP to one part and CP to the other part.

3. Attempt other approaches such as heuristic algorithms to find better bounds.

Page 29: INFORMS 2004

Contribution

1. Optimization field

• Help to show how CP can be an alternative to or a complement of IP.

2. Biological field

• Success of our research can help in the prediction of 3-D protein structures, which may assist in medical development.

Page 30: INFORMS 2004

Any questions?

Hyun-suk Yoon

Industrial and Systems Engineering, Georgia Tech
