Shape Modeling and Matching in Protein Structure Identification Sasakthi Abeysinghe, Tao Ju Washington University, St. Louis, USA Matthew Baker, Wah Chiu Baylor College of Medicine, Houston, USA

Shape Modeling and Matching in Protein Structure Identification

Sasakthi Abeysinghe, Tao Ju

Washington University, St. Louis, USA

Matthew Baker, Wah Chiu

Baylor College of Medicine, Houston, USA

Shape Matching

• Shape comparison

– How similar are shape A and shape B?

– Application: 3D model retrieval

• Shape alignment

– What is the best alignment of A onto B?

– Application: object recognition and registration

[Figure labels: 3D Protein Image; 1D Protein Sequence]

Structural Biology

• Protein: a sequence of amino acids

– Folds into a 3D structure in order to interact with other molecules

– Protein function derived from its 3D structure

• Identifying protein structure

– Imaging methods: X-ray, NMR

– Drawback: cannot resolve large assemblies such as viruses

Domain Problem

• Cryo-electron microscopy (Cryo-EM)

– Produces 3D density volumes

– Drawback: insufficient resolution to resolve atom locations

• How to determine protein structure in a cryo-EM volume?

Shape Matching Formulation

• Matching 1D protein sequence with 3D density volume

• Intermediate goal: Matching alpha-helices

– One of the basic building blocks in a protein

– Identified as cylindrical densities in the volume [Baker 07]

• How to align the protein sequence with the cryo-EM volume to match the two sets of helices?

Method Overview

• Compatible shape representation

– 1D sequence and 3D volume as attributed relational graphs

• Graph-based shape matching

– A new constrained graph matching problem and an optimal solution

– Error-tolerant (inexact) matching

Shape Representation

• Protein sequence as an attributed relational graph

– An edge: a helix segment or a non-helix segment

• Attribute: number of amino acids in the segment

– A node: end of a helix or end of the sequence

– Add additional edges that skip at most m helix segments

• To allow matching with a cryo-EM volume that has missing helices
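The sequence graph described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the sequence has already been split into alternating loop and helix segments, and it assumes a skip edge is a non-helix edge that bypasses between 1 and m helices and carries the total residue count of everything it spans. The names `Edge` and `build_sequence_graph` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    u: int        # start node index
    v: int        # end node index
    kind: str     # "helix" or "loop" (loop = non-helix segment)
    length: int   # attribute: number of amino acids spanned


def build_sequence_graph(segments, m):
    """Build the edge list of a sequence graph.

    segments: list of (kind, residue_count) in sequence order, assumed to
              alternate between "loop" and "helix".
    m:        maximum number of helices a skip edge may bypass.
    Node i is the boundary before segment i; node len(segments) is the
    end of the sequence.
    """
    # One edge per segment.
    edges = [Edge(i, i + 1, kind, count)
             for i, (kind, count) in enumerate(segments)]

    # Skip edges (assumed form): start at a loop, end after a later loop,
    # bypassing 1..m helices in between.
    for i, (kind, count) in enumerate(segments):
        if kind != "loop":
            continue
        skipped, total = 0, count
        for j in range(i + 1, len(segments)):
            next_kind, next_count = segments[j]
            total += next_count
            if next_kind == "helix":
                skipped += 1
                if skipped > m:
                    break
            elif skipped >= 1:
                edges.append(Edge(i, j + 1, "loop", total))
    return edges


# Example: loop(5) helix(10) loop(4) helix(8) loop(3), allowing 1 skipped helix.
example = [("loop", 5), ("helix", 10), ("loop", 4), ("helix", 8), ("loop", 3)]
graph = build_sequence_graph(example, m=1)
```

With m = 0 the graph is just the chain of segments; larger m adds more skip edges and therefore tolerates more helices missing from the volume, at the price of a larger search space.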

Shape Representation

• Graph representation of Cryo-EM volume via skeletons

– 3D Skeleton [Ju 06] builds connectivity among detected helices

– An edge: a detected helix or a skeleton path between two helices

• Attribute: length of the helix or skeleton path

– A node: end of a helix or end of the protein

– Add additional edges between helix-ends less than d apart

• To account for missing helix connectivity in the skeleton
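The extra-edge step for the volume graph can be sketched as below. This is an illustration under stated assumptions: helix ends are given as 3D points, distance is Euclidean, the attribute of an added edge is that distance, and ends of the same helix are never joined by an extra edge. The slides do not specify these details, and the name `add_proximity_edges` is hypothetical.

```python
import math
from itertools import combinations


def add_proximity_edges(ends, skeleton_edges, d):
    """Add edges between helix ends that lie less than d apart.

    ends:           dict node_id -> (helix_id, (x, y, z))
    skeleton_edges: dict frozenset({u, v}) -> skeleton path length
    d:              distance threshold
    Returns a new dict containing the skeleton edges plus the added edges.
    """
    edges = dict(skeleton_edges)
    for (u, (helix_u, pos_u)), (v, (helix_v, pos_v)) in combinations(ends.items(), 2):
        if helix_u == helix_v:
            continue  # the two ends of one helix are already joined by the helix edge
        key = frozenset((u, v))
        dist = math.dist(pos_u, pos_v)
        if key not in edges and dist < d:
            edges[key] = dist  # assumed attribute: straight-line distance
    return edges


# Two helices whose nearest ends (nodes 1 and 2) are 5 units apart.
ends = {
    0: (0, (0.0, 0.0, 0.0)),
    1: (0, (0.0, 0.0, 10.0)),
    2: (1, (3.0, 4.0, 10.0)),
    3: (1, (3.0, 4.0, 20.0)),
}
volume_edges = add_proximity_edges(ends, {}, d=6.0)
```

Existing skeleton paths are left untouched, so the added edges only fill in connectivity that the skeleton failed to capture.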

Shape Matching - Problem

• Finding two matching chains of helices

– Same number of edges

– Alternating types between non-helix and helix

– Minimal attribute matching error

• Uniqueness of this problem:

– Inexact: not all edges/nodes in the two graphs are used in the matched sequence

– Constrained: the match must have a linear topology
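The three conditions on a pair of matching chains can be written as a small check. This sketch assumes that edge attributes in both graphs have already been converted to a common unit, and it uses the sum of absolute attribute differences as the matching error. The slides do not give the actual cost function, so both the cost and the name `chain_error` are assumptions.

```python
def chain_error(seq_chain, vol_chain):
    """Return the matching error of two chains, or None if they are incompatible.

    Each chain is a list of (kind, attribute) edges, kind in {"helix", "loop"}.
    Attributes are assumed to be in the same unit in both chains.
    """
    if len(seq_chain) != len(vol_chain):
        return None  # condition 1: same number of edges
    total = 0.0
    for i, ((seq_kind, seq_attr), (vol_kind, vol_attr)) in enumerate(zip(seq_chain, vol_chain)):
        if seq_kind != vol_kind:
            return None  # helix must pair with helix, loop with loop
        if i > 0 and seq_kind == seq_chain[i - 1][0]:
            return None  # condition 2: types must alternate along the chain
        total += abs(seq_attr - vol_attr)  # condition 3: accumulate attribute error
    return total


error = chain_error(
    [("helix", 15.0), ("loop", 8.0), ("helix", 12.0)],
    [("helix", 14.0), ("loop", 10.0), ("helix", 12.5)],
)
```

The matching problem is then to find, among all chains in the two graphs that pass this check, the pair with the smallest error.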

Shape Matching - Review

• Previous work on graph matching

– Exact matching

• Graph mono-morphism [Wong 90]

• Sub-graph isomorphism [Ullmann 76, Cordella 99]

– Inexact matching

• A* search [Nilsson 80], simulated annealing [Herault 90], neural networks [Feng 94], probabilistic relaxation [Christmas 95], genetic algorithms [Wang 97], graph decomposition [Messmer 98]

• All designed for un-constrained problems where there is no restriction on the topology of the matched sub-graphs.

Shape Matching - Method

• Key idea: utilize the linearity of chains.

• Performing depth-first tree-search

– Append matching nodes to the incomplete chain with minimal matching error

• A*-search

– Reduce node expansion by estimating future matching error

– Optimal if the estimated future error never exceeds the actual future error (i.e., the estimate is admissible)

– 3 future error functions are designed
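The A* idea above can be illustrated on a deliberately simplified version of the problem. The sketch below ignores loops and graph connectivity: it only assigns each sequence helix, in order, to a distinct volume helix so that the sum of length differences is minimal. The future-error estimate shown here (each remaining helix pays at least the cost of its cheapest unused partner) is one admissible choice made up for this example; it is not one of the three functions mentioned on the slide. The name `astar_match` is hypothetical.

```python
import heapq


def astar_match(seq_lengths, vol_lengths):
    """Assign sequence helices (in order) to distinct volume helices with A*.

    Returns (total_error, assignment), where assignment[k] is the index of the
    volume helix matched to sequence helix k.
    """
    n = len(seq_lengths)
    if len(vol_lengths) < n:
        raise ValueError("need at least as many volume helices as sequence helices")

    def future_error(k, used):
        # Lower bound on the cost of matching helices k..n-1: every remaining
        # helix costs at least as much as its cheapest still-unused partner.
        free = [v for j, v in enumerate(vol_lengths) if j not in used]
        return sum(min(abs(s - v) for v in free) for s in seq_lengths[k:])

    # Heap entries: (f = g + h, g = error so far, partial assignment).
    heap = [(future_error(0, ()), 0.0, ())]
    while heap:
        _, g, assignment = heapq.heappop(heap)
        k = len(assignment)
        if k == n:
            return g, assignment  # first complete chain popped is optimal
        for j, v in enumerate(vol_lengths):
            if j in assignment:
                continue
            g_next = g + abs(seq_lengths[k] - v)
            a_next = assignment + (j,)
            heapq.heappush(heap, (g_next + future_error(k + 1, a_next), g_next, a_next))
    return None


best = astar_match([10, 20, 30], [29, 11, 21, 50])
```

Because the estimate never overestimates the remaining error, the first complete assignment taken off the priority queue has minimal total error, which is the optimality condition stated in the bullet above.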

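[Figure: search tree for matching the Sequence Graph to the Volume Graph. Tree nodes are labeled with an index pair and a number, listed here in extraction order: {3,3} 63, {2,2} 42, {2,3} 85, {2,4} 92, {2,5} 40, {3,2} 61, {3,4} 72, {3,4} 48, {3,5} 91, {4,3} 99, {4,5} 51, {6,6} 58, and {1,1} with no number.]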

Experimental Setup

• Test data

– Simulated data: 8 proteins (taken from Protein Data Bank)

– Authentic data: 3 proteins (produced at Baylor)

• Test modes

– Automatic

– With a few user-specified helix correspondences

• Validation with the actual helix correspondence

– Produce a list of candidates sorted by their matching errors

– Find out where the actual correspondence ranks in the list
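The validation step amounts to sorting the candidates by matching error and looking up the position of the known correspondence. A minimal sketch, assuming each candidate is an (error, correspondence) pair and a correspondence is a tuple of volume-helix indices; the name `rank_of_actual` is hypothetical.

```python
def rank_of_actual(candidates, actual):
    """Return the 1-based rank of the actual correspondence, or None if absent.

    candidates: iterable of (matching_error, correspondence) pairs
    actual:     the known correct correspondence
    """
    ordered = sorted(candidates, key=lambda item: item[0])
    for rank, (_, correspondence) in enumerate(ordered, start=1):
        if correspondence == actual:
            return rank
    return None


candidates = [(7.5, (2, 1, 0)), (3.0, (1, 2, 0)), (4.2, (0, 2, 1))]
rank = rank_of_actual(candidates, actual=(0, 2, 1))
```

A result of None corresponds to the case reported later in the deck where the actual correspondence does not appear in the candidate list at all.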

Results - 1

[Figure panels: Sequence; Cryo-EM volume and its skeleton; Top Matching]

• Bluetongue Virus (simulated, 10 helices, 0 missing)

– Actual correspondence ranks #1

Results - 2

[Figure panels: Sequence; Cryo-EM volume and its skeleton; Top Matching]

• Human Insulin Receptor (simulated, 9 helices, 1 missing)

– Actual correspondence ranks #1

Results - 3

[Figure panels: Sequence; Cryo-EM skeleton; Top Matching; Actual Correspondence]

• Bacteriophage P22 (authentic, 11 helices, 6 missing)

– Actual correspondence ranks #4

Results - 4

[Figure panels: Sequence; Cryo-EM skeleton with 2 user-specified helix pairs; Top Matching without user specification; Actual Correspondence]

• Triose Phosphate Isomerase (simulated, 12 helices, 3 missing)

– Before user specification: actual correspondence not in the candidate list

– Given 2 specified helix pairs: actual correspondence ranks #9

Results - Summary

• Across the 11 proteins, the correct correspondence ranks as follows in the candidate list computed by our method:

– Top 1: 4 proteins

– Within top 10: 2 proteins (1 simulated)

– Top 1 after user-interaction: 2 proteins (both simulated)

• 4 specified helix pairs in a 14/20-helix protein.

– Within top 10 after user-interaction: 3 proteins

• 2 specified helix pairs in a 6/9/12-helix protein

• Performance

– Under 4 seconds for proteins with 20 helices

– Compare: [Wu 05] uses exhaustive search and takes 16 hours for finding correspondences in proteins with 8 helices

Conclusion

• Formulate protein structure identification as shape matching

– 1D protein sequence vs. 3D cryo-EM density volume

– Compatible representation of disparate biological data as graphs

• Formulate a constrained inexact matching problem and propose an optimal solution

– Based on A*-search

• Validation on simulated and authentic data

Future Work (Bio)

• Incorporating beta-sheets for improved accuracy

– Challenge: the match is no longer a linear chain

• Integrating homology and ab initio modeling

– Utilizing known 3D structure of segments

– Refining the alignment by molecular energy minimization

Future Work (CS)

• Faster graph matching algorithm

– Explore variants of A*-search to reduce running time for larger proteins (>20 helices)

• Better skeleton generation

– Generate skeletons directly from gray-scale density volume for iso-value-independent representation

– Utilize cell-complex-based skeleton for better skeleton geometry

• Currently used for topology editing, see [Ju, Zhou and Hu. Siggraph 2007]

Pacific Graphics • Hawaii • 2007

• Oct 29 – Nov 2, in Maui, Hawaii

Conference Chair: Ron Goldman

Program co-chairs: Marc Alexa, Steven Gortler, Tao Ju
