motif space database design

Motif Space Database Design Kiranjit Sidhu


Post on 13-Jan-2016



TRANSCRIPT

Page 1: Motif Space Database Design

Motif Space Database Design

Kiranjit Sidhu

Page 2: Motif Space Database Design

Outline

Schema Design
Content of Database
Functionality
Future Plans

Page 3: Motif Space Database Design

Sample PDB File

Each PDB file is represented as a text file (~60K lines)
Inefficient for pattern matching
A relational database is required for the most efficient solution
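To illustrate why moving from flat text to relational rows helps, here is a minimal sketch that parses the fixed-width ATOM/HETATM records of a PDB file and loads them into a table. It uses Python's built-in sqlite3 as a stand-in for SQL Server; the table name `atom`, its columns, and both function names are assumptions made for illustration, not the schema from the slides.

```python
import sqlite3

def parse_atom_line(line):
    """Parse one fixed-width PDB ATOM/HETATM record; return None for other records."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),         # columns 7-11: atom serial number
        "atom_name": line[12:16].strip(),  # columns 13-16: atom name
        "res_name": line[17:20].strip(),   # columns 18-20: residue name
        "chain_id": line[21],              # column 22: chain identifier
        "res_seq": int(line[22:26]),       # columns 23-26: residue sequence number
        "x": float(line[30:38]),           # columns 31-38: x coordinate
        "y": float(line[38:46]),           # columns 39-46: y coordinate
        "z": float(line[46:54]),           # columns 47-54: z coordinate
    }

def load_atoms(conn, pdb_id, lines):
    """Insert every atom record found in `lines` into a hypothetical `atom` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS atom ("
        "pdb_id TEXT, chain_id TEXT, res_seq INTEGER, res_name TEXT, "
        "atom_name TEXT, x REAL, y REAL, z REAL)"
    )
    rows = []
    for line in lines:
        rec = parse_atom_line(line)
        if rec is not None:
            rows.append((pdb_id, rec["chain_id"], rec["res_seq"], rec["res_name"],
                         rec["atom_name"], rec["x"], rec["y"], rec["z"]))
    conn.executemany("INSERT INTO atom VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)
```

Once the atoms are rows, a pattern-matching question becomes an indexed SQL query rather than a scan over tens of thousands of text lines per file.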

Page 4: Motif Space Database Design

Structure of Database

DB divided into two major components:
  Protein Data
  Motif (Occurrence) Data

Protein Data
  Obtained from PDB files (Protein Data Bank)
  Derived data

Motif Data
  Obtained from Luke's FFSM technique
  Derived data

Page 5: Motif Space Database Design


Schema Design

Page 6: Motif Space Database Design


Schema Design - Protein

Page 7: Motif Space Database Design


Schema Design - Motif

Page 8: Motif Space Database Design

Tools Used

Obtaining data:
  Perl scripts

Database:
  SQL Server 2000 and SQL Server 2005
  T-SQL (bulk import of data)

Page 9: Motif Space Database Design

Obtaining Data

PDB File -> (Extract) -> CSV File
CSV File -> (Import) -> Temp Tables (T-SQL)
Temp Tables -> (Convert and Derive, T-SQL Procedures) -> Final DB
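The extract, import, and convert-and-derive stages can be sketched end to end as follows. This is a minimal illustration under stated assumptions: Python and its built-in sqlite3 module stand in for the Perl scripts and SQL Server, and the table names (`staging_atom`, `atom_final`, `chain_summary`), their columns, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

def extract_to_csv(records):
    """Extract: write (pdb_id, chain_id, res_seq, x, y, z) tuples as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(records)
    return buf.getvalue()

def import_csv(conn, csv_text):
    """Import: bulk-load the raw CSV rows into an untyped temporary staging table."""
    conn.execute(
        "CREATE TEMP TABLE staging_atom "
        "(pdb_id TEXT, chain_id TEXT, res_seq TEXT, x TEXT, y TEXT, z TEXT)"
    )
    rows = list(csv.reader(io.StringIO(csv_text)))
    conn.executemany("INSERT INTO staging_atom VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)

def convert_and_derive(conn):
    """Convert staged text into typed columns, then derive a per-chain summary."""
    conn.execute(
        "CREATE TABLE atom_final "
        "(pdb_id TEXT, chain_id TEXT, res_seq INTEGER, x REAL, y REAL, z REAL)"
    )
    conn.execute(
        "INSERT INTO atom_final "
        "SELECT pdb_id, chain_id, CAST(res_seq AS INTEGER), "
        "CAST(x AS REAL), CAST(y AS REAL), CAST(z AS REAL) FROM staging_atom"
    )
    # Derived data: one row per PDB/chain combination with its row count.
    conn.execute(
        "CREATE TABLE chain_summary AS "
        "SELECT pdb_id, chain_id, COUNT(*) AS n_rows "
        "FROM atom_final GROUP BY pdb_id, chain_id"
    )
    conn.execute("DROP TABLE staging_atom")
    return conn.execute("SELECT COUNT(*) FROM atom_final").fetchone()[0]
```

Staging the data as untyped text first keeps the bulk load simple and fast; type conversion and derivation then run inside the database as set-based statements.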

Page 10: Motif Space Database Design

Uploading Protein Data

Input dataset: ~70,000 PDB/chain combinations
Entries in tables: e.g. approx. 800 million rows in the proteinchaindistance table
Initial version imported 10 PDB files in 1 day
Current version: under 3 minutes
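The slides do not say what the proteinchaindistance table stores. Assuming, purely for illustration, that it holds pairwise distances between residues within one chain, the sketch below shows how such rows could be derived and why the table grows so quickly. The function name, the input layout, and the one-coordinate-per-residue simplification are all assumptions.

```python
import math
from itertools import combinations

def pairwise_distances(coords):
    """Return (res_a, res_b, distance) for every unordered residue pair.

    `coords` maps a residue number to one (x, y, z) point for that residue.
    """
    rows = []
    for (a, pa), (b, pb) in combinations(sorted(coords.items()), 2):
        rows.append((a, b, math.dist(pa, pb)))
    return rows

# Average rows per PDB/chain combination implied by the figures quoted above.
AVG_ROWS_PER_CHAIN = 800_000_000 / 70_000  # roughly 11,400
```

A chain with n residues yields n(n-1)/2 pairs, so the row count grows quadratically with chain length, which is consistent with a table of this size.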

Page 11: Motif Space Database Design

Current Functionality

Protein (PDB) data has been completely uploaded into both:
  Production database (MotifSpace)
  Development database (MotifSpaceDev)
Visualize protein structure using data from the database (data available)
Data can be obtained from the server using SOAP or web services
Basic queries, such as: which different PDBs does a specific motif occur in?
Histograms to compute statistics
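The basic query and the histogram can be sketched as below, again with sqlite3 standing in for SQL Server. The `motif_occurrence` table, its columns, the function names, and the PDB identifiers in the demo rows are placeholders invented for this example, not contents of the MotifSpace database.

```python
import sqlite3
from collections import Counter

def make_demo_db():
    """Build a tiny in-memory database with a hypothetical occurrence table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE motif_occurrence (motif_id INTEGER, pdb_id TEXT, chain_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO motif_occurrence VALUES (?, ?, ?)",
        [(1, "1ABC", "A"), (1, "1ABC", "B"), (1, "2XYZ", "A"), (2, "2XYZ", "A")],
    )
    return conn

def pdbs_containing_motif(conn, motif_id):
    """List the distinct PDB entries in which the given motif occurs."""
    rows = conn.execute(
        "SELECT DISTINCT pdb_id FROM motif_occurrence "
        "WHERE motif_id = ? ORDER BY pdb_id",
        (motif_id,),
    )
    return [r[0] for r in rows]

def pdb_count_histogram(conn):
    """Histogram: number of distinct PDBs per motif -> how many motifs have that count."""
    rows = conn.execute(
        "SELECT motif_id, COUNT(DISTINCT pdb_id) "
        "FROM motif_occurrence GROUP BY motif_id"
    )
    return Counter(n for _, n in rows)
```

With the demo rows, motif 1 occurs in two distinct PDB entries even though it has three occurrences, which is why the query uses DISTINCT.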

Page 12: Motif Space Database Design


Demo