
High-Throughput Crystallography at Monash

Noel Faux

Dept of Biochemistry and Molecular Biology

Monash University

Structural Biology Pipeline

[Pipeline figure] Cloning → Expression → Purification → Crystallisation → X-ray diffraction → Determine the structure

High-throughput robots and technologies:
• Tecan Freedom Evolution
• ÄKTAxpress™
• Trialling crystal storage and imaging facilities

Australian Synchrotron: online in 2007

Data processing and structure determination: the major bottleneck

Target tracking / LIMS
Data management
Phasing (CCP4/CNS, GRID computing)

The problems
• Target tracking / data management
• The process of protein structure determination creates a large volume of data.
• Storage, security, traceability, management and backup of files are ad hoc.
• Remote access to the files is limited and requires different media formats.

• Structure determination: CPU intensive

• Part of a National Project for the development of eResearch platforms for the management and analysis of data for research groups in Australia.

• Aim: establish common standardised software / middleware applications that are adaptable to many research capabilities

Solution

• Central repository of files
• Attach metadata to the files
• Worldwide secure access to the files
• Automated collection and annotation of the files from in-house and synchrotron detectors (a sketch of this collection step follows below)
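The collection-and-annotation step can be pictured as a small watcher process. This is a minimal sketch only: the SRB client command Sput is real, but the directory layout, the metadata fields, the sidecar-JSON approach and the polling loop are illustrative assumptions, not the DART implementation.

```python
# Watch a detector spool directory, write a metadata sidecar for each new
# image, and deposit both into the Storage Resource Broker (SRB).
import json
import subprocess
import time
from pathlib import Path

WATCH_DIR = Path("/detector/frames")      # hypothetical detector spool
SRB_COLLECTION = "/home/xtal.lab/frames"  # hypothetical SRB collection

def deposit(image: Path) -> None:
    # Attach minimal annotation as a JSON sidecar next to the image.
    meta = {
        "filename": image.name,
        "collected": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "instrument": "in-house",         # or "synchrotron"
        "size_bytes": image.stat().st_size,
    }
    sidecar = image.with_suffix(image.suffix + ".json")
    sidecar.write_text(json.dumps(meta, indent=2))
    # Push the image and its sidecar into the central SRB repository.
    for f in (image, sidecar):
        subprocess.run(["Sput", str(f), SRB_COLLECTION], check=True)

seen = set()
while True:
    for image in WATCH_DIR.glob("*.img"):
        if image.name not in seen:
            deposit(image)
            seen.add(image.name)
    time.sleep(10)                        # simple polling loop
```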

The infrastructure

[Architecture diagram] Instrument data (X-ray images, mounted-crystal streaming video, lab streaming video, lab still pictures, and sensor data such as crystal and lab temperature) flows from the Lab PC and Collection PC via an Instrument Rep and Kepler workflows into the Storage Resource Broker, and is exposed through a central web portal.

Monash University ITS Sun Grid: 54 dual 2.3 GHz CPUs, 208.7 GB total memory (3.8 GB per node), >10 TB storage capacity, running GridSphere.

Automated X-ray data reduction

• Automated processing of the diffraction data
• Investigating the incorporation of xia2 (Graeme Winter): a new automated data reduction system designed to work from raw diffraction data and a little metadata, and to produce usefully reduced data in a form suitable for immediately starting phasing and structure determination (CCP4) [1]; a small batch-driver sketch follows below

1. Collaborative Computational Project, Number 4 (1994). The CCP4 suite: programs for protein crystallography. Acta Cryst. D50, 760-763.
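Since xia2 is driven from the command line, the batch step can be a trivial wrapper. A sketch under stated assumptions: the "xia2 pipeline=... /data/dir" form matches current xia2 usage (the 2007-era command line differed), and all paths here are hypothetical.

```python
# Drive xia2 over several datasets, one working directory per dataset.
import subprocess
from pathlib import Path

DATASETS = Path("/data/collected")        # one subdirectory per dataset

for dataset in sorted(DATASETS.iterdir()):
    if not dataset.is_dir():
        continue
    workdir = Path("/data/reduced") / dataset.name
    workdir.mkdir(parents=True, exist_ok=True)
    # xia2 reads the raw images plus minimal metadata and writes merged,
    # scaled data ready for phasing with the CCP4 suite.
    # check=False so one failed dataset does not abort the whole batch.
    subprocess.run(["xia2", "pipeline=dials", str(dataset)],
                   cwd=workdir, check=False)
```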

Divide and Conquer

• A large number of CPUs is available across different computer clusters at different locations:
• Monash ITS Sun Grid
• VPAC: Brecca (97 dual Xeon 2.8 GHz CPUs, 160 GB total memory, 2 GB per node); Edda (185 Power5 CPUs, 552 GB total memory, 8-16 GB per node)
• APAC: 1680 processors, 3.56 TB of memory, 100 TB of disk
• Personal computers

DART and CCP4

• Aim: use the CCP4 interface locally but run the jobs remotely across a distributed system

• Nimrod to distribute the CCP4 jobs across the different Grid systems (a generic sketch of the remote-execution idea follows after this list)

• Investigating the possibility of incorporating the CCP4 interface into the DART web portal
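Nimrod's actual plan-file mechanism is not reproduced here; the following generic sketch only shows the shape of the idea: prepare keyword input for a CCP4 program (refmac5, whose HKLIN/XYZIN command line is standard CCP4 usage) locally, stage the inputs to a remote node, and run the job there. Host names and paths are hypothetical.

```python
# Prepare a CCP4 refinement job locally, run it on a remote grid node.
import subprocess

HOST = "sungrid.example.monash.edu"   # hypothetical grid head node
REMOTE_DIR = "/scratch/xtal/job001"   # hypothetical working directory

# Standard refmac5 command line: HKLIN/HKLOUT for reflection data,
# XYZIN/XYZOUT for coordinates; refinement keywords are passed on stdin.
refmac_cmd = (
    f"cd {REMOTE_DIR} && "
    "refmac5 HKLIN data.mtz HKLOUT refined.mtz "
    "XYZIN model.pdb XYZOUT refined.pdb"
)
keywords = "ncyc 10\nend\n"           # minimal keyword set

# Stage inputs on the remote node, run the job, fetch the results.
subprocess.run(["ssh", HOST, f"mkdir -p {REMOTE_DIR}"], check=True)
subprocess.run(["scp", "data.mtz", "model.pdb", f"{HOST}:{REMOTE_DIR}/"],
               check=True)
subprocess.run(["ssh", HOST, refmac_cmd], input=keywords, text=True,
               check=True)
subprocess.run(["scp", f"{HOST}:{REMOTE_DIR}/refined.mtz",
                f"{HOST}:{REMOTE_DIR}/refined.pdb", "."], check=True)
```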

Exhaustive Molecular Replacement

• No phasing data
• No sequence identity (<20%)
• No search model
• Is there a possible fold homologue?
• Exhaustive Phaser scan of the PDB [2]
• Exhaustive searches with different parameters and search models

2. McCoy, A. J., Grosse-Kunstleve, R. W., Storoni, L. C. & Read, R. J. (2005). Likelihood-enhanced fast translation functions. Acta Cryst. D61, 458-464.

Exhaustive Molecular Replacement

• The building blocks of proteins are domains
• Use a subset of SCOP as search models in a PHASER calculation
• The use of Grid computing will make this possible: ~1000 CPUs = days for a typical run
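As a rough feasibility check (the per-job cost is an assumed figure, not from the slides): at roughly 30 CPU-minutes per Phaser search, scanning all 75,930 SCOP domains costs 75,930 × 0.5 ≈ 38,000 CPU-hours, i.e. about 1.6 days on 1000 CPUs, consistent with the estimate above.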

SCOP hierarchy (number of entries at each level):

Class: 7
Fold: 971
Superfamily: 1589
Family: 3004
Domains: 75930

• Search at the family level
• Take the highest-resolution structure
• Mutate to poly-alanine, and delete loops and turns (a sketch of the poly-alanine step follows below)
• Phaser
• Families with Z-score ≥ 6: search again with each of their domain members
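The poly-alanine step can be done programmatically. A minimal sketch using Biopython (a tool choice of mine, not named in the slides): side-chain atoms beyond CB are deleted and residues are renamed to ALA. Deleting loops and turns would additionally need a secondary-structure assignment (e.g. DSSP) and is omitted; file names are hypothetical.

```python
# Trim a search model to poly-alanine for molecular replacement.
from Bio.PDB import PDBParser, PDBIO

BACKBONE = {"N", "CA", "C", "O", "CB"}   # atoms kept per residue

parser = PDBParser(QUIET=True)
structure = parser.get_structure("model", "search_model.pdb")

for residue in structure.get_residues():
    if residue.id[0] != " ":             # skip waters and heteroatoms
        continue
    # Delete every side-chain atom beyond CB (glycine is unaffected).
    for atom_id in [a.get_id() for a in residue
                    if a.get_id() not in BACKBONE]:
        residue.detach_child(atom_id)
    residue.resname = "ALA"              # rename residue to alanine

io = PDBIO()
io.set_structure(structure)
io.save("search_model_polyala.pdb")
```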

Exhaustive Molecular Replacement

Database containing:
• ToDo list
• Parameters
• Results

ITS Sun GRID: each node runs a Perl script that:
• Requests a job
• Launches Phaser
• Returns the results
• Repeats until the list is exhausted
(a sketch of this worker loop follows below)

56 dual AMD Opteron CPUs, 208.7 GB total memory (3.8 GB per node), >10 TB storage capacity
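A sketch of the per-node worker loop described above. The original was a Perl script; this Python version, the SQLite stand-in for the central job database, and the abbreviated Phaser keyword set are all illustrative, not the actual implementation.

```python
# Per-node worker: claim a job, run Phaser, report, repeat until done.
import sqlite3
import subprocess

DB_PATH = "mr_jobs.sqlite"   # stand-in for the central job database

def claim_job(conn):
    """Claim the next pending job; returns (id, model) or None when done.
    The claim is simplified; a production queue needs an atomic claim."""
    with conn:
        row = conn.execute(
            "SELECT id, model FROM jobs WHERE status = 'todo' LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?",
                     (row[0],))
    return row

def run_phaser(model_path):
    """Run one Phaser MR_AUTO job; the keyword set is abbreviated."""
    keywords = (
        "MODE MR_AUTO\n"
        "HKLIN data.mtz\n"
        "LABIN F=FP SIGF=SIGFP\n"
        f"ENSEMBLE probe PDB {model_path} IDENTITY 0.2\n"
        "SEARCH ENSEMBLE probe\n"
        "ROOT result\n"
    )
    proc = subprocess.run(["phaser"], input=keywords, text=True,
                          capture_output=True)
    return proc.returncode, proc.stdout

conn = sqlite3.connect(DB_PATH)
while True:                  # repeat until the ToDo list is exhausted
    job = claim_job(conn)
    if job is None:
        break
    rc, log = run_phaser(job[1])
    with conn:
        conn.execute("UPDATE jobs SET status = ?, log = ? WHERE id = ?",
                     ("done" if rc == 0 else "failed", log, job[0]))
conn.close()
```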

Will be extended to use Nimrod to gain access to APAC and the Pacific Rim Grid (PRAGMA)

Final Pipeline

[Pipeline figure] Cloning → Expression → Purification → Crystallisation → X-ray diffraction → Determine the structure

DART: data collection, management, storage, and remote access
xia2: data processing
Exhaustive experimental (e.g., SAD, SIRAS, MIRAS) and MR phasing for final refinement
Grid computing: Nimrod, PHASER, AutoSHARP, CCP4, DART

High-throughput robotics and technologies

Acknowledgments

Monash University: Anthony Beitz, Nicholas McPhee, James Whisstock, Ashley Buckle

James Cook University: Frank Eilert, Tristan King

DART Team
