knime as a platform for distributed molecular property predictions

15
Johannes Koppe, Andreas Teckentrup, Nils Weskamp KNIME as a platform for distributed molecular property predictions

Upload: others

Post on 12-Sep-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: KNIME as a platform for distributed molecular property predictions

Johannes Koppe, Andreas Teckentrup, Nils Weskamp

KNIME as a platform for distributed molecular property predictions

Page 2: KNIME as a platform for distributed molecular property predictions

Motivation

BICEPS

Synthesis ideas

Sample registration

Vendor catalogs

Virtual libraries

Screening hits

Predicted molecular properties

Compound filtering andprioritization

Most promising compounds

Page 3: KNIME as a platform for distributed molecular property predictions

BICEPS: assisting design of chemical structures

Page 4: KNIME as a platform for distributed molecular property predictions

BICEPS: technical implementation

PIDAMDB

CIDB

ISIS/Base Frontend

Oracle Oracle

SDF

(Conductor)

XML

Page 5: KNIME as a platform for distributed molecular property predictions

BICEPS: number of generated data points over 12 months

0

2.000.000

4.000.000

6.000.000

8.000.000

10.000.000

12.000.000

14.000.000

16.000.000

18.000.000

20.000.000

123456789101112

BICEPS

CIDB

01/1

1

12/1

0

11/1

0

10/1

0

09/1

0

08/1

0

07/1

0

06/1

0

05/1

0

04/1

0

03/1

0

02/1

0

Page 6: KNIME as a platform for distributed molecular property predictions

BICEPS: experiences under high load conditions

• Altogether, nearly 55 million results generated over the last 12 months

• High variability in monthly system load

– Peaks in system load often due to model updates and new compound collections

• Issues of scalability, load balancing and reliability under high load conditions

– Need for manual interventions and significant maintenance efforts

– Client-Server architecture of many workflow tools leads to bottlenecks and single points of failure

– Scale-up significantly increases license costs

• Decision to evaluate fully-distributed system architecture based on KNIME

Page 7: KNIME as a platform for distributed molecular property predictions

Distributed KNIME setup @ BI

Interactive use:

Headless use:

Conductor

scheduling and submitting

deployed KNIME jobs

Compute Node

Compute NodeCompute Node

Compute Node

Compute Node

Compute NodeCompute Node

User @ Workstation

Page 8: KNIME as a platform for distributed molecular property predictions

Collaboration with KNIME.com

• Existing workflows rely heavily on various external tools for structure manipulation and descriptor calculation

• Generic integration of external tools into KNIME necessary (incl. optional parallel cluster execution)

– Workflow migration to KNIME without switching the underlying chemistry tools

• Sponsoring of a generic “External Tool” node/framework

– Specifications by BI in collaboration with KNIME.com

– Implementation by KNIME.com

– Code part of the KNIME open source release

• Adaptation of the generic framework for specific tools and setup @ BI

Page 9: KNIME as a platform for distributed molecular property predictions

External tool integration incl. cluster execution

Generic ExtTool FrameworkDefault Executor

(local job execution)

BI SGE Executor(DRMAA-based

job submission toSGE)

SSH Remote Executor

Corina-specific Node Customizer

(Options, GUI, Filetypes)

Daylight-specific Node Customizer

(Options, GUI, Filetypes)

Moe-specific Node Customizer

(Options, GUI, Filetypes)

Page 10: KNIME as a platform for distributed molecular property predictions

External tool integration incl. cluster execution

Corina options

Batch creation

for parallel

execution

Page 11: KNIME as a platform for distributed molecular property predictions

External tool integration incl. cluster execution

Local execution

vs. grid engine

submission

Error handling

and automatic

re-submission of

failed jobs

Automatic queue

selection based

on external tool

characteristics

Page 12: KNIME as a platform for distributed molecular property predictions

External tool integration incl. cluster execution

Page 13: KNIME as a platform for distributed molecular property predictions

Example workflow – property calculation (PROP4)

Page 14: KNIME as a platform for distributed molecular property predictions

Current status

• Most of the necessary external chemistry tools available within KNIME

• Large number of workflows for property predictions ported to KNIME

– Testing under realistic conditions yielded promising results

– Identical results from old and new workflows

– Switch to productive use of KNIME expected soon

• Rollout of KNIME within the computational chemistry group for interactive usage

– First end-user training completed

– Currently extensive evaluation within the group

– Received a lot of feedback, mainly on usability issues

– Decision on productive use of KNIME in this application expected in 2011

Page 15: KNIME as a platform for distributed molecular property predictions

Acknowledgements

• Oliver Wissdorf

• Bernd Wiswedel (KNIME.com)