Transcript
Page 1: Molecular Shape Searching on GPUs: A Brave New World

FastROCS: What does it mean to be “fast”?

OpenEye Scientific Software Brian Cole

March 26, 2013 © 2013 OpenEye Scientific Software

Page 2: Molecular Shape Searching on GPUs: A Brave New World

FastROCS and the “Chasm”

Page 3: Molecular Shape Searching on GPUs: A Brave New World

ROCS: Rapid Overlay of Chemical Structures

Page 4: Molecular Shape Searching on GPUs: A Brave New World

LeadHopper

Page 5: Molecular Shape Searching on GPUs: A Brave New World

And then you wait…

Page 6: Molecular Shape Searching on GPUs: A Brave New World

What is FastROCS?

[Bar chart: shape overlays per second, CPU vs GPU; higher is better]

Page 7: Molecular Shape Searching on GPUs: A Brave New World

What is FastROCS?

[Bar chart: shape overlays per second, CPU vs GPU, axis ticks from 1 to 1,000,000 in powers of ten; higher is better]

Page 8: Molecular Shape Searching on GPUs: A Brave New World

What is FastROCS?

[Bar chart: shape overlays per second, CPU vs GPU, axis ticks from 0 to 600,000 in steps of 100,000; higher is better]

Page 9: Molecular Shape Searching on GPUs: A Brave New World

But I want it now!

[Line chart: log(elapsed time in seconds) vs log(cores/GPUs), series ROCS and FastROCS; lower is better]

Page 10: Molecular Shape Searching on GPUs: A Brave New World

Riding Moore’s Law

[Bar chart: shape overlays per second (0 to 2,000,000) by GPU model: C1060, C2050, C2075, C2090, K10, K20; higher is better]

Page 11: Molecular Shape Searching on GPUs: A Brave New World

ROCS user base

•  Every Pharma R&D
•  Many BioTechs
•  Many Universities
•  National Labs and Research Centers
•  Other software companies

Page 12: Molecular Shape Searching on GPUs: A Brave New World

Licenses by Year

[Chart: licenses by year, 2009 to 2012, series ROCS and FastROCS; higher is better]

Page 13: Molecular Shape Searching on GPUs: A Brave New World

Licenses by Year (Linear Scale)

[Chart: licenses by year on a linear scale, 2009 to 2012, series ROCS and FastROCS; annotations: "15%" and "Pharmageddon"]

Page 14: Molecular Shape Searching on GPUs: A Brave New World

All ROCS users (linear scale)

[Chart: all ROCS users on a linear scale, 2009 to 2012, series Academics, ROCS, and FastROCS; annotation: "3%"]

Page 15: Molecular Shape Searching on GPUs: A Brave New World

Technology Adoption Lifecycle

[Diagram: technology adoption lifecycle curve with segments of 2.5%, 13.5%, 34%, 34%, and 16%; FastROCS is marked on the curve]
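As a rough illustration of how the segment widths on this slide add up, the sketch below maps a cumulative adoption share onto the lifecycle curve. The widths (2.5%, 13.5%, 34%, 34%, 16%) come from the slide; the segment names are the standard ones for this curve and are an assumption here, since the transcript does not list them. Reading the "15%" and "3%" annotations from the preceding slides as FastROCS's share of ROCS licenses and users is also an assumption.

```python
# Segment widths as printed on the slide (percent of the eventual user base).
# The names are the conventional ones for this curve; they are assumed here.
SEGMENTS = [
    ("innovators", 2.5),
    ("early adopters", 13.5),
    ("early majority", 34.0),
    ("late majority", 34.0),
    ("laggards", 16.0),
]


def adoption_segment(share_percent):
    """Return the lifecycle segment that a cumulative adoption share falls in."""
    if not 0.0 <= share_percent <= 100.0:
        raise ValueError("share must be between 0 and 100")
    upper = 0.0
    for name, width in SEGMENTS:
        upper += width  # running boundaries: 2.5, 16, 50, 84, 100
        if share_percent <= upper:
            return name
    return SEGMENTS[-1][0]  # only reachable through floating-point round-off


# Shares taken from the "3%" and "15%" annotations (interpretation assumed).
for share in (3.0, 15.0):
    print(f"{share}% -> {adoption_segment(share)}")
```

Under that reading, both shares sit past the first 2.5% but below the 16% boundary, which is where the "chasm" of the talk's title is usually placed.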

Page 16: Molecular Shape Searching on GPUs: A Brave New World

What’s in the “chasm”?

•  “ROCS is already fast enough”

•  “The results aren’t bitwise comparable”

•  “There’s nothing else to run on the GPU”

•  “GPUs are different”

GTC!  

Some other time…

Page 17: Molecular Shape Searching on GPUs: A Brave New World

FastROCS Quick Start

•  ctrl-alt-F1 (to switch to a non-X-server terminal)
•  login as root
•  /sbin/init 3 (to turn off the X-server)
•  ./NVIDIA-Linux-x86_64-285.05.09.run
•  reboot
•  ./cuda.sh to give /dev/nvidia* correct permissions

•  tar -xzf fastrocs-1.3.1-RHEL5-x64-OpenCL-1.1-CUDA-4.1.tar.gz
•  openeye/bin/ShapeDatabaseServer.py database.oeb.gz
•  openeye/bin/ShapeDatabaseClient.py localhost:8080 query.sdf out.sdf

Page 18: Molecular Shape Searching on GPUs: A Brave New World

ROCS Quick Start

•  tar -xzf ROCS-3.1.1-RHEL5-x64.tar.gz

•  openeye/bin/rocs query.sdf database.oeb.gz

Still a barrier to entry to work around!

Page 19: Molecular Shape Searching on GPUs: A Brave New World

This is even worse!

fastrocs-1.3.1-RHEL5-x64-OpenCL-1.1-CUDA-4.1.tar.gz

NVidia OpenCL binaries are tightly locked to a particular driver version

Page 20: Molecular Shape Searching on GPUs: A Brave New World

Worthwhile to upgrade

[Bar chart: conformers per second (0 to 800,000), C2050 with the 260 driver vs C2050 with the 295 driver; annotation: "11%"; higher is better]

Page 21: Molecular Shape Searching on GPUs: A Brave New World

Needed for new hardware

[Bar chart: conformers per second (0 to 1,200,000), C2050 (295 driver) vs M2090 (295 driver); higher is better]

Page 22: Molecular Shape Searching on GPUs: A Brave New World

Scalability between drivers (4x C2050)

[Line chart: speedup (single-GPU time / multi-GPU time) vs number of GPUs (1 to 4), series Ideal, 260 driver, and 295 driver; higher is better]
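The speedup metric on this chart's axis (single-GPU time divided by multi-GPU time) can be sketched as follows. The function names and the timings are hypothetical and only illustrate the arithmetic; they are not measurements from the talk. Parallel efficiency, the fraction of the "Ideal" line that is reached, is added as a convenience and does not appear on the slide.

```python
def speedup(single_gpu_seconds, multi_gpu_seconds):
    """Speedup as plotted on the slide: single-GPU time / multi-GPU time."""
    if single_gpu_seconds <= 0 or multi_gpu_seconds <= 0:
        raise ValueError("timings must be positive")
    return single_gpu_seconds / multi_gpu_seconds


def parallel_efficiency(single_gpu_seconds, multi_gpu_seconds, n_gpus):
    """Fraction of the ideal (linear) speedup actually achieved."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return speedup(single_gpu_seconds, multi_gpu_seconds) / n_gpus


# Hypothetical wall-clock timings in seconds, keyed by GPU count.
# These are made-up numbers, not measurements from the talk.
timings = {1: 120.0, 2: 60.0, 4: 40.0}

for n_gpus in sorted(timings):
    s = speedup(timings[1], timings[n_gpus])
    e = parallel_efficiency(timings[1], timings[n_gpus], n_gpus)
    print(f"{n_gpus} GPU(s): speedup {s:.2f}x, efficiency {e:.0%}")
```

A run that follows the "Ideal" line has an efficiency of 1.0 at every GPU count; anything lower shows how far a given driver or transfer strategy falls short of linear scaling.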

Page 23: Molecular Shape Searching on GPUs: A Brave New World

Really bad for 8x M2090

[Line chart: speedup (single-GPU time / multi-GPU time) vs number of GPUs (1 to 8); higher is better]

Page 24: Molecular Shape Searching on GPUs: A Brave New World

Ways to transfer to device

•  CL_MEM_USE_HOST_PTR
   –  kernelBuf = clCreateBuffer(CL_MEM_USE_HOST_PTR)

•  CL_MEM_ALLOC_HOST_PTR|CL_MEM_COPY_HOST_PTR
   –  kernelBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR|CL_MEM_COPY_HOST_PTR)

•  CL_MEM_ALLOC_HOST_PTR
   –  kernelBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR) (cacheable)
   –  ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_WRITE)
   –  memcpy(ptr, data)
   –  clEnqueueUnmapMemObject(kernelBuf, ptr)

•  clEnqueueMapBuffer
   –  kernelBuf = clCreateBuffer() (cacheable)
   –  ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_WRITE)
   –  memcpy(ptr, data)
   –  clEnqueueUnmapMemObject(kernelBuf, ptr)

•  clEnqueueWriteBuffer
   –  kernelBuf = clCreateBuffer() (cacheable)
   –  clEnqueueWriteBuffer(kernelBuf, data)

•  oclCopyCompute
   –  pinnedBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR|CL_MEM_READ_WRITE) (cacheable)
   –  pinnedPtr = clEnqueueMapBuffer(pinnedBuf, CL_MAP_WRITE) (cacheable)
   –  memcpy(pinnedPtr, data)
   –  kernelBuf = clCreateBuffer() (cacheable)
   –  clEnqueueWriteBuffer(kernelBuf, pinnedPtr)

Page 25: Molecular Shape Searching on GPUs: A Brave New World

Ways to transfer from device

•  CL_MEM_ALLOC_HOST_PTR
   –  kernelBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR) (cacheable)
   –  ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_READ)
   –  memcpy(data, ptr)
   –  clEnqueueUnmapMemObject(kernelBuf, ptr)

•  clEnqueueMapBuffer
   –  kernelBuf = clCreateBuffer() (cacheable)
   –  ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_READ)
   –  memcpy(data, ptr)
   –  clEnqueueUnmapMemObject(kernelBuf, ptr)

•  clEnqueueReadBuffer
   –  kernelBuf = clCreateBuffer() (cacheable)
   –  clEnqueueReadBuffer(kernelBuf, data)

•  oclCopyCompute
   –  pinnedBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR|CL_MEM_READ_WRITE) (cacheable)
   –  pinnedPtr = clEnqueueMapBuffer(pinnedBuf, CL_MAP_READ) (cacheable)
   –  kernelBuf = clCreateBuffer() (cacheable)
   –  clEnqueueReadBuffer(kernelBuf, pinnedPtr)
   –  memcpy(data, pinnedPtr)

Page 26: Molecular Shape Searching on GPUs: A Brave New World

FastROCS scalability across 8x M2070

[Chart: speedup (time sequential / time parallel, 0 to 9) vs number of GPUs utilized (1 to 8), with five entries per GPU count]

Page 27: Molecular Shape Searching on GPUs: A Brave New World

Lessons from the mess

•  clEnqueueWriteBuffer > clEnqueueMapBuffer

•  clEnqueueMapBuffer >> clEnqueueReadBuffer

•  CL_MEM_* constants aren’t worth the effort

Page 28: Molecular Shape Searching on GPUs: A Brave New World

CUDA?

•  Serious customers will only use NVidia cards

•  Pinned memory

•  Better support for binaries and compatibility

•  CUDA support >> OpenCL support

Page 29: Molecular Shape Searching on GPUs: A Brave New World

FastROCS CUDA port

[Bar chart: conformers per second (0 to 3,000,000) for OpenCL, CUDA, and CUDA-pinned, series 2xC2075, 2xC2090, and 2xK20; higher is better]

Page 30: Molecular Shape Searching on GPUs: A Brave New World

CUDA Scaling?

[Line chart: conformers per second (0 to 8,000,000) vs number of individual K10 GPUs (1 to 8; note, each K10 has 2 physical GPUs on the board), series CUDA, OpenCL, and Ideal; higher is better]

Page 31: Molecular Shape Searching on GPUs: A Brave New World

CUDA vs OpenCL: Ding Ding!

•  Portability vs Innovation

•  NVidia vs Intel and AMD

•  Open vs Proprietary

•  Customers don’t care…

Page 32: Molecular Shape Searching on GPUs: A Brave New World

ROCS Implementations

•  We only care a little…

•  Fortran code (1995)
•  C code (1999)
•  C++ wrapper code (2003)
•  OpenCL code (2009)
•  CUDA code (2012)
•  C++ thread-safe code (2013)

Page 33: Molecular Shape Searching on GPUs: A Brave New World

OpenEye Software

•  Lots of Software
   –  14 products
   –  13 software libraries

•  C++ (no SIMD)
   –  2.5 million lines

•  Python
   –  416 thousand lines

•  Java
   –  63 thousand lines

•  C#
   –  38 thousand lines

Page 34: Molecular Shape Searching on GPUs: A Brave New World

The People

[Chart: staff breakdown with the values 20, 12, and 10 and the labels Programmers, Hardcore Scripter, and Other stuff]

•  GPGPU = ½ of a developer
   –  Only 2.5% of development effort

Page 35: Molecular Shape Searching on GPUs: A Brave New World

Technology Adoption Lifecycle

[Diagram: technology adoption lifecycle curve with segments of 2.5%, 13.5%, 34%, 34%, and 16%; OpenEye GPGPU development is marked on the curve]

Page 36: Molecular Shape Searching on GPUs: A Brave New World

LinkedIn skills

[Figure: LinkedIn skills; annotation: "2.2%"]

Page 37: Molecular Shape Searching on GPUs: A Brave New World

Technology Adoption Lifecycle

[Diagram: technology adoption lifecycle curve with segments of 2.5%, 13.5%, 34%, 34%, and 16%; GPGPU development is marked on the curve]

Page 38: Molecular Shape Searching on GPUs: A Brave New World

I Believe…

•  GPGPU computing can become ubiquitous…

•  By expressing parallelism everywhere…

•  We can make it easy for our customers…
   –  Pre-installed in every operating system
   –  Integrated seamlessly into every language
   –  Then eventually becoming the CPU

Page 39: Molecular Shape Searching on GPUs: A Brave New World

Acknowledgements

•  Nikolai Sakharnykh (NVidia)
•  Dave Mullaly (HP)
•  Exxact Computing

Page 40: Molecular Shape Searching on GPUs: A Brave New World

Father of “ROCS”

Andrew Grant April 28th 1963 - December 29th 2012

Page 41: Molecular Shape Searching on GPUs: A Brave New World

Page 42: Molecular Shape Searching on GPUs: A Brave New World

Dude, where’s my color?

[Bar chart: DUD average AUC (0 to 0.9) for ROCS and FastROCS, series Shape Only and With Color]

Page 43: Molecular Shape Searching on GPUs: A Brave New World

ROCS vs FastROCS Histogram

[Histogram: number of targets (0 to 12) vs Kendall tau correlation coefficient (bins from 0.10 to 1.00)]
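For readers unfamiliar with the statistic on this histogram's axis, here is a minimal sketch of a Kendall tau rank correlation between two score lists. It assumes the histogram compares how ROCS and FastROCS rank the same molecules for each target, which the transcript does not state explicitly. The function computes the simple tau-a variant (tied pairs count as neither concordant nor discordant); which variant the talk used is not given. The scores are invented.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys):
        raise ValueError("both score lists must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two items to compare rankings")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both methods order this pair the same way
        elif product < 0:
            discordant += 1  # the methods disagree on this pair
        # product == 0 means a tie in at least one list; tau-a ignores it
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Invented similarity scores for four molecules from two methods.
method_a_scores = [0.91, 0.80, 0.72, 0.55]
method_b_scores = [0.89, 0.70, 0.75, 0.50]
print(round(kendall_tau(method_a_scores, method_b_scores), 3))
```

A value of 1.0 means both methods rank every pair of molecules in the same order, and -1.0 means the rankings are exactly reversed.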

