Python on GACRC Computing Resources

Georgia Advanced Computing Resource Center, EITS/University of Georgia

Zhuofei Hou, [email protected]

10/26/2016


Outline

• GACRC

• Python Overview

• Python on Clusters

• Python Packages on Clusters

• Run Python Interactively on Clusters

• Run Python Batch Job on Clusters


GACRC

Who Are We?

• Georgia Advanced Computing Resource Center

• Collaboration between the Office of Vice President for Research (OVPR) and the Office of the Vice President for Information Technology (OVPIT)

• Guided by a faculty advisory committee (GACRC-AC)

Why Are We Here?

• To provide computing hardware and network infrastructure in support of high-performance computing (HPC) at UGA

Where Are We?

• http://gacrc.uga.edu (Web)

• http://wiki.gacrc.uga.edu (Wiki)

• http://gacrc.uga.edu/help/ (Web Help)

• https://wiki.gacrc.uga.edu/wiki/Getting_Help (Wiki Help)


Python Overview – Language

• Open source general-purpose scripting language (https://www.python.org/)

• Supports procedural, object-oriented, and functional programming

• Glue language with interfaces to C/C++ (via SWIG), Objective-C (via PyObjC), Java (via Jython), and Fortran (via F2PY), etc. (https://wiki.python.org/moin/IntegratingPythonWithOtherLanguages)

• Mainstream version is 2.7.x; the newest version is 3.5.x (as of March 2016)
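To make the three programming styles concrete, here is a minimal sketch that computes the same sum of squares in each of them. The function and class names are hypothetical and chosen only for illustration; the code is written to run under both Python 2.7 and Python 3.

```python
from functools import reduce  # also a builtin on Python 2; must be imported on Python 3


# Procedural style: an explicit loop that updates an accumulator.
def sum_of_squares_procedural(values):
    total = 0
    for v in values:
        total += v * v
    return total


# Object-oriented style: state and behavior bundled in a class.
class SquareAccumulator(object):
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value * value
        return self


# Functional style: compose map and reduce without mutating state.
def sum_of_squares_functional(values):
    return reduce(lambda acc, sq: acc + sq, map(lambda v: v * v, values), 0)


data = [1, 2, 3]
acc = SquareAccumulator()
for v in data:
    acc.add(v)
results = (sum_of_squares_procedural(data), acc.total, sum_of_squares_functional(data))
print(results)
```

All three approaches give the same value; which one reads best depends on the problem at hand.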


Python Overview – Modules

• Python has a large collection of built-in modules included in standard distributions:

https://docs.python.org/2/py-modindex.html

https://docs.python.org/3/py-modindex.html

• Many third-party packages are available for scientific computing:

NumPy

SymPy

SciPy

Biopy

Matplotlib

Python Overview – Scientific Modules

NumPy: MATLAB-like capabilities, fast N-dimensional array operations, linear algebra, etc. (http://www.numpy.org/)

SciPy: Fundamental library for scientific computing (http://www.scipy.org/)

SymPy: Symbolic mathematics (http://www.sympy.org/en/index.html)

Matplotlib: High-quality plotting (http://matplotlib.org/)

Biopy: Phylogenetic exploration (https://code.google.com/archive/p/biopy/)

A scientific Python distribution may include all those packages for you!


Python Overview – Scientific Distributions

• Anaconda

“A Python distribution including ~200 of the most popular Python packages for science, math, engineering, and data analysis.”

Supports Linux, Mac and Windows (https://www.continuum.io/)

• Python(x,y)

Windows only (http://python-xy.github.io/)

• WinPython

Windows only (http://winpython.github.io/)


Python on Clusters

• Python

https://wiki.gacrc.uga.edu/wiki/Python

https://wiki.gacrc.uga.edu/wiki/Python-Sapelo

• Anaconda Python

https://wiki.gacrc.uga.edu/wiki/Anaconda

https://wiki.gacrc.uga.edu/wiki/Anaconda-Sapelo


Python on zcluster

Version          Installation Path          Invoke Command
2.4.3 (default)  /usr/bin                   python
2.7.2*           /usr/local/python/2.7.2    python2.7
2.7.8            /usr/local/python/2.7.8    python2.7.8
3.3.0            /usr/local/python/3.3.0    python3
3.4.0            /usr/local/python/3.4.0    python3.4

* Most of the Python site-packages that GACRC installed on zcluster are for version 2.7.2.

Python on Sapelo

Version          Installation Path              Module Load                Invoke Command
2.6.6 (default)  /usr/bin                       (none)                     python
2.7.8            /usr/local/apps/python/2.7.8   module load python/2.7.8   python
3.4.3            /usr/local/apps/python/3.4.3   module load python/3.4.3   python3
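Because the default interpreters (2.4.3 on zcluster, 2.6.6 on Sapelo) are much older than the optional ones, a script can check which version it is running under before doing any real work. The following is a minimal sketch; the `MIN_VERSION` threshold and the `describe_interpreter` helper are hypothetical names, not part of any GACRC tooling.

```python
import sys

# Hypothetical minimum version for this sketch; adjust to what your script needs.
MIN_VERSION = (2, 7)


def describe_interpreter(version_info=None):
    """Return (version_string, is_supported) for the given or running interpreter."""
    if version_info is None:
        version_info = sys.version_info
    # Index access works even on very old interpreters.
    major, minor = version_info[0], version_info[1]
    version_string = "%d.%d" % (major, minor)
    return version_string, (major, minor) >= MIN_VERSION


version, ok = describe_interpreter()
if not ok:
    sys.stderr.write("Python %s is too old; load a newer version first\n" % version)
else:
    sys.stdout.write("Running under Python %s\n" % version)
```

The helper accepts an explicit version tuple so that the check can be tried against any of the versions listed in the tables without switching interpreters.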


Anaconda Python on zcluster

Version  Installation Path            Python Version  Exports                                            Invoke Command
2.3.0    /usr/local/anaconda/2.3.0    2.7.11          export PATH=/usr/local/anaconda/2.3.0/bin:$PATH    python
3-2.2.0  /usr/local/anaconda/3-2.2.0  3.4.3           export PATH=/usr/local/anaconda/3-2.2.0/bin:$PATH  python

Anaconda Python on Sapelo

Version  Installation Path                 Python Version  Module Load                   Invoke Command
2.2.0    /usr/local/apps/anaconda/2.2.0    2.7.12          module load anaconda/2.2.0    python
2.5.0    /usr/local/apps/anaconda/2.5.0    2.7.11          module load anaconda/2.5.0    python
3-2.2.0  /usr/local/apps/anaconda/3-2.2.0  3.4.3           module load anaconda/3-2.2.0  python

Python Packages on Clusters

• Python Packages

https://wiki.gacrc.uga.edu/wiki/Python

https://wiki.gacrc.uga.edu/wiki/Python-Sapelo

• Anaconda Python Packages

https://wiki.gacrc.uga.edu/wiki/Anaconda

https://wiki.gacrc.uga.edu/wiki/Anaconda-Sapelo


Python Packages on Clusters

How can you tell whether the package you need is already installed on the clusters?

1. python -c 'import pkgName; print pkgName.__version__'

2. conda list pkgName

3. python -m pip list | grep pkgName

Examples for each cluster follow.

Python Packages on Clusters


zcluster

1. python2.7 -c 'import numpy; print numpy.__version__'

2. python2.7.8 -m pip list | grep numpy

3. export PATH=/usr/local/anaconda/2.3.0/bin:$PATH

conda list numpy

Sapelo

1. module load python/2.7.8

python -c 'import numpy; print numpy.__version__'

2. python -m pip list | grep numpy

3. module load anaconda/3-2.2.0

conda list numpy
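The same check can be done from inside a script, which is convenient when several packages need to be verified at once. This is a minimal sketch, assuming Python 2.7 or newer (the `importlib` module is not available in the 2.4.3 default on zcluster); the `package_version` helper is a hypothetical name.

```python
import importlib


def package_version(name):
    """Return the package's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every package defines __version__, so fall back to a placeholder.
    return getattr(module, "__version__", "unknown")


for pkg in ["numpy", "scipy", "json"]:
    version = package_version(pkg)
    if version is None:
        print("%s: not installed" % pkg)
    else:
        print("%s: %s" % (pkg, version))
```

Run it with the interpreter you intend to use for your job, since each Python installation has its own set of packages.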


Common Python Packages on zcluster

Package       python2.7      python2.7.8    python3.4      Anaconda2.3.0   Anaconda3-2.2.0
              (python2.7.2)  (python2.7.8)  (python3.4.0)  (python2.7.11)  (python3.4.3)
Numpy         1.11.0         1.10.1         1.9.1          1.10.2          1.10.2
Scipy         0.10.1         0.14.1         n/a            0.15.1          0.15.1
Biopython     1.65           1.67           n/a            1.66            1.66
Matplotlib    1.3.1          1.3.1          1.3.1          1.4.3           1.5.0
Cython        0.16           0.19.1         0.19.1         0.23.2          0.23.4
Pandas        0.17.0         n/a            n/a            0.17.1          0.15.2
Scikit-image  n/a            0.10.1         n/a            0.11.3          0.11.2
Scikit-learn  0.15.2         0.17           n/a            0.16.1          0.15.2
Networkx      2.0.dev        1.11           1.11           1.9.1           1.9.1
Requests      2.5.1          2.8.0          n/a            2.9.0           2.9.0

Common Python Packages on Sapelo

Package       python         python3        Anaconda2.5.0   Anaconda3-2.2.0
              (python2.7.8)  (python3.4.3)  (python2.7.11)  (python3.4.3)
Numpy         1.9.2          1.9.2          1.10.4          1.9.2
Scipy         0.16.1         n/a            0.17.0          0.15.1
Biopython     1.66           n/a            1.66            1.66
Matplotlib    1.4.3          1.5.1          1.5.1           1.4.3
Cython        0.24.1         0.22           0.23.4          0.22
Pandas        0.17.1         0.17.1         0.17.1          0.15.2
Scikit-image  n/a            n/a            0.11.3          0.11.2
Scikit-learn  0.17.1         n/a            0.17            0.15.2
Networkx      n/a            n/a            1.11            1.9.1
Requests      2.5.1          n/a            2.9.1           2.6.0

Python Package Paths on zcluster

Version  Python Package Path                                  Python Shared Library Path
2.7.2    /usr/local/python/2.7.2/lib/python2.7                N/A
         /usr/local/python/2.7.2/lib/python2.7/site-packages
2.7.8    /usr/local/python/2.7.8/lib/python2.7                /usr/local/python/2.7.8/lib
         /usr/local/python/2.7.8/lib/python2.7/site-packages
3.3.0    /usr/local/python/3.3.0/lib/python3.3                N/A
         /usr/local/python/3.3.0/lib/python3.3/site-packages
3.4.0    /usr/local/python/3.4.0/lib/python3.4                N/A
         /usr/local/python/3.4.0/lib/python3.4/site-packages

Python package paths: python can find those packages automatically!

Python shared library path: to be exported in LD_LIBRARY_PATH.

Python Package Paths on Sapelo

Version  Python Package Path                                       Python Shared Library Path
2.7.8    /usr/local/apps/python/2.7.8/lib/python2.7                /usr/local/apps/python/2.7.8/lib
         /usr/local/apps/python/2.7.8/lib/python2.7/site-packages
3.4.3    /usr/local/apps/python/3.4.3/lib/python3.4                /usr/local/apps/python/3.4.3/lib
         /usr/local/apps/python/3.4.3/lib/python3.4/site-packages

Python package paths: python can find those packages automatically!

Python shared library path: to be exported in LD_LIBRARY_PATH.

Anaconda Python Package Paths on zcluster

Version  Installation Path            Python Version  Python Package Path
2.3.0    /usr/local/anaconda/2.3.0    2.7.11          /usr/local/anaconda/2.3.0/lib/python2.7
                                                      /usr/local/anaconda/2.3.0/lib/python2.7/site-packages
3-2.2.0  /usr/local/anaconda/3-2.2.0  3.4.3           /usr/local/anaconda/3-2.2.0/lib/python3.4
                                                      /usr/local/anaconda/3-2.2.0/lib/python3.4/site-packages

python can find those packages automatically!

Python Shared Libraries were built for all versions in lib.

Anaconda Python Package Paths on Sapelo

Version  Installation Path                 Python Version  Python Package Path
2.2.0    /usr/local/apps/anaconda/2.2.0    2.7.12          /usr/local/apps/anaconda/2.2.0/lib/python2.7
                                                           /usr/local/apps/anaconda/2.2.0/lib/python2.7/site-packages
2.5.0    /usr/local/apps/anaconda/2.5.0    2.7.11          /usr/local/apps/anaconda/2.5.0/lib/python2.7
                                                           /usr/local/apps/anaconda/2.5.0/lib/python2.7/site-packages
3-2.2.0  /usr/local/apps/anaconda/3-2.2.0  3.4.3           /usr/local/apps/anaconda/3-2.2.0/lib/python3.4
                                                           /usr/local/apps/anaconda/3-2.2.0/lib/python3.4/site-packages

python can find those packages automatically!

Python Shared Libraries were built for all versions in lib.
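One way to confirm which package paths an interpreter actually searches is to inspect `sys.path` from inside it. The sketch below also lists the entries of `LD_LIBRARY_PATH`, the variable the earlier slides mention for shared libraries. The helper names `search_path_report` and `shared_library_dirs` are hypothetical.

```python
import os
import sys


def search_path_report():
    """Return the non-empty directories this interpreter searches for modules, in order."""
    return [p for p in sys.path if p]


def shared_library_dirs():
    """Return the entries of LD_LIBRARY_PATH, or an empty list if it is unset."""
    value = os.environ.get("LD_LIBRARY_PATH", "")
    return [p for p in value.split(os.pathsep) if p]


print("Installation prefix: %s" % sys.prefix)
for directory in search_path_report():
    print("module search path: %s" % directory)
for directory in shared_library_dirs():
    print("shared library path: %s" % directory)
```

If the installation is set up as described in the tables, the `lib/pythonX.Y` and `site-packages` directories of the selected version should appear in the module search path without any extra configuration.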


Run Python Interactively on Clusters

• Run Python Interactively

• Run Anaconda Python Interactively

DO NOT run jobs on the login node; run interactive tasks on an interactive node:

• Login nodes: zcluster.rcc.uga.edu (zcluster), sapelo1.gacrc.uga.edu (Sapelo)

• From the login node, run qlogin to open a session on an interactive node

https://wiki.gacrc.uga.edu/wiki/Training - Download

Run Python Interactively on Clusters

Running on zcluster interactive node:

zhuofei@compute-18-16:~$ python
Python 2.4.3 (#1, Oct 23 2012, 22:02:41)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 7
>>> e = 2
>>> a**e
49
>>>

Running on Sapelo interactive node:

[zhuofei@n15 ~]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 7
>>> e = 2
>>> a**e
49
>>>

Run Python Interactively on Clusters

myScript.py:

import sys
print sys.version
a = 7
e = 2
print a**e

Running on zcluster interactive node:

zhuofei@compute-18-16:~$ /usr/local/python/2.7.8/bin/python myScript.py
2.7.8 (default, Jan 7 2015, 15:33:35)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
49

Running on Sapelo interactive node:

[zhuofei@n15 ~]$ module load python/2.7.8
[zhuofei@n15 ~]$ python myScript.py
2.7.8 (default, Sep 26 2014, 07:26:46)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
49

Run Python Interactively on Clusters

myScript.py on zcluster (the first line tells exec where python is on zcluster):

#!/usr/local/python/2.7.8/bin/python
import sys
print sys.version
a = 7; e = 2
print a**e

zhuofei@compute-18-16:~$ chmod u+x myScript.py
zhuofei@compute-18-16:~$ ./myScript.py
2.7.8 (default, Jan 7 2015, 15:33:35)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
49

myScript.py on Sapelo (the first line tells exec where python is on Sapelo):

#!/usr/local/apps/python/2.7.8/bin/python2.7
import sys
print sys.version
a = 7; e = 2
print a**e

[zhuofei@n15 ~]$ chmod u+x myScript.py
[zhuofei@n15 ~]$ ./myScript.py
2.7.8 (default, Sep 26 2014, 07:26:46)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
49

Run Python Interactively on Clusters

myScript.py (env tells exec where python is by searching PATH):

#!/usr/bin/env python
import sys
print sys.version
a = 7; e = 2
print a**e

Running on zcluster interactive node:

zhuofei@compute-18-16:~$ chmod u+x myScript.py
zhuofei@compute-18-16:~$ export PATH=/usr/local/python/2.7.8/bin:$PATH
zhuofei@compute-18-16:~$ ./myScript.py
2.7.8 (default, Jan 7 2015, 15:33:35)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
49

Running on Sapelo interactive node:

[zhuofei@n15 ~]$ chmod u+x myScript.py
[zhuofei@n15 ~]$ module load python/2.7.8
[zhuofei@n15 ~]$ ./myScript.py
2.7.8 (default, Sep 26 2014, 07:26:46)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
49
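With an env-style first line, the interpreter that runs the script depends on the current PATH, so it can be useful to have the script report which interpreter it ended up with. The following is a minimal sketch that runs under both Python 2 and Python 3; the `interpreter_summary` helper is a hypothetical name.

```python
import os
import sys


def interpreter_summary():
    """Return the path and version of the interpreter running this script."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        # The first PATH entry is where env looks first for "python".
        "first_path_entry": os.environ.get("PATH", "").split(os.pathsep)[0],
    }


summary = interpreter_summary()
print("interpreter: %s" % summary["executable"])
print("version: %s" % summary["version"])
print("first PATH entry: %s" % summary["first_path_entry"])
```

After an `export PATH=...` or `module load` step, the reported interpreter path would be expected to sit under the corresponding installation directory.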


Run Anaconda Python Interactively on Clusters

myScript.py:

#!/usr/bin/env python
import sys
print sys.version
a = 7; e = 2
print a**e

Running on zcluster interactive node:

zhuofei@compute-18-16:$ chmod u+x myScript.py
zhuofei@compute-18-16:$ export PATH=/usr/local/anaconda/2.3.0/bin:$PATH
zhuofei@compute-18-16:$ ./myScript.py
2.7.11 |Anaconda 2.3.0 (64-bit)| (default, Dec 6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
49

Running on Sapelo interactive node:

[zhuofei@n15 ~]$ chmod u+x myScript.py
[zhuofei@n15 ~]$ module load anaconda/2.5.0
[zhuofei@n15 ~]$ ./myScript.py
2.7.11 |Anaconda 2.5.0 (64-bit)| (default, Dec 6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
49

Run Python Batch Job on Clusters

• Run Python Batch Job on zcluster

• Run Python Batch Job on Sapelo

Note:

zcluster Job Working Space: /escratch4/username/

Sapelo Job Working Space: /lustre1/username/

https://wiki.gacrc.uga.edu/wiki/Training - Download

Run Python Batch Job on zcluster

sub.sh:

#!/bin/bash
cd working_directory
export PATH=/usr/local/python/2.7.8/bin:$PATH
export PYTHONPATH=/usr/local/python/2.7.8/lib/python2.7:/usr/local/python/2.7.8/lib/python2.7/site-packages:$PYTHONPATH
time python myScript.py [options]

Submit the job:

qsub -q rcc-30d sub.sh

Optional qsub options, e.g., -pe threads 4 -l mem_total=20g
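The `[options]` placeholder in the job script stands for whatever command-line arguments your own script accepts. As a minimal sketch of how a script might read such options, the example below uses the standard `argparse` module (available in Python 2.7 and 3.2 or newer). The option names `--base` and `--exponent` are hypothetical and simply parameterize the `a**e` example from the interactive slides.

```python
import argparse


def parse_options(argv=None):
    """Parse command-line options; the option names here are hypothetical."""
    parser = argparse.ArgumentParser(description="Raise a base to a power.")
    parser.add_argument("--base", type=int, default=7, help="base value")
    parser.add_argument("--exponent", type=int, default=2, help="exponent value")
    return parser.parse_args(argv)


def main(argv=None):
    options = parse_options(argv)
    result = options.base ** options.exponent
    print(result)
    return result


# Demonstration with an explicit argument list; in a job script, call main()
# with no arguments so that argparse reads the options from sys.argv.
result = main(["--base", "7", "--exponent", "2"])
```

In a batch script, the corresponding line would then look like `time python myScript.py --base 7 --exponent 2`.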


Run Anaconda Python Batch Job on zcluster

sub.sh:

#!/bin/bash
cd working_directory
export PATH=/usr/local/anaconda/2.3.0/bin:$PATH
export PYTHONPATH=/usr/local/anaconda/2.3.0/bin:/usr/local/anaconda/2.3.0/lib/python2.7:$PYTHONPATH
time python myScript.py [options]

Submit the job:

qsub -q rcc-30d sub.sh

Optional qsub options, e.g., -pe threads 4 -l mem_total=20g

Run (Anaconda) Python Batch Job on Sapelo

sub.sh using Python 3.4.3:

#PBS -S /bin/bash
#PBS -q batch
#PBS -N PythonJob1
#PBS -l nodes=1:ppn=4:AMD
#PBS -l walltime=48:00:00
#PBS -l mem=10gb
cd $PBS_O_WORKDIR
module load python/3.4.3
time python3 myScript.py [options]

sub.sh using Anaconda 3-2.2.0:

#PBS -S /bin/bash
#PBS -q batch
#PBS -N PythonJob1
#PBS -l nodes=1:ppn=4:AMD
#PBS -l walltime=48:00:00
#PBS -l mem=10gb
cd $PBS_O_WORKDIR
module load anaconda/3-2.2.0
time python myScript.py [options]

Submit the job:

qsub sub.sh
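Inside a batch job, the scheduler exposes information such as the submit directory through environment variables like `PBS_O_WORKDIR`, which the job script above uses. A Python script can read the same variables, for example to label its output. This is a minimal sketch, assuming a Torque/PBS-style scheduler that also sets `PBS_JOBID` and `PBS_NODEFILE`; the `job_context` helper and the fallback values are hypothetical.

```python
import os


def job_context(environ=None):
    """Collect a few Torque/PBS variables, with fallbacks for non-batch runs."""
    if environ is None:
        environ = os.environ
    return {
        "job_id": environ.get("PBS_JOBID", "interactive"),
        "work_dir": environ.get("PBS_O_WORKDIR", os.getcwd()),
        "node_file": environ.get("PBS_NODEFILE"),
    }


context = job_context()
print("job id: %s" % context["job_id"])
print("submit directory: %s" % context["work_dir"])
```

The fallbacks let the same script run unchanged on an interactive node, where these variables may not be set.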


Thank You!
