python on hpcc

Post on 02-May-2022


Dr. Yongjun Choi

Python on HPCC

ICER Workshop, Oct/12/2020

Scope of this workshop

• What we want to do:

• Explain what MSU HPCC is doing to support Python users.

• Provide guidance to help users improve Python performance on the HPCC.

• Point out tools that support Python developers in HPC.

• What we assume:

• You know and use Python, or

• You know and use HPC and are curious about using Python in your own HPC work.

Getting Started with Python Resources

• https://www.python.org/about/gettingstarted/

• https://wiki.python.org/moin/BeginnersGuide/

• https://www.codecademy.com/learn/python/

• https://www.coursera.org/specializations/python/

• https://software-carpentry.org/lessons/

• https://pymotw.com/

• https://wiki.hpcc.msu.edu/display/ITH/Python

• https://www.youtube.com/watch?v=_uQrJ0TkZlc

• py4e.com

• ……

Python is a very popular language

Most popular coding Languages of 2020: www.tiobe.com/tiobe-index

https://insights.stackoverflow.com/survey/2020#technology-most-loved-dreaded-and-wanted-languages-loved

Why is Python so popular?

• Easy

• Clean, clear syntax

• Multi-paradigm, interpreted

• Automatic garbage collection (no manual memory management)

• Flexible, full-featured data structures

• Extensive standard libraries

• Open-source packages

The Scientific Python Stacks

• Primary Uses:

• Script workflows for both data analysis and simulations

• Perform exploratory, interactive data analysis and visualization

Python at MSU HPCC

• HPCC supports Python

• Maximizing Python performance can be challenging:

• Interpreted languages are difficult to optimize.

• By design, only one native thread can execute Python bytecode at a time (the global interpreter lock, GIL).

• Python was designed and implemented without considering the realities of HPC.

Basic Guidelines for Python in HPC

• Identify and exploit parallelism at the core, node, and cluster levels

• Understand and apply NumPy array syntax and its broadcasting rules:

• https://numpy.org/doc/stable

• https://numpy.org/doc/stable/user/basics.broadcasting.html

• Measure your code's performance using profiling tools

• https://stackify.com/how-to-use-python-profilers-learn-the-basics/
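
As a minimal sketch of the array-syntax guideline above, the following compares an explicit Python loop with the equivalent NumPy broadcasting expression. The function names and the array shape are illustrative assumptions, not part of the workshop material.

```python
import numpy as np


def center_columns_loop(data):
    """Subtract each column's mean using explicit Python loops (slow)."""
    n_rows, n_cols = data.shape
    out = np.empty_like(data, dtype=float)
    for j in range(n_cols):
        col_mean = sum(data[i, j] for i in range(n_rows)) / n_rows
        for i in range(n_rows):
            out[i, j] = data[i, j] - col_mean
    return out


def center_columns_broadcast(data):
    """Same result with broadcasting: shape (n_rows, n_cols) minus shape (n_cols,)."""
    return data - data.mean(axis=0)


rng = np.random.default_rng(0)
data = rng.random((200, 3))          # hypothetical data set: 200 samples, 3 features

loop_result = center_columns_loop(data)
vec_result = center_columns_broadcast(data)

# Both versions should agree up to floating-point rounding.
print(np.allclose(loop_result, vec_result))
```

The broadcast version lets NumPy stretch the length-3 vector of column means across all 200 rows, so the work runs in compiled code rather than in the interpreter.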

Python at HPCC

• HPC modules

• module avail python

• module spider python

• module load python

• module load Python/2.7.9

• Or install your own Python (many options, but we suggest Anaconda)

• System python (/usr/bin/python): risky, not recommended.

Using Python in HPC

• Limited packages: only a few widely used packages, such as NumPy and Matplotlib, are installed.

• Why? Python has many packages, modules, and libraries that researchers may want to use, and it is difficult for HPCC to keep all of them up to date while avoiding conflicts between different versions.

• https://wiki.hpcc.msu.edu/display/ITH/Python

• Virtual environment (virtualenv)

• Anaconda

Virtualenv

• Based on HPC Python: Users control packages. HPC controls Python

• https://wiki.hpcc.msu.edu/x/xIEVAg
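
The split between "HPC controls Python" and "users control packages" can be checked from inside the interpreter. The sketch below assumes Python 3 and an environment created with `venv` or a recent `virtualenv`; the helper name `describe_interpreter` is hypothetical. Conda environments are not detected by this particular check.

```python
import sys


def describe_interpreter():
    """Return where the running Python lives and whether a virtual environment is active."""
    # In a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the underlying installation.
    in_virtual_env = sys.prefix != sys.base_prefix
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "base_prefix": sys.base_prefix,
        "prefix": sys.prefix,
        "in_virtual_env": in_virtual_env,
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

Inside an activated virtual environment, `base_prefix` would show the Python installation the environment was built from, and `prefix` the user-owned directory where packages get installed.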

Anaconda (recommended)

• Easy to install.

• Install in your home or research space

• Fully controlled by the user

• https://www.anaconda.com

• Download https://www.anaconda.com/products/individual

• https://wiki.hpcc.msu.edu/display/ITH/Using+conda

• Both pip and conda can be used for package installation. However, it is better to stick to one of them.

• pip cannot uninstall packages that were installed via conda, and conda cannot uninstall packages that were installed via pip.

Jupyter notebook

• https://ondemand.hpcc.msu.edu/pun/sys/dashboard

Can my Python code be faster?

• Vectorization

• Avoid explicit loops where possible. Instead, use NumPy array operations.

• eg: ex01.py

• Parallelization (MPI, OpenMP, OpenACC, threads)

• Workflows, e.g. launching many independent runs simultaneously with a job array (e.g. ex02.py and ex02.sb)

• Numba: has some restrictions, but it can make your code very fast!

• eg: https://murillogroupmsu.com/numba-versus-c/

• ex03.py
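
The job-array workflow mentioned above can be sketched as follows. This is not the workshop's ex02.py. It assumes the scheduler is SLURM, which sets the `SLURM_ARRAY_TASK_ID` environment variable for each array task, and the parameter list and function name are hypothetical.

```python
import os


def parameters_for_task(task_id, values):
    """Pick the input value this array task is responsible for."""
    return values[task_id % len(values)]


# SLURM sets SLURM_ARRAY_TASK_ID for each task of a job array (assumed scheduler);
# default to 0 so the script also runs outside the scheduler.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))

temperatures = [0.5, 1.0, 1.5, 2.0]   # hypothetical parameter sweep
temperature = parameters_for_task(task_id, temperatures)

print(f"task {task_id}: running with temperature={temperature}")
```

Submitted with something like `sbatch --array=0-3` in the batch script, each task would run the same Python file on a different parameter, with no communication between tasks.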

Use Threaded Libraries

• Packages like NumPy and SciPy are already built with thread support via BLAS/LAPACK implementations such as MKL

• Don’t reimplement solvers in pure Python

• Many of your favorite threaded libraries and packages already have bindings:

• PyTrilinos

• petsc4py

• Elemental

• SLEPc

• Do not reinvent the wheel. If an algorithm is not new, it has probably already been implemented well.
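
As an illustration of relying on an existing solver instead of writing one in pure Python, the sketch below solves a dense linear system with `numpy.linalg.solve`, which calls a compiled LAPACK routine. The matrix size and contents are arbitrary assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Add n to the diagonal so the random matrix is diagonally dominant
# and the system is well conditioned.
A = rng.random((n, n)) + n * np.eye(n)
b = rng.random(n)

# The heavy lifting happens inside the compiled BLAS/LAPACK library,
# which may use several threads depending on how NumPy was built.
x = np.linalg.solve(A, b)

residual = np.linalg.norm(A @ x - b)
print(f"residual norm: {residual:.2e}")
```

Whether the call actually runs multithreaded depends on the BLAS/LAPACK build behind NumPy. With OpenMP-based builds the thread count is commonly controlled through environment variables such as `OMP_NUM_THREADS`.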

Using Compiled Modules

• Methods of using pre-compiled, threaded GIL-free code for speed include:

• Cython

• f2py

• pybind11

• SWIG

• Boost.Python

• ctypes

• Writing bindings in C/C++ (https://docs.python.org/3/extending/extending.html/)
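
Of the options listed above, ctypes needs no compilation step, so it makes for a short sketch. The example assumes a Unix-like system where the C math library can be located, and it calls the C function `cos` directly.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library. find_library may return None on some systems,
# in which case CDLL(None) falls back to symbols already loaded in the process.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

value = libm.cos(1.0)

# Compare against Python's own math module as a sanity check.
print(value, math.cos(1.0))
```

Functions loaded through `ctypes.CDLL` release the GIL for the duration of the call, which is what allows long-running compiled routines to overlap with other Python threads.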

Profiling: cProfile, SnakeViz, VTune (Intel), etc.

• cProfile: https://docs.python.org/3/library/profile.html

• SnakeViz: https://jiffyclub.github.io/snakeviz/

• VTune: https://software.intel.com/content/www/us/en/develop/tools/vtune-profiler.html

• module load Vtune

• Check speed (time), calls (frequency), memory
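
A minimal cProfile sketch, using only the standard library, is shown below. The two functions being profiled are hypothetical stand-ins for real workload code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately loop-heavy function so it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [slow_sum_of_squares(20_000) for _ in range(20)]


# Collect timing and call counts only around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
results = workload()
profiler.disable()

# Format the five most expensive entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The same data can be written to a file for SnakeViz, for example with `python -m cProfile -o out.prof script.py` followed by `snakeviz out.prof` (both file names are placeholders).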

Parallelization: numba - parallel

• Automatic parallelization with Numba

• Very easy to use: you need only a one-line decorator, @njit(parallel=True)

• More information:

• https://numba.pydata.org/numba-doc/latest/user/parallel.html
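
The decorator described above can be sketched as follows. The reduction function is a hypothetical example, and the `try`/`except` fallback is an addition so the snippet still runs (serially, in plain Python) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback (an assumption for portability): behave like plain Python.
    prange = range

    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@njit(parallel=True)
def sum_of_squares(values):
    total = 0.0
    # prange marks the loop as parallel; Numba treats `total` as a reduction.
    for i in prange(values.shape[0]):
        total += values[i] * values[i]
    return total


values = np.arange(200_000, dtype=np.float64)
result = sum_of_squares(values)

# Cross-check against NumPy's own dot product.
print(np.isclose(result, np.dot(values, values)))
```

The first call includes compilation time, so any timing comparison should be done on a second call.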

Parallelization: numba - cuda

• Only works with NVIDIA GPU cards

• Easy to use (at least much easier than GPU programming in other languages).

• https://github.com/keipertk/pygpu-workshop

Parallelization: MPI

• MPI

• It is the HPC paradigm for inter-process communication

• MPI makes full use of HPC environments

• Well-supported tools exist

• Python-MPI bindings have been developed since 1996

Parallelization: MPI - mpi4py

• mpi4py

• Pythonic wrapping of the system’s native MPI

• Provides almost all MPI-1, 2 and common MPI-3 features

• Very well maintained

• Distributed with major Python distributions

• Portable and scalable

• Requires only NumPy, Cython (build only), and an MPI library

• https://mpi4py.readthedocs.io/en/stable/#
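
A minimal mpi4py sketch is given below: each rank sums the squares of its own block of integers, and the partial sums are combined on rank 0. The helper `local_range` and the problem size are hypothetical, and the serial fallback is an addition so the snippet also runs where mpi4py is not installed.

```python
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Serial fallback (an assumption for portability): act as a single rank.
    comm, rank, size = None, 0, 1


def local_range(n_items, rank, size):
    """Split range(n_items) into `size` nearly equal contiguous blocks."""
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


n_items = 1000
start, stop = local_range(n_items, rank, size)

# Each rank works only on its own block.
local_sum = sum(i * i for i in range(start, stop))

if comm is not None:
    # Combine the partial sums; only rank 0 receives the result.
    total = comm.reduce(local_sum, op=MPI.SUM, root=0)
else:
    total = local_sum

if rank == 0:
    print(f"sum of squares below {n_items}: {total}")
```

With mpi4py available, this could be launched on several processes with a command along the lines of `mpirun -n 4 python sum_squares.py` (the file name is a placeholder; the launcher to use depends on the cluster setup).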

More Resources

• Getting help:

• Office hours: Mon/Thu 1-2 PM

• https://icer.msu.edu/contact

• Documentation

• https://wiki.hpcc.msu.edu
