
GPU Computing in Python

Kharkivpy #8 talk by Glib Ivashkevych


Yehor Nazarkin

August 12, 2013

Transcript

  1. GPU Computing in Python. Glib Ivashkevych, junior researcher, NSC KIPT

  2. Parallel revolution. "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software", Herb Sutter, March 2005
  3. When serial code hits the wall: the power wall. "Now, Intel has embarked on a course already adopted by some of its major rivals: obtaining more computing power by stamping multiple processors on a single chip rather than straining to increase the speed of a single processor." Paul S. Otellini, Intel's CEO, May 2004
  4. July 2006: Intel launches Core 2 Duo (Conroe). Feb 2007: Nvidia releases the CUDA SDK. Nov 2008: Tsubame, the first GPU-accelerated supercomputer. Dec 2008: OpenCL 1.0 specification released. Today: 50 GPU-powered supercomputers in the Top500.
  5. "It's very clear that we are close to the tipping point. If we're not at a tipping point, we're racing at it." Jen-Hsun Huang, NVIDIA co-founder and CEO, March 2013. Heterogeneous computing is becoming a standard in HPC, and programming has changed.
  6. Heterogeneous computing: the host (CPU and main memory) and the device (GPU, with its multiprocessors, cores and GPU memory).
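
A minimal sketch of that host/device split using PyCUDA's driver API: data starts in main memory and must be copied explicitly into GPU memory and back (assumes a CUDA-capable GPU with PyCUDA installed).

```python
import numpy as np
import pycuda.autoinit          # creates a context on the first device
import pycuda.driver as drv

host = np.arange(16, dtype=np.float32)   # lives in main (host) memory
dev = drv.mem_alloc(host.nbytes)         # allocate GPU (device) memory
drv.memcpy_htod(dev, host)               # host -> device copy
back = np.empty_like(host)
drv.memcpy_dtoh(back, dev)               # device -> host copy
assert (host == back).all()
```
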
  7. CPU: general purpose, sophisticated design and scheduling, perfect for task parallelism. GPU: highly parallel, huge memory bandwidth, lightweight scheduling, perfect for data parallelism.
  8. Anatomy of a GPU: multiprocessors. A GPU is composed of tens of multiprocessors (streaming multiprocessors), each with its own shared memory and composed of tens of cores: hundreds of cores in total.
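
Those counts can be queried directly from the device; a small sketch, assuming PyCUDA and the CUDA driver are available:

```python
import pycuda.driver as drv

drv.init()
dev = drv.Device(0)
attrs = dev.get_attributes()
print(dev.name())
print("multiprocessors:", attrs[drv.device_attribute.MULTIPROCESSOR_COUNT])
```
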
  9. Compute Unified Device Architecture is a hierarchy of computation, memory and synchronization.
  10. Compute hierarchy: software kernels run on hardware through matching abstractions: a thread maps to a core, a thread block to a multiprocessor, and the grid of blocks to the whole GPU.
  11. Compute hierarchy: each level is addressed by built-in variables: a thread by threadIdx, a thread block by blockIdx and blockDim, and the grid of blocks by gridDim.
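
A minimal sketch of these variables in action: each thread combines blockIdx, blockDim and threadIdx into a unique global index (the kernel name global_index is illustrative):

```python
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void global_index(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // position in the grid
    out[i] = i;
}
""")

out = np.zeros(8, dtype=np.int32)
# 2 blocks of 4 threads each cover the 8 elements
mod.get_function("global_index")(drv.Out(out), block=(4, 1, 1), grid=(2, 1))
print(out)  # [0 1 2 3 4 5 6 7]
```
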
  12. Python: fast development; a huge number of packages for data analysis, linear algebra, special functions etc; metaprogramming. Convenient, but not that fast at number crunching.
  13. PyCUDA: a wrapper package around the CUDA API. Convenient abstractions: GPUArray, random number generation, reductions & scans etc. Automatic cleanup, initialization and error checking, kernel caching. Completeness.
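
A short sketch of those conveniences: pycuda.autoinit handles initialization and cleanup, pycuda.curandom generates random numbers on the device, and gpuarray.sum performs a reduction (assumes PyCUDA is installed):

```python
import numpy as np
import pycuda.autoinit               # initialization and cleanup handled for us
import pycuda.gpuarray as gpuarray
from pycuda.curandom import rand

a = rand((1000000,), dtype=np.float32)   # uniform randoms generated on the GPU
mean = gpuarray.sum(a).get() / a.size    # reduction runs on the device
print(mean)                              # close to 0.5
```
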
  14. GPUArray: a NumPy-like interface for GPU arrays. Convenient creation and manipulation routines. Elementwise operations. Automatic cleanup.
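
A minimal GPUArray sketch: creation from a NumPy array and elementwise arithmetic executed on the GPU:

```python
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

a = gpuarray.to_gpu(np.arange(10, dtype=np.float32))  # host -> device
b = 2 * a + 1                                         # elementwise, on the GPU
print(b.get())                                        # device -> host: [1. 3. 5. ...]
```
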
  15. SourceModule: an abstraction to create, compile and run GPU code. The GPU code to compile is passed as a string. Control over nvcc compiler options. Convenient interface to get kernels.
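
A sketch of SourceModule, assuming PyCUDA and the CUDA toolkit are installed; the kernel name double_them and the --use_fast_math option are illustrative:

```python
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void double_them(float *a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] *= 2.0f;
}
""", options=["--use_fast_math"])   # nvcc options are under our control

a = np.ones(32, dtype=np.float32)
double_them = mod.get_function("double_them")   # fetch the kernel by name
double_them(drv.InOut(a), block=(32, 1, 1), grid=(1, 1))
print(a)   # all 2.0
```
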
  16. Metaprogramming: GPU code can be created at runtime. PyCUDA uses the mako template engine internally, but any template engine is fine for creating GPU source code; remember codepy too. Create more flexible and optimized code.
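
A sketch of template-driven code generation with mako; the scale kernel and its factor parameter are made up for illustration, baked into the CUDA source before compilation:

```python
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule
from mako.template import Template

source = Template("""
__global__ void scale(float *a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] *= ${factor}f;            // constant folded in at render time
}
""").render(factor=3.0)

scale = SourceModule(source).get_function("scale")
a = np.ones(16, dtype=np.float32)
scale(drv.InOut(a), block=(16, 1, 1), grid=(1, 1))
print(a)   # all 3.0
```
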
  17. Installation: numpy, mako and the CUDA driver & toolkit are required; Boost.Python is optional. Dev packages are needed if you build from source. Also: PyOpenCL, pyfft.
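
A quick way to confirm the required pieces are in place is to import them and ask the driver for a device count; a minimal sketch:

```python
import numpy, mako                  # required Python-side dependencies
import pycuda.driver as drv

drv.init()                          # fails loudly if the CUDA driver is missing
print("CUDA devices found:", drv.Device.count())
```
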
  18. GPU computing resources: documentation; Intro to Parallel Programming by David Luebke (Nvidia) and John Owens (UC Davis); Heterogeneous Parallel Programming by Wen-mei W. Hwu (UIUC); several excellent books.