Introduction to GPU with CUDA

Onur K.
April 19, 2012

An introductory slideshow on GPUs and NVIDIA's related framework, CUDA.


  1. Onur KESKİN, Dokuz Eylül University, Mechatronics Engineering
     Moore’s Law
     “The number of transistors on an integrated circuit doubles every two years.” Gordon E. Moore
  2. Outline
     • Graphics Processing Unit (GPU)
       o Definition
       o Future Trends
       o Advantages
       o Differences
       o Parallel Computing
       o GPU types
     • Compute Unified Device Architecture (CUDA)
       o Definition
       o Advantages
       o Architecture
       o Kernel and Thread
       o Memory Management
       o Code sample
  3. Definition
     • Dedicated hardware, usually used for graphics but also for general computing
       o Not all vendors allow the GPU to be used for computing
     • Provides high computing power
       o 128-core GPU = 500 GFLOPS (500 billion floating-point operations per second)
       o 4-core CPU = 90 GFLOPS
     • Designed for mathematical or parallel problems
     (Figure: CPU and GPU)
  4. Future Trends
     • Computing is evolving from central processing on the CPU to co-processing on the CPU and GPU
     • Processor frequencies cannot keep increasing
       o Hardware limit: 10 GHz chips
     • Power consumption cannot keep increasing
       o Chip temperatures might melt the chip itself
     • The number of transistors can keep increasing
       o From Moore’s Law
  5. Future Trends
     • Computers are no longer getting faster
       o Serial development will eventually stop
     • Computers will expand
       o Parallel development is the future
     • There will always be more data than cores, so
     • Algorithms also need to be parallel
  6. Advantages of GPUs
     • High computing output
       o For example, 933 GFLOP/s
     • High memory bandwidth
       o For example, 140 GB/s
     • High hardware availability
       o 200+ million CUDA-supported GPUs
  7. Differences from CPUs
     • The GPU’s workload is parallel
       o The CPU must handle both parallel and serial work
     • The GPU is built for maximum throughput across all threads
       o The CPU is built for low latency in a single thread
     • The GPU has many mathematical units
       o The CPU has a mix of all kinds of units
     • The GPU can access memory fast
       o The CPU can cache fast and reuse data
     • The GPU runs a program on each fragment / vertex
       o The CPU runs many different processes / threads
  8. Parallel Computing
     • Calculations are carried out simultaneously
     • Large problems are divided into smaller ones, which are then solved concurrently (in parallel)
     • Algorithms and data structures need to be redesigned
     • A serial algorithm can be converted into a parallel one by finding the operations that can run simultaneously
  9. Parallel Computing
     • Task Parallel
       o Independent processes with little communication
       o Most operating systems use it through Symmetric Multiprocessing (SMP)
       o Two or more identical processors with shared memory, controlled by a single OS instance
       o CPUs are generally used
     • Data Parallel
       o One computation is executed over a large amount of data
       o Independent data elements in each computational step
       o Can use many ALUs
       o Requires algorithmic redesign
       o GPUs are generally used
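
The data-parallel pattern on this slide can be sketched in CUDA as follows. This is a minimal illustration rather than code from the deck: the kernel name `addVectors` and the host helper `addOnGpu` are hypothetical, and the sketch assumes both inputs have the same length and that a CUDA-capable device is present. Each thread handles one independent element, so the threads never need to communicate.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Data-parallel kernel: one thread per element, no thread depends on another.
__global__ void addVectors(const float *x, const float *y, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain spare threads
        out[i] = x[i] + y[i];
    }
}

// Hypothetical host helper (assumes x.size() == y.size()).
// Error checking is omitted to keep the sketch short.
std::vector<float> addOnGpu(const std::vector<float> &x,
                            const std::vector<float> &y)
{
    int n = static_cast<int>(x.size());
    std::vector<float> out(n);
    if (n == 0) return out;

    size_t bytes = n * sizeof(float);
    float *dX = nullptr, *dY = nullptr, *dOut = nullptr;
    cudaMalloc(&dX, bytes);
    cudaMalloc(&dY, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dX, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dY, y.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // threads per block
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    addVectors<<<blocks, threads>>>(dX, dY, dOut, n);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dX);
    cudaFree(dY);
    cudaFree(dOut);
    return out;
}
```

Calling `addOnGpu({1.0f, 2.0f}, {3.0f, 4.0f})` would add the two vectors element by element on the device. A task-parallel design would instead give each worker a different job, which suits CPUs better.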
  10. Parallel Computing
      (Diagram labels)
      • Decomposition: Task, Data
      • Dependency Analysis: Grouping, Ordering, Data Sharing
  11. GPU Types
      • Dedicated
        o The most powerful type
        o PCI, PCIe or AGP interface
        o Uses its own dedicated RAM
        o Multiple-GPU support
      • Integrated
        o Uses the system’s RAM
        o Integrated into the motherboard
        o Cheap but low capacity
  12. GPU Types
      • Hybrid
        o Shares memory with the system and also has a dedicated memory cache
        o ATI HyperMemory & NVIDIA TurboCache
      • General Purpose GPU (GPGPU)
        o Lets the GPU perform computation itself rather than relying on the CPU
        o Problems need to be defined in terms of graphics (texture maps)
        o Algorithms must be expressed in terms of image synthesis (rendering steps)
        o Hard to learn, especially for non-graphics experts
        o Highly constrained memory layout and access model
        o Algorithms may consume a lot of bandwidth
  13. Definition
      • CUDA is NVIDIA’s parallel computing architecture
      • Simplifies the use of GPU computing performance
      • CUDA-enabled GPUs are already on the market
      • Usage areas
        o Image and video processing
        o Computational biology and chemistry
        o Simulation of fluid dynamics
        o CT image reconstruction
        o Seismic analysis
        o Ray tracing
  14. Advantages
      • CUDA enables GPU computing without any graphics knowledge
      • Programmer-friendly design
        o Lets users focus on parallel algorithms
        o Random, unrestricted access to memory
        o Can be used in a hybrid architecture that combines CPU and GPU
      • Tools and Ecosystem
        o GPU-Accelerated Applications: HPC, supercomputing, etc.
        o Numerical Analysis Tools: MATLAB, Mathematica, LabVIEW, etc.
        o GPU-Accelerated Libraries: cuFFT (Fast Fourier Transform), cuBLAS (Basic Linear Algebra Subprograms)
        o Programming Languages and APIs: C/C++, Fortran, OpenACC
        o Performance Analysis Tools: Parallel Nsight, NVIDIA Visual Profiler
        o Debugging Solutions: Parallel Nsight
        o Cluster Management
  15. Architecture
      (Diagram labels: CUDA Optimized Libraries; Integrated CPU + GPU C Source Code; NVIDIA C Compiler; NVIDIA Assembly for Computing (PTX); CPU Host Code; CUDA Driver; Profiler; Standard C Code; CPU; GPU)
  16. Architecture
      (Diagram labels: C/C++ CUDA Application; NVCC; CPU Code; PTX Code (Virtual Layer); PTX to Target Compiler; G80 ... GPU (Physical Layer))
  17. Kernel and Thread
      • Kernel
        o Executes the parallel portions of an application on the device
        o A C function that runs on the GPU (the device), with some restrictions
        o One kernel runs at a time
        o Many threads execute each kernel
        o All threads run the same code
        o Each thread has an ID, used for addressing and control decisions
      • Thread
        o CUDA threads are very lightweight
        o CUDA uses 1000s of threads
          • Multicore CPUs use only a few
        o Threads cooperate
          • To avoid redundant computation
          • Sharing memory accesses reduces bandwidth use
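
Thread cooperation can be sketched with a per-block sum. This is an illustrative example, not code from the deck: `blockSum`, `sumOnGpu` and `BLOCK_SIZE` are hypothetical names, and the loop assumes the block size is a power of two. The threads of one block stage their inputs in `__shared__` memory once, then combine them there instead of re-reading global memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK_SIZE 256  // assumed power of two; required by the halving loop

// Each block adds up its slice of `in` and writes one partial sum.
__global__ void blockSum(const int *in, int *partial, int n)
{
    __shared__ int tile[BLOCK_SIZE];  // visible to all threads of this block

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0;  // one global read per thread
    __syncthreads();                  // wait until the whole tile is loaded

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            tile[tid] += tile[tid + stride];
        }
        __syncthreads();
    }

    if (tid == 0) {
        partial[blockIdx.x] = tile[0];
    }
}

// Hypothetical host helper; error checking omitted for brevity.
int sumOnGpu(const std::vector<int> &values)
{
    int n = static_cast<int>(values.size());
    if (n == 0) return 0;

    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    int *dIn = nullptr, *dPartial = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dPartial, blocks * sizeof(int));
    cudaMemcpy(dIn, values.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, BLOCK_SIZE>>>(dIn, dPartial, n);

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), dPartial, blocks * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);

    int total = 0;
    for (int p : partial) total += p;  // finish the few partial sums on the CPU
    return total;
}
```

The `__syncthreads()` calls are what make the cooperation safe: no thread reads a neighbour's slot before that neighbour has written it.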
  18. Kernel and Thread
      • Kernel function restrictions
        o Cannot access CPU (host) memory
        o Must return void
        o No variable number of arguments (varargs)
        o No recursion
        o No static variables
      But function arguments are automatically copied from the CPU to the GPU
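
A small sketch of how these restrictions shape a kernel, under the assumption of a single-thread launch. The names `square` and `squareOnGpu` are hypothetical and do not come from the deck. Because the kernel must return void, the result travels through a pointer to device memory, while the by-value argument is copied to the GPU as part of the launch.

```cuda
#include <cuda_runtime.h>

// A kernel returns void, so the result is written through a pointer.
// `value` is passed by value: the launch copies it to the GPU automatically.
// `result` must point to device memory, not host memory.
__global__ void square(int value, int *result)
{
    *result = value * value;
}

// Hypothetical host helper; error checking omitted for brevity.
int squareOnGpu(int value)
{
    int *dResult = nullptr;
    cudaMalloc(&dResult, sizeof(int));  // device-side slot for the result

    square<<<1, 1>>>(value, dResult);   // one block containing one thread

    int hResult = 0;
    cudaMemcpy(&hResult, dResult, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dResult);
    return hResult;
}
```

Passing the address of a host variable as `result` would not work here, since the kernel cannot dereference host memory.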
  19. Memory Management
      • The CPU and GPU have separate memory
      • CPU code manages GPU memory
        o Allocate or free
        o Copy data to and from the GPU
        o Applies to global device memory (DRAM)
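
The allocate, copy and free steps map onto the CUDA runtime calls `cudaMalloc`, `cudaMemcpy` and `cudaFree`. The sketch below is an illustration and not part of the deck: `check` and `roundTripThroughGpu` are hypothetical helpers, and a CUDA-capable device is assumed. It sends a buffer to global device memory and reads it back unchanged.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: turn a CUDA error code into a C++ exception.
static void check(cudaError_t status)
{
    if (status != cudaSuccess) {
        throw std::runtime_error(cudaGetErrorString(status));
    }
}

// Copies `host` to global device memory and back again.
// Note: in this sketch the device buffer leaks if a copy throws.
std::vector<int> roundTripThroughGpu(const std::vector<int> &host)
{
    std::vector<int> back(host.size());
    if (host.empty()) return back;

    size_t bytes = host.size() * sizeof(int);
    int *device = nullptr;

    check(cudaMalloc(&device, bytes));                 // allocate on the GPU
    check(cudaMemcpy(device, host.data(), bytes,
                     cudaMemcpyHostToDevice));         // copy CPU -> GPU
    check(cudaMemcpy(back.data(), device, bytes,
                     cudaMemcpyDeviceToHost));         // copy GPU -> CPU
    check(cudaFree(device));                           // release GPU memory
    return back;
}
```

The pointer returned by `cudaMalloc` is only meaningful on the device: host code passes it to kernels and to `cudaMemcpy`, but should not dereference it.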
  20. Code Sample

      Standard Program

          // Serial version: one loop visits every element in turn.
          void incArr(int *a, int N)
          {
              for (int i = 0; i < N; i++) {
                  a[i] = a[i] + 1;
              }
          }

          int main()
          {
              ...
              incArr(a, N);
          }

      CUDA Program

          // Parallel version: each thread increments one element.
          __global__ void incArr(int *a, int N)
          {
              int i = blockIdx.x * blockDim.x + threadIdx.x;
              if (i < N) {  // guard: the last block may extend past N
                  a[i] = a[i] + 1;
              }
          }

          int main()
          {
              ...
              dim3 dimBlock(blocksize);
              dim3 dimGrid(ceil(N / (float)blocksize));  // round up to cover all N elements
              incArr<<<dimGrid, dimBlock>>>(a, N);
          }
  21. Code Sample
      • All __global__ and __device__ functions have access to these built-in variables:
        o dim3 gridDim
          • Dimensions of the grid in blocks
        o dim3 blockDim
          • Dimensions of the block in threads
        o uint3 blockIdx
          • Block index within the grid
        o uint3 threadIdx
          • Thread index within the block
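
One way to see all four built-in variables at work is a grid-stride loop, where a fixed-size grid walks over an array of any length. This is a sketch under assumptions, not code from the deck: the kernel `incArrStrided`, the helper `incrementOnGpu` and the launch shape of 4 blocks of 64 threads are illustrative choices.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread starts at its own global index and then jumps ahead by the
// total number of threads in the grid until it runs past the end.
__global__ void incArrStrided(int *a, int n)
{
    int first  = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    int stride = gridDim.x * blockDim.x;                 // threads in the whole grid
    for (int i = first; i < n; i += stride) {
        a[i] = a[i] + 1;
    }
}

// Hypothetical host helper; error checking omitted for brevity.
std::vector<int> incrementOnGpu(std::vector<int> a)
{
    int n = static_cast<int>(a.size());
    if (n == 0) return a;

    int *dA = nullptr;
    cudaMalloc(&dA, n * sizeof(int));
    cudaMemcpy(dA, a.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // 4 blocks of 64 threads: 256 threads in total, regardless of n.
    incArrStrided<<<4, 64>>>(dA, n);

    cudaMemcpy(a.data(), dA, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    return a;
}
```

With 1000 elements and 256 threads, most threads process four elements each. Only the `.x` components are used because the launch is one-dimensional; `.y` and `.z` matter for 2D and 3D grids.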