
How to Run Neural Nets on GPUs - Strata

This talk is just what the title says: I will demonstrate how to run a neural net on a GPU, because neural nets are solving some interesting problems and GPUs are a good tool for the job.

Neural networks have regained popularity over the past decade because we can finally apply them to real-world problems (e.g., Siri, self-driving cars, facial recognition). This is due to significant improvements in computational power and in the amount of data available for building models. However, neural nets still face a barrier to entry as a practical tool in companies because they can be computationally expensive to train and deploy.

GPUs are popular processors in gaming and research because of their computational speed. Deep neural nets' parallel structure (millions of identical nodes that perform the same operation on different data) is ideal for GPUs. Depending on the neural net, a single server with GPUs can replace a CPU cluster, improving communication latency while reducing size and power consumption. Running an optimization method (training algorithm) like stochastic gradient descent on a GPU rather than a CPU can be up to 40 times faster.
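To make "the same operation on different data" concrete, here is a minimal CUDA sketch; the kernel name and the choice of a sigmoid activation are my own illustration, not something from the talk. Every thread applies the identical function to a different element, which is exactly the shape of work a GPU parallelizes:

```c
// One thread per neuron: the same activation applied to different data.
__global__ void sigmoid_kernel(const float *z, float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = 1.0f / (1.0f + expf(-z[i]));  // sigmoid of this element only
}

// Launch with enough threads to cover all n activations:
// sigmoid_kernel<<<(n + 255) / 256, 256>>>(dev_z, dev_a, n);
```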

This talk will briefly explain what neural nets are and why they're important, and give some context about GPUs. Then I will walk through the code and actually launch a neural net on a GPU. I will cover key pitfalls you may hit and techniques for diagnosing and troubleshooting them. You will walk away understanding how to approach using GPUs on your own, with resources to dive into for further understanding.

Melanie Warrick

December 03, 2015

Transcript

  1. @nyghtowl Artificial Neural Nets
     - Input, hidden, and output layers
     - Run until error stops improving on the loss function = converge
     [Diagram: network with inputs x_j, weights W_kj, outputs y_k, and a loss function.]
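The diagram notation on this slide was garbled in transcription; as a hedged reconstruction (the activation f and targets t_k are my notation, and the talk does not specify which loss it used), the forward pass and a squared-error loss would read:

```latex
% Output unit k: a weighted sum of M inputs passed through an activation f
y_k = f\!\Big(\sum_{j=1}^{M} W_{kj}\, x_j\Big)

% Squared-error loss; training runs until L stops improving (convergence)
L = \frac{1}{2} \sum_k \big(y_k - t_k\big)^2
```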
  2. @nyghtowl Real World...
     - Real-time Language Translation (NLP)
     - Auto Image Tagging (Computer Vision)
     - Movie Recs (Recommender Engines)
  3. @nyghtowl CPUs vs. GPUs
     - focus: decision maker vs. laborer
     - processing: sequential/serial vs. parallel
     - cores: 4-48 vs. 100s-1000s
     - RAM: 16.8M TB vs. 2-12GB
     - ALU: 4-12 32-bit instructions/clock vs. 32K 32-bit instructions/clock
     - FLOPs: faster clock speed (~1000s MHz) vs. ~100s MHz
  4. @nyghtowl Dist-Belief: YouTube Image Rec
     - Google (2012): 1K CPUs = 16K cores, ~$5M, 1 week
     - Stanford (2013): 3 GPUs = 18K cores, ~$33K, 1 week
  5. @nyghtowl GPU Hardware & Software
     Nvidia: lower power & quieter; AWS & Google; CUDA & OpenCL; GeForce (consumer), Quadro (prof) & Tesla (HPC); Titan X (3K cores & 12GB RAM)
     AMD: lower price; Macs; OpenCL; Radeon (consumer), FirePro (prof & HPC)
  6. @nyghtowl Distributed Approach
     HW: 1 chip - mult chips - mult boxes
     SW: split data - split NN model - mix of both
  7. @nyghtowl Focus on Matrix Math
     [Diagram: input, hidden, and output layers connected by weight matrices W1-W4; outputs y_k computed from inputs x_j and weights W_kj feed the loss function.]
  8. @nyghtowl Independent Math
     [Diagram: one example X multiplied through W1, W2, W3, W4 in turn; each element a-h of a layer's output (e.g., the 1st hidden layer) is computed independently of the others.]
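A hedged sketch of why this per-layer math is "independent": in a dense layer, each output element is a dot product that no other output depends on, so one GPU thread can own one output. The kernel below is my illustration (names like dense_forward are not from the talk):

```c
// One thread per output neuron: out[k] = sum_j W[k][j] * x[j].
// W is stored row-major as a flat array of size n_out * n_in.
__global__ void dense_forward(const float *W, const float *x,
                              float *out, int n_in, int n_out) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < n_out) {
        float sum = 0.0f;
        for (int j = 0; j < n_in; ++j)
            sum += W[k * n_in + j] * x[j];  // independent dot product
        out[k] = sum;
    }
}
```

Launching with, say, dense_forward<<<(n_out + 255) / 256, 256>>>(dev_W, dev_x, dev_out, n_in, n_out) covers every output neuron with one thread each.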
  9. @nyghtowl Ex: CUDA GPU Code
     - allocate memory: cudaMalloc((void**)&dev_a, N * sizeof(int));
     - data in: cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
     - run kernel: cublasDgemm(); OR add<<<N, 1>>>(dev_a, dev_b, dev_c);
     - data out: cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);
     - sync: cudaDeviceSynchronize();
     - free memory: cudaFree(dev_a);
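Stitched together, the fragments on slide 9 form the standard CUDA host workflow: allocate, copy in, launch, sync, copy out, free. Below is a minimal self-contained version following the slide's steps; the add kernel body and the N = 512 size are my assumptions, not the speaker's exact code:

```c
#include <stdio.h>
#include <cuda_runtime.h>

#define N 512

// Kernel: one block per element, each adds a single pair.
__global__ void add(const int *a, const int *b, int *c) {
    int i = blockIdx.x;
    if (i < N) c[i] = a[i] + b[i];
}

int main(void) {
    int a[N], b[N], c[N];
    int *dev_a, *dev_b, *dev_c;

    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    // allocate memory on the device
    cudaMalloc((void**)&dev_a, N * sizeof(int));
    cudaMalloc((void**)&dev_b, N * sizeof(int));
    cudaMalloc((void**)&dev_c, N * sizeof(int));

    // data in: host -> device
    cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

    // run kernel: N blocks of 1 thread each, as on the slide
    add<<<N, 1>>>(dev_a, dev_b, dev_c);

    // sync, then data out: device -> host
    cudaDeviceSynchronize();
    cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);

    printf("c[10] = %d\n", c[10]);  // expect 30

    // free device memory
    cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c);
    return 0;
}
```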
  10. @nyghtowl Neural Net Packages (by language)
      - Java / Scala: DL4J
      - Python: Theano, TensorFlow, Neon, Chainer, Lasagne, Keras, NuPIC, Kayak, PyBrain, Blocks, PyLearn2
      - C / C++: Caffe, MXNet, Graphlab, SINGA, Eblearn, Cuda-Convnet
      - Lua: Torch
      - Matlab: ConvNet, matrbm, DBN
  11. @nyghtowl Example: MNIST ~ "Hello World"
      • Classify handwritten digits 0-9
      • Each pixel is an input
      • Input values range 0-255 (white to black)
  12. @nyghtowl Example: Input
      0 0 1 0 0 0 0 1 1 0
      0 1 1 0 0 0 0 1 1 0
      0 1 1 0 0 0 0 1 1 0
      0 1 1 0 0 0 0 1 1 0
      0 1 1 1 1 1 1 1 1 0
      0 0 1 1 1 1 1 1 1 0
      0 0 0 0 0 0 0 1 1 0
      0 0 0 0 0 0 0 1 1 0
      0 0 0 0 0 0 0 1 1 0
      0 0 0 0 0 0 0 1 0 0
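Slide 11 notes that raw inputs range 0-255, so a common preprocessing step, shown here as my own sketch rather than anything from the deck, is to scale each pixel into [0, 1] before it reaches the input layer:

```c
#include <stddef.h>

// Scale raw 0-255 MNIST pixel intensities into [0, 1] floats.
void normalize_pixels(const unsigned char *raw, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = raw[i] / 255.0f;  // 0 -> 0.0 (white), 255 -> 1.0 (black)
}
```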
  13. @nyghtowl Example: MNIST Structure
      - Input: 784 nodes (28x28 pixels)
      - Hidden: 1000 nodes
      - Output: 10 nodes, one per digit 0-9 (here predicting a 4)
      [Diagram: same network notation as slide 1, with inputs x_j, weights W_kj, and outputs y_k.]
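Tying this structure back to the cublasDgemm() call on slide 9: the whole 784 -> 1000 layer for a mini-batch is one matrix multiply. A hedged sketch using cuBLAS single-precision GEMM (the layer sizes come from this slide; the function and variable names are mine):

```c
#include <cublas_v2.h>

// Forward pass of the 784 -> 1000 layer for a mini-batch, as one GEMM:
// H (1000 x batch) = W (1000 x 784) * X (784 x batch).
// All matrices are column-major device pointers, as cuBLAS expects.
void dense_784_to_1000(cublasHandle_t handle, const float *dev_W,
                       const float *dev_X, float *dev_H, int batch) {
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                1000, batch, 784,   // m, n, k
                &alpha,
                dev_W, 1000,        // A and its leading dimension
                dev_X, 784,         // B and its leading dimension
                &beta,
                dev_H, 1000);       // C and its leading dimension
}
```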
  14. @nyghtowl Tools for Troubleshooting
      - Debugging: cuda-memcheck, cuda-gdb
      - Profiling: Nvidia Visual Profiler, CUDA Profiling Tools Interface (CUPTI)
      - Nvidia Nsight
  15. @nyghtowl Troubleshooting Pointers
      - "no CUDA-capable device is detected" => check the GPU is running: sudo kextload /System/Library/Extensions/CUDA.kext
      - kill => reduce the mini-batch size and check flushing frequency
      - runs slow => check the syncing interval across the chip
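Many of these symptoms surface earlier and more clearly if every CUDA call is checked. A common error-checking macro, my sketch rather than anything from the deck, turns silent failures like "no CUDA-capable device is detected" into an immediate message with a file and line:

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call; print the error string and where it
// happened, then abort, instead of failing silently later on.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Usage: CUDA_CHECK(cudaMalloc((void**)&dev_a, N * sizeof(int)));
```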
  16. @nyghtowl References: Neural Nets
      • 10 Misconceptions about Neural Networks: http://www.turingfinance.com/misconceptions-about-neural-networks/#blackbox
      • Nature of Code: Neural Networks: http://natureofcode.com/book/chapter-10-neural-networks/
      • Theano Tutorial: http://deeplearning.net/software/theano/tutorial/index.html#tutorial
      • Machine Learning (Coursera - Ng): https://class.coursera.org/ml-005/lecture
      • Hacker's Guide to Neural Nets (Stanford - Karpathy): https://karpathy.github.io/neuralnets/
      • Neural Networks for Machine Learning (Coursera - Hinton): https://class.coursera.org/neuralnets-2012-001/lecture
      • Neural Nets and Deep Learning: http://neuralnetworksanddeeplearning.com/
      • Deep Learning Stanford CS: http://deeplearning.stanford.edu/
      • Deep Learning Tutorial (NYU - LeCun): http://www.cs.nyu.edu/~yann/talks/lecun-ranzato-icml2013.pdf
      • Deep Learning Tutorial (U Montreal - Bengio): http://deeplearning.net/tutorial/deeplearning.pdf
      • Tutorial on Deep Learning for Vision: https://sites.google.com/site/deeplearningcvpr2014/
      • Deep Learning: http://deeplearning.net/
  17. @nyghtowl References: GPUs
      • An Introduction to Using GPUs for Computation: http://www.stat.berkeley.edu/scf/paciorek-gpuWorkshop.html
      • Comparison of GPU and CPU implementations of mean-firing rate neural networks on parallel hardware: http://www.researchgate.net/publication/233392650_Comparison_of_GPU-_and_CPU-implementations_of_mean-firing_rate_neural_networks_on_parallel_hardware
      • My first CUDA program!: https://llpanorama.wordpress.com/2008/05/21/my-first-cuda-program/
      • PyCuda Tutorial: http://documen.tician.de/pycuda/tutorial.html#transferring-data
      • Accelerated Machine Learning with the cuDNN Deep Neural Network Library: http://devblogs.nvidia.com/parallelforall/accelerate-machine-learning-cudnn-deep-neural-network-library/
      • Neural Networks with Parallel and GPU Computing: https://www.mathworks.com/help/nnet/ug/neural-networks-with-parallel-and-gpu-computing.html
      • One weird trick for parallelizing convolutional neural networks: http://arxiv.org/pdf/1404.5997v2.pdf
      • Which GPU for deep learning: https://timdettmers.wordpress.com/2014/08/14/which-gpu-for-deep-learning/
      • GPU-accelerated libraries: https://developer.nvidia.com/gpu-accelerated-libraries
      • Why a GPU mines faster than a CPU: https://en.bitcoin.it/wiki/Why_a_GPU_mines_faster_than_a_CPU
  18. @nyghtowl References: Setup
      • Deeplearning4J: http://nd4j.org/getstarted.html
      • Caffe: http://caffe.berkeleyvision.org/install_osx.html
      • Theano: http://deeplearning.net/software/theano/install.html
      • PyCuda: http://wiki.tiker.net/PyCuda/Installation/Mac#Pre-install_Tips
      • Installing CUDA, OpenCL, & PyOpenCL on AWS EC2: http://vasir.net/blog/opencl/installing-cuda-opencl-pyopencl-on-aws-ec2
  19. @nyghtowl References: Images
      • http://www.texample.net/tikz/examples/neural-network/
      • http://jaoying-google.blogspot.com/2012_12_01_archive.html
      • https://www.kaggle.com/forums/f/15/kaggle-forum/t/10878/feature-representation-in-deep-learning
      • http://users.clas.ufl.edu/glue/longman/1/einstein.html
      • http://www.nvidia.com/object/what-is-gpu-computing.html
      • https://www.classes.cs.uchicago.edu/archive/2013/spring/12300-1/pa/pa1/
      • http://www.hitechreview.com/it-products/pc/nvidia-strikes-back-presents-tesla-k20x-graphics-card/40392/
      • http://disney.wikia.com/wiki/Magic_Brooms
      • http://www.playinterference.com/view/7713/
      • http://www.drmichellemazur.com/wp-content/uploads/2013/08/fowl_storm.jpg
      • https://eda360insider.wordpress.com/2011/09/14/what-would-you-do-with-a-23000-simultaneous-thread-school-of-piranha-asks-nvidia/
      • http://www.nvidiadefect.com/what-exactly-is-the-nvidia-defect-t3.html
      • http://www.maximumpc.com/everything-you-need-to-know-about-nvidias-gf100-fermi-gpu/
      • http://www.theregister.co.uk/2013/11/16/nvidia_reveals_cuda_6_joins_cpugpu_shared_memory_party/
      • http://adailypinch.com/bill-cat-spirit-animal
      • https://stackoverflow.com/questions/20146098/can-cpu-process-write-to-memoryuva-in-gpu-ram-allocated-by-other-cpu-process
  20. @nyghtowl Special Thanks
      • Tim Elser • Tarin Ziyaee • Phillip Culliton • Megan Speir • Lindsay Cade • Isabel Markl • Jeremy Dunck • Erin O'Connell • Cyprien Noel • Christian Fernandez • Charles Ruhland • Bryan Catanzaro • Adam Gibson
  21. @nyghtowl Last Points
      - NNs ~ personalization
      - Training NNs is hard
      - GPUs make training faster
      - Same thing, multiple times, at the same time
      Go play with GPUs!
  22. @nyghtowl How to Run Neural Nets on GPUs
      Melanie Warrick
      github.com/nyghtowl/Neural_Nets_GPUs (code)
      skymind.io (company)
      gitter.im/deeplearning4j/deeplearning4j (chat)