Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Compute Kernel with Metal

codelynx
February 11, 2017

Compute Kernel with Metal

This is a presentation materials at Yokohama iOS developer meeting in Feb. 2017. It gives you a very basic idea of Metal computing Kernel, and show the example codes.

codelynx

February 11, 2017
Tweet

More Decks by codelynx

Other Decks in Programming

Transcript

  1. Executive Summery • Computing method using GPU • Super Parallel

    Computing • May not suitable for complex algorithm
  2. Device, CommandQueue and
 CommandBuffer • MTLDevice • MTLCreateSystemDefaultDevice() • MTLCommandQueue

    • device.makeCommandQueue() • MTLCommandBuffer • commandQueue.makeCommandBuffer() MTLCommandQueue MTLCommandBuffer MTLComputeComman dEncoder
  3. Library, Function and MTLComputePipelineState • MTLLibrary • try! device.makeLibrary(source: shaderSource,

    options: nil) • device.newDefaultLibrary() • MTLFunction • library.makeFunction(name: "bezier_kernel") • MTLComputePipelineState • .library.makeComputePipelineState(function: function) MTLLibrary MTLFunction MTLComputePipelineState
  4. Computing Kernel • C++ 14 subset shading language • Restrictions

    • lambda expressions, dynamic_cast operator, type identification, recursive function calls, new and delete operators, noexcept operator, goto statement, register, thread_local storage qualifiers, virtual function qualifier, derived classes and exception handling
  5. Scalar Types • bool, char, int8_t, unsigned char, uchar •

    short, unsigned short, ushort • int, unsigned int, uint – 32bit • half – 16bit half precision, float – 32bit single precision • size_t, ptrdiff_t, void • no double
  6. Vector and Matrix Types • booln • charn, shortn, ucharn,

    ushortn, uintn • halfn, floatn • halfnxm, floatnxm * n is a number
  7. Glance a code TUSVDU.Z7FSUFY*O\  GMPBUQPTJUJPO  GMPBUDPMPS ^
 


    TUSVDU.Z7FSUFY0VU\  GMPBUQPTJUJPO  GMPBUDPMPS ^
 LFSOFMWPJENZ@DPNQVUF@LFSOFM   DPOTUBOU.Z7FSUFY*O WFSUJDFT<<CVGGFS  >>   EFWJDF.Z7FSUFY0VU PVU7FSUFYFT<<CVGGFS  >>   VJOUJE<<UISFBE@QPTJUJPO@JO@HSJE>> \  ǘ ^ * Just for getting an idea, not working code ←Defining a structure ↓ Defining a kernel code ↓ Specifying Buffer Index
  8. Qualifiers LFSOFMWPJENZ@DPNQVUF@LFSOFM   DPOTUBOU.Z7FSUFY*O WFSUJDFT<<CVGGFS  >>  

    EFWJDF.Z7FSUFY0VU PVU7FSUFYFT<<CVGGFS  >>   VJOUJE<<UISFBE@QPTJUJPO@JO@HSJE>> \  ǘ ^
  9. Address Space • device Address Space • buffer memory objects

    allocated from the device memory pool that are both readable and writeable • threadgroup Address Space • Variables allocated in the threadgroup address space in a kernel function are allocated for each threadgroup executing the kernel, are shared by all threads in a threadgroup and exist only for the lifetime of the threadgroup that is executing the kernel • constant Address Space • The constant address space name refers to buffer memory objects allocated from the device memory pool but are read-only • thread Address Space • The thread address space refers to the per-thread memory address space
  10. Compute Command Encoder • MTLComputeCommandEncoder  MFUFODPEFSDPNNBOE#VGGFSNBLF$PNQVUF$PNNBOE&ODPEFS   FODPEFSTFU$PNQVUF1JQFMJOF4UBUF

    DPNQVUF1JQFMJOF4UBUF   FODPEFSTFU#VGGFS FMFNFOUT#VGGFS PGGTFU BU   FODPEFSTFU#VGGFS WFSUFY#VGGFS PGGTFU BU 
  11. Thread Group • Kernel requires a task broken into small

    pieces  MFUUISFBEHSPVQT1FS(SJE.5-4J[F.BLF FMFNFOUTDPVOU     MFUUISFBET1FS5ISFBEHSPVQ.5-4J[F.BLF      FODPEFSEJTQBUDI5ISFBEHSPVQT UISFBEHSPVQT1FS(SJE 
 UISFBET1FS5ISFBEHSPVQUISFBET1FS5ISFBEHSPVQ * I am still not fully understood
  12. Commit • Finally ready to commit  FODPEFSFOE&ODPEJOH  

    DPNNBOE#VGGFSDPNNJU  • Wait or Add Completion Handler… DPNNBOE#VGGFSXBJU6OUJM$PNQMFUFE 
 DPNNBOE#VGGFSBEE$PNQMFUFE)BOEMFS\ CVGGFS JO EPTPNFXPSLIFSF ^ • Check the buffer • there must be something good in there!
  13. Goal • Give Shader an array of Path Elements or

    equivalent • Produce many consequent positions using Kernel • Using Bezier Calculation Method of my Qiita atricle
  14. Strategies • Path elements buffer • Vertex buffer • CPU

    estimates the length of 
 path elements • A Kernel produces vertices for
 a path element • There may be a better way… #0 p0 … p3 0 #1 p0 … p3 m1 #n p0 … p3 m2 #0 pt … pt #m1 pt … #m2 … pt Element Buffer Vertex Buffer
  15. bezier_kernel shader DBTF1BUI&MFNFOU5ZQF2VBE$VSWF5P GPS JOUJOEFYJOEFYOVNCFS0G7FSUFYFTJOEFY \ GMPBUUGMPBU JOEFY GMPBU OVNCFS0G7FSUFYFT

     GMPBURQ  QQ  U GMPBURQ  QQ  U GMPBUSR  RR  U GMPBUXX  XX  U 7FSUFYW7FSUFY IBMG SY SZ IBMG X  PVU7FSUFYFT<FMFNFOUWFSUFY*OEFY JOEFY>W ^ CSFBL DBTF1BUI&MFNFOU5ZQF$VSWF5P ǘTOJQǘ CSFBL ^ ^ LFSOFMWPJECF[JFS@LFSOFM  DPOTUBOU1BUI&MFNFOU FMFNFOUT<<CVGGFS  >>  EFWJDF7FSUFY PVU7FSUFYFT<<CVGGFS  >>  VJOUJE<<UISFBE@QPTJUJPO@JO@HSJE>> \ 1BUI&MFNFOUFMFNFOUFMFNFOUT<JE> JOUOVNCFS0G7FSUFYFTFMFNFOUOVNCFS0G7FSUFYFT GMPBUQFMFNFOUQ GMPBUQFMFNFOUQ GMPBUQFMFNFOUQ GMPBUQFMFNFOUQ  TXJUDI FMFNFOUUZQF \ DBTF1BUI&MFNFOU5ZQF-JOF5P ǘTOJQǘ CSFBL DBTF1BUI&MFNFOU5ZQF2VBE$VSWF5P GPS JOUJOEFYJOEFYOVNCFS0G7FSUFYFTJOEFY \ GMPBUUGMPBU JOEFY GMPBU OVNCFS0G7FSUFYFT  GMPBURQ  QQ  U GMPBURQ  QQ  U GMPBUSR  RR  U GMPBUXX  XX  U 7FSUFYW7FSUFY IBMG SY SZ IBMG X  PVU7FSUFYFT<FMFNFOUWFSUFY*OEFY JOEFY>W ^ CSFBL * not a whole code
  16. Playground! • Yellow – Core Graphics • Red – Compute

    Kernel • Look Good https://github.com/codelynx/BezierKernelPlayground
  17. Double or Triple Buffering • Avoid access collision between CPU

    and GPU Buffer#1 Buffer#2 Buffer#1 Buffer#2 Buffer#1 Buffer#1 Buffer#1 →time Buffer#1 ⚡CPU ⚡GPU ⚡CPU ⚡GPU Buffer#1 Buffer#2 Buffer#1 Buffer#2 ⚡CPU ⚡CPU ⚡CPU ⚡CPU ⚡GPU ⚡GPU ⚡GPU ⚡GPU →time
  18. Buffer Management • Memory resource is finite, recycle them where

    possible • System crashes at device.makeBuffer() rather returning nil • It is hard to find out the reason (as of iOS10) • Save memory resources and be a good citizen
  19. Wrap Up • Computing Shaders are much easier than Rendering

    Shader • Memory Management could be pain if you wants one more step toward high performance shader • Be aware memory alignment • Shader is hard to debug – no break point nor printf()