Pro Yearly is on sale from $80 to $50! »

Compute Kernel with Metal

79874cedccd1cf5baa8cb264b5091ee6?s=47 codelynx
February 11, 2017

Compute Kernel with Metal

This is a presentation materials at Yokohama iOS developer meeting in Feb. 2017. It gives you a very basic idea of Metal computing Kernel, and show the example codes.

79874cedccd1cf5baa8cb264b5091ee6?s=128

codelynx

February 11, 2017
Tweet

Transcript

  1. Compute Kernel with Metal Kaz Yoshikawa kyoshikawa@electricwoods.com Feb 2017 –

    Yokohama iOS Developers Meet-up
  2. Understanding 
 Computing Kernel

  3. Executive Summery • Computing method using GPU • Super Parallel

    Computing • May not suitable for complex algorithm
  4. Players MTLDevice MTLCommandQueue MTLCommandBuffer MTLLibrary MTLFunction MTLComputePipelineState MTLComputeComman dEncoder MTLBuffer

    MTLTexture
  5. None
  6. Device, CommandQueue and
 CommandBuffer • MTLDevice • MTLCreateSystemDefaultDevice() • MTLCommandQueue

    • device.makeCommandQueue() • MTLCommandBuffer • commandQueue.makeCommandBuffer() MTLCommandQueue MTLCommandBuffer MTLComputeComman dEncoder
  7. Library, Function and MTLComputePipelineState • MTLLibrary • try! device.makeLibrary(source: shaderSource,

    options: nil) • device.newDefaultLibrary() • MTLFunction • library.makeFunction(name: "bezier_kernel") • MTLComputePipelineState • .library.makeComputePipelineState(function: function) MTLLibrary MTLFunction MTLComputePipelineState
  8. Compute Pipeline State • Blue Print for Computing Parameters •

    Buffers • Textures • etc…
  9. Computing Kernel

  10. Computing Kernel • C++ 14 subset shading language • Restrictions

    • lambda expressions, dynamic_cast operator, type identification, recursive function calls, new and delete operators, noexcept operator, goto statement, register, thread_local storage qualifiers, virtual function qualifier, derived classes and exception handling
  11. Scalar Types • bool, char, int8_t, unsigned char, uchar •

    short, unsigned short, ushort • int, unsigned int, uint – 32bit • half – 16bit half precision, float – 32bit single precision • size_t, ptrdiff_t, void • no double
  12. Vector and Matrix Types • booln • charn, shortn, ucharn,

    ushortn, uintn • halfn, floatn • halfnxm, floatnxm * n is a number
  13. Glance a code TUSVDU.Z7FSUFY*O\  GMPBUQPTJUJPO  GMPBUDPMPS ^
 


    TUSVDU.Z7FSUFY0VU\  GMPBUQPTJUJPO  GMPBUDPMPS ^
 LFSOFMWPJENZ@DPNQVUF@LFSOFM   DPOTUBOU.Z7FSUFY*O WFSUJDFT<<CVGGFS  >>   EFWJDF.Z7FSUFY0VU PVU7FSUFYFT<<CVGGFS  >>   VJOUJE<<UISFBE@QPTJUJPO@JO@HSJE>> \  ǘ ^ * Just for getting an idea, not working code ←Defining a structure ↓ Defining a kernel code ↓ Specifying Buffer Index
  14. Qualifiers LFSOFMWPJENZ@DPNQVUF@LFSOFM   DPOTUBOU.Z7FSUFY*O WFSUJDFT<<CVGGFS  >>  

    EFWJDF.Z7FSUFY0VU PVU7FSUFYFT<<CVGGFS  >>   VJOUJE<<UISFBE@QPTJUJPO@JO@HSJE>> \  ǘ ^
  15. Address Space • device Address Space • buffer memory objects

    allocated from the device memory pool that are both readable and writeable • threadgroup Address Space • Variables allocated in the threadgroup address space in a kernel function are allocated for each threadgroup executing the kernel, are shared by all threads in a threadgroup and exist only for the lifetime of the threadgroup that is executing the kernel • constant Address Space • The constant address space name refers to buffer memory objects allocated from the device memory pool but are read-only • thread Address Space • The thread address space refers to the per-thread memory address space
  16. Compute Command Encoder • MTLComputeCommandEncoder  MFUFODPEFSDPNNBOE#VGGFSNBLF$PNQVUF$PNNBOE&ODPEFS   FODPEFSTFU$PNQVUF1JQFMJOF4UBUF

    DPNQVUF1JQFMJOF4UBUF   FODPEFSTFU#VGGFS FMFNFOUT#VGGFS PGGTFU BU   FODPEFSTFU#VGGFS WFSUFY#VGGFS PGGTFU BU 
  17. Thread Group • Kernel requires a task broken into small

    pieces  MFUUISFBEHSPVQT1FS(SJE.5-4J[F.BLF FMFNFOUTDPVOU     MFUUISFBET1FS5ISFBEHSPVQ.5-4J[F.BLF      FODPEFSEJTQBUDI5ISFBEHSPVQT UISFBEHSPVQT1FS(SJE 
 UISFBET1FS5ISFBEHSPVQUISFBET1FS5ISFBEHSPVQ * I am still not fully understood
  18. Commit • Finally ready to commit  FODPEFSFOE&ODPEJOH  

    DPNNBOE#VGGFSDPNNJU  • Wait or Add Completion Handler… DPNNBOE#VGGFSXBJU6OUJM$PNQMFUFE 
 DPNNBOE#VGGFSBEE$PNQMFUFE)BOEMFS\ CVGGFS JO EPTPNFXPSLIFSF ^ • Check the buffer • there must be something good in there!
  19. Computing Bezier Positions Living example

  20. Goal • Give Shader an array of Path Elements or

    equivalent • Produce many consequent positions using Kernel • Using Bezier Calculation Method of my Qiita atricle
  21. http://qiita.com/codelynx/items/f7e6a844aac3746a6b79

  22. Strategies • Path elements buffer • Vertex buffer • CPU

    estimates the length of 
 path elements • A Kernel produces vertices for
 a path element • There may be a better way… #0 p0 … p3 0 #1 p0 … p3 m1 #n p0 … p3 m2 #0 pt … pt #m1 pt … #m2 … pt Element Buffer Vertex Buffer
  23. bezier_kernel shader DBTF1BUI&MFNFOU5ZQF2VBE$VSWF5P GPS JOUJOEFYJOEFYOVNCFS0G7FSUFYFTJOEFY \ GMPBUUGMPBU JOEFY GMPBU OVNCFS0G7FSUFYFT

     GMPBURQ  QQ  U GMPBURQ  QQ  U GMPBUSR  RR  U GMPBUXX  XX  U 7FSUFYW7FSUFY IBMG SY SZ IBMG X  PVU7FSUFYFT<FMFNFOUWFSUFY*OEFY JOEFY>W ^ CSFBL DBTF1BUI&MFNFOU5ZQF$VSWF5P ǘTOJQǘ CSFBL ^ ^ LFSOFMWPJECF[JFS@LFSOFM  DPOTUBOU1BUI&MFNFOU FMFNFOUT<<CVGGFS  >>  EFWJDF7FSUFY PVU7FSUFYFT<<CVGGFS  >>  VJOUJE<<UISFBE@QPTJUJPO@JO@HSJE>> \ 1BUI&MFNFOUFMFNFOUFMFNFOUT<JE> JOUOVNCFS0G7FSUFYFTFMFNFOUOVNCFS0G7FSUFYFT GMPBUQFMFNFOUQ GMPBUQFMFNFOUQ GMPBUQFMFNFOUQ GMPBUQFMFNFOUQ  TXJUDI FMFNFOUUZQF \ DBTF1BUI&MFNFOU5ZQF-JOF5P ǘTOJQǘ CSFBL DBTF1BUI&MFNFOU5ZQF2VBE$VSWF5P GPS JOUJOEFYJOEFYOVNCFS0G7FSUFYFTJOEFY \ GMPBUUGMPBU JOEFY GMPBU OVNCFS0G7FSUFYFT  GMPBURQ  QQ  U GMPBURQ  QQ  U GMPBUSR  RR  U GMPBUXX  XX  U 7FSUFYW7FSUFY IBMG SY SZ IBMG X  PVU7FSUFYFT<FMFNFOUWFSUFY*OEFY JOEFY>W ^ CSFBL * not a whole code
  24. Playground! • Yellow – Core Graphics • Red – Compute

    Kernel • Look Good https://github.com/codelynx/BezierKernelPlayground
  25. Other Considerations

  26. Double or Triple Buffering • Avoid access collision between CPU

    and GPU Buffer#1 Buffer#2 Buffer#1 Buffer#2 Buffer#1 Buffer#1 Buffer#1 →time Buffer#1 ⚡CPU ⚡GPU ⚡CPU ⚡GPU Buffer#1 Buffer#2 Buffer#1 Buffer#2 ⚡CPU ⚡CPU ⚡CPU ⚡CPU ⚡GPU ⚡GPU ⚡GPU ⚡GPU →time
  27. Buffer Management • Memory resource is finite, recycle them where

    possible • System crashes at device.makeBuffer() rather returning nil • It is hard to find out the reason (as of iOS10) • Save memory resources and be a good citizen
  28. Wrap Up

  29. Wrap Up • Computing Shaders are much easier than Rendering

    Shader • Memory Management could be pain if you wants one more step toward high performance shader • Be aware memory alignment • Shader is hard to debug – no break point nor printf()
  30. One More Thing

  31. MetalBlendTester https://github.com/codelynx/MetalBlendTester

  32. Thanks Kaz Yoshikawa