Compute Kernel with Metal

codelynx
February 11, 2017

These are the presentation materials from the Yokohama iOS developer meeting in February 2017. They give a very basic idea of Metal compute kernels and show example code.

Transcript

  1. Compute Kernel with Metal
    Kaz Yoshikawa, kyoshikawa@electricwoods.com
    Feb 2017, Yokohama iOS Developers Meet-up
  2. Understanding Compute Kernels

  3. Executive Summary
    • A computing method that uses the GPU
    • Massively parallel computing
    • May not be suitable for complex algorithms
  4. Players
    • MTLDevice
    • MTLCommandQueue
    • MTLCommandBuffer
    • MTLLibrary
    • MTLFunction
    • MTLComputePipelineState
    • MTLComputeCommandEncoder
    • MTLBuffer
    • MTLTexture
  5. None
  6. Device, CommandQueue and CommandBuffer
    • MTLDevice
      • MTLCreateSystemDefaultDevice()
    • MTLCommandQueue
      • device.makeCommandQueue()
    • MTLCommandBuffer
      • commandQueue.makeCommandBuffer()
    Diagram: MTLCommandQueue, MTLCommandBuffer, MTLComputeCommandEncoder
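
As a rough sketch of slide 6 in current Swift, the three objects can be created in one step. It assumes a recent SDK, where `makeCommandQueue()` and `makeCommandBuffer()` return optionals; the function name `makeCommandObjects` is only illustrative and does not come from the talk.

```swift
import Metal

/// Creates the device, a queue and one command buffer.
/// Returns nil when no Metal device is available (assumption: the caller
/// treats that as "skip GPU work").
func makeCommandObjects() -> (device: MTLDevice, queue: MTLCommandQueue, commandBuffer: MTLCommandBuffer)? {
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let commandBuffer = queue.makeCommandBuffer() else {
        return nil
    }
    return (device, queue, commandBuffer)
}
```

The device and the queue are normally created once and kept, while a new command buffer is requested for every batch of work.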
  7. Library, Function and MTLComputePipelineState
    • MTLLibrary
      • try! device.makeLibrary(source: shaderSource, options: nil)
      • device.newDefaultLibrary()
    • MTLFunction
      • library.makeFunction(name: "bezier_kernel")
    • MTLComputePipelineState
      • try! device.makeComputePipelineState(function: function)
    Diagram: MTLLibrary, MTLFunction, MTLComputePipelineState
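
The chain from source text to pipeline state on slide 7 might look like the sketch below. The kernel `double_kernel`, the error type `PipelineError` and the function `makePipelineState` are made up for illustration; only the Metal calls themselves are real API, written here in their current Swift spelling.

```swift
import Metal

enum PipelineError: Error { case functionNotFound }

// A tiny kernel used only for this sketch: it doubles each input value.
let shaderSource = """
#include <metal_stdlib>
using namespace metal;

kernel void double_kernel(device const float* input  [[ buffer(0) ]],
                          device float*       output [[ buffer(1) ]],
                          uint id [[ thread_position_in_grid ]]) {
    output[id] = input[id] * 2.0f;
}
"""

/// Compiles the source at run time, looks up the kernel by name,
/// and builds a compute pipeline state from it.
func makePipelineState(device: MTLDevice) throws -> MTLComputePipelineState {
    let library = try device.makeLibrary(source: shaderSource, options: nil)
    guard let function = library.makeFunction(name: "double_kernel") else {
        throw PipelineError.functionNotFound
    }
    return try device.makeComputePipelineState(function: function)
}
```

Compiling from a string is convenient in a playground; in an app target the precompiled default library is usually the cheaper choice.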
  8. Compute Pipeline State
    • Blueprint for computing parameters
      • Buffers
      • Textures
      • etc…
  9. Computing Kernel

  10. Computing Kernel
    • Shading language based on a subset of C++14
    • Restrictions: the following C++ features are not supported
      • lambda expressions, dynamic_cast operator, type identification, recursive function calls, new and delete operators, noexcept operator, goto statement, register and thread_local storage qualifiers, virtual function qualifier, derived classes and exception handling
  11. Scalar Types
    • bool, char, int8_t, unsigned char, uchar
    • short, unsigned short, ushort
    • int, unsigned int, uint: 32 bit
    • half: 16-bit half precision, float: 32-bit single precision
    • size_t, ptrdiff_t, void
    • no double
  12. Vector and Matrix Types
    • booln
    • charn, shortn, ucharn, ushortn, uintn
    • halfn, floatn
    • halfnxm, floatnxm
    * n and m are numbers (2, 3 or 4), e.g. float4 or float4x4
  13. A Glance at the Code
    Defining a structure:
        struct MyVertexIn {
            float4 position;
            float4 color;
        };
        struct MyVertexOut {
            float4 position;
            float4 color;
        };
    Defining the kernel code, specifying a buffer index for each argument:
        kernel void my_compute_kernel(
            constant MyVertexIn* vertices [[ buffer(0) ]],
            device MyVertexOut* outVertexes [[ buffer(1) ]],
            uint id [[ thread_position_in_grid ]])
        {
            …
        }
    * Just to give an idea, not working code. The vector widths (float4) and buffer indices (0, 1) are assumed values.
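  14. Qualifiers
        kernel void my_compute_kernel(
            constant MyVertexIn* vertices [[ buffer(0) ]],
            device MyVertexOut* outVertexes [[ buffer(1) ]],
            uint id [[ thread_position_in_grid ]])
        {
            …
        }
    * The qualifiers here are kernel, constant, device, [[ buffer(n) ]] and [[ thread_position_in_grid ]]. Buffer indices 0 and 1 are assumed values.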
  15. Address Space
    • device address space
      • Buffer memory objects allocated from the device memory pool that are both readable and writeable
    • threadgroup address space
      • Variables in the threadgroup address space of a kernel function are allocated once per threadgroup executing the kernel, are shared by all threads in that threadgroup, and exist only for the lifetime of the threadgroup
    • constant address space
      • Buffer memory objects allocated from the device memory pool that are read-only
    • thread address space
      • The per-thread memory address space
  16. Compute Command Encoder
    • MTLComputeCommandEncoder
        let encoder = commandBuffer.makeComputeCommandEncoder()
        encoder.setComputePipelineState(computePipelineState)
        encoder.setBuffer(elementsBuffer, offset: 0, at: 0)
        encoder.setBuffer(vertexBuffer, offset: 0, at: 1)
    * The offsets (0) and buffer indices (0, 1) are assumed values.
  17. Thread Group
    • A kernel requires the task to be broken into small pieces
        let threadgroupsPerGrid = MTLSizeMake(elements.count, 1, 1)
        let threadsPerThreadgroup = MTLSizeMake(1, 1, 1)
        encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
    * I still do not fully understand this part. The sizes of 1 are assumed values.
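
Whatever sizes are chosen, the grid has to cover every element at least once. One common way to work that out, shown here as a framework-free sketch, is a ceiling division; the helper name `threadgroupCount` is hypothetical.

```swift
/// Number of threadgroups needed so that groups of `threadsPerGroup`
/// threads cover `elementCount` items (ceiling division).
func threadgroupCount(elementCount: Int, threadsPerGroup: Int) -> Int {
    precondition(elementCount >= 0 && threadsPerGroup > 0)
    return (elementCount + threadsPerGroup - 1) / threadsPerGroup
}

// Example: 100 elements in groups of 32 threads need 4 groups (128 threads).
let groups = threadgroupCount(elementCount: 100, threadsPerGroup: 32)
print(groups)
```

With Metal, `threadsPerGroup` could be taken from the pipeline state's `threadExecutionWidth`. Because the last group may run past the end of the data, the kernel then needs its own bounds check on the thread id.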
  18. Commit
    • Finally ready to commit
        encoder.endEncoding()
        commandBuffer.commit()
    • Wait, or add a completion handler (the handler must be added before commit() is called)…
        commandBuffer.waitUntilCompleted()
        commandBuffer.addCompletedHandler { buffer in
            // do some work here
        }
    • Check the buffer
      • There must be something good in there!
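
Putting slides 6 through 18 together, an end-to-end run might look like the sketch below. It is not the speaker's code: the kernel `double_kernel`, the function `runDoubleKernel` and the shared storage mode are assumptions made for this example, and the calls use the current Swift names (`index:` rather than `at:`).

```swift
import Metal

// Illustrative kernel: writes input[id] * 2 to output[id].
let doubleKernelSource = """
#include <metal_stdlib>
using namespace metal;
kernel void double_kernel(device const float* input  [[ buffer(0) ]],
                          device float*       output [[ buffer(1) ]],
                          uint id [[ thread_position_in_grid ]]) {
    output[id] = input[id] * 2.0f;
}
"""

/// Runs the kernel over `values`; returns nil if Metal is unavailable
/// or any setup step fails.
func runDoubleKernel(_ values: [Float]) -> [Float]? {
    let length = values.count * MemoryLayout<Float>.stride
    guard !values.isEmpty,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: doubleKernelSource, options: nil),
          let function = library.makeFunction(name: "double_kernel"),
          let pipeline = try? device.makeComputePipelineState(function: function),
          let input = device.makeBuffer(bytes: values, length: length, options: .storageModeShared),
          let output = device.makeBuffer(length: length, options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder()
    else { return nil }

    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(input, offset: 0, index: 0)
    encoder.setBuffer(output, offset: 0, index: 1)
    // One thread per threadgroup and one threadgroup per element:
    // simple to reason about, not tuned for speed.
    encoder.dispatchThreadgroups(MTLSize(width: values.count, height: 1, depth: 1),
                                 threadsPerThreadgroup: MTLSize(width: 1, height: 1, depth: 1))
    encoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Shared storage lets the CPU read the result directly.
    let pointer = output.contents().bindMemory(to: Float.self, capacity: values.count)
    return Array(UnsafeBufferPointer(start: pointer, count: values.count))
}
```

Blocking with `waitUntilCompleted()` keeps the sketch short; a completion handler avoids stalling the calling thread.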
  19. Computing Bezier Positions: a live example

  20. Goal
    • Give the shader an array of path elements or equivalent
    • Produce many consecutive positions using a kernel
    • Use the Bezier calculation method from my Qiita article
  21. http://qiita.com/codelynx/items/f7e6a844aac3746a6b79

  22. Strategies
    • Path elements buffer
    • Vertex buffer
    • The CPU estimates the length of the path elements
    • A kernel produces the vertices for a path element
    • There may be a better way…
    Diagram, Element Buffer: #0 (p0 … p3, 0), #1 (p0 … p3, m1), #n (p0 … p3, m2)
    Diagram, Vertex Buffer: #0 pt … pt, #m1 pt …, #m2 … pt
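
The offsets 0, m1, m2 in the diagram are running totals of the per-element vertex counts. A minimal sketch of that CPU-side step follows, assuming the counts have already been estimated. `ElementSlot` and `assignSlots` are hypothetical names; `vertexIndex` and `numberOfVertexes` mirror the field names used by the kernel.

```swift
/// Hypothetical CPU-side record for one path element.
struct ElementSlot {
    var vertexIndex: Int        // first vertex this element writes
    var numberOfVertexes: Int   // how many vertices the kernel should emit
}

/// Gives each element a contiguous, non-overlapping range in the vertex buffer.
func assignSlots(vertexCounts: [Int]) -> (slots: [ElementSlot], totalVertexes: Int) {
    var slots: [ElementSlot] = []
    var next = 0
    for count in vertexCounts {
        slots.append(ElementSlot(vertexIndex: next, numberOfVertexes: count))
        next += count
    }
    return (slots, next)
}

let layout = assignSlots(vertexCounts: [4, 10, 3])
print(layout.slots.map { $0.vertexIndex }, layout.totalVertexes)
```

Because the ranges never overlap, each kernel thread can write its own element's vertices without any synchronization, and `totalVertexes` gives the size the vertex buffer needs.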
  23. bezier_kernel shader
        kernel void bezier_kernel(
            constant PathElement* elements [[ buffer(0) ]],
            device Vertex* outVertexes [[ buffer(1) ]],
            uint id [[ thread_position_in_grid ]])
        {
            PathElement element = elements[id];
            int numberOfVertexes = element.numberOfVertexes;
            float2 p0 = element.p0;
            float2 p1 = element.p1;
            float2 p2 = element.p2;
            float2 p3 = element.p3;

            switch (element.type) {
            case PathElementTypeLineTo:
                …snip…
                break;
            case PathElementTypeQuadCurveTo:
                for (int index = 0; index < numberOfVertexes; index++) {
                    float t = float(index) / float(numberOfVertexes);
                    float2 q1 = p0 + (p1 - p0) * t;
                    float2 q2 = p1 + (p2 - p1) * t;
                    float2 r = q1 + (q2 - q1) * t;
                    float w = w1 + (w2 - w1) * t;
                    Vertex v = Vertex(half2(r.x, r.y), half(w));
                    outVertexes[element.vertexIndex + index] = v;
                }
                break;
            case PathElementTypeCurveTo:
                …snip…
                break;
            }
        }
    * Not the complete code. The numeric suffixes and literals (float2, half2, p0 to p3, q1, q2, w1, w2, buffer indices, the loop start) are assumed values.
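
For checking the kernel's output on the CPU, the quadratic case can be mirrored in plain Swift. This is a reference sketch rather than the speaker's code: `Point`, `lerp` and `quadCurvePoints` are names invented here, and the width interpolation is left out.

```swift
/// Minimal 2D point so the sketch needs no frameworks.
struct Point: Equatable {
    var x: Float
    var y: Float
}

/// Linear interpolation from `a` to `b` at parameter `t`.
func lerp(_ a: Point, _ b: Point, _ t: Float) -> Point {
    return Point(x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t)
}

/// Samples a quadratic Bezier curve by repeated interpolation
/// (de Casteljau), using t = index / numberOfVertexes as in the kernel loop.
func quadCurvePoints(p0: Point, p1: Point, p2: Point, numberOfVertexes: Int) -> [Point] {
    return (0..<numberOfVertexes).map { index in
        let t = Float(index) / Float(numberOfVertexes)
        let q1 = lerp(p0, p1, t)
        let q2 = lerp(p1, p2, t)
        return lerp(q1, q2, t)
    }
}

let samples = quadCurvePoints(p0: Point(x: 0, y: 0),
                              p1: Point(x: 1, y: 2),
                              p2: Point(x: 2, y: 0),
                              numberOfVertexes: 4)
print(samples)
```

With this parameterization t never reaches 1, so the end point p2 is not emitted by the element itself; presumably the next element's first vertex covers it.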
  24. Playground!
    • Yellow: Core Graphics
    • Red: Compute Kernel
    • Looks good
    https://github.com/codelynx/BezierKernelPlayground
  25. Other Considerations

  26. Double or Triple Buffering
    • Avoid access collisions between the CPU and the GPU
    Diagram (time runs left to right): with a single buffer, the CPU and GPU take turns on Buffer#1; with two buffers, the CPU works on one of Buffer#1 and Buffer#2 while the GPU works on the other.
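
One common way to arrange this is a small ring of buffers guarded by a counting semaphore. The sketch below is generic so that it runs without a GPU; `BufferRing` is a hypothetical helper, and with Metal its elements would be `MTLBuffer` objects.

```swift
import Dispatch

/// Hands out buffers in round-robin order and blocks the CPU when
/// every buffer is still in flight. Call `acquire()` from one thread only.
final class BufferRing<Buffer> {
    private let buffers: [Buffer]
    private let semaphore: DispatchSemaphore
    private var nextIndex = 0

    init(buffers: [Buffer]) {
        precondition(!buffers.isEmpty)
        self.buffers = buffers
        self.semaphore = DispatchSemaphore(value: buffers.count)
    }

    /// Waits until a buffer is free, then returns it for the CPU to fill.
    func acquire() -> Buffer {
        semaphore.wait()
        defer { nextIndex = (nextIndex + 1) % buffers.count }
        return buffers[nextIndex]
    }

    /// Marks the oldest in-flight buffer as finished.
    func release() {
        semaphore.signal()
    }
}
```

With Metal, `release()` would typically be called from the command buffer's completed handler. Command buffers on one queue finish in the order they were committed, so the buffer freed is always the oldest one handed out.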
  27. Buffer Management
    • Memory resources are finite; recycle buffers where possible
    • The system crashes at device.makeBuffer() rather than returning nil
    • It is hard to find out the reason (as of iOS 10)
    • Save memory resources and be a good citizen
  28. Wrap Up

  29. Wrap Up
    • Compute shaders are much easier than rendering shaders
    • Memory management can be a pain if you want to go one step further toward a high-performance shader
    • Be aware of memory alignment
    • Shaders are hard to debug: no breakpoints and no printf()
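
The alignment point can be checked from the Swift side with `MemoryLayout`. In the sketch below, `KernelParams` and `ThreeFloats` are hypothetical structs; the sizes in the comments follow from `SIMD3<Float>` being 16 bytes wide and 16-byte aligned, which matches float3 in the Metal shading language.

```swift
/// Hypothetical parameter block shared with a kernel.
struct KernelParams {
    var weight: Float            // offset 0
    var position: SIMD3<Float>   // offset 16, not 4, because of alignment
}

/// Three tightly packed floats, comparable to packed_float3 on the Metal side.
struct ThreeFloats {
    var x: Float
    var y: Float
    var z: Float
}

let simd3Stride = MemoryLayout<SIMD3<Float>>.stride                              // 16
let paramsStride = MemoryLayout<KernelParams>.stride                             // 32
let positionOffset = MemoryLayout<KernelParams>.offset(of: \KernelParams.position) // 16
let threeFloatsStride = MemoryLayout<ThreeFloats>.stride                         // 12
print(simd3Stride, paramsStride, positionOffset as Any, threeFloatsStride)
```

If the struct on the shader side is declared with a different layout than the one on the Swift side, the kernel reads shifted data without raising any error, so buffer lengths are best computed from `stride` rather than from a hand-counted size.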
  30. One More Thing

  31. MetalBlendTester https://github.com/codelynx/MetalBlendTester

  32. Thanks Kaz Yoshikawa