Compute Kernel with Metal

Kaz Yoshikawa, Feb 2017, Yokohama iOS Developers Meet-up

Understanding Computing Kernel

Executive Summary
• A computing method that uses the GPU
• Super parallel computing
• May not be suitable for complex algorithms

Players
• MTLDevice
• MTLCommandQueue
• MTLCommandBuffer
• MTLLibrary
• MTLFunction
• MTLComputePipelineState
• MTLComputeCommandEncoder
• MTLBuffer
• MTLTexture

Device, CommandQueue and CommandBuffer
• MTLDevice
  • MTLCreateSystemDefaultDevice()
• MTLCommandQueue
  • device.makeCommandQueue()
• MTLCommandBuffer
  • commandQueue.makeCommandBuffer()
[Diagram: MTLCommandQueue, MTLCommandBuffer, MTLComputeCommandEncoder]

Library, Function and MTLComputePipelineState
• MTLLibrary
  • try! device.makeLibrary(source: shaderSource, options: nil)
  • device.newDefaultLibrary() (renamed makeDefaultLibrary() in Swift 4)
• MTLFunction
  • library.makeFunction(name: "bezier_kernel")
• MTLComputePipelineState
  • try! device.makeComputePipelineState(function: function)
[Diagram: MTLLibrary, MTLFunction, MTLComputePipelineState]

Compute Pipeline State
• Blueprint for computing parameters
• Buffers
• Textures
• etc…

Computing Kernel

Computing Kernel
• The shading language is a subset of C++14
• Restrictions: the following are not supported
  • lambda expressions, dynamic_cast operator, type identification, recursive function calls, new and delete operators, noexcept operator, goto statement, register and thread_local storage qualifiers, virtual function qualifier, derived classes and exception handling

Scalar Types
• bool, char, int8_t, unsigned char, uchar
• short, unsigned short, ushort
• int, unsigned int, uint: 32-bit
• half: 16-bit half precision, float: 32-bit single precision
• size_t, ptrdiff_t, void
• no double

Vector and Matrix Types
• booln
• charn, shortn, ucharn, ushortn, uintn
• halfn, floatn
• halfnxm, floatnxm
* n and m are numbers (2, 3 or 4)

A Glance at the Code
* Just to give an idea, not working code
* n stands for a number missing from the transcript

    // ← Defining a structure
    struct MyVertexIn {
        floatn position;
        floatn color;
    };

    struct MyVertexOut {
        floatn position;
        floatn color;
    };

    // ↓ Defining a kernel function
    kernel void my_compute_kernel(
        constant MyVertexIn *vertices [[buffer(n)]],    // ↓ Specifying the buffer index
        device MyVertexOut *outVertexes [[buffer(n)]],
        uint id [[thread_position_in_grid]])
    {
        …
    }

Qualifiers
* n stands for a number missing from the transcript

    kernel void my_compute_kernel(
        constant MyVertexIn *vertices [[buffer(n)]],
        device MyVertexOut *outVertexes [[buffer(n)]],
        uint id [[thread_position_in_grid]])
    {
        …
    }

Address Space
• device Address Space
  • Buffer memory objects allocated from the device memory pool that are both readable and writeable
• threadgroup Address Space
  • Variables allocated in the threadgroup address space in a kernel function are allocated for each threadgroup executing the kernel, are shared by all threads in a threadgroup and exist only for the lifetime of the threadgroup that is executing the kernel
• constant Address Space
  • The constant address space name refers to buffer memory objects allocated from the device memory pool but are read-only
• thread Address Space
  • The thread address space refers to the per-thread memory address space

Compute Command Encoder
• MTLComputeCommandEncoder
* n stands for a number missing from the transcript

    let encoder = commandBuffer.makeComputeCommandEncoder()
    encoder.setComputePipelineState(computePipelineState)
    encoder.setBuffer(elementsBuffer, offset: n, at: n)
    encoder.setBuffer(vertexBuffer, offset: n, at: n)

Thread Group
• A kernel requires the task to be broken into small pieces
* n stands for a number missing from the transcript
* I still do not fully understand this part

    let threadgroupsPerGrid = MTLSizeMake(elements.count, n, n)
    let threadsPerThreadgroup = MTLSizeMake(n, n, n)
    encoder.dispatchThreadgroups(threadgroupsPerGrid,
        threadsPerThreadgroup: threadsPerThreadgroup)
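One common way to pick the two sizes is to fix the number of threads per threadgroup and round the number of threadgroups up so that every work item is covered. The sketch below shows only that arithmetic for a one-dimensional grid. `threadgroupCount` is a hypothetical helper, and the width of 64 is an assumed value; in a real app it would typically be derived from the pipeline state's `threadExecutionWidth` or `maxTotalThreadsPerThreadgroup`.

```swift
/// Number of threadgroups needed so that groups * threadsPerGroup >= itemCount.
/// Hypothetical helper for a one-dimensional grid.
func threadgroupCount(itemCount: Int, threadsPerGroup: Int) -> Int {
    precondition(itemCount >= 0 && threadsPerGroup > 0)
    // Ceiling division: round up so the last partial group is included.
    return (itemCount + threadsPerGroup - 1) / threadsPerGroup
}

let items = 1000
let width = 64  // assumed threads per threadgroup
let groups = threadgroupCount(itemCount: items, threadsPerGroup: width)
print(groups, groups * width)  // 16 1024
```

Because the grid (1024 threads here) can be larger than the data (1000 items), a kernel dispatched this way needs an early return for thread ids beyond the item count.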

Commit
• Finally ready to commit

    encoder.endEncoding()
    commandBuffer.commit()

• Wait or add a completion handler… (a handler must be added before commit() is called)

    commandBuffer.waitUntilCompleted()

    commandBuffer.addCompletedHandler { (buffer) in
        // do some work here
    }

• Check the buffer
  • there must be something good in there!
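The steps so far (device, queue, library, function, pipeline state, encoder, dispatch, commit, read back) can be put together in one small program. This is a minimal sketch, not the code from the talk: the kernel `double_kernel`, the function `doubleOnGPU` and the buffer indices 0 to 2 are all made up for illustration. It uses the current Swift argument labels (`index:`), whereas the slides use the Swift 3 labels (`at:`).

```swift
import Metal

// MSL source compiled at run time. `double_kernel` is a hypothetical kernel.
let shaderSource = """
#include <metal_stdlib>
using namespace metal;

kernel void double_kernel(constant float *input  [[buffer(0)]],
                          device   float *output [[buffer(1)]],
                          constant uint  &count  [[buffer(2)]],
                          uint id [[thread_position_in_grid]])
{
    if (id >= count) { return; }   // the grid may be larger than the data
    output[id] = input[id] * 2.0f;
}
"""

/// Doubles every element on the GPU.
/// Returns nil if Metal is unavailable or any setup step fails.
func doubleOnGPU(_ values: [Float]) -> [Float]? {
    guard !values.isEmpty,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: shaderSource, options: nil),
          let function = library.makeFunction(name: "double_kernel"),
          let pipeline = try? device.makeComputePipelineState(function: function)
    else { return nil }

    let byteCount = values.count * MemoryLayout<Float>.stride
    guard let input = device.makeBuffer(bytes: values, length: byteCount,
                                        options: .storageModeShared),
          let output = device.makeBuffer(length: byteCount, options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder()
    else { return nil }

    // Bind the pipeline state and the kernel arguments.
    var count = UInt32(values.count)
    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(input, offset: 0, index: 0)
    encoder.setBuffer(output, offset: 0, index: 1)
    encoder.setBytes(&count, length: MemoryLayout<UInt32>.stride, index: 2)

    // Round the number of threadgroups up so every element is covered.
    let width = pipeline.threadExecutionWidth
    let groups = MTLSize(width: (values.count + width - 1) / width, height: 1, depth: 1)
    encoder.dispatchThreadgroups(groups,
                                 threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
    encoder.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Shared storage: the CPU can read the result directly.
    let pointer = output.contents().bindMemory(to: Float.self, capacity: values.count)
    return Array(UnsafeBufferPointer(start: pointer, count: values.count))
}

if let result = doubleOnGPU([1, 2, 3, 4]) {
    print(result)
} else {
    print("Metal is not available")
}
```

Blocking with `waitUntilCompleted()` keeps the sketch short; an app that must stay responsive would more likely read the buffer from a completion handler.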

Computing Bezier Positions: a living example

Goal
• Give the shader an array of path elements or equivalent
• Produce many consecutive positions using a kernel
• Use the Bezier calculation method from my Qiita article

http://qiita.com/codelynx/items/f7e6a844aac3746a6b79

Strategies
• Path elements buffer
• Vertex buffer
• CPU estimates the length of path elements
• A kernel produces vertices for a path element
• There may be a better way…
[Diagram: Element Buffer with entries #0 … #n, each holding p0 … p3 and a start index (0, m1, m2) into the Vertex Buffer, where the points (pt) of each element begin at #0, #m1, #m2]
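The CPU side of this strategy amounts to a running sum: each element starts writing where the previous one stopped. A minimal sketch follows, assuming the per-element vertex counts have already been estimated (how the talk estimates them is not shown). `ElementSlice` and `assignSlices` are hypothetical names; the field names `vertexIndex` and `numberOfVertexes` match those used by the kernel.

```swift
/// Hypothetical CPU-side record of one path element's slice of the vertex buffer.
struct ElementSlice: Equatable {
    var vertexIndex: Int        // first vertex written by this element
    var numberOfVertexes: Int   // how many vertices its kernel thread produces
}

/// Running sum of vertex counts: element i starts where element i-1 ends.
/// Returns the slices and the total number of vertices to allocate.
func assignSlices(vertexCounts: [Int]) -> (slices: [ElementSlice], total: Int) {
    var slices: [ElementSlice] = []
    var next = 0
    for count in vertexCounts {
        slices.append(ElementSlice(vertexIndex: next, numberOfVertexes: count))
        next += count
    }
    return (slices, next)
}

let (slices, total) = assignSlices(vertexCounts: [10, 25, 8])
print(slices.map { $0.vertexIndex }, total)  // [0, 10, 35] 43
```

The total gives the size of the vertex buffer, and since the slices never overlap, each kernel thread can write its element's vertices without synchronizing with the others.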

bezier_kernel shader
* Not the whole code
* n stands for a number missing from the transcript; the loop bounds and the 0 to 3 indices are inferred from the de Casteljau construction

    kernel void bezier_kernel(
        constant PathElement *elements [[buffer(n)]],
        device Vertex *outVertexes [[buffer(n)]],
        uint id [[thread_position_in_grid]])
    {
        PathElement element = elements[id];
        int numberOfVertexes = element.numberOfVertexes;
        floatn p0 = element.p0;
        floatn p1 = element.p1;
        floatn p2 = element.p2;
        floatn p3 = element.p3;

        switch (element.type) {
        case PathElementTypeLineTo:
            …snip…
            break;
        case PathElementTypeQuadCurveTo:
            for (int index = 0; index < numberOfVertexes; index++) {
                float t = float(index) / float(numberOfVertexes);
                floatn q1 = p0 + (p1 - p0) * t;
                floatn q2 = p1 + (p2 - p1) * t;
                floatn r = q1 + (q2 - q1) * t;
                float w = w1 + (w2 - w1) * t;
                Vertex v = Vertex(half2(r.x, r.y), half(w));
                outVertexes[element.vertexIndex + index] = v;
            }
            break;
        case PathElementTypeCurveTo:
            …snip…
            break;
        }
    }
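Since a shader is hard to debug, a CPU reference for the same calculation is handy for checking what the kernel writes. The sketch below covers only the quadratic case and uses repeated linear interpolation (de Casteljau) with t = index / count, as in the kernel loop. The function names are hypothetical, and `SIMD2<Float>` is an assumed stand-in for the shader's two-component point type.

```swift
/// Point on a quadratic Bezier curve at parameter t,
/// computed by repeated linear interpolation (de Casteljau).
func quadBezierPoint(_ p0: SIMD2<Float>, _ p1: SIMD2<Float>, _ p2: SIMD2<Float>,
                     t: Float) -> SIMD2<Float> {
    let q0 = p0 + (p1 - p0) * t   // between p0 and p1
    let q1 = p1 + (p2 - p1) * t   // between p1 and p2
    return q0 + (q1 - q0) * t     // between q0 and q1
}

/// Samples `count` points with t = index / count, so t never reaches 1.
func quadBezierPoints(_ p0: SIMD2<Float>, _ p1: SIMD2<Float>, _ p2: SIMD2<Float>,
                      count: Int) -> [SIMD2<Float>] {
    return (0..<count).map { index in
        quadBezierPoint(p0, p1, p2, t: Float(index) / Float(count))
    }
}

let points = quadBezierPoints(SIMD2(0, 0), SIMD2(1, 2), SIMD2(2, 0), count: 4)
print(points)
```

Comparing these values with the contents of the vertex buffer after the command buffer completes gives a simple correctness check, allowing for the lower precision of `half` on the GPU side.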

Playground!
• Yellow: Core Graphics
• Red: Compute Kernel
• Looks good
https://github.com/codelynx/BezierKernelPlayground

Other Considerations

Double or Triple Buffering
• Avoid access collisions between the CPU and the GPU
[Diagram: two timelines. With a single buffer (Buffer#1), CPU and GPU access alternate over time. With two buffers (Buffer#1, Buffer#2), the CPU works on one buffer while the GPU works on the other.]
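A common way to organize this is a small ring of buffers guarded by a counting semaphore: the CPU takes the next slot before writing, and the slot is handed back once the GPU has finished with it. The sketch below shows only that bookkeeping, without any Metal calls. `InFlightRing` is a hypothetical class, and it assumes `acquire()` is always called from the same CPU thread.

```swift
import Dispatch

/// Hands out buffer slots in round-robin order and blocks
/// when every slot is still in flight.
final class InFlightRing {
    let slotCount: Int
    private let semaphore: DispatchSemaphore
    private var next = 0

    init(slotCount: Int) {
        precondition(slotCount > 0)
        self.slotCount = slotCount
        self.semaphore = DispatchSemaphore(value: slotCount)
    }

    /// Called by the CPU before it writes a frame's data.
    /// Blocks while all slots are still owned by the GPU.
    func acquire() -> Int {
        semaphore.wait()
        let slot = next
        next = (next + 1) % slotCount
        return slot
    }

    /// Called when the GPU is done with a slot,
    /// for example from a command buffer's completion handler.
    func release() {
        semaphore.signal()
    }
}

let ring = InFlightRing(slotCount: 3)
let a = ring.acquire(), b = ring.acquire(), c = ring.acquire()
ring.release()          // the GPU finished with the oldest slot
let d = ring.acquire()  // so the CPU can reuse it
print(a, b, c, d)       // 0 1 2 0

// Hand back the remaining slots: a DispatchSemaphore must be back at
// its initial value before it is destroyed.
for _ in 0..<3 { ring.release() }
```

With Metal, each slot index would select one of `slotCount` preallocated buffers, and `release()` would be called from the handler passed to `addCompletedHandler`.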

Buffer Management
• Memory resources are finite; recycle them where possible
• The system crashes at device.makeBuffer() rather than returning nil
• It is hard to find out the reason (as of iOS 10)
• Save memory resources and be a good citizen
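Recycling can be as simple as keeping finished buffers in a pool keyed by length and handing them out again before allocating new ones. The sketch below is generic so that it runs without a GPU; `BufferPool` is a hypothetical class, and with Metal the `Buffer` type would be `MTLBuffer` with `make` calling `device.makeBuffer(length:options:)`.

```swift
/// Reuses buffers of the same length instead of allocating new ones.
final class BufferPool<Buffer> {
    private var idle: [Int: [Buffer]] = [:]
    private let make: (Int) -> Buffer?

    /// `make` allocates a new buffer of the given length, or returns nil on failure.
    init(make: @escaping (Int) -> Buffer?) {
        self.make = make
    }

    /// Returns an idle buffer of this length if there is one, otherwise allocates.
    func checkout(length: Int) -> Buffer? {
        if let buffer = idle[length]?.popLast() {
            return buffer
        }
        return make(length)
    }

    /// Hands a buffer back once nothing (CPU or GPU) is using it any more.
    func checkin(_ buffer: Buffer, length: Int) {
        idle[length, default: []].append(buffer)
    }
}

// Stand-in allocation: a byte array instead of an MTLBuffer.
var allocations = 0
let pool = BufferPool<[UInt8]> { length in
    allocations += 1
    return [UInt8](repeating: 0, count: length)
}

let first = pool.checkout(length: 256)!
pool.checkin(first, length: 256)
_ = pool.checkout(length: 256)   // reuses the buffer that was checked in
print(allocations)               // 1
```

A buffer should only be checked in after the command buffer that uses it has completed, otherwise the CPU could overwrite data the GPU is still reading.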

Wrap Up

Wrap Up
• Compute shaders are much easier than rendering shaders
• Memory management can be a pain if you want to go one step further toward a high-performance shader
• Be aware of memory alignment
• Shaders are hard to debug: no breakpoints and no printf()
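The alignment point can be checked on the CPU side with `MemoryLayout`. Swift's SIMD types follow the same size and alignment rules as the shader's vector types, so a struct that mirrors a shader struct may be larger than the sum of its fields. The struct `Sample` below is hypothetical and exists only to show the padding.

```swift
/// Hypothetical CPU-side mirror of a shader struct { float weight; float2 position; }.
struct Sample {
    var weight: Float              // 4 bytes, followed by 4 bytes of padding
    var position: SIMD2<Float>     // 8 bytes, must start on an 8-byte boundary
}

print(MemoryLayout<Float>.stride)                     // 4
print(MemoryLayout<SIMD2<Float>>.alignment)           // 8
print(MemoryLayout<SIMD3<Float>>.stride)              // 16, not 12
print(MemoryLayout<Sample>.stride)                    // 16, not 12
print(MemoryLayout<Sample>.offset(of: \.position)!)   // 8
```

Buffer lengths and element offsets are therefore better computed from `stride` than from a hand-counted number of bytes.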

One More Thing

MetalBlendTester https://github.com/codelynx/MetalBlendTester

Thanks Kaz Yoshikawa
