Compute Pipeline State
• Blue Print for Computing Parameters
• Buffers
• Textures
• etc…
Slide 9
Slide 9 text
Computing Kernel
Slide 10
Slide 10 text
Computing Kernel
• C++ 14 subset shading language
• Restrictions
• lambda expressions, dynamic_cast operator, type identification, recursive function calls, new and delete
operators, noexcept operator, goto statement, register, thread_local storage qualifiers, virtual function
qualifier, derived classes and exception handling
Vector and Matrix Types
• booln
• charn, shortn, ucharn, ushortn, uintn
• halfn, floatn
• halfnxm, floatnxm
* n is a number
Slide 13
Slide 13 text
Glance a code
TUSVDU.Z7FSUFY*O\
GMPBUQPTJUJPO
GMPBUDPMPS
^
TUSVDU.Z7FSUFY0VU\
GMPBUQPTJUJPO
GMPBUDPMPS
^
LFSOFMWPJENZ@DPNQVUF@LFSOFM
DPOTUBOU.Z7FSUFY*OWFSUJDFT<<CVGGFS
>>
EFWJDF.Z7FSUFY0VUPVU7FSUFYFT<<CVGGFS
>>
VJOUJE<<UISFBE@QPTJUJPO@JO@HSJE>>
\
ǘ
^
* Just for getting an idea, not working code
←Defining a structure
↓ Defining a kernel code
↓ Specifying Buffer Index
Address Space
• device Address Space
• buffer memory objects allocated from the device memory pool that are both readable and writeable
• threadgroup Address Space
• Variables allocated in the threadgroup address space in a kernel function are allocated for each threadgroup executing the
kernel, are shared by all threads in a threadgroup and exist only for the lifetime of the threadgroup that is executing the kernel
• constant Address Space
• The constant address space name refers to buffer memory objects allocated from the device memory pool but are read-only
• thread Address Space
• The thread address space refers to the per-thread memory address space
Slide 16
Slide 16 text
Compute Command Encoder
• MTLComputeCommandEncoder
MFUFODPEFSDPNNBOE#VGGFSNBLF$PNQVUF$PNNBOE&ODPEFS
FODPEFSTFU$PNQVUF1JQFMJOF4UBUF DPNQVUF1JQFMJOF4UBUF
FODPEFSTFU#VGGFS FMFNFOUT#VGGFS
PGGTFU
BU
FODPEFSTFU#VGGFS WFSUFY#VGGFS
PGGTFU
BU
Slide 17
Slide 17 text
Thread Group
• Kernel requires a task broken into small pieces
MFUUISFBEHSPVQT1FS(SJE.5-4J[F.BLF FMFNFOUTDPVOU
MFUUISFBET1FS5ISFBEHSPVQ.5-4J[F.BLF
FODPEFSEJTQBUDI5ISFBEHSPVQT UISFBEHSPVQT1FS(SJE
UISFBET1FS5ISFBEHSPVQUISFBET1FS5ISFBEHSPVQ
* I am still not fully understood
Slide 18
Slide 18 text
Commit
• Finally ready to commit
FODPEFSFOE&ODPEJOH
DPNNBOE#VGGFSDPNNJU
• Wait or Add Completion Handler…
DPNNBOE#VGGFSXBJU6OUJM$PNQMFUFE
DPNNBOE#VGGFSBEE$PNQMFUFE)BOEMFS\ CVGGFS
JO
EPTPNFXPSLIFSF
^
• Check the buffer
• there must be something good in there!
Slide 19
Slide 19 text
Computing Bezier
Positions
Living example
Slide 20
Slide 20 text
Goal
• Give Shader an array of Path Elements or equivalent
• Produce many consequent positions using Kernel
• Using Bezier Calculation Method of my Qiita atricle
Strategies
• Path elements buffer
• Vertex buffer
• CPU estimates the length of
path elements
• A Kernel produces vertices for
a path element
• There may be a better way…
#0 p0 … p3 0
#1 p0 … p3 m1
#n p0 … p3 m2
#0 pt
… pt
#m1
pt
…
#m2
… pt
Element Buffer
Vertex Buffer
Slide 23
Slide 23 text
bezier_kernel shader
DBTF1BUI&MFNFOU5ZQF2VBE$VSWF5P
GPS JOUJOEFYJOEFYOVNCFS0G7FSUFYFTJOEFY
\
GMPBUUGMPBU JOEFY
GMPBU OVNCFS0G7FSUFYFT
GMPBURQ QQ
U
GMPBURQ QQ
U
GMPBUSR RR
U
GMPBUXX XX
U
7FSUFYW7FSUFY IBMG SY
SZ
IBMG X
PVU7FSUFYFTW
^
CSFBL
DBTF1BUI&MFNFOU5ZQF$VSWF5P
ǘTOJQǘ
CSFBL
^
^
LFSOFMWPJECF[JFS@LFSOFM
DPOTUBOU1BUI&MFNFOUFMFNFOUT<<CVGGFS
>>
EFWJDF7FSUFYPVU7FSUFYFT<<CVGGFS
>>
VJOUJE<<UISFBE@QPTJUJPO@JO@HSJE>>
\
1BUI&MFNFOUFMFNFOUFMFNFOUT
JOUOVNCFS0G7FSUFYFTFMFNFOUOVNCFS0G7FSUFYFT
GMPBUQFMFNFOUQ
GMPBUQFMFNFOUQ
GMPBUQFMFNFOUQ
GMPBUQFMFNFOUQ
TXJUDI FMFNFOUUZQF
\
DBTF1BUI&MFNFOU5ZQF-JOF5P
ǘTOJQǘ
CSFBL
DBTF1BUI&MFNFOU5ZQF2VBE$VSWF5P
GPS JOUJOEFYJOEFYOVNCFS0G7FSUFYFTJOEFY
\
GMPBUUGMPBU JOEFY
GMPBU OVNCFS0G7FSUFYFT
GMPBURQ QQ
U
GMPBURQ QQ
U
GMPBUSR RR
U
GMPBUXX XX
U
7FSUFYW7FSUFY IBMG SY
SZ
IBMG X
PVU7FSUFYFTW
^
CSFBL
* not a whole code
Slide 24
Slide 24 text
Playground!
• Yellow – Core Graphics
• Red – Compute Kernel
• Look Good
https://github.com/codelynx/BezierKernelPlayground
Slide 25
Slide 25 text
Other Considerations
Slide 26
Slide 26 text
Double or Triple Buffering
• Avoid access collision between CPU and GPU
Buffer#1 Buffer#2
Buffer#1 Buffer#2
Buffer#1 Buffer#1 Buffer#1
→time
Buffer#1
⚡CPU
⚡GPU
⚡CPU
⚡GPU
Buffer#1 Buffer#2
Buffer#1 Buffer#2
⚡CPU ⚡CPU ⚡CPU ⚡CPU
⚡GPU ⚡GPU ⚡GPU ⚡GPU
→time
Slide 27
Slide 27 text
Buffer Management
• Memory resource is finite, recycle them where possible
• System crashes at device.makeBuffer() rather returning nil
• It is hard to find out the reason (as of iOS10)
• Save memory resources and be a good citizen
Slide 28
Slide 28 text
Wrap Up
Slide 29
Slide 29 text
Wrap Up
• Computing Shaders are much easier than Rendering Shader
• Memory Management could be pain if you wants one more
step toward high performance shader
• Be aware memory alignment
• Shader is hard to debug – no break point nor printf()