Save 37% off PRO during our Black Friday Sale! »

iOS GPU Programming with Swift & Metal

iOS GPU Programming with Swift & Metal

Introduction to using Metal under Swift for iOS GPU programming. Discusses different approaches for passing data between GPU and CPU.

Sadly the deck doesn't include the videos, but they're all available in my "Metal" playlist under my YouTube account: https://www.youtube.com/playlist?list=PLVS70FynAbX5imGT3T7X4kFV-H4THC1Cy

For more information, visit my blog: http://flexmonkey.blogspot.co.uk

Cf40be92f231a15e81512b93a3cd0136?s=128

simon gladman

March 22, 2015
Tweet

Transcript

  1. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey iOS GPU Programming with

    Swift & Metal Is that a Cray in your pocket…? Simon Gladman for Swift Summit March 2015 Twitter: @FlexMonkey Blog: flexmonkey.blogspot.com GitHub: github.com/FlexMonkey
  2. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey What is Metal? •

    Metal is a framework for GPU programming • It’s the lowest level of abstraction compared to technologies such as SpriteKit and SceneKit • More work for the developer, but more powerful • Designed for GPU accelerated 3D graphics and data-parallel computation • It’s tied to iOS and A7 processors or later • Shaders are written in a C++ based language
  3. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey CPU versus GPU •

    CPU’s handle a few software threads at a time • GPUs handle hundreds or thousands of threads in parallel
  4. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Different Types of Shader

    • Vertex • Takes the three dimensional position of each vertex of each triangle in a scene and maps it the the two dimensional screen coordinate. • Fragment • Compute the colour of each pixel. • Manages textures, shadows, highlights, shading.
  5. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Different Types of Shader

    • Kernel • Allows us to build programs that operate in parallel on arrays or grids of data
  6. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Setting up Metal in

    a Swift Project • Create a device let device: MTLDevice = MTLCreateSystemDefaultDevice() • Create a default library let defaultLibrary: MTLLibrary = device.newDefaultLibrary() • Create a command queue let commandQueue: MTLCommandQueue = device.newCommandQueue()
  7. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Setting up Metal: The

    kernel function • Define the kernel function let kernelFunction: MTLFunction = defaultLibrary.newFunctionWithName(“particleRendererShader") • Define the pipeline state let pipelineState: MTLComputePipelineState = device.newComputePipelineStateWithFunction(kernelFunction!, error: nil)
  8. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Setting up Metal: The

    Command Buffer • Define the command buffer let commandBuffer: MTLCommandBuffer = commandQueue.commandBuffer()
  9. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Setting up Metal: The

    Command Encoder • Define the command encoder let commandEncoder: MTLComputeCommandEncoder = commandBuffer.computeCommandEncoder() commandEncoder.setComputePipelineState(pipelineState)
  10. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Preparing the kernel function:

    resources • Set some parameters var particleBrightness: Float = 0.8 let particleBrightnessBuffer: MTLBuffer = device.newBufferWithBytes(&particleBrightness, length: sizeof(Float), options: nil) commandEncoder.setBuffer(particleBrightnessBuffer, offset: 0, atIndex: 2)
  11. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Texture instantiation • Initialise

    textures let textureDescriptor = MTLTextureDescriptor.texture2DDescriptorWithPixelFormat( MTLPixelFormat.RGBA8Unorm, width: Int(imageSide), height: Int(imageSide), mipmapped: false) var textureA: MTLTexture = device.newTextureWithDescriptor(textureDescriptor)
  12. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Passing the texture to

    the kernel function • Define input and output textures var textureA: MTLTexture! commandEncoder.setTexture(textureA, atIndex: 0)
  13. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Creating a Simple Particle

    System • Create a Particle type in Swift struct Particle { var positionX: Float = 0 var positionY: Float = 0 var velocityX: Float = 0 var velocityY: Float = 0 } • Create its equivalent in Metal struct Particle { float positionX; float positionY; float velocityX; float velocityY; };
  14. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Creating a Simple Particle

    System • Create and populate an array of Particle instances in Swift var particles = [Particle]() func setUpParticles() { for _ in 0 ..< particleCount { var positionX = Float(arc4random() % UInt32(imageSide)) var positionY = Float(arc4random() % UInt32(imageSide)) let velocityX = (Float(arc4random() % 10) - 5) / 10.0 let velocityY = (Float(arc4random() % 10) - 5) / 10.0 let particle = Particle( positionX: positionX, positionY: positionY, velocityX: velocityX, velocityY: velocityY) particles.append(particle) } }
  15. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Create one buffer for

    input into kernel function • Determine length of buffer let particleVectorByteLength = particles.count*sizeofValue(particles[0]) • Create and populate the buffer var inVectorBuffer = device.newBufferWithBytes( &particles, length: particleVectorByteLength, options: nil) • Pass the buffer to the shader commandEncoder.setBuffer(inVectorBuffer, offset: 0, atIndex: 0)
  16. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Create one buffer for

    output from kernel function • Define the Swift variable to receive the kernel results var resultdata = [Particle]( count:particles.count, repeatedValue: Particle()) • Create the buffer var outVectorBuffer = device.newBufferWithBytes( &resultdata, length: particleVectorByteLength, options: nil) • Pass the buffer to the shader commandEncoder.setBuffer(outVectorBuffer, offset: 0, atIndex: 1)
  17. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Setting up Metal: Threadgroups

    • Two dimensional thread groups are used for image processing threadGroupCount = MTLSize(width: 8,height: 8,depth: 1) threadGroups = MTLSize(width: 1024 / threadGroupCount.width, width: 1024 / threadGroupCount.height, depth: 1) • A one dimensional thread group Swift to pass Metal a one dimensional array threadGroupCount = MTLSize(width:32, height:1, depth:1) threadGroups = MTLSize(width:(4096 + 31) / 32, height:1, depth:1)
  18. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Executing the kernel function

    • Execute the shader commandEncoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount) commandEncoder.endEncoding() commandBuffer.commit()
  19. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Accessing the arrays in

    the shader • When the kernel function executes, both the input and output arrays are available. kernel void particleRendererShader( const device Particle *inParticle [[ buffer(0) ]], device Particle *outParticle [[ buffer(1) ]], constant float &particleBrightness [[buffer(2)]], uint id [[thread_position_in_grid]]) { const Particle thisParticle = inParticle[id]; outParticle[id].positionX = thisParticle.positionX + thisParticle.velocityX; outParticle[id].positionY = thisParticle.positionY + thisParticle.velocityY; }
  20. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Writing to the output

    texture • Set the color of the pixel at the particle coordinates kernel void particleRendererShader( texture2d<float, access::write> outTexture [[texture(0)]], const device Particle *inParticle [[ buffer(0) ]], uint id [[thread_position_in_grid]]) { const uint2 particlePosition(inParticle[id].positionX, inParticle[id].positionY); const float4 outColor(1.0, 1.0, 0.0, 1.0); // RGBA yellow outTexture.write(outColor, particlePosition); }
  21. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Accessing the updated array

    in Swift • Once the kernel function has completed, we can access the updated array var data = NSData(bytesNoCopy: outVectorBuffer.contents(), length: particles.count*sizeof(Particle), freeWhenDone: false) data.getBytes(&particles, length:particles.count * sizeof(Particle))
  22. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Converting the output texture

    to a UIImage • Create a UIImage from the texture region = MTLRegionMake2D(0, 0, Int(imageSide), Int(imageSide)) textureA.getBytes(&imageBytes, bytesPerRow: bytesPerRowInt, fromRegion: region, mipmapLevel: 0) let providerRef = CGDataProviderCreateWithCFData( NSData(bytes: &self.imageBytes, length: self.providerLength)) imageRef = CGImageCreate(imageSide, imageSide, bitsPerComponent, bitsPerPixel, bytesPerRow, rgbColorSpace, bitmapInfo, providerRef, nil, false, renderingIntent) imageView.image = UIImage(CGImage: imageRef)!
  23. None
  24. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Issues • The main

    performance issue isn’t running the kernel function, it’s moving the data between the GPU and CPU
  25. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Improving Performance With Shared

    Memory • To eliminate this bottleneck, we can share memory between the GPU and CPU • No data being copied - improved performance! • posix_memalign() allocates memory for use by both GPU and CPU
  26. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Shared GPU/CPU Memory •

    Declaring buffers and pointers let particleCount: Int = 2097152 var particlesMemory:UnsafeMutablePointer<Void> = nil let particlesMemoryByteSize:UInt = UInt(2097152) * UInt(sizeof(Particle)) var particlesVoidPtr: COpaquePointer! var particlesParticlePtr: UnsafeMutablePointer<Particle>! var particlesParticleBufferPtr: UnsafeMutableBufferPointer<Particle>! • posix_memalign() for shared memory posix_memalign(&particlesMemory, 0x4000, particlesMemoryByteSize) particlesVoidPtr = COpaquePointer(particlesMemory) particlesParticlePtr = UnsafeMutablePointer<Particle>(particlesVoidPtr) particlesParticleBufferPtr = UnsafeMutableBufferPointer( start: particlesParticlePtr, count: particleCount)
  27. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Shared GPU/CPU Memory •

    Populating data for index in particlesParticleBufferPtr.startIndex ..< particlesParticleBufferPtr.endIndex { let particle = Particle( … ) particlesParticleBufferPtr[index] = particle } • Passing into kernel shader let particlesBufferNoCopy = device.newBufferWithBytesNoCopy( particlesMemory, length: Int(particlesMemoryByteSize), options: nil, deallocator: nil) commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 0) commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 1)
  28. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Image processing inside particle

    compute shader • If the number of particles is more than the number of pixels uint id [[thread_position_in_grid]] // passed in constructor uint2 textureCoordinate(fast::floor(id / imageWidth),id % int(imageWidth)); if (textureCoordinate.x < imageWidth && textureCoordinate.y < imageWidth) { float4 accumColor = inTexture.read(textureCoordinate); for (int j = -1; j <= 1; j++) { for (int i = -1; i <= 1; i++) { uint2 kernelIndex(textureCoordinate.x + i, textureCoordinate.y + j); accumColor.rgb += inTexture.read(kernelIndex).rgb; } } accumColor.rgb = (accumColor.rgb / 10.5f); accumColor.a = 1.0f; outTexture.write(accumColor, textureCoordinate); }
  29. None
  30. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Advanced Particle Systems •

    Particle array is passed into kernel function const device Particle *inParticles [[ buffer(0) ]] • The current index is also passed into kernel function uint id [[thread_position_in_grid]] • Accessing an item Particle thisParticle = inParticles[id]; • Looping over entire array for (uint i = 0; i < 4096; i++) { const Particle otherParticle = inParticles[i]; […]
  31. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Advanced Particle Systems kernel

    void particleRendererShader(texture2d<float, access::write> outTexture [[texture(0)]], texture2d<float, access::read> inTexture [[texture(1)]], const device Particle *inParticles [[ buffer(0) ]], device Particle *outParticles [[ buffer(1) ]], constant SwarmGenome &genomeOne [[buffer(2)]], constant SwarmGenome &genomeTwo [[buffer(3)]], constant SwarmGenome &genomeThree [[buffer(4)]], constant float &particleBrightness [[buffer(5)]], constant float &gravityWellX [[buffer(6)]], constant float &gravityWellY [[buffer(7)]], uint id [[thread_position_in_grid]]) { Particle inParticle = inParticles[id]; const uint2 particlePosition(inParticle.positionX, inParticle.positionY); const int type = int(inParticle.type); const float4 outColor((type == 0 ? particleBrightness : particleBrightness / 2.0), (type == 1 ? particleBrightness : particleBrightness / 2.0), (type == 2 ? particleBrightness : particleBrightness / 2.0), 1.0); float neigbourCount = 0; float localCentreX = 0; float localCentreY = 0; float localDx = 0; float localDy = 0; float tempAx = 0; float tempAy = 0; if (gravityWellX > 1 && gravityWellY > 1) { const float dist = fast::distance(float2(inParticle.positionX, inParticle.positionY), float2(gravityWellX, gravityWellY)); const float factor = (1 / (dist < 1 ? 1 : dist)) * 10; inParticle.velocityX = inParticle.velocityX + (inParticle.positionX - gravityWellX) * factor; inParticle.velocityY = inParticle.velocityY + (inParticle.positionY - gravityWellY) * factor; } const SwarmGenome genome = type == 0 ? genomeOne : type == 1 ? genomeTwo : genomeThree; for (uint i = 0; i < 4096; i++) { if (i != id) { const Particle candidateNeighbour = inParticles[i]; const float dist = fast::distance(float2(inParticle.positionX, inParticle.positionY), float2(candidateNeighbour.positionX, candidateNeighbour.positionY)); if (dist < genome.radius * 100) { localCentreX = localCentreX + candidateNeighbour.positionX; localCentreY = localCentreY + candidateNeighbour.positionY; localDx = localDx + candidateNeighbour.velocityX; localDy = localDy + candidateNeighbour.velocityY; neigbourCount = neigbourCount + 1.0f; float foo = (dist < 1 ? 1 : dist) * (0.001 + genome.c3_seperation) * 50.0f; tempAx = tempAx + (inParticle.positionX - candidateNeighbour.positionX) / foo; tempAy = tempAy + (inParticle.positionY - candidateNeighbour.positionY) / foo; const float randomThree = fast::abs(fast::cos(candidateNeighbour.velocityX + candidateNeighbour.velocityY)); if (randomThree < genome.c4_steering) { const int randomOne = fast::cos(candidateNeighbour.positionX + candidateNeighbour.velocityY); const int randomTwo = fast::sin(candidateNeighbour.positionY + candidateNeighbour.velocityX); tempAx = tempAx + randomOne * 3; tempAy = tempAy + randomTwo * 3; } } } } if (neigbourCount > 0) { localCentreX = localCentreX / neigbourCount; localCentreY = localCentreY / neigbourCount; localDx = localDx / neigbourCount; localDy = localDy / neigbourCount; tempAx = tempAx + (localCentreX - inParticle.positionX) * genome.c1_cohesion; tempAy = tempAy + (localCentreY - inParticle.positionY) * genome.c1_cohesion; tempAx = tempAx + (localDx - inParticle.velocityX) * genome.c2_alignment; tempAy = tempAy + (localDy - inParticle.velocityY) * genome.c2_alignment; // accellerate inParticle.velocityX2 += tempAx; inParticle.velocityY2 += tempAy; float d = fast::sqrt(inParticle.velocityX2 * inParticle.velocityX2 + inParticle.velocityY2 * inParticle.velocityY2); if (d == 0) { d = 0.001f; } float accelerateMultiplier = ((genome.normalSpeed * 4.0f) - d) / d * genome.c5_paceKeeping; inParticle.velocityX2 += inParticle.velocityX2 * accelerateMultiplier; inParticle.velocityY2 += inParticle.velocityY2 * accelerateMultiplier; } inParticle.velocityX = inParticle.velocityX2; inParticle.velocityY = inParticle.velocityY2; inParticle.positionX += inParticle.velocityX; inParticle.positionY += inParticle.velocityY; outParticles[id] = inParticle; if (outParticles[id].positionX <= 0) { outParticles[id].positionX = 800; } else if (outParticles[id].positionX >= 800) { outParticles[id].positionX = 0; } if (outParticles[id].positionY <= 0) { outParticles[id].positionY = 800; } else if (outParticles[id].positionY >= 800) { outParticles[id].positionY = 0; } const float4 inColor = inTexture.read(particlePosition).rgba; outTexture.write(inColor + outColor, particlePosition); const float4 inColor2 = inTexture.read(particlePosition - uint2(1, 1)).rgba; outTexture.write(inColor2 + outColor, particlePosition - uint2(1, 1)); const float4 inColor3 = inTexture.read(particlePosition - uint2(0, 1)).rgba; outTexture.write(inColor3 + outColor, particlePosition - uint2(0, 1)); const float4 inColor4 = inTexture.read(particlePosition - uint2(1, 0)).rgba; outTexture.write(inColor4 + outColor, particlePosition - uint2(1, 0)); }
  32. None
  33. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey Acknowledgements • http://metalbyexample.com/ •

    http://www.raywenderlich.com/ • http://memkite.com/ • http://bingweb.binghamton.edu/~sayama/ • Ida, Beren and Morgan for Swift Summit!
  34. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey iOS GPU Programming with

    Swift & Metal Simon Gladman for Swift Summit March 2015 Twitter: @FlexMonkey Blog: flexmonkey.blogspot.com GitHub: github.com/FlexMonkey