iOS GPU Programming with Swift & Metal

Here are the slides from my talk, "iOS GPU Programming with Swift & Metal", given at AltConf on June 9, 2015.

The GitHub repo for this project is at: https://github.com/FlexMonkey/ParticleLab

simon gladman

June 09, 2015

Transcript

  1. Simon Gladman | flexmonkey.blogspot.com | @FlexMonkey

    iOS GPU Programming with Swift & Metal
    Is that a Cray in your pocket…?
    Simon Gladman for AltConf, June 2015
    Twitter: @FlexMonkey
    Blog: flexmonkey.blogspot.com
    GitHub: github.com/FlexMonkey
  7. What is Metal?

    • Metal is a framework for GPU programming
    • It sits at a lower level of abstraction than technologies such as SpriteKit and SceneKit
    • More work for the developer, but more powerful
    • Designed for GPU-accelerated 3D graphics and data-parallel computation
    • Requires iOS 8 and an A7 processor or later
    • Shaders are written in a C++-based language
  8. CPU versus GPU

    • CPUs handle a few software threads at a time
    • GPUs handle hundreds or thousands of threads in parallel
  10. Different Types of Shader

    • Vertex
        • Takes the three-dimensional position of each vertex of each triangle in a scene and maps it to the two-dimensional screen coordinate.
    • Fragment
        • Computes the colour of each pixel.
        • Manages textures, shadows, highlights, shading.
  11. Different Types of Shader

    • Kernel
        • Allows us to build programs that operate in parallel on arrays or grids of data
  14. Setting up Metal in a Swift Project

    • Create a device
        let device: MTLDevice = MTLCreateSystemDefaultDevice()
    • Create a library
        let defaultLibrary: MTLLibrary = device.newDefaultLibrary()
    • Create a command queue
        let commandQueue: MTLCommandQueue = device.newCommandQueue()
  16. Setting up Metal: The kernel function

    • Define the kernel function
        let kernelFunction: MTLFunction = defaultLibrary.newFunctionWithName("particleRendererShader")
    • Define the pipeline state
        let pipelineState: MTLComputePipelineState =
            device.newComputePipelineStateWithFunction(kernelFunction!, error: nil)
  17. Setting up Metal: The Command Buffer

    • Define the command buffer
        let commandBuffer: MTLCommandBuffer = commandQueue.commandBuffer()
  18. Setting up Metal: The Command Encoder

    • Define the command encoder
        let commandEncoder: MTLComputeCommandEncoder = commandBuffer.computeCommandEncoder()
        commandEncoder.setComputePipelineState(pipelineState)
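
The setup slides use the Metal method names from the Swift 1.2 era. As a rough guide for readers on current SDKs, here is a minimal sketch of the same steps with the renamed methods. `ComputeSetup`, `makeComputeSetup` and `encodeFrame` are hypothetical names introduced for illustration, the kernel name is the one from the slides, and the whole block is guarded so it compiles to nothing where Metal is unavailable.

```swift
#if canImport(Metal)
import Metal

/// Hypothetical container for the objects created in the setup slides.
struct ComputeSetup {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    let pipelineState: MTLComputePipelineState
}

/// Returns nil when Metal, the default library or the named kernel is unavailable.
func makeComputeSetup(kernelName: String) -> ComputeSetup? {
    guard let device = MTLCreateSystemDefaultDevice(),
          let library = device.makeDefaultLibrary(),
          let commandQueue = device.makeCommandQueue(),
          let kernelFunction = library.makeFunction(name: kernelName),
          let pipelineState = try? device.makeComputePipelineState(function: kernelFunction)
    else { return nil }
    return ComputeSetup(device: device, commandQueue: commandQueue, pipelineState: pipelineState)
}

/// Creates a command buffer and compute encoder for one frame.
func encodeFrame(with setup: ComputeSetup) {
    guard let commandBuffer = setup.commandQueue.makeCommandBuffer(),
          let commandEncoder = commandBuffer.makeComputeCommandEncoder() else { return }
    commandEncoder.setComputePipelineState(setup.pipelineState)
    // Buffer, texture and dispatch calls would be encoded here.
    commandEncoder.endEncoding()
    commandBuffer.commit()
}
#endif
```

The newer methods return optionals or throw, so the sketch fails softly with `guard` instead of relying on force unwrapping.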
  21. Preparing the kernel function: resources

    • Set some parameters
        var particleBrightness: Float = 0.8
    • Create and populate the buffer
        let particleBrightnessBuffer: MTLBuffer = device.newBufferWithBytes(
            &particleBrightness, length: sizeof(Float), options: nil)
    • Pass the buffer to the command encoder
        commandEncoder.setBuffer(particleBrightnessBuffer, offset: 0, atIndex: 2)
  22. Texture instantiation

    • Initialise textures
        let textureDescriptor = MTLTextureDescriptor.texture2DDescriptorWithPixelFormat(
            MTLPixelFormat.RGBA8Unorm,
            width: Int(imageSide), height: Int(imageSide), mipmapped: false)
  23. Passing the texture to the kernel function

    • Define input and output textures
        var textureA: MTLTexture = device.newTextureWithDescriptor(textureDescriptor)
        commandEncoder.setTexture(textureA, atIndex: 0)
  24. Creating a Simple Particle System

    • Create a Particle type in Swift
        struct Particle {
            var positionX: Float = 0
            var positionY: Float = 0
            var velocityX: Float = 0
            var velocityY: Float = 0
        }
    • Create its equivalent in Metal
        struct Particle {
            float positionX;
            float positionY;
            float velocityX;
            float velocityY;
        };
  25. Creating a Simple Particle System

    • Create and populate an array of Particle instances in Swift
        var particles = [Particle]()

        func setUpParticles() {
            for _ in 0 ..< particleCount { // particleCount is 250_000
                let positionX = Float(arc4random() % UInt32(imageSide))
                let positionY = Float(arc4random() % UInt32(imageSide))
                let velocityX = (Float(arc4random() % 10) - 5) / 10.0
                let velocityY = (Float(arc4random() % 10) - 5) / 10.0
                let particle = Particle(
                    positionX: positionX, positionY: positionY,
                    velocityX: velocityX, velocityY: velocityY)
                particles.append(particle)
            }
        }
  28. Create one buffer for input into kernel function

    • Determine length of buffer
        let particleVectorByteLength = particles.count * sizeofValue(particles[0])
    • Create and populate the buffer
        var inVectorBuffer = device.newBufferWithBytes(
            &particles, length: particleVectorByteLength, options: nil)
    • Pass the buffer to the command encoder
        commandEncoder.setBuffer(inVectorBuffer, offset: 0, atIndex: 0)
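
`sizeof` and `sizeofValue` were removed in Swift 3 in favour of `MemoryLayout`. A minimal sketch of the buffer-length calculation in current Swift follows; it uses `stride`, the spacing between consecutive array elements, and `particleBufferByteLength` is a hypothetical helper name.

```swift
struct Particle {
    var positionX: Float = 0
    var positionY: Float = 0
    var velocityX: Float = 0
    var velocityY: Float = 0
}

/// Byte length of a buffer holding `count` tightly packed particles.
func particleBufferByteLength(count: Int) -> Int {
    return count * MemoryLayout<Particle>.stride
}

// Four 4-byte floats give a 16-byte stride, so 250,000 particles need 4,000,000 bytes.
let particleVectorByteLength = particleBufferByteLength(count: 250_000)
print(particleVectorByteLength)
```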
  29. Create one buffer for output from kernel function

    • Define the Swift variable to receive the kernel results
        var resultdata = [Particle](count: particles.count, repeatedValue: Particle())
    • Create the buffer
        var outVectorBuffer = device.newBufferWithBytes(
            &resultdata, length: particleVectorByteLength, options: nil)
    • Pass the buffer to the shader
        commandEncoder.setBuffer(outVectorBuffer, offset: 0, atIndex: 1)
  30. Setting up Metal: Threadgroups

    • A one-dimensional thread group for a one-dimensional array
        let threadExecutionWidth = pipelineState.threadExecutionWidth // 32 on iPad
        particle_threadGroupCount = MTLSize(width: threadExecutionWidth, height: 1, depth: 1)
        particle_threadGroups = MTLSize(width: particleCount / threadExecutionWidth, height: 1, depth: 1)
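
Integer division truncates, so the group count above covers every particle only when the particle count is a multiple of the execution width. With the 250,000 particles used earlier and a width of 32, 7,812 groups give 249,984 threads and the last 16 particles are never visited. A small sketch of rounding up instead, where `threadgroupCount` is a hypothetical helper; a kernel dispatched this way would need to return early when its `id` is not below the particle count.

```swift
/// Number of threadgroups needed so that every element gets a thread.
func threadgroupCount(elementCount: Int, threadsPerGroup: Int) -> Int {
    precondition(threadsPerGroup > 0, "threadsPerGroup must be positive")
    return (elementCount + threadsPerGroup - 1) / threadsPerGroup
}

let truncatedGroups = 250_000 / 32
let roundedUpGroups = threadgroupCount(elementCount: 250_000, threadsPerGroup: 32)

// Threads launched by each approach: 249984 versus 250016.
print(truncatedGroups * 32, roundedUpGroups * 32)
```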
  31. Executing the kernel function

    • Execute the shader
        commandEncoder.dispatchThreadgroups(particle_threadGroups,
            threadsPerThreadgroup: particle_threadGroupCount)
        commandEncoder.endEncoding()
        commandBuffer.commit()
  32. Accessing the arrays in the shader

    • When the kernel function executes, both the input and output arrays are available.
        kernel void particleRendererShader(
            const device Particle *inParticle [[ buffer(0) ]],
            device Particle *outParticle [[ buffer(1) ]],
            uint id [[thread_position_in_grid]])
        {
            const Particle thisParticle = inParticle[id];
            outParticle[id].positionX = thisParticle.positionX + thisParticle.velocityX;
            outParticle[id].positionY = thisParticle.positionY + thisParticle.velocityY;
        }
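
A CPU reference of this kernel can be handy for checking GPU results. The sketch below mirrors the per-thread body in Swift, with `id` playing the role of `thread_position_in_grid`. `updateParticle` and `stepParticles` are hypothetical names, and copying the velocity to the output is an assumption added so the result can feed the next frame; the kernel on the slide writes positions only.

```swift
struct Particle {
    var positionX: Float = 0
    var positionY: Float = 0
    var velocityX: Float = 0
    var velocityY: Float = 0
}

/// CPU equivalent of one kernel invocation for the particle at index `id`.
func updateParticle(id: Int, input: [Particle], output: inout [Particle]) {
    let thisParticle = input[id]
    output[id].positionX = thisParticle.positionX + thisParticle.velocityX
    output[id].positionY = thisParticle.positionY + thisParticle.velocityY
    // Assumption: carry the velocity over so the particle keeps moving next frame.
    output[id].velocityX = thisParticle.velocityX
    output[id].velocityY = thisParticle.velocityY
}

/// Runs the per-particle update serially; the GPU runs one thread per index instead.
func stepParticles(_ input: [Particle]) -> [Particle] {
    var output = [Particle](repeating: Particle(), count: input.count)
    for id in input.indices {
        updateParticle(id: id, input: input, output: &output)
    }
    return output
}

let stepped = stepParticles([
    Particle(positionX: 1, positionY: 2, velocityX: 0.5, velocityY: -0.5)
])
print(stepped[0].positionX, stepped[0].positionY)
```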
  37. Writing to the output texture

    • Set the color of the pixel at the particle coordinates
        kernel void particleRendererShader(
            texture2d<float, access::write> outTexture [[texture(0)]],
            const device Particle *inParticle [[ buffer(0) ]],
            constant float &particleBrightness [[buffer(2)]],
            uint id [[thread_position_in_grid]])
        {
            const uint2 particlePosition(inParticle[id].positionX, inParticle[id].positionY);
            const float4 outColor(particleBrightness, particleBrightness, 0.0, 1.0); // RGBA yellow
            outTexture.write(outColor, particlePosition);
        }
  40. Accessing the updated array in Swift

    • Once the kernel function has completed, we can access the updated array
        var data = NSData(bytesNoCopy: outVectorBuffer.contents(),
            length: particles.count * sizeof(Particle), freeWhenDone: false)
        data.getBytes(&particles, length: particles.count * sizeof(Particle))
  41. Converting the output texture to a UIImage

    • Create a UIImage from the texture
        region = MTLRegionMake2D(0, 0, Int(imageSide), Int(imageSide))
        textureA.getBytes(&imageBytes, bytesPerRow: bytesPerRowInt,
            fromRegion: region, mipmapLevel: 0)
        let providerRef = CGDataProviderCreateWithCFData(
            NSData(bytes: &self.imageBytes, length: self.providerLength))
        imageRef = CGImageCreate(imageSide, imageSide, bitsPerComponent, bitsPerPixel,
            bytesPerRow, rgbColorSpace, bitmapInfo, providerRef, nil, false, renderingIntent)
        imageView.image = UIImage(CGImage: imageRef)!
  42. Issues

    • The main performance issue isn’t running the kernel function; it’s moving the data between the GPU and CPU
  43. Improving Performance With Shared Memory

    • To eliminate this bottleneck, we can share memory between the GPU and CPU
    • No data being copied: improved performance!
    • posix_memalign() allocates memory for use by both GPU and CPU
  44. Shared GPU/CPU Memory

    • Declaring buffers and pointers
        let particleCount: Int = 2097152
        var particlesMemory: UnsafeMutablePointer<Void> = nil
        let particlesMemoryByteSize: Int = 2097152 * sizeof(Particle)
        var particlesVoidPtr: COpaquePointer!
        var particlesParticlePtr: UnsafeMutablePointer<Particle>!
        var particlesParticleBufferPtr: UnsafeMutableBufferPointer<Particle>!
    • posix_memalign() for shared memory
        posix_memalign(&particlesMemory, 0x4000, particlesMemoryByteSize)
        particlesVoidPtr = COpaquePointer(particlesMemory)
        particlesParticlePtr = UnsafeMutablePointer<Particle>(particlesVoidPtr)
        particlesParticleBufferPtr = UnsafeMutableBufferPointer(
            start: particlesParticlePtr, count: particleCount)
  46. Shared GPU/CPU Memory

    • Populating data
        for index in particlesParticleBufferPtr.startIndex ..< particlesParticleBufferPtr.endIndex {
            let particle = Particle( … )
            particlesParticleBufferPtr[index] = particle
        }
    • Passing into kernel shader
        let particlesBufferNoCopy = device.newBufferWithBytesNoCopy(
            particlesMemory, length: Int(particlesMemoryByteSize),
            options: nil, deallocator: nil)
        commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 0)
        commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 1)
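
The pointer types on these slides (`UnsafeMutablePointer<Void>`, `COpaquePointer`) predate Swift 3. The following is a minimal sketch of the same allocate, bind and populate sequence with current pointer APIs. It uses a small particle count to keep the example light, and rounding the length up to a whole number of 0x4000-byte pages is an added precaution rather than something the slides do. On Apple platforms the resulting pointer could then be handed to `makeBuffer(bytesNoCopy:length:options:deallocator:)`.

```swift
import Foundation

struct Particle {
    var positionX: Float = 0
    var positionY: Float = 0
    var velocityX: Float = 0
    var velocityY: Float = 0
}

let particleCount = 4096          // the slides use 2_097_152
let pageSize = 0x4000             // alignment used on the slides
let rawByteSize = particleCount * MemoryLayout<Particle>.stride
// Round the length up to a whole number of pages.
let particlesMemoryByteSize = (rawByteSize + pageSize - 1) / pageSize * pageSize

// Allocate page-aligned memory that CPU code can write directly.
var particlesMemory: UnsafeMutableRawPointer? = nil
let status = posix_memalign(&particlesMemory, pageSize, particlesMemoryByteSize)
precondition(status == 0 && particlesMemory != nil, "posix_memalign failed")
let isPageAligned = Int(bitPattern: particlesMemory) % pageSize == 0

// View the raw memory as an array-like buffer of Particle values.
let particlesParticlePtr = particlesMemory!.bindMemory(to: Particle.self, capacity: particleCount)
particlesParticlePtr.initialize(repeating: Particle(), count: particleCount)
let particlesParticleBufferPtr = UnsafeMutableBufferPointer(
    start: particlesParticlePtr, count: particleCount)

// Populate in place; no intermediate Swift array is copied.
for index in particlesParticleBufferPtr.indices {
    particlesParticleBufferPtr[index].positionX = Float(index)
}

let firstX = particlesParticleBufferPtr[0].positionX
let lastX = particlesParticleBufferPtr[particleCount - 1].positionX
print(isPageAligned, firstX, lastX)

free(particlesMemory)
```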
  47. Image processing inside particle compute shader

    • If the number of particles is more than the number of pixels, each thread can also process one pixel
        uint id [[thread_position_in_grid]] // passed in constructor

        uint2 textureCoordinate(fast::floor(id / imageWidth), id % int(imageWidth));
        if (textureCoordinate.x < imageWidth && textureCoordinate.y < imageWidth)
        {
            float4 accumColor = inTexture.read(textureCoordinate);
            for (int j = -1; j <= 1; j++)
            {
                for (int i = -1; i <= 1; i++)
                {
                    uint2 kernelIndex(textureCoordinate.x + i, textureCoordinate.y + j);
                    accumColor.rgb += inTexture.read(kernelIndex).rgb;
                }
            }
            accumColor.rgb = (accumColor.rgb / 10.5f);
            accumColor.a = 1.0f;
            outTexture.write(accumColor, textureCoordinate);
        }
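
The shader above sums ten samples (the centre pixel once, then the full 3×3 neighbourhood including the centre again) and divides by 10.5, so a uniform region comes out at 10 / 10.5, or roughly 95%, of its previous brightness. Presumably this is what makes trails fade over successive frames. A CPU sketch of the same arithmetic on a greyscale image follows; `GreyImage` and `blurred` are hypothetical names, clamping reads to the image edge is an assumption of the sketch, and it maps the index to x = id % side, y = id / side, whereas the slide swaps the two, which covers the same pixels for a square image.

```swift
/// Square greyscale image stored row by row.
struct GreyImage {
    let side: Int
    var pixels: [Float]

    /// Reads a pixel, clamping coordinates to the image edge (sketch assumption).
    func read(x: Int, y: Int) -> Float {
        let clampedX = min(max(x, 0), side - 1)
        let clampedY = min(max(y, 0), side - 1)
        return pixels[clampedY * side + clampedX]
    }
}

/// Applies the blur-and-fade step once, visiting one pixel per linear index.
func blurred(_ image: GreyImage) -> GreyImage {
    var output = image
    for id in 0 ..< image.side * image.side {
        let x = id % image.side
        let y = id / image.side
        var accum = image.read(x: x, y: y)      // centre sample, as on the slide
        for j in -1 ... 1 {
            for i in -1 ... 1 {
                accum += image.read(x: x + i, y: y + j)
            }
        }
        output.pixels[id] = accum / 10.5
    }
    return output
}

let uniform = GreyImage(side: 4, pixels: [Float](repeating: 1.0, count: 16))
let faded = blurred(uniform)
print(faded.pixels[5])
```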
  48. Room for Improvement

    • Another issue is converting the texture data to a UIImage and displaying it in a UIImageView
  49. Using CAMetalLayer

    • This code for managing textures…
        let textureDescriptor = MTLTextureDescriptor.texture2DDescriptorWithPixelFormat(
            MTLPixelFormat.RGBA8Unorm,
            width: Int(imageSide), height: Int(imageSide), mipmapped: false)
        var textureA: MTLTexture = device.newTextureWithDescriptor(textureDescriptor)
    • …and all this code for converting textures to UIImages and displaying them…
        textureA.getBytes(&imageBytes, bytesPerRow: bytesPerRowInt,
            fromRegion: region, mipmapLevel: 0)
        let providerRef = CGDataProviderCreateWithCFData(
            NSData(bytes: &self.imageBytes, length: self.providerLength))
        imageRef = CGImageCreate(imageSide, imageSide, bitsPerComponent, bitsPerPixel,
            bytesPerRow, rgbColorSpace, bitmapInfo, providerRef, nil, false, renderingIntent)
        imageView.image = UIImage(CGImage: imageRef)!
  50. Using CAMetalLayer

    • No need to create textures
    • No need to create UIImage or UIImageView
        if let drawable = nextDrawable() {
            commandEncoder.setTexture(drawable.texture, atIndex: 0)
            commandEncoder.dispatchThreadgroups(particle_threadGroups,
                threadsPerThreadgroup: particle_threadGroupCount)
            commandEncoder.endEncoding()
            commandBuffer.commit()
            drawable.present()
        }
  52. Using CADisplayLink

    • Better than using dispatch_async() or NSTimer
    • CADisplayLink is a timer bound to the display vsync
        let timer = CADisplayLink(target: self, selector: Selector("step"))
        timer.addToRunLoop(NSRunLoop.mainRunLoop(), forMode: NSDefaultRunLoopMode)
  53. Using a vector

    • We can extend our simple particle object to define the properties of four particles using vectors
  54. Using a vector

    • In Swift, we create a Vector4 type representing a single particle
        struct Vector4 {
            var x: Float32 = 0
            var y: Float32 = 0
            var z: Float32 = 0
            var w: Float32 = 0
        }
  55. Using a vector

    • In Swift, we create a Particle type representing four particles
        struct Particle {
            var A: Vector4 = Vector4(x: 0, y: 0, z: 0, w: 0)
            var B: Vector4 = Vector4(x: 0, y: 0, z: 0, w: 0)
            var C: Vector4 = Vector4(x: 0, y: 0, z: 0, w: 0)
            var D: Vector4 = Vector4(x: 0, y: 0, z: 0, w: 0)
        }
  56. Using a vector

    • Passing them back and forth to Metal remains the same using setBuffer()
        let particlesBufferNoCopy = device.newBufferWithBytesNoCopy(
            particlesMemory, length: Int(particlesMemoryByteSize),
            options: nil, deallocator: nil)
        commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 0)
        commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 1)
  57. Inside Metal

    • Instances of particles are now of type float4x4
    • The arguments to the kernel shader look like:
        const device float4x4 *inParticles [[ buffer(0) ]],
        device float4x4 *outParticles [[ buffer(1) ]]
    • …and we access each particle by a subscript, and its position and velocity with the properties ‘x’, ‘y’, ‘z’ and ‘w’
        inParticle[0].x
        inParticle[0].y
        inParticle[0].z
        inParticle[0].w
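
The Swift struct can be read as a float4x4 on the Metal side because both are sixteen consecutive 32-bit floats: four groups of four, 64 bytes in total. A minimal sketch that checks the layout from Swift is shown below; `flattened` is a hypothetical helper added for illustration.

```swift
struct Vector4 {
    var x: Float32 = 0
    var y: Float32 = 0
    var z: Float32 = 0
    var w: Float32 = 0
}

struct Particle {
    var A = Vector4()
    var B = Vector4()
    var C = Vector4()
    var D = Vector4()
}

/// Returns the sixteen floats of a Particle in memory order.
func flattened(_ particle: Particle) -> [Float32] {
    return withUnsafeBytes(of: particle) { rawBuffer in
        Array(rawBuffer.bindMemory(to: Float32.self))
    }
}

var sample = Particle()
sample.B.z = 7      // second vector, third component
let floats = flattened(sample)

// Sizes in bytes: 16 per Vector4 and 64 per Particle.
print(MemoryLayout<Vector4>.size, MemoryLayout<Particle>.size, floats[6])
```

The value written to `B.z` appears at flat index 6, which corresponds to `inParticle[1].z` in the subscript notation used on the slide.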
  58. Inside Metal: Caveat

    • Looping over each of the four particle float definitions was actually slower than copying and pasting the code
        const uint2 particlePositionA(inParticle[0].x, inParticle[0].y);

        if (particlePositionA.x > 0 && particlePositionA.y > 0 &&
            particlePositionA.x < imageWidth && particlePositionA.y < imageHeight)
        {
            outTexture.write(outColor, particlePositionA);
        }
        else if (respawnOutOfBoundsParticles)
        {
            inParticle[0].z = spawnSpeedMultipler * fast::sin(inParticle[0].x + inParticle[0].y);
            inParticle[0].w = spawnSpeedMultipler * fast::cos(inParticle[0].x + inParticle[0].y);
            inParticle[0].x = imageWidth / 2;
            inParticle[0].y = imageHeight / 2;
        }

        const float2 particlePositionAFloat(inParticle[0].x, inParticle[0].y);
        const float distanceZeroA = fast::max(distance_squared(particlePositionAFloat, gravityWellZeroPosition), 0.01);
        const float distanceOneA = fast::max(distance_squared(particlePositionAFloat, gravityWellOnePosition), 0.01);
        const float distanceTwoA = fast::max(distance_squared(particlePositionAFloat, gravityWellTwoPosition), 0.01);
        const float distanceThreeA = fast::max(distance_squared(particlePositionAFloat, gravityWellThreePosition), 0.01);
  59. Metal Particles as a Component

    • https://github.com/FlexMonkey/ParticleLab
        particleLab = ParticleLab(
            width: UInt(view.frame.width),
            height: UInt(view.frame.height),
            numParticles: ParticleCount.TwoMillion)
        particleLab.frame = CGRect(
            x: 0, y: 0,
            width: view.frame.width, height: view.frame.height)
        view.layer.addSublayer(particleLab)
  60. Public Properties

    • Properties
        var particleColor = ParticleColor(R: 1, G: 0.5, B: 0.2, A: 1)
        var dragFactor: Float = 0.97
        var respawnOutOfBoundsParticles = false
        var showGravityWellPositions: Bool
  61. Public Methods

    • Methods
        func resetGravityWells()
        func resetParticles(edgesOnly: Bool = true)
        func getGravityWellNormalisedPosition(#gravityWell: GravityWell) -> (x: Float, y: Float)
        func setGravityWellProperties(#gravityWellIndex: Int,
            normalisedPositionX: Float, normalisedPositionY: Float, mass: Float, spin: Float)
        func setGravityWellProperties(#gravityWell: GravityWell,
            normalisedPositionX: Float, normalisedPositionY: Float, mass: Float, spin: Float)
  62. Metal Particles as a Component

    • Delegate
        protocol ParticleLabDelegate: NSObjectProtocol {
            func particleLabDidUpdate()
            func particleLabMetalUnavailable()
        }
  66. Advanced Particle Systems

    • Particle array is passed into kernel function
        const device Particle *inParticles [[ buffer(0) ]]
    • The current index is also passed into kernel function
        uint id [[thread_position_in_grid]]
    • Accessing an item
        Particle thisParticle = inParticles[id];
    • Looping over entire array
        for (uint i = 0; i < 4096; i++)
        {
            const Particle otherParticle = inParticles[i];
            […]
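
Because every thread loops over the whole array, the work grows with the square of the particle count: 4,096 particles mean 16,777,216 reads per frame. As an illustration of what such a loop typically computes, here is a CPU sketch in Swift that averages the positions of all neighbours within a radius of one particle. `localCentre` is a hypothetical name, and the sketch covers only this one ingredient, not the full swarm logic.

```swift
struct Particle {
    var positionX: Float = 0
    var positionY: Float = 0
    var velocityX: Float = 0
    var velocityY: Float = 0
}

/// Average position of all other particles closer than `radius` to particle `id`,
/// or nil when it has no neighbours.
func localCentre(of id: Int, in particles: [Particle], radius: Float) -> (x: Float, y: Float)? {
    let thisParticle = particles[id]
    var sumX: Float = 0
    var sumY: Float = 0
    var neighbourCount: Float = 0

    for i in particles.indices where i != id {
        let other = particles[i]
        let dx = other.positionX - thisParticle.positionX
        let dy = other.positionY - thisParticle.positionY
        if (dx * dx + dy * dy).squareRoot() < radius {
            sumX += other.positionX
            sumY += other.positionY
            neighbourCount += 1
        }
    }

    if neighbourCount > 0 {
        return (x: sumX / neighbourCount, y: sumY / neighbourCount)
    } else {
        return nil
    }
}

let swarm = [
    Particle(positionX: 0, positionY: 0),
    Particle(positionX: 2, positionY: 0),
    Particle(positionX: 0, positionY: 2),
    Particle(positionX: 100, positionY: 100)
]
let centre = localCentre(of: 0, in: swarm, radius: 5)
let isolated = localCentre(of: 3, in: swarm, radius: 5)
```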
  67. Advanced Particle Systems

        kernel void particleRendererShader(
            texture2d<float, access::write> outTexture [[texture(0)]],
            texture2d<float, access::read> inTexture [[texture(1)]],
            const device Particle *inParticles [[ buffer(0) ]],
            device Particle *outParticles [[ buffer(1) ]],
            constant SwarmGenome &genomeOne [[buffer(2)]],
            constant SwarmGenome &genomeTwo [[buffer(3)]],
            constant SwarmGenome &genomeThree [[buffer(4)]],
            constant float &particleBrightness [[buffer(5)]],
            constant float &gravityWellX [[buffer(6)]],
            constant float &gravityWellY [[buffer(7)]],
            uint id [[thread_position_in_grid]])
        {
            Particle inParticle = inParticles[id];
            const uint2 particlePosition(inParticle.positionX, inParticle.positionY);
            const int type = int(inParticle.type);

            const float4 outColor((type == 0 ? particleBrightness : particleBrightness / 2.0),
                                  (type == 1 ? particleBrightness : particleBrightness / 2.0),
                                  (type == 2 ? particleBrightness : particleBrightness / 2.0),
                                  1.0);

            float neigbourCount = 0;
            float localCentreX = 0;
            float localCentreY = 0;
            float localDx = 0;
            float localDy = 0;
            float tempAx = 0;
            float tempAy = 0;

            if (gravityWellX > 1 && gravityWellY > 1)
            {
                const float dist = fast::distance(float2(inParticle.positionX, inParticle.positionY),
                                                  float2(gravityWellX, gravityWellY));
                const float factor = (1 / (dist < 1 ? 1 : dist)) * 10;
                inParticle.velocityX = inParticle.velocityX + (inParticle.positionX - gravityWellX) * factor;
                inParticle.velocityY = inParticle.velocityY + (inParticle.positionY - gravityWellY) * factor;
            }

            const SwarmGenome genome = type == 0 ? genomeOne : type == 1 ? genomeTwo : genomeThree;

            for (uint i = 0; i < 4096; i++)
            {
                if (i != id)
                {
                    const Particle candidateNeighbour = inParticles[i];
                    const float dist = fast::distance(float2(inParticle.positionX, inParticle.positionY),
                                                      float2(candidateNeighbour.positionX, candidateNeighbour.positionY));

                    if (dist < genome.radius * 100)
                    {
                        localCentreX = localCentreX + candidateNeighbour.positionX;
                        localCentreY = localCentreY + candidateNeighbour.positionY;
                        localDx = localDx + candidateNeighbour.velocityX;
                        localDy = localDy + candidateNeighbour.velocityY;
                        neigbourCount = neigbourCount + 1.0f;

                        float foo = (dist < 1 ? 1 : dist) * (0.001 + genome.c3_seperation) * 50.0f;
                        tempAx = tempAx + (inParticle.positionX - candidateNeighbour.positionX) / foo;
                        tempAy = tempAy + (inParticle.positionY - candidateNeighbour.positionY) / foo;

                        const float randomThree = fast::abs(fast::cos(candidateNeighbour.velocityX + candidateNeighbour.velocityY));
                        if (randomThree < genome.c4_steering)
                        {
                            const int randomOne = fast::cos(candidateNeighbour.positionX + candidateNeighbour.velocityY);
                            const int randomTwo = fast::sin(candidateNeighbour.positionY + candidateNeighbour.velocityX);
                            tempAx = tempAx + randomOne * 3;
                            tempAy = tempAy + randomTwo * 3;
                        }
                    }
                }
            }

            if (neigbourCount > 0)
            {
                localCentreX = localCentreX / neigbourCount;
                localCentreY = localCentreY / neigbourCount;
                localDx = localDx / neigbourCount;
                localDy = localDy / neigbourCount;

                tempAx = tempAx + (localCentreX - inParticle.positionX) * genome.c1_cohesion;
                tempAy = tempAy + (localCentreY - inParticle.positionY) * genome.c1_cohesion;
                tempAx = tempAx + (localDx - inParticle.velocityX) * genome.c2_alignment;
                tempAy = tempAy + (localDy - inParticle.velocityY) * genome.c2_alignment;

                // accelerate
                inParticle.velocityX2 += tempAx;
                inParticle.velocityY2 += tempAy;

                float d = fast::sqrt(inParticle.velocityX2 * inParticle.velocityX2 +
                                     inParticle.velocityY2 * inParticle.velocityY2);
                if (d == 0)
                {
                    d = 0.001f;
                }

                float accelerateMultiplier = ((genome.normalSpeed * 4.0f) - d) / d * genome.c5_paceKeeping;
                inParticle.velocityX2 += inParticle.velocityX2 * accelerateMultiplier;
                inParticle.velocityY2 += inParticle.velocityY2 * accelerateMultiplier;
            }

            inParticle.velocityX = inParticle.velocityX2;
            inParticle.velocityY = inParticle.velocityY2;
            inParticle.positionX += inParticle.velocityX;
            inParticle.positionY += inParticle.velocityY;

            outParticles[id] = inParticle;

            if (outParticles[id].positionX <= 0)
            {
                outParticles[id].positionX = 800;
            }
            else if (outParticles[id].positionX >= 800)
            {
                outParticles[id].positionX = 0;
            }

            if (outParticles[id].positionY <= 0)
            {
                outParticles[id].positionY = 800;
            }
            else if (outParticles[id].positionY >= 800)
            {
                outParticles[id].positionY = 0;
            }

            const float4 inColor = inTexture.read(particlePosition).rgba;
            outTexture.write(inColor + outColor, particlePosition);

            const float4 inColor2 = inTexture.read(particlePosition - uint2(1, 1)).rgba;
            outTexture.write(inColor2 + outColor, particlePosition - uint2(1, 1));

            const float4 inColor3 = inTexture.read(particlePosition - uint2(0, 1)).rgba;
            outTexture.write(inColor3 + outColor, particlePosition - uint2(0, 1));

            const float4 inColor4 = inTexture.read(particlePosition - uint2(1, 0)).rgba;
            outTexture.write(inColor4 + outColor, particlePosition - uint2(1, 0));
        }
  68. New for iOS 9 and OS X El Capitan

    • MetalKit
        • Texture loading
        • Model handling
        • View management
  69. New for iOS 9 and OS X El Capitan

    • SceneKit and SpriteKit
    • Model IO
        • Import 3D Assets
        • Share data buffers with Metal
        • Light probes
        • Light scattering functions
  70. New for iOS 9 and OS X El Capitan

    • Metal Performance Shaders
        • Histograms
        • Gaussian Blur
        • Edge detection
        • Convolution kernels
        • Erode / Dilate
        • Threshold
  71. Acknowledgements

    • http://metalbyexample.com/
    • http://www.raywenderlich.com/
    • http://memkite.com/
    • Jay Fenton (https://github.com/jfenton)
    • http://bingweb.binghamton.edu/~sayama/
  72. iOS GPU Programming with Swift & Metal

    Simon Gladman for AltConf, June 2015
    Twitter: @FlexMonkey
    Blog: flexmonkey.blogspot.com
    GitHub: github.com/FlexMonkey