Slide 1


Spatial Rendering for Apple Vision Pro with ARKit, Metal, and Compositor Services
Warren Moore
December 12, 2024
Metal is a registered trademark of Apple Inc. Apple Vision Pro is a trademark of Apple Inc.

Slide 2


Introduction

Slide 3


About me
Worked at Apple (2013–2014; 2016–2017)
Wrote Metal by Example
Last spoke at SLUG ten years ago (!)
@warrenm
@warrenm.bsky.social

Slide 4


Sample code
4,000 lines of spatial goodness
Physically based rendering engine in Metal
Hand tracking and rendering
Basic spatial interaction
Scene reconstruction and occlusion
…and more!
github.com/metal-by-example/spatial-rendering

Slide 5



Slide 6


Agenda
ARKit
Rendering Concepts
Compositor Services
Interaction & Immersion

Slide 7


ARKit for visionOS

Slide 8


ARKit for visionOS
Topics
Scene understanding
Poses and transforms
Data providers
Running a session
Handling updates

Slide 9


Scene understanding
Building a map of the real world with cameras and sensors
Anchors
• Have a pose relative to ARKit's origin
• Represent an image, a hand, a surface, etc.
• Each anchor type is generated by a different data provider

Slide 10


Poses
Pose = position and orientation
ARKit origin pose
• Determined by the system
• At floor/ground height near the user's feet

Slide 11


Poses
Considered as coordinate spaces
Points expressed as (x, y, z) triplets, relative to the origin
Coordinate = distance along an axis

Slide 12


Poses
Considered as matrices
Position → translation (T)
Orientation → rotation (R)
Scale → scaling (S)
Combine into a TRS matrix (“transform”): M = T · R · S
v_world = M_world · v_model
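
As a sketch of the matrix math above, composed with simd; the makeTRS helper name is illustrative, not from the sample code:

import simd

// Compose a TRS (“transform”) matrix: M = T * R * S
func makeTRS(position: SIMD3<Float>, orientation: simd_quatf, scale: SIMD3<Float>) -> simd_float4x4 {
    var T = matrix_identity_float4x4
    T.columns.3 = SIMD4<Float>(position, 1)                  // translation
    let R = simd_float4x4(orientation)                       // rotation from a quaternion
    let S = simd_float4x4(diagonal: SIMD4<Float>(scale, 1))  // scaling
    return T * R * S
}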

Slide 13


Poses
Anchor transforms

protocol Anchor {
    var originFromAnchorTransform: simd_float4x4 { get }
}

Slide 14


Device anchor transform
Device pose relative to origin
Render content anchored to the real world, or the headset

Slide 15


Scene graphs
Representing hierarchy
Parent-child entity relationships → spatial hierarchy
Model transform = product of local and ancestors' transforms (see the sketch below)
Anchoring an entity locks content in space
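
A minimal sketch of that accumulation, assuming a hypothetical Node type (the sample project's entity type will differ):

import simd

final class Node {
    var transform = matrix_identity_float4x4   // local TRS transform
    weak var parent: Node?

    // Model transform = product of local and ancestors' transforms
    var worldTransform: simd_float4x4 {
        guard let parent else { return transform }
        return parent.worldTransform * transform
    }
}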

Slide 16


Data providers
Ten types as of visionOS 2.0
We'll focus on a few:
• World tracking
• Hand tracking
• Plane tracking
• Scene reconstruction

Slide 17


Running a session
Exclusive to visionOS: ARKitSession
A simple example:

let dataProvider = WorldTrackingProvider()
let session = ARKitSession()
try await session.run([dataProvider])

Slide 18


ARKit
Permissions
No permission required for world tracking
Automatic prompts for permission based on your Info.plist
NSWorldSensingUsageDescription: required for plane tracking, scene reconstruction, light estimation, etc.
NSHandsTrackingUsageDescription: required for hand tracking
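
You can also request authorization explicitly up front with ARKitSession; a minimal sketch:

let session = ARKitSession()
let results = await session.requestAuthorization(for: [.worldSensing, .handTracking])
for (type, status) in results where status != .allowed {
    // Degrade gracefully: disable the features that depend on this data
    print("Authorization for \(type) was not granted: \(status)")
}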

Slide 19


Anchor updates
AnchorUpdateSequence
Each data provider has an anchorUpdates property
AnchorUpdateSequence conforms to AsyncSequence
Can be awaited on by a Task of suitable priority

Task(priority: .low) { [weak self] in
    for await update in provider.anchorUpdates {
        // do something useful
    }
}

Slide 20


Anchor updates
Polling
Ask for anchors at a particular time:

let anchor = worldTrackingProvider.queryDeviceAnchor(atTimestamp: timestamp)

May cause ARKit to interpolate or extrapolate
Can fail
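
Because the query can fail, it returns an optional; a sketch of guarded usage (presentationTime is an assumed, caller-supplied timestamp):

if let deviceAnchor = worldTrackingProvider.queryDeviceAnchor(atTimestamp: presentationTime) {
    let deviceTransform = deviceAnchor.originFromAnchorTransform
    // build per-eye view matrices from deviceTransform…
} else {
    // No pose for that time; reuse the last good device transform
}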

Slide 21


Concepts in 3D Rendering

Slide 22


Concepts in 3D Rendering
Topics
Data on the GPU
The render pipeline
Coordinate spaces

Slide 23


Data on the GPU
Resources
Buffer
• Typeless allocation of memory
• Conforms to the MTLBuffer protocol
Texture
• Formatted image data
• Conforms to the MTLTexture protocol
• Can be used as a render target
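
A minimal sketch of creating both resource types (the sizes and pixel format here are arbitrary placeholders):

import Metal

let device = MTLCreateSystemDefaultDevice()!

// Buffer: a typeless allocation, sized here for 1,024 floats
let buffer = device.makeBuffer(length: 1024 * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

// Texture: formatted image data, usable for shading and as a render target
let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm,
                                                          width: 1024, height: 1024,
                                                          mipmapped: false)
descriptor.usage = [.shaderRead, .renderTarget]
let texture = device.makeTexture(descriptor: descriptor)!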

Slide 24


Data on the GPU
Loading a model
Model file (USDZ, glTF, etc.) → model loader (Model I/O, etc.) → GPU resources (buffers and textures)
[Diagram: a toy drummer model (copyright 2022 Apple Inc.) decomposed into vertex and index data and uploaded into GPU buffers and textures]

Slide 25


Command submission
Introduction
Commands are batched into command buffers, via command encoders
Command buffers are created by command queues
Command submission follows a fire-and-forget pattern (see the sketch below)
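
A sketch of that flow, assuming a Metal device and an already-configured passDescriptor:

let commandQueue = device.makeCommandQueue()!          // long-lived; create once
let commandBuffer = commandQueue.makeCommandBuffer()!  // cheap; create per frame
let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor)!
// “set some state”, “bind this resource”, “draw a mesh”…
encoder.endEncoding()
commandBuffer.commit()  // fire and forget: the GPU executes asynchronously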

Slide 26


Command submission
Command buffer encoding
[Diagram: a command queue vends command buffers; a command encoder records calls like “set some state”, “bind this resource”, “draw a mesh” into the active command buffer, which is then committed]

Slide 27


The render pipeline
Shaders
Vertex function
• Reads model-space position (and other attributes)
• Produces a clip-space position (and other attributes)
Fragment function
• Receives interpolated vertex data
• Determines the color of a sample

Slide 28


The render pipeline
Overview
[Diagram: vertex data and transforms feed the programmable vertex function, then fixed-function vertex post-processing (perspective divide, viewport transform), the rasterizer, the programmable fragment function, and finally raster ops (stencil test, depth test, blending) into the render targets]

Slide 29


The render pipeline
Render pipeline states
[Diagram: a render pipeline descriptor (shader functions, vertex descriptor, blend state, render target pixel formats, …) is compiled into a render pipeline state]

Slide 30


Coordinate spaces
Model space to world space
[Diagram: models positioned relative to the world origin, with a camera]

Slide 31


Coordinate spaces
View space

Slide 32


Coordinate spaces
Clip space
[Diagram: the clip-space volume, bounded at x = ±1, y = ±1, and z = 1]

Slide 33


Coordinate spaces
Vertex processing stage
Model space → world space (model transform)
World space → view space (view transform)
View space → clip space (projection transform)
Implemented by the vertex function: v_clip = P · V · M · v_model
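
The same chain composed on the CPU with simd; a sketch, with the helper name illustrative:

import simd

// v_clip = P * V * M * v_model
func clipPosition(of modelPosition: SIMD3<Float>,
                  model: simd_float4x4,
                  view: simd_float4x4,
                  projection: simd_float4x4) -> SIMD4<Float> {
    let modelViewProjection = projection * view * model
    return modelViewProjection * SIMD4<Float>(modelPosition, 1)
}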

Slide 34


struct VertexAttributes {
    float3 position  [[attribute(0)]];
    float3 normal    [[attribute(1)]];
    float2 texCoords [[attribute(2)]];
};

[[vertex]] VertexOut vertex_main(VertexAttributes in [[stage_in]], ...) {
    VertexOut out {};
    out.position = modelViewProjectionMatrix * float4(in.position, 1.0f);
    // ...
    return out;
}

Slide 35


Coordinate spaces
Vertex post-processing stage
Clip space → NDC (perspective divide)
NDC → viewport space (viewport transform)
Performed during vertex post-processing

Slide 36


Coordinate spaces
Viewport space
[Diagram: viewport space spans (0, 0) to (W-1, H-1)]

Slide 37


Mono vs. stereo rendering
[Diagram: the same scene rendered as separate left and right views]

Slide 38


Viewports
[Diagram: primary and secondary viewports]

Slide 39


Concepts in 3D Rendering
Recap
Load model data into GPU-resident resources
Write shaders to transform vertices and perform lighting & shading
Stereo rendering = rendering the same scene from two different viewpoints

Slide 40


Compositor Services

Slide 41


ImmersiveSpace
An ImmersiveSpace is a SwiftUI Scene
Hosts spatial content (e.g., RealityView or LayerRenderer)
Can be in one of three immersion styles
• Full
• Progressive
• Mixed

Slide 42


ImmersiveSpace
Example

struct SpatialApp: App {
    @State var selectedImmersionStyle: (any ImmersionStyle) = .mixed

    var body: some Scene {
        ImmersiveSpace() {
            // content
        }
        .immersionStyle(selection: $selectedImmersionStyle, in: .mixed, .full)
    }
}

Slide 43


Immersive space
Opening on launch

Slide 44


Layouts
[Diagram: the dedicated layout (one texture per view), the shared layout (one texture, one viewport per view), and the layered layout (one texture array, slice 0 and slice 1)]

Slide 45


LayerRenderer
Vended by CompositorLayer, which conforms to ImmersiveSpaceContent
Connects your Metal content to Compositor Services
Provides frame objects

Slide 46


LayerRenderer
Example

ImmersiveSpace() {
    CompositorLayer(configuration: LayerConfiguration()) { layerRenderer in
        Task.detached(priority: .high) {
            await renderLoop(layerRenderer)
        }
    }
}

Slide 47


LayerRenderer
Configuration
CompositorLayerConfiguration protocol
makeConfiguration method
• Enable/disable foveation
• Select layout
• Select pixel formats

Slide 48


LayerRenderer
Configuration example

func makeConfiguration(capabilities: LayerRenderer.Capabilities,
                       configuration: inout LayerRenderer.Configuration) {
    if capabilities.supportsFoveation {
        configuration.isFoveationEnabled = true
        configuration.layout = .layered
    } else {
        configuration.layout = .dedicated
    }
    configuration.colorFormat = .rgba16Float
    configuration.depthFormat = .depth32Float
}

Slide 49


The render loop
Preparation
Start ARKit session
Load scene content
Create render pipeline states and other long-lived Metal objects

Slide 50


The render loop
LayerRenderer states

func run(_ layerRenderer: LayerRenderer) async {
    while true {
        switch layerRenderer.state {
        case .paused:
            layerRenderer.waitUntilRunning()
        case .running:
            autoreleasepool { renderFrame() }
        case .invalidated:
            return
        }
    }
}

Slide 51


The render loop
Frame timing
Query frame and predict timing
Update: frame.startUpdate() / frame.endUpdate()
• Process input events, etc.
Submit: frame.startSubmission() / frame.endSubmission()
• Query drawable
• Encode rendering work
• Present
(A sketch of one frame follows.)
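
A minimal sketch of one frame following that flow (error handling and command buffer setup elided):

func renderFrame(_ layerRenderer: LayerRenderer) {
    guard let frame = layerRenderer.queryNextFrame() else { return }

    frame.startUpdate()
    // process input events, advance animations, step physics…
    frame.endUpdate()

    guard let timing = frame.predictTiming() else { return }
    LayerRenderer.Clock().wait(until: timing.optimalInputTime)

    frame.startSubmission()
    guard let drawable = frame.queryDrawable() else { return }
    // encode rendering work, set drawable.deviceAnchor,
    // call drawable.encodePresent(commandBuffer:) and commit
    frame.endSubmission()
}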

Slide 52


Stereo rendering
Drawables
[Diagram: a drawable vends resources (color textures, depth textures, rasterization rate maps) and data (views, projection matrices, device anchor)]

Slide 53


Stereo rendering
View matrices
drawable.views[viewIndex].transform
Eye pose relative to device anchor
V = (M_device · M_view)⁻¹ (see the sketch below)
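
In code, that formula might look like this sketch (deviceAnchor from queryDeviceAnchor, drawable from the current frame):

let deviceTransform = deviceAnchor.originFromAnchorTransform   // M_device
let eyeTransform = drawable.views[viewIndex].transform         // M_view
let viewMatrix = (deviceTransform * eyeTransform).inverse      // V = (M_device · M_view)⁻¹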

Slide 54


Stereo rendering
Projection matrices
drawable.computeProjection(viewIndex:)
Asymmetric, overlapping view frustums
Based on device optics
Must be honored to avoid discomfort

Slide 55


Stereo rendering
Dedicated layout
Render one pass per eye
Not very efficient
• Render target changes
• No shared work between eyes (twice the draw calls)

Slide 56


Stereo rendering
Render pass descriptor (Dedicated)

func makeRenderPassDescriptor(for drawable: LayerRenderer.Drawable, passIndex: Int) -> MTLRenderPassDescriptor {
    let passDescriptor = MTLRenderPassDescriptor()
    passDescriptor.colorAttachments[0].loadAction = .clear
    passDescriptor.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
    passDescriptor.colorAttachments[0].texture = drawable.colorTextures[passIndex]
    passDescriptor.colorAttachments[0].storeAction = .store
    passDescriptor.depthAttachment.loadAction = .clear
    passDescriptor.depthAttachment.clearDepth = 0.0
    passDescriptor.depthAttachment.texture = drawable.depthTextures[passIndex]
    passDescriptor.depthAttachment.storeAction = .store
    return passDescriptor
}

Slide 57


Stereo rendering
Frame encoding (Dedicated)

for passIndex in 0..<drawable.views.count {
    // create a render command encoder using the pass descriptor for this view
    // bind pipeline and resources
    // issue draw calls, then end encoding
}

Slide 58


Advanced rendering
Vertex amplification
Invoke the vertex pipeline multiple times for each vertex
Halves the draw count when combined with layered rendering
Specify primitive topology and amplification count up-front:

if device.supportsVertexAmplificationCount(2) {
    renderPipelineDescriptor.inputPrimitiveTopology = .triangle
    renderPipelineDescriptor.maxVertexAmplificationCount = 2
}

Slide 59


Advanced rendering
Layered rendering
Adapt shaders to be amplification-aware
Target each vertex to a render target slice
Combined with vertex amplification → render both eyes simultaneously

Slide 60


Layered rendering
Frame encoding differences

passDescriptor.renderTargetArrayLength = drawable.colorTextures[0].arrayLength
// create render command encoder
// bind pipeline
// bind resources
renderCommandEncoder.setVertexAmplificationCount(2, viewMappings: nil)
// issue draw calls

Slide 61


Layered rendering
Shader differences (1/3)

struct PassConstants {
    float4x4 viewMatrices[2];
    float4x4 projectionMatrices[2];
    float3 cameraPositions[2];
};

Slide 62


Layered rendering
Shader differences (2/3)

struct VertexOut {
    float4 clipPosition [[position]];
    float3 normal;
    float2 texCoords;
    uint renderTargetSlice [[render_target_array_index]];
};

Slide 63


Layered rendering
Shader differences (3/3)

vertex VertexOut vertex_main(VertexIn in [[stage_in]],
                             constant PassConstants &frame,
                             uint viewIndex [[amplification_id]]) {
    float4x4 viewMatrix = frame.viewMatrices[viewIndex];
    float4x4 projectionMatrix = frame.projectionMatrices[viewIndex];
    VertexOut out { ... };
    out.renderTargetSlice = viewIndex;
    return out;
}

Slide 64


Passthrough rendering
Clear the color target to alpha = 0
Use premultiplied blending for correct compositing (see the sketch below)
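
A sketch of a premultiplied-alpha blend state on the render pipeline descriptor; these are the standard premultiplied factors, not necessarily the sample's exact configuration:

let colorAttachment = renderPipelineDescriptor.colorAttachments[0]!
colorAttachment.isBlendingEnabled = true
colorAttachment.sourceRGBBlendFactor = .one              // colors are already premultiplied
colorAttachment.destinationRGBBlendFactor = .oneMinusSourceAlpha
colorAttachment.sourceAlphaBlendFactor = .one
colorAttachment.destinationAlphaBlendFactor = .oneMinusSourceAlpha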

Slide 65


Foveated rendering
[Photo by john ko on Unsplash, annotated with foveal, intermediate, and peripheral regions of the visual field]

Slide 66


Foveated rendering
Rasterization rate maps
[Diagram: screen regions rendered at full, reduced, and limited resolution]

Slide 67


Foveated rendering
Compositor Services

passDescriptor.rasterizationRateMap = drawable.rasterizationRateMaps[passIndex]

Not available on every combination of platform and layout

Slide 68


Compositor Services
Recap
Present immersive content with LayerRenderer
Choose a layout that works with your engine (prefer layered)
Work with Compositor Services to time your frame submission
Use Metal features like vertex amplification and rasterization rate maps

Slide 69


Interaction & Immersion

Slide 70


Interaction & Immersion
Topics
Spatial gestures
Hand tracking and rendering
Scene reconstruction and occlusion
Physics

Slide 71


Spatial gestures
SpatialEventCollection
No SpatialTapGesture, etc. (no RealityKit entities!)
Subscribe via LayerRenderer.onSpatialEvent
Hand pose and (static) gaze direction for indirect pinch gestures
Pay attention to concurrency (see the sketch below)
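
A sketch of subscribing; note the closure may run off your render thread, so hand events over to your engine safely:

layerRenderer.onSpatialEvent = { events in
    for event in events {
        switch event.phase {
        case .active:
            // pinch began or moved: read event.inputDevicePose / event.selectionRay
            break
        case .ended, .cancelled:
            break
        default:
            break
        }
    }
}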

Slide 72


Hand tracking
HandTrackingProvider
Query HandAnchors at a given time (similar to DeviceAnchor):

func handAnchors(at timestamp: TimeInterval) -> (left: HandAnchor?, right: HandAnchor?)

Use ARKit extrapolation to get low-latency hand poses:

let predictedTiming = frame.predictTiming()
let timestamp = predictedTiming.trackableAnchorTime

Slide 73


Hand tracking
Hand skeletons
26 estimated joint poses
Wrist joint pose = hand anchor pose
Render stylized hands in full immersion
Attach colliders for direct interaction (see the sketch below)
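
A sketch of walking the skeleton's joints to place colliders (API names from ARKit's HandSkeleton):

if let skeleton = handAnchor.handSkeleton {
    for joint in skeleton.allJoints where joint.isTracked {
        // Joint transforms are relative to the hand anchor (the wrist)
        let jointTransform = handAnchor.originFromAnchorTransform * joint.anchorFromJointTransform
        // position a collider or skinning matrix at jointTransform…
    }
}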

Slide 74


Hand tracking
Hand rendering
Vertex skinning on the GPU with transform feedback

float4 weights = vert.jointWeights;
float4x4 skinningMatrix = weights[0] * jointTransforms[vert.jointIndices[0]] +
                          weights[1] * jointTransforms[vert.jointIndices[1]] +
                          weights[2] * jointTransforms[vert.jointIndices[2]] +
                          weights[3] * jointTransforms[vert.jointIndices[3]];
float3 skinnedPosition = (skinningMatrix * float4(vert.position, 1.0f)).xyz;

Slide 75


Scene reconstruction
Mesh anchors
SceneReconstructionProvider → MeshAnchor
Anchor geometry approximates real-world shapes
Geometry contains MTLBuffers: no need to copy (see the sketch below)
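
A sketch of pulling the Metal buffers out of a mesh anchor's geometry (assuming triangle faces):

let geometry = meshAnchor.geometry
let vertexBuffer = geometry.vertices.buffer   // MTLBuffer of packed vertex positions
let vertexCount = geometry.vertices.count
let indexBuffer = geometry.faces.buffer       // MTLBuffer of face indices
let indexCount = geometry.faces.count * 3     // three indices per triangle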

Slide 76



Slide 77


Scene reconstruction
Occlusion material
Render MeshAnchor geometry before the rest of the scene
Disable writing to the render buffer by setting the color write mask:

renderPipelineDescriptor.colorAttachments[0].writeMask = []

Still writes to the depth buffer, occluding virtual content

Slide 78


Scene reconstruction
Going further with shadows
Mesh anchor geometry is a great “shadow catcher”
Return black (or another shadow color) from the fragment function, with alpha = shadow intensity

Slide 79


Physics
Looking to the Horizon
All content, games titles, trade names, trademarks, artwork and associated imagery are trademarks and/or copyright material of their respective owners.

Slide 80


Physics
Getting a Jolt
Jolt Physics
• MIT licensed
• Multi-core rigid body physics engine
• Written in C++ 😎
• For games and VR applications
[Image: Jolt Physics ragdoll demo]

Slide 81


Physics
Topics
Physics shapes and bodies
Coupling the scene graph and simulation
Interaction via hit-testing

Slide 82


Physics
Shapes
Simplified geometric representation of a mesh
Can be idealized (sphere, box, capsule)
Or a convex hull or general polyhedron

Slide 83


Physics
Bodies
Hold properties like mass, friction, restitution
Static bodies
• Not simulated, don't collide, can be collided with
Dynamic bodies
• Simulated, subject to forces, collide with all other bodies
Kinematic bodies
• Driven by user input or animations
• Don't respond to collisions or forces

Slide 84


Physics
Coupling
During the update phase (a sketch follows):
Copy kinematic transforms to the physics world
Run physics simulation steps
Copy transforms of moved objects back to the scene graph
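
A hypothetical sketch of that coupling; BodyType, PhysicsEntity, and stepSimulation are stand-ins for the engine's own types, which talk to Jolt (e.g., via C++ interop):

import simd

enum BodyType { case `static`, dynamic, kinematic }

final class PhysicsEntity {
    var bodyType: BodyType = .dynamic
    var worldTransform = matrix_identity_float4x4
    var physicsTransform = matrix_identity_float4x4   // mirrored simulation state
}

func stepSimulation(_ deltaTime: Float) { /* Jolt's PhysicsSystem::Update would run here */ }

func updatePhysics(entities: [PhysicsEntity], deltaTime: Float) {
    for entity in entities where entity.bodyType == .kinematic {
        entity.physicsTransform = entity.worldTransform   // scene graph → simulation
    }
    stepSimulation(deltaTime)
    for entity in entities where entity.bodyType == .dynamic {
        entity.worldTransform = entity.physicsTransform   // simulation → scene graph
    }
}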

Slide 85


Physics
Hit testing
[Diagram: a ray cast from the pinch pose along the gaze direction]

Slide 86



Slide 87


Conclusion

Slide 88


Conclusion
ARKit provides a rich set of data streams for scene understanding
Metal and Compositor Services enable virtually limitless spatial rendering
Physics and interaction are D.I.Y., but aided by scene understanding

Slide 89


Conclusion
Q&A
Sample code available on GitHub: github.com/metal-by-example/spatial-rendering

Slide 90


Supplemental Slides

Slide 91


Learning the fundamentals
Computer Graphics from Scratch is a terrific introduction to graphics programming topics:
• 3D mathematics
• Ray tracing
• Rasterization
Read this if you want to really understand what your GPU is doing!