"INTRODUCTION TO DIRECT X RAYTRACING" - Days of Future Past -

CG 技術の実装と数理 2018 (Implementation and Mathematics of CG Technology 2018), Shinya Morishige, 2018

STORY
1. Raytracing implementation on the CPU and GPU
2. DirectX Raytracing implementation
3. AO rays and indirect rays (Hybrid Rendering)
4. Open issues and outlook

This talk focuses on the CPU, GPU, and DXR implementations.

RAY TRACING IMPLEMENTATION ON THE CPU AND GPU
1. Create Camera Sensor
2. Build Scene Structure
3. Ray Generation
4. Trace Ray (until camera far)
   4-1. Geometry Intersection
   4-2. Lighting and Shading: Sample Radiance and Shadow
5. Final Output

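CPU VS. GPU
CPU Pros
- Trace Ray can be implemented recursively.
- The pseudo-random numbers and probability distributions used for Ray Scattering are easy to implement.
- Easy to debug.
- Stack and memory are plentiful and branch prediction is available, so data structures that accelerate ray tracing are easy to implement.
CPU Cons
- Parallel execution (many-core hardware is costly).
GPU Pros
- SIMD parallelism executes tens to hundreds of times faster than a CPU, which enables real-time rendering.
GPU Cons
- Recursion is not available by default.
- There is no built-in pseudo-random number function.
- Memory is limited (AMD Vega and later can handle large capacities with HBM2 and a directly attached SSD).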

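4-2. DIFFUSE MATERIAL
diffuse = diffuse reflection (https://en.wikipedia.org/wiki/Diffuse_reflection)
[Figure: the camera at origin looks at an image plane with corners (-2,-1,-1), (2,-1,-1), (-2,1,-1), (2,1,-1); ray = origin + direction passes through (u,v) and hits the surface at p (ray hit point). Ray Scattering: random rays are sampled in the range [-1,1]^3 inside the sphere S placed along the Normal at p.]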

4-2. RAY SCATTERING
https://en.wikipedia.org/wiki/Diffuse_reflection
[Figure: image plane with corners (-2,-1,-1), (2,-1,-1), (-2,1,-1), (2,1,-1), pixel (u,v), camera origin, p (ray hit point), Normal, and random rays sampled in the range [-1,1]^3 inside the sphere S.]

    bool Scatter(const CpuRay& r_in, const CpuRayPayload& rec,
                 Vector3& attenuation, CpuRay& scattered) const
    {
        // s: random target point inside the unit sphere that touches the surface at the hit point
        Vector3 s = rec.m_Position + rec.m_Normal + random_in_unit_sphere();
        // CpuRay(origin, direction)
        scattered = CpuRay(rec.m_Position, s - rec.m_Position);
        attenuation = m_Albedo;
        return true;
    }

    Vector3 random_in_unit_sphere()
    {
        Vector3 p;
        do {
            // each component of p is in [-1, 1]
            p = 2.0 * Vector3(drand48(), drand48(), drand48()) - Vector3(1, 1, 1);
        } while (squared_length(p) >= 1.0); // reject points outside the unit sphere
        return p;
    }
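The rejection loop in random_in_unit_sphere() can be sketched in portable C++ as follows. This is a minimal sketch, not the author's code: `Vec3`, `SquaredLength`, and the optional `attempts` counter are hypothetical names, and `std::mt19937` stands in for `drand48()`, which is a POSIX function and not available on every compiler. Because the unit sphere fills pi/6 (about 52%) of the cube [-1,1]^3, the loop needs a little under two attempts on average.

```cpp
#include <cassert>
#include <random>

// Hypothetical minimal vector type for this sketch.
struct Vec3 { double x, y, z; };

inline double SquaredLength(const Vec3& v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// Rejection sampling: draw points uniformly in the cube [-1,1]^3 and
// keep the first one that falls strictly inside the unit sphere.
// If 'attempts' is non-null it receives the number of draws that were needed.
template <class Rng>
Vec3 RandomInUnitSphere(Rng& rng, int* attempts = nullptr) {
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    int draws = 0;
    Vec3 p;
    do {
        p = Vec3{dist(rng), dist(rng), dist(rng)};
        ++draws;
    } while (SquaredLength(p) >= 1.0);
    if (attempts != nullptr) {
        *attempts = draws;
    }
    assert(SquaredLength(p) < 1.0);
    return p;
}
```

Passing the generator in by reference keeps the function deterministic for a given seed, which helps when comparing CPU and GPU results.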

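GPU IMPLEMENTATION CONSIDERATIONS
1. Recursive execution of the Trace Ray function
   → Run it a fixed number of times inside the Compute Shader (naive).
2. Pseudo-random numbers used for Ray Scattering
   → Implement a pseudo-random number generator.
   → For diffuse reflection and occlusion, we want a distribution that samples uniformly on the sphere.
3. How to pass Material, Lights, and similar information to the GPU.
4. Final Output is written directly to VRAM, so it poses no problem.
   → No VRAM transfer is needed, which suits Real-Time Rendering.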
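Item 1 replaces recursion with a bounded loop. The sketch below illustrates why that is possible for a simple diffuse path: the recursive form multiplies the attenuation on the way back up, while the loop carries the product forward as a running throughput. The toy model (a ray that hits `hitCount` surfaces with constant `albedo` and then escapes to a sky of brightness `sky`) and all function names are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cmath>

// Recursive form, as a CPU tracer would write it.
// The ray hits a surface at every depth below hitCount, then misses and sees the sky.
inline float RadianceRecursive(int depth, int hitCount, float albedo, float sky, int maxDepth) {
    if (depth >= maxDepth) return 0.0f;   // bounce limit reached: no contribution
    if (depth >= hitCount) return sky;    // miss: return the sky radiance
    return albedo * RadianceRecursive(depth + 1, hitCount, albedo, sky, maxDepth);
}

// Iterative form, suitable for a shader without recursion.
// 'throughput' accumulates the product of attenuations along the path.
inline float RadianceIterative(int hitCount, float albedo, float sky, int maxDepth) {
    float throughput = 1.0f;
    for (int depth = 0; depth < maxDepth; ++depth) {
        if (depth >= hitCount) return throughput * sky;  // miss
        throughput *= albedo;                            // scatter and continue
    }
    return 0.0f;                                         // bounce limit reached
}
```

With `albedo = 0.5`, `sky = 1`, three hits, and a limit of eight bounces, both forms evaluate to 0.125.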

4. TRACE RAY

    [numthreads(8, 8, 1)]
    void CSMain(uint3 DTid : SV_DispatchThreadID)
    {
        uint width, height;
        RenderTarget.GetDimensions(width, height);
        int actualSampleCount = 0;
        float2 uv = float2(DTid.xy + float2(0.5f, 0.5f)) / float2(width - 1, height - 1);
        float3 color = float3(0.0f, 0.0f, 0.0f);
        for (int sample = 0; sample < sampleCount; ++sample) {
            // Jitter the sample position inside the pixel.
            // (The slide passes uv2 to Noise() before uv2 is initialized; uv is used here instead.)
            float2 uv2 = uv + Noise(uv + randomSeed.y * DTid.z) * float2(1 / (float)width, 1 / (float)height);
            uv2.y = 1.0f - uv2.y;
            Ray ray = CreateCameraRay(uv2);
            for (int bounce = 0; bounce < bounceMax; ++bounce) {
                bool isScattered = SampleColor(ray, uv2);
                if (!isScattered) {
                    color += lerp(ray.color, float3(0, 0, 0), ray.bounces > bounceMax);
                    actualSampleCount++;
                    break;
                }
            }
        } // end for sample
        // (the rest of the shader is cut off on the slide)
    }

[Figure: image plane with corners (-2,-1,-1), (2,-1,-1), (-2,1,-1), (2,1,-1); ray = origin + direction through (u,v) from origin.]

The recursion of Trace Ray is realized by running it a fixed number of times, up to bounceMax, in Compute. However, this approach causes divergence among the Compute threads.
- If all 8x8 threads finish at the same bounce, efficiency does not drop.
- If 63 of the 64 threads finish after 1 bounce and 1 thread runs to bounceMax, all threads wait until the bounceMax thread finishes.
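The 63-versus-1 case above can be put into numbers with a small cost model. This is a deliberately simplified sketch: it assumes that the whole 8x8 group runs in lockstep, so every thread occupies its lane until the slowest thread finishes. Real hardware schedules waves of 32 or 64 lanes, so the actual figures depend on the GPU. `GroupCost` and `EstimateGroupCost` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct GroupCost {
    int lockstepSteps;  // lane-steps occupied when every thread waits for the slowest one
    int usefulSteps;    // lane-steps that actually traced a bounce
};

// bouncesPerThread[i] is the number of bounces thread i needs before its ray terminates.
inline GroupCost EstimateGroupCost(const std::vector<int>& bouncesPerThread) {
    assert(!bouncesPerThread.empty());
    const int longest = *std::max_element(bouncesPerThread.begin(), bouncesPerThread.end());
    const int useful = std::accumulate(bouncesPerThread.begin(), bouncesPerThread.end(), 0);
    return GroupCost{longest * static_cast<int>(bouncesPerThread.size()), useful};
}
```

For 63 threads that stop after 1 bounce and one thread that runs 8 bounces, the model gives 512 occupied lane-steps for 71 useful ones, a utilization of roughly 14%.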

4. TRACE RAY (IMPROVED)

    [numthreads(8, 8, 1)]
    void CSMain(uint3 DTid : SV_DispatchThreadID)
    {
        float3 color = float3(0.0f, 0.0f, 0.0f);
        // Load the ray left by the previous dispatch from the buffer
        Ray ray = RayBuffer[uint2(DTid.x, DTid.y)];
        // uv2 and actualSampleCount are not defined on this slide;
        // presumably they are set up as in the previous version.
        bool isScattered = SampleColor(ray, uv2);
        if (!isScattered) {
            color += lerp(ray.color, float3(0, 0, 0), ray.bounces > bounceMax);
        }
        // Write the updated ray back to the buffer
        ray.accumColor = color = sqrt(color / float(actualSampleCount));
        RayBuffer[uint2(DTid.x, DTid.y)] = ray;
    }

[Figure: image plane with corners (-2,-1,-1), (2,-1,-1), (-2,1,-1), (2,1,-1); ray = origin + direction through (u,v) from origin.]

How to avoid divergence in the Trace Ray recursion:
- Read and write the Ray itself through a UAV Buffer.
- Repeat Dispatch() multiple times to realize the recursion.
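The repeated-Dispatch() scheme can be simulated on the CPU to see how it behaves. In the sketch below, each call to `DispatchOnce` advances every live ray in the buffer by exactly one bounce, which is what one compute dispatch over the ray buffer would do. `RayState`, `DispatchOnce`, and `TraceWithRepeatedDispatch` are hypothetical names, and the early exit on "no live rays" is an assumption: on a real GPU it would need a counter readback or an indirect dispatch, whereas simply issuing bounceMax dispatches also works.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// State kept per pixel in the ray buffer between dispatches.
struct RayState {
    int bounces = 0;           // bounces traced so far
    int bouncesUntilMiss = 0;  // toy scene: the ray misses after this many hits
    bool alive = true;         // false once the ray has terminated
};

// One "dispatch": every live ray advances by exactly one bounce.
// Returns the number of rays that scattered and are still alive.
inline std::size_t DispatchOnce(std::vector<RayState>& rayBuffer) {
    std::size_t aliveCount = 0;
    for (RayState& ray : rayBuffer) {
        if (!ray.alive) continue;
        if (ray.bounces >= ray.bouncesUntilMiss) {  // the ray misses: terminate it
            ray.alive = false;
            continue;
        }
        ++ray.bounces;                              // the ray scatters: keep it for the next dispatch
        ++aliveCount;
    }
    return aliveCount;
}

// Host loop: repeat the dispatch up to bounceMax times, stopping early when no ray is left.
inline int TraceWithRepeatedDispatch(std::vector<RayState>& rayBuffer, int bounceMax) {
    int dispatchCount = 0;
    for (int i = 0; i < bounceMax; ++i) {
        ++dispatchCount;
        if (DispatchOnce(rayBuffer) == 0) break;
    }
    return dispatchCount;
}
```

Within a single dispatch every thread does one bounce of work, so no thread waits for a long path; the price is the extra buffer traffic and dispatch overhead.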

4-2. RAY SCATTERING
[Figure: image plane with corners (-2,-1,-1), (2,-1,-1), (-2,1,-1), (2,1,-1), pixel (u,v), camera origin, p (ray hit point), Normal, and random rays sampled in the range [-1,1]^3 inside the sphere S.]

    bool MaterialScatter(in uint materialIndex, in Ray r_in, in Payload rec,
                         inout float3 attenuation, inout Ray scattered)
    {
        float3 s = rec.position + rec.normal + RandomInUnitSphere(rec.uv);
        scattered = CreateRay(rec.position, s - rec.position);
        scattered.origin = rec.position + rec.normal;
        scattered.direction = s - rec.position;
        scattered.color = r_in.color;
        scattered.bounces = r_in.bounces;
        scattered.material = materialIndex;
        attenuation = rec.albedo;
        return true; // missing on the slide, although the function is declared bool
    }

- For pseudo-random numbers in the Compute Shader, a random sequence on the sphere is generated in advance on the CPU, written to a Buffer, and read in Compute.
- That alone produces a biased pattern, so a Noise() function takes the uv of the primary ray as input, and its result is used to sample the random sequence on the sphere (Interleaved Gradient Noise).
  www.iryoku.com/next-generation-post-processing-in-call-of-duty-advanced-warfare
- Generating the random sequence on the GPU instead raises the processing load, but it avoids the latency of random access.
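The Interleaved Gradient Noise mentioned above is a two-line function in the cited presentation. The C++ transcription below is a sketch: the three constants are the ones published there, while `Frac`, `NoiseToSampleIndex`, and the idea of turning the noise value into an index into the precomputed direction buffer are assumptions about how the slide's Noise() is used, not the author's code.

```cpp
#include <cassert>
#include <cmath>

// Fractional part, the C++ counterpart of HLSL frac().
inline float Frac(float v) {
    return v - std::floor(v);
}

// Interleaved Gradient Noise for a pixel position (in pixels, non-negative).
// Returns a value in [0, 1).
inline float InterleavedGradientNoise(float pixelX, float pixelY) {
    const float dotProduct = 0.06711056f * pixelX + 0.00583715f * pixelY;
    return Frac(52.9829189f * Frac(dotProduct));
}

// Hypothetical helper: map the noise value to an index into a buffer of
// 'sampleCount' precomputed directions on the sphere.
inline int NoiseToSampleIndex(float noise, int sampleCount) {
    assert(sampleCount > 0);
    const int index = static_cast<int>(noise * static_cast<float>(sampleCount));
    return index < sampleCount ? index : sampleCount - 1;
}
```

Neighbouring pixels receive different indices, which breaks up the visible pattern that appears when every pixel walks the precomputed sequence in the same order.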

4-2. RAY SCATTERING (ADVANCED)
Ray Sampling Strategy
- Obtain accurate results with the fewest ray samples (better accuracy of the integral approximation).
- Generate a uniform distribution on the sphere of the ray sampling space.
- Want a method that is easy to implement on the GPU and runs in constant time.
→ Spherical Fibonacci Mapping
https://www.irit.fr/~David.Vanderhaeghe/M2IGAI-CO/2016-g1/docs/spherical_fibonacci_mapping.pdf
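A spherical Fibonacci point set can be generated with a few lines per point. The sketch below follows the usual construction (the azimuth advances by the golden ratio, z is stratified evenly over [-1, 1]); `Dir3` and `SphericalFibonacci` are hypothetical names, and the inverse mapping that the referenced paper also describes is not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Dir3 { double x, y, z; };

// Returns n nearly uniformly distributed unit vectors on the sphere.
// Point i costs constant time and needs no stored table, which suits a GPU port.
inline std::vector<Dir3> SphericalFibonacci(int n) {
    assert(n > 0);
    const double kPi = 3.14159265358979323846;
    const double kInvGoldenRatio = (std::sqrt(5.0) - 1.0) / 2.0;  // 1/phi = phi - 1
    std::vector<Dir3> points;
    points.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        const double t = i * kInvGoldenRatio;
        const double azimuth = 2.0 * kPi * (t - std::floor(t));   // 2*pi*frac(i/phi)
        const double z = 1.0 - (2.0 * i + 1.0) / n;               // evenly spaced in (-1, 1)
        const double r = std::sqrt(std::max(0.0, 1.0 - z * z));
        points.push_back(Dir3{r * std::cos(azimuth), r * std::sin(azimuth), z});
    }
    return points;
}
```

To sample a hemisphere around a surface normal, one option is to keep only the points with positive z and rotate them into the normal's frame.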

GPU RAYTRACING DEMO

DIRECT X RAY TRACING (DXR)
- Integrates Ray Tracing into the rasterization-based DirectX Graphics.
- Conventional GPU Ray Tracing: seconds per frame. DXR: milliseconds per frame, specialized for real time.
- Provides the features needed to implement GPU Ray Tracing:
  - Intersection tests between rays and objects use dedicated shaders.
  - Recursive ray tracing uses the TraceRay() intrinsic.
  - Data structures that accelerate ray tracing are provided as a two-level Acceleration Structure, Top and Bottom.
http://intro-to-dxr.cwyman.org/

DXR CONCEPTS
1. Acceleration Structure
2. Command DispatchRays()
3. Ray Shaders
4. Ray Tracing State Object
5. Procedural Geometry
http://intro-to-dxr.cwyman.org/

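BEFORE RAY SHADERS: THE DXR PIPELINE
Source: http://intro-to-dxr.cwyman.org/
[Figure: the pipeline from primary ray generation through ray intersection testing to the output of the ray tracing result.]
- The primary ray is generated in the Ray Generation Shader, which is launched from the CPU side with DispatchRays().
- TraceRay() traces rays recursively and is called from inside the Ray Shaders.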

BEFORE RAY SHADERS: THE DXR PIPELINE
The 5+1 shaders can be classified into three shader groups:
1. Ray Generation
2. Hit Group
3. Miss Shader
Other: Callable (can be called from anywhere)
http://intro-to-dxr.cwyman.org/

DXR RAY SHADERS
Ray Generation Shader, Intersection Shader, Miss Shader, Closest-hit Shader, Any-hit Shader, Callable Shader
The 5+1 shaders can be classified into three groups:
1. Ray Generation: the shader launched when rays are emitted from the screen (Sensor) into the scene.
2. Hit Group: the shaders launched when a ray hits Geometry.
3. Miss Shader: the shader launched when a ray hits nothing.
Callable Shader: can be called from anywhere.

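IMPLEMENTATION FLOW
1. System initialization
   1. Build the Ray Tracing Device *1
   2. Initialize the Ray Tracing Command List *1
2. Render pass initialization
   1. Build Final Output and GBuffer
   2. Load the Ray Shaders
   3. Build the Raytracing Pipeline State Object *1
   4. Build the Ray Shader Table *1
3. Scene initialization
   1. Build the Scene
   2. Build the Acceleration Structure *1
   3. Update the Ray Shader Table (material information) *1
4. Scene update
   1. Update the Acceleration Structure (Update Flag)
5. Execute ray tracing / rasterized rendering
   1. Execute the DispatchRays() command *1
*1: steps in which the objects differ between DXR-capable GPUs (e.g., Turing, Volta) and non-capable GPUs (Fallback Layer, Compute).
Impressions: implementing the system, the Ray State Object, and the Acceleration Structure is quite involved, so it is better to build a framework. The reasons are the need to support both DXR-capable and non-capable GPUs and the length of the DirectX-specific initialization code.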

THIS IMPLEMENTATION
1. Runs on both DXR-capable and non-capable GPUs
   → The goal is to run not only on NVIDIA GPUs but also on AMD / Intel / game consoles.
2. Tries basic scenes that are easy to verify
   - Triangle Polygon
   - Procedural Geometry → a Ray Shader generates an arbitrary shape for an AABB
3. Verifies various kinds of lighting (Hybrid Rendering)
   - Direct Illumination: Diffuse and Specular, Hard Shadow
   - Global Illumination: Reflection / Refraction, Ambient Occlusion, Indirect Diffuse (Irradiance)

RENDER PASSES IMPLEMENTED THIS TIME (Hybrid Rendering)
1. Two kinds of GBuffer, Raytraced and Rasterized (referenced separately this time; the raytraced one has no Depth)
   - GBuffer (Rasterized): Position / Normal / Diffuse / Specular
   - GBuffer (Raytraced): Position / Normal / Diffuse / Specular
2. Raytraced Lighting and Shadows using the GBuffer
   - Raytracing can run Direct / Indirect Illumination / Shadows in a single shader instead of multiple render passes.
   - Components are selected (and can be selected) for performance and look-development reasons.
   - Direct Illumination: Diffuse, Hard Shadows (Visibility)
   - Global Illumination: Indirect Diffuse, Ambient Occlusion, Indirect Specular, Reflection / Refraction
   - Results are written to UAV textures.

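DESIGN AND IMPLEMENTATION OF THE SIMPLE DXR FRAMEWORK (CONCEPTUAL MODEL)
- DxrContext: DeviceContext for DXR
- DxrRenderPass: render pass (the main actor)
- DxrOutput: ray tracing result (UAV texture)
- DxrShaderTable
- DxrScene
- DxrAccelerationStructure: the scene referenced by ray tracing (BVH)
- YourGraphics::Context: the existing rendering system
- class Scene: the existing scene
Writing directly against the DXR API makes the code grow considerably, and debugging and running experiments become difficult. The functionality is therefore gathered into a simple framework that makes various kinds of ray tracing easier to implement.
NVIDIA Falcor exists, but at present DXR runs only on NVIDIA Volta and Turing GPUs. It is desirable for DXR to be usable on Intel and AMD GPUs as well (this requires a Fallback Layer implementation).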

1. SYSTEM INITIALIZATION
http://intro-to-dxr.cwyman.org/
1. Build the Ray Tracing Device
2. Initialize the Ray Tracing Command List

    // class DxrContext
    // 1. Build the Ray Tracing Device
    bool DxrContext::Initialize()
    {
        ID3D12Device* device = Graphics::GetDevice(); // hook into the existing library / engine
        if (RaytracingAPI::FallbackLayer == m_raytracingAPI) {
            // Ray Tracing Device for GPUs without DXR support (Fallback Layer)
            CreateRaytracingFallbackDeviceFlags createDeviceFlags = m_forceComputeFallback
                ? CreateRaytracingFallbackDeviceFlags::ForceComputeFallback
                : CreateRaytracingFallbackDeviceFlags::None;
            return SUCCEEDED(D3D12CreateRaytracingFallbackDevice(
                device, createDeviceFlags, 0, IID_PPV_ARGS(&m_fallbackDevice)));
        }
        // Ray Tracing Device for GPUs with DXR support
        return SUCCEEDED(device->QueryInterface(IID_PPV_ARGS(&m_dxrDevice)));
    }

1. SYSTEM INITIALIZATION
http://intro-to-dxr.cwyman.org/
1. Build the Ray Tracing Device
2. Initialize the Ray Tracing Command List

    // class DxrContext
    // 2. Initialize the Ray Tracing Command List
    bool DxrContext::InitializeCommandList(GraphicsContext& graphicsContext)
    {
        ID3D12GraphicsCommandList* commandList = graphicsContext.GetCommandList();
        if (RaytracingAPI::FallbackLayer == m_raytracingAPI) {
            // Ray Tracing CommandList for GPUs without DXR support (Fallback Layer)
            m_fallbackDevice->QueryRaytracingCommandList(commandList, IID_PPV_ARGS(&m_fallbackCommandList));
            return true;
        }
        // Ray Tracing CommandList for GPUs with DXR support
        return SUCCEEDED(commandList->QueryInterface(IID_PPV_ARGS(&m_dxrCommandList)));
    }

2. RENDER PASS INITIALIZATION
1. Build Final Output and GBuffer
2. Load the Ray Shaders
3. Build the Raytracing Pipeline State Object (PSO) *1
4. Build the Ray Shader Table

    // class DxrRenderPass
    // UAV texture that stores a Raytracing result
    struct DxrOutput {
        DxrComPtr<ID3D12Resource> m_resource;
        CD3DX12_RESOURCE_DESC m_desc;
        D3D12_GPU_DESCRIPTOR_HANDLE m_gpuHandleUav;
        D3D12_GPU_DESCRIPTOR_HANDLE m_gpuHandleSrv;
        D3D12_CPU_DESCRIPTOR_HANDLE m_cpuHandleUav;
        UINT m_descriptorIndexUav = UINT_MAX;
        UINT m_descriptorIndexSrv = UINT_MAX;
    };
    enum DxrOutputType { Position = 0, Normal, Diffuse, Specular, Ao, Indirect, Final, Count };

    ID3D12Device* device = Graphics::GetDevice(); // hook into the existing library / engine
    const CD3DX12_HEAP_PROPERTIES heapProps(D3D12_HEAP_TYPE_DEFAULT);
    for (UINT index = 0; index < DxrOutputType::Count; ++index) {
        auto& output = m_dxrOutput[index];
        // Create the texture resource
        auto texDesc = CD3DX12_RESOURCE_DESC::Tex2D(GetOutputFormat(index), m_width, m_height,
                                                    1, 1, 1, 0, D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS);
        device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &texDesc,
                                        D3D12_RESOURCE_STATE_UNORDERED_ACCESS, nullptr,
                                        IID_PPV_ARGS(&output.m_resource));
        output.m_desc = texDesc;
        // Create the UAV
        DxrAllocateDescriptor(&output);
        D3D12_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
        uavDesc.ViewDimension = D3D12_UAV_DIMENSION_TEXTURE2D;
        device->CreateUnorderedAccessView(output.m_resource.Get(), nullptr, &uavDesc, output.m_cpuHandleUav);
        output.m_gpuHandleUav = CD3DX12_GPU_DESCRIPTOR_HANDLE(
            m_descriptorHeap->GetGPUDescriptorHandleForHeapStart(),
            output.m_descriptorIndexUav, m_descriptorSize);
    }
    // SRV creation (omitted)

For verification purposes, this implementation builds a separate output UAV for each component and writes to it.

2. RENDER PASS INITIALIZATION (continued)
2. Load the Ray Shaders
3. Build the Raytracing Pipeline State Object (PSO) *1
4. Build the Ray Shader Table

    // class DxrRenderPass
    // Hit Group
    struct DxrHitGroup {
        std::wstring m_name;
        std::wstring m_closestHitShaderName;
        std::wstring m_intersectionShaderName;
        std::wstring m_anyHitShaderName;
        DxrComPtr<ID3D12Resource> m_shaderTable;
        UINT m_shaderTableStrideInBytes = UINT_MAX;
    };

    // Render pass (padding is automatic; the member layout favours readability)
    class DxrRenderPass {
    public:
        DxrHitGroup m_hitGroupTriangle;
        DxrHitGroup m_hitGroupProceduralGeometry;
        std::wstring m_name;
        std::wstring m_raygenShaderName;
        std::wstring m_missShaderName;
        DxrComPtr<ID3D12Resource> m_rayGenShaderTable;
        DxrComPtr<ID3D12Resource> m_missShaderTable;
        UINT m_missShaderTableStrideInBytes = UINT_MAX;
        // GPUs without DXR support
        DxrComPtr<ID3D12RaytracingFallbackStateObject> m_fallbackStateObject;
        // GPUs with DXR support
        DxrComPtr<ID3D12StateObjectPrototype> m_dxrStateObject;
        DxrComPtr<ID3D12RootSignature> m_rootSignature;
        DxrComPtr<ID3D12RootSignature> m_localRootSignature;
        UINT8 m_type;
    public:
        virtual ~DxrRenderPass() {}
        virtual void Terminate();
    public:
        virtual void Execute(ID3D12GraphicsCommandList* commandList, UINT frameIndex);
    };

    // RaytracedGBuffer.hlsl: ray generation shader, launched when the DispatchRays() command executes.
    [shader("raygeneration")]
    void GBufferRaygenShader()
    {
        Ray ray = ComputeCameraRay(DispatchRaysIndex().xy, DispatchRaysDimensions().xy);
        RayDesc rayDesc;
        rayDesc.Origin = ray.origin;
        rayDesc.Direction = ray.direction;
        rayDesc.TMin = 0;
        rayDesc.TMax = 10000;
        GBufferRayPayload rayPayload = (GBufferRayPayload)0; // zero-initialize the payload
        // Trace the Scene to obtain the position, normal, and radiance that make up the GBuffer
        TraceRay(g_sceneBVH, RAY_FLAG_CULL_BACK_FACING_TRIANGLES, 0xFF,
                 0, /* hit group offset */
                 1, /* hit group stride */
                 0, /* miss shader offset */
                 rayDesc, rayPayload);
    }

Each Ray Shader is compiled in advance from its hlsl file. The compiled binary is included as a C++ header so that the shader binary is linked into the executable.

    dxc.exe -T lib_6_1 -HV 2017 -O4 -Zpr /Zi RaytracedGBuffer.hlsl -Vn g_RaytracedGBuffer -Fh .\output\RaytracedBuffer.hlsl.h

    // main.cpp
    #include "RaytracingRaytracedGBuffer.hlsl.h"
    D3D12_SHADER_BYTECODE shaderBin = CD3DX12_SHADER_BYTECODE((void*)g_RaytracedGBuffer, ARRAYSIZE(g_RaytracedGBuffer));

2. RENDER PASS INITIALIZATION (continued)
2. Load the Ray Shaders
3. Build the Raytracing Pipeline State Object (PSO) *1
4. Build the Ray Shader Table

(struct DxrHitGroup and class DxrRenderPass are the same as on the previous slide.)

    // main.cpp, shaders of the render pass
    DxrRenderPass& pass = m_renderPass[GBuffer];
    pass.m_name = L"RaytracedGBuffer";
    // Specify the name of the raygeneration function in RaytracedGBuffer.hlsl
    pass.m_raygenShaderName = L"GBufferRaygenShader";
    // Likewise specify the .hlsl function names for miss, intersection, and closesthit,
    // and decide the Hit Group names.

    // A PSO is built per render pass (Shaders and the Local Root Signature differ per pass)
    void CreateRaytracingPSO(DxrRenderPass& outRenderPass)
    {
        CD3D12_STATE_OBJECT_DESC pso{ D3D12_STATE_OBJECT_TYPE_RAYTRACING_PIPELINE };
        // Associate the Shader Binary with the PSO
        BuildDxilLibrarySubobjects(&pso, outRenderPass);
        // Associate the Hit Groups (Closest/Intersection/AnyHit) with the PSO
        BuildHitGroupSubobjects(&pso, outRenderPass);
        // Associate the Raytracing Shader settings (e.g. the payload size limit) with the PSO
        BuildShaderConfigSubobjects(&pso, outRenderPass);
        // Associate the Local Root Signature (arguments) with the PSO (per-shader Root Signature)
        BuildLocalRootSigunatureSubobjects(&pso, outRenderPass);
        // Associate the Global Root Signature (arguments) with the PSO (Root Signature shared by all shaders)
        BuildGlobalRootSigunatureSubobjects(&pso, outRenderPass);
        // Associate the Pipeline settings (e.g. the trace depth limit) with the PSO
        BuildPipelineConfigSubobjects(&pso, outRenderPass);
        // Build the PSO (the device used differs between GPUs with and without DXR support)
        if (RaytracingAPI::FallbackLayer == m_raytracingAPI) {
            // GPUs without DXR support
            m_fallbackDevice->CreateStateObject(pso, IID_PPV_ARGS(&outRenderPass.m_fallbackStateObject));
        } else {
            // GPUs with DXR support
            m_dxrDevice->CreateStateObject(pso, IID_PPV_ARGS(&outRenderPass.m_dxrStateObject));
        }
    }

This builds the PSO (rendering pipeline) for ray tracing.
- One PSO is prepared per DispatchRays() draw call.
- The PSO is an object that bundles the shaders and Hit Groups in use, the Root Signatures (Global/Local, i.e. shader-specific GPU resources), the ray tracing shader settings (payload size limit, etc.), and the pipeline settings (trace depth limit).
- For the details of each function, see the GitHub repository to be published later.

2. RENDER PASS INITIALIZATION (continued)
2. Load the Ray Shaders
3. Build the Raytracing Pipeline State Object (PSO) *1
4. Build the Ray Shader Table

    // class DxrShaderTable
    // A Shader Record stores a Shader ID and Local Root Arguments
    class DxrShaderRecord {
    public:
        void* m_shaderId = nullptr;
        void* m_localRootArguments = nullptr;
        uint32_t m_shaderIdSize = 0;
        uint32_t m_localRootArgumentsSize = 0;
        // Constructor added so that the four-argument calls below compile
        DxrShaderRecord(void* shaderId, uint32_t shaderIdSize,
                        void* localRootArguments, uint32_t localRootArgumentsSize)
            : m_shaderId(shaderId), m_localRootArguments(localRootArguments),
              m_shaderIdSize(shaderIdSize), m_localRootArgumentsSize(localRootArgumentsSize) {}
        void CopyTo(void* dst) const;
    };

    // Shader Table
    class DxrShaderTable : public YourGraphics::GpuBuffer {
    private:
        uint8_t* m_ptr = nullptr;
        UINT m_recordSize = D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT;
        std::wstring m_name;
        std::vector<DxrShaderRecord> m_recordArray;
    public:
        DxrShaderTable(UINT recordCount, UINT recordSize, const std::wstring& name) : m_name(name)
        {
            // The Shader Record size is 16-byte aligned
            m_recordSize = recordSize;
            m_recordArray.reserve(recordCount);
            UINT bufferSize = recordCount * m_recordSize;
            Alloc(bufferSize);
            m_ptr = map();
        }
        void Register(const DxrShaderRecord& record)
        {
            m_recordArray.push_back(record);
            record.CopyTo(m_ptr);
            m_ptr += m_recordSize;
        }
        UINT GetRecordSize() { return m_recordSize; }
    };

    // main.cpp, building the Shader Tables
    DxrRenderPass& pass = m_renderPass[GBuffer];
    void BuildShaderTables(DxrRenderPass& outRenderPass)
    {
        // Get the Shader ID size (a Shader Table holds several Shader Records; the size is used as the record offset).
        // A Shader Record holds a Shader ID and Local Root Arguments (e.g. per-instance materials).
        // Here only the Shader ID is stored.
        void* rayGenShaderID = nullptr;
        UINT shaderIDSize = 0;
        if (RaytracingAPI::FallbackLayer == m_raytracingAPI) {
            shaderIDSize = m_fallbackDevice->GetShaderIdentifierSize();
            rayGenShaderID = outRenderPass.m_fallbackStateObject->GetShaderIdentifier(outRenderPass.m_raygenShaderName.c_str());
        } else {
            shaderIDSize = m_dxrDevice->GetShaderIdentifierSize();
            rayGenShaderID = outRenderPass.m_dxrStateObject->GetShaderIdentifier(outRenderPass.m_raygenShaderName.c_str());
        }
        // Ray Generation Shader
        DxrShaderTable rayGenShaderTable(1, shaderIDSize, L"RayGenerationShaderTable");
        rayGenShaderTable.Register(DxrShaderRecord(rayGenShaderID, shaderIDSize, nullptr, 0));
        outRenderPass.m_rayGenShaderTable = rayGenShaderTable.GetGpuResource();
        // Build the Shader Tables for the Miss Shader and the Hit Group Shaders in the same way
    }

This builds the Shader Table.
- With a Shader Table, shaders and resources can be switched for each ray trace.
- The switch is made by specifying the Shader Record index in the TraceRay() arguments.
- A Shader Table holds multiple Shader Records.
- A Shader Record holds a Shader ID and Local Root Arguments.
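How a TraceRay() call selects a Shader Record can be written out as plain arithmetic. The sketch below follows the hit group addressing rule of the DXR specification (start address plus stride times the sum of the ray contribution, the geometry multiplier times the geometry index, and the instance contribution). `AlignUp`, `ShaderRecordStride`, `HitGroupTableView`, and `HitGroupRecordAddress` are hypothetical helper names, and the record alignment is a parameter because the slide mentions 16 bytes whereas the released D3D12 headers define D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT as 32; check the SDK in use.

```cpp
#include <cassert>
#include <cstdint>

// Round 'value' up to the next multiple of 'alignment' (alignment > 0).
inline std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Stride of one Shader Record: Shader ID followed by Local Root Arguments, padded to the alignment.
inline std::uint64_t ShaderRecordStride(std::uint64_t shaderIdSize,
                                        std::uint64_t localRootArgumentsSize,
                                        std::uint64_t recordAlignment) {
    return AlignUp(shaderIdSize + localRootArgumentsSize, recordAlignment);
}

// The two values a dispatch description provides for the hit group table.
struct HitGroupTableView {
    std::uint64_t startAddress;
    std::uint64_t strideInBytes;
};

// Address of the hit group record selected for one intersection.
// rayContribution and geometryMultiplier come from the TraceRay() arguments,
// geometryIndex from the BLAS, instanceContribution from the TLAS instance.
inline std::uint64_t HitGroupRecordAddress(const HitGroupTableView& table,
                                           std::uint32_t rayContribution,
                                           std::uint32_t geometryMultiplier,
                                           std::uint32_t geometryIndex,
                                           std::uint32_t instanceContribution) {
    const std::uint64_t recordIndex =
        static_cast<std::uint64_t>(rayContribution) +
        static_cast<std::uint64_t>(geometryMultiplier) * geometryIndex +
        instanceContribution;
    return table.startAddress + table.strideInBytes * recordIndex;
}
```

For example, a 32-byte Shader ID followed by 48 bytes of Local Root Arguments gives a 96-byte stride at 32-byte alignment, and geometry 2 of an instance with contribution 4 then resolves to record 6 of the table.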

3. SCENE INITIALIZATION
1. Build the Scene
2. Build the Acceleration Structure *1
3. Update the Ray Shader Table (material information, etc.)

    // class DxrScene
    // GPU Buffer
    class GpuBuffer;
    // Scene
    class DxrScene : public YourGraphics::Scene {
    public:
        std::vector<D3D12_RAYTRACING_AABB> m_proceduralGeometryArray;
        GpuBuffer m_proceduralGeometryBuffer;
    public:
        void Build();
    };

    // main.cpp, building the scene
    void BuildScene(DxrScene& outScene)
    {
        // Build the AABB of a Sphere Procedural Geometry
        // (const auto& so that the temporary Float3 arguments below can bind)
        auto CreateSphere = [&](const auto& center, auto radius) {
            return D3D12_RAYTRACING_AABB{
                center.x - radius, center.y - radius, center.z - radius,
                center.x + radius, center.y + radius, center.z + radius,
            };
        };
        // Build the Spheres as Procedural Geometry
        outScene.m_proceduralGeometryArray.push_back(CreateSphere(Float3(0.0f, 0.0f, -1.0f), 0.25f));
        outScene.m_proceduralGeometryArray.push_back(CreateSphere(Float3(0.0f, -100.5f, -1.0f), 100.0f));
        outScene.m_proceduralGeometryArray.push_back(CreateSphere(Float3(0.5f, -0.25f, -1.0f), 0.25f));
        outScene.m_proceduralGeometryArray.push_back(CreateSphere(Float3(-0.5f, -0.25f, -1.0f), 0.25f));
        outScene.m_proceduralGeometryArray.push_back(CreateSphere(Float3(0.0f, 0.0f, -1.5f), 0.25f));
        outScene.m_proceduralGeometryArray.push_back(CreateSphere(Float3(0.5f, -0.25f, -1.5f), 0.25f));
        outScene.m_proceduralGeometryArray.push_back(CreateSphere(Float3(-0.5f, 0.25f, -1.5f), 0.25f));
        // Upload the Sphere information to GPU VRAM as a Structured Buffer
        outScene.Build();
    }

3. SCENE INITIALIZATION (continued)
2. Build the Acceleration Structure *1
3. Update the Ray Shader Table (material information, etc.)

    // class DxrAccelerationStructure
    // Acceleration Structure
    // DxrPtr<class> : Shared Pointer
    class DxrAccelerationStructure {
    public:
        DxrPtr<DxrScene> m_scene;
        DxrComPtr<ID3D12Resource> m_scratch;
        DxrComPtr<ID3D12Resource> m_resource; // AS
        DxrComPtr<ID3D12Resource> m_instance;
        UINT m_dataSizeInBytes = 0;
    public:
        // Build a BLAS
        void Build(const std::vector<D3D12_RAYTRACING_GEOMETRY_DESC>& geometryDescArray);
        // Build a TLAS
        void Build(const std::vector<DxrAccelerationStructure>& blasArray);
    };

    // main.cpp: build the Acceleration Structure of the scene used for Raytracing.
    // It is built through a UAV, so a command list is prepared.
    void BuildAccelerationStructure(DxrAccelerationStructure& outTLAS)
    {
        auto device = Graphics::GetD3DDevice();
        auto commandList = Graphics::GetRaytracingCommandList();
        auto commandQueue = Graphics::GetCommandQueue();
        auto commandAllocator = Graphics::GetCommandAllocator();
        // Reset the command list
        commandList->Reset(commandAllocator, nullptr);
        // Build the Bottom Level Acceleration Structure (BLAS); only one this time
        std::vector<DxrAccelerationStructure> blasArray(1);
        std::vector<D3D12_RAYTRACING_GEOMETRY_DESC> geometryDescArray;
        SetupGeometryDescsBLAS(geometryDescArray);
        blasArray[0].Build(geometryDescArray);
        // Resource barrier that waits for the BLAS build to complete
        D3D12_RESOURCE_BARRIER resourceBarrier;
        resourceBarrier = CD3DX12_RESOURCE_BARRIER::UAV(blasArray[0].m_resource.Get());
        commandList->ResourceBarrier(1, &resourceBarrier);
        // Build the Top Level Acceleration Structure (TLAS)
        DxrAccelerationStructure tlas;
        tlas.Build(blasArray);
        // Execute the command list for the TLAS build and wait for completion
        Graphics::ExecuteCommandList();
        Graphics::WaitForGpu();
        // Store the result
        outTLAS = tlas;
    }

3. SCENE INITIALIZATION (continued)
3. Update the Ray Shader Table (material information, etc.)

    // Constant buffer that stores the material information
    struct ProceduralGeometryMaterialConstantBuffer {
        float4 m_albedo;
        float m_diffuse;
        float m_specular;
        float m_reflection;
        float m_padding;
    };
    // Constant buffer that stores the instance attributes
    struct ProceduralGeometryInstanceAttributeConstantBuffer {
        UINT m_instanceIndex;
        UINT m_padding[3];
    };
    // Local Root Arguments
    struct ProceduralGeometryLocalRootArguments {
        ProceduralGeometryMaterialConstantBuffer m_material;
        ProceduralGeometryInstanceAttributeConstantBuffer m_attribute;
    };

    // main.cpp
    const UINT SphereCount = 7;
    DxrShaderTable m_hitgroupShaderTable;
    ProceduralGeometryMaterialConstantBuffer m_sphereMaterial[SphereCount];

    // After the scene and the Acceleration Structure are built, register the material
    // information of the rendered models in the Shader Table
    void UpdateShaderTable(DxrRenderPass& outRenderPass)
    {
        // Get the Hit Group Shader ID. It is registered per render pass (the shaders differ).
        // The slide reads a member named m_hitGroupShaderNames, which DxrRenderPass does not
        // declare; the name of the procedural-geometry Hit Group is used here instead.
        const wchar_t* hitGroupName = outRenderPass.m_hitGroupProceduralGeometry.m_name.c_str();
        void* hitGroupShaderID = nullptr;
        UINT shaderIDSize = 0;
        if (RaytracingAPI::FallbackLayer == m_raytracingAPI) {
            // GPUs without DXR support
            shaderIDSize = m_fallbackDevice->GetShaderIdentifierSize();
            hitGroupShaderID = outRenderPass.m_fallbackStateObject->GetShaderIdentifier(hitGroupName);
        } else {
            // GPUs with DXR support
            shaderIDSize = m_dxrDevice->GetShaderIdentifierSize();
            hitGroupShaderID = outRenderPass.m_dxrStateObject->GetShaderIdentifier(hitGroupName);
        }
        // Register the material information and instance ID in the Hit Group Shader Table
        // as Local Root Arguments
        ProceduralGeometryLocalRootArguments rootArgs;
        for (UINT index = 0; index < SphereCount; ++index) {
            rootArgs.m_material = m_sphereMaterial[index];
            rootArgs.m_attribute.m_instanceIndex = index;
            m_hitgroupShaderTable.Register(DxrShaderRecord(hitGroupShaderID, shaderIDSize, &rootArgs, sizeof(rootArgs)));
        }
    }

5. EXECUTING RAY TRACING
1. Execute the DispatchRays() command

    // class DxrRenderPass
    // DispatchRays() that works on GPUs both with and without DXR support
    template <class CommandList, class StateObject, class DispatchDesc>
    void DispatchRays(CommandList* commandList, StateObject* stateObject, DispatchDesc* dispatchDesc)
    {
        dispatchDesc->HitGroupTable.StartAddress = m_hitGroupShaderTable->GetGPUVirtualAddress();
        dispatchDesc->HitGroupTable.SizeInBytes = m_hitGroupShaderTable->GetDesc().Width;
        dispatchDesc->HitGroupTable.StrideInBytes = m_hitGroupShaderTableStrideInBytes;
        dispatchDesc->MissShaderTable.StartAddress = m_missShaderTable->GetGPUVirtualAddress();
        dispatchDesc->MissShaderTable.SizeInBytes = m_missShaderTable->GetDesc().Width;
        dispatchDesc->MissShaderTable.StrideInBytes = m_missShaderTableStrideInBytes;
        dispatchDesc->RayGenerationShaderRecord.StartAddress = m_rayGenShaderTable->GetGPUVirtualAddress();
        dispatchDesc->RayGenerationShaderRecord.SizeInBytes = m_rayGenShaderTable->GetDesc().Width;
        dispatchDesc->Width = m_outputWidth;
        dispatchDesc->Height = m_outputHeight;
        commandList->DispatchRays(stateObject, dispatchDesc);
    }

    // Constant buffer set per BLAS instance
    struct BlasInstancePerFrameBuffer {
        Float4x4 m_localToBLAS;
        Float4x4 m_blasToLocal;
    };
    // Scene constant buffer
    struct SceneConstantBuffer {
        Float4x4 m_projectionToWorld;
        Float4 m_cameraPosition;
        Float4 m_cameraDirection;
        Float4 m_lightPosition;
        float m_elapsedTime;
        float m_aoRadius;
        float m_aoRayMinT;
        float m_indirectRayMinT;
        int m_aoRaySampleCount;
        int m_indirectRaySampleCount;
        int m_frameCount;
        int m_padding;
    };

    // dxrRaytracedGBufferRenderPass.cpp
    SceneConstantBuffer m_sceneData;
    BlasInstancePerFrameBuffer m_blasInstanceBuffer;

    void DxrRaytracedGBufferRenderPass::Execute(ID3D12GraphicsCommandList* commandList, UINT frameIndex)
    {
        // Set the Global Root Signature shared by all shaders
        // (moved ahead of the root arguments, which must be set after the root signature)
        commandList->SetComputeRootSignature(m_globalRootSignature.Get());
        // Set the GBuffer output UAVs: Position first, followed by Normal, Diffuse, Specular
        commandList->SetComputeRootDescriptorTable(OutputSlot, m_output[Position].m_gpuHandleUav);
        // Set the GPU resources
        commandList->SetComputeRootConstantBufferView(SceneSlot, m_sceneData.GpuVirtualAddress(frameIndex));
        commandList->SetComputeRootShaderResourceView(ProceduralGeometryAttributeSlot, m_blasInstanceBuffer.GpuVirtualAddress(frameIndex));
        if (RaytracingAPI::FallbackLayer == m_raytracingAPI) {
            // GPUs without DXR support
            D3D12_FALLBACK_DISPATCH_RAYS_DESC dispatchDesc = {};
            m_fallbackCommandList->SetDescriptorHeaps(1, m_descriptorHeap.GetAddressOf());
            m_fallbackCommandList->SetTopLevelAccelerationStructure(AsSlot, m_fallbackTLASPointer);
            DispatchRays(m_fallbackCommandList.Get(), m_fallbackStateObject.Get(), &dispatchDesc);
        } else {
            // GPUs with DXR support
            D3D12_DISPATCH_RAYS_DESC dispatchDesc = {};
            m_dxrCommandList->SetDescriptorHeaps(1, m_descriptorHeap.GetAddressOf());
            commandList->SetComputeRootShaderResourceView(AsSlot, m_tlas->GetGPUVirtualAddress());
            DispatchRays(m_dxrCommandList.Get(), m_dxrStateObject.Get(), &dispatchDesc);
        }
    }

DXR RAYTRACING DEMO

AMBIENT OCCLUSION
[Figures: 1 ray / pixel, 4 rays / pixel, 32 rays / pixel; GBuffer Normal (World). DirectX Raytracing by Morishige]
Hybrid Rendering: a GBuffer is built, and AO rays are cast only from the surface normals visible on screen.
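Casting AO rays from a GBuffer normal needs a way to pick directions in the hemisphere around that normal. The sketch below shows one common way to do it: choose cos(theta) uniformly in [0, 1] (which is uniform in solid angle), choose the azimuth uniformly, and rotate the result into a basis built around the normal. The `V3` type and the two-random-number signature are assumptions made for this sketch; the shader on the slides uses its own SampleUniformHemisphere(seed, normal), whose body is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct V3 { double x, y, z; };

inline double Dot(const V3& a, const V3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline V3 Cross(const V3& a, const V3& b) {
    return V3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline V3 Normalize(const V3& v) {
    const double len = std::sqrt(Dot(v, v));
    assert(len > 0.0);
    return V3{v.x / len, v.y / len, v.z / len};
}

// Uniformly samples a unit direction in the hemisphere around the unit vector 'normal'.
// u1 and u2 are uniform random numbers in [0, 1).
inline V3 SampleUniformHemisphere(double u1, double u2, const V3& normal) {
    const double kPi = 3.14159265358979323846;
    const double cosTheta = u1;                                        // uniform in solid angle
    const double sinTheta = std::sqrt(std::max(0.0, 1.0 - cosTheta * cosTheta));
    const double phi = 2.0 * kPi * u2;
    // Build an orthonormal basis (tangent, bitangent, normal).
    const V3 helper = std::fabs(normal.x) > 0.9 ? V3{0.0, 1.0, 0.0} : V3{1.0, 0.0, 0.0};
    const V3 tangent = Normalize(Cross(helper, normal));
    const V3 bitangent = Cross(normal, tangent);
    const double a = sinTheta * std::cos(phi);
    const double b = sinTheta * std::sin(phi);
    return V3{a * tangent.x + b * bitangent.x + cosTheta * normal.x,
              a * tangent.y + b * bitangent.y + cosTheta * normal.y,
              a * tangent.z + b * bitangent.z + cosTheta * normal.z};
}
```

The AO value of a pixel is then the fraction of sampled rays that reach the AO radius without hitting anything, which is why the image becomes smoother as the ray count per pixel goes from 1 to 32.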

AMBIENT OCCLUSION
1. Ray Generation Shader
2. Miss Shader
3. Intersection Shader (Procedural Geometry Only)

    // AmbientOcclsuion.hlsl: ray generation shader, launched when DispatchRays() executes.
    RWTexture2D<float4> g_renderTargetAo : register(u4, space0);
    RaytracingAccelerationStructure g_sceneBVH : register(t0, space0);
    Texture2D<float4> g_renderTargetPosition : register(t4, space0);
    Texture2D<float4> g_renderTargetNormal : register(t5, space0);
    ConstantBuffer<SceneConstantBuffer> g_sceneConstantBuffer : register(b0);

    [shader("raygeneration")]
    void AoRaygenShader()
    {
        uint2 index = DispatchRaysIndex().xy;
        float4 worldPos = g_renderTargetPosition[index];
        float4 worldNormal = g_renderTargetNormal[index];
        uint sampleRayCount = g_sceneConstantBuffer.aoSampleCount;
        float visibleRay = float(sampleRayCount);
        // The w component of the GBuffer Position tells whether the pixel is background
        if (worldPos.w != 0.0f) {
            visibleRay = 0.0f;
            for (uint i = 0; i < sampleRayCount; ++i) {
                // Hemisphere sampling centred on the Surface Normal
                // (seed: per-pixel random seed; its setup is not shown on the slide)
                float3 sampleWorldDir = SampleUniformHemisphere(seed, worldNormal.xyz);
                float minT = g_sceneConstantBuffer.aoRayMinT; // too small a value produces acne
                float radius = g_sceneConstantBuffer.aoRadius; // AO radius = upper limit of the ray distance
                AoRayPayload rayPayload = (AoRayPayload)0;
                RayDesc rayAO;
                rayAO.Origin = worldPos.xyz;
                rayAO.Direction = sampleWorldDir.xyz;
                rayAO.TMin = minT;
                rayAO.TMax = radius;
                TraceRay(g_sceneBVH,
                         RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH | RAY_FLAG_SKIP_CLOSEST_HIT_SHADER,
                         0xFF, 0, 1, 0, rayAO, rayPayload);
                visibleRay += rayPayload.visibleRay;
            }
        }
        float aoColor = visibleRay / float(sampleRayCount);
        g_renderTargetAo[index] = float4(aoColor, aoColor, aoColor, 1.0f);
    }

    [shader("miss")]
    void AoMissShader(inout AoRayPayload rayPayload)
    {
        // Fully visible (not occluded)
        rayPayload.visibleRay = 1.0f;
    }

    // Intersection test against a perfect sphere
    StructuredBuffer<BlasInstanceBuffer> g_instanceBuffer : register(t3, space0);
    ConstantBuffer<ProceduralGeometryInstanceConstantBuffer> g_sphereConstantBuffer : register(b2);

    [shader("intersection")]
    void AoIntersectionShader_ProceduralGeometry()
    {
        // Transform the ray into the coordinate system centred on the AABB that makes up the Sphere
        // (Procedural Geometry is expressed by an AABB plus a shader)
        Ray localRay = GetRayInSphereLocalSpace();
        float hitT;
        ProceduralGeometryAttributes attr;
        // Get the index of the AABB whose sphere was hit and report the hit.
        if (RaySphereIntersectionTest(localRay, hitT, attr)) {
    #if 0
            // The world normal is not needed for AO. It becomes necessary for Indirect Diffuse.
            BlasInstanceBuffer sphereAttribute = g_instanceBuffer[g_sphereConstantBuffer.instanceIndex];
            attr.normal = mul(attr.normal, (float3x3)sphereAttribute.localSpaceToBottomLevelAS);
            attr.normal = normalize(mul((float3x3)ObjectToWorld(), attr.normal));
    #endif
            ReportHit(hitT, 0, attr);
        }
    }
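The intersection shader relies on RaySphereIntersectionTest(), whose body is not shown on the slides. A ray-sphere test of this kind usually solves the quadratic |o + t d - c|^2 = r^2 for the nearest t inside the ray interval. The C++ sketch below is one such implementation under that assumption; `P3`, `RaySphereIntersect`, and its parameter list are hypothetical and do not claim to match the author's function.

```cpp
#include <cassert>
#include <cmath>

struct P3 { double x, y, z; };

inline P3 Sub(const P3& a, const P3& b) { return P3{a.x - b.x, a.y - b.y, a.z - b.z}; }
inline double Dot3(const P3& a, const P3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns true and writes the nearest hit distance to hitT when the ray
// origin + t * direction hits the sphere for some t in [tMin, tMax].
inline bool RaySphereIntersect(const P3& origin, const P3& direction,
                               const P3& center, double radius,
                               double tMin, double tMax, double& hitT) {
    const P3 oc = Sub(origin, center);
    const double a = Dot3(direction, direction);
    assert(a > 0.0);                                  // the direction must not be the zero vector
    const double halfB = Dot3(oc, direction);
    const double c = Dot3(oc, oc) - radius * radius;
    const double discriminant = halfB * halfB - a * c;
    if (discriminant < 0.0) return false;             // the ray's line misses the sphere
    const double sqrtD = std::sqrt(discriminant);
    double t = (-halfB - sqrtD) / a;                  // nearer root first
    if (t < tMin || t > tMax) {
        t = (-halfB + sqrtD) / a;                     // then the farther root
        if (t < tMin || t > tMax) return false;
    }
    hitT = t;
    return true;
}
```

With the first sphere of the scene (center (0, 0, -1), radius 0.25) and a ray from the origin along -z, the nearest hit lies at t = 0.75; limiting tMax to an AO radius of 0.5 turns the same ray into a miss, which is exactly how the AO radius bounds occlusion.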

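INDIRECT DIFFUSE RAY, 1 BOUNCE
- AO acts as a mask.
- A single pass can produce direct illumination, indirect illumination, AO, and soft shadows.
- Components can also be generated individually for performance or baking purposes.
[Figures: 1 ray / pixel, 32 rays / pixel. DirectX Raytracing by Morishige]
For the implementation, see the GitHub repository to be published later.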

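CONCLUSION
- Implementing each of CPU, GPU Compute, and DXR made the practical, real-world issues clear.
- On a GTX 1080, a frame takes 16 ms or less (for the spheres used this time).
What's next?
- Translucency and a variety of materials
- Real-Time Denoise
- Character Skinning
- Faster Indirect Rays
- Faster intersection tests
- Combination with Virtual Texture (Mip/LOD)
This implementation is planned for release on GitHub as Simple DXR Framework by Morishige.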

APPLICATION
1. Hybrid Rendering
   a. Ray Tracing => Rasterization
   b. Rasterization => Ray Tracing
2. Interactive Illumination
3. Physics and AI: Collision, Navigation Mesh, Water Mesh
4. Procedural Sound: Real-time Rendering of Fire/Explosion Sound, Real-time Rendering of Aerodynamic Sound
   https://ime.ist.hokudai.ac.jp/~doba/projects.html
https://www.ea.com/seed/news/seed-gdc-2018-presentation-slides-shiny-pixels
https://www.ea.com/frostbite/news/real-time-raytracing-for-interactive-global-illumination-workflows-in-frostbite

REFERENCES
[1] Ray Tracing in One Weekend, Peter Shirley. http://in1weekend.blogspot.com/2016/01/ray-tracing-in-one-weekend.html
[2] Introduction to DirectX RayTracing, SIGGRAPH 2018 Courses, Chris Wyman. http://intro-to-dxr.cwyman.org/
[3] Microsoft DirectX Raytracing, DirectX Developer Blog. https://blogs.msdn.microsoft.com/directx/2018/03/19/announcing-microsoft-directx-raytracing/
[4] Spherical Fibonacci Mapping. https://www.irit.fr/~David.Vanderhaeghe/M2IGAI-CO/2016-g1/docs/spherical_fibonacci_mapping.pdf
[5] Rasterization: a Practical Implementation. https://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation
[6] GPU Ray Tracing in One Weekend. https://medium.com/@jcowles/gpu-ray-tracing-in-one-weekend-3e7d874b3b0f
