support ray tracing?
• What is neighbor search?
• How to use hardware ray tracing to accelerate neighbor search?
Game Plan
Accelerating Neighbor Search Using Hardware Ray Tracing
Mesh, i.e., How Scene is Represented
Very informally: 3D piecewise-linear approximation of arbitrary 3D surfaces.
[Figures: a quadrilateral mesh and a triangular mesh. Sources: Valette, et al. [TVCG’08]; free3d.com]
[Diagram labels: mesh; 2D image. Source: cgarena.com]
Visibility Problem
For each pixel in the image (to be rendered), which point in the scene (i.e., on the mesh) corresponds to it?
along the ray direction?
[Diagram labels: Modeling; Rendering (Visibility, Shading); lighting, camera, material, etc.; 3D mesh; 2D image. Source: cgarena.com]
* Usually cast multiple rays for each pixel.
• Goal: calculate the [x, y, z] coordinates of the closest hit between the ray and the mesh.
• Why closest hit?
[Figure: a ray hitting the mesh at point [x, y, z], with x, y, z axes. Mesh image: Valette, et al. [TVCG’08]]
• The simplest solution:
  • iterate all triangles
  • test intersection for each triangle
  • return the closest hit, if any
• Complexity:
  • O(# of rays × # of triangles)
• Slow:
  • lots of triangles and lots of rays
  • …and it’s recursive
[Figure: a ray hitting the mesh at point [x, y, z]. Mesh image: Valette, et al. [TVCG’08]]
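The brute-force approach listed above can be sketched as follows. This is a minimal illustration, not the slides' own code: it assumes each triangle is a tuple of three (x, y, z) vertices and uses the Möller–Trumbore ray-triangle test, which the slides do not name. The function names `ray_triangle` and `closest_hit` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]

def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin: Vec3, direction: Vec3, tri: Triangle,
                 eps: float = 1e-9) -> Optional[float]:
    """Moller-Trumbore test: distance t along the ray to the hit, or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                      # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv                  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv          # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None        # hits behind the origin do not count

def closest_hit(origin: Vec3, direction: Vec3,
                triangles: Sequence[Triangle]) -> Optional[Vec3]:
    """Iterate all triangles and return the [x, y, z] of the closest hit, if any."""
    best_t = None
    for tri in triangles:
        t = ray_triangle(origin, direction, tri)
        if t is not None and (best_t is None or t < best_t):
            best_t = t
    if best_t is None:
        return None
    return tuple(origin[i] + best_t * direction[i] for i in range(3))
```

Calling `closest_hit` once per ray makes the O(# of rays × # of triangles) cost explicit: every ray visits every triangle.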
• To implement realistic shading.
• The color* of an exiting ray depends on the colors* of all incident rays.
  • color* should technically be radiance; not important for our discussion here.
  • also depends on the surface material (diffuse vs. specular vs. …); not important for our discussion here.
• How do we know the color of an incident ray? Cast more rays!
[Figure: secondary rays cast from the hit point on the mesh. Mesh image: Valette, et al. [TVCG’08]]
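$$L_o(x, \omega_o) = \int_{\Omega} f_r(x, \omega_o, \omega_i)\, L_i(x, \omega_i)\, \cos\theta \, d\omega_i$$
• $L_o(x, \omega_o)$: “color” of the exiting ray $\omega_o$
• $L_i(x, \omega_i)$: “color” of the incident ray $\omega_i$
• $f_r(x, \omega_o, \omega_i)$: “transfer function”
• $\int_{\Omega} \ldots \, d\omega_i$: integrate incident rays over the hemisphere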
part of the scene that does intersect the ray.
• Key: how to partition the space?

intersect(space, ray) {
  if ray doesn’t intersect space boundary:
    return
  else:
    foreach subspace in space
      if (subspace != empty)
        intersect(subspace, ray)
}
• A, B, C, D, E are the bounding volumes, which are Axis-Aligned Bounding Boxes (AABBs) here. Other (irregular) bounding volumes are possible.
[Figure: a BVH tree over bounding volumes A–E and primitives 1–4. Legend: root, interior node, leaf node, primitive]
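[Figure: a ray with origin O and direction D, bounded by tmin and tmax, with a hit at thit]
• Should this be counted as a hit? [Figure: a ray segment from tmin to tmax lying completely inside an AABB]
• Yes; any ray segment that’s completely inside an AABB must be treated as intersecting.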
finding the intersection of one ray and the scene is called ray casting.
• Ray tracing refers to recursive ray casting.
• Acceleration structures
  • Data structures that help speed up ray tracing are called “acceleration structures” (e.g., BVH), not to be confused with hardware accelerators.
• Build the BVH.
• For each ray (thread):
  • Traverse the BVH (manage local stack)
  • Ray-AABB intersection test
  • Ray-primitive intersection test
  • Execute a shading algorithm
• Prior to OptiX (2010)
  • Manually implement in CUDA.
[Slide annotations on the steps above: “Fixed-function”, “~Fixed-function”]
[Figure: several rays traversing the BVH]
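The per-ray steps above (traverse with a local stack, ray-AABB test, ray-primitive test) can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the BVH is built with a simple median split, the primitives are themselves axis-aligned boxes so the primitive test reuses the slab test, and the names `Node`, `build`, `ray_aabb`, and `traverse` are hypothetical. It is not how OptiX or the RT cores implement traversal.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)

@dataclass
class Node:
    box: Box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prims: List[int] = field(default_factory=list)  # non-empty only at leaves

def union(a: Box, b: Box) -> Box:
    return (tuple(min(a[0][i], b[0][i]) for i in range(3)),
            tuple(max(a[1][i], b[1][i]) for i in range(3)))

def build(boxes: List[Box], ids: List[int], leaf_size: int = 1) -> Node:
    """Median split along the longest axis of the enclosing box."""
    box = boxes[ids[0]]
    for i in ids[1:]:
        box = union(box, boxes[i])
    if len(ids) <= leaf_size:
        return Node(box, prims=list(ids))
    axis = max(range(3), key=lambda a: box[1][a] - box[0][a])
    ids = sorted(ids, key=lambda i: boxes[i][0][axis] + boxes[i][1][axis])
    mid = len(ids) // 2
    return Node(box, build(boxes, ids[:mid], leaf_size),
                build(boxes, ids[mid:], leaf_size))

def ray_aabb(origin: Vec3, direction: Vec3, box: Box,
             tmin: float = 0.0, tmax: float = float("inf")) -> bool:
    """Slab test: True if the ray segment [tmin, tmax] overlaps the box."""
    for a in range(3):
        lo, hi = box[0][a], box[1][a]
        if direction[a] == 0.0:
            if origin[a] < lo or origin[a] > hi:
                return False
            continue
        t0 = (lo - origin[a]) / direction[a]
        t1 = (hi - origin[a]) / direction[a]
        if t0 > t1:
            t0, t1 = t1, t0
        tmin, tmax = max(tmin, t0), min(tmax, t1)
        if tmin > tmax:
            return False
    return True

def traverse(root: Node, origin: Vec3, direction: Vec3,
             boxes: List[Box]) -> List[int]:
    """Return ids of primitives hit by the ray, using an explicit stack."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if not ray_aabb(origin, direction, node.box):
            continue                      # prune the whole subtree
        if node.prims:                    # leaf: run the primitive test
            hits.extend(i for i in node.prims
                        if ray_aabb(origin, direction, boxes[i]))
        else:
            stack.extend((node.left, node.right))
    return sorted(hits)
```

The explicit stack stands in for the per-thread local stack mentioned above; subtrees whose bounding box the ray misses are never visited.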
[Flowchart nodes: BVH Traversal + Ray-AABB Test (TL); Enter leaf node; Shader; Ray primitive intersect? (Yes / No); Any-Hit (AH) Shader; Traversal completes; Found a hit?; Closest-Hit (CH) Shader; Miss Shader. Inset: BVH over A–E with primitives 1–4]
• “Shaders” are user-defined functions executing on CUDA cores.
• Allows custom primitives (not just triangles).
• Fixed functions executed on the RT cores.
[Figure: the pipeline (BVH Traversal + Ray-AABB Test (TL); Intersection (IS), Any-Hit (AH), Closest-Hit (CH), and Miss shaders) drawn once per ray. Labels: One Single CUDA Kernel; CUDA Threads; OptiX Rays]
Scientific and Technical Achievement.
• Built on Intel Embree, a collection of ray tracing kernels, which uses the Intel Implicit SPMD Program Compiler (ISPC) for explicit vectorization.
• PBRT
  • Pedagogical engine.
  • The book won a 2014 Oscar for Scientific and Technical Achievement.
also limits the total # of neighbors:
  • practical memory constraint,
  • downstream algorithms expect a fixed # of neighbors.
rangeSearch(query, points, range, K)
  Return any K points that are within range of query
usually also limits ranges of neighbors:
  • neighbors too far away are of no significance (e.g., force from a remote particle).
KNN(query, points, range, K)
  Return K nearest points that are within range of query
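The two query types can be pinned down with brute-force reference versions. This is a sketch for defining the expected results, not an accelerated implementation; the snake_case names `range_search` and `knn` and the choice to return point indices are assumptions made here.

```python
import heapq
import math
from typing import List, Sequence

def range_search(query: Sequence[float], points: Sequence[Sequence[float]],
                 radius: float, k: int) -> List[int]:
    """Return indices of up to k points within radius of query (any k, not the nearest)."""
    found = []
    for i, p in enumerate(points):
        if math.dist(query, p) <= radius:
            found.append(i)
            if len(found) == k:
                break                     # any k in-range points will do
    return found

def knn(query: Sequence[float], points: Sequence[Sequence[float]],
        radius: float, k: int) -> List[int]:
    """Return indices of the k nearest points among those within radius of query."""
    in_range = []
    for i, p in enumerate(points):
        d = math.dist(query, p)
        if d <= radius:
            in_range.append((d, i))
    return [i for _, i in heapq.nsmallest(k, in_range)]
```

The only difference is the selection rule: range search may stop as soon as K in-range points are found, while KNN must rank all in-range candidates.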
science and engineering fields (e.g., computational fluid dynamics, graphics, vision).
• They deal with physical data (e.g., particles, surface samples) that are inherently 2D/3D.
• High-dimensional search is a completely different game.
  • “Curse of dimensionality” means we need different algorithms and distance metrics.
treated as intersecting.
• Idea: generate a short ray from Q and (ask the RT cores to) perform the ray-AABB test.
• The ray has an arbitrary direction and a very small length.
• Why a very small ray length?
Point in AABB Test
[Figure: query Q and a second point Q’ relative to an AABB of width 2r]
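One way to see why the ray should be short, offered here as an interpretation rather than the slide's own answer: a long ray can pass through AABBs that do not contain Q, so the ray-AABB test would report boxes that are irrelevant to Q. The sketch below emulates the test with a slab test over a ray segment of length `ray_length`; the names `segment_hits_aabb`, `aabb_around`, and `enclosing_aabbs` are hypothetical, and the direction (1, 0, 0) is an arbitrary choice.

```python
from typing import List, Sequence, Tuple

def segment_hits_aabb(origin: Sequence[float], direction: Sequence[float],
                      t_max: float, box_min: Sequence[float],
                      box_max: Sequence[float]) -> bool:
    """Slab test for the ray segment t in [0, t_max]."""
    t_enter, t_exit = 0.0, t_max
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
        if t_enter > t_exit:
            return False
    # A segment lying completely inside the box ends up here and counts as a hit.
    return True

def aabb_around(point: Sequence[float], r: float) -> Tuple[tuple, tuple]:
    """AABB of width 2r centered on a point."""
    return tuple(c - r for c in point), tuple(c + r for c in point)

def enclosing_aabbs(q: Sequence[float], points: Sequence[Sequence[float]],
                    r: float, ray_length: float = 1e-6) -> List[int]:
    """Indices of points whose AABB is hit by a ray of the given length from q."""
    direction = (1.0, 0.0, 0.0)  # arbitrary direction
    return [i for i, p in enumerate(points)
            if segment_hits_aabb(q, direction, ray_length, *aabb_around(p, r))]
```

With points at (0, 0, 0) and (3, 0, 0), r = 1, and Q = (0.5, 0, 0), the default short ray reports only the first AABB, while `ray_length=10.0` also reports the second one even though Q is not inside it.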
point
rangeSearch(query, points, r, K)
• Construct a BVH from the AABBs (no control; hidden behind the OptiX APIs and most likely done in hardware).
  • https://forums.developer.nvidia.com/t/bvh-building-algorithm-and-primitive-order/182231/8
• Use spheres as primitives, not triangles.
• Generate a ray for each query (RG shader).
• Traverse the BVH; skip non-circumscribing AABBs (no control; done in hardware).
• At leaf nodes: calculate the distance and collect neighbors (IS shader).
[Flowchart nodes: BVH Traversal + Ray-AABB Test (TL); Enter leaf node; Intersection (IS) Shader; Ray primitive intersect? (Yes / No); Any-Hit (AH) Shader; Traversal completes; Found a hit?; Closest-Hit (CH) Shader; Miss Shader]
• Ray-AABB test: is Q in the AABB? (Prunes remote points.)
• If so, is Q in the sphere?
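The two-stage check (AABB first, then sphere) can be emulated on the CPU as a sketch. A plain loop stands in for the hardware BVH traversal, which is an assumption made purely for illustration; the function name `rt_range_search` and the returned count of emulated IS shader calls are also hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def rt_range_search(query: Sequence[float], points: Sequence[Sequence[float]],
                    r: float, k: int) -> Tuple[List[int], int]:
    """Return (neighbor indices, number of emulated IS shader calls)."""
    neighbors: List[int] = []
    is_calls = 0
    for i, p in enumerate(points):
        # Stage 1, emulating the ray-AABB test: is Q inside the
        # AABB of width 2r centered on p? If not, p is pruned.
        if any(abs(q - c) > r for q, c in zip(query, p)):
            continue
        # Stage 2, emulating the IS shader: is Q inside the sphere of radius r?
        is_calls += 1
        if math.dist(query, p) <= r:
            neighbors.append(i)
            if len(neighbors) == k:
                break
    return neighbors, is_calls
```

A point near a corner of its AABB passes stage 1 but fails stage 2, which is why the exact distance check is still needed for a correct range search.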
that their rays follow similar traversal paths.
• Improving ray coherence in graphics parlance.
• How? A simple heuristic: queries enclosed by the same AABB are spatially close.
AABBs, but any AABB will do.
• How to find one? Cast a ray and immediately terminate the ray once the first IS shader is called.
  • optixTerminateRay()
  • Effectively returning ID (key) of the first enclosing leaf AABB.
• Then sort by key.
[Figure: items labeled 1–8]
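The reordering step can be sketched as follows. Two assumptions are made here for illustration: a linear scan stands in for the hardware traversal, so "first" means lowest point index rather than traversal order, and -1 is a made-up key for queries that are enclosed by no AABB. The names `first_enclosing_aabb` and `reorder_queries` are hypothetical.

```python
from typing import List, Sequence, Tuple

def first_enclosing_aabb(query: Sequence[float],
                         points: Sequence[Sequence[float]], r: float) -> int:
    """Key of the first AABB (width 2r around a point) that contains the query."""
    for i, p in enumerate(points):
        if all(abs(q - c) <= r for q, c in zip(query, p)):
            return i      # stop at the first enclosing AABB, like terminating the ray
    return -1             # assumption: -1 marks "no enclosing AABB"

def reorder_queries(queries: Sequence[Sequence[float]],
                    points: Sequence[Sequence[float]],
                    r: float) -> Tuple[list, List[int]]:
    """Sort queries by the key of their first enclosing AABB."""
    keys = [first_enclosing_aabb(q, points, r) for q in queries]
    order = sorted(range(len(queries)), key=keys.__getitem__)
    return [queries[i] for i in order], [keys[i] for i in order]
```

After sorting, queries that share a key sit next to each other, so neighboring threads would trace rays from spatially close origins.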
[Plot: search time (s) vs. AABB width]
• Using smaller AABBs drastically reduces the search time.
  • Smaller AABB means a query is enclosed by fewer AABBs.
  • …which leads to fewer traversals and IS shader calls.
• Particularly important for KNN search, where the IS shader manipulates a priority queue.
that’s just large enough to ensure correctness.
• Group queries such that queries in each partition share the same AABB size.
• Build a different BVH for each partition.
  • Essentially trades BVH construction overhead for faster search.
[Figure: for each query (q0, q1, q2, q3, …) the smallest AABB size is calculated; queries are grouped into partitions, and each partition gets its own BVH (BVH 0, BVH 1, …, BVH n-1, BVH n)]
the cell that contains the query, and iteratively grow along all four (2D) or six (3D) directions.
• Stop when K neighbors are found (or the sphere boundary is reached).
• We call the final collection of cells the megacell, with a width d.
• d is the AABB size.
[Figure: a megacell of width d]
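The megacell growth can be sketched in 2D as follows. This is a simplified illustration under stated assumptions: points are binned into a uniform grid with cell width `cell`, the block of cells grows by one ring of cells per step in all four directions, and "the sphere boundary is reached" is modeled as the megacell width reaching 2r. The function name `megacell_width` is hypothetical.

```python
from collections import Counter
from typing import Sequence

def megacell_width(query: Sequence[float], points: Sequence[Sequence[float]],
                   k: int, r: float, cell: float) -> float:
    """Grow a square block of grid cells around the query's cell; return its width d."""
    # Number of points per grid cell.
    counts = Counter((int(p[0] // cell), int(p[1] // cell)) for p in points)
    cx, cy = int(query[0] // cell), int(query[1] // cell)
    ring = 0
    while True:
        # Points inside the (2*ring + 1) x (2*ring + 1) block of cells.
        n = sum(counts[(x, y)]
                for x in range(cx - ring, cx + ring + 1)
                for y in range(cy - ring, cy + ring + 1))
        d = (2 * ring + 1) * cell
        if n >= k or d >= 2 * r:   # enough neighbors, or search sphere covered
            return d
        ring += 1
```

Recounting the whole block at every step keeps the sketch short; an incremental version would only add the cells of the new ring. Note that the block is centered on the query's cell, not on the query itself.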
guaranteed to have the K nearest neighbors.
• Why? Given a circle with N neighbors, those N neighbors are by definition the N nearest neighbors; N is guaranteed to be >= K.
• AABB must be the circumscribing square/cube of that circle/sphere.
  • Width is $\sqrt{2}\,d$ for 2D and $\sqrt{3}\,d$ for 3D.
[Figure: query q, points p1 and p2, and a megacell of width d]
within and around a megacell.
• A sphere C that has the same volume as cube A will contain K neighbors, which are guaranteed to be the K nearest neighbors.
• AABB size is $2\sqrt[3]{\frac{3}{4\pi}}\,d$ for 3D.
[Figure: query q, points p1 and p2, megacell A of width d, and regions B and C]
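The two 3D AABB sizes can be checked with a short derivation. This is a sketch that assumes the megacell is a cube of width $d$ centered on the query, and writes $R$ for the sphere radius and $w$ for the resulting AABB width (symbols introduced here for illustration).

```latex
\begin{align*}
\text{Circumscribing sphere of the cube:}\quad
  & R = \tfrac{\sqrt{3}}{2}\,d
  && \Rightarrow\; w = 2R = \sqrt{3}\,d \approx 1.732\,d \\
\text{Equal-volume sphere:}\quad
  & \tfrac{4}{3}\pi R^{3} = d^{3}
  \;\Rightarrow\; R = \sqrt[3]{\tfrac{3}{4\pi}}\,d
  && \Rightarrow\; w = 2R = 2\sqrt[3]{\tfrac{3}{4\pi}}\,d \approx 1.241\,d \\
\text{Ratio of AABB volumes:}\quad
  & \left(\frac{\sqrt{3}\,d}{2\sqrt[3]{3/(4\pi)}\,d}\right)^{3}
    = \frac{\sqrt{3}\,\pi}{2} \approx 2.72
\end{align*}
```

Under the equal-volume assumption the AABB encloses roughly 2.7 times less volume than the conservative circumscribing choice, which matters if the search cost grows with the cube of the AABB size.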
queries);
reorderQueries(queries, firstHitAABBs);
traceRays(bvh, queries);

foreach q in queries:
  AABBSize ← findSmallestAABBSize(q);
  partitions.add(AABBSize, q); // assuming a hash table
foreach p in partitions:
  queries ← all queries in p;
  r ← AABBSize of p;
BVH construction overhead.
• Especially bad when point density is globally non-uniform (e.g., astrophysics simulation).
• Bundle partitions to minimize overall search time.
Bundling two partitions:
• eliminates one BVH construction cost.
• but also increases the search cost. Why?
[Figure: partitions p1–p4 grouped into bundles b1–b3]
of IS shader calls, which
  • …is dictated by the number of AABBs a query resides in, which
  • …is equivalent to the number of points inside an AABB, which
  • …is density × volume ($r^3$), assuming locally-uniform density
• Search cost ∝ $r^3$
[Plot: execution time (s) and # of IS shader calls (millions)]
$$T_{search} = k N \rho S^3$$
• $N$: # of queries in a partition
• $\rho$: point density in a partition
• $S$: AABB size of the partition
• $k$: a constant regressed offline
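A small numeric sketch of the cost model shows why bundling raises the search cost: every query in a bundle has to use the largest AABB size in that bundle. All numbers below (the constant, the density, the partition sizes) are made up for illustration, and the function name `search_cost` is hypothetical.

```python
def search_cost(k: float, n_queries: int, density: float, aabb_size: float) -> float:
    """T_search = k * N * rho * S^3 for one partition."""
    return k * n_queries * density * aabb_size ** 3

K_CONST = 1e-9     # hypothetical regressed constant
RHO = 1000.0       # hypothetical point density, assumed equal in both partitions

# Two partitions searched separately: many queries with a small AABB,
# a few queries with a large AABB.
separate = (search_cost(K_CONST, 1_000_000, RHO, 0.5)
            + search_cost(K_CONST, 1_000, RHO, 2.0))

# Bundled: all queries share one BVH built with the larger AABB size.
bundled = search_cost(K_CONST, 1_001_000, RHO, 2.0)

print(f"separate: {separate:.3f}  bundled: {bundled:.3f}")
```

In this made-up setting the bundled search cost is about 60 times the separate one, because the million queries that only needed S = 0.5 now pay for S = 2.0.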
construction cost. What’s the optimal bundling?
• Combinatorial optimization, but we have to solve it at run-time.
• We leverage an empirical observation to simplify the problem structure, which yields an efficient linear-time solution.
[Figure: two example bundlings of partitions p1–p4 into bundles b1–b3]
are inversely correlated.
• Given this empirical observation, we can derive the optimal bundling in linear time.
  • Proof omitted; see paper.
[Plot: number of queries ($10^4$ to $10^7$) vs. AABB size (0.3 to 2.3)]
Intuitively, only a handful of sparsely located queries need a large AABB to find K neighbors.
ascending order of their AABB sizes.
• Start from the last partition and scan backward; at each step, bundle all partitions that have been scanned, leave the rest unbundled.
• Pick the one with the lowest search cost.
[Figure: partitions p1–p4 and bundles b1–b3. Annotation: larger AABBs, fewer queries.]
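The backward scan can be sketched as below. Several assumptions go beyond what the slides state: each BVH has the same fixed `build_cost`, point density is the same in every partition, a bundle is searched with its largest AABB size, and each candidate is scored by search cost plus BVH construction cost. The function name `best_bundling` is hypothetical; see the paper for the actual formulation and proof.

```python
from typing import List, Tuple

def best_bundling(partitions: List[Tuple[float, int]], k: float,
                  density: float, build_cost: float) -> Tuple[int, float]:
    """partitions: (aabb_size, n_queries) pairs in ascending order of aabb_size.

    Returns (j, cost): bundling the last j partitions gives the lowest
    modeled cost (j == 1 means nothing is actually merged).
    """
    n = len(partitions)
    # prefix[i] = search cost of partitions[0:i] when each keeps its own BVH
    prefix = [0.0]
    for size, queries in partitions:
        prefix.append(prefix[-1] + k * queries * density * size ** 3)
    s_max = partitions[-1][0]          # the bundle must use the largest AABB size
    best_j, best_cost = 0, float("inf")
    bundled_queries = 0
    for j in range(1, n + 1):          # scan backward from the last partition
        bundled_queries += partitions[n - j][1]
        n_bvhs = (n - j) + 1           # unbundled partitions plus one bundle
        cost = (prefix[n - j]
                + k * bundled_queries * density * s_max ** 3
                + n_bvhs * build_cost)
        if cost < best_cost:
            best_j, best_cost = j, cost
    return best_j, best_cost
```

Thanks to the prefix sums and the running query count, each candidate is scored in constant time, so the scan is linear in the number of partitions. With a zero build cost the scan never merges anything, since merging can only raise the modeled search cost.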
q); // assuming a hash table
bundle(partitions);
foreach p in partitions:
  queries ← all queries in p;
  r ← AABBSize of p;
  bvh ← buildBVH(points, r);
  firstHitAABBs ← traceRays(bvh, queries);
  reorderQueries(queries, firstHitAABBs);
  traceRays(bvh, queries);
  • cuNSearch: grid search in CUDA; used in the SPlisHSPlasH fluid simulator.
  • FRNN: grid search in CUDA.
  • PCLOctree: octree search in CUDA (i.e., uses an octree, as opposed to a BVH, to prune the search).
  • FastRNN: KNN search in RT cores without our optimizations.
• Datasets:
  • KITTI: self-driving car dataset; points are surface samples; mostly confined to 2D (ground).
  • Stanford 3D Scanning Repo: Bunny, Dragon, Buddha.
  • N-body simulation: non-uniform distribution in 3D.
[Figure: two stacked-bar charts (range search and KNN search) of time (%) broken down into Data, Opt, BVH, FS, and Search, for KITTI-1M, KITTI-6M, KITTI-12M, KITTI-25M (KITTI); NBody-9M, NBody-10M (N-body); Bunny-360K, Dragon-3.6M, Buddha-4.6M (3D scan)]
• Range search: much of the time is spent on optimization, data transfer, and BVH construction.
• KNN search: time is mostly dominated by the actual search.
• N-body: the galaxy (point) distribution in the universe is very non-uniform, so a lot of time is spent on partitioning.
will the same happen to RT cores?
• A few examples of using RT cores for non-graphics workloads.
  • Key: formulate your problem as a BVH search.
• But very limited, because RT cores are built to support only BVH search, which has a very specific branching logic (ray-AABB test).
  • Relax the hardware? Does it make sense? Will Nvidia do it?
Many natural opportunities for approximation in our algorithm.
• Use a smaller-than-necessary AABB to build the BVH.
• Elide ray-sphere test (skip IS shader calls); provides an error bound.
• Even better: many applications that use neighbor search are differentiable (e.g., neural network). We could integrate approximate neighbor search into the training process to tolerate end-to-end accuracy loss.
  • See Yu Feng’s ISCA 2022 paper.
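The "elide ray-sphere test" idea can be sketched as follows. This is one interpretation, not the authors' implementation: every point whose AABB (width 2r) contains the query is accepted without the exact distance check. Under that reading, a returned 3D point can be at most $\sqrt{3}\,r$ away from the query (the AABB's half-diagonal), which is one way to read the error bound. The function name `approx_range_search` is hypothetical, and a linear scan again stands in for the BVH traversal.

```python
import math
from typing import List, Sequence

def approx_range_search(query: Sequence[float],
                        points: Sequence[Sequence[float]],
                        r: float, k: int) -> List[int]:
    """Accept any point whose AABB of width 2r contains the query (no sphere test)."""
    found = []
    for i, p in enumerate(points):
        if all(abs(q - c) <= r for q, c in zip(query, p)):
            found.append(i)
            if len(found) == k:
                break
    return found

def max_error_distance(r: float, dims: int = 3) -> float:
    """Farthest a returned point can be from the query: the AABB half-diagonal."""
    return math.sqrt(dims) * r
```

With points (0.5, 0, 0), (0.9, 0.9, 0.9), and (3, 0, 0), a query at the origin and r = 1, the approximate search returns the first two points; the second lies about 1.56 from the query, outside the exact range but within the $\sqrt{3}\,r \approx 1.73$ bound.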