4.4 Kd-Tree Accelerator
Binary space partitioning (BSP) trees adaptively subdivide space with planes. A BSP tree starts with a bounding box that encompasses the entire scene. If the number of primitives in the box is greater than some threshold, the box is split in half by a plane. Primitives are then associated with whichever half they overlap, and primitives that lie in both halves are associated with both of them. (This is in contrast to BVHs, where each primitive is assigned to only one of the two subgroups after a split.)
The splitting process continues recursively either until each leaf region in the resulting tree contains a sufficiently small number of primitives or until a maximum depth is reached. Because the splitting planes can be placed at arbitrary positions inside the overall bound and because different parts of 3D space can be refined to different degrees, BSP trees can easily handle uneven distributions of geometry.
Two variations of BSP trees are kd-trees and octrees. A kd-tree simply restricts the splitting plane to be perpendicular to one of the coordinate axes; this makes both traversal and construction of the tree more efficient, at the cost of some flexibility in how space is subdivided. The octree uses three axis-perpendicular planes to simultaneously split the box into eight regions at each step (typically by splitting down the center of the extent in each direction). In this section, we will implement a kd-tree for ray intersection acceleration in the KdTreeAccel class. Source code for this class can be found in the files accelerators/kdtreeaccel.h and accelerators/kdtreeaccel.cpp.
In addition to the primitives to be stored, the KdTreeAccel constructor takes a few parameters that are used to guide the decisions that will be made as the tree is built; these parameters are stored in member variables (isectCost, traversalCost, maxPrims, maxDepth, and emptyBonus) for later use. See Figure 4.14 for an overview of how the tree is built.
4.4.1 Tree Representation
The kd-tree is a binary tree, where each interior node always has both children and where leaves of the tree store the primitives that overlap them. Each interior node must provide access to three pieces of information:
- Split axis: which of the , , or axes was split at this node
- Split position: the position of the splitting plane along the axis
- Children: information about how to reach the two child nodes beneath it
Each leaf node needs to record only which primitives overlap it.
It is worth going through a bit of trouble to ensure that all interior nodes and many leaf nodes use just 8 bytes of memory (assuming 4-byte Floats) because doing so ensures that eight nodes will fit into a 64-byte cache line. Because there are often many nodes in the tree and because many nodes are generally accessed for each ray, minimizing the size of the node representation substantially improves cache performance. Our initial implementation used a 16-byte node representation; when we reduced the size to 8 bytes we obtained nearly a 20% speed increase.
Both leaves and interior nodes are represented by the following KdAccelNode structure. The comments after each union member indicate whether a particular field is used for interior nodes, leaf nodes, or both.
The two low-order bits of the KdAccelNode::flags variable are used to differentiate between interior nodes with , , and splits (where these bits hold the values 0, 1, and 2, respectively) and leaf nodes (where these bits hold the value 3). It is relatively easy to store leaf nodes in 8 bytes: since the low 2 bits of KdAccelNode::flags are used to indicate that this is a leaf, the upper 30 bits of KdAccelNode::nPrims are available to record how many primitives overlap it. Then, if just a single primitive overlaps a KdAccelNode leaf, an integer index into the KdTreeAccel::primitives array identifies the Primitive. If more than one primitive overlaps, then their indices are stored in a segment of KdTreeAccel::primitiveIndices. The offset to the first index for the leaf is stored in KdAccelNode::primitiveIndicesOffset and the indices for the rest directly follow.
Leaf nodes are easy to initialize, though we have to be careful with the details since both flags and nPrims share the same storage; we need to be careful to not clobber data for one of them while initializing the other. Furthermore, the number of primitives must be shifted two bits to the left before being stored so that the low two bits of KdAccelNode::flags can both be set to 1 to indicate that this is a leaf node.
For leaf nodes with zero or one overlapping primitives, no additional memory allocation is necessary thanks to the KdAccelNode::onePrimitive field. For the case where multiple primitives overlap, storage is allocated in the primitiveIndices array.
Getting interior nodes down to 8 bytes is also reasonably straightforward. A Float (which is 32 bits in size when Floats are defined to be floats) stores the position along the chosen split axis where the node splits space, and, as explained earlier, the lowest 2 bits of KdAccelNode::flags are used to record which axis the node was split along. All that is left is to store enough information to find the two children of the node as we’re traversing the tree.
Rather than storing two pointers or offsets, we lay the nodes out in a way that lets us only store one child pointer: all of the nodes are allocated in a single contiguous block of memory, and the child of an interior node that is responsible for space below the splitting plane is always stored in the array position immediately after its parent (this layout also improves cache performance, by keeping at least one child close to its parent in memory). The other child, representing space above the splitting plane, will end up somewhere else in the array; a single integer offset, KdAccelNode::aboveChild, stores its position in the nodes array. This representation is similar to the one used for BVH nodes in Section 4.3.4.
Given all those conventions, the code to initialize an interior node is straightforward. As in the InitLeaf() method, it’s important to assign the value to flags before setting aboveChild and to compute the logical OR of the shifted above child value so as not to clobber the bits stored in flags.
Finally, we’ll provide a few methods to extract various values from the node, so that callers don’t have to be aware of the details of its representation in memory.
4.4.2 Tree Construction
The kd-tree is built with a recursive top-down algorithm. At each step, we have an axis-aligned region of space and a set of primitives that overlap that region. Either the region is split into two subregions and turned into an interior node or a leaf node is created with the overlapping primitives, terminating the recursion.
As mentioned in the discussion of KdAccelNodes, all tree nodes are stored in a contiguous array. KdTreeAccel::nextFreeNode records the next node in this array that is available, and KdTreeAccel::nAllocedNodes records the total number that have been allocated. By setting both of them to 0 and not allocating any nodes at start-up, the implementation here ensures that an allocation will be done immediately when the first node of the tree is initialized.
It is also necessary to determine a maximum tree depth if one wasn’t given to the constructor. Although the tree construction process will normally terminate naturally at a reasonable depth, it is important to cap the maximum depth so that the amount of memory used for the tree cannot grow without bound in pathological cases. We have found that the value gives a reasonable maximum depth for a variety of scenes.
Because the construction routine will be repeatedly using the bounding boxes of the primitives along the way, they are stored in a vector before tree construction starts so that the potentially slow Primitive::WorldBound() methods don’t need to be called repeatedly.
One of the parameters to the tree construction routine is an array of primitive indices indicating which primitives overlap the current node. Because all primitives overlap the root node (when the recursion begins) we start with an array initialized with values from zero through primitives.size()-1.
KdTreeAccel::buildTree() is called for each tree node. It is responsible for deciding if the node should be an interior node or leaf and updating the data structures appropriately. The last three parameters, edges, prims0, and prims1, are pointers to data that is allocated in the <<Allocate working memory for kd-tree construction>> fragment, which will be defined and documented in a few pages.
The main parameters to KdTreeAccel::buildTree() are the offset into the array of KdAccelNodes to use for the node that it creates, nodeNum; the bounding box that gives the region of space that the node covers, nodeBounds; and the indices of primitives that overlap it, primNums. The remainder of the parameters will be described later, closer to where they are used.
If all of the allocated nodes have been used up, node memory is reallocated with twice as many entries and the old values are copied. The first time KdTreeAccel::buildTree() is called, KdTreeAccel::nAllocedNodes is 0 and an initial block of tree nodes is allocated.
A leaf node is created (stopping the recursion) either if there are a sufficiently small number of primitives in the region or if the maximum depth has been reached. The depth parameter starts out as the tree’s maximum depth and is decremented at each level.
If this is an internal node, it is necessary to choose a splitting plane, classify the primitives with respect to that plane, and recurse.
Our implementation chooses a split using the SAH introduced in Section 4.3.2. The SAH is applicable to kd-trees as well as BVHs; here, the estimated cost is computed for a series of candidate splitting planes in the node, and the split that gives the lowest cost is chosen.
In the implementation here, the intersection cost and the traversal cost can be set by the user; their default values are 80 and 1, respectively. Ultimately, it is the ratio of these two values that determines the behavior of the tree-building algorithm. The greater ratio between these values compared to the values used for BVH construction reflects the fact that visiting a kd-tree node is less expensive than visiting a BVH node.
One modification to the SAH used for BVH trees is that for kd-trees it is worth giving a slight preference to choosing splits where one of the children has no primitives overlapping it, since rays passing through these regions can immediately advance to the next kd-tree node without any ray–primitive intersection tests. Thus, the revised costs for unsplit and split regions are, respectively,
where is a “bonus” value that is zero unless one of the two regions is completely empty, in which case it takes on a value between 0 and 1.
Given a way to compute the probabilities for the cost model, the only problem to address is how to generate candidate splitting positions and how to efficiently compute the cost for each candidate. It can be shown that the minimum cost with this model will be attained at a split that is coincident with one of the faces of one of the primitive’s bounding boxes—there’s no need to consider splits at intermediate positions. (To convince yourself of this, consider the behavior of the cost function between the edges of the faces.) Here, we will consider all bounding box faces inside the region for one or more of the three coordinate axes.
The cost for checking all of these candidates thus can be kept relatively low with a carefully structured algorithm. To compute these costs, we will sweep across the projections of the bounding boxes onto each axis and keep track of which gives the lowest cost (Figure 4.15). Each bounding box has two edges on each axis, each of which is represented by an instance of the BoundEdge structure. This structure records the position of the edge along the axis, whether it represents the start or end of a bounding box (going from low to high along the axis), and which primitive it is associated with.
At most, 2 * primitives.size() BoundEdges are needed for computing costs for any tree node, so the memory for the edges for all three axes is allocated once and then reused for each node that is created.
After determining the estimated cost for creating a leaf, KdTreeAccel::buildTree() chooses an axis to try to split along and computes the cost function for each candidate split. bestAxis and bestOffset record the axis and bounding box edge index that have given the lowest cost so far, bestCost. invTotalSA is initialized to the reciprocal of the node’s surface area; its value will be used when computing the probabilities of rays passing through each of the candidate children nodes.
This method first tries to find a split along the axis with the largest spatial extent; if successful, this choice helps to give regions of space that tend toward being square in shape. This is an intuitively sensible approach. Later, if it was unsuccessful in finding a good split along this axis, it will go back and try the others in turn.
First the edges array for the axis is initialized using the bounding boxes of the overlapping primitives. The array is then sorted from low to high along the axis so that it can sweep over the box edges from first to last.
The C++ standard library routine sort() requires that the structure being sorted define an ordering; this is done using the BoundEdge::t values. However, one subtlety is that if the BoundEdge::t values match, it is necessary to try to break the tie by comparing the node’s types; this is necessary since sort() depends on the fact that the only time a < b and b < a are both false is when a == b.
Given the sorted array of edges, we’d like to quickly compute the cost function for a split at each one of them. The probabilities for a ray passing through each child node are easily computed using their surface areas, and the number of primitives on each side of the split is tracked by the variables nBelow and nAbove. We would like to keep their values updated such that if we chose to split at edgeT for a particular pass through the loop, nBelow will give the number of primitives that would end up below the splitting plane and nAbove would give the number above it.
At the first edge, all primitives must be above that edge by definition, so nAbove is initialized to nPrimitives and nBelow is set to 0. When the loop is considering a split at the end of a bounding box’s extent, nAbove needs to be decremented, since that box, which must have previously been above the splitting plane, will no longer be above it if splitting is done at the point. Similarly, after calculating the split cost, if the split candidate was at the start of a bounding box’s extent, then the box will be on the below side for all subsequent splits. The tests at the start and end of the loop body update the primitive counts for these two cases.
belowSA and aboveSA hold the surface areas of the two candidate child bounds; they are easily computed by adding up the areas of the six faces.
Given all of this information, the cost for a particular split can be computed.
If the cost computed for this candidate split is the best one so far, the details of the split are recorded.
It may happen that there are no possible splits found in the previous tests (Figure 4.16 illustrates a case where this may happen). In this case, there isn’t a single candidate position at which to split the node along the current axis. At this point, splitting is tried for the other two axes in turn. If neither of them can find a split (when retries is equal to 2), then there is no useful way to refine the node, since both children will still have the same number of overlapping primitives. When this condition occurs, all that can be done is to give up and make a leaf node.
It is also possible that the best split will have a cost that is still higher than the cost for not splitting the node at all. If it is substantially worse and there aren’t too many primitives, a leaf node is made immediately. Otherwise, badRefines keeps track of how many bad splits have been made so far above the current node of the tree. It’s worth allowing a few slightly poor refinements since later splits may be able to find better ones given a smaller subset of primitives to consider.
Having chosen a split position, the bounding box edges can be used to classify the primitives as being above, below, or on both sides of the split in the same way as was done to keep track of nBelow and nAbove in the earlier code. Note that the bestOffset entry in the arrays is skipped in the loops below; this is necessary so that the primitive whose bounding box edge was used for the split isn’t incorrectly categorized as being on both sides of the split.
Recall that the node number of the “below” child of this node in the kd-tree nodes array is the current node number plus one. After the recursion has returned from that side of the tree, the nextFreeNode offset is used for the “above” child. The only other important detail here is that the prims0 memory is passed directly for reuse by both children, while the prims1 pointer is advanced forward first. This is necessary since the current invocation of KdTreeAccel::buildTree() depends on its prims1 values being preserved over the first recursive call to KdTreeAccel::buildTree() in the following, since it must be passed as a parameter to the second call. However, there is no corresponding need to preserve the edges values or to preserve prims0 beyond its immediate use in the first recursive call.
Thus, much more space is needed for the prims1 array of integers for storing the worst-case possible number of overlapping primitive numbers than for the prims0 array, which only needs to handle the primitives at a single level at a time.
Figure 4.17 shows the basic process of ray traversal through the tree. Intersecting the ray with the tree’s overall bounds gives initial tMin and tMax values, marked with points in the figure. As with the BVHAccel in this chapter, if the ray misses the overall primitive bounds, this method can immediately return false. Otherwise, it starts to descend into the tree, starting at the root. At each interior node, it determines which of the two children the ray enters first and processes both children in order. Traversal ends either when the ray exits the tree or when the closest intersection is found.
The algorithm starts by finding the overall parametric range of the ray’s overlap with the tree, exiting immediately if there is no overlap.
The array of KdToDo structures is used to record the nodes yet to be processed for the ray; it is ordered so that the last active entry in the array is the next node that should be considered. The maximum number of entries needed in this array is the maximum depth of the kd-tree; the array size used in the following should be more than enough in practice.
The traversal continues through the nodes, processing a single leaf or interior node each time through the loop. The values tMin and tMax will always hold the parametric range for the ray’s overlap with the current node.
An intersection may have been previously found in a primitive that overlaps multiple nodes. If the intersection was outside the current node when first detected, it is necessary to keep traversing the tree until we come to a node where tMin is beyond the intersection. Only then is it certain that there is no closer intersection with some other primitive.
For interior tree nodes the first thing to do is to intersect the ray with the node’s splitting plane; given the intersection point, we can determine if one or both of the children nodes need to be processed and in what order the ray passes through them.
The parametric distance to the split plane is computed in the same manner as was done in computing the intersection of a ray and an axis-aligned plane for the ray–bounding box test. We use the precomputed invDir value to save a divide each time through the loop.
Now it is necessary to determine the order in which the ray encounters the children nodes so that the tree is traversed in front-to-back order along the ray. Figure 4.18 shows the geometry of this computation. The position of the ray’s origin with respect to the splitting plane is enough to distinguish between the two cases, ignoring for now the case where the ray doesn’t actually pass through one of the two nodes. The rare case when the ray’s origin lies on the splitting plane requires careful handling in this case, as its direction needs to be used instead to discriminate between the two cases.
It may not be necessary to process both children of this node. Figure 4.19 shows some configurations where the ray only passes through one of the children. The ray will never miss both children, since otherwise the current interior node should never have been visited.
The first if test in the following code corresponds to Figure 4.19(a): only the near node needs to be processed if it can be shown that the ray doesn’t overlap the far node because it faces away from it or doesn’t overlap it because . Figure 4.19(b) shows the similar case tested in the second if test: the near node may not need processing if the ray doesn’t overlap it. Otherwise, the else clause handles the case of both children needing processing; the near node will be processed next, and the far node goes on the todo list.
If the current node is a leaf, intersection tests are performed against the primitives in the leaf.
Processing an individual primitive is just a matter of passing the intersection request on to the primitive.
After doing the intersection tests at the leaf node, the next node to process is loaded from the todo array. If no more nodes remain, then the ray has passed through the tree without hitting anything.
Like the BVHAccel, the KdTreeAccel has a specialized intersection method for shadow rays that is not shown here. It is similar to the Intersect() method but calls the Primitive::IntersectP() method and returns true as soon as it finds any intersection without worrying about finding the closest one.