## 4.4 Kd-Tree Accelerator

*Binary space partitioning* (BSP) trees adaptively subdivide space
with planes. A BSP tree starts with a bounding box that encompasses the
entire scene. If the number of primitives in the box is greater than some
threshold, the box is split in half by a plane. Primitives are then
associated with whichever half they overlap, and primitives that lie in both
halves are associated with both of them. (This is in contrast to BVHs,
where each primitive is assigned to only one of the two subgroups after a
split.)

The splitting process continues recursively either until each leaf region in the resulting tree contains a sufficiently small number of primitives or until a maximum depth is reached. Because the splitting planes can be placed at arbitrary positions inside the overall bound and because different parts of 3D space can be refined to different degrees, BSP trees can easily handle uneven distributions of geometry.

Two variations of BSP trees are *kd-trees* and *octrees*. A kd-tree
simply restricts the splitting plane to be perpendicular to one of the
coordinate axes; this makes both traversal and construction of the tree more
efficient, at the cost of some flexibility in how space is subdivided. The
octree uses three axis-perpendicular planes to simultaneously
split the box into eight regions at each step (typically by splitting down
the center of the extent in each direction). In this section, we will
implement a kd-tree for ray intersection acceleration in the `KdTreeAccel`
class. Source code for this class can be found in the files
`accelerators/kdtreeaccel.h` and
`accelerators/kdtreeaccel.cpp`.

In addition to the primitives to be stored, the `KdTreeAccel`
constructor takes a few parameters that are used to guide the decisions
that will be made as the tree is built; these parameters are stored in
member variables (`isectCost`, `traversalCost`,
`maxPrims`,
`maxDepth`, and `emptyBonus`) for later use. See
Figure 4.14 for an overview of how the tree is built.

`primNums`for kd-tree construction>>

### 4.4.1 Tree Representation

The kd-tree is a binary tree, where each interior node always has both children and where leaves of the tree store the primitives that overlap them. Each interior node must provide access to three pieces of information:

- Split axis: which of the , , or axes was split at this node
- Split position: the position of the splitting plane along the axis
- Children: information about how to reach the two child nodes beneath it

Each leaf node needs to record only which primitives overlap it.

It is worth going through a bit of trouble to ensure that all interior
nodes and many leaf nodes use just 8 bytes of memory (assuming 4-byte
`Float`s) because doing so ensures that eight nodes will
fit into a 64-byte cache line. Because there are often many nodes in the tree
and because many nodes are generally accessed for each ray, minimizing the size of
the node representation substantially improves cache performance. Our
initial implementation used a 16-byte node representation; when we reduced
the size to 8 bytes we obtained nearly a 20% speed increase.

Both leaves and interior nodes are represented by the following
`KdAccelNode` structure. The comments after each `union` member
indicate whether a particular field is used for interior nodes, leaf nodes,
or both.

The two low-order bits of the `KdAccelNode::flags` variable are used to
differentiate between interior nodes with , , and splits (where
these bits hold the values 0, 1, and 2, respectively) and leaf nodes (where
these bits hold the value 3). It is relatively easy to store leaf nodes in
8 bytes: since the low 2 bits of `KdAccelNode::flags` are used to
indicate that this is a leaf, the upper 30 bits of
`KdAccelNode::nPrims` are available to record how many primitives
overlap it. Then, if just a single primitive overlaps a `KdAccelNode`
leaf, an integer index into the `KdTreeAccel::primitives` array
identifies the `Primitive`. If more than one primitive overlaps, then
their indices are stored in a segment of
`KdTreeAccel::primitiveIndices`. The offset to the first index for the
leaf is stored in `KdAccelNode::primitiveIndicesOffset` and the indices
for the rest directly follow.

Leaf nodes are easy to initialize, though we have to be careful with the
details since both `flags` and `nPrims` share the same storage;
we need to be careful to not clobber data for one of them while
initializing the other. Furthermore, the number of primitives must be
shifted two bits to the left before being stored so that the low two bits
of `KdAccelNode::flags` can both be set to `1` to indicate that
this is a leaf node.

For leaf nodes with zero or one overlapping primitives, no additional
memory allocation is necessary thanks to the
`KdAccelNode::onePrimitive` field. For the case where multiple
primitives overlap, storage is allocated in the `primitiveIndices`
array.

Getting interior nodes down to 8 bytes is also reasonably straightforward.
A `Float` (which is 32 bits in size when `Float`s are defined to be
`float`s) stores the position along the chosen split axis where the
node splits space, and, as explained earlier, the lowest 2 bits of
`KdAccelNode::flags` are used to record which axis the node was split
along. All that is left is to store enough information to find the two
children of the node as we’re traversing the tree.

Rather than storing two pointers or offsets, we lay the nodes out in a way
that lets us only store one child pointer: all of the nodes are allocated in
a single contiguous block of memory, and the child of an interior node that
is responsible for space below the splitting plane is always stored in the
array position immediately after its parent (this layout also improves cache
performance, by keeping at least one child close to its parent in memory).
The other child, representing space above the splitting plane, will end up
somewhere else in the array; a single integer offset,
`KdAccelNode::aboveChild`, stores its position in the nodes array.
This representation is similar to the one used for BVH nodes in Section 4.3.4.

Given all those conventions, the code to initialize an interior node is
straightforward. As in the `InitLeaf()` method, it’s important to
assign the value to `flags` before setting `aboveChild` and to
compute the logical OR of the shifted `aboveChild` value so as not to clobber the bits
stored in `flags`.

Finally, we’ll provide a few methods to extract various values from the node, so that callers don’t have to be aware of the details of its representation in memory.

### 4.4.2 Tree Construction

The kd-tree is built with a recursive top-down algorithm. At each step, we have an axis-aligned region of space and a set of primitives that overlap that region. Either the region is split into two subregions and turned into an interior node or a leaf node is created with the overlapping primitives, terminating the recursion.

As mentioned in the discussion of `KdAccelNode`s, all tree nodes are
stored in a contiguous array. `KdTreeAccel::nextFreeNode` records the
next node in this array that is available, and
`KdTreeAccel::nAllocedNodes` records the total number that have been
allocated. By setting both of them to 0 and not allocating any nodes at
start-up, the implementation here ensures that an allocation will be done
immediately when the first node of the tree is initialized.

It is also necessary to determine a maximum tree depth if one wasn’t given to the constructor. Although the tree construction process will normally terminate naturally at a reasonable depth, it is important to cap the maximum depth so that the amount of memory used for the tree cannot grow without bound in pathological cases. We have found that the value gives a reasonable maximum depth for a variety of scenes.

`primNums`for kd-tree construction>>

Because the construction routine will be repeatedly using the bounding
boxes of the primitives along the way, they are stored in a `vector` before
tree construction starts so that the potentially slow
`Primitive::WorldBound()` methods don’t need to be called repeatedly.

One of the parameters to the tree construction routine is an array of
primitive indices
indicating which primitives overlap the current node.
Because all primitives overlap the root node (when the recursion begins) we
start with an array initialized with values from zero through
`primitives.size()-1`.

`primNums`for kd-tree construction>>=

`KdTreeAccel::buildTree()` is called for each tree node. It is responsible
for deciding if the node should be an interior node or leaf and updating the
data structures appropriately. The last three parameters, `edges`,
`prims0`, and `prims1`, are pointers to data that is allocated in the
<<Allocate working memory for kd-tree construction>> fragment, which
will be defined and documented in a few pages.

The main parameters to `KdTreeAccel::buildTree()` are the offset into
the array of `KdAccelNode`s to use for the node that it creates,
`nodeNum`; the bounding box that gives the region of space that the
node covers, `nodeBounds`; and the indices of primitives that overlap
it, `primNums`. The remainder of the parameters will be described
later, closer to where they are used.

`nodes`array>>

`axis`>>

`edges`for

`axis`>>

`axis`to find best>>

`i`th edge>>

`edgeT`>>

If all of the allocated nodes have been used up, node memory is reallocated
with twice as many entries and the old values are copied. The first
time `KdTreeAccel::buildTree()` is called,
`KdTreeAccel::nAllocedNodes` is 0 and an initial block of tree
nodes is allocated.

`nodes`array>>=

A leaf node is created (stopping the recursion) either if there are a
sufficiently small number of primitives in the region or if the maximum depth
has been reached. The `depth` parameter starts out as the tree’s maximum
depth and is decremented at each level.

If this is an internal node, it is necessary to choose a splitting plane, classify the primitives with respect to that plane, and recurse.

`axis`>>

`edges`for

`axis`>>

`axis`to find best>>

`i`th edge>>

`edgeT`>>

Our implementation chooses a split using the SAH introduced in Section 4.3.2. The SAH is applicable to kd-trees as well as BVHs; here, the estimated cost is computed for a series of candidate splitting planes in the node, and the split that gives the lowest cost is chosen.

In the implementation here, the intersection cost and the traversal cost can be set by the user; their default values are 80 and 1, respectively. Ultimately, it is the ratio of these two values that determines the behavior of the tree-building algorithm. The greater ratio between these values compared to the values used for BVH construction reflects the fact that visiting a kd-tree node is less expensive than visiting a BVH node.

One modification to the SAH used for BVH trees is that for kd-trees it is worth giving a slight preference to choosing splits where one of the children has no primitives overlapping it, since rays passing through these regions can immediately advance to the next kd-tree node without any ray–primitive intersection tests. Thus, the revised costs for unsplit and split regions are, respectively,

where is a “bonus” value that is zero unless one of the two regions is completely empty, in which case it takes on a value between 0 and 1.

Given a way to compute the probabilities for the cost model, the only problem to address is how to generate candidate splitting positions and how to efficiently compute the cost for each candidate. It can be shown that the minimum cost with this model will be attained at a split that is coincident with one of the faces of one of the primitive’s bounding boxes—there’s no need to consider splits at intermediate positions. (To convince yourself of this, consider the behavior of the cost function between the edges of the faces.) Here, we will consider all bounding box faces inside the region for one or more of the three coordinate axes.

The cost for checking all of these candidates thus can be kept relatively
low with a carefully structured algorithm. To compute these costs, we will
sweep across the projections of the bounding boxes onto each axis and keep
track of which gives the lowest cost
(Figure 4.15). Each bounding box has two edges on
each axis, each of which is represented by an instance of the
`BoundEdge` structure. This structure records the position of the edge
along the axis, whether it represents the start or end of a bounding box
(going from low to high along the axis), and which primitive it is
associated with.

`BoundEdge`structure.

At most, `2 * primitives.size()` `BoundEdge`s are needed for computing
costs for any tree node, so the memory for the edges for all three
axes is allocated once and then reused for each node that is created.

After determining the estimated cost for creating a leaf,
`KdTreeAccel::buildTree()` chooses an axis to try to split along and computes the cost
function for each candidate split. `bestAxis` and `bestOffset`
record the axis and bounding box edge index that have given the lowest cost so far,
`bestCost`. `invTotalSA` is initialized to the reciprocal of the
node’s surface area; its value will be used when computing the probabilities of
rays passing through each of the candidate children nodes.

`axis`>>

`edges`for

`axis`>>

`axis`to find best>>

`i`th edge>>

`edgeT`>>

This method first tries to find a split along the axis with the largest spatial extent; if successful, this choice helps to give regions of space that tend toward being square in shape. This is an intuitively sensible approach. Later, if it was unsuccessful in finding a good split along this axis, it will go back and try the others in turn.

First the `edges` array for the axis is initialized using the
bounding boxes of the overlapping primitives. The array is then sorted
from low to high along the axis so that it can sweep over the box edges
from first to last.

`axis`>>=

`edges`for

`axis`>>

The C++ standard library routine `sort()` requires that the structure
being sorted define an ordering; this is done using the
`BoundEdge::t` values. However, one subtlety is that if the
`BoundEdge::t` values match, it is necessary to try to break the tie by
comparing the node’s types; this is necessary since `sort()` depends on
the fact that the only time `a < b` and `b < a` are both
`false` is when `a == b`.

`edges`for

`axis`>>=

Given the sorted array of edges, we’d like to quickly compute the cost
function for a split at each one of them. The probabilities for a ray
passing through each child node are easily computed using their surface
areas, and the number of primitives on each side of the split is tracked by
the variables `nBelow` and `nAbove`. We would like to keep their
values updated such that if we chose to split at `edgeT` for a
particular pass through the loop, `nBelow` will give the number of
primitives that would end up below the splitting plane and `nAbove`
would give the number above it.

At the first edge, all primitives must be above that edge by definition, so
`nAbove` is initialized to `nPrimitives` and `nBelow` is set
to 0. When the loop is considering a split at the end of a bounding box’s
extent, `nAbove` needs to be decremented, since that box, which
must have previously been above the splitting plane, will no longer
be above it if
splitting is done at the point. Similarly, after calculating the split
cost, if the split candidate was at the start of a bounding box’s extent,
then the box will be on the below side for all subsequent splits. The
tests at the start and end of the loop body update the primitive counts for
these two cases.

`axis`to find best>>=

`i`th edge>>

`edgeT`>>

`belowSA` and `aboveSA` hold the surface areas of the two
candidate child bounds; they are easily computed by adding up the areas of
the six faces.

`edgeT`>>=

Given all of this information, the cost for a particular split can be computed.

`i`th edge>>=

`edgeT`>>

If the cost computed for this candidate split is the best one so far, the details of the split are recorded.

It may happen that there are no possible splits found in the previous tests
(Figure 4.16 illustrates a case where this may
happen). In this case, there isn’t a single candidate position at which to
split the node along the current axis. At this point, splitting is tried
for the other two axes in turn. If neither of them can find a split (when
`retries` is equal to 2), then there is no useful way to refine the
node, since both children will still have the same number of overlapping
primitives. When this condition occurs, all that can be done is to give up
and make a leaf node.

It is also possible that the best split will have a cost that is still higher
than the cost for not splitting the node at all. If it is substantially worse
and there aren’t too many primitives, a leaf node is made immediately.
Otherwise, `badRefines` keeps track of how many bad splits have been made
so far above the current node of the tree. It’s worth allowing a few slightly
poor refinements since later splits may be able to find better ones given
a smaller subset of primitives to consider.

Having chosen a split position, the bounding box edges can be used to
classify the primitives as being above, below, or on both sides of
the split in the same way as was done to keep track of `nBelow` and
`nAbove` in the earlier code. Note that the `bestOffset` entry in
the arrays is skipped in the loops below; this is necessary so that the
primitive whose bounding box edge was used for the split isn’t incorrectly
categorized as being on both sides of the split.

Recall that the node number of the “below” child of this node in the
kd-tree nodes array is the current node number plus one. After the
recursion has returned from that side of the tree, the `nextFreeNode`
offset is used for the “above” child. The only other important detail
here is that the `prims0` memory is passed directly for reuse by both
children, while the `prims1` pointer is advanced forward first. This
is necessary since the current invocation of `KdTreeAccel::buildTree()`
depends on its `prims1` values being preserved over the first recursive
call to `KdTreeAccel::buildTree()` in the following, since it must be
passed as a parameter to the second call. However, there is no
corresponding need to preserve the `edges` values or to preserve
`prims0` beyond its immediate use in the first recursive call.

Thus, much more space is needed for the `prims1` array of integers for
storing the worst-case possible number of overlapping primitive numbers
than for the `prims0` array, which only needs to handle the primitives
at a single level at a time.

### 4.4.3 Traversal

Figure 4.17 shows the basic process of ray traversal
through the tree. Intersecting the ray with the tree’s overall bounds gives
initial `tMin` and `tMax` values, marked with points in the
figure. As with the `BVHAccel` in this chapter, if the ray misses
the overall primitive bounds, this method can immediately return `false`.
Otherwise, it starts to descend into the tree, starting at the root. At
each interior node, it determines which of the two children the ray enters
first and processes both children in order. Traversal ends either when the
ray exits the tree or when the closest intersection is found.

`secondChild`in todo list>> node = firstChild; tMax = tPlane; }

The algorithm starts by finding the overall parametric range of the ray’s overlap with the tree, exiting immediately if there is no overlap.

The array of `KdToDo` structures is used to record the
nodes yet to be processed for the ray; it is ordered so that the last active
entry in the array is the next node that should be considered. The maximum
number of entries needed in this array is the maximum depth of the kd-tree; the
array size used in the following should be more than enough in practice.

The traversal continues through the nodes, processing a single leaf or
interior node each time through the loop. The values `tMin` and
`tMax` will always hold the parametric range for the ray’s overlap
with the current node.

`secondChild`in todo list>> node = firstChild; tMax = tPlane; }

An intersection may have been previously found in a primitive that overlaps
multiple nodes. If the intersection was outside the current node when first
detected, it is necessary to keep traversing the tree until we come to a node
where `tMin` is beyond the intersection. Only then is it certain that there is
no closer intersection with some other primitive.

For interior tree nodes the first thing to do is to intersect the ray with the node’s splitting plane; given the intersection point, we can determine if one or both of the children nodes need to be processed and in what order the ray passes through them.

`secondChild`in todo list>> node = firstChild; tMax = tPlane; }

The parametric distance to the split plane is computed in the same manner as
was done in computing the intersection of a ray and an axis-aligned plane for
the ray–bounding box test. We use the precomputed `invDir` value to
save a divide each time through the loop.

Now it is necessary to determine the order in which the ray encounters the children nodes so that the tree is traversed in front-to-back order along the ray. Figure 4.18 shows the geometry of this computation. The position of the ray’s origin with respect to the splitting plane is enough to distinguish between the two cases, ignoring for now the case where the ray doesn’t actually pass through one of the two nodes. The rare case when the ray’s origin lies on the splitting plane requires careful handling in this case, as its direction needs to be used instead to discriminate between the two cases.

It may not be necessary to process both children of this node. Figure 4.19 shows some configurations where the ray only passes through one of the children. The ray will never miss both children, since otherwise the current interior node should never have been visited.

The first `if` test in the following code corresponds to
Figure 4.19(a): only the near node needs to be processed if it can be shown that the
ray doesn’t overlap the far node because it faces away from it or
doesn’t overlap it because .
Figure 4.19(b)
shows the similar case tested in the second `if` test: the near node may
not need processing if the ray doesn’t overlap it. Otherwise, the `else`
clause handles the case of both children needing processing; the near node will
be processed next, and the far node goes on the `todo` list.

`secondChild`in todo list>> node = firstChild; tMax = tPlane; }

`secondChild`in todo list>>=

If the current node is a leaf, intersection tests are performed against the primitives in the leaf.

Processing an individual primitive is just a matter of passing the intersection request on to the primitive.

After doing the intersection tests at the leaf node, the next node to process
is loaded from the `todo` array. If no more nodes remain, then
the ray has passed through the tree without hitting anything.

Like the `BVHAccel`, the `KdTreeAccel` has a specialized
intersection method for shadow rays that is not shown here. It is similar
to the `Intersect()` method but calls the `Primitive::IntersectP()`
method and returns `true` as soon as it finds any intersection without
worrying about finding the closest one.