Kd-Tree Accelerator

4.4 Kd-Tree Accelerator

Binary space partitioning (BSP) trees adaptively subdivide space with planes. A BSP tree starts with a bounding box that encompasses the entire scene. If the number of primitives in the box is greater than some threshold, the box is split in half by a plane. Primitives are then associated with whichever half they overlap, and primitives that lie in both halves are associated with both of them. (This is in contrast to BVHs, where each primitive is assigned to only one of the two subgroups after a split.)

The splitting process continues recursively either until each leaf region in the resulting tree contains a sufficiently small number of primitives or until a maximum depth is reached. Because the splitting planes can be placed at arbitrary positions inside the overall bound and because different parts of 3D space can be refined to different degrees, BSP trees can easily handle uneven distributions of geometry.

Two variations of BSP trees are kd-trees and octrees. A kd-tree simply restricts the splitting plane to be perpendicular to one of the coordinate axes; this makes both traversal and construction of the tree more efficient, at the cost of some flexibility in how space is subdivided. The octree uses three axis-perpendicular planes to simultaneously split the box into eight regions at each step (typically by splitting down the center of the extent in each direction). In this section, we will implement a kd-tree for ray intersection acceleration in the KdTreeAccel class. Source code for this class can be found in the files accelerators/kdtreeaccel.h and accelerators/kdtreeaccel.cpp.

<<KdTreeAccel Declarations>>=

class KdTreeAccel : public Aggregate { public: <<KdTreeAccel Public Methods>>

KdTreeAccel(const std::vector<std::shared_ptr<Primitive>> &p, int isectCost = 80, int traversalCost = 1, Float emptyBonus = 0.5, int maxPrims = 1, int maxDepth = -1); Bounds3f WorldBound() const { return bounds; } ~KdTreeAccel(); bool Intersect(const Ray &ray, SurfaceInteraction *isect) const; bool IntersectP(const Ray &ray) const;

private: <<KdTreeAccel Private Methods>>

void buildTree(int nodeNum, const Bounds3f &bounds, const std::vector<Bounds3f> &primBounds, int *primNums, int nprims, int depth, const std::unique_ptr<BoundEdge[]> edges[3], int *prims0, int *prims1, int badRefines = 0);

<<KdTreeAccel Private Data>>

const int isectCost, traversalCost, maxPrims; const Float emptyBonus; std::vector<std::shared_ptr<Primitive>> primitives; std::vector<int> primitiveIndices; KdAccelNode *nodes; int nAllocedNodes, nextFreeNode; Bounds3f bounds;

};

In addition to the primitives to be stored, the KdTreeAccel constructor takes a few parameters that are used to guide the decisions that will be made as the tree is built; these parameters are stored in member variables (isectCost, traversalCost, maxPrims, maxDepth, and emptyBonus) for later use. See Figure 4.14 for an overview of how the tree is built.

Figure 4.14: The kd-tree is built by recursively splitting the bounding box of the scene geometry along one of the coordinate axes. Here, the first split is along the axis; it is placed so that the triangle is precisely alone in the right region and the rest of the primitives end up on the left. The left region is then refined a few more times with axis-aligned splitting planes. The details of the refinement criteria—which axis is used to split space at each step, at which position along the axis the plane is placed, and at what point refinement terminates—can all substantially affect the performance of the tree in practice.

<<KdTreeAccel Method Definitions>>=

KdTreeAccel::KdTreeAccel( const std::vector<std::shared_ptr<Primitive>> &p, int isectCost, int traversalCost, Float emptyBonus, int maxPrims, int maxDepth) : isectCost(isectCost), traversalCost(traversalCost), maxPrims(maxPrims), emptyBonus(emptyBonus), primitives(p) { <<Build kd-tree for accelerator>>

nextFreeNode = nAllocedNodes = 0; if (maxDepth <= 0) maxDepth = std::round(8 + 1.3f * Log2Int(primitives.size())); <<Compute bounds for kd-tree construction>>

std::vector<Bounds3f> primBounds; for (const std::shared_ptr<Primitive> &prim : primitives) { Bounds3f b = prim->WorldBound(); bounds = Union(bounds, b); primBounds.push_back(b); }

<<Allocate working memory for kd-tree construction>>

std::unique_ptr<BoundEdge[]> edges[3]; for (int i = 0; i < 3; ++i) edges[i].reset(new BoundEdge[2 * primitives.size()]); std::unique_ptr<int[]> prims0(new int[primitives.size()]); std::unique_ptr<int[]> prims1(new int[(maxDepth+1) * primitives.size()]);

<<Initialize primNums for kd-tree construction>>

std::unique_ptr<int[]> primNums(new int[primitives.size()]); for (size_t i = 0; i < primitives.size(); ++i) primNums[i] = i;

<<Start recursive construction of kd-tree>>

buildTree(0, bounds, primBounds, primNums.get(), primitives.size(), maxDepth, edges, prims0.get(), prims1.get());

}

<<KdTreeAccel Private Data>>=

const int isectCost, traversalCost, maxPrims; const Float emptyBonus; std::vector<std::shared_ptr<Primitive>> primitives;

4.4.1 Tree Representation

The kd-tree is a binary tree, where each interior node always has both children and where leaves of the tree store the primitives that overlap them. Each interior node must provide access to three pieces of information:

Split axis: which of the , , or axes was split at this node
Split position: the position of the splitting plane along the axis
Children: information about how to reach the two child nodes beneath it

Each leaf node needs to record only which primitives overlap it.

It is worth going through a bit of trouble to ensure that all interior nodes and many leaf nodes use just 8 bytes of memory (assuming 4-byte Floats) because doing so ensures that eight nodes will fit into a 64-byte cache line. Because there are often many nodes in the tree and because many nodes are generally accessed for each ray, minimizing the size of the node representation substantially improves cache performance. Our initial implementation used a 16-byte node representation; when we reduced the size to 8 bytes we obtained nearly a 20% speed increase.

Both leaves and interior nodes are represented by the following KdAccelNode structure. The comments after each union member indicate whether a particular field is used for interior nodes, leaf nodes, or both.

<<KdTreeAccel Local Declarations>>=

struct KdAccelNode { <<KdAccelNode Methods>>

void InitLeaf(int *primNums, int np, std::vector<int> *primitiveIndices); void InitInterior(int axis, int ac, Float s) { split = s; flags = axis; aboveChild |= (ac << 2); } Float SplitPos() const { return split; } int nPrimitives() const { return nPrims >> 2; } int SplitAxis() const { return flags & 3; } bool IsLeaf() const { return (flags & 3) == 3; } int AboveChild() const { return aboveChild >> 2; }

union { Float split; // Interior int onePrimitive; // Leaf int primitiveIndicesOffset; // Leaf }; union { int flags; // Both int nPrims; // Leaf int aboveChild; // Interior }; };

The two low-order bits of the KdAccelNode::flags variable are used to differentiate between interior nodes with , , and splits (where these bits hold the values 0, 1, and 2, respectively) and leaf nodes (where these bits hold the value 3). It is relatively easy to store leaf nodes in 8 bytes: since the low 2 bits of KdAccelNode::flags are used to indicate that this is a leaf, the upper 30 bits of KdAccelNode::nPrims are available to record how many primitives overlap it. Then, if just a single primitive overlaps a KdAccelNode leaf, an integer index into the KdTreeAccel::primitives array identifies the Primitive. If more than one primitive overlaps, then their indices are stored in a segment of KdTreeAccel::primitiveIndices. The offset to the first index for the leaf is stored in KdAccelNode::primitiveIndicesOffset and the indices for the rest directly follow.

<<KdTreeAccel Private Data>>+=

std::vector<int> primitiveIndices;

Leaf nodes are easy to initialize, though we have to be careful with the details since both flags and nPrims share the same storage; we need to be careful to not clobber data for one of them while initializing the other. Furthermore, the number of primitives must be shifted two bits to the left before being stored so that the low two bits of KdAccelNode::flags can both be set to 1 to indicate that this is a leaf node.

<<KdTreeAccel Method Definitions>>+=

void KdAccelNode::InitLeaf(int *primNums, int np, std::vector<int> *primitiveIndices) { flags = 3; nPrims |= (np << 2); <<Store primitive ids for leaf node>>

if (np == 0) onePrimitive = 0; else if (np == 1) onePrimitive = primNums[0]; else { primitiveIndicesOffset = primitiveIndices->size(); for (int i = 0; i < np; ++i) primitiveIndices->push_back(primNums[i]); }

}

For leaf nodes with zero or one overlapping primitives, no additional memory allocation is necessary thanks to the KdAccelNode::onePrimitive field. For the case where multiple primitives overlap, storage is allocated in the primitiveIndices array.

<<Store primitive ids for leaf node>>=

Getting interior nodes down to 8 bytes is also reasonably straightforward. A Float (which is 32 bits in size when Floats are defined to be floats) stores the position along the chosen split axis where the node splits space, and, as explained earlier, the lowest 2 bits of KdAccelNode::flags are used to record which axis the node was split along. All that is left is to store enough information to find the two children of the node as we’re traversing the tree.

Rather than storing two pointers or offsets, we lay the nodes out in a way that lets us only store one child pointer: all of the nodes are allocated in a single contiguous block of memory, and the child of an interior node that is responsible for space below the splitting plane is always stored in the array position immediately after its parent (this layout also improves cache performance, by keeping at least one child close to its parent in memory). The other child, representing space above the splitting plane, will end up somewhere else in the array; a single integer offset, KdAccelNode::aboveChild, stores its position in the nodes array. This representation is similar to the one used for BVH nodes in Section 4.3.4.

Given all those conventions, the code to initialize an interior node is straightforward. As in the InitLeaf() method, it’s important to assign the value to flags before setting aboveChild and to compute the logical OR of the shifted aboveChild value so as not to clobber the bits stored in flags.

<<KdAccelNode Methods>>=

void InitInterior(int axis, int ac, Float s) { split = s; flags = axis; aboveChild |= (ac << 2); }

Finally, we’ll provide a few methods to extract various values from the node, so that callers don’t have to be aware of the details of its representation in memory.

<<KdAccelNode Methods>>+=

Float SplitPos() const { return split; } int nPrimitives() const { return nPrims >> 2; } int SplitAxis() const { return flags & 3; } bool IsLeaf() const { return (flags & 3) == 3; } int AboveChild() const { return aboveChild >> 2; }

4.4.2 Tree Construction

The kd-tree is built with a recursive top-down algorithm. At each step, we have an axis-aligned region of space and a set of primitives that overlap that region. Either the region is split into two subregions and turned into an interior node or a leaf node is created with the overlapping primitives, terminating the recursion.

As mentioned in the discussion of KdAccelNodes, all tree nodes are stored in a contiguous array. KdTreeAccel::nextFreeNode records the next node in this array that is available, and KdTreeAccel::nAllocedNodes records the total number that have been allocated. By setting both of them to 0 and not allocating any nodes at start-up, the implementation here ensures that an allocation will be done immediately when the first node of the tree is initialized.

It is also necessary to determine a maximum tree depth if one wasn’t given to the constructor. Although the tree construction process will normally terminate naturally at a reasonable depth, it is important to cap the maximum depth so that the amount of memory used for the tree cannot grow without bound in pathological cases. We have found that the value gives a reasonable maximum depth for a variety of scenes.

<<Build kd-tree for accelerator>>=

nextFreeNode = nAllocedNodes = 0; if (maxDepth <= 0) maxDepth = std::round(8 + 1.3f * Log2Int(primitives.size())); <<Compute bounds for kd-tree construction>>

std::vector<Bounds3f> primBounds; for (const std::shared_ptr<Primitive> &prim : primitives) { Bounds3f b = prim->WorldBound(); bounds = Union(bounds, b); primBounds.push_back(b); }

<<Allocate working memory for kd-tree construction>>

<<Initialize primNums for kd-tree construction>>

std::unique_ptr<int[]> primNums(new int[primitives.size()]); for (size_t i = 0; i < primitives.size(); ++i) primNums[i] = i;

<<Start recursive construction of kd-tree>>

buildTree(0, bounds, primBounds, primNums.get(), primitives.size(), maxDepth, edges, prims0.get(), prims1.get());

<<KdTreeAccel Private Data>>+=

KdAccelNode *nodes; int nAllocedNodes, nextFreeNode;

Because the construction routine will be repeatedly using the bounding boxes of the primitives along the way, they are stored in a vector before tree construction starts so that the potentially slow Primitive::WorldBound() methods don’t need to be called repeatedly.

<<Compute bounds for kd-tree construction>>=

std::vector<Bounds3f> primBounds; for (const std::shared_ptr<Primitive> &prim : primitives) { Bounds3f b = prim->WorldBound(); bounds = Union(bounds, b); primBounds.push_back(b); }

<<KdTreeAccel Private Data>>+=

Bounds3f bounds;

One of the parameters to the tree construction routine is an array of primitive indices indicating which primitives overlap the current node. Because all primitives overlap the root node (when the recursion begins) we start with an array initialized with values from zero through primitives.size()-1.

<<Initialize primNums for kd-tree construction>>=

std::unique_ptr<int[]> primNums(new int[primitives.size()]); for (size_t i = 0; i < primitives.size(); ++i) primNums[i] = i;

KdTreeAccel::buildTree() is called for each tree node. It is responsible for deciding if the node should be an interior node or leaf and updating the data structures appropriately. The last three parameters, edges, prims0, and prims1, are pointers to data that is allocated in the <<Allocate working memory for kd-tree construction>> fragment, which will be defined and documented in a few pages.

<<Start recursive construction of kd-tree>>=

buildTree(0, bounds, primBounds, primNums.get(), primitives.size(), maxDepth, edges, prims0.get(), prims1.get());

The main parameters to KdTreeAccel::buildTree() are the offset into the array of KdAccelNodes to use for the node that it creates, nodeNum; the bounding box that gives the region of space that the node covers, nodeBounds; and the indices of primitives that overlap it, primNums. The remainder of the parameters will be described later, closer to where they are used.

<<KdTreeAccel Method Definitions>>+=

void KdTreeAccel::buildTree(int nodeNum, const Bounds3f &nodeBounds, const std::vector<Bounds3f> &allPrimBounds, int *primNums, int nPrimitives, int depth, const std::unique_ptr<BoundEdge[]> edges[3], int *prims0, int *prims1, int badRefines) { <<Get next free node from nodes array>>

if (nextFreeNode == nAllocedNodes) { int nNewAllocNodes = std::max(2 * nAllocedNodes, 512); KdAccelNode *n = AllocAligned<KdAccelNode>(nNewAllocNodes); if (nAllocedNodes > 0) { memcpy(n, nodes, nAllocedNodes * sizeof(KdAccelNode)); FreeAligned(nodes); } nodes = n; nAllocedNodes = nNewAllocNodes; } ++nextFreeNode;

<<Initialize leaf node if termination criteria met>>

if (nPrimitives <= maxPrims || depth == 0) { nodes[nodeNum].InitLeaf(primNums, nPrimitives, &primitiveIndices); return; }

<<Initialize interior node and continue recursion>>

<<Choose split axis position for interior node>>

int bestAxis = -1, bestOffset = -1; Float bestCost = Infinity; Float oldCost = isectCost * Float(nPrimitives); Float totalSA = nodeBounds.SurfaceArea(); Float invTotalSA = 1 / totalSA; Vector3f d = nodeBounds.pMax - nodeBounds.pMin; <<Choose which axis to split along>>

int axis = nodeBounds.MaximumExtent();

int retries = 0; retrySplit: <<Initialize edges for axis>>

for (int i = 0; i < nPrimitives; ++i) { int pn = primNums[i]; const Bounds3f &bounds = allPrimBounds[pn]; edges[axis][2 * i] = BoundEdge(bounds.pMin[axis], pn, true); edges[axis][2 * i + 1] = BoundEdge(bounds.pMax[axis], pn, false); } <<Sort edges for axis>>

std::sort(&edges[axis][0], &edges[axis][2*nPrimitives], [](const BoundEdge &e0, const BoundEdge &e1) -> bool { if (e0.t == e1.t) return (int)e0.type < (int)e1.type; else return e0.t < e1.t; });

<<Compute cost of all splits for axis to find best>>

int nBelow = 0, nAbove = nPrimitives; for (int i = 0; i < 2 * nPrimitives; ++i) { if (edges[axis][i].type == EdgeType::End) --nAbove; Float edgeT = edges[axis][i].t; if (edgeT > nodeBounds.pMin[axis] && edgeT < nodeBounds.pMax[axis]) { <<Compute cost for split at ith edge>>

<<Compute child surface areas for split at edgeT>>

int otherAxis0 = (axis + 1) % 3, otherAxis1 = (axis + 2) % 3; Float belowSA = 2 * (d[otherAxis0] * d[otherAxis1] + (edgeT - nodeBounds.pMin[axis]) * (d[otherAxis0] + d[otherAxis1])); Float aboveSA = 2 * (d[otherAxis0] * d[otherAxis1] + (nodeBounds.pMax[axis] - edgeT) * (d[otherAxis0] + d[otherAxis1]));

Float pBelow = belowSA * invTotalSA; Float pAbove = aboveSA * invTotalSA; Float eb = (nAbove == 0 || nBelow == 0) ? emptyBonus : 0; Float cost = traversalCost + isectCost * (1 - eb) * (pBelow * nBelow + pAbove * nAbove); <<Update best split if this is lowest cost so far>>

if (cost < bestCost) { bestCost = cost; bestAxis = axis; bestOffset = i; }

} if (edges[axis][i].type == EdgeType::Start) ++nBelow; }

<<Create leaf if no good splits were found>>

if (bestAxis == -1 && retries < 2) { ++retries; axis = (axis + 1) % 3; goto retrySplit; } if (bestCost > oldCost) ++badRefines; if ((bestCost > 4 * oldCost && nPrimitives < 16) || bestAxis == -1 || badRefines == 3) { nodes[nodeNum].InitLeaf(primNums, nPrimitives, &primitiveIndices); return; }

<<Classify primitives with respect to split>>

int n0 = 0, n1 = 0; for (int i = 0; i < bestOffset; ++i) if (edges[bestAxis][i].type == EdgeType::Start) prims0[n0++] = edges[bestAxis][i].primNum; for (int i = bestOffset + 1; i < 2 * nPrimitives; ++i) if (edges[bestAxis][i].type == EdgeType::End) prims1[n1++] = edges[bestAxis][i].primNum;

<<Recursively initialize children nodes>>

Float tSplit = edges[bestAxis][bestOffset].t; Bounds3f bounds0 = nodeBounds, bounds1 = nodeBounds; bounds0.pMax[bestAxis] = bounds1.pMin[bestAxis] = tSplit; buildTree(nodeNum + 1, bounds0, allPrimBounds, prims0, n0, depth - 1, edges, prims0, prims1 + nPrimitives, badRefines); int aboveChild = nextFreeNode; nodes[nodeNum].InitInterior(bestAxis, aboveChild, tSplit); buildTree(aboveChild, bounds1, allPrimBounds, prims1, n1, depth - 1, edges, prims0, prims1 + nPrimitives, badRefines);

}

If all of the allocated nodes have been used up, node memory is reallocated with twice as many entries and the old values are copied. The first time KdTreeAccel::buildTree() is called, KdTreeAccel::nAllocedNodes is 0 and an initial block of tree nodes is allocated.

<<Get next free node from nodes array>>=

A leaf node is created (stopping the recursion) either if there are a sufficiently small number of primitives in the region or if the maximum depth has been reached. The depth parameter starts out as the tree’s maximum depth and is decremented at each level.

<<Initialize leaf node if termination criteria met>>=

if (nPrimitives <= maxPrims || depth == 0) { nodes[nodeNum].InitLeaf(primNums, nPrimitives, &primitiveIndices); return; }

If this is an internal node, it is necessary to choose a splitting plane, classify the primitives with respect to that plane, and recurse.

<<Initialize interior node and continue recursion>>=

<<Choose split axis position for interior node>>

int axis = nodeBounds.MaximumExtent();

int retries = 0; retrySplit: <<Initialize edges for axis>>

std::sort(&edges[axis][0], &edges[axis][2*nPrimitives], [](const BoundEdge &e0, const BoundEdge &e1) -> bool { if (e0.t == e1.t) return (int)e0.type < (int)e1.type; else return e0.t < e1.t; });

<<Compute cost of all splits for axis to find best>>

<<Compute child surface areas for split at edgeT>>

if (cost < bestCost) { bestCost = cost; bestAxis = axis; bestOffset = i; }

} if (edges[axis][i].type == EdgeType::Start) ++nBelow; }

<<Create leaf if no good splits were found>>

<<Classify primitives with respect to split>>

<<Recursively initialize children nodes>>

Our implementation chooses a split using the SAH introduced in Section 4.3.2. The SAH is applicable to kd-trees as well as BVHs; here, the estimated cost is computed for a series of candidate splitting planes in the node, and the split that gives the lowest cost is chosen.

In the implementation here, the intersection cost and the traversal cost can be set by the user; their default values are 80 and 1, respectively. Ultimately, it is the ratio of these two values that determines the behavior of the tree-building algorithm. The greater ratio between these values compared to the values used for BVH construction reflects the fact that visiting a kd-tree node is less expensive than visiting a BVH node.

One modification to the SAH used for BVH trees is that for kd-trees it is worth giving a slight preference to choosing splits where one of the children has no primitives overlapping it, since rays passing through these regions can immediately advance to the next kd-tree node without any ray–primitive intersection tests. Thus, the revised costs for unsplit and split regions are, respectively,

where is a “bonus” value that is zero unless one of the two regions is completely empty, in which case it takes on a value between 0 and 1.

Given a way to compute the probabilities for the cost model, the only problem to address is how to generate candidate splitting positions and how to efficiently compute the cost for each candidate. It can be shown that the minimum cost with this model will be attained at a split that is coincident with one of the faces of one of the primitive’s bounding boxes—there’s no need to consider splits at intermediate positions. (To convince yourself of this, consider the behavior of the cost function between the edges of the faces.) Here, we will consider all bounding box faces inside the region for one or more of the three coordinate axes.

The cost for checking all of these candidates thus can be kept relatively low with a carefully structured algorithm. To compute these costs, we will sweep across the projections of the bounding boxes onto each axis and keep track of which gives the lowest cost (Figure 4.15). Each bounding box has two edges on each axis, each of which is represented by an instance of the BoundEdge structure. This structure records the position of the edge along the axis, whether it represents the start or end of a bounding box (going from low to high along the axis), and which primitive it is associated with.

Figure 4.15: Given an axis along which we’d like to consider possible splits, the primitives’ bounding boxes are projected onto the axis, which leads to an efficient algorithm to track how many primitives would be on each side of a particular splitting plane. Here, for example, a split at would leave completely below the splitting plane, straddling it, and completely above it. Each point on the axis, , , , , , and , is represented by an instance of the BoundEdge structure.

<<KdTreeAccel Local Declarations>>+=

enum class EdgeType { Start, End };

<<KdTreeAccel Local Declarations>>+=

struct BoundEdge { <<BoundEdge Public Methods>>

BoundEdge() { } BoundEdge(Float t, int primNum, bool starting) : t(t), primNum(primNum) { type = starting ? EdgeType::Start : EdgeType::End; }

Float t; int primNum; EdgeType type; };

<<BoundEdge Public Methods>>=

BoundEdge(Float t, int primNum, bool starting) : t(t), primNum(primNum) { type = starting ? EdgeType::Start : EdgeType::End; }

At most, 2 * primitives.size() BoundEdges are needed for computing costs for any tree node, so the memory for the edges for all three axes is allocated once and then reused for each node that is created.

<<Allocate working memory for kd-tree construction>>=

std::unique_ptr<BoundEdge[]> edges[3]; for (int i = 0; i < 3; ++i) edges[i].reset(new BoundEdge[2 * primitives.size()]);

After determining the estimated cost for creating a leaf, KdTreeAccel::buildTree() chooses an axis to try to split along and computes the cost function for each candidate split. bestAxis and bestOffset record the axis and bounding box edge index that have given the lowest cost so far, bestCost. invTotalSA is initialized to the reciprocal of the node’s surface area; its value will be used when computing the probabilities of rays passing through each of the candidate children nodes.

<<Choose split axis position for interior node>>=

int axis = nodeBounds.MaximumExtent();

int retries = 0; retrySplit: <<Initialize edges for axis>>

std::sort(&edges[axis][0], &edges[axis][2*nPrimitives], [](const BoundEdge &e0, const BoundEdge &e1) -> bool { if (e0.t == e1.t) return (int)e0.type < (int)e1.type; else return e0.t < e1.t; });

<<Compute cost of all splits for axis to find best>>

<<Compute child surface areas for split at edgeT>>

if (cost < bestCost) { bestCost = cost; bestAxis = axis; bestOffset = i; }

} if (edges[axis][i].type == EdgeType::Start) ++nBelow; }

This method first tries to find a split along the axis with the largest spatial extent; if successful, this choice helps to give regions of space that tend toward being square in shape. This is an intuitively sensible approach. Later, if it was unsuccessful in finding a good split along this axis, it will go back and try the others in turn.

<<Choose which axis to split along>>=

int axis = nodeBounds.MaximumExtent();

First the edges array for the axis is initialized using the bounding boxes of the overlapping primitives. The array is then sorted from low to high along the axis so that it can sweep over the box edges from first to last.

<<Initialize edges for axis>>=

std::sort(&edges[axis][0], &edges[axis][2*nPrimitives], [](const BoundEdge &e0, const BoundEdge &e1) -> bool { if (e0.t == e1.t) return (int)e0.type < (int)e1.type; else return e0.t < e1.t; });

The C++ standard library routine sort() requires that the structure being sorted define an ordering; this is done using the BoundEdge::t values. However, one subtlety is that if the BoundEdge::t values match, it is necessary to try to break the tie by comparing the node’s types; this is necessary since sort() depends on the fact that the only time a < b and b < a are both false is when a == b.

<<Sort edges for axis>>=

std::sort(&edges[axis][0], &edges[axis][2*nPrimitives], [](const BoundEdge &e0, const BoundEdge &e1) -> bool { if (e0.t == e1.t) return (int)e0.type < (int)e1.type; else return e0.t < e1.t; });

Given the sorted array of edges, we’d like to quickly compute the cost function for a split at each one of them. The probabilities for a ray passing through each child node are easily computed using their surface areas, and the number of primitives on each side of the split is tracked by the variables nBelow and nAbove. We would like to keep their values updated such that if we chose to split at edgeT for a particular pass through the loop, nBelow will give the number of primitives that would end up below the splitting plane and nAbove would give the number above it.

At the first edge, all primitives must be above that edge by definition, so nAbove is initialized to nPrimitives and nBelow is set to 0. When the loop is considering a split at the end of a bounding box’s extent, nAbove needs to be decremented, since that box, which must have previously been above the splitting plane, will no longer be above it if splitting is done at the point. Similarly, after calculating the split cost, if the split candidate was at the start of a bounding box’s extent, then the box will be on the below side for all subsequent splits. The tests at the start and end of the loop body update the primitive counts for these two cases.

<<Compute cost of all splits for axis to find best>>=

<<Compute child surface areas for split at edgeT>>

if (cost < bestCost) { bestCost = cost; bestAxis = axis; bestOffset = i; }

} if (edges[axis][i].type == EdgeType::Start) ++nBelow; }

belowSA and aboveSA hold the surface areas of the two candidate child bounds; they are easily computed by adding up the areas of the six faces.

<<Compute child surface areas for split at edgeT>>=

Given all of this information, the cost for a particular split can be computed.

<<Compute cost for split at ith edge>>=

<<Compute child surface areas for split at edgeT>>

if (cost < bestCost) { bestCost = cost; bestAxis = axis; bestOffset = i; }

If the cost computed for this candidate split is the best one so far, the details of the split are recorded.

<<Update best split if this is lowest cost so far>>=

if (cost < bestCost) { bestCost = cost; bestAxis = axis; bestOffset = i; }

It may happen that there are no possible splits found in the previous tests (Figure 4.16 illustrates a case where this may happen). In this case, there isn’t a single candidate position at which to split the node along the current axis. At this point, splitting is tried for the other two axes in turn. If neither of them can find a split (when retries is equal to 2), then there is no useful way to refine the node, since both children will still have the same number of overlapping primitives. When this condition occurs, all that can be done is to give up and make a leaf node.

Figure 4.16: If multiple bounding boxes (dotted lines) overlap a kd-tree node (solid lines) as shown here, there is no possible split position that can result in fewer than all of the primitives being on both sides of it.

It is also possible that the best split will have a cost that is still higher than the cost for not splitting the node at all. If it is substantially worse and there aren’t too many primitives, a leaf node is made immediately. Otherwise, badRefines keeps track of how many bad splits have been made so far above the current node of the tree. It’s worth allowing a few slightly poor refinements since later splits may be able to find better ones given a smaller subset of primitives to consider.

<<Create leaf if no good splits were found>>=

Having chosen a split position, the bounding box edges can be used to classify the primitives as being above, below, or on both sides of the split in the same way as was done to keep track of nBelow and nAbove in the earlier code. Note that the bestOffset entry in the arrays is skipped in the loops below; this is necessary so that the primitive whose bounding box edge was used for the split isn’t incorrectly categorized as being on both sides of the split.

<<Classify primitives with respect to split>>=

Recall that the node number of the “below” child of this node in the kd-tree nodes array is the current node number plus one. After the recursion has returned from that side of the tree, the nextFreeNode offset is used for the “above” child. The only other important detail here is that the prims0 memory is passed directly for reuse by both children, while the prims1 pointer is advanced forward first. This is necessary since the current invocation of KdTreeAccel::buildTree() depends on its prims1 values being preserved over the first recursive call to KdTreeAccel::buildTree() in the following, since it must be passed as a parameter to the second call. However, there is no corresponding need to preserve the edges values or to preserve prims0 beyond its immediate use in the first recursive call.

<<Recursively initialize children nodes>>=

Thus, much more space is needed for the prims1 array of integers for storing the worst-case possible number of overlapping primitive numbers than for the prims0 array, which only needs to handle the primitives at a single level at a time.

<<Allocate working memory for kd-tree construction>>+=

std::unique_ptr<int[]> prims0(new int[primitives.size()]); std::unique_ptr<int[]> prims1(new int[(maxDepth+1) * primitives.size()]);

4.4.3 Traversal

Figure 4.17 shows the basic process of ray traversal through the tree. Intersecting the ray with the tree’s overall bounds gives initial tMin and tMax values, marked with points in the figure. As with the BVHAccel in this chapter, if the ray misses the overall primitive bounds, this method can immediately return false. Otherwise, it starts to descend into the tree, starting at the root. At each interior node, it determines which of the two children the ray enters first and processes both children in order. Traversal ends either when the ray exits the tree or when the closest intersection is found.

Figure 4.17: Traversal of a Ray through the kd-Tree. (a) The ray is intersected with the bounds of the tree, giving an initial parametric range to consider. (b) Because this range is nonempty, it is necessary to consider the two children of the root node here. The ray first enters the child on the right, labeled “near,” where it has a parametric range . If the near node is a leaf with primitives in it, ray–primitive intersection tests are performed; otherwise, its children nodes are processed. (c) If no hit is found in the node, or if a hit is found beyond , then the far node, on the left, is processed. (d) This sequence continues—processing tree nodes in a depth-first, front-to-back traversal—until the closest intersection is found or the ray exits the tree.

<<KdTreeAccel Method Definitions>>+=

bool KdTreeAccel::Intersect(const Ray &ray, SurfaceInteraction *isect) const { <<Compute initial parametric range of ray inside kd-tree extent>>

Float tMin, tMax; if (!bounds.IntersectP(ray, &tMin, &tMax)) return false;

<<Prepare to traverse kd-tree for ray>>

Vector3f invDir(1 / ray.d.x, 1 / ray.d.y, 1 / ray.d.z); constexpr int maxTodo = 64; KdToDo todo[maxTodo]; int todoPos = 0;

<<Traverse kd-tree nodes in order for ray>>

bool hit = false; const KdAccelNode *node = &nodes[0]; while (node != nullptr) { <<Bail out if we found a hit closer than the current node>>

if (ray.tMax < tMin) break;

if (!node->IsLeaf()) { <<Process kd-tree interior node>>

<<Compute parametric distance along ray to split plane>>

int axis = node->SplitAxis(); Float tPlane = (node->SplitPos() - ray.o[axis]) * invDir[axis];

<<Get node children pointers for ray>>

const KdAccelNode *firstChild, *secondChild; int belowFirst = (ray.o[axis] < node->SplitPos()) || (ray.o[axis] == node->SplitPos() && ray.d[axis] <= 0); if (belowFirst) { firstChild = node + 1; secondChild = &nodes[node->AboveChild()]; } else { firstChild = &nodes[node->AboveChild()]; secondChild = node + 1; }

<<Advance to next child node, possibly enqueue other child>>

if (tPlane > tMax || tPlane <= 0) node = firstChild; else if (tPlane < tMin) node = secondChild; else { <<Enqueue secondChild in todo list>>

todo[todoPos].node = secondChild; todo[todoPos].tMin = tPlane; todo[todoPos].tMax = tMax; ++todoPos;

node = firstChild; tMax = tPlane; }

} else { <<Check for intersections inside leaf node>>

int nPrimitives = node->nPrimitives(); if (nPrimitives == 1) { const std::shared_ptr<Primitive> &p = primitives[node->onePrimitive]; <<Check one primitive inside leaf node>>

if (p->Intersect(ray, isect)) hit = true;

} else { for (int i = 0; i < nPrimitives; ++i) { int index = primitiveIndices[node->primitiveIndicesOffset + i]; const std::shared_ptr<Primitive> &p = primitives[index]; <<Check one primitive inside leaf node>>

if (p->Intersect(ray, isect)) hit = true;

} }

<<Grab next node to process from todo list>>

if (todoPos > 0) { --todoPos; node = todo[todoPos].node; tMin = todo[todoPos].tMin; tMax = todo[todoPos].tMax; } else break;

} } return hit;

}

The algorithm starts by finding the overall parametric range of the ray’s overlap with the tree, exiting immediately if there is no overlap.

<<Compute initial parametric range of ray inside kd-tree extent>>=

Float tMin, tMax; if (!bounds.IntersectP(ray, &tMin, &tMax)) return false;

The array of KdToDo structures is used to record the nodes yet to be processed for the ray; it is ordered so that the last active entry in the array is the next node that should be considered. The maximum number of entries needed in this array is the maximum depth of the kd-tree; the array size used in the following should be more than enough in practice.

<<Prepare to traverse kd-tree for ray>>=

Vector3f invDir(1 / ray.d.x, 1 / ray.d.y, 1 / ray.d.z); constexpr int maxTodo = 64; KdToDo todo[maxTodo]; int todoPos = 0;

<<KdTreeAccel Declarations>>+=

struct KdToDo { const KdAccelNode *node; Float tMin, tMax; };

The traversal continues through the nodes, processing a single leaf or interior node each time through the loop. The values tMin and tMax will always hold the parametric range for the ray’s overlap with the current node.

<<Traverse kd-tree nodes in order for ray>>=

bool hit = false; const KdAccelNode *node = &nodes[0]; while (node != nullptr) { <<Bail out if we found a hit closer than the current node>>

if (ray.tMax < tMin) break;

if (!node->IsLeaf()) { <<Process kd-tree interior node>>

<<Compute parametric distance along ray to split plane>>

int axis = node->SplitAxis(); Float tPlane = (node->SplitPos() - ray.o[axis]) * invDir[axis];

<<Get node children pointers for ray>>

<<Advance to next child node, possibly enqueue other child>>

if (tPlane > tMax || tPlane <= 0) node = firstChild; else if (tPlane < tMin) node = secondChild; else { <<Enqueue secondChild in todo list>>

todo[todoPos].node = secondChild; todo[todoPos].tMin = tPlane; todo[todoPos].tMax = tMax; ++todoPos;

node = firstChild; tMax = tPlane; }

} else { <<Check for intersections inside leaf node>>

int nPrimitives = node->nPrimitives(); if (nPrimitives == 1) { const std::shared_ptr<Primitive> &p = primitives[node->onePrimitive]; <<Check one primitive inside leaf node>>

if (p->Intersect(ray, isect)) hit = true;

} }

<<Grab next node to process from todo list>>

if (todoPos > 0) { --todoPos; node = todo[todoPos].node; tMin = todo[todoPos].tMin; tMax = todo[todoPos].tMax; } else break;

} } return hit;

An intersection may have been previously found in a primitive that overlaps multiple nodes. If the intersection was outside the current node when first detected, it is necessary to keep traversing the tree until we come to a node where tMin is beyond the intersection. Only then is it certain that there is no closer intersection with some other primitive.

<<Bail out if we found a hit closer than the current node>>=

if (ray.tMax < tMin) break;

For interior tree nodes the first thing to do is to intersect the ray with the node’s splitting plane; given the intersection point, we can determine if one or both of the children nodes need to be processed and in what order the ray passes through them.

<<Process kd-tree interior node>>=

<<Compute parametric distance along ray to split plane>>

int axis = node->SplitAxis(); Float tPlane = (node->SplitPos() - ray.o[axis]) * invDir[axis];

<<Get node children pointers for ray>>

<<Advance to next child node, possibly enqueue other child>>

if (tPlane > tMax || tPlane <= 0) node = firstChild; else if (tPlane < tMin) node = secondChild; else { <<Enqueue secondChild in todo list>>

todo[todoPos].node = secondChild; todo[todoPos].tMin = tPlane; todo[todoPos].tMax = tMax; ++todoPos;

node = firstChild; tMax = tPlane; }

The parametric distance to the split plane is computed in the same manner as was done in computing the intersection of a ray and an axis-aligned plane for the ray–bounding box test. We use the precomputed invDir value to save a divide each time through the loop.

<<Compute parametric distance along ray to split plane>>=

int axis = node->SplitAxis(); Float tPlane = (node->SplitPos() - ray.o[axis]) * invDir[axis];

Now it is necessary to determine the order in which the ray encounters the children nodes so that the tree is traversed in front-to-back order along the ray. Figure 4.18 shows the geometry of this computation. The position of the ray’s origin with respect to the splitting plane is enough to distinguish between the two cases, ignoring for now the case where the ray doesn’t actually pass through one of the two nodes. The rare case when the ray’s origin lies on the splitting plane requires careful handling in this case, as its direction needs to be used instead to discriminate between the two cases.

Figure 4.18: The position of the origin of the ray with respect to the splitting plane can be used to determine which of the node’s children should be processed first. If the origin of a ray like is on the “below” side of the splitting plane, we should process the below child before the above child, and vice versa.

<<Get node children pointers for ray>>=

It may not be necessary to process both children of this node. Figure 4.19 shows some configurations where the ray only passes through one of the children. The ray will never miss both children, since otherwise the current interior node should never have been visited.

Figure 4.19: Two cases where both children of a node don’t need to be processed because the ray doesn’t overlap them. (a) The top ray intersects the splitting plane beyond the ray’s position and thus doesn’t enter the far child. The bottom ray is facing away from the splitting plane, indicated by a negative value. (b) The ray intersects the plane before the ray’s value, indicating that the near child doesn’t need processing.

The first if test in the following code corresponds to Figure 4.19(a): only the near node needs to be processed if it can be shown that the ray doesn’t overlap the far node because it faces away from it or doesn’t overlap it because . Figure 4.19(b) shows the similar case tested in the second if test: the near node may not need processing if the ray doesn’t overlap it. Otherwise, the else clause handles the case of both children needing processing; the near node will be processed next, and the far node goes on the todo list.

<<Advance to next child node, possibly enqueue other child>>=

if (tPlane > tMax || tPlane <= 0) node = firstChild; else if (tPlane < tMin) node = secondChild; else { <<Enqueue secondChild in todo list>>

todo[todoPos].node = secondChild; todo[todoPos].tMin = tPlane; todo[todoPos].tMax = tMax; ++todoPos;

node = firstChild; tMax = tPlane; }

<<Enqueue secondChild in todo list>>=

todo[todoPos].node = secondChild; todo[todoPos].tMin = tPlane; todo[todoPos].tMax = tMax; ++todoPos;

If the current node is a leaf, intersection tests are performed against the primitives in the leaf.

<<Check for intersections inside leaf node>>=

int nPrimitives = node->nPrimitives(); if (nPrimitives == 1) { const std::shared_ptr<Primitive> &p = primitives[node->onePrimitive]; <<Check one primitive inside leaf node>>

if (p->Intersect(ray, isect)) hit = true;

} }

Processing an individual primitive is just a matter of passing the intersection request on to the primitive.

<<Check one primitive inside leaf node>>=

if (p->Intersect(ray, isect)) hit = true;

After doing the intersection tests at the leaf node, the next node to process is loaded from the todo array. If no more nodes remain, then the ray has passed through the tree without hitting anything.

<<Grab next node to process from todo list>>=

if (todoPos > 0) { --todoPos; node = todo[todoPos].node; tMin = todo[todoPos].tMin; tMax = todo[todoPos].tMax; } else break;

Like the BVHAccel, the KdTreeAccel has a specialized intersection method for shadow rays that is not shown here. It is similar to the Intersect() method but calls the Primitive::IntersectP() method and returns true as soon as it finds any intersection without worrying about finding the closest one.

<<KdTreeAccel Public Methods>>=

bool IntersectP(const Ray &ray) const;