## Further Reading

*Hacker’s Delight* (Warren 2006) is a delightful
and thought-provoking exploration of bit-twiddling algorithms like
those used in some of the utility routines in this appendix. Sean Anderson
(2004) has a Web page filled with a collection of
bit-twiddling techniques like the ones in `IsPowerOf2()` and
`RoundUpPow2()` at graphics.stanford.edu/~seander/bithacks.html.

The MurmurHash hashing function that is wrapped by `pbrt`’s `Hash()` and
`HashBuffer()` functions is due to Appleby (2011) and
the implementation of `MixBits()` is due to Stafford
(2011), who found the various constant values used in
the implementation via search.

The inverse bilinear interpolation function implemented in
`InvertBilinear()` is due to Quilez (2010) and
`SinXOverX()` is thanks to Hatch (2003).

The algorithm implemented in `TwoSum()` is due to
Møller (1965) and Knuth (1969), and the
FMA-based `TwoProd()` was developed by Ogita et al. (2005).
The approach used in the `CompensatedSum` class is due to Kahan
(1965). The approach used in `DifferenceOfProducts()`
is also attributed to Kahan; its error was analyzed by Jeannerod et
al. (2013).

Welford (1962) developed the algorithm that is implemented
in the `VarianceEstimator` class. Its `Merge()` method is based
on an algorithm developed by Chan et al. (1979).

Atkinson’s book (1993) on numerical analysis discusses algorithms for matrix inversion and solving linear systems. See Moore’s book (1966) for an introduction to interval arithmetic.

Farin’s book (2001) is a good introduction to splines. The blossoming approach was introduced by Ramshaw (1987); his report remains a readable introduction to the topic. A subsequent publication drew further connections to polar forms and related work (Ramshaw 1989).

The PCG random number generator was developed by O’Neill (2014). The paper describing its implementation is well written and also features extensive discussion of a range of previous pseudo-random number generators and the challenges that they have faced in passing rigorous tests of their quality (L’Ecuyer and Simard 2007).

The article “UTF-8 Everywhere” by Radzivilovsky et
al. (2012) is a good introduction to Unicode
and also makes a strong case for adopting the UTF-8 representation. `pbrt` follows the approach they propose for interoperating with Windows’s
UTF-16-based APIs. At over 1,000 pages, the length of the official Unicode
specification gives some sense of the complexities in representing
multi-lingual text (Unicode Consortium 2020).

Gamma correction has a long history in computer graphics. Poynton (2002a, 2002b) has written comprehensive FAQs on issues related to color representation and gamma correction. The sRGB encoding was described by the International Electrotechnical Commission (1999). See Gritz and d’Eon (2008) for a detailed discussion of the implications of gamma correction for rendering and how to correctly account for it in rendering systems.

McKenney’s book on parallel programming is wonderfully written and has comprehensive coverage of the underlying issues, as well as many useful techniques for high-performance parallel programming on CPUs (2021). Drepper’s paper (2007) is a useful resource for understanding performance issues related to caches, cache coherence, and main memory access, particularly in multicore systems.

Boehm’s paper “Threads Cannot Be Implemented as a Library” (2005) makes the remarkable (and disconcerting) observation that multi-threading cannot be reliably implemented without the compiler having explicit knowledge of the fact that multi-threaded execution is expected. Boehm presented a number of examples that demonstrate the corresponding dangers in 2005-era compilers and language standards like C and C++ that did not have awareness of threading. Fortunately, the C++11 and C11 standards addressed the issues that he identified.

`pbrt`’s parallel `for` loop–based approach to multi-threading is a
widely used technique for multi-threaded programming; the OpenMP standard
supports a similar construct (and much more) (OpenMP Architecture Review
Board 2013). A slightly more general model for multi-core
parallelism is available from *task systems*, where computations are
broken up into a set of independent tasks that can be executed
concurrently. That model is supported through `RunAsync()`.
Blumofe et al. (1996) described the task
scheduler in Cilk, and Blumofe and Leiserson (1999)
described the work-stealing algorithm that is the mainstay of many current
high-performance task systems.

### References

- Anderson, S. 2004. Bit twiddling hacks. graphics.stanford.edu/ seander/bithacks.html.
- Appleby, A. 2011. MurmurHash3. https://sites.google.com/site/murmurhash/.
- Atkinson, K. 1993.
*Elementary Numerical Analysis.*New York: John Wiley & Sons. - Blumofe, R., and C. Leiserson. 1999.
Scheduling multithreaded computations by work stealing.
*Journal of the ACM**46*(5), 720–48. - Blumofe, R., C. Joerg, B. Kuszmaul, C. Leiserson, K. Randall, and
Y. Zhou. 1996.
Cilk: An efficient multithreaded runtime system.
*Journal of Parallel and Distributed Computing**37*(1), 55–69. - Boehm, H.-J. 2005.
Threads cannot be implemented as a library.
*ACM SIGPLAN Notices**40*(6), 261–68. - Chan, T. F., G. Golub, R. J. LeVeque. 1979.
Updating formulae and a pairwise algorithm for computing sample variances.
*Technical Report STAN-CS-79-773*, Department of Computer Science, Stanford University. - Drepper, U. 2007. What every programmer should know about memory. people.redhat.com/drepper/cpumemory.pdf.
- Farin, G. 2001.
*Curves and Surfaces for CAGD: A Practical Guide*(5th ed.). San Francisco: Morgan Kaufmann. - Gritz, L., and E. d’Eon. 2008.
The importance of being linear.
In H. Nguyen (ed.),
*GPU Gems 3*, 529–42. Boston, Massachusetts: Addison-Wesley. - Guthe, S., and P. Heckbert 2005.
Non-power-of-two Mipmap creation.
*NVIDIA Technical Report*. - Hatch, D. 2003. The right way to calculate stuff. http://www.plunk.org/ hatch/rightway.html.
- International Electrotechnical Commission (IEC). 1999. Multimedia systems and equipment—Colour measurement and management—Part 2-1: Colour management—Default RGB colour space—sRGB. IEC Standard 61966-2-1.
- Jeannerod, C.-P., N. Louvet, and J.-M. Muller. 2013.
Further analysis of Kahan’s algorithm for the accurate computation of determinants.
*Mathematics of Computation**82*(284), 2245–64. - Kahan, W. 1965.
Further remarks on reducing truncation errors.
*Communications of the ACM**8*(1), 40. - Kainz, F., R. Bogart, and D. Hess. 2004.
The OpenEXR File Format.
In R. Fernando (ed.),
*GPU Gems*, 425–44. Reading, Massachusetts: Addison-Wesley. - Knuth, D. E. 1969.
*The Art of Computer Programming: Seminumerical Algorithms*. Reading, Massachusetts: Addison-Wesley. - L’Ecuyer, P., and R. Simard. 2007.
TestU01: A C library for empirical testing of random number
generators.
*ACM Transactions on Mathematical Software**33*(4), 22:1–40. - Møller, O. 1965.
Quasi double precision in floating-point arithmetic.
*BIT Numerical Mathematics**5*, 37–50. - McKenney, P. E. 2021.
*Is Parallel Programming Hard, and, If So, What Can You Do About It?*https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html. - Moore, R. E. 1966.
*Interval Analysis*. Englewood Cliffs, New Jersey: Prentice Hall. - O’Neill, M. 2014. PCG: A family of simple fast space-efficient statistically good algorithms for random number generation. Unpublished manuscript. http://www.pcg-random.org/paper.html.
- Ogita, T., S. M. Rump, and S. Oishi. 2005.
Accurate sum and dot product.
*SIAM Journal on Scientific Computing**26*(6), 1955–88. - OpenMP Architecture Review Board. 2013. OpenMP Application Program Interface. http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf.
- Poynton, C. 2002a. Frequently-asked questions about color. www.poynton.com/ColorFAQ.html.
- Poynton, C. 2002b. Frequently-asked questions about gamma. www.poynton.com/GammaFAQ.html.
- Quilez, I. 2010. Inverse bilinear interpolation. https://www.iquilezles.org/www/articles/ibilinear/ibilinear.htm.
- Radzivilovsky, P., Y. Galka, and S. Novgorodov. 2012. UTF-8 everywhere. http://utf8everywhere.org.
- Ramshaw, L. 1987.
Blossoming: A connect-the-dots approach to splines.
*Digital Systems Research Center Technical Report*. - Ramshaw, R. 1989.
Blossoms are polar forms.
*Computer Aided Geometric Design**6*(4), 323–58. - Stafford, D. 2011. Better bit mixing—improving on MurmurHash3’s 64-bit finalizer. http://zimbry.blogspot.com/2011/09/better-bit-mixing-improving-on.html.
- Unicode Consortium. 2020. The Unicode Standard: Version 13.0. https://www.unicode.org/versions/Unicode13.0.0/UnicodeStandard-13.0.pdf.
- Warren, H. 2006.
*Hacker’s Delight*. Reading, Massachusetts: Addison-Wesley. - Welford, B. P. 1962.
Note on a method for calculating corrected sums of squares and products.
*Technometrics**4*(3), 419–20.