6.2 Projective Camera Models

One of the fundamental issues in 3D computer graphics is the 3D viewing problem: how to project a 3D scene onto a 2D image for display. Most of the classic approaches can be expressed by a 4 × 4 projective transformation matrix. Therefore, we will introduce a projection matrix camera class, ProjectiveCamera, and then define two camera models based on it. The first implements an orthographic projection, and the other implements a perspective projection—two classic and widely used projections.

<<Camera Declarations>>+= 
class ProjectiveCamera : public Camera {
  public:
    <<ProjectiveCamera Public Methods>> 
  protected:
    <<ProjectiveCamera Protected Data>> 
};

Three more coordinate systems (summarized in Figure 6.1) are useful for defining and discussing projective cameras:

  • Screen space: Screen space is defined on the film plane. The camera projects objects in camera space onto the film plane; the parts inside the screen window are visible in the image that is generated. Depth z values in screen space range from 0 to 1, corresponding to points at the near and far clipping planes, respectively. Note that, although this is called “screen” space, it is still a 3D coordinate system, since z values are meaningful.
  • Normalized device coordinate (NDC) space: This is the coordinate system for the actual image being rendered. In x and y, this space ranges from (0, 0) to (1, 1), with (0, 0) being the upper-left corner of the image. Depth values are the same as in screen space, and a linear transformation converts from screen to NDC space.
  • Raster space: This is almost the same as NDC space, except the x and y coordinates range from (0, 0) to (resolution.x, resolution.y).

Projective cameras use 4 × 4 matrices to transform among all of these spaces, but cameras with unusual imaging characteristics can’t necessarily represent all of these transformations with matrices.
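
As a small illustration of the relationship between these spaces, going from raster space to NDC space is just a scale by the reciprocal of the image resolution. The following sketch is not part of pbrt; the 1280 × 720 resolution and the RasterToNDC name are arbitrary assumptions for illustration.

    // Illustrative sketch only: raster space to NDC space is a scale by the
    // reciprocal of the film resolution (here assumed to be 1280 x 720).
    Transform RasterToNDC = Scale(1.f / 1280, 1.f / 720, 1);
    Point3f pRaster(640, 360, 0);         // center of the image, in raster space
    Point3f pNDC = RasterToNDC(pRaster);  // (0.5, 0.5, 0); depth is unchanged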

Figure 6.1: Several camera-related coordinate spaces are commonly used to simplify the implementation of Cameras. The camera class holds transformations between them. Scene objects in world space are viewed by the camera, which sits at the origin of camera space and points along the +z axis. Objects between the near and far planes are projected onto the film plane at z = near in camera space. The film plane is at z = 0 in raster space, where x and y range from (0, 0) to (resolution.x, resolution.y). Normalized device coordinate (NDC) space normalizes raster space so that x and y range from (0, 0) to (1, 1).

In addition to the parameters required by the Camera base class, the ProjectiveCamera takes the projective transformation matrix, the screen space extent of the image, and additional parameters related to depth of field. Depth of field, which will be described and implemented at the end of this section, simulates the blurriness of out-of-focus objects that occurs in real lens systems.

<<ProjectiveCamera Public Methods>>= 
ProjectiveCamera(const AnimatedTransform &CameraToWorld, const Transform &CameraToScreen, const Bounds2f &screenWindow, Float shutterOpen, Float shutterClose, Float lensr, Float focald, Film *film, const Medium *medium) : Camera(CameraToWorld, shutterOpen, shutterClose, film, medium), CameraToScreen(CameraToScreen) { <<Initialize depth of field parameters>> 
lensRadius = lensr; focalDistance = focald;
<<Compute projective camera transformations>> 
<<Compute projective camera screen transformations>> 
ScreenToRaster = Scale(film->fullResolution.x, film->fullResolution.y, 1) * Scale(1 / (screenWindow.pMax.x - screenWindow.pMin.x), 1 / (screenWindow.pMin.y - screenWindow.pMax.y), 1) * Translate(Vector3f(-screenWindow.pMin.x, -screenWindow.pMax.y, 0)); RasterToScreen = Inverse(ScreenToRaster);
RasterToCamera = Inverse(CameraToScreen) * RasterToScreen;
}

ProjectiveCamera implementations pass the projective transformation up to the base class constructor shown here. This transformation gives the camera-to-screen projection; from that, the constructor can easily compute the other transformation that will be needed, to go all the way from raster space to camera space.

<<Compute projective camera transformations>>= 
<<Compute projective camera screen transformations>> 
RasterToCamera = Inverse(CameraToScreen) * RasterToScreen;

<<ProjectiveCamera Protected Data>>= 
Transform CameraToScreen, RasterToCamera;

The only nontrivial transformation to compute in the constructor is the screen-to-raster projection. In the following code, note the composition of transformations where (reading from bottom to top) we start with a point in screen space, translate so that the upper-left corner of the screen is at the origin, and then scale by the reciprocal of the screen width and height, giving us a point with x and y coordinates between 0 and 1 (these are NDC coordinates). Finally, we scale by the raster resolution, so that we end up covering the entire raster range from (0, 0) up to the overall raster resolution. An important detail here is that the y coordinate is inverted by this transformation; this is necessary because increasing y values move up the image in screen coordinates but down in raster coordinates.

<<Compute projective camera screen transformations>>= 
ScreenToRaster = Scale(film->fullResolution.x, film->fullResolution.y, 1) *
    Scale(1 / (screenWindow.pMax.x - screenWindow.pMin.x),
          1 / (screenWindow.pMin.y - screenWindow.pMax.y), 1) *
    Translate(Vector3f(-screenWindow.pMin.x, -screenWindow.pMax.y, 0));
RasterToScreen = Inverse(ScreenToRaster);

<<ProjectiveCamera Protected Data>>+=  
Transform ScreenToRaster, RasterToScreen;

6.2.1 Orthographic Camera

<<OrthographicCamera Declarations>>= 
class OrthographicCamera : public ProjectiveCamera {
  public:
    <<OrthographicCamera Public Methods>> 
    Float GenerateRay(const CameraSample &sample, Ray *) const;
    Float GenerateRayDifferential(const CameraSample &sample,
                                  RayDifferential *) const;
  private:
    <<OrthographicCamera Private Data>> 
};

The orthographic camera, defined in the files cameras/orthographic.h and cameras/orthographic.cpp, is based on the orthographic projection transformation. The orthographic transformation takes a rectangular region of the scene and projects it onto the front face of the box that defines the region. It doesn’t give the effect of foreshortening—objects becoming smaller on the image plane as they get farther away—but it does leave parallel lines parallel, and it preserves relative distance between objects. Figure 6.2 shows how this rectangular volume defines the visible region of the scene.

Figure 6.2: The orthographic view volume is an axis-aligned box in camera space, defined such that objects inside the region are projected onto the z = near face of the box.

Figure 6.3 compares the result of using the orthographic projection for rendering to the perspective projection defined in the next section.

Figure 6.3: Car Model Rendered with Different Camera Models. Car rendered from the same viewpoint with (1) orthographic and (2) perspective cameras. The lack of foreshortening makes the orthographic view feel like it has less depth, although it does preserve parallel lines, which can be a useful property.

The orthographic camera constructor generates the orthographic transformation matrix with the Orthographic() function, which will be defined shortly.

<<OrthographicCamera Public Methods>>= 
OrthographicCamera(const AnimatedTransform &CameraToWorld, const Bounds2f &screenWindow, Float shutterOpen, Float shutterClose, Float lensRadius, Float focalDistance, Film *film, const Medium *medium) : ProjectiveCamera(CameraToWorld, Orthographic(0, 1), screenWindow, shutterOpen, shutterClose, lensRadius, focalDistance, film, medium) { <<Compute differential changes in origin for orthographic camera rays>> 
dxCamera = RasterToCamera(Vector3f(1, 0, 0)); dyCamera = RasterToCamera(Vector3f(0, 1, 0));
}

The orthographic viewing transformation leaves x and y coordinates unchanged but maps z values at the near plane to 0 and z values at the far plane to 1. To do this, the scene is first translated along the z axis so that the near plane is aligned with z = 0. Then, the scene is scaled in z so that the far plane maps to z = 1. The composition of these two transformations gives the overall transformation. (For a ray tracer like pbrt, we’d like the near plane to be at 0 so that rays start at the plane that goes through the camera’s position; the far plane offset doesn’t particularly matter.)

<<Transform Method Definitions>>+=  
Transform Orthographic(Float zNear, Float zFar) {
    return Scale(1, 1, 1 / (zFar - zNear)) * Translate(Vector3f(0, 0, -zNear));
}

Thanks to the simplicity of the orthographic projection, it’s easy to directly compute the differential rays in the x and y directions in the GenerateRayDifferential() method. The directions of the differential rays will be the same as the main ray (as they are for all rays generated by an orthographic camera), and the difference in origins will be the same for all rays. Therefore, the constructor here precomputes how much the ray origins shift in camera space coordinates due to a single pixel shift in the x and y directions on the film plane.

<<Compute differential changes in origin for orthographic camera rays>>= 
dxCamera = RasterToCamera(Vector3f(1, 0, 0));
dyCamera = RasterToCamera(Vector3f(0, 1, 0));

<<OrthographicCamera Private Data>>= 
Vector3f dxCamera, dyCamera;

We can now go through the code to take a sample point in raster space and turn it into a camera ray. The process is summarized in Figure 6.4. First, the raster space sample position is transformed into a point in camera space, giving a point located on the near plane, which is the origin of the camera ray. Because the camera space viewing direction points down the z axis, the camera space ray direction is (0, 0, 1).

Figure 6.4: To create a ray with the orthographic camera, a raster space position on the film plane is transformed to camera space, giving the ray’s origin on the near plane. The ray’s direction in camera space is (0, 0, 1), down the z axis.

If depth of field has been enabled for this scene, the ray’s origin and direction are modified so that depth of field is simulated. Depth of field will be explained later in this section. The ray’s time value is set by linearly interpolating between the shutter open and shutter close times by the CameraSample::time offset (which is in the range [0, 1)). Finally, the ray is transformed into world space before being returned.

<<OrthographicCamera Definitions>>= 
Float OrthographicCamera::GenerateRay(const CameraSample &sample,
                                      Ray *ray) const {
    <<Compute raster and camera sample positions>> 
    *ray = Ray(pCamera, Vector3f(0, 0, 1));
    <<Modify ray for depth of field>> 
    ray->time = Lerp(sample.time, shutterOpen, shutterClose);
    ray->medium = medium;
    *ray = CameraToWorld(*ray);
    return 1;
}

Once all of the transformation matrices have been set up, it’s easy to transform the raster space sample point to camera space.

<<Compute raster and camera sample positions>>= 
Point3f pFilm = Point3f(sample.pFilm.x, sample.pFilm.y, 0);
Point3f pCamera = RasterToCamera(pFilm);

The implementation of GenerateRayDifferential() performs the same computation to generate the main camera ray. The differential ray origins are found using the offsets computed in the OrthographicCamera constructor, and then the full ray differential is transformed to world space.

<<OrthographicCamera Definitions>>+= 
Float OrthographicCamera::GenerateRayDifferential(
        const CameraSample &sample, RayDifferential *ray) const {
    <<Compute main orthographic viewing ray>> 
    <<Compute raster and camera sample positions>> 
    *ray = RayDifferential(pCamera, Vector3f(0, 0, 1));
    <<Modify ray for depth of field>> 
    <<Compute ray differentials for OrthographicCamera>> 
    ray->time = Lerp(sample.time, shutterOpen, shutterClose);
    ray->hasDifferentials = true;
    ray->medium = medium;
    *ray = CameraToWorld(*ray);
    return 1;
}

<<Compute ray differentials for OrthographicCamera>>= 
if (lensRadius > 0) {
    <<Compute OrthographicCamera ray differentials accounting for lens>> 
    <<Sample point on lens>> 
    Float ft = focalDistance / ray->d.z;
    Point3f pFocus = pCamera + dxCamera + (ft * Vector3f(0, 0, 1));
    ray->rxOrigin = Point3f(pLens.x, pLens.y, 0);
    ray->rxDirection = Normalize(pFocus - ray->rxOrigin);
    pFocus = pCamera + dyCamera + (ft * Vector3f(0, 0, 1));
    ray->ryOrigin = Point3f(pLens.x, pLens.y, 0);
    ray->ryDirection = Normalize(pFocus - ray->ryOrigin);
} else {
    ray->rxOrigin = ray->o + dxCamera;
    ray->ryOrigin = ray->o + dyCamera;
    ray->rxDirection = ray->ryDirection = ray->d;
}

6.2.2 Perspective Camera

The perspective projection is similar to the orthographic projection in that it projects a volume of space onto a 2D film plane. However, it includes the effect of foreshortening: objects that are far away are projected to be smaller than objects of the same size that are closer. Unlike the orthographic projection, the perspective projection doesn’t preserve distances or angles, and parallel lines no longer remain parallel. The perspective projection is a reasonably close match to how an eye or camera lens generates images of the 3D world. The perspective camera is implemented in the files cameras/perspective.h and cameras/perspective.cpp.

<<PerspectiveCamera Declarations>>= 
class PerspectiveCamera : public ProjectiveCamera {
  public:
    <<PerspectiveCamera Public Methods>> 
    PerspectiveCamera(const AnimatedTransform &CameraToWorld,
                      const Bounds2f &screenWindow, Float shutterOpen,
                      Float shutterClose, Float lensRadius, Float focalDistance,
                      Float fov, Film *film, const Medium *medium);
    Float GenerateRay(const CameraSample &sample, Ray *) const;
    Float GenerateRayDifferential(const CameraSample &sample,
                                  RayDifferential *ray) const;
    Spectrum We(const Ray &ray, Point2f *pRaster = nullptr) const;
    void Pdf_We(const Ray &ray, Float *pdfPos, Float *pdfDir) const;
    Spectrum Sample_Wi(const Interaction &ref, const Point2f &sample,
                       Vector3f *wi, Float *pdf, Point2f *pRaster,
                       VisibilityTester *vis) const;
  private:
    <<PerspectiveCamera Private Data>> 
    Vector3f dxCamera, dyCamera;
    Float A;
};

<<PerspectiveCamera Method Definitions>>= 
PerspectiveCamera::PerspectiveCamera(
        const AnimatedTransform &CameraToWorld, const Bounds2f &screenWindow,
        Float shutterOpen, Float shutterClose, Float lensRadius,
        Float focalDistance, Float fov, Film *film, const Medium *medium)
    : ProjectiveCamera(CameraToWorld, Perspective(fov, 1e-2f, 1000.f),
                       screenWindow, shutterOpen, shutterClose, lensRadius,
                       focalDistance, film, medium) {
    <<Compute differential changes in origin for perspective camera rays>> 
    <<Compute image plane bounds at z=1 for PerspectiveCamera>> 
    Point2i res = film->fullResolution;
    Point3f pMin = RasterToCamera(Point3f(0, 0, 0));
    Point3f pMax = RasterToCamera(Point3f(res.x, res.y, 0));
    pMin /= pMin.z;
    pMax /= pMax.z;
    A = std::abs((pMax.x - pMin.x) * (pMax.y - pMin.y));
}

The perspective projection describes perspective viewing of the scene. Points in the scene are projected onto a viewing plane perpendicular to the z axis. The Perspective() function computes this transformation; it takes a field-of-view angle in fov and the distances to a near z plane and a far z plane. After the perspective projection, points at the near z plane are mapped to have z = 0, and points at the far plane have z = 1 (Figure 6.5). For rendering systems based on rasterization, it’s important to set the positions of these planes carefully; they determine the z range of the scene that is rendered, but setting them with too many orders of magnitude variation between their values can lead to numerical precision errors. For ray tracers like pbrt, they can be set arbitrarily as they are here.

Figure 6.5: The perspective transformation matrix projects points in camera space onto the film plane. The x' and y' coordinates of the projected points are equal to the unprojected x and y coordinates divided by the z coordinate. The projected z' coordinate is computed so that points on the near plane map to z' = 0 and points on the far plane map to z' = 1.

<<Transform Method Definitions>>+= 
Transform Perspective(Float fov, Float n, Float f) {
    <<Perform projective divide for perspective projection>> 
    <<Scale canonical perspective view to specified field of view>> 
}

The transformation is most easily understood in two steps:

  1. Points p in camera space are projected onto the viewing plane. A bit of algebra shows that the projected x' and y' coordinates on the viewing plane can be computed by dividing x and y by the point’s z coordinate value. The projected z depth is remapped so that z values at the near plane are 0 and z values at the far plane are 1. The computation we’d like to do is
     \[ x' = \frac{x}{z}, \qquad y' = \frac{y}{z}, \qquad z' = \frac{f(z - n)}{z(f - n)}. \]
     All of this computation can be encoded in a 4 × 4 matrix using homogeneous coordinates:
     \[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{f}{f - n} & -\frac{fn}{f - n} \\ 0 & 0 & 1 & 0 \end{pmatrix} \]
     <<Perform projective divide for perspective projection>>= 
     Matrix4x4 persp(1, 0, 0,           0,
                     0, 1, 0,           0,
                     0, 0, f / (f - n), -f * n / (f - n),
                     0, 0, 1,           0);
  2. The angular field of view (fov) specified by the user is accounted for by scaling the (x, y) values on the projection plane so that points inside the field of view project to coordinates between [-1, 1] on the view plane. For square images, both x and y lie between [-1, 1] in screen space. Otherwise, the direction in which the image is narrower maps to [-1, 1], and the wider direction maps to a proportionally larger range of screen space values. Recall that the tangent is equal to the ratio of the opposite side of a right triangle to the adjacent side. Here the adjacent side has length 1, so the opposite side has the length tan(fov/2). Scaling by the reciprocal of this length maps the field of view to the range [-1, 1].

<<Scale canonical perspective view to specified field of view>>= 
Float invTanAng = 1 / std::tan(Radians(fov) / 2);
return Scale(invTanAng, invTanAng, 1) * Transform(persp);
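
As a quick sanity check of the near and far depth mapping (an illustrative snippet, not pbrt code; it relies on pbrt's Transform performing the homogeneous divide when applied to a Point3f), the composed matrix sends points on the near plane to z' = 0 and points on the far plane to z' = 1:

    // Illustrative check only: verify the depth remapping of Perspective().
    Transform persp = Perspective(90, 1e-2f, 1000.f);
    Point3f pNear = persp(Point3f(0, 0, 1e-2f));   // z component is 0
    Point3f pFar  = persp(Point3f(0, 0, 1000.f));  // z component is 1 (up to rounding)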

Similar to the OrthographicCamera, information about how the camera rays generated by the PerspectiveCamera change as we shift pixels on the film plane can be precomputed in the constructor. Here, we compute the change in position on the near perspective plane in camera space with respect to shifts in pixel location.

<<Compute differential changes in origin for perspective camera rays>>= 
dxCamera = (RasterToCamera(Point3f(1, 0, 0)) -
            RasterToCamera(Point3f(0, 0, 0)));
dyCamera = (RasterToCamera(Point3f(0, 1, 0)) -
            RasterToCamera(Point3f(0, 0, 0)));

<<PerspectiveCamera Private Data>>= 
Vector3f dxCamera, dyCamera;

With the perspective projection, all rays originate from the origin, (0, 0, 0), in camera space. A ray’s direction is given by the vector from the origin to the point on the near plane, pCamera, that corresponds to the provided CameraSample’s pFilm location. In other words, the ray’s vector direction is component-wise equal to this point’s position, so rather than doing a useless subtraction to compute the direction, we just initialize the direction directly from the point pCamera.

<<PerspectiveCamera Method Definitions>>+=  
Float PerspectiveCamera::GenerateRay(const CameraSample &sample,
                                     Ray *ray) const {
    <<Compute raster and camera sample positions>> 
    *ray = Ray(Point3f(0, 0, 0), Normalize(Vector3f(pCamera)));
    <<Modify ray for depth of field>> 
    ray->time = Lerp(sample.time, shutterOpen, shutterClose);
    ray->medium = medium;
    *ray = CameraToWorld(*ray);
    return 1;
}

The GenerateRayDifferential() method follows the implementation of GenerateRay(), except for an additional fragment that computes the differential rays.

<<PerspectiveCamera Public Methods>>= 
Float GenerateRayDifferential(const CameraSample &sample, RayDifferential *ray) const;

<<Compute offset rays for PerspectiveCamera ray differentials>>= 
if (lensRadius > 0) {
    <<Compute PerspectiveCamera ray differentials accounting for lens>> 
    <<Sample point on lens>> 
    Vector3f dx = Normalize(Vector3f(pCamera + dxCamera));
    Float ft = focalDistance / dx.z;
    Point3f pFocus = Point3f(0, 0, 0) + (ft * dx);
    ray->rxOrigin = Point3f(pLens.x, pLens.y, 0);
    ray->rxDirection = Normalize(pFocus - ray->rxOrigin);
    Vector3f dy = Normalize(Vector3f(pCamera + dyCamera));
    ft = focalDistance / dy.z;
    pFocus = Point3f(0, 0, 0) + (ft * dy);
    ray->ryOrigin = Point3f(pLens.x, pLens.y, 0);
    ray->ryDirection = Normalize(pFocus - ray->ryOrigin);
} else {
    ray->rxOrigin = ray->ryOrigin = ray->o;
    ray->rxDirection = Normalize(Vector3f(pCamera) + dxCamera);
    ray->ryDirection = Normalize(Vector3f(pCamera) + dyCamera);
}

6.2.3 The Thin Lens Model and Depth of Field

An ideal pinhole camera that only allows rays passing through a single point to reach the film isn’t physically realizable; while it’s possible to make cameras with extremely small apertures that approach this behavior, small apertures allow relatively little light to reach the film sensor. With a small aperture, long exposure times are required to capture enough photons to accurately capture the image, which in turn can lead to blur from objects in the scene moving while the camera shutter is open.

Real cameras have lens systems that focus light through a finite-sized aperture onto the film plane. Camera designers (and photographers using cameras with adjustable apertures) face a trade-off: the larger the aperture, the more light reaches the film and the shorter the exposures that are needed. However, lenses can only focus on a single plane (the focal plane), and the farther objects in the scene are from this plane, the blurrier they are. The larger the aperture, the more pronounced this effect is: objects at depths different from the one the lens system has in focus become increasingly blurry.

The camera model in Section 6.4 implements a fairly accurate simulation of lens systems in realistic cameras. For the simple camera models introduced so far, we can apply a classic approximation from optics, the thin lens approximation, to model the effect of finite apertures with traditional computer graphics projection models. The thin lens approximation models an optical system as a single lens with spherical profiles, where the thickness of the lens is small relative to the radius of curvature of the lens. (The more general thick lens approximation, which doesn’t assume that the lens’s thickness is negligible, is introduced in Section 6.4.3.)

Under the thin lens approximation, incident rays that are parallel to the optical axis and pass through the lens focus at a point behind the lens called the focal point. The distance the focal point is behind the lens, f, is the lens’s focal length. If the film plane is placed at a distance equal to the focal length behind the lens, then objects infinitely far away will be in focus, as they image to a single point on the film.

Figure 6.6 illustrates the basic setting. Here we’ve followed the typical lens coordinate system convention of placing the lens perpendicular to the z axis, with the lens at z = 0 and the scene along -z. (Note that this is a different coordinate system from the one we used for camera space, where the viewing direction is +z.) Distances on the scene side of the lens are denoted with unprimed variables z, and distances on the film side of the lens (positive z) are primed, z'.

Figure 6.6: A thin lens, located along the z axis at z = 0. Incident rays that are parallel to the optical axis and pass through a thin lens (dashed lines) all pass through a point p, the focal point. The distance between the lens and the focal point, f, is the lens’s focal length.

For points in the scene at a depth z from a thin lens with focal length f, the Gaussian lens equation relates the distances from the object to the lens and from the lens to the image of the point:

\[ \frac{1}{z'} - \frac{1}{z} = \frac{1}{f}. \]
(6.1)

Note that for z = -∞, we have z' = f, as expected.

We can use the Gaussian lens equation to solve for the distance between the lens and the film that sets the plane of focus at some z, the focal distance (Figure 6.7):

\[ z' = \frac{fz}{f + z}. \]
(6.2)
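
For a concrete sense of the numbers involved, here is an illustrative calculation (not code from pbrt) applying Equation (6.2) to the 50-mm lens used in Figure 6.11. Following the convention above that the scene lies along negative z, focusing at 1 m gives a film plane roughly 52.6 mm behind the lens:

    // Illustrative only: Equation (6.2) for a 50-mm lens focused at a depth of 1 m.
    // The scene is along -z, so the plane of focus is at z = -1 m.
    Float f = 0.050f;                // focal length, in meters
    Float z = -1.f;                  // plane of focus, scene side of the lens
    Float zFilm = f * z / (f + z);   // ~0.0526 m: the film sits ~52.6 mm behind the lens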

Figure 6.7: To focus a thin lens at a depth z in the scene, Equation (6.2) can be used to compute the distance z' on the film side of the lens that points at z focus to. Focusing is performed by adjusting the distance between the lens and the film plane.

A point that doesn’t lie on the plane of focus is imaged to a disk on the film plane, rather than to a single point. The boundary of this disk is called the circle of confusion. The size of the circle of confusion is affected by the diameter of the aperture that light rays pass through, the focal distance, and the distance between the object and the lens. Figure 6.8 shows this effect, depth of field, in a scene with a series of copies of the dragon model. As the size of the lens aperture increases, blurriness increases the farther a point is from the plane of focus. Note that the second dragon from the right remains in focus throughout all of the images, as the plane of focus has been placed at its depth.

Figure 6.8: (1) Scene rendered with no depth of field, (2) depth of field due to a relatively small lens aperture, which gives only a small amount of blurriness in the out-of-focus regions, (3) and (4) As the size of the lens aperture increases, the size of the circle of confusion in the out-of-focus areas increases, giving a greater amount of blur on the film plane.

Figure 6.9 shows depth of field used to render the landscape scene. Note how the effect draws the viewer’s eye to the in-focus grass in the center of the image.

Figure 6.9: Depth of field gives a greater sense of depth and scale to this part of the landscape scene. (Scene courtesy of Laubwerk.)

In practice, objects do not have to be exactly on the plane of focus to appear in sharp focus; as long as the circle of confusion is roughly smaller than a pixel on the film sensor, objects appear to be in focus. The range of distances from the lens at which objects appear in focus is called the lens’s depth of field.

The Gaussian lens equation also lets us compute the size of the circle of confusion; given a lens with focal length f that is focused at a distance z_f, the film plane is at z'_f. Given another point at depth z, the Gaussian lens equation gives the distance z' that the lens focuses the point to. This point is either in front of or behind the film plane; Figure 6.10(a) shows the case where it is behind.

Figure 6.10: (a) If a thin lens with focal length f is focused at some depth z_f, then the distance from the lens to the film plane is z'_f, given by the Gaussian lens equation. A point in the scene at depth z ≠ z_f will be imaged as a circle on the film plane; here z focuses at z', which is behind the film plane. (b) To compute the diameter of the circle of confusion, we can apply similar triangles: the ratio of d_l, the diameter of the lens, to z' must be the same as the ratio of d_c, the diameter of the circle of confusion, to z' - z'_f.

The diameter of the circle of confusion is given by the intersection of the cone between z' and the lens with the film plane. If we know the diameter of the lens d_l, then we can use similar triangles to solve for the diameter of the circle of confusion d_c (Figure 6.10(b)):

\[ \frac{d_l}{z'} = \frac{d_c}{|z' - z'_f|}. \]

Solving for d_c, we have

\[ d_c = \left| \frac{d_l (z' - z'_f)}{z'} \right|. \]

Applying the Gaussian lens equation to express the result in terms of scene depths, we can find that

\[ d_c = \left| \frac{d_l f (z - z_f)}{z (f + z_f)} \right|. \]

Note that the diameter of the circle of confusion is proportional to the diameter of the lens. The lens diameter is often expressed as the lens’s f-number n, which expresses diameter as a fraction of focal length, d_l = f/n.
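
To make the formula concrete, here is a small sketch (again illustrative, not part of pbrt; the helper name circleOfConfusion is ours) that evaluates the circle of confusion for the lens of Figure 6.11: a 50-mm focal length, a 25-mm aperture (f/2), and focus at a depth of 1 m, with scene depths along negative z:

    // Illustrative only: diameter of the circle of confusion, in meters, for a
    // point at scene depth z (negative), using the formula derived above.
    Float f = 0.050f, dl = 0.025f, zf = -1.f;   // focal length, lens diameter, focus depth
    auto circleOfConfusion = [=](Float z) {
        return std::abs(dl * f * (z - zf) / (z * (f + zf)));
    };
    Float dc = circleOfConfusion(-2.f);   // a point 2 m away blurs to ~0.66 mm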

Figure 6.11 shows a graph of this function for a 50-mm focal length lens with a 25-mm aperture, focused at z_f = 1 m. Note that the blur is asymmetric with depth around the focal plane and grows much more quickly for objects in front of the plane of focus than for objects behind it.

Figure 6.11: The diameter of the circle of confusion as a function of depth for a 50-mm focal length lens with 25-mm aperture, focused at 1 meter.

Modeling a thin lens in a ray tracer is remarkably straightforward: all that is necessary is to choose a point on the lens and find the appropriate ray that starts on the lens at that point such that objects in the plane of focus are in focus on the film (Figure 6.12). Therefore, projective cameras take two extra parameters for depth of field: one sets the size of the lens aperture, and the other sets the focal distance.

Figure 6.12: (a) For a pinhole camera model, a single camera ray is associated with each point on the film plane (filled circle), given by the ray that passes through the single point of the pinhole lens (empty circle). (b) For a camera model with a finite aperture, we sample a point (filled circle) on the disk-shaped lens for each ray. We then compute the ray that passes through the center of the lens (corresponding to the pinhole model) and the point where it intersects the plane of focus (solid line). We know that all objects in the plane of focus must be in focus, regardless of the lens sample position. Therefore, the ray corresponding to the lens position sample (dashed line) is given by the ray starting on the lens sample point and passing through the computed intersection point on the plane of focus.

<<ProjectiveCamera Protected Data>>+= 
Float lensRadius, focalDistance;

<<Initialize depth of field parameters>>= 
lensRadius = lensr;
focalDistance = focald;

It is generally necessary to trace many rays for each image pixel in order to adequately sample the lens for smooth depth of field. Figure 6.13 shows the landscape scene from Figure 6.9 with only four samples per pixel (Figure 6.9 had 2048 samples per pixel).

Figure 6.13: Landscape scene with depth of field and only four samples per pixel: the depth of field is undersampled and the image is grainy. (Scene courtesy of Laubwerk.)

<<Modify ray for depth of field>>= 
if (lensRadius > 0) {
    <<Sample point on lens>> 
    <<Compute point on plane of focus>> 
    <<Update ray for effect of lens>> 
}

The ConcentricSampleDisk() function, defined in Chapter 13, takes a (u, v) sample position in [0, 1)² and maps it to a 2D unit disk centered at the origin (0, 0). To turn this into a point on the lens, these coordinates are scaled by the lens radius. The CameraSample class provides the (u, v) lens-sampling parameters in the pLens member variable.

<<Sample point on lens>>= 
Point2f pLens = lensRadius * ConcentricSampleDisk(sample.pLens);

The ray’s origin is this point on the lens. Now it is necessary to determine the proper direction for the new ray. We know that all rays from the given image sample through the lens must converge at the same point on the plane of focus. Furthermore, we know that rays pass through the center of the lens without a change in direction, so finding the appropriate point of convergence is a matter of intersecting the unperturbed ray from the pinhole model with the plane of focus and then setting the new ray’s direction to be the vector from the point on the lens to the intersection point.

For this simple model, the plane of focus is perpendicular to the z axis and the ray starts at the origin, so intersecting the ray through the lens center with the plane of focus is straightforward. The t value of the intersection is given by

\[ t = \frac{\texttt{focalDistance}}{\mathbf{d}_z}. \]

<<Compute point on plane of focus>>= 
Float ft = focalDistance / ray->d.z;
Point3f pFocus = (*ray)(ft);

Now the ray can be initialized. The origin is set to the sampled point on the lens, and the direction is set so that the ray passes through the point on the plane of focus, pFocus.

<<Update ray for effect of lens>>= 
ray->o = Point3f(pLens.x, pLens.y, 0);
ray->d = Normalize(pFocus - ray->o);

To compute ray differentials with the thin lens, the approach used in the fragment <<Update ray for effect of lens>> is applied to rays offset one pixel in the x and y directions on the film plane. The fragments that implement this, <<Compute OrthographicCamera ray differentials accounting for lens>> and <<Compute PerspectiveCamera ray differentials accounting for lens>>, appear in the ray differential code earlier in this section.