Saturday, March 5, 2016

Bundle Adjustment (gradient/Jacobian computation)

This is a continuation of Structure from Motion - Case of two views/images and Structure from Motion - Case of multiple views/images.

Bundle Adjustment (BA) is typically used in the context of Structure from Motion (SfM) to adjust the camera poses (rotation and translation) and/or the location of the 3D points in the reconstructed 3D scene. Bundle Adjustment can also adjust the internal camera parameters like the focal length but it not considered here. Bundle Adjustment is nothing more than solving a non-linear least squares problem which aims at minimizing the reprojection errors (between the computed projections of the 3D points on the images they appear on and the measured image coordinates on the same images). The method of choice in the Computer Vision community to minimize the reprojection errorr is the Levenberg-Marquardt algorithm although one can use any other non-linear least squares solver, for instance, Limited-memory BFGS.

The following write-up explains how to compute the gradient of the function to minimize (and the projection of the 3D point) w/r to the camera and point parameters. Computing the gradient is needed by a generic non-linear least squares solver like LBFGS. In the case of the Levenberg-Marquardt algorithm, it is the Jacobian matrix that is needed to solve the so-called (linear) normal equations. The gradient of the projection of the 3D point w/r to the parameters makes up the rows of the Jacobian matrix, which happens to be very sparse. Sparsity of the Jacobian matrix is a property that must be taken advantage of when implementing Levenberg-Marquardt for the purpose of Bundle Adjustment.

The gradient of the projection of the 3D point w/r to the parameters is computed nicely by using the chain rule. Although the chain rule simplifies things greatly and makes the computations more "elegant", the derivatives can certainly be computed without it but it may be a tad more tedious, especially the derivatives w/r to the camera parameters.

In the write-up, the 3x3 camera rotation is parameterized by the rotation vector omega (Rodrigues) but any good parameterization could have been used as well (like the quaternion).







This post was inspired by Computer Vision Fundamentals: Robust Non-Linear Least-Squares and their Applications, so a big thank you to Pascal Fua and Vincent Lepetit who are the authors.