By Cleve Moler, MathWorks
Stanford computer science professor Gene Golub has done more than anyone to make the singular value decomposition one of the most powerful and widely used tools in modern matrix computation.
|Gene Golub’s license plate, photographed by Professor P. M. Kroonenberg of Leiden University.|
The SVD is a recent development. Pete Stewart, author of the 1993 paper "On the Early History of the Singular Value Decomposition", tells me that the term valeurs singulières was first used by Emile Picard around 1910 in connection with integral equations. Picard used the adjective "singular" to mean something exceptional or out of the ordinary. At the time, it had nothing to do with singular matrices.
When I was a graduate student in the early 1960s, the SVD was still regarded as a fairly obscure theoretical concept. A book that George Forsythe and I wrote in 1964 described the SVD as a nonconstructive way of characterizing the norm and condition number of a matrix. We did not yet have a practical way to actually compute it. Gene Golub and W. Kahan published the first effective algorithm in 1965. A variant of that algorithm, published by Gene Golub and Christian Reinsch in 1970 is still the one we use today. By the time the first MATLAB appeared, around 1980, the SVD was one of its highlights.
We can generate a 2-by-2 example by working backwards, computing a matrix from its SVD. Take σ1 = 2, σ2 = 1/2, θ = π/6 and φ = π/4.
The matrices U and V are rotations through angles θ and φ, followed by reflections in the first dimension. The matrix ∑ is a diagonal scaling transformation. Generate A by computing
A = U∑V T
You will find that
This says that the matrix A can be generated by a rotation through 45° and a reflection, followed by independent scalings in each of the two coordinate directions by factors of 2 and 1/2, respectively, followed by a rotation through 30° and another reflection.
The MATLAB function eigshow generates a figure that demonstrates the singular value decomposition of a 2-by-2 matrix. Enter the statements
A = [1.4015 -1.0480; -0.4009 1.0133] eigshow(A)
The green circle is the unit circle in the plane. The blue ellipse is the image of this circle under transformation by the matrix A. The green vectors, v1 and v2, which are the columns of V, and the blue vectors, u1 and u2, which are the columns of U, are two different orthogonal bases for two-dimensional space. The columns of V are rotated 45° from the axes of the figure, while the columns of U2, which are the major and minor axes of the ellipse, are rotated 30°. The matrix A transforms v1 into σ1u1 and v2 into σ2u2.
Let’s move on to m-by-n matrices. One of the most important features of the SVD is its use of orthgonal matrices. A real matrix U is orthogonal, or has orthonormal columns, if
U TU = I
This says that the columns of U are perpendicular to each other and have unit length. Geometrically, transformations by orthogonal matrices are generalizations of rotations and reflections; they preserve lengths and angles. Computationally, orthogonal matrices are very desirable because they do not magnify roundoff or other kinds of errors.
Any real matrix A, even a nonsquare one, can be written as the product of three matrices.
A = U∑V T
The matrix U is orthogonal and has as many rows as A. The matrix V is orthogonal and has as many columns as A. The matrix ∑ is the same size as A, but its only nonzero elements are on the main diagonal. The diagonal elements of ∑ are the singular values, and the columns of U and V are the left and right singular vectors.
In abstract linear algebra terms, a matrix represents a linear transformation from one vector space, the domain, to another, the range. The SVD says that for any linear transformation it is possible to choose an orthonormal basis for the domain and a possibly different orthonormal basis for the range. The transformation becomes independent of scalings or dilatations in each coordinate direction.
The rank of a matrix is the number of linearly independent rows, which is the same as the number of linearly independent columns. The rank of a diagonal matrix is clearly the number of nonzero diagonal elements. Orthogonal transforms preserve linear independence. Thus, the rank of any matrix is the number of nonzero singular values. In MATLAB, enter the statement
to see how we choose a tolerance and count nonnegligible singular values.
Traditional courses in linear algebra make considerable use of the reduced row echelon form (RREF), but the RREF is an unreliable tool for computation in the face of inexact data and arithmetic. The SVD can be regarded as a modern, computationally powerful replacement for the RREF.
A square diagonal matrix is nonsingular if, and only if, its diagonal elements are nonzero. The SVD implies that any square matrix is nonsingular if, and only if, its singular values are nonzero. The most numerically reliable way to determine whether matrices are singular is to test their singular values. This is far better than trying to compute determinants, which have atrocious scaling properties.
With the singular value decomposition, the system of linear equations
U∑V Tx = b
The solution is
x = V∑ -1U Tb
Multiply by an orthogonal matrix, divide by the singular values, then multiply by another orthogonal matrix. This is much more computational work than Gaussian elimination, but it has impeccable numerical properties. You can judge whether the singular values are small enough to be regarded as negligible, and if they are, analyze the relevant singular system.
Let Ek denote the outer product of the k-th left and right singular vectors, that is
Ek = ukvkT
Then A can be expressed as a sum of rank-1 matrices,
If you order the singular values in decreasing order, σ1 σ2 > ... > σn , and truncate the sum after r terms, the result is a rank-r approximation to the original matrix. The error in the approximation depends upon the magnitude of the neglected singular values. When you do this with a matrix of data that has been centered, by subtracting the mean of each column from the entire column, the process is known as principal component analysis (PCA). The right singular vectors, vk, are the components, and the scaled left singular vectors, σkuk, are the scores. PCAs are usually described in terms of the eigenvalues and eigenvectors of the covariance matrix, AAT, but the SVD approach sometimes has better numerical properties.
SVD and matrix approximation are often illustrated by approximating images. Our example starts with the photo on Gene Golub’s Web page (Figure 2). The image is 897-by-598 pixels. We stack the red, green, and blue JPEG components vertically to produce a 2691-by-598 matrix. We then do just one SVD computation. After computing a low-rank approximation, we repartition the matrix into RGB components. With just rank 12, the colors are accurately reproduced and Gene is recognizable, especially if you squint at the picture to allow your eyes to reconstruct the original image. With rank 50, you can begin to read the mathematics on the white board behind Gene. With rank 120, the image is almost indistinguishable from the full rank 598. (This is not a particularly effective image compression technique. In fact, my friends in image processing call it "image degradration." )
So far in this column I have hardly mentioned eigenvalues. I wanted to show that it is possible to discuss singular values without discussing eigenvalues—but, of course, the two are closely related. In fact, if A is square, symmetric, and positive definite, its singular values and eigenvalues are equal, and its left and right singular vectors are equal to each other and to its eigenvectors. More generally, the singular values of A are the square roots of the eigenvalues of ATA or AAT.
Singular values are relevant when the matrix is regarded as a transformation from one space to a different space with possibly different dimensions. Eigenvalues are relevant when the matrix is regarded as a transformation from one space into itself—as, for example, in linear ordinary differential equations.
Google finds over 3,000,000 Web pages that mention "singular value decomposition" and almost 200,000 pages that mention "SVD MATLAB." I knew about a few of these pages before I started to write this column. I came across some other interesting ones as I surfed around.
Professor SVD made all of this, and much more, possible. Thanks, Gene.
Published 2006 - 91425v00