Reduced rank linear regression
5/25/2026
Recently, I have been interested in reduced-rank linear regression. My first stop was section 3.7 (Multiple Outcome Shrinkage and Selection) of Elements of Statistical Learning, but I found the treatment there a bit unclear (others have felt the same way). So, I decided to derive the solution by hand as a fun exercise.
Derivation
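Suppose we have $X \in \mathbb{R}^{n \times p}$, $Y \in \mathbb{R}^{n \times q}$, and positive integer rank $r$ (typically $r < \min(p, q)$). We want to find a matrix $B \in \mathbb{R}^{p \times q}$ that solves the problem

$$\min_{B} \; \|Y - XB\|_F^2 \quad \text{subject to} \quad \operatorname{rank}(B) \le r. \tag{1}$$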
This problem looks quite benign, but the rank constraint is non-convex, so it's not immediately clear how to solve it. With some simple manipulations though, we can show a closed-form, two-step method to produce an exact (up to numerical precision) solution.
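First, let $\hat{B}$ be the ordinary least squares solution to the unconstrained problem $\min_{B} \|Y - XB\|_F^2$. The normal equations of the unconstrained problem give us that

$$X^\top (Y - X\hat{B}) = 0.$$

If we define the residual matrix $E = Y - X\hat{B}$, then $X^\top E = 0$. It follows that for all $B \in \mathbb{R}^{p \times q}$, we have

$$\|Y - XB\|_F^2 = \|E + X(\hat{B} - B)\|_F^2 = \|E\|_F^2 + \|X\hat{B} - XB\|_F^2,$$

where the cross term $2\operatorname{tr}\big(E^\top X(\hat{B} - B)\big)$ vanishes because $X^\top E = 0$.

Since $\|E\|_F^2$ has no dependence on $B$, the problem (1) is equivalent to

$$\min_{B} \; \|X\hat{B} - XB\|_F^2 \quad \text{subject to} \quad \operatorname{rank}(B) \le r. \tag{2}$$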
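The orthogonality argument above is easy to check numerically. The following is a minimal sketch, assuming NumPy, randomly generated data, and the names `X` (predictors), `Y` (outcomes), `B_hat` (least squares coefficients), and `E` (least squares residual); the sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 6, 4  # hypothetical sizes, for illustration only
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))

# Ordinary least squares solution of min_B ||Y - X B||_F^2
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
E = Y - X @ B_hat  # least squares residual

# Normal equations: X^T E should vanish up to rounding error
normal_eq_error = np.abs(X.T @ E).max()

# For an arbitrary coefficient matrix B, the squared loss splits into
# the least squares residual plus the distance between the two fits
B = rng.standard_normal((p, q))
lhs = np.linalg.norm(Y - X @ B, "fro") ** 2
rhs = np.linalg.norm(E, "fro") ** 2 + np.linalg.norm(X @ B_hat - X @ B, "fro") ** 2
```

Here `lhs` and `rhs` should agree up to floating-point error for any choice of `B`, which is why only the second term matters when optimizing over `B`.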
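Now, consider the following relaxation of problem (2):

$$\min_{Z} \; \|X\hat{B} - Z\|_F^2 \quad \text{subject to} \quad \operatorname{rank}(Z) \le r, \tag{3}$$

where $Z \in \mathbb{R}^{n \times q}$ is a new variable. Problem (3) is a relaxation of problem (2) because for any feasible $B$ in problem (2), we can set $Z = XB$ to get a feasible solution to problem (3) with the same objective value (feasible since $\operatorname{rank}(XB) \le \operatorname{rank}(B) \le r$). Thus, the optimal value of problem (3) is less than or equal to the optimal value of problem (2). However, we will show that the optimal values of problems (2) and (3) are actually equal, and that we can use the solution to problem (3) to construct a solution to problem (2).

The solution to (3) is the best rank-$r$ (or less) approximation to $X\hat{B}$ in the Frobenius norm. If we define $U \Sigma V^\top$ to be the thin SVD of $X\hat{B}$, then by the Eckart-Young-Mirsky theorem, this approximation is given by $U_r \Sigma_r V_r^\top$, where $U_r$, $\Sigma_r$, and $V_r$ are the first $r$ columns of $U$, the first $r$ singular values in $\Sigma$, and the first $r$ columns of $V$, respectively. Since $X\hat{B} V_r = U \Sigma V^\top V_r = U_r \Sigma_r$, we have that

$$Z^\star = U_r \Sigma_r V_r^\top = X\hat{B} V_r V_r^\top.$$

Since $Z^\star = X(\hat{B} V_r V_r^\top)$ is a solution to the relaxation (3) and $\hat{B} V_r V_r^\top$ has rank at most $r$, it follows that $B^\star = \hat{B} V_r V_r^\top$ is a solution to the original problem (2), and therefore to problem (1).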
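The two-step method can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a reference implementation: the function name `reduced_rank_regression`, the variable names, and the random test data are all assumptions made here. It first computes the least squares coefficients, then projects them onto the top right singular vectors of the fitted values.

```python
import numpy as np

def reduced_rank_regression(X, Y, r):
    """Sketch of the two-step solution: OLS, then a rank-r projection."""
    # Step 1: ordinary least squares coefficients
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Step 2: thin SVD of the fitted values; rows of Vt are right singular vectors
    _, _, Vt = np.linalg.svd(X @ B_hat, full_matrices=False)
    V_r = Vt[:r].T
    # Project the OLS coefficients onto the span of the top-r right singular vectors
    return B_hat @ V_r @ V_r.T

def loss(X, Y, B):
    return np.linalg.norm(Y - X @ B, "fro") ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))  # hypothetical data
Y = rng.standard_normal((100, 5))
r = 2
B_star = reduced_rank_regression(X, Y, r)

# The fitted values X @ B_star should equal the truncated SVD of the OLS fit
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
U, s, Vt = np.linalg.svd(X @ B_hat, full_matrices=False)
Z_star = (U[:, :r] * s[:r]) @ Vt[:r]

# Random rank-r competitors, used only as a sanity check on optimality
candidates = [
    rng.standard_normal((8, r)) @ rng.standard_normal((r, 5)) for _ in range(200)
]
```

Comparing against random rank-`r` candidates does not prove optimality, but it is a cheap way to catch a mistake in the derivation or the code.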
The core intuition behind this problem is realizing the following: the least squares residual $E$ is always orthogonal to the column space of $X$, and the task of selecting the best rank-$r$ coefficient matrix involves choosing the best subspace within the column space of $X$. Since the least squares residual $E$ and the reduced-rank residual $X\hat{B} - XB$ are orthogonal, we can split the problem into two steps: first find the least squares solution, and then find the best rank-$r$ approximation to the fitted values $X\hat{B}$.
Afterthoughts
Suppose $X$ and $Y$ are both orthogonal and low rank but corrupted by noise so that they appear full rank. A least squares fit will then be full rank, with $X\hat{B}$ matching or nearly matching $Y$. Intuitively, reduced-rank regression might seem like a good choice here, but when $X\hat{B} \approx Y$, the reduced-rank approximation simply picks out the leading $r$ left singular vectors of $Y$ from the column space of $X$ (a column space artificially enlarged by noise). The resulting fit will appear much better than it actually is: reduced-rank regression doesn't act as a regularizer here, and the singular values of $\hat{B}$ can be quite large when the model is fitting to noise.
When $X$ and $Y$ are noisy, it might also be tempting to directly compute the rank-$r$ approximation of the least squares solution $\hat{B}$, but this is even worse. When $\hat{B}$ fits to noise in $X$, the corresponding singular values of $\hat{B}$ will be large, and the rank-$r$ approximation of $\hat{B}$ will precisely pick out these components.
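To make the comparison concrete, here is a small sketch that computes both estimators on synthetic data: the naive truncation of the least squares coefficients, and the reduced-rank regression solution obtained by projecting onto the top right singular vectors of the fitted values. The data-generating setup (a rank-2 signal plus small noise in the predictors), the noise levels, and all variable names are assumptions made for illustration. The only property guaranteed in general is that the naive truncation, being one particular rank-constrained matrix, cannot achieve a lower training loss than the reduced-rank solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q, r = 200, 10, 6, 2  # hypothetical sizes

# Hypothetical data: rank-r signal in X plus small noise; Y depends on the signal
L = rng.standard_normal((n, r))
X = L @ rng.standard_normal((r, p)) + 0.01 * rng.standard_normal((n, p))
Y = L @ rng.standard_normal((r, q)) + 0.5 * rng.standard_normal((n, q))

B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Naive approach: truncate the SVD of the coefficient matrix itself
Ub, sb, Vbt = np.linalg.svd(B_hat, full_matrices=False)
B_naive = (Ub[:, :r] * sb[:r]) @ Vbt[:r]

# Reduced-rank regression: truncate using the SVD of the fitted values
_, _, Vt = np.linalg.svd(X @ B_hat, full_matrices=False)
V_r = Vt[:r].T
B_rrr = B_hat @ V_r @ V_r.T

loss_naive = np.linalg.norm(Y - X @ B_naive, "fro") ** 2
loss_rrr = np.linalg.norm(Y - X @ B_rrr, "fro") ** 2
print(f"naive truncation loss: {loss_naive:.3f}")
print(f"reduced-rank loss:     {loss_rrr:.3f}")
```

How large the gap is depends on the noise levels and on how ill-conditioned the predictors are, so it is worth varying the constants `0.01` and `0.5` when experimenting.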
In contrast, taking a low-rank approximation of $X$ before doing least squares is much more like regularization. Optimistically, this low-rank approximation will zero out the noise in $X$, and $\hat{B}$ will be forced to fit on the signal in $X$ rather than the noise. At this stage, performing reduced-rank regression on the fitted values might be more reasonable since we don't run the risk of picking out noise from $X$.
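One way to sketch this idea is to truncate the SVD of the predictor matrix to a chosen rank and then solve least squares against the truncated matrix using its pseudo-inverse. The truncation rank `k`, the synthetic data, and the variable names below are assumptions made for illustration; this is not necessarily the exact procedure intended above. The sketch also checks one concrete sense in which this behaves like regularization: the resulting coefficients are a projection of the full least squares coefficients, so their spectral norm cannot be larger.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, k = 200, 10, 6, 2  # k is a hypothetical truncation rank

# Hypothetical data: rank-k signal in X plus small noise
L = rng.standard_normal((n, k))
X = L @ rng.standard_normal((k, p)) + 0.01 * rng.standard_normal((n, p))
Y = L @ rng.standard_normal((k, q)) + 0.5 * rng.standard_normal((n, q))

# Rank-k approximation of X from its thin SVD
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = (U[:, :k] * s[:k]) @ Vt[:k]

# Minimum-norm least squares solution for X_k, built from the SVD factors:
# B_k = V_k diag(1 / s_k) U_k^T Y
B_k = Vt[:k].T @ ((U[:, :k].T @ Y) / s[:k, None])

# Full least squares on the noisy X, for comparison
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

norm_truncated = np.linalg.norm(B_k, 2)  # spectral norm
norm_full = np.linalg.norm(B_hat, 2)
normal_eq_error = np.abs(X_k.T @ (Y - X_k @ B_k)).max()
print(f"spectral norm, truncated-X fit: {norm_truncated:.3f}")
print(f"spectral norm, full OLS fit:    {norm_full:.3f}")
```

Reduced-rank regression could then be applied to the fitted values `X_k @ B_k` in the same way as before.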
While deriving the reduced-rank regression coefficients, I was also reminded of the orthogonal Procrustes problem. The orthogonal Procrustes problem asks for the best orthogonal matrix $\Omega$ that maps one matrix $A$ to another matrix $C$ in the Frobenius norm, that is, the orthogonal $\Omega$ minimizing $\|\Omega A - C\|_F$. As cleanly demonstrated on Wikipedia, the solution is $\Omega = UV^\top$, where $U \Sigma V^\top$ is the thin SVD of $CA^\top$. Both problems are examples of least-squares with a non-convex constraint, and both have closed-form solutions derived using the SVD.
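For comparison, the Procrustes solution can be sketched just as briefly. The sketch below assumes the convention of minimizing the Frobenius norm of `Omega @ A - C` over orthogonal `Omega`; the matrix names, sizes, and the synthetic rotation `Q_true` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 4, 30  # hypothetical sizes

A = rng.standard_normal((d, m))
# A random orthogonal matrix from a QR factorization, used to generate the target
Q_true, _ = np.linalg.qr(rng.standard_normal((d, d)))
C = Q_true @ A + 0.01 * rng.standard_normal((d, m))

# Closed-form solution: SVD of C A^T, then drop the singular values
U, _, Vt = np.linalg.svd(C @ A.T)
Omega = U @ Vt

loss_opt = np.linalg.norm(Omega @ A - C, "fro")
loss_true = np.linalg.norm(Q_true @ A - C, "fro")
```

Because `Q_true` is itself orthogonal, it is a feasible point, so `loss_opt` should never exceed `loss_true`; with small noise the two matrices should also be close to each other.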