202006071804

Matrix Factorization to Linear Model

tags: [ proj:penalty ]

Two to One

One of the reasons we never tried going to depth-1 for the [[project-penalty]] was that it felt like it would be an impossible task. In particular, one of the advantages of depth-2 is that you are explicitly finding a decomposition of the matrix into two matrices. It therefore felt like you needed at least depth-2 to get something interesting, since that structure would be forced.

If you have a depth-1 matrix, then you're not assuming that it can be decomposed, and perhaps it actually can't be decomposed into anything nice. Thankfully, matrix completion really only cares about predicting the unobserved entries, which is ultimately the same task under either parameterization.

And of course, it turns out that we don't actually need that structure, and, in fact, it might be a hindrance to the success of the algorithm. This definitely feels counter-intuitive at face value. We know that there is some structure, so it really should benefit us to enforce it; otherwise the algorithm doesn't know what it should be looking for. Some possible reasons:

  • if you force this structure, then you're also forcing the path of solutions to be decomposable in the same way, which might be sub-optimal.
  • I don’t know. This feels mysterious. Should be added to [[next-steps-for-penalty-paper]].
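To make the contrast concrete, here is a minimal sketch of the two parameterizations on a toy completion problem. Everything below (sizes, step size, step counts, the squared loss on observed entries) is an illustrative assumption, not a value from the project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy matrix-completion setup: a low-rank ground truth with roughly
# half the entries observed. All constants here are illustrative.
n, r = 20, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n)) / np.sqrt(n)
mask = rng.random((n, n)) < 0.5  # True where an entry is observed

def observed_loss(X):
    # Squared error on the observed entries only.
    return 0.5 * np.sum((mask * (X - M)) ** 2)

lr = 0.01

# Depth-2: explicitly parameterize X = U @ V, so every iterate along
# the gradient trajectory is forced to be decomposable.
U = 0.1 * rng.standard_normal((n, r))
V = 0.1 * rng.standard_normal((r, n))
for _ in range(2000):
    R = mask * (U @ V - M)  # residual on observed entries
    U, V = U - lr * R @ V.T, V - lr * U.T @ R

# Depth-1: optimize the matrix X directly; no decomposition assumed.
X = np.zeros((n, n))
for _ in range(2000):
    X -= lr * mask * (X - M)
```

Both versions drive the observed-entry loss down, but the depth-2 iterates only ever move through matrices of the form U @ V, which is exactly the trajectory constraint the first bullet is pointing at.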

Update: following on from [[experimental-logs-for-matrix-completion]], we have looked at how the nuclear norm fails in the depth-1 case, which is that it dampens the top singular values and reconstructs the residuals from the small singular vectors (feels ever so slightly like [[benign-overfitting-in-linear-regression]]).
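The dampening can be seen directly in the nuclear-norm proximal step, which soft-thresholds the singular values: every singular value is shrunk by the same amount, so the top ones survive but shrunken, and the reconstruction's remaining mass shifts relatively toward the small singular directions. A minimal sketch (the matrix and threshold are arbitrary illustrative choices):

```python
import numpy as np

def svd_soft_threshold(X, tau):
    # Proximal operator of tau * ||.||_nuclear: subtract tau from each
    # singular value, clipping at zero.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
Y = svd_soft_threshold(X, 1.0)

s_before = np.linalg.svd(X, compute_uv=False)
s_after = np.linalg.svd(Y, compute_uv=False)
# s_after == max(s_before - 1, 0): a uniform shrink, which hits the
# top singular values in absolute terms on every proximal step.
```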

Don't forget that via the SVD you can always decompose any matrix into two matrices, so the ranges of the two models are the same (excluding the cases where one of the matrices is constrained to be the identity). So really what the factorization is doing is changing the gradient trajectory.
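A quick sanity check of that point: splitting the singular values evenly between the two SVD factors gives one such decomposition (the even split is just one of many valid choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 7))

# Any matrix factors as X = A @ B via its SVD; here each factor
# absorbs a square root of the singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
A = U * np.sqrt(s)             # scales the columns of U
B = np.sqrt(s)[:, None] * Vt   # scales the rows of Vt

# A @ B == U @ diag(s) @ Vt == X, so the depth-2 model's range
# covers every matrix the depth-1 model can represent.
```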