202005141051

Matrix Completion Optimal Rates

tags: [ lit_review ]

Literature Review

Candes and Plan (2009; "Matrix Completion with Noise," Proceedings of the IEEE 98(6): 925–36):

RMS: \(\dfrac{\norm{\hat{M} - M}_{F}}{N}\)

Oracle rate: \[ \begin{equation} \norm{ M^{*} - M }_{F} \approx p^{-1/2} \delta, \tag{1} \end{equation} \] where \(\delta = \norm{Z}_{F}\), \(Z\) is the added noise matrix, and \(p = \frac{m}{n^2}\) is the fraction of observed entries.^[Something weird: this obviously breaks down when \(\delta = 0\); or, if \(\delta\) has to be positive, then just take the limit.] This is for arbitrary noise. When it's white (with s.d. \(\sigma\)), then \[ \norm{ M^{*} - M }_{F} \approx \sqrt{\frac{nr}{p}} \sigma, \] where \(r\) is the rank of the low-rank matrix. To translate this into \(\delta\), we have \(\E{\delta} = \E{\norm{Z}_{F}} \approx n \sigma\), and so the RHS is \(\sqrt{\frac{r}{np}} \delta\). Comparing this to (1), this is much sharper since \(r\) is very small.
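The \(\E{\norm{Z}_{F}} \approx n \sigma\) step above is easy to sanity-check numerically (my addition, not from the paper): \(\norm{Z}_{F}^2\) is a sum of \(n^2\) terms each with mean \(\sigma^2\), so \(\norm{Z}_{F}\) concentrates around \(n\sigma\).

```python
import numpy as np

# Sanity check: for an n x n matrix Z with i.i.d. N(0, sigma^2) entries,
# ||Z||_F^2 has mean n^2 * sigma^2, so ||Z||_F is close to n * sigma.
rng = np.random.default_rng(0)
n, sigma = 500, 2.0
Z = sigma * rng.standard_normal((n, n))
fro = np.linalg.norm(Z, "fro")
ratio = fro / (n * sigma)  # should be very close to 1
```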

Their method (which uses some form of SDP) seems to get something like a factor of \(\sqrt{n}\) off the oracle rate.

Negahban and Wainwright (2012; "Restricted strong convexity and weighted matrix completion: Optimal bounds with noise," Journal of Machine Learning Research 13: 1665–97):

error is scaled here by \(\frac{\eta}{d_r d_c} = \frac{\sigma}{n^2}\), so that it’s independent of the ambient dimension.

they actually use our ratio, not to regularize, but to “provide a tractable measure of how close \(\Omega\) is to a low-rank matrix”.

Hastie et al. (2015; "Matrix completion and low-rank SVD via fast alternating least squares"): this is SoftImpute.

they compare with (Cai, Candes, and Shen 2010), which essentially solves \[ \begin{equation} \min \norm{Z}_{*} \text{ s.t. } P_{\Sigma}(Z) = P_{\Sigma}(X), \tag{2} \end{equation} \] versus the noisy relaxed version in this paper: \[ \begin{align*} \min \norm{Z}_{*} \text{ s.t. } \norm{P_{\Sigma}(Z) - P_{\Sigma}(X)}_{F} < \delta. \end{align*} \]
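A minimal SoftImpute-style sketch of how the relaxed problem gets solved in practice: iteratively fill in the missing entries with the current estimate, then soft-threshold the singular values. The regularization weight `lam` and iteration count here are illustrative choices of mine, not values from the paper.

```python
import numpy as np

def soft_impute(X, mask, lam=0.1, n_iters=200):
    """X: data matrix; mask: boolean array, True where observed."""
    Z = np.zeros_like(X)
    for _ in range(n_iters):
        # P_Sigma(X) + P_Sigma^perp(Z): observed data plus current fill-in
        filled = np.where(mask, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)  # soft-threshold the singular values
        Z = (U * s) @ Vt
    return Z

# toy usage: rank-1 matrix with roughly half the entries observed
rng = np.random.default_rng(1)
M = np.outer(rng.standard_normal(20), rng.standard_normal(20))
mask = rng.random(M.shape) < 0.5
M_hat = soft_impute(M, mask)
err = np.linalg.norm(M_hat - M, "fro") / np.linalg.norm(M, "fro")
```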

Cai, Candes, and Shen (2010; "A Singular Value Thresholding Algorithm for Matrix Completion," SIAM Journal on Optimization 20(4): 1956–82):

  • they propose a first-order method for solving matrix completion that’s very fast and uses little memory
  • as it’s iterative, they have a convergence proof
  • they also show that the iterates have non-decreasing rank
  • this is mainly an engineering task, in that they care about the ability to solve large-scale matrix completion problems (like the Netflix challenge)
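The SVT iteration itself is short enough to sketch: a soft-thresholded SVD of a dual variable \(Y\), followed by a gradient step on the observed residuals. The threshold `tau`, step size, and problem sizes below are my illustrative choices (the paper suggests values like \(\tau = 5n\)), not a faithful reproduction of their experiments.

```python
import numpy as np

def svt(M, mask, tau, step, n_iters=1000):
    """M: data; mask: 1.0 where observed, 0.0 elsewhere."""
    Y = np.zeros_like(M)
    X = np.zeros_like(M)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt  # D_tau(Y): shrink singular values
        Y = Y + step * mask * (M - X)            # ascent step on observed residuals
    return X

# toy usage: rank-2 matrix, ~60% of entries observed
rng = np.random.default_rng(2)
n, r = 30, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = (rng.random((n, n)) < 0.6).astype(float)
X_hat = svt(M, mask, tau=5 * n, step=1.5)
rel_err = np.linalg.norm(X_hat - M, "fro") / np.linalg.norm(M, "fro")
```

Note the memory claim in the paper: the iterates are low rank, so only a truncated SVD ever needs to be stored; the dense SVD here is just for brevity.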

Recht (2011; "A Simpler Approach to Matrix Completion," Journal of Machine Learning Research 12: 3413–30):

they show optimality of the convex program (minimizing the nuclear norm, a la (2)), provided \[m > 32 \mu_1 \cdot r n \beta \log^2(n),\] for some \(\beta > 1\) and \(\mu_1\) is something like an upper bound on the entries of singular vectors. they’re not interested in noise, just recovery.
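To get a feel for the bound, here is the arithmetic with example values plugged in (the choices of \(n\), \(r\), \(\mu_1\), and \(\beta\) below are mine, purely for illustration): at \(n = 10^6\), \(r = 5\), the required \(m\) is only a few percent of all \(n^2\) entries.

```python
import math

# Recht's sample-size condition: m > 32 * mu_1 * r * n * beta * log^2(n).
# mu_1 = 1 and beta = 1.1 are illustrative assumptions, not paper values.
def recht_bound(n, r, mu1=1.0, beta=1.1):
    return 32 * mu1 * r * n * beta * math.log(n) ** 2

n, r = 10**6, 5
m_needed = recht_bound(n, r)
frac = m_needed / n**2  # fraction of all n^2 entries that must be observed
```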

Matrix Factorization

I guess more generally, if you don't have a square matrix, then the SVD is no longer as regular. But that doesn't seem to be that interesting a generalization. In any case, iterative methods are often employed, but they suffer from the fact that oftentimes the solution is biased towards one of the optimization variables (i.e. \(U\) vs. \(V\)).
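A minimal alternating-least-squares sketch for a non-square factorization \(X \approx U V^{T}\) (dimensions, rank, and ridge weight below are illustrative). Regularizing \(U\) and \(V\) symmetrically is one common way to keep the iteration from favoring either factor:

```python
import numpy as np

def als_factorize(X, r, lam=1e-2, n_iters=50):
    """Alternating ridge-regularized least squares for X ~ U @ V.T."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    for _ in range(n_iters):
        # update U with V fixed: U = X V (V^T V + lam I)^{-1}
        U = X @ V @ np.linalg.inv(V.T @ V + lam * np.eye(r))
        # update V with U fixed: V = X^T U (U^T U + lam I)^{-1}
        V = X.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(r))
    return U, V

# toy usage: exactly rank-3, 40 x 25 matrix
rng = np.random.default_rng(3)
X = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 25))
U, V = als_factorize(X, r=3)
rel_err = np.linalg.norm(U @ V.T - X, "fro") / np.linalg.norm(X, "fro")
```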

useful links: