202005120105
Extending the Penalty
tags: [ proj:penalty ]
Question: how to extend the \(L_1/L_2\) ratio penalty to more general settings?
- currently, in the case of matrix completion, the penalty is straightforward, since we have a product matrix already as our final goal [[the-form-of-the-loss-in-matrix-completion]].
- in more general (or typical) settings, you have the weight matrices (plus bias vectors),
- currently, you can add penalties to the weight matrices themselves, but this seems a little crude
- the parallel for our matrix completion problem would be if we did it for the individual matrices, not the full product matrix (which doesn’t seem that great)
- let’s take a step back here:
- I feel like you can ask the same question about (Arora et al. 2019, “Implicit Regularization in Deep Matrix Factorization,” http://arxiv.org/abs/1905.13655v3). that is, they want to say something about how depth plays this interesting role where it dampens the singular values, i.e., makes the spectrum sparser
- but if you were to generalize this, then it’s unclear what singular values you’re talking about. in fact, this is probably why they don’t actually generalize. a worry I have is that all this work using matrix completion as the test-bed might not be applicable anywhere else.
- and whatever singular values you want to talk about, well then there’s a matrix, and voilà, we can simply add the regularizer to that matrix, and the equivalence follows
- this is similar to the problem we had when we wanted a trivial extension of the matrix completion problem to allow for deep non-linear networks as opposed to DLNNs (deep linear neural networks).
- there, we simply added back the non-linear transforms between the weights.
- we found that it didn’t really do much in terms of test performance
- seems like, because the problem is linear to begin with, introducing non-linearities doesn’t really give you any gains.
- very recently, a group (Yang, Wen, and Li 2019, “DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures,” http://arxiv.org/abs/1908.09979v2) used a variant of our penalty to sparsify matrices. it’s not quite what we’re interested in – they basically show that using \(L_1/L_2\) rather than a plain \(L_1\) penalty on the weights does better at sparsifying.
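To make the contrast above concrete, here is a minimal numpy sketch of the two options being compared: penalizing the singular values of the end-to-end product matrix versus penalizing each weight matrix individually (the “crude” version). This assumes the penalty is the \(L_1/L_2\) ratio of the singular values (nuclear norm over Frobenius norm); the function names are mine, not from the papers.

```python
import numpy as np

def l1_l2_ratio(s):
    # scale-invariant sparsity measure: ||s||_1 / ||s||_2
    return np.sum(np.abs(s)) / np.linalg.norm(s)

def product_penalty(weights):
    # penalty on the singular values of the full product matrix
    # W = W_L ... W_2 W_1, as in the matrix completion setting
    W = weights[0]
    for Wi in weights[1:]:
        W = Wi @ W
    return l1_l2_ratio(np.linalg.svd(W, compute_uv=False))

def per_layer_penalty(weights):
    # the crude alternative: sum of penalties on individual weight matrices
    return sum(l1_l2_ratio(np.linalg.svd(Wi, compute_uv=False))
               for Wi in weights)

# e.g. a depth-3 linear network with 5x5 weights
rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 5)) for _ in range(3)]
product_penalty(weights), per_layer_penalty(weights)
```

Note the two penalties generally disagree: the per-layer sum says nothing direct about the spectrum of the product, which is exactly why penalizing the individual matrices seems like a poor parallel to the matrix completion case.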
Backlinks
- [[next-steps-for-penalty-paper]]
- See [[extending-the-penalty]].