202005211141
On the Optimization of Deep Networks
tags: [ src:paper ]
src: (Arora, Cohen, and Hazan 2018) Arora, Sanjeev, Nadav Cohen, and Elad Hazan. 2018. "On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization." In 35th International Conference on Machine Learning (ICML 2018), 372–89.
- question: what role does depth play in deep learning?
- idea: depth is an accelerator
- somehow incorporates momentum
- how to check, though. depth plays two roles:
- better representation power
- overparameterization (changing the landscape, ish)
- and not just via width, but via depth
- solution: consider linear neural networks, so that expressiveness is fixed^[This is a cute justification for linear neural networks.]
- results:
- show depth is equivalent to shallow (1-layer) + preconditioner
- that essentially combines momentum with adaptive learning rates
- and that this acceleration cannot be replicated via gradient descent + regularizer
- though it does not rule out that some other scheme could replicate the acceleration
- this proof felt a little spurious [[troubling-trends-in-machine-learning-scholarship]]
- subtleties:
- this is for generic loss function (convex?), no longer just about matrix completion (I worked backwards)
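- the equivalence, if I'm reading the main theorem right: for a depth-$N$ linear network with end-to-end matrix $W_e = W_N \cdots W_1$, gradient descent on the individual layers (small, balanced init) induces on $W_e$ the preconditioned update $$W_e \leftarrow W_e - \eta \sum_{j=1}^{N} \left[W_e W_e^\top\right]^{\frac{j-1}{N}} \, \nabla L(W_e) \, \left[W_e^\top W_e\right]^{\frac{N-j}{N}}$$
- the fractional powers promote movement along directions already taken by the optimization, which is where the momentum flavor comes from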
offshoot: [[overparameterized-regression]]
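a minimal sketch (my own toy, not from the paper) of the fixed-expressiveness point: a depth-2 linear network computes exactly the same function class as a single matrix, but gradient descent on the factors follows different dynamics; names and hyperparameters are arbitrary

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: y = X @ w_true, no noise
n, d, h = 100, 5, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=(d, 1))
y = X @ w_true

def grad(W_e):
    # dL/dW_e for L(W_e) = 1/(2n) * ||X @ W_e - y||^2
    return X.T @ (X @ W_e - y) / n

lr, steps = 0.1, 2000

# Shallow: gradient descent directly on one weight matrix
W = np.zeros((d, 1))
for _ in range(steps):
    W -= lr * grad(W)

# Depth-2 linear net: same function class, end-to-end matrix W_e = W1 @ W2
W1 = 0.1 * rng.normal(size=(d, h))  # small init, roughly "balanced"
W2 = 0.1 * rng.normal(size=(h, 1))
for _ in range(steps):
    g = grad(W1 @ W2)  # gradient w.r.t. the end-to-end matrix
    # chain rule through the product (simultaneous update of both factors)
    W1, W2 = W1 - lr * g @ W2.T, W2 - lr * W1.T @ g
```

both models can only represent linear maps, so any difference in the trajectories is down to the parameterization, not expressiveness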