202005211141
On the Optimization of Deep Networks
tags: [ src:paper ]
src: (Arora, Cohen, and Hazan 2018) Arora, Sanjeev, Nadav Cohen, and Elad Hazan. 2018. "On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization." In 35th International Conference on Machine Learning (ICML 2018), 372–89.
- question: what role does depth play in deep learning?
- idea: depth is an accelerator
- somehow incorporates momentum
- how to check, though. depth plays two roles:
- better representation power
- overparameterization (changing the landscape, ish)
- and not just via width, but via depth
- solution: consider linear neural networks, so that expressiveness is fixed^[This is a cute justification for linear neural networks.]
- results:
- show depth is equivalent to shallow (1-layer) + preconditioner
- that essentially combines momentum with adaptive learning rates
- and that this acceleration cannot be replicated via gradient descent + regularizer
- though it does not rule out that some other scheme could replicate the acceleration
- this proof felt a little spurious [[troubling-trends-in-machine-learning-scholarship]]
- subtleties:
- this is for generic loss function (convex?), no longer just about matrix completion (I worked backwards)
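- the equivalence, if I'm reading the main theorem right: for a depth-$N$ linear network with end-to-end matrix $W_e = W_N \cdots W_1$, gradient descent on the individual layers (small, balanced init) induces on $W_e$ the preconditioned update $$W_e \leftarrow W_e - \eta \sum_{j=1}^{N} \left[W_e W_e^\top\right]^{\frac{j-1}{N}} \, \nabla L(W_e) \, \left[W_e^\top W_e\right]^{\frac{N-j}{N}}$$
- the fractional powers promote movement along directions already taken by the optimization, which is where the momentum flavor comes from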
offshoot: [[overparameterized-regression]]
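a minimal sketch (my own toy, not from the paper) of the fixed-expressiveness point: a depth-2 linear network computes exactly the same function class as a single matrix, but gradient descent on the factors follows different dynamics; names and hyperparameters are arbitrary

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: y = X @ w_true, no noise
n, d, h = 100, 5, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=(d, 1))
y = X @ w_true

def grad(W_e):
    # dL/dW_e for L(W_e) = 1/(2n) * ||X @ W_e - y||^2
    return X.T @ (X @ W_e - y) / n

lr, steps = 0.1, 2000

# Shallow: gradient descent directly on one weight matrix
W = np.zeros((d, 1))
for _ in range(steps):
    W -= lr * grad(W)

# Depth-2 linear net: same function class, end-to-end matrix W_e = W1 @ W2
W1 = 0.1 * rng.normal(size=(d, h))  # small init, roughly "balanced"
W2 = 0.1 * rng.normal(size=(h, 1))
for _ in range(steps):
    g = grad(W1 @ W2)  # gradient w.r.t. the end-to-end matrix
    # chain rule through the product (simultaneous update of both factors)
    W1, W2 = W1 - lr * g @ W2.T, W2 - lr * W1.T @ g
```

both models can only represent linear maps, so any difference in the trajectories is down to the parameterization, not expressiveness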