Foundations of Scalable Nonconvex Optimization
Technical Report, 18 Apr 2018 to 11 Oct 2019
Massachusetts Institute of Technology, Cambridge, United States
This research program focused on creating a new paradigm for scalable nonconvex optimization, carried out by four principal investigators with backgrounds in optimization, machine learning, and statistics. The project developed new theories for understanding acceleration phenomena in convex and nonconvex optimization, in both Euclidean and non-Euclidean spaces, with applications to training deep neural networks. It also produced new theories on the limitations and expressivity of neural networks, along with the first complete characterization of the optimization landscape of deep linear neural networks, supporting the result that local minima are global. Adding even minimal nonlinearities changes this picture, introducing local optima whose performance is worse than that of linear classifiers. A first set of complete results on Bayesian optimization was developed for settings in which not only gradients and higher-order derivatives but even function evaluations are expensive. Furthermore, the expressive power of graph neural networks, which have become a popular framework for modeling large-scale data with graph structure, was rigorously characterized. Finally, an analysis of overfitting and generalization showed that the standard view in machine learning and statistics, that interpolation leads to overfitting, is not quite accurate in high dimensions, offering an explanation for the unreasonable effectiveness of over-parameterized neural networks.
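The acceleration phenomenon mentioned above can be illustrated with a standard textbook comparison; the following sketch (not the project's own method) contrasts plain gradient descent with Nesterov's accelerated gradient on an ill-conditioned quadratic, where the problem dimensions, step sizes, and momentum parameter are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: Nesterov's accelerated gradient vs. plain gradient
# descent on f(x) = 0.5 x^T A x with condition number kappa = 100.
# All constants here are assumptions chosen for demonstration.
d = 50
eigs = np.linspace(1.0, 100.0, d)          # eigenvalues of A
A = np.diag(eigs)
L, mu = eigs.max(), eigs.min()             # smoothness and strong convexity

def grad(x):
    return A @ x

def f(x):
    return 0.5 * x @ A @ x

x_gd = np.ones(d)                          # plain gradient descent iterate
x_ag = np.ones(d)                          # accelerated iterate
y = np.ones(d)                             # extrapolation point
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))

for _ in range(200):
    x_gd = x_gd - (1.0 / L) * grad(x_gd)   # plain step at rate O(1 - 1/kappa)
    x_next = y - (1.0 / L) * grad(y)       # gradient step at the lookahead point
    y = x_next + beta * (x_next - x_ag)    # momentum extrapolation
    x_ag = x_next

# The accelerated iterate reaches a far lower function value in the same
# number of iterations, reflecting the O(1 - 1/sqrt(kappa)) rate.
```

The design point is the extrapolation step: the gradient is evaluated at `y`, a point pushed ahead of the current iterate, which is what yields the square-root dependence on the condition number.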
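The landscape result for deep linear networks can be sanity-checked numerically. The sketch below, a minimal assumption-laden demo rather than the project's analysis, runs gradient descent on a two-layer linear model `W2 @ W1` and observes that it reaches the global least-squares optimum (here, zero loss on noiseless linear data); the dimensions, learning rate, and initialization scale are hypothetical choices.

```python
import numpy as np

# Illustrative sketch: gradient descent on a two-layer *linear* network
# reaches the global optimum, consistent with "local minima are global"
# for deep linear networks. Sizes and hyperparameters are assumptions.
rng = np.random.default_rng(1)
n, d, h = 200, 10, 5                  # samples, input dim, hidden width
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true                        # noiseless linear targets

W1 = 0.1 * rng.standard_normal((h, d))
W2 = 0.1 * rng.standard_normal((1, h))
lr = 0.01
for _ in range(5000):
    r = X @ (W2 @ W1).T - y[:, None]          # residuals, shape (n, 1)
    gW2 = (r.T @ (X @ W1.T)) / n              # gradient w.r.t. W2
    gW1 = (W2.T @ r.T @ X) / n                # gradient w.r.t. W1
    W1 -= lr * gW1
    W2 -= lr * gW2

# Final mean-squared training loss; the global optimum here is 0.
loss = float(np.mean((X @ (W2 @ W1).T - y[:, None]) ** 2))
```

Note the loss is nonconvex in `(W1, W2)` jointly, yet plain gradient descent from a small random initialization still finds the global optimum, which is the behavior the landscape characterization explains.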
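The limit on graph neural network expressivity is commonly explained through the 1-Weisfeiler-Leman color-refinement test, which message-passing GNNs cannot exceed. As a hedged illustration (the `wl_histogram` helper is hypothetical, not from the report), the sketch below shows that color refinement, and hence any standard message-passing GNN, cannot distinguish a 6-cycle from two disjoint triangles, since both graphs are 2-regular.

```python
from collections import Counter

def wl_histogram(adj, rounds=5):
    """Iterated 1-WL color refinement on an adjacency-list graph.

    Each round, a node's new color hashes its old color together with the
    multiset of its neighbors' colors; returns the final color histogram.
    """
    colors = {v: 0 for v in adj}  # uniform initial coloring
    for _ in range(rounds):
        colors = {v: hash((colors[v],
                           tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return Counter(colors.values())

# A 6-cycle and two disjoint triangles: different graphs, both 2-regular.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

same = wl_histogram(cycle6) == wl_histogram(two_triangles)  # True
```

Because every node in both graphs sees an identical neighborhood structure at every refinement round, the two color histograms coincide, so any GNN whose power is bounded by 1-WL must assign the two graphs the same representation.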
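The claim that interpolation need not cause overfitting in high dimensions can be glimpsed in a simple overparameterized linear model. In this sketch (an illustrative assumption-based demo, not the report's analysis), the minimum-norm solution fits noisy labels exactly, yet the absorbed noise barely changes the test error relative to fitting clean labels.

```python
import numpy as np

# Illustrative sketch of harmless interpolation: with d >> n, the
# minimum-l2-norm interpolator fits label noise exactly while the noise
# contributes only about sigma^2 * n / d to the test error.
# Dimensions and noise level are assumptions chosen for demonstration.
rng = np.random.default_rng(2)
n, d = 50, 2000                                 # many more parameters than samples
w_true = rng.standard_normal(d) / np.sqrt(d)    # unit-scale ground truth
X = rng.standard_normal((n, d))
y_clean = X @ w_true
y_noisy = y_clean + 0.5 * rng.standard_normal(n)

# Minimum-norm interpolating solutions (pinv gives min-l2-norm least squares).
w_clean = np.linalg.pinv(X) @ y_clean
w_noisy = np.linalg.pinv(X) @ y_noisy

X_test = rng.standard_normal((2000, d))
def err(w):
    return float(np.mean((X_test @ w - X_test @ w_true) ** 2))

train_fit = float(np.mean((X @ w_noisy - y_noisy) ** 2))  # ~0: exact interpolation
gap = abs(err(w_noisy) - err(w_clean))                    # small despite fitting noise
```

The key geometric fact is that the noise is absorbed along directions spanned by only `n` of the `d` dimensions, so its effect on predictions at fresh inputs is diluted by the factor `n / d`.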