Fast, Distributed Algorithms in Deep Networks
Naval Academy, Annapolis, MD, United States
In this project we demonstrate two approaches to speeding up the training of neural networks. First, even before training begins, we demonstrate an informed way of initializing parameters closer to their final, trained values. Second, we introduce a new training algorithm that scales linearly when parallelized, allowing substantially decreased training times on large datasets. Neural networks are famously unintuitive, so parameters are typically assigned randomly and then adjusted during training. However, by using a cosine activation function, a layer of neurons can be made to approximate the implicit feature space of a kernel. Intuition about kernel selection can therefore guide initial parameter assignments before any data are observed. We implement this approach and show that it can greatly speed up training, often approaching the final accuracy after only one training iteration. Our second contribution is the application of the ADMM algorithm to neural networks. Conventional gradient-based optimization methods for neural networks scale poorly, a limitation that is hard to avoid on extremely large datasets. The proposed method avoids many of the conditions that typically make gradient-based methods slow, allowing efficient computation without specialized hardware. Our implementation demonstrates strong scalability, with linear speedups even up to thousands of cores. We show that for large problems, our approach can converge faster than GPU-based implementations of standard algorithms.
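The abstract does not include code, but the cosine-activation initialization it describes matches the standard random-Fourier-features construction: sampling a layer's weights from a kernel-dependent distribution makes the inner product of cosine-layer outputs approximate that kernel. The sketch below illustrates the idea for the Gaussian (RBF) kernel; the function names, feature count, and bandwidth parameter are illustrative assumptions, not details from the report.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(x, y, gamma=0.5):
    # Exact Gaussian (RBF) kernel value for comparison.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def random_cosine_layer(dim, n_features, gamma=0.5):
    # Sample weights and phases so that the layer's cosine activations
    # span an approximation of the RBF kernel's implicit feature space.
    # Weight scale sqrt(2*gamma) matches the kernel's spectral density.
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(n_features, dim))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)

    def features(x):
        # One hidden layer with cosine activation, scaled so that
        # features(x) @ features(y) estimates rbf_kernel(x, y).
        return np.sqrt(2.0 / n_features) * np.cos(W @ x + b)

    return features

phi = random_cosine_layer(dim=5, n_features=20000)
x, y = rng.normal(size=5), rng.normal(size=5)
approx = phi(x) @ phi(y)   # kernel value implied by the initialized layer
exact = rbf_kernel(x, y)
```

Because the layer already encodes a chosen kernel at initialization, training starts near a sensible solution rather than from arbitrary random weights, which is consistent with the report's observation that accuracy approaches its final value after a single iteration.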