Deep Learning Seminar / 30 January 2020
Training Structured Neural Networks Under Regularization and Constraints
[available in English only]
Through example applications in neural network pruning, network architecture search and binary neural networks, I will demonstrate the importance and benefit of incorporating regularization (such as the L1-norm) and constraints (such as interval constraints) into the neural network modelling and training process. This is formulated as a constrained nonsmooth nonconvex optimization problem, and we propose a convergent proximal-type stochastic gradient descent (Prox-SGD) algorithm.
We show that, under properly selected learning rates, the momentum eventually resembles the unknown true gradient and is therefore crucial to the convergence analysis. We establish that, with probability 1, every limit point of the sequence generated by the proposed Prox-SGD is a stationary point. Prox-SGD is then tailored to the example applications above, and the theoretical analysis is supported by extensive numerical tests.
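To make the flavour of such a method concrete, the following is a minimal, illustrative sketch of a proximal stochastic gradient loop with momentum, combining an L1 proximal step (soft-thresholding) with an interval constraint enforced by projection. All names, step-size schedules and constants here are assumptions for the demo, not the exact algorithm or parameters from the talk.

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||x||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_sgd_l1(grad_fn, x0, lam=0.5, steps=200, lo=-1.0, hi=1.0):
    """Illustrative proximal SGD with momentum for
    min f(x) + lam * ||x||_1  subject to  lo <= x <= hi.
    Schedules and structure are assumptions for this sketch."""
    x = x0.copy()
    m = np.zeros_like(x)
    for t in range(1, steps + 1):
        lr = 1.0 / t            # diminishing learning rate (assumption)
        beta = 1.0 / t ** 0.75  # diminishing momentum weight (assumption)
        g = grad_fn(x)          # (stochastic) gradient oracle
        # Momentum as a running average that tracks the true gradient.
        m = (1.0 - beta) * m + beta * g
        # Proximal step for the L1 term, then projection onto the interval.
        x = soft_threshold(x - lr * m, lr * lam)
        x = np.clip(x, lo, hi)
    return x

# Demo on a simple quadratic with a deterministic gradient:
# f(x) = 0.5 * ||x - b||^2, whose regularized minimizer is
# the soft-thresholding of b.
b = np.array([1.0, 0.05])
x = prox_sgd_l1(lambda x: x - b, np.zeros(2))
```

The L1 proximal step drives small coordinates exactly to zero (here the second coordinate of `b`), which is why this family of methods is natural for pruning and other sparsity-inducing applications mentioned in the abstract.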