On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective
Authors: Xiaowu Dai, Yuhua Zhu
Summary: We study the statistical properties of the dynamic trajectory of stochastic gradient descent (SGD). We approximate mini-batch SGD and momentum SGD as stochastic differential equations (SDEs). We exploit the continuous formulation of the SDEs and the theory of Fokker-Planck equations to develop new results on the escaping phenomenon and the relationship between large batch sizes and sharp minima. Specifically, we find that the stochastic process of the SGD solution tends to converge to flatter minima regardless of the batch size in the asymptotic regime. However, the convergence rate is rigorously proven to depend on the batch size. These results are validated empirically with various datasets and models.