Why Momentum Works (2017)
Published on: 2025-08-05 14:01:10
Why Momentum Really Works
[Interactive figure: step-size α = 0.02, momentum β = 0.99]
We often think of Momentum as a means of dampening oscillations and speeding up the iterations, leading to faster convergence. But it has other interesting behavior. It allows a larger range of step-sizes to be used, and
...
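The following minimal Python sketch illustrates the claim that momentum allows a larger range of step-sizes. It runs on a one-dimensional quadratic f(w) = ½λw².

It assumes the momentum update z ← βz + ∇f(w), w ← w − αz. Setting β = 0 recovers plain gradient descent. It also uses the standard stability condition for this quadratic: 0 < αλ < 2 + 2β for 0 ≤ β < 1. The function names `momentum` and `max_stable_step` are hypothetical, chosen for illustration only.

```python
def momentum(lam, alpha, beta, w0=1.0, steps=200):
    """Run momentum on f(w) = 0.5 * lam * w**2 and return the final iterate.

    Assumed update (beta = 0 gives plain gradient descent):
        z <- beta * z + grad f(w)
        w <- w - alpha * z
    """
    w, z = w0, 0.0
    for _ in range(steps):
        grad = lam * w        # gradient of 0.5 * lam * w**2
        z = beta * z + grad   # decaying sum of past gradients
        w = w - alpha * z     # step along the accumulated direction
    return w


def max_stable_step(lam, beta):
    """Upper bound on alpha for convergence on this quadratic (0 <= beta < 1).

    Momentum converges iff 0 < alpha * lam < 2 + 2 * beta, so the bound
    grows from 2 / lam (plain gradient descent) toward 4 / lam as beta -> 1.
    """
    return (2.0 + 2.0 * beta) / lam


lam, alpha = 1.0, 2.5  # alpha * lam = 2.5 exceeds the plain-GD limit of 2
print("GD step limit:      ", max_stable_step(lam, 0.0))
print("Momentum step limit:", max_stable_step(lam, 0.9))
print("|w| after GD:       ", abs(momentum(lam, alpha, beta=0.0)))  # grows like 1.5**k
print("|w| after momentum: ", abs(momentum(lam, alpha, beta=0.9)))  # shrinks toward 0
```

With β = 0.9 the same step-size that makes plain gradient descent diverge still converges, although the iterates oscillate. A step just past the momentum limit (for example α = 3.9 with λ = 1 and β = 0.9) diverges again.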