Mengxiang Zhang, Shengjie Li, Inertial proximal stochastic gradient method with adaptive sampling for non-convex and non-smooth problems, Engineering Applications of Artificial Intelligence, Volume 163, Part 3, 2026, https://doi.org/10.1016/j.engappai.2025.113087.
Stochastic gradient methods with inertia have proven effective in convex optimization, yet most real-world tasks involve non-convex objectives. With the growing scale and dimensionality of modern datasets, non-convex and non-smooth regularization has become essential for improving generalization, controlling model complexity, and mitigating overfitting. Although widely applied in logistic regression, sparse recovery, medical imaging, and sparse neural networks, such formulations remain challenging because exact gradients are expensive, stochastic gradients are sensitive to sample size, and noise combined with non-smooth non-convexity complicates convergence. We propose a stochastic algorithm that addresses these issues by introducing an adaptive sampling strategy to balance stochastic gradient noise against efficiency, incorporating inertia for acceleration, and coupling the step size update rule to both the sample size and the inertial term. The method avoids the exact function value computations required by traditional inertial methods in non-convex and non-smooth problems, as well as the costly full-gradient evaluations or substantial memory usage typically associated with variance-reduction techniques. To our knowledge, this is the first stochastic method with adaptive sampling and inertia that guarantees convergence in non-convex and non-smooth settings, attaining an O(1/K) rate to critical points under mild variance conditions while achieving an accelerated O(1/k²) rate in convex optimization. Experiments on logistic regression and neural networks validate its efficiency and provide practical guidance for selecting sample sizes and step sizes.
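To make the three ingredients concrete (adaptive sampling, inertia, and a step size tied to the sample size), here is a minimal, hypothetical Python sketch of one possible inertial proximal stochastic gradient loop with a growing mini-batch. The soft-thresholding prox, the geometric batch growth, the step size formula, and all parameter names are illustrative assumptions, not the paper's exact update rules.

```python
import numpy as np

def prox_l1(x, lam):
    """Soft-thresholding: proximal operator of lam * ||x||_1, used here as a
    stand-in for the (possibly non-convex) non-smooth regularizer."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def inertial_prox_sgd(grad_fn, x0, n_samples, lam=0.01, K=100,
                      beta=0.9, b0=8, growth=1.1, step0=0.1):
    """Hypothetical sketch: inertial proximal stochastic gradient loop with a
    geometrically growing mini-batch (adaptive sampling) and a step size
    coupled to the current batch size. grad_fn(y, idx) is an assumed callable
    returning a stochastic gradient of the smooth part on the samples idx."""
    x_prev, x = x0.copy(), x0.copy()
    batch = b0
    for k in range(K):
        # Inertial (momentum-like) extrapolation point.
        y = x + beta * (x - x_prev)
        # Adaptive sampling: mini-batch size grows with k, trading gradient
        # noise against per-iteration cost (illustrative growth rule).
        idx = np.random.choice(n_samples, size=min(int(batch), n_samples),
                               replace=False)
        g = grad_fn(y, idx)
        # Step size coupled to the batch size (illustrative formula only).
        step = step0 / np.sqrt(1.0 + k / batch)
        # Proximal step applied at the inertial point.
        x_prev, x = x, prox_l1(y - step * g, step * lam)
        batch *= growth
    return x
```

In this sketch the only per-iteration cost is one mini-batch gradient and one prox evaluation, which reflects the abstract's point that neither exact function values nor full-gradient passes (as in variance-reduction schemes) are required.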
