Indexed by:
Abstract:
This article shows theoretically that spurious local minima are common for deep fully connected networks and for average-pooling convolutional neural networks (CNNs) with piecewise linear activations, whenever the dataset cannot be fit by a linear model. Motivating examples explain why spurious local minima exist: each output neuron of such a network produces a continuous piecewise linear (CPWL) function, and different pieces of the CPWL output can optimally fit disjoint groups of data samples when the empirical risk is minimized. Fitting the data with different CPWL functions usually yields different levels of empirical risk, which leads to the prevalence of spurious local minima. The results are proved in general settings with arbitrary continuous loss functions and general piecewise linear activations. The main proof technique is to represent a CPWL function as a maximization over minimizations of linear pieces; deep networks with piecewise linear activations are then constructed to produce these linear pieces and to implement the max-min operation.
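The max-min construction mentioned in the abstract can be illustrated concretely. Below is a minimal sketch, not taken from the paper itself: the affine pieces, index sets, and coefficients are hypothetical, chosen only for demonstration. It shows how a max over mins of affine pieces yields a CPWL function, and how a single ReLU realizes each pairwise max or min (via max(u, v) = relu(u - v) + v), which is the mechanism a piecewise-linear network can use to implement the operation.

```python
# Hedged sketch: f(x) = max_j min_{i in S_j} (a_i * x + b_i) is CPWL,
# and ReLU suffices to build the pairwise max/min operations.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pairwise_max(u, v):
    # max(u, v) expressed with one ReLU: if u > v this returns u, else v.
    return relu(u - v) + v

def pairwise_min(u, v):
    # min(u, v) = -max(-u, -v), so one ReLU suffices here as well.
    return -pairwise_max(-u, -v)

# Hypothetical affine pieces (slope a_i, intercept b_i) and index sets S_j.
pieces = [(1.0, 0.0), (-1.0, 2.0), (0.5, 0.5)]
groups = [[0, 1], [1, 2]]

def cpwl(x):
    # f(x) = max over groups of the min of that group's affine pieces.
    group_vals = []
    for S in groups:
        m = pieces[S[0]][0] * x + pieces[S[0]][1]
        for i in S[1:]:
            m = pairwise_min(m, pieces[i][0] * x + pieces[i][1])
        group_vals.append(m)
    out = group_vals[0]
    for g in group_vals[1:]:
        out = pairwise_max(out, g)
    return out

xs = np.linspace(-2.0, 3.0, 7)
print([float(cpwl(x)) for x in xs])  # continuous, piecewise linear values
```

Evaluating on a grid shows a continuous function with distinct linear pieces; different pieces can match disjoint groups of data samples, which is the intuition behind the distinct empirical-risk levels described above.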
Keyword:
Reprint Author's Address:
Email:
Source:
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
ISSN: 2162-237X
Year: 2022
10.4 (JCR@2022)
10.400 (JCR@2022)
ESI Discipline: COMPUTER SCIENCE
ESI HC Threshold: 46
JCR Journal Grade:1
CAS Journal Grade:1
Cited Count:
WoS CC Cited Count: 1
SCOPUS Cited Count: 3
ESI Highly Cited Papers on the List: 0
WanFang Cited Count:
Chinese Cited Count:
30-Day Page Views: 12
Affiliated Colleges: