Indexed by:
Abstract:
This article shows theoretically that spurious local minima are common for deep fully connected networks and for average-pooling convolutional neural networks (CNNs) with piecewise linear activations, whenever the dataset cannot be fit by a linear model. Motivating examples explain why spurious local minima exist: each output neuron of such a network produces a continuous piecewise linear (CPWL) function, and different pieces of the CPWL output can optimally fit disjoint groups of data samples when the empirical risk is minimized. Fitting the data with different CPWL functions usually yields different levels of empirical risk, which leads to the prevalence of spurious local minima. The results are proved in general settings with arbitrary continuous loss functions and general piecewise linear activations. The main proof technique is to represent a CPWL function as a maximization over minimizations of linear pieces; deep networks with piecewise linear activations are then constructed to produce these linear pieces and to implement the max-min operation.
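The max-min construction mentioned in the abstract can be illustrated concretely. Below is a minimal sketch, not taken from the paper itself: the affine pieces, index sets, and coefficients are hypothetical, chosen only for demonstration. It shows how a max over mins of affine pieces yields a CPWL function, and how a single ReLU realizes each pairwise max or min (via max(u, v) = relu(u - v) + v), which is the mechanism a piecewise-linear network can use to implement the operation.

```python
# Hedged sketch: f(x) = max_j min_{i in S_j} (a_i * x + b_i) is CPWL,
# and ReLU suffices to build the pairwise max/min operations.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pairwise_max(u, v):
    # max(u, v) expressed with one ReLU: if u > v this returns u, else v.
    return relu(u - v) + v

def pairwise_min(u, v):
    # min(u, v) = -max(-u, -v), so one ReLU suffices here as well.
    return -pairwise_max(-u, -v)

# Hypothetical affine pieces (slope a_i, intercept b_i) and index sets S_j.
pieces = [(1.0, 0.0), (-1.0, 2.0), (0.5, 0.5)]
groups = [[0, 1], [1, 2]]

def cpwl(x):
    # f(x) = max over groups of the min of that group's affine pieces.
    group_vals = []
    for S in groups:
        m = pieces[S[0]][0] * x + pieces[S[0]][1]
        for i in S[1:]:
            m = pairwise_min(m, pieces[i][0] * x + pieces[i][1])
        group_vals.append(m)
    out = group_vals[0]
    for g in group_vals[1:]:
        out = pairwise_max(out, g)
    return out

xs = np.linspace(-2.0, 3.0, 7)
print([float(cpwl(x)) for x in xs])  # continuous, piecewise linear values
```

Evaluating on a grid shows a continuous function with distinct linear pieces; different pieces can match disjoint groups of data samples, which is the intuition behind the distinct empirical-risk levels described above.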
Keyword:
Reprint Author's Address:
Email:
Source:
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
ISSN: 2162-237X
Year: 2022
10.4 (JCR@2022)
10.400 (JCR@2022)
ESI Discipline: COMPUTER SCIENCE
ESI HC Threshold: 46
JCR Journal Grade:1
CAS Journal Grade:1
Cited Count:
WoS CC Cited Count: 1
SCOPUS Cited Count: 3
ESI Highly Cited Papers on the List: 0
WanFang Cited Count:
Chinese Cited Count:
30-Day Page Views: 12
Affiliated Colleges: