THOR, Trace-Based Hardware-Driven Layer-Oriented Natural Gradient Descent Computation

Abstract：

It is well known that second-order optimizers can accelerate the training of deep neural networks; however, the huge computational cost of second-order optimization makes it impractical to apply in practice. To reduce this cost, many methods have been proposed to approximate the second-order matrix. Inspired by KFAC, we propose a novel Trace-based Hardware-driven layer-ORiented Natural Gradient Descent Computation method, called THOR, to make second-order optimization applicable to real application models. Specifically, we gradually increase the update interval and use the matrix trace to determine which blocks of the Fisher Information Matrix (FIM) need to be updated. Moreover, by exploiting the power of the hardware, we have designed a hardware-driven approximation method for computing the FIM to achieve better performance. To demonstrate the effectiveness of THOR, we have conducted extensive experiments. The results show that training ResNet-50 on ImageNet with THOR takes only 66.7 minutes to achieve a top-1 accuracy of 75.9% in an environment with 8 Ascend 910 processors and MindSpore, a new deep learning computing framework. Moreover, with more computational resources, THOR takes only 2.7 minutes to reach 75.9% with 256 Ascend 910 processors. Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
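
The trace-based update rule and the growing update interval mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the relative-trace criterion, the tolerance `tol`, and the doubling schedule are all assumptions chosen for illustration, since the abstract does not give the exact rule.

```python
import numpy as np

def trace_changed(new_block, old_block, tol=0.01):
    """Return True if the trace of a FIM block moved by more than `tol`
    (relative to the old trace). `tol` is a hypothetical threshold."""
    old_tr = float(np.trace(old_block))
    new_tr = float(np.trace(new_block))
    rel = abs(new_tr - old_tr) / max(abs(old_tr), 1e-12)
    return rel > tol

def next_interval(interval, growth=2, max_interval=64):
    """Grow the number of steps between FIM refreshes, up to a cap.
    The doubling schedule and the cap are assumptions, not from the paper."""
    return min(interval * growth, max_interval)

# Toy blocks standing in for one layer's FIM block at two training steps.
old = np.eye(3)
slightly_changed = np.eye(3) * 1.001   # trace moves by about 0.1%
strongly_changed = np.eye(3) * 1.5     # trace moves by 50%

refresh_small = trace_changed(slightly_changed, old)  # reuse the cached block
refresh_large = trace_changed(strongly_changed, old)  # recompute the block
```

Under these assumptions, a block whose trace has barely moved keeps its cached estimate, while a block whose trace has shifted noticeably is recomputed; the interval between checks grows as training proceeds.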

Keyword：

Matrix algebra; Optimization; Fisher information matrix; Gradient methods; Deep neural networks

Author Community：

[ 1 ] [Chen, Mengyun]Huawei Technologies Co. Ltd, China
[ 2 ] [Gao, Kaixin]Tianjin University, China
[ 3 ] [Liu, Xiaolei]Tianjin University, China
[ 4 ] [Wang, Zidong]Huawei Technologies Co. Ltd, China
[ 5 ] [Ni, Ningxi]Huawei Technologies Co. Ltd, China
[ 6 ] [Zhang, Qian]Beijing University of Technology, China
[ 7 ] [Chen, Lei]Hong Kong University of Science and Technology, Hong Kong
[ 8 ] [Ding, Chao]Chinese Academy of Sciences, China
[ 9 ] [Huang, Zhenghai]Tianjin University, China
[ 10 ] [Wang, Min]Huawei Technologies Co. Ltd, China
[ 11 ] [Wang, Shuangling]Huawei Technologies Co. Ltd, China
[ 12 ] [Yu, Fan]Huawei Technologies Co. Ltd, China
[ 13 ] [Zhao, Xinyuan]Beijing University of Technology, China
[ 14 ] [Xu, Dachuan]Beijing University of Technology, China

Source ：

Year： 2021

Volume： 8B

Page： 7046-7054

Language： English
