
Author:

Chen, Mengyun | Gao, Kaixin | Liu, Xiaolei | Wang, Zidong | Ni, Ningxi | Zhang, Qian | Chen, Lei | Ding, Chao | Huang, Zhenghai | Wang, Min | Wang, Shuangling | Yu, Fan | Zhao, Xinyuan | Xu, Dachuan

Indexed by:

EI

Abstract:

It is well known that second-order optimizers can accelerate the training of deep neural networks; however, the huge computational cost of second-order optimization makes it impractical in practice. To reduce this cost, many methods have been proposed to approximate the second-order matrix. Inspired by KFAC, we propose a novel Trace-based Hardware-driven layer-ORiented Natural Gradient Descent Computation method, called THOR, to make second-order optimization applicable to real application models. Specifically, we gradually increase the update interval and use the matrix trace to determine which blocks of the Fisher Information Matrix (FIM) need to be updated. Moreover, by exploiting the power of hardware, we have designed a hardware-driven approximation method for computing the FIM to achieve better performance. To demonstrate the effectiveness of THOR, we have conducted extensive experiments. The results show that training ResNet-50 on ImageNet with THOR takes only 66.7 minutes to achieve a top-1 accuracy of 75.9% on 8 Ascend 910 processors with MindSpore, a new deep learning computing framework. Moreover, with more computational resources, THOR takes only 2.7 minutes to reach 75.9% with 256 Ascend 910 processors. Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
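
As an illustration of the trace-based update criterion described in the abstract, here is a minimal Python sketch. The threshold name omega, the relative-trace-change test, and the interval schedule are assumptions made for illustration, not the authors' MindSpore implementation.

```python
# Hypothetical sketch of THOR's trace-based FIM block update rule.
# Names, thresholds, and the schedule are assumed; not the paper's code.
import numpy as np

def should_update_block(fim_new: np.ndarray, fim_old: np.ndarray,
                        omega: float = 0.01) -> bool:
    """Refresh a FIM block only when its trace has drifted noticeably.

    The abstract says THOR uses the matrix trace to decide which blocks
    of the (block-diagonal) Fisher Information Matrix to update; testing
    the relative trace change against a threshold is one plausible reading.
    """
    drift = abs(np.trace(fim_new) - np.trace(fim_old)) / abs(np.trace(fim_old))
    return drift > omega

def update_interval(step: int, base_interval: int = 100) -> int:
    # "Gradually increase the update interval": recompute FIM blocks less
    # often as training stabilizes (assumed stepwise schedule).
    return base_interval * (1 + step // 5000)
```

In this reading, blocks whose trace stays stable are skipped, so the expensive per-layer FIM recomputation and inversion happen only where the curvature is still changing.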

Keyword:

Matrix algebra; Optimization; Fisher information matrix; Gradient methods; Deep neural networks

Author Community:

  • [ 1 ] [Chen, Mengyun]Huawei Technologies Co. Ltd, China
  • [ 2 ] [Gao, Kaixin]Tianjin University, China
  • [ 3 ] [Liu, Xiaolei]Tianjin University, China
  • [ 4 ] [Wang, Zidong]Huawei Technologies Co. Ltd, China
  • [ 5 ] [Ni, Ningxi]Huawei Technologies Co. Ltd, China
  • [ 6 ] [Zhang, Qian]Beijing University of Technology, China
  • [ 7 ] [Chen, Lei]Hong Kong University of Science and Technology, Hong Kong
  • [ 8 ] [Ding, Chao]Chinese Academy of Sciences, China
  • [ 9 ] [Huang, Zhenghai]Tianjin University, China
  • [ 10 ] [Wang, Min]Huawei Technologies Co. Ltd, China
  • [ 11 ] [Wang, Shuangling]Huawei Technologies Co. Ltd, China
  • [ 12 ] [Yu, Fan]Huawei Technologies Co. Ltd, China
  • [ 13 ] [Zhao, Xinyuan]Beijing University of Technology, China
  • [ 14 ] [Xu, Dachuan]Beijing University of Technology, China

Reprint Author's Address:

Email:


Related Keywords:

Source: 35th AAAI Conference on Artificial Intelligence, AAAI 2021

Year: 2021

Volume: 8B

Page: 7046-7054

Language: English

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count:

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:


Affiliated Colleges:
