Abstract:
Convolutional neural networks involve high computational complexity and demand excessive hardware resources, which greatly increases the deployment cost of deep learning algorithms. Exploiting the redundancy of sparse activations between layers is a promising way to reduce inference latency and power consumption with low resource overhead and almost no loss of network accuracy. To address the low utilization of operation modules caused by coarse-grained control in sparse convolutional neural network accelerators, a sparsity-aware accelerator with flexible parallelism is designed on FPGA. The convolution operation modules are flexibly scheduled based on an operation-clustering idea, and the parallelism over input channels and output activations is adjusted online. In addition, a parallel propagation mode for input data is designed according to the data consistency among output activations computed in parallel. The proposed hardware architecture is implemented on a Xilinx VC709 board. It contains up to 1024 multiply-accumulate units and provides 409.6 GOP/s of peak computing power; its operation speed reaches 325.8 GOP/s on the VGG-16 model, equivalent to 794.63 GOP/s for an accelerator without sparse-activation optimization. Its performance is 4.6 times that of the baseline model. © 2022 Chinese Institute of Electronics. All rights reserved.
Source :
Acta Electronica Sinica
ISSN: 0372-2112
Year: 2022
Issue: 8
Volume: 50
Page: 1811-1818
Cited Count:
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0