
Author:

Zhang, W. | Wang, T. | Fu, G. | Bao, Z.

Indexed by:

Scopus

Abstract:

Deploying models on resource-constrained edge devices remains a critical challenge for neural network applications. Quantization is one of the most popular methods for compressing a model to meet performance limitations. Because only a small amount of calibration data is required, post-training quantization (PTQ) is better suited to protecting privacy than quantization-aware training (QAT). However, PTQ often causes substantial accuracy degradation below 4-bit precision, and previous PTQ works focused primarily on a single reconstruction granularity, either all layer-wise or all block-wise. Our exploratory experiments show that these schemes are sub-optimal. In this paper, we explore the relation between the Hessian matrix trace and the inter-layer dependency, which plays a key role in the choice of quantization reconstruction granularity. Based on this discovery, we propose AQRG, a novel hybrid-granularity quantization scheme that adaptively adjusts the reconstruction granularity guided by the Hessian matrix trace. On image classification and object detection, AQRG achieves better accuracy and greater robustness to calibration data size on several typical convolutional neural networks. In particular, our 4-bit weight, 2-bit activation (W4A2) scheme achieves 65.06% accuracy with ResNet-18 on the ImageNet dataset. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.
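The abstract's guiding signal, the Hessian matrix trace, is typically too expensive to compute exactly for a full network, so work in this area commonly estimates it from Hessian-vector products alone via Hutchinson's method. The paper does not state its exact procedure; the sketch below is a generic, hypothetical illustration of that estimator, with the function names and the toy Hessian `H` invented for the example:

```python
import numpy as np

def hutchinson_trace(hess_vec_prod, dim, n_samples=200, seed=0):
    """Estimate tr(H) as the mean of v^T H v over Rademacher probe vectors v."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        total += v @ hess_vec_prod(v)          # v^T H v, needs only HVPs
    return total / n_samples

# Toy check on an explicit symmetric Hessian (hypothetical curvatures):
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 0.25]])  # true trace = 5.25
trace_est = hutchinson_trace(lambda v: H @ v, dim=3)
```

In a real PTQ pipeline the `hess_vec_prod` oracle would come from double backpropagation through the task loss rather than an explicit matrix, and, following the paper's idea, layers or blocks carrying a larger trace share would be reconstructed at a finer granularity.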

Keyword:

Artificial intelligence; Convolutional neural network; Fine-tuning; Model compression; Post-training quantization

Author Community:

  • [ 1 ] [Zhang W.]College of Computer Science, Beijing University of Technology, Beijing, China
  • [ 2 ] [Wang T.]College of Computer Science, Beijing University of Technology, Beijing, China
  • [ 3 ] [Fu G.]College of Computer Science, Beijing University of Technology, Beijing, China
  • [ 4 ] [Bao Z.]College of Computer Science, Beijing University of Technology, Beijing, China

Source:

Neural Computing and Applications

ISSN: 0941-0643

Year: 2025

Impact Factor: 6.000 (JCR@2022)

ESI Highly Cited Papers on the List: 0
