Abstract:
Deploying models on resource-constrained edge devices remains a critical challenge for the application of neural networks. Quantization is one of the most popular methods for compressing a model to meet these performance limitations. Because it requires only a small amount of calibration data, post-training quantization (PTQ) is better suited to protecting privacy than quantization-aware training (QAT). However, PTQ often causes substantial accuracy degradation when the bit-width drops below 4 bits, and previous PTQ work has primarily used a single reconstruction granularity, either entirely layer-wise or entirely block-wise. Our exploratory experiments show that these schemes are sub-optimal. In this paper, we explore the relation between the Hessian matrix trace and inter-layer dependency, which plays a key role in the choice of quantization reconstruction granularity. Based on this finding, we propose AQRG, a novel hybrid reconstruction granularity quantization scheme that adaptively adjusts the quantization granularity guided by the Hessian matrix trace. In image classification and object detection, AQRG achieves better accuracy and greater robustness to calibration data size on several typical convolutional neural networks. In particular, our 4-bit weight, 2-bit activation (W4A2) scheme for ResNet-18 achieves 65.06% accuracy on the ImageNet dataset. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.
Source:
Neural Computing and Applications
ISSN: 0941-0643
Year: 2025
6.000 JCR@2022