
Author:

Xu, K. | Wang, L. | Li, S. | Xin, J. | Yin, B.

Indexed by:

EI Scopus SCIE

Abstract:

Compared with traditional knowledge distillation, self-distillation does not require a pre-trained teacher network and is therefore more concise. Among self-distillation approaches, data augmentation-based methods provide an elegant solution that requires neither modification of the network structure nor additional memory consumption. However, when data augmentation is applied in the input space, the forward propagation of the augmented data incurs additional computation cost, and the augmentation methods must be adapted to the modality of the input data. Meanwhile, we note that from a generalization perspective, provided that classes remain distinguishable from one another, a dispersed intra-class feature distribution is superior to a compact one, especially for categories with larger sample differences. Based on the above considerations, this paper proposes a feature augmentation based self-distillation method (FASD) built on the idea of feature extrapolation. For each source feature, two augmentations are generated by subtraction between features: one subtracts the temporary class center computed from samples of the same category, and the other subtracts the closest sample feature belonging to a different category. The predicted outputs of the augmented features are then constrained to be consistent with that of the source feature. The consistency constraint on the former augmented feature expands the learned class feature distribution, leading to greater overlap with the unknown feature distribution of test samples and thereby improving the generalization performance of the network. The consistency constraint on the latter augmented feature increases the distance between samples from different categories, which enhances the distinguishability between categories. Experimental results on image classification tasks demonstrate the effectiveness and efficiency of the proposed method, and experiments on text and audio tasks prove the universality of the method for classification tasks with different modalities.
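The following PyTorch-style sketch illustrates the two feature augmentations and the consistency constraint as described in the abstract. It is a minimal reading of the text, not the authors' released implementation: the function name fasd_consistency_loss, the use of in-batch class centers as the "temporary" class centers, the temperature tau, and the KL-divergence form of the consistency term are all assumptions. The subtraction is taken literally here; the paper may additionally scale or extrapolate the feature differences.

```python
import torch
import torch.nn.functional as F


def fasd_consistency_loss(features, logits, labels, classifier, tau=2.0):
    """Consistency loss over two extrapolated (augmented) features (sketch).

    features   : (B, D) penultimate-layer features of the current batch
    logits     : (B, C) network predictions for the source features
    labels     : (B,)   ground-truth class indices
    classifier : final linear layer of the network, reused on augmented features
    tau        : softmax temperature (assumed hyperparameter)
    """
    # Source predictions serve as the (detached) distillation targets.
    with torch.no_grad():
        soft_targets = F.softmax(logits / tau, dim=1)

    # Augmentation 1: subtract the temporary class center computed from
    # same-class samples in the batch.
    centers = torch.zeros_like(features)
    for c in labels.unique():
        mask = labels == c
        centers[mask] = features[mask].mean(dim=0)
    aug_center = features - centers

    # Augmentation 2: subtract the closest feature belonging to another class.
    dist = torch.cdist(features, features)             # (B, B) pairwise distances
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    dist = dist.masked_fill(same_class, float("inf"))  # exclude same-class pairs
    aug_negative = features - features[dist.argmin(dim=1)]

    # Constrain the augmented features' predictions to match the source prediction.
    loss = 0.0
    for aug in (aug_center, aug_negative):
        log_probs = F.log_softmax(classifier(aug) / tau, dim=1)
        loss = loss + F.kl_div(log_probs, soft_targets, reduction="batchmean") * tau ** 2
    return loss
```

In this sketch the loss would be added to the usual cross-entropy term during training; because the augmentations are computed in feature space, no extra forward pass through the backbone is needed, which matches the efficiency argument made in the abstract.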

Keyword:

Knowledge distillation; Generalization performance; Predictive models; Task analysis; Data augmentation; Self-distillation; Training; Feature augmentation; Extrapolation; Feature extraction; Knowledge engineering; Classification task

Author Community:

  • [ 1 ] [Xu K.]Beijing Artificial Intelligence Institute, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [ 2 ] [Wang L.]Beijing Artificial Intelligence Institute, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [ 3 ] [Li S.]School of Automation, Beijing Information Science and Technology University, Beijing, China
  • [ 4 ] [Xin J.]Beijing Artificial Intelligence Institute, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [ 5 ] [Yin B.]Beijing Artificial Intelligence Institute, Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China

Reprint Author's Address:

Email:


Related Keywords:

Source:

IEEE Transactions on Circuits and Systems for Video Technology

ISSN: 1051-8215

Year: 2024

Issue: 10

Volume: 34

Page: 1-1

8.400 (JCR@2022)

Cited Count:

WoS CC Cited Count:

SCOPUS Cited Count: 1

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:


Affiliated Colleges:
