Author:

Shi, R. | Li, T. | Zhang, L. | Yamaguchi, Y.

Indexed by:

EI Scopus SCIE

Abstract:

Recent research has demonstrated that Vision Transformers (ViTs) can achieve performance comparable to, or even better than, that of convolutional neural network (CNN) baselines. Although the differences in their structural designs are obvious, our understanding of how their feature representations differ remains limited. In this work, we propose several techniques to achieve high-quality visualization of representations in ViTs. Both qualitative and quantitative experiments show that our technical improvements noticeably enhance ViT visualization quality compared with previous studies. Furthermore, we conduct visualizations to explore the disparities between ViTs and CNNs pre-trained on ImageNet1K, revealing three intriguing properties of ViTs: (a) ViT feature propagation retains image detail information with minimal loss, whereas CNNs discard most image details for class discrimination. (b) Unlike in CNNs, object-related features do not appear in the higher layers of ViTs, suggesting that class-discriminative features may not be required for ViT classification. (c) Our visualization-assisted texture-bias experiment reveals that both ViTs and CNNs exhibit texture bias, with ViTs appearing more biased towards local textures.

Keyword:

Optimization; Convolutional neural network; Neural networks; Task analysis; Feature representation; Visualization; Transformers; Vision Transformer; Image reconstruction; Minimization

Author Affiliations:

  • [ 1 ] [Shi R.]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 2 ] [Li T.]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 3 ] [Zhang L.]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 4 ] [Yamaguchi Y.]Department of General Systems Studies, University of Tokyo, Tokyo, Japan

Source:

IEEE Transactions on Multimedia

ISSN: 1520-9210

Year: 2023

Volume: 26

Page: 1-13

JCR@2022 Impact Factor: 7.300

ESI Discipline: COMPUTER SCIENCE

ESI HC Threshold: 19

SCOPUS Cited Count: 7

ESI Highly Cited Papers on the List: 0