Indexed by:
Abstract:
Accurately estimating the number of individuals contained in an image is the purpose of the crowd counting. It has always faced two major difficulties: uneven distribution of crowd density and large span of head size. Focusing on the former, most CNN-based methods divide the image into multiple patches for processing, ignoring the connection between the patches. For the latter, the multi-scale feature fusion method using feature pyramid ignores the matching relationship between the head size and the hierarchical features. In response to the above issues, we propose a crowd counting network named CCST based on swin transformer, and tailor a feature adaptive fusion regression head called FAFHead. Swin transformer can fully exchange information within and between patches, and effectively alleviate the problem of uneven distribution of crowd density. FAFHead can adaptively fuse multi-level features, improve the matching relationship between head size and feature pyramid hierarchy, and relief the problem of large span of head size available. Experimental results on common datasets show that CCST has better counting performance than all weakly supervised counting works and great majority of popular density map-based fully supervised works.
Keyword:
Reprint Author's Address:
Email:
Source :
VISUAL COMPUTER
ISSN: 0178-2789
Year: 2022
Issue: 7
Volume: 39
Page: 2671-2682
3 . 5
JCR@2022
3 . 5 0 0
JCR@2022
ESI Discipline: COMPUTER SCIENCE;
ESI HC Threshold:46
JCR Journal Grade:2
CAS Journal Grade:3
Cited Count:
WoS CC Cited Count: 24
SCOPUS Cited Count: 25
ESI Highly Cited Papers on the List: 0 Unfold All
WanFang Cited Count:
Chinese Cited Count:
30 Days PV: 9
Affiliated Colleges: