Author:

Li, W.-S. | Zhang, J. | Zhuo, L. | Wu, X.-J. | Yan, Y.

Indexed by:

EI Scopus

Abstract:

In the field of computer vision, visual segmentation is a fundamental task that categorizes the pixels of an image or video frame into distinct regions. Thanks to significant advances in segmentation techniques, it plays a key role in applications such as autonomous driving, aerial remote sensing, and video scene understanding. In recent years, Transformer-based visual segmentation has attracted much attention because of its ability to model long-range dependencies. As Transformer architectures continue to be optimized and updated, there is an urgent need to understand more comprehensively the existing progress and development trends of the Transformer in the field of visual segmentation, and to identify its deficiencies and challenges, so as to explore the core theory of the Transformer in greater depth. To this end, this paper organizes, reviews, and analyzes recent advances in Transformer-based visual segmentation along two visual pipelines, image and video, summarizing the theoretical framework of the Transformer and presenting application examples and research hotspots, so as to provide both a summary and an outlook. Specifically, the background of the Transformer is first reviewed, including the problem definition, datasets, indicators, and basic structure: the problem definition describes the expected goals and results of visual segmentation in image/video tasks; the datasets and indicators correspond to the specific application scenarios of a model and to its performance measures; and the basic structure describes the core modules of the algorithm, the implementation process, and the relationships between the individual modules. Then, four families of Transformer methods are described in detail, covering image semantic and instance segmentation as well as video semantic and instance segmentation, and current research hotspots are discussed.
For image semantic segmentation, the representative Transformer structures are analyzed, including pure Transformer and dual-branch structures; the motivation behind each improvement and its effect in application are presented, and visual results are shown for two practical cases: unpaved road segmentation in UAV images and semantic segmentation of remote sensing images. For image instance segmentation, the typical Transformer structures with and without an end-to-end framework are summarized. Video semantic segmentation is mainly categorized into accuracy-oriented and efficiency-oriented Transformer structures, while video instance segmentation includes frame-by-frame and segment-by-segment Transformer structures. Notably, livestreaming video instance segmentation is taken as an application example of video instance segmentation: the paper discusses the available datasets, experimental parameters, and indicators, evaluates and analyzes the performance of the mainstream methods, and shows some visual results. Subsequently, the paper traces and reviews three hotspots that have drawn wide attention in the field of visual segmentation, namely the Segment Anything Model (SAM), open-vocabulary segmentation, and referring segmentation, with a view to sparking new ideas and inspiration. Although Transformer-based visual segmentation has received widespread attention, scientific problems have gradually emerged that limit further improvement of model performance and efficiency. Finally, this paper summarizes the open issues that still need to be addressed in image/video semantic/instance segmentation tasks using the Transformer, and looks ahead to potential future development directions to provide some insights for reference. © 2024 Science Press. All rights reserved.

Keyword:

semantic segmentation; instance segmentation; self-attention mechanism; visual segmentation; Transformer

Author Community:

  • [ 1 ] [Li W.-S.] School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 2 ] [Zhang J.] School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 3 ] [Zhang J.] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, 100124, China
  • [ 4 ] [Zhuo L.] School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 5 ] [Zhuo L.] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, 100124, China
  • [ 6 ] [Wu X.-J.] School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 7 ] [Yan Y.] School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China

Source:

Chinese Journal of Computers

ISSN: 0254-4164

Year: 2024

Issue: 12

Volume: 47

Page: 2760-2782
