EISNet: A Multi-Modal Fusion Network for Semantic Segmentation with Events and Images

Author：

Xie, B. | Deng, Y. | Shao, Z. | Li, Y.

Indexed by：

EI Scopus SCIE

Abstract：

Bio-inspired event cameras record a scene as sparse and asynchronous "events" by detecting per-pixel brightness changes. Such cameras show great potential in challenging scene understanding tasks, benefiting from the imaging advantages of high dynamic range and high temporal resolution. Considering the complementarity between event and standard cameras, we propose a multi-modal fusion network (EISNet) to improve the semantic segmentation performance. The key challenges of this topic lie in (i) how to encode event data to represent accurate scene information and (ii) how to fuse multi-modal complementary features by considering the characteristics of two modalities. To solve the first challenge, we propose an Activity-Aware Event Integration Module (AEIM) to convert event data into frame-based representations with high-confidence details via scene activity modeling. To tackle the second challenge, we introduce the Modality Recalibration and Fusion Module (MRFM) to recalibrate modal-specific representations and then aggregate multi-modal features at multiple stages. MRFM learns to generate modal-oriented masks to guide the merging of complementary features, achieving adaptive fusion. Based on these two core designs, our proposed EISNet adopts an encoder-decoder transformer architecture for accurate semantic segmentation using events and images. Experimental results show that our model outperforms state-of-the-art methods by a large margin on event-based semantic segmentation datasets. The code is publicly available at https://github.com/bochenxie/EISNet.

Keyword：

Visualization; Noise measurement; Standards; Event camera; Semantic segmentation; Semantics; Cameras; Multi-modal fusion; Attention mechanism; Task analysis

Author Community：

[ 1 ] [Xie B.] Department of Mechanical Engineering, City University of Hong Kong, Hong Kong SAR, China
[ 2 ] [Deng Y.] College of Computer Science, Beijing University of Technology, Beijing, China
[ 3 ] [Shao Z.] College of Information Science and Engineering, Hunan Normal University, Changsha, China
[ 4 ] [Li Y.] Department of Mechanical Engineering, City University of Hong Kong, Hong Kong SAR, China

Source ：

IEEE Transactions on Multimedia

ISSN： 1520-9210

Year： 2024

Volume： 26

Page： 1-12

Impact Factor (JCR@2022)： 7.300

Cited Count：

WoS CC Cited Count：

SCOPUS Cited Count： 4

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 9
