Author:

Duan, Li-Juan | Sun, Qi-Chao | Qiao, Yuan-Hua | Chen, Jun-Cheng | Cui, Guo-Qin

Indexed by:

EI, Scopus, CSCD

Abstract:

Semantic segmentation is a research hotspot in the field of computer vision. It refers to assigning every pixel in an image to a semantic class. As a fundamental problem in scene understanding, semantic segmentation is widely used in various intelligent tasks. In recent years, with the success of convolutional neural networks (CNNs) in many computer vision applications, fully convolutional networks (FCNs) have shown great potential on the RGB semantic segmentation task. However, semantic segmentation remains challenging due to the complexity of scene types, severe object occlusion and varying illumination. With the availability of consumer RGB-D sensors such as the RealSense 3D Camera and Microsoft Kinect, RGB images and depth information can now be captured at the same time. Depth information describes 3D geometric structure that may be missing in RGB-only images; it can significantly reduce classification errors and improve the accuracy of semantic segmentation. To make effective use of RGB and depth information, it is crucial to find an efficient multi-modal information fusion method. According to the fusion stage, current RGB-D feature fusion methods can be divided into three types: early fusion, late fusion and middle fusion. However, most previous studies fail to make effective use of the complementary information between the RGB and depth modalities: they simply fuse RGB features and depth features by equal-weight concatenation or summation, which fails to extract the complementary information between the two modalities and suppresses modality-specific information. In addition, the semantic information shared by high-level features across the two modalities is not taken into account, which is very important for fine-grained semantic segmentation. To solve the above problems, we present a novel Attention-aware and Semantic-aware Multi-modal Fusion Network (ASNet) for RGB-D semantic segmentation. Our network effectively fuses multi-level RGB-D features through Attention-aware Multi-modal Fusion (AMF) blocks and Semantic-aware Multi-modal Fusion (SMF) blocks. Specifically, in the AMF blocks, a cross-modal attention mechanism is designed so that RGB features and depth features guide and optimize each other through their complementary characteristics, yielding feature representations with rich spatial location information. The SMF blocks model the semantic interdependencies between multi-modal features by integrating semantically associated feature channels of the RGB and depth features, extracting more precise semantic feature representations. The two blocks are integrated into a two-branch encoder-decoder architecture, which restores image resolution gradually with consecutive up-sampling operations and combines low-level and high-level features through skip connections to achieve high-resolution prediction. To optimize the training process, we apply deeply supervised learning over the multi-level decoding features. Our network thus learns the complementary characteristics of the two modalities and models the semantic context interdependencies between RGB features and depth features.
Experimental results on two challenging public RGB-D indoor semantic segmentation datasets, SUN RGB-D and NYU Depth v2, show that our network outperforms existing RGB-D semantic segmentation methods, improving mean accuracy and mean IoU by 1.9% and 1.2%, respectively. © 2021, Science Press. All rights reserved.
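The abstract gives no implementation details, but the two fusion blocks can be illustrated with a short PyTorch sketch. The code below is a hypothetical reading of the AMF and SMF blocks as described above, not the authors' implementation: AMF is rendered as mutual spatial attention (each modality gates the other), SMF as a squeeze-and-excitation style channel gate over the concatenated features; all module names, channel sizes and layer choices are assumptions.

import torch
import torch.nn as nn

class AMF(nn.Module):
    # Attention-aware Multi-modal Fusion (hypothetical sketch): each
    # modality produces a spatial attention map that re-weights the
    # other modality's features, so RGB and depth guide each other.
    def __init__(self, channels):
        super().__init__()
        self.rgb_gate = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.depth_gate = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, rgb, depth):
        rgb_refined = rgb + rgb * self.depth_gate(depth)    # depth guides RGB
        depth_refined = depth + depth * self.rgb_gate(rgb)  # RGB guides depth
        return rgb_refined + depth_refined                  # fused feature map

class SMF(nn.Module):
    # Semantic-aware Multi-modal Fusion (hypothetical sketch): a
    # squeeze-and-excitation style gate over the concatenated RGB and
    # depth channels models cross-modal channel interdependencies.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(2 * channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)  # concatenate along channels
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return self.project(x * w)          # channel-reweighted fusion

For example, SMF(256)(rgb_feat, depth_feat) applied to two (B, 256, H, W) feature maps yields a (B, 256, H, W) fused map that a decoder stage can consume.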
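Likewise, the deeply supervised training mentioned above can be sketched as a sum of per-stage losses. The list of decoder outputs, the ignore label of 255 and the bilinear upsampling are assumptions for illustration, not details from the paper:

import torch.nn as nn
import torch.nn.functional as F

def deeply_supervised_loss(decoder_outputs, target):
    # Sum a cross-entropy term over every decoder stage so that each
    # intermediate prediction receives a direct supervision signal.
    criterion = nn.CrossEntropyLoss(ignore_index=255)  # assumed ignore label
    loss = 0.0
    for logits in decoder_outputs:  # list of (B, num_classes, h, w) maps
        logits = F.interpolate(logits, size=target.shape[-2:],
                               mode='bilinear', align_corners=False)
        loss = loss + criterion(logits, target)  # target: (B, H, W) labels
    return loss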

Keyword:

Semantics; Convolutional neural networks; Image resolution; Convolution; Computer vision; Decoding; Cameras; Semantic Web

Author Community:

  • [ 1 ] [Duan, Li-Juan] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [ 2 ] [Duan, Li-Juan] Beijing Key Laboratory of Trusted Computing, Beijing 100124, China
  • [ 3 ] [Duan, Li-Juan] National Engineering Laboratory for Key Technologies of Information Security Level Protection, Beijing 100124, China
  • [ 4 ] [Sun, Qi-Chao] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [ 5 ] [Sun, Qi-Chao] Beijing Key Laboratory of Trusted Computing, Beijing 100124, China
  • [ 6 ] [Sun, Qi-Chao] Advanced Institute of Information Technology, Peking University, Hangzhou 311200, China
  • [ 7 ] [Qiao, Yuan-Hua] College of Applied Sciences, Beijing University of Technology, Beijing 100124, China
  • [ 8 ] [Chen, Jun-Cheng] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [ 9 ] [Cui, Guo-Qin] State Key Laboratory of Digital Multimedia Chip Technology, Vimicro Corporation, Beijing 100191, China

Reprint Author's Address:

  • [Chen, Jun-Cheng] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China


Source:

Chinese Journal of Computers

ISSN: 0254-4164

Year: 2021

Issue: 2

Volume: 44

Page: 275-291

SCOPUS Cited Count: 5

ESI Highly Cited Papers on the List: 0
