Dual-Branch Knowledge Enhancement Network with Vision-Language Model for Human-Object Interaction Detection

Indexed by：

CPCI-S; EI; Scopus

Abstract：

Human-Object Interaction (HOI) detection aims to localize human-object pairs and comprehend their interactions. Recently, pre-trained Vision-Language Models (VLMs) have shown strong recognition ability in the HOI detection task. However, these VLM-based methods struggle to transfer knowledge well enough to achieve the desired performance. To this end, we propose a Dual-Branch Knowledge Enhancement Network with VLM (DBKEN-VLM) within the two-stage paradigm to enhance the effectiveness of the VLM. Specifically, we propose a semantic mining decoder to supplement contextual and action-related semantic information into our model. It forms a dual-branch knowledge enhancement network together with a spatial guided decoder. Furthermore, we propose a two-level fusion strategy for the dual-branch network to facilitate better knowledge transfer from the VLM. One level is feature-level fusion, producing more instructive interaction features; the other is decision-level fusion, further enhancing the capability of the VLM for HOI detection. The proposed method achieves competitive performance compared to recent methods on two benchmark datasets, HICO-DET and V-COCO. © 2024 IEEE.
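
The abstract names two fusion levels but gives no formulas. As a rough illustration only, the sketch below shows what feature-level and decision-level fusion commonly look like in a two-branch model. Everything in it is an assumption and not taken from the paper: the function names, the concatenate-then-project form of feature fusion, the sigmoid score blending, and the weight `alpha` are all hypothetical stand-ins.

```python
import math


def feature_level_fusion(spatial_feat, semantic_feat, projection):
    """Assumed form: concatenate both branch features, then project linearly."""
    concatenated = spatial_feat + semantic_feat  # list concatenation
    return [sum(w * x for w, x in zip(row, concatenated)) for row in projection]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decision_level_fusion(logits_a, logits_b, alpha=0.5):
    """Assumed form: blend per-class interaction scores of the two branches."""
    return [
        alpha * sigmoid(a) + (1.0 - alpha) * sigmoid(b)
        for a, b in zip(logits_a, logits_b)
    ]


# Toy inputs (hypothetical values, chosen only to make the arithmetic visible).
spatial = [1.0, 2.0]    # features from a spatial branch
semantic = [3.0, 4.0]   # features from a semantic branch
projection = [
    [0.5, 0.5, 0.0, 0.0],  # averages the spatial features
    [0.0, 0.0, 0.5, 0.5],  # averages the semantic features
]

fused = feature_level_fusion(spatial, semantic, projection)
scores = decision_level_fusion([0.0, 2.0], [0.0, -2.0], alpha=0.5)
```

Feature-level fusion acts on intermediate representations before classification, while decision-level fusion combines the final per-class scores. The actual operators used in DBKEN-VLM may differ from both of these stand-ins.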

Keyword：

Knowledge representation; Holmium alloys; Decoding; Visual languages; Semantics

Author Community：

[ 1 ] [Zhou, Guangpu]Beijing University of Technology, Department of Information, Beijing, China
[ 2 ] [Kong, Dehui]Beijing University of Technology, Department of Information, Beijing, China
[ 3 ] [Li, Jinghua]Beijing University of Technology, Department of Information, Beijing, China
[ 4 ] [Chen, Dongpan]Beijing University of Technology, Department of Information, Beijing, China
[ 5 ] [Bai, Zhuowei]Beijing University of Technology, Department of Information, Beijing, China
[ 6 ] [Yin, Baocai]Beijing University of Technology, Department of Information, Beijing, China



Year： 2024

Language： English
