Multiple instance deep learning for weakly-supervised visual object tracking

Author：

Huang, Kaining | Shi, Yan | Zhao, Fuqi | Zhang, Zijun | Tu, Shanshan

Indexed by：

EI Scopus SCIE

Abstract：

Intelligently tracking objects with varied shapes, colors, lighting conditions, and backgrounds is extremely useful in many HCI applications, such as human body motion capture, hand gesture recognition, and virtual reality (VR) games. However, accurately tracking different objects in uncontrolled environments is a tough challenge due to possibly dynamic object parts, varied lighting conditions, and sophisticated backgrounds. In this work, we propose a novel semantically-aware object tracking framework, wherein the key is a weakly-supervised learning paradigm that optimally transfers the video-level semantic tags into various regions. More specifically, given a set of training video clips, each of which is associated with multiple video-level semantic tags, we first propose a weakly-supervised learning algorithm to transfer the semantic tags into various video regions. The key is a MIL-based (Zhong et al., 2020) [1] manifold embedding algorithm that maps all video regions into a semantic space, wherein the video-level semantic tags are well encoded. Afterward, for each video region, we use the semantic feature combined with the appearance feature as its representation. We design a multi-view learning algorithm to optimally fuse these two types of features. Based on the fused feature, we learn a probabilistic Gaussian mixture model to predict the target probability of each candidate window, where the window with the maximal probability is output as the tracking result. Comprehensive comparative results on a challenging pedestrian tracking task as well as on human hand gesture recognition demonstrate the effectiveness of our method. Moreover, visualized tracking results show that non-rigid objects with moderate occlusions can be well localized by our method.
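
The final step described in the abstract (scoring each candidate window with a Gaussian mixture model and outputting the window with the maximal probability) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the diagonal-covariance form, the toy parameters, and the use of plain concatenation in place of the paper's multi-view fusion are all assumptions made for illustration.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log of the mixture density, using log-sum-exp for numerical stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def fuse(semantic, appearance):
    """Hypothetical stand-in for multi-view fusion: plain concatenation."""
    return list(semantic) + list(appearance)


def select_window(candidates, weights, means, variances):
    """Score every candidate window and return the best one with all scores.

    Each candidate is a tuple (window_id, semantic_feature, appearance_feature).
    """
    scores = [
        gmm_log_likelihood(fuse(sem, app), weights, means, variances)
        for _, sem, app in candidates
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best][0], scores


# Toy two-component mixture over a 2-D fused feature (illustrative values only).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

candidates = [
    ("a", [0.1], [0.0]),    # close to the first component
    ("b", [2.0], [2.0]),    # halfway between the components
    ("c", [10.0], [10.0]),  # far from both components
]

best_window, window_scores = select_window(candidates, weights, means, variances)
print(best_window)
```

In this sketch the mixture parameters are fixed by hand; in the described method they would be learned from the fused features of the training regions.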

Keyword：

Object tracking Weakly-supervised Multi-view feature learning Multiple instance learning (MIL) Gaussian mixture model

Author Community：

[ 1 ] [Huang, Kaining]BengBu Univ, Bengbu City 233000, Anhui, Peoples R China
[ 2 ] [Shi, Yan]BengBu Univ, Bengbu City 233000, Anhui, Peoples R China
[ 3 ] [Zhao, Fuqi]BengBu Univ, Bengbu City 233000, Anhui, Peoples R China
[ 4 ] [Zhang, Zijun]BengBu Univ, Bengbu City 233000, Anhui, Peoples R China
[ 5 ] [Tu, Shanshan]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China

Reprint Author's Address：

[Shi, Yan]BengBu Univ, Bengbu City 233000, Anhui, Peoples R China

Email：

huangkaining@163.com |
shiyan09@mail.ustc.edu.cn |
zhaofuqi11@163.com |
bbxyzzj@163.com |
sstu@bjut.edu.cn


Related Keywords：

Weakly-supervised video object localization with attentive spatio-temporal correlation
2021, Pattern Recognition Letters
Transformer-based multiple instance learning network with 2D positional encoding for histopathology image classification
2025, Complex and Intelligent Systems
A Discriminative Tracking Algorithm Based on Multi-Feature Fusion Mechanism
2024, 6th International Conference on Communications, Information System and Computer Engineering, CISCE 2024
Visual Tracking with Bounding-Box Fine Adjustment
2018, 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2018

Source ：

SIGNAL PROCESSING-IMAGE COMMUNICATION

ISSN： 0923-5965

Year： 2020

Volume： 84

3.500 (JCR@2022)

ESI Discipline： ENGINEERING;

ESI HC Threshold：115

Cited Count：

WoS CC Cited Count： 3

SCOPUS Cited Count： 9

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 4

Affiliated Colleges：

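School of Information Science and Technology: data not explicitly attributed to a specific unit within this school/department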
