GLOCAL: A self-supervised learning framework for global and local motion estimation

Indexed by：

EI, Scopus, SCIE

Abstract：

Motions in videos are typically a mixture of local dynamic object motions and global camera motion. The two are inconsistent in some cases and can even interfere with each other, causing difficulties in downstream applications such as video stabilization, which requires the global motion, and action recognition, which consumes local motions. Therefore, it is crucial to estimate them separately. Existing methods separate the two motions from mixed motion fields such as optical flow; however, the quality of the mixed motion then sets an upper bound on performance. In this work, we propose a framework, GLOCAL, that directly estimates global and local motions simultaneously from adjacent frames in a self-supervised manner. GLOCAL consists of a Global Motion Estimation (GME) module and a Local Motion Estimation (LME) module. The GME module involves a mixed motion estimation backbone, an implicit bottleneck structure for feature dimension reduction, and an explicit bottleneck for global motion recovery based on global motion bases with a foreground mask, trained under the guidance of the proposed global reconstruction loss. An attention U-Net is adopted for LME; it produces local motions while excluding the motion of irrelevant regions, under the guidance of the proposed local reconstruction loss. Our method achieves better performance than existing methods on the homography estimation dataset DHE and the action recognition datasets NCAA and UCF-101. © 2024 Elsevier B.V.
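As a rough illustration of what "global motion bases with a foreground mask" can mean, the following is a hypothetical NumPy sketch, not the GLOCAL network. It assumes an affine basis set (six flow fields) as a stand-in for the paper's global motion bases, fits the basis weights by least squares on background pixels only, and treats the masked residual as local motion. Note that this is a post-hoc separation of an already-computed mixed flow, which is the kind of approach the abstract says is bounded by the mixed-flow quality; GLOCAL instead learns both outputs from adjacent frames with reconstruction losses.

```python
import numpy as np


def affine_flow_bases(h, w):
    """Hypothetical global-motion basis set (an assumption, not GLOCAL's bases).

    Returns 6 affine flow fields of shape (6, 2, h, w); channel 0 is the
    horizontal displacement u, channel 1 the vertical displacement v.
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Normalize pixel coordinates to [-1, 1] to keep the least-squares fit well conditioned.
    xs = 2.0 * xs / max(w - 1, 1) - 1.0
    ys = 2.0 * ys / max(h - 1, 1) - 1.0
    one, zero = np.ones((h, w)), np.zeros((h, w))
    pairs = [(one, zero), (xs, zero), (ys, zero),   # u = a0 + a1*x + a2*y
             (zero, one), (zero, xs), (zero, ys)]   # v = a3 + a4*x + a5*y
    return np.stack([np.stack(p) for p in pairs])


def split_global_local(mixed_flow, fg_mask, bases):
    """Split a mixed flow field (2, h, w) into global and local components.

    Global motion: basis weights fitted by least squares on background pixels
    only, so foreground (object) motion does not bias the camera-motion fit.
    Local motion: the residual, kept only inside the foreground mask.
    """
    k = bases.shape[0]
    bg = ~fg_mask
    # Design matrix: one row per (flow component, background pixel), one column per basis.
    A = bases[:, :, bg].reshape(k, -1).T
    b = mixed_flow[:, bg].reshape(-1)
    weights, *_ = np.linalg.lstsq(A, b, rcond=None)
    global_flow = np.tensordot(weights, bases, axes=1)   # weighted sum of bases, (2, h, w)
    local_flow = (mixed_flow - global_flow) * fg_mask    # zero outside the mask
    return global_flow, local_flow, weights


# Synthetic example: affine camera motion plus one translating object.
h, w = 24, 32
bases = affine_flow_bases(h, w)
true_w = np.array([1.5, 0.2, -0.1, -0.5, 0.05, 0.3])
fg = np.zeros((h, w), dtype=bool)
fg[8:16, 10:20] = True
mixed = np.tensordot(true_w, bases, axes=1)
mixed[:, fg] += np.array([[2.0], [-1.0]])   # object motion on top of camera motion
g, l, est_w = split_global_local(mixed, fg, bases)
print(np.allclose(est_w, true_w))   # → True
```

The affine basis set and the names `affine_flow_bases` and `split_global_local` are illustrative assumptions; any basis in which the global motion is linear in its weights would fit into the same masked least-squares step.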

Keyword：

Video understanding; motion estimation; motion pattern; optical flow

Author Community：

[ 1 ] [Zheng Y.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 2 ] [Luo K.]Megvii Technology, Beijing, 100190, China
[ 3 ] [Liu S.]School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
[ 4 ] [Li Z.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 5 ] [Xiang Y.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 6 ] [Wu L.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 7 ] [Zeng B.]School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
[ 8 ] [Chen C.W.]Department of Computing, The Hong Kong Polytechnic University, Hong Kong, 999077, Hong Kong

Related Keywords：

ASGSA: global semantic-aware network for action segmentation
2024，Neural Computing and Applications
Global Motion Pattern Based Event Recognition in Multi-person Videos
2017，2nd CCF Chinese Conference on Computer Vision (CCCV)
Variational optical flow based Velocity Estimation for Omni-Directional Intelligent Wheelchair
2017，10th International Symposium on Computational Intelligence and Design (ISCID)
Shot Boundary Detection with Key Motion Estimation and Appearance Differentiation
2019，2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019

Source：

Pattern Recognition Letters

ISSN： 0167-8655

Year： 2024

Volume： 178

Page： 91-97

Impact Factor (JCR@2022)： 5.100

Cited Count：

WoS CC Cited Count： 0

SCOPUS Cited Count： 2

ESI Highly Cited Papers on the List： 0

30 Days PV： 11
