Weakly-supervised video anomaly detection via temporal resolution feature learning

Author：

Peng, Shengjun | Cai, Yiheng | Yao, Zijun | Tan, Meiling

Indexed by：

EI | Scopus | SCIE

Abstract：

Weakly supervised video anomaly detection (WS-VAD) is often formulated as a multiple instance learning (MIL) problem. Snippet-level anomaly scores can be predicted using only video-level annotations, but most MIL approaches focus on improving the performance of the feature learning network and ignore the method design of the preprocessing stage. MIL-based methods usually preprocess videos of different lengths into a predefined number of snippets for later anomaly identification. This is impractical for real-world videos of varying lengths when the duration of anomalous events is unknown in training. Data with different temporal resolutions generated by this division confuses the network and leads to limited detection capability. To address this issue, we propose a novel WS-VAD method. First, a temporal resolution feature mapping module (TRFM) improves the network's learning ability for input data with different temporal resolutions by mapping the temporal resolution information into the feature learning space. We also introduce a gated recurrent unit (GRU)-based multi-scale temporal feature learning module (MS-GRU), combining GRUs with multi-scale convolutional structures and fusing features recursively at different time scales. This module exploits the ability of GRUs to extract temporal information and compensates for the fact that GRUs only extract single-scale temporal dependence. In addition, we propose the Adaptive-k module to optimize the original Top-k loss and increase flexibility in training by using the optimal number of anomalous segments k generated according to the different inputs. This approach is fully applicable to real-world videos of various lengths. Experimental results show that our model boosts the detection accuracy for data with enormous differences in temporal resolution and obtains state-of-the-art frame-level AUC performance on three real-world surveillance datasets: UCF-Crime, ShanghaiTech and XD-Violence.
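
The preprocessing issue described in the abstract can be illustrated with a minimal sketch. It shows how dividing videos of different lengths into the same number of snippets yields snippets of very different duration, and how a fixed-k Top-k aggregation turns snippet scores into a video-level score. The function names, the snippet count of 32, and the fixed k are assumptions for illustration only; they are not taken from the paper, and the sketch does not implement TRFM, MS-GRU, or Adaptive-k.

```python
def split_into_snippets(num_frames, num_snippets=32):
    """Split frame indices [0, num_frames) into num_snippets contiguous ranges.

    num_snippets=32 is an assumed value; the abstract only says "predefined".
    """
    bounds = [i * num_frames // num_snippets for i in range(num_snippets + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_snippets)]


def topk_video_score(snippet_scores, k):
    """Video-level score as the mean of the k highest snippet scores (k >= 1).

    A common Top-k MIL aggregation with a fixed k (hypothetical helper).
    """
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / len(top)


if __name__ == "__main__":
    # Same snippet count, very different temporal resolution per snippet.
    for num_frames in (320, 32000):
        start, end = split_into_snippets(num_frames)[0]
        print(num_frames, "frames ->", end - start, "frames per snippet")
```

Under these assumptions, a 320-frame video yields 10-frame snippets while a 32000-frame video yields 1000-frame snippets, so a single fixed k covers very different spans of time in the two videos.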

Keyword：

Multiple instance learning | Weak supervision | Video anomaly detection | Temporal resolution

Author Community：

[1] [Peng, Shengjun] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[2] [Cai, Yiheng] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[3] [Yao, Zijun] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[4] [Tan, Meiling] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China

Email：

417209402@qq.com | caiyiheng@bjut.edu.cn | yaozijun@emails.bjut.edu.cn | tanmeiling_1@163.com

Related Articles：

Transformer-based multiple instance learning network with 2D positional encoding for histopathology image classification
2025, Complex and Intelligent Systems
Self-attention Pyramidal Convolutional Network for Weakly-supervised Video Anomaly Detection
2022, 6th International Conference on Electronic Information Technology and Computer Engineering, EITCE 2022
Video anomaly detection based on deep learning
2023, 5th International Academic Exchange Conference on Science and Technology Innovation, IAECST 2023
Multi-scale Siamese prediction network for video anomaly detection
2022, SIGNAL IMAGE AND VIDEO PROCESSING

Source ：

APPLIED INTELLIGENCE

ISSN： 0924-669X

Year： 2023

Issue： 24

Volume： 53

Page： 30607-30625

Impact Factor： 5.300 (JCR@2022)

Cited Count：

WoS CC Cited Count： 2

SCOPUS Cited Count： 1

ESI Highly Cited Papers on the List： 0

