Author:

Peng, Shengjun | Cai, Yiheng | Yao, Zijun | Tan, Meiling

Indexed by:

EI; Scopus; SCIE

Abstract:

Weakly supervised video anomaly detection (WS-VAD) is often formulated as a multiple instance learning (MIL) problem. Snippet-level anomaly scores can be predicted using only video-level annotations, but most MIL approaches focus on improving the feature learning network and neglect the design of the preprocessing stage. MIL-based methods usually preprocess videos of different lengths into a predefined number of snippets for later anomaly identification. This is impractical for real-world videos of varying lengths when the duration of anomalous events is unknown during training. The division produces data with different temporal resolutions, which confuses the network and limits its detection capability. To address this issue, we propose a novel WS-VAD method. First, a temporal resolution feature mapping module (TRFM) improves the network's ability to learn from input data with different temporal resolutions by mapping the temporal resolution information into the feature learning space. We also introduce a gated recurrent unit (GRU)-based multi-scale temporal feature learning module (MS-GRU), which combines GRUs with multi-scale convolutional structures and fuses features recursively at different time scales. This module exploits the ability of GRUs to extract temporal information and compensates for the fact that GRUs extract temporal dependence at only a single scale. In addition, we propose the Adaptive-k module to optimize the original Top-k loss and increase flexibility in training by using the optimal number of anomalous segments k, generated according to each input. This approach is fully applicable to real-world videos of various lengths. Experimental results show that our model boosts detection accuracy for data with large differences in temporal resolution and obtains state-of-the-art frame-level AUC performance on three real-world surveillance datasets: UCF-Crime, ShanghaiTech, and XD-Violence.
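
The preprocessing issue and the fixed Top-k scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`snippet_bounds`, `frames_per_snippet`, `topk_video_score`) and the snippet count of 32 are assumptions chosen for illustration, and the rule the Adaptive-k module uses to choose k is not described in this record, so only the fixed-k baseline is shown.

```python
from typing import List, Tuple


def snippet_bounds(num_frames: int, num_snippets: int) -> List[Tuple[int, int]]:
    """Split `num_frames` frames into `num_snippets` contiguous, near-equal spans.

    Returns half-open (start, end) frame index pairs. Hypothetical helper that
    mimics dividing every video into a predefined number of snippets.
    """
    if num_snippets < 1 or num_frames < num_snippets:
        raise ValueError("need at least one frame per snippet")
    # Integer edges; with num_frames >= num_snippets no span is empty.
    edges = [i * num_frames // num_snippets for i in range(num_snippets + 1)]
    return [(edges[i], edges[i + 1]) for i in range(num_snippets)]


def frames_per_snippet(num_frames: int, num_snippets: int) -> float:
    """Average number of frames one snippet covers (its temporal resolution)."""
    return num_frames / num_snippets


def topk_video_score(snippet_scores: List[float], k: int) -> float:
    """Fixed Top-k MIL aggregation: mean of the k highest snippet scores."""
    if not 1 <= k <= len(snippet_scores):
        raise ValueError("k must be between 1 and the number of snippets")
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / k


if __name__ == "__main__":
    # Two videos of very different length, both forced into 32 snippets
    # (32 is an assumed value, not taken from the abstract).
    for frames in (320, 32000):
        print(frames, "frames ->", frames_per_snippet(frames, 32), "frames per snippet")
    print(snippet_bounds(10, 4))
    print(topk_video_score([0.5, 0.25, 1.0, 0.75], k=2))
```

With a fixed snippet count, a short video and a long video end up with very different numbers of frames per snippet, which is the temporal-resolution mismatch the abstract describes. Likewise, a fixed k treats every video as if it contained the same number of anomalous snippets, regardless of length.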

Keyword:

Multiple instance learning; Weak supervision; Video anomaly detection; Temporal resolution

Author Community:

  • [1] [Peng, Shengjun] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
  • [2] [Cai, Yiheng] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
  • [3] [Yao, Zijun] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
  • [4] [Tan, Meiling] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China

Source:

APPLIED INTELLIGENCE

ISSN: 0924-669X

Year: 2023

Issue: 24

Volume: 53

Page: 30607-30625

JCR@2022: 5.300

Cited Count:

WoS CC Cited Count: 2

SCOPUS Cited Count: 1

ESI Highly Cited Papers on the List: 0
