Abstract:
While recent approaches based on multi-stage temporal convolutional networks (TCNs) achieve good accuracy in action segmentation, they fail to reach a high F1-score, which limits their practical applicability. The main issue we investigate is that the TCN lacks max-pooling and therefore struggles to capture sufficient semantic information, which leads to over-segmentation. To reduce over-segmentation, we propose a Semantic Guidance (SG) module that captures high-level semantic features and guides the TCN. In addition, we reconsider the role of each stage in a multi-stage architecture and deploy a lighter parameter-sharing TCN (PS-TCN) as the backbone, which achieves higher accuracy with about 16% fewer parameters than the most popular backbone. Our proposed Video Speed Prediction (VSP) module further exploits temporal information and improves temporal modeling ability. Combining PS-TCN with VSP and using SG for guidance yields an accurate and robust segmentation model. Extensive experiments demonstrate that our model substantially outperforms the MS-TCN++ baseline (e.g., from 45.9% to 56.4% F1@50 on Breakfast) and achieves state-of-the-art performance on two challenging datasets.
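The abstract names three ingredients: a parameter-sharing multi-stage TCN, a max-pooling-based Semantic Guidance branch, and a Video Speed Prediction auxiliary head. Below is a minimal PyTorch sketch of how these pieces could fit together. All class names (DilatedStage, SemanticGuidance, PSTCN), layer sizes, the pooling window, and the wiring between stages are assumptions made for illustration; they are not the authors' released implementation.

```python
# Hypothetical sketch of the abstract's three components; hyperparameters
# and module wiring are assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DilatedStage(nn.Module):
    """One TCN stage: a stack of dilated 1-D convolutions with residuals."""
    def __init__(self, num_layers, num_feat, num_classes):
        super().__init__()
        self.inp = nn.Conv1d(num_classes, num_feat, 1)
        self.layers = nn.ModuleList(
            nn.Conv1d(num_feat, num_feat, 3, padding=2 ** i, dilation=2 ** i)
            for i in range(num_layers)
        )
        self.out = nn.Conv1d(num_feat, num_classes, 1)

    def forward(self, x):
        x = self.inp(x)
        for conv in self.layers:
            x = x + F.relu(conv(x))  # residual dilated convolution
        return self.out(x)


class SemanticGuidance(nn.Module):
    """Max-pool frame-wise logits to a coarse temporal scale, then upsample
    back, injecting high-level context the plain TCN lacks."""
    def __init__(self, num_classes, pool=16):
        super().__init__()
        self.pool = pool
        self.proj = nn.Conv1d(num_classes, num_classes, 1)

    def forward(self, logits):
        t = logits.size(-1)
        coarse = F.max_pool1d(logits, self.pool, ceil_mode=True)
        coarse = self.proj(coarse)
        up = F.interpolate(coarse, size=t, mode="linear", align_corners=False)
        return logits + up  # guide frame-wise predictions with pooled semantics


class PSTCN(nn.Module):
    """Multi-stage TCN whose refinement stages share one set of weights,
    which is where the parameter saving comes from."""
    def __init__(self, num_stages=4, num_layers=10, num_feat=64,
                 in_dim=2048, num_classes=48, num_speeds=3):
        super().__init__()
        self.embed = nn.Conv1d(in_dim, num_classes, 1)
        self.shared_stage = DilatedStage(num_layers, num_feat, num_classes)
        self.num_stages = num_stages
        self.guide = SemanticGuidance(num_classes)
        self.speed_head = nn.Linear(num_classes, num_speeds)  # VSP auxiliary task

    def forward(self, feats):
        x = self.embed(feats)
        outs = []
        for _ in range(self.num_stages):  # same weights reused at every stage
            x = self.shared_stage(F.softmax(x, dim=1))
            x = self.guide(x)
            outs.append(x)
        speed_logits = self.speed_head(x.mean(dim=-1))  # clip-level speed class
        return outs, speed_logits


if __name__ == "__main__":
    model = PSTCN()
    frames = torch.randn(2, 2048, 300)  # (batch, feature dim, frames)
    stage_outs, speed = model(frames)
    print(len(stage_outs), stage_outs[-1].shape, speed.shape)
```

Reusing one stage module across all refinement passes is the simplest reading of "parameter-sharing", and training the speed head on temporally resampled clips would be one plausible way to realize the VSP objective described in the abstract.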
Source:
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
ISSN: 2161-4393
Year: 2022
Cited Count:
WoS CC Cited Count: 4
SCOPUS Cited Count: 8
ESI Highly Cited Papers on the List: 0