Abstract:
Extreme-scale computing involves hundreds of millions of threads with multi-level parallelism running on large-scale hierarchical and heterogeneous hardware. Some OpenMP multi-threaded applications increasingly suffer from hidden runtime behaviors owing to shared resource contention as well as software- and hardware-related problems. Such hidden behaviors can result in failures and inefficiencies and are among the main challenges in system resiliency. To minimize their impact, one must quickly and accurately detect and diagnose the hidden behaviors that cause the failures. However, it is difficult to identify hidden behaviors in the dynamic and noisy data collected by monitoring infrastructures for OpenMP multi-threaded applications. This paper presents an anthropomorphic diagnosis framework for hidden behaviors of OpenMP multi-threaded applications. In the framework, we first design injected heartbeat functions for OpenMP multi-threaded applications. Then, we leverage the heartbeat sequences to extract features of hidden behaviors. Finally, we develop a feature learning-based algorithm using heartbeat analysis, namely HSA, to diagnose hidden behaviors. We evaluate the framework on the NAS Parallel Benchmarks (NPB), the EPCC OpenMP micro-benchmark suite, and the Jacobi benchmark. The experimental results demonstrate that our framework successfully identifies 90.3% of the injected hidden behaviors of OpenMP multi-threaded applications while incurring low overhead. (c) 2023 Elsevier Inc. All rights reserved.
Source:
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
ISSN: 0743-7315
Year: 2023
Volume: 177
Page: 17-27
Impact Factor: 3.800 (JCR@2022)
ESI Discipline: COMPUTER SCIENCE;
ESI HC Threshold:19
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 2
ESI Highly Cited Papers on the List: 0