Author:

Wang, Weidong | Li, Dian | Luo, Wangda | Kang, Yujian | Wang, Liqiang

Indexed by:

EI, Scopus, SCIE

Abstract:

Extreme-scale computing involves hundreds of millions of threads with multi-level parallelism running on large-scale hierarchical and heterogeneous hardware. Some OpenMP multi-threaded applications increasingly suffer from hidden runtime behaviors owing to shared-resource contention as well as software- and hardware-related problems. Such hidden behaviors can lead to failures and inefficiencies and are among the main challenges to system resiliency. To minimize their impact, one must quickly and accurately detect and diagnose the hidden behaviors that cause the failures. However, it is difficult to identify hidden behaviors in the dynamic and noisy data collected by monitoring infrastructures for OpenMP multi-threaded applications. This paper presents an anthropomorphic diagnosis framework for hidden behaviors of OpenMP multi-threaded applications. In the framework, we first design heartbeat functions that are injected into OpenMP multi-threaded applications. Then, we leverage the resulting heartbeat sequences to extract features of hidden behaviors. Finally, we develop a feature learning-based algorithm using heartbeat analysis, namely HSA, to diagnose hidden behaviors. To evaluate the framework, we use the NAS Parallel Benchmarks (NPB), the EPCC OpenMP micro-benchmark suite, and the Jacobi benchmark. The experimental results demonstrate that our framework successfully identifies 90.3% of the injected hidden behaviors of OpenMP multi-threaded applications while incurring low overhead. (c) 2023 Elsevier Inc. All rights reserved.

Keyword:

OpenMP; Machine learning; Heartbeat; High performance computing; Hidden behaviors

Author Affiliations:

  • [ 1 ] [Wang, Weidong]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [ 2 ] [Li, Dian]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [ 3 ] [Luo, Wangda]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [ 4 ] [Kang, Yujian]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [ 5 ] [Wang, Liqiang]Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA

Reprint Author's Address:

  • [Wang, Weidong]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [Wang, Liqiang]Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA

Source:

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

ISSN: 0743-7315

Year: 2023

Volume: 177

Page: 17-27

JCR@2022 Impact Factor: 3.800

ESI Discipline: COMPUTER SCIENCE;

ESI HC Threshold: 19

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 2

ESI Highly Cited Papers on the List: 0
