Abstract:
Prediction of Apache Spark job execution time is a key technique for guiding Spark cluster resource allocation and parameter tuning. Existing research applies a single, unified model to all jobs and considers few influencing factors, resulting in poor prediction accuracy. To address these problems, this paper proposes a classification-based Spark job performance modeling method. The method first selects features that are strongly correlated with job execution time, then classifies jobs according to the selected features, and finally uses the GBDT algorithm to build an execution-time prediction model for each job class. Experimental results show that, compared with unified modeling, the proposed method reduces the RMSE and MAPE of the predictions by an average of 42.5% and 51.1%, respectively. © 2022 SPIE
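The three-step pipeline in the abstract (correlation-based feature selection, job classification, per-class GBDT regression) could be sketched as below. This is a minimal illustration on synthetic data, not the paper's implementation: the feature names, the correlation threshold, and the use of KMeans as the job classifier are all assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic job records (hypothetical features):
# [input_size, executor_cores, shuffle_ratio, irrelevant_feature]
X = rng.uniform(0, 1, size=(300, 4))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] * X[:, 2] + rng.normal(0, 0.1, 300)

# Step 1: keep features whose |Pearson correlation| with execution time
# exceeds a threshold (the 0.1 cutoff is an assumption).
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
selected = np.where(corr > 0.1)[0]
Xs = X[:, selected]

# Step 2: classify jobs on the selected features (KMeans stands in for
# whatever classification scheme the paper actually uses).
k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Xs)

# Step 3: train one GBDT execution-time model per job class.
models = {}
for c in range(k):
    mask = labels == c
    models[c] = GradientBoostingRegressor(random_state=0).fit(Xs[mask], y[mask])

# Predict each job's runtime with the model of the class it belongs to.
preds = np.array([models[c].predict(Xs[i:i + 1])[0]
                  for i, c in enumerate(labels)])
rmse = np.sqrt(np.mean((preds - y) ** 2))
```

At prediction time, a new job would first be assigned to a class (here via `KMeans.predict`) and then routed to that class's GBDT model, which is what distinguishes this scheme from a single unified model.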
ISSN: 0277-786X
Year: 2022
Volume: 12259
Language: English