Author:

Liu, Yanbin | Zhang, Wen | Qin, Guangjie | Zhao, Jiangpeng

Indexed by:

EI Scopus

Abstract:

Software defect prediction currently suffers from the imbalanced data problem. Traditional methods are insensitive to defect-prone modules and tend to misclassify them as defect-free. To deal with this problem, sampling techniques are adopted to rebalance the defect-prone and defect-free data used to train the predictive model, with the aim of improving performance. However, the combined effect of sampling techniques and machine learning classifiers on the performance of software defect prediction is not clear. This paper studies how different combinations of sampling techniques and machine learning classifiers affect defect prediction performance. Specifically, we investigate three types of sampling techniques, namely resampling, spread subsampling, and SMOTE (Synthetic Minority Over-sampling Technique), and five types of machine learning classifiers, namely C4.5, naive Bayes, logistic regression, support vector machine, and deep learning, to study their combined effect on defect prediction. Using the Friedman test and the Nemenyi test, we find that no method is optimal among all the 12 combinations in defect prediction. However, support vector machine and deep learning consistently produce the best performance across all the investigated projects. With ANOVA, we find that the sampling techniques have a great impact on the outcomes of defect prediction because they produce different data distributions for model training. Nevertheless, the sampling proportion has significant impacts on TPR (True Positive Ratio) and FPR (False Positive Ratio), whereas it merely influences the AUC (Area under Curve) and Balance of logistic regression. We explain the experimental results in the paper. © 2022 The Authors. Published by Elsevier B.V.
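
As an illustration of the quantities named in the abstract, the following minimal Python sketch computes TPR, FPR, and Balance from confusion-matrix counts and rebalances a label list by random undersampling. It is not the authors' implementation. The abstract does not define Balance; the formula used here (one minus the normalized Euclidean distance from the ideal point FPR = 0, TPR = 1) is an assumption based on a definition commonly used in defect prediction studies. The function names `tpr_fpr_balance` and `random_undersample` are hypothetical, and the undersampler is only a simplified stand-in for spread subsampling.

```python
import math
import random


def tpr_fpr_balance(tp, fn, fp, tn):
    """Return (TPR, FPR, Balance) from confusion-matrix counts.

    Balance is ASSUMED to be 1 - sqrt((0 - FPR)^2 + (1 - TPR)^2) / sqrt(2);
    the abstract itself does not state the formula.
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    balance = 1.0 - math.sqrt((0.0 - fpr) ** 2 + (1.0 - tpr) ** 2) / math.sqrt(2.0)
    return tpr, fpr, balance


def random_undersample(labels, ratio=1.0, seed=0):
    """Return sorted indices that keep every defect-prone sample (label 1)
    and at most ratio * (number of defect-prone samples) defect-free
    samples (label 0), chosen at random.

    Hypothetical helper: a simplified stand-in for spread subsampling.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(majority), int(round(ratio * len(minority))))
    return sorted(minority + rng.sample(majority, keep))


if __name__ == "__main__":
    # Toy data: 2 defect-prone modules among 6.
    labels = [1, 0, 0, 0, 0, 1]
    print(random_undersample(labels, ratio=1.0))
    print(tpr_fpr_balance(tp=8, fn=2, fp=10, tn=80))
```

Varying `ratio` in such a sketch changes the class distribution seen during training, which is the kind of sampling-proportion effect on TPR and FPR that the abstract reports.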

Keyword:

Logistic regression; Learning systems; Defects; Support vector regression; Deep learning; Forecasting

Author Community:

  • [ 1 ] [Liu, Yanbin] The No.13th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang, China
  • [ 2 ] [Zhang, Wen] College of Economics and Management, Beijing University of Technology, Beijing, China
  • [ 3 ] [Qin, Guangjie] College of Economics and Management, Beijing University of Technology, Beijing, China
  • [ 4 ] [Zhao, Jiangpeng] College of Economics and Management, Beijing University of Technology, Beijing, China

Year: 2022

Issue: C

Volume: 214

Page: 1603-1616

Language: English
