Abstract:
In the information age, people obtain ever more data from many channels, especially medical, business, education and environmental data. In data mining and machine learning research, decision trees have great advantages for knowledge induction: a decision tree builds a tree structure from which classification rules are derived. The decision tree has become an important data mining method, and the rules it mines have both theoretical and practical significance. In classification prediction, however, real-world data are often high-dimensional and contain redundant attributes, which bias the classification results of a decision tree. To address this problem, a classification prediction method based on the classical C4.5 and CART decision tree algorithms is proposed. The main research work is as follows:
(1) The data set is pre-cleaned to select the variables that are meaningful for the classification prediction experiment and to improve evaluation efficiency. The distribution of each variable with missing values is then analyzed, and an appropriate filling method is used to impute the missing data.
(2) The XGBoost algorithm is used to evaluate feature importance, and the 15 most significant features for classification prediction are selected to improve the generalization performance and training efficiency of the model.
(3) Models are built and trained with the classical C4.5 and CART algorithms to obtain their classification performance.
© 2023 IEEE.
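The abstract names C4.5 and CART without stating how they differ. The textbook distinction lies in the split criterion: C4.5 chooses the attribute with the highest information gain ratio, while CART chooses the binary split with the lowest weighted Gini impurity. The pure-Python sketch below illustrates both criteria on categorical attributes, together with a simple missing-value filler. It is not the authors' implementation. The median/mode filling rule is an assumption, since the abstract only says "an appropriate filling method". The XGBoost feature-selection step is omitted. All function names (`impute`, `gain_ratio`, `gini_split`, `best_c45_feature`, `best_cart_split`) are hypothetical.

```python
import math
import statistics
from collections import Counter


def impute(column):
    """Fill None entries: median for numeric columns, mode otherwise.
    Assumed rule, not taken from the paper; needs at least one observed value."""
    present = [v for v in column if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = statistics.median(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(feature, labels):
    """Group the labels by the value of a categorical feature (multiway split)."""
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return list(groups.values())


def gain_ratio(feature, labels):
    """C4.5 criterion: information gain divided by split information."""
    n = len(labels)
    groups = partition(feature, labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups)
    gain = entropy(labels) - conditional
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups)
    return gain / split_info if split_info > 0 else 0.0


def gini_split(feature, labels, value):
    """CART criterion: weighted Gini impurity of the binary split 'feature == value'."""
    n = len(labels)
    left = [y for x, y in zip(feature, labels) if x == value]
    right = [y for x, y in zip(feature, labels) if x != value]
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_c45_feature(features, labels):
    """Return the feature name with the highest gain ratio."""
    return max(features, key=lambda name: gain_ratio(features[name], labels))


def best_cart_split(features, labels):
    """Return the (feature name, value) pair with the lowest weighted Gini impurity."""
    candidates = [(name, v) for name, col in features.items() for v in set(col)]
    return min(candidates, key=lambda nv: gini_split(features[nv[0]], labels, nv[1]))


if __name__ == "__main__":
    features = {"f1": ["a", "a", "b", "b"], "f2": ["a", "b", "a", "b"]}
    labels = [1, 1, 0, 0]
    print(best_c45_feature(features, labels))    # f1
    print(best_cart_split(features, labels)[0])  # f1
    print(impute([1.0, None, 3.0]))              # [1.0, 2.0, 3.0]
```

In the toy data, `f1` separates the two classes perfectly and `f2` carries no information. Both criteria therefore pick `f1`. For a two-valued attribute, the splits "== a" and "== b" are the same partition, so CART may report either value.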
Year: 2023
Page: 268-272
Language: English