Indexed by:
Abstract:
Over the past two decades, machine learning methods have improved the accuracy of computer-aided tasks. Search engines (Google, Baidu, Bing...) use classification methods to rank the billions of pages available on the World Wide Web. Rankings are produced by algorithms that use various features to classify each page for a given search query. The purpose of this paper is to analyze the performance of various machine learning models applied to features selected through different techniques. A dataset of 28,000 observations with 31 features was evaluated, considering only the features with the highest correlation. To that end, three filter methods (Chi-square, Gini index and Fisher) and three wrapper methods (Forward Selection, Backward Elimination and Bidirectional Elimination) were used. Various classification algorithms were then tested in combination with these filter and wrapper methods. Finally, the combinations were compared to determine the optimal feature sets for predicting whether a URL will appear in Google's top-10 SERP. The research concludes that, for this dataset, the Random Forest model combined with either the Fisher filter method or the Backward Elimination wrapper method produces the best results among the combinations tested. © 2020 ACM.
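As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch that pairs a Random Forest with one filter method (Chi-square) and one wrapper method (Backward Elimination) using scikit-learn. It is not the authors' code: the data is synthetic (generated by `make_classification`, with fewer rows and features than the paper's dataset to keep the run short), and the number of kept features `K`, the forest size, and the helper `make_forest` are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the paper's dataset (assumption: the real data is not available here).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

K = 5  # number of features to keep (hypothetical value)


def make_forest():
    """Small Random Forest; the size is chosen for speed, not taken from the paper."""
    return RandomForestClassifier(n_estimators=10, random_state=0)


# Filter approach: score each feature independently with chi-square, keep the top K.
# chi2 requires non-negative inputs, hence the MinMaxScaler step.
filter_model = make_pipeline(MinMaxScaler(), SelectKBest(chi2, k=K), make_forest())

# Wrapper approach: backward elimination, i.e. start from all features and
# repeatedly drop the one whose removal hurts cross-validated accuracy least.
wrapper_model = make_pipeline(
    SequentialFeatureSelector(
        make_forest(), n_features_to_select=K, direction="backward", cv=3
    ),
    make_forest(),
)

# Compare both combinations with the same cross-validation protocol.
scores = {
    "chi2 filter + random forest": cross_val_score(filter_model, X, y, cv=3).mean(),
    "backward elimination + random forest": cross_val_score(wrapper_model, X, y, cv=3).mean(),
}

for name, accuracy in scores.items():
    print(f"{name}: mean CV accuracy = {accuracy:.3f}")
```

Placing the selector inside the pipeline means feature selection is refit on each training fold, so the reported accuracy is not inflated by selecting features on data that is later used for evaluation.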
Keyword:
Reprint Author's Address:
Email:
Source:
Year: 2020
Page: 84-90
Language: English
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 2
ESI Highly Cited Papers on the List: 0
WanFang Cited Count:
Chinese Cited Count:
Affiliated Colleges: