A Novel Method of Chinese Text Content Analysis and Mining based on Statistical Models

Author：

Jiao, K.

Indexed by：

EI Scopus

Abstract：

With the accumulation of various kinds of text data, it is no longer possible to summarize or classify them by manual reading, so how to use statistical models to mine text data reasonably and effectively has become an important issue in academic research and practical work. This paper discusses three problems of Chinese text mining: word segmentation, keyword extraction and text classification. For the word segmentation problem, the Cascaded Hidden Markov Model and the WDM, which treats the segmentation between words as missing data and solves it with the EM algorithm, are introduced. For the keyword extraction problem, this paper proposes a Bayes factor and introduces CCS using sparse regression. For the text classification problem, two methods are introduced: building a classifier based on the frequency of keywords, and first estimating topic probabilities and then building a classifier on them. We compare these methods on two datasets using SVM and random forest, describe the respective advantages of each method, and make suggestions for their use. © 2023 SPIE.

Keyword：

big data analysis; natural language processing; data mining; Chinese text analysis; machine learning

Author Community：

[1] [Jiao K.] Beijing University of Technology, Beijing, 100124, China

Source ：

ISSN： 0277-786X

Year： 2023

Volume： 12597

Language： English
