Abstract:
The misclassification rate can be used to measure clustering accuracy. Cai and Zhang (2018) establish an upper bound on the misclassification rate for the two-class clustering model $Y_i = \mu l_i + Z_i \in \mathbb{R}^p$, where $l_i \in \{-1, 1\}$ and $Z_i \overset{\text{i.i.d.}}{\sim} N(0, I_p)$ for $i \in \{1, \dots, n\}$, when the vector dimension $p$ is larger than the sample size $n$. The authors prove that their key assumption $\|\mu\|_2 \ge C_{gap} (p/n)^{1/4}$ is necessary for any estimator to be consistent. This paper discusses the same problem with sub-Gaussian noise and $n \ge p$. We first use Cai and Zhang's method to give an upper bound on the misclassification rate for $\|\mu\|_2 \ge C_{gap}$. Then a lower bound on the misclassification rate is provided under some technical conditions; it matches the upper bound up to a constant multiple, which shows that our upper bound is optimal. Examples are given to show that those technical conditions are easily satisfied. Similar to Cai and Zhang's work, we also prove that the assumption $\|\mu\|_2 \ge C_{gap}$ in our upper bound is necessary for any estimator to be consistent. Finally, numerical simulations support our theoretical analysis.
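To make the model in the abstract concrete, the following is a minimal sketch of how data from the two-class model could be simulated and a misclassification rate computed. It assumes Gaussian noise (one special case of sub-Gaussian noise), a signal vector supported on the first coordinate, and a simple spectral estimator that takes the sign of the leading left singular vector; these choices and all function names are illustrative assumptions, not necessarily the procedure or the simulation design used in the paper.

```python
import numpy as np


def simulate(n, p, mu_norm, rng):
    """Draw Y_i = mu * l_i + Z_i for i = 1..n, with Z_i ~ N(0, I_p).

    Assumption: mu points along the first coordinate axis with
    Euclidean norm mu_norm (a hypothetical choice for illustration).
    """
    mu = np.zeros(p)
    mu[0] = mu_norm
    labels = rng.choice([-1, 1], size=n)
    noise = rng.standard_normal((n, p))
    Y = labels[:, None] * mu[None, :] + noise
    return Y, labels


def spectral_labels(Y):
    """Estimate labels as the sign of the leading left singular vector
    of the n x p data matrix (an assumed, simple spectral estimator)."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    est = np.sign(U[:, 0])
    est[est == 0] = 1  # break exact ties arbitrarily
    return est.astype(int)


def misclassification_rate(labels, est):
    """Fraction of mislabeled points, minimized over the global sign
    flip, since the two classes are only identifiable up to swapping."""
    err = np.mean(labels != est)
    return min(err, 1.0 - err)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y, labels = simulate(n=200, p=20, mu_norm=5.0, rng=rng)
    print(misclassification_rate(labels, spectral_labels(Y)))
```

Taking the minimum over the sign flip reflects that a clustering is correct up to relabeling of the two classes, so the rate always lies between 0 and 0.5.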
Source:
JOURNAL OF THE KOREAN STATISTICAL SOCIETY
ISSN: 1226-3192
Year: 2024
Issue: 1
Volume: 54
Page: 110-143
0.600 (JCR@2022)