中文说明:
VC维理论和结构风险最小化准则是统计学习理论中的重要内容,基于这一理论的支持向量机算法由于具有好的泛化性能受到重视,并被研究用于文本分类问题.基于多项式核的研究工作认为SVM的泛化能力不受多项式阶数的影响,并且能够处理很高维的分类问题,用于文本分类无需进行特征选择.研究发现,随着多项式核阶数的升高,SVM文本分类器会出现过学习现象,并且特征数越多越明显,特征选择是必需的.通过估计函数集的VC维,基于结构风险最小化理论对此问题进行分析,得出的结论跟实验结果相符.
English Description:
VC dimension theory and the structural risk minimization (SRM) principle are central components of statistical learning theory. The support vector machine (SVM) algorithm, which is built on this theory, has attracted attention for its good generalization performance and has been studied for text classification. Earlier work with polynomial kernels held that the generalization ability of an SVM is unaffected by the polynomial degree, that SVMs can handle very high-dimensional classification problems, and that feature selection is therefore unnecessary for text classification. This study finds that, as the degree of the polynomial kernel increases, SVM text classifiers overfit, and that the effect becomes more pronounced as the number of features grows, so feature selection is necessary. The problem is analyzed under SRM theory by estimating the VC dimension of the function set, and the resulting conclusions agree with the experimental results.
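To illustrate why the polynomial degree and the number of features matter under structural risk minimization, the following is a minimal sketch and not the estimate used in the paper. It assumes the textbook capacity figure for polynomial classifiers, namely the number of monomials of total degree at most d in n variables, C(n+d, d), and plugs it into Vapnik's VC confidence term sqrt((h(ln(2l/h) + 1) - ln(η/4)) / l). The function names `poly_feature_dim` and `vc_confidence` and the sample sizes are hypothetical, chosen only for this example. Margin-based bounds for SVMs can be much tighter than this count suggests.

```python
from math import comb, log, sqrt


def poly_feature_dim(n_features: int, degree: int) -> int:
    """Number of monomials of total degree <= degree in n_features variables.

    This is the dimension of the space of such polynomials and serves here
    as a crude capacity figure h. It is an assumption for illustration only.
    """
    return comb(n_features + degree, degree)


def vc_confidence(h: int, n_samples: int, eta: float = 0.05) -> float:
    """Vapnik's VC confidence term for capacity h and n_samples examples.

    Returns infinity when h >= n_samples, where the bound is uninformative.
    """
    if h >= n_samples:
        return float("inf")
    return sqrt((h * (log(2 * n_samples / h) + 1) - log(eta / 4)) / n_samples)


if __name__ == "__main__":
    n_samples = 10_000  # hypothetical training-set size
    for n_features in (10, 100, 1000):
        for degree in (1, 2, 3):
            h = poly_feature_dim(n_features, degree)
            term = vc_confidence(h, n_samples)
            print(f"features={n_features:5d} degree={degree} h={h:12d} "
                  f"confidence term={term:.3f}")
```

Under these assumptions the capacity figure grows quickly with both the degree and the number of features, so the confidence term stops being informative unless the feature count is reduced, which is in line with the abstract's argument that feature selection is needed.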