In the era of Big Data, online digital resources are growing rapidly, and short texts such as tweets, comments and messages are growing especially fast. This study compares the category discriminative capacity (CDC) of Chinese language fragments of different granularities, and explores and verifies the feasibility, rationality and effectiveness of low-granularity features, such as Chinese characters, in Chinese short-text classification (CSTC).
This study uses the discipline classification of journal articles from the Chinese Social Science Citation Index (CSSCI) as a simulation environment. After characterizing the distribution of classification features at several granularities (keywords, terms and characters), it compares and evaluates the classification results obtained with the SVM algorithm from three angles: using the same experimental samples, testing before and after feature optimization, and introducing external data.
The granularity of a classification feature has an important impact on CSTC. In general, the larger the granularity, the better the classification result. However, low-granularity features are also feasible: their CDC can be improved by reasonable weight setting, and they can even outperform high-granularity features when classification precision, computational complexity and text coverage are considered together.
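To make the text-coverage criterion concrete, the following is a minimal sketch, not the study's own procedure. It assumes coverage means the share of unseen texts that contain at least one feature from the training vocabulary; the function names, the toy texts, and the whitespace pre-segmentation of terms are all hypothetical.

```python
def coverage(texts, vocabulary, tokenize):
    """Fraction of texts containing at least one feature from the vocabulary."""
    if not texts:
        return 0.0
    covered = sum(
        1 for text in texts
        if any(token in vocabulary for token in tokenize(text))
    )
    return covered / len(texts)


def chars(text):
    # Character granularity: every non-space character is a feature.
    return [c for c in text if not c.isspace()]


def terms(text):
    # Term granularity: assumes the text is already segmented by whitespace.
    return text.split()


# Hypothetical toy data, for illustration only.
train = ["经济 增长 模型", "教育 政策 研究"]
test = ["经济 政策 分析", "语言 教学 方法", "图书 情报 检索"]

char_vocab = {c for text in train for c in chars(text)}
term_vocab = {w for text in train for w in terms(text)}

char_cov = coverage(test, char_vocab, chars)
term_cov = coverage(test, term_vocab, terms)
print(f"character coverage: {char_cov:.2f}, term coverage: {term_cov:.2f}")
```

On this toy data the character vocabulary reaches more of the unseen texts than the term vocabulary, because a short text can share a character with the training set even when it shares no whole term.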
This is the first study to propose that Chinese characters are more suitable as descriptive features in CSTC than terms and keywords, and to demonstrate that the CDC of Chinese character features can be strengthened by using a mix of frequency and position as the weight.
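The abstract does not give the weighting formula, so the sketch below shows only one possible way to mix frequency and position into a character weight. The linear mix, the `alpha` parameter, the first-occurrence position score, and the function name `char_weights` are all assumptions made for illustration.

```python
from collections import Counter


def char_weights(text, alpha=0.5):
    """Weight each character by a linear mix of frequency and position.

    Assumed formula (not from the source):
        weight = alpha * relative_frequency + (1 - alpha) * position_score
    where position_score is higher the earlier a character first appears.
    """
    characters = [c for c in text if not c.isspace()]
    n = len(characters)
    if n == 0:
        return {}

    freq = Counter(characters)
    first_pos = {}
    for i, c in enumerate(characters):
        first_pos.setdefault(c, i)  # keep only the first occurrence

    weights = {}
    for c, count in freq.items():
        relative_frequency = count / n
        position_score = 1.0 - first_pos[c] / n
        weights[c] = alpha * relative_frequency + (1 - alpha) * position_score
    return weights


# Hypothetical short text, for illustration only.
weights = char_weights("经济增长与经济政策")
for c, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(c, round(w, 3))
```

Under these assumptions a character that is both frequent and early in the text, such as 经 here, receives the highest weight. The resulting weights could then serve as feature values for a classifier such as an SVM.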
This paper attempts to use the Chinese Social Science Citation Index (CSSCI) to evaluate Chinese humanities and social science research.
This paper uses CSSCI statistics for 2000‐2004 to analyze the academic impact of researchers, papers and works, institutions and regions on Chinese humanities and social science research.
The authors identify 100 highly cited people, 50 highly cited papers, 50 highly cited works, 20 highly productive institutions and 20 highly cited institutions. Also provided is some regional information about Chinese humanities and social science research.
It is hoped that the CSSCI, as well as the analysis and evaluation based on it, will give researchers a better understanding of Chinese humanities and social science research.