Using user-generated content from a Chinese social media platform, this paper investigates several methods of constructing user profiles and evaluates how effectively each predicts users’ gender, age and geographic location.
This investigation collected 331,634 posts from 4,440 users of Sina Weibo. The data were divided into a training set and a testing set. First, a vector space model and topic models were applied to construct user profiles. A support vector machine was then trained on the training set to learn a classification model. Finally, the classification model was used to predict users’ gender, age and geographic location in the testing set.
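The pipeline described above (profile construction, training, prediction) can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the vector space model (TF-IDF), not the topic models, and it assumes a linear support vector machine trained by Pegasos-style subgradient descent, since the abstract does not state the kernel or solver. The toy posts, the labels (+1 for Beijing, -1 for Shanghai) and all function names are hypothetical, and the posts are assumed to be word-segmented already, which real Weibo text would require a Chinese segmenter to achieve.

```python
import math
from collections import Counter

def tokens_of(posts):
    # Assumption: posts are already word-segmented and space-separated.
    return [tok for post in posts for tok in post.split()]

def fit_idf(token_lists):
    # Smoothed inverse document frequency, with one "document" per user.
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def to_profile(toks, idf):
    # L2-normalised TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(t for t in toks if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def score(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())

def train_linear_svm(profiles, labels, lam=0.01, epochs=20):
    # Hinge-loss subgradient descent (Pegasos-style), no bias term.
    # Labels must be +1 or -1.
    w, step = {}, 0
    for _ in range(epochs):
        for x, y in zip(profiles, labels):
            step += 1
            eta = 1.0 / (lam * step)
            violated = y * score(w, x) < 1.0
            for t in w:                      # L2 regularisation shrink
                w[t] *= 1.0 - eta * lam
            if violated:                     # hinge-loss subgradient step
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w

def predict(w, x):
    return 1 if score(w, x) >= 0.0 else -1

# Hypothetical toy data: each user is a list of posts.
train_users = [
    ["hutong duck", "duck subway"],
    ["bund xiaolongbao", "bund metro"],
    ["subway hutong palace"],
    ["metro xiaolongbao river"],
]
train_labels = [1, -1, 1, -1]  # +1 = Beijing, -1 = Shanghai (illustrative)
test_users = [["duck palace tonight"], ["river bund today"]]

# The IDF weights are fitted on the training users only, then reused for testing.
train_tokens = [tokens_of(u) for u in train_users]
idf = fit_idf(train_tokens)
weights = train_linear_svm([to_profile(t, idf) for t in train_tokens], train_labels)
predictions = [predict(weights, to_profile(tokens_of(u), idf)) for u in test_users]
print(predictions)
```

A multi-class attribute such as an age group would need one such binary classifier per class (one-vs-rest) or a library implementation; that extension is left out to keep the sketch short.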
The results revealed that, in constructing user profiles, latent semantic analysis performed better at predicting gender and age. By contrast, the method based on a traditional vector space model worked better at predicting geographic location. When applying a topic model to construct user profiles, the authors found that different prediction tasks should use different numbers of topics.
This study explores different user profile construction methods to predict Chinese social media network users’ gender, age and geographic location. The results of this paper will help to improve the quality of personal information gathered from social media platforms, and thereby improve personalized recommendation systems and personalized marketing.
This work was supported in part by Major Projects of National Social Science Fund (13&ZD174), the Fundamental Research Funds for the Central Universities (No.30915011323) and the Opening Foundation of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University).
Wang, Q., Ma, S. and Zhang, C. (2017), "Predicting users’ demographic characteristics in a Chinese social media network", The Electronic Library, Vol. 35 No. 4, pp. 758-769. https://doi.org/10.1108/EL-09-2016-0203
Emerald Publishing Limited
Copyright © 2017, Emerald Publishing Limited