
Inducing stock market lexicons from disparate Chinese texts

Futao Zhao (School of Economics and Management, Beihang University, Beijing, China)
Zhong Yao (School of Economics and Management, Beihang University, Beijing, China) (Institute of Economics and Business, Beihang University, Beijing, China)
Jing Luan (School of Economics and Management, Beijing Jiaotong University, Beijing, China)
Hao Liu (School of Business Administration, Northeastern University, Shenyang, China) (Northeastern University at Qinhuangdao, Qinhuangdao, China)

Industrial Management & Data Systems

ISSN: 0263-5577

Article publication date: 2 January 2020

Issue publication date: 22 March 2020


Abstract

Purpose

The purpose of this paper is to propose a methodology to construct a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media outlets.

Design/methodology/approach

This paper presents a novel method for automatically generating financial lexicons from a unique data set comprising news articles, analyst reports and social media posts. Specifically, a keyword-extraction-based method is used to build a high-quality seed lexicon, and an ensemble mechanism is developed to integrate the knowledge derived from the distinct language sources. Two methods, Pointwise Mutual Information and Word2vec, are applied to capture word associations. Finally, an evaluation procedure validates the effectiveness of the method against four traditional lexicons.
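As an illustration of how Pointwise Mutual Information can propagate polarity from a seed lexicon to new words, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes documents are already tokenized (Chinese text would first need a word segmenter), uses document-level co-occurrence, and scores each candidate word by its summed PMI with positive seeds minus its summed PMI with negative seeds. The function name `so_pmi`, the `smoothing` parameter and the toy English corpus are hypothetical and chosen only for this example.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=1.0):
    """Score non-seed words: PMI with positive seeds minus PMI with negative seeds.

    docs      -- list of token lists (assumed pre-tokenized)
    smoothing -- additive constant on joint counts (an assumption of this
                 sketch) so that unseen pairs do not produce log(0)
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each unordered word pair
    for tokens in docs:
        vocab = set(tokens)
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(w, s):
        joint = pair_df[frozenset((w, s))] + smoothing
        return math.log2(joint * n_docs / (word_df[w] * word_df[s]))

    # Ignore seeds that never occur in the corpus to avoid dividing by zero.
    pos = [s for s in pos_seeds if s in word_df]
    neg = [s for s in neg_seeds if s in word_df]

    scores = {}
    for w in word_df:
        if w in pos_seeds or w in neg_seeds:
            continue
        scores[w] = sum(pmi(w, s) for s in pos) - sum(pmi(w, s) for s in neg)
    return scores


# Toy corpus for illustration only; not data from the study.
toy_docs = [
    ["profit", "surge", "rise"],
    ["loss", "plunge", "fall"],
    ["surge", "rise"],
    ["plunge", "fall", "loss"],
]
toy_scores = so_pmi(toy_docs, pos_seeds={"rise"}, neg_seeds={"fall"})
for word, score in sorted(toy_scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:+.3f}")
```

In this sketch a positive score suggests a word co-occurs more with the positive seeds and a negative score with the negative seeds. A Word2vec-based variant would replace `pmi` with a similarity between learned word vectors.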

Findings

The experimental results on three real-world test data sets show that the ensemble lexicons significantly improve sentiment classification performance compared with the four baseline lexicons, suggesting the usefulness of leveraging knowledge derived from diverse media in domain-specific lexicon generation and the corresponding sentiment analysis tasks.

Originality/value

This work appears to be the first to construct financial sentiment lexicons from over 2 million posts and headlines collected from more than one language source. Furthermore, the authors believe that the data set established in this study is one of the largest corpora used for Chinese stock market lexicon acquisition. This work is valuable for extracting collective sentiment from multiple media sources and providing decision-making support for stock market participants.


Acknowledgements

This research is supported by the National Natural Science Foundation of China (Grant Nos. 71271012, 71671011 and 71332003).

Citation

Zhao, F., Yao, Z., Luan, J. and Liu, H. (2020), "Inducing stock market lexicons from disparate Chinese texts", Industrial Management & Data Systems, Vol. 120 No. 3, pp. 508-525. https://doi.org/10.1108/IMDS-04-2019-0254

Publisher

Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited
