Search results
Shutian Ma, Yingyi Zhang and Chengzhi Zhang
Abstract
Purpose
The purpose of this paper is to classify semantic relations between Chinese words into four types: synonyms, antonyms, hyponyms and meronyms.
Design/methodology/approach
Four simple methods are applied: ontology-based, dictionary-based, pattern-based and morpho-syntactic. The authors use a search engine to build lexical and semantic resources for the dictionary-based and pattern-based methods. To improve classification performance with more external resources, they also classify the given word pairs in Chinese and in English at the same time by using machine translation.
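A pattern-based method of the kind mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not list the authors' patterns, so the English templates, the `PATTERNS` table and the `classify_pair` function are hypothetical, and the snippets stand in for text that would be retrieved from a search engine.

```python
import re
from collections import Counter

# Hypothetical English templates, chosen for illustration only.
# The paper's actual (Chinese) patterns are not given in the abstract.
PATTERNS = [
    ("hyponym", r"{b} such as {a}"),
    ("hyponym", r"{a} is a kind of {b}"),
    ("meronym", r"{a} is (?:a )?part of (?:a |the )?{b}"),
    ("antonym", r"neither {a} nor {b}"),
    ("synonym", r"{a}, also known as {b}"),
]


def classify_pair(a, b, snippets):
    """Return the relation whose patterns match most often, or None."""
    votes = Counter()
    for relation, template in PATTERNS:
        # Fill the word pair into the template and count matching snippets.
        regex = re.compile(
            template.format(a=re.escape(a), b=re.escape(b)), re.IGNORECASE
        )
        votes[relation] += sum(1 for s in snippets if regex.search(s))
    if sum(votes.values()) == 0:
        return None  # no pattern fired for this pair
    return votes.most_common(1)[0][0]


snippets = [
    "A fruit such as apple is sweet.",
    "The wheel is part of the car.",
]
print(classify_pair("apple", "fruit", snippets))
print(classify_pair("wheel", "car", snippets))
```

A pair for which no template fires returns `None`, which is where the other methods (ontology, dictionary, morpho-syntactic) would have to take over.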
Findings
Experimental results show that the approach achieved an average F1 score of 50.87 per cent, an average accuracy of 70.36 per cent and an average recall of 40.05 per cent over all classification tasks. Synonym and antonym classification achieved high accuracy, i.e. above 90 per cent. Moreover, the dictionary-based and pattern-based approaches work effectively on the final data set.
Originality/value
For many natural language processing (NLP) tasks, such as information extraction and knowledge graph generation, distinguishing word semantic relations can help to improve system performance. Currently, common methods for this task rely on large corpora for training or on dictionaries and thesauri for inference; their limitations lie in obtaining free access to data and in keeping the built lexical resources up to date. This paper builds a preliminary system for classifying Chinese word semantic relations by seeking new ways to obtain external resources efficiently.
Nora Madi, Rawan Al-Matham and Hend Al-Khalifa
Abstract
Purpose
The purpose of this paper is to provide an overall review of the grammar checking and relation extraction (RE) literature, its techniques and the associated open challenges, and to suggest future directions.
Design/methodology/approach
The review of grammar checking and RE was carried out using the following protocol: we prepared research questions, planned a search strategy, defined paper selection criteria to distinguish relevant works, extracted data from these works and, finally, analyzed and synthesized the data.
Findings
The output of error detection models could be used to create a profile of a particular writer. Such profiles can be used for author identification, native language identification or even estimating the writer's level of education, to name a few. The automatic extraction of relations could be used to build or complete electronic lexical thesauri and knowledge bases.
Originality/value
Grammar checking is the process of detecting and sometimes correcting erroneous words in the text, while RE is the process of detecting and categorizing predefined relationships between entities or words that were identified in the text. The authors found that the most obvious challenge is the lack of data sets, especially for low-resource languages. Also, the lack of unified evaluation methods hinders the ability to compare results.
Futao Zhao, Zhong Yao, Jing Luan and Hao Liu
Abstract
Purpose
The purpose of this paper is to propose a methodology to construct a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media outlets.
Design/methodology/approach
This paper presents a novel method to automatically generate financial lexicons using a unique data set that comprises news articles, analyst reports and social media. Specifically, a novel method based on keyword extraction is used to build a high-quality seed lexicon and an ensemble mechanism is developed to integrate the knowledge derived from distinct language sources. Meanwhile, two different methods, Pointwise Mutual Information and Word2vec, are applied to capture word associations. Finally, an evaluation procedure is performed to validate the effectiveness of the method compared with four traditional lexicons.
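As a rough illustration of how Pointwise Mutual Information captures word associations, the sketch below computes document-level PMI, log2(p(w1, w2) / (p(w1) p(w2))), from co-occurrence counts. The `pmi_table` function and the toy English tokens are assumptions made for illustration; the abstract does not specify how the authors segmented text, defined co-occurrence windows or combined PMI with Word2vec.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Build a PMI lookup from tokenized documents (lists of words)."""
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in documents:
        vocab = set(tokens)  # count each word once per document
        word_counts.update(vocab)
        pair_counts.update(
            frozenset(pair) for pair in combinations(sorted(vocab), 2)
        )

    def pmi(w1, w2):
        joint = pair_counts[frozenset((w1, w2))]
        if joint == 0:
            return float("-inf")  # the two words never co-occur
        # log2( p(w1, w2) / (p(w1) * p(w2)) ) with document-level estimates
        return math.log2(joint * n_docs / (word_counts[w1] * word_counts[w2]))

    return pmi


# Toy corpus (hypothetical tokens, not from the paper's data set).
docs = [
    ["stock", "surge", "gain"],
    ["stock", "plunge", "loss"],
    ["surge", "gain"],
    ["plunge", "loss"],
]
pmi = pmi_table(docs)
print(pmi("surge", "gain"), pmi("stock", "surge"), pmi("surge", "loss"))
```

In a lexicon-building setting, a candidate word would typically be scored by comparing its PMI with positive seed words against its PMI with negative seed words; that step is omitted here.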
Findings
The experimental results from the three real-world testing data sets show that the ensemble lexicons can significantly improve sentiment classification performance compared with the four baseline lexicons, suggesting the usefulness of leveraging knowledge derived from diverse media in domain-specific lexicon generation and corresponding sentiment analysis tasks.
Originality/value
This work appears to be the first to construct financial sentiment lexicons from over 2m posts and headlines collected from more than one language source. Furthermore, the authors believe that the data set established in this study is one of the largest corpora used for Chinese stock market lexicon acquisition. This work is valuable for extracting collective sentiment from multiple media sources and for providing decision-making support to stock market participants.
Junping Qiu and Wen Lou
Abstract
Purpose
The purpose of this study is to construct a Chinese information science resource ontology and to explore a new method for semiautomatic ontology construction.
Design/methodology/approach
More than 8,290 articles indexed in the Chinese Social Science Citation Index (CSSCI), covering the years 2001 to 2010, were included in this study. Statistical analysis, co-occurrence analysis, and semantic similarity methods were applied to the selected articles. The ontology was built using existing construction principles and methods, as well as categories and hierarchy definitions based on CSSCI indexing fields.
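Co-occurrence analysis over indexing keywords can be sketched as below. This is a minimal example under assumptions: the abstract does not say which co-occurrence measure was used, so the equivalence index c_ij^2 / (c_i * c_j) from co-word analysis is swapped in here, and the function name and sample keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_strength(keyword_lists):
    """Map each keyword pair to its equivalence index c_ij^2 / (c_i * c_j)."""
    freq = Counter()   # number of articles containing each keyword
    pairs = Counter()  # number of articles containing both keywords
    for keywords in keyword_lists:
        unique = sorted(set(keywords))
        freq.update(unique)
        pairs.update(combinations(unique, 2))
    return {
        (a, b): count * count / (freq[a] * freq[b])
        for (a, b), count in pairs.items()
    }


# Hypothetical keyword fields for three articles.
articles = [
    ["ontology", "semantic web"],
    ["ontology", "semantic web", "retrieval"],
    ["retrieval"],
]
strength = cooccurrence_strength(articles)
print(strength)
```

Pairs that always appear together score 1.0, and pairs that rarely co-occur relative to their individual frequencies score close to 0, which gives a quantitative basis for proposing relations between concepts.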
Findings
Seven categories were found to be relevant for the Chinese information science resource ontology, which, in this study, consists of a three-tier architecture, 78,291 instances, and 182,109 pairs of semantic relations. These results indicate the following: further improvements are required in ontology construction methods; resource ontology is a breakthrough concept in ontology studies; the combination of semantic similarities and co-occurrence analysis can quantitatively describe relationships between concepts.
Originality/value
This study pioneers the resource ontology concept. It is one of the first to combine informetric methods with semantic similarity to reveal deep relationships in textual data.
Abstract
Purpose
In recent years, Chinese sentiment analysis has made great progress, but the characteristics of the language itself and downstream task requirements were not explored thoroughly. It is not practical to directly migrate achievements obtained in English sentiment analysis to the analysis of Chinese because of the huge difference between the two languages.
Design/methodology/approach
In view of the particularity of Chinese text and the requirement of sentiment analysis, a Chinese sentiment analysis model integrating multi-granularity semantic features is proposed in this paper. This model introduces the radical and part-of-speech features based on the character and word features, with the application of bidirectional long short-term memory, attention mechanism and recurrent convolutional neural network.
Findings
The comparative experiments showed that the F1 values of this model reach 88.28 and 84.80 per cent on the man-made dataset and the NLPECC dataset, respectively. Meanwhile, an ablation experiment was conducted to verify the effectiveness of the attention mechanism and of the part-of-speech, radical, character and word features in Chinese sentiment analysis. The performance of the proposed model exceeds that of existing models to some extent.
Originality/value
The academic contribution of this paper is as follows: first, in view of the particularity of Chinese texts and the requirement of sentiment analysis, this paper focuses on solving the deficiency problem of Chinese sentiment analysis under the big data context. Second, this paper borrows ideas from multiple interdisciplinary frontier theories and methods, such as information science, linguistics and artificial intelligence, which makes it innovative and comprehensive. Finally, this paper deeply integrates multi-granularity semantic features such as character, word, radical and part of speech, which further complements the theoretical framework and method system of Chinese sentiment analysis.
Abstract
The recent report for the Commission of the European Communities on current multilingual activities in the field of scientific and technical information and the 1977 conference on the same theme both included substantial sections on operational and experimental machine translation systems, and in its Plan of action the Commission announced its intention to introduce an operational machine translation system into its departments and to support research projects on machine translation. This revival of interest in machine translation may well have surprised many who have tended in recent years to dismiss it as one of the ‘great failures’ of scientific research. What has changed? What grounds are there now for optimism about machine translation? Or is it still a ‘utopian dream’? The aim of this review is to give a general picture of present activities which may help readers to reach their own conclusions. After a sketch of the historical background and general aims (section I), it describes operational and experimental machine translation systems of recent years (section II), it continues with descriptions of interactive (man‐machine) systems and machine‐assisted translation (section III), and it concludes with a general survey of present problems and future possibilities (section IV).
Abstract
Purpose
The identification of network user relationships in Fancircle contributes to quantifying the violence index of user text and to mining the internal correlation of network behaviors among users, which provides necessary data support for the construction of a knowledge graph.
Design/methodology/approach
A correlation identification method based on sentiment analysis (CRDM-SA) is put forward, which extracts user semantic information and introduces violent sentiment membership. Specifically, user text information is extracted and, based on a self-built domain violent sentiment dictionary (VSD), the topics for topology mapping in the community are obtained. Afterward, the violence index of the user text is calculated to quantify the fuzzy sentiment representation between the user and the topic. Finally, multi-granularity mining of violence association rules in user text is realized by constructing a violence fuzzy concept lattice.
Findings
The method helps to reveal the internal relationships of online violence in a complex network environment, so that the sentiment dependence of users can be characterized from a granular perspective.
Originality/value
The membership degree of violent sentiment is introduced into user relationship recognition in the Fancircle community, and a text sentiment association recognition method based on VSD is proposed. By calculating the value of violent sentiment in the user text, the annotation of violent sentiment in the topic dimension of the text is achieved, and the partial order relation between fuzzy concepts of violence under the effective confidence threshold is used to obtain the association relation.
Abstract
Purpose
The purpose of this study is to examine the influence of semantic fluency on consumers' aesthetic evaluation in graphic designs with text and the mediating effect of visual complexity in this relationship.
Design/methodology/approach
The hypotheses are examined in three experiments. Experiments 1 and 2 both verify that Chinese consumers rate designs with low (vs high) semantic fluency words as more beautiful, and Experiment 3 further confirms this effect in non-Chinese speakers.
Findings
Confirmed by Chinese and non-Chinese consumers, high fluency text leads to lower perceived visual complexity and less aesthetic perception of the entire design.
Research limitations/implications
The findings enrich the theory of beauty standards and challenge the positive relationship between processing fluency and aesthetic pleasure. The findings are limited to the decorative function of text and do not discuss how designers should strike a balance when the informational function of text is equally important.
Originality/value
This study is the first to discuss how designs with text influence consumers' aesthetic perception and provides meaningful guidelines of transnational marketing for fashion designers and enterprises.
Junsheng Zhang, Yunchuan Sun and Changqing Yao
Abstract
Purpose
This paper aims to semantically link the scientific research events implied by scientific and technical literature to support information analysis and information service applications. Literature research is an important method of acquiring scientific and technical information, which is important for research, development and innovation in science and technology. It is difficult but urgently required to acquire accurate, timely, concise and comprehensive information from the large-scale and fast-growing literature, especially in the big data era. Existing literature-based information retrieval systems focus on basic data organization, and they are far from meeting the needs of information analytics. It has become urgent to organize and analyze the scientific research events related to scientific and technical literature in order to forecast development trends in science and technology.
Design/methodology/approach
Scientific literature such as a paper or a patent is represented as a scientific research event, which contains elements including when, where, who, what, how and why. Metadata of literature is used to formulate scientific research events that are implied in introduction and related work sections of literature. Named entities and research objects such as methods, materials and algorithms can be extracted from texts of literature by using text analysis. The authors semantically link scientific research events, entities and objects, and then, they construct the event space for supporting scientific and technical information analysis.
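The event representation described above can be sketched as a small data structure. This is an illustration under assumptions: the `ResearchEvent` class, the `link_events` function, the two link types and the sample values are all hypothetical, since the abstract names the six elements (when, where, who, what, how, why) but not the authors' schema or linking rules.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ResearchEvent:
    """One paper or patent viewed as an event (hypothetical schema)."""
    when: str
    where: str
    who: List[str]
    what: str
    how: List[str] = field(default_factory=list)  # methods, materials, algorithms
    why: str = ""


def link_events(events: List[ResearchEvent]) -> List[Tuple[str, str, str]]:
    """Link every pair of events that shares an author or a method."""
    links = []
    for i, first in enumerate(events):
        for second in events[i + 1:]:
            if set(first.who) & set(second.who):
                links.append((first.what, second.what, "shares_author"))
            if set(first.how) & set(second.how):
                links.append((first.what, second.what, "shares_method"))
    return links


# Hypothetical sample events.
e1 = ResearchEvent("2015", "Beijing", ["Zhang"], "paper A", ["text analysis"])
e2 = ResearchEvent("2016", "Beijing", ["Zhang", "Sun"], "patent B", ["clustering"])
links = link_events([e1, e2])
print(links)
```

The resulting triples form the edges of a simple link network over events; richer relation types, such as citation or temporal order, would be added in the same way.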
Findings
This paper represents scientific literature as events, which are coarse-grained units compared with the entities and relations used in current information organization. Events and the semantic relations among them together form a semantic link network, which could support event-centric information browsing, search and recommendation.
Research limitations/implications
The proposed model is theoretical, and its efficiency needs to be verified in further experimental application research. The evaluation and applications of the semantic link network of scientific research events are issues for further research.
Originality/value
This paper regards scientific literature as scientific research events and proposes an approach to semantically link events into a network with multiple types of entities and relations. According to the needs of scientific and technical information analysis, scientific research events are organized into event cubes, which are distributed in a three-dimensional space for ease of understanding and information visualization.