Search results

1 – 10 of over 6000
Article
Publication date: 30 January 2023

Zhongbao Liu and Wenjuan Zhao

Abstract

Purpose

In recent years, Chinese sentiment analysis has made great progress, but the characteristics of the language itself and downstream task requirements were not explored thoroughly. It is not practical to directly migrate achievements obtained in English sentiment analysis to the analysis of Chinese because of the huge difference between the two languages.

Design/methodology/approach

In view of the particularity of Chinese text and the requirement of sentiment analysis, a Chinese sentiment analysis model integrating multi-granularity semantic features is proposed in this paper. This model introduces the radical and part-of-speech features based on the character and word features, with the application of bidirectional long short-term memory, attention mechanism and recurrent convolutional neural network.

Findings

The comparative experiments showed that the F1 values of this model reach 88.28 and 84.80 per cent on the man-made dataset and the NLPECC dataset, respectively. Meanwhile, an ablation experiment was conducted to verify the effectiveness of the attention mechanism, part-of-speech, radical, character and word factors in Chinese sentiment analysis. The performance of the proposed model exceeds that of existing models to some extent.

Originality/value

The academic contribution of this paper is as follows: first, in view of the particularity of Chinese texts and the requirement of sentiment analysis, this paper focuses on solving the deficiency problem of Chinese sentiment analysis under the big data context. Second, this paper borrows ideas from multiple interdisciplinary frontier theories and methods, such as information science, linguistics and artificial intelligence, which makes it innovative and comprehensive. Finally, this paper deeply integrates multi-granularity semantic features such as character, word, radical and part of speech, which further complements the theoretical framework and method system of Chinese sentiment analysis.

Details

Data Technologies and Applications, vol. 57 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 7 August 2017

Hao Wang and Sanhong Deng

Abstract

Purpose

In the era of Big Data, network digital resources are growing rapidly; short-text resources in particular, such as tweets, comments and messages, are showing vigorous vitality. This study aims to compare the category discriminative capacity (CDC) of Chinese language fragments with different granularities and to explore and verify the feasibility, rationality and effectiveness of low-granularity features, such as Chinese characters, in Chinese short-text classification (CSTC).

Design/methodology/approach

This study takes discipline classification of journal articles from CSSCI as a simulation environment. On the basis of sorting out the distribution rules of classification features with various granularities, including keywords, terms and characters, the classification effects accessed by the SVM algorithm are comprehensively compared and evaluated from three angles of using the same experiment samples, testing before and after feature optimization, and introducing external data.

Findings

The granularity of a classification feature has an important impact on CSTC. In general, the larger the granularity is, the better the classification result is, and vice versa. However, a low-granularity feature is also feasible, and its CDC could be improved by reasonable weight setting, even exceeding a high-granularity feature if synthetically considering classification precision, computational complexity and text coverage.

Originality/value

This is the first study to propose that Chinese characters are more suitable as descriptive features in CSTC than terms and keywords and to demonstrate that CDC of Chinese character features could be strengthened by mixing frequency and position as weight.

Article
Publication date: 29 March 2024

Sihao Li, Jiali Wang and Zhao Xu

Abstract

Purpose

The compliance checking of Building Information Modeling (BIM) models is crucial throughout the lifecycle of construction. The increasing amount and complexity of information carried by BIM models have made compliance checking more challenging, and manual methods are prone to errors. Therefore, this study aims to propose an integrative conceptual framework for automated compliance checking of BIM models, allowing for the identification of errors within BIM models.

Design/methodology/approach

This study first analyzed typical building standards in the fields of architecture and fire protection, and then developed an ontology of their elements. Based on this, a building standard corpus is built, and deep learning models are trained to automatically label the building standard texts. Neo4j is used for knowledge graph construction and storage, and a data extraction method based on Dynamo is designed to obtain checking data files. After that, a matching algorithm is devised to express the logical rules of knowledge graph triples, resulting in automated compliance checking for BIM models.
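The abstract does not describe the matching algorithm itself, so the following is only a rough sketch of how rule triples might be matched against extracted model data. The triple layout `(element type, property, (comparator, limit))`, the element dictionaries and the function name `check_compliance` are all hypothetical and are not taken from the paper.

```python
import operator

# Hypothetical comparators a rule triple may use.
COMPARATORS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def check_compliance(rule_triples, elements):
    """Return one (element_id, property, actual, comparator, limit) tuple per violation."""
    violations = []
    for element in elements:
        for subject, prop, (comparator, limit) in rule_triples:
            # A rule applies only to elements of the matching type that carry the property.
            if element["type"] != subject or prop not in element["properties"]:
                continue
            actual = element["properties"][prop]
            if not COMPARATORS[comparator](actual, limit):
                violations.append((element["id"], prop, actual, comparator, limit))
    return violations


# Illustrative values only; they are not drawn from any building standard.
rules = [("Door", "width_m", (">=", 0.9))]
model_elements = [
    {"id": "D1", "type": "Door", "properties": {"width_m": 0.8}},
    {"id": "D2", "type": "Door", "properties": {"width_m": 1.0}},
    {"id": "W1", "type": "Wall", "properties": {"thickness_m": 0.2}},
]
violations = check_compliance(rules, model_elements)
```

In this sketch the rules are plain tuples; in a setup like the one the abstract describes, they would presumably be read from the knowledge graph rather than written by hand.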

Findings

Case validation results showed that this theoretical framework can achieve the automatic construction of domain knowledge graphs and automatic checking of BIM model compliance. Compared with traditional methods, this method has a higher degree of automation and portability.

Originality/value

This study introduces knowledge graphs and natural language processing technology into the field of BIM model checking and completes the automated process of constructing domain knowledge graphs and checking BIM model data. Its functionality and usability are validated through two case studies on a self-developed BIM checking platform.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 6 February 2017

Liangjun Zhou, Jerred Junqi Wang, Xiaoying Chen, Chundong Lei, James J. Zhang and Xiao Meng

Abstract

Purpose

Building upon the framework of glocalization, the purpose of this paper is to summarize the development of the National Basketball Association (NBA) in the Chinese market, explore its successes and failures, and propose strategies of glocalization for the NBA as well as other overseas sport leagues.

Design/methodology/approach

The current case study was organized by summarizing the developmental history of the NBA in China, analyzing its current promotional practices, investigating its marketing strategies, and extrapolating practical references for other sport leagues aiming to penetrate the Chinese marketplace.

Findings

The current case study concluded that when facing the current challenges, the NBA needs to bring authentic American cultural commodities while adding Chinese characteristics to accommodate local fans. Meanwhile, the NBA management needs to continue seeking ways to work out and through the differences in government models and cultural contexts between China and USA. In addition, this study suggested that the research framework of glocalization would be an ever intriguing inquiry needed for other sport organizations or leagues seeking expansion to overseas markets.

Originality/value

A thorough case study with the NBA that has achieved huge successes in Chinese markets will provide valuable implications for sport leagues to broaden their overseas markets.

Details

International Journal of Sports Marketing and Sponsorship, vol. 18 no. 1
Type: Research Article
ISSN: 1464-6668

Article
Publication date: 10 June 2014

Ping Bao and Suoling Zhu

Abstract

Purpose

The purpose of this paper is to present a system for recognition of location names in ancient books written in languages, such as Chinese, in which proper names are not signaled by an initial capital letter.

Design/methodology/approach

Rule-based and statistical methods were combined to develop a set of rules for identification of product-related location names in the local chronicles of Guangdong. A name recognition system, with functions of document management, information extraction and storage, rule management, location name recognition, and inquiry and statistics, was developed using Microsoft's .NET framework, SQL Server 2005, ADO.NET and XML. The system was evaluated with precision ratio, recall ratio and the comprehensive index, F.
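As a small illustration of the evaluation measures named above, the sketch below computes precision, recall and F over sets of recognized names. It assumes that the "comprehensive index, F" is the balanced F1 (harmonic mean of precision and recall), which the abstract does not state; the function name and sample values are hypothetical.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F1) for predicted names against gold-standard names."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Assumption: balanced F (F1), the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder names: four predictions, three gold names, two in common.
precision, recall, f1 = precision_recall_f({"a", "b", "c", "d"}, {"a", "b", "e"})
```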

Findings

The system was quite successful at recognizing product-related location names (F was 71.8 percent), demonstrating the potential for application of automatic named entity recognition techniques in digital collation of ancient books such as local chronicles.

Research limitations/implications

Results suffered from limitations in initial digitization of the text. Statistical methods, such as the hidden Markov model, should be combined with an extended set of recognition rules to improve recognition scores and system efficiency.

Practical implications

Electronic access to local chronicles by location name saves time for chorographers and provides researchers with new opportunities.

Social implications

Named entity recognition brings previously isolated ancient documents together in a knowledge base of scholarly and cultural value.

Originality/value

Automatic name recognition can be implemented in information extraction from ancient books in languages other than English. The system described here can also be adapted to modern texts and other named entities.

Article
Publication date: 19 February 2018

Qiujun Lan, Haojie Ma and Gang Li

Abstract

Purpose

Sentiment identification of Chinese text faces many challenges, such as requiring complex preprocessing steps, preparing various word dictionaries carefully and dealing with a lot of informal expressions, which lead to high computational complexity.

Design/methodology/approach

A method based on Chinese characters instead of words is proposed. This method represents the text as a fixed-length vector and introduces the chi-square statistic to measure the categorical sentiment score of a Chinese character. Based on these, sentiment identification can be accomplished through four main steps.
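The abstract does not give the scoring formula, so the following is a minimal sketch using the standard chi-square statistic for a 2×2 contingency table (character present or absent versus target class or other class). The function name, the document-level counting and the toy corpus are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def chi_square_scores(documents, labels, target_label):
    """Score each character by its chi-square association with target_label."""
    n = len(documents)
    n_target = sum(1 for label in labels if label == target_label)
    in_target, in_other = Counter(), Counter()
    for text, label in zip(documents, labels):
        # Count each character at most once per document.
        (in_target if label == target_label else in_other).update(set(text))
    scores = {}
    for char in set(in_target) | set(in_other):
        a = in_target[char]        # target-class documents containing the character
        b = in_other[char]         # other documents containing the character
        c = n_target - a           # target-class documents without the character
        d = (n - n_target) - b     # other documents without the character
        denominator = (a + b) * (c + d) * (a + c) * (b + d)
        scores[char] = n * (a * d - b * c) ** 2 / denominator if denominator else 0.0
    return scores


# Toy corpus, invented for illustration.
docs = ["好棒", "好看", "差劲", "难看"]
sentiments = ["pos", "pos", "neg", "neg"]
scores = chi_square_scores(docs, sentiments, "pos")
```

The statistic is unsigned: a high score means the character is strongly associated with one class or the other, so a separate step (for example, comparing `a` and `b`) would be needed to decide the polarity.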

Findings

Experiments on corpora with various themes indicate that the proposed method performs slightly worse than existing Chinese word-based methods on most texts, but better on short and informal texts. In particular, the computational complexity of the proposed method is far lower than that of word-based methods.

Originality/value

The proposed method exploits the property of Chinese characters being linguistic units that carry semantic information. In contrast to word-based methods, the computational efficiency of this method is significantly improved at a slight loss of accuracy. It is more concise and avoids the problems that result from preparing predefined dictionaries and from various data preprocessing steps.

Details

Information Discovery and Delivery, vol. 46 no. 1
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 23 November 2012

Vincent Tam

Abstract

Purpose

Learning Chinese is unquestionably very important and popular worldwide with the fast economic growth of China. To most foreigners and also local students, one of the major challenges in learning Chinese is to write Chinese characters in correct stroke sequences that are considered as significant in the Chinese culture. However, due to the potentially complicated structures of Chinese characters together with their stroke sequences, there are very few character recognition techniques that can effectively tackle the involved training task in an efficient and flexible manner. The purpose of this paper is to propose an intelligent and flexible e‐learning software based on learning objects to facilitate the learning of writing Chinese characters in correct stroke sequences.

Design/methodology/approach

The paper adopts an incremental approach in designing the overall system architecture to emphasize the extendibility of the system. The basic features of the system, including the evolution and pronunciation of each Chinese character, can be embedded as part of the learning object metadata to enhance students' understanding of Chinese characters. To demonstrate the feasibility of this proposal, a prototype of the proposed e‐learning software was built on smartphones so that students can learn anytime and anywhere.

Findings

From the empirical evaluation of the e‐learning prototype for learning to write correct Chinese characters on mobile devices, it was found that foreign students can learn and practise the writing more effectively anytime and anywhere on their mobile devices after classes. Some initial positive feedback was collected. Furthermore, a more careful and thorough evaluation is planned to be conducted in relevant courses for foreign students in the upcoming Fall semester.

Originality/value

This proposal represents the first attempt to reduce the complexity while increasing the extendibility of the e‐learning software to learn Chinese through learning objects running on smartphones or mobile devices in general. More importantly, it opens up numerous opportunities for further investigations including possible integrations with other existing Chinese e‐learning systems.

Details

Interactive Technology and Smart Education, vol. 9 no. 4
Type: Research Article
ISSN: 1741-5659

Article
Publication date: 3 June 2019

Chih-Ming Chen and Chung Chang

Abstract

Purpose

With the rapid development of digital humanities, some digital humanities platforms have been successfully developed to support digital humanities research for humanists. However, most of them still do not provide a friendly digital reading environment or a practicable social network analysis tool to support humanists in interpreting texts and exploring characters’ social network relationships. Moreover, the advancement of digitization technologies for the retrieval and use of Chinese ancient books is giving rise to an unprecedented challenge and opportunity. For these reasons, this paper aims to present a Chinese ancient books digital humanities research platform (CABDHRP) to support historical China studies. In addition to providing digital archives, digital reading, basic search and advanced search functions for Chinese ancient books, this platform also provides two novel functions that can more effectively support digital humanities research: an automatic text annotation system (ATAS) for interpreting texts and a character social network relationship map tool (CSNRMT) for exploring characters’ social network relationships.

Design/methodology/approach

This study adopted DSpace, an open-source institutional repository system, to serve as a digital archives system for archiving scanned images, metadata, and full texts to develop the CABDHRP for supporting digital humanities (DH) research. Moreover, the ATAS developed in the CABDHRP used the Node.js framework to implement the system’s front- and back-end services, as well as application programming interfaces (APIs) provided by different databases, such as China Biographical Database (CBDB) and TGAZ, used to retrieve the useful linked data (LD) sources for interpreting ancient texts. Also, Neo4j which is an open-source graph database management system was used to implement the CSNRMT of the CABDHRP. Finally, JavaScript and jQuery were applied to develop a monitoring program embedded in the CABDHRP to record the use processes from humanists based on xAPI (experience API). To understand the research participants’ perception when interpreting the historical texts and characters’ social network relationships with the support of ATAS and CSNRMT, semi-structured interviews with 21 research participants were conducted.

Findings

An ATAS embedded in the reading interface of CABDHRP can collect resources from different databases through LD for automatically annotating ancient texts to support digital humanities research. It allows humanists to refer to resources from diverse databases when interpreting ancient texts, and provides a friendly text annotation reader for humanists to interpret ancient texts through reading. Additionally, the CSNRMT provided by the CABDHRP can semi-automatically identify characters’ names based on Chinese word segmentation technology, with humanists’ support to confirm and analyze characters’ social network relationships from Chinese ancient books by visualizing characters’ social networks as a knowledge graph. The CABDHRP can not only stimulate humanists to explore new viewpoints in humanistic research, but also foster the public's interest in and awareness of Chinese ancient books.

Originality/value

This study proposed a novel CABDHRP that provides advanced features, including automatic word segmentation of Chinese text, automatic Chinese text annotation, semi-automatic character social network analysis and user behavior analysis, that distinguish it from other existing digital humanities platforms. Currently, no digital humanities platform of this kind has been developed to support humanists' digital humanities research.

Details

The Electronic Library, vol. 37 no. 2
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 May 2007

Fuchun Peng and Xiangji Huang

Abstract

Purpose

The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task.

Design/methodology/approach

Naïve Bayes, maximum entropy model, support vector machines, and language modeling approaches were implemented and were applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were investigated and applied to Chinese text. A segmentation‐based approach was compared with the non‐segmentation‐based approach.
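To make the non-segmentation-based language modeling idea concrete, here is a minimal sketch of a classifier that trains one character-bigram language model per class and assigns a text to the class whose model gives it the highest likelihood. The bigram order, add-one smoothing, uniform class priors and the class name `CharBigramClassifier` are all assumptions for illustration; the abstract does not specify the models used in the paper.

```python
import math
from collections import Counter, defaultdict


class CharBigramClassifier:
    """One add-one-smoothed character-bigram language model per class."""

    def __init__(self):
        self.bigrams = defaultdict(Counter)   # label -> counts of (previous, current)
        self.contexts = defaultdict(Counter)  # label -> counts of previous character
        self.vocab = set()

    @staticmethod
    def _pairs(text):
        padded = "^" + text  # "^" marks the start of the text
        return zip(padded, padded[1:])

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            for prev, cur in self._pairs(text):
                self.bigrams[label][(prev, cur)] += 1
                self.contexts[label][prev] += 1
                self.vocab.add(cur)
        return self

    def log_likelihood(self, text, label):
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen characters
        total = 0.0
        for prev, cur in self._pairs(text):
            numerator = self.bigrams[label][(prev, cur)] + 1
            denominator = self.contexts[label][prev] + vocab_size
            total += math.log(numerator / denominator)
        return total

    def predict(self, text):
        # Uniform class priors are assumed, so only the likelihood is compared.
        return max(self.bigrams, key=lambda label: self.log_likelihood(text, label))


# Toy training data, invented for illustration.
classifier = CharBigramClassifier().fit(
    ["股票上涨", "股市下跌", "球队获胜", "球员受伤"],
    ["finance", "finance", "sports", "sports"],
)
```

Because the model operates on raw characters, no word segmenter is needed, which is the property that makes this family of approaches attractive for Chinese and Japanese text.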

Findings

There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually classification performance stops improving, and can in fact even decrease, after a certain level of segmentation accuracy.

Practical implications

Applying the findings to real web text classification is ongoing work.

Originality/value

The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.

Details

Journal of Documentation, vol. 63 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 5 May 2023

Ying Yu and Jing Ma

Abstract

Purpose

Tender documents, an essential data source for internet-based logistics tendering platforms, incorporate massive fine-grained data, ranging from information on the tenderee to shipping locations and shipping items. Automated information extraction in this area is, however, under-researched, making the extraction process a time- and effort-consuming one. For Chinese logistics tender entities, in particular, existing named entity recognition (NER) solutions are mostly unsuitable as they involve domain-specific terminologies and possess different semantic features.

Design/methodology/approach

To tackle this problem, a novel lattice long short-term memory (LSTM) model, combining a variant contextual feature representation and a conditional random field (CRF) layer, is proposed in this paper for identifying valuable entities from logistic tender documents. Instead of traditional word embedding, the proposed model uses the pretrained Bidirectional Encoder Representations from Transformers (BERT) model as input to augment the contextual feature representation. Subsequently, with the Lattice-LSTM model, the information of characters and words is effectively utilized to avoid segmentation errors.

Findings

The proposed model is then verified by the Chinese logistic tender named entity corpus. Moreover, the results suggest that the proposed model excels in the logistics tender corpus over other mainstream NER models. The proposed model underpins the automatic extraction of logistics tender information, enabling logistic companies to perceive the ever-changing market trends and make far-sighted logistic decisions.

Originality/value

(1) A practical model for logistic tender NER is proposed in the manuscript. By employing and fine-tuning BERT into the downstream task with a small amount of data, the experiment results show that the model has a better performance than other existing models. This is the first study, to the best of the authors' knowledge, to extract named entities from Chinese logistic tender documents. (2) A real logistic tender corpus for practical use is constructed and a program of the model for online-processing real logistic tender documents is developed in this work. The authors believe that the model will facilitate logistic companies in converting unstructured documents to structured data and further perceive the ever-changing market trends to make far-sighted logistic decisions.

Details

Data Technologies and Applications, vol. 58 no. 1
Type: Research Article
ISSN: 2514-9288
