Search results

1 – 10 of over 51000
Article
Publication date: 21 January 2019

Issa Alsmadi and Keng Hoon Gan

Abstract

Purpose

Rapid developments in social networks and their usage in everyday life have caused an explosion in the amount of short electronic documents. Thus, classifying this type of document based on its content has significant implications for many applications, and sorting these documents into relevant classes according to their text content matters for many practical reasons. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer reviews and many other applications related to social networks. Reviews of short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and the challenges and difficulties of classifying it. The paper attempts to introduce all the principal stages of classification, the techniques used in each stage and the possible development trends in each stage.

Design/methodology/approach

The paper is a review of the main aspects of short-text classification. It is structured according to the stages of the classification task.

Findings

This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges of short texts and avoid poor classification accuracy. Low performance can be addressed with optimization methods such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing solutions such as fuzzy logic make short-text problems a promising area of research.
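The genetic-algorithm route to feature selection mentioned above can be illustrated with a minimal sketch, assuming features are chosen with a bit mask and that some fitness score is available (in practice, perhaps the validation accuracy of a classifier trained on the kept features). The fitness function `toy_fitness` and its set of "useful" features are hypothetical; tournament selection, one-point crossover, bit-flip mutation and elitism are generic GA choices, not the review's specific method.

```python
import random

def ga_select_features(n_features, fitness, pop_size=20, generations=30,
                       mutation_rate=0.05, seed=0):
    """Evolve feature masks (1 = keep the feature) to maximise `fitness`."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]

    def tournament():
        # Binary tournament: the fitter of two random masks becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each bit flips with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history

# Hypothetical stand-in for classifier accuracy: features 0, 2 and 5 help,
# and every other kept feature costs a little.
USEFUL = {0, 2, 5}

def toy_fitness(mask):
    return sum(1.0 if i in USEFUL else -0.1 for i, bit in enumerate(mask) if bit)

best_mask, history = ga_select_features(8, toy_fitness)
```

Because of elitism, the best fitness in `history` never decreases. With a real classifier as the fitness function, each call means retraining, so caching scores per mask is worth considering.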

Originality/value

Using a powerful short-text classification method significantly improves the efficiency of many applications. Current solutions still have low performance, implying the need for improvement. This paper discusses related issues and approaches to these problems.

Details

International Journal of Web Information Systems, vol. 15 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 26 June 2020

Jamal Al Qundus, Adrian Paschke, Shivam Gupta, Ahmad M. Alzouby and Malik Yousef

Abstract

Purpose

The purpose of this paper is to explore to what extent the quality of social media short text can be investigated without extensions, and which predictors, if any, of such short text lead to trust in its content.

Design/methodology/approach

The paper applies a trust model to classify data collections based on metadata into four classes: Very Trusted, Trusted, Untrusted and Very Untrusted. These data are collected from two online communities, Genius and Stack Overflow. To evaluate short texts in terms of their trust levels, the authors conducted two investigations: (1) a natural language processing (NLP) approach to extract relevant features (i.e. part-of-speech and various readability indexes), for which the authors report relatively good performance; and (2) a machine learning technique, more precisely a random forest (RF) classifier using a bag-of-words (BoW) model.
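Because the RF classifier in the second investigation works on a bag-of-words model, it may help to see what that representation looks like. A minimal sketch, assuming whitespace tokenisation and lower-casing (the paper's own preprocessing is not described here): each short text becomes a vector of token counts over a shared vocabulary, discarding word order. The example posts are hypothetical. Vectors like these, paired with the four trust labels, are what a random forest (for instance scikit-learn's `RandomForestClassifier`) would be fitted on.

```python
def build_vocabulary(texts):
    """Assign each lower-cased whitespace token a column index, in first-seen order."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def bag_of_words(text, vocab):
    """Count vector over `vocab`; tokens not in the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vector[vocab[token]] += 1
    return vector

# Hypothetical short answers; each would carry one of the four trust labels.
posts = ["Use a list comprehension", "Use a generator not a list"]
vocab = build_vocabulary(posts)
vectors = [bag_of_words(p, vocab) for p in posts]
```

Here `vectors` is `[[1, 1, 1, 1, 0, 0], [1, 2, 1, 0, 1, 1]]`: the repeated "a" in the second post is counted twice, while its position is lost.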

Findings

The investigation of the RF classifier using BoW shows promising intermediate results in identifying the short-text quality that leads to trust, with an average accuracy of 62% across both online communities.

Practical implications

As social media becomes an increasingly attractive new source of information, mostly provided in the form of short texts, businesses (e.g. search engines for smart data) can filter content without applying complex approaches and continue to work with information that is considered more trustworthy.

Originality/value

Short-text classifications with regard to a criterion (e.g. quality, readability) are usually extended by an external source or its metadata. This enhancement either changes the original text, if additional text is drawn from an external source, or requires text metadata that is not always available. The originality of this study therefore lies in investigating the quality of short text (i.e. social media text) without extending or modifying it using external sources, since such modification alters the text and distorts the results of the investigation.

Details

Journal of Enterprise Information Management, vol. 33 no. 6
Type: Research Article
ISSN: 1741-0398

Article
Publication date: 6 February 2023

Francina Malan and Johannes Lodewyk Jooste

Abstract

Purpose

The purpose of this paper is to compare the effectiveness of the various text mining techniques that can be used to classify maintenance work-order records into their respective failure modes, focussing on the choice of algorithm and preprocessing transforms. Three algorithms are evaluated, namely Bernoulli Naïve Bayes, multinomial Naïve Bayes and support vector machines.
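The practical difference between the two Naïve Bayes variants is how they see a document: multinomial NB scores word counts, while Bernoulli NB scores the presence or absence of every vocabulary word. A minimal sketch of both, using textbook formulations with add-one (Laplace) smoothing rather than the tuned models evaluated in the paper; the tokenised work orders and failure-mode labels are hypothetical, and the SVM is not sketched.

```python
import math
from collections import Counter

def train(docs, labels):
    """Gather the statistics used by both Naive Bayes variants."""
    vocab = {w for d in docs for w in d}
    stats = {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        stats[c] = {
            "prior": len(class_docs) / len(docs),
            "n_docs": len(class_docs),
            # Multinomial NB: total occurrences of each word in the class.
            "counts": Counter(w for d in class_docs for w in d),
            # Bernoulli NB: number of class documents containing each word.
            "doc_freq": Counter(w for d in class_docs for w in set(d)),
        }
    return vocab, stats

def multinomial_log_score(doc, c, model):
    vocab, stats = model
    s = stats[c]
    total = sum(s["counts"].values())
    score = math.log(s["prior"])
    for w in doc:  # repeated words contribute repeatedly
        if w in vocab:
            score += math.log((s["counts"][w] + 1) / (total + len(vocab)))
    return score

def bernoulli_log_score(doc, c, model):
    vocab, stats = model
    s = stats[c]
    present = set(doc)
    score = math.log(s["prior"])
    for w in vocab:  # absent words contribute too; repeats do not
        p = (s["doc_freq"][w] + 1) / (s["n_docs"] + 2)
        score += math.log(p if w in present else 1 - p)
    return score

def predict(doc, model, scorer):
    return max(model[1], key=lambda c: scorer(doc, c, model))

# Hypothetical tokenised work orders labelled with a failure mode.
docs = [["pump", "leak", "seal"], ["seal", "leak"],
        ["motor", "overheat"], ["motor", "bearing", "noise"]]
labels = ["seal", "seal", "motor", "motor"]
model = train(docs, labels)
```

Note that only `bernoulli_log_score` loops over the whole vocabulary, so its cost grows with vocabulary size rather than with document length.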

Design/methodology/approach

The paper has both a theoretical and an experimental component. In the literature review, the various algorithms and preprocessing techniques used in text classification are considered from three perspectives: the domain-specific maintenance literature, the broader short-form literature and the general text classification literature. The experimental component consists of a 5 × 2 nested cross-validation with an inner optimisation loop performed using a randomised search procedure.
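The nested design can be sketched as follows, assuming "5 × 2" means five repetitions of two-fold cross-validation (Dietterich's 5×2cv) in the outer loop and, for brevity, a single two-fold split in the inner loop; the abstract does not spell out either detail. The randomised search samples configurations from a hypothetical search space, and `toy_fit_score` is a placeholder for training and scoring a real classifier. The key property is that tuning only ever sees the outer training indices.

```python
import random

def two_fold(indices, rng):
    """Shuffle, cut in half, and use each half once as the test set."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    return [(first, second), (second, first)]

def five_by_two(n, rng):
    """Five repetitions of two-fold CV: ten (train, test) index pairs."""
    return [split for _ in range(5) for split in two_fold(range(n), rng)]

def nested_cv(n, search_space, fit_score, n_iter=5, seed=0):
    """Outer 5x2 loop estimates performance; an inner randomised search tunes.

    fit_score(config, train_idx, test_idx) must train on train_idx and
    return a score on test_idx.
    """
    rng = random.Random(seed)
    outer_scores = []
    for train, test in five_by_two(n, rng):
        inner = two_fold(train, rng)  # inner folds use outer training data only
        candidates = [{k: rng.choice(v) for k, v in search_space.items()}
                      for _ in range(n_iter)]
        best = max(candidates, key=lambda cfg: sum(
            fit_score(cfg, tr, te) for tr, te in inner) / len(inner))
        # Refit the chosen configuration on the full outer training set.
        outer_scores.append(fit_score(best, train, test))
    return outer_scores

# Hypothetical search space over preprocessing transforms and a smoothing value.
space = {"stemming": [True, False], "stop_words": [True, False],
         "alpha": [0.1, 0.5, 1.0]}

def toy_fit_score(cfg, train_idx, test_idx):
    # Stand-in for "train a classifier, return its test accuracy".
    return cfg["alpha"] + (0.1 if cfg["stemming"] else 0.0)

scores = nested_cv(20, space, toy_fit_score)
```

The ten outer scores are what a comparison between algorithms would be based on; the selected configurations themselves may differ from one outer split to the next.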

Findings

From the literature review, the aspects most affected by short document length are identified as the feature representation scheme, higher-order n-grams, document length normalisation, stemming, stop-word removal and algorithm selection. However, from the experimental analysis, the selection of preprocessing transforms seemed more dependent on the particular algorithm than on short document length. Multinomial Naïve Bayes performs marginally better than the other algorithms, but overall, the performances of the optimised models are comparable.

Originality/value

This work highlights the importance of model optimisation, including the selection of preprocessing transforms. Not only did the optimisation improve the performance of all the algorithms substantially, but it also affected model comparisons, with multinomial Naïve Bayes going from the worst- to the best-performing algorithm.

Details

Journal of Quality in Maintenance Engineering, vol. 29 no. 3
Type: Research Article
ISSN: 1355-2511

Article
Publication date: 7 August 2017

Hao Wang and Sanhong Deng

Abstract

Purpose

In the era of Big Data, network digital resources are growing rapidly, and short-text resources such as tweets, comments and messages are growing especially vigorously. This study aims to compare the category discriminative capacity (CDC) of Chinese language fragments of different granularities and to explore and verify the feasibility, rationality and effectiveness of low-granularity features, such as Chinese characters, in Chinese short-text classification (CSTC).

Design/methodology/approach

This study takes the discipline classification of journal articles from CSSCI as a simulation environment. Based on an analysis of how classification features of various granularities (keywords, terms and characters) are distributed, the classification results achieved by the SVM algorithm are comprehensively compared and evaluated from three angles: using the same experimental samples, testing before and after feature optimization, and introducing external data.

Findings

The granularity of a classification feature has an important impact on CSTC. In general, the larger the granularity, the better the classification result, and vice versa. However, low-granularity features are also feasible, and their CDC can be improved by reasonable weight setting, even exceeding that of high-granularity features when classification precision, computational complexity and text coverage are considered together.

Originality/value

This is the first study to propose that Chinese characters are more suitable as descriptive features in CSTC than terms and keywords, and to demonstrate that the CDC of Chinese character features can be strengthened by combining frequency and position in the feature weight.
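Wang and Deng report that weighting character features by a mix of frequency and position strengthens their CDC, but the abstract does not give the formula. A minimal sketch, assuming a hypothetical scheme in which each character's weight is its frequency plus a fixed bonus if it first appears in the first half of the text; the bonus value and the example title are illustrative only.

```python
from collections import Counter

def char_weights(text, position_bonus=0.5):
    """Weight Chinese characters by frequency plus an assumed position term.

    Frequency: how often the character occurs in the text.
    Position (hypothetical scheme): characters that first appear in the
    first half of the text receive an extra `position_bonus`.
    """
    chars = [ch for ch in text if ch.isalnum()]  # drops spaces and punctuation
    freq = Counter(chars)
    first_seen = {}
    for i, ch in enumerate(chars):
        first_seen.setdefault(ch, i)
    half = len(chars) / 2
    return {ch: freq[ch] + (position_bonus if first_seen[ch] < half else 0.0)
            for ch in freq}

weights = char_weights("数据挖掘与数据分析")  # hypothetical short title
```

In this example "数" and "据" occur twice and appear early, so each gets 2.5, while "分" and "析" occur once near the end and get 1.0. No word segmentation is needed, which is part of the appeal of character-level features.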

Article
Publication date: 29 April 2021

Heng-Yang Lu, Yi Zhang and Yuntao Du

Abstract

Purpose

Topic models have been widely applied to discover important information in vast amounts of unstructured data. Traditional long-text topic models such as Latent Dirichlet Allocation may suffer from the sparsity problem when dealing with short texts, which mostly come from the Web. These models also suffer from a readability problem when displaying the discovered topics. The purpose of this paper is to propose a novel model, the Sense Unit based Phrase Topic Model (SenU-PTM), that addresses both the sparsity and readability problems.

Design/methodology/approach

SenU-PTM is a novel phrase-based short-text topic model under a two-phase framework. The first phase introduces a phrase-generation algorithm that exploits word embeddings to generate phrases from the original corpus. The second phase introduces a new concept, the sense unit, which consists of a set of semantically similar tokens, for modeling topics with the token vectors generated in the first phase. Finally, SenU-PTM infers topics based on the above two phases.
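To make the notion of a sense unit more concrete, here is a minimal sketch, assuming tokens already have embedding vectors and that a unit is formed greedily by cosine similarity to the unit's first token. The 2-dimensional vectors and the 0.8 threshold are hypothetical, and this greedy grouping is a simplified stand-in, not SenU-PTM's actual procedure.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def group_sense_units(vectors, threshold=0.8):
    """Greedily group tokens whose embedding is close to a unit's seed token.

    vectors maps token -> embedding; units[i][0] is the seed of unit i.
    """
    units = []
    for token, vec in vectors.items():
        for unit in units:
            if cosine(vec, vectors[unit[0]]) >= threshold:
                unit.append(token)
                break
        else:  # no existing unit is similar enough: start a new one
            units.append([token])
    return units

# Hypothetical 2-d embeddings.
vectors = {"phone": [1.0, 0.1], "smartphone": [0.9, 0.2], "iphone": [0.95, 0.15],
           "coffee": [0.1, 1.0], "espresso": [0.2, 0.9]}
units = group_sense_units(vectors)
```

A topic model can then count occurrences of units instead of individual tokens, so rare but related tokens in short texts pool their evidence; that pooling is the intuition behind tackling sparsity this way.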

Findings

Experimental results on two real-world and publicly available datasets show the effectiveness of SenU-PTM from the perspectives of topical quality and document characterization. They reveal that modeling topics on sense units can alleviate the sparsity of short texts and improve the readability of topics at the same time.

Originality/value

The originality of SenU-PTM lies in the new procedure of modeling topics on the proposed sense units with word embeddings for short-text topic discovery.

Details

Data Technologies and Applications, vol. 55 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 10 April 2017

Angeline Close Scheinbaum, Stefan Hampel and Mihyun Kang

Abstract

Purpose

Marketers use e-mail in new, potentially more informative, entertaining and lucrative ways, such as embedding video. The purpose of this paper is to examine consumer responses to audiovisual (i.e. text along with a short video) versus text-only messages in brand communication. Specifically, the authors seek to uncover the efficacy of marketer-embedded video (vs text-only) in e-mail on the consumer's product interest, informativeness, perceived prestige, electronic word-of-mouth (e-WOM) intentions and willingness to pass the electronic message along digitally or on social media. With dual coding theory and selective visual attention as theoretical guideposts, the intended contribution is a framework that can explain and predict advantages for multi-modal e-mail marketing communications.

Design/methodology/approach

Five hypotheses are tested experimentally with a one-factor experiment with two conditions (text-only vs audiovisual). The sample was 240 adult participants. Real brands (Audi and Apple) were used. For both brands, participants were randomly assigned to one of two conditions of the e-mail (i.e. audiovisual vs text-only). The stimuli are identical, with the exception of embedded video in the e-mail body. The videos are authentic brand videos, are approximately 50 s and use a product feature appeal. Participants’ pre-existing brand attitude was measured. Then, five dependent variables (product interest, informativeness, perceived prestige, e-WOM intentions and willingness to pass the electronic message along digitally or on social media) were considered with respect to consumer exposure to e-mail with video and text in the e-mail from the brand versus text-only e-mail from the brand.

Findings

The results supported the hypotheses that audiovisual messages (i.e. those with text and video) heighten informativeness, product interest, perceived prestige, intentions to spread e-WOM for a brand and willingness to pass the e-mail along to friends and family when compared to text-only messages. These experimental findings from a one-factor experiment with two conditions (text-only vs audiovisual) are generally consistent for an American consumer technology brand, Apple (iPhone), and a German luxury automobile brand, Audi (S4). Hypotheses are supported for both brands, with the exception of product interest for Audi, which may be explained by the high price of a luxury automobile.

Research limitations/implications

An implication here for the dual coding theory is that the theory may be extended to consider what happens after the consumer codes the information with both the verbal and the non-verbal subsystem. The finding of interest to information processing scholars is that a video accompanying text communication from a brand to a consumer has an advantage over text-only communication. Brands that communicate with multi-modal marketing communication have better outcomes in informativeness, brand prestige perceptions and intentions of online consumer behaviors, including positive e-WOM for the brand in general and willingness to pass the specific content along in digital and social media platforms. Consumers can become brand advocates by being more inclined to forward the e-mails with the product short video as well as the e-mail text.

Practical implications

Brand marketers should consider e-mail in an integrated brand promotion (IBP) campaign as a cost advantage; one reason e-mail should have a solid place in the IBP toolkit is its relatively low cost. The main cost comes with the administration and production of the video. As a managerial implication for advertisers, embedding short-video ads in e-mails can be more effective than sending plain-text e-mails. Short videos in e-mails are a reasonable idea to include in an integrated marketing communications effort (plausibly because the information is processed by both a verbal and a non-verbal system). Brands can use videos in e-mails to make them more informative about products and thereby differentiate those products from competitors. Yet caution is warranted regarding some potential disadvantages of e-mail marketing with video, in three areas: privacy, clutter and technical inhibitors.

Originality/value

Despite the fact that e-mail is one of the most heavily used communication tools in marketing, there is scarce literature on e-mail and branding. By brands evoking a degree of prestige with embedded videos, consumer willingness to become part of the marketing communications is enhanced, as their e-WOM and willingness to share the branded content increase.

Details

European Journal of Marketing, vol. 51 no. 3
Type: Research Article
ISSN: 0309-0566

Book part
Publication date: 23 August 2022

Carol Abiri and Katina Zammit

Abstract

The teaching of reading in English is fraught with challenges that influence teachers' practices in Papua New Guinea (PNG). There is a plethora of linguistic issues regarding teaching in both the vernacular languages and English. Postcolonial education in PNG has continued to promote English as the medium of instruction while also promoting the use of the vernacular and mother tongue. The outcomes-based education reform in the Language and Literacy Policy (1993–2014) supported the use of vernacular languages in the elementary years, with gradual bridging to English in Grade 3. In 2015, the Language and Literacy Policy changed to standards-based education. One major shift was from vernacular languages to English as the medium of instruction at all levels of formal education.

In this chapter, we use Tierney's concept of decolonizing spaces to investigate teachers' perspectives on implementing the English standards-based curriculum and the roles the vernacular, mother tongue and translanguaging play in the classroom as Year 4 teachers grapple with the teaching of reading. The chapter problematizes the colonization of English, the place of translanguaging, and the benefits and challenges for teachers when the classroom teacher is most likely not a native speaker of the children's dialect or of English.

Article
Publication date: 19 February 2018

Qiujun Lan, Haojie Ma and Gang Li

Abstract

Purpose

Sentiment identification of Chinese text faces many challenges, such as requiring complex preprocessing steps, preparing various word dictionaries carefully and dealing with a lot of informal expressions, which lead to high computational complexity.

Design/methodology/approach

A method based on Chinese characters instead of words is proposed. This method represents the text as a fixed-length vector and introduces the chi-square statistic to measure the categorical sentiment score of a Chinese character. On this basis, sentiment identification can be accomplished in four main steps.
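The chi-square statistic mentioned here is commonly computed from a 2 × 2 contingency table of documents (containing the character or not, in the sentiment class or not). A minimal sketch of that standard formula, χ² = N(AD - BC)² / ((A + B)(C + D)(A + C)(B + D)); the toy labelled snippets are hypothetical, and how the paper turns these scores into a fixed-length vector and its four-step procedure is not shown.

```python
def chi_square(char, label, docs):
    """Chi-square association between a character and a sentiment label.

    docs is a list of (text, label) pairs; the 2x2 table counts documents:
      A: in `label`, contains char     B: other label, contains char
      C: in `label`, lacks char        D: other label, lacks char
    """
    a = b = c = d = 0
    for text, y in docs:
        contains = char in text
        if y == label and contains:
            a += 1
        elif y == label:
            c += 1
        elif contains:
            b += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

# Hypothetical labelled snippets: "very good", "good-looking", "not good", "very bad".
docs = [("很好", "pos"), ("好看", "pos"), ("不好", "neg"), ("很差", "neg")]
```

Here "差" scores 4/3 for the negative class, while "很", which appears equally in both classes, scores 0. In a two-class setting the statistic is identical for both labels, so multiplying by the sign of AD - BC would be one way to separate positive from negative characters; that is our suggestion, not the paper's stated method.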

Findings

Experiments on corpora with various themes indicate that the proposed method performs slightly worse than existing Chinese word-based methods on most texts, but better on short and informal texts. In particular, its computational complexity is far lower than that of word-based methods.

Originality/value

The proposed method exploits the property that Chinese characters are linguistic units carrying semantic information. Compared with word-based methods, it significantly improves computational efficiency at a slight loss of accuracy. It is also more concise, avoiding the problems that result from preparing predefined dictionaries and performing various data preprocessing steps.

Details

Information Discovery and Delivery, vol. 46 no. 1
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 31 October 2023

Hong Zhou, Binwei Gao, Shilong Tang, Bing Li and Shuyu Wang

Abstract

Purpose

The number of construction dispute cases has maintained a high growth trend in recent years. Effective exploration and management of construction contract risk can directly improve the overall performance of the project life cycle. Missing clauses may cause a contract to fail to match the standard contract: if a contract modified by the owner omits key clauses, potential disputes may lead to contractors paying substantial compensation. However, identifying missing clauses in construction project contracts has relied heavily on manual review, which is inefficient and highly dependent on personnel experience, while existing intelligent tools only support contract query and storage. There is an urgent need to raise the level of intelligence in contract clause management. Therefore, this paper aims to propose an intelligent method to detect missing clauses in construction project contracts based on natural language processing (NLP) and deep learning technology.

Design/methodology/approach

A complete classification scheme for contract clauses is designed based on NLP. First, construction contract texts are pre-processed and converted from unstructured natural language into structured numerical vectors. Following this initial categorization, a multi-label classification of long-text construction contract clauses is designed to make a preliminary identification of whether clause labels are missing. After this multi-label missing-clause detection, the authors implement a clause similarity algorithm that integrates MatchPyramid, a text-matching model inspired by image recognition, with BERT to identify missing substantive content in the contract clauses.
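The two-stage logic described here, first checking which clause labels are absent and then checking whether present clauses have lost substantive content, can be sketched as follows. This is a minimal sketch under stated assumptions: the per-clause label sets stand in for the multi-label classifier's output, token-level Jaccard similarity stands in for the MatchPyramid and BERT similarity model, and the clause labels, texts and 0.5 threshold are all hypothetical.

```python
def missing_clause_labels(standard_labels, predicted_labels_per_clause):
    """Stage 1: standard clause labels that no contract clause was tagged with."""
    found = set()
    for labels in predicted_labels_per_clause:
        found.update(labels)
    return sorted(set(standard_labels) - found)

def jaccard(a, b):
    """Token-overlap similarity; a crude stand-in for a learned matching model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def weak_clauses(standard, contract, threshold=0.5):
    """Stage 2: labels present in both contracts but with dissimilar clause text."""
    shared = standard.keys() & contract.keys()
    return sorted(label for label in shared
                  if jaccard(standard[label], contract[label]) < threshold)

# Hypothetical standard contract and owner-modified contract.
standard = {"payment": "owner shall pay within 28 days",
            "warranty": "contractor warrants work for 12 months",
            "dispute": "disputes are referred to arbitration"}
contract = {"payment": "owner shall pay within 28 days",
            "warranty": "contractor warrants materials"}
```

Stage 1 flags the absent dispute clause; stage 2 flags the warranty clause, which is present but has lost its duration. A learned similarity model would replace `jaccard` so that paraphrases are not mistaken for omissions.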

Findings

A total of 1,322 construction project contracts were tested. The results showed that the accuracy of multi-label classification could reach 93% and the accuracy of similarity matching could reach 83%, and that the recall and mean F1 score of both exceeded 0.7. To some extent, the experimental results verify the feasibility of intelligently detecting contract risk with the NLP-based method.
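Because the findings mix accuracy, recall and an F1 mean, it may help to recall how F1 is computed. A minimal sketch, assuming the F1 "mean" is a macro-average (the unweighted mean of per-label F1), which the abstract does not specify; the counts are hypothetical and are not the paper's results.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-label precision, recall and F1 (the harmonic mean of the two)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def macro_f1(counts_per_label):
    """Unweighted mean of per-label F1; maps label -> (tp, fp, fn)."""
    f1s = [precision_recall_f1(*c)[2] for c in counts_per_label.values()]
    return sum(f1s) / len(f1s)

# Hypothetical counts for two clause labels.
counts = {"payment": (8, 2, 2), "warranty": (6, 0, 6)}
```

For "warranty", perfect precision but a recall of 0.5 gives an F1 of only about 0.67, which is why F1 is a stricter summary than either metric alone.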

Originality/value

NLP is adept at recognizing textual content and has shown promising results in some contract processing applications. However, most approaches that use it for risk detection in construction contract clauses are rule-based, and they struggle with intricate and lengthy engineering contracts. This paper introduces a deep-learning-based NLP technique that reduces manual intervention and can autonomously identify and tag types of contractual deficiencies, in line with the growing complexity anticipated in future construction contracts. Moreover, the method can recognize long contract clause texts. Finally, the approach is versatile: users need only adjust parameters such as language-specific segmentation to detect omissions in contract clauses written in other languages.

Details

Engineering, Construction and Architectural Management, ahead-of-print
Type: Research Article
ISSN: 0969-9988

Details

Children and Mobile Phones: Adoption, Use, Impact, and Control
Type: Book
ISBN: 978-1-78973-036-4
