Search results

1 – 10 of over 40000
Article

Issa Alsmadi and Keng Hoon Gan

Abstract

Purpose

Rapid developments in social networks and their usage in everyday life have caused an explosion in the amount of short electronic documents. Thus, the need to classify this type of document based on content has significant implications for many applications. Classifying these documents into relevant classes according to their text content is of practical interest for many reasons. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer review and many other applications related to social networks. Reviews of short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and its challenges and difficulties in classification. The paper attempts to introduce all stages of the classification process in principle, the techniques used in each stage and the possible development trend in each stage.

Design/methodology/approach

The paper is a review of the main aspects of short-text classification. It is structured according to the stages of the classification task.

Findings

This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges in short texts and avoid poor accuracy in classification. Low performance can be addressed with optimization techniques, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing solutions, such as fuzzy logic, make short-text problems a promising area of research.

Originality/value

Using a powerful short-text classification method significantly affects many applications in terms of efficiency enhancement. Current solutions still have low performance, implying the need for improvement. This paper discusses related issues and approaches to these problems.

Details

International Journal of Web Information Systems, vol. 15 no. 2
Type: Research Article
ISSN: 1744-0084

Article

Jamal Al Qundus, Adrian Paschke, Shivam Gupta, Ahmad M. Alzouby and Malik Yousef

Abstract

Purpose

The purpose of this paper is to explore to what extent the quality of social media short text without extensions can be investigated and which predictors, if any, of such short text lead readers to trust its content.

Design/methodology/approach

The paper applies a trust model to classify data collections based on metadata into four classes: Very Trusted, Trusted, Untrusted and Very Untrusted. These data are collected from the online communities Genius and Stack Overflow. In order to evaluate short texts in terms of their trust levels, the authors have conducted two investigations: (1) A natural language processing (NLP) approach to extract relevant features (i.e. Part-of-Speech and various readability indexes). The authors report relatively good performance of the NLP study. (2) A machine learning technique, more precisely a random forest (RF) classifier using a bag-of-words (BoW) model.
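
As a rough illustration of the bag-of-words representation mentioned in (2), the following minimal sketch turns short texts into count vectors using only the Python standard library. It is not the authors' pipeline: the tokenizer, the function names and the sample texts are assumptions made for illustration, and the random forest step itself is omitted (the resulting vectors would be passed to a classifier of one's choice).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(texts):
    """Map each distinct token in the corpus to a column index (sorted for stability)."""
    vocab = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(vocab)}


def bow_vector(text, vocabulary):
    """Count how often each vocabulary token occurs in the text."""
    counts = Counter(tokenize(text))
    vec = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:  # tokens unseen at vocabulary-building time are ignored
            vec[vocabulary[tok]] = n
    return vec


# Hypothetical short texts, invented for illustration only.
texts = ["great answer, thanks", "this answer is wrong", "thanks thanks"]
vocabulary = build_vocabulary(texts)
vectors = [bow_vector(t, vocabulary) for t in texts]
```

Each row of `vectors` has one column per vocabulary token, so word order is discarded and only token counts remain, which is the defining property of a BoW model.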

Findings

The investigation of the RF classifier using BoW shows promising intermediate results (on average 62% accuracy across both online communities) in identifying short-text quality that leads to trust.

Practical implications

As social media becomes an increasingly new and attractive source of information, which is mostly provided in the form of short texts, businesses (e.g. in search engines for smart data) can filter content without having to apply complex approaches and continue to deal with information that is considered more trustworthy.

Originality/value

Short-text classifications with regard to a criterion (e.g. quality, readability) are usually extended by an external source or its metadata. This enhancement either changes the original text if it is an additional text from an external source, or it requires text metadata that is not always available. To this end, the originality of this study faces the challenge of investigating the quality of short text (i.e. social media text) without having to extend or modify it using external sources. This modification alters the text and distorts the results of the investigation.

Details

Journal of Enterprise Information Management, vol. 33 no. 6
Type: Research Article
ISSN: 1741-0398

Article

Hao Wang and Sanhong Deng

Abstract

Purpose

In the era of Big Data, network digital resources are growing rapidly; short-text resources in particular, such as tweets, comments and messages, are showing vigorous vitality. This study aims to compare the categories discriminative capacity (CDC) of Chinese language fragments with different granularities and to explore and verify the feasibility, rationality and effectiveness of low-granularity features, such as Chinese characters, in Chinese short-text classification (CSTC).

Design/methodology/approach

This study takes discipline classification of journal articles from CSSCI as a simulation environment. On the basis of sorting out the distribution rules of classification features with various granularities, including keywords, terms and characters, the classification effects accessed by the SVM algorithm are comprehensively compared and evaluated from three angles of using the same experiment samples, testing before and after feature optimization, and introducing external data.

Findings

The granularity of a classification feature has an important impact on CSTC. In general, the larger the granularity is, the better the classification result is, and vice versa. However, a low-granularity feature is also feasible, and its CDC could be improved by reasonable weight setting, even exceeding a high-granularity feature if synthetically considering classification precision, computational complexity and text coverage.

Originality/value

This is the first study to propose that Chinese characters are more suitable as descriptive features in CSTC than terms and keywords and to demonstrate that CDC of Chinese character features could be strengthened by mixing frequency and position as weight.

Article

Angeline Close Scheinbaum, Stefan Hampel and Mihyun Kang

Abstract

Purpose

Marketers use e-mail in new, potentially more informative, entertaining and lucrative ways – such as embedding video. The purpose of this paper is to examine consumer responses to audiovisual (i.e. text along with a short video) versus text-only messages in brand communication. Specifically, the authors seek to uncover the efficacy of marketer-embedded video (vs text-only) in e-mail on the consumer's product interest, informativeness, perceived prestige, electronic word-of-mouth (e-WOM) intentions and willingness to pass the electronic message along digitally or on social media. With the dual coding theory and selective visual attention as theoretical guideposts, the intended contribution is a framework that can explain and predict advantages for multi-modal e-mail marketing communications.

Design/methodology/approach

Five hypotheses are tested experimentally with a one-factor experiment with two conditions (text-only vs audiovisual). The sample was 240 adult participants. Real brands (Audi and Apple) were used. For both brands, participants were randomly assigned to one of two conditions of the e-mail (i.e. audiovisual vs text-only). The stimuli are identical, with the exception of embedded video in the e-mail body. The videos are authentic brand videos, are approximately 50 s and use a product feature appeal. Participants’ pre-existing brand attitude was measured. Then, five dependent variables (product interest, informativeness, perceived prestige, e-WOM intentions and willingness to pass the electronic message along digitally or on social media) were considered with respect to consumer exposure to e-mail with video and text in the e-mail from the brand versus text-only e-mail from the brand.

Findings

The results supported the hypotheses that audiovisual messages (i.e. those with text and video) heighten informativeness, product interest, perceived prestige, intentions to spread e-WOM for a brand and willingness to pass the e-mail along to friends and family when compared to text-only messages. These experimental findings from a one-factor experiment with two conditions (text-only vs audiovisual) are generally consistent for an American consumer technology brand, Apple (iPhone), and a German luxury automobile brand, Audi (S4). Hypotheses are supported for both brands (Apple and Audi), with the exception of product interest for Audi, which may be explained by the high price of a luxury automobile.

Research limitations/implications

An implication here for the dual coding theory is that the theory may be extended to consider what happens after the consumer codes the information with both the verbal and the non-verbal subsystem. The finding of interest to information processing scholars is that a video accompanying text communication from a brand to a consumer has an advantage over text-only communication. Brands that communicate with multi-modal marketing communication have better outcomes in informativeness, brand prestige perceptions and intentions of online consumer behaviors, including positive e-WOM for the brand in general and willingness to pass the specific content along in digital and social media platforms. Consumers can become brand advocates by being more inclined to forward the e-mails with the product short video as well as the e-mail text.

Practical implications

Brand marketers should consider e-mail in an integrated brand promotion (IBP) campaign as a cost advantage; one of the reasons e-mail should have a solid place in the IBP toolkit is due to e-mail's relatively low cost. The main cost comes with administration and production of the video. As a managerial implication for advertisers, embedding ads of a short video format in e-mails is a way to be more effective than plain-text e-mails. Short videos in e-mails are a reasonable idea to include in an integrated marketing communications effort (plausibly due to information processing with both a verbal and a non-verbal system). Brands can use videos in e-mails to enhance informativeness regarding products to enhance product differentiation from competitors. Yet, it is important to raise caution with some concerning disadvantages potentially associated with e-mail marketing and video. The three areas of caution include potential issues of privacy, clutter and technical inhibitors.

Originality/value

Despite the fact that e-mail is one of the most heavily used communication tools in marketing, there is scarce literature on e-mail and branding. By brands evoking a degree of prestige with embedded videos, consumer willingness to become part of the marketing communications is enhanced, as their e-WOM and willingness to share the branded content increase.

Details

European Journal of Marketing, vol. 51 no. 3
Type: Research Article
ISSN: 0309-0566

Article

Qiujun Lan, Haojie Ma and Gang Li

Abstract

Purpose

Sentiment identification of Chinese text faces many challenges, such as requiring complex preprocessing steps, preparing various word dictionaries carefully and dealing with a lot of informal expressions, which lead to high computational complexity.

Design/methodology/approach

A method based on Chinese characters instead of words is proposed. This method represents the text into a fixed length vector and introduces the chi-square statistic to measure the categorical sentiment score of a Chinese character. Based on these, the sentiment identification could be accomplished through four main steps.
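
To make the chi-square idea concrete, the sketch below scores each character by the standard chi-square statistic of a 2x2 contingency table (character present or absent versus positive or negative document). It is an illustration under stated assumptions, not the authors' implementation: the signed score, the function names and the four toy documents are hypothetical, and the abstract does not specify how the statistic is turned into a sentiment score.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: positive documents containing the character
    b: negative documents containing the character
    c: positive documents without the character
    d: negative documents without the character
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def character_scores(docs):
    """Return {character: signed chi-square} for (text, label) pairs.

    The sign is an assumption of this sketch: positive means the character
    is over-represented in 'pos' documents, negative in 'neg' documents.
    """
    n_pos = sum(1 for _, label in docs if label == "pos")
    n_neg = len(docs) - n_pos
    pos_df, neg_df = {}, {}
    for text, label in docs:
        target = pos_df if label == "pos" else neg_df
        for ch in set(text):  # document frequency: count each character once per text
            if not ch.isspace():
                target[ch] = target.get(ch, 0) + 1
    scores = {}
    for ch in set(pos_df) | set(neg_df):
        a, b = pos_df.get(ch, 0), neg_df.get(ch, 0)
        c, d = n_pos - a, n_neg - b
        sign = 1 if a * d >= b * c else -1
        scores[ch] = sign * chi_square(a, b, c, d)
    return scores


# Toy labelled corpus, invented for illustration only.
docs = [("好看", "pos"), ("很好", "pos"), ("难看", "neg"), ("很差", "neg")]
scores = character_scores(docs)
```

Working at the character level means no word segmentation or sentiment dictionary is needed before scoring, which is the source of the efficiency gain the abstract describes.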

Findings

Experiments on corpora with various themes indicate that the proposed method performs slightly worse than existing Chinese word-based methods on most texts, but better on short and informal texts. In particular, the computational complexity of the proposed method is far lower than that of word-based methods.

Originality/value

The proposed method exploits the property of Chinese characters being a linguistic unit with semantic information. In contrast to word-based methods, the computational efficiency of this method is significantly improved at a slight loss of accuracy. It is more concise and avoids the problems that result from preparing predefined dictionaries and extensive data preprocessing.

Details

Information Discovery and Delivery, vol. 46 no. 1
Type: Research Article
ISSN: 2398-6247

Details

Children and Mobile Phones: Adoption, Use, Impact, and Control
Type: Book
ISBN: 978-1-78973-036-4

Article

Isak Taksa, Sarah Zelikovitz and Amanda Spink

Abstract

Purpose

The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users.

Design/methodology/approach

The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration.
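
The iterative loop described above can be sketched roughly as follows. This is a simplified stand-in rather than the authors' system: `background` is a hypothetical dictionary that plays the role of the combined web search results for each query, and candidate terms are ranked by plain frequency, whereas the paper ranks them by how well they fit the users' personal properties.

```python
from collections import Counter


def label_queries(queries, terms):
    """Return the queries that contain at least one of the current terms."""
    return [q for q in queries if any(t in q.split() for t in terms)]


def expand_terms(queries, seed_terms, background, rounds=2, top_k=2):
    """Grow the term set iteratively from a small manually labeled seed.

    background: hypothetical mapping query -> related text (stand-in for
    web search results). Ranking by frequency is an assumption of this sketch.
    """
    terms = set(seed_terms)
    for _ in range(rounds):
        labeled = label_queries(queries, terms)
        counts = Counter()
        for q in labeled:
            for word in background.get(q, "").split():
                if word not in terms:
                    counts[word] += 1
        # Rank candidates by frequency, breaking ties alphabetically.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        new_terms = [w for w, _ in ranked[:top_k]]
        if not new_terms:
            break  # nothing new to learn; stop early
        terms.update(new_terms)
    return terms


# Toy query log and background text, invented for illustration only.
queries = ["homework help", "school games", "retirement plans"]
background = {
    "homework help": "school homework teacher",
    "school games": "cartoon kids games",
    "retirement plans": "pension savings",
}
terms = expand_terms(queries, {"homework"}, background)
labeled = label_queries(queries, terms)
```

In this toy run the single seed term first labels one query, the terms learned from its background text then label a second query, and the unrelated query stays unlabeled.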

Findings

The authors identify the difficulties of classifying web logs, by approaching this problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified.

Research limitations/implications

Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluating the classification results by comparing classified queries with well-known age‐related sites is a direction that is currently being explored.

Practical implications

This research is background work that can be incorporated in search engines or other web‐based applications, to help marketing companies and advertisers.

Originality/value

This research enhances the current state of knowledge in short-text classification and query log learning.

Details

International Journal of Web Information Systems, vol. 3 no. 4
Type: Research Article
ISSN: 1744-0084

Article

Kristine Pytash, Todd Hawley and Kate Morgan

Abstract

Purpose

The purpose of this paper is to explore the potential of using digital shorts (Pytash et al., 2017) focusing on social issues in social studies classrooms.

Design/methodology/approach

Qualitative case study is used in this study.

Findings

The students' digital shorts focused on important social issues and included their beliefs and perspectives about their chosen social issue, as well as insights into their developing identities as citizens. The authors’ findings demonstrate how this assignment can be the gateway for discussions regarding social issues, how students perceive their identities tied to contemporary social issues, and how they make sense of these issues within multimodal compositions.

Research limitations/implications

The findings from this research have implications for researching the effectiveness of digital media production analysis for students’ learning of social issues.

Practical implications

The findings from this research have implications for exploring how digital media production analysis can be incorporated into social studies courses.

Originality/value

Despite the push for social studies teachers to provide spaces for students to demonstrate these capacities, few examples exist in the literature.

Details

Social Studies Research and Practice, vol. 13 no. 3
Type: Research Article
ISSN: 1933-5415

Content available
Article

Matjaž Kragelj and Mirjana Kljajić Borštnar

Abstract

Purpose

The purpose of this study is to develop a model for automated classification of old digitised texts to the Universal Decimal Classification (UDC), using machine-learning methods.

Design/methodology/approach

The general research approach is inherent to design science research, in which the problem of UDC assignment of the old, digitised texts is addressed by developing a machine-learning classification model. A corpus of 70,000 scholarly texts, fully bibliographically processed by librarians, was used to train and test the model, which was used for classification of old texts on a corpus of 200,000 items. Human experts evaluated the performance of the model.

Findings

Results suggest that machine-learning models can correctly assign the UDC at some level for almost any scholarly text. Furthermore, the model can be recommended for the UDC assignment of older texts. Ten librarians corroborated this on 150 randomly selected texts.

Research limitations/implications

The main limitations of this study were unavailability of labelled older texts and the limited availability of librarians.

Practical implications

The classification model can provide a recommendation to the librarians during their classification work; furthermore, it can be implemented as an add-on to full-text search in the library databases.

Social implications

The proposed methodology supports librarians by recommending UDC classifiers, thus saving time in their daily work. By automatically classifying older texts, digital libraries can provide a better user experience by enabling structured searches. These contribute to making knowledge more widely available and useable.

Originality/value

These findings contribute to the field of automated classification of bibliographical information with the usage of full texts, especially in cases in which the texts are old, unstructured and in which archaic language and vocabulary are used.

Details

Journal of Documentation, vol. 77 no. 3
Type: Research Article
ISSN: 0022-0418

Article

Cong-Phuoc Phan, Hong-Quang Nguyen and Tan-Tai Nguyen

Abstract

Purpose

Large collections of patent documents disclosing novel, non-obvious technologies are publicly available and beneficial to academia and industries. To maximally exploit their potential, searching these patent documents has increasingly become an important topic. Although much research has processed large collections, few studies have attempted to integrate both patent classifications and specifications for analyzing user queries. Consequently, the queries are often insufficiently analyzed for improving the accuracy of search results. This paper aims to address this limitation by exploiting semantic relationships between patent contents and their classification.

Design/methodology/approach

The contributions are fourfold. First, the authors enhance similarity measurement between two short sentences and make it 20 per cent more accurate. Second, the Graph-embedded Tree ontology is enriched by integrating both patent documents and the classification scheme. Third, the ontology does not rely on a rule-based method or text matching; instead, a heuristic meaning comparison is applied to extract semantic relationships between concepts. Finally, the patent search approach uses the ontology effectively with the results sorted based on their most common order.

Findings

The experiment on searching for 600 patent documents in the field of Logistics yields a 15 per cent improvement in F-Measure when compared with traditional approaches.

Research limitations/implications

The research, however, still requires improvement, as the terms and phrases extracted as nouns and noun phrases make less sense in some respects and thus might not result in high accuracy. The large collection of extracted relationships could be further optimized for conciseness. In addition, parallel processing such as Map-Reduce could be used to improve the search processing performance.

Practical implications

The experimental results could be used by scientists and technologists to search for novel, non-obvious technologies in the patents.

Social implications

High-quality patent search results will reduce patent infringement.

Originality/value

The proposed ontology is semantically enriched by integrating both patent documents and their classification. This ontology facilitates the analysis of the user queries for enhancing the accuracy of the patent search results.

Details

International Journal of Web Information Systems, vol. 15 no. 3
Type: Research Article
ISSN: 1744-0084
