Search results

1 – 10 of 26
Article
Publication date: 16 August 2022

Jung Ran Park, Erik Poole and Jiexun Li

Abstract

Purpose

The purpose of this study is to explore linguistic stylometric patterns encompassing lexical, syntactic, structural, sentiment and politeness features that are found in librarians’ responses to user queries.

Design/methodology/approach

A total of 462 online texts/transcripts comprising answers of librarians to users’ questions drawn from the Internet Public Library were examined. A Principal Component Analysis, which is a data reduction technique, was conducted on the texts and transcripts. Data analysis illustrates the three principal components that predominantly occur in librarians’ answers: stylometric richness, stylometric brevity and interpersonal support.
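As a rough illustration of the data reduction step described above, the following sketch runs a Principal Component Analysis on a tiny table of stylometric measurements. Everything specific in it is an assumption made for the example: the three feature columns, the five sample rows and the use of power iteration to find the first component are not taken from the study, which does not describe its implementation.

```python
import math

def standardize(rows):
    """Center each feature column and scale it to unit sample variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant column has zero spread; dividing by 1.0 leaves it at zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of cov, found by power iteration."""
    d = len(cov)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigenvalue

# Hypothetical rows: [type-token ratio, mean sentence length, politeness markers]
answers = [
    [0.62, 18.0, 3.0],
    [0.48, 11.0, 1.0],
    [0.71, 22.0, 4.0],
    [0.55, 14.0, 2.0],
    [0.66, 19.0, 2.0],
]
cov = covariance(standardize(answers))
loadings, eigenvalue = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print("loadings:", [round(x, 3) for x in loadings])
print("share of variance explained:", round(explained, 3))
```

The loadings show how strongly each feature contributes to the first component. Naming a component (for example "stylometric richness") is an interpretive step that the analyst performs on such loadings.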

Findings

The results of the study have important implications for digital information services because stylometric features such as lexical richness, structural clarity and interpersonal support may interplay with the complexity of user queries, the (a)synchronous communication mode, the application of information service guidelines and manuals, and the overall characteristics and quality of a given digital information service. Such interplay may directly affect user perceptions of, and satisfaction with, interactions with librarians and the information service received through the computer-mediated communication channel.

Originality/value

To the best of the authors’ knowledge, the stylometric features encompassing lexical, syntactic, structural, sentiment and politeness using Principal Component Analysis have not been explored in digital information/reference services. Thus, there is an emergent need to explore more fully how linguistic stylometric features interplay with the types of user queries, the asynchronous online communication mode, application of information service guidelines and the quality of a particular digital information service.

Details

Global Knowledge, Memory and Communication, vol. 73 no. 3
Type: Research Article
ISSN: 2514-9342

Article
Publication date: 7 July 2020

Ammara Zamir, Hikmat Ullah Khan, Waqar Mehmood, Tassawar Iqbal and Abubakker Usman Akram

Abstract

Purpose

This research study proposes a feature-centric spam email detection model (FSEDM) based on content, sentiment, semantic, user and spam-lexicon feature sets. The purpose of this study is to exploit sentiment features along with the other proposed features and to evaluate the classification accuracy of machine learning algorithms for spam email detection.

Design/methodology/approach

Existing studies primarily exploit a content-based feature engineering approach; however, only a limited number of features are considered. In this regard, this research study proposes a feature-centric framework (FSEDM) based on existing and novel features of an email data set, which are extracted after pre-processing. Afterwards, diverse supervised learning techniques are applied to the proposed features in conjunction with feature selection techniques such as information gain, gain ratio and Relief-F to rank the most prominent features and classify the emails as spam or ham (not spam).
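Of the feature selection techniques named above, information gain is the simplest to write down: it measures how much knowing a feature's value reduces the entropy of the spam/ham label. The sketch below is a minimal illustration for discrete features only. The feature names and the four toy emails are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_features(features, labels):
    """Feature names sorted from most to least informative."""
    return sorted(features,
                  key=lambda name: information_gain(features[name], labels),
                  reverse=True)

# Hypothetical binary features for four emails.
labels = ["spam", "spam", "ham", "ham"]
features = {
    "is_long": [1, 0, 1, 0],           # unrelated to the label in this toy data
    "has_spam_lexicon": [1, 1, 0, 0],  # separates the classes perfectly
}
print(rank_features(features, labels))
```

Gain ratio and Relief-F rank features by different criteria and are not shown here.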

Findings

Analysis and experimental results indicate that the proposed model with sentiment analysis is a competitive approach for spam email detection. Using the proposed model, a deep neural network applied with sentiment features outperformed other classifiers, reaching a classification accuracy of up to 97.2%.

Originality/value

This research is novel in that no previous research has focused on sentiment analysis in conjunction with other email features for the detection of spam emails.

Details

The Electronic Library, vol. 38 no. 3
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 4 October 2018

Maha Al-Yahya

Abstract

Purpose

In the context of information retrieval, text genre is as important as its content, and knowledge of the text genre enhances the search engine features by providing customized retrieval. The purpose of this study is to explore and evaluate the use of stylometric analysis, a quantitative analysis of the linguistic features of text, to support the task of automated text genre detection for Classical Arabic text.

Design/methodology/approach

Unsupervised clustering and supervised classification were applied to the King Saud University Corpus of Classical Arabic texts (KSUCCA) using the most frequent words in the corpus (MFWs) as stylometric features. Four popular distance measures established in stylometric research were evaluated for the genre detection task.
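To make the MFW representation concrete, the sketch below turns each text into a vector of relative frequencies over the corpus's most frequent words and compares two texts with a distance measure. The abstract does not name the four measures that were evaluated, so Manhattan and cosine distance are used here purely as familiar examples. Whitespace tokenization and the English toy texts are also assumptions made for readability.

```python
import math
from collections import Counter

def most_frequent_words(texts, n):
    """The n most frequent whitespace-separated tokens across all texts."""
    counts = Counter(word for text in texts for word in text.split())
    return [word for word, _ in counts.most_common(n)]

def profile(text, vocabulary):
    """Relative frequency of each vocabulary word in one non-empty text."""
    tokens = text.split()
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocabulary]

def manhattan(a, b):
    """Sum of absolute differences between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus cosine similarity; assumes neither profile is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

# Toy corpus standing in for genre-labelled texts.
corpus = [
    "the king said to the people that the law is just",
    "the poet sang of the night and of the desert wind",
    "the judge said that the law of the land is clear",
]
vocabulary = most_frequent_words(corpus, 5)
profiles = [profile(text, vocabulary) for text in corpus]
print("legal vs legal:", round(manhattan(profiles[0], profiles[2]), 3))
print("legal vs poetic:", round(manhattan(profiles[0], profiles[1]), 3))
```

Pairwise distances of this kind can feed a clustering algorithm or a nearest-neighbour classifier, which is the general shape of the experiments the abstract describes.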

Findings

The results of the experiments show that stylometry-based genre clustering and classification align well with human-defined genre. The evidence suggests that genre style signals exist for Classical Arabic and can be used to support the task of automated genre detection.

Originality/value

This work targets the task of genre detection in Classical Arabic text using stylometric features, an approach that has previously been applied only to Arabic authorship attribution. The study also provides a comparison of four distance measures used in stylometric analysis on the KSUCCA, a corpus with over 50 million words of Classical Arabic, using clustering and classification.

Details

The Electronic Library, vol. 36 no. 5
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 18 April 2017

Mahmoud Al-Ayyoub, Ahmed Alwajeeh and Ismail Hmeidi

Abstract

Purpose

The authorship authentication (AA) problem is concerned with correctly attributing a text document to its corresponding author. Historically, this problem has been the subject of various studies built on the intuitive idea that each author has a unique style that can be captured using stylometric features (SF). Another approach to this problem, known as the bag-of-words (BOW) approach, uses keyword occurrences/frequencies in each document to identify its author. Unlike the first one, this approach is more language-independent. This paper aims to study and compare both approaches, focusing on the Arabic language, which is still largely understudied despite its importance.

Design/methodology/approach

As this is a supervised learning problem, the authors start by collecting a very large data set of Arabic documents to be used for training and testing purposes. For the SF approach, they compute hundreds of SF, whereas, for the BOW approach, the popular term frequency-inverse document frequency technique is used. Both approaches are compared under various settings.
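The term frequency-inverse document frequency weighting used for the BOW approach can be sketched in a few lines. Many variants of the formula exist and the abstract does not say which one was used. The version below is the plain one, relative term frequency multiplied by log(N / document frequency), and the lower-casing, whitespace tokenization and toy documents are assumptions for the example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / doc length) * ln(N / docs containing term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Count each term once per document it appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = ["the cat sat", "the dog sat down"]
for doc, vector in zip(docs, tf_idf(docs)):
    print(doc, "->", {t: round(w, 3) for t, w in vector.items()})
```

Terms that occur in every document receive a weight of zero, which is why very common words contribute nothing to telling authors apart under this scheme.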

Findings

The results show that the SF approach, which is much cheaper to train, can generate more accurate results under most settings.

Practical implications

Efficiently solving the AA problem benefits many fields in academia and industry, including literature, security, forensics, electronic markets and trading. Another practical implication of this work is the public release of its sources. Specifically, some of the SF can be very useful for other problems such as sentiment analysis.

Originality/value

This is the first study of its kind to compare the SF and BOW approaches for authorship analysis of Arabic articles. Moreover, many of the computed SF are novel, while other features are inspired by the literature. As SF are language-dependent and most existing papers focus on English, extra effort must be invested to adapt such features to Arabic text.

Details

International Journal of Web Information Systems, vol. 13 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 15 October 2021

Erin Yirun Wang, Lawrence Hoc Nang Fong and Rob Law

Abstract

Purpose

This paper aims to examine the dynamics of emotional cues and cognitive cues in review fakeness. Additionally, the boundary condition (i.e. review valence) for the dynamics between emotional cues and cognitive cues is investigated.

Design/methodology/approach

This research comprised two studies, which analyzed restaurant and hotel reviews collected from Yelp.com. The authors used Linguistic Inquiry and Word Count (LIWC) 2015 to code review contents and tested the hypotheses using logistic regression.

Findings

Fake reviews contain more emotional cues compared with authentic reviews. Moreover, the dynamics of emotional cues and cognitive cues are salient among negative reviews.

Practical implications

This research provides implications for identifying fake online reviews based on linguistic cues.

Originality/value

This research contributes to the literature by revealing the competition of mental resources between emotional and cognitive systems when deception is for harming others. Grounded in interpersonal deception theory, this paper investigates the interactive effect and complements the literature, which mainly used emotional cues and cognitive cues individually to detect fake reviews.

Details

International Journal of Contemporary Hospitality Management, vol. 34 no. 1
Type: Research Article
ISSN: 0959-6119

Article
Publication date: 4 June 2020

Antonia Michael and Jan Eloff

Abstract

Purpose

Malicious activities conducted by disgruntled employees via an email platform can cause profound damage to an organization such as financial and reputational losses. This threat is known as an “Insider IT Sabotage” threat. This involves employees misusing their access rights to harm the organization. Events leading up to the attack are not technical but rather behavioural. The problem is that owing to the high volume and complexity of emails, the risk of insider IT sabotage cannot be diminished with rule-based approaches.

Design/methodology/approach

Malicious human behaviours that insiders in the insider IT sabotage category would exhibit are studied and mapped to phrases that would appear in email communications. A large email data set is classified according to the behavioural characteristics of these employees. Machine learning algorithms are used to identify occurrences of this insider threat type. The accuracy of these approaches is measured.

Findings

This paper shows that suspicious behaviour of disgruntled employees can be discovered by means of machine intelligence techniques. The output of the machine learning classifier depends mainly on the depth and quality of the phrase and behaviour analysis, the cleansing and the number of email attributes examined. This process of labelling content in isolation could be improved if other attributes of the email data were included, such that a confidence score could be computed for each user.

Originality/value

This research presents a novel approach, showing that a prototype can be created to automate the detection of insider IT sabotage within email systems and so mitigate the risk within organizations.

Details

Information & Computer Security, vol. 28 no. 4
Type: Research Article
ISSN: 2056-4961

Article
Publication date: 17 March 2020

Hossein Dehdarirad, Javad Ghazimirsaeid and Ammar Jalalimanesh

Abstract

Purpose

The purpose of this investigation is to identify, evaluate, integrate and summarize relevant and qualified papers by conducting a systematic literature review (SLR) on the application of recommender systems (RSs) to suggest a scholarly publication venue for a researcher's paper.

Design/methodology/approach

To identify the relevant papers published up to August 11, 2018, an SLR study on four databases (Scopus, Web of Science, IEEE Xplore and ScienceDirect) was conducted. We followed the guidelines presented by Kitchenham and Charters (2007) for performing SLRs in software engineering. The papers were analyzed based on data sources, RS classes, techniques/methods/algorithms, datasets, evaluation methodologies and metrics, as well as future directions.

Findings

A total of 32 papers were identified. The data sources most often exploited in these papers were textual (title/abstract/keywords) and co-authorship data. The RS classes in the selected papers were used almost equally. DBLP was the main dataset used. Cosine similarity, social network analysis (SNA) and the term frequency–inverse document frequency (TF–IDF) algorithm were frequently used. In terms of evaluation methodologies, 24 papers applied only offline evaluations. Furthermore, precision, accuracy and recall were the most popular performance metrics. In the reviewed papers, “use more datasets” and “new algorithms” were frequently mentioned in the future work and conclusions sections.
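For readers unfamiliar with the offline metrics mentioned above, the sketch below computes precision and recall for a single venue recommendation list. The venue names and the notion of a "relevant" set are hypothetical. The reviewed papers may define relevance and cut-offs differently.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of one recommendation list against a relevant set.

    precision = hits / number of recommended items
    recall    = hits / number of relevant items
    """
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical venues suggested for one manuscript, and the venues judged suitable.
recommended = ["Venue A", "Venue B", "Venue C", "Venue D"]
relevant = ["Venue B", "Venue D", "Venue E"]
precision, recall = precision_recall(recommended, relevant)
print("precision:", round(precision, 3), "recall:", round(recall, 3))
```

In an offline evaluation these values are typically averaged over many test manuscripts whose actual publication venue is known.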

Originality/value

Given that a review study has not been conducted in this area, this paper can provide an insight into the current status in this area and may also contribute to future research in this field.

Details

Data Technologies and Applications, vol. 54 no. 2
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 22 June 2023

Chiara Alzetta, Felice Dell'Orletta, Alessio Miaschi, Elena Prat and Giulia Venturi

Abstract

Purpose

The authors’ goal is to investigate variations in the writing style of book reviews published on different social reading platforms and referring to books of different genres, which enables acquiring insights into communication strategies adopted by readers to share their reading experiences.

Design/methodology/approach

The authors propose a corpus-based study focused on the analysis of A Good Review, a novel corpus of online book reviews written in Italian, posted on Amazon and Goodreads, and covering six literary fiction genres. The authors rely on stylometric analysis to explore the linguistic properties and lexicon of reviews, and conduct automatic classification experiments using multiple approaches and feature configurations to predict either the review's platform or the literary genre.

Findings

The analysis of user-generated reviews demonstrates that language varies considerably across reading platforms, but less so across book genres. The classification experiments revealed that features modelling the syntactic structure of the sentence are reliable proxies for discerning Amazon and Goodreads reviews, whereas lexical information showed a higher predictive role for automatically discriminating the genre.

Originality/value

The high availability of cultural products makes information services necessary to help users navigate these resources and acquire information from unstructured data. This study contributes to a better understanding of the linguistic characteristics of user-generated book reviews, which can support the development of linguistically-informed recommendation services. Additionally, the authors release a novel corpus of online book reviews meant to support the reproducibility and advancements of the research.

Details

Journal of Documentation, vol. 80 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 10 March 2022

Jayaram Boga and Dhilip Kumar V.

Abstract

Purpose

To achieve an effective human activity recognition (HAR) method, this paper addresses the HAR problem in a wireless body area network (WBAN) using a purpose-built ensemble learning approach. Three data sets are used for HAR in WBAN: human activity recognition using smartphones, wireless sensor data mining and Kaggle. The proposed model has four phases: pre-processing, feature extraction, feature selection and classification. The data are pre-processed by artifact removal and median filtering. Features are then extracted using t-distributed stochastic neighbor embedding, the short-time Fourier transform and statistical approaches. Next, weighted optimal feature selection picks the important features based on the data variance of each class; this new feature selection is achieved by hybrid coyote Jaya optimization (HCJO). Finally, a meta-heuristic-based ensemble learning approach is used as a new recognition approach with three classifiers: support vector machine (SVM), deep neural network (DNN) and fuzzy classifiers. Experimental analysis is performed.
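Median filtering, one of the pre-processing techniques named above, replaces each sample with the median of its neighbourhood so that isolated spikes are suppressed. The sketch below is a generic one-dimensional version. The window size, the edge handling (shorter windows at the ends) and the sample readings are assumptions and are not taken from the paper.

```python
from statistics import median

def median_filter(signal, window=3):
    """Replace each sample with the median of the window centered on it.

    Windows are truncated at both ends of the signal rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    filtered = []
    for i in range(len(signal)):
        start = max(0, i - half)
        end = min(len(signal), i + half + 1)
        filtered.append(median(signal[start:end]))
    return filtered

# Hypothetical accelerometer readings with one spike at index 2.
readings = [0.9, 1.0, 7.5, 1.1, 1.0, 0.9]
print(median_filter(readings, window=3))
```

A wider window removes longer bursts of noise at the cost of blurring genuine short movements, so the window size is usually tuned to the sensor's sampling rate.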

Design/methodology/approach

The proposed HCJO algorithm was developed to optimize the membership function of the fuzzy classifier, the iteration limit of the SVM and the hidden neuron count of the DNN, in order to obtain better classification outcomes and enhance the performance of the ensemble classification.

Findings

The accuracy of the enhanced HAR model was higher than that of conventional models: 6.66% higher than fuzzy, 4.34% higher than DNN, 4.34% higher than SVM, 7.86% higher than ensemble and 6.66% higher than the Improved Sealion optimization algorithm-Attention Pyramid-Convolutional Neural Network (AP-CNN).

Originality/value

The suggested HAR model with WBAN using HCJO algorithm is accurate and improves the effectiveness of the recognition.

Details

International Journal of Pervasive Computing and Communications, vol. 19 no. 4
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 27 July 2022

Piyush Katariya, Vedika Gupta, Rohan Arora, Adarsh Kumar, Shreya Dhingra, Qin Xin and Jude Hemanth

Abstract

Purpose

Current natural language processing algorithms still lack judgment criteria, and these approaches often require deep knowledge of political or social contexts. The damage done by the spread of fake news in various sectors has attracted the attention of several low-level regional communities. However, such methods have mostly been developed for the English language, and low-resource languages remain neglected. This study aims to provide an analysis of Hindi fake news and to develop a referral system with advanced techniques to identify fake news in Hindi.

Design/methodology/approach

The technique deployed in this model uses bidirectional long short-term memory (B-LSTM), which is compared with other models such as naïve Bayes, logistic regression, random forest, support vector machine, decision tree classifier, k-nearest neighbor, gated recurrent unit and long short-term memory models.

Findings

The B-LSTM deep learning model yields an accuracy of 95.01%.

Originality/value

This study anticipates that this model will be a beneficial resource for building technologies to prevent the spread of fake news and will contribute to research on low-resource languages.

Details

International Journal of Web Information Systems, vol. 18 no. 5/6
Type: Research Article
ISSN: 1744-0084
