Search results

1 – 10 of 25
Open Access
Article
Publication date: 23 November 2023

Reema Khaled AlRowais and Duaa Alsaeed

Automatically extracting stance information from natural language texts is a significant research problem with various applications, particularly after the recent explosion of…

243

Abstract

Purpose

Automatically extracting stance information from natural language texts is a significant research problem with various applications, particularly after the recent explosion of data on the internet via platforms like social media sites. Stance detection system helps determine whether the author agree, against or has a neutral opinion with the given target. Most of the research in stance detection focuses on the English language, while few research was conducted on the Arabic language.

Design/methodology/approach

This paper aimed to address stance detection on Arabic tweets by building and comparing different stance detection models using four transformers, namely: Araelectra, MARBERT, AraBERT and Qarib. Using different weights for these transformers, the authors performed extensive experiments fine-tuning the task of stance detection Arabic tweets with the four different transformers.

Findings

The results showed that the AraBERT model learned better than the other three models with a 70% F1 score followed by the Qarib model with a 68% F1 score.

Research limitations/implications

A limitation of this study is the imbalanced dataset and the limited availability of annotated datasets of SD in Arabic.

Originality/value

Provide comprehensive overview of the current resources for stance detection in the literature, including datasets and machine learning methods used. Therefore, the authors examined the models to analyze and comprehend the obtained findings in order to make recommendations for the best performance models for the stance detection task.

Details

Arab Gulf Journal of Scientific Research, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1985-9899

Keywords

Open Access
Article
Publication date: 7 September 2021

Ema Utami, Irwan Oyong, Suwanto Raharjo, Anggit Dwi Hartanto and Sumarni Adi

Gathering knowledge regarding personality traits has long been the interest of academics and researchers in the fields of psychology and in computer science. Analyzing profile…

2821

Abstract

Purpose

Gathering knowledge regarding personality traits has long been the interest of academics and researchers in the fields of psychology and in computer science. Analyzing profile data from personal social media accounts reduces data collection time, as this method does not require users to fill any questionnaires. A pure natural language processing (NLP) approach can give decent results, and its reliability can be improved by combining it with machine learning (as shown by previous studies).

Design/methodology/approach

In this, cleaning the dataset and extracting relevant potential features “as assessed by psychological experts” are essential, as Indonesians tend to mix formal words, non-formal words, slang and abbreviations when writing social media posts. For this article, raw data were derived from a predefined dominance, influence, stability and conscientious (DISC) quiz website, returning 316,967 tweets from 1,244 Twitter accounts “filtered to include only personal and Indonesian-language accounts”. Using a combination of NLP techniques and machine learning, the authors aim to develop a better approach and more robust model, especially for the Indonesian language.

Findings

The authors find that employing a SMOTETomek re-sampling technique and hyperparameter tuning boosts the model’s performance on formalized datasets by 57% (as measured through the F1-score).

Originality/value

The process of cleaning dataset and extracting relevant potential features assessed by psychological experts from it are essential because Indonesian people tend to mix formal words, non-formal words, slang words and abbreviations when writing tweets. Organic data derived from a predefined DISC quiz website resulting 1244 records of Twitter accounts and 316.967 tweets.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 9 May 2022

Khalid Iqbal and Muhammad Shehrayar Khan

In this digital era, email is the most pervasive form of communication between people. Many users become a victim of spam emails and their data have been exposed.

9226

Abstract

Purpose

In this digital era, email is the most pervasive form of communication between people. Many users become a victim of spam emails and their data have been exposed.

Design/methodology/approach

Researchers contribute to solving this problem by a focus on advanced machine learning algorithms and improved models for detecting spam emails but there is still a gap in features. To achieve good results, features also play an important role. To evaluate the performance of applied classifiers, 10-fold cross-validation is used.

Findings

The results approve that the spam emails are correctly classified with the accuracy of 98.00% for the Support Vector Machine and 98.06% for the Artificial Neural Network as compared to other applied machine learning classifiers.

Originality/value

In this paper, Point-Biserial correlation is applied to each feature concerning the class label of the University of California Irvine (UCI) spambase email dataset to select the best features. Extensive experiments are conducted on selected features by training the different classifiers.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 12 October 2021

Kiran Fahd, Shah Jahan Miah and Khandakar Ahmed

Student attritions in tertiary educational institutes may play a significant role to achieve core values leading towards strategic mission and financial well-being. Analysis of…

3775

Abstract

Purpose

Student attritions in tertiary educational institutes may play a significant role to achieve core values leading towards strategic mission and financial well-being. Analysis of data generated from student interaction with learning management systems (LMSs) in blended learning (BL) environments may assist with the identification of students at risk of failing, but to what extent this may be possible is unknown. However, existing studies are limited to address the issues at a significant scale.

Design/methodology/approach

This study develops a new approach harnessing applications of machine learning (ML) models on a dataset, that is publicly available, relevant to student attrition to identify potential students at risk. The dataset consists of the data generated by the interaction of students with LMS for their BL environment.

Findings

Identifying students at risk through an innovative approach will promote timely intervention in the learning process, such as for improving student academic progress. To evaluate the performance of the proposed approach, the accuracy is compared with other representational ML methods.

Originality/value

The best ML algorithm random forest with 85% is selected to support educators in implementing various pedagogical practices to improve students’ learning.

Open Access
Article
Publication date: 27 February 2024

Oscar F. Bustinza, Luis M. Molina Fernandez and Marlene Mendoza Macías

Machine learning (ML) analytical tools are increasingly being considered as an alternative quantitative methodology in management research. This paper proposes a new approach for…

Abstract

Purpose

Machine learning (ML) analytical tools are increasingly being considered as an alternative quantitative methodology in management research. This paper proposes a new approach for uncovering the antecedents behind product and product–service innovation (PSI).

Design/methodology/approach

The ML approach is novel in the field of innovation antecedents at the country level. A sample of the Equatorian National Survey on Technology and Innovation, consisting of more than 6,000 firms, is used to rank the antecedents of innovation.

Findings

The analysis reveals that the antecedents of product and PSI are distinct, yet rooted in the principles of open innovation and competitive priorities.

Research limitations/implications

The analysis is based on a sample of Equatorian firms with the objective of showing how ML techniques are suitable for testing the antecedents of innovation in any other context.

Originality/value

The novel ML approach, in contrast to traditional quantitative analysis of the topic, can consider the full set of antecedent interactions to each of the innovations analyzed.

Details

Journal of Enterprise Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1741-0398

Keywords

Open Access
Article
Publication date: 27 November 2023

Reshmy Krishnan, Shantha Kumari, Ali Al Badi, Shermina Jeba and Menila James

Students pursuing different professional courses at the higher education level during 2021–2022 saw the first-time occurrence of a pandemic in the form of coronavirus disease 2019…

Abstract

Purpose

Students pursuing different professional courses at the higher education level during 2021–2022 saw the first-time occurrence of a pandemic in the form of coronavirus disease 2019 (COVID-19), and their mental health was affected. Many works are available in the literature to assess mental health severity. However, it is necessary to identify the affected students early for effective treatment.

Design/methodology/approach

Predictive analytics, a part of machine learning (ML), helps with early identification based on mental health severity levels to aid clinical psychologists. As a case study, engineering and medical course students were comparatively analysed in this work as they have rich course content and a stricter evaluation process than other streams. The methodology includes an online survey that obtains demographic details, academic qualifications, family details, etc. and anxiety and depression questions using the Hospital Anxiety and Depression Scale (HADS). The responses acquired through social media networks are analysed using ML algorithms – support vector machines (SVMs) (robust handling of health information) and J48 decision tree (DT) (interpretability/comprehensibility). Also, random forest is used to identify the predictors for anxiety and depression.

Findings

The results show that the support vector classifier produces outperforming results with classification accuracy of 100%, 1.0 precision and 1.0 recall, followed by the J48 DT classifier with 96%. It was found that medical students are affected by anxiety and depression marginally more when compared with engineering students.

Research limitations/implications

The entire work is dependent on the social media-displayed online questionnaire, and the participants were not met in person. This indicates that the response rate could not be evaluated appropriately. Due to the medical restrictions imposed by COVID-19, which remain in effect in 2022, this is the only method found to collect primary data from college students. Additionally, students self-selected themselves to participate in this survey, which raises the possibility of selection bias.

Practical implications

The responses acquired through social media networks are analysed using ML algorithms. This will be a big support for understanding the mental issues of the students due to COVID-19 and can taking appropriate actions to rectify them. This will improve the quality of the learning process in higher education in Oman.

Social implications

Furthermore, this study aims to provide recommendations for mental health screening as a regular practice in educational institutions to identify undetected students.

Originality/value

Comparing the mental health issues of two professional course students is the novelty of this work. This is needed because both studies require practical learning, long hours of work, etc.

Details

Arab Gulf Journal of Scientific Research, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1985-9899

Keywords

Open Access
Article
Publication date: 14 December 2021

Mariam Elhussein and Samiha Brahimi

This paper aims to propose a novel way of using textual clustering as a feature selection method. It is applied to identify the most important keywords in the profile…

Abstract

Purpose

This paper aims to propose a novel way of using textual clustering as a feature selection method. It is applied to identify the most important keywords in the profile classification. The method is demonstrated through the problem of sick-leave promoters on Twitter.

Design/methodology/approach

Four machine learning classifiers were used on a total of 35,578 tweets posted on Twitter. The data were manually labeled into two categories: promoter and nonpromoter. Classification performance was compared when the proposed clustering feature selection approach and the standard feature selection were applied.

Findings

Radom forest achieved the highest accuracy of 95.91% higher than similar work compared. Furthermore, using clustering as a feature selection method improved the Sensitivity of the model from 73.83% to 98.79%. Sensitivity (recall) is the most important measure of classifier performance when detecting promoters’ accounts that have spam-like behavior.

Research limitations/implications

The method applied is novel, more testing is needed in other datasets before generalizing its results.

Practical implications

The model applied can be used by Saudi authorities to report on the accounts that sell sick-leaves online.

Originality/value

The research is proposing a new way textual clustering can be used in feature selection.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 21 June 2022

Abhishek Das and Mihir Narayan Mohanty

In time and accurate detection of cancer can save the life of the person affected. According to the World Health Organization (WHO), breast cancer occupies the most frequent…

Abstract

Purpose

In time and accurate detection of cancer can save the life of the person affected. According to the World Health Organization (WHO), breast cancer occupies the most frequent incidence among all the cancers whereas breast cancer takes fifth place in the case of mortality numbers. Out of many image processing techniques, certain works have focused on convolutional neural networks (CNNs) for processing these images. However, deep learning models are to be explored well.

Design/methodology/approach

In this work, multivariate statistics-based kernel principal component analysis (KPCA) is used for essential features. KPCA is simultaneously helpful for denoising the data. These features are processed through a heterogeneous ensemble model that consists of three base models. The base models comprise recurrent neural network (RNN), long short-term memory (LSTM) and gated recurrent unit (GRU). The outcomes of these base learners are fed to fuzzy adaptive resonance theory mapping (ARTMAP) model for decision making as the nodes are added to the F_2ˆa layer if the winning criteria are fulfilled that makes the ARTMAP model more robust.

Findings

The proposed model is verified using breast histopathology image dataset publicly available at Kaggle. The model provides 99.36% training accuracy and 98.72% validation accuracy. The proposed model utilizes data processing in all aspects, i.e. image denoising to reduce the data redundancy, training by ensemble learning to provide higher results than that of single models. The final classification by a fuzzy ARTMAP model that controls the number of nodes depending upon the performance makes robust accurate classification.

Research limitations/implications

Research in the field of medical applications is an ongoing method. More advanced algorithms are being developed for better classification. Still, the scope is there to design the models in terms of better performance, practicability and cost efficiency in the future. Also, the ensemble models may be chosen with different combinations and characteristics. Only signal instead of images may be verified for this proposed model. Experimental analysis shows the improved performance of the proposed model. This method needs to be verified using practical models. Also, the practical implementation will be carried out for its real-time performance and cost efficiency.

Originality/value

The proposed model is utilized for denoising and to reduce the data redundancy so that the feature selection is done using KPCA. Training and classification are performed using heterogeneous ensemble model designed using RNN, LSTM and GRU as base classifiers to provide higher results than that of single models. Use of adaptive fuzzy mapping model makes the final classification accurate. The effectiveness of combining these methods to a single model is analyzed in this work.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 2 April 2024

Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard and Olof Osterman

In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an…

Abstract

Purpose

In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.

Design/methodology/approach

On a sample of over 230,000 records with close to 12,000 distinct DDC classes, an open source tool Annif, developed by the National Library of Finland, was applied in the following implementations: lexical algorithm, support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combing the former four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted to investigate the value and inter-rater agreement of automatically assigned classes, on a sample of 60 records.

Findings

The best results were achieved using the ensemble approach that achieved 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.

Originality/value

The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.

Details

Journal of Documentation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0022-0418

Keywords

Open Access
Article
Publication date: 25 October 2022

Heitor Hoffman Nakashima, Daielly Mantovani and Celso Machado Junior

This paper aims to investigate whether professional data analysts’ trust of black-box systems is increased by explainability artifacts.

1015

Abstract

Purpose

This paper aims to investigate whether professional data analysts’ trust of black-box systems is increased by explainability artifacts.

Design/methodology/approach

The study was developed in two phases. First a black-box prediction model was estimated using artificial neural networks, and local explainability artifacts were estimated using local interpretable model-agnostic explanations (LIME) algorithms. In the second phase, the model and explainability outcomes were presented to a sample of data analysts from the financial market and their trust of the models was measured. Finally, interviews were conducted in order to understand their perceptions regarding black-box models.

Findings

The data suggest that users’ trust of black-box systems is high and explainability artifacts do not influence this behavior. The interviews reveal that the nature and complexity of the problem a black-box model addresses influences the users’ perceptions, trust being reduced in situations that represent a threat (e.g. autonomous cars). Concerns about the models’ ethics were also mentioned by the interviewees.

Research limitations/implications

The study considered a small sample of professional analysts from the financial market, which traditionally employs data analysis techniques for credit and risk analysis. Research with personnel in other sectors might reveal different perceptions.

Originality/value

Other studies regarding trust in black-box models and explainability artifacts have focused on ordinary users, with little or no knowledge of data analysis. The present research focuses on expert users, which provides a different perspective and shows that, for them, trust is related to the quality of data and the nature of the problem being solved, as well as the practical consequences. Explanation of the algorithm mechanics itself is not significantly relevant.

Details

Revista de Gestão, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1809-2276

Keywords

Access

Only content I have access to

Year

Content type

Earlycite article (25)
1 – 10 of 25