Search results

1 – 10 of over 3000
Article
Publication date: 28 March 2023

Antonijo Marijić and Marina Bagić Babac

Abstract

Purpose

Genre classification of songs based on lyrics is a challenging task even for humans; however, state-of-the-art natural language processing has recently offered advanced solutions to this task. The purpose of this study is to advance the understanding and application of natural language processing and deep learning in the domain of music genre classification, while also contributing to the broader themes of global knowledge and communication, and sustainable preservation of cultural heritage.

Design/methodology/approach

The main contribution of this study is the development and evaluation of various machine and deep learning models for song genre classification. Additionally, we investigated the effect of different word embeddings, including Global Vectors for Word Representation (GloVe) and Word2Vec, on the classification performance. The tested models range from benchmarks such as logistic regression, support vector machine and random forest, to more complex neural network architectures and transformer-based models, such as recurrent neural network, long short-term memory, bidirectional long short-term memory and bidirectional encoder representations from transformers (BERT).

Findings

The authors conducted experiments on both English and multilingual data sets for genre classification. The results show that the BERT model achieved the best accuracy on the English data set, whereas cross-lingual language model pretraining based on RoBERTa (XLM-RoBERTa) performed the best on the multilingual data set. This study found that songs in the metal genre were the most accurately labeled, as their text style and topics were the most distinct from other genres. In contrast, songs from the pop and rock genres were more challenging to differentiate. This study also compared the impact of different word embeddings on the classification task and found that models with GloVe word embeddings outperformed those with Word2Vec and those with a learned embedding layer.

Originality/value

This study presents the implementation, testing and comparison of various machine and deep learning models for genre classification. The results demonstrate that transformer models, including BERT, robustly optimized BERT pretraining approach (RoBERTa), distilled bidirectional encoder representations from transformers (DistilBERT), bidirectional and auto-regressive transformers (BART) and XLM-RoBERTa, outperformed other models.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9342

Article
Publication date: 19 August 2022

Anjali More and Dipti Rana

Abstract

Purpose

The referred data set provides reliable information about network flows and common attacks that meets real-world criteria. Accordingly, this study focusses on the imbalanced intrusion detection benchmark knowledge discovery in database (KDD) data set, which many researchers prefer for experimentation and analysis. The proposed algorithm, improvised random forest classification with error tuning factors (IRFCETF), is applied to the KDD data set and used to evaluate the performance of the complete set of network traffic features.

Design/methodology/approach

Researchers are increasingly drawn to the diverse range of real-time applications that deal with imbalanced data classification (ImDC). Application areas that deal with ImDC, such as real-time systems, artificial intelligence (AI) and the Industrial Internet of Things (IIoT), suffer degraded classification performance because of skewed data distribution (SkDD), and many data applications in AI and IIoT face this problem. Recently, the volume of computer network data and of related applications has expanded exponentially. Intrusion detection is one of the demanding applications of ImDC. The proposed study focusses on the imbalanced intrusion benchmark KDD data set and other benchmark data sets, using the proposed IRFCETF approach. IRFCETF demonstrates improved classification performance on imbalanced data sets over the existing approach. The purpose of this work is to review imbalanced data applications in numerous application areas, including AI and IIoT, and to tune performance with respect to principal component analysis. This study also focusses on the out-of-bag error as a performance-tuning factor.

Findings

Experimental results on the KDD data set show that the proposed algorithm gives improved performance. For the referred intrusion detection data set, the IRFCETF classification accuracy is 99.57% and the error rate is 0.43%.
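The reported accuracy and error rate are complementary (99.57% + 0.43% = 100%). The sketch below shows that relationship and, as an illustration only, why overall accuracy on a skewed data set is usually read alongside per-class recall. It is not the IRFCETF algorithm; the function names and the toy labels are assumptions made for this example, and the abstract does not report per-class figures.

```python
from collections import Counter


def accuracy_and_error(y_true, y_pred):
    """Return (accuracy, error_rate) as fractions; the two always sum to 1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return accuracy, 1.0 - accuracy


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical skewed sample: 95 "normal" flows and 5 "attack" flows.
y_true = ["normal"] * 95 + ["attack"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["normal"] * 100

acc, err = accuracy_and_error(y_true, y_pred)
print(f"accuracy={acc:.2%} error={err:.2%}")
print(per_class_recall(y_true, y_pred))
```

In this toy case the overall accuracy is high even though no attack is detected, which is the kind of effect that skewed data distributions can hide.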

Research limitations/implications

This research work can be extended to further improve classification techniques with multiple correspondence analysis (MCA); hierarchical MCA can be explored with classification models for a wide range of skewed data sets.

Practical implications

The metric enhancement is measurable and helpful for imbalanced applications related to intrusion detection systems in current application domains such as security, AI and IIoT digitization. Analytical results show that the proposed approach improves on the metrics of traditional machine learning algorithms. The proposed IRFCETF thus supports the claim that the error-tuning parameter has a measurable impact on classification accuracy.

Social implications

The proposed algorithm is useful in numerous IIoT applications, such as health care and machinery automation.

Originality/value

This research work presents IRFCETF, an approach for enhancing classification metrics. The proposed method yields a test set categorization for each case with an error reduction mechanism.

Details

International Journal of Pervasive Computing and Communications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 3 September 2024

Biplab Bhattacharjee, Kavya Unni and Maheshwar Pratap

Abstract

Purpose

Product returns are a major challenge for e-businesses as they involve huge logistical and operational costs. Therefore, it becomes crucial to predict returns in advance. This study aims to evaluate different genres of classifiers for product return chance prediction and to further optimize the best-performing model.

Design/methodology/approach

An e-commerce data set with categorical attributes is used for this study. Chi-square feature selection yields a reduced feature set, which is used as input for model building. Predictive models are built using individual classifiers, ensemble models and deep neural networks. For performance evaluation, a 75:25 train/test split and 10-fold cross-validation are used. To improve the predictability of the best-performing classifier, hyperparameter tuning is performed using different optimization methods, such as random search, grid search, the Bayesian approach and evolutionary models (genetic algorithm, differential evolution and particle swarm optimization).
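As a rough illustration of chi-square feature selection on categorical attributes, the following sketch computes the Pearson chi-square statistic between one categorical feature and the class label, then ranks features by that score. The feature names and sample values are hypothetical; the abstract does not describe the data set's columns or the tooling the authors used.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic for one categorical feature vs. the label."""
    n = len(labels)
    if n == 0 or n != len(feature_values):
        raise ValueError("inputs must be non-empty and of equal length")
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    observed = Counter(zip(feature_values, labels))
    score = 0.0
    for f_value, f_count in feature_counts.items():
        for l_value, l_count in label_counts.items():
            # Expected cell count if feature and label were independent.
            expected = f_count * l_count / n
            score += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return score


# Hypothetical anonymized attributes and a binary "returned" label.
returned = [1, 1, 0, 0, 1, 0, 1, 0]
features = {
    "attr_a": ["x", "x", "y", "y", "x", "y", "x", "y"],  # tracks the label
    "attr_b": ["p", "q", "p", "q", "p", "q", "q", "p"],  # unrelated to it
}

ranking = sorted(
    features, key=lambda name: chi_square_score(features[name], returned),
    reverse=True,
)
print(ranking)
```

A higher score indicates a stronger association with the label; a selection step would keep the top-ranked features or those above a chosen significance threshold.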

Findings

A comparison of F1-scores revealed that the Bayesian approach outperformed all other optimization approaches in terms of accuracy. The predictability of the Bayesian-optimized model is further compared with that of other classifiers using experimental analysis. The Bayesian-optimized XGBoost model possessed superior performance, with accuracies of 77.80% and 70.35% for holdout and 10-fold cross-validation methods, respectively.

Research limitations/implications

Given the anonymized data, the effects of individual attributes on outcomes could not be investigated in detail. The Bayesian-optimized predictive model may be used in decision support systems, enabling real-time prediction of returns and the implementation of preventive measures.

Originality/value

There are very few reported studies on predicting the chance of order return in e-businesses. To the best of the authors’ knowledge, this study is the first to compare different optimization methods and classifiers, demonstrating the superiority of the Bayesian-optimized XGBoost classification model for returns prediction.

Details

Journal of Systems and Information Technology, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1328-7265

Article
Publication date: 25 January 2024

Yaolin Zhou, Zhaoyang Zhang, Xiaoyu Wang, Quanzheng Sheng and Rongying Zhao

Abstract

Purpose

The digitalization of archival management has rapidly developed with the maturation of digital technology. With data's exponential growth, archival resources have transitioned from single modalities, such as text, images, audio and video, to integrated multimodal forms. This paper identifies key trends, gaps and areas of focus in the field. Furthermore, it proposes a theoretical organizational framework based on deep learning to address the challenges of managing archives in the era of big data.

Design/methodology/approach

Via a comprehensive systematic literature review, the authors investigate the field of multimodal archive resource organization and the application of deep learning techniques in archive organization. A systematic search and filtering process is conducted to identify relevant articles, which are then summarized, discussed and analyzed to provide a comprehensive understanding of existing literature.

Findings

The authors' findings reveal that most research on multimodal archive resources predominantly focuses on aspects related to storage, management and retrieval. Furthermore, the utilization of deep learning techniques in image archive retrieval is increasing, highlighting their potential for enhancing image archive organization practices; however, practical research and implementation remain scarce. The review also underscores gaps in the literature, emphasizing the need for more practical case studies and the application of theoretical concepts in real-world scenarios. In response to these insights, the authors' study proposes an innovative deep learning-based organizational framework. This proposed framework is designed to navigate the complexities inherent in managing multimodal archive resources, representing a significant stride toward more efficient and effective archival practices.

Originality/value

This study comprehensively reviews the existing literature on multimodal archive resources organization. Additionally, a theoretical organizational framework based on deep learning is proposed, offering a novel perspective and solution for further advancements in the field. These insights contribute theoretically and practically, providing valuable knowledge for researchers, practitioners and archivists involved in organizing multimodal archive resources.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Open Access
Article
Publication date: 15 June 2021

Leila Ismail and Huned Materwala

Abstract

Purpose

Machine learning is an intelligent methodology used for prediction and has shown promising results in predictive classification. One of the critical areas in which machine learning can save lives is diabetes prediction. Diabetes is a chronic disease and one of the 10 leading causes of death worldwide. The total number of people with diabetes is expected to reach 700 million in 2045, a 51.18% increase compared to 2019. These are alarming figures, and accurate diabetes prediction is therefore urgently needed.

Design/methodology/approach

Health professionals and stakeholders are striving for classification models to support the prognosis of diabetes and formulate strategies for prevention. The authors conduct a literature review of machine learning models and propose an intelligent framework for diabetes prediction.

Findings

The authors provide a critical analysis of machine learning models and propose and evaluate an intelligent machine learning-based architecture for diabetes prediction. Using this framework, the authors implement and evaluate the decision tree (DT)-based random forest (RF) and support vector machine (SVM) learning models, the most widely used approaches in the literature, for diabetes prediction.

Originality/value

This paper provides a novel intelligent diabetes mellitus prediction framework (IDMPF) using machine learning. The framework is the result of a critical examination of prediction models in the literature and their application to diabetes. The authors identify the training methodologies, model evaluation strategies and challenges in diabetes prediction, and propose solutions within the framework. The research results can be used by health professionals, stakeholders, students and researchers working in the diabetes prediction area.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 7 August 2024

Funda Demir

Abstract

Purpose

The energy generation process through photovoltaic (PV) panels is contingent upon uncontrollable variables such as wind patterns, cloud cover, temperatures, solar irradiance intensity and duration of exposure. Fluctuations in these variables can lead to interruptions in power generation and losses in output. This study aims to establish a measurement setup that enables monitoring, tracking and prediction of the generated energy in a PV energy system to ensure overall system security and stability. Toward this goal, data pertaining to the PV energy system is measured and recorded in real-time independently of location. Subsequently, the recorded data is used for power prediction.

Design/methodology/approach

Data obtained from the experimental setup include voltage and current values of the PV panel, battery and load; temperature readings of the solar panel surface, environment and the battery; and measurements of humidity, pressure and radiation values in the panel’s environment. These data were monitored and recorded in real-time through a computer interface and mobile interface enabling remote access. For prediction purposes, machine learning methods, including the gradient boosting regressor (GBR), support vector machine (SVM) and k-nearest neighbors (k-NN) algorithms, have been selected. The resulting outputs have been interpreted through graphical representations. For the numerical interpretation of the obtained predictive data, performance measurement criteria such as mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE) and R-squared (R2) have been used.
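The four performance criteria named above have standard definitions, sketched here in plain Python. The sample power values are made up for illustration and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R-squared for paired observations."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical measured vs. predicted PV power readings (arbitrary units).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(regression_metrics(measured, predicted))
```

Lower MAE, MSE and RMSE indicate smaller prediction errors, while an R-squared closer to 1 indicates that the model explains more of the variance in the measured values.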

Findings

It has been determined that the most successful prediction model is k-NN, whereas the prediction model with the lowest performance is SVM. According to the accuracy performance comparison conducted on the test data, k-NN exhibits the highest accuracy rate of 82%, whereas the accuracy rate for the GBR algorithm is 80%, and the accuracy rate for the SVM algorithm is 72%.

Originality/value

The experimental setup used in this study, including the measurement and monitoring apparatus, has been specifically designed for this research. The system is capable of remote monitoring both through a computer interface and a custom-developed mobile application. Measurements were conducted on the Karabük University campus, thereby revealing the energy potential of the Karabük province. This system serves as an exemplary study and can be deployed to any desired location for remote monitoring. Numerous methods and techniques exist for power prediction. In this study, contemporary machine learning techniques, which are pertinent to power prediction, have been used, and their performances are presented comparatively.

Details

World Journal of Engineering, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1708-5284

Article
Publication date: 4 July 2024

Tirth Patel, Brian H.W. Guo, Jacobus Daniel van der Walt and Yang Zou

Abstract

Purpose

Current solutions for monitoring the progress of pavement construction (such as collecting, processing and analysing data) are inefficient, labour-intensive, time-consuming, tedious and error-prone. This study proposes an automated solution comprising a sensor prototype mounted on an unmanned ground vehicle (UGV) for data collection, an LSTM classifier for road layer detection, an integrated algorithm for as-built progress calculation and web-based as-built reporting.

Design/methodology/approach

The crux of the proposed solution, the road layer detection model, is developed from a layer change detection model and rule-based reasoning. Initially, data were gathered using a UGV with a laser ToF (time-of-flight) distance sensor, accelerometer, gyroscope and GPS sensor in a controlled environment. The long short-term memory (LSTM) algorithm was applied to the acquired data to develop a classifier model for layer change detection with three classes: layer not changed, layer up and layer down.

Findings

In controlled environment experiments, the classification of road layer changes achieved 94.35% test accuracy with 14.05% loss. Subsequently, the proposed approach, including the layer detection model, as-built measurement algorithm and reporting, was successfully implemented with a real case study to test the robustness of the model and measure the as-built progress.

Research limitations/implications

The implementation of the proposed framework can allow continuous, real-time monitoring of road construction projects, eliminating the need for manual, time-consuming methods. This study can potentially help the construction industry with real-time decision-making in construction progress monitoring and control.

Originality/value

This novel approach marks the first utilization of a sensor-mounted UGV for monitoring road construction progress, filling a crucial research gap in incremental and segment-wise construction monitoring and offering a solution that addresses challenges faced by Unmanned Aerial Vehicles (UAVs) and 3D reconstruction. Utilizing UGVs offers advantages such as cost-effectiveness, safety and operational flexibility in no-fly zones.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Open Access
Article
Publication date: 18 November 2021

Shin'ichiro Ishikawa

Abstract

Purpose

Using a newly compiled corpus module consisting of utterances from Asian learners during L2 English interviews, this study examined how Asian EFL learners' L1s (Chinese, Indonesian, Japanese, Korean, Taiwanese and Thai), their L2 proficiency levels (A2, B1 low, B1 upper and B2+) and speech task types (picture descriptions, roleplays and QA-based conversations) affected four aspects of vocabulary usage (number of tokens, standardized type/token ratio, mean word length and mean sentence length).

Design/methodology/approach

These four aspects concern speech fluency, lexical richness, lexical complexity and structural complexity, respectively.
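The four vocabulary measures can be computed from a transcript roughly as follows. This is a minimal sketch under stated assumptions: the regular-expression tokenizer, the punctuation-based sentence splitter and the chunk size used for the standardized type/token ratio are all choices made for this example, since the abstract does not specify how the corpus was processed.

```python
import re


def vocabulary_measures(text, chunk_size=50):
    """Token count, standardized type/token ratio, mean word and sentence length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    # Standardized TTR: average the type/token ratio over consecutive
    # fixed-size chunks so that longer texts are not penalized.
    chunks = [
        tokens[i:i + chunk_size]
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not chunks:  # text shorter than one chunk: fall back to plain TTR
        chunks = [tokens]
    sttr = sum(len(set(c)) / len(c) for c in chunks) / len(chunks)
    return {
        "tokens": len(tokens),
        "sttr": sttr,
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
        "mean_sentence_length": len(tokens) / len(sentences),
    }


# Hypothetical learner utterance, with a small chunk size for the demo.
print(vocabulary_measures("I like tea. I like milk.", chunk_size=3))
```

Averaging over fixed-size chunks is what makes the ratio "standardized": a plain type/token ratio falls as a text grows longer, so raw values from speakers who produce different numbers of tokens are not directly comparable.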

Findings

Subsequent corpus-based quantitative data analyses revealed that (1) learner/native speaker differences existed during the conversation and roleplay tasks in terms of the number of tokens, type/token ratio and sentence length; (2) an L1 group effect existed in all three task types in terms of the number of tokens and sentence length; (3) an L2 proficiency effect existed in all three task types in terms of the number of tokens, type-token ratio and sentence length; and (4) the usage of high-frequency vocabulary was influenced more strongly by the task type and it was classified into four types: Type A vocabulary for grammar control, Type B vocabulary for speech maintenance, Type C vocabulary for negotiation and persuasion and Type D vocabulary for novice learners.

Originality/value

These findings provide clues for better understanding L2 English vocabulary usage among Asian learners during speech.

Details

PSU Research Review, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2399-1747

Article
Publication date: 10 April 2024

Aslıhan Dursun-Cengizci and Meltem Caber

Abstract

Purpose

This study aims to predict customer churn in resort hotels by calculating the churn probability of repeat customers for future stays in the same hotel brand.

Design/methodology/approach

Based on the recency, frequency, monetary (RFM) paradigm, random forest and logistic regression supervised machine learning algorithms were used to predict churn behavior. The model with superior performance was used to detect potential churners and generate a priority matrix.
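To make the RFM paradigm concrete, the sketch below derives recency, frequency and monetary features per customer from a list of past stays. The record layout (customer ID, checkout date, amount spent), the reference date and the sample values are assumptions for illustration; the abstract does not describe the hotel data or how the features were engineered.

```python
from datetime import date


def rfm_features(stays, as_of):
    """Compute recency (days), frequency (stays) and monetary (total spend).

    `stays` is an iterable of (customer_id, checkout_date, amount) tuples.
    """
    summary = {}
    for customer_id, checkout, amount in stays:
        rec = summary.setdefault(
            customer_id, {"last": checkout, "frequency": 0, "monetary": 0.0}
        )
        rec["last"] = max(rec["last"], checkout)  # most recent stay so far
        rec["frequency"] += 1
        rec["monetary"] += amount
    return {
        customer_id: {
            "recency_days": (as_of - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for customer_id, rec in summary.items()
    }


# Hypothetical booking history for two repeat customers.
stays = [
    ("c1", date(2023, 1, 10), 500.0),
    ("c1", date(2023, 6, 1), 300.0),
    ("c2", date(2022, 8, 15), 900.0),
]
for customer, feats in rfm_features(stays, as_of=date(2024, 1, 1)).items():
    print(customer, feats)
```

Features of this kind would then be passed, together with sector-specific variables, to a classifier such as random forest or logistic regression to estimate each customer's churn probability.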

Findings

The random forest algorithm showed a higher prediction performance with an 80% accuracy rate. The most important variables were RFM-based, followed by hotel sector-specific variables such as market, season, accompaniers and booker. Some managerial strategies were proposed to retain future churners, clustered as “hesitant,” “economy,” “alternative seeker,” and “opportunity chaser” customer groups.

Research limitations/implications

This study contributes to the theoretical understanding of customer behavior in the hospitality industry and provides valuable insight for hotel practitioners by demonstrating the methods that facilitate the identification of potential churners and their characteristics.

Originality/value

Most customer retention studies in hospitality either concentrate on the antecedents of retention or customers’ revisit intentions using traditional methods. Taking a unique place within the literature, this study conducts churn prediction analysis for repeat hotel customers by opening a new area for inquiry in hospitality studies.

Details

International Journal of Contemporary Hospitality Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0959-6119

Article
Publication date: 2 May 2023

Dongyuan Zhao, Zhongjun Tang and Duokui He

Abstract

Purpose

With the intensification of market competition, there is a growing demand for weak signal identification and evolutionary analysis for enterprise foresight. For decades, many scholars have conducted relevant research. However, existing studies approach the topic from a single angle and lack a systematic and comprehensive overview. In this paper, the authors summarize the articles related to weak signal recognition and evolutionary analysis, in an attempt to contribute to relevant research.

Design/methodology/approach

The authors develop a systematic overview framework based on the most classical three-dimensional space model of weak signals. The framework comprehensively summarizes current research insights and knowledge along three dimensions: research field, identification methods and interpretation methods.

Findings

The research results show that it is necessary to improve the automation level in the process of weak signal recognition and analysis and transfer valuable human resources to the decision-making stage. In addition, it is necessary to coordinate multiple types of data sources, expand research subfields and optimize weak signal recognition and interpretation methods, with a view to expanding weak signal future research, making theoretical and practical contributions to enterprise foresight, and providing reference for the government to establish weak signal technology monitoring, evaluation and early warning mechanisms.

Originality/value

The authors develop a systematic overview framework based on the most classical three-dimensional space model of weak signals. It comprehensively summarizes the current research insights and knowledge from three dimensions of research field, identification methods and interpretation methods.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X
