Search results

1 – 10 of 49

Open Access

Article

Publication date: 27 February 2023

Machine learning methods for results merging in patent retrieval

Vasileios Stamatis, Michail Salampasis and Konstantinos Diamantaras

Downloads: 510

Abstract

Purpose

In federated search, a query is sent simultaneously to multiple resources and each one of them returns a list of results. These lists are merged into a single list using the results merging process. In this work, the authors apply machine learning methods for results merging in federated patent search. Even though several methods for results merging have been developed, none of them has been tested on patent data or has considered multiple machine learning models. Thus, the authors experiment with state-of-the-art methods using patent data and propose two new methods for results merging that use machine learning models.

Design/methodology/approach

The methods are based on a centralized index containing samples of documents from all the remote resources, and they implement machine learning models to estimate comparable scores for the documents retrieved by different resources. The authors examine the new methods in cooperative and uncooperative settings where document scores from the remote search engines are available and not, respectively. In uncooperative environments, they propose two methods for assigning document scores.
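As a rough illustration of estimating comparable scores, the sketch below fits a one-variable least-squares line per resource from the documents that appear both in that resource's result list and in the centralized sample index, then maps every resource score onto the centralized scale before merging. This is a simple linear stand-in, not the authors' models (the abstract reports random forest as the best performer). The function names, the identity fallback and the toy scores are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # constant input: predict the mean
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def merge_results(result_lists, central_scores):
    """Merge per-resource result lists into one ranked list.

    result_lists: {resource: [(doc_id, resource_score), ...]}
    central_scores: {doc_id: score of that document in the centralized sample index}
    """
    merged = []
    for resource, results in result_lists.items():
        # Training pairs: documents seen by both the resource and the sample index.
        overlap = [(s, central_scores[d]) for d, s in results if d in central_scores]
        if len(overlap) >= 2:
            a, b = fit_line([s for s, _ in overlap], [c for _, c in overlap])
        else:
            a, b = 1.0, 0.0  # assumption: fall back to raw scores when too few pairs
        merged.extend((doc_id, a * score + b) for doc_id, score in results)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: two resources with incomparable score scales.
result_lists = {
    "A": [("a1", 10.0), ("a2", 8.0), ("a3", 6.0)],
    "B": [("b1", 0.9), ("b2", 0.5), ("b3", 0.1)],
}
central_scores = {"a1": 0.8, "a3": 0.4, "b1": 0.7, "b3": 0.3}

merged = merge_results(result_lists, central_scores)
print([doc_id for doc_id, _ in merged])
```

Swapping `fit_line` for another regressor would leave the merging step unchanged, which is the design point the abstract makes about comparing several learned models.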

Findings

The effectiveness of the new results merging methods was measured against state-of-the-art models and found to be superior to them in many cases with significant improvements. The random forest model achieves the best results in comparison to all other models and presents new insights for the results merging problem.

Originality/value

In this article, the authors demonstrate that machine learning models can substitute for other standard methods and models that have been used for results merging for many years. The proposed methods outperformed state-of-the-art estimation methods for results merging and proved more effective for federated patent search.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2514-9288

Open Access

Article

Publication date: 12 October 2021

Predicting student performance in a blended learning environment using learning management system interaction data

Kiran Fahd, Shah Jahan Miah and Khandakar Ahmed

Downloads: 3781

Abstract

Purpose

Student attrition in tertiary educational institutes may play a significant role in achieving core values leading towards strategic mission and financial well-being. Analysis of data generated from student interaction with learning management systems (LMSs) in blended learning (BL) environments may assist with the identification of students at risk of failing, but to what extent this may be possible is unknown. Existing studies, moreover, are limited in addressing the issue at a significant scale.

Design/methodology/approach

This study develops a new approach that applies machine learning (ML) models to a publicly available dataset relevant to student attrition in order to identify students potentially at risk. The dataset consists of data generated by students' interaction with the LMS in their BL environment.

Findings

Identifying students at risk through an innovative approach will promote timely intervention in the learning process, such as for improving student academic progress. To evaluate the performance of the proposed approach, the accuracy is compared with other representational ML methods.

Originality/value

Random forest, the best-performing ML algorithm with 85% accuracy, is selected to support educators in implementing various pedagogical practices to improve students’ learning.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 13 January 2022

Machine learning models for predicting international tourist arrivals in Indonesia during the COVID-19 pandemic: a multisource Internet data approach

Dinda Thalia Andariesta and Meditya Wasesa

Downloads: 4903

Abstract

Purpose

This research presents machine learning models for predicting international tourist arrivals in Indonesia during the COVID-19 pandemic using multisource Internet data.

Design/methodology/approach

To develop the prediction models, this research utilizes multisource Internet data from TripAdvisor travel forum and Google Trends. Temporal factors, posts and comments, search queries index and previous tourist arrivals records are set as predictors. Four sets of predictors and three distinct data compositions were utilized for training the machine learning models, namely artificial neural networks (ANNs), support vector regression (SVR) and random forest (RF). To evaluate the models, this research uses three accuracy metrics, namely root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).
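For reference, the three accuracy metrics named above can be computed as in the minimal sketch below. The formulas are the standard definitions; the function names and the toy arrival figures are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (undefined if any actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly arrivals and forecasts.
actual = [100, 200, 400]
predicted = [110, 180, 400]

print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses the error relative to the size of each actual value, which is one reason studies often report all three together.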

Findings

Prediction models trained using multisource Internet data predictors have better accuracy than those trained using single-source Internet data or other predictors. In addition, using more training sets that cover the phenomenon of interest, such as COVID-19, will enhance the prediction model's learning process and accuracy. The experiments show that the RF models have better prediction accuracy than the ANN and SVR models.

Originality/value

First, this study pioneers the practice of a multisource Internet data approach in predicting tourist arrivals amid the unprecedented COVID-19 pandemic. Second, the use of multisource Internet data to improve prediction performance is validated with real empirical data. Finally, this is one of the few papers to provide perspectives on the current dynamics of Indonesia's tourism demand.

Details

Journal of Tourism Futures, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2055-5911

Open Access

Article

Publication date: 27 November 2023

Predictive machine learning model for mental health issues in higher education students due to COVID-19 using HADS assessment

Reshmy Krishnan, Shantha Kumari, Ali Al Badi, Shermina Jeba and Menila James

Downloads: 419

Abstract

Purpose

Students pursuing different professional courses at the higher education level during 2021–2022 saw the first-time occurrence of a pandemic in the form of coronavirus disease 2019 (COVID-19), and their mental health was affected. Many works are available in the literature to assess mental health severity. However, it is necessary to identify the affected students early for effective treatment.

Design/methodology/approach

Predictive analytics, a part of machine learning (ML), helps with early identification based on mental health severity levels to aid clinical psychologists. As a case study, engineering and medical course students were comparatively analysed in this work as they have rich course content and a stricter evaluation process than other streams. The methodology includes an online survey that obtains demographic details, academic qualifications, family details, etc. and anxiety and depression questions using the Hospital Anxiety and Depression Scale (HADS). The responses acquired through social media networks are analysed using ML algorithms – support vector machines (SVMs) (robust handling of health information) and J48 decision tree (DT) (interpretability/comprehensibility). Also, random forest is used to identify the predictors for anxiety and depression.

Findings

The results show that the support vector classifier performs best, with a classification accuracy of 100%, precision of 1.0 and recall of 1.0, followed by the J48 DT classifier with 96%. Medical students were found to be affected by anxiety and depression marginally more than engineering students.

Research limitations/implications

The entire work is dependent on the social media-displayed online questionnaire, and the participants were not met in person. This indicates that the response rate could not be evaluated appropriately. Due to the medical restrictions imposed by COVID-19, which remain in effect in 2022, this is the only method found to collect primary data from college students. Additionally, students self-selected themselves to participate in this survey, which raises the possibility of selection bias.

Practical implications

The responses acquired through social media networks are analysed using ML algorithms. This will be a big support for understanding students' mental health issues due to COVID-19 and for taking appropriate actions to address them. This will improve the quality of the learning process in higher education in Oman.

Social implications

Furthermore, this study aims to provide recommendations for mental health screening as a regular practice in educational institutions to identify undetected students.

Originality/value

The novelty of this work lies in comparing the mental health issues of students from two professional courses. This is needed because both courses of study require practical learning, long hours of work, etc.

Details

Arab Gulf Journal of Scientific Research, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1985-9899

Open Access

Article

Publication date: 22 June 2023

Machine learning algorithms applied to the estimation of liquidity: the 10-year United States treasury bond

Ignacio Manuel Luque Raya and Pablo Luque Raya

Downloads: 858

Abstract

Purpose

Having defined liquidity, the aim is to assess the predictive capacity of its representative variables, so that economic fluctuations may be better understood.

Design/methodology/approach

Conceptual variables that are representative of liquidity will be used to formulate the predictions. The results of various machine learning models will be compared, leading to some reflections on the predictive value of the liquidity variables, with a view to defining their selection.

Findings

The predictive capacity of the model was also found to vary depending on the source of the liquidity, in so far as the data on liquidity within the private sector contributed more than the data on public sector liquidity to the prediction of economic fluctuations. International liquidity was seen as a more diffuse concept, and the standardization of its definition could be the focus of future studies. A benchmarking process was also performed when applying the state-of-the-art machine learning models.

Originality/value

Better understanding of these variables might help us toward a deeper understanding of the operation of financial markets. Liquidity, one of the key financial market variables, is neither well-defined nor standardized in the existing literature, which calls for further study. Hence, the novelty of an applied study employing modern data science techniques can provide a fresh perspective on financial markets.

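Liquidity, whether in financial markets or in the real economy, is one of the clearest predictors of market trends.

Liquidity is therefore an extremely important concept for understanding economic cycles and economic development. This study aims to make progress in predicting the prices of safe assets, which reflect the actual state of the economy, in particular the 10-year United States Treasury bond.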

Details

European Journal of Management and Business Economics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2444-8451

Open Access

Article

Publication date: 15 December 2023

Advancing tourism demand forecasting in Sri Lanka: evaluating the performance of machine learning models and the impact of social media data integration

Isuru Udayangani Hewapathirana

Downloads: 569

Abstract

Purpose

This study explores the pioneering approach of utilising machine learning (ML) models and integrating social media data for predicting tourist arrivals in Sri Lanka.

Design/methodology/approach

Two sets of experiments are performed in this research. First, the predictive accuracy of three ML models, support vector regression (SVR), random forest (RF) and artificial neural network (ANN), is compared against the seasonal autoregressive integrated moving average (SARIMA) model using historical tourist arrivals as features. Subsequently, the impact of incorporating social media data from TripAdvisor and Google Trends as additional features is investigated.
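Using historical arrivals as features usually means turning the time series into a supervised learning table. The abstract does not say how the features were constructed, so the sketch below assumes a simple sliding window of lagged values; the function name, the window length and the toy series are hypothetical.

```python
def make_lag_features(series, n_lags):
    """Turn a time series into (features, target) pairs using the previous n_lags values."""
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(series[t - n_lags:t])  # the n_lags observations before time t
        targets.append(series[t])          # the value to predict at time t
    return rows, targets


# Hypothetical monthly tourist arrivals (in thousands).
arrivals = [120, 135, 150, 160, 158, 170]
X, y = make_lag_features(arrivals, n_lags=3)

print(X)
print(y)
```

Additional predictors, such as a search index for the same month, would be appended as extra columns of each row; any regressor (SVR, RF or ANN) can then be trained on `X` and `y`.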

Findings

The findings reveal that the ML models generally outperform the SARIMA model, particularly from 2019 to 2021, when several unexpected events occurred in Sri Lanka. When integrating social media data, the RF model performs significantly better during most years, whereas the SVR model does not exhibit significant improvement. Although adding social media data to the ANN model does not yield superior forecasts, it exhibits proficiency in capturing data trends.

Practical implications

The findings offer substantial implications for the industry's growth and resilience, allowing stakeholders to make accurate data-driven decisions to navigate the unpredictable dynamics of Sri Lanka's tourism sector.

Originality/value

This study presents the first exploration of ML models and the integration of social media data for forecasting Sri Lankan tourist arrivals, contributing to the advancement of research in this domain.

Details

Journal of Tourism Futures, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2055-5911

Open Access

Article

Publication date: 25 May 2021

Classification models for likelihood prediction of diabetes at early stage using feature selection

Oladosu Oyebisi Oladimeji, Abimbola Oladimeji and Olayanju Oladimeji

Downloads: 2079

Abstract

Purpose

Diabetes is a life-threatening chronic disease that already affects 422m people globally, according to a World Health Organization (WHO) report as of 2018. It imposes a heavy cost on individuals, governments and groups, from the diagnosis stage through to the treatment stage. One reason for this cost is that the disease requires long-term treatment. It is likely to continue to affect more people because of its long asymptomatic phase, which makes its early detection infeasible.

Design/methodology/approach

In this study, the authors present machine learning models with feature selection that can detect diabetes at an early stage. The models presented are also inexpensive and available to everyone, including those in remote areas.

Findings

The study results show that feature selection helps in obtaining a better model, as it prevents overfitting and removes redundant data. Compared with previous research, the study achieved better results when evaluated on metrics such as F-measure, Precision-Recall curve and Receiver Operating Characteristic Area Under Curve. This finding has the potential to impact clinical practice when health workers aim to diagnose diabetes at an early stage.
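Of the metrics listed, precision, recall and F-measure follow directly from confusion-matrix counts. The sketch below uses the standard definitions with the balanced F1 variant; the function name and the counts are hypothetical and are not results from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and balanced F-measure (F1) from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 30 true positives, 10 false positives, 20 false negatives.
precision, recall, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
print(precision, recall, f1)
```

For early-stage screening, recall is often the more important of the two, since a false negative corresponds to an undetected case.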

Originality/value

This study has not been published anywhere else.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 15 June 2021

IDMPF: intelligent diabetes mellitus prediction framework using machine learning

Leila Ismail and Huned Materwala

Downloads: 2137

Abstract

Purpose

Machine learning is an intelligent methodology used for prediction and has shown promising results in predictive classification. One of the critical areas in which machine learning can save lives is diabetes prediction. Diabetes is a chronic disease and one of the top 10 causes of death worldwide. The total number of people with diabetes is expected to reach 700 million in 2045, a 51.18% increase compared to 2019. These are alarming figures, and it is therefore urgent to provide accurate diabetes prediction.

Design/methodology/approach

Health professionals and stakeholders are striving for classification models to support prognosis of diabetes and formulate strategies for prevention. The authors conduct a literature review of machine learning models and propose an intelligent framework for diabetes prediction.

Findings

The authors provide a critical analysis of machine learning models and propose and evaluate an intelligent machine learning-based architecture for diabetes prediction. Using this framework, the authors implement and evaluate the decision tree (DT)-based random forest (RF) and support vector machine (SVM) learning models for diabetes prediction, as these are the most used approaches in the literature.

Originality/value

This paper provides a novel intelligent diabetes mellitus prediction framework (IDMPF) using machine learning. The framework is the result of a critical examination of prediction models in the literature and their application to diabetes. The authors identify the training methodologies, model evaluation strategies and challenges in diabetes prediction, and propose solutions within the framework. The research results can be used by health professionals, stakeholders, students and researchers working in the diabetes prediction area.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 4 March 2022

Spatial prediction of flood-susceptible zones in the Ourika watershed of Morocco using machine learning algorithms

Modeste Meliho, Abdellatif Khattabi, Zejli Driss and Collins Ashianga Orlando

Downloads: 1452

Abstract

Purpose

The purpose of the paper is to predict and map areas vulnerable to flooding in the Ourika watershed in the High Atlas of Morocco, with the aim of providing a useful tool to help mitigate and manage floods in the region, as well as in Morocco as a whole.

Design/methodology/approach

Four machine learning (ML) algorithms, namely k-nearest neighbors (KNN), artificial neural network, random forest (RF) and extreme gradient boosting (XGB), are adopted for modeling. Additionally, 16 predictors divided into categorical and numerical variables are used as inputs for modeling.

Findings

The results showed that RF and XGB were the best performing algorithms, with AUC scores of 99.1 and 99.2%, respectively. Conversely, KNN had the lowest predictive power, scoring 94.4%. Overall, the algorithms predicted that over 60% of the watershed was in the very low flood risk class, while the high flood risk class accounted for less than 15% of the area.
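The AUC scores quoted above can be read as the probability that a randomly chosen flooded location receives a higher predicted susceptibility than a randomly chosen non-flooded one. The sketch below computes AUC through that pairwise (Mann-Whitney) interpretation; the function name, labels and scores are hypothetical and are not data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 if tied, 0 otherwise.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = flooded) and predicted susceptibility scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(auc(labels, scores))
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; rank-based implementations are preferable for full raster datasets.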

Originality/value

Studies applying AI tools, including ML, to predictive flood modeling in the region are limited, if not non-existent, which makes this study a novel contribution.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 29 January 2024

Prediction of surface roughness using deep learning and data augmentation

Miaoxian Guo, Shouheng Wei, Chentong Han, Wanliang Xia, Chao Luo and Zhijian Lin

Downloads: 329

Abstract

Purpose

Surface roughness has a serious impact on the fatigue strength, wear resistance and life of mechanical products. Realizing the evolution of surface quality through theoretical modeling takes a lot of effort. To predict the surface roughness of milling processing, this paper aims to construct a neural network based on deep learning and data augmentation.

Design/methodology/approach

This study proposes a method consisting of three steps. Firstly, the machine tool multisource data acquisition platform is established, which combines sensor monitoring with machine tool communication to collect processing signals. Secondly, the feature parameters are extracted to reduce the interference and improve the model generalization ability. Thirdly, for different expectations, the parameters of the deep belief network (DBN) model are optimized by the Tent-SSA algorithm to achieve more accurate roughness classification and regression prediction.

Findings

The adaptive synthetic sampling (ADASYN) algorithm can improve the classification prediction accuracy of DBN from 80.67% to 94.23%. After the DBN parameters were optimized by Tent-SSA, the roughness prediction accuracy was significantly improved. For the classification model, the prediction accuracy is improved by 5.77% based on ADASYN optimization. For regression models, different objective functions can be set according to production requirements, such as root-mean-square error (RMSE) or MaxAE, and the error is reduced by more than 40% compared to the original model.

Originality/value

A roughness prediction model based on multiple monitoring signals is proposed, which reduces the dependence on the acquisition of environmental variables and enhances the model's applicability. Furthermore, with the ADASYN algorithm, the Tent-SSA intelligent optimization algorithm is introduced to optimize the hyperparameters of the DBN model and improve the optimization performance.

Details

Journal of Intelligent Manufacturing and Special Equipment, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2633-6596
