Search results

1 – 10 of over 3000
Open Access
Article
Publication date: 31 July 2023

Daniel Šandor and Marina Bagić Babac

Sarcasm is a linguistic expression that usually carries the opposite meaning of what is being said by words, thus making it difficult for machines to discover the actual meaning…

2983

Abstract

Purpose

Sarcasm is a linguistic expression that usually carries the opposite meaning of what is being said by words, thus making it difficult for machines to discover the actual meaning. It is mainly distinguished by the inflection with which it is spoken, with an undercurrent of irony, and is largely dependent on context, which makes it a difficult task for computational analysis. Moreover, sarcasm expresses negative sentiments using positive words, allowing it to easily confuse sentiment analysis models. This paper aims to demonstrate the task of sarcasm detection using the approach of machine and deep learning.

Design/methodology/approach

For the purpose of sarcasm detection, machine and deep learning models were used on a data set consisting of 1.3 million social media comments, including both sarcastic and non-sarcastic comments. The data set was pre-processed using natural language processing methods, and additional features were extracted and analysed. Several machine learning models, including logistic regression, ridge regression, linear support vector and support vector machines, along with two deep learning models based on bidirectional long short-term memory and one bidirectional encoder representations from transformers (BERT)-based model, were implemented, evaluated and compared.

Findings

The performance of machine and deep learning models was compared in the task of sarcasm detection, and possible ways of improvement were discussed. Deep learning models showed more promise, performance-wise, for this type of task. Specifically, a state-of-the-art model in natural language processing, namely, BERT-based model, outperformed other machine and deep learning models.

Originality/value

This study compared the performance of the various machine and deep learning models in the task of sarcasm detection using the data set of 1.3 million comments from social media.

Details

Information Discovery and Delivery, vol. 52 no. 2
Type: Research Article
ISSN: 2398-6247

Keywords

Open Access
Article
Publication date: 14 July 2022

Karlo Puh and Marina Bagić Babac

As the tourism industry becomes more vital for the success of many economies around the world, the importance of technology in tourism grows daily. Alongside increasing tourism…

6065

Abstract

Purpose

As the tourism industry becomes more vital for the success of many economies around the world, the importance of technology in tourism grows daily. Alongside increasing tourism importance and popularity, the amount of significant data grows, too. On daily basis, millions of people write their opinions, suggestions and views about accommodation, services, and much more on various websites. Well-processed and filtered data can provide a lot of useful information that can be used for making tourists' experiences much better and help us decide when selecting a hotel or a restaurant. Thus, the purpose of this study is to explore machine and deep learning models for predicting sentiment and rating from tourist reviews.

Design/methodology/approach

This paper used machine learning models such as Naïve Bayes, support vector machines (SVM), convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) for extracting sentiment and ratings from tourist reviews. These models were trained to classify reviews into positive, negative, or neutral sentiment, and into one to five grades or stars. Data used for training the models were gathered from TripAdvisor, the world's largest travel platform. The models based on multinomial Naïve Bayes (MNB) and SVM were trained using the term frequency-inverse document frequency (TF-IDF) for word representations while deep learning models were trained using global vectors (GloVe) for word representation. The results from testing these models are presented, compared and discussed.

Findings

The performance of machine and learning models achieved high accuracy in predicting positive, negative, or neutral sentiments and ratings from tourist reviews. The optimal model architecture for both classification tasks was a deep learning model based on BiLSTM. The study’s results confirmed that deep learning models are more efficient and accurate than machine learning algorithms.

Practical implications

The proposed models allow for forecasting the number of tourist arrivals and expenditure, gaining insights into the tourists' profiles, improving overall customer experience, and upgrading marketing strategies. Different service sectors can use the implemented models to get insights into customer satisfaction with the products and services as well as to predict the opinions given a particular context.

Originality/value

This study developed and compared different machine learning models for classifying customer reviews as positive, negative, or neutral, as well as predicting ratings with one to five stars based on a TripAdvisor hotel reviews dataset that contains 20,491 unique hotel reviews.

Details

Journal of Hospitality and Tourism Insights, vol. 6 no. 3
Type: Research Article
ISSN: 2514-9792

Keywords

Open Access
Article
Publication date: 25 October 2019

Ning Yan and Oliver Tat-Sheung Au

The purpose of this paper is to make a correlation analysis between students’ online learning behavior features and course grade, and to attempt to build some effective prediction…

7696

Abstract

Purpose

The purpose of this paper is to make a correlation analysis between students’ online learning behavior features and course grade, and to attempt to build some effective prediction model based on limited data.

Design/methodology/approach

The prediction label in this paper is the course grade of students, and the eigenvalues available are student age, student gender, connection time, hits count and days of access. The machine learning model used in this paper is the classical three-layer feedforward neural networks, and the scaled conjugate gradient algorithm is adopted. Pearson correlation analysis method is used to find the relationships between course grade and the student eigenvalues.

Findings

Days of access has the highest correlation with course grade, followed by hits count, and connection time is less relevant to students’ course grade. Student age and gender have the lowest correlation with course grade. Binary classification models have much higher prediction accuracy than multi-class classification models. Data normalization and data discretization can effectively improve the prediction accuracy of machine learning models, such as ANN model in this paper.

Originality/value

This paper may help teachers to find some clue to identify students with learning difficulties in advance and give timely help through the online learning behavior data. It shows that acceptable prediction models based on machine learning can be built using a small and limited data set. However, introducing external data into machine learning models to improve its prediction accuracy is still a valuable and hard issue.

Details

Asian Association of Open Universities Journal, vol. 14 no. 2
Type: Research Article
ISSN: 2414-6994

Keywords

Open Access
Article
Publication date: 18 July 2022

Youakim Badr

In this research, the authors demonstrate the advantage of reinforcement learning (RL) based intrusion detection systems (IDS) to solve very complex problems (e.g. selecting input…

1278

Abstract

Purpose

In this research, the authors demonstrate the advantage of reinforcement learning (RL) based intrusion detection systems (IDS) to solve very complex problems (e.g. selecting input features, considering scarce resources and constrains) that cannot be solved by classical machine learning. The authors include a comparative study to build intrusion detection based on statistical machine learning and representational learning, using knowledge discovery in databases (KDD) Cup99 and Installation Support Center of Expertise (ISCX) 2012.

Design/methodology/approach

The methodology applies a data analytics approach, consisting of data exploration and machine learning model training and evaluation. To build a network-based intrusion detection system, the authors apply dueling double deep Q-networks architecture enabled with costly features, k-nearest neighbors (K-NN), support-vector machines (SVM) and convolution neural networks (CNN).

Findings

Machine learning-based intrusion detection are trained on historical datasets which lead to model drift and lack of generalization whereas RL is trained with data collected through interactions. RL is bound to learn from its interactions with a stochastic environment in the absence of a training dataset whereas supervised learning simply learns from collected data and require less computational resources.

Research limitations/implications

All machine learning models have achieved high accuracy values and performance. One potential reason is that both datasets are simulated, and not realistic. It was not clear whether a validation was ever performed to show that data were collected from real network traffics.

Practical implications

The study provides guidelines to implement IDS with classical supervised learning, deep learning and RL.

Originality/value

The research applied the dueling double deep Q-networks architecture enabled with costly features to build network-based intrusion detection from network traffics. This research presents a comparative study of reinforcement-based instruction detection with counterparts built with statistical and representational machine learning.

Open Access
Article
Publication date: 3 August 2020

Djordje Cica, Branislav Sredanovic, Sasa Tesic and Davorin Kramar

Sustainable manufacturing is one of the most important and most challenging issues in present industrial scenario. With the intention of diminish negative effects associated with…

2117

Abstract

Sustainable manufacturing is one of the most important and most challenging issues in present industrial scenario. With the intention of diminish negative effects associated with cutting fluids, the machining industries are continuously developing technologies and systems for cooling/lubricating of the cutting zone while maintaining machining efficiency. In the present study, three regression based machine learning techniques, namely, polynomial regression (PR), support vector regression (SVR) and Gaussian process regression (GPR) were developed to predict machining force, cutting power and cutting pressure in the turning of AISI 1045. In the development of predictive models, machining parameters of cutting speed, depth of cut and feed rate were considered as control factors. Since cooling/lubricating techniques significantly affects the machining performance, prediction model development of quality characteristics was performed under minimum quantity lubrication (MQL) and high-pressure coolant (HPC) cutting conditions. The prediction accuracy of developed models was evaluated by statistical error analyzing methods. Results of regressions based machine learning techniques were also compared with probably one of the most frequently used machine learning method, namely artificial neural networks (ANN). Finally, a metaheuristic approach based on a neural network algorithm was utilized to perform an efficient multi-objective optimization of process parameters for both cutting environment.

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 20 September 2022

Joo Hun Yoo, Hyejun Jeong, Jaehyeok Lee and Tai-Myoung Chung

This study aims to summarize the critical issues in medical federated learning and applicable solutions. Also, detailed explanations of how federated learning techniques can be…

2918

Abstract

Purpose

This study aims to summarize the critical issues in medical federated learning and applicable solutions. Also, detailed explanations of how federated learning techniques can be applied to the medical field are presented. About 80 reference studies described in the field were reviewed, and the federated learning framework currently being developed by the research team is provided. This paper will help researchers to build an actual medical federated learning environment.

Design/methodology/approach

Since machine learning techniques emerged, more efficient analysis was possible with a large amount of data. However, data regulations have been tightened worldwide, and the usage of centralized machine learning methods has become almost infeasible. Federated learning techniques have been introduced as a solution. Even with its powerful structural advantages, there still exist unsolved challenges in federated learning in a real medical data environment. This paper aims to summarize those by category and presents possible solutions.

Findings

This paper provides four critical categorized issues to be aware of when applying the federated learning technique to the actual medical data environment, then provides general guidelines for building a federated learning environment as a solution.

Originality/value

Existing studies have dealt with issues such as heterogeneity problems in the federated learning environment itself, but those were lacking on how these issues incur problems in actual working tasks. Therefore, this paper helps researchers understand the federated learning issues through examples of actual medical machine learning environments.

Details

International Journal of Web Information Systems, vol. 18 no. 2/3
Type: Research Article
ISSN: 1744-0084

Keywords

Open Access
Article
Publication date: 27 March 2020

Agostino Valier

In the literature there are numerous tests that compare the accuracy of automated valuation models (AVMs). These models first train themselves with price data and property…

3097

Abstract

Purpose

In the literature there are numerous tests that compare the accuracy of automated valuation models (AVMs). These models first train themselves with price data and property characteristics, then they are tested by measuring their ability to predict prices. Most of them compare the effectiveness of traditional econometric models against the use of machine learning algorithms. Although the latter seem to offer better performance, there is not yet a complete survey of the literature to confirm the hypothesis.

Design/methodology/approach

All tests comparing regression analysis and AVMs machine learning on the same data set have been identified. The scores obtained in terms of accuracy were then compared with each other.

Findings

Machine learning models are more accurate than traditional regression analysis in their ability to predict value. Nevertheless, many authors point out as their limit their black box nature and their poor inferential abilities.

Practical implications

AVMs machine learning offers a huge advantage for all real estate operators who know and can use them. Their use in public policy or litigation can be critical.

Originality/value

According to the author, this is the first systematic review that collects all the articles produced on the subject done comparing the results obtained.

Details

Journal of Property Investment & Finance, vol. 38 no. 3
Type: Research Article
ISSN: 1463-578X

Keywords

Open Access
Article
Publication date: 30 June 2021

Mohammad Abdullah

Financial health of a corporation is a great concern for every investor level and decision-makers. For many years, financial solvency prediction is a significant issue throughout…

3960

Abstract

Purpose

Financial health of a corporation is a great concern for every investor level and decision-makers. For many years, financial solvency prediction is a significant issue throughout academia, precisely in finance. This requirement leads this study to check whether machine learning can be implemented in financial solvency prediction.

Design/methodology/approach

This study analyzed 244 Dhaka stock exchange public-listed companies over the 2015–2019 period, and two subsets of data are also developed as training and testing datasets. For machine learning model building, samples are classified as secure, healthy and insolvent by the Altman Z-score. R statistical software is used to make predictive models of five classifiers and all model performances are measured with different performance metrics such as logarithmic loss (logLoss), area under the curve (AUC), precision recall AUC (prAUC), accuracy, kappa, sensitivity and specificity.

Findings

This study found that the artificial neural network classifier has 88% accuracy and sensitivity rate; also, AUC for this model is 96%. However, the ensemble classifier outperforms all other models by considering logLoss and other metrics.

Research limitations/implications

The major result of this study can be implicated to the financial institution for credit scoring, credit rating and loan classification, etc. And other companies can implement machine learning models to their enterprise resource planning software to trace their financial solvency.

Practical implications

Finally, a predictive application is developed through training a model with 1,200 observations and making it available for all rational and novice investors (Abdullah, 2020).

Originality/value

This study found that, with the best of author expertise, the author did not find any studies regarding machine learning research of financial solvency that examines a comparable number of a dataset, with all these models in Bangladesh.

Details

Journal of Asian Business and Economic Studies, vol. 28 no. 4
Type: Research Article
ISSN: 2515-964X

Keywords

Open Access
Article
Publication date: 20 January 2022

Francisco Pérez Moreno, Víctor Fernando Gómez Comendador, Raquel Delgado-Aguilera Jurado, María Zamarreño Suárez, Dominik Janisch and Rosa María Arnaldo Valdes

The purpose of this paper is to set out a methodology for characterising the complexity of air traffic control (ATC) sectors based on individual operations. This machine learning

Abstract

Purpose

The purpose of this paper is to set out a methodology for characterising the complexity of air traffic control (ATC) sectors based on individual operations. This machine learning methodology also learns from the data on which the model is based.

Design/methodology/approach

The methodology comprises three steps. Firstly, a statistical analysis of individual operations is carried out using elementary or initial variables, and these are combined using machine learning. Secondly, based on the initial statistical analysis and using machine learning techniques, the impact of air traffic flows on an ATC sector are determined. The last step is to calculate the complexity of the ATC sector based on the impact of its air traffic flows.

Findings

The results obtained are logical from an operational point of view and are easy to interpret. The classification of ATC sectors based on complexity is quite accurate.

Research limitations/implications

The methodology is in its preliminary phase and has been tested with very little data. Further refinement is required.

Originality/value

The methodology can be of significant value to ATC in that when applied to real cases, ATC will be able to anticipate the complexity of the airspace and optimise its resources accordingly.

Details

Aircraft Engineering and Aerospace Technology, vol. 94 no. 9
Type: Research Article
ISSN: 1748-8842

Keywords

Open Access
Article
Publication date: 28 July 2020

R. Shashikant and P. Chetankumar

Cardiac arrest is a severe heart anomaly that results in billions of annual casualties. Smoking is a specific hazard factor for cardiovascular pathology, including coronary heart…

2367

Abstract

Cardiac arrest is a severe heart anomaly that results in billions of annual casualties. Smoking is a specific hazard factor for cardiovascular pathology, including coronary heart disease, but data on smoking and heart death not earlier reviewed. The Heart Rate Variability (HRV) parameters used to predict cardiac arrest in smokers using machine learning technique in this paper. Machine learning is a method of computing experience based on automatic learning and enhances performances to increase prognosis. This study intends to compare the performance of logistical regression, decision tree, and random forest model to predict cardiac arrest in smokers. In this paper, a machine learning technique implemented on the dataset received from the data science research group MITU Skillogies Pune, India. To know the patient has a chance of cardiac arrest or not, developed three predictive models as 19 input feature of HRV indices and two output classes. These model evaluated based on their accuracy, precision, sensitivity, specificity, F1 score, and Area under the curve (AUC). The model of logistic regression has achieved an accuracy of 88.50%, precision of 83.11%, the sensitivity of 91.79%, the specificity of 86.03%, F1 score of 0.87, and AUC of 0.88. The decision tree model has arrived with an accuracy of 92.59%, precision of 97.29%, the sensitivity of 90.11%, the specificity of 97.38%, F1 score of 0.93, and AUC of 0.94. The model of the random forest has achieved an accuracy of 93.61%, precision of 94.59%, the sensitivity of 92.11%, the specificity of 95.03%, F1 score of 0.93 and AUC of 0.95. The random forest model achieved the best accuracy classification, followed by the decision tree, and logistic regression shows the lowest classification accuracy.

Details

Applied Computing and Informatics, vol. 19 no. 3/4
Type: Research Article
ISSN: 2634-1964

Keywords

1 – 10 of over 3000