Search results

1 – 10 of over 1000
Article
Publication date: 2 February 2022

Wenzhong Gao, Xingzong Huang, Mengya Lin, Jing Jia and Zhen Tian

The purpose of this paper is to design a short-term load prediction framework that can accurately predict the cooling load of office buildings.

Abstract

Purpose

The purpose of this paper is to design a short-term load prediction framework that can accurately predict the cooling load of office buildings.

Design/methodology/approach

A feature selection scheme and a stacking ensemble model were proposed to fulfill the cooling load prediction task. Firstly, abnormal data were identified by a data density estimation algorithm. Secondly, the crucial input features were clarified from three aspects (i.e. historical load information, time information and meteorological information). Thirdly, a stacking ensemble model combining a long short-term memory (LSTM) network and a light gradient boosting machine (LightGBM) was utilized to predict the cooling load. Finally, the performance of the proposed framework in predicting the cooling load of office buildings was verified with evaluation indicators.
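
The two-level stacking idea described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: scikit-learn regressors stand in for the LSTM and LightGBM base learners, and the load data are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for hourly cooling-load features:
# [previous-hour load, hour of day, outdoor temperature].
X = rng.normal(size=(500, 3))
y = 0.6 * X[:, 0] + 0.3 * np.sin(X[:, 1]) + 0.1 * X[:, 2] \
    + rng.normal(scale=0.05, size=500)

# Base learners (stand-ins for the paper's LSTM and LightGBM).
base_models = [
    MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    GradientBoostingRegressor(random_state=0),
]

# Level 0: out-of-fold predictions prevent the meta-learner from
# seeing targets its inputs were trained on.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5) for m in base_models
])

# Level 1: a simple meta-learner combines the base predictions.
meta_learner = Ridge().fit(meta_features, y)
stacked_pred = meta_learner.predict(meta_features)
print(stacked_pred.shape)
```

The out-of-fold step is what distinguishes stacking from simply averaging base-model outputs.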

Findings

The identified input features can improve the prediction performance. The prediction accuracy of the proposed model is superior to that of existing models. The stacking ensemble model is robust to weather forecasting errors.

Originality/value

The stacking ensemble model was used to fulfill the cooling load prediction task, which can overcome the shortcomings of standalone deep learning models. The input features of the model, which are given little attention in most studies, are treated as an important step in this paper.

Details

Engineering Computations, vol. 39 no. 5
Type: Research Article
ISSN: 0264-4401

Keywords

Article
Publication date: 15 March 2023

Indranil Ghosh, Rabin K. Jana and Mohammad Zoynul Abedin

The prediction of Airbnb listing prices predominantly uses a set of amenity-driven features. Choosing an appropriate set of features from thousands of available amenity-driven…

Abstract

Purpose

The prediction of Airbnb listing prices predominantly uses a set of amenity-driven features. Choosing an appropriate set of features from thousands of available amenity-driven features makes the prediction task difficult. This paper aims to propose a scalable, robust framework to predict listing prices of Airbnb units without using amenity-driven features.

Design/methodology/approach

The authors propose an artificial intelligence (AI)-based framework to predict Airbnb listing prices. The authors consider 75,000 Airbnb listings from five US cities, comprising more than 1.9 million observations. The proposed framework integrates (i) feature screening, (ii) stacking that combines gradient boosting, bagging and random forest, (iii) particle swarm optimization and (iv) explainable AI to accomplish the research objective.
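
The particle swarm optimization component can be sketched in minimal form. The global-best PSO below minimizes a toy objective standing in for the ensemble's validation error; the coefficients (w, c1, c2) are common textbook defaults, not values from the paper.

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=100, seed=0):
    """Minimal global-best particle swarm optimizer."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5, 5, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()

    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Each particle is pulled toward its own best and the swarm's best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, float(pbest_val.min())

# Toy objective (sphere function) standing in for validation error.
best, best_val = pso(lambda p: float(np.sum(p ** 2)), dim=3)
print(best_val)
```

In the paper's setting the objective would instead evaluate the stacked ensemble under a candidate configuration.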

Findings

The key findings concern three aspects: prediction accuracy, homogeneity and identification of the most and least predictable cities. The proposed framework yields highly accurate predictions. The predictability of listing prices varies significantly across cities; listing prices are most predictable for Boston and least predictable for Chicago.

Practical implications

Hosts can leverage the framework to determine rental prices and can augment their service offerings by emphasizing the key features it identifies.

Originality/value

Although the individual components are known, the way they are integrated into the proposed framework to derive high-quality forecasts of Airbnb listing prices is unique, and the framework is scalable. Such a framework is rare in the Airbnb listing price modeling literature.

Details

International Journal of Contemporary Hospitality Management, vol. 35 no. 10
Type: Research Article
ISSN: 0959-6119

Keywords

Article
Publication date: 21 January 2019

Issa Alsmadi and Keng Hoon Gan

Rapid developments in social networks and their use in everyday life have caused an explosion in the number of short electronic documents. Thus, the need to classify this type…


Abstract

Purpose

Rapid developments in social networks and their use in everyday life have caused an explosion in the number of short electronic documents. The need to classify such documents into relevant classes according to their textual content therefore has significant implications for many applications. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer review analysis and many other applications related to social networks. Reviews of short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and the challenges and difficulties of its classification. The paper attempts to introduce all stages of a principal classification pipeline, the techniques used in each stage and possible development trends in each stage.

Design/methodology/approach

The paper is a review of the main aspects of short-text classification and is structured according to the stages of the classification task.

Findings

This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges of short texts and avoid poor classification accuracy. Low performance can be addressed with optimization techniques such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing approaches such as fuzzy logic also make short-text problems a promising area of research.

Originality/value

Using a powerful short-text classification method significantly improves the efficiency of many applications. Current solutions still perform poorly, implying the need for improvement. This paper discusses related issues and approaches to these problems.

Details

International Journal of Web Information Systems, vol. 15 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 10 January 2020

Ammara Zamir, Hikmat Ullah Khan, Tassawar Iqbal, Nazish Yousaf, Farah Aslam, Almas Anjum and Maryam Hamdani

This paper aims to present a framework to detect phishing websites using a stacking model. Phishing is a type of fraud used to access users’ credentials. The attackers access users’…


Abstract

Purpose

This paper aims to present a framework to detect phishing websites using a stacking model. Phishing is a type of fraud used to access users’ credentials. The attackers access users’ personal and sensitive information for monetary purposes. Phishing affects diverse fields, such as e-commerce, online business, banking and digital marketing, and is ordinarily carried out by sending spam emails and developing counterfeit websites resembling the originals. As people surf the targeted website, the phishers hijack their personal information.

Design/methodology/approach

Features of the phishing data set are analysed using feature selection techniques, including information gain, gain ratio, Relief-F and recursive feature elimination (RFE). Two features are proposed by combining the strongest and weakest attributes. Principal component analysis with diverse machine learning algorithms (random forest [RF], neural network [NN], bagging, support vector machine, naïve Bayes and k-nearest neighbour [kNN]) is applied to the proposed and remaining features. Afterwards, two stacking models, Stacking1 (RF + NN + bagging) and Stacking2 (kNN + RF + bagging), are built by combining the highest-scoring classifiers to improve the classification accuracy.
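
A rough sketch of the RFE-plus-stacking pipeline, assuming scikit-learn's RFE and StackingClassifier as stand-ins for the authors' setup and a synthetic data set in place of the phishing data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a phishing data set (binary labels).
X, y = make_classification(n_samples=600, n_features=20,
                           n_informative=8, random_state=0)

# RFE drops the least important features, as described in the paper.
X_sel = RFE(LogisticRegression(max_iter=1000),
            n_features_to_select=10).fit_transform(X, y)

# Stacking1-style ensemble: RF + NN + bagging, combined by a meta-classifier.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("nn", MLPClassifier(hidden_layer_sizes=(16,),
                             max_iter=2000, random_state=0)),
        ("bag", BaggingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)
acc = stack.fit(X_tr, y_tr).score(X_te, y_te)
print(round(acc, 3))
```

The meta-classifier learns how much to trust each base learner, which is what lets the stack exceed its strongest member.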

Findings

The proposed features played an important role in improving the accuracy of all the classifiers. The results show that RFE plays an important role in removing the least important features from the data set. Furthermore, Stacking1 (RF + NN + bagging) outperformed all other classifiers, detecting phishing websites with 97.4% accuracy.

Originality/value

This research is novel in that no previous research has focused on using feed-forward NNs and ensemble learners for detecting phishing websites.

Article
Publication date: 14 May 2021

Zhenyuan Wang, Chih-Fong Tsai and Wei-Chao Lin

Class imbalance learning, which exists in many domain problem datasets, is an important research topic in data mining and machine learning. One-class classification techniques…

Abstract

Purpose

Class imbalance learning, which exists in many domain problem datasets, is an important research topic in data mining and machine learning. One-class classification techniques, which aim to identify anomalies as the minority class from the normal data as the majority class, are one representative solution for class imbalanced datasets. Since one-class classifiers are trained using only normal data to create a decision boundary for later anomaly detection, the quality of the training set, i.e. the majority class, is one key factor that affects the performance of one-class classifiers.

Design/methodology/approach

In this paper, we focus on two data cleaning or preprocessing methods to address class imbalanced datasets. The first method examines whether performing instance selection to remove some noisy data from the majority class can improve the performance of one-class classifiers. The second method combines instance selection and missing value imputation, where the latter is used to handle incomplete datasets that contain missing values.
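
The first method can be sketched as follows. This is an illustrative stand-in, not the paper's setup: the local outlier factor replaces IB3/DROP3/GA as the instance selection step, and the data are synthetic.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Majority (normal) class with a few noisy points mixed in.
normal = rng.normal(0, 1, size=(300, 2))
noise = rng.uniform(-6, 6, size=(15, 2))
train = np.vstack([normal, noise])

# Instance selection stand-in: LOF flags suspect training points (-1).
keep = LocalOutlierFactor(n_neighbors=20).fit_predict(train) == 1
cleaned = train[keep]

# One-class classifier trained only on the cleaned normal data.
ocsvm = OneClassSVM(gamma="scale", nu=0.05).fit(cleaned)

# Points far from the normal cluster should be rejected (-1).
anomalies = rng.normal(8, 0.5, size=(20, 2))
pred = ocsvm.predict(anomalies)
print(float((pred == -1).mean()))
```

The point of the cleaning step is that noisy majority-class instances would otherwise stretch the one-class decision boundary outward.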

Findings

The experiments are based on 44 class imbalanced datasets, three instance selection algorithms (IB3, DROP3 and the GA), the CART decision tree for missing value imputation and three one-class classifiers (OCSVM, IFOREST and LOF). The results show that if the instance selection algorithm is carefully chosen, performing this step can improve the quality of the training data, which makes one-class classifiers outperform the baselines without instance selection. Moreover, when class imbalanced datasets contain missing values, combining missing value imputation and instance selection, regardless of which step is performed first, can maintain data quality similar to that of datasets without missing values.

Originality/value

The novelty of this paper is to investigate the effect of performing instance selection on the performance of one-class classifiers, which has never been done before. Moreover, this study is the first attempt to consider the scenario of missing values in the training set for training one-class classifiers. In this case, performing missing value imputation and instance selection in different orders is compared.

Details

Data Technologies and Applications, vol. 55 no. 5
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 25 January 2022

Tobias Mueller, Alexander Segin, Christoph Weigand and Robert H. Schmitt

In the determination of the measurement uncertainty, the GUM procedure requires the building of a measurement model that establishes a functional relationship between the…

Abstract

Purpose

In the determination of the measurement uncertainty, the GUM procedure requires building a measurement model that establishes a functional relationship between the measurand and all influencing quantities. Since the effort of modelling and of quantifying the measurement uncertainties depends on the number of influencing quantities considered, the aim of this study is to determine the relevant influencing quantities and to remove irrelevant ones from the dataset.

Design/methodology/approach

In this work, it was investigated whether the modelling effort for the determination of measurement uncertainty can be reduced by the use of feature selection (FS) methods. For this purpose, nine different FS methods were tested on 16 artificial test datasets whose properties (number of data points, number of features, complexity, features with low influence and redundant features) were varied via a design of experiments.
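
A filter-style FS method applied to an artificial dataset with the property types varied in the study (a redundant feature and an irrelevant one) might look like the sketch below. Mutual information is used purely as one plausible stand-in; it is not claimed to be among the paper's nine methods.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 500

# Artificial "measurement process": two relevant quantities,
# one redundant copy and one irrelevant quantity.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + rng.normal(scale=0.01, size=n)   # redundant with x1
x4 = rng.normal(size=n)                    # irrelevant
y = 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([x1, x2, x3, x4])
scores = mutual_info_regression(X, y, random_state=0)
relevant = scores > 0.1  # simple illustrative threshold
print(relevant)
```

Note that a pure relevance filter also scores the redundant copy x3 highly, which is exactly why redundancy handling is one of the dataset properties worth varying.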

Findings

Based on a success metric and on each method's stability, universality and complexity, two FS methods were identified that reliably distinguish relevant from irrelevant influencing quantities in a measurement model.

Originality/value

For the first time, FS methods were applied to datasets with properties of classical measurement processes. The simulation-based results serve as a basis for further research in the field of FS for measurement models. The identified algorithms will be applied to real measurement processes in the future.

Details

International Journal of Quality & Reliability Management, vol. 40 no. 3
Type: Research Article
ISSN: 0265-671X

Keywords

Article
Publication date: 11 October 2019

Ahsan Mahmood and Hikmat Ullah Khan

The purpose of this paper is to apply state-of-the-art machine learning techniques for assessing the quality of the restaurants using restaurant inspection data. The machine…

Abstract

Purpose

The purpose of this paper is to apply state-of-the-art machine learning techniques for assessing the quality of restaurants using restaurant inspection data. Machine learning techniques are applied to solve real-world problems in all spheres of life. Health and food departments pay regular visits to restaurants for inspection and mark the condition of each restaurant on the basis of the inspection. These inspections consider many factors that determine the condition of the restaurants and make it possible for the authorities to classify them.

Design/methodology/approach

In this paper, standard machine learning techniques (support vector machines, naïve Bayes and random forest classifiers) are applied to classify the critical level of restaurants on the basis of features identified during inspection. The importance of different inspection factors is determined through feature selection, using the minimum-redundancy-maximum-relevance and learning vector quantization feature importance methods.
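
The minimum-redundancy-maximum-relevance idea can be sketched with a greedy mutual-information criterion on synthetic data. This is a textbook-style approximation, not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k):
    """Greedy minimum-redundancy-maximum-relevance feature selection."""
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]  # start with most relevant feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # Redundancy: mean MI between candidate and selected features.
            red = np.mean([
                mutual_info_regression(X[:, [s]], X[:, j], random_state=0)[0]
                for s in selected
            ])
            score = relevance[j] - red  # maximize relevance, penalize redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Synthetic stand-in for inspection features with class labels.
X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, random_state=0)
selected = mrmr(X, y, k=3)
print(selected)
```

The redundancy penalty is what keeps the method from selecting several near-copies of the same strong feature.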

Findings

The experiments are conducted on the real-world New York City restaurant inspection data set, which contains diverse inspection features. The results show that the nonlinear support vector machine achieves better accuracy than the other techniques. Moreover, this study investigates the importance of different factors of restaurant inspection and finds that inspection score and grade are significant features. The performance of the classifiers is measured using the standard evaluation measures of accuracy, sensitivity and specificity.

Originality/value

This research uses a real-world restaurant inspection data set that has, to the best of the authors’ knowledge, never been used previously by researchers. The findings are helpful in identifying the best restaurants and the factors that are considered important in restaurant inspection. The results are also important in identifying possible biases in restaurant inspections by the authorities.

Details

The Electronic Library, vol. 37 no. 6
Type: Research Article
ISSN: 0264-0473

Keywords

Article
Publication date: 14 February 2022

Arslan Akram, Saba Ramzan, Akhtar Rasool, Arfan Jaffar, Usama Furqan and Wahab Javed

This paper aims to propose a novel splicing detection method using a discriminative robust local binary pattern (DRLBP) with a support vector machine (SVM). Reliable detection of…

Abstract

Purpose

This paper aims to propose a novel splicing detection method using a discriminative robust local binary pattern (DRLBP) with a support vector machine (SVM). Reliable detection of image splicing is of growing interest due to the extensive utilization of digital images as a communication medium and the availability of powerful image processing tools. Image splicing is a commonly used forgery technique in which a region of an image is copied and pasted to a different image to hide the original contents of the image.

Design/methodology/approach

The structural changes caused by splicing are robustly described by DRLBP. Because the changes caused by image forgery are localized, the input image is first divided into overlapping blocks. The DRLBP descriptor is calculated for each block, and the feature vector is created by concatenation. Finally, the features are passed to the SVM classifier to predict whether the image is genuine or forged.
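
The block-based descriptor pipeline can be sketched with a plain local binary pattern. The full DRLBP (which additionally weights codes by gradient magnitude) and the SVM step are omitted, and non-overlapping blocks are used for brevity where the paper uses overlapping ones, so this is only an illustrative simplification.

```python
import numpy as np

def lbp_codes(img):
    """Basic 3x3 local binary pattern codes (a simplification of DRLBP)."""
    c = img[1:-1, 1:-1]  # center pixels
    neighbors = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],
                 img[1:-1, 2:], img[2:, 2:], img[2:, 1:-1],
                 img[2:, :-2], img[1:-1, :-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbors):
        # Set the bit if the neighbor is at least as bright as the center.
        codes |= (n >= c).astype(np.uint8) << np.uint8(bit)
    return codes

def block_histograms(img, block=16):
    """Per-block normalized LBP histograms, concatenated into one vector."""
    codes = lbp_codes(img)
    feats = []
    for i in range(0, codes.shape[0] - block + 1, block):
        for j in range(0, codes.shape[1] - block + 1, block):
            h, _ = np.histogram(codes[i:i + block, j:j + block],
                                bins=256, range=(0, 256))
            feats.append(h / h.sum())
    return np.concatenate(feats)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(66, 66)).astype(np.int16)  # toy "image"
fv = block_histograms(img)
print(fv.shape)
```

The concatenated per-block histograms form the feature vector that would then be passed to the SVM classifier.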

Findings

The performance and robustness of the method are evaluated on public-domain benchmark data sets, achieving 98.95% prediction accuracy. The results are compared with state-of-the-art image splicing detection approaches and show that the proposed method improves on them.

Originality/value

The proposed method uses DRLBP, an efficient texture descriptor that combines both corner and interior detail in a single representation. It produces discriminative and compact features, so no feature selection process is needed to drop redundant and insignificant features.

Details

World Journal of Engineering, vol. 19 no. 4
Type: Research Article
ISSN: 1708-5284

Keywords

Article
Publication date: 18 January 2022

Gomathi V., Kalaiselvi S. and Thamarai Selvi D

This work aims to develop a novel fuzzy associator rule-based fuzzified deep convolutional neural network (FDCNN) architecture for the classification of smartphone sensor-based…

Abstract

Purpose

This work aims to develop a novel fuzzy associator rule-based fuzzified deep convolutional neural network (FDCNN) architecture for the classification of smartphone sensor-based human activity recognition. This work mainly focuses on fusing the λmax method for weight initialization, as a data normalization technique, to achieve high accuracy of classification.

Design/methodology/approach

The major contribution of this work is the FDCNN architecture, which is initially fused with a fuzzy logic based data aggregator. The work focuses on normalizing the statistical parameters of the University of California, Irvine (UCI) data set before feeding it to the convolutional neural network layers. The FDCNN model with the λmax method ensures faster convergence and improved accuracy in sensor-based human activity recognition. An impact analysis with hyper-parameter tuning on the proposed model validates the appropriateness of the results.

Findings

The proposed FDCNN model with the λmax method outperformed state-of-the-art models, attaining an overall accuracy of 97.89% and an overall F1 score of 0.9795.

Practical implications

The proposed fuzzy associate rule layer (FAL) is responsible for feature association based on fuzzy rules and regulates the uncertainty in the sensor data caused by signal interference and noise. In addition, the normalized data are subjectively grouped based on the FAL kernel structure weights assigned with the λmax method.

Social implications

This work contributes a novel FDCNN architecture that can support those who are keen to advance human activity recognition (HAR) research.

Originality/value

A novel FDCNN architecture is implemented with appropriate FAL kernel structures.

Article
Publication date: 22 June 2022

Gang Yao, Xiaojian Hu, Liangcheng Xu and Zhening Wu

Social media data from financial websites contain information related to enterprise credit risk. Mining valuable new features in social media data helps to improve prediction…

Abstract

Purpose

Social media data from financial websites contain information related to enterprise credit risk. Mining valuable new features in social media data helps to improve prediction performance. This paper proposes a credit risk prediction framework that integrates social media information to improve listed enterprise credit risk prediction in the supply chain.

Design/methodology/approach

The prediction framework includes four stages. First, social media information is obtained through web crawler technology. Second, text sentiment in the social media information is mined through natural language processing. Third, text sentiment features are constructed. Finally, the new features are integrated with traditional features as input to credit risk prediction models. This paper takes Chinese pharmaceutical enterprises as an example to test the prediction framework and derive relevant management insights.
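
One way to construct a time-weighted text sentiment feature of the kind evaluated here is exponential-decay aggregation. The half-life and the per-post sentiment scale below are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def time_weighted_sentiment(scores, days_ago, half_life=30.0):
    """Aggregate post-level sentiment with exponential time decay,
    so recent posts count more toward the enterprise-level feature."""
    scores = np.asarray(scores, dtype=float)
    days_ago = np.asarray(days_ago, dtype=float)
    weights = 0.5 ** (days_ago / half_life)  # halves every `half_life` days
    return float(np.sum(weights * scores) / np.sum(weights))

# Example: recent negative posts outweigh older positive ones.
feature = time_weighted_sentiment(
    scores=[0.8, 0.6, -0.9, -0.7],   # per-post sentiment in [-1, 1]
    days_ago=[90, 60, 5, 2],
)
print(feature)
```

The resulting scalar can then be appended to the traditional financial features as a model input.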

Findings

The prediction framework can improve enterprise credit risk prediction performance. The prediction performance of text sentiment features in social media data is better than that of most traditional features. The time-weighted text sentiment feature has the best prediction performance in mining social media information.

Practical implications

The prediction framework is helpful for the credit decision-making of credit departments and the policy regulation of regulatory departments and is conducive to the sustainable development of enterprises.

Originality/value

The prediction framework can effectively mine social media information and obtain an excellent prediction effect of listed enterprise credit risk in the supply chain.
