Search results

1 – 10 of over 7000
Article
Publication date: 28 February 2023

Meltem Aksoy, Seda Yanık and Mehmet Fatih Amasyali

Abstract

Purpose

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals are primarily based on manual matching of similar topics, discipline areas and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.

Design/methodology/approach

This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering and classification techniques. The proposed model uses FastText, BERT and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both the clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART) and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.
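The vectorize-then-classify idea can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses only TF-IDF (one common smoothed variant, chosen here as an assumption) with a 1-nearest-neighbor rule on cosine similarity, and the toy English documents, labels and function names are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (log(n / df) + 1); other variants exist, this one is an assumption.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_label(query_tokens, vectors, idf, labels):
    """Predict the label of the most similar training document (1-NN)."""
    tf = Counter(query_tokens)
    # Terms unseen during training carry no IDF weight and are dropped.
    query = {t: (tf[t] / len(query_tokens)) * idf[t] for t in tf if t in idf}
    best = max(range(len(vectors)), key=lambda i: cosine(query, vectors[i]))
    return labels[best]


# Hypothetical toy corpus standing in for proposal titles and abstracts.
docs = [
    "solar energy storage pilot".split(),
    "wind energy turbine maintenance".split(),
    "museum heritage tourism route".split(),
    "festival tourism promotion campaign".split(),
]
labels = ["energy", "energy", "tourism", "tourism"]

vectors, idf = tfidf_vectors(docs)
predicted = nearest_label("rooftop solar energy".split(), vectors, idf, labels)
```

Swapping the sparse TF-IDF dicts for dense FastText or BERT vectors would leave the similarity and nearest-neighbor steps unchanged.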

Findings

The results show that the generated word embeddings can effectively represent proposal texts as vectors, and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.

Research limitations/implications

The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.

Originality/value

This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing studies on the automatic grouping of proposals rely on traditional frequency-based methods for feature extraction to represent project proposals. Unlike previous research, this study employs two high-performing neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of our knowledge, there has been no research conducted on the grouping of project proposals in Turkish.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 3
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 19 December 2023

Jinchao Huang

Abstract

Purpose

Single-shot multi-category clothing recognition and retrieval play a crucial role in online searching and offline settlement scenarios. Existing clothing recognition methods based on RGBD clothing images often suffer from high-dimensional feature representations, leading to compromised performance and efficiency.

Design/methodology/approach

To address this issue, this paper proposes a novel method called Manifold Embedded Discriminative Feature Selection (MEDFS) to select global and local features, thereby reducing the dimensionality of the feature representation and improving performance. Specifically, by combining three global features and three local features, a low-dimensional embedding is constructed to capture the correlations between features and categories. The MEDFS method designs an optimization framework utilizing manifold mapping and sparse regularization to achieve feature selection. The optimization objective is solved using an alternating iterative strategy, ensuring convergence.

Findings

Empirical studies conducted on a publicly available RGBD clothing image dataset demonstrate that the proposed MEDFS method achieves highly competitive clothing classification performance while maintaining efficiency in clothing recognition and retrieval.

Originality/value

This paper introduces a novel approach for multi-category clothing recognition and retrieval, incorporating the selection of global and local features. The proposed method holds potential for practical applications in real-world clothing scenarios.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 6 October 2023

Vahide Bulut

Abstract

Purpose

Feature extraction from 3D datasets is a current problem. Machine learning is an important tool for the classification of complex 3D datasets. Machine learning classification techniques are widely used in various fields, such as text classification, pattern recognition and medical disease analysis. The aim of this study is to apply the most popular classification and regression methods and determine which of them performs best on datasets based on geodesics.

Design/methodology/approach

The feature vector is determined by the unit normal vector and the unit principal vector at each point of the 3D surface along with the point coordinates themselves. Moreover, different examples are compared according to the classification methods in terms of accuracy and the regression algorithms in terms of R-squared value.

Findings

Several surface examples are analyzed for the feature vector using classification (31 methods) and regression (23 methods) machine learning algorithms. In addition, two ensemble methods, XGBoost and LightGBM, are used for classification and regression. The scores for each surface example are also compared.

Originality/value

To the best of the author’s knowledge, this is the first study to analyze datasets based on geodesics using machine learning algorithms for classification and regression.

Details

Engineering Computations, vol. 40 no. 9/10
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 28 March 2023

Jun Liu, Sike Hu, Fuad Mehraliyev and Haolong Liu

Abstract

Purpose

This study aims to investigate the current state of research using deep learning methods for text classification in the tourism and hospitality field and to propose specific guidelines for future research.

Design/methodology/approach

This study undertakes a qualitative and critical review of studies that use deep learning methods for text classification in research fields of tourism and hospitality and computer science. The data was collected from the Web of Science database and included studies published until February 2022.

Findings

Findings show that current research has mainly focused on text feature classification, text rating classification and text sentiment classification. Most of the deep learning methods used are relatively old, proposed in the 20th century, including feed-forward neural networks and artificial neural networks, among others. Deep learning algorithms proposed in recent years in the field of computer science with better classification performance have not been introduced to tourism and hospitality for large-scale dissemination and use. In addition, most of the data the studies used were from publicly available rating data sets; only two studies manually annotated data collected from online tourism websites.

Practical implications

The applications of deep learning algorithms and data in the tourism and hospitality field are discussed, laying the foundation for future text mining research. The findings also hold implications for managers regarding the use of deep learning in tourism and hospitality. Researchers and practitioners can use methodological frameworks and recommendations proposed in this study to perform more effective classifications such as for quality assessment or service feature extraction purposes.

Originality/value

The paper provides an integrative review of research in text classification using deep learning methods in the tourism and hospitality field, points out newer deep learning methods that are suitable for classification and identifies how to develop different annotated data sets applicable to the field. Furthermore, foundations and directions for future text classification research are set.

Details

International Journal of Contemporary Hospitality Management, vol. 35 no. 12
Type: Research Article
ISSN: 0959-6119

Article
Publication date: 3 December 2022

Vahide Bulut

Abstract

Purpose

Surface curvature is needed to analyze the range data of real objects and is widely applied in object recognition and segmentation, robotics, and computer vision. However, it is not easy to estimate the curvature of scanned data. In recent years, machine learning classification methods have gained importance in various fields such as finance, health and engineering. The purpose of this study is to classify surface points based on principal curvatures to find the best method for determining surface point types.
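For orientation, the textbook labelling of a surface point from its two principal curvatures can be sketched as follows. The abstract does not list the point types or thresholds the author used, so the four labels, the tolerance `tol` and the function name are assumptions taken from standard differential geometry.

```python
def point_type(k1, k2, tol=1e-9):
    """Label a surface point from its principal curvatures k1 and k2.

    Standard convention (Gaussian curvature K = k1 * k2):
      K > 0            -> elliptic
      K < 0            -> hyperbolic
      exactly one zero -> parabolic
      both zero        -> planar
    """
    zero1 = abs(k1) <= tol
    zero2 = abs(k2) <= tol
    if zero1 and zero2:
        return "planar"
    if zero1 or zero2:
        return "parabolic"
    return "elliptic" if k1 * k2 > 0 else "hyperbolic"


# Hypothetical sample points: a sphere cap, a saddle, a cylinder and a plane.
samples = {"sphere": (1.0, 1.0), "saddle": (1.0, -1.0), "cylinder": (0.5, 0.0), "plane": (0.0, 0.0)}
labelled = {name: point_type(k1, k2) for name, (k1, k2) in samples.items()}
```

Labels produced this way could serve as ground truth against which learned classifiers are scored; on noisy scanned data the tolerance would need to be far larger than the default shown here.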

Design/methodology/approach

A feature selection method is presented to find the feature vector that achieves the highest accuracy. To this end, ten different feature selections are used, and six sample datasets of different sizes are classified using these feature vectors.

Findings

The author examined the surface examples based on the feature vector using the machine learning classification methods. Also, the author compared the results for each experiment.

Originality/value

To the best of the author's knowledge, this is the first study to examine surface points according to principal curvatures using machine learning classification methods.

Details

Data Technologies and Applications, vol. 57 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 5 May 2023

Nguyen Thi Dinh, Nguyen Thi Uyen Nhi, Thanh Manh Le and Thanh The Van

Abstract

Purpose

The problem of image retrieval and image description exists in various fields. In this paper, a model of content-based image retrieval and image content extraction based on the KD-Tree structure was proposed.

Design/methodology/approach

A Random Forest structure was built to classify the objects in each image on the basis of the balanced multibranch KD-Tree structure. For that purpose, a KD-Tree structure was generated by the Random Forest to retrieve a set of similar images for an input image. A KD-Tree structure is applied to determine a relationship word at the leaves to extract the relationship between objects in an input image. The content of an input image is described based on class names and relationships between objects.
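As background, the classic binary KD-Tree on which such structures build can be sketched in a few lines. This is the textbook variant with a nearest-neighbor query, not the balanced multibranch KD-Tree or the KD-Tree Random Forest of the paper; the 2D points stand in for image feature vectors and are hypothetical.

```python
def build(points, depth=0):
    """Build a binary KD-Tree by splitting on the median along cycling axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build(ordered[:mid], depth + 1),
        "right": build(ordered[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, best=None):
    """Return the stored point closest to query, pruning subtrees that cannot win."""
    if node is None:
        return best
    if best is None or sq_dist(query, node["point"]) < sq_dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if diff ** 2 < sq_dist(query, best):
        best = nearest(far, query, best)
    return best


# Hypothetical 2D feature vectors; real image descriptors would have many more dimensions.
features = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(features)
match = nearest(tree, (9, 2))
```

In a retrieval setting, each stored point would carry an image identifier, so that the nearest points returned for a query descriptor map back to similar images.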

Findings

A model of image retrieval and image content extraction was proposed based on the proposed theoretical basis; simultaneously, the experiment was built on multi-object image datasets including Microsoft COCO and Flickr with an average image retrieval precision of 0.9028 and 0.9163, respectively. The experimental results were compared with those of other works on the same image dataset to demonstrate the effectiveness of the proposed method.

Originality/value

A balanced multibranch KD-Tree structure was built to apply to relationship classification on the basis of the original KD-Tree structure. Then, KD-Tree Random Forest was built to improve the classifier performance and retrieve a set of similar images for an input image. Concurrently, the image content was described in the process of combining class names and relationships between objects.

Details

Data Technologies and Applications, vol. 57 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 14 December 2023

Huaxiang Song, Chai Wei and Zhou Yong

Abstract

Purpose

The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of clustered ground objects and noisy backgrounds. Recent research typically leverages larger models to achieve advanced performance. However, the operating environments of remote sensing commonly cannot provide unconstrained computational and storage resources. Such environments require lightweight algorithms with exceptional generalization capabilities.

Design/methodology/approach

This study introduces an efficient knowledge distillation (KD) method to build a lightweight yet precise convolutional neural network (CNN) classifier. This method also aims to substantially decrease the training time expenses commonly linked with traditional KD techniques. This approach entails extensive alterations to both the model training framework and the distillation process, each tailored to the unique characteristics of RSIs. In particular, this study establishes a robust ensemble teacher by independently training two CNN models using a customized, efficient training algorithm. Following this, this study modifies a KD loss function to mitigate the suppression of non-target category predictions, which are essential for capturing the inter- and intra-similarity of RSIs.
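For readers unfamiliar with logit-based KD, the standard loss that such methods start from can be sketched as below. This is the widely used temperature-scaled formulation, not the modified loss the study proposes; averaging the two teachers' logits to form the ensemble, the values of `temperature` and `alpha`, and the toy logits are all assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Weighted sum of soft-target KL divergence and hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    cross_entropy = -math.log(softmax(student_logits)[label])
    # The temperature**2 factor keeps soft-target gradients on the same scale as the hard-label term.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * cross_entropy


# Hypothetical logits from two independently trained teachers, averaged into one ensemble teacher.
teacher_a = [3.0, 1.0, 0.2]
teacher_b = [2.6, 1.4, 0.0]
ensemble = [(a + b) / 2.0 for a, b in zip(teacher_a, teacher_b)]

good_student = kd_loss([2.5, 1.0, 0.1], ensemble, label=0)
poor_student = kd_loss([0.1, 1.0, 2.5], ensemble, label=0)
```

A higher temperature flattens both distributions, which gives the non-target categories more weight in the KL term.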

Findings

This study validated the student model, termed KD-enhanced network (KDE-Net), obtained through the KD process on three benchmark RSI data sets. The KDE-Net surpasses 42 other state-of-the-art methods in the literature published from 2020 to 2023. Compared to the top-ranked method’s performance on the challenging NWPU45 data set, KDE-Net demonstrated a noticeable 0.4% increase in overall accuracy with a significant 88% reduction in parameters. Meanwhile, this study’s reformed KD framework significantly enhances the knowledge transfer speed by at least three times.

Originality/value

This study illustrates that the logit-based KD technique can effectively develop lightweight CNN classifiers for RSI classification without substantial sacrifices in computation and storage costs. Compared to neural architecture search or other methods aiming to provide lightweight solutions, this study’s KDE-Net, based on the inherent characteristics of RSIs, is currently more efficient in constructing accurate yet lightweight classifiers for RSI classification.

Details

International Journal of Web Information Systems, vol. 20 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 18 March 2022

Pinsheng Duan, Jianliang Zhou and Shiwei Tao

Abstract

Purpose

The outbreak of the pandemic makes it more difficult to manage the safety or health of construction workers in infrastructure construction. Risk events in construction workers' material handling tasks are highly relevant to workers' work-related musculoskeletal disorders. However, there are still many problems to be resolved in recognizing risk events accurately. The purpose of this research is to propose an automatic and non-invasive method, based on smartphones and machine learning, for recognizing risk events in construction workers' material handling tasks during the pandemic.

Design/methodology/approach

This research proposes a method to recognize and classify four different risk events by collecting specific acceleration and angular velocity patterns through built-in sensors of smartphones. The events were simulated with anterior handling and shoulder handling methods in the laboratory. After data segmentation and feature extraction, five different machine learning methods are used to recognize risk events and the classification performances are compared.

Findings

The classification result of the shoulder handling method was slightly better than that of the anterior handling method. A comparison of five different classifiers using cross-validation showed that the random forest algorithm achieved the highest classification accuracy (76.71% for the anterior handling method and 80.13% for the shoulder handling method) when the window size was 0.64 s.
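The segmentation step behind a 0.64 s window can be sketched as follows. Only the window length comes from the abstract; the 50 Hz sampling rate, the 50% overlap, the chosen statistics and the synthetic signal are assumptions made for illustration.

```python
import math
import statistics


def windows(signal, size, step):
    """Split a 1D signal into fixed-size windows, dropping any incomplete tail."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def window_features(window):
    """Simple time-domain statistics often used as classifier inputs."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
    }


SAMPLE_RATE_HZ = 50      # hypothetical smartphone sampling rate
WINDOW_SECONDS = 0.64    # window size reported in the abstract
size = round(SAMPLE_RATE_HZ * WINDOW_SECONDS)  # samples per window

# Synthetic stand-in for one accelerometer axis.
signal = [math.sin(0.2 * i) for i in range(128)]
segments = windows(signal, size, step=size // 2)  # 50% overlap, an assumption
features = [window_features(w) for w in segments]
```

Each feature dict would become one training row, typically concatenated across the acceleration and angular velocity axes before being passed to a classifier such as a random forest.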

Originality/value

Less attention has been paid to the risk events in workers' material handling tasks in previous studies, and most events are recorded by manual observation methods. This study provided a simple and objective way to judge the risk events in manual material handling tasks of construction workers based on smartphones, which can be used as a non-invasive way for managers to improve health and labor productivity during the pandemic.

Details

Engineering, Construction and Architectural Management, vol. 30 no. 8
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 10 March 2023

Jingyi Li and Shiwei Chao

Abstract

Purpose

Binary classification on imbalanced data is a challenge; due to the imbalance of the classes, the minority class is easily masked by the majority class. Most existing classifiers are better at identifying the majority class and thereby ignore the minority class, which leads to classifier degradation. To address this, this paper proposes a twin-support vector machine for binary classification on imbalanced data.

Design/methodology/approach

In the proposed method, the authors construct two support vector machines that focus on the majority class and the minority class, respectively. To promote the learning ability of the two support vector machines, a new kernel is derived for them.

Findings

(1) A novel twin-support vector machine is proposed for binary classification on imbalanced data, and new kernels are derived. (2) For imbalanced data, the complexity of the data distribution has negative effects on classification results; however, advanced classification results can be obtained and the desired boundaries learned by optimizing the kernels. (3) Classifiers based on twin architectures have more advantages than those based on a single architecture for binary classification on imbalanced data.

Originality/value

For imbalanced data, the complexity of the data distribution has negative effects on classification results; however, advanced classification results can be obtained and the desired boundaries learned by optimizing the kernels.

Details

Data Technologies and Applications, vol. 57 no. 3
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 21 December 2023

Majid Rahi, Ali Ebrahimnejad and Homayun Motameni

Abstract

Purpose

Given the current human need for agricultural produce such as rice, which requires water for growth, the optimal consumption of this valuable liquid is important. Unfortunately, the traditional use of water for agricultural purposes contradicts the concept of optimal consumption. Therefore, designing and implementing a mechanized irrigation system is of the highest importance. This system includes hardware equipment such as liquid altimeter sensors, valves and pumps, in which failures are inherent and cause faults in the system. Naturally, these faults occur at random time intervals, and an exponential probability distribution is used to simulate these intervals. Thus, before such high-cost systems are implemented, they must be evaluated during the design phase.
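Simulating fault times with exponentially distributed intervals can be sketched as below. The component names follow the abstract, but the failure rates, the 1000-hour horizon and the function name are hypothetical, and the sketch makes no claim about how the authors' simulator is built.

```python
import random


def failure_times(rate_per_hour, horizon_hours, rng):
    """Sample failure instants whose gaps follow an exponential distribution."""
    times = []
    t = rng.expovariate(rate_per_hour)
    while t <= horizon_hours:
        times.append(t)
        t += rng.expovariate(rate_per_hour)
    return times


# Hypothetical failure rates (failures per hour) for the component types named in the abstract.
RATES = {"sensor": 0.02, "valve": 0.01, "pump": 0.005}
HORIZON_HOURS = 1000.0

rng = random.Random(42)  # fixed seed so the simulated schedule is reproducible
schedule = {name: failure_times(rate, HORIZON_HOURS, rng) for name, rate in RATES.items()}
```

With a rate of 0.02 per hour the mean gap between sensor failures is 1 / 0.02 = 50 hours, so about 20 sensor faults are expected over the horizon. Fault labels generated this way could form the kind of offline data set on which the decision trees are trained.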

Design/methodology/approach

The proposed approach included two main steps: offline and online. The offline phase included the simulation of the studied system (i.e. the irrigation system of paddy fields) and the acquisition of a data set for training machine learning algorithms such as decision trees to detect, locate (classification) and evaluate faults. In the online phase, C5.0 decision trees trained in the offline phase were used on a stream of data generated by the system.

Findings

The proposed approach is a comprehensive online component-oriented method, which is a combination of supervised machine learning methods to investigate system faults. Each of these methods is considered a component determined by the dimensions and complexity of the case study (to discover, classify and evaluate fault tolerance). These components are placed together in the form of a process framework so that the appropriate method for each component is obtained based on comparison with other machine learning methods. As a result, depending on the conditions under study, the most efficient method is selected in the components. Before the system implementation phase, its reliability is checked by evaluating the predicted faults (in the system design phase). Therefore, this approach avoids the construction of a high-risk system. Compared to existing methods, the proposed approach is more comprehensive and has greater flexibility.

Research limitations/implications

By expanding the dimensions of the problem, the model verification space grows exponentially using automata.

Originality/value

Unlike the existing methods that only examine one or two aspects of fault analysis such as fault detection, classification and fault-tolerance evaluation, this paper proposes a comprehensive process-oriented approach that investigates all three aspects of fault analysis concurrently.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X
