Search results

1 – 10 of 211
Article
Publication date: 28 February 2023

Meltem Aksoy, Seda Yanık and Mehmet Fatih Amasyali

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals…

Abstract

Purpose

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals are primarily based on manual matching of similar topics, discipline areas and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.

Design/methodology/approach

This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering and classification techniques. The proposed model uses FastText, BERT and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both the clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART) and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.

Findings

The results show that the generated word embeddings can effectively represent proposal texts as vectors, and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.

Research limitations/implications

The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.

Originality/value

This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing research studies focus on the automatic grouping of proposals; traditional frequency-based word embedding methods are used for feature extraction methods to represent project proposals. Unlike previous research, this study employs two outperforming neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of our knowledge, there has been no research conducted on the grouping of project proposals in Turkish.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 3
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 10 March 2023

Jingyi Li and Shiwei Chao

Binary classification on imbalanced data is a challenge; due to the imbalance of the classes, the minority class is easily masked by the majority class. However, most existing…

Abstract

Purpose

Binary classification on imbalanced data is a challenge; due to the imbalance of the classes, the minority class is easily masked by the majority class. However, most existing classifiers are better at identifying the majority class, thereby ignoring the minority class, which leads to classifier degradation. To address this, this paper proposes a twin-support vector machines for binary classification on imbalanced data.

Design/methodology/approach

In the proposed method, the authors construct two support vector machines to focus on majority classes and minority classes, respectively. In order to promote the learning ability of the two support vector machines, a new kernel is derived for them.

Findings

(1) A novel twin-support vector machine is proposed for binary classification on imbalanced data, and new kernels are derived. (2) For imbalanced data, the complexity of data distribution has negative effects on classification results; however, advanced classification results can be gained and desired boundaries are learned by using optimizing kernels. (3) Classifiers based on twin architectures have more advantages than those based on single architecture for binary classification on imbalanced data.

Originality/value

For imbalanced data, the complexity of data distribution has negative effects on classification results; however, advanced classification results can be gained and desired boundaries are learned through using optimizing kernels.

Details

Data Technologies and Applications, vol. 57 no. 3
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 18 September 2023

Jianxiang Qiu, Jialiang Xie, Dongxiao Zhang and Ruping Zhang

Twin support vector machine (TSVM) is an effective machine learning technique. However, the TSVM model does not consider the influence of different data samples on the optimal…

Abstract

Purpose

Twin support vector machine (TSVM) is an effective machine learning technique. However, the TSVM model does not consider the influence of different data samples on the optimal hyperplane, which results in its sensitivity to noise. To solve this problem, this study proposes a twin support vector machine model based on fuzzy systems (FSTSVM).

Design/methodology/approach

This study designs an effective fuzzy membership assignment strategy based on fuzzy systems. It describes the relationship between the three inputs and the fuzzy membership of the sample by defining fuzzy inference rules and then exports the fuzzy membership of the sample. Combining this strategy with TSVM, the FSTSVM is proposed. Moreover, to speed up the model training, this study employs a coordinate descent strategy with shrinking by active set. To evaluate the performance of FSTSVM, this study conducts experiments designed on artificial data sets and UCI data sets.

Findings

The experimental results affirm the effectiveness of FSTSVM in addressing binary classification problems with noise, demonstrating its superior robustness and generalization performance compared to existing learning models. This can be attributed to the proposed fuzzy membership assignment strategy based on fuzzy systems, which effectively mitigates the adverse effects of noise.

Originality/value

This study designs a fuzzy membership assignment strategy based on fuzzy systems that effectively reduces the negative impact caused by noise and then proposes the noise-robust FSTSVM model. Moreover, the model employs a coordinate descent strategy with shrinking by active set to accelerate the training speed of the model.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 1
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 29 December 2023

Thanh-Nghi Do and Minh-Thu Tran-Nguyen

This study aims to propose novel edge device-tailored federated learning algorithms of local classifiers (stochastic gradient descent, support vector machines), namely, FL-lSGD…

Abstract

Purpose

This study aims to propose novel edge device-tailored federated learning algorithms of local classifiers (stochastic gradient descent, support vector machines), namely, FL-lSGD and FL-lSVM. These algorithms are designed to address the challenge of large-scale ImageNet classification.

Design/methodology/approach

The authors’ FL-lSGD and FL-lSVM trains in a parallel and incremental manner to build an ensemble local classifier on Raspberry Pis without requiring data exchange. The algorithms load small data blocks of the local training subset stored on the Raspberry Pi sequentially to train the local classifiers. The data block is split into k partitions using the k-means algorithm, and models are trained in parallel on each data partition to enable local data classification.

Findings

Empirical test results on the ImageNet data set show that the authors’ FL-lSGD and FL-lSVM algorithms with 4 Raspberry Pis (Quad core Cortex-A72, ARM v8, 64-bit SoC @ 1.5GHz, 4GB RAM) are faster than the state-of-the-art LIBLINEAR algorithm run on a PC (Intel(R) Core i7-4790 CPU, 3.6 GHz, 4 cores, 32GB RAM).

Originality/value

Efficiently addressing the challenge of large-scale ImageNet classification, the authors’ novel federated learning algorithms of local classifiers have been tailored to work on the Raspberry Pi. These algorithms can handle 1,281,167 images and 1,000 classes effectively.

Details

International Journal of Web Information Systems, vol. 20 no. 1
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 29 March 2024

Jianping Zhang, Leilei Wang and Guodong Wang

With the rapid advancement in the automotive industry, the friction coefficient (FC), wear rate (WR) and weight loss (WL) have emerged as crucial parameters to measure the…

33

Abstract

Purpose

With the rapid advancement in the automotive industry, the friction coefficient (FC), wear rate (WR) and weight loss (WL) have emerged as crucial parameters to measure the performance of automotive braking systems, so the FC, WR and WL of friction material are predicted and analyzed in this work, with an aim of achieving accurate prediction of friction material properties.

Design/methodology/approach

Genetic algorithm support vector machine (GA-SVM) model is obtained by applying GA to optimize the SVM in this work, thus establishing a prediction model for friction material properties and achieving the predictive and comparative analysis of friction material properties. The process parameters are analyzed by using response surface methodology (RSM) and GA-RSM to determine them for optimal friction performance.

Findings

The results indicate that the GA-SVM prediction model has the smallest error for FC, WR and WL, showing that it owns excellent prediction accuracy. The predicted values obtained by response surface analysis are closed to those of GA-SVM model, providing further evidence of the validity and the rationality of the established prediction model.

Originality/value

The relevant results can serve as a valuable theoretical foundation for the preparation of friction material in engineering practice.

Details

Industrial Lubrication and Tribology, vol. 76 no. 3
Type: Research Article
ISSN: 0036-8792

Keywords

Article
Publication date: 17 May 2023

Md Shamim Hossain, Humaira Begum, Md. Abdur Rouf and Md. Mehedul Islam Sabuj

The goal of the current research is to use different machine learning (ML) approaches to examine and predict customer reviews of food delivery apps (FDAs).

Abstract

Purpose

The goal of the current research is to use different machine learning (ML) approaches to examine and predict customer reviews of food delivery apps (FDAs).

Design/methodology/approach

Using Google Play Scraper, data from five food delivery service providers were collected from the Google Play store. Following cleaning the reviews, the filtered texts were classified as having negative, positive, or neutral sentiments, which were then scored using two unsupervised sentiment algorithms (AFINN and Valence Aware Dictionary for sentiment Reasoning (VADER)). Furthermore, the authors employed four ML approaches to categorize each review of FDAs into the respective sentiment class.

Findings

According to the study's findings, the majority of customer reviews of FDAs were positive. This research also revealed that, while all of the methods (decision tree, linear support vector machine, random forest classifier and logistic regression) can appropriately classify the reviews into a sentiment category, support vector machines (SVM) beats the others in terms of model accuracy. The authors' study also showed that logistic regression provided the highest recall, F1 score and lowest Root Mean Square Error (RMSE) among the four ML models.

Practical implications

The findings aid FDAs in determining customer review behavior. The study's findings could help food apps developers better understand how customers feel about the developers' products and services. The food apps developer can learn how to use ML techniques to better understand the users' behavior.

Originality/value

The current study uses ML methodologies to investigate and predict consumer attitude regarding FDAs.

Details

Journal of Contemporary Marketing Science, vol. 6 no. 2
Type: Research Article
ISSN: 2516-7480

Keywords

Article
Publication date: 6 October 2023

Vahide Bulut

Feature extraction from 3D datasets is a current problem. Machine learning is an important tool for classification of complex 3D datasets. Machine learning classification…

Abstract

Purpose

Feature extraction from 3D datasets is a current problem. Machine learning is an important tool for classification of complex 3D datasets. Machine learning classification techniques are widely used in various fields, such as text classification, pattern recognition, medical disease analysis, etc. The aim of this study is to apply the most popular classification and regression methods to determine the best classification and regression method based on the geodesics.

Design/methodology/approach

The feature vector is determined by the unit normal vector and the unit principal vector at each point of the 3D surface along with the point coordinates themselves. Moreover, different examples are compared according to the classification methods in terms of accuracy and the regression algorithms in terms of R-squared value.

Findings

Several surface examples are analyzed for the feature vector using classification (31 methods) and regression (23 methods) machine learning algorithms. In addition, two ensemble methods XGBoost and LightGBM are used for classification and regression. Also, the scores for each surface example are compared.

Originality/value

To the best of the author’s knowledge, this is the first study to analyze datasets based on geodesics using machine learning algorithms for classification and regression.

Details

Engineering Computations, vol. 40 no. 9/10
Type: Research Article
ISSN: 0264-4401

Keywords

Open Access
Article
Publication date: 3 August 2020

Djordje Cica, Branislav Sredanovic, Sasa Tesic and Davorin Kramar

Sustainable manufacturing is one of the most important and most challenging issues in present industrial scenario. With the intention of diminish negative effects associated with…

2128

Abstract

Sustainable manufacturing is one of the most important and most challenging issues in present industrial scenario. With the intention of diminish negative effects associated with cutting fluids, the machining industries are continuously developing technologies and systems for cooling/lubricating of the cutting zone while maintaining machining efficiency. In the present study, three regression based machine learning techniques, namely, polynomial regression (PR), support vector regression (SVR) and Gaussian process regression (GPR) were developed to predict machining force, cutting power and cutting pressure in the turning of AISI 1045. In the development of predictive models, machining parameters of cutting speed, depth of cut and feed rate were considered as control factors. Since cooling/lubricating techniques significantly affects the machining performance, prediction model development of quality characteristics was performed under minimum quantity lubrication (MQL) and high-pressure coolant (HPC) cutting conditions. The prediction accuracy of developed models was evaluated by statistical error analyzing methods. Results of regressions based machine learning techniques were also compared with probably one of the most frequently used machine learning method, namely artificial neural networks (ANN). Finally, a metaheuristic approach based on a neural network algorithm was utilized to perform an efficient multi-objective optimization of process parameters for both cutting environment.

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964

Keywords

Article
Publication date: 11 October 2021

Khameel Mustapha, Jamal Alhiyafi, Aamir Shafi and Sunday Olusanya Olatunji

This study aims to investigate the prediction of the nonlinear response of three-dimensional-printed polymeric lattice structures with and without structural defects. Unlike…

Abstract

Purpose

This study aims to investigate the prediction of the nonlinear response of three-dimensional-printed polymeric lattice structures with and without structural defects. Unlike metallic structures, the deformation behavior of polymeric components is difficult to quantify through the classical numerical analysis approach as a result of their nonlinear behavior under mechanical loads.

Design/methodology/approach

Geometric models of periodic lattice structures were designed via PTC Creo. Imperfections in the form of missing unit cells are introduced in the replica of the lattice structure. The perfect and imperfect lattice structures have the same dimensions – 10 mm × 14 mm × 30 mm (w × h × L). The fused deposition modelling technique is used to fabricate the parts. The fabricated parts were subjected to physical compression tests to provide a measure of their transverse compressibility resistance. The ensuing nonlinear response from the experimental tests is deployed to develop a support vector machine surrogate model.

Findings

Results from the surrogate model’s performance, in terms of correlation coefficient, rose to as high as 99.91% for the nonlinear compressive stress with a minimum achieved being 98.51% across the four datasets used. In the case of deflection response, the model accuracy rose to as high as 99.74% while the minimum achieved is 98.56% across the four datasets used.

Originality/value

The developed model facilitates the prediction of the quasi-static response of the structures in the absence and presence of defects without the need for repeated physical experiments. The structure investigated is designed for target applications in hierarchical polymer packaging, and the methodology presents a cost-saving method for data-driven constitutive modelling of polymeric parts.

Details

Journal of Engineering, Design and Technology , vol. 21 no. 3
Type: Research Article
ISSN: 1726-0531

Keywords

Article
Publication date: 29 September 2021

Swetha Parvatha Reddy Chandrasekhara, Mohan G. Kabadi and Srivinay

This study has mainly aimed to compare and contrast two completely different image processing algorithms that are very adaptive for detecting prostate cancer using wearable…

Abstract

Purpose

This study has mainly aimed to compare and contrast two completely different image processing algorithms that are very adaptive for detecting prostate cancer using wearable Internet of Things (IoT) devices. Cancer in these modern times is still considered as one of the most dreaded disease, which is continuously pestering the mankind over a past few decades. According to Indian Council of Medical Research, India alone registers about 11.5 lakh cancer related cases every year and closely up to 8 lakh people die with cancer related issues each year. Earlier the incidence of prostate cancer was commonly seen in men aged above 60 years, but a recent study has revealed that this type of cancer has been on rise even in men between the age groups of 35 and 60 years as well. These findings make it even more necessary to prioritize the research on diagnosing the prostate cancer at an early stage, so that the patients can be cured and can lead a normal life.

Design/methodology/approach

The research focuses on two types of feature extraction algorithms, namely, scale invariant feature transform (SIFT) and gray level co-occurrence matrix (GLCM) that are commonly used in medical image processing, in an attempt to discover and improve the gap present in the potential detection of prostate cancer in medical IoT. Later the results obtained by these two strategies are classified separately using a machine learning based classification model called multi-class support vector machine (SVM). Owing to the advantage of better tissue discrimination and contrast resolution, magnetic resonance imaging images have been considered for this study. The classification results obtained for both the SIFT as well as GLCM methods are then compared to check, which feature extraction strategy provides the most accurate results for diagnosing the prostate cancer.

Findings

The potential of both the models has been evaluated in terms of three aspects, namely, accuracy, sensitivity and specificity. Each model’s result was checked against diversified ranges of training and test data set. It was found that the SIFT-multiclass SVM model achieved a highest performance rate of 99.9451% accuracy, 100% sensitivity and 99% specificity at 40:60 ratio of the training and testing data set.

Originality/value

The SIFT-multi SVM versus GLCM-multi SVM based comparison has been introduced for the first time to perceive the best model to be used for the accurate diagnosis of prostate cancer. The performance of the classification for each of the feature extraction strategies is enumerated in terms of accuracy, sensitivity and specificity.

Details

International Journal of Pervasive Computing and Communications, vol. 20 no. 1
Type: Research Article
ISSN: 1742-7371

Keywords

Access

Year

Last 12 months (211)

Content type

Article (211)
1 – 10 of 211