Search results

1 – 10 of over 2000

View access options

Article

Publication date: 16 March 2023

A new method based on ensemble time series for fast and accurate clustering

The common methods for clustering time series are the use of specific distance criteria or the use of standard clustering algorithms. Ensemble clustering is one of the common…

HTML

PDF (555 KB)

Downloads

Abstract

Purpose

The common methods for clustering time series are the use of specific distance criteria or the use of standard clustering algorithms. Ensemble clustering is one of the common techniques used in data mining to increase the accuracy of clustering. In this study, based on segmentation, selecting the best segments, and using ensemble clustering for selected segments, a multistep approach has been developed for the whole clustering of time series data.

Design/methodology/approach

First, this approach divides the time series dataset into equal segments. In the next step, using one or more internal clustering criteria, the best segments are selected, and then the selected segments are combined for final clustering. By using a loop and how to select the best segments for the final clustering (using one criterion or several criteria simultaneously), two algorithms have been developed in different settings. A logarithmic relationship limits the number of segments created in the loop.

Finding

According to Rand's external criteria and statistical tests, at first, the best setting of the two developed algorithms has been selected. Then this setting has been compared to different algorithms in the literature on clustering accuracy and execution time. The obtained results indicate more accuracy and less execution time for the proposed approach.

Originality/value

This paper proposed a fast and accurate approach for time series clustering in three main steps. This is the first work that uses a combination of segmentation and ensemble clustering. More accuracy and less execution time are the remarkable achievements of this study.

Details

Data Technologies and Applications, vol. 57 no. 5

Type: Research Article

DOI:

ISSN: 2514-9288

Keywords

View access options

Article

Publication date: 22 February 2024

User profiling for Chinese super-new generation wine consumers based on improved density peak clustering algorithm

Yumeng Feng, Weisong Mu, Yue Li, Tianqi Liu and Jianying Feng

For a better understanding of the preferences and differences of young consumers in emerging wine markets, this study aims to propose a clustering method to segment the super-new…

HTML

PDF (3.7 MB)

Downloads

Abstract

Purpose

For a better understanding of the preferences and differences of young consumers in emerging wine markets, this study aims to propose a clustering method to segment the super-new generation wine consumers based on their sensitivity to wine brand, origin and price and then conduct user profiles for segmented consumer groups from the perspectives of demographic attributes, eating habits and wine sensory attribute preferences.

Design/methodology/approach

We first proposed a consumer clustering perspective based on their sensitivity to wine brand, origin and price and then conducted an adaptive density peak and label propagation layer-by-layer (ADPLP) clustering algorithm to segment consumers, which improved the issues of wrong centers' selection and inaccurate classification of remaining sample points for traditional DPC (DPeak clustering algorithm). Then, we built a consumer profile system from the perspectives of demographic attributes, eating habits and wine sensory attribute preferences for segmented consumer groups.

Findings

In this study, 10 typical public datasets and 6 basic test algorithms are used to evaluate the proposed method, and the results showed that the ADPLP algorithm was optimal or suboptimal on 10 datasets with accuracy above 0.78. The average improvement in accuracy over the base DPC algorithm is 0.184. As an outcome of the wine consumer profiles, sensitive consumers prefer wines with medium prices of 100–400 CNY and more personalized brands and origins, while casual consumers are fond of popular brands, popular origins and low prices within 50 CNY. The wine sensory attributes preferred by super-new generation consumers are red, semi-dry, semi-sweet, still, fresh tasting, fruity, floral and low acid.

Practical implications

Young Chinese consumers are the main driver of wine consumption in the future. This paper provides a tool for decision-makers and marketers to identify the preferences of young consumers quickly which is meaningful and helpful for wine marketing.

Originality/value

In this study, the ADPLP algorithm was introduced for the first time. Subsequently, the user profile label system was constructed for segmented consumers to highlight their characteristics and demand partiality from three aspects: demographic characteristics, consumers' eating habits and consumers' preferences for wine attributes. Moreover, the ADPLP algorithm can be considered for user profiles on other alcoholic products.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print

Type: Research Article

DOI:

ISSN: 0368-492X

Keywords

View access options

Article

Publication date: 28 February 2023

A comparative analysis of text representation, classification and clustering methods over real project proposals

Meltem Aksoy, Seda Yanık and Mehmet Fatih Amasyali

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals…

HTML

PDF (5.3 MB)

Downloads

264

Abstract

Purpose

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals are primarily based on manual matching of similar topics, discipline areas and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.

Design/methodology/approach

This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering and classification techniques. The proposed model uses FastText, BERT and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both the clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART) and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.

Findings

The results show that the generated word embeddings can effectively represent proposal texts as vectors, and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.

Research limitations/implications

The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.

Originality/value

This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing research studies focus on the automatic grouping of proposals; traditional frequency-based word embedding methods are used for feature extraction methods to represent project proposals. Unlike previous research, this study employs two outperforming neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of our knowledge, there has been no research conducted on the grouping of project proposals in Turkish.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 3

Type: Research Article

DOI:

ISSN: 1756-378X

Keywords

View access options

Article

Publication date: 3 November 2022

A clustering approach for data quality results of research information systems

Reza Edris Abadi, Mohammad Javad Ershadi and Seyed Taghi Akhavan Niaki

The overall goal of the data mining process is to extract information from an extensive data set and make it understandable for further use. When working with large volumes of…

HTML

PDF (393 KB)

Downloads

162

Abstract

Purpose

The overall goal of the data mining process is to extract information from an extensive data set and make it understandable for further use. When working with large volumes of unstructured data in research information systems, it is necessary to divide the information into logical groupings after examining their quality before attempting to analyze it. On the other hand, data quality results are valuable resources for defining quality excellence programs of any information system. Hence, the purpose of this study is to discover and extract knowledge to evaluate and improve data quality in research information systems.

Design/methodology/approach

Clustering in data analysis and exploiting the outputs allows practitioners to gain an in-depth and extensive look at their information to form some logical structures based on what they have found. In this study, data extracted from an information system are used in the first stage. Then, the data quality results are classified into an organized structure based on data quality dimension standards. Next, clustering algorithms (K-Means), density-based clustering (density-based spatial clustering of applications with noise [DBSCAN]) and hierarchical clustering (balanced iterative reducing and clustering using hierarchies [BIRCH]) are applied to compare and find the most appropriate clustering algorithms in the research information system.

Findings

This paper showed that quality control results of an information system could be categorized through well-known data quality dimensions, including precision, accuracy, completeness, consistency, reputation and timeliness. Furthermore, among different well-known clustering approaches, the BIRCH algorithm of hierarchical clustering methods performs better in data clustering and gives the highest silhouette coefficient value. Next in line is the DBSCAN method, which performs better than the K-Means method.

Research limitations/implications

In the data quality assessment process, the discrepancies identified and the lack of proper classification for inconsistent data have led to unstructured reports, making the statistical analysis of qualitative metadata problems difficult and thus impossible to root out the observed errors. Therefore, in this study, the evaluation results of data quality have been categorized into various data quality dimensions, based on which multiple analyses have been performed in the form of data mining methods.

Originality/value

Although several pieces of research have been conducted to assess data quality results of research information systems, knowledge extraction from obtained data quality scores is a crucial work that has rarely been studied in the literature. Besides, clustering in data quality analysis and exploiting the outputs allows practitioners to gain an in-depth and extensive look at their information to form some logical structures based on what they have found.

Details

Information Discovery and Delivery, vol. 51 no. 4

Type: Research Article

DOI:

ISSN: 2398-6247

Keywords

View access options

Article

Publication date: 2 January 2024

Multi-view fuzzy C-means clustering with kernel metric and local information for color image segmentation

Xiumei Cai, Xi Yang and Chengmao Wu

Multi-view fuzzy clustering algorithms are not widely used in image segmentation, and many of these algorithms are lacking in robustness. The purpose of this paper is to…

HTML

PDF (7.7 MB)

Downloads

Abstract

Purpose

Multi-view fuzzy clustering algorithms are not widely used in image segmentation, and many of these algorithms are lacking in robustness. The purpose of this paper is to investigate a new algorithm that can segment the image better and retain as much detailed information about the image as possible when segmenting noisy images.

Design/methodology/approach

The authors present a novel multi-view fuzzy c-means (FCM) clustering algorithm that includes an automatic view-weight learning mechanism. Firstly, this algorithm introduces a view-weight factor that can automatically adjust the weight of different views, thereby allowing each view to obtain the best possible weight. Secondly, the algorithm incorporates a weighted fuzzy factor, which serves to obtain local spatial information and local grayscale information to preserve image details as much as possible. Finally, in order to weaken the effects of noise and outliers in image segmentation, this algorithm employs the kernel distance measure instead of the Euclidean distance.

Findings

The authors added different kinds of noise to images and conducted a large number of experimental tests. The results show that the proposed algorithm performs better and is more accurate than previous multi-view fuzzy clustering algorithms in solving the problem of noisy image segmentation.

Originality/value

Most of the existing multi-view clustering algorithms are for multi-view datasets, and the multi-view fuzzy clustering algorithms are unable to eliminate noise points and outliers when dealing with noisy images. The algorithm proposed in this paper has stronger noise immunity and can better preserve the details of the original image.

Details

Engineering Computations, vol. 41 no. 1

Type: Research Article

DOI:

ISSN: 0264-4401

Keywords

View access options

Article

Publication date: 31 May 2022

User value identification based on an improved consumer value segmentation algorithm

Jianfang Qi, Yue Li, Haibin Jin, Jianying Feng and Weisong Mu

The purpose of this study is to propose a new consumer value segmentation method for low-dimensional dense market datasets to quickly detect and cluster the most profitable…

HTML

PDF (8.6 MB)

Downloads

283

Abstract

Purpose

The purpose of this study is to propose a new consumer value segmentation method for low-dimensional dense market datasets to quickly detect and cluster the most profitable customers for the enterprises.

Design/methodology/approach

In this study, the comprehensive segmentation bases (CSB) with richer meanings were obtained by introducing the weighted recency-frequency-monetary (RFM) model into the common segmentation bases (SB). Further, a new market segmentation method, the CSB-MBK algorithm was proposed by integrating the CSB model and the mini-batch k-means (MBK) clustering algorithm.

Findings

The results show that our proposed CSB model can reflect consumers' contributions to a market, as well as improve the clustering performance. Moreover, the proposed CSB-MBK algorithm is demonstrably superior to the SB-MBK, CSB-KMA and CSB-Chameleon algorithms with respect to the Silhouette Coefficient (SC), the Calinski-Harabasz (CH) Index , the average running time and superior to the SB-MBK, RFM-MBK and WRFM-MBK algorithms in terms of the inter-market value and characteristic differentiation.

Practical implications

This paper provides a tool for decision-makers and marketers to segment a market quickly, which can help them grasp consumers' activity, loyalty, purchasing power and other characteristics in a target market timely and achieve the precision marketing.

Originality/value

This study is the first to introduce the CSB-MBK algorithm for identifying valuable customers through the comprehensive consideration of the clustering quality, consumer value and segmentation speed. Moreover, the CSB-MBK algorithm can be considered for applications in other markets.

Details

Kybernetes, vol. 52 no. 10

Type: Research Article

DOI:

ISSN: 0368-492X

Keywords

View access options

Article

Publication date: 16 November 2023

Proposing new clustering-based algorithms for the multi-skilled resource-constrained multi-project scheduling problem with resource leveling adjustments

Ehsan Goudarzi, Hamid Esmaeeli, Kia Parsa and Shervin Asadzadeh

The target of this research is to develop a mathematical model which combines the Resource-Constrained Multi-Project Scheduling Problem (RCMPSP) and the Multi-Skilled…

HTML

PDF (4.2 MB)

Downloads

Abstract

Purpose

The target of this research is to develop a mathematical model which combines the Resource-Constrained Multi-Project Scheduling Problem (RCMPSP) and the Multi-Skilled Resource-Constrained Project Scheduling Problem (MSRCPSP). Due to the importance of resource management, the proposed formulation comprises resource leveling considerations as well. The model aims to simultaneously optimize: (1) the total time to accomplish all projects and (2) the total deviation of resource consumptions from the uniform utilization levels.

Design/methodology/approach

The K-Means (KM) and Fuzzy C-Means (FCM) clustering methods have been separately applied to discover the clusters of activities which have the most similar resource demands. The discovered clusters are given to the scheduling process as priori knowledge. Consequently, the execution times of the activities with the most common resource requests will not overlap. The intricacy of the problem led us to incorporate the KM and FCM techniques into a meta-heuristic called the Bi-objective Symbiosis Organisms Search (BSOS) algorithm so that the real-life samples of this problem could be solved. Therefore, two clustering-based algorithms, namely, the BSOS-KM and BSOS-FCM have been developed.

Findings

Comparisons between the BSOS-KM, BSOS-FCM and the BSOS method without any clustering approach show that the clustering techniques could enhance the optimization process. Another hybrid clustering-based methodology called the NSGA-II-SPE has been added to the comparisons to evaluate the developed resource leveling framework.

Practical implications

The practical importance of the model and the clustering-based algorithms have been demonstrated in planning several construction projects, where multiple water supply systems are concurrently constructed.

Originality/value

Reviewing the literature revealed that there was a need for a hybrid formulation that embraces the characteristics of the RCMPSP and MSRCPSP with resource leveling considerations. Moreover, the application of clustering algorithms as resource leveling techniques was not studied sufficiently in the literature.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print

Type: Research Article

DOI:

ISSN: 0368-492X

Keywords

View access options

Article

Publication date: 30 April 2021

Efficient path planning of drone swarms over clustered human crowds in social events

Faruk Bulut, Melike Bektaş and Abdullah Yavuz

In this study, supervision and control of the possible problems among people over a large area with a limited number of drone cameras and security staff is established.

HTML

PDF (6.7 MB)

Downloads

245

Abstract

Purpose

In this study, supervision and control of the possible problems among people over a large area with a limited number of drone cameras and security staff is established.

Design/methodology/approach

These drones, namely unmanned aerial vehicles (UAVs) will be adaptively and automatically distributed over the crowds to control and track the communities by the proposed system. Since crowds are mobile, the design of the drone clusters will be simultaneously re-organized according to densities and distributions of people. An adaptive and dynamic distribution and routing mechanism of UAV fleets for crowds is implemented to control a specific given region. The nine popular clustering algorithms have been used and tested in the presented mechanism to gain better performance.

Findings

The nine popular clustering algorithms have been used and tested in the presented mechanism to gain better performance. An outperformed clustering performance from the aggregated model has been received when compared with a singular clustering method over five different test cases about crowds of human distributions. This study has three basic components. The first one is to divide the human crowds into clusters. The second one is to determine an optimum route of UAVs over clusters. The last one is to direct the most appropriate security personnel to the events that occurred.

Originality/value

This study has three basic components. The first one is to divide the human crowds into clusters. The second one is to determine an optimum route of UAVs over clusters. The last one is to direct the most appropriate security personnel to the events that occurred.

Details

International Journal of Intelligent Unmanned Systems, vol. 12 no. 1

Type: Research Article

DOI:

ISSN: 2049-6427

Keywords

View access options

Article

Publication date: 28 November 2023

FedACQ: adaptive clustering quantization of model parameters in federated learning

Tingting Tian, Hongjian Shi, Ruhui Ma and Yuan Liu

For privacy protection, federated learning based on data separation allows machine learning models to be trained on remote devices or in isolated data devices. However, due to the…

HTML

PDF (1.9 MB)

Downloads

Abstract

Purpose

For privacy protection, federated learning based on data separation allows machine learning models to be trained on remote devices or in isolated data devices. However, due to the limited resources such as bandwidth and power of local devices, communication in federated learning can be much slower than in local computing. This study aims to improve communication efficiency by reducing the number of communication rounds and the size of information transmitted in each round.

Design/methodology/approach

This paper allows each user node to perform multiple local trainings, then upload the local model parameters to a central server. The central server updates the global model parameters by weighted averaging the parameter information. Based on this aggregation, user nodes first cluster the parameter information to be uploaded and then replace each value with the mean value of its cluster. Considering the asymmetry of the federated learning framework, adaptively select the optimal number of clusters required to compress the model information.

Findings

While maintaining the loss convergence rate similar to that of federated averaging, the test accuracy did not decrease significantly.

Originality/value

By compressing uplink traffic, the work can improve communication efficiency on dynamic networks with limited resources.

Details

International Journal of Web Information Systems, vol. 20 no. 1

Type: Research Article

DOI:

ISSN: 1744-0084

Keywords

View access options

Article

Publication date: 5 May 2021

Understanding digitally enabled complex networks: a plural granulation based hybrid community detection approach

Samrat Gupta and Swanand Deodhar

Communities representing groups of agents with similar interests or functions are one of the essential features of complex networks. Finding communities in real-world networks is…

HTML

PDF (4.1 MB)

Downloads

427

Abstract

Purpose

Communities representing groups of agents with similar interests or functions are one of the essential features of complex networks. Finding communities in real-world networks is critical for analyzing complex systems in various areas ranging from collaborative information to political systems. Given the different characteristics of networks and the capability of community detection in handling a plethora of societal problems, community detection methods represent an emerging area of research. Contributing to this field, the authors propose a new community detection algorithm based on the hybridization of node and link granulation.

Design/methodology/approach

The proposed algorithm utilizes a rough set-theoretic concept called closure on networks. Initial sets are constructed by using neighborhood topology around the nodes as well as links and represented as two different categories of granules. Subsequently, the authors iteratively obtain the constrained closure of these sets. The authors use node mutuality and link mutuality as merging criteria for node and link granules, respectively, during the iterations. Finally, the constrained closure subsets of nodes and links are combined and refined using the Jaccard similarity coefficient and a local density function to obtain communities in a binary network.

Findings

Extensive experiments conducted on twelve real-world networks followed by a comparison with state-of-the-art methods demonstrate the viability and effectiveness of the proposed algorithm.

Research limitations/implications

The study also contributes to the ongoing effort related to the application of soft computing techniques to model complex systems. The extant literature has integrated a rough set-theoretic approach with a fuzzy granular model (Kundu and Pal, 2015) and spectral clustering (Huang and Xiao, 2012) for node-centric community detection in complex networks. In contributing to this stream of work, the proposed algorithm leverages the unexplored synergy between rough set theory, node granulation and link granulation in the context of complex networks. Combined with experiments of network datasets from various domains, the results indicate that the proposed algorithm can effectively reveal co-occurring disjoint, overlapping and nested communities without necessarily assigning each node to a community.

Practical implications

This study carries important practical implications for complex adaptive systems in business and management sciences, in which entities are increasingly getting organized into communities (Jacucci et al., 2006). The proposed community detection method can be used for network-based fraud detection by enabling experts to understand the formation and development of fraudulent setups with an active exchange of information and resources between the firms (Van Vlasselaer et al., 2017). Products and services are getting connected and mapped in every walk of life due to the emergence of a variety of interconnected devices, social networks and software applications.

Social implications

The proposed algorithm could be extended for community detection on customer trajectory patterns and design recommendation systems for online products and services (Ghose et al., 2019; Liu and Wang, 2017). In line with prior research, the proposed algorithm can aid companies in investigating the characteristics of implicit communities of bloggers or social media users for their services and products so as to identify peer influencers and conduct targeted marketing (Chau and Xu, 2012; De Matos et al., 2014; Zhang et al., 2016). The proposed algorithm can be used to understand the behavior of each group and the appropriate communication strategy for that group. For instance, a group using a specific language or following a specific account might benefit more from a particular piece of content than another group. The proposed algorithm can thus help in exploring the factors defining communities and confronting many real-life challenges.

Originality/value

This work is based on a theoretical argument that communities in networks are not only based on compatibility among nodes but also on the compatibility among links. Building up on the aforementioned argument, the authors propose a community detection method that considers the relationship among both the entities in a network (nodes and links) as opposed to traditional methods, which are predominantly based on relationships among nodes only.

Details

Information Technology & People, vol. 37 no. 2

Type: Research Article

DOI:

ISSN: 0959-3845

Keywords

Access

Year

Content type

1 – 10 of over 2000