Search results

1 – 10 of over 11000
Article
Publication date: 16 March 2023

Ali Ghorbanian and Hamideh Razavi

Abstract

Purpose

The common methods for clustering time series are the use of specific distance criteria or the use of standard clustering algorithms. Ensemble clustering is one of the common techniques used in data mining to increase the accuracy of clustering. In this study, a multistep approach based on segmentation, selection of the best segments and ensemble clustering of the selected segments has been developed for clustering whole time series data.

Design/methodology/approach

First, this approach divides the time series dataset into equal segments. In the next step, the best segments are selected using one or more internal clustering criteria, and the selected segments are then combined for final clustering. Based on a loop and on how the best segments are selected for the final clustering (using one criterion or several criteria simultaneously), two algorithms have been developed in different settings. A logarithmic relationship limits the number of segments created in the loop.
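
The segment-then-select idea can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the minimal k-means helper, the Calinski-Harabasz-style internal criterion and the toy data are all assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Minimal k-means used as the base clusterer (illustrative only)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def ch_like_score(X, labels, centers):
    """Internal criterion (Calinski-Harabasz style): between- vs within-cluster scatter."""
    overall = X.mean(axis=0)
    between = sum((labels == j).sum() * ((c - overall) ** 2).sum()
                  for j, c in enumerate(centers))
    within = sum(((X[labels == j] - c) ** 2).sum() for j, c in enumerate(centers))
    return between / (within + 1e-12)

def select_best_segments(series, n_segments, k, n_keep):
    """Split each series into equal segments, cluster each segment separately,
    score it with the internal criterion and keep the n_keep best segments."""
    segments = np.split(series, n_segments, axis=1)
    scores = [ch_like_score(seg, *kmeans(seg, k)) for seg in segments]
    best = np.argsort(scores)[::-1][:n_keep]
    return np.hstack([segments[i] for i in best])   # combined for final clustering

# toy data: 6 series of length 12; only the first half is discriminative
X = np.vstack([np.tile([0.0] * 6 + [5.0] * 6, (3, 1)),
               np.tile([10.0] * 6 + [5.0] * 6, (3, 1))])
X += rng.normal(scale=0.1, size=X.shape)
combined = select_best_segments(X, n_segments=4, k=2, n_keep=2)
final_labels, _ = kmeans(combined, 2)
```

The selection step keeps only the segments whose internal-criterion score is highest, so the final clustering runs on the most cluster-revealing portion of each series.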

Findings

Based on the Rand index as an external criterion and on statistical tests, the best setting of the two developed algorithms was first selected. This setting was then compared with different algorithms from the literature in terms of clustering accuracy and execution time. The results indicate higher accuracy and lower execution time for the proposed approach.

Originality/value

This paper proposed a fast and accurate three-step approach for time series clustering. It is the first work to combine segmentation with ensemble clustering. Higher accuracy and lower execution time are the notable achievements of this study.

Details

Data Technologies and Applications, vol. 57 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 16 July 2021

Yerra Readdy Alekya Rani and Edara Sreenivasa Reddy

Abstract

Purpose

Wireless sensor networks (WSNs) have been widely adopted for various applications due to their pervasive computing properties. Prolonging the WSN lifetime is necessary so that the network's benefits remain available for a long time. The WSN lifetime may vary by application, and in most cases it is taken as the time until the death of the first node in the module. Clustering has been one of the successful strategies for increasing the effectiveness of the network, as it selects the appropriate cluster head (CH) for communication. However, most clustering protocols are based on probabilistic schemes, which may create two CHs for a single cluster group, leading to higher energy consumption. Hence, it is necessary to build a clustering strategy with improved properties for CH selection. The purpose of this paper is to provide better convergence over a large simulation space and to use it for optimizing the communication paths of a WSN.

Design/methodology/approach

This paper plans to develop a new clustering protocol in WSN using fuzzy clustering and an improved meta-heuristic algorithm. The fuzzy clustering approach is adopted for performing the clustering of nodes with respective fuzzy centroid by using the input constraints such as signal-to-interference-plus-noise ratio (SINR), load and residual energy, between the CHs and nodes. After the cluster formation, the combined utility function is used to refine the CH selection. The CH is determined based on computing the combined utility function, in which the node attaining the maximum combined utility function is selected as the CH. After the clustering and CH formation, the optimal communication between the CH and the nodes is induced by a new meta-heuristic algorithm called Fitness updated Crow Search Algorithm (FU-CSA). This optimal communication is accomplished by concerning a multi-objective function with constraints with residual energy and the distance between the nodes. Finally, the simulation results show that the proposed technique enhances the network lifetime and energy efficiency when compared to the state-of-the-art techniques.
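
The combined-utility CH selection can be illustrated with a small sketch. The utility form, the weights and the toy node values below are hypothetical (the abstract names the inputs but not the formula); the node maximizing the utility is chosen as CH.

```python
import numpy as np

def normalize(v):
    """Scale a vector to [0, 1] (illustrative helper)."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min() + 1e-12)

def select_cluster_head(energy, load, sinr, weights=(0.5, 0.25, 0.25)):
    """Hypothetical combined utility: favor high residual energy and SINR,
    penalize load; the node with the maximum utility becomes the CH.
    The weights are illustrative, not taken from the paper."""
    w_e, w_s, w_l = weights
    utility = (w_e * normalize(energy)
               + w_s * normalize(sinr)
               - w_l * normalize(load))
    return int(np.argmax(utility)), utility

# toy cluster of 5 nodes
energy = [0.9, 0.4, 0.7, 0.95, 0.2]   # residual energy (J)
load   = [3, 1, 4, 2, 5]              # queued packets
sinr   = [12, 8, 15, 14, 6]           # dB
ch, u = select_cluster_head(energy, load, sinr)
```

Here node 3 wins: it combines the highest residual energy with strong SINR and a light load.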

Findings

The proposed Fuzzy+FU-CSA algorithm achieved cost function values lower by 48%, 60%, 40% and 25% than Fuzzy+Particle Swarm Optimization (PSO), Fuzzy+Grey Wolf Optimizer (GWO), Fuzzy+Whale Optimization Algorithm (WOA) and Fuzzy+CSA, respectively. The results thus show that the proposed Fuzzy+FU-CSA outperforms the other algorithms and provides a longer network lifetime and higher energy efficiency.

Originality/value

For efficient clustering and CH selection, a combined utility function was developed using network parameters such as energy, load, SINR and distance. The fuzzy clustering uses residual energy, load and SINR as constraint inputs for clustering the nodes of the WSN. This work developed the FU-CSA algorithm for selecting the optimal communication path for the WSN.

Details

International Journal of Pervasive Computing and Communications, vol. 19 no. 2
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 5 September 2016

Runhai Jiao, Shaolong Liu, Wu Wen and Biying Lin

Abstract

Purpose

The large volume of big data makes traditional clustering algorithms, which are usually designed for the entire data set, impractical. The purpose of this paper is to focus on incremental clustering, which divides the data into a series of data chunks so that only a small amount of data needs to be clustered at a time. Few studies on incremental clustering address the problems of optimizing the cluster center initialization for each data chunk and selecting multiple passing points for each cluster.

Design/methodology/approach

Through optimizing the initial cluster centers, the quality of the clustering results is improved for each data chunk, and the quality of the final clustering results is thereby enhanced. Moreover, by selecting multiple passing points, more accurate information is passed down to improve the final clustering results. A method solving these two problems is proposed and applied in an algorithm based on the streaming kernel fuzzy c-means (stKFCM) algorithm.
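
A minimal sketch of the chunked scheme, assuming plain k-means in place of the paper's kernel fuzzy c-means: the previous chunk's centers seed each new chunk's initialization, and several points nearest each center (the "passing points", not just the center itself) are carried into the next chunk. The data and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, init, iters=20):
    """Minimal k-means with an explicit initialization (illustrative only)."""
    centers = np.array(init, dtype=float)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def passing_points(X, labels, centers, m=2):
    """Pass down the m points nearest each center rather than the center alone."""
    reps = []
    for j, c in enumerate(centers):
        members = X[labels == j]
        order = np.argsort(np.linalg.norm(members - c, axis=1))
        reps.append(members[order[:m]])
    return np.vstack(reps)

def incremental_cluster(chunks, k, m=2):
    """Cluster a stream of chunks: each chunk is seeded with the previous
    centers (optimized initialization) and augmented with the passed-down
    representative points before clustering."""
    carried, centers = None, None
    for chunk in chunks:
        data = chunk if carried is None else np.vstack([carried, chunk])
        if centers is None:
            init = data[np.linspace(0, len(data) - 1, k).astype(int)]
        else:
            init = centers
        labels, centers = kmeans(data, k, init)
        carried = passing_points(data, labels, centers, m)
    return centers

# toy stream: three chunks, each with two well-separated groups
chunks = [np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(10, 0.3, (20, 2))])
          for _ in range(3)]
final_centers = incremental_cluster(chunks, k=2)
```

Only the carried representatives plus the new chunk are clustered at each step, so memory stays bounded while cluster information still flows between chunks.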

Findings

Experimental results show that the proposed algorithm achieves higher accuracy and better performance than the stKFCM algorithm.

Originality/value

This paper addresses the problem of improving the performance of incremental clustering by optimizing cluster center initialization and selecting multiple passing points. The paper analyzes the performance of the proposed scheme and demonstrates its effectiveness.

Details

Kybernetes, vol. 45 no. 8
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 22 November 2011

Jingjing Ma, Maoguo Gong and Licheng Jiao

Abstract

Purpose

The purpose of this paper is to present an evolutionary clustering algorithm based on mixed measure for complex distributed data.

Design/methodology/approach

In this method, the data are first partitioned into some spherical distributed sub‐clusters by using the Euclidean distance as the similarity measurement, and each clustering center represents all the members of corresponding cluster. Then, the clustering centers obtained in the first phase are clustered by using a novel manifold distance as the similarity measurement. The two clustering processes in this method are both based on evolutionary algorithm.
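
The second-phase similarity measure can be sketched as a graph-based manifold distance; the edge cost rho**d - 1 is a common choice in this literature (many short hops through dense regions cost less than one long jump), though the paper's exact definition may differ. The toy points below are illustrative.

```python
import numpy as np

def manifold_distance(points, rho=2.0):
    """Manifold distance via graph shortest paths: an edge between two points
    costs rho**d - 1 (d = Euclidean distance), so chains of short hops through
    densely populated regions are cheaper than one long jump.  All-pairs
    shortest paths are computed with Floyd-Warshall."""
    d = np.linalg.norm(points[:, None] - points[None], axis=2)
    g = rho ** d - 1.0
    for k in range(len(points)):
        g = np.minimum(g, g[:, k:k + 1] + g[k:k + 1, :])
    return g

# four points on a line: hopping 0 -> 1 -> 2 -> 3 costs 3 * (2**1 - 1) = 3,
# cheaper than the direct jump 0 -> 3, which costs 2**3 - 1 = 7
pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
g = manifold_distance(pts)
```

Applied to the first-phase sub-cluster centers, this measure lets the fine clustering follow elongated or non-convex structures that plain Euclidean distance would split.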

Findings

Theoretical analysis and experimental results on seven artificial data sets and seven UCI data sets with different structures show that the novel algorithm can efficiently identify clusters regardless of whether their distribution is simple or complex, convex or non-convex. When compared with genetic algorithm-based clustering and the K-means algorithm, the proposed algorithm outperformed the compared algorithms on most of the test data sets.

Originality/value

The method presented in this paper represents a new approach to clustering complex distributed data. It applies the idea of "coarse clustering, fine clustering", executing coarse clustering with the Euclidean distance and fine clustering with the manifold distance as similarity measurements, respectively. The proposed clustering algorithm is shown to be effective in solving data clustering problems with different distributions.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 4 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 15 May 2017

Young Wook Seo, Kun Chang Lee and Sangjae Lee

Abstract

Purpose

For those who plan research funds and assess the research performance resulting from them, it is necessary to overcome the limitations of the conventional classification of evaluated papers published under the research funds. They also need to promote objective, fair clustering of papers and analysis of research performance. Therefore, the purpose of this paper is to find the optimum clustering algorithm using MATLAB tools by comparing the performance of hybrid particle swarm optimization algorithms that combine the particle swarm optimization (PSO) algorithm with the conventional K-means clustering method.

Design/methodology/approach

The clustering analysis experiment for each of the three fields of study – health and medicine, physics, and chemistry – used the following three algorithms: “K-means+Simulated annealing (SA)+Adjustment of parameters+PSO” (KASA-PSO clustering), “K-means+SA+PSO” clustering, “K-means+PSO” clustering.
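
The simplest of these hybrids, "K-means+PSO", can be sketched as follows: particles encode candidate centroid sets, the swarm is seeded around the K-means solution, and standard PSO updates minimize a sum-of-squared-errors fitness. All hyperparameters and data here are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def sse(X, centers):
    """Fitness: sum of squared distances from each point to its nearest center."""
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    return float((d.min(axis=1) ** 2).sum())

def pso_refine(X, init_centers, n_particles=10, iters=30, w=0.7, c1=1.4, c2=1.4):
    """Sketch of a K-means+PSO hybrid: each particle is a flattened set of
    centroids, seeded around the K-means solution, and standard PSO velocity
    updates minimize the SSE fitness."""
    k, dim = init_centers.shape
    pos = init_centers.ravel() + rng.normal(0, 0.5, (n_particles, k * dim))
    pos[0] = init_centers.ravel()              # keep the K-means seed itself
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([sse(X, p.reshape(k, dim)) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([sse(X, p.reshape(k, dim)) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest.reshape(k, dim)

# toy data with two groups; start from deliberately offset "K-means" centers
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
init = np.array([[0.5, 0.5], [4.5, 4.5]])
refined = pso_refine(X, init)
```

Because the K-means seed is itself a particle and personal bests only improve, the refined centroids can never have worse fitness than the starting solution.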

Findings

The clustering analyses of all the three fields showed that KASA-PSO is the best method for the minimization of fitness value. Furthermore, this study administered the surveys intended for the “performance measurement of decision-making process” with 13 members of the research fund organization to compare the group clustering by the clustering analysis method of KASA-PSO algorithm and the group clustering by research funds. The results statistically demonstrated that the group clustering by the clustering analysis method of KASA-PSO algorithm was better than the group clustering by research funds.

Practical implications

This study examined the impact of bibliometric indicators on research impact of papers. The results showed that research period, the number of authors, and the number of participating researchers had positive effects on the impact factor (IF) of the papers; the IF that indicates the qualitative level of papers had a positive effect on the primary times cited; and the primary times cited had a positive effect on the secondary times cited. Furthermore, this study clearly showed the decision quality perceived by those who are working for the research fund organization.

Originality/value

There are still too few studies that assess research project evaluation mechanisms and their effectiveness as perceived by research fund managers. To fill this research void, this study proposes a PSO-based approach and successfully demonstrates its validity.

Article
Publication date: 18 June 2021

Shuai Luo, Hongwei Liu and Ershi Qi

Abstract

Purpose

The purpose of this paper is to recognize and label the faults in wind turbines with a new density-based clustering algorithm, named contour density scanning clustering (CDSC) algorithm.

Design/methodology/approach

The algorithm includes four components: (1) computation of neighborhood density, (2) selection of core and noise data, (3) scanning core data and (4) updating clusters. The proposed algorithm considers the relationship between neighborhood data points according to a contour density scanning strategy.
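
Components (1) and (2) can be sketched in a DBSCAN-like style; this is an illustrative stand-in for CDSC, and the radius and threshold values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def neighborhood_density(X, eps):
    """Component (1): number of neighbors within radius eps of each point."""
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    return (d <= eps).sum(axis=1) - 1   # exclude the point itself

def core_and_noise(X, eps, min_pts):
    """Component (2): points with at least min_pts neighbors are core data;
    the remainder are noise candidates."""
    dens = neighborhood_density(X, eps)
    core = dens >= min_pts
    return core, ~core

# toy data: a dense blob of 10 points plus one isolated outlier
X = np.vstack([rng.normal(0, 0.05, (10, 2)), [[5.0, 5.0]]])
core, noise = core_and_noise(X, eps=0.5, min_pts=3)
```

Steps (3) and (4) would then scan the core data and grow clusters from them, which is where the contour density scanning strategy departs from plain DBSCAN.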

Findings

The first experiment is conducted with artificial data to validate that the proposed CDSC algorithm is suitable for handling data points with arbitrary shapes. The second experiment, with industrial gearbox vibration data, compares the time complexity and accuracy of the proposed CDSC algorithm with those of other conventional clustering algorithms, including k-means, density-based spatial clustering of applications with noise, density peak clustering, neighborhood grid clustering, support vector clustering, random forest, core fusion-based density peak clustering, AdaBoost and extreme gradient boosting. The third experiment is conducted with an industrial bearing vibration data set to highlight that the CDSC algorithm can automatically track the emerging fault patterns of bearings in wind turbines over time.

Originality/value

Data points with different densities are clustered using three strategies: direct density reachability, density reachability and density connectivity. A contour density scanning strategy is proposed to determine whether data points with the same density belong to one cluster. The proposed CDSC algorithm clusters automatically, which means that trends in the fault patterns can be tracked.

Details

Data Technologies and Applications, vol. 55 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 22 February 2024

Yumeng Feng, Weisong Mu, Yue Li, Tianqi Liu and Jianying Feng

Abstract

Purpose

For a better understanding of the preferences and differences of young consumers in emerging wine markets, this study aims to propose a clustering method to segment the super-new generation wine consumers based on their sensitivity to wine brand, origin and price and then conduct user profiles for segmented consumer groups from the perspectives of demographic attributes, eating habits and wine sensory attribute preferences.

Design/methodology/approach

We first proposed a consumer clustering perspective based on sensitivity to wine brand, origin and price, and then applied an adaptive density peak and label propagation layer-by-layer (ADPLP) clustering algorithm to segment consumers, which addresses the incorrect center selection and inaccurate classification of remaining sample points found in the traditional density peak clustering (DPC) algorithm. We then built a consumer profile system for the segmented consumer groups from the perspectives of demographic attributes, eating habits and wine sensory attribute preferences.
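
The density-peak scores that DPC computes, and that ADPLP's adaptive center selection improves on, can be sketched as follows. This is a simplified illustration of standard DPC, not of ADPLP itself; the cutoff distance and toy points are assumptions.

```python
import numpy as np

def dpc_scores(X, dc):
    """Density-peak scores: rho is the local density (neighbors within the
    cutoff dc) and delta is the distance to the nearest point of strictly
    higher density (or the maximum distance, for the densest points).
    Cluster centers are the points where both rho and delta are large."""
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    rho = (d < dc).sum(axis=1) - 1
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    return rho, delta

# two tight groups plus one stray point; the two highest-scoring
# candidate centers should come from different groups
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [10.0, 0.0], [10.1, 0.0], [10.2, 0.0],
              [5.0, 5.0]])
rho, delta = dpc_scores(X, dc=0.5)
centers = np.argsort(rho * delta)[-2:]
```

Picking centers by the rho * delta product is exactly the step where DPC can go wrong on uneven densities, which motivates the adaptive selection in ADPLP.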

Findings

In this study, 10 typical public datasets and 6 baseline algorithms are used to evaluate the proposed method, and the results show that the ADPLP algorithm was the best or second-best performer on the 10 datasets, with accuracy above 0.78. The average improvement in accuracy over the base DPC algorithm is 0.184. As an outcome of the wine consumer profiles, sensitive consumers prefer wines with medium prices of 100–400 CNY and more personalized brands and origins, while casual consumers are fond of popular brands, popular origins and low prices within 50 CNY. The wine sensory attributes preferred by super-new generation consumers are red, semi-dry, semi-sweet, still, fresh-tasting, fruity, floral and low in acid.

Practical implications

Young Chinese consumers are the main driver of wine consumption in the future. This paper provides a tool for decision-makers and marketers to identify the preferences of young consumers quickly which is meaningful and helpful for wine marketing.

Originality/value

In this study, the ADPLP algorithm was introduced for the first time. Subsequently, the user profile label system was constructed for segmented consumers to highlight their characteristics and demand partiality from three aspects: demographic characteristics, consumers' eating habits and consumers' preferences for wine attributes. Moreover, the ADPLP algorithm can be considered for user profiles on other alcoholic products.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X

Book part
Publication date: 13 December 2017

Qiongwei Ye and Baojun Ma

Abstract

Internet + and Electronic Business in China is a comprehensive resource that provides insight and analysis into E-commerce in China and how it has revolutionized and continues to revolutionize business and society. Split into four distinct sections, the book first lays out the theoretical foundations and fundamental concepts of E-Business before moving on to look at internet+ innovation models and their applications in different industries such as agriculture, finance and commerce. The book then provides a comprehensive analysis of E-business platforms and their applications in China before finishing with four comprehensive case studies of major E-business projects, providing readers with successful examples of implementing E-Business entrepreneurship projects.

Details

Internet+ and Electronic Business in China: Innovation and Applications
Type: Book
ISBN: 978-1-78743-115-7

Article
Publication date: 23 August 2022

Kamlesh Kumar Pandey and Diwakar Shukla

Abstract

Purpose

The K-means (KM) clustering algorithm is extremely sensitive to the selection of initial centroids, since the initial centroids of the clusters determine computational effectiveness, efficiency and local optima issues. Numerous initialization strategies have been proposed to overcome these problems through random or deterministic selection of initial centroids. The random initialization strategy suffers from local optimization issues with the worst clustering performance, while the deterministic initialization strategy incurs a high computational cost. Big data clustering aims to reduce computation costs and improve cluster efficiency. The objective of this study is to achieve a better initial centroid for big data clustering on business management data without using random or deterministic initialization, thereby avoiding local optima and improving clustering efficiency and effectiveness in terms of cluster quality, computation cost, data comparisons and iterations on a single machine.

Design/methodology/approach

This study presents the Normal Distribution Probability Density (NDPD) based K-means (NDPDKM) algorithm for big data clustering on a single machine to solve business management-related clustering issues. The NDPDKM algorithm resolves the KM clustering problem by using the probability density of each data point. It first identifies the most probable density data points by using the mean and standard deviation of the dataset through the normal probability density. Thereafter, the NDPDKM determines the K initial centroids by using sorting and linear systematic sampling heuristics.
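
The described initialization can be sketched as follows. This is an illustrative reconstruction: the per-feature independence assumption used for the joint density, the sampling offsets and the toy data are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(4)

def ndpd_init(X, k):
    """Sketch of the described initialization: score every point by its normal
    probability density (per feature, from the dataset mean and standard
    deviation), sort the points by that score, and draw K centroids by
    linear systematic sampling over the sorted order."""
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-12
    dens = np.exp(-0.5 * ((X - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    score = dens.prod(axis=1)                 # joint density per point
    order = np.argsort(score)                 # points sorted by density
    step = len(X) // k
    return X[order[step // 2::step][:k]]      # one centroid per stratum

# toy business-like data: 30 records with 2 numeric features
X = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(8, 1, (15, 2))])
centroids = ndpd_init(X, k=3)
```

Because the centroids are drawn systematically across the density-sorted order rather than at random, repeated runs on the same data give the same initialization.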

Findings

The performance of the proposed algorithm is compared with the KM, KM++, Var-Part, Murat-KM, Mean-KM and Sort-KM algorithms through the Davies–Bouldin score, Silhouette coefficient, SD validity, S_Dbw validity, number of iterations and CPU time validation indices on eight real business datasets. The experimental evaluation demonstrates that the NDPDKM algorithm reduces iterations, local optima and computing costs, and improves cluster performance, effectiveness and efficiency with stable convergence compared to the other algorithms. The NDPDKM algorithm reduces the average computing time by up to 34.83%, 90.28%, 71.83%, 92.67%, 69.53% and 76.03%, and the average iterations by up to 40.32%, 44.06%, 32.02%, 62.78%, 19.07% and 36.74%, with reference to the KM, KM++, Var-Part, Murat-KM, Mean-KM and Sort-KM algorithms.

Originality/value

The KM algorithm is the most widely used partitional clustering approach in data mining techniques that extract hidden knowledge, patterns and trends for decision-making strategies in business data. Business analytics is one of the applications of big data clustering where KM clustering is useful for the various subcategories of business analytics such as customer segmentation analysis, employee salary and performance analysis, document searching, delivery optimization, discount and offer analysis, chaplain management, manufacturing analysis, productivity analysis, specialized employee and investor searching and other decision-making strategies in business.

Article
Publication date: 28 February 2023

Meltem Aksoy, Seda Yanık and Mehmet Fatih Amasyali

Abstract

Purpose

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals are primarily based on manual matching of similar topics, discipline areas and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.

Design/methodology/approach

This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering and classification techniques. The proposed model uses FastText, BERT and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both the clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART) and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.
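
The embed-then-compare step can be sketched with the simplest of the three representations, TF-IDF; this minimal stand-in (not the paper's FastText/BERT pipeline) shows how proposal texts become vectors whose similarity drives clustering or classification. The toy documents are invented for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Minimal TF-IDF vectorizer: one weight per vocabulary word per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append([tf[w] / len(toks) * math.log((1 + n) / (1 + df[w]))
                     for w in vocab])
    return vecs, vocab

def cosine(a, b):
    """Cosine similarity, the usual comparison for such vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-12)

# toy proposals: the first two share a topic, the third does not
docs = ["energy efficient wireless sensor routing",
        "wireless sensor network energy clustering",
        "wine consumer preference survey analysis"]
vecs, vocab = tfidf_vectors(docs)
```

Once texts are vectors, any of the clustering algorithms (k-means, k-medoids, agglomerative) or classifiers (KNN, SVM) named above can operate on them directly.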

Findings

The results show that the generated word embeddings can effectively represent proposal texts as vectors, and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.

Research limitations/implications

The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.

Originality/value

This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing research on the automatic grouping of proposals relies on traditional frequency-based word embedding methods for feature extraction to represent project proposals. Unlike previous research, this study employs two high-performing neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of our knowledge, no research has been conducted on the grouping of project proposals in Turkish.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 3
Type: Research Article
ISSN: 1756-378X
