Search results

1 – 10 of 797
Article
Publication date: 23 August 2022

Kamlesh Kumar Pandey and Diwakar Shukla

The K-means (KM) clustering algorithm is extremely sensitive to the selection of initial centroids, since the initial cluster centroids determine computational effectiveness…

Abstract

Purpose

The K-means (KM) clustering algorithm is extremely sensitive to the selection of initial centroids, since the initial cluster centroids determine computational effectiveness, efficiency and local optima issues. Numerous initialization strategies have been proposed to overcome these problems through the random or deterministic selection of initial centroids. The random initialization strategy suffers from local optimization issues and the worst clustering performance, while the deterministic initialization strategy incurs a high computational cost. Big data clustering aims to reduce computation costs and improve cluster efficiency. The objective of this study is to achieve better initial centroids for big data clustering on business management data without random or deterministic initialization, thereby avoiding local optima and improving clustering efficiency and effectiveness in terms of cluster quality, computation cost, data comparisons and iterations on a single machine.

Design/methodology/approach

This study presents the Normal Distribution Probability Density K-means (NDPDKM) algorithm for big data clustering on a single machine to solve business management-related clustering issues. The NDPDKM algorithm resolves the KM clustering problem using the probability density of each data point. It first identifies the most probable density data points using the mean and standard deviation of the dataset through the normal probability density. Thereafter, it determines the K initial centroids using sorting and linear systematic sampling heuristics.
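The initialization described above can be sketched in one dimension as follows; this is a minimal illustration, and the details beyond what the abstract states (the density ranking order and the systematic sampling step) are assumptions:

```python
import math

def ndpd_init_centroids(data, k):
    """Score each 1-D point by its normal probability density, rank the
    points by density, then pick K centroids by linear systematic
    sampling over the ranked list (one pick per equal-sized stratum)."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n) or 1.0
    # Normal probability density of each point under (mu, sigma)
    dens = [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
            for x in data]
    ranked = [x for _, x in sorted(zip(dens, data))]  # ascending density
    step = n // k
    # Systematic sampling: middle element of each of the K strata
    return [ranked[i * step + step // 2] for i in range(k)]
```

Because the picks come from different density strata, the K centroids tend to be spread across the data rather than drawn at random.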

Findings

The performance of the proposed algorithm is compared with the KM, KM++, Var-Part, Murat-KM, Mean-KM and Sort-KM algorithms through the Davies-Bouldin score, Silhouette coefficient, SD validity, S_Dbw validity, number of iterations and CPU time validation indices on eight real business datasets. The experimental evaluation demonstrates that the NDPDKM algorithm reduces iterations, local optima and computing costs, and improves cluster performance, effectiveness and efficiency with stable convergence compared to the other algorithms. The NDPDKM algorithm reduces the average computing time by up to 34.83%, 90.28%, 71.83%, 92.67%, 69.53% and 76.03%, and the average iterations by up to 40.32%, 44.06%, 32.02%, 62.78%, 19.07% and 36.74%, with reference to the KM, KM++, Var-Part, Murat-KM, Mean-KM and Sort-KM algorithms, respectively.

Originality/value

The KM algorithm is the most widely used partitional clustering approach in data mining techniques that extract hidden knowledge, patterns and trends for decision-making strategies in business data. Business analytics is one of the applications of big data clustering where KM clustering is useful for the various subcategories of business analytics such as customer segmentation analysis, employee salary and performance analysis, document searching, delivery optimization, discount and offer analysis, chaplain management, manufacturing analysis, productivity analysis, specialized employee and investor searching and other decision-making strategies in business.

Article
Publication date: 19 April 2013

Barileé B. Baridam and M. Montaz Ali

The K‐means clustering algorithm has been intensely researched owing to its simplicity of implementation and usefulness in the clustering task. However, there have also been…

Abstract

Purpose

The K‐means clustering algorithm has been intensely researched owing to its simplicity of implementation and usefulness in the clustering task. However, there have also been criticisms of its performance, in particular for demanding the value of K before the actual clustering task. It is evident from previous research that providing the number of clusters a priori does not in any way assist in the production of good-quality clusters. The authors' investigations in this paper also confirm this finding. The purpose of this paper is to investigate further the usefulness of K‐means clustering for high- and multi-dimensional data by applying it to biological sequence data.

Design/methodology/approach

The authors suggest a scheme which maps the high-dimensional data into low dimensions, then show that the K‐means algorithm with this preprocessor produces good-quality, compact and well-separated clusters of the biological data mapped into low dimensions. For the purpose of clustering, a character-to-numeric conversion was conducted to transform the nucleic/amino acid symbols into numeric values.
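A character-to-numeric conversion of this kind might look as follows; the specific numeric codes are hypothetical, since the abstract does not give the mapping the authors used:

```python
# Hypothetical character-to-numeric codes; the abstract does not give
# the actual mapping used by the authors.
NUCLEOTIDE_CODES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

def encode_sequence(seq):
    """Map a nucleotide string to a numeric vector usable by K-means."""
    return [NUCLEOTIDE_CODES[ch] for ch in seq.upper()]
```

For example, encode_sequence("GATTACA") yields [3.0, 1.0, 4.0, 4.0, 1.0, 2.0, 1.0]; a similar table over the 20 amino acid symbols would handle protein sequences.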

Findings

A preprocessing technique has been suggested.

Originality/value

Conceptually this is a new paper with new results.

Article
Publication date: 3 June 2014

Manuel Blanco Abello and Zbigniew Michalewicz

This is the first part of a two-part paper. The purpose of this paper is to report on methods that use the Response Surface Methodology (RSM) to investigate an Evolutionary…

Abstract

Purpose

This is the first part of a two-part paper. The purpose of this paper is to report on methods that use the Response Surface Methodology (RSM) to investigate an Evolutionary Algorithm (EA) and memory-based approach referred to as McBAR – the Mapping of Task IDs for Centroid-Based Adaptation with Random Immigrants. Some of the methods are useful for investigating the performance (solution-search abilities) of techniques (comprised of McBAR and other selected EA-based techniques) for solving some multi-objective dynamic resource-constrained project scheduling problems with time-varying number of tasks.

Design/methodology/approach

The RSM is applied to: determine some EA parameters of the techniques, develop models of the performance of each technique, legitimize some algorithmic components of McBAR, manifest the relative performance of McBAR over the other techniques and determine the resiliency of McBAR against changes in the environment.

Findings

The results of applying the methods are explored in the second part of this work.

Originality/value

The models are composite and characterize an EA memory-based technique. Further, the resiliency of techniques is determined by applying Lagrange optimization that involves the models.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 7 no. 2
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 9 May 2016

Chao-Lung Yang and Thi Phuong Quyen Nguyen

Class-based storage has been studied extensively and proved to be an efficient storage policy. However, little literature has addressed how to cluster stored items for class-based…


Abstract

Purpose

Class-based storage has been studied extensively and proved to be an efficient storage policy. However, little literature has addressed how to cluster stored items for class-based storage. The purpose of this paper is to develop a constrained clustering method integrated with principal component analysis (PCA) to meet the need of clustering stored items under practical storage constraints.

Design/methodology/approach

In order to consider item characteristics and the associated storage restrictions, must-link and cannot-link constraints were constructed to meet the storage requirements. The cube-per-order index (COI), which has been used for location assignment in class-based warehouses, was analyzed by PCA. The proposed constrained clustering method utilizes the principal component loadings as item sub-group features to identify the COI distribution of item sub-groups. The clustering results are then used for allocating storage using a heuristic assignment model based on COI.
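A rough sketch of the PCA step and the constraint check; the exact feature construction is not specified in the abstract, so projection scores onto the leading components are used here as a plausible stand-in, and constraint handling is shown only as a feasibility check:

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project centered item data (rows = items, columns = COI-related
    features) onto its leading principal components, yielding low-
    dimensional sub-group features."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def violates(labels, must_link, cannot_link):
    """Check a candidate cluster assignment against must-link and
    cannot-link pairs built from storage restrictions."""
    return (any(labels[i] != labels[j] for i, j in must_link)
            or any(labels[i] == labels[j] for i, j in cannot_link))
```

A constrained clustering loop would then only accept assignments for which `violates(...)` is false.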

Findings

The clustering result showed that the proposed method provides better compactness among item clusters. The simulation result also shows that the new location assignment produced by the proposed method improves retrieval efficiency by 33 percent.

Practical implications

When the number of items in a warehouse is tremendously large, manually identifying the storage constraints becomes impossible. The developed method can easily be applied to the problem regardless of the size of the data.

Originality/value

The case study demonstrated an example of a practical location assignment problem with constraints. This paper also sheds light on developing a data clustering method which can be directly applied to solving practical data analysis issues.

Details

Industrial Management & Data Systems, vol. 116 no. 4
Type: Research Article
ISSN: 0263-5577

Keywords

Article
Publication date: 15 May 2017

Young Wook Seo, Kun Chang Lee and Sangjae Lee

For those who plan research funds and assess the research performance from the funds, it is necessary to overcome the limitations of the conventional classification of evaluated…

Abstract

Purpose

For those who plan research funds and assess the research performance from the funds, it is necessary to overcome the limitations of the conventional classification of evaluated papers published by the research funds. Besides, they need to promote the objective, fair clustering of papers and the analysis of research performance. Therefore, the purpose of this paper is to find the optimum clustering algorithm using MATLAB tools by comparing the performances of hybrid particle swarm optimization algorithms that combine the particle swarm optimization (PSO) algorithm with the conventional K-means clustering method.

Design/methodology/approach

The clustering analysis experiment for each of the three fields of study – health and medicine, physics, and chemistry – used the following three algorithms: “K-means+Simulated annealing (SA)+Adjustment of parameters+PSO” (KASA-PSO) clustering, “K-means+SA+PSO” clustering and “K-means+PSO” clustering.
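The “K-means+PSO” component common to all three algorithms can be illustrated with a minimal one-dimensional sketch in which a particle swarm searches for centroid positions that minimize the clustering cost; the simulated annealing and KASA-PSO parameter-adjustment stages are omitted, and the inertia/acceleration coefficients are assumed values:

```python
import random

def clustering_fitness(centroids, data):
    """Total squared distance from each point to its nearest centroid;
    this is the quantity the swarm minimizes (lower is better)."""
    return sum(min((x - c) ** 2 for c in centroids) for x in data)

def pso_refine_centroids(data, k, swarm=10, iters=50, seed=0):
    """Each particle encodes K candidate centroid positions; standard
    PSO velocity/position updates move the swarm toward centroid sets
    with low clustering fitness."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    pos = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(swarm)]
    vel = [[0.0] * k for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=lambda p: clustering_fitness(p, data))[:]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(k):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]                       # inertia
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social
                pos[i][d] += vel[i][d]
            if clustering_fitness(pos[i], data) < clustering_fitness(pbest[i], data):
                pbest[i] = pos[i][:]
                if clustering_fitness(pbest[i], data) < clustering_fitness(gbest, data):
                    gbest = pbest[i][:]
    return sorted(gbest)
```

Seeding the swarm with K-means centroids instead of uniform random positions gives the “K-means+PSO” hybrid its head start.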

Findings

The clustering analyses of all the three fields showed that KASA-PSO is the best method for the minimization of fitness value. Furthermore, this study administered the surveys intended for the “performance measurement of decision-making process” with 13 members of the research fund organization to compare the group clustering by the clustering analysis method of KASA-PSO algorithm and the group clustering by research funds. The results statistically demonstrated that the group clustering by the clustering analysis method of KASA-PSO algorithm was better than the group clustering by research funds.

Practical implications

This study examined the impact of bibliometric indicators on research impact of papers. The results showed that research period, the number of authors, and the number of participating researchers had positive effects on the impact factor (IF) of the papers; the IF that indicates the qualitative level of papers had a positive effect on the primary times cited; and the primary times cited had a positive effect on the secondary times cited. Furthermore, this study clearly showed the decision quality perceived by those who are working for the research fund organization.

Originality/value

There are still too few studies that assess research project evaluation mechanisms and their effectiveness as perceived by research fund managers. To fill this research void, this study proposes the PSO-based approach and successfully proves its validity.

Article
Publication date: 25 February 2019

Celia Hireche and Habiba Drias

This paper is an extended version of Hireche and Drias (2018) presented at the WORLD-CIST’18 conference. The major contribution, in this work, is defined in two phases. First of…

Abstract

Purpose

This paper is an extended version of Hireche and Drias (2018) presented at the WORLD-CIST’18 conference. The major contribution of this work is defined in two phases. The first is the use of data mining technologies, and especially data preprocessing tools, on instances of hard and complex problems prior to their resolution; the authors focus on clustering the instance with the aim of reducing its complexity. The second phase is to solve the instance using the knowledge acquired in the first step and problem-solving methods. The paper aims to discuss these issues.

Design/methodology/approach

Because different clustering techniques may offer different results for a data set, prior knowledge of the data helps to determine the adequate type of clustering to apply. The first part of this work is a study of descriptive data characteristics in order to better understand the data. The dispersion and distribution of the variables in the problem instances are explored in particular to determine the most suitable clustering technique to apply.

Findings

Several experiments were performed on different kinds of instances and different kinds of data distribution. The obtained results show the importance and efficiency of applying the proposed preprocessing approaches prior to problem solving.

Practical implications

The proposed approach is developed, in this paper, on the Boolean satisfiability problem because of its well-recognised importance, with the aim of reducing complexity, which allows an easier resolution of the problem and, in particular, significant time savings.

Originality/value

The state of the art in problem solving describes plenty of algorithms and solvers for hard problems that remain a challenge because of their complexity. The originality of this work lies in the investigation of appropriate preprocessing techniques to tackle and overcome this complexity prior to resolution, which becomes easier and yields significant time savings.

Details

Data Technologies and Applications, vol. 53 no. 1
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 11 November 2014

Li Yang, Zhiping Chen and Qianhui Hu

To help investors find an investment policy with strong competitiveness, the purpose of this paper is to construct a multi-period investment decision model with practicality and…

Abstract

Purpose

To help investors find an investment policy with strong competitiveness, the purpose of this paper is to construct a multi-period investment decision model with practicality and superior performance.

Design/methodology/approach

The paper uses a suitable multi-period risk measure to construct a multi-period portfolio selection model, where target returns at intermediate periods and market frictions are taken into account simultaneously. An efficient scenario tree generation approach is proposed in order to transform the complex multi-period portfolio selection problem into a tractable one.

Findings

Numerical results show that the new scenario tree generation algorithms are stable and can further reduce the tree size. With the scenario tree generated by the new approach, the optimal investment strategy obtained under the multi-period investment decision model shows superior performance and robustness compared with the corresponding optimal strategy obtained under a single-period investment model or a multi-period model that considers only the terminal cash flow.

Research limitations/implications

The new risk measure and multi-period investment decision models can stimulate readers to find even better models and to efficiently solve realistic multi-period portfolio selection problems.

Practical implications

The empirical results show the superior performance and robustness of the optimal investment strategy obtained with the new models. More importantly, the empirical analyses show readers how different market frictions affect the performance of optimal portfolios, which can guide them in efficiently solving real multi-period investment decision problems in practice.

Originality/value

The paper first derives the concrete structure of the time consistent generalized convex multi-period risk measure, then constructs a multi-period portfolio selection model based on the new multi-period risk measure, and proposes a new extremum scenario tree generation algorithm. The authors construct a realistic multi-period investment decision model. Furthermore, using the proposed scenario tree generation algorithm, the authors transform the established stochastic investment decision model into a deterministic optimization problem, which can provide optimal investment decisions with robustness and superior performance.

Open Access
Article
Publication date: 5 September 2016

Qingyuan Wu, Changchen Zhan, Fu Lee Wang, Siyang Wang and Zeping Tang

The quick growth of web-based and mobile e-learning applications such as massive open online courses has created a large volume of online learning resources. Confronting such a…


Abstract

Purpose

The quick growth of web-based and mobile e-learning applications such as massive open online courses has created a large volume of online learning resources. Confronting such a large amount of learning data, it is important to develop effective clustering approaches for user group modeling and intelligent tutoring. The paper aims to discuss these issues.

Design/methodology/approach

In this paper, a minimum spanning tree-based approach is proposed for clustering online learning resources. The novel clustering approach has two main stages, namely an elimination stage and a construction stage. During the elimination stage, the Euclidean distance is adopted as the metric to measure the density of learning resources. Resources with very low densities are identified as outliers and removed. During the construction stage, a minimum spanning tree is built by initializing the centroids according to the degree of freedom of the resources. Online learning resources are subsequently partitioned into clusters by exploiting the structure of the minimum spanning tree.
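The two stages can be sketched as follows; the density radius, neighbour threshold and the rule of cutting the longest tree edges are assumptions, since the abstract does not specify them:

```python
import math

def mst_clusters(points, k, density_radius=1.5, min_neighbors=1):
    """Elimination stage: drop points with too few close neighbours.
    Construction stage: build a minimum spanning tree (Prim's algorithm)
    over the kept points and cut its k-1 longest edges; the resulting
    connected components are the k clusters."""
    dist = math.dist
    # Elimination: keep points with enough neighbours within the radius
    kept = [p for p in points
            if sum(1 for q in points
                   if q != p and dist(p, q) <= density_radius) >= min_neighbors]
    # Construction: grow the MST one cheapest crossing edge at a time
    in_tree, edges = {0}, []
    while len(in_tree) < len(kept):
        i, j = min(((a, b) for a in in_tree
                    for b in range(len(kept)) if b not in in_tree),
                   key=lambda e: dist(kept[e[0]], kept[e[1]]))
        in_tree.add(j)
        edges.append((dist(kept[i], kept[j]), i, j))
    edges.sort()
    # Union-find over all edges except the k-1 longest ones
    parent = list(range(len(kept)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, i, j in edges[:len(edges) - (k - 1)]:
        parent[find(i)] = find(j)
    labels = [find(i) for i in range(len(kept))]
    return kept, labels
```

Cutting the longest MST edges separates well-spaced groups without any randomly assigned centroids, which is the advantage the abstract claims over partitional methods.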

Findings

Conventional clustering algorithms have a number of shortcomings and cannot handle online learning resources effectively. On the one hand, extant partitional clustering methods use a randomly assigned centroid for each cluster, which usually causes ineffective clustering results. On the other hand, classical density-based clustering methods are computationally expensive and time-consuming. Experimental results indicate that the proposed algorithm outperforms traditional clustering algorithms on online learning resources.

Originality/value

The effectiveness of the proposed algorithms has been validated by using several data sets. Moreover, the proposed clustering algorithm has great potential in e-learning applications. It has been demonstrated how the novel technique can be integrated in various e-learning systems. For example, the clustering technique can classify learners into groups so that homogeneous grouping can improve the effectiveness of learning. Moreover, clustering of online learning resources is valuable to decision making in terms of tutorial strategies and instructional design for intelligent tutoring. Lastly, a number of directions for future research have been identified in the study.

Details

Asian Association of Open Universities Journal, vol. 11 no. 2
Type: Research Article
ISSN: 1858-3431

Keywords

Article
Publication date: 16 July 2021

Yerra Readdy Alekya Rani and Edara Sreenivasa Reddy

Wireless sensor networks (WSN) have been widely adopted for various applications due to their properties of pervasive computing. It is necessary to prolong the WSN lifetime; it…

Abstract

Purpose

Wireless sensor networks (WSNs) have been widely adopted for various applications due to their pervasive computing properties. It is necessary to prolong the WSN lifetime so that its benefits remain available for a long time. The WSN lifetime may vary according to the application and, in most cases, is taken as the time to the death of the first node in the module. Clustering has been one of the successful strategies for increasing the effectiveness of the network, as it selects an appropriate cluster head (CH) for communication. However, most clustering protocols are based on probabilistic schemes, which may create two CHs for a single cluster group, leading to more energy consumption. Hence, it is necessary to build a clustering strategy with improved properties for CH selection. The purpose of this paper is to provide better convergence for a large simulation space and to use it for optimizing the communication path of a WSN.

Design/methodology/approach

This paper develops a new clustering protocol in WSNs using fuzzy clustering and an improved meta-heuristic algorithm. The fuzzy clustering approach is adopted for clustering the nodes around respective fuzzy centroids using input constraints such as the signal-to-interference-plus-noise ratio (SINR), load and residual energy between the CHs and nodes. After cluster formation, a combined utility function is used to refine the CH selection: the node attaining the maximum combined utility function is selected as the CH. After clustering and CH formation, optimal communication between the CH and the nodes is induced by a new meta-heuristic algorithm called the Fitness-updated Crow Search Algorithm (FU-CSA). This optimal communication is accomplished using a multi-objective function with constraints on residual energy and the distance between the nodes. Finally, the simulation results show that the proposed technique enhances the network lifetime and energy efficiency when compared with state-of-the-art techniques.
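The combined-utility CH selection can be sketched as follows; the weights and the exact functional form are assumptions, since the abstract only names the constraints (energy, load, SINR, distance):

```python
def combined_utility(node, w_energy=0.4, w_sinr=0.3, w_load=0.2, w_dist=0.1):
    """Hypothetical combined utility for cluster-head selection: reward
    residual energy and SINR, penalize load and distance to the fuzzy
    centroid. Weights are illustrative, not from the paper."""
    return (w_energy * node["energy"] + w_sinr * node["sinr"]
            - w_load * node["load"] - w_dist * node["dist_to_centroid"])

def select_cluster_head(nodes):
    """The node in a cluster that maximizes the combined utility
    becomes the cluster head."""
    return max(nodes, key=combined_utility)
```

A deterministic maximum-utility rule of this kind avoids the probabilistic schemes criticized in the abstract, which can elect two CHs for one cluster.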

Findings

The proposed Fuzzy+FU-CSA algorithm achieved cost function values lower than those of Fuzzy+Particle Swarm Optimization (PSO), Fuzzy+Grey Wolf Optimizer (GWO), Fuzzy+Whale Optimization Algorithm (WOA) and Fuzzy+CSA by 48%, 60%, 40% and 25%, respectively. Thus, the results show that the proposed Fuzzy+FU-CSA performs better than the other algorithms and provides a longer network lifetime and higher energy efficiency.

Originality/value

For efficient clustering and CH selection, a combined utility function was developed using network parameters such as energy, load, SINR and distance. The fuzzy clustering uses constraint inputs such as residual energy, load and SINR for clustering the nodes of the WSN. This work developed the FU-CSA algorithm for selecting the optimal communication path for the WSN.

Details

International Journal of Pervasive Computing and Communications, vol. 19 no. 2
Type: Research Article
ISSN: 1742-7371

Keywords

Article
Publication date: 28 March 2008

Stefan Janson, Daniel Merkle and Martin Middendorf

The purpose of this paper is to present an approach for the decentralization of swarm intelligence algorithms that run on computing systems with autonomous components that are…


Abstract

Purpose

The purpose of this paper is to present an approach for the decentralization of swarm intelligence algorithms that run on computing systems with autonomous components connected by a network. The approach is applied to a particle swarm optimization (PSO) algorithm with multiple sub-swarms. PSO is a nature-inspired metaheuristic in which a swarm of particles searches for an optimum of a function. A multiple sub-swarm PSO can be used, for example, in applications where more than one optimum has to be found.

Design/methodology/approach

In the studied scenario the particles of the PSO algorithm correspond to data packets that are sent through the network of the computing system. Each data packet contains among other information the position of the corresponding particle in the search space and its sub‐swarm number. In the proposed decentralized PSO algorithm the application specific tasks, i.e. the function evaluations, are done by the autonomous components of the system. The more general tasks, like the dynamic clustering of data packets, are done by the routers of the network.

Findings

Simulation experiments show that the decentralized PSO algorithm can successfully find a set of minimum values for the test functions used. It was also shown that the PSO algorithm works well for different types of networks, such as scale-free and ring-like networks.

Originality/value

The proposed decentralization approach is interesting for the design of optimization algorithms that can run on computing systems that use principles of self‐organization and have no central control.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 1 no. 1
Type: Research Article
ISSN: 1756-378X

Keywords
