Search results

1 – 10 of 490
Article
Publication date: 6 February 2017

Zhongyi Wang, Jin Zhang and Jing Huang

Current segmentation systems almost invariably focus on linear segmentation and can only divide text into linear sequences of segments. This suits cohesive text such as news feed…

Abstract

Purpose

Current segmentation systems almost invariably focus on linear segmentation and can only divide text into linear sequences of segments. This suits cohesive text such as news feeds but not coherent texts, such as digital library documents, which have hierarchical structures. To move beyond linear segmentation and achieve hierarchical segmentation of a digital library's structured resources, this paper proposes a new multi-granularity hierarchical topic-based segmentation system (MHTSS) to decide section breaks.

Design/methodology/approach

MHTSS adopts a top-down segmentation strategy to divide a structured digital library document into a document segmentation tree. Specifically, it works in three stages: document parsing, coarse segmentation based on document access structures and fine-grained segmentation based on lexical cohesion.
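The fine-grained, lexical-cohesion stage can be illustrated with a short sketch in the spirit of TextTiling-style methods. This is not MHTSS itself, whose formulas the abstract does not give; the bag-of-words representation, window size and similarity threshold below are illustrative assumptions.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cohesion_boundaries(sentences, window=2, threshold=0.1):
    """Return indices after which a topic boundary is placed.

    A gap whose cosine similarity between the `window` sentences
    before and after it falls below `threshold` is a boundary.
    """
    bounds = []
    for gap in range(window, len(sentences) - window + 1):
        left = Counter(w for s in sentences[gap - window:gap] for w in s.lower().split())
        right = Counter(w for s in sentences[gap:gap + window] for w in s.lower().split())
        if cosine(left, right) < threshold:
            bounds.append(gap)
    return bounds
```

A boundary is proposed wherever the vocabulary on either side of a gap stops overlapping; in a system like MHTSS, such a step would run only inside the coarse segments already obtained from the document access structures.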

Findings

This paper analyzed the limitations of document segmentation methods for structured digital library resources. The authors found that document access structures and lexical cohesion techniques complement each other and together allow better segmentation of structured digital library resources. Based on this finding, the paper proposes MHTSS for structured digital library resources. To evaluate it, MHTSS was compared with the TT and C99 algorithms on real-world digital library corpora and was found to achieve the best overall performance.

Practical implications

With MHTSS, digital library users can retrieve relevant information directly as segments instead of receiving whole documents. This will improve retrieval performance and dramatically reduce information overload.

Originality/value

This paper proposes MHTSS for structured digital library resources, combining document access structures and lexical cohesion techniques to decide section breaks. With this system, end users can access a document section by section through its document structure tree.

Article
Publication date: 26 September 2019

Asma Ayari and Sadok Bouamama

The multi-robot task allocation (MRTA) problem is a challenging issue in the robotics area with plentiful practical applications. Expanding the number of tasks and robots…

Abstract

Purpose

The multi-robot task allocation (MRTA) problem is a challenging issue in robotics with many practical applications. Expanding the number of tasks and robots increases the size of the state space significantly and degrades MRTA performance. As this process requires high computational time, this paper aims to describe a technique that minimizes the size of the explored state space by partitioning the tasks into clusters. The authors address the MRTA problem by putting forward a new automatic clustering algorithm for the robots' tasks based on dynamic-distributed double-guided particle swarm optimization, namely ACD3GPSO.

Design/methodology/approach

The approach consists of two phases: phase I groups the tasks into clusters using the ACD3GPSO algorithm, and phase II allocates the robots to the clusters. Four factors are introduced in ACD3GPSO for better results. First, ACD3GPSO uses the k-means algorithm to improve the initial generation of particles. Second, distribution through a multi-agent approach reduces the run time. Third, diversification is introduced by two local-optimum detectors, LODpBest and LODgBest. Fourth, guidance is based on the concept of templates and the guidance probability Pguid.

Findings

Computational experiments were carried out to prove the effectiveness of this approach. It is compared against two state-of-the-art solutions of the MRTA and against two evolutionary methods under five different numerical simulations. The simulation results confirm that the proposed method is highly competitive in terms of the clustering time, clustering cost and MRTA time.

Practical implications

The proposed algorithm is quite useful for real-world applications, especially the scenarios involving a high number of robots and tasks.

Originality/value

Owing to the ACD3GPSO algorithm, the run time of task allocation is reduced. The proposed method can therefore be considered a viable alternative in the field of MRTA with growing numbers of both robots and tasks. In PSO, stagnation and local-optima issues are avoided by adding diversity to the population without losing its fast convergence.

Details

Assembly Automation, vol. 40 no. 2
Type: Research Article
ISSN: 0144-5154

Keywords

Article
Publication date: 23 August 2022

Kamlesh Kumar Pandey and Diwakar Shukla

The K-means (KM) clustering algorithm is extremely responsive to the selection of initial centroids since the initial centroid of clusters determines computational effectiveness…

Abstract

Purpose

The K-means (KM) clustering algorithm is extremely sensitive to the selection of initial centroids, since the initial centroids of clusters determine computational effectiveness, efficiency and local optima issues. Numerous initialization strategies have been proposed to overcome these problems through random or deterministic selection of initial centroids. The random initialization strategy suffers from local optima and the worst clustering performance, while the deterministic initialization strategy incurs high computational cost. Big data clustering aims to reduce computation cost and improve cluster efficiency. The objective of this study is to obtain better initial centroids for big data clustering on business management data, without random or deterministic initialization, thereby avoiding local optima and improving clustering efficiency and effectiveness in terms of cluster quality, computation cost, data comparisons and iterations on a single machine.

Design/methodology/approach

This study presents the Normal Distribution Probability Density-based K-means (NDPDKM) algorithm for big data clustering on a single machine to solve business management-related clustering issues. The NDPDKM algorithm resolves the KM initialization problem using the probability density of each data point. It first identifies the most probable data points using the mean and standard deviation of the dataset through the normal probability density function; thereafter, it determines the K initial centroids using sorting and linear systematic sampling heuristics.
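The described pipeline, density ranking followed by systematic sampling, can be sketched in one dimension. This is a hedged illustration of the general idea, not the authors' exact algorithm; the ascending ranking order and the mid-interval sampling offset are assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    # Probability density of x under N(mu, sigma^2).
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def ndpd_centroids(data, k):
    """Pick k initial centroids without randomness (1-D sketch).

    Rank points by their normal probability density, then take every
    (n/k)-th point from the sorted list (linear systematic sampling).
    """
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n) or 1.0
    ranked = sorted(data, key=lambda x: normal_pdf(x, mu, sigma))
    step = n // k
    # One representative per systematic-sampling interval.
    return [ranked[i * step + step // 2] for i in range(k)]
```

Because both the ranking and the sampling are deterministic, repeated runs seed KM identically, which is the property the abstract contrasts with random initialization.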

Findings

The performance of the proposed algorithm is compared with the KM, KM++, Var-Part, Murat-KM, Mean-KM and Sort-KM algorithms through the Davies-Bouldin score, Silhouette coefficient, SD validity, S_Dbw validity, number of iterations and CPU time validation indices on eight real business datasets. The experimental evaluation demonstrates that the NDPDKM algorithm reduces iterations, local optima and computing costs, and improves cluster performance, effectiveness and efficiency with stable convergence compared to the other algorithms. The NDPDKM algorithm minimizes average computing time by up to 34.83%, 90.28%, 71.83%, 92.67%, 69.53% and 76.03%, and reduces average iterations by up to 40.32%, 44.06%, 32.02%, 62.78%, 19.07% and 36.74%, with reference to the KM, KM++, Var-Part, Murat-KM, Mean-KM and Sort-KM algorithms, respectively.

Originality/value

The KM algorithm is the most widely used partitional clustering approach in data mining, extracting hidden knowledge, patterns and trends for decision-making strategies in business data. Business analytics is one application of big data clustering where KM clustering is useful across its various subcategories, such as customer segmentation analysis, employee salary and performance analysis, document searching, delivery optimization, discount and offer analysis, churn management, manufacturing analysis, productivity analysis, specialized employee and investor searching and other decision-making strategies in business.

Article
Publication date: 1 December 2003

Dimos C. Charmpis and Manolis Papadrakakis

Balancing and dual domain decomposition methods (DDMs) comprise a family of efficient high performance solution approaches for a large number of problems in computational…

Abstract

Balancing and dual domain decomposition methods (DDMs) comprise a family of efficient high performance solution approaches for a large number of problems in computational mechanics. Such DDMs are used in practice on parallel computing environments with the number of generated subdomains being generally larger than the number of available processors. This paper presents an effective heuristic technique for organizing the subdomains into subdomain clusters, in order to assign each cluster to a processor. This task is handled by the proposed approach as a graph partitioning optimization problem using the publicly available software METIS. The objective of the optimization process is to minimize the communication requirements of the DDMs under the constraint of producing balanced processor workloads. This constraint optimization procedure for treating the subdomain cluster generation task leads to increased computational efficiencies for balancing and dual DDMs.
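The load-balancing constraint of the subdomain-cluster generation task can be illustrated with a simple stand-in. The paper formulates the task as graph partitioning solved with METIS, which also minimizes communication; the greedy longest-processing-time heuristic below balances processor workloads only and is an assumption for illustration, not the paper's method.

```python
import heapq

def balance_subdomains(workloads, num_procs):
    """Assign subdomains to processors with balanced total workload.

    Greedy longest-processing-time heuristic: sort subdomains by
    descending workload and always give the next one to the currently
    lightest processor. (METIS additionally minimizes inter-cluster
    communication; this sketch handles load balance only.)
    """
    # Min-heap of (current load, processor id).
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {}
    for sub in sorted(range(len(workloads)), key=lambda s: -workloads[s]):
        load, proc = heapq.heappop(heap)
        assignment[sub] = proc
        heapq.heappush(heap, (load + workloads[sub], proc))
    return assignment
```

In the DDM setting each subdomain's workload would be estimated from its size and interface, and the resulting clusters are what each processor then factorizes locally.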

Details

Engineering Computations, vol. 20 no. 8
Type: Research Article
ISSN: 0264-4401

Keywords

Article
Publication date: 1 August 1997

Nabil N.Z. Gindy and Tsvetan M. Ratchev

The decomposition of production facilities into efficient cells is one of the areas attracting increasing research attention due to the performance benefits which cellular…


Abstract

The decomposition of production facilities into efficient cells is one of the areas attracting increasing research attention due to the performance benefits which cellular manufacturing offers. One of the problems associated with cell formation is the restrictive character of the existing task formalization models resulting in most cases from the use of single machine tool routeings in representing the component requirements. In contrast, the industrial reality provides a more complex picture with multiple choice of processing routeings in terms of available machine alternatives which need to be considered in order to achieve an “optimum” cellular decomposition of manufacturing facilities. Presents a facility decomposition approach based on multiple choice of processing alternatives for each component. The decision making is based on a generic description of the component processing routeings using unique machine capability patterns ‐ “resource elements”. Manufacturing cells are formed using concurrent fuzzy clustering methodology and a validation procedure for selection of the “optimum” facility partition.

Details

Integrated Manufacturing Systems, vol. 8 no. 4
Type: Research Article
ISSN: 0957-6061

Keywords

Article
Publication date: 4 May 2010

Ozlem Gemici Gunes and A. Sima Uyar

The purpose of this paper is to propose parallelization of a successful sequential ant‐based clustering algorithm (SABCA) to increase time performance.

Abstract

Purpose

The purpose of this paper is to propose parallelization of a successful sequential ant‐based clustering algorithm (SABCA) to increase time performance.

Design/methodology/approach

A SABCA is parallelized through the chosen parallelization library, MPI. Parallelization is performed in two stages. In the first stage, the data to be clustered are divided among processors. After the sequential ant-based approach running on each processor clusters the data assigned to it, the resulting clusters are merged in the second stage. The merging is also performed through the same ant-based technique. The experimental analysis focuses on whether the implemented parallel ant-based clustering method leads to better time performance than its fully sequential version. Since the aim of this paper is to speed up the time-consuming, but otherwise successful, ant-based clustering method, no extra steps are taken to improve the clustering solution. Tests are executed using 2 and 4 processors on selected sample datasets. Results are analyzed through commonly used cluster validity indices and parallelization performance metrics.
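The two-stage divide-then-merge structure can be sketched sequentially. This is a hedged stand-in: the paper distributes stage one over MPI processes and clusters with an ant-based method, whereas this sketch iterates over the partitions in a loop and substitutes a simple Lloyd-style pass, so only the cluster-locally-then-cluster-the-centroids pattern carries over.

```python
def cluster_partition(points, k=2, iters=10):
    """One Lloyd-style k-means run over a 1-D data partition
    (stand-in for the sequential ant-based clustering each
    processor runs in stage one)."""
    centroids = points[:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: abs(p - centroids[c]))].append(p)
        centroids = [sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids

def two_stage_cluster(partitions, k=2):
    # Stage 1: each "processor" clusters its own partition independently
    # (with MPI these calls would run concurrently on separate ranks).
    local = [cluster_partition(part, k) for part in partitions]
    # Stage 2: merge by clustering the local cluster centroids themselves.
    merged = cluster_partition([c for cs in local for c in cs], k)
    return sorted(merged)
```

Because stage two exchanges only per-partition cluster summaries rather than raw data, the communication volume stays small, which matches the low-communication-cost property the abstract emphasizes.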

Findings

As a result of the experiments, it is seen that the proposed algorithm performs better based on time measurements and parallelization performance metrics; as expected, it does not improve the clustering quality based on the cluster validity indices. Furthermore, the communication cost is very small compared to other ant-based clustering parallelization techniques proposed so far.

Research limitations/implications

The use of MPI for the parallelization step has been very effective. Also, the proposed parallelization technique is quite successful in increasing time performance; however, as a future study, improvements to clustering quality can be made in the final step where the partially clustered data are merged.

Practical implications

The results in the literature show that ant-based clustering techniques are successful; however, their high time complexity prohibits their effective use in practical applications. Through this low-communication-cost parallelization technique, this limitation may be overcome.

Originality/value

A new parallelization approach to ant-based clustering is proposed. The proposed approach does not decrease clustering performance while it increases time performance. Another major contribution of this paper is that the communication costs required for parallelization are lower than those of previously proposed parallel ant-based techniques.

Details

Kybernetes, vol. 39 no. 4
Type: Research Article
ISSN: 0368-492X

Keywords

Article
Publication date: 1 July 1980

J.A. Saunders

Examines the processes of cluster analysis and describes them using an example of benefit segmentation, and also discusses other applications suggesting new directions of research…


Abstract

Examines the processes of cluster analysis and describes them using an example of benefit segmentation, and also discusses other applications suggesting new directions of research in related fields. Bases an example study with 200 early respondents to a survey into sixth formers' choice of degree course, in which students were given 23 criteria which related to their course choice. Comparisons of likeness using Euclidean distance measures were employed. Uses also importance ratings given by three drivers to characteristics of new cars. Proposes that hierarchical clustering can be criticised when used to cluster data that is not naturally hierarchical, but other procedures have similar failings. Posits that clumping and optimisation in conjunction with hierarchical clustering offer the greater potential. Concludes that cluster analysis is a flexible tool, which provides a number of opportunities for marketing, and it is an appealing and simple idea ‐ but there are many technical questions that a researcher must ask before it is used.

Details

European Journal of Marketing, vol. 14 no. 7
Type: Research Article
ISSN: 0309-0566

Keywords

Article
Publication date: 8 August 2016

Mahsan Esmaeilzadeh, Bijan Abdollahi, Asadallah Ganjali and Akbar Hasanpoor

The purpose of this paper is to introduce an evaluation methodology for employee profiles that will provide feedback to the training decision makers. Employee profiles play a…

Abstract

Purpose

The purpose of this paper is to introduce an evaluation methodology for employee profiles that provides feedback to training decision makers. Employee profiles play a crucial role in the evaluation process to improve training process performance. This paper focuses on clustering employees, based on their profiles, into specific categories that represent the employees' characteristics. The employees are classified into the following categories: necessary training, required training and no training. The work may answer the question of how to allocate the training budget among employees. This investigation presents the use of a hybrid fuzzy optimization and clustering model (data mining approaches), combining a fuzzy imperialistic competitive algorithm (FICA) with k-means, to find the employee categories and predict their training requirements.

Design/methodology/approach

Prior research that served as an impetus for this paper is discussed. The approach is to apply a hybrid of evolutionary algorithms and clustering to improve the directions of the training decision system.

Findings

This paper focuses on how to find a good model for the evaluation of employee profiles. It introduces the use of artificial intelligence methods (fuzzy optimization (FICA) and clustering techniques (k-means)) in management. Suggestions and recommendations were constructed based on the clustering results, which represent the employee profiles and reflect their requirements during training courses. Finally, the paper demonstrates the ability of the hybrid fuzzy optimization and clustering model to predict employees' training requirements.

Originality/value

This paper evaluates employee profiles along new directions and expands the application of clustering to solving organizational challenges (in TCT for the first time).

Details

International Journal of Intelligent Computing and Cybernetics, vol. 9 no. 3
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 25 February 2019

Celia Hireche and Habiba Drias

This paper is an extended version of Hireche and Drias (2018) presented at the WORLD-CIST’18 conference. The major contribution, in this work, is defined in two phases. First of…

Abstract

Purpose

This paper is an extended version of Hireche and Drias (2018), presented at the WORLD-CIST'18 conference. The major contribution of this work comprises two phases. The first is the use of data mining technologies, especially data preprocessing tools, on instances of hard and complex problems prior to their resolution; the authors focus on clustering the instance to reduce its complexity. The second is to solve the instance using the knowledge acquired in the first phase together with problem-solving methods. The paper aims to discuss these issues.

Design/methodology/approach

Because different clustering techniques may offer different results for a data set, prior knowledge of the data helps to determine the adequate type of clustering to apply. The first part of this work is a study of descriptive data characteristics in order to better understand the data. The dispersion and distribution of the variables in the problem instances are especially explored to determine the most suitable clustering technique to apply.

Findings

Several experiments were performed on different kinds of instances and different kinds of data distribution. The results obtained show the importance and efficiency of applying the proposed appropriate preprocessing approaches prior to problem solving.

Practical implications

The proposed approach is developed in this paper on the Boolean satisfiability problem because of its well-recognised importance. The aim is complexity reduction, which allows an easier resolution of the problem and, in particular, significant time savings.

Originality/value

The state of the art in problem solving describes plenty of algorithms and solvers for hard problems that remain a challenge because of their complexity. The originality of this work lies in the investigation of appropriate preprocessing techniques to tackle and overcome this complexity prior to the resolution, which then becomes easier, with significant time savings.

Details

Data Technologies and Applications, vol. 53 no. 1
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 5 June 2017

Ravindra R. Rathod and Rahul Dev Garg

Electricity consumption around the world and in India is continuously increasing over the years. Presently, there is a huge diversity in electricity tariffs across states in…


Abstract

Purpose

Electricity consumption, around the world and in India, has been increasing continuously over the years. Presently, there is huge diversity in electricity tariffs across states in India. This paper aims to develop a new tariff design method using K-means clustering and the gap statistic.

Design/methodology/approach

The number of tariff plans is selected using the gap statistic for K-means clustering, and regression analysis is used to deduce new tariffs from existing ones. The study was carried out on nearly 27,000 residential consumers from Sangli city, Maharashtra State, India.
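Choosing the number of tariff plans with the gap statistic can be sketched as follows. This is a generic illustration of the technique, not the paper's pipeline; one-dimensional consumption values, a simple deterministic k-means and uniform reference samples are assumptions.

```python
import math
import random

def kmeans_dispersion(points, k, iters=20):
    """Within-cluster sum of squared distances after a simple
    Lloyd's k-means run (deterministic seeding: evenly spaced
    points from the sorted data)."""
    centroids = sorted(points)[:: max(1, len(points) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: (p - centroids[c]) ** 2)].append(p)
        centroids = [sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)]
    return sum((p - centroids[min(range(k), key=lambda c: (p - centroids[c]) ** 2)]) ** 2
               for p in points)

def gap_statistic(points, k, n_refs=10, seed=0):
    """Gap(k) = mean(log W_ref) - log W_data; larger is better.

    W_ref is the dispersion of uniform reference data spanning the
    same range, so a large gap means the data cluster far more
    tightly at this k than structureless data would.
    """
    rng = random.Random(seed)
    lo, hi = min(points), max(points)
    w_data = kmeans_dispersion(points, k)
    ref_logs = []
    for _ in range(n_refs):
        ref = [rng.uniform(lo, hi) for _ in points]
        ref_logs.append(math.log(kmeans_dispersion(ref, k)))
    return sum(ref_logs) / n_refs - math.log(w_data)
```

In a tariff-design setting, `points` would be per-consumer electricity usage figures and each k-means cluster would become one candidate tariff plan; the k with the best gap fixes the number of plans before regression derives the new tariffs.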

Findings

The tariff plans are proposed with two objectives: first, to allow consumers to shift from their existing plan to a lower tariff plan to save electricity and, second, to increase revenue by raising tariff charges under a pay-by-use policy.

Research limitations/implications

The study can be extended to hourly or daily data obtained through automatic meter reading, and to the introduction of time-of-use or demand-based tariffs.

Practical implications

The proposed study focuses on the use of data mining techniques for tariff planning based on consumers' electricity usage patterns. It will help detect abnormalities in consumption patterns as well as forecast electricity usage.

Social implications

Consumers will be able to decide their own monthly electricity consumption and the related tariff, leading to electricity savings, while high-consumption consumers will pay higher tariff charges for extra electricity usage.

Originality/value

To remove the disparity among tariff plans across states, the proposed method will help provide a platform for designing a uniform tariff for the entire country based on consumers' electricity consumption data.

Details

International Journal of Energy Sector Management, vol. 11 no. 2
Type: Research Article
ISSN: 1750-6220

Keywords
