Search results

1 – 10 of 148
Article
Publication date: 19 April 2013

Barileé B. Baridam and M. Montaz Ali

Abstract

Purpose

The K‐means clustering algorithm has been intensely researched owing to its simplicity of implementation and usefulness in the clustering task. However, it has also been criticized for its performance, in particular for requiring the value of K before the actual clustering task. It is evident from previous research that providing the number of clusters a priori does not in any way assist in producing good-quality clusters. The authors' investigations in this paper also confirm this finding. The purpose of this paper is to investigate further the usefulness of K‐means clustering for high- and multi‐dimensional data by applying it to biological sequence data.

Design/methodology/approach

The authors suggest a scheme that maps high-dimensional data into low dimensions, and then show that the K‐means algorithm with a pre‐processor produces good-quality, compact and well‐separated clusters of the biological data mapped in low dimensions. For clustering, a character‐to‐numeric conversion was conducted to transform the nucleic/amino acid symbols into numeric values.
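The abstract does not give the authors' encoding or their low-dimensional mapping, so the sketch below only illustrates the general pipeline under stated assumptions: a hypothetical nucleotide-to-integer map (`NUCLEOTIDE_CODE`), a stand-in mapping of each sequence to a 4-dimensional composition vector, and plain Lloyd's K‐means on the result. It is not the authors' pre-processor.

```python
import math
import random

# Hypothetical character-to-numeric map; the abstract does not give the
# encoding the authors actually used.
NUCLEOTIDE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_numeric(seq):
    """Convert a nucleotide string into a list of integer codes."""
    return [NUCLEOTIDE_CODE[ch] for ch in seq.upper()]


def composition(seq):
    """Map a sequence of any length to a 4-D vector of symbol frequencies.

    This is an assumed stand-in for the authors' low-dimensional mapping.
    """
    codes = to_numeric(seq)
    counts = [0.0] * len(NUCLEOTIDE_CODE)
    for code in codes:
        counts[code] += 1
    return [c / len(codes) for c in counts]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means with random initial centroids; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


seqs = ["AAAATTTT", "AAATTTTT", "GGGGCCCC", "GGGCCCCC"]
labels, _ = kmeans([composition(s) for s in seqs], k=2)
print(labels)
```

With these toy sequences the A/T-rich pair and the G/C-rich pair should land in separate clusters. Mapping every sequence to a fixed-length vector is what allows K‐means, which needs equal-length numeric inputs, to handle sequences of different lengths.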

Findings

A preprocessing technique for K‐means clustering of high-dimensional biological sequence data has been suggested.

Originality/value

Conceptually this is a new paper with new results.

Article
Publication date: 31 January 2020

Metin Vatansever, İbrahim Demir and Ali Hepşen

Abstract

Purpose

The main purpose of this study is to detect homogeneous housing market areas among 196 districts of 5 major cities of Turkey in terms of house sale price indices. The second purpose is to forecast these 196 house sale price indices.

Design/methodology/approach

In this paper, the authors use the monthly house sale price indices of 196 districts of 5 major cities of Turkey. The authors propose an autoregressive (AR) model-based fuzzy clustering approach to detect homogeneous housing market areas and to forecast house price indices.
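A rough sketch of the idea behind AR model-based fuzzy clustering, assuming the simplest case: each district's index series is summarized by a least-squares AR(1) coefficient, and the coefficients are grouped with fuzzy c-means (fuzzifier m = 2). The model order, the distance and the initialization are assumptions; the paper's actual specification is not given in the abstract, and the series are hypothetical.

```python
def ar1_coefficient(series):
    """Least-squares AR(1) coefficient of a demeaned series (no intercept)."""
    mean = sum(series) / len(series)
    x = [v - mean for v in series]
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def fuzzy_cmeans_1d(values, c, m=2.0, iters=200, tol=1e-10, eps=1e-12):
    """Fuzzy c-means on scalar features (c >= 2); returns (centers, memberships)."""
    distinct = sorted(set(values))
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [distinct[round(i * (len(distinct) - 1) / (c - 1))] for i in range(c)]
    memberships = []
    for _ in range(iters):
        memberships = []
        for v in values:
            d = [abs(v - ck) + eps for ck in centers]  # eps avoids division by zero
            # Standard FCM membership: u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)).
            memberships.append(
                [1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c)) for j in range(c)]
            )
        # Centers are membership-weighted means (weights u^m).
        new_centers = []
        for j in range(c):
            w = [u[j] ** m for u in memberships]
            new_centers.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Hypothetical series: two trending, two oscillating.
series = [list(range(1, 7)), list(range(2, 9)), [1, -1, 1, -1, 1, -1], [2, -2, 2, -2, 2, -2]]
phis = [ar1_coefficient(s) for s in series]
centers, u = fuzzy_cmeans_1d(phis, c=2)
hard = [max(range(2), key=lambda j: row[j]) for row in u]
print(phis, hard)
```

Series with similar dynamics receive high membership in the same cluster even when their levels differ, which is the sense in which districts "move together".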

Findings

The AR model-based fuzzy clustering approach detects three homogeneous property market areas among the 196 districts of 5 major cities of Turkey, within which house sale prices move together (i.e. show similar sales dynamics). The approach also provides better forecasting results than standard AR models, owing to higher data efficiency and lower model validation and maintenance effort.

Research limitations/implications

In this study, the authors could not use any district-based socioeconomic or consumption-behavior indicators, or any discrete geographical and property characteristics, because of data limitations.

Practical implications

The findings of this study could help property investors establish more effective property management strategies that take different geographical location conditions into account.

Social implications

For government, knowing future rises, falls and turning points of property prices in different locations makes it possible to monitor property price changes and to control the speculative activities that cause dramatic changes in the market.

Originality/value

No previous research paper has focused on neighborhood-based clusters and forecasting of house sale price indices in Turkey; this is therefore the first academic study of its kind.

Details

International Journal of Housing Markets and Analysis, vol. 13 no. 4
Type: Research Article
ISSN: 1753-8270

Article
Publication date: 13 June 2016

M. Arif Wani and Romana Riyaz

Abstract

Purpose

The most commonly used approaches for cluster validation are based on indices but the majority of the existing cluster validity indices do not work well on data sets of different complexities. The purpose of this paper is to propose a new cluster validity index (ARSD index) that works well on all types of data sets.

Design/methodology/approach

The authors introduce a new compactness measure that reflects the typical behaviour of a cluster, in which more points are located around the centre and fewer points towards its outer edge. A novel penalty function is proposed for determining the distinctness measure of clusters. A random linear search algorithm is employed to evaluate and compare the performance of five commonly known validity indices and the proposed validity index. The values of the six indices are computed for every number of clusters nc from nc_min to nc_max to obtain the optimal number of clusters present in a data set. The data sets used in the experiments include shaped, Gaussian-like and real data sets.
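The ARSD index itself is not defined in the abstract, so this sketch shows only the surrounding procedure: cluster the data for every nc from nc_min to nc_max and keep the nc with the best index value. The Davies-Bouldin index (lower is better) stands in for the validity index, and k-means with deterministic farthest-point seeding stands in for the clustering step; both substitutions, and the toy data, are assumptions.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iters=100):
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index (lower is better): mean over clusters of the
    worst ratio (S_i + S_j) / d(c_i, c_j), with S the mean member-to-centroid distance."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(math.dist(p, centroids[j]) for p in members) / len(members) if members else 0.0)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j]) for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k


def scan_number_of_clusters(points, nc_min, nc_max):
    """Score every nc in [nc_min, nc_max] (nc_min >= 2); return the best nc and all scores."""
    scores = {}
    for nc in range(nc_min, nc_max + 1):
        labels, centroids = kmeans(points, nc)
        scores[nc] = davies_bouldin(points, labels, centroids)
    return min(scores, key=scores.get), scores


blobs = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 0), (10, 1), (11, 0), (11, 1),
         (0, 10), (0, 11), (1, 10), (1, 11)]
best_nc, scores = scan_number_of_clusters(blobs, nc_min=2, nc_max=5)
print(best_nc, scores)
```

On three well-separated toy blobs the scan should settle on nc = 3; a different validity index would be plugged in by replacing `davies_bouldin`.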

Findings

Through an extensive experimental study, the proposed validity index is found to be more consistent and reliable in indicating the correct number of clusters than other validity indices. This is demonstrated experimentally on 11 data sets, on which the proposed index achieved better results.

Originality/value

The originality of this paper lies in proposing a novel cluster validity index that is used to determine the optimal number of clusters present in data sets of different complexities.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 9 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 4 May 2010

Ozlem Gemici Gunes and A. Sima Uyar

Abstract

Purpose

The purpose of this paper is to propose parallelization of a successful sequential ant‐based clustering algorithm (SABCA) to increase time performance.

Design/methodology/approach

The SABCA is parallelized using MPI, the chosen parallelization library. Parallelization is performed in two stages. In the first stage, the data to be clustered are divided among processors. After the sequential ant‐based approach running on each processor clusters the data assigned to it, the resulting clusters are merged in the second stage. The merging is also performed with the same ant‐based technique. The experimental analysis focuses on whether the implemented parallel ant‐based clustering method achieves better time performance than its fully sequential version. Since the aim of this paper is to speed up the time‐consuming, but otherwise successful, ant‐based clustering method, no extra steps are taken to improve the clustering solution. Tests are executed using 2 and 4 processors on selected sample datasets. Results are analyzed through commonly used cluster validity indices and parallelization performance metrics.
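A minimal sketch of the two-stage scheme, with two loudly labeled substitutions: threads from `concurrent.futures` stand in for MPI processes, and a simple threshold ("leader") clusterer, the hypothetical `leader_cluster`, stands in for the ant-based method in both the local stage and the merge stage. In an MPI program the chunking and collection would typically map onto a scatter and a gather (for example `comm.scatter` and `comm.gather` in mpi4py).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def leader_cluster(items, threshold):
    """Stand-in clusterer (NOT ant-based). items are (point, weight) pairs.

    Each item joins the first cluster whose centroid lies within `threshold`,
    otherwise it starts a new cluster. Returns (centroid, total_weight) pairs.
    """
    clusters = []  # each entry: [weighted_sum_vector, total_weight]
    for point, weight in items:
        for cl in clusters:
            centroid = [s / cl[1] for s in cl[0]]
            if math.dist(point, centroid) <= threshold:
                cl[0] = [s + weight * x for s, x in zip(cl[0], point)]
                cl[1] += weight
                break
        else:
            clusters.append([[weight * x for x in point], weight])
    return [([s / w for s in sums], w) for sums, w in clusters]


def two_stage_cluster(points, n_workers, threshold):
    # Stage 1: deal the data out round-robin, one chunk per worker,
    # and cluster each chunk independently.
    chunks = [points[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(lambda chunk: leader_cluster([(p, 1) for p in chunk], threshold), chunks))
    # Stage 2: merge the partial clusters with the same technique,
    # weighting each partial centroid by its cluster size.
    merged_input = [cluster for part in partial for cluster in part]
    return leader_cluster(merged_input, threshold)


pts = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)]
result = two_stage_cluster(pts, n_workers=2, threshold=2.0)
print(result)
```

Only cluster summaries (centroid and size) travel to the merge stage rather than raw points, which is one way to keep communication cost low.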

Findings

The experiments show that the proposed algorithm performs better in terms of time measurements and parallelization performance metrics; as expected, it does not improve clustering quality as measured by the cluster validity indices. Furthermore, its communication cost is very small compared with other ant‐based clustering parallelization techniques proposed so far.
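The abstract does not name its parallelization performance metrics. Speedup and efficiency are the standard ones and are shown here as an assumption, with T_1 the sequential run time and T_p the run time on p processors:

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
```

For example, hypothetical timings of T_1 = 120 s and T_4 = 40 s would give S_4 = 3 and E_4 = 0.75.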

Research limitations/implications

The use of MPI for the parallelization step has been very effective. Also, the proposed parallelization technique is quite successful in increasing time performance; however, as a future study, improvements to clustering quality can be made in the final step where the partially clustered data are merged.

Practical implications

Results in the literature show that ant‐based clustering techniques are successful; however, their high time complexity prohibits their effective use in practical applications. This low‐communication‐cost parallelization technique may overcome that limitation.

Originality/value

A new parallelization approach to ant‐based clustering is proposed. The proposed approach increases time performance without decreasing clustering performance. Another major contribution of this paper is that the communication cost required for parallelization is lower than that of previously proposed parallel ant‐based techniques.

Details

Kybernetes, vol. 39 no. 4
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 23 August 2022

Kamlesh Kumar Pandey and Diwakar Shukla

Abstract

Purpose

The K-means (KM) clustering algorithm is extremely sensitive to the selection of initial centroids, since the initial centroids determine computational effectiveness, efficiency and local optima issues. Numerous initialization strategies have been proposed to overcome these problems through random or deterministic selection of initial centroids. The random initialization strategy suffers from local optima and the worst clustering performance, while the deterministic initialization strategy incurs high computational cost. Big data clustering aims to reduce computation costs and improve cluster efficiency. The objective of this study is to obtain better initial centroids for big data clustering of business management data on a single machine, without random or deterministic initialization, so as to avoid local optima and improve clustering efficiency and effectiveness in terms of cluster quality, computation cost, data comparisons and iterations.

Design/methodology/approach

This study presents the Normal Distribution Probability Density (NDPD) based KM algorithm, NDPDKM, for big data clustering on a single machine to solve business management-related clustering issues. The NDPDKM algorithm resolves the KM clustering problem using the probability density of each data point. It first identifies the data points with the highest probability density, computing a normal probability density from the mean and standard deviation of the dataset. It then determines the K initial centroids using sorting and linear systematic sampling heuristics.
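The abstract names the ingredients of NDPDKM seeding (normal probability density from the mean and standard deviation, sorting, linear systematic sampling) but not the exact formulas. The sketch below is one hypothetical reading, not the authors' algorithm: the independence of features, the `keep` fraction of "most probable" points, and the sort key (sum of attribute values) are all assumptions.

```python
import math
import statistics


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def ndpd_initial_centroids(points, k, keep=0.9):
    """Hypothetical NDPD-style seeding (one reading of the abstract).

    1. Score each point by a normal probability density built from each
       feature's mean and standard deviation (features assumed independent).
    2. Keep the `keep` fraction of most probable points (drops outliers).
    3. Sort the kept points and take K of them by linear systematic sampling.
    """
    dims = list(zip(*points))
    mus = [statistics.fmean(d) for d in dims]
    sigmas = [statistics.pstdev(d) or 1.0 for d in dims]  # guard constant features

    def density(p):
        return math.prod(normal_pdf(x, m, s) for x, m, s in zip(p, mus, sigmas))

    n_keep = max(k, round(keep * len(points)))
    probable = sorted(points, key=density, reverse=True)[:n_keep]
    # Hypothetical sort key: the sum of a point's attribute values.
    ordered = sorted(probable, key=sum)
    step = len(ordered) / k
    # Systematic sampling: one point from the middle of each of K equal strata.
    return [ordered[int(step / 2 + i * step)] for i in range(k)]


pts = [(v,) for v in range(9)] + [(50,)]  # 50 is a deliberate outlier
print(ndpd_initial_centroids(pts, 3))
```

In this toy run the low-density outlier is discarded before sampling, and the three seeds come from evenly spaced strata of the remaining points; the chosen seeds would then start an ordinary K-means run.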

Findings

The performance of the proposed algorithm is compared with the KM, KM++, Var-Part, Murat-KM, Mean-KM and Sort-KM algorithms on eight real business datasets, using the Davies-Bouldin score, Silhouette coefficient, SD validity, S_Dbw validity, number of iterations and CPU time as validation measures. The experimental evaluation demonstrates that the NDPDKM algorithm reduces iterations, local optima and computing cost, and improves cluster performance, effectiveness and efficiency with stable convergence compared with the other algorithms. The NDPDKM algorithm reduces the average computing time by up to 34.83%, 90.28%, 71.83%, 92.67%, 69.53% and 76.03%, and the average number of iterations by up to 40.32%, 44.06%, 32.02%, 62.78%, 19.07% and 36.74%, relative to the KM, KM++, Var-Part, Murat-KM, Mean-KM and Sort-KM algorithms, respectively.

Originality/value

The KM algorithm is the most widely used partitional clustering approach in data mining techniques that extract hidden knowledge, patterns and trends for decision-making strategies in business data. Business analytics is one of the applications of big data clustering where KM clustering is useful for the various subcategories of business analytics such as customer segmentation analysis, employee salary and performance analysis, document searching, delivery optimization, discount and offer analysis, chaplain management, manufacturing analysis, productivity analysis, specialized employee and investor searching and other decision-making strategies in business.

Article
Publication date: 7 March 2023

Sida Wan and Victor Kuzmichev

Abstract

Purpose

The purpose of this paper is to present a fit prediction case study of virtual twins of a women's jacket. To this end, several basic principles of clothing fit prediction were developed and verified.

Design/methodology/approach

To develop the principles of fit prediction, the sleeve of a women's jacket was selected as the study object. The study objects were of three types: the flat patterns and two virtual sleeves, one generated on a full avatar from Clo3D and one on a dummy (arm partly removed). Through a series of subjective and objective evaluation experiments, the relationship between similar indexes of the patterns and the virtual sleeves was established, including fit criteria ranges, a categorization of the indexes by sensitivity, and linear regressions that predict several indexes of the virtual sleeves after parameterization of the pattern. The results were verified in a case study by generating virtual and real sleeves.
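The abstract mentions linear regressions that predict virtual-sleeve indexes from pattern parameters without naming the variables. The sketch below shows an ordinary least-squares fit of one such relation plus a check against a fit-criteria range; the variables, the synthetic values (chosen to lie exactly on a line) and the range limits are all hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def within_fit_range(value, lower, upper):
    """True if a predicted index falls inside a (hypothetical) fit-criteria range."""
    return lower <= value <= upper


# Hypothetical pattern parameter (x) and virtual-sleeve index (y).
pattern_param = [1, 2, 3, 4]
sleeve_index = [3, 5, 7, 9]
a, b = fit_line(pattern_param, sleeve_index)
predicted = a + b * 5  # predict the sleeve index for a new pattern value
print(a, b, predicted, within_fit_range(predicted, 10, 12))
```

The same pattern, one regression per index, lets a pattern be screened against the fit criteria before any virtual or real sleeve is produced.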

Findings

The proposed principles of clothing fit prediction are based on parallel study of the flat patterns and their virtual 3D shapes. The principles include choosing virtual twins of the human body for virtual try-on, establishing a common schedule of indexes for patterns and virtual sleeves, creating criteria and their ranges for evaluating perfect fit and poor fit, and applying existing relations between the patterns and the sleeves to predict the indexes responsible for fit.

Research limitations/implications

The authors propose and verify the validity of the principles to predict several parameters of the virtual 3D sleeve of a women's jacket that form the level of fit. The results of this study can be used for convenient fit prediction and to find the reasons for misfit.

Practical implications

This study developed basic principles for predicting clothing fit through virtual simulation and statistical analysis. Using the jacket sleeve, the related ranges, the list of more sensitive indexes and the regression equations were presented and verified, which confirmed the validity of the proposed principles.

Social implications

The results can effectively predict sleeve fit before sewing, which reduces time and material costs and the skill required of the operator.

Originality/value

The authors propose and verify the validity of the principles to predict several parameters of virtual 3D sleeve of women's jacket which form the level of fit. The result of this study can be used for convenient fit prediction and to find the misfit reasons.

Details

International Journal of Clothing Science and Technology, vol. 35 no. 3
Type: Research Article
ISSN: 0955-6222

Article
Publication date: 15 June 2021

Simon Wiersma, Tobias Just and Michael Heinrich

Abstract

Purpose

Germany has a polycentric city structure. This paper aims to reduce the complexity of this structure and to find a reliable classification scheme of German housing markets at city level based on 17 relevant market parameters.

Design/methodology/approach

This paper uses a two-step clustering algorithm combining k-means with Ward’s method to develop the classification scheme. The clustering process is preceded by a principal component analysis to retain only the most important dimensions of the market parameters. The robustness of the results is investigated with a bootstrapping method.
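A sketch of one common way to combine Ward's method with k-means, assumed here because the abstract does not spell out the combination: Ward's agglomerative clustering produces an initial partition, and its centroids seed k-means. The principal component analysis that precedes clustering in the paper is omitted for brevity, and the toy points are hypothetical.

```python
import math


def ward(points, k):
    """Agglomerative clustering with Ward's criterion down to k clusters; returns centroids."""
    clusters = [[i] for i in range(len(points))]
    cents = [list(p) for p in points]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * math.dist(cents[a], cents[b]) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        cents[a] = [(na * x + nb * y) / (na + nb) for x, y in zip(cents[a], cents[b])]
        clusters[a] += clusters[b]
        del clusters[b], cents[b]
    return cents


def kmeans_refine(points, centroids, iters=100):
    """Lloyd's k-means started from the given centroids; returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(iters):
        new = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new == labels:
            break
        labels = new
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def two_step(points, k):
    # Step 1: Ward's method yields a partition and its centroids.
    # Step 2: k-means refines that partition.
    return kmeans_refine(points, ward(points, k))


cities = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels, centroids = two_step(cities, 2)
print(labels, centroids)
```

Seeding k-means from Ward's centroids removes its dependence on random starts, while the k-means pass can still move points that the greedy hierarchical merges placed suboptimally.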

Findings

It is found that German residential markets can best be segmented into four groups. Geographic contiguity plays a specific role, but is not a main factor. Our bootstrapping analysis identifies the majority (88.5%) of pairwise city relations as non-random.

Research limitations/implications

A deeper discussion concerning the most relevant market parameters is required. The stability of the clusters is to be re-investigated in the future, as the bootstrapping analysis indicates that some clusters are more homogeneous than others.

Practical implications

The developed classification scheme provides insights into opportunities and risks associated with specific city groups. The findings of this study can be used in portfolio management to reduce unsystematic investment risks and to formulate investment strategies.

Originality/value

To the best of the authors’ knowledge, this is the first paper to offer insights into the German housing markets that applies principal component, cluster and bootstrapping analyses in a single integrated approach.

Details

International Journal of Housing Markets and Analysis, vol. 15 no. 3
Type: Research Article
ISSN: 1753-8270

Article
Publication date: 21 October 2021

Noorullah Renigunta Mohammed and Moulana Mohammed

Abstract

Purpose

In eHealth text mining, cosine-based visual methods (VM) assess clusters more accurately than Euclidean-based ones and are therefore recommended for cluster assessment of tweet data models. However, such VMs assess clusters from a single viewpoint or none, which is less informative. The purpose of this study is to use multiple viewpoints (MVP) for more informative cluster assessment of health-care tweet documents and to demonstrate visual analysis of cluster tendency.

Design/methodology/approach

In this paper, the authors propose MVP-based VMs that combine traditional topic models with visual techniques to find cluster tendency, and use partitioning for cluster validity, in order to propose health-care recommendations based on tweets. The effectiveness of the proposed methods is demonstrated on different real-time Twitter health-care data sets in an experimental study. The authors also compare the proposed models with the existing visual assessment of cluster tendency (VAT) and cVAT models using cluster validity indices and computational complexity; the examples suggest that MVP VMs are more informative.
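VAT, one of the baselines named above, reorders a dissimilarity matrix so that clusters appear as dark blocks along the diagonal. The sketch below implements the VAT ordering on cosine distances between toy term-count vectors; the multi-viewpoint distance proposed in the paper is not reproduced, and the vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def vat_order(d):
    """VAT ordering of a square dissimilarity matrix d (Prim-like traversal)."""
    n = len(d)
    # Start from a row that contains the largest dissimilarity.
    start = max(range(n), key=lambda r: max(d[r]))
    order = [start]
    remaining = sorted(set(range(n)) - {start})
    while remaining:
        # Next object: the one closest to any object already ordered.
        nxt = min(remaining, key=lambda c: min(d[r][c] for r in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def vat_image(d):
    """Reorder rows and columns; low-valued diagonal blocks suggest clusters."""
    order = vat_order(d)
    return order, [[d[i][j] for j in order] for i in order]


# Hypothetical term counts for four tweets, two topics interleaved.
docs = [[3, 1, 0], [0, 1, 3], [2, 1, 0], [0, 1, 2]]
dist = [[cosine_distance(u, v) for v in docs] for u in docs]
order, image = vat_image(dist)
print(order)
```

After reordering, the two tweets about each topic sit next to each other, so the image shows two low-distance blocks on the diagonal.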

Findings

In this paper, the authors proposed MVP-based VMs that use traditional topic models with visual techniques to find cluster tendency, and partitioning for cluster validity, to propose health-care recommendations based on tweets.

Originality/value

In this paper, the authors propose, for the first time, a multi-viewpoint distance metric for topic-model cluster tendency, together with visual representation through VAT images built from hybrid topic models, to find cluster tendency and use partitioning for cluster validity in order to propose health-care recommendations based on tweets.

Details

International Journal of Pervasive Computing and Communications, vol. 18 no. 1
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 1 July 2021

Ghizlane El bok and Abdelaziz Berrado

Abstract

Purpose

Categorizing projects allows for better alignment of a portfolio with the organizational strategy and goals. An appropriate project categorization helps in understanding the portfolio’s structure and enables proper project portfolio selection (PPS). In practice, however, project categorization is conducted intuitively. Furthermore, little attention has been given to project categorization methods in the project management literature. The purpose of this paper is to provide researchers and practitioners with a data-driven project categorization process designed for PPS.

Design/methodology/approach

The suggested process was modeled on the main characteristics of project categorization systems identified in the literature. Cluster analysis is used as the core computing technique, allowing for an empirically based categorization. This study also presents a real-world case study in the automotive industry to illustrate the proposed approach.

Findings

This study confirmed the potential of cluster analysis for consistent project categorization. The most important attributes that influenced the project grouping have been identified, including strategic and intrinsic features. The proposed approach helps increase the visibility of the portfolio’s structure and the comparability of its components.

Originality/value

There is a lack of research regarding project categorization methods, particularly for the purpose of PPS. A novel data-driven process is proposed to help mitigate the issues raised by prior researchers, including the inconsistencies, ambiguities and multiple interpretations related to taken-for-granted categories. The suggested approach is also expected to facilitate project evaluation and prioritization within appropriate categories and to contribute to PPS effectiveness.

Details

Journal of Modelling in Management, vol. 17 no. 2
Type: Research Article
ISSN: 1746-5664

Article
Publication date: 7 February 2024

Khatab Alqararah and Ibrahim Alnafrah

Abstract

Purpose

This research paper aims to contribute to the field of innovation performance benchmarking by identifying appropriate benchmarking groups and exploring learning opportunities and integration directions.

Design/methodology/approach

The study employs a multi-dimensional innovation-driven clustering methodology to analyze data from the 2019 edition of the Global Innovation Index (GII). Hierarchical and K-means Cluster Analysis techniques are applied using various sets of distance matrices to uncover and analyze distinct innovation patterns.
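The abstract does not list which distance measures were used. As an assumed illustration, the sketch standardizes each indicator to z-scores and builds Euclidean, Manhattan and Chebyshev distance matrices over the countries; each matrix could then feed a separate hierarchical clustering run. The indicator values are hypothetical.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each indicator (column) to mean 0 and population std 1."""
    cols = list(zip(*rows))
    stats = [(statistics.fmean(c), statistics.pstdev(c) or 1.0) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


METRICS = {
    "euclidean": lambda u, v: math.dist(u, v),
    "manhattan": lambda u, v: sum(abs(a - b) for a, b in zip(u, v)),
    "chebyshev": lambda u, v: max(abs(a - b) for a, b in zip(u, v)),
}


def distance_matrices(rows):
    """One country-by-country distance matrix per metric, on standardized indicators."""
    z = zscore_columns(rows)
    return {name: [[f(u, v) for v in z] for u in z] for name, f in METRICS.items()}


# Hypothetical scores of three countries on two indicators.
rows = [[0, 0], [3, 4], [6, 8]]
m = distance_matrices(rows)
print(m["euclidean"][0][2], m["manhattan"][0][2], m["chebyshev"][0][2])
```

Standardizing first keeps indicators measured on large scales from dominating the distances; the metrics then differ only in how they aggregate per-indicator gaps.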

Findings

This study classifies 129 countries into four clusters: Specials, Advanced, Intermediates and Primitives. Each cluster exhibits strengths and weaknesses in terms of innovation performance. Specials excel in the areas of institutions and knowledge commercialization, while the Advanced cluster demonstrates strengths in education and ICT-related services but shows weakness in patent commercialization. Intermediates show strengths in venture capital and labour productivity but display weaknesses in R&D expenditure and higher education quality. Primitives exhibit strength in creative activities but suffer from weaknesses in digital skills, education and training. Additionally, the study has identified 35 indicators with negligible variance contributions across countries.

Originality/value

The study contributes to identifying relevant country groupings that enhance communication, integration and learning. To this end, it highlights the structural differences in innovation among countries and provides tailored innovation policies.

Details

Journal of Entrepreneurship and Public Policy, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2045-2101
