Search results

1 – 10 of over 2000
Article
Publication date: 28 October 2014

Minchen Zhu, Weizhi Wang and Jingshan Huang

Abstract

Purpose

It is well known that the selection of initial cluster centers can significantly affect K-means clustering results. The purpose of this paper is to propose an improved, efficient methodology to handle such a challenge.

Design/methodology/approach

Because the intra-class distance among samples within a cluster should be smaller than the inter-class distance among clusters, the algorithm dynamically adjusts initial cluster centers that are randomly selected. The adjusted initial cluster centers are therefore highly representative, in the sense that they are distributed among as many samples as possible. As a result, the local optima that are common in K-means clustering can be effectively reduced. In addition, the algorithm obtains all initial cluster centers simultaneously (instead of one center at a time) during the dynamic adjustment.
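
The sensitivity to initial centers that motivates this approach can be seen in a minimal sketch of standard K-means (Lloyd's iterations). This is not the authors' adjustment procedure; the toy points, the `sq_dist` and `kmeans` helpers and both sets of starting centers are illustrative assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations from the given starting centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Four points at the corners of a wide rectangle (toy data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Starting centers that split the data left/right reach the better solution.
good_centers, good_sse = kmeans(points, [(0, 0.5), (4, 0.5)])

# Starting centers that split the data top/bottom are already a fixed point,
# so the iterations stop in a much worse local optimum.
bad_centers, bad_sse = kmeans(points, [(2, 0), (2, 1)])

print(good_sse, bad_sse)
```

Both runs converge immediately, yet the second one ends with a far larger SSE, which is the kind of local optimum that better initial centers are meant to avoid.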

Findings

Experimental results demonstrate that the proposed algorithm greatly improves the accuracy of traditional K-means clustering results and does so more efficiently.

Originality/value

In this paper, the authors present an efficient algorithm that dynamically adjusts initial cluster centers that are randomly selected. The adjusted centers are highly representative, i.e. they are distributed among as many samples as possible. As a result, the local optima that are common in K-means clustering can be effectively reduced, so that improved clustering accuracy can be achieved. In addition, the algorithm is cost-efficient, and the enhanced clustering accuracy is obtained more efficiently than with the traditional K-means algorithm.

Details

Engineering Computations, vol. 31 no. 8
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 15 May 2017

Young Wook Seo, Kun Chang Lee and Sangjae Lee

Abstract

Purpose

For those who plan research funds and assess the research performance from the funds, it is necessary to overcome the limitations of the conventional classification of evaluated papers published by the research funds. They also need to promote objective, fair clustering of papers and analysis of research performance. Therefore, the purpose of this paper is to find the optimum clustering algorithm using the MATLAB tools by comparing the performances of hybrid particle swarm optimization algorithms that combine the particle swarm optimization (PSO) algorithm with the conventional K-means clustering method.

Design/methodology/approach

The clustering analysis experiment for each of the three fields of study – health and medicine, physics, and chemistry – used the following three algorithms: “K-means+Simulated annealing (SA)+Adjustment of parameters+PSO” (KASA-PSO clustering), “K-means+SA+PSO” clustering, “K-means+PSO” clustering.

Findings

The clustering analyses of all three fields showed that KASA-PSO is the best method for minimizing the fitness value. Furthermore, this study administered a survey on the “performance measurement of decision-making process” to 13 members of the research fund organization in order to compare the grouping produced by the KASA-PSO clustering analysis with the grouping by research funds. The results statistically demonstrated that the grouping produced by the KASA-PSO algorithm was better than the grouping by research funds.

Practical implications

This study examined the impact of bibliometric indicators on research impact of papers. The results showed that research period, the number of authors, and the number of participating researchers had positive effects on the impact factor (IF) of the papers; the IF that indicates the qualitative level of papers had a positive effect on the primary times cited; and the primary times cited had a positive effect on the secondary times cited. Furthermore, this study clearly showed the decision quality perceived by those who are working for the research fund organization.

Originality/value

There are still too few studies that assess research project evaluation mechanisms and their effectiveness as perceived by research fund managers. To fill this research void, this study proposes a PSO-based approach and demonstrates its validity.

Article
Publication date: 19 April 2013

Barileé B. Baridam and M. Montaz Ali

Abstract

Purpose

The K‐means clustering algorithm has been intensely researched owing to its simplicity of implementation and usefulness in the clustering task. However, its performance has also been criticised, in particular for demanding the value of K before the actual clustering task. It is evident from previous research that providing the number of clusters a priori does not in any way assist in the production of good quality clusters. The authors' investigations in this paper also confirm this finding. The purpose of this paper is to investigate further the usefulness of K‐means clustering for high and multi‐dimensional data by applying it to biological sequence data.

Design/methodology/approach

The authors suggest a scheme which maps the high dimensional data into low dimensions, then show that the K‐means algorithm with pre‐processor produces good quality, compact and well‐separated clusters of the biological data mapped in low dimensions. For the purpose of clustering, a character‐to‐numeric conversion was conducted to transform the nucleic/amino acids symbols to numeric values.
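
A character‐to‐numeric conversion of this kind can be sketched as follows. The abstract does not give the actual mapping, so the `NUCLEOTIDE_CODES` table and the `encode_sequence` helper are hypothetical choices made only for illustration.

```python
# Hypothetical code table: the paper's real symbol-to-number mapping is not
# stated in the abstract, so these values are an assumption.
NUCLEOTIDE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode_sequence(seq, codes=NUCLEOTIDE_CODES):
    """Turn a nucleotide string into a list of numbers.

    Raises KeyError for any symbol that is missing from the code table.
    """
    return [codes[ch] for ch in seq.upper()]


encoded = encode_sequence("gattaca")
print(encoded)  # [3, 1, 4, 4, 1, 2, 1]
```

Once sequences are numeric vectors, they can be passed to a dimensionality-reduction step and then to K‐means.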

Findings

A preprocessing technique has been suggested.

Originality/value

Conceptually this is a new paper with new results.

Article
Publication date: 25 February 2020

Wolfram Höpken, Marcel Müller, Matthias Fuchs and Maria Lexhagen

Abstract

Purpose

The purpose of this study is to analyse the suitability of photo-sharing platforms, such as Flickr, to extract relevant knowledge on tourists’ spatial movement and point of interest (POI) visitation behaviour and compare the most prominent clustering approaches to identify POIs in various application scenarios.

Design/methodology/approach

The study, first, extracts photo metadata from Flickr, such as upload time, location and user. Then, photo uploads are assigned to latent POIs by density-based spatial clustering of applications with noise (DBSCAN) and k-means clustering algorithms. Finally, association rule analysis (FP-growth algorithm) and sequential pattern mining (generalised sequential pattern algorithm) are used to identify tourists’ behavioural patterns.
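
The density-based step can be illustrated with a minimal DBSCAN sketch. It runs on toy planar coordinates with arbitrary `eps` and `min_pts` values, all of which are assumptions; real photo locations are latitude/longitude pairs and would need a geographic distance, and the study's own parameters are not given in the abstract.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, -1 meaning noise."""
    NOISE = -1
    labels = [None] * len(points)  # None = not visited yet

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:  # core point: keep expanding
                queue.extend(nearby)
    return labels


# Toy upload locations: two dense groups and one isolated upload.
uploads = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10),
           (50, 50)]
labels = dbscan(uploads, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, this procedure needs no cluster count in advance and leaves isolated uploads unassigned as noise.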

Findings

The approach has been demonstrated for the city of Munich, extracting 13,545 photos for the year 2015. POIs, identified by DBSCAN and k-means clustering, could be meaningfully assigned to well-known POIs. By doing so, both techniques show specific advantages for different usage scenarios. Association rule analysis revealed strong rules (support: 1.0-4.6 per cent; lift: 1.4-32.1 per cent), and sequential pattern mining identified relevant frequent visitation sequences (support: 0.6-1.7 per cent).
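
For readers unfamiliar with the rule measures quoted here, the sketch below computes support and lift from a handful of invented visit sets. The `visits` data and the `poi_a`, `poi_b`, `poi_c` labels are made up for illustration and are unrelated to the Munich data; lift is computed as a plain ratio of the joint support to the product of the individual supports.

```python
from fractions import Fraction


def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    Values above 1 mean the two item sets co-occur more often than
    independence would predict. Assumes both sides have non-zero support.
    """
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / (support(transactions, antecedent)
                    * support(transactions, consequent))


# Invented example: the set of POIs visited by each of five tourists.
visits = [
    {"poi_a", "poi_b"},
    {"poi_a", "poi_b"},
    {"poi_a"},
    {"poi_c"},
    {"poi_b", "poi_c"},
]

rule_support = support(visits, {"poi_a", "poi_b"})
rule_lift = lift(visits, {"poi_a"}, {"poi_b"})
print(rule_support, rule_lift)  # 2/5 10/9
```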

Research limitations/implications

As a theoretical contribution, this study comparatively analyses the suitability of different clustering techniques to appropriately identify POIs based on photo upload data as an input to association rule analysis and sequential pattern mining, which serve as alternative but also complementary techniques for analysing tourists’ spatial behaviour.

Practical implications

From a practical perspective, the study highlights that big data sources, such as Flickr, show the potential to effectively substitute traditional data sources for analysing tourists’ spatial behaviour and movement patterns within a destination. In particular, the approach offers the advantage of being fully automatic and executable in a real-time environment.

Originality/value

The study presents an approach to identify POIs by clustering photo uploads on social media platforms and to analyse tourists’ spatial behaviour by association rule analysis and sequential pattern mining. The study gains novel insights into the suitability of different clustering techniques to identify POIs in different application scenarios.

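Abstract (translated from Chinese)

Purpose

This paper aims to analyse the suitability of the photo-sharing platform Flickr for extracting information on tourists’ spatial movement and point of interest (POI) visitation behaviour, and to compare the best-known clustering approaches for identifying POIs in different scenarios.

Design/methodology/approach

The paper first extracts photo metadata from Flickr, such as upload time, location and user. Next, it uses the DBSCAN and k-means clustering algorithms to assign uploaded photos to latent POIs. Finally, it applies association rule analysis (FP-growth algorithm) and sequential pattern mining (GSP algorithm) to identify tourists’ behavioural patterns.

Findings

Taking the city of Munich as its sample, the paper extracts 13,545 photos for the year 2015. The POIs identified by DBSCAN and k-means clustering could be assigned to well-known POIs. In doing so, the paper shows that each technique has its own advantages for different usage scenarios. Association rule analysis revealed strong rules (support: 1%-4.6%; lift: 1.4%-32.1%), and sequential pattern mining identified relevant frequent visitation sequences (support: 0.6%-1.7%).

Research limitations/implications

The theoretical contribution of this paper lies in comparatively analysing, on the basis of photo data, the suitability of different clustering techniques for identifying POIs, and in showing that association rule analysis and sequential pattern mining are techniques that each have their own strengths and complement one another in analysing tourists’ spatial behaviour.

Practical implications

The practical contribution of this paper lies in highlighting that big data sources, such as Flickr, have the potential to effectively replace traditional data sources for analysing tourists’ spatial behaviour and movement patterns within a destination. In particular, the approach offers the advantage of being automatic and executable in real time.

Originality/value

This paper presents an approach that identifies POIs by clustering photo uploads on social media and analyses tourists’ spatial behaviour through association rule analysis and sequential pattern mining. It offers novel insights into the suitability of different clustering techniques for identifying POIs in different application scenarios.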

Article
Publication date: 1 May 2006

Derry Tanti Wijaya and Stéphane Bressan

Abstract

Querying search engines with the keyword “jaguars” returns results as diverse as web sites about cars, computer games, attack planes, American football, and animals. More and more search engines offer options to organize query results by categories or, given a document, to return a list of links to topically related documents. While information retrieval traditionally defines similarity of documents in terms of contents, it seems natural to expect that the very structure of the Web carries important information about the topical similarity of documents. Here we study the role of a matrix constructed from weighted co‐citations (documents referenced by the same document), weighted couplings (documents referencing the same document), incoming, and outgoing links for the clustering of documents on the Web. We present and discuss three methods of clustering based on this matrix construction using three clustering algorithms, K‐means, Markov and Maximum Spanning Tree, respectively. Our main contribution is a clustering technique based on the Maximum Spanning Tree technique and an evaluation of its effectiveness comparatively to the two most robust alternatives: K‐means and Markov clustering.
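
The two link-based counts defined above can be sketched from a small adjacency structure. The toy `links` graph and the `link_matrices` helper are assumptions, and the sketch uses unweighted counts only; the paper's weighting scheme and the way it folds in incoming and outgoing links are not described in the abstract.

```python
def link_matrices(links, n):
    """Build co-citation and coupling count matrices for n documents.

    `links` maps a document index to the list of documents it references.
    cocitation[i][j] counts documents that reference both i and j;
    coupling[i][j] counts documents that are referenced by both i and j.
    """
    adj = [[0] * n for _ in range(n)]
    for src, targets in links.items():
        for dst in targets:
            adj[src][dst] = 1

    cocitation = [
        [sum(adj[k][i] * adj[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    coupling = [
        [sum(adj[i][k] * adj[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    return cocitation, coupling


# Toy web graph: pages 0 and 1 both link to pages 2 and 3; page 2 links to 4.
links = {0: [2, 3], 1: [2, 3], 2: [4], 3: [], 4: []}
cocitation, coupling = link_matrices(links, 5)

print(cocitation[2][3])  # 2: pages 2 and 3 are referenced together twice
print(coupling[0][1])    # 2: pages 0 and 1 share two references
```

A similarity matrix assembled from such counts is the kind of input that K‐means, Markov or spanning-tree clustering would then operate on.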

Details

International Journal of Web Information Systems, vol. 2 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 7 June 2019

Anubha Rautela, S.K. Sharma and P. Bhardwaj

Abstract

Purpose

The purpose of this paper is to reduce the distribution cost of an Indian cooperative dairy. The reduction of cost was achieved with the application of the clustering method (k-means clustering) and capacitated vehicle routing problem (cheapest link algorithm (CLA)).

Design/methodology/approach

Capacitated k-means clustering was used to split delivery locations into similar size groups (i.e. clusters) based on proximity without exceeding a specified total cluster capacity. Each cluster would be served by a local stockist. CLA was then used to find delivery routes from dairy (i.e. depot) to stockist in each cluster and from stockist to all other delivery locations within the cluster.
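
The routing step can be illustrated with the textbook cheapest-link heuristic: repeatedly take the shortest remaining link unless it would give a stop a third link or close a circuit before every stop is included. Whether the authors' CLA variant matches this exactly is an assumption, as are the `cheapest_link_tour` helper and the toy stop coordinates.

```python
import math
from itertools import combinations


def cheapest_link_tour(stops):
    """Return the links of a closed tour built by the cheapest-link rule."""
    n = len(stops)
    # All candidate links, shortest first.
    links = sorted(combinations(range(n), 2),
                   key=lambda e: math.dist(stops[e[0]], stops[e[1]]))
    degree = [0] * n
    parent = list(range(n))  # union-find structure to detect circuits

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for a, b in links:
        if degree[a] == 2 or degree[b] == 2:
            continue  # a stop may have at most two links
        ra, rb = find(a), find(b)
        if ra == rb and len(chosen) < n - 1:
            continue  # would close a circuit before all stops are joined
        parent[ra] = rb
        degree[a] += 1
        degree[b] += 1
        chosen.append((a, b))
        if len(chosen) == n:
            break
    return chosen


# Toy cluster: a stockist at (0, 0) and three delivery locations.
stops = [(0, 0), (0, 3), (4, 3), (4, 0)]
tour = cheapest_link_tour(stops)
tour_length = sum(math.dist(stops[a], stops[b]) for a, b in tour)
print(sorted(tour), tour_length)
```

The heuristic is fast but greedy, so the resulting tour is not guaranteed to be the shortest possible one.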

Findings

K-means clustering and CLA suggested optimal delivery routes on which vehicles will run. The complete algorithm was able to provide a solution within 30 s.

Practical implications

Clustering of delivery locations and use of heterogeneous fleet of delivery vehicles can result in considerable savings in daily operational cost.

Originality/value

Most of the research related to the use of demand clustering to improve distribution routes has been theoretical and does not take into account real-world limitations such as vehicle-specific constraints. The authors tried to address that gap by taking a real-world case of a cooperative dairy and comparing the result with the existing distribution routes used by the dairy. This work can be used by other dairies or distribution companies according to their scenario.

Details

Journal of Advances in Management Research, vol. 16 no. 5
Type: Research Article
ISSN: 0972-7981

Article
Publication date: 21 November 2008

Chun‐Nan Lin, Chih‐Fong Tsai and Jinsheng Roan

Abstract

Purpose

Because of the popularity of digital cameras, the number of personal photographs is increasing rapidly. In general, people manage their photos by date, subject, participants, etc. for future browsing and searching. However, it is difficult and/or time-consuming to retrieve desired photos from a large number of photographs based on the general personal photo management strategy. In this paper the authors aim to propose a systematic solution for effectively organising and browsing personal photos.

Design/methodology/approach

In their system the authors apply the concept of content‐based image retrieval (CBIR) to automatically extract visual image features of personal photos. Then three well‐known clustering techniques – k‐means, self‐organising maps and fuzzy c‐means – are used to group personal photos. Finally, the clustering results are evaluated by human subjects in terms of retrieval effectiveness and efficiency.

Findings

Experimental results based on the dataset of 1,000 personal photos show that the k‐means clustering method outperforms self‐organising maps and fuzzy c‐means. That is, 12 subjects out of 30 preferred the clustering results of k‐means. In particular, most subjects agreed that larger numbers of clusters (e.g. 15 to 20) enabled more effective browsing of personal photos. For the efficiency evaluation, the clustering results using k‐means allowed subjects to search for relevant images in the least amount of time.

Originality/value

CBIR is applied in many areas, but very few related works focus on personal photo browsing and retrieval. This paper examines the applicability of using CBIR and clustering techniques for browsing personal photos. In addition, evaluating both effectiveness and efficiency supports the reliability of the findings.

Details

Online Information Review, vol. 32 no. 6
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 1 March 2021

Hardi M. Mohammed, Zrar Kh. Abdul, Tarik A. Rashid, Abeer Alsadoon and Nebojsa Bacanin

Abstract

Purpose

This paper aims to study meta-heuristic algorithms. One of the common meta-heuristic optimization algorithms is grey wolf optimization (GWO). The key aim is to address the limitations of the searching process of the attacking grey wolves.

Design/methodology/approach

Researchers have increasingly developed meta-heuristic algorithms and use them extensively in business, science and engineering. In this paper, the K-means clustering algorithm is used to enhance the performance of the original GWO; the new algorithm is called K-means clustering gray wolf optimization (KMGWO).

Findings

Results illustrate the efficiency of KMGWO compared with GWO. To evaluate its performance, KMGWO was applied to the CEC2019 benchmark test functions.

Originality/value

Results indicate that KMGWO is superior to GWO. KMGWO is also compared with cat swarm optimization (CSO), whale optimization algorithm-bat algorithm (WOA-BAT), WOA and GWO, and it achieved the first rank in terms of performance. In addition, KMGWO is used to solve a classical engineering problem, where it is also superior.

Details

World Journal of Engineering, vol. 18 no. 4
Type: Research Article
ISSN: 1708-5284

Article
Publication date: 8 August 2016

Mahsan Esmaeilzadeh, Bijan Abdollahi, Asadallah Ganjali and Akbar Hasanpoor

Abstract

Purpose

The purpose of this paper is to introduce an evaluation methodology for employee profiles that will provide feedback to the training decision makers. Employee profiles play a crucial role in the evaluation process to improve the training process performance. This paper focuses on clustering employees, based on their profiles, into specific categories that represent the employees’ characteristics. The employees are classified into the following categories: necessary training, required training, and no training. The work may answer the question of how to spend the training budget for the employees. This investigation uses a hybrid of fuzzy optimization and clustering (data mining approaches), namely a fuzzy imperialistic competitive algorithm (FICA) and k-means, to find the employees’ categories and predict their training requirements.

Design/methodology/approach

Prior research that served as an impetus for this paper is discussed. The approach is to apply evolutionary algorithms and clustering hybrid model to improve the training decision system directions.

Findings

This paper focuses on how to find a good model for the evaluation of employee profiles. The paper introduces the use of artificial intelligence methods (fuzzy optimization (FICA) and clustering techniques (K-means)) in management. The suggestion and the recommendations were constructed based on the clustering results that represent the employee profiles and reflect their requirements during the training courses. Finally, the paper proved the ability of fuzzy optimization technique and clustering hybrid model in predicting the employee’s training requirements.

Originality/value

This paper evaluates employee profiles based on new directions and expands the implication of clustering view in solving organizational challenges (in TCT for the first time).

Details

International Journal of Intelligent Computing and Cybernetics, vol. 9 no. 3
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 19 June 2017

Khai Tan Huynh, Tho Thanh Quan and Thang Hoai Bui

Abstract

Purpose

Service-oriented architecture is an emerging software architecture in which web services (WSs) play a crucial role. In this architecture, the task of WS composition and verification is required when handling complex service requirements from users. When the number of WSs becomes very large in practice, the complexity of the composition and verification is correspondingly high. In this paper, the authors aim to propose a logic-based clustering approach to solve this problem by separating the original repository of WSs into clusters. Moreover, they also propose a so-called quality-controlled clustering approach to ensure the quality of generated clusters in a reasonable execution time.

Design/methodology/approach

The approach represents WSs as logical formulas on which the authors conduct the clustering task. They also combine the two most popular clustering approaches, hierarchical agglomerative clustering (HAC) and k-means, to ensure the quality of generated clusters.

Findings

This logic-based clustering approach significantly increases the performance of WS composition and verification. Furthermore, the logic-based approach helps to maintain the soundness and completeness of the composition solution. Finally, the quality-controlled strategy can ensure the quality of generated clusters with low time complexity.

Research limitations/implications

The work discussed in this paper is implemented only as a research tool known as WSCOVER. More work is needed to make it a practical and usable system for real-life applications.

Originality/value

In this paper, the authors propose a logic-based paradigm to represent and cluster WSs. Moreover, they also propose a quality-controlled clustering approach that combines and takes advantage of the two most popular clustering approaches, HAC and k-means.
