Search results

1 – 10 of over 2000
Article
Publication date: 28 October 2014

Minchen Zhu, Weizhi Wang and Jingshan Huang

Abstract

Purpose

It is well known that the selection of initial cluster centers can significantly affect K-means clustering results. The purpose of this paper is to propose an improved, efficient methodology to handle such a challenge.

Design/methodology/approach

Because the intra-class distance among samples within the same cluster is supposed to be smaller than the inter-class distance among clusters, the algorithm dynamically adjusts initial cluster centers that are randomly selected. Consequently, the adjusted initial cluster centers are highly representative in the sense that they are distributed among as many samples as possible. As a result, local optima that are common in K-means clustering can be effectively reduced. In addition, the algorithm obtains all initial cluster centers simultaneously (instead of one center at a time) during the dynamic adjustment.
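
The sensitivity to initial centers that motivates this work can be seen in a minimal sketch. The code below is plain Lloyd-style K-means in pure Python, not the authors' adjustment algorithm; the four sample points and both sets of starting centers are made-up values chosen only for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd-style K-means. Returns (final centers, sum of squared errors)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sq_dist(p, c) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Hypothetical data: four corners of a wide rectangle, K = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, good_sse = kmeans(points, [(0, 0), (4, 0)])  # one start on each side
_, bad_sse = kmeans(points, [(2, 0), (2, 1)])   # both starts in the middle
```

With these assumed points, the first start converges to the left/right split, while the second stays stuck on a top/bottom split with a much larger sum of squared errors. That is the kind of local optimum the abstract refers to.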

Findings

Experimental results demonstrate that the proposed algorithm greatly improves the accuracy of traditional K-means clustering and does so more efficiently.

Originality/value

In this paper, the authors present an efficient algorithm that dynamically adjusts randomly selected initial cluster centers. The adjusted centers are highly representative, i.e. they are distributed among as many samples as possible. As a result, local optima that are common in K-means clustering can be effectively reduced, which improves clustering accuracy. In addition, the algorithm is cost-efficient: the enhanced clustering accuracy is obtained more efficiently than with the traditional K-means algorithm.

Details

Engineering Computations, vol. 31 no. 8
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 15 May 2017

Young Wook Seo, Kun Chang Lee and Sangjae Lee

Abstract

Purpose

For those who plan research funds and assess the research performance from the funds, it is necessary to overcome the limitations of the conventional classification of evaluated papers published by the research funds. In addition, they need to promote objective, fair clustering of papers and analysis of research performance. Therefore, the purpose of this paper is to find the optimum clustering algorithm, using the MATLAB tools, by comparing the performance of hybrid particle swarm optimization algorithms that combine the particle swarm optimization (PSO) algorithm with the conventional K-means clustering method.

Design/methodology/approach

The clustering analysis experiment for each of the three fields of study – health and medicine, physics, and chemistry – used the following three algorithms: “K-means+Simulated annealing (SA)+Adjustment of parameters+PSO” (KASA-PSO clustering), “K-means+SA+PSO” clustering, “K-means+PSO” clustering.

Findings

The clustering analyses of all the three fields showed that KASA-PSO is the best method for the minimization of fitness value. Furthermore, this study administered the surveys intended for the “performance measurement of decision-making process” with 13 members of the research fund organization to compare the group clustering by the clustering analysis method of KASA-PSO algorithm and the group clustering by research funds. The results statistically demonstrated that the group clustering by the clustering analysis method of KASA-PSO algorithm was better than the group clustering by research funds.

Practical implications

This study examined the impact of bibliometric indicators on research impact of papers. The results showed that research period, the number of authors, and the number of participating researchers had positive effects on the impact factor (IF) of the papers; the IF that indicates the qualitative level of papers had a positive effect on the primary times cited; and the primary times cited had a positive effect on the secondary times cited. Furthermore, this study clearly showed the decision quality perceived by those who are working for the research fund organization.

Originality/value

There are still too few studies that assess research project evaluation mechanisms and their effectiveness as perceived by research fund managers. To fill this research void, this study proposes a PSO-based approach and demonstrates its validity.

Article
Publication date: 19 April 2013

Barileé B. Baridam and M. Montaz Ali

Abstract

Purpose

The K‐means clustering algorithm has been intensely researched owing to its simplicity of implementation and usefulness in the clustering task. However, its performance has also been criticised, in particular for requiring the value of K before the actual clustering task. It is evident from previous research that providing the number of clusters a priori does not in any way assist in the production of good quality clusters. The authors' investigations in this paper also confirm this finding. The purpose of this paper is to investigate further the usefulness of K‐means clustering for high and multi‐dimensional data by applying it to biological sequence data.

Design/methodology/approach

The authors suggest a scheme which maps the high dimensional data into low dimensions, then show that the K‐means algorithm with pre‐processor produces good quality, compact and well‐separated clusters of the biological data mapped in low dimensions. For the purpose of clustering, a character‐to‐numeric conversion was conducted to transform the nucleic/amino acids symbols to numeric values.
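
As a rough illustration of the character‐to‐numeric conversion described above, the sketch below encodes a nucleotide string as integers and also reduces it to a fixed-length vector. The specific code values (A=1, C=2, G=3, T=4) and the base-composition reduction are assumptions made for this example; the abstract does not state the authors' actual mapping or dimension-reduction scheme.

```python
# Hypothetical symbol-to-number mapping; the paper's actual values are not given.
NUCLEOTIDE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode(sequence):
    """Convert a nucleotide string into a list of numeric codes."""
    return [NUCLEOTIDE_CODES[symbol] for symbol in sequence.upper()]


def composition(sequence):
    """Map a sequence of any length to a 4-dimensional base-frequency vector.

    This is one simple way to obtain a low-dimensional representation that a
    K-means implementation could consume; it is an assumed stand-in, not the
    pre-processor proposed in the paper.
    """
    sequence = sequence.upper()
    return [sequence.count(symbol) / len(sequence) for symbol in "ACGT"]


codes = encode("GATTACA")
vector = composition("GATTACA")
```

Sequences of different lengths all map to vectors of length four here, which is what makes a distance-based method such as K‐means applicable.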

Findings

A preprocessing technique has been suggested.

Originality/value

Conceptually this is a new paper with new results.

Article
Publication date: 17 April 2024

Charitha Sasika Hettiarachchi, Nanfei Sun, Trang Minh Quynh Le and Naveed Saleem

Abstract

Purpose

The COVID-19 pandemic has posed many challenges in almost all sectors around the globe. Because of the pandemic, government entities responsible for managing health-care resources face challenges in managing and distributing their limited and valuable health resources. In addition, severe outbreaks may occur in a small or large geographical area. Therefore, county-level preparation is crucial for officials and organizations who manage such disease outbreaks. However, most COVID-19-related research projects have focused on either state- or country-level. Only a few studies have considered county-level preparations, such as identifying high-risk counties of a particular state to fight against the COVID-19 pandemic. Therefore, the purpose of this research is to prioritize counties in a state based on their COVID-19-related risks to manage the COVID outbreak effectively.

Design/methodology/approach

In this research, the authors use a systematic hybrid approach that uses a clustering technique to group counties that share similar COVID conditions and use a multi-criteria decision-making approach – the analytic hierarchy process – to rank clusters with respect to the severity of the pandemic. The clustering was performed using two methods, k-means and fuzzy c-means, but only one of them was used at a time during the experiment.
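
The ranking step can be sketched as follows, assuming the clusters have already been formed. The code approximates analytic hierarchy process priority weights with normalised row geometric means, a common approximation to the principal-eigenvector method. The pairwise judgments in the matrix are invented for illustration and do not come from the study.

```python
import math


def ahp_priorities(pairwise):
    """Approximate AHP priority weights from a pairwise comparison matrix.

    pairwise[i][j] states how much more severe item i is judged to be than
    item j. Weights are the row geometric means, normalised to sum to 1.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments for three county clusters: cluster 0 is judged
# twice as severe as cluster 1 and four times as severe as cluster 2.
judgments = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

weights = ahp_priorities(judgments)
# Cluster indices ordered from most to least severe.
ranking = sorted(range(len(weights)), key=lambda i: -weights[i])
```

For this perfectly consistent matrix the weights come out close to 4/7, 2/7 and 1/7, so the first cluster would be prioritised.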

Findings

The results of this study indicate that the proposed approach can effectively identify and rank the most vulnerable counties in a particular state. Hence, state health resources managing entities can identify counties in desperate need of more attention before they allocate their resources and better prepare those counties before another surge.

Originality/value

To the best of the authors’ knowledge, this study is the first to use both an unsupervised learning approach and the analytic hierarchy process to identify and rank state counties in accordance with the severity of COVID-19.

Details

Journal of Systems and Information Technology, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1328-7265

Article
Publication date: 25 February 2020

Wolfram Höpken, Marcel Müller, Matthias Fuchs and Maria Lexhagen

Abstract

Purpose

The purpose of this study is to analyse the suitability of photo-sharing platforms, such as Flickr, to extract relevant knowledge on tourists’ spatial movement and point of interest (POI) visitation behaviour and compare the most prominent clustering approaches to identify POIs in various application scenarios.

Design/methodology/approach

The study, first, extracts photo metadata from Flickr, such as upload time, location and user. Then, photo uploads are assigned to latent POIs by density-based spatial clustering of applications with noise (DBSCAN) and k-means clustering algorithms. Finally, association rule analysis (FP-growth algorithm) and sequential pattern mining (generalised sequential pattern algorithm) are used to identify tourists’ behavioural patterns.
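
To make the density-based step concrete, here is a compact pure-Python sketch of DBSCAN applied to photo locations. The coordinates are made-up planar values (not real Flickr data or latitude/longitude), and the `eps` and `min_pts` settings are arbitrary assumptions. The study's own parameters are not given in the abstract.

```python
def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (0, 1, ...) or -1 for noise."""

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        px, py = points[i]
        return [
            j for j, (qx, qy) in enumerate(points)
            if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2
        ]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Hypothetical photo locations: two dense spots and one isolated upload.
photos = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(photos, eps=1.5, min_pts=3)
```

Unlike k-means, this approach does not need the number of POIs in advance, and the isolated upload is labelled as noise rather than forced into a cluster.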

Findings

The approach has been demonstrated for the city of Munich, extracting 13,545 photos for the year 2015. POIs, identified by DBSCAN and k-means clustering, could be meaningfully assigned to well-known POIs. By doing so, both techniques show specific advantages for different usage scenarios. Association rule analysis revealed strong rules (support: 1.0-4.6 per cent; lift: 1.4-32.1 per cent), and sequential pattern mining identified relevant frequent visitation sequences (support: 0.6-1.7 per cent).

Research limitations/implications

As a theoretical contribution, this study comparatively analyses the suitability of different clustering techniques for identifying POIs from photo upload data, as an input to association rule analysis and sequential pattern mining, which serve as alternative but also complementary techniques for analysing tourists’ spatial behaviour.

Practical implications

From a practical perspective, the study highlights that big data sources, such as Flickr, show the potential to effectively substitute traditional data sources for analysing tourists’ spatial behaviour and movement patterns within a destination. Especially, the approach offers the advantage of being fully automatic and executable in a real-time environment.

Originality/value

The study presents an approach to identify POIs by clustering photo uploads on social media platforms and to analyse tourists’ spatial behaviour by association rule analysis and sequential pattern mining. The study gains novel insights into the suitability of different clustering techniques to identify POIs in different application scenarios.

Article
Publication date: 1 May 2006

Derry Tanti Wijaya and Stéphane Bressan

Abstract

Querying search engines with the keyword “jaguars” returns results as diverse as web sites about cars, computer games, attack planes, American football, and animals. More and more search engines offer options to organize query results by categories or, given a document, to return a list of links to topically related documents. While information retrieval traditionally defines similarity of documents in terms of contents, it seems natural to expect that the very structure of the Web carries important information about the topical similarity of documents. Here we study the role of a matrix constructed from weighted co‐citations (documents referenced by the same document), weighted couplings (documents referencing the same document), incoming, and outgoing links for the clustering of documents on the Web. We present and discuss three methods of clustering based on this matrix construction using three clustering algorithms, K‐means, Markov and Maximum Spanning Tree, respectively. Our main contribution is a clustering technique based on the Maximum Spanning Tree technique and an evaluation of its effectiveness comparatively to the two most robust alternatives: K‐means and Markov clustering.
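
The co‐citation and coupling counts described above can be derived from a link graph as in the sketch below. It uses unweighted counts on a tiny made-up graph; the weighting scheme and the way the paper combines these counts with incoming and outgoing links are not specified in the abstract and are not reproduced here.

```python
def link_matrices(links, n):
    """Build co-citation and coupling count matrices for n documents.

    links maps a document index to the list of documents it references.
    cocitation[i][j] counts documents that reference both i and j.
    coupling[i][j] counts documents that are referenced by both i and j.
    """
    # Adjacency matrix: a[src][dst] = 1 if src links to dst.
    a = [[0] * n for _ in range(n)]
    for src, targets in links.items():
        for dst in targets:
            a[src][dst] = 1

    cocitation = [
        [sum(a[k][i] * a[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    coupling = [
        [sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    return cocitation, coupling


# Hypothetical graph: documents 0 and 1 both link to documents 2 and 3.
cocitation, coupling = link_matrices({0: [2, 3], 1: [2, 3]}, n=4)
```

In matrix terms, with adjacency matrix A, the co‐citation counts are the entries of AᵀA and the coupling counts are the entries of AAᵀ.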

Details

International Journal of Web Information Systems, vol. 2 no. 2
Type: Research Article
ISSN: 1744-0084

Book part
Publication date: 10 April 2023

Surachai Chancharat and Arisa Phadungviang

Abstract

This study groups mutual funds using k-means clustering analysis and compares the k-means clustering process with existing clustering techniques using mutual fund data for equity funds, general fixed-income funds, and balanced open-end mutual funds rated by the Association of Investment Management Companies. Data are from January 2016 to December 2020 for 60 months and includes information on prices, risks, and investment policies. The sample for this study comprises 173 funds from 10 asset management companies with the highest net assets. The tool used for analysis is the k-means technique using a statistical package set for k = 3. The funds can be divided into three groups: Group 1 has 5 mutual funds (2.89%), Group 2 has 24 mutual funds (13.87%), and Group 3 has a total of 144 mutual funds (83.24%). In Group 1, four of the five mutual funds are equity funds with a track record of beating the market, and fund managers have good market timing skills. Moreover, the efficiency of fund grouping using the k-means technique was compared with the existing grouping with close results at 57.23%. This work provides a methodology to obtain a better categorization of mutual funds by using k-means clustering, allowing the investors to know how mutual funds are. This categorization is very useful for improving the formulation of mutual funds, with the goal of further optimizing investment.
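
The group shares quoted above follow directly from the reported counts, as this small check shows. Only the group sizes and the sample total come from the abstract; the variable names are illustrative.

```python
# Group sizes as reported in the abstract (k = 3).
group_sizes = {"Group 1": 5, "Group 2": 24, "Group 3": 144}

total = sum(group_sizes.values())

# Share of each group in the whole sample, as a percentage to two decimals.
shares = {
    name: round(100 * size / total, 2)
    for name, size in group_sizes.items()
}
```

The total matches the 173 funds in the sample, and the rounded shares match the percentages given in the abstract.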

Details

Comparative Analysis of Trade and Finance in Emerging Economies
Type: Book
ISBN: 978-1-80455-758-7

Article
Publication date: 7 June 2019

Anubha Rautela, S.K. Sharma and P. Bhardwaj

Abstract

Purpose

The purpose of this paper is to reduce the distribution cost of an Indian cooperative dairy. The reduction of cost was achieved with the application of the clustering method (k-means clustering) and capacitated vehicle routing problem (cheapest link algorithm (CLA)).

Design/methodology/approach

Capacitated k-means clustering was used to split delivery locations into similar size groups (i.e. clusters) based on proximity without exceeding a specified total cluster capacity. Each cluster would be served by a local stockist. CLA was then used to find delivery routes from dairy (i.e. depot) to stockist in each cluster and from stockist to all other delivery locations within the cluster.
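
The capacity constraint can be illustrated with a simplified assignment step: each delivery location goes to the nearest cluster center that still has room. This greedy rule, the coordinates, the demands and the capacity value are all assumptions for the sketch. The abstract does not describe the authors' exact capacitated k-means procedure, and the routing stage (CLA) is not shown.

```python
def capacitated_assign(locations, demands, centers, capacity):
    """Greedily assign locations to the nearest center with spare capacity.

    Returns (assignment, load), where assignment[i] is the center index for
    location i and load[c] is the total demand placed on center c.
    """
    load = [0] * len(centers)
    assignment = [None] * len(locations)
    # Place the largest demands first so they are not squeezed out later.
    order = sorted(range(len(locations)), key=lambda i: -demands[i])
    for i in order:
        x, y = locations[i]
        by_distance = sorted(
            range(len(centers)),
            key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
        )
        for c in by_distance:
            if load[c] + demands[i] <= capacity:
                assignment[i] = c
                load[c] += demands[i]
                break
        else:
            raise ValueError("no cluster has enough remaining capacity")
    return assignment, load


# Hypothetical data: three locations near the first center, one near the second.
locations = [(0, 0), (1, 0), (2, 0), (9, 0)]
demands = [5, 5, 5, 5]
centers = [(1, 0), (9, 0)]

assignment, load = capacitated_assign(locations, demands, centers, capacity=10)
```

In this example the third location is closest to the first center, but that cluster is already full, so it is pushed to the second one. A full capacitated k-means would alternate this step with recomputing the centers.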

Findings

K-means clustering and CLA suggested optimal delivery routes on which vehicles will run. The complete algorithm was able to provide a solution within 30 s.

Practical implications

Clustering of delivery locations and use of heterogeneous fleet of delivery vehicles can result in considerable savings in daily operational cost.

Originality/value

Most of the research on the use of demand clustering to improve distribution routes has been theoretical and does not take into account real-world constraints such as vehicle-specific limitations. The authors address that gap by taking a real-world case of a cooperative dairy and comparing the result with the existing distribution routes used by the dairy. This work can be adapted by other dairies or distribution companies to their own scenarios.

Details

Journal of Advances in Management Research, vol. 16 no. 5
Type: Research Article
ISSN: 0972-7981

Article
Publication date: 21 November 2008

Chun‐Nan Lin, Chih‐Fong Tsai and Jinsheng Roan

Abstract

Purpose

Because of the popularity of digital cameras, the number of personal photographs is increasing rapidly. In general, people manage their photos by date, subject, participants, etc. for future browsing and searching. However, retrieving desired photos from a large collection is difficult and time-consuming under this general personal photo management strategy. In this paper the authors aim to propose a systematic solution for effectively organising and browsing personal photos.

Design/methodology/approach

In their system the authors apply the concept of content‐based image retrieval (CBIR) to automatically extract visual image features of personal photos. Then three well‐known clustering techniques – k‐means, self‐organising maps and fuzzy c‐means – are used to group personal photos. Finally, the clustering results are evaluated by human subjects in terms of retrieval effectiveness and efficiency.

Findings

Experimental results based on the dataset of 1,000 personal photos show that the k‐means clustering method outperforms self‐organising maps and fuzzy c‐means. That is, 12 subjects out of 30 preferred the clustering results of k‐means. In particular, most subjects agreed that larger numbers of clusters (e.g. 15 to 20) enabled more effective browsing of personal photos. For the efficiency evaluation, the clustering results using k‐means allowed subjects to search for relevant images in the least amount of time.

Originality/value

CBIR is applied in many areas, but very few related works focus on personal photo browsing and retrieval. This paper examines the applicability of using CBIR and clustering techniques for browsing personal photos. In addition, the evaluation based on the effectiveness and efficiency strategies ensures the reliability of our findings.

Details

Online Information Review, vol. 32 no. 6
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 1 March 2021

Hardi M. Mohammed, Zrar Kh. Abdul, Tarik A. Rashid, Abeer Alsadoon and Nebojsa Bacanin

Abstract

Purpose

This paper studies meta-heuristic algorithms. One of the common meta-heuristic optimization algorithms is grey wolf optimization (GWO). The key aim is to overcome the limitations of the search process of the attacking gray wolves.

Design/methodology/approach

Researchers have increasingly developed meta-heuristic algorithms and used them extensively in business, science and engineering. In this paper, the K-means clustering algorithm is used to enhance the performance of the original GWO; the new algorithm is called K-means clustering gray wolf optimization (KMGWO).

Findings

Results illustrate the efficiency of KMGWO compared with GWO. To evaluate its performance, KMGWO was applied to the CEC2019 benchmark test functions.

Originality/value

Results show that KMGWO is superior to GWO. KMGWO is also compared with cat swarm optimization (CSO), whale optimization algorithm-bat algorithm (WOA-BAT), WOA and GWO, and it achieves the first rank in terms of performance. In addition, KMGWO is used to solve a classical engineering problem, where it also performs better.

Details

World Journal of Engineering, vol. 18 no. 4
Type: Research Article
ISSN: 1708-5284
