Search results

1 – 10 of over 10000
Article
Publication date: 1 March 1984

ALAN GRIFFITHS, LESLEY A. ROBINSON and PETER WILLETT

Abstract

This paper considers the classifications produced by application of the single linkage, complete linkage, group average and Ward clustering methods to the Keen and Cranfield document test collections. Experiments were carried out to study the structure of the hierarchies produced by the different methods, the extent to which the methods distort the input similarity matrices during the generation of a classification, and the retrieval effectiveness obtainable in cluster based retrieval. The results would suggest that the single linkage method, which has been used extensively in previous work on document clustering, is not the most effective procedure of those tested, although it should be emphasized that the experiments have used only small document test collections.
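The four methods compared in this study can all be written as one agglomerative loop that differs only in how the dissimilarity between a newly merged cluster and every other cluster is updated. The Python sketch below uses the standard Lance-Williams update coefficients. It is an illustration, not the authors' code. It assumes a symmetric dissimilarity matrix (the paper works with similarity matrices, so a conversion such as 1 − s would be needed first), and the names `lance_williams` and `agglomerate` are hypothetical. For Ward's method the update is exact only when the input holds squared Euclidean distances.

```python
from itertools import combinations


def lance_williams(method, d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Dissimilarity from cluster k to the merged cluster (i ∪ j)."""
    if method == "single":      # reduces to min(d_ki, d_kj)
        return 0.5 * d_ki + 0.5 * d_kj - 0.5 * abs(d_ki - d_kj)
    if method == "complete":    # reduces to max(d_ki, d_kj)
        return 0.5 * d_ki + 0.5 * d_kj + 0.5 * abs(d_ki - d_kj)
    if method == "average":     # group average (UPGMA), size-weighted mean
        return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)
    if method == "ward":        # exact only for squared Euclidean input
        total = n_i + n_j + n_k
        return ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij) / total
    raise ValueError(f"unknown method: {method}")


def agglomerate(dist, method):
    """Return the merge history [(members_a, members_b, height), ...]."""
    n = len(dist)
    clusters = {i: frozenset([i]) for i in range(n)}  # active cluster id -> items
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    merges, next_id = [], n
    while len(clusters) > 1:
        (i, j), h = min(d.items(), key=lambda kv: kv[1])  # closest active pair
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merges.append((sorted(clusters[i]), sorted(clusters[j]), h))
        for k in clusters:
            if k in (i, j):
                continue
            d_ki = d[(min(k, i), max(k, i))]
            d_kj = d[(min(k, j), max(k, j))]
            # next_id is larger than every existing id, so the key stays ordered.
            d[(k, next_id)] = lance_williams(method, d_ki, d_kj, h, n_i, n_j, len(clusters[k]))
        # Forget every pair that still refers to the two merged clusters.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        clusters[next_id] = clusters.pop(i) | clusters.pop(j)
        next_id += 1
    return merges


if __name__ == "__main__":
    dist = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]  # points at 0, 1 and 5 on a line
    for method in ("single", "complete", "average"):
        print(method, agglomerate(dist, method))
```

For the three points in the usage example, every method first joins items 0 and 1 at height 1, but the third point joins at height 4 under single linkage, 5 under complete linkage and 4.5 under group average, so the same input yields hierarchies that distort the original dissimilarities differently.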

Details

Journal of Documentation, vol. 40 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 July 1980

J.A. Saunders

Abstract

Examines the processes of cluster analysis and describes them using an example of benefit segmentation, and also discusses other applications suggesting new directions of research in related fields. Bases an example study with 200 early respondents to a survey into sixth formers' choice of degree course, in which students were given 23 criteria which related to their course choice. Comparisons of likeness using Euclidean distance measures were employed. Uses also importance ratings given by three drivers to characteristics of new cars. Proposes that hierarchical clustering can be criticised when used to cluster data that is not naturally hierarchical, but other procedures have similar failings. Posits that clumping and optimisation in conjunction with hierarchical clustering offer the greater potential. Concludes that cluster analysis is a flexible tool, which provides a number of opportunities for marketing, and it is an appealing and simple idea ‐ but there are many technical questions that a researcher must ask before it is used.

Details

European Journal of Marketing, vol. 14 no. 7
Type: Research Article
ISSN: 0309-0566

Article
Publication date: 8 November 2018

Radhia Toujani and Jalel Akaichi

Abstract

Purpose

Event detection has become important for gathering news from social media: journalists widely use it to generate early alerts about reported stories. To incorporate social media data into a news story, journalists must manually process, compile and verify the content within a very short time span. Despite its utility and importance, this process is time-consuming and labor-intensive for media organizations. For this reason, and because social media is an essential source of data supporting professional journalists, the purpose of this paper is to propose a citizen clustering technique that allows journalists and media professionals to document news during crises.

Design/methodology/approach

In this study, the authors develop an approach for detecting news of natural hazard events and clustering groups of citizens in danger, based on three major steps. In the first step, the authors present a pipeline of several natural language processing tasks: event trigger detection, used to retrieve potential event triggers; named entity recognition, used to detect and recognize the event participants related to the extracted triggers; and, finally, a dependency analysis across all the extracted data. Analyzing the ambiguity and vagueness of similarity between news items plays a key role in event detection, an issue that traditional event detection techniques ignored. To address it, in the second step, the authors apply fuzzy set techniques to the extracted events to enhance clustering quality and remove the vagueness of the extracted information. Finally, the defined degree of citizens' danger is fed as input to the proposed citizen clustering method in order to detect communities of citizens with similar disaster degrees.
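The abstract does not specify which fuzzy set technique is applied. As an illustration only, the Python sketch below uses standard fuzzy c-means (FCM), a well-known fuzzy clustering method that is not necessarily the authors' choice, on a one-dimensional danger degree per citizen. The scores, the two starting centres and the fuzzifier m = 2 are hypothetical. Each citizen receives a membership between 0 and 1 in every cluster, with memberships summing to 1, instead of a single hard label.

```python
def fuzzy_c_means(xs, centers, m=2.0, iters=100, eps=1e-9):
    """1-D fuzzy c-means. Returns (centers, memberships[point][cluster])."""
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
        u = []
        for x in xs:
            d = [abs(x - c) + eps for c in centers]  # eps avoids division by zero
            u.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
        # Centre update: mean of the points weighted by membership ** m.
        centers = [
            sum((u[k][i] ** m) * xs[k] for k in range(len(xs)))
            / sum(u[k][i] ** m for k in range(len(xs)))
            for i in range(len(centers))
        ]
    return centers, u


# Hypothetical danger degrees in [0, 1] for six citizens.
danger = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
centers, u = fuzzy_c_means(danger, centers=[0.0, 1.0])
hard = [row.index(max(row)) for row in u]  # defuzzify: highest membership wins
print(centers, hard)
```

Keeping the soft memberships, rather than only the `hard` labels, is what lets a borderline citizen be reported as partly belonging to two danger groups.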

Findings

Empirical results indicate that homogeneous and compact citizen clusters can be detected using the proposed event detection method. Event news can also be analyzed efficiently using fuzzy set theory. In addition, the proposed visualization process plays a crucial role in data journalism, as it is used both to analyze event news and to present the detected clusters of citizens in danger.

Originality/value

The proposed citizen clustering method helps journalists and editors to better judge the veracity of social media content, navigate the overwhelming volume of content, identify eyewitnesses and contextualize the event. The empirical analysis results illustrate the efficiency of the developed method on both real and artificial networks.

Details

Online Information Review, vol. 43 no. 1
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 17 October 2008

Rui Xu and Donald C. Wunsch

Abstract

Purpose

The purpose of this paper is to provide a review of the issues related to cluster analysis, one of the most important and primitive activities of human beings, and of the advances made in recent years.

Design/methodology/approach

The paper investigates the clustering algorithms rooted in machine learning, computer science, statistics, and computational intelligence.

Findings

The paper reviews the basic issues of cluster analysis and discusses the recent advances of clustering algorithms in scalability, robustness, visualization, irregular cluster shape detection, and so on.

Originality/value

The paper presents a comprehensive and systematic survey of cluster analysis and emphasizes recent efforts to meet the challenges posed by the glut of complex data from a wide variety of communities.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 1 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 19 June 2017

Khai Tan Huynh, Tho Thanh Quan and Thang Hoai Bui

Abstract

Purpose

Service-oriented architecture is an emerging software architecture in which web services (WSs) play a crucial role. In this architecture, WS composition and verification are required when handling complex service requirements from users. When the number of WSs becomes very large in practice, the complexity of composition and verification rises correspondingly. In this paper, the authors propose a logic-based clustering approach that addresses this problem by separating the original WS repository into clusters. They also propose a so-called quality-controlled clustering approach to ensure the quality of the generated clusters within a reasonable execution time.

Design/methodology/approach

The approach represents WSs as logical formulas, on which the authors conduct the clustering task. They also combine the two most popular clustering approaches, hierarchical agglomerative clustering (HAC) and k-means, to ensure the quality of the generated clusters.
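The abstract does not say exactly how HAC and k-means are combined. One common hybrid, shown here only as an illustration, runs HAC first to obtain k initial groups and then uses their centroids to seed k-means, which reduces k-means' sensitivity to random initialization. The Python sketch below clusters numeric feature vectors rather than the paper's logical formulas, uses average linkage, and its function names and data are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def hac_centroids(points, k):
    """Average-linkage HAC until k clusters remain; returns their centroids."""
    clusters = [[p] for p in points]

    def avg_dist(a, b):
        return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i stays valid
    return [centroid(c) for c in clusters]


def kmeans(points, centers, iters=100):
    """Lloyd's k-means from the given starting centres; returns (centers, labels)."""
    centers = list(centers)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each centre to the mean of its members.
        new = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(centroid(members) if members else centers[c])
        if new == centers:
            break
        centers = new
    return centers, labels


# Hypothetical 2-D feature vectors for six services.
services = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(services, hac_centroids(services, k=2))
print(centers, labels)
```

Here HAC already recovers the two obvious groups, so k-means converges immediately. The naive HAC loop above is expensive, so on a large repository it would typically be run on a sample before seeding k-means on the full data.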

Findings

This logic-based clustering approach significantly improves the performance of WS composition and verification. Furthermore, the logic-based approach helps maintain the soundness and completeness of the composition solution. Finally, the quality-controlled strategy can ensure the quality of the generated clusters with low time complexity.

Research limitations/implications

The work discussed in this paper has so far been implemented only as a research tool, known as WSCOVER. More work is needed to make it a practical, usable system for real-life applications.

Originality/value

In this paper, the authors propose a logic-based paradigm to represent and cluster WSs. They also propose a quality-controlled clustering approach that combines and takes advantage of the two most popular clustering approaches, HAC and k-means.

Article
Publication date: 16 January 2017

Chirihane Gherbi, Zibouda Aliouat and Mohamed Benmohammed

Abstract

Purpose

In particular, this paper aims to systematically analyze a few prominent wireless sensor network (WSN) clustering routing protocols and compare these different approaches according to the taxonomy and several significant metrics.

Design/methodology/approach

In this paper, the authors summarize recent research results on data routing in sensor networks, classify the approaches into four main categories (data-centric, hierarchical, location-based and quality of service (QoS)-aware), and discuss the effect of node placement strategies on the operation and performance of WSNs.

Originality/value

Performance-controlled planned networks, where placement and routing must be intertwined and everything from delay to throughput to energy requirements is well defined and relevant, are an interesting subject of current and future research. Real-time deadline guarantees and their relationship with routing, the MAC layer, duty cycles and other protocol stack issues are also interesting issues that would benefit from further research.

Details

Sensor Review, vol. 37 no. 1
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 19 April 2022

Prosenjit Ghosh and Sabyasachi Mukherjee

Abstract

Purpose

The study aims to cluster travellers based on their social media interactions and to find segments with similar and dissimilar categories according to travellers' choices. It also aims to understand how the clusters of travellers behave when selecting destinations, and to design tour packages accordingly in order to improve tourists' satisfaction and gain viable benefits.

Design/methodology/approach

Agglomerative hierarchical clustering with Ward's minimum variance linkage and model-based clustering with parameterized finite Gaussian mixture models have been implemented to achieve these goals. A dimension reduction (DR) technique was introduced to better visualize the clustering structure obtained from the finite mixture of Gaussian densities.
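Ward's minimum variance linkage merges, at each step, the pair of clusters whose union increases the total within-cluster error sum of squares (ESS) the least. For clusters A and B with sizes n_A and n_B and centroids c_A and c_B, that increase is n_A·n_B/(n_A + n_B)·‖c_A − c_B‖². The Python sketch below computes this merge cost directly, checks it against the brute-force ESS difference, and picks the cheapest merge. The point sets and function names are hypothetical, and the Gaussian mixture part of the study is not covered.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ess(points):
    """Error sum of squares: squared distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def ward_cost(a, b):
    """Increase in ESS caused by merging clusters a and b."""
    na, nb = len(a), len(b)
    sq = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return na * nb / (na + nb) * sq


def cheapest_merge(clusters):
    """Index pair of the clusters Ward's method would merge next."""
    return min(combinations(range(len(clusters)), 2),
               key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))


a, b = [(0, 0), (0, 2)], [(4, 1)]
# The closed form agrees with the brute-force ESS difference (32/3 up to rounding).
print(ward_cost(a, b), ess(a + b) - ess(a) - ess(b))
print(cheapest_merge([a, b, [(10, 10)]]))
```

Repeating `cheapest_merge` and merging the returned pair until one cluster remains yields the full Ward hierarchy; cutting it at the desired level gives the travellers' groups.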

Findings

A total of 980 travellers were clustered into 8 interest groups according to their selection of tourism destinations across East Asia, based on individual social media feedback. Both proposed methods showed remarkable similarities, both in the optimal number of clusters selected and in the behaviour of the resulting traveller groups. The DR technique reduced the dimensionality to seven directions, of which the first two explained 95% of the total variability.

Practical implications

Tourism organizations can focus their marketing efforts on promoting the benefits most attractive to each cluster of travellers. Segmenting travellers in East Asia into homogeneous groups makes it feasible to choose similar areas in which to test different marketing techniques. Finally, new respondents or potential clients can be assigned to the segments they belong to, so that tourism organizations can design tour packages accordingly.

Originality/value

The study is original in two respects. First, it empirically reveals tourists' experience and behavioural intention in selecting tourism destinations; second, it provides quantifiable insights into the tourism phenomenon in East Asia, which help tourism organizations understand the buying behaviour of tourist segments. In addition, both the application of clustering algorithms for this purpose and the resulting findings are new to the tourism literature on tourist behaviour towards destination selection based on social media reviews.

Details

Journal of Hospitality and Tourism Insights, vol. 6 no. 2
Type: Research Article
ISSN: 2514-9792

Open Access
Article
Publication date: 22 November 2022

Kedong Yin, Yun Cao, Shiwei Zhou and Xinman Lv

Abstract

Purpose

The purposes of this research are to study the theory and method of multi-attribute index system design and establish a set of systematic, standardized, scientific index systems for the design optimization and inspection process. The research may form the basis for a rational, comprehensive evaluation and provide the most effective way of improving the quality of management decision-making. It is of practical significance to improve the rationality and reliability of the index system and provide standardized, scientific reference standards and theoretical guidance for the design and construction of the index system.

Design/methodology/approach

Using modern methods such as complex networks and machine learning, a system for the quality diagnosis of index data and the classification and stratification of index systems is designed. This guarantees the quality of the index data, realizes the scientific classification and stratification of the index system, reduces the subjectivity and randomness of the design of the index system, enhances its objectivity and rationality and lays a solid foundation for the optimal design of the index system.

Findings

Drawing on ideas from statistics, system theory, machine learning and data mining, the present research focuses on “data quality diagnosis” and “index classification and stratification”, clarifying the classification standards and data quality characteristics of index data; a data-quality diagnosis system of “data review – data cleaning – data conversion – data inspection” is established. Using a decision tree, an explanatory structural model, cluster analysis, K-means clustering and other methods, a system of methods for classifying and stratifying indicators is designed to reduce the redundancy of indicator data and improve the quality of the data used. As a result, scientific and standardized classification and hierarchical design of the index system can be achieved.
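One way cluster analysis can reduce redundancy among indicators is to group indicators whose values move together and keep one representative per group. The Python sketch below is an illustration, not the paper's procedure. It treats 1 − |r|, where r is the Pearson correlation, as the distance between two indicators and cuts a single-linkage hierarchy at a hypothetical threshold of |r| ≥ 0.9, implemented as connected components with union-find. The indicator names and values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def redundant_groups(indicators, threshold=0.9):
    """Cut a single-linkage tree on 1 - |r| at distance 1 - threshold.

    Equivalent to the connected components of the graph that links
    indicators with |r| >= threshold; found here with union-find.
    """
    names = list(indicators)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(indicators[a], indicators[b])) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical yearly series for three indicators.
indicators = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}
print(redundant_groups(indicators))  # "a" and "b" are perfectly correlated
```

Keeping one indicator from each returned group removes the near-duplicates; raising the threshold makes the grouping stricter.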

Originality/value

The innovative contributions and research value of the paper lie in three aspects. First, a method system for diagnosing index data quality is designed, and multi-source data fusion technology is adopted to ensure the quality of the multi-source, heterogeneous and mixed-frequency data of the index system. Second, a systematic quality-inspection process for missing data is designed, based on systematic thinking about both the whole and the individual. To ensure the accuracy, reliability and feasibility of the patched data, a quality-inspection method for patched data based on the idea of inversion and a unified representation of data fusion based on a tensor model are proposed. Third, modern unsupervised learning methods are used to classify and stratify the index system, which reduces the subjectivity and randomness of its design and enhances its objectivity and rationality.

Details

Marine Economics and Management, vol. 5 no. 2
Type: Research Article
ISSN: 2516-158X

Article
Publication date: 1 July 2014

Héctor Rodríguez-Déniz and Augusto Voltes-Dorta

Abstract

Purpose

When large samples are used to estimate airport efficiency, clustering is a necessary step before carrying out any benchmarking analysis. However, the existing literature has paid little attention to developing a robust methodology for airport classification, instead relying on ad hoc techniques. In order to address this issue, this paper aims to develop a new airport clustering procedure.

Design/methodology/approach

A frontier-based hierarchical clustering procedure is developed. An application to cost-efficiency benchmarking is presented using the cost function parameters available in the literature. A cross-section of worldwide airports is clustered according to the relevant outputs and input prices, with cost elasticities and factor shares serving as optimal variable weights.
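Using cost elasticities and factor shares as variable weights amounts to a weighted distance in which variables that matter more for costs count more when judging whether two airports are alike. The Python sketch below is a minimal illustration with a weighted Euclidean distance. The paper's frontier-based procedure is not reproduced; the weights, variables and airport values are hypothetical, and the variables are assumed to be standardized already.

```python
import math


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_v w_v * (x_v - y_v) ** 2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def nearest_peers(target, airports, weights):
    """Other airports ordered from most to least similar to `target`."""
    others = [name for name in airports if name != target]
    return sorted(others,
                  key=lambda n: weighted_distance(airports[target], airports[n], weights))


# Hypothetical standardized variables: (passengers, cargo, labour price).
weights = (0.6, 0.1, 0.3)  # stand-ins for cost elasticities / factor shares
airports = {
    "A": (0.0, 0.0, 0.0),
    "B": (0.0, 2.0, 0.0),  # differs only in cargo, which carries a small weight
    "C": (1.0, 0.0, 0.0),  # differs only in passengers, which carries a large weight
}
print(nearest_peers("A", airports, weights))
```

Unweighted, airport C would be A's nearest peer (distance 1 versus 2); with the weights, B becomes the nearest peer because the variable it differs on matters little for costs.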

Findings

The authors found 17 distinct airport clusters without any ad hoc input. Factors like the use of larger aircraft or the dominance of low-cost carriers are shown to improve cost performance in the airport industry.

Practical implications

The proposed method allows for a more precise identification of the efficiency benchmarks, which are characterized by a set of cophenetic distances to their “peers”. Furthermore, the resulting classification can also be used to benchmark other indicators linked to airport costs, such as aeronautical charges or service quality.
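The cophenetic distance between two items is the height of the dendrogram level at which they first fall into the same cluster. The Python sketch below computes all cophenetic distances for a single-linkage hierarchy, chosen only because it is short to write (clusters are merged in order of increasing dissimilarity). The paper's own frontier-based procedure is not reproduced, and the three-airport dissimilarity matrix is hypothetical.

```python
def single_linkage_cophenetic(dist):
    """Cophenetic distances of the single-linkage dendrogram built from `dist`."""
    n = len(dist)
    root = list(range(n))                 # item -> id of its current cluster
    members = {i: [i] for i in range(n)}  # cluster id -> items in it
    coph = [[0.0] * n for _ in range(n)]
    # Single linkage merges clusters in order of increasing pairwise dissimilarity.
    for h, i, j in sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n)):
        ri, rj = root[i], root[j]
        if ri == rj:
            continue  # already in the same cluster
        # Every pair split across the two clusters first meets at height h.
        for a in members[ri]:
            for b in members[rj]:
                coph[a][b] = coph[b][a] = h
        for b in members[rj]:
            root[b] = ri
        members[ri].extend(members.pop(rj))
    return coph


# Hypothetical dissimilarities between three airports.
dist = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
print(single_linkage_cophenetic(dist))
```

Airport 2 is 5 units from airport 0 in the input but only 4 in the hierarchy, because it joins the {0, 1} group through airport 1. Comparing input and cophenetic distances in this way is also the basis of the cophenetic correlation, a common measure of how much a hierarchy distorts its input.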

Originality/value

This paper contributes to airport clustering by providing the first discussion and application of optimal variable weighting. Regarding efficiency benchmarking, the paper aims to overcome the limitations of previous papers by defining a method that depends not on performance but on technology, and that can easily be adapted to large airport datasets.

Details

Benchmarking: An International Journal, vol. 21 no. 4
Type: Research Article
ISSN: 1463-5771

Article
Publication date: 9 May 2016

Chao-Lung Yang and Thi Phuong Quyen Nguyen

Abstract

Purpose

Class-based storage has been studied extensively and proved to be an efficient storage policy. However, few studies have addressed how to cluster stock items for class-based storage. The purpose of this paper is to develop a constrained clustering method integrated with principal component analysis (PCA) to meet the need to cluster stored items while taking practical storage constraints into account.

Design/methodology/approach

To account for item characteristics and the associated storage restrictions, must-link and cannot-link constraints were constructed to meet the storage requirements. The cube-per-order index (COI), which has been used for location assignment in class-based warehouses, was analyzed by PCA. The proposed constrained clustering method uses the principal component loadings as item sub-group features to identify the COI distribution of item sub-groups. The clustering results are then used to allocate storage with a heuristic COI-based assignment model.
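The paper builds its own constrained method on PCA loadings of COI data, which is not reproduced here. To show how must-link and cannot-link constraints can enter a clustering step, the Python sketch below follows the general scheme of COP-KMeans (Wagstaff et al.): each item is assigned to the nearest centre that violates no constraint, and the run fails if no such centre exists. Item coordinates, starting centres and constraints are hypothetical.

```python
import math


def _partner(i, pair):
    """The other item in `pair` if item i belongs to it, else None."""
    a, b = pair
    return b if a == i else a if b == i else None


def violates(i, cluster, labels, must_link, cannot_link):
    """True if putting item i in `cluster` breaks a constraint with a labelled item."""
    for pair in must_link:
        j = _partner(i, pair)
        if j is not None and j in labels and labels[j] != cluster:
            return True
    for pair in cannot_link:
        j = _partner(i, pair)
        if j is not None and labels.get(j) == cluster:
            return True
    return False


def cop_kmeans(points, centers, must_link=(), cannot_link=(), iters=100):
    """Constrained k-means; returns (labels, centers) or None if infeasible."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        labels = {}
        for i, p in enumerate(points):
            # Try centres from nearest to farthest and keep the first feasible one.
            order = sorted(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for c in order:
                if not violates(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
            else:
                return None  # no centre satisfies the constraints for item i
        new = []
        for c in range(len(centers)):
            members = [points[i] for i in labels if labels[i] == c]
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centers[c])
        if new == centers:
            break
        centers = new
    return labels, centers


items = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(cop_kmeans(items, [(0, 0), (5, 5)], cannot_link=[(0, 1)]))
```

In the usage example the cannot-link pair forces item 1 out of item 0's cluster even though the two items are geometrically close. Because items are assigned one at a time, the result can depend on the order in which they are processed.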

Findings

The clustering results showed that the proposed method provided better compactness among item clusters. The simulation results also showed that the new location assignment produced by the proposed method improved retrieval efficiency by 33 percent.

Practical implications

When the number of items in a warehouse is very large, identifying storage constraints through human intervention becomes impractical. The developed method can be applied to the problem regardless of the size of the data.

Originality/value

The case study demonstrated a practical location assignment problem with constraints. This paper also sheds light on developing data clustering methods that can be applied directly to practical data analysis problems.

Details

Industrial Management & Data Systems, vol. 116 no. 4
Type: Research Article
ISSN: 0263-5577
