Search results
ALAN GRIFFITHS, LESLEY A. ROBINSON and PETER WILLETT
Abstract
This paper considers the classifications produced by application of the single linkage, complete linkage, group average and Ward clustering methods to the Keen and Cranfield document test collections. Experiments were carried out to study the structure of the hierarchies produced by the different methods, the extent to which the methods distort the input similarity matrices during the generation of a classification, and the retrieval effectiveness obtainable in cluster-based retrieval. The results suggest that the single linkage method, which has been used extensively in previous work on document clustering, is not the most effective procedure of those tested, although it should be emphasized that the experiments used only small document test collections.
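As an illustration of the comparison described in the abstract above, the following minimal sketch runs the four linkage methods on toy data and measures how closely each hierarchy preserves the input distances. It makes several assumptions that are not in the abstract: the data are synthetic 2-D points rather than documents, Euclidean distance stands in for the document similarity measure, and the cophenetic correlation is used as one possible measure of distortion. The names `compare_linkage_methods` and `toy_points` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist


def compare_linkage_methods(points):
    """Return the cophenetic correlation of the hierarchy built by each method."""
    # Condensed vector of pairwise Euclidean distances (assumed dissimilarity).
    distances = pdist(points, metric="euclidean")
    scores = {}
    for method in ("single", "complete", "average", "ward"):
        tree = linkage(distances, method=method)
        # A value near 1 means the tree distorts the input distances very little.
        correlation, _ = cophenet(tree, distances)
        scores[method] = float(correlation)
    return scores


# Toy data: two well-separated groups of ten points each (not document data).
rng = np.random.default_rng(0)
toy_points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(10, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(10, 2)),
])
linkage_scores = compare_linkage_methods(toy_points)
for name, score in linkage_scores.items():
    print(f"{name:>8}: cophenetic correlation = {score:.3f}")
```

On real document collections the ranking of the methods may differ from what this toy data shows, so the sketch should be read only as a template for such a comparison.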
Nur Syazwin Mansor, Norhaiza Ahmad and Arien Heryansyah
Abstract
This study compares the performance of two types of clustering methods, time-based and non-time-based clustering, in the identification of river discharge patterns at the Johor River basin during the northeast monsoon season. Time-based clustering is represented by the dynamic time warping (DTW) dissimilarity measure, whereas non-time-based clustering is represented by the Euclidean dissimilarity measure in analysing the Johor River discharge data. In addition, we combine each of these clustering methods with a frequency domain representation of the discharge data using the Discrete Fourier Transform (DFT) to see whether such a transformation affects the clustering results. The clustering quality of the hierarchical data structures of the identified river discharge patterns is measured for each method by the Cophenetic Correlation Coefficient (CPCC). The results from the time-based clustering using DTW based on the DFT transformation show a higher CPCC value than that of the non-time-based clustering methods.
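The pipeline described in the abstract above (a dissimilarity measure, hierarchical clustering, then the CPCC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the series are synthetic sine waves rather than discharge data, average linkage is an arbitrary choice, the DFT representation is taken to be the magnitude spectrum, and the helper names `dtw_distance`, `euclidean_distance`, `dft_magnitudes` and `cpcc` are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet


def dtw_distance(a, b):
    """Classic dynamic time warping cost with absolute-difference steps."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance between two equal-length series."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))


def dft_magnitudes(series):
    """Frequency-domain representation: magnitudes of the real-input DFT."""
    return np.abs(np.fft.rfft(series))


def cpcc(series, distance):
    """Cophenetic correlation of an average-linkage tree built from `distance`."""
    pairs = combinations(range(len(series)), 2)  # same order as a condensed matrix
    condensed = np.array([distance(series[i], series[j]) for i, j in pairs])
    tree = linkage(condensed, method="average")
    coefficient, _ = cophenet(tree, condensed)
    return float(coefficient)


# Synthetic series (assumed data): sine waves with different shifts and scales.
t = np.linspace(0.0, 2.0 * np.pi, 40)
settings = [(0.0, 1.0), (0.3, 1.0), (0.6, 1.2), (2.0, 3.0), (2.3, 3.2)]
toy_series = [scale * np.sin(t + shift) for shift, scale in settings]

cpcc_euclid = cpcc(toy_series, euclidean_distance)
cpcc_dtw = cpcc(toy_series, dtw_distance)
cpcc_dtw_dft = cpcc([dft_magnitudes(s) for s in toy_series], dtw_distance)
print(cpcc_euclid, cpcc_dtw, cpcc_dtw_dft)
```

Which variant scores highest on this toy data says nothing about the river discharge results; the sketch only shows how the three quantities can be computed on a common footing.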
Abstract
Examines the processes of cluster analysis and describes them using an example of benefit segmentation, and also discusses other applications suggesting new directions of research in related fields. Bases an example study on 200 early respondents to a survey into sixth formers' choice of degree course, in which students were given 23 criteria related to their course choice. Comparisons of likeness using Euclidean distance measures were employed. Also uses importance ratings given by three drivers to characteristics of new cars. Proposes that hierarchical clustering can be criticised when used to cluster data that are not naturally hierarchical, but that other procedures have similar failings. Posits that clumping and optimisation in conjunction with hierarchical clustering offer the greater potential. Concludes that cluster analysis is a flexible tool that provides a number of opportunities for marketing, and that it is an appealing and simple idea, but there are many technical questions that a researcher must ask before it is used.
Sunil Kumar Jauhar, B. Ripon Chakma, Sachin S. Kamble and Amine Belhadi
Abstract
Purpose
As e-commerce has expanded rapidly, online shopping platforms have become widespread in India and throughout the world. Product return, which has a negative effect on the e-commerce industry's economic and ecological sustainability, is one of the industry's greatest challenges in light of the substantial increase in online transactions. The authors analyzed customers' purchasing patterns to better understand their product purchase and return patterns.
Design/methodology/approach
The authors used recency, frequency and monetary (RFM) models based on digital transformation techniques to better understand and segment potential customers, in order to devise personalized strategies to increase sales. The authors also performed seller clustering using k-means and hierarchical clustering to determine why some sellers have the most sales and what products they offer that entice customers to purchase.
Findings
The authors discovered, through the application of digital transformation models to customer segmentation, that over 61.15% of consumers are loyal customers who are likely to purchase and use the firm's services, whereas approximately 35% of customers have either stopped purchasing or have relatively low spending. To retain these consumer segments, special consideration and an enticing offer are required. Digging deeper into the seller clustering, the authors found that the maximum number of clusters is six, and that certain clusters indicate that prompt delivery of the goods plays a crucial role in customer feedback and high sales volume.
Originality/value
This is one of the few studies that develop a seller segmentation strategy using digital transformation-based methods in order to achieve seller group division.
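The recency, frequency and monetary values that underlie the customer segmentation described in the abstract above can be computed with a few lines of code. The sketch below is an assumption-laden illustration rather than the authors' model: the transaction tuples, the reference date and the function name `rfm_table` are hypothetical, and no scoring or clustering step is included.

```python
from datetime import date


def rfm_table(transactions, today):
    """Map each customer to (recency in days, frequency, monetary total).

    `transactions` is an iterable of (customer_id, purchase_date, amount).
    """
    table = {}
    for customer, day, amount in transactions:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days_since = (today - day).days
        # Recency is the number of days since the most recent purchase.
        recency = days_since if recency is None else min(recency, days_since)
        table[customer] = (recency, frequency + 1, monetary + amount)
    return table


# Hypothetical transactions, for illustration only.
toy_transactions = [
    ("alice", date(2024, 1, 10), 40.0),
    ("bob", date(2024, 1, 1), 15.0),
    ("alice", date(2024, 1, 25), 60.0),
]
rfm = rfm_table(toy_transactions, today=date(2024, 1, 31))
for customer, (recency, frequency, monetary) in sorted(rfm.items()):
    print(customer, recency, frequency, monetary)
```

The resulting three-column table is the kind of input that a clustering method such as k-means could then segment, usually after the columns have been scaled.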
Nihan Yildirim, Derya Gultekin, Cansu Hürses and Abdullah Mert Akman
Abstract
Purpose
This paper aims to use text mining methods to explore the similarities and differences between countries’ national digital transformation (DT) and Industry 4.0 (I4.0) policies. The study examines the applicability of text mining as an alternative for comprehensive clustering of national I4.0 and DT strategies, encouraging policy researchers toward data science that can offer rapid policy analysis and benchmarking.
Design/methodology/approach
With an exploratory research approach, topic modeling, principal component analysis and unsupervised machine learning algorithms (k-means and hierarchical clustering) are used for clustering national I4.0 and DT strategies. This paper uses a corpus of policy documents and related scientific publications from several countries and integrates their science and technology performance. The paper also presents the positioning of Türkiye’s I4.0 and DT national policy as a case from a developing country context.
Findings
Text mining provides meaningful clustering results on similarities and differences between countries regarding their national I4.0 and DT policies, aligned with their geographic, economic and political circumstances. Findings also shed light on the DT strategic landscape and the key themes spanning various policy dimensions. Drawing from the Turkish case, political options are discussed in the context of developing (follower) countries’ I4.0 and DT.
Practical implications
The paper reveals meaningful clustering results on similarities and differences between countries regarding their national I4.0 and DT policies, reflecting political proximities aligned with their geographic, economic and political circumstances. This can help policymakers to comparatively understand national DT and I4.0 policies and use this knowledge to reflect collaborative and competitive measures to their policies.
Originality/value
This paper provides a unique combined methodology for text mining-based policy analysis in the DT context, which has not previously been adopted. In an era where computational social science and machine learning have gained importance and adaptability in political and social science fields, and in the technology and innovation management discipline, clustering applications showed similar and different policy patterns in a timely and unbiased manner.
Radhia Toujani and Jalel Akaichi
Abstract
Purpose
Event detection is now very important in gathering news from social media. Indeed, it is widely employed by journalists to generate early alerts of reported stories. In order to incorporate available data on social media into a news story, journalists must manually process, compile and verify the news content within a very short time span. Despite its utility and importance, this process is time-consuming and labor-intensive for media organizations. For this reason, and because social media provides an essential source of data to support professional journalists, the purpose of this paper is to propose a citizen clustering technique that allows the community of journalists and media professionals to document news during crises.
Design/methodology/approach
The authors develop an approach for detecting news of natural hazard events and clustering groups of citizens in danger, based on three major steps. In the first step, the authors present a pipeline of several natural language processing tasks: event trigger detection, applied to retrieve potential event triggers; named entity recognition, used to detect and recognize event participants related to the extracted event triggers; and, finally, a dependency analysis between all the extracted data. Analyzing the ambiguity and vagueness of news similarity plays a key role in event detection, an issue that was ignored in traditional event detection techniques. To this end, in the second step, the authors apply fuzzy set techniques to the extracted events to enhance the clustering quality and remove the vagueness of the extracted information. Then, the defined degree of citizens’ danger is used as input to the introduced citizen clustering method in order to detect communities of citizens with close disaster degrees.
Findings
Empirical results indicate that homogeneous and compact citizen clusters can be detected using the suggested event detection method. It can also be observed that event news can be analyzed efficiently using fuzzy theory. In addition, the proposed visualization process plays a crucial role in data journalism, as it is used to analyze event news, as well as in the final presentation of the detected clusters of citizens in danger.
Originality/value
The introduced citizen clustering method helps journalists and editors to better judge the veracity of social media content, navigate the overwhelming volume of content, identify eyewitnesses and contextualize the event. The empirical analysis results illustrate the efficiency of the developed method for both real and artificial networks.
Abstract
Purpose
The purpose of this paper is to provide a review of the issues related to cluster analysis, one of the most important and primitive activities of human beings, and of the advances made in recent years.
Design/methodology/approach
The paper investigates the clustering algorithms rooted in machine learning, computer science, statistics, and computational intelligence.
Findings
The paper reviews the basic issues of cluster analysis and discusses the recent advances of clustering algorithms in scalability, robustness, visualization, irregular cluster shape detection, and so on.
Originality/value
The paper presents a comprehensive and systematic survey of cluster analysis and emphasizes its recent efforts in order to meet the challenges caused by the glut of complicated data from a wide variety of communities.
Khai Tan Huynh, Tho Thanh Quan and Thang Hoai Bui
Abstract
Purpose
Service-oriented architecture is an emerging software architecture in which web services (WSs) play a crucial role. In this architecture, WS composition and verification are required when handling complex service requirements from users. When the number of WSs becomes very large in practice, the complexity of composition and verification is correspondingly high. In this paper, the authors propose a logic-based clustering approach to solve this problem by separating the original repository of WSs into clusters. They also propose a so-called quality-controlled clustering approach to ensure the quality of the generated clusters in a reasonable execution time.
Design/methodology/approach
The approach represents WSs as logical formulas on which the authors conduct the clustering task. They also combine two of the most popular clustering approaches, hierarchical agglomerative clustering (HAC) and k-means, to ensure the quality of the generated clusters.
Findings
This logic-based clustering approach significantly increases the performance of WS composition and verification. Furthermore, the logic-based approach helps to maintain the soundness and completeness of the composition solution. Finally, the quality-controlled strategy can ensure the quality of the generated clusters with low time complexity.
Research limitations/implications
The work discussed in this paper is currently implemented only as a research tool known as WSCOVER. More work is needed to make it a practical and usable system for real-life applications.
Originality/value
In this paper, the authors propose a logic-based paradigm to represent and cluster WSs. They also propose a quality-controlled clustering approach that combines and takes advantage of two of the most popular clustering approaches, HAC and k-means.
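The abstract above does not say how HAC and k-means are combined, so the following sketch shows only one common pairing, stated here as an assumption: HAC produces an initial partition, and the centroids of that partition seed k-means, which then refines the assignment. It operates on plain numeric vectors rather than logical formulas, and the function name `hac_seeded_kmeans` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2


def hac_seeded_kmeans(points, n_clusters):
    """Run Ward HAC, then refine its partition with k-means.

    Returns (centroids, labels) from the k-means refinement.
    """
    # Step 1: build the hierarchy and cut it into at most n_clusters groups.
    tree = linkage(points, method="ward")
    hac_labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # Step 2: use the mean of each HAC group as an initial k-means centroid.
    seeds = np.array([
        points[hac_labels == label].mean(axis=0)
        for label in np.unique(hac_labels)
    ])
    # Step 3: k-means started from those seeds (minit="matrix" means the
    # second argument is read as the initial centroids).
    centroids, labels = kmeans2(points, seeds, minit="matrix")
    return centroids, labels


# Toy data (assumed): two well-separated groups of ten 2-D points.
rng = np.random.default_rng(1)
toy_points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(10, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(10, 2)),
])
toy_centroids, toy_labels = hac_seeded_kmeans(toy_points, n_clusters=2)
print(toy_labels)
```

Seeding k-means from a hierarchy removes its dependence on random initialization, at the cost of building the hierarchy first, which can be expensive on large repositories.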
Chirihane Gherbi, Zibouda Aliouat and Mohamed Benmohammed
Abstract
Purpose
This paper aims to systematically analyze a few prominent wireless sensor network (WSN) clustering routing protocols and compare these different approaches according to the taxonomy and several significant metrics.
Design/methodology/approach
In this paper, the authors have summarized recent research results on data routing in sensor networks and classified the approaches into four main categories, namely, data-centric, hierarchical, location-based and quality of service (QoS)-aware, and the authors have discussed the effect of node placement strategies on the operation and performance of WSNs.
Originality/value
Performance-controlled planned networks, where placement and routing must be intertwined and everything from delays to throughput to energy requirements is well-defined and relevant, are an interesting subject of current and future research. Real-time and deadline guarantees and their relationship with routing, the MAC layer, duty cycles and other protocol stack issues would benefit from further research.
Prosenjit Ghosh and Sabyasachi Mukherjee
Abstract
Purpose
The study aims to cluster travellers based on their social media interactions and to find the different segments with similar and dissimilar categories according to travellers' choices. The study also aims to understand the behaviour of the traveller clusters towards destination selection and to design tour packages accordingly, in order to improve tourists' satisfaction and gain viable benefits.
Design/methodology/approach
Agglomerative hierarchical clustering with Ward's minimum variance linkage algorithm and model-based clustering with parameterized finite Gaussian mixture models were implemented to achieve the respective goals. A dimension reduction (DR) technique was introduced to better visualize the clustering structure obtained from a finite mixture of Gaussian densities.
Findings
A total of 980 travellers were clustered into 8 different interest groups according to their selection of tourism destinations across East Asia, based on individual social media feedback. Both proposed methods showed remarkable similarities in selecting the optimal number of clusters and in the behaviour of the interested traveller groups. The DR technique reduces the dimensionality to seven directions, of which the first two explained 95% of total variability.
Practical implications
Tourism organizations focus their marketing efforts on promoting the most attractive benefits to the clusters of travellers. By segmenting travellers of East Asia into homogeneous groups, it is feasible to choose a similar area to test different marketing techniques. Finally, the segments to which new respondents or potential clients belong can be identified; consequently, the tourism organizations can design the tour packages.
Originality/value
The study is unique in two aspects. Firstly, it empirically reveals tourists' experience and behavioural intention to select tourism destinations; secondly, it finds quantifiable insights into the tourism phenomenon in East Asia, which help tourism organizations to understand the buying behaviours of tourist segments. Finally, the application of clustering algorithms to understand tourist behaviour towards destination selection based on social media reviews, and the resulting findings, are new in the tourism literature.