Search results

1 – 10 of 21
Open Access
Article
Publication date: 10 August 2022

Jie Ma, Zhiyuan Hao and Mo Hu

Abstract

Purpose

The density peak clustering algorithm (DP) identifies cluster centers by two parameters, i.e. the ρ value (local density) and the δ value (the distance between a point and another point with a higher ρ value). According to the center-identifying principle of the DP, the potential cluster centers should have a higher ρ value and a higher δ value than other points. However, this principle may prevent the DP from identifying some categories with multiple centers or with centers in lower-density regions. In addition, the improper assignment strategy of the DP could cause a wrong assignment result for the non-center points. This paper aims to address the aforementioned issues and improve the clustering performance of the DP.
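
The two DP parameters described above can be illustrated with a minimal sketch. It assumes a simple cutoff kernel for ρ (the count of neighbors closer than a `cutoff` distance) and ranks points by the product ρ·δ; the function name, the `cutoff` parameter and the toy data are hypothetical and not taken from the paper, and this is the baseline DP idea rather than the TMsDP method.

```python
import math

def density_peak_scores(points, cutoff):
    """Return (rho, delta) lists, assuming a cutoff-kernel local density."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        # distances to all points with a strictly higher density
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # a point with no denser neighbor gets its largest distance instead
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Toy data (hypothetical): two well-separated groups
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (10.5, 10.5)]
rho, delta = density_peak_scores(points, cutoff=0.9)

# Candidate centers are the points where both rho and delta are large
gamma = [r * d for r, d in zip(rho, delta)]
centers = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2]
```

On this toy data the two selected indices are the middle points of the two groups, which is the behavior the center-identifying principle describes.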

Design/methodology/approach

First, to identify as many potential cluster centers as possible, the authors construct a point-domain by introducing the pinhole imaging strategy to extend the search range for potential cluster centers. Second, they design novel methods for calculating the domain distance, point-domain density and domain similarity. Third, they use domain similarity to merge domains and optimize the final clustering results.

Findings

The experimental results on analyzing 12 synthetic data sets and 12 real-world data sets show that two-stage density peak clustering based on multi-strategy optimization (TMsDP) outperforms the DP and other state-of-the-art algorithms.

Originality/value

The authors propose a novel DP-based clustering method, i.e. TMsDP, and transform the relationship between points into that between domains to ultimately further optimize the clustering performance of the DP.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288

Open Access
Article
Publication date: 5 September 2016

Qingyuan Wu, Changchen Zhan, Fu Lee Wang, Siyang Wang and Zeping Tang

Abstract

Purpose

The quick growth of web-based and mobile e-learning applications, such as massive open online courses, has created a large volume of online learning resources. Confronting such a large amount of learning data, it is important to develop effective clustering approaches for user group modeling and intelligent tutoring. The paper aims to discuss these issues.

Design/methodology/approach

In this paper, a minimum spanning tree-based approach is proposed for clustering online learning resources. The novel clustering approach has two main stages, namely, an elimination stage and a construction stage. During the elimination stage, the Euclidean distance is adopted as the metric for measuring the density of learning resources. Resources with very low densities are identified as outliers and therefore removed. During the construction stage, a minimum spanning tree is built by initializing the centroids according to the degree of freedom of the resources. Online learning resources are subsequently partitioned into clusters by exploiting the structure of the minimum spanning tree.
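
As a rough illustration of partitioning data through the structure of a minimum spanning tree, the sketch below uses the generic textbook variant: build the tree with Prim's algorithm on Euclidean distances, then cut the k-1 longest edges. It is not the authors' algorithm; the elimination stage and the centroid initialization by degree of freedom are omitted, and all names and data are hypothetical.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, u, v)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection to the growing tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, k):
    """Cut the k-1 longest tree edges and label the connected components."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    root = list(range(len(points)))

    def find(i):
        # union-find lookup with path halving
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    return [find(i) for i in range(len(points))]

# Toy data (hypothetical): two groups of three resources each
resources = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
labels = mst_clusters(resources, k=2)
```

Cutting the longest edges is only one way to exploit the tree; the paper's construction stage may split it differently.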

Findings

Conventional clustering algorithms have a number of shortcomings that prevent them from handling online learning resources effectively. On the one hand, extant partitional clustering methods use a randomly assigned centroid for each cluster, which usually leads to ineffective clustering results. On the other hand, classical density-based clustering methods are very computationally expensive and time-consuming. Experimental results indicate that the proposed algorithm outperforms the traditional clustering algorithms for online learning resources.

Originality/value

The effectiveness of the proposed algorithms has been validated by using several data sets. Moreover, the proposed clustering algorithm has great potential in e-learning applications. It has been demonstrated how the novel technique can be integrated in various e-learning systems. For example, the clustering technique can classify learners into groups so that homogeneous grouping can improve the effectiveness of learning. Moreover, clustering of online learning resources is valuable to decision making in terms of tutorial strategies and instructional design for intelligent tutoring. Lastly, a number of directions for future research have been identified in the study.

Details

Asian Association of Open Universities Journal, vol. 11 no. 2
Type: Research Article
ISSN: 1858-3431

Open Access
Article
Publication date: 13 November 2018

Zhiwen Pan, Wen Ji, Yiqiang Chen, Lianjun Dai and Jun Zhang

Abstract

Purpose

Disability datasets contain information about disabled populations. By analyzing these datasets, professionals who work with disabled populations can better understand the inherent characteristics of these populations, so that working plans and policies that effectively help them can be made accordingly.

Design/methodology/approach

In this paper, the authors proposed a big data management and analytic approach for disability datasets.

Findings

By using a set of data mining algorithms, the proposed approach can provide the following services. The data management scheme in the approach can improve the quality of disability data by estimating missing attribute values and detecting anomalous and low-quality data instances. The data mining scheme in the approach can explore useful patterns that reflect the correlations, associations and interactions between the disability data attributes. Experiments based on a real-world dataset are conducted at the end to demonstrate the effectiveness of the approach.

Originality/value

The proposed approach can enable data-driven decision-making for professionals who work with disabled populations.

Details

International Journal of Crowd Science, vol. 2 no. 2
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 26 October 2020

Mohammed S. Al-kahtani, Lutful Karim and Nargis Khan

Abstract

Designing an efficient routing protocol that opportunistically forwards data to the destination node through nearby sensor nodes or devices is important for an effective incident response and disaster recovery framework. Existing sensor routing protocols are mostly not effective in such disaster recovery applications, as the networks are affected (destroyed or overused) in disasters such as earthquakes, floods, tsunamis and wildfires. These protocols require a large number of message transmissions to reestablish the clusters and communications, which is not energy efficient and results in packet loss. This paper introduces ODCR, an energy efficient and reliable opportunistic density clustered-based routing protocol for such emergency sensor applications. We perform simulations to measure the performance of the ODCR protocol in terms of network energy consumption, throughput and packet loss ratio. Simulation results demonstrate that the ODCR protocol performs much better than the existing TEEN, LEACH and LORA protocols in terms of these performance metrics.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 19 December 2023

Qinxu Ding, Ding Ding, Yue Wang, Chong Guan and Bosheng Ding

Abstract

Purpose

The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive examination of the research landscape in LLMs, providing an overview of the prevailing themes and topics within this dynamic domain.

Design/methodology/approach

Drawing from an extensive corpus of 198 records published between 1996 and 2023 from the relevant academic database, encompassing journal articles, books, book chapters, conference papers and selected working papers, this study delves deep into the multifaceted world of LLM research. In this study, the authors employed the BERTopic algorithm, a recent advancement in topic modeling, to conduct a comprehensive analysis of the data after it had been meticulously cleaned and preprocessed. BERTopic leverages the power of transformer-based language models like bidirectional encoder representations from transformers (BERT) to generate more meaningful and coherent topics. This approach facilitates the identification of hidden patterns within the data, enabling the authors to uncover valuable insights that might otherwise have remained obscure.

Findings

The analysis revealed four distinct clusters of topics in LLM research: “language and NLP”, “education and teaching”, “clinical and medical applications” and “speech and recognition techniques”. Each cluster embodies a unique aspect of LLM application and showcases the breadth of possibilities that LLM technology has to offer. In addition to presenting the research findings, this paper identifies key challenges and opportunities in the realm of LLMs. It underscores the necessity for further investigation in specific areas, including the paramount importance of addressing potential biases, transparency and explainability, data privacy and security, and responsible deployment of LLM technology.

Practical implications

This classification offers practical guidance for researchers, developers, educators, and policymakers to focus efforts and resources. The study underscores the importance of addressing challenges in LLMs, including potential biases, transparency, data privacy, and responsible deployment. Policymakers can utilize this information to shape regulations, while developers can tailor technology development based on the diverse applications identified. The findings also emphasize the need for interdisciplinary collaboration and highlight ethical considerations, providing a roadmap for navigating the complex landscape of LLM research and applications.

Originality/value

This study stands out as the first to examine the evolution of LLMs across such a long time frame and across such diversified disciplines. It provides a unique perspective on the key areas of LLM research, highlighting the breadth and depth of LLM’s evolution.

Details

Journal of Electronic Business & Digital Economics, vol. 3 no. 1
Type: Research Article
ISSN: 2754-4214

Open Access
Article
Publication date: 10 February 2020

Veronika Fenyves, Kinga Emese Zsido, Ioan Bircea and Tibor Tarnoczi

Abstract

Purpose

Changes in food retailing (globalization, concentration) have negative impacts on smaller, “traditional” food retail businesses. Their market share is decreasing year by year. The purpose of this study is to examine and compare the financial performance of these businesses under the given circumstances and current economic environment in a Hungarian and a Romanian county.

Design/methodology/approach

The study is based on two complete databases, including all companies engaged in food retail activity (according to their NACE code) in the counties of Hajdú-Bihar (Hungary) and Cluj (Romania). The databases analyzed contain the financial statements for five consecutive years for 212 and 690 businesses. The databases were examined through the most typical financial indicators using multivariate and univariate analysis of variance and k-medoids cluster analysis.

Findings

The results of the analysis show that the retail food companies of the two counties differ both in number and in financial performance. Companies in Hajdú-Bihar county perform better in terms of financial ratios than those in Cluj county. The groups created by k-medoids cluster analysis are relatively well distinguished in the case of Hajdú-Bihar county, while the picture is much more mixed in the case of Cluj (Kolozs) county. However, it is also important to note that the companies analyzed should generally perform better to survive.

Research limitations/implications

Among the limitations of the study, it is important to note that the findings are relevant only to the two counties examined. Another limiting factor is that quite a few companies had to be excluded from the analysis due to missing data or outliers.

Practical implications

The study presents to corporate decision-makers the current performance of the companies of the sector examined in the two counties. The results of the study highlight the business areas of concern in management. The findings show that these companies need to improve their performance to strengthen their market position. We believe that it is not enough for them to complain about the expansion of the supermarket chains; they should take appropriate action to improve their situation. Based on the results of the study, it can be concluded that retail food companies in both counties need to improve their financial efficiency to survive in the long run. This improvement is essential because retailers can play an important role in smaller settlements and narrower residential environments.

Originality/value

Comparative analysis of retail food companies in similar counties in these two neighboring countries has not been conducted using complex financial analysis. The study revealed the common and/or individual characteristics of these companies.

Details

British Food Journal, vol. 122 no. 11
Type: Research Article
ISSN: 0007-070X

Open Access
Article
Publication date: 26 April 2024

Xue Xin, Yuepeng Jiao, Yunfeng Zhang, Ming Liang and Zhanyong Yao

Abstract

Purpose

This study aims to ensure reliable analysis of dynamic responses in asphalt pavement structures. It investigates noise reduction and data mining techniques for pavement dynamic response signals.

Design/methodology/approach

The paper first conducts time-frequency analysis of pavement dynamic response signals. It then uses two common noise reduction methods, namely, low-pass filtering and wavelet decomposition reconstruction, to evaluate their effectiveness in reducing noise in these signals. Furthermore, as these signals are generated in response to vehicle loading, they contain a substantial amount of data and are prone to environmental interference, potentially resulting in outliers. Hence, it becomes crucial to extract dynamic strain response features (e.g. peaks and peak intervals) efficiently and in real time.
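
To make the low-pass filtering step concrete, here is a minimal sketch of a first-order (single-pole) low-pass filter. The abstract does not state which filter design the authors used, so the filter type, the function name, the cutoff and the sample rate are all assumptions chosen for illustration.

```python
import math

def low_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order IIR low-pass: y[i] = y[i-1] + alpha * (x[i] - y[i-1])."""
    if not samples:
        return []
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # time constant for the cutoff
    alpha = dt / (rc + dt)                   # smoothing factor in (0, 1)
    out = []
    prev = samples[0]
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out

# Hypothetical signals sampled at 1000 Hz
steady = [5.0] * 100                                        # slow component
noise = [1.0 if i % 2 == 0 else -1.0 for i in range(500)]   # 500 Hz alternation

steady_out = low_pass(steady, cutoff_hz=10.0, sample_rate_hz=1000.0)
noise_out = low_pass(noise, cutoff_hz=10.0, sample_rate_hz=1000.0)
```

With these settings the steady component passes through unchanged while the fast alternation is strongly attenuated once the initial transient has decayed.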

Findings

The study introduces an improved density-based spatial clustering of applications with noise (DBSCAN) algorithm for identifying outliers in denoised data. The results demonstrate that low-pass filtering is highly effective in reducing noise in pavement dynamic response signals within specified frequency ranges. Testing shows that the improved DBSCAN algorithm effectively identifies outliers in these signals. Furthermore, the peak detection process, using the enhanced findpeaks function, consistently achieves excellent performance in identifying peak values, even when complex multi-axle heavy-duty truck strain signals are present.
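
The outlier-detection idea can be sketched with the noise rule of standard DBSCAN, applied here to one-dimensional readings. This is the plain algorithm, not the improved variant the study introduces; `eps`, `min_pts` and the sample values are hypothetical.

```python
def dbscan_noise(values, eps, min_pts):
    """Return indices that standard DBSCAN would label as noise (1-D data)."""
    n = len(values)
    # eps-neighborhood of each point, including the point itself
    neighbors = [[j for j in range(n) if abs(values[i] - values[j]) <= eps]
                 for i in range(n)]
    # a core point has at least min_pts points in its neighborhood
    core = [len(nb) >= min_pts for nb in neighbors]
    # noise: neither a core point nor within eps of any core point
    return [i for i in range(n)
            if not core[i] and not any(core[j] for j in neighbors[i])]

# Hypothetical strain readings with one spurious spike
strain = [10.1, 10.3, 9.9, 10.0, 55.0, 10.2, 9.8]
outliers = dbscan_noise(strain, eps=0.5, min_pts=3)
```

Here only the spike at index 4 is flagged, because it has no neighbors within `eps` and is not reachable from any core point.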

Originality/value

The authors identified a suitable frequency domain range for low-pass filtering in asphalt road dynamic response signals, revealing minimal amplitude loss and effective strain information reflection between road layers. Furthermore, the authors introduced the DBSCAN-based anomaly data detection method and enhancements to the Matlab findpeaks function, enabling the detection of anomalies in road sensor data and automated peak identification.

Details

Smart and Resilient Transportation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2632-0487

Open Access
Article
Publication date: 28 July 2020

Prabhat Pokharel, Roshan Pokhrel and Basanta Joshi

Abstract

Analysis of log messages is very important for the identification of suspicious system and network activity. This analysis requires the correct extraction of variable entities. The variable entities are extracted by comparing the log messages against the log patterns. Each of these log patterns can be represented in the form of a log signature. In this paper, we present a hybrid approach for log signature extraction. The approach consists of two modules. The first module identifies log patterns by generating log clusters. The second module uses Named Entity Recognition (NER) to extract signatures from the extracted log clusters. Experiments were performed on event logs from the Windows operating system, Exchange and Unix, and the results were validated by comparing the signatures and the variable entities against the standard log documentation. The experiments showed that the extracted signatures were ready to be used with a high degree of accuracy.

Details

Applied Computing and Informatics, vol. 19 no. 1/2
Type: Research Article
ISSN: 2634-1964

Content available
Book part
Publication date: 9 March 2021

Details

The Emerald Handbook of Blockchain for Business
Type: Book
ISBN: 978-1-83982-198-1

Content available
Book part
Publication date: 15 March 2021

Details

The Machine Age of Customer Insight
Type: Book
ISBN: 978-1-83909-697-6
