Search results
Jie Ma, Zhiyuan Hao and Mo Hu
Abstract
Purpose
The density peak clustering algorithm (DP) identifies cluster centers by two parameters, i.e. the ρ value (local density) and the δ value (the distance between a point and the nearest point with a higher ρ value). According to the center-identifying principle of the DP, the potential cluster centers should have a higher ρ value and a higher δ value than other points. However, this principle may prevent the DP from identifying some categories with multiple centers or centers in lower-density regions. In addition, the assignment strategy of the DP can assign non-center points to the wrong cluster. This paper aims to address these issues and improve the clustering performance of the DP.
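The two DP quantities described above can be sketched in a few lines. This is a minimal illustration of the classic ρ/δ computation only, not of TMsDP. It assumes a cutoff-kernel density (the number of neighbours within a hypothetical radius `d_c`), the usual convention that the densest point takes its largest distance as δ, and a made-up toy data set; all names are illustrative.

```python
import math

def density_peaks(points, d_c):
    """Return (rho, delta) for the classic density peak quantities.

    rho[i]   : number of other points closer than the cutoff d_c (assumed kernel)
    delta[i] : distance to the nearest point with a strictly higher rho
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Convention: the densest point has no denser neighbour,
        # so it takes its largest distance to any point.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Toy data (hypothetical): two small groups, each with one central point.
points = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (4.9, 5.0)]
rho, delta = density_peaks(points, d_c=0.12)

# Rank candidates by the product rho * delta and keep the top two.
gamma = [r * d for r, d in zip(rho, delta)]
centers = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2]
```

Ranking by the product of ρ and δ is one common way to shortlist candidate centers. Points that are dense but close to an even denser point score low, which is the behaviour the abstract identifies as a limitation for multi-center categories.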
Design/methodology/approach
First, to identify as many potential cluster centers as possible, the authors construct a point-domain by introducing the pinhole imaging strategy to extend the search range for potential cluster centers. Second, they design novel methods for calculating the domain distance, point-domain density and domain similarity. Third, they use domain similarity to drive the domain merging process and optimize the final clustering results.
Findings
Experimental results on 12 synthetic data sets and 12 real-world data sets show that two-stage density peak clustering based on multi-strategy optimization (TMsDP) outperforms the DP and other state-of-the-art algorithms.
Originality/value
The authors propose a novel DP-based clustering method, i.e. TMsDP, and transform the relationship between points into that between domains to ultimately further optimize the clustering performance of the DP.
Qingyuan Wu, Changchen Zhan, Fu Lee Wang, Siyang Wang and Zeping Tang
Abstract
Purpose
The rapid growth of web-based and mobile e-learning applications such as massive open online courses has created a large volume of online learning resources. Confronting such a large amount of learning data, it is important to develop effective clustering approaches for user group modeling and intelligent tutoring. The paper aims to discuss these issues.
Design/methodology/approach
In this paper, a minimum spanning tree based approach is proposed for clustering online learning resources. The novel clustering approach has two main stages, namely, an elimination stage and a construction stage. During the elimination stage, the Euclidean distance is adopted as the metric for measuring the density of learning resources. Resources with very low densities are identified as outliers and removed. During the construction stage, a minimum spanning tree is built by initializing the centroids according to the degree of freedom of the resources. Online learning resources are subsequently partitioned into clusters by exploiting the structure of the minimum spanning tree.
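The idea of partitioning data by exploiting the structure of a minimum spanning tree can be sketched as follows. This is the generic textbook variant (build the tree with Prim's algorithm, then cut the longest edges), not the authors' method: it omits the elimination stage and the centroid initialization by degree of freedom, and the function names, toy points and cluster count `k` are assumptions made for illustration.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n - 1 tree edges as (weight, u, v) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, k):
    """Drop the k - 1 longest tree edges and label the remaining components."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    root = list(range(len(points)))    # union-find parent array

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]    # path halving
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    return [find(i) for i in range(len(points))]

# Toy resources embedded as 2-D feature vectors (hypothetical data).
resources = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = mst_clusters(resources, k=2)
```

Cutting the longest edges separates groups that are joined only by a long bridge, which is why removing low-density outliers first, as the abstract describes, matters: an isolated outlier would otherwise claim one of the cuts.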
Findings
Conventional clustering algorithms have a number of shortcomings that prevent them from handling online learning resources effectively. On the one hand, extant partitional clustering methods use a randomly assigned centroid for each cluster, which usually causes ineffective clustering results. On the other hand, classical density-based clustering methods are very computationally expensive and time-consuming. Experimental results indicate that the proposed algorithm outperforms the traditional clustering algorithms for online learning resources.
Originality/value
The effectiveness of the proposed algorithms has been validated by using several data sets. Moreover, the proposed clustering algorithm has great potential in e-learning applications. It has been demonstrated how the novel technique can be integrated in various e-learning systems. For example, the clustering technique can classify learners into groups so that homogeneous grouping can improve the effectiveness of learning. Moreover, clustering of online learning resources is valuable to decision making in terms of tutorial strategies and instructional design for intelligent tutoring. Lastly, a number of directions for future research have been identified in the study.
Zhiwen Pan, Wen Ji, Yiqiang Chen, Lianjun Dai and Jun Zhang
Abstract
Purpose
Disability datasets contain information about disabled populations. By analyzing these datasets, professionals who work with disabled populations can better understand the inherent characteristics of those populations, so that working plans and policies that effectively help them can be made accordingly.
Design/methodology/approach
In this paper, the authors propose a big data management and analytic approach for disability datasets.
Findings
By using a set of data mining algorithms, the proposed approach can provide the following services. The data management scheme in the approach can improve the quality of disability data by estimating missing attribute values and detecting anomalous and low-quality data instances. The data mining scheme in the approach can explore useful patterns that reflect the correlation, association and interaction between the disability data attributes. Experiments based on a real-world dataset are conducted to demonstrate the effectiveness of the approach.
Originality/value
The proposed approach can enable data-driven decision-making for professionals who work with disabled populations.
Mohammed S. Al-kahtani, Lutful Karim and Nargis Khan
Abstract
Designing an efficient routing protocol that opportunistically forwards data to the destination node through nearby sensor nodes or devices is important for an effective incident response and disaster recovery framework. Existing sensor routing protocols are mostly ineffective in such disaster recovery applications because the networks are affected (destroyed or overused) in disasters such as earthquakes, floods, tsunamis and wildfires. These protocols require a large number of message transmissions to reestablish the clusters and communications, which is not energy efficient and results in packet loss. This paper introduces ODCR, an energy-efficient and reliable opportunistic density clustered-based routing protocol for such emergency sensor applications. We perform simulations to measure the performance of the ODCR protocol in terms of network energy consumption, throughput and packet loss ratio. Simulation results demonstrate that the ODCR protocol performs much better than the existing TEEN, LEACH and LORA protocols in terms of these performance metrics.
Qinxu Ding, Ding Ding, Yue Wang, Chong Guan and Bosheng Ding
Abstract
Purpose
The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive examination of the research landscape in LLMs, providing an overview of the prevailing themes and topics within this dynamic domain.
Design/methodology/approach
Drawing on a corpus of 198 records published between 1996 and 2023 in the relevant academic database, encompassing journal articles, books, book chapters, conference papers and selected working papers, this study examines the multifaceted field of LLM research. The authors employed the BERTopic algorithm, a recent advancement in topic modeling, to analyze the data after it had been cleaned and preprocessed. BERTopic leverages transformer-based language models such as bidirectional encoder representations from transformers (BERT) to generate more meaningful and coherent topics. This approach facilitates the identification of hidden patterns within the data, enabling the authors to uncover insights that might otherwise have remained obscure.
Findings
The analysis revealed four distinct clusters of topics in LLM research: “language and NLP”, “education and teaching”, “clinical and medical applications” and “speech and recognition techniques”. Each cluster embodies a unique aspect of LLM application and showcases the breadth of possibilities that LLM technology has to offer. In addition to presenting the research findings, this paper identifies key challenges and opportunities in the realm of LLMs. It underscores the necessity for further investigation in specific areas, including the paramount importance of addressing potential biases, transparency and explainability, data privacy and security, and responsible deployment of LLM technology.
Practical implications
This classification offers practical guidance for researchers, developers, educators, and policymakers to focus efforts and resources. The study underscores the importance of addressing challenges in LLMs, including potential biases, transparency, data privacy, and responsible deployment. Policymakers can utilize this information to shape regulations, while developers can tailor technology development based on the diverse applications identified. The findings also emphasize the need for interdisciplinary collaboration and highlight ethical considerations, providing a roadmap for navigating the complex landscape of LLM research and applications.
Originality/value
This study stands out as the first to examine the evolution of LLMs across such a long time frame and across such diversified disciplines. It provides a unique perspective on the key areas of LLM research, highlighting the breadth and depth of LLM’s evolution.
Luan Thanh Le and Trang Xuan-Thi-Thu
Abstract
Purpose
To achieve the Sustainable Development Goals (SDGs) in the era of Logistics 4.0, machine learning (ML) techniques and simulations have emerged as highly optimized tools. This study examines the operational dynamics of a supply chain (SC) in Vietnam as a case study utilizing an ML simulation approach.
Design/methodology/approach
A robust fuel consumption estimation model is constructed by leveraging multiple linear regression (MLR) and artificial neural network (ANN). Subsequently, the proposed model is seamlessly integrated into a cutting-edge SC simulation framework.
Findings
This paper provides valuable insights and actionable recommendations, empowering SC practitioners to optimize operational efficiencies and fostering an avenue for further scholarly investigations and advancements in this field.
Originality/value
This study introduces a novel approach assessing sustainable SC performance by utilizing both traditional regression and ML models to estimate transportation costs, which are then inputted into the discrete event simulation (DES) model.
Veronika Fenyves, Kinga Emese Zsido, Ioan Bircea and Tibor Tarnoczi
Abstract
Purpose
Changes in food retailing (globalization, concentration) have negative impacts on smaller, “traditional” food retail businesses, whose market share is decreasing year by year. The purpose of this study is to examine and compare the financial performance of these businesses under the given circumstances and current economic environment in a Hungarian and a Romanian county.
Design/methodology/approach
The study is based on two complete databases covering all companies engaged in retail food activity (according to their NACE code) in the counties of Hajdú-Bihar (Hungary) and Cluj (Romania). The databases contain the financial statements for five consecutive years for 212 and 690 businesses. They were examined using the most typical financial indicators, multivariate and univariate analysis of variance, and k-medoids cluster analysis.
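For readers unfamiliar with k-medoids, the sketch below shows the basic alternating scheme: assign each observation to its nearest medoid, then move each medoid to the cluster member with the smallest total distance to the others. It is an illustration under stated assumptions, not the study's procedure: the abstract does not say which k-medoids variant or software was used, the first-k initialization is deliberately naive, and the two-indicator company records are invented.

```python
import math

def k_medoids(rows, k, max_iter=100):
    """Alternating k-medoids; returns (medoid_indices, labels)."""
    n = len(rows)

    def d(i, j):
        return math.dist(rows[i], rows[j])

    medoids = list(range(k))   # naive initialization: first k rows (assumption)
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each row joins the cluster of its nearest medoid.
        labels = [min(range(k), key=lambda c: d(i, medoids[c])) for i in range(n)]
        # Update step: the new medoid minimizes total within-cluster distance.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:            # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(d(m, j) for j in members)))
        if new_medoids == medoids:     # converged: labels match these medoids
            break
        medoids = new_medoids
    return medoids, labels

# Hypothetical records: (liquidity ratio, return on assets) per company.
companies = [(1.0, 0.2), (1.2, 0.25), (0.9, 0.15),
             (5.0, 0.8), (5.3, 0.9), (4.8, 0.85)]
medoids, labels = k_medoids(companies, k=2)
```

Unlike k-means, each cluster is represented by an actual observation, so the result stays interpretable as "a typical company" and is less sensitive to extreme values. In practice the indicators would need to be put on a comparable scale first.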
Findings
The results of the analysis show that the two counties differ in both the number of retail food companies and their financial performance. Companies in Hajdú-Bihar county perform better in terms of financial ratios than those in Cluj county. The groups created by k-medoids cluster analysis are relatively well distinguished in the case of Hajdú-Bihar county, while the picture is much more mixed in the case of Cluj county. However, it is also important to note that the companies analyzed should generally perform better to survive.
Research limitations/implications
Among the limitations of the study, it is important to note that the findings are relevant only to the two counties examined. Another limiting factor is that quite a few companies had to be excluded from the analysis due to missing data or outliers.
Practical implications
The study presents for the corporate decision-makers the current performance of the companies of the sector examined in the two counties. The results of the study highlight the business areas of concern in management. The findings show that they need to change this performance to strengthen their market position. We believe that it is not enough to complain about the expansion of the supermarket chains, but they should take appropriate actions to improve their situation. Based on the results of the study, it can be concluded that there is a need to improve the financial efficiency of retail food companies in both counties to survive in the long run. This improvement is essential because retailers can play an important role in smaller settlements and narrower residential environments.
Originality/value
Comparative analysis of retail food companies in similar counties in these two neighboring countries has not been conducted using complex financial analysis. The study revealed the common and/or individual characteristics of these companies.
Xue Xin, Yuepeng Jiao, Yunfeng Zhang, Ming Liang and Zhanyong Yao
Abstract
Purpose
This study aims to ensure reliable analysis of dynamic responses in asphalt pavement structures. It investigates noise reduction and data mining techniques for pavement dynamic response signals.
Design/methodology/approach
The paper first conducts time-frequency analysis on pavement dynamic response signals. It then evaluates two common noise reduction methods, namely, low-pass filtering and wavelet decomposition reconstruction, for their effectiveness in reducing noise in these signals. Furthermore, as these signals are generated in response to vehicle loading, they contain a substantial amount of data and are prone to environmental interference, potentially resulting in outliers. Hence, it becomes crucial to extract dynamic strain response features (e.g. peaks and peak intervals) efficiently and in real time.
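The pipeline of smoothing a signal and then extracting peaks and peak intervals can be illustrated with a small sketch. It stands in for the techniques named above rather than reproducing them: a centered moving average is used as a very simple low-pass filter (the abstract does not specify the filter design), the peak finder is a plain local-maximum test and not the Matlab `findpeaks` function, and the strain samples, window size and height threshold are all invented.

```python
def moving_average(signal, window):
    """Centered moving average, a basic FIR low-pass filter.

    The window shrinks at the edges so the output has the input's length.
    """
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed

def find_peaks(signal, min_height):
    """Indices of strict local maxima that reach at least min_height."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]
            and signal[i] >= min_height]

# Hypothetical strain trace: two axle passes plus small disturbances.
strain = [0, 0, 1, 0, 5, 9, 5, 0, 1, 0, 0, 4, 8, 4, 0, 0]
smoothed = moving_average(strain, window=3)
peaks = find_peaks(smoothed, min_height=3)
# Peak intervals in samples, e.g. to relate axle spacing to vehicle speed.
intervals = [b - a for a, b in zip(peaks, peaks[1:])]
```

The height threshold is what keeps the small disturbances from being reported as peaks; a real implementation would more likely combine it with a minimum peak distance or prominence criterion.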
Findings
The study introduces an improved density-based spatial clustering of applications with noise (DBSCAN) algorithm for identifying outliers in denoised data. The results demonstrate that low-pass filtering is highly effective in reducing noise in pavement dynamic response signals within specified frequency ranges. Testing shows that the improved DBSCAN algorithm effectively identifies outliers in these signals. Furthermore, the peak detection process, using the enhanced findpeaks function, consistently achieves excellent performance in identifying peak values, even when complex multi-axle heavy-duty truck strain signals are present.
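How DBSCAN flags outliers can be seen in a compact sketch of the standard algorithm: points with too few neighbours within radius `eps` that are also not reachable from any dense point are labeled as noise. This is plain DBSCAN, not the improved variant the study introduces (the abstract does not describe the improvement), and the sensor readings, `eps` and `min_pts` values are made up for illustration.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Standard DBSCAN; returns one label per point (-1 marks noise)."""
    n = len(points)
    labels = [None] * n                     # None means not yet visited

    def neighbours(i):
        # Includes i itself, as in the usual definition of the eps-neighbourhood.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE               # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster         # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:      # j is a core point: keep expanding
                queue.extend(nearby)
    return labels

# Hypothetical denoised readings with one spurious spike.
readings = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8]
labels = dbscan([(v,) for v in readings], eps=0.5, min_pts=3)
outliers = [i for i, label in enumerate(labels) if label == NOISE]
```

Because no cluster count has to be chosen in advance, the method suits sensor streams where the number of normal operating regimes is unknown; the cost is that `eps` and `min_pts` must be tuned to the signal's scale.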
Originality/value
The authors identified a suitable frequency domain range for low-pass filtering in asphalt road dynamic response signals, revealing minimal amplitude loss and effective strain information reflection between road layers. Furthermore, the authors introduced the DBSCAN-based anomaly data detection method and enhancements to the Matlab findpeaks function, enabling the detection of anomalies in road sensor data and automated peak identification.
Prabhat Pokharel, Roshan Pokhrel and Basanta Joshi
Abstract
Analysis of log messages is very important for identifying suspicious system and network activity. This analysis requires the correct extraction of variable entities. The variable entities are extracted by comparing the log messages against the log patterns. Each of these log patterns can be represented in the form of a log signature. In this paper, we present a hybrid approach for log signature extraction. The approach consists of two modules. The first module identifies log patterns by generating log clusters. The second module uses Named Entity Recognition (NER) to extract signatures from the extracted log clusters. Experiments were performed on event logs from the Windows operating system, Exchange and Unix, and the results were validated by comparing the signatures and the variable entities against the standard log documentation. The experiments showed that the extracted signatures were ready to be used with a high degree of accuracy.
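The relationship between log clusters, signatures and variable entities can be made concrete with a toy sketch. It is a deliberately simplified stand-in for the first module only: messages are grouped by token count and first token (an assumed heuristic, not the paper's clustering method), positions whose tokens differ within a group become the wildcard `<*>`, and the NER-based second module is not reproduced. The log lines and names are invented.

```python
from collections import defaultdict

WILDCARD = "<*>"

def log_signatures(messages):
    """Group similar messages and replace varying tokens with a wildcard."""
    groups = defaultdict(list)
    for message in messages:
        tokens = message.split()
        if not tokens:
            continue                        # ignore blank lines
        # Assumed clustering key: same length and same leading token.
        groups[(len(tokens), tokens[0])].append(tokens)

    signatures = []
    for token_lists in groups.values():
        # Compare the group column by column; constant columns are kept.
        signature = [column[0] if len(set(column)) == 1 else WILDCARD
                     for column in zip(*token_lists)]
        signatures.append(" ".join(signature))
    return signatures

def variable_entities(message, signature):
    """Return the tokens of message that sit at wildcard positions."""
    return [token for token, slot in zip(message.split(), signature.split())
            if slot == WILDCARD]

logs = [
    "Accepted password for alice from 10.0.0.1",
    "Accepted password for bob from 10.0.0.7",
    "Connection closed by 10.0.0.1",
]
signatures = log_signatures(logs)
entities = variable_entities(logs[1], signatures[0])
```

A group containing a single message yields a signature with no wildcards, as with the third line here, which is one reason a purely positional comparison is not enough and a recognizer for entity types such as addresses or user names is useful.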