Jie Ma, Zhiyuan Hao and Mo Hu
Abstract
Purpose
The density peak clustering algorithm (DP) identifies cluster centers using two quantities: the ρ value (local density) and the δ value (the minimum distance between a point and any point with a higher ρ value). Under the DP's center-identification principle, potential cluster centers should have both a higher ρ value and a higher δ value than other points. However, this principle may prevent the DP from identifying categories with multiple centers or centers located in lower-density regions. In addition, the DP's assignment strategy can assign non-center points to the wrong clusters. This paper aims to address these issues and improve the clustering performance of the DP.
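As a rough illustration of the two DP quantities, the sketch below computes ρ with a simple cutoff kernel (a neighbour count within an assumed cutoff distance `d_c`) and δ as the minimum distance to any denser point, then ranks candidate centers by the product ρ·δ, as commonly done with the original DP. The function names and sample points are hypothetical, and none of the TMsDP extensions (point-domains, pinhole imaging, domain merging) are shown.

```python
import math

def density_peaks(points, d_c):
    """Compute the DP quantities rho (local density) and delta for each point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cutoff distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index)
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance
            delta[i] = max(dist[i])
        else:
            # delta_i: distance to the nearest point ranked as denser
            delta[i] = min(dist[i][j] for j in order[:rank])
    return rho, delta

def pick_centers(rho, delta, k):
    """Return the indices of the k points with the largest rho * delta."""
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(sorted(range(len(gamma)), key=lambda i: -gamma[i])[:k])

# Two small hypothetical groups, each with a dense middle point
points = [(0, 0), (0, 1), (1, 0), (-1, 0), (0, -1),
          (10, 10), (10, 11), (11, 10), (9, 10), (10, 9)]
rho, delta = density_peaks(points, d_c=1.5)
print(rho)                           # [4, 3, 3, 3, 3, 4, 3, 3, 3, 3]
print(pick_centers(rho, delta, 2))   # [0, 5]
```

The middle points stand out because they combine the highest ρ with a large δ, which is exactly the property the DP principle relies on; a center sitting in a sparse region would score a low ρ and could be missed, which is the limitation described above.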
Design/methodology/approach
First, to identify as many potential cluster centers as possible, the authors construct a point-domain by introducing a pinhole imaging strategy that extends the search range for potential cluster centers. Second, they design new methods for calculating the domain distance, point-domain density and domain similarity. Third, they use domain similarity to merge domains and optimize the final clustering results.
Findings
Experimental results on 12 synthetic data sets and 12 real-world data sets show that two-stage density peak clustering based on multi-strategy optimization (TMsDP) outperforms the DP and other state-of-the-art algorithms.
Originality/value
The authors propose a novel DP-based clustering method, TMsDP, which transforms relationships between points into relationships between domains to further optimize the clustering performance of the DP.
Qingyuan Wu, Changchen Zhan, Fu Lee Wang, Siyang Wang and Zeping Tang
Abstract
Purpose
The rapid growth of web-based and mobile e-learning applications such as massive open online courses has created a large volume of online learning resources. Given such a large amount of learning data, it is important to develop effective clustering approaches for user group modeling and intelligent tutoring. This paper discusses these issues.
Design/methodology/approach
In this paper, a minimum spanning tree based approach is proposed for clustering online learning resources. The approach has two main stages: an elimination stage and a construction stage. During the elimination stage, the Euclidean distance is used as the metric for measuring the density of learning resources, and resources with very low densities are identified as outliers and removed. During the construction stage, a minimum spanning tree is built by initializing the centroids according to the degree of freedom of the resources. The online learning resources are then partitioned into clusters by exploiting the structure of the minimum spanning tree.
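A generic minimum-spanning-tree clustering pipeline can be sketched as follows. It is not the authors' algorithm: the elimination rule (drop points with fewer than `min_neighbors` neighbours within an assumed `radius`) is a hypothetical stand-in for their density measure, and clusters are formed by cutting the k − 1 heaviest tree edges rather than by the degree-of-freedom centroid initialization described above.

```python
import math

def mst_clusters(points, k, radius, min_neighbors):
    """Remove sparse outliers, build a Euclidean MST, then cut it into k clusters."""
    n = len(points)
    # Elimination stage (hypothetical rule): keep points with enough close neighbours
    kept = [i for i in range(n)
            if sum(1 for j in range(n)
                   if j != i and math.dist(points[i], points[j]) <= radius) >= min_neighbors]
    # Construction stage: Prim's algorithm over the complete graph of kept points
    root = kept[0]
    best = {i: (math.dist(points[root], points[i]), root) for i in kept[1:]}
    edges = []
    while best:
        i = min(best, key=lambda v: best[v][0])
        weight, src = best.pop(i)
        edges.append((weight, src, i))
        # Update the cheapest known connection of every point not yet in the tree
        for j in best:
            d = math.dist(points[i], points[j])
            if d < best[j][0]:
                best[j] = (d, i)
    # Partition: drop the k - 1 heaviest edges and keep the rest
    edges.sort()
    kept_edges = edges[:len(edges) - (k - 1)]
    # Union-find to label the remaining connected components
    parent = {i: i for i in kept}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, a, b in kept_edges:
        parent[find(a)] = find(b)
    cluster_ids, labels = {}, {}
    for i in kept:
        root_i = find(i)
        cluster_ids.setdefault(root_i, len(cluster_ids))
        labels[i] = cluster_ids[root_i]
    return labels  # point index -> cluster id; eliminated outliers are absent

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = mst_clusters(points, k=2, radius=1.5, min_neighbors=1)
print(labels)  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

The isolated point (50, 50) is removed in the elimination stage, so it never distorts the tree; the single long edge bridging the two groups is then the one that gets cut.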
Findings
Conventional clustering algorithms have shortcomings that prevent them from handling online learning resources effectively. On the one hand, existing partitional clustering methods use a randomly assigned centroid for each cluster, which often leads to ineffective clustering results. On the other hand, classical density-based clustering methods are computationally expensive and time-consuming. Experimental results indicate that the proposed algorithm outperforms traditional clustering algorithms on online learning resources.
Originality/value
The effectiveness of the proposed algorithm has been validated on several data sets, and the algorithm has great potential in e-learning applications. The study demonstrates how the technique can be integrated into various e-learning systems. For example, the clustering technique can classify learners into groups, and such homogeneous grouping can improve the effectiveness of learning. Clustering online learning resources is also valuable for decision-making on tutorial strategies and instructional design in intelligent tutoring. Finally, the study identifies several directions for future research.
Zhiwen Pan, Wen Ji, Yiqiang Chen, Lianjun Dai and Jun Zhang
Abstract
Purpose
Disability datasets contain information about disabled populations. By analyzing these datasets, professionals who work with disabled populations can better understand the inherent characteristics of these populations and can then make working plans and policies that effectively help them.
Design/methodology/approach
In this paper, the authors proposed a big data management and analytic approach for disability datasets.
Findings
Using a set of data mining algorithms, the proposed approach provides the following services. Its data management scheme improves the quality of disability data by estimating missing attribute values and detecting anomalous and low-quality data instances. Its data mining scheme explores useful patterns that reflect the correlations, associations and interactions among disability data attributes. Experiments on a real-world dataset demonstrate the effectiveness of the approach.
Originality/value
The proposed approach can enable data-driven decision-making for professionals who work with disabled populations.
Mohammed S. Al-kahtani, Lutful Karim and Nargis Khan
Abstract
Designing an efficient routing protocol that opportunistically forwards data to the destination node through nearby sensor nodes or devices is essential for an effective incident response and disaster recovery framework. Most existing sensor routing protocols are not effective in such disaster recovery applications, because the networks are affected (destroyed or overused) in disasters such as earthquakes, floods, tsunamis and wildfires. These protocols require a large number of message transmissions to re-establish clusters and communications, which is not energy efficient and results in packet loss. This paper introduces ODCR, an energy-efficient and reliable opportunistic density clustered-based routing protocol for such emergency sensor applications. We perform simulations to measure the performance of the ODCR protocol in terms of network energy consumption, throughput and packet loss ratio. Simulation results demonstrate that ODCR performs much better than the existing TEEN, LEACH and LORA protocols on these performance metrics.
Qinxu Ding, Ding Ding, Yue Wang, Chong Guan and Bosheng Ding
Abstract
Purpose
The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive examination of the research landscape in LLMs, providing an overview of the prevailing themes and topics within this dynamic domain.
Design/methodology/approach
Drawing on a corpus of 198 records published between 1996 and 2023 in the relevant academic database, encompassing journal articles, books, book chapters, conference papers and selected working papers, this study examines the multifaceted field of LLM research. After cleaning and preprocessing the data, the authors applied the BERTopic algorithm, a recent advance in topic modeling, to analyze it. BERTopic uses transformer-based language models such as bidirectional encoder representations from transformers (BERT) to generate more meaningful and coherent topics. This approach helps identify hidden patterns within the data, enabling the authors to uncover insights that might otherwise have remained obscure.
Findings
The analysis revealed four distinct clusters of topics in LLM research: “language and NLP”, “education and teaching”, “clinical and medical applications” and “speech and recognition techniques”. Each cluster represents a distinct aspect of LLM application and shows the breadth of possibilities that LLM technology offers. In addition to presenting these findings, this paper identifies key challenges and opportunities in the realm of LLMs. It calls for further investigation in specific areas, notably potential biases, transparency and explainability, data privacy and security, and the responsible deployment of LLM technology.
Practical implications
This classification offers practical guidance for researchers, developers, educators, and policymakers to focus efforts and resources. The study underscores the importance of addressing challenges in LLMs, including potential biases, transparency, data privacy, and responsible deployment. Policymakers can utilize this information to shape regulations, while developers can tailor technology development based on the diverse applications identified. The findings also emphasize the need for interdisciplinary collaboration and highlight ethical considerations, providing a roadmap for navigating the complex landscape of LLM research and applications.
Originality/value
This study stands out as the first to examine the evolution of LLMs across such a long time frame and such diverse disciplines. It provides a unique perspective on the key areas of LLM research, highlighting the breadth and depth of the evolution of LLMs.
Veronika Fenyves, Kinga Emese Zsido, Ioan Bircea and Tibor Tarnoczi
Abstract
Purpose
Changes in food retailing (globalization, concentration) have negative impacts on smaller, “traditional” food retail businesses, whose market share is decreasing year by year. The purpose of this study is to examine and compare the financial performance of these businesses under the given circumstances and in the current economic environment in a Hungarian and a Romanian county.
Design/methodology/approach
The study is based on two complete databases including all companies engaged in retail food activity (according to their NACE code) in the counties of Hajdú-Bihar (Hungary) and Cluj (Romania). The databases contain the financial statements of 212 and 690 businesses for five consecutive years. They were examined using the most typical financial indicators, multivariate and univariate analysis of variance, and k-medoids cluster analysis.
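The k-medoids step can be illustrated with a minimal alternating (Voronoi-iteration) variant: assign each company to its nearest medoid, then re-pick each medoid as the cluster member with the smallest total distance to the others. The source does not say which k-medoids algorithm (for example PAM) or which financial ratios were used, so the two-ratio data points and the initial medoids below are hypothetical.

```python
import math

def k_medoids(points, init_medoids, max_iter=100):
    """Alternating k-medoids: assign to nearest medoid, then re-pick each medoid."""
    medoids = list(init_medoids)
    for _ in range(max_iter):
        # Assignment step: every point joins the cluster of its nearest medoid
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest total distance
        new_medoids = [
            min(members, key=lambda c: sum(math.dist(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # medoids are stable: converged
        medoids = new_medoids
    labels = [min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
              for p in points]
    return medoids, labels

# Hypothetical firms described by two ratios (e.g. return on assets, current ratio)
firms = [(0.10, 1.2), (0.12, 1.1), (0.11, 1.3), (-0.05, 0.6), (-0.04, 0.5), (-0.06, 0.7)]
medoids, labels = k_medoids(firms, init_medoids=[0, 1])
print(medoids, labels)  # [0, 3] [0, 0, 0, 1, 1, 1]
```

Unlike k-means, each cluster is represented by an actual company, which keeps the cluster profiles interpretable and makes them less sensitive to extreme ratio values.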
Findings
The results show that the retail food companies of the two counties differ both in number and in financial performance. Companies in Hajdú-Bihar county perform better in terms of financial ratios than those in Cluj county. The groups created by k-medoids cluster analysis are relatively well distinguished in Hajdú-Bihar county, while the picture is much more mixed in Cluj county. However, it is also important to note that the companies analyzed generally need to perform better to survive.
Research limitations/implications
Among the limitations of the study, the findings are relevant only to the two counties examined. Another limiting factor is that quite a few companies had to be excluded from the analysis because of missing data or outliers.
Practical implications
The study presents corporate decision-makers with the current performance of companies in the examined sector in the two counties. The results highlight business areas of concern for management and show that companies need to improve this performance to strengthen their market position. We believe that complaining about the expansion of supermarket chains is not enough; the companies should take appropriate action to improve their situation. Based on the results, the financial efficiency of retail food companies in both counties needs to improve if they are to survive in the long run. This improvement is essential because these retailers can play an important role in smaller settlements and local neighborhoods.
Originality/value
No comparative analysis of retail food companies in similar counties of these two neighboring countries had previously been conducted using complex financial analysis. The study revealed the common and individual characteristics of these companies.
Xue Xin, Yuepeng Jiao, Yunfeng Zhang, Ming Liang and Zhanyong Yao
Abstract
Purpose
This study aims to ensure reliable analysis of dynamic responses in asphalt pavement structures. It investigates noise reduction and data mining techniques for pavement dynamic response signals.
Design/methodology/approach
The paper first conducts a time-frequency analysis of pavement dynamic response signals. It then applies two common noise reduction methods, low-pass filtering and wavelet decomposition and reconstruction, to evaluate their effectiveness in reducing noise in these signals. Furthermore, because these signals are generated in response to vehicle loading, they contain a substantial amount of data and are prone to environmental interference, which can produce outliers. It is therefore crucial to extract dynamic strain response features (e.g. peaks and peak intervals) efficiently and in real time.
Findings
The study introduces an improved density-based spatial clustering of applications with noise (DBSCAN) algorithm for identifying outliers in denoised data. The results demonstrate that low-pass filtering is highly effective in reducing noise in pavement dynamic response signals within specified frequency ranges. In testing, the improved DBSCAN algorithm effectively identifies outliers in these signals. Furthermore, peak detection using the enhanced findpeaks function consistently performs very well in identifying peak values, even for complex strain signals from multi-axle heavy-duty trucks.
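For reference, the standard DBSCAN procedure that the study's improved variant builds on can be sketched as follows: points that are not density-reachable from any core point keep the label −1 (noise) and can be treated as outliers. The study's improvements are not described here, so this is plain DBSCAN, and the strain readings and the `eps` and `min_pts` values are hypothetical.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE (-1) for outliers."""
    labels = [None] * len(points)

    def region(i):
        # eps-neighbourhood of point i (includes i itself)
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels

# Hypothetical denoised strain readings (one feature per sample); 5.0 is a spike
readings = [(1.0,), (1.1,), (0.9,), (1.05,), (5.0,), (0.95,)]
print(dbscan(readings, eps=0.2, min_pts=3))  # [0, 0, 0, 0, -1, 0]
```

Because DBSCAN needs no preset number of clusters, an isolated spike simply fails the density test and is flagged, which is why it suits outlier screening of sensor data.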
Originality/value
The authors identified a suitable frequency range for low-pass filtering of asphalt pavement dynamic response signals, one that causes minimal amplitude loss and effectively preserves the strain information between road layers. They also introduced a DBSCAN-based anomaly detection method and enhancements to the MATLAB findpeaks function, enabling the detection of anomalies in road sensor data and automated peak identification.
Prabhat Pokharel, Roshan Pokhrel and Basanta Joshi
Abstract
Analysis of log messages is very important for identifying suspicious system and network activity. This analysis requires the correct extraction of variable entities, which are extracted by comparing log messages against log patterns. Each log pattern can be represented in the form of a log signature. In this paper, we present a hybrid approach for log signature extraction that consists of two modules. The first module identifies log patterns by generating log clusters. The second module uses Named Entity Recognition (NER) to extract signatures from the extracted log clusters. Experiments were performed on event logs from the Windows operating system, Exchange and Unix, and the results were validated by comparing the signatures and variable entities against the standard log documentation. The experiments showed that the extracted signatures were highly accurate and ready for use.
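The idea of comparing log messages against patterns can be sketched with a simple token-position heuristic: messages are grouped by token count and first token, positions that vary within a group become wildcards, and variable entities are read off by matching a message against a signature. This is a deliberately simplified stand-in; the paper's clustering module and its NER-based signature extraction are not reproduced, and the grouping key, wildcard token and sample log lines are hypothetical.

```python
from collections import defaultdict

WILDCARD = "<*>"

def extract_signatures(messages):
    """Group messages into crude clusters and derive one signature per cluster."""
    clusters = defaultdict(list)
    for msg in messages:
        tokens = msg.split()
        if tokens:
            # Hypothetical cluster key: token count plus the first token
            clusters[(len(tokens), tokens[0])].append(tokens)
    signatures = []
    for group in clusters.values():
        # A position is constant if every message in the group agrees on it
        sig = [col[0] if len(set(col)) == 1 else WILDCARD for col in zip(*group)]
        signatures.append(" ".join(sig))
    return signatures

def extract_entities(message, signature):
    """Return the variable tokens of message, or None if it does not fit signature."""
    tokens, pattern = message.split(), signature.split()
    if len(tokens) != len(pattern):
        return None
    values = []
    for tok, pat in zip(tokens, pattern):
        if pat == WILDCARD:
            values.append(tok)  # variable entity
        elif pat != tok:
            return None  # constant token mismatch
    return values

logs = [
    "User alice logged in from 10.0.0.5",
    "User bob logged in from 10.0.0.7",
    "Service sshd stopped",
    "Service cron stopped",
]
sigs = extract_signatures(logs)
print(sigs)  # ['User <*> logged in from <*>', 'Service <*> stopped']
print(extract_entities("User carol logged in from 10.0.0.9", sigs[0]))  # ['carol', '10.0.0.9']
```

A position-based heuristic like this breaks down when variable entities span several tokens or change the message length, which is one reason a learned NER step can help.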
Katarzyna Piwowar-Sulej, Sławomir Wawak, Małgorzata Tyrańska, Małgorzata Zakrzewska, Szymon Jarosz and Mariusz Sołtysik
Abstract
Purpose
The purpose of the study was to detect trends in human resource management (HRM) research presented in journals during the 2000–2020 timeframe. The research question is: How are the interests of researchers changing in the field of HRM and which topics have gained popularity in recent years?
Design/methodology/approach
The approach adopted in this study was designed to overcome all the limitations specific to the systematic literature reviews and bibliometric studies presented in the Introduction. The full texts of papers were analyzed. Text-mining tools first detected clusters and then trends, which limited the impact of researcher bias. The approach is consistent with the general rules of systematic literature reviews.
Findings
The article makes a threefold contribution to academic knowledge. First, it uses modern methodology to gather and synthesize HRM research topics. The proposed approach was designed to allow early detection of nascent, non-obvious trends in research, which will help researchers address topics of high value for both theory and practice. Second, the results of our study highlight shifts in focus in HRM over the past 19 years. Third, the article suggests further directions of research.
Research limitations/implications
This study presents an approach designed to overcome the limitations of systematic literature reviews. The analysis was based on the full texts of the articles, and the categories were discovered directly from the articles rather than predetermined. The study's findings may, however, be limited by the following issues. First, the eligibility criteria included only papers indexed in the Scopus and WoS databases and excluded conference proceedings, book chapters and non-English papers. Second, only full-text articles were included in the study, which could narrow the research area; as a consequence, important information about research presented in the excluded documents may be lost. Third, most of the papers in the database were published in the International Journal of Human Resource Management, and therefore trends such as “challenges for international HRM” can be considered significant (long-lasting). Fourth, the study does not estimate the proportion between searches in HRM journals and articles published in other journals. Future research may overcome these limitations. Fifth, although the authors used valuable techniques such as TF-IDF and HDBSCAN, the discovered trends still had to be evaluated and interpreted, which could have introduced researcher bias even when, as in this study, researchers with different areas of experience were involved. Finally, this study covers the 2000–2020 timeframe. Since HRM is a rapidly developing field, academics will probably move into exciting new research areas in the coming years, so it may be worthwhile to conduct similar analyses and compare their results with those presented in this study.
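Since TF-IDF is named as one of the techniques used, a minimal version of the weighting is sketched below: term frequency is normalised by document length and inverse document frequency is log(N / df). Real toolkits often use smoothed variants, the study's exact settings are not stated, and the sample documents are hypothetical; the subsequent HDBSCAN clustering step is not shown.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (tf = count/len, idf = log(N/df))."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({term: (c / len(tokens)) * math.log(n_docs / df[term])
                        for term, c in counts.items()})
    return weights

# Hypothetical article snippets
docs = ["employee wellbeing and hrm", "green hrm practices", "hrm and digital transformation"]
w = tf_idf(docs)
print(w[0]["hrm"])              # 0.0  (term occurs in every document)
print(round(w[1]["green"], 4))  # 0.3662
```

Terms shared by every document get zero weight, so the vectors emphasise distinctive vocabulary, which is what makes them useful input for a clustering step.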
Originality/value
The present study analyzes HRM journals with the aim of establishing trends in HRM research. It contributes to the field by providing a more comprehensive and objective review than systematic literature reviews. It fills a gap in HRM literature studies with a novel research approach, a methodology based on full-text mining and a big data toolset. As a consequence, this study can be considered an adequate reflection of all the articles published in journals strictly devoted to HRM issues, and it may serve as an important source of reference for both researchers and practitioners, helping them identify the core journals focused on HRM research as well as topics of particular interest and importance.
Jie Zhang, Yuwei Wu, Jianyong Gao, Guangjun Gao and Zhigang Yang
Abstract
Purpose
This study aims to explore the formation mechanism of aerodynamic noise of a high-speed maglev train and understand the characteristics of dipole and quadrupole sound sources of the maglev train at different speed levels.
Design/methodology/approach
Based on the large eddy simulation (LES) method and the Kirchhoff–Ffowcs Williams and Hawkings (K-FWH) equations, the characteristics of the dipole and quadrupole sound sources of maglev trains at different speed levels were simulated and analyzed by constructing a suitable penetrable integral surface.
Findings
The spatial disturbance caused by boundary layer separation in the streamlined area of the tail car is the source of the maglev train's aerodynamic sound. The dipole sources are mainly distributed around the radio terminals of the head and tail cars, the bottom of the arms of the streamlined parts of the head and tail cars, and the nose tip area of the streamlined part of the tail car, while the quadrupole sources are mainly distributed in the wake area. When the train runs at 400, 500 and 600 km·h⁻¹, the quadrupole sources account for 62.4%, 63.3% and 71.7% of the radiated energy, respectively, exceeding the dipole sources.
Originality/value
This study can help understand the aerodynamic noise characteristics generated by the high-speed maglev train and provide a reference for the optimization design of its aerodynamic shape.