Search results

1 – 10 of over 1000
Article
Publication date: 5 September 2016

Runhai Jiao, Shaolong Liu, Wu Wen and Biying Lin


Abstract

Purpose

The large volume of big data makes traditional clustering algorithms, which are usually designed to process an entire data set at once, impractical. The purpose of this paper is to focus on incremental clustering, which divides data into a series of data chunks so that only a small amount of data needs to be clustered at a time. Few studies on incremental clustering address the problems of optimizing cluster center initialization for each data chunk and selecting multiple passing points for each cluster.

Design/methodology/approach

By optimizing the initial cluster centers, the quality of the clustering results for each data chunk is improved, which in turn enhances the quality of the final clustering results. Moreover, by selecting multiple passing points per cluster, more accurate information is carried forward to improve the final clustering results. A method is proposed to solve these two problems and is applied in an algorithm based on the streaming kernel fuzzy c-means (stKFCM) algorithm.
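The chunk-by-chunk scheme described above can be sketched as follows. This is an illustrative simplification, not the paper's stKFCM formulation: plain k-means stands in for kernel fuzzy c-means, each chunk is seeded with the previous chunk's centers, and the points nearest each center are carried forward as "passing points".

```python
import numpy as np

def cluster_chunk(data, k, init_centers=None, iters=20, seed=0):
    """Cluster one chunk with plain k-means, optionally seeded with
    centers carried over from the previous chunk."""
    rng = np.random.default_rng(seed)
    centers = (data[rng.choice(len(data), k, replace=False)]
               if init_centers is None else init_centers.copy())
    for _ in range(iters):
        # assign every point to its nearest center
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels

def passing_points(data, labels, centers, per_cluster=3):
    """Keep the few points closest to each center as 'passing points'
    that summarize the cluster for the next chunk."""
    reps = []
    for j, c in enumerate(centers):
        members = data[labels == j]
        if len(members):
            order = np.argsort(np.linalg.norm(members - c, axis=1))
            reps.append(members[order[:per_cluster]])
    return np.vstack(reps)

def incremental_cluster(chunks, k):
    """Process chunks in sequence: seed each round with the previous
    centers and append the previous passing points to the new chunk."""
    centers, reps = None, None
    for chunk in chunks:
        data = chunk if reps is None else np.vstack([chunk, reps])
        centers, labels = cluster_chunk(data, k, init_centers=centers)
        reps = passing_points(data, labels, centers)
    return centers
```

Only the current chunk plus a handful of representatives is ever held in memory, which is the point of the incremental design.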

Findings

Experimental results show that the proposed algorithm achieves higher accuracy and better performance than the stKFCM algorithm.

Originality/value

This paper addresses the problem of improving the performance of incremental clustering by optimizing cluster center initialization and selecting multiple passing points. The paper analyzes the performance of the proposed scheme and demonstrates its effectiveness.

Details

Kybernetes, vol. 45 no. 8
Type: Research Article
ISSN: 0368-492X

Keywords

Article
Publication date: 4 August 2021

Archana Yashodip Chaudhari and Preeti Mulay


Abstract

Purpose

To reduce electricity consumption in our homes, a first step is to make the user aware of it. Reading a meter once a month is not enough; real-time meter readings are required. A smart electricity meter (SEM) can provide quick and exact readings in real time at regular intervals. A SEM generates a considerable amount of household electricity consumption data incrementally. Such data embeds load patterns and hidden information from which consumer behavior can be extracted and learned. The load patterns extracted by data clustering should be updated because consumer behavior may change over time. The purpose of this study is to update the clustering results based on the old results rather than re-cluster all of the data from scratch.

Design/methodology/approach

This paper proposes an incremental clustering with nearness factor (ICNF) algorithm to update load patterns without overall daily load curve clustering.
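The idea of updating clusters without re-clustering can be sketched as below. The abstract does not define ICNF's nearness factor, so the distance-threshold test, the running-mean center update, and the `threshold` parameter are all illustrative assumptions.

```python
import numpy as np

class NearnessIncrementalClusterer:
    """Absorb new load vectors one at a time: a point joins the closest
    existing cluster when it lies within `threshold` of that cluster's
    center (the 'nearness' test); otherwise it seeds a new cluster.
    Centers are updated as running means, so old data is never revisited."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.centers = []   # one center per cluster
        self.counts = []    # points absorbed per cluster

    def update(self, point):
        point = np.asarray(point, dtype=float)
        if self.centers:
            dists = [np.linalg.norm(point - c) for c in self.centers]
            j = int(np.argmin(dists))
            if dists[j] <= self.threshold:
                # incremental mean update: no re-clustering of old data
                n = self.counts[j]
                self.centers[j] = (self.centers[j] * n + point) / (n + 1)
                self.counts[j] += 1
                return j
        self.centers.append(point)
        self.counts.append(1)
        return len(self.centers) - 1
```

Each daily load curve costs one pass over the current centers, which is what makes the approach attractive for a continuous SEM data stream.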

Findings

Extensive experiments are implemented on a real-world SEM data set from the Irish Social Science Data Archive (Ireland). The results are evaluated by both accuracy measures and clustering validity indices, which indicate that the proposed method is useful for exploiting the enormous amount of smart meter data to understand customers' electricity consumption behavior.

Originality/value

ICNF can provide an efficient response for electricity consumption patterns analysis to end consumers via SEMs.

Article
Publication date: 17 October 2008

Rui Xu and Donald C. Wunsch



Abstract

Purpose

The purpose of this paper is to provide a review of the issues related to cluster analysis, one of the most important and primitive activities of human beings, and of the advances made in recent years.

Design/methodology/approach

The paper investigates the clustering algorithms rooted in machine learning, computer science, statistics, and computational intelligence.

Findings

The paper reviews the basic issues of cluster analysis and discusses the recent advances of clustering algorithms in scalability, robustness, visualization, irregular cluster shape detection, and so on.

Originality/value

The paper presents a comprehensive and systematic survey of cluster analysis and emphasizes its recent efforts in order to meet the challenges caused by the glut of complicated data from a wide variety of communities.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 1 no. 4
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 26 June 2019

Mamta Kayest and Sanjay Kumar Jain


Abstract

Purpose

Document retrieval has become a hot research topic over the past few years and has received growing attention for browsing and synthesizing information from different documents. The purpose of this paper is to develop an effective document retrieval method that focuses on reducing the time the navigator needs to retrieve a whole document based on the contents, themes and concepts of documents.

Design/methodology/approach

This paper introduces an incremental learning approach for text categorization using a Monarch Butterfly optimization–FireFly optimization based Neural Network (MB–FF based NN). Initially, feature extraction is carried out on the pre-processed data using Term Frequency–Inverse Document Frequency (TF–IDF) and holoentropy to find the keywords of each document. Then, cluster-based indexing is performed using the MB–FF algorithm, and finally, document retrieval is done by a matching process with a modified Bhattacharyya distance measure. In the MB–FF based NN, the weights of the NN are chosen using the MB–FF algorithm.
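The TF–IDF keyword-extraction step named above can be illustrated with a minimal pure-Python sketch; the whitespace tokenizer, the raw term-frequency weighting and the `top_n` cutoff are simplifying assumptions, and the holoentropy component is omitted.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_n=3):
    """Score each term by TF-IDF and return the top keywords per document.
    TF = term count / document length; IDF = log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter()                      # in how many documents each term occurs
    for toks in tokenized:
        df.update(set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (tf[t] / len(toks)) * math.log(n / df[t]) for t in tf}
        results.append(sorted(scores, key=scores.get, reverse=True)[:top_n])
    return results
```

Terms that appear in every document get an IDF of zero, so only terms that discriminate between documents surface as keywords.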

Findings

The effectiveness of the proposed MB–FF based NN is proven with an improved precision of 0.8769, recall of 0.7957, F-measure of 0.8143 and accuracy of 0.7815.

Originality/value

The experimental results show that the proposed MB–FF based NN is useful to companies, which have a large workforce across the country.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 12 no. 3
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 26 April 2011

I‐Chin Wu



Abstract

Purpose

Seeking and retrieving information is an essential aspect of knowledge workers' activities during problem‐solving and decision‐making tasks. In recent years, user‐oriented Information Seeking (IS) research methods rooted in the social sciences have been integrated with Information Retrieval (IR) research approaches based on computer science to capitalize on the strengths of each field. Given this background, the objective is to develop a topic‐needs variation determination technique based on the observations of IS&R theories.

Design/methodology/approach

In this study, implicit and explicit methods for identifying users' evolving topic‐needs are proposed. Knowledge‐intensive tasks performed by academic researchers are used to evaluate the efficacy of the proposed methods. The paper conducted two sets of experiments to demonstrate and verify the importance of determining changes in topic‐needs during the IS&R process.

Findings

The results in terms of precision and discounted cumulated gain (DCG) values show that the proposed Stage‐Topic_W (G,S) and Stage‐Topic‐Interaction methods can retrieve relevant document sets for users engaged in long‐term tasks more efficiently and effectively than traditional methods.

Practical implications

The improved precision of the proposed methods means that they can retrieve more relevant documents for the searcher. Accordingly, the results of this research have implications for enhancing the search function in enterprise content management (ECM) applications to support the execution of projects/tasks by professionals and facilitate effective ECM.

Originality/value

The model observes a user's search behavior pattern to determine the personal factors (e.g. changes in the user's cognitive status), and content factors (e.g. changes in topic‐needs) simultaneously. The objective is to capture changes in the user's information needs precisely so that evolving information needs can be satisfied in a timely manner.

Details

Journal of Documentation, vol. 67 no. 3
Type: Research Article
ISSN: 0022-0418

Keywords

Article
Publication date: 23 August 2022

Kamlesh Kumar Pandey and Diwakar Shukla


Abstract

Purpose

The K-means (KM) clustering algorithm is extremely sensitive to the selection of initial centroids, since the initial centroids of the clusters determine computational effectiveness, efficiency and local optima issues. Numerous initialization strategies have been proposed to overcome these problems through random or deterministic selection of initial centroids. The random initialization strategy suffers from local optima and the worst clustering performance, while the deterministic initialization strategy incurs high computational cost. Big data clustering aims to reduce computational cost and improve cluster efficiency. The objective of this study is to achieve better initial centroids for big data clustering on business management data, without random or deterministic initialization, in a way that avoids local optima and improves clustering efficiency and effectiveness in terms of cluster quality, computational cost, data comparisons and iterations on a single machine.

Design/methodology/approach

This study presents the Normal Distribution Probability Density (NDPD) based KM algorithm (NDPDKM) for big data clustering on a single machine, to solve business management-related clustering issues. The NDPDKM algorithm resolves the KM initialization problem through the probability density of each data point. It first identifies the most probable density data points using the mean and standard deviation of the data set through the normal probability density. Thereafter, it determines the K initial centroids using sorting and linear systematic sampling heuristics.
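The density-then-sample initialization described above can be sketched as follows. The diagonal (per-feature) normal model, the ascending sort and the mid-stride sampling offsets are illustrative assumptions, since the abstract does not fix these details.

```python
import numpy as np

def ndpd_init(data, k):
    """Pick k initial centroids deterministically: score every point by
    its density under a normal model fitted with the data set's mean and
    standard deviation, sort by density, then take k points by linear
    systematic sampling over the sorted order."""
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)
    sigma = data.std(axis=0) + 1e-12          # guard against zero variance
    # per-point density under an axis-aligned normal model of the data
    pdf = np.exp(-0.5 * ((data - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    density = pdf.prod(axis=1)
    order = np.argsort(density)               # points sorted by density
    step = len(data) // k                     # linear systematic sampling
    idx = order[step // 2 :: step][:k]
    return data[idx]
```

Because no randomness is involved, the same data always yields the same seeds, avoiding the run-to-run variance of random initialization that the abstract criticizes.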

Findings

The performance of the proposed algorithm is compared with the KM, KM++, Var-Part, Murat-KM, Mean-KM and Sort-KM algorithms through the Davies–Bouldin score, Silhouette coefficient, SD validity, S_Dbw validity, number of iterations and CPU time validation indices on eight real business data sets. The experimental evaluation demonstrates that the NDPDKM algorithm reduces iterations, local optima and computing costs, and improves cluster performance, effectiveness and efficiency with stable convergence compared to the other algorithms. The NDPDKM algorithm reduces the average computing time by up to 34.83%, 90.28%, 71.83%, 92.67%, 69.53% and 76.03%, and the average number of iterations by up to 40.32%, 44.06%, 32.02%, 62.78%, 19.07% and 36.74%, with reference to the KM, KM++, Var-Part, Murat-KM, Mean-KM and Sort-KM algorithms, respectively.

Originality/value

The KM algorithm is the most widely used partitional clustering approach in data mining, extracting hidden knowledge, patterns and trends for decision-making strategies from business data. Business analytics is one application of big data clustering where KM clustering is useful for various subcategories such as customer segmentation analysis, employee salary and performance analysis, document searching, delivery optimization, discount and offer analysis, chaplain management, manufacturing analysis, productivity analysis, specialized employee and investor searching, and other decision-making strategies in business.

Article
Publication date: 5 August 2021

Farzad Kiani, Amir Seyyedabbasi and Sajjad Nematzadeh



Abstract

Purpose

Efficient resource utilization in wireless sensor networks is an important issue. The clustering structure has an important effect on the efficient use of energy, one of the most critical resources. However, it is vital to choose efficient and suitable cluster head (CH) elements in these structures to harness their benefits. Selecting appropriate CHs, and finding optimal coefficients for each parameter of the fitness function used in CH election, is an NP-hard problem that requires additional processing. Therefore, the purpose of this paper is to propose efficient solutions that achieve this main goal by addressing the related issues.

Design/methodology/approach

This paper draws inspiration from three metaheuristic-based algorithms: the gray wolf optimizer (GWO), incremental GWO and expanded GWO. These methods perform the various complex processes involved efficiently and quickly. They consist of a cluster setup phase and a data transmission phase. The first phase covers cluster formation and CH election, and the second phase finds routes for data transmission. CH selection uses a new fitness function based on four parameters: the energy of each node, the energy of its neighbors, the number of neighbors and the node's distance from the base station.
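A fitness function over those four parameters might look like the sketch below. The weights, the normalizations, the neighbor radius and the `elect_ch` greedy election are all illustrative assumptions; the paper's actual coefficients are what its GWO variants are tuned to find.

```python
import math

def ch_fitness(node, neighbors, base_station, w=(0.4, 0.3, 0.2, 0.1)):
    """Score a cluster-head candidate from the four factors named in the
    abstract: own residual energy, neighbors' mean energy, neighbor count,
    and distance to the base station (closer scores higher)."""
    w1, w2, w3, w4 = w
    e_self = node["energy"]
    e_nbr = sum(n["energy"] for n in neighbors) / max(len(neighbors), 1)
    n_nbr = len(neighbors)
    dist = math.dist(node["pos"], base_station)
    return w1 * e_self + w2 * e_nbr + w3 * n_nbr + w4 / (1.0 + dist)

def elect_ch(nodes, base_station, radius=10.0):
    """Elect the node with the highest fitness as the cluster head."""
    def nbrs(u):
        return [v for v in nodes if v is not u
                and math.dist(u["pos"], v["pos"]) <= radius]
    return max(nodes, key=lambda u: ch_fitness(u, nbrs(u), base_station))
```

Weighting residual energy most heavily is a common design choice in CH election, since a depleted head partitions the cluster.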

Findings

The results obtained from the proposed methods have been compared with the HEEL, EESTDC, iABC and NR-LEACH algorithms and are found to be successful across various analysis parameters. In particular, the I-HEELEx-GWO method provides the best results.

Originality/value

This paper proposes three new methods to elect optimal CHs that prolong the network's lifetime, save energy and improve the overhead and packet delivery ratio.

Details

Sensor Review, vol. 41 no. 4
Type: Research Article
ISSN: 0260-2288

Keywords

Article
Publication date: 12 September 2022

Adah-Kole Onjewu, Razieh Sadraei and Vahid Jafari-Sadeghi


Abstract

Purpose

In spite of wide civic and academic interest in obesity, there are no bibliometric records of this issue in the marketing corpus. Thus, this inquiry is conceived to address this shortcoming with a bibliometric analysis of Scopus indexed articles published on the subject.

Design/methodology/approach

The analysis followed a five-step science-mapping approach: study design, data collection, data analysis, data visualisation and data interpretation. The R programming software was used to review 88 peer-reviewed journal articles published between 1987 and 2021.

Findings

A sizable stream of literature exploring obesity has accrued in the marketing area as authors have drawn parallels between the influence of persuasive communication and advertising on human wellbeing and child health. The United States of America is found to be by far the country with the highest number of publications on obesity, followed by Australia and the United Kingdom. The topic dendrogram indicates two strands of obesity discourse: (1) social and policy intervention opportunities and (2) the effects on social groups in the population.

Research limitations/implications

This review will shape future enquiries investigating obesity. Beyond the focus on children, males and females, an emerging focus on cola, ethics, food waste, milk, policy-making and students is highlighted.

Originality/value

This is the first bibliometric review of obesity in the marketing literature. This is especially timely for weighing up the utility of research aimed at understanding and reporting the trends, influences and role of stakeholders in addressing obesity.

Details

EuroMed Journal of Business, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1450-2194

Keywords


Details

Self-Learning and Adaptive Algorithms for Business Applications
Type: Book
ISBN: 978-1-83867-174-7

Article
Publication date: 28 July 2020

Sathyaraj R, Ramanathan L, Lavanya K, Balasubramanian V and Saira Banu J


Abstract

Purpose

Big data is evolving so rapidly that conventional software tools face several problems in managing it. Moreover, the occurrence of imbalanced data in massive data sets is a major constraint for the research industry.

Design/methodology/approach

The purpose of the paper is to introduce a big data classification technique using the MapReduce framework based on an optimization algorithm. The classification is enabled by the MapReduce framework, which utilizes the proposed optimization algorithm, named the chicken-based bacterial foraging (CBF) algorithm. The CBF algorithm is generated by integrating the bacterial foraging optimization (BFO) algorithm with the cat swarm optimization (CSO) algorithm. The proposed model executes the process in two stages: a training phase and a testing phase. In the training phase, big data produced from different distributed sources is processed in parallel by the mappers, which perform pre-processing and feature selection based on the proposed CBF algorithm. The pre-processing step eliminates redundant and inconsistent data, whereas the feature selection step extracts the significant features from the pre-processed data to improve classification accuracy. The selected features are fed into the reducer for data classification using a deep belief network (DBN) classifier, which is trained with the CBF algorithm so that the data are classified into various classes; at the end of the training process, the individual reducers output the trained models. Incremental data are thus handled effectively based on the model from the training phase. In the testing phase, the incremental data are split into subsets and fed into the different mappers for classification. Each mapper holds a trained model obtained from the training phase and uses it to classify the incremental data. After classification, the outputs from the mappers are fused and fed into the reducer for the final classification.
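The testing-phase data flow (split, classify per mapper, fuse in the reducer) can be sketched without a Hadoop cluster as below. The round-robin split, the dictionary-merge "fusion" and the single shared model are simplifying assumptions standing in for the paper's distributed setup and CBF-trained DBN.

```python
def map_reduce_classify(data, model, n_mappers=4):
    """Map phase: split incoming (incremental) data into subsets and
    classify each subset with the trained model.
    Reduce phase: fuse the per-mapper outputs into one labeled result."""
    # round-robin split of the new data across the mappers
    subsets = [data[i::n_mappers] for i in range(n_mappers)]
    # each mapper applies the trained model to its own subset
    mapped = [[(x, model(x)) for x in subset] for subset in subsets]
    # the reducer fuses the mapper outputs
    fused = {}
    for part in mapped:
        fused.update(part)
    return fused
```

Because the subsets are disjoint, the mappers run independently and in parallel, which is the property the MapReduce framing buys for incremental batches.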

Findings

The maximum accuracy and Jaccard coefficient are obtained on the epileptic seizure recognition database. The proposed CBF-DBN produces a maximal accuracy value of 91.129%, whereas the accuracy values of the existing neural network (NN), DBN and naive Bayes classifier–term frequency–inverse document frequency (NBC-TFIDF) methods are 82.894%, 86.184% and 86.512%, respectively. The proposed CBF-DBN produces a maximal Jaccard coefficient of 88.928%, whereas the Jaccard coefficients of the existing NN, DBN and NBC-TFIDF are 75.891%, 79.850% and 81.103%, respectively.

Originality/value

In this paper, a big data classification method is proposed for categorizing massive data sets under the constraints of huge data volumes. The classification is performed on the MapReduce framework, with training and testing phases, so that the data are handled in parallel. In the training phase, the big data are partitioned into subsets and fed to the mappers, where feature extraction selects the significant features; the reducers then classify the data using these features with the DBN classifier, which is trained using the proposed CBF algorithm, and output the trained model. In the testing phase, new incremental data are split into subsets and fed to the mappers for classification using the trained models from the training phase; the classified results from each mapper are fused and fed into the reducer for the final classification of the big data.

Details

Data Technologies and Applications, vol. 55 no. 3
Type: Research Article
ISSN: 2514-9288

Keywords
