Search results
1 – 10 of over 6000
Archana Yashodip Chaudhari and Preeti Mulay
Abstract
Purpose
To reduce electricity consumption in our homes, a first step is to make the user aware of it. Reading a meter once a month is not enough; real-time meter reading is required. A smart electricity meter (SEM) can provide quick and exact meter readings in real time at regular intervals. SEMs generate a considerable amount of household electricity consumption data in an incremental manner. Such data contain embedded load patterns and hidden information from which consumer behavior can be extracted and learned. The load patterns extracted by data clustering should be updated because consumer behavior may change over time. The purpose of this study is to update the clustering results incrementally, building on the old data, rather than re-clustering all of the data from scratch.
Design/methodology/approach
This paper proposes an incremental clustering with nearness factor (ICNF) algorithm to update load patterns without overall daily load curve clustering.
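The general idea of updating clusters without re-clustering everything can be illustrated with a minimal sketch. This is not the ICNF algorithm: the abstract does not define the nearness factor, so the sketch assumes a plain Euclidean distance with a fixed threshold instead, and the function name `incremental_update` and its parameters are hypothetical.

```python
import math

def incremental_update(centroids, counts, curve, threshold):
    """Absorb one new daily load curve into existing clusters.

    Assumption: closeness is plain Euclidean distance with a fixed
    threshold, standing in for the paper's nearness factor.
    Returns the index of the cluster the curve was assigned to.
    """
    best, best_dist = None, math.inf
    for i, centroid in enumerate(centroids):
        d = math.dist(centroid, curve)
        if d < best_dist:
            best, best_dist = i, d

    # No cluster is close enough: open a new one seeded with this curve.
    if best is None or best_dist > threshold:
        centroids.append(list(curve))
        counts.append(1)
        return len(centroids) - 1

    # Otherwise update the running mean of the nearest cluster in place,
    # so earlier curves never need to be revisited.
    counts[best] += 1
    n = counts[best]
    centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], curve)]
    return best
```

Calling the function once per incoming curve keeps the cost of each update proportional to the number of clusters rather than to the size of the accumulated data.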
Findings
Extensive experiments are conducted on real-world SEM data from the Irish Social Science Data Archive (Ireland) data set. The results are evaluated by both accuracy measures and clustering validity indices, which indicate that the proposed method is useful for exploiting the enormous amount of smart meter data to understand customers’ electricity consumption behaviors.
Originality/value
ICNF can provide end consumers with an efficient analysis of electricity consumption patterns via SEMs.
Runhai Jiao, Shaolong Liu, Wu Wen and Biying Lin
Abstract
Purpose
The large volume of big data makes traditional clustering algorithms, which are usually designed for the entire data set, impractical. The purpose of this paper is to focus on incremental clustering, which divides data into a series of data chunks so that only a small amount of data needs to be clustered at a time. Few studies on incremental clustering algorithms address the problems of optimizing cluster center initialization for each data chunk and selecting multiple passing points for each cluster.
Design/methodology/approach
By optimizing the initial cluster centers, the quality of the clustering results is improved for each data chunk, which in turn enhances the quality of the final clustering results. Moreover, by selecting multiple passing points, more accurate information is passed down to improve the final clustering results. A method is proposed to solve these two problems and is applied in the proposed algorithm, which is based on the streaming kernel fuzzy c-means (stKFCM) algorithm.
Findings
Experimental results show that the proposed algorithm is more accurate and performs better than the stKFCM algorithm.
Originality/value
This paper addresses the problem of improving the performance of incremental clustering through optimizing cluster center initialization and selecting multiple passing points. The paper analyzed the performance of the proposed scheme and proved its effectiveness.
The purpose of this paper is to provide a review of the issues related to cluster analysis, one of the most important and primitive activities of human beings, and of the advances…
Abstract
Purpose
The purpose of this paper is to provide a review of the issues related to cluster analysis, one of the most important and primitive activities of human beings, and of the advances made in recent years.
Design/methodology/approach
The paper investigates the clustering algorithms rooted in machine learning, computer science, statistics, and computational intelligence.
Findings
The paper reviews the basic issues of cluster analysis and discusses the recent advances of clustering algorithms in scalability, robustness, visualization, irregular cluster shape detection, and so on.
Originality/value
The paper presents a comprehensive and systematic survey of cluster analysis and emphasizes its recent efforts in order to meet the challenges caused by the glut of complicated data from a wide variety of communities.
Gian Luca Casali, Mirko Perano, Angelo Presenza and Tindara Abbate
Abstract
Purpose
The aim of this paper is to analyze the relationships between distribution strategies and the level of innovation propensity in the winemaking industry. It intends to identify the existence of patterns around the way wineries innovate and the way distribution channels are used. These determinants can support or constrain wineries’ behaviors in their strategic choices related to distribution channels.
Design/methodology/approach
The sample comprised 191 Italian small- to medium-sized enterprises in the wine industry. First, a two-step cluster analysis was used to identify patterns in the level of innovation propensity and differences in distribution channel strategies. Second, the research question was tested using multinomial logit regression.
Findings
Five clusters of innovation propensity were identified, varying from “no propensity to innovate” to “propensity for radical innovation”, and three clusters of distribution channel strategies were found. A significant negative relationship between innovation propensity and distribution channel strategies was revealed. This means that the greater the propensity to innovate, the smaller the need for a wholesale distribution option.
Research limitations/implications
As with most research, there are limitations to this study. First, the sample is from only one country. A second limitation is the sample size (191 Italian firms). A sample including large firms can be used to further validate the findings. Linked to the sample, another possible limitation is that all respondents were small- and medium-sized enterprises from a single industry.
Practical implications
This study contributes to the current innovation research by showing the existence of a negative relationship between innovation propensity and the choice of distribution channel in the wine industry. This knowledge is valuable to entrepreneurs and managers in the wine sector, allowing them to better consider not only the type of strategies related to distribution channels but also the importance of building the firm’s propensity to innovate into the strategic decision-making process. Furthermore, the paper provides an opportunity for practitioners to reflect upon the fact that changing the distribution channel is more than just changing the outlet for their product; it might also require a revision in their innovation propensity to better facilitate the process.
Social implications
There are also social implications, in particular an advantage for consumers. The major advantage is that consumers are now aware that the level of innovation propensity in the wine industry is directly linked to the type of distribution channel adopted. Wineries with low innovation propensity are most likely to adopt a wholesale distribution strategy, while the more innovative wineries adopt the wine expert and direct distribution channels.
Originality/value
For the first time, a cluster analysis approach was used to review different typologies of Italian wineries based on their propensity toward innovation and subsequent distribution strategies. This study further explains the direct relationship between innovation propensity and the strategic choice between long and short distribution channels.
Maria Soledad Pera and Yiu‐Kai Ng
Abstract
Purpose
Tens of thousands of news articles are posted online each day, covering topics from politics to science to current events. To better cope with this overwhelming volume of information, RSS (news) feeds are used to categorize newly posted articles. Nonetheless, most RSS users must filter through many articles within the same or different RSS feeds to locate articles pertaining to their particular interests. Due to the large number of news articles in individual RSS feeds, there is a need for further organizing articles to aid users in locating non‐redundant, informative, and related articles of interest quickly. This paper aims to address these issues.
Design/methodology/approach
The paper presents a novel approach which uses the word‐correlation factors in a fuzzy set information retrieval model to: filter out redundant news articles from RSS feeds; shed less‐informative articles from the non‐redundant ones; and cluster the remaining informative articles according to the fuzzy equivalence classes on the news articles.
Findings
The clustering approach requires little overhead or computational costs, and experimental results have shown that it outperforms other existing, well‐known clustering approaches.
Research limitations/implications
The clustering approach as proposed in this paper applies only to RSS news articles; however, it can be extended to other application domains.
Originality/value
The developed clustering tool is highly efficient and effective in filtering and classifying RSS news articles and does not employ any labor‐intensive user‐feedback strategy. Therefore, it can be implemented in real‐world RSS feeds to aid users in locating RSS news articles of interest.
Mamta Kayest and Sanjay Kumar Jain
Abstract
Purpose
Document retrieval has become a hot research topic over the past few years and has received growing attention for browsing and synthesizing information from different documents. The purpose of this paper is to develop an effective document retrieval method that reduces the time needed for the navigator to evoke the whole document based on the contents, themes and concepts of documents.
Design/methodology/approach
This paper introduces an incremental learning approach for text categorization using Monarch Butterfly optimization–FireFly optimization based Neural Network (MB–FF based NN). Initially, feature extraction is carried out on the pre-processed data using Term Frequency–Inverse Document Frequency (TF–IDF) and holoentropy to find the keywords of the document. Then, cluster-based indexing is performed using the MB–FF algorithm, and finally, document retrieval is done by a matching process with the modified Bhattacharyya distance measure. In the MB–FF based NN, the weights of the NN are chosen using the MB–FF algorithm.
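The TF–IDF step mentioned above can be sketched as follows. The abstract does not say which TF–IDF variant is used, so this sketch assumes a common textbook form: term frequency normalized by document length, multiplied by the natural logarithm of N over the document frequency. It does not cover holoentropy or the MB–FF stages, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    Returns one {term: weight} dictionary per document.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (tf[t] / len(doc)) * math.log(n_docs / df[t]) for t in tf}
        )
    return weights
```

Under this variant, a term that occurs in every document receives a weight of zero, so the highest-weighted terms in each dictionary are the natural keyword candidates.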
Findings
The effectiveness of the proposed MB–FF based NN is demonstrated by an improved precision of 0.8769, recall of 0.7957, F-measure of 0.8143 and accuracy of 0.7815.
Originality/value
The experimental results show that the proposed MB–FF based NN is useful to companies, which have a large workforce across the country.
Nils Grashof, Alexander Kopka, Colin Wessendorf and Dirk Fornahl
Abstract
Purpose
This paper aims to show the interaction effects between clusters and cluster-specific attributes and the industrial internet of things (IoT) knowledge of a firm on the innovativeness of firms. Cluster theory and the concept of key enabling technologies are linked to test their effect on a firm’s incremental and radical knowledge generation.
Design/methodology/approach
Quantitative approach at the firm-level. By combining several data sources (e.g. ORBIS, PATSTAT and German subsidy catalogue) the paper relies on a unique database encompassing 8,347 firms in Germany. Ordinary least squares (OLS)-regression techniques are used for data analysis.
Findings
Industrial IoT is an important driver of radical patents, mediated positively by firm size. For incremental knowledge, a substitution effect occurs between a cluster and IoT effects, which is bigger for larger firms and dependent on cluster attributes and firms’ outside connections.
Research limitations/implications
The paper opens up new research paths considering long-term disruptive effects of the industrial IoT compared to short-term effects on the innovativeness of firms within clusters. Additionally, it enables further research enriching the discussion about cluster attributes and how these affect ongoing processes.
Practical implications
Linking cluster theory and policy with Industry 4.0 raises awareness for being considerate in terms of funding and scrutinising one-size-fits-all approaches.
Originality/value
Connecting the concepts of a cluster and advanced manufacturing technologies as a proxy for industrial IoT, with a specific focus on both radical and incremental innovations, is a new approach. In particular, the paper takes into account the interaction effects between cluster attributes and the influence of industrial IoT on the innovativeness of firms.
Prafulla Bafna, Dhanya Pramod, Shailaja Shrwaikar and Atiya Hassan
Abstract
Purpose
Document management is growing in importance in proportion to the growth of unstructured data, and its applications are expanding from process benchmarking to customer relationship management and so on. The purpose of this paper is to improve two important components of document management, namely keyword extraction and document clustering. This is achieved through knowledge extraction by updating the phrase document matrix. The objective is to manage documents by extending the phrase document matrix and to achieve refined clusters. The study achieves consistency in cluster quality in spite of the increasing size of the data set. The domain independence of the proposed method is tested and compared with other methods.
Design/methodology/approach
In this paper, a synset-based phrase document matrix construction method is proposed in which semantically similar phrases are grouped to mitigate the curse of dimensionality. A large collection of documents to be processed includes some documents that are closely related to the topic of interest, known as model documents, as well as documents that deviate from the topic of interest. These non-relevant documents may affect the cluster quality. The first step in knowledge extraction from unstructured textual data is converting it into structured form, either as a term frequency-inverse document frequency matrix or as a phrase document matrix. Once in structured form, a range of mining algorithms from classification to clustering can be applied.
Findings
In the enhanced approach, the model documents are used to extract key phrases with synset groups, whereas the other documents participate in the construction of the feature matrix. It gives a better feature vector representation and improved cluster quality.
Research limitations/implications
Various applications that require managing of unstructured documents can use this approach by specifically incorporating the domain knowledge with a thesaurus.
Practical implications
An experiment pertaining to the academic domain is presented that categorizes research papers according to context and topic, which will help academicians to organize and build knowledge in a better way. The grouping and feature extraction for resume data can facilitate the candidate selection process.
Social implications
Applications like knowledge management, clustering of search engine results, and different recommender systems such as hotel recommenders and task recommenders will benefit from this study. Hence, the study contributes to improving document management in business domains or areas of interest of its users from various strata of society.
Originality/value
The study proposed an improvement to the document management approach that can be applied in various domains. The efficacy of the proposed approach and its enhancement is validated on three different data sets of well-articulated documents, namely biography, resume and research paper data sets. These results can be used for benchmarking further work carried out in these areas.
Jie Zhu, Jing Yang, Shaoning Di, Jiazhu Zheng and Leying Zhang
Abstract
Purpose
Spatial and non-spatial attributes are the two important characteristics of a spatial point, and they belong to two different attribute domains in many Geographic Information Systems applications. Dual clustering algorithms take into account both spatial and non-spatial attributes, so that a cluster has not only high proximity in the spatial domain but also high similarity in the non-spatial domain. In a geographical dataset, traditional dual spatial clustering algorithms discover homogeneous, spatially adjacent clusters but suffer from between-cluster inhomogeneity when the spatial points are described in the non-spatial domain. To overcome this limitation, a novel dual-domain clustering algorithm (DDCA) is proposed that considers both spatial proximity and attribute similarity in the presence of inhomogeneity.
Design/methodology/approach
In this algorithm, Delaunay triangulation with edge length constraints is first employed to construct spatial proximity relationships amongst objects. Then, a clustering strategy based on statistical change detection is designed to obtain clusters with similar attributes.
Findings
The effectiveness and practicability of the proposed algorithm are illustrated by experiments on both simulated datasets and real spatial events. It is found that the proposed algorithm can adaptively and accurately detect clusters with spatial proximity and similar non-spatial attributes under the consideration of inhomogeneity.
Originality/value
Traditional dual spatial clustering algorithms discover homogeneous, spatially adjacent clusters but suffer from between-cluster inhomogeneity when the spatial points are described in the non-spatial domain. The research here contributes to developing a dual spatial clustering method that considers both spatial proximity and attribute similarity in the presence of inhomogeneity. The detection of these clusters is useful for understanding the local patterns of geographical phenomena, in applications such as land use classification, spatial patterns research and big geo-data analysis.
Vassiliki A. Koutsonikola, Sophia G. Petridou, Athena I. Vakali and Georgios I. Papadimitriou
Abstract
Purpose
Web users' clustering is an important mining task since it contributes to identifying usage patterns, a beneficial task for a wide range of applications that rely on the web. The purpose of this paper is to examine the usage of Kullback‐Leibler (KL) divergence, an information theoretic distance, as an alternative option for measuring distances in web users' clustering.
Design/methodology/approach
KL‐divergence is compared with other well‐known distance measures and clustering results are evaluated using a criterion function, validity indices, and graphical representations. Furthermore, the impact of noise (i.e. occasional or mistaken page visits) is evaluated, since it is imperative to assess whether a clustering process exhibits tolerance in noisy environments such as the web.
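For readers unfamiliar with the measure, a minimal sketch of KL-divergence between two users' page-visit distributions follows. It assumes each user is represented by visit counts over the same list of pages, and it floors zero probabilities in the second distribution at a small `eps` to keep the logarithm finite. Both choices are illustrative assumptions rather than the authors' setup, and the function names are hypothetical.

```python
import math

def to_distribution(visits):
    """Normalize raw page-visit counts into a probability distribution."""
    total = sum(visits)
    return [v / total for v in visits]

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) in nats.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps
    (an assumption made here) so the logarithm stays finite.
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )
```

KL-divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; a clustering procedure has to fix a direction or symmetrize the measure, and the abstract does not state which option the authors take.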
Findings
The proposed KL clustering approach performs similarly to other distance measures under both synthetic and real data workloads. Moreover, when extra noise is imposed on real data, the approach deteriorates less than most of the other conventional distance measures.
Practical implications
The experimental results show that a probabilistic measure such as KL‐divergence is quite efficient in noisy environments and thus constitutes a good alternative for the web users' clustering problem.
Originality/value
This work is inspired by the usage of divergence in clustering of biological data and it is introduced by the authors in the area of web clustering. According to the experimental results presented in this paper, KL‐divergence can be considered as a good alternative for measuring distances in noisy environments such as the web.