Search results

1 – 10 of over 3000
Article
Publication date: 1 November 2005

Yue‐Shi Lee, Show‐Jane Yen and Min‐Chi Hsieh

Abstract

Web mining applies data mining techniques to large amounts of web data in order to improve web services. Web traversal pattern mining discovers most of the users’ access patterns from web logs. This information can provide navigation suggestions for web users so that appropriate actions can be taken. However, web data grow rapidly over a short time, and some of the data may become outdated. User behavior may change when new data are inserted into, and old data are deleted from, the web logs. Moreover, it is difficult to select a suitable minimum support threshold during the mining process to find interesting rules. Even experienced experts cannot determine the appropriate minimum support, so the threshold must be adjusted repeatedly until satisfactory mining results are found. The essence of incremental or interactive data mining is that previous mining results can be reused to avoid unnecessary processing when the minimum support changes or the web logs are updated. In this paper, we propose efficient incremental and interactive data mining algorithms that discover web traversal patterns and make the mining results satisfy the users’ requirements. The experimental results show that our algorithms are more efficient than the other approaches.
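
The idea of reusing earlier results when the log or the threshold changes can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the class name, the restriction to contiguous sub-paths and the `max_len` cap are assumptions made only for illustration. Per-pattern session counts are kept between updates, so inserting or deleting a session touches only that session's patterns, and a new minimum support is answered from the stored counts without rescanning the log.

```python
from collections import Counter


class IncrementalPatternCounter:
    """Hypothetical helper: keeps per-pattern session counts between updates."""

    def __init__(self, max_len=3):
        self.max_len = max_len      # longest sub-path tracked (assumption)
        self.counts = Counter()     # pattern -> number of sessions containing it
        self.n_sessions = 0

    def _patterns(self, session):
        # Distinct contiguous sub-paths of one session, up to max_len pages.
        found = set()
        for i in range(len(session)):
            for j in range(i + 1, min(i + self.max_len, len(session)) + 1):
                found.add(tuple(session[i:j]))
        return found

    def add(self, session):
        # New log entry: only this session's patterns are touched.
        self.counts.update(self._patterns(session))
        self.n_sessions += 1

    def remove(self, session):
        # Outdated log entry: undo exactly what add() did for it.
        self.counts.subtract(self._patterns(session))
        self.n_sessions -= 1

    def frequent(self, min_support):
        # Interactive query: a changed threshold reuses the stored counts.
        if self.n_sessions == 0:
            return {}
        return {
            pattern: count / self.n_sessions
            for pattern, count in self.counts.items()
            if count > 0 and count / self.n_sessions >= min_support
        }
```

Calling `frequent` repeatedly with different thresholds corresponds to the interactive setting, while `add` and `remove` correspond to the incremental one.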

Details

International Journal of Web Information Systems, vol. 1 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 30 April 2021

Shengyu Guo, Yujia Zhao, Yuqiu Luoren, Kongzheng Liang and Bing Tang

Abstract

Purpose

Knowledge discovery related to unsafe behaviors promotes the performance of accident prevention in construction. Although numerous studies on accident causation models have discussed the correlations of unsafe behaviors with various factors (e.g., unsafe conditions), limited research explores correlations between unsafe behaviors within accidents. The purpose of this paper is to mine strong association rules of unsafe behaviors from historical accidents to clarify this kind of tacit knowledge.

Design/methodology/approach

A case study was adopted as the research approach, in which accident records from building and urban railway construction in China were selected as data resources. The groups of unsafe behaviors extracted from accident records were expressed by the definitions of unsafe behaviors from safety regulations and operating procedures. The Frequent Pattern (FP)-Growth algorithm was used for association rule mining, and the critical correlations between unsafe behaviors were represented by the effective strong rules.
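
What makes a rule "strong" can be shown with a small Python sketch that computes support and confidence for pairwise rules. It deliberately uses brute-force pair counting rather than FP-Growth, and the function name, the thresholds and the behavior labels in the usage note are hypothetical, not taken from the study.

```python
from collections import Counter
from itertools import combinations


def pair_rules(records, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for strong pairwise rules.

    Brute-force illustration only; FP-Growth finds the same frequent
    itemsets without enumerating every pair.
    """
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = sorted(set(record))          # ignore repeats within a record
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                  # share of records with both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

For example, with four made-up accident records in which `no_harness` appears twice and always together with `no_helmet`, `pair_rules(records, 0.5, 0.9)` keeps the rule `no_harness -> no_helmet` and rejects the reverse direction, because `no_helmet` also occurs without `no_harness`.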

Findings

The findings identify and distinguish correlations between unsafe behaviors within construction accidents. In building construction, workers and managers should pay attention to preventing unsafe behaviors related to personal protective equipment and machines and equipment. In urban railway construction, workers should especially avoid unsafe behaviors of inadequately dealing with environmental factors.

Practical implications

Tacit knowledge is transferred to explicit knowledge as the critical correlations between unsafe behaviors within accidents are determined by the effective strong rules. Additionally, the findings provide practical guidance for safety management, to collaboratively control unsafe behaviors with strong correlations.

Originality/value

This study contributes to the body of safety knowledge in construction and provides a further understanding of how construction accidents are caused by multiple unsafe behaviors.

Details

Engineering, Construction and Architectural Management, vol. 29 no. 4
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 2 July 2020

N. Venkata Sailaja, L. Padmasree and N. Mangathayaru

Abstract

Purpose

Text mining has been used for various knowledge discovery based applications, and thus, a lot of research has been contributed towards it. A current research trend in text mining is incremental learning, as it is economical when dealing with large volumes of information.

Design/methodology/approach

The primary intention of this research is to design and develop a technique for incremental text categorization using an optimized Support Vector Neural Network (SVNN). The proposed technique involves four major steps: pre-processing, feature extraction, feature selection and classification. Initially, the data are pre-processed by stop word removal and stemming. Then, feature extraction is done by extracting semantic word-based features and Term Frequency–Inverse Document Frequency (TF-IDF). From the extracted features, the important features are selected using the Bhattacharyya distance measure and supplied as input to the proposed classifier. The proposed classifier performs incremental learning using SVNN, wherein the weights are bounded within a limit using rough set theory. Moreover, the Moth Search (MS) algorithm is used for the optimal selection of weights in SVNN. Thus, the proposed classifier, named Rough set MS-SVNN, performs text categorization on the incremental input data.
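
The TF-IDF feature mentioned above can be sketched in a few lines of Python. The abstract does not say which TF-IDF variant is used, so this sketch assumes term frequency normalized by document length and an unsmoothed inverse document frequency ln(N/df); the function name is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))            # each document counts once per term

    weights = []
    for doc in docs:
        term_counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights
```

Under this variant a term that occurs in every document gets weight zero, which is why stop word removal and TF-IDF weighting tend to suppress the same uninformative words.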

Findings

For the experimentation, the 20 Newsgroups dataset and the Reuters dataset are used. Simulation results indicate that the proposed Rough set based MS-SVNN has achieved 0.7743, 0.7774 and 0.7745 for the precision, recall and F-measure, respectively.
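
For reference, the three reported metrics can be computed from confusion counts as in the following Python sketch. It covers the single-class (binary) definitions only; how the study averages over categories or datasets is not stated in the abstract, and the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Binary precision, recall and F-measure from confusion counts.

    tp, fp, fn: true positives, false positives, false negatives.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```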

Originality/value

In this paper, an online incremental learner is developed for text categorization. The text categorization is done by developing the Rough set MS-SVNN classifier, which classifies the incoming texts based on the boundary condition evaluated by rough set theory and the optimal weights from MS. The proposed online text categorization scheme consists of four basic steps: pre-processing, feature extraction, feature selection and classification. Pre-processing identifies the unique words in the dataset, and features such as semantic word-based features and TF-IDF are obtained from the keyword set. Feature selection is done by setting a minimum Bhattacharyya distance measure, and the selected features are provided to the proposed Rough set MS-SVNN for classification.

Details

Data Technologies and Applications, vol. 54 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 5 September 2016

Runhai Jiao, Shaolong Liu, Wu Wen and Biying Lin

Abstract

Purpose

The large volume of big data makes traditional clustering algorithms, which are usually designed for an entire data set, impractical. The purpose of this paper is to focus on incremental clustering, which divides data into a series of data chunks so that only a small amount of data needs to be clustered at a time. Few studies on incremental clustering algorithms address the problems of optimizing cluster center initialization for each data chunk and selecting multiple passing points for each cluster.

Design/methodology/approach

Optimizing the initial cluster centers improves the quality of the clustering results for each data chunk and thereby the quality of the final clustering results. Moreover, selecting multiple passing points passes down more accurate information to improve the final clustering results. A method is proposed to solve these two problems and is applied in the proposed algorithm, which is based on the streaming kernel fuzzy c-means (stKFCM) algorithm.
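
As background for the fuzzy c-means family, the following Python sketch shows the standard membership update for a single point. It is the plain Euclidean version, not the kernelized streaming algorithm of the paper, and it leaves out center initialization and passing points; the function name and the default fuzzifier `m=2.0` are assumptions.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with Euclidean distances
    d_i from the point to each center and fuzzifier m > 1.
    """
    dists = [math.dist(point, center) for center in centers]

    # A point sitting exactly on one or more centers belongs only to those.
    on_center = [i for i, d in enumerate(dists) if d == 0.0]
    if on_center:
        share = 1.0 / len(on_center)
        return [share if i in on_center else 0.0 for i in range(len(centers))]

    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]
```

The memberships of a point always sum to one, and a point closer to a center receives a larger share. This is why good initial centers for each chunk matter for the quality of the result.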

Findings

Experimental results show that the proposed algorithm achieves higher accuracy and better performance than the stKFCM algorithm.

Originality/value

This paper addresses the problem of improving the performance of incremental clustering through optimizing cluster center initialization and selecting multiple passing points. The paper analyzed the performance of the proposed scheme and proved its effectiveness.

Details

Kybernetes, vol. 45 no. 8
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 8 February 2021

Thiago Cesar de Oliveira, Lúcio de Medeiros and Daniel Henrique Marco Detzel

Abstract

Purpose

Real estate appraisals are becoming an increasingly important means of backing up financial operations based on the values of these kinds of assets. However, in very large databases, there is a reduction in the predictive capacity when traditional methods, such as multiple linear regression (MLR), are used. This paper aims to determine whether in these cases the application of data mining algorithms can achieve superior statistical results. First, real estate appraisal databases from five towns and cities in the State of Paraná, Brazil, were obtained from Caixa Econômica Federal bank.

Design/methodology/approach

After initial validations, additional databases were generated with real, transformed and nominal values, in clean and raw data. A wide range of data mining algorithms (multilayer perceptron, support vector regression, K-star, M5Rules and random forest) was applied to each, either isolated or combined (regression by discretization – logistic, bagging and stacking), with the use of 10-fold cross-validation in Weka software.
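
The 10-fold cross-validation protocol can be sketched in Python as follows. This is a generic illustration of how the folds partition the data, not a reproduction of Weka's implementation: the function name is hypothetical, and the sketch performs no shuffling or stratification.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k consecutive folds.

    Every sample lands in exactly one test fold; fold sizes differ
    by at most one when n_samples is not a multiple of k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

A model is fitted on each `train` list and scored on the matching `test` list, and the ten scores are then averaged.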

Findings

The results showed more varied incremental statistical results with the use of algorithms than those obtained by MLR, especially when combined algorithms were used. The largest increments were obtained in databases with a large amount of data and in those where minor initial data cleaning was carried out. The paper also conducts a further analysis, including an algorithmic ranking based on the number of significant results obtained.

Originality/value

The authors did not find similar studies or research studies conducted in Brazil.

Details

International Journal of Housing Markets and Analysis, vol. 14 no. 5
Type: Research Article
ISSN: 1753-8270

Article
Publication date: 15 May 2019

Usha Manasi Mohapatra, Babita Majhi and Alok Kumar Jagadev

Abstract

Purpose

The purpose of this paper is to propose three different distributed learning-based metaheuristic algorithms for the identification of nonlinear systems. The proposed algorithms are tested in this study on problems for which input data are available at different geographic locations. In addition, the models are tested for nonlinear systems with different noise conditions. In a nutshell, the suggested model aims to handle voluminous data with low communication overhead compared to traditional centralized processing methodologies.

Design/methodology/approach

Population-based evolutionary algorithms such as genetic algorithm (GA), particle swarm optimization (PSO) and cat swarm optimization (CSO) are implemented in a distributed form to address the system identification problem having distributed input data. Out of different distributed approaches mentioned in the literature, the study has considered incremental and diffusion strategies.

Findings

The performance of the proposed distributed learning-based algorithms is compared for different noise conditions. The experimental results indicate that CSO performs better than GA and PSO at all noise strengths with respect to accuracy and error convergence rate, and that incremental CSO is slightly superior to diffusion CSO.

Originality/value

This paper employs evolutionary algorithms using distributed learning strategies and applies these algorithms to the identification of unknown systems. Very few existing studies have tested these distributed learning strategies on the parameter estimation task.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 12 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 20 September 2021

Viktor Prokop, Jan Stejskal, Beata Mikusova Merickova and Samuel Amponsah Odei

Abstract

Purpose

The purpose of this study is to introduce innovative ideas into the treatment of radical and incremental innovations and to fill the research gap by using: (1) methods that can perform complicated tasks and solve complex problems leading to the creation of radical and incremental innovation and (2) a broad sample of firms across countries. The authors’ ambition is to contribute to scientific knowledge by producing evidence about the novel usage of artificial neural network techniques for measuring European firms' innovation activities appearing in black boxes of innovation processes.

Design/methodology/approach

In this study, the authors incorporate an international context into Chesbrough's open innovation (OI) theory and, on the one hand, support the hypothesis that European radical innovators benefit more from foreign cooperation than incremental innovators. On the other hand, the results of the analyses show that European incremental innovators rely on domestic cooperation supported by cooperation with foreign public research institutes. Moreover, the use of decision trees (DT) allows the authors to reveal specific patterns of successful innovators emerging within the hidden layers of neural networks.

Findings

The authors prove that radical European innovators use either internal or external R&D strategies, whereas combinations of these strategies do not bring successful innovation outputs. In contrast, European incremental innovators benefit from various internal R&D processes in which engagement in design activities plays a crucial role.

Originality/value

The authors introduce innovative ideas into the treatment of hidden innovation processes and measuring the innovation performance (affected by domestic or international cooperation) of European firms. The approach places emphasis on the novelty of innovation and the issue of international cooperation in the era of OI by designing the framework using a combination of artificial neural networks and DT.

Details

European Journal of Innovation Management, vol. 26 no. 2
Type: Research Article
ISSN: 1460-1060

Article
Publication date: 22 June 2022

Gang Yao, Xiaojian Hu, Liangcheng Xu and Zhening Wu

Abstract

Purpose

Social media data from financial websites contain information related to enterprise credit risk. Mining valuable new features in social media data helps to improve prediction performance. This paper proposes a credit risk prediction framework that integrates social media information to improve listed enterprise credit risk prediction in the supply chain.

Design/methodology/approach

The prediction framework includes four stages. First, social media information is obtained through web crawler technology. Second, text sentiment in social media information is mined through natural language processing. Third, text sentiment features are constructed. Finally, the new features are integrated with traditional features as input for credit risk prediction models. This paper takes Chinese pharmaceutical enterprises as an example to test the prediction framework and derive relevant managerial insights.

Findings

The prediction framework can improve enterprise credit risk prediction performance. The prediction performance of text sentiment features in social media data is better than that of most traditional features. The time-weighted text sentiment feature has the best prediction performance in mining social media information.

Practical implications

The prediction framework is helpful for the credit decision-making of credit departments and the policy regulation of regulatory departments and is conducive to the sustainable development of enterprises.

Originality/value

The prediction framework can effectively mine social media information and achieve excellent prediction performance for listed enterprise credit risk in the supply chain.

Article
Publication date: 24 August 2021

K. Sujatha and V. Udayarani

Abstract

Purpose

The purpose of this paper is to improve privacy in healthcare datasets that hold sensitive information. Preventing privacy disclosure and providing relevant information to legitimate users are conflicting goals. The swift evolution of big data has also brought considerable convenience to everyday tasks. In the big data era, propagation and information sharing are the two main facets. Despite several research works on these aspects, the incremental nature of data substantially increases the likelihood of privacy leakage alongside the benefits of big data. Hence, safeguarding data privacy in a complicated environment has become a major challenge.

Design/methodology/approach

In this study, a method called deep restricted additive homomorphic ElGamal privacy preservation (DR-AHEPP) is proposed to preserve the privacy of data even when the data are incremental. Entropy-based differential privacy quasi-identification and DR-AHEPP algorithms are designed, respectively, to obtain a privacy-preserved minimum falsified quasi-identifier set and computationally efficient privacy-preserved data.
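
The additive homomorphic property that the method's name refers to can be demonstrated with textbook "exponential" ElGamal, as in the Python sketch below. This is not DR-AHEPP: the group parameters are toy values chosen for readability (far too small for real use), all function names are hypothetical, and messages must stay below the subgroup order 233 for decryption to be unambiguous.

```python
import secrets

# Toy parameters (assumption, insecure): P = 2*Q + 1 is a safe prime and
# G = 4 generates the subgroup of prime order Q in the integers modulo P.
P, Q, G = 467, 233, 4


def keygen():
    secret = secrets.randbelow(Q - 1) + 1        # secret key in 1..Q-1
    return secret, pow(G, secret, P)             # (secret key, public key)


def encrypt(public_key, message):
    r = secrets.randbelow(Q - 1) + 1             # fresh randomness per ciphertext
    # The message is placed in the exponent, which makes the scheme additive.
    return pow(G, r, P), (pow(G, message, P) * pow(public_key, r, P)) % P


def add(c1, c2):
    # Component-wise product of ciphertexts encrypts the sum of the messages.
    return (c1[0] * c2[0]) % P, (c1[1] * c2[1]) % P


def decrypt(secret_key, ciphertext):
    a, b = ciphertext
    # a has order dividing Q, so a**(Q - x) is the inverse of a**x modulo P.
    g_m = (b * pow(a, Q - secret_key, P)) % P
    for m in range(Q):                           # brute-force small discrete log
        if pow(G, m, P) == g_m:
            return m
    raise ValueError("message outside the supported range")
```

With `sk, pk = keygen()`, the call `decrypt(sk, add(encrypt(pk, 12), encrypt(pk, 30)))` recovers 42 without either addend ever being decrypted on its own, which is the property that lets encrypted records be aggregated.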

Findings

Analysis results using the Diabetes 130-US hospitals dataset illustrate that the proposed DR-AHEPP method is more effective in preserving privacy on incremental data than existing methods. A comparative analysis against state-of-the-art works is carried out with the objective of minimizing information loss, false positive rate and execution time while achieving higher accuracy.

Originality/value

The paper provides better performance on the Diabetes 130-US hospitals dataset, achieving high accuracy, low information loss and a low false positive rate. The results illustrate that the proposed method increases accuracy by 4% and reduces the false positive rate and information loss by 25% and 35%, respectively, as compared to state-of-the-art works.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 15 no. 1
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 26 June 2019

Mamta Kayest and Sanjay Kumar Jain

Abstract

Purpose

Document retrieval has become a hot research topic over the past few years and has received growing attention for browsing and synthesizing information from different documents. The purpose of this paper is to develop an effective document retrieval method, which focuses on reducing the time needed for the navigator to evoke the whole document based on contents, themes and concepts of documents.

Design/methodology/approach

This paper introduces an incremental learning approach for text categorization using a Monarch Butterfly optimization–FireFly optimization based Neural Network (MB–FF based NN). Initially, feature extraction is carried out on the pre-processed data using Term Frequency–Inverse Document Frequency (TF–IDF) and holoentropy to find the keywords of the document. Then, cluster-based indexing is performed using the MB–FF algorithm, and finally, document retrieval is done by a matching process with the modified Bhattacharyya distance measure. In the MB–FF based NN, the weights in the NN are chosen using the MB–FF algorithm.
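
The matching step relies on a modified Bhattacharyya distance whose details are not given in the abstract. As a reference point, the standard (unmodified) distance between two discrete distributions can be sketched in Python as follows; the function name is hypothetical, and the inputs are assumed to be non-negative weight vectors of equal length that each sum to one.

```python
import math


def bhattacharyya_distance(p, q):
    """Standard Bhattacharyya distance between two discrete distributions.

    D_B = -ln(sum_i sqrt(p_i * q_i)); 0 for identical distributions,
    infinite when the distributions share no support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0.0:
        return math.inf
    return -math.log(coefficient)
```

In a retrieval setting, a query's normalized term weights would be compared with each indexed document's weights, and the documents with the smallest distance returned first.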

Findings

The effectiveness of the proposed MB–FF based NN is proven with an improved precision value of 0.8769, recall value of 0.7957, F-measure of 0.8143 and accuracy of 0.7815.

Originality/value

The experimental results show that the proposed MB–FF based NN is useful to companies, which have a large workforce across the country.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 12 no. 3
Type: Research Article
ISSN: 1756-378X
