Janani Balakumar and S. Vijayarani Mohan
Abstract
Purpose
Owing to the huge volume of documents available on the internet, text classification has become a necessary task for handling them. To achieve optimal text classification results, feature selection is an important stage that reduces the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify documents stored on a personal computer based on their content.
Design/methodology/approach
This paper proposes a new feature selection algorithm based on artificial bee colony (ABCFS) to enhance text classification accuracy. The proposed algorithm is evaluated on real and benchmark data sets and compared with existing feature selection approaches such as information gain and the χ2 statistic. To demonstrate the efficiency of the proposed algorithm, the support vector machine (SVM) and an improved SVM classifier are used in this paper.
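The χ2 statistic named above as a comparison baseline can be sketched as follows. This is a minimal illustration of the standard 2x2 term/class χ2 score, not the ABCFS algorithm and not the authors' code; the function names and the toy documents are hypothetical.

```python
def chi2_score(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # term or class is constant, so it carries no signal
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_top_k(docs, labels, target, k):
    """Rank terms by chi-square against one target class and keep the top k.

    docs is a list of token sets; labels is the parallel list of class names.
    """
    vocab = sorted({term for doc in docs for term in doc})
    scores = {}
    for term in vocab:
        n11 = n10 = n01 = n00 = 0
        for doc, label in zip(docs, labels):
            has_term, in_class = term in doc, label == target
            if has_term and in_class:
                n11 += 1
            elif has_term:
                n10 += 1
            elif in_class:
                n01 += 1
            else:
                n00 += 1
        scores[term] = chi2_score(n11, n10, n01, n00)
    # sorted() is stable, so ties keep alphabetical order
    return sorted(vocab, key=lambda t: -scores[t])[:k]


# Hypothetical toy corpus: two "pc" documents and two "sport" documents.
docs = [{"cpu", "ram"}, {"cpu", "disk"}, {"goal", "team"}, {"team", "ram"}]
labels = ["pc", "pc", "sport", "sport"]
top_terms = select_top_k(docs, labels, target="pc", k=1)
```

Note that χ2 is symmetric: a term that appears only outside the target class scores as highly as one that appears only inside it, so both count as informative.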
Findings
The experiment was conducted on real and benchmark data sets. The real data set was collected in the form of documents that were stored on a personal computer, and the benchmark data sets were collected from the Reuters and 20 Newsgroups corpora. The results show that the proposed feature selection algorithm improves text document classification accuracy.
Originality/value
This paper proposes a new ABCFS algorithm for feature selection, evaluates the efficiency of the ABCFS algorithm and improves the support vector machine. In this paper, the ABCFS algorithm is used to select features from text (unstructured) documents. In existing work, the ABCFS algorithm has been used to select features from structured data, but not from text. The proposed algorithm will classify the documents automatically based on their content.
Noura AlNuaimi, Mohammad Mehedy Masud, Mohamed Adel Serhani and Nazar Zaki
Abstract
Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real time. However, storing and processing large and varied datasets (known as big data) is challenging to do in real time. In machine learning, streaming feature selection has always been considered a superior technique for selecting the relevant subset of features from high-dimensional data and thus reducing learning complexity. In the relevant literature, streaming feature selection refers to the setting in which features arrive consecutively over time: the total number of features is not known in advance, whereas the number of instances is fixed. Many scholars in the field have proposed streaming-feature-selection algorithms in attempts to find a proper solution to this problem. This paper presents an exhaustive and methodical introduction to these techniques. This study provides a review of the traditional feature-selection algorithms and then scrutinizes the current algorithms that use streaming feature selection to determine their strengths and weaknesses. The survey also sheds light on the ongoing challenges in big-data research.
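The streaming setting described in this abstract (features arriving one at a time over a fixed set of instances) can be illustrated with a minimal sketch. It is not any specific algorithm from the survey: the relevance and redundancy tests based on Pearson correlation, the thresholds, and all names are assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def stream_select(feature_stream, y, relevance=0.5, redundancy=0.9):
    """Process features one at a time, as they arrive.

    A feature is kept only if it is relevant to the target y and is not
    redundant with a feature that was kept earlier. The number of features
    in the stream does not need to be known in advance.
    """
    selected = {}
    for name, column in feature_stream:
        if abs(pearson(column, y)) < relevance:
            continue  # weakly related to the target
        if any(abs(pearson(column, kept)) >= redundancy
               for kept in selected.values()):
            continue  # near-duplicate of an already selected feature
        selected[name] = column
    return list(selected)


# Hypothetical stream of four features over four instances.
y = [0, 0, 1, 1]
stream = iter([
    ("a", [0, 0, 1, 1]),  # relevant, kept
    ("b", [1, 0, 1, 0]),  # uncorrelated with y, dropped
    ("c", [0, 0, 2, 2]),  # relevant but redundant with "a", dropped
    ("d", [0, 1, 1, 1]),  # moderately relevant, not redundant, kept
])
kept = stream_select(stream, y)
```

Because the input is consumed as an iterator, each decision uses only the features seen so far, which is the constraint that separates streaming from batch feature selection.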
Chong Wu, Xiaofang Chen and Yongjie Jiang
Abstract
Purpose
While the Chinese securities market is booming, the phenomenon of listed companies falling into financial distress is also emerging, which affects the operation and development of enterprises and also jeopardizes the interests of investors. Therefore, it is important to understand how to accurately and reasonably predict the financial distress of enterprises.
Design/methodology/approach
In the present study, ensemble feature selection (EFS) and improved stacking were used for financial distress prediction (FDP). Mutual information, analysis of variance (ANOVA), random forest (RF), genetic algorithms, and recursive feature elimination (RFE) were chosen for EFS to select features. Since there may be missing information when feeding the results of the base learner directly into the meta-learner, the features with high importance were fed into the meta-learner together. A screening layer was added to select the meta-learner with better performance. Finally, Optima hyperparameters were used for parameter tuning by the learners.
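One simple way to combine several selectors into an ensemble is majority voting, sketched below. The abstract does not state how the five selectors are aggregated, so the voting rule, the `min_votes` parameter and the feature names are all assumptions for illustration only.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Keep features chosen by at least min_votes of the base selectors.

    selections maps a selector name to the list of features it picked.
    """
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical outputs of three base selectors on financial indicators.
selections = {
    "mutual_information": ["roa", "debt_ratio", "cash_flow"],
    "anova": ["roa", "debt_ratio"],
    "random_forest": ["roa", "cash_flow", "turnover"],
}
ensemble = vote_features(selections, min_votes=2)
```

Raising `min_votes` makes the ensemble more conservative; setting it to 1 reduces it to the union of all selectors.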
Findings
An empirical study was conducted with a sample of A-share listed companies in China. The F1-score of the model constructed using the features screened by EFS reached 84.55%, representing an improvement of 4.37% compared to the original features. To verify the effectiveness of improved stacking, benchmark model comparison experiments were conducted. Compared to the original stacking model, the accuracy of the improved stacking model was improved by 0.44%, and the F1-score was improved by 0.51%. In addition, the improved stacking model had the highest area under the curve (AUC) value (0.905) among all the compared models.
Originality/value
Compared to previous models, the proposed FDP model has better performance, thus bridging the research gap of feature selection. The present study provides new ideas for stacking improvement research and a reference for subsequent research in this field.
Farshad Faezy Razi and Seyed Hooman Shariat
Abstract
Purpose
The purpose of this paper is twofold: the selection of project portfolios through hybrid artificial neural network algorithms, feature selection based on grey relational analysis, decision tree and regression; and the identification of the features affecting project portfolio selection using the artificial neural network algorithm, decision tree and regression. The authors also aim to classify the available options using the decision tree algorithm.
Design/methodology/approach
In order to achieve the research goals, a project-oriented organization was selected and studied. In all, 49 project management indicators were chosen from A Guide to the Project Management Body of Knowledge (PMBOK Guide), and the most important indicators were identified using a feature selection algorithm and decision tree. After the extraction of rules, decision rule-based multi-criteria decision making matrices were produced. Each matrix was ranked through grey relational analysis, similarity to ideal solution method and multi-criteria optimization. Finally, a model for choosing the best ranking method was designed and implemented using the genetic algorithm. To analyze the responses, stability of the classes was investigated.
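The grey relational analysis ranking step can be illustrated with the standard grey relational coefficient and grade. This is a generic sketch, not the authors' procedure: it assumes the decision matrix is already normalized to [0, 1], that the reference sequence is the ideal alternative, and that the distinguishing coefficient `rho` is the commonly used 0.5.

```python
def grey_relational_grades(alternatives, reference, rho=0.5):
    """Grey relational grade of each alternative against a reference sequence.

    Each alternative is a list of normalized criterion values. The grade is
    the mean of the per-criterion grey relational coefficients
        (d_min + rho * d_max) / (d + rho * d_max)
    where d is the absolute deviation from the reference.
    """
    deltas = [[abs(r - v) for r, v in zip(reference, alt)]
              for alt in alternatives]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        return [1.0 for _ in alternatives]  # every alternative equals the reference
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]


# Hypothetical normalized scores of three projects on two criteria.
projects = [[1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
ideal = [1.0, 1.0]
grades = grey_relational_grades(projects, ideal)
ranking = sorted(range(len(projects)), key=lambda i: -grades[i])
```

A higher grade means the alternative lies closer to the reference, so sorting by descending grade gives the ranking.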
Findings
The results showed that projects ranked based on neural network weights by the grey relational analysis method prove to be better options for the selection of a project portfolio. The process of identification of the features affecting project portfolio selection resulted in the following factors: scope management, project charter, project management plan, stakeholders and risk.
Originality/value
This study presents the features that most strongly affect project portfolio selection, which are highly influential in organizational decision making and merit serious consideration. Sensitivity analysis, an innovation in such studies, played a constructive role in examining the accuracy and reliability of the proposed models, and its results played an important role in validating the findings of this study.
Svetlana Boudko, Wolfgang Leister and Stein Gjessing
Abstract
Purpose
Coexistence of various wireless access networks and the ability of mobile terminals to switch between them make an optimal selection of serving networks for multicast groups a challenging problem. Since optimal network selection requires large dimensions of data to be collected from several network locations and sent between several network components, the scalability can easily become a bottleneck in large-scale systems. Therefore, reducing data exchange within heterogeneous wireless networks is important. The paper aims to discuss these issues.
Design/methodology/approach
The authors study the decision-making process and the data that need to be sent between different network components. To analyze the operation of the wireless heterogeneous network, the authors built a mathematical model of the network. The objective is defined as a minimization of multicast streams in the system. To evaluate the heuristic solutions, the authors define the upper and lower bounds to their operation.
Findings
The proposed heuristic solutions substantially reduce the usage of bandwidth in mobile networks and exchange of information between the network components.
Originality/value
The authors proposed an approach that allows network selection in a decentralized manner with only limited information shared among the decision makers. The authors studied how different sets of information available to decision makers influenced the performance of the system. The work also investigates the usage of multiple paths for multicast in heterogeneous mobile environments.
Jonathan S. Greipel, Regina M. Frank, Meike Huber, Ansgar Steland and Robert H. Schmitt
Abstract
Purpose
To ensure product quality within a manufacturing process, inspection processes are indispensable. One task of inspection planning is the selection of inspection characteristics. For optimization of costs and benefits, key characteristics can be defined by which the product quality can be checked with sufficient accuracy. The manual selection of key characteristics requires substantial planning effort and becomes uneconomic if many product variants prevail. This paper, therefore, aims to show a method for the efficient determination of key characteristics.
Design/methodology/approach
The authors present a novel Algorithm for the Selection of Key Characteristics (ASKC) based on an auto-encoder and a risk analysis. Given historical measurement data and tolerances, the algorithm clusters characteristics with redundant information and selects key characteristics based on a risk assessment. The authors compare ASKC with the algorithm Principal Feature Analysis (PFA) using artificial and historical measurement data.
Findings
The authors find that ASKC delivers better results than PFA. Findings show that the algorithms enable the cost-efficient selection of key characteristics while maintaining the informative value of the inspection concerning the quality.
Originality/value
This paper fills an identified gap for simplified inspection planning with the method for the efficient selection of key features via ASKC.
Rama Rao A., Satyananda Reddy and Valli Kumari V.
Abstract
Purpose
Multimedia applications such as digital audio and video have stringent quality of service (QoS) requirements in mobile ad hoc networks. To support a wide range of QoS, complex routing protocols with multiple QoS constraints are necessary. In QoS routing, the basic problem is to find a path that satisfies multiple QoS constraints. Moreover, mobility, congestion and packet loss in the dynamic network topology also degrade the QoS performance of the protocol.
Design/methodology/approach
In this paper, the authors propose a multi-path selection scheme for QoS-aware routing in mobile ad hoc networks based on the fractional cuckoo search algorithm (FCS-MQARP). Multiple QoS constraints (energy, link life time, distance and delay) are considered for path selection.
Findings
The proposed FCS-MQARP is compared experimentally with the existing QoS-aware routing protocols AOMDV, MMQARP and CS-MQARP using measures such as normalized delay, energy and throughput. The extensive simulation study shows that the proposed FCS-based multipath selection performs better than the existing routing protocols, with a maximal energy of 99.1501 and a minimal delay of 0.0554.
Originality/value
This paper presents a hybrid optimization algorithm called the FCS algorithm for the multi-path selection. Also, a new fitness function is developed by considering the QoS constraints such as energy, link life time, distance and delay.
Abhishek Dixit, Ashish Mani and Rohit Bansal
Abstract
Purpose
Feature selection is an important data pre-processing step, especially for high-dimensional data sets. A model trained on a high-dimensional data set performs worse and yields poor classification accuracy. Therefore, applying feature selection to the data set before training the model is an important step for improving performance and classification accuracy.
Design/methodology/approach
A novel optimization approach that hybridizes binary particle swarm optimization (BPSO) and differential evolution (DE) for fine tuning of SVM classifier is presented. The name of the implemented classifier is given as DEPSOSVM.
Findings
This approach is evaluated on 20 UCI benchmark text classification data sets. Its performance is also evaluated on a UCI benchmark image data set of cancer images. The results show that the proposed DEPSOSVM technique significantly improves on other feature selection algorithms in the literature and also achieves better classification accuracy.
Originality/value
The proposed approach differs from previous work in two ways. First, all previous work uses the DE/(rand/1) mutation strategy, whereas this study uses DE/(rand/2) and updates the mutation strategy with BPSO. Second, the crossover step uses a novel approach that compares the best particle with a sigmoid function. The core contribution of this paper is to hybridize DE with BPSO combined with an SVM classifier (DEPSOSVM) to handle feature selection problems.
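The two building blocks named here can be sketched in their textbook forms: the DE/rand/2 mutation, which adds two scaled difference vectors to a random base vector, and the sigmoid transfer function that binary PSO uses to turn real values into a 0/1 feature mask. The abstract does not specify how DEPSOSVM combines them, so this is an assumption-laden illustration rather than the authors' method; the function names, the scale factor `f` and the toy population are hypothetical.

```python
import math
import random


def de_rand_2(population, i, f=0.5, rng=random):
    """DE/rand/2 mutant for individual i.

    v = x_r1 + f * (x_r2 - x_r3) + f * (x_r4 - x_r5), where r1..r5 are
    distinct random indices different from i. Needs at least 6 individuals.
    """
    candidates = [j for j in range(len(population)) if j != i]
    r1, r2, r3, r4, r5 = rng.sample(candidates, 5)
    x = population
    return [
        x[r1][d] + f * (x[r2][d] - x[r3][d]) + f * (x[r4][d] - x[r5][d])
        for d in range(len(x[i]))
    ]


def binarize(vector, rng=random):
    """Sigmoid transfer as in binary PSO: bit d is 1 with probability
    sigmoid(vector[d]), so larger values are more likely to select feature d."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            for v in vector]


# Hypothetical population of six real-valued vectors over three features.
rng = random.Random(0)
population = [[float(k)] * 3 for k in range(6)]
mutant = de_rand_2(population, i=0, f=0.5, rng=rng)
mask = binarize(mutant, rng=rng)  # 1 = feature selected, 0 = dropped
```

DE/rand/1 uses a single difference vector; the second difference in DE/rand/2 increases the perturbation and hence exploration, at the cost of requiring a larger population.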
Mohamed A. Tawhid and Kevin B. Dsouza
Abstract
In this paper, we present a new hybrid binary version of the bat and enhanced particle swarm optimization algorithms to solve feature selection problems. The proposed algorithm is called Hybrid Binary Bat Enhanced Particle Swarm Optimization Algorithm (HBBEPSO). In the proposed HBBEPSO algorithm, we combine the bat algorithm, whose capacity for echolocation helps explore the feature space, with an enhanced version of particle swarm optimization, which is able to converge to the best global solution in the search space. To investigate the general performance of the proposed HBBEPSO algorithm, it is compared with the original optimizers and with other optimizers that have been used for feature selection in the past. A set of assessment indicators is used to evaluate and compare the different optimizers over 20 standard data sets obtained from the UCI repository. Results show the ability of the proposed HBBEPSO algorithm to search the feature space for optimal feature combinations.
Pablo A.D. Castro and Fernando J. Von Zuben
Abstract
Purpose
The purpose of this paper is to apply a multi‐objective Bayesian artificial immune system (MOBAIS) to feature selection in classification problems aiming at minimizing both the classification error and cardinality of the subset of features. The algorithm is able to perform a multimodal search maintaining population diversity and controlling automatically the population size according to the problem. In addition, it is capable of identifying and preserving building blocks (partial components of the whole solution) effectively.
Design/methodology/approach
The algorithm evolves candidate subsets of features by replacing the traditional mutation operator in immune‐inspired algorithms with a probabilistic model which represents the probability distribution of the promising solutions found so far. Then, the probabilistic model is used to generate new individuals. A Bayesian network is adopted as the probabilistic model due to its capability of capturing expressive interactions among the variables of the problem. In order to evaluate the proposal, it was applied to ten datasets and the results compared with those generated by state‐of‐the‐art algorithms.
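The two objectives stated in the purpose (classification error and number of selected features, both minimized) imply a Pareto comparison between candidate subsets. The sketch below shows only that comparison, not MOBAIS or its Bayesian network model; the function names and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better
    on at least one. All objectives are minimized."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical candidates as (classification error, number of features).
candidates = [(0.10, 5), (0.12, 3), (0.10, 7), (0.20, 3)]
front = pareto_front(candidates)
```

Here `(0.10, 7)` is dropped because `(0.10, 5)` reaches the same error with fewer features, and `(0.20, 3)` is dropped because `(0.12, 3)` has lower error at the same size; the two survivors represent different trade-offs and neither is better overall.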
Findings
The experiments demonstrate the effectiveness of the multi‐objective approach to feature selection. The algorithm found parsimonious subsets of features and the classifiers produced a significant improvement in the accuracy. In addition, the maintenance of building blocks avoids the disruption of partial solutions, leading to a quick convergence.
Originality/value
The originality of this paper lies in the proposal of a novel algorithm for multi-objective feature selection.