Search results

1 – 10 of over 2000
Article
Publication date: 29 April 2014

Wei-Chao Lin, Chih-Fong Tsai and Shih-Wen Ke

Abstract

Purpose

Churn prediction is an important task for successful customer relationship management, and it can be achieved with many data mining techniques. During data mining, however, dimensionality reduction (or feature selection) and data reduction are two important data preprocessing steps. Feature selection aims to filter out irrelevant features, and data reduction aims to filter out noisy data samples. The purpose of performing these data preprocessing tasks is to make the mining algorithm produce good-quality mining results.

Design/methodology/approach

Using a real telecom customer churn data set, seven preprocessed data sets, produced by performing feature selection and data reduction in different orders of priority, are used to train an artificial neural network as the churn prediction model.

Findings

The results show that performing data reduction first with self-organizing maps and feature selection second with principal component analysis gives the prediction model the highest prediction accuracy. This ordering also lets the prediction model learn more efficiently, since 66 and 62 percent of the original features and data samples are removed, respectively.

Originality/value

The contribution of this paper is to identify the better order in which to perform the two important data preprocessing steps for telecom churn prediction.

Details

Kybernetes, vol. 43 no. 5
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 15 February 2016

Joanna Poon and Michael Brownlow

Abstract

Purpose

The purpose of this paper is to investigate whether gender has an impact on real estate and built environment graduates’ employment outcomes, employment patterns and other important employment related issues, such as pay, role, contract type and employment opportunity in different states of a country.

Design/methodology/approach

The data used in this paper were collected from the Australian Graduate Survey (AGS). Data from the years 2010-2012 were combined into a single data set. Dimensionality reduction was used to prepare the data set for the courses listed in the AGS data, in order to develop the simplified classifications for real estate and built environment courses that are used for further analysis in this paper. Dimensionality reduction was also used to prepare the data set for the further analysis of the employment outcomes and patterns of real estate graduates. Descriptive and statistical analysis methods were used to identify the impact of gender on the employment outcomes, employment patterns and other important employment related issues, such as pay, role, contract type and location of job, for real estate graduates in Australia. This paper also benchmarks the employment results of real estate graduates against those of built environment graduates.

Findings

Recent male built environment graduates in Australia are more likely to gain full-time employment than females. The dominant role for recent female built environment graduates in Australia is a secretarial or administrative role, while for males it is a professional or technical role. Male real estate and built environment graduates are more likely to have a higher salary. Gender also has an impact on the contract type: male built environment graduates are more likely to be employed on a permanent contract. On the other hand, gender has no impact on gaining employment in different states of Australia, such as New South Wales and Queensland. The findings of this paper reinforce the view of previous literature, which is that male graduates have more favourable employment outcomes and better employment terms. The findings also show that graduate employment outcomes for real estate and built environment graduates in Australia are similar to those in other countries, such as the UK, where equivalent studies have been published.

Originality/value

This is pioneering research that investigates the impact of gender on employment outcomes, employment patterns and other employment related issues for real estate graduates and built environment graduates in Australia.

Details

Property Management, vol. 34 no. 1
Type: Research Article
ISSN: 0263-7472

Article
Publication date: 29 April 2014

Mohammad Amin Shayegan and Saeed Aghabozorgi

Abstract

Purpose

Pattern recognition systems often have to handle the problem of large training data sets that include duplicate and similar training samples. This problem leads to large memory requirements for storing and processing data and increases the time complexity of training algorithms. The purpose of the paper is to reduce the volume of the training part of a data set in order to increase system speed without any significant decrease in system accuracy.

Design/methodology/approach

A new technique for data set size reduction, based on a modified version of the frequency diagram approach, is presented. To reduce processing time, the proposed method compares the samples of a class to other samples in the same class, instead of comparing samples from different classes. It only removes patterns that are similar to the generated class template in each class. No feature extraction operation was carried out, in order to allow a more precise assessment of the proposed data size reduction technique.
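The within-class comparison described above can be illustrated with a minimal Python sketch. It assumes binarized pixel vectors, and everything specific in it is hypothetical: the per-pixel frequency template, the agreement-based similarity score, the 0.95 threshold and the rule of keeping one representative per class are illustrative choices, not the authors' modified frequency diagram method.

```python
from collections import defaultdict


def build_template(samples):
    """Per-pixel frequency of 'on' pixels across one class's binary samples."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def similarity(sample, template):
    """Fraction of pixels where the sample agrees with the binarized template."""
    agree = sum(1 for p, f in zip(sample, template) if p == (1 if f >= 0.5 else 0))
    return agree / len(sample)


def reduce_dataset(data, threshold=0.95):
    """data: list of (label, pixels) pairs.

    Each sample is compared only with the template of its own class.
    Samples at or above the (assumed) similarity threshold are treated as
    redundant; the first one per class is kept as a representative and the
    rest are dropped. All other samples are kept.
    """
    by_class = defaultdict(list)
    for label, pixels in data:
        by_class[label].append(pixels)
    templates = {label: build_template(s) for label, s in by_class.items()}

    kept, has_representative = [], set()
    for label, pixels in data:
        if similarity(pixels, templates[label]) >= threshold:
            if label in has_representative:
                continue  # near-copy of the class template, already represented
            has_representative.add(label)
        kept.append((label, pixels))
    return kept
```

Because each sample is scored against a single template per class, the cost grows linearly with the number of samples rather than with the number of sample pairs.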

Findings

Experiments on Hoda, one of the largest standard handwritten numeral optical character recognition (OCR) data sets, show a 14.88 percent decrease in data set volume without a significant decrease in performance.

Practical implications

The proposed technique is effective for size reduction for all pictorial databases such as OCR data sets.

Originality/value

State-of-the-art algorithms currently used for data set size reduction usually remove samples near class centers, or support vector (SV) samples between different classes. However, the samples near a class center carry valuable information about class characteristics, and they are necessary to build a system model. SVs are also important samples for evaluating system efficiency. The proposed technique, unlike the other available methods, keeps both outlier samples and the samples close to the class centers.

Article
Publication date: 5 December 2017

Rabeb Faleh, Sami Gomri, Mehdi Othman, Khalifa Aguir and Abdennaceur Kachouri

Abstract

Purpose

This paper proposes a novel hybrid approach to the problem of cross-selectivity of gases in an electronic nose (E-nose), combining support vector machine (SVM) and k-nearest neighbors (KNN) classifiers.

Design/methodology/approach

First, an E-nose system with three WO3 sensors was used for data acquisition to detect three gases, namely, ozone, ethanol and acetone. Then, two transient parameters, the derivative and the integral, were extracted from each gas response. Next, principal component analysis (PCA) was applied to extract the most relevant sensor data and reduce dimensionality. The new coordinates calculated by PCA were used as inputs for classification by the SVM method. Finally, KNN classification was carried out on the support vectors (SVs) only, not on all the data.

Findings

The proposed fusion method achieved the highest classification rate (100 per cent), compared with 89, 75.2, 80 and 79.9 per cent for the individual KNN, SVM-linear, SVM-RBF and SVM-polynomial classifiers, respectively.

Originality/value

The authors propose a fusion classifier approach to improve the classification rate. In this method, the extracted features are projected into the PCA subspace to reduce the dimensionality. Then, the obtained principal components are fed to the SVM classifier, and the resulting SVs are used in the KNN method.
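The final stage of this pipeline, KNN voting over support vectors instead of the full training set, can be sketched in plain Python. The sketch assumes the PCA projection and SVM training have already been done elsewhere; the function name `knn_predict`, the choice of Euclidean distance and the default `k=3` are illustrative assumptions rather than details taken from the paper.

```python
import math
from collections import Counter


def knn_predict(query, support_vectors, k=3):
    """Classify `query` by majority vote among its k nearest support vectors.

    support_vectors: list of (point, label) pairs, assumed to be the SVs
    retained by a previously trained SVM in PCA coordinates.
    """
    nearest = sorted(support_vectors, key=lambda sv: math.dist(query, sv[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Restricting the neighbor search to the support vectors keeps the distance computations proportional to the number of SVs rather than to the size of the whole data set.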

Details

Sensor Review, vol. 38 no. 1
Type: Research Article
ISSN: 0260-2288

Open Access
Article
Publication date: 28 July 2020

Harleen Kaur and Vinita Kumari

Downloads
1524

Abstract

Diabetes is a major metabolic disorder which can adversely affect the entire body system. Undiagnosed diabetes can increase the risk of cardiac stroke, diabetic nephropathy and other disorders. Millions of people all over the world are affected by this disease. Early detection of diabetes is very important for maintaining a healthy life. The disease is a cause of global concern, as cases of diabetes are rising rapidly. Machine learning (ML) is a computational method that learns automatically from experience and improves its performance to make more accurate predictions. In the current research we applied machine learning techniques to the Pima Indian diabetes dataset to develop trends and detect patterns with risk factors using the R data manipulation tool. To classify the patients as diabetic or non-diabetic, we developed and analyzed five different predictive models using the R data manipulation tool. For this purpose we used supervised machine learning algorithms, namely linear kernel support vector machine (SVM-linear), radial basis function (RBF) kernel support vector machine, k-nearest neighbour (k-NN), artificial neural network (ANN) and multifactor dimensionality reduction (MDR).

Article
Publication date: 6 August 2020

Chunyan Zeng, Dongliang Zhu, Zhifeng Wang, Zhenghui Wang, Nan Zhao and Lu He

Abstract

Purpose

Most source recording device identification models for Web media forensics rely on a single feature to complete the identification task and often have the disadvantages of being slow and inaccurate. The purpose of this paper is to propose a new end-to-end network method for source device identification based on multi-feature fusion.

Design/methodology/approach

This paper proposes an efficient multi-feature fusion method for source recording device identification, based on an end-to-end architecture and an attention mechanism, so as to achieve efficient and convenient identification of recording devices in Web media forensics.

Findings

The authors conducted experiments to demonstrate the effectiveness of the proposed models. The experiments show that the end-to-end system improves on the baseline i-vector system by 7.1%; compared to the authors’ previous system, accuracy improves by 0.4% and training time is reduced by 50%.

Research limitations/implications

With the development of Web media forensics and internet technology, the use of Web media as evidence is increasing. Among them, it is particularly important to study the authenticity and accuracy of Web media audio.

Originality/value

This paper aims to promote the development of source recording device identification and provide effective technology for Web media forensics and judicial record evidence that need to apply device source identification technology.

Details

International Journal of Web Information Systems, vol. 16 no. 4
Type: Research Article
ISSN: 1744-0084

Open Access
Article
Publication date: 28 July 2020

Noura AlNuaimi, Mohammad Mehedy Masud, Mohamed Adel Serhani and Nazar Zaki

Abstract

Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real time. However, storing and processing large and varied datasets (known as big data) is challenging to do in real time. In machine learning, streaming feature selection has always been considered a superior technique for selecting the relevant subset of features from high-dimensional data and thus reducing learning complexity. In the relevant literature, streaming feature selection refers to settings in which features arrive consecutively over time; although the exact number of features is not known, the number of instances is well established. Many scholars in the field have proposed streaming-feature-selection algorithms in attempts to find a proper solution to this problem. This paper presents an exhaustive and methodological introduction to these techniques. The study reviews the traditional feature-selection algorithms and then scrutinizes the current algorithms that use streaming feature selection to determine their strengths and weaknesses. The survey also sheds light on the ongoing challenges in big-data research.
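The streaming setting described above, with a fixed set of instances and features arriving one at a time, can be illustrated with a minimal Python sketch. It is a generic relevance and redundancy filter based on Pearson correlation, not any specific algorithm from the survey; the function names and both thresholds (`min_relevance`, `max_redundancy`) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_streaming(feature_stream, target, min_relevance=0.3, max_redundancy=0.9):
    """Process (name, column) pairs as they arrive and return the kept names.

    A feature is kept only if it is correlated with the target (relevance)
    and not almost collinear with a feature that was kept earlier (redundancy).
    """
    selected = {}
    for name, column in feature_stream:
        if abs(pearson(column, target)) < min_relevance:
            continue  # irrelevant to the target
        if any(abs(pearson(column, kept)) > max_redundancy for kept in selected.values()):
            continue  # redundant with an already selected feature
        selected[name] = column
    return list(selected)
```

Each arriving feature is judged once against the current selection, so the full feature set never has to be held in memory.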

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 17 January 2022

Syed Haroon Abdul Gafoor and Padma Theagarajan

Abstract

Purpose

Conventional diagnostic techniques may be prone to subjectivity, since they depend on the assessment of motions that are often too subtle for the human eye and hence hard to classify, potentially resulting in misdiagnosis. Meanwhile, early nonmotor signs of Parkinson’s disease (PD) can be mild and may be due to a variety of other conditions. As a result, these signs are usually ignored, making early PD diagnosis difficult. Machine learning approaches for distinguishing PD patients from healthy controls or from individuals with similar medical symptoms (such as movement disorders or other Parkinsonian syndromes) have been introduced to solve these problems and to enhance the diagnostic and assessment processes of PD.

Design/methodology/approach

Medical observations and evaluation of medical symptoms, including characterization of a wide range of motor indications, are commonly used to diagnose PD. As the quantity of data being processed has grown in the last five years, feature selection has become a prerequisite before any classification. This study introduces a feature selection method based on the score-based artificial fish swarm algorithm (SAFSA) to overcome this issue.

Findings

This study improves the accuracy of PD identification by reducing the number of selected vocal features while using the most recent and largest publicly accessible database. Feature subset selection in PD detection techniques starts by eliminating features that are irrelevant or redundant. The chosen feature subset should provide the best performance according to a few objective functions.

Research limitations/implications

In many situations, this is an NP-hard (nondeterministic polynomial-time hard) problem. The method enhances the PD detection rate by selecting the most essential features from the database. First, the data set's dimensionality is reduced using the Singular Value Decomposition dimensionality reduction technique. Next, Biogeography-Based Optimization (BBO) is applied for feature selection; the weight value is a vital parameter for finding the best features in PD classification.

Originality/value

PD classification is done using ensemble learning classification approaches, such as a hybrid classifier of fuzzy K-nearest neighbor, kernel support vector machines, fuzzy convolutional neural network and random forest. The suggested classifiers are trained using data from the UCI ML repository, and their results are verified using leave-one-person-out cross validation. The measures employed to assess classifier efficiency include accuracy, F-measure and Matthews correlation coefficient.
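The three evaluation measures named above can be computed from binary predictions as in the following minimal Python sketch. It assumes labels are encoded as 1 (PD) and 0 (control), which is an encoding chosen here for illustration; the function name `binary_metrics` is likewise hypothetical.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return accuracy, F-measure (F1) and Matthews correlation coefficient."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # MCC is undefined when any marginal count is zero; 0.0 is used by convention.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f_measure": f_measure, "mcc": mcc}
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion-matrix cells, which makes it more informative when the two classes are imbalanced.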

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 20 April 2015

Joanna Poon and Michael Brownlow

Downloads
1361

Abstract

Purpose

The purpose of this paper is to identify the relative importance of the factors that influence the overall satisfaction of real estate students and also examine the extent to which demographic backgrounds affect this. Furthermore, this paper benchmarks the satisfaction of real estate students against that of built environment students.

Design/methodology/approach

The data used in this paper have been collected from the Course Experience Questionnaire (CEQ) within the Australian Graduate Survey (AGS). Dimensionality reduction was used to prepare the data about the courses identified in the AGS for analysis. This was done in order to simplify classification of real estate and built environment courses examined in this paper. Descriptive and statistical analysis methods were used to analyse student satisfaction variables and identify the extent to which demographic factors influenced overall student satisfaction.

Findings

Real estate students in Australia have a relatively higher level of student satisfaction compared to built environment students overall, but built environment students have a higher level of satisfaction with regard to compulsory variables such as “Good Teaching Scale” and “Generic Skills Scale”. However, real estate students show a higher level of agreement on the Likert scale regarding the optional variables “Appropriate Assessment” and “Learning Community”. The most important factor for overall student satisfaction was the question: “the staff made it clear right from the start what they expected from the students”. The answers to this question had a Pearson correlation value of 1.000 for both real estate and built environment students. Age and mode of study also have some impact on the overall satisfaction level of both sets of students, while gender, degree class and the year the university was established are additional factors affecting the overall satisfaction of built environment students.

Practical implications

This research identifies the factors that affect the satisfaction of property course students in ascending order of importance. Course directors of real estate courses can use the findings of this research to make recommendations on the redesign and redevelopment of their courses in order to make them more attractive and appealing to students to enhance student recruitment and retention.

Originality/value

This is pioneering research that provides a comprehensive overview of the factors affecting student satisfaction with regard to real estate and built environment students in Australia.

Details

Property Management, vol. 33 no. 2
Type: Research Article
ISSN: 0263-7472

Article
Publication date: 11 October 2021

Changro Lee

Abstract

Purpose

Sampling taxpayers for audits has always been a major concern for policymakers of tax administration. The purpose of this study is to propose a systematic method to select a small number of taxpayers with a high probability of tax fraud.

Design/methodology/approach

An efficient sampling method for taxpayers for an audit is investigated in the context of a property acquisition tax. An autoencoder, a popular unsupervised learning algorithm, is applied to 2,228 tax returns, and reconstruction errors are calculated to determine the probability of tax deficiencies for each return. The reasonableness of the estimated reconstruction errors is verified using the Apriori algorithm, a well-known marketing tool for identifying patterns in purchased item sets.
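The ranking step described above, scoring each return by its reconstruction error and auditing the highest scorers first, can be sketched in Python as follows. The sketch does not implement an autoencoder: `make_mean_reconstructor` is a hypothetical stand-in that reconstructs every record as the column means, used only so that the example runs. Any trained model exposing a `reconstruct(record)` callable could be substituted, and the use of mean squared error is also an assumption.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between a record and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def rank_for_audit(records, reconstruct, top_n):
    """Return the indices of the top_n records with the largest reconstruction error."""
    scored = [(reconstruction_error(rec, reconstruct(rec)), idx)
              for idx, rec in enumerate(records)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_n]]


def make_mean_reconstructor(records):
    """Stand-in for a trained autoencoder: always reconstructs the column means."""
    n = len(records)
    means = [sum(column) / n for column in zip(*records)]
    return lambda record: means
```

Records that the model reconstructs poorly are the ones that deviate most from the typical pattern, which is why they are placed at the front of the audit queue.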

Findings

The sorted reconstruction scores are reasonably consistent with actual fraudulent/non-fraudulent cases, indicating that the reconstruction errors can be utilized to select suspected taxpayers for an audit in a cost-effective manner.

Originality/value

The proposed deep learning-based approach is expected to be applied in a real-world tax administration, promoting voluntary compliance of taxpayers, and reinforcing the self-assessing acquisition tax system.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288
