Search results

1 – 10 of over 86,000
Open Access
Article
Publication date: 28 July 2020

Noura AlNuaimi, Mohammad Mehedy Masud, Mohamed Adel Serhani and Nazar Zaki


Abstract

Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real time. However, storing and processing large and varied datasets (known as big data) in real time is challenging. In machine learning, streaming feature selection has long been considered a superior technique for selecting a relevant subset of features from highly dimensional data and thus reducing learning complexity. In the relevant literature, streaming feature selection refers to the setting in which features arrive consecutively over time: the number of features is not known in advance, whereas the number of instances is fixed and well established. Many scholars in the field have proposed streaming-feature-selection algorithms in an attempt to solve this problem. This paper presents an exhaustive and methodical introduction to these techniques. The study reviews the traditional feature-selection algorithms and then scrutinizes the current streaming-feature-selection algorithms to determine their strengths and weaknesses. The survey also sheds light on the ongoing challenges in big-data research.
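
As a rough illustration of this streaming setting (a sketch of the general idea, not any specific algorithm from the survey), the following Python snippet processes features one at a time over a fixed set of instances and keeps each feature that passes a simple relevance test. The `stream_select` helper and its correlation threshold are illustrative assumptions.

```python
import numpy as np

def stream_select(feature_stream, y, threshold=0.3):
    """Keep each arriving feature whose |correlation| with y exceeds threshold."""
    selected = []
    for idx, x in feature_stream:         # features arrive consecutively over time
        r = np.corrcoef(x, y)[0, 1]       # simple stand-in relevance score
        if abs(r) > threshold:
            selected.append(idx)          # accept the feature into the subset
    return selected

# Usage: wrap the columns of a matrix as a stream of (index, column) pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = X[:, 3] + 0.1 * rng.normal(size=100)  # only feature 3 is truly relevant
print(stream_select(((j, X[:, j]) for j in range(X.shape[1])), y))
```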

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 24 October 2023

Doan Thao Tram Pham, Sascha Steinmann and Birger Boutrup Jensen


Abstract

Purpose

In this paper, the authors review the state-of-the-art literature on online review systems and their impacts on consumer behavior and retailers' performance, with the aim of identifying research gaps related to different design features of review systems and developing a future research agenda.

Design/methodology/approach

The authors conducted a systematic review based on the PRISMA 2020 protocol, focusing on studies published in the domains of retailing and marketing. This procedure resulted in 48 selected papers investigating the design features of retailer online review systems.

Findings

The authors identify eight design features that retailers can control in an online review system. These design features have been researched independently in previous literature, with some receiving more attention than others. Most of the selected studies focus on the design features of adapted metrics and review presentation, while other features (e.g. rating dimensions) are generally neglected. Previous literature argues that design features affect consumer behavior and retailers' performance. However, the interactions among the features are still neglected in the literature, creating a relevant gap for future research.

Originality/value

This paper distinguishes between different types of retailer online review systems based on how they are implemented. The authors summarize the state-of-the-art of relevant literature on design features of online review systems and their effects on consumer- and retailer-related outcome variables. This systematic literature review distinguishes between online reviews provided on websites controlled by retailers (internal systems) and third-party websites (external systems).

Details

International Journal of Retail & Distribution Management, vol. 51 no. 9/10
Type: Research Article
ISSN: 0959-0552

Article
Publication date: 25 January 2022

Tobias Mueller, Alexander Segin, Christoph Weigand and Robert H. Schmitt

Abstract

Purpose

In the determination of measurement uncertainty, the GUM procedure requires building a measurement model that establishes a functional relationship between the measurand and all influencing quantities. Since the effort of modelling, as well as of quantifying the measurement uncertainties, depends on the number of influencing quantities considered, the aim of this study is to determine the relevant influencing quantities and to remove irrelevant ones from the dataset.

Design/methodology/approach

In this work, it was investigated whether the modelling effort for the determination of measurement uncertainty can be reduced by using feature selection (FS) methods. For this purpose, nine different FS methods were tested on 16 artificial test datasets whose properties (number of data points, number of features, complexity, features with low influence and redundant features) were varied via a design of experiments.
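
A minimal sketch of this experimental idea, assuming scikit-learn and using a mutual-information filter as a stand-in for the FS methods actually tested: an artificial dataset with known low-influence, no-influence and redundant features is generated, and the features are ranked by estimated relevance.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Artificial test dataset with known structure, in the spirit of the study's
# design of experiments (the sizes and coefficients here are assumptions).
rng = np.random.default_rng(42)
n, p = 200, 8
X = rng.normal(size=(n, p))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=n)   # feature 4: redundant copy of 0
y = 2.0 * X[:, 0] + X[:, 1] + 0.05 * X[:, 2] + rng.normal(scale=0.1, size=n)
# features 3, 5, 6, 7 have no influence; feature 2 has low influence

mi = mutual_info_regression(X, y, random_state=0)  # one FS method as example
print("features ranked by estimated relevance:", np.argsort(mi)[::-1])
```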

Findings

Based on a success metric together with the stability, universality and complexity of each method, two FS methods were identified that reliably distinguish relevant from irrelevant influencing quantities for a measurement model.

Originality/value

For the first time, FS methods were applied to datasets with properties of classical measurement processes. The simulation-based results serve as a basis for further research in the field of FS for measurement models. The identified algorithms will be applied to real measurement processes in the future.

Details

International Journal of Quality & Reliability Management, vol. 40 no. 3
Type: Research Article
ISSN: 0265-671X

Article
Publication date: 6 January 2022

Deepti Sisodia and Dilip Singh Sisodia

Abstract

Purpose

The problem of choosing the most useful features from among hundreds in time-series user-click data arises in online advertising when classifying fraudulent publishers. Selecting feature subsets is a key issue in such classification tasks. In practice, filter approaches are common; however, they neglect the correlations among features. Conversely, wrapper approaches are impractical due to their computational complexity. Moreover, existing feature-selection methods in particular cannot handle such data, which is one of the major causes of instability in feature selection.

Design/methodology/approach

To overcome these issues, a majority-voting-based hybrid feature-selection method, feature distillation and accumulated selection (FDAS), is proposed to investigate the optimal subset of relevant features for analyzing publishers' fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature-selection methods are obtained using majority voting; (2) accumulated selection, where an accumulated evaluation of relevant feature subsets is used to search for an optimal feature subset with effective machine learning (ML) models.
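
The FDAS implementation itself is not public; the following is a hedged two-phase sketch in its spirit, using scikit-learn selectors for the distillation vote and a greedy cross-validated search for the accumulated-selection phase. The selector choices, vote threshold and greedy loop are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
k = 8

# Phase 1 (distillation): collect votes from several standard FS methods.
votes = np.zeros(X.shape[1], dtype=int)
for selector in (SelectKBest(f_classif, k=k),
                 SelectKBest(mutual_info_classif, k=k),
                 RFE(LogisticRegression(max_iter=1000), n_features_to_select=k)):
    votes += selector.fit(X, y).get_support().astype(int)
distilled = np.where(votes >= 2)[0]          # majority vote across 3 selectors

# Phase 2 (accumulated selection): grow the subset greedily by CV score.
best_subset, best_score = [], 0.0
for f in distilled[np.argsort(-votes[distilled])]:
    trial = best_subset + [f]
    score = cross_val_score(LogisticRegression(max_iter=1000), X[:, trial], y, cv=5).mean()
    if score > best_score:
        best_subset, best_score = trial, score
print(best_subset, round(best_score, 3))
```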

Findings

Empirical results demonstrate enhanced classification performance with the proposed features in terms of average precision, recall, F1-score and AUC for publisher identification and classification.

Originality/value

FDAS is evaluated on the FDMA2012 user-click data and nine other benchmark datasets to gauge its generalizing characteristics: first with the original features; second with relevant feature subsets selected by feature-selection (FS) methods; and third with the optimal feature subset obtained by the proposed approach. An ANOVA significance test is conducted to demonstrate significant differences between independent features.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 14 October 2013

Dong Liu, Ming Cong, Yu Du and Clarence W. de Silva

Abstract

Purpose

Indoor robotic tasks frequently specify objects. For these applications, this paper aims to propose an object-based attention method that uses task-relevant features for target selection. The task-relevant features are deduced from the learned object representation in semantic memory (SM), and low-dimensional bias feature templates are obtained using a Gaussian mixture model (GMM) to make the attention process efficient. The method can be used to select a target in a scene, forming a task-specific representation of the environment and improving scene understanding by driving the robot to a position from which the objects of interest can be detected with a smaller error probability.

Design/methodology/approach

Task definition and object representation in SM are proposed, and bias feature templates are obtained through GMM-based reduction of features from high to low dimension. The mean-shift method is used to segment the visual scene into discrete proto-objects. Given a task-specific object, top-down biased attention uses the obtained statistical knowledge of the desired target's visual features to weight the proto-objects, and the saliency map is generated by combining this with bottom-up saliency-based attention so as to maximize target detection speed.
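
As a hedged sketch of the GMM step alone (not the authors' full attention pipeline), the code below fits a Gaussian mixture to feature vectors drawn from a learned target representation and scores candidate proto-objects by log-likelihood, which plays the role of the top-down bias. The three-dimensional feature vectors, component count and data are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Hypothetical low-dimensional target features (e.g. hue, saturation, size),
# standing in for the representation learned in semantic memory.
target_features = rng.normal(loc=[0.8, 0.2, 0.5], scale=0.05, size=(200, 3))
gmm = GaussianMixture(n_components=2, random_state=0).fit(target_features)

# Candidate proto-objects, as would come from mean-shift segmentation.
proto_objects = rng.uniform(size=(6, 3))
top_down_bias = gmm.score_samples(proto_objects)   # log-likelihood per region
print("most target-like proto-object:", int(np.argmax(top_down_bias)))
```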

Findings

Experimental results show that the proposed GMM-based attention model provides an effective and efficient method for task-specific target selection under different conditions. The promising results suggest that the method may provide a good approximation of how humans combine target cues to optimize target selection.

Practical implications

The present method has been successfully applied to many natural scenes in indoor robotic tasks. The proposed method has a wide range of applications and is being used in an intelligent homecare-robot cognitive-control project. Due to its computational cost, the current implementation has some limitations in real-time applications.

Originality/value

A novel attention model that uses a GMM to obtain the bias feature templates is proposed for attention competition. It provides a solution for object-based attention and improves search speed effectively and efficiently through the autonomous reduction of features. The proposed model is adaptive and does not require predefined distinct feature types for task-specific objects.

Details

Industrial Robot: An International Journal, vol. 40 no. 6
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 27 July 2010

Natalie Clewley, Sherry Y. Chen and Xiaohui Liu


Abstract

Purpose

Cognitive style has been identified as significantly influencing users' preferences for search engines. In particular, Witkin's field dependence/independence has been widely studied in the area of web searching. It has been suggested that this cognitive style has conceptual links with holism/serialism. This study aims to investigate the differences between field dependence/independence and holism/serialism.

Design/methodology/approach

An empirical study was conducted with 120 students from a UK university. Riding's cognitive style analysis (CSA) and Ford's study preference questionnaire (SPQ) were used to identify the students' cognitive styles. A questionnaire was designed to identify users' preferences for the design of search engines. Data mining techniques were applied to analyse the data obtained from the empirical study.

Findings

The results highlight three findings. First, a fundamental link is confirmed between the two cognitive styles. Second, the relationship between field dependent users and holists is suggested to be more prominent than that of field independent users and serialists. Third, the interface design preferences of field dependent and field independent users can be split more clearly than those of holists and serialists.

Originality/value

The contributions of this study include a deeper understanding of the similarities and differences between field dependence/independence and holism/serialism, as well as a novel methodology for data analysis.

Details

Journal of Documentation, vol. 66 no. 4
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 9 August 2011

Brano Glumac, Qi Han, Jos Smeets and Wim Schaefer

Abstract

Purpose

A brownfield site is well described by various definitions, and the idea of redeveloping such sites is supported by the numerous benefits identified for society. Further, the existing literature covers a broad range of aspects of brownfield redevelopment, elaborating different features. At present, however, there is no overview of brownfield features from the real estate development perspective focusing on the physical, legal and financial aspects of a site and property. This paper aims to address these issues.

Design/methodology/approach

First, this paper contributes a literature survey, after which the features are structured according to the real estate development perspective. Additionally, the authors distinguish different expert groups and show the importance of keeping their aggregated opinions apart. The fuzzy Delphi technique is well suited to gathering such diverse panel data, since its procedure and calculation support expert diversity. The method also captures the uncertainty due to the human factor in valuation and thus improves the validity of the feature quantification.
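
A minimal sketch of one common fuzzy Delphi aggregation, assuming triangular fuzzy numbers built from each group's minimum, geometric mean and maximum ratings, with simple centroid defuzzification. The expert groups, rating scale and aggregation rules here are illustrative assumptions, not the authors' exact protocol.

```python
import numpy as np

def fuzzy_delphi(scores):
    """Aggregate one feature's expert ratings into a triangular fuzzy number
    (min, geometric mean, max) and defuzzify via the simple centroid."""
    s = np.asarray(scores, dtype=float)
    l, m, u = s.min(), s.prod() ** (1.0 / len(s)), s.max()
    return (l, m, u), (l + m + u) / 3.0

# Two hypothetical expert groups rating the same brownfield feature on 1..9:
groups = {"developers": [6, 7, 8, 7], "municipal officers": [4, 5, 6, 5]}
for name, ratings in groups.items():
    tfn, crisp = fuzzy_delphi(ratings)
    print(f"{name}: TFN={tfn}, crisp weight={crisp:.2f}")
```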

Findings

The survey was conducted among experts grouped by their specific goals and tasks. The paper presents which brownfield features are relevant for development potential and how the different expert groups value them.

Originality/value

The paper contributes to the complex decision-making process in brownfield redevelopment by identifying, structuring and rating the most relevant features of development potential. The authors introduce a method that highlights the importance of a rigorous procedure for panel-data collection and advances the weighting of the features. This is of particular importance for real estate development appraisal, since the features or variables present influence the future marketability and cost of a development, and missing them seriously endangers the appraisal. A similar threat can affect any econometric model of the kind recently used extensively in policymaking.

Details

Journal of European Real Estate Research, vol. 4 no. 2
Type: Research Article
ISSN: 1753-9269

Article
Publication date: 3 June 2020

Euripidis N. Loukis, Manolis Maragoudakis and Niki Kyriakou

Abstract

Purpose

The public sector has started exploiting artificial intelligence (AI) techniques, mainly for operational tasks but much less for tactical or strategic-level ones. The purpose of this study is to exploit AI for the highest strategic-level task of government: to develop an AI-based public-sector data analytics methodology for supporting policymaking on one of the most serious and large-scale challenges that governments repeatedly face, the economic crises that lead to economic recessions (though the proposed methodology is of much more general applicability).

Design/methodology/approach

A public-sector data analytics methodology has been developed that enables the exploitation of existing public and private sector data through advanced processing with a big-data-oriented AI technique, “all-relevant” feature selection, to identify the characteristics of firms, as well as of their external environment, that affect (positively or negatively) their resilience to economic crisis.
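
The “all-relevant” technique used here is the Boruta algorithm (named later in this abstract), which has an open-source Python port, BorutaPy, so the selection step can be sketched as below. The synthetic stand-in data and random-forest settings are assumptions, not the study's actual firm-level dataset or configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy  # pip install Boruta

# Stand-in data: in the study's setting, X would hold firm and environment
# characteristics and y a crisis-resilience outcome (both assumptions here).
X, y = make_classification(n_samples=500, n_features=15, n_informative=4,
                           random_state=0)

rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
boruta = BorutaPy(rf, n_estimators='auto', random_state=42)
boruta.fit(X, y)   # compares real features against randomized "shadow" copies
print("all-relevant features:", np.where(boruta.support_)[0])
```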

Findings

A first application of the proposed public-sector data analytics methodology has been conducted using Greek firms’ data for the economic crisis period 2009–2014. It led to interesting conclusions and insights, revealing factors that affected the extent of the sales-revenue decrease in Greek firms during that period and providing a first validation of the methodology.

Research limitations/implications

This paper contributes to the advancement of two emerging digital government research domains that are highly important for society yet minimally researched: public-sector data analytics (especially policy analytics) and government exploitation of AI. It exploits an AI feature-selection algorithm, the Boruta “all-relevant” variable-identification algorithm, which has rarely been used for public-sector data analytics, to support the design of public policies for addressing one of the most serious and large-scale economic challenges that governments repeatedly face: economic crises.

Practical implications

The proposed methodology allows the identification of the characteristics of firms, as well as of their external environment, that affect their resilience to economic crisis positively or negatively. This enables a better understanding of the kinds of firms that are hit hardest by a crisis, which is quite useful for designing public policies to support them; at the same time, it reveals the practices, resources and capabilities that enhance firms’ ability to cope with economic crisis, so that policies can be designed to promote these through educational and support activities.

Social implications

This methodology can be very useful for the design of more effective public policies for reducing the negative impacts of economic crises on firms, thereby mitigating their negative consequences for society, such as unemployment, poverty and social exclusion.

Originality/value

This study develops a novel approach to the exploitation of public and private sector data, based on an AI technique (“all-relevant” feature selection) that has rarely been used for such purposes, to support the design of public policies for addressing one of the most threatening disruptions that modern economies and societies repeatedly face: economic crises.

Details

Transforming Government: People, Process and Policy, vol. 14 no. 4
Type: Research Article
ISSN: 1750-6166

Article
Publication date: 12 June 2020

Sandeepkumar Hegde and Monica R. Mundada

Abstract

Purpose

According to the World Health Organization, by 2025 chronic diseases are expected to account for 73% of all deaths and 60% of the global burden of disease. These diseases persist for a long duration, are almost incurable and can only be controlled. Cardiovascular disease, chronic kidney disease (CKD) and diabetes mellitus are considered the three major chronic diseases whose risk increases among adults as they get older, with CKD regarded as the major disease among them. Overall, 10% of the world's population is affected by CKD, and this figure is likely to double by 2030. The paper proposes a novel feature-selection approach, in combination with a machine-learning algorithm, for the early and accurate prediction of chronic disease. Hence, a novel adaptive probabilistic divergence-based feature selection (APDFS) algorithm is proposed in combination with a hyper-parameterized logistic regression model (HLRM) for the early prediction of chronic disease.

Design/methodology/approach

A novel APDFS algorithm is proposed that explicitly handles the features associated with the class label through relevance and redundancy analysis. The algorithm applies statistical divergence-based information theory to identify relationships between distant features of the chronic-disease data sets. The data sets used in the experiments were obtained from several medical labs and hospitals in India. The HLRM is used as the machine-learning classifier. The predictive ability of the framework is compared with various algorithms and across various chronic-disease data sets. The experimental results illustrate that the proposed framework is efficient and achieves competitive results compared with existing work in most cases.
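
APDFS itself is not publicly available; the following is a loose, hypothetical sketch of a divergence-based relevance/redundancy analysis in the same spirit, using histogram estimates and KL divergence via SciPy. The thresholds, the marginal-divergence redundancy proxy and the assumption that features are scaled to [0, 1] are all illustrative.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KL divergence D(p || q)

def _hist(x, bins=10):
    h, _ = np.histogram(x, bins=bins, range=(0.0, 1.0))
    p = h.astype(float) + 1e-9        # smooth to avoid empty bins
    return p / p.sum()

def relevance(x, y):
    """Divergence between a feature's class-conditional distributions."""
    return entropy(_hist(x[y == 1]), _hist(x[y == 0]))

def redundancy(x1, x2):
    """Divergence between two features' marginals (small = likely redundant)."""
    return entropy(_hist(x1), _hist(x2))

def select(X, y, rel_min=0.05, red_min=0.01):
    order = np.argsort([-relevance(X[:, j], y) for j in range(X.shape[1])])
    kept = []
    for j in order:
        if relevance(X[:, j], y) < rel_min:
            break                      # remaining features are irrelevant
        if all(redundancy(X[:, j], X[:, k]) > red_min for k in kept):
            kept.append(j)             # relevant and non-redundant
    return kept

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 6))
y = (X[:, 0] > 0.5).astype(int)        # only feature 0 is informative
X[:, 3] = X[:, 0]                      # feature 3 is a redundant copy
print(select(X, y))                    # expect roughly [0]
```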

Findings

The performance of the proposed framework is validated using metrics such as recall, precision, F1-measure and ROC. Its predictive performance is analyzed on data sets for various chronic diseases, such as CKD, diabetes and heart disease. The diagnostic ability of the proposed approach is demonstrated by comparing its results with existing algorithms. The experiments show that the proposed framework performs exceptionally well in the early prediction of CKD, with an accuracy of 91.6%.

Originality/value

The capability of machine-learning algorithms depends on feature selection (FS) algorithms to identify the relevant traits in the data set, which affects the predictive result. FS is the process of choosing the relevant features of a data set by removing redundant and irrelevant ones. Although many approaches have already been proposed toward this objective, they are computationally complex because they follow a one-step scheme in selecting the features. In this paper, a novel APDFS algorithm is proposed that explicitly handles the features associated with the class label through relevance and redundancy analysis. The proposed algorithm performs feature selection in two separate stages; hence, its computational complexity is reduced to O(nk+1). The algorithm applies statistical divergence-based information theory to identify relationships between distant features of the chronic-disease data sets. The data sets used in the experiments were obtained from several medical labs and hospitals of Karkala taluk, India. The HLRM is used as the machine-learning classifier. The predictive ability of the framework is compared with various algorithms and across various chronic-disease data sets. The experimental results illustrate that the proposed framework is efficient and achieves competitive results compared with existing work in most cases.

Details

International Journal of Pervasive Computing and Communications, vol. 17 no. 1
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 13 August 2019

Hongshan Xiao and Yu Wang

Abstract

Purpose

Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decisions, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.

Design/methodology/approach

A measurement is first developed for quantifying and identifying any significant heterogeneity in the feature space of a data set; its main idea is derived from meta-analysis. For data sets with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.
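
A hedged sketch of this pipeline's shape, assuming scikit-learn: orthogonal factor analysis transforms the features, the factor scores are clustered to partition the samples, and one classifier is learned per partition. The component count, cluster count and classifier are assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           random_state=0)

# Step 1: orthogonal factor analysis yields new features without redundancy.
scores = FactorAnalysis(n_components=4, random_state=0).fit_transform(X)

# Step 2: cluster the factor scores to partition samples into regions that
# are internally homogeneous (capturing the feature space heterogeneity).
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

# Step 3: learn one classifier per partition; each sample is routed to its
# cluster's model, forming an ensemble over the partitions.
models = {c: DecisionTreeClassifier(max_depth=3).fit(scores[clusters == c],
                                                     y[clusters == c])
          for c in np.unique(clusters)}
acc = np.mean([models[c].score(scores[clusters == c], y[clusters == c])
               for c in np.unique(clusters)])
print(f"mean within-cluster training accuracy: {acc:.3f}")
```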

Findings

The proposed approach has two main advantages over previous methods. The first lies in feature transformation via orthogonal factor analysis, which yields new features without redundancy or irrelevance. The second rests on partitioning the samples to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.

Research limitations/implications

The measurement should be used to guide the heterogeneity-elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.

Practical implications

Measuring and eliminating any feature space heterogeneity present in the data is important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which favors applications of classification techniques to real-world problems.

Originality/value

A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.

Details

Kybernetes, vol. 48 no. 9
Type: Research Article
ISSN: 0368-492X
