Search results

1 – 10 of over 3000
Article
Publication date: 15 March 2021

Putta Hemalatha and Geetha Mary Amalanathan

Abstract

Purpose

Adequate resources for learning and training on the data are an important prerequisite for developing an efficient classifier with outstanding performance. Data usually follow a biased distribution of classes, i.e. an unequal distribution of classes within a dataset. This issue, known as the imbalance problem, is one of the most common issues occurring in real-time applications. Learning from imbalanced datasets is a ubiquitous challenge in the field of data mining, as imbalanced data degrade classifier performance by producing inaccurate results.

Design/methodology/approach

In the proposed work, a novel fuzzy-based Gaussian synthetic minority oversampling technique (FG-SMOTE) is introduced to process imbalanced data. The Gaussian SMOTE mechanism is based on the nearest-neighbour concept for balancing the ratio between the minority- and majority-class datasets; this ratio is balanced using a fuzzy-based Levenshtein distance measure.
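
The abstract gives no implementation; as a minimal sketch of the SMOTE-style interpolation FG-SMOTE builds on (function name and parameters are illustrative, and the fuzzy Levenshtein weighting is omitted):

```python
import random

def smote_like_oversample(minority, k=2, n_new=4, seed=0):
    """Generate synthetic minority samples by interpolating between a
    point and one of its k nearest neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x by squared Euclidean distance,
        # excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote_like_oversample(minority)
```

Each synthetic point lies on a segment between two existing minority points, so the oversampled set stays inside the minority region.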

Findings

The performance and accuracy of the proposed algorithm are evaluated using a deep belief network classifier. The results show the efficiency of the fuzzy-based Gaussian SMOTE technique, which achieved an AUC of 93.7%, an F1 score of 94.2% and a geometric mean score of 93.6%, computed from the confusion matrix.

Research limitations/implications

The proposed research still retains some challenges that need attention, such as applying FG-SMOTE to multiclass imbalanced datasets and evaluating the dataset imbalance problem in a distributed environment.

Originality/value

The proposed algorithm fundamentally addresses the issues and challenges involved in handling imbalanced data; FG-SMOTE aids in balancing the minority- and majority-class datasets.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 20 June 2022

Lokesh Singh, Rekh Ram Janghel and Satya Prakash Sahu

Abstract

Purpose

Automated skin lesion analysis plays a vital role in early detection. Relatively small, imbalanced skin lesion datasets impede learning and dominate research in automated skin lesion analysis. The unavailability of adequate data makes it difficult to develop classification methods due to the skewed class distribution.

Design/methodology/approach

Boosting-based transfer learning (TL) paradigms such as the Transfer AdaBoost algorithm can compensate for such a lack of samples by taking advantage of auxiliary data. However, in such methods, beneficial source instances representing the target undergo fast, stochastic weight convergence, resulting in a “weight-drift” that negates transfer. In this paper, a framework is designed utilizing “Rare-Transfer” (RT), a boosting-based TL algorithm that prevents weight-drift and simultaneously addresses absolute rarity in skin lesion datasets. RT keeps the weights of source samples from converging too quickly, and it addresses absolute rarity using an instance-transfer approach that incorporates the best-fitting set of auxiliary examples. In this way it compensates for class imbalance and for the scarcity of training samples at the same time, inducing balanced error minimization.
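
A rough sketch of one boosting-round weight update of the kind described above, assuming a TrAdaBoost-style down-weighting of misclassified source instances; the drift-correction factor 2(1 − ε) follows the dynamic-update idea and is an assumption, not necessarily the paper's exact rule:

```python
import math

def update_source_weights(weights, wrong_mask, n_rounds, round_error):
    """One TrAdaBoost-style round: misclassified source instances are
    down-weighted by a fixed beta, while a correction factor keeps the
    correctly classified source weights from drifting toward zero."""
    # TrAdaBoost's source multiplier: beta = 1 / (1 + sqrt(2 ln n / N))
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(weights)) / n_rounds))
    correction = 2.0 * (1.0 - round_error)  # assumed drift-correction factor
    return [
        w * (beta_src if wrong else 1.0) * correction
        for w, wrong in zip(weights, wrong_mask)
    ]

w0 = [0.25, 0.25, 0.25, 0.25]
w1 = update_source_weights(w0, [False, True, False, False],
                           n_rounds=10, round_error=0.25)
```

Without the correction factor, even correctly classified source weights shrink relative to target weights over rounds; with it, only the misclassified source instance is pushed down.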

Findings

Promising results are obtained with RT compared with state-of-the-art techniques on absolute-rare skin lesion datasets, with an accuracy of 92.5%. A Wilcoxon signed-rank test examines the significant differences between the proposed RT algorithm and the conventional algorithms used in the experiment.

Originality/value

Experimentation is performed on four absolute-rare skin lesion datasets, and the effectiveness of RT is assessed in terms of accuracy, sensitivity, specificity and area under the curve. The performance is compared with existing ensemble and boosting-based TL methods.

Details

Data Technologies and Applications, vol. 57 no. 1
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 1 February 1984

B.C. BROOKES

Abstract

Haitun has recently shown that empirical distributions are of two types—‘Gaussian’ and ‘Zipfian’—characterized by the presence or absence of moments. Gaussian‐type distributions arise only in physical contexts; Zipfian‐type distributions only in social contexts. As the whole of modern statistical theory is based on Gaussian distributions, Haitun thus shows that its application to social statistics, including cognitive statistics, is ‘inadmissible’. A new statistical theory based on ‘Zipfian’ distributions is therefore needed for the social sciences. Laplace's notorious ‘law of succession’, which has evaded derivation by classical probability theory, is shown to be the ‘Zipfian’ frequency analogue of the Bradford law. It is argued that these two laws together provide the most convenient analytical instruments for the exploration of social science data. Some implications of these findings for the quantitative analysis of information systems are briefly discussed.
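
For reference, Laplace's law of succession mentioned above has the familiar closed form: after observing s successes in n independent trials, the probability of success on the next trial is

```latex
P(\text{success on trial } n+1 \mid s \text{ successes in } n \text{ trials})
  = \frac{s+1}{n+2}
```

so that even with no observed successes (s = 0) the estimate remains strictly positive, which is the feature classical frequency arguments struggle to derive.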

Details

Journal of Documentation, vol. 40 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 January 1988

QUENTIN L BURRELL

Abstract

A probabilistic mechanism is proposed to describe various forms of the Bradford phenomenon reported in bibliometric research. This leads to a stochastic process termed the Waring process, a special case of which seems to conform with the general features of ‘Bradford's Law’. The presence of a time parameter in the model emphasises that we are considering dynamic systems and allows the possibility of predictions being made.

Details

Journal of Documentation, vol. 44 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 17 November 2020

Deepti Sisodia and Dilip Singh Sisodia

Abstract

Purpose

Analysis of the publisher's behavior plays a vital role in identifying fraudulent publishers in the pay-per-click model of online advertising. However, the vast amount of raw user click data with missing values poses a challenge in analyzing the conduct of publishers. The presence of high-cardinality categorical attributes with multiple possible values further aggravates the issue.

Design/methodology/approach

In this paper, gradient tree boosting (GTB) learning is used to address the challenges encountered in learning the publishers' behavior from raw user click data and effectively classifying fraudulent publishers.
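
Gradient tree boosting itself is standard: each new tree fits the residuals (negative gradients) of the current ensemble. A least-squares toy version with single-feature stumps, not the paper's configuration:

```python
def stump_fit(xs, residuals):
    """Find the threshold split on a 1-D feature minimising squared error."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean

def gradient_boost(xs, ys, n_trees=20, lr=0.3):
    """Least-squares gradient boosting: each stump fits the residuals
    (negative gradients) of the current ensemble's predictions."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = stump_fit(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # toy genuine (0) vs fraudulent (1) labels
model = gradient_boost(xs, ys)
```

Production GTB implementations add regularisation, multi-feature trees and a classification loss, but the fit-the-residuals loop is the same.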

Findings

The results demonstrate that GTB effectively classified fraudulent publishers and exhibited significantly improved performance compared to other learning methods in terms of average precision (60.5%), recall (57.8%) and f-measure (59.1%).

Originality/value

The experiments were conducted using a publicly available multiclass raw user click dataset and eight other imbalanced datasets to test GTB's generalizing behavior, with training and testing done using 10-fold cross-validation. The performance of GTB was evaluated using average precision, recall and f-measure, and compared with eleven other state-of-the-art individual and ensemble classification models.

Details

Data Technologies and Applications, vol. 55 no. 2
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 1 January 2002

Raymond A.K. Cox and Robert T. Kleiman

Abstract

Outlines previous research on the security analyst “superstar” phenomenon, including the stochastic model of Yule and Simon. Applies this to data on the 1986‐1997 selections for the Institutional Investor’s All‐British Research First Team (ABRT) and finds that it does not explain the distribution, i.e. that selection does appear to be based on skill rather than luck. Considers consistency with other research and expects future research to concentrate on the ABRT’s ability to forecast earnings per share and share prices.

Details

Managerial Finance, vol. 28 no. 1
Type: Research Article
ISSN: 0307-4358

Article
Publication date: 4 December 2018

Zhongyi Hu, Raymond Chiong, Ilung Pranata, Yukun Bao and Yuqing Lin

Abstract

Purpose

Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper is to investigate the use of machine learning techniques for malicious web domain identification, taking into account the class imbalance issue (i.e. there are more benign web domains than malicious ones).

Design/methodology/approach

The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) and particle swarm optimisation (PSO), a population-based meta-heuristic algorithm. The authors use SMOTE for oversampling and PSO for undersampling.
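
The combined pipeline can be sketched as follows. All names are illustrative, and plain random undersampling stands in for the paper's PSO-guided selection of which majority samples to drop:

```python
import random

def integrated_resample(minority, majority, seed=0):
    """Balance two classes by oversampling the minority (SMOTE-style
    interpolation) and undersampling the majority to a common size.
    The paper chooses which majority samples to keep via PSO; random
    undersampling is used here only as a placeholder."""
    rng = random.Random(seed)
    target = (len(minority) + len(majority)) // 2
    # Oversample: interpolate between random pairs of minority points.
    over = list(minority)
    while len(over) < target:
        a, b = rng.sample(minority, 2)
        gap = rng.random()
        over.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    # Undersample: keep a majority subset of the same size.
    under = rng.sample(majority, target)
    return over, under

minority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)]
majority = [(1.0 + 0.1 * i, 1.0) for i in range(9)]
over, under = integrated_resample(minority, majority)
```

After resampling, both classes have the same number of instances, so a classifier trained on the result no longer sees a skewed class prior.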

Findings

By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain data sets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective.

Practical implications

This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification.

Originality/value

Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world data sets with different imbalance ratios.

Book part
Publication date: 30 November 2016

Robert L. Axtell

Abstract

Certain elements of Hayek’s work are prominent precursors to the modern field of complex adaptive systems, including his ideas on spontaneous order, his focus on market processes, his contrast between designing and gardening, and his own framing of complex systems. Conceptually, he was well ahead of his time, prescient in his formulation of novel ways to think about economies and societies. Technically, the fact that he did not mathematically formalize most of the notions he developed makes his insights hard to incorporate unambiguously into models. However, because so much of his work is divorced from the simplistic models proffered by early mathematical economics, it stands as fertile ground for complex systems researchers today. I suggest that Austrian economists can create a progressive research program by building models of these Hayekian ideas, and thereby gain traction within the economics profession. Instead of mathematical models the suite of techniques and tools known as agent-based computing seems particularly well-suited to addressing traditional Austrian topics like money, business cycles, coordination, market processes, and so on, while staying faithful to the methodological individualism and bottom-up perspective that underpin the entire school of thought.

Details

Revisiting Hayek’s Political Economy
Type: Book
ISBN: 978-1-78560-988-6

Content available
Book part
Publication date: 20 January 2005


Details

Power Laws in the Information Production Process: Lotkaian Informetrics
Type: Book
ISBN: 978-0-12088-753-8

Article
Publication date: 11 September 2020

Chien-Yi Hsiang and Julia Taylor Rayz

Abstract

Purpose

This study aims to predict popular contributors through text representations of user-generated content in open crowds.

Design/methodology/approach

Three text representation approaches – count vector, Tf-Idf vector and word embedding – together with supervised machine learning techniques are used to generate popular contributor predictions.
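
A Tf-Idf representation of the kind listed above can be sketched in a few lines; the smoothing-free idf variant shown is one common choice, not necessarily the study's:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute Tf-Idf weights per document: term frequency scaled by
    inverse document frequency, so terms appearing in every document
    score zero while distinctive terms score high."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in docs for t in set(doc.split()))
    vectors = []
    for doc in docs:
        tf = Counter(doc.split())
        total = sum(tf.values())
        vectors.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

docs = ["great idea great product", "bad great idea", "great feedback"]
vecs = tf_idf(docs)
```

Here "great" occurs in all three documents, so its idf (and hence its weight) is zero, while a term unique to one document such as "product" receives positive weight.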

Findings

The results of the experiments demonstrate that the popular contributor predictions are successful: the F1 scores are all higher than those of the baseline model. Popular contributors in open crowds can be predicted through user-generated content.

Research limitations/implications

This research presents brand new empirical evidence drawn from text representations of user-generated content that reveals why some contributors' ideas are more viral than others in open crowds.

Practical implications

This research suggests that companies can learn from popular contributors in ways that help them improve customer agility and better satisfy customers' needs. In addition to boosting customer engagement and triggering discussion, popular contributors' ideas provide insights into the latest trends and customer preferences. The results of this study will benefit marketing strategy, new product development, customer agility and management of information systems.

Originality/value

The paper provides new empirical evidence for popular contributor prediction in an innovation crowd through text representation approaches.

Details

Information Technology & People, vol. 35 no. 2
Type: Research Article
ISSN: 0959-3845
