Search results

1 – 10 of 187
Article
Publication date: 25 September 2019

Torsten Maier, Joanna DeFranco and Christopher McComb

Often, it is assumed that teams are better at solving problems than individuals working independently. However, recent work in engineering, design and psychology contradicts this…

Abstract

Purpose

Often, it is assumed that teams are better at solving problems than individuals working independently. However, recent work in engineering, design and psychology contradicts this assumption. This study aims to examine the behavior of teams engaged in data science competitions. Crowdsourced competitions have seen increased use for software development and data science, and platforms often encourage teamwork between participants.

Design/methodology/approach

We specifically examine the teams participating in data science competitions hosted by Kaggle. We analyze the data provided by Kaggle to compare the effect of team size and interaction frequency on team performance. We also contextualize these results through a semantic analysis.
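
As an illustration of this kind of analysis, the sketch below relates team size and interaction frequency to competition performance with pandas. It is a minimal sketch assuming a hypothetical export of Kaggle team records with columns team_size, forum_messages and final_rank; it is not the authors' actual pipeline.

```python
import pandas as pd

# Hypothetical export of Kaggle team records (one row per team)
teams = pd.read_csv("kaggle_teams.csv")

# Average finishing rank by team size (lower rank = better performance)
by_size = teams.groupby("team_size")["final_rank"].mean()

# Probability of winning (finishing first) by team size
teams["won"] = teams["final_rank"] == 1
win_rate = teams.groupby("team_size")["won"].mean()

# Relationship between interaction frequency and performance
corr = teams["forum_messages"].corr(teams["final_rank"])
print(by_size, win_rate, corr, sep="\n")
```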

Findings

This work demonstrates that groups of individuals working independently may outperform interacting teams on average, but that small, interacting teams are more likely to win competitions. The semantic analysis revealed differences in forum participation, verb usage and pronoun usage when comparing top- and bottom-performing teams.

Research limitations/implications

These results reveal a perplexing tension that must be explored further: true teams may experience better performance with higher cohesion, but nominal teams may perform even better on average with essentially no cohesion. Limitations of this research include not factoring in team member experience level and reliance on extant data.

Originality/value

These results are potentially of use to designers of crowdsourced data science competitions as well as managers and contributors to distributed software development projects.

Details

Team Performance Management: An International Journal, vol. 25 no. 7/8
Type: Research Article
ISSN: 1352-7592


Article
Publication date: 10 March 2022

Jayaram Boga and Dhilip Kumar V.

To achieve a profitable human activity recognition (HAR) method, this paper solves the HAR problem under wireless body area network (WBAN) using a developed ensemble learning…


Abstract

Purpose

The purpose of this study is to solve the human activity recognition (HAR) problem under wireless body area network (WBAN) using a developed ensemble learning approach, thereby achieving a profitable HAR method. Three data sets are used for HAR in WBAN, namely, human activity recognition using smartphones, wireless sensor data mining and Kaggle. The proposed model undergoes four phases: pre-processing, feature extraction, feature selection and classification. The data are pre-processed by artifact removal and median filtering techniques. Features are then extracted by techniques such as t-distributed stochastic neighbor embedding, the short-time Fourier transform and statistical approaches. Weighted optimal feature selection is the next step, selecting the important features based on the data variance of each class; this new feature selection is achieved by the hybrid coyote Jaya optimization (HCJO) algorithm. Finally, a meta-heuristic-based ensemble learning approach is used as a new recognition approach with three classifiers, namely, support vector machine (SVM), deep neural network (DNN) and fuzzy classifiers. Experimental analysis is performed.
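
As a rough illustration of two of the steps named above, the sketch below applies median filtering for artifact removal and extracts short-time Fourier transform (STFT) and statistical features with SciPy. The sampling rate, window sizes and synthetic signal are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.signal import medfilt, stft

fs = 50                                  # assumed sensor sampling rate (Hz)
accel = np.random.randn(10 * fs)         # stand-in for one accelerometer channel

denoised = medfilt(accel, kernel_size=5)       # median filtering (artifact removal)
f, t, Z = stft(denoised, fs=fs, nperseg=64)    # short-time Fourier transform
stft_energy = np.abs(Z).mean(axis=1)           # mean energy per frequency bin

# Simple statistical features, as also mentioned in the abstract
stats = np.array([denoised.mean(), denoised.std(), denoised.min(), denoised.max()])
features = np.concatenate([stft_energy, stats])
```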

Design/methodology/approach

The proposed HCJO algorithm was developed to optimize the membership function of the fuzzy classifier, the iteration limit of the SVM and the hidden neuron count of the DNN, yielding superior classification outcomes and enhancing the performance of the ensemble classification.

Findings

The accuracy of the enhanced HAR model was considerably higher than that of conventional models: 6.66% higher than the fuzzy classifier, 4.34% higher than the DNN, 4.34% higher than the SVM, 7.86% higher than the plain ensemble and 6.66% higher than the improved sea lion optimization algorithm-based attention pyramid convolutional neural network (AP-CNN).

Originality/value

The suggested HAR model for WBAN using the HCJO algorithm is accurate and improves the effectiveness of recognition.

Details

International Journal of Pervasive Computing and Communications, vol. 19 no. 4
Type: Research Article
ISSN: 1742-7371


Article
Publication date: 29 November 2021

Ziming Zeng, Tingting Li, Shouqiang Sun, Jingjing Sun and Jie Yin

Twitter fake accounts refer to bot accounts created by third-party organizations to influence public opinion, spread commercial propaganda or impersonate others. The effective…

Abstract

Purpose

Twitter fake accounts refer to bot accounts created by third-party organizations to influence public opinion, spread commercial propaganda or impersonate others. Effective identification of bot accounts helps the public accurately judge the information being disseminated. However, in practice, manually labeling Twitter accounts is expensive and inefficient, and the labeled data are usually imbalanced across classes. To this end, the authors propose a novel framework to solve these problems.

Design/methodology/approach

In the proposed framework, the authors introduce the concept of semi-supervised self-training and apply it to a real Twitter account data set from Kaggle. Specifically, the authors first train the classifier on the initial small amount of labeled account data, then use the trained classifier to automatically label large-scale unlabeled account data. Next, high-confidence instances are iteratively selected from the unlabeled data to expand the labeled data. Finally, an expanded Twitter account training set is obtained. Notably, a resampling technique is integrated into the self-training process, so the data classes are balanced at the initial stage of the self-training iteration.
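
A minimal sketch of such a self-training loop, with resampling applied to the initial labeled set, is shown below. The confidence threshold, base classifier and the use of RandomOverSampler are illustrative assumptions rather than the authors' exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import RandomOverSampler

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
    # Balance the small initial labeled set before iterating
    X_lab, y_lab = RandomOverSampler().fit_resample(X_lab, y_lab)
    clf = RandomForestClassifier()
    for _ in range(max_iter):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold   # high-confidence instances
        if not confident.any():
            break
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]  # pseudo-labels
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~confident]
    return clf
```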

Findings

The proposed framework effectively improves labeling efficiency and reduces the influence of class imbalance. It shows excellent identification results with six different base classifiers, especially when the initial labeled set of Twitter accounts is small.

Originality/value

This paper provides novel insights in identifying Twitter fake accounts. First, the authors take the lead in introducing a self-training method to automatically label Twitter accounts from the semi-supervised background. Second, the resampling technique is integrated into the self-training process to effectively reduce the influence of class imbalance on the identification effect.

Details

Data Technologies and Applications, vol. 56 no. 3
Type: Research Article
ISSN: 2514-9288


Article
Publication date: 20 September 2019

Tingyu Weng, Wenyang Liu and Jun Xiao

The purpose of this paper is to design a model that can accurately forecast supply chain sales.


Abstract

Purpose

The purpose of this paper is to design a model that can accurately forecast supply chain sales.

Design/methodology/approach

This paper proposes a new model based on LightGBM and long short-term memory (LSTM) networks to forecast supply chain sales. To verify the accuracy and efficiency of this model, three representative supply chain sales data sets are selected for experiments.
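
The paper's exact combination scheme is not detailed here, so the sketch below shows one plausible way to pair the two models: fit LightGBM on tabular features and an LSTM on sales-history windows, then blend their forecasts. All names and hyperparameters are assumptions.

```python
import lightgbm as lgb
from tensorflow import keras

def fit_and_blend(X_tab, X_seq, y, X_tab_new, X_seq_new, alpha=0.5):
    # Gradient-boosted trees on tabular features
    gbm = lgb.LGBMRegressor().fit(X_tab, y)

    # LSTM on windows of past sales (X_seq shape: samples x timesteps x features)
    lstm = keras.Sequential([
        keras.Input(shape=X_seq.shape[1:]),
        keras.layers.LSTM(32),
        keras.layers.Dense(1),
    ])
    lstm.compile(optimizer="adam", loss="mse")
    lstm.fit(X_seq, y, epochs=10, verbose=0)

    # Blend the two forecasts (alpha is an assumed mixing weight)
    return alpha * gbm.predict(X_tab_new) + (1 - alpha) * lstm.predict(X_seq_new).ravel()
```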

Findings

The experimental results show that the combined model can forecast supply chain sales with high accuracy, efficiency and interpretability.

Practical implications

With the rapid development of big data and AI, using big data analysis and algorithmic technology to accurately forecast the long-term sales of goods will provide a data foundation for the supply chain and key technical support for enterprises establishing supply chain solutions. This paper provides an effective method for supply chain sales forecasting, which can help enterprises scientifically and reasonably forecast long-term commodity sales.

Originality/value

The proposed model not only inherits the LSTM model's ability to automatically mine high-level temporal features, but also retains the advantages of the LightGBM model, such as high efficiency and strong interpretability, making it suitable for industrial production environments.

Details

Industrial Management & Data Systems, vol. 120 no. 2
Type: Research Article
ISSN: 0263-5577


Article
Publication date: 10 January 2023

Atul Rawal and Bechoo Lal

The uncertainty of getting admission into universities/institutions is a global problem in the academic environment. Students have good marks with the highest…

Abstract

Purpose

The uncertainty of getting admission into universities/institutions is a global problem in the academic environment. Students may have good marks and the highest credentials yet still be unsure of gaining admission. In this research study, the researcher builds a predictive model using the Naïve Bayes classifier, a machine learning algorithm, to extract and analyze hidden patterns in students’ academic records and credentials. The main purpose of this research study is to reduce the uncertainty of getting admission into universities/institutions based on previous credentials and some other essential parameters.

Design/methodology/approach

This research study combines Naïve Bayes classification with kernel density estimation (KDE) to predict a student’s admission into universities or other higher institutions. The researcher collected data from Kaggle data sets comprising grade point average (GPA), Graduate Record Examinations (GRE) scores and university RANK, which are essential for admission to higher education.
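
One common way to join Naïve Bayes with KDE, sketched below under the assumption that it matches the paper's intent, is to fit a kernel density estimate per class and per feature (e.g. GRE, GPA, RANK) and combine the per-feature log densities with the class priors. The bandwidth and the example column names are illustrative choices.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

class KDENaiveBayes:
    """Naive Bayes with one KDE per (class, feature) pair."""

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_priors_ = [np.log(np.mean(y == c)) for c in self.classes_]
        self.kdes_ = [[KernelDensity(bandwidth=0.5).fit(X[y == c, j:j + 1])
                       for j in range(X.shape[1])] for c in self.classes_]
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        # Naive assumption: sum per-feature log densities, add the log prior
        scores = np.array([lp + sum(k.score_samples(X[:, j:j + 1])
                                    for j, k in enumerate(kdes))
                           for lp, kdes in zip(self.log_priors_, self.kdes_)])
        return self.classes_[scores.argmax(axis=0)]

# e.g. KDENaiveBayes().fit(train[["GRE", "GPA", "RANK"]].values, train["admit"].values)
# (column names are hypothetical, not the paper's actual schema)
```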

Findings

The classification model is built on a training data set of students’ examination scores, such as GPA, GRE and RANK, and some other essential features that determine admission; it achieved a predictive accuracy rate of 72%, which has been experimentally verified. To improve accuracy, the researcher used the Shapiro–Wilk normality test and Gaussian distributions on large data sets.

Research limitations/implications

The limitation of this research study is that the developed predictive model is not applicable to admission into all courses. The researcher used only three admission attributes, namely, GRE, GPA and RANK, which do not determine admission into every possible course; the model is therefore applicable only to students’ admission into universities/institutions in general.

Practical implications

The researcher used the Naïve Bayes classifier and KDE machine learning algorithms to develop a predictive model that is reliable and efficient in classifying the admission category (Admitted/Not Admitted) for universities/institutions. During the research study, the researcher found that the accuracies of predictive Model 1 and predictive Model 2 are very close, with predictive Model 1 having true and false prediction rates of 70.46% and 29.53%, respectively.

Social implications

This study makes a significant contribution to society: students and parents can obtain prior information about the possibility of admission into higher academic institutions and universities.

Originality/value

The classification model can reduce admission uncertainty and enhance universities’ decision-making capabilities. The significance of this research study lies in reducing human intervention in decisions about students’ admission into universities or other higher academic institutions, and it demonstrates that many universities and higher-level institutions could use this predictive model to improve their admission processes.

Details

Journal of Indian Business Research, vol. 15 no. 2
Type: Research Article
ISSN: 1755-4195


Article
Publication date: 25 January 2022

Anil Kumar Maddali and Habibulla Khan

Currently, the design and technological features of voices, and their analysis in various applications, are being simulated with the requirement to communicate at a greater distance…

Abstract

Purpose

Currently, the design and technological features of voices, and their analysis in various applications, are being simulated to meet the requirement of communicating at a greater distance or more discreetly. The purpose of this study is to explore how voices and their analyses are used in the modern literature to generate a variety of solutions, of which only a few successful models exist.

Design/methodology/approach

The mel-frequency cepstral coefficient (MFCC), average magnitude difference function, cepstrum analysis and other voice characteristics are effectively modeled and implemented using mathematical modeling, with variable weight parameters for each algorithm, and can be used with or without noise. The design characteristics and their weights are improved with different supervised algorithms that regulate the design model simulation.
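
For illustration, the sketch below computes two of the named voice characteristics: MFCCs via librosa and a plain NumPy average magnitude difference function (AMDF). The audio path is a placeholder and the frame and lag sizes are assumptions.

```python
import numpy as np
import librosa

y, sr = librosa.load("voice_sample.wav", sr=None)   # placeholder file path
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 MFCCs per frame

def amdf(frame, max_lag=400):
    # Mean absolute difference between the frame and its lagged copy;
    # local minima indicate candidate pitch periods
    return np.array([np.mean(np.abs(frame[lag:] - frame[:-lag]))
                     for lag in range(1, max_lag)])

amdf_curve = amdf(y[:2048])
```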

Findings

Different data models are influenced by the parametric range and solution analysis in different parameter spaces, such as frequency-domain or time-domain models, with features computed without noise, with noise and after noise reduction. The frequency response of the current design can be analyzed through windowing techniques.

Originality/value

A new model and its implementation scenario with pervasive computational algorithms (PCA), such as the hybrid PCA with AdaBoost (HPCA), PCA with bag of features and improved PCA with bag of features, relating different features such as MFCC, power spectrum, pitch and windowing techniques, is calculated using the HPCA. The features are accumulated in matrix formulations and govern the design feature comparison and its feature classification for improved performance parameters, as mentioned in the results.

Details

International Journal of Pervasive Computing and Communications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1742-7371


Open Access
Article
Publication date: 4 December 2020

Sergei O. Kuznetsov, Alexey Masyutin and Aleksandr Ageev

The purpose of this study is to show that closure-based classification and regression models provide both high accuracy and interpretability.

Abstract

Purpose

The purpose of this study is to show that closure-based classification and regression models provide both high accuracy and interpretability.

Design/methodology/approach

Pattern structures allow one to approach the knowledge extraction problem in the case of partially ordered descriptions. They provide a way to apply techniques based on closed descriptions to non-binary data. To provide scalability of the approach, the authors introduce a lazy (query-based) classification algorithm.
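
A minimal sketch of lazy, closure-based classification over numeric data (interval pattern structures) follows: for a query object, take the interval hull of the query and each training object, and vote only with hulls into which no object of another class falls. This is a simplified illustration, not the authors' full algorithm.

```python
import numpy as np

def lazy_classify(X, y, query):
    """Vote over per-object closures (interval hulls) that are class-pure."""
    X, y = np.asarray(X, float), np.asarray(y)
    votes = {c: 0 for c in np.unique(y)}
    for xi, yi in zip(X, y):
        lo = np.minimum(xi, query)          # interval hull of {query, xi} ...
        hi = np.maximum(xi, query)          # ... is their closed description
        covered = np.all((X >= lo) & (X <= hi), axis=1)
        if np.all(y[covered] == yi):        # hypothesis not refuted by other classes
            votes[yi] += 1
    return max(votes, key=votes.get)
```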

Findings

The experiments support the hypothesis that closure-based classification and regression achieve higher accuracy in scoring models than classical banking models while retaining the interpretability of model results, whereas black-box methods gain accuracy at the cost of losing interpretability.

Originality/value

This is original research showing the advantage of closure-based classification and regression models in the banking sphere.

Details

Asian Journal of Economics and Banking, vol. 4 no. 3
Type: Research Article
ISSN: 2615-9821


Article
Publication date: 3 January 2023

Saleem Raja A., Sundaravadivazhagan Balasubaramanian, Pradeepa Ganesan, Justin Rajasekaran and Karthikeyan R.

The internet has completely merged into contemporary life. People are addicted to using internet services for everyday activities. Consequently, an abundance of information about…

Abstract

Purpose

The internet has completely merged into contemporary life. People are addicted to using internet services for everyday activities. Consequently, an abundance of information about people and organizations is available online, which encourages the proliferation of cybercrime. Cybercriminals often use malicious links for large-scale cyberattacks, which are disseminated via email, SMS and social media. Recognizing malicious links online can be exceedingly challenging. The purpose of this paper is to present a strong security system that can detect malicious links in cyberspace using natural language processing techniques.

Design/methodology/approach

A variety of approaches, including blacklisting, rule-based methods and machine/deep learning, have been recommended for automatically recognizing malicious links. But these approaches generally necessitate generating a set of features to generalize the detection process. Most features are generated by processing URLs and the content of the web page, as well as some external features such as the ranking of the web page and domain name system information. This process of feature extraction and selection typically takes considerable time and demands a high level of domain expertise. Sometimes the generated features may not leverage the full potential of the data set. In addition, the majority of currently deployed systems use a single classifier for the classification of malicious links. However, prediction accuracy may vary widely depending on the data set and the classifier used.

Findings

To address the issue of generating feature sets, the proposed method uses natural language processing techniques (term frequency and inverse document frequency) to vectorize URLs. To build a robust system for classifying malicious links, the proposed system implements a weighted soft voting classifier, an ensemble classifier that combines the predictions of base classifiers. The skill of each base classifier serves as the basis for the weight assigned to it.
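
A hedged sketch of such a pipeline follows: TF-IDF over URL character n-grams feeding a weighted soft-voting ensemble in scikit-learn. The n-gram range, base classifiers, weights and toy data are illustrative assumptions, not the paper's configuration.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

urls = ["http://example.com/login", "http://paypa1-secure.xyz/verify"]  # toy data
labels = [0, 1]  # 0 = benign, 1 = malicious

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),  # vectorize raw URLs
    VotingClassifier(
        estimators=[("lr", LogisticRegression()),
                    ("nb", MultinomialNB()),
                    ("rf", RandomForestClassifier())],
        voting="soft",                     # average predicted probabilities
        weights=[2, 1, 2],                 # would be set from each classifier's skill
    ),
)
model.fit(urls, labels)
print(model.predict(["http://example.org/home"]))
```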

Originality/value

The proposed method performs better when optimal weights are assigned. Its performance was assessed on two different data sets (D1 and D2) and compared against base machine learning classifiers and previous research results. The resulting accuracy shows that the proposed method is superior to existing methods, achieving 91.4% and 98.8% accuracy on data sets D1 and D2, respectively.

Details

International Journal of Pervasive Computing and Communications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1742-7371


Article
Publication date: 6 August 2020

Mohammad Khalid Pandit and Shoaib Amin Banday

The novel coronavirus is a fast-spreading pathogen worldwide and is threatening billions of lives. SARS-CoV-2 is known to affect the lungs of COVID-19-positive patients. Chest…

Abstract

Purpose

The novel coronavirus is a fast-spreading pathogen worldwide and is threatening billions of lives. SARS-CoV-2 is known to affect the lungs of COVID-19-positive patients. Chest x-rays are the most widely used imaging technique for clinical diagnosis due to fast imaging times and low cost. The purpose of this study is to use deep learning techniques for the automatic detection of COVID-19 using chest x-rays.

Design/methodology/approach

The authors used a data set containing confirmed COVID-19-positive, common bacterial pneumonia and healthy (no infection) cases. A collection of 1,428 x-ray images is used in this study. The authors used a pre-trained VGG-16 model for the classification task. Transfer learning with fine-tuning was used to effectively train the network on the relatively small chest x-ray data set. Initial experiments show that the model achieves promising results and can greatly expedite COVID-19 detection.
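
A minimal sketch of this transfer-learning setup in Keras follows: load a pre-trained VGG-16 without its top layers, freeze the convolutional base and train a small classification head. The head size and hyperparameters are assumptions, not the authors' reported configuration.

```python
from tensorflow import keras
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained convolutional layers

model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),  # COVID-19 / pneumonia / healthy
])
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_data=..., epochs=...) on the x-rays
```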

Findings

The authors achieved accuracies of 96% and 92.5% in the two- and three-output-class cases, respectively. Based on these findings, the medical community can assess the use of x-ray images as a possible diagnostic tool for faster COVID-19 detection, complementing existing testing and diagnosis methods.

Originality/value

The proposed method can be used for initial screening, helping health-care professionals to better treat COVID-19 patients by detecting and screening for the presence of the disease in a timely manner.

Details

International Journal of Pervasive Computing and Communications, vol. 16 no. 5
Type: Research Article
ISSN: 1742-7371


Open Access
Article
Publication date: 31 July 2023

Daniel Šandor and Marina Bagić Babac

Sarcasm is a linguistic expression that usually carries the opposite meaning of what is being said by words, thus making it difficult for machines to discover the actual meaning…


Abstract

Purpose

Sarcasm is a linguistic expression that usually carries the opposite meaning of what is being said by words, thus making it difficult for machines to discover the actual meaning. It is mainly distinguished by the inflection with which it is spoken, with an undercurrent of irony, and is largely dependent on context, which makes it a difficult task for computational analysis. Moreover, sarcasm expresses negative sentiments using positive words, allowing it to easily confuse sentiment analysis models. This paper aims to demonstrate the task of sarcasm detection using machine and deep learning approaches.

Design/methodology/approach

For the purpose of sarcasm detection, machine and deep learning models were used on a data set consisting of 1.3 million social media comments, including both sarcastic and non-sarcastic comments. The data set was pre-processed using natural language processing methods, and additional features were extracted and analysed. Several machine learning models, including logistic regression, ridge regression, a linear support vector classifier and support vector machines, along with two deep learning models based on bidirectional long short-term memory and one bidirectional encoder representations from transformers (BERT)-based model, were implemented, evaluated and compared.
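
As an illustration of one of the simpler baselines listed, the sketch below trains a TF-IDF plus logistic regression sarcasm classifier in scikit-learn on toy data; the actual study used 1.3 million comments and tuned models, so this is only a minimal stand-in.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

comments = ["Oh great, another Monday. Just what I needed.",
            "The concert last night was fantastic."]
labels = [1, 0]  # 1 = sarcastic, 0 = not sarcastic (toy data)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(comments, labels)
print(clf.predict(["Wow, what a surprise."]))
```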

Findings

The performance of machine and deep learning models was compared on the task of sarcasm detection, and possible ways of improvement were discussed. Deep learning models showed more promise, performance-wise, for this type of task. Specifically, a state-of-the-art model in natural language processing, namely the BERT-based model, outperformed the other machine and deep learning models.

Originality/value

This study compared the performance of various machine and deep learning models on the task of sarcasm detection using a data set of 1.3 million comments from social media.

Details

Information Discovery and Delivery, vol. 52 no. 2
Type: Research Article
ISSN: 2398-6247

