Search results

1 – 10 of 73
Book part
Publication date: 31 December 2010

Dominique Guégan and Patrick Rakotomarolahy

Abstract

Purpose – The purpose of this chapter is twofold: to forecast gross domestic product (GDP) using a nonparametric method, the multivariate k-nearest neighbors method, and to provide asymptotic properties for this method.

Methodology/approach – We consider monthly and quarterly macroeconomic variables, and to match the quarterly GDP, we estimate the missing monthly economic variables using the multivariate k-nearest neighbors method and parametric vector autoregressive (VAR) modeling. Linking these monthly macroeconomic variables through bridge equations, we then produce nowcasts and forecasts of GDP.

Findings – Using the multivariate k-nearest neighbors method, we provide a forecast of the euro area monthly economic indicator and quarterly GDP that is better than the one obtained with a competing linear VAR model. We also establish the asymptotic normality of this k-nearest neighbors regression estimator for dependent time series, which yields confidence intervals for point forecasts in time series.

Originality/value of chapter – We provide a new theoretical result for a nonparametric method and propose a novel methodology for forecasting using macroeconomic data.
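
To make the methodology concrete, here is a minimal Python sketch of k-nearest neighbors regression for one-step-ahead time-series forecasting. This is not the authors' code; the embedding dimension, distance metric, and synthetic series are illustrative assumptions.

```python
import numpy as np

def knn_forecast(series, k=5, embed_dim=3):
    """One-step-ahead forecast: average the successors of the k
    historical windows closest (Euclidean) to the latest window."""
    x = np.asarray(series, dtype=float)
    # Library of past embedding vectors and the value that followed each.
    vectors = np.array([x[t:t + embed_dim] for t in range(len(x) - embed_dim)])
    successors = x[embed_dim:]
    query = x[-embed_dim:]                 # most recent observed window
    dists = np.linalg.norm(vectors - query, axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the k closest windows
    return successors[nearest].mean()

# Illustrative use on a synthetic monthly indicator.
rng = np.random.default_rng(0)
indicator = np.cumsum(rng.normal(0.1, 1.0, 200))
print(knn_forecast(indicator, k=5, embed_dim=3))
```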

Details

Nonlinear Modeling of Economic and Financial Time-Series
Type: Book
ISBN: 978-0-85724-489-5

Book part
Publication date: 30 September 2020

Hera Khan, Ayush Srivastav and Amit Kumar Mishra

Abstract

A detailed description will be provided of all the classification algorithms that have been widely used in the domain of medical science. The foundation will be laid with a comprehensive overview of the background and history of classification algorithms, followed by an extensive discussion of the various classification techniques in machine learning (ML), concluding with their applications to data analysis in medical science and health care. The initial part of the chapter covers the fundamentals required for a sound understanding of classification techniques in ML, comprising the underlying differences between unsupervised and supervised learning, the basic terminology of classification, and its history. The chapter then covers the types of classification algorithms, ranging from linear classifiers such as Logistic Regression and Naïve Bayes to Nearest Neighbour, Support Vector Machine, tree-based classifiers, and Neural Networks, along with their respective mathematics. Ensemble algorithms such as Majority Voting, Boosting, Bagging, and Stacking will also be discussed at length, together with their relevant applications. Furthermore, the chapter elucidates the areas of application of these classification algorithms in biomedicine and health care and their contribution to decision-making systems and predictive analysis. In conclusion, the chapter contributes to research and development by providing a thorough insight into classification algorithms and their applications in the health care sector.
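
As a minimal illustration of the kind of supervised classification the chapter surveys, the scikit-learn sketch below trains a k-nearest neighbour classifier on a biomedical benchmark. The dataset and model settings are illustrative assumptions, not drawn from the chapter.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Biomedical-style task: classify tumors as malignant or benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature scaling matters for distance-based classifiers such as k-NN.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```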

Details

Big Data Analytics and Intelligence: A Perspective for Health Care
Type: Book
ISBN: 978-1-83909-099-8

Book part
Publication date: 15 May 2023

Birol Yıldız and Şafak Ağdeniz

Abstract

Purpose: The main aim of the study is to provide a tool for using non-financial information in decision-making. We analysed the non-financial data in annual reports to show how this information can be used in financial decision processes.

Need for the Study: Main financial reports such as balance sheets and income statements can be analysed by statistical methods. However, an expanded financial reporting framework requires new analysis methods because the data are unstructured and large. The study offers a solution to the analysis problem that comes with non-financial reporting, which is an essential communication tool in corporate reporting.

Methodology: Text mining analysis of annual reports is conducted using the R software. To simplify the problem, we try to predict companies' corporate governance qualifications using text mining. The K-Nearest Neighbor, Naive Bayes, and Decision Tree machine learning algorithms were used.

Findings: Our analysis shows that K-Nearest Neighbor achieved the highest rate of correct classifications at 85%, compared to 50% for the random walk. The empirical evidence suggests that text mining can be used by all stakeholders as a financial analysis method.

Practical Implications: Combining financial statement analyses with non-financial reporting analyses will decrease the information asymmetry between the company and its stakeholders, so stakeholders can make more accurate decisions. Analysis of non-financial data with text mining will provide a decisive competitive advantage, especially for investors seeking to make the right decisions, and will lead to allocating scarce resources more effectively. Another contribution of the study is that stakeholders can predict a company's corporate governance qualification from its annual reports even if it is not included in the Corporate Governance Index (CGI).
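
The chapter's pipeline is implemented in R; the sketch below is a rough Python analogue of the text-classification step, pairing TF-IDF features with a k-nearest neighbor classifier. The toy report snippets and governance labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy annual-report snippets labeled by corporate governance qualification.
reports = [
    "board independence audit committee transparency disclosure",
    "related party transactions weak oversight limited reporting",
    "shareholder rights independent directors risk committee",
    "concentrated ownership limited disclosure board interlocks",
]
labels = ["qualified", "unqualified", "qualified", "unqualified"]

model = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))
model.fit(reports, labels)
print(model.predict(["independent board strong disclosure"]))
```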

Details

Contemporary Studies of Risks in Emerging Technology, Part B
Type: Book
ISBN: 978-1-80455-567-5

Book part
Publication date: 6 September 2019

Son Nguyen, Gao Niu, John Quinn, Alan Olinsky, Jonathan Ormsbee, Richard M. Smith and James Bishop

Abstract

In recent years, the problem of classification with imbalanced data has grown in prominence in the data-mining and machine-learning communities due to the emergence of an abundance of imbalanced data in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under the influence of four resampling techniques. These classification methods are the random forest, the support vector machine, logistic regression, k-nearest neighbor (KNN), the decision tree, and AdaBoost. Our study shows that all of the classification methods have difficulty with the imbalanced data, with KNN performing the worst, detecting only 27.4% of the minority class. However, with the help of resampling techniques, all of the classification methods improve their overall performance. In particular, the random forest, in combination with the random over-sampling technique, performs best, achieving 82.8% balanced accuracy (the average of the true-positive rate and true-negative rate).

We then propose a new procedure to resample the data. Our method is based on the idea of eliminating “easy” majority observations before under-sampling them. It further improves the balanced accuracy of the random forest to 83.7%, making it the best approach for the imbalanced data.
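
A minimal sketch of one combination studied, random over-sampling followed by a random forest, scored by balanced accuracy. It relies on the imbalanced-learn package, and the synthetic data is an illustrative stand-in for the chapter's dataset.

```python
from collections import Counter
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 5% minority class.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Resample the training split only, never the test set.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)
print("Class counts after resampling:", Counter(y_res))

clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
# Balanced accuracy: mean of the true-positive and true-negative rates.
print("Balanced accuracy:", balanced_accuracy_score(y_te, clf.predict(X_te)))
```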

Details

Advances in Business and Management Forecasting
Type: Book
ISBN: 978-1-78754-290-7

Book part
Publication date: 29 September 2023

Torben Juul Andersen

Abstract

This chapter outlines how the comprehensive North American and European datasets were collected and explains the ensuing data-cleaning process, outlining three alternative methods applied to deal with missing values: the complete case, multiple imputation (MI), and K-nearest neighbor (KNN) methods. The complete case method is the conventional approach adopted in many mainstream management studies. We further discuss the implied assumption underlying the use of this technique, which is rarely assessed or tested in practice, and explain the alternative imputation approaches in detail. Use of North American data is the norm, but we also collected a European dataset, which is rarely done, to enable subsequent comparative analysis between these geographical regions. We introduce the structure of firms organized within different industry classification schemes for use in the ensuing comparative analyses and provide base information on missing values in the original and cleaned datasets. The performance indicators calculated from the sampled data are defined and presented. We show that the three alternative approaches to dealing with missing values have significantly different effects on the calculated performance measures in terms of extreme estimate ranges and mean performance values.
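
A hedged sketch of the KNN imputation approach using scikit-learn's KNNImputer, with a toy matrix standing in for the firm-level financials; the complete case contrast is shown alongside.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy firm-level matrix (rows: firms, columns: financial variables),
# standing in for the cleaned North American and European datasets.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, np.nan, 6.0],
    [7.0, 8.0, 9.0],
    [1.5, 2.5, 3.5],
])

# The complete case method simply drops every row with a missing value.
complete_case = X[~np.isnan(X).any(axis=1)]

# KNN imputation instead fills each gap from the nearest similar firms.
imputed = KNNImputer(n_neighbors=2).fit_transform(X)
print(complete_case.shape, imputed.shape)  # (2, 3) versus (4, 3)
```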

Details

A Study of Risky Business Outcomes: Adapting to Strategic Disruption
Type: Book
ISBN: 978-1-83797-074-2

Book part
Publication date: 4 July 2019

Utku Kose

Abstract

Artificial Intelligence-based systems are used effectively in many fields because they easily outperform traditional solutions or solve problems that previously had no solution. Prediction applications are a widely used mechanism in research because they allow for forecasting of future states. Logical inference mechanisms in the field of Artificial Intelligence allow for faster, more accurate, and more powerful computation. Machine Learning, a sub-field of Artificial Intelligence, has been used as a tool for creating effective solutions to prediction problems.

In this chapter the authors focus on employing Machine Learning techniques for predicting future states of economic data, using techniques that include Artificial Neural Networks, Adaptive Neuro-Fuzzy Inference Systems, Dynamic Boltzmann Machines, Support Vector Machines, Hidden Markov Models, Bayesian Learning on Gaussian process models, Autoregressive Integrated Moving Average, Autoregressive Models (Poggi, Muselli, Notton, Cristofari, & Louche, 2003), and the K-Nearest Neighbor Algorithm. Findings revealed positive results in terms of predicting economic data.
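
Of the listed techniques, the autoregressive integrated moving average model is straightforward to sketch; below is a minimal statsmodels example on a synthetic series. The data-generating process and model order are illustrative assumptions, not the chapter's experiments.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic quarterly economic indicator: AR(1) with drift.
rng = np.random.default_rng(1)
y = np.empty(120)
y[0] = 100.0
for t in range(1, 120):
    y[t] = 0.5 + 0.95 * y[t - 1] + rng.normal(scale=1.0)

# Fit an ARIMA(1, 0, 0) with a constant and forecast four steps ahead.
result = ARIMA(y, order=(1, 0, 0), trend="c").fit()
print(result.forecast(steps=4))
```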

Book part
Publication date: 30 November 2018

Maria De Marsico, Filippo Sciarrone, Andrea Sterbini and Marco Temperini

Abstract

In recent years, the design and implementation of web-based education systems have grown exponentially, spurred by the fact that neither students nor teachers are bound to a specific location and that this form of computer-based education is virtually independent of any specific hardware platform. These systems accumulate a large amount of data: educational data mining and learning analytics are two closely related fields of research that aim to use these educational data to improve the learning process. In this chapter, the authors investigate the peer assessment setting in communities of learners. Peer assessment is an effective didactic strategy, useful for evaluating groups of students in educational environments such as high schools and universities, where students are required to answer open-ended questions to increase their problem-solving skills. Furthermore, such an approach can become necessary in learning contexts where the number of students to evaluate is very large, as, for example, in massive open online courses. Here the authors focus on automated support for grading open answers via a peer evaluation-based approach, which is mediated by the (partial) grading work of the teacher and produces a (partial as well) automated grading. The authors propose to support such automated grading by means of two methods from the data-mining field, Bayesian Networks and K-Nearest Neighbours (K-NN), presenting some experimental results that support these choices.
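
A rough sketch of the K-NN side of such automated grading: predict the teacher's grade for an ungraded open answer from the grades of the k most similar answers, where similarity is computed here over the vector of peer-assigned marks. The feature choice and numbers are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Each row: peer-assigned marks one answer received;
# targets: the teacher's grades for answers already graded (partial grading).
peer_marks = np.array([[8, 7, 9], [3, 4, 2], [9, 8, 8], [2, 3, 3], [7, 6, 8]])
teacher_grades = np.array([8.5, 3.0, 9.0, 2.5, 7.5])

knn = KNeighborsRegressor(n_neighbors=2).fit(peer_marks, teacher_grades)
# Automated grade for an ungraded answer, from its peer marks alone.
print(knn.predict(np.array([[8, 8, 7]])))
```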

Details

The Future of Innovation and Technology in Education: Policies and Practices for Teaching and Learning Excellence
Type: Book
ISBN: 978-1-78756-555-5

Book part
Publication date: 18 July 2022

Yakub Kayode Saheed, Usman Ahmad Baba and Mustafa Ayobami Raji

Abstract

Purpose: This chapter aims to examine machine learning (ML) models for predicting credit card fraud (CCF).

Need for the study: With the advance of technology, the world increasingly relies on credit cards rather than cash in daily life. This creates a slew of new opportunities for fraudulent individuals to abuse these cards. As of December 2020, global card losses reached $28.65 billion, up 2.9% from $27.85 billion in 2018, according to the Nilson 2019 report. To safeguard credit card users, the credit card issuer should include a service that protects customers from potential risks. CCF has become a severe threat as internet buying has grown, so various studies in the field of automatic and real-time fraud detection are required. Due to their advantageous properties, the most recent ones employ a variety of ML algorithms and techniques to construct a well-fitting model to detect fraudulent transactions. When it comes to recognising credit card risk in huge and high-dimensional data, feature selection (FS) is critical for improving classification accuracy and fraud detection.

Methodology/design/approach: The objective of this chapter is to construct a new model for credit card fraud detection (CCFD) based on principal component analysis (PCA) for FS, using supervised ML techniques such as K-nearest neighbour (KNN), ridge classifier, gradient boosting, quadratic discriminant analysis, AdaBoost, and random forest to classify fraudulent and legitimate transactions. When compared to earlier experiments, the suggested approach demonstrates a high capacity for detecting fraudulent transactions. More precisely, our model’s resilience is built by integrating the power of PCA for determining the most useful predictive features. The experimental analysis was performed on the German credit card and Taiwan credit card data sets.

Findings: The experimental findings revealed that KNN achieved an accuracy of 96.29%, recall of 100%, and precision of 96.29%, making it the best-performing model on the German data set, while the ridge classifier was the best-performing model on the Taiwan credit data with an accuracy of 81.75%, recall of 34.89%, and precision of 66.61%.

Practical implications: The poor performance of the models on the Taiwan data revealed that it is an imbalanced credit card data set. The comparison of our proposed models with state-of-the-art credit card ML models showed that our results were competitive.
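
A hedged sketch of the PCA-plus-classifier pipeline described above, with synthetic transactions standing in for the German and Taiwan data sets; the component count and model settings are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for transaction records; fraud is the minority class.
X, y = make_classification(n_samples=3000, n_features=20, weights=[0.9],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# PCA compresses the feature space before the KNN classifier is applied.
pipe = make_pipeline(StandardScaler(), PCA(n_components=10),
                     KNeighborsClassifier(n_neighbors=5))
pipe.fit(X_tr, y_tr)
print(classification_report(y_te, pipe.predict(X_te), digits=4))
```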

Book part
Publication date: 30 September 2020

Suryakanthi Tangirala

Abstract

With the advent of Big Data, the ability to store and use unprecedented amounts of clinical information is now feasible via Electronic Health Records (EHRs). The massive collection of clinical data by health care systems and treatment centers can be productively used to perform predictive analytics on treatment plans to improve patient health outcomes. These massive data sets have stimulated opportunities to adapt computational algorithms to track and identify target areas for quality improvement in health care.

According to a report from the Association of American Medical Colleges, there will be an alarming gap between the demand for and supply of the health care workforce in the near future. The projections show that by 2032 there will be a shortfall of between 46,900 and 121,900 physicians in the US (AAMC, 2019). Therefore, early prediction of health care risks is a pressing requirement for improving health care quality and reducing health care costs. Predictive analytics uses historical data and algorithms based on either statistics or machine learning to develop predictive models that capture important trends. These models can predict the likelihood of future events. Predictive models developed using supervised machine learning approaches are commonly applied to various health care problems such as disease diagnosis, treatment selection, and treatment personalization.

This chapter provides an overview of various machine learning and statistical techniques for developing predictive models. Case examples from the extant literature are provided to illustrate the role of predictive modeling in health care research. The adaptation of these predictive modeling techniques to Big Data analytics underscores the need for standardization and transparency while recognizing the opportunities and challenges ahead.

Details

Big Data Analytics and Intelligence: A Perspective for Health Care
Type: Book
ISBN: 978-1-83909-099-8

Book part
Publication date: 29 September 2023

Torben Juul Andersen

Abstract

This chapter first analyzes how the data-cleaning process affects the share of missing values in the extracted European and North American datasets. It then examines how three different approaches to treating missing values, Complete Case, Multiple Imputation by Chained Equations (MICE), and K-Nearest Neighbor (KNN) imputation, affect the number of firms and their average lifespan in the datasets compared to the original sample, assessed across different SIC industry divisions. This is extended to the implied effects on the distribution of a key performance indicator, return on assets (ROA), calculating skewness and kurtosis measures for each of the treatment methods and across industry contexts. This consistently shows highly negatively skewed distributions with high positive excess kurtosis across all the industries, where the KNN imputation treatment produces distribution characteristics closest to the original untreated data. We further analyze the persistency of the (extreme) left-skewed tails in terms of the share of outliers and extreme outliers, which shows consistent and rather high percentages of outliers, around 15% of the full sample, and extreme outliers, around 7.5%, indicating pervasive skewness in the data. Of the three alternative approaches to dealing with missing values, the KNN imputation treatment is found to generate final datasets that most closely resemble the original data, even though the Complete Case approach remains the norm in mainstream studies. One consequence of this is that most empirical studies are likely to underestimate the prevalence of extreme negative performance outcomes.
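
A minimal sketch of the distributional checks described: skewness, excess kurtosis, and outlier shares on an illustrative left-skewed ROA-like sample. The Tukey boxplot rule used for outliers here is an assumed operationalization, since the abstract does not spell out its cutoffs.

```python
import numpy as np
from scipy import stats

# Illustrative ROA-like sample: most firms modest, a tail of extreme losses.
rng = np.random.default_rng(2)
roa = np.concatenate([rng.normal(0.05, 0.05, 950), rng.normal(-0.8, 0.4, 50)])

print("Skewness:", stats.skew(roa))
print("Excess kurtosis:", stats.kurtosis(roa))  # Fisher definition: normal = 0

# Tukey rule: outliers beyond 1.5 * IQR, extreme outliers beyond 3 * IQR.
q1, q3 = np.percentile(roa, [25, 75])
iqr = q3 - q1
outliers = (roa < q1 - 1.5 * iqr) | (roa > q3 + 1.5 * iqr)
extreme = (roa < q1 - 3.0 * iqr) | (roa > q3 + 3.0 * iqr)
print("Outlier share:", outliers.mean(), "Extreme share:", extreme.mean())
```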

Details

A Study of Risky Business Outcomes: Adapting to Strategic Disruption
Type: Book
ISBN: 978-1-83797-074-2
