Search results
Renze Zhou, Zhiguo Xing, Haidou Wang, Zhongyu Piao, Yanfei Huang, Weiling Guo and Runbo Ma
Abstract
Purpose
With the development of deep learning-based analytical techniques, fatigue data analysis methods based on deep learning have attracted increasing research attention. However, the application of deep neural networks in the materials science domain is mainly inhibited by limited data availability. In this paper, to overcome the difficulty of multifactor fatigue life prediction with small data sets, a multiple neural network ensemble (MNNE) approach is proposed.
Design/methodology/approach
A multiple neural network ensemble (MNNE) with a general and flexible explicit function is developed to accurately quantify the complicated relationships hidden in multivariable data sets. Moreover, a variational autoencoder-based data generator is trained on small sample sets to expand the size of the training data set. In addition, a filtering rule based on the R² score is proposed and applied during the training of the MNNE; this approach has a beneficial effect on prediction accuracy and generalization ability.
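The member-filtering rule can be sketched independently of the networks themselves. In the toy sketch below, polynomial fits stand in for the individually trained ensemble members, and only members whose validation R² clears a threshold contribute to the ensemble mean; the data, degrees and threshold are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination, R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, x.size)
x_tr, y_tr = x[::2], y[::2]          # small training split
x_val, y_val = x[1::2], y[1::2]      # held-out validation split

# Candidate members: polynomial fits of increasing degree
# (stand-ins for the individually trained networks of the ensemble).
members = [np.polynomial.Polynomial.fit(x_tr, y_tr, deg) for deg in (1, 3, 5, 7)]

# Filtering rule: keep only members whose validation R^2 clears a threshold.
threshold = 0.9
kept = [m for m in members if r2_score(y_val, m(x_val)) >= threshold]

# Ensemble prediction = mean of the surviving members.
y_ens = np.mean([m(x_val) for m in kept], axis=0)
print(len(kept), round(r2_score(y_val, y_ens), 3))
```

Members that underfit the validation data (here, the linear fit) are dropped before averaging, which is what protects the ensemble's generalization in the small-sample setting.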
Findings
A comparative study involving the proposed method and traditional models was performed. The comparative experiment confirms that the use of hybrid data can improve the accuracy and generalization ability of the deep neural network, and that the MNNE outperforms support vector machine, multilayer perceptron and deep neural network models in goodness of fit and robustness in the small sample case.
Practical implications
The experimental results imply that the proposed algorithm is a sophisticated and promising multivariate method for predicting the contact fatigue life of a coating when data availability is limited.
Originality/value
A data-generation model based on a variational autoencoder was used to compensate for the lack of data. An MNNE method was proposed for the small-data case of fatigue life prediction.
Yingjie Yang, Sifeng Liu and Naiming Xie
Abstract
Purpose
The purpose of this paper is to propose a framework for data analytics where everything is grey in nature and the associated uncertainty is considered as an essential part in data collection, profiling, imputation, analysis and decision making.
Design/methodology/approach
A comparative study is conducted between the available uncertainty models and the feasibility of grey systems is highlighted. Furthermore, a general framework for the integration of grey systems and grey sets into data analytics is proposed.
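The basic object of grey systems, the interval grey number, can be sketched in a few lines: a value known only to lie within an interval, with arithmetic that propagates the interval and a whitenisation step that collapses it to a point estimate when a crisp value is needed. The class below is an illustrative sketch, not an implementation from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GreyNumber:
    """Interval grey number [lo, hi]: the true value lies somewhere inside."""
    lo: float
    hi: float

    def __add__(self, other):
        # Uncertainty propagates: the sum's interval covers all possible sums.
        return GreyNumber(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return GreyNumber(min(products), max(products))

    def whitened(self, weight=0.5):
        """Equal-weight whitenisation: collapse the interval to a point estimate."""
        return weight * self.lo + (1 - weight) * self.hi

# A missing reading imputed as a grey interval rather than a single guessed value.
a = GreyNumber(2.0, 3.0)
b = GreyNumber(4.0, 5.0)
print((a + b).lo, (a + b).hi)   # 6.0 8.0
print(a.whitened())             # 2.5
```

Keeping the interval through the whole analysis, rather than imputing a single value up front, is what allows the final decision step to see how much uncertainty it actually carries.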
Findings
Grey systems and grey sets are useful not only for small data but also for big data. They are complementary to other models and can play a significant role in data analytics.
Research limitations/implications
The proposed framework brings a radical change to data analytics and may fundamentally change the way we deal with uncertainties.
Practical implications
The proposed model has the potential to avoid mistakes arising from misleading data imputation.
Social implications
The proposed model adopts the philosophy of grey systems in recognising the limitations of our knowledge, which has significant implications for how we deal with social life and relations.
Originality/value
This is the first time that data analytics as a whole has been considered from the point of view of grey systems.
R. Dale Wilson and Harriette Bettis-Outland
Abstract
Purpose
Artificial neural network (ANN) models, part of the discipline of machine learning and artificial intelligence, are becoming more popular in the marketing literature and in marketing practice. This paper aims to provide a series of tests between ANN models and competing predictive models.
Design/methodology/approach
A total of 46 pairs of models were evaluated in an objective model-building environment. Either logistic regression or multiple regression models were developed and then compared to ANN models using the same set of input variables. Three sets of B2B data were used to test the models. Emphasis was also placed on evaluating small samples.
Findings
ANN models tend to generate predictions that are as accurate as or more accurate than those of logistic regression models. However, when ANN models are compared to multiple regression models, the results are mixed. For small sample sizes, the modeling results are the same as for larger samples.
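The head-to-head setup, two models built from the same input variables and scored on held-out data, can be sketched with NumPy. The data, hyperparameters and network size below are illustrative assumptions, not the study's; on a deliberately nonlinear toy boundary, the single-hidden-layer network can separate what a logistic regression cannot.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a small data set: two classes, same inputs for both models.
n = 120
X = rng.normal(size=(n, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)      # nonlinear (XOR-like) boundary
X_tr, y_tr, X_te, y_te = X[:80], y[:80], X[80:], y[80:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.5, steps=2000):
    """Logistic regression by full-batch gradient descent on log-loss."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return lambda Xn: sigmoid(np.hstack([Xn, np.ones((len(Xn), 1))]) @ w) > 0.5

def fit_mlp(X, y, hidden=8, lr=0.5, steps=4000):
    """One-hidden-layer network (tanh units), trained the same way."""
    W1 = rng.normal(0, 1.0, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1.0, hidden); b2 = 0.0
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)
        p = sigmoid(H @ W2 + b2)
        g = (p - y) / len(y)                    # dLoss/dz2 for log-loss
        gH = np.outer(g, W2) * (1 - H ** 2)     # backprop through tanh
        W2 -= lr * H.T @ g; b2 -= lr * g.sum()
        W1 -= lr * X.T @ gH; b1 -= lr * gH.sum(axis=0)
    return lambda Xn: sigmoid(np.tanh(Xn @ W1 + b1) @ W2 + b2) > 0.5

acc = lambda model, X, y: float(np.mean(model(X) == y))
lr_acc = acc(fit_logreg(X_tr, y_tr), X_te, y_te)
mlp_acc = acc(fit_mlp(X_tr, y_tr), X_te, y_te)
print("logreg:", lr_acc, " mlp:", mlp_acc)
```

On a linearly separable problem the two models would score similarly, which mirrors the mixed results the study reports across its data sets.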
Research limitations/implications
Like all marketing research, this application is limited by the methods and the data used to conduct the research. The findings strongly suggest that, because of their predictive accuracy, ANN models will have an important role in the future of B2B marketing research and model-building applications.
Practical implications
ANN models should be carefully considered for potential use in marketing research and model-building applications by B2B academics and practitioners alike.
Originality/value
The research contributes to the B2B marketing literature by providing a more rigorous test on ANN models using B2B data than has been conducted before.
Kun-Huang Huarng and Tiffany Hui-Kuang Yu
Abstract
Purpose
The use of linear regression analysis is common in the social sciences. The purpose of this paper is to show the advantage of a qualitative research method, namely, structured qualitative analysis (SQA), over the linear regression method by using different characteristics of data.
Design/methodology/approach
Data were gathered from a study of online consumer behavior in Taiwan. The authors modified the content of the data to create different data sets. These data sets were used to demonstrate how SQA and linear regression each work, and to contrast the empirical analyses and results of the two methods.
Findings
The linear regression method uses a single equation to model data of different characteristics. When a data set contains a large and a small subgroup with different characteristics, linear regression tends to fit the characteristics of the large subgroup and subsume those of the small one. When a data set contains similar-sized subgroups with different characteristics, linear regression tends to average them. The major concern is that a single equation may not reflect data of various characteristics (different values of the independent variables) that lead to the same outcome (the same value of the dependent variable). In contrast, SQA can identify the various variable combinations (multiple relationships) leading to the same outcome. SQA provided multiple relationships to represent subgroups of different sizes and characteristics, and so produced consistent empirical results.
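The averaging behaviour described above is easy to reproduce: when two similar-sized subgroups follow opposite relationships, a single regression line reports a slope near zero, while fitting each subgroup separately recovers both relationships, which is the kind of multiple relationship SQA is designed to surface. The data below are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 100)
group = np.repeat([0, 1], 50)          # two behaviour patterns, similar sizes
# Group 0: y rises with x; group 1: y falls with x, over the same outcome range.
y = np.where(group == 0, 2 * x, 2 * (1 - x)) + rng.normal(0, 0.05, 100)

slope_all = np.polyfit(x, y, 1)[0]                    # single-equation view
slope_g0 = np.polyfit(x[group == 0], y[group == 0], 1)[0]
slope_g1 = np.polyfit(x[group == 1], y[group == 1], 1)[0]
print(round(slope_all, 2), round(slope_g0, 2), round(slope_g1, 2))
```

The pooled slope lands near zero, masking the two true slopes of roughly +2 and -2, so the single equation describes neither subgroup.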
Research limitations/implications
The two research methods work differently. The popular linear regression method tends to use one equation to model data of different sizes and characteristics; a single equation may not be able to cover the different behaviors that lead to the same outcome. Instead, SQA provides multiple relationships for subgroups of different sizes and characteristics, so the analyses are more consistent and the results more appropriate. Academics may wish to re-examine the existing literature that uses linear regression; it would be interesting to see whether SQA yields new findings for similar problems. Practitioners gain a new method for modeling real-world problems and for understanding the different combinations of variables that can lead to the same outcome. Even a relationship obtained from a small data set may be very valuable to practitioners.
Originality/value
This paper compared two research methods by analyzing different data sets of online consumer behavior. Real data sets were manipulated to create data of different sizes and characteristics, and the resulting variations in empirical results facilitate the comparison of the two methods. Hence, this paper can serve as a complement to the existing literature, focusing on the justification of research methods and on the limitations of linear regression.
Erion Çano and Maurizio Morisio
Abstract
Purpose
The impressive results of convolutional neural networks in image-related tasks have attracted the attention of researchers in text mining, sentiment analysis and other text analysis fields. It is, however, difficult to find enough data to feed such networks, optimize their parameters and make the right design choices when constructing network architectures. The purpose of this paper is to present the creation steps of two big data sets of song emotions. The authors also explore the use of convolution and max-pooling neural layers on song lyrics, product review and movie review text data sets. Three variants of a simple and flexible neural network architecture are also compared.
Design/methodology/approach
The intention was to spot any important patterns that can serve as guidelines for parameter optimization of similar models. The authors also wanted to identify architecture design choices which lead to high performing sentiment analysis models. To this end, the authors conducted a series of experiments with neural architectures of various configurations.
Findings
The results indicate that parallel convolutions of filter lengths up to 3 are usually enough for capturing relevant text features. Also, max-pooling region size should be adapted to the length of text documents for producing the best feature maps.
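The geometry of these findings, parallel convolutions of widths 1 to 3 followed by max-pooling whose region size scales with document length, can be sketched with untrained random filters; the sketch below shows only the shape of the resulting feature vector, not a trained model, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def conv1d_valid(E, W):
    """Valid 1-D convolution of token embeddings E (T, d) with one filter W (w, d)."""
    w = W.shape[0]
    return np.array([np.sum(E[t:t + w] * W) for t in range(E.shape[0] - w + 1)])

def conv_maxpool_features(E, filter_widths=(1, 2, 3), pool_region=None):
    """Parallel convolutions followed by regional max-pooling, one filter per width."""
    feats = []
    for w in filter_widths:
        W = rng.normal(0, 0.1, (w, E.shape[1]))          # random (untrained) filter
        fmap = conv1d_valid(E, W)
        # Pool-region size adapted to the length of the feature map.
        region = pool_region or max(1, len(fmap) // 6)
        pooled = [fmap[i:i + region].max() for i in range(0, len(fmap), region)]
        feats.extend(pooled)
    return np.array(feats)

E = rng.normal(size=(60, 8))     # a 60-token document with 8-dim embeddings
f = conv_maxpool_features(E)
print(f.shape)                   # (20,)
```

Tying the pooling region to document length keeps the feature vector a comparable size across short and long texts, which is the adaptation the findings recommend.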
Originality/value
The authors' best results were obtained with feature maps of lengths 6–18. A possible improvement for future neural network models for sentiment analysis is to generate the sentiment polarity prediction of a document by aggregating predictions on smaller excerpts of the entire text.
Gregory E. Smith and Cliff T. Ragsdale
Several prominent data-mining studies have evaluated the performance of neural networks (NNs) against traditional statistical methods on the two-group classification problem in…
Abstract
Several prominent data-mining studies have evaluated the performance of neural networks (NNs) against traditional statistical methods on the two-group classification problem in discriminant analysis. Although NNs often outperform traditional statistical methods, their performance can be hindered by failings in the use of training data. This problem is particularly acute when using NNs on smaller data sets. A heuristic is presented that utilizes Mahalanobis distance measures (MDM) to deterministically partition training data so that the resulting NN models are less prone to overfitting. We show that this heuristic produces classification results that are, on average, more accurate than those of traditional NNs and MDM.
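One plausible reading of the heuristic (a sketch under assumptions, not the authors' exact procedure) is a deterministic split by Mahalanobis distance: the most typical points form the fitting set and the most atypical ones the validation set, instead of a random partition.

```python
import numpy as np

def mahalanobis_partition(X, y, holdout_frac=0.25):
    """Deterministic split: points most typical of the sample (smallest
    Mahalanobis distance to the mean) form the fitting set; the most
    atypical points form the validation set."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diffs = X - mu
    d2 = np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs)  # squared distances
    order = np.argsort(d2)
    cut = int(len(X) * (1.0 - holdout_frac))
    fit_idx, val_idx = order[:cut], order[cut:]
    return (X[fit_idx], y[fit_idx]), (X[val_idx], y[val_idx])

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
y = rng.integers(0, 2, size=40)
(X_fit, y_fit), (X_val, y_val) = mahalanobis_partition(X, y)
print(X_fit.shape, X_val.shape)   # (30, 3) (10, 3)
```

Because the split is a deterministic function of the data rather than a random draw, repeated runs produce the same partition, which helps small-sample comparisons stay reproducible.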
Rokas Jurevičius and Virginijus Marcinkevičius
Abstract
Purpose
The purpose of this paper is to present a new data set of aerial imagery from a robotics simulator (AIR). The AIR data set aims to provide a starting point for localization system development and to become a standard benchmark for comparing the accuracy of map-based localization algorithms, visual odometry and SLAM for high-altitude flights.
Design/methodology/approach
The presented data set contains over 100,000 aerial images captured from the Gazebo robotics simulator using orthophoto maps as the ground plane. Flights with three different trajectories are performed over maps of urban and forest environments at different altitudes, totaling over 33 kilometers of flight distance.
Findings
A review of previous research shows that the presented data set is the largest currently available public data set with downward-facing camera imagery.
Originality/value
This paper addresses the lack of publicly available data sets for high-altitude (100‒3,000 m) UAV flights; current state-of-the-art research on map-based localization systems for UAVs depends on real-life test flights and custom simulated data sets for accuracy evaluation of the algorithms. The presented data set fills this gap and aims to help researchers improve and benchmark new algorithms for high-altitude flights.
Abstract
Purpose
Despite the potential of Big Data analytics, the analysis of Micro Data represents the main way of forecasting the expected values of recorded amounts and/or ratios for small auditing firms and certified public accountants dealing with analytical procedures. This study aims to examine how effective Micro Data analytics are by testing the forecast accuracy of two items: the ratio of the allowance for doubtful accounts to trade accounts receivable, and the natural logarithm of the net sales of goods and services, the first exposed to greater uncertainty than the second.
Design/methodology/approach
Micro Data are low in volume, variety, velocity and variability, but high in veracity. Given the over-fitting problems affecting Micro Data analytics, both in-sample and out-of-sample forecasts were made for both tests. Multiple regression and neural network models were fitted using a sample of 35 Italian industrial listed companies.
Findings
The accuracy of the forecasting models was assessed in terms of mean absolute percentage error and other accuracy measures. The neural network model provided more accurate forecasts than multiple regression in both tests, showing higher accuracy for the amounts exposed to less uncertainty. Moreover, no generalized conclusions could be drawn about the predictors included in the models.
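Mean absolute percentage error, the headline accuracy measure here, is straightforward to compute; the figures below are illustrative, not the study's data.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical actuals vs model forecasts for three periods.
actual = [100.0, 250.0, 80.0]
forecast = [110.0, 240.0, 84.0]
print(round(mape(actual, forecast), 2))   # 6.33
```

Because MAPE divides each error by the actual value, it is scale-free across companies of different sizes, which is why it suits a comparison of amounts and log-transformed sales; it is undefined when an actual value is zero.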
Practical implications
The examination of forecast accuracy helps auditors to evaluate whether analytical procedures can be successfully applied to detect misstatements when Micro Data are used and which model gives the most accurate forecasts.
Originality/value
This is the first study to measure the forecast accuracy of the multiple regression and neural network models performed using a Micro Data set. Forecast accuracy is crucial for evaluating the effectiveness of analytical procedures.
Luca Rampini and Fulvio Re Cecconi
Abstract
Purpose
This study aims to introduce a new methodology for generating synthetic images for facility management purposes. The method starts by leveraging existing 3D open-source BIM models and using them inside a graphics engine to produce a photorealistic representation of indoor spaces enriched with facility-related objects. The virtual environment creates several images by changing lighting conditions, camera poses or materials. Moreover, the created images are labeled and ready for model training.
Design/methodology/approach
This paper focuses on the challenges characterizing object detection models to enrich digital twins with facility management-related information. The automatic detection of small objects, such as sockets, power plugs, etc., requires big, labeled data sets that are costly and time-consuming to create. This study proposes a solution based on existing 3D BIM models to produce quick and automatically labeled synthetic images.
Findings
The paper presents a conceptual model for creating synthetic images to increase performance in training object detection models for facility management. The results show that virtually generated images are a powerful tool for augmenting existing data sets rather than an alternative to real images. In other words, while a base of real images is still needed, introducing synthetic images improves the model's performance and robustness in covering different types of objects.
Originality/value
This study introduced the first pipeline for creating synthetic images for facility management. Moreover, this paper validates the pipeline through a case study in which the performance of object detection models trained on real data alone or on a combination of real and synthetic images is compared.
Runhai Jiao, Shaolong Liu, Wu Wen and Biying Lin
Abstract
Purpose
The large volume of big data makes traditional clustering algorithms, which are usually designed for an entire data set, impractical. The purpose of this paper is to focus on incremental clustering, which divides data into a series of data chunks so that only a small amount of data needs to be clustered at a time. Little research on incremental clustering algorithms addresses the problems of optimizing cluster center initialization for each data chunk and selecting multiple passing points for each cluster.
Design/methodology/approach
By optimizing the initial cluster centers, the quality of the clustering results is improved for each data chunk, and in turn the quality of the final clustering results is enhanced. Moreover, by selecting multiple passing points, more accurate information is passed down to improve the final clustering results. A method solving these two problems is proposed and applied in an algorithm based on the streaming kernel fuzzy c-means (stKFCM) algorithm.
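A minimal sketch of the chunk-wise scheme, with plain k-means standing in for the kernel fuzzy c-means of stKFCM: each chunk is clustered starting from the previous chunk's converged centers, and the few points closest to each center are carried forward as passing points into the next chunk. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def farthest_first_init(data, k):
    """Deterministic seeding for the first chunk: start from the first point,
    then repeatedly add the point farthest from the chosen centers."""
    centers = [data[0]]
    for _ in range(k - 1):
        d = np.min([((data - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(data[np.argmax(d)])
    return np.array(centers, dtype=float)

def kmeans(X, centers, iters=20):
    """Plain k-means refinement from the given initial centers."""
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def incremental_cluster(chunks, k=2, n_pass=3):
    centers, carried = None, np.empty((0, chunks[0].shape[1]))
    for X in chunks:
        data = np.vstack([carried, X])            # passing points join the chunk
        if centers is None:
            centers = farthest_first_init(data, k)
        centers, labels = kmeans(data, centers)   # previous centers initialize
        reps = []                                 # multiple passing points per
        for j in range(k):                        # cluster: closest to center
            members = data[labels == j]
            d = ((members - centers[j]) ** 2).sum(-1)
            reps.append(members[np.argsort(d)[:n_pass]])
        carried = np.vstack(reps)
    return centers

rng = np.random.default_rng(6)
stream = [np.vstack([rng.normal(-3, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])
          for _ in range(4)]
centers = incremental_cluster(stream)
print(np.round(np.sort(centers[:, 0]), 1))
```

Carrying several representatives per cluster, rather than only the center, is what passes richer information about each cluster's shape down the stream.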
Findings
Experimental results show that the proposed algorithm is more accurate and performs better than the streaming kernel fuzzy c-means (stKFCM) algorithm.
Originality/value
This paper addresses the problem of improving the performance of incremental clustering by optimizing cluster center initialization and selecting multiple passing points. The paper analyzed the performance of the proposed scheme and demonstrated its effectiveness.