Search results

1 – 10 of over 13,000
Open Access
Article
Publication date: 17 December 2019

Yingjie Yang, Sifeng Liu and Naiming Xie

Abstract

Purpose

The purpose of this paper is to propose a framework for data analytics where everything is grey in nature and the associated uncertainty is considered as an essential part in data collection, profiling, imputation, analysis and decision making.

Design/methodology/approach

A comparative study is conducted between the available uncertainty models and the feasibility of grey systems is highlighted. Furthermore, a general framework for the integration of grey systems and grey sets into data analytics is proposed.
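
As background for readers, a grey number, the basic object of grey systems theory, is a value known only to lie within an interval. The short Python sketch below is an illustration added here, not part of the paper; it shows interval grey numbers with standard interval arithmetic and a simple whitenisation to a crisp value.

```python
class GreyNumber:
    """Interval grey number: the true value is known only to lie in [lower, upper]."""

    def __init__(self, lower, upper):
        if lower > upper:
            raise ValueError("lower bound must not exceed upper bound")
        self.lower, self.upper = lower, upper

    def __add__(self, other):
        # The sum of two grey numbers widens the interval: uncertainty accumulates.
        return GreyNumber(self.lower + other.lower, self.upper + other.upper)

    def __mul__(self, other):
        # The product takes the extreme combinations of the bounds.
        products = [self.lower * other.lower, self.lower * other.upper,
                    self.upper * other.lower, self.upper * other.upper]
        return GreyNumber(min(products), max(products))

    def whitened(self, weight=0.5):
        # Equal-weight whitenisation: a representative crisp value from the interval.
        return weight * self.lower + (1 - weight) * self.upper

    def __repr__(self):
        return f"[{self.lower}, {self.upper}]"


# Example: a sensor reading known only to within +/- 0.5 units.
reading = GreyNumber(9.5, 10.5)
offset = GreyNumber(1.0, 2.0)
print(reading + offset)               # [10.5, 12.5]
print((reading + offset).whitened())  # 11.5
```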

Findings

Grey systems and grey sets are useful not only for small data but also for big data. They are complementary to other models and can play a significant role in data analytics.

Research limitations/implications

The proposed framework represents a radical change in data analytics and may fundamentally change the way uncertainties are dealt with.

Practical implications

The proposed model has the potential to avoid mistakes arising from misleading data imputation.

Social implications

The proposed model adopts the philosophy of grey systems in recognising the limitations of our knowledge, which has significant implications for how we deal with our social life and relations.

Originality/value

This is the first time that data analytics as a whole has been considered from the point of view of grey systems.

Details

Marine Economics and Management, vol. 2 no. 2
Type: Research Article
ISSN: 2516-158X

Open Access
Article
Publication date: 21 February 2022

Héctor Rubén Morales, Marcela Porporato and Nicolas Epelbaum

Abstract

Purpose

The technical feasibility of using Benford's law to assist internal auditors in reviewing the integrity of high-volume data sets is analysed. This study explores whether Benford's distribution applies to the set of numbers represented by the quantity of records (size) that comprise the different tables that make up a state-owned enterprise's (SOE) enterprise resource planning (ERP) relational database. The use of Benford's law streamlines the search for possible abnormalities within the ERP system's data set, increasing the ability of the internal audit functions (IAFs) to detect anomalies within the database. In the SOEs of emerging economies, where groups compete for power and resources, internal auditors are better off employing analytical tests to discharge their duties without getting involved in power struggles.

Design/methodology/approach

Records of eight databases of an SOE in Argentina are used to analyse the number of records of each table over periods of three to 12 years. The case develops, step by step, the application of Benford's law to test the records of each ERP module using chi-squared (χ²) and mean absolute deviation (MAD) goodness-of-fit tests.
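
As an illustration added here (not reproduced from the paper), the sketch below runs the same kind of first-digit test in Python on a handful of invented table sizes: observed leading-digit proportions are compared with Benford's expected proportions log10(1 + 1/d) using a chi-squared goodness-of-fit test and the MAD statistic.

```python
import math
from collections import Counter

from scipy.stats import chisquare

# Hypothetical record counts (table sizes) from one ERP module.
table_sizes = [1203, 48, 7, 33410, 912, 2, 178, 26, 5541, 390, 61, 14, 880, 2301, 7204]

# Benford's expected proportion for leading digit d: log10(1 + 1/d).
expected_props = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Observed leading-digit counts.
first_digits = [int(str(n)[0]) for n in table_sizes]
observed = Counter(first_digits)
n = len(first_digits)

obs_counts = [observed.get(d, 0) for d in range(1, 10)]
exp_counts = [expected_props[d] * n for d in range(1, 10)]

# Chi-squared goodness-of-fit test against Benford's distribution.
chi2_stat, p_value = chisquare(f_obs=obs_counts, f_exp=exp_counts)

# Mean absolute deviation (MAD) between observed and expected proportions.
mad = sum(abs(observed.get(d, 0) / n - expected_props[d]) for d in range(1, 10)) / 9

print(f"chi2={chi2_stat:.2f}, p={p_value:.3f}, MAD={mad:.4f}")
```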

Findings

Benford's law is an adequate tool for performing integrity tests of high-volume databases. A minimum of 350 tables within each database is required for the MAD test to be effective; this threshold is higher than the 67 reported in earlier research. Robust results are obtained for the complete ERP system and for large modules; modules with fewer than 350 tables show low conformity with Benford's law.

Research limitations/implications

This study is not about detecting fraud; it aims to help internal auditors red flag databases that will need further attention, making the most out of available limited resources in SOEs. The contribution is a simple, cheap and useful quantitative tool that can be employed by internal auditors in emerging economies to perform the first scan of the data contained in relational databases.

Practical implications

This paper provides a tool to test whether large amounts of data behave as expected, and if not, they can be pinpointed for future investigation. It offers tests and explanations on the tool's application so that internal auditors of SOEs in emerging economies can use it, particularly those that face divergent expectations from antagonist powerful interest groups.

Originality/value

This study demonstrates that even in the context of limited information technology tools available for internal auditors, there are simple and inexpensive tests to review the integrity of high-volume databases. It also extends the literature on high-volume database integrity tests and our knowledge of the IAF in Civil law countries, particularly emerging economies in Latin America.

Details

Journal of Economics, Finance and Administrative Science, vol. 27 no. 53
Type: Research Article
ISSN: 2218-0648

Content available
Article
Publication date: 1 March 2005

Karl Wennberg

Abstract

This article provides an account of how databases can be effectively used in entrepreneurship research. Improved quality of and access to large secondary databases offer paths to answering questions of great theoretical value. I present an overview of theoretical, methodological and practical difficulties in working with database data, along with advice on how such difficulties can be overcome. Conclusions are given, together with suggestions of areas where databases might provide real and important contributions to entrepreneurship research.

Details

New England Journal of Entrepreneurship, vol. 8 no. 2
Type: Research Article
ISSN: 2574-8904

Open Access
Article
Publication date: 12 March 2018

Hafiz A. Alaka, Lukumon O. Oyedele, Hakeem A. Owolabi, Muhammad Bilal, Saheed O. Ajayi and Olugbenga O. Akinade

Abstract

This study explored the use of big data analytics (BDA) to analyse data from a large number of construction firms to develop a construction business failure prediction model (CB-FPM). A careful analysis of the literature revealed financial ratios as the best form of variable for this problem. Because of MapReduce's unsuitability for the iterative problems involved in developing CB-FPMs, various BDA initiatives for iterative problems were identified. A BDA framework for developing the CB-FPM was proposed. It was validated using 150,000 data cells from 30,000 construction firms, an artificial neural network, Amazon Elastic Compute Cloud, Apache Spark and the R software. The BDA CB-FPM was developed in eight seconds, while the same process without BDA was aborted after nine hours without success. This shows that the reluctance to use large data sets to develop CB-FPMs, owing to prohibitively long processing times, can be overcome by applying BDA techniques. The BDA CB-FPM largely outperformed an ordinary CB-FPM developed with a data set of 200 construction firms, showing that the use of a larger sample size, with the aid of BDA, leads to better-performing CB-FPMs. The high financial and social cost associated with misclassifications (i.e. model error) thus makes the adoption of BDA CB-FPMs very important for, among others, financiers, clients and policy makers.
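
As an illustrative sketch only: the outline below trains a small neural-network failure classifier on hypothetical financial-ratio features using scikit-learn. It stands in for the authors' distributed Spark/EC2/R pipeline, which is not reproduced here; the feature semantics and labels are invented.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical financial ratios for 1,000 construction firms:
# current ratio, debt-to-equity, net profit margin, asset turnover, interest cover.
X = rng.normal(size=(1000, 5))
# Hypothetical failure labels (1 = firm failed), loosely tied to leverage and margin.
y = (0.8 * X[:, 1] - 1.2 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A small feed-forward network standing in for the paper's ANN-based CB-FPM.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)
print(f"hold-out accuracy: {model.score(X_test, y_test):.2f}")
```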

Details

Applied Computing and Informatics, vol. 16 no. 1/2
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 15 November 2018

Amanda Bresler

Abstract

Purpose

The purpose of this study is to evaluate Department of Defense (DoD)-backed innovation programs as a means of enhancing the adoption of new technology throughout the armed forces.

Design/methodology/approach

The distribution of 1.29 million defense contract awards over seven years was analyzed across a data set of more than 8,000 DoD-backed innovation program award recipients. Surveys and interviews of key stakeholder groups were conducted to contextualize the quantitative results and garner additional insights.
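
A minimal sketch of the kind of join-and-aggregate step such an analysis involves is given below; the data frames, column names and values are hypothetical, not the actual DoD award data.

```python
import pandas as pd

# Hypothetical inputs: contract awards and innovation-program participants.
awards = pd.DataFrame({
    "vendor_id": ["A1", "A1", "B2", "C3"],
    "award_value": [1.2e6, 3.4e5, 9.9e5, 5.0e4],
    "sponsor_branch": ["Army", "Army", "Navy", "Air Force"],
})
participants = pd.DataFrame({
    "vendor_id": ["A1", "B2", "D4"],
    "initial_sponsor": ["Army", "Navy", "Army"],
})

# Follow-on awards won by participants, and whether they stayed with the initial sponsor.
merged = participants.merge(awards, on="vendor_id", how="left")
merged["same_branch"] = merged["sponsor_branch"] == merged["initial_sponsor"]

summary = merged.groupby("vendor_id").agg(
    total_followon=("award_value", "sum"),
    only_initial_sponsor=("same_branch", "all"),
)
print(summary)
```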

Findings

Nearly half of DoD innovation program participants achieve no meaningful growth in direct defense business after program completion, and most small, innovative companies that win follow-on defense contracts solely support their initial sponsor branch. Causes for these program failures include the fact that programs do not market participants’ capabilities to the defense community and do not track participant companies after program completion.

Practical implications

Because the DoD does not market the capabilities of its innovation program participants internally, prospective DoD customers conduct redundant market research or fail to modernize. Program participants become increasingly unwilling to invest in the DoD market long term after the programs fail to deliver their expected benefits.

Originality/value

Limited scholarship evaluates the efficacy of DoD-backed innovation programs as a means of enhancing force readiness. This research not only uses a vast data set to demonstrate the failures of these programs but also presents concrete recommendations for improving them – including establishing an “Innovators Database” to track program participants and an incentive to encourage contracting entities and contractors to engage with them.

Details

Journal of Defense Analytics and Logistics, vol. 2 no. 2
Type: Research Article
ISSN: 2399-6439

Content available
Article
Publication date: 10 May 2021

Zachary Hornberger, Bruce Cox and Raymond R. Hill

Abstract

Purpose

Large/stochastic spatiotemporal demand data sets can prove intractable for location optimization problems, motivating the need for aggregation. However, demand aggregation induces errors. Significant theoretical research has been performed on the modifiable areal unit problem and the zone definition problem, but minimal research has addressed the specific issues inherent to spatiotemporal demand data, such as search and rescue (SAR) data. This study provides a quantitative comparison of various aggregation methodologies and their relation to distance- and volume-based aggregation errors.

Design/methodology/approach

This paper introduces and applies a framework for comparing both deterministic and stochastic aggregation methods using distance- and volume-based aggregation error metrics. This paper additionally applies weighted versions of these metrics to account for the reality that demand events are nonhomogeneous. These metrics are applied to a large, highly variable, spatiotemporal demand data set of SAR events in the Pacific Ocean. Comparisons using these metrics are conducted between six quadrat aggregations of varying scales and two zonal distribution models using hierarchical clustering.
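
The abstract does not define the error metrics precisely; the sketch below shows one plausible reading of a weighted distance-based aggregation error for a quadrat-style aggregation, computed on invented demand points.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical SAR demand events: (x, y) locations and an event weight (e.g. severity).
points = rng.uniform(low=[0.0, 0.0], high=[10.0, 10.0], size=(500, 2))
weights = rng.uniform(0.5, 2.0, size=500)

def quadrat_centroids(points, weights, n_cells):
    """Aggregate points into an n_cells x n_cells quadrat grid of weighted centroids."""
    cell = np.floor(points / (10.0 / n_cells)).clip(0, n_cells - 1).astype(int)
    labels = cell[:, 0] * n_cells + cell[:, 1]
    centroids = np.zeros((n_cells * n_cells, 2))
    for k in np.unique(labels):
        mask = labels == k
        centroids[k] = np.average(points[mask], axis=0, weights=weights[mask])
    return labels, centroids

def weighted_distance_error(points, weights, labels, centroids):
    """Weighted mean distance from each demand event to its aggregated representative."""
    d = np.linalg.norm(points - centroids[labels], axis=1)
    return np.average(d, weights=weights)

for n_cells in (2, 4, 8):
    labels, centroids = quadrat_centroids(points, weights, n_cells)
    err = weighted_distance_error(points, weights, labels, centroids)
    print(f"{n_cells}x{n_cells} quadrats: weighted distance error = {err:.3f}")
```

Increasing the quadrat count drives this distance-based error down, which mirrors the trade-off discussed in the findings.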

Findings

As quadrat fidelity increases, the distance-based aggregation error decreases; the two deliberate zonal approaches reduce this error further while using fewer zones. However, the higher-fidelity aggregations detrimentally affect volume error. Additionally, by splitting the SAR data set into training and test sets, this paper shows that the stochastic zonal distribution aggregation method is effective at simulating actual future demands.

Originality/value

This study indicates that no single best aggregation method exists; by quantifying the trade-offs in aggregation-induced errors, practitioners can choose the method that minimizes the errors most relevant to their study. The study also quantifies the ability of a stochastic zonal distribution method to simulate future demand data effectively.

Details

Journal of Defense Analytics and Logistics, vol. 5 no. 1
Type: Research Article
ISSN: 2399-6439

Content available
Article
Publication date: 8 July 2022

Vania Vidal, Valéria Magalhães Pequeno, Narciso Moura Arruda Júnior and Marco Antonio Casanova

Abstract

Purpose

Enterprise knowledge graphs (EKG) in resource description framework (RDF) consolidate and semantically integrate heterogeneous data sources into a comprehensive dataspace. However, to make an external relational data source accessible through an EKG, an RDF view of the underlying relational database, called an RDB2RDF view, must be created. The RDB2RDF view should be materialized in situations where live access to the data source is not possible, or the data source imposes restrictions on the type of query forms and the number of results. In this case, a mechanism for maintaining the materialized view data up-to-date is also required. The purpose of this paper is to address the problem of the efficient maintenance of externally materialized RDB2RDF views.

Design/methodology/approach

This paper proposes a formal framework for the incremental maintenance of externally materialized RDB2RDF views, in which the server computes and publishes changesets, indicating the difference between the two states of the view. The EKG system can then download the changesets and synchronize the externally materialized view. The changesets are computed based solely on the update and the source database state and require no access to the content of the view.
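
As a simplified illustration (with an assumed toy mapping, not the paper's formal framework), the sketch below computes a changeset, that is, the triples to remove and the triples to add, from the before and after states of a source row, without ever consulting the materialized view.

```python
# Simplified RDB2RDF mapping: one table row -> a set of RDF triples.
def row_to_triples(row):
    subject = f"http://example.org/person/{row['id']}"
    return {
        (subject, "http://xmlns.com/foaf/0.1/name", row["name"]),
        (subject, "http://example.org/dept", row["dept"]),
    }

def changeset(old_row, new_row):
    """Triples to remove and add so the external RDB2RDF view reflects the update."""
    old_triples = row_to_triples(old_row) if old_row else set()
    new_triples = row_to_triples(new_row) if new_row else set()
    return old_triples - new_triples, new_triples - old_triples

# Example: a source-side UPDATE changes the department of person 42.
before = {"id": 42, "name": "Ada", "dept": "Sales"}
after = {"id": 42, "name": "Ada", "dept": "Research"}
to_remove, to_add = changeset(before, after)
print("remove:", to_remove)
print("add:", to_add)
```

A real RDB2RDF view would also need the relevant source database state to resolve joins; the point of the sketch is only that the changeset is derived from source-side states, not from the view content.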

Findings

The central result of this paper shows that changesets computed according to the formal framework correctly maintain the externally materialized RDB2RDF view. The experiments indicate that the proposed strategy supports live synchronization of large RDB2RDF views and that the time taken to compute the changesets with the proposed approach was almost three orders of magnitude smaller than with partial rematerialization and three orders of magnitude smaller than with full rematerialization.

Originality/value

The main idea that differentiates the proposed approach from previous work on incremental view maintenance is to explore the object-preserving property of typical RDB2RDF views so that the solution can deal with views with duplicates. The algorithms for the incremental maintenance of relational views with duplicates published in the literature require querying the materialized view data to precisely compute the changesets. By contrast, the approach proposed in this paper requires no access to view data. This is important when the view is maintained externally, because accessing a remote data source may be too slow.

Details

International Journal of Web Information Systems, vol. 18 no. 5/6
Type: Research Article
ISSN: 1744-0084

Open Access
Book part
Publication date: 1 December 2022

Clemens Striebing

Abstract

Purpose: The study elaborates the contextual conditions of the academic workplace in which gender, age, and nationality considerably influence the likelihood of self-categorization as being affected by workplace bullying. Furthermore, the intersectionality of these sociodemographic characteristics is examined.

Basic Design: The hypotheses underlying the study were mainly derived from the social role, social identity and cultural distance theories, as well as from role congruity and relative deprivation theory. A survey data set of a large German research organization, the Max Planck Society, was used. A total of 3,272 cases of researchers and 2,995 cases of non-scientific employees were included in the analyses performed. For both groups of employees, binary logistic regression equations were constructed. The outcome of each equation is the estimated percentage of individuals who reported having experienced bullying at work occasionally or more frequently in the 12 months prior to the survey. The predictors are the demographic and organization-specific characteristics (hierarchical position, scientific field, administrative unit) of the respondents and selected interaction terms. Using these regression equations, hypothetically relevant conditional marginal means and differences in regression parameters were calculated and compared by means of t-tests.
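
A minimal sketch of this kind of binary logistic regression with an interaction term, using the statsmodels formula interface, is shown below; the variable names and simulated data are assumptions for illustration, not the Max Planck Society survey.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 2000

# Simulated survey-style data (a stand-in for the actual employee survey).
df = pd.DataFrame({
    "bullied": rng.integers(0, 2, size=n),                 # 1 = reported bullying
    "gender": rng.choice(["female", "male"], size=n),
    "age_group": rng.choice(["<30", "30-44", "45-59", "60+"], size=n),
    "foreign": rng.choice([0, 1], size=n),
    "hierarchy": rng.choice(["early_career", "senior"], size=n),
})

# Binary logistic regression with a gender x age interaction, as in the study design.
model = smf.logit("bullied ~ gender * age_group + foreign + hierarchy", data=df).fit(disp=False)
print(model.summary())

# Conditional predicted probabilities (marginal means) for selected profiles.
profiles = pd.DataFrame({
    "gender": ["female", "male"],
    "age_group": ["45-59", "45-59"],
    "foreign": [0, 0],
    "hierarchy": ["senior", "senior"],
})
print(model.predict(profiles))
```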

Results: In particular, the gender-related hypotheses of the study could be completely or conditionally verified. Accordingly, female scientific and non-scientific employees showed a higher bullying vulnerability in (almost) all contexts of the academic workplace. An increased bullying vulnerability was also found for foreign researchers; however, the patterns found here contradicted those that were hypothesized. Concerning the effect of age, analyzed for non-scientific personnel, the 45–59 age group in particular showed a higher bullying probability, with the gender gap in bullying vulnerability being greatest for the youngest and oldest age groups in the sample.

Interpretation and Relevance: The results of the study especially support the social identity theory regarding gender. In the sample studied, women in minority positions have a higher vulnerability to bullying in their work fields, which is not the case for men. However, the influence of nationality on bullying vulnerability is more complex. The study points to the further development of cultural distance theory, whose hypotheses are only partly able to explain the results. The evidence for social role theory is primarily seen in the interaction of gender with age and hierarchical level. Accordingly, female early-career researchers and young women (and women in the oldest age group) on the non-scientific staff presumably experience a masculine workplace. Thus, the results of the study contradict the role congruity theory.

Details

Diversity and Discrimination in Research Organizations
Type: Book
ISBN: 978-1-80117-959-1

Open Access
Article
Publication date: 3 February 2020

Kai Zheng, Xianjun Yang, Yilei Wang, Yingjie Wu and Xianghan Zheng

Abstract

Purpose

The purpose of this paper is to alleviate the problem of poor robustness and over-fitting caused by large-scale data in collaborative filtering recommendation algorithms.

Design/methodology/approach

Interpreting user behavior from the probabilistic perspective of hidden variables is helpful for improving robustness and over-fitting problems. Constructing a recommendation network by variational inference can effectively solve the complex distribution calculation in the probabilistic recommendation model. Based on this analysis, this paper uses a variational auto-encoder to construct a generative network that can reconstruct user-rating data, addressing the poor robustness and over-fitting caused by large-scale data. Meanwhile, for the KL-vanishing problem that arises in variational-inference deep learning models, this paper optimizes the model with the KL annealing and Free Bits methods.
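
A compact PyTorch sketch (with invented dimensions, not the authors' network) of the two remedies named above: a KL-annealing weight that ramps up during training and a free-bits floor on the per-dimension KL term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RatingVAE(nn.Module):
    """Variational auto-encoder over a user's rating vector (toy dimensions)."""
    def __init__(self, n_items=1000, latent=32):
        super().__init__()
        self.enc = nn.Linear(n_items, 2 * latent)   # outputs mean and log-variance
        self.dec = nn.Linear(latent, n_items)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        return self.dec(z), mu, logvar

def loss_fn(logits, x, mu, logvar, beta, free_bits=0.25):
    # Multinomial-style reconstruction term over the rated items.
    recon = -(F.log_softmax(logits, dim=-1) * x).sum(dim=-1).mean()
    # Per-dimension KL divergence to the standard normal prior.
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())
    # Free bits: each latent dimension contributes at least `free_bits` nats.
    kl = torch.clamp(kl, min=free_bits).sum(dim=-1).mean()
    return recon + beta * kl

model = RatingVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = (torch.rand(64, 1000) < 0.02).float()           # toy implicit-feedback batch

for step in range(200):
    beta = min(1.0, step / 100)                      # KL annealing: ramp beta from 0 to 1
    logits, mu, logvar = model(x)
    loss = loss_fn(logits, x, mu, logvar, beta)
    opt.zero_grad()
    loss.backward()
    opt.step()
```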

Findings

The performance of the basic model is considerably improved after using the KL annealing or Free Bits method to mitigate KL vanishing. The proposed models perform worse than competitors on small data sets such as MovieLens 1M, but better on large data sets such as MovieLens 10M and MovieLens 20M.

Originality/value

This paper presents the use of a variational inference model for collaborative filtering recommendation and introduces the KL annealing and Free Bits methods to improve the basic model's effectiveness. Because variational inference training learns the probability distribution of the hidden vector, the problem of poor robustness and over-fitting is alleviated. When the amount of data is relatively large in the actual application scenario, the probability distribution fitted to the actual data can better represent the users and the items. Therefore, using variational inference for collaborative filtering recommendation is of practical value.

Details

International Journal of Crowd Science, vol. 4 no. 1
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 31 July 2023

Daniel Šandor and Marina Bagić Babac

Abstract

Purpose

Sarcasm is a linguistic expression that usually carries the opposite meaning of what is being said by words, thus making it difficult for machines to discover the actual meaning. It is mainly distinguished by the inflection with which it is spoken, with an undercurrent of irony, and is largely dependent on context, which makes it a difficult task for computational analysis. Moreover, sarcasm expresses negative sentiments using positive words, allowing it to easily confuse sentiment analysis models. This paper aims to demonstrate the task of sarcasm detection using the approach of machine and deep learning.

Design/methodology/approach

For the purpose of sarcasm detection, machine and deep learning models were used on a data set consisting of 1.3 million social media comments, including both sarcastic and non-sarcastic comments. The data set was pre-processed using natural language processing methods, and additional features were extracted and analysed. Several machine learning models, including logistic regression, ridge regression, linear support vector and support vector machines, along with two deep learning models based on bidirectional long short-term memory and one bidirectional encoder representations from transformers (BERT)-based model, were implemented, evaluated and compared.
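
The snippet below illustrates the simplest of the listed models, TF-IDF features feeding a logistic regression classifier, in scikit-learn; the toy comments are invented, not drawn from the 1.3 million-comment data set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled comments (1 = sarcastic); the study used 1.3 million social media comments.
comments = [
    "Oh great, another Monday. Exactly what I needed.",
    "Thanks for the detailed and helpful answer.",
    "Wow, what a surprise, the train is late again.",
    "The new release fixed the crash on startup.",
]
labels = [1, 0, 1, 0]

# TF-IDF unigrams and bigrams feeding a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(comments, labels)

print(clf.predict(["Sure, because that worked so well last time."]))
```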

Findings

The performance of machine and deep learning models was compared in the task of sarcasm detection, and possible ways of improvement were discussed. Deep learning models showed more promise, performance-wise, for this type of task. Specifically, a state-of-the-art natural language processing model, the BERT-based model, outperformed the other machine and deep learning models.

Originality/value

This study compared the performance of the various machine and deep learning models in the task of sarcasm detection using the data set of 1.3 million comments from social media.

Details

Information Discovery and Delivery, vol. 52 no. 2
Type: Research Article
ISSN: 2398-6247
