Search results

1 – 4 of 4
Open Access
Article
Publication date: 12 March 2018

Hafiz A. Alaka, Lukumon O. Oyedele, Hakeem A. Owolabi, Muhammad Bilal, Saheed O. Ajayi and Olugbenga O. Akinade

Abstract

This study explored the use of big data analytics (BDA) to analyse data from a large number of construction firms to develop a construction business failure prediction model (CB-FPM). A careful analysis of the literature revealed financial ratios as the best form of variable for this problem. Because MapReduce is unsuitable for the iterative problems involved in developing CB-FPMs, various BDA initiatives for iterative problems were identified. A BDA framework for developing CB-FPMs was proposed. It was validated using 150,000 data cells from 30,000 construction firms, an artificial neural network, Amazon Elastic Compute Cloud, Apache Spark and the R software. The BDA CB-FPM was developed in eight seconds, whereas the same process without BDA was aborted unsuccessfully after nine hours. This shows that the reluctance to use large datasets to develop CB-FPMs because of long processing times can be resolved by applying BDA techniques. The BDA CB-FPM largely outperformed an ordinary CB-FPM developed with a dataset of 200 construction firms, demonstrating that using a larger sample size with the aid of BDA leads to better-performing CB-FPMs. The high financial and social costs associated with misclassifications (i.e. model errors) thus make the adoption of BDA CB-FPMs very important for, among others, financiers, clients and policy makers.
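
The abstract attributes MapReduce's unsuitability to the iterative nature of model training. As a rough illustration, the sketch below trains a logistic-regression failure classifier by batch gradient descent, where every epoch is another full pass over the same records. This is a swapped-in technique, not the authors' artificial neural network or their R/Spark pipeline, and the two financial ratios and the toy firms are hypothetical.

```python
import math

# Hypothetical record: (financial ratios, label); label 1 = failed, 0 = healthy.
FirmRecord = tuple[list[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: list[float], bias: float, x: list[float]) -> float:
    """Estimated probability that a firm with ratios x fails."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(data: list[FirmRecord], epochs: int = 500, lr: float = 0.5) -> tuple[list[float], float]:
    """Batch gradient descent: each epoch re-reads the whole dataset,
    which is the iterative access pattern discussed in the abstract."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:  # full pass over every firm
            error = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


# Hypothetical toy firms: (liquidity ratio, leverage ratio).
firms: list[FirmRecord] = [
    ([2.0, 0.2], 0), ([1.8, 0.3], 0), ([1.5, 0.4], 0),
    ([0.5, 0.9], 1), ([0.6, 0.8], 1), ([0.4, 1.0], 1),
]
w, b = train(firms)
print(predict_proba(w, b, [1.9, 0.25]))  # low probability of failure expected
```

On a disk-based MapReduce engine each epoch would typically be a separate job that re-reads its input, whereas Spark can cache the records in memory across epochs; that difference is what makes iterative training practical on large datasets.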

Details

Applied Computing and Informatics, vol. 16 no. 1/2
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 3 August 2020

Maryam AlJame and Imtiaz Ahmad

Abstract

The evolution of technologies has unleashed a wealth of challenges by generating massive amounts of data. Recently, biological data has grown exponentially, which has introduced several computational challenges. DNA short read alignment is an important problem in bioinformatics, and the exponential growth in the number of short reads has increased the need for an ideal platform to accelerate the alignment process. Apache Spark is a cluster-computing framework that provides data parallelism and fault tolerance. In this article, we propose a Spark-based algorithm, called Spark-DNAligning, to accelerate DNA short read alignment. Spark-DNAligning exploits Apache Spark's performance optimizations such as broadcast variables, join after partitioning, caching and in-memory computation. Spark-DNAligning is evaluated in terms of performance by comparing it with the SparkBWA tool and a MapReduce-based algorithm called CloudBurst. All experiments are conducted on Amazon Web Services (AWS). Results demonstrate that Spark-DNAligning outperforms both tools, providing speedups in the range of 101–702 when aligning gigabytes of short reads to the human genome. Empirical evaluation reveals that Apache Spark offers promising solutions to the DNA short read alignment problem.
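
Broadcast variables matter here because every read is compared against the same reference. A minimal sketch, assuming a simple seed-and-extend aligner rather than the Spark-DNAligning algorithm itself: a k-mer index of the reference is built once (the role a broadcast variable would play on a cluster), and each read is then aligned independently against it. The function names, `k` and the mismatch threshold are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def align_read(read: str, reference: str, index: dict[str, list[int]],
               k: int, max_mismatches: int = 1) -> list[int]:
    """Seed with the read's first k-mer, then extend by counting mismatches
    over the full read length; return every acceptable start position."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run past the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


# Hypothetical toy reference and reads.
reference = "ACGTACGTTAGCCGATAGGCTTACG"
k = 4
index = build_kmer_index(reference, k)  # built once, shared by all reads
reads = ["TAGCCGAT", "ACGTACGA", "GGGGGGGG"]
alignments = {read: align_read(read, reference, index, k) for read in reads}
print(alignments)
```

In PySpark the index would typically be shared with `SparkContext.broadcast` and the per-read call would run inside an RDD `map`. Seeding only on the first k-mer also means a mismatch inside that seed causes the read to be missed, a simplification that production aligners avoid.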

Details

Applied Computing and Informatics, vol. 19 no. 1/2
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 6 August 2021

Alexander Döschl, Max-Emanuel Keller and Peter Mandl

Abstract

Purpose

This paper aims to evaluate different approaches for the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing solutions with the MapReduce (Apache Hadoop) and resilient distributed dataset (RDD) (Apache Spark) paradigms and a graphics processing unit (GPU) approach with Numba for Compute Unified Device Architecture (CUDA).

Design/methodology/approach

The paper uses a simple but computationally intensive puzzle as a case study for experiments. To find all solutions by brute-force search, 15! (about 1.3 trillion) permutations had to be computed and tested against the solution rules. The experimental application comprises a Java multi-threaded algorithm, distributed computing solutions with the MapReduce (Apache Hadoop) and RDD (Apache Spark) paradigms and a GPU approach with Numba for CUDA. The implementations were benchmarked on Amazon EC2 instances to measure performance and scalability.
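
A brute-force permutation search parallelizes well because the search space splits into independent chunks. A minimal sketch, assuming one task per first element (giving n tasks of (n - 1)! permutations each) and a hypothetical stand-in rule, since the abstract does not describe the puzzle: here a "solution" is a derangement, a permutation in which no value sits in its own position, which keeps the count easy to check. The tasks run sequentially in this sketch; on Hadoop, Spark or a thread pool each `count_with_prefix` call would become one unit of work.

```python
import math
from itertools import permutations
from typing import Callable

Rule = Callable[[tuple[int, ...]], bool]


def is_derangement(perm: tuple[int, ...]) -> bool:
    """Hypothetical stand-in rule: value i + 1 never sits at position i."""
    return all(value != pos + 1 for pos, value in enumerate(perm))


def count_with_prefix(n: int, first: int, rule: Rule) -> int:
    """One task: test the (n - 1)! permutations of 1..n that start with `first`."""
    rest = [v for v in range(1, n + 1) if v != first]
    return sum(1 for tail in permutations(rest) if rule((first, *tail)))


def count_all(n: int, rule: Rule) -> int:
    """Split the n! search space into n independent tasks and combine results."""
    return sum(count_with_prefix(n, first, rule) for first in range(1, n + 1))


print(count_all(6, is_derangement))

# For a 15! search space, this split gives 15 tasks of 14! permutations each.
total_15 = math.factorial(15)
per_task_15 = math.factorial(14)
print(total_15, per_task_15)
```

A finer split (for example, by the first two elements) yields more, smaller tasks, which is one way the number of tasks can be raised to benefit a framework such as Spark.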

Findings

A comparison of the Apache Hadoop and Apache Spark solutions under Amazon EMR showed that the processing time measured in CPU minutes was up to 30% lower with Spark, and Spark's performance benefits especially from an increasing number of tasks. With the CUDA implementation, more than 16 times faster execution is achievable for the same price compared with the Spark solution. Apart from the multi-threaded implementation, the processing times of all solutions scale approximately linearly. Finally, several application suggestions for the different parallelization approaches are derived from the insights of this study.

Originality/value

There are numerous studies that have examined the performance of parallelization approaches. Most of these studies deal with processing large amounts of data or mathematical problems. This work, in contrast, compares these technologies on their ability to implement computationally intensive distributed algorithms.

Details

International Journal of Web Information Systems, vol. 17 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 30 September 2021

Narender Kumar, Girish Kumar and Rajesh Kr Singh

Abstract

Purpose

The study presents various barriers to adopting big data analytics (BDA) for sustainable manufacturing operations (SMOs) after the coronavirus disease (COVID-19) pandemic.

Design/methodology/approach

The study presents various barriers to adopting BDA for SMOs after the COVID-19 pandemic. In this study, 17 barriers to investing in BDA implementation are identified through an extensive literature review and experts’ opinions. A questionnaire-based survey is conducted to collect responses from experts. The identified barriers are grouped into three categories with the help of factor analysis: organizational barriers, data management barriers and human barriers. To quantify the barriers, the graph theory matrix approach (GTMA) is applied.
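
GTMA studies typically model the barrier categories as nodes of a digraph, place each category's intensity on the diagonal of a matrix and the interdependencies between categories off the diagonal, and summarize the system with the permanent of that matrix. The permanent is computed like a determinant, except that every term is added and none is subtracted. A minimal brute-force sketch follows, adequate for a three-category matrix like the one in this study; the example values are hypothetical and are not taken from the paper.

```python
from itertools import permutations
from math import prod


def permanent(matrix: list[list[float]]) -> float:
    """Sum over all permutations sigma of prod_i matrix[i][sigma(i)].
    O(n! * n) time, which is fine for the small matrices used in GTMA."""
    n = len(matrix)
    return sum(
        prod(matrix[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )


# Hypothetical 3x3 matrix for (organizational, data management, human):
# diagonal = category intensities, off-diagonal = interdependencies.
barriers = [
    [5, 3, 2],
    [3, 4, 2],
    [2, 2, 3],
]
gtma_index = permanent(barriers)
print(gtma_index)
```

Unlike a determinant, the permanent never cancels terms, so every intensity and every interdependence adds to the index; larger matrices would call for Ryser's formula rather than enumerating all permutations.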

Findings

The study identifies barriers to investment in BDA implementation. It categorizes the barriers based on factor analysis and computes the intensity of each barrier category for BDA investment in SMOs. Organizational barriers are observed to have the highest intensity, whereas human barriers have the lowest.

Practical implications

This study may help organizations make strategic decisions about investing in BDA applications to achieve one of the sustainable development goals. Organizations should first prioritize countering organizational barriers, followed by data management barriers and then human barriers.

Originality/value

The novelty of this paper is that barriers to BDA investment for SMOs are analyzed in the context of Indian manufacturing organizations. The findings of the study will assist professionals and practitioners in formulating policies based on the actual nature and intensity of the barriers.

Details

Journal of Enterprise Information Management, vol. 35 no. 1
Type: Research Article
ISSN: 1741-0398