Search results
1 – 4 of 4
Hafiz A. Alaka, Lukumon O. Oyedele, Hakeem A. Owolabi, Muhammad Bilal, Saheed O. Ajayi and Olugbenga O. Akinade
Abstract
This study explored the use of big data analytics (BDA) to analyse data from a large number of construction firms and develop a construction business failure prediction model (CB-FPM). A careful analysis of the literature revealed financial ratios to be the best form of variable for this problem. Because MapReduce is unsuitable for the iterative problems involved in developing CB-FPMs, various BDA initiatives for iterative problems were identified. A BDA framework for developing CB-FPMs was proposed. It was validated using 150,000 data cells from 30,000 construction firms, an artificial neural network, Amazon Elastic Compute Cloud, Apache Spark and the R software. The BDA CB-FPM was developed in eight seconds, while the same process without BDA was aborted after nine hours without success. This shows that the reluctance to use large datasets to develop CB-FPMs, because of the tedious duration, can be resolved by applying BDA techniques. The BDA CB-FPM largely outperformed an ordinary CB-FPM developed with a dataset of 200 construction firms, proving that a larger sample size, used with the aid of BDA, leads to better-performing CB-FPMs. The high financial and social cost associated with misclassifications (i.e. model error) thus makes the adoption of BDA CB-FPMs very important for, among others, financiers, clients and policy makers.
Maryam AlJame and Imtiaz Ahmad
Abstract
The evolution of technologies has unleashed a wealth of challenges by generating massive amounts of data. Recently, biological data has increased exponentially, which has introduced several computational challenges. DNA short read alignment is an important problem in bioinformatics. The exponential growth in the number of short reads has increased the need for an ideal platform to accelerate the alignment process. Apache Spark is a cluster-computing framework that provides data parallelism and fault tolerance. In this article, we propose a Spark-based algorithm, called Spark-DNAligning, to accelerate DNA short read alignment. Spark-DNAligning exploits Apache Spark's performance optimizations such as broadcast variables, join after partitioning, caching and in-memory computation. Spark-DNAligning is evaluated in terms of performance by comparing it with the SparkBWA tool and a MapReduce-based algorithm called CloudBurst. All the experiments are conducted on Amazon Web Services (AWS). Results demonstrate that Spark-DNAligning outperforms both tools by providing a speedup in the range of 101–702 in aligning gigabytes of short reads to the human genome. Empirical evaluation reveals that Apache Spark offers promising solutions to the DNA short read alignment problem.
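The "join after partitioning" idea mentioned above can be illustrated without a cluster. The following is a minimal pure-Python sketch, not the Spark-DNAligning implementation and not Spark API code: the function names, the k-mer seed approach and the toy sequences are all assumptions made for illustration. Both the reference k-mers and the read k-mers are hash-partitioned with the same function, so each partition can be joined locally without moving keys between partitions.

```python
from collections import defaultdict

def kmers(seq, k):
    """Return (k-mer, offset) pairs for every position in seq."""
    return [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]

def partition(pairs, n_parts):
    """Hash-partition (key, value) pairs into n_parts dictionaries."""
    parts = [defaultdict(list) for _ in range(n_parts)]
    for key, value in pairs:
        parts[hash(key) % n_parts][key].append(value)
    return parts

def seed_hits(reference, reads, k=4, n_parts=3):
    """Join read k-mers with reference k-mers, partition by partition.

    Returns, per read id, the set of candidate start positions in the
    reference implied by matching seeds (hypothetical toy scoring).
    """
    ref_parts = partition(kmers(reference, k), n_parts)
    read_pairs = [(kmer, (rid, off))
                  for rid, read in reads.items()
                  for kmer, off in kmers(read, k)]
    read_parts = partition(read_pairs, n_parts)

    hits = defaultdict(set)
    # Same hash function on both sides: matching keys share a partition,
    # so each pair of partitions is joined independently.
    for ref_p, read_p in zip(ref_parts, read_parts):
        for kmer, occurrences in read_p.items():
            for ref_pos in ref_p.get(kmer, []):
                for rid, off in occurrences:
                    hits[rid].add(ref_pos - off)
    return hits

hits = seed_hits("ACGTTGCAAGCT", {"r1": "TTGCA"})
```

In a real Spark job the per-partition loop would run on separate executors, and a small lookup structure could instead be shipped to every executor as a broadcast variable; which of the two the authors use at each step is not stated in the abstract.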
Alexander Döschl, Max-Emanuel Keller and Peter Mandl
Abstract
Purpose
This paper aims to evaluate different approaches for the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and resilient distributed data set (RDD) (Apache Spark) paradigms and a graphics processing unit (GPU) approach with Numba for compute unified device architecture (CUDA).
Design/methodology/approach
The paper uses a simple but computationally intensive puzzle as a case study for experiments. To find all solutions using brute force search, 15! permutations had to be computed and tested against the solution rules. The experimental application comprises a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and RDD (Apache Spark) paradigms and a GPU approach with Numba for CUDA. The implementations were benchmarked on Amazon-EC2 instances for performance and scalability measurements.
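A brute-force permutation search of this kind parallelizes naturally by splitting the search space on the first element of the permutation. The sketch below is only an illustration under stated assumptions: the abstract does not describe the puzzle, so `is_solution` is a hypothetical stand-in rule (no two adjacent values differ by exactly 1), `n` is kept small, and a thread pool stands in for the Hadoop, Spark and CUDA back ends compared in the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations

def is_solution(perm):
    """Hypothetical rule: no two neighbours differ by exactly 1."""
    return all(abs(a - b) != 1 for a, b in zip(perm, perm[1:]))

def count_partition(first, n):
    """Count solutions among permutations of 1..n that start with `first`."""
    rest = [x for x in range(1, n + 1) if x != first]
    return sum(1 for tail in permutations(rest)
               if is_solution((first,) + tail))

def count_solutions(n, workers=4):
    """Split the n! candidates into n independent partitions and sum the counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda first: count_partition(first, n),
                            range(1, n + 1)))

parallel_count = count_solutions(6)
serial_count = sum(1 for p in permutations(range(1, 7)) if is_solution(p))
```

Because each partition is independent, the same split maps onto map tasks, RDD partitions or GPU thread blocks. Python threads do not speed up CPU-bound work because of the global interpreter lock, so the thread pool here shows the decomposition rather than a real speedup.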
Findings
The comparison of the solutions with Apache Hadoop and Apache Spark under Amazon EMR showed that the processing time, measured in CPU minutes, was up to 30% lower with Spark, and that Spark's performance benefits especially from an increasing number of tasks. With the CUDA implementation, more than 16 times faster execution is achievable for the same price compared to the Spark solution. Apart from the multi-threaded implementation, the processing times of all solutions scale approximately linearly. Finally, several application suggestions for the different parallelization approaches are derived from the insights of this study.
Originality/value
There are numerous studies that have examined the performance of parallelization approaches. Most of these studies deal with processing large amounts of data or mathematical problems. This work, in contrast, compares these technologies on their ability to implement computationally intensive distributed algorithms.
Narender Kumar, Girish Kumar and Rajesh Kr Singh
Abstract
Purpose
The study presents various barriers to adopting big data analytics (BDA) for sustainable manufacturing operations (SMOs) after the coronavirus disease (COVID-19) pandemic. In this study, 17 barriers to investing in BDA implementation are identified through an extensive literature review and experts' opinions. A questionnaire-based survey is conducted to collect responses from experts. The identified barriers are grouped into three categories with the help of factor analysis: organizational barriers, data management barriers and human barriers. For the quantification of barriers, the graph theory matrix approach (GTMA) is applied.
Design/methodology/approach
The study presents various barriers to adopting BDA for SMOs after the COVID-19 pandemic. In this study, 17 barriers to investing in BDA implementation are identified through an extensive literature review and experts' opinions. A questionnaire-based survey is conducted to collect responses from experts. The identified barriers are grouped into three categories with the help of factor analysis: organizational barriers, data management barriers and human barriers. For the quantification of barriers, the GTMA is applied.
Findings
The study identifies barriers to investment in BDA implementation. It categorizes the barriers based on factor analysis and computes the intensity for each category of a barrier for BDA investment for SMOs. It is observed that the organizational barriers have the highest intensity whereas the human barriers have the smallest intensity.
Practical implications
This study may help organizations to take strategic decisions for investing in BDA applications for achieving one of the sustainable development goals. Organizations should prioritize their efforts first to counter the barriers under the category of organizational barriers followed by barriers in data management and human barriers.
Originality/value
The novelty of this paper is that barriers to BDA investment for SMOs in the context of Indian manufacturing organizations have been analyzed. The findings of the study will assist the professionals and practitioners in formulating policies based on the actual nature and intensity of the barriers.