Search results

1 – 10 of 465
Article
Publication date: 6 August 2021

Alexander Döschl, Max-Emanuel Keller and Peter Mandl

Abstract

Purpose

This paper aims to evaluate different approaches for the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and resilient distributed data set (RDD) (Apache Spark) paradigms and a graphics processing unit (GPU) approach with Numba for compute unified device architecture (CUDA).

Design/methodology/approach

The paper uses a simple but computationally intensive puzzle as a case study for experiments. To find all solutions using brute force search, 15! permutations had to be computed and tested against the solution rules. The experimental application comprises a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and RDD (Apache Spark) paradigms and a GPU approach with Numba for CUDA. The implementations were benchmarked on Amazon-EC2 instances for performance and scalability measurements.
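The brute-force scheme described above, enumerating permutations, testing each against the solution rules and splitting the work across parallel workers, can be sketched as follows. This is an illustrative stand-in, not the authors' code: a tiny made-up adjacency rule over 6! permutations replaces the real puzzle and its 15! search space, and a thread pool stands in for the Java, Hadoop, Spark and CUDA back ends.

```python
# Illustrative sketch only: a hypothetical rule over 6! permutations
# replaces the paper's puzzle; a thread pool stands in for the
# Java/Hadoop/Spark/CUDA back ends.
from concurrent.futures import ThreadPoolExecutor
from itertools import islice, permutations

def satisfies_rules(perm):
    # Hypothetical puzzle rule: adjacent values must differ by more than 1.
    return all(abs(a - b) > 1 for a, b in zip(perm, perm[1:]))

def count_in_chunk(chunk):
    # Each worker tests one chunk of permutations against the rules.
    return sum(1 for p in chunk if satisfies_rules(p))

def parallel_brute_force(n, chunk_size=256, workers=4):
    perms = permutations(range(n))
    # Split the permutation stream into chunks of work for the pool.
    chunks = iter(lambda: list(islice(perms, chunk_size)), [])
    with ThreadPoolExecutor(workers) as pool:
        return sum(pool.map(count_in_chunk, chunks))
```

The same chunk-and-reduce structure maps naturally onto MapReduce or RDD operations; at 15! permutations the distributed and GPU back ends become worthwhile.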

Findings

The comparison of the solutions with Apache Hadoop and Apache Spark under Amazon EMR showed that the processing time measured in CPU minutes with Spark was up to 30% lower, while the performance of Spark especially benefits from an increasing number of tasks. With the CUDA implementation, more than 16 times faster execution is achievable for the same price compared to the Spark solution. Apart from the multi-threaded implementation, the processing times of all solutions scale approximately linearly. Finally, several application suggestions for the different parallelization approaches are derived from the insights of this study.

Originality/value

There are numerous studies that have examined the performance of parallelization approaches. Most of these studies deal with processing large amounts of data or mathematical problems. This work, in contrast, compares these technologies on their ability to implement computationally intensive distributed algorithms.

Details

International Journal of Web Information Systems, vol. 17 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 3 November 2022

Shashi Kant Ratnakar, Utpal Kiran and Deepak Sharma

Abstract

Purpose

Structural topology optimization is computationally expensive due to the involvement of high-resolution mesh and repetitive use of finite element analysis (FEA) for computing the structural response. Since FEA consumes most of the computational time in each optimization iteration, a novel GPU-based parallel strategy for FEA is presented and applied to the large-scale structural topology optimization of 3D continuum structures.

Design/methodology/approach

A matrix-free solver based on the preconditioned conjugate gradient (PCG) method is proposed to minimize the computational time associated with the solution of the linear system of equations in FEA. The proposed solver uses an innovative strategy that utilizes only the symmetric half of the elemental stiffness matrices to implement the element-by-element matrix-free solver on the GPU.
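A matrix-free PCG iteration of this kind can be illustrated in miniature. The sketch below is not the authors' GPU implementation: it solves a 1D bar of two-node elements in plain Python, assembles K·v element by element from only the upper half (k00, k01, k11) of each symmetric element stiffness matrix and uses a Jacobi preconditioner.

```python
# Minimal sketch, not the authors' GPU code: an element-by-element,
# matrix-free PCG solve storing only the symmetric half of each
# element stiffness matrix.
import math

def ebe_matvec(v, elems, ke_upper, fixed):
    # y = K*v assembled on the fly; the lower triangle is mirrored
    # from k01, so only the symmetric half is ever stored.
    k00, k01, k11 = ke_upper
    y = [0.0] * len(v)
    for a, b in elems:
        y[a] += k00 * v[a] + k01 * v[b]
        y[b] += k01 * v[a] + k11 * v[b]
    for i in fixed:          # Dirichlet rows act as identity
        y[i] = v[i]
    return y

def pcg(matvec, b, diag, tol=1e-10, maxit=200):
    # Jacobi-preconditioned conjugate gradients, fully matrix-free.
    x = [0.0] * len(b)
    r = [bi - yi for bi, yi in zip(b, matvec(x))]
    z = [ri / di for ri, di in zip(r, diag)]
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(maxit):
        Ap = matvec(p)
        alpha = rz / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if math.sqrt(sum(ri * ri for ri in r)) < tol:
            break
        z = [ri / di for ri, di in zip(r, diag)]
        rz, rz_old = sum(ri * zi for ri, zi in zip(r, z)), rz
        p = [zi + (rz / rz_old) * pi for zi, pi in zip(z, p)]
    return x
```

For a five-node bar with unit element stiffness, node 0 fixed and a unit tip load, the solver recovers the linear displacement field u = [0, 1, 2, 3, 4]; on the GPU each element's contribution becomes one thread's work.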

Findings

Using the solid isotropic material with penalization (SIMP) method, the proposed matrix-free solver is tested on three 3D structural optimization problems that are discretized using all-hexahedral structured and unstructured meshes. Results show that the proposed strategy demonstrates a 3.1×–3.3× speedup for the FEA solver stage and an overall speedup of 2.9×–3.3× over the standard element-by-element strategy on the GPU. Moreover, the proposed strategy requires almost 1.8× less GPU memory than the standard element-by-element strategy.

Originality/value

The proposed GPU-based matrix-free element-by-element solver takes a more general approach to the symmetry concept than previous works. It stores only symmetric half of the elemental matrices in memory and performs matrix-free sparse matrix-vector multiplication (SpMV) without any inter-thread communication. A customized data storage format is also proposed to store and access only symmetric half of elemental stiffness matrices for coalesced read and write operations on GPU over the unstructured mesh.

Details

Engineering Computations, vol. 39 no. 10
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 25 October 2021

Mandeep Kaur, Rajinder Sandhu and Rajni Mohana

Abstract

Purpose

The purpose of this study is to determine how effectively scheduling can be performed when applications are segmented into categories and resources are allocated based on each application's specific category.

Design/methodology/approach

This paper proposes a scheduling framework for IoT application jobs based upon Quality of Service (QoS) parameters, which works at a coarse-grained level to select a fog environment and at a fine-grained level to select a fog node. The fog environment is chosen considering availability, physical distance, latency and throughput. At the fine-grained (node selection) level, a probability triad (C, M, G) is estimated using the Naïve Bayes algorithm, giving the probability that a newly submitted application job falls into one of the categories compute (C) intensive, memory (M) intensive or GPU (G) intensive.
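The node-selection step can be sketched with a minimal Gaussian Naïve Bayes classifier. Everything below is illustrative: the feature set (CPU, memory and GPU demand of a job) and the training data are our assumptions, not taken from the paper.

```python
# Minimal Gaussian Naive Bayes sketch; the job features and training
# data are illustrative assumptions, not from the paper.
import math
from collections import defaultdict

class TinyGaussianNB:
    def fit(self, X, y):
        groups = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        self.stats = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            var = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                   for c, m in zip(cols, means)]
            self.stats[label] = (len(rows) / len(y), means, var)
        return self

    def predict_proba(self, x):
        # Log-space class scores, normalized into a probability triad.
        logp = {}
        for label, (prior, means, var) in self.stats.items():
            lp = math.log(prior)
            for v, m, s2 in zip(x, means, var):
                lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            logp[label] = lp
        mx = max(logp.values())
        unnorm = {k: math.exp(v - mx) for k, v in logp.items()}
        z = sum(unnorm.values())
        return {k: v / z for k, v in unnorm.items()}
```

Trained on a handful of labelled jobs, `predict_proba((0.85, 0.25, 0.05))` returns a (C, M, G) triad dominated by C, which the scheduler would then use for fog node selection.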

Findings

Experiment results showed that the proposed framework performed better than traditional cloud and fog computing paradigms.

Originality/value

The proposed framework combines application types with the computation capabilities of the fog computing environment, which, to the best of the authors' knowledge, has not been done before.

Details

International Journal of Pervasive Computing and Communications, vol. 19 no. 3
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 12 June 2017

Andre Luis Cavalcanti Bueno, Noemi de La Rocque Rodriguez and Elisa Dominguez Sotelino

Abstract

Purpose

The purpose of this work is to present a methodology that harnesses the computational power of multiple graphics processing units (GPUs) and hides the complexities of tuning GPU parameters from the users.

Design/methodology/approach

A methodology for auto-tuning OpenCL configuration parameters has been developed.
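The core auto-tuning loop, measuring each candidate configuration and keeping the fastest, can be sketched generically. The helper below is an assumption of ours rather than the paper's API; in practice `run_kernel` would launch an actual OpenCL kernel with the given work-group parameters and wait for completion.

```python
# Generic auto-tuning sketch (our assumption, not the paper's API):
# time every candidate configuration and keep the fastest one.
import time
from itertools import product

def autotune(run_kernel, param_space):
    best_cfg, best_t = None, float("inf")
    for values in product(*param_space.values()):
        cfg = dict(zip(param_space.keys(), values))
        start = time.perf_counter()
        run_kernel(cfg)                 # one timed kernel execution
        elapsed = time.perf_counter() - start
        if elapsed < best_t:
            best_cfg, best_t = cfg, elapsed
    return best_cfg, best_t
```

Exhaustive search is viable because the tuning space (work-group sizes, vector widths and so on) is small and each configuration needs only one or a few timed runs; the chosen configuration can then be cached per device, which is what makes the code portable across GPUs.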

Findings

The described process helps simplify coding and yields a significant reduction in execution time for each method.

Originality/value

Most authors develop their GPU applications for specific hardware configurations. In this work, a solution is offered to make the developed code portable to any GPU hardware.

Details

Engineering Computations, vol. 34 no. 4
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 5 January 2015

Victor U. Karthik, Sivamayam Sivasuthan, Arunasalam Rahunanthan, Ravi S. Thyagarajan, Paramsothy Jayakumar, Lalita Udpa and S. Ratnajeevan H. Hoole

Abstract

Purpose

Inverting electroheat problems involves synthesizing the electromagnetic arrangement of coils and geometries to realize a desired heat distribution. To this end, two finite element problems need to be solved: first for the magnetic fields and the Joule heat that the associated eddy currents generate and then, based on these heat sources, a second problem for the heat distribution. This two-part problem needs to be iterated on to obtain the desired thermal distribution by optimization. Because this is a time-consuming process, the purpose of this paper is to parallelize it using the graphics processing unit (GPU) and a real-coded genetic algorithm, for both speed and accuracy.

Design/methodology/approach

This coupled problem represents a heavy computational load with long wait times for results. The GPU has recently been demonstrated to enhance the efficiency and accuracy of finite element computations and to cut down solution times. It has also been used to speed up the naturally parallel genetic algorithm. The authors use the GPU to perform coupled electroheat finite element optimization by the genetic algorithm to achieve computational efficiencies far better than those reported for a single finite element problem. In the genetic algorithm, coding objective functions in real numbers rather than binary arithmetic gives added speed and accuracy.
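The real-coded part of the approach can be conveyed by a minimal serial genetic algorithm: tournament selection, blend crossover and Gaussian mutation operate directly on real-valued genes rather than bit strings. The stand-in objective and every tuning constant below are our illustrative assumptions; the paper couples the GA to finite element evaluations and runs it in parallel on the GPU.

```python
# Minimal serial real-coded GA (tournament selection, blend crossover,
# Gaussian mutation, elitism). Objective and constants are illustrative;
# the paper evaluates a coupled FE cost function on the GPU.
import random

def real_coded_ga(objective, dim, lo, hi, pop=40, gens=80, seed=1):
    rng = random.Random(seed)
    P = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    best = min(P, key=objective)
    for _ in range(gens):
        nxt = [best[:]]                                  # elitism
        while len(nxt) < pop:
            a = min(rng.sample(P, 3), key=objective)     # tournament
            b = min(rng.sample(P, 3), key=objective)
            w = rng.random()
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]  # blend
            child = [min(hi, max(lo, g + rng.gauss(0, 0.05 * (hi - lo))))
                     if rng.random() < 0.2 else g for g in child]  # mutate
            nxt.append(child)
        P = nxt
        best = min(P, key=objective)
    return best
```

Because the fitness evaluations within a generation are independent, the population maps naturally onto GPU threads, which is where the reported speedup of about 28 comes from.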

Findings

The feasibility of the proposed method to reduce computational time and increase accuracy is established through the simple problem of shaping a current-carrying conductor so as to yield a constant temperature along a line. The authors obtained a speedup (the ratio of CPU time to GPU time) saturating at about 28 for a population size of 500 because of increasing communications between threads. This is nonetheless far better than what is possible on a workstation.

Research limitations/implications

By using the intrinsically parallel genetic algorithm on a GPU, large complex coupled problems may be solved very quickly. The method demonstrated here, which does not account for radiation and convection, may be trivially extended to more completely modeled electroheat systems. Since the primary purpose here is to establish methodology and feasibility, the thermal problem is simplified by neglecting convection and radiation. While this introduces some error, the computational procedure is still validated.

Practical implications

The methodology established has direct applications in electrical machine design, metallurgical mixing processes and hyperthermia treatment in oncology. In these three practical application areas, one needs to compute the exciting coil (or antenna) arrangement (current magnitude and phase) and device geometry that would accomplish a desired heat distribution to achieve mixing, reduce machine heat or burn cancerous tissue. The process presented here does this more accurately and quickly.

Social implications

In particular, the above-mentioned application in oncology will alleviate human suffering through its use in hyperthermia treatment planning for cancer. The method presented provides scope for new commercial software development and employment.

Originality/value

Previous finite element shape optimization of coupled electroheat problems by this group used gradient methods, whose difficulties are explained. Others have used analytical and circuit models in place of finite elements. This paper applies the massive parallelization possible with GPUs to the inherently parallel genetic algorithm and extends it from single-field system problems to coupled problems, thereby realizing practicable solution times for such a computationally complex problem. Further, by using GPU computations rather than the CPU, accuracy is enhanced, and by using real-number rather than binary coding for objective functions, further accuracy and speed gains are realized.

Details

COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, vol. 34 no. 1
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 25 February 2014

Guoli Ji, Yong Zeng, Zijiang Yang, Congting Ye and Jingci Yao

Abstract

Purpose

The time complexity of most multiple sequence alignment algorithms is O(N²) or O(N³), where N is the number of sequences. In addition, with the development of biotechnology, the amount of biological sequence data grows significantly, and traditional methods have difficulty handling large-scale sequences. The proposed LemK_MSA method aims to reduce the time complexity, especially for large-scale sequences, while maintaining an accuracy level similar to that of the traditional methods.

Design/methodology/approach

LemK_MSA converts multiple sequence alignment into a corresponding 10D vector alignment using ten types of copy modes based on Lempel-Ziv. Then, it uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and calculate a guide tree for each group. A complete guide tree for multiple sequence alignment is then constructed by merging the guide trees of all groups. Moreover, for large-scale multiple sequences, LemK_MSA provides a GPU-based parallel method for distance matrix calculation.
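The Lempel-Ziv flavour of the approach can be illustrated with a toy complexity-based distance. The parse below is a simple LZ78-style phrase count, an assumption of ours; it is not the paper's ten copy modes or 10D vector representation, but it shows why Lempel-Ziv parsing yields a cheap, alignment-free similarity signal.

```python
# Toy Lempel-Ziv-based distance (illustrative assumption): an LZ78-style
# phrase count stands in for the paper's copy modes and 10D vectors.
def lz_complexity(s):
    # Count phrases not seen before in a greedy left-to-right parse.
    phrases, cur, count = set(), "", 0
    for ch in s:
        cur += ch
        if cur not in phrases:
            phrases.add(cur)
            count += 1
            cur = ""
    return count + (1 if cur else 0)

def lz_distance(x, y):
    # Concatenation compresses little beyond the parts when x and y
    # share structure, so this ratio is small for similar sequences.
    cx, cy, cxy = lz_complexity(x), lz_complexity(y), lz_complexity(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def distance_matrix(seqs):
    # The all-pairs loop that the paper offloads to the GPU.
    n = len(seqs)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = lz_distance(seqs[i], seqs[j])
    return D
```

Each pairwise distance is independent of the others, which is what makes the distance matrix the natural stage to parallelize on the GPU before grouping and guide tree construction.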

Findings

Under this approach, the time efficiency of multiple sequence alignment can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared to ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times more efficient while ensuring comparable alignment accuracy.

Originality/value

This paper proposes a novel method with sequence vectorization for multiple sequence alignment based on Lempel-Ziv. A GPU-based parallel method has been designed for large-scale distance matrix calculation. It provides a new way for multiple sequence alignment research.

Details

Engineering Computations, vol. 31 no. 2
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 25 February 2020

Shengquan Wang, Chao Wang, Yong Cai and Guangyao Li

Abstract

Purpose

The purpose of this paper is to improve the computational speed of solving nonlinear dynamics by using parallel methods and a mixed-precision algorithm on graphics processing units (GPUs). The computational efficiency of traditional central processing unit (CPU)-based computer-aided engineering software has been unable to satisfy the needs of scientific research and practical engineering, especially for nonlinear dynamic problems. Besides, when calculations are performed on GPUs, double-precision operations are slower than single-precision operations. This paper therefore implements mixed precision for nonlinear dynamic problem simulation using the Belytschko-Tsay (BT) shell element on the GPU.

Design/methodology/approach

To minimize data transfer between heterogeneous architectures, the parallel computation of the fully explicit finite element (FE) calculation is realized using a vectorized thread-level parallelism algorithm. An asynchronous data transmission strategy and a novel dependency-relationship-link-based method for efficiently solving parallel explicit shell element equations are used to improve the GPU utilization ratio. Finally, this paper implements mixed precision for nonlinear dynamic problem simulation using the BT shell element on a GPU and compares it to a CPU-based serially executed program and a GPU-based double-precision parallel computing program.
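The mixed-precision idea, doing the bulk arithmetic in single precision while protecting sensitive accumulations with double precision, can be sketched in plain Python by explicitly rounding through IEEE-754 single precision. This is a generic illustration of the principle, not the paper's shell element kernels, which use native GPU float types.

```python
# Sketch of mixed precision: element products in single precision,
# running sum in double precision. struct round-trips emulate float32
# in plain Python; GPU kernels use native float types instead.
import struct

def f32(x):
    # Round a Python double to the nearest single-precision value.
    return struct.unpack("f", struct.pack("f", x))[0]

def mixed_precision_dot(a, b):
    total = 0.0                          # double-precision accumulator
    for x, y in zip(a, b):
        total += f32(f32(x) * f32(y))    # single-precision multiply
    return total
```

Keeping only the accumulator in double precision captures most of the accuracy benefit while leaving the high-volume arithmetic in the faster single-precision units, which is why the accuracy error stays small in the findings below.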

Findings

For a car body model containing approximately 5.3 million degrees of freedom, the computational speed is 25 times that of sequential CPU computation and approximately 10% higher than that of the double-precision parallel computing method. The accuracy error of the mixed-precision computation is small and satisfies the requirements of practical engineering problems.

Originality/value

This paper realizes a novel FE parallel computing procedure for nonlinear dynamic problems using a mixed-precision algorithm on a CPU-GPU platform. Compared with the CPU serial program, the program implemented in this article achieves a 25 times acceleration ratio when calculating a model of 883,168 elements, which greatly improves the calculation speed for solving nonlinear dynamic problems.

Details

Engineering Computations, vol. 37 no. 6
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 24 August 2018

Hongbin Liu, Xinrong Su and Xin Yuan

Abstract

Purpose

Adopting large eddy simulation (LES) to simulate the complex flow in turbomachinery is an appropriate way to overcome the limitations of current Reynolds-averaged Navier-Stokes modelling, and it provides a deeper understanding of the complicated transitional and turbulent flow mechanisms; however, the large computational cost limits its application to high Reynolds number flows. This study aims to develop a three-dimensional GPU-enabled parallel unstructured solver to speed up high-fidelity LES simulation.

Design/methodology/approach

Compared to central processing units (CPUs), graphics processing units (GPUs) can provide higher computational speed. A set of low-dissipation schemes designed for unstructured meshes is implemented with the compute unified device architecture (CUDA) programming model. Several key parameters affecting the performance of the GPU code are discussed, and further speedup is obtained by analysing the underlying finite volume-based numerical scheme.
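The finite-volume update at the heart of such a solver can be illustrated in one dimension. The blended central/upwind flux and all constants below are our illustrative assumptions, not the authors' low-dissipation schemes; the point is the flux-difference update, whose telescoping conserves the cell-averaged quantity exactly, and whose per-cell independence is what the GPU threads exploit.

```python
# 1D illustration of a finite-volume update with a mostly central
# (low-dissipation) flux; the blending and constants are assumptions,
# not the authors' scheme. Periodic boundaries.
def fv_advect(u, c, dx, dt, steps, eps=0.1):
    n = len(u)
    for _ in range(steps):
        flux = []
        for i in range(n):                    # face between cells i-1 and i
            ul, ur = u[i - 1], u[i]
            central = 0.5 * c * (ul + ur)     # low-dissipation central flux
            upwind = c * ul if c > 0 else c * ur
            flux.append((1 - eps) * central + eps * upwind)
        u = [u[i] - dt / dx * (flux[(i + 1) % n] - flux[i]) for i in range(n)]
    return u
```

Every cell update reads only its two face fluxes, so one GPU thread per cell (or per face) parallelizes the scheme directly; the unstructured 3D case adds indirection through face-to-cell connectivity but keeps the same structure.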

Findings

The results show that an acceleration ratio of approximately 84 (on a single GPU) can be achieved for the double-precision algorithm with this unstructured GPU code. The transitional flow inside a compressor is simulated and the computational efficiency is improved greatly. The transition process is discussed and the role that Kelvin-Helmholtz (K-H) instability plays in the transition mechanism is verified.

Practical implications

The speedup gained from the GPU-enabled solver reaches 84 compared to the original code running on a CPU, and this enables fast-turnaround high-fidelity LES simulation.

Originality/value

The GPU-enabled flow solver is implemented and optimized according to the features of the finite volume scheme. The solving time is reduced remarkably, and detailed structures, including vortices, are captured.

Details

Engineering Computations, vol. 35 no. 5
Type: Research Article
ISSN: 0264-4401

Book part
Publication date: 19 November 2014

Garland Durham and John Geweke

Abstract

Massively parallel desktop computing capabilities now well within the reach of individual academics modify the environment for posterior simulation in fundamental and potentially quite advantageous ways. But to fully exploit these benefits algorithms that conform to parallel computing environments are needed. This paper presents a sequential posterior simulator designed to operate efficiently in this context. The simulator makes fewer analytical and programming demands on investigators, and is faster, more reliable, and more complete than conventional posterior simulators. The paper extends existing sequential Monte Carlo methods and theory to provide a thorough and practical foundation for sequential posterior simulation that is well suited to massively parallel computing environments. It provides detailed recommendations on implementation, yielding an algorithm that requires only code for simulation from the prior and evaluation of prior and data densities and works well in a variety of applications representative of serious empirical work in economics and finance. The algorithm facilitates Bayesian model comparison by producing marginal likelihood approximations of unprecedented accuracy as an incidental by-product, is robust to pathological posterior distributions, and provides estimates of numerical standard error and relative numerical efficiency intrinsically. The paper concludes with an application that illustrates the potential of these simulators for applied Bayesian inference.

Article
Publication date: 22 December 2023

Vaclav Snasel, Tran Khanh Dang, Josef Kueng and Lingping Kong

Abstract

Purpose

This paper aims to review in-memory computing (IMC) for machine learning (ML) applications from the aspects of history, architectures and options. In this review, the authors investigate different architectural aspects and collect and provide their comparative evaluations.

Design/methodology/approach

The authors collect over 40 recent IMC papers related to hardware design and optimization techniques and classify them into three optimization option categories: optimization through the graphics processing unit (GPU), optimization through reduced precision and optimization through hardware accelerators. They then summarize these techniques in terms of the data sets applied, how each design works and what each design contributes.

Findings

ML algorithms are potent tools accommodated on IMC architectures. Although general-purpose hardware (central processing units and GPUs) can supply explicit solutions, its energy efficiency is limited because of the excessive flexibility it must support. On the other hand, hardware accelerators (field-programmable gate arrays and application-specific integrated circuits) win on the energy efficiency aspect, but an individual accelerator often adapts exclusively to a single ML approach (family). From a long-term hardware evolution perspective, heterogeneous hardware/software co-design on hybrid platforms is an option for researchers.

Originality/value

IMC optimization enables high-speed processing, increases performance and allows massive volumes of data to be analyzed in real time. This work reviews IMC and its evolution, then categorizes three optimization paths for the IMC architecture to improve performance metrics.

Details

International Journal of Web Information Systems, vol. 20 no. 1
Type: Research Article
ISSN: 1744-0084
