Search results

1 – 10 of 107
Article
Publication date: 6 August 2021

Alexander Döschl, Max-Emanuel Keller and Peter Mandl

Abstract

Purpose

This paper aims to evaluate different approaches for the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and resilient distributed dataset (RDD) (Apache Spark) paradigms and a graphics processing unit (GPU) approach with Numba for compute unified device architecture (CUDA).

Design/methodology/approach

The paper uses a simple but computationally intensive puzzle as a case study for experiments. To find all solutions using brute-force search, 15! permutations had to be computed and tested against the solution rules. The experimental application comprises a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and RDD (Apache Spark) paradigms and a GPU approach with Numba for CUDA. The implementations were benchmarked on Amazon EC2 instances for performance and scalability measurements.
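
What makes this brute-force search parallelizable is that each of the 15! candidate orderings can be identified by a linear index, decoded into a permutation via the factorial number system and tested independently of all others, so the index range can be split across threads, map tasks or GPU threads. The following is a minimal CUDA sketch of that partitioning with a reduced problem size and a placeholder rule check; the paper's GPU implementation uses Numba for CUDA, and this is not the authors' code.

// Minimal CUDA sketch of index-partitioned brute-force permutation search.
// Each thread decodes one linear index into a permutation (factorial number
// system) and tests it against a placeholder rule. N and check() are
// illustrative stand-ins, not the paper's actual puzzle.
#include <cstdio>
#include <cstdint>

#define N 4                       // demo size; the paper's puzzle uses 15

__device__ bool check(const int *p) {
    // Placeholder rule: accept permutations whose first element is 0.
    return p[0] == 0;
}

__global__ void searchKernel(uint64_t start, uint64_t count, unsigned long long *hits) {
    uint64_t idx = start + blockIdx.x * (uint64_t)blockDim.x + threadIdx.x;
    if (idx >= start + count) return;

    // Decode idx into a permutation using the factorial number system.
    int pool[N], perm[N];
    for (int i = 0; i < N; ++i) pool[i] = i;
    uint64_t rem = idx;
    for (int i = N; i > 0; --i) {
        uint64_t fact = 1;
        for (int j = 2; j < i; ++j) fact *= j;                 // (i-1)!
        int pos = (int)(rem / fact);
        rem %= fact;
        perm[N - i] = pool[pos];
        for (int j = pos; j < i - 1; ++j) pool[j] = pool[j + 1];  // remove chosen item
    }
    if (check(perm)) atomicAdd(hits, 1ULL);
}

int main() {
    uint64_t total = 1;
    for (int i = 2; i <= N; ++i) total *= i;                   // N! candidates

    unsigned long long *d_hits, h_hits = 0;
    cudaMalloc(&d_hits, sizeof(unsigned long long));
    cudaMemcpy(d_hits, &h_hits, sizeof(h_hits), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (int)((total + threads - 1) / threads);
    searchKernel<<<blocks, threads>>>(0, total, d_hits);

    cudaMemcpy(&h_hits, d_hits, sizeof(h_hits), cudaMemcpyDeviceToHost);
    printf("solutions found: %llu of %llu\n", h_hits, (unsigned long long)total);
    cudaFree(d_hits);
    return 0;
}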

Findings

The comparison of the solutions with Apache Hadoop and Apache Spark under Amazon EMR showed that the processing time measured in CPU minutes with Spark was up to 30% lower, while the performance of Spark especially benefits from an increasing number of tasks. With the CUDA implementation, more than 16 times faster execution is achievable for the same price compared to the Spark solution. Apart from the multi-threaded implementation, the processing times of all solutions scale approximately linearly. Finally, several application suggestions for the different parallelization approaches are derived from the insights of this study.

Originality/value

There are numerous studies that have examined the performance of parallelization approaches. Most of these studies deal with processing large amounts of data or mathematical problems. This work, in contrast, compares these technologies on their ability to implement computationally intensive distributed algorithms.

Details

International Journal of Web Information Systems, vol. 17 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 20 April 2015

Yixiong Wei, Qifu Wang, Yunbao Huang, Yingjun Wang and Zhaohui Xia

Abstract

Purpose

The purpose of this paper is to present a novel strategy for accelerating free-vibration analysis, in which a hierarchical matrix structure and the Compute Unified Device Architecture (CUDA) platform are applied to improve the performance of the traditional dual reciprocity boundary element method (DRBEM).

Design/methodology/approach

The DRBEM is applied to form the integral equation and reduce complexity. In the optimization stage, ℋ-matrices are introduced by applying the adaptive cross-approximation (ACA) method. At the same time, the paper proposes a high-efficiency parallel algorithm using CUDA, as the counterpart of the effective serial algorithm, for the inverse arithmetic operation on ℋ-matrices.
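
For background on why ℋ-matrices reduce cost: an admissible block of the otherwise dense system matrix is stored as a low-rank factorisation A ≈ U·Vᵀ with rank k much smaller than the block dimensions, so a block matrix-vector product costs O(k(m+n)) instead of O(mn) and each output entry can be computed by an independent GPU thread. The sketch below shows only that generic low-rank product in CUDA with illustrative sizes and data; it is not the authors' ACA construction or their ℋ-matrix inversion algorithm.

// Generic CUDA sketch: matrix-vector product with a low-rank block U*V^T*x,
// the basic operation that makes H-matrix arithmetic cheap. Sizes, rank and
// data are illustrative; this is not the paper's ACA/inversion code.
#include <cstdio>
#include <vector>

// Step 1: t = V^T x (k entries), one thread per entry of t.
__global__ void vtx(const float *V, const float *x, float *t, int n, int k) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= k) return;
    float s = 0.0f;
    for (int i = 0; i < n; ++i) s += V[i * k + j] * x[i];  // V is n x k, row-major
    t[j] = s;
}

// Step 2: y = U t (m entries), one thread per entry of y.
__global__ void ut(const float *U, const float *t, float *y, int m, int k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= m) return;
    float s = 0.0f;
    for (int j = 0; j < k; ++j) s += U[i * k + j] * t[j];  // U is m x k, row-major
    y[i] = s;
}

int main() {
    const int m = 1024, n = 1024, k = 8;           // rank k << m, n
    std::vector<float> hU(m * k, 0.01f), hV(n * k, 0.01f), hx(n, 1.0f), hy(m);

    float *dU, *dV, *dx, *dt, *dy;
    cudaMalloc(&dU, m * k * sizeof(float));
    cudaMalloc(&dV, n * k * sizeof(float));
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dt, k * sizeof(float));
    cudaMalloc(&dy, m * sizeof(float));
    cudaMemcpy(dU, hU.data(), m * k * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dV, hV.data(), n * k * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    vtx<<<(k + 127) / 128, 128>>>(dV, dx, dt, n, k);
    ut<<<(m + 127) / 128, 128>>>(dU, dt, dy, m, k);

    cudaMemcpy(hy.data(), dy, m * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %f (analytic value %f)\n", hy[0], 0.01f * 0.01f * n * k);
    cudaFree(dU); cudaFree(dV); cudaFree(dx); cudaFree(dt); cudaFree(dy);
    return 0;
}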

Findings

Free-vibration analysis achieves impressive time and space efficiency by introducing the hierarchical matrix technique. Although the serial algorithm based on ℋ-matrices obtains fair performance for the complex inversion operation, the CUDA parallel algorithm further doubles the efficiency. According to the numerical example, there is little loss in accuracy, and the relative error introduced in the approximation process can be controlled by increasing the number of degrees of freedom or by introducing a certain number of internal points.

Originality/value

The paper proposes a novel effective strategy to improve computational efficiency and decrease memory consumption of free-vibration problems. ℋ-Matrices structure and parallel operation based on CUDA are introduced in traditional DRBEM.

Details

Engineering Computations, vol. 32 no. 2
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 15 November 2011

Imre Kiss, József Pávó and Szabolcs Gyimóthy

Abstract

Purpose

The purpose of this paper is to accelerate the time-consuming task of assembling the impedance matrix that results from the discretization of integral equations by the moment method, using a massively parallel processing scheme.

Design/methodology/approach

This paper provides several approaches to implementing the moment method on compute unified device architecture (CUDA)-capable general-purpose graphics cards, along with general implementation design patterns and an overview of the topic.
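
The reason impedance-matrix assembly maps well to CUDA is that each entry Z(m,n) depends only on the interaction between testing function m and basis function n, so all N² entries can be computed by independent threads. The sketch below illustrates only that one-thread-per-entry pattern with a placeholder interaction integral; it is not one of the implementation variants described in the paper.

// Minimal CUDA sketch of moment-method impedance matrix assembly:
// one thread per matrix entry Z(m,n). The "interaction" function is a
// placeholder for the actual integral between basis and testing functions.
#include <cstdio>
#include <vector>
#include <cmath>

__device__ float interaction(float xm, float xn) {
    // Placeholder kernel: a smoothed 1/r interaction between two 1-D segments.
    float r = fabsf(xm - xn) + 0.1f;
    return 1.0f / r;
}

__global__ void assembleZ(float *Z, const float *centers, int N) {
    int m = blockIdx.y * blockDim.y + threadIdx.y;   // testing function index
    int n = blockIdx.x * blockDim.x + threadIdx.x;   // basis function index
    if (m < N && n < N)
        Z[m * N + n] = interaction(centers[m], centers[n]);
}

int main() {
    const int N = 512;                               // number of unknowns
    std::vector<float> h_centers(N), h_Z(N * N);
    for (int i = 0; i < N; ++i) h_centers[i] = i * 0.01f;  // segment midpoints

    float *d_centers, *d_Z;
    cudaMalloc(&d_centers, N * sizeof(float));
    cudaMalloc(&d_Z, N * N * sizeof(float));
    cudaMemcpy(d_centers, h_centers.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 blocks((N + 15) / 16, (N + 15) / 16);
    assembleZ<<<blocks, threads>>>(d_Z, d_centers, N);

    cudaMemcpy(h_Z.data(), d_Z, N * N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("Z[0][0] = %f, Z[0][N-1] = %f\n", h_Z[0], h_Z[N - 1]);
    cudaFree(d_centers); cudaFree(d_Z);
    return 0;
}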

Findings

The proposed method seems to be efficient in the light of the presented numerical results.

Originality/value

The subject of the paper is an evolving, considerably new aspect among computation techniques which could be of high interest for the scientific community.

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 30 no. 6
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 5 April 2021

Tianyu Zhao, Guobing Li, Honggang Pan and Huiqun Yuan

Abstract

Purpose

An innovative, accurate and fast dynamic analysis approach for vehicle parts is provided for engineering practice.

Design/methodology/approach

This paper presents an innovative dynamic analysis approach for vehicle parts based on parallel optimization algorithm with CUDA.

Findings

This project is supported by the National Science Foundation of China (No. 51805076, No. U1708255 and No. 51775093), the Fundamental Research Funds for the Central Universities (No. N170503011) and the Natural Science Foundation of Liaoning Province, China (No. 20180551058).

Originality/value

This paper presents an innovative approach for vehicle parts using parallel optimization algorithm based on CUDA, which can improve the computing accuracy and speed effectively.

Details

Engineering Computations, vol. 38 no. 9
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 17 October 2018

Sura Nawfal and Fakhrulddin Ali

Abstract

Purpose

The purpose of this paper is to accelerate 3D object transformation using parallel techniques such as a multi-core central processing unit (MC CPU), a graphics processing unit (GPU) or both. Generating 3D animation scenes in computer graphics requires applying 3D transformations to the vertices of the objects, and these transformations consume most of the execution time. Hence, for high-speed graphics systems, accelerating the vertex transform is highly sought after: many matrix operations need to be performed in real time, so execution time is essential for such processing.

Design/methodology/approach

In this paper, the acceleration of 3D object transformation is achieved using parallel techniques such as the MC CPU, the GPU or both. Multiple geometric transformations are concatenated together at a time, in any order, in an interactive manner.
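
The operation at the heart of this approach, applying a concatenated affine transform to every vertex, is a one-thread-per-vertex job: the individual rotation, scaling and translation matrices are multiplied once on the host, and the resulting 4x4 matrix is applied to all vertices in parallel. The CUDA sketch below illustrates that pattern only; the paper's implementations use LabVIEW and Visual Studio, and the matrix and vertex data here are illustrative.

// CUDA sketch: apply one concatenated 4x4 affine transform to an array of
// homogeneous vertices, one thread per vertex. Matrix and vertex data are
// illustrative; this is not the paper's LabVIEW/Visual Studio code.
#include <cstdio>
#include <vector>

__constant__ float M[16];                            // concatenated transform, row-major

__global__ void transformVertices(const float4 *in, float4 *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float4 v = in[i];
    out[i] = make_float4(
        M[0]  * v.x + M[1]  * v.y + M[2]  * v.z + M[3]  * v.w,
        M[4]  * v.x + M[5]  * v.y + M[6]  * v.z + M[7]  * v.w,
        M[8]  * v.x + M[9]  * v.y + M[10] * v.z + M[11] * v.w,
        M[12] * v.x + M[13] * v.y + M[14] * v.z + M[15] * v.w);
}

int main() {
    const int n = 1 << 20;                           // one million vertices
    std::vector<float4> h_in(n, make_float4(1.f, 2.f, 3.f, 1.f)), h_out(n);

    // Example concatenated matrix: uniform scale by 2 plus translation (1, 0, 0).
    float h_M[16] = {2,0,0,1,  0,2,0,0,  0,0,2,0,  0,0,0,1};
    cudaMemcpyToSymbol(M, h_M, sizeof(h_M));

    float4 *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float4));
    cudaMalloc(&d_out, n * sizeof(float4));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float4), cudaMemcpyHostToDevice);

    transformVertices<<<(n + 255) / 256, 256>>>(d_in, d_out, n);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(float4), cudaMemcpyDeviceToHost);
    printf("first vertex: (%g, %g, %g)\n", h_out[0].x, h_out[0].y, h_out[0].z);  // (3, 4, 6)
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}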

Findings

Performance results are presented for a number of 3D objects with parallel implementations of the affine transform on the NVIDIA GPU series. The maximum execution time to transform 100 million vertices was about 0.508 s using LabVIEW and 0.096 s using Visual Studio. Other results also showed a significant speed-up compared with CPU, MC CPU and previously reported computations for the same object complexity.

Originality/value

The high-speed execution of 3D models is essential in many applications such as medical imaging, 3D games and robotics.

Details

Journal of Engineering, Design and Technology, vol. 16 no. 6
Type: Research Article
ISSN: 1726-0531

Article
Publication date: 3 July 2017

Van Quang Dinh and Yves Marechal

Abstract

Purpose

In finite element method (FEM) computations, a high-quality mesh improves the accuracy of the approximate solution and reduces the computation time. The dynamic bubble system meshing technique can provide high-quality meshes, but the packing process is time-consuming. This paper aims to improve the running time of bubble meshing by using the advantages of parallel computing on a graphics processing unit (GPU).

Design/methodology/approach

This paper is based on an analysis of the processing time on the CPU. A massively parallel CUDA-based computing scheme is proposed to improve the bubble displacement and database updating steps. Constraints linked to hardware considerations are taken into account. Finally, speedup factors are provided for test cases and real-scale examples.
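
As a rough illustration of why bubble packing parallelises, each relaxation step moves every bubble according to forces exerted by its neighbours, and those per-bubble updates are independent within a step. The CUDA sketch below uses a simple linear-spring force law and a brute-force neighbour loop purely for illustration; the paper's method relies on a size map and a proper neighbour database, neither of which is reproduced here.

// Heavily simplified CUDA sketch of one bubble-relaxation step: one thread per
// bubble sums inter-bubble forces and displaces its bubble. The linear-spring
// force law and the O(N^2) neighbour search are illustrative simplifications,
// not the paper's size-map-driven scheme.
#include <cstdio>
#include <vector>
#include <utility>

__global__ void relaxBubbles(const float2 *pos, float2 *newPos,
                             const float *radius, int n, float step) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float2 f = make_float2(0.f, 0.f);
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y;
        float d = sqrtf(dx * dx + dy * dy) + 1e-6f;
        float rest = radius[i] + radius[j];          // desired spacing
        if (d < rest) {                              // repel overlapping bubbles
            float k = (rest - d) / d;
            f.x += k * dx;
            f.y += k * dy;
        }
    }
    newPos[i] = make_float2(pos[i].x + step * f.x, pos[i].y + step * f.y);
}

int main() {
    const int n = 1024;
    std::vector<float2> h_pos(n);
    std::vector<float> h_r(n, 0.02f);
    for (int i = 0; i < n; ++i)                      // regular initial layout
        h_pos[i] = make_float2((i % 32) * 0.03f, (i / 32) * 0.03f);

    float2 *d_pos, *d_new; float *d_r;
    cudaMalloc(&d_pos, n * sizeof(float2));
    cudaMalloc(&d_new, n * sizeof(float2));
    cudaMalloc(&d_r, n * sizeof(float));
    cudaMemcpy(d_pos, h_pos.data(), n * sizeof(float2), cudaMemcpyHostToDevice);
    cudaMemcpy(d_r, h_r.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    for (int it = 0; it < 100; ++it) {               // fixed number of relaxation steps
        relaxBubbles<<<(n + 255) / 256, 256>>>(d_pos, d_new, d_r, n, 0.2f);
        std::swap(d_pos, d_new);                     // ping-pong buffers
    }
    cudaMemcpy(h_pos.data(), d_pos, n * sizeof(float2), cudaMemcpyDeviceToHost);
    printf("bubble 0 after relaxation: (%g, %g)\n", h_pos[0].x, h_pos[0].y);
    cudaFree(d_pos); cudaFree(d_new); cudaFree(d_r);
    return 0;
}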

Findings

The numerical experiments show that the parallel implementation reaches a speedup of 35 compared with the serial implementation.

Research limitations/implications

This contribution is so far limited to two-dimensional (2D) geometries, although the extension to three dimensions (3D) is straightforward with regard to both the meshing technique itself and the GPU implementation. The authors' work is based on the CUDA environment, which is widely used by developers. C/C++ and Java were the programming languages used. Other languages may of course lead to slightly different implementations.

Practical implications

This approach makes it possible to use bubble meshing technique for both initial design and optimization, as excellent meshes can be built in few seconds.

Originality/value

Compared with previous works, this contribution shows that making the bubble meshing technique scalable requires solving two key issues: reaching a T(N) global cost for the implementation and providing a very fast size-map interpolation strategy.

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 36 no. 4
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 6 September 2023

Antonio Llanes, Baldomero Imbernón Tudela, Manuel Curado and Jesús Soto

Abstract

Purpose

The authors review the main concepts of graphs, present the implemented algorithm and explain the different techniques applied to the graph to achieve an efficient execution of the algorithm, both through the use of the multiple cores available today and through massive data parallelism, parallelizing the algorithm for execution with CUDA on GPUs.

Design/methodology/approach

In this work, the authors address the graph isomorphism problem from a point of view that has received very little attention so far: the application of parallelism and high-performance computing (HPC) techniques to the detection of isomorphism between graphs.
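
One way such per-vertex work parallelises on a GPU can be illustrated with a cheap necessary condition for isomorphism: computing each graph's degree sequence with one thread per vertex and comparing the sorted sequences before any exact test. The sketch below shows only that invariant filter on two tiny example graphs; it is not the algorithm implemented in the paper.

// CUDA sketch of a cheap necessary condition for graph isomorphism: compute
// every vertex degree in parallel, then compare sorted degree sequences on the
// host. This is only an invariant filter illustrating per-vertex parallelism;
// it is not the paper's isomorphism algorithm.
#include <cstdio>
#include <vector>
#include <algorithm>

__global__ void degrees(const int *adj, int *deg, int n) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per vertex
    if (v >= n) return;
    int d = 0;
    for (int u = 0; u < n; ++u) d += adj[v * n + u]; // row sum of adjacency matrix
    deg[v] = d;
}

static std::vector<int> degreeSequence(const std::vector<int> &adj, int n) {
    int *d_adj, *d_deg;
    cudaMalloc(&d_adj, n * n * sizeof(int));
    cudaMalloc(&d_deg, n * sizeof(int));
    cudaMemcpy(d_adj, adj.data(), n * n * sizeof(int), cudaMemcpyHostToDevice);
    degrees<<<(n + 255) / 256, 256>>>(d_adj, d_deg, n);
    std::vector<int> deg(n);
    cudaMemcpy(deg.data(), d_deg, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_adj); cudaFree(d_deg);
    std::sort(deg.begin(), deg.end());
    return deg;
}

int main() {
    const int n = 4;                                 // tiny illustrative graphs
    // Two relabelled 4-vertex paths: isomorphic, so degree sequences match.
    std::vector<int> g1 = {0,1,0,0, 1,0,1,0, 0,1,0,1, 0,0,1,0};
    std::vector<int> g2 = {0,0,0,1, 0,0,1,0, 0,1,0,1, 1,0,1,0};
    bool maybeIso = degreeSequence(g1, n) == degreeSequence(g2, n);
    printf("degree sequences match: %s\n", maybeIso ? "yes (isomorphism possible)"
                                                    : "no (definitely not isomorphic)");
    return 0;
}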

Findings

The results obtained give compelling reasons for more in-depth studies of HPC techniques in these fields, since speedups of up to 722x are achieved in the most favourable scenarios, with an average speedup of 454x.

Originality/value

The paper is new and original.

Details

Engineering Computations, vol. 40 no. 7/8
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 7 September 2015

Georgios Pyrialakos, Athanasios Papadimopoulos, Theodoros Zygiridis, Nikolaos Kantartzis and Theodoros Tsiboukis

Abstract

Purpose

Stochastic uncertainties in material parameters have a significant impact on the analysis of real-world electromagnetic compatibility (EMC) problems. Conventional approaches via the Monte Carlo scheme attempt to provide viable solutions, yet at the expense of prohibitively long simulations and system overhead, owing to the large number of statistical realisations. The purpose of this paper is to introduce a 3-D stochastic finite-difference time-domain (S-FDTD) technique for the accurate modelling of generalised EMC applications with highly random media properties, while concurrently offering fast and economical single-run realisations.

Design/methodology/approach

The proposed method establishes the concept of covariant/contravariant metrics for robust tessellations of arbitrarily curved structures and derives the mean value and standard deviation of the generated fields in a single run. Also, the critical case of geometrical and physical uncertainties is handled via an optimal parameterisation, which locally reforms the curvilinear grid. To pursue extra speed efficiency, the code is implemented on contemporary graphics processing units with parallel programming.
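
For orientation, the update structure that makes any FDTD scheme GPU-friendly is that every cell's field update reads only a few neighbouring values and writes one value, so one thread per cell suffices. The sketch below is a plain deterministic 2-D Cartesian Yee update used only to illustrate that mapping; the paper's curvilinear S-FDTD additionally propagates mean values and standard deviations and uses covariant/contravariant metrics, none of which appears here.

// CUDA sketch of a standard 2-D TMz Yee FDTD update (deterministic, Cartesian).
// It only illustrates why FDTD parallelises well on a GPU; the paper's
// curvilinear stochastic S-FDTD is considerably more involved.
#include <cstdio>
#include <vector>
#include <cmath>

#define NX 256
#define NY 256
#define IDX(i, j) ((i) * NY + (j))

__global__ void updateH(float *Hx, float *Hy, const float *Ez, float ch) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= NX - 1 || j >= NY - 1) return;
    Hx[IDX(i, j)] -= ch * (Ez[IDX(i, j + 1)] - Ez[IDX(i, j)]);
    Hy[IDX(i, j)] += ch * (Ez[IDX(i + 1, j)] - Ez[IDX(i, j)]);
}

__global__ void updateE(float *Ez, const float *Hx, const float *Hy, float ce) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i <= 0 || j <= 0 || i >= NX - 1 || j >= NY - 1) return;
    Ez[IDX(i, j)] += ce * ((Hy[IDX(i, j)] - Hy[IDX(i - 1, j)])
                         - (Hx[IDX(i, j)] - Hx[IDX(i, j - 1)]));
}

int main() {
    size_t bytes = NX * NY * sizeof(float);
    float *Ez, *Hx, *Hy;
    cudaMalloc(&Ez, bytes); cudaMalloc(&Hx, bytes); cudaMalloc(&Hy, bytes);
    cudaMemset(Ez, 0, bytes); cudaMemset(Hx, 0, bytes); cudaMemset(Hy, 0, bytes);

    dim3 threads(16, 16), blocks((NX + 15) / 16, (NY + 15) / 16);
    const float ce = 0.5f, ch = 0.5f;                // normalised coefficients, stable choice

    std::vector<float> h_Ez(NX * NY);
    for (int step = 0; step < 200; ++step) {
        // Hard point source: overwrite Ez at one central cell each step.
        float src = sinf(0.1f * step);
        cudaMemcpy(&Ez[IDX(NX / 2, NY / 2)], &src, sizeof(float), cudaMemcpyHostToDevice);
        updateH<<<blocks, threads>>>(Hx, Hy, Ez, ch);
        updateE<<<blocks, threads>>>(Ez, Hx, Hy, ce);
    }
    cudaMemcpy(h_Ez.data(), Ez, bytes, cudaMemcpyDeviceToHost);
    printf("Ez near source after 200 steps: %g\n", h_Ez[IDX(NX / 2 + 5, NY / 2)]);
    cudaFree(Ez); cudaFree(Hx); cudaFree(Hy);
    return 0;
}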

Findings

The curvilinear S-FDTD algorithm proves very precise and stable, compared with existing multiple-realisation approaches, in the analysis of statistically varying problems. Moreover, its generalised formulation allows the effective treatment of realistic structures with arbitrarily curved geometries, unlike staircase schemes. Finally, the GPU-based enhancements deliver notably accelerated simulations, with speedups that may exceed 120 times. Conclusively, the featured technique can successfully attain highly accurate results with very limited system requirements.

Originality/value

Development of a generalised curvilinear S-FDTD methodology, based on a covariant/contravariant algorithm. Incorporation of the important geometric/physical uncertainties through a locally adaptive curved mesh. Speed advancement via modern GPU and CUDA programming which leads to reliable estimations, even for abrupt statistical media parameter fluctuations.

Details

COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, vol. 34 no. 5
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 25 February 2020

Shengquan Wang, Chao Wang, Yong Cai and Guangyao Li

Abstract

Purpose

The purpose of this paper is to improve the computational speed of solving nonlinear dynamics by using parallel methods and a mixed-precision algorithm on graphics processing units (GPUs). The computational efficiency of traditional central processing unit (CPU)-based computer-aided engineering software has had difficulty satisfying the needs of scientific research and practical engineering, especially for nonlinear dynamic problems. Moreover, when calculations are performed on GPUs, double-precision operations are slower than single-precision operations. This paper therefore implements mixed precision for nonlinear dynamic problem simulation using the Belytschko-Tsay (BT) shell element on a GPU.

Design/methodology/approach

To minimize data transfer between heterogeneous architectures, the parallel computation of the fully explicit finite element (FE) calculation is realized using a vectorized thread-level parallelism algorithm. An asynchronous data transmission strategy and a novel dependency-relationship-link-based method for efficiently solving the parallel explicit shell element equations are used to improve the GPU utilization ratio. Finally, the paper implements mixed precision for nonlinear dynamic problem simulation using the BT shell element on a GPU and compares it with a CPU-based serially executed program and a GPU-based double-precision parallel computing program.
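
One ingredient named here, the asynchronous data transmission strategy, corresponds to a standard CUDA pattern: pinned host memory plus cudaMemcpyAsync on several streams, so that the transfer of one data chunk overlaps the kernel working on another. The sketch below shows only that generic overlap pattern with a trivial single-precision placeholder kernel; it is not the paper's shell-element solver or its mixed-precision scheme.

// Generic CUDA sketch of overlapping data transfer and computation with
// streams and pinned host memory. The kernel is a trivial placeholder; this is
// not the paper's explicit shell-element solver or its mixed-precision scheme.
#include <cstdio>

__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;                            // placeholder "element" work
}

int main() {
    const int nChunks = 4, chunk = 1 << 20;          // 4 chunks of 1M floats
    const int n = nChunks * chunk;

    float *h_x, *d_x;
    cudaMallocHost(&h_x, n * sizeof(float));         // pinned memory enables async copies
    cudaMalloc(&d_x, n * sizeof(float));
    for (int i = 0; i < n; ++i) h_x[i] = 1.0f;

    cudaStream_t streams[nChunks];
    for (int s = 0; s < nChunks; ++s) cudaStreamCreate(&streams[s]);

    for (int s = 0; s < nChunks; ++s) {
        size_t off = (size_t)s * chunk;
        // Copy in, compute, copy out on the same stream; different streams overlap.
        cudaMemcpyAsync(d_x + off, h_x + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d_x + off, chunk, 2.0f);
        cudaMemcpyAsync(h_x + off, d_x + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    for (int s = 0; s < nChunks; ++s) cudaStreamSynchronize(streams[s]);

    printf("h_x[0] = %g, h_x[n-1] = %g\n", h_x[0], h_x[n - 1]);   // both 2
    for (int s = 0; s < nChunks; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h_x); cudaFree(d_x);
    return 0;
}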

Findings

For a car body model containing approximately 5.3 million degrees of freedom, the computational speed is improved 25 times over sequential CPU computation and by approximately 10% over the double-precision parallel computing method. The accuracy error of the mixed-precision computation is small and satisfies the requirements of practical engineering problems.

Originality/value

This paper realizes a novel parallel FE computing procedure for nonlinear dynamic problems using a mixed-precision algorithm on a CPU-GPU platform. Compared with the CPU serial program, the program implemented in this article obtains a 25-times acceleration ratio when calculating a model of 883,168 elements, which greatly improves the calculation speed for solving nonlinear dynamic problems.

Details

Engineering Computations, vol. 37 no. 6
Type: Research Article
ISSN: 0264-4401

Book part
Publication date: 19 November 2014

Garland Durham and John Geweke

Abstract

Massively parallel desktop computing capabilities now well within the reach of individual academics modify the environment for posterior simulation in fundamental and potentially quite advantageous ways. But to fully exploit these benefits algorithms that conform to parallel computing environments are needed. This paper presents a sequential posterior simulator designed to operate efficiently in this context. The simulator makes fewer analytical and programming demands on investigators, and is faster, more reliable, and more complete than conventional posterior simulators. The paper extends existing sequential Monte Carlo methods and theory to provide a thorough and practical foundation for sequential posterior simulation that is well suited to massively parallel computing environments. It provides detailed recommendations on implementation, yielding an algorithm that requires only code for simulation from the prior and evaluation of prior and data densities and works well in a variety of applications representative of serious empirical work in economics and finance. The algorithm facilitates Bayesian model comparison by producing marginal likelihood approximations of unprecedented accuracy as an incidental by-product, is robust to pathological posterior distributions, and provides estimates of numerical standard error and relative numerical efficiency intrinsically. The paper concludes with an application that illustrates the potential of these simulators for applied Bayesian inference.
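
The reason such sequential posterior simulators suit massively parallel hardware is that the dominant cost, evaluating a prior or data density for every particle, is independent across particles and so maps to one GPU thread per particle. The CUDA sketch below shows only that reweighting step for a toy Gaussian location model with host-side normalisation; the resampling, mutation and marginal-likelihood machinery described in the abstract are omitted, and nothing here is the authors' simulator.

// CUDA sketch of the embarrassingly parallel core of sequential posterior
// simulation: every thread evaluates a log-likelihood increment for one
// particle. The toy model (i.i.d. N(theta, 1) data, grid of particles) is
// illustrative; resampling and mutation steps are omitted.
#include <cstdio>
#include <cmath>
#include <vector>

__global__ void logWeightIncrement(const float *theta, float *logw,
                                   const float *y, int nObs, int nPart) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPart) return;
    float lw = 0.0f;
    for (int t = 0; t < nObs; ++t) {                 // i.i.d. N(theta, 1) observations
        float r = y[t] - theta[p];
        lw += -0.5f * r * r - 0.9189385f;            // log N(y | theta, 1)
    }
    logw[p] = lw;
}

int main() {
    const int nPart = 1 << 16, nObs = 64;
    std::vector<float> h_theta(nPart), h_y(nObs, 1.0f), h_logw(nPart);
    for (int p = 0; p < nPart; ++p)                  // particles from a crude prior grid
        h_theta[p] = -4.0f + 8.0f * p / (nPart - 1);

    float *d_theta, *d_y, *d_logw;
    cudaMalloc(&d_theta, nPart * sizeof(float));
    cudaMalloc(&d_y, nObs * sizeof(float));
    cudaMalloc(&d_logw, nPart * sizeof(float));
    cudaMemcpy(d_theta, h_theta.data(), nPart * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y.data(), nObs * sizeof(float), cudaMemcpyHostToDevice);

    logWeightIncrement<<<(nPart + 255) / 256, 256>>>(d_theta, d_logw, d_y, nObs, nPart);
    cudaMemcpy(h_logw.data(), d_logw, nPart * sizeof(float), cudaMemcpyDeviceToHost);

    // Normalise weights on the host with log-sum-exp for numerical stability.
    float maxlw = h_logw[0];
    for (float lw : h_logw) maxlw = fmaxf(maxlw, lw);
    double sum = 0.0, mean = 0.0;
    for (int p = 0; p < nPart; ++p) sum += exp(h_logw[p] - maxlw);
    for (int p = 0; p < nPart; ++p) mean += h_theta[p] * exp(h_logw[p] - maxlw) / sum;
    printf("posterior mean of theta ~= %f (data mean = 1.0)\n", mean);

    cudaFree(d_theta); cudaFree(d_y); cudaFree(d_logw);
    return 0;
}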
