Search results

1 – 10 of 408

Article

Publication date: 7 February 2019

Parallelization and analysis of selected numerical algorithms using OpenMP and Pluto on symmetric multiprocessing machine

Tanvir Habib Sardar and Ahmed Rimaz Faizabadi

Abstract

Purpose

In recent years, there has been a gradual shift from sequential to parallel computing. Nearly all computers now have multicore processors, and exploiting the available cores requires parallel computing, which increases speed by processing huge amounts of data in real time. The purpose of this paper is to parallelize a set of well-known programs using different techniques in order to determine the best way to parallelize each program tested.

Design/methodology/approach

A set of numerical algorithms is parallelized in two ways: by hand using OpenMP, and automatically using the Pluto tool.

Findings

The work finds that few of the algorithms are well suited to automatic parallelization with the Pluto tool, whereas many execute more efficiently with hand parallelization using OpenMP.

Originality/value

The work provides an original study of parallelization using the OpenMP programming paradigm and the Pluto tool.

Details

Data Technologies and Applications, vol. 53 no. 1

Type: Research Article

ISSN: 2514-9288

Article

Publication date: 4 May 2010

Parallelization of an ant‐based clustering approach

Ozlem Gemici Gunes and A. Sima Uyar

Abstract

Purpose

The purpose of this paper is to propose parallelization of a successful sequential ant‐based clustering algorithm (SABCA) to increase time performance.

Design/methodology/approach

A SABCA is parallelized using the MPI library. Parallelization is performed in two stages. In the first stage, the data to be clustered are divided among processors. After the sequential ant‐based approach running on each processor clusters the data assigned to it, the resulting clusters are merged in the second stage. The merging is also performed through the same ant‐based technique. The experimental analysis focuses on whether the implemented parallel ant‐based clustering method achieves better time performance than its fully sequential version. Since the aim of this paper is to speed up the time-consuming, but otherwise successful, ant‐based clustering method, no extra steps are taken to improve the clustering solution. Tests are executed using 2 and 4 processors on selected sample datasets. Results are analyzed through commonly used cluster validity indices and parallelization performance metrics.

Findings

The experiments show that the proposed algorithm performs better in terms of time measurements and parallelization performance metrics; as expected, it does not improve the clustering quality as measured by the cluster validity indices. Furthermore, the communication cost is very small compared to other ant‐based clustering parallelization techniques proposed so far.
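The abstract does not say which parallelization performance metrics were used. Two that are commonly reported are speedup (serial time divided by parallel time) and efficiency (speedup divided by the number of processors). The sketch below shows how they are computed; the function names and the timings are illustrative assumptions, not figures from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup: how many times faster the parallel run is than the serial run."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Efficiency: speedup per processor; 1.0 would be ideal linear scaling."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(t_serial, t_parallel) / processors


if __name__ == "__main__":
    # Hypothetical wall-clock timings in seconds (not taken from the paper).
    t_serial = 120.0
    for processors, t_parallel in [(2, 70.0), (4, 40.0)]:
        s = speedup(t_serial, t_parallel)
        e = efficiency(t_serial, t_parallel, processors)
        print(f"p={processors}: speedup={s:.2f}, efficiency={e:.2f}")
```

An efficiency well below 1.0 usually points to communication or merging overhead, which is the cost the abstract reports as small for this approach.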

Research limitations/implications

The use of MPI for the parallelization step has been very effective. Also, the proposed parallelization technique is quite successful in increasing time performance; however, as a future study, improvements to clustering quality can be made in the final step where the partially clustered data are merged.

Practical implications

The results in the literature show that ant‐based clustering techniques are successful; however, their high time complexity prohibits their effective use in practical applications. Through this low‐communication‐cost parallelization technique, this limitation may be overcome.

Originality/value

A new parallelization approach to ant‐based clustering is proposed. The proposed approach does not decrease clustering performance while it increases time performance. Another major contribution of this paper is that the communication costs required for parallelization are lower than those of previously proposed parallel ant‐based techniques.

Details

Kybernetes, vol. 39 no. 4

Type: Research Article

ISSN: 0368-492X

Article

Publication date: 4 May 2012

On pragmatic parallelization of a serial Navier‐Stokes solver in cylindrical coordinates

Frode Nygård and Helge I. Andersson

Abstract

Purpose

The purpose of this paper is to describe a pragmatic parallelization of a publicly available serial code intended for direct numerical simulations of turbulent flow fields. The code solves the full Navier‐Stokes equations in a cylindrical coordinate system.

Design/methodology/approach

The parallelization is performed by a single program multiple data approach using the Message‐Passing Interface (MPI) Library for processor communication.

Findings

In order to maintain the original coding of the subroutines, two obstacles had to be overcome. First, special attention had to be given to the inversion of the sparse matrices arising from the linear terms in the Navier‐Stokes equations, which are solved by an implicit scheme. Second, the serial FFT routines needed for the direct Poisson solver had to be replaced by parallel versions. Two directions of parallelization were tested. Parallelization in the axial direction turned out to be more efficient than parallelization in the circumferential direction.

Originality/value

This paper presents a pragmatic parallelization of an open source finite difference code and should be useful to researchers in the field of numerical methods for fluid flow who need to parallelize a numerical code.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 22 no. 4

Type: Research Article

ISSN: 0961-5539

Article

Publication date: 1 June 2005

Parallelized computation of compressed BEM matrices on multiprocessor computer clusters

André Buchau, Wolfgang Hafla, Friedemann Groh and Wolfgang M. Rucker

Abstract

Purpose

Various parallelization strategies are investigated, mainly to reduce the computational costs in the context of boundary element methods and a compressed system matrix.

Design/methodology/approach

Electrostatic field problems are solved numerically by an indirect boundary element method. The fully dense system matrix is compressed by an application of the fast multipole method. Various parallelization techniques such as vectorization, multiple threads, and multiple processes are applied to reduce the computational costs.

Findings

It is shown that, overall, a good speedup is achieved by a parallelization approach that is relatively easy to implement. Furthermore, a detailed discussion of the influence of problem-oriented meshes on the different parts of the method is presented. On the one hand, the application of problem-oriented meshes leads to relatively small linear systems of equations along with a high accuracy of the solution; on the other hand, the efficiency of the parallelization itself is diminished.

Research limitations/implications

The presented parallelization approach has been tested on a small PC cluster only. Additionally, the main focus has been on reducing computing time.

Practical implications

The investigated numerical example exhibits typical properties of general static field problems. Hence, the results and conclusions are rather general.

Originality/value

Implementation details of a parallelization of existing fast and efficient boundary element method solvers are discussed. The presented approach is relatively easy to implement and takes special properties of fast methods in combination with parallelization into account.

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 24 no. 2

Type: Research Article

ISSN: 0332-1649

Article

Publication date: 7 March 2016

Performance comparisons of bonding box-based contact detection algorithms and a new improvement technique based on parallelization

Mahmoud Yazdani, Hamidreza Paseh and Mostafa Sharifzadeh

Abstract

Purpose

The purpose of this paper is to find a suitable contact detection algorithm for use in distinct element simulation.

Design/methodology/approach

Because contact detection takes the most computational effort, the performance of the contact detection algorithm strongly affects the running time. The algorithms investigated in this study are Incremental Sort-and-Update (ISU) and Double-Ended Spatial Sorting (DESS). These algorithms are based on bounding boxes, which makes them independent of block shapes. Both ISU and DESS contain sorting and updating phases. To compare the algorithms, they were implemented in identical examples of rock engineering problems with varying parameters.

Findings

The results show that the ISU algorithm gives a lower running time and performs better when blocks are unevenly distributed along both axes. The conventional ISU merges the sorting and updating phases in its naïve implementation. In this paper, a new computational technique based on parallelization is proposed to improve the ISU algorithm and decrease the running time of numerical analysis in large-scale rock mass projects.

Originality/value

In this approach, the sorting and updating phases are separated by minor changes to the algorithm. This incurs minimal running-time overhead and a little extra memory usage, after which the phases can be parallelized. The updating phase of the ISU algorithm consumes about 30 percent of the total time, which justifies the parallelization. According to the results for large-scale problems, this improved technique can increase the performance of the ISU algorithm by up to 20 percent.
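The 30 percent figure can be put in context with Amdahl's law, which bounds the gain from parallelizing only part of a program. The sketch below is an illustration rather than the paper's analysis: it assumes that only the updating phase (a fraction 0.3 of the running time) is parallelized and that it scales perfectly across workers. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: overall speedup when only part of the work is parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / workers)


def time_reduction(parallel_fraction: float, workers: int) -> float:
    """Fraction of the original running time that is saved."""
    return 1.0 - 1.0 / amdahl_speedup(parallel_fraction, workers)


if __name__ == "__main__":
    fraction = 0.30  # share of time in the updating phase, from the abstract
    for workers in (2, 4, 8):
        print(workers,
              round(amdahl_speedup(fraction, workers), 3),
              round(time_reduction(fraction, workers), 3))
```

Under these assumptions the running time can never drop by more than 30 percent, however many workers are used, so a reported gain of up to 20 percent is within the bound. The abstract does not state whether its 20 percent refers to running time or to another performance measure.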

Details

Engineering Computations, vol. 33 no. 1

Type: Research Article

ISSN: 0264-4401

Article

Publication date: 1 January 2013

Parallel construction of explicit boundaries using support vector machines

Ke Lin, Anirban Basudhar and Samy Missoum

Abstract

Purpose

The purpose of this paper is to present a study of the parallelization of the construction of explicit constraints or limit‐state functions using support vector machines. These explicit boundaries have proven to be beneficial for design optimization and reliability assessment, especially for problems with large computational times, discontinuities, or binary outputs. In addition to the study of the parallelization, the objective of this article is also to provide an approach to select the number of processors.

Design/methodology/approach

This article investigates the parallelization in two ways. First, the efficiency of the parallelization is assessed by comparing, over several runs, the number of iterations needed to create an accurate boundary to the number of iterations associated with a theoretical “linear” speedup. Second, by studying these differences, an “appropriate” range of parallel processors can be inferred.

Findings

The parallelization of the construction of explicit boundaries can lead to a markedly reduced computational burden. The study provides an approach to select the number of processors for an optimal use of computational resources.

Originality/value

The construction of explicit boundaries for design optimization and reliability assessment is designed to alleviate many hurdles in these areas. The parallelization of the construction of the boundaries is a much needed study to reinforce the efficacy and efficiency of this approach.

Details

Engineering Computations, vol. 30 no. 1

Type: Research Article

ISSN: 0264-4401

Article

Publication date: 6 August 2021

Performance evaluation of GPU- and cluster-computing for parallelization of compute-intensive tasks

Alexander Döschl, Max-Emanuel Keller and Peter Mandl

Abstract

Purpose

This paper aims to evaluate different approaches for the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and resilient distributed data set (RDD) (Apache Spark) paradigms and a graphics processing unit (GPU) approach with Numba for compute unified device architecture (CUDA).

Design/methodology/approach

The paper uses a simple but computationally intensive puzzle as a case study for experiments. To find all solutions using brute force search, 15! permutations had to be computed and tested against the solution rules. The experimental application comprises a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and RDD (Apache Spark) paradigms and a GPU approach with Numba for CUDA. The implementations were benchmarked on Amazon-EC2 instances for performance and scalability measurements.
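To give a sense of scale, 15! is 1,307,674,368,000 candidate permutations. One common way to spread such a brute-force search over threads, cluster nodes or GPU blocks is to fix the first element and let each worker enumerate the permutations that start with it. The sketch below shows that partitioning on a tiny input. The puzzle and its rules are not described in the abstract, so `toy_rule` is a made-up stand-in, and the work units are run one after another here rather than in parallel.

```python
import math
from itertools import permutations


def count_with_prefix(first, items, is_solution):
    """One independent work unit: test all permutations that start with `first`."""
    rest = [x for x in items if x != first]  # assumes the items are distinct
    return sum(1 for tail in permutations(rest) if is_solution((first, *tail)))


def count_solutions(items, is_solution):
    """Sum the per-prefix counts; each prefix could be handed to a separate worker."""
    items = list(items)
    return sum(count_with_prefix(first, items, is_solution) for first in items)


def toy_rule(perm):
    """Hypothetical rule (not the paper's puzzle): no two neighbours differ by 1."""
    return all(abs(a - b) != 1 for a, b in zip(perm, perm[1:]))


if __name__ == "__main__":
    print(math.factorial(15))                   # size of the full search space
    print(count_solutions(range(1, 7), toy_rule))
```

Because the prefixes share no state, the same split maps onto a thread pool, MapReduce tasks or GPU blocks without changing the per-unit logic.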

Findings

The comparison of the solutions with Apache Hadoop and Apache Spark under Amazon EMR showed that the processing time measured in CPU minutes with Spark was up to 30% lower, while the performance of Spark especially benefits from an increasing number of tasks. With the CUDA implementation, more than 16 times faster execution is achievable for the same price compared to the Spark solution. Apart from the multi-threaded implementation, the processing times of all solutions scale approximately linearly. Finally, several application suggestions for the different parallelization approaches are derived from the insights of this study.

Originality/value

There are numerous studies that have examined the performance of parallelization approaches. Most of these studies deal with processing large amounts of data or mathematical problems. This work, in contrast, compares these technologies on their ability to implement computationally intensive distributed algorithms.

Details

International Journal of Web Information Systems, vol. 17 no. 4

Type: Research Article

ISSN: 1744-0084

Article

Publication date: 1 June 1999

Parallelization of the finite volume method for radiation heat transfer

P.J. Coelho and J. Gonçalves

Abstract

The finite volume method for radiative heat transfer calculations has been parallelized using two strategies, the angular domain decomposition and the spatial domain decomposition. In the first case each processor performs the calculations for the whole domain and for a subset of control angles, while in the second case each processor deals with all the control angles but only treats a spatial subdomain. The method is applied to three‐dimensional rectangular enclosures containing a grey emitting‐absorbing medium. The results obtained show that the number of iterations required to achieve convergence is independent of the number of processors in the angular decomposition strategy, but increases with the number of processors in the domain decomposition method. As a consequence, higher parallel efficiencies are obtained in the first case. The influence of the angular discretization, grid size and absorption coefficient of the medium on the parallel performance is also investigated.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 9 no. 4

Type: Research Article

ISSN: 0961-5539

Article

Publication date: 1 June 2003

FEM and BEM parallel processing: theory and applications – a bibliography (1996‐2002)

Jaroslav Mackerle

Abstract

This paper gives a bibliographical review of the finite element and boundary element parallel processing techniques from the theoretical and application points of view. Topics include: theory – domain decomposition/partitioning, load balancing, parallel solvers/algorithms, parallel mesh generation, adaptive methods, and visualization/graphics; applications – structural mechanics problems, dynamic problems, material/geometrical non‐linear problems, contact problems, fracture mechanics, field problems, coupled problems, sensitivity and optimization, and other problems; hardware and software environments – hardware environments, programming techniques, and software development and presentations. The bibliography at the end of this paper contains 850 references to papers, conference proceedings and theses/dissertations dealing with presented subjects that were published between 1996 and 2002.

Details

Engineering Computations, vol. 20 no. 4

Type: Research Article

ISSN: 0264-4401

Article

Publication date: 1 November 2001

Parallel ray tracing for radiative heat transfer: Application in a distributed computing environment

J.G. Marakis, J. Chamiço, G. Brenner and F. Durst

Abstract

Notes that, in a full‐scale application of the Monte Carlo method for combined heat transfer analysis, problems usually arise from the large computing requirements. Here the method to overcome this difficulty is the parallel execution of the Monte Carlo method in a distributed computing environment. Addresses the problem of determining the temperature field formed under the assumption of radiative equilibrium in an enclosure idealizing an industrial furnace. The medium contained in this enclosure absorbs, emits and anisotropically scatters thermal radiation. Discusses two topics in detail: first, the efficiency of the parallelization of the developed code, and second, the influence of the scattering behavior of the medium. The adopted parallelization method for the first topic is the decomposition of the statistical sample and its subsequent distribution among the available processors. The measured high efficiencies showed that this method is particularly suited to the target architecture of this study, which is a dedicated network of workstations supporting the message passing paradigm. For the second topic, the results showed that taking into account isotropic scattering, as opposed to neglecting scattering, has a pronounced impact on the temperature distribution inside the enclosure. In contrast, considering sharply forward scattering, which is characteristic of all real combustion particles, leaves the predicted temperature field almost indistinguishable from the absorbing/emitting case.
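Decomposing the statistical sample means that each processor traces its own share of the samples with its own random stream, and the partial tallies are summed at the end. The sketch below illustrates that pattern on a toy problem (estimating pi from random points in a unit square), not on ray tracing for radiative transfer. All names are hypothetical, and the "workers" are simulated in a loop rather than run over message passing.

```python
import random


def split_sample(total: int, workers: int) -> list[int]:
    """Divide `total` samples as evenly as possible among `workers`."""
    base, extra = divmod(total, workers)
    return [base + (1 if rank < extra else 0) for rank in range(workers)]


def worker_hits(n_samples: int, seed: int) -> int:
    """Work done by one worker: count random points inside the quarter circle."""
    rng = random.Random(seed)  # separate generator per worker
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total: int, workers: int, base_seed: int = 0) -> float:
    """Combine the per-worker tallies into a single estimate."""
    shares = split_sample(total, workers)
    # Distinct seeds per rank are a simple choice for a sketch; production codes
    # typically use generators designed for independent parallel streams.
    hits = sum(worker_hits(n, base_seed + rank) for rank, n in enumerate(shares))
    return 4.0 * hits / total


if __name__ == "__main__":
    print(split_sample(10, 4))
    print(estimate_pi(200_000, workers=4))
```

The only communication needed is the final sum of tallies, which is consistent with the high efficiencies the abstract reports on a network of workstations.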

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 11 no. 7

Type: Research Article

ISSN: 0961-5539
