Search results
1 – 10 of 386
Tanvir Habib Sardar and Ahmed Rimaz Faizabadi
Abstract
Purpose
In recent years, there has been a gradual shift from sequential computing to parallel computing, and nowadays nearly all computers have multicore processors. To exploit the available cores, parallel computing becomes necessary; it increases speed by processing huge amounts of data in real time. The purpose of this paper is to parallelize a set of well-known programs using different techniques to determine the best way to parallelize each program studied.
Design/methodology/approach
A set of numeric algorithms is parallelized in two ways: by hand using OpenMP and automatically using the Pluto tool.
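OpenMP itself targets C, C++ and Fortran; purely as a structural sketch, not the authors' code, the shape of a hand-parallelized reduction loop (static chunking plus a final combine) can be illustrated with Python's standard library. Function names here are invented for illustration, and Python threads model only the structure, not OpenMP's speedup:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(bounds):
    # One worker's share of the loop, like a single OpenMP thread's chunk.
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_of_squares(n, workers=4):
    # Static scheduling: split [0, n) into contiguous chunks, one per worker,
    # then reduce the partial results -- the shape of an OpenMP parallel-for
    # with a sum reduction clause.
    step = (n + workers - 1) // workers
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, chunks))
```

Hand parallelization means the programmer chooses the chunking and the reduction explicitly; an auto parallelizer such as Pluto instead tries to derive an equivalent schedule from the loop nest itself.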
Findings
The work finds that a few of the algorithms are well suited to auto parallelization with the Pluto tool, but many of the algorithms execute more efficiently with OpenMP hand parallelization.
Originality/value
The work presents an original study of parallelization using the OpenMP programming paradigm and the Pluto tool.
Ozlem Gemici Gunes and A. Sima Uyar
Abstract
Purpose
The purpose of this paper is to propose parallelization of a successful sequential ant‐based clustering algorithm (SABCA) to increase time performance.
Design/methodology/approach
A SABCA is parallelized using the MPI library. Parallelization is performed in two stages. In the first stage, the data to be clustered are divided among the processors. After the sequential ant‐based approach running on each processor clusters the data assigned to it, the resulting clusters are merged in the second stage. The merging is also performed with the same ant‐based technique. The experimental analysis focuses on whether the implemented parallel ant‐based clustering method achieves better time performance than its fully sequential version. Since the aim of this paper is to speed up the time‐consuming, but otherwise successful, ant‐based clustering method, no extra steps are taken to improve the clustering solution. Tests are executed using 2 and 4 processors on selected sample datasets. Results are analyzed through commonly used cluster validity indices and parallelization performance metrics.
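The local clusterer here is the ant‐based heuristic running under MPI; the two-stage divide/cluster/merge structure itself can be sketched with a hypothetical stand-in clusterer (simple 1-D gap grouping below; `toy_cluster` and `two_stage_cluster` are invented names, not the paper's code):

```python
def toy_cluster(points, gap=1.0):
    # Hypothetical stand-in for the ant-based clusterer: group sorted
    # 1-D points into a new cluster whenever the jump exceeds `gap`.
    clusters, current = [], []
    for p in sorted(points):
        if current and p - current[-1] > gap:
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)
    return clusters

def two_stage_cluster(points, n_workers=2):
    # Stage 1: partition the data among workers and cluster each part locally.
    # Stage 2: merge by re-clustering one representative (the mean) per
    # local cluster, mirroring the paper's second ant-based merging stage.
    parts = [points[i::n_workers] for i in range(n_workers)]
    local = [c for part in parts for c in toy_cluster(part)]
    reps = [sum(c) / len(c) for c in local]
    return toy_cluster(reps)
```

In the real algorithm each partition would be clustered on its own MPI process, and only cluster representatives would be communicated for the merge, which is why the communication cost can stay small.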
Findings
As a result of the experiments, it is seen that the proposed algorithm performs better based on time measurements and parallelization performance metrics; as expected, it does not improve clustering quality based on the cluster validity indices. Furthermore, the communication cost is very small compared to other ant‐based clustering parallelization techniques proposed so far.
Research limitations/implications
The use of MPI for the parallelization step has been very effective. Also, the proposed parallelization technique is quite successful in increasing time performance; however, as a future study, improvements to clustering quality can be made in the final step where the partially clustered data are merged.
Practical implications
The results in the literature show that ant‐based clustering techniques are successful; however, their high time complexity prohibits their effective use in practical applications. Through this low‐communication‐cost parallelization technique, this limitation may be overcome.
Originality/value
A new parallelization approach to ant‐based clustering is proposed. The proposed approach increases time performance without decreasing clustering performance. Another major contribution of this paper is that the communication costs required for parallelization are lower than those of previously proposed parallel ant‐based techniques.
Frode Nygård and Helge I. Andersson
Abstract
Purpose
The purpose of this paper is to describe a pragmatic parallelization of a publicly available serial code aimed at direct numerical simulations of turbulent flow fields. The code solves the full Navier‐Stokes equations in a cylindrical coordinate system.
Design/methodology/approach
The parallelization is performed with a single program multiple data approach, using the Message‐Passing Interface (MPI) library for processor communication.
Findings
In order to maintain the original coding of the subroutines, two obstacles had to be overcome. First, special attention had to be given to the inversion of the sparse matrices arising from the linear terms in the Navier‐Stokes equations, which are solved by an implicit scheme. Second, the serial FFT routines needed for the direct Poisson solver had to be replaced by parallel versions. Two directions of parallelization were tested; parallelization in the axial direction turned out to be more efficient than parallelization in the circumferential direction.
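The paper's replacement FFT code is not reproduced here; as one hypothetical illustration of the standard pattern behind parallel FFTs, an all-to-all block redistribution (a distributed transpose) lets every process again hold complete lines in the transform direction, so a serial FFT routine can still be applied line by line:

```python
def redistribute(blocks):
    # blocks[p][q]: the q-th sub-block held by process p.
    # After the exchange, process q holds [blocks[0][q], ..., blocks[P-1][q]],
    # i.e. a complete line in the other direction -- a distributed transpose,
    # the role MPI_Alltoall plays in a real slab-decomposed FFT.
    size = len(blocks)
    return [[blocks[p][q] for p in range(size)] for q in range(size)]
```

This is a sketch of the communication shape only; a production implementation would exchange contiguous buffers between MPI ranks rather than nested lists.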
Originality/value
This paper presents a pragmatic parallelization of an open source finite difference code and should be useful to researchers in the field of numerical methods for fluid flow who need to parallelize a numerical code.
André Buchau, Wolfgang Hafla, Friedemann Groh and Wolfgang M. Rucker
Abstract
Purpose
Various parallelization strategies are investigated, mainly to reduce the computational costs in the context of boundary element methods with a compressed system matrix.
Design/methodology/approach
Electrostatic field problems are solved numerically by an indirect boundary element method. The fully dense system matrix is compressed by an application of the fast multipole method. Various parallelization techniques such as vectorization, multiple threads, and multiple processes are applied to reduce the computational costs.
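The dominant kernel in such an iterative BEM solver is the repeated application of the (compressed) system matrix to a vector. As a simplified sketch only (dense matrix, thread level, invented function names, not the authors' solver), the row range can be split among workers:

```python
from concurrent.futures import ThreadPoolExecutor

def matvec_rows(A, x, rows):
    # One worker's share: selected rows of the operator applied to x.
    return [(r, sum(A[r][j] * x[j] for j in range(len(x)))) for r in rows]

def parallel_matvec(A, x, workers=2):
    # Split the row range among workers: a thread-level analogue of
    # distributing the matrix-vector products of an iterative BEM solver.
    n = len(A)
    blocks = [range(w, n, workers) for w in range(workers)]
    y = [0] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda rows: matvec_rows(A, x, rows), blocks):
            for r, v in part:
                y[r] = v
    return y
```

With the fast multipole method the matrix action is computed hierarchically rather than row by row, which is one reason the abstract notes that parallel efficiency interacts with the structure of the method.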
Findings
It is shown that in total a good speedup is achieved by a parallelization approach that is relatively easy to implement. Furthermore, a detailed discussion of the influence of problem-oriented meshes on the different parts of the method is presented. On the one hand, the application of problem-oriented meshes leads to relatively small linear systems of equations along with a high accuracy of the solution; on the other hand, the efficiency of the parallelization itself is diminished.
Research limitations/implications
The presented parallelization approach has been tested on a small PC cluster only. Additionally, the main focus has been on reducing computing time.
Practical implications
Typical properties of general static field problems are represented in the investigated numerical example. Hence, the results and conclusions are rather general.
Originality/value
Implementation details of the parallelization of existing fast and efficient boundary element method solvers are discussed. The presented approach is relatively easy to implement and takes into account the special properties of fast methods in combination with parallelization.
Mahmoud Yazdani, Hamidreza Paseh and Mostafa Sharifzadeh
Abstract
Purpose
The purpose of this paper is to find a convenient contact detection algorithm for use in distinct element simulations.
Design/methodology/approach
Since contact detection takes most of the computational effort, the performance of the contact detection algorithm strongly affects the running time. The algorithms investigated in this study are Incremental Sort-and-Update (ISU) and Double-Ended Spatial Sorting (DESS). These algorithms are based on bounding boxes, which makes them independent of block shapes. The ISU and DESS algorithms consist of sorting and updating phases. To compare the algorithms, they were applied to identical examples of rock engineering problems with varying parameters.
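The incremental update and the double-ended sorting are what distinguish ISU and DESS; the common core both build on, a sort-based overlap sweep over axis-aligned bounding boxes, can be sketched as follows (a minimal illustration, not either paper algorithm):

```python
def overlap_pairs(boxes):
    # Sort-based contact detection on axis-aligned bounding boxes,
    # box i = (min_x, max_x, min_y, max_y): sort by min-x, sweep, and
    # compare only boxes whose x-intervals can still overlap.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    pairs = []
    for a, i in enumerate(order):
        for j in order[a + 1:]:
            if boxes[j][0] > boxes[i][1]:
                break  # later boxes start even further right: none can touch i
            if boxes[i][2] <= boxes[j][3] and boxes[j][2] <= boxes[i][3]:
                pairs.append(tuple(sorted((i, j))))
    return sorted(pairs)
```

Because only the boxes matter, the test is independent of block shape, which is the property the abstract highlights; the incremental variants avoid re-sorting from scratch as blocks move slightly between time steps.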
Findings
The results show that the ISU algorithm gives a lower running time and performs better when blocks are unevenly distributed along both axes. The conventional ISU merges the sorting and updating phases in its naïve implementation. In this paper, a new computational technique based on parallelization is proposed to improve the ISU algorithm and decrease the running time of numerical analysis in large-scale rock mass projects.
Originality/value
In this approach, the sorting and updating phases are separated by minor changes to the algorithm. This incurs only a minimal running-time overhead and a little extra memory usage, after which the two phases can be parallelized. Moreover, the time consumed by the updating phase of the ISU algorithm is about 30 percent of the total time, which makes the parallelization worthwhile. According to the results for the large-scale problems, this improved technique can increase the performance of the ISU algorithm by up to 20 percent.
Ke Lin, Anirban Basudhar and Samy Missoum
Abstract
Purpose
The purpose of this paper is to present a study of the parallelization of the construction of explicit constraints or limit‐state functions using support vector machines. These explicit boundaries have proven to be beneficial for design optimization and reliability assessment, especially for problems with large computational times, discontinuities, or binary outputs. In addition to the study of the parallelization, the objective of this article is also to provide an approach to select the number of processors.
Design/methodology/approach
This article investigates the parallelization in two ways. First, the efficiency of the parallelization is assessed by comparing, over several runs, the number of iterations needed to create an accurate boundary to the number of iterations associated with a theoretical “linear” speedup. Second, by studying these differences, an “appropriate” range of parallel processors can be inferred.
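Under the assumptions of that comparison, the bookkeeping reduces to an efficiency ratio against the linear ideal. The selection rule below is hypothetical (threshold value and function names are invented, not the article's procedure): keep the largest processor count whose efficiency is still acceptable:

```python
def parallel_efficiency(iters_serial, iters_parallel, p):
    # Efficiency against a theoretical linear speedup: linear behaviour
    # with p processors would need iters_serial / p iterations.
    return iters_serial / (p * iters_parallel)

def pick_processors(iters_serial, iters_by_p, threshold=0.75):
    # Hypothetical selection rule: keep the largest processor count whose
    # efficiency relative to the linear ideal stays above `threshold`.
    best = 1
    for p, iters in sorted(iters_by_p.items()):
        if parallel_efficiency(iters_serial, iters, p) >= threshold:
            best = p
    return best
```

For example, if a serial run needs 100 iterations and parallel runs need 55, 30 and 22 iterations on 2, 4 and 8 processors, the efficiency drops below 0.75 only at 8 processors, so the rule would select 4.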
Findings
The parallelization of the construction of explicit boundaries can lead to a markedly reduced computational burden. The study provides an approach to select the number of processors for an optimal use of computational resources.
Originality/value
The construction of explicit boundaries for design optimization and reliability assessment is designed to alleviate many hurdles in these areas. The parallelization of the construction of the boundaries is a much-needed study to reinforce the efficacy and efficiency of this approach.
Alexander Döschl, Max-Emanuel Keller and Peter Mandl
Abstract
Purpose
This paper aims to evaluate different approaches for the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and resilient distributed data set (RDD) (Apache Spark) paradigms and a graphics processing unit (GPU) approach with Numba for compute unified device architecture (CUDA).
Design/methodology/approach
The paper uses a simple but computationally intensive puzzle as a case study for experiments. To find all solutions using brute force search, 15! permutations had to be computed and tested against the solution rules. The experimental application comprises a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and RDD (Apache Spark) paradigms and a GPU approach with Numba for CUDA. The implementations were benchmarked on Amazon EC2 instances for performance and scalability measurements.
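The concrete puzzle and its solution rules are not reproduced here; the brute-force structure, though, is a map over a partitioned permutation space followed by a reduce (shown for a small n and a toy predicate, since enumerating all of 15! is precisely what motivates the parallel versions):

```python
from itertools import permutations

def count_solutions(items, first, predicate):
    # One task's share of the search: all permutations starting with `first`.
    # Fixing the leading element is one simple way to shard the 15! space.
    rest = [x for x in items if x != first]
    return sum(1 for p in permutations(rest) if predicate((first,) + p))

def brute_force(items, predicate):
    # Map over the partitions and reduce; each count_solutions call is an
    # independent task of the kind that could be handed to a Java thread,
    # a Hadoop/Spark task or a CUDA block.
    return sum(count_solutions(items, f, predicate) for f in items)
```

Because the tasks share no state, the same decomposition maps naturally onto all four platforms compared in the study; only the task launch and result aggregation mechanisms differ.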
Findings
The comparison of the solutions with Apache Hadoop and Apache Spark under Amazon EMR showed that the processing time measured in CPU minutes with Spark was up to 30% lower, while the performance of Spark especially benefits from an increasing number of tasks. With the CUDA implementation, more than 16 times faster execution is achievable for the same price compared to the Spark solution. Apart from the multi-threaded implementation, the processing times of all solutions scale approximately linearly. Finally, several application suggestions for the different parallelization approaches are derived from the insights of this study.
Originality/value
There are numerous studies that have examined the performance of parallelization approaches. Most of these studies deal with processing large amounts of data or mathematical problems. This work, in contrast, compares these technologies in terms of their ability to implement computationally intensive distributed algorithms.
Abstract
The finite volume method for radiative heat transfer calculations has been parallelized using two strategies, the angular domain decomposition and the spatial domain decomposition. In the first case each processor performs the calculations for the whole domain and for a subset of control angles, while in the second case each processor deals with all the control angles but only treats a spatial subdomain. The method is applied to three‐dimensional rectangular enclosures containing a grey emitting‐absorbing medium. The results obtained show that the number of iterations required to achieve convergence is independent of the number of processors in the angular decomposition strategy, but increases with the number of processors in the domain decomposition method. As a consequence, higher parallel efficiencies are obtained in the first case. The influence of the angular discretization, grid size and absorption coefficient of the medium on the parallel performance is also investigated.
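The two strategies can be sketched as index partitions (a schematic sketch; the function names and the round-robin angle split are illustrative, not the paper's code):

```python
def angular_partition(n_angles, n_cells, size, rank):
    # Angular decomposition: each process keeps the full spatial grid
    # but only a slice of the control angles.
    angles = list(range(rank, n_angles, size))
    return angles, list(range(n_cells))

def spatial_partition(n_angles, n_cells, size, rank):
    # Spatial decomposition: each process keeps all control angles
    # but only a contiguous block of cells (remainder to the low ranks).
    base, extra = divmod(n_cells, size)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return list(range(n_angles)), list(range(lo, hi))
```

The shape of the split is consistent with the reported behaviour: in the angular decomposition every process sees the whole spatial field, so the iteration count can stay independent of the processor count, whereas in the spatial decomposition information must cross subdomain boundaries between iterations.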
Abstract
This paper gives a bibliographical review of the finite element and boundary element parallel processing techniques from the theoretical and application points of view. Topics include: theory – domain decomposition/partitioning, load balancing, parallel solvers/algorithms, parallel mesh generation, adaptive methods, and visualization/graphics; applications – structural mechanics problems, dynamic problems, material/geometrical non‐linear problems, contact problems, fracture mechanics, field problems, coupled problems, sensitivity and optimization, and other problems; hardware and software environments – hardware environments, programming techniques, and software development and presentations. The bibliography at the end of this paper contains 850 references to papers, conference proceedings and theses/dissertations dealing with presented subjects that were published between 1996 and 2002.
J.G. Marakis, J. Chamiço, G. Brenner and F. Durst
Abstract
Notes that, in a full‐scale application of the Monte Carlo method for combined heat transfer analysis, problems usually arise from the large computing requirements. Here the method to overcome this difficulty is the parallel execution of the Monte Carlo method in a distributed computing environment. Addresses the problem of determining the temperature field formed under the assumption of radiative equilibrium in an enclosure idealizing an industrial furnace. The medium contained in this enclosure absorbs, emits and scatters thermal radiation anisotropically. Discusses two topics in detail: first, the efficiency of the parallelization of the developed code, and second, the influence of the scattering behavior of the medium. The parallelization method adopted for the first topic is the decomposition of the statistical sample and its subsequent distribution among the available processors. The measured high efficiencies showed that this method is particularly suited to the target architecture of this study, a dedicated network of workstations supporting the message passing paradigm. For the second topic, the results showed that taking isotropic scattering into account, as opposed to neglecting scattering, has a pronounced impact on the temperature distribution inside the enclosure. In contrast, the consideration of sharply forward scattering, which is characteristic of all real combustion particles, leaves the predicted temperature field almost indistinguishable from the absorbing/emitting case.
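The sample-decomposition idea is simple to sketch. Below, dart-throwing to estimate pi stands in for photon-bundle tracing (an illustrative stand-in only, with fixed per-worker seeds so independent workers draw independent, reproducible streams):

```python
import random

def mc_worker(n_samples, seed):
    # One worker's share of the statistical sample: here, dart-throwing
    # into the unit square (a stand-in for tracing photon bundles).
    rng = random.Random(seed)
    return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
               for _ in range(n_samples))

def mc_estimate(total_samples, workers=4):
    # Decompose the sample among workers (distinct seeds) and merge the
    # tallies -- the same pattern as distributing Monte Carlo histories
    # over a network of workstations with message passing.
    share = total_samples // workers
    hits = sum(mc_worker(share, seed) for seed in range(workers))
    return 4.0 * hits / (share * workers)
```

Because the workers only exchange final tallies, the communication volume is tiny relative to the computation, which is why sample decomposition parallelizes Monte Carlo codes so efficiently on networks of workstations.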