Search results

1 – 10 of 348
Article
Publication date: 4 February 2019

Parallelization and analysis of selected numerical algorithms using OpenMP and Pluto on symmetric multiprocessing machine

Tanvir Habib Sardar and Ahmed Rimaz Faizabadi

Abstract

Purpose

In recent years, there has been a gradual shift from sequential computing to parallel computing. Nowadays, nearly all computers have multicore processors. To exploit the available cores, parallel computing becomes necessary. It increases speed by processing huge amounts of data in real time. The purpose of this paper is to parallelize a set of well-known programs using different techniques to determine the best way to parallelize each program studied.

Design/methodology/approach

A set of numeric algorithms is parallelized in two ways: by hand using OpenMP, and automatically using the Pluto tool.

Findings

The work finds that a few of the algorithms are well suited to auto parallelization using the Pluto tool, but many of the algorithms execute more efficiently with OpenMP hand parallelization.

Originality/value

The work provides an original study of parallelization using the OpenMP programming paradigm and the Pluto tool.

Details

Data Technologies and Applications, vol. 53 no. 1
Type: Research Article
DOI: https://doi.org/10.1108/DTA-05-2018-0040
ISSN: 2514-9288

Keywords

  • Algorithms
  • OpenMP
  • Auto parallelization
  • Code parallelization
  • Hand parallelization
  • Pluto

Article
Publication date: 4 May 2010

Parallelization of an ant‐based clustering approach

Ozlem Gemici Gunes and A. Sima Uyar

Abstract

Purpose

The purpose of this paper is to propose parallelization of a successful sequential ant‐based clustering algorithm (SABCA) to increase time performance.

Design/methodology/approach

A SABCA is parallelized using the MPI library. Parallelization is performed in two stages. In the first stage, the data to be clustered are divided among processors. After the sequential ant‐based approach running on each processor clusters the data assigned to it, the resulting clusters are merged in the second stage. The merging is also performed through the same ant‐based technique. The experimental analysis focuses on whether the implemented parallel ant‐based clustering method achieves better time performance than its fully sequential version. Since the aim of this paper is to speed up the time‐consuming, but otherwise successful, ant‐based clustering method, no extra steps are taken to improve the clustering solution. Tests are executed using 2 and 4 processors on selected sample datasets. Results are analyzed through commonly used cluster validity indices and parallelization performance metrics.
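The abstract does not say which parallelization performance metrics were used. Two common choices are speedup and efficiency, and the minimal sketch below assumes those. The function names and the timings are hypothetical illustrations, not values from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p: how many times faster the parallel run is."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Efficiency E = S / p: the fraction of ideal linear speedup achieved."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(t_serial, t_parallel) / processors


# Hypothetical wall-clock timings in seconds (NOT taken from the paper),
# for the 2- and 4-processor configurations mentioned in the abstract.
t1 = 120.0
for p, tp in [(2, 66.0), (4, 40.0)]:
    print(p, round(speedup(t1, tp), 2), round(efficiency(t1, tp, p), 2))
```

An efficiency close to 1 would indicate that communication and merging overheads are small relative to the clustering work done on each processor.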

Findings

The experiments show that the proposed algorithm performs better on time measurements and parallelization performance metrics; as expected, it does not improve the clustering quality as measured by the cluster validity indices. Furthermore, the communication cost is very small compared to other ant‐based clustering parallelization techniques proposed so far.

Research limitations/implications

The use of MPI for the parallelization step has been very effective. Also, the proposed parallelization technique is quite successful in increasing time performance; however, as a future study, improvements to clustering quality can be made in the final step where the partially clustered data are merged.

Practical implications

The results in the literature show that ant‐based clustering techniques are successful; however, their high time complexity prohibits their effective use in practical applications. Through this low‐communication‐cost parallelization technique, this limitation may be overcome.

Originality/value

A new parallelization approach to ant‐based clustering is proposed. The proposed approach does not decrease clustering performance while it increases time performance. Another major contribution of this paper is that the communication costs required for parallelization are lower than those of previously proposed parallel ant‐based techniques.

Details

Kybernetes, vol. 39 no. 4
Type: Research Article
DOI: https://doi.org/10.1108/03684921011036844
ISSN: 0368-492X

Keywords

  • Cybernetics
  • Programming and algorithm theory
  • Cluster analysis

Article
Publication date: 4 May 2012

On pragmatic parallelization of a serial Navier‐Stokes solver in cylindrical coordinates

Frode Nygård and Helge I. Andersson

Abstract

Purpose

The purpose of this paper is to describe a pragmatic parallelization of a publicly available serial code intended for direct numerical simulations of turbulent flow fields. The code solves the full Navier‐Stokes equations in a cylindrical coordinate system.

Design/methodology/approach

The parallelization is performed by a single program multiple data approach using the Message‐Passing Interface (MPI) Library for processor communication.

Findings

In order to maintain the original coding of the subroutines, two obstacles had to be overcome. First, special attention had to be given to the inversion of the sparse matrices from the linear terms in the Navier‐Stokes equations solved by an implicit scheme. Second, the serial FFT routines, needed for the direct Poisson solver, had to be replaced by parallel versions. Two directions of parallelization were tested. Parallelization in the axial direction turned out to be more efficient than parallelization in the circumferential direction.

Originality/value

This paper presents a pragmatic parallelization of an open source finite difference code and should be useful to researchers in the field of numerical methods for fluid flow who need to parallelize a numerical code.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 22 no. 4
Type: Research Article
DOI: https://doi.org/10.1108/09615531211215783
ISSN: 0961-5539

Keywords

  • Flow
  • Turbulence
  • Differential equations
  • Parallelization
  • Message‐passing interface
  • Finite difference
  • Navier‐Stokes

Article
Publication date: 1 June 2005

Parallelized computation of compressed BEM matrices on multiprocessor computer clusters

André Buchau, Wolfgang Hafla, Friedemann Groh and Wolfgang M. Rucker

Abstract

Purpose

Various parallelization strategies are investigated to mainly reduce the computational costs in the context of boundary element methods and a compressed system matrix.

Design/methodology/approach

Electrostatic field problems are solved numerically by an indirect boundary element method. The fully dense system matrix is compressed by an application of the fast multipole method. Various parallelization techniques such as vectorization, multiple threads, and multiple processes are applied to reduce the computational costs.

Findings

It is shown that, overall, a good speedup is achieved by a parallelization approach which is relatively easy to implement. Furthermore, a detailed discussion of the influence of problem‐oriented meshes on the different parts of the method is presented. On the one hand, the application of problem‐oriented meshes leads to relatively small linear systems of equations along with a high accuracy of the solution; on the other hand, the efficiency of the parallelization itself is diminished.

Research limitations/implications

The presented parallelization approach has been tested on a small PC cluster only. Additionally, the main focus has been laid on a reduction of computing time.

Practical implications

Typical properties of general static field problems are comprised in the investigated numerical example. Hence, the results and conclusions are rather general.

Originality/value

Implementation details of a parallelization of existing fast and efficient boundary element method solvers are discussed. The presented approach is relatively easy to implement and takes special properties of fast methods in combination with parallelization into account.

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 24 no. 2
Type: Research Article
DOI: https://doi.org/10.1108/03321640510586105
ISSN: 0332-1649

Keywords

  • Parallel programming
  • Electrostatics

Article
Publication date: 7 March 2016

Performance comparisons of bonding box-based contact detection algorithms and a new improvement technique based on parallelization

Mahmoud Yazdani, Hamidreza Paseh and Mostafa Sharifzadeh

Abstract

Purpose

The purpose of this paper is to find a convenient contact detection algorithm in order to apply in distinct element simulation.

Design/methodology/approach

Because contact detection takes most of the computational effort, the performance of the contact detection algorithm strongly affects the running time. The algorithms investigated in this study are Incremental Sort-and-Update (ISU) and Double-Ended Spatial Sorting (DESS). These algorithms are based on bounding boxes, which makes them independent of block shapes. Both ISU and DESS contain sorting and updating phases. To compare the algorithms, they were implemented in identical examples of rock engineering problems with varying parameters.

Findings

The results show that the ISU algorithm gives lower running time and shows better performance when blocks are unevenly distributed in both axes. The conventional ISU merges the sorting and updating phases in its naïve implementation. In this paper, a new computational technique is proposed based on parallelization in order to effectively improve the ISU algorithm and decrease the running time of numerical analysis in large-scale rock mass projects.

Originality/value

In this approach, the sorting and updating phases are separated by minor changes in the algorithm. This adds minimal running-time overhead and a little extra memory usage, after which the phases can be parallelized. The time consumed by the updating phase of the ISU algorithm is about 30 percent of the total time, which makes the parallelization justifiable. According to the results for the large-scale problems, this improved technique can increase the performance of the ISU algorithm by up to 20 percent.
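The two figures in this abstract can be related through Amdahl's law, which the abstract itself does not invoke. The sketch below is a back-of-the-envelope check under two stated assumptions: only the updating phase (about 30 percent of the running time) is parallelized, and it parallelizes perfectly. The function names are hypothetical, and reading "up to 20 percent" as a reduction in running time is also an assumption.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: S = 1 / ((1 - f) + f / p) for parallel fraction f."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def time_reduction(parallel_fraction: float, workers: int) -> float:
    """Fraction of the original running time saved, 1 - 1/S."""
    return 1.0 - 1.0 / amdahl_speedup(parallel_fraction, workers)


# Assumed parallel fraction of 0.30 (the updating phase share quoted above).
for p in (2, 4, 8):
    print(p, round(amdahl_speedup(0.30, p), 3), round(time_reduction(0.30, p), 3))
```

Under these assumptions the saving can never exceed 30 percent however many workers are used, so a reported gain of up to 20 percent sits plausibly below that ceiling.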

Details

Engineering Computations, vol. 33 no. 1
Type: Research Article
DOI: https://doi.org/10.1108/EC-05-2014-0102
ISSN: 0264-4401

Keywords

  • Parallelization
  • Distinct element method
  • Bounding box
  • Contact detection
  • DESS algorithm
  • ISU algorithm

Article
Publication date: 1 January 2013

Parallel construction of explicit boundaries using support vector machines

Ke Lin, Anirban Basudhar and Samy Missoum

Abstract

Purpose

The purpose of this paper is to present a study of the parallelization of the construction of explicit constraints or limit‐state functions using support vector machines. These explicit boundaries have proven to be beneficial for design optimization and reliability assessment, especially for problems with large computational times, discontinuities, or binary outputs. In addition to the study of the parallelization, the objective of this article is also to provide an approach to select the number of processors.

Design/methodology/approach

This article investigates the parallelization in two ways. First, the efficiency of the parallelization is assessed by comparing, over several runs, the number of iterations needed to create an accurate boundary to the number of iterations associated with a theoretical “linear” speedup. Second, by studying these differences, an “appropriate” range of parallel processors can be inferred.

Findings

The parallelization of the construction of explicit boundaries can lead to a markedly reduced computational burden. The study provides an approach to select the number of processors for an optimal use of computational resources.

Originality/value

The construction of explicit boundaries for design optimization and reliability assessment is designed to alleviate many hurdles in these areas. The parallelization of the construction of the boundaries is a much needed study to reinforce the efficacy and efficiency of this approach.

Details

Engineering Computations, vol. 30 no. 1
Type: Research Article
DOI: https://doi.org/10.1108/02644401311286099
ISSN: 0264-4401

Keywords

  • Explicit design space decomposition
  • Support vector machines
  • Parallel processing
  • Optimum design
  • Reliability management

Article
Publication date: 30 June 2020

Exploring compression and parallelization techniques for distribution of deep neural networks over Edge–Fog continuum – a review

Azra Nazir, Roohie Naaz Mir and Shaima Qureshi

Abstract

Purpose

The trend of “Deep Learning for Internet of Things (IoT)” has gained fresh momentum, with an enormous number of upcoming applications employing these models as their processing engine and the Cloud as their resource giant. But this picture leads to underutilization of the ever-increasing pool of IoT devices, which had already passed the 15 billion mark in 2015. Thus, it is high time to explore a different approach to tackle this issue, keeping in view the characteristics and needs of the two fields. Processing at the Edge can boost applications with real-time deadlines while complementing security.

Design/methodology/approach

This review paper contributes towards three cardinal directions of research in the field of DL for IoT. The first section covers the categories of IoT devices and how Fog can aid in overcoming the underutilization of millions of devices, forming the realm of the things for IoT. The second direction handles the issue of immense computational requirements of DL models by uncovering specific compression techniques. An appropriate combination of these techniques, including regularization, quantization, and pruning, can aid in building an effective compression pipeline for establishing DL models for IoT use-cases. The third direction incorporates both these views and introduces a novel approach of parallelization for setting up a distributed systems view of DL for IoT.

Findings

DL models are growing deeper with every passing year. Well-coordinated distributed execution of such models using Fog shows a promising future for the IoT application realm. It is found that a vertically partitioned compressed deep model can handle the trade-off between size, accuracy, communication overhead, bandwidth utilization, and latency, but at the expense of a considerable additional memory footprint. To reduce the memory budget, we propose to exploit Hashed Nets as potentially favorable candidates for distributed frameworks. However, the critical point between accuracy and size for such models needs further investigation.

Originality/value

To the best of our knowledge, no study has explored the inherent parallelism in deep neural network architectures for their efficient distribution over the Edge-Fog continuum. Besides covering techniques and frameworks that have tried to bring inference to the Edge, the review uncovers significant issues and possible future directions for endorsing deep models as processing engines for real-time IoT. The study is directed to both researchers and industrialists to take on various applications to the Edge for better user experience.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 3
Type: Research Article
DOI: https://doi.org/10.1108/IJICC-04-2020-0038
ISSN: 1756-378X

Keywords

  • Distributed deep neural networks
  • Fog
  • Internet of things
  • Compression
  • Parallelization

Article
Publication date: 1 June 1999

Parallelization of the finite volume method for radiation heat transfer

P.J. Coelho and J. Gonçalves

Abstract

The finite volume method for radiative heat transfer calculations has been parallelized using two strategies, the angular domain decomposition and the spatial domain decomposition. In the first case each processor performs the calculations for the whole domain and for a subset of control angles, while in the second case each processor deals with all the control angles but only treats a spatial subdomain. The method is applied to three‐dimensional rectangular enclosures containing a grey emitting‐absorbing medium. The results obtained show that the number of iterations required to achieve convergence is independent of the number of processors in the angular decomposition strategy, but increases with the number of processors in the domain decomposition method. As a consequence, higher parallel efficiencies are obtained in the first case. The influence of the angular discretization, grid size and absorption coefficient of the medium on the parallel performance is also investigated.
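The difference between the two strategies comes down to which index set is divided among processors. The sketch below only illustrates that work assignment: it assumes control angles and control volumes can each be addressed by a flat integer index, and it uses a simple contiguous block partition. Neither detail is stated in the abstract, and all function names are hypothetical.

```python
def block_partition(n_items: int, n_procs: int) -> list:
    """Split indices 0..n_items-1 into n_procs contiguous, near-equal ranges."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    base, extra = divmod(n_items, n_procs)
    blocks, start = [], 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional item each.
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def angular_decomposition(n_angles: int, n_cells: int, n_procs: int) -> list:
    """Each processor gets a subset of control angles and the whole domain."""
    return [(angles, range(n_cells)) for angles in block_partition(n_angles, n_procs)]


def spatial_decomposition(n_angles: int, n_cells: int, n_procs: int) -> list:
    """Each processor gets all control angles and a spatial subdomain."""
    return [(range(n_angles), cells) for cells in block_partition(n_cells, n_procs)]


# Hypothetical sizes: 24 control angles, 1000 control volumes, 4 processors.
for angles, cells in angular_decomposition(24, 1000, 4):
    print("angular:", len(angles), "angles x", len(cells), "cells")
for angles, cells in spatial_decomposition(24, 1000, 4):
    print("spatial:", len(angles), "angles x", len(cells), "cells")
```

In the angular case every processor still sweeps the whole domain, which fits the abstract's observation that the iteration count does not depend on the number of processors for that strategy.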

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 9 no. 4
Type: Research Article
DOI: https://doi.org/10.1108/09615539910266576
ISSN: 0961-5539

Keywords

  • Finite volume
  • Heat transfer
  • Parallel computing

Article
Publication date: 1 June 2003

FEM and BEM parallel processing: theory and applications – a bibliography (1996‐2002)

Jaroslav Mackerle

Abstract

This paper gives a bibliographical review of the finite element and boundary element parallel processing techniques from the theoretical and application points of view. Topics include: theory – domain decomposition/partitioning, load balancing, parallel solvers/algorithms, parallel mesh generation, adaptive methods, and visualization/graphics; applications – structural mechanics problems, dynamic problems, material/geometrical non‐linear problems, contact problems, fracture mechanics, field problems, coupled problems, sensitivity and optimization, and other problems; hardware and software environments – hardware environments, programming techniques, and software development and presentations. The bibliography at the end of this paper contains 850 references to papers, conference proceedings and theses/dissertations dealing with presented subjects that were published between 1996 and 2002.

Details

Engineering Computations, vol. 20 no. 4
Type: Research Article
DOI: https://doi.org/10.1108/02644400310476333
ISSN: 0264-4401

Keywords

  • Finite elements
  • Boundary elements
  • Adaptive techniques
  • Optimization
  • Bibliography

Article
Publication date: 1 November 2001

Parallel ray tracing for radiative heat transfer: Application in a distributed computing environment

J.G. Marakis, J. Chamiço, G. Brenner and F. Durst

Abstract

Notes that, in a full‐scale application of the Monte Carlo method for combined heat transfer analysis, problems usually arise from the large computing requirements. Here the method used to overcome this difficulty is the parallel execution of the Monte Carlo method in a distributed computing environment. Addresses the problem of determining the temperature field formed under the assumption of radiative equilibrium in an enclosure idealizing an industrial furnace. The medium contained in this enclosure absorbs, emits and anisotropically scatters thermal radiation. Discusses two topics in detail: first, the efficiency of the parallelization of the developed code, and second, the influence of the scattering behavior of the medium. The adopted parallelization method for the first topic is the decomposition of the statistical sample and its subsequent distribution among the available processors. The measured high efficiencies showed that this method is particularly suited to the target architecture of this study, which is a dedicated network of workstations supporting the message passing paradigm. For the second topic, the results showed that taking into account isotropic scattering, as opposed to neglecting scattering, has a pronounced impact on the temperature distribution inside the enclosure. In contrast, the consideration of sharply forward scattering, which is characteristic of all real combustion particles, leaves the predicted temperature field almost indistinguishable from the absorbing/emitting case.
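Decomposing the statistical sample means giving each processor its own share of the Monte Carlo samples and combining the partial tallies at the end. The sketch below shows that pattern only. It substitutes a toy integrand (the area of a quarter circle) for the paper's ray tracing and Python threads for message passing between workstations; both substitutions and all names are assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def split_samples(total: int, workers: int) -> list:
    """Divide `total` samples among `workers` as evenly as possible."""
    base, extra = divmod(total, workers)
    return [base + (1 if rank < extra else 0) for rank in range(workers)]


def count_hits(n_samples: int, seed: int) -> int:
    """Toy stand-in for ray tracing: count random points inside a quarter circle."""
    rng = random.Random(seed)  # private generator, so workers share no state
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def parallel_estimate(total: int, workers: int, base_seed: int = 0) -> float:
    """Estimate pi from partial tallies computed on independent sub-samples."""
    counts = split_samples(total, workers)
    seeds = [base_seed + rank for rank in range(workers)]  # one seed per worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_hits = list(pool.map(count_hits, counts, seeds))
    # The only communication step: sum the per-worker tallies.
    return 4.0 * sum(partial_hits) / total


print(parallel_estimate(200_000, 4))
```

Because workers exchange nothing until the final sum, the communication cost is tiny relative to the sampling work, which is one plausible reading of why the abstract reports high efficiencies. In CPython, threads will not actually speed up this loop; the example shows the decomposition pattern, not a performance result.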

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 11 no. 7
Type: Research Article
DOI: https://doi.org/10.1108/EUM0000000005983
ISSN: 0961-5539

Keywords

  • Radiation
  • Anisotropy
  • Monte Carlo simulation
  • Parallel computing
  • Heat transfer

Emerald Publishing
© 2021 Emerald Publishing Limited
