Search results

1 – 10 of 54
Article
Publication date: 3 July 2017

Alex A. Schmidt, Alice de Jesus Kozakevicius and Stefan Jakobsson

Abstract

Purpose

The current work aims to present a parallel code using the open multi-processing (OpenMP) programming model for an adaptive multi-resolution high-order finite difference scheme for solving 2D conservation laws, and to compare the efficiencies obtained with those of a previous message passing interface (MPI) formulation of the same serial scheme applied to the same type of 2D conservation laws.

Design/methodology/approach

The serial version of the code is naturally suited to parallelization because the spatial operator is formulated as a splitting scheme per direction, in which the flux components are computed numerically by a Lax–Friedrichs factorization independently for each row or column. High-order approximations of the numerical fluxes are computed by the third-order essentially non-oscillatory (ENO) and fifth-order weighted essentially non-oscillatory (WENO) interpolation schemes, assuming sparse grids in each direction. Grid adaptivity is obtained by a cubic interpolating wavelet transform applied in each space dimension, combined with a thresholding operator. Time integration uses a third-order TVD Runge–Kutta method.
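
Two ingredients named above, the Lax–Friedrichs flux splitting applied independently per row and the third-order TVD Runge–Kutta time step, can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a scalar flux f(u) and a global splitting constant alpha >= max|f'(u)|, the function names are hypothetical, and the ENO/WENO reconstruction and wavelet adaptivity are omitted. The time step is the standard Shu–Osher form of the method.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def lf_split(u: Vector, f: Callable[[float], float],
             alpha: float) -> Tuple[Vector, Vector]:
    """Lax-Friedrichs flux splitting f(u) = f+(u) + f-(u), with
    f+/-(u) = (f(u) +/- alpha*u) / 2 and alpha >= max|f'(u)|, so that
    f+ has non-negative and f- non-positive wave speeds."""
    f_plus = [0.5 * (f(ui) + alpha * ui) for ui in u]
    f_minus = [0.5 * (f(ui) - alpha * ui) for ui in u]
    return f_plus, f_minus


def split_rows(grid: List[Vector], f: Callable[[float], float],
               alpha: float) -> List[Tuple[Vector, Vector]]:
    """x-direction splitting applied row by row. No row reads another
    row's data, so rows can be handed to different threads (in compiled
    code, for example by an OpenMP parallel loop over rows)."""
    return [lf_split(row, f, alpha) for row in grid]


def ssp_rk3_step(u: Vector, dt: float,
                 L: Callable[[Vector], Vector]) -> Vector:
    """One step of the third-order TVD (SSP) Runge-Kutta scheme of
    Shu and Osher for du/dt = L(u)."""
    u1 = [a + dt * b for a, b in zip(u, L(u))]
    u2 = [0.75 * a + 0.25 * (b + dt * c) for a, b, c in zip(u, u1, L(u1))]
    return [a / 3.0 + 2.0 / 3.0 * (b + dt * c)
            for a, b, c in zip(u, u2, L(u2))]
```

For the linear test problem du/dt = -u, one step with dt = h reproduces 1 - h + h^2/2 - h^3/6, the cubic Taylor polynomial of exp(-h), which is a quick check of third-order accuracy.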

Findings

The parallel formulation is implemented automatically at compile time through OpenMP directives and library routines and is virtually transparent to the programmer. This greatly simplifies managing and updating the adaptive grid compared with what other parallel approaches require. Numerical simulation results and the large speedups obtained for the Euler equations of gas dynamics highlight the efficiency of the OpenMP approach.

Research limitations/implications

The resulting speedups reflect the effectiveness of the OpenMP approach but are, to a large extent, limited by the hardware used (two Intel Xeon E5-2620 processors, 6 cores each, 2 threads per core, hyper-threading enabled). As the number of OpenMP threads increases, the code starts to use the second logical thread available in each E5-2620 core, and efficiency drops. The speedup peaks at about 22–23 threads, close to the maximum of 24 logical threads. This peak reflects the hardware configuration, and the true software limit is likely well beyond this value.
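
For reference, speedup and parallel efficiency are conventionally defined as below, where T_1 is the serial run time and T_p the run time with p threads. These are the standard textbook definitions, assumed here rather than quoted from the paper; with p = 24 logical threads, ideal scaling would give S_p = 24 and E_p = 1.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
```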

Practical implications

So far, no attempt has been made to parallelize other code segments (for instance, the ENO-WENO-TVD code lines that process the different data components), which could push the speedup limit even higher. The fact that the speedup peak lies close to the present hardware limit reflects the scalability of both the OpenMP programming model and the splitting scheme. Consequently, the speedup peak of the OpenMP approach for this kind of problem formulation is likely to be close to the physical (and/or logical) limit of the hardware used.
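
Why parallelizing further code segments could raise the speedup ceiling can be illustrated with Amdahl's law, a standard model (not one used in the paper) in which a fraction f of the serial run time is parallelizable and the rest remains serial. The sketch below is hypothetical; the fractions in the loop are made-up inputs, not measurements from the paper.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup S(p) = 1 / ((1 - f) + f / p) under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


def efficiency(speedup: float, threads: int) -> float:
    """Parallel efficiency E = S / p."""
    return speedup / threads


# Hypothetical fractions: parallelizing more of the code (larger f)
# raises the attainable speedup on 24 threads.
for f in (0.90, 0.95, 0.99):
    s = amdahl_speedup(f, 24)
    print(f"f = {f:.2f}: speedup {s:5.2f}, efficiency {efficiency(s, 24):.2f}")
```

The model assumes every thread runs at full speed; on hyper-threaded cores the second logical thread shares a physical core, which is one reason measured efficiency can fall earlier than this model predicts.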

Social implications

This work is the result of a successful collaboration between researchers from two institutions: one internationally well known, with long-term experience in applied mathematics for industrial applications, and the other at the start of its international academic integration. This scientific partnership therefore has the potential to promote further knowledge exchange, involving students and other collaborators.

Originality/value

The proposed methodology (use of the OpenMP programming model for the wavelet adaptive splitting scheme) is original and contributes to a research area that has been very active in recent years, namely adaptive methods for conservation laws and their parallel formulations, which is of broad interest to the scientific community.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 27 no. 7
Type: Research Article
ISSN: 0961-5539

Article
Publication date: 7 February 2019

Tanvir Habib Sardar and Ahmed Rimaz Faizabadi

Abstract

Purpose

In recent years, there has been a gradual shift from sequential computing to parallel computing. Nowadays, nearly all computers have multicore processors, and parallel computing is necessary to exploit the available cores. It increases speed, making it possible to process huge amounts of data in real time. The purpose of this paper is to parallelize a set of well-known programs using different techniques and to determine the best way to parallelize each of the programs studied.

Design/methodology/approach

A set of numerical algorithms is parallelized both by hand, using OpenMP, and automatically, using the Pluto tool.

Findings

The work finds that a few of the algorithms are well suited to automatic parallelization with the Pluto tool, but many of the algorithms execute more efficiently with OpenMP hand parallelization.

Originality/value

The work provides an original study of parallelization using the OpenMP programming paradigm and the Pluto tool.

Details

Data Technologies and Applications, vol. 53 no. 1
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 5 May 2015

Guangtao Duan and Bin Chen

Abstract

Purpose

The purpose of this paper is to find the best solver for parallelizing particle methods based on solving the Pressure Poisson Equation (PPE), taking the Moving Particle Semi-Implicit (MPS) method as an example, because solving the PPE is usually the most time-consuming part and the most difficult to parallelize.

Design/methodology/approach

To find the best solver, the authors compare six Krylov solvers, namely the Conjugate Gradient method (CG), Scaled Conjugate Gradient method (SCG), Bi-Conjugate Gradient Stabilized (BiCGStab) method, Conjugate Gradient Squared (CGS) method, Symmetric Lanczos Algorithm (SLA) method and Incomplete Cholesky Conjugate Gradient method (ICCG), in terms of convergence, time consumption, parallel efficiency and memory consumption for the semi-implicit particle method. The MPS method is parallelized with a hybrid Open Multi-Processing (OpenMP)/Message Passing Interface (MPI) model. Dam-break flow and channel flow simulations are used to evaluate the performance of the different solvers.

Findings

It is found that CG converges stably, runs fastest in serial, uses the least memory and has the highest OpenMP parallel efficiency, but its MPI parallel efficiency is lower than that of SLA because SLA requires less synchronization than CG.
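
For readers unfamiliar with CG, a minimal dense-matrix sketch of the unpreconditioned method follows. It is a textbook version with hypothetical function names, not the MPS implementation (no sparse storage, no parallelization). Each iteration needs one matrix-vector product and two inner products; in a distributed-memory (MPI) run each inner product becomes a global reduction, that is, a synchronization point, which is the kind of cost the synchronization comparison with SLA refers to.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(A: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in A]


def conjugate_gradient(A: Matrix, b: Vector, tol: float = 1e-10,
                       max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual r = b - A x with x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ap = matvec(A, p)                  # the bulk of the parallel work
        alpha = rs_old / dot(p, Ap)        # inner product 1 (global reduction under MPI)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)                 # inner product 2 (global reduction under MPI)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For the 2 x 2 system [[4, 1], [1, 3]] x = [1, 2], the iteration returns x close to [1/11, 7/11] after two steps, as expected of CG in exact arithmetic for an n x n system.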

Originality/value

With all these criteria considered and weighed, the recommended parallel solver for the MPS method is CG.

Article
Publication date: 3 July 2018

Nen-Zi Wang and Hsin-Yi Chen

Abstract

Purpose

A cross-platform paradigm (computing model) for fluid-film lubrication analysis, which combines the graphical user interface of MATLAB with parallel Fortran programming, is proposed. The purpose of this paper is to take advantage of both the efficient multithreaded computing of OpenMP and MATLAB's user-friendly interface and real-time display capability.

Design/methodology/approach

The computing performance of MATLAB and Fortran code is validated by solving two simple sliders with iterative solution methods. An online display of the particles' search process is incorporated in the MATLAB code, while the air foil bearing optimum design is executed in the background using OpenMP multithreaded computing. The optimization of the air foil bearing design is performed with the particle swarm optimization method.
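
As background on the optimizer named above, a minimal global-best particle swarm optimizer is sketched below. It is a generic textbook variant with commonly used constriction-type coefficients (w about 0.73, c1 = c2 about 1.49), not the authors' implementation; the bounds, swarm size and quadratic test objective are hypothetical stand-ins for the air foil bearing design problem. The objective evaluations within one iteration are independent of each other, which is the part a multithreaded background computation could distribute.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(objective: Callable[[Sequence[float]], float],
                 bounds: List[Tuple[float, float]],
                 n_particles: int = 30, n_iter: int = 200,
                 w: float = 0.7298, c1: float = 1.49618, c2: float = 1.49618,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]        # independent evaluations
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
        values = [objective(p) for p in pos]       # independent evaluations
        for i, val in enumerate(values):
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Usage, with a hypothetical objective whose minimum is at (3, -1): `pso_minimize(lambda x: (x[0] - 3) ** 2 + (x[1] + 1) ** 2, [(-10, 10), (-10, 10)])`.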

Findings

It is found that the MATLAB programs require longer execution times than the Fortran programs when iterative methods are used. The execution time of the air foil bearing optimum design is significantly reduced by using OpenMP computing. The cross-platform paradigm provides a useful graphical user interface, and very little rewriting of the original numerical models, which are usually optimized for either serial or parallel computing, is required.

Research limitations/implications

Iterative methods are commonly applied in fluid-film lubrication analyses. In this study, iterative methods are used as the solution methods, although they may not be efficient in the MATLAB environment.

Originality/value

In this study, a cross-platform paradigm consisting of standalone MATLAB and Fortran codes is proposed. The approach combines the best of the two paradigms, and each code can be modified or maintained independently for different applications.

Details

Industrial Lubrication and Tribology, vol. 70 no. 6
Type: Research Article
ISSN: 0036-8792

Article
Publication date: 3 May 2013

Nikola Jeranče, Goran Stojanović, Nataša Samardžić and Daniel Kesler

Abstract

Purpose

The motivation for this research work is the need for an efficient software tool for calculating the inductance of components in flexible electronics. A software package, PROVOD, has been developed and produces very accurate results, but the numerical method it applies can require a huge amount of computation. The aim of this research is to apply parallel computing to this specific computational technique and to investigate the impact of increasing the number of parallel executing threads.

Design/methodology/approach

As many operations as possible are executed in parallel, exploiting the fact that the inductance between two segments is a sum of independent elements. OpenMP and Microsoft's Concurrency Runtime have been tested as parallel programming techniques.
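
Because the inductance between two segments is a sum of independent terms, the terms can be split into chunks, each chunk summed separately, and the partial sums added at the end; this is the pattern an OpenMP reduction expresses. The sketch below shows that pattern in Python using a midpoint-rule discretization of the Neumann formula for the mutual inductance of two non-intersecting straight filaments. The formula choice, the function names and the use of `concurrent.futures` are assumptions for illustration, not PROVOD's method; Python threads will not speed up this pure-Python arithmetic because of the global interpreter lock, so only the independent-partial-sum structure carries over.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, float, float]
Piece = Tuple[Point, Point]  # (midpoint, direction vector) of one sub-segment
MU0_OVER_4PI = 1e-7  # mu_0 / (4*pi), approximately 1e-7 H/m


def subdivide(a: Point, b: Point, n: int) -> List[Piece]:
    """Split the straight segment a->b into n equal pieces."""
    step = tuple((bj - aj) / n for aj, bj in zip(a, b))
    return [(tuple(aj + (k + 0.5) * sj for aj, sj in zip(a, step)), step)
            for k in range(n)]


def partial_sum(p1: List[Piece], p2: List[Piece], rows: range) -> float:
    """Sum the independent terms (dl_i . dl_j) / |r_i - r_j| for the
    pieces of the first segment whose indices are in `rows`."""
    total = 0.0
    for i in rows:
        m1, d1 = p1[i]
        for m2, d2 in p2:
            total += sum(x * y for x, y in zip(d1, d2)) / math.dist(m1, m2)
    return total


def mutual_inductance(seg1: Tuple[Point, Point], seg2: Tuple[Point, Point],
                      n: int = 200, workers: int = 4) -> float:
    """Midpoint-rule Neumann formula for two non-intersecting straight
    filaments; the term range is split into one chunk per worker."""
    p1, p2 = subdivide(*seg1, n), subdivide(*seg2, n)
    chunk = math.ceil(n / workers)
    ranges = [range(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is summed independently and the partial sums are then
        # combined, the same structure as an OpenMP reduction(+: ...) clause.
        total = sum(pool.map(lambda r: partial_sum(p1, p2, r), ranges))
    return MU0_OVER_4PI * total
```

For two parallel 1 m filaments 0.1 m apart, the result agrees closely with the closed-form mutual inductance of parallel filaments (about 4.19e-7 H).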

Findings

Parallel computing with different numbers of threads (up to 24) has been tested with OpenMP, and a significant increase in computational speed (up to 21 times) has been obtained.

Research limitations/implications

The research is limited by the available number of parallel processors.

Practical implications

Accurate and fast inductance calculation for flexible electronic components is achievable, and the benefit of parallel processing is demonstrated.

Social implications

The proposed method for accelerating inductance calculations can be helpful in the design and optimization of new flexible electronic devices.

Originality/value

Parallel computing is applied to the design of flexible electronic components. It is shown that a large number of parallel processors can be efficiently used in this type of calculation. The obtained results are interesting for people involved in the design of flexible components, and generally, for researchers/engineers dealing with similar electromagnetic problems.

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 32 no. 3
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 9 April 2019

Mohammad Mortezazadeh and Liangzhu (Leon) Wang

Abstract

Purpose

The purpose of this paper is the development of a new density-based (DB) semi-Lagrangian method to speed up the conventional pressure-based (PB) semi-Lagrangian methods.

Design/methodology/approach

Semi-Lagrangian-based solvers are typically PB, i.e. semi-Lagrangian pressure-based (SLPB) solvers, in which a Poisson equation is solved to obtain the pressure field and ensure a divergence-free flow field. As an elliptic-type equation, the Poisson equation is often solved iteratively, which can make parallel computing challenging and create a computing-speed bottleneck. This study proposes a new DB semi-Lagrangian method, the semi-Lagrangian artificial compressibility (SLAC) method, which replaces the Poisson equation with a hyperbolic continuity equation containing an added artificial compressibility (AC) term, so that a time-marching solution is possible. Without the Poisson equation, the proposed SLAC solver is faster, particularly for cases with more computational cells, and better suited to parallel computing.
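
In the classical (Chorin-type) artificial compressibility form, the continuity constraint that would otherwise lead to a pressure Poisson equation is replaced by an evolution equation for pressure with an artificial compressibility parameter beta. This is shown as an assumption for orientation only; the exact SLAC formulation may differ.

```latex
\frac{1}{\beta}\,\frac{\partial p}{\partial t} + \nabla \cdot \mathbf{u} = 0
```

Each grid point's pressure update then depends only on local velocity differences and can be advanced with the same explicit time-marching as the momentum equations; at steady state the time derivative vanishes and the divergence-free condition is recovered.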

Findings

The study compares the accuracy and computing speed of the SLPB and SLAC solvers for the lid-driven cavity flow and step-flow problems. It shows that the proposed SLAC solver achieves the same results as the SLPB solver, with a 3.03 times speedup before OpenMP parallelization and a 3.35 times speedup for the large grid case (512 × 512) after parallelization. The speedup can be improved further for larger cases because the condition number of the coefficient matrices of the Poisson equation increases.

Originality/value

This paper proposes a method of avoiding solving the Poisson equation, a typical computing bottleneck for semi-Lagrangian-based fluid solvers by converting the conventional PB solver (SLPB) to the DB solver (SLAC) through the addition of the AC term. The method simplifies and facilitates the parallelization process of semi-Lagrangian-based fluid solvers for modern HPC infrastructures, such as OpenMP and GPU computing.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 29 no. 6
Type: Research Article
ISSN: 0961-5539

Article
Publication date: 30 September 2014

Pedro Miguel de Almeida Areias, Timon Rabczuk and Joaquim Infante Barbosa

Abstract

Purpose

The purpose of this paper is to discuss the linear solution of equality constrained problems by using the Frontal solution method without explicit assembling.

Design/methodology/approach

Re-written frontal solution method with a priori pivot and front sequence. OpenMP parallelization, nearly linear (in elimination and substitution) up to 40 threads. Constraints enforced at the local assembling stage.

Findings

When compared with both standard sparse solvers and classical frontal implementations, memory requirements and code size are significantly reduced.

Research limitations/implications

Large, non-linear problems with constraints typically make use of the Newton method with Lagrange multipliers. For problems with a large number of constraints, matrix transformation methods (MTM) are often more cost-effective. The paper presents a complete solution, with topological ordering, for this problem.

Practical implications

A complete software package in Fortran 2003 is described. Examples of clique-based problems are shown with large systems solved in core.

Social implications

More realistic non-linear problems can be solved with this Frontal code at the core of the Newton method.

Originality/value

Use of topological ordering of constraints. A priori pivot and front sequences. No need for symbolic assembling. Constraints treated at the core of the Frontal solver. Use of OpenMP in the main Frontal loop, now quantified. Availability of the software.

Details

Engineering Computations, vol. 31 no. 7
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 25 June 2020

Abedalmuhdi Almomany, Ahmad M. Al-Omari, Amin Jarrah and Mohammad Tawalbeh

Abstract

Purpose

The problem of motif discovery has become a significant challenge in the era of big data, where hundreds of genomes require annotation. The importance of motifs has led many researchers to develop different tools and algorithms for finding them. The purpose of this paper is to propose a new algorithm that increases the speed and accuracy of the motif discovery process, as limited speed and accuracy are the main drawbacks of motif discovery algorithms.

Design/methodology/approach

All motifs are sorted in a tree-based indexing structure in which each motif is built from a combination of the nucleotides ‘A’, ‘C’, ‘T’ and ‘G’. The full motif is discovered by extending the search around 4-mers in both directions, left and right. The resulting motifs can be identical or degenerate, with various lengths.
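
The seed-and-extend idea described above can be sketched as follows. This is a simplified, hypothetical illustration: a dictionary stands in for the paper's tree-based index, only exact motifs are handled (no degenerate positions), and "support" is assumed to mean the number of input sequences containing a candidate, with a greedy one-nucleotide extension to the left or right at each step.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NUCLEOTIDES = "ACGT"


def build_kmer_index(seqs: List[str], k: int = 4) -> Dict[str, List[Tuple[int, int]]]:
    """Map each k-mer to the (sequence id, start position) pairs where it occurs.
    A dict is used here in place of the paper's tree-based index."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for sid, s in enumerate(seqs):
        for pos in range(len(s) - k + 1):
            index[s[pos:pos + k]].append((sid, pos))
    return dict(index)


def support(seqs: List[str], motif: str) -> int:
    """Number of sequences that contain the motif at least once."""
    return sum(motif in s for s in seqs)


def extend_motif(seqs: List[str], seed: str, min_support: int) -> str:
    """Greedily grow a seed by one nucleotide on the left or right while the
    best extension still occurs in at least min_support sequences."""
    if min_support < 1:
        raise ValueError("min_support must be >= 1")
    motif = seed
    while True:
        candidates = ([n + motif for n in NUCLEOTIDES]
                      + [motif + n for n in NUCLEOTIDES])
        best = max(candidates, key=lambda c: support(seqs, c))  # first best wins ties
        if support(seqs, best) < min_support:
            return motif
        motif = best
```

With the toy sequences "TTACGTAGG", "CACGTAGC" and "GACGTAGT", the 4-mers shared by all three are ACGT, CGTA and GTAG, and extending any of them with a required support of 3 yields ACGTAG.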

Findings

The developed implementation discovers conserved string motifs in DNA without having prior information about the motifs. Even for a large data set that contains millions of nucleotides and thousands of very long sequences, the entire process is completed in a few seconds.

Originality/value

Experimental results demonstrate the efficiency of the proposed implementation: for a real sequence of 1,270,000 nucleotides spread across 2,000 samples, the overall discovery process takes 5.9 s on an Intel Core i7-6700 @ 3.4 GHz machine and 26.7 s on an Intel Xeon X5670 @ 2.93 GHz machine. In addition, the authors have improved computational performance by parallelizing the implementation to run on multi-core machines using the OpenMP framework. The speedup achieved by the parallel implementation scales in proportion to the number of processors, with an efficiency close to 100%.

Details

Engineering Computations, vol. 38 no. 1
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 25 February 2014

S.H. Ju

Abstract

Purpose

This paper develops C++ and Fortran-90 solvers to establish parallel solution procedures in a finite element or meshless analysis program using shared memory computers. The paper aims to discuss these issues.

Design/methodology/approach

The stiffness matrix can be symmetrical or unsymmetrical, and the solution schemes include sky-line Cholesky and parallel preconditioned conjugate gradient-like methods.

Findings

By using the features of C++ or Fortran-90, the stiffness matrix and its auxiliary arrays can be encapsulated into a class or module as private arrays. This class or module will handle how to allocate, renumber, assemble, parallelize and solve these complicated arrays automatically.

Practical implications

The source codes can be obtained online at http://myweb.ncku.edu.tw/~juju. The major advantage of the scheme is that it is simple and systematic, so an efficient parallel finite element or meshless program can be established easily.

Originality/value

With the minimum requirement of computer memory, an object-oriented C++ class and a Fortran-90 module were established to allocate, renumber, assemble, parallelize and solve the global stiffness matrix, so that the programmer does not need to handle them directly.

Details

Engineering Computations, vol. 31 no. 1
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 7 November 2016

Diogo Tenório Cintra, Ramiro Brito Willmersdorf, Paulo Roberto Maciel Lyra and William Wagner Matos Lira

Abstract

Purpose

The purpose of this paper is to present a methodology of hybrid parallelization for the discrete element method that combines the message-passing interface and OpenMP to improve computational performance. The scheme relies on mapping procedures based on Hilbert space-filling curves (HSFC).

Design/methodology/approach

The methodology uses domain decomposition strategies to distribute the computation of large-scale models over a cluster. It also partitions the workload of each subdomain among threads. This additional procedure aims to reach higher computational performance by adjusting the use of message-passing artefacts and threads, the main objective being to reduce communication among processes. The work division among threads employs HSFC to improve data locality and avoid related overheads. The numerical simulations presented in this work make it possible to evaluate the parallel performance of the proposed method for models containing up to 3.2 million particles.
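
To illustrate how a Hilbert space-filling curve supports locality-preserving work division among threads, the sketch below maps 2D cell coordinates to a Hilbert index (the widely used iterative bit-manipulation algorithm for a 2^k by 2^k grid), sorts particles by the index of the cell they fall in, and cuts the sorted list into contiguous chunks, one per thread. It is a 2D simplification under stated assumptions (unit-square domain, equal-count chunks, hypothetical function names); the paper's 3D DEM models and exact mapping procedure may differ.

```python
from typing import List, Tuple


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve filling an n x n grid
    (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the next sub-quadrant has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_by_hilbert(points: List[Tuple[float, float]], n: int,
                         n_threads: int) -> List[List[int]]:
    """Assign particle indices to threads as contiguous runs along the curve.
    Points are assumed to lie in the unit square [0, 1)^2."""
    def key(i: int) -> int:
        px, py = points[i]
        cx = min(int(px * n), n - 1)
        cy = min(int(py * n), n - 1)
        return hilbert_index(n, cx, cy)

    order = sorted(range(len(points)), key=key)
    chunk = max(1, -(-len(order) // n_threads))  # ceiling division
    return [order[k:k + chunk] for k in range(0, len(order), chunk)]
```

Because consecutive Hilbert indices always belong to neighbouring cells, each thread's chunk tends to cover a compact region, so particles that interact are more likely to be handled by the same thread and to sit close together in memory.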

Findings

Distinct partitioning algorithms were used to evaluate the local decomposition scheme, including the recursive coordinate bisection method and a topological scheme based on METIS. The results show that the hybrid implementations achieve better computational performance than those based on message passing only, with good control of load balancing among threads. The case studies show good scalability and parallel efficiency.

Originality/value

The proposed approach defines a configurable execution environment for numerical models and introduces a combined scheme that improves data locality and iterative workload balancing.

Details

Engineering Computations, vol. 33 no. 8
Type: Research Article
ISSN: 0264-4401
