Search results

1 – 10 of 68
Article
Publication date: 5 April 2024

Abhishek Kumar Singh and Krishna Mohan Singh

Abstract

Purpose

In the present work, we focus on developing an in-house parallel meshless local Petrov-Galerkin (MLPG) code for the analysis of heat conduction in two-dimensional and three-dimensional regular as well as complex geometries.

Design/methodology/approach

The parallel MLPG code has been implemented using the open multi-processing (OpenMP) application programming interface (API) on a shared-memory multicore CPU architecture. Numerical simulations have been performed to identify the critical regions of the serial code, and the OpenMP-based parallel MLPG code has been developed around those critical regions.

Findings

Based on performance parameters such as speed-up and parallel efficiency, the credibility of the parallelization procedure has been established. The maximum speed-up and parallel efficiency are 10.94 and 0.92, respectively, for a regular three-dimensional geometry (343,000 nodes). The results demonstrate that parallelization is better suited to larger node counts, as both parallel efficiency and speed-up increase with the number of nodes.
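
For readers unfamiliar with the two performance parameters, the sketch below assumes their conventional definitions: speed-up is serial time divided by parallel time, and parallel efficiency is speed-up divided by the number of threads. The function names and the timings are hypothetical and are not taken from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speed-up: serial wall-clock time divided by parallel wall-clock time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def parallel_efficiency(s, n_threads):
    """Parallel efficiency: speed-up divided by the number of threads used."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return s / n_threads


# Hypothetical timings (seconds), chosen only to illustrate the arithmetic.
s = speedup(t_serial=120.0, t_parallel=12.0)
e = parallel_efficiency(s, n_threads=12)
print(f"speed-up = {s:.2f}, efficiency = {e:.2f}")
```

An efficiency close to 1 means that each added thread contributes almost a full thread's worth of work.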

Originality/value

Few attempts have been made in parallel implementation of the MLPG method for solving large-scale industrial problems. Although the literature suggests that message-passing interface (MPI) based parallel MLPG codes have been developed, the OpenMP model has rarely been touched. This work is an attempt at the development of OpenMP-based parallel MLPG code for the very first time.

Details

Engineering Computations, vol. 41 no. 2
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 3 July 2017

Alex A. Schmidt, Alice de Jesus Kozakevicius and Stefan Jakobsson

Abstract

Purpose

The current work aims to present a parallel code using the open multi-processing (OpenMP) programming model for an adaptive multi-resolution high-order finite difference scheme for solving 2D conservation laws, and to compare its efficiency with that of a previous message passing interface (MPI) formulation of the same serial scheme for the same type of 2D formulations.

Design/methodology/approach

The serial version of the code is naturally suitable for parallelization because the spatial operator formulation is based on a splitting scheme per direction, for which the flux components are numerically computed by a Lax–Friedrichs factorization independently for each row or column. High-order approximations for numerical fluxes are computed by the third-order essentially non-oscillatory (ENO) and fifth-order weighted essentially non-oscillatory (WENO) interpolation schemes, assuming sparse grids in each direction. The grid adaptivity is obtained by a cubic interpolating wavelet transform applied in each space dimension, combined with a threshold operator. Time is advanced by a third-order TVD Runge–Kutta method.
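
The abstract does not give the coefficients of the time integrator. The sketch below assumes the widely used Shu-Osher form of the third-order TVD Runge–Kutta scheme and applies it to a scalar test problem. The function name `tvd_rk3_step` and the right-hand side `rhs` are illustrative, and none of this is the authors' code.

```python
import math


def tvd_rk3_step(u, dt, rhs):
    """One step of the Shu-Osher third-order TVD Runge-Kutta scheme.

    `rhs(u)` returns the discretised spatial operator L(u); here it is
    any callable acting on a scalar state.
    """
    u1 = u + dt * rhs(u)                                 # first Euler stage
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1))           # convex combination
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * rhs(u2))   # final combination


# Test problem du/dt = -u with u(0) = 1, integrated to t = 1.
u, dt = 1.0, 0.1
for _ in range(10):
    u = tvd_rk3_step(u, dt, lambda v: -v)
print(u, math.exp(-1.0))
```

Each stage is a convex combination of forward Euler steps, which is what carries the stability property of the Euler step over to the full scheme.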

Findings

The parallel formulation is implemented automatically at compile time by the OpenMP library routines and is virtually transparent to the programmer. This greatly simplifies the management and updating of the adaptive grid compared with what other parallel approaches require. The numerical simulation results and the large speedups obtained for the Euler equations in gas dynamics highlight the efficiency of the OpenMP approach.

Research limitations/implications

The resulting speedups reflect the effectiveness of the OpenMP approach but are, to a large extent, limited by the hardware used (two Intel Xeon E5-2620 processors, 6 cores each, 2 threads per core, hyper-threading enabled). As the demand for OpenMP threads increases, the code starts to make explicit use of the second logical thread available in each E5-2620 processor core, and efficiency drops. The speedup peak is reached near the possible maximum (24), at about 22 to 23 threads. This peak reflects the hardware configuration, and the true software limit should lie well beyond this value.
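
The hardware ceiling quoted above follows from the processor configuration: 2 processors × 6 cores × 2 threads per core gives 24 logical threads. One common way to judge how far a measured speedup falls short of ideal is the Karp-Flatt metric, which estimates the serial fraction from the speedup and the thread count. The abstract does not use this metric and reports no speedup value, so the number below is hypothetical.

```python
def logical_threads(n_processors, cores_per_processor, threads_per_core):
    """Number of hardware threads the operating system can schedule on."""
    return n_processors * cores_per_processor * threads_per_core


def karp_flatt(measured_speedup, n_threads):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if n_threads < 2:
        raise ValueError("the metric needs at least two threads")
    return (1.0 / measured_speedup - 1.0 / n_threads) / (1.0 - 1.0 / n_threads)


p = logical_threads(2, 6, 2)      # configuration quoted in the abstract
e = karp_flatt(12.0, p)           # hypothetical measured speedup of 12
print(p, round(e, 4))
```

Because hyper-threads share a physical core, the metric is only a rough guide once the thread count exceeds the number of physical cores.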

Practical implications

So far, no attempts have been made to parallelize other possible code segments (for instance, the ENO-WENO-TVD code lines that process the different data components), which could potentially push the speedup limit even higher. The fact that the speedup peak is located close to the present hardware limit reflects the scalability of both the OpenMP programming model and the splitting scheme. Consequently, it is likely that the speedup peak with the OpenMP approach for this kind of problem formulation will be close to the physical (and/or logical) limit of the hardware used.

Social implications

This work is the result of a successful collaboration among researchers from two different institutions, one internationally well known and with long-term experience in applied mathematics for industrial applications, and the other in the early stages of building an international academic presence. This scientific partnership therefore has the potential to promote further knowledge exchange involving students and other collaborators.

Originality/value

The proposed methodology (use of the OpenMP programming model for the wavelet adaptive splitting scheme) is original and contributes to a research area that has been very active in recent years, namely adaptive methods for conservation laws and their parallel formulations, which is of great interest to the entire scientific community.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 27 no. 7
Type: Research Article
ISSN: 0961-5539

Article
Publication date: 7 February 2019

Tanvir Habib Sardar and Ahmed Rimaz Faizabadi

Abstract

Purpose

In recent years, there has been a gradual shift from sequential computing to parallel computing. Nowadays, nearly all computers have multicore processors. To exploit the available cores, parallel computing becomes necessary. It increases speed by processing huge amounts of data in real time. The purpose of this paper is to parallelize a set of well-known programs using different techniques in order to determine the best way to parallelize each program tested.

Design/methodology/approach

A set of numeric algorithms is parallelized both by hand using OpenMP and automatically using the Pluto tool.

Findings

The work finds that a few of the algorithms are well suited to automatic parallelization with the Pluto tool, but many of the algorithms execute more efficiently with OpenMP hand parallelization.

Originality/value

The paper presents original work on parallelization using the OpenMP programming paradigm and the Pluto tool.

Details

Data Technologies and Applications, vol. 53 no. 1
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 5 May 2015

Guangtao Duan and Bin Chen

Abstract

Purpose

The purpose of this paper is to find the best solver for parallelizing particle methods that solve the Pressure Poisson Equation (PPE), taking the Moving Particle Semi-Implicit (MPS) method as an example, because the solution of the PPE is usually the most time-consuming part and is difficult to parallelize.

Design/methodology/approach

To find the best solver, the authors compare six Krylov solvers, namely, the Conjugate Gradient (CG) method, the Scaled Conjugate Gradient (SCG) method, the Bi-Conjugate Gradient Stabilized (BiCGStab) method, the Conjugate Gradient Squared (CGS) method, the Symmetric Lanczos Algorithm (SLA) method and the Incomplete Cholesky Conjugate Gradient (ICCG) method, in terms of convergence, time consumption, parallel efficiency and memory consumption for the semi-implicit particle method. The MPS method is parallelized by the hybrid Open Multi-Processing (OpenMP)/Message Passing Interface (MPI) model. Dam-break flow and channel flow simulations are used to evaluate the performance of the different solvers.

Findings

It is found that CG converges stably, runs fastest in serial, uses the least memory and has the highest OpenMP parallel efficiency, but its MPI parallel efficiency is lower than that of SLA because SLA requires less synchronization than CG.
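
To make the synchronization remark concrete, here is a minimal textbook Conjugate Gradient sketch in pure Python for a small symmetric positive-definite system. It is not the authors' solver, and the helper names are illustrative. The matrix-vector product and the element-wise updates are the loops one would typically parallelize, while each dot product is a reduction that forces the workers to synchronize.

```python
def dot(u, v):
    """Dot product: a reduction, hence a synchronization point when parallel."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, v):
    """Dense matrix-vector product; the rows are independent of each other."""
    return [dot(row, v) for row in a]


def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive-definite matrix a."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - a x for the initial guess x = 0
    p = list(r)                 # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

Each iteration contains two dot products whose results every worker needs before continuing, which is where the synchronization cost in a distributed-memory setting comes from.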

Originality/value

With all these criteria considered and weighed, the recommended parallel solver for the MPS method is CG.

Open Access
Article
Publication date: 7 July 2022

Sirilak Ketchaya and Apisit Rattanatranurak

Abstract

Purpose

Sorting is a very important algorithm to solve problems in computer science. The most well-known divide and conquer sorting algorithm is quicksort. It starts with dividing the data into subarrays and finally sorting them.

Design/methodology/approach

In this paper, the algorithm named Dual Parallel Partition Sorting (DPPSort) is analyzed and optimized. It is built on a partitioning algorithm named Dual Parallel Partition (DPPartition). The partitions produced by DPPartition are sorted with the standard sorting functions qsort and STLSort, which are quicksort and introsort algorithms, respectively. The algorithm can run on any shared-memory multicore system. It uses the OpenMP library, which supports multiprocessing programming, and is designed to be compatible with the C/C++ standard library functions. The authors’ algorithm recursively divides an unsorted array into two halves in parallel, using Lomuto's partitioning and a merge step without compare-and-swap instructions. Then, qsort/STLSort is executed in parallel once the subarray is smaller than the sorting cutoff.
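
As background for the partition-then-sort structure described above, the following sequential sketch shows the classic Lomuto partition and a quicksort that hands small subarrays to a library sort. It is not DPPartition or DPPSort: there is no parallelism, Python's built-in `sorted` merely stands in for qsort/STLSort, and the cutoff value is arbitrary.

```python
def lomuto_partition(data, lo, hi):
    """Partition data[lo..hi] around the pivot data[hi]; return its final index."""
    pivot = data[hi]
    i = lo                                  # next slot for an element < pivot
    for j in range(lo, hi):
        if data[j] < pivot:
            data[i], data[j] = data[j], data[i]
            i += 1
    data[i], data[hi] = data[hi], data[i]   # move the pivot into place
    return i


def quicksort_with_cutoff(data, lo=0, hi=None, cutoff=16):
    """Quicksort that delegates subarrays of at most `cutoff` items to sorted()."""
    if hi is None:
        hi = len(data) - 1
    if hi - lo + 1 <= cutoff:
        data[lo:hi + 1] = sorted(data[lo:hi + 1])
        return
    p = lomuto_partition(data, lo, hi)
    quicksort_with_cutoff(data, lo, p - 1, cutoff)   # independent of the right half
    quicksort_with_cutoff(data, p + 1, hi, cutoff)


values = [3, 1, 4, 1, 5, 9, 2, 6]
print(lomuto_partition(values, 0, len(values) - 1), values)
```

The two recursive calls operate on disjoint index ranges, which is the property a parallel variant can exploit.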

Findings

In the authors’ experiments, a 4-core Intel i7-6770 system running Ubuntu Linux is used. DPPSort is up to 6.82× and 5.88× faster than qsort and STLSort, respectively, on Uint64 random distributions.

Originality/value

The authors improve the performance of the parallel sorting algorithm by reducing the number of compare-and-swap instructions in the algorithm. This concept can be applied to related problems to increase the speedup of other algorithms.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 3 July 2018

Nen-Zi Wang and Hsin-Yi Chen

Abstract

Purpose

A cross-platform paradigm (computing model), which combines the graphical user interface of MATLAB and parallel Fortran programming, for fluid-film lubrication analysis is proposed. The purpose of this paper is to take advantage of both the effective multithreaded computing of OpenMP and MATLAB’s user-friendly interface and real-time display capability.

Design/methodology/approach

A validation of computing performance of MATLAB and Fortran coding for solving two simple sliders by iterative solution methods is conducted. The online display of the particles’ search process is incorporated in the MATLAB coding, and the execution of the air foil bearing optimum design is conducted by using OpenMP multithreaded computing in the background. The optimization analysis is conducted by particle swarm optimization method for an air foil bearing design.

Findings

It is found that the MATLAB programs require longer execution times than the Fortran programs for iterative methods. The execution time of the air foil bearing optimum design is significantly reduced by using OpenMP computing. As a result, the cross-platform paradigm can provide a useful graphical user interface, and very little rewriting is required of the original numerical models, which are usually optimized for either serial or parallel computing.

Research limitations/implications

Iterative methods are commonly applied in fluid-film lubrication analyses. In this study, iterative methods are used as the solution methods, which may not be an effective way to compute in the MATLAB setting.

Originality/value

In this study, a cross-platform paradigm consisting of standalone MATLAB and Fortran codes is proposed. The approach combines the best of the two paradigms, and each code can be modified or maintained independently for different applications.

Details

Industrial Lubrication and Tribology, vol. 70 no. 6
Type: Research Article
ISSN: 0036-8792

Article
Publication date: 3 May 2013

Nikola Jeranče, Goran Stojanović, Nataša Samardžić and Daniel Kesler

Abstract

Purpose

The motivation for this research work is the need for an efficient software tool for inductance calculation of components in flexible electronics. A software package, PROVOD, has been developed, and it has produced very accurate results, but the applied numerical method can lead to a huge number of calculations. The aim of this research is to apply parallel computing to this specific computational technique and to investigate the impact of increasing the number of threads executing in parallel.

Design/methodology/approach

As many operations as possible are executed in parallel, using the fact that the inductance between two segments is a sum of independent elements. OpenMP and Microsoft's Concurrency Runtime have been tested as parallel programming techniques.
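
A sum of independent elements can be split into chunks, each summed by a separate worker, with the partial sums combined at the end. The sketch below illustrates that decomposition with Python's `concurrent.futures`. The term `element_contribution` is a hypothetical stand-in and is not PROVOD's inductance formula. The sketch shows the structure only, since CPython threads do not speed up CPU-bound work the way OpenMP threads do.

```python
from concurrent.futures import ThreadPoolExecutor


def element_contribution(k):
    """Hypothetical stand-in for one independent term of the sum."""
    return 1.0 / (k + 1) ** 2


def chunk_sum(bounds):
    """Sum the terms with indices in [start, stop); no shared state is touched."""
    start, stop = bounds
    return sum(element_contribution(k) for k in range(start, stop))


def parallel_sum(n_terms, n_workers=4):
    """Split the index range into chunks and add up the partial sums."""
    if n_terms <= 0:
        return 0.0
    step = -(-n_terms // n_workers)          # ceiling division
    chunks = [(s, min(s + step, n_terms)) for s in range(0, n_terms, step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(chunk_sum, chunks))


print(parallel_sum(1000))
```

Because the terms do not depend on each other, the only coordination needed is the final addition of the partial sums, which corresponds to a reduction in OpenMP terms.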

Findings

Parallel computing with a different number of threads (up to 24) has been tested with OpenMP. A significant increase in computational speed (up to 21 times) has been obtained.

Research limitations/implications

The research is limited by the available number of parallel processors.

Practical implications

Accurate and fast inductance calculation for flexible electronic components is possible to achieve. The impact of parallel processing is proven.

Social implications

The proposed method for accelerating inductance calculation can be helpful in the design and optimization of new flexible devices in electronics.

Originality/value

Parallel computing is applied to the design of flexible electronic components. It is shown that a large number of parallel processors can be efficiently used in this type of calculation. The obtained results are interesting for people involved in the design of flexible components, and generally, for researchers/engineers dealing with similar electromagnetic problems.

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 32 no. 3
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 9 April 2019

Mohammad Mortezazadeh and Liangzhu (Leon) Wang

Abstract

Purpose

The purpose of this paper is the development of a new density-based (DB) semi-Lagrangian method to speed up the conventional pressure-based (PB) semi-Lagrangian methods.

Design/methodology/approach

The semi-Lagrangian-based solvers are typically PB, i.e. semi-Lagrangian pressure-based (SLPB) solvers, where a Poisson equation is solved to obtain the pressure field and ensure a divergence-free flow field. As an elliptic-type equation, the Poisson equation often relies on an iterative solution, so it can pose a challenge for parallel computing and become a bottleneck for computing speed. This study proposes a new DB semi-Lagrangian method, i.e. the semi-Lagrangian artificial compressibility (SLAC), which replaces the Poisson equation with a hyperbolic continuity equation with an added artificial compressibility (AC) term, so a time-marching solution is possible. Without the Poisson equation, the proposed SLAC solver is faster, particularly for cases with more computational cells, and better suited for parallel computing.
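
For orientation, the classical artificial compressibility formulation (due to Chorin) replaces the incompressibility constraint with an evolution equation for the pressure. The form below is the textbook one, with \beta denoting an artificial compressibility parameter. The exact equation used in the paper is not given in the abstract and may differ.

```latex
\frac{1}{\beta}\,\frac{\partial p}{\partial t} + \nabla \cdot \mathbf{u} = 0
```

When the pressure field stops changing, the time derivative vanishes and the divergence-free condition \nabla \cdot \mathbf{u} = 0 is recovered, so the pressure can be advanced explicitly without solving an elliptic system.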

Findings

The study compares the accuracy and the computing speeds of the SLPB and SLAC solvers for the lid-driven cavity flow and the step-flow problems. It shows that the proposed SLAC solver achieves the same results as the SLPB solver, with a 3.03 times speed-up before OpenMP parallelization and a 3.35 times speed-up for the large grid number case (512 × 512) after parallelization. The speed-up can improve further for larger cases because the condition number of the coefficient matrices of the Poisson equation increases.

Originality/value

This paper proposes a method of avoiding solving the Poisson equation, a typical computing bottleneck for semi-Lagrangian-based fluid solvers by converting the conventional PB solver (SLPB) to the DB solver (SLAC) through the addition of the AC term. The method simplifies and facilitates the parallelization process of semi-Lagrangian-based fluid solvers for modern HPC infrastructures, such as OpenMP and GPU computing.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 29 no. 6
Type: Research Article
ISSN: 0961-5539

Article
Publication date: 30 September 2014

Pedro Miguel de Almeida Areias, Timon Rabczuk and Joaquim Infante Barbosa

Abstract

Purpose

The purpose of this paper is to discuss the linear solution of equality constrained problems by using the Frontal solution method without explicit assembling.

Design/methodology/approach

Re-written frontal solution method with a priori pivot and front sequence. OpenMP parallelization, nearly linear (in elimination and substitution) up to 40 threads. Constraints enforced at the local assembling stage.

Findings

When compared with both standard sparse solvers and classical frontal implementations, memory requirements and code size are significantly reduced.

Research limitations/implications

Large, non-linear problems with constraints typically make use of the Newton method with Lagrange multipliers. In the context of the solution of problems with large number of constraints, the matrix transformation methods (MTM) are often more cost-effective. The paper presents a complete solution, with topological ordering, for this problem.

Practical implications

A complete software package in Fortran 2003 is described. Examples of clique-based problems are shown with large systems solved in core.

Social implications

More realistic non-linear problems can be solved with this Frontal code at the core of the Newton method.

Originality/value

Use of topological ordering of constraints. A-priori pivot and front sequences. No need for symbolic assembling. Constraints treated at the core of the Frontal solver. Use of OpenMP in the main Frontal loop, now quantified. Availability of Software.

Details

Engineering Computations, vol. 31 no. 7
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 25 June 2020

Abedalmuhdi Almomany, Ahmad M. Al-Omari, Amin Jarrah and Mohammad Tawalbeh

Abstract

Purpose

The problem of motif discovery has become a significant challenge in the era of big data, where there are hundreds of genomes requiring annotations. The importance of motifs has led many researchers to develop different tools and algorithms for finding them. The purpose of this paper is to propose a new algorithm that increases the speed and accuracy of the motif discovery process, which are the main drawbacks of existing motif discovery algorithms.

Design/methodology/approach

All motifs are sorted in a tree-based indexing structure where each motif is created from a combination of nucleotides: ‘A’, ‘C’, ‘T’ and ‘G’. The full motif can be discovered by extending the search around 4-mer nucleotides in both directions, left and right. The resulting motifs may be identical or degenerate, with various lengths.
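
The idea of growing a motif outward from a 4-mer seed can be sketched as follows. This is a simplified illustration under strong assumptions and not the authors' algorithm: it handles exact matches only (no degenerate motifs), uses plain string search instead of a tree-based index, and the function names are hypothetical.

```python
def find_seed_positions(sequences, seed):
    """Return (sequence index, offset) for every occurrence of the seed."""
    hits = []
    for idx, seq in enumerate(sequences):
        start = seq.find(seed)
        while start != -1:
            hits.append((idx, start))
            start = seq.find(seed, start + 1)
    return hits


def _same_char(sequences, hits, offset):
    """True if every hit has the same in-range character at pos + offset."""
    chars = set()
    for idx, pos in hits:
        j = pos + offset
        if j < 0 or j >= len(sequences[idx]):
            return False                     # ran off the end of a sequence
        chars.add(sequences[idx][j])
    return len(chars) == 1


def extend_seed(sequences, seed):
    """Grow the seed left and right while all occurrences still agree."""
    hits = find_seed_positions(sequences, seed)
    if not hits:
        return ""
    left, right = 0, len(seed)
    while _same_char(sequences, hits, right):        # extend to the right
        right += 1
    while _same_char(sequences, hits, -(left + 1)):  # extend to the left
        left += 1
    idx0, pos0 = hits[0]
    return sequences[idx0][pos0 - left:pos0 + right]


reads = ["GGTACGTAA", "CTACGTAC", "TTTACGTAG"]
print(extend_seed(reads, "ACGT"))
```

Each seed can be extended independently of the others, which is one reason this kind of search lends itself to multi-core execution.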

Findings

The developed implementation discovers conserved string motifs in DNA without having prior information about the motifs. Even for a large data set that contains millions of nucleotides and thousands of very long sequences, the entire process is completed in a few seconds.

Originality/value

Experimental results demonstrate the efficiency of the proposed implementation: for a real sequence data set of 1,270,000 nucleotides spread over 2,000 samples, the overall discovery process takes 5.9 s on an Intel Core i7-6700 @ 3.4 GHz machine and 26.7 s on an Intel Xeon x5670 @ 2.93 GHz machine. In addition, the authors have improved computational performance by parallelizing the implementation to run on multi-core machines using the OpenMP framework. The speedup achieved by parallelization scales in proportion to the number of processors, with an efficiency close to 100%.

Details

Engineering Computations, vol. 38 no. 1
Type: Research Article
ISSN: 0264-4401
