Search results

1 – 10 of 264
Article
Publication date: 7 February 2019

Tanvir Habib Sardar and Ahmed Rimaz Faizabadi

Abstract

Purpose

In recent years, there has been a gradual shift from sequential computing to parallel computing. Nowadays, nearly all computers have multicore processors. To exploit the available cores, parallel computing becomes necessary. It increases speed when processing huge amounts of data in real time. The purpose of this paper is to parallelize a set of well-known programs using different techniques in order to determine the best way to parallelize each program tested.

Design/methodology/approach

A set of numeric algorithms is parallelized both by hand, using OpenMP, and automatically, using the Pluto tool.

Findings

The work finds that few of the algorithms are well suited to auto parallelization with the Pluto tool, whereas many of the algorithms execute more efficiently with OpenMP hand parallelization.

Originality/value

The paper provides an original study of parallelization using the OpenMP programming paradigm and the Pluto tool.

Details

Data Technologies and Applications, vol. 53 no. 1
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 21 November 2008

Rabab Hayek, Guillaume Raschia, Patrick Valduriez and Noureddine Mouaddib

Abstract

Purpose

The goal of this paper is to contribute to the development of both data localization and description techniques in P2P systems.

Design/methodology/approach

The approach consists of introducing a novel indexing technique that relies on linguistic data summarization into the context of P2P systems.

Findings

Both the cost model of the approach and the simulation results show that the approach allows data summaries to be maintained efficiently, without incurring high traffic overhead. In addition, the cost of query routing is significantly reduced when summaries are used.

Research limitations/implications

The paper considers a summary service defined on the APPA architecture. Future work must study how to extend this approach so that it is generally applicable to any P2P data management system.

Practical implications

This paper mainly studies the quantitative gain that can be obtained in query processing by exploiting data summaries. Future work aims to apply this technique to real (not synthetic) data in order to study the qualitative gain that can be obtained from answering a query approximately.

Originality/value

The novelty of the approach lies in the double exploitation of summaries in P2P systems: data summaries allow for semantic‐based query routing, and also for approximate query answering, using their intentional descriptions.

Details

International Journal of Pervasive Computing and Communications, vol. 4 no. 4
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 1 June 2005

André Buchau, Wolfgang Hafla, Friedemann Groh and Wolfgang M. Rucker

Abstract

Purpose

Various parallelization strategies are investigated, mainly to reduce the computational costs in the context of boundary element methods with a compressed system matrix.

Design/methodology/approach

Electrostatic field problems are solved numerically by an indirect boundary element method. The fully dense system matrix is compressed by an application of the fast multipole method. Various parallelization techniques such as vectorization, multiple threads, and multiple processes are applied to reduce the computational costs.

Findings

It is shown that, overall, a good speedup is achieved by a parallelization approach which is relatively easy to implement. Furthermore, a detailed discussion of the influence of problem-oriented meshes on the different parts of the method is presented. On the one hand, the application of problem-oriented meshes leads to relatively small linear systems of equations along with a high accuracy of the solution, but on the other hand, the efficiency of the parallelization itself is diminished.

Research limitations/implications

The presented parallelization approach has been tested on a small PC cluster only. Additionally, the main focus has been on reducing computing time.

Practical implications

The investigated numerical example exhibits the typical properties of general static field problems. Hence, the results and conclusions are fairly general.

Originality/value

Implementation details of a parallelization of existing fast and efficient boundary element method solvers are discussed. The presented approach is relatively easy to implement and takes special properties of fast methods in combination with parallelization into account.

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 24 no. 2
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 7 March 2016

Mahmoud Yazdani, Hamidreza Paseh and Mostafa Sharifzadeh

Abstract

Purpose

The purpose of this paper is to find a suitable contact detection algorithm for use in distinct element simulation.

Design/methodology/approach

Because contact detection takes most of the computational effort, the performance of the contact detection algorithm strongly affects the running time. The algorithms investigated in this study are Incremental Sort-and-Update (ISU) and Double-Ended Spatial Sorting (DESS). These algorithms are based on bounding boxes, which makes them independent of block shapes. Both the ISU and DESS algorithms contain sorting and updating phases. To compare the algorithms, they were implemented on identical examples of rock engineering problems with varying parameters.
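
The idea of detecting contacts through sorted bounding boxes can be illustrated with a generic sort-and-sweep broad phase. The sketch below is not the ISU or DESS algorithm from the paper; it is a minimal illustration under assumptions of our own (two dimensions, axis-aligned boxes, and the hypothetical names `Box` and `sweep_and_prune`) of why bounding boxes make the test independent of block shape.

```python
from typing import List, Tuple

# Hypothetical box layout chosen for this sketch: (xmin, xmax, ymin, ymax).
Box = Tuple[float, float, float, float]


def sweep_and_prune(boxes: List[Box]) -> List[Tuple[int, int]]:
    """Return index pairs of boxes whose bounding boxes overlap.

    Sorting phase: order the boxes by their lower x bound.
    Sweep phase: keep a list of boxes whose x interval is still open
    and test only those against the current box on the y axis.
    """
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    active: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i in order:
        xmin_i, _, ymin_i, ymax_i = boxes[i]
        # Drop boxes whose x interval ended before the current box starts.
        active = [j for j in active if boxes[j][1] >= xmin_i]
        for j in active:
            _, _, ymin_j, ymax_j = boxes[j]
            # The x intervals already overlap; check the y intervals.
            if ymin_i <= ymax_j and ymin_j <= ymax_i:
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return sorted(pairs)
```

Only the box extents enter the test, so the same routine applies whatever the shape of the underlying blocks. A finer, shape-aware check would then be run only on the candidate pairs that are returned.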

Findings

The results show that the ISU algorithm gives lower running time and shows better performance when blocks are unevenly distributed in both axes. The conventional ISU merges the sorting and updating phases in its naïve implementation. In this paper, a new computational technique is proposed based on parallelization in order to effectively improve the ISU algorithm and decrease the running time of numerical analysis in large-scale rock mass projects.

Originality/value

In this approach, the sorting and updating phases are separated by minor changes to the algorithm. This incurs minimal running-time overhead and slightly higher memory usage, after which the phases can be parallelized. The updating phase of the ISU algorithm accounts for about 30 percent of the total time, which makes the parallelization justifiable. According to the results for large-scale problems, this improved technique can increase the performance of the ISU algorithm by up to 20 percent.
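
The two figures quoted above can be related through Amdahl's law. The abstract does not cite this law, so the calculation below is only an illustrative assumption: it treats the 30 percent updating phase as the sole part that runs in parallel and ignores any parallelization overhead. The function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the run time
    is spread evenly over `workers` workers (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed input taken from the abstract: the updating phase is about 30 percent.
speedups = {w: amdahl_speedup(0.30, w) for w in (2, 4, 8)}
```

Under these assumptions the speedup can never exceed 1 / 0.7, or about 1.43, however many workers are used, and two workers already give about 1.18. A reported gain of up to 20 percent is therefore of the order this simple model allows, although the abstract does not say how performance was measured.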

Details

Engineering Computations, vol. 33 no. 1
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 5 May 2015

Guangtao Duan and Bin Chen

Abstract

Purpose

The purpose of this paper is to find the best solver for parallelizing particle methods that solve the Pressure Poisson Equation (PPE), taking the Moving Particle Semi-Implicit (MPS) method as an example, because the solution of the PPE is usually the most time-consuming part and is difficult to parallelize.

Design/methodology/approach

To find the best solver, the authors compare six Krylov solvers, namely the Conjugate Gradient (CG) method, the Scaled Conjugate Gradient (SCG) method, the Bi-Conjugate Gradient Stabilized (BiCGStab) method, the Conjugate Gradient Squared (CGS) method, the Symmetric Lanczos Algorithm (SLA) method and the Incomplete Cholesky Conjugate Gradient (ICCG) method, in terms of convergence, time consumption, parallel efficiency and memory consumption for the semi-implicit particle method. The MPS method is parallelized by the hybrid Open Multi-Processing (OpenMP)/Message Passing Interface (MPI) model. Dam-break flow and channel flow simulations are used to evaluate the performance of the different solvers.

Findings

It is found that CG converges stably, runs fastest in serial, uses the least memory and has the highest OpenMP parallel efficiency, but its MPI parallel efficiency is lower than that of SLA because SLA requires less synchronization than CG.
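
To make the remark about synchronization concrete, the following is a minimal serial sketch of the textbook conjugate gradient iteration for a symmetric positive definite system. It is not the authors' solver: the dense list-of-lists matrix, the helper names `dot` and `matvec`, and the stopping tolerance are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def conjugate_gradient(A: Matrix, b: Vector,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the zero initial guess
    p = list(r)              # first search direction
    rs_old = dot(r, r)       # inner product no. 1
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)      # inner product no. 2
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = dot(r, r)               # inner product no. 1 of next step
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Each iteration needs two inner products. In a distributed-memory run every inner product requires a global reduction across processes, which is a synchronization point, so the number of such reductions per iteration is one plausible place where solvers differ in MPI efficiency.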

Originality/value

With all these criteria considered and weighed, the recommended parallel solver for the MPS method is CG.

Article
Publication date: 4 May 2012

Frode Nygård and Helge I. Andersson

Abstract

Purpose

The purpose of this paper is to describe a pragmatic parallelization of a publicly available serial code intended for direct numerical simulations of turbulent flow fields. The code solves the full Navier‐Stokes equations in a cylindrical coordinate system.

Design/methodology/approach

The parallelization is performed by a single program multiple data approach using the Message‐Passing Interface (MPI) Library for processor communication.

Findings

In order to maintain the original coding of the subroutines, two obstacles had to be overcome. First, special attention had to be given to the inversion of the sparse matrices arising from the linear terms in the Navier‐Stokes equations, which are solved by an implicit scheme. Second, the serial FFT routines needed for the direct Poisson solver had to be replaced by parallel versions. Two directions of parallelization were tested. Parallelization in the axial direction turned out to be more efficient than parallelization in the circumferential direction.

Originality/value

This paper presents a pragmatic parallelization of an open source finite difference code and should be useful to researchers in the field of numerical methods for fluid flow who need to parallelize a numerical code.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 22 no. 4
Type: Research Article
ISSN: 0961-5539

Article
Publication date: 1 March 2012

Sander Lenferink, Jos Arts, Taede Tillema, Marcelle van Valkenburg and Roel Nijsten

Abstract

Traditionally, in the Netherlands, the procurement procedure for infrastructure does not start until the public decision-making procedure is fully completed. In the new procurement strategy, early contractor involvement is applied by carrying out the procurement procedure and the public planning procedure simultaneously. This article explores the first experiences and lessons learned with early contractor involvement in four Dutch infrastructure projects. It can be concluded that the new strategy adds value in terms of time gains, improved project control and more innovative solutions. However, to optimize early contractor involvement, the differences between the competitive procurement procedures and the open, cooperative public planning procedures need to be bridged.

Details

Journal of Public Procurement, vol. 12 no. 1
Type: Research Article
ISSN: 1535-0118

Article
Publication date: 4 March 2014

Yuji Sato and Mikiko Sato

Abstract

Purpose

The purpose of this paper is to propose a fault-tolerant technology for increasing the durability of application programs when evolutionary computation is performed by fast parallel processing on many-core processors such as graphics processing units (GPUs) and multi-core processors (MCPs).

Design/methodology/approach

For distributed genetic algorithm (GA) models, the paper proposes a method where an island's ID number is added to the header of data transferred by this island for use in fault detection.
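
A header check of this kind might look like the sketch below. The abstract states only that the island's ID number is added to the header of transferred data; how the receiver uses it is not described, so the packet layout, the fixed expected sender and the names `MigrationPacket` and `accept_migrants` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MigrationPacket:
    island_id: int                 # header: ID of the island that sent the data
    individuals: List[List[int]]   # payload: migrating individuals as bit lists


def accept_migrants(packet: MigrationPacket, expected_sender: int,
                    num_islands: int) -> List[List[int]]:
    """Return the payload if the header is plausible, else an empty list.

    Assumption: each island knows which neighbour it should hear from,
    so a missing, out-of-range or unexpected ID is treated as a fault
    and the payload is discarded rather than merged into the population.
    """
    if not 0 <= packet.island_id < num_islands:
        return []
    if packet.island_id != expected_sender:
        return []
    return packet.individuals
```

Discarding a suspect packet only costs the receiving island one round of migrants, which is one possible reading of why a scheme like this could tolerate a faulty island while the remaining islands continue to evolve.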

Findings

The paper shows that the processing time added by the proposed method is practically negligible in applications, that an optimal solution can be obtained even with a single stuck-at fault or a transient fault, and that increasing the number of parallel threads makes the system less susceptible to faults.

Originality/value

The study described in this paper is a new approach to increasing the sustainability of application programs that use a distributed GA on GPUs and MCPs.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 7 no. 1
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 3 July 2020

Azra Nazir, Roohie Naaz Mir and Shaima Qureshi

Abstract

Purpose

The trend of “Deep Learning for Internet of Things (IoT)” has gained fresh momentum, with an enormous number of upcoming applications employing these models as their processing engine and the Cloud as their resource giant. But this picture leads to underutilization of the ever-increasing pool of IoT devices, which had already passed the 15 billion mark in 2015. Thus, it is high time to explore a different approach to tackle this issue, keeping in view the characteristics and needs of the two fields. Processing at the Edge can boost applications with real-time deadlines while complementing security.

Design/methodology/approach

This review paper contributes towards three cardinal directions of research in the field of DL for IoT. The first section covers the categories of IoT devices and how Fog can aid in overcoming the underutilization of millions of devices, forming the realm of the things for IoT. The second direction handles the issue of immense computational requirements of DL models by uncovering specific compression techniques. An appropriate combination of these techniques, including regularization, quantization, and pruning, can aid in building an effective compression pipeline for establishing DL models for IoT use-cases. The third direction incorporates both these views and introduces a novel approach of parallelization for setting up a distributed systems view of DL for IoT.
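
Two of the compression steps named above, pruning and quantization, can be sketched on a plain list of weights. This is a generic illustration rather than the pipeline proposed in the review: magnitude-based pruning, symmetric uniform quantization and the function names are assumptions, and regularization is left out because it acts during training.

```python
from typing import List


def prune_by_magnitude(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the given fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def quantize_uniform(weights: List[float], bits: int = 8) -> List[float]:
    """Round weights to a symmetric grid with 2**(bits - 1) - 1 levels
    per sign and return the de-quantized values."""
    if bits < 2:
        raise ValueError("bits must be at least 2")
    levels = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


# Pipeline order assumed for this sketch: prune first, then quantize.
compressed = quantize_uniform(prune_by_magnitude([0.5, -0.1, 0.05, -0.8], 0.5))
```

Pruned weights stay exactly zero after quantization, so the sparsity gained in the first step survives the second, and each remaining weight moves by at most half a quantization step.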

Findings

DL models are growing deeper with every passing year. Well-coordinated distributed execution of such models using Fog shows a promising future for the IoT application realm. It is found that a vertically partitioned compressed deep model can handle the trade-off between size, accuracy, communication overhead, bandwidth utilization, and latency, but at the expense of a considerably larger memory footprint. To reduce the memory budget, we propose to exploit Hashed Nets as potentially favorable candidates for distributed frameworks. However, the critical point between accuracy and size for such models needs further investigation.

Originality/value

To the best of our knowledge, no study has explored the inherent parallelism in deep neural network architectures for their efficient distribution over the Edge-Fog continuum. Besides covering techniques and frameworks that have tried to bring inference to the Edge, the review uncovers significant issues and possible future directions for endorsing deep models as processing engines for real-time IoT. The study is directed to both researchers and industrialists to take on various applications to the Edge for better user experience.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 3
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 1 June 1999

P.J. Coelho and J. Gonçalves

Abstract

The finite volume method for radiative heat transfer calculations has been parallelized using two strategies, the angular domain decomposition and the spatial domain decomposition. In the first case each processor performs the calculations for the whole domain and for a subset of control angles, while in the second case each processor deals with all the control angles but only treats a spatial subdomain. The method is applied to three‐dimensional rectangular enclosures containing a grey emitting‐absorbing medium. The results obtained show that the number of iterations required to achieve convergence is independent of the number of processors in the angular decomposition strategy, but increases with the number of processors in the domain decomposition method. As a consequence, higher parallel efficiencies are obtained in the first case. The influence of the angular discretization, grid size and absorption coefficient of the medium on the parallel performance is also investigated.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 9 no. 4
Type: Research Article
ISSN: 0961-5539
