Search results

1 – 10 of 120
Article
Publication date: 24 August 2018

Hongbin Liu, Xinrong Su and Xin Yuan

Abstract

Purpose

Adopting large eddy simulation (LES) for the complex flow in turbomachinery overcomes the limitations of current Reynolds-averaged Navier–Stokes modelling and provides a deeper understanding of the complicated transitional and turbulent flow mechanisms; however, its large computational cost limits its application to high-Reynolds-number flows. This study aims to develop a three-dimensional GPU-enabled parallel unstructured solver to speed up high-fidelity LES simulation.

Design/methodology/approach

Compared with central processing units (CPUs), graphics processing units (GPUs) can provide much higher computational speed. This work develops a three-dimensional GPU-enabled parallel unstructured solver to speed up high-fidelity LES simulation. A set of low-dissipation schemes designed for unstructured meshes is implemented with the compute unified device architecture (CUDA) programming model. Several key parameters affecting the performance of the GPU code are discussed, and further speed-up is obtained by analysing the underlying finite-volume-based numerical scheme.
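
As a rough illustration of the face-based gather/scatter pattern that a finite-volume solver of this kind parallelizes on the GPU (one thread per face, with atomic or coloured scatter-adds), here is a minimal NumPy sketch; the scalar advection flux, the tiny mesh and all names are illustrative assumptions, not the authors' low-dissipation scheme.

```python
import numpy as np

def fv_residual(u, faces, normals, areas, vel, vol):
    """Accumulate finite-volume residuals over unstructured faces.

    On a GPU, each face flux is computed by one thread; the scatter-add
    below is what atomic operations (or face colouring) implement there.
    """
    left, right = faces[:, 0], faces[:, 1]
    # Central (low-dissipation) face value from the two adjacent cells.
    u_face = 0.5 * (u[left] + u[right])
    # Scalar advective flux through each face: (v . n) * u_face * area.
    vn = np.einsum("ij,ij->i", vel, normals)
    flux = vn * u_face * areas
    res = np.zeros_like(u)
    np.add.at(res, left, -flux)   # flux leaves the left cell
    np.add.at(res, right, flux)   # and enters the right cell
    return res / vol

# Tiny 3-cell, 2-face illustrative mesh (a 1D line embedded in 2D).
u = np.array([1.0, 0.5, 0.0])
faces = np.array([[0, 1], [1, 2]])
normals = np.array([[1.0, 0.0], [1.0, 0.0]])
areas = np.ones(2)
vel = np.tile([1.0, 0.0], (2, 1))
vol = np.ones(3)
print(fv_residual(u, faces, normals, areas, vel, vol))
```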

Findings

The results show that an acceleration ratio of approximately 84 (on a single GPU) can be achieved with this unstructured GPU code for the double-precision algorithm. The transitional flow inside a compressor is simulated and the computational efficiency is improved greatly. The transition process is discussed, and the role that the Kelvin–Helmholtz (K-H) instability plays in the transition mechanism is verified.

Practical implications

The speed-up gained from the GPU-enabled solver reaches 84× compared with the original code running on a CPU, and this substantial speed-up enables fast-turnaround, high-fidelity LES simulation.

Originality/value

The GPU-enabled flow solver is implemented and optimized according to the features of the finite-volume scheme. The solving time is reduced remarkably, and detailed flow structures, including vortices, are captured.

Details

Engineering Computations, vol. 35 no. 5
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 25 February 2020

Shengquan Wang, Chao Wang, Yong Cai and Guangyao Li

Abstract

Purpose

The purpose of this paper is to improve the computational speed of solving nonlinear dynamics by using parallel methods and a mixed-precision algorithm on graphics processing units (GPUs). The computational efficiency of traditional central processing unit (CPU)-based computer-aided engineering software has struggled to satisfy the needs of scientific research and practical engineering, especially for nonlinear dynamic problems. Moreover, when calculations are performed on GPUs, double-precision operations are slower than single-precision operations. This paper therefore implements mixed precision for nonlinear dynamic simulation using the Belytschko-Tsay (BT) shell element on a GPU.

Design/methodology/approach

To minimize data transfer between the heterogeneous architectures, the parallel computation of the fully explicit finite element (FE) calculation is realized using a vectorized thread-level parallelism algorithm. An asynchronous data transmission strategy and a novel dependency-relationship-link-based method for efficiently solving the parallel explicit shell element equations are used to improve the GPU utilization ratio. Finally, the paper implements mixed precision for nonlinear dynamic simulation using the BT shell element on a GPU and compares it with a CPU-based serially executed program and a GPU-based double-precision parallel computing program.
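
A minimal sketch of the mixed-precision idea described above, assuming a generic central-difference explicit integrator: the (dominant) force evaluation runs in single precision, as on the GPU, while the kinematic state is accumulated in double precision so round-off does not build up over many time steps. The toy spring force and all names are hypothetical, not the BT shell element formulation.

```python
import numpy as np

def explicit_step_mixed(x64, v64, masses, dt, internal_force32):
    """One central-difference step with mixed precision.

    Element force evaluation (the dominant cost) runs in float32; the
    kinematic state is kept and accumulated in float64 so round-off does
    not accumulate over the many explicit time steps.
    """
    f = internal_force32(x64.astype(np.float32)).astype(np.float64)
    a = f / masses[:, None]
    v64 += dt * a          # float64 accumulation
    x64 += dt * v64
    return x64, v64

# Hypothetical toy force: linear springs pulling nodes toward the origin.
force = lambda x32: (-10.0 * x32).astype(np.float32)
x = np.array([[1.0, 0.0]], dtype=np.float64)
v = np.zeros_like(x)
m = np.ones(1)
for _ in range(5):
    x, v = explicit_step_mixed(x, v, m, 1e-3, force)
print(x)
```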

Findings

For a car body model containing approximately 5.3 million degrees of freedom, the computational speed is improved 25-fold over sequential CPU computation and by approximately 10% over the double-precision parallel computing method. The accuracy error of the mixed-precision computation is small and satisfies the requirements of practical engineering problems.

Originality/value

This paper realizes a novel FE parallel computing procedure for nonlinear dynamic problems using a mixed-precision algorithm on a CPU-GPU platform. Compared with the CPU serial program, the program implemented in this article obtains a 25× acceleration ratio when calculating a model of 883,168 elements, which greatly improves the calculation speed for solving nonlinear dynamic problems.

Details

Engineering Computations, vol. 37 no. 6
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 3 November 2022

Shashi Kant Ratnakar, Utpal Kiran and Deepak Sharma

Abstract

Purpose

Structural topology optimization is computationally expensive due to the involvement of high-resolution mesh and repetitive use of finite element analysis (FEA) for computing the structural response. Since FEA consumes most of the computational time in each optimization iteration, a novel GPU-based parallel strategy for FEA is presented and applied to the large-scale structural topology optimization of 3D continuum structures.

Design/methodology/approach

A matrix-free solver based on the preconditioned conjugate gradient (PCG) method is proposed to minimize the computational time associated with the solution of the linear system of equations in FEA. The proposed solver uses an innovative strategy to utilize only the symmetric half of the elemental stiffness matrices for the implementation of the element-by-element matrix-free solver on the GPU.
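
The following NumPy sketch illustrates the element-by-element, matrix-free idea under simple assumptions (two-node 1D elements, Jacobi preconditioning): only the upper triangle of each elemental stiffness matrix is stored, and the product K·p is formed element by element without ever assembling K. The toy mesh and all names are illustrative, not the paper's GPU data layout.

```python
import numpy as np

def ebe_matvec(p, conn, ke_upper, idx_u, n):
    """y = K p without assembling K: loop over elements and use only the
    stored upper triangle of each elemental matrix (symmetry gives the rest)."""
    y = np.zeros(n)
    for e, nodes in enumerate(conn):
        ke = np.zeros((len(nodes), len(nodes)))
        ke[idx_u] = ke_upper[e]
        ke = ke + np.triu(ke, 1).T          # mirror the stored half
        y[nodes] += ke @ p[nodes]
    return y

def pcg(matvec, b, diag, tol=1e-10, it=200):
    """Jacobi-preconditioned conjugate gradients."""
    x = np.zeros_like(b); r = b.copy(); z = r / diag; p = z.copy()
    rz = r @ z
    for _ in range(it):
        Ap = matvec(p); alpha = rz / (p @ Ap)
        x += alpha * p; r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = r / diag; rz_new = r @ z
        p = z + (rz_new / rz) * p; rz = rz_new
    return x

# Two toy SPD elements; the assembled matrix would be tridiagonal.
conn = [np.array([0, 1]), np.array([1, 2])]
idx_u = np.triu_indices(2)
ke_upper = [np.array([2.0, -1.0, 2.0])] * 2   # upper half of [[2,-1],[-1,2]]
diag = np.array([2.0, 4.0, 2.0])              # assembled diagonal
b = np.array([1.0, 0.0, 1.0])
x = pcg(lambda p: ebe_matvec(p, conn, ke_upper, idx_u, 3), b, diag)
print(x)
```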

Findings

Using the solid isotropic material with penalization (SIMP) method, the proposed matrix-free solver is tested on three 3D structural optimization problems discretized using all-hexahedral structured and unstructured meshes. Results show that the proposed strategy achieves a 3.1×–3.3× speedup for the FEA solver stage and an overall speedup of 2.9×–3.3× over the standard element-by-element strategy on the GPU. Moreover, the proposed strategy requires almost 1.8× less GPU memory than the standard element-by-element strategy.

Originality/value

The proposed GPU-based matrix-free element-by-element solver takes a more general approach to the symmetry concept than previous works. It stores only the symmetric half of the elemental matrices in memory and performs matrix-free sparse matrix-vector multiplication (SpMV) without any inter-thread communication. A customized data storage format is also proposed to store and access only the symmetric half of the elemental stiffness matrices with coalesced read and write operations on the GPU over the unstructured mesh.

Details

Engineering Computations, vol. 39 no. 10
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 20 September 2019

S. Velliangiri

Abstract

Purpose

Denial-of-service threats are regularly regarded as tools for effortlessly taking online services offline. Moreover, recent incidents reveal that these threats are constantly employed to mask other attacks, such as disseminating malware, stealing information, wire scams and mining bitcoins (Sujithra et al., 2018; Boujnouni and Jedra, 2018). In some cases, denial-of-service attacks have been employed to cyber-heist financial firms for sums of around $100,000. Documentation from Neustar reports that about 70 percent of the financial sector is aware of the threat and that incidents therefore result in few losses; more than 35 percent of denial-of-service attempts are identified as malware soon after the threat is sent out (Divyavani and Dileep Kumar Reddy, 2018). Intensive packet analysis (IPA) explores the packet headers from Layers 2 to 4, along with the application-layer information from Layers 5 to 7, to locate and evade vulnerable network-related threats. Networked systems can be brought down even by low-powered denial-of-service operations if the systems' resources are drained by the safety modules themselves. The paper aims to discuss these issues.

Design/methodology/approach

The first issue is resolved by the IPDME locating the standard precise header delimiters, such as the carriage return line feed, while equally locating the header names. The designed IPDME locates the initial position of a header field within a packet with a static time cost of four cycles; for buffering packets, the framework functions at wire speed. Soon after locating the header position, the value of the field is extracted linearly from that position. Extracting all the field values sequentially resolves the remaining restrictions, and throughput can be increased further by processing several bytes of information per cycle and omitting packets that are not required. In this way, the search space is reduced from the packet length to the header length, and because of the reduced extraction time, the buffered packets can be processed faster.
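
As a rough software analogue of the header-field extraction described above (the hardware engine itself is not reproduced here), this sketch confines the scan to the header region and locates CRLF delimiters and field names; the SIP message and all function names are illustrative assumptions.

```python
def extract_headers(packet: bytes) -> dict:
    """Parse header fields by scanning only up to the end of the header.

    The search space is the header (terminated by CRLFCRLF), not the whole
    packet -- the reduction the engine described above exploits.
    """
    end = packet.find(b"\r\n\r\n")          # header/body boundary
    header = packet[:end if end >= 0 else len(packet)]
    fields = {}
    for line in header.split(b"\r\n")[1:]:  # skip the request line
        sep = line.find(b":")
        if sep > 0:
            fields[line[:sep].strip()] = line[sep + 1:].strip()
    return fields

sip = (b"INVITE sip:bob@example.com SIP/2.0\r\n"
       b"Via: SIP/2.0/UDP host\r\n"
       b"Call-ID: 42@host\r\n\r\nbody...")
print(extract_headers(sip)[b"Call-ID"])
```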

Findings

Assessments of the IPDME against widely deployed SIP application-layer parsing tools show that hardware offloading of the IPDME can reduce the load on essential system resources by about 25 percent. The IPDME achieves an acceleration of 22×–75× when evaluated against the PJSIP parser and the SNORT SIP pre-processor. A single IPDME shows an acceleration of 4×–6× over 12 instances of SNORT parsers executing on 12 processors. The IPDME also performs 3× better than 200 parallel instances of GPU-accelerated parsers. Additionally, the IPDME has very low latencies, 12×–1,010× lower than the GPUs'. The IPDME achieves a low energy footprint of nearly 0.75 W using two engines and 3.6 W for 15 engines, which is 22.5×–100× less than the GPU-based acceleration.

Originality/value

The IPDME ensures that system pools are not exhausted on Layer 7 extraction by forwarding traffic directly from the network interface without branching into the operating system. The IPDME averts the latencies caused by memory accesses by bypassing the operating system, which essentially permits the scheme to function at wire speed. From the safety perspective, the IPDME ultimately enhances the performance of the security systems employing it. The increased bandwidth of the IPDME ensures that the IPAs can function at their utmost bandwidth. The service time for threat-independent traffic is improved by the reduction of the end-to-end latency on the path between the network interface and the related applications.

Details

International Journal of Intelligent Unmanned Systems, vol. 7 no. 4
Type: Research Article
ISSN: 2049-6427

Article
Publication date: 19 June 2017

Janusz Marian Bedkowski and Timo Röhling

Abstract

Purpose

This paper aims to focus on real-world mobile systems, and thus to propose a relevant contribution to the special issue on “Real-world mobile robot systems”. This work on 3D laser semantic mobile mapping and particle filter localization, dedicated to robots patrolling urban sites, is elaborated with a focus on applying parallel computing to semantic mapping and particle filter localization. The real robotic application of patrolling urban sites is the goal; thus, it is shown that the crucial robotic components have reached a high Technology Readiness Level (TRL).

Design/methodology/approach

Three different robotic platforms equipped with different 3D laser measurement systems were compared. Each system provides different data according to the measured distance, density of points and noise; thus, the influence of these data on the final semantic maps has been compared. The realistic problem is to use these semantic maps for robot localization; thus, the influence of the different maps on particle filter localization has been elaborated. A new approach has been proposed for particle filter localization based on 3D semantic information, and the behaviour of the particle filter under different realistic conditions has been elaborated. The process of using the proposed robotic components for patrolling an urban site, such as the robot checking for geometrical changes in the environment, has been detailed.
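
A minimal sketch of the predict-weight-resample cycle that such a localization filter parallelizes per particle (each particle is independent, which is what maps naturally to GPU threads); the 2D pose, the map-matching score and all names are assumptions, not the authors' 3D semantic implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_step(particles, weights, odom, scan_score):
    """One particle-filter update: predict with noisy odometry, weight by
    how well the observation matches the (semantic) map, then resample.
    Each particle is independent, so one GPU thread can own one particle."""
    n = len(particles)
    # Predict: apply the odometry increment plus motion noise.
    particles = particles + odom + rng.normal(0.0, 0.05, particles.shape)
    # Weight: map-matching likelihood per particle (placeholder score).
    weights = weights * np.array([scan_score(p) for p in particles])
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights

# Toy run with 500 particles (the count mentioned below): the true pose
# drifts right and the score peaks near the true x-position.
true_x = 1.0
score = lambda p: np.exp(-((p[0] - true_x) ** 2) / 0.1)
parts = rng.normal(0.0, 0.5, (500, 2))
w = np.full(500, 1.0 / 500)
parts, w = pf_step(parts, w, np.array([1.0, 0.0]), score)
print(parts.mean(axis=0))
```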

Findings

The focus on real-world mobile systems requires a different point of view in scientific work. This study focuses on robust and reliable solutions that can be integrated with real applications; thus, a new parallel computing approach for semantic mapping and particle filter localization is proposed. Based on the literature, semantic 3D particle filter localization has not yet been elaborated, so innovative solutions for this issue are proposed. The work builds on a semantic mapping framework the authors have recently published; for this reason, the study claims that the applied studies carried out during real-world trials with this mapping system are an added value relevant to this special issue.

Research limitations/implications

The main problem is the compromise between computing power and the energy consumed by heavy calculations; thus, the main focus is on using a modern GPGPU, the NVIDIA Pascal parallel processor architecture. Recent advances in GPGPUs show great potential for mobile robotic applications; thus, this study focuses on increasing mapping and localization capabilities by improving the algorithms. The current limitation is the number of particles processed by a single processor, with a performance of 500 particles in real time achieved so far. The implication is that multi-GPU architectures can be used to increase the number of processed particles; thus, further studies are required.

Practical implications

The research focus is related to real-world mobile systems; thus, the practical aspects of the work are crucial. The main practical application is semantic mapping, which could be used for many robotic applications. The authors claim that their particle filter localization is ready to integrate with real robotic platforms using a modern 3D laser measurement system. For this reason, the authors claim that their system can improve existing autonomous robotic platforms. The proposed components can be used for the detection of geometrical changes in the scene; thus, many practical functionalities can be applied, such as the detection of cars or of opened/closed gates. […] These functionalities are crucial elements of the safety and security domain.

Social implications

Improvement of the safety and security domain is a crucial aspect of modern society. Protecting critical infrastructure plays an important role; thus, introducing autonomous mobile platforms capable of supporting the human operators of safety and security systems could have a positive impact from many points of view.

Originality/value

This study elaborates a novel approach to particle filter localization based on 3D data and semantic mapping, original work that could have a great impact on the mobile robotics domain; the study claims that many algorithmic and implementation issues were solved under real-task experiments. The originality of this work is reinforced by the use of modern, advanced robotic systems, a relevant set of technologies for a proper evaluation of the proposed approach. Such a combination of experimental hardware with original algorithms and implementation is definitely an added value.

Details

Industrial Robot: An International Journal, vol. 44 no. 4
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 7 September 2015

Theodoros Zygiridis, Georgios Pyrialakos, Nikolaos Kantartzis and Theodoros Tsiboukis

Abstract

Purpose

The locally one-dimensional (LOD) finite-difference time-domain (FDTD) method features unconditional stability, yet its low accuracy in time can potentially become detrimental. Regarding the improvement of the method’s reliability, existing solutions introduce high-order spatial operators, which nevertheless cannot deal with the augmented temporal errors. The purpose of the paper is to describe a systematic procedure that enables the efficient implementation of extended spatial stencils in the context of the LOD-FDTD scheme, capable of reducing the combined space-time flaws without additional computational cost.

Design/methodology/approach

To accomplish the goal, the authors introduce spatial derivative approximations in parametric form, and then construct error formulae from the update equations, once they are represented as a one-stage process. The unknown operators are determined with the aid of two error-minimization procedures, which equally suppress errors both in space and time. Furthermore, accelerated implementation of the scheme is accomplished via parallelization on a graphics-processing-unit (GPU), which greatly shortens the duration of implicit updates.
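
The parametric-operator idea can be illustrated with a one-dimensional sketch: a five-point antisymmetric first-derivative stencil with one free coefficient, which reproduces the classical fourth-order formula for one choice of that coefficient and can instead be retuned to trade formal order for lower dispersion error. The coefficient values and names below are illustrative assumptions, not the paper's optimized operators.

```python
import numpy as np

def d1(f, h, b=-1.0 / 12.0):
    """Parametric five-point first derivative on a periodic grid.

    Consistency requires 2a + 4b = 1, so a = (1 - 4b) / 2. Choosing
    b = -1/12 gives the classical fourth-order stencil; other values of b
    trade formal order for lower dispersion error, which is the kind of
    error-minimization target described above.
    """
    a = (1.0 - 4.0 * b) / 2.0
    return (a * (np.roll(f, -1) - np.roll(f, 1))
            + b * (np.roll(f, -2) - np.roll(f, 2))) / h

# Compare against the exact derivative of sin(x) on a coarse grid.
n = 32
x = 2 * np.pi * np.arange(n) / n
h = x[1] - x[0]
for b in (-1.0 / 12.0, -0.09):        # standard vs. an illustrative tuning
    err = np.max(np.abs(d1(np.sin(x), h, b) - np.cos(x)))
    print(b, err)
```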

Findings

It is shown that the performance of the LOD-FDTD method can be improved significantly, if it is properly modified according to accuracy-preserving principles. In addition, the numerical results verify that a GPU implementation of the implicit solver can result in up to 100× acceleration. Overall, the formulation developed herein describes a fast, unconditionally stable technique that remains reliable, even at coarse temporal resolutions.

Originality/value

Dispersion-relation-preserving optimization is applied to an unconditionally stable FDTD technique. In addition, parallel cyclic reduction is adapted to hepta-diagonal systems, and it is proven that GPU parallelization can offer non-trivial benefits to implicit FDTD approaches as well.

Details

COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, vol. 34 no. 5
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 17 August 2012

Janusz Będkowski, Andrzej Masłowski and Geert De Cubber

Abstract

Purpose

The purpose of this paper is to demonstrate a real-time 3D localization and mapping approach for urban search and rescue (USAR) robotic applications, focusing on the performance and accuracy of general-purpose computing on graphics processing units (GPGPU)-based iterative closest point (ICP) 3D data registration implemented on a modern GPGPU with the Fermi architecture.

Design/methodology/approach

The authors ported the entire ICP computation to the GPU and performed experiments with the registration of up to 10^6 data points. The main goal of the research was to provide a method for real-time data registration performed by a mobile robot equipped with a commercially available 3D laser measurement system. The main contribution of the paper is a new GPGPU-based ICP implementation with regular grid decomposition. It guarantees the same high accuracy as an equivalent CPU-based ICP implementation, with better performance.
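
A simplified sketch of the regular-grid decomposition used to bound nearest-neighbour searches in ICP: points are bucketed into voxels and each query inspects only the 3×3×3 block of surrounding cells, which bounds the work per query and gives GPU threads a regular access pattern. This is a CPU-side illustration under the assumption that the cell size is no smaller than the expected nearest-neighbour distance; all names are hypothetical, not the Fermi-era implementation.

```python
import numpy as np
from collections import defaultdict

def build_grid(pts, cell):
    """Bucket point indices by integer voxel coordinates."""
    grid = defaultdict(list)
    for i, p in enumerate(pts):
        grid[tuple((p // cell).astype(int))].append(i)
    return grid

def nearest(q, pts, grid, cell):
    """Search only the 3x3x3 block of cells around the query point.
    Assumes the true nearest neighbour lies within one cell ring."""
    cx, cy, cz = (q // cell).astype(int)
    best, best_d = -1, np.inf
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for i in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    d = np.sum((pts[i] - q) ** 2)
                    if d < best_d:
                        best, best_d = i, d
    return best

rng = np.random.default_rng(1)
pts = rng.random((10_000, 3))
cell = 0.05
grid = build_grid(pts, cell)
q = np.array([0.5, 0.5, 0.5])
print(pts[nearest(q, pts, grid, cell)])
```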

Findings

The authors show an empirical analysis of the tuning of the GPUICP parameters for obtaining much better performance (an acceptable level of variance in the computing time) with minimal loss of accuracy. A loop-closing method is added and demonstrates satisfactory results for 3D localization and mapping in urban environments. This work can help in building USAR mobile robotic applications that process 3D point clouds in real time.

Practical implications

This work can help in developing real-time mapping for USAR robotic applications.

Originality/value

The paper proposes a new method for nearest-neighbour search that guarantees better performance with minimal loss of accuracy. The variance of the computational time is much lower than in state-of-the-art (SoA) implementations.

Article
Publication date: 11 October 2019

Yaxin Peng, Naiwu Wen, Chaomin Shen, Xiaohuang Zhu and Shihui Ying

Abstract

Purpose

Partial alignment of 3D point sets is a challenging problem in laser calibration and robot calibration due to the imbalance of the data sets, especially when their overlap is low. Geometric features can improve the accuracy of alignment; however, the corresponding feature extraction methods are time consuming. The purpose of this paper is to develop a framework for partial alignment based on an adaptive trimmed strategy.

Design/methodology/approach

First, the authors propose an adaptive trimmed strategy based on point feature histograms (PFH) coding. Second, they obtain an initial transformation based on this partition, which improves the accuracy of the normal-direction-weighted trimmed iterative closest point (ICP) method. Third, they develop a series of GPU parallel implementations for time efficiency.
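
One iteration of the trimmed-ICP refinement stage might look as follows: match closest points, keep the best-overlapping fraction by residual, and estimate the rigid transform in closed form (Kabsch/SVD). The fixed trim ratio here is a stand-in for the paper's adaptive, PFH-based partition, and all names are illustrative.

```python
import numpy as np

def trimmed_icp_step(src, dst, keep=0.6):
    """One trimmed ICP iteration: closest-point matching, trimming to the
    best `keep` fraction, then closed-form rigid estimation via SVD."""
    # Brute-force closest points (a k-d tree or GPU grid replaces this).
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)
    res = d2[np.arange(len(src)), nn]
    # Trim: keep only the best-matching pairs (handles partial overlap).
    order = np.argsort(res)[: int(keep * len(src))]
    s, t = src[order], dst[nn[order]]
    # Kabsch: optimal rotation and translation for the kept pairs.
    sc, tc = s.mean(0), t.mean(0)
    u, _, vt = np.linalg.svd((s - sc).T @ (t - tc))
    r = (u @ vt).T
    if np.linalg.det(r) < 0:              # avoid reflections
        vt[-1] *= -1
        r = (u @ vt).T
    return r, tc - r @ sc

rng = np.random.default_rng(2)
src = rng.random((200, 3))
dst = src + np.array([0.1, 0.0, 0.0])     # pure translation, full overlap
r, t = trimmed_icp_step(src, dst)
print(np.round(t, 3))                     # recovers the [0.1, 0, 0] shift
```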

Findings

The initial partition based on PFH feature improves the accuracy of the partial registration significantly. Moreover, the parallel GPU algorithms accelerate the alignment process.

Research limitations/implications

This study is applicable to rigid transformations so far; it could be extended to non-rigid transformations.

Practical implications

In practice, point set alignment for calibration is a technique widely used in the fields of aircraft assembly, industrial inspection, simultaneous localization and mapping and surgical navigation.

Social implications

Point set calibration is a building block in the field of intelligent manufacturing.

Originality/value

The contributions are as follows: first, the authors introduce a novel coarse alignment as an initial calibration based on PFH descriptor similarity, which can be viewed as a coarse trimmed process partitioning the data into the mostly overlapping part and the rest; second, they reduce the computation time with GPU parallel coding during the acquisition of the feature descriptors; finally, they use the weighted trimmed ICP method to refine the transformation.

Details

Assembly Automation, vol. 40 no. 2
Type: Research Article
ISSN: 0144-5154

Book part
Publication date: 19 November 2014

Garland Durham and John Geweke

Abstract

Massively parallel desktop computing capabilities now well within the reach of individual academics modify the environment for posterior simulation in fundamental and potentially quite advantageous ways. But to fully exploit these benefits algorithms that conform to parallel computing environments are needed. This paper presents a sequential posterior simulator designed to operate efficiently in this context. The simulator makes fewer analytical and programming demands on investigators, and is faster, more reliable, and more complete than conventional posterior simulators. The paper extends existing sequential Monte Carlo methods and theory to provide a thorough and practical foundation for sequential posterior simulation that is well suited to massively parallel computing environments. It provides detailed recommendations on implementation, yielding an algorithm that requires only code for simulation from the prior and evaluation of prior and data densities and works well in a variety of applications representative of serious empirical work in economics and finance. The algorithm facilitates Bayesian model comparison by producing marginal likelihood approximations of unprecedented accuracy as an incidental by-product, is robust to pathological posterior distributions, and provides estimates of numerical standard error and relative numerical efficiency intrinsically. The paper concludes with an application that illustrates the potential of these simulators for applied Bayesian inference.
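
A minimal sketch of the sequential posterior simulation idea on a toy model (a normal mean with a standard normal prior): particles drawn from the prior are reweighted one observation at a time, resampled when the effective sample size collapses, and the running sum of the log mean incremental weights yields the marginal likelihood as a by-product. The jitter move is a crude stand-in for the mutation phase of a full sequential Monte Carlo algorithm; the model and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def sequential_posterior(y, n_particles=5_000):
    """Data-tempered SMC for a normal mean with a N(0, 1) prior.

    Only prior simulation and density evaluations are needed, and every
    particle is updated independently -- the two properties that make the
    approach fit massively parallel hardware."""
    theta = rng.normal(0.0, 1.0, n_particles)     # draw from the prior
    w = np.full(n_particles, 1.0 / n_particles)
    log_ml = 0.0
    for obs in y:
        inc = np.exp(-0.5 * (obs - theta) ** 2) / np.sqrt(2 * np.pi)
        log_ml += np.log(np.sum(w * inc))         # marginal-likelihood term
        w *= inc
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:  # degeneracy check
            idx = rng.choice(n_particles, n_particles, p=w)
            # Jitter: crude stand-in for a proper Metropolis mutation phase.
            theta = theta[idx] + rng.normal(0, 0.05, n_particles)
            w = np.full(n_particles, 1.0 / n_particles)
    return theta, w, log_ml

y = rng.normal(1.0, 1.0, 50)
theta, w, log_ml = sequential_posterior(y)
print(np.sum(w * theta), log_ml)   # posterior mean and log marginal likelihood
```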

Book part
Publication date: 19 November 2014

Martin Burda

Abstract

The BEKK GARCH class of models presents a popular set of tools for applied analysis of dynamic conditional covariances. Within this class the analyst faces a range of model choices that trade off flexibility with parameter parsimony. In the most flexible unrestricted BEKK the parameter dimensionality increases quickly with the number of variables. Covariance targeting decreases model dimensionality but induces a set of nonlinear constraints on the underlying parameter space that are difficult to implement. Recently, the rotated BEKK (RBEKK) has been proposed whereby a targeted BEKK model is applied after the spectral decomposition of the conditional covariance matrix. An easily estimable RBEKK implies a full albeit constrained BEKK for the unrotated returns. However, the degree of the implied restrictiveness is currently unknown. In this paper, we suggest a Bayesian approach to estimation of the BEKK model with targeting based on Constrained Hamiltonian Monte Carlo (CHMC). We take advantage of suitable parallelization of the problem within CHMC utilizing the newly available computing power of multi-core CPUs and Graphical Processing Units (GPUs) that enables us to deal effectively with the inherent nonlinear constraints posed by covariance targeting in relatively high dimensions. Using parallel CHMC we perform a model comparison in terms of predictive ability of the targeted BEKK with the RBEKK in the context of an application concerning a multivariate dynamic volatility analysis of a Dow Jones Industrial returns portfolio. Although the RBEKK does improve over a diagonal BEKK restriction, it is clearly dominated by the full targeted BEKK model.
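
A heavily simplified sketch of the constrained-HMC ingredient, reflection at a constraint boundary, on a one-dimensional stand-in for the nonlinear positivity constraints induced by covariance targeting: when the leapfrog trajectory crosses the boundary, the position is mirrored and the momentum negated, so proposals stay inside the constraint set. The target density, step sizes and all names are illustrative assumptions; nothing here reproduces the BEKK likelihood.

```python
import numpy as np

rng = np.random.default_rng(4)

# Target: N(1, 1) truncated to theta > 0 (a toy constraint set).
# U is the potential (negative log-density), grad_u its gradient.
U = lambda t: 0.5 * (t - 1.0) ** 2
grad_u = lambda t: t - 1.0

def chmc(theta0, n=2_000, eps=0.05, steps=30):
    """HMC with the constraint handled by reflection off theta = 0."""
    samples, theta = [], theta0
    for _ in range(n):
        p = rng.normal()
        t, h0 = theta, U(theta) + 0.5 * p * p
        p -= 0.5 * eps * grad_u(t)            # leapfrog half step
        for s in range(steps):
            t += eps * p
            if t < 0.0:                       # reflect off the boundary
                t, p = -t, -p
            if s != steps - 1:
                p -= eps * grad_u(t)
        p -= 0.5 * eps * grad_u(t)            # final half step
        if rng.random() < np.exp(h0 - U(t) - 0.5 * p * p):
            theta = t                         # Metropolis accept
        samples.append(theta)
    return np.array(samples)

draws = chmc(1.0)
print(draws.mean(), (draws > 0).all())   # all draws respect the constraint
```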

Details

Bayesian Model Comparison
Type: Book
ISBN: 978-1-78441-185-5
