Search results

1 – 10 of over 17,000
Article
Publication date: 4 April 2019

Chao Peng

The purpose of this paper is to investigate possibilities to adopt state-of-the-art computer graphics technologies for big data visualization in engineering applications. Toward…

Abstract

Purpose

The purpose of this paper is to investigate possibilities to adopt state-of-the-art computer graphics technologies for big data visualization in engineering applications. Toward this purpose, a conceptual heterogeneous system is proposed for graphical rendering, which is established with multiple central processing unit (CPU) cores and multiple graphics processing units (GPUs).

Design/methodology/approach

The design of the system supports both general-purpose computation and graphics-related computation. Three processing components are discussed to fulfill the execution requirements of load balancing, data streaming and display. The design makes full use of computational and memory resources and enhances performance with the support of GPU-based parallelization.
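
As a hedged sketch of the data-streaming component only (names, sizes and the rendering placeholder are illustrative, not from the paper): pinned host memory, two CUDA streams and double buffering let vertex-batch uploads overlap with GPU computation, which is the usual way such a streaming component keeps both transfer and compute resources busy.

```cpp
#include <cuda_runtime.h>

// Placeholder per-vertex work; stands in for the real rendering pipeline.
__global__ void renderBatch(const float* verts, float* framebuf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) framebuf[i] = verts[i] * 0.5f;
}

int main() {
    const int BATCH = 1 << 20, NUM_BATCHES = 8;
    float *h_verts, *d_verts[2], *d_frame;
    // Pinned host memory enables truly asynchronous host-to-device copies.
    cudaMallocHost(&h_verts, (size_t)NUM_BATCHES * BATCH * sizeof(float));
    cudaMalloc(&d_verts[0], BATCH * sizeof(float));
    cudaMalloc(&d_verts[1], BATCH * sizeof(float));
    cudaMalloc(&d_frame, BATCH * sizeof(float));

    cudaStream_t streams[2];
    cudaStreamCreate(&streams[0]);
    cudaStreamCreate(&streams[1]);

    // Double buffering: while one stream copies batch b, the other computes on b-1.
    for (int b = 0; b < NUM_BATCHES; ++b) {
        int s = b % 2;
        cudaMemcpyAsync(d_verts[s], h_verts + (size_t)b * BATCH,
                        BATCH * sizeof(float), cudaMemcpyHostToDevice, streams[s]);
        renderBatch<<<(BATCH + 255) / 256, 256, 0, streams[s]>>>(d_verts[s], d_frame, BATCH);
    }
    cudaDeviceSynchronize();

    cudaStreamDestroy(streams[0]);
    cudaStreamDestroy(streams[1]);
    cudaFreeHost(h_verts);
    cudaFree(d_verts[0]); cudaFree(d_verts[1]); cudaFree(d_frame);
    return 0;
}
```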

Findings

The advantages and disadvantages of particular technical methods for each processing component are discussed. The possible ways to integrate them are analyzed.

Originality/value

This work contributes to the adoption of state-of-the-art computer graphics technologies in engineering applications.

Details

World Journal of Engineering, vol. 16 no. 2
Type: Research Article
ISSN: 1708-5284

Keywords

Article
Publication date: 17 October 2018

Sura Nawfal and Fakhrulddin Ali

The purpose of this paper is to achieve the acceleration of 3D object transformation using parallel techniques such as multi-core central processing unit (MC CPU) or graphic…

Abstract

Purpose

The purpose of this paper is to achieve the acceleration of 3D object transformation using parallel techniques such as a multi-core central processing unit (MC CPU) or a graphic processing unit (GPU), or even both. Generating 3D animation scenes in computer graphics requires applying a 3D transformation to the vertices of the objects. These transformations consume most of the execution time. Hence, for high-speed graphic systems, acceleration of the vertex transform is much sought after, because it requires many matrix operations to be performed in real time, so execution time is critical for such processing.

Design/methodology/approach

In this paper, the acceleration of 3D object transformation is achieved using parallel techniques such as MC CPU or GPU, or even both. Multiple geometric transformations can be concatenated in any order in an interactive manner.
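
As a hedged illustration of this approach (a sketch under assumed data layouts, not the authors' code): the concatenated transformations reduce to a single 4x4 matrix product on the host, which is then applied to every vertex in parallel, one GPU thread per vertex.

```cpp
#include <cuda_runtime.h>

// Row-major 4x4 matrix multiply on the host: out = a * b.
// Calling this repeatedly concatenates transforms in any order.
void matmul4(const float a[16], const float b[16], float out[16]) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            float s = 0.f;
            for (int k = 0; k < 4; ++k) s += a[r * 4 + k] * b[k * 4 + c];
            out[r * 4 + c] = s;
        }
}

__constant__ float M[16]; // composite affine transform, broadcast to all threads

// Each thread transforms one homogeneous vertex (x, y, z, 1).
__global__ void transformVertices(const float4* in, float4* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float4 v = in[i];
    out[i] = make_float4(
        M[0] * v.x + M[1] * v.y + M[2]  * v.z + M[3],
        M[4] * v.x + M[5] * v.y + M[6]  * v.z + M[7],
        M[8] * v.x + M[9] * v.y + M[10] * v.z + M[11],
        1.0f);
}
```

Host code would upload the composite matrix once with `cudaMemcpyToSymbol(M, composite, sizeof(float) * 16)` and launch the kernel over all vertices, so the per-vertex cost is a single matrix-vector product regardless of how many transforms were concatenated.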

Findings

The performance results are presented for a number of 3D objects with parallel implementations of the affine transform on the NVIDIA GPU series. The maximum execution time was about 0.508 s to transform 100 million vertices using LabVIEW and 0.096 s using Visual Studio. Other results also showed significant speed-ups compared with CPU, MC CPU and previously reported computations for the same object complexity.

Originality/value

The high-speed execution of 3D models is essential in many applications such as medical imaging, 3D games and robotics.

Details

Journal of Engineering, Design and Technology, vol. 16 no. 6
Type: Research Article
ISSN: 1726-0531

Keywords

Article
Publication date: 24 August 2018

Hongbin Liu, Xinrong Su and Xin Yuan

Adopting large eddy simulation (LES) to simulate the complex flow in turbomachinery is appropriate to overcome the limitation of current Reynolds-Averaged Navier–Stokes modelling…

Abstract

Purpose

Adopting large eddy simulation (LES) to simulate the complex flow in turbomachinery is appropriate to overcome the limitation of current Reynolds-Averaged Navier–Stokes modelling and it provides a deeper understanding of the complicated transitional and turbulent flow mechanism; however, the large computational cost limits its application in high Reynolds number flow. This study aims to develop a three-dimensional GPU-enabled parallel-unstructured solver to speed up the high-fidelity LES simulation.

Design/methodology/approach

Compared with central processing units (CPUs), graphics processing units (GPUs) can provide higher computational speed. A set of low-dissipation schemes designed for unstructured meshes is implemented with the compute unified device architecture (CUDA) programming model. Several key parameters affecting the performance of the GPU code are discussed, and further speed-up can be obtained by analysing the underlying finite volume-based numerical scheme.
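
As a hedged illustration of the finite volume structure such a solver parallelizes (the CSR-style arrays below are our assumptions, not the paper's data structures): a cell-centred residual accumulation maps naturally to one CUDA thread per cell, and the thread block size is exactly the kind of key performance parameter the abstract refers to.

```cpp
// Accumulate the flux divergence over each cell of an unstructured mesh.
// cellFaceStart/cellFaces form a CSR layout: the faces of cell c are
// cellFaces[cellFaceStart[c] .. cellFaceStart[c+1]-1].
__global__ void accumulateResiduals(const int*   cellFaceStart, // CSR offsets per cell
                                    const int*   cellFaces,     // face indices per cell
                                    const float* faceFlux,      // precomputed flux per face
                                    const float* faceSign,      // +1/-1 orientation per (cell,face) entry
                                    float*       residual,
                                    int          nCells) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= nCells) return;
    float r = 0.f;
    for (int k = cellFaceStart[c]; k < cellFaceStart[c + 1]; ++k)
        r += faceSign[k] * faceFlux[cellFaces[k]];
    residual[c] = r; // net flux out of the cell
}
```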

Findings

The results show that an acceleration ratio of approximately 84 (on a single GPU) for the double-precision algorithm can be achieved with this unstructured GPU code. The transitional flow inside a compressor is simulated and the computational efficiency is improved greatly. The transition process is discussed and the role that K-H (Kelvin–Helmholtz) instability plays in the transition mechanism is verified.

Practical implications

The speed-up gained from the GPU-enabled solver reaches 84 compared with the original code running on a CPU, and this enables fast-turnaround, high-fidelity LES simulation.

Originality/value

The GPU-enabled flow solver is implemented and optimized according to the features of the finite volume scheme. The solving time is reduced remarkably and detailed structures, including vortices, are captured.

Details

Engineering Computations, vol. 35 no. 5
Type: Research Article
ISSN: 0264-4401

Keywords

Article
Publication date: 5 January 2015

Victor U. Karthik, Sivamayam Sivasuthan, Arunasalam Rahunanthan, Ravi S. Thyagarajan, Paramsothy Jayakumar, Lalita Udpa and S. Ratnajeevan H. Hoole

Inverting electroheat problems involves synthesizing the electromagnetic arrangement of coils and geometries to realize a desired heat distribution. To this end two finite element…

Abstract

Purpose

Inverting electroheat problems involves synthesizing the electromagnetic arrangement of coils and geometries to realize a desired heat distribution. To this end, two finite element problems need to be solved: first for the magnetic fields and the joule heat that the associated eddy currents generate, and then, based on these heat sources, the second problem for heat distribution. This two-part problem needs to be iterated on to obtain the desired thermal distribution by optimization. Because this is a time-consuming process, the purpose of this paper is to parallelize it using the graphics processing unit (GPU) and the real-coded genetic algorithm, each for both speed and accuracy.

Design/methodology/approach

This coupled problem represents a heavy computational load with long wait-times for results. The GPU has recently been demonstrated to enhance the efficiency and accuracy of finite element computations and to cut down solution times. It has also been used to speed up the naturally parallel genetic algorithm. The authors use the GPU to perform coupled electroheat finite element optimization by the genetic algorithm to achieve computational efficiencies far better than those reported for a single finite element problem. In the genetic algorithm, coding objective functions in real numbers rather than binary arithmetic gives added speed and accuracy.
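
As a hedged sketch of the real-coded GA idea (ours, not the authors' implementation): genes are stored directly as floating-point numbers, one GPU thread evaluates one candidate, and crossover is simple real-valued arithmetic rather than binary string encoding and decoding. The quadratic placeholder objective stands in for the paper's coupled finite-element solve.

```cpp
#include <curand_kernel.h>

// One thread evaluates one candidate's objective value.
__global__ void evaluate(const float* genes, float* fitness,
                         int popSize, int nGenes) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= popSize) return;
    float f = 0.f;
    for (int g = 0; g < nGenes; ++g) {
        float x = genes[i * nGenes + g];
        f += x * x; // placeholder for the coupled magnetic/thermal FE objective
    }
    fitness[i] = f;
}

// Real-coded arithmetic (blend) crossover: child = a*p1 + (1-a)*p2.
// No bit manipulation is needed, which is the source of the speed and
// accuracy gain the abstract attributes to real coding.
__global__ void crossover(const float* parents, float* children,
                          int nPairs, int nGenes, unsigned long long seed) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nPairs) return;
    curandState st;
    curand_init(seed, i, 0, &st);
    float a = curand_uniform(&st);
    for (int g = 0; g < nGenes; ++g) {
        float p1 = parents[(2 * i)     * nGenes + g];
        float p2 = parents[(2 * i + 1) * nGenes + g];
        children[i * nGenes + g] = a * p1 + (1.f - a) * p2;
    }
}
```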

Findings

The feasibility of the method proposed to reduce computational time and increase accuracy is established through the simple problem of shaping a current-carrying conductor so as to yield a constant temperature along a line. The authors obtained a speedup (CPU time to GPU time ratio) saturating at about 28 for a population size of 500, because of increasing communications between threads. But this is far better than what is possible on a workstation.

Research limitations/implications

By using the intrinsically parallel genetic algorithm on a GPU, large complex coupled problems may be solved very quickly. The method demonstrated here, which does not account for radiation and convection, may be readily extended to more completely modeled electroheat systems. Since the primary purpose here is to establish methodology and feasibility, the thermal problem is simplified by neglecting convection and radiation. While that introduces some error, the computational procedure is still validated.

Practical implications

The methodology established has direct applications in electrical machine design, metallurgical mixing processes, and hyperthermia treatment in oncology. In these three practical application areas, the authors need to compute the exciting coil (or antenna) arrangement (current magnitude and phase) and device geometry that would accomplish a desired heat distribution to achieve mixing, reduce machine heat or burn cancerous tissue. The process presented here does this more accurately and quickly.

Social implications

In particular, the above-mentioned application in oncology will alleviate human suffering through its use in hyperthermia treatment planning for cancer. The method presented provides scope for new commercial software development and employment.

Originality/value

Previous finite element shape optimization of coupled electroheat problems by this group used gradient methods, whose difficulties are explained. Others have used analytical and circuit models in place of finite elements. This paper applies the massive parallelization possible with GPUs to the inherently parallel genetic algorithm, and extends it from single-field system problems to coupled problems, thereby realizing practicable solution times for such a computationally complex problem. Further, by using GPU rather than CPU computation, accuracy is enhanced; and by using real-number rather than binary coding for objective functions, further accuracy and speed gains are realized.

Details

COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, vol. 34 no. 1
Type: Research Article
ISSN: 0332-1649

Keywords

Article
Publication date: 4 March 2014

Yuji Sato and Mikiko Sato

The purpose of this paper is to propose a fault-tolerant technology for increasing the durability of application programs when evolutionary computation is performed by fast…

Abstract

Purpose

The purpose of this paper is to propose a fault-tolerant technology for increasing the durability of application programs when evolutionary computation is performed by fast parallel processing on many-core processors such as graphics processing units (GPUs) and multi-core processors (MCPs).

Design/methodology/approach

For distributed genetic algorithm (GA) models, the paper proposes a method whereby an island's ID number is added to the header of the data transferred by that island, for use in fault detection.
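
A minimal sketch of the fault-detection idea as stated (struct and field names are ours, not the paper's): each island tags its outgoing migrant data with its own ID, and the receiver checks the tag, so a corrupted or misrouted transfer from a faulty thread can be detected and the packet discarded.

```cpp
#include <cstdint>

// Migrant data exchanged between islands of the distributed GA.
// Layout is illustrative; the genes array size is arbitrary here.
struct MigrationPacket {
    uint32_t islandId;    // sender's island ID, added to the header for fault detection
    uint32_t generation;  // generation at which the migrants were emitted
    float    genes[64];   // the migrant individual's real-coded genes
};

// The receiver verifies the header before accepting migrants; on a
// mismatch the packet is dropped and the previous migrants are kept.
bool validatePacket(const MigrationPacket& p, uint32_t expectedIsland) {
    return p.islandId == expectedIsland;
}
```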

Findings

The paper shows that the processing-time overhead of the proposed idea is practically negligible in applications. It also shows that an optimal solution can be obtained even with a single stuck-at fault or a transient fault, and that increasing the number of parallel threads makes the system less susceptible to faults.

Originality/value

The study described in this paper is a new approach to increasing the durability of application programs using distributed GAs on GPUs and MCPs.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 7 no. 1
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 22 December 2023

Vaclav Snasel, Tran Khanh Dang, Josef Kueng and Lingping Kong

This paper aims to review in-memory computing (IMC) for machine learning (ML) applications from the perspectives of history, architectures and optimization options. In this review, the authors investigate…

Abstract

Purpose

This paper aims to review in-memory computing (IMC) for machine learning (ML) applications from the perspectives of history, architectures and optimization options. In this review, the authors investigate different architectural aspects and collect and provide comparative evaluations.

Design/methodology/approach

The authors collect over 40 recent IMC papers related to hardware design and optimization techniques, then classify them into three optimization option categories: optimization through the graphics processing unit (GPU), optimization through reduced precision and optimization through hardware accelerators. The authors then summarize those techniques in aspects such as which data sets they are applied to, how they are designed and what the contribution of each design is.

Findings

ML algorithms are potent tools accommodated on IMC architecture. Although general-purpose hardware (central processing units and GPUs) can supply explicit solutions, its energy efficiency is limited by the excessive flexibility it must support. On the other hand, hardware accelerators (field programmable gate arrays and application-specific integrated circuits) win on the energy efficiency aspect, but an individual accelerator often adapts exclusively to a single ML approach (family). From a long-term hardware evolution perspective, heterogeneous hardware/software co-design on hybrid platforms is an option for researchers.

Originality/value

IMC optimization enables high-speed processing, increases performance and allows massive volumes of data to be analyzed in real time. This work reviews IMC and its evolution. The authors then categorize three optimization paths for the IMC architecture to improve its performance metrics.

Details

International Journal of Web Information Systems, vol. 20 no. 1
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 12 June 2017

Andre Luis Cavalcanti Bueno, Noemi de La Rocque Rodriguez and Elisa Dominguez Sotelino

The purpose of this work is to present a methodology that harnesses the computational power of multiple graphics processing units (GPUs) and hides the complexities of tuning GPU…

Abstract

Purpose

The purpose of this work is to present a methodology that harnesses the computational power of multiple graphics processing units (GPUs) and hides the complexities of tuning GPU parameters from the users.

Design/methodology/approach

A methodology for auto-tuning OpenCL configuration parameters has been developed.
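
The paper tunes OpenCL configuration parameters; as a hedged analogue in CUDA (the sketch below is ours, not the authors' methodology), a launcher can time a kernel over candidate block sizes and keep the fastest, hiding the parameter choice from the user in the same spirit.

```cpp
#include <cuda_runtime.h>

// Stand-in kernel; any kernel whose launch configuration is being tuned works.
__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Time each candidate block size once and return the fastest.
// In practice one would warm up first and cache the result per device.
int bestBlockSize(const float* d_x, float* d_y, int n) {
    const int candidates[] = {64, 128, 256, 512, 1024};
    int best = 256;
    float bestMs = 1e30f;
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    for (int b : candidates) {
        cudaEventRecord(t0);
        saxpy<<<(n + b - 1) / b, b>>>(2.0f, d_x, d_y, n);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms;
        cudaEventElapsedTime(&ms, t0, t1);
        if (ms < bestMs) { bestMs = ms; best = b; }
    }
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    return best;
}
```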

Findings

The described process helps simplify coding and yields a significant time gain for each method execution.

Originality/value

Most authors develop their GPU applications for specific hardware configurations. In this work, a solution is offered to make the developed code portable to any GPU hardware.

Details

Engineering Computations, vol. 34 no. 4
Type: Research Article
ISSN: 0264-4401

Keywords

Article
Publication date: 17 July 2019

Ali Ayyed Abdul-Kadhim, Fue-Sang Lien and Eugene Yee

This study aims to modify the standard probabilistic lattice Boltzmann methodology (LBM) cellular automata (CA) algorithm to enable a more realistic and accurate computation of…

Abstract

Purpose

This study aims to modify the standard probabilistic lattice Boltzmann methodology (LBM) cellular automata (CA) algorithm to enable a more realistic and accurate computation of the ensemble, rather than individual, particle trajectories that need to be updated from one time step to the next. This allows a fraction of the collection of particles in any lattice grid cell to be updated in a time step, rather than the entire collection of particles as in the standard LBM-CA algorithm, leading to a better representation of the dynamic interaction between the particles and the background flow. The study also aims to exploit the inherent parallelism of the modified LBM-CA algorithm to provide a computationally efficient scheme for the computation of particle-laden flows on readily available commodity general-purpose graphics processing units (GPGPUs).

Design/methodology/approach

This paper presents a framework for the implementation of a LBM for the simulation of particle transport and deposition in complex flows on a GPGPU. Towards this objective, the authors have shown how to map the data structure of the LBM with a multiple-relaxation-time (MRT) collision operator and the Smagorinsky subgrid-scale turbulence model (for turbulent fluid flow simulations) coupled with a CA probabilistic method (for particle transport and deposition simulations) to a GPGPU to give a high-performance computing tool for the calculation of particle-laden flows.
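
A hedged sketch of the fractional CA update described above (ours, not the authors' code): particle density is encapsulated at lattice nodes, and only a probability-weighted fraction of each node's particles moves to the downstream node per time step. It assumes rhoPNew is zeroed before the launch and that downstream[] encodes the flow-aligned neighbour of each node.

```cpp
// One thread per lattice node. prob[i] is the transport probability
// derived from the local bulk flow; the rest of the particles stay put.
__global__ void moveParticleFraction(const float* rhoP,       // particle density per node
                                     const float* prob,       // transport probability in [0,1]
                                     const int*   downstream, // downstream neighbour index
                                     float*       rhoPNew,    // zero-initialized output field
                                     int          nNodes) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nNodes) return;
    float moving = prob[i] * rhoP[i];           // fraction that leaves this node
    atomicAdd(&rhoPNew[i], rhoP[i] - moving);   // fraction that stays
    atomicAdd(&rhoPNew[downstream[i]], moving); // fraction arriving downstream
}
```

The atomics are needed because several nodes may share the same downstream neighbour; this per-node bulk update is what distinguishes the modified algorithm from tracking whole collections of particles at once.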

Findings

A fluid-particle simulation using our LBM-MRT-CA algorithm run on a single GPGPU was 160 times as computationally efficient as the same algorithm run on a single CPU.

Research limitations/implications

The method is limited by the available computational resources (e.g. GPU memory size).

Originality/value

A new 3D LBM-MRT-CA model was developed to simulate particle transport and deposition in complex laminar and turbulent flows with different hydrodynamic characteristics (e.g. vortex shedding, impingement, free shear layer, turbulent boundary layer). The solid particle information is encapsulated locally at the lattice grid nodes, allowing for straightforward mapping of the data structure onto a GPGPU and enabling massively parallel execution of the LBM-MRT-CA algorithm. The new particle transport algorithm is based on the local (bulk) particle density and velocity and provides more realistic results for particle transport and deposition than the standard LBM-CA algorithm.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 29 no. 7
Type: Research Article
ISSN: 0961-5539

Keywords

Article
Publication date: 2 November 2020

Emmanuel Imuetinyan Aghimien, Lerato Millicent Aghimien, Olutomilayo Olayemi Petinrin and Douglas Omoregie Aghimien

This paper aims to present the result of a scientometric analysis conducted using studies on high-performance computing in computational modelling. This was done with a view to…

Abstract

Purpose

This paper aims to present the result of a scientometric analysis conducted using studies on high-performance computing in computational modelling. This was done with a view to showcasing the need for high-performance computers (HPC) within the architecture, engineering and construction (AEC) industry in developing countries, particularly in Africa, where the use of HPC in developing computational models (CMs) for effective problem solving is still low.

Design/methodology/approach

An interpretivist philosophical stance was adopted for the study, which informed a scientometric review of existing studies gathered from the Scopus database. Keywords such as "high-performance computing" and "computational modelling" were used to extract papers from the database. Visualisation of Similarities viewer (VOSviewer) was used to prepare co-occurrence maps based on the bibliographic data gathered.

Findings

Findings revealed the scarcity of research emanating from Africa in this area of study. Furthermore, past studies focused on high-performance computing in the development of computational modelling and theory, parallel computing and improved visualisation, large-scale application software, computer simulations and computational mathematical modelling. Future studies can also explore areas such as cloud computing, optimisation, high-level programming languages, natural science computing, computer graphics equipment and graphics processing units as they relate to the AEC industry.

Research limitations/implications

The study relied on a single database for the search of related studies.

Originality/value

The findings of this study serve as an excellent theoretical background for AEC researchers seeking to explore the use of HPC for CMs development in the quest for solving complex problems in the industry.

Details

Journal of Engineering, Design and Technology, vol. 19 no. 5
Type: Research Article
ISSN: 1726-0531

Keywords

Article
Publication date: 17 September 2021

Sukumar Rajendran, Sandeep Kumar Mathivanan, Prabhu Jayagopal, Kumar Purushothaman Janaki, Benjula Anbu Malar Manickam Bernard, Suganya Pandy and Manivannan Sorakaya Somanathan

Artificial Intelligence (AI) has surpassed expectations in opening up different possibilities for machines from different walks of life. Cloud service providers are pushing edge…

Abstract

Purpose

Artificial Intelligence (AI) has surpassed expectations in opening up different possibilities for machines from different walks of life. Cloud service providers are pushing edge computing, which reduces latency, improves availability and saves bandwidth.

Design/methodology/approach

The exponential growth in tensor processing units (TPUs) and graphics processing units (GPUs), combined with different types of sensors, has enabled the pairing of medical technology with deep learning in providing the best patient care. With the significant role of pushing and pulling data from the cloud, big data comes into play through the velocity, veracity and volume of data, with IoT assisting doctors in predicting abnormalities and providing customized treatment based on the patient's electronic health record (EHR).

Findings

The primary focus of edge computing is decentralizing and bringing intelligent IoT devices to provide real-time computing at the point of presence (PoP). The impact of edge computing at the PoP in healthcare gains importance as wearable devices and mobile apps are entrusted with the real-time monitoring and diagnosis of patients.

Originality/value

The utility value of sensor data improves through the Laplacian mechanism, which preserves PII in the response to each query from the ODL. Scalability is at 50% with respect to the sensitivity and preservation of the PII values in the local ODL.
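
The "Laplacian mechanism" named here is, in standard differential-privacy terms, the Laplace mechanism; the host-side C++ sketch below is our illustration of that general technique, not the paper's code, and the names answerQuery, sensitivity and epsilon are assumptions. Noise with scale sensitivity/epsilon is added to each aggregate query answer so that individual PII cannot be inferred from any single response.

```cpp
#include <random>

// Sample Laplace(0, scale) as the difference of two i.i.d. exponentials,
// which avoids the log-of-zero edge case of inverse-CDF sampling.
double laplaceNoise(double scale, std::mt19937& rng) {
    std::exponential_distribution<double> e(1.0);
    return scale * (e(rng) - e(rng));
}

// Answer an aggregate query over the sensor store with calibrated noise:
// larger sensitivity or smaller epsilon (stronger privacy) means more noise,
// which is the utility/privacy trade-off the abstract alludes to.
double answerQuery(double trueValue, double sensitivity, double epsilon,
                   std::mt19937& rng) {
    return trueValue + laplaceNoise(sensitivity / epsilon, rng);
}
```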

Details

International Journal of Intelligent Computing and Cybernetics, vol. 15 no. 1
Type: Research Article
ISSN: 1756-378X

Keywords

1 – 10 of over 17,000