Article
Publication date: 5 April 2024

Abhishek Kumar Singh and Krishna Mohan Singh

Abstract

Purpose

In the present work, we focus on developing an in-house parallel meshless local Petrov-Galerkin (MLPG) code for the analysis of heat conduction in two-dimensional and three-dimensional regular as well as complex geometries.

Design/methodology/approach

The parallel MLPG code has been implemented using open multi-processing (OpenMP) application programming interface (API) on the shared memory multicore CPU architecture. Numerical simulations have been performed to find the critical regions of the serial code, and an OpenMP-based parallel MLPG code is developed, considering the critical regions of the sequential code.

Findings

Based on performance parameters such as speed-up and parallel efficiency, the credibility of the parallelization procedure has been established. The maximum speed-up and parallel efficiency are 10.94 and 0.92, respectively, for the regular three-dimensional geometry (343,000 nodes). The results show that parallelization is better suited to larger node counts, as both parallel efficiency and speed-up are higher for larger numbers of nodes.
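The two performance parameters named above have standard definitions: speed-up is the serial run time divided by the parallel run time, and parallel efficiency is the speed-up divided by the number of threads. The sketch below computes both. The function names and the timings are hypothetical and are not taken from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speed-up S_p = T_1 / T_p."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, n_threads: int) -> float:
    """Parallel efficiency E_p = S_p / p, where p is the thread count."""
    return speedup(t_serial, t_parallel) / n_threads


# Hypothetical wall-clock timings in seconds (illustrative only).
t_1 = 120.0      # serial run
t_8 = 20.0       # parallel run on 8 threads
n_threads = 8

s = speedup(t_1, t_8)
e = parallel_efficiency(t_1, t_8, n_threads)
print(f"speed-up = {s:.2f}, efficiency = {e:.2f}")
```

An efficiency close to 1 means the threads are almost fully utilized; values well below 1 point to serial sections or synchronization overhead.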

Originality/value

Few attempts have been made in parallel implementation of the MLPG method for solving large-scale industrial problems. Although the literature suggests that message-passing interface (MPI) based parallel MLPG codes have been developed, the OpenMP model has rarely been touched. This work is an attempt at the development of OpenMP-based parallel MLPG code for the very first time.

Details

Engineering Computations, vol. 41 no. 2
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 29 December 2023

Thanh-Nghi Do and Minh-Thu Tran-Nguyen

Abstract

Purpose

This study aims to propose novel edge device-tailored federated learning algorithms of local classifiers (stochastic gradient descent, support vector machines), namely, FL-lSGD and FL-lSVM. These algorithms are designed to address the challenge of large-scale ImageNet classification.

Design/methodology/approach

The authors’ FL-lSGD and FL-lSVM train in a parallel and incremental manner to build an ensemble local classifier on Raspberry Pis without requiring data exchange. The algorithms sequentially load small data blocks of the local training subset stored on the Raspberry Pi to train the local classifiers. Each data block is split into k partitions using the k-means algorithm, and models are trained in parallel on each data partition to enable local data classification.
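The partition-then-train step described above can be sketched in a few lines. This is a minimal illustration of the general idea only, not the authors' FL-lSGD or FL-lSVM: a majority-class model stands in for the local SGD/SVM classifier, a plain Lloyd's k-means is written out by hand, threads are used only to show the parallel structure, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the list of k centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def train_local(partition):
    """Stand-in local model: the majority class of the partition (None if empty)."""
    if not partition:
        return None
    return Counter(y for _, y in partition).most_common(1)[0][0]


def fit_block(block, k):
    """Split one data block into k partitions and train one model per partition."""
    centroids = kmeans([x for x, _ in block], k)
    parts = [[] for _ in range(k)]
    for x, y in block:
        parts[nearest(x, centroids)].append((x, y))
    with ThreadPoolExecutor() as pool:  # one training task per partition
        models = list(pool.map(train_local, parts))
    return centroids, models


def predict(x, centroids, models):
    """Route x to the model of its nearest partition."""
    return models[nearest(x, centroids)]


# Hypothetical toy data block: (features, label) pairs in two separated groups.
block = [
    ((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.2), "a"),
    ((5.0, 5.1), "b"), ((5.2, 4.9), "b"), ((4.9, 5.0), "b"),
]
centroids, models = fit_block(block, k=2)
print(predict((0.1, 0.1), centroids, models), predict((5.0, 5.0), centroids, models))
```

Because each model only sees the points of its own partition, training cost per model drops and the partitions can be processed independently, which is what makes the scheme attractive on small multicore devices.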

Findings

Empirical test results on the ImageNet data set show that the authors’ FL-lSGD and FL-lSVM algorithms with 4 Raspberry Pis (Quad core Cortex-A72, ARM v8, 64-bit SoC @ 1.5GHz, 4GB RAM) are faster than the state-of-the-art LIBLINEAR algorithm run on a PC (Intel(R) Core i7-4790 CPU, 3.6 GHz, 4 cores, 32GB RAM).

Originality/value

Efficiently addressing the challenge of large-scale ImageNet classification, the authors’ novel federated learning algorithms of local classifiers have been tailored to work on the Raspberry Pi. These algorithms can handle 1,281,167 images and 1,000 classes effectively.

Details

International Journal of Web Information Systems, vol. 20 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 22 December 2023

Vaclav Snasel, Tran Khanh Dang, Josef Kueng and Lingping Kong

Abstract

Purpose

This paper aims to review in-memory computing (IMC) for machine learning (ML) applications in terms of its history, architectures and optimization options. In this review, the authors investigate different architectural aspects and provide their comparative evaluations.

Design/methodology/approach

The authors collect over 40 recent IMC papers on hardware design and optimization techniques and classify them into three optimization categories: optimization through graphics processing units (GPUs), optimization through reduced precision and optimization through hardware accelerators. The authors then summarize each technique in terms of the data set it was applied to, how it is designed and what the design contributes.

Findings

ML algorithms are potent tools accommodated on IMC architecture. Although general-purpose hardware (central processing units and GPUs) can supply explicit solutions, its energy efficiency is limited by the excessive flexibility it supports. On the other hand, hardware accelerators (field programmable gate arrays and application-specific integrated circuits) win on energy efficiency, but an individual accelerator often adapts exclusively to a single ML approach (family). From a long-term hardware evolution perspective, heterogeneous hardware/software co-design on hybrid platforms is an option for researchers.

Originality/value

IMC’s optimization enables high-speed processing, increases performance and analyzes massive volumes of data in real-time. This work reviews IMC and its evolution. Then, the authors categorize three optimization paths for the IMC architecture to improve performance metrics.

Details

International Journal of Web Information Systems, vol. 20 no. 1
Type: Research Article
ISSN: 1744-0084
