
UNAT: UNstructured Acceleration Toolkit on SW26010 many-core processor

Hongbin Liu (Department of Advanced Manufacture, National Supercomputing Centre, Wuxi, China)
Hu Ren (Department of Advanced Manufacture, National Supercomputing Centre, Wuxi, China)
Hanfeng Gu (Department of Advanced Manufacture, National Supercomputing Centre, Wuxi, China)
Fei Gao (Department of Advanced Manufacture, National Supercomputing Centre, Wuxi, China)
Guangwen Yang (Department of Computer Science and Technology, Tsinghua University, Beijing, China and Beijing National Research Center for Information Science and Technology, Beijing, China)

Engineering Computations

ISSN: 0264-4401

Article publication date: 30 April 2020

Issue publication date: 28 October 2020


Abstract

Purpose

The purpose of this paper is to provide an automatic parallelization toolkit for unstructured mesh-based computation. Among all mesh types, unstructured meshes dominate engineering simulation scenarios and play an essential role in scientific computing because of their geometrical flexibility. However, high-fidelity applications based on unstructured grids remain time-consuming, both to program and to run.

Design/methodology/approach

This study develops an efficient UNstructured Acceleration Toolkit (UNAT), which provides friendly high-level programming interfaces and carefully engineered low-level implementations on the target hardware to achieve nearly hand-optimized performance. At present, two efficient strategies, a multi-level blocks method and a row-subsections method, are designed and implemented on the Sunway architecture. The random memory access and write–write conflict issues of unstructured meshes are handled by partitioning, coloring and other hardware-specific techniques. Moreover, a data-reuse mechanism is developed to increase the computational intensity and alleviate the memory bandwidth bottleneck.
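To make the write–write conflict concrete: in a face-based loop over an unstructured mesh, each interior face updates two cells (its owner and its neighbour), so two cores processing faces that share a cell can overwrite each other's contribution. One generic remedy is to color the faces so that no two faces of the same color touch the same cell; each color can then be processed in parallel without atomics. The Python sketch below illustrates only that general idea under stated assumptions. It is not UNAT's implementation, and the owner/neighbour face layout and the function names `color_faces` and `face_flux_update` are hypothetical.

```python
from collections import defaultdict


def color_faces(owner, neighbour):
    """Greedy coloring of interior faces (hypothetical helper).

    Face i connects cells owner[i] and neighbour[i]. Each face gets the
    smallest color not already used by another face touching either of
    its two cells, so faces sharing a color never share a cell.
    """
    used = defaultdict(set)  # cell -> colors already taken by its faces
    colors = []
    for o, n in zip(owner, neighbour):
        c = 0
        while c in used[o] or c in used[n]:
            c += 1
        used[o].add(c)
        used[n].add(c)
        colors.append(c)
    return colors


def face_flux_update(owner, neighbour, flux, n_cells):
    """Add +flux to the owner cell and -flux to the neighbour cell, color by color.

    Within one color no two faces write the same cell, so the inner loop
    could be split across cores without write-write conflicts. Here it
    simply runs serially.
    """
    colors = color_faces(owner, neighbour)
    n_colors = max(colors) + 1 if colors else 0
    residual = [0.0] * n_cells
    for c in range(n_colors):
        faces = [i for i, fc in enumerate(colors) if fc == c]
        for i in faces:  # conflict-free within this color
            residual[owner[i]] += flux[i]
            residual[neighbour[i]] -= flux[i]
    return residual
```

For a 1D chain of four cells with faces (0,1), (1,2) and (2,3), the greedy pass yields colors `[0, 1, 0]`: the two outer faces can be processed together, and the middle face waits for the second color. Greedy coloring does not minimize the number of colors, and more colors mean fewer faces per parallel batch. This is one reason partitioning and coloring are typically combined with hardware-specific techniques rather than used alone.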

Findings

The authors select sparse matrix-vector multiplication (SpMV) as a performance benchmark for UNAT across different data layouts and matrix formats. Experimental results show speed-ups of up to 26× compared with a single management processing element (MPE), and utilization ratio tests indicate that UNAT can achieve nearly hand-optimized performance. Finally, the authors adopt UNAT to accelerate a well-tuned unstructured solver and obtain average speed-ups of 19× for the main kernels and 10× for the overall solver.
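For readers unfamiliar with the benchmark, the Python sketch below shows SpMV on a matrix stored in the common CSR (compressed sparse row) format. It splits the rows into contiguous subsections with roughly equal numbers of nonzeros. Because each subsection writes only its own entries of `y`, the subsections could be assigned to different cores without write conflicts. This is a generic illustration of row-wise partitioning and an assumption on our part, not UNAT's row-subsections method. The default of 64 parts merely mirrors the 64 computing processing elements (CPEs) in each SW26010 core group, and all function names are hypothetical.

```python
def csr_spmv_rows(row_ptr, col_idx, vals, x, y, row_begin, row_end):
    """Compute y[r] = sum_k vals[k] * x[col_idx[k]] for rows in [row_begin, row_end)."""
    for r in range(row_begin, row_end):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[r] = acc


def split_rows_by_nnz(row_ptr, n_parts):
    """Return n_parts + 1 row boundaries giving contiguous row ranges
    with roughly equal nonzero counts (hypothetical load-balancing rule)."""
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    bounds = [0]
    r = 0
    for p in range(1, n_parts):
        target = nnz * p // n_parts
        # Advance to the first row whose nonzeros start at or after the target.
        while r < n_rows and row_ptr[r] < target:
            r += 1
        bounds.append(r)
    bounds.append(n_rows)
    return bounds


def spmv_row_subsections(row_ptr, col_idx, vals, x, n_parts=64):
    """CSR SpMV executed subsection by subsection (serially here)."""
    y = [0.0] * (len(row_ptr) - 1)
    bounds = split_rows_by_nnz(row_ptr, n_parts)
    for p in range(n_parts):
        # Each subsection owns rows bounds[p]..bounds[p+1]-1 of y exclusively,
        # so on a many-core chip each could go to a different core.
        csr_spmv_rows(row_ptr, col_idx, vals, x, y, bounds[p], bounds[p + 1])
    return y
```

As a usage example, the 4×4 tridiagonal matrix with 2 on the diagonal and -1 off the diagonal, multiplied by a vector of ones, gives `[1.0, 0.0, 0.0, 1.0]`. Splitting it into two parts places rows 0 to 1 and rows 2 to 3 in separate subsections. Row partitioning removes write conflicts on `y`, but the reads of `x[col_idx[k]]` remain irregular. That irregularity is the random-access problem that the partitioning and data-reuse techniques described above target.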

Originality/value

The authors design an unstructured mesh toolkit, UNAT, that bridges the hardware and the numerical algorithms, so that engineers can focus on algorithms and solvers rather than on parallel implementation. On SW26010, the many-core processor of the fastest supercomputer in China, UNAT yields speed-ups of up to 26× and achieves nearly hand-optimized performance.


Acknowledgements

This work was supported by the National Key R&D Program of China under Project 2017YFB0203602. This work was also partly supported by the National Natural Science Foundation of China (grant nos. 91746119 and 61672312) and by the Center for High Performance Computing and System Simulation, Pilot National Laboratory for Marine Science and Technology (Qingdao).

Citation

Liu, H., Ren, H., Gu, H., Gao, F. and Yang, G. (2020), "UNAT: UNstructured Acceleration Toolkit on SW26010 many-core processor", Engineering Computations, Vol. 37 No. 9, pp. 3187-3208. https://doi.org/10.1108/EC-09-2019-0401

Publisher: Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited
