To read this content please select one of the options below:

Parallel holistic twig joins on a multi‐core system

Imam Machdi (Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan)
Toshiyuki Amagasa (Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan)
Hiroyuki Kitagawa (Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 22 June 2010

217

Abstract

Purpose

The purpose of this paper is to propose general parallelism techniques for holistic twig join algorithms to process queries against Extensible Markup Language (XML) databases on a multi‐core system.

Design/methodology/approach

The parallelism techniques comprised data and task parallelism. As for data parallelism, the paper adopted the stream‐based partitioning for XML to partition XML data as the basis of parallelism on multiple CPU cores. The XML data partitioning was performed in two levels. The first level was to create buckets for creating data independence and balancing loads among CPU cores; each bucket was assigned onto a CPU core. Within each bucket, the second level of XML data partitioning was performed to create finer partitions for providing finer parallelism. Each CPU core performed the holistic twig join algorithm on each finer partition of its own in parallel with other CPU cores. In task parallelism, the holistic twig join algorithm was decomposed into two main tasks, which were pipelined to create parallelism. The first task adopted the data parallelism technique and their outputs were transferred to the second task periodically. Since data transfers incurred overheads, the size of each data transfer needed to be estimated cautiously for achieving optimal performance.

Findings

The data and task parallelism techniques contribute to good performance especially for queries having complex structures and/or higher values of query selectivity. The performance of data parallelism can be further improved by task parallelism. Significant performance improvement is attained by queries having higher selectivity because more outputs computed by the second task is performed in parallel with the first task.

Research limitations/implications

The proposed parallelism techniques primarily deals with executing a single long‐running query for intra‐query parallelism, partitioning XML data on‐the‐fly, and allocating partitions on CPU cores statically. During the parallel execution, presumably there are no such dynamic XML data updates.

Practical implications

The effectiveness of the proposed parallel holistic twig joins relies fundamentally on some system parameter values that can be obtained from a benchmark of the system platform.

Originality/value

The paper proposes novel techniques to increase parallelism by combining techniques of data and task parallelism for achieving high performance. To the best of the author's knowledge, this is the first paper of parallelizing the holistic twig join algorithms on a multi‐core system.

Keywords

Citation

Machdi, I., Amagasa, T. and Kitagawa, H. (2010), "Parallel holistic twig joins on a multi‐core system", International Journal of Web Information Systems, Vol. 6 No. 2, pp. 149-177. https://doi.org/10.1108/17440081011053131

Publisher

:

Emerald Group Publishing Limited

Copyright © 2010, Emerald Group Publishing Limited

Related articles