To read the full version of this content please select one of the options below:

XML data partitioning schemes for parallel holistic twig joins

Imam Machdi (Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan)
Toshiyuki Amagasa (Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan)
Hiroyuki Kitagawa (Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 19 June 2009

Abstract

Purpose

The purpose of this paper is to propose Extensible Markup Language (XML) data partitioning schemes that can cope with static and dynamic allocation for parallel holistic twig joins: grid metadata model for XML (GMX) and streams‐based partitioning method for XML (SPX).

Design/methodology/approach

GMX exploits the relationships between XML documents and query patterns to perform workload‐aware partitioning of XML data. Specifically, the paper constructs a two‐dimensional model with a document dimension and a query dimension in which each object in a dimension is composed from XML metadata related to the dimension. GMX provides a set of XML data partitioning methods that include document clustering, query clustering, document‐based refinement, query‐based refinement, and query‐path refinement, thereby enabling XML data partitioning based on the static information of XML metadata. In contrast, SPX explores the structural relationships of query elements and a range‐containment property of XML streams to generate partitions and allocate them to cluster nodes on‐the‐fly.

Findings

GMX provides several salient features: a set of partition granularities that balance workloads of query processing costs among cluster nodes statically; inter‐query parallelism as well as intra‐query parallelism at multiple extents; and better parallel query performance when all estimated queries are executed simultaneously to meet their probability of query occurrences in the system. SPX also offers the following features: minimal computation time to generate partitions; balancing skewed workloads dynamically on the system; producing higher intra‐query parallelism; and gaining better parallel query performance.

Research limitations/implications

The current status of the proposed XML data partitioning schemes does not take into account XML data updates, e.g. new XML documents and query pattern changes submitted by users on the system.

Practical implications

Note that effectiveness of the XML data partitioning schemes mainly relies on the accuracy of the cost model to estimate query processing costs. The cost model must be adjusted to reflect characteristics of a system platform used in the implementation.

Originality/value

This paper proposes novel schemes of conducting XML data partitioning to achieve both static and dynamic workload balance.

Keywords

Citation

Machdi, I., Amagasa, T. and Kitagawa, H. (2009), "XML data partitioning schemes for parallel holistic twig joins", International Journal of Web Information Systems, Vol. 5 No. 2, pp. 151-194. https://doi.org/10.1108/17440080910968445

Publisher

:

Emerald Group Publishing Limited

Copyright © 2009, Emerald Group Publishing Limited