Search results

1 – 10 of over 20000
Article
Publication date: 19 February 2021

C. Lakshmi and K. Usha Rani

Resilient distributed processing technique (RDPT), in which mapper and reducer are simplified with the Spark contexts and support distributed parallel query processing.

Abstract

Purpose

Resilient distributed processing technique (RDPT), in which mapper and reducer are simplified with the Spark contexts and support distributed parallel query processing.

Design/methodology/approach

The proposed work is implemented with Pig Latin with Spark contexts to develop query processing in a distributed environment.

Findings

Query processing in Hadoop influences the distributed processing with the MapReduce model. MapReduce caters to the works on different nodes with the implementation of complex mappers and reducers. Its results are valid for some extent size of the data.

Originality/value

Pig supports the required parallel processing framework with the following constructs during the processing of queries: FOREACH; FLATTEN; COGROUP.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 2
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 19 June 2009

Imam Machdi, Toshiyuki Amagasa and Hiroyuki Kitagawa

The purpose of this paper is to propose Extensible Markup Language (XML) data partitioning schemes that can cope with static and dynamic allocation for parallel holistic twig…

Abstract

Purpose

The purpose of this paper is to propose Extensible Markup Language (XML) data partitioning schemes that can cope with static and dynamic allocation for parallel holistic twig joins: grid metadata model for XML (GMX) and streams‐based partitioning method for XML (SPX).

Design/methodology/approach

GMX exploits the relationships between XML documents and query patterns to perform workload‐aware partitioning of XML data. Specifically, the paper constructs a two‐dimensional model with a document dimension and a query dimension in which each object in a dimension is composed from XML metadata related to the dimension. GMX provides a set of XML data partitioning methods that include document clustering, query clustering, document‐based refinement, query‐based refinement, and query‐path refinement, thereby enabling XML data partitioning based on the static information of XML metadata. In contrast, SPX explores the structural relationships of query elements and a range‐containment property of XML streams to generate partitions and allocate them to cluster nodes on‐the‐fly.

Findings

GMX provides several salient features: a set of partition granularities that balance workloads of query processing costs among cluster nodes statically; inter‐query parallelism as well as intra‐query parallelism at multiple extents; and better parallel query performance when all estimated queries are executed simultaneously to meet their probability of query occurrences in the system. SPX also offers the following features: minimal computation time to generate partitions; balancing skewed workloads dynamically on the system; producing higher intra‐query parallelism; and gaining better parallel query performance.

Research limitations/implications

The current status of the proposed XML data partitioning schemes does not take into account XML data updates, e.g. new XML documents and query pattern changes submitted by users on the system.

Practical implications

Note that effectiveness of the XML data partitioning schemes mainly relies on the accuracy of the cost model to estimate query processing costs. The cost model must be adjusted to reflect characteristics of a system platform used in the implementation.

Originality/value

This paper proposes novel schemes of conducting XML data partitioning to achieve both static and dynamic workload balance.

Details

International Journal of Web Information Systems, vol. 5 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 February 1993

BRIAN VICKERY and ALINA VICKERY

There is a huge amount of information and data stored in publicly available online databases that consist of large text files accessed by Boolean search techniques. It is widely…

Abstract

There is a huge amount of information and data stored in publicly available online databases that consist of large text files accessed by Boolean search techniques. It is widely held that less use is made of these databases than could or should be the case, and that one reason for this is that potential users find it difficult to identify which databases to search, to use the various command languages of the hosts and to construct the Boolean search statements required. This reasoning has stimulated a considerable amount of exploration and development work on the construction of search interfaces, to aid the inexperienced user to gain effective access to these databases. The aim of our paper is to review aspects of the design of such interfaces: to indicate the requirements that must be met if maximum aid is to be offered to the inexperienced searcher; to spell out the knowledge that must be incorporated in an interface if such aid is to be given; to describe some of the solutions that have been implemented in experimental and operational interfaces; and to discuss some of the problems encountered. The paper closes with an extensive bibliography of references relevant to online search aids, going well beyond the items explicitly mentioned in the text. An index to software appears after the bibliography at the end of the paper.

Details

Journal of Documentation, vol. 49 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 31 December 2006

Hooman Homayounfar and Fangju Wang

XML is becoming one of the most important structures for data exchange on the web. Despite having many advantages, XML structure imposes several major obstacles to large document…

Abstract

XML is becoming one of the most important structures for data exchange on the web. Despite having many advantages, XML structure imposes several major obstacles to large document processing. Inconsistency between the linear nature of the current algorithms (e.g. for caching and prefetch) used in operating systems and databases, and the non‐linear structure of XML data makes XML processing more costly. In addition to verbosity (e.g. tag redundancy), interpreting (i.e. parsing) depthfirst (DF) structure of XML documents is a significant overhead to processing applications (e.g. query engines). Recent research on XML query processing has learned that sibling clustering can improve performance significantly. However, the existing clustering methods are not able to avoid parsing overhead as they are limited by larger document sizes. In this research, We have developed a better data organization for native XML databases, named sibling‐first (SF) format that improves query performance significantly. SF uses an embedded index for fast accessing to child nodes. It also compresses documents by eliminating extra information from the original DF format. The converted SF documents can be processed for XPath query purposes without being parsed. We have implemented the SF storage in virtual memory as well as a format on disk. Experimental results with real data have showed that significantly higher performance can be achieved when XPath queries are conducted on very large SF documents.

Details

International Journal of Web Information Systems, vol. 2 no. 3/4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 23 November 2010

Nils Hoeller, Christoph Reinke, Jana Neumann, Sven Groppe, Christian Werner and Volker Linnemann

In the last decade, XML has become the de facto standard for data exchange in the world wide web (WWW). The positive benefits of data exchangeability to support system and…

Abstract

Purpose

In the last decade, XML has become the de facto standard for data exchange in the world wide web (WWW). The positive benefits of data exchangeability to support system and software heterogeneity on application level and easy WWW integration make XML an ideal data format for many other application and network scenarios like wireless sensor networks (WSNs). Moreover, the usage of XML encourages using standardized techniques like SOAP to adapt the service‐oriented paradigm to sensor network engineering. Nevertheless, integrating XML usage in WSN data management is limited by the low hardware resources that require efficient XML data management strategies suitable to bridge the general resource gap. The purpose of this paper is to present two separate strategies on integrating XML data management in WSNs.

Design/methodology/approach

The paper presents two separate strategies on integrating XML data management in WSNs that have been implemented and are running on today's sensor node platforms. The paper shows how XML data can be processed and how XPath queries can be evaluated dynamically. In an extended evaluation, the performance of both strategies concerning the memory and energy efficiency are compared and both solutions are shown to have application domains fully applicable on today's sensor node products.

Findings

This work shows that dynamic XML data management and query evaluation is possible on sensor nodes with strict limitations in terms of memory, processing power and energy supply.

Originality/value

The paper presents an optimized stream‐based XML compression technique and shows how XML queries can be evaluated on compressed XML bit streams using generic pushdown automata. To the best of the authors' knowledge, this is the first complete approach on integrating dynamic XML data management into WSNs.

Details

International Journal of Web Information Systems, vol. 6 no. 4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 28 September 2007

Alasdair J.G. Gray, Werner Nutt and M. Howard Williams

Distributed data streams are an important topic of current research. In such a setting, data values will be missed, e.g. due to network errors. This paper aims to allow this…

Abstract

Purpose

Distributed data streams are an important topic of current research. In such a setting, data values will be missed, e.g. due to network errors. This paper aims to allow this incompleteness to be detected and overcome with either the user not being affected or the effects of the incompleteness being reported to the user.

Design/methodology/approach

A model for representing the incomplete information has been developed that captures the information that is known about the missing data. Techniques for query answering involving certain and possible answer sets have been extended so that queries over incomplete data stream histories can be answered.

Findings

It is possible to detect when a distributed data stream is missing one or more values. When such data values are missing there will be some information that is known about the data and this is stored in an appropriate format. Even when the available data are incomplete, it is possible in some circumstances to answer a query completely. When this is not possible, additional meta‐data can be returned to inform the user of the effects of the incompleteness.

Research limitations/implications

The techniques and models proposed in this paper have only been partially implemented.

Practical implications

The proposed system is general and can be applied wherever there is a need to query the history of distributed data streams. The work in this paper enables the system to answer queries when there are missing values in the data.

Originality/value

This paper presents a general model of how to detect, represent, and answer historical queries over incomplete distributed data streams.

Details

International Journal of Web Information Systems, vol. 3 no. 1/2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 14 April 2014

Chang-Sup Park and Sungchae Lim

The paper aims to propose an effective method to process keyword-based queries over graph-structured databases which are widely used in various applications such as XML, semantic…

Abstract

Purpose

The paper aims to propose an effective method to process keyword-based queries over graph-structured databases which are widely used in various applications such as XML, semantic web, and social network services. To satisfy users' information need, it proposes an extended answer structure for keyword queries, inverted list indexes on keywords and nodes, and query processing algorithms exploiting the inverted lists. The study aims to provide more effective and relevant answers to a given query than the previous approaches in an efficient way.

Design/methodology/approach

A new relevance measure for nodes to a given keyword query is defined in the paper and according to the relevance metric, a new answer tree structure is proposed which has no constraint on the number of keyword nodes chosen for each query keyword. For efficient query processing, an inverted list-style index is suggested which pre-computes connectivity and relevance information on the nodes in the graph. Then, a query processing algorithm based on the pre-constructed inverted lists is designed, which aggregates list entries for each graph node relevant to given keywords and identifies top-k root nodes of answer trees most relevant to the given query. The basic search method is also enhanced by using extend inverted lists which store additional relevance information of the related entries in the lists in order to estimate the relevance score of a node more closely and to find top-k answers more efficiently.

Findings

Experiments with real datasets and various test queries were conducted for evaluating effectiveness and performance of the proposed methods in comparison with one of the previous approaches. The experimental results show that the proposed methods with an extended answer structure produce more effective top-k results than the compared previous method for most of the queries, especially for those with OR semantics. An extended inverted list and enhanced search algorithm are shown to achieve much improvement on the execution performance compared to the basic search method.

Originality/value

This paper proposes a new extended answer structure and query processing scheme for keyword queries on graph databases which can satisfy the users' information need represented by a keyword set having various semantics.

Details

International Journal of Web Information Systems, vol. 10 no. 1
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 March 1990

EFTHIMIS N. EFTHIMIADIS

This review reports on the current state and the potential of tools and systems designed to aid online searching, referred to here as online searching aids. Intermediary…

239

Abstract

This review reports on the current state and the potential of tools and systems designed to aid online searching, referred to here as online searching aids. Intermediary mechanisms are examined in terms of the two stage model, i.e. end‐user, intermediary, ‘raw database’, and different forms of user — system interaction are discussed. The evolution of the terminology of online searching aids is presented with special emphasis on the expert/non‐expert division. Terms defined include gateways, front‐end systems, intermediary systems and post‐processing. The alternative configurations that such systems can have and the approaches to the design of the user interface are discussed. The review then analyses the functions of online searching aids, i.e. logon procedures, access to hosts, help features, search formulation, query reformulation, database selection, uploading, downloading and post‐processing. Costs are then briefly examined. The review concludes by looking at future trends following recent developments in computer science and elsewhere. Distributed expert based information systems (debis), the standard generalised mark‐up language (SGML), the client‐server model, object‐orientation and parallel processing are expected to influence, if they have not done so already, the design and implementation of future online searching aids.

Details

Journal of Documentation, vol. 46 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 August 2005

Jinbao Li, Yingshu Li, My T. Thai and Jianzhong Li

This paper investigates query processing in MANETs. Cache techniques and multi‐join database operations are studied. For data caching, a group‐caching strategy is proposed. Using…

Abstract

This paper investigates query processing in MANETs. Cache techniques and multi‐join database operations are studied. For data caching, a group‐caching strategy is proposed. Using the cache and the index of the cached data, queries can be processed at a single node or within the group containing this single node. For multi‐join, a cost evaluation model and a query plan generation algorithm are presented. Query cost is evaluated based on the parameters including the size of the transmitted data, the transmission distance and the query cost at each single node. According to the evaluations, the nodes on which the query should be executed and the join order are determined. Theoretical analysis and experiment results show that the proposed group‐caching based query processing and the cost based join strategy are efficient in MANETs. It is suitable for the mobility, the disconnection and the multi‐hop features of MANETs. The communication cost between nodes is reduced and the efficiency of the query is improved greatly.

Details

International Journal of Pervasive Computing and Communications, vol. 1 no. 3
Type: Research Article
ISSN: 1742-7371

Keywords

Article
Publication date: 1 June 2000

Kalervo Järvelin, Peter Ingwersen and Timo Niemi

This article presents a novel user‐oriented interface for generalised informetric analysis and demonstrates how informetric calculations can easily and declaratively be specified…

Abstract

This article presents a novel user‐oriented interface for generalised informetric analysis and demonstrates how informetric calculations can easily and declaratively be specified through advanced data modelling techniques. The interface is declarative and at a high level. Therefore it is easy to use, flexible and extensible. It enables end users to perform basic informetric ad hoc calculations easily and often with much less effort than in contemporary online retrieval systems. It also provides several fruitful generalisations of typical informetric measurements like impact factors. These are based on substituting traditional foci of analysis, for instance journals, by other object types, such as authors, organisations or countries. In the interface, bibliographic data are modelled as complex objects (non‐first normal form relations) and terminological and citation networks involving transitive relationships are modelled as binary relations for deductive processing. The interface is flexible, because it makes it easy to switch focus between various object types for informetric calculations, e.g. from authors to institutions. Moreover, it is demonstrated that all informetric data can easily be broken down by criteria that foster advanced analysis, e.g. by years or content‐bearing attributes. Such modelling allows flexible data aggregation along many dimensions. These salient features emerge from the query interface‘s general data restructuring and aggregation capabilities combined with transitive processing capabilities. The features are illustrated by means of sample queries and results in the article.

Details

Journal of Documentation, vol. 56 no. 3
Type: Research Article
ISSN: 0022-0418

Keywords

1 – 10 of over 20000