Search results

1 – 10 of 129
Article
Publication date: 17 August 2015

Savong Bou, Toshiyuki Amagasa and Hiroyuki Kitagawa

The purpose of this paper is to propose a novel scheme to process XPath-based keyword search over Extensible Markup Language (XML) streams, where one can specify query keywords and…


Abstract

Purpose

The purpose of this paper is to propose a novel scheme to process XPath-based keyword search over Extensible Markup Language (XML) streams, where one can specify query keywords and XPath-based filtering conditions at the same time. Experimental results show that the proposed scheme can efficiently and practically process XPath-based keyword search over XML streams.

Design/methodology/approach

To allow XPath-based keyword search over XML streams, YFilter (Diao et al., 2003) is integrated with CKStream (Hummel et al., 2011). More precisely, the nondeterministic finite automaton (NFA) of YFilter is extended so that keyword matching at text nodes is supported. Next, the stack data structure is modified by integrating the set of NFA states in YFilter with bitmaps generated from the set of keyword queries in CKStream.
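The combined stack can be caricatured in a few lines: each entry pairs the active NFA states (the YFilter side) with a keyword bitmap (the CKStream side), and matches propagate upward when elements close. Everything below — the event encoding, the transition table, the match criterion — is an illustrative assumption, not the authors' actual code.

```python
# Hypothetical sketch of the integrated stack: one entry per open
# element, holding active NFA states plus a bitmap of matched keywords.

class StackEntry:
    def __init__(self, states, bitmap=0):
        self.states = states   # active NFA states (path-matching side)
        self.bitmap = bitmap   # one bit per keyword matched so far

def match_stream(events, transitions, keywords):
    """events: ('start', tag) / ('text', s) / ('end', tag) tuples.
    transitions: {state: {tag: next_state}}. Returns tags of elements
    that are path-reachable and cover all keywords."""
    kw_bit = {w: 1 << i for i, w in enumerate(keywords)}
    full = (1 << len(keywords)) - 1
    stack = [StackEntry({0})]
    results = []
    for kind, value in events:
        if kind == 'start':
            nxt = {transitions[s][value] for s in stack[-1].states
                   if value in transitions.get(s, {})}
            stack.append(StackEntry(nxt, stack[-1].bitmap))
        elif kind == 'text':
            for w in value.split():
                if w in kw_bit:
                    stack[-1].bitmap |= kw_bit[w]
        else:  # 'end'
            top = stack.pop()
            if top.states and top.bitmap == full:
                results.append(value)
            stack[-1].bitmap |= top.bitmap  # propagate matches upward
    return results
```

A real implementation would also handle `//` and `*` transitions and per-query bitmaps; this sketch keeps only the data-structure idea.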

Findings

Extensive experiments were conducted using both synthetic and real data sets to show the effectiveness of the proposed method. The experimental results showed that the accuracy of the proposed method was better than the baseline method (CKStream), while it consumed less memory. Moreover, the proposed scheme showed good scalability with respect to the number of queries.

Originality/value

Due to the rapid diffusion of XML streams, the demand for querying such information is also growing. In this situation, the ability to combine XPath and keyword search is important, because it is an easy-to-use yet powerful means of querying XML streams. However, none of the existing works has addressed this issue. This work copes with the problem by combining the XPath-based YFilter with the keyword-search-based CKStream to enable XPath-based keyword search over XML streams.

Details

International Journal of Web Information Systems, vol. 11 no. 3
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 23 November 2010

Nils Hoeller, Christoph Reinke, Jana Neumann, Sven Groppe, Christian Werner and Volker Linnemann

In the last decade, XML has become the de facto standard for data exchange in the world wide web (WWW). The positive benefits of data exchangeability to support system and…

Abstract

Purpose

In the last decade, XML has become the de facto standard for data exchange in the world wide web (WWW). The benefits of data exchangeability in supporting system and software heterogeneity at the application level, together with easy WWW integration, make XML an ideal data format for many other application and network scenarios such as wireless sensor networks (WSNs). Moreover, the usage of XML encourages using standardized techniques like SOAP to adapt the service-oriented paradigm to sensor network engineering. Nevertheless, integrating XML into WSN data management is constrained by the low hardware resources, which require efficient XML data management strategies to bridge the resource gap. The purpose of this paper is to present two separate strategies for integrating XML data management in WSNs.

Design/methodology/approach

The paper presents two separate strategies for integrating XML data management in WSNs that have been implemented and are running on today's sensor node platforms. The paper shows how XML data can be processed and how XPath queries can be evaluated dynamically. In an extended evaluation, the performance of both strategies in terms of memory and energy efficiency is compared, and both solutions are shown to have application domains in which they are fully applicable on today's sensor node products.

Findings

This work shows that dynamic XML data management and query evaluation is possible on sensor nodes with strict limitations in terms of memory, processing power and energy supply.

Originality/value

The paper presents an optimized stream‐based XML compression technique and shows how XML queries can be evaluated on compressed XML bit streams using generic pushdown automata. To the best of the authors' knowledge, this is the first complete approach on integrating dynamic XML data management into WSNs.
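As a rough illustration of that claim, the sketch below dictionary-compresses tags into a flat integer event stream and then evaluates a child-step path directly over the stream with a stack machine — the generic pushdown-automaton idea. The packing scheme (`code << 1 | is_open`) is my own simplification, not the paper's compression format.

```python
# Illustrative only: tags become small integer codes, the document a
# flat event stream; a stack machine answers a child path over it
# without ever materializing the tree.

def compress(events, tag_codes):
    """('open'|'close', tag) -> packed integers: code<<1 | is_open."""
    return [(tag_codes[t] << 1) | (1 if k == 'open' else 0)
            for k, t in events]

def eval_child_path(stream, path, tag_codes):
    """Count matches of an absolute child path like ['a', 'b']."""
    target = [tag_codes[t] for t in path]
    stack, hits = [], 0
    for token in stream:
        if token & 1:                 # open event: push the tag code
            stack.append(token >> 1)
            if stack == target:       # current path equals the query
                hits += 1
        else:                         # close event: pop
            stack.pop()
    return hits
```

The stack here plays the role of the pushdown store; descendant steps (`//`) would need a slightly richer state, but the single pass over the compressed stream is the point.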

Details

International Journal of Web Information Systems, vol. 6 no. 4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 18 November 2013

Jaroslav Pokorný

This paper considers schemaless XML data stored in a column-oriented storage, particularly in C-store. Axes of the XPath language are studied and a design and analysis of…

Abstract

Purpose

This paper considers schemaless XML data stored in a column-oriented storage, particularly in C-store. Axes of the XPath language are studied and a design and analysis of algorithms for processing the XPath fragment XP{*, //, /} are described in detail. The paper aims to discuss these issues.

Design/methodology/approach

A two-level model of C-store based on XML-enabled relational databases is assumed. The axes of the XPath language in this environment have been studied by Cástková and Pokorný. The associated algorithms have been used for the implementation of the XPath fragment XP{*, //, /}.

Findings

The main advantage of this approach is that the algorithms implementing axis evaluation are mostly of logarithmic complexity in n, where n is the number of nodes of the XML tree associated with an XML document. A low-level memory system enables the estimation of the number of two abstract operations providing an interface to an external memory.
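A generic sketch of why axis evaluation can reach logarithmic cost: with nodes stored as (start, end) interval labels sorted by document order, the descendants of a node occupy a contiguous range of start values, found by binary search. This only illustrates the complexity claim; it is not the paper's C-store algorithm.

```python
import bisect

def descendants(starts, ends, i):
    """starts is sorted ascending (document order); node j is a
    descendant of node i iff starts[i] < starts[j] < ends[i]."""
    lo = bisect.bisect_right(starts, starts[i])  # first start after i's
    hi = bisect.bisect_left(starts, ends[i])     # first start past i's end
    return list(range(lo, hi))                   # two O(log n) searches
```

Child and parent axes admit similar interval tests; the column layout matters because both label columns can be scanned and searched independently.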

Originality/value

The paper extends the approach of querying XML data stored in a column-oriented storage to the XPath fragment using only child and descendant axes and estimates the complexity of evaluating its queries.

Details

International Journal of Web Information Systems, vol. 9 no. 4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 31 December 2006

Hooman Homayounfar and Fangju Wang

XML is becoming one of the most important structures for data exchange on the web. Despite having many advantages, XML structure imposes several major obstacles to large document…

Abstract

XML is becoming one of the most important structures for data exchange on the web. Despite having many advantages, the XML structure imposes several major obstacles to processing large documents. The inconsistency between the linear nature of current algorithms (e.g. for caching and prefetching) used in operating systems and databases and the non-linear structure of XML data makes XML processing more costly. In addition to verbosity (e.g. tag redundancy), interpreting (i.e. parsing) the depth-first (DF) structure of XML documents is a significant overhead for processing applications (e.g. query engines). Recent research on XML query processing has shown that sibling clustering can improve performance significantly. However, the existing clustering methods are not able to avoid parsing overhead, as they are limited by larger document sizes. In this research, we have developed a better data organization for native XML databases, named the sibling-first (SF) format, that improves query performance significantly. SF uses an embedded index for fast access to child nodes. It also compresses documents by eliminating extra information from the original DF format. The converted SF documents can be processed for XPath query purposes without being parsed. We have implemented the SF storage in virtual memory as well as a format on disk. Experimental results with real data have shown that significantly higher performance can be achieved when XPath queries are conducted on very large SF documents.
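The sibling-first idea can be illustrated with a toy layout: siblings are stored contiguously, and each record carries an offset to its first child plus a child count, so a child step is an index jump rather than a parse. The field names and record shape below are my own invention, not the paper's storage format.

```python
# Hypothetical sibling-first flattening: sibling groups contiguous,
# parents hold (first-child offset, child count) as an embedded index.

def to_sibling_first(tree):
    """tree: (tag, [children]). Returns a flat list of records."""
    records = []
    def emit(group):
        base = len(records)
        for tag, kids in group:            # whole sibling group first
            records.append({'tag': tag, 'first': None, 'n': len(kids)})
        for i, (tag, kids) in enumerate(group):
            if kids:                       # then recurse per member
                records[base + i]['first'] = emit(kids)
        return base
    emit([tree])
    return records

def child_indices(records, idx):
    """Child step = one offset lookup, no parsing."""
    r = records[idx]
    return list(range(r['first'], r['first'] + r['n'])) if r['n'] else []
```

Because a node's children are a contiguous slice, the child axis costs a range computation instead of a scan over descendants, which is the performance claim the abstract makes.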

Details

International Journal of Web Information Systems, vol. 2 no. 3/4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 4 April 2008

Sherif Sakr

Estimating the sizes of query results and intermediate results is crucial to many aspects of query processing. All database systems rely on the use of cardinality estimates to…


Abstract

Purpose

Estimating the sizes of query results and intermediate results is crucial to many aspects of query processing. All database systems rely on the use of cardinality estimates to choose the cheapest execution plan. In principle, the problem of cardinality estimation is more complicated in the Extensible Markup Language (XML) domain than in the relational domain. The purpose of this paper is to present a novel framework for estimating the cardinality of XQuery expressions as well as their sub‐expressions. Additionally, this paper proposes a novel XQuery cardinality estimation benchmark. The main aim of this benchmark is to establish the basis of comparison between the different estimation approaches in the XQuery domain.

Design/methodology/approach

As a major innovation, the paper exploits the relational algebraic infrastructure to provide accurate estimation in the context of the XML and XQuery domains. In the proposed framework, XQuery expressions are translated into equivalent relational algebraic plans; then, using a well-defined set of inference rules and a set of special properties of the algebraic plan, the framework is able to provide highly accurate estimates for XQuery expressions.
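The bottom-up inference idea can be sketched on a toy algebra: estimates flow from scans up through operators via per-operator rules. The rules below are textbook selectivity simplifications for illustration only, not the paper's inference rules or its plan representation.

```python
# Illustrative-only cardinality propagation over a tiny algebra plan.
# plan: ('scan', n) | ('select', sel, child) | ('join', sel, left, right)

def estimate(plan):
    op = plan[0]
    if op == 'scan':                      # base cardinality is known
        return plan[1]
    if op == 'select':                    # selectivity * child estimate
        return plan[1] * estimate(plan[2])
    if op == 'join':                      # sel * |left| * |right|
        return plan[1] * estimate(plan[2]) * estimate(plan[3])
    raise ValueError('unknown operator: %r' % op)
```

In the paper's setting, the translated XQuery plan would additionally consult summarized XML structures and histograms to pick the selectivities; here they are given constants.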

Findings

This paper is believed to be the first which provides a uniform framework to estimate the cardinality of more powerful XML querying capabilities using XQuery expressions as well as their sub‐expressions. It exploits the relational algebraic infrastructure to provide accurate estimation in the context of XML and XQuery domains. Moreover, the proposed framework can act as a meta‐model through its ability to incorporate different summarized XML structures and different histogram techniques, which allows model designers to achieve their targets by focusing their effort on designing or selecting the adequate techniques for them. In addition, this paper proposes a benchmark for XQuery cardinality estimation systems. The proposed benchmark distinguishes itself from the other existing XML benchmarks in its focus on establishing the basis for comparing the different estimation approaches in the XML domain in terms of their accuracy of the estimations and their completeness in handling different XML querying features.

Research limitations/implications

The current status of this proposed XQuery cardinality estimations framework does not support the estimation of the queries over the order information of the source XML documents and does not support non‐numeric predicates.

Practical implications

The experiments with this XQuery cardinality estimation system demonstrate its effectiveness and show highly accurate estimation results. Utilizing the cardinality estimation properties during the SQL translation of XQuery expressions results in an average improvement of 20 percent in their execution times.

Originality/value

This paper presents a novel framework for estimating the cardinality of XQuery expressions as well as their sub‐expressions. A novel XQuery cardinality estimation benchmark is introduced to establish the basis of comparison between the different estimation approaches in the XQuery domain.

Details

International Journal of Web Information Systems, vol. 4 no. 1
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 22 June 2010

Awny Sayed, Ahmed A. Radwan and Mohamed M. Abdallah

Information retrieval (IR) and feedback in Extensible Markup Language (XML) are rather new fields for researchers; natural questions arise, such as: how good are the feedback…

Abstract

Purpose

Information retrieval (IR) and feedback in Extensible Markup Language (XML) are rather new fields for researchers; natural questions arise, such as: how good are the feedback algorithms in XML IR? Can they be evaluated with standard evaluation tools? Even though some evaluation methods have been proposed in the literature, it is still not clear which of them are applicable in the context of XML IR, and which metrics they can be combined with to assess the quality of XML retrieval algorithms that use feedback. This paper aims to elaborate on this.

Design/methodology/approach

The efficient evaluation of relevance feedback (RF) algorithms for XML collections poses interesting challenges to IR and database researchers. The system is based on keyword queries, both in the main query and in the RF processing, instead of the more complex XPath and structured query languages. To measure the efficiency of the system, the paper uses the extended RF algorithms (residual collection and freezeTop) for evaluating the performance of XML search engines. Compared to previous approaches, the paper aims at removing the effect of results for which the system already has relevance knowledge, and at measuring the improvement on unseen relevant elements. The paper implements the proposed evaluation methodologies by extending a standard evaluation tool with a module capable of assessing feedback algorithms for a specific set of metrics.
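The residual-collection idea mentioned above can be sketched simply: results already judged in the feedback round are removed from the ranking before re-evaluating, so the score reflects only unseen relevant elements. The function below is a minimal hedged illustration, not the extended algorithm the paper evaluates.

```python
# Toy residual-collection evaluation: drop seen documents, then score
# precision over what remains.

def residual_precision(ranking, relevant, seen):
    """ranking: ranked ids; relevant: judged-relevant ids;
    seen: ids already shown during feedback."""
    residual = [d for d in ranking if d not in seen]
    if not residual:
        return 0.0
    hits = sum(1 for d in residual if d in relevant)
    return hits / len(residual)
```

The freezeTop variant instead keeps the seen results frozen at the top of the ranking; both aim to prevent feedback from being rewarded for re-retrieving what it was already told.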

Findings

In this paper, the authors create an efficient XML retrieval system based on query refinement: feedback processing extends the main query terms with new terms closely related to them.

Research limitations/implications

The authors are working on more efficient retrieval algorithms to get the top-ten results related to the submitted query. Moreover, they plan to extend the system to handle complex XPath expressions.

Originality/value

This paper presents an efficient evaluation of RF algorithms for XML collection retrieval system.

Details

International Journal of Web Information Systems, vol. 6 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 November 2005

Joseph Fong, San Kuen Cheung, Herbert Shiu and Chi Chung Cheung

XML Schema Definition (XSD) is in the logical level of XML model and is used in most web applications. At present, there is no standard format for the conceptual level of XML…

Abstract

XML Schema Definition (XSD) is in the logical level of the XML model and is used in most web applications. At present, there is no standard format for the conceptual level of the XML model. Therefore, we introduce an XML Tree Model as an XML conceptual schema for representing and confirming the data semantics according to the user requirements in a diagram. The XML Tree Model consists of nodes representing all elements within the XSD. We apply reverse engineering from an XSD to an XML Tree Model to assist end users in applying an XML database for the information highway on the Internet. The data semantics recovered for visualization include root element, weak elements, participation, cardinality, aggregation, generalization, categorization and n‐ary association, which can be derived by analyzing the structural constraints of the XSD based on its key features such as key, keyref, minOccurs, maxOccurs, Choice, Sequence and extension. We use the Eclipse user interface for generating a graphical view of the XML conceptual schema.
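One small step of that reverse engineering — recovering cardinality from minOccurs/maxOccurs — can be sketched with the standard library. This covers only one of the listed XSD features; key/keyref, choice and extension would need similar passes, and the function below is an illustrative assumption, not the authors' tool.

```python
# Read minOccurs/maxOccurs from an XSD to recover element cardinality
# for a conceptual diagram. XSD defaults both attributes to 1.

import xml.etree.ElementTree as ET

XS = '{http://www.w3.org/2001/XMLSchema}'

def element_cardinalities(xsd_text):
    root = ET.fromstring(xsd_text)
    out = {}
    for el in root.iter(XS + 'element'):
        name = el.get('name') or el.get('ref')
        lo = el.get('minOccurs', '1')
        hi = el.get('maxOccurs', '1')
        out[name] = (int(lo), None if hi == 'unbounded' else int(hi))
    return out
```

A (1, None) pair would render as "1..*" on the diagram edge, (0, 1) as an optional child, and so on.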

Details

International Journal of Web Information Systems, vol. 1 no. 4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 May 2005

Eric Pardede, J. Wendy Rahayu and David Taniar

Despite the increasing demand for an effective XML document repository, many are still reluctant to store XML documents in their natural tree form. One main reason is the…

Abstract

Despite the increasing demand for an effective XML document repository, many are still reluctant to store XML documents in their natural tree form. One main reason is the inadequacy of XML query languages for updating tree‐form XML documents. Even though some of the languages support minimum update facilities, they do not address preserving the documents' constraints. The results are updated documents with very low database integrity. In this paper, we propose a methodology to accommodate XML updates without violating the conceptual constraints of the documents. The method takes the form of a set of functions that perform checking mechanisms before update operations. In this paper we discuss the conceptual constraints embedded in three different relationship structures: association, aggregation and inheritance. We highlight four constraints related to the association relationship (number of participants, referential integrity, cardinality and adhesion), five constraints related to the aggregation relationship (cardinality, adhesion, ordering, homogeneity and share‐ability) and two constraints related to the inheritance relationship (disjointness and number of super‐classes). In addition, a specific constraint, the collection type of children, is also discussed. The proposed method can be implemented in different ways; in this paper we use the XQuery language. Since XML update requires a schema, we also propose the mapping of these constraints at the conceptual level to XML Schema. We use XML Schema for structure validation, even though the algorithm can be used with any schema language.
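The shape of such a checking function — validate a constraint, then perform the update only if it holds — can be sketched in a few lines. The cardinality check below is a hypothetical stand-in for the paper's XQuery-based functions; the function name and constraint encoding are illustrative.

```python
# Hedged sketch: run a cardinality check before an insert so the
# update cannot violate the document's conceptual constraints.

import xml.etree.ElementTree as ET

def safe_insert(parent, tag, max_occurs):
    """Insert <tag/> under parent only if cardinality is preserved."""
    if len(parent.findall(tag)) >= max_occurs:
        return False          # update rejected: would break constraint
    ET.SubElement(parent, tag)
    return True               # update applied
```

The same pattern generalizes to the other listed constraints (adhesion, ordering, referential integrity): each becomes a predicate evaluated against the document and schema before the operation commits.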

Details

International Journal of Web Information Systems, vol. 1 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 20 June 2008

Nikolaos Fousteris, Manolis Gergatsoulis and Yannis Stavrakas

In a wide spectrum of applications, it is desirable to manipulate semistructured information that may present variations according to different circumstances. Multidimensional XML…

Abstract

Purpose

In a wide spectrum of applications, it is desirable to manipulate semistructured information that may present variations according to different circumstances. Multidimensional XML (MXML) is an extension of XML suitable for representing data that assume different facets, having different value and/or structure under different contexts. The purpose of this paper is to develop techniques for updating MXML documents.

Design/methodology/approach

Updating XML has been studied in the past; however, updating MXML must take into account the additional features that stem from incorporating context into MXML. This paper investigates the problem of updating MXML at two levels: at the graph level, i.e. in an implementation-independent way; and at the relational storage level.

Findings

The paper introduces six basic update operations, which are capable of expressing any possible change. These operations are specified in an implementation-independent way, and their effect is explained through examples. Algorithms are given that implement these operations using SQL on a specific storage method that employs relational tables for keeping MXML. An overview is given of multidimensional XPath (MXPath), an extension of XPath that incorporates context, and it is shown how to translate MXPath queries to "equivalent" SQL queries.
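A purely illustrative toy model of the context-dependence MXML adds: a node holds one facet per context, and a value update replaces only the facet whose context matches. This is my own caricature to make the idea concrete, not the paper's graph model or any of its six operations.

```python
# Toy context-faceted node: one value per context, updated per facet.

class MNode:
    def __init__(self):
        self.facets = {}                  # context -> value
    def update_value(self, context, value):
        """Replace (or create) the facet for exactly this context."""
        self.facets[frozenset(context.items())] = value
    def get(self, context):
        return self.facets.get(frozenset(context.items()))
```

In the relational storage the paper describes, each facet would instead become rows keyed by context, and the update operations become SQL over those tables.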

Research limitations/implications

Though the proposed operations solve the problem of updating MXML documents, several problems, such as formally defining MXPath and its translation to SQL, remain to be investigated in order to implement a system that stores, queries and updates MXML documents through a relational database infrastructure.

Practical implications

MXML is suitable for representing, in a compact way, data that assume different facets, having different value or structure, under different contexts. In order for MXML to be applicable in practice, it is vital to develop techniques and tools for storing, updating and querying MXML documents. The techniques proposed in this paper form a significant step in this direction.

Originality/value

This paper presents a novel approach for updating MXML documents by proposing update operations at both the graph level and the (relational) storage level.

Details

International Journal of Web Information Systems, vol. 4 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Book part
Publication date: 15 March 2021

Reto Hofstetter

Every second, vast amounts of data are generated and stored on the Internet. Data scraping makes these data accessible and usable for business and scientific purposes. Web-scraped…

Abstract

Every second, vast amounts of data are generated and stored on the Internet. Data scraping makes these data accessible and usable for business and scientific purposes. Web-scraped data are of high value to businesses as they can be used to inform many strategic decisions such as pricing or market positioning. Although it is not difficult to scrape data, particularly when they come from public websites, there are six key steps that analysts should ideally consider and follow. Following these steps can help to better harness the business value of online data.
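The chapter's six steps are not enumerated in this abstract, so the sketch below only illustrates the extraction step of scraping: pulling product prices out of already-fetched HTML with the standard library. The markup, class name and prices are invented examples; in practice the HTML would come from a live request, subject to the site's terms.

```python
# Minimal extraction step of a scraping pipeline using html.parser:
# collect text inside <span class="price"> elements.

from html.parser import HTMLParser

class PriceParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []
    def handle_starttag(self, tag, attrs):
        if tag == 'span' and ('class', 'price') in attrs:
            self.in_price = True
    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())
    def handle_endtag(self, tag):
        if tag == 'span':
            self.in_price = False

page = ('<div><span class="price">$9.99</span>'
        '<span class="price">$4.50</span></div>')
p = PriceParser()
p.feed(page)
```

The fetched-and-parsed prices would then feed the downstream steps the chapter describes, such as cleaning and analysis for pricing decisions.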
