Search results

Article
Publication date: 22 November 2011

Atsushi Keyaki, Kenji Hatano and Jun Miyazaki

Abstract

Purpose

Nowadays there are a large number of XML documents on the web. This means that information retrieval techniques for searching XML documents are very important and necessary for internet users. Moreover, it is often said that users of search engines want to browse only the relevant content in each document. Therefore, an effective XML element search aims to return only the relevant elements or portions of an XML document. In response to this user demand, the purpose of this paper is to propose and evaluate a method for obtaining more accurate search results in XML search.

Design/methodology/approach

Existing approaches generate a ranked list in descending order of each XML element's relevance to a search query; however, these approaches often extract irrelevant XML elements and overlook more relevant ones. To address these problems, the authors' approach extracts the relevant XML elements by considering the size of the elements and the relationships between them. Next, the authors score the XML elements to generate a refined ranked list. For scoring, the authors give high ranks to the XML elements that are most relevant to the user's information needs. In particular, each XML element is scored using the statistics of its descendant and ancestor XML elements.

Findings

The experimental evaluations show that the proposed method outperforms BM25E, a conventional approach, which neither reconstructs XML elements nor uses descendant and ancestor statistics. As a result, the authors found that the accuracy of an XML element search can be improved by reconstructing the XML elements and emphasizing the informative ones by applying the statistics of the descendant XML elements.

Research limitations/implications

This work focused on the effectiveness of XML element search and the authors did not consider the search efficiency in this paper. One of the authors' next challenges is to reduce search time.

Originality/value

The paper proposes a method for improving the effectiveness of XML element search.

Article
Publication date: 17 August 2015

Savong Bou, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to propose a novel scheme to process XPath-based keyword search over Extensible Markup Language (XML) streams, where one can specify query keywords and XPath-based filtering conditions at the same time. Experimental results show that the proposed scheme can efficiently and practically process XPath-based keyword search over XML streams.

Design/methodology/approach

To allow XPath-based keyword search over XML streams, the authors integrate YFilter (Diao et al., 2003) with CKStream (Hummel et al., 2011). More precisely, the nondeterministic finite automaton (NFA) of YFilter is extended so that keyword matching at text nodes is supported. Next, the stack data structure is modified by integrating the set of NFA states in YFilter with bitmaps generated from the set of keyword queries in CKStream.
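
The stack-plus-bitmap idea can be illustrated with a much smaller sketch. This is not the authors' scheme, nor YFilter or CKStream themselves: it assumes a single fixed root-to-element path in place of an NFA, a single keyword query, and a hypothetical `id` attribute used only to report matches. Each open element gets a bitmap of the query keywords seen below it, and bitmaps are merged into the parent when an element closes.

```python
import xml.sax


class KeywordPathHandler(xml.sax.ContentHandler):
    """Report elements on a fixed path whose subtree contains all keywords."""

    def __init__(self, path, keywords):
        super().__init__()
        self.path = list(path)  # assumed exact tag path, e.g. ["lib", "book"]
        unique = list(dict.fromkeys(k.lower() for k in keywords))
        self.bit = {k: 1 << i for i, k in enumerate(unique)}
        self.full = (1 << len(unique)) - 1  # bitmap with every keyword set
        self.stack = []    # one frame per open element: [tag, id, bitmap]
        self.matches = []

    def startElement(self, name, attrs):
        self.stack.append([name, attrs.get("id"), 0])

    def characters(self, content):
        # Simplification: assumes a word is never split across two callbacks.
        if self.stack:
            for word in content.lower().split():
                self.stack[-1][2] |= self.bit.get(word, 0)

    def endElement(self, name):
        tag, elem_id, mask = self.stack.pop()
        tags = [frame[0] for frame in self.stack] + [tag]
        if tags == self.path and mask == self.full:
            self.matches.append(elem_id)
        if self.stack:
            self.stack[-1][2] |= mask  # pass found keywords up to the parent


xml_data = (
    b"<lib>"
    b"<book id='b1'><title>XML streams</title><note>keyword search</note></book>"
    b"<book id='b2'><title>XML basics</title></book>"
    b"<cd id='c1'>XML keyword</cd>"
    b"</lib>"
)
handler = KeywordPathHandler(["lib", "book"], ["xml", "keyword"])
xml.sax.parseString(xml_data, handler)
matched_ids = handler.matches
```

Because the bitmap lives on the stack frame, memory grows with document depth rather than document size, which is the property that makes this style of processing attractive for streams.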

Findings

Extensive experiments were conducted using both synthetic and real data sets to show the effectiveness of the proposed method. The experimental results showed that the accuracy of the proposed method was better than that of the baseline method (CKStream), while it consumed less memory. Moreover, the proposed scheme showed good scalability with respect to the number of queries.

Originality/value

Due to the rapid diffusion of XML streams, the demand for querying such information is also growing. In such a situation, the ability to query by combining XPath and keyword search is important, because it is an easy-to-use yet powerful means of querying XML streams. However, no existing work has addressed this issue. This work addresses the problem by combining the existing XPath-based YFilter and the keyword-search-based CKStream for XML streams to enable XPath-based keyword search.

Details

International Journal of Web Information Systems, vol. 11 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 14 June 2013

Atsushi Keyaki, Jun Miyazaki, Kenji Hatano, Goshiro Yamamoto, Takafumi Taketomi and Hirokazu Kato

Abstract

Purpose

The purpose of this paper is to propose methods for fast incremental indexing with effective and efficient query processing in XML element retrieval. The effectiveness of a search system declines if document updates are not handled when they occur frequently on the Web. Search accuracy is also reduced if drastic changes in document statistics are not managed. However, existing studies of XML element retrieval do not consider document updates, although these studies have attained both effectiveness and efficiency in query processing. Thus, the authors add a function for handling document updates to the existing techniques for XML element retrieval.

Design/methodology/approach

Although fast index updates are important, preliminary experiments have shown that a simple incremental update approach has two problems: some kinds of statistics are inaccurate, and it takes a long time to update indices. Therefore, two methods are proposed: one to approximate term weights accurately with a small number of documents, even for dynamically changing statistics; and the other to eliminate unnecessary update targets.

Findings

Experimental results show that the proposed system can update indices up to 32 per cent faster than simple incremental updates, while search accuracy improves by 4 per cent compared with the simple approach. The proposed methods are also fast and accurate in query processing, even if document statistics change drastically.

Originality/value

The paper shows that there could be a more practical XML element search engine, which can access the latest XML documents accurately and efficiently.

Details

International Journal of Web Information Systems, vol. 9 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 17 August 2015

Takahiro Komamizu, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to extract appropriate terms to summarize the current results in terms of the contents of textual facets. Faceted search on XML data helps users find necessary information from XML data by giving attribute–content pairs (called facet-value pairs) about the current search results. However, if most of the contents of a facet are long texts on average (such facets are called textual facets), it is not easy to get an overview of the current results.

Design/methodology/approach

The proposed approach is based upon subsumption relationships of terms among the contents of a facet. The subsumption relationship can be extracted using co-occurrences of terms among a number of documents (in this paper, each content of a facet is considered a document). Subsumption relationships compose hierarchies, and the authors utilize the hierarchies to extract facet-values from textual facets. In the faceted search context, users have ambiguous search demands, so they expect broader terms. Thus, the authors extract high-level terms in the hierarchies as facet-values.
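
As a rough illustration of deriving subsumption from co-occurrence (not the authors' implementation), the sketch below treats each facet content as a document and says that term x subsumes term y when nearly every document containing y also contains x, but not the reverse. The 0.8 threshold, the function names and the sample data are assumptions made for this example.

```python
from collections import defaultdict
from itertools import permutations


def subsumption_pairs(docs, threshold=0.8):
    """Return (broader, narrower) term pairs derived from co-occurrence.

    docs: list of term collections, one per facet content.
    threshold: assumed cut-off for P(broader | narrower); not from the paper.
    """
    doc_ids = defaultdict(set)  # term -> ids of the documents containing it
    for i, terms in enumerate(docs):
        for term in set(terms):
            doc_ids[term].add(i)

    pairs = []
    for x, y in permutations(sorted(doc_ids), 2):
        both = len(doc_ids[x] & doc_ids[y])
        p_x_given_y = both / len(doc_ids[y])
        p_y_given_x = both / len(doc_ids[x])
        # x subsumes y: x appears almost whenever y does, but not vice versa.
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.append((x, y))
    return pairs


def top_level_terms(pairs):
    """Terms that subsume others but are subsumed by none (hierarchy roots)."""
    broader = {x for x, _ in pairs}
    narrower = {y for _, y in pairs}
    return sorted(broader - narrower)


docs = [
    ["database", "xml", "index"],
    ["database", "xml"],
    ["database", "sql"],
    ["database", "xml", "xpath"],
]
pairs = subsumption_pairs(docs)
roots = top_level_terms(pairs)
```

In this toy data "database" occurs in every document, so it ends up as the only root of the hierarchy and would be the candidate facet-value shown to a user with an ambiguous information need.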

Findings

The main finding of this paper is that the extracted terms improve users’ search experiences, especially when the search demands are ambiguous.

Originality/value

One original contribution of this paper is the way it utilizes the textual contents of XML data to improve users’ search experiences in faceted search. The other is the design of tasks for evaluating exploratory search such as faceted search.

Details

International Journal of Web Information Systems, vol. 11 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 20 April 2015

Abubakar Roko, Shyamala Doraisamy, Azrul Hazri Jantan and Azreen Azman

Abstract

Purpose

The purpose of this paper is to propose and evaluate XKQSS, a query structuring method that shifts the task of generating structured queries from the user to the search engine while retaining the simple keyword search query interface. Structured queries are a more effective way to search an XML database. However, expressing queries in a query language proves difficult for most users, since it requires learning the language and knowing the underlying data schema. On the other hand, the success of Web search engines has made many users familiar with keyword search and, therefore, they prefer to use a keyword search query interface to search XML data.

Design/methodology/approach

Existing query structuring approaches require users to provide structural hints in their input keyword queries even though their interface is keyword based. Other problems with existing systems include their inability to take keyword query ambiguities into consideration during query structuring and the difficulty of selecting the generated structured query that best represents a given keyword query. To address these problems, this study allows users to submit a schema-independent keyword query, uses named entity recognition (NER) to categorize query keywords and resolve query ambiguities, and computes semantic information for a node from its data content. Algorithms are proposed that find user search intentions and convert the intentions into a set of ranked structured queries.

Findings

Experiments with the Sigmod and IMDB datasets were conducted to evaluate the effectiveness of the method. The experimental results show that XKQSS is about 20 per cent more effective than XReal, a state-of-the-art system for XML retrieval, in terms of return node identification.

Originality/value

Existing systems do not take keyword query ambiguities into account. XKQSS consists of two guidelines based on NER that help to resolve these ambiguities before converting the submitted query. It also includes a ranking function that computes a score for each generated query by using both semantic information and data statistics, as opposed to the data-statistics-only approach used by existing systems.

Details

International Journal of Web Information Systems, vol. 11 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 5 June 2009

Danijela Boberić and Dušan Surla

Abstract

Purpose

The purpose of this paper is to model and implement an XML‐based editor for search and retrieval of bibliographic records. The editor enables search and retrieval of bibliographic records from remote databases via the Z39.50 protocol. The client application is realized in the Java environment and the business logic is based on XML technologies.

Design/methodology/approach

Object‐oriented methodology is used for modelling and implementation of information systems. Modelling is done in CASE tools that support the Unified Modelling Language (UML 2.0) and the implementation is developed in the Eclipse environment.

Findings

The result is an application for retrieving bibliographic records within the Z39.50 standard. The editor supports the formulation of type‐1 queries as defined by the Z39.50 standard. The implementation of the editor is based on XML technologies, which enables a simple migration to the other query types defined by the standard. The application is verified by search and retrieval of bibliographic records from several libraries that support the Z39.50 protocol.

Research limitations/implications

The editor supports only the type‐1 query, which is modelled in the XML schema language. Adding other query types would require modelling them in XML schema as well as making corresponding changes to the screen forms.

Practical implications

The editor is integrated into the BISIS software system. In that way, the retrieval of the bibliographic record is enabled from the world's libraries that use the Z39.50 protocol. The retrieved record is stored into the object structure that is later processed in the editor for bibliographic material processing of the BISIS system.

Originality/value

The contribution of this work is a software system architecture that is based on XML technologies and is independent both of the standard by which the query language is defined and of the software system into which it is integrated. The XML schema of the query language is the input to the editor software system, so introducing a new query type consists of creating the corresponding XML schema of the query language. After a query is run, the result set can be retrieved as an XML document and processed in the same form in different software systems.

Details

The Electronic Library, vol. 27 no. 3
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 18 September 2009

Wei Lu, Andrew MacFarlane and Fabio Venuti

Abstract

Purpose

Being an important data exchange and information storage standard, XML has generated a great deal of interest and particular attention has been paid to the issue of XML indexing. Clear use cases for structured search in XML have been established. However, most of the research in the area is either based on relational database systems or specialized semi‐structured data management systems. This paper aims to propose a method for XML indexing based on the information retrieval (IR) system Okapi.

Design/methodology/approach

First, the paper reviews the structure of inverted files and gives an overview of the issues of why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. Then the paper explores a revised method implemented on Okapi using path indexing structures. The paper evaluates these index structures through the metrics of indexing run time, path search run time and space costs using the INEX and Reuters RVC1 collections.
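
To make the contrast with a plain inverted file concrete, here is a minimal sketch of an inverted index whose postings also record the element path in which a term occurs. It is an illustration only and does not reflect Okapi's actual data structures; the class name `PathIndex`, the path-string format and the sample documents are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class PathIndex:
    """Toy inverted file whose postings carry an element-path id."""

    def __init__(self):
        self.path_ids = {}                # "/article/title" -> small integer
        self.postings = defaultdict(set)  # term -> {(doc_id, path_id)}

    def _path_id(self, path):
        return self.path_ids.setdefault(path, len(self.path_ids))

    def add(self, doc_id, xml_text):
        self._walk(doc_id, ET.fromstring(xml_text), "")

    def _walk(self, doc_id, elem, prefix):
        path = f"{prefix}/{elem.tag}"
        pid = self._path_id(path)
        # Only text directly inside the element is indexed; tail text is ignored.
        for term in (elem.text or "").lower().split():
            self.postings[term].add((doc_id, pid))
        for child in elem:
            self._walk(doc_id, child, path)

    def search(self, term, path=None):
        """Doc ids containing term, optionally only under the exact path."""
        hits = self.postings.get(term.lower(), set())
        if path is not None:
            pid = self.path_ids.get(path)
            hits = {hit for hit in hits if hit[1] == pid}
        return sorted({doc for doc, _ in hits})


index = PathIndex()
index.add(1, "<article><title>XML indexing</title><body>Okapi retrieval</body></article>")
index.add(2, "<article><title>Okapi</title><body>XML search</body></article>")
anywhere = index.search("xml")
in_title = index.search("xml", "/article/title")
```

Storing a path id in every posting is what costs the extra space, while the lookup itself stays a dictionary access followed by a filter, which is consistent in spirit with the space and run-time trade-off the abstract reports.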

Findings

Initial results on the INEX collections show that there is a substantial overhead in space costs for the method, but this increase does not affect run time adversely. Indexing results on differing sized Reuters RVC1 sub‐collections show that the increase in space costs with increasing the size of a collection is significant, but in terms of run time the increase is linear. Path search results show sub‐millisecond run times, demonstrating minimal overhead for XML search.

Practical implications

Overall, the results show the method implemented to support XML search in a traditional IR system such as Okapi is viable.

Originality/value

The paper provides useful information on a method for XML indexing based on the IR system Okapi.

Details

Aslib Proceedings, vol. 61 no. 5
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 14 June 2013

Yousuke Watanabe, Hidetaka Kamigaito and Haruo Yokota

Abstract

Purpose

Office documents are widely used in daily activities, and their number has been increasing. The demand for sophisticated search over office documents is therefore growing. The recent file format of office documents is based on a package of multiple XML files. These XML files include not only body text but also page structure data and style data. The purpose of this paper is to utilize them to find similar office documents.

Design/methodology/approach

The authors propose SOS, a similarity search method based on the structures and styles of office documents. SOS needs to compute similarity values between multiple pairs of XML files included in the office documents. The authors also propose LAX+, an algorithm that calculates a similarity value for a pair of XML files by extending an existing XML leaf node clustering algorithm.

Findings

SOS and LAX+ are evaluated using three types of office documents (docx, xlsx and pptx) in the experiments. The results of LAX+ and SOS are better than those of the existing algorithms.

Originality/value

Existing text‐based search engines do not take structure and style of documents into account. SOS can find similar documents by calculating similarities between multiple XML files corresponding to body texts, structures and styles.

Article
Publication date: 1 September 1999

Joe Jackson and Donald L. Gilstrap

Abstract

This article addresses the implications of the new Web meta‐language XML for World Wide Web searching. Compared to HTML, XML is more concerned with the structure of data than with documents. These XML data structures, especially when declared in document type definitions, should prove conducive to precise, context‐rich searching. Some of the directions in which the XML language is intended to move are briefly covered. Additionally, trends in World Wide Web development with respect to beta versions of the XML language are discussed.

Details

Library Hi Tech, vol. 17 no. 3
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 16 November 2012

Takahiro Komamizu, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

XML has become a standard data format for many applications and efficient retrieval methods are required. There are roughly two kinds of retrieval methods, namely path‐based methods (e.g. XPath and XQuery) and keyword search, but these methods do not work when users do not have any concrete information need. The purpose of this paper is to expand the feasibility of XML data retrieval, which is an important task.

Design/methodology/approach

The paper's strategy is to apply faceted navigation for XML data. Faceted navigation is an exploratory search which enables the exploration of data making use of attributes, called facets. General faceted navigation methods are applied for attributed objects but XML data have no criteria because XML nodes are objects and facets. Thus, the paper's approach is to construct a framework to enable faceted navigation over XML data. It first extracts objects based on occurrence of nodes and facets. Then it constructs a faceted navigation interface for extracted objects and facets.
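
A minimal sketch of what faceted navigation over XML can look like once objects and facets have been chosen. It is not the paper's framework: the object tag `book`, the use of child elements as facets and the sample catalogue are all assumptions made by hand for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def facet_counts(objects):
    """Count facet-value pairs over the current result set."""
    counts = defaultdict(Counter)
    for obj in objects:
        for child in obj:  # assumption: each child element is one facet
            value = (child.text or "").strip()
            if value:
                counts[child.tag][value] += 1
    return counts


def refine(objects, facet, value):
    """Keep objects with at least one <facet> child whose text equals value."""
    return [
        obj for obj in objects
        if any((c.text or "").strip() == value for c in obj.findall(facet))
    ]


root = ET.fromstring(
    "<catalog>"
    "<book><author>Ann</author><year>2011</year></book>"
    "<book><author>Bob</author><year>2011</year></book>"
    "<book><author>Ann</author><year>2012</year></book>"
    "</catalog>"
)
books = root.findall("book")            # objects, chosen by hand here
counts = facet_counts(books)            # what the interface would display
ann_books = refine(books, "author", "Ann")
ann_counts = facet_counts(ann_books)    # counts after one navigation step
```

Each navigation step is a `refine` followed by a fresh `facet_counts`, so the user always sees which facet-value pairs remain available in the narrowed result set.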

Findings

The framework achieves semi‐automatic construction of a faceted navigation interface from an XML database. In the experiments, the feasibility of the framework is shown by three faceted navigation interfaces built on existing real XML data. In addition, the user study shows that the retrieval method helps users to find required information.

Originality/value

There are only a few works which apply faceted navigation for XML data and these works are based on predefined objects and facets which need human effort. In contrast, this framework needs human decision making only when choosing objects and facets to be used in the faceted navigation interface.

Details

International Journal of Web Information Systems, vol. 8 no. 4
Type: Research Article
ISSN: 1744-0084
