Search results

1 – 10 of over 1000
Article
Publication date: 14 June 2013

Atsushi Keyaki, Jun Miyazaki, Kenji Hatano, Goshiro Yamamoto, Takafumi Taketomi and Hirokazu Kato

Abstract

Purpose

The purpose of this paper is to propose methods for fast incremental indexing with effective and efficient query processing in XML element retrieval. A search system becomes less effective if it does not handle document updates, which occur frequently on the Web. Search accuracy also drops if drastic changes in document statistics are not managed. Existing studies of XML element retrieval have achieved both effectiveness and efficiency in query processing, but they do not consider document updates. The authors therefore add a function for handling document updates to existing XML element retrieval techniques.

Design/methodology/approach

Although fast index updates are important, preliminary experiments showed that a simple incremental update approach has two problems: some statistics become inaccurate, and updating the indices takes a long time. Two methods are therefore proposed: one that approximates term weights accurately from a small number of documents, even when statistics change dynamically, and another that eliminates unnecessary update targets.
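To illustrate the kind of simple incremental update the authors start from, the following is a minimal Python sketch written under assumptions: whitespace tokenization, a BM25-style IDF, and a hypothetical class name `IncrementalIndex`. It is not the authors' method. It shows that each update touches the postings of every term in the document and changes the collection statistics (document count N and document frequency df), so term weights shift after every insertion, most visibly in a small collection.

```python
import math
from collections import Counter, defaultdict


class IncrementalIndex:
    """Naive incremental inverted index (hypothetical baseline, not the authors' method).

    Every add/delete updates the postings of each term in the document and
    the global statistics, so IDF always reflects the current collection.
    """

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.docs = {}                     # doc_id -> Counter of its terms

    def add(self, doc_id, text):
        if doc_id in self.docs:
            self.delete(doc_id)  # treat an update as delete followed by insert
        tf = Counter(text.lower().split())
        self.docs[doc_id] = tf
        for term, freq in tf.items():
            self.postings[term][doc_id] = freq

    def delete(self, doc_id):
        for term in self.docs.pop(doc_id):
            del self.postings[term][doc_id]
            if not self.postings[term]:
                del self.postings[term]  # drop terms with no remaining postings

    def idf(self, term):
        n = len(self.docs)
        df = len(self.postings.get(term, {}))
        # BM25-style IDF with 1 added inside the log to keep it non-negative
        return math.log(1 + (n - df + 0.5) / (df + 0.5))


index = IncrementalIndex()
index.add("d1", "xml element retrieval")
index.add("d2", "xml indexing")
before = index.idf("retrieval")
index.add("d3", "element retrieval on the web")
after = index.idf("retrieval")
```

Adding d3 raises N from 2 to 3 and df("retrieval") from 1 to 2, so the IDF of "retrieval" drops from ln 2 (about 0.69) to ln 1.6 (about 0.47) after a single insertion.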

Findings

Experimental results show that the proposed system can update indices up to 32 per cent faster than the simple incremental approach while improving search accuracy by 4 per cent over it. The proposed methods also keep query processing fast and accurate, even when document statistics change drastically.

Originality/value

The paper indicates that a more practical XML element search engine, able to access the latest XML documents accurately and efficiently, is possible.

Details

International Journal of Web Information Systems, vol. 9 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 22 November 2011

Atsushi Keyaki, Kenji Hatano and Jun Miyazaki

Abstract

Purpose

There is now a large number of XML documents on the web, so information retrieval techniques for searching XML documents are important for internet users. Moreover, users of search engines are often said to want to browse only the relevant content in each document. An effective XML element search therefore aims to return only the relevant elements, or portions, of an XML document. In response to this demand, the purpose of this paper is to propose and evaluate a method for obtaining more accurate results in XML search.

Design/methodology/approach

Existing approaches generate a ranked list in descending order of each XML element's relevance to a search query; however, they often extract irrelevant XML elements and overlook more relevant ones. To address these problems, the authors' approach extracts relevant XML elements by considering the size of the elements and the relationships between them. The authors then score the XML elements to generate a refined ranked list in which the elements most relevant to the user's information needs are ranked highest. In particular, each XML element is scored using the statistics of its descendant and ancestor XML elements.
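The idea of scoring an element with statistics drawn from its descendants and ancestors can be sketched in Python as follows. This is a hypothetical illustration, not the authors' formula and not BM25E: an element's own score is the density of query terms in its subtree text (descendant statistics), blended with its parent's score (ancestor statistics) through an illustrative weight `alpha`. The function names, the weight, and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def subtree_terms(elem):
    # itertext() yields the text of the element and all of its descendants
    return Counter(" ".join(elem.itertext()).lower().split())


def score_elements(root, query, alpha=0.3):
    """Hypothetical scorer: query-term density over each element's subtree,
    smoothed with the parent's score. alpha is illustrative, not from the paper."""
    terms = query.lower().split()
    scores = {}

    def visit(elem, path, parent_score):
        tf = subtree_terms(elem)
        length = sum(tf.values()) or 1  # avoid division by zero for empty elements
        own = sum(tf[t] for t in terms) / length
        score = (1 - alpha) * own + alpha * parent_score
        scores[path] = score
        positions = Counter()  # XPath positions count same-name siblings only
        for child in elem:
            positions[child.tag] += 1
            visit(child, f"{path}/{child.tag}[{positions[child.tag]}]", score)

    visit(root, f"/{root.tag}", 0.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


doc = ("<article><title>XML retrieval</title><sec>"
       "<p>element retrieval with XML</p><p>unrelated text here</p>"
       "</sec></article>")
ranking = score_elements(ET.fromstring(doc), "xml retrieval")
```

On this sample the short `title` element ranks first and the off-topic second paragraph ranks last; the parent blend is one simple way to let ancestor context lift or dampen an element's score.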

Findings

The experimental evaluations show that the proposed method outperforms BM25E, a conventional approach that neither reconstructs XML elements nor uses descendant and ancestor statistics. The authors conclude that the accuracy of XML element search can be improved by reconstructing XML elements and by emphasizing informative ones using the statistics of their descendant XML elements.

Research limitations/implications

This work focused on the effectiveness of XML element search and the authors did not consider the search efficiency in this paper. One of the authors' next challenges is to reduce search time.

Originality/value

The paper proposes a method for improving the effectiveness of XML element search.

Article
Publication date: 22 November 2011

Ismail Khalil

Abstract

Details

International Journal of Web Information Systems, vol. 7 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 11 March 2014

Sayyed Mahdi Taheri, Nadjla Hariri and Sayyed Rahmatollah Fattahi

Abstract

Purpose

The aim of this research was to examine the use of the data island method for creating metadata records based on DCXML, MARCXML, and MODS with indexability and visibility of element tag names in web search engines.

Design/methodology/approach

A total of 600 metadata records were developed in two groups: 300 HTML-based records in the experimental group, with a special structure embedded in the <pre> tag of HTML according to the data island method, and 300 XML-based records with the normal structure in the control group. These records were analyzed experimentally. The records of the two groups were published on two independent websites and submitted to the Google and Bing search engines.
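The abstract does not spell out the exact page structure, so the following Python sketch rests on an assumption: the XML record is HTML-escaped and placed as text inside a `<pre>` element. Escaping makes tag names such as `<dc:title>` part of the visible page text that a crawler can index, rather than markup whose tag names a browser would not display. The function name `data_island_page` and the sample record are hypothetical.

```python
import html


def data_island_page(title, xml_record):
    """Hypothetical helper: embed an escaped XML metadata record in a <pre> block."""
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head>\n<body>\n"
        # Escaped markup renders as literal text, so tag names stay visible
        f"<pre>{html.escape(xml_record)}</pre>\n</body></html>"
    )


page = data_island_page("Sample record", "<dc:title>Library Hi Tech</dc:title>")
```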

Findings

All tag names in the experimental group's metadata records, which were created with the data island method, were indexed by Google and Bing and were visible in the search results. The tag names in the control group's metadata records were not indexed by the search engines. Accordingly, records created with the data island method can be indexed and retrieved by their tag names in search engines, whereas the control group's records are accessible by element values only. The research suggests some patterns for metadata creators and end users to improve indexing and retrieval.

Originality/value

The research used the data island method to create metadata records and is the first to address the indexability and visibility of metadata element tag names.

Details

Library Hi Tech, vol. 32 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 18 September 2009

Wei Lu, Andrew MacFarlane and Fabio Venuti

Abstract

Purpose

As an important standard for data exchange and information storage, XML has generated a great deal of interest, and particular attention has been paid to XML indexing. Clear use cases for structured search in XML have been established. However, most research in the area is based either on relational database systems or on specialized semi‐structured data management systems. This paper aims to propose a method for XML indexing based on the information retrieval (IR) system Okapi.

Design/methodology/approach

First, the paper reviews the structure of inverted files and explains why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. The paper then explores a revised method implemented in Okapi using path indexing structures, and evaluates these index structures by indexing run time, path search run time and space costs on the INEX and Reuters RCV1 collections.
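A flat inverted file maps each term to the documents containing it and loses the element structure. The following Python sketch, which is hypothetical and does not reproduce Okapi's actual file formats, adds a path index keyed by root-to-element label paths alongside element-level term postings, so a path search can be answered by intersecting the two. The names `build_indexes` and `path_search` are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(root):
    """Hypothetical path index plus element-level term postings.

    path_index: label path such as "/article/sec/p" -> element ids in document order
    term_index: term -> set of ids of elements whose own text contains the term
    """
    path_index = defaultdict(list)
    term_index = defaultdict(set)
    counter = 0

    def visit(elem, path):
        nonlocal counter
        elem_id = counter  # ids are assigned in document (pre-)order
        counter += 1
        path_index[path].append(elem_id)
        for term in (elem.text or "").lower().split():
            term_index[term].add(elem_id)
        for child in elem:
            visit(child, f"{path}/{child.tag}")

    visit(root, f"/{root.tag}")
    return path_index, term_index


def path_search(path_index, term_index, path, term):
    # Elements reached by the exact label path whose own text contains the term
    hits = term_index.get(term.lower(), set())
    return [e for e in path_index.get(path, []) if e in hits]


doc = ("<article><title>Okapi</title><sec>"
       "<p>inverted files</p><p>path indexing</p></sec></article>")
paths, terms = build_indexes(ET.fromstring(doc))
result = path_search(paths, terms, "/article/sec/p", "indexing")
```

The extra path postings are the kind of space overhead the findings report; lookups stay cheap because a query touches only one path list and one term list.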

Findings

Initial results on the INEX collections show a substantial space overhead for the method, but this increase does not adversely affect run time. Indexing results on Reuters RCV1 sub‐collections of different sizes show that space costs grow significantly with collection size, while run time grows linearly. Path search results show sub‐millisecond run times, demonstrating minimal overhead for XML search.

Practical implications

Overall, the results show the method implemented to support XML search in a traditional IR system such as Okapi is viable.

Originality/value

The paper provides useful information on a method for XML indexing based on the IR system Okapi.

Details

Aslib Proceedings, vol. 61 no. 5
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 22 November 2011

A. Hossein Farajpahlou and Faeze Tabatabai

Abstract

Purpose

The aim of this paper is to examine the indexing quality and ranking of XML content objects containing Dublin Core and MARC 21 metadata elements in dynamic online information environments by general search engines such as Google and Yahoo!

Design/methodology/approach

In total, 100 XML content objects were divided into two groups: those with DCXML elements and those with MARCXML elements. Both groups were published on the web site www.marcdcmi.ir in late July 2009 and remained online until June 2010. The web site was submitted to the Google and Yahoo! search engines. The indexing quality of the metadata elements embedded in the content objects, and their indexing and ranking capabilities in a dynamic online information environment, were then examined and compared.

Findings

The Google search engine retrieved all the content objects through their Dublin Core and MARC 21 metadata elements; the Yahoo! search engine, however, did not respond at all. All Dublin Core and MARC 21 metadata elements were indexed by Google. No difference in indexing quality or ranking was observed between DCXML and MARCXML metadata elements. The results show that neither XML‐based Dublin Core nor MARC 21 has an access advantage over the other in dynamic online information environments through the Google search engine.

Practical implications

The findings can provide useful information for search engine designers.

Originality/value

The present study was conducted for the first time in dynamic environments using XML‐based metadata elements. It can provide grounds for further studies of this kind.

Details

Aslib Proceedings, vol. 63 no. 6
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 17 August 2015

Takahiro Komamizu, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to extract appropriate terms that summarize the current search results in terms of the contents of textual facets. Faceted search on XML data helps users find the information they need by presenting attribute–content pairs (called facet-value pairs) about the current search results. However, when most contents of a facet are long texts on average (such facets are called textual facets), it is not easy to get an overview of the current results.

Design/methodology/approach

The proposed approach is based on subsumption relationships between terms among the contents of a facet. These relationships can be extracted from term co-occurrences across a number of documents (in this paper, each content of a facet is treated as a document). Subsumption relationships form hierarchies, which the authors use to extract facet-values from textual facets. Because users of faceted search often have ambiguous search demands and expect broader terms, the authors extract high-level terms in the hierarchies as facet-values.
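The abstract does not state the exact subsumption criterion. The following Python sketch assumes a common co-occurrence test in which term x subsumes term y when P(x | y) ≥ θ and P(y | x) < 1, with θ = 0.8 as an illustrative threshold; each facet content is treated as one document, as in the abstract. Terms that subsume others but are subsumed by none are taken as the high-level facet-values. The function names and sample contents are hypothetical.

```python
from itertools import permutations


def subsumption_pairs(docs, threshold=0.8):
    """Return (parent, child) pairs under the assumed criterion
    P(parent | child) >= threshold and P(child | parent) < 1,
    estimated from document-level co-occurrence."""
    doc_sets = [set(d.lower().split()) for d in docs]
    df = {}
    for terms in doc_sets:
        for t in terms:
            df[t] = df.get(t, 0) + 1
    pairs = []
    for x, y in permutations(df, 2):
        both = sum(1 for terms in doc_sets if x in terms and y in terms)
        if both / df[y] >= threshold and both / df[x] < 1:
            pairs.append((x, y))
    return pairs


def top_level_terms(docs, threshold=0.8):
    # High-level terms subsume at least one term and are subsumed by none
    pairs = subsumption_pairs(docs, threshold)
    parents = {p for p, _ in pairs}
    children = {c for _, c in pairs}
    return sorted(parents - children)


facet_contents = ["database systems", "database indexing",
                  "database query optimization", "graph database", "music"]
broad_terms = top_level_terms(facet_contents)
```

Here "database" subsumes every term it co-occurs with, while "query" and "optimization" always co-occur and so subsume neither each other. The pairwise loop is quadratic in vocabulary size, which is fine for a sketch but would need pruning for real facets.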

Findings

The main finding of this paper is that the extracted terms improve users’ search experiences, especially when search demands are ambiguous.

Originality/value

One original contribution of this paper is the way it uses the textual contents of XML data to improve users’ search experiences in faceted search. Another is the design of tasks for evaluating exploratory search such as faceted search.

Details

International Journal of Web Information Systems, vol. 11 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 3 August 2012

Sayyed Mahdi Taheri and Nadjla Hariri

Abstract

Purpose

The purpose of this research was to assess and compare the indexing and ranking of XML‐based content objects containing MARCXML and XML‐based Dublin Core (DCXML) metadata elements by general search engines (Google and Yahoo!), in a comparative analytical study.

Design/methodology/approach

One hundred XML content objects in two groups were analyzed: those with MARCXML elements (50 records) and those with DCXML elements (50 records), published on two web sites (www.dcmixml.islamicdoc.org and www.marcxml.islamicdoc.org). The web sites were then submitted to the Google and Yahoo! search engines.

Findings

The indexing of the metadata records, and the differences in their indexing and ranking, were examined using descriptive statistics and a non‐parametric Mann‐Whitney U test. The findings show that content objects were visible through all their metadata elements. There was no significant difference in indexing between the two groups, but a difference was observed in ranking.
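For readers unfamiliar with the test, the Mann‐Whitney U statistic compares two independent samples by ranking their pooled values. The following is a minimal Python sketch of the statistic only (no p-value); the function name and sample values are illustrative and are not the study's data. In practice, `scipy.stats.mannwhitneyu` computes both the statistic and a p-value.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples.

    Pooled values are ranked from 1, tied values share their average rank,
    and U_a = R_a - n_a (n_a + 1) / 2, where R_a is the rank sum of sample a.
    The reported test statistic is usually min(U_a, U_b).
    """
    pooled = sorted((v, g) for g, sample in ((0, a), (1, b)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n_a, n_b = len(a), len(b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a
```

With rankings a = [1, 2] and b = [2, 3], the tied value 2 receives rank 2.5, giving U_a = 0.5 and U_b = 3.5; the two values always sum to n_a × n_b.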

Practical implications

The findings of this research can help search engine designers in the optimum use of metadata elements to improve their indexing and ranking process with the aim of increasing availability. The findings can also help web content object providers in the proper and efficient use of metadata systems.

Originality/value

This is the first research to examine the interoperability between XML‐based metadata and web search engines and to compare the MARC format and DCMI empirically.

Details

The Electronic Library, vol. 30 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 5 June 2009

Danijela Boberić and Dušan Surla

Abstract

Purpose

The purpose of this paper is to model and implement an XML‐based editor for the search and retrieval of bibliographic records. The editor enables search and retrieval of bibliographic records from remote databases via the Z39.50 protocol. The client application is implemented in Java, and its business logic is based on XML technologies.

Design/methodology/approach

An object‐oriented methodology is used for modelling and implementing information systems. Modelling is done with CASE tools that support the Unified Modelling Language (UML 2.0), and the implementation is developed in the Eclipse environment.

Findings

The result is an application for retrieving bibliographic records according to the Z39.50 standard. The editor supports formulation of type‐1 queries as defined by the Z39.50 standard. Because the editor is implemented with XML technologies, migration to the other query types defined by the standard is simple. The application was verified by searching for and retrieving bibliographic records from several libraries that support the Z39.50 protocol.

Research limitations/implications

The editor supports only type‐1 queries, which are modelled in XML schema language. Adding other query types would require modelling them in XML schema and making corresponding changes to the screen forms.

Practical implications

The editor is integrated into the BISIS software system, enabling retrieval of bibliographic records from libraries worldwide that use the Z39.50 protocol. A retrieved record is stored in an object structure that is later processed by the BISIS system's editor for bibliographic material processing.

Originality/value

The contribution of this work is a software system architecture that is based on XML technologies and is independent both of the standard that defines the query language and of the software system into which it is integrated. The XML schema of the query language is the input to the editor, so introducing a new query type amounts to creating the corresponding XML schema. After a query is run, the set of results is retrieved as an XML document, which can be processed in the same form by different software systems.

Details

The Electronic Library, vol. 27 no. 3
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 17 August 2015

Savong Bou, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to propose a novel scheme for processing XPath-based keyword search over Extensible Markup Language (XML) streams, in which one can specify query keywords and XPath-based filtering conditions at the same time. Experimental results show that the proposed scheme can process XPath-based keyword search over XML streams efficiently and practically.

Design/methodology/approach

To allow XPath-based keyword search over XML streams, the authors integrated YFilter (Diao et al., 2003) with CKStream (Hummel et al., 2011). More precisely, the nondeterministic finite automaton (NFA) of YFilter is extended to support keyword matching at text nodes. Next, the stack data structure is modified by integrating the sets of NFA states in YFilter with bitmaps generated from the sets of keyword queries in CKStream.
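The core idea of tracking keyword matches per open element with a stack of bitmaps, while checking a structural filter, can be sketched in Python with the standard SAX parser. This is a simplified illustration, not YFilter or CKStream: the XPath NFA is replaced by an exact absolute child-axis path check (no `//` or predicates), and the class name `KeywordStreamMatcher` and the sample stream are hypothetical.

```python
import xml.sax


class KeywordStreamMatcher(xml.sax.ContentHandler):
    """Stack of per-element keyword bitmaps over a SAX event stream (sketch)."""

    def __init__(self, path, keywords):
        super().__init__()
        self.path = path.strip("/").split("/")
        self.keywords = [k.lower() for k in keywords]
        self.full = (1 << len(self.keywords)) - 1  # bitmap with every keyword bit set
        self.stack = []    # one [tag, bitmap, text chunks] entry per open element
        self.matches = []

    def _flush(self):
        # SAX may deliver one text run in several chunks, so match whole runs only
        if self.stack:
            entry = self.stack[-1]
            words = set("".join(entry[2]).lower().split())
            entry[2].clear()
            for i, kw in enumerate(self.keywords):
                if kw in words:
                    entry[1] |= 1 << i

    def startElement(self, name, attrs):
        self._flush()
        self.stack.append([name, 0, []])

    def characters(self, content):
        if self.stack:
            self.stack[-1][2].append(content)

    def endElement(self, name):
        self._flush()
        tag, bits, _ = self.stack.pop()
        if self.stack:
            self.stack[-1][1] |= bits  # OR the child's bitmap into its parent
        labels = [entry[0] for entry in self.stack] + [tag]
        if labels == self.path and bits == self.full:
            self.matches.append("/" + "/".join(labels))


stream = ("<feed><item><title>xml stream</title><body>keyword search</body></item>"
          "<item><title>xml</title><body>other</body></item></feed>")
handler = KeywordStreamMatcher("/feed/item", ["xml", "search"])
xml.sax.parseString(stream.encode("utf-8"), handler)
```

Only the first `item` contains both keywords, so it is the single match. Because each closing tag ORs its bitmap into the parent, an element's bitmap covers all text beneath it without buffering the subtree.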

Findings

Extensive experiments were conducted on both synthetic and real data sets to show the effectiveness of the proposed method. The results showed that the proposed method was more accurate than the baseline method (CKStream) while consuming less memory. Moreover, the proposed scheme scaled well with the number of queries.

Originality/value

Owing to the rapid spread of XML streams, the demand for querying them is growing. In this situation, the ability to query by combining XPath and keyword search is important, because it is an easy-to-use yet powerful means of querying XML streams. However, no existing work has addressed this issue. This work tackles the problem by combining the XPath-based YFilter and the keyword-search-based CKStream for XML streams to enable XPath-based keyword search.

Details

International Journal of Web Information Systems, vol. 11 no. 3
Type: Research Article
ISSN: 1744-0084
