Search results

1 – 10 of over 1000
Article
Publication date: 14 June 2013

Atsushi Keyaki, Jun Miyazaki, Kenji Hatano, Goshiro Yamamoto, Takafumi Taketomi and Hirokazu Kato

Abstract

Purpose

The purpose of this paper is to propose methods for fast incremental indexing with effective and efficient query processing in XML element retrieval. The effectiveness of a search system becomes lower if document updates are not handled when these occur frequently on the Web. The search accuracy is also reduced if drastic changes in document statistics are not managed. However, existing studies of XML element retrieval do not consider document updates, although these studies have attained both effectiveness and efficiency in query processing. Thus, the authors add a function for handling document updates to the existing techniques for XML element retrieval.

Design/methodology/approach

Although fast index updates are important, preliminary experiments have shown that a simple incremental update approach has two problems: some kinds of statistics become inaccurate, and updating the indices takes a long time. Therefore, two methods are proposed: one to approximate term weights accurately from a small number of documents, even for dynamically changing statistics; and the other to eliminate unnecessary update targets.

Findings

Experimental results show that the proposed system can update indices up to 32 per cent faster than simple incremental updates, while search accuracy improves by 4 per cent compared with the simple approach. The proposed methods also remain fast and accurate in query processing, even if document statistics change drastically.

Originality/value

The paper shows that there could be a more practical XML element search engine, which can access the latest XML documents accurately and efficiently.

Details

International Journal of Web Information Systems, vol. 9 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 18 September 2009

Wei Lu, Andrew MacFarlane and Fabio Venuti

Abstract

Purpose

Being an important data exchange and information storage standard, XML has generated a great deal of interest and particular attention has been paid to the issue of XML indexing. Clear use cases for structured search in XML have been established. However, most of the research in the area is either based on relational database systems or specialized semi‐structured data management systems. This paper aims to propose a method for XML indexing based on the information retrieval (IR) system Okapi.

Design/methodology/approach

First, the paper reviews the structure of inverted files and explains why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. Then the paper explores a revised method implemented on Okapi using path indexing structures. The paper evaluates these index structures through the metrics of indexing run time, path search run time and space costs using the INEX and Reuters RCV1 collections.
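As a rough illustration of what a path indexing structure adds to an inverted file, the following Python sketch stores the element path alongside each posting so that a term lookup can be restricted by path. It is a toy in-memory example, not the Okapi implementation; the function names `build_path_index` and `search`, and the choice to index only text directly inside an element, are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(docs):
    """Map each term to a set of (doc_id, element_path) postings.

    `docs` is a dict of doc_id -> XML string. Only text directly inside
    an element is indexed under that element's path (sketch assumption).
    """
    index = defaultdict(set)

    def walk(elem, path, doc_id):
        current = f"{path}/{elem.tag}"
        for term in (elem.text or "").lower().split():
            index[term].add((doc_id, current))
        for child in elem:
            walk(child, current, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return index


def search(index, term, path_suffix=""):
    """Return sorted doc ids whose postings for `term` end with `path_suffix`."""
    postings = index.get(term.lower(), ())
    return sorted({doc for doc, path in postings if path.endswith(path_suffix)})


docs = {
    "d1": "<article><title>xml indexing</title><body>okapi system</body></article>",
    "d2": "<article><title>okapi</title><body>xml search</body></article>",
}
index = build_path_index(docs)
print(search(index, "xml"))                    # any path
print(search(index, "xml", "/article/title"))  # titles only
```

Storing one posting per (document, path) pair is what drives the extra space cost in a design like this, while lookups stay a single dictionary access followed by a filter.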

Findings

Initial results on the INEX collections show that there is a substantial overhead in space costs for the method, but this increase does not affect run time adversely. Indexing results on differently sized Reuters RCV1 sub‐collections show that space costs increase significantly as collection size grows, whereas run time increases only linearly. Path search results show sub‐millisecond run times, demonstrating minimal overhead for XML search.

Practical implications

Overall, the results show the method implemented to support XML search in a traditional IR system such as Okapi is viable.

Originality/value

The paper provides useful information on a method for XML indexing based on the IR system Okapi.

Details

Aslib Proceedings, vol. 61 no. 5
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 1 June 2000

Hyun‐Hee Kim and Chang‐Seok Choi

Abstract

The purpose of this paper is to show how XML is applied to digital library systems. For a better understanding of XML, the major features of XML are reviewed and compared with those of HTML. An experimental XML‐based metadata retrieval system, which is designed as a subsystem of the Korean Virtual Library and Information System (VINIS) is demonstrated. The metadata retrieval system consists of two modules: a retrieval module and a browsing module. The retrieval module allows the retrieval of metadata stored in Microsoft Access files and the display of search results in an XML file format, while the browse module permits browsing of metadata in XML/XSL document formats. Finally, some issues for a more efficient application of XML to digital libraries are discussed.

Details

The Electronic Library, vol. 18 no. 3
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 22 June 2010

Awny Sayed, Ahmed A. Radwan and Mohamed M. Abdallah

Abstract

Purpose

Information retrieval (IR) and feedback in Extensible Markup Language (XML) are rather new fields for researchers; natural questions arise, such as: how good are the feedback algorithms in XML IR? Can they be evaluated with standard evaluation tools? Even though some evaluation methods have been proposed in the literature, it is not yet clear which of them are applicable in the context of XML IR, and which metrics they can be combined with to assess the quality of XML retrieval algorithms that use feedback. This paper aims to elaborate on this.

Design/methodology/approach

The efficient evaluation of relevance feedback (RF) algorithms for XML collections posed interesting challenges for IR and database researchers. The system was based on keyword‐based queries, both in the main query and in RF processing, rather than on XPath and structured query languages, which were more complex. To measure the efficiency of the system, the paper used the extended RF algorithms (residual collection and freezeTop) for evaluating the performance of XML search engines. Compared to previous approaches, the paper aimed at removing the effect of results whose relevance was already known to the system, and at measuring the improvement on unseen relevant elements. The paper implemented the proposed evaluation methodologies by extending a standard evaluation tool with a module capable of assessing feedback algorithms for a specific set of metrics.
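The two evaluation ideas named above can be sketched in a few lines of Python. This is a minimal illustration of the general residual collection and freezeTop techniques, not the paper's evaluation module; the function names, the use of precision at k as the metric, and the toy element identifiers are assumptions.

```python
def residual_precision_at_k(ranking, relevant, seen, k):
    """Precision@k on the residual collection.

    Elements in `seen` (those judged during feedback) are removed from
    both the ranking and the relevant set before scoring, so only gains
    on unseen elements are counted. Divides by k, assumed to be > 0.
    """
    residual_ranking = [e for e in ranking if e not in seen]
    residual_relevant = set(relevant) - set(seen)
    hits = sum(1 for e in residual_ranking[:k] if e in residual_relevant)
    return hits / k


def freeze_top(initial_ranking, feedback_ranking, n):
    """Keep the first n initial results fixed, then append the feedback run."""
    frozen = initial_ranking[:n]
    rest = [e for e in feedback_ranking if e not in frozen]
    return frozen + rest


initial = ["a", "b", "c", "d", "e"]      # run before feedback
feedback = ["c", "a", "e", "b", "d"]     # run after feedback
relevant = {"a", "c", "e"}
seen = {"a", "b"}                        # elements shown to the user for feedback

print(residual_precision_at_k(initial, relevant, seen, 2))
print(residual_precision_at_k(feedback, relevant, seen, 2))
print(freeze_top(initial, feedback, 2))
```

Both variants address the same bias: a feedback run could otherwise score higher simply by moving already-judged relevant elements to the top.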

Findings

In this paper, the authors create an efficient XML retrieval system based on query refinement: feedback processing extends the main query terms with new terms that are most closely related to them.

Research limitations/implications

The authors are working on more efficient retrieval algorithms to get the top‐ten results related to the submitted query. Moreover, they plan to extend the system to handle complex XPath expressions.

Originality/value

This paper presents an efficient evaluation of RF algorithms for XML collection retrieval systems.

Details

International Journal of Web Information Systems, vol. 6 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 22 November 2011

Atsushi Keyaki, Kenji Hatano and Jun Miyazaki

Abstract

Purpose

Nowadays there are a large number of XML documents on the web. This means that information retrieval techniques for searching XML documents are very important and necessary for internet users. Moreover, it is often said that users of search engines want to browse only relevant content in each document. Therefore, an effective XML element search aims to produce only the relevant elements or portions of an XML document. In response to this user demand, the purpose of this paper is to propose and evaluate a method for obtaining more accurate search results in XML search.

Design/methodology/approach

The existing approaches generate a ranked list in descending order of each XML element's relevance to a search query; however, these approaches often extract irrelevant XML elements and overlook more relevant elements. To address these problems, the authors' approach extracts the relevant XML elements by considering the size of the elements and the relationships between the elements. Next, the authors score the XML elements to generate a refined ranked list. For scoring, the authors give the highest ranks to the XML elements that are the most relevant to the user's information needs. In particular, each XML element is scored using the statistics of its descendant and ancestor XML elements.

Findings

The experimental evaluations show that the proposed method outperforms BM25E, a conventional approach, which neither reconstructs XML elements nor uses descendant and ancestor statistics. As a result, the authors found that the accuracy of an XML element search can be improved by reconstructing the XML elements and emphasizing the informative ones by applying the statistics of the descendant XML elements.

Research limitations/implications

This work focused on the effectiveness of XML element search and the authors did not consider the search efficiency in this paper. One of the authors' next challenges is to reduce search time.

Originality/value

The paper proposes a method for improving the effectiveness of XML element search.

Article
Publication date: 19 June 2017

Keng Hoon Gan and Keat Keong Phang

Abstract

Purpose

When accessing structured contents in XML form, information requests are formulated in special query languages such as NEXI, XQuery, etc. However, it is not easy for end users to compose such information requests using these special query languages because of their complexity. Hence, the purpose of this paper is to automate the construction of such queries from common query forms such as keywords or form-based queries.

Design/methodology/approach

In this paper, the authors address the problem of constructing queries for XML retrieval by proposing a semantic-syntax query model that can be used to construct different types of structured queries. First, a generic query structure known as the semantic query structure is designed to store the query contents given by the user. Then, a query in the target language is generated by mapping the contents of the semantic query structure to query syntax templates stored in a knowledge base.
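The separation of query semantics from query syntax can be sketched as follows. This is a hedged Python illustration, not the authors' model: the dictionary-based semantic query structure, the `TEMPLATES` knowledge base and the `build_query` function are hypothetical, and the templates cover only a single content condition on one target element.

```python
from string import Template

# Hypothetical "knowledge base" of syntax templates, keyed by target language.
# "$$" is an escaped literal "$", needed for XQuery variables.
TEMPLATES = {
    "nexi": Template("//$target[about(., $terms)]"),
    "xquery": Template('for $$x in //$target where contains($$x, "$terms") return $$x'),
}


def build_query(semantic_query, language):
    """Fill the syntax template for `language` from a semantic query structure.

    `semantic_query` is assumed to be a dict with a target element name
    and a list of keywords; it carries no language-specific syntax.
    """
    template = TEMPLATES[language]
    return template.substitute(
        target=semantic_query["target"],
        terms=" ".join(semantic_query["terms"]),
    )


semantic_query = {"target": "article", "terms": ["xml", "retrieval"]}
print(build_query(semantic_query, "nexi"))
print(build_query(semantic_query, "xquery"))
```

Supporting another query language in this sketch means adding one template entry, which is the kind of flexibility the abstract attributes to keeping semantics and syntax apart.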

Findings

Evaluations were carried out based on how well information needs are captured and transformed into a target query language. In summary, the proposed model is able to express information needs specified using query languages such as NEXI. XQuery records a lower percentage because of its language complexity. The authors also achieve a satisfactory query construction rate with an example-based method, i.e. 86 per cent (for NEXI IMDB topics) and 87 per cent (for NEXI Wiki topics), compared with the benchmark of 78 per cent reported by Sumita and Iida in language translation.

Originality/value

The proposed semantic-syntax query model offers the flexibility to accommodate new query languages by separating the semantics of a query from its syntax.

Details

International Journal of Web Information Systems, vol. 13 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 December 2002

Catherine Ebenezer

Abstract

Provides an overview of the present state of development of integrated library systems and identifies, describes and evaluates significant trends in the industry in relation to their context within the overall development of library services. Notes that the library systems market, and developments in library systems, are driven by Internet trends and by the software industry rather than by the library and information community and that they are subject to global economic imperatives.

Details

VINE, vol. 32 no. 4
Type: Research Article
ISSN: 0305-5728

Article
Publication date: 19 June 2009

Chantola Kit, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to propose efficient algorithms for structural grouping over Extensible Markup Language (XML) data, called TOPOLOGICAL ROLLUP (T‐ROLLUP), which compute aggregation functions over XML data with multiple hierarchical levels. Such algorithms play an important role in the online analytical processing of XML data, called XML‐OLAP, with which complex analysis over XML can be performed to discover valuable information from XML.

Design/methodology/approach

Several variations of algorithms are proposed for efficient T‐ROLLUP computation. First, two basic algorithms, the top‐down algorithm (TDA) and the bottom‐up algorithm (BUA), are presented in which the well‐known structural‐join algorithms are used. The paper then proposes more efficient algorithms, called single‐scan by preorder number and single‐scan by postorder number (SSC‐Pre/Post), which are also based on structural joins, but have been modified from the basic algorithms so that multiple levels of grouping are computed with a single scan over node lists. In addition, the paper attempts to adapt the algorithm for parallel execution in multi‐core environments.
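To show how preorder and postorder numbers allow several grouping levels to be aggregated in one pass, here is a small Python sketch. It is not the SSC‐Pre/Post algorithm from the paper; it only illustrates the standard ancestor test on (pre, post) numbers combined with a stack-based scan, and the `Node` layout, the `rollup_sum` function and the sample tree are assumptions.

```python
from collections import namedtuple

# Each node carries its preorder and postorder numbers and a numeric value.
Node = namedtuple("Node", "pre post tag value")


def is_ancestor(a, d):
    """Standard region test: a precedes d in preorder and follows it in postorder."""
    return a.pre < d.pre and a.post > d.post


def rollup_sum(nodes):
    """Sum `value` over every node's subtree in one scan in preorder.

    The stack holds the ancestors of the current node, so each value is
    added to all enclosing grouping levels as soon as it is read.
    """
    totals = {}
    stack = []
    for node in sorted(nodes, key=lambda n: n.pre):
        # Drop nodes whose subtree has ended before the current node.
        while stack and not is_ancestor(stack[-1], node):
            stack.pop()
        totals[node.pre] = node.value
        for ancestor in stack:
            totals[ancestor.pre] += node.value
        stack.append(node)
    return totals


tree = [
    Node(1, 6, "site", 0),
    Node(2, 3, "region", 0),
    Node(3, 1, "item", 10),
    Node(4, 2, "item", 5),
    Node(5, 5, "region", 0),
    Node(6, 4, "item", 7),
]
totals = rollup_sum(tree)
print(totals)  # keys are preorder numbers
```

In this sketch the site-level and region-level sums are produced together, without a separate join per grouping level.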

Findings

Several experiments are conducted with XMark and synthetic XML data to show the effectiveness of the proposed algorithms. The experiments show that the proposed algorithms perform much better than the naïve implementation. In particular, the proposed SSC‐Pre and SSC‐Post perform better than TDA and BUA in all cases. Beyond that, the experiment using the parallel single‐scan algorithm also shows better performance than the ordinary basic algorithm.

Research limitations/implications

This paper focuses on the T‐ROLLUP operation for XML data analysis. For this reason, other operations related to XML‐OLAP, such as CUBE, WINDOWING, and RANKING should also be investigated.

Originality/value

The paper presents an extended version of one of the award winning papers at iiWAS2008.

Details

International Journal of Web Information Systems, vol. 5 no. 2
Type: Research Article
ISSN: 1744-0084

Abstract

Details

Aslib Journal of Information Management, vol. 66 no. 2
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 26 June 2007

Q.T. Tho, A.C.M. Fong and S.C. Hui

Abstract

Purpose

The semantic web gives meaning to information so that humans and computers can work together better. Ontology is used to represent knowledge on the semantic web. Web services have been introduced to make the knowledge conveyed by the ontology on the semantic web accessible across different applications. This paper seeks to present the use of these latest advances in the context of a scholarly semantic web (or SSWeb) system, which can support advanced search functions such as expert finding and trend detection in addition to basic functions such as document and author search as well as document and author clustering search.

Design/methodology/approach

A distributed architecture of the proposed SSWeb is described, as well as semantic web services that support scholarly information retrieval on the SSWeb.

Findings

Initial experimental results indicate that the proposed method is effective.

Research limitations/implications

The work reported is experimental in nature. More work is needed, but early results are encouraging and the authors wish to make their work known to the research community by publishing this paper so that further progress can be made in this area of research.

Originality/value

The work is presented in the context of scholarly document retrieval, but it could also be adapted to other types of documents, such as medical records, machine‐fault records and legal documents. This is because the basic principles are the same.

Details

Online Information Review, vol. 31 no. 3
Type: Research Article
ISSN: 1468-4527
