Search results

1 – 10 of 16
Article
Publication date: 22 November 2011

Atsushi Keyaki, Kenji Hatano and Jun Miyazaki

Nowadays there are a large number of XML documents on the web. This means that information retrieval techniques for searching XML documents are very important and necessary for…

Abstract

Purpose

Nowadays there are a large number of XML documents on the web, so information retrieval techniques for searching XML documents are important and necessary for internet users. Moreover, it is often said that users of search engines want to browse only the relevant content in each document. Therefore, an effective XML element search aims to return only the relevant elements or portions of an XML document. Based on this user demand, the purpose of this paper is to propose and evaluate a method for obtaining more accurate search results in XML element search.

Design/methodology/approach

The existing approaches generate a ranked list in descending order of each XML element's relevance to a search query; however, these approaches often extract irrelevant XML elements and overlook more relevant elements. To address these problems, the authors' approach extracts the relevant XML elements by considering the size of the elements and the relationships between the elements. Next, the authors score the XML elements to generate a refined ranked list. For scoring, the authors rank highly the XML elements that are most relevant to the user's information needs. In particular, each XML element is scored using the statistics of its descendant and ancestor XML elements.
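The abstract does not give the exact scoring formula, so the following Python sketch only illustrates the general idea of scoring an element with term statistics aggregated from its descendant elements; the BM25-style term-frequency weighting (IDF omitted for brevity), parameter values and toy document are assumptions for illustration.

```python
# Hedged sketch: score an XML element using descendant-inclusive term counts.
import math
import xml.etree.ElementTree as ET
from collections import Counter

def element_term_counts(elem):
    """Term counts of an element's own text plus all descendant text."""
    counts = Counter()
    for node in elem.iter():               # iter() covers elem and its descendants
        if node.text:
            counts.update(node.text.lower().split())
    return counts

def score_element(elem, query_terms, avg_len=50.0, k1=1.2, b=0.75):
    """BM25-style term-frequency score over the aggregated counts (no IDF)."""
    counts = element_term_counts(elem)
    length = sum(counts.values()) or 1
    score = 0.0
    for term in query_terms:
        tf = counts.get(term, 0)
        score += tf * (k1 + 1) / (tf + k1 * (1 - b + b * length / avg_len))
    return score

doc = ET.fromstring(
    "<article><sec><p>xml element retrieval</p></sec><sec><p>databases</p></sec></article>"
)
# Elements ranked by their descendant-aware score for the query terms.
ranked = sorted(doc.iter(), key=lambda e: score_element(e, ["xml", "retrieval"]), reverse=True)
```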

Findings

The experimental evaluations show that the proposed method outperforms BM25E, a conventional approach, which neither reconstructs XML elements nor uses descendant and ancestor statistics. As a result, the authors found that the accuracy of an XML element search can be improved by reconstructing the XML elements and emphasizing the informative ones by applying the statistics of the descendant XML elements.

Research limitations/implications

This work focused on the effectiveness of XML element search; the authors did not consider search efficiency in this paper. One of the authors' next challenges is to reduce search time.

Originality/value

The paper proposes a method for improving the effectiveness of XML element search.

Article
Publication date: 18 September 2009

Wei Lu, Andrew MacFarlane and Fabio Venuti

Being an important data exchange and information storage standard, XML has generated a great deal of interest and particular attention has been paid to the issue of XML indexing…

Abstract

Purpose

Being an important data exchange and information storage standard, XML has generated a great deal of interest and particular attention has been paid to the issue of XML indexing. Clear use cases for structured search in XML have been established. However, most of the research in the area is either based on relational database systems or specialized semi‐structured data management systems. This paper aims to propose a method for XML indexing based on the information retrieval (IR) system Okapi.

Design/methodology/approach

First, the paper reviews the structure of inverted files and gives an overview of why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. Then the paper explores a revised method implemented in Okapi using path indexing structures. The paper evaluates these index structures through the metrics of indexing run time, path search run time and space costs, using the INEX and Reuters RCV1 collections.
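The abstract does not detail Okapi's revised data structures, but a minimal sketch of the general path-indexing idea, in which each posting records the element path at which a term occurs, might look as follows (the index layout and toy document are assumed, not the paper's implementation):

```python
# Hedged sketch of a path-augmented inverted index for XML retrieval.
from collections import defaultdict
import xml.etree.ElementTree as ET

def build_path_index(xml_text, doc_id):
    index = defaultdict(list)              # term -> [(doc_id, element_path), ...]
    root = ET.fromstring(xml_text)

    def walk(elem, path):
        current = f"{path}/{elem.tag}"
        if elem.text:
            for term in elem.text.lower().split():
                index[term].append((doc_id, current))
        for child in elem:
            walk(child, current)

    walk(root, "")
    return index

index = build_path_index("<article><title>xml indexing</title></article>", "d1")
print(index["indexing"])                   # [('d1', '/article/title')]
```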

Findings

Initial results on the INEX collections show that there is a substantial overhead in space costs for the method, but this increase does not affect run time adversely. Indexing results on Reuters RCV1 sub‐collections of differing sizes show that the increase in space costs with collection size is significant, but the increase in run time is linear. Path search results show sub‐millisecond run times, demonstrating minimal overhead for XML search.

Practical implications

Overall, the results show that the method implemented to support XML search in a traditional IR system such as Okapi is viable.

Originality/value

The paper provides useful information on a method for XML indexing based on the IR system Okapi.

Details

Aslib Proceedings, vol. 61 no. 5
Type: Research Article
ISSN: 0001-253X

Keywords

Article
Publication date: 22 June 2010

Awny Sayed, Ahmed A. Radwan and Mohamed M. Abdallah

Information retrieval (IR) and feedback in Extensible Markup Language (XML) are rather new fields for researchers; natural questions arise, such as: how good are the feedback…

Abstract

Purpose

Information retrieval (IR) and feedback in Extensible Markup Language (XML) are rather new fields for researchers; natural questions arise, such as: how good are the feedback algorithms in XML IR? Can they be evaluated with standard evaluation tools? Even though some evaluation methods have been proposed in the literature, it is still not clear which of them are applicable in the context of XML IR, and which metrics they can be combined with to assess the quality of XML retrieval algorithms that use feedback. This paper aims to elaborate on this.

Design/methodology/approach

The efficient evaluation of relevance feedback (RF) algorithms for XML collections poses interesting challenges to IR and database researchers. The system is based on keyword queries, both for the main query and for RF processing, rather than the more complex XPath and structured query languages. To measure the efficiency of the system, the paper used extended RF evaluation methods (residual collection and freezeTop) to evaluate the performance of XML search engines. Compared to previous approaches, the paper aims to remove the effect of results whose relevance is already known to the system and to measure the improvement on unseen relevant elements. The paper implemented the proposed evaluation methodologies by extending a standard evaluation tool with a module capable of assessing feedback algorithms for a specific set of metrics.
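As a rough, hedged sketch of the residual-collection idea described above (the freezeTop variant and the paper's actual metrics are not shown), results already judged during feedback can be removed before measuring precision on the re-ranked list; the element identifiers below are invented:

```python
# Hedged sketch: residual-collection style precision@k for feedback evaluation.
def residual_precision_at_k(ranked, seen, relevant, k=10):
    residual = [r for r in ranked if r not in seen]   # drop already-judged results
    top_k = residual[:k]
    hits = sum(1 for r in top_k if r in relevant)
    return hits / k if k else 0.0

ranked = ["e1", "e2", "e3", "e4", "e5"]
print(residual_precision_at_k(ranked, seen={"e1"}, relevant={"e2", "e4"}, k=3))  # ~0.67
```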

Findings

In this paper, the authors create an efficient XML retrieval system based on query refinement: feedback processing extends the main query terms with new terms closely related to them.

Research limitations/implications

The authors are working on more efficient retrieval algorithms to obtain the top-ten results related to the submitted query. Moreover, they plan to extend the system to handle complex XPath expressions.

Originality/value

This paper presents an efficient evaluation of RF algorithms for an XML collection retrieval system.

Details

International Journal of Web Information Systems, vol. 6 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 19 June 2017

Keng Hoon Gan and Keat Keong Phang

When accessing structured contents in XML form, information requests are formulated in the form of special query languages such as NEXI, XQuery, etc. However, it is not easy for…

Abstract

Purpose

When accessing structured contents in XML form, information requests are formulated in special query languages such as NEXI, XQuery, etc. However, it is not easy for end users to compose such information requests using these special queries because of their complexity. Hence, the purpose of this paper is to automate the construction of such queries from common query forms such as keywords or form-based queries.

Design/methodology/approach

In this paper, the authors address the problem of constructing queries for XML retrieval by proposing a semantic-syntax query model that can be used to construct different types of structured queries. First, a generic query structure known as the semantic query structure is designed to store the query contents given by the user. Then, a query in the target language is generated by mapping the contents of the semantic query structure to query syntax templates stored in a knowledge base.
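A minimal sketch of this template-based construction, assuming a much simplified semantic query structure and invented syntax templates for NEXI and XQuery (the paper's actual knowledge base and mapping rules are not described in the abstract):

```python
# Hedged sketch: fill a stored syntax template with user-supplied query contents.
TEMPLATES = {
    # target language -> syntax template with placeholders
    "NEXI":   "//{element}[about(., {keywords})]",
    "XQuery": 'for $e in //{element} where contains(string($e), "{keywords}") return $e',
}

def build_query(language, element, keywords):
    """Map a (simplified) semantic query structure onto a target-language template."""
    return TEMPLATES[language].format(element=element, keywords=" ".join(keywords))

print(build_query("NEXI", "article", ["xml", "retrieval"]))
# //article[about(., xml retrieval)]
```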

Findings

Evaluations were carried out based on how well information needs are captured and transformed into a target query language. In summary, the proposed model is able to express information needs specified using query languages such as NEXI; XQuery records a lower percentage because of its language complexity. The authors also achieve a satisfactory query construction rate with an example-based method, i.e. 86 per cent (for NEXI IMDB topics) and 87 per cent (for NEXI Wiki topics), respectively, compared to the benchmark of 78 per cent by Sumita and Iida in language translation.

Originality/value

The proposed semantic-syntax query model offers the flexibility to accommodate new query languages by separating the semantics of a query from its syntax.

Details

International Journal of Web Information Systems, vol. 13 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 24 April 2009

Luis M. de Campos, Juan M. Fernández‐Luna, Juan F. Huete, Carlos J. Martín‐Dancausa, Antonio Tagua‐Jiménez and Carmen Tur‐Vigil

The purpose of this paper is to present an overview of the reorganisation of the Andalusian Parliament's digital library to improve the electronic representation and access of its…

Abstract

Purpose

The purpose of this paper is to present an overview of the reorganisation of the Andalusian Parliament's digital library to improve the electronic representation and access of its official corpus by taking advantage of a document's internal organisation. Video recordings of the parliamentary sessions have also been integrated with their corresponding textual transcriptions.

Design/methodology/approach

After analysing the state of the Andalusian Parliament's digital library and determining the aspects that could be improved both in the repository and access mechanisms, this paper describes each component of the developed integrated information system.

Findings

A methodology has been developed to tackle the problem and this could be applied to other similar institutions and organisations. Exploiting the internal structure of the parliament's official documents has also proved to be extremely interesting for users as they are directed towards the most relevant parts of the documents.

Originality/value

The paper presents an application of an information retrieval system for structured documents to a real framework and the integration of multimedia sources (e.g. text and video) for retrieval purposes.

Details

Program, vol. 43 no. 2
Type: Research Article
ISSN: 0033-0337

Keywords

Article
Publication date: 18 November 2013

Jorge Luis Morato, Sonia Sanchez-Cuadrado, Christos Dimou, Divakar Yadav and Vicente Palacios

This paper seeks to analyze and evaluate different types of semantic web retrieval systems, with respect to their ability to manage and retrieve semantic documents.


Abstract

Purpose

This paper seeks to analyze and evaluate different types of semantic web retrieval systems, with respect to their ability to manage and retrieve semantic documents.

Design/methodology/approach

The authors provide a brief overview of knowledge modeling and semantic retrieval systems in order to identify their major problems. They classify a set of characteristics for evaluating the management of semantic documents and, on that basis, select 12 retrieval systems classified according to these features. The evaluation methodology followed in this work is the one used in the DESMET project for the evaluation of qualitative characteristics.
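As an illustration of the kind of qualitative feature analysis used in DESMET-style evaluations, a hedged sketch might score each system by the fraction of evaluation features it implements; the feature names and systems below are invented for illustration and are not the paper's actual criteria:

```python
# Hedged sketch: DESMET-style qualitative feature analysis as a simple checklist score.
FEATURES = ["ontology import", "semantic annotation", "reasoning", "ranked retrieval"]

def feature_score(system_features):
    """Fraction of the evaluation features a system implements."""
    return sum(1 for f in FEATURES if f in system_features) / len(FEATURES)

systems = {
    "SystemA": {"ontology import", "ranked retrieval"},
    "SystemB": {"semantic annotation"},
}
for name, feats in systems.items():
    print(name, feature_score(feats))      # SystemA 0.5, SystemB 0.25
```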

Findings

A review of the literature has shown deficiencies in the current state of the semantic web in coping with known problems. Additionally, semantic retrieval systems show discrepancies in the way they are implemented. The authors analyze the presence of a set of functionalities in different types of semantic retrieval systems and find a low degree of implementation of important specifications, as well as of the criteria to evaluate them. The results of this evaluation indicate that, at the moment, the semantic web is characterized by a lack of usability that derives from the problems related to the management of semantic documents.

Originality/value

This proposal shows a simple way to qualitatively compare the requirements of semantic retrieval systems based on the DESMET methodology. The functionalities chosen to test the methodology are based on the problems and relevant criteria discussed in the literature. This work provides functionalities for designing semantic retrieval systems in different scenarios.

Details

Library Hi Tech, vol. 31 no. 4
Type: Research Article
ISSN: 0737-8831

Keywords

Article
Publication date: 1 May 2006

Rajugan Rajagopalapillai, Elizabeth Chang, Tharam S. Dillon and Ling Feng

In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources…

Abstract

In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources. Meanwhile, since its introduction, EXtensible Markup Language (XML) has fast emerged as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user‐defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques, for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, so it proves difficult to support views over semi‐structured data models. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Our work is three‐fold: first, we propose an approach to separate the implementation and conceptual aspects of the views, which provides a clear separation of concerns and allows the analysis and design of views to be separated from their implementation. Second, we define representations to express and construct these views at the conceptual level. Third, we define a view transformation methodology for XML views in the LVM, which carries out automated transformation to a view schema and a view query expression in an appropriate query language. Also, to validate and apply the LVM concepts, methods and transformations developed, we propose a view‐driven application development framework with the flexibility to develop web and database applications for XML at varying levels of abstraction.
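The paper's actual transformation rules are not given in this abstract, but the idea of deriving a view query expression from a conceptual view definition can be sketched roughly as follows; all names, the view structure and the target XQuery form are assumptions for illustration only:

```python
# Hedged sketch: turn a simple conceptual view definition into an XQuery expression.
def view_to_xquery(view):
    """Derive a view query expression from a (simplified) conceptual view definition."""
    cols = ", ".join(f"$i/{c}" for c in view["elements"])
    return (f'for $i in doc("{view["source"]}")//{view["root"]} '
            f'where {view["condition"]} return <{view["name"]}>{{{cols}}}</{view["name"]}>')

customer_view = {
    "name": "CustomerView",
    "source": "orders.xml",
    "root": "customer",
    "elements": ["name", "city"],
    "condition": '$i/city = "Perth"',
}
print(view_to_xquery(customer_view))
# for $i in doc("orders.xml")//customer where $i/city = "Perth"
#   return <CustomerView>{$i/name, $i/city}</CustomerView>
```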

Details

International Journal of Web Information Systems, vol. 2 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 May 2006

Koraljka Golub

To provide an integrated perspective to similarities and differences between approaches to automated classification in different research communities (machine learning…


Abstract

Purpose

To provide an integrated perspective on the similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with these approaches and with automated classification as such.

Design/methodology/approach

A range of works dealing with automated classification of full‐text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages.

Findings

Identifies major similarities and differences between the three approaches: document pre‐processing and the utilization of web‐specific document characteristics are common to all of them, while the major differences lie in the algorithms applied and in whether the vector space model and controlled vocabularies are employed. Problems of automated classification are also recognized.
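As a small illustration of the vector space model ingredient mentioned above, a hedged sketch of centroid-based classification of pre-processed documents might look like the following; the training texts and class labels are invented, and real systems would add weighting (e.g. IDF) and richer pre-processing:

```python
# Hedged sketch: classify a document by cosine similarity to class centroids.
import math
from collections import Counter

def tf_vector(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

training = {
    "sports":  ["football match score", "tennis tournament final"],
    "science": ["xml retrieval experiment", "machine learning model"],
}
centroids = {}
for label, docs in training.items():
    merged = Counter()
    for d in docs:
        merged.update(tf_vector(d))
    centroids[label] = merged

query = tf_vector("xml machine learning")
print(max(centroids, key=lambda lbl: cosine(query, centroids[lbl])))  # science
```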

Research limitations/implications

The paper does not attempt to provide an exhaustive bibliography of related resources.

Practical implications

As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community gain information on how similar tasks are conducted in other communities.

Originality/value

To the author's knowledge, no review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.

Details

Journal of Documentation, vol. 62 no. 3
Type: Research Article
ISSN: 0022-0418

Keywords

Article
Publication date: 8 February 2013

Stefan Dietze, Salvador Sanchez‐Alonso, Hannes Ebner, Hong Qing Yu, Daniela Giordano, Ivana Marenzi and Bernardo Pereira Nunes

Research in the area of technology‐enhanced learning (TEL) throughout the last decade has largely focused on sharing and reusing educational resources and data. This effort has…


Abstract

Purpose

Research in the area of technology‐enhanced learning (TEL) throughout the last decade has largely focused on sharing and reusing educational resources and data. This effort has led to a fragmented landscape of competing metadata schemas and interface mechanisms. More recently, semantic technologies have been taken into account to improve interoperability, and the linked data approach has emerged as the de facto standard for sharing data on the web. The application of linked data principles therefore offers large potential to solve interoperability issues in the field of TEL. This paper aims to address this issue.

Design/methodology/approach

In this paper, approaches are surveyed that are aimed towards a vision of linked education, i.e. education which exploits educational web data. It particularly considers the exploitation of the wealth of already existing TEL data on the web by allowing its exposure as linked data and by taking into account automated enrichment and interlinking techniques to provide rich and well‐interlinked data for the educational domain.
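A minimal, hedged sketch of exposing an existing educational resource description as linked data, using the rdflib library; the URIs and vocabulary choices are assumptions for illustration, not the datasets or schemas surveyed in the paper:

```python
# Hedged sketch: describe an educational resource as RDF and serialize it as Turtle.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

SCHEMA = Namespace("http://schema.org/")
g = Graph()
course = URIRef("http://example.org/resource/course-101")   # hypothetical resource URI
g.add((course, RDF.type, SCHEMA.Course))
g.add((course, DCTERMS.title, Literal("Introduction to Information Retrieval")))
g.add((course, DCTERMS.language, Literal("en")))
print(g.serialize(format="turtle"))
```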

Findings

So far, web‐scale integration of educational resources has not been achieved, mainly due to the lack of take‐up of shared principles, datasets and schemas. However, linked data principles are increasingly recognized by the TEL community. The paper provides a structured assessment and classification of existing challenges and approaches, serving as a potential guideline for researchers and practitioners in the field.

Originality/value

Being one of the first comprehensive surveys on the topic of linked data for education, the paper has the potential to become a widely recognized reference publication in the area.

Article
Publication date: 9 May 2016

Pia Borlund

The purpose of this paper is to report a study of how the test instrument of a simulated work task situation is used in empirical evaluations of interactive information retrieval…


Abstract

Purpose

The purpose of this paper is to report a study of how the test instrument of a simulated work task situation is used in empirical evaluations of interactive information retrieval (IIR) and reported in the research literature. In particular, the author is interested to learn whether the requirements of how to employ simulated work task situations are followed, and whether these requirements call for further highlighting and refinement.

Design/methodology/approach

In order to study how simulated work task situations are used, the research literature in question is identified. This is done partly via citation analysis by use of Web of Science®, and partly by systematic search of online repositories. On this basis, 67 individual publications were identified and they constitute the sample of analysis.

Findings

The analysis reveals a need for clarification of how to use simulated work task situations in IIR evaluations, in particular with respect to the design and creation of realistic simulated work task situations. The simulated work task situations are often not tailored to the test participants, and the requirement to include the test participants' personal information needs is neglected. Further, there is a need to add and emphasise a requirement to report the simulated work task situations used when IIR studies are written up.

Research limitations/implications

Insight about the use of simulated work task situations has implications for test design of IIR studies and hence the knowledge base generated on the basis of such studies.

Originality/value

Simulated work task situations are widely used in IIR studies, and the present study is the first comprehensive study of the intended and unintended use of this test instrument since its introduction in the late 1990s. The paper addresses the need to carefully design and tailor simulated work task situations to suit the test participants in order to obtain the intended authentic and realistic IIR under study.

Details

Journal of Documentation, vol. 72 no. 3
Type: Research Article
ISSN: 0022-0418

Keywords
