Search results

1 – 10 of over 10000
Article
Publication date: 28 August 2009

Zhewei Jiang, Cheng Luo, Wen‐Chi Hou, Dunren Che and Qiang Zhu

Abstract

Purpose

The purpose of this paper is to provide an efficient algorithm for Extensible Markup Language (XML) twig query evaluation.

Design/methodology/approach

A single‐phase holistic twig pattern matching method based on the TwigStack algorithm is proposed. The method applies a novel stack structure to preserve the holisticity of the twig matches. Twig matches rooted at elements that are currently in the root stack are output directly.
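For readers unfamiliar with twig queries, the sketch below shows what a twig match is, using a naive recursive matcher over Python's xml.etree.ElementTree. It is neither TwigStack nor the single-phase method proposed in the paper. The TwigNode class, the axis names and the sample document are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class TwigNode:
    """One node of a twig (tree) pattern; hypothetical helper type."""
    tag: str
    # Each entry is (axis, subpattern); axis is "child" or "descendant".
    children: list = field(default_factory=list)


def matches(elem, node):
    """Return True if the twig pattern `node` can be embedded at `elem`."""
    if elem.tag != node.tag:
        return False
    for axis, sub in node.children:
        if axis == "child":
            candidates = list(elem)
        else:
            # iter() yields elem itself first, so skip it to get proper descendants.
            candidates = list(elem.iter())[1:]
        if not any(matches(c, sub) for c in candidates):
            return False
    return True


def twig_roots(root, pattern):
    """Collect every element at which the whole twig pattern matches."""
    return [e for e in root.iter() if matches(e, pattern)]


# Assumed sample document: only the first book has a name descendant.
doc = ET.fromstring(
    "<lib>"
    "<book><title/><author><name/></author></book>"
    "<book><title/></book>"
    "</lib>"
)
# Pattern: a book with a title child and a name descendant, i.e. book[title][.//name]
pattern = TwigNode("book", [("child", TwigNode("title")),
                            ("descendant", TwigNode("name"))])
print(len(twig_roots(doc, pattern)))  # prints 1
```

This naive matcher rescans subtrees for every candidate element, which is the kind of repeated work that holistic, stack-based algorithms are designed to avoid.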

Findings

Without generating individual path matches as intermediate results, the method is able to avoid the storage and output/input of the individual path matches, and totally eliminate the potentially time‐consuming merging operation. Experimental results demonstrate the applicability and advantages of our approach.

Originality/value

The paper proposes an efficient XML twig query evaluation algorithm, which by both theoretical analyses and empirical studies demonstrates its advantages over the current state‐of‐the‐art algorithm TwigStack.

Details

International Journal of Web Information Systems, vol. 5 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 7 August 2017

Dan Wu and Renmin Bi

Abstract

Purpose

This paper discusses the differences in search pattern transitions for mobile phone, tablet and desktop devices by mining the transaction log data of a library online public access catalogue (OPAC). We aimed to analyze the impacts of different devices on user search behavior and provide constructive suggestions for the development of library OPACs on different devices.

Design/methodology/approach

Based on transaction logs which are 9 GB in size and contain 16,140,509 records of a university library OPAC, statistics and clustering were used to analyze the differences in search pattern transitions on different devices in terms of two aspects: search field transition patterns and query reformulation patterns.
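As a rough illustration of how search field transition patterns could be tallied from such logs, the sketch below counts pairs of consecutive search fields within each session. The session data and the function name field_transitions are hypothetical; the abstract does not describe the authors' log format or their clustering step, neither of which is reproduced here.

```python
from collections import Counter


def field_transitions(session_fields):
    """Count (previous_field, next_field) pairs within one search session."""
    return Counter(zip(session_fields, session_fields[1:]))


# Hypothetical sessions: each list holds the search field of successive queries.
sessions = [
    ["title", "title", "author"],
    ["keyword", "title", "author"],
]

totals = Counter()
for fields in sessions:
    totals += field_transitions(fields)

print(totals[("title", "author")])  # prints 2
```

Computing such counts separately per device type would give one transition table per device, which could then be compared or clustered.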

Findings

Search field transition patterns are influenced by the input function and user interfaces of different devices. As reformulation times increase, the differences between query reformulation patterns among different devices decrease.

Practical implications

Mobile-side libraries need to optimize user interfaces, for example by setting web page labels and improving input capabilities. Desk-side libraries can add more suggestive content on the interface.

Originality/value

Unlike previous studies, which have analyzed web search, this paper focuses on library OPAC search. The search functions of mobile phones, tablets and desktops were found to be asymptotic, which illustrates how strongly devices affect user search behavior.

Article
Publication date: 14 April 2014

Faisal Alkhateeb and Jerome Euzenat

Abstract

Purpose

The paper aims to discuss extensions of SPARQL that use regular expressions to navigate RDF graphs and may be used to answer queries considering RDFS semantics (in particular, nSPARQL and our proposal CPSPARQL).

Design/methodology/approach

The paper is based upon a theoretical comparison of the expressiveness and complexity of both nSPARQL and the corresponding fragment of CPSPARQL, that we call cpSPARQL.

Findings

The paper shows that nSPARQL and cpSPARQL (the fragment of CPSPARQL) have the same complexity, though cpSPARQL, being a proper extension of SPARQL graph patterns, is more expressive than nSPARQL.

Research limitations/implications

It has not been possible for the authors to compare the performance of their CPSPARQL implementation with other proposals. However, the experimentation has allowed some interesting observations to be made.

Practical implications

The paper includes implications for implementing the SPARQL RDFS entailment regime.

Originality/value

The paper demonstrates the usefulness of the cpSPARQL language. In particular, cpSPARQL, which is sufficient for capturing RDFS semantics, admits an efficient evaluation algorithm, while the whole CPSPARQL language is in theory as efficient as SPARQL. Moreover, using such a path language within the SPARQL structure allows SPARQL to be properly extended.

Details

International Journal of Web Information Systems, vol. 10 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 13 April 2018

Dan Wu, Shaobo Liang and Renmin Bi

Abstract

Purpose

The study focused on online public access catalog (OPAC) users’ cross-device search behavior. The purpose of this paper is to understand the characteristics of cross-device OPAC searches, and to identify query reformulation (QR) patterns during device transitions.

Design/methodology/approach

The transaction log from a university library, spanning six months, was used to conduct the quantitative analysis. The query vocabulary richness, which refers to the average number of unique words each query contains in a search session, can evaluate query diversity, and contribute to the analysis of QR.
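The query vocabulary richness measure can be sketched as follows. This is one possible reading of the definition above: the number of distinct words used across a session, divided by the number of queries in that session. The whitespace tokenization, the lowercasing and the function name vocabulary_richness are assumptions; the abstract does not specify them.

```python
def vocabulary_richness(queries):
    """Distinct words in a session divided by the number of queries."""
    if not queries:
        return 0.0
    vocabulary = set()
    for query in queries:
        # Assumed tokenization: lowercase and split on whitespace.
        vocabulary.update(query.lower().split())
    return len(vocabulary) / len(queries)


# Hypothetical session of three queries using four distinct words.
session = ["xml query", "XML twig query", "twig join"]
print(vocabulary_richness(session))
```

Under this reading, a session that keeps reusing the same words scores low, while a session that introduces new words with each query scores high.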

Findings

The results show that PC-PC transition is the most important pattern of device transition. The time interval of device transition was different to the time interval of transitions in web searches. Short device transitions mainly occurred in daytime, and the number of transitions that occurred in less than one minute was higher than on the web. Searches for Industry and Technology triggered the most device transitions, and the users tended to choose the same search field. In addition, the authors made a detailed analysis of the reasons for same-type device transitions and different-type device transitions. Furthermore, the authors focused on the characteristics of adjacent QR patterns. The authors not only refined the concept of cross-device to include the same-type device transition, but also summarized the characteristics of the cross-device QR patterns, which can be used to predict post-switch queries.

Originality/value

This study extends research into cross-device interaction and cross-device search to the domain of digital library research. The authors also introduced a QR perspective on cross-device interaction on OPAC.

Details

Library Hi Tech, vol. 36 no. 3
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 28 September 2007

Dunren Che

Abstract

Purpose

Tree patterns are at the core of XML queries. The tree patterns in XML queries typically contain redundancies, especially when broad integrity constraints (ICs) are present and considered. Tree pattern minimization is therefore of great significance for efficient XML query processing. Although various minimization schemes/algorithms have been proposed, none of them can exploit broad ICs for thoroughly minimizing the tree patterns in XML queries. The purpose of this research is to develop an innovative minimization scheme and provide a novel implementation algorithm.

Design/methodology/approach

Most prior approaches took query augmentation/expansion as a necessary first step toward XML query pattern minimization in the presence of certain ICs. This augmentation/expansion is also the cause of the typical O(n^4) time complexity of those algorithms. This paper presents an innovative approach called allying that circumvents the otherwise necessary augmentation step and keeps the time complexity of the implementation algorithm at the optimal O(n^2). Meanwhile, the graph simulation concept is adapted and generalized to a three‐tier definition scheme so that broader ICs are incorporated.

Findings

The innovative allying minimization approach is identified and an effective implementation algorithm named AlliedMinimize is developed. This algorithm is both runtime optimal, taking O(n^2) time, and the most powerful in terms of the breadth of constraints it can exploit for XML query pattern minimization. An experimental study confirms the validity of the proposed approach and algorithm.

Research limitations/implications

Though the algorithm AlliedMinimize is so far the most powerful XML query pattern minimization algorithm, it does not incorporate all potential ICs existing in the context of XML. Effectively integrating this innovative minimization scheme into a fully‐fledged XML query optimizer remains to be investigated in the future.

Practical implications

In practice, Allying and AlliedMinimize can be used to achieve a kind of quick optimization for XML queries via fast minimization of the tree patterns involved in XML queries under broad ICs.

Originality/value

This paper presents a novel scheme and an efficient algorithm for XML query pattern minimization under broad ICs.

Details

International Journal of Web Information Systems, vol. 3 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 12 August 2014

Sheng Li and Junhu Wang

Abstract

Purpose

The purpose of this paper is to study the spelling suggestion (SS) problem for extensible markup language (XML) keyword search, which provides users with alternative queries that may better express users' search intention.

Design/methodology/approach

To return the suggested queries more efficiently, the authors evaluate the quality of the query by estimating the selectivity and quality of each query pattern. The selectivity estimation is based on the XSketch synopsis, which summarizes the structure and value distribution of the original XML data source. The authors propose an approach to generating the top-K query candidates.

Findings

Experiments with real datasets verify the effectiveness and efficiency of the authors' approach.

Originality/value

The authors proposed an SS approach based on the XSketch summary.

Details

International Journal of Web Information Systems, vol. 10 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 19 June 2009

Imam Machdi, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to propose Extensible Markup Language (XML) data partitioning schemes that can cope with static and dynamic allocation for parallel holistic twig joins: grid metadata model for XML (GMX) and streams‐based partitioning method for XML (SPX).

Design/methodology/approach

GMX exploits the relationships between XML documents and query patterns to perform workload‐aware partitioning of XML data. Specifically, the paper constructs a two‐dimensional model with a document dimension and a query dimension in which each object in a dimension is composed from XML metadata related to the dimension. GMX provides a set of XML data partitioning methods that include document clustering, query clustering, document‐based refinement, query‐based refinement, and query‐path refinement, thereby enabling XML data partitioning based on the static information of XML metadata. In contrast, SPX explores the structural relationships of query elements and a range‐containment property of XML streams to generate partitions and allocate them to cluster nodes on‐the‐fly.
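Range containment over XML elements is commonly realized by labelling each element with a (start, end) interval taken from a depth-first traversal, so that one element is an ancestor of another exactly when its interval strictly contains the other's. The sketch below illustrates that general idea only. It is not the SPX method, and the helpers label_regions and contains are hypothetical names.

```python
import xml.etree.ElementTree as ET
from itertools import count


def label_regions(root):
    """Assign each element a (start, end) interval from a depth-first walk."""
    counter = count(1)
    regions = {}

    def visit(elem):
        start = next(counter)      # position when the element opens
        for child in elem:
            visit(child)
        regions[elem] = (start, next(counter))  # position when it closes

    visit(root)
    return regions


def contains(ancestor, descendant):
    """True if the first interval strictly contains the second."""
    return ancestor[0] < descendant[0] and descendant[1] < ancestor[1]


# Assumed sample document.
root = ET.fromstring("<a><b><c/></b><d/></a>")
regions = label_regions(root)
b, d = root[0], root[1]
c = b[0]

print(contains(regions[root], regions[c]))  # prints True
print(contains(regions[b], regions[d]))     # prints False
```

Because containment is decided from the two intervals alone, elements can be grouped into partitions by interval without re-reading the document tree.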

Findings

GMX provides several salient features: a set of partition granularities that balance workloads of query processing costs among cluster nodes statically; inter‐query parallelism as well as intra‐query parallelism at multiple extents; and better parallel query performance when all estimated queries are executed simultaneously to meet their probability of query occurrences in the system. SPX also offers the following features: minimal computation time to generate partitions; balancing skewed workloads dynamically on the system; producing higher intra‐query parallelism; and gaining better parallel query performance.

Research limitations/implications

The current status of the proposed XML data partitioning schemes does not take into account XML data updates, e.g. new XML documents and query pattern changes submitted by users on the system.

Practical implications

Note that effectiveness of the XML data partitioning schemes mainly relies on the accuracy of the cost model to estimate query processing costs. The cost model must be adjusted to reflect characteristics of a system platform used in the implementation.

Originality/value

This paper proposes novel schemes of conducting XML data partitioning to achieve both static and dynamic workload balance.

Details

International Journal of Web Information Systems, vol. 5 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 4 April 2008

Dunren Che and Wen‐Chi Hou

Abstract

Purpose

Efficient processing of XML queries is critical for XML data management and related applications. Previously proposed techniques are unsatisfactory. The purpose of this paper is to present Determined – a new prototype system designed for XML query processing and optimization from a system perspective. With Determined, a number of novel techniques for XML query processing are proposed and demonstrated.

Design/methodology/approach

The methodology emphasizes query pattern minimization, logic‐level optimization, and efficient query execution. Accordingly, three lines of investigation have been pursued in the context of Determined: XML tree pattern query (TPQ) minimization; logic‐level XML query optimization utilizing deterministic transformation; and specialized algorithms for fast XML query execution.

Findings

Developed and demonstrated were: a runtime optimal and powerful algorithm for XML TPQ minimization; a unique logic‐level XML query optimization approach that solely pursues deterministic query transformation; and a group of specialized algorithms for XML query evaluation.

Research limitations/implications

The experiments conducted so far are still preliminary. Further in‐depth experiments are therefore needed, ideally carried out in the setting of a real‐world XML DBMS.

Practical implications

The techniques/approaches proposed can be adapted to real‐world XML database systems to enhance the performance of XML query processing.

Originality/value

The reported work integrates various novel techniques for XML query processing/optimization into a single system, and the findings are presented from a system perspective.

Details

International Journal of Web Information Systems, vol. 4 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 19 July 2022

Faraja Ndumbaro

Abstract

Purpose

Users' search logs are implicit feedback on how searchers interact with online information retrieval (IR) systems. The purpose of this paper is to analyze search query reformulation (SQR) patterns of University of Dar es Salaam remote OPAC users.

Design/methodology/approach

Qualitative and quantitative analyses of transaction logs were employed to ascertain the characteristics of search queries and the patterns in which remote OPAC users reformulate their search queries. The study covered a period of six months, from January to June 2019.

Findings

A total of 30,474 search hits were submitted by remote OPAC users during the period under study. Individuals from academic and research institutions, computing consortia, and telecommunication companies are the main users of the system. Most of the searches originated from North America and Europe, with few searches coming from China and India. Besides improving search results, SQRs are linked with the existence of multiple information demands as manifested by the use of heterogeneous headwords within individual search episodes.

Research limitations/implications

The data collected covered only six months. It was also not possible to analyze users' search query formulation within specific contexts such as task-based information searching.

Practical implications

A query recommendation system should be integrated into the OPAC functionalities to improve users' search experiences. Alternatively, there should be a migration to a new system that offers more advanced search features and functionalities.

Originality/value

The study has contributed new insights to SQR studies, particularly on how non-institutionally affiliated users translate their information needs into search queries during information searching processes.

Peer review

The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-09-2020-0389

Details

Online Information Review, vol. 47 no. 1
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 5 September 2008

Seda Ozmutlu and Gencer C. Cosar

Abstract

Purpose

Identification of topic changes within a user search session is a key issue in content analysis of search engine user queries. Recently, various studies have focused on new topic identification/session identification of search engine transaction logs, and several problems regarding the estimation of topic shifts and continuations were observed in these studies. This study aims to analyze the reasons for the problems that were encountered as a result of applying automatic new topic identification.

Design/methodology/approach

Measures, such as cleaning the data of common words and analyzing the errors of automatic new topic identification, are applied to eliminate the problems in estimating topic shifts and continuations.

Findings

The findings show that the resulting errors of automatic new topic identification have a pattern, and further research is required to improve the performance of automatic new topic identification.

Originality/value

Improving the performance of automatic new topic identification would be valuable to search engine designers, so that they can develop new clustering and query recommendation algorithms, as well as custom‐tailored graphical user interfaces for search engine users.

Details

Library Hi Tech, vol. 26 no. 3
Type: Research Article
ISSN: 0737-8831
