Search results

1 – 10 of over 2000
Article
Publication date: 14 April 2014

Chang-Sup Park and Sungchae Lim

Abstract

Purpose

The paper aims to propose an effective method to process keyword-based queries over graph-structured databases which are widely used in various applications such as XML, semantic web, and social network services. To satisfy users' information need, it proposes an extended answer structure for keyword queries, inverted list indexes on keywords and nodes, and query processing algorithms exploiting the inverted lists. The study aims to provide more effective and relevant answers to a given query than the previous approaches in an efficient way.

Design/methodology/approach

A new measure of the relevance of nodes to a given keyword query is defined in the paper and, according to this measure, a new answer tree structure is proposed which places no constraint on the number of keyword nodes chosen for each query keyword. For efficient query processing, an inverted list-style index is suggested which pre-computes connectivity and relevance information on the nodes in the graph. Then, a query processing algorithm based on the pre-constructed inverted lists is designed, which aggregates list entries for each graph node relevant to the given keywords and identifies the top-k root nodes of the answer trees most relevant to the given query. The basic search method is also enhanced by using extended inverted lists, which store additional relevance information about the related entries in the lists in order to estimate the relevance score of a node more closely and to find top-k answers more efficiently.
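The aggregation step described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the list contents, the `INVERTED_LISTS` name, the `top_k_roots` function and the choice to sum per-keyword relevance scores are all assumptions made for illustration, and the abstract does not specify the actual relevance measure or index layout.

```python
import heapq
from collections import defaultdict

# Hypothetical inverted lists: keyword -> [(node_id, relevance), ...].
# In the paper these are pre-computed from the graph; here they are made up.
INVERTED_LISTS = {
    "xml": [("n1", 0.9), ("n2", 0.4), ("n3", 0.2)],
    "graph": [("n2", 0.8), ("n3", 0.5)],
    "index": [("n1", 0.3), ("n3", 0.6)],
}


def top_k_roots(query_keywords, k, inverted_lists=INVERTED_LISTS):
    """Aggregate list entries per node and return the k best-scoring nodes.

    Scores are summed across keywords (an assumed scoring rule), so a node
    that matches only some keywords can still be returned.
    """
    scores = defaultdict(float)
    for keyword in query_keywords:
        for node_id, relevance in inverted_lists.get(keyword, []):
            scores[node_id] += relevance
    # heapq.nlargest returns (node_id, score) pairs in descending score order.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

Calling `top_k_roots(["xml", "graph"], 2)` ranks `n2` ahead of `n1`, because `n2` appears in both lists while `n1` appears in only one.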

Findings

Experiments with real datasets and various test queries were conducted to evaluate the effectiveness and performance of the proposed methods in comparison with one of the previous approaches. The experimental results show that the proposed methods with an extended answer structure produce more effective top-k results than the previous method for most of the queries, especially for those with OR semantics. The extended inverted list and enhanced search algorithm are shown to improve execution performance considerably over the basic search method.

Originality/value

This paper proposes a new extended answer structure and query processing scheme for keyword queries on graph databases which can satisfy the users' information need represented by a keyword set having various semantics.

Details

International Journal of Web Information Systems, vol. 10 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 20 April 2015

Abubakar Roko, Shyamala Doraisamy, Azrul Hazri Jantan and Azreen Azman

Abstract

Purpose

The purpose of this paper is to propose and evaluate XKQSS, a query structuring method that relegates the task of generating structured queries from a user to a search engine while retaining the simple keyword search query interface. A more effective way of searching XML databases is to use structured queries. However, using query languages to express queries proves to be difficult for most users, since this requires learning a query language and knowledge of the underlying data schema. On the other hand, the success of Web search engines has made many users familiar with keyword search and, therefore, they prefer to use a keyword search query interface to search XML data.

Design/methodology/approach

Existing query structuring approaches require users to provide structural hints in their input keyword queries even though their interface is keyword-based. Other problems with existing systems include their inability to take keyword query ambiguities into consideration during query structuring and to select the generated structured query that best represents a given keyword query. To address these problems, this study allows users to submit a schema-independent keyword query, uses named entity recognition (NER) to categorize query keywords in order to resolve query ambiguities, and computes semantic information for a node from its data content. Algorithms are proposed that find user search intentions and convert the intentions into a set of ranked structured queries.

Findings

Experiments with Sigmod and IMDB datasets were conducted to evaluate the effectiveness of the method. The experimental results show that XKQSS is about 20 per cent more effective in terms of return node identification than XReal, a state-of-the-art system for XML retrieval.

Originality/value

Existing systems do not take keyword query ambiguities into account. XKQSS consists of two guidelines based on NER that help to resolve these ambiguities before converting the submitted query. It also includes a ranking function that computes a score for each generated query by using both semantic information and data statistics, as opposed to the approach based on data statistics only that is used by existing systems.

Details

International Journal of Web Information Systems, vol. 11 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 May 1990

James A. Levin and Joseph Golton

Abstract

Presents the Message Assistant system (based on a HyperCard stack), designed by the authors to rank electronic mail messages in a priority order specified by the system user. Messages can be ranked according to the name of sender or even a key word in the message text.
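The kind of ranking described above can be sketched as follows. This is only an illustration in Python, not the HyperCard implementation: the `Message` class, the `rank_messages` function and the point weights (2 for a priority sender, 1 per matching key word) are hypothetical choices, since the abstract does not say how priorities are combined.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    text: str


def rank_messages(messages, priority_senders=(), priority_keywords=()):
    """Return messages ordered from highest to lowest user-defined priority."""

    def score(message):
        # Assumed weighting: a priority sender counts for 2 points,
        # each priority key word found in the text counts for 1 point.
        points = 2 if message.sender in priority_senders else 0
        lowered = message.text.lower()
        points += sum(1 for word in priority_keywords if word.lower() in lowered)
        return points

    # sorted() is stable, so messages with equal scores keep their arrival order.
    return sorted(messages, key=score, reverse=True)
```

With a priority sender of `alice` and a priority key word of `urgent`, a message from `alice` would be listed before an unrelated message containing "urgent", which in turn would precede messages matching neither rule.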

Details

OCLC Micro, vol. 6 no. 5
Type: Research Article
ISSN: 8756-5196

Article
Publication date: 1 March 1992

Simon White

Abstract

The library at Ove Arup rejected a database management system solution to their catalogue requirements in favour of a text retrieval package in the shape of INMAGIC. The reasons for the choice are outlined along with a description of the systems for which INMAGIC is used within the Partnership. In addition to system management procedures, some of the facilities offered by the software and some future developments are described.

Details

VINE, vol. 22 no. 3
Type: Research Article
ISSN: 0305-5728

Article
Publication date: 22 June 2010

Brian Tripney, Christopher Foley, Richard Gourlay and John Wilson

Abstract

Purpose

New directions in the provision of end‐user computing experiences mean that the best way to share data between small mobile computing devices needs to be determined. Partitioning large structures so that they can be shared efficiently provides a basis for data‐intensive applications on such platforms. The partitioned structure can be compressed using dictionary‐based approaches and then queried directly without first decompressing the whole structure.

Design/methodology/approach

The paper describes an architecture for partitioning XML into structural and dictionary elements and the subsequent manipulation of the dictionary elements to make the best use of available space.

Findings

The results indicate that considerable savings are available by removing duplicate dictionaries. The paper also identifies the most effective strategy for defining dictionary scope.

Research limitations/implications

This evaluation is based on a range of benchmark XML structures and the approach to minimising dictionary size shows benefit in the majority of these. Where structures are small and regular, the benefits of efficient dictionary representation are lost. The authors' future research now focuses on heuristics for further partitioning of structural elements.

Practical implications

Mobile applications that need access to large data collections will benefit from the findings of this research. Traditional client/server architectures are not suited to dealing with high volume demands from a multitude of small mobile devices. Peer data sharing provides a more scalable solution and the experiments that the paper describes demonstrate the most effective way of sharing data in this context.

Social implications

Many services are available via smartphone devices but users are wary of exploiting the full potential because of the need to conserve battery power. The approach mitigates this challenge and consequently expands the potential for users to benefit from mobile information systems. This will have impact in areas such as advertising, entertainment and education but will depend on the acceptability of file sharing being extended from the desktop to the mobile environment.

Originality/value

The original work characterises the most effective way of sharing large data sets between small mobile devices. This will save battery power on devices such as smartphones, thus providing benefits to users of such devices.

Details

International Journal of Web Information Systems, vol. 6 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 7 August 2017

Junsheng Zhang, Yunchuan Sun and Changqing Yao

Abstract

Purpose

This paper aims to semantically link scientific research events implied by scientific and technical literature to support information analysis and information service applications. Literature research is an important method of acquiring scientific and technical information, which is important for research, development and innovation in science and technology. It is difficult but urgently required to acquire accurate, timely, concise and comprehensive information from the large-scale and fast-growing literature, especially in the big data era. Existing literature-based information retrieval systems focus on basic data organization, and they are far from meeting the needs of information analytics. It has become urgent to organize and analyze scientific research events related to scientific and technical literature in order to forecast development trends in science and technology.

Design/methodology/approach

Scientific literature such as a paper or a patent is represented as a scientific research event, which contains elements including when, where, who, what, how and why. Metadata of literature is used to formulate scientific research events that are implied in introduction and related work sections of literature. Named entities and research objects such as methods, materials and algorithms can be extracted from texts of literature by using text analysis. The authors semantically link scientific research events, entities and objects, and then, they construct the event space for supporting scientific and technical information analysis.

Findings

This paper represents scientific literature as events, which are coarse-grained units compared with the entities and relations used in current information organization. Events and the semantic relations among them together form a semantic link network, which could support event-centric information browsing, search and recommendation.

Research limitations/implications

The proposed model is theoretical, and its efficiency needs to be verified in further experimental and application research. The evaluation and applications of the semantic link network of scientific research events are further research issues.

Originality/value

This paper regards scientific literature as scientific research events and proposes an approach to semantically link events into a network with multiple types of entities and relations. According to the needs of scientific and technical information analysis, scientific research events are organized into event cubes, which are distributed in a three-dimensional space for easy-to-understand information visualization.

Details

The Electronic Library, vol. 35 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 19 June 2009

Imam Machdi, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to propose Extensible Markup Language (XML) data partitioning schemes that can cope with static and dynamic allocation for parallel holistic twig joins: grid metadata model for XML (GMX) and streams‐based partitioning method for XML (SPX).

Design/methodology/approach

GMX exploits the relationships between XML documents and query patterns to perform workload‐aware partitioning of XML data. Specifically, the paper constructs a two‐dimensional model with a document dimension and a query dimension in which each object in a dimension is composed from XML metadata related to the dimension. GMX provides a set of XML data partitioning methods that include document clustering, query clustering, document‐based refinement, query‐based refinement, and query‐path refinement, thereby enabling XML data partitioning based on the static information of XML metadata. In contrast, SPX explores the structural relationships of query elements and a range‐containment property of XML streams to generate partitions and allocate them to cluster nodes on‐the‐fly.

Findings

GMX provides several salient features: a set of partition granularities that balance workloads of query processing costs among cluster nodes statically; inter‐query parallelism as well as intra‐query parallelism at multiple extents; and better parallel query performance when all estimated queries are executed simultaneously to meet their probability of query occurrences in the system. SPX also offers the following features: minimal computation time to generate partitions; balancing skewed workloads dynamically on the system; producing higher intra‐query parallelism; and gaining better parallel query performance.

Research limitations/implications

The current status of the proposed XML data partitioning schemes does not take into account XML data updates, e.g. new XML documents and query pattern changes submitted by users on the system.

Practical implications

Note that effectiveness of the XML data partitioning schemes mainly relies on the accuracy of the cost model to estimate query processing costs. The cost model must be adjusted to reflect characteristics of a system platform used in the implementation.

Originality/value

This paper proposes novel schemes of conducting XML data partitioning to achieve both static and dynamic workload balance.

Details

International Journal of Web Information Systems, vol. 5 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 August 1997

A. MacFarlane, S.E. Robertson and J.A. McCann

Abstract

The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular we stress the importance of the motivation in using parallel computing for text retrieval. We analyse parallel IR systems using a classification defined by Rasmussen and describe some parallel IR systems. We give a description of the retrieval models used in parallel information processing. We describe areas of research which we believe are needed.

Details

Journal of Documentation, vol. 53 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 20 August 2018

Chang-Sup Park

Abstract

Purpose

This paper aims to propose a new keyword search method on graph data to improve the relevance of search results and reduce duplication of content nodes in the answer trees obtained by previous approaches based on distinct root semantics. The previous approaches are restricted to finding answer trees having different root nodes and thus often generate a result consisting of answer trees with low relevance to the query or with duplicate content nodes. The method allows limited redundancy in the root nodes of top-k answer trees to produce more effective query results.

Design/methodology/approach

A measure of redundancy in a set of answer trees with respect to their root nodes is defined and, according to this measure, a set of answer trees with limited root redundancy is proposed as the result of a keyword query on graph data. For efficient query processing, an index on the useful paths in the graph using inverted lists and a hash map is suggested. Then, based on the path index, a top-k query processing algorithm is presented to find the most relevant and diverse answer trees given a maximum amount of root redundancy allowed for a set of answer trees.

Findings

The results of experiments using real graph datasets show that the proposed approach can produce effective query answers which are more diverse in the content nodes and more relevant to the query than the previous approach based on distinct root semantics.

Originality/value

This paper is the first to take redundancy in the root nodes of answer trees into account in order to improve the relevance of query results and reduce redundancy in their content nodes compared with the previous distinct root semantics. It can satisfy users’ various information needs on large and complex graph data using a keyword-based query.

Details

International Journal of Web Information Systems, vol. 14 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 12 July 2007

A. MacFarlane, J.A. McCann and S.E. Robertson

Abstract

Purpose

An issue that tends to be ignored in information retrieval is the issue of updating inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. This paper aims to study the effect of using parallel methods for the update of inverted files in order to reduce costs, by looking at two types of partitioning for inverted files: document identifier and term identifier.

Design/methodology/approach

Raw update service and update with query service are studied with these partitioning schemes using an incremental update strategy. The paper uses standard measures used in parallel computing such as speedup to examine the computing results and also the costs of reorganising indexes while servicing transactions.
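The two partitioning schemes and the speedup measure mentioned above can be sketched as follows. This is an illustrative sketch rather than the authors' system: the function names, the in-memory `postings` layout (term mapped to a list of integer document identifiers) and the modulo-based assignment of documents and terms to partitions are assumptions. Speedup is computed in the standard way, as sequential time divided by parallel time.

```python
from collections import defaultdict


def partition_by_document(postings, num_partitions):
    """Document identifier partitioning: each partition indexes a subset of
    documents, so one term's postings may be spread over several partitions."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for term, doc_ids in postings.items():
        for doc_id in doc_ids:
            # Assumed assignment rule: document id modulo partition count.
            partitions[doc_id % num_partitions][term].append(doc_id)
    return [dict(partition) for partition in partitions]


def partition_by_term(postings, num_partitions):
    """Term identifier partitioning: each partition holds the complete
    posting lists for a subset of the terms."""
    partitions = [{} for _ in range(num_partitions)]
    for index, term in enumerate(sorted(postings)):
        # Assumed assignment rule: round-robin over the sorted vocabulary.
        partitions[index % num_partitions][term] = list(postings[term])
    return partitions


def speedup(sequential_time, parallel_time):
    """Standard parallel speedup: time on one processor / time on many."""
    return sequential_time / parallel_time
```

Under document identifier partitioning, adding a new document touches a single partition, whereas under term identifier partitioning it may touch every partition that owns one of the document's terms.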

Findings

Empirical results show that for both transaction processing and index reorganisation the document identifier method is superior. However, there is evidence that the term identifier partitioning method could be useful in a concurrent transaction processing context.

Practical implications

There is an increasing need to service updates, which is now becoming a requirement of inverted files (for dynamic collections such as the web), demonstrating that a shift in requirements of inverted file maintenance is needed from the past.

Originality/value

The paper is of value to database administrators who manage large‐scale and dynamic text collections, and who need to use parallel computing to implement their text retrieval services.

Details

Aslib Proceedings, vol. 59 no. 4/5
Type: Research Article
ISSN: 0001-253X
