Search results
1 – 10 of over 2,000 results
Chang-Sup Park and Sungchae Lim
Abstract
Purpose
The paper aims to propose an effective method for processing keyword-based queries over graph-structured databases, which are widely used in applications such as XML, the semantic web, and social network services. To satisfy users' information needs, it proposes an extended answer structure for keyword queries, inverted-list indexes on keywords and nodes, and query processing algorithms that exploit the inverted lists. The study aims to provide more effective and relevant answers to a given query than previous approaches, in an efficient way.
Design/methodology/approach
A new measure of a node's relevance to a given keyword query is defined, and according to this metric, a new answer tree structure is proposed that places no constraint on the number of keyword nodes chosen for each query keyword. For efficient query processing, an inverted-list-style index is suggested that pre-computes connectivity and relevance information for the nodes in the graph. A query processing algorithm based on the pre-constructed inverted lists is then designed, which aggregates list entries for each graph node relevant to the given keywords and identifies the top-k root nodes of the answer trees most relevant to the query. The basic search method is further enhanced by extended inverted lists, which store additional relevance information about related entries so that a node's relevance score can be estimated more accurately and top-k answers can be found more efficiently.
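The inverted-list aggregation described above can be sketched minimally as follows. The node names, relevance values, and sum-based scoring are illustrative assumptions for exposition, not the paper's actual relevance measure or list format:

```python
import heapq
from collections import defaultdict

def build_inverted_lists(node_keywords):
    """node_keywords: {node: {keyword: relevance}} -> {keyword: [(node, rel)]}."""
    lists = defaultdict(list)
    for node, kws in node_keywords.items():
        for kw, rel in kws.items():
            lists[kw].append((node, rel))
    return lists

def top_k_roots(lists, query_keywords, k):
    """Aggregate list entries per node and return the k highest-scoring nodes."""
    scores = defaultdict(float)
    for kw in query_keywords:
        for node, rel in lists.get(kw, []):
            scores[node] += rel  # OR semantics: any matching keyword contributes
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Toy graph: each node carries pre-computed keyword relevance (assumed values).
graph = {
    "n1": {"xml": 0.9, "query": 0.4},
    "n2": {"xml": 0.2},
    "n3": {"query": 0.8, "graph": 0.5},
}
lists = build_inverted_lists(graph)
result = top_k_roots(lists, ["xml", "query"], 2)
```

A real implementation would also carry the pre-computed connectivity information needed to expand each root into an answer tree; the sketch only covers the score aggregation step.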
Findings
Experiments with real datasets and various test queries were conducted to evaluate the effectiveness and performance of the proposed methods in comparison with one of the previous approaches. The experimental results show that the proposed methods with the extended answer structure produce more effective top-k results than the compared method for most queries, especially those with OR semantics. The extended inverted list and enhanced search algorithm are shown to improve execution performance substantially over the basic search method.
Originality/value
This paper proposes a new extended answer structure and query processing scheme for keyword queries on graph databases, which can satisfy users' information needs represented by a keyword set with various semantics.
Abubakar Roko, Shyamala Doraisamy, Azrul Hazri Jantan and Azreen Azman
Abstract
Purpose
The purpose of this paper is to propose and evaluate XKQSS, a query structuring method that relegates the task of generating structured queries from the user to the search engine while retaining a simple keyword search interface. A more effective way to search an XML database is to use structured queries. However, expressing queries in a query language proves difficult for most users, since it requires learning the language and knowing the underlying data schema. On the other hand, the success of Web search engines has made many users familiar with keyword search, and they therefore prefer a keyword search interface for querying XML data.
Design/methodology/approach
Existing query structuring approaches require users to provide structural hints in their input keyword queries even though their interface is keyword-based. Other problems with existing systems include their failure to take keyword query ambiguities into consideration during query structuring and the difficulty of selecting the generated structured query that best represents a given keyword query. To address these problems, this study allows users to submit a schema-independent keyword query, uses named entity recognition (NER) to categorize query keywords and resolve query ambiguities, and computes semantic information for a node from its data content. Algorithms are proposed that find user search intentions and convert them into a set of ranked structured queries.
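The pipeline above (categorize keywords, then emit candidate structured queries) can be sketched as follows. The toy lexicon stands in for a trained NER model, and the categories and XPath-like output are illustrative assumptions, not XKQSS's actual guidelines or ranking function:

```python
# Toy stand-in for an NER model: maps known keywords to a category (assumed).
ENTITY_LEXICON = {
    "codd": "author",
    "relational": "title_term",
    "1970": "year",
}

def categorize(keywords):
    """Tag each keyword with a category, or 'unknown' if unrecognized."""
    return [(kw, ENTITY_LEXICON.get(kw.lower(), "unknown")) for kw in keywords]

def to_structured_queries(keywords):
    """Emit candidate structured queries, ranking resolved keywords first."""
    tagged = categorize(keywords)
    known = [f"//article[{cat}='{kw}']" for kw, cat in tagged if cat != "unknown"]
    fallback = [f"//article[contains(., '{kw}')]" for kw, cat in tagged if cat == "unknown"]
    return known + fallback

queries = to_structured_queries(["Codd", "relational"])
```

The real system scores each candidate with semantic information and data statistics rather than the fixed known-first ordering used here.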
Findings
Experiments with the Sigmod and IMDB datasets were conducted to evaluate the effectiveness of the method. The experimental results show that XKQSS is about 20 per cent more effective than XReal, a state-of-the-art system for XML retrieval, in terms of return node identification.
Originality/value
Existing systems do not take keyword query ambiguities into account. XKQSS consists of two guidelines based on NER that help to resolve these ambiguities before converting the submitted query. It also includes a ranking function that computes a score for each generated query using both semantic information and data statistics, as opposed to the statistics-only approach used by existing systems.
James A. Levin and Joseph Golton
Abstract
Presents the Message Assistant system (based on a HyperCard stack), designed by the authors to rank electronic mail messages in a priority order specified by the system user. Messages can be ranked according to the name of the sender or even a keyword in the message text.
Abstract
The library at Ove Arup rejected a database management system solution to their catalogue requirements in favour of a text retrieval package in the shape of INMAGIC. The reasons for the choice are outlined along with a description of the systems for which INMAGIC is used within the Partnership. In addition to system management procedures, some of the facilities offered by the software and some future developments are described.
Brian Tripney, Christopher Foley, Richard Gourlay and John Wilson
Abstract
Purpose
New directions in the provision of end-user computing experiences mean that the best way to share data between small mobile computing devices needs to be determined. Partitioning large structures so that they can be shared efficiently provides a basis for data-intensive applications on such platforms. The partitioned structure can be compressed using dictionary-based approaches and then queried directly, without first decompressing the whole structure.
Design/methodology/approach
The paper describes an architecture for partitioning XML into structural and dictionary elements and the subsequent manipulation of the dictionary elements to make the best use of available space.
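The structure/dictionary split described above can be sketched minimally as follows, with XML elements flattened to field/value records for brevity. The encoding, the per-field dictionaries, and the duplicate-merging rule are illustrative assumptions, not the paper's actual architecture:

```python
def partition(records):
    """Split records into a structural skeleton of integer codes plus
    per-field value dictionaries. records: list of {field: value}."""
    dicts = {}       # field -> {value: code}
    skeleton = []
    for rec in records:
        row = {}
        for field, value in rec.items():
            d = dicts.setdefault(field, {})
            row[field] = d.setdefault(value, len(d))
        skeleton.append(row)
    return skeleton, dicts

def deduplicate(dicts):
    """Merge fields whose dictionaries are identical, so each unique
    dictionary is stored only once (the space saving the paper measures)."""
    unique, mapping = {}, {}
    for field, d in dicts.items():
        key = tuple(sorted(d.items()))
        mapping[field] = unique.setdefault(key, field)
    return unique, mapping

records = [{"city": "Glasgow", "office": "Glasgow"},
           {"city": "London", "office": "London"}]
skeleton, dicts = partition(records)
unique, mapping = deduplicate(dicts)
```

Queries can then be answered against the compact skeleton by translating query constants through the dictionaries, without decompressing the whole structure.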
Findings
The results indicate that considerable savings are available by removing duplicate dictionaries. The paper also identifies the most effective strategy for defining dictionary scope.
Research limitations/implications
This evaluation is based on a range of benchmark XML structures and the approach to minimising dictionary size shows benefit in the majority of these. Where structures are small and regular, the benefits of efficient dictionary representation are lost. The authors' future research now focuses on heuristics for further partitioning of structural elements.
Practical implications
Mobile applications that need access to large data collections will benefit from the findings of this research. Traditional client/server architectures are not suited to dealing with high volume demands from a multitude of small mobile devices. Peer data sharing provides a more scalable solution and the experiments that the paper describes demonstrate the most effective way of sharing data in this context.
Social implications
Many services are available via smartphone devices but users are wary of exploiting the full potential because of the need to conserve battery power. The approach mitigates this challenge and consequently expands the potential for users to benefit from mobile information systems. This will have impact in areas such as advertising, entertainment and education but will depend on the acceptability of file sharing being extended from the desktop to the mobile environment.
Originality/value
The original work characterises the most effective way of sharing large data sets between small mobile devices. This will save battery power on devices such as smartphones, thus providing benefits to users of such devices.
Junsheng Zhang, Yunchuan Sun and Changqing Yao
Abstract
Purpose
This paper aims to semantically link scientific research events implied by scientific and technical literature to support information analysis and information service applications. Literature research is an important method for acquiring scientific and technical information, which is essential for the research, development, and innovation of science and technology. It is difficult but urgently necessary to acquire accurate, timely, concise, and comprehensive information from the large-scale and fast-growing literature, especially in the big data era. Existing literature-based information retrieval systems focus on basic data organization and fall far short of the needs of information analytics. It has become urgent to organize and analyze scientific research events related to scientific and technical literature in order to forecast development trends in science and technology.
Design/methodology/approach
Scientific literature such as a paper or a patent is represented as a scientific research event, which contains elements including when, where, who, what, how, and why. Metadata of the literature is used to formulate the scientific research events implied in the introduction and related work sections. Named entities and research objects such as methods, materials, and algorithms can be extracted from the texts by text analysis. The authors semantically link scientific research events, entities, and objects, and then construct the event space to support scientific and technical information analysis.
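The event representation above can be sketched as a small data structure. The field names, link type, and example values are illustrative assumptions, not the paper's actual model:

```python
from dataclasses import dataclass, field

@dataclass
class ResearchEvent:
    """One piece of literature as an event with the six elements above."""
    when: str
    where: str
    who: list
    what: str
    how: str
    why: str
    links: list = field(default_factory=list)  # (relation, other event)

    def link(self, relation, other):
        """Add a typed semantic link from this event to another."""
        self.links.append((relation, other))

# Two well-known papers as toy events; link types are assumptions.
e1 = ResearchEvent("1998", "WWW", ["Brin", "Page"],
                   "web search engine", "PageRank", "rank web pages")
e2 = ResearchEvent("1999", "Stanford TR", ["Page", "Brin"],
                   "link analysis", "power iteration", "refine ranking")
e2.link("extends", e1)
```

Events linked this way form the semantic link network that the paper proposes for event-centric browsing, search, and recommendation.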
Findings
This paper represents scientific literature as events, which are coarse-grained units compared with the entities and relations in current information organizations. Events and the semantic relations among them together form a semantic link network, which can support event-centric information browsing, search, and recommendation.
Research limitations/implications
The proposed model is theoretical, and its efficiency needs to be verified in further experimental research. The evaluation and applications of the semantic link network of scientific research events are further research issues.
Originality/value
This paper regards scientific literature as scientific research events and proposes an approach to semantically link events into a network with multiple types of entities and relations. According to the needs of scientific and technical information analysis, scientific research events are organized into event cubes distributed in a three-dimensional space for intuitive understanding and information visualization.
Imam Machdi, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
The purpose of this paper is to propose Extensible Markup Language (XML) data partitioning schemes that can cope with static and dynamic allocation for parallel holistic twig joins: grid metadata model for XML (GMX) and streams‐based partitioning method for XML (SPX).
Design/methodology/approach
GMX exploits the relationships between XML documents and query patterns to perform workload‐aware partitioning of XML data. Specifically, the paper constructs a two‐dimensional model with a document dimension and a query dimension in which each object in a dimension is composed from XML metadata related to the dimension. GMX provides a set of XML data partitioning methods that include document clustering, query clustering, document‐based refinement, query‐based refinement, and query‐path refinement, thereby enabling XML data partitioning based on the static information of XML metadata. In contrast, SPX explores the structural relationships of query elements and a range‐containment property of XML streams to generate partitions and allocate them to cluster nodes on‐the‐fly.
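The workload-aware placement that GMX's static partitioning aims at can be illustrated with a much simpler greedy sketch: given estimated query-processing costs per partition, assign each partition to the currently least-loaded cluster node. The cost values and the greedy rule are assumptions for illustration; the paper's clustering and refinement methods are considerably richer:

```python
import heapq

def assign(partition_costs, n_nodes):
    """Greedily place partitions on the least-loaded node (largest first)."""
    nodes = [(0.0, i, []) for i in range(n_nodes)]  # (load, node id, partitions)
    heapq.heapify(nodes)
    for part, cost in sorted(partition_costs.items(), key=lambda x: -x[1]):
        load, i, parts = heapq.heappop(nodes)       # least-loaded node
        parts.append(part)
        heapq.heappush(nodes, (load + cost, i, parts))
    return {i: (load, parts) for load, i, parts in nodes}

# Assumed per-partition cost estimates from a cost model.
plan = assign({"p1": 5.0, "p2": 4.0, "p3": 3.0, "p4": 2.0}, 2)
```

SPX, by contrast, would make this kind of decision on-the-fly per stream partition rather than statically from metadata.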
Findings
GMX provides several salient features: a set of partition granularities that statically balance query-processing workloads among cluster nodes; inter-query as well as intra-query parallelism at multiple extents; and better parallel query performance when all estimated queries are executed simultaneously according to their probabilities of occurrence in the system. SPX also offers the following features: minimal computation time to generate partitions; dynamic balancing of skewed workloads on the system; higher intra-query parallelism; and better parallel query performance.
Research limitations/implications
The current status of the proposed XML data partitioning schemes does not take into account XML data updates, e.g. new XML documents and query pattern changes submitted by users on the system.
Practical implications
The effectiveness of the XML data partitioning schemes relies mainly on the accuracy of the cost model used to estimate query processing costs. The cost model must be adjusted to reflect the characteristics of the system platform used in the implementation.
Originality/value
This paper proposes novel schemes of conducting XML data partitioning to achieve both static and dynamic workload balance.
A. Macfarlane, S.E. Robertson and J.A. Mccann
Abstract
The progress of parallel computing in information retrieval (IR) is reviewed. In particular, we stress the importance of the motivation for using parallel computing in text retrieval. We analyse parallel IR systems using a classification defined by Rasmussen and describe some parallel IR systems. We give a description of the retrieval models used in parallel information processing and describe areas of research that we believe are needed.
Abstract
Purpose
This paper aims to propose a new keyword search method on graph data that improves the relevance of search results and reduces the duplication of content nodes in the answer trees obtained by previous approaches based on distinct-root semantics. The previous approaches are restricted to finding answer trees with different root nodes and thus often generate results consisting of answer trees with low relevance to the query or duplicate content nodes. The proposed method allows limited redundancy in the root nodes of the top-k answer trees to produce more effective query results.
Design/methodology/approach
A measure of the redundancy in a set of answer trees with respect to their root nodes is defined, and according to this metric, a set of answer trees with limited root redundancy is proposed as the result of a keyword query on graph data. For efficient query processing, an index on the useful paths in the graph, using inverted lists and a hash map, is suggested. Based on this path index, a top-k query processing algorithm is then presented that finds the most relevant and diverse answer trees given a maximum amount of root redundancy allowed for a set of answer trees.
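The limited-root-redundancy idea can be sketched as follows: candidate answer trees are scanned in descending relevance and kept as long as no root appears more than a fixed number of times, relaxing distinct-root semantics without letting one root dominate. The scores, the simple per-root cap, and the candidate list are illustrative assumptions, not the paper's redundancy measure or path-index algorithm:

```python
from collections import Counter

def top_k_limited(candidates, k, max_per_root):
    """candidates: list of (root, score) answer trees.
    Keep the k best trees subject to at most max_per_root trees per root."""
    seen = Counter()
    result = []
    for root, score in sorted(candidates, key=lambda c: -c[1]):
        if seen[root] < max_per_root:
            seen[root] += 1
            result.append((root, score))
        if len(result) == k:
            break
    return result

cands = [("r1", 0.9), ("r1", 0.8), ("r1", 0.7), ("r2", 0.6), ("r3", 0.5)]
relaxed = top_k_limited(cands, 3, 2)   # limited redundancy
distinct = top_k_limited(cands, 3, 1)  # distinct-root semantics
```

With the cap at 1 the sketch reduces to distinct-root semantics; raising it trades root diversity for higher-relevance trees, which is the balance the paper's metric controls.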
Findings
The results of experiments on real graph datasets show that the proposed approach produces effective query answers that are more diverse in their content nodes and more relevant to the query than the previous approach based on distinct-root semantics.
Originality/value
This paper is the first to take redundancy in the root nodes of answer trees into account, improving the relevance of query results and reducing content-node duplication compared with the previous distinct-root semantics. It can satisfy users' various information needs on large and complex graph data using keyword-based queries.
A. MacFarlane, J.A. McCann and S.E. Robertson
Abstract
Purpose
An issue that tends to be ignored in information retrieval is the issue of updating inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. This paper aims to study the effect of using parallel methods for the update of inverted files in order to reduce costs, by looking at two types of partitioning for inverted files: document identifier and term identifier.
Design/methodology/approach
Raw update service and update with query service are studied under these partitioning schemes using an incremental update strategy. The paper uses standard parallel computing measures, such as speedup, to examine the results, together with the costs of reorganising indexes while servicing transactions.
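The two partitioning schemes being compared can be sketched as follows: document-identifier partitioning gives every node a slice of every term's postings for its own documents (so a document update touches one node), while term-identifier partitioning places each term's full postings list on a single node (so a document update fans out to every node owning one of its terms). The modulo and hash placement rules are assumptions for illustration:

```python
def doc_partition(postings, n_nodes):
    """postings: {term: [doc_ids]} -> one postings map per node,
    where each node holds only its own documents' entries."""
    nodes = [dict() for _ in range(n_nodes)]
    for term, docs in postings.items():
        for doc in docs:
            nodes[doc % n_nodes].setdefault(term, []).append(doc)
    return nodes

def term_partition(postings, n_nodes):
    """Each term's whole postings list lives on exactly one node."""
    nodes = [dict() for _ in range(n_nodes)]
    for term, docs in postings.items():
        nodes[hash(term) % n_nodes][term] = list(docs)
    return nodes

postings = {"xml": [1, 2, 4], "query": [2, 3]}
by_doc = doc_partition(postings, 2)
by_term = term_partition(postings, 2)
```

The sketch makes the paper's finding plausible: under document-identifier partitioning an incremental update for a new document is local to one node, which is why that scheme wins for transaction processing.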
Findings
Empirical results show that for both transaction processing and index reorganisation the document identifier method is superior. However, there is evidence that the term identifier partitioning method could be useful in a concurrent transaction processing context.
Practical implications
There is an increasing need to service updates, which is now becoming a requirement of inverted files (for dynamic collections such as the web), demonstrating that the requirements of inverted file maintenance have shifted from those of the past.
Originality/value
The paper is of value to database administrators who manage large‐scale and dynamic text collections, and who need to use parallel computing to implement their text retrieval services.