Search results
1 – 10 of over 6000
Douglas Tudhope, Ceri Binding, Dorothee Blocks and Daniel Cunliffe
Abstract
Purpose
The purpose of this paper is to explore query expansion via conceptual distance in thesaurus‐indexed collections.
Design/methodology/approach
An extract of the National Museum of Science and Industry's collections database, indexed with the Getty Art and Architecture Thesaurus (AAT), was the dataset for the research. The system architecture and algorithms for semantic closeness and the matching function are outlined. Standalone and web interfaces are described and formative qualitative user studies are discussed. One user session is discussed in detail, together with a scenario based on a related public inquiry. Findings are set in the context of the literature on thesaurus‐based query expansion. This paper discusses the potential of query expansion techniques using the semantic relationships in a faceted thesaurus.
Findings
Thesaurus‐assisted retrieval systems have potential for multi‐concept descriptors, permitting very precise queries and indexing. However, indexer and searcher may differ in terminology judgments and there may not be any exactly matching results. The integration of semantic closeness in the matching function permits ranked results for multi‐concept queries in thesaurus‐indexed applications. An in‐memory representation of the thesaurus semantic network allows a combination of automatic and interactive control of expansion, as well as control of expansion on individual query terms.
Originality/value
The application of semantic expansion to browsing may be useful in interface options where thesaurus structure is hidden.
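The idea of expanding a query term by semantic closeness can be illustrated with a minimal sketch. It is not the authors' algorithm: the thesaurus fragment, the per-relationship traversal costs, the threshold and the function names `expand` and `score` are all assumptions made for illustration. Each term is expanded along broader (BT), narrower (NT) and related (RT) links until an assumed cost budget is used up, and an indexed item is scored by the best closeness found for each query concept.

```python
import heapq

# Hypothetical thesaurus fragment: term -> [(related term, relationship type)].
# BT = broader term, NT = narrower term, RT = related term.
THESAURUS = {
    "chairs": [("seating furniture", "BT"), ("armchairs", "NT"), ("stools", "RT")],
    "seating furniture": [("chairs", "NT"), ("benches", "NT")],
    "armchairs": [("chairs", "BT")],
    "stools": [("chairs", "RT")],
    "benches": [("seating furniture", "BT")],
}

# Assumed traversal costs in tenths (not taken from the paper).
COST = {"NT": 2, "BT": 3, "RT": 4}
MAX_COST = 10


def expand(term, threshold=5):
    """Return {term: closeness} for every term reachable within `threshold` cost."""
    best = {term: 0}
    heap = [(0, term)]
    while heap:
        cost, current = heapq.heappop(heap)
        if cost > best[current]:
            continue  # stale heap entry; a cheaper path was already found
        for neighbour, relation in THESAURUS.get(current, []):
            new_cost = cost + COST[relation]
            if new_cost <= threshold and new_cost < best.get(neighbour, MAX_COST + 1):
                best[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    # Closeness falls from 1.0 (the term itself) as the path cost grows.
    return {t: 1 - c / MAX_COST for t, c in best.items()}


def score(query_terms, index_terms, threshold=5):
    """Average, over query terms, of the best closeness to any index term."""
    total = 0.0
    for q in query_terms:
        closeness = expand(q, threshold)
        total += max((closeness.get(t, 0.0) for t in index_terms), default=0.0)
    return total / len(query_terms)
```

Under these assumed costs, an item indexed only with "benches" still matches a query for "chairs", but with a lower score than an item indexed with "armchairs", so partial matches can be ranked rather than dropped.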
Abstract
Purpose
The semantic and structural heterogeneity of large Extensible Markup Language (XML) digital libraries emphasizes the need to support approximate queries, i.e. queries where the matching conditions are relaxed so as to retrieve results that may only partially satisfy the user's requests. The paper aims to propose a flexible query answering framework which efficiently supports complex approximate queries on XML data.
Design/methodology/approach
To reduce the number of relaxations applicable to a query, the paper relies on the specification of user preferences about the types of approximations allowed. A specifically devised index structure which efficiently supports both semantic and structural approximations, according to the specified user preferences, is proposed. Also, a ranking model to quantify approximations in the results is presented.
Findings
On the one hand, personalized queries effectively narrow the space of query reformulations; on the other hand, they give users a great deal of flexibility and control over requests. As to the quality of results, the retrieval process benefits considerably from the presence of user preferences in the queries. Experiments demonstrate the effectiveness and the efficiency of the proposal, as well as its scalability.
Research limitations/implications
Future developments concern further evaluation of the effectiveness of query personalization, by examining how varying the parameters that express user preferences affects the results.
Originality/value
The paper is intended for the research community and proposes a novel query model which incorporates user preferences about query relaxations on large heterogeneous XML data collections.
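How user preferences can limit the relaxations applied to an XML query, and how the remaining approximations can be ranked, may be sketched as follows. This is an illustrative toy, not the paper's index structure or ranking model: the sample document, the synonym table, the preference keys (`synonyms`, `skip_levels`) and the integer penalty values are all assumptions. A path query is matched against the tree; a semantic relaxation accepts a synonym tag, a structural relaxation skips an intermediate element, and each relaxation adds a penalty used for ranking.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data and synonym table.
DOC = ET.fromstring(
    "<library>"
    "<book><author>Ann</author></book>"
    "<book><info><writer>Bob</writer></info></book>"
    "<book><creator>Cy</creator></book>"
    "</library>"
)
SYNONYMS = {"author": {"writer", "creator"}}


def tag_penalty(wanted, actual, prefs):
    """Return the penalty for accepting `actual` in place of `wanted`, or None."""
    if wanted == actual:
        return 0
    if prefs.get("synonyms") and actual in SYNONYMS.get(wanted, ()):
        return prefs.get("synonym_cost", 1)  # semantic relaxation
    return None


def matches(node, steps, prefs):
    """Yield (element, penalty) for elements reached from `node` via `steps`."""
    if not steps:
        yield node, 0
        return
    for child in node:
        penalty = tag_penalty(steps[0], child.tag, prefs)
        if penalty is not None:
            for element, rest in matches(child, steps[1:], prefs):
                yield element, penalty + rest
        if prefs.get("skip_levels"):
            # Structural relaxation: step over this child without consuming a query step.
            for element, rest in matches(child, steps, prefs):
                yield element, prefs.get("skip_cost", 2) + rest


def rank(root, steps, prefs):
    """Return [(text, penalty)] sorted by penalty, keeping the cheapest match per element."""
    best = {}
    for element, penalty in matches(root, steps, prefs):
        if id(element) not in best or penalty < best[id(element)][0]:
            best[id(element)] = (penalty, element)
    ordered = sorted(best.values(), key=lambda pair: pair[0])
    return [(element.text, penalty) for penalty, element in ordered]
```

Calling `rank(DOC, ["book", "author"], {})` allows no relaxation, so only exact matches are returned; enabling `synonyms` or `skip_levels` in the preferences widens the answer set, and results that needed more relaxation sort later.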
Xiaoming Zhang, Mingming Meng, Xiaoling Sun and Yu Bai
Abstract
Purpose
With the advent of the era of Big Data, knowledge graphs (KGs) in various domains are growing rapidly in scale and hold a huge amount of knowledge that can benefit question answering (QA) research. However, a KG, which consists of entities and relations, is structurally inconsistent with natural language queries. Thus, KG-based QA systems still face difficulties. The purpose of this paper is to propose a method for answering domain-specific questions based on a KG, making information queries over domain KGs more convenient.
Design/methodology/approach
The authors propose a method, FactQA, to answer factual questions about a specific domain. A series of logical rules is designed to transform factual questions into triples, in order to resolve the structural inconsistency between the user’s question and the domain knowledge. Query expansion strategies and filtering strategies are then proposed at two levels (i.e. words and triples in the question). When matching the question with domain knowledge, the method considers not only the similarity values between the words in the question and the resources in the domain knowledge but also the tag information of these words. The tag information is obtained by parsing the question using Stanford CoreNLP. In this paper, a KG in the metallic materials domain is used to illustrate the FactQA method.
Findings
The designed logical rules exhibit time stability when transforming factual questions into triples. Additionally, filtering the synonym expansion results of the words in the question improves the expansion quality of the triple representation of the question. Considering the tag information of the words in the question during data matching helps to filter out wrong matches.
Originality/value
Although FactQA is proposed for domain-specific QA, it can also be applied to domains other than metallic materials. For a question that cannot be answered, FactQA generates a new related question to answer, providing the user with as much of the information they probably need as possible. FactQA could facilitate the user’s information queries based on emerging KGs.
Abstract
Search trees are a set of paths with branches or choices that enable a system to carry out the most sensible search approach at each stage of a search. A new design for subject access to online catalogs enlists search trees to identify the characteristics of end‐user queries for subjects, control system responses, and determine appropriate subject‐searching approaches in response to the subject queries users enter. The purpose of this article is to identify characteristics of the most difficult user queries and recommend enhancements to the new subject‐searching design to enable it to produce useful retrievals in response to the wide variety of queries users pose to online catalogs. Online catalogs governed by search trees are more effective than the users themselves in selecting subject‐searching approaches that would produce useful information for the subjects users seek. The enhanced search trees presented and tested in this article enlist subject‐searching approaches that are not typical of the functionality of operational online catalogs. Design and development are required to upgrade existing online catalogs with search trees and new subject‐searching functionality so that they can respond with useful retrievals to the most difficult user queries.
Abstract
Purpose
eXtensible Markup Language (XML) data are data which are not necessarily constrained by a schema. As XML is fast emerging as a standard for data representation and exchange on the world wide web, the ability to intelligently query XML data becomes increasingly important. Some graphical query languages for XML data have been proposed, but they are either too complex or too limited in expressive power and in their use. The purpose of this paper is to propose a recursive graphical query language for querying and restructuring XML data (RGQLX). The expressive power of RGQLX is comparable to Fixpoint. RGQLX is a multi‐sorted graphical language integrating grouping, aggregate functions, nested queries and recursion.
Design/methodology/approach
The methodology emphasizes the development of RGQLX, which is based on the G‐XML data model syntax, to express a wide variety of XML queries, ranging from simple selection to expressive data transformations involving grouping, aggregation and sorting. RGQLX allows users to express recursive visual queries in an elegant manner. RGQLX has an operational semantics based on annotated XML, which serves to express queries and data trees in the form of XML. The paper presents an algorithm to achieve the matching between data and query trees after translating a query tree into annotated XML.
Findings
Developed and demonstrated were: a G‐XML model; recursive queries; annotated XML for the semantic operations and a matching algorithm.
Research limitations/implications
Future research on the RGQLX language will extend it to include recursive aggregations.
Practical implications
The algorithms/approaches proposed can be easily integrated in any commercial product to enhance the performance of XML query languages.
Originality/value
The proposed work integrates various novel techniques for XML query syntax and semantics into a single language with a suitable matching algorithm. The power of this proposal is in the class of Fixpoint queries.
Abstract
Term position information, as provided in some Boolean systems in the form of field restriction and term proximity, is reviewed and its value assessed. Non‐Boolean retrieval in the form of the ranked output experiment has not so far used term position information but has concentrated on schemes of term weighting. The use of term proximity devices is proposed here by analogy with Boolean techniques and seven algorithms are devised to incorporate the ideas of sentence matching, proximate terms, term order specification and term distance computations. It is hypothesised that term position will act as a precision device. A new search experiment is then described in which a test collection is processed into sentences and then output ranking using term position is obtained. Results are given for five algorithms compared against quorum searching as the benchmark. The best result increased the precision ratio by 18% and used proximate matching term pairs in sentences plus a distance component.
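The contrast between quorum searching and a ranking that uses term position can be shown with a small sketch. It does not reproduce any of the seven algorithms in the paper: the sentence splitter, the function names and the scoring formula (one point per pair of query terms matched in the same sentence, plus a distance component of 1/gap) are assumptions chosen only to illustrate proximate term pairs within sentences.

```python
import re
from itertools import combinations


def sentences(text):
    """Split text into sentences, each a list of lowercase word tokens."""
    parts = re.split(r"[.!?]+", text)
    return [re.findall(r"[a-z0-9]+", p.lower()) for p in parts if p.strip()]


def quorum_score(text, query):
    """Quorum-style benchmark: count distinct query terms present anywhere."""
    words = {w for sentence in sentences(text) for w in sentence}
    return len(set(query) & words)


def proximity_score(text, query):
    """Reward pairs of query terms in the same sentence, more when they are close."""
    score = 0.0
    for words in sentences(text):
        positions = {}
        for index, word in enumerate(words):
            if word in query:
                positions.setdefault(word, []).append(index)
        for a, b in combinations(sorted(positions), 2):
            # Smallest gap between any occurrence of the two terms (always >= 1).
            gap = min(abs(i - j) for i in positions[a] for j in positions[b])
            score += 1.0 + 1.0 / gap
    return score


# Hypothetical documents: both contain both query terms.
DOC_A = "Term proximity improves ranking. Other topics follow."
DOC_B = "Term weighting is common. Ranking uses proximity elsewhere."
QUERY = {"term", "proximity"}
```

Both hypothetical documents receive the same quorum score because each contains both query terms, but only `DOC_A` has them in the same sentence, so the proximity score separates the two. This is the sense in which term position can act as a precision device.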