Search results

1 – 10 of over 29000
Article
Publication date: 18 July 2016

Dong Zhou, Séamus Lawless, Xuan Wu, Wenyu Zhao and Jianxun Liu

Abstract

Purpose

With an increase in the amount of multilingual content on the World Wide Web, users are often striving to access information provided in a language of which they are non-native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems through the means of personalized query expansion.

Design/methodology/approach

The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods.
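The frequency-based profile idea can be shown with a minimal Python sketch: a tf-idf weighted user profile built from reading history, then used for personalized query expansion. The token-list input format, the smoothed idf variant and the helper names `tfidf_profile` and `expand_query` are illustrative assumptions. The paper's exact weighting, its BM25 variant and its latent semantic models are not reproduced here.

```python
import math
from collections import Counter

def tfidf_profile(history, corpus):
    """Weight each term in a user's reading history by tf-idf.

    history: token lists the user has read (hypothetical input format).
    corpus:  token lists used to estimate document frequencies.
    """
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    tf = Counter(term for doc in history for term in doc)
    profile = {}
    for term, freq in tf.items():
        # Smoothed idf: terms absent from the corpus still get a finite weight.
        idf = math.log((n_docs + 1) / (df[term] + 1)) + 1.0
        profile[term] = freq * idf
    return profile

def expand_query(query, profile, k=2):
    """Append the k highest-weighted profile terms not already in the query."""
    candidates = sorted(
        (t for t in profile if t not in query),
        key=lambda t: (-profile[t], t),  # ties broken alphabetically
    )
    return list(query) + candidates[:k]

corpus = [
    ["retrieval", "query", "web"],
    ["query", "expansion", "profile"],
    ["web", "multilingual", "search"],
]
history = [["multilingual", "retrieval", "retrieval"], ["profile", "multilingual"]]
profile = tfidf_profile(history, corpus)
expanded = expand_query(["search"], profile, k=2)
# expanded == ["search", "multilingual", "retrieval"]
```

Here "multilingual" and "retrieval" each occur twice in the history and once in the corpus, so they tie for the highest weight and are both appended.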

Findings

Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained on a cross-lingual level gained better performance than the models trained on a monolingual level.

Originality/value

Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies on a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills this gap with a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology that allows repeatable and controlled experiments to be conducted.

Details

Aslib Journal of Information Management, vol. 68 no. 4
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 8 September 2023

Oussama Ayoub, Christophe Rodrigues and Nicolas Travers

Abstract

Purpose

This paper aims to manage the word gap in information retrieval (IR), especially for long documents belonging to specific domains. With the continuous growth of text data that modern IR systems have to manage, solutions are needed that efficiently find the best set of documents for a given request. The words used to describe a query can differ from those used in related documents. Despite their closeness in meaning, nonoverlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.

Design/methodology/approach

To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. The DL models used are pretrained on massive text data and carry common or domain-specific knowledge, which helps produce a better document representation.
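A rough sketch of masked-language-model document expansion, assuming each token position is masked in turn and the model's top predictions that are not already in the document are appended. The predictor is a hypothetical callable: `toy_predict` merely stands in for a pretrained masked language model. The paper's model choice, masking strategy and filtering are not specified in the abstract and are not reproduced.

```python
from typing import Callable, List

def expand_document(
    tokens: List[str],
    predict: Callable[[List[str], int], List[str]],
    top_k: int = 2,
    mask: str = "[MASK]",
) -> List[str]:
    """Mask each position in turn and append new words the model proposes."""
    seen = set(tokens)
    new_words: List[str] = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        for word in predict(masked, i)[:top_k]:
            if word not in seen:  # keep only words that widen the vocabulary
                seen.add(word)
                new_words.append(word)
    return tokens + new_words

def toy_predict(masked: List[str], position: int) -> List[str]:
    """Hypothetical stand-in for a pretrained masked language model."""
    if "heart" in masked:
        return ["cardiac", "heart"]
    return ["disease"]

expanded_doc = expand_document(["heart", "failure"], toy_predict)
# expanded_doc == ["heart", "failure", "disease", "cardiac"]
```

Swapping `toy_predict` for a real fill-mask model would keep the same loop. Only the source of the predictions changes.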

Findings

The authors evaluate their approach on specific IR domains with long documents to show the genericity of the proposed model, and achieve encouraging results.

Originality/value

In this paper, to the best of the authors’ knowledge, an original unsupervised and modular IR system based on recent DL methods is introduced.

Details

International Journal of Web Information Systems, vol. 19 no. 5/6
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 June 2001

Eero Sormunen, Jaana Kekäläinen, Jussi Koivisto and Kalervo Järvelin

Abstract

The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non‐relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept‐based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions were used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best match retrieval in the search for highly relevant documents. The results revealed that expanded queries with concept‐based structures performed better than unexpanded queries or “natural language” queries. Further, it was shown that highly relevant documents benefit essentially more from the concept‐based QE in ranking than marginally relevant documents.
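The text characteristics reported above (more aspects covered, more unique expressions per aspect) can be illustrated with a toy concept-structured query. Each facet is assumed to be a set of alternative expressions for one aspect of the request, and documents are ranked by the number of aspects covered, then by distinct expressions matched. This only illustrates the facet idea; it is not the probabilistic best-match model used in the experiment, and the facets and documents are invented.

```python
def concept_score(doc_tokens, facets):
    """Return (aspects covered, distinct expressions matched) for a document.

    facets: list of sets; each set holds alternative expressions for one
    aspect of the request (a hypothetical, hand-built structure).
    """
    words = set(doc_tokens)
    covered = 0
    distinct = 0
    for expressions in facets:
        hits = len(words & expressions)
        if hits:
            covered += 1
            distinct += hits
    return covered, distinct

facets = [
    {"pollution", "contamination", "emissions"},
    {"river", "lake", "water"},
    {"fish", "wildlife"},
]
docs = {
    "d1": "river pollution report".split(),
    "d2": "emissions contaminate lake water killing fish and wildlife".split(),
    "d3": "fish recipes".split(),
}
# Rank by aspects covered first, then by distinct expressions.
ranking = sorted(docs, key=lambda d: concept_score(docs[d], facets), reverse=True)
# ranking == ["d2", "d1", "d3"]
```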

Details

Journal of Documentation, vol. 57 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 6 November 2017

Ngurah Agus Sanjaya Er, Mouhamadou Lamine Ba, Talel Abdessalem and Stéphane Bressan

Abstract

Purpose

This paper aims to focus on the design of algorithms and techniques for an effective set expansion. A tool that finds and extracts candidate sets of tuples from the World Wide Web was designed and implemented. For instance, when a given user provides <Indonesia, Jakarta, Indonesian Rupiah>, <China, Beijing, Yuan Renminbi>, <Canada, Ottawa, Canadian Dollar> as seeds, our system returns tuples composed of countries with their corresponding capital cities and currency names, constructed from content extracted from the retrieved Web pages.

Design/methodology/approach

The seeds are used to query a search engine and to retrieve relevant Web pages. The seeds are also used to infer wrappers from the retrieved pages. The wrappers, in turn, are used to extract candidates. The Web pages, wrappers, seeds and candidates, as well as their relationships, are vertices and edges of a heterogeneous graph. Several options for ranking candidates, from PageRank to truth finding algorithms, were evaluated and compared. Remarkably, all vertices are ranked, thus providing an integrated approach to not only answer direct set expansion questions but also find the most relevant pages to expand a given set of seeds.
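Ranking every vertex of such a graph can be sketched with plain PageRank computed by power iteration. The sketch assumes an undirected graph, a damping factor of 0.85 and a fixed number of iterations. The vertex labels are hypothetical, and the paper's own graph construction and edge weighting are not reproduced.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over an undirected graph given as vertex pairs."""
    neighbours = {}
    for u, v in edges:
        # Treat every relationship as a link in both directions.
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    nodes = sorted(neighbours)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleportation mass shared equally by all vertices.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # Each vertex splits its damped score among its neighbours.
            share = damping * rank[node] / len(neighbours[node])
            for nb in neighbours[node]:
                new_rank[nb] += share
        rank = new_rank
    return rank

edges = [
    ("seed:Indonesia", "page:p1"),
    ("seed:China", "page:p1"),
    ("seed:China", "page:p2"),
    ("page:p1", "wrapper:w1"),
    ("page:p2", "wrapper:w2"),
    ("wrapper:w1", "cand:Japan"),
    ("wrapper:w1", "cand:France"),
    ("wrapper:w2", "cand:France"),
]
rank = pagerank(edges)
```

In this toy graph `cand:France` is reached through two wrappers and `cand:Japan` through one, so France ends up with the higher score. Pages, wrappers and seeds receive scores from the same computation.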

Findings

The experimental results show that leveraging the truth finding algorithm can indeed improve the level of confidence in the extracted candidates and the sources.

Originality/value

Current approaches to set expansion mostly support the expansion of sets of atomic data. This idea can be extended to sets of tuples, extracting relation instances from the Web given a handful of tuple seeds. A truth finding algorithm is also incorporated into the approach, and it is shown to improve the confidence level in the ranking of both candidates and sources in tuple set expansion.
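One simple way to sketch truth finding is a mutually reinforcing iteration. A candidate's belief is the sum of the trust of the sources asserting it, and a source's trust is the sum of the beliefs of its candidates, each normalised by its maximum. This hubs-and-authorities style scheme is a stand-in chosen for brevity, not necessarily the truth finding algorithm the authors incorporated. The sources and claims are invented.

```python
def truth_finding(claims, iterations=20):
    """Mutually reinforcing source trust and candidate belief.

    claims: dict mapping each source to the set of candidate facts it asserts.
    Returns (belief, trust), each normalised so the maximum value is 1.0.
    """
    trust = {source: 1.0 for source in claims}
    facts = {fact for asserted in claims.values() for fact in asserted}
    belief = {}
    for _ in range(iterations):
        # A fact is believed in proportion to the trust of its supporters.
        belief = {fact: 0.0 for fact in facts}
        for source, asserted in claims.items():
            for fact in asserted:
                belief[fact] += trust[source]
        top = max(belief.values())
        belief = {fact: b / top for fact, b in belief.items()}
        # A source is trusted in proportion to the belief in what it asserts.
        trust = {s: sum(belief[f] for f in asserted) for s, asserted in claims.items()}
        top = max(trust.values())
        trust = {s: t / top for s, t in trust.items()}
    return belief, trust

claims = {
    "siteA": {("Canada", "Ottawa"), ("China", "Beijing")},
    "siteB": {("Canada", "Ottawa"), ("China", "Beijing")},
    "siteC": {("Canada", "Toronto")},
}
belief, trust = truth_finding(claims)
```

The conflicting claim from `siteC` gains little support, so both its belief and the source's trust shrink across iterations.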

Details

International Journal of Web Information Systems, vol. 13 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 February 1993

Eve Wilson

Abstract

Hypertext is the computer storage of information as fragmented but linked multi‐dimensional documents. Such systems offer many advantages over the printed word, for example for group authorship of documents or to allow more creative access to the data, although there are some drawbacks. The design of chunky and creamy hypertext systems, the way in which information is presented to the end user and the relative merits of the technique are discussed.

Details

VINE, vol. 23 no. 2
Type: Research Article
ISSN: 0305-5728

Article
Publication date: 15 December 2020

Haya Aldaghlas, Felix Kin Peng Hui and Colin Fraser Duffield

Abstract

Purpose

The initiation phase of capital projects is critical, as this is where the most options exist for modifying the project with minimal expenditure. Government and large organisations frequently involved in major capital projects have extensive procedures for this phase, yet organisations with an operational focus (such as major container terminal stevedores) that only occasionally undertake capital projects face a trade-off between project planning and the management of operations. The research reported in this paper investigated the impact of industry operational considerations on the initiation of capital projects.

Design/methodology/approach

In addition to an extensive literature review, a living research investigation was carried out on real projects initiated by a stevedoring company operating in Australia: the primary author of this paper spent six months as a participant/observer and witnessed the initiation of 12 capital projects. The collected data were qualitatively analysed using a four-step coding method.

Findings

The findings confirm that project initiation is a challenge for organisations who only spasmodically undertake capital projects and available project management frameworks do not necessarily consider the impact of such an organisation's culture. Issues identified that may have a negative impact on the initiation phase include lack of workplace trust, high individualism, ineffective interdepartmental communication, lack of resources and engineering and safety complexity.

Originality/value

The study investigated an underexplored industry within the context of project initiation, using Australian stevedoring as a case study. This initial investigation suggests that a tailored project management framework is needed for the initiation phase of projects to reflect the unique nature of the stevedoring industry and, by inference, other industries that have a strong operational focus.

Details

International Journal of Managing Projects in Business, vol. 14 no. 4
Type: Research Article
ISSN: 1753-8378

Article
Publication date: 15 January 2018

Wei Lu, Heng Ding and Jiepu Jiang

Abstract

Purpose

The purpose of this paper is to utilize document expansion techniques for improving image representation and retrieval. This paper proposes a concise framework for tag-based image retrieval (TBIR).

Design/methodology/approach

The proposed approach includes three core components: a strategy of selecting expansion (similar) images from the whole corpus (e.g. cluster-based or nearest neighbor-based); a technique for assessing image similarity, which is adopted for selecting expansion images (text, image, or mixed); and a model for matching the expanded image representation with the search query (merging or separate).
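The nearest neighbor expansion strategy with image-feature similarity can be sketched as follows, assuming each image has a feature vector and a list of tags. The image's own tags get weight 1.0 and each neighbor tag adds that neighbor's cosine similarity, a simplification that loosely resembles the merging variant. The paper's probability model P(q|RD) is not reproduced, and all data are made up.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def expand_tags(target, images, k=2):
    """Expand one image's tags with tags from its k most similar images.

    images maps an image id to (feature_vector, tags). Own tags weigh 1.0;
    each neighbor tag adds that neighbor's cosine similarity.
    """
    features, tags = images[target]
    others = [i for i in images if i != target]
    neighbours = sorted(
        others, key=lambda i: cosine(features, images[i][0]), reverse=True
    )[:k]
    expanded = Counter({t: 1.0 for t in tags})
    for nb in neighbours:
        sim = cosine(features, images[nb][0])
        for t in images[nb][1]:
            expanded[t] += sim
    return expanded

images = {
    "img1": ([1.0, 0.0], ["beach"]),
    "img2": ([0.9, 0.1], ["beach", "sunset"]),
    "img3": ([0.0, 1.0], ["city"]),
}
expanded = expand_tags("img1", images, k=1)
```

Here `img2` is the single nearest neighbor, so `sunset` is added with a weight just under 1.0, while `city` from the dissimilar `img3` is not added at all.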

Findings

The results show that applying the proposed method yields significant improvements in effectiveness: the method performs better at the top of the ranking and substantially improves some topics that scored zero in the baseline. Moreover, the nearest neighbor-based expansion strategy outperforms the cluster-based expansion strategy; using image features to select expansion images is better than using text features in most cases; and the separate method for calculating the augmented probability P(q|RD) is able to eliminate the negative influence of erroneous images in RD.

Research limitations/implications

Although these methods outperform the baseline only at the top of the ranking rather than across the entire ranked list, TBIR on mobile platforms can still benefit from this approach.

Originality/value

Unlike former studies addressing the sparsity, vocabulary mismatch, and tag relatedness in TBIR individually, the approach proposed by this paper addresses all these issues with a single document expansion framework. It is a comprehensive investigation of document expansion techniques in TBIR.

Details

Aslib Journal of Information Management, vol. 70 no. 1
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 11 November 2014

S. Thenmalar and T.V. Geetha

Abstract

Purpose

The purpose of this paper is to improve conceptual-based search by incorporating structural ontological information such as concepts and relations. Generally, semantic-based information retrieval aims to identify relevant information based on the meanings of the query terms or on the context of the terms, and the performance of semantic information retrieval is evaluated with the standard measures of precision and recall. Higher precision means that more of the documents obtained are (meaningfully) relevant, while lower recall means less coverage of the concepts.

Design/methodology/approach

In this paper, the authors enhance the existing ontology-based indexing proposed by Kohler et al. by incorporating sibling information into the index. The index designed by Kohler et al. contains only super- and sub-concepts from the ontology. In addition, our approach focuses on two tasks, query expansion and ranking of the expanded queries, to improve the efficiency of the ontology-based search. Both tasks make use of ontological concepts and the relations existing between those concepts to obtain semantically more relevant search results for a given query.
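The super-, sub- and sibling relations described above can be sketched over a toy single-inheritance taxonomy stored as a child-to-parent mapping, here used to expand a query concept. The taxonomy, the function names and the choice to add all related concepts with equal standing are assumptions. The actual index structure of Kohler et al. and the authors' ranking are not shown.

```python
def related_concepts(concept, parent_of):
    """Return the super-, sub- and sibling concepts of one ontology concept.

    parent_of maps each concept to its direct super-concept (a toy,
    single-inheritance taxonomy is assumed).
    """
    parent = parent_of.get(concept)
    supers = {parent} if parent is not None else set()
    subs = {c for c, p in parent_of.items() if p == concept}
    # Siblings share the same direct super-concept.
    siblings = {
        c for c, p in parent_of.items()
        if parent is not None and p == parent and c != concept
    }
    return {"super": supers, "sub": subs, "sibling": siblings}

def expand_concepts(query_concepts, parent_of):
    """Add every related concept of each query concept to the query."""
    expanded = set(query_concepts)
    for concept in query_concepts:
        for group in related_concepts(concept, parent_of).values():
            expanded |= group
    return expanded

parent_of = {
    "beach": "attraction",
    "museum": "attraction",
    "temple": "attraction",
    "attraction": "tourism",
    "hotel": "accommodation",
    "accommodation": "tourism",
}
beach_query = expand_concepts(["beach"], parent_of)
# beach_query == {"beach", "attraction", "museum", "temple"}
```

Without the sibling step, "museum" and "temple" would be missing from the expanded query, which is the coverage gap the sibling extension targets.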

Findings

The proposed ontology-based indexing technique is investigated by analysing the coverage of the concepts populated in the index. Here, we introduce a new measure, the index enhancement measure, to estimate the coverage of the ontological concepts being indexed. We evaluated the ontology-based search in the tourism domain using tourism documents and a tourism-specific ontology. Search results based on the ontology “with and without query expansion” are compared to estimate the efficiency of the proposed query expansion task. The ranking is compared with the ORank system to evaluate the performance of our ontology-based search. These analyses show that the ontology-based search achieves better recall than the other concept-based search systems. The ontology-based search has a mean average precision of 0.79 and a recall of 0.65, compared with 0.62 and 0.51 for the ORank system and 0.56 and 0.42 for the concept-based search.
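For readers checking the reported figures, here is a short sketch of how precision, recall, average precision and mean average precision are commonly computed for ranked results. These are the standard definitions. The abstract does not state whether recall was measured at a cutoff or over the full result list, and the example rankings are invented.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall of a retrieved list against relevant ids."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision(ranking, relevant):
    """Average of precision@k taken at each rank k holding a relevant document."""
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    # Relevant documents never retrieved contribute zero to the average.
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"}),
    (["d2", "d1"], {"d1"}),
]
p, r = precision_recall(*runs[0])
ap = average_precision(*runs[0])
map_score = mean_average_precision(runs)
```

For the first query, relevant documents appear at ranks 1 and 3, so AP = (1/1 + 2/3) / 3 = 5/9. The unretrieved "d5" lowers AP and recall but not precision.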

Practical implications

When the concept is not present in the domain-specific ontology, the concept cannot be indexed. When the given query term is not available in the ontology then the term-based results are retrieved.

Originality/value

In addition to super- and sub-concepts, we incorporate the concepts at the same level (siblings) into the ontological index. The structural information from the ontology is exploited for query expansion. The ranking of the documents depends on the type of query (single-concept queries, multiple-concept queries and concept-with-relation queries) and the ontological relations that exist in the query and the documents. With this ontological structural information, the search results showed better coverage of concepts with respect to the query.

Details

Aslib Journal of Information Management, vol. 66 no. 6
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 6 June 2016

Wei Lu, Xinghu Yue, Qikai Cheng and Rui Meng

Abstract

Purpose

The purpose of this paper is to explore the use of inverse local context analysis (ILCA) to obtain data from limited accessible data sources.

Design/methodology/approach

The experimental results show that the method the authors proposed can obtain all retrieved documents from the limited accessible data source using the least number of queries.

Findings

The experimental results show that the method we proposed can obtain all retrieved documents from the limited accessible data source using the least number of queries.

Originality/value

To the best of the authors’ knowledge, this paper provides the first attempt to gather all the retrieved documents from a limited accessible data source, and the efficiency and ease of implementation of the proposed solution make it feasible for practical applications. The method the authors proposed can also benefit the construction of web corpora.

Details

The Electronic Library, vol. 34 no. 3
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 August 2005

Bracha Shapira, Meirav Taieb‐Maimon and Yael Nemeth

Abstract

Purpose

Query expansion and query limitation are two known techniques for assisting users to define efficient queries. The purpose of this article is to examine the effectiveness of the two methods.

Design/methodology/approach

The research entailed an objective and subjective evaluation of the effectiveness of automatic and interactive query expansion and of two query limit options. The evaluation included both lab simulations and large‐scale user studies. The objective aspects were evaluated in lab simulations with experts judging user performance. The subjective analysis was carried out by having the participants evaluate the quality of, and express their satisfaction with, the retrieval process and its results, thus employing perceived‐value analysis.

Findings

The main findings reveal a difference between the perceived and real values of these techniques. While users expressed their satisfaction with interactive query expansion and its performance, the real‐value analysis of their performance did not show any significant difference between the retrieval modes.

Originality/value

The article evaluates the objective and subjective effectiveness of automatic and interactive query expansion and two query limit options.

Details

Online Information Review, vol. 29 no. 4
Type: Research Article
ISSN: 1468-4527
