Search results
1 – 10 of over 29000
Dong Zhou, Séamus Lawless, Xuan Wu, Wenyu Zhao and Jianxun Liu
Abstract
Purpose
With an increase in the amount of multilingual content on the World Wide Web, users are often striving to access information provided in a language of which they are non-native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems through the means of personalized query expansion.
Design/methodology/approach
The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods.
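As a rough illustration of a frequency-based profile, the sketch below weights the terms in a user's search history with a smoothed tf-idf and appends the top-weighted terms to a query. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`tfidf_profile`, `expand_query`), the smoothing, the toy data and the cut-off `k` are all hypothetical, and the cross-language step is omitted.

```python
import math
from collections import Counter

def tfidf_profile(user_docs, background_docs):
    """Weight each term in the user's history by tf * smoothed idf."""
    n_docs = len(background_docs)
    # Document frequency of each term over the background collection.
    df = Counter(term for doc in background_docs for term in set(doc))
    # Term frequency over all documents in the user's history.
    tf = Counter(term for doc in user_docs for term in doc)
    # Add-one smoothing keeps the weight finite for terms unseen in the background.
    return {t: f * math.log((n_docs + 1) / (df[t] + 1)) for t, f in tf.items()}

def expand_query(query_terms, profile, k=2):
    """Append the k highest-weighted profile terms not already in the query."""
    ranked = sorted(profile, key=lambda t: (-profile[t], t))
    extra = [t for t in ranked if t not in query_terms][:k]
    return list(query_terms) + extra

# Toy data (assumed for illustration only).
background = [
    ["jaguar", "car", "engine"],
    ["jaguar", "cat", "jungle"],
    ["car", "price", "dealer"],
    ["news", "weather", "today"],
]
history = [["jaguar", "cat", "jungle"], ["cat", "habitat", "jungle"]]

profile = tfidf_profile(history, background)
expanded = expand_query(["jaguar"], profile, k=2)
print(expanded)
```

A BM25 or latent semantic weighting would replace only `tfidf_profile`; the expansion step could stay the same.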
Findings
Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained on a cross-lingual level gained better performance than the models trained on a monolingual level.
Originality/value
Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies on a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills the gap with a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology that ensures repeatable and controlled experiments can be conducted.
Oussama Ayoub, Christophe Rodrigues and Nicolas Travers
Abstract
Purpose
This paper aims to manage the word gap in information retrieval (IR), especially for long documents belonging to specific domains. In fact, with the continuous growth of text data that modern IR systems have to manage, existing solutions are needed to efficiently find the best set of documents for a given request. The words used to describe a query can differ from those used in related documents. Even when their meanings are close, non-overlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.
Design/methodology/approach
To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. The DL models used are pretrained on massive text data and carry common or domain-specific knowledge to propose a better document representation.
Findings
The authors evaluate the approach of this study on specific IR domains with long documents to show the genericity of the proposed model and achieve encouraging results.
Originality/value
In this paper, to the best of the authors’ knowledge, an original unsupervised and modular IR system based on recent DL methods is introduced.
Eero Sormunen, Jaana Kekäläinen, Jussi Koivisto and Kalervo Järvelin
Abstract
The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non‐relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept‐based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions were used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best match retrieval in the search for highly relevant documents. The results revealed that expanded queries with concept‐based structures performed better than unexpanded queries or “natural language” queries. Further, it was shown that highly relevant documents benefit essentially more from the concept‐based QE in ranking than marginally relevant documents.
Ngurah Agus Sanjaya Er, Mouhamadou Lamine Ba, Talel Abdessalem and Stéphane Bressan
Abstract
Purpose
This paper aims to focus on the design of algorithms and techniques for an effective set expansion. A tool that finds and extracts candidate sets of tuples from the World Wide Web was designed and implemented. For instance, when a given user provides <Indonesia, Jakarta, Indonesian Rupiah>, <China, Beijing, Yuan Renminbi>, <Canada, Ottawa, Canadian Dollar> as seeds, our system returns tuples composed of countries with their corresponding capital cities and currency names constructed from content extracted from Web pages retrieved.
Design/methodology/approach
The seeds are used to query a search engine and to retrieve relevant Web pages. The seeds are also used to infer wrappers from the retrieved pages. The wrappers, in turn, are used to extract candidates. The Web pages, wrappers, seeds and candidates, as well as their relationships, are vertices and edges of a heterogeneous graph. Several options for ranking candidates from PageRank to truth finding algorithms were evaluated and compared. Remarkably, all vertices are ranked, thus providing an integrated approach to not only answer direct set expansion questions but also find the most relevant pages to expand a given set of seeds.
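To make the graph-ranking idea concrete, the sketch below runs a plain power-iteration PageRank over a tiny graph whose vertices are seeds, pages, wrappers and candidates. It is an illustrative sketch only: the node names, the edge directions (seed to page, page to wrapper, wrapper to candidate) and the damping factor are assumptions, and the truth finding algorithms evaluated in the paper are not shown.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed graph given as (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # Dangling nodes (no out-links) spread their rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

# Hypothetical toy graph; edge directions are an assumption of this sketch.
edges = [
    ("seed:Canada", "page:A"),
    ("seed:Canada", "page:B"),
    ("page:A", "wrapper:1"),
    ("page:B", "wrapper:1"),
    ("page:B", "wrapper:2"),
    ("wrapper:1", "cand:France"),
    ("wrapper:1", "cand:Japan"),
    ("wrapper:2", "cand:Japan"),
]
scores = pagerank(edges)
candidates = sorted(
    (n for n in scores if n.startswith("cand:")), key=lambda n: -scores[n]
)
print(candidates)
```

Because every vertex receives a score, the same run also orders pages and wrappers, which mirrors the integrated ranking described above.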
Findings
The experimental results show that leveraging the truth finding algorithm can indeed improve the level of confidence in the extracted candidates and the sources.
Originality/value
Current approaches to set expansion mostly support the expansion of sets of atomic data. This idea can be extended to sets of tuples, extracting relation instances from the Web given a handful of tuple seeds. A truth finding algorithm is also incorporated into the approach, and it is shown that it can improve the confidence level in the ranking of both candidates and sources in the expansion of sets of tuples.
Hypertext is the computer storage of information as fragmented but linked multi‐dimensional documents. Such systems offer many advantages over the printed word, for example for…
Abstract
Hypertext is the computer storage of information as fragmented but linked multi‐dimensional documents. Such systems offer many advantages over the printed word, for example for group authorship of documents or to allow more creative access to the data, although there are some drawbacks. The design of chunky and creamy hypertext systems, the way in which information is presented to the end user and the relevant merits of the technique are discussed.
Haya Aldaghlas, Felix Kin Peng Hui and Colin Fraser Duffield
Abstract
Purpose
The initiation phase of capital projects is critical as this is where the highest number of options exist for modifying the project with minimal expenditure. Government and large organisations frequently involved in major capital projects have extensive procedures for this phase, yet organisations with an operational focus (like major container terminal stevedores) that only occasionally undertake capital projects face the dilemma of the trade-off between project planning and the management of operations. The research reported in this paper investigated the impact of industry operational considerations on the initiation of capital projects.
Design/methodology/approach
In addition to an extensive literature review, real projects initiated by a stevedoring company operating in Australia were observed in a living research investigation; the primary author of this paper spent six months as a participant/observer and witnessed the initiation of 12 capital projects. The collected data was qualitatively analysed using a four-step coding method.
Findings
The findings confirm that project initiation is a challenge for organisations who only spasmodically undertake capital projects and available project management frameworks do not necessarily consider the impact of such an organisation's culture. Issues identified that may have a negative impact on the initiation phase include lack of workplace trust, high individualism, ineffective interdepartmental communication, lack of resources and engineering and safety complexity.
Originality/value
The study investigated an underexplored industry within the context of project initiation, using Australian stevedoring as a case study. This initial investigation suggests that a tailored project management framework is needed for the initiation phase of projects to reflect the unique nature of the stevedoring industry and, by inference, other industries that have a strong operational focus.
Wei Lu, Heng Ding and Jiepu Jiang
Abstract
Purpose
The purpose of this paper is to utilize document expansion techniques for improving image representation and retrieval. This paper proposes a concise framework for tag-based image retrieval (TBIR).
Design/methodology/approach
The proposed approach includes three core components: a strategy of selecting expansion (similar) images from the whole corpus (e.g. cluster-based or nearest neighbor-based); a technique for assessing image similarity, which is adopted for selecting expansion images (text, image, or mixed); and a model for matching the expanded image representation with the search query (merging or separate).
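The three components can be sketched in a few lines. The example below is a minimal sketch, assuming a nearest neighbor-based selection strategy, a text-only similarity (Jaccard overlap of tags) and the merging variant of the matching model, with an add-mu smoothed unigram likelihood. The function names, toy corpus and smoothing are hypothetical and are not taken from the paper.

```python
from collections import Counter

def jaccard(a, b):
    """Tag-set overlap used here as a stand-in image similarity."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def expand_tags(image_id, corpus, k=2):
    """Merge an image's tags with those of its k most similar neighbours."""
    own = corpus[image_id]
    neighbours = sorted(
        (other for other in corpus if other != image_id),
        key=lambda other: (-jaccard(own, corpus[other]), other),
    )[:k]
    expanded = Counter(own)
    for other in neighbours:
        expanded.update(corpus[other])
    return expanded

def query_likelihood(query, tag_counts, vocab_size, mu=1.0):
    """Add-mu smoothed unigram likelihood of the query given the tag counts."""
    total = sum(tag_counts.values())
    score = 1.0
    for term in query:
        score *= (tag_counts[term] + mu) / (total + mu * vocab_size)
    return score

# Toy tagged-image corpus (assumed for illustration only).
corpus = {
    "img1": ["beach", "sunset"],
    "img2": ["beach", "sunset", "sea"],
    "img3": ["beach", "sea", "waves"],
    "img4": ["city", "night"],
}
vocab = {t for tags in corpus.values() for t in tags}
plain = query_likelihood(["sea"], Counter(corpus["img1"]), len(vocab))
expanded = query_likelihood(["sea"], expand_tags("img1", corpus), len(vocab))
print(plain, expanded)
```

In this toy case the query term "sea" is absent from the original tags of `img1`, so expansion is what lifts its score, which is the vocabulary-mismatch effect that document expansion targets.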
Findings
The results show that applying the proposed method yields significant improvements in effectiveness. The method performs better at the top of the ranking and greatly improves some topics that score zero in the baseline. Moreover, the nearest neighbor-based expansion strategy outperforms the cluster-based expansion strategy, and using image features for selecting expansion images is better than using text features in most cases. The separate method for calculating the augmented probability P(q|RD) is able to erase the negative influences of error images in RD.
Research limitations/implications
Although these methods only outperform the baseline at the top of the ranking rather than over the entire ranked list, TBIR on mobile platforms can still benefit from this approach.
Originality/value
Unlike former studies addressing the sparsity, vocabulary mismatch, and tag relatedness in TBIR individually, the approach proposed by this paper addresses all these issues with a single document expansion framework. It is a comprehensive investigation of document expansion techniques in TBIR.
The purpose of this paper is to improve the conceptual-based search by incorporating structural ontological information such as concepts and relations. Generally, Semantic-based…
Abstract
Purpose
The purpose of this paper is to improve the conceptual-based search by incorporating structural ontological information such as concepts and relations. Generally, semantic-based information retrieval aims to identify relevant information based on the meanings of the query terms or on the context of the terms, and the performance of semantic information retrieval is assessed through standard measures: precision and recall. Higher precision means that more of the documents obtained are (meaningfully) relevant, and lower recall means less coverage of the concepts.
Design/methodology/approach
In this paper, the authors enhance the existing ontology-based indexing proposed by Kohler et al. by incorporating sibling information into the index. The index designed by Kohler et al. contains only super- and sub-concepts from the ontology. In addition, the approach focuses on two tasks, query expansion and ranking of the expanded queries, to improve the efficiency of the ontology-based search. These tasks make use of ontological concepts and the relations existing between those concepts, so as to obtain semantically more relevant search results for a given query.
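The difference between an index holding only super- and sub-concepts and one that also holds siblings can be shown on a small hierarchy. The sketch below is illustrative only: the child-to-parent dictionary, the concept names and the function `expand_concept` are assumptions and do not reproduce the index structure of Kohler et al. or of the paper.

```python
def expand_concept(concept, parent_of):
    """Return the super-, sub- and sibling concepts of `concept`."""
    parent = parent_of.get(concept)
    supers = [parent] if parent else []
    subs = sorted(c for c, p in parent_of.items() if p == concept)
    # Siblings share the same parent; a root concept has none.
    siblings = sorted(
        c for c, p in parent_of.items() if parent and p == parent and c != concept
    )
    return {"super": supers, "sub": subs, "sibling": siblings}

# Hypothetical tourism ontology stored as child -> parent.
parent_of = {
    "accommodation": "tourism",
    "attraction": "tourism",
    "hotel": "accommodation",
    "hostel": "accommodation",
    "resort": "accommodation",
    "beach_resort": "resort",
}

related = expand_concept("hotel", parent_of)
print(related)
```

Without the `sibling` entry, a query for "hotel" in this toy hierarchy could only be broadened to "accommodation"; with it, "hostel" and "resort" become expansion candidates as well.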
Findings
The proposed ontology-based indexing technique is investigated by analysing the coverage of concepts that are being populated in the index. Here, we introduce a new measure called index enhancement measure, to estimate the coverage of ontological concepts being indexed. We have evaluated the ontology-based search for the tourism domain with the tourism documents and tourism-specific ontology. The comparison of search results based on the use of ontology “with and without query expansion” is examined to estimate the efficiency of the proposed query expansion task. The ranking is compared with the ORank system to evaluate the performance of our ontology-based search. From these analyses, the ontology-based search results show better recall when compared to the other concept-based search systems. The mean average precision of the ontology-based search is found to be 0.79 and the recall 0.65; the ORank system has a mean average precision of 0.62 and a recall of 0.51; and the concept-based search has a mean average precision of 0.56 and a recall of 0.42.
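For readers unfamiliar with the measures reported here, the sketch below computes precision, recall and mean average precision using their standard textbook definitions. The document identifiers and relevance judgements are made up for illustration and are unrelated to the tourism experiments; the paper's index enhancement measure is not defined in the abstract and is therefore not sketched.

```python
def precision_recall(ranked, relevant):
    """Set-based precision and recall of a retrieved list."""
    hits = sum(1 for doc in ranked if doc in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Average of per-query average precision over (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Made-up judgements for two queries (illustration only).
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6", "d7"], {"d6", "d9"}),
]
first_precision, first_recall = precision_recall(*runs[0])
map_score = mean_average_precision(runs)
print(first_precision, first_recall, map_score)
```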
Practical implications
When the concept is not present in the domain-specific ontology, the concept cannot be indexed. When the given query term is not available in the ontology then the term-based results are retrieved.
Originality/value
In addition to super- and sub-concepts, we incorporate the concepts present at the same level (siblings) into the ontological index. The structural information from the ontology is determined for the query expansion. The ranking of the documents depends on the type of the query (single concept query, multiple concept queries and concept with relation queries) and the ontological relations that exist in the query and the documents. With this ontological structural information, the search results showed better coverage of concepts with respect to the query.
Wei Lu, Xinghu Yue, Qikai Cheng and Rui Meng
Abstract
Purpose
The purpose of this paper is to explore the use of inverse local context analysis (ILCA) to obtain data from limited accessible data sources.
Design/methodology/approach
The experimental results show that the method the authors proposed can obtain all retrieved documents from the limited accessible data source using the least number of queries.
Findings
The experimental results show that the method we proposed can obtain all retrieved documents from the limited accessible data source using the least number of queries.
Originality/value
To the best of the authors’ knowledge, this paper provides the first attempt to gather all the retrieved documents from a limited accessible data source, and the efficiency and ease of implementation of the proposed solution make it feasible for practical applications. The proposed method can also benefit the construction of web corpora.
Bracha Shapira, Meirav Taieb‐Maimon and Yael Nemeth
Abstract
Purpose
Query expansion and query limitation are two known techniques for assisting users to define efficient queries. The purpose of this article is to examine the effectiveness of the two methods.
Design/methodology/approach
The research entailed an objective and subjective evaluation of the effectiveness of automatic and interactive query expansion and of two query limit options. The evaluation included both lab simulations and large‐scale user studies. The objective aspects were evaluated in lab simulations with experts judging user performance. The subjective analysis was carried out by having the participants evaluate the quality of, and express their satisfaction with, the retrieval process and its results, thus employing perceived‐value analysis.
Findings
The main findings reveal a difference between the perceived and real values of these techniques. While users expressed their satisfaction with interactive query expansion and its performance, the real‐value analysis of their performance did not show any significant difference between the retrieval modes.
Originality/value
The article evaluates the objective and subjective effectiveness of automatic and interactive query expansion and two query limit options.