Search results

1 – 10 of 42
Article
Publication date: 6 April 2012

Daniela Petrelli and Paul Clough

Abstract

Purpose

This paper aims to describe a study of the queries generated from a user experiment for cross‐language information retrieval (CLIR) from a historic image archive.

Design/methodology/approach

A controlled lab‐based user study was carried out using a prototype Italian‐English image retrieval system. Participants were asked to carry out searches for 16 images provided to them, a known‐item search task. Italian‐speaking users generated 618 queries for this set of known‐item search tasks. Users' interactions with the system were recorded, and the queries were analysed manually, both quantitatively and qualitatively. The results were used to suggest recommendations for the future development of cross‐language retrieval systems for digital image libraries.

Findings

Results highlight the diversity in requests for similar visual content and the weaknesses of machine translation for query translation. Through the manual translation of queries the authors show the benefits of using high‐quality translation resources. The results show the individual characteristics of users while performing known‐item searches and the overlap obtained between query terms and structured image captions, highlighting the use of users' search terms for objects within the foreground of an image.

Research limitations/implications

This research looks in depth into one case of interaction and one image repository. Despite this limitation, the discussed results are likely to be valid across other languages and image repositories.

Practical implications

Developing effective systems requires studying users' search behaviours, particularly in digital image libraries.

Originality/value

The growing quantity of digital visual material in digital libraries offers the potential to apply techniques from CLIR to provide cross‐language information access services. The value of this paper is in the provision of empirical evidence to support recommendations for effective cross‐language image retrieval system design.

Article
Publication date: 18 July 2016

Dong Zhou, Séamus Lawless, Xuan Wu, Wenyu Zhao and Jianxun Liu

Abstract

Purpose

With an increase in the amount of multilingual content on the World Wide Web, users are often striving to access information provided in a language of which they are non-native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems through the means of personalized query expansion.

Design/methodology/approach

The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods.
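As a rough illustration of a frequency-based profile, the sketch below weights the terms in a user's search history with a smoothed tf-idf score and appends the top-weighted terms to a query. It is a minimal sketch only: the function names, the smoothing formula and the toy data are assumptions made for illustration and are not taken from the paper, which also covers BM25 and latent semantic models.

```python
import math
from collections import Counter


def tfidf_profile(user_docs, background_docs, top_k=5):
    """Build a weighted-term user profile from tokenised documents.

    Hypothetical helper: tf is counted over the user's history, idf over a
    background collection, using the smoothed form log((N + 1) / (df + 1)).
    """
    n_docs = len(background_docs)
    # Document frequency of each term in the background collection.
    df = Counter(term for doc in background_docs for term in set(doc))
    # Raw term frequency over all documents in the user's history.
    tf = Counter(term for doc in user_docs for term in doc)
    weights = {
        term: count * math.log((n_docs + 1) / (df[term] + 1))
        for term, count in tf.items()
    }
    # Highest weight first; ties are broken alphabetically.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def expand_query(query_terms, profile, n_extra=2):
    """Append the top profile terms that are not already in the query."""
    extra = [term for term, _ in profile if term not in query_terms][:n_extra]
    return list(query_terms) + extra


# Toy data (assumed): a small background collection and one user's history.
background = [["jaguar", "car"], ["jaguar", "cat"], ["car", "engine"], ["cat", "zoo"]]
history = [["jaguar", "cat", "zoo"], ["cat", "zoo"]]
profile = tfidf_profile(history, background)
expanded = expand_query(["jaguar"], profile)
```

With this toy history, the ambiguous query "jaguar" is expanded with terms that point to the animal rather than the car, which is the intuition behind personalized query expansion.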

Findings

Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained on a cross-lingual level performed better than those trained on a monolingual level.

Originality/value

Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies on a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills this gap with a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology that allows repeatable and controlled experiments to be conducted.

Details

Aslib Journal of Information Management, vol. 68 no. 4
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 31 August 2012

Dan Wu, Daqing He and Xiaomei Xu

Abstract

Purpose

With the vast amount of multilingual information available online, it becomes increasingly critical for libraries to use various multilingual information access techniques in order to effectively support patrons' online information requests. However, this is still a relatively under‐explored area. This paper aims to study the effectiveness and the adoptability of query expansion and translation enhancement in the context of interactive multilingual information access.

Design/methodology/approach

Relying on an interactive multilingual information access system called ICE‐TEA, the authors conducted a controlled experiment (English‐to‐Chinese translation) involving human subjects to assess the retrieval effectiveness, analyzed the collected search logs to examine users' behavior, and employed pre‐ and post‐questionnaires to obtain users' opinions about the system.

Findings

The results confirm that significant improvement in retrieval effectiveness can be achieved by combining query expansion with translation enhancement (as compared to a case when there is no relevance feedback). However, users' ability to understand, interact with and even perceive the complex process of searches involving the combination of query expansion and translation enhancement may greatly impact the effectiveness of the techniques. The results also confirm that human‐generated queries were short queries, which calls for careful consideration of how longer queries perform in real search because many search engines rely on longer and more complex queries.

Originality/value

This study examines two important relevance feedback techniques in the context of human‐involved multilingual information access. This study is a valuable addition to the information seeking behaviour literature.

Article
Publication date: 6 April 2012

Anne R. Diekema

Abstract

Purpose

Together, increasing globalization and the internet created fertile grounds for the establishment of multilingual digital libraries. Providing cross‐lingual access to materials is of particular interest to political entities such as the European Union, which currently has 23 official languages, but also to multinational companies and countries that have different languages represented among their citizens. The main objective of this paper is to review the literature on multilingual digital libraries and provide an overview of this area.

Design/methodology/approach

Based on a thorough literature search in four different databases, a core set of literature on multilingual digital libraries was retrieved. Literature on various aspects of this topic was reviewed. The paper is organized around emerging themes drawn directly from the literature. Where warranted, additional literature is brought in to provide necessary background information or clarification.

Findings

Creating a multilingual digital library is a highly complex undertaking and typically requires a collaborative effort between different organizations and people with different areas of expertise. Enabling users to search across languages requires translation resources to cross the language barrier, which can be challenging depending on the language and resource availability. Additional challenges were found to be in data management (localization and language processing), representation (dealing with different fonts and character codes), development (creating international software, cross‐cultural collaboration), and interoperability (system architecture and data sharing). Research in multilingual digital libraries was mostly system based involving experimental systems or system prototypes.

Research limitations/implications

Most likely the literature review does not include all possible journal articles on multilingual digital libraries even though the literature searches done to obtain these articles were thorough and deliberate. Journal articles without the descriptors used in this search and those articles not indexed in the four different databases used in the search will not be included here. The review excludes cross‐language information retrieval research unless it is directly related to existing multilingual digital libraries, or a connection to digital libraries in general is made in the paper itself.

Originality/value

This paper provides the first literature review on the topic of multilingual digital libraries and provides a concise overview of relevant aspects in this area. The number of multilingual digital libraries is growing, as is the interest from the research community in these libraries to apply their research findings from cross‐language information retrieval. This review article provides a valuable entry point to the field of multilingual digital libraries for researchers, practitioners, and other interested parties.

Details

The Electronic Library, vol. 30 no. 2
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 5 September 2008

Eija Airio

Abstract

Purpose

The aim of the current paper is to test whether query translation is beneficial in web retrieval.

Design/methodology/approach

The language pairs were Finnish‐Swedish, English‐German and Finnish‐French. A total of 12‐18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language utilizing a dictionary‐based system. For English‐German, machine translation was also utilized. The author used Google as the search engine.

Findings

The results differed depending on the language pair. The author concluded that the dictionary coverage had an effect on the results. On average, the results of query‐translation were better than in the traditional laboratory tests.

Originality/value

This research shows that query translation on the web is beneficial, especially for users with moderate and non‐active language skills. This is valuable information for developers of cross‐language information retrieval systems.

Details

Journal of Documentation, vol. 64 no. 5
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 3 January 2020

Hany M. Alsalmi

Abstract

Purpose

Less attention has been paid to users’ interactions and behavior in studying multilingual search. Although digital library researchers have yet to assess user interaction and behavior in multilingual search, they have concurred that there is a need for user studies that document the extent to which information retrieval systems meet multilingual users’ needs and expectations. The paper aims to discuss these issues.

Design/methodology/approach

This study is composed of five individual cases. The case study participants were Saudi students enrolled at either a large state university or a Historically Black College and University located in the same community. The research questions are: what do Saudi Digital Library (SDL) users experience when searching within the SDL in Arabic and English, and what strategies do they use if they fail to find resources? Data for this study were collected via a qualitative method called video-stimulated recall.

Findings

In the Arabic search tasks, participants realized that finding resources is not easy. Participants expressed their concerns about the lack of relevance and accuracy of results returned by the search system, indicating weak trust and confidence in the search system. In the English search task, by contrast, participants felt more satisfied and more confident that they could trust the results returned by the search system. Participants expressed their satisfaction with the search experience, as it provided them with accurate and varied resources. Overall, the participants had more difficulty finding Arabic resources than English resources in the SDL.

Originality/value

This study is considered one of the earliest works on information-seeking behavior in multilingual digital libraries in the Arabic language. Its value lies in being the first study to investigate and report the information-seeking behavior of SDL users.

Article
Publication date: 21 September 2012

Dan Wu and Daqing He

Abstract

Purpose

This paper seeks to examine the further integration of machine translation technologies with cross language information access in providing web users the capabilities of accessing information beyond language barriers. Machine translation and cross language information access are related technologies, and yet they have their own unique contributions in handling information in multiple languages. This paper aims to demonstrate that there are many opportunities to further integrate machine translation with cross language information access, and the combination can greatly empower web users in their information access.

Design/methodology/approach

Using English and Chinese as the language pair, this paper examines several important aspects of machine translation in query translation‐based cross language information access: query translation, relevance feedback, interactive cross language information access, out‐of‐vocabulary term translation, and data fusion. The goal is to gain more insight into the wide range of uses of machine translation in cross language information access, and to help the community identify promising future directions for both machine translation and cross language access.

Findings

Machine translation can be applied effectively at many points in the cross language information access process. Queries translated by a machine translation system are of high quality and are more robust in handling potential untranslated terms. Translation enhancement, a relevance feedback method that uses returned documents generated by machine translation, is not only a valid technique by itself, but also helps to generate more robust cross language information access performance when combined with other relevance feedback techniques. Machine translation is also found to play a significant role in resolving untranslated terms and in data fusion.

Originality/value

This set of comparative empirical studies on integrating machine translation and cross language information access was performed on a common evaluation framework, and examined integration at multiple points of the cross language access process. The experimental results demonstrate the value of further integrating machine translation in cross language information access, and identify interesting future directions for both machine translation and cross language information access research.

Article
Publication date: 1 May 2006

Tuomas Talvensaari, Jorma Laurikkala, Kalervo Järvelin and Martti Juhola

Abstract

Purpose

To present a method for creating a comparable document collection from two document collections in different languages.

Design/methodology/approach

The best query keys were extracted from a Finnish source collection (articles of the newspaper Aamulehti) with the relative average term frequency formula. The keys were translated into English with a dictionary‐based query translation program. The resulting lists of words were used as queries that were run against the target collection (Los Angeles Times articles) with the nearest neighbor method. The documents were aligned with unrestricted and date‐restricted alignment schemes, which were also combined.
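The alignment step described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: key extraction with the relative average term frequency formula is not reproduced (keys are given as input), a plain Jaccard overlap stands in for the nearest neighbor similarity, and all names and toy data are hypothetical.

```python
from datetime import date


def translate_keys(keys, dictionary):
    """Replace each source-language key with all of its dictionary translations."""
    return {target for key in keys for target in dictionary.get(key, [])}


def jaccard(a, b):
    """Set overlap used here as a stand-in similarity measure (an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align(source_docs, target_docs, dictionary, max_days=None):
    """Pair each source document with its most similar target document.

    With max_days set, only target documents published within that many days
    of the source document are considered (date-restricted alignment).
    """
    pairs = []
    for s_id, (s_date, s_keys) in source_docs.items():
        query = translate_keys(s_keys, dictionary)
        best_id, best_score = None, 0.0
        for t_id, (t_date, t_terms) in target_docs.items():
            if max_days is not None and abs((t_date - s_date).days) > max_days:
                continue  # outside the allowed date window
            score = jaccard(query, set(t_terms))
            if score > best_score:
                best_id, best_score = t_id, score
        if best_id is not None:
            pairs.append((s_id, best_id, best_score))
    return pairs


# Toy data (assumed): one Finnish document and two English candidates.
dictionary = {"vaalit": ["election"], "presidentti": ["president"]}
source = {"fi1": (date(1994, 5, 1), ["vaalit", "presidentti"])}
target = {
    "en1": (date(1994, 5, 2), ["election", "president", "vote"]),
    "en2": (date(1994, 9, 1), ["election", "president"]),
}
unrestricted = align(source, target, dictionary)
restricted = align(source, target, dictionary, max_days=7)
```

In this toy case the unrestricted scheme picks the lexically closest document regardless of date, while the date-restricted scheme picks the best match published within a week of the source document.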

Findings

The combined alignment scheme was found to be the best when the relatedness of the document pairs was assessed with a five‐degree relevance scale. Of the 400 document pairs, roughly 40 percent were highly or fairly related and 75 percent included at least lexical similarity.

Research limitations/implications

The number of alignment pairs was small due to the short common time period of the two collections, and their geographical (and thus, topical) remoteness. In future, our aim is to build larger comparable corpora in various languages and use them as a source of translation knowledge for the purposes of cross‐language information retrieval (CLIR).

Practical implications

Readily available parallel corpora are scarce. With this method, two unrelated document collections can relatively easily be aligned to create a CLIR resource.

Originality/value

The method can be applied to weakly linked collections and morphologically complex languages, such as Finnish.

Details

Journal of Documentation, vol. 62 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 June 2001

Ari Pirkola

Abstract

This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross‐language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono‐ and cross‐lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.

Details

Journal of Documentation, vol. 57 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 19 October 2018

Hengyi Fu

Abstract

Purpose

With the increasing number of online multilingual resources, cross-language information retrieval (CLIR) has drawn much attention from the information retrieval (IR) research community. However, few studies have examined how and why multilingual searchers seek information in two or more languages, specifically how they switch and mix languages in queries to get satisfying results. The purpose of this paper is to focus on Chinese–English bilinguals’ intra-sentential code-switching behaviors in online searches. The paper examines the scenarios of and reasons for code-switching, the factors that may affect it, the patterns of mixed language query formulation and reformulation, and how current IR systems and other search tools can support such information needs.

Design/methodology/approach

In-depth semi-structured interviews were used as the research method. In total, 30 participants were recruited based on their English proficiency, location and profession, using a purposive sampling method.

Findings

Four scenarios and four reasons for using Chinese–English mixed language queries to cover information needs were identified, and results suggest that linguistic and cultural/social factors are of equivalent importance in code-switching behaviors. English terms and Chinese terms in queries play different roles in searches, and mixed language queries are irreplaceable by either single language queries or other search facilitating features. Findings also suggest current search engines and tools need greater emphasis in the user interface and more user education is required.

Originality/value

This study presents a qualitative analysis of bilinguals’ code-switching behaviors in online searches. Findings are expected to advance the theoretical understanding of bilingual users’ search strategies and interactions with IR systems, and provide insights for designing more effective IR systems and tools to discover multilingual online resources, including cross-language controlled vocabularies, personalized CLIR tools and mixed language query assistants.

Details

Aslib Journal of Information Management, vol. 71 no. 1
Type: Research Article
ISSN: 2050-3806
