Search results
1 – 10 of over 2000
Abstract
Purpose
This paper seeks to examine the further integration of machine translation technologies with cross language information access in providing web users the capabilities of accessing information beyond language barriers. Machine translation and cross language information access are related technologies, and yet they have their own unique contributions in handling information in multiple languages. This paper aims to demonstrate that there are many opportunities to further integrate machine translation with cross language information access, and the combination can greatly empower web users in their information access.
Design/methodology/approach
Using English and Chinese as the language pair, this paper examines machine translation in query translation‐based cross language information access from several important aspects: query translation, relevance feedback, interactive cross language information access, out‐of‐vocabulary term translation, and data fusion. The goal is to gain more insight into the wide range of uses of machine translation in cross language information access, and to help the community identify promising future directions for both machine translation and cross language access.
Findings
Machine translation can be applied effectively at many points in the cross language information access process. Queries translated by a machine translation system are of high quality and are more robust in handling potentially untranslated terms. Translation enhancement, a relevance feedback method that uses returned documents generated by machine translation, is not only a valid technique by itself, but also helps to generate more robust cross language information access performance when combined with other relevance feedback techniques. Machine translation is also found to play a significant role in resolving untranslated terms and in data fusion.
Originality/value
This set of comparative empirical studies on integrating machine translation and cross language information access was performed on a common evaluation framework, and examined integration at multiple points of the cross language access process. The experimental results demonstrate the value of further integrating machine translation in cross language information access, and identify interesting future directions for both machine translation and cross language information access research.
Daniela Petrelli and Paul Clough
Abstract
Purpose
This paper aims to describe a study of the queries generated from a user experiment for cross‐language information retrieval (CLIR) from a historic image archive.
Design/methodology/approach
A controlled lab‐based user study was carried out using a prototype Italian‐English image retrieval system. Participants were asked to search for 16 images provided to them, a known‐item search task. Italian‐speaking users generated 618 queries across these tasks. Users' interactions with the system were recorded, and the queries were analysed manually, both quantitatively and qualitatively. The results were used to make recommendations for the future development of cross‐language retrieval systems for digital image libraries.
Findings
Results highlight the diversity in requests for similar visual content and the weaknesses of machine translation for query translation. Through the manual translation of queries the authors show the benefits of using high‐quality translation resources. The results show the individual characteristics of users while performing known‐item searches and the overlap obtained between query terms and structured image captions, highlighting the use of users' search terms for objects within the foreground of an image.
Research limitations/implications
This research looks in depth into one case of interaction and one image repository. Despite this limitation, the discussed results are likely to be valid across other languages and image repositories.
Practical implications
Developing effective systems requires studying users' search behaviours, particularly in digital image libraries.
Originality/value
The growing quantity of digital visual material in digital libraries offers the potential to apply techniques from CLIR to provide cross‐language information access services. The value of this paper is in the provision of empirical evidence to support recommendations for effective cross‐language image retrieval system design.
Dan Wu, Daqing He and Xiaomei Xu
Abstract
Purpose
With the vast amount of multilingual information available online, it becomes increasingly critical for libraries to use various multilingual information access techniques in order to effectively support patrons' online information requests. However, this is still a relatively under‐explored area. This paper aims to study the effectiveness and the adoptability of query expansion and translation enhancement in the context of interactive multilingual information access.
Design/methodology/approach
Relying on an interactive multilingual information access system called ICE‐TEA, the authors conducted a controlled experiment (English‐to‐Chinese translation) involving human subjects to assess the retrieval effectiveness, analyzed the collected search logs to examine users' behavior, and employed pre‐ and post‐questionnaires to obtain users' opinions about the system.
Findings
The results confirm that significant improvement in retrieval effectiveness can be achieved by combining query expansion with translation enhancement (as compared to a case when there is no relevance feedback). However, users' ability to understand, interact with and even perceive the complex process of searches involving the combination of query expansion and translation enhancement may greatly impact the effectiveness of the techniques. The results also confirm that human‐generated queries were short, which calls for careful consideration of how longer queries perform in real search because many search engines rely on longer and more complex queries.
Originality/value
This study examines two important relevance feedback techniques in the context of human‐involved multilingual information access. This study is a valuable addition to the information seeking behaviour literature.
Abstract
Purpose
The purpose of this paper is to investigate the use of electronic information resources to solve cultural translation problems at different stages of acquisition of the translator’s cultural competence.
Design/methodology/approach
A process and product-oriented, cross-sectional, quasi-experimental study was conducted with 38 students with German as a second foreign language from the four years of the Bachelor’s degree in Translation and Interpreting at Universitat Autònoma de Barcelona, and ten professional translators.
Findings
Translation students use a wider variety of resources, perform more queries and spend more time on queries than translators when solving cultural translation problems. The students’ information-seeking process is generally less efficient than that of the translators. Training has little impact on the students’ use of electronic information resources for this specific purpose, since all students use them similarly regardless of the year they are in.
Research limitations/implications
The study has been conducted with a small sample and only one language pair from a single pedagogical context. The tendencies observed cannot be generalised to the whole population of translation students.
Practical implications
This paper has implications for translator training, as it encourages the development of efficient information-seeking processes for the resolution of cultural translation problems.
Originality/value
Unlike other studies, this paper focusses on a specific translation problem type. It provides information related to the students’ information-seeking strategies for the resolution of cultural translation problems, which can be useful for translation training.
Abstract
Purpose
The aim of the current paper is to test whether query translation is beneficial in web retrieval.
Design/methodology/approach
The language pairs were Finnish‐Swedish, English‐German and Finnish‐French. A total of 12‐18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language using a dictionary‐based system. For English‐German, machine translation was also used. The author used Google as the search engine.
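As a rough illustration of dictionary‐based query translation, the minimal sketch below replaces each source term with all of its dictionary translations and passes unknown terms through unchanged. The dictionary entries, the function name `translate_query`, and the pass‐through handling of untranslated terms are assumptions made for illustration only; the abstract does not describe the system actually used in the study.

```python
from typing import Dict, List


def translate_query(query: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Translate a query term by term with a bilingual dictionary.

    Every translation alternative is kept; terms missing from the
    dictionary (for example proper names) are passed through as-is.
    """
    translated: List[str] = []
    for term in query.lower().split():
        translated.extend(dictionary.get(term, [term]))
    return translated


# Hypothetical Finnish-to-Swedish toy dictionary (not from the study).
toy_dictionary = {
    "talo": ["hus"],
    "kirja": ["bok"],
    "kieli": ["språk", "tunga"],
}

result = translate_query("kirja kieli Nokia", toy_dictionary)
print(result)
```

Keeping every translation alternative favours recall over precision, which is one reason dictionary coverage and ambiguity can affect retrieval results.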
Findings
The results differed depending on the language pair. The author concluded that the dictionary coverage had an effect on the results. On average, the results of query‐translation were better than in the traditional laboratory tests.
Originality/value
This research shows that query translation on the web is beneficial, especially for users with moderate and non‐active language skills. This is valuable information for developers of cross‐language information retrieval systems.
Tuomas Talvensaari, Jorma Laurikkala, Kalervo Järvelin and Martti Juhola
Abstract
Purpose
To present a method for creating a comparable document collection from two document collections in different languages.
Design/methodology/approach
The best query keys were extracted from a Finnish source collection (articles of the newspaper Aamulehti) with the relative average term frequency formula. The keys were translated into English with a dictionary‐based query translation program. The resulting lists of words were used as queries that were run against the target collection (Los Angeles Times articles) with the nearest neighbor method. The documents were aligned with unrestricted and date‐restricted alignment schemes, which were also combined.
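The difference between unrestricted and date‐restricted alignment can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity over raw term counts stands in for the retrieval step, the key lists are assumed to be already translated, and the documents, identifiers and the `max_days` window are invented. It does not reproduce the relative average term frequency formula or the nearest neighbor method used by the authors.

```python
import math
from collections import Counter
from datetime import date


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def align(source, targets, max_days=None):
    """Return (doc_id, score) of the best-matching target document.

    source:  (publication date, list of translated query keys)
    targets: list of (doc_id, publication date, list of tokens)
    max_days: if given, only targets published within this many days
              of the source are considered (date-restricted alignment).
    """
    src_date, keys = source
    query = Counter(keys)
    best_id, best_score = None, 0.0
    for doc_id, doc_date, tokens in targets:
        if max_days is not None and abs((doc_date - src_date).days) > max_days:
            continue
        score = cosine(query, Counter(tokens))
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id, best_score


# Invented example data (not from the collections named in the abstract).
source = (date(1994, 5, 2), ["election", "president", "vote"])
targets = [
    ("la-1", date(1994, 5, 3), ["president", "vote", "senate"]),
    ("la-2", date(1994, 9, 1), ["election", "president", "vote", "result"]),
    ("la-3", date(1994, 5, 1), ["earthquake", "damage"]),
]

unrestricted = align(source, targets)
restricted = align(source, targets, max_days=7)
print(unrestricted[0], restricted[0])
```

In this toy data the unrestricted scheme picks the lexically closest document even though it was published months later, while the date‐restricted scheme picks a document from the same week.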
Findings
The combined alignment scheme was found to be the best when the relatedness of the document pairs was assessed with a five‐degree relevance scale. Of the 400 document pairs, roughly 40 percent were highly or fairly related and 75 percent included at least lexical similarity.
Research limitations/implications
The number of alignment pairs was small due to the short common time period of the two collections, and their geographical (and thus, topical) remoteness. In future, our aim is to build larger comparable corpora in various languages and use them as source of translation knowledge for the purposes of cross‐language information retrieval (CLIR).
Practical implications
Readily available parallel corpora are scarce. With this method, two unrelated document collections can relatively easily be aligned to create a CLIR resource.
Originality/value
The method can be applied to weakly linked collections and morphologically complex languages, such as Finnish.
Abstract
Purpose
This paper aims to investigate the multiple language support features in internet search engines. The diversity of the internet is reflected not only in its users, information formats and information content, but also in the languages used. As more and more information becomes available in different languages, multiple language support in a search engine becomes more important.
Design/methodology/approach
The first step of this study is to conduct a survey about existing search engines and to identify search engines with multiple language support features. The second step is to analyse, compare, and characterise the multiple language support features in the selected search engines against the proposed five basic evaluation criteria after they are classified into three categories. Finally, the strengths and weaknesses of the multiple language support features in the selected search engines are discussed in detail.
Findings
The findings reveal that Google, EZ2Find, and Onlinelink respectively are the search engines with the best multiple language support features in their categories. Although many search engines are equipped with multiple language support features, an indispensable translation feature is implemented in only a few search engines. Multiple language support features in search engines remain at the lexical level.
Originality/value
The findings of the study will facilitate understanding of the current status of multiple language support in search engines, help users to effectively utilise multiple language support features in a search engine, and provide useful advice and suggestions for search engine researchers, designers and developers.
María‐Dolores Olvera‐Lobo and Lola García‐Santiago
Abstract
Purpose
This study focuses on the evaluation of systems for the automatic translation of questions intended for translingual question‐answer (QA) systems. The efficacy of online translators when performing as tools in QA systems is analysed using a collection of documents in the Spanish language.
Design/methodology/approach
Automatic translation is evaluated in terms of the functionality of actual translations produced by three online translators (Google Translator, Promt Translator, and Worldlingo) by means of objective and subjective evaluation measures, and the typology of errors produced was identified. For this purpose, a comparative study of the quality of the translation of factual questions of the CLEF collection of queries was carried out, from German and French to Spanish.
Findings
It was observed that the rates of error for the three systems evaluated here are greater in the translations pertaining to the language pair German‐Spanish. Promt was identified as the most reliable translator of the three (on average) for the two linguistic combinations evaluated. However, for the Spanish‐German pair, a good assessment of the Google online translator was obtained as well. Most errors (46.38 percent) tended to be of a lexical nature, followed by those due to a poor translation of the interrogative particle of the query (31.16 percent).
Originality/value
The evaluation methodology applied focuses above all on the finality of the translation. That is, does the resulting question serve as effective input into a translingual QA system? Thus, instead of searching for “perfection”, the functionality of the question and its capacity to lead one to an adequate response are appraised. The results obtained contribute to the development of improved translingual QA systems.
Dong Zhou, Séamus Lawless, Xuan Wu, Wenyu Zhao and Jianxun Liu
Abstract
Purpose
With an increase in the amount of multilingual content on the World Wide Web, users are often striving to access information provided in a language of which they are non-native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems through the means of personalized query expansion.
Design/methodology/approach
The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods.
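A frequency‐based user profile and its use for query expansion can be sketched as below. This is a minimal illustration, assuming a smoothed tf-idf weight of the form tf × ln((N + 1) / (df + 1)); the smoothing, the function names `tfidf_profile` and `expand_query`, and the toy documents are assumptions for illustration and are not taken from the paper, which also covers BM25 and latent semantic models not shown here.

```python
import math
from collections import Counter


def tfidf_profile(user_docs, collection, top_n=5):
    """Build a weighted-term profile from the documents a user has viewed.

    Weight = term frequency in the user's documents multiplied by a
    smoothed inverse document frequency over the whole collection.
    Ties are broken alphabetically so the output is deterministic.
    """
    n_docs = len(collection)
    df = Counter()
    for doc in collection:
        df.update(set(doc))
    tf = Counter(term for doc in user_docs for term in doc)
    weights = {
        term: freq * math.log((n_docs + 1) / (df[term] + 1))
        for term, freq in tf.items()
    }
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def expand_query(query_terms, profile, k=2):
    """Append the k highest-weighted profile terms not already in the query."""
    extra = [term for term, _ in profile if term not in query_terms][:k]
    return list(query_terms) + extra


# Invented toy collection; the user has viewed the two animal documents.
collection = [
    ["jaguar", "car", "engine"],
    ["jaguar", "cat", "jungle"],
    ["car", "price"],
    ["python", "snake", "jungle"],
]
user_docs = [collection[1], collection[3]]

profile = tfidf_profile(user_docs, collection)
expanded = expand_query(["jaguar"], profile, k=2)
print(expanded)
```

Here the ambiguous query "jaguar" is expanded with terms from the user's viewing history, steering it toward the animal sense rather than the car.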
Findings
Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained on a cross-lingual level gained better performance than the models trained on a monolingual level.
Originality/value
Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies on a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills the gap by a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology to ensure repeatable and controlled experiments can be conducted.
Tomasz Neugebauer and Elaine Menard
Abstract
Purpose
This paper aims to present the third stage of a research project that aims to develop a bilingual interface for the retrieval of digital images. The requirements and implementation of the search engine are described. Image search engines attempt to give access to a range of online images available on the web.
Design/methodology/approach
Open-source software components were used as much as possible because of the advantages of this approach: low initial cost and the ability to evaluate and develop enhancements independently, driven by research objectives rather than financial viability.
Findings
Open-source software components can be used to develop the interface. The implementation of the image search engine and its indexes uses Apache Solr, AJAX-Solr, jsTree and jQuery. The Microsoft Translator web service was integrated into the interface to provide optional translation of user queries.
Originality/value
The search interface is intended to be an innovative tool for image searchers who are looking for digital images. The search interface gives the image searchers the opportunity to easily access a variety of visual resources and facilitates searching for images in two different languages (English and French).