Search results

1 – 10 of over 2000
Article
Publication date: 21 September 2012

Dan Wu and Daqing He

Abstract

Purpose

This paper seeks to examine the further integration of machine translation technologies with cross language information access in providing web users the capabilities of accessing information beyond language barriers. Machine translation and cross language information access are related technologies, and yet they have their own unique contributions in handling information in multiple languages. This paper aims to demonstrate that there are many opportunities to further integrate machine translation with cross language information access, and the combination can greatly empower web users in their information access.

Design/methodology/approach

Using English and Chinese as the language pair under study, this paper examines several important aspects of machine translation in query translation‐based cross language information access: query translation, relevance feedback, interactive cross language information access, out‐of‐vocabulary term translation, and data fusion. The goal is to gain more insight into the wide‐ranging uses of machine translation in cross language information access, and to help the community identify promising future directions for both machine translation and cross language access.

Findings

Machine translation can be applied effectively at many points in the cross language information access process. Queries translated by a machine translation system are of high quality and are more robust in handling potentially untranslated terms. Translation enhancement, a relevance feedback method that uses returned documents generated by machine translation, is not only a valid technique in itself, but also yields more robust cross language information access performance when combined with other relevance feedback techniques. Machine translation is also found to play a significant role in resolving untranslated terms and in data fusion.

Originality/value

This set of comparative empirical studies on integrating machine translation and cross language information access was performed on a common evaluation framework, and examined integration at multiple points of the cross language access process. The experimental results demonstrate the value of further integrating machine translation in cross language information access, and identify interesting future directions for both machine translation and cross language information access research.

Article
Publication date: 16 December 2022

Kinjal Bhargavkumar Mistree, Devendra Thakor and Brijesh Bhatt

Abstract

Purpose

According to the Indian Sign Language Research and Training Centre (ISLRTC), India has approximately 300 certified human interpreters to help people with hearing loss. This paper aims to address the issue of Indian Sign Language (ISL) sentence recognition and translation into semantically equivalent English text in a signer-independent mode.

Design/methodology/approach

This study presents an approach that translates ISL sentences into English text using the MobileNetV2 model and Neural Machine Translation (NMT). The authors have created an ISL corpus from the Brown corpus using ISL grammar rules to perform machine translation. The authors’ approach converts ISL videos of the newly created dataset into ISL gloss sequences using the MobileNetV2 model and the recognized ISL gloss sequence is then fed to a machine translation module that generates an English sentence for each ISL sentence.

Findings

According to the experimental results, the pretrained MobileNetV2 model proved to be the best-suited model for the recognition of ISL sentences, and NMT provided better results than Statistical Machine Translation (SMT) for converting ISL text into English text. The automatic and human evaluations of the proposed approach yielded accuracies of 83.3% and 86.1%, respectively.

Research limitations/implications

The neural machine translation systems sometimes repeated previously translated words, produced strange translations as the number of words per sentence increased, and occasionally emitted one or more unexpected terms unrelated to the source text. The most common type of error is the mistranslation of places, numbers and dates. Although this has little effect on the overall structure of the translated sentence, it indicates that the embeddings learned for these words could be improved.

Originality/value

Sign language recognition and translation is a crucial step toward improving communication between the Deaf and the rest of society. Because of the shortage of human interpreters, an alternative approach is desired to help people achieve smooth communication with the Deaf. To motivate research in this field, the authors generated an ISL corpus of 13,720 sentences and a video dataset of 47,880 ISL videos. As there is no public dataset available for ISL videos incorporating signs released by ISLRTC, the authors created a new video dataset and ISL corpus.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 3
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 5 April 2011

Werner Winiwarter

Abstract

Purpose

The purpose of this paper is to address the knowledge acquisition bottleneck problem in natural language processing by introducing a new rule‐based approach for the automatic acquisition of linguistic knowledge.

Design/methodology/approach

The author has developed a new machine translation methodology that only requires a bilingual lexicon and a parallel corpus of surface sentences aligned at the sentence level to learn new transfer rules.

Findings

A first prototype of a web‐based Japanese‐English translation system called Japanese‐English translation using corpus‐based acquisition of transfer (JETCAT) has been implemented in SWI‐Prolog, together with a Greasemonkey user script that analyzes Japanese web pages and translates sentences via Ajax. In addition, linguistic information is displayed at the character, word, and sentence level to provide a useful tool for web‐based language learning. An important feature is customization: the user can simply correct translation results, which leads to an incremental update of the knowledge base.

Research limitations/implications

This paper focuses on the technical aspects and user interface issues of JETCAT. The author plans to use JETCAT in a classroom setting to gather initial experience and will then evaluate a real‐world deployment. Work has also started on extending JETCAT to include collaborative features.

Practical implications

The research has a high practical impact on academic language education. It also could have implications for the translation industry by superseding certain translation tasks and, on the other hand, adding value and quality to others.

Originality/value

The paper presents an extended version of the paper receiving the Emerald Web Information Systems Best Paper Award at iiWAS2010.

Details

International Journal of Web Information Systems, vol. 7 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 3 November 2020

Jagroop Kaur and Jaswinder Singh

Abstract

Purpose

Normalization is an important step in all natural language processing applications that handle social media text. Text from social media poses problems that are not present in regular text. Recently, a considerable amount of work has been done in this direction, but mostly for the English language. People who do not speak English natively code-mix text with their native language and post it on social media using the Roman script. This kind of text further aggravates the normalization problem. This paper discusses the concept of normalization with respect to code-mixed social media text and proposes a model to normalize such text.

Design/methodology/approach

The system is divided into two phases: candidate generation and most probable sentence selection. The candidate generation task is treated as a machine translation task in which the Roman text is the source language and the Gurmukhi text is the target language. A character-based translation system is proposed to generate candidate tokens. Once candidates are generated, the second phase uses the beam search method to select the most probable sentence based on a hidden Markov model.
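The second phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `beam_search`, the placeholder tokens, and all probabilities are hypothetical. Each position has a list of candidate tokens with an emission-style probability, and a bigram table stands in for the transition probabilities of a hidden Markov model.

```python
import math


def beam_search(candidates, transition, beam_width=3):
    """Pick a high-probability token sequence.

    candidates: one list per position of (token, emission_prob) pairs.
    transition: function (prev_token_or_None, token) -> probability.
    """
    beams = [([], 0.0)]  # (sequence so far, log-probability)
    for options in candidates:
        expanded = []
        for seq, score in beams:
            prev = seq[-1] if seq else None
            for token, emission in options:
                p = emission * transition(prev, token)
                if p > 0:  # skip impossible continuations
                    expanded.append((seq + [token], score + math.log(p)))
        # Keep only the best `beam_width` partial sentences.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]
    return beams[0][0] if beams else []


# Hypothetical candidates for a two-token input (placeholder tokens).
example_candidates = [
    [("A1", 0.6), ("A2", 0.4)],
    [("B1", 0.5), ("B2", 0.5)],
]
# Hypothetical bigram (transition) probabilities; None marks sentence start.
example_bigrams = {
    (None, "A1"): 0.5, (None, "A2"): 0.5,
    ("A1", "B1"): 0.1, ("A1", "B2"): 0.2,
    ("A2", "B1"): 0.9, ("A2", "B2"): 0.1,
}


def example_transition(prev, token):
    return example_bigrams.get((prev, token), 0.0)
```

With a beam width of 1 the search is greedy and commits to the locally best first token; a wider beam keeps the alternative alive long enough for the transition probabilities to favor it.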

Findings

Character error rate (CER) and bilingual evaluation understudy (BLEU) scores are reported. The proposed system has been compared with the Akhar software and the RB_R2G system, which are also capable of transliterating Roman text to Gurmukhi. The proposed system outperforms the Akhar software. The CER and BLEU scores are 0.268121 and 0.6807939, respectively, for ill-formed text.
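For readers unfamiliar with the first metric, CER is commonly computed as the character-level edit distance between the system output and the reference, divided by the reference length. The sketch below assumes that common definition; the abstract does not state which variant the authors used, and the function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


print(character_error_rate("kitten", "sitting"))  # 3 edits / 6 chars -> 0.5
```

Lower values are better, and a CER of 0 means the output matches the reference exactly.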

Research limitations/implications

It was observed that the system produces dialectal variations of a word, or the word with minor errors such as a missing diacritic. A spell checker can improve the output of the system by correcting these minor errors. Extensive experimentation is needed to optimize the language identifier, which will further help in improving the output. The language model also needs further exploration. Inclusion of wider context, particularly from social media text, is an important area that deserves further investigation.

Practical implications

The practical implications of this study are: (1) the development of a parallel dataset containing Roman and Gurmukhi text; (2) the development of a dataset annotated with language tags; (3) the development of a normalizing system, the first of its kind, which proposes a translation-based solution for normalizing noisy social media text from Roman to Gurmukhi and can be extended to any pair of scripts; and (4) the use of the proposed system for better analysis of social media text. Theoretically, the study aids understanding of text normalization in the social media context and opens the door to further research in multilingual social media text normalization.

Originality/value

Existing research focuses on normalizing monolingual text. This study contributes towards the development of a normalization system for multilingual text.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 6 April 2012

Jiangping Chen, Ren Ding, Shan Jiang and Ryan Knudson

Abstract

Purpose

The purpose of this study is to evaluate freely available machine translation (MT) services' performance in translating metadata records.

Design/methodology/approach

Randomly selected metadata records were translated from English into Chinese using the Google, Bing, and SYSTRAN MT systems. These translations were then evaluated on a five-point scale for both fluency and adequacy. Missing count (words not translated) and incorrect count (words incorrectly translated) were also recorded.

Findings

Concerning both fluency and adequacy, Google and Bing's translations of more than 70 percent of test data received scores equal to or greater than three, representative of “non‐native Chinese” and “much coverage,” respectively. SYSTRAN scored lowest in both measures. However, these differences were not statistically significant. A Pearson correlation analysis demonstrated a strong relationship (r=0.86) between fluency and adequacy. Missing count and incorrect count strongly correlated with fluency and adequacy.
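The Pearson correlation reported above measures how closely two sets of scores follow a linear relationship. A minimal sketch of the standard formula follows; the function name `pearson_r` is illustrative, and any scores passed to it here would be hypothetical rather than the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Applied to per-record fluency and adequacy ratings, a value near 1 indicates that records rated fluent also tend to be rated adequate, which is the pattern the study reports with r=0.86.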

Originality/value

Most existing digital collections can be accessed in English alone. Few digital collections in the USA support multilingual information access (MLIA) that enables users of differing languages to search, browse, recognize and use information in the collections. Human translation is one solution, but it is neither time- nor cost-effective for most libraries. This study serves as a first step towards understanding the performance of current MT systems and designing effective and efficient MLIA services for digital collections.

Details

The Electronic Library, vol. 30 no. 2
Type: Research Article
ISSN: 0264-0473

Open Access
Article
Publication date: 17 July 2020

Imad Zeroual and Abdelhak Lakhouaja

Abstract

Recently, more data-driven approaches are demanding multilingual parallel resources, primarily in cross-language studies. To meet these demands, building multilingual parallel corpora is becoming the focus of many Natural Language Processing (NLP) research groups. Unlike monolingual corpora, multilingual parallel corpora are available only in limited numbers. This paper introduces MulTed, a corpus of subtitles extracted from TEDx talks. It is multilingual, Part of Speech (PoS) tagged, and bilingually sentence-aligned with English as a pivot language. The corpus is designed for the many NLP applications in which sentence alignment, PoS tagging, and corpus size are influential, such as statistical machine translation, language recognition, and bilingual dictionary generation. Currently, the corpus has subtitles that cover 1100 talks available in over 100 languages. The subtitles are classified by topic, such as Business, Education, and Sport. For PoS tagging, the Treetagger, a language-independent PoS tagger, is used; then, to make the PoS tagging maximally useful, a mapping to a universal common tagset is performed. Finally, we believe that making the MulTed corpus publicly available can be a significant contribution to the literature of NLP and corpus linguistics, especially for under-resourced languages.

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2210-8327

Article
Publication date: 27 April 2010

María‐Dolores Olvera‐Lobo and Lola García‐Santiago

Abstract

Purpose

This study evaluates systems for the automatic translation of questions intended for translingual question‐answer (QA) systems. The efficacy of online translators when performing as tools in QA systems is analysed using a collection of documents in the Spanish language.

Design/methodology/approach

Automatic translation is evaluated in terms of the functionality of actual translations produced by three online translators (Google Translator, Promt Translator, and Worldlingo) by means of objective and subjective evaluation measures, and the typology of errors produced was identified. For this purpose, a comparative study of the quality of the translation of factual questions of the CLEF collection of queries was carried out, from German and French to Spanish.

Findings

It was observed that the rates of error for the three systems evaluated here are greater in the translations pertaining to the language pair German‐Spanish. Promt was identified as the most reliable translator of the three (on average) for the two linguistic combinations evaluated. However, for the Spanish‐German pair, a good assessment of the Google online translator was obtained as well. Most errors (46.38 percent) tended to be of a lexical nature, followed by those due to a poor translation of the interrogative particle of the query (31.16 percent).

Originality/value

The evaluation methodology applied focuses above all on the purpose of the translation. That is, does the resulting question serve as effective input into a translingual QA system? Thus, instead of searching for “perfection”, the functionality of the question and its capacity to lead one to an adequate response are appraised. The results obtained contribute to the development of improved translingual QA systems.

Details

Journal of Documentation, vol. 66 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 6 April 2012

Wen Zeng

Abstract

Purpose

The paper explores the automatic construction of multilingual thesauri based on freely available digital library resources. The key methods and study results are presented. The paper also proposes a way to extract terms automatically from a multilingual parallel corpus.

Design/methodology/approach

The study used natural language processing techniques to analyze the linguistic characteristics of terms, and combined this with statistical analyses to extract the terms from technological documents. The methods consist of automatically extracting and filtering terms, judging and building relationships among terms, building the multilingual parallel corpus, and extracting term pairs between Chinese and foreign languages by calculating their association probability. The experiments were run on a Java test platform.

Findings

The study identifies the similarities and differences between the Chinese thesaurus standard and the international thesaurus standard. Methods for automatically extracting terms and building relationships among them are presented, and multilingual term translation sets are generated from real corpora. The results show that the proposed methods achieve better performance; in particular, the automatic term translation alignment method performs better than the traditional IBM model method.

Practical implications

The study results can serve as a reference for further study and application of the automatic construction of multilingual thesauri using Chinese as a pivot.

Originality/value

The paper proposes new ideas on automatic thesaurus construction in the digital age. The presented method, which combines linguistics and statistics, is a new attempt. According to the experimental results, this exploration is innovative and valuable. In addition, these ideas and methods provide a good starting point for improving the information services of the PRC's National Science and Technology Digital Library.

Details

The Electronic Library, vol. 30 no. 2
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 3 January 2020

Hany M. Alsalmi

Abstract

Purpose

Little attention has been paid to users’ interactions and behavior in studies of multilingual search. Although digital library researchers have yet to assess user interaction and behavior in multilingual search, they have concurred that there is a need for user studies that document the extent to which information retrieval systems meet multilingual users’ needs and expectations. The paper aims to discuss these issues.

Design/methodology/approach

This study is composed of five individual cases. The case study participants were Saudi students enrolled either at a large state university or at a Historically Black College and University located in the same community. The research questions are: what do Saudi Digital Library (SDL) users experience when searching within the SDL in Arabic and English, and what strategies do they use if they fail to find resources? Data for this study were collected via a qualitative method called video-stimulated recall.

Findings

In the Arabic search tasks, participants realized that finding resources is not easy. Participants expressed their concerns about the lack of relevance and accuracy of results returned by the search system, indicating weak trust and confidence in the search system. In the English search task, by contrast, participants felt more satisfied and more confident in their ability to trust the results returned by the search system. Participants expressed their satisfaction with the search experience, as it provided them with accurate and varied resources. The participants faced more difficulty finding Arabic resources than English resources in the SDL.

Originality/value

This study is considered one of the earliest works on the information-seeking behavior of users of multilingual digital libraries in the Arabic language. Its value lies in being the first study to investigate and report the information-seeking behavior of SDL users.

Article
Publication date: 3 September 2019

Sharon O’Brien and Federico Marco Federici

Abstract

Purpose

The purpose of this paper is to highlight the role that language translation can play in disaster prevention and management and to make the case for increased attention to language translation in crisis communication.

Design/methodology/approach

The paper draws on literature relating to disaster management to suggest that translation is a perennial issue in crisis communication.

Findings

Although communication with multicultural and multilinguistic communities is seen as being in urgent need of attention, the authors find that the role of translation in enabling this is underestimated, if not unrecognized.

Originality/value

This paper raises awareness of the need for urgent attention to be given by scholars and practitioners to the role of translation in crisis communication.

Details

Disaster Prevention and Management: An International Journal, vol. 29 no. 2
Type: Research Article
ISSN: 0965-3562
