Search results
Abstract
This paper describes the research and development activities carried out in the framework of the Translearn project. The aim of the project is to build a translation memory tool and the appropriate translation work environment. Translearn's application corpus consists of regulations and directives of the European Union (EU), extracted from the CELEX database, the EU's documentation system on EU law, and the language versions it concentrates on are English, French, Portuguese and Greek. The development of the prototype tool for the envisaged system proves the application's usefulness in the translation process of international multilingual organizations as well as in the localization‐internationalization process of international enterprises.
Chengzhi Zhang and Dan Wu
Abstract
Purpose
Terminology is the set of technical words or expressions used in specific contexts, which denotes the core concept in a formal discipline and is usually applied in the fields of machine translation, information retrieval, information extraction and text categorization, etc. Bilingual terminology extraction plays an important role in the application of bilingual dictionary compilation, bilingual ontology construction, machine translation and cross‐language information retrieval etc. This paper aims to address the issues of monolingual terminology extraction and bilingual term alignment based on multi‐level termhood.
Design/methodology/approach
A method based on multi‐level termhood is proposed. The new method computes the termhood of the terminology candidate as well as the sentence that includes the terminology by the comparison of the corpus. Since terminologies and general words usually have different distribution in the corpus, termhood can also be used to constrain and enhance the performance of term alignment when aligning bilingual terms on the parallel corpus. In this paper, bilingual term alignment based on termhood constraints is presented.
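The corpus-comparison idea can be sketched as follows. The abstract does not give the termhood formula, so this sketch assumes a simple smoothed log-ratio of relative frequencies between a domain corpus and a general corpus, with sentence-level termhood taken as the mean over tokens. The function names and the `smoothing` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def termhood_scores(domain_tokens, general_tokens, smoothing=1.0):
    """Score each domain token by the log-ratio of its smoothed relative
    frequency in the domain corpus to that in the general corpus.

    Assumed stand-in for the paper's termhood measure, not its formula.
    """
    domain_counts = Counter(domain_tokens)
    general_counts = Counter(general_tokens)
    vocab = set(domain_counts) | set(general_counts)
    # Add-k smoothing keeps the ratio finite for words unseen in one corpus.
    domain_total = len(domain_tokens) + smoothing * len(vocab)
    general_total = len(general_tokens) + smoothing * len(vocab)
    scores = {}
    for token, count in domain_counts.items():
        p_domain = (count + smoothing) / domain_total
        p_general = (general_counts[token] + smoothing) / general_total
        scores[token] = math.log(p_domain / p_general)
    return scores


def sentence_termhood(sentence_tokens, scores):
    """Mean token termhood of a sentence; unknown tokens contribute 0."""
    if not sentence_tokens:
        return 0.0
    return sum(scores.get(t, 0.0) for t in sentence_tokens) / len(sentence_tokens)
```

Under these assumptions, words that are relatively more frequent in the domain corpus score above zero and common general-language words score below zero. A threshold on either score could then serve as the kind of constraint on alignment candidates that the abstract describes.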
Findings
Experimental results show multi‐level termhood can get better performance than the existing method for terminology extraction. If termhood is used as a constraining factor, the performance of bilingual term alignment can be improved.
Originality/value
The termhood of the candidate terminology and the sentence that includes the terminology is used for terminology extraction, which is called multi‐level termhood. Multi‐level termhood is computed by the comparison of the corpus. Bilingual term alignment method based on termhood constraint is put forward and termhood is used in the task of bilingual terminology extraction. Experimental results show that termhood constraints can improve the performance of terminology alignment to some extent.
Abstract
Purpose
The purpose of this paper is to address the knowledge acquisition bottleneck problem in natural language processing by introducing a new rule‐based approach for the automatic acquisition of linguistic knowledge.
Design/methodology/approach
The author has developed a new machine translation methodology that only requires a bilingual lexicon and a parallel corpus of surface sentences aligned at the sentence level to learn new transfer rules.
Findings
A first prototype of a web‐based Japanese‐English translation system called Japanese‐English translation using corpus‐based acquisition of transfer (JETCAT) has been implemented in SWI‐Prolog, together with a Greasemonkey user script that analyzes Japanese web pages and translates sentences via Ajax. In addition, linguistic information is displayed at the character, word, and sentence level to provide a useful tool for web‐based language learning. An important feature is customization: the user can simply correct translation results, leading to an incremental update of the knowledge base.
Research limitations/implications
This paper focuses on the technical aspects and user interface issues of JETCAT. The author is planning to use JETCAT in a classroom setting to gather first experiences and will then evaluate a real‐world deployment; also work has started on extending JETCAT to include collaborative features.
Practical implications
The research has a high practical impact on academic language education. It also could have implications for the translation industry by superseding certain translation tasks and, on the other hand, adding value and quality to others.
Originality/value
The paper presents an extended version of the paper receiving the Emerald Web Information Systems Best Paper Award at iiWAS2010.
Imad Zeroual and Abdelhak Lakhouaja
Abstract
Recently, more data-driven approaches are demanding multilingual parallel resources, primarily in cross-language studies. To meet these demands, building multilingual parallel corpora is becoming the focus of many Natural Language Processing (NLP) scientific groups. Unlike monolingual corpora, the number of available multilingual parallel corpora is limited. In this paper, MulTed, a corpus of subtitles extracted from TEDx talks, is introduced. It is multilingual, Part of Speech (PoS) tagged, and bilingually sentence-aligned with English as a pivot language. This corpus is designed for the many NLP applications in which sentence alignment, PoS tagging, and corpus size are influential, such as statistical machine translation, language recognition, and bilingual dictionary generation. Currently, the corpus has subtitles that cover 1100 talks available in over 100 languages. The subtitles are classified based on a variety of topics such as Business, Education, and Sport. Regarding the PoS tagging, the Treetagger, a language-independent PoS tagger, is used; then, to make the PoS tagging maximally useful, a mapping process to a universal common tagset is performed. Finally, we believe that making the MulTed corpus available for public use can be a significant contribution to the literature of NLP and corpus linguistics, especially for under-resourced languages.
Yuto Ishida, Takahiro Uchiya and Ichi Takumi
Abstract
Purpose
In recent years, e-commerce (EC) sites dealing in various goods and services have increased along with internet popularity. Yet very few EC recommendation systems present a concrete reason for their recommendations. Because user preferences strongly influence outcomes, evaluating and selecting items such as books, movies and luxury goods is difficult. The purpose of this paper is to evoke interest by showing a review as a reason that supports the user’s decision-making. This paper aims to present the development and introduction of a recommendation system that presents a review adapted to user preference.
Design/methodology/approach
The system presents a review to the user, which indicates the reason for matching the item contents and user preferences. Thereby, this system enables the creation of personalized reasons for recommendations.
Findings
Recommendation sentences conforming to user preferences are effective for item selection. Even with a simple method, in this paper, it was possible to present a review which is an item selection factor sufficient for the user.
Originality/value
This system can show a recommendation sentence that conforms to a user’s preferences merely from a user profile with the tag data of a product. This paper dealt in movies, but it can easily be applied even for other items.
Jelena Andonovski, Branislava Šandrih and Olivera Kitanović
Abstract
Purpose
This paper aims to describe the structure of an aligned Serbian-German literary corpus (SrpNemKor) contained in a digital library Bibliša. The goal of the research was to create a benchmark Serbian-German annotated corpus searchable with various query expansions.
Design/methodology/approach
The presented research is particularly focused on the enhancement of bilingual search queries in a full-text search of aligned SrpNemKor collection. The enhancement is based on using existing lexical resources such as Serbian morphological electronic dictionaries and the bilingual lexical database Termi.
Findings
For the purpose of this research, the lexical database Termi is enriched with a bilingual list of German-Serbian translated pairs of lexical units. The list of correct translation pairs was extracted from SrpNemKor, evaluated and integrated into Termi. Also, Serbian morphological e-dictionaries are updated with new entries extracted from the Serbian part of the corpus.
Originality/value
A bilingual search of SrpNemKor in Bibliša is available within a user-friendly platform. The enriched database Termi enables semantic enhancement and refinement of a user’s search query based on synonyms in both Serbian and German. Serbian morphological e-dictionaries facilitate the morphological expansion of search queries in Serbian. This enables the analysis of concepts and concept structures by identifying terms assigned to a concept and by establishing relations between terms in Serbian and German, which makes Bibliša a valuable Web tool that can support research and analysis of SrpNemKor.
Mehrdad Vasheghani Farahani and Zeinab Amiri
Abstract
Purpose
In an effort to bridge the gap between applying translation corpora, specialized terminology teaching and translation performance of undergraduate students, the purpose of this paper is to investigate the possible impacts of teaching specialized terminology of law as a specific area of inquiry on the translation performance of Iranian undergraduate translation students (English–Persian language pair). The null hypothesis of this study is that using specialized terminology does not have a statistically significant impact on the translation performance of the translation students.
Design/methodology/approach
The design of this research was experimental in that there was a pretest, a treatment, a posttest and random sampling. In other words, this research used a pre-experimental one-group pretest-posttest design. This design was used because the number of subjects who participated in the research was limited. Apart from being experimental, this research took a corpus-based perspective. As McEnery and Hardie (2012) claim, corpus-based research uses the “corpus data in order to explore a theory or hypothesis, typically one established in the current literature, in order to validate it, refute it or refine it” (p. 6). Table I shows the design of this research.
Findings
The results of this research indicated that, on the whole, the posttest results differed statistically significantly from those of the pretest. In this regard, the quality of the students’ translations improved after using the specialized terminology in the form of three types of corpora. Indeed, there was a general trend of improved quality in the novice translators’ translation of specialized and subject-field terminology in an English–Persian context.
Originality/value
This paper is original in that it probes into one of the less researched areas of Translation Studies Research and employs corpora methodology.
Wen-Feng Hsiao, Te-Min Chang and Erwin Thomas
Abstract
Purpose
The purpose of this paper is to propose an automatic metadata extraction and retrieval system to extract bibliographical information from digital academic documents in portable document formats (PDFs).
Design/methodology/approach
The authors use PDFBox to extract text and font size information, a rule-based method to identify titles, and a Hidden Markov Model (HMM) to extract the titles and authors. Finally, the extracted titles and authors (possibly incorrect or incomplete) are sent as query strings to digital libraries (e.g. ACM, IEEE, CiteSeerX, SDOS, and Google Scholar) to retrieve the rest of the metadata.
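As a rough illustration of how an HMM can label header tokens as title or author, the following sketch runs generic Viterbi decoding over font-size observations. It is not the authors' model: the two states, the `large`/`small` observation symbols and every probability below are hypothetical values chosen for the example, whereas the paper trains its model on the Cora header set.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely state sequence for the observations.

    Works in log space; emissions missing from emit_p fall back to floor.
    """
    if not observations:
        return []

    def log_emit(state, obs):
        return math.log(emit_p[state].get(obs, floor))

    best = [{s: math.log(start_p[s]) + log_emit(s, observations[0]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the best predecessor state for s at this step.
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            scores[s] = score + log_emit(s, obs)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy parameters: titles tend to use a large font, authors a small one.
STATES = ["title", "author"]
START_P = {"title": 0.9, "author": 0.1}
TRANS_P = {
    "title": {"title": 0.7, "author": 0.3},
    "author": {"title": 0.05, "author": 0.95},
}
EMIT_P = {
    "title": {"large": 0.8, "small": 0.2},
    "author": {"large": 0.1, "small": 0.9},
}

labels = viterbi(["large", "large", "small", "small"], STATES, START_P, TRANS_P, EMIT_P)
```

In a pipeline like the one described, the decoded title and author spans would then be joined into query strings for the digital libraries.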
Findings
Four experiments are conducted to examine the feasibility of the proposed system. The first experiment compares two different HMM models: a multi-state model and a one-state model (the proposed model). The result shows that the one-state model achieves performance comparable with the multi-state model, but is more suitable for dealing with real-world unknown states. The second experiment shows that the proposed model (without the aid of online query) performs as well as other researchers' models on the Cora paper header dataset. The third experiment examines the performance of the system on a small dataset of 43 real PDF research papers. The result shows that the proposed system (with online query) performs well on bibliographical data extraction and even outperforms the free citation management tool Zotero 3.0. Finally, the fourth experiment uses a larger dataset of 103 papers to compare the system with Zotero 4.0. The result shows that the system significantly outperforms Zotero 4.0. The feasibility of the proposed model is thus justified.
Research limitations/implications
In terms of academic implications, the system is unique in two respects. First, the system only uses the Cora header set for HMM training, without using other tagged datasets or gazetteer resources, which means the system is light and scalable. Second, the system is workable and can be applied to extracting metadata from real-world PDF files. The extracted bibliographical data can then be imported into citation software such as EndNote or RefWorks to increase researchers’ productivity.
Practical implications
For practical implication, the system can outperform the existing tool, Zotero v4.0. This provides practitioners good chances to develop similar products in real applications; though it might require some knowledge about HMM implementation.
Originality/value
The HMM implementation is not novel. What is innovative is that it combines two HMM models. The main model is adapted from Freitag and McCallum (1999), and the authors add word features of the Nymble HMM (Bikel et al., 1997) to it. The system is workable even without manually tagging the datasets before training the model (the authors just use the Cora dataset to train, and test on real-world PDF papers), which is significantly different from what other works have done so far. The experimental results provide sufficient evidence of the feasibility of the proposed method in this respect.
Steve LeMay, Marilyn M. Helms, Bob Kimball and Dave McMahon
Abstract
Purpose
The purpose of this paper is to gather the current definitions of supply chain management in practical and analytical usage, to develop standards for assessing definitions and to apply these standards to the most readily available definitions of the term.
Design/methodology/approach
In this research, the authors gathered the current definitions of supply chain management in practical and analytical usage from journals, textbooks, universities, and industry associations and online.
Findings
The research ends with proposed definitions for consideration. Discussion and areas for future research are included.
Research limitations/implications
Involved organizations, supply chain management programs in higher education, and professional and certifying organizations in the field need to meet and work together to reach consensus on the final definition of the field, realizing that definitions can evolve, but also recognizing that a starting point is needed in this rapidly growing area.
Practical implications
The authors argue, quite simply, that a consensus definition of supply chain management is unlikely as long as we continue offering and accepting definitions that are technically unsound. Many of the current definitions violate several principles of good definitions. For these reasons, they are either empty, too restrictive, or too expansive. Until we come across or develop a definition that overcomes these limitations and agree on it, then we will still search for “the” definition without finding it. The field will become more crowded with definitions, but less certain, and progress will be restricted.
Originality/value
Theoreticians, researchers, and practitioners in a discipline require key terms in a field to share a nominal definition and prefer to have a shared real or essential definition. Yet in supply chain management, we find no such shared definition, real or nominal. Even the Council of Supply Chain Management Professionals offers its definition with the caveat: “The supply chain management (SCM) profession has continued to change and evolve to fit the needs of the growing global supply chain. With the supply chain covering a broad range of disciplines, the definition of what is a supply chain can be unclear” (CSCMP, 2016).
Abstract
Purpose
In the COVID-19 era, sign language (SL) translation has gained attention in online learning, where it evaluates the physical gestures of each student and bridges the communication gap between people with dysphonia and hearing people. The purpose of this paper is to align SL sequences with natural language sequences while achieving high translation performance.
Design/methodology/approach
SL can be characterized as joint/bone location information in two-dimensional space over time, forming skeleton sequences. To encode joint, bone and their motion information, we propose a multistream hierarchy network (MHN) along with a vocab prediction network (VPN) and a joint network (JN) with the recurrent neural network transducer. The JN is used to concatenate the sequences encoded by the MHN and VPN and learn their sequence alignments.
Findings
We verify the effectiveness of the proposed approach and provide experimental results on three large-scale datasets, which show that translation accuracy is 94.96, 54.52, and 92.88 per cent, and that inference is 18 and 1.7 times faster than the listen-attend-spell network (LAS) and the visual hierarchy to lexical sequence network (H2SNet), respectively.
Originality/value
In this paper, we propose a novel framework that can fuse multimodal input (i.e. joint, bone and their motion streams) and align the input streams with natural language. Moreover, the framework benefits from the different properties of the MHN, VPN and JN. Experimental results on the three datasets demonstrate that our approach outperforms the state-of-the-art methods in terms of translation accuracy and speed.