Search results

1 – 10 of over 1000
Article
Publication date: 1 March 1995

Stelios Piperidis

Abstract

This paper describes the research and development activities carried out in the framework of the Translearn project. The aim of the project is to build a translation memory tool and the appropriate translation work environment. Translearn's application corpus consists of regulations and directives of the European Union (EU), extracted from the CELEX database, the EU's documentation system on EU law, and the language versions it concentrates on are English, French, Portuguese and Greek. The development of the prototype tool for the envisaged system proves the application's usefulness in the translation process of international multilingual organizations as well as in the localization‐internationalization process of international enterprises.

Details

Aslib Proceedings, vol. 47 no. 3
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 6 April 2012

Chengzhi Zhang and Dan Wu

Abstract

Purpose

Terminology is the set of technical words or expressions used in specific contexts that denote the core concepts of a formal discipline; it is usually applied in fields such as machine translation, information retrieval, information extraction and text categorization. Bilingual terminology extraction plays an important role in applications such as bilingual dictionary compilation, bilingual ontology construction, machine translation and cross‐language information retrieval. This paper aims to address the issues of monolingual terminology extraction and bilingual term alignment based on multi‐level termhood.

Design/methodology/approach

A method based on multi‐level termhood is proposed. The new method computes the termhood of both the terminology candidate and the sentence that contains it by comparison across corpora. Since terminologies and general words usually have different distributions in a corpus, termhood can also be used to constrain and enhance the performance of term alignment when aligning bilingual terms on a parallel corpus. In this paper, bilingual term alignment based on termhood constraints is presented.
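The abstract does not give the termhood formula, but the corpus-comparison idea can be sketched as a contrastive frequency ratio: a candidate scores high when it is much more frequent in the domain corpus than in a general reference corpus. The function name, word counts and smoothing below are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def termhood(candidate, domain_counts, general_counts):
    """Contrastive termhood sketch: ratio of the candidate's relative
    frequency in the domain corpus to its (smoothed) relative
    frequency in a general reference corpus."""
    d_rel = domain_counts[candidate] / sum(domain_counts.values())
    g_rel = (general_counts[candidate] + 1) / (sum(general_counts.values()) + 1)
    return d_rel / g_rel

# Toy word counts: a small domain corpus vs a general reference corpus
domain = Counter({"neural": 40, "network": 35, "the": 300})
general = Counter({"neural": 1, "network": 3, "the": 5000})

# A domain-specific word should outscore a general function word
print(termhood("neural", domain, general) > termhood("the", domain, general))
```

The sentence-level termhood of the paper's multi-level scheme could then aggregate these word-level scores over the sentence containing the candidate.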

Findings

Experimental results show that multi‐level termhood achieves better performance than existing methods for terminology extraction. When termhood is used as a constraining factor, the performance of bilingual term alignment is also improved.

Originality/value

The termhood of both the candidate terminology and the sentence that contains it is used for terminology extraction; this is called multi‐level termhood and is computed by comparison across corpora. A bilingual term alignment method based on termhood constraints is put forward, applying termhood to the task of bilingual terminology extraction. Experimental results show that termhood constraints can improve the performance of terminology alignment to some extent.

Article
Publication date: 5 April 2011

Werner Winiwarter

Abstract

Purpose

The purpose of this paper is to address the knowledge acquisition bottleneck problem in natural language processing by introducing a new rule‐based approach for the automatic acquisition of linguistic knowledge.

Design/methodology/approach

The author has developed a new machine translation methodology that only requires a bilingual lexicon and a parallel corpus of surface sentences aligned at the sentence level to learn new transfer rules.
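The abstract only states that transfer rules are learned from a bilingual lexicon and a sentence-aligned corpus. One heavily reduced way to sketch that idea (not the actual JETCAT algorithm, which is rule-based and far richer) is to propose a lexical transfer rule whenever exactly one source word and one target word in an aligned pair are left unexplained by the lexicon:

```python
def learn_transfer_rules(pairs, lexicon):
    """Hypothetical sketch: from sentence-aligned pairs and a bilingual
    lexicon, propose a lexical transfer rule when exactly one source
    word and one target word are left unexplained by the lexicon."""
    rules = {}
    known_targets = set(lexicon.values())
    for src, tgt in pairs:
        unknown_src = [w for w in src.split() if w not in lexicon]
        unmatched_tgt = [w for w in tgt.split() if w not in known_targets]
        if len(unknown_src) == 1 and len(unmatched_tgt) == 1:
            rules[unknown_src[0]] = unmatched_tgt[0]
    return rules

# Toy Japanese-English pair (romanized for readability); "ga" is a
# particle with no direct English counterpart in this toy lexicon
lexicon = {"inu": "dog", "ga": ""}
rules = learn_transfer_rules([("inu ga hashiru", "dog runs")], lexicon)
print(rules)
```

In the same spirit, a user's correction of a translation result would add or update one such rule, giving the incremental knowledge-base update the abstract describes.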

Findings

A first prototype of a web‐based Japanese‐English translation system called Japanese‐English translation using corpus‐based acquisition of transfer (JETCAT) has been implemented in SWI‐Prolog, together with a Greasemonkey user script that analyzes Japanese web pages and translates sentences via Ajax. In addition, linguistic information is displayed at the character, word and sentence level to provide a useful tool for web‐based language learning. An important feature is customization: the user can simply correct translation results, leading to an incremental update of the knowledge base.

Research limitations/implications

This paper focuses on the technical aspects and user interface issues of JETCAT. The author is planning to use JETCAT in a classroom setting to gather first experiences and will then evaluate a real‐world deployment; work has also started on extending JETCAT with collaborative features.

Practical implications

The research has a high practical impact on academic language education. It also could have implications for the translation industry by superseding certain translation tasks and, on the other hand, adding value and quality to others.

Originality/value

The paper presents an extended version of the paper receiving the Emerald Web Information Systems Best Paper Award at iiWAS2010.

Details

International Journal of Web Information Systems, vol. 7 no. 1
Type: Research Article
ISSN: 1744-0084


Article
Publication date: 17 July 2020

Imad Zeroual and Abdelhak Lakhouaja

Abstract

Recently, data-driven approaches have increasingly demanded multilingual parallel resources, primarily in cross-language studies. To meet these demands, building multilingual parallel corpora has become the focus of many Natural Language Processing (NLP) research groups. Unlike monolingual corpora, the number of available multilingual parallel corpora is limited. In this paper, MulTed, a corpus of subtitles extracted from TEDx talks, is introduced. It is multilingual, Part of Speech (PoS) tagged, and bilingually sentence-aligned with English as a pivot language. The corpus is designed for NLP applications in which sentence alignment, PoS tagging and corpus size are influential, such as statistical machine translation, language recognition and bilingual dictionary generation. Currently, the corpus has subtitles covering 1100 talks available in over 100 languages. The subtitles are classified by topic, such as Business, Education and Sport. For PoS tagging, TreeTagger, a language-independent PoS tagger, is used; then, to make the PoS tagging maximally useful, a mapping to a universal common tagset is performed. Finally, we believe that making the MulTed corpus available for public use can be a significant contribution to the literature of NLP and corpus linguistics, especially for under-resourced languages.
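The tagset-mapping step can be illustrated with a minimal sketch: language-specific tags produced by a tagger such as TreeTagger are projected onto a universal common tagset. The mapping table below is a hypothetical fragment, not the one actually used for MulTed.

```python
# Hypothetical fragment of a tagger-specific-to-universal tag mapping;
# the real MulTed mapping is larger and language-dependent.
TAG_MAP = {"NN": "NOUN", "NNS": "NOUN", "VB": "VERB", "VBD": "VERB",
           "JJ": "ADJ", "RB": "ADV"}

def to_universal(tagged_tokens):
    """Project language-specific PoS tags onto the common tagset,
    falling back to 'X' for tags without a mapping."""
    return [(tok, TAG_MAP.get(tag, "X")) for tok, tag in tagged_tokens]

print(to_universal([("talks", "NNS"), ("inspire", "VB"), ("wow", "UH")]))
```

Mapping every language's tagger output through one such table is what makes the PoS annotations comparable across the corpus's 100+ languages.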

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2210-8327


Article
Publication date: 18 April 2017

Yuto Ishida, Takahiro Uchiya and Ichi Takumi

Abstract

Purpose

In recent years, e-commerce (EC) sites dealing in various goods and services have increased along with internet popularity. Very few EC recommendation systems, however, present a concrete reason for their recommendations. Because user preferences strongly influence outcomes, evaluation and selection are therefore difficult for items such as books, movies and luxury goods. The purpose of this paper is to evoke interest by showing a review as a decision-making factor that explains a recommendation. The paper presents the development and introduction of a recommendation system that presents reviews adapted to user preferences.

Design/methodology/approach

The system presents the user with a review that indicates why the item's contents match the user's preferences. Thereby, the system enables the creation of personalized reasons for recommendations.

Findings

Recommendation sentences that conform to user preferences are effective for item selection. Even with a simple method, it was possible to present a review that served as a sufficient item-selection factor for the user.

Originality/value

This system can show a recommendation sentence that conforms to a user's preferences merely from a user profile and a product's tag data. This paper dealt with movies, but the approach can easily be applied to other items.

Details

International Journal of Web Information Systems, vol. 13 no. 1
Type: Research Article
ISSN: 1744-0084


Article
Publication date: 2 September 2019

Jelena Andonovski, Branislava Šandrih and Olivera Kitanović

Abstract

Purpose

This paper aims to describe the structure of an aligned Serbian-German literary corpus (SrpNemKor) contained in a digital library Bibliša. The goal of the research was to create a benchmark Serbian-German annotated corpus searchable with various query expansions.

Design/methodology/approach

The presented research is particularly focused on the enhancement of bilingual search queries in a full-text search of the aligned SrpNemKor collection. The enhancement is based on existing lexical resources such as Serbian morphological electronic dictionaries and the bilingual lexical database Termi.

Findings

For the purpose of this research, the lexical database Termi is enriched with a bilingual list of German-Serbian translated pairs of lexical units. The list of correct translation pairs was extracted from SrpNemKor, evaluated and integrated into Termi. Also, Serbian morphological e-dictionaries are updated with new entries extracted from the Serbian part of the corpus.

Originality/value

A bilingual search of SrpNemKor in Bibliša is available within the user-friendly platform. The enriched database Termi enables semantic enhancement and refinement of a user's search query based on synonyms, both in Serbian and German, at a very high level. Serbian morphological e-dictionaries facilitate the morphological expansion of search queries in Serbian, thereby enabling the analysis of concepts and concept structures by identifying the terms assigned to a concept and by establishing relations between terms in Serbian and German. This makes Bibliša a valuable web tool that can support research and analysis of SrpNemKor.
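The two kinds of query expansion described here can be sketched together: a morphological e-dictionary supplies inflected forms, and a lexical database supplies synonyms whose forms are expanded in turn. The Serbian entries below are hypothetical, transliterated examples, not drawn from Termi or the actual e-dictionaries.

```python
def expand_query(term, morph_dict, synonyms):
    """Bibliša-style query enhancement sketch: expand a search term
    with its inflected forms (morphological e-dictionary) and with
    its synonyms' forms (lexical database)."""
    forms = set(morph_dict.get(term, [term]))
    for syn in synonyms.get(term, []):
        forms.update(morph_dict.get(syn, [syn]))
    return sorted(forms)

# Hypothetical Serbian entries (transliterated): "grad"/"varoš" = town
morph = {"grad": ["grad", "grada", "gradu", "gradom"],
         "varoš": ["varoš", "varoši"]}
syn = {"grad": ["varoš"]}
print(expand_query("grad", morph, syn))
```

A full-text search over the aligned collection would then match any of the expanded forms, which is what makes the morphologically rich Serbian side searchable from a single query term.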

Details

The Electronic Library, vol. 37 no. 4
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 1 May 2019

Mehrdad Vasheghani Farahani and Zeinab Amiri

Abstract

Purpose

In an effort to bridge the gap between applying translation corpora, specialized terminology teaching and the translation performance of undergraduate students, the purpose of this paper is to investigate the possible impact of teaching the specialized terminology of law, as a specific area of inquiry, on the translation performance of Iranian undergraduate translation students (English–Persian language pair). The null hypothesis of this study is that using specialized terminology does not have a statistically significant impact on the translation performance of the translation students.

Design/methodology/approach

The design of this research was experimental in that there were a pretest, a treatment, a posttest and random sampling. In other words, this research used a pre-experimental one-group pretest-posttest design, chosen because the number of subjects who participated was limited. Apart from being experimental, this research adopted a corpus-based perspective. As McEnery and Hardie (2012) claim, corpus-based research uses the “corpus data in order to explore a theory or hypothesis, typically one established in the current literature, in order to validate it, refute it or refine it” (p. 6). Table I shows the design of this research.
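For a one-group pretest-posttest design like this, the standard analysis is a paired t-test on each student's score difference. The scores below are invented for illustration; the paper's own data are not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """t statistic for a one-group pretest-posttest design:
    mean per-subject score difference divided by its standard error."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Invented translation-quality scores for six students
pre = [55, 60, 48, 62, 50, 58]
post = [68, 71, 59, 70, 66, 69]
print(round(paired_t(pre, post), 2))
```

A large positive t (compared against the t distribution with n−1 degrees of freedom) is what would justify rejecting the study's null hypothesis of no effect.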

Findings

The results of this research indicated that, on the whole, the posttest results differed from the pretest results in a statistically significant way. In this regard, the quality of the students' translations improved after using the specialized terminology in the form of three types of corpora. Indeed, there was a general trend of improvement in how the novice translators handled specialized and subject-field terminology in an English–Persian context.

Originality/value

This paper is original in that it probes one of the less researched areas of translation studies research and employs corpus methodology.

Details

Journal of Applied Research in Higher Education, vol. 11 no. 3
Type: Research Article
ISSN: 2050-7003


Article
Publication date: 1 July 2014

Wen-Feng Hsiao, Te-Min Chang and Erwin Thomas

Abstract

Purpose

The purpose of this paper is to propose an automatic metadata extraction and retrieval system to extract bibliographical information from digital academic documents in portable document formats (PDFs).

Design/methodology/approach

The authors use PDFBox to extract text and font-size information, a rule-based method to identify titles, and a hidden Markov model (HMM) to extract the titles and authors. Finally, the extracted titles and authors (possibly incorrect or incomplete) are sent as query strings to digital libraries (e.g. ACM, IEEE, CiteSeerX, SDOS and Google Scholar) to retrieve the rest of the metadata.
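The rule-based title step can be illustrated with a minimal sketch: among the text runs on the first page, the one with the largest font size is taken as the title candidate. The record layout and helper name below are hypothetical and do not reflect PDFBox's actual API.

```python
def identify_title(lines):
    """Rule-based title heuristic sketch: among first-page text runs
    (each with the font size reported by the PDF extractor), pick the
    run with the largest font size as the title candidate."""
    return max(lines, key=lambda ln: ln["font_size"])["text"]

# Hypothetical first-page text runs with extracted font sizes
lines = [
    {"text": "Program, vol. 48", "font_size": 9.0},
    {"text": "Automatic Metadata Extraction", "font_size": 18.0},
    {"text": "W.-F. Hsiao, T.-M. Chang", "font_size": 11.0},
]
print(identify_title(lines))
```

The candidate produced this way may be wrong or incomplete, which is exactly why the pipeline then uses it only as a query string against digital libraries to fetch authoritative metadata.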

Findings

Four experiments were conducted to examine the feasibility of the proposed system. The first experiment compares two different HMM designs: a multi-state model and a one-state model (the proposed model). The result shows that the one-state model achieves performance comparable with the multi-state model but is more suitable for dealing with real-world unknown states. The second experiment shows that the proposed model (without the aid of online queries) achieves performance as good as other researchers' models on the Cora paper-header dataset. The third experiment examines the performance of the system on a small dataset of 43 real PDF research papers; the result shows that the proposed system (with online queries) performs well on bibliographical data extraction and even outperforms the free citation management tool Zotero 3.0. Finally, a fourth experiment with a larger dataset of 103 papers compares the system with Zotero 4.0; the result shows that the system significantly outperforms Zotero 4.0. The feasibility of the proposed model is thus justified.

Research limitations/implications

For academic implications, the system is unique in two respects: first, it uses only the Cora header set for HMM training, without other tagged datasets or gazetteer resources, which means the system is light and scalable. Second, the system is workable and can be applied to extracting metadata from real-world PDF files. The extracted bibliographical data can then be imported into citation software such as EndNote or RefWorks to increase researchers' productivity.

Practical implications

For practical implications, the system can outperform the existing tool Zotero 4.0. This gives practitioners a good opportunity to develop similar products for real applications, though it might require some knowledge of HMM implementation.

Originality/value

The HMM implementation is not novel. What is innovative is that it combines two HMM models: the main model is adapted from Freitag and McCallum (1999), and the authors add word features of the Nymble HMM (Bikel et al., 1997) to it. The system is workable even without manually tagging datasets before training the model (the authors use only the Cora dataset for training and test on real-world PDF papers), which is significantly different from what other works have done so far. The experimental results provide sufficient evidence of the feasibility of the proposed method in this respect.

Details

Program, vol. 48 no. 3
Type: Research Article
ISSN: 0033-0337


Article
Publication date: 13 November 2017

Steve LeMay, Marilyn M. Helms, Bob Kimball and Dave McMahon

Abstract

Purpose

The purpose of this paper is to gather the current definitions of supply chain management in practical and analytical usage, to develop standards for assessing definitions and to apply these standards to the most readily available definitions of the term.

Design/methodology/approach

In this research, the authors gathered the current definitions of supply chain management in practical and analytical usage from journals, textbooks, universities, industry associations and online sources.

Findings

The research ends with proposed definitions for consideration. Discussion and areas for future research are included.

Research limitations/implications

Involved organizations, supply chain management programs in higher education, and professional and certifying organizations in the field need to meet and work together to reach consensus on the final definition of the field, realizing that definitions can evolve, but also recognizing that a starting point is needed in this rapidly growing area.

Practical implications

The authors argue, quite simply, that a consensus definition of supply chain management is unlikely as long as we continue offering and accepting definitions that are technically unsound. Many of the current definitions violate several principles of good definitions. For these reasons, they are either empty, too restrictive, or too expansive. Until we come across or develop a definition that overcomes these limitations and agree on it, then we will still search for “the” definition without finding it. The field will become more crowded with definitions, but less certain, and progress will be restricted.

Originality/value

Theoreticians, researchers and practitioners in a discipline require key terms in a field to share a nominal definition and prefer to have a shared real or essential definition. Yet in supply chain management, we find no such shared definition, real or nominal. Even the Council of Supply Chain Management Professionals offers its definition with the caveat: “The supply chain management (SCM) profession has continued to change and evolve to fit the needs of the growing global supply chain. With the supply chain covering a broad range of disciplines, the definition of what is a supply chain can be unclear” (CSCMP, 2016).

Details

The International Journal of Logistics Management, vol. 28 no. 4
Type: Research Article
ISSN: 0957-4093


Article
Publication date: 22 May 2020

Yuanxin Ouyang, Hongbo Zhang, Wenge Rong, Xiang Li and Zhang Xiong

Abstract

Purpose

The purpose of this paper is to propose an attention alignment method for opinion mining of massive open online course (MOOC) comments. Opinion mining is essential for MOOC applications. In this study, the authors analyze some of the attention heads of bidirectional encoder representations from transformers (BERT) and explore how to use these attention heads to extract opinions from MOOC comments.

Design/methodology/approach

The proposed approach is based on an attention alignment mechanism with three stages: first, extracting original opinions from MOOC comments with dependency parsing; second, constructing frequent sets and using them to prune the opinions; and third, further pruning the opinions and discovering new opinions with the attention alignment mechanism.
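The frequent-set pruning stage can be sketched minimally: keep only (aspect, opinion word) pairs whose aspect occurs often enough across all extracted opinions. The comment data and the support threshold below are illustrative assumptions, not the paper's dataset or parameters.

```python
from collections import Counter

def prune_opinions(opinions, min_support=2):
    """Frequent-set pruning sketch: keep only (aspect, opinion-word)
    pairs whose aspect reaches the minimum support count across all
    opinions extracted by dependency parsing."""
    support = Counter(aspect for aspect, _ in opinions)
    return [(a, w) for a, w in opinions if support[a] >= min_support]

# Toy opinions extracted from MOOC comments
extracted = [("lecture", "clear"), ("lecture", "slow"),
             ("subtitle", "helpful"), ("typo", "annoying")]
print(prune_opinions(extracted))
```

The paper's third stage addresses exactly the weakness this sketch exhibits: valid but rare opinions (such as the "subtitle" pair above) get filtered out, and the attention alignment mechanism is used to recover them.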

Findings

The experiments on the MOOC comments data sets suggest that the opinion mining approach based on an attention alignment mechanism can obtain a better F1 score. Moreover, the attention alignment mechanism can discover some of the opinions filtered incorrectly by the frequent sets, which means the attention alignment mechanism can overcome the shortcomings of dependency analysis and frequent sets.

Originality/value

To take full advantage of pretrained language models, the authors propose an attention alignment method for opinion mining and combine this method with dependency analysis and frequent sets to improve the effectiveness. Furthermore, the authors conduct extensive experiments on different combinations of methods. The results show that the attention alignment method can effectively overcome the shortcomings of dependency analysis and frequent sets.

Details

Information Discovery and Delivery, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2398-6247

