Search results

1 – 10 of 120
Article
Publication date: 2 September 2019

Jelena Andonovski, Branislava Šandrih and Olivera Kitanović

Abstract

Purpose

This paper aims to describe the structure of an aligned Serbian-German literary corpus (SrpNemKor) contained in the digital library Bibliša. The goal of the research was to create a benchmark Serbian-German annotated corpus searchable with various query expansions.

Design/methodology/approach

The presented research is particularly focused on the enhancement of bilingual search queries in a full-text search of the aligned SrpNemKor collection. The enhancement is based on existing lexical resources, such as the Serbian morphological electronic dictionaries and the bilingual lexical database Termi.
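
As a rough illustration of this kind of query enhancement, the minimal Python sketch below expands a Serbian search term with inflected forms and synonyms and attaches German equivalents; the lookup tables are hypothetical stand-ins for the Serbian morphological e-dictionaries and the Termi database, not the actual Bibliša implementation.

    # Minimal sketch of bilingual query expansion. The tables below are
    # hypothetical stand-ins for the Serbian morphological e-dictionaries
    # and the Termi bilingual lexical database.
    MORPH_FORMS = {   # lemma -> inflected Serbian forms
        "reka": ["reka", "reke", "reci", "reku", "rekom", "rekama"],
    }
    TERMI = {         # lemma -> (Serbian synonyms, German equivalents)
        "reka": (["tok"], ["Fluss", "Strom"]),
    }

    def expand_query(term: str) -> dict:
        """Expand a Serbian term morphologically and add German equivalents."""
        serbian = set(MORPH_FORMS.get(term, [term]))
        synonyms, german = TERMI.get(term, ([], []))
        for syn in synonyms:
            serbian.update(MORPH_FORMS.get(syn, [syn]))
        return {"sr": sorted(serbian), "de": sorted(german)}

    print(expand_query("reka"))
    # {'sr': ['reci', 'reka', 'rekama', 'reke', 'rekom', 'reku', 'tok'], 'de': ['Fluss', 'Strom']}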

Findings

For the purpose of this research, the lexical database Termi is enriched with a bilingual list of German-Serbian translated pairs of lexical units. The list of correct translation pairs was extracted from SrpNemKor, evaluated and integrated into Termi. Also, Serbian morphological e-dictionaries are updated with new entries extracted from the Serbian part of the corpus.

Originality/value

A bilingual search of SrpNemKor is available within the user-friendly Bibliša platform. The enriched Termi database enables semantic enhancement and refinement of users' search queries, based on synonyms in both Serbian and German. Serbian morphological e-dictionaries support the morphological expansion of search queries in Serbian, enabling the analysis of concepts and concept structures by identifying the terms assigned to a concept and by establishing relations between Serbian and German terms. This makes Bibliša a valuable web tool for the research and analysis of SrpNemKor.

Details

The Electronic Library, vol. 37 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 5 April 2011

Werner Winiwarter

Abstract

Purpose

The purpose of this paper is to address the knowledge acquisition bottleneck problem in natural language processing by introducing a new rule‐based approach for the automatic acquisition of linguistic knowledge.

Design/methodology/approach

The author has developed a new machine translation methodology that only requires a bilingual lexicon and a parallel corpus of surface sentences aligned at the sentence level to learn new transfer rules.
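
As a rough illustration of this kind of acquisition (not the actual JETCAT algorithm, which is implemented in SWI-Prolog), the sketch below uses only a bilingual lexicon to explain part of a sentence-aligned pair and records the unexplained residue as a candidate transfer rule; the lexicon and sentence pair are invented for the example.

    # Illustrative only: derive a candidate transfer rule from a single
    # sentence-aligned pair using nothing but a bilingual lexicon. The
    # lexicon and sentences are invented, not taken from JETCAT.
    LEXICON = {"watashi": "i", "hon": "book", "yomu": "read"}

    def candidate_rule(ja_tokens, en_tokens):
        """Return the source/target material not explained by the lexicon."""
        explained = {LEXICON[t] for t in ja_tokens if t in LEXICON}
        residue_ja = [t for t in ja_tokens if t not in LEXICON]
        residue_en = [t for t in en_tokens if t.lower() not in explained]
        # The residue becomes a candidate transfer rule, to be confirmed or
        # rejected against further aligned sentence pairs.
        return residue_ja, residue_en

    print(candidate_rule(["watashi", "wa", "hon", "o", "yomu"],
                         ["I", "read", "a", "book"]))
    # (['wa', 'o'], ['a'])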

Findings

A first prototype of a web-based Japanese-English translation system, Japanese-English translation using corpus-based acquisition of transfer (JETCAT), has been implemented in SWI-Prolog, together with a Greasemonkey user script that analyzes Japanese web pages and translates sentences via Ajax. In addition, linguistic information is displayed at the character, word and sentence level to provide a useful tool for web-based language learning. An important feature is customization: the user can simply correct translation results, leading to an incremental update of the knowledge base.

Research limitations/implications

This paper focuses on the technical aspects and user interface issues of JETCAT. The author plans to use JETCAT in a classroom setting to gather initial experience and will then evaluate a real-world deployment; work has also started on extending JETCAT to include collaborative features.

Practical implications

The research has a high practical impact on academic language education. It could also have implications for the translation industry by superseding certain translation tasks while adding value and quality to others.

Originality/value

The paper presents an extended version of the paper that received the Emerald Web Information Systems Best Paper Award at iiWAS2010.

Details

International Journal of Web Information Systems, vol. 7 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 February 1995

Sophia Ananiadou and John McNaught

Abstract

This paper assesses the degree to which established practices in terminology can provide the translation industry with the lexical means to support mediation of information between languages, especially where such mediation involves modification. The effects of term variation, collocation and sublanguage phraseology present problems of term choice to the translator. Current term resources cannot help much with these problems; however, tools and techniques are discussed which, in the near future, will offer translators the means to make appropriate choices of terminology.

Details

Aslib Proceedings, vol. 47 no. 2
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 3 August 2021

Chuanming Yu, Haodong Xue, Manyi Wang and Lu An

Abstract

Purpose

Owing to the uneven distribution of annotated corpora among languages, it is necessary to bridge the gap between low-resource and high-resource languages. From the perspective of entity relation extraction, this paper aims to extend the knowledge acquisition task from a single-language context to a cross-lingual context and to improve relation extraction performance for low-resource languages.

Design/methodology/approach

This paper proposes a cross-lingual adversarial relation extraction (CLARE) framework, which decomposes cross-lingual relation extraction into parallel corpus acquisition and adversarial adaptation relation extraction. Based on the proposed framework, this paper conducts extensive experiments in two tasks, i.e. the English-to-Chinese and the English-to-Arabic cross-lingual entity relation extraction.
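
The abstract does not spell out the architecture of the adversarial adaptation step; the sketch below shows the generic gradient-reversal construction commonly used for this kind of adaptation, purely to illustrate the idea. The use of PyTorch and the module sizes are assumptions, not details taken from the paper.

    # Generic sketch of adversarial feature adaptation via gradient reversal,
    # shown only to illustrate the idea; it is not CLARE's actual architecture,
    # and the module sizes below are arbitrary.
    import torch
    from torch import nn
    from torch.autograd import Function

    class GradReverse(Function):
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            # Reverse the gradient so the shared encoder learns features that
            # the language discriminator cannot tell apart.
            return -ctx.lam * grad_output, None

    encoder = nn.Linear(768, 256)      # shared sentence encoder (stand-in)
    relation_clf = nn.Linear(256, 10)  # relation classifier, trained on labelled English data
    language_disc = nn.Linear(256, 2)  # English vs. target-language discriminator

    features = encoder(torch.randn(4, 768))                 # dummy mini-batch
    relation_logits = relation_clf(features)
    language_logits = language_disc(GradReverse.apply(features, 1.0))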

Findings

The Macro-F1 values of the optimal models in the two tasks are 0.8801 and 0.7899, respectively, indicating that the proposed CLARE framework can significantly improve entity relation extraction for low-resource languages. The experimental results suggest that the proposed framework can effectively transfer the corpus, along with its annotated tags, from English to Chinese and Arabic. This study reveals that the proposed approach requires less human labour and is more effective for cross-lingual entity relation extraction than the manual method, and that it generalizes well across languages.
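
For reference, Macro-F1 averages the per-class F1 scores with equal weight, so strong performance on frequent relation types cannot mask weak performance on rare ones; a minimal computation with invented counts (not the paper's data) is sketched below.

    # Macro-F1 with invented per-class (tp, fp, fn) counts, for illustration only.
    def f1(tp, fp, fn):
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    per_class = [(90, 10, 5), (40, 20, 30), (70, 5, 15)]
    macro_f1 = sum(f1(*c) for c in per_class) / len(per_class)
    print(round(macro_f1, 4))   # averages the three per-class F1 values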

Originality/value

The research results are of great significance for improving the performance of cross-lingual knowledge acquisition. Cross-lingual transfer may greatly reduce the time and cost of manually constructing multilingual corpora. The study sheds light on knowledge acquisition and organization from unstructured text in the era of big data.

Details

The Electronic Library, vol. 39 no. 3
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 19 July 2022

Behnam Forouhandeh, Rodney J. Clarke and Nina Louise Reynolds

Abstract

Purpose

The purpose of this paper is to demonstrate the utility of systemic functional linguistics (SFL) as an underlying model to examine the similarities/differences between spoken and written peer-to-peer (P2P) communication.

Design/methodology/approach

An embedded mixed methods experimental design with linguistically standardized experimental stimuli was used to expose the basic linguistic differences between P2P communications that can be attributed to communication medium (spoken/written) and product type (hedonic/utilitarian).

Findings

The findings show, empirically, that consumers' spoken language is not linguistically equivalent to their written language. This confirms that the capability of language to convey semantic meaning differs between spoken and written communication. This study extends the characteristics that differentiate hedonic from utilitarian products to include lexical density (i.e. hedonic) vs lexical sparsity (i.e. utilitarian).
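
Lexical density is conventionally measured as the share of content (lexical) words among all words in a text; the minimal sketch below uses a small illustrative grammatical-word list rather than the full SFL inventory applied in the study.

    # Minimal lexical-density sketch: share of content words among all tokens.
    # The grammatical-word list is a small illustrative stand-in, not the
    # full SFL inventory used in the paper.
    GRAMMATICAL = {"the", "a", "an", "is", "was", "it", "and", "to"}

    def lexical_density(text: str) -> float:
        tokens = [t.strip(".,!?").lower() for t in text.split()]
        content = [t for t in tokens if t and t not in GRAMMATICAL]
        return len(content) / len(tokens)

    print(lexical_density("The camera takes sharp, vivid photos"))   # denser, hedonic-style
    print(lexical_density("It is a phone and it was fine to use"))   # sparser, utilitarian-style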

Research limitations/implications

The findings of this study are not wholly relevant to other forms of consumer communication (e.g. viral marketing). In addition, this research used only a small subset of SFL resources.

Practical implications

This research shows that marketers should ideally apply a semantic approach to the analysis of communications, given that communication meaning can vary across channels. Marketers may also want to focus on specific feedback channels (e.g. review site vs telephone) depending on the depth of product detail that needs to be captured. This study also offers metrics that advertisers could use to classify media and to characterize consumer segments.

Originality/value

This research shows the relevance of SFL for understanding P2P communications and has potential applications to other marketing communications.

Details

European Journal of Marketing, vol. 56 no. 8
Type: Research Article
ISSN: 0309-0566

Article
Publication date: 1 May 2006

Carmen Galvez and Félix de Moya‐Anegón

Abstract

Purpose

To evaluate the accuracy of conflation methods based on finite‐state transducers (FSTs).

Design/methodology/approach

Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants into canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage rather than actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm.
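
The abstract does not give the exact adaptation of the measures, so the sketch below shows one plausible reading: a conflation step is scored for coverage (how many variant forms the tool conflates at all) and accuracy (how many of those are mapped to the correct canonical form). The Spanish word forms are invented for the example.

    # Illustrative scoring of a conflation step against a gold standard.
    # The word forms and the exact measure definitions are assumptions; the
    # abstract only says recall/precision were adapted to accuracy and coverage.
    gold = {"niñas": "niña", "cantaban": "cantar", "mejores": "mejor"}
    system = {"niñas": "niña", "cantaban": "cantar"}   # "mejores" left unanalysed

    covered = [w for w in gold if w in system]
    correct = [w for w in covered if system[w] == gold[w]]

    coverage = len(covered) / len(gold)       # share of variants the tool conflates at all
    accuracy = len(correct) / len(covered)    # share of conflated variants mapped correctly
    print(coverage, accuracy)                 # 0.666... 1.0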

Findings

The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms.

Originality/value

The report outlines the potential of transducers in their application to normalization processes.

Details

Journal of Documentation, vol. 62 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 25 July 2008

Marilyn Domas White, Miriam Matteson and Eileen G. Abels

Abstract

Purpose

This paper characterizes translation as a task and aims to identify how it influences professional translators' information needs and use of resources to meet those needs.

Design/methodology/approach

This research is exploratory and qualitative. Data are based on focus group sessions with 19 professional translators. Where appropriate, findings are related to several theories relating task characteristics and information behavior (IB).

Findings

The findings support some of Byström's findings about the relationship between task and information use, but also suggest new hypotheses about relationships among task, information need and information use, including the notion of a zone of familiarity. Translators use a wide range of formal and informal, often localized, sources, including personal contacts with other translators, native speakers and domain experts, to supplement their basic resources, which are different types of dictionaries. The study also addresses the problems translators face when translating materials in less commonly taught languages.

Research limitations/implications

Focus group sessions allow only for identifying concepts, relationships and hypotheses, not for indicating the relative importance of variables or their distribution across individuals. The study does not cover literary translation.

Practical implications

The paper suggests content and features of workstations offering access to a wide range of resources for professional translators.

Originality/value

Unlike other information behavior studies of professional translators, this article focuses on a broad range of resources, not just on dictionary use. It also identifies information problems associated not only with normal task activities, but also with translators' moving out of their zone of familiarity, i.e. their range of domain, language, and style expertise. The model of translator IB is potentially generalizable to other groups and both supports and expands other task‐related research.

Details

Journal of Documentation, vol. 64 no. 4
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 2 February 2015

Krystyna K. Matusiak, Ling Meng, Ewa Barczyk and Chia-Jung Shih

Abstract

Purpose

The purpose of this paper is to explore multilingual access in digital libraries and to present a case study of creating bilingual metadata records for the Tse-Tsung Chow Collection of Chinese Scrolls and Fan Paintings. The project, undertaken at the University of Wisconsin-Milwaukee Libraries, provides access to digital copies of calligraphic and painted Chinese scrolls and fans from the collection donated by Prof Tse-Tsung Chow (Cezong Zhou).

Design/methodology/approach

This paper examines the current approaches to multilingual indexing and retrieval in digital collections and presents a model of creating bilingual parallel records that combines translation with controlled vocabulary mapping.
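
A bilingual parallel record of the kind described can be pictured as two language-specific views of one object whose subject terms are mapped through a controlled vocabulary rather than translated ad hoc; the field names, vocabulary mappings and values below are illustrative, not the project's actual schema.

    # Illustrative parallel (English/Chinese) metadata record. Field names,
    # vocabulary mappings and values are hypothetical, not the actual
    # Tse-Tsung Chow Collection records.
    VOCAB_MAP = {"Calligraphy, Chinese": "中国书法",
                 "Scrolls, Chinese": "中国卷轴"}

    def parallel_record(title_en, title_zh, subjects_en):
        return {
            "en": {"title": title_en, "subjects": subjects_en},
            "zh": {"title": title_zh,
                   # subject terms are mapped through the controlled vocabulary,
                   # not translated ad hoc
                   "subjects": [VOCAB_MAP[s] for s in subjects_en]},
        }

    record = parallel_record("Landscape scroll with poem",
                             "题诗山水卷轴",
                             ["Calligraphy, Chinese", "Scrolls, Chinese"])
    print(record["zh"]["subjects"])   # ['中国书法', '中国卷轴']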

Findings

Creating multilingual metadata records for cultural heritage materials is in an early phase of development. Bilingual metadata created through human translation and controlled vocabulary mapping represents one of the approaches to multilingual access in digital libraries. Multilingual indexing of collections of international origin addresses the linguistic needs of the target audience, connects the digitized objects to their respective cultures and contributes to richer descriptive records. The approach that relies on human translation and research can be undertaken in small-scale digitization projects of rare cultural heritage materials. Language and subject expertise are required to create bilingual metadata records.

Research limitations/implications

This paper presents the results of a case study. The approach to multilingual access presented here involves research and relies on human translation, and can therefore only be undertaken in small-scale projects.

Practical implications

This case study of creating parallel records with a combination of translation and vocabulary mapping can be useful for designing similar bilingual digital collections.

Social implications

This paper also discusses the obligations of holding institutions in undertaking digital conversion of the cultural heritage materials that originated in other countries, especially in regard to providing metadata records that reflect the language of the originating community.

Originality/value

The research and practice in multilingual indexing of cultural heritage materials are very limited. There are no standardized models of how to approach building multilingual digital collections. This case study presents a model of providing bilingual access and enhancing the intellectual control of cultural heritage collections.

Details

The Electronic Library, vol. 33 no. 1
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 21 May 2018

Shutian Ma, Yingyi Zhang and Chengzhi Zhang

Abstract

Purpose

The purpose of this paper is to classify Chinese word semantic relations, namely synonyms, antonyms, hyponyms and meronyms.

Design/methodology/approach

Four methods are applied: ontology-based, dictionary-based, pattern-based and morpho-syntactic. The authors use a search engine to build the lexical and semantic resources needed by the dictionary-based and pattern-based methods. To improve classification performance with additional external resources, they also classify the given word pairs simultaneously in Chinese and in English by using machine translation.
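
The pattern-based idea can be illustrated with a couple of lexico-syntactic templates for a single relation (hyponymy); the templates and the probe snippet below are simplified stand-ins for the search-engine results the authors mine, not their actual resources.

    # Simplified pattern-based sketch for one relation (hyponymy); the patterns
    # and the snippet are illustrative stand-ins for mined search-engine
    # results, not the authors' actual resources.
    HYPONYM_PATTERNS = [
        "{hypo}是一种{hyper}",   # "{X} is a kind of {Y}"
        "{hyper}，例如{hypo}",   # "{Y}, for example {X}"
    ]

    def looks_like_hyponym(pair, snippets):
        hypo, hyper = pair
        probes = [p.format(hypo=hypo, hyper=hyper) for p in HYPONYM_PATTERNS]
        return any(probe in s for probe in probes for s in snippets)

    snippets = ["苹果是一种水果，富含维生素。"]   # "An apple is a kind of fruit ..."
    print(looks_like_hyponym(("苹果", "水果"), snippets))   # True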

Findings

Experimental results show that the approach achieved an average F1 score of 50.87 per cent, an average accuracy of 70.36 per cent and an average recall of 40.05 per cent over all classification tasks. Synonym and antonym classification achieved high accuracy, i.e. above 90 per cent. Moreover, the dictionary-based and pattern-based approaches work effectively on the final data set.

Originality/value

For many natural language processing (NLP) tasks, such as information extraction and knowledge graph generation, distinguishing word semantic relations can help to improve system performance. Current methods for this task rely on large corpora for training or on dictionaries and thesauri for inference, and they are limited by restricted data access and by the effort of keeping the built lexical resources up to date. This paper builds a preliminary system for classifying Chinese word semantic relations by seeking new ways to obtain the external resources efficiently.

Details

Information Discovery and Delivery, vol. 46 no. 2
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 1 February 1993

BRIAN VICKERY and ALINA VICKERY

Abstract

There is a huge amount of information and data stored in publicly available online databases that consist of large text files accessed by Boolean search techniques. It is widely held that less use is made of these databases than could or should be the case, and that one reason for this is that potential users find it difficult to identify which databases to search, to use the various command languages of the hosts and to construct the Boolean search statements required. This reasoning has stimulated a considerable amount of exploration and development work on the construction of search interfaces, to aid the inexperienced user to gain effective access to these databases. The aim of our paper is to review aspects of the design of such interfaces: to indicate the requirements that must be met if maximum aid is to be offered to the inexperienced searcher; to spell out the knowledge that must be incorporated in an interface if such aid is to be given; to describe some of the solutions that have been implemented in experimental and operational interfaces; and to discuss some of the problems encountered. The paper closes with an extensive bibliography of references relevant to online search aids, going well beyond the items explicitly mentioned in the text. An index to software appears after the bibliography at the end of the paper.

Details

Journal of Documentation, vol. 49 no. 2
Type: Research Article
ISSN: 0022-0418
