Search results
1 – 10 of 120
Jelena Andonovski, Branislava Šandrih and Olivera Kitanović
Abstract
Purpose
This paper aims to describe the structure of an aligned Serbian-German literary corpus (SrpNemKor) contained in a digital library Bibliša. The goal of the research was to create a benchmark Serbian-German annotated corpus searchable with various query expansions.
Design/methodology/approach
The presented research is particularly focused on the enhancement of bilingual search queries in a full-text search of aligned SrpNemKor collection. The enhancement is based on using existing lexical resources such as Serbian morphological electronic dictionaries and the bilingual lexical database Termi.
Findings
For the purpose of this research, the lexical database Termi is enriched with a bilingual list of German-Serbian translated pairs of lexical units. The list of correct translation pairs was extracted from SrpNemKor, evaluated and integrated into Termi. Also, Serbian morphological e-dictionaries are updated with new entries extracted from the Serbian part of the corpus.
Originality/value
A bilingual search of SrpNemKor in Bibliša is available within the user-friendly platform. The enriched database Termi enables semantic enhancement and refinement of a user's search query based on synonyms, both in Serbian and German, at a very high level. Serbian morphological e-dictionaries facilitate the morphological expansion of search queries in Serbian, enabling the analysis of concepts and concept structures by identifying the terms assigned to a concept and by establishing relations between terms in Serbian and German. This makes Bibliša a valuable Web tool that can support research and analysis of SrpNemKor.
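The combined morphological and synonym-based expansion described above can be sketched roughly as follows. The mini morphological dictionary and the Termi-style synonym table are hypothetical stand-ins for illustration only, not the project's real resources.

```python
# Sketch of bilingual query expansion: inflected forms plus synonyms.
# MORPH_FORMS stands in for the Serbian morphological e-dictionaries,
# SYNONYMS for the Termi bilingual lexical database (both hypothetical).

MORPH_FORMS = {
    # lemma -> inflected forms (abridged, illustrative Serbian paradigm)
    "knjiga": ["knjiga", "knjige", "knjizi", "knjigu", "knjigom"],
}

SYNONYMS = {
    # lemma -> Serbian synonyms and German equivalents
    "knjiga": {"sr": ["delo"], "de": ["Buch", "Werk"]},
}

def expand_query(term):
    """Expand a Serbian search term with its inflected forms and with
    bilingual synonyms, yielding the term set used for full-text search."""
    expanded = set(MORPH_FORMS.get(term, [term]))
    entry = SYNONYMS.get(term, {})
    expanded.update(entry.get("sr", []))
    expanded.update(entry.get("de", []))
    return sorted(expanded)
```

A term with no dictionary entry simply passes through unexpanded, so the search degrades gracefully for out-of-vocabulary words.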
Abstract
Purpose
The purpose of this paper is to address the knowledge acquisition bottleneck problem in natural language processing by introducing a new rule‐based approach for the automatic acquisition of linguistic knowledge.
Design/methodology/approach
The author has developed a new machine translation methodology that only requires a bilingual lexicon and a parallel corpus of surface sentences aligned at the sentence level to learn new transfer rules.
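The core idea, inferring a new translation correspondence from a sentence-aligned pair plus a bilingual lexicon, can be illustrated with a toy analogue. The real JETCAT system learns structural transfer rules in Prolog; the lexicon entries and the one-unknown-word heuristic below are illustrative assumptions only.

```python
# Toy analogue of acquiring a lexical transfer rule from an aligned
# sentence pair using only a bilingual lexicon (hypothetical entries).

LEXICON = {"watashi": "I", "hon": "book", "yomu": "read"}

def learn_transfer_rule(src_tokens, tgt_tokens, lexicon):
    """If exactly one source token and one target token are not covered
    by the lexicon, hypothesize that they translate each other."""
    unknown_src = [t for t in src_tokens if t not in lexicon]
    known_tgt = set(lexicon.values())
    unknown_tgt = [t for t in tgt_tokens if t not in known_tgt]
    if len(unknown_src) == 1 and len(unknown_tgt) == 1:
        return {unknown_src[0]: unknown_tgt[0]}
    return {}  # ambiguous: no rule can be safely learned

rule = learn_transfer_rule(
    ["watashi", "shinbun", "yomu"], ["I", "read", "newspaper"], LEXICON
)
```

Here the single uncovered pair ("shinbun", "newspaper") yields one new lexical rule; when more than one word on either side is unknown, the sketch abstains rather than guess.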
Findings
A first prototype of a web‐based Japanese‐English translation system called Japanese‐English translation using corpus‐based acquisition of transfer (JETCAT) has been implemented in SWI‐Prolog, together with a Greasemonkey user script that analyzes Japanese web pages and translates sentences via Ajax. In addition, linguistic information is displayed at the character, word and sentence level to provide a useful tool for web‐based language learning. An important feature is customization: the user can simply correct translation results, leading to an incremental update of the knowledge base.
Research limitations/implications
This paper focuses on the technical aspects and user interface issues of JETCAT. The author is planning to use JETCAT in a classroom setting to gather first experiences and will then evaluate a real‐world deployment; work has also started on extending JETCAT to include collaborative features.
Practical implications
The research has a high practical impact on academic language education. It also could have implications for the translation industry by superseding certain translation tasks and, on the other hand, adding value and quality to others.
Originality/value
The paper presents an extended version of the paper receiving the Emerald Web Information Systems Best Paper Award at iiWAS2010.
Sophia Ananiadou and John McNaught
Abstract
This paper assesses the degree to which established practices in terminology can provide the translation industry with the lexical means to support mediation of information between languages, especially where such mediation involves modification. The effects of term variation, collocation and sublanguage phraseology present problems of term choice to the translator. Current term resources cannot help much with these problems; however, tools and techniques are discussed which, in the near future, will offer translators the means to make appropriate choices of terminology.
Chuanming Yu, Haodong Xue, Manyi Wang and Lu An
Abstract
Purpose
Owing to the uneven distribution of annotated corpus among different languages, it is necessary to bridge the gap between low resource languages and high resource languages. From the perspective of entity relation extraction, this paper aims to extend the knowledge acquisition task from a single language context to a cross-lingual context, and to improve the relation extraction performance for low resource languages.
Design/methodology/approach
This paper proposes a cross-lingual adversarial relation extraction (CLARE) framework, which decomposes cross-lingual relation extraction into parallel corpus acquisition and adversarial adaptation relation extraction. Based on the proposed framework, this paper conducts extensive experiments in two tasks, i.e. the English-to-Chinese and the English-to-Arabic cross-lingual entity relation extraction.
Findings
The Macro-F1 values of the optimal models in the two tasks are 0.8801 and 0.7899, respectively, indicating that the proposed CLARE framework can significantly improve the effectiveness of entity relation extraction for low resource languages. The experimental results suggest that the proposed framework can effectively transfer both the corpus and the annotated tags from English to Chinese and Arabic. This study reveals that the proposed approach is less labour-intensive and more effective in cross-lingual entity relation extraction than the manual method. It shows that this approach has high generalizability among different languages.
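The Macro-F1 figures reported above are the unweighted mean of per-class F1 scores. A minimal sketch of that computation (a standard definition, not code from the paper):

```python
from collections import defaultdict

def macro_f1(gold, pred):
    """Macro-averaged F1: compute F1 per class, then take the
    unweighted mean over all classes seen in gold or predictions."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed the true class g
    classes = set(gold) | set(pred)
    f1s = []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Because every class contributes equally regardless of its frequency, Macro-F1 is a natural choice when relation types are unevenly distributed, as is typical in low resource settings.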
Originality/value
The research results are of great significance for improving the performance of the cross-lingual knowledge acquisition. The cross-lingual transfer may greatly reduce the time and cost of the manual construction of the multi-lingual corpus. It sheds light on the knowledge acquisition and organization from the unstructured text in the era of big data.
Behnam Forouhandeh, Rodney J. Clarke and Nina Louise Reynolds
Abstract
Purpose
The purpose of this paper is to demonstrate the utility of systemic functional linguistics (SFL) as an underlying model to examine the similarities/differences between spoken and written peer-to-peer (P2P) communication.
Design/methodology/approach
An embedded mixed methods experimental design with linguistically standardized experimental stimuli was used to expose the basic linguistic differences between P2P communications that can be attributed to communication medium (spoken/written) and product type (hedonic/utilitarian).
Findings
The findings show, empirically, that consumer’s spoken language is not linguistically equivalent to that of written language. This confirms that the capability of language to convey semantic meaning in spoken communication differs from written communication. This study extends the characteristics that differentiate hedonic from utilitarian products to include lexical density (i.e. hedonic) vs lexical sparsity (i.e. utilitarian).
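Lexical density, one of the SFL-derived measures contrasted above, is conventionally the proportion of content (lexical) words to total words. A minimal sketch, assuming a small illustrative function-word list rather than a full SFL inventory:

```python
# Illustrative lexical density: content words / total words.
# FUNCTION_WORDS is a tiny hypothetical stand-in for a complete
# function-word inventory, not a resource from the study.

FUNCTION_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "it", "i", "this"}

def lexical_density(text):
    """Share of tokens that are content words; higher values indicate
    denser, more lexically packed language."""
    tokens = [t.lower().strip(".,!?") for t in text.split()]
    content = [t for t in tokens if t and t not in FUNCTION_WORDS]
    return len(content) / len(tokens)
```

On this measure, a review like "The battery is great" scores 0.5, while function-word-heavy spoken-style utterances score lower, which is the kind of contrast the study draws between hedonic and utilitarian product talk.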
Research limitations/implications
The findings of this study are not wholly relevant to other forms of consumer communication (e.g. viral marketing). This research also drew on only a small subset of SFL resources.
Practical implications
This research shows that marketers should ideally apply a semantic approach to the analysis of communications, given that communication meaning can vary across channels. Marketers may also want to focus on specific feedback channels (e.g. review site vs telephone) depending on the depth of product’s details that need to be captured. This study also offers metrics that advertisers could use to classify media and to characterize consumer segments.
Originality/value
This research shows the relevance of SFL for understanding P2P communications and has potential applications to other marketing communications.
Carmen Galvez and Félix de Moya‐Anegón
Abstract
Purpose
To evaluate the accuracy of conflation methods based on finite‐state transducers (FSTs).
Design/methodology/approach
Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm.
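The adapted recall/precision evaluation described above, based on accuracy and coverage rather than retrieval, can be sketched as follows. The trivial dictionary-backed conflator here is an illustrative stand-in, not the paper's FST-based lemmatizer, and the Spanish forms are examples only.

```python
# Evaluating a conflator against gold lemmas: accuracy over the outputs
# it produces, and coverage over all input variants.

def evaluate_conflation(pairs, conflate):
    """pairs: list of (variant, gold_lemma) tuples.
    accuracy = correct outputs / outputs produced;
    coverage = variants with an output / all variants."""
    produced = correct = 0
    for variant, gold in pairs:
        out = conflate(variant)
        if out is not None:
            produced += 1
            if out == gold:
                correct += 1
    accuracy = correct / produced if produced else 0.0
    coverage = produced / len(pairs) if pairs else 0.0
    return accuracy, coverage

# Toy dictionary-backed conflator (hypothetical Spanish entries).
LEMMAS = {"niños": "niño", "casas": "casa"}
acc, cov = evaluate_conflation(
    [("niños", "niño"), ("casas", "casa"), ("libros", "libro")],
    lambda w: LEMMAS.get(w),
)
```

The split between accuracy and coverage mirrors the paper's point: a lemmatizer can be highly accurate on the forms it knows (high accuracy) while underanalyzing variants outside its dictionary (low coverage).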
Findings
The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms.
Originality/value
The report outlines the potential of transducers in their application to normalization processes.
Marilyn Domas White, Miriam Matteson and Eileen G. Abels
Abstract
Purpose
This paper characterizes translation as a task and aims to identify how it influences professional translators' information needs and use of resources to meet those needs.
Design/methodology/approach
This research is exploratory and qualitative. Data are based on focus group sessions with 19 professional translators. Where appropriate, findings are related to several theories relating task characteristics and information behavior (IB).
Findings
The findings support some of Byström's findings about the relationship between task and information use but also suggest new hypotheses or relationships among task, information need and information use, including the notion of a zone of familiarity. Translators use a wide range of formal and informal resources, including localized sources such as personal contacts with other translators, native speakers and domain experts, to supplement their basic resources, which are different types of dictionaries. The study addresses translator problems created by the need to translate materials in less commonly taught languages.
Research limitations/implications
Focus group sessions allow only for identifying concepts, relationships and hypotheses, not for indicating the relative importance of variables or their distribution across individuals. The study does not cover literary translation.
Practical implications
The paper suggests content and features for workstations offering access to a wide range of resources for professional translators.
Originality/value
Unlike other information behavior studies of professional translators, this article focuses on a broad range of resources, not just on dictionary use. It also identifies information problems associated not only with normal task activities, but also with translators' moving out of their zone of familiarity, i.e. their range of domain, language, and style expertise. The model of translator IB is potentially generalizable to other groups and both supports and expands other task‐related research.
Krystyna K. Matusiak, Ling Meng, Ewa Barczyk and Chia-Jung Shih
Abstract
Purpose
The purpose of this paper is to explore multilingual access in digital libraries and to present a case study of creating bilingual metadata records for the Tse-Tsung Chow Collection of Chinese Scrolls and Fan Paintings. The project, undertaken at the University of Wisconsin-Milwaukee Libraries, provides access to digital copies of calligraphic and painted Chinese scrolls and fans from the collection donated by Prof Tse-Tsung Chow (Cezong Zhou).
Design/methodology/approach
This paper examines the current approaches to multilingual indexing and retrieval in digital collections and presents a model of creating bilingual parallel records that combines translation with controlled vocabulary mapping.
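A bilingual parallel record of the kind described, pairing translated field values with a controlled-vocabulary mapping, might look like the sketch below. The field names, example values and the vocabulary identifier scheme are all illustrative assumptions, not the project's actual metadata schema.

```python
# Sketch of a bilingual parallel metadata record: each descriptive field
# carries an English and a Chinese value, and each subject term is mapped
# to a controlled-vocabulary identifier (placeholder scheme "vocab:").

def make_parallel_record(title_en, title_zh, subjects):
    """subjects: list of (english_term, chinese_term, vocab_id) triples."""
    return {
        "title": {"en": title_en, "zh": title_zh},
        "subjects": [
            {"en": en, "zh": zh, "vocab_id": vid} for en, zh, vid in subjects
        ],
    }

record = make_parallel_record(
    "Landscape fan painting",
    "山水扇面画",
    [("calligraphy", "书法", "vocab:0001")],
)
```

Keeping both languages in one record, rather than maintaining two separate records, is what lets a single search interface serve users of either language while preserving the controlled-vocabulary link.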
Findings
Creating multilingual metadata records for cultural heritage materials is in an early phase of development. Bilingual metadata created through human translation and controlled vocabulary mapping represents one of the approaches to multilingual access in digital libraries. Multilingual indexing of collections of international origin addresses the linguistic needs of the target audience, connects the digitized objects to their respective cultures and contributes to richer descriptive records. The approach that relies on human translation and research can be undertaken in small-scale digitization projects of rare cultural heritage materials. Language and subject expertise are required to create bilingual metadata records.
Research limitations/implications
This paper presents the results of a case study. The approach to multilingual access involves research and relies on human translation, and can therefore only be undertaken in small-scale projects.
Practical implications
This case study of creating parallel records with a combination of translation and vocabulary mapping can be useful for designing similar bilingual digital collections.
Social implications
This paper also discusses the obligations of holding institutions in undertaking digital conversion of the cultural heritage materials that originated in other countries, especially in regard to providing metadata records that reflect the language of the originating community.
Originality/value
The research and practice in multilingual indexing of cultural heritage materials are very limited. There are no standardized models of how to approach building multilingual digital collections. This case study presents a model of providing bilingual access and enhancing the intellectual control of cultural heritage collections.
Shutian Ma, Yingyi Zhang and Chengzhi Zhang
Abstract
Purpose
The purpose of this paper is to classify Chinese word semantic relations, which are synonyms, antonyms, hyponyms and meronymys.
Design/methodology/approach
Four methods are applied: ontology-based, dictionary-based, pattern-based and morpho-syntactic. The authors make good use of search engines to build lexical and semantic resources for the dictionary-based and pattern-based methods. To improve classification performance with more external resources, they also classify the given word pairs in Chinese and in English at the same time, using machine translation.
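The pattern-based component can be illustrated by counting hits of relation-indicative lexico-syntactic patterns in retrieved snippets. The patterns and snippets below are English analogues invented for illustration; the paper's actual patterns are Chinese.

```python
# Toy pattern-based relation classifier: score each candidate relation by
# how many snippets match its indicative patterns, and pick the best.
import re

PATTERNS = {
    "hyponym": [r"{w1} is a kind of {w2}", r"{w2} such as {w1}"],
    "antonym": [r"either {w1} or {w2}", r"{w1} rather than {w2}"],
}

def classify_pair(w1, w2, snippets):
    """Return the relation whose patterns match the most snippets."""
    scores = {}
    for rel, pats in PATTERNS.items():
        regexes = [re.compile(p.format(w1=w1, w2=w2)) for p in pats]
        scores[rel] = sum(1 for s in snippets for r in regexes if r.search(s))
    return max(scores, key=scores.get)

snips = ["a sparrow is a kind of bird", "sparrows fly south in winter"]
```

In practice the snippets would come from search-engine results for the word pair, which is how the approach sidesteps the need for a large pre-built corpus.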
Findings
Experimental results show that the approach achieved an average F1 score of 50.87 per cent, an average accuracy of 70.36 per cent and an average recall of 40.05 per cent over all classification tasks. Synonym and antonym classification achieved high accuracy, i.e. above 90 per cent. Moreover, the dictionary-based and pattern-based approaches work effectively on the final data set.
Originality/value
For many natural language processing (NLP) tasks, the step of distinguishing word semantic relation can help to improve system performance, such as information extraction and knowledge graph generation. Currently, common methods for this task rely on large corpora for training or dictionaries and thesauri for inference, where limitation lies in freely data access and keeping built lexical resources up-date. This paper builds a primary system for classifying Chinese word semantic relations by seeking new ways to obtain the external resources efficiently.
Brian Vickery and Alina Vickery
Abstract
There is a huge amount of information and data stored in publicly available online databases that consist of large text files accessed by Boolean search techniques. It is widely held that less use is made of these databases than could or should be the case, and that one reason for this is that potential users find it difficult to identify which databases to search, to use the various command languages of the hosts and to construct the Boolean search statements required. This reasoning has stimulated a considerable amount of exploration and development work on the construction of search interfaces, to aid the inexperienced user to gain effective access to these databases. The aim of our paper is to review aspects of the design of such interfaces: to indicate the requirements that must be met if maximum aid is to be offered to the inexperienced searcher; to spell out the knowledge that must be incorporated in an interface if such aid is to be given; to describe some of the solutions that have been implemented in experimental and operational interfaces; and to discuss some of the problems encountered. The paper closes with an extensive bibliography of references relevant to online search aids, going well beyond the items explicitly mentioned in the text. An index to software appears after the bibliography at the end of the paper.
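One of the simplest interface aids the review discusses, shielding the user from host command languages by assembling the Boolean statement automatically, can be sketched as follows. The grouping convention (synonyms OR'ed within a group, groups AND'ed together) is a common design choice assumed here for illustration.

```python
# Toy Boolean search-statement builder: the user supplies term groups,
# synonyms within a group are OR'ed, and the groups are AND'ed.

def build_boolean_query(term_groups):
    """term_groups: list of lists of terms, e.g. from a form-filling
    interface; returns a host-independent Boolean search statement."""
    clauses = []
    for group in term_groups:
        joined = " OR ".join(group)
        clauses.append(f"({joined})" if len(group) > 1 else joined)
    return " AND ".join(clauses)

q = build_boolean_query([["online", "database"], ["interface"]])
```

A fuller interface would then translate this canonical statement into each host's own command syntax, which is precisely the kind of knowledge the paper argues must be built into the interface.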