Search results

Article
Publication date: 12 November 2018

Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović and Božo Kolonja

Abstract

Purpose

This paper aims to develop a system for the efficient management and exploitation of electronic documentation related to mining projects. The system offers information retrieval and information extraction (IE) features that draw on various language resources and natural language processing.

Design/methodology/approach

The system is designed to integrate textual, lexical, semantic and terminological resources, enabling advanced document search and extraction of information. These resources are integrated with a set of Web services and applications, for different user profiles and use-cases.

Findings

The use of the system is illustrated by examples demonstrating keyword search supported by Web query expansion services, search based on regular expressions, corpus search based on local grammars, followed by extraction of information based on this search and finally, search with lexical masks using domain and semantic markers.
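As a rough illustration of keyword search supported by query expansion, the sketch below expands a query term from a small lookup table and then matches documents with a regular expression. The table `EXPANSIONS`, the sample documents and the function names are hypothetical and are not taken from the paper, whose system relies on Web services backed by lexical and semantic resources.

```python
import re

# Hypothetical expansion table standing in for a semantic network
# or terminology dictionary (an assumption, not the paper's data).
EXPANSIONS = {
    "coal": ["coal", "lignite"],
    "mine": ["mine", "mines", "pit"],
}

def expand_query(term):
    """Return the term plus related terms; unknown terms expand to themselves."""
    key = term.lower()
    return EXPANSIONS.get(key, [key])

def search(documents, term):
    """Return indices of documents containing any expansion as a whole word."""
    alternatives = "|".join(re.escape(t) for t in expand_query(term))
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [i for i, text in enumerate(documents) if pattern.search(text)]

# Toy documents, invented for illustration only.
DOCS = [
    "Annual report on lignite production.",
    "Ventilation plan for the open pit.",
    "Safety training schedule.",
]
```

In this toy setting, a query for "coal" also retrieves the document that mentions only "lignite", which is the effect query expansion is meant to achieve.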

Originality/value

The presented system is the first software solution to apply human language technology to the management of documentation in the mining engineering domain, and it is also applicable to other engineering and non-engineering domains. The system is independent of the alphabet used (Cyrillic or Latin), which makes it applicable to other languages of the Balkan region related to Serbian, and its support for morphological dictionaries can be applied to most morphologically complex languages, such as the Slavic languages. The significant search improvements and the efficiency of IE rest on semantic networks and terminology dictionaries, with the support of local grammars.

Details

The Electronic Library, vol. 36 no. 6
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 24 May 2018

Vesna Pajić, Staša Vujičić Stanković, Ranka Stanković and Miloš Pajić

Abstract

Purpose

This paper presents a hybrid approach that combines linguistic and statistical information to semi-automatically extract multiword term candidates from texts.

Design/methodology/approach

The method is designed to be domain and language independent, focusing on languages with rich morphology. Here, it is used for extracting multiword terms from texts in Serbian, belonging to the agricultural engineering domain, as a use case. Predefined syntactic structures were used for multiword terms. For each structure, a finite state transducer was developed, which recognizes text sequences having that structure and outputs the sequence in a normalized form, so that different inflectional forms of the same multiword term can be counted properly. Term candidates were further filtered by their frequencies and evaluated by two domain experts.
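The recognize, normalize, count and filter steps can be sketched as follows. This is a simplified stand-in, not the authors' implementation: a plain dictionary lookup replaces the finite state transducers and electronic dictionaries, English plurals replace Serbian inflection, and the names `LEXICON`, `extract_candidates` and `frequent_terms` are hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon: surface form -> (normalized form, part of speech).
# A real system would consult morphological dictionaries instead.
LEXICON = {
    "agricultural": ("agricultural", "A"),
    "machine": ("machine", "N"),
    "machines": ("machine", "N"),
    "rotary": ("rotary", "A"),
    "tiller": ("tiller", "N"),
    "tillers": ("tiller", "N"),
}

def extract_candidates(tokens, pattern=("A", "N")):
    """Yield normalized token sequences whose part-of-speech tags match the pattern."""
    n = len(pattern)
    for i in range(len(tokens) - n + 1):
        entries = [LEXICON.get(tok.lower()) for tok in tokens[i:i + n]]
        # Skip windows containing words that are missing from the lexicon.
        if all(entries) and tuple(pos for _, pos in entries) == pattern:
            yield " ".join(norm for norm, _ in entries)

def frequent_terms(tokens, min_freq=2):
    """Count normalized candidates and keep those reaching the frequency threshold."""
    counts = Counter(extract_candidates(tokens))
    return {term: c for term, c in counts.items() if c >= min_freq}

TEXT = "Agricultural machines need service . Each agricultural machine has rotary tillers ."
TOKENS = TEXT.split()
```

Because both "Agricultural machines" and "agricultural machine" normalize to the same string, they are counted as one term, which is the purpose of emitting a normalized form. Expert evaluation, the final step in the described method, is outside the scope of this sketch.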

Findings

Using language resources such as electronic dictionaries and grammars, 1,523 multiword term candidates were recognized in a corpus of 42,260 distinct simple word forms, and 928 of them were extracted as multiword terms. Of these, 870 were new, that is, not already contained in the existing electronic dictionary of compounds for Serbian, and they were used to enrich the dictionary.
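The proportions implied by these counts can be worked out directly. The ratios below are derived from the reported figures and are not stated in the abstract itself; the variable names are chosen here for illustration.

```python
# Counts reported in the abstract.
candidates = 1523   # multiword sequences recognized as candidates
accepted = 928      # candidates extracted as multiword terms
new_terms = 870     # extracted terms not yet in the dictionary of compounds

# Derived ratios (not reported in the source).
acceptance_rate = accepted / candidates   # share of candidates kept as terms
new_share = new_terms / accepted          # share of kept terms that were new

print(f"{acceptance_rate:.1%} of candidates kept, {new_share:.1%} of kept terms new")
```

Roughly six in ten candidates were kept as terms, and about fifteen in sixteen of those were new to the dictionary.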

Originality/value

The paper presents methodology that can significantly contribute to the development of terminology lexicons in different areas. In this particular use case, some important agricultural engineering concepts were extracted from the text, but this approach could be used for other domains and languages as well.

Details

The Electronic Library, vol. 36 no. 3
Type: Research Article
ISSN: 0264-0473
