Search results

1 – 3 of 3
Article
Publication date: 10 December 2018

Bruno C.N. Oliveira, Alexis Huf, Ivan Luiz Salvadori and Frank Siqueira

Abstract

Purpose

This paper describes a software architecture that automatically adds semantic capabilities to data services. The proposed architecture, called OntoGenesis, is able to semantically enrich data services, so that they can dynamically provide both semantic descriptions and data representations.

Design/methodology/approach

The enrichment approach is designed to intercept the requests from data services. A domain ontology is then constructed and evolved in accordance with the syntactic representations provided by such services in order to define the data concepts. In addition, a property matching mechanism is proposed to exploit the potential data intersection observed in data service representations and external data sources so as to enhance the domain ontology with new equivalence triples. Finally, the enrichment approach is capable of deriving, on demand, a semantic description and data representations that link to the domain ontology concepts.
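
The idea of matching properties through shared values can be illustrated with a minimal sketch. It is not the OntoGenesis implementation and does not include the automata-based index mentioned in the abstract; it assumes a plain Jaccard overlap between value sets and a hypothetical threshold. The property names (`cityName`, `dbo:name`, and so on) and all values are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of values common to both sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_properties(
    service: Dict[str, Set[str]],
    external: Dict[str, Set[str]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Triple]:
    """Pair each service property with the external property whose
    values overlap most, and emit an equivalence triple for the pair
    when the overlap reaches the threshold."""
    triples: List[Triple] = []
    for s_prop, s_vals in service.items():
        best_prop, best_score = None, 0.0
        for e_prop, e_vals in external.items():
            score = jaccard(s_vals, e_vals)
            if score > best_score:
                best_prop, best_score = e_prop, score
        if best_prop is not None and best_score >= threshold:
            triples.append((s_prop, "owl:equivalentProperty", best_prop))
    return triples


# Invented sample data: values observed in a service representation
# and in an external source.
service = {
    "cityName": {"Lisbon", "Porto", "Faro"},
    "population": {"100", "200"},
}
external = {
    "dbo:name": {"Lisbon", "Porto", "Braga"},
    "dbo:areaCode": {"21", "22"},
}

print(match_properties(service, external))
```

Comparing every pair of properties this way is quadratic in the number of properties, which is the kind of cost an index over property values is meant to reduce.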

Findings

Experiments were performed using real-world datasets, such as DBpedia and GeoNames, as well as open government data. The obtained results show the applicability of the proposed architecture and that it can boost the development of semantic data services. Moreover, the matching approach achieved better performance when compared with other existing approaches found in the literature.

Research limitations/implications

This work only considers services designed as data providers, i.e., services that provide an interface for accessing data sources. In addition, our approach assumes that both data services and external sources – used to enhance the domain ontology – have some potential of data intersection. This assumption only requires that services and external sources share particular property values.

Originality/value

Unlike most of the approaches found in the literature, the architecture proposed in this paper is meant to semantically enrich data services in such a way that human intervention is minimal. Furthermore, an automata-based index is also presented as a novel method that significantly improves the performance of the property matching mechanism.

Details

International Journal of Web Information Systems, vol. 15 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 21 August 2017

Ivan Luiz Salvadori, Alexis Huf, Bruno C.N. Oliveira, Ronaldo dos Santos Mello and Frank Siqueira

Abstract

Purpose

This paper aims to propose a method based on Linked Data and Semantic Web principles for composing microservices through data integration. Two frameworks that provide support for the proposed composition method are also described in this paper: Linkedator, which is responsible for connecting entities managed by microservices, and Alignator, which aligns semantic concepts defined by heterogeneous ontologies.

Design/methodology/approach

The proposed method is based on entity linking principles and uses individual matching techniques considering a formal notion of identity. This method imposes two major constraints that must be taken into account by its implementation: architectural constraints and resource design constraints.

Findings

Experiments were performed in a real-world scenario, using public government data. The obtained results show the effectiveness of the proposed method and that it leverages the independence of development and composability of microservices. Thereby, the data provided by microservices that adopt heterogeneous ontologies can now be linked together.

Research limitations/implications

This work only considers microservices designed as data providers. Microservices designed to execute functionalities in a given application domain are out of the scope of this work.

Originality/value

The proposed composition method exploits the potential data intersection observed in resource-oriented microservice descriptions, providing a navigable view of data provided by a set of interrelated microservices. Furthermore, this study explores the applicability of ontology alignments for composing microservices.

Details

International Journal of Web Information Systems, vol. 13 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 11 May 2020

Bojan Bozic, Andre Rios and Sarah Jane Delany

Abstract

Purpose

This paper aims to investigate the methods for the prediction of tags on a textual corpus that describes diverse data sets based on short messages; as an example, the authors demonstrate the usage of methods based on hotel staff inputs in a ticketing system as well as the publicly available StackOverflow corpus. The aim is to improve the tagging process and find the most suitable method for suggesting tags for a new text entry.

Design/methodology/approach

The paper consists of two parts: exploration of existing sample data, which includes statistical analysis and visualisation of the data to provide an overview, and evaluation of tag prediction approaches. The authors have included different approaches from different research fields to cover a broad spectrum of possible solutions. As a result, the authors have tested a machine learning model for multi-label classification (using gradient boosting), a statistical approach (using frequency heuristics) and three similarity-based classification approaches (nearest centroid, k-nearest neighbours (k-NN) and naive Bayes). The experiment that compares the approaches uses recall to measure the quality of results. Finally, the authors provide a recommendation of the modelling approach that produces the best accuracy in terms of tag prediction on the sample data.

Findings

The authors have calculated the performance of each method against the test data set by measuring recall. The authors show recall for each method with different features (except for frequency heuristics, which does not provide the option to add additional features) for the dmbook pro and StackOverflow data sets. k-NN clearly provides the best recall. As k-NN turned out to provide the best results, the authors have performed further experiments with values of k from 1 to 10. This helped the authors to observe the impact of the number of neighbours used on the performance and to identify the best value for k.
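
A minimal sketch of k-NN tag suggestion evaluated by recall over several values of k is given here for illustration. It is not the authors' implementation: the token-overlap (Jaccard) similarity, the `n_tags` limit and the tiny ticket-like sample texts are all assumptions made for this example, and the abstract does not specify the features or distance measure actually used.

```python
from collections import Counter
from typing import List, Set, Tuple

Doc = Tuple[str, Set[str]]  # (message text, tags assigned to it)


def tokens(text: str) -> Set[str]:
    """Lower-cased whitespace tokens; a deliberately simple feature set."""
    return set(text.lower().split())


def similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_tags(text: str, train: List[Doc], k: int, n_tags: int = 2) -> Set[str]:
    """Let the k most similar training messages vote on tags and
    return the n_tags most voted ones (ties broken alphabetically)."""
    query = tokens(text)
    ranked = sorted(train, key=lambda d: similarity(query, tokens(d[0])), reverse=True)
    votes: Counter = Counter()
    for _, tags in ranked[:k]:
        votes.update(tags)
    top = sorted(votes, key=lambda t: (-votes[t], t))[:n_tags]
    return set(top)


def recall(predicted: Set[str], actual: Set[str]) -> float:
    """Fraction of the true tags that were suggested."""
    return len(predicted & actual) / len(actual) if actual else 0.0


def mean_recall(test: List[Doc], train: List[Doc], k: int) -> float:
    """Average recall over all test messages for a given k."""
    scores = [recall(suggest_tags(text, train, k), tags) for text, tags in test]
    return sum(scores) / len(scores)


# Invented sample data, loosely in the style of short ticket messages.
train = [
    ("wifi not working in room", {"wifi", "it"}),
    ("wifi router broken lobby", {"wifi", "it"}),
    ("towels missing in room", {"housekeeping"}),
    ("bathroom towels dirty", {"housekeeping"}),
]
test = [
    ("wifi down in room 12", {"wifi"}),
    ("need clean towels", {"housekeeping"}),
]

# Sweep k to see how the number of neighbours affects recall.
for k in range(1, 5):
    print(k, mean_recall(test, train, k))
```

On a real corpus the sweep would cover the range of k under study and use a held-out test split; the sample here is too small to say anything about which k is best.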

Originality/value

The value and originality of the paper lie in its extensive experiments with several methods from different domains. The authors have used probabilistic methods, such as naive Bayes, statistical methods, such as frequency heuristics, and similarity approaches, such as k-NN. Furthermore, the authors have produced results on an industrial-scale data set that has been provided by a company and used directly in their project, as well as a community-based data set with a large amount of data and dimensionality. The study results can be used to select a model based on diverse corpora for a specific use case, taking into account advantages and disadvantages when applying the model to new data.

Details

International Journal of Web Information Systems, vol. 16 no. 2
Type: Research Article
ISSN: 1744-0084
