Learning representations of Web entities for entity resolution

Luciano Barbosa (Universidade Federal de Pernambuco, Recife, Brazil)

International Journal of Web Information Systems

ISSN: 1744-0084

Publication date: 19 August 2019

Abstract

Purpose

Matching instances of the same entity, a task known as entity resolution, is a key step in data integration. This paper proposes a deep learning network that learns different representations of Web entities for entity resolution.

Design/methodology/approach

To match Web entities, the proposed network learns the following representations of entities: embeddings, which are vector representations of the words in the entities in a low-dimensional space; convolutional vectors from a convolutional layer, which capture short-distance patterns in word sequences in the entities; and bag-of-words vectors, created by a bag-of-words (BoW) layer that learns weights for words in the vocabulary based on the task at hand. Given a pair of entities, the similarity between their learned representations is used as a feature for a binary classifier that identifies a possible match. In addition to these features, the classifier also uses a modification of inverse document frequency for pairs, which identifies discriminative words in pairs of entities.
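The feature construction described above can be illustrated with a minimal sketch. It is not the paper's implementation: the vocabulary, dimensions, random weights (standing in for learned parameters), the averaging of embeddings, the max-pooling over windows and the use of cosine similarity are all assumptions made for illustration, and every name in the code is hypothetical. The pair-level inverse document frequency feature is left out because its definition is not given in this abstract.

```python
import math
import random

random.seed(0)  # deterministic toy weights; a real network would learn these

# All names and sizes below are hypothetical, chosen only for illustration.
VOCAB = ["apple", "iphone", "8", "64gb", "black", "case", "samsung"]
DIM = 4      # embedding size (assumed)
WINDOW = 2   # number of consecutive words seen by each filter (assumed)
FILTERS = 3  # number of convolutional filters (assumed)

EMB = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}
CONV = [[random.uniform(-1, 1) for _ in range(WINDOW * DIM)]
        for _ in range(FILTERS)]
BOW_W = {w: random.uniform(0.1, 1.0) for w in VOCAB}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_vector(tokens):
    """Average of the embeddings of the in-vocabulary words."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def conv_vector(tokens):
    """Max-pooled ReLU response of each filter over sliding word windows."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    windows = [sum(vecs[i:i + WINDOW], [])  # concatenate the window's embeddings
               for i in range(len(vecs) - WINDOW + 1)]
    return [max((max(0.0, sum(f * x for f, x in zip(filt, win)))
                 for win in windows), default=0.0)
            for filt in CONV]


def bow_vector(tokens):
    """Vocabulary-sized vector holding a per-word weight for each word present."""
    present = set(tokens)
    return [BOW_W[w] if w in present else 0.0 for w in VOCAB]


def similarity_features(entity_a, entity_b):
    """One similarity per representation, to be fed to a binary classifier."""
    ta, tb = entity_a.lower().split(), entity_b.lower().split()
    return [
        cosine(embedding_vector(ta), embedding_vector(tb)),
        cosine(conv_vector(ta), conv_vector(tb)),
        cosine(bow_vector(ta), bow_vector(tb)),
    ]


features = similarity_features("Apple iPhone 8 64GB", "iPhone 8 64GB black")
```

In this sketch, `features` is a three-element list that a downstream binary classifier (for example, logistic regression) could consume alongside other pair-level features. In a trained network the three sets of weights would be learned jointly from labelled matching and non-matching pairs rather than drawn at random.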

Findings

The proposed approach was evaluated on two commercial and two academic entity resolution benchmark data sets. The results show that the proposed strategy outperforms previous approaches on the commercial data sets, which are more challenging, and achieves results similar to those of its competitors on the academic data sets.

Originality/value

No previous work has used a single deep learning framework to learn different representations of Web entities for entity resolution.

Citation

Barbosa, L. (2019), "Learning representations of Web entities for entity resolution", International Journal of Web Information Systems, Vol. 15 No. 3, pp. 346-358. https://doi.org/10.1108/IJWIS-07-2018-0059

Publisher: Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited
