Set of tuples expansion by example with reliability

Ngurah Agus Sanjaya Er (Department of Computer Science and Networks, Télécom ParisTech, Paris, France)
Mouhamadou Lamine Ba (Université Alioune Diop de Bambey, Bambey, Senegal)
Talel Abdessalem (Department of Computer Science and Networks, Télécom ParisTech, Paris, France)
Stéphane Bressan (School of Computing, National University of Singapore, Singapore, Singapore)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 6 November 2017

Abstract

Purpose

This paper focuses on the design of algorithms and techniques for effective set expansion. A tool that finds and extracts candidate sets of tuples from the World Wide Web was designed and implemented. For instance, when a user provides <Indonesia, Jakarta, Indonesian Rupiah>, <China, Beijing, Yuan Renminbi> and <Canada, Ottawa, Canadian Dollar> as seeds, the system returns tuples of countries with their corresponding capital cities and currency names, constructed from content extracted from the retrieved Web pages.

Design/methodology/approach

The seeds are used to query a search engine and retrieve relevant Web pages. The seeds are also used to infer wrappers from the retrieved pages, and the wrappers, in turn, are used to extract candidates. The Web pages, wrappers, seeds and candidates, together with their relationships, form the vertices and edges of a heterogeneous graph. Several options for ranking candidates, from PageRank to truth finding algorithms, were evaluated and compared. Notably, all vertices are ranked, which yields an integrated approach that not only answers direct set expansion queries but also identifies the Web pages most relevant for expanding a given set of seeds.
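Because every vertex of the heterogeneous graph is ranked, a graph-ranking step can be illustrated independently of crawling and wrapper inference. The minimal Python sketch below runs plain power-iteration PageRank over a small graph, under stated assumptions: edges are treated as undirected, the edge schema (seed-page, page-wrapper, wrapper-candidate) and all vertex names (`page:p1`, `wrapper:w1`, and so on) are hypothetical, and `pagerank` is an illustrative helper, not the authors' implementation.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Rank every vertex of an undirected graph with power-iteration PageRank.

    Each undirected edge (u, v) is treated as a pair of directed links, so
    every vertex that appears in some edge has at least one out-link and no
    rank mass is lost to dangling vertices.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    nodes = list(neighbours)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleportation share, identical for every vertex.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Each vertex spreads its damped rank evenly over its neighbours.
        for node in nodes:
            share = damping * rank[node] / len(neighbours[node])
            for nb in neighbours[node]:
                new_rank[nb] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical heterogeneous graph: seed-page (the page contains the seed),
# page-wrapper (the wrapper is inferred from the page), wrapper-candidate
# (the wrapper extracts the candidate tuple). Vertex names are illustrative.
EDGES = [
    ("seed:China|Beijing|Yuan Renminbi", "page:p1"),
    ("seed:Canada|Ottawa|Canadian Dollar", "page:p1"),
    ("seed:China|Beijing|Yuan Renminbi", "page:p2"),
    ("seed:Canada|Ottawa|Canadian Dollar", "page:p3"),
    ("page:p1", "wrapper:w1"),
    ("page:p2", "wrapper:w2"),
    ("page:p3", "wrapper:w3"),
    ("wrapper:w1", "cand:France|Paris|Euro"),
    ("wrapper:w2", "cand:France|Paris|Euro"),
    ("wrapper:w3", "cand:Japan|Tokyo|Yen"),
]

scores = pagerank(EDGES)

# All vertex types are ranked; filtering by prefix yields candidates or pages.
candidates = sorted(
    (v for v in scores if v.startswith("cand:")), key=scores.get, reverse=True
)
print(candidates)
```

In this toy graph the tuple extracted by two wrappers collects rank from both, so it is expected to outrank the tuple supported by a single wrapper; filtering on the `page:` prefix instead would list the most relevant pages.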

Findings

The experimental results show that leveraging the truth finding algorithm can indeed improve the level of confidence in the extracted candidates and the sources.

Originality/value

Current approaches to set expansion mostly support the expansion of sets of atomic values. This idea is extended here to sets of tuples, extracting relation instances from the Web given a handful of tuple seeds. A truth finding algorithm is also incorporated into the approach, and it is shown to improve the confidence level in the ranking of both candidates and sources in set of tuples expansion.
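The abstract does not name the specific truth finding algorithm used, so the sketch below substitutes a generic, simplified iterative scheme to show the idea of jointly estimating source trust and candidate confidence. Assumptions: wrappers play the role of sources, the first tuple element (the country) is the key whose values compete, and `truth_finding`, `CLAIMS` and the wrapper names `w1` to `w3` are hypothetical. This is not the authors' method.

```python
from collections import defaultdict


def truth_finding(claims, iterations=20, initial_trust=0.8):
    """Simplified iterative truth finding over (source, key, value) claims.

    Values claimed for the same key compete: a value's confidence is the
    trust-weighted vote it receives, normalised over all values of that key.
    A source's trust is the mean confidence of the values it claims.
    """
    claims = list(dict.fromkeys(claims))  # drop exact duplicate claims
    trust = {source: initial_trust for source, _, _ in claims}
    confidence = {}
    for _ in range(iterations):
        # Trust-weighted vote for each (key, value) pair.
        votes = defaultdict(float)
        for source, key, value in claims:
            votes[(key, value)] += trust[source]
        # Normalise among competing values of the same key.
        totals = defaultdict(float)
        for (key, _), weight in votes.items():
            totals[key] += weight
        confidence = {kv: weight / totals[kv[0]] for kv, weight in votes.items()}
        # A source is trusted as much as the values it asserts, on average.
        supported = defaultdict(list)
        for source, key, value in claims:
            supported[source].append(confidence[(key, value)])
        trust = {s: sum(cs) / len(cs) for s, cs in supported.items()}
    return trust, confidence


# Hypothetical wrappers (sources) extracting country -> (capital, currency).
CLAIMS = [
    ("w1", "France", ("Paris", "Euro")),
    ("w1", "Japan", ("Tokyo", "Yen")),
    ("w2", "France", ("Paris", "Euro")),
    ("w2", "Japan", ("Tokyo", "Yen")),
    ("w3", "France", ("Lyon", "Euro")),  # conflicting, incorrect claim
    ("w3", "Japan", ("Tokyo", "Yen")),
]

trust, confidence = truth_finding(CLAIMS)
print(trust)
print(confidence)
```

The wrapper that asserts the conflicting tuple ends up with lower trust, and that lower trust in turn lowers the confidence of its unsupported candidate, which is the mutual reinforcement between sources and candidates that truth finding exploits.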

Citation

Er, N.A.S., Ba, M.L., Abdessalem, T. and Bressan, S. (2017), "Set of tuples expansion by example with reliability", International Journal of Web Information Systems, Vol. 13 No. 4, pp. 425-444. https://doi.org/10.1108/IJWIS-04-2017-0037

Publisher

Emerald Publishing Limited

Copyright © 2017, Emerald Publishing Limited
