Indexing the invisible web: a survey

Yanbo Ru (Department of Computer Science, University of Southern California, Los Angeles, California, USA)
Ellis Horowitz (Department of Computer Science, University of Southern California, Los Angeles, California, USA)

Online Information Review

ISSN: 1468-4527

Publication date: 1 June 2005

Abstract

Purpose

The existence and continued growth of the invisible web create a major challenge for search engines that attempt to organize all of the material on the web into a form that all users can easily retrieve. The purpose of this paper is to identify the challenges and problems underlying existing work in this area.

Design/methodology/approach

The paper presents a discussion based on a short survey of prior work, including automated discovery of invisible web site search interfaces, automated classification of invisible web sites, label assignment and form filling, information extraction from the resulting pages, learning the query language of the search interface, building a content summary for an invisible web site, selecting proper databases, integrating invisible web search interfaces, and assessing the performance of an invisible web site.

Findings

Existing technologies and tools for indexing the invisible web follow one of two strategies: indexing the web site interface or examining a portion of the contents of an invisible web site and indexing the results.

Originality/value

The paper is of value to those involved with information management.

Citation

Ru, Y. and Horowitz, E. (2005), "Indexing the invisible web: a survey", Online Information Review, Vol. 29 No. 3, pp. 249-265. https://doi.org/10.1108/14684520510607579

Publisher: Emerald Group Publishing Limited

Copyright © 2005, Emerald Group Publishing Limited
