The Invisible Web

David Bawden (Senior Lecturer, Department of Information Science, City University, London, UK)

Program: electronic library and information systems

ISSN: 0033-0337

Article publication date: 1 March 2002

Citation

Bawden, D. (2002), "The Invisible Web", Program: electronic library and information systems, Vol. 36 No. 1, pp. 51-52. https://doi.org/10.1108/prog.2002.36.1.51.2

Publisher

Emerald Group Publishing Limited

Copyright © 2002, MCB UP Limited


One of Aslib’s series of short “Know‐How Guides”, this book shows the strengths and weaknesses of the series in general. On the plus side, it addresses a topical issue in a clearly written and well‐structured manner. On the negative side, the required brevity of the format allows the author space only to deal with the main points, and not to go into detail on any of them; the result can be a “yes, and …” response from the reader.

Pedley’s topic is one that seems in danger of becoming rather over‐hyped: the “invisible Web”. He explains this as information on Web pages that conventional search engines cannot or do not index. Examples are searchable databases, electronic discussion lists, directories, library catalogues, collections of PDF files, and so on. Naturally, since the stuff is invisible, it is difficult to assess quite how much there is, but a Brightplanet survey, on which Pedley places some reliance for background, suggests that it amounts to 500 times the material of the “visible” Web, i.e. that which is accessible to search engines.

This puts one in mind of the mathematician Stanislaw Ulam, who, speaking of the advent of chaos theory research, which showed the ubiquity of non‐linear phenomena in nature, remarked that to use the phrase “non‐linear” at all was like calling all of zoology, except for elephants, “non‐elephant studies” (Gleick, 1988). By this reckoning, virtually the whole of the Web is hidden, with the visible surface a rather unimportant addendum. Of course, this is not a true reflection. Indeed, the phrase “invisible Web” is itself rather unhelpful. As Pedley points out, other terminology is in use. Brightplanet like to talk of the “deep Web”, while another, arguably more helpful, breakdown distinguishes the “open Web”, the “gated Web” and the “professional Web” (Basch and Bates, 2000).

In any event, Pedley begins his book with a lucid account, illuminated by statistics and examples, of how and why material which is accessible through a Web browser and Internet connection cannot be found by conventional search engines. In essence, as has been said above, this is because it is in systems which must be searched interactively to find specific pieces of information. It is easy enough to use a search engine to find the (static) home pages for the British Library public catalogue, or the Dialog online host system; but to find details of a specific book in the catalogue, or a reference in a Dialog database, requires recourse to that specific searching system to produce a (dynamic) Web page with the details needed.

This much Pedley explains clearly. Now what? Well, what follows is, in effect, an annotated listing of general tools giving access to particular sources of this sort of information; for the most part these are directories of similar forms of source material: PDF files, library catalogues, public record databases, and so on. This is quite useful, and I am not aware that it can be found anywhere else, though there is a degree of irony in the provision of a printed list of Web sources, when Pedley has devoted some time in the earlier part of the book to lamenting the short persistence of URLs. The rest of the book is largely given over to a “selective list of invisible Web resources”. This is so selective as to be bizarre in places: health care and medicine, for example, is represented by two instantiations of the Medline literature database, and two UK National Health Service sites.

While these sections provide useful examples for readers who have got the idea of what is going on, I fear that they could leave some people floundering among a large assortment of varied material. It would be helpful to have some clearer structure, or distinction among the types of resources listed here, perhaps along the lines of Robinson’s strategic approach to resource listings (Robinson, 2000).

Finally, and commendably, Pedley presents a number of “worked examples” of the use of this kind of source; would that more writers of books of this kind did the same. The book rather tails off, with a reasonable bibliography, a glossary with just five entries, and a rather unhelpful context‐free index.

Overall, this book will be a useful reminder to a wide readership that there is more information available via the Web than comes from your favourite search engine. However, those with some library and information science background will probably want more detail and structure than are provided here; those without such a background may, sadly, end up confused.

References

Basch, R. and Bates, M. (2000), Researching Online for Dummies, IDG Books, Foster City, CA.

Gleick, J. (1988), Chaos, Heinemann, London.

Robinson, L. (2000), “A strategic approach to research using Internet tools and resources”, Aslib Proceedings, Vol. 52 No. 1, January, pp. 11‐19.
