Archiving Websites: A Practical Guide for Information Management Professionals

Polona Vilar (Department of Librarianship and Information Science and Book Studies, University of Ljubljana, Ljubljana, Slovenia)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 7 March 2008




Vilar, P. (2008), "Archiving Websites: A Practical Guide for Information Management Professionals", Journal of Documentation, Vol. 64 No. 2, pp. 304-305.



Emerald Group Publishing Limited

Copyright © 2008, Emerald Group Publishing Limited

The archiving of web sites has become a significant issue for librarians and other information professionals. This manual by Adrian Brown is a valuable resource for anyone who wishes to get acquainted with web archiving.

The author draws on his personal experience as manager of The (UK) National Archives' web‐archiving programme and on the experiences of other web archiving initiatives, and has managed to compile a useful body of advice and information. The book is well written and provides a readable insight into multiple aspects of web archiving.

According to the publisher, the manual does not assume a thorough knowledge of ICTs, which will be a relief for many librarians and archive managers who are set to implement web archiving software on their own. They will get useful information on, e.g. the development and background of web archiving; selection policies; collection methods; quality assurance, processing and cataloguing; preservation; delivery to users; legal issues; management of a web‐archiving programme; and future trends in web archiving. As noted before, a great advantage of the book is that it does not require detailed knowledge of technical issues, but is deliberately focused on presenting current best practice, thus giving practical advice on the procedures and issues of web archiving.

The book opens with a glossary of terms, followed by ten chapters. In Chapter 1 the author briefly presents the technological and conceptual development of the web, with the aim of highlighting the characteristics and challenges relevant to those who want to archive web sites. The process of web archiving is described and a model of the process is presented.

In Chapter 2 we find information on the history of web archiving up to and including some current web site archiving initiatives: initial projects in the Nordic countries and related projects, e.g. Pandora, NEDLIB, MINERVA, as well as the IIPC as one of the most recent examples of international collaboration.

The following six chapters deal with various stages of the web archiving process. Chapter 3 addresses issues associated with the selection of materials for archiving (e.g. policy definition and maintenance, quality assurance). This is then backed up by a thorough coverage of methods and criteria for selection and maintenance. Some issues are emphasized, e.g. difficulty of drawing the boundaries of a web site, timing and frequency of collection.

In Chapter 4 the strengths and weaknesses of collection methods are discussed. The author is candid in admitting that no method is perfect and that no single method solves every archiving challenge. Thus a combination of methods will be needed in most cases.

Chapter 5 is dedicated to the stages of the web archiving process with the emphasis on quality assurance and cataloguing. Appendices 3 and 4 introduce practical tools to further complement this discussion. The author also discusses common issues that are likely to arise, e.g. incorrect navigation, missing content, and multilingual content. The issue of balancing the quality and quantity of archived material is brought up. Description of materials, in other words cataloguing, is also discussed.

Chapter 6 deals with principles and practicalities of preservation. It also presents strategies for preservation, from emulation to migration. The entire preservation cycle, passive and active preservation as well as preservation metadata are discussed in detail.

In Chapter 7 the author presents issues of delivery to users, starting from the two major aspects: context and authenticity. He then goes on to discuss search and browse facilities, putting archived content in context, methods of delivery, and functionality.

Very interesting is Chapter 8, which discusses legal aspects, including intellectual property rights, privacy, content liability, human rights, legislation to support web archiving (e.g. legal deposit), and public accountability. It is a thorough and concise coverage of relevant legal issues concerning web archiving.

Chapter 9 deals with management of a web archiving programme. The author presents advantages and disadvantages of different models, stages of selection and implementation, and management issues. A case study is provided as an illustrative example.

The analysis of future trends, discussed in Chapter 10, is, in the author's own words, one of the most unreliable and inexact endeavours. He nevertheless makes an attempt to discuss the future of data storage issues, digital preservation, standards, web archiving tools and technologies.

My opinion is that the LIS and archival literature would be the poorer if this clear and well‐written book did not exist. There are not many of its kind around. However, some drawbacks should be noted. Since this is not, by the author's emphasis, a technical manual (although some parts are rather procedural), one would expect the book to tackle more social aspects, e.g. the reasons and purposes for archiving web sites, and the tasks and responsibilities of those involved in the process. These are not simple or straightforward, but are becoming very relevant in today's world of information overload. Another example would be a discussion of the similarities and differences between institutions involved in web archiving: not only libraries, but also archives, information centres, etc., each of which has its own mission and policy.

However, regardless of these caveats, the book is certainly a worthy contribution to the body of LIS and related literature.
