HIERARCHIC AGGLOMERATIVE CLUSTERING METHODS FOR AUTOMATIC DOCUMENT CLASSIFICATION

ALAN GRIFFITHS (Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, UK)
LESLEY A. ROBINSON (Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, UK)
PETER WILLETT (Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, UK)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 1 March 1984

Abstract

This paper considers the classifications produced by applying the single linkage, complete linkage, group average and Ward clustering methods to the Keen and Cranfield document test collections. Experiments were carried out to study the structure of the hierarchies produced by the different methods, the extent to which the methods distort the input similarity matrices during the generation of a classification, and the retrieval effectiveness obtainable in cluster-based retrieval. The results suggest that the single linkage method, which has been used extensively in previous work on document clustering, is not the most effective procedure of those tested, although it should be emphasized that the experiments used only small document test collections.
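For readers unfamiliar with the four methods named in the abstract, the following is a minimal sketch of hierarchic agglomerative clustering, not the authors' implementation. It assumes the standard Lance-Williams update rules for each method; the function names, the four-document toy matrix and its values are hypothetical and chosen only for illustration. The Ward rule assumes the input holds squared Euclidean distances.

```python
from itertools import combinations


def lance_williams(method, ni, nj, nk, dki, dkj, dij):
    """Distance from cluster k to the cluster formed by merging i and j."""
    if method == "single":      # nearest pair of members
        return min(dki, dkj)
    if method == "complete":    # furthest pair of members
        return max(dki, dkj)
    if method == "average":     # size-weighted mean (group average)
        return (ni * dki + nj * dkj) / (ni + nj)
    if method == "ward":        # assumes squared Euclidean input distances
        total = ni + nj + nk
        return ((ni + nk) * dki + (nj + nk) * dkj - nk * dij) / total
    raise ValueError(f"unknown method: {method}")


def agglomerate(dist, method):
    """Return the merge history as [(members_a, members_b, level), ...]."""
    n = len(dist)
    clusters = {i: (i,) for i in range(n)}  # cluster id -> member documents
    d = {frozenset(p): dist[p[0]][p[1]] for p in combinations(range(n), 2)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of currently active clusters.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda p: d[frozenset(p)])
        dij = d[frozenset((i, j))]
        ni, nj = len(clusters[i]), len(clusters[j])
        # Update distances from every other cluster to the new cluster.
        for k in clusters:
            if k in (i, j):
                continue
            d[frozenset((k, next_id))] = lance_williams(
                method, ni, nj, len(clusters[k]),
                d[frozenset((k, i))], d[frozenset((k, j))], dij)
        merges.append((clusters[i], clusters[j], dij))
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges


# Hypothetical toy data: squared distances between four documents.
toy = [
    [0, 1, 9, 49],
    [1, 0, 4, 36],
    [9, 4, 0, 16],
    [49, 36, 16, 0],
]

if __name__ == "__main__":
    for name in ("single", "complete", "average", "ward"):
        levels = [level for _, _, level in agglomerate(toy, name)]
        print(name, levels)
```

On this toy input all four methods merge the documents in the same order but at different levels, which gives a small-scale sense of how the choice of method shapes the resulting hierarchy.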

Citation

GRIFFITHS, A., ROBINSON, L.A. and WILLETT, P. (1984), "HIERARCHIC AGGLOMERATIVE CLUSTERING METHODS FOR AUTOMATIC DOCUMENT CLASSIFICATION", Journal of Documentation, Vol. 40 No. 3, pp. 175-205. https://doi.org/10.1108/eb026764

Publisher

MCB UP Ltd

Copyright © 1984, MCB UP Limited
