Identification of duplicate and near‐duplicate full‐text records in database search‐outputs using hierarchic cluster analysis

John W. Kirriemuir

Peter Willett

Program: electronic library and information systems

ISSN: 0033-0337

Article publication date: 1 March 1995

Abstract

Clustering the output of a multi‐database online search enables a user to obtain an overview of the information that has been retrieved without the need to inspect any documents that contain only redundant information. In this paper we describe a classification scheme that characterises the degree of relationship between pairs of documents in database search‐outputs and then report the application of a range of clustering methods and similarity coefficients to 20 such outputs. These experiments demonstrate that clustering is capable of grouping documents that are identical to, or closely‐related to, other documents in the search‐output on the basis of their term similarities.
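As a rough illustration of grouping records by term similarity, the sketch below clusters a small search-output. The abstract does not name the coefficients or clustering methods that were tested, so the Dice coefficient, the single-linkage rule, the whitespace tokenisation, and the function names `term_set`, `dice` and `single_link_clusters` are all assumptions made for this example, not the authors' implementation.

```python
from itertools import combinations


def term_set(text):
    """Lowercase a record and split it into a set of whitespace-delimited terms."""
    return set(text.lower().split())


def dice(a, b):
    """Dice coefficient between two term sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def single_link_clusters(records, threshold):
    """Group records linked by any pairwise similarity >= threshold.

    This is the partition obtained by cutting a single-linkage
    hierarchy at the given similarity level (assumed method).
    Returns a sorted list of clusters, each a list of record indices.
    """
    terms = [term_set(r) for r in records]
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster representative,
        # shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if dice(terms[i], terms[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

With a high threshold only exact or case-variant duplicates are grouped; lowering it also pulls in near-duplicates that share most of their terms, which is where the choice of cut-off starts to matter.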

Citation

Kirriemuir, J.W. and Willett, P. (1995), "Identification of duplicate and near‐duplicate full‐text records in database search‐outputs using hierarchic cluster analysis", Program: electronic library and information systems, Vol. 29 No. 3, pp. 241-256. https://doi.org/10.1108/eb047198

Publisher

MCB UP Ltd
