Augmenting Dublin Core digital library metadata with Dewey Decimal Classification

Michael John Khoo (College of Computing and Informatics, Drexel University, Philadelphia, Pennsylvania, U.S.A.)
Jae-wook Ahn (College of Computing and Informatics, Drexel University, Philadelphia, Pennsylvania, U.S.A.)
Ceri Binding (Hypermedia Research Unit, University of South Wales, Pontypridd, U.K.)
Hilary Jane Jones (MIMAS, The University of Manchester, Manchester, U.K.)
Xia Lin (College of Computing and Informatics, Drexel University, Philadelphia, Pennsylvania, U.S.A.)
Diana Massam (MIMAS, The University of Manchester, Manchester, U.K.)
Douglas Tudhope (Hypermedia Research Unit, University of South Wales, Pontypridd, U.K.)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 14 September 2015

Abstract

Purpose

The purpose of this paper is to describe a new approach to a well-known problem for digital libraries: how to search across multiple unrelated libraries with a single query.

Design/methodology/approach

The approach involves creating new Dewey Decimal Classification terms and numbers from existing Dublin Core records. In total, 263,550 records were harvested from three digital libraries. Weighted key terms were extracted from the title, description and subject fields of each record. Ranked DDC classes were automatically generated from these key terms by considering DDC hierarchies via a series of filtering and aggregation stages. A mean reciprocal ranking evaluation compared a sample of 49 generated classes against DDC classes created by a trained librarian for the same records.
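The evaluation measure named above can be illustrated with a minimal sketch. It assumes each record has one ranked list of generated DDC classes and one librarian-assigned class; the function name and the sample notations are hypothetical and are not taken from the study.

```python
def mean_reciprocal_rank(ranked_lists, gold_classes):
    """Average of 1/rank of the gold class in each ranked list.

    A record whose gold class does not appear in its ranked list
    contributes 0 to the average.
    """
    if not gold_classes:
        return 0.0
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_classes):
        for position, candidate in enumerate(ranked, start=1):
            if candidate == gold:
                total += 1.0 / position
                break
    return total / len(gold_classes)


# Hypothetical sample: three records, each with a ranked list of classes.
generated = [["510", "004"], ["020", "025", "006"], ["900"]]
librarian = ["510", "025", "300"]
score = mean_reciprocal_rank(generated, librarian)
```

In this sample the gold class is ranked first for one record, second for another, and is missing for the third, so the score is the mean of 1, 0.5 and 0.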

Findings

The best results combined weighted key terms from the title, description and subject fields. Performance declined as the DDC level became more specific. The results compare favorably with similar studies.

Research limitations/implications

The metadata harvest required manual intervention and the evaluation was resource intensive. Future research will look at evaluation methodologies that take account of issues of consistency and ecological validity.

Practical implications

The method does not require training data and is easily scalable. The pipeline can be customized for individual use cases, for example, to enhance recall or precision.

Social implications

The approach can provide centralized access to information from multiple domains currently provided by individual digital libraries.

Originality/value

The approach addresses metadata normalization in the context of web resources. The automatic classification approach accounts for matches within hierarchies, aggregating lower level matches to broader parents and thus approximates the practices of a human cataloger.
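The idea of aggregating lower-level matches to broader parents can be sketched as follows. The abstract does not specify the paper's filtering and aggregation stages, so this is only an illustration under stated assumptions: weights are summed (an assumed rule), a parent is found by truncating the DDC notation to its first one, two or three digits, and all function names and sample values are hypothetical.

```python
from collections import defaultdict


def truncate_ddc(notation, level):
    """Map a DDC notation to its ancestor at level 1, 2 or 3.

    Level 1 is the main class (e.g. "000"), level 2 the division
    (e.g. "020"), level 3 the section (e.g. "025").
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    digits = notation.replace(".", "")
    return digits[:level].ljust(3, "0")


def aggregate_scores(matches, level):
    """Sum match weights under their ancestor class at the given level.

    `matches` is an iterable of (notation, weight) pairs. The result is
    a list of (class, total weight) sorted by descending weight, with
    ties broken by notation.
    """
    totals = defaultdict(float)
    for notation, weight in matches:
        totals[truncate_ddc(notation, level)] += weight
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical key-term matches for one record.
matches = [("025.04", 2.0), ("025.3", 1.5), ("004.6", 3.0)]
by_section = aggregate_scores(matches, level=3)
by_division = aggregate_scores(matches, level=2)
```

Here two weaker matches under 025 combine to outrank a single stronger match under 004, which is the effect the hierarchy-aware aggregation is meant to capture.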

Acknowledgements

The authors gratefully acknowledge: the support of the funders of the Digging Into Data program (IMLS in the USA, and JISC-ESRC-AHRC in the UK); the NSDL for metadata support; and OCLC for DDC23.

Citation

Khoo, M.J., Ahn, J.-w., Binding, C., Jones, H.J., Lin, X., Massam, D. and Tudhope, D. (2015), "Augmenting Dublin Core digital library metadata with Dewey Decimal Classification", Journal of Documentation, Vol. 71 No. 5, pp. 976-998. https://doi.org/10.1108/JD-07-2014-0103

Publisher

Emerald Group Publishing Limited

Copyright © 2015, Emerald Group Publishing Limited
