Ceri Binding, Claudio Gnoli and Douglas Tudhope
Abstract
Purpose
The Integrative Levels Classification (ILC) is a comprehensive “freely faceted” knowledge organization system not previously expressed as SKOS (Simple Knowledge Organization System). This paper reports and reflects on work converting the ILC to SKOS representation.
Design/methodology/approach
The design of the ILC representation and the various steps in the conversion to SKOS are described and located within the context of previous work considering the representation of complex classification schemes in SKOS. Various issues and trade-offs emerging from the conversion are discussed. The conversion implementation employed the STELETO transformation tool.
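Template-driven conversion can be pictured as substituting each row of a delimited export into an RDF template. The sketch below is a minimal stand-in written in plain Python, not STELETO itself; the column names (`notation`, `label`, `broader`), the base URI `http://example.org/ilc/` and the output shape are all hypothetical.

```python
import csv
import io
from string import Template

# Hypothetical base URI for ILC concepts; not the real ILC namespace.
BASE = "http://example.org/ilc/"

# Turtle template for one concept; $-placeholders are filled from CSV columns.
# Labels containing double quotes would need escaping; not handled here.
ROW_TEMPLATE = Template(
    "<${base}${notation}> a skos:Concept ;\n"
    '    skos:notation "${notation}" ;\n'
    '    skos:prefLabel "${label}"@en${broader_clause} .\n'
)


def rows_to_skos(delimited_text: str) -> str:
    """Turn delimited rows (notation, label, broader) into SKOS Turtle."""
    out = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"]
    for row in csv.DictReader(io.StringIO(delimited_text)):
        broader = row["broader"].strip()
        # Only emit skos:broader when the row names a parent class.
        clause = f" ;\n    skos:broader <{BASE}{broader}>" if broader else ""
        out.append(ROW_TEMPLATE.substitute(
            base=BASE,
            notation=row["notation"],
            label=row["label"],
            broader_clause=clause,
        ))
    return "\n".join(out)
```

For example, `rows_to_skos("notation,label,broader\nx,example class,\nxy,example subclass,x\n")` yields two `skos:Concept` blocks, with only the second carrying a `skos:broader` link. Keeping the mapping in a template rather than in code is what lets the same transformation be rerun as the source data changes.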
Findings
The ILC conversion captures some of the ILC facet structure by a limited extension beyond the SKOS standard. SPARQL examples illustrate how this extension could be used to create faceted, compound descriptors when indexing or cataloguing. Basic query patterns are provided that might underpin search systems. Possible routes for reducing complexity are discussed.
Originality/value
Complex classification schemes, such as the ILC, have features that are not straightforward to represent in SKOS and that extend beyond the functionality of the SKOS standard. The ILC's facet indicators are modelled as rdf:Property sub-hierarchies that accompany the SKOS RDF statements. The ILC's top-level fundamental facet relationships are modelled as extensions of the associative relationship, that is, as specialised sub-properties of skos:related. An approach for representing faceted compound descriptions in ILC and other faceted classification schemes is proposed.
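Declaring the facet relationships as sub-properties of skos:related means that a client applying the RDFS sub-property rule still sees every facet link as an ordinary associative relation. A minimal sketch over an in-memory set of triples, assuming hypothetical property names (`ilc:facetAgent`, `ilc:facetPlace`) rather than the ILC's actual identifiers:

```python
# Triples as (subject, predicate, object); all ilc:/ex: names are hypothetical.
SKOS_RELATED = "skos:related"
SUBPROP = "rdfs:subPropertyOf"

triples = {
    ("ilc:facetAgent", SUBPROP, SKOS_RELATED),
    ("ilc:facetPlace", SUBPROP, SKOS_RELATED),
    ("ex:c1", "ilc:facetAgent", "ex:c2"),
    ("ex:c1", "ilc:facetPlace", "ex:c3"),
    ("ex:c4", SKOS_RELATED, "ex:c5"),
}


def super_properties(prop, triples):
    """Return prop plus every property it is (transitively) a sub-property of."""
    result, frontier = {prop}, [prop]
    while frontier:
        p = frontier.pop()
        for s, pred, o in triples:
            if s == p and pred == SUBPROP and o not in result:
                result.add(o)
                frontier.append(o)
    return result


def entail_subproperties(triples):
    """Apply the RDFS rule: (s p o) and (p subPropertyOf q) => (s q o)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == SUBPROP:
            continue
        for q in super_properties(p, triples):
            inferred.add((s, q, o))
    return inferred


def related_pairs(triples):
    """All (s, o) linked by skos:related once sub-properties are entailed."""
    return {(s, o) for s, p, o in entail_subproperties(triples) if p == SKOS_RELATED}
```

Here `related_pairs(triples)` returns the two facet links from `ex:c1` as well as the plain `ex:c4` to `ex:c5` link. In SPARQL 1.1 the same lookup is usually written with a property path such as `?p rdfs:subPropertyOf* skos:related`, so the facet-specific properties remain available to facet-aware queries without hiding the links from generic SKOS clients.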
Douglas Tudhope, Ceri Binding, Dorothee Blocks and Daniel Cunliffe
Abstract
Purpose
The purpose of this paper is to explore query expansion via conceptual distance in thesaurus‐indexed collections.
Design/methodology/approach
An extract of the National Museum of Science and Industry's collections database, indexed with the Getty Art and Architecture Thesaurus (AAT), was the dataset for the research. The system architecture and the algorithms for semantic closeness and the matching function are outlined. Standalone and web interfaces are described, and formative qualitative user studies are discussed. One user session is discussed in detail, together with a scenario based on a related public inquiry. Findings are set in the context of the literature on thesaurus‐based query expansion. This paper discusses the potential of query expansion techniques using the semantic relationships in a faceted thesaurus.
Findings
Thesaurus‐assisted retrieval systems have potential for multi‐concept descriptors, permitting very precise queries and indexing. However, indexer and searcher may differ in their terminology judgments, and there may be no exactly matching results. Integrating semantic closeness into the matching function permits ranked results for multi‐concept queries in thesaurus‐indexed applications. An in‐memory representation of the thesaurus semantic network allows a combination of automatic and interactive control of expansion, as well as control of expansion on individual query terms.
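Ranking by semantic closeness can be sketched as a shortest-path search over the thesaurus graph, where each relationship type carries a traversal cost and closeness falls as the accumulated cost grows. The costs, the `1 / (1 + cost)` closeness formula, the cost cut-off and the averaging matching function below are illustrative assumptions, not the weights or function used in the paper, and the terms are placeholders rather than AAT data.

```python
import heapq
from collections import defaultdict

# Hypothetical traversal costs per relationship type (not the paper's values).
COSTS = {"BT": 1.0, "NT": 1.0, "RT": 1.5}
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT"}


def build_graph(edges):
    """edges: (term_a, relation, term_b); each edge is stored in both directions."""
    graph = defaultdict(list)
    for a, rel, b in edges:
        graph[a].append((b, COSTS[rel]))
        graph[b].append((a, COSTS[INVERSE[rel]]))
    return graph


def distances_from(graph, source, max_cost=3.0):
    """Dijkstra from one query term, pruned at max_cost (the expansion limit)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, term = heapq.heappop(heap)
        if d > dist.get(term, float("inf")):
            continue  # stale heap entry
        for nxt, cost in graph[term]:
            nd = d + cost
            if nd <= max_cost and nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def closeness(cost):
    return 1.0 / (1.0 + cost)


def match_score(graph, query_terms, index_terms):
    """Average, over query terms, of the best closeness to any index term."""
    total = 0.0
    for q in query_terms:
        dist = distances_from(graph, q)
        total += max((closeness(dist[t]) for t in index_terms if t in dist), default=0.0)
    return total / len(query_terms)


def rank(graph, query_terms, records):
    """records: {record_id: [index terms]} -> ids sorted by descending score."""
    scores = {rid: match_score(graph, query_terms, terms) for rid, terms in records.items()}
    return sorted(scores, key=lambda rid: scores[rid], reverse=True)
```

With edges `vehicles NT carriages`, `carriages NT coaches` and `carriages RT harnesses`, a query for `carriages` ranks a record indexed with `carriages` first, then `coaches`, then `harnesses`, instead of returning only the exact match. Lowering `max_cost` is one simple way to tighten expansion for an individual query term.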
Originality/value
The application of semantic expansion to browsing may be useful in interface options where thesaurus structure is hidden.
Paul Beynon‐Davies, Douglas Tudhope and Hugh Mackay
Abstract
In this paper we discuss some of the particular features of user involvement in information systems (IS) development projects with reference to the idea of the trajectory of development being a political/cultural process. The main aim is to attempt to supply more depth to an understanding of the pragmatics of user involvement in IS development projects. We illustrate how in one particular project, differences in organisational sub‐cultures, and in particular the way in which the technology was ‘framed’, led to differences in the way in which an information system was conceived. These differences, in turn, contributed to elements of organisational conflict between stakeholder groups over the future trajectory of the IS development. We conclude with a critique of some generally held assumptions concerning user involvement.
Andreas Vlachidis, Ceri Binding, Douglas Tudhope and Keith May
Abstract
Purpose
This paper sets out to discuss the use of information extraction (IE), a natural language‐processing (NLP) technique, to assist “rich” semantic indexing of diverse archaeological text resources. The research focuses on semantic‐aware “rich” indexing of diverse natural language resources, with properties capable of supporting information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project.
Design/methodology/approach
The paper proposes the use of the English Heritage extension (CRM‐EH) of CIDOC CRM, the standard core ontology in cultural heritage, and the exploitation of domain thesauri for driving and enhancing an Ontology‐Oriented Information Extraction process. The process of semantic indexing is based on a rule‐based Information Extraction technique, which is facilitated by the General Architecture for Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules.
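GATE pipelines typically combine gazetteer lookups (often derived from thesauri) with JAPE pattern rules. As a rough Python analogue, not JAPE syntax and not the STAR project's rules, the sketch below marks candidate E49 Time Appellation spans using a small hypothetical period gazetteer plus one hypothetical pattern rule for century expressions.

```python
import re

# Hypothetical gazetteer of period terms, standing in for thesaurus-derived lists.
PERIOD_GAZETTEER = {"roman", "medieval", "post-medieval", "bronze age", "iron age"}

# Hypothetical pattern rule for expressions such as "13th century" or "12th-century".
CENTURY_RULE = re.compile(r"\b\d{1,2}(?:st|nd|rd|th)[ -]century\b", re.IGNORECASE)


def _overlaps(start, end, spans):
    """True if [start, end) overlaps any span already annotated."""
    return any(start < e and s < end for s, e, _, _ in spans)


def annotate_time_appellations(text):
    """Return sorted (start, end, surface, source) spans for candidate E49 mentions.

    Assumes ASCII input, so text.lower() keeps character offsets aligned.
    """
    spans = []
    lowered = text.lower()
    # Gazetteer pass: longest terms first, so "post-medieval" wins over "medieval".
    for term in sorted(PERIOD_GAZETTEER, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            if not _overlaps(m.start(), m.end(), spans):
                spans.append((m.start(), m.end(), text[m.start():m.end()], "gazetteer"))
    # Pattern-rule pass, skipping anything the gazetteer already covered.
    for m in CENTURY_RULE.finditer(text):
        if not _overlaps(m.start(), m.end(), spans):
            spans.append((m.start(), m.end(), m.group(0), "rule"))
    return sorted(spans)
```

For "Post-medieval pottery and a 12th-century coin overlay the Roman ditch.", this yields `Post-medieval` and `Roman` from the gazetteer and `12th-century` from the rule. Ordering the gazetteer longest-first is a simple way to mimic the longest-match preference common in rule-based annotators.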
Findings
Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic‐aware term indexing. Additional efforts are required for further exploitation of the technique and adoption of formal evaluation methods for assessing the performance of the method in measurable terms.
Originality/value
The value of the paper lies in the semantic indexing of 535 unpublished online documents, often referred to as “Grey Literature”, from the Archaeology Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts E49.Time Appellation and E19.Physical Object.
Andreas Vlachidis and Douglas Tudhope
Abstract
Purpose
The purpose of this paper is to present the role and contribution of natural language processing techniques, in particular negation detection and word sense disambiguation in the process of Semantic Annotation of Archaeological Grey Literature. Archaeological reports contain a great deal of information that conveys facts and findings in different ways. This kind of information is highly relevant to the research and analysis of archaeological evidence but at the same time can be a hindrance for the accurate indexing of documents with respect to positive assertions.
Design/methodology/approach
The paper presents a method for adapting the biomedicine-oriented negation algorithm NegEx to the context of archaeology and discusses the evaluation results of the new, modified negation detection module. A particular form of polysemy, which arises from the definition of ontology classes and concerns the semantics of small finds in archaeology, is addressed by a domain-specific word-sense disambiguation module.
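NegEx-style detection looks for negation trigger phrases within a fixed token window before a target term and stops at "termination" words that end the scope of the negation. A minimal sketch of that idea, assuming a tiny hypothetical trigger list, a hypothetical terminator list and a five-token window, rather than the adapted archaeological triggers evaluated in the paper:

```python
import re

# Hypothetical trigger and terminator lists; real NegEx lists are far larger.
PRE_TRIGGERS = [("no",), ("not",), ("absence", "of"), ("no", "evidence", "of")]
TERMINATORS = {"but", "however", "although"}
WINDOW = 5  # max tokens scanned backwards from the target term


def tokenize(sentence):
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def is_negated(sentence, target):
    """True if a pre-negation trigger precedes `target` within WINDOW tokens
    and no terminator word intervenes."""
    tokens = tokenize(sentence)
    target_tokens = tokenize(target)
    n = len(target_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != target_tokens:
            continue
        # Scan backwards from the target, stopping at a terminator or window edge.
        start = max(0, i - WINDOW)
        for j in range(i - 1, start - 1, -1):
            if tokens[j] in TERMINATORS:
                break
            for trig in PRE_TRIGGERS:
                if tuple(tokens[j:j + len(trig)]) == trig and j + len(trig) <= i:
                    return True
    return False
```

For example, `is_negated("There was no evidence of burning in the ditch.", "burning")` is `True`, while in "No pottery was recovered, but charcoal was present." the terminator `but` keeps `charcoal` from being marked as negated. Adapting such a module to a new domain largely means curating the trigger and terminator lists and tuning the window.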
Findings
The performance of the negation detection module is compared against a “Gold Standard” that consists of 300 manually annotated pages of archaeological excavation and evaluation reports. The evaluation results are encouraging, delivering overall scores of 89 per cent precision, 80 per cent recall and 83 per cent F-measure. The paper addresses limitations and future improvements of the current work and highlights the need for ontological modelling to accommodate negative assertions.
Originality/value
The discussed NLP modules contribute to the aims of the OPTIMA pipeline, delivering an innovative application of such methods in the context of archaeological reports for the semantic annotation of archaeological grey literature with respect to the CIDOC-CRM ontology.
Koraljka Golub, Marianne Lykke and Douglas Tudhope
Abstract
Purpose
The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate purpose of improving subject indexing and information retrieval.
Design/methodology/approach
Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations: one with uncontrolled social tags only, and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was the DDC, which also included mappings from the Library of Congress Subject Headings.
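The "suggestions from a controlled vocabulary" configuration can be pictured as a lookup from a user's free tag to classes whose captions or mapped subject headings share words with it. The class numbers, captions and mappings below are invented placeholders (not real DDC data), and simple word overlap stands in for whatever matching the study's interface actually used.

```python
from dataclasses import dataclass, field


@dataclass
class VocabEntry:
    notation: str                              # hypothetical class number
    caption: str                               # class caption
    lcsh: list = field(default_factory=list)   # mapped subject headings


# Invented placeholder entries; not real DDC captions or LCSH mappings.
VOCAB = [
    VocabEntry("X10", "Elections", ["Voting", "Electoral systems"]),
    VocabEntry("X20", "Political parties", ["Party affiliation"]),
    VocabEntry("X30", "Public administration", ["Civil service"]),
]


def suggest(tag, vocab=VOCAB, limit=5):
    """Return (notation, caption) pairs whose caption or mapped headings
    share words with the free tag, best overlap first."""
    words = set(tag.lower().split())
    hits = []
    for entry in vocab:
        texts = [entry.caption] + entry.lcsh
        score = sum(len(words & set(t.lower().split())) for t in texts)
        if score:
            hits.append((score, entry.notation, entry.caption))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [(notation, caption) for _, notation, caption in hits[:limit]]
```

A tag such as `"electoral reform"` reaches the `Elections` class only through its mapped heading, which illustrates why the mappings can add access points that the class captions alone would miss.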
Findings
The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency and to increase the number of access points in retrieval. The value and usefulness of the suggestions depended on their quality, both in terms of conceptual relevance to the user and appropriateness of the terminology.
Originality/value
No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial, comparing social tagging only and social tagging enhanced with the suggestions. This paper is a final reflection on all aspects of the study.
Brian Matthews, Catherine Jones, Bartłomiej Puzoń, Jim Moon, Douglas Tudhope, Koraljka Golub and Marianne Lykke Nielsen
Abstract
Purpose
Traditional subject indexing and classification are considered infeasible in many digital collections. This paper seeks to investigate ways of enhancing social tagging via knowledge organization systems, with a view to improving the quality of tags for increased information discovery and retrieval performance.
Design/methodology/approach
Enhanced tagging interfaces were developed for exemplar online repositories, and trials were undertaken with author and reader groups to evaluate the effectiveness of tagging augmented with controlled vocabulary for subject indexing of papers in online repositories.
Findings
The results showed that using a knowledge organisation system to augment tagging does appear to increase the effectiveness of non‐specialist users (that is, without information science training) in subject indexing.
Research limitations/implications
While limited by the size and scope of the trials undertaken, these results do point to the usefulness of a mixed approach in supporting the subject indexing of online resources.
Originality/value
The value of this work is as a guide to future developments in the practical support for resource indexing in online repositories.
Michael John Khoo, Jae-wook Ahn, Ceri Binding, Hilary Jane Jones, Xia Lin, Diana Massam and Douglas Tudhope
Abstract
Purpose
The purpose of this paper is to describe a new approach to a well-known problem for digital libraries: how to search across multiple unrelated libraries with a single query.
Design/methodology/approach
The approach involves creating new Dewey Decimal Classification terms and numbers from existing Dublin Core records. In total, 263,550 records were harvested from three digital libraries. Weighted key terms were extracted from the title, description and subject fields of each record. Ranked DDC classes were automatically generated from these key terms by considering DDC hierarchies via a series of filtering and aggregation stages. A mean reciprocal rank (MRR) evaluation compared a sample of 49 generated classes against DDC classes assigned by a trained librarian for the same records.
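Mean reciprocal rank scores each evaluated record as 1/r, where r is the position of the librarian's class in the generated ranking (0 if it never appears), and averages over records. A minimal sketch follows; the `mrr_at_level` helper, which truncates class numbers to a given number of digits to compare at coarser DDC levels, is a hypothetical illustration rather than the paper's procedure, and the class numbers in the example are invented.

```python
def reciprocal_rank(ranked_classes, gold_class):
    """1/position of gold_class in the ranked list (1-based), or 0.0 if absent."""
    for position, cls in enumerate(ranked_classes, start=1):
        if cls == gold_class:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: list of (ranked_classes, gold_class) pairs, one per evaluated record."""
    if not runs:
        raise ValueError("no records to evaluate")
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)


def at_level(notation, digits):
    """Hypothetical helper: keep the first `digits` digits of a DDC-style number."""
    return notation.replace(".", "")[:digits]


def mrr_at_level(runs, digits):
    """MRR after truncating generated and gold classes to `digits` digits.
    Duplicates created by truncation are dropped, keeping first occurrence."""
    truncated = []
    for ranked, gold in runs:
        coarse = list(dict.fromkeys(at_level(c, digits) for c in ranked))
        truncated.append((coarse, at_level(gold, digits)))
    return mean_reciprocal_rank(truncated)
```

For three invented records, `[(["324.2", "320.5", "306.2"], "320.5"), (["909", "324.6"], "324.6"), (["100"], "200")]`, the full-precision MRR is (1/2 + 1/2 + 0) / 3, while comparing only the first digit raises it to (1 + 1/2 + 0) / 3 = 0.5, since coarser levels are easier to match.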
Findings
The best results combined weighted key terms from the title, description and subject fields. Performance declines as the DDC level becomes more specific. The results compare favorably with similar studies.
Research limitations/implications
The metadata harvest required manual intervention and the evaluation was resource intensive. Future research will look at evaluation methodologies that take account of issues of consistency and ecological validity.
Practical implications
The method does not require training data and is easily scalable. The pipeline can be customized for individual use cases, for example, to enhance recall or precision.
Social implications
The approach can provide centralized access to information from multiple domains currently provided by individual digital libraries.
Originality/value
The approach addresses metadata normalization in the context of web resources. The automatic classification approach accounts for matches within hierarchies, aggregating lower-level matches to broader parents, and thus approximates the practices of a human cataloger.
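Aggregating lower-level matches to broader parents can be sketched as propagating each matched class's score up its chain of ancestors, so that a parent with several matching subdivisions can outrank any single subdivision. The parent table, the scores and the decay factor of 0.5 per level are hypothetical; the paper's actual filtering and aggregation stages are not reproduced here.

```python
from collections import defaultdict

# Hypothetical child -> parent links in a DDC-like hierarchy.
PARENT = {"A11": "A1", "A12": "A1", "A13": "A1", "A1": "A", "B21": "B2", "B2": "B"}


def ancestors(cls, parent=PARENT):
    """Yield the chain of broader classes above cls, nearest first."""
    while cls in parent:
        cls = parent[cls]
        yield cls


def aggregate(match_scores, decay=0.5, parent=PARENT):
    """Give each class its own score plus decayed scores from its descendants;
    return (class, total) pairs sorted by descending total."""
    totals = defaultdict(float)
    for cls, score in match_scores.items():
        totals[cls] += score
        weight = score
        for anc in ancestors(cls, parent):
            weight *= decay  # each level up contributes less
            totals[anc] += weight
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

With direct scores of 0.4 for each of `A11`, `A12` and `A13` and 0.5 for `B21`, the parent `A1` accumulates about 0.6 and ranks first, even though none of its subdivisions matched as strongly as `B21`. The decay factor is the knob that trades off broad, safe classes against specific ones.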