Search results
1 – 10 of over 5000
Michael John Khoo, Jae-wook Ahn, Ceri Binding, Hilary Jane Jones, Xia Lin, Diana Massam and Douglas Tudhope
Abstract
Purpose
The purpose of this paper is to describe a new approach to a well-known problem for digital libraries, how to search across multiple unrelated libraries with a single query.
Design/methodology/approach
The approach involves creating new Dewey Decimal Classification terms and numbers from existing Dublin Core records. In total, 263,550 records were harvested from three digital libraries. Weighted key terms were extracted from the title, description and subject fields of each record. Ranked DDC classes were automatically generated from these key terms by considering DDC hierarchies via a series of filtering and aggregation stages. A mean reciprocal ranking evaluation compared a sample of 49 generated classes against DDC classes created by a trained librarian for the same records.
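The mean reciprocal ranking evaluation mentioned above can be illustrated with a minimal sketch. The sample data and function names below are hypothetical and are not taken from the paper; the sketch only shows how the metric is computed when each record has one librarian-assigned class and a ranked list of generated classes.

```python
from typing import Sequence, Tuple


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """Return 1/rank of the gold class in the ranked list, or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Tuple[Sequence[str], str]]) -> float:
    """Average the reciprocal ranks over all (ranked classes, gold class) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)


# Hypothetical sample: generated DDC classes (best first) and the librarian's class.
sample = [
    (["025", "020", "004"], "025"),  # gold class ranked first
    (["006", "004", "005"], "004"),  # gold class ranked second
    (["510", "530", "540"], "020"),  # gold class not generated at all
]

mrr = mean_reciprocal_rank(sample)
print(f"MRR = {mrr:.2f}")
```

A higher value means the librarian's class tends to appear nearer the top of the generated ranking.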
Findings
The best results combined weighted key terms from the title, description and subject fields. Performance declines with increased specificity of DDC level. The results compare favorably with similar studies.
Research limitations/implications
The metadata harvest required manual intervention and the evaluation was resource intensive. Future research will look at evaluation methodologies that take account of issues of consistency and ecological validity.
Practical implications
The method does not require training data and is easily scalable. The pipeline can be customized for individual use cases, for example, recall or precision enhancing.
Social implications
The approach can provide centralized access to information from multiple domains currently provided by individual digital libraries.
Originality/value
The approach addresses metadata normalization in the context of web resources. The automatic classification approach accounts for matches within hierarchies, aggregating lower level matches to broader parents and thus approximates the practices of a human cataloger.
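As an illustration of aggregating lower level matches to broader parents, the following sketch rolls weighted DDC matches up to their section, division and main class. It assumes that the hierarchy can be read from the leading digits of the DDC number; the weights, the function names and the simple summing rule are assumptions for illustration, not the authors' pipeline.

```python
from collections import defaultdict
from typing import Dict, List


def ddc_ancestors(number: str) -> List[str]:
    """Return [main class, division, section] for a DDC number such as '025.04'."""
    section = number.split(".")[0].zfill(3)[:3]
    return [section[0] + "00", section[:2] + "0", section]


def aggregate_by_level(matches: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Sum the weight of every match into each of its three ancestor levels."""
    levels = ("main_class", "division", "section")
    totals = {level: defaultdict(float) for level in levels}
    for number, weight in matches.items():
        for level, ancestor in zip(levels, ddc_ancestors(number)):
            totals[level][ancestor] += weight
    return {level: dict(scores) for level, scores in totals.items()}


# Hypothetical weighted matches for one record.
matches = {"025.04": 2.0, "025.3": 1.5, "004.6": 1.0, "020": 0.5}
totals = aggregate_by_level(matches)

# Rank the divisions by aggregated weight, strongest first.
ranked_divisions = sorted(
    totals["division"].items(), key=lambda item: item[1], reverse=True
)
print(ranked_divisions)
```

In this sample, two specific matches under 025 and one match on 020 all reinforce the broader division 020, which is the effect the aggregation stage is meant to capture.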
Steve Mitchell, Julie Mason and Lori Pender
Abstract
The following describes a number of technologies and exemplary service designs that foster better Internet finding tools in libraries and more cooperative and efficient effort in Internet resource collection building. Our library and partner institutions have been involved in this work for over a decade. The open source software and projects discussed represent appropriate technologies and sustainable strategies that will help Internet portals, digital libraries, virtual libraries and library catalogs‐with‐portal‐like‐capabilities (IPDVLCs) to scale better and to anticipate and meet the needs of scholarly and educational users.
Koraljka Golub, Pawel Michal Ziolkowski and Goran Zlodi
Abstract
Purpose
The study aims to paint a representative picture of the current state of search interfaces of Swedish online museum collections, focussing on search functionalities with particular reference to subject searching, as well as the use of controlled vocabularies, with the purpose of identifying which improvements of the search interfaces are needed to ensure high-quality information retrieval for the end user.
Design/methodology/approach
In the first step, a set of 21 search interface criteria was identified, based on related research and current standards in the domain of cultural heritage knowledge organization. Secondly, a complete set of Swedish museums that provide online access to their collections was identified, comprising nine cross-search services and 91 individual museums' websites. These 100 websites were each evaluated against the 21 criteria, between 1 July and 31 August 2020.
Findings
Although many standards and guidelines are in place to ensure quality-controlled subject indexing, which in turn support information retrieval of relevant resources (as individual or full search results), the study shows that they are not broadly implemented, resulting in information retrieval failures for the end user. The study also demonstrates a strong need for the implementation of controlled vocabularies in these museums.
Originality/value
This study is a rare piece of research which examines subject searching in online museums; the 21 search criteria and their use in the analysis of the complete set of online collections of a country represent a considerable and unique contribution to the fields of knowledge organization and information retrieval of cultural heritage. Its particular value lies in showing how the needs of end users, many of which are documented and reflected in international standards and guidelines, should be taken into account in designing search tools for these museums, especially in subject searching, which is the most complex and yet the most common type of search. Much effort has been invested into digitizing cultural heritage collections, but access to them is hindered by poor search functionality. This study identifies the most important aspects to improve.
Teresa Susana Mendes Pereira and Ana Alice Baptista
Abstract
Purpose
The purpose of this paper is to present an instance of the system developed in the OmniPaper project, regarding the mechanisms of distributed information retrieval. These mechanisms were developed for newspaper articles and they were then instantiated in the context of the scientific publication.
Design/methodology/approach
One step in the system's development was the definition of the metadata layer that supports the search and navigation functionalities as well as content syndication. Defining this layer involved several tasks: analysis of several standard metadata vocabularies; selection of the metadata elements; definition of an application profile and the RSS template; development of a metadatabase, using a native Resource Description Framework (RDF) database management system to store the RSS descriptions of the scientific publications; implementation of the search and navigation processes developed in the prototype; and, finally, testing and validation of all developed functionalities.
Findings
The RSS technology is well suited for handling the description of scientific contents. RDF records that were used in the OmniPaper RDF prototype were replaced by RSS. The subject and lexical thesauri were kept. This strong metadata layer allows the creation of several services that facilitate the conceptual search of scientific contents.
Research limitations/implications
The system implemented was tested but not evaluated in a real environment with specific users.
Originality/value
This paper presents a system that uses a central metadatabase to support conceptual searching mechanisms. It offers a value‐added service for the scientific community that is based entirely on state‐of‐the‐art standard technologies and is fully open to integration with other systems. Moreover, it could be implemented by journals to improve the current mechanisms used to access, distribute and disseminate scientific research developments.
A.R.D. Prasad and Nabonita Guha
Abstract
Purpose
The purpose of this paper is to show that concept naming alone in document annotation is not sufficient to convey the thought content of the information resource. The paper presents an outline of semantic document annotation which combines two major processes: facet analysis and concept categorisation. This is also an effort to show how RDF schema can be designed and implemented so that the properties of the schema are able to express the basic structure of the subject matter of the resource.
Design/methodology/approach
This paper presents a methodology for representing the subject matter of a document in terms of RDF. For the purposes of faceted subject annotation, it develops an extended RDF schema for the Simple Knowledge Organisation System (SKOS). The facets and relationships of the postulate‐based permuted subject indexing system (POPSI), a faceted subject indexing language, have been transformed into RDFS classes. The elementary categories of POPSI form the property classes in the POPSI/RDF Schema. These property classes have been used to formulate the subject description of a document.
Findings
The subject annotation of a document using this schema expresses all the components of the thought content of an information resource.
Practical implications
The examples given in this paper show the applicability of this schema in describing resources in web directories and annotating scholarly documents in digital libraries. In a broader perspective, this provides a methodology for formulating the subject metadata of web resources. This schema helps in formulating the subject string(s) for a resource outlining the skeleton structure of its thought content.
Originality/value
SKOS has been developed as an RDF schema representation of the traditional knowledge organisation systems. But the schema has limited room to accommodate subject indexing languages. The present schema extends the SKOS schema to accommodate the representation of faceted subject indexing languages. The faceted subject annotation system has been adopted for the very reason that it has precedence over the enumerated classification systems, controlled vocabulary lists, etc. The potential to describe the specific subject of the document with more accuracy and representation of context gives the faceted subject indexing languages strength to make the subject description explicit and machine processible.
Andrea Cuna and Gabriele Angeli
Abstract
Purpose
This paper puts forward a MARC-based semiautomated approach to extracting semantically rich subject facets from general and/or specialized controlled vocabularies for display in topic-oriented faceted catalog interfaces in a way that would better support users' exploratory search tasks.
Design/methodology/approach
Hierarchical faceted subject metadata is extracted from general and/or specialized controlled vocabularies by using standard client/server communication protocols. Rigorous facet analysis, classification and linguistic principles are applied on top of that to ensure faceting accuracy and consistency.
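To make the idea of deriving topical facets from MARC subject data more concrete, here is a minimal sketch that groups the subfields of MARC 21 650 fields into facets. The subfield meanings ($a topical term, $x general subdivision, $y chronological subdivision, $z geographic subdivision, $v form subdivision) follow MARC 21; the facet names, the sample headings and the flat grouping rule are assumptions for illustration and do not reproduce the paper's facet analysis.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed mapping from MARC 650 subfield codes to display facets.
FACET_BY_SUBFIELD = {
    "a": "topic",
    "x": "topic",
    "y": "period",
    "z": "place",
    "v": "form",
}

Subfield = Tuple[str, str]  # (subfield code, value)


def extract_facets(fields: Sequence[Sequence[Subfield]]) -> Dict[str, List[str]]:
    """Group subfield values from several 650 fields into sorted, de-duplicated facets."""
    facets = defaultdict(set)
    for subfields in fields:
        for code, value in subfields:
            facet = FACET_BY_SUBFIELD.get(code)
            if facet is not None:
                # Drop the trailing punctuation that MARC headings often carry.
                facets[facet].add(value.strip(" .,"))
    return {name: sorted(values) for name, values in facets.items()}


# Hypothetical subject headings, already parsed into (code, value) pairs.
sample_fields = [
    [("a", "Ceramics"), ("z", "Japan"), ("x", "History"), ("y", "20th century.")],
    [("a", "Pottery"), ("z", "Japan"), ("v", "Catalogs.")],
]

facets = extract_facets(sample_fields)
print(facets)
```

A real implementation would also need the hierarchical relationships from the controlled vocabulary itself, which this flat grouping does not attempt.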
Findings
A shallow application of facet analysis and classification, together with poorly organized displays, is one of the major barriers to effective faceted navigation in library, archive and museum catalogs.
Research limitations/implications
This paper does not deal with Web-scale discovery services.
Practical implications
This paper offers suggestions that can be used by the technical services departments of libraries, archives and museums in designing and developing more powerful exploratory search interfaces.
Originality/value
This paper addresses the problem of deriving clearly delineated topical facets from existing metadata for display in a user-friendly, high-level topical overview that is meant to encourage a multidimensional exploration of local collections as well as “learning by browsing.”
Koraljka Golub, Jenny Bergenmar and Siska Humelsjö
Abstract
Purpose
This article aims to help ensure high-quality subject access to Swedish lesbian, gay, bisexual, transgender, queer and intersexual (LGBTQI) fiction, and aims to identify challenges that librarians consider important to address, on behalf of themselves and end users.
Design/methodology/approach
A web-based questionnaire comprising 35 closed and open questions, 22 of which were required, was sent via online channels in January 2022. By the survey closing date, 20 March 2022, 82 responses had been received. The study was intended to complement an earlier study targeting end users.
Findings
Both this study of librarians and the previous study of end users have painted a dismal image of online search services when it comes to searching for LGBTQI fiction. The need to consult different channels (e.g. social media, library catalogues and friends), the inability to search more specifically than for the broad LGBTQI category and suboptimal search interfaces were among the commonly reported issues. The results of these studies are used to inform the development of a dedicated Swedish LGBTQI fiction database with an online search interface.
Originality/value
The subject searching of fiction via online services is usually limited to genre with facets for time and place, while users are often seeking characteristics such as pacing, characterization, storyline, frame/setting, tone and language/style. LGBTQI fiction is even more challenging to search because indexing practices are not really being standardized or disseminated worldwide. This study helps address this important gap, in both research and practical applications.
Lucas Mak, Devin Higgins, Aaron Collie and Shawn Nicholson
Abstract
Purpose
The purpose of this paper is to illustrate that Electronic Theses and Dissertation (ETD) metadata can be used as data for institutional assessment and to map an extended research landscape when connected to other data sets through linked data models.
Design/methodology/approach
This paper presents a conceptual consideration of the ideas behind linked data architecture, leveraging ETDs and their attendant metadata to build a case for institutional assessment. Analysis of graph data supports the considerations.
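A small sketch can show how ETD metadata, once expressed as graph statements, exposes relationships that a flat record hides. The triples, prefixes and property names below (`ex:advisor`, `ex:department`) are hypothetical and are not drawn from the paper; the example finds advisors whose supervised theses span more than one department.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical ETD metadata expressed as subject-predicate-object statements.
triples = [
    ("etd:1", "ex:advisor", "person:ortiz"),
    ("etd:1", "ex:department", "dept:chemistry"),
    ("etd:2", "ex:advisor", "person:ortiz"),
    ("etd:2", "ex:department", "dept:physics"),
    ("etd:3", "ex:advisor", "person:kim"),
    ("etd:3", "ex:department", "dept:chemistry"),
]


def cross_department_advisors(graph: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Return advisors whose supervised ETDs belong to more than one department."""
    advisors_of = defaultdict(set)     # ETD -> advisors
    departments_of = defaultdict(set)  # ETD -> departments
    for subject, predicate, obj in graph:
        if predicate == "ex:advisor":
            advisors_of[subject].add(obj)
        elif predicate == "ex:department":
            departments_of[subject].add(obj)

    # Join the two relations on the shared ETD node.
    reach = defaultdict(set)
    for etd, advisors in advisors_of.items():
        for advisor in advisors:
            reach[advisor] |= departments_of[etd]
    return {advisor: depts for advisor, depts in reach.items() if len(depts) > 1}


bridges = cross_department_advisors(triples)
print(bridges)
```

The join on the shared ETD node is the step that a flat, record-by-record view does not offer.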
Findings
The study reveals first and foremost that ETD metadata is in itself data. It also raises concerns about creating URIs for data elements and about the general applicability of linked data model formation. The analysis points to a rich environment of institutional relationships not readily found in traditional flat metadata records.
Originality/value
This paper provides a new perspective on examining the research landscape through ETDs produced by graduate students in the higher education sector.
Abstract
Purpose
This paper seeks to study the metadata requirements for setting up a digital repository in ceramics resources that would provide researchers and ceramic art professionals with access to the information as per their requirements.
Design/methodology/approach
The paper first reviews and analyzes various metadata standards and formats already available. Open software (Greenstone) is used to develop the repository and the paper discusses its metadata provisions. Thereafter, the paper focuses on ceramics resources and attempts to determine the metadata elements required to describe and organize ceramic resources. Existing controlled vocabularies to standardize content metadata of the repository are also reviewed.
Findings
The paper finds that selected metadata elements of Dublin Core and Categories for the Description of Works of Art (CDWA) can be used to describe and organize the ceramics resources. Local qualifiers are added when necessary to describe the resources. As the CDWA metadata standard is not provided in Greenstone, its elements were defined using GEMS to describe and organize ceramic art works. The paper also finds that existing controlled vocabularies are not sufficient to standardize the content metadata of the repository.
Research limitations/implications
A digital repository should also contain information resources such as video and audio‐video resources. The study did not consider the metadata requirements for describing such resources.
Originality/value
This paper could be useful for others who want to develop their repositories in various disciplines.
Hirak Jyoti Hazarika, S. Ravikumar and Akash Handique
Abstract
Purpose
This paper aims to present a novel DSpace-based medical image repository system designed explicitly for storing and retrieving clinical images using the Digital Imaging and Communications in Medicine (DICOM) metadata standard. DSpace institutional repository software is widely used in academic environments, mainly for storing and accessing text-related files. DICOM images are a particular type of image embedded with a large amount of system-generated metadata and organised according to the DICOM metadata standard.
Design/methodology/approach
The paper discusses the use of institutional repository software (DSpace) for archiving DICOM images. The authors integrated the DICOM metadata standard with DSpace, which is compatible with Dublin Core (DC) and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). After the DICOM standard was combined with DSpace, the repository was tested with a sample of 5,000 images, and the retrieval results using various DICOM tags were very satisfactory. This study paves the way for the use of open source software (OSS) in storing and retrieving medical images.
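One way to picture the integration of DICOM metadata with a Dublin Core based repository is a simple crosswalk from DICOM attributes to DC-style fields. The DICOM attribute keywords used here (StudyDescription, StudyDate, Modality, BodyPartExamined, SOPInstanceUID) are standard, and DICOM dates use the YYYYMMDD form, but the mapping to specific `dc.*` fields is an assumption for illustration and is not the patch described in the paper. The header is represented as a plain dictionary, assumed to have been read from a .dcm file beforehand.

```python
from typing import Dict

# Assumed crosswalk from DICOM attribute keywords to Dublin Core style field names.
CROSSWALK = {
    "StudyDescription": "dc.title",
    "StudyDate": "dc.date",
    "Modality": "dc.type",
    "BodyPartExamined": "dc.subject",
    "SOPInstanceUID": "dc.identifier",
}


def to_dublin_core(header: Dict[str, str]) -> Dict[str, str]:
    """Map the DICOM attributes present in the header to Dublin Core style fields."""
    record = {}
    for keyword, dc_field in CROSSWALK.items():
        value = header.get(keyword)
        if not value:
            continue  # skip attributes that are missing or empty
        if keyword == "StudyDate" and len(value) == 8 and value.isdigit():
            # Convert the DICOM date form YYYYMMDD to ISO 8601 (YYYY-MM-DD).
            value = f"{value[:4]}-{value[4:6]}-{value[6:]}"
        record[dc_field] = value
    return record


# Hypothetical header values for one image.
header = {
    "StudyDescription": "Chest PA",
    "StudyDate": "20210315",
    "Modality": "CR",
    "SOPInstanceUID": "1.2.3.4.5",
}

record = to_dublin_core(header)
print(record)
```

Patient-identifying attributes are deliberately left out of this sketch; a real repository would need an explicit policy for them.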
Findings
In the first stage, the authors enabled DSpace to recognise DICOM (.dcm) files. In the second stage, a patch was developed to identify the DICOM metadata standard in DSpace, which has inbuilt DC metadata standards. Finally, in the third stage, retrieval efficiency was tested with 5,000 .dcm images using DICOM tags, and the results were very positive.
Research limitations/implications
A major limitation of this study is the size of the data set (5,000 DICOM images) with which the authors tested the system. The system's scalability has yet to be tested on various fronts, such as cloud and local servers with different configurations, which requires a separate study.
Practical implications
Once this system is in place, DICOM users can store, retrieve and access images from the Web platform. Furthermore, the proposed repository will serve as a warehouse for various DICOM images with reasonable storage costs.
Originality/value
In addition to exploring the opportunities for free open source software (FOSS) implementation in medical science, this study addresses issues related to the performance of an open-source repository for retrieving and preserving medical images. It developed an open source DICOM medical image library based on the DICOM metadata standard with the help of DSpace. Thus, the study will generate value for library professionals, medical professionals and FOSS vendors seeking to understand the medical market in the context of FOSS.