Search results

1 – 10 of 13
Open Access
Article
Publication date: 2 April 2024

Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard and Olof Osterman


Abstract

Purpose

In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.

Design/methodology/approach

On a sample of over 230,000 records with close to 12,000 distinct DDC classes, the open source tool Annif, developed by the National Library of Finland, was applied in the following implementations: lexical algorithm, support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combining the former four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted to investigate the value and inter-rater agreement of automatically assigned classes, on a sample of 60 records.

Findings

The ensemble approach performed best, achieving 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.
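
As an illustration of what accuracy on a three-digit DDC task can mean, the sketch below truncates each DDC notation to its first three digits and counts exact matches. It is a minimal sketch, not the study's evaluation code: the abstract does not describe the protocol, so the comparison of one suggested class against one catalogued class per record is an assumption, and the function names and sample notations are hypothetical.

```python
from typing import Sequence


def ddc_three_digit(notation: str) -> str:
    """Return the three-digit DDC class of a notation, e.g. '025.04' -> '025'."""
    # Assumption: notations are written as digits with an optional decimal part.
    return notation.strip().split(".")[0][:3]


def three_digit_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of records whose predicted class matches the gold class at three digits."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    hits = sum(
        ddc_three_digit(g) == ddc_three_digit(p) for g, p in zip(gold, predicted)
    )
    return hits / len(gold)


# Hypothetical sample: catalogued classes versus automatically suggested classes.
gold_classes = ["025.04", "839.7", "004.678", "306.76"]
suggested_classes = ["025.3", "839.5", "005.1", "306.766"]

accuracy = three_digit_accuracy(gold_classes, suggested_classes)
print(f"three-digit accuracy: {accuracy:.2%}")
```

In this sample three of the four suggestions agree with the catalogued class at the three-digit level, even though none matches the full notation, which shows how truncation makes the task more lenient than full-class matching.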

Originality/value

The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.

Details

Journal of Documentation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0022-0418


Open Access
Article
Publication date: 30 November 2021

Koraljka Golub, Pawel Michal Ziolkowski and Goran Zlodi


Abstract

Purpose

The study aims to paint a representative picture of the current state of search interfaces of Swedish online museum collections, focussing on search functionalities with particular reference to subject searching, as well as the use of controlled vocabularies, with the purpose of identifying which improvements of the search interfaces are needed to ensure high-quality information retrieval for the end user.

Design/methodology/approach

In the first step, a set of 21 search interface criteria was identified, based on related research and current standards in the domain of cultural heritage knowledge organization. Secondly, a complete set of Swedish museums that provide online access to their collections was identified, comprising nine cross-search services and 91 individual museums' websites. These 100 websites were each evaluated against the 21 criteria, between 1 July and 31 August 2020.

Findings

Although many standards and guidelines are in place to ensure quality-controlled subject indexing, which in turn support information retrieval of relevant resources (as individual or full search results), the study shows that they are not broadly implemented, resulting in information retrieval failures for the end user. The study also demonstrates a strong need for the implementation of controlled vocabularies in these museums.

Originality/value

This study is a rare piece of research which examines subject searching in online museums; the 21 search criteria and their use in the analysis of the complete set of online collections of a country represent a considerable and unique contribution to the fields of knowledge organization and information retrieval of cultural heritage. Its particular value lies in showing how the needs of end users, many of which are documented and reflected in international standards and guidelines, should be taken into account in designing search tools for these museums, especially in subject searching, which is the most complex and yet the most common type of search. Much effort has been invested in digitizing cultural heritage collections, but access to them is hindered by poor search functionality. This study identifies the most important aspects to improve.

Open Access
Article
Publication date: 30 October 2023

Koraljka Golub, Xu Tan, Ying-Hsang Liu and Jukka Tyrkkö


Abstract

Purpose

This exploratory study aims to help contribute to the understanding of online information search behaviour of PhD students from different humanities fields, with a focus on subject searching.

Design/methodology/approach

The methodology is based on a semi-structured interview within which the participants are asked to conduct both a controlled search task and a free search task. The 2020 sample comprises eight PhD students in several humanities disciplines at Linnaeus University, a medium-sized Swedish university.

Findings

Most humanities PhD students in the study have received training in information searching, but it has been too basic. Most rely on web search engines such as Google and Google Scholar to search for publications, and on the university's discovery system for known-item searching. As these systems do not rely on controlled vocabularies, the participants often struggle with too many retrieved documents that are not relevant. Most only rarely or never use disciplinary bibliographic databases. The controlled search task has shown some benefits of using controlled vocabularies in the disciplinary databases, but incomplete synonym or concept coverage as well as a user-unfriendly search interface present hindrances.

Originality/value

The paper illuminates an often-forgotten but pervasive challenge of subject searching, especially for humanities researchers. It demonstrates difficulties and shows how most PhD students have missed finding an important resource in their research. It calls for the need to reconsider training in information searching and the need to make use of controlled vocabularies implemented in various search systems with usable search and browse user interfaces.

Article
Publication date: 2 June 2020

Koraljka Golub, Jukka Tyrkkö, Joacim Hansson and Ida Ahlström


Abstract

Purpose

As the humanities develop in the realm of increasingly more pronounced digital scholarship, it is important to provide quality subject access to a vast range of heterogeneous information objects in digital services. The study aims to paint a representative picture of the current state of affairs of the use of subject index terms in humanities journal articles with particular reference to the well-established subject access needs of humanities researchers, with the purpose of identifying which improvements are needed in this context.

Design/methodology/approach

The comparison of subject metadata on a sample of 649 peer-reviewed journal articles from across the humanities is conducted in a university repository, against Scopus, the former reflecting local and national policies and the latter being the most comprehensive international abstract and citation database of research output.

Findings

The study shows that established bibliographic objectives to ensure subject access for humanities journal articles are not supported in either the world's largest commercial abstract and citation database Scopus or the local repository of a public university in Sweden. The indexing policies in the two services do not seem to address the needs of humanities scholars for highly granular subject index terms with appropriate facets; no controlled vocabularies for any humanities discipline are used whatsoever.

Originality/value

In all, not much has changed since the 1990s, when indexing for the humanities was shown to lag behind the sciences. The community of researchers and information professionals, today working together on digital humanities projects, as well as interdisciplinary research teams, should demand that their subject access needs be fulfilled, especially in commercial services like Scopus and discovery services.

Open Access
Article
Publication date: 16 October 2023

Koraljka Golub, Jenny Bergenmar and Siska Humlesjö


Abstract

Purpose

This article aims to help ensure high-quality subject access to Swedish lesbian, gay, bisexual, transgender, queer and intersexual (LGBTQI) fiction, and aims to identify challenges that librarians consider important to address, on behalf of themselves and end users.

Design/methodology/approach

A web-based questionnaire comprising 35 closed and open questions, 22 of which were required, was sent via online channels in January 2022. By the survey closing date, 20 March 2022, 82 responses had been received. The study was intended to complement an earlier study targeting end users.

Findings

Both this study of librarians and the previous study of end users have painted a dismal image of online search services when it comes to searching for LGBTQI fiction. The need to consult different channels (e.g. social media, library catalogues and friends), the inability to search more specifically than for the broad LGBTQI category and suboptimal search interfaces were among the commonly reported issues. The results of these studies are used to inform the development of a dedicated Swedish LGBTQI fiction database with an online search interface.

Originality/value

The subject searching of fiction via online services is usually limited to genre with facets for time and place, while users are often seeking characteristics such as pacing, characterization, storyline, frame/setting, tone and language/style. LGBTQI fiction is even more challenging to search because indexing practices are not really being standardized or disseminated worldwide. This study helps address this important gap, in both research and practical applications.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418


Article
Publication date: 9 January 2017

Koraljka Golub, Joacim Hansson and Lars Selden


Abstract

Purpose

The purpose of the paper is to analyse three Scandinavian iSchools in Denmark, Norway and Sweden with regard to their intentions of becoming iSchools and curriculum content in relation to these intentions. By doing so, a picture will be given of the international expansion of the iSchool concept in terms of organisational symbolism and practical educational content. In order to underline the approaches of the Scandinavian schools, comparisons are made to three American iSchools.

Design/methodology/approach

The study is framed through theory on organisational symbolism and the intentions of the iSchool movement as formulated in its vision statements. Empirically, the study consists of two parts: close readings of three documents outlining the considerations of three Scandinavian LIS schools before applying for the iSchool status, and statistical analysis of 427 syllabi from master level courses at three Scandinavian and three American iSchools.

Findings

All three Scandinavian schools analysed have recently become iSchools, and though some differences are visible, it is hard to distinguish anything in their syllabi as carriers of what can be described as an iSchool identity. In considering iSchool identity, it is instead benefits on a symbolic level that are most prominent, such as branding, social visibility and the possible attraction of new student groups. The traditionally strong relations to national library sectors are emphasised as important to maintain, specifically in Norway and Sweden.

Research limitations/implications

The study is done on iSchools in Denmark, Norway and Sweden with empirical comparison to three American schools. These comparisons face the challenge of meeting the educational system and programme structure of each individual country. Despite this, findings prove possible to use as ground for conclusions, although empirical generalisations concerning, for instance, other countries must be made with caution.

Practical implications

This study highlights the practical challenges met in international expansion of the iSchool movement, both on a practical and symbolic level. Both the iSchool Caucus and individual schools considering becoming iSchools may use these findings as a point of reference in development and decision making.

Originality/value

This is an original piece of research from which the results may contribute to the international development of the iSchool movement, and extend the theoretical understanding of the iSchool movement as an educational and organisational construct.

Details

Journal of Documentation, vol. 73 no. 1
Type: Research Article
ISSN: 0022-0418


Open Access
Article
Publication date: 14 October 2022

Koraljka Golub, Jenny Bergenmar and Siska Humlesjö


Abstract

Purpose

The purpose of this study is to investigate the needs of potential end-users of a database dedicated to Swedish lesbian, gay, bisexual, transgender, queer, and intersex (LGBTQI) literature (e.g. prose, poetry, drama, graphic novels/comics, and illustrated books), in order to inform the development of a database, search interface functionalities, and an LGBTQI thesaurus for fiction.

Design/methodology/approach

A web questionnaire was distributed in autumn 2021 to potential end-users. The questions covered people's reasons for reading LGBTQI fiction, ways of finding LGBTQI fiction, experience of searching for LGBTQI fiction, usual search elements applied, latest search for LGBTQI fiction, desired subjects to search for, and ideal search functionalities.

Findings

The 101 completed questionnaires showed that most respondents found relevant literature through social media or friends and that most obtained copies of literature from a library. Regarding desirable search functionalities, most respondents would like to see suggestions for related terms to support broader search results (i.e. higher recall). Many also wanted search support that would enable retrieving more specific results based on narrower terms when too many results are retrieved (i.e. higher precision). Over half would also appreciate the option to browse by hierarchically arranged subjects.
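
The trade-off between broader results (higher recall) and more specific results (higher precision) can be made concrete with a small calculation. The following is a minimal sketch under stated assumptions: the record identifiers, the set of relevant records and the two result sets are invented for illustration and do not come from the study.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Return (precision, recall) for one search result set."""
    hits = len(retrieved & relevant)
    # Precision: share of retrieved records that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant records that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record identifiers for works relevant to a reader's request.
relevant_records = {1, 2, 3, 4}

# A search on a narrower term returns few records, all of them relevant.
narrow_search = {1, 2}
# A search expanded with related terms returns every relevant record plus extras.
broad_search = {1, 2, 3, 4, 5, 6, 7, 8}

print("narrow:", precision_recall(narrow_search, relevant_records))
print("broad:", precision_recall(broad_search, relevant_records))
```

In this invented case the narrow search has perfect precision but finds only half of the relevant records, while the broad search finds all of them at the cost of half the results being irrelevant.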

Originality/value

This study is the first to show how readers of LGBTQI fiction in Sweden search for and obtain relevant literature. The authors have identified end-user needs that can inform the development of a new database and a thesaurus dedicated to LGBTQI fiction.

Details

Journal of Documentation, vol. 78 no. 7
Type: Research Article
ISSN: 0022-0418


Article
Publication date: 16 October 2009

Koraljka Golub and Marianne Lykke


Abstract

Purpose

The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme.

Design/methodology/approach

A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification systems for browsing. The classification algorithm was evaluated by the users who judged the correctness of the automatically assigned classes.

Findings

The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Browsing success was shown to be correlated with, and dependent on, classification correctness.

Research limitations/implications

Further research should address problems of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation.

Practical implications

Improvements for browsing were identified: describing class captions and/or listing their subclasses from start; allowing for searching for words from class captions with synonym search (easily provided for Ei since the classes are mapped to thesauri terms); when searching for class captions, returning the hierarchical tree expanded around the class in which caption the search term is found. The need for improvements of classification schemes was also indicated.

Originality/value

A user‐based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.

Details

Journal of Documentation, vol. 65 no. 6
Type: Research Article
ISSN: 0022-0418


Article
Publication date: 2 September 2014

Koraljka Golub, Marianne Lykke and Douglas Tudhope


Abstract

Purpose

The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate purpose of improving subject indexing and information retrieval.

Design/methodology/approach

Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations, one with uncontrolled social tags only and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was DDC, which also comprised mappings from the Library of Congress Subject Headings.

Findings

The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency and to increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on the quality of the suggestions, both as to conceptual relevance to the user and as to appropriateness of the terminology.

Originality/value

No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial, comparing social tagging only and social tagging enhanced with the suggestions. This paper is a final reflection on all aspects of the study.

Details

Journal of Documentation, vol. 70 no. 5
Type: Research Article
ISSN: 0022-0418


Article
Publication date: 1 May 2006

Koraljka Golub


Abstract

Purpose

To provide an integrated perspective on similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with the approaches and with automated classification as such.

Design/methodology/approach

A range of works dealing with automated classification of full‐text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages.

Findings

Provides major similarities and differences between the three approaches: document pre‐processing and utilization of web‐specific document characteristics is common to all the approaches; major differences are in applied algorithms, employment or not of the vector space model and of controlled vocabularies. Problems of automated classification are recognized.

Research limitations/implications

The paper does not attempt to provide an exhaustive bibliography of related resources.

Practical implications

As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community have the information on how similar tasks are conducted in different communities.

Originality/value

To the author's knowledge, no review paper on automated text classification attempted to discuss more than one community's approach from an integrated perspective.

Details

Journal of Documentation, vol. 62 no. 3
Type: Research Article
ISSN: 0022-0418

