Search results

Article

Publication date: 11 May 2020

Comparing tagging suggestion models on discrete corpora

Bojan Bozic, Andre Rios and Sarah Jane Delany

Abstract

Purpose

This paper aims to investigate the methods for the prediction of tags on a textual corpus that describes diverse data sets based on short messages; as an example, the authors demonstrate the usage of methods based on hotel staff inputs in a ticketing system as well as the publicly available StackOverflow corpus. The aim is to improve the tagging process and find the most suitable method for suggesting tags for a new text entry.

Design/methodology/approach

The paper consists of two parts: an exploration of existing sample data, including statistical analysis and visualisation to provide an overview, and an evaluation of tag prediction approaches. The authors include approaches from different research fields to cover a broad spectrum of possible solutions. They test a machine learning model for multi-label classification (using gradient boosting), a statistical approach (using frequency heuristics) and three similarity-based classification approaches (nearest centroid, k-nearest neighbours (k-NN) and naive Bayes). The experiment that compares the approaches uses recall to measure the quality of results. Finally, the authors recommend the modelling approach that performs best at tag prediction on the sample data.
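As a rough illustration of how recall can score a tag suggester, the sketch below pairs a recall function with a simple frequency heuristic that always suggests the most common training tags. This is not the authors' implementation: the function names, the toy tag lists and the choice of two suggestions per entry are all assumptions made for the example.

```python
from collections import Counter


def recall(suggested, actual):
    """Fraction of the true tags that appear among the suggested tags."""
    actual = set(actual)
    if not actual:
        return 0.0
    return len(actual & set(suggested)) / len(actual)


def most_frequent_tags(train_tags, n):
    """Frequency heuristic (assumed form): suggest the n most common training tags."""
    counts = Counter(tag for tags in train_tags for tag in tags)
    return [tag for tag, _ in counts.most_common(n)]


# Hypothetical toy data, not taken from the paper.
train_tags = [["wifi", "room"], ["wifi", "room"], ["breakfast"], ["wifi", "billing"]]
test_tags = [["wifi"], ["breakfast", "room"]]

# The same suggestions are made for every test entry, since the heuristic ignores the text.
suggestions = most_frequent_tags(train_tags, n=2)
mean_recall = sum(recall(suggestions, tags) for tags in test_tags) / len(test_tags)
print(suggestions, mean_recall)
```

Because the heuristic never looks at the text of a new entry, it serves as a baseline against which text-aware classifiers can be compared.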

Findings

The authors calculate the performance of each method against the test data set by measuring recall. They report recall for each method with different features (except for frequency heuristics, which does not allow additional features) on the dmbook pro and StackOverflow data sets. k-NN clearly provides the best recall. Because k-NN gave the best results, the authors performed further experiments with values of k from 1 to 10. This allowed them to observe how the number of neighbours affects performance and to identify the best value for k.
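The sweep over k described above might look like the following minimal sketch. It assumes a bag-of-words representation, cosine similarity and majority voting over the neighbours' tags; none of these details are confirmed by the abstract, and the ticket texts, tags and helper names are hypothetical.

```python
import math
from collections import Counter


def vectorise(text):
    """Bag-of-words term counts (an assumed, deliberately simple feature set)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_suggest(text, train, k, n_tags=3):
    """Suggest the tags that occur most often among the k most similar training texts."""
    query = vectorise(text)
    ranked = sorted(train, key=lambda ex: cosine(query, vectorise(ex[0])), reverse=True)
    votes = Counter(tag for _, tags in ranked[:k] for tag in tags)
    return [tag for tag, _ in votes.most_common(n_tags)]


def mean_recall(train, test, k):
    """Average, over test entries, of the share of true tags that were suggested."""
    total = 0.0
    for text, tags in test:
        suggested = set(knn_suggest(text, train, k))
        total += len(suggested & set(tags)) / len(tags)
    return total / len(test)


# Hypothetical toy tickets, not taken from either data set in the paper.
train = [
    ("wifi not working in room", ["wifi", "room"]),
    ("guest cannot connect to wifi", ["wifi"]),
    ("breakfast order was late", ["breakfast"]),
    ("wrong charge on the bill", ["billing"]),
]
test = [
    ("wifi is down in the lobby", ["wifi"]),
    ("late breakfast delivery", ["breakfast"]),
]

# Evaluate k = 1..10 and keep the smallest k with the highest mean recall.
recall_by_k = {k: mean_recall(train, test, k) for k in range(1, 11)}
best_k = max(recall_by_k, key=recall_by_k.get)
print(recall_by_k, best_k)
```

On a corpus this small, any k above the number of training texts simply uses every example, so the sweep is only informative on realistically sized data.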

Originality/value

The value and originality of the paper lie in its extensive experiments with several methods from different domains. The authors have used probabilistic methods, such as naive Bayes, statistical methods, such as frequency heuristics, and similarity approaches, such as k-NN. Furthermore, the authors have produced results on an industrial-scale data set that was provided by a company and used directly in its project, as well as on a community-based data set with a large volume of data and high dimensionality. The study results can be used to select a model for a specific use case across diverse corpora, taking into account the advantages and disadvantages of applying the model to the data at hand.

Details

International Journal of Web Information Systems, vol. 16 no. 2

Type: Research Article

ISSN: 1744-0084

Book part

Publication date: 2 February 2024

Searching for “A Green Place” in a World of “Fire and Blood”: An (Eco)Feminist Reading of the Mad Max: Fury Road

Marija Geiger Zeman, Mirela Holy and Brigita Miloš

Details

Ecofeminism on the Edge: Theory and Practice

Type: Book

ISBN: 978-1-80455-041-0

Book part

Publication date: 19 June 2019

References

Michael Schandorf

Details

Communication as Gesture

Type: Book

ISBN: 978-1-78756-515-9
