Search results

1 – 10 of over 3000
Open Access
Article
Publication date: 2 April 2024

Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard and Olof Osterman

Abstract

Purpose

In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.

Design/methodology/approach

On a sample of over 230,000 records with close to 12,000 distinct DDC classes, the open source tool Annif, developed by the National Library of Finland, was applied in the following implementations: lexical algorithm, support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combining the former four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted to investigate the value and inter-rater agreement of automatically assigned classes, on a sample of 60 records.

Findings

The ensemble approach achieved the best results, with 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.
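
As a rough illustration of what accuracy on a three-digit DDC task can mean, the sketch below truncates DDC notations to their first three digits and counts exact matches. The function names and the sample notations are hypothetical and are not taken from the study, which may have computed its figure differently.

```python
def three_digit(ddc: str) -> str:
    """Return the three-digit class of a DDC notation, e.g. '025.431' -> '025'."""
    return ddc.strip().split(".")[0][:3].zfill(3)


def three_digit_accuracy(gold: list[str], predicted: list[str]) -> float:
    """Share of records whose predicted class matches the gold class at three digits."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    hits = sum(three_digit(g) == three_digit(p) for g, p in zip(gold, predicted))
    return hits / len(gold)


# Hypothetical example data, not from the study.
gold = ["025.431", "839.7", "004.678", "610.73"]
predicted = ["025.04", "839.738", "005.1", "610"]
accuracy = three_digit_accuracy(gold, predicted)
print(f"{accuracy:.2%}")
```

Three of the four hypothetical predictions agree with the gold class once both are cut to three digits, even though only one of them would match on the full notation.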

Originality/value

The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.

Details

Journal of Documentation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 4 July 2023

Joacim Hansson

Abstract

Purpose

In this article, the author discusses works from the French Documentation Movement in the 1940s and 1950s with regard to how it formulates bibliographic classification systems as documents. Significant writings by Suzanne Briet, Éric de Grolier and Robert Pagès are analyzed in the light of current document-theoretical concepts and discussions.

Design/methodology/approach

Conceptual analysis.

Findings

The French Documentation Movement provided a rich intellectual environment in the late 1940s and early 1950s, resulting in original works on documents and the ways these may be represented bibliographically. These works display a variety of approaches from object-oriented description to notational concept-synthesis, and definitions of classification systems as isomorph documents at the center of politically informed critique of modern society.

Originality/value

The article brings together historical and conceptual elements in the analysis which have not previously been combined in Library and Information Science literature. In the analysis, the article discusses significant contributions to classification and document theory that hitherto have eluded attention from the wider international Library and Information Science research community. Through this, the article contributes to the currently ongoing conceptual discussion on documents and documentality.

Details

Journal of Documentation, vol. 80 no. 3
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 22 November 2022

Kedong Yin, Yun Cao, Shiwei Zhou and Xinman Lv

Abstract

Purpose

The purposes of this research are to study the theory and method of multi-attribute index system design and establish a set of systematic, standardized, scientific index systems for the design optimization and inspection process. The research may form the basis for a rational, comprehensive evaluation and provide the most effective way of improving the quality of management decision-making. It is of practical significance to improve the rationality and reliability of the index system and provide standardized, scientific reference standards and theoretical guidance for the design and construction of the index system.

Design/methodology/approach

Using modern methods such as complex networks and machine learning, a system for the quality diagnosis of index data and the classification and stratification of index systems is designed. This guarantees the quality of the index data, realizes the scientific classification and stratification of the index system, reduces the subjectivity and randomness of the design of the index system, enhances its objectivity and rationality and lays a solid foundation for the optimal design of the index system.

Findings

Based on the ideas of statistics, system theory, machine learning and data mining, the present research focuses on “data quality diagnosis” and “index classification and stratification” and clarifies the classification standards and data quality characteristics of index data. A data-quality diagnosis system of “data review – data cleaning – data conversion – data inspection” is established. Using a decision tree, explanatory structural model, cluster analysis, K-means clustering and other methods, a classification and hierarchical method system for indicators is designed to reduce the redundancy of indicator data and improve the quality of the data used. Finally, the scientific and standardized classification and hierarchical design of the index system can be realized.

Originality/value

The innovative contributions and research value of the paper are reflected in three aspects. First, a method system for index data quality diagnosis is designed, and multi-source data fusion technology is adopted to ensure the quality of multi-source, heterogeneous and mixed-frequency data of the index system. Second, a systematic quality-inspection process for missing data is designed, based on the systematic thinking of the whole and the individual. Aiming at the accuracy, reliability, and feasibility of the patched data, a quality-inspection method of patched data based on inversion thought and a unified representation method of data fusion based on a tensor model are proposed. Third, unsupervised learning is used to classify and stratify the index system, which reduces the subjectivity and randomness of the design of the index system and enhances its objectivity and rationality.

Details

Marine Economics and Management, vol. 5 no. 2
Type: Research Article
ISSN: 2516-158X

Open Access
Article
Publication date: 24 March 2023

Dimitris Koutoulas and Akrivi Vagena

Abstract

Purpose

The purpose of this study is, first, to determine which developments have shaped official hotel classification systems over recent years (including the impact of guest-review platforms) and, second, to establish the future of those systems through the eyes of the people who are actually in charge of operating them.

Design/methodology/approach

Semi-structured interviews were chosen as the most suitable method for approaching hotel classification system administrators. This method is in line with previous research on approaching key informants in their respective fields. Sixteen people representing 12 different official national hotel classification systems from across the world as well as one commercial hotel star rating system participated in the online interviews.

Findings

The first main conclusion is that hotel classification systems – especially voluntary ones – would not have survived the enormous impact of guest-review platforms without quickly adjusting to the ever-changing hotel industry landscape. The frequent review of classification criteria and procedures has become the main survival strategy of classification systems. The second conclusion is that system operators are strongly optimistic about the future outlook of hotel classification based on their proven flexibility to swiftly adapt to new market conditions.

Originality/value

Research about hotel classification systems is usually based on the views of the systems' users, i.e. hotels or hotel guests, whereas the present paper reflects the perspective of the systems' operators, an angle rarely analyzed in the literature.

Details

Journal of Tourism Futures, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2055-5911

Open Access
Article
Publication date: 11 April 2023

Wenhao Yi, Mingnian Wang, Jianjun Tong, Siguang Zhao, Jiawang Li, Dengbin Gui and Xiao Zhang

Abstract

Purpose

The purpose of the study is to quickly identify significant heterogeneity of surrounding rock of tunnel face that generally occurs during the construction of large-section rock tunnels of high-speed railways.

Design/methodology/approach

Relying on the support vector machine (SVM)-based classification model, the nominal classification of blastholes and nominal zoning and classification terms were used to demonstrate the heterogeneity identification method for the surrounding rock of tunnel face, and the identification calculation was carried out for the five test tunnels. Then, the suggestions for local optimization of the support structures of large-section rock tunnels were put forward.

Findings

The results show that, compared with the two classification models based on neural networks, the SVM-based classification model has a higher classification accuracy when the sample size is small, and the average accuracy can reach 87.9%. After the samples are replaced, the SVM-based classification model still reaches the same accuracy, indicating stronger generalization ability.

Originality/value

By applying the identification method described in this paper, the significant heterogeneity characteristics of the surrounding rock during two rounds of blasting were identified. The identification results are basically consistent with the actual situation of the tunnel face at the end of blasting and can provide a basis for local optimization of support parameters.

Details

Railway Sciences, vol. 2 no. 1
Type: Research Article
ISSN: 2755-0907

Open Access
Article
Publication date: 30 July 2020

Alaa Tharwat

Abstract

Classification techniques have been applied to many applications in various fields of science. There are several ways of evaluating classification algorithms. Such metrics and their significance must be interpreted correctly when evaluating different learning algorithms. Most of these measures are scalar metrics and some of them are graphical methods. This paper gives a detailed overview of the classification assessment measures, with the aim of providing the basics of these measures and showing how they work, to serve as a comprehensive source for researchers who are interested in this field. The overview starts by highlighting the definition of the confusion matrix in binary and multi-class classification problems. Many classification measures are also explained in detail, and the influence of balanced and imbalanced data on each metric is presented. An illustrative example is introduced to show (1) how to calculate these measures in binary and multi-class classification problems, and (2) the robustness of some measures against balanced and imbalanced data. Moreover, some graphical measures such as Receiver operating characteristics (ROC), Precision-Recall (PR), and Detection error trade-off (DET) curves are presented in detail. Additionally, in a step-by-step approach, different numerical examples are demonstrated to explain the preprocessing steps of plotting ROC, PR, and DET curves.
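
To make the confusion-matrix measures mentioned in this abstract concrete, here is a minimal sketch of the standard binary definitions. The function name and the counts are hypothetical and chosen only to show how accuracy can look strong on imbalanced data while precision and recall stay modest; they are not the paper's own example.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute common scalar measures from a binary confusion matrix."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical imbalanced data: 10 positive and 990 negative samples.
metrics = binary_metrics(tp=5, fp=5, fn=5, tn=985)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the classifier finds only half of the positives, yet accuracy is 0.99 because the negative class dominates the total.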

Details

Applied Computing and Informatics, vol. 17 no. 1
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 17 October 2019

Qiong Bu, Elena Simperl, Adriane Chapman and Eddy Maddalena

Abstract

Purpose

Ensuring quality is one of the most significant challenges in microtask crowdsourcing tasks. Aggregation of the collected data from the crowd is one of the important steps to infer the correct answer, but existing studies seem to be limited to single-step tasks. This study aims to look at multiple-step classification tasks and understand aggregation in such cases; hence, it is useful for assessing the classification quality.

Design/methodology/approach

The authors present a model to capture the information of the workflow, questions and answers for both single- and multiple-question classification tasks. They propose an adapted approach on top of the classic approach so that the model can handle tasks with several multiple-choice questions in general instead of a specific domain or any specific hierarchical classifications. They evaluate their approach with three representative tasks from existing citizen science projects in which they have the gold standard created by experts.
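
The abstract does not name the classic approach it builds on. Purely for illustration, the sketch below assumes majority voting and applies it independently to each question of a multi-question task. The function names, question keys and answers are all hypothetical and do not reproduce the authors' model.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def aggregate_per_question(responses: list[dict[str, str]]) -> dict[str, str]:
    """Majority-vote each question independently across all workers.

    Each response maps a question id to one worker's answer. Workers may
    skip questions, for example when an earlier answer ends the workflow.
    """
    by_question: dict[str, list[str]] = {}
    for response in responses:
        for question, answer in response.items():
            by_question.setdefault(question, []).append(answer)
    return {q: majority_vote(a) for q, a in by_question.items()}


# Hypothetical two-step task answered by four workers.
responses = [
    {"q1": "galaxy", "q2": "spiral"},
    {"q1": "galaxy", "q2": "elliptical"},
    {"q1": "star"},
    {"q1": "galaxy", "q2": "spiral"},
]
result = aggregate_per_question(responses)
print(result)
```

Voting on each question separately ignores dependencies between steps, which is one reason a workflow-aware method such as the one the abstract describes may be preferable.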

Findings

The results show that the approach can provide significant improvements to the overall classification accuracy. The authors’ analysis also demonstrates that all algorithms can achieve higher accuracy for the volunteer- versus paid-generated data sets for the same task. Furthermore, the authors observed interesting patterns in the relationship between the performance of different algorithms and workflow-specific factors including the number of steps and the number of available options in each step.

Originality/value

Due to the nature of crowdsourcing, aggregating the collected data is an important process to understand the quality of crowdsourcing results. Different inference algorithms have been studied for simple microtasks consisting of single questions with two or more answers. However, as classification tasks typically contain many questions, the proposed method can be applied to a wide range of tasks including both single- and multiple-question classification tasks.

Details

International Journal of Crowd Science, vol. 3 no. 3
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 23 July 2020

Rami Mustafa A. Mohammad

Abstract

Classifying spam emails with data mining and machine learning approaches has attracted researchers' attention due to its obvious positive impact in protecting internet users. Several features can be used for creating data mining and machine learning based spam classification models. Yet spammers know that the longer they use the same set of features to trick email users, the more likely anti-spam parties are to develop tools for combating this kind of annoying email message. Spammers therefore adapt by continuously changing the group of features used for composing spam emails. For that reason, even though traditional classification methods produce sound classification results, they are ineffective for lifelong classification of spam emails because they are prone to so-called “Concept Drift”. In the current study, an enhanced model is proposed for lifelong spam classification. For evaluation purposes, the overall performance of the suggested model is compared against various other stream mining classification techniques. The results showed the success of the suggested model as a lifelong spam email classification method.

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 16 August 2021

Jan-Halvard Bergquist, Samantha Tinet and Shang Gao

Abstract

Purpose

The purpose of this study is to create an information classification model that is tailored to suit the specific needs of public sector organizations in Sweden.

Design/methodology/approach

To address the purpose of this research, a case study in a Swedish municipality was conducted. Data was collected through a mixture of techniques such as literature, document and website review. Empirical data was collected through interviews with 11 employees working within 7 different sections of the municipality.

Findings

This study resulted in an information classification model that is tailored to the specific needs of Swedish municipalities. In addition, a set of steps for tailoring an information classification model to suit a specific public organization are recommended. The findings also indicate that for a successful information classification it is necessary to educate the employees about the basics of information security and classification and create an understandable and unified information security language.

Practical implications

This study also highlights that to have a tailored information classification model, it is imperative to understand the value of information and what kind of consequences a violation of established information security principles could have through the perspectives of the employees.

Originality/value

It is the first of its kind in tailoring an information classification model to the specific needs of a Swedish municipality. The model provided by this study can be used as a tool to facilitate a common ground for classifying information within all Swedish municipalities, thereby contributing the first step toward a Swedish municipal model for information classification.

Open Access
Article
Publication date: 4 March 2021

Paulo Henrique Bertucci Ramos and Marcelo Caldeira Pedroso

Abstract

Purpose

This paper aims to identify and analyze the agtech classification and categorization systems in the Brazilian context.

Design/methodology/approach

The systematic literature review (SLR) was carried out according to the protocol of Kitchenham and Charters (2007). The classification systems found in literature were evaluated using the thinking aloud protocol, as proposed by Ericsson and Simon (1993). The responses obtained were evaluated through lexicographic analysis, described by Bécue-Bertaut (2019) and content analysis, described by Bardin (2011).

Findings

SLR identified four agtech classification systems. The model proposed by Dias, Jardim, and Sakuda (2019) was the one with the highest adherence to classify Brazilian agtechs. From the analysis of the systems found in literature, the authors proposed a new categorization model of agricultural startups (agtechs).

Research limitations/implications

The study has limitations in relation to the theoretical and empirical validation of the model proposed by the authors. This limitation can be the subject of subsequent research.

Practical implications

The SLR study considers the evolution of the classification systems of a new agribusiness reality, the agtechs. In addition, there is a practical contribution in proposing a new classification system that attempts to address some of the limitations found in previous studies.

Originality/value

Agtechs are startups focused on developing solutions for agriculture and have shown a significant increase in recent years. However, there are few studies focused on this type of company. Even rarer are the studies that seek to classify and categorize them. The present work opens the horizon for future studies focused on this new reality.

Details

Innovation & Management Review, vol. 18 no. 3
Type: Research Article
ISSN: 2515-8961