Search results

1 – 10 of over 2000
Open Access
Article
Publication date: 4 July 2023

Joacim Hansson

Abstract

Purpose

In this article, the author discusses works from the French Documentation Movement in the 1940s and 1950s with regard to how it formulates bibliographic classification systems as documents. Significant writings by Suzanne Briet, Éric de Grolier and Robert Pagès are analyzed in the light of current document-theoretical concepts and discussions.

Design/methodology/approach

Conceptual analysis.

Findings

The French Documentation Movement provided a rich intellectual environment in the late 1940s and early 1950s, resulting in original works on documents and the ways these may be represented bibliographically. These works display a variety of approaches, from object-oriented description to notational concept-synthesis, and define classification systems as isomorphic documents at the center of a politically informed critique of modern society.

Originality/value

The article brings together historical and conceptual elements of analysis that have not previously been combined in Library and Information Science literature. It discusses significant contributions to classification and document theory that have hitherto eluded the attention of the wider international Library and Information Science research community. Through this, the article contributes to the ongoing conceptual discussion of documents and documentality.

Details

Journal of Documentation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 2 April 2024

Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard and Olof Osterman

Abstract

Purpose

In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.

Design/methodology/approach

On a sample of over 230,000 records with close to 12,000 distinct DDC classes, the open source tool Annif, developed by the National Library of Finland, was applied in the following implementations: lexical algorithm, support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combining the former four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted to investigate the value and inter-rater agreement of automatically assigned classes, on a sample of 60 records.
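The ensemble step described above can be sketched as a weighted combination of per-class scores from several backends (a minimal stand-in for Annif's ensemble behaviour, not the tool's actual code; the DDC notations, scores and weights below are invented for illustration):

```python
def ensemble_scores(backend_scores, weights):
    """Combine per-class suggestion scores from several backends
    by weighted averaging, as a simple ensemble would."""
    combined = {}
    for scores, weight in zip(backend_scores, weights):
        for ddc_class, score in scores.items():
            combined[ddc_class] = combined.get(ddc_class, 0.0) + weight * score
    return combined

# Hypothetical scores from two of the four backends (e.g. lexical and SVC)
lexical = {"025": 0.6, "004": 0.3}
svc = {"025": 0.4, "004": 0.5}

combined = ensemble_scores([lexical, svc], weights=[0.5, 0.5])
best = max(combined, key=combined.get)
print(best)  # → 025
```

In practice the weights themselves would be tuned on validation data rather than fixed equal.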

Findings

The best results were achieved with the ensemble approach, which reached 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.

Originality/value

The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.

Details

Journal of Documentation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 5 March 2021

Xuan Ji, Jiachen Wang and Zhijun Yan

Abstract

Purpose

Stock price prediction is a hot topic, and traditional prediction methods are usually based on statistical and econometric models. However, these models have difficulty dealing with nonstationary time series data. With the rapid development of the internet and the increasing popularity of social media, online news and comments often reflect investors’ emotions and attitudes toward stocks and contain a lot of important information for predicting stock prices. This paper aims to develop a stock price prediction method that takes full advantage of social media data.

Design/methodology/approach

This study proposes a new prediction method based on deep learning technology, which integrates traditional stock financial index variables and social media text features as inputs of the prediction model. This study uses Doc2Vec to build long text feature vectors from social media and then reduces their dimensions with a stacked auto-encoder, balancing the dimensions between text feature variables and stock financial index variables. Meanwhile, the stock price time series is decomposed with a wavelet transform to eliminate the random noise caused by stock market fluctuation. Finally, this study uses a long short-term memory (LSTM) model to predict the stock price.
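The wavelet-based denoising step can be illustrated with a one-level Haar transform in plain Python (a simplified stand-in for the decomposition the abstract describes, not the authors' implementation; the price series and threshold are invented for illustration):

```python
import math

def haar_denoise(series, threshold):
    """One-level Haar wavelet transform: zero out small detail
    coefficients (treated as market noise), then reconstruct."""
    approx, detail = [], []
    for i in range(0, len(series) - 1, 2):
        approx.append((series[i] + series[i + 1]) / math.sqrt(2))
        detail.append((series[i] - series[i + 1]) / math.sqrt(2))
    # Suppress low-magnitude fluctuations
    detail = [0.0 if abs(d) < threshold else d for d in detail]
    denoised = []
    for a, d in zip(approx, detail):
        denoised.append((a + d) / math.sqrt(2))
        denoised.append((a - d) / math.sqrt(2))
    return denoised

prices = [10.0, 10.2, 10.1, 9.9]      # toy closing prices
smooth = haar_denoise(prices, threshold=0.5)
print([round(p, 2) for p in smooth])  # → [10.1, 10.1, 10.0, 10.0]
```

A production pipeline would use a dedicated wavelet library and multiple decomposition levels; the smoothed series would then feed the LSTM alongside the fused text and financial features.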

Findings

The experiment results show that the method outperforms all three benchmark models on every evaluation indicator and can effectively predict stock prices.

Originality/value

This study proposes a new deep-learning-based stock price prediction model that incorporates traditional financial features and text features derived from social media.

Details

International Journal of Crowd Science, vol. 5 no. 1
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 7 March 2019

Eva Hemmungs Wirtén

Abstract

Purpose

The purpose of this paper is to show how the documentation movement associated with the utopian thinkers Paul Otlet and Henri La Fontaine relied on patent offices, as well as the documents most closely associated with this institutional setting – the patents themselves – as central to the formation of the document category. The main argument is that patents were not only subject to, and helped construct, technoscientific order during 1895–1937, but in fact engineered its development.

Design/methodology/approach

The paper draws on an interdisciplinary approach to intellectual property, document theory and insights from media archeology. Focused on the historical period 1895–1937, this study allows for an analysis that encapsulates and accounts for change in a number of comparative areas, moving from bibliography to documentation and from scientific to technoscientific order. Primary sources include Paul Otlet’s own writings, relevant contemporary sources from the French documentation movement and the Congrès Mondial de la documentation universelle in 1937.

Findings

By understanding patent offices and patents as main drivers behind those processes of sorting and classification that constitute technoscientific order, this explorative paper provides a new analytical framework for the study of intellectual property in relation to the history of information and documentation. It argues that the idea of the document may serve to rethink the role of the patent in technoscience, offering suggestions for new and underexplored venues of research in the nexus of several overlapping research fields, from law to information studies.

Originality/value

Debates over the legitimacy and rationale of intellectual property have raged for many years without signs of abating. Universities, research centers, policy makers, editors and scholars, research funders, governments, libraries and archives all have things to say on the legitimacy of the patent system, its relation to innovation and the appropriate role of intellectual property in research and science, milieus that are of central importance in the knowledge-based economy. The value of this paper lies in proposing a new way to approach patents that could show a way out of the current analytical gridlock of either/or that for many years has earmarked the “openness-enclosure” dichotomy. The combination of intellectual property scholarship and documentation theory provides important new insight into the historical networks and processes by which patents and documents have consolidated and converged during the twentieth century.

Details

Journal of Documentation, vol. 75 no. 3
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 14 December 2021

Mariam Elhussein and Samiha Brahimi

Abstract

Purpose

This paper aims to propose a novel way of using textual clustering as a feature selection method, applied to identify the most important keywords in profile classification. The method is demonstrated on the problem of identifying sick-leave promoters on Twitter.

Design/methodology/approach

Four machine learning classifiers were used on a total of 35,578 tweets posted on Twitter. The data were manually labeled into two categories: promoter and nonpromoter. Classification performance was compared when the proposed clustering feature selection approach and the standard feature selection were applied.
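The idea of using cluster structure to pick features can be sketched as follows (a toy illustration, not the authors' implementation; the terms, cluster assignments and frequencies are invented):

```python
from collections import defaultdict

def select_cluster_features(term_clusters, term_freq, top_k=1):
    """Keep only the top_k most frequent terms from each textual
    cluster, using them as the selected feature set."""
    clusters = defaultdict(list)
    for term, label in term_clusters.items():
        clusters[label].append(term)
    selected = []
    for terms in clusters.values():
        terms.sort(key=lambda t: term_freq[t], reverse=True)
        selected.extend(terms[:top_k])
    return selected

# Hypothetical clusters of tweet vocabulary
term_clusters = {"sick": 0, "leave": 0, "certificate": 0, "price": 1, "sell": 1}
term_freq = {"sick": 12, "leave": 8, "certificate": 5, "price": 9, "sell": 4}

features = select_cluster_features(term_clusters, term_freq, top_k=1)
print(features)  # → ['sick', 'price']
```

The point of the approach is that each cluster contributes a representative term, so the reduced feature set still covers every topical region of the vocabulary.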

Findings

Random forest achieved the highest accuracy, 95.91%, higher than that reported in comparable work. Furthermore, using clustering as a feature selection method improved the sensitivity of the model from 73.83% to 98.79%. Sensitivity (recall) is the most important measure of classifier performance when detecting promoter accounts, which show spam-like behavior.

Research limitations/implications

The method is novel; more testing on other datasets is needed before generalizing its results.

Practical implications

The model can be used by Saudi authorities to report accounts that sell sick leaves online.

Originality/value

The research proposes a new way to use textual clustering for feature selection.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 16 August 2021

Jan-Halvard Bergquist, Samantha Tinet and Shang Gao

Abstract

Purpose

The purpose of this study is to create an information classification model that is tailored to suit the specific needs of public sector organizations in Sweden.

Design/methodology/approach

To address the purpose of this research, a case study in a Swedish municipality was conducted. Data was collected through a mixture of techniques, such as literature, document and website reviews. Empirical data was collected through interviews with 11 employees working in 7 different sections of the municipality.

Findings

This study resulted in an information classification model tailored to the specific needs of Swedish municipalities. In addition, a set of steps for tailoring an information classification model to suit a specific public organization is recommended. The findings also indicate that successful information classification requires educating employees about the basics of information security and classification and creating an understandable, unified information security language.

Practical implications

This study also highlights that tailoring an information classification model requires understanding, from the employees’ perspectives, the value of information and the consequences a violation of established information security principles could have.

Originality/value

This study is the first of its kind to tailor an information classification model to the specific needs of a Swedish municipality. The model can be used as a tool to establish a common ground for classifying information across all Swedish municipalities, thereby contributing a first step toward a Swedish municipal model for information classification.

Open Access
Article
Publication date: 23 January 2024

Luís Jacques de Sousa, João Poças Martins, Luís Sanhudo and João Santos Baptista

Abstract

Purpose

This study aims to review recent advances towards the implementation of artificial neural network (ANN) and natural language processing (NLP) applications during the budgeting phase of the construction process. During this phase, construction companies must assess the scope of each task and map the client’s expectations to an internal database of tasks, resources and costs. Quantity surveyors carry out this assessment manually, with little to no computer aid and within very austere time constraints, even though the results determine the company’s bid quality and are contractually binding.

Design/methodology/approach

This paper seeks to compile applications of machine learning (ML) and natural language processing in the architectural engineering and construction sector to find which methodologies can assist this assessment. The paper carries out a systematic literature review, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, to survey the main scientific contributions within the topic of text classification (TC) for budgeting in construction.

Findings

This work concludes that it is necessary to develop data sets that represent the variety of tasks in construction, achieve higher accuracy algorithms, widen the scope of their application and reduce the need for expert validation of the results. Although full automation is not within reach in the short term, TC algorithms can provide helpful support tools.

Originality/value

Given the increasing interest in ML for construction and recent developments, the findings disclosed in this paper contribute to the body of knowledge, provide a more automated perspective on budgeting in construction and break ground for further implementation of text-based ML in budgeting for construction.

Details

Construction Innovation, vol. 24 no. 7
Type: Research Article
ISSN: 1471-4175

Open Access
Article
Publication date: 8 December 2020

Matjaž Kragelj and Mirjana Kljajić Borštnar

Abstract

Purpose

The purpose of this study is to develop a model for automated classification of old digitised texts to the Universal Decimal Classification (UDC), using machine-learning methods.

Design/methodology/approach

The general research approach follows design science research: the problem of UDC assignment for old, digitised texts is addressed by developing a machine-learning classification model. A corpus of 70,000 scholarly texts, fully bibliographically processed by librarians, was used to train and test the model, which was then used to classify old texts in a corpus of 200,000 items. Human experts evaluated the performance of the model.

Findings

Results suggest that machine-learning models can correctly assign the UDC at some level for almost any scholarly text. Furthermore, the model can be recommended for the UDC assignment of older texts. Ten librarians corroborated this on 150 randomly selected texts.

Research limitations/implications

The main limitations of this study were unavailability of labelled older texts and the limited availability of librarians.

Practical implications

The classification model can provide a recommendation to the librarians during their classification work; furthermore, it can be implemented as an add-on to full-text search in the library databases.

Social implications

The proposed methodology supports librarians by recommending UDC classes, thus saving time in their daily work. By automatically classifying older texts, digital libraries can provide a better user experience through structured searches. This contributes to making knowledge more widely available and usable.

Originality/value

These findings contribute to the field of automated classification of bibliographical information with the usage of full texts, especially in cases in which the texts are old, unstructured and in which archaic language and vocabulary are used.

Details

Journal of Documentation, vol. 77 no. 3
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 5 November 2019

Anette Rantanen, Joni Salminen, Filip Ginter and Bernard J. Jansen

Abstract

Purpose

User-generated social media comments can be a useful source of information for understanding online corporate reputation. However, the manual classification of these comments is challenging due to their high volume and unstructured nature. The purpose of this paper is to develop a classification framework and machine learning model to overcome these limitations.

Design/methodology/approach

The authors create a multi-dimensional classification framework for online corporate reputation that includes six main dimensions synthesized from prior literature: quality, reliability, responsibility, successfulness, pleasantness and innovativeness. To evaluate the classification framework’s performance on real data, the authors retrieve 19,991 social media comments about two Finnish banks and use a convolutional neural network (CNN) to automatically classify the comments based on manually annotated training data.
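The six-dimension framework can be illustrated with a naive cue-matching classifier (a toy stand-in for the authors' CNN, shown only to make the label scheme concrete; the cue words are invented):

```python
# The six reputation dimensions named in the framework, each with
# a few invented cue words a comment might contain
CUES = {
    "quality": ["quality", "service"],
    "reliability": ["reliable", "trust"],
    "responsibility": ["responsible", "ethical"],
    "successfulness": ["profit", "success"],
    "pleasantness": ["friendly", "pleasant"],
    "innovativeness": ["innovative", "modern"],
}

def classify_comment(comment):
    """Assign a comment to the dimension whose cues it mentions most,
    or None if no cue matches."""
    text = comment.lower()
    scores = {dim: sum(text.count(cue) for cue in cues)
              for dim, cues in CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

label = classify_comment("Their mobile app is innovative and feels modern")
print(label)  # → innovativeness
```

A learned model such as the authors' CNN replaces the hand-written cues with features induced from the annotated training comments, which is what makes the approach scale to unstructured, high-volume data.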

Findings

After parameter optimization, the neural network achieves an accuracy between 52.7 and 65.2 percent on real-world data, which is reasonable given the high number of classes. The findings also indicate that prior work has not captured all the facets of online corporate reputation.

Practical implications

For practical purposes, the authors provide a comprehensive classification framework for online corporate reputation, which companies and organizations operating in various domains can use. Moreover, the authors demonstrate that using a limited amount of training data can yield a satisfactory multiclass classifier when using CNN.

Originality/value

This is the first attempt at automatically classifying online corporate reputation using an online-specific classification framework.

Details

Internet Research, vol. 30 no. 1
Type: Research Article
ISSN: 1066-2243

Open Access
Article
Publication date: 5 April 2023

Tomás Lopes and Sérgio Guerreiro

Abstract

Purpose

Testing business processes is crucial to assess the compliance of business process models with requirements. Automating this task optimizes testing efforts and reduces human error while also providing improvement insights for the business process modeling activity. The primary purposes of this paper are to conduct a literature review of Business Process Model and Notation (BPMN) testing and formal verification and to propose the Business Process Evaluation and Research Framework for Enhancement and Continuous Testing (bPERFECT) framework, which aims to guide business process testing (BPT) research and implementation. Secondary objectives include (1) eliciting the existing types of testing, (2) evaluating their impact on efficiency and (3) assessing the formal verification techniques that complement testing.

Design/methodology/approach

The methodology used is based on Kitchenham's (2004) original procedures for conducting systematic literature reviews.

Findings

Results of this study indicate that three distinct business process model testing types can be found in the literature: black/gray-box, regression and integration. Testing and verification approaches differ in aspects such as awareness of test data, coverage criteria and auxiliary representations used. However, most solutions pose notable hindrances, such as BPMN element limitations, that lead to limited practicality.

Research limitations/implications

The databases selected in the review protocol may have excluded relevant studies on this topic. More databases and gray literature could also be considered for inclusion in this review.

Originality/value

Three main originality aspects are identified in this study: (1) the classification of process model testing types, (2) the future trends foreseen for BPMN model testing and verification and (3) the bPERFECT framework for testing business processes.

Details

Business Process Management Journal, vol. 29 no. 8
Type: Research Article
ISSN: 1463-7154
