Search results
1 – 10 of over 2000
Abstract
Purpose
In this article, the author discusses works from the French Documentation Movement in the 1940s and 1950s with regard to how it formulates bibliographic classification systems as documents. Significant writings by Suzanne Briet, Éric de Grolier and Robert Pagès are analyzed in the light of current document-theoretical concepts and discussions.
Design/methodology/approach
Conceptual analysis.
Findings
The French Documentation Movement provided a rich intellectual environment in the late 1940s and early 1950s, resulting in original works on documents and the ways these may be represented bibliographically. These works display a variety of approaches, from object-oriented description to notational concept-synthesis, and definitions of classification systems as isomorphic documents at the center of a politically informed critique of modern society.
Originality/value
The article brings together historical and conceptual elements in the analysis which have not previously been combined in Library and Information Science literature. In the analysis, the article discusses significant contributions to classification and document theory that hitherto have eluded attention from the wider international Library and Information Science research community. Through this, the article contributes to the currently ongoing conceptual discussion on documents and documentality.
Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard and Olof Osterman
Abstract
Purpose
In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.
Design/methodology/approach
On a sample of over 230,000 records with close to 12,000 distinct DDC classes, the open source tool Annif, developed by the National Library of Finland, was applied in the following implementations: lexical algorithm, support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combining the former four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted to investigate the value and inter-rater agreement of automatically assigned classes, on a sample of 60 records.
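The ensemble idea described above can be illustrated with a minimal sketch that averages per-class confidence scores from several backends and keeps the top-scoring class. All backend scores and DDC notations below are invented for illustration; Annif's actual ensemble implementation differs in its details.

```python
# Minimal sketch: combining the per-class scores of several automated
# indexing backends by (weighted) averaging, in the spirit of an ensemble.

def ensemble_predict(backend_scores, weights=None):
    """Average per-class scores from several backends and return the
    highest-scoring class plus the averaged scores. backend_scores is a
    list of dicts mapping a class notation (e.g. a DDC class) to a score."""
    if weights is None:
        weights = [1.0] * len(backend_scores)
    combined = {}
    for scores, w in zip(backend_scores, weights):
        for cls, s in scores.items():
            combined[cls] = combined.get(cls, 0.0) + w * s
    total = sum(weights)
    averaged = {cls: s / total for cls, s in combined.items()}
    return max(averaged, key=averaged.get), averaged

# Hypothetical scores from four backends for a single record:
lexical  = {"025.4": 0.70, "006.3": 0.10}
svc      = {"025.4": 0.40, "006.3": 0.55}
fasttext = {"025.4": 0.60, "020":   0.20}
omikuji  = {"025.4": 0.50, "006.3": 0.45}

best, scores = ensemble_predict([lexical, svc, fasttext, omikuji])
```

Averaging rewards classes on which the backends agree, which is one plausible reason an ensemble can outperform each individual backend.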
Findings
The best results were achieved using the ensemble approach, which reached 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.
Originality/value
The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.
Xuan Ji, Jiachen Wang and Zhijun Yan
Abstract
Purpose
Stock price prediction is a hot topic, and traditional prediction methods are usually based on statistical and econometric models. However, these models struggle with nonstationary time series data. With the rapid development of the internet and the increasing popularity of social media, online news and comments often reflect investors’ emotions and attitudes toward stocks, which contain important information for predicting stock prices. This paper aims to develop a stock price prediction method by taking full advantage of social media data.
Design/methodology/approach
This study proposes a new prediction method based on deep learning technology, which integrates traditional stock financial index variables and social media text features as inputs of the prediction model. This study uses Doc2Vec to build long text feature vectors from social media and then reduces the dimensions of the text feature vectors with a stacked auto-encoder, to balance the dimensions between text feature variables and stock financial index variables. Meanwhile, the stock price time series is decomposed using the wavelet transform to eliminate the random noise caused by stock market fluctuation. Finally, this study uses a long short-term memory (LSTM) model to predict the stock price.
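The wavelet-denoising step can be sketched with a one-level Haar transform: the price series is split into a smooth approximation and detail coefficients, small details (noise) are zeroed, and the series is reconstructed. This is an illustrative stand-in with toy prices and a toy threshold, not the paper's exact transform or parameters.

```python
# One-level Haar wavelet denoising sketch (assumes an even-length series).
import math

def haar_decompose(x):
    """Split the series into approximation and detail coefficients."""
    approx = [(a + b) / math.sqrt(2) for a, b in zip(x[0::2], x[1::2])]
    detail = [(a - b) / math.sqrt(2) for a, b in zip(x[0::2], x[1::2])]
    return approx, detail

def haar_reconstruct(approx, detail):
    """Invert haar_decompose exactly."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / math.sqrt(2))
        out.append((a - d) / math.sqrt(2))
    return out

def denoise(x, threshold=0.5):
    """Zero out small detail coefficients (noise), keep large ones (trend)."""
    approx, detail = haar_decompose(x)
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_reconstruct(approx, detail)

prices = [10.0, 10.1, 10.0, 12.0, 12.1, 12.0]
smooth = denoise(prices)
```

Small fluctuations (10.0 vs 10.1) are smoothed away while the genuine jump from 10.0 to 12.0 survives, which is the property that makes wavelet thresholding attractive for noisy price series.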
Findings
The experiment results show that the method performs better than all three benchmark models on all evaluation indicators and can effectively predict stock prices.
Originality/value
This study proposes a new stock price prediction model, based on deep learning technology, that incorporates traditional financial features and text features derived from social media.
Abstract
Purpose
The purpose of this paper is to show how the documentation movement associated with the utopian thinkers Paul Otlet and Henri La Fontaine relied on patent offices, as well as the documents most closely associated with this institutional setting – the patents themselves – as central to the formation of the document category. The main argument is that patents were not only subjected to, and helped construct, but in fact engineered the development of technoscientific order during 1895–1937.
Design/methodology/approach
The paper draws on an interdisciplinary approach to intellectual property, document theory and insights from media archeology. Focused on the historical period 1895–1937, this study allows for an analysis that encapsulates and accounts for change in a number of comparative areas, moving from bibliography to documentation and from scientific to technoscientific order. Primary sources include Paul Otlet’s own writings, relevant contemporary sources from the French documentation movement and the Congrès Mondial de la documentation universelle in 1937.
Findings
By understanding patent offices and patents as main drivers behind those processes of sorting and classification that constitute technoscientific order, this explorative paper provides a new analytical framework for the study of intellectual property in relation to the history of information and documentation. It argues that the idea of the document may serve to rethink the role of the patent in technoscience, offering suggestions for new and underexplored venues of research in the nexus of several overlapping research fields, from law to information studies.
Originality/value
Debates over the legitimacy and rationale of intellectual property have raged for many years without signs of abating. Universities, research centers, policy makers, editors and scholars, research funders, governments, libraries and archives all have things to say on the legitimacy of the patent system, its relation to innovation and the appropriate role of intellectual property in research and science, milieus that are of central importance in the knowledge-based economy. The value of this paper lies in proposing a new way to approach patents that could show a way out of the current analytical gridlock of either/or that for many years has earmarked the “openness-enclosure” dichotomy. The combination of intellectual property scholarship and documentation theory provides important new insight into the historical networks and processes by which patents and documents have consolidated and converged during the twentieth century.
Mariam Elhussein and Samiha Brahimi
Abstract
Purpose
This paper aims to propose a novel way of using textual clustering as a feature selection method. It is applied to identify the most important keywords for profile classification. The method is demonstrated on the problem of identifying sick-leave promoters on Twitter.
Design/methodology/approach
Four machine learning classifiers were used on a total of 35,578 tweets posted on Twitter. The data were manually labeled into two categories: promoter and nonpromoter. Classification performance was compared when the proposed clustering feature selection approach and the standard feature selection were applied.
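The clustering-as-feature-selection idea can be sketched as grouping related terms and keeping one representative term per cluster, instead of keeping every term as a feature. All terms, counts and cluster assignments below are invented; the paper's actual clustering and selection criteria may differ.

```python
# Illustrative sketch of clustering-based feature selection: terms are
# grouped into clusters (pre-computed here for brevity) and the most
# frequent term per cluster is kept as a classification feature.
from collections import Counter

term_counts = Counter({
    "sick-leave": 120, "sickness": 45, "leave": 80,
    "certificate": 60, "doctor": 30, "report": 25,
})
clusters = [
    {"sick-leave", "sickness", "leave"},   # absence-related terms
    {"certificate", "doctor", "report"},   # documentation-related terms
]

def select_features(clusters, counts):
    """Keep one representative (most frequent) term per cluster."""
    return [max(c, key=lambda t: counts[t]) for c in clusters]

features = select_features(clusters, term_counts)
```

Collapsing each cluster of near-synonymous terms to a single feature shrinks the feature space while preserving the topics the terms represent.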
Findings
Random forest achieved the highest accuracy, 95.91%, higher than that reported in comparable work. Furthermore, using clustering as a feature selection method improved the sensitivity of the model from 73.83% to 98.79%. Sensitivity (recall) is the most important measure of classifier performance when detecting promoter accounts that exhibit spam-like behavior.
Research limitations/implications
Although the method applied is novel, more testing is needed on other datasets before its results can be generalized.
Practical implications
The model applied can be used by Saudi authorities to report on the accounts that sell sick-leaves online.
Originality/value
The research proposes a new way in which textual clustering can be used for feature selection.
Jan-Halvard Bergquist, Samantha Tinet and Shang Gao
Abstract
Purpose
The purpose of this study is to create an information classification model that is tailored to suit the specific needs of public sector organizations in Sweden.
Design/methodology/approach
To address the purpose of this research, a case study in a Swedish municipality was conducted. Data were collected through a mixture of techniques, such as literature, document and website reviews. Empirical data were collected through interviews with 11 employees working within 7 different sections of the municipality.
Findings
This study resulted in an information classification model that is tailored to the specific needs of Swedish municipalities. In addition, a set of steps for tailoring an information classification model to suit a specific public organization are recommended. The findings also indicate that for a successful information classification it is necessary to educate the employees about the basics of information security and classification and create an understandable and unified information security language.
Practical implications
This study also highlights that to have a tailored information classification model, it is imperative to understand the value of information and what kind of consequences a violation of established information security principles could have through the perspectives of the employees.
Originality/value
It is the first of its kind in tailoring an information classification model to the specific needs of a Swedish municipality. The model provided by this study can be used as a tool to facilitate a common ground for classifying information within all Swedish municipalities, thereby contributing a first step toward a Swedish municipal model for information classification.
Luís Jacques de Sousa, João Poças Martins, Luís Sanhudo and João Santos Baptista
Abstract
Purpose
This study aims to review recent advances towards the implementation of artificial neural network (ANN) and natural language processing (NLP) applications during the budgeting phase of the construction process. During this phase, construction companies must assess the scope of each task and map the client’s expectations to an internal database of tasks, resources and costs. Quantity surveyors carry out this assessment manually with little to no computer aid, within very austere time constraints, even though these results determine the company’s bid quality and are contractually binding.
Design/methodology/approach
This paper seeks to compile applications of machine learning (ML) and natural language processing in the architectural engineering and construction sector to find which methodologies can assist this assessment. The paper carries out a systematic literature review, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, to survey the main scientific contributions within the topic of text classification (TC) for budgeting in construction.
Findings
This work concludes that it is necessary to develop data sets that represent the variety of tasks in construction, achieve higher accuracy algorithms, widen the scope of their application and reduce the need for expert validation of the results. Although full automation is not within reach in the short term, TC algorithms can provide helpful support tools.
Originality/value
Given the increasing interest in ML for construction and recent developments, the findings disclosed in this paper contribute to the body of knowledge, provide a more automated perspective on budgeting in construction and break ground for further implementation of text-based ML in budgeting for construction.
Matjaž Kragelj and Mirjana Kljajić Borštnar
Abstract
Purpose
The purpose of this study is to develop a model for automated classification of old digitised texts to the Universal Decimal Classification (UDC), using machine-learning methods.
Design/methodology/approach
The general research approach is inherent to design science research, in which the problem of UDC assignment of the old, digitised texts is addressed by developing a machine-learning classification model. A corpus of 70,000 scholarly texts, fully bibliographically processed by librarians, was used to train and test the model, which was used for classification of old texts on a corpus of 200,000 items. Human experts evaluated the performance of the model.
Findings
Results suggest that machine-learning models can correctly assign the UDC at some level for almost any scholarly text. Furthermore, the model can be recommended for the UDC assignment of older texts. Ten librarians corroborated this on 150 randomly selected texts.
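The finding that the model assigns the UDC correctly "at some level" can be made concrete with a prefix-based evaluation sketch: because UDC (like DDC) notations are hierarchical, a prediction can be scored as correct at a broader level by comparing notation prefixes. The notations and the evaluation function below are invented for illustration, not the study's actual protocol.

```python
# Sketch of hierarchy-aware evaluation: compare predicted and true class
# notations on their first `digits` characters only.

def prefix_accuracy(true_classes, predicted_classes, digits):
    """Share of records whose predicted notation matches the true one
    on the first `digits` characters of the notation."""
    hits = sum(
        1 for t, p in zip(true_classes, predicted_classes)
        if t[:digits] == p[:digits]
    )
    return hits / len(true_classes)

# Invented example notations for four records:
true_cls = ["025.4", "821.1", "004.8", "025.3"]
pred_cls = ["025.3", "821.1", "004.2", "616.1"]

acc_full = prefix_accuracy(true_cls, pred_cls, digits=5)  # exact notation
acc_top  = prefix_accuracy(true_cls, pred_cls, digits=3)  # broad class
```

A model can thus be much more accurate at the broad-class level than at the exact notation, which is one way to interpret "correct at some level".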
Research limitations/implications
The main limitations of this study were unavailability of labelled older texts and the limited availability of librarians.
Practical implications
The classification model can provide a recommendation to the librarians during their classification work; furthermore, it can be implemented as an add-on to full-text search in the library databases.
Social implications
The proposed methodology supports librarians by recommending UDC classes, thus saving time in their daily work. By automatically classifying older texts, digital libraries can provide a better user experience by enabling structured searches. These contribute to making knowledge more widely available and usable.
Originality/value
These findings contribute to the field of automated classification of bibliographical information with the usage of full texts, especially in cases in which the texts are old, unstructured and in which archaic language and vocabulary are used.
Anette Rantanen, Joni Salminen, Filip Ginter and Bernard J. Jansen
Abstract
Purpose
User-generated social media comments can be a useful source of information for understanding online corporate reputation. However, the manual classification of these comments is challenging due to their high volume and unstructured nature. The purpose of this paper is to develop a classification framework and machine learning model to overcome these limitations.
Design/methodology/approach
The authors create a multi-dimensional classification framework for online corporate reputation that includes six main dimensions synthesized from prior literature: quality, reliability, responsibility, successfulness, pleasantness and innovativeness. To evaluate the classification framework’s performance on real data, the authors retrieve 19,991 social media comments about two Finnish banks and use a convolutional neural network (CNN) to automatically classify the comments based on manually annotated training data.
Findings
After parameter optimization, the neural network achieves an accuracy between 52.7 and 65.2 percent on real-world data, which is reasonable given the high number of classes. The findings also indicate that prior work has not captured all the facets of online corporate reputation.
Practical implications
For practical purposes, the authors provide a comprehensive classification framework for online corporate reputation, which companies and organizations operating in various domains can use. Moreover, the authors demonstrate that using a limited amount of training data can yield a satisfactory multiclass classifier when using CNN.
Originality/value
This is the first attempt at automatically classifying online corporate reputation using an online-specific classification framework.
Tomás Lopes and Sérgio Guerreiro
Abstract
Purpose
Testing business processes is crucial to assess the compliance of business process models with requirements. Automating this task optimizes testing efforts and reduces human error while also providing improvement insights for the business process modeling activity. The primary purposes of this paper are to conduct a literature review of Business Process Model and Notation (BPMN) testing and formal verification and to propose the Business Process Evaluation and Research Framework for Enhancement and Continuous Testing (bPERFECT) framework, which aims to guide business process testing (BPT) research and implementation. Secondary objectives include (1) eliciting the existing types of testing, (2) evaluating their impact on efficiency and (3) assessing the formal verification techniques that complement testing.
Design/methodology/approach
The methodology used is based on Kitchenham's (2004) original procedures for conducting systematic literature reviews.
Findings
Results of this study indicate that three distinct business process model testing types can be found in the literature: black/gray-box, regression and integration. Testing and verification approaches differ in aspects such as awareness of test data, coverage criteria and auxiliary representations used. However, most solutions pose notable hindrances, such as BPMN element limitations, that lead to limited practicality.
Research limitations/implications
The databases selected in the review protocol may have excluded relevant studies on this topic. More databases and gray literature could also be considered for inclusion in this review.
Originality/value
Three main originality aspects are identified in this study as follows: (1) the classification of process model testing types, (2) the future trends foreseen for BPMN model testing and verification and (3) the bPERFECT framework for testing business processes.