Search results

1 – 10 of over 2000
Open Access
Article
Publication date: 8 December 2020

Matjaž Kragelj and Mirjana Kljajić Borštnar

Abstract

Purpose

The purpose of this study is to develop a model for automated classification of old digitised texts to the Universal Decimal Classification (UDC), using machine-learning methods.

Design/methodology/approach

The general research approach is inherent to design science research, in which the problem of UDC assignment of the old, digitised texts is addressed by developing a machine-learning classification model. A corpus of 70,000 scholarly texts, fully bibliographically processed by librarians, was used to train and test the model, which was used for classification of old texts on a corpus of 200,000 items. Human experts evaluated the performance of the model.

Findings

Results suggest that machine-learning models can correctly assign the UDC at some level for almost any scholarly text. Furthermore, the model can be recommended for the UDC assignment of older texts. Ten librarians corroborated this on 150 randomly selected texts.

Research limitations/implications

The main limitations of this study were unavailability of labelled older texts and the limited availability of librarians.

Practical implications

The classification model can provide a recommendation to the librarians during their classification work; furthermore, it can be implemented as an add-on to full-text search in the library databases.

Social implications

The proposed methodology supports librarians by recommending UDC classifiers, thus saving time in their daily work. By automatically classifying older texts, digital libraries can provide a better user experience by enabling structured searches. These contribute to making knowledge more widely available and useable.

Originality/value

These findings contribute to the field of automated classification of bibliographical information with the usage of full texts, especially in cases in which the texts are old, unstructured and in which archaic language and vocabulary are used.

Details

Journal of Documentation, vol. 77 no. 3
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 2 February 2023

Xian Wang, Yijian Zhao, Qingyi Wang, Huang Yixing and Gabedava George

Abstract

Purpose

This paper focuses on the orientation of the economy expressed in the communication of the Central Economic Work Conference (CEWC) of China and its relation with the stock market. This study seeks to explore which orientation of the economy may have a stronger impact on the rise of the stock market. It proposes that words connoting orientation of the economy (WOE) are closely related to the stock market and that different WOEs have different impacts on the stock market in terms of intensity. The study aims to provide investors with better investment strategies by identifying the stronger developmental WOE.

Design/methodology/approach

The paper opted for an exploratory study using the textual analysis approach, based on a corpus of 28 CEWC communications spanning from 1994 to 2021. The raw corpus of 50,754 words was treated with a noise reduction method, yielding an effective corpus of 39,591.

Findings

The paper provides empirical insights into the close relationship of the WOE of the CEWC to the stock market, and shows that different WOEs have different impacts on the stock market in terms of intensity. It suggests that WOE connoting development may forecast a rising stock market if it is nearly 40% higher than the other two WOEs by impact index.

Research limitations/implications

As WOE is only proven in the CEWC, this paper has its limitations in the scope of samples. It is necessary to apply WOE to more Central Bank communication (CBC) and countries. It is desirable to apply the Gunning–Fog index.

Practical implications

The paper includes implications for investors to read out the orientation of the economy and the degree of different WOEs. Investors are keener to know “what” degree of the CEWC leads to the rise/fall of the stock market. The impact index can be an indicator of a tendency of the stock market, which upgrades the rationality of investment decisions.

Social implications

This paper establishes words connoting the orientation of the economy as an identified linguistic feature by which the impact of the CEWC on the stock market can be measured.

Originality/value

Previous academic studies mostly focus on the impact of the language features of CBC on the stock market, rather than that of the communication of the more influential body, the CEWC. This study seeks to establish the relationship between CEWC communication and the duration of its impact on stock prices.

Details

Journal of Capital Markets Studies, vol. 7 no. 1
Type: Research Article
ISSN: 2514-4774

Open Access
Article
Publication date: 14 December 2021

Mariam Elhussein and Samiha Brahimi

Abstract

Purpose

This paper aims to propose a novel way of using textual clustering as a feature selection method. It is applied to identify the most important keywords in the profile classification. The method is demonstrated through the problem of sick-leave promoters on Twitter.

Design/methodology/approach

Four machine learning classifiers were used on a total of 35,578 tweets posted on Twitter. The data were manually labeled into two categories: promoter and nonpromoter. Classification performance was compared when the proposed clustering feature selection approach and the standard feature selection were applied.

Findings

Random forest achieved the highest accuracy, 95.91%, which is higher than that of the similar work compared. Furthermore, using clustering as a feature selection method improved the sensitivity of the model from 73.83% to 98.79%. Sensitivity (recall) is the most important measure of classifier performance when detecting promoters’ accounts that have spam-like behavior.
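
Sensitivity, as used in these findings, is the share of actual promoter accounts that the classifier flags as promoters. A minimal Python sketch follows; the `sensitivity` helper and the example labels are hypothetical illustrations, not the authors' code or data.

```python
def sensitivity(y_true, y_pred, positive="promoter"):
    """Recall for the positive class: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    # True positives: actual promoters that were predicted as promoters.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: actual promoters that the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive examples in y_true")
    return tp / (tp + fn)


# Illustrative labels only; these are not the study's data.
y_true = ["promoter", "promoter", "nonpromoter", "promoter", "nonpromoter"]
y_pred = ["promoter", "nonpromoter", "nonpromoter", "promoter", "promoter"]
print(sensitivity(y_true, y_pred))
```

Accuracy can stay high while sensitivity is low when promoters are rare, which is one reason the two figures are reported separately.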

Research limitations/implications

The method applied is novel; more testing is needed on other datasets before its results can be generalized.

Practical implications

The model applied can be used by Saudi authorities to report on the accounts that sell sick-leaves online.

Originality/value

The research proposes a new way in which textual clustering can be used in feature selection.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 1 December 2016

Naomi Matsubara

Abstract

This paper aims to highlight contrasts between the writing of young people in the UAE and Japan. For comparison, anthologies of 50-word short stories written in English, resulting from the Extremely Short Story Competition (ESSC) in each country are examined. These two ESSC anthologies were created under similar conditions in 2006. Analysis of the most frequently appearing topics in each ESSC anthology provides insights into the daily life, general mindsets, behavior, preferences, values and culture of these two groups. These data help us to understand the everyday life and social context of young people in the UAE and Japan. Thematic analysis shows that youth in both countries are often preoccupied with seeking identity, and regard friends as important. Both groups of young people also appear to appreciate the beauty of nature and feel affection towards living creatures. An identifying characteristic of Emirati youth is that they talk about death more often than do the Japanese writers; in addition, the ESSC anthologies indicate UAE society is remarkably family-oriented, with life being firmly connected to Islam and God. In contrast, Japanese youth show they are keen to engage in various hobbies and also like to express their romantic feelings and thankfulness for their environment. The ESSC was originally designed to develop students’ creative writing in English. This study explains that corpora generated by the ESSC may be used to illuminate the lives and societies of students living in disparate countries, with implications for planning and delivering locally appropriate education.

Details

Learning and Teaching in Higher Education: Gulf Perspectives, vol. 13 no. 2
Type: Research Article
ISSN: 2077-5504

Open Access
Article
Publication date: 23 May 2023

Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen

Abstract

Purpose

This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different quality OCR on users' subjective perception through an interactive information retrieval task with a collection of one digitized historical Finnish newspaper.

Design/methodology/approach

This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched an article collection of the Finnish newspaper Uusi Suometar (1869–1918), which consists of ca. 1.45 million autosegmented articles. The article search database had two versions of each article with different quality OCR. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results using a graded relevance scale of 0–3. Users were not informed about the OCR quality differences of the otherwise identical articles.

Findings

The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.

Originality/value

To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 17 July 2020

Mukesh Kumar and Palak Rehan

Abstract

Social media networks such as Twitter, Facebook and WhatsApp are the most commonly used media for sharing news and opinions and for staying in touch with peers. Messages on Twitter are limited to 140 characters. This has led users to create their own novel syntax in tweets to express more in fewer words. The free writing style, use of URLs, markup syntax, inappropriate punctuation, ungrammatical structures, abbreviations etc. make it harder to mine useful information from them. For each tweet, we can get an explicit time stamp, the name of the user, the social network the user belongs to, or even the GPS coordinates if the tweet is created with a GPS-enabled mobile device. With these features, Twitter is, by nature, a good resource for detecting and analyzing real-time events happening around the world. By using the speed and coverage of Twitter, we can detect events (sequences of important keywords being talked about) in a timely manner, which can be used in different applications such as natural calamity relief support, earthquake relief support, product launches and suspicious activity detection. The keyword detection process from Twitter can be seen as a two-step process: detection of keywords in raw text form (words as posted by the users) and keyword normalization (reforming the users’ unstructured words into complete, meaningful English words). In this paper, a keyword detection technique based upon graphs, spanning trees and the PageRank algorithm is proposed. A text normalization technique based upon a hybrid approach using Levenshtein distance, the demetaphone algorithm and dictionary mapping is proposed to work upon the unstructured keywords produced by the proposed keyword detector. The proposed normalization technique is validated using the standard lexnorm 1.2 dataset. The proposed system is used to detect keywords from Twitter text being posted in real time. The detected and normalized keywords are further validated against search engine results at a later time for detection of events.
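
The Levenshtein-distance part of the normalization step described in this abstract can be illustrated with a short Python sketch. It covers only edit distance plus a lookup in a small word list; the `normalize` helper, the `max_distance` threshold and the vocabulary are assumptions made for illustration, and the phonetic (demetaphone) component of the authors' hybrid approach is not reproduced.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def normalize(token, vocabulary, max_distance=2):
    """Return the closest vocabulary word within max_distance, else the token unchanged."""
    lowered = token.lower()
    # Ties are broken alphabetically so the result is deterministic.
    best = min(vocabulary, key=lambda word: (levenshtein(lowered, word), word))
    return best if levenshtein(lowered, best) <= max_distance else token


# Hypothetical word list for illustration only.
vocabulary = ["earthquake", "tomorrow", "tonight"]
print(normalize("erthquake", vocabulary))
```

Heavily abbreviated tokens are often more than a few edits away from their full form, which is one reason a pure edit-distance lookup is usually combined with phonetic matching and dictionary mapping.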

Details

Applied Computing and Informatics, vol. 17 no. 2
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 13 October 2022

Linzi Wang, Qiudan Li, Jingjun David Xu and Minjie Yuan

Abstract

Purpose

Mining user-concerned actionable and interpretable hot topics will help management departments fully grasp the latest events and make timely decisions. Existing topic models primarily integrate word embedding and matrix decomposition, which only generates keyword-based hot topics with weak interpretability, making it difficult to meet the specific needs of users. Mining phrase-based hot topics with syntactic dependency structure has been proven to model structure information effectively. A key challenge lies in the effective integration of the above information into the hot topic mining process.

Design/methodology/approach

This paper proposes a nonnegative matrix factorization (NMF)-based hot topic mining method, the semantics syntax-assisted hot topic model (SSAHM), which combines semantic association and syntactic dependency structure. First, a semantic–syntactic component association matrix is constructed. Then, the matrix is used as a constraint condition to be incorporated into the block coordinate descent (BCD)-based matrix decomposition process. Finally, a hot topic information-driven phrase extraction algorithm is applied to describe hot topics.
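
For readers unfamiliar with NMF, the following Python sketch factorizes a small nonnegative matrix V into nonnegative factors W and H. It is plain NMF with the standard multiplicative update rules, not the constrained, BCD-based SSAHM described in the abstract; all function names and the toy matrix are assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize nonnegative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Toy term-by-document counts with two visible blocks (illustrative only).
V = [[3, 2, 0, 0],
     [2, 3, 0, 0],
     [0, 0, 4, 1],
     [0, 0, 1, 4]]
W, H = nmf(V, rank=2)
print(round(frobenius_error(V, W, H), 3))
```

In a topic-mining setting, rows of V would correspond to terms, columns to documents, and each of the `rank` components to a candidate topic.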

Findings

The efficacy of the developed model is demonstrated on two real-world datasets, and the effects of dependency structure information on different topics are compared. The qualitative examples further explain the application of the method in real scenarios.

Originality/value

Most prior research focuses on keyword-based hot topics. Thus, the literature is advanced by mining phrase-based hot topics with syntactic dependency structure, which can effectively analyze the semantics. The development of syntactic dependency structure considering the combination of word order and part-of-speech (POS) is a step forward, as word order and POS are only utilized separately in the prior literature. Ignoring this synergy may miss important information, such as grammatical structure coherence and logical relations between syntactic components.

Details

Journal of Electronic Business & Digital Economics, vol. 1 no. 1/2
Type: Research Article
ISSN: 2754-4214

Open Access
Article
Publication date: 14 July 2020

Yuning Zhao, Xinxue Zhou and Tianmei Wang

Abstract

Purpose

Following Hovland’s persuasion theory, this paper aims to develop a conceptual model and analyze characteristics of online political deliberation behavior from three aspects (i.e. information, situation and manager). Based on the whole interactive process of online political deliberation, this paper aims to reveal the key points that affect the response effect of the government from the persuasive perspective of online political consultation.

Design/methodology/approach

Based on more than 40,000 netizens’ posts and government responses from a Chinese political platform, dated from 2011 to the first half of 2019, this paper used text analysis and machine learning methods to extract measurement variables of online political deliberation characteristics and econometric analysis to conduct empirical research.

Findings

The results showed that the textual information, political environment and identity of the political objects affect the effectiveness of government response. Furthermore, for different position categories of political officials, the length of political texts, topic categories and emotional tendencies have different effects on the response effectiveness. Additionally, the effect of political time on the effectiveness of response differs.

Originality/value

The findings will help ascertain the characteristics of online political deliberation behavior that affect how effective government response is and provide a theoretical basis for why the public should express their political concerns.

Details

International Journal of Crowd Science, vol. 4 no. 3
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 19 August 2022

Muneera Muftah

Abstract

Purpose

How closely a translation matches the meaning of the reference has always been a key aspect of any machine translation (MT) service. Therefore, the primary goal of this research is to assess and compare translation adequacy in machine vs human translation (HT) from Arabic to English. The study looks into whether the MT product is adequate and more reliable than the HT. It also seeks to determine whether MT poses a real threat to professional Arabic–English translators.

Design/methodology/approach

Six different texts were chosen and translated from Arabic to English by two nonexpert undergraduate translation students as well as MT services, including Google Translate and Babylon Translation. The first system is free, whereas the second system is a fee-based service. Additionally, two expert translators developed a reference translation (RT) against which human and machine translations were compared and analyzed. Furthermore, the Sketch Engine software was utilized to examine the translations to determine if there is a significant difference between human and machine translations against the RT.

Findings

The findings indicated that when compared to the RT, there was no statistically significant difference between human and machine translations and that MTs were adequate translations. The human–machine relationship is mutually beneficial. However, MT will never be completely automated; rather, it will benefit rather than endanger humans. A translator who knows how to use MT will have an advantage over those who are unfamiliar with the most up-to-date translation technology. As MTs improve, human translators may no longer act as translators proper, but rather as editors of materials previously translated by machines.

Practical implications

The findings of this study provide valuable and practical implications for research in the field of MTs and for anyone interested in conducting MT research.

Originality/value

In general, this study is significant as it is a serious attempt at getting a better understanding of the efficiency of MT vs HT in translating the Arabic–English texts, and it will be beneficial for translators, students, educators as well as scholars in the field of translation.

Details

PSU Research Review, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2399-1747

Open Access
Article
Publication date: 16 April 2024

Iddrisu Mohammed, Alexander Preko, Samuel Kwami Agbanu, Timothy K. Zilevu and Akorfa Wuttor

Abstract

Purpose

This conceptual paper aims to explore government regulatory responses to social networking platforms (SNP) and tourism destination evangelism. This research draws on a two-phase data source review of government legislations that guarantee social media users and empirical papers related to social media platforms. The results revealed that Ghana has adopted specific legislations that manage and control SNP. To the best of the authors’ knowledge, this study is the first of its kind to synthesize government legislation and empirical papers on social networking platforms in evangelising destinations, a synthesis that has been missing in the extant literature.

Details

Tourism Critiques: Practice and Theory, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2633-1225
