Search results

1 – 10 of over 2000

Open Access

Article

Publication date: 23 July 2019

Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study

Guenter Muehlberger, Louise Seaward, Melissa Terras, Sofia Ares Oliveira, Vicente Bosch, Maximilian Bryan, Sebastian Colutto, Hervé Déjean, Markus Diem, Stefan Fiel, Basilis Gatos, Albert Greinoecker, Tobias Grüning, Guenter Hackl, Vili Haukkovaara, Gerhard Heyer, Lauri Hirvonen, Tobias Hodel, Matti Jokinen, Philip Kahle, Mario Kallio, Frederic Kaplan, Florian Kleber, Roger Labahn, Eva Maria Lang, Sören Laube, Gundram Leifert, Georgios Louloudis, Rory McNicholl, Jean-Luc Meunier, Johannes Michael, Elena Mühlbauer, Nathanael Philipp, Ioannis Pratikakis, Joan Puigcerver Pérez, Hannelore Putz, George Retsinas, Verónica Romero, Robert Sablatnig, Joan Andreu Sánchez, Philip Schofield, Giorgos Sfikas, Christian Sieber, Nikolaos Stamatopoulos, Tobias Strauß, Tamara Terbul, Alejandro Héctor Toselli, Berthold Ulreich, Mauricio Villegas, Enrique Vidal, Johanna Walcher, Max Weidemann, Herbert Wurster and Konstantinos Zagoris

Downloads

10696

Abstract

Purpose

This paper provides an overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020-funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point in the advanced use of digitised heritage content. The paper aims to discuss these issues.

Design/methodology/approach

This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for manuscript material.

Findings

Transkribus has demonstrated that HTR is now a useable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure sustainability and scaling of the platform. However, funding and resourcing issues are identified.

Research limitations/implications

The paper presents results from projects: further user studies could be undertaken involving interviews, surveys, etc.

Practical implications

Only HTR provided via Transkribus is covered: however, this is the only publicly available platform for HTR on individual collections of historical documents at time of writing and it represents the current state-of-the-art in this field.

Social implications

The increased access to information contained within historical texts has the potential to be transformational for both institutions and individuals.

Originality/value

This is the first published overview of how HTR is used by a wide archival studies community, reporting and showcasing current application of handwriting technology in the cultural heritage sector.

Details

Journal of Documentation, vol. 75 no. 5

Type: Research Article

ISSN: 0022-0418

Open Access

Article

Publication date: 5 August 2021

Reaching hard-to-reach people through digital means – Citizens as initiators of co-creation in public services

Harri Jalonen, Jussi Kokkola, Harri Laihonen, Hanna Kirjavainen, Valtteri Kaartemo and Miika Vähämaa

Downloads

1982

Abstract

Purpose

This paper considers the potential of social media for developing public services. The paper approaches social media as a context that can provide information that might otherwise be unattainable. The focus of analysis is on a special hard-to-reach group of marginalized youths who appear to have isolated themselves from society.

Design/methodology/approach

The authors answer the question: How can the experiences of socially withdrawn youth as shared on social media be used to enrich the knowledge base relating to the initiation phase of co-creation of public services? The data retrieved from the Finnish discussion forum are analyzed using the combination of unsupervised machine learning and discourse analysis.

Findings

The paper contributes by outlining a method that can be applied to identify expertise-by-experience from digital stories shared by marginalized youths. To overcome the challenges of making socially withdrawn youths real contributors to the co-creation of public services, this paper suggests several theoretical and managerial implications.

Originality/value

Co-creation assumes an interactive and dynamic relationship where value is created at the nexus of interaction. However, the evidence base for successful co-creation, particularly with digital technology, is limited. This paper fills the gap by providing findings from a case study that investigated how social media discussions can be a stimulus to enrich the knowledge base of the co-creation of public services.

Details

International Journal of Public Sector Management, vol. 34 no. 7

Type: Research Article

ISSN: 0951-3558

Open Access

Article

Publication date: 14 July 2022

Predicting sentiment and rating of tourist reviews using machine learning

Karlo Puh and Marina Bagić Babac

Downloads

5871

Abstract

Purpose

As the tourism industry becomes more vital for the success of many economies around the world, the importance of technology in tourism grows daily. Alongside the increasing importance and popularity of tourism, the amount of significant data grows too. On a daily basis, millions of people write their opinions, suggestions and views about accommodation, services, and much more on various websites. Well-processed and filtered data can provide a lot of useful information that can be used to improve tourists' experiences and to help with decisions such as selecting a hotel or a restaurant. Thus, the purpose of this study is to explore machine learning and deep learning models for predicting sentiment and rating from tourist reviews.

Design/methodology/approach

This paper used machine learning models such as Naïve Bayes, support vector machines (SVM), convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) for extracting sentiment and ratings from tourist reviews. These models were trained to classify reviews into positive, negative, or neutral sentiment, and into one to five grades or stars. Data used for training the models were gathered from TripAdvisor, the world's largest travel platform. The models based on multinomial Naïve Bayes (MNB) and SVM were trained using the term frequency-inverse document frequency (TF-IDF) for word representations while deep learning models were trained using global vectors (GloVe) for word representation. The results from testing these models are presented, compared and discussed.
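The TF-IDF word representation mentioned above can be illustrated with a minimal sketch. It assumes whitespace tokenisation, relative term frequency and an unsmoothed idf of ln(N / df); the function name `tf_idf` and the toy reviews are hypothetical and are not taken from the paper or its TripAdvisor dataset.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes relative term frequency and idf = ln(N / df). Real toolkits
    usually add smoothing and normalisation, which are omitted here.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights

# Hypothetical toy reviews, for illustration only.
reviews = [
    "great hotel great staff",
    "terrible hotel dirty room",
    "average room friendly staff",
]
weights = tf_idf(reviews)
```

In this toy corpus, a term that is frequent in one review but absent from the others ("great") receives a higher weight than a term shared across reviews ("hotel"), which is the property a classifier such as MNB or SVM would exploit.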

Findings

The machine learning and deep learning models achieved high accuracy in predicting positive, negative, or neutral sentiments and ratings from tourist reviews. The optimal model architecture for both classification tasks was a deep learning model based on BiLSTM. The study’s results confirmed that deep learning models are more efficient and accurate than machine learning algorithms.

Practical implications

The proposed models allow for forecasting the number of tourist arrivals and expenditure, gaining insights into the tourists' profiles, improving overall customer experience, and upgrading marketing strategies. Different service sectors can use the implemented models to get insights into customer satisfaction with the products and services as well as to predict the opinions given a particular context.

Originality/value

This study developed and compared different machine learning models for classifying customer reviews as positive, negative, or neutral, as well as predicting ratings with one to five stars based on a TripAdvisor hotel reviews dataset that contains 20,491 unique hotel reviews.

Details

Journal of Hospitality and Tourism Insights, vol. 6 no. 3

Type: Research Article

ISSN: 2514-9792

Open Access

Article

Publication date: 12 January 2024

Better abstract or concrete, narrating or not: optimal strategies for the communication of innovation

Ernesto Cardamone, Gaetano Miceli and Maria Antonietta Raimondo

Downloads

572

Abstract

Purpose

This paper investigates how two characteristics of language, abstractness vs concreteness and narrativity, influence user engagement in communication exercises on innovation targeted to the general audience. The proposed conceptual model suggests that innovation fits well with more abstract language because of the association of innovation with imagination and distal construal. Moreover, communication of innovation may benefit from greater adherence to the narrativity arc, that is, early staging, increasing plot progression and climax optimal point. These effects are moderated by content variety and emotional tone, respectively.

Design/methodology/approach

Based on a latent Dirichlet allocation (LDA) application on a sample of 3,225 TED Talks transcripts, the authors identified 287 TED Talks on innovation and then applied econometric analyses to test the hypotheses on the effects of abstractness vs concreteness and narrativity on engagement, and on the moderation effects of content variety and emotional tone.

Findings

The authors found that abstractness (vs concreteness) and narrativity have positive effects on engagement. These two effects are stronger with higher content variety and more positive emotional tone, respectively.

Research limitations/implications

This paper extends the literature on communication of innovation, linguistics and text analysis by evaluating the roles of abstractness vs concreteness and narrativity in shaping appreciation of innovation.

Originality/value

This paper reports conceptual and empirical analyses on innovation dissemination through a popular medium – TED Talks – and applies modern text analysis algorithms to test hypotheses on the effects of two pivotal dimensions of language on user engagement.

Details

European Journal of Innovation Management, vol. 27 no. 9

Type: Research Article

ISSN: 1460-1060

Open Access

Article

Publication date: 22 February 2022

Employing machine learning for capturing COVID-19 consumer sentiments from six countries: a methodological illustration

Bodo B. Schlegelmilch, Kirti Sharma and Sambbhav Garg

Downloads

2557

Abstract

Purpose

This paper aims to illustrate the scope and challenges of using computer-aided content analysis in international marketing with the aim to capture consumer sentiments about COVID-19 from multi-lingual tweets.

Design/methodology/approach

The study is based on some 35 million original COVID-19-related tweets. The study methodology illustrates the use of supervised machine learning and artificial neural network techniques to conduct extensive information extraction.

Findings

The authors identified more than two million tweets from six countries and categorized them into PESTEL (i.e. Political, Economic, Social, Technological, Environmental and Legal) dimensions. The extracted consumer sentiments and associated emotions show substantial differences across countries. Our analyses highlight opportunities and challenges inherent in using multi-lingual online sentiment analysis in international marketing. Based on these insights, several future research directions are proposed.

Originality/value

First, the authors contribute to methodology development in international marketing by providing a “use-case” for computer-aided text mining in a multi-lingual context. Second, the authors add to the knowledge on differences in COVID-19-related consumer sentiments in different countries. Third, the authors provide avenues for future research on the analysis of unstructured multi-media posts.

Details

International Marketing Review, vol. 40 no. 5

Type: Research Article

ISSN: 0265-1335

Open Access

Article

Publication date: 10 April 2023

Twitter and the circular economy: examining the public discourse

Loretta Mastroeni, Maurizio Naldi and Pierluigi Vellucci

Downloads

1130

Abstract

Purpose

Though the circular economy (CE) is a current buzzword, it still lacks a precise definition. In the absence of a clear notion of what that term includes, actions taken by the government and companies may not be well informed. In particular, those actions need to consider what people mean when they talk about the CE, either to refocus people's decisions or to undertake a more effective communications strategy.

Design/methodology/approach

Since people nowadays voice their opinions mainly through social media, special attention has to be paid to discussions on those media. In this paper, the authors focus on Twitter as a popular social platform for delivering one's thoughts quickly. The authors' research aim is to capture people's perceptions of the CE. After collecting more than 100,000 tweets over 16 weeks, the authors analyse those tweets to understand the public discussion about the CE. The authors conduct a frequency analysis of the most recurring words, including the words' association with other words in the same context, and categorise them into a set of topics.
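A frequency analysis with word associations of the kind described above might look roughly like the following sketch. It assumes that "association in the same context" is approximated by co-occurrence within a single tweet; the stop-word list, the helper names and the example tweets are all hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "and", "of", "to", "is", "in", "for"}

def tokenize(text):
    """Lower-case, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,!?#@:;\"'") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]

def frequency_and_cooccurrence(tweets):
    """Count word frequencies and within-tweet word-pair co-occurrences."""
    freq, pairs = Counter(), Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        freq.update(tokens)
        # Each unordered pair of distinct words counts once per tweet.
        pairs.update(combinations(sorted(set(tokens)), 2))
    return freq, pairs

# Hypothetical example tweets.
tweets = [
    "Recycling plastic is key to the circular economy",
    "Plastic waste and carbon emissions threaten sustainability",
    "Reuse and recycling of plastic for a circular economy",
]
freq, pairs = frequency_and_cooccurrence(tweets)
print(freq.most_common(3))
print(pairs[("plastic", "recycling")])
```

Pair keys are stored in alphabetical order, so a lookup such as `pairs[("plastic", "recycling")]` must use the same ordering.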

Findings

The authors show that the discussion focuses on the usage of resources and materials that heavily endanger sustainability, i.e. carbon and plastic and the harmful habit of wasting. On the other hand, the two most common good practices associated with the CE and sustainability emerge as recycling and reuse (the latter being mentioned far less). Also, the business side of the CE appears to be relevant.

Research limitations/implications

The outcome of this analysis can drive suitable communication strategies by which companies and governments interested in the development of the CE can understand what is actually discussed on social media and call for attention.

Originality/value

This paper addresses the lack of a standard definition the authors highlighted in the Introduction. The results confirm that people understand CE by looking both at CE's constituent activities and CE's expected consequences, namely the reduction of waste, the transition to a green economy free of plastic and other pollutants and the improvement of the world climate.

Details

Management Decision, vol. 61 no. 13

Type: Research Article

ISSN: 0025-1747

Open Access

Article

Publication date: 29 June 2022

The construction of an accurate Arabic sentiment analysis system based on resources alteration and approaches comparison

Ibtissam Touahri

Downloads

679

Abstract

Purpose

This paper proposes a multi-facet sentiment analysis system.

Design/methodology/approach

This paper uses multidomain resources to build a sentiment analysis system. The manual lexicon-based features that are extracted from the resources are fed into a machine learning classifier so that their performance can be compared. The manual lexicon is then replaced with a custom bag of words (BOW) to avoid its time-consuming construction. To help the system run faster and to make the model interpretable, different existing and custom approaches are employed, such as term occurrence, information gain, principal component analysis, semantic clustering, and POS tagging filters.

Findings

The proposed system, which features automated lexicon extraction and feature-set size optimization, proved its efficiency when applied to multidomain and benchmark datasets, reaching 93.59% accuracy, which makes it competitive with state-of-the-art systems.

Originality/value

The construction of a custom BOW. Optimizing features based on existing and custom feature selection and clustering approaches.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 24 August 2020

Environmental disclosure and sentiment analysis: state of the art and opportunities for public-sector organisations

Laura Rocca, Davide Giacomini and Paola Zola

Downloads

2157

Abstract

Purpose

Because of the expansion of the internet and the Web 2.0 phenomenon, new challenges are emerging in the disclosure practices adopted by organisations in the public sector. This study aims to examine local governments’ (LGOs) use of social media (SM) in disclosing environmental actions/plans/information as a new way to improve accountability to citizens to obtain organisational legitimacy and the related sentiment of citizens’ judgements.

Design/methodology/approach

This paper analyses the content of 39 Italian LGOs’ public pages on Facebook. After the distinction between five classes of environmental issues (air, water, energy, waste and territory), an initial study is performed to detect possible sub-topics applying latent Dirichlet allocation. Having a list of posts related to specific environmental themes, the researchers computed the sentiment of citizens’ comments. To measure sentiment, two different approaches were implemented: one based on a lexicon dictionary and the other based on convolutional neural networks.
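The first of the two sentiment approaches, scoring against a lexicon dictionary, can be sketched as follows. This is only an illustration under stated assumptions: the tiny English lexicon, the negation list and the function names are hypothetical, whereas the study worked on comments from Italian Facebook pages with a lexicon that is not reproduced here.

```python
# Hypothetical mini-lexicon; real lexicons hold thousands of scored entries.
LEXICON = {"good": 1, "clean": 1, "thanks": 1, "bad": -1, "dirty": -1, "shame": -1}
NEGATIONS = {"not", "never", "no"}

def lexicon_sentiment(comment):
    """Sum lexicon scores over tokens, flipping the sign after a negation."""
    score, negate = 0, False
    for token in comment.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(word, 0)
        score += -value if negate else value
        negate = False  # a negation only affects the word that follows it
    return score

def label(score):
    """Map a numeric score to a sentiment class."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(label(lexicon_sentiment("Thanks, the park is clean!")))
print(label(lexicon_sentiment("The water is not clean.")))
```

A dictionary approach like this is transparent and cheap, but it misses context beyond the simple negation rule, which is one reason a learned model such as a convolutional neural network may be used alongside it.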

Findings

Facebook is used by LGOs to disclose environmental issues, focussing on their main interest in obtaining organisational legitimacy, and the analysis shows an increasing impact of Web 2.0 in the direct interaction of LGOs with citizens. On the other hand, there is a clear divergence of interest on environmental topics between LGOs and citizens in a dialogic accountability framework.

Practical implications

Sentiment analysis (SA) could be used by politicians, but also by managers/entrepreneurs in the business sector, to analyse stakeholders’ judgements of their communications/actions and plans on corporate social responsibility. This tool gives a result on time (i.e. not months or years after, as for the reporting system). It is cheaper than a survey and allows a first “photograph” of stakeholders’ sentiment. It can also be a useful tool for supporting, developing and improving environmental reporting.

Originality/value

To the best of the authors’ knowledge, this paper is one of the first to apply SA to environmental disclosure via SM in the public sphere. The study links modern techniques in natural language processing and machine learning with the important aspects of environmental communication between LGOs and citizens.

Details

Meditari Accountancy Research, vol. 29 no. 3

Type: Research Article

ISSN: 2049-372X

Open Access

Article

Publication date: 6 March 2017

Application of keyword extraction on MOOC resources

Zhuoxuan Jiang, Chunyan Miao and Xiaoming Li

Downloads

2116

Abstract

Purpose

Recent years have witnessed the rapid development of massive open online courses (MOOCs). With more and more courses being produced by instructors and attended by learners all over the world, educational resources are being aggregated on an unprecedented scale. The educational resources include videos, subtitles, lecture notes, quizzes, etc., on the teaching side, and forum contents, Wiki, log of learning behavior, log of homework, etc., on the learning side. However, the data are both unstructured and diverse. To facilitate knowledge management and mining on MOOCs, extracting keywords from the resources is important. This paper aims to adapt the state-of-the-art techniques to MOOC settings and evaluate their effectiveness on real data. In terms of practice, this paper also tries to answer, for the first time, two questions: to what extent can MOOC resources support keyword extraction models, and how much human effort is required to make the models work well?

Design/methodology/approach

Based on which side generates the data, i.e. instructors or learners, the data are classified into teaching resources and learning resources, respectively. The approach used on teaching resources is based on machine learning models with labels, while the approach used on learning resources is based on a graph model without labels.

Findings

From the teaching resources, the methods used by the authors can accurately extract keywords with only 10 per cent labeled data. The authors find that resources of different forms, e.g. subtitles and PPTs, should be considered separately because they have different modelling capabilities. From the learning resources, the keywords extracted from MOOC forums are not as domain-specific as those extracted from teaching resources, but they can reflect the topics that are actively discussed in forums, which instructors can use as feedback. The authors implement two applications with the extracted keywords: generating a concept map and generating a learning path. The visual demos show they have the potential to improve learning efficiency when they are integrated into a real MOOC platform.

Research limitations/implications

Conducting keyword extraction on MOOC resources is quite difficult because teaching resources are hard to obtain owing to copyright. Also, obtaining labeled data is difficult because expertise in the corresponding domain is usually required.

Practical implications

The experimental results indicate that MOOC resources are good enough for building models of keyword extraction, and that an acceptable balance between human effort and model accuracy can be achieved.

Originality/value

This paper presents a pioneering study on keyword extraction on MOOC resources and obtains some new findings.

Details

International Journal of Crowd Science, vol. 1 no. 1

Type: Research Article

ISSN: 2398-7294

Open Access

Article

Publication date: 19 July 2022

Transforming unstructured digital clinical notes for improved health literacy

Shreyesh Doppalapudi, Tingyan Wang and Robin Qiu

Downloads

1050

Abstract

Purpose

Clinical notes typically contain medical jargon and specialized words and phrases that are complicated and technical to most people, which is one of the most challenging obstacles in health information dissemination to consumers by healthcare providers. The authors aim to investigate how to leverage machine learning techniques to transform clinical notes of interest into understandable expressions.

Design/methodology/approach

The authors propose a natural language processing pipeline that is capable of extracting relevant information from long unstructured clinical notes and simplifying lexicons by replacing medical jargon and technical terms. In particular, the authors develop an unsupervised keyword-matching method to extract relevant information from clinical notes. To automatically evaluate the completeness of the extracted information, the authors perform a multi-label classification task on the relevant texts. To simplify lexicons in the relevant text, the authors identify complex words using a sequence labeler and leverage transformer models to generate candidate words for substitution. The authors validate the proposed pipeline using 58,167 discharge summaries from critical care services.
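The extraction step of such a pipeline can be pictured with a very small keyword-matching sketch. It assumes that matching means keeping every sentence that contains at least one keyword as a substring; the keyword list, the function name `extract_relevant` and the example note are hypothetical, and the authors' actual method may differ considerably.

```python
import re

# Hypothetical keyword list; the paper's actual keywords are not given here.
KEYWORDS = {"diagnosis", "medication", "follow-up"}

def extract_relevant(note, keywords=KEYWORDS):
    """Return the sentences of a note that mention at least one keyword."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", note.strip())
    return [s for s in sentences if any(k in s.lower() for k in keywords)]

# Hypothetical example note.
note = (
    "Patient admitted with chest pain. Diagnosis: unstable angina. "
    "Family visited daily. Medication on discharge includes aspirin."
)
for sentence in extract_relevant(note):
    print(sentence)
```

Because no labels are needed, a filter like this can run over large collections of notes before the more expensive classification and simplification stages.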

Findings

The results show that the proposed pipeline can identify relevant information with high completeness and simplify complex expressions in clinical notes so that the converted notes have a high level of readability but a low degree of meaning change.

Social implications

The proposed pipeline can help healthcare consumers better understand their medical information and therefore strengthen communications between healthcare providers and consumers for better care.

Originality/value

An innovative pipeline approach is developed to address the health literacy problem confronted by healthcare providers and consumers in the ongoing digital transformation process in the healthcare industry.

Details

Digital Transformation and Society, vol. 1 no. 1

Type: Research Article

ISSN: 2755-0761