Search results

1 – 10 of over 3000
Article
Publication date: 23 November 2018

Chih-Ming Chen, Yung-Ting Chen and Chen-Yu Liu

An automatic text annotation system (ATAS) that can collect resources from different databases through Linked Data (LD) for automatically annotating ancient texts was developed in…


Abstract

Purpose

An automatic text annotation system (ATAS) that can collect resources from different databases through Linked Data (LD) for automatically annotating ancient texts was developed in this study to support digital humanities research. It allows humanists to refer to resources from diverse databases when interpreting ancient texts and provides a friendly text annotation reader for interpreting ancient texts through reading. The paper aims to discuss whether the ATAS effectively supports digital humanities research.

Design/methodology/approach

Based on a quasi-experimental design, the ATAS developed in this study was compared with the MARKUS semi-automatic text annotation system to determine whether significant differences existed in reading effectiveness and technology acceptance when supporting humanists in interpreting ancient texts from the Ming dynasty’s collections. Additionally, lag sequential analysis was used to analyze users’ operation behaviors on the ATAS, and semi-structured in-depth interviews were conducted to understand users’ opinions and perceptions of using the ATAS to interpret ancient texts through reading.

Findings

The experimental results reveal that the ATAS achieved higher reading effectiveness than MARKUS semi-ATAS, although the difference did not reach statistical significance. The technology acceptance of the ATAS was significantly higher than that of MARKUS semi-ATAS. In particular, a function comparison of the two systems shows that the ATAS offers greater perceived ease of use for term search, connection to source websites and adding annotations. Furthermore, the reading interface of the ATAS is simple and understandable, making it more suitable for reading than MARKUS semi-ATAS. Among all the LD sources considered, Moedict, an online Chinese dictionary, was confirmed as the most helpful.

Research limitations/implications

This study adopted the Jieba Chinese parser to perform word segmentation based on a parser lexicon for the Chinese ancient texts of the Ming dynasty’s collections. The accuracy of a lexicon-based Chinese parser is limited because it ignores the grammar and semantics of ancient texts. Moreover, the original parser lexicon used by Jieba contains only modern words, which further reduces segmentation accuracy for Chinese ancient texts. These two limitations significantly affect the effectiveness of using the ATAS to support digital humanities research. This study therefore proposed a practicable scheme that adds new terms to the parser lexicon, based on humanists’ own judgment, to improve the segmentation accuracy of the Jieba parser.
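The lexicon-based segmentation behavior this limitation describes can be sketched in a few lines. The following is a hedged illustration of forward maximum matching, not Jieba’s actual algorithm (which also builds a DAG and uses word frequencies); the mini-lexicon and the title used as input are example values:

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching against a lexicon."""
    tokens, i = [], 0
    while i < len(text):
        # try the longest dictionary match first; fall back to one character
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in lexicon:
                tokens.append(text[i:i + n])
                i += n
                break
    return tokens

lexicon = {"大明", "一統"}              # stand-in for a modern-word lexicon
print(segment("大明一統志", lexicon))   # ['大明', '一統', '志']
lexicon.add("一統志")                   # a humanist adds an ancient-text term
print(segment("大明一統志", lexicon))   # ['大明', '一統志']
```

Adding the ancient term changes the split, which is exactly the effect of the proposed lexicon-extension scheme.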

Practical implications

Although some digital humanities platforms have been successfully developed to support humanists’ research, most still do not provide a friendly digital reading environment for interpreting texts. For this reason, this study developed an ATAS that can automatically retrieve LD sources from different databases on the Internet to supply rich annotation information on reading texts, helping humanists interpret them. This study brings digital humanities research to new ground.

Originality/value

This study proposed a novel ATAS that automatically annotates an ancient text with useful information based on LD sources from different databases, increasing the text’s readability and helping humanists gain a deeper and broader understanding of it. Currently, no tool of this kind has been developed to support digital humanities research.

Article
Publication date: 2 May 2023

Giovanna Aracri, Antonietta Folino and Stefano Silvestri

The purpose of this paper is to propose a methodology for the enrichment and tailoring of a knowledge organization system (KOS), in order to support the information extraction…

Abstract

Purpose

The purpose of this paper is to propose a methodology for the enrichment and tailoring of a knowledge organization system (KOS), in order to support the information extraction (IE) task for the analysis of documents in the tourism domain. In particular, the KOS is used to develop a named entity recognition (NER) system.

Design/methodology/approach

A method to improve and customize an available thesaurus by leveraging documents related to tourism in Italy is first presented. The obtained thesaurus is then used to create an annotated NER corpus, exploiting distant supervision, deep learning and light human supervision.
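The distant-supervision step can be pictured as projecting thesaurus entries onto raw text to yield entity spans; the thesaurus terms, labels and sentence below are invented for this sketch and are not from the paper:

```python
import re

# Hypothetical thesaurus fragment: surface form -> entity label
thesaurus = {"agriturismo": "ACCOMMODATION", "Amalfi Coast": "DESTINATION"}

def annotate(text, thesaurus):
    """Mark every thesaurus term occurring in the text as an entity span."""
    spans = []
    for term, label in thesaurus.items():
        for m in re.finditer(re.escape(term), text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)

text = "Book an agriturismo near the Amalfi Coast."
print(annotate(text, thesaurus))
# [(8, 19, 'ACCOMMODATION'), (29, 41, 'DESTINATION')]
```

Spans produced this way would then be checked with light human supervision before training the NER model, which is how the methodology keeps manual effort to a fraction of full annotation.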

Findings

The study shows that a customized KOS can effectively support IE tasks when applied to documents belonging to the same domains and types used for its construction. Moreover, the proposed methodology greatly supports and eases the annotation task, making it possible to annotate a corpus with a fraction of the effort required for fully manual annotation.

Originality/value

The paper explores an alternative use of a KOS, proposing an innovative NER corpus annotation methodology. Moreover, the KOS and the annotated NER data set will be made publicly available.

Details

Journal of Documentation, vol. 79 no. 6
Type: Research Article
ISSN: 0022-0418


Article
Publication date: 15 March 2024

Florian Rupp, Benjamin Schnabel and Kai Eckert

The purpose of this work is to explore the new possibilities enabled by the recent introduction of RDF-star, an extension that allows for statements about statements within the…

Abstract

Purpose

The purpose of this work is to explore the new possibilities enabled by the recent introduction of RDF-star, an extension that allows for statements about statements within the Resource Description Framework (RDF). Alongside Named Graphs, this approach offers opportunities to leverage a meta-level for data modeling and data applications.

Design/methodology/approach

In this extended paper, the authors build on three modeling use cases published in a previous paper: (1) providing provenance information, (2) maintaining backwards compatibility for existing models and (3) reducing the complexity of a data model. The authors present two scenarios in which they use the meta-level to extend a data model with meta-information.

Findings

The authors present three abstract patterns for actively using the meta-level in data modeling and showcase its implementation through two scenarios from their research project: (1) a workflow for triple annotation that uses the meta-level to let users comment on individual statements, for example to report errors or add supplementary information; and (2) a demonstration of how adding meta-information to a data model can accommodate highly specialized data while maintaining the simplicity of the underlying model.
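In RDF-star, a quoted triple `<< :s :p :o >>` can itself be the subject of further statements. A minimal, framework-free sketch of this meta-level (all IRIs and values invented, not the authors’ data) models statements about statements as a mapping keyed by triples:

```python
# Base statement, e.g.  ex:painting1 ex:creator ex:artistA .
triple = ("ex:painting1", "ex:creator", "ex:artistA")

# Meta-level annotations on that statement; in Turtle-star syntax this is:
#   << ex:painting1 ex:creator ex:artistA >>
#       prov:wasDerivedFrom ex:catalog2020 ;
#       ex:userComment "attribution disputed" .
meta = {
    triple: {
        "prov:wasDerivedFrom": "ex:catalog2020",
        "ex:userComment": "attribution disputed",
    }
}

print(meta[triple]["ex:userComment"])  # attribution disputed
```

This is the shape of the triple-annotation workflow: the base graph stays untouched while comments and provenance attach to individual statements.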

Practical implications

Through the formulation of data modeling patterns with RDF-star and the demonstration of their application in two scenarios, the authors advocate for data modelers to embrace the meta-level.

Originality/value

With RDF-star being a very new extension to RDF, to the best of the authors’ knowledge, they are among the first to relate it to other meta-level approaches and demonstrate its application in real-world scenarios.

Details

The Electronic Library, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 15 June 2020

Abdelhak Belhi, Abdelaziz Bouras, Abdulaziz Khalid Al-Ali and Sebti Foufou

Digital tools have been used to document cultural heritage with high-quality imaging and metadata. However, some of the historical assets are totally or partially unlabeled and…


Abstract

Purpose

Digital tools have been used to document cultural heritage with high-quality imaging and metadata. However, some of the historical assets are totally or partially unlabeled and some are physically damaged, which decreases their attractiveness and induces loss of value. This paper introduces a new framework that aims at tackling the cultural data enrichment challenge using machine learning.

Design/methodology/approach

This framework focuses on the automatic annotation and metadata completion through new deep learning classification and annotation methods. It also addresses issues related to physically damaged heritage objects through a new image reconstruction approach based on supervised and unsupervised learning.

Findings

The authors evaluated the approaches on a data set of cultural objects collected from various cultural institutions around the world. For the annotation and classification part of this study, they proposed and implemented a hierarchical multimodal classifier that improves annotation quality and increases model accuracy thanks to the introduction of multitask multimodal learning. For cultural data visual reconstruction, the proposed clustering-based method, which combines supervised and unsupervised learning, was found to yield better-quality completion than existing inpainting frameworks.
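Two ingredients of such a classifier can be sketched without any deep learning library: late fusion by concatenating per-modality feature vectors, and hierarchical prediction that picks a coarse category first and then a fine label only among that category’s children. All labels, scores and vectors below are toy values, not the authors’ model:

```python
image_feats = [0.9, 0.1]            # stand-in visual embedding
text_feats = [0.2, 0.8, 0.5]        # stand-in metadata embedding
fused = image_feats + text_feats    # concatenation is the simplest fusion

# Coarse categories and their fine-grained children
hierarchy = {"painting": ["oil", "watercolor"], "sculpture": ["bronze"]}

def classify(coarse_scores, fine_scores):
    coarse = max(coarse_scores, key=coarse_scores.get)
    # restrict the fine prediction to children of the chosen coarse class
    fine = max(hierarchy[coarse], key=lambda c: fine_scores[c])
    return coarse, fine

coarse_scores = {"painting": 0.7, "sculpture": 0.3}  # pretend model outputs
fine_scores = {"oil": 0.6, "watercolor": 0.4, "bronze": 0.9}
print(classify(coarse_scores, fine_scores))  # ('painting', 'oil')
```

Note that "bronze" scores highest overall but is excluded because the coarse prediction is "painting"; constraining fine labels by the hierarchy is what makes the prediction consistent.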

Originality/value

This research work is original in the sense that it proposes new approaches for cultural data enrichment; to the authors’ knowledge, none of the existing enrichment approaches provides an integrated machine learning framework that addresses current challenges in cultural heritage. These challenges, as identified by the authors, relate to metadata annotation and visual reconstruction.

Details

Journal of Enterprise Information Management, vol. 36 no. 3
Type: Research Article
ISSN: 1741-0398


Article
Publication date: 14 June 2013

Bojan Božić and Werner Winiwarter

The purpose of this paper is to present a showcase of semantic time series processing which demonstrates how this technology can improve time series processing and community…

Abstract

Purpose

The purpose of this paper is to present a showcase of semantic time series processing which demonstrates how this technology can improve time series processing and community building by the use of a dedicated language.

Design/methodology/approach

The authors have developed a new semantic time series processing language and prepared showcases to demonstrate its functionality. The assumption is an environmental setting with data measurements from different sensors to be distributed to different groups of interest. The data are represented as time series for water and air quality, while the user groups are, among others, the environmental agency, companies from the industrial sector and legal authorities.

Findings

A language for time series processing and several tools to enrich time series with metadata and support community building have been implemented in Python and Java. A GUI for demonstration purposes has also been developed in PyQt4. In addition, an ontology for validation has been designed, and a knowledge base for data storage and inference has been set up. Some important features are dynamic integration of ontologies, time series annotation and semantic filtering.

Research limitations/implications

This paper focuses on the showcases of time series semantic language (TSSL), but also covers technical aspects and user interface issues. The authors are planning to develop TSSL further and evaluate it within further research projects and validation scenarios.

Practical implications

The research has a high practical impact on time series processing and provides new data sources for semantic web applications. It can also be used in social web platforms (especially for researchers) to provide a time series centric tagging and processing framework.

Originality/value

The paper presents an extended version of the paper presented at iiWAS2012.

Details

International Journal of Web Information Systems, vol. 9 no. 2
Type: Research Article
ISSN: 1744-0084


Article
Publication date: 19 July 2013

Leonardo Lezcano, Salvador Sánchez‐Alonso and Antonio J. Roa‐Valverde

The purpose of this paper is to provide a literature review of the principal formats and frameworks that have been used in the last 20 years to exchange linguistic resources. It…

Abstract

Purpose

The purpose of this paper is to provide a literature review of the principal formats and frameworks that have been used in the last 20 years to exchange linguistic resources. It aims to give special attention to the most recent approaches to publishing linguistic linked open data on the Web.

Design/methodology/approach

Research papers published since 1990 on the use of various formats, standards, frameworks and methods to exchange linguistic information were divided into two main categories: those proposing specific schemas and syntaxes to suit the requirements of a given type of linguistic data (these are referred to as offline approaches), and those adopting the linked data (LD) initiative and the semantic web technologies to support the interoperability of heterogeneous linguistic resources. For each paper, the type of linguistic resource exchanged, the framework/format used, the interoperability approach taken and the related projects were identified.

Findings

The information gathered in the survey reflects an increase in recent years in approaches adopting the LD initiative. This is due to the fact that the structural and syntactic issues which arise when addressing the interoperability of linguistic resources can be solved by applying semantic web technologies. What remains an open issue in the field of computational linguistics is the development of knowledge artefacts and mechanisms to support the alignment of the different aspects of linguistic resources in order to guarantee semantic and conceptual interoperability in the linked open data (LOD) cloud. Ontologies have proved to be of great use in achieving this goal.

Research limitations/implications

The research presented here is by no means a comprehensive or all‐inclusive survey of all existing approaches to the exchange of linguistic resources. Rather, the aim was to highlight, analyze and categorize the most significant advances in the field.

Practical implications

This survey has practical implications for computational linguists and for every application requiring new developments in natural language processing. In addition, multilingual issues can be better addressed when semantic interoperability of heterogeneous linguistic resources is achieved.

Originality/value

The paper provides a survey of past and present research and developments addressing the interoperability of linguistic resources, including those where the linked data initiative has been adopted.

Article
Publication date: 28 March 2023

Jun Liu, Sike Hu, Fuad Mehraliyev and Haolong Liu

This study aims to investigate the current state of research using deep learning methods for text classification in the tourism and hospitality field and to propose specific…

Abstract

Purpose

This study aims to investigate the current state of research using deep learning methods for text classification in the tourism and hospitality field and to propose specific guidelines for future research.

Design/methodology/approach

This study undertakes a qualitative and critical review of studies that use deep learning methods for text classification in research fields of tourism and hospitality and computer science. The data was collected from the Web of Science database and included studies published until February 2022.

Findings

Findings show that current research has mainly focused on text feature classification, text rating classification and text sentiment classification. Most of the deep learning methods used are relatively old, proposed in the 20th century, including feed-forward neural networks and artificial neural networks, among others. Deep learning algorithms proposed in recent years in the field of computer science with better classification performance have not been introduced to tourism and hospitality for large-scale dissemination and use. In addition, most of the data the studies used were from publicly available rating data sets; only two studies manually annotated data collected from online tourism websites.

Practical implications

The applications of deep learning algorithms and data in the tourism and hospitality field are discussed, laying the foundation for future text mining research. The findings also hold implications for managers regarding the use of deep learning in tourism and hospitality. Researchers and practitioners can use methodological frameworks and recommendations proposed in this study to perform more effective classifications such as for quality assessment or service feature extraction purposes.

Originality/value

The paper provides an integrative review of research in text classification using deep learning methods in the tourism and hospitality field, points out newer deep learning methods that are suitable for classification and identifies how to develop different annotated data sets applicable to the field. Furthermore, foundations and directions for future text classification research are set.

Details

International Journal of Contemporary Hospitality Management, vol. 35 no. 12
Type: Research Article
ISSN: 0959-6119


Article
Publication date: 24 June 2020

Xinran Zhu, Bodong Chen, Rukmini Manasa Avadhanam, Hong Shui and Raymond Zhuo Zhang

The COVID-19 pandemic has forced many instructors to rapidly shift to online/distance teaching. With a narrow preparation window, many instructors are at a loss for strategies that…


Abstract

Purpose

The COVID-19 pandemic has forced many instructors to rapidly shift to online/distance teaching. With a narrow preparation window, many instructors are at a loss for strategies that are both effective in responding to the crisis and compatible with their professional practices. One urgent need in classrooms at all levels is to support social reading of course materials. To fulfill this need, this paper presents a systematic literature review on using Web annotation in K-12 and higher education to provide practical and evidence-based recommendations for educators to incorporate social annotation in online teaching.

Design/methodology/approach

This paper presents a systematic literature review of the use of Web annotation in formal education. The authors reviewed 39 articles that met the inclusion criteria and extracted the following information from each article: level of education, subject area, learning theory, learning activity design, Web annotation technology, research methods and learning outcomes. Studies were further analyzed and synthesized by the genre of learning activity design.

Findings

The authors identified five types of social annotation activity design: processing domain-specific knowledge, supporting argumentation and inquiry, improving literacy skills, supporting instructor and peer assessment and connecting online learning spaces. In addition, the authors developed practical recommendations on setting pedagogical goals, selecting annotation tools, deciding instructor involvement and developing evaluation strategies.

Originality/value

This study provides a timely response to online/distance teaching under the COVID-19 pandemic. It is hoped that the identified application areas, combined with the four practical recommendations, will provide pragmatic and evidence-based support for educators to engage learners in reading, learning and connecting.

Details

Information and Learning Sciences, vol. 121 no. 5/6
Type: Research Article
ISSN: 2398-5348


Article
Publication date: 12 January 2015

Hong Huang

The purpose of this paper is to understand genomics scientists’ perceptions of data quality assurance based on their domain knowledge.

Abstract

Purpose

The purpose of this paper is to understand genomics scientists’ perceptions of data quality assurance based on their domain knowledge.

Design/methodology/approach

The study used a survey method to collect responses from 149 genomics scientists grouped by domain knowledge. They ranked the top-five quality criteria based on hypothetical curation scenarios, and the results were compared using a χ2 test.
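The χ2 comparison works by contrasting observed ranking counts with the counts expected if the groups ranked criteria identically. A minimal hand computation on an invented 2×2 table (the study’s actual tables and counts are larger and not reproduced here):

```python
# Rows: two knowledge-domain groups; columns: criterion ranked top-5 or not.
observed = [[30, 10],
            [18, 22]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# chi2 = sum over cells of (observed - expected)^2 / expected,
# where expected = row_total * col_total / n
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)
print(round(chi2, 3))  # 7.5
```

A large statistic relative to the χ2 distribution’s critical value indicates the groups rank the criterion differently, which is how the lack of consensus across domains would be detected.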

Findings

Scientists with domain knowledge of biology, bioinformatics and computational science did not reach a consensus in ranking data quality criteria. Findings showed that biologists cared more about curated data being concise and traceable, and they were concerned about skills for dealing with information overload. Computational scientists, on the other hand, valued making curation understandable and paid more attention to specific skills for data wrangling.

Originality/value

This study takes a new approach in comparing the data quality perceptions of scientists across different domains of knowledge. Few studies have synthesized models to interpret data quality perception across domains. The findings may help develop data quality assurance policies and training seminars, and maximize the efficiency of genome data management.

Details

Journal of Documentation, vol. 71 no. 1
Type: Research Article
ISSN: 0022-0418


Article
Publication date: 12 April 2024

Youwei Li and Jian Qu

The purpose of this research is to achieve multi-task autonomous driving by adjusting the network architecture of the model. Meanwhile, after achieving multi-task autonomous…

Abstract

Purpose

The purpose of this research is to achieve multi-task autonomous driving by adjusting the network architecture of the model. Meanwhile, after achieving multi-task autonomous driving, the authors found that the trained neural network model performs poorly in untrained scenarios. Therefore, the authors proposed to improve the transfer efficiency of the model for new scenarios through transfer learning.

Design/methodology/approach

First, the authors achieved multi-task autonomous driving by training a model that combines a convolutional neural network with differently structured long short-term memory (LSTM) layers. Second, they achieved fast transfer of neural network models to new scenarios through cross-model transfer learning. Finally, they combined data collection and data labeling to improve the efficiency of deep learning, and verified the model’s robustness through light and shadow tests.

Findings

This research achieved road tracking, real-time acceleration and deceleration, obstacle avoidance and left/right sign recognition. The model proposed by the authors (UniBiCLSTM) outperforms existing models tested with model cars in terms of autonomous driving performance. Furthermore, the CMTL-UniBiCL-RL model trained through cross-model transfer learning improves the efficiency of model adaptation to new scenarios. This research also proposed an automatic data annotation method, which can save a quarter of the time required for deep learning.

Originality/value

This research provided novel solutions for achieving multi-task autonomous driving and for transferring neural network models to new scenarios. The experiments were conducted with a single camera, an embedded chip and a scale model car, which is expected to simplify the hardware required for autonomous driving.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288

