A cooperative crowdsourcing framework for knowledge extraction in digital humanities – cases on Tang poetry

Liang Hong (School of Information Management, Wuhan University, Wuhan, China)
Wenjun Hou (School of Information Management, Wuhan University, Wuhan, China)
Zonghui Wu (School of Information Management, Wuhan University, Wuhan, China)
Huijie Han (School of Information Management, Wuhan University, Wuhan, China)

Aslib Journal of Information Management

ISSN: 2050-3806

Article publication date: 25 February 2020

Issue publication date: 20 April 2020

Abstract

Purpose

The purpose of this paper is to propose a framework for extracting knowledge, including entities and the relationships between them, from unstructured texts in digital humanities (DH).

Design/methodology/approach

The proposed cooperative crowdsourcing framework (CCF) uses both human–computer cooperation and crowdsourcing to achieve high-quality and scalable knowledge extraction. CCF integrates active learning with a novel category-based crowdsourcing mechanism to help domain experts label and verify extracted knowledge.

Findings

The case study shows that CCF can extract knowledge effectively and efficiently from multi-sourced heterogeneous data in the field of Tang poetry. Specifically, CCF achieves higher knowledge extraction accuracy than the state-of-the-art methods; the active learning mechanism maximizes the contribution of feedback to the training model; and the proposed category-based crowdsourcing mechanism scales up effective human–computer collaboration by considering the specialization of workers in different categories of tasks.

Research limitations/implications

This research proposes CCF to enable high-quality and scalable knowledge extraction in the field of Tang poetry. CCF can be generalized to other fields of DH by introducing domain knowledge and experts.

Practical implications

The extracted knowledge is machine-understandable and can support the research of Tang poetry and knowledge-driven intelligent applications in DH.

Originality/value

CCF is the first human-in-the-loop knowledge extraction framework that integrates active learning and crowdsourcing mechanisms. The human–computer cooperation method uses the feedback of domain experts through the active learning mechanism, and the category-based crowdsourcing mechanism considers the matching between categories of DH data and the specialties of domain experts.

Citation

Hong, L., Hou, W., Wu, Z. and Han, H. (2020), "A cooperative crowdsourcing framework for knowledge extraction in digital humanities – cases on Tang poetry", Aslib Journal of Information Management, Vol. 72 No. 2, pp. 243-261. https://doi.org/10.1108/AJIM-07-2019-0192

Publisher: Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited
