Understanding quotation extraction and attribution: towards automatic extraction of public figure’s statements for journalism in Indonesia

Yohanes Sigit Purnomo W.P. (Informatics Department, Faculty of Industrial Technology, Universitas Atma Jaya Yogyakarta, Yogyakarta, Indonesia and Center for Advanced Computing Technology (C-ACT), Fakulti Teknologi Maklumat Dan Komunikasi, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia)
Yogan Jaya Kumar (Center for Advanced Computing Technology (C-ACT), Fakulti Teknologi Maklumat Dan Komunikasi, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia)
Nur Zareen Zulkarnain (Center for Advanced Computing Technology (C-ACT), Fakulti Teknologi Maklumat Dan Komunikasi, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia)

Global Knowledge, Memory and Communication

ISSN: 2514-9342

Article publication date: 2 December 2020

Issue publication date: 27 July 2021

Abstract

Purpose

Extracting information from unstructured data is a challenging task in computational linguistics. Statements by public figures that journalists attribute in a story are one type of information that can be processed into structured data. A knowledge base of such statements would therefore be valuable for further uses such as opinion mining, claim detection and fact-checking. This study aims to understand statement extraction tasks and the models that have already been applied to them, in order to formulate a framework for further study.

Design/methodology/approach

This paper presents a literature review of selected previous research that specifically addresses quotation extraction and quotation attribution. Research works that discuss corpus development related to these two tasks are also considered. The findings of the review are used as a basis for proposing a framework to direct further research.

Findings

This study has three findings. Firstly, the extraction process still consists of two main tasks, namely, the extraction of quotations and the attribution of quotations. Secondly, most extraction algorithms rely on rule-based methods or traditional machine learning. Lastly, the available corpora are limited in quantity and depth. Based on these findings, a statement extraction framework for Indonesian-language corpus and model development is proposed.
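The two tasks named in the findings can be illustrated with a minimal rule-based sketch. This is not the framework proposed in the paper or any of the reviewed models. It assumes English text, straight double quotes, capitalized speaker names and a small hypothetical list of reporting verbs. The names `REPORTING_VERBS` and `extract_statements` are illustrative only.

```python
import re

# Hypothetical reporting-verb list; a real system would use a curated lexicon.
REPORTING_VERBS = ("said", "stated", "added", "explained")
VERBS = "|".join(REPORTING_VERBS)

# Assumption: a speaker is one or more capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Task 1 (quotation extraction): text between straight double quotes.
QUOTE_PATTERN = re.compile(r'"([^"]+)"')

# Task 2 (quotation attribution): "<verb> <Speaker>" or "<Speaker> <verb>"
# immediately after the closing quotation mark.
ATTRIBUTION_PATTERN = re.compile(
    rf"^,?\s*(?:(?:{VERBS}) (?P<after_verb>{NAME})"
    rf"|(?P<before_verb>{NAME}) (?:{VERBS}))"
)


def extract_statements(text):
    """Return a list of {"quote", "speaker"} dicts found in text."""
    results = []
    for match in QUOTE_PATTERN.finditer(text):
        quote = match.group(1).rstrip(",.")
        # Look for an attribution right after the closing quotation mark.
        attribution = ATTRIBUTION_PATTERN.match(text[match.end():])
        speaker = None
        if attribution:
            speaker = (attribution.group("after_verb")
                       or attribution.group("before_verb"))
        results.append({"quote": quote, "speaker": speaker})
    return results


if __name__ == "__main__":
    sample = ('"The budget will be revised," said Joko Widodo. '
              '"We need more data," Sri Mulyani added.')
    for statement in extract_statements(sample):
        print(statement)
```

A sketch like this only handles direct quotations with an adjacent attribution. Indirect speech, pronoun speakers and attributions placed before the quotation fall outside its rules.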

Originality/value

The paper serves as a guideline for formulating a statement extraction framework based on the findings of the literature study. The proposed framework includes corpus development in the Indonesian language and a model for extracting public figures' statements. Furthermore, this study could be used as a reference for producing a similar framework for other languages.

Acknowledgements

The authors would like to thank Universitas Atma Jaya Yogyakarta and Universiti Teknikal Malaysia Melaka for supporting this research.

Citation

Purnomo W.P., Y.S., Kumar, Y.J. and Zulkarnain, N.Z. (2021), "Understanding quotation extraction and attribution: towards automatic extraction of public figure’s statements for journalism in Indonesia", Global Knowledge, Memory and Communication, Vol. 70 No. 6/7, pp. 655-671. https://doi.org/10.1108/GKMC-07-2020-0098

Publisher

Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited
