Understanding quotation extraction and attribution: towards automatic extraction of public figure’s statements for journalism in Indonesia
Global Knowledge, Memory and Communication
ISSN: 2514-9342
Article publication date: 2 December 2020
Issue publication date: 27 July 2021
Abstract
Purpose
Extracting information from unstructured data has become a challenging task in computational linguistics. Public figures' statements, as attributed by journalists in news stories, are one type of information that can be processed into structured data. A knowledge base built from such data would therefore be very beneficial for further uses such as opinion mining, claim detection and fact-checking. This study aims to understand statement extraction tasks and the models that have already been applied, in order to formulate a framework for further study.
Design/methodology/approach
This paper presents a literature review of selected previous research that specifically addresses quotation extraction and quotation attribution. Research on corpus development for quotation extraction and quotation attribution is also considered. The findings of the review are used as a basis for proposing a framework to direct further research.
Findings
This study has three findings. Firstly, the extraction process still consists of two main tasks, namely, quotation extraction and quotation attribution. Secondly, most extraction methods rely on rule-based algorithms or traditional machine learning. Thirdly, the available corpora are limited in both quantity and depth. Based on these findings, a statement extraction framework for Indonesian-language corpus and model development is proposed.
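The two-stage structure named in the first finding (extract the quotation, then attribute it to a speaker) can be illustrated with a minimal rule-based sketch. This is not the authors' method or their proposed framework. The reporting-verb list, the regular expressions, the `Quotation` type and the `extract_quotations` function are all hypothetical assumptions for illustration, and the sketch only handles direct quotations followed by a cue such as `kata <Name>`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, non-exhaustive list of Indonesian reporting verbs (assumption).
REPORTING_VERBS = ["kata", "ujar", "ucap", "ungkap", "jelas", "tegas"]

# Stage 1 pattern: a direct quotation in straight or curly double quotes.
QUOTE_RE = re.compile(r'["\u201c]([^"\u201d]+)["\u201d]')

# Stage 2 pattern: a reporting verb right after the closing quote,
# followed by one or more capitalised name tokens (the speaker).
SPEAKER_RE = re.compile(
    r"\s*,?\s*(?:" + "|".join(REPORTING_VERBS) + r")\s+((?:[A-Z]\w*\s?)+)"
)


@dataclass
class Quotation:
    text: str
    speaker: Optional[str]  # None when no attribution cue was found


def extract_quotations(sentence: str) -> List[Quotation]:
    """Hypothetical rule-based extractor: find quotes, then attribute them."""
    results = []
    for match in QUOTE_RE.finditer(sentence):
        # Quotation extraction: keep the quoted text, drop the trailing comma
        # that Indonesian news style places inside the closing quote.
        quote = match.group(1).strip().rstrip(",")
        # Quotation attribution: only inspect the text after the closing quote.
        cue = SPEAKER_RE.match(sentence[match.end():])
        speaker = cue.group(1).strip() if cue else None
        results.append(Quotation(quote, speaker))
    return results
```

For example, `extract_quotations('"Kami akan menaikkan anggaran pendidikan," kata Budi Santoso di Jakarta.')` yields one quotation attributed to the (fictional) speaker `Budi Santoso`. Rules like these are easy to write but brittle: speakers named before the quote, pronouns and indirect speech are all missed, which is part of why the findings point towards better corpora and learned models.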
Originality/value
The paper serves as a guideline for formulating a statement extraction framework based on the findings of the literature study. The proposed framework covers corpus development in the Indonesian language and a model for extracting public figures' statements. Furthermore, this study could be used as a reference for producing a similar framework for other languages.
Keywords
Acknowledgements
The authors would like to thank Universitas Atma Jaya Yogyakarta and Universiti Teknikal Malaysia Melaka for supporting this research.
Citation
Purnomo W.P., Y.S., Kumar, Y.J. and Zulkarnain, N.Z. (2021), "Understanding quotation extraction and attribution: towards automatic extraction of public figure’s statements for journalism in Indonesia", Global Knowledge, Memory and Communication, Vol. 70 No. 6/7, pp. 655-671. https://doi.org/10.1108/GKMC-07-2020-0098
Publisher
Emerald Publishing Limited
Copyright © 2020, Emerald Publishing Limited