The purpose of this paper is to propose an automatic method for generating semantic annotations of football transfers in the news. Current automatic news integration systems on the Web constantly face the challenge of diverse, heterogeneous sources. Syntax-based approaches to information representation and storage have certain limitations when it comes to searching, sorting, organizing and appropriately linking news. Semantic representation models promise to be the key to solving these problems.
The authors' approach leverages Semantic Web technologies to improve the detection of hidden annotations in news. The paper proposes an automatic method for generating semantic annotations based on named entity recognition and rule-based information extraction. To implement the former task (named entity recognition), the authors built a domain ontology and a knowledge base integrated with the Knowledge and Information Management (KIM) platform. The semantic extraction rules are constructed from defined language models and the developed ontology.
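The rule-based extraction step can be illustrated with a minimal Python sketch. It assumes entity recognition has already produced typed mentions: here a small hand-written gazetteer (hypothetical example data, standing in for the ontology, knowledge base and KIM) tags players and clubs, and two regular-expression rules (also hypothetical, not the authors' actual rules) map transfer sentences to triples using the illustrative predicates `transferredTo` and `transferredFrom`.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer standing in for the ontology/knowledge base; the
# paper performs entity recognition with the KIM platform instead.
GAZETTEER: Dict[str, str] = {
    "Diego Costa": "Player",
    "Cesc Fabregas": "Player",
    "Chelsea": "Club",
    "Atletico Madrid": "Club",
    "Barcelona": "Club",
}

# One alternation over all names, longest first, so fuller names win.
NAME_RE = re.compile(
    "|".join(re.escape(n) for n in sorted(GAZETTEER, key=len, reverse=True))
)

# Hypothetical extraction rules over typed placeholders such as [Club:Chelsea].
RULES = [
    # "<Player> (has) joined|moved to <Club> (from <Club>)"
    re.compile(
        r"\[Player:(?P<player>[^\]]+)\] (?:has )?(?:joined|moved to) "
        r"\[Club:(?P<to>[^\]]+)\](?: from \[Club:(?P<src>[^\]]+)\])?"
    ),
    # "<Club> (have|has) signed <Player> (from <Club>)"
    re.compile(
        r"\[Club:(?P<to>[^\]]+)\] (?:have |has )?signed "
        r"\[Player:(?P<player>[^\]]+)\](?: from \[Club:(?P<src>[^\]]+)\])?"
    ),
]

Triple = Tuple[str, str, str]


def annotate(sentence: str) -> str:
    """Replace each known entity mention with a typed placeholder."""
    return NAME_RE.sub(lambda m: f"[{GAZETTEER[m.group(0)]}:{m.group(0)}]", sentence)


def extract_triples(sentence: str) -> List[Triple]:
    """Apply every rule to the annotated sentence and emit transfer triples."""
    annotated = annotate(sentence)
    triples: List[Triple] = []
    for rule in RULES:
        for m in rule.finditer(annotated):
            triples.append((m["player"], "transferredTo", m["to"]))
            if m["src"]:  # the "from <Club>" part is optional
                triples.append((m["player"], "transferredFrom", m["src"]))
    return triples
```

For example, `extract_triples("Diego Costa has joined Chelsea from Atletico Madrid.")` yields one triple for the new club and one for the former club. Writing rules over typed placeholders rather than raw words is what lets a single rule fire for any player or club the knowledge base knows about.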
The proposed method is implemented as part of BKAnnotation, a prototype for generating semantic annotations of sport news. This component belongs to BKSport, a sport news integration system based on Semantic Web technologies. The generated semantic annotations are used to improve news searching, sorting and association. Experiments on news data from the SkySport (2014) channel showed positive results: precision exceeded 80 per cent both with and without integration of the pronoun recognition method. In particular, the pronoun recognition method helps increase recall by around 10 per cent.
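The abstract does not restate how precision and recall are computed. Assuming the standard definitions applied to the annotations the system extracts (an assumption on our part; the symbols below are introduced only for illustration), with $N_{\text{correct}}$ the number of correct extracted annotations, $N_{\text{extracted}}$ the number of all extracted annotations and $N_{\text{gold}}$ the number of annotations in the reference set:

```latex
\text{Precision} = \frac{N_{\text{correct}}}{N_{\text{extracted}}}, \qquad
\text{Recall} = \frac{N_{\text{correct}}}{N_{\text{gold}}}
```

Under these definitions, resolving pronouns can raise recall by adding correct triples to $N_{\text{correct}}$ while $N_{\text{gold}}$ stays fixed, and precision remains high as long as most of the added triples are correct.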
This is one of the first proposals for the automatic creation of semantic data about news, in particular football news and, more broadly, sport news. Combining the ontology, the knowledge base and language-model patterns makes it possible to detect not only entities with their corresponding types but also semantic triples. In addition, the authors propose a pronoun recognition method based on extraction rules to improve relation recognition.
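The abstract does not detail the pronoun recognition rules. As a hedged illustration only, the Python sketch below assumes a single hypothetical rule: a masculine pronoun (he, him, his) refers to the nearest preceding Player mention, first within the same sentence and otherwise from earlier sentences. The gazetteer is again illustrative example data standing in for the ontology and knowledge base. Rewriting pronouns this way lets a follow-up sentence be processed by the same relation extraction rules as one that names the player explicitly.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical entity types; the paper derives them from its ontology/KB via KIM.
GAZETTEER: Dict[str, str] = {
    "Diego Costa": "Player",
    "Chelsea": "Club",
    "Atletico Madrid": "Club",
}

# Masculine third-person pronouns, matched as whole words in any case.
PRONOUN_RE = re.compile(r"\b(he|him|his)\b", re.IGNORECASE)


def player_mentions(text: str) -> List[Tuple[int, str]]:
    """Return (start offset, name) for every Player mention in text."""
    return [
        (m.start(), name)
        for name, etype in GAZETTEER.items()
        if etype == "Player"
        for m in re.finditer(re.escape(name), text)
    ]


def resolve_pronouns(sentences: List[str]) -> List[str]:
    """Rewrite he/him/his to the nearest preceding Player mention."""
    last_player: Optional[str] = None  # antecedent carried over from earlier sentences
    resolved: List[str] = []
    for sentence in sentences:
        mentions = player_mentions(sentence)
        carried = last_player

        def substitute(m: "re.Match[str]") -> str:
            # Prefer a Player mentioned earlier in the same sentence.
            before = [(pos, name) for pos, name in mentions if pos < m.start()]
            antecedent = max(before)[1] if before else carried
            if antecedent is None:
                return m.group(0)  # no antecedent: leave the pronoun unchanged
            return f"{antecedent}'s" if m.group(1).lower() == "his" else antecedent

        resolved.append(PRONOUN_RE.sub(substitute, sentence))
        if mentions:
            last_player = max(mentions)[1]  # last Player named in this sentence
    return resolved
```

Because antecedents come only from recognized Player entities, a pronoun with no prior player mention is left untouched, favouring precision over recall; plural pronouns or mentions of coaches would need further rules.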
This paper is an extended version of Quang-Minh et al., 2014, “Automatic creation of semantic data about football transfer in sport news”, Proceedings of the 16th International Conference on Information Integration and Web-based Applications & Services (iiWAS 2014), ACM, Hanoi, pp. 356-364.
Nguyen, Q.-M. and Cao, T.-D. (2015), "A novel approach for automatic extraction of semantic data about football transfer in sport news", International Journal of Pervasive Computing and Communications, Vol. 11 No. 2, pp. 233-252. https://doi.org/10.1108/IJPCC-03-2015-0018
Emerald Group Publishing Limited
Copyright © 2015, Emerald Group Publishing Limited