The purpose of this paper is to apply local grammar (LG) to develop an indexing system which automatically extracts keywords from titles of Lebanese official journals.
To build the LGs for our system, the first word of each title, which plays the determining role in understanding the title's meaning, is analyzed and grouped as the initial state. This step is then repeated recursively for each of the remaining words. When a new title is introduced, its first word determines which LG is applied to suggest or generate further potential keywords, based on a set of features calculated for each node of the title.
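The procedure described above can be illustrated with a minimal sketch. It assumes whitespace tokenization, one word-level automaton per first word, and a single node feature (how many training titles pass through the node). The abstract does not specify the actual feature set, so the `count` feature, the `min_count` threshold, and the names `Node`, `build_grammars` and `suggest_keywords` are hypothetical stand-ins, and the English sample titles are invented for illustration only.

```python
class Node:
    """One state of a word-level automaton (hypothetical structure)."""

    def __init__(self):
        self.children = {}  # next word -> Node
        self.count = 0      # assumed feature: training titles passing through this node


def build_grammars(titles):
    """Group titles by first word and build one automaton per group."""
    grammars = {}
    for title in titles:
        words = title.split()
        if not words:
            continue
        # The first word selects (or creates) the initial state.
        node = grammars.setdefault(words[0], Node())
        node.count += 1
        # Each following word extends the path by one state.
        for word in words[1:]:
            node = node.children.setdefault(word, Node())
            node.count += 1
    return grammars


def suggest_keywords(grammars, title, min_count=2):
    """Walk the automaton chosen by the first word and keep well-supported words."""
    words = title.split()
    if not words or words[0] not in grammars:
        return []
    node = grammars[words[0]]
    keywords = []
    for word in words[1:]:
        node = node.children.get(word)
        if node is None:
            break  # the title leaves the known paths
        if node.count >= min_count:
            keywords.append(word)
    return keywords


# Invented sample titles, not taken from the corpus.
sample_titles = [
    "decree appointing judge",
    "decree appointing minister",
    "law amending budget",
]
sample_grammars = build_grammars(sample_titles)
```

In this toy setup, a new title beginning with "decree" is matched against the "decree" automaton only, which mirrors the role the first word plays in selecting the grammar. A real system would replace the count threshold with whatever node features the authors actually compute.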
The overall performance of our system is 67 per cent, measured as recall against human indexing: 67 per cent of the manually extracted keywords were also extracted by our system. This empirical result supports the validity of the study's approach, subject to the limitations noted below.
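The reported figure is the share of manually assigned keywords that the system also produced, which corresponds to the usual definition of recall. A minimal sketch of that calculation follows; the function name `keyword_recall` and the sample keyword lists are hypothetical, and exact string matching is an assumption, since the abstract does not say how matches were counted.

```python
def keyword_recall(manual_keywords, extracted_keywords):
    """Fraction of manually assigned keywords that the system also extracted."""
    manual = set(manual_keywords)
    if not manual:
        raise ValueError("at least one manual keyword is required")
    # Assumption: a keyword counts as found only on an exact string match.
    found = manual & set(extracted_keywords)
    return len(found) / len(manual)


# Invented example: two of the three manual keywords are recovered.
example_recall = keyword_recall(
    ["decree", "appointment", "judge"],
    ["decree", "judge", "ministry"],
)
```

Note that this measure says nothing about extra keywords the system proposes that a human indexer would not have chosen; that would require a separate precision figure, which the abstract does not report.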
The system has two limitations. First, it has been applied only to a sample of 5,747 titles; it could be extended to generate finite-state automata covering all titles. Second, named entities are not processed, because their variety requires a dedicated ontology.
Almost all keyword extraction systems apply statistical, linguistic or hybrid approaches to extract keywords from texts. This paper contributes to the development of an automatic indexing system that can replace expensive human indexing by taking advantage of LG, which has mainly been applied to extract times, dates and proper names from texts.
Rammal, M., Bahsoun, Z. and Al Achkar Jabbour, M. (2015), "Keyword extraction from Arabic legal texts", Interactive Technology and Smart Education, Vol. 12 No. 1, pp. 62-71. https://doi.org/10.1108/ITSE-11-2013-0030
Emerald Group Publishing Limited
Copyright © 2015, Emerald Group Publishing Limited