Large collections of patent documents disclosing novel, non-obvious technologies are publicly available and beneficial to academia and industry. To fully exploit their potential, searching these patent documents has become an increasingly important topic. Although much research has processed large collections, few studies have attempted to integrate both patent classifications and specifications when analyzing user queries. Consequently, queries are often analyzed insufficiently to improve the accuracy of search results. This paper aims to address this limitation by exploiting semantic relationships between patent contents and their classification.
The contributions are fourfold. First, the authors enhance the similarity measurement between two short sentences, making it 20 per cent more accurate. Second, the Graph-embedded Tree ontology is enriched by integrating both patent documents and the classification scheme. Third, the ontology does not rely on rule-based methods or text matching; instead, a heuristic meaning comparison is applied to extract semantic relationships between concepts. Finally, the patent search approach uses the ontology effectively, with the results sorted based on their most common order.
The experiment on searching for 600 patent documents in the field of Logistics yields a 15 per cent improvement in F-measure compared with traditional approaches.
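For readers unfamiliar with the metric, the sketch below shows how an F-measure is typically computed for a single query. The abstract does not state which variant was used, so the balanced F1 default (`beta=1.0`) is an assumption, and the function name `f_measure` and the patent identifiers are hypothetical.

```python
def f_measure(retrieved, relevant, beta=1.0):
    """F-measure of a result list against a set of relevant documents.

    beta=1.0 gives the balanced F1 score (an assumption; the abstract
    does not say which variant the experiment reports).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually returned
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)  # share of returned documents that are relevant
    recall = hits / len(relevant)      # share of relevant documents that were returned
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical query: four patents returned, three are truly relevant.
score = f_measure(["p1", "p2", "p3", "p4"], ["p1", "p2", "p5"])
print(round(score, 4))
```

Here precision is 2/4 and recall is 2/3, so the balanced score is 4/7. Raising `beta` above 1 weights recall more heavily, which can matter in patent search where missing a relevant document is costly.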
The research, however, still requires improvement: the terms and phrases extracted as nouns and noun phrases make less sense in some respects and thus might not result in high accuracy. The large collection of extracted relationships could be further optimized for conciseness. In addition, parallel processing such as MapReduce could be used to improve search processing performance.
The experimental results could be used by scientists and technologists to search for novel, non-obvious technologies in patents.
High-quality patent search results will reduce patent infringement.
The proposed ontology is semantically enriched by integrating both patent documents and their classification. This ontology facilitates the analysis of user queries to enhance the accuracy of patent search results.
This research is funded by Vietnam National University Ho Chi Minh City (VNUHCM) under grant number C2016-2808.
Phan, C.-P., Nguyen, H.-Q. and Nguyen, T.-T. (2019), "Ontology-based heuristic patent search", International Journal of Web Information Systems, Vol. 15 No. 3, pp. 258-284. https://doi.org/10.1108/IJWIS-06-2018-0053
Emerald Publishing Limited
Copyright © 2018, Emerald Publishing Limited