Domain-specific word embeddings for patent classification

Julian Risch (Hasso Plattner Institute, University of Potsdam, Potsdam, Germany)
Ralf Krestel (Hasso Plattner Institute, University of Potsdam, Potsdam, Germany)

Data Technologies and Applications

ISSN: 2514-9288

Article publication date: 29 March 2019

Issue publication date: 3 April 2019

Abstract

Purpose

Patent offices and other stakeholders in the patent domain need to classify patent applications according to a standardized classification scheme. An application can then be compared to previously granted patents in the same class to examine its novelty. Automatic classification would be highly beneficial because of the large volume of patents and the domain-specific knowledge needed to accomplish this costly manual task. However, patent-specific language use, such as special vocabulary and phrases, poses a challenge for automation.

Design/methodology/approach

To account for this language use, the authors present domain-specific pre-trained word embeddings for the patent domain. The authors train the embedding model on a very large data set of more than 5 million patents and evaluate it on the task of patent classification. To this end, the authors propose a deep learning approach for automatic patent classification that is based on gated recurrent units and built on the trained word embeddings.
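As a rough illustration of this kind of architecture, the following minimal sketch runs the forward pass of a classifier that looks up word embeddings, feeds them through a single gated recurrent unit layer and maps the final hidden state to class probabilities. It is not the authors' implementation. The class name `GRUClassifier`, the toy vocabulary, the dimensions and the three output classes are all assumptions made for illustration. The embeddings and weights are random stand-ins for pre-trained and trained parameters, and bias terms are omitted for brevity.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUClassifier:
    """Hypothetical sketch: embeddings -> one GRU layer -> softmax."""

    def __init__(self, vocab, embed_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab  # token -> row index in the embedding table
        # Assumption: random vectors stand in for pre-trained embeddings.
        self.embeddings = rand_matrix(len(vocab), embed_dim, rng)
        # Input (W) and recurrent (U) weights for the update gate z,
        # the reset gate r and the candidate state h.
        self.W = {g: rand_matrix(hidden_dim, embed_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrh"}
        self.out = rand_matrix(num_classes, hidden_dim, rng)
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        """One GRU step, using the convention h' = (1 - z) * h + z * cand."""
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def predict_proba(self, tokens):
        h = [0.0] * self.hidden_dim
        for token in tokens:
            index = self.vocab.get(token, self.vocab["<unk>"])
            h = self.step(self.embeddings[index], h)
        logits = matvec(self.out, h)
        # Numerically stable softmax over the class scores.
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Toy vocabulary; unknown words map to "<unk>".
vocab = {"<unk>": 0, "a": 1, "semiconductor": 2, "device": 3, "comprising": 4}
model = GRUClassifier(vocab, embed_dim=8, hidden_dim=6, num_classes=3)
probs = model.predict_proba("a semiconductor device comprising electrodes".split())
print(probs)
```

Because the weights are untrained, the printed probabilities carry no meaning. In practice the embedding table would be loaded from the pre-trained vectors and the remaining weights would be learned from labeled patents, typically with a deep learning framework rather than hand-written loops.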

Findings

Experiments on a standardized evaluation data set show that the approach increases average precision for patent classification by 17 percent compared to state-of-the-art approaches. The authors further investigate the model’s strengths and weaknesses. An extensive error analysis reveals that the learned embeddings indeed mirror patent-specific language use. Imbalanced training data and underrepresented classes remain the most difficult challenges.

Originality/value

The proposed approach fulfills the need for domain-specific word embeddings for downstream tasks in the patent domain, such as patent classification or patent analysis.

Citation

Risch, J. and Krestel, R. (2019), "Domain-specific word embeddings for patent classification", Data Technologies and Applications, Vol. 53 No. 1, pp. 108-122. https://doi.org/10.1108/DTA-01-2019-0002

Publisher

Emerald Publishing Limited

Copyright © 2019, Emerald Publishing Limited
