
Infer the missing facts of D3FEND using knowledge graph representation learning

Anish Khobragade (Department of Computer Engineering and IT, College of Engineering Pune, Pune, India)
Shashikant Ghumbre (Department of Computer Engineering, Government College of Engineering and Research Avasari Khurd, Pune, India)
Vinod Pachghare (Department of Computer Engineering and IT, College of Engineering Pune, Pune, India)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 16 August 2023

Issue publication date: 28 November 2023


Abstract

Purpose

MITRE and the National Security Agency jointly developed and maintain the D3FEND knowledge graph (KG). It represents concepts from the cybersecurity countermeasure domain, such as dynamic, emulated and file analysis, as entities. These entities are linked by relationships such as analyze, may_contains and encrypt. A fundamental challenge for the collaborating designers is to encode knowledge and efficiently interrelate the cyber-domain facts generated daily. Currently, the designers manually update the graph with new or missing facts to enrich the knowledge. This paper proposes an automated approach that predicts missing facts through the link prediction task, using embeddings obtained by representation learning.

Design/methodology/approach

D3FEND is available in the resource description framework (RDF) format. In the preprocessing step, the facts in RDF format are converted to subject–predicate–object triplets, yielding 5,967 entities and 98 relationship types. Progressive distance-based, bilinear and convolutional embedding models are then applied to learn embeddings of the entities and relations. This study uses a link prediction task to infer missing facts from the learned embeddings.
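The preprocessing step described above can be illustrated with a minimal sketch. It assumes the RDF facts have been serialized as N-Triples and that only URI-to-URI statements are kept as facts; the `example.org` namespace, the sample statements and the helper names (`local_name`, `parse_ntriples`, `vocabulary`) are hypothetical and are not taken from the paper or from the D3FEND distribution.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

def local_name(uri: str) -> str:
    """Return the part after '#' or the last '/', e.g. '<...#analyze>' -> 'analyze'."""
    uri = uri.strip("<>")
    for sep in ("#", "/"):
        if sep in uri:
            return uri.rsplit(sep, 1)[1]
    return uri

def parse_ntriples(text: str) -> List[Triple]:
    """Convert simple N-Triples lines into (subject, predicate, object) triplets.

    Assumption: statements whose object is a literal (labels, comments)
    are skipped, since they do not link two entities.
    """
    triples: List[Triple] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.rstrip(" .").split(None, 2)
        if len(parts) != 3:
            continue
        s, p, o = parts
        if not o.startswith("<"):
            continue  # literal object, not an entity
        triples.append((local_name(s), local_name(p), local_name(o)))
    return triples

def vocabulary(triples: List[Triple]) -> Tuple[Set[str], Set[str]]:
    """Collect the distinct entities and relationship types."""
    entities = {s for s, _, _ in triples} | {o for _, _, o in triples}
    relations = {p for _, p, _ in triples}
    return entities, relations

# Hypothetical sample data in a made-up namespace.
SAMPLE = """
<http://example.org/d3f#DynamicAnalysis> <http://example.org/d3f#analyze> <http://example.org/d3f#ExecutableFile> .
<http://example.org/d3f#FileAnalysis> <http://example.org/d3f#analyze> <http://example.org/d3f#File> .
<http://example.org/d3f#File> <http://example.org/d3f#may_contains> <http://example.org/d3f#ExecutableFile> .
<http://example.org/d3f#File> <http://www.w3.org/2000/01/rdf-schema#label> "File" .
"""

triples = parse_ntriples(SAMPLE)
entities, relations = vocabulary(triples)
print(len(triples), len(entities), len(relations))
```

On the full graph, the same counting step would be expected to report the 5,967 entities and 98 relationship types quoted above, provided the filtering rules match those used by the authors, which the abstract does not specify.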

Findings

Experimental results show that the translational model performs well on high-rank results, whereas the bilinear model is superior in capturing the latent semantics of complex relationship types. However, the convolutional model performs best overall, correctly predicting 44% of the true facts and achieving a 3% improvement over the other models.
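To make the difference between the translational and bilinear model families concrete, the sketch below scores candidate facts with two well-known representatives: a TransE-style distance score and a DistMult-style trilinear score. The abstract does not name the specific models used, so these choices are assumptions, and the two-dimensional vectors are hand-set for illustration rather than learned. A convolutional scorer is omitted for brevity.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
ScoreFn = Callable[[Vector, Vector, Vector], float]

def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """Translational score: negative Euclidean distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h: Vector, r: Vector, t: Vector) -> float:
    """Bilinear (diagonal) score: sum over dimensions of h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def rank_tails(
    head: str,
    relation: str,
    entities: Dict[str, Vector],
    relations: Dict[str, Vector],
    score_fn: ScoreFn,
) -> List[Tuple[str, float]]:
    """Rank every other entity as a candidate tail for (head, relation, ?)."""
    h, r = entities[head], relations[relation]
    scored = [
        (name, score_fn(h, r, vec))
        for name, vec in entities.items()
        if name != head
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hand-set toy embeddings (hypothetical, not learned from D3FEND).
ENTITIES = {
    "FileAnalysis": [1.0, 0.5],
    "File": [2.0, 0.0],
    "Process": [0.0, 1.0],
}
RELATIONS = {"analyze": [1.0, -0.5]}

transe_ranking = rank_tails("FileAnalysis", "analyze", ENTITIES, RELATIONS, transe_score)
distmult_ranking = rank_tails("FileAnalysis", "analyze", ENTITIES, RELATIONS, distmult_score)
print(transe_ranking[0][0], distmult_ranking[0][0])
```

In link prediction, a missing fact is proposed when a candidate tail (or head) ranks near the top of such a list; rank-based metrics then measure how often the true entity appears among the highest-ranked candidates.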

Research limitations/implications

Despite the success of embedding models in enriching D3FEND through link prediction under a supervised learning setup, the approach has some limitations, such as not capturing the diversity and hierarchies of relations. The average node degree of the D3FEND KG is 16.85, with 12% of entities having a node degree of less than 2; in particular, many entities and relations have few or no observed links. This results in sparsity and data imbalance, which affect model performance even after increasing the embedding vector size. Moreover, KG embedding models consider only existing entities and relations and may not incorporate external or contextual information, such as textual descriptions, temporal dynamics or domain knowledge, which can enhance link prediction performance.
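The sparsity statistics mentioned above can be computed from a list of triplets along the following lines. This is a sketch under one assumed definition: an entity's degree is the number of triplets in which it appears as subject or object. The abstract does not state how the authors counted degree, and the toy triplets and the name `degree_stats` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

def degree_stats(triples: Iterable[Triple], threshold: int = 2) -> Tuple[float, float]:
    """Return (average node degree, fraction of entities with degree < threshold).

    Assumption: degree counts every appearance of an entity as subject or object.
    """
    degree: Counter = Counter()
    for s, _, o in triples:
        degree[s] += 1
        degree[o] += 1
    n = len(degree)
    if n == 0:
        return 0.0, 0.0
    average = sum(degree.values()) / n
    low_fraction = sum(1 for d in degree.values() if d < threshold) / n
    return average, low_fraction

# Toy graph: A has degree 3, B and C have degree 2, D has degree 1.
TOY_TRIPLES = [
    ("A", "rel", "B"),
    ("A", "rel", "C"),
    ("B", "rel", "C"),
    ("D", "rel", "A"),
]

average, low_fraction = degree_stats(TOY_TRIPLES)
print(average, low_fraction)
```

A high average degree combined with a sizeable share of low-degree entities indicates a skewed distribution: a few well-connected nodes dominate, while poorly connected entities give the embedding model little evidence to learn from.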

Practical implications

Link prediction in the D3FEND KG can benefit cybersecurity countermeasure strategies in several ways. It can help to identify gaps or weaknesses in existing defensive methods and suggest possible ways to improve or augment them; to compare and contrast different defensive methods and understand their trade-offs and synergies; to discover novel or emerging defensive methods by inferring new relations from existing data or external sources; and to generate recommendations or guidance for selecting or deploying appropriate defensive methods based on the characteristics and objectives of the system or network.

Originality/value

The representation learning approach helps to reduce the incompleteness of D3FEND by using link prediction to infer possible missing facts from its existing entities and relations.


Acknowledgements

The authors are grateful to the Department of Computer Engineering and IT, COEP, for providing high-computing GPU server facilities procured under TEQIP-III (A World Bank project) for our research work. This work is not supported by any funding.

Citation

Khobragade, A., Ghumbre, S. and Pachghare, V. (2023), "Infer the missing facts of D3FEND using knowledge graph representation learning", International Journal of Web Information Systems, Vol. 19 No. 3/4, pp. 139-156. https://doi.org/10.1108/IJWIS-03-2023-0042

Publisher: Emerald Publishing Limited

Copyright © 2023, Emerald Publishing Limited
