Research data management in institutional repositories: an architectural approach using data lakehouses
ISSN: 2059-5816
Article publication date: 24 December 2024
Issue publication date: 28 January 2025
Abstract
Purpose
This paper addresses the pressing challenges of research data management in institutional repositories, focusing on the escalating volume, heterogeneity and multi-source nature of research data. The goal is to enhance the data services that institutional repositories provide and to modernise their role in the research ecosystem.
Design/methodology/approach
The authors analyse the evolution of data management architectures through a literature review, emphasising the advantages of data lakehouses. Using the design science research methodology, they develop an end-to-end data lakehouse architecture tailored to the needs of institutional repositories and refine the design through interviews with data management professionals, institutional repository administrators and researchers.
Findings
The authors present a comprehensive framework for a data lakehouse architecture comprising five fundamental layers: data collection, data storage, data processing, data management and data services. For each layer, the framework sets out the implementation steps, delineates the dependencies between them and identifies potential obstacles together with corresponding mitigation strategies.
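The layered structure described above can be pictured as a small dependency graph. The sketch below is hypothetical and not taken from the paper: the `Layer` class, the `build_order` function and the assumption that each layer builds only on the one before it are illustrative. It uses a topological sort (Kahn's algorithm) to derive an order in which each layer is implemented after the layers it relies on.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One layer of the architecture and the layers it builds on."""
    name: str
    depends_on: list[str] = field(default_factory=list)


# Hypothetical dependency graph: assumes each layer builds only on the one
# before it. The paper's framework may define richer dependencies.
LAYERS = [
    Layer("data collection"),
    Layer("data storage", ["data collection"]),
    Layer("data processing", ["data storage"]),
    Layer("data management", ["data processing"]),
    Layer("data services", ["data management"]),
]


def build_order(layers):
    """Return layer names so that every layer follows its dependencies.

    Uses Kahn's topological sort; raises ValueError on an unknown
    dependency or a cycle.
    """
    names = {layer.name for layer in layers}
    remaining = {layer.name: set(layer.depends_on) for layer in layers}
    for name, deps in remaining.items():
        unknown = deps - names
        if unknown:
            raise ValueError(f"{name!r} depends on unknown layer(s): {sorted(unknown)}")
    order = []
    while remaining:
        # Layers whose dependencies have all been placed can be built now.
        ready = sorted(name for name, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cyclic dependency between layers")
        for name in ready:
            order.append(name)
            del remaining[name]
        for deps in remaining.values():
            deps.difference_update(ready)
    return order


print(build_order(LAYERS))
# ['data collection', 'data storage', 'data processing', 'data management', 'data services']
```

Encoding dependencies explicitly, rather than relying on list order, would let an institution add cross-layer links (for example, data services drawing directly on storage) and still obtain a valid order, or an error if the links form a cycle.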
Practical implications
The proposed data lakehouse architecture provides a practical and scalable solution for institutional repositories to manage research data. Its benefits include enhanced data management capabilities, expanded data services, an improved researcher experience and a modernised institutional repository ecosystem. The paper also identifies and addresses potential implementation obstacles and offers guidance for institutions adopting this architecture. The implementation in a university library shows how the architecture enhances data sharing among researchers and gives institutional repository administrators comprehensive oversight and control of the university’s research data landscape.
Originality/value
This paper enriches theoretical knowledge of research data management and provides scholars in the field with a comprehensive research framework and paradigm. It details a pioneering application of the data lakehouse architecture in an academic setting, highlighting its practical benefits and its adaptability to the specific needs of institutional repositories.
Acknowledgements
This research was funded by Projects of the National Social Science Foundation of China, grant number 22BGL011, "Research on the Influence Mechanism and Implementation Path of Key Core Technology Breakthroughs under the New National System", and by the Xi’an Science and Technology Program Soft Science Project, grant number 24RKYJ0011, "Research on the Mechanisms and Pathways for Market-Oriented Allocation of Technological Elements in Xi’an to Achieve Breakthroughs in Key Core Technologies".
Citation
He, Z. and Fang, W. (2025), "Research data management in institutional repositories: an architectural approach using data lakehouses", Digital Library Perspectives, Vol. 41 No. 1, pp. 145-178. https://doi.org/10.1108/DLP-02-2024-0022
Publisher: Emerald Publishing Limited
Copyright © 2024, Emerald Publishing Limited