The utilisation of open research data repositories for storing and sharing research data in higher learning institutions in Tanzania

Purpose – The study aims to investigate the utilisation of open research data repositories (RDRs) for storing and sharing research data in higher learning institutions (HLIs) in Tanzania.
Design/methodology/approach – A survey research design was employed to collect data from postgraduate students at the Nelson Mandela African Institution of Science and Technology (NM-AIST) in Arusha, Tanzania. The data were collected and analysed quantitatively and qualitatively. A census sampling technique was employed to select the sample size for this study. The quantitative data were analysed using the Statistical Package for the Social Sciences (SPSS), whilst the qualitative data were analysed thematically.
Findings – Fewer than half of the respondents were aware of and were using open RDRs, including Zenodo, DataVerse, Dryad, OMERO, GitHub and Mendeley data repositories. More than half of the respondents were not willing to share research data, citing data security concerns and a lack of ownership after storing their research data in most of the open RDRs. HLIs need to conduct training on using trusted repositories and motivate postgraduate students to utilise open repositories (ORs). The challenges behind the underutilisation of open RDRs were a lack of policies governing the storage and sharing of research data and grant constraints.
Originality/value – Research data storage and sharing are of great interest to researchers in HLIs, and the findings can inform HLIs that implement open RDRs to support these researchers. Open RDRs increase visibility within HLIs and reduce research data loss, and research works will be cited and used publicly. This paper identifies the potential for additional studies focussed on this area.


Introduction and theoretical background
The pressure to store and share research data in open research data repositories (RDRs) has been placed on researchers by funding agencies, journal publishers, higher learning institutions (HLIs), open science initiatives such as findable, accessible, interoperable and reusable (FAIR) and research data movements (Boyd, 2021; Hrynaszkiewicz et al., 2021; Uzwyshyn, 2016). Scientific discovery is accelerated by the storage and sharing of research data. It also improves the return on investment for researchers' institutions and communities (Donaldson and Koepke, 2022). To improve access to and reuse of such research data, it is necessary to store and exchange it utilising open platforms such as open RDRs (Sicilia et al., 2017; Bezuidenhout, 2019; Boyd, 2021; Donaldson and Koepke, 2022; Jeng and He, 2022; Mauthner and Parry, 2013; Tenopir et al., 2015).
Open RDRs are amongst the online systems for storing and sharing research data (Besançon et al., 2021; Chawinga and Zinn, 2019; Donaldson and Koepke, 2022; Jeng and He, 2022; Thoegersen and Borlund, 2021; Nielsen, 2011; Uzwyshyn, 2016). Holdren (2013, p. 5) defined RDRs as "large database infrastructures set up to manage, share, access, and archive researchers' datasets" when practicable and possible from a legal and ethical standpoint. Data centres, data archives and/or scientific databases are other names for open RDRs (Uzwyshyn, 2016). Open RDRs enhance the storage and sharing of a wide range of data types in a wide variety of formats (Wilkinson et al., 2016) and play a vital role in the research life cycle and in ensuring that research data deposits remain FAIR (Boyd, 2021; Wilkinson et al., 2016). Open RDRs enable researchers to store and share their research data using automatic persistent identifiers (e.g. persistent identifier (PID) and uniform resource locator (URL)), indexing and metadata to ensure access control possibilities (e.g. embargo and authentication) (Donaldson and Koepke, 2022; Thoegersen and Borlund, 2021; Van Wyk Johann and Van der Walt, 2020). Additionally, they support academic research cooperation, publishing and sharing, enabling HLIs to manage their study data and present them to larger research communities (Van Wyk Johann and Van der Walt, 2020).
The three main categories of open RDRs are:
(1) Domain-specific or discipline repositories (DRs), which are used to store research data on specific subject areas (Garijo et al., 2022), for example, the National Institutes of Health (NIH) data repositories for medical research data, which are maintained by the Trans-NIH Biomedical Informatics Coordinating Committee (BMIC) (Gonzales et al., 2022).
(3) Institutional data repositories (IDRs), which are maintained and operated by a specific institution; these repositories are often university-based (Xafis and Labude, 2019), for instance, the University of Pretoria (UP) RDR (https://researchdata.up.ac.za/), which is used to facilitate data publishing, sharing and collaboration of academic research and allows UP to manage its research data (Van Wyk Johann and Van der Walt, 2020).
For open RDRs to be utilised they must earn the trust of the research communities and demonstrate that they are reliable and capable of appropriately managing the data they hold (Lin et al., 2020). That implies that HLIs should opt for trusted RDRs (Giaretta, 2007) that are indexed by the Registry of Research Data Repositories (re3data) (Boyd, 2021). Thus, HLIs need to ensure that research data collected by their institutional members such as students, staff and researchers are stored and shared via open RDRs to enable the public to access and reuse such data. Due to their potential to provide very substantial and potent datasets, the majority of HLIs worldwide are starting to adopt and implement open RDRs (Thoegersen and Borlund, 2021; Van Wyk Johann and Van der Walt, 2020) through their libraries and Information and Communication Technologies (ICTs) departments. In this regard, there is a need for close collaboration between libraries and ICTs departments for the implementation of open RDRs in HLIs. Beyond the training and workshops guided by libraries and ICTs departments to ensure that academic community members are well informed and capable of using open RDRs for storing and sharing their research data (Gordon et al., 2015; Van Wyk Johann and Van der Walt, 2020), other activities that ensure proper implementation of open RDRs, such as metadata creation and software installation, must be conducted in collaboration between libraries and ICTs departments within HLIs (Mosha and Ngulube, 2023; Van Wyk Johann and Van der Walt, 2020). Thus, this study used the Nelson Mandela African Institution of Science and Technology (NM-AIST) in Tanzania as a case study to explore how open RDRs might enhance storage and research data sharing in HLIs.

Significance of the study
The present study is significant because it addresses a pressing research issue: the need for researchers to store their research data in open RDRs, most of which are available free of charge. Doing so enables the preservation and curation of research data beyond the lifetime of a research project. The study also creates an opportunity for researchers and students in HLIs to gain new knowledge about storing and distributing research data for later use and for producing new research results. In this scenario, reusing research data held in open RDRs saves the time and cost that researchers would otherwise spend going to the field to collect data. Open RDRs facilitate research data publishing, sharing and collaboration of academic research, allowing HLIs to manage and in some cases showcase their research data within and outside their countries. In other words, this study raises awareness of open RDRs and their advantages to researchers and science.

Problem statement
Advances in technology have increased the types of digital data collected and analysed during research projects in many institutions, including HLIs (Thoegersen and Borlund, 2021). There is evidence that research data are saved on personal devices such as laptops, iPads, flash drives, hard drives and tablets, which makes it difficult to keep and share research data (Haixia et al., 2017). The lack of open RDRs in many HLIs, the lack of policies and guidelines to improve the storage and sharing of research data in HLIs, the cost of hosting research data within open RDRs and a lack of knowledge and expertise about the storage and sharing of research data utilising open RDRs are some of the reasons why some HLIs do not properly save and share these data (Donaldson and Koepke, 2022; Hrynaszkiewicz et al., 2021; Uzwyshyn, 2016; Xu et al., 2022). However, little is known about how HLIs in Tanzania utilise open RDRs to store and share research data. Therefore, this study investigated the utilisation of open RDRs for enhanced storage and sharing of research data in HLIs using the NM-AIST as a test case. The specific objectives of the study were to:
(1) Determine the level of awareness and usage of open RDRs amongst postgraduate students.
(2) Establish the extent to which postgraduate students were willing to share research data.
(3) Assess the role of HLIs in supporting research data storage and sharing.
(4) Outline the requirements for trusted open RDRs.
(5) Ascertain the inclusion of research data traceability information in the repository.
(6) Identify the benefits of storing and sharing research data in open RDRs.
(7) Find out the challenges that hinder the storage and sharing of research data in open RDRs.

Research design
This study employed a cross-sectional study design (Creswell and Creswell, 2018; Ponto, 2015). A pretested closed-ended questionnaire was used to collect research data from postgraduate students. Because data collection took place during the COVID-19 period, the questionnaires were circulated online (using individual emails), and 58 (83%) were returned.

Study area
The study was conducted at the NM-AIST, one of the public universities located in Arusha, Tanzania. The institution has four schools, namely the School of Computational and Communication Science Engineering (CoCSE), the School of Materials, Energy, Water and Environmental Sciences (MEWES), the School of Life Sciences and Bioengineering (LiSBE) and the School of Business and Humanities (BuSH). The School of BuSH was not included in this study because it was not offering any postgraduate degree during data collection.

Study population
The study investigated second-year postgraduate students at the NM-AIST who were in the final stages of research data collection, analysis and storage.

Sample size and sampling procedure
The study employed census sampling, which provides an equal chance for all members of a given population to participate in the study. According to Fottrell and Byass (2008), census sampling refers to the quantitative research method in which all the members of the population are enumerated. Furthermore, Gilmore et al. (2022) add that census sampling achieves higher participation of individuals, has a low chance of bias and is more applicable to a small total population. Thus, in this study the whole population, that is, second-year postgraduate students, was included. The main reason for using census sampling was that the target population was small (70 postgraduate students). We also focussed on the second-year students since they were in the middle of their research activities, whilst first-year students were busy with their coursework.

Data collection
Data were collected using questionnaires that included both closed and open-ended question items. Responses from the open-ended questions (qualitative data) were used to supplement responses from the closed-ended questions (quantitative data). The aim was to enable both quantitative and qualitative results to be compared and reported at the same time (Fetters et al., 2013).

Data analysis
The Statistical Package for the Social Sciences (SPSS), also known as the IBM SPSS Statistics software package, was used to analyse the research data.

Ethical consideration
Ethical clearance was obtained from the University of South Africa (UNISA), whilst permission to collect data at the institution was obtained from the NM-AIST. Participants were asked to sign an informed consent form. In addition, respondents were assured that the information collected would be treated confidentially and used purely for research work. Aliases or pseudonyms were used in data analysis to ensure the confidentiality and privacy of the study participants.

Demographic information of respondents
A total of 58 (83%) postgraduate students responded to the questionnaire. Male respondents accounted for 38 (66%), whilst females accounted for 20 (34%). More than half of the respondents (53%) were aged between 31 and 40 years. Table 1 presents the demographic information about respondents. There were no obvious demographic trends in the data.
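The reported response rate and gender split can be reproduced from the counts given in the text. The following is a minimal sketch, not the authors' SPSS procedure; the helper name `percent` and the constant names are assumptions introduced for illustration, and the population of 70 is taken from the sampling section.

```python
def percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


# Counts as reported in the text (the names are illustrative assumptions).
TARGET_POPULATION = 70  # second-year postgraduate students in the census
RESPONDENTS = 58        # returned questionnaires
MALE, FEMALE = 38, 20   # gender split of the respondents

response_rate = percent(RESPONDENTS, TARGET_POPULATION)
male_share = percent(MALE, RESPONDENTS)
female_share = percent(FEMALE, RESPONDENTS)

print(f"Response rate: {response_rate}%")
print(f"Male: {male_share}%, Female: {female_share}%")
```

Note that the response rate uses the target population as its base, whereas the gender shares use the number of respondents.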

Awareness and usage of open RDRs amongst researchers
More than half of the respondents, 32 (55%), were not aware of and were not using open RDRs for research data storage and sharing. Amongst them, 20 (62.5%) out of 32 were storing their research data on their own personal devices such as laptop computers, iPads, flash disks, external hard drives and CDs/DVDs. Figure 1 shows the personal research data storage and sharing facilities used by respondents.
Fewer than half of the respondents, 26 (45%), were aware of and using open RDRs for research data storage and sharing; of these, 9 (35%) were using the Zenodo data repository to store and share their research data. Figure 2 presents the open RDRs used by respondents to store their research data.
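The percentages in this subsection rest on different bases: the awareness figures are shares of all 58 respondents, whereas the personal-device and Zenodo figures are shares of the relevant subgroup. The sketch below makes the bases explicit under that reading; the variable names and dictionary keys are illustrative assumptions, not labels from the study's data file.

```python
def percent(count: int, base: int, digits: int = 1) -> float:
    """Return count/base as a percentage rounded to `digits` decimal places."""
    return round(100 * count / base, digits)


TOTAL_RESPONDENTS = 58
not_aware = 32          # not aware of and not using open RDRs
aware = 26              # aware of and using open RDRs
personal_devices = 20   # subgroup of not_aware storing data on own devices
zenodo_users = 9        # subgroup of aware using Zenodo

shares = {
    "not aware (of all respondents)": percent(not_aware, TOTAL_RESPONDENTS),
    "aware (of all respondents)": percent(aware, TOTAL_RESPONDENTS),
    "personal devices (of not aware)": percent(personal_devices, not_aware),
    "Zenodo (of aware)": percent(zenodo_users, aware),
}

for label, value in shares.items():
    print(f"{label}: {value}%")
```

Rounded to whole numbers, these values match the 55%, 45% and 35% reported above, which supports reading the Zenodo figure as a share of the 26 respondents who used open RDRs.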

Willingness to share research data
A total of 10 (17%) out of 58 respondents were willing to share their research data. Figure 3 presents the willingness to share research data amongst respondents. Conversely, 48 (83%) respondents were not willing to share their research data.

Discussion
The discussion in this study is presented based on the specific objectives as follows:

Awareness and usage of open RDRs amongst researchers
In this study, more than half of the respondents were not aware of the open RDRs for storing and sharing their research data. Most of them were using their personal devices to store and share their research data. Goben and Sandusky (2020) warned that using personal devices has drawbacks because of previous reliance on decentralised servers, laptops, printouts, or hard drives stashed under a desk, which has resulted in ongoing and frequent research data loss. Few respondents were using open RDRs for their research data storage and sharing. Most of them were using the Zenodo repository to store their research data rather than other open RDRs such as Dryad, Figshare and Mendeley. Similar findings were reported by Sicilia et al. (2017), who found that 3,828 research data were kept in Zenodo at the time of data extraction. Besides storing research data, Zenodo allows the publication of any research output, including papers, posters and presentations (Assante et al., 2016).

Willingness to share research data
Many respondents expressed a lack of willingness to share their research data. Only 10 (17%) out of 58 respondents were willing to share their research data. The present study revealed different reasons provided by respondents for not sharing their research data using open RDRs. The majority of them cited a lack of ownership after storing their data in most of the open RDRs, whereas others cited a lack of long-term preservation of research data. The absence of ownership after storing research data in the majority of open RDRs was also supported by other studies (Austin et al., 2015; Bangani and Moyo, 2019; Bezuidenhout, 2019; Michener, 2015; Thoegersen and Borlund, 2021). Austin et al. (2015) added that a lack of ownership rights prevents researchers from depositing their data in repositories for open access. Other studies presented various reasons for not sharing research data. Longo and Drazen (2016) underlined inadequate attribution, whilst Science et al. (2019) discovered a lack of incentives for researchers to share research data as well as an inadequate infrastructure for storage and sharing of research data. Inappropriate use of the shared data, costs associated with data storage and sharing, or the trialist's ability to have a fair opportunity to publish research using the data set first were amongst the hurdles to sharing clinical trial data discovered by Rathi et al. (2012) and Ross and Krumholz (2013). Abele-Brehm et al. (2019) found that researchers have some fears regarding sharing research data; however, such fears are highest amongst early career researchers and lowest amongst professors. It will be interesting to find out from other studies whether such reasons inhibit data sharing in other academic environments.

Role of higher learning institutions in supporting data storage and sharing
The present study findings revealed the need for HLIs to conduct training to impart knowledge and skills on storing and sharing research data using open RDRs. The same observation was made by Austin et al. (2015) and Goben and Sandusky (2020). Various training programmes were indicated, including metadata standards, research data reuse, free access and version control (Austin et al., 2015), as well as uploading and updating data files (i.e. versioning) and the formats and usability of the stored research data (Donaldson and Koepke, 2022).

Requirements for trusted open RDRs
The findings indicated that respondents preferred open RDRs which allow the storage and sharing of different types and formats of research data. Lin et al. (2020) supported the findings presented by respondents that research data types and formats are amongst the criteria for a trusted repository since they can allow that data to be accessible. Faundeen (2017) also recommended that data and file formats that ensure long-term storage are amongst the requirements for trusted open RDRs. Other requirements presented by Lin et al. (2020) were metadata schema, data file formats, controlled vocabularies and ontologies.

Research data traceability information
Research data stored and shared using open RDRs need to be traceable and should be accompanied by storage details. For the research data to be traceable, they should provide information about "when" the research data were created, submitted and published. The findings from this study seem to suggest that the respondents did not pay enough attention to this issue. Traceability information as presented by Assante et al. (2016), including the creation, submission and publication dates of the research data as well as a unique or persistent identifier (such as a URL or PID), was not prevalent in the research data of the respondents. Assante et al. (2016) and Austin et al. (2015) concluded that research data need to be accompanied by Digital Object Identifier (DOI) standards for persistent identification and traceability.
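The traceability requirement described above can be illustrated with a small completeness check on a dataset record. This is a minimal sketch under stated assumptions: the `DatasetRecord` class, its field names and the example identifier are hypothetical and do not correspond to the metadata schema of any particular repository.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical record holding the traceability details named in the text."""
    title: str
    created: Optional[date] = None
    submitted: Optional[date] = None
    published: Optional[date] = None
    persistent_id: Optional[str] = None  # e.g. a DOI, PID or URL


def missing_traceability_fields(record: DatasetRecord) -> List[str]:
    """List the traceability fields that are absent from the record."""
    required = ("created", "submitted", "published", "persistent_id")
    return [name for name in required if not getattr(record, name)]


# Illustrative records; the identifier below is a made-up placeholder.
complete = DatasetRecord(
    title="Example dataset",
    created=date(2022, 1, 10),
    submitted=date(2022, 3, 1),
    published=date(2022, 3, 15),
    persistent_id="10.1234/example-doi",
)
incomplete = DatasetRecord(title="Untracked dataset", created=date(2022, 1, 10))

print(missing_traceability_fields(complete))
print(missing_traceability_fields(incomplete))
```

A check of this kind could be run before deposit so that records lacking dates or an identifier are flagged while the researcher can still supply them.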

Benefits of storing and sharing research data in open RDRs
Respondents mentioned various benefits of storing and sharing research data, with the majority mentioning increased research output and reduced research data waste. Studies by Jeng and He (2022), Piwowar et al. (2007) and Ross and Krumholz (2013) presented various benefits of open RDRs, including improved academic citation, visibility, scholarly influence and future chances. Other benefits include increasing the visibility of research data for citation and reuse by other researchers, reinforcing scientific inquiry by ensuring that enthusiasts and sceptics can test, validate and replicate research results, promoting new research and different ways of testing and analysing research data, saving financial and other resources that are wasted when similar data sets are created by different researchers and enabling new discoveries from old research data sets (Engineering and Physical Sciences Research Council (EPSRC), 2018). Increased datafication of research procedures and infrastructure, encouragement for researchers to publish raw data, data sets and metadata and increased visibility of HLIs were further benefits that were indicated, in line with what is reported in the literature (Kidwell et al., 2016; Nosek et al., 2015; Roche et al., 2014).

Challenges that hinder the storage and sharing of research data in open RDRs
The findings revealed that a lack of research data storage and sharing policies and guidelines posed many difficulties for researchers. A lack of strategic planning for the long-term preservation of research data was also noted by Jeng and He (2022). Moreover, Austin et al. (2015) identified inadequate platform support for file-level metadata descriptions for research data or files and high expenses related to data storage, dissemination, curation, or preservation as some of the difficulties encountered in open RDRs. According to Goben and Sandusky (2020), research data loss is the most frequent threat, followed by improper data handling, loss of data privacy and security, problems with reproducibility and loss of trust from research participants and the public. The existence of policies and guidelines can play a great role in reducing some of the identified difficulties.

Areas for further studies
(1) The use of structured questions limits participants' ability to provide more information and ideas concerning the use of open RDRs for storing and sharing research data. Further studies should include unstructured questions and engage more respondents from different universities.
(2) Examine the factors associated with the application of open RDRs in HLIs in Tanzania and Sub-Saharan Africa (SSA).
(3) Engage library members, especially students, staff and researchers, to assess not only the usage but also the implementation strategies for more open RDRs.
(4) Examine the factors that affect the application of open RDRs for storing and sharing research data in HLIs in Tanzania and Africa.

Conclusion and recommendations
Many respondents were unaware of open RDRs and their functionality. They used their own devices, including laptops, iPads, flash disks and the like, to store their research data. There is a need for HLIs to raise awareness about open RDRs amongst the members of their community, especially as they relate to storing and sharing research data. HLIs should ensure the installation of trusted open RDRs, considering all the requirements for trusted open RDRs. Researchers must make sure their research data are accompanied by traceability information so that the data can be traced and utilised for subsequent research projects. To raise researchers' understanding of why and how they must use open RDRs in HLIs, it is necessary to communicate and promote the benefits of storing and sharing research data for the effective utilisation of open RDRs. The challenges that hinder storing and sharing of research data need to be minimised. Reducing the difficulties associated with sharing research data is good for science and the progress of humanity. It is recommended that:
(1) HLIs should increase awareness of storing and sharing research data using open RDRs.
(2) HLIs should provide training to their researchers to increase the usability of open RDRs for storing and sharing research data.
(3) HLIs should minimise the challenges presented to increase the usage of open RDRs for storing and sharing research data.
This study is a case study. It provides an overview of how researchers at a Tanzanian higher education institution use open RDRs. The results cannot be generalised to other HLIs; however, theories about the utilisation of open RDRs in HLIs can be developed if more case studies are conducted on the subject.

Figure 1. Personal research data storage and sharing facilities (N = 32)

Training and advice on how to utilise open RDRs for storing and sharing research data, as well as help in finding reputable open RDRs, are significant roles that HLIs need to play in facilitating the adoption of open RDRs amongst their users. In this case, HLIs must set up trusted open RDRs that meet requirements such as the capacity to store and facilitate the sharing of various types and formats of research data.

Table 1. Demographic information of respondents

A total of 18 (37.5%) respondents said they did not share research data due to a lack of ownership after storing research data in most of the open RDRs, whereas 10 (20.8%) cited a lack of long-term preservation of research data as their reason for not sharing. Table 2 depicts reasons for not sharing research data amongst respondents.

Table 3. HLIs' roles to support research data storage and sharing

Table 3 provides different HLIs' roles to support research data storage and sharing using open RDRs.

Requirements for trusted open RDRs
Based on the requirements needed for trusted open RDRs, 16 (27.5%) respondents preferred open RDRs which allow the storage and sharing of different types and formats of research data. Table 4 illustrates the requirements for trusted open RDRs expected by the respondents.

Research data traceability information
Research data stored and shared using open RDRs need to be traceable and should be accompanied by storage details. In this scenario, 18 (31.1%) respondents indicated that research data storage details that provide information about "when" the research data were created, submitted and published should be easily traced in open RDRs. Table 5 depicts research data storage information in open RDRs.

Benefits of storing and sharing research data in open RDRs
Respondents mentioned various benefits of storing and sharing research data. A total of 16 (27.6%) respondents mentioned that open RDRs increase research output and reduce research data waste. Table 6 lists the various benefits of storing and sharing research data using open RDRs.

Challenges that hinder the storage and sharing of research data in open RDRs
Challenges that hinder the storage and sharing of research data in open RDRs were presented to the respondents. A total of 16 (27.6%) respondents indicated that a lack of research data storage and sharing policies and guidelines posed many difficulties for researchers. Table 7 presents challenges that hinder the storage and sharing of research data.

Table 5. Research data storage details