An indexing system for the relevance of academic production and research from digital repositories and metadata

Jared David Tadeo Guerrero-Sosa (Facultad de Matematicas, Universidad Autonoma de Yucatan, Merida, Mexico)
Víctor Hugo Menéndez-Domínguez (Facultad de Matematicas, Universidad Autonoma de Yucatan, Merida, Mexico)
María Enriqueta Castellanos-Bolaños (Facultad de Matematicas, Universidad Autonoma de Yucatan, Merida, Mexico)

The Electronic Library

ISSN: 0264-0473

Article publication date: 25 January 2021

Issue publication date: 18 May 2021


Abstract

Purpose

This paper aims to propose a set of quantitative statistical indicators for measuring the scientific relevance of research groups and researchers, based on high-impact open-access digital production repositories.

Design/methodology/approach

An action research (AR) methodology is proposed in which research is associated with practice: research informs practice and practice, in turn, informs research in a cooperative way. AR is divided into five phases, beginning with the definition of the problem scenario and an analysis of the state of the art and ending with testing and publication of the results.

Findings

The proposed indicators were used to characterise group and individual output at a major public university in south-eastern Mexico, where campuses hosting a large number of high-impact research groups account for most of the group-level relevance. These indicators were very useful in generating information that confirmed specific assumptions about the scientific production of the university.

Research limitations/implications

The data used here were retrieved from Scopus and the open-access national repository of Mexico. Other data sources could also be used to calculate these indicators.

Practical implications

The system used to implement the proposed indicators is independent of any particular technological tool and is based on standards for metadata description and exchange, thus facilitating the easy integration of new elements for evaluation.

Social implications

Many organisations evaluate researchers according to specific criteria, one of which is the prestige of the journals in which they publish. Although guidelines differ between evaluation bodies, relevance is measured based on elements that can be adapted, some of which carry greater weight than others, including the prestige of the journal, the degree of collaboration with other researchers and individual production. The proposed indicators can be used by various entities to evaluate researchers and research groups. Each country has its own organisations responsible for evaluation, using various criteria based on the impact of publications.

Originality/value

The proposed indicators assess relevance based on the importance of the publication types and the degree of collaboration. They can also be adapted to other similar scenarios.


Citation

Guerrero-Sosa, J.D.T., Menéndez-Domínguez, V.H. and Castellanos-Bolaños, M.E. (2021), "An indexing system for the relevance of academic production and research from digital repositories and metadata", The Electronic Library, Vol. 39 No. 1, pp. 33-58. https://doi.org/10.1108/EL-06-2020-0160

Publisher


Emerald Publishing Limited

Copyright © 2021, Emerald Publishing Limited


1. Introduction

Technology has developed by leaps and bounds over the past decade. A wide range of disciplines enjoys the benefits that technological advances provide, and science is no exception. Scientific research has become increasingly accessible because of digital repositories, which store and preserve content and act as tools for its dissemination; the resources they hold are entities of a digital or non-digital nature that can be used, reused or referenced in technology-supported learning (Ip et al., 2001). There is a need to describe and locate the resources of a repository, and metadata are used for this purpose. Metadata are descriptive data about data; that is, they provide the minimum information necessary to identify a resource (Senso and Rosa Piñero, 2003).

Certain repositories contain publications that meet a strict series of conditions, which are used to evaluate the degree of relevance of the research (known as indexing). At the international level, Scopus from Elsevier and Web of Science from Clarivate Analytics are considered the most important.

Based on identifiable scientific publications in digital repositories, various indicators have been proposed for evaluating the impact of research. However, some associated problems have arisen, due to their very nature.

A study by Belter (2015) affirmed that the citation index measures only the usefulness of one publication to another; it does not evaluate whether the cited publication represented a real advance in its knowledge area, as a publication can be cited to reference methodologies, figures and examples, or to discredit its results or other content.

Another study (Ranjan, 2017) noted that the journal impact factor (JIF) is widely used to evaluate the prestige of scientific journals but has various limitations, including the ease with which it can be manipulated. As a consequence, experts have proposed alternative indicators, such as the SCImago journal rank, that complement the JIF and consider that a journal should be evaluated based on the quality of its articles rather than the average number of citations received.

In addition, another study (Ding et al., 2020) analysed the limitations of the h-index, g-index, AR-index, p-index, integrated impact indicator and academic trace. It concluded, amongst other recommendations, that the performance of researchers should be measured through a comprehensive academic evaluation composed of a meta-analysis of various indicators and multiple perspectives, and that the influence of a researcher comes from the degree to which the knowledge in their publications is disseminated, i.e. how many people, fields, institutions and regions have cited their work, an aspect that citation-based indicators cannot measure.

In addition to repositories of high-impact scientific production, institutional repositories focus on the storage and dissemination of the scientific production of an institution's staff and are one of the pillars of open access to research (Mary, 2015).

Although the evaluation of scientific production mainly uses repositories of high-impact scientific production as data sources, institutional open-access repositories have also proven useful: disadvantages have been identified in conventional indicators, and institutional repositories allow evaluation from different perspectives (Anabel, 2014). This has been one of the main reasons for the establishment of these platforms in institutions (Rahayu et al., 2019).

The purpose of this article is to propose a set of metrics for the evaluation of both group and individual relevance, which are implemented as part of a system and tested in a case study involving an important public university in Southeastern Mexico.

The indicators include factors such as the weights of the publications assigned by evaluation bodies, the degree of collaboration (institutional, national and international) and the individual publications themselves.

The remainder of the article is divided into the following sections: in Section 2, the foundations that lead to the research work are examined. Section 3 describes the procedure and the proposed indicators used to measure research relevance. Section 4 explains the implementation of the system, with Section 5 using a case study as an example. Section 6 presents a discussion and Section 7 the conclusion and future work.

2. Background

In addition to Scopus and Web of Science, Google Scholar has recently come to be considered another alternative for the evaluation of scientific output, and this database uses its own indicators of scientific productivity. Several studies have compared the characteristics, strengths and weaknesses of citation databases, using Scopus, Web of Science and Google Scholar as references (Bakkalbasi et al., 2006; Falagas et al., 2007; Harzing and Alakangas, 2016; Jacso, 2005). However, the first two have greater credibility in terms of allowing institutions to evaluate the quality and relevance of publications by their researchers.

A systematic review has been carried out of the traditional indicators used by agencies and repositories to measure the impact of publications, such as the journal impact factor, the total impact factor of journals, the citation index, the h-index and the prestige factor (Guerrero Sosa et al., 2018a).

These indicators are used by various entities to evaluate researchers and research groups. Each country has its own organisations that are responsible for evaluation, using various criteria based on the impact of the publications.

Each publication used to evaluate researchers and research groups is described by certain metadata. To use and assign values to metadata, it is necessary to define a standard. Although several such standards exist, only Dublin Core is described in this work, as it is the standard chosen for the proposed system. It is also the metadata standard most widely used to describe digital documents; it has been adopted in various countries in their respective languages and works in conjunction with open access initiatives (OAIs) (Méndez Rodríguez, 2006). The Dublin Core standard defines a base set of 15 metadata elements that provide a simple description of a document's content, intellectual property and instantiation.

The metadata associated with publications can be retrieved via interoperability (i.e. the ability to exchange content between two or more different systems). This concept has been analysed in terms of four fundamental aspects: syntactic, semantic, infrastructural and structural (Gómez Dueñas, 2010). The last of these is the most important in this work, as it refers to communication between heterogeneous repositories through the use of specialised protocols such as OAI-PMH and OpenAIRE.

  • OAI-PMH. This is a low-barrier mechanism for the interoperability of a repository (NSF, 2018). Data providers are repositories that expose structured metadata via OAI-PMH. Service providers are responsible for making OAI-PMH service requests to collect metadata. In practice, it consists of a set of verbs (services) that are invoked within HTTP, based on required, optional or exclusive arguments.

  • OpenAIRE. This is a project based on the open-access policies in use in Europe and provides a means for the promotion and realisation of the widespread adoption of these policies (RECOLECTA, 2018). OpenAIRE is used for the identification of projects, funders, referenced publications and data sets and is based on the Dublin Core syntax, with a uniform resource identifier (URI) defined as the namespace info:eu-repo. The fields in OpenAIRE are labelled as mandatory (M), mandatory when applicable (MA), recommended (R) and optional (O) (RECOLECTA, 2018).

Based on the metadata values, various models have been proposed for the evaluation of scientific production. One model evaluates research performance at the University of Texas Institute for Geophysics through indicators that measure production per year, the number of citations per article and the average age of the citations received by a journal during the evaluation year of the Journal Citation Reports (JCR), classifying scientific production into four categories: mainstream, archive, articles published as conference proceedings and other publications (Frohlich and Resler, 2001).

Based on successive h-indices, another model evaluates the researcher-department-institution hierarchy using the information available in Web of Science (Arencibia-Jorge et al., 2008). The model evaluated the National Centre for Scientific Research of Cuba, mainly using the h-index to evaluate the individual performance of a researcher and complementing it with the g-index and the a-index. In addition, the successive h-indices proposed to evaluate research at the institutional and national levels were used to measure performance at the departmental and institutional levels.

Another model proposes measuring the research potential of a university at the departmental and individual levels (Kotsemir and Shashnov, 2017), using Scopus as a data source. The model was applied to the members of three departments of a Russian university, based on indicators for research potential, collaborations (national and international) and the quality of journals.

This work, unlike those previously mentioned, assesses the production of an institution from two perspectives: research groups and researchers. For high-impact publications, it considers aspects such as collaboration (at the international, national and institutional levels), the level of importance of the type of scientific document, the collaborations of researchers outside their established research group, the weight of a researcher's contribution according to their position in the authorship list and their individual publications; these are complemented by the publications available in open-access repositories. The proposed indicators are set out in the next section.

3. Methodology

An action research methodology is used here, which associates research with practice: research informs practice and practice is responsible for informing research in a cooperative way (Avison et al., 1999). It is divided into five phases, each of which is described in Figure 1 (Guerrero Sosa et al., 2018b).

3.1 Proposed indicators for scientific relevance

A set of indicators for the relevance of research groups and researchers is presented below. It is worth mentioning that the values of the weights for the publications are arbitrary, reflecting the assumption that some types of publications are more important than others. However, they can be adapted to the needs of other entities and countries.

The indexed publications considered here are drawn from Scopus. Equation (1) represents the impact of the research, an indicator that will be used consistently in this document along with other indicators.

(1) $IoR = SJR \times (1 + c)$
where:
  • SJR is the value of the impact factor of the publication (SCImago journal rank, Scopus);

  • c is the total number of citations of the publication.

As mentioned above, when using publications indexed by Scopus, the SCImago journal rank (SJR) indicator is used to measure the impact of the publication (González-Pereira et al., 2010). SJR represents the weighted citations received by the source publication in a given year, based on publications and citations over the three years prior to the evaluation. If a different tool, such as Web of Science, were used, the equivalent indicator would be applied.
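To make equation (1) concrete, the following minimal Python sketch computes IoR for a single publication; the function name and example values are illustrative only:

```python
def impact_of_research(sjr: float, citations: int) -> float:
    """Equation (1): IoR = SJR * (1 + c)."""
    return sjr * (1 + citations)

# A publication in a journal with SJR 0.85 that has received 12 citations
print(impact_of_research(0.85, 12))  # approximately 11.05
```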

3.2 Indicators for research groups

An indicator is proposed that measures the total relevance of a research group (RGR); this is composed of other metrics, which are described below. Equation (2) represents the indicator of the total output of the research group indexed by Scopus (TPS).

(2) $TPS = \sum_{i=1}^{n} (WP_i + SJR_i + IoR_i)$

where:

  1. n is the number of publications by the research group indexed by Scopus;

  2. WP is the weight of the i-th publication. This can take any value, but in this case, is

    • 4 for papers and books;

    • 2 for other publications;

  3. SJR is the impact factor of Scopus for the i-th publication;

  4. IoR is the impact of the research for the i-th publication.

Equation (3) shows the indicator used to measure the degree of collaboration of the research group (DCRG).

(3) $DCRG = \sum_{i=1}^{n} \sum_{j=1}^{m} vc_{i,j}$
where:
  • n is the number of publications by the research group indexed by Scopus;

  • m is the number of external collaborators on the i-th publication;

  • vc is a number assigned to the j-th collaborator on the i-th publication. This can take any value, but in this case, is

    • 1 for a collaborator who is internal to the institution but external to the research group;

    • 2 for a national external collaborator;

    • 4 for a foreign external collaborator.

Equation (4) represents the indicator used to measure the total weight of non-indexed output (TWNIP) for a research group. The weights of the publications must be lower, as they are less relevant than indexed publications.

(4) $TWNIP = \sum_{i=1}^{n} WNIP_i$

where:

  1. n is the total number of publications not indexed by Scopus;

  2. WNIP is the weight of the i-th non-indexed publication. This can take any value but in this case, is:

    • 2 for papers and books;

    • 1 for book chapters.

Finally, equation (5) shows how the results of the previous indicators are combined, giving rise to an equation that measures the total relevance of a research group (RGR):

(5) $RGR = TPS + DCRG + TWNIP$
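As an illustration, equations (2) to (5) could be implemented along the following lines (a Python sketch; the record fields, weight tables and helper function are assumptions made for this example, not part of the published system):

```python
# Illustrative weights and field names (assumptions for this sketch only)
INDEXED_WEIGHTS = {"article": 4, "book": 4}              # any other indexed type weighs 2
NON_INDEXED_WEIGHTS = {"article": 2, "book": 2, "book_chapter": 1}
COLLABORATOR_WEIGHTS = {"internal": 1, "national": 2, "foreign": 4}

def impact_of_research(sjr: float, citations: int) -> float:
    return sjr * (1 + citations)                          # equation (1)

def research_group_relevance(indexed_pubs: list, non_indexed_pubs: list) -> float:
    tps = sum(INDEXED_WEIGHTS.get(p["type"], 2)
              + p["sjr"]
              + impact_of_research(p["sjr"], p["citations"])
              for p in indexed_pubs)                      # equation (2)
    dcrg = sum(COLLABORATOR_WEIGHTS[kind]
               for p in indexed_pubs
               for kind in p["external_collaborators"])   # equation (3)
    twnip = sum(NON_INDEXED_WEIGHTS.get(p["type"], 1)
                for p in non_indexed_pubs)                # equation (4)
    return tps + dcrg + twnip                             # equation (5)

indexed = [{"type": "article", "sjr": 0.85, "citations": 12,
            "external_collaborators": ["national", "foreign"]}]
non_indexed = [{"type": "book_chapter"}]
print(research_group_relevance(indexed, non_indexed))     # approximately 22.9
```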

3.3 Indicators for researchers

In the same way as the RGR indicator for research groups, an indicator that measures the relevance of a researcher based on other metrics is proposed, as presented below. In equation (6), the indicator WOC represents the weight of the collaboration of a given researcher on a publication and is used by the indicators that follow.

(6) $WOC = \begin{cases} 1 & \text{if the researcher is the first or corresponding author} \\ \dfrac{2}{2^{p}} & \text{otherwise} \end{cases}$
where p is the position of the author in the authorship list of the publication.

Equation (7) represents an indicator of the relevance of the researcher within their research group (RIRG).

(7) $RIRG = RGR \cdot \frac{1}{m} \sum_{i=1}^{n} WOC_i$
where:
  1. RGR is the relevance of the research group;

  2. m is the total number of publications of the research group;

  3. n is the number of publications of the research group in which the researcher has participated;

  4. WOC is the weight for the collaboration on the i-th publication.

Equation (8) shows the relevance of the researcher in a collaboration external to the research group (REC).

(8) $REC = \sum_{i=1}^{n} \left[ (WP_i \times WOC_i) + SJR_i + IoR_i + \sum_{j=1}^{m} vc_{i,j} \right]$
where:
  1. n is the number of publications produced through external collaboration;

  2. WP is the weight of the i-th publication. This can take any value, but in this case, is

    • 4 for papers and books;

    • 2 for other publications;

  3. WOC is the weight of the collaboration on the i-th publication;

  4. SJR is the impact factor of Scopus for the i-th publication;

  5. IoR is the research impact of the i-th publication;

  6. m is the number of collaborators on the i-th publication;

  7. vc is a number assigned to the j-th collaborator on the i-th publication. This can take any value, but in this case, is

    • 1 for a collaborator internal to the institution and external to the research group;

    • 2 for a national external collaborator;

    • 4 for a foreign external collaborator.

Individual production is another element that needs to be considered. The weights of the publications are lower, as the evaluation bodies consider collaboration to be a factor of greater importance. The indicator that calculates this quantity (RIP) is shown in equation (9).

(9) $RIP = \sum_{i=1}^{n} (WP_i + SJR_i + IoR_i)$
where:
  • n is the number of individual publications indexed by Scopus;

  • WP is the weight of the i-th individual publication. This can take any value, but in this case, is

    • 2 for papers and books;

    • 1 for other publications;

  • SJR is the impact factor of Scopus for the i-th publication;

  • IoR is the research impact for the i-th publication.

Non-indexed output (NIP) is considered, but only for papers, books and book chapters. The expression for this metric is shown in equation (10).

(10) $NIP = \sum_{i=1}^{n} (WPNI_i \times WOC_i)$
where:
  • n is the number of non-indexed publications;

  • WPNI is the weight of the i-th non-indexed publication. This can take any value, but in this case, is

    • 1 for papers and books;

    • 0.5 for book chapters;

  • WOC is the weight for collaboration on the i-th publication.

Using the indicators listed above, it is possible to calculate the total relevance of the researcher (RR), as shown in equation (11).

(11) $RR = RIRG + REC + RIP + NIP$
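Equations (6) and (11) can likewise be sketched in a few lines of Python; the position rule for WOC follows the reading of equation (6) adopted above, and the function names and example values are illustrative:

```python
def weight_of_collaboration(position: int, is_corresponding: bool = False) -> float:
    """Equation (6): 1 for the first or corresponding author, 2 / 2**p otherwise."""
    if position == 1 or is_corresponding:
        return 1.0
    return 2 / (2 ** position)

def researcher_relevance(rirg: float, rec: float, rip: float, nip: float) -> float:
    """Equation (11): total relevance of the researcher."""
    return rirg + rec + rip + nip

print(weight_of_collaboration(3))                    # third author, not corresponding: 0.25
print(researcher_relevance(81.2, 208.2, 2.3, 0.1))   # approximately 291.8
```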

4. Implementation

Based on the set of indicators for the group and individual relevance presented above, a system that calculates these metrics was implemented as illustrated below, from the recovery and processing of data to the calculation and presentation of the results. Figure 2 shows a general architecture for the representation of the scientific relevance of a researcher, in which the indicators are calculated using information retrieved from digital repositories.

The architecture is designed in layers to facilitate the flow of information in an orderly manner. Each layer has a specific function, as described below.

  1. User interface. This presents the information obtained by the data services from the data sources.

  2. Services. This is made up of modules that work together to present the results via the user interface.

    • Calculate indicators. This module obtains the values of the relevant indicators for both the research groups and researchers based on their publications and the values of the associated metadata.

    • Locate collaborations. This module analyses the publications and collaborations to find existing collaborating groups within an institution.

    • Ontological model. An ontological model was built for the representation of each concept in the system (Guerrero-Sosa et al., 2019b), and this acts as a tool for queries and the inference of new information.

      • SPARQL queries. This module is responsible for obtaining relevant information from the data stored in the ontology, which are presented to the user via the interface;

      • SWRL rules. This module executes rules over the ontology, applying the consequent part of each rule to the instances that meet the conditions established in its antecedent part.

  3. Information storage schemes. This contains the tools used to store the information about the researchers and their output, as described below:

    • OWL. This is a markup language for the representation and dissemination of knowledge through ontologies.

    • Ontological database. This stores documents in OWL and RDF formats, amongst others.

    • Document-based database. Unlike a relational database, this stores information in documents, making it easier to transfer data to an ontology or to store semi-structured data (Guerrero-Sosa et al., 2020).

  4. Information acquisition manager. This layer contains data acquisition engines that connect directly to the various corresponding repositories, either via the harvesting of metadata with OAI-PMH or an API.

  5. Origin of the data. Contains repositories that store information about the academic and scientific production of researchers.

4.1 Acquisition of information

The proposed indicators were implemented as a web application that takes as its input data publications stored in Scopus (for indicators related to indexed output) and data about publications stored in the national repository of Mexico (for indicators related to open access production; i.e. not indexed). This repository is made up of more than 100 repositories of institutions within Mexico.

For this task, it is necessary to consider some existing problems associated with the acquisition of information from bibliographic databases.

At the institutional level, each researcher has a unique identifier for easy location. However, with the increasing appearance of systems such as repositories, different identifiers are used for the same researcher, creating the challenge that identifiers are not unique to a single system and that all the identifiers of a researcher must be related to each other (Jörg et al., 2012), giving rise to proposed identifiers such as ResearcherID and ORCID (Mazov and Gureyev, 2018).

One study identified the causes of the multiplicity of profiles in Scopus by evaluating 400 Russian researchers and 400 Russian institutions. It found that duplicate institution profiles are mainly due to translation, typographical and address errors, while duplicate author profiles are due to spelling errors in names and surnames and the merging of first and last names; these elements prevent an adequate scientific evaluation (Selivanova et al., 2019).

In this work, databases containing descriptions of researchers and research groups are analysed to process all the information from the repositories. Table 1 lists the data stored about researchers and research groups.

To solve the problems associated with multiplicity, the Scopus API was first used to retrieve the following author information:

  • Full name;

  • Scopus ID of the institution to which he or she belongs;

  • University from which the researcher’s last degree was awarded (due to the anomaly that arises when an author is not registered in Scopus with his or her current institution, but his or her output is).

From the results of each query, one or more Scopus IDs are chosen for the author. A dictionary is created with the name of each university faculty, associating it with areas of knowledge in Scopus (Box 1).

Box 1

Example dictionary of faculties

ENGINEERING: Engineering (all), Materials Science (all), Mathematics (all), Computer Science (all).

CHEMICAL ENGINEERING: Chemistry (all), Materials Science (all), Engineering (all), Chemical Engineering (all).

MATHS: Computer Science (all), Engineering (all), Mathematics (all).

MEDICINE: Medicine (all), Immunology and Microbiology (all), Nursing (all), Agricultural and Biological Sciences (all).

To select the appropriate publications, five conditions were used, based on the metadata of each result: the name of the researcher, the institution to which he or she belongs, the university that awarded the researcher's last degree and the area of study. The five conditions are as follows (a sketch of the matching logic appears after this list):

  1. The name of the researcher resulting from the query consists of two first names and two surnames and completely matches the full name of the researcher in the database.

  2. The query returns the two surnames of the researcher separated by a hyphen, together with one of his or her first names.

  3. The query returns the two surnames of the researcher separated by a hyphen, together with one of his or her abbreviated names.

  4. In the database, the full name of the researcher consists of a first name and a surname, but the result of the query returns it as surname and first name and the area of knowledge is associated with that of the faculty of the researcher.

  5. The query returns the investigator’s middle name and first surname and his or her area of knowledge is associated with that of the investigator’s faculty.
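For illustration, a hedged Python sketch of two of these conditions follows; the normalisation, function names and example values are assumptions made for the example:

```python
def _norm(name: str) -> str:
    """Lower-case and collapse whitespace before comparison."""
    return " ".join(name.lower().split())

def full_match(db_name: str, scopus_name: str) -> bool:
    """Condition 1: the Scopus name completely matches the full name in the database."""
    return _norm(db_name) == _norm(scopus_name)

def hyphenated_surnames_match(db_surnames: list, db_given_names: list,
                              scopus_name: str) -> bool:
    """Condition 2: Scopus returns both surnames joined by a hyphen plus one given name."""
    parts = scopus_name.lower().replace(",", " ").split()
    hyphenated = "-".join(s.lower() for s in db_surnames)
    return hyphenated in parts and any(n.lower() in parts for n in db_given_names)

print(hyphenated_surnames_match(["Perez", "Lopez"], ["Maria", "Fernanda"],
                                "Perez-Lopez, Maria"))  # True
```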

Each Scopus ID associated with a researcher is stored in the documentary database. For each researcher, their production is retrieved via the Scopus Search API, using the researcher's Scopus ID and the ID associated with their institution as parameters, to exclude any output that the researcher did not create as a member of this institution. If the publication record already exists in the database, it is updated by adding the researcher as another author; otherwise, a new publication record is stored in the database.
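A hedged sketch of this retrieval step is shown below, using the AU-ID and AF-ID query fields of the Scopus Search API as documented by Elsevier; the API key, identifiers and printed fields are placeholders, not values from the system described here:

```python
import requests

API_KEY = "YOUR-ELSEVIER-API-KEY"                      # placeholder
SEARCH_URL = "https://api.elsevier.com/content/search/scopus"

def publications_for(author_id: str, affiliation_id: str) -> list:
    """Retrieve the output of one author restricted to one affiliation."""
    params = {"query": f"AU-ID({author_id}) AND AF-ID({affiliation_id})", "count": 25}
    headers = {"X-ELS-APIKey": API_KEY, "Accept": "application/json"}
    response = requests.get(SEARCH_URL, params=params, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json().get("search-results", {}).get("entry", [])

for pub in publications_for("57000000000", "60000000"):  # placeholder identifiers
    print(pub.get("dc:title"), "-", pub.get("prism:publicationName"))
```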

For indicators of non-indexed publications, records of open-access output are retrieved from the national repository of Mexico using the OAI-PMH protocol (Lagoze et al., 2005). OAI-PMH allows the user to harvest metadata through its verbs. The repository was queried using the ListRecords verb (which retrieves information about each resource in the repository), the metadata were analysed and the records of publications by one or more researchers at the institution were stored (Guerrero et al., 2019). The national repository uses the Dublin Core standard to describe the essential basic information about its resources (CONACYT, 2018).
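The harvesting step can be sketched as follows (Python, using the standard OAI-PMH ListRecords verb and Dublin Core namespaces; the endpoint URL and the author list are placeholders):

```python
import requests
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "https://example-repository.mx/oai/request"   # placeholder endpoint
NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

def harvest_records(endpoint: str):
    """Yield Dublin Core title/creator data, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        root = ET.fromstring(requests.get(endpoint, params=params, timeout=60).content)
        for record in root.iter(f"{{{NS['oai']}}}record"):
            title = record.find(f".//{{{NS['dc']}}}title")
            creators = [c.text for c in record.iter(f"{{{NS['dc']}}}creator")]
            yield {"title": title.text if title is not None else None,
                   "creators": creators}
        token = root.find(f".//{{{NS['oai']}}}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Keep only the records with at least one author from the institution (names assumed known)
institution_authors = {"Perez Lopez, Maria"}
matches = [r for r in harvest_records(OAI_ENDPOINT)
           if any(a in institution_authors for a in r["creators"])]
```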

4.2 Calculation of indicators for research groups

For the output stored in the database, queries are made to calculate the indicators of the research groups. To obtain TPS and DCRG, information on the indexed output is used. The procedure for this is illustrated in Figure 3 and is as follows:

  1. For each member listed for the research group, obtain the unique ID and compute the ordinary combinations of pairs of members [equation (12)]:

    (12) $C_{n,2} = \frac{n!}{2!\,(n-2)!}$
    where n is the number of members of the research group.

  2. For each combination, a query is made to the database to retrieve the output belonging to the pair of authors in question, as a paper is considered to be a publication of the research group if it was written by at least two of its members (a sketch of this detection step follows this list).

  3. If the record has not already been detected for a previous combination, the paper is counted as a publication by the research group.

  4. For each publication by the research group, consider the type of paper and the values of SJR and IoR to obtain the value of TPS.

  5. In the same publications, identify the Scopus IDs of the members of the research group in the scopus_id_authors field. As this field is an array, the array positions of these researchers are located and excluded, leaving only the people who do not belong to the research group. Based on these positions, locate the institutions to which the collaborators belong, using the affiliations_authors field. Each element of the array is a vector describing a research institution. If position two of the vector has a value other than "Mexico", the author is considered to be a foreign collaborator. If "Mexico" appears in position two, the value in position one, which contains the Scopus ID of the researcher's institution, is verified: if it is not a Scopus institution ID associated with the university, the author is considered to be a national collaborator external to the institution; if it is the ID of the university, the researcher is considered to be a collaborator within the same institution but external to the research group.

  6. Based on the weight of each collaborator in each publication, the value of DCRG can be obtained.

  7. To obtain TWNIP from the ordinary combinations, the non-indexed production database is consulted using the full names of the researchers.

  8. If the publication is considered to be produced by the research group, calculate and assign the corresponding weight based on the type of paper.

  9. Add the weight values for each non-indexed publication to obtain the TWNIP.

  10. Based on the previous indicators, calculate the value of the relevance of the research group (RGR).
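The pair-based detection in steps 1 to 3 and the collaborator classification in step 5 can be sketched as follows; the database access function and record fields are hypothetical:

```python
from itertools import combinations

def group_publications(member_ids: list, find_pubs_by_pair) -> dict:
    """Steps 1-3: a paper belongs to the group if at least two members co-authored it."""
    detected = {}
    for author_a, author_b in combinations(member_ids, 2):   # equation (12): C(n, 2) pairs
        for pub in find_pubs_by_pair(author_a, author_b):    # hypothetical database query
            detected.setdefault(pub["scopus_eid"], pub)      # count each record only once
    return detected

def classify_collaborator(country: str, institution_id: str, university_ids: set) -> int:
    """Step 5: weight vc of an external collaborator."""
    if country != "Mexico":
        return 4            # foreign collaborator
    if institution_id not in university_ids:
        return 2            # national collaborator, external to the institution
    return 1                # same institution, external to the research group
```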

4.3 Calculation of indicators for researchers

Based on the output stored in the database, queries are made to calculate the indicators for the researchers. To obtain REC and RIP, the indexed production information is used. For the NIP indicator, information from the non-indexed production database is used. The procedure used is illustrated in Figure 4 and is as follows:

  • To obtain RIRG, the publications to which the researcher contributed are retrieved from the indexed and non-indexed databases. The value of the RGR of the researcher's research group is also obtained.

  • Based on RGR, RIRG is calculated.

  • A query is made to retrieve the scientific publications that are indexed by Scopus but are not associated with the research group.

  • To calculate REC, the publications indexed by Scopus that were written as part of collaborations external to the research group are recovered. If the researcher does not belong to a research group, all collaborative output is reflected in this indicator.

  • RIP is obtained by retrieving the output created on an individual basis.

  • A query is made to retrieve the scientific output that is not indexed by Scopus and that is not associated with the research group.

  • NIP is calculated when consulting the non-indexed production database.

  • Using the previous indicators, the total relevance of the researcher (RR) is calculated.

The results are presented via the system user interface. Figure 5 shows the home page, which contains links to the results for the indicators, displayed in the form of treemaps in which the value of the indicator is represented by the size of the box (the larger the box, the greater the value of the indicator). For example, Figures 6 and 7 show the relevance of the research groups (RGR) and researchers (RR), respectively.

5. Case study

To validate the proposed scheme, a case study of Universidad Autónoma de Yucatán (UADY), the most prominent educational institution in Southeastern Mexico, was carried out. This institution has 15 faculties distributed over five campuses, and a research centre focussed on two areas of study, as shown in Figure 8.

As of 25 November 2019, more than 26,000 students were following high school, undergraduate and postgraduate educational programmes at UADY (2019).

As of 1 June 2018, UADY had 78 research groups and 824 full-time professors. In Mexico, the Faculty Improvement Programme (PROMEP, for its acronym in Spanish) evaluates research groups, assigning them a status according to their level of consolidation, which is based, amongst other factors, on their scientific production; it also evaluates full-time professors, granting them the PROMEP profile, which recognises their scientific and educational work (DGESU, 2020). Another Mexican evaluating body is the National Research System (SNI, for its acronym in Spanish), which recognises national researchers according to their scientific relevance (in order of increasing importance: candidate, level 1, level 2 and level 3) (CONACYT, 2020).

A study of the collaborations between UADY researchers has been carried out using graph theory (Guerrero-Sosa et al., 2019a). Figure 9 shows the research groups by campus and by the level of consolidation, and it can be seen that the campuses for Exact Sciences and Engineering and Social Sciences, Economic Administration and Humanities have the largest number of research groups, with 21 each.

Figure 10 presents a summary of the professors at UADY by campus and gender. It can be seen that the two campuses with the highest numbers of academics are Exact Sciences and Engineering and Social Sciences, Economic Administrative and Humanities, with 232 and 226, respectively.

For each of the 824 full-time professors, their output was retrieved from Scopus and the repositories associated with the national repository. It was found that 438 (53.15%) had scientific publications stored in these repositories. Figure 11 shows a summary, by campus, of the number of professors with scientific publications and with PROMEP and SNI accreditation.

Figure 12 shows the distribution of the types of publications. The results show that articles represent the majority of the publications and books the lowest percentage. Other types of publications include letters to the editor, editorials and reviews.

From the authors’ affiliations described in the retrieved information from Scopus, the five Mexican institutions (Figure 13), the five countries (Figure 14) and the five foreign institutions (Figure 15) with the most collaborations with the UADY (where collaboration was defined as participation by an external author on a publication) and the publications arising from these collaborations were obtained.

Based on these results, the proposed indicators for research groups and researchers were calculated. The results are presented below.

5.1 Research groups

Table 2 summarises the status of production and relevance of the research groups within UADY. It can be observed that in all cases, the maximum values are very far from the average.

Figure 16 presents a treemap of the accumulated values of the RGR indicator. Although the numbers of research groups on the two campuses of Exact Sciences and Social Sciences were the same, it was found that the greatest relevance (RGR) was concentrated at the Biological Sciences campus. This was followed by the Exact Sciences campus, the Biomedical Unit Research Centre, the Social Sciences campus and, finally, the Health Sciences campus, from higher to lower relevance. However, no production was found that would enable this indicator to be evaluated for the research groups at the Research Centre for Social Sciences Unit.

Figure 17 shows the indicator for the total relevance of each research group within the UADY, taking into account the degree of consolidation. Figure 17(a) shows that of the nine research groups in formation, production was found in the repositories of only three. The most relevant was that of the CIR-Biomédicas, with an RGR of 26. Figure 17(b) shows the research groups in consolidation, which were 59.37% productive (i.e. 19 of 32 research groups) and these were distributed as follows: two in Biological Sciences, seven in Exact Sciences, eight in Health Sciences and two in CIR-Biomedical. The most relevant research group in consolidation was associated with the CIR-Biomedical, with an RGR of 967. Finally, Figure 17(c) shows the consolidated research groups, with 78.94% productivity (i.e. 30 of 38 research groups), which were distributed as follows: 8 in Biological Sciences, 10 in Exact Sciences, 4 in Health Sciences, 4 in the CIR-Biomedical and 4 in Social Sciences. The most relevant consolidated research group was associated with Exact Sciences, with an RGR of 3,057.

In addition, a graph of the collaboration between research groups is presented based on these results (Figure 18). Each vertex represents a research group, and each edge indicates that there is at least one collaboration between the groups. The colour of each vertex depends on the faculty. It can be observed that a large proportion of the research groups have not collaborated with others. The body with the greatest number of collaborations with others is Biomedicine for Infectious and Parasitic Diseases, part of the Research Centre for Biomedicine Unit, which has collaborated with Animal Health (Faculty of Veterinary Medicine); Endemic, Emerging and Re-emerging Diseases in the Tropical Region (Faculty of Medicine); Reproductive and Genetic Health (Research Centre for Biomedicine Unit); and Microbiology, Pathology and Dental Molecular Biology (Faculty of Dentistry).

5.2 Researchers

Table 3 summarises the production status and relevance of the researchers. It can be observed that in all cases, the maximum values are very far from the average.

Figure 19 shows an example of the RR values obtained for the most relevant researcher from each campus.

Figure 20 presents a treemap of the accumulated values of the researcher relevance indicator (RR) and its percentage with respect to the sum of all the indicator values for each teacher. Unlike the research groups, the highest score for relevance at UADY is achieved by the Exact Sciences campus, followed by the Biological Sciences campus. In this case, the campus with the highest number of professors is the one with the most scientific relevance. In addition, there is some production by the professors at the Research Centre for Social Sciences Unit.

A graph of the collaboration between researchers can be created based on the results and this is shown in Figure 21. Each vertex represents a teacher and each edge indicates that at least one collaboration between teachers could be found in the products stored in Scopus and in the open access repositories. The colour of each vertex depends on the faculty. It can be observed that a large proportion has collaborated with other professors.

6. Discussion

According to the results presented above, the Exact Sciences and Biological Sciences campuses represent the majority of the relevance of both the research groups and the researchers at UADY. The Exact Sciences campus contains the largest number of full-time professors and the largest number of research groups (together with the Social Sciences campus).

However, a higher number of elements does not always mean that the accumulated value of the indicators will be greater. The Biological Sciences campus offers proof of this, as despite having 10 research groups (less than half of those on the Exact Sciences campus) and 84 researchers, it represents most of the relevance of the research groups and is the second most relevant in terms of researchers.

The three campuses with the largest numbers of consolidated research groups are, in descending order: Exact Sciences (10), Social Sciences (9) and Biological Sciences (8). For the first and third of these, it can be inferred that there is a relationship between the high value of the relevance indicator and the high level of accreditation by PROMEP. However, the Biomedicine Unit Research Centre has greater relevance than the Social Sciences Campus, even though it has only four consolidated research groups. No production was found for the three research groups within the Social Sciences Unit Research Centre, so it was not possible to carry out this evaluation.

In terms of researchers, the Exact Sciences campus represents most of the relevance, followed by the Biological Sciences campus. Unlike the research groups, the production of the Social Unit Research Centre has the least scientific relevance of the UADY researchers.

For the Social Sciences campus, only 23.8% of the full-time professors had production stored in Scopus and in the national repository, while for the Health Sciences campus, the figure was 54%. As a consequence, the relevance of both their research groups and their individual researchers is low compared to the Exact Sciences and Biological Sciences campuses.

The indicators proposed here both for research groups and researchers are based on the type of publication, the degree of collaboration and to a lesser extent, the non-indexed production (in this case with reference to Scopus). Weights are arbitrarily assigned to the type of publication and by each external collaborator, corresponding to the level of interest for the Mexican evaluating organisations (SNI and PROMEP). However, these weights can be adapted to the needs of organisations in other countries and new aspects may also be evaluated.

7. Conclusions and future work

This article has presented a set of indicators for measuring the relevance of research groups and researchers, based on the products stored in repositories of indexed and open-access publications. The use of web services and of the OAI-PMH protocol for interoperability between repositories was proposed for the recovery of scientific production.

An action research methodology consisting of five phases, from the definition of the problem scenario to the publication of the results, was applied throughout the investigation process.

Both group and individual production within a Mexican public university were studied based on the proposed indicators. In general, campuses containing a high number of high-impact research groups were found to account for most of the group-level scientific relevance. In addition, researchers at the Exact Sciences and Biological Sciences campuses represent the majority of the university's research impact, despite the fact that the Biological Sciences campus has fewer researchers than the Health Sciences campus.

In future work, it will be necessary to define an indicator that measures the usefulness of the citations received by a research group or a researcher and to adjust the indicator model to values ranging from 0 to 100, thereby making the decisions and evaluations that rely on the proposed indicators more transparent and easier to understand. Finally, the use of other data sources, such as Google Scholar and Web of Science, could broaden the research landscape.

Figures

Figure 1. Action research methodology used in the research

Figure 2. Architecture of the proposed system

Figure 3. Procedure for calculating indicators for research groups

Figure 4. Procedure for calculating investigator indicators

Figure 5. Main page of the user interface

Figure 6. Visualisation of the results for the relevance of research groups (RGR)

Figure 7. Visualisation of the results for the relevance of researchers (RR)

Figure 8. Campuses and their faculties in UADY

Figure 9. Research groups at UADY by campus and degree of consolidation

Figure 10. Professors at UADY by campus and gender

Figure 11. UADY professors with scientific publications with accreditation PROMEP and SNI

Figure 12. Scientific publications by UADY professors stored in Scopus and the national repository

Figure 13. Mexican institutions with most collaborations with UADY

Figure 14. Countries with most collaborations with UADY

Figure 15. Foreign institutions with most collaborations with UADY

Figure 16. Treemap of the indicator of the relevance of research groups (RGR) by campus

Figure 17. Values of the RGR indicator for research groups at UADY: (a) research groups at the formation stage; (b) research groups in consolidation and (c) consolidated research groups

Figure 18. Collaborations between research groups

Figure 19. Values of RR for the most relevant researcher on each campus

Figure 20. Treemap of the total researcher relevance indicator (RR) by campus

Figure 21. Collaboration between researchers

Table 1. Required information about researchers and research groups

Researchers:

  • Full name

  • Faculty or area of knowledge

  • University from which the researcher's last degree was gained

Research groups:

  • Name

  • Faculty or area of knowledge

  • Line of research

  • List of members

  • Degree of consolidation

Table 2. Production status and relevance of research groups within UADY

Indicators Minimum value Average Maximum
Number of members 3 4.74 11
Number of indexed publications 0 10.88 102
Number of non-indexed publications 0 0.37 9
Number of total publications 0 11.25 103
TPS 0 180.84 2,373
DCRG 0 81.44 750
TWNIP 0 0.73 18
RGR 0 263.03 3,057

Table 3. Production status and relevance of researchers

Indicators Minimum value Average Maximum
Number of publications 1 11.42 157
RIRG 0 81.21 2,937
REC 0 208.17 3,210
RIP 0 2.31 178
NIP 0 0.146 6
RR 0.015 292.5 4,361

References

Anabel, B.C. (2014), “Institutional repositories as complementary tools to evaluate the quantity and quality of research outputs”, Library Review, Vol. 63 Nos 1/2, pp. 46-59.

Arencibia-Jorge, R., Barrios-Almaguer, I., Fernández-Hernández, S. and Carvajal-Espino, R. (2008), “Applying successive H indices in the institutional evaluation: a case study”, Journal of the American Society for Information Science and Technology, Vol. 59 No. 1, pp. 155-157.

Avison, D., Lau, F., Myers, M. and Nielsen, P.A. (1999), “Action research”, Communications of the Acm, Vol. 42 No. 1, pp. 94-97.

Bakkalbasi, N., Bauer, K., Glover, J. and Wang, L. (2006), “Three options for citation tracking: Google scholar, scopus and web of science”, Biomedical Digital Libraries, Vol. 3 No. 1, available at: https://doi.org/10.1186/1742-5581-3-7.

Belter, C.W. (2015), “Bibliometric indicators: opportunities and limits”, Journal of the Medical Library Association: Jmla, Vol. 103 No. 4, pp. 219-221.

CONACYT (2018), “Interoperabilidad con el metabuscador del repositorio nacional”, available at: www.repositorionacionalcti.mx/docs/manualesInteroperabilidad/manual_de_Interoperabilidad_Repositorio_Nacional_ver.3.pdf (accessed 11 April 2020).

CONACYT (2020), “Sistema nacional de investigadores”, available at: www.conacyt.gob.mx/index.php/el-conacyt/sistema-nacional-de-investigadores (accessed 20 March 2020).

DGESU (2020), “Programa Para el desarrollo profesional docente, Para el tipo superior (PRODEP)”, available at: www.dgesu.ses.sep.gob.mx/PRODEP.htm (accessed 20 March 2020).

Ding, J., Liu, C. and Kandonga, G.A. (2020), “Exploring the limitations of the h-index and h-type indexes in measuring the research performance of authors”, Scientometrics, Vol. 122 No. 3, pp. 1303-1322.

Falagas, M.E., Pitsouni, E.I., Malietzis, G.A. and Pappas, G. (2007), “Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses”, The FASEB Journal, Vol. 22 No. 2, available at: https://doi.org/10.1096/fj.07-9492lsf.

Frohlich, C. and Resler, L. (2001), “Analysis of publications and citations from a geophysics research institute”, Journal of the American Society for Information Science and Technology, Vol. 52 No. 9, pp. 701-713.

Gómez Dueñas, L. (2010), “Modelos de interoperabilidad En bibliotecas digitales y repositorios documentales: Caso biblioteca digital colombiana”, available at: http://eprints.rclis.org/14878/1/MODELOS_DE_interoperabilidad_BDCOL.pdf (accessed 7 March 2018).

González-Pereira, B., Guerrero-Bote, V.P. and Moya-Anegón, F. (2010), “A new approach to the metric of journals’ scientific prestige: the SJR indicator”, Journal of Informetrics, Vol. 4 No. 3, pp. 379-391.

Guerrero, J., Sánchez, D., Menéndez, V., Castellanos, M.E. and Gómez, J. (2019), “Tools for interoperability between repositories of digital resources”, in Gómez Chova, L., López Martínez, A. and Candel Torres, I. (Eds), Proceedings of INTED '19, IATED, Valencia, pp. 6292-6300.

Guerrero Sosa, J.D.T., Menéndez Domínguez, V.H. and Castellanos Bolaños, M.E. (2018a), “Indicadores de calidad en investigaciones científicas: Antecedentes”, Abstraction and Application, Vol. 19, pp. 6-24.

Guerrero Sosa, J.D.T., Menéndez Domínguez, V.H. and Castellanos Bolaños, M.E. (2018b), “Sistema de índices Para valorar la calidad de la producción académica y la investigación, a partir de repositorios digitales y metadatos”, in Prieto-Méndez, M.E., Pech-Campos, S.J. and Francesa-Alfaro, A. (Eds), X Conferencia Conjunta Internacional Sobre Tecnologías y Aprendizaje, CIATA.org-UCLM, Cartago, pp. 45-52.

Guerrero-Sosa, J.D.T., Menéndez-Domínguez, V., Castellanos-Bolaños, M.E. and Curi-Quintal, L.F. (2019a), “Use of graph theory for the representation of scientific collaboration”, in Nguyen, N.T. (Ed.), 11th International Conference on Computational Collective Intelligence, Springer, Hendaye.

Guerrero-Sosa, J.D.T., Menéndez-Domínguez, V., Castellanos-Bolaños, M.E. and Gómez-Montalvo, J. (2019b), “Use of an ontological model to assess the relevance of scientific production”, IEEE Latin America Transactions, Vol. 17 No. 9, pp. 1424-1431.

Guerrero-Sosa, J.D.T., Menéndez-Domínguez, V.H., Castellanos-Bolaños, M.-E. and Moo-Mena, F. (2020), “Document database for scientific production”, in Moo-Mena, F. and Duarte, E.P. (Eds), 8th International Workshop on ADVANCEs in ICT Infrastructures and Services, Cancún, Mexico, pp. 129-132.

Harzing, A.-W. and Alakangas, S. (2016), “Google scholar, scopus and the web of science: a longitudinal and cross-disciplinary comparison”, Scientometrics, Vol. 106 No. 2, pp. 787-804.

Ip, A., Morrison, I. and Currie, M. (2001), “What is a learning object, technically?”, in Fowler, W. and Hasebrook, J. (Eds), Proceedings of WebNet '01- World Conference on the WWW and Internet, Association for the Advancement of Computing in Education (AACE), pp. 580-586.

Jacso, P. (2005), “As we may search - Comparison of major features of the web of science, scopus, and google scholar citation-based and citation-enhanced databases”, Current Science, Vol. 89 No. 9, pp. 1537-1547.

Jörg, B., Höllrigl, T. and Sicilia, M.A. (2012), “Entities and identities in research information systems”, in Jeffery, K.G. and Dvořák, J. (Eds), E-Infrastructures for Research and Innovation - Linking Information Systems to Improve Scientific Knowledge Production – 11th International Conf. on Current Research Information Systems.

Kotsemir, M. and Shashnov, S. (2017), “Measuring, analysis and visualization of research capacity of university at the level of departments and staff members”, Scientometrics, Vol. 112 No. 3, pp. 1659-1689.

Lagoze, C., Van de Sompel, H., Nelson, M. and Warner, S. (2005), “Open archives initiative - Protocol for metadata harvesting – Guidelines for repository implementers”, available at: www.openarchives.org/OAI/2.0/guidelines-repository.htm (accessed 6 November 2017).

Mary, M.R. (2015), “The role of institutional repositories in developing the communication of scholarly research”, OCLC Systems and Services: International Digital Library Perspectives, Vol. 31 No. 4, pp. 163-195.

Mazov, N.A. and Gureyev, V.N. (2018), “Modern challenges in bibliographic metadata identification”, 3rd Russian-Pacific Conference on Computer Technology and Applications (RPC '18), pp. 1-4.

Méndez Rodríguez, E. (2006), “Dublin core, metadatos y vocabularios”, El Profesional de La Informacion, Vol. 15 No. 2, pp. 84-86.

NSF (2018), “Open archives initiative protocol for metadata harvesting”, available at: www.openarchives.org/pmh/ (accessed 1 April 2020).

Rahayu, S., Laraswati, D., Pratama, A.A., Permadi, D.B., Sahide, M.A.K. and Maryudi, A. (2019), “Research trend: Hidden diamonds – the values and risks of online repository documents for Forest policy and governance analysis”, Forest Policy and Economics, Vol. 100, pp. 254-257.

Ranjan, C.K. (2017), “Bibliometric indices of scientific journals: Time to overcome the obsession and think beyond the impact factor”, Medical Journal Armed Forces India, Vol. 73 No. 2, pp. 175-177.

RECOLECTA (2018), “OpenAIRE”, available at: https://recolecta.fecyt.es/open-aire (accessed 8 March 2018).

Selivanova, I.V., Kosyakov, D.V. and Guskov, A.E. (2019), “The impact of errors in the Scopus database on the research assessment”, Scientific and Technical Information Processing, Vol. 46 No. 3, pp. 204-212.

Senso, J.A. and Rosa Piñero, A. D L. (2003), “El concepto de metadato: algo más que descripción de recursos electrónicos”, Ciência Da Informação, Vol. 32 No. 2, pp. 95-106.

UADY (2019), “Universidad autónoma de yucatán”, available at: www.uady.mx/nuestra-universidad (accessed 25 March 2020).

Acknowledgements

The current work has been developed thanks to the support by Consejo Nacional de Ciencia y Tecnología (CONACYT, Mexico) through the grant: 853088/630948.

Corresponding author

Víctor Hugo Menéndez-Domínguez can be contacted at: mdoming@correo.uady.mx
