Set up a supply chain observatory through the comparison of multi-criteria parsimonious methods

Purpose – This study presents the development of a supply chain (SC) observatory, which is a benchmarking solution to support companies within the same industry in understanding their positioning in terms of SC performance.
Design/methodology/approach – A case study is used to demonstrate the set-up of the observatory. Twelve experts on automatic equipment for the wrapping and packaging industry were asked to select a set of performance criteria taken from the literature and evaluate their importance for the chosen industry using multi-criteria decision-making (MCDM) techniques. To handle the high number of criteria without requiring a high amount of time-consuming effort from decision-makers (DMs), five subjective, parsimonious methods for criteria weighting are applied and compared.
Findings – A benchmarking methodology is presented and discussed, aimed at DMs in the considered industry. Ten companies were ranked with regard to SC performance. The ranking solution of the companies was on average robust since the general structure of the ranking was very similar for all five weighting methodologies, though simplified-analytic hierarchy process (AHP) was the method with the greatest ability to discriminate between the criteria of importance and was considered faster to carry out and more quickly understood by the decision-makers.
Originality/value – Developing an SC observatory usually requires managing a large number of alternatives and criteria. The developed methodology uses parsimonious weighting methods, providing DMs with an easy-to-use and time-saving tool. A future research step will be to complete the methodology by defining the minimum variation required for one or more criteria to reach a specific position in the ranking through the implementation of a post-fact analysis.


Introduction
In today's competitive environment, characterized by high innovation, competitive advantages can be constantly eroded. This means that companies should understand their reference market and industry performance standards as much as possible, in order to be able to evolve in line with them, without having to reinvent solutions that have already been proved to provide positive results (Dobrzykowski et al., 2012). Companies that exploit benchmarking can improve their performance by setting realistic goals, continuously improving internal processes, enhancing their process thinking and innovation diffusion and making financial savings. Benchmarking involves learning from competitors' successful experiences through methodically measuring, assessing, comparing and applying assimilated knowledge to improve performance (Krishnamoorthy and D'Lima, 2014). In the globalized market, companies' boundaries are unclear due to outsourcing and information technology (IT) (Balfaqih et al., 2016). Thus it is commonly accepted that competition is no longer between individual organizations, but rather between supply chains (SCs) (Trkman et al., 2010). This study therefore focuses on the performance of the SC.
There is a vast literature on SC performance and benchmarking (Hoek, 1998; Wong and Wong, 2008; Kailash et al., 2017), highlighting different focuses and different metrics. According to Maestrini et al. (2017), the scientific literature often deals with partial and incomplete aspects of the overall performance of the SC. Most studies focus only on a few key aspects and rarely analyze the performance of the SC as a whole (Balfaqih et al., 2016), and it is often not considered that different strategies and sets of metrics are needed for different sectors (Jagan Mohan Reddy et al., 2019). Despite the large number of models and frameworks developed for SC performance analysis and monitoring, they still face limitations in terms of practical applicability due to the competitive environment, the emergence of innovative technologies like digitalization and new sustainability-related requirements (Oubrahim et al., 2022). In addition, supply chain management (SCM) functions can differ between developing and developed countries (Ramos et al., 2022), especially regarding sustainability issues (Ali et al., 2021; Ali et al., 2023a, b).
Two fundamental aspects thus emerge from the academic discussion on the SC performance metrics (Bigliardi and Bottani, 2014; Hallikas et al., 2021; Jha et al., 2022). The first is that since the SC is a complex system, involving many actors with different strategic objectives, the goal of perfectly representing it through a defined series of parameters and metrics is very difficult. The second point is that the SC performance depends on the reference context (business sector, geographic positioning, available resources, business strategy, study objective, etc.). Moreover, the need for DMs to provide responses rapidly, taking into account the knowledge of the experts, is also highlighted (Oubrahim and Sefiani, 2022). In this context, the use of multi-criteria decision-making (MCDM) techniques allows for transforming the expertise of one or more DMs into a standardized and shared model, thus saving time and resources for selecting and ranking the alternatives available (Velasquez and Hester, 2013; Khan et al., 2019).
One of the key points of multi-criteria models is the set of weights to assign to the criteria, as these weights represent their significance quantitatively and consequently influence the results of the analysis (Vinogradova et al., 2018). The weights of the criteria can be elicited either directly or indirectly. Indirect elicitation methods make use of ranked (sorted) examples from which the criteria weights are obtained indirectly, i.e. they are implicitly given by the DM when assessing the ranking (class) of the examples taken as references. For indirect elicitation methods, the reader can refer to Lolli et al. (2019, 2022). In direct elicitation methods, the criteria weights are assessed directly by the DMs without using any pre-ranked (pre-sorted) example. As we aim to consider the knowledge of the experts directly, we focus solely on direct elicitation methods, which can be divided further into subjective and objective methods. When reliable weights cannot be obtained from experts (e.g. if the DMs do not have sufficient expertise), objective methods can be used, which apply mathematical models to the decision matrix (e.g. the entropy method (Deng et al., 2000)). In subjective methods the weights are based on the judgment of experts in the field, and this is our case.

BIJ
The complexity of the SC performance metrics demands handling a high number of criteria. Among the subjective weighting methods, we therefore consider the so-called parsimonious weighting methods, which require the least possible number of decisions from the DMs, reducing their cognitive effort while maintaining accuracy in the calculation of the weight values.
To address the aforementioned issues, this study presents an SC observatory, which is an assessment tool aimed at supporting companies in understanding their positioning in terms of SC performance from a benchmarking perspective. The tool was developed by focusing on manufacturers of automatic equipment for wrapping and packaging, which is a flourishing industry in Italy.
The research questions addressed are:
RQ1. Are parsimonious weighting methods effective for handling a large set of SC performance criteria from a benchmarking perspective, providing the DMs with an accurate, easy-to-use and time-saving tool?
RQ2. How can an SC observatory be established for a specific industrial sector?
This paper contributes to the field of overall SC performance evaluation by adopting a benchmarking methodology using parsimonious criteria weighting methods. The comparison of five parsimonious criteria weighting methods is implemented in order to analyze their performance and also their simplicity of use, the comprehensibility of the requests and the perception of the DMs. As highlighted by the literature review, and to the best of our knowledge, this comparison has never been made, nor have parsimonious methods been adopted in the field of SC performance despite the high number of criteria involved. The numerical application demonstrates the robustness of the rankings, which cannot be generalized as it is case-sensitive, but allows results to be used for improvement actions starting from a solid basis. Hence, we provide a theoretical contribution in the field of SC performance, where parsimonious methods are effective in dealing with a high number of criteria and are well appreciated by DMs. The remainder of this paper is organized as follows. Firstly, a set of criteria for the evaluation of the SC performance was created, selecting from the literature the most important criteria pertaining to the automatic equipment for wrapping and packaging industry. Secondly, a ranking of companies was created using the technique for order preference by similarity to ideal solution (TOPSIS) MCDM method, after weighting the criteria according to an expert evaluation. Five different parsimonious subjective methodologies for determining the weights of the criteria are compared and their performances are discussed.

Literature review
The following literature review aims at showing how researchers have tried to address the evaluation of the SC performance through the application of MCDM methods, and the limitations of previous studies. As the supply chain operation reference (SCOR) and the balanced scorecard (BSC) models, two of the most used SC evaluation models (Jagan Mohan Reddy et al., 2019), showed some limitations since they do not include all SC functions and are scarcely flexible (Jha et al., 2022), we limited our review to the studies using MCDM methods to include all the relevant performance indicators in a specific sector. Moreover, we investigated the academic studies comparing the subjective criteria weighting methods and specifically the parsimonious methods.
2.1 Multi-criteria methods to evaluate the supply chain performance
A complex hybrid exploratory three-phased MCDM model based on the "Decision making trial and evaluation laboratory" (DEMATEL) is used in Chand et al. (2020) to evaluate the SC performance in the mining industry. DEMATEL is mainly used for analyzing cause and effect relationships among components of a system (Si et al., 2018). It establishes the level of interdependence between the chosen criteria. The criteria are grouped into seven macro-categories: sustainability, order planning, collaboration capacity, operational performance, delivery performance, customer service level, costs and financial ratios. An integrated approach between the analytic hierarchy process (AHP) (Saaty, 1977) and the BSC is used by Varma et al. (2008) to assess the SC performance of the oil industry. Sufiyan et al. (2019) evaluate the performance of the food sector SC by integrating fuzzy DEMATEL and DEMATEL-based analytic network process (DANP) methods. They used 18 criteria, grouped into six categories: agility, sustainability, quality, customer service level, collaboration skills and SC efficiency. In some cases, the analysis of the performance focuses only on particular aspects of the SC, such as in the study by Arshinder et al. (2007), where collaboration and coordination skills are examined using a fuzzy AHP method.
Most studies that address the SC performance are focused on sustainability (Ali et al., 2020; Qorri et al., 2022; Ali et al., 2023a, b). Erol et al. (2011) investigate the SC sustainability from environmental, social and economic perspectives, using a total of 37 criteria, which are weighted through the fuzzy entropy method, while the performance calculation is obtained through the fuzzy multi-attribute utility theory (FMAUT). The criteria considering the whole life cycle of the product are analyzed by Sarkis (2003) using the analytic network process (ANP) method. The sustainability of the service SC is analyzed using the Višekriterijumsko KOmpromisno Rangiranje (VIKOR) and ELimination Et Choix Traduisant la REalité (ELECTRE) models by Chithambaranathan et al. (2015). Anas et al. (2018) use criteria defined by SCOR to evaluate the SC in the hospital environment, through a fuzzy AHP and best-worst method (BWM) for determining the weights of the criteria. Menon and Ravi (2022) use the AHP-TOPSIS methods to select sustainable suppliers in an electronics SC; the combined methods are used to evaluate the quantitative and qualitative data and manage the uncertainty involved. The social dimension of sustainability in warehousing operations is investigated by Ali and Kaur (2021) using the BWM with a hybrid approach; the study aims at understanding the effectiveness of corporate social responsibility in the implementation of the nine identified social sustainability practices.
The prioritization of 36 SC performance improvement indicators in the plastic manufacturing industry is implemented by Govindan et al. (2017) by using the fuzzy AHP method to improve the handling of human-based qualitative judgments; the sensitivity analysis examines the priority ranking of the indicators. A mixed SCOR-AHP approach is applied by Mañay et al. (2022) to study the performance of SCs in the floricultural sector.
The TOPSIS method is widely used in the SCM field (Behzadian et al., 2012; Velasquez and Hester, 2013). Joshi et al. (2011) use a Delphi-AHP-TOPSIS-based methodology to develop a benchmarking framework that evaluates the cold chain performance of a company, while Kumar et al. (2022) apply the hybrid AHP-fuzzy TOPSIS method to identify the critical criteria for measuring the performance of the cold SC and to suggest the best possible alternatives to improve it. The management performance of an electronic SC of an Indian automobile industry is evaluated through a hybrid approach using AHP-TOPSIS methods by Tyagi et al. (2014). A fuzzy TOPSIS method is used to evaluate the performance of the Indian petroleum SC in Kumar and Barua (2022).

Subjective criteria weighting methods comparison
There are several examples of comparisons between criteria weighting methods, which mainly consider subjective methods (Şahin, 2021). Although different aspects of the results are analyzed, the methods considered in the analysis tend to be the same. In fact, one of the main issues in comparing the methods for criteria weighting is that the "real" weight values are lacking. Thus it is not possible to establish a standard for evaluating the validity of the weights assigned by each method (van Til et al., 2014). It is therefore necessary to find other bases for the comparison of the methods.
Many of the studies use the most common criteria weighting methods, such as AHP and BWM, as terms of comparison to demonstrate the ability of other or newly developed methods to obtain the same results in a simpler way. van Til et al. (2014) compare the five-point scale, ranking, AHP and BWM assessment techniques using a total of 14 criteria and 60 DMs. The study shows that the choice of method is not decisive in a group decision, while it is for individual decisions. In the latter case pairwise comparisons are preferable, due to their greater ability to differentiate the criteria. Since some criteria impact more on the overall SC performance than others, the pairwise comparison of the AHP method, which ensures consistency among DMs when assigning the importance of a given factor over another, is preferred for the overall SC performance evaluation (Khan et al., 2019). Eight methods for determining subjective weights, including AHP and SMART (simple multi-attribute rating technique), are compared by Németh et al. (2019) on the basis of the general complexity, the use of resources, software requirements and the possible distortions in the judgment due to the method setting. The results show that more complex methods make the judgment less distorted, but at the same time they require more resources. Thus the most appropriate method needs to be carefully selected on the basis of the reference context, considering the acceptable accuracy tolerance, the size of the group of DMs and the resources available.
AHP is also used as a term of comparison in a study conducted by Şahin (2021), where BWM and four objective methods are evaluated. The results of the study reveal similar weight values between the two subjective methods and similar weights between the four objective methods. However, the weights differ when the results obtained through subjective and objective methodologies are compared, which is why the study suggests using a combined approach to obtain more reliable values, where possible.
Two other cases in which AHP is used as a term of comparison are the studies conducted by Riabacke et al. (2012) and by Pöyhönen and Hämäläinen (2001). In the first study different methods (including direct rating, point allocation, SWING, SMART) are evaluated according to three fundamental concepts: extraction (how the information is derived from the input provided by the DM), representation (the format in which the information derived from the DM is structured) and interpretation (how the meaning is attributed to the information derived from the DM). The second study compares SMART, AHP, SWING, point allocation and TRADEOFF methods, assuming that these methods provide similar weights as they are based on the same theoretical foundations.
Some studies compare simple subjective methods, where pairwise comparisons are not required but, for example, a simple classification by importance or direct scoring is implemented (Bottomley and Doyle, 2001; Alfares and Duffuaa, 2008). Borcherding et al. (1991) compare the ratio method, SWING, TRADEOFF and pricing out method considering their internal consistency, the level of agreement between the weights obtained with the different methods and their external validity (the concordance between the obtained weights and judgments of the experienced managers). The simpler methods however need to be used carefully, as an inverse relationship is possible between the precision of the methods and their simplicity (Németh et al., 2019).
Another area of interest concerns the subjective perception of the DM and the judgment biases found in the implementation of subjective methodologies. Weber and Borcherding (1993) analyze SMART, SWING and TRADEOFF, concluding that the weights could be influenced by the choice of weighting method, the hierarchical structure of the problem and the reference used. Thus, there is no way to determine the "true" weights of the criteria considered, since all weighting methods induce bias; a possible solution is therefore to rely on multiple evaluations. Other cognitive biases are anchoring bias, which is influenced by the structure of the weighting method and the first judgment that is submitted to the DM (Buchanan and Corner, 1997; Rezaei, 2021), along with classification bias, loss aversion and status quo (Deniz, 2020).

Research methodology
A case study is used to demonstrate the set-up of an observatory to support benchmark analyses for companies belonging to the same industry. The observatory structure focuses on a set of quantitative criteria and is aimed at providing an exhaustive representation of companies' performance in the SC area.
The performance criteria of the SC were taken from the literature, and a group of experts selected the most important ones, each belonging to one of five macro-categories that the experts judged as the most significant for the chosen industry: purchasing, planning, internal logistics, transportation and quality. The experts (see Table 1) were asked to assign a weight to the selected criteria as well as to the macro-categories in order to obtain the global weights of the criteria. The TOPSIS method was then applied to calculate, for each company, an overall score combining the scores obtained on the set of criteria, weighted by applying the aforementioned five subjective parsimonious methods for criteria weighting.
The research methodology is displayed in Figure 1.

Case study: the sample companies
The industry analyzed is the automatic equipment for wrapping and packaging industry, a key industrial sector in Italy and a leading sector worldwide, made up of 633 companies in Italy, 81% of which are small production units. The companies are mainly located in the north of Italy and particularly in the Emilia-Romagna region, which is also known as "packaging valley" as it accounts for half of the almost eight-billion-euro national turnover for the sector. Most of the production of wrapping and packaging machines is absorbed by the Food and Beverage customer sector (58.2%), followed by Pharmaceutical and Biomedical (17.4%), Cosmetic and Personal Care (4.5%) and Chemicals and Home (3.5%) [1]. The sample of selected companies includes 10 companies belonging to the chosen sector with a turnover of over 50 million euros and based in the Emilia-Romagna region (IT). The turnover distribution is shown in Figure 2.

The decision-makers
Twelve professionals with significant experience (see Table 1) in the reference industry were asked to select the relevant criteria and evaluate them. To guarantee the quality of the experts' judgments, they were chosen among managers operating in SC design, development and management, ensuring a deep knowledge of the competitive environment and of the complexity of the SC performance evaluation.
After expressing their criteria weights, the experts gave a qualitative evaluation of the five parsimonious weighting methods applied.

Criteria selection guiding principles
The performance criteria of the SC were collected through a literature search carried out on the Scopus, WoS and Google Scholar databases. The literature survey was conducted searching for such keywords as "supply chain performance indicators", "supply chain performance criteria", "supply chain performance parameters" and "supply chain performance metrics".

Based on the experts' knowledge of the SCM of the reference industry, the following hypotheses drove the selection of the criteria:
(1) All the metrics had to refer to a one-year time period;
(2) Each metric involving inbound materials only considered the goods and raw materials that directly contribute to obtaining the finished product;
(3) When referring to a warehouse, this always meant the raw materials warehouse, since, according to the experts, the automatic machines industry generally adopts a make-to-order (MTO) or assemble-to-order (ATO) production. This is because this type of company often has a catalog of standard machines, to which specifications and adaptations can be added upon the customer's request. For this reason, warehouses dedicated to finished products were not considered relevant in our study;
(4) The "Production" business function was not included among the considered macro-categories since it was considered too specific to each company. The experts suggested that in this sector some companies produce each component of the machine, while others only deal with the assembly, others produce only some components, etc. Thus this business area is not significant for the purposes of this study. Furthermore, other metrics included, such as punctuality in the delivery of the finished product to the end customer (proposed in the "Planning" macro-category), indirectly require the correct management of production;
(5) No rejected products were foreseen (rather, the hours for reworking a product that is not qualitatively acceptable can be considered);
(6) No products returned by the end customers were foreseen (the number of complaints and the costs incurred to carry out maintenance or repairs at the end customer were considered instead).

Criteria weighting: five parsimonious methods
Many methods have been developed for determining criteria weights based on expert judgments, with AHP and BWM being the most common. AHP is one of the best known methods (Ishizaka and Labib, 2011). It is based on pairwise comparisons made by expert DMs to create a priority scale. Although this is an easy and widely used method, the number of necessary pairwise comparisons increases rapidly with the number of criteria. It has also been shown that it is in fact almost impossible to perform consistent pairwise comparisons if more than nine criteria are present (Pamučar et al., 2018). The validation of the results is then based on the consistency ratio of the comparison matrix, which must not exceed 0.10. Given n criteria, the BWM method (Rezaei, 2015) is aimed at improving some of the AHP flaws, reducing the number of pairwise comparisons from n(n-1)/2 to 2n-3 and the probability of matrix inconsistency. It is based on the concept of identifying the best and the worst criterion and proceeds by comparing these with the other criteria; the values of the weights are then obtained through an optimization model.
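The 0.10 threshold mentioned above is usually checked through Saaty's consistency ratio. The following is a minimal sketch, assuming the standard definition CR = CI/RI with CI = (lambda_max - n)/(n - 1) and Saaty's commonly tabulated random indices; the function names and the power-iteration estimate of the principal eigenvalue are illustrative choices and are not taken from the study.

```python
# Saaty's commonly tabulated random consistency indices (RI) for orders 3..9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def principal_eigenvalue(matrix, iterations=100):
    """Estimate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(matrix)
    vector = [1.0 / n] * n  # start from a uniform vector that sums to 1
    eigenvalue = float(n)
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(product)  # valid estimate because the vector sums to 1
        vector = [p / eigenvalue for p in product]
    return eigenvalue


def consistency_ratio(matrix):
    """Consistency ratio of a reciprocal pairwise comparison matrix (3 <= n <= 9)."""
    n = len(matrix)
    consistency_index = (principal_eigenvalue(matrix) - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical example: a perfectly consistent matrix built from known weights.
true_weights = [0.6, 0.3, 0.1]
consistent = [[wi / wj for wj in true_weights] for wi in true_weights]

# Hypothetical example: a cyclic (strongly inconsistent) set of judgments.
inconsistent = [[1, 9, 1 / 9], [1 / 9, 1, 9], [9, 1 / 9, 1]]

print(consistency_ratio(consistent) <= 0.10)    # acceptable
print(consistency_ratio(inconsistent) <= 0.10)  # not acceptable
```

The lookup table stops at nine criteria on purpose, which mirrors the practical limit on consistent pairwise comparisons discussed in the text.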
Figure 3 compares the number of judgments required from the DMs in AHP, BWM and parsimonious weighting methods as a function of the number of criteria considered. Parsimonious methods significantly reduce the judgments required once the number of criteria exceeds 10.
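The growth in the number of judgments can be illustrated with a short sketch. It assumes n(n-1)/2 comparisons for a complete AHP matrix, 2n-3 for BWM and n-1 for the reference-based parsimonious methods (simplified-AHP, which samples n comparisons, would sit one judgment above the last count). The function names are hypothetical, and the value n = 22 is used only as an illustration, since in the study the weighting is carried out within each macro-category.

```python
def comparisons_full_ahp(n):
    """Pairwise comparisons needed to fill the upper triangle of an AHP matrix."""
    return n * (n - 1) // 2


def comparisons_bwm(n):
    """Best-to-others plus others-to-worst comparisons in BWM."""
    return 2 * n - 3


def comparisons_reference_based(n):
    """Comparisons of one reference criterion against each of the others."""
    return n - 1


for n in (5, 10, 22):
    print(n, comparisons_full_ahp(n), comparisons_bwm(n), comparisons_reference_based(n))
```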
The experts were asked to evaluate the previously selected criteria using five parsimonious subjective methods. The five methods are (for details, see the papers of the cited authors):
(1) Simplified-AHP: this method, proposed by Benitez et al. (2019), does not modify the AHP algorithm (developed by Saaty (1977)), but reduces the number of pairwise comparisons by selecting just a sample of n pairwise comparisons which, while providing balanced and unbiased (incomplete) information, still produces consistent and robust decisions.
(2) AHP-express (Leal, 2020) is another simplified method for the application of the AHP. Assuming that DMs make consistent evaluations, the number of pairwise comparisons is reduced to n-1 and the comparisons are carried out between a chosen criterion (normally the most significant one) and all the other criteria. The assumption of consistency in the evaluation is justified because inconsistency occurs mainly in the comparisons between alternatives that are less significant. Thus, taking the criterion deemed most important as the first term of comparison, the DM will pay more attention to the assignment of the score.
(3) Full consistency method (FUCOM), developed by Pamučar et al. (2018). This is based on the comparison of n-1 pairs of criteria, ensuring maximum consistency thanks to the validation of the model by determining the deviation from full consistency. By respecting the transitivity conditions defined by specific constraints, it also eliminates the problems related to comparisons of redundant pairs.
(4) NDSL method (2020). This method, as with the previously described methods, requires n-1 pairwise comparisons and is based on the identification of the best criterion. It classifies the criteria in decreasing series from the most to the least important in order to group the criteria into significance levels, based on the judgment expressed by the DMs. In addition, unlike standard AHP, it obtains consistent results even with a number of criteria greater than nine. Preferences are not restricted to the 1/9-9 scale, which limits the expression of preferences.
(5) Level-based weight assessment (LBWA), developed by Žižović and Pamučar (2019). This method is based on the classification of criteria in levels of importance, requiring a reduced number of pairwise comparisons of criteria (n-1). Another advantage is that the algorithm does not become more complex as the number of criteria increases, which is why it can be used in contexts with a high number of criteria. The existence of the elasticity coefficient also enables further corrections of the weight values according to the preferences of the DMs.
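To make the idea of a reference-based parsimonious method concrete, the sketch below derives weights from n-1 comparisons against a single reference criterion, in the spirit of AHP-express. It assumes that each judgment states how many times the reference criterion is more important than the other criterion and that the judgments are consistent, so that the weights are simply the normalized reciprocals of these ratios. The function name and the example values are hypothetical; the cited paper should be consulted for the exact procedure.

```python
def reference_based_weights(ratios):
    """Weights from comparisons of one reference criterion against all criteria.

    ratios[k] is how many times the reference criterion is judged more
    important than criterion k; the entry for the reference itself is 1.
    Under the consistency assumption, weight k is proportional to 1 / ratios[k].
    """
    reciprocals = [1.0 / r for r in ratios]
    total = sum(reciprocals)
    return [x / total for x in reciprocals]


# Hypothetical judgments: the reference (first) criterion is twice as
# important as the second one and four times as important as the third one.
weights = reference_based_weights([1, 2, 4])
print([round(w, 4) for w in weights])
```

Only two judgments are needed for three criteria here, against three for a complete pairwise comparison matrix; the saving grows quickly with the number of criteria.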
The set of selected criteria was broken down into six sections: five sections consisted of the aforementioned macro-categories and the sixth section, SC, concerned the global weighting of the macro-categories. The DMs were asked to attribute weights to one or more sections of the set, according to their area of expertise in SC management; the judgments of three experts were assessed for each macro-category.
Then, the weights $w_{i,j,z}$ resulting from the judgments of the DMs were calculated as the average of the judgments of the three DMs for the selected method:

$$w_{i,j,z} = \frac{1}{K} \sum_{k=1}^{K} w_{i,j,k,z} \qquad (1)$$

where $w_{i,j,k,z}$ represents the weight associated with the criterion $j$ ($j = 1, \ldots, J_i$) within the macro-category $i$ ($i = 1, \ldots, I$), defined by the DM $k$ ($k = 1, \ldots, K$) for the weighting method $z$ ($z = 1, \ldots, Z$). The weights $W_{i,z}$ of the macro-categories are calculated as:

$$W_{i,z} = \frac{1}{K} \sum_{k=1}^{K} W_{i,k,z} \qquad (2)$$

where $W_{i,k,z}$ represents the weight associated with the macro-category $i$, defined by the DM $k$ for the weighting method $z$.
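The aggregation across DMs described above can be sketched in a few lines, assuming a simple arithmetic mean of the weights given by the DMs for one macro-category and one weighting method. The function name and the numerical judgments are hypothetical and serve only as an illustration.

```python
def average_weights(judgments):
    """Average the weight vectors given by several DMs, criterion by criterion.

    judgments is a list with one weight vector per DM; all vectors refer to
    the same criteria, in the same order, for one macro-category and method.
    """
    n_dms = len(judgments)
    n_criteria = len(judgments[0])
    return [sum(dm[j] for dm in judgments) / n_dms for j in range(n_criteria)]


# Hypothetical weights given by three DMs to the three criteria of one macro-category.
judgments = [
    [0.5, 0.3, 0.2],
    [0.6, 0.2, 0.2],
    [0.4, 0.4, 0.2],
]
averaged = average_weights(judgments)
print([round(w, 3) for w in averaged])
```

If each DM's weights sum to one, the averaged weights also sum to one, so no further normalization is needed before they are combined with the macro-category weights.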
The degree of discrepancy $D_i$ between the DMs, within each macro-category $i$, was then measured by calculating the variance of the results according to equation (3): where $w_{i,z} =$

Evaluation of parsimonious methods
The performances of the five parsimonious methods were evaluated both in terms of the perceptions of the DMs and by calculating the discriminating capacity of each method.

A questionnaire (available in Appendix 1) was given to the experts to assess their opinions regarding the time needed to respond, the comprehensibility of the method and their preferred method.
The discriminating capacity $d_z$ of each method $z$ was measured by calculating the average variation range for each proposed methodology, according to equation (4), where $\max(w_{i,j,k,z})$ and $\min(w_{i,j,k,z})$ are the maximum and minimum values respectively of the weights assigned by the DM $k$, for the macro-category $i$, according to the method $z$.
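One possible reading of the "average variation range" can be sketched as follows. It is assumed here that, for a given method, the range between the largest and the smallest weight is computed for every DM within every macro-category and that these ranges are then averaged with equal weight; this is an interpretation of the description above and not necessarily the exact form of equation (4). The function name and data are hypothetical.

```python
def discriminating_capacity(weights_by_category):
    """Average range (max - min) of the weights over all DMs and macro-categories.

    weights_by_category[i][k] is the weight vector given by DM k to the
    criteria of macro-category i, for one weighting method.
    """
    ranges = [
        max(dm_weights) - min(dm_weights)
        for category in weights_by_category
        for dm_weights in category
    ]
    return sum(ranges) / len(ranges)


# Hypothetical weights for one method: two macro-categories, two DMs each.
method_a = [
    [[0.7, 0.3], [0.6, 0.4]],
    [[0.5, 0.3, 0.2], [0.4, 0.35, 0.25]],
]
# A hypothetical method that never separates the criteria.
method_flat = [
    [[0.5, 0.5], [0.5, 0.5]],
]
print(round(discriminating_capacity(method_a), 4))
print(discriminating_capacity(method_flat))
```

Under this reading, a method that spreads the weights more widely across the criteria obtains a higher value, while a method that assigns equal weights obtains zero.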
3.6 Company ranking and overall scoring by TOPSIS
TOPSIS was selected as the method for our case study due to its practicality and ease of use, speed of application and the standardization of the steps (the number of steps in fact does not change with any increase in criteria) (Velasquez and Hester, 2013). It ranks the alternatives on the concept that the selected alternative must simultaneously have the shortest distance from the positive-ideal solution and the farthest distance from the negative-ideal solution. The positive-ideal solution maximizes the benefit criteria and minimizes the cost criteria, whereas the negative-ideal solution maximizes the cost criteria and minimizes the benefit criteria. This approach applies well to the benchmark concept that is the focus of this study. Appendix 1 outlines the steps in the TOPSIS implementation. The global weights $w^{*}_{i,j,z}$ used as input for the TOPSIS methodology implementation are calculated as the product between the weight of the criterion $w_{i,j,z}$ (eq. (1)) and the weight of the macro-category $W_{i,z}$ (eq. (2)) to which it belongs:

$$w^{*}_{i,j,z} = w_{i,j,z} \times W_{i,z} \qquad (5)$$
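The scoring step can be sketched as follows, assuming the common textbook formulation of TOPSIS with vector normalization, Euclidean distances and the closeness coefficient d-/(d+ + d-). The steps actually followed in the study are those given in its appendix, so this is only an illustration; the function names, company data and weights are hypothetical.

```python
import math


def global_weights(local_weights, macro_weights):
    """Multiply each criterion weight by the weight of its macro-category."""
    return [
        w * macro_weights[i]
        for i, group in enumerate(local_weights)
        for w in group
    ]


def topsis(matrix, weights, is_benefit):
    """Closeness coefficient of each alternative (row) in the decision matrix.

    Assumes no criterion column is all zeros and that the alternatives are
    not all identical, otherwise a division by zero occurs.
    """
    n_criteria = len(weights)
    # Vector normalization of each column, followed by weighting.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)] for row in matrix]
    columns = list(zip(*weighted))
    # Positive-ideal: best value per criterion; negative-ideal: worst value.
    ideal = [max(c) if b else min(c) for c, b in zip(columns, is_benefit)]
    anti_ideal = [min(c) if b else max(c) for c, b in zip(columns, is_benefit)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti_ideal)
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical case: three companies, one benefit criterion and one cost criterion.
companies = [[10, 1], [8, 2], [5, 3]]
scores = topsis(companies, weights=[0.6, 0.4], is_benefit=[True, False])
combined = global_weights([[0.6, 0.4], [1.0]], [0.7, 0.3])
print([round(s, 3) for s in scores])
```

In this toy example the first company is best on both criteria and the third is worst on both, so they coincide with the positive-ideal and negative-ideal solutions respectively, and the second company falls in between.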

Criteria set
According to the criteria selection guiding principles previously agreed by the experts, a total of 22 numerical KPIs were considered as relevant for the reference industry, categorized according to the five macro-categories, each one corresponding to a relevant company area: purchasing, planning, internal logistics, transportation and quality. The set of criteria chosen by the experts aims to be exhaustive, thus justifying the use of parsimonious weighting methods. The 22 criteria are presented in Table 2.
The selected criteria can be both costs and benefits for the companies, as specified in column 4 of Table 2.

Criteria weighting using the five parsimonious methods
Figure 4 shows the comparison of the weights obtained after the weighting process using the five parsimonious criteria weighting methods. The graph includes the five macro-categories and the SC evaluation; the weights of the criteria belonging to the five macro-categories are calculated using equation (1), while the weights of the macro-categories (SC) are calculated using equation (2). The calculated values and the weights assigned to the 22 criteria and to the macro-categories are available as supplementary material (Appendix 3).
The calculated weights can be used both to analyze the importance of the selected criteria and to compare the results obtained with the different parsimonious criteria weighting methods.

BIJ
Regarding the macro-category Purchasing, the weights assigned to criteria $C_{1,1}$ and $C_{1,2}$ are very similar for all five methods considered (only the weights identified using the LBWA methodology differ slightly from those obtained by the other methods: ∼3% for criterion $C_{1,1}$ and ∼10% for criterion $C_{1,2}$). In all cases, the greater importance assigned by the experts to criterion $C_{1,1}$ (the reduction of purchasing costs) over criterion $C_{1,2}$ (the reliability of the purchase budget) is clear.
The criteria for Internal Logistics show the greatest degree of disagreement among the DMs (the low reliability of the simplified-AHP data should be highlighted, due to issues encountered by two experts in making their judgments with this methodology). The results show that criterion $C_{3,4}$ can be considered the least important.
On the other hand, Transportation showed the lowest degree of discrepancy between the DMs. The criteria are clearly distinguishable by their importance, with criterion $C_{4,1}$ being the most significant, followed by $C_{4,2}$ and finally by $C_{4,3}$ and $C_{4,4}$.
As regards Quality, identical weights were obtained for the AHP-express and FUCOM methodologies, as well as for the NDSL and LBWA methods. The greater importance given to criteria $C_{5,5}$ and $C_{5,6}$ is clearly highlighted, while criteria $C_{5,3}$ and $C_{5,4}$ are less important.

Among the macro-categories, Planning was given the greatest importance, followed by Purchasing and Quality. The least importance was given to Internal Logistics and Transportation.
From the obtained results, it can be argued that methodologies with comparable structures yield comparable results, as observed by Pöyhönen and Hämäläinen (2001).

Discrepancy between decision-makers
The degree of discrepancy $D_i$ for each macro-category i is calculated according to eq. (3) and presented in Table 3. Purchasing shows the lowest degree of discordance, while the greatest discrepancy between the DMs was found for Planning and Quality, which are the categories with the greatest number of criteria.
4.4 Evaluation of the five parsimonious subjective criteria weighting methods
4.4.1 Experts' perception. At the end of the criteria weighting procedure, each expert was asked to answer the evaluation questionnaire (Appendix 1) on the proposed methods in order to evaluate their perception.
The results are presented in Table 4.

The questionnaire results showed that simplified-AHP was the preferred method in terms of all the aspects considered, except for the perception of accurately selecting a preference, for which AHP-express and FUCOM methods were ranked highest.
The degree of appreciation of the DMs was not considered in previous studies (Luthra et al., 2017; Mañay et al., 2022), making our framework more attractive for business applications where DMs are subject to time constraints.
4.4.2 Discriminating capacity of each method. The discriminating capacity $d_z$ of each method z was calculated according to eq. (4). The resulting values are shown in Table 5.
The results show that the method with the greatest discriminating capacity is the simplified-AHP. This may be due to the fact that the pairs of criteria to assign a judgment to are always different; on the other hand, in AHP-express or in FUCOM, the first term of comparison is always the most important criterion. This phenomenon can also be considered a bias since, taking the criterion deemed to be most important as the first term of comparison, the DM will pay more attention to the assignment of the score (Leal, 2020).
The AHP-express and FUCOM methods have almost identical values, since as seen previously, they often result in identical weights.

Company rankings and overall TOPSIS scoring
The data of the 10 sample companies were analyzed in order to assign each company a score in relation to each of the selected criteria (Table 6).
4.5.1 TOPSIS implementation. Starting with the original score decision matrix (Table 6), the TOPSIS method was implemented (see Appendix 2).
The global weights to be used as input for the TOPSIS methodology were calculated according to equation (5) for each of the five weighting methods (Table 7).
The final ranking of the companies included in the study is shown in Table 8, which compares the rankings obtained from the five subjective parsimonious weighting methods. As shown, identical results were obtained for the AHP-express and FUCOM methodologies. Almost the same ranking was obtained for the other three methods (simplified-AHP, NDSL and LBWA), with a few inverted positions (between companies A1 and A9 in the first and second places, and between companies A3 and A6 in the 4th and 5th places for the simplified-AHP).
The consistency of the ranking results confirms the medium robustness of the solution (limited mainly by the uncertainty between the first and second positions), which can be critical for creating a ranking of companies and providing a benchmark. The robustness of the solution as the weighting method varies validates the ranking procedure based on the parsimonious weighting methods.
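One possible way to quantify how similar two rankings are is to count the pairs of alternatives that appear in opposite order (a Kendall-tau-style distance). This measure is not used in the study and is shown only as an illustrative sketch; the method names and rankings below are hypothetical and do not reproduce Table 8.

```python
from itertools import combinations


def pairwise_inversions(rank_a, rank_b):
    """Count pairs of alternatives that the two rankings order differently."""
    pos_a = {alt: i for i, alt in enumerate(rank_a)}
    pos_b = {alt: i for i, alt in enumerate(rank_b)}
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


# Hypothetical rankings (best first) from three hypothetical weighting methods.
rankings = {
    "method_1": ["X1", "X2", "X3", "X4", "X5"],
    "method_2": ["X2", "X1", "X3", "X4", "X5"],
    "method_3": ["X2", "X1", "X3", "X5", "X4"],
}

# Number of inverted pairs for every pair of methods; 0 means identical rankings.
inversions = {
    (m1, m2): pairwise_inversions(rankings[m1], rankings[m2])
    for m1, m2 in combinations(rankings, 2)
}
for pair, count in inversions.items():
    print(pair, count)
```

A count of zero for a pair of methods corresponds to identical rankings, while small counts correspond to a few swapped neighbouring positions.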

A case study is used to demonstrate the set-up of the observatory, involving twelve professionals with expertise in automatic equipment for the wrapping and packaging industry. These experts selected and evaluated a set of relevant criteria, identifying the company areas that could be improved using MCDM techniques. Ten companies were ranked according to their SC performance, comparing five parsimonious subjective criteria weighting methods selected to reduce the cognitive effort of the DMs. As previously mentioned, the ranking solution of the companies was on average robust since, despite some changes in the order, the general structure of the ranking was very similar for all the weighting methodologies. It is possible to conclude that methodologies with a similar structure identify similar solutions. With regard to the subjective perceptions of the DMs, simplified-AHP was considered the best method in almost all aspects: it was faster to carry out and more quickly understood. The only aspect for which other methods were better is the perceived accuracy in giving preferences; in this case, the AHP-express and FUCOM methods prevailed. Simplified-AHP showed the greatest ability to discriminate between the criteria by importance, although it sometimes provided weights that were visibly higher or lower than the average. This may be due to the fact that the pairs of criteria to assign a judgment to are always different; that is not the case with AHP-express or FUCOM, where the first term of comparison is always the most important criterion, leading the DM to pay more attention to the assignment of the score. These considerations should be taken into account when choosing a method.
The main theoretical implication of this study is the robustness of the results obtained when applying different parsimonious weighting methods, which are strongly recommended when the number of criteria is high (Pamučar et al., 2018). Each of the methods can be used for improvement actions starting from a solid basis.
The obtained results have some practical and managerial implications, which help us respond to RQ2. With a few adjustments, such as shifting the weights and criteria, the presented general methodology can be applied to many industries and take into account evolving SC strategies and policies. It thus paves the way for models of resource rationalization from the perspective of continuous improvement, according to a competitive logic aimed at a whole sector.
To give an answer to RQ1, we observe that the use of parsimonious weighting methods has shown good accuracy in the calculation of the weight values and a good degree of appreciation by DMs, enhancing the appeal of our system for corporate applications where DMs must work under time restrictions.
However, the study has some limitations. The choice of the criteria does not include innovative SC aspects such as digitalization-related indicators or sustainability-related indicators; this highlights the need for a wider discussion with the experts to make the set of indicators as exhaustive and inclusive as possible (Mishra et al., 2018). This improvement will positively impact not only the SC performance but also the whole society, improving sustainability at the global level.
As a future research direction, the next step foresees the application of post factum analysis to the obtained ranking, in order to determine the minimum variation on one or more criteria that would allow a company to reach a specific position. This can be very useful in the context of competitive benchmarking: once a company has learned its position in the ranking, it is essential to understand which business areas and which indicators to invest in, in order to improve its positioning compared to competing companies.
We also plan to define a set of qualitative criteria in order to use a combined qualitative-quantitative framework in the evaluation process.
Step 1: Normalization
With $x_{i,j}$ being the score of the i-th alternative with respect to the j-th criterion, the normalized decision matrix is built according to (1.1).
Step 2: Weighting
Starting from the weight $w_j$ assigned to the j-th criterion, the weighted normalized decision matrix is calculated (1.2).
Step 3: Identifying the positive-ideal $A^*$ (1.3) and negative-ideal $A'$ (1.4) solutions
Here $J$ is the set of benefit criteria and $J'$ the set of cost criteria.
Step 4: Calculation of the separation of each alternative from the positive-ideal and negative-ideal solutions
The separation from the positive-ideal solution is given by (1.5), and the separation from the negative-ideal solution by (1.6).
Step 5: Calculation of the similarity to the ideal solution
Step 6: Ranking preference order
The best alternative has $C^*_i$ closest to 1.
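The six steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: it assumes the commonly used vector (Euclidean) normalization in Step 1 and Euclidean separations in Step 4, since the corresponding equations are not reproduced here. The function name `topsis` and the score matrix, weights and criterion types are hypothetical.

```python
import math


def topsis(scores, weights, is_benefit):
    """Return the closeness coefficient of each alternative (row of `scores`).

    Assumes vector normalization and Euclidean distances; requires every
    criterion column to be non-zero and the alternatives not all identical.
    """
    n_alt, n_crit = len(scores), len(weights)
    # Step 1: column-wise vector normalization (assumed form of eq. 1.1).
    norms = [
        math.sqrt(sum(scores[i][j] ** 2 for i in range(n_alt)))
        for j in range(n_crit)
    ]
    # Step 2: weighted normalized decision matrix.
    v = [
        [weights[j] * scores[i][j] / norms[j] for j in range(n_crit)]
        for i in range(n_alt)
    ]
    # Step 3: positive-ideal and negative-ideal solutions.
    columns = list(zip(*v))
    a_pos = [max(c) if b else min(c) for c, b in zip(columns, is_benefit)]
    a_neg = [min(c) if b else max(c) for c, b in zip(columns, is_benefit)]
    # Step 4: separation of each alternative from the two ideal solutions.
    s_pos = [math.dist(row, a_pos) for row in v]
    s_neg = [math.dist(row, a_neg) for row in v]
    # Step 5: similarity to the ideal solution (closeness coefficient).
    return [sn / (sp + sn) for sp, sn in zip(s_pos, s_neg)]


# Hypothetical data: 3 alternatives, 2 criteria (first benefit, second cost).
scores = [[9.0, 100.0], [6.0, 150.0], [3.0, 300.0]]
weights = [0.6, 0.4]
is_benefit = [True, False]

closeness = topsis(scores, weights, is_benefit)
# Step 6: rank alternatives by decreasing closeness (indices, best first).
ranking = sorted(range(len(closeness)), key=lambda i: closeness[i], reverse=True)
print(closeness, ranking)
```

In this hypothetical data set the first alternative is best on both criteria and the last is worst on both, so they coincide with the positive-ideal and negative-ideal solutions respectively; real data rarely contain such a dominant alternative.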

Appendix 3
Criteria weighting using five parsimonious methods
Tables A1 to A5 of Appendix 3 present the numerical results of the criteria weighting process implemented by the experts according to the five methods. The simplified AHP method is referred to as "S-AHP", and the AHP-express method as "AHP-e".
The resulting weights calculated using equation (1) are summarized in Table A7, while the weights of the macro-categories, calculated with eq. (2), are shown in Table A8.

Figure 1. Research methodology used in this study
Figure 3. Number of judgments required as a function of the number of criteria to be weighted in AHP, BWM and parsimonious methods
Figure 4. Comparison of the weights resulting from the application of the five parsimonious criteria weighting methods

Table 1 .
Age Table created by authors Table 2.