"A little bit of everything?" Conceptualising performance measurement in hybrid public sector organisations through a literature review

Purpose – Many of today's public sector organisations (PSOs) can be characterised as hybrids. Hybridity is caused by different (at times conflicting) demands that stem from the institutional environment, which is likely to affect performance measurement in these organisations. This paper focuses on the relationship between hybridity and organisational performance, which has so far not been studied in detail. Design/methodology/approach – Based on a literature review (final sample of 56 articles), the authors systematise performance dimensions along the pillars "economy", "efficiency", "effectiveness" and "(social) equity". The article summarises the results in a framework for measuring performance in hybrid PSOs. The authors outline strategies for how public managers can tailor frameworks to the requirements and idiosyncrasies of their organisations. Findings – Since hybrid PSOs combine logics from different administrative models (Weberian bureaucracy, market-capitalism and democracy), so must their organisational performance measurement systems. Potential synergies from and frictions between the different performance dimensions related to the four pillars are discussed. Originality/value – This is the first literature review on performance dimensions and their application in hybrid PSOs. The distilled "hybrid performance measurement framework" can be scrutinised and further refined in future research.


Introduction
The context in which today's public sector organisations (PSOs) operate has become increasingly complex over recent decades, not least because of competing demands stemming from the institutional environment. In response to this complexity, governments frequently employ strategies to accommodate and integrate these (sometimes conflicting) institutional demands, resulting in hybrid organisational forms (Mair et al., 2015; Pache and Santos, 2013; Vakkuri and ...). Such "hybrid organisations" are commonly described as harbouring "conflicting logics" within their organisational core (Battilana and Lee, 2014; Smith and Besharov, 2019).
Research interest in, and the relevance of, hybridity have recently expanded from organisation theory (Battilana and Lee, 2014; Smith and Besharov, 2019) into urban studies (Leixnering et al., 2020), non-profit management (Kim and Mason, 2020; Skelcher and Smith, 2015), public administration (Denis et al., 2015; Vakkuri and Johanson, 2018) and public sector accounting (Grossi et al., ..., 2020). Denis et al. (2007) advise treating hybrid PSOs "as a natural state of affairs and not as a subversive aberration" (p. 183). Selznick (1957) had already raised the question of how organisations could cope with distinct visions; hybridity is therefore rooted in much older theoretical debates.
We understand performance measurement as a sub-process of performance management that, broadly speaking, deals with identifying, monitoring and communicating performance targets, throughputs and results through the use of indicators. Although hybridity is likely to affect performance measurement in PSOs, the relationship between the two has so far received little scrutiny (Rajala et al., 2020). Grossi et al. (2017) argue that "[h]ybrids at all levels will become much more prevalent in the future; consequently, the need to understand the particularities of their performance is urgent" (p. 383). While a number of case studies on performance measurement in hybrid PSOs have been published (e.g. Agostino and Arnaboldi, 2015; Cappellaro and Ricci, 2017; Guenoun et al., 2016; Rajala et al., 2020), little is known, beyond the level of single organisations, about the relationship between hybridity and performance measurement (Vakkuri and Johanson, 2018; Vakkuri et al., 2021). This is an important knowledge gap, as organisational performance in the contemporary public sector, where services are delivered by an increasing number of hybrid PSOs, has been argued to be a concept that is hard to grasp and thus difficult to operationalise (Andrews and Van de Walle, 2013; Boyne, 2002; Rajala et al., 2020). Echoing Grossi et al. (2020), Vakkuri and Johanson (2020) call in particular "for more sophisticated theorizations of the ambiguities associated with measuring and reporting performance in hybrid organisations" (p. 34). This state of the art prompted our two research questions: (1) What dimensions of performance are potentially relevant to hybrid PSOs, and how can they be integrated in a measurement framework? (2) What are potential synergies from and frictions between the different performance dimensions in hybrid PSOs?
Given that scholarly knowledge about "performance management in hybrid organisations remains lacking" (Rajala et al., 2020, p. 799), we explore these issues by means of a thorough literature review on performance measurement in the public sector in general. We combine the insights gained from the review with extant research on hybrid organisations. Given the current state of the art, we regard this research strategy as suitable for deriving conclusions for performance measurement in hybrid PSOs.
Hence, the contribution of this paper is twofold. First, building on a review of the performance measurement literature, we distil a "hybrid performance measurement framework" tailored to hybrid PSOs. Second, we discuss potential synergies from and frictions between the different performance dimensions.
The remainder of the article is structured as follows. The next section presents the conceptual setup, introducing the literature streams on hybrid PSOs and organisational performance. We then outline the methodology of the literature review, before presenting the findings and summarising them in a framework. Subsequently, we outline strategies for how public managers can adapt performance measurement frameworks to the requirements and idiosyncrasies of their organisations, and we reflect on the (non-)compatibility of different performance dimensions. The paper finishes with concluding remarks.

Conceptual background
Organisational hybridity as the combination of different administrative models
The recent discourse about hybridity has been greatly inspired by New Institutionalism (e.g. Greenwood et al., 2011; Mair et al., 2015; Pache and Santos, 2013). In this view, hybrid organisations attempt to respond to conflicting demands from their institutional environment by simultaneously adopting different (at times opposing) logics, often with the aim of ensuring legitimacy (Alexius and Grossi, 2018; Johanson and Vakkuri, 2018; Smith and Besharov, 2019). Battilana and Lee (2014) emphasise the multi-dimensional character of hybridity, pointing towards hybrid identities, hybrid organisational forms and the co-existence of multiple institutional logics within an organisation or social entity. The current study is concerned with the latter two aspects. Regarding organisational forms, scholars have studied hybrid organisations such as social enterprises (Battilana and Lee, 2014; Costa and Andreaus, 2020; Doherty et al., 2014), state-owned enterprises (Giosi and Caiffa, 2020), municipally-owned corporations (Krause and Swiatczak, 2020), non-profit organisations (Kim and Mason, 2020; Skelcher and Smith, 2015), knowledge-intensive public organisations in the education and healthcare sectors (Gebreiter and Hidayah, 2019; Grossi et al., 2020) and public-private partnerships (PPPs) (Liu et al., 2014).
Institutional logics refer to institutional forces that determine how organisations should be structured, steered and controlled (Greenwood et al., 2011). Such a conceptualisation resonates with the notion of "administrative models" in the public administration literature. In past decades, governments implemented reforms informed by ideas from different blueprints of such models (Pollitt and Bouckaert, 2017). Traditional Public Administration (TPA), New Public Management (NPM) and Public Value Governance (PVG) are suggested in the literature as administrative models (e.g. Benington, 2011; Osborne, 2010). PVG is alternatively referred to as Post-NPM, Network Governance, New Public Service or New Public Governance, with some nuanced differences in content (De Waele et al., 2015; Denhardt and Denhardt, 2000). Administrative models "define and identify problems, show appropriate means to solve them and incorporate cause-effect relations into broader worldviews" (Fleischer and Jann, 2011, p. 70). They are "widely spread ideas for what kinds of formal structures, technologies, processes, procedures and ideologies an organisation should adopt" (Christensen et al., 2007, p. 58), and they offer actors scripts for action. Each model is characterised by core ideas (Hyndman et al., 2014; Polzer, 2019) and a dominant logic: Weberian bureaucracy in TPA, market-capitalism in NPM and democracy in PVG (Polzer et al., 2016). As we show below, these different logics have implications for performance measurement.
Hybridity in the public sector implies that public organisations simultaneously combine logics from different administrative models (Denis et al., 2015; Skelcher and Smith, 2015). Most earlier research on hybridity started from the juxtaposition of two logics, such as academic versus business logics, social versus commercial logics or public versus private logics (e.g. Doherty et al., 2014; for a critique, see Guenoun et al., 2016).
The boundaries of administrative models are, however, not always clear-cut, especially with respect to the core ideas they convey (Pollitt and Bouckaert, 2017). For example, the outcomes dimension of government action cannot be assigned to a single model, as it constitutes a core idea in public management spanning all administrative models. Likewise, economy, although emphasised as a core doctrine of NPM (Hood, 1991), was already part of TPA (and would not be abandoned by PVG). Considering whether PVG is an entirely novel administrative model compared with NPM, Lynn holds that "[w]hatever may be new will be rooted in soil that is very old indeed" (Lynn, 2010, p. 119). Perrow (1986, p. 158), looking at how new grand ideas are developed in complex organisations, argues that "[t]he present is rooted in the past; no organisation (and no person) is free to act as if the situation were de novo and the world is a set of discrete opportunities ready to be seized at will". Consequently, a number of core ideas may be reinterpreted and differently emphasised by later administrative models, but cannot be assigned exclusively to any single one. The concept of core ideas is nevertheless helpful for obtaining an overview of the whole spectrum of core ideas and performance dimensions. We now turn to the second central concept: (organisational) performance.

Organisational performance
In general, performance can be measured at the individual, organisational, system or societal level (Bouckaert and Halligan, 2008). Performance in PSOs (which lies at the heart of this research) is an intriguing yet ambiguous concept (Grossi et al., 2017). While performance measurement (and management) in the public sector has been referred to as "the ultimate challenge", contemporary scholarship has asked whether performance management in a hybridised government is even a "shotgun marriage" (Vakkuri et al., 2021). This is because hybridity has been found to significantly affect dimensions of organisational performance such as costs, quality, equitable access, transparency, regulatory compliance and accountability (Amirkhanyan et al., 2014; Conaty, 2012), and has brought forward the need to re-balance certain financial versus non-financial aspects (Manes Rossi et al., 2020). Hybridity might also cause organisational "mission drift" (Alexius and Cisneros Örnberg, 2015). While performance in private sector organisations can often ultimately be narrowed down to relatively easy-to-measure yardsticks such as profitability or growth, the majority of PSOs generally lack such unambiguous performance criteria (Hvidman and Andersen, 2016). Moreover, even if a profit criterion is in place in PSOs, organisational aims such as equity, redundancy (e.g. in strategically relevant areas such as defence and health) and transparency might still prevail, at least in certain situations (Andrews and Van de Walle, 2013; Frederickson, 1990). Performance measurement in hybrid PSOs therefore poses specific challenges for accounting and management, resembling "a little bit of everything", as it borrows from and integrates multiple performance dimensions.
For example, while social welfare institutions must ensure compliance with rules and equity in order to guarantee that applicants are treated equally, other dimensions such as customer satisfaction might remain of secondary importance, since strict rule application may prevent the organisation from being responsive to citizens' needs.
Over time, there was a shift towards a multi-dimensional view of organisational performance in the public sector, in line with an expansion of the number of "pillars of administration" (Amirkhanyan et al., 2014). Boyne (2002) and Carter (1989), for instance, state that performance has been constructed around pillars such as "efficiency", "effectiveness" and "economy", often referred to as the "three pillars of public administration" or "3E model" (see also Chun and Rainey, 2005; Walker and Andrews, 2015). Frederickson (1990, 2010) suggested including a fourth pillar, "(social) equity", expanding the "3E" to a "4E model" (see also Norman-Major, 2011).
The added value of multi-dimensional performance systems has been confirmed by van Thiel and Leeuw (2002). However, the authors also warn of a performance paradox: if these systems are poorly developed or skewed too far in one direction, they might create unintended consequences. van Thiel and Leeuw (2002) illustrate this paradox with the example of a labour agency: if the agency's performance is assessed mainly on outputs (helping somebody find a job again), its success rate will be highest when it accepts only clients with good prospects of re-entering the job market, thereby neglecting less educated and poorer clients. This example demonstrates the discrepancy between performance standards (i.e. the "efficiency" dimension) and policy objectives (the "effectiveness" dimension in the "4E model"), and how it might result in perverse effects.
Furthermore, organisational performance is contingent on factors such as the organisation's purpose. Behn (2003) found that different managerial purposes require different performance measures. For instance, for a number of public services, such as social welfare, it is key to ensure equity, i.e. to guarantee equal access for citizens. By contrast, state-owned enterprises (one type of hybrid PSO) that operate in an environment largely controlled by private companies (e.g. in the utilities sector) may frequently accommodate dimensions such as efficiency and effectiveness in the face of competition (Alexius and Grossi, 2018; Denhardt and Denhardt, 2000). Likewise, the different logics present within hybrid PSOs are likely to result in the implementation of different performance dimensions.
Interestingly, Andrews and Van de Walle (2013) point out that performance is in itself a value-laden concept, since it has mostly been associated with quantifiable and result-driven constructs that tend to align well with NPM principles. Indeed, Hood (1991) already identified "explicit standards and measures of performance" (p. 4) as one of the doctrinal components of NPM. Because performance is such an abstract concept, practitioners and researchers might tend to favour performance criteria that are relatively clear-cut to measure, straightforward to compare across organisations and easy to access (Andrews and Van de Walle, 2013). Consequently, we expect most of the available empirical research to be based on "hard", more easily quantifiable data such as efficiency measures, and less on "soft", qualitative and difficult-to-operationalise measures such as innovation (De Vries et al., 2016), equity (Frederickson, 2010) or social impact (Costa and Andreaus, 2020). This issue has also been raised by Lewis (2015), who points to the strong influence of mechanistic and rationalist thinking behind influential strands of the performance literature, such as principal-agent theory and the public choice school. Guenoun et al. (2016), without discussing hybrid PSOs in detail, refer to the need to develop performance measurement frameworks that enable PSOs to coordinate their performance dimensions more effectively according to organisational contingencies. Scholars have argued that such frameworks need to account for and accommodate the idiosyncrasies of hybrid PSOs (see also Cappellaro and Ricci, 2017; Grossi et al., 2020). Although these calls are not new (Brignall and Modell, 2000), previous attempts to develop such frameworks are fairly rare or limited in scope (Rajala et al., 2020; Vakkuri and Johanson, 2020). For example, a study by Liu et al. (2014) on performance measurement in PPPs (another type of hybrid PSO) found that measurement was done ex post (and not ex ante or during projects) and focused on "cost" and "time" indicators, which was argued to be insufficient for evaluating PPPs. Furthermore, earlier attempts to create such hybrid performance measurement frameworks have been criticised for effectively becoming a symbol of NPM, since the associated tools (such as the Balanced Scorecard; Kaplan and Norton, 1996; Grossi et al., 2017) were originally developed as management control tools for the private sector.
Owing to the overall lack of empirical studies on performance measurement in hybrid PSOs (Rajala et al., 2020), we took a step back and conducted a literature review on organisational performance in the public sector to develop a more holistic understanding of the performance dimensions "available" to hybrid PSOs. From this analysis, we distil a "hybrid performance measurement framework" tailored to hybrid PSOs. The methodology of the review is outlined in the next section.
Methodology of the literature review
Literature reviews include a research protocol for retrieving and selecting original scientific articles and evaluating their quality, as well as for analysing and presenting them (Santis et al., 2018). Reviews have been established as reliable and valid means of summarising prior research findings (Denyer and Tranfield, 2009; Fink, 2019; Manes Rossi et al., 2020). In this study, the research aim is twofold: first, identifying the different performance dimensions and, second, studying additional literature in order to identify underrepresented performance dimensions. These aims were pursued in two phases. Phase I identified and compared the main performance dimensions in extant research. Since this phase was based on an analysis of literature retrieved with predefined keywords, it followed a mainly deductive approach. In Phase II, supplementary literature was analysed in order to check the performance dimensions from Phase I for robustness and to identify additional dimensions. Here, literature was selected on criteria other than predefined keywords, reflecting a mainly inductive research approach [1].

Phase I
Using an approach similar to prior reviews (e.g. Nerantzidis et al., 2020; Shepherd and Challenger, 2013), we compiled a list of articles through a keyword search in the Web of Science database. This database is widely used for social science reviews and offers access to the field's most prominent journals. The keywords "performance dimensio*" and "performance indicato*" were entered in the "basic search" pane. We deliberately limited the number of keywords in order to prevent potential bias and irrelevant query results. Other queries, such as "performance measures", were also tried initially; however, they generated lists of articles with large conceptual overlaps and were therefore left out of further analysis.
Results were restricted to articles in English and to the research area "public administration" because (1) the primary focus of the current paper is on the public sector, (2) including other research areas resulted in a list of mostly irrelevant articles and (3) this ensures the internal validity of the findings, since the public sector differs greatly from the private sector in terms of performance measurement. Monographs and conference papers were not included in the search because of (1) limited accessibility and (2) potentially lower quality, as many of them are not subject to peer review (Kuipers et al., 2014). Counting duplicate hits only once, the search yielded 616 articles.
The article set for Phase I was selected in a two-step procedure (Kuipers et al., 2014; Lee and Cassell, 2013). First, papers were screened by journal rating, following the top-ten list for the research field of Public Administration produced by Web of Science (five-year impact factor), as top-rated journals are regarded as particularly helpful in drawing authoritative conclusions (Doherty et al., 2014). This reduced the initial sample to 144 papers. Next, two members of the research team manually checked the relevance of the papers by reading the abstracts and/or full texts (Shepherd and Challenger, 2013). Any disagreement was discussed, and both researchers had to agree in order to exclude a paper from further analysis. Much of the literature we found did indeed focus on performance but did not clearly elaborate on performance dimensions, so that these dimensions were discussed only in a very fragmented manner. Therefore, articles whose conceptual contribution to the performance literature was limited were removed from further analysis. After this process, the sample comprised 38 studies.
During the analysis, each article was reviewed, and every performance dimension mentioned was recorded in a spreadsheet. In addition, information on author(s), year, number of citations, journal, methods used, regional focus and type of organisation studied was retrieved to assist the classification (see Kuipers et al., 2014). Subsequently, based on thematic analysis, similar performance dimensions across the papers were clustered, and each cluster was allocated to a pillar of the "4E model" ("economy", "efficiency", "effectiveness" and "equity"; Norman-Major, 2011) (Table 1). We removed dimensions from further analysis where their application was limited to a specific context or where they lacked conceptual clarity or empirical rigour. For instance, we found literature about performance dimensions in public schools, such as students' scores on specific assessment tests. Since these dimensions were developed specifically for the context of schools, they were hard to generalise to public administrations in general and were hence omitted. The clustering of the different performance dimensions was aided by earlier work, such as that of Denhardt and Denhardt (2000) and Hyndman et al. (2014). Clustering was done by two authors in order to reduce researcher bias. Thick description was added when constructing the different clusters, so that discussions between the authors were recorded for future revisions (Eisenhardt, 1989). Dimensions were clustered only when the authors agreed unanimously.
Dimensions included in the initial model needed to be explicitly mentioned by more than five different authors. We regard this threshold as high enough to detect trends; below it, the selection would be more subject to the researchers' personal interpretation. However, relying on this threshold alone would mean that the model was based exclusively on counting frequencies and would be less robust with regard to the theoretical foundations of current discourses. Therefore, Table 1 also includes dimensions below this cut-off that stem from articles with a major contribution, i.e. literature containing outstanding empirical analyses of the performance dimensions. For this purpose, lists of potential articles were independently generated by two of this paper's authors and subsequently discussed and consolidated, based on the criteria "empirical rigor" and "theoretical contribution" (see Kuipers et al., 2014). To reduce bias, two members of the research team had to agree on each of these articles and performance dimensions. As the outcome of this grouping in Phase I, an initial overview table was created (Table 1).
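The Phase I inclusion rule can be sketched as a small filter. This is a minimal illustration under stated assumptions, not the authors' actual tooling: we assume each coded article is stored as a pair of (set of author names, dimension), that "more than five different authors" means more than five distinct author names across all coded articles, and that the major-contribution exception keeps a dimension only when both reviewers independently flagged it. The function name `select_dimensions` and the toy records are hypothetical.

```python
from collections import defaultdict


def select_dimensions(records, flagged_by_a, flagged_by_b, threshold=5):
    """Hypothetical sketch of the Phase I inclusion rule.

    records: iterable of (authors, dimension) pairs, one per coded article,
             where `authors` is a set of author names (assumed format).
    flagged_by_a / flagged_by_b: dimensions each reviewer independently
             drew from major-contribution articles.
    """
    # Collect the distinct authors who explicitly mention each dimension.
    authors_by_dim = defaultdict(set)
    for authors, dimension in records:
        authors_by_dim[dimension].update(authors)

    # Frequency rule: strictly more than `threshold` different authors.
    frequent = {dim for dim, names in authors_by_dim.items()
                if len(names) > threshold}

    # Exception rule: below the cut-off, keep only what both reviewers agree on.
    agreed = set(flagged_by_a) & set(flagged_by_b)
    return frequent | agreed


# Toy input, invented for illustration (not the review's data).
records = [
    ({"A", "B", "C"}, "parsimony"),
    ({"D", "E", "F"}, "parsimony"),
    ({"A", "B"}, "innovation"),
    ({"C"}, "equity"),
]
print(sorted(select_dimensions(records, {"innovation", "equity"}, {"innovation"})))
# ['innovation', 'parsimony']
```

Here "parsimony" passes the frequency rule (six distinct authors), "innovation" enters only through the two-reviewer exception, and "equity" is dropped because only one reviewer flagged it.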

Phase II
The aims of the second research phase were (1) to check the clusters identified in Phase I for robustness and (2) to identify additional performance dimensions. In Phase II, we used a snowball approach rather than a keyword search to identify additional articles, in line with the methodological recommendations of Fink (2019) and the methodology applied by Kuipers et al. (2014). This extension of the Phase I dataset was necessary because of the following two limitations.
First, a very influential paper may not have been published in a top-tier journal yet have gained considerable influence because it has been widely cited, or may be exceptional in its conceptual contribution (e.g. a systematic overview). Therefore, a selection criterion based purely on the journal's impact factor (as applied in Phase I) might be sufficient, yet not fully satisfactory (see Kuipers et al., 2014). Hence, we needed to complement the literature from Phase I with additional articles that were exceptional in conceptual and empirical rigour or in terms of citations. The evaluation criteria were as follows: either (1) the conceptual contribution was regarded as substantial, achieved by juxtaposing the different administrative models in a schematic overview, or (2) the article had been cited more than 100 times. As before, two members of the research team had to identify such articles independently. This led to the close reading and analysis of the following four articles: Crosby et al. (2017), Denhardt and Denhardt (2000), Olsen (2006) and Stoker (2006).
A second limitation of Phase I was its focus on public administration journals, as a search across all subject areas yielded mostly irrelevant results. However, focusing on the public administration literature alone was regarded as insufficient for our research questions, since we wanted to obtain an encompassing overview of performance dimensions. Therefore, we extended the research field and looked for additional articles to substantiate our analysis (as in the review by Mauro et al., 2017). Given the focus of the study, we enlarged our search to the field of public sector accounting, as this area is particularly concerned with performance measurement and has recently concentrated on hybridity (e.g. Grossi et al., 2017). Adding accounting literature specifically related to the context of public administrations could therefore complement the results from Phase I and overcome the limitation of an exclusive focus on public administration journals. This resulted in 14 additional articles from special issues in accounting journals (i.e. AAAJ and PMM). These special issues focused on hybrid organisations, and the one in PMM explicitly focused on performance issues in PSOs.
A new performance dimension was added to the scheme developed in Phase I when it was mentioned in more than two of the 18 studies identified in Phase II. The total number of papers studied across both phases is thus 56.
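The Phase II rule and the resulting sample size can be sketched in the same way. Again this is a hypothetical illustration: `new_dimensions` and the toy study list are invented, each Phase II study is assumed to be coded as a set of dimension labels, and a dimension is assumed to count at most once per study. Only the sample sizes (38 articles from Phase I; 4 snowballed plus 14 accounting articles in Phase II) come from the text.

```python
from collections import Counter

# Sample sizes as reported in the review.
PHASE_I_ARTICLES = 38
PHASE_II_ARTICLES = 4 + 14  # snowballed articles + accounting special-issue articles
TOTAL_ARTICLES = PHASE_I_ARTICLES + PHASE_II_ARTICLES


def new_dimensions(phase2_studies, existing, min_mentions=2):
    """Hypothetical sketch: dimensions not yet in the Phase I scheme that
    appear in more than `min_mentions` Phase II studies."""
    # Count each dimension at most once per study.
    counts = Counter(dim for dims in phase2_studies for dim in set(dims))
    return {dim for dim, n in counts.items()
            if n > min_mentions and dim not in existing}


# Toy input, invented for illustration (not the review's data).
studies = [{"accountability"}, {"accountability", "parsimony"},
           {"accountability"}, {"trust"}]
print(TOTAL_ARTICLES)                                   # 56
print(new_dimensions(studies, existing={"parsimony"}))  # {'accountability'}
```

Counting studies rather than raw mentions keeps a single article that repeats a term from pushing a dimension over the threshold on its own.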

Findings
Phase I
The majority of the articles were published between 1994 and 2015. The most influential articles in terms of citations (more than 50) were published in JPART (3), PAR (3), PMM (2), IRAS (2) and JPAM (1). Over time, organisational performance was increasingly studied as a multi-dimensional construct. Before 2000, organisational performance was mostly covered by only a few performance dimensions, whereas the number of examined dimensions increased afterwards. Also, before 2000, the performance dimensions were mostly narrowly defined; after 2000, increasingly abstract performance dimensions, such as "participation", "program outcomes" and "innovation", emerged in the literature. Four of the reviewed studies were purely conceptual, two were literature reviews, five involved a qualitative research design (mostly case studies) and 25 included quantitative analyses. Furthermore, two meta-analyses compared and cumulated the findings from different studies. The dominance of quantitative research might reflect this tradition's preference for quantifiable performance measures, while more abstract performance measures are explored less.
Regarding the research setting, most of the empirical data focus on English local authorities (16 articles), eight of which are based on comprehensive performance assessment (CPA) data. The use of these records might reflect the pragmatic opportunity to access easily available data. Yet the empirical focus on local governments also implies that the effect of potential contingencies on organisational performance, such as the purposes of organisations other than local governments, has so far been little explored. Other studies were conducted in the USA (ten articles), Australia (two articles) and Canada, New Zealand, Switzerland and Sweden (one article each). The research focusing on the UK has predominantly been produced by scholars from Cardiff University (e.g. Rhys Andrews, George Boyne and Richard Walker) and their networks (e.g. Andrews and Boyne, 2011; Andrews et al., 2006; Brewer and Walker, 2013). This research has had a high impact among academics working on performance management in the public sector worldwide and has been echoed accordingly (e.g. in the USA: Amirkhanyan et al., 2014; Moynihan and Pandey, 2005; O'Toole and Meier, 2008).
Although authors name constructs differently, the underlying concepts they refer to seemed rather similar. Consequently, the performance dimensions could be clustered (Table 1). For example, dimensions such as "service outputs" and "volume of service" were grouped in "quantity of outputs" ("efficiency" pillar of the "4E model"), and "environmental performance", "societal impact" and "participation" in "socio-environmental impact" ("equity" pillar). Below, we discuss the meaning of each cluster in more detail along the pillars of the "4E model".
Economy pillar: In his seminal book, Frederickson (2010) refers to "economy" in public administration as the "management of scarce resources and particularly with expending the fewest resources for an agreed upon level of public services" (p. xv). "Parsimony" (cluster 9) is the only cluster in the "economy" pillar. The cluster is particularly interesting from an accounting and budgeting point of view, as it entails the dimension "economic use of allocated budget". Another study discussed the topic of "cost reduction". The data reported in the analysed articles were generated in the context of collecting CPA data in English local governments (Palmer, 1993; Walker and Andrews, 2015).
Efficiency pillar: "Efficiency" has been defined as "achieving the most, the best, or the most preferable public services for available resources" (Frederickson, 2010, p. xv). We collapse three clusters under this pillar: "efficient organisation", "quality of outputs" and "quantity of outputs". First, "efficient organisation" (cluster 1) is the most-discussed cluster, with 27 dimensions relating to it. Boyne (2002) argues that efficiency reflects input-output relationships, being concerned with achieving relatively short-term results with minimum resources.
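Read together, the two definitions quoted above can be put in a stylised form. The notation is our own assumption for illustration and is not taken from the reviewed articles: $C$ denotes the cost of inputs, $Q$ the level of public services produced, $\bar{Q}$ an agreed-upon service level and $B$ the available budget. Economy then concerns minimising inputs for a given service level, whereas efficiency concerns the input-output relationship, i.e. obtaining the most service from the resources available:

$$
\text{Economy:}\quad \min\, C \ \ \text{s.t.}\ \ Q \ge \bar{Q}
\qquad\qquad
\text{Efficiency:}\quad \max\, \frac{Q}{C} \ \ \text{s.t.}\ \ C \le B
$$

In this reading, the "quality of outputs" and "quantity of outputs" clusters are two different ways of operationalising $Q$.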
Second, none of the papers in the sample provides a clear definition of "quality of outputs" (cluster 5). Instead, authors illustrate what "quality" means by providing examples, such as a reduced number of cyclists killed (Andrews et al., 2006) or speed of delivery (Boyne, 2002), but fail to produce a general definition.
The situation is similar for "quantity of outputs" (cluster 6). Chun and Rainey (2005), for instance, introduce the example of the number of home visits for the elderly. Quantity of outputs thus merely refers to the number of times a specific action has been accomplished, whereas quality of outputs refers to how well those actions are performed.
Effectiveness pillar: Norman-Major (2011) defines effectiveness as "being successful in producing a desired result or accomplishing set goals" (p. 236). Unpacking this pillar, we identified the clusters "satisfaction", "effective organisation" and "outcomes". First, Bouckaert and Van de Walle (2003) hold that "satisfaction" (cluster 2) is about evaluating citizens' perception of the quality of public services and of the broader direction in which society is steered. Examples include satisfaction with waste collection or the number of users satisfied with swimming pools, theatres and concert halls (Andrews et al., 2006). We added employee satisfaction (Pollanen et al., 2017) to this cluster.
Second, Palmer (1993) defines "effective organisation" (cluster 3) as the extent to which the defined task or work programme has been accomplished in relation to overall aims. An example is the percentage of rent collected from council housing tenants (Andrews et al., 2006). Organisational effectiveness thus involves the extent to which specific actions have been achieved in relation to the PSO's aims. Third, Boyne (2002) states that "outcomes" (cluster 7) encompass the impact of a service. At a more abstract level, Gregory and Lonti (2008) argue that outcomes should aim for an inclusive and innovative economy. In sum, outcomes refer to the effects of policies and political programmes in a broader sense, such as enabling social cohesion or transparency.
Equity pillar: The equity pillar "delves into questions of for whom government operates" (Norman-Major, 2011, p. 237). From an organisational perspective, this relates to the questions of from whose perspective the organisation is well-managed and whether public services are delivered fairly (Frederickson, 2010). Our analysis led to the identification of two clusters: "social equity" and "socio-environmental impact". Andrews and Boyne (2011) define "social equity" (cluster 4) as the extent to which a PSO aligns its policies with respect to gender, race and disability, for instance by ensuring equal access to public housing (see also Brewer and Walker, 2013; Walker et al., 2011). Consequently, "social equity" is about the extent to which citizens have equal access to the public services offered.
The dimensions "social responsibility", "environmental performance", "impact", "participation" and "cost per unit of democratic outcome" appear to be related to each other, resulting in our final cluster: "socio-environmental impact" (cluster 8; O'Faircheallaigh, 2010). According to Glasson et al. (2012), it involves taking into account the social and ecological consequences of a plan, policy or programme in order to ensure sustainability.
Innovation as cross-cutting cluster: Finally, another cluster, "innovation" (cluster 10), emerged. This cluster could not be allocated to one of the 4Es, as it can be related to more than one pillar. For example, while social innovation would align with the "equity" discourse, innovation in programme delivery through streamlining processes would refer to aspects of "efficiency" (De Vries et al., 2016). Performance dimensions falling under this cluster assess the extent to which governments are open, accessible and responsive and have collaborative structures in place (Crosby et al., 2017; Denhardt and Denhardt, 2000).
To sum up, the frequencies in the second column of Table 1 indicate a dominance of "efficiency"-related dimensions in the extant literature. These dimensions were mentioned 48 times, followed by "effectiveness" (29), "equity" (17) and "economy" (2). Consequently, there is an imbalance in attention between the performance dimensions relating to the different pillars.
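To make the imbalance more tangible, the four frequencies reported above can be expressed as shares of their sum. This is a simple worked calculation based only on the reported counts, assuming they are summed as given; the rounding to one decimal place is ours.

$$
N = 48 + 29 + 17 + 2 = 96
$$

$$
\text{efficiency: } \frac{48}{96} = 50.0\,\%, \qquad \text{effectiveness: } \frac{29}{96} \approx 30.2\,\%, \qquad \text{equity: } \frac{17}{96} \approx 17.7\,\%, \qquad \text{economy: } \frac{2}{96} \approx 2.1\,\%
$$

On this reading, "efficiency"-related dimensions alone account for half of the pillar-assigned mentions, while "economy" accounts for only about 2 %.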

Phase II
An explanation for "efficiency"-related dimensions being over-represented in Phase I might be that the majority of reviewed studies pursued a quantitative methodology, focussing on measurable input, output and process data (Speklé and Verbeeten, 2014). The aim of the analysis in Phase II was therefore (1) to perform a robustness check for the clusters identified in Phase I and (2) to identify additional performance dimensions. Unlike in Phase I, the 18 supplementary studies identified followed a qualitative (seven papers) or conceptual approach (11 papers). The analysis resulted in the identification of three further clusters.
First, in line with Brereton and Temple (1999), Olsen (2006) argues that "citizenship involves more than voting, requiring a new institutionalized moral vision synthesizing private and public ethical principles" (p. 7). Downe et al. (2016) define these principles as written frameworks used by organisations to specify and shape what is regarded as appropriate conduct. This underscores the relevance of integrating "ethical principles" (cluster 11) into organisational performance measurement systems. This cluster resonates well with the "equity" pillar (Cameron, 2004).
Second, we identify "standardisation of procedures" as another cluster (cluster 12). Authors such as Glanz (1991) argue that bureaucracies, by their very nature, require a high degree of standardisation, with an emphasis on uniformity in both rules and conduct. This is also echoed by Stoker (2006), who holds that "mass-citizenship led to increased demands on the state in areas such as education and health that only can be managed by standardisation of administrative responses that enable an organisation to meet the welfare tasks it generates" (p. 45; see also Cappellaro and Ricci, 2017). We subsume cluster 12 under the "efficiency" pillar, as it relates to operational aspects of service delivery.
Finally, Olsen (2006) stated that one of the key concepts of Weberian bureaucracy is that civil servants are supposed to act as guardians of constitutional principles, the law and professional standards. Accordingly, compliance with prescribed rules, procedures and codes of conduct is central. Moreover, the findings of Denhardt and Denhardt (2000) imply that rules and procedures are to be followed consistently. Demonstrating "compliance with rules and procedures" (cluster 13) is therefore essential. We argue that this cluster cuts across the pillars of the "4E model", in the same way as the "innovation" cluster does. This is because the cluster refers to the context in which PSOs operate and can manifest itself, for instance, in the context in which programmes are offered ("effectiveness") and in the groups of society to which they are offered ("equity").

Summary of findings: a framework for measuring performance in hybrid PSOs
We now take the next step and summarise the results of the literature review in a framework for performance measurement in hybrid PSOs. Figure 1 integrates the findings from the review's two phases. We grouped 11 of the 13 identified clusters into four pillars, informed by the "4E model", and identified two cross-cutting clusters: one that represents the background in which hybrid PSOs operate ("compliance", cluster 13) and one that deals with various aspects of "innovation" (cluster 10). The results of the review demonstrate that the contemporary literature on organisational performance is, to date, mainly oriented towards the efficiency pillar of the "4E model". Hence, our findings echo earlier observations (e.g. Lewis, 2015; van Thiel and Leeuw, 2002) of a potential "attention skewness" among the main performance dimensions towards data that are measurable with (relatively) little effort.
However, research has pointed out that there is no "one-size-fits-all approach" to measuring performance in PSOs (Amirkhanyan et al., 2014; Radin, 2006), let alone in hybrid ones (Vakkuri et al., 2021). In the next section, we reflect on what these results mean for performance measurement in hybrid PSOs.

Discussion
The first research question of this article focused on which dimensions of performance are potentially relevant to hybrid PSOs and how they can be integrated in a measurement framework. In this context, Behn (2003) reminds us that different purposes of PSOs require different measures, and Pollitt and Bouckaert (2017) elaborate that the potential frictions stemming from the integration of different performance dimensions imply that PSOs should attempt to find a compromise when using particular dimensions.
The need to balance and integrate different performance dimensions is even more urgent for hybrid PSOs, which are guided by more than one logic at the same time; this adds a layer of complexity. The appropriate mixture of performance dimensions has been argued to be contingent on the organisational configuration and on which logics are dominant. Our framework can therefore be regarded as a "toolbox" that contains different "tools" (performance dimensions) from different "compartments" (pillars of the "4E model"). Hybrid PSOs and their parents (core administrations such as line ministries) can select tools and assemble their own performance measurement frameworks accordingly.
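For readers who wish to operationalise the "toolbox" idea, the following is a minimal Python sketch under stated assumptions; it is not a method proposed in the reviewed studies. It encodes the allocation of clusters to the four pillars and the two cross-cutting clusters as described in this article, and assembles a framework from a chosen set of pillars. The function name `assemble_framework` and the rule that the cross-cutting clusters are always added are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Tuple

Cluster = Tuple[int, str]  # (cluster number, cluster name) as used in the text

# "Compartments" of the toolbox: the four pillars of the 4E model and the
# clusters allocated to each pillar in this review (Phases I and II).
PILLARS: Dict[str, List[Cluster]] = {
    "economy": [(9, "parsimony")],
    "efficiency": [
        (1, "efficient organisation"),
        (5, "quality of outputs"),
        (6, "quantity of outputs"),
        (12, "standardisation of procedures"),
    ],
    "effectiveness": [
        (2, "satisfaction"),
        (3, "effective organisation"),
        (7, "outcomes"),
    ],
    "equity": [
        (4, "social equity"),
        (8, "socio-environmental impact"),
        (11, "ethical principles"),
    ],
}

# Clusters that cut across the pillars rather than belonging to one of them.
CROSS_CUTTING: List[Cluster] = [
    (10, "innovation"),
    (13, "compliance with rules and procedures"),
]


def assemble_framework(selected_pillars: List[str]) -> List[Cluster]:
    """Return the clusters ("tools") drawn from the selected pillars.

    Hypothetical rule for illustration: the two cross-cutting clusters are
    always included, because they are not tied to a single pillar.
    """
    unknown = set(selected_pillars) - set(PILLARS)
    if unknown:
        raise ValueError(f"unknown pillar(s): {sorted(unknown)}")
    chosen = {cluster for pillar in selected_pillars for cluster in PILLARS[pillar]}
    chosen.update(CROSS_CUTTING)
    return sorted(chosen)  # tuples sort by cluster number first


if __name__ == "__main__":
    # Hypothetical example: a hybrid PSO that prioritises the "efficiency"
    # and "effectiveness" pillars.
    for number, name in assemble_framework(["efficiency", "effectiveness"]):
        print(number, name)
```

Run as a script, the example lists clusters 1, 2, 3, 5, 6, 7, 10, 12 and 13 with their names. Deciding which pillars to select remains a contingent, organisation-specific judgement; the sketch only makes the composition of the resulting framework explicit.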
Ideally, performance dimensions should therefore be chosen in accordance with the prescriptions of the administrative models adopted by a hybrid PSO. Large inconsistencies are undesirable and are likely to provoke gaming behaviour (van Thiel and Leeuw, 2002), which negatively affects the performance of the organisation in the long run (Grossi et al., 2020). This also means that the chosen performance dimensions may have to go beyond those that are easy to measure.

Factors affecting the design of performance measurement frameworks in hybrid PSOs
Having outlined which performance dimensions are potentially available to hybrid PSOs, we now turn to the factors that need to be taken into account when designing performance measurement frameworks in hybrid PSOs. The studies by Cappellaro and Ricci (2017), Vakkuri et al. (2021) and Pache and Santos (2013) provide a number of suggestions.
Looking at examples from PPPs in health and social services, Cappellaro and Ricci (2017) present a contingency-based approach for developing performance measurement systems along the two dimensions "degree of integration of partners" and "degree of specialisation of offered services". They define three templates on the basis of which measurement systems can be tailored to the organisational needs of PPPs. Vakkuri et al. (2021) depart from the notion that there are various types of hybrid PSOs and point towards differences in ownership, funding arrangements, pursued goals and logics, and forms of social control in such organisations. The authors point out that the design of a particular performance measurement framework depends on the type of hybrid PSO. They suggest three strategies for public managers to create consensus around specific chosen performance dimensions: (1) mixing (dimensions from both logics are represented in performance measurement); (2) compromising (performance dimensions are modified and do not represent one single logic directly, leading to a system that is not ideal, but acceptable); and (3) legitimising (logics embedded in the performance measurement framework are legitimated by the organisation vis-à-vis stakeholders).
Pache and Santos (2013), although not focussing explicitly on performance measurement, identify two further strategies for dealing with conflicting logics: (1) "selective coupling" (referring to hybrid organisations integrating intact elements prescribed by each logic, allowing them to demonstrate legitimacy to external stakeholders without having to engage in deceptions or costly negotiations); and (2) the "Trojan horse pattern" (organisations with low legitimacy, due to their embeddedness in one logic, strategically incorporate elements from another logic in order to gain acceptance).
It could be questioned, for example, whether performance dimensions from the "efficiency" and "effectiveness" pillars should be advocated in hybrid PSOs that are confronted with solving large and complex problems and that primarily operate in a non-competitive environment. Instead, these organisations might benefit from valuing "innovative" and collaborative approaches related to "equity". The same contingency reasoning could be applied to state-owned enterprises that operate in a competitive environment in which inefficient and ineffective organisations are forced to leave the market. These hybrid PSOs might benefit from focussing primarily on "efficiency" and "effectiveness" in order to withstand strong competition.
In summary, the choice of particular performance dimensions to be included in the performance measurement frameworks is contingent on the configuration of the hybrid PSO.
Our study brought forward ideas as to how performance measurement in hybrid PSOs might achieve some gains "on the ground" (Cappellaro and Ricci, 2017). We now turn to the potential synergies from and frictions between different performance dimensions in hybrid PSOs.

Synergies from and frictions between different performance dimensions
Our second research question asked about the potential synergies from and frictions between the different performance dimensions in hybrid PSOs. In an organisation that is guided, for instance, by both market-capitalism (NPM) and democracy (PVG) logics at the same time, presumably not only the "efficiency" and "effectiveness" pillars play a role, but also the "equity" pillar. Within such constellations, performance dimensions related to different pillars might interact in ways that create frictions or reinforce each other. In the following, we identify and reflect upon a number of such synergies and frictions.
First, the requirement to "comply with rules and procedures" (cluster 13) in order to ensure equal access to public services might come at the expense of (customer) "satisfaction" (cluster 2). Organisations bound by detailed rules and procedures (Weberian bureaucracy logic; TPA) are typically less flexible and hence less responsive to clients' needs (market-capitalism logic; NPM) (Rai, 1983). Moreover, public servants may hide behind rules and procedures when they make mistakes, ignoring their responsibilities towards their clients and complicating accountability (democracy logic; PVG) (Fatemi and Behmanesh, 2012). Yet, an emphasis on customer satisfaction might also exacerbate political inequalities, since public service efforts may become biased towards citizens of higher socioeconomic status, who are more likely to know and exercise their rights and to raise their voice successfully (Fountain, 2001).
Second, combining market-capitalist (NPM) and democracy (PVG) logics might negatively affect the "socio-environmental impact" (cluster 8) of a hybrid PSO. "Customer"-centred measures (cluster 2) that align well with NPM logics mostly build on a shareholder rather than a stakeholder model, which limits the organisation's ability to respond to shifts in the broader environmental context, as illustrated by Hillman and Keim (2001). Also, the pursuit of "social equity" (cluster 4) may lead to frictions with considerations of "parsimony" (cluster 9) and the "standardisation of procedures" (cluster 12), for example when hybrid PSOs offer services to citizens that are "non-standardisable".
Finally, the focus on "innovation" (cluster 10) can potentially lead to synergies. Innovation aligns well with NPM-related performance dimensions on the "efficiency" and "effectiveness" pillars, as the outcomes of organisational and collaborative forms of innovation may aid the restructuring of technical and administrative processes (De Vries et al., 2016; Melo et al., 2020).
To conclude, the question of how to balance economy, efficiency, effectiveness and equity in PSOs is not new (Norman-Major, 2011). We added the perspective of hybrid PSOs, for many of which creating a balance between performance dimensions is paramount. We also reflected on a number of potential synergies from and frictions between different performance dimensions that might arise in hybrid PSOs. Being aware of these is a crucial precondition for integrating dimensions in a coherent performance measurement system, and our research outlined strategies for doing so.

Conclusion
This study departed from the notion that, although services are increasingly provided by hybrid organisations in today's public sectors, there is so far only limited consolidated knowledge about performance management in hybrid PSOs (Rajala et al., 2020). Given this scarcity of literature, we took one step back and systematically reviewed the extant literature on performance measurement in the public sector in order to identify performance dimensions. As a first contribution, we clustered and summarised the described performance yardsticks in a "hybrid performance measurement framework". Further, we outlined strategies as to how public managers can tailor frameworks to the requirements and idiosyncrasies of hybrid PSOs (Cappellaro and Ricci, 2017; Vakkuri et al., 2021).
This study showed that, as hybrid PSOs can be understood to be guided by prescriptions from more than one institutional logic at the same time, their performance measurement systems will be a mixture as well. As a second contribution, this paper outlined potential synergies from and frictions between different performance dimensions that might arise in hybrid PSOs.
Our framework might help researchers and practitioners to determine which pillars dominantly guide performance measurement in a hybrid PSO and which performance dimensions might therefore be the most appropriate ones. The framework challenges researchers (1) to take a more holistic view of organisational performance, (2) to be aware of potential contingencies that play a role when prioritising performance dimensions that should fit with the organisation's purpose and (3) to identify synergies and frictions that follow from the interplay of performance dimensions.
We invite future research to scrutinise, apply and further refine this framework. For example, future work could examine how performance measurement is linked to systems of integrated reporting (Pärl et al., 2020) and to accounting for the public value that hybrid PSOs generate (Bracci et al., 2019; Douglas and Overmans, 2020). As we did not test our framework empirically (which is a limitation), a further research avenue would be to investigate the ways in which indicators from different performance dimensions are actually used in hybrid PSOs; here, research differentiates, e.g., between "rational" and "ritualistic and decoupled" forms of use (Agostino and Arnaboldi, 2017). This directs attention to the "users" and "use" of performance information, which have recently been studied quite intensively; this line of research could be extended to hybrid PSOs (e.g. Haustein et al., 2019; van Helden and Reichard, 2019; Ouda and Klischewski, 2019).
Finally, the framework can also be useful to practitioners, who can deploy it to steer organisations by using customised forms of performance measurement. Intermingling performance dimensions that simultaneously refer to different pillars of the "4E model" indeed resembles "a little bit of everything", but performance systems that thoughtfully integrate the dimensions can ultimately facilitate accounting and management in hybrid PSOs.