Output measurement in professional public organizations: insights from European justice systems

Purpose – The aim of this study is to empirically explore and analyze the concrete tasks of output measurement and the inherent challenges related to these tasks in a traditional and autonomous professional public work setting – the judicial system. Design/methodology/approach – The analysis of the tasks is based on a categorization of general performance measurement motives (control-motivate-learn) and main stakeholder levels (society-organization-professionals). The analysis is exploratory and conducted as an empirical content analysis of materials and reports produced in two performance improvement projects conducted in European justice organizations. Findings – The identified main tasks in the different categories are related to managing resources, controlling performance deviations, and encouraging improvement and development of performance. Based on the results, the key improvement areas of output measurement in professional public organizations are: improving the objectivity and fairness of budgeting and work allocation practices; improving the versatility and informativeness of output measures to highlight motivational and learning purposes; improving professional self-management in setting output targets and producing outputs; and improving organizational learning from output measurement. Practical implications – The paper presents empirically founded practical examples of challenges and improvement opportunities related to the tasks of output measurement in professional public organizations. Originality/value – This paper fulfils an identified need to study how general performance management motives are realized as concrete tasks of output measurement in justice organizations.


Introduction
Performance management (PM) systems are increasingly implemented in different sectors of professional public sector organizations (e.g. Barbato and Turri, 2017; de Bruijn, 2011; Goh et al., 2015; Vogel, 2013). Despite the widespread and worldwide use of the systems, several barriers, challenges and problems still hinder the effective utilization of performance measurement systems in these organizations (Barbato and Turri, 2017; Jääskeläinen and Roitto, 2014; Goh et al., 2015). The most studied distinct characteristics and challenges are connected to output versus outcome measurement, resource allocation and budgeting practices, the number of stakeholders involved, and the unintended behavior and effects caused by the implemented PM system (e.g. Barbato and Turri, 2017; de Bruijn, 2011; Johansson, 2015; Radnor, 2008; Rajala et al., 2018; Goh et al., 2015; Van Thiel and Leeuw, 2002).
A distinct research area is the relationship between output and outcome measurement in professional public sector organizations (Rajala et al., 2018). Output measurement should not be forgotten in the improvement of performance measurement systems. As outcomes can be seen to cover the long-term effects of performance management (Førsund, 2012; Mandl et al., 2008; Rajala et al., 2018), it is important to make sure that output measures and measuring practices are appropriate and comprehensively incorporate the important components of the organization's performance. This requires that output and outcome measurement are understood not as opposites, but as mutually complementary and important parts of the overall PM system (Linna et al., 2010). In professional service organizations, the outputs are also usually intangible and more difficult to quantify than in a manufacturing environment. This means that the different challenges related to performance measurement can only be solved after the outputs and their effects on outcomes have been properly defined (Jääskeläinen and Lönnqvist, 2011). Hence, it is important that the improvement needs and opportunities of output measurement are studied and analyzed further in different organizational contexts in order to improve the transformation of outputs into the desired outcomes and avoid possible negative effects on operations.
The various dimensions of quality are usually well established in the processes, management practices and professional ethics of the organizations (Pekkanen, 2011), whereas efficiency issues are considered irrelevant and contradictory to the overall purpose of the organization (de Bruijn, 2011; Johansson, 2015). However, public organizations are dealing with problems connected to efficiency and productivity (de Bruijn, 2011; Dobija et al., 2019; Elg et al., 2013), and public managers are facing increasing pressure to hold down public expenditure while improving service quality (Björk et al., 2014). The increasing pressures to improve efficiency and accountability also highlight the importance of, and the need for, analyzing and improving the output measurement systems used in the organizations (Jääskeläinen and Lönnqvist, 2011; Mandl et al., 2008). There is a clear need to study the possibilities of designing suitable and more comprehensive output indicators and measurement systems without compromising quality, equality and satisfaction of citizens or other professional work standards (see, e.g. Jääskeläinen and Lönnqvist, 2011). This calls for empirical analysis of the motives, tasks, and main improvement needs of the implemented output measures.
Implementing performance management systems often causes tension between managers and professionals, leading autonomous professionals to reject or abuse the proposed indicators (de Bruijn, 2011). Professionals opposing or gaming the indicators is usually a consequence of choosing indicators based on practicality, which oversimplifies the complex work of professionals (de Bruijn, 2011; Johansson, 2015). It has been argued that PM in professional public sector organizations is viewed too narrowly, focusing on control and accountability and not enough on the motivation, improvement, and learning tasks of measurement systems (Radnor, 2008). Measurement information has several important aspects for different stakeholders (e.g. society, management, professionals), and these aspects should be included in the systems. By studying and analyzing output measurement from the perspective of different motives and purposes, it would be possible to better incorporate incentives and feedback for improvement, celebrate and promote learning from the measurement information, facilitate self-management of professionals, and reduce the perverse effects and negative attitudes related to PM systems (Behn, 2003; Johansson, 2015; Speklé and Verbeeten, 2014). Research on measurement motives has produced lists and characteristics of the different purposes of PM systems (e.g. Behn, 2003; Speklé and Verbeeten, 2014). However, there is a lack of empirical studies on how these different motives manifest in concrete output measures implemented in professional public organizations and how the output measures in use support the achievement of the different motives from the perspective of different stakeholders.
The aim of this study is to empirically explore and analyze the main tasks of output measurement and the inherent challenges related to the tasks in a very traditional and autonomous professional public work setting: the judicial system. In justice organizations, the challenges surrounding performance measurement are highlighted because the emphasis on efficiency and accountability is considered a threat to the autonomy and impartiality of the judges (De Santis and Emery, 2017; Lienhard et al., 2012). The outcomes of courts are especially difficult to define precisely (Contini et al., 2014; Vecchi, 2018), making the comprehensiveness and impressiveness of the output measures especially important.
The research questions of the study are:
(1) How are the general performance management motives realized as concrete tasks of output measurement in justice organizations?
(2) What are the main challenges in carrying out these different tasks of output measurement?
The analysis is based on a categorization of general performance measurement motives (control-motivate-learn) and main stakeholder levels (society-organization-professionals). The analysis is exploratory and based on a secondary source of evidence: materials and reports produced in two performance improvement projects in European judicial systems. The analyzed material is based on interviews and expert workshops conducted in 14 European countries and their judicial organizations. Section 2 presents a literature review concerning the distinct characteristics and motives of performance management in professional public sector organizations. Section 3 covers the methodology of the study. Section 4 presents the main findings of the study. Finally, Section 5 presents a discussion of the results and concluding remarks.

Literature review
The public sector in many countries has undergone a range of New Public Management (NPM) reforms over the last decades to become more "business-like" with a greater emphasis on results and accountability (see, e.g. Charbonneau et al., 2015; McGeough, 2015; Pollitt and Bouckaert, 2000). Hood (1991) and Van Dooren et al. (2015) conclude in their studies that performance measurement and control, especially output measurement, are central doctrinal components in NPM. Many professional public organizations have introduced performance measurement systems in order to meet the increasing requirements concerning efficiency, transparency and performance accountability (e.g. Barbato and Turri, 2017; de Bruijn, 2002; Dobija et al., 2019; Elg et al., 2013; Linna et al., 2010; Radnor and McGuire, 2004; Tabi, 2013; Vogel, 2013). However, it is argued that the literature on performance measurement systems in professional public sector organizations needs more detailed empirical analysis (see, e.g. Jääskeläinen and Lönnqvist, 2011).
The basic reason for using performance measurement is to improve goal achievement by shaping behavior within the organization and to act as the basis for internal and external accountability (Johansson, 2015). However, performance measures should also be incentives to promote improvement, learning and motivation in the organization (de Bruijn, 2002; Radnor, 2008). In addition, it should be noted that public sector performance is multidimensional, including explicit considerations of quality and quantity of outputs, service outcomes, and citizen and user satisfaction (Charbonneau et al., 2015).

Performance measurement in professional public organizations
The most studied distinctive characteristics of performance measurement in professional public organizations are connected to (1) the output vs. outcome measures and measurement, (2) the issues surrounding resource allocation and budgeting practices, (3) the challenges connected to the number of stakeholders involved in measurement, and (4) the unintended effects of the implemented systems (e.g. de Bruijn, 2011; Johansson, 2015; Radnor, 2008; Rajala et al., 2018; Goh et al., 2015; Van Thiel and Leeuw, 2002).

Outputs describe what the organization does in using resources to produce direct outputs, and outcomes describe what the direct and indirect effects of the outputs are (e.g. Charbonneau et al., 2015; Rajala et al., 2018). A typical feature in the relationship between outputs and outcomes is that the transformation from outputs to outcomes includes various external factors. Therefore, the different transmission effects are hard to distinguish and isolate, and the process is not fully controllable by the organization. It is also challenging to disentangle the effects of different outputs on the outcome. There are usually delays between the implementation of output measures and their impact on the outcome (Førsund, 2012; Mandl et al., 2008).
It is still extremely complicated to establish an outcome-oriented measurement system for public sector needs, purposes, and practices (Rajala et al., 2018). Some outcomes cannot be measured directly, and some cannot be measured at all. Outcomes usually reflect values like quality and satisfaction, which are hard to define and measure (Rajala et al., 2018). As more information and more time are used to collect outcome data, it may be even harder to point out the factors causing the particular outcome (Lowe, 2013). However, performance management is unlikely to be effective if it does not include both outputs and outcomes as part of an integrated performance information and evaluation framework (McPhee, 2005). As outcomes cover the long-term effects of performance management (Førsund, 2012; Mandl et al., 2008; Rajala et al., 2018), output measures and measuring practices need to appropriately and comprehensively incorporate the important components of the organization's performance. This requires that output and outcome measurement are understood not as opposites, but as mutually complementary and important parts of the overall performance management system (Linna et al., 2010). Improving output measurement should not be forgotten in the improvement efforts of professional public organizations.
In the public sector, performance measures are usually closely tied to budget and resource allocation decisions. This gives the design of the systems a political dimension as well, with a large influence from society driven by the need for equality (de Bruijn, 2002; Linna et al., 2010; Goh et al., 2015). The use of measures in resource allocation and strategic planning has been widely studied (e.g. Jääskeläinen and Roitto, 2014). The most common approach to allocating resources in the public sector is performance-based funding (PBF). Various alternatives of this system have been developed in different countries (see, e.g. Hur, 2018). PBF uses specific formulas to tie funding to organizations' performance, which is, in turn, based on different indicators and targets (Francesconi and Guarini, 2018). In the Western world, a so-called model of management by objectives and results (MBOR) is commonly utilized (see, e.g. Kristiansen, 2017). In MBOR, agencies and individual organizations are given autonomy and flexibility in the use of resources and in choosing means and measures, but they must accept performance contracts, targets, reporting and assessment systems established by the ministries or other governmental institutes (Kristiansen, 2017).
de Bruijn (2002) identifies several common challenge areas in the financial incentive and pay-for-performance aspects of resource allocation and budgeting practices. The practices can even lead to the punishment of good performance. The challenges are connected to shared resource pools, lack of performance transparency and lack of possibilities to reward good performance. When a fixed budget is divided among several organizations and all organizations perform better, the result is a financial sanction in the form of a lower "price-per-product". A transparent and well-performing organization may also be in a vulnerable situation, where investment in increasing efficiency may lower the budget for the next year (whereas an organization not increasing efficiency and offering transparency may be rewarded with equal targets and resources). Rewarding well-performing organizations with resources is not straightforward. Usually, additional resources need to be used to help non-performing organizations in order to guarantee equal service to citizens, sometimes at the expense of the well-performing organizations.
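The fixed-budget effect described above can be illustrated with a small arithmetic sketch. The budget figure, the court names and the output counts are hypothetical and chosen only for illustration, and the sketch assumes that funding is split in proportion to each organization's share of total output, which is a simplification rather than any particular country's funding formula.

```python
def price_per_product(budget: float, outputs: dict) -> float:
    """Implied funding per unit of output when a fixed budget covers all outputs."""
    total = sum(outputs.values())
    if total <= 0:
        raise ValueError("total output must be positive")
    return budget / total


def allocate(budget: float, outputs: dict) -> dict:
    """Split a fixed budget in proportion to each organization's output."""
    unit_price = price_per_product(budget, outputs)
    return {name: unit_price * n for name, n in outputs.items()}


BUDGET = 1_000_000.0  # hypothetical fixed budget shared by all organizations

year_1 = {"Court A": 500, "Court B": 500}
year_2 = {"Court A": 550, "Court B": 550}  # both courts raise output by 10 %

price_1 = price_per_product(BUDGET, year_1)
price_2 = price_per_product(BUDGET, year_2)
alloc_1 = allocate(BUDGET, year_1)
alloc_2 = allocate(BUDGET, year_2)

# Every court produced more, yet no court receives more money:
# the only thing that changed is a lower price per product.
print(f"price per product: {price_1:.2f} -> {price_2:.2f}")
print(f"Court A funding:   {alloc_1['Court A']:.2f} -> {alloc_2['Court A']:.2f}")
```

In this sketch both courts increase their output while their funding stays the same, so the only visible consequence of the improvement is a lower price per product.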

IJPPM
In public organizations, there exist multiple stakeholders with conflicting needs concerning performance. This may be realized as a large number of measures and an unfocused purpose of measurement (Boland and Fowler, 2000; Jääskeläinen and Roitto, 2014; Radnor, 2008; Rantanen et al., 2007). Hence, conceptualizations of the performance of an organization may vary depending on the perspective from which it is viewed. Some perspectives are contradictory, creating an especially challenging task for managers as they try to achieve improved results (Charbonneau et al., 2015). Important stakeholder groups include society (citizens and the formal state), the management of the organizations and the individual professionals. The role of society is to ensure that individual organizations produce the outputs and outcomes needed. As society has a central role, Radnor and McGuire (2004) argue in their study that the role of managers is often more about being administrators than managers, particularly in relation to performance measurement. Managers need to balance sometimes conflicting goals: they are required to follow national and local policies while being attentive to the needs of staff and customers during day-to-day service production (Björk et al., 2014).
In professional organizations, performance measurement practices can lead to different unwanted effects and behavior. These perverse effects are widely referred to in the literature (e.g. de Bruijn, 2002; de Bruijn, 2011; Garlatti et al., 2018; Kerpershoek et al., 2016; Radnor, 2008). Typical effects include, for example, different types of strategic behavior (e.g. optimizing the output, concentrating on easier outputs, focusing on performance at the unit level rather than the organizational level, focusing on short-term targets, misreporting or distorting data, and deliberate under-achieving). Van Thiel and Leeuw (2002) divide the perverse effects into unintended and deliberate. Unintended perverse effects can be caused by insufficient knowledge about performance measurement, and deliberate perverse effects happen through conscious decisions. The characteristics of professional public organizations foster the emergence of perverse effects (Van Thiel and Leeuw, 2002), and this should be considered in the design of the system. It has been noted that commitment towards measures is more widely established when management and other personnel feel they can affect the measurement results (Jääskeläinen and Roitto, 2014).

Motives of performance measurement in professional public organizations
According to various studies, performance measures should be, among other things, diverse and complementary, objective, informative, causally related, supportive of decision-making and an incentive for improvement (e.g. Ittner and Larcker, 2002; Lipe and Salterio, 2000; Malina and Selto, 2004). In general, performance measures should enable transparency, learning, appraising, sanctioning, and comparing (de Bruijn, 2002). Performance measurement systems are also meant to create incentives to align individual goals with the objectives of the organization, provide feedback information concerning the progress towards these objectives, and form the basis for internal and external accountability (Speklé and Verbeeten, 2014).
Public performance management literature also recognizes various uses for performance information and discusses the factors behind the use of performance information (e.g. Behn, 2003; Laihonen and Mäntylä, 2017). Measures act as evidence of effectiveness in improving public accountability and policy decision-making, as well as enable goal setting, resource allocation and budget formulation (Behn, 2003).
Several different categorizations of motives and uses of performance measures can be identified in the literature. For example, Speklé and Verbeeten (2014) introduce three different classifications of performance measurement use: operational use, incentive use and exploratory use. Vecchi (2018) divides the usage into four categories: instrumental use, process use, conceptual/enlightenment use, and symbolic/legitimating use. Behn (2003) defines and explains eight purposes:

(1) Evaluate - How well is an agency performing?
(2) Control - Ensuring that subordinates are doing the right things.
(3) Budget - On what should the agency spend the public's money?
(4) Motivate - How to motivate line staff, middle managers, collaborators, stakeholders, and citizens to do things to improve performance?
(5) Promote - How to convince stakeholders that the agency is doing a good job?
(6) Celebrate - What accomplishments are worthy of celebrating success?
(7) Learn - What is working, what is not?
(8) Improve - What exactly should be done differently to improve performance?
It can be concluded that, in addition to decision support and control, performance information should also have other purposes (Laihonen and Mäntylä, 2017; Pollitt, 2006). It has been argued that, in general, controlling plays too large a role at the expense of the other measurement purposes (Radnor, 2008). Producing different types of positive effects (for example, innovations and organizational learning) would require the development of structures and procedures that enable these effects (e.g. Kalgin et al., 2018; Laihonen and Mäntylä, 2017).

Methods and data
This study is conducted as an empirical content analysis, utilizing data collected from continental European justice organizations. The study is based on two separate research and improvement projects carried out in co-operation between several European research and governmental institutes and co-funded by the European Commission. The empirical content analysis is based on a secondary source of evidence, utilizing the written materials and reports produced during the research and improvement projects. Justice systems can be categorized under broader labels, or so-called major legal systems: the common law system and the civil law system (sometimes referred to as the Romano-Germanic legal system). Continental Europe belongs to the civil law group. Central to the civil law system is that laws are based on codification and the judge's role is to apply the law instead of creating it. The civil law tradition also emphasizes the judicial independence of a judge, which is needed to secure the fairness of judgments (see, e.g. Merryman and Pérez-Perdomo, 2007).

Data gathering
The analysis is based on a set of written materials and reports produced in two performance improvement projects in European judicial systems. The authors have been involved in the data gathering and reporting of both projects. The projects investigated issues related to the improvement needs and opportunities of performance and quality assurance practices in different European countries.
The first project, "CFM-net - Towards European Caseflow Management development network - Identifying, developing and sharing best practices", was carried out in 2014-2016. The overall objective of the project was to start creating procedures for European co-operation in developing and sharing good practices for process and operations management, namely the "flow" of judicial cases and the performance of the organizations. The research team included members from 5 countries: Estonia, Finland, Italy, the Netherlands, and Switzerland. Two reports from the project are included in the data analysis: "Caseflow Management Handbook - Guide for Enhanced Court Administration in Civil Proceedings" and "Inventory of caseflow management practices in European civil proceedings". The first report provides a facilitation guide and general analysis, guidelines, and advice for carrying out improvement work in courts. The second report is a collection of tools, practices and solutions applied in different European countries related to process performance improvement. The reports are based on an extensive set of data gathered during the project. The data gathering in the project included interviews and on-site visits in 12 different European countries, four expert workshops, as well as literature reviews and materials related to other court improvement projects. Information about the project and the project reports and materials can be found at: www.lut.fi/web/en/european-caseflow-management-development-network.
The second project, "Handle with Care: assessing and designing methods for evaluation and development of the quality of justice", was carried out in 2017. The overall objective of the project was to study, analyze and improve the quality of justice in European countries. The project looked at the justice system as a whole, and at the procedures aimed at improving the ways in which the system performs, operates, and generates public value. The quality concept in the project was broadly defined as covering practices and procedures of performance management and evaluation. The reports cover "classical" performance management practices, as well as more innovative practices emerging in the countries. The research team included members from 5 countries: Finland, France, Hungary, Italy, and the Netherlands. Four reports from the project are included in the data analysis: "Handle with Care: Assessing and designing methods for evaluation and development of the quality of justice" (summative report of project results), "Comparing the evaluation and development of the quality of Justice in Finland, France, Hungary, Italy and the Netherlands" (compares data collected at national level to identify common trends and diverging paths), "Something good? In search of new practices to improve the quality of justice in EU" (identifies innovative practices and analyzes conditions for a successful implementation at EU level), and "Performance management of courts and judges - organizational and professional learning vs political accountability" (develops a methodological framework for performance and quality evaluation and improvement in courts). The reports are based on an extensive set of data collected during the project. The data gathering in the project included interviews in five countries, two expert workshops, as well as literature reviews and documentary analysis.
Information about the project and the project reports and materials can be found at: https://www.lut.fi/web/en/school-ofengineering-science/research/projects/handle-with-care. Table 1 summarizes the materials and reports produced in the projects and their data gathering methods.

Data analysis
First, an analysis framework was formed based on literature. The analysis framework enabled the categorization of the tasks according to two dimensions: the main motive of the measurement and the main stakeholder level. The aim of this categorization was to create an overall picture of the output measurement and enable detailed analysis on how the general performance measurement motives are realized as concrete tasks from the perspective of different stakeholders.
Three stakeholder groups were included in the data analysis framework: individual (judge), organizational (court) and societal (formal national and governmental justice institutes). The measurement practices were further categorized according to these groups into two measurement levels: measurement tasks at the society-organization level and at the organization-individual level. Measurement tasks set at the society-individual level (e.g. measures related to occupational development, position appointment and promotion) were also identified. These were excluded from the categorization, as the measures were related not to output but to quality-related issues in the professional work of judges.
Three basic measuring motive categories were formed based on the literature review and included in the data analysis framework: control, motivate and learn. The aim of this concise categorization was to form the focal and most central motive categories for the purposes of coherent data analysis. The control category was defined to also include the different aspects of financial performance (e.g. budgeting) and the evaluation of performance. The learn category also included the promote, celebrate and improve motives of measurement. Other categorizations have been utilized in different studies in the field (see, e.g. Behn, 2003; Speklé and Verbeeten, 2014; Vecchi, 2018). These categorizations have been used as a basis in creating adequately simple and distinctive categories for analysis purposes.
The basic measuring roles in the data analysis are defined as the "Principal" (responsible for diagnostic and measurement decisions) and the "Actors" (responsible for the actual operations and performance).
The categories included in the analysis were defined as:

(1) Control. Purpose: to plan operations and to monitor process and performance. The principal identifies deviations in performance levels and executes corrective actions (single-loop learning).
(2) Motivate. Purpose: to encourage and influence actors in improving performance. The actor decides and implements improvement initiatives and the principal rewards success.
(3) Learn. Purpose: to understand performance improvement opportunities and the factors affecting performance. All those concerned analyze actual performance and process-related factors, aiming to find a "common path" for success and to promote a wider application of that path (double-loop learning).
Based on the categorizations along these dimensions, the framework for data analysis included six output measurement categories: control measures for society-organization level and for organization-individual level, motivate measures for society-organization level and for organization-individual level, as well as learn measures for society-organization level and for organization-individual level (see Figure 1).
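The structure of the framework can be sketched as a small data structure: three motives crossed with two measurement levels give the six categories, and Behn's (2003) eight purposes are folded into the three motives as defined above. This is only an illustrative sketch; the identifiers (`MOTIVES`, `LEVELS`, `PURPOSE_TO_MOTIVE`, `categorize`) and the example task are hypothetical and do not come from the analyzed reports.

```python
from itertools import product

# The three motive categories and the two measurement levels of the framework.
MOTIVES = ("control", "motivate", "learn")
LEVELS = ("society-organization", "organization-individual")

# Behn's (2003) eight purposes folded into the three motives, following the
# definitions in the text: control also covers budgeting and evaluation,
# and learn also covers promote, celebrate and improve.
PURPOSE_TO_MOTIVE = {
    "evaluate": "control",
    "control": "control",
    "budget": "control",
    "motivate": "motivate",
    "promote": "learn",
    "celebrate": "learn",
    "learn": "learn",
    "improve": "learn",
}

# The six output measurement categories: every motive at every level.
CATEGORIES = list(product(MOTIVES, LEVELS))


def categorize(purpose: str, level: str) -> tuple:
    """Return the (motive, level) category for a task's main purpose and level."""
    if level not in LEVELS:
        raise ValueError(f"unknown measurement level: {level!r}")
    try:
        motive = PURPOSE_TO_MOTIVE[purpose.lower()]
    except KeyError:
        raise ValueError(f"unknown purpose: {purpose!r}") from None
    return (motive, level)


# Hypothetical example: a budgeting task negotiated between ministry and court.
example = categorize("budget", "society-organization")
print(example)
```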
In the data analysis process, the output measurement tasks and the inherent challenges described in the reports were identified, coded, and categorized according to the data analysis framework and analyzed further. Firstly, all tasks related to output measurement were identified, marked in the material, listed and given an initial name. After the tasks were listed and named, their underlying motives and use were analyzed for categorization purposes. The materials provided detailed descriptions of the underlying aims and the use of the output measures in the studied countries and organizations. These descriptions were used as a basis for assigning each identified task to a category based on its main motive and stakeholder level. Two researchers first conducted this analysis of the reports and materials separately, after which the resulting analyses and categorizations were verified, discussed, and combined. At this point, the reports were revisited multiple times in order to make sure that all relevant information related to individual tasks was included in the categorization.
In the second phase, the challenges and features of each identified and categorized task were analyzed further. The analyzed material included a lot of information from different countries about the features and challenges connected to the output measurement tasks. This information was used as a source for composing, for each task, a summative description of the main challenges connected to it. The analyzed tasks and the descriptions were broadened with documented and described examples and experiences. The data analysis was conducted at the European level, without separating measurement practices by country.
A summary of the main phases and results of the data analysis process is described in Figure 1.

Findings
The main findings related to the tasks in the motive categories and at the two stakeholder levels are summarized in Table 2. The tasks, and the challenges related to them, are described in more detail in the following sub-sections.

Tasks of output measurement at the society-organization level
The control category at the society-organization measurement level was found to include tasks related to resource management and performance deviation management. Firstly, the control motive is connected to the task of overseeing and balancing workload and resources between individual court organizations. The basic principle connected to the task is the inherent need for objectivity in setting output goals and allocating resources between the organizations. The number of produced outputs is seen as a straightforward and easy measure to base resource allocation decisions on. The goal for the number of outputs and the resources needed are decided in the annual output-based budget and funding negotiations. The challenge inherent to the task is that the court organizations have large variations in their circumstances, operating environments, case structures and thus in their workloads and resource needs. The differences in the case structure mean that the judicial cases vary in complexity and thus in the amount of time and resources needed for them. The case structure also varies from one year to another and cannot be completely predicted and planned. Therefore, mechanisms are required to proactively detect changes in the workload leading to over- or under-resourcing and the need for re-allocation of resources.
The challenges related to the commensuration of output goals and resources were found to be common in all studied countries. To tackle the challenges related to variations in case-structures and resource needs, different types of weighted caseload systems have been introduced. The systems aim to provide more accurate data for goal setting and resource allocation and to provide opportunities to compare the resource utilization of different courts more reliably and in more detail. Even though the need for these types of resource allocation tools is common to all the studied countries, the readiness and level of detail used differ among the countries. In Italy, the need for workload assessment has been recognized as one of the greatest improvement needs of the judicial system, but there is a lack of data needed to estimate the workload and correlate it properly with the resources. Hungary has implemented a system which aims to detect large workload differences, after finding that some courts had to deal with more than twice as many incoming cases as other courts. In 2012, Hungary announced a plan for developing a scheme for classifying incoming cases based on their type and difficulty level. The scale of the scores ranges from 10 to 50. Also, for example, Finland, Sweden, and Estonia have established detailed weighted caseload systems for resource allocation purposes where all cases have weighted scores (e.g. Finland scores cases from 0.1 to 5.4 points). These types of systems have been well accepted in courts, even though it is acknowledged that the scores need to be constantly reviewed and updated based on practical experiences. It can be said that, due to the large variations in the time consumption of different cases, even a rough and approximate weighted caseload system is better than not weighting the cases at all.
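The logic of a weighted caseload system can be illustrated with a minimal sketch. The case types, their weights, and the two courts below are hypothetical and are not taken from any of the studied systems; the weights were only chosen to fall inside the 0.1 to 5.4 point range reported for Finland.

```python
from collections import Counter

# Hypothetical weights (points per case type). Illustrative only: the values
# merely fall inside the 0.1-5.4 range reported for Finland.
CASE_WEIGHTS = {"summary": 0.1, "civil": 1.5, "criminal": 2.0, "complex": 5.4}


def weighted_caseload(cases):
    """Sum the weights of a list of case-type labels."""
    counts = Counter(cases)
    return sum(CASE_WEIGHTS[case_type] * n for case_type, n in counts.items())


# Two hypothetical courts: A handles many simple cases, B fewer but heavier ones.
court_a = ["summary"] * 100 + ["civil"] * 10
court_b = ["criminal"] * 20 + ["complex"] * 10

raw_counts = {"A": len(court_a), "B": len(court_b)}
weighted = {"A": weighted_caseload(court_a), "B": weighted_caseload(court_b)}
```

In this made-up example court A resolves more cases in raw numbers, yet court B carries the heavier weighted workload, which is the kind of difference an unweighted output count would hide.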
Secondly, the control motive is related to the task of controlling the productivity and performance deviations of individual courts. The overall aim of the task is to keep the production processes of courts in "a normal stage" with a balanced pending case inventory and sufficient output levels. Due to the independent nature of the court organizations, the control efforts and any interventions need to be carried out while respecting the autonomous nature of the court. Therefore, the control efforts aim to set general control limits with built-in early-warning mechanisms, instead of targeting the control efforts at the handling of individual cases or at the performance problems of individual professionals.

Table 2. Summary of the main tasks related to output measurement systems in justice organizations

| General motives | Society-Organization | Organization-Individual |
| Control | Managing resources and controlling performance deviations. How to include different organizational circumstances and workload situations in balanced commensuration of goals and allocation of resources? How to respect organizational independence by sensitively intervening in productivity and performance deviations? | Detecting and controlling performance deviations. How to balance between output, timeliness, and quality ("the trilemma") in setting individual goals and allocating work? How to utilize performance measurement data in detecting and sensitively intervening in productivity and performance deviations? |
| Motivate | Encouraging court management to improve performance. How to balance between output, timeliness, and quality in goal setting: the "trilemma" of setting performance targets? How to balance between "punishment of performance" and "circle of impoverishment" in output-based funding practices? | Encouraging judges to improve their performance. How to create measurement practices which support independence, self-management, peer-control, and professional pride and decrease the need for controlling and intervening? How to create opportunities for judges to participate in planning performance measurement processes and procedures? |
| Learn | Improving and developing the entire judiciary in society. The need for measures has been identified although existing practices are rare. How to overcome inherent organizational circumstances and strong organizational identities hampering benchmarking and distribution of good practices? | Developing the individual courts and the judges. The need for measures has been identified although existing practices are rare. How to include measures which support the exchange of knowledge and experiences effectively (e.g. different types of mentoring practices) and respect self-management and independence? |

It became evident in the data analysis that issues related to throughput time and delayed cases are the easiest performance deviation area to control at society level. In time related issues, societies usually set limits and allow variations within the limits. For example, in France the goal is that 2/3 of the cases need to be handled in the set mandatory time limit. Sweden has implemented a sophisticated and detailed system for controlling performance deviations. In the Swedish system, a balanced pending case inventory refers to the number of pending cases which is "normal" and in balance with the resources and still allows a court to meet the set timeframes. In Sweden, the government has mandated that 75% of cases should be solved within the given time limits. The system uses the term "balanced inventory ratio" which refers to the ratio of a balanced pending case inventory and the number of incoming cases per year. The Swedish method is based on the empirical evidence that there is a clear linear relation between the throughput time and the inventory ratio.
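The inventory ratio described for the Swedish system can be expressed as a simple calculation. The sketch below is only an illustration of the definition given above; the function names, the court figures, and the use of the balanced ratio as an early-warning threshold are assumptions and do not reproduce the actual Swedish implementation.

```python
def inventory_ratio(pending_cases, incoming_per_year):
    """Ratio of a pending case inventory to the yearly number of incoming cases."""
    if incoming_per_year <= 0:
        raise ValueError("incoming_per_year must be positive")
    return pending_cases / incoming_per_year


def exceeds_balanced_inventory(pending_cases, incoming_per_year, balanced_ratio):
    """Hypothetical early-warning rule: the current ratio is above the balanced one."""
    return inventory_ratio(pending_cases, incoming_per_year) > balanced_ratio


# Hypothetical court: 1,200 incoming cases per year and a balanced
# pending inventory of 300 cases (both numbers are made up).
balanced_ratio = inventory_ratio(300, 1200)

# With 450 cases currently pending, the court is above its balanced inventory.
warning = exceeds_balanced_inventory(450, 1200, balanced_ratio)
```

Because the text reports a linear relation between throughput time and the inventory ratio, a ratio drifting above the balanced level would be expected to signal lengthening throughput times before the time limits are actually missed.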
Even though setting general time limits enables a sensitive approach to controlling performance deviations at court level, it poses challenges and has led to gaming behavior and perverse effects. Most of all, it has led to excessive solving of simpler cases to meet the limits and thus to long delays for the more complex cases. Within the control limits, it is possible that some cases get extremely delayed with no control mechanisms to detect them.
The motivate category at the society-organization measurement level includes tasks related to goal setting and output-based funding practices. Practices related to these areas have an impact on court management's motivation to improve all aspects of an organization's performance. The identified tasks in the motivate category are linked to the control tasks, as the sense of fairness and objectivity are the premises for the motivation to improve performance. The central motivation tasks in the data analysis were related to the so called "trilemma" of setting performance targets for courts. Courts have three important performance target areas: number of outputs, quality, and timeliness. In all analyzed countries, society level motivation efforts concentrate heavily on the improvement of the output number, creating a balancing problem between the accomplishment of all target areas. In many countries, for example in Hungary, recent reforms have put an even greater emphasis on the most visible numerical indicators which can only include certain aspects of the overall performance. This has led to a situation where society strongly motivates the accomplishment of output targets and the inherent values and ethics of professionals motivate the accomplishment of quality. Timeliness targets do not have equally clear motivational mechanisms, leading to delays and problems in the timely handling of all cases.
Excessive highlighting of the number of outputs also affects the motivational and rewarding aspects related to output-based funding. The inherent dilemma of providing financial incentives without punishing performance or impoverishing the poorly performing organizations was relevant in all analyzed countries. If good output performance is rewarded by extra funding, it indirectly "punishes" the weaker organizations. In a market economy this would not be a problem, but in the case of public organizations providing public services, it creates problems for the equality of services to citizens. In the name of equality, even the poorly performing organizations need to be ensured adequate funding to deal with the workload. This sometimes leads to situations where extra funding is given to the underperformers. If good output performance is not rewarded, there are no incentives to raise performance above average.
In the Netherlands, the budget of a court fluctuates depending on the amount of output (95% of the court budget is based on the number of outputs). The more judicial cases are solved, the larger the budget assigned to the court. If a court produces more than expected, it receives 70% of the agreed price per product from the equalization account. When producing less than agreed, it must deposit 70% of the agreed prices of the cases not finalized into the equalization account. This is a strong stimulus for the courts to enhance production, but the system has been criticized. The system can also easily lead to gaming and other unwanted behavior. Many Dutch judges have signed a manifesto to protest the court financing system. The main critique is connected to the excess focus on output targets. The law provides that considerations of quality should play a role in determining the price of the case categories. It is argued, however, that the considerations of quality do not actually influence the financing of courts. The courts have responded to the problem by drafting professional standards and a system for the better integration of different targets into judicial budgeting.
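The arithmetic of the equalization account can be sketched as follows. This is a simplified illustration under stated assumptions: a single case category with one agreed price, made-up output and price figures, and a hypothetical function name. Only the 70% rate comes from the description above.

```python
def equalization_settlement(agreed_output, actual_output, price_per_case, rate=0.70):
    """Settlement with the equalization account for one case category.

    A positive result is paid to the court (over-production); a negative
    result is the amount the court must deposit (under-production).
    """
    return rate * price_per_case * (actual_output - agreed_output)


# Hypothetical figures: 1,000 agreed cases at an agreed price of 500 per case.
over_production = equalization_settlement(1000, 1100, 500.0)   # 100 extra cases
under_production = equalization_settlement(1000, 900, 500.0)   # 100 cases short
```

Under these assumptions, 100 cases above the agreed level yield a payment of 35,000 to the court, and 100 cases below it require a deposit of the same size, which shows why the rule acts as a strong production incentive.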
The learn category measures related to the improvement and development of the entire judiciary are rare in the studied countries, but the need for them is clearly identified. The potential for using qualitative evaluation tools and performance data and measures to learn is recognized across judiciaries. One example is Italy, where it is recognized that the considerable amount of data created and managed by the courts has a social and economic impact (given that most disputes between people, companies, and public institutions find a resolution within the court). Making such data freely available to the public, organizations and institutes would improve knowledge of the many events affected or processed by the courts. There is an ongoing data-driven project aiming to make available the data collected by the justice system at national level. The action is a learning initiative based on data collection, analysis, and dissemination of the data on the Italian judicial system. The main objectives of the project are to support the measurement and benchmarking of the activities carried out by the judicial system and to enhance transparency and public accountability.
It also became evident that overemphasizing different organizational circumstances and strong organizational identities hamper benchmarking and distribution of good practices in the judicial system. Measurement does produce comparative data and would allow comparative learning and the detection of differences. In addition, a lot of feedback and assessment data is already collected. However, this measuring data is not currently utilized effectively for learning tasks and purposes. For example, in the improvement projects carried out in the Finnish judiciary, several means to improve the dissemination of the project results across court organizations were designed. The challenges in these systematic benchmarking efforts were connected to the emphasis on the differences in operating environments, circumstances, and case-structures between courts (so called "not-invented-here attitude"). Improvement in learning from other courts would require changes in the mindset related to organizational independence and autonomy connected to operational activities and improvement.

Tasks of output measurement at the organization-individual level
The control category at the organization-individual measurement level was found to include two tasks, both aiming to detect and control the performance deviations of individual professionals. The challenge of balancing goal setting (the "trilemma" between output, timeliness, and quality goals) was also clearly evident at the organization-individual level. At this level, the tasks and effects of the trilemma relate more to control and work allocation practices than to motivation (as at the society-organization level). The basic premise is that individual professionals are expected to produce the same number of outputs. Average throughput-time is also emphasized in controlling the performance of individual judges. These goals combined can cause perverse effects by encouraging excessive solving of simpler and newer cases to reach the output and throughput time goals. For example, in France the target that two-thirds of the cases be handled within a mandatory time limit has produced different types of unwanted behavior (e.g. a crowding-out effect on other cases, for which the non-mandatory processing period is then extended).
The weighted caseload systems used in different countries enable the comparison of output levels in more detail and reduce the gaming of the numbers. The weighted caseload systems are also used to allocate cases between judges as evenly as possible. For example, in Hungary the system of weighting cases is seen to significantly improve the objectivity of goal setting and case allocation between judges. Due to the introduced case weights, the workload of the judges can be compared and balanced much more easily and more reliable data on the performance of individual judges can be obtained. The challenge recognized was the fact that the weight is assigned to the case at the very beginning of the trial (initial weight), and the complexity of the case can change later on during the process. As a response to the challenge, courts have introduced the system of "post-weighing" cases to obtain a more reliable calculation of individual workload and performance.
The other control task relates to situations where some individuals cannot reach their goals. When these performance deviations are detected, an intervention needs to be carried out while respecting the autonomous position of the professionals. Managers need to constantly balance between the need to intervene in performance problems and respecting autonomy. An important task of managers is to make use of the concrete support received from the set targets in detecting and sensitively intervening in productivity and performance deviations.
Different types of systems which proactively detect performance deviations have been implemented in many countries. These systems enable "early-warning" signals and produce transparent and real-time performance data for court management and self-management purposes. For example, in Slovenia there is a tool called "the President's Dashboard" available to all court Presidents. The tool provides real-time information on individual performance. In Finland, a time-frame alarm system has been established to improve personal work planning, to reduce the number of pending cases and to eliminate long delays. The idea of the system is that those cases which are in danger of lagging behind are detected early on, when the set timeframe can still be reached. With the help of the time-frame alarm system, individual workers can monitor and control their own case inventory and schedule the work while taking account of old cases. The listings of pending cases are available and transparent to all in the organization. Also, "late warning" systems facilitate self-management. For example, in Estonia, at the beginning of each year, all judges get a list of "old" cases, and they need to provide explanations on why there is no final judgement yet. In every following quarter, the judges have to describe how the listed cases have proceeded since their previous report. Thanks to the system, the number of old cases has decreased nearly tenfold. Italy and Austria, for example, have implemented similar checklist systems.
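The difference between "early-warning" and "late warning" listings can be illustrated with a minimal sketch. The function name, the 180-day timeframe, the 80% warning threshold, and the case data are all hypothetical; the source does not describe how the Finnish or Estonian systems are actually implemented.

```python
from datetime import date


def classify_pending_cases(filed_dates, today, timeframe_days, warn_fraction=0.8):
    """Split pending cases into 'at risk' (early warning) and 'overdue' (late warning).

    filed_dates maps a case id to the date the case was filed. The warning
    threshold (warn_fraction of the timeframe) is an assumed parameter.
    """
    at_risk, overdue = [], []
    for case_id, filed in filed_dates.items():
        age_days = (today - filed).days
        if age_days > timeframe_days:
            overdue.append(case_id)          # timeframe already exceeded
        elif age_days >= warn_fraction * timeframe_days:
            at_risk.append(case_id)          # timeframe can still be met
    return sorted(at_risk), sorted(overdue)


# Hypothetical pending cases of one judge, checked against a 180-day timeframe.
pending = {
    "A": date(2024, 1, 10),
    "B": date(2024, 5, 1),
    "C": date(2023, 9, 1),
}
at_risk, overdue = classify_pending_cases(pending, date(2024, 6, 30), 180)
```

In this example case A is flagged while its timeframe can still be met, case C appears on the "old cases" list, and case B needs no attention yet.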
The motivate category at the organization-individual measurement level also includes tasks which aim to support independence, professional pride, and self-management. Another important motivation task is to create opportunities for the professionals to influence the performance measures and participate in the target setting process and procedures. Setting unbiased targets and keeping the performance information completely transparent supports self-management and peer-control. Self-management, work ethics and professional pride are important means especially in ensuring the quality of cases. The identified tasks in the motivation category encourage judges to improve their own performance also in relation to productivity and timeliness, and thus reduce the need for controlling and intervening. It became evident in the analysis that, in recent years, the emphasis and importance of productivity as part of professional work has increased. The transparency of performance data also has a significant impact on the motivation to improve all aspects of performance. An important task in strengthening self-management is to create opportunities for individual professionals to participate in and influence target setting and performance measurement. In the data analysis, several performance improvement projects were identified where the systematic and wide participation of professionals significantly improved the results and success of implementation. For example, in Hungary, a timeliness project (the "Debrecen model") was carried out with a strong emphasis on the bottom-up approach to improvement. The project targeted a comprehensive change in the attitudes of individual professionals with good results. In Finland, quality projects have been carried out in the jurisdiction of the Rovaniemi Court of Appeal, where the most central stakeholders of the jurisdiction have been actively involved.
Based on the projects' results, it was concluded that a participatory, bottom-up approach increased peer-to-peer interaction between judges, improved attitudes towards change, encouraged discussion about work productivity among professionals, and created a culture for improvement.
Learn category measures, as well as goals and measurement practices which directly support learning aspects at the organization-individual level, are practically non-existent. However, the need for them has been widely recognized in the studied countries. In particular, there is a need to increase the understanding of individual professionals related to the overall performance situation of the organization and the individual's own role in creating productivity and well-performing processes (namely understanding how the "court-factory" works and functions). Also, in the learn category, the measuring tasks need to respect self-management and independence. Designing ways to exchange knowledge between peers could be beneficial, for example along the lines of the participatory improvement projects carried out in different countries or by creating different types of mentoring practices. Learning should be based on individual needs. Every judge has regular and mandatory judicial training based on their individually assessed needs. This type of system could be broadened to cover performance management related assessment and training (for example monitoring and work planning skills).

Discussion and conclusion
This study aimed to empirically explore and analyze the concrete tasks and inherent challenges of output measurement in European justice organizations. The overall goal was to include in the analysis the main general motives of performance measurement and the main stakeholders' perspectives. The strength and main source of originality of the study is that it provides a structured overview of output measurement practices and challenges in the professional field of justice. The study also highlights the importance of output measurement as a part of balanced performance measurement systems.
As the study uses secondary sources of evidence, one challenge is the lack of detailed descriptions of output measurement practices. Therefore, the analysis required a relatively simplified analysis framework and, to some extent, subjective interpretation. The project reports deal with the years 2014-2017 and the results do not include any developments undertaken after that. In addition, the analysis does not include all European countries.
The main implications of the study relate to the roles and motives of output measurement in professional organizations in general. They point out that even though output measurement emphasizes the control motive, there is room for development, especially in self-management purposes and in learning from measurement information.
It can be concluded that there exists a clear need to study the possibilities to improve output measurement practices in professional public organizations, even though the research focus has shifted to the improvement of outcome measurement (see, e.g. Barbato and Turri, 2017; Jääskeläinen and Lönnqvist, 2011). Achieving a flawless output measurement system at once is impossible, but a planned, systematic, and conscious improvement process is needed. Thus, the improvement of output measurement needs to be a continuing process, highlighting the importance of organizational and individual participation and their acceptance of the practices.
Previous studies have highlighted budgeting and resource allocation decisions and tasks in relation to output measurement in professional public organizations (e.g. de Bruijn, 2002; Francesco and Guarini, 2018; Hur, 2018; Jääskeläinen and Roitto, 2014; Kristiansen, 2017). This study also confirmed that output measurement tasks, especially at the society-organization level, are largely driven by resource allocation needs. Furthermore, the need for objectivity and a sense of fairness proved to be central elements in designing and improving resource allocation practices and tasks. Objectivity and fairness should be considered fundamental parameters in designing budgeting and MBOR practices at society level, as well as work allocation practices at individual level. The analysis revealed that, in the European judicial sector, several systems and tools have been designed and implemented in order to improve the objectivity and impartiality of output measurement, budgeting, and resource allocation practices. The application of these types of systems should be further improved, even though it involves multidimensional and complex efforts connected to highly intangible outputs produced in a process with several uncontrollable circumstances. The study indicated that the sense of fairness and objectivity concerning the output measures has a large impact on both the approval of control tasks and the success of motivational tasks.
In the judicial sector, the output measures used are clearly designed mainly for control purposes and based on practical reasons that are too straightforward. It has been argued that this is quite common in professional public organizations and can lead to simplifications of reality and unwanted behavior (e.g. de Bruijn, 2011; Johansson, 2015). The main indicators used in the justice system (number of outputs and average throughput time) have several practical advantages. They are reliable and unbiased, easy to quantify and understand, as well as functional in controlling performance deviations. It can also be said that the use of these types of simplifying measures enables society and managers to intervene in performance deviation problems sensitively and objectively. However, the analysis clearly showed that simplifying measures do not have good validity, leading to different types of gaming and unwanted behavior. In the studied countries this has caused resistance towards measurement practices and long delays for some judicial cases. It can also be argued that the measures are not informative enough to be used for motivation and learning tasks and purposes, highlighting the control tasks of output measurement even further.
The study focused attention on the need for designing measurement practices which support the self-management of professionals. Self-management proved to play a central role in the motivational aspects of output measurement in the judicial sector and also decreased the need for controlling. Based on the study, self-management reduces possible tension between managers and professionals and decreases the number of professionals rejecting or abusing the output measures (see, e.g. de Bruijn, 2011). Self-management practices also increased the individuals' understanding of their role and contribution to the overall performance of the organization. Transparency of performance data and peer-control proved to play an important role in facilitating self-management. Previous studies highlighting the importance of giving individuals wide possibilities to influence and commit to performance measurement practices (e.g. Jääskeläinen and Roitto, 2014; Van Thiel and Leeuw, 2002) have touched on the concept of self-management. However, it can be concluded that self-management and ways of improving intrinsic motivation should be studied more in relation to performance measurement and management in professional public organizations.
The need to improve organizational learning from measurement information has been recognized in the literature (e.g. Kalgin et al., 2018; Laihonen and Mäntylä, 2017). Similarly, the need for improving the learning tasks of output measurement became evident across the studied judiciaries. The indicators used allow comparison in terms of objectivity and use of resources, but they are not sufficiently exploited in information sharing, benchmarking, and learning. Based on the study, the requirements for using measurement structures and procedures to learn will be emphasized even more in the future. This is why it can be concluded that improving the learning tasks of output measurement in professional public organizations is a central area in need of future research and improvement efforts.