Sustainability performance measurement – a framework for context-specific applications

Purpose – Researchers and practitioners have recently been interested in corporate sustainability performance (CSP). However, knowledge on measuring CSP is limited. Many CSP-measurements are eclectic, without guidance for contextual applications. This paper aims to develop a conceptual framework that categorizes, explains and evaluates measurements based on their accuracy and precision and provides a guideline for their context-specific application.
Design/methodology/approach – The authors conducted a systematic literature review of an initial sample of 1,415 papers.
Findings – The final sample of 74 papers suggested four measurement categories: isolated indicators, indicator frameworks, Sustainability Balanced Scorecards (SBSC) and Sustainability Performance Measurement Systems (SPMS). The analysis reveals that isolated indicators are inaccurate and imprecise, limiting their application to organizations with delimited, specific measurements of parts of CSP due to the risk of a GIGO-effect (i.e. low-quality input will always produce low-quality output). CSP-indicator frameworks are imprecise but accurate, making them applicable to organizations that handle a more significant amount of CSP data. They have a risk of greensplashing, i.e. many indicators not connected to the industry, organization or strategy. In contrast, SBSCs are precise but inaccurate and valuable for organizations desiring a comprehensive strategic management tool with limited capacity to handle sustainability issues. They pose a risk of the streetlight effect, where organizations do not measure relevant indicators but what is easy to measure.
Originality/value – The ideal CSP-measurement was identified as SPMSs, which are both precise and accurate. SPMSs are useful for organizations with complex, comprehensive, connected and tailored indicators but are methodologically challenging.


Introduction
Corporate sustainability performance (CSP) has grown in importance, especially in the past decade or two (Grewal and Serafeim, 2020). Increasing pressure from both external and internal stakeholders forces organizations to actively manage and account for their activities' sustainability (O'Dwyer et al., 2005). Therefore, precise and accurate CSP-measurements are important. A plethora of standards, guidelines and measurement approaches for measuring CSP have emerged, including ISO 26000 (Antolín-López et al., 2016), SA8000 (Schrippe and Ribeiro, 2018), Global Reporting Initiative (GRI) (Siew, 2015) and other indicator frameworks (Schneider and Meins, 2012), such as standards of the International Sustainability Standards Board (ISSB), the Task Force for Climate-related Financial Disclosures (TCFD) and the European Sustainability Reporting Standards (ESRS), sustainability balanced scorecards (SBSC) (Hubbard, 2009) and sustainability performance measurement systems (SPMS) (Pryshlakivsky and Searcy, 2017). These CSP-measurements are also used by academic research to represent actual organizational sustainability practices (Venkatesh et al., 2021) or sustainability performance (Grewatsch and Kleindienst, 2017).
Problematically, CSP-measurements can easily misrepresent organizational practices (and thus performance) and result in biased conclusions if they lack precision and accuracy (Adams and Frost, 2008). Precision can be defined as whether a CSP-measurement can be repeated with the same result (Rukmana, 2012). In contrast, accuracy can be defined as how conclusions drawn from a CSP-measurement reflect reality (Shiu et al., 2009). Measurement bias may relate not only to the nature and number of indicators representing CSP (e.g. Adams and Frost, 2008; Keeble et al., 2003; Sureeyatanapas et al., 2015) but also to the appropriateness of the data proxies used (e.g. Jasiński et al., 2016). Environmental, social and governance (ESG) scores from large data providers are widely used in practice and research as proxies for sustainability, even though they can be highly problematic in terms of precision and accuracy. ESG scores are also imprecise because of the sheer variety, inconsistency and differences in methods used to deal with data gaps (Grewal and Serafeim, 2020).
This study conducted a systematic literature review of CSP to answer the research question:
RQ1. What are the trade-off effects of different corporate sustainability performance measurements on precision and accuracy?
The aim is to provide a conceptual framework for CSP measurements that guides future research at the organizational level in choosing appropriate CSP-proxies and assessing their accuracy and precision. We identified four CSP-measurements by reducing the initial 1,415 CSP papers to 74 seminal papers published in 28 journals between 1987 and 2021. Each consecutive measurement incorporated the previous method: isolated sustainability indicators; sustainability indicator frameworks; SBSC; and SPMS.
We identified different trade-offs in precision and accuracy for these four CSP-measurements and potentials for bias. Trade-offs exist in the selection of indicators, the number of indicators, the specification of indicators and the contextual fit of indicators. We also identified four bias effects related to the types of CSP-measurements: a potential Garbage-In-Garbage-Out (GIGO)-Effect, a Greensplashing Effect, a Streetlight Effect and a methodological challenge. Finally, we provide context-specific applications of different CSP-measurements.
The remainder of the paper is organized as follows. In Section 2, theoretical foundations are provided, while Section 3 describes the methodology on which this paper is based. Subsequently, a systematic literature review is conducted in Section 4, and a content analysis of the papers is presented. In Section 5, the precision and accuracy of the identified CSP-measurements and their corresponding risk for bias are discussed, resulting in a conceptual framework. Section 6 discusses the findings and concludes with implications for theory and practice, limitations and opportunities for future research.

Theoretical foundations

Defining corporate sustainability
Concepts, definitions and delimitations between the concepts of sustainability and corporate social responsibility (CSR) have been discussed for more than 70 years. Where sustainability originated on the green, environmental side (Caradonna, 2014), focusing on, for example, eco-justice and eco-efficiency, CSR originated more on the social, human side (Strand et al., 2015), focusing on ethics, philanthropy and social responsiveness. Nowadays, both concepts incorporate both sides, focusing on value and costs for all material stakeholders. Even though CSR is used widely, it seems that in a corporate context, most companies focus on corporate sustainability, as can also be seen in the new EU Sustainable Finance Framework, the EU Corporate Sustainability Directive and European Sustainability Reporting Standards (ESRS). This might be because corporate managers prefer the more rational language of sustainability over more normative CSR language (Strand et al., 2015).
The Brundtland Commission (1987, p. 15) defined sustainable development as the development that "meets the needs of the present without compromising the ability of future generations to meet their own needs". This widely accepted definition requires operationalization for specific sustainability fields (Seuring et al., 2003). When incorporated by a company, sustainable development is referred to as Corporate Sustainability (CS), which can be defined as "demonstrating the inclusion of social and environmental concerns in business operations and interactions with stakeholders" (van Marrewijk, 2003, p. 102) and having "an intentional strategy to create long-term financial value through measurable societal impact" (Grewal and Serafeim, 2020, p. 2). As with stakeholder theory (Freeman, 2010), this concerns the total value and cost of doing business for all stakeholders, and not just a focus on shareholders and profit. However, no widely agreed definition of CS exists (Chen et al., 2017).

Defining corporate sustainability performance
The literature often sees CS as a three-dimensional construct composed of economic, social and environmental sustainability. Performance on these dimensions has to be managed and measured. Corporate sustainability performance (CSP) [1] can be defined as how well organizations contribute to the Triple Bottom Line (Elkington, 1997) of environmental stewardship and social responsibility while maintaining an economically viable business (Wagner, 2010). CSP reflects how a company contributes to the intention and principles of sustainable development, that is, the impact on society and the natural environment (Hillman and Keim, 2001; Xiao et al., 2018). Since CSP reflects practices, it focuses on the organizational level.

Corporate sustainability performance measurement
Measuring CSP is not easy. General methodological concerns about CSP-measurement include that the identified indicators should reflect the characteristics of the industry and organization (Hubbard, 2009; Sureeyatanapas et al., 2015) and strategy (Adams and Frost, 2008). Additionally, indicators should be flexible and change over time (Keeble et al., 2003).
To foster this, data should be collected continuously by involving managers, experts and relevant stakeholders (Adams and Frost, 2008). The team should consider the organization's internal processes and surroundings and emphasize the selection and specification of KPIs, design, indicator weights and results evaluation phase (Wicher et al., 2019).
Common CSP-measurements rely on indicators that companies developed internally or extracted from the plethora of guidelines that currently exist. Guidelines are issued by national governments and accounting bodies, supranational governments (e.g. the EU ESRS), international non-profits (e.g. the ISSB) or as NGO standards (e.g. the ISO standards, SA8000 and the GRI). Consolidation is starting: for example, the ISSB now incorporates the Sustainability Accounting Standards Board and the TCFD, while the EU ESRS state that they take account of, among others, the SDGs, the UN Global Compact, the UN PRI, the OECD Guidelines for MNEs, the ILO Principles and ISO 26000 (EU, 2022).
CSP measurements extend from simple isolated indicators that are not holistic, through connected indicators that add more features and complexity, toward comprehensive measurements tailored to and integrating other performance measurement systems in the organization. We use this preliminary framework to conduct a literature review.

Methodology
We conducted a systematic literature review of CSP measurements. We followed the systematic approach of Tranfield et al. (2003) for the literature search and applied the PRISMA guidelines for structured and transparent reporting (Moher et al., 2009). We refer to the Appendix for a detailed description of the three stages.

Systematic literature review

Descriptive results
Figure A1 (Panels A-D) in the Appendix provides information on the systematic literature review, including a descriptive analysis of the number of papers published each year, affiliation of the first author and categorization of journals. Panel A shows that CSP measurements represent a relatively new area of research, with 65% of the papers published within the past six years. According to Panel B, 26 countries are represented, which shows that it is a topic of global interest. Most studies stem from European researchers, which could imply some form of Eurocentrism and/or Europe having significantly more focus on sustainability. Panel C shows that 43% of the sample were cited fewer than 50 times, with the average number of citations per paper being 183 and the median 56.5. Panel D reveals that 62% of the papers were published in the sustainability, business and ethics areas, while accounting is grossly underrepresented with only two papers. This is surprising since performance measurement is a central management accounting topic (Lueg and Radlach, 2016).

Content analysis
The following section analyses the theories, research methods and CSP measurement approaches used in the study's sample.
4.2.1 Theories. Some researchers use traditional theory like stakeholder theory or institutional theory. In contrast, others are phenomena-driven and fact-centred and do not apply any specific theory, which is in line with previous observations (Montiel and Delgado-Ceballos, 2014). The choice of theory reflects assumptions regarding the phenomenon, and Table A2 in the Appendix provides an overview of the theories applied in the sample [2]. We found that 65% of the papers were phenomena-driven without applying a specific theory. Among the papers applying traditional theory, stakeholder theory was used most, either as an isolated theory or through TBL or the balanced scorecard (BSC). Other theories include agency theory (n = 1), the resource-based view (RBV) (n = 5) and institutional theory (n = 3). Stakeholder theory focuses on CSP-measurement and value for all stakeholders, while institutional theory relates CSP-measurement to what is legitimate (Glover et al., 2014) and focuses on social aspects. RBV theory is related to strategic choices and decision-making (Aragón-Correa and Sharma, 2003; Bowen, 2007), which could be argued to affect the scope of measurements.
4.2.2 Methods and research objectives. Research in this field often has one of two objectives: (1) to develop new CSP-measurements; or (2) to explore theoretical study subjects and existing frameworks related to CSP-measurement.
The latter focuses on similarities and differences between CSP measurements (Antolín-López et al., 2016) and the decision-making effect of CSP measurement (Adams and Frost, 2008) or deepens insights into applications (e.g. Montiel and Delgado-Ceballos, 2014). More than half of the papers in the sample are attempts to develop new CSP measurements, which suggests that research in this area is not saturated and that an accurate and precise CSP measurement is probably still unavailable.
4.2.3 Research methodology. Methodologically, research on CSP measurements has focused on three different streams. The first and most essential track (51% or n = 38) is where researchers seek insight into CSP-measurements by making baseline conclusions on existing literature, such as extracting general KPIs (Adams and Frost, 2008) or increasing understanding in the field (Montiel and Delgado-Ceballos, 2014). In the second track (35%), researchers use archival data to draw conclusions across multiple settings, for example, by extracting KPIs based on guidelines and standards (Engida et al., 2018; Sartori et al., 2017) or by testing conditions that can later be generalized (Bodhanwala and Bodhanwala, 2018). In the third, most specialized track, researchers analyse a specific organization or industry by collecting primary data through surveys and interviews. However, surveys and interviews remain underrepresented (see Table A3 in Appendix). Further research using these methods could potentially contribute new knowledge. The lack of methodological variety could be why the field of CSP measurement is still partially immature.

Measurement approaches (instruments).
While our pilot study initially pointed toward isolated, connected and comprehensive CSP-measurements, we identified one more category after thoroughly examining the 74 papers. This category consists of measurement approaches tailored to other management systems in the organization, providing comprehensive financial and non-financial information measures. We identified two different measurements integrated into the other systems: Sustainability Balanced Scorecards (SBSC) and Sustainability Performance Measurement Systems (SPMS). An SBSC has the same philosophy as a BSC and therefore follows the 80/20 Pareto principle (i.e. 80% of the consequences come from 20% of the causes). An SPMS builds on many input indicators and is anchored to other management systems. This yields four categories of CSP measurements: (1) isolated indicators; (2) indicator frameworks; (3) SBSC; and (4) SPMS.
Of the papers, 64% argue that CSP should be measured using sustainability indicators integrated into a specific framework or model. In comparison, 22% argue for using indicators in an isolated manner. Only a few studies claim that CSP-measurement entails integrating indicators within an SPMS (7%) or SBSC (7%). As most studies apply secondary data with a theoretical focus (see Table A3 in the Appendix), it is not surprising that the literature focuses on developing isolated indicators and indicator frameworks, as these are more isolated and narrower and therefore do not necessarily require a more profound analysis in a specific organization. On the other hand, SPMS and SBSC are connected to other systems within organizations. We argue that researchers should emphasize SPMS and SBSC research because of precision and accuracy issues in frameworks and isolated indicators.
The following sections provide information on the four different categories of measurement approaches.
4.2.4.1 Isolated indicators – description and context-specific application. Firstly, we identified CSP measurements using indicators that are not part of a framework, system or model. Organizations individually collect sustainability indicators without applying a structured method of collecting, weighing or measuring them. This leads to an unstructured overall CSP measurement tailored to the organization. It is the simplest and fastest CSP measurement method. However, there are limitations regarding precision and accuracy because isolated indicators do not allow the handling of a large amount of data. The purpose is often delimited and theoretical, including a specific investigation of how KPIs can be used in CSP measurement (Adams and Frost, 2008), the development of a standardized list of possible KPIs (Antolín-López et al., 2016), and an investigation of CSP measurements related to decision-making (Epstein and Widener, 2010). However, this approach might not capture the entire construct of CSP measurement because it can be (too) complex to measure all relevant indicators without a strategy. Gianni et al. (2017, p. 1298) argue that when relying on indicators, "organizations seem often failing to prove that internal operations deal with sustainability issues yielding results that come out as improvement in sustainability indicators." Isolated indicators may not be able to accurately measure sustainability if they do not include key event details, time or space dimensions (Lodhia and Martin, 2014).
We argue that KPIs can measure specific and delimited areas of CSP and assist managers in decision-making but cannot be used for an overall CSP measurement. International standards and guidelines can be used in CSP measurements but remain helpful only in the lowest category of measurements, namely, for isolated indicators. Owing to their low precision and accuracy, isolated indicators are helpful for simple, large-scale research and for organizations with low complexity related to sustainability issues. Isolated indicators are helpful for organizations that know exactly what to measure, either through ad hoc or regular monitoring. An example of isolated indicators is the set of 79 indicators suggested by the GRI.
4.2.4.2 Indicator frameworks and models – description and context-specific application. Secondly, we identified CSP-measurements based on a framework or model (e.g. Ahmad and Wong, 2019; Engida et al., 2018; Jasiński et al., 2016; Krajnc and Glavič, 2005). These include all aspects of the first category of CSP measurements and other features. The new features include being causal, data-driven and comprehensive, and involve a process from collecting data to weighting indicators (Jiang et al., 2018). As frameworks often include a recipe that covers both preparatory steps and the execution of the measurement, it is possible to handle a more significant amount of data.
Even though frameworks represent a more complex CSP measurement than isolated indicators, they are not integrated with other management systems in the organization, as they can be used separately. Frameworks allow the use of more indicators without the same risk of information overload (Jasiński et al., 2016) but serve mainly to understand and visualize the measurement process. They are most beneficial to complex organizations because structured measurement allows for a more significant number of indicators and structuring of the data collection process.
4.2.4.3 Sustainability balanced scorecard – description and context-specific application. Thirdly, we classified SBSCs as a category of CSP measurements. An SBSC is a performance measurement and management framework based on Kaplan and Norton's (1992) BSC. The BSC was initially organized with four performance perspectives to balance financial and nonfinancial, short-term and long-term and qualitative and quantitative measures (Kaplan and Norton, 1992). The SBSC expands beyond this by explicitly focusing on the organization's environmental, social and ethical aspects (Hubbard, 2009). There is a broad consensus that an SBSC can be used to measure CSP in different ways, including, but not limited to, integrating social and environmental measures within the four perspectives of the BSC and adding specific social and environmental dimensions to the traditional BSC (Hubbard, 2009; Journeault, 2016; Vieira et al., 2017). Consequently, an SBSC is a strategic management approach with a high level of integration, as CSP measurement is integrated into a system that does not require parallel systems such as separate environmental, social and financial management systems (Hansen and Schaltegger, 2016). SBSCs are more complex than isolated indicators and frameworks and can be integrated into other systems. Researchers argue that the number of indicators must remain low when using SBSCs for CSP measurements, following an 80/20 Pareto principle (Hubbard, 2009). SBSCs require sustainability strategies; therefore, they are valuable for organizations that formulate these strategies (Hubbard, 2009b).
SBSCs are appropriate for organizations that already use a BSC because it is easier to build on an already recognized tool (Hubbard, 2009). To avoid compromising the validity of this CSP-measurement, vital insights into the relevant sustainability indicators are crucial. This is important because the number of indicators used is low; therefore, the selected indicators must be relevant. Moreover, because it contains a high degree of integration with other systems, this tool is especially appropriate for organizations that wish to integrate sustainability into their strategy (Pádua and Jabbour, 2015; Pryshlakivsky and Searcy, 2017). In addition, an SBSC can be helpful for organizations that wish to report on their sustainability performance, as the SBSC, if handled correctly, adopts the necessary stakeholder view going beyond shareholder value (Hansen and Schaltegger, 2016; Hubbard, 2009).
4.2.4.4 Sustainability performance measurement systems – description and context-specific application. Performance measurement systems go beyond a performance indicator catalogue, as they need the integration of indicators with the infrastructure required to use and interpret these data. An SPMS can be defined as a "system of indicators that, in short- and long-term, provides the corporation with information necessary to assist in the management, control, planning and performance of its economic, environmental and social activities" (Searcy, 2012, p. 240).
Even though many different SPMSs exist, their main principles are essentially the same: the measurement is conducted based on the vision and strategy of the organization (Nawaz and Koç, 2018; Pádua and Jabbour, 2015); indicators are chosen from a broad range of perspectives to create a holistic and balanced view; and a list of critical indicators such as success factors is developed when the system is designed (Vieira et al., 2017).
An SPMS differs from an SBSC in that it includes individual and composite indicators. In contrast, an SBSC is a "performance measurement package" that integrates multidimensional performance measurement and management models (Hansen and Schaltegger, 2016, p. 195). Moreover, an SPMS incorporates a more significant number of indicators, whereas SBSCs facilitate the identification of a smaller number of successful drivers to focus only on essential indicators (Vieira et al., 2017). Thus, although SPMS and SBSC are both PMSs, they are at opposite ends of the scale when evaluated by the number of indicators and the amount of information involved. In addition, an SPMS has a more external view and is forward-looking as it also contains a strategic purpose (Silvi et al., 2015). Compared with SBSCs, SPMSs rely on a more significant number of indicators, making them especially appealing for complex organizations. SPMSs can also help investigate trade-offs, integrate them into the organization's core capabilities and better understand how sustainability-related issues can affect the organization and what decisions need to be taken to achieve sustainability (Pryshlakivsky and Searcy, 2017, Figure 1 on p. 329).

Synthesis
In the following, CSP measurement issues related to accuracy and precision are discussed.

Analysis of precision
Precision can be defined as whether a CSP measurement can be repeated with the same result (Rukmana, 2012). Explicitly related to CSP measurement, a precise measure would provide the same result when the measurement is repeated. Still, it does not necessarily show the "true" or actual state of CSP. This depends on accuracy, as illustrated in Figure 1.
Repeatability and reproducibility are closely related to precision (McAlinden et al., 2015). Threats to measurement precision are factors that cause errors, such as human error, changes in the environment or data collection processes (Haase et al., 2010). The repeatability of measurements concerns the variation in repeat measurements completed under identical conditions (i.e. the same instrument, same observer and a short period). Variations in these measurements can only be ascribed to errors in the measurement process itself (Bartlett and Frost, 2008). On the other hand, reproducibility concerns the variation in measurements completed on a subject under changing conditions (i.e. different instruments used, different observers or a long period) (Bartlett and Frost, 2008). Surprisingly, precision is rarely discussed or tested in the CSP-literature, even though researchers argue that precision is a problem (Jiang et al., 2018; Morioka and Carvalho, 2016a, 2016b; Vieira et al., 2017).
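To make the distinction between precision and accuracy concrete, the following minimal Python sketch summarizes repeated measurements of a single CSP score. It is an illustration only and not part of the reviewed literature: the function name `precision_and_accuracy`, the 0-100 scale, the score series and the assumed "true" reference value of 70 are all hypothetical. Precision is proxied by the spread of the repeats and accuracy by their bias against the reference value.

```python
from statistics import mean, stdev


def precision_and_accuracy(repeated_scores, reference_score):
    """Summarise repeated measurements of one CSP score (hypothetical helper).

    spread: sample standard deviation of the repeats (lower = more precise).
    bias:   mean of the repeats minus the reference value
            (closer to zero = more accurate).
    """
    spread = stdev(repeated_scores)
    bias = mean(repeated_scores) - reference_score
    return spread, bias


# Hypothetical repeated scores on a 0-100 scale; the reference value of 70
# stands in for the unobservable "true" state of CSP and is an assumption.
precise_but_inaccurate = [52.0, 53.0, 52.0, 53.0]
accurate_but_imprecise = [55.0, 85.0, 60.0, 80.0]

spread_a, bias_a = precision_and_accuracy(precise_but_inaccurate, 70.0)
spread_b, bias_b = precision_and_accuracy(accurate_but_imprecise, 70.0)

print(f"precise but inaccurate: spread={spread_a:.2f}, bias={bias_a:.2f}")
print(f"accurate but imprecise: spread={spread_b:.2f}, bias={bias_b:.2f}")
```

In this sketch, the first series repeats almost perfectly but sits well below the reference value, whereas the second series averages out at the reference value but varies widely between repeats. In practice, the reference value is not observable, which is one reason why accuracy is harder to test than precision.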
Trade-offs exist in the selection of indicators, which range from simple (isolated indicators, SBSC) to complex (indicator frameworks, SPMS); the number of indicators, which vary from few (isolated indicators, SBSC) to comprehensive (indicator frameworks, SPMS); the specification of indicators, which range from isolated (isolated indicators, indicator frameworks) to connected (SBSC, SPMS); and the contextual fit of indicators, which vary from generic (isolated indicators, indicator frameworks) to tailored to the specific industry, the organization and its strategy and culture (SBSC, SPMS). These trade-offs affect the precision of CSP measurements, and we argue that SBSCs and SPMSs are more precise than isolated indicators and indicator frameworks for several reasons. Firstly, the specification of indicators in these measurements is connected to the organization or industry. Secondly, the contextual fit of these measurements is tailored to the organization. The structure and design of these measurements secure a high level of precision, as they are carefully developed for predefined purposes, with little subjectivity involved when carrying out the measurement.
A threat to reproducibility, and thus to precision, is human error. The risk of human error increases as the amount of data to be handled increases. SBSCs include the fewest indicators of all measurements, limiting the risk of human error and enhancing reproducibility and precision. SPMSs involve a large amount of data, which increases the risk of human error. However, SPMSs are often standardized as part of a more extensive system, which lowers the risk of human error. Although isolated indicators can only handle a lower amount of data due to a lack of structure, their precision remains low, as there is no structured approach to selecting either indicators or data for these.

Analysis of accuracy
Accuracy is how the conclusion drawn from CSP measurement reflects reality (Shiu et al., 2009), and concerns questions about the validity and relevance of the measure. Accurate indicators should represent the phenomenon intended to be measured (Rusticus, 2014) and should be able to measure the given hypothesis (Ginty, 2013). Jiang et al. (2018) addressed the accuracy problem by testing validity and found that almost 33% of the measurement indicators did not meet the criteria to pass the analysis and were therefore excluded. This shows that accuracy is a substantial threat to CSP measurement, yet fewer than a dozen papers in the sample assess this risk.
Relying on a low (high) number of indicators results in less (more) accurate measurement of the CSP-construct. Relying on fewer indicators increases the risk that what is being measured is not necessarily CSP but only a small part of the construct. This is a substantial risk as organizations can "score" high on some features of CSP but low on others (Hubbard, 2009). Thus, we argue that CSP-measurements with a significant number of indicators (indicator frameworks and SPMS) are more accurate than isolated indicators and SBSC. If only a few indicators are used, the risk of reaching inaccurate conclusions increases. Frameworks and SPMS have better opportunities to incorporate indicators, as they have established a systematic method for collecting data and measurement.
Another threat to the accuracy of CSP measurements is the potential collection of non-representative data. An example is using ESG scores without carefully investigating what they actually measure. ESG scores are being criticized for having low validity, due to a lack of convergence between the scores of different rating providers (Berg et al., 2020; Chatterji et al., 2009; Crifo et al., 2016). In addition, too much focus on readily available data can also threaten the accuracy of the measure. Using readily available data risks a streetlight effect, as researchers may search for measurements where it is easiest to see and not where it would be most accurate. In other words, the streetlight effect occurs when researchers study what is easy to learn (Hendrix, 2017) and managers measure what is easy to measure.
Another threat is the Garbage-In-Garbage-Out (GIGO)-effect, which generally refers to the idea that the quality of output is determined by the quality of the input [4]. If researchers or managers rely on low-quality, inappropriate or nonsensical data, the output will be similar. Even in situations where good input data are used, accuracy can still be an issue, as the weighting of the indicators must be accurate. With a low number of indicators (e.g. isolated indicators or SBSC), an inappropriate weighting scheme affects the overall accuracy of the measurement more than it does for CSP measurements with more indicators. We argue that CSP measurements related to indicator frameworks and SPMS are more accurate than those of isolated indicators and SBSC.
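The weighting argument can be illustrated with a small numerical sketch in Python. It is a simplified illustration under stated assumptions, not a method taken from the reviewed papers: the functions `composite_score` and `shift_from_misweighting`, the equal-weight baseline, the doubling of one weight and all indicator scores are hypothetical. The sketch compares how far a weighted-average composite moves when one indicator is over-weighted in a measurement with 4 indicators versus one with 40.

```python
def composite_score(scores, weights):
    """Weighted average of indicator scores (weights are normalised here)."""
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


def shift_from_misweighting(scores, factor=2.0):
    """Absolute change in the composite when the first indicator's weight
    is multiplied by `factor`, relative to an equal-weight baseline."""
    equal = [1.0] * len(scores)
    skewed = [factor] + [1.0] * (len(scores) - 1)
    return abs(composite_score(scores, skewed) - composite_score(scores, equal))


# Hypothetical scores on a 0-100 scale: one strong indicator (90) and
# otherwise identical indicators (50), so only the count differs.
few_indicators = [90.0] + [50.0] * 3    # e.g. an SBSC-like measurement
many_indicators = [90.0] + [50.0] * 39  # e.g. an SPMS-like measurement

shift_few = shift_from_misweighting(few_indicators)
shift_many = shift_from_misweighting(many_indicators)

print(f"shift with 4 indicators:  {shift_few:.2f}")
print(f"shift with 40 indicators: {shift_many:.2f}")
```

Under these assumptions, doubling one weight moves the 4-indicator composite by 6 points but the 40-indicator composite by less than 1 point. The sketch isolates the effect of the number of indicators only; it does not model the quality of the underlying input data.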

Precision and accuracy effects of the measurement
Issues related to the precision and accuracy of CSP measurements resulted in four different situations, as illustrated in Figure 1.CSP-measurements can be accurate and precise; the ideal setting we call "the green bullseye."These CSP-measurements measure CSP accurately, reflecting actual practices as all the bullets hit the same spot every time and the spot is in the middle.This illustrates that what is measured is actually CSP, i.e. it has high construct validity.CSP-measurements can be precise, but inaccurate.This is an example of the streetlight effect, in which researchers or managers measure what is easy to measure without concerns about the accuracy or appropriateness of the data.An SBSC can be precise but inaccurate.SBSCs ensure structure but often comprise only a small number of indicators.Organizations can potentially be high performers in some areas of CSP (e.g.human rights, training and safety, or reducing emissions) but low performers in other places that are not being measured.Thus, SBSC risks measuring only a minor part of the construct, as it is built on limited input, i.e. construct validity is low.
Next, CSP measurements can be accurate, but imprecise. In this greensplashing-effect, the "bullets" hit close to the center, which is the accurate reflection of CSP, but are spread out widely. This can occur when indicator frameworks or models are used. They allow more isolated indicators, which increase accuracy but can be imprecise. The fourth and last setting is when CSP-measurement is neither precise nor accurate. In this situation, we do not hit close to the target (i.e. CSP), and the results could be different each time the measurement is completed. This is an example of the GIGO-effect: the lack of integration into other systems and the lack of structure in the data collection could affect the appropriateness of input data.
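The four situations can be sketched in code by treating accuracy as the distance between the average reading and the true value, and precision as the spread of repeated readings. This is only an illustrative reading of the target metaphor: the tolerance values, the example readings and the function name are hypothetical assumptions, and in practice the "true" CSP value is not directly observable.

```python
from statistics import mean, pstdev


def classify_measurement(readings, true_value, bias_tol=0.1, spread_tol=0.1):
    """Place repeated measurements in one of four accuracy/precision quadrants.

    Accuracy is judged by how far the mean reading lies from `true_value`,
    precision by the spread of the readings. Both tolerances are hypothetical.
    """
    bias = abs(mean(readings) - true_value)
    spread = pstdev(readings)
    accurate = bias <= bias_tol
    precise = spread <= spread_tol
    if accurate and precise:
        return "accurate and precise (green bullseye)"
    if precise:
        return "precise but inaccurate (streetlight effect)"
    if accurate:
        return "accurate but imprecise (greensplashing)"
    return "neither accurate nor precise (GIGO-effect)"


# Hypothetical repeated readings of a CSP score whose true value is 0.5.
examples = {
    "tight and centered": [0.48, 0.52, 0.50, 0.51],
    "tight but off-center": [0.80, 0.82, 0.79, 0.81],
    "centered but scattered": [0.20, 0.80, 0.35, 0.65],
    "off-center and scattered": [0.10, 0.90, 0.95, 0.99],
}
for label, readings in examples.items():
    print(f"{label}: {classify_measurement(readings, true_value=0.5)}")
```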

Conceptual framework of CSP measurements
We developed a conceptual framework based on a systematic literature review of CSP-measurements. Figure 1 presents different categories of CSP-measurement, elaborates on their specific attributes and provides suggestions for their context-specific applications.
The conceptual framework highlights the findings in our review, including the categorization of CSP-measurements available, related issues with accuracy and precision and suggestions for their context-specific use. The framework provides a comprehensive overview of the essential aspects to be considered in CSP-measurements. See Table A4 in the Appendix for an overview of the papers related to each category.

Discussion
This study analysed different CSP-measurements by providing a conceptual framework that guides future research in assessing the context, accuracy and precision of proxies for the CSP construct at the organizational level. Our systematic literature review of 74 papers dealing with CSP-measurements from 1987 to September 2021 found that CSP has drawn increasing interest during the past decade. This study makes several contributions to research and practice.

Implications for research
Our review's main theoretical contribution is the development of a conceptual framework for CSP-measurements to be used in future research. Firstly, the four identified categories of CSP-measurement, (1) isolated sustainability indicators, (2) sustainability frameworks, (3) SBSC and (4) SPMS, guide future research with a more nuanced picture to position and develop theories and contributions in CSP-measurement (also, for example, to investigate links to external reporting models, such as integrated reporting). Secondly, our suggestions for context-specific applications develop CSP-measurement theory by demonstrating trade-offs in the application of the four different categories and showing context-specific situations where each category is suitable. Studying context-fitting CSP-measurements is also vital for being able to report CSP, for example through an integrated report. Thirdly, our conceptual framework contributes to theory by identifying problems with CSP measurement precision and accuracy, constructing four challenges with respect to the categories: the GIGO-effect, the streetlight effect, greensplashing and a methodological challenge. Methodologically, researchers should take these challenges into account when doing research.

Implications for practice
This systematic review has several practical implications. Firstly, managers should be aware of the different practices and difficulties of CSP measurement. These pitfalls include CSP-measurements that are susceptible to GIGO-, greensplashing and streetlight effects. Secondly, we identify and structure four levels of CSP measurements and discuss them according to their accuracy and precision. This will help managers and regulators improve their understanding of CSP measurements. By suggesting appropriate, context-specific applications, this study advances the process of identifying CSP measurements that are both appropriate for a specific organization and sufficiently accurate and precise in the given context. In general, this study can help organizations to address the evolution of CSP measurements.

Limitations
Despite the systematic approach, the transparency of the findings in this review is limited by our inherent values and beliefs, which can be difficult to express fully (Lueg and Radlach, 2016). In sampling papers, there is a risk of excluding studies in progress or published in other languages. By requiring specific terms to be listed in the title, abstract, or keywords of the paper, we may have excluded other relevant papers. However, we are confident that analysing the references of the 71 initially identified papers minimized this risk. Additionally, relevant papers were excluded because they were not published in a 1-4* journal, as this study aimed to review papers of the highest quality. Sustainability is a broad concept currently being explored from many different perspectives, such as in economics, policy, engineering, law and science-based fields. This study only focuses on the business perspective at the organizational level.

Future research opportunities
Our review of CSP measurements shows several research shortcomings. As European researchers are overrepresented (Eurocentrism), research should shed light on drivers of CS and CSP measurement on national and regional levels to account for this. In addition, American and Chinese samples were more prevalent. Future research should be more inclusive and examine different cultural contexts. There is ample opportunity, specifically in accounting and management, as these remain underrepresented. Accounting and management researchers can contribute by making the picture of CSP measurement more nuanced, as the principles of performance measurement are rooted in accounting and management principles. One specific area that needs more investigation is the link between CSP measurements and their external reporting, for instance through integrated reporting. Many studies have not applied a specific theory in their analyses, and future research should not continue this trend, as the lack of academic background hinders further development of research analysis and consistent argumentation. Also, CSP research often relies on ESG-scores as data input. We argue that this should be performed only after careful investigation of the data.
In addition, researchers have not fully exploited available data. Instead of manually coding or relying on publicly available data such as ESG-scores, they could analyse annual reports and websites using modern big data and AI-tools such as computer-aided text analysis (CATA). CATA can measure individuals' beliefs, perceptions and feelings expressed in written texts. It is specifically appealing because of its reliability and ability to process a large amount of data quickly (Short et al., 2010). Especially compared to proxy ESG-scores, we argue that CATA can lessen three problems: the streetlight effect, in that it focuses on actual performance data; the GIGO-effect, in that CATA-input is more reliable and less prone to human error; and the black box methodological challenge, in that CATA-dictionaries, if shared, make coding and results transparent.
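One common form of CATA is dictionary-based word counting, which can be sketched as follows. The sketch below is a simplified illustration, not the procedure of Short et al. (2010): the two-category dictionary, the sample sentence and the function name are hypothetical assumptions, and a real CATA dictionary would need to be validated before use.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary; a real CATA dictionary would be validated.
DICTIONARY = {
    "environmental": {"emission", "emissions", "energy", "waste", "recycling"},
    "social": {"safety", "training", "diversity", "community"},
}


def dictionary_scores(text, dictionary=DICTIONARY):
    """Return the share of words in `text` that match each category's word list."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {
        category: (sum(counts[term] for term in terms) / total if total else 0.0)
        for category, terms in dictionary.items()
    }


# Hypothetical sentence standing in for a passage from an annual report.
sample = "We reduced emissions and waste while expanding safety training."
scores = dictionary_scores(sample)
for category, share in scores.items():
    print(f"{category}: {share:.2f}")
```

Because the word lists are explicit, sharing the dictionary lets other researchers reproduce the coding exactly, which is the transparency benefit described above.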

Several practical gaps remain, including how general business-specific aspects, such as industry, country and organizational size, affect selected indicators or type of CSP-measurement. This includes research on best practices for collecting valid and reliable data, including transforming qualitative data into quantitative data, and interpreting, evaluating and communicating related information. Additionally, researchers should seek to answer how sustainability accounting, management control and sustainability reporting are linked, and whether this affects CSP measurement. In general, sector-specific research and field studies are lacking. Many CSP measurements are still not tested in case studies or are only tested across one or a few organizations in isolated industries.

Conclusion
In conclusion, this article provides a conceptual framework for measuring corporate sustainability performance (CSP) and evaluates different measurement approaches based on their accuracy and precision. The study conducts a systematic literature review of 74 papers and identifies four categories of CSP measurements: isolated indicators, indicator frameworks, sustainability balanced scorecards (SBSC) and sustainability performance measurement systems (SPMS). The analysis reveals trade-offs between accuracy and precision for each category, with SPMS identified as the ideal measurement approach. The article offers context-specific suggestions for the use of these measurements and discusses their implications for research and practice, highlighting the challenges and opportunities in the field of CSP measurement.

Notes
1. For brevity's sake, we use CSP; not to be confused with corporate social performance (Wood, 1991), which focuses on corporate social responsibility, corporate social responsiveness and social impacts, programs and policies.
2. For further information on the methodology, research questions and data collection, see Table 4 in the Appendix.
3. Figure 1 in Jiang et al. (2018) provides an example of many best-practice approaches.
4. Coined in 1957 about US Army mathematicians, the GIGO effect is now also widely used in business, IT, computer science and data science (Hanson et al., 2023).

Appendix. Systematic literature review
As described in the methodology section, we conducted a systematic literature review in three stages. In Stage 1, we planned the review by following a non-structured snowball approach (Morioka and Carvalho, 2016a; Nawaz and Koç, 2018) to better understand the topic, keywords and relevant databases. The search revealed an essential difference between "assessment" and "measurement." Maas et al. (2016) state that the keyword "measurement" mainly depicts an internal and managerial perspective of the decision-making process, while "assessment" focuses on external reporting. However, not all studies have consistently used these terms. Researchers argue that external assessments require internal systems to provide information, so both measurement and assessment approaches require internal and external elements (Maas et al., 2016; Silva et al., 2019). Hence, we included both terms in the search string.
In Stage 2, we conducted the literature review based on a seven-step approach (Tranfield et al., 2003). Keywords were selected based on the initial search in Stage 1. We searched the following string in either title, abstract or keywords: (corporate sustainab* measur* OR corporate sustainab* assess* OR sustainab* performance measur* OR sustainab* performance assess* OR sustainab* management measur* OR sustainab* management assess*). We searched two separate databases (EBSCO Business Source Complete and Scopus), which initially yielded a sample of 1,415 papers. The search was conducted in September 2021.
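To show how the truncation wildcard `*` in the search string behaves, the following minimal sketch translates each search term into a regular expression and tests it against a title. It assumes simple phrase matching (words in order, separated by whitespace); EBSCO and Scopus apply their own query semantics, so this is an approximation for illustration only, and the function names are hypothetical.

```python
import re

# Search terms as listed in the search string above.
SEARCH_TERMS = [
    "corporate sustainab* measur*",
    "corporate sustainab* assess*",
    "sustainab* performance measur*",
    "sustainab* performance assess*",
    "sustainab* management measur*",
    "sustainab* management assess*",
]


def truncation_to_regex(term):
    """Convert a search term with `*` truncation into a compiled regex.

    Assumes phrase matching: words must appear in order, separated by whitespace.
    """
    parts = []
    for word in term.split():
        # Escape the literal part, then let `*` stand for any word characters.
        parts.append(re.escape(word).replace(r"\*", r"\w*"))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


PATTERNS = [truncation_to_regex(term) for term in SEARCH_TERMS]


def matches_any(text):
    """Return True if any search term (combined with OR) matches the text."""
    return any(pattern.search(text) for pattern in PATTERNS)


print(matches_any("Sustainability performance measurement in SMEs"))
print(matches_any("Financial performance of regional banks"))
```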
In the second step of Stage 2, we set research boundaries by setting the language to English only and the timeline to 1987-September 2021 (the Brundtland Commission report was released in 1987). As we focus on the most impactful journals with the highest quality, we only included the journals rated 1-4* in the 2021 Association of Business Schools (ABS) Academic Journal Guide (AJG). The AJG is stable, widely used and focused on business rather than all sciences (Walker et al., 2019), and it scores journals similarly to other well-known rankings, such as the Harzing-list (Harzing, 2023). This reduced the number of studies to 664.
In the third step of Stage 2, we performed a cursory analysis of the titles of the 664 papers. We only included papers that specifically dealt with CSP-measurements and viewed CSP as a three-dimensional construct. Research often examines smaller parts of CSP by focusing on only one of the three dimensions (economic, social or environmental), with the risk that the internal relationships between the dimensions are overlooked. Additionally, the measurement must contain an organization-level view. This step reduced the number of papers to 140.
In Step 4, we screened the 140 papers by reading abstracts to include papers that met the mentioned inclusion criteria. This reduced the number of studies to 87. We excluded several papers that did not focus on CSP measurements or elaborated on intra-organizational relationships and unidimensional measures. We also excluded studies on non-organization-level measures related to national, project or product-level measures. We also eliminated studies on investments, reporting, lifecycle sustainability and not-for-profit organizations because they do not balance or prioritize financial objectives.
In Step 5, we eliminated six duplicates as two separate databases were used. We performed full-text reading of the remaining 81 papers in the sixth step. Ten of these papers were excluded because they did not meet the inclusion criteria, primarily because of a lack of focus on CSP measurements. A total of 71 papers were included in the sample. We investigated all their references, which helped identify three additional relevant papers.
We analysed the resulting 74 papers using quantitative and qualitative content analyses of the research questions. Specifically, we read all the articles more than once to become immersed in the data. As we applied deductive content analysis, we developed a structured categorization matrix (Elo and Kyngäs, 2008) and coded data according to the following categories: measurement approach proposed, research objective, research methodology, data sources and data processing. We also gathered information about the validity and reliability of the tests and data related to the systematic literature review.
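The sample sizes reported across the seven steps of Stage 2 can be cross-checked with a short tally. The sketch below only restates the figures given in the text; the step labels and the function name are shorthand introduced here for illustration.

```python
# Sample sizes as reported in the text for each step of Stage 2.
funnel = [
    ("Step 1: database search", 1415),
    ("Step 2: research boundaries", 664),
    ("Step 3: title screening", 140),
    ("Step 4: abstract screening", 87),
    ("Step 5: duplicate removal", 81),
    ("Step 6: full-text screening", 71),
    ("Step 7: reference search", 74),
]


def step_changes(steps):
    """Return (label, remaining, change versus previous step) for each step."""
    changes = []
    previous = None
    for label, remaining in steps:
        delta = 0 if previous is None else remaining - previous
        changes.append((label, remaining, delta))
        previous = remaining
    return changes


for label, remaining, delta in step_changes(funnel):
    print(f"{label:<32}{remaining:>6}{delta:>+7}")
```

The tally agrees with the narrative: six duplicates removed in Step 5, ten papers excluded in Step 6 and three papers added from the reference search in Step 7.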

In Stage 3, the final stage, we reported the results based on the PRISMA-framework. Table A1 displays our processes across all three stages.

Table A1. Three-stage approach used in the literature review

Stages: Main activity

Stage 1 - Planning the review
We followed the non-structured snowball approach to obtain a better understanding of the topic, keywords and relevant databases.
The search revealed an important difference between the keywords assessment and measurement.

Stage 2 - Conducting the review
Step 1: Identification of keywords, search string and relevant databases
Databases: EBSCO Business Source Complete and Scopus
- "Corporate sustainab*" measure*
- "Corporate sustainab*" assess*
- Sustainab* performance assess*
- Sustainab* performance measure*
- Sustainab* management assess*
- Sustainab* management measure*
- Sustainab* performance evaluation
Result: 1,415 papers
Step 2: Development of research boundaries
- Language: English
- Timeline: 1987-September 2021
- Only 1-4* journals according to the ABS Journal Guide
Result: 664 papers
Step 3: Cursory analysis by reading titles
Inclusion of papers based on their relevance by reading title
Result: 140 papers
Step 4: Including papers meeting the inclusion criteria by reading abstracts
- Deal with SPM measurement
- Three-dimensional view of SPM
- Corporate level analysis
Result: 87 papers
Step 5: Removal of duplicates
- Removing 6 duplicates
Result: 81 papers
Step 6: Full-text screening
- Including papers based on inclusion criteria from step 4
Result: 71 papers
Step 7: Investigating references
- Inclusion based on inclusion criteria from step 4
Result: 74 papers

Stage 3 - Reporting
Reporting of the literature review based on PRISMA framework

Source: Authors' own work

Figure 1. CSP measurement categories and issues related to accuracy and precision