Accountability by design? Exploring design characteristics of corporate social responsibility standards

Purpose – While corporate social responsibility (CSR) standards are amongst the most widely adopted instruments for supporting firms in becoming more accountable, firms that adopt them frequently fail to comply. In this context, the purpose of this study is to explore to what extent CSR standards are designed for accountability. In the analysis, this paper investigates design characteristics related to accountability across different standard types, namely, principle-based, reporting, certification and process standards. Design/methodology/approach – This study reviews the design characteristics of 50 CSR standards in a systematic and comparative fashion. This paper combines qualitative deductive coding with exploratory quantitative analysis methods to elucidate structural variance and patterns of accountability-related design characteristics across the sample. Findings – This study finds that the prevalence of design characteristics aimed at fostering accountability varies significantly between different types of standards. This paper identifies three factors related to the specific purpose of any given standard that explain this structural variation in design characteristics, namely, implementability, comparability and measurability. Practical implications – Non-compliance limits the effectiveness and legitimacy of CSR standards. The systematic exploration of patterns and structural variation in design characteristics that promote accountability may provide valuable clues for the design of more effective CSR standards in the future. Social implications – A better understanding of the role of design characteristics of CSR standards is critical to ensuring that they contribute to greater corporate accountability. Originality/value – This study strives to expand the current understanding of the design characteristics of CSR standards beyond individual cases through a systematic exploration of accountability-related design characteristics across a larger sample.


Introduction
Firms affect billions of people across the world through their products, operations and value chains. Increasingly, they have also been recognized as a major driver of sustainable development (Schönherr et al., 2017;Lenssen and Blowfield, 2012;Dyllick and Muff, 2016). However, in the wake of the global economic, financial and social crises that have emerged since 2008/2009, public trust in the ability of businesses to drive positive social change has been diminishing (Bies, 2014;Curran and Eckhardt, 2020;Gardels and Berggruen, 2017;Witt, 2019). In response, many firms have taken strides towards integrating corporate social responsibility (CSR) into their business operations (Martinuzzi and Krumay, 2013;Pivot Goals, 2017;Schönherr et al., 2017). For instance, 90% of S&P 500 Index companies set sustainability-related management objectives and published CSR reports in 2019 (Peterson et al., 2020).
CSR standards have emerged as one of the most prevalent instruments for supporting firms in becoming more accountable for the social and environmental sustainability of their operations (Blankenbach, 2016;Christensen et al., 2019;Derkx and Glasbergen, 2014;Leipziger, 2003, 2017). Well-known examples of such standards include the United Nations Global Compact (Arevalo et al., 2013;Rasche, 2009) and the ISO 14 000 family of environmental management standards (Hahn and Weidtmann, 2016;Heras-Saizarbitoria and Boiral, 2013;Popa and Dabija, 2019). However, there is evidence that firms frequently fail to implement CSR standards effectively (Michelon et al., 2016;Fransen and Kolk, 2007).
Hitherto, the failure to achieve compliance has largely been attributed to the firms that adopted CSR standards (Banerjee, 2011;Ählström, 2010;Dietz et al., 2019). However, there is mounting evidence that the design characteristics of the CSR standards themselves, specifically the accountability mechanisms they contain, should be considered an antecedent of later compliance or non-compliance (Simpson et al., 2012;Wijen, 2014;Milne and Gray, 2013;Fraser et al., 2020). Nevertheless, the design characteristics of CSR standards remain under-researched (Reinecke et al., 2012;Behnam and MacLean, 2011). This gap is becoming increasingly relevant in light of the growing importance of voluntary CSR standards in international CSR governance (Albareda and Waddock, 2017;Scherer and Palazzo, 2011;Waddock, 2008).
Extant studies (Slager et al., 2012;Windolph et al., 2014;Zinenko et al., 2015;Rasche, 2009;Behnam and MacLean, 2011;Grimm et al., 2020) on the design characteristics of CSR standards have largely consisted of (multi) case study designs and conceptual contributions. They provide important insights into the design characteristics of individual standards (Zinenko et al., 2015;Vigneau et al., 2015;Schuler and Christmann, 2011;Delmas and Montes-Sancho, 2011;Grimm et al., 2020) and/or theorize about the broader relevance of design characteristics to accountability (Wijen, 2014;Haack and Schoeneborn, 2015). An exploration of patterns and variance in design characteristics across a larger sample of CSR standards is still wanting, however.
The purpose of this paper is to explore to what extent CSR standards are designed for accountability. We review the design characteristics of 50 CSR standards in a systematic and comparative fashion. In doing so, we elucidate patterns in accountability mechanisms across a plethora of extant CSR standards. On the one hand, this approach serves to explore more widely the reliability of the results yielded by previous work on the design characteristics of CSR standards. On the other hand, we strive to build on existing work by expanding the current understanding of the design characteristics of CSR standards beyond individual cases.

SAMPJ
We find that different types of standards display significant variations in the accountability mechanisms they contain. Perhaps counter-intuitively, the type of standard setter does not seem to have a bearing on the degree to which standards promote accountability by design. Our data suggest that standard-setting initiatives may pursue different design strategies with regard to accountability mechanisms. Firstly, some standards focus on "comparability", i.e. the degree to which a standard enables comparisons across time and across adopters. Secondly, standards can be designed to enhance "measurability", i.e. the degree to which evaluation criteria and performance metrics are well-specified and quantifiable. The third strategy prioritizes "implementability", i.e. the degree to which guidance on effective implementation is available to support adopters in achieving compliance. Considering this structural variance is relevant because it may provide valuable clues for the design of future CSR standards and can inform more targeted research on their effectiveness.
The remainder of this paper is organized as follows: Section 2 reviews the literature on CSR standards and elaborates on their design characteristics. Section 3 presents the research design of the present study and describes the sampling process, coding strategy and data analysis methods used. Section 4 provides an overview of the main findings of the study, focusing specifically on the structural variation in the design characteristics of CSR standards. The findings are discussed and linked back to the literature in Section 5. We conclude by providing an outlook on opportunities for future comparative research on the design characteristics of CSR standards.
2. Literature review
2.1 Conceptualising corporate social responsibility standards from a neo-institutional perspective
This study applies a neo-institutional lens (Brammer et al., 2012) and conceptualizes CSR standards as part of the "set of legal, cultural and institutional arrangements that determine what [. . .] corporations can do, who controls them, [and] how this control is exercised" (Blair, 1995, p. 19). From a neo-institutional perspective, CSR standards are viewed as a non-legal ("soft") form of regulation for firm behaviour (Scherer and Palazzo, 2011;de Bakker et al., 2019). Voluntary in nature, they address sustainability issues, particularly in transnational arenas beyond the jurisdiction of individual nation-states (Potts et al., 2014;Schleifer, 2019;Schleifer et al., 2019;Fransen et al., 2019). The adoption of CSR standards has become ubiquitous amongst firms (Perego and Kolk, 2012;Schönherr et al., 2019) and may even reach quasi-mandatory status when a standard comes to dominate a market or industry (Blankenbach, 2016;Ponte, 2012;Sippl, 2015).
More specifically, Gilbert et al. (2011, p. 24) define CSR standards as "voluntary predefined rules, procedures and methods to systematically assess, measure, audit and/or communicate the social and environmental behaviour and/or performance of firms". An essential function of CSR standards is to define socially and environmentally desirable practices and outcomes and to make firms accountable to stakeholders for their actions and omissions with regard to these practices and outcomes (Bebbington, 2009;Behnam and MacLean, 2011;Schons and Steinmeier, 2016).
To effectively enhance accountability, institutional arrangements need to be designed in a way that imposes credible requirements on firm behaviour and ensures that adopters will generally fulfil these requirements (Grimm et al., 2020). Therefore, institutional arrangements require accountability mechanisms, which safeguard compliance with the "rules of the game". In other words, the design characteristics of the institutions that govern firm behaviour (CSR standards, in our case) are an important predictor of whether they are adopted only in form or also in function (Behnam and MacLean, 2011;Wijen, 2014;Rasche, 2009). By adopting this perspective, we look beyond the characteristics and behaviour of individual firms and focus instead on the antecedents of accountability that apply across firms.
2.2 Designing corporate social responsibility standards for accountability
By accountability mechanisms, we mean those parts of the institutional arrangement which ensure that rules are complied with (Behnam and MacLean, 2011;Vigneau et al., 2015;Wijen, 2014). When legal sanctions are absent, as is the case for CSR standards because of their fundamentally voluntary nature, other accountability mechanisms are essential to ensure effective implementation of institutional arrangements (O'Dwyer et al., 2011;Perego and Kolk, 2012). While such accountability mechanisms do not provide full control over the potential opportunistic behaviour of adopters, they are nevertheless considered essential for incentivizing compliance (Wijen, 2014). There is a general consensus in the literature that CSR standards may vary considerably in terms of how accountability mechanisms are reflected in their design (Gilbert et al., 2011;Reinecke et al., 2012;Wiengarten et al., 2016;Rasche, 2014). This literature also distinguishes several specific design characteristics geared towards enhancing accountability.
Firstly, this includes guidance on effective implementation (Rasche, 2009). Supplementing a standard with guidance on its effective implementation primarily addresses a lack of capacity amongst corporate adopters, who may not have the knowledge, skills or experience required to interpret a standard in the spirit in which it was conceived or to build the organizational structures needed to implement it fully (Wijen, 2014). Such guidance may take a variety of forms, ranging from checklists or written "implementation manuals" to training and even individualized, intensive consultancy services for adopters. Many standard setters also provide best practice examples or build peer communities to ensure that adopters are fully aware of how best to implement the requirements of a standard (Komives and Jackson, 2014).
Secondly, a large majority of CSR standards include some form of assessment to check whether adopters comply with the requirements of the standard. However, the specificity of the metrics of such assessments may vary. On the one hand, standards may allow for narrative accounts of compliance, which are highly flexible and provide the opportunity to explore detailed and unstructured data of all kinds (Dalal-Clayton and Bass, 2011). On the other hand, highly specific indicator-based assessments are considered the most useful approach to achieving measurable, transparent and comparable results; consequently, they are a better foundation for accountability (Esteves et al., 2012;de Ridder et al., 2007).
Thirdly, the sustainability issues covered by CSR standards are highly diverse, spanning a range from working conditions to biodiversity conservation. Concrete practices and their contribution to adequately addressing these issues are not always clear and/or observable (Tharani, 2019). For adopters, this poses the risk of not paying the required attention to issues that may be perceived as fuzzy (Rasche, 2009). Standards that systematically specify salient issues, prescribe adequate practices and provide clear evaluation criteria for assessing compliance are, therefore, more likely to be implemented as intended. As Wijen (2014, p. 308) states: "detailed codification offers clear guidance and limits the room for divergent interpretation, thereby reducing ambiguity and uncertainty".
Fourthly, and closely related to the standardization of sustainability issues, is the extent to which standards prescribe adequate practices and provide clear evaluation criteria for assessing compliance (Wijen, 2014;Bebbington, 2009). For instance, Grimm et al. (2020) show that compliance with a standard can be substantially influenced by the evaluative criteria used by the standard setter. Such criteria define what is considered successful implementation and provide a yardstick against which the performance of an adopter can be measured (Tharani, 2019). Consequently, the standardized specification of sustainability issues and evaluation criteria plays an important role in laying out what is actually required of corporate adopters and in ensuring that rules are specific, concrete and not easily subverted (Behnam and MacLean, 2011).
Fifthly, the verification of compliance, frequently discussed under the terms auditing in relation to management standards (Fraser et al., 2020;Pruett et al., 2005) or assurance in relation to reporting standards (O'Dwyer et al., 2011;Perego and Kolk, 2012), is arguably one of the best-documented accountability mechanisms in the scholarship on CSR standards. The increasing uptake of verification practices for greater accountability stems from classical accounting, where financial auditing has traditionally provided greater confidence in the accuracy and robustness of compliance claims (Milne and Gray, 2013). The literature on verification distinguishes between first-, second- and third-party verification of accountability claims. In the space of voluntary CSR standards, however, the effectiveness of extant verification measures is disputed (Pruett et al., 2005;Fransen and Kolk, 2007;Perego and Kolk, 2012), even though third-party verification is generally considered one of the strongest accountability mechanisms available (Terwindt and Armstrong, 2019). Therefore, some standards focus on improving verification rather than setting CSR performance requirements per se. This is the case, for instance, for the AA 1000 standard by AccountAbility (Rasche and Seidl, 2019;Rasche, 2014).
Sixthly, benchmarking can complement verification by auditors as a softer accountability mechanism. Benchmarking is intended as a peer control mechanism that enables stakeholders to compare information and performance scores against those of other standard adopters, standard requirements or best practices (van Kersbergen and van Waarden, 2004). Some standards particularly stress benchmarking as a mechanism for continuous improvement, both of adopters' compliance and of the standard itself (e.g. GLOBAL G.A.P., Hachez and Wouters, 2011).
Finally, for some CSR standards, the award of a label or certificate (on a product, facility, firm or supply chain) is the main purpose. However, certification options, both mandatory and voluntary, can also be used as an inducement for adopting firms to comply better with standard requirements (Chkanikova and Sroufe, 2020;Christmann and Taylor, 2006). For instance, the B Impact Assessment Standards can be used independently by firms to assess and improve their sustainability performance (Villela et al., 2019). However, a minimum score is needed to qualify for certification, which in turn provides a host of new benefits and additional legitimacy to adopters. Conversely, certification can be withheld or firms can be de-listed from public records of certificate holders, so that certification also functions as a sanction for non-compliance (Feng et al., 2016;Overdevest, 2010;Richards et al., 2017).

Classifying corporate social responsibility standard types and standard setters
The extant literature notes that there is substantial structural variation amongst the plethora of existing CSR standards (Marx, 2013;Fransen et al., 2019;Reinecke et al., 2012). Consequently, several attempts have been made to develop useful taxonomies and classification systems (Behnam and MacLean, 2011;Rasche, 2009;Brunsson et al., 2012;Timmermans and Epstein, 2010). This is a difficult endeavour, however, not least because new standards tend to incorporate elements of their predecessors to form new hybrids, which defies any attempt to develop mutually exclusive categories (Rasche, 2014).
One vector along which CSR standards can be classified is the main purpose for which they are developed. This is useful because the purpose of a CSR standard predetermines the specific design characteristics required to attain that purpose. Drawing on Rasche (2014), we distinguish four types of standards.
Firstly, principle-based standards are amongst the longest-established CSR standards, with the first examples appearing as early as the 1970s. As their name suggests, they strive to formulate basic principles of responsible corporate conduct, sometimes loosely coupled with a list of desired outcomes (Rasche, 2009, 2014). Examples of this type of standard are the AA1000 AccountAbility Principles Standard (Rasche and Seidl, 2019), the OECD Guidelines for Multinational Enterprises (Reinert et al., 2016;Liberti, 2012) or the Caux Round Table Principles for Moral Capitalism (Carroll, 2013).
Secondly, certification standards aim to monitor design, production and trade practices with a view to awarding labels (for products) or certificates (for facilities or entire firms) that firms can use to signal to stakeholders (e.g. consumers, supply chain partners, investors) that they conform to ethical, environmental and social requirements (Boiral and Gendron, 2011;Richards et al., 2017). The Forest Stewardship Council (Overdevest, 2010;Sippl, 2015), SA8000 (Sartor et al., 2016), as well as the Fairtrade Standards (Schuler and Christmann, 2011), are widely adopted examples of this standard type.
Thirdly, some standards focus on processes and management practices related to CSR, without awarding labels or certificates. They are geared towards enabling adopters to create appropriate governance and management structures for the discharge of their social and environmental responsibilities and are frequently accompanied by guidance on best practices (Zinenko et al., 2015;Rasche, 2014). ISO 26 000 (Hahn, 2013;Popa and Dabija, 2019) and the Natural Capital Protocol (Whitaker, 2018) are apt illustrations of this standard type.
Fourthly, the increasing adoption of non-financial reporting by firms has led to the emergence of reporting standards (Burritt and Schaltegger, 2010;Tschopp and Nastanski, 2014). These standards aim to harmonize reporting practices across firms, sectors and regions to generate more comparable information that can be used by stakeholders to hold firms accountable (in the same way that accounting standards provide relevant information enabling shareholders to hold managers accountable) (Christensen et al., 2019;Milne and Gray, 2013). The Global Reporting Initiative's sustainability reporting standards (Vigneau et al., 2015;Pope and Lim, 2020) and the International Integrated Reporting Council's Integrated Reporting Framework (Kannenberg and Schreck, 2019;Vaz et al., 2016) are arguably the most prominent examples of this type.
A second vector along which CSR standards may be classified is the type of standard-setting organization or initiative (Potts et al., 2014;Gilbert et al., 2011;Reinecke et al., 2012). The organizations and initiatives that develop CSR standards may take a variety of forms and involve different stakeholder groups (de Bakker et al., 2019;Christmann and Taylor, 2002). Several studies suggest that the type of standard setter (also termed "sponsor", Carmin et al., 2003) influences the inclusivity of the standard design process and consequently the design characteristics of the final published standard (Prakash and Potoski, 2007;Terlaak, 2007;Tschopp and Nastanski, 2014). For instance, some authors have argued that standard-setting processes spearheaded by business (including business associations, business networks and consultancies) tend to be limited by their focus on ensuring that issues relevant to the firms themselves, rather than issues of societal and environmental relevance, are reflected in the design of the standard (Carmin et al., 2003;Schuler and Christmann, 2011). Some scholars argue that this has resulted in a "race to the bottom" with regard to accountability mechanisms: standards advanced by businesses compete for adopters by lowering the bar relative to the more stringent standards provided by other types of standard setters, such as non-profit or international organizations (Reinecke et al., 2012;Fransen et al., 2019).
For the purposes of this study, we distinguish between five different types of standard-setting initiatives (Table 1). Business-led initiatives include for-profit organizations as well as organizations exclusively representing the interests of firms (e.g. business clubs and business associations). In contrast, non-profit organizations (NGOs) cover all non-profit and non-governmental organizations, except those exclusively representing business interests. The third type of standard setter comprises international organizations (IGOs), which differ from NGOs in that they are public, intergovernmental bodies. Increasingly, standards are developed by multi-stakeholder initiatives (MSIs), which involve two or more different types of stakeholders (e.g. business, civil society, governments or IGOs). Finally, the above-mentioned types of standard setters may also partner to develop new standards for a limited period while maintaining their independence as separate entities. We call such initiatives partnership initiatives (PIs).

Research design and methods
The research design presented hereafter is geared towards an open-ended exploration of the design characteristics of CSR standards. More specifically, it aims to provide insights into the degree to which CSR standards differ from each other regarding the accountability mechanisms they contain. For this purpose, we apply a mixed-methods research design (Grafton et al., 2011) drawing on an inventory of 50 CSR standards. To our knowledge, this is the first exploratory comparative review covering such a large sample of CSR standards.

Sample
We inventoried extant CSR standards by following a purposive sampling approach. This qualitative approach is particularly useful for achieving a degree of representativeness and comparability across a population that is difficult to define in its totality (Teddlie and Yu, 2017). Given that, to our knowledge, a complete database of CSR standards does not exist, we compiled a dedicated inventory of some 50 standards for the purpose of this study (Table 2). The inventory drew on three distinct sources: (1) we reviewed the academic literature on CSR standards (Table 1), [. . .] The CSR standards included in the final sample were selected to generate the maximum variety of individual cases to enable a rich and fruitful comparative assessment (Teddlie and Tashakkori, 2010). More specifically, we selected CSR standards offered by different types of standard setters (business, non-profit organizations, international organizations, and multi-stakeholder and partnership initiatives) and CSR standards of all four standard types (principle-based, certification, process and reporting standards). To ensure a basic level of comparability, we included only standards that are applicable at the international level (i.e. no standards specifically developed for a single national or regional context) and standards that can be adopted and/or implemented by firms directly (i.e. no ratings or indices), and we avoided standards that are only applicable to one type of product (notwithstanding, some sector-specific standards were included). In addition, we focused on non-proprietary standards to ensure a comparable level of access to information. The sample contains CSR standards first released as early as the 1970s (e.g. the ILO's labour standards) but also includes recently issued exemplars that have received significant attention (e.g. the Natural Capital Coalition's Natural Capital Protocol). As many standards are regularly revised and updated, we consistently used the latest available version of each CSR standard for our research.

Table 1. Types of standard setters

Non-profit organization (NGO)
Includes non-governmental and non-profit organizations (except for those exclusively representing the interests of for-profit organizations). Examples: Oxfam International and Fairtrade Labelling Organizations International

International organization (IGO)

Multistakeholder initiative (MSI)
Includes initiatives and organizations involving two or more different types of stakeholders (such as business, civil society, governments or international organizations)

Partnership initiative (PI)
Includes partnerships of at least two organizations from amongst the above-mentioned sponsor types, who collaborate on a specific standard-setting project but also continue to exist as separate entities. Example: Gender Equality Principles Initiative

Coding strategy
Each CSR standard was attributed to exactly one of the four standard types, based on its stated purpose, and to exactly one category of standard setter, based on the organization or initiative stated as issuing the standard (Table 1). Both variables were coded in binary form (1 = part of the group; 0 = not part of the group).
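The binary group coding described above can be illustrated with a minimal Python sketch. It assumes, as stated, that each standard belongs to exactly one standard type and one category of standard setter. The standard names and their assignments are hypothetical placeholders rather than entries from the study's coding matrix, and the helper `binary_code` is introduced here for illustration only.

```python
# Minimal sketch of binary (0/1) group coding. The records below are
# hypothetical illustrations, not the study's actual coding matrix.
STANDARD_TYPES = ["principle-based", "certification", "process", "reporting"]
SETTER_TYPES = ["business", "NGO", "IGO", "MSI", "PI"]


def binary_code(value, categories):
    """Return one indicator per category: 1 = part of the group, 0 = not."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {cat: int(cat == value) for cat in categories}


# Each (hypothetical) standard is attributed to exactly one type and one setter.
standards = {
    "Standard A": ("certification", "NGO"),
    "Standard B": ("reporting", "MSI"),
}

coded = {
    name: {**binary_code(stype, STANDARD_TYPES), **binary_code(setter, SETTER_TYPES)}
    for name, (stype, setter) in standards.items()
}

for name, row in coded.items():
    print(name, row)
```

Because each standard receives exactly one 1 per variable, the indicators within each block always sum to one. This provides a quick consistency check on the attribution.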
The individual standards were then deductively coded for the seven accountability mechanisms described in the literature review (Table 3), namely, the guidance on effective implementation, the specificity of metrics, the standardization of issues covered, the standardization of evaluation criteria, the verification mechanisms, benchmarking and the availability of certification options. We reviewed the standard texts as well as the supporting documentation and used the information contained therein to build our coding matrix.
We used magnitude coding to assign distinct levels of intensity to each accountability mechanism (Saldaña, 2016). This type of coding affixes an additional alphanumeric code to an established qualitative category and thus transforms the data in such a way that the codes can be used for quantitative statistical analyses. For our purposes, we coded our data along a five-point scale, with 1 indicating the lowest and 5 indicating the highest level of intensity for each accountability mechanism (Saldaña, 2016). Missing values were coded as 0 (Table 3). For each accountability mechanism, all 50 standards were independently coded by two researchers. Where the coding diverged, the respective cases were discussed between the coders until agreement was reached. The resulting coding matrix contains a distinct design profile for each CSR standard and was used as the basis for the comparative assessment.
Table 3. Codebook including design characteristics and assigned levels of intensity
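The double-coding step can be sketched as follows. This minimal Python illustration assumes that each standard's profile is stored as a list of seven integer codes (0 for a missing value, 1 to 5 for intensity) in a fixed mechanism order. The mechanism labels, example codes and helper names are hypothetical. The sketch only lists the cells on which two coders diverge, so that they can be discussed, and reports raw percentage agreement. Because the paper does not report an agreement coefficient, none is computed here.

```python
# Minimal sketch: compare two coders' magnitude codes and flag divergences.
# Codes: 1 = lowest, 5 = highest intensity; 0 = missing value.
# Mechanism labels and all example codes are hypothetical.
MECHANISMS = [
    "implementation guidance", "metric specificity", "issue standardization",
    "criteria standardization", "verification", "benchmarking", "certification",
]
VALID_CODES = range(0, 6)


def check_codes(matrix):
    """Reject profiles with the wrong length or codes outside 0-5."""
    for name, codes in matrix.items():
        if len(codes) != len(MECHANISMS):
            raise ValueError(f"{name}: expected {len(MECHANISMS)} codes")
        if any(c not in VALID_CODES for c in codes):
            raise ValueError(f"{name}: codes must be integers from 0 to 5")


def divergences(coder_a, coder_b):
    """List (standard, mechanism, code_a, code_b) for every disagreement."""
    out = []
    for name in coder_a:
        for mech, a, b in zip(MECHANISMS, coder_a[name], coder_b[name]):
            if a != b:
                out.append((name, mech, a, b))
    return out


coder_a = {"Standard A": [4, 3, 5, 4, 5, 2, 5], "Standard B": [2, 4, 4, 3, 1, 3, 0]}
coder_b = {"Standard A": [4, 3, 5, 4, 5, 3, 5], "Standard B": [2, 4, 4, 3, 1, 3, 0]}

check_codes(coder_a)
check_codes(coder_b)
diffs = divergences(coder_a, coder_b)
total_cells = len(MECHANISMS) * len(coder_a)
agreement = 1 - len(diffs) / total_cells
print(f"raw agreement: {agreement:.1%}")
for d in diffs:
    print("discuss:", d)
```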

Data analysis
All statistical analyses were carried out using SPSS. Firstly, we conducted a descriptive analysis of the frequency distributions and mean values. Mean, minimum and maximum values, as well as standard deviations, were computed for each standard type and for each group of standard setters to prepare the data for further analysis.
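A minimal Python sketch of these descriptive statistics follows. It assumes that each standard is first summarized by the mean of its seven magnitude codes and that the group statistics are then computed over these per-standard means. This aggregation step is an assumption made for illustration, and the records are hypothetical. The sketch uses the sample standard deviation (n − 1 denominator), which is what SPSS reports and what `statistics.stdev` computes.

```python
import statistics
from collections import defaultdict

# Hypothetical per-standard records: (group label, seven magnitude codes).
records = [
    ("certification", [4, 3, 5, 4, 5, 2, 5]),
    ("certification", [3, 4, 3, 3, 4, 2, 4]),
    ("reporting", [2, 5, 4, 4, 3, 4, 1]),
    ("reporting", [2, 4, 4, 3, 2, 3, 1]),
]


def describe_by_group(rows):
    """Mean, min, max and sample SD of per-standard mean scores, by group."""
    groups = defaultdict(list)
    for group, codes in rows:
        # Assumption: each standard is summarized by the mean of its codes.
        groups[group].append(statistics.mean(codes))
    summary = {}
    for group, scores in groups.items():
        summary[group] = {
            "n": len(scores),
            "mean": statistics.mean(scores),
            "min": min(scores),
            "max": max(scores),
            # Sample SD (n - 1 denominator); undefined for a single case.
            "sd": statistics.stdev(scores) if len(scores) > 1 else float("nan"),
        }
    return summary


summary = describe_by_group(records)
for group, stats in summary.items():
    print(group, {k: round(v, 2) for k, v in stats.items()})
```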
Secondly, we carried out one-way analyses of variance (ANOVAs). This approach is particularly appropriate for comparing groups and detecting significant differences between them (Ross and Willson, 2017). We ran a separate ANOVA for each of the two independent variables (standard type and standard setter) to explore whether standards belonging to different groups differed significantly in terms of their accountability mechanisms.
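The F statistic behind a one-way ANOVA can be sketched in a few lines of plain Python. This is a textbook implementation (the between-group mean square divided by the within-group mean square), not the SPSS procedure itself, and the example groups are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of cases around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-standard mean scores for three groups of standards.
groups = [[1, 2, 3], [4, 5, 6], [2, 3, 4]]
f_stat, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

The degrees of freedom follow from k groups and N cases as (k − 1, N − k). Four standard types across 50 standards give (3, 46), and five standard-setter categories give (4, 45). If SciPy is available, `scipy.stats.f_oneway(*groups)` returns the same F statistic together with its p-value.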
Thirdly, to explore our data further, we elicited the relative importance and weight of each accountability mechanism through a principal component analysis (PCA) (Vidal et al., 2016). We first built a correlation matrix of the seven accountability mechanisms and then extracted its initial eigenvalues, identifying the principal components that explained the largest share of variance within our sample. The most frequently used approach for the extraction of pertinent components is the "root greater than one" criterion (Jolliffe and Cadima, 2016). Originally suggested by Kaiser (1960; cited in Cliff, 1988), this criterion retains only those components whose eigenvalues are greater than one. The reasoning behind this criterion is that an eigenvalue of less than one implies that scores on the component have negative reliability (Cliff, 1988).
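The extraction step can be sketched with NumPy in three steps: build the correlation matrix of the coded mechanisms, eigen-decompose it, and keep the components whose eigenvalues exceed one. The synthetic data below (50 hypothetical standards and seven variables driven by two latent factors) merely stand in for the coding matrix, and the function name is illustrative. Because a correlation matrix has ones on its diagonal, its eigenvalues sum to the number of variables. Dividing each eigenvalue by seven therefore gives that component's share of total variance.

```python
import numpy as np


def kaiser_pca(data):
    """PCA on the correlation matrix, retaining components with eigenvalue > 1."""
    corr = np.corrcoef(data, rowvar=False)        # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                          # Kaiser "root greater than one"
    explained = eigvals / eigvals.sum()           # share of total variance
    # Unrotated loadings: eigenvectors scaled by the square root of eigenvalues.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, explained, loadings


# Hypothetical data: 50 standards x 7 mechanisms, generated from two latent factors.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 50))
noise = 0.3 * rng.normal(size=(50, 7))
data = np.column_stack([f1] * 4 + [f2] * 3) + noise

eigvals, explained, loadings = kaiser_pca(data)
print("eigenvalues:", np.round(eigvals, 2))
print("components retained:", loadings.shape[1])
```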
Subsequently, the components fulfilling this criterion were used to calculate rotated component loadings for all seven accountability mechanisms, using varimax rotation with Kaiser normalization (Table 8). The rotated component matrix, whose entries are often referred to as loadings, contains estimates of the correlations between each variable and the estimated components. Loadings above 0.5 are usually considered acceptable, whereas loadings below 0.4 are considered trivial (Jolliffe and Cadima, 2016).
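Varimax rotation with Kaiser normalization can be sketched as below, using the common SVD-based iteration. Each row of the loading matrix is first divided by the square root of its communality. The orthogonal rotation is then iterated until the varimax criterion stops improving, and finally the rows are rescaled. The input loadings are hypothetical and the function names are illustrative. The label "borderline" for loadings between 0.4 and 0.5 is also introduced here for illustration, since the text only names the two outer thresholds. SPSS may differ in convergence details, so its results can deviate slightly from this sketch.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation with Kaiser normalization.

    Returns the rotated loadings and the rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    h = np.sqrt((L ** 2).sum(axis=1, keepdims=True))  # square roots of communalities
    A = L / h                                         # Kaiser normalization of rows
    R = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        B = A @ R
        # SVD update step for the varimax criterion (gamma = 1).
        target = B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(A.T @ target)
        R = u @ vt
        new_criterion = s.sum()
        if criterion > 0 and new_criterion / criterion < 1 + tol:
            break
        criterion = new_criterion
    return (A @ R) * h, R                             # undo the normalization


def classify(loading):
    """Apply the 0.5 / 0.4 thresholds; 'borderline' is an illustrative label."""
    size = abs(loading)
    if size > 0.5:
        return "acceptable"
    if size < 0.4:
        return "trivial"
    return "borderline"


# Hypothetical unrotated loadings: 7 mechanisms x 2 retained components.
unrotated = np.array([
    [0.7, 0.5], [0.8, 0.4], [0.6, 0.6], [0.5, -0.6],
    [0.4, -0.7], [0.3, -0.6], [0.6, 0.1],
])
rotated, R = varimax(unrotated)
for row in rotated:
    print(np.round(row, 2), [classify(x) for x in row])
```

Because the rotation is orthogonal, each mechanism's communality (its row sum of squared loadings) is unchanged. Only the distribution of loadings across components shifts, which makes the rotated matrix easier to interpret.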

Limitations
Our research design also has some limitations. Our analysis was limited to seven design characteristics pertinent to accountability, which does not rule out that other design characteristics also play a role. In the same vein, our analysis only allows a comparison of accountability mechanisms amongst the sampled standards, which means that the findings of this study cannot be generalized beyond our sample. More work is needed to establish whether the relationships derived from the stock of available case studies and conceptual work hold up to empirical testing and can be generalized across CSR standards.
Table 4 presents a univariate, descriptive analysis of the sample by standard type and standard setter. Principle-based, certification and process standards are well represented, with 13 (26%), 14 (28%) and 18 exemplars (36%), respectively. As there are fewer widely applied reporting standards than other standard types, reporting standards are somewhat under-represented in the sample, with five exemplars (10%). In addition, the sample reflects the full variety of potential standard setters, with 11 exemplars (22%) issued by businesses, 12 (24%) offered by non-profit organizations (NGOs), 14 (28%) advanced by international organizations (IGOs), eight (16%) provided by multi-stakeholder initiatives (MSIs) and five (10%) by partnership initiatives (PIs).

Findings
The mean values, ranging from 2.12 to 3.28, show variation between the levels of intensity of the seven coded design characteristics for different standard types. This is to say that the higher the mean values, the better equipped the standards are to promote accountability. The variation between CSR standards from different standard setters ranges from 2.46 to 3.20. A higher mean value means a higher level of intensity of the respective accountability mechanism. Overall, certification and reporting standards show the highest mean values. Amongst the different types of standard setters, non-profit organizations advanced standards with the highest mean value.
One-way ANOVAs were conducted to explore the variance in accountability mechanisms between different types of standards as well as between standards advanced by different sponsors. We expected the variance between the four types of standards to be significant. Our analysis (Table 5) confirms this expectation (F(3, 46) = 8.998; p < 0.001).
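A minimal sketch of the one-way ANOVA F statistic, assuming that each standard is summarized by the mean intensity of its seven coded mechanisms before the groups are compared. The degrees of freedom are k − 1 between groups and N − k within groups. With four standard types and 50 standards this gives F(3, 46); with five types of standard setter it gives F(4, 45). The p-value requires the tail of the F distribution, which the Python standard library does not provide. SciPy's `scipy.stats.f_oneway` returns both the statistic and the p-value. The helper names here are hypothetical.

```python
from statistics import mean

def standard_mean(codes):
    """Mean intensity of one standard across its coded accountability mechanisms."""
    return mean(codes)

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    observations = [x for g in groups for x in g]
    grand_mean = mean(observations)
    k, n = len(groups), len(observations)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For the toy groups `[[1, 2, 3], [4, 5, 6]]`, the function returns an F of 13.5 with 1 and 4 degrees of freedom.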
Certification standards (M = 3.28) tend to display the highest levels of intensity across all design characteristics. This is to say they are better equipped to ensure compliance than the other standard types under investigation. The group of certification standards also includes the case with the highest mean value in the sample (M = 4.43), namely the SA 8000 standard by Social Accountability International.
We were also interested in exploring whether CSR standards vary according to the type of standard setter, as suggested by the literature. Accordingly, CSR standards by independent standard setters such as non-profit organizations were expected to display higher levels of intensity of the investigated accountability mechanisms. While the descriptive statistics showed that NGOs produce standards with higher mean values than other standard setters (Table 6), this expectation was not confirmed: the variance between standards from different standard setters was not significant (F(4, 45) = 1.779; p = 0.150).
Finally, the literature also suggests that CSR standards vary in terms of their specific combination of accountability mechanisms, irrespective of standard type or sponsor. We conducted a PCA to explore which factors contributed most strongly to the variance in accountability mechanisms between CSR standards. Table 7 presents the initial eigenvalues of the correlation matrix. Three reliable components with eigenvalues greater than 1 can be identified, which explain 63.06% of the total variance within the sample.
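In a PCA on the correlation matrix of p standardized variables, each variable contributes unit variance. The total variance therefore equals p, and the share explained by a component is its eigenvalue divided by p. The following worked check makes two assumptions. First, p = 7, one variable per coded accountability mechanism. Second, the per-component percentages reported for Components 1 to 3 refer to the rotated solution. Rotation redistributes variance among the retained components but does not change the total they jointly explain.

```latex
\begin{aligned}
\text{share}_k &= \frac{\lambda_k}{p}, \qquad \sum_{j=1}^{p} \lambda_j = \operatorname{tr}(\mathbf{R}) = p = 7,\\
29.93\,\% + 18.73\,\% + 14.40\,\% &= 63.06\,\%
\;\Rightarrow\; \sum_{k=1}^{3} \lambda_k \approx 0.6306 \times 7 \approx 4.41 .
\end{aligned}
```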
The rotated component scores (Table 8) reveal that "standardized evaluation criteria", "verification" and "certification" load onto Component 1. "Metrics" and "standardized issues" load onto Component 2. "Guidance on effective implementation" is the primary contributor to Component 3. All other values were below the threshold (<0.5) and are not displayed for the sake of clarity. Benchmarking did not contribute substantially to any of the components and can, therefore, be considered negligible in explaining the variance between CSR standards.
Component 1 explains 29.93% of the variance between CSR standards. The accountability mechanisms loading onto this component all help ensure that the degree of compliance and effective standard implementation is objectively verifiable, can be assessed in a standardized manner and is comparable across adopters. We, therefore, name this component comparability (Table 9). CSR standards found to display a particularly high degree of comparability include the GLOBAL G.A.P. standards, a global quality and sustainability certification system for the agricultural sector (normalized PCA score of 1.000); the FLO Fairtrade standards by Fairtrade Labelling Organizations International (normalized PCA score of 0.878); and the ISO 14046 environmental management standard for water footprinting (normalized PCA score of 0.866).
Component 2 explains 18.73% of the variance between CSR standards. The design characteristics contributing to this component refer to the degree to which the sustainability issues addressed by a standard are well specified and the outcomes related to these issues can be quantitatively measured. We, therefore, use the term measurability (Table 9) to describe this component.
CSR standards that perform particularly well for measurability include the B Impact Assessment, a set of sustainability standards for firms that systematically strive to generate social and/or environmental benefits in addition to economic returns (normalized PCA score of 1.000); the greenhouse gas (GHG) Protocol's Corporate Standard for assessing GHG emissions (normalized PCA score of 0.974); as well as the Oxfam Poverty Footprint, a standard for assessing corporate impacts on poverty (normalized PCA score of 0.900).
Component 3 explains 14.4% of the variance between the investigated CSR standards. Only one factor, namely "guidance on effective implementation", substantially loads onto this component. Guidance on effective implementation is geared towards building capacity amongst adopters for achieving compliance with the spirit and the letter of CSR standards. We, therefore, name this component implementability (Table 9). Standards that obtained particularly high scores for implementability include the BSCI Code of Conduct by the Business Social Compliance Initiative (normalized PCA score of 1.000); the Natural Step by TNS International (normalized PCA score of 0.840); and the Higg Index, a standard for social and environmental performance in the textile sector developed by the Sustainable Apparel Coalition (normalized PCA score of 0.801).
When comparing the normalized PCA scores across the individual components, we find that 24 (48%) of the sampled CSR standards perform well, with a score of 0.8 or higher, on at least one component. However, only 2 (4%) of the sampled standards obtain a score of 0.8 or higher on two components, and none of the sampled standards scores 0.8 or higher on all three components.
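The section does not state how the normalized PCA scores were computed. The fact that the top scorer on each component receives 1.000 suggests a rescaling per component. The sketch below assumes min–max normalization within each component, although dividing by the maximum would also assign 1.000 to the top scorer. It then counts how many standards reach the 0.8 threshold on at least one, two or three components. The function names are hypothetical.

```python
def min_max(scores):
    """Rescale one component's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def high_score_profile(component_scores, threshold=0.8):
    """Count standards scoring >= threshold on at least 1, 2, ... components.

    component_scores: one list of raw scores per component, aligned by standard.
    """
    normalized = [min_max(scores) for scores in component_scores]
    n_standards = len(normalized[0])
    # Number of components on which each standard reaches the threshold.
    hits = [sum(comp[i] >= threshold for comp in normalized) for i in range(n_standards)]
    return {k: sum(h >= k for h in hits) for k in range(1, len(normalized) + 1)}
```

Under this assumption, the counts reported above (24, 2 and 0 standards) would correspond to the values for keys 1, 2 and 3 of the returned dictionary.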

Discussion and conclusions
Prior research on accountability in the context of CSR standards has mainly focused on the behaviour of firms who adopt them (Windolph et al., 2014). In contrast, this study explores the accountability mechanisms embedded in the design of CSR standards as antecedents of (non-)compliance within firms (Auld and Gulbrandsen, 2010; Behnam and MacLean, 2011; May, 2007). Our study broadens the scope of analysis to a sample of 50 CSR standards that cover the spectrum of standard types and standard-setting initiatives currently available (Fransen et al., 2019). We ask: to what extent are CSR standards designed for accountability? Our findings confirm that principle-based, certification, process and reporting standards vary significantly in terms of their design characteristics and related accountability mechanisms. This is in line with classifications that have been advanced previously in conceptual and case-study-based contributions (Behnam and MacLean, 2011; Gilbert et al., 2011). The intensity of accountability mechanisms contained within CSR standards varies according to the type of standard. Certification standards as a group displayed a significantly higher intensity in accountability-related design characteristics than any other group and comprise the standard with the single highest mean value in the sample, SA 8000 (Sartor et al., 2016). Principle-based standards as a group displayed the lowest intensity across all design characteristics, with the lowest mean value for the ILO Declaration on Fundamental Principles and Rights at Work (Islam and McPhail, 2011).
In light of this finding, the question arises: are certification standards generally preferable over principle-based standards when it comes to accountability? As both the highest and lowest intensity values are held by CSR standards focusing on labour issues, an interpretation based on a direct comparison between these two exemplars is possible. The certification standard SA 8000 was the first auditable CSR standard focusing on labour, health and safety issues globally (Sartor et al., 2016). According to Behnam and MacLean (2011), it stands out amongst CSR standards because it provides specific implementation guidance, presents sanctioning mechanisms for non-compliance, requires evidence of performance and exhibits high costs of adoption (Sartor et al., 2016). In contrast, the principle-based standard ILO Fundamental Principles is easy to adopt because it does not entail costly assessments or certification processes. This suggests that, between the two standards, SA 8000 is the one that is designed for accountability.
Yet SA 8000 suffers from what Rasche (2010) calls the limit of standardization in the CSR space: "[. . .] [standards] are never sufficient to take into account the contextuality and singularity that genuine corporate responsibility calls for. At best, standards can give corporations an idea about where reflections need to start and which issues are at stake. At worst, standards promote a 'going-by-the-book' and 'tick-the-boxes' attitude towards corporate responsibility, which has a marginal, if any, effect on real-life practices."
Considered from this perspective, the ILO Fundamental Principles might well be better suited than SA 8000 to encourage reflection on labour issues and practices (Islam and McPhail, 2011). For future work, this implies that when it comes to "hard" facts on accountability mechanisms, the standard type is clearly an important variable to consider. However, this does not preclude that CSR standards with a "softer" stance on accountability mechanisms have merits as well, especially when it comes to sustainability issues that are contested or newly emerging.
Secondly, we analysed whether the type of standard setter has some bearing on the extent to which accountability mechanisms are embedded into the design of CSR standards (Carmin et al., 2003). However, this proposition was not supported by the data. Rather, many of the CSR standards displaying a high intensity across the coded accountability mechanisms were issued by business sponsors. This is particularly interesting, as business initiatives are frequently suspected of putting business interests first and building communication smokescreens rather than investing in substantive accountability (Milne and Gray, 2013; Vigneau et al., 2015). Our analysis paints a rather different picture.
Overall, there was no significant difference in terms of accountability mechanisms between standards offered by businesses, non-profit organizations, international organizations and multi-stakeholder or partnership initiatives. However, some scholars argue that an ongoing trend towards an increasing multiplicity of standards in the same sectors and issue areas has led to competition amongst standard setters and has enabled incumbents to choose the standard that requires the least change in practices (Fransen et al., 2019; Reinecke et al., 2012; Chkanikova and Sroufe, 2020). This suggests that researchers suspecting a "race to the bottom" in CSR standards should not merely look at the standard-setting organization or initiative. Rather, the number of available standards in a given sectoral or supply chain context might be a better starting point.
Thirdly, there are specific combinations of accountability mechanisms that matter for explaining the variance between CSR standards. We identified three principal components, which we named "comparability", i.e. the degree to which a standard enables comparisons across time and across adopters; "measurability", i.e. the degree to which evaluation criteria and performance metrics are well-specified and quantifiable; and "implementability", i.e. the degree to which guidance on effective implementation is available to support adopters in achieving compliance.
The GLOBAL G.A.P. standards performed best on comparability. This standard is implemented largely in international agricultural value chains with uneven terms of trade between smallholders in the global south and exporters and large retailers in the global north (Otieno et al., 2017). Because of this sectoral context, the standard is mainly intended to diffuse sustainable and socially responsible good practices. These practices are standardized, and substantive guidance on implementation is provided. This fosters close collaboration along supply chains and thus has implications beyond the individual firms that adopt the standards. Operating in a similar context, the FLO Fairtrade Standards and the FAO's SAFA Guidelines also perform exceptionally well on comparability.
It is notable that reporting standards, which explicitly aim for comparability in CSR disclosures, are not amongst the top scorers. One possible interpretation of this finding is that the ambition to achieve universal applicability across sectors makes it difficult to clearly define, measure and evaluate sustainability outcomes so that they are applicable across sector contexts, yet still relevant and useful for individual firms. This suggests that comparability might be easier to achieve within a specified sector context, where the salient sustainability issues are well defined. In this context, it is interesting to note that sponsors of reporting standards do increasingly account for sectoral specificities in their standards (e.g. the GRI Standards or the SASB Standards).
Measurability was most pronounced in the B Corp Certification standards. The standards are integrated into a comprehensive and regularly revised B Impact Assessment Tool, which is geared towards enabling participating firms to measure and improve their sustainability impacts (Villela et al., 2019). In the same vein, other standards performing well on measurability, for instance the GHG Protocol (Hickmann, 2017), also stress their reliance on science-based targets, well-defined indicators and robust quantitative assessments. This suggests that, for high-scoring CSR standards in this area, measurability is a deliberate design focus adopted by standard-setting initiatives when developing the standards, possibly even at the expense of other accountability mechanisms.

SAMPJ
The Natural Step and the Higg Index both achieved high scores for implementability. They contain elaborate guidance, consultancy and capacity-building services as well as training opportunities for firms wishing to implement them. As such, these standards place emphasis on ensuring that adopting firms are well-equipped to understand the issues at stake, to translate them into their corporate context and to implement them in daily practice. At the same time, other accountability mechanisms display relatively low intensity levels in these standards. As Rasche (2009, p. 201) notes, a standard sometimes "is not designed as an enforcement tool for global rules, but reflects a learning network that fosters their implementation and dissemination". This might be the case for standards that favour implementability.
For practice, our findings suggest that there are different design strategies that standard-setting organizations and initiatives can pursue. The sampled standards generally do not perform well on more than one component. It might be necessary to choose either comparability, measurability or implementability as the focal goal in standard design to achieve excellence (in our study, this means a normalized PCA score of 0.8 or higher) [1]. This chimes with Wijen (2014), who argues that there may be irreconcilable trade-offs between these different factors.
However, there is also one example that displays a more balanced performance. The BSCI Code of Conduct, a process standard focusing on labour issues along the supply chains of retail, brand and importing firms, achieves medium to high scores (>0.5) on all three components. The standard combines a number of different accountability mechanisms such as external verification, network collaboration and technical assistance for standard implementation. In addition, it links up with other related standards in the field (e.g. SA 8000) with the ultimate objective of creating consistency and harmonization in firm practices (Terwindt and Armstrong, 2019). This shows that it is possible, albeit rare, for standard-setting initiatives to adopt a design strategy that successfully addresses all three components.
For research, our study affirms the need to continue investigating and comparing the design characteristics of larger groups of standards from different perspectives (Wiengarten et al., 2016; Zinenko et al., 2015). No "best CSR standard" can be determined based on our analysis; we posit that our findings should be interpreted carefully, with consideration for the specific purpose for which each standard was designed. Nevertheless, our findings suggest that future studies would do well to consider standard design more explicitly when examining the performance of CSR standards, not only with regard to accountability but also with regard to other important performance criteria such as legitimacy (O'Dwyer et al., 2011; Richards et al., 2017), contributions to organizational transformation (Martinuzzi and Krumay, 2013) or substantial contributions to sustainable development (Schönherr et al., 2017; Milne and Gray, 2013), to name but a few. This study should be considered a first step towards such a broader consideration of CSR standards. It provides a first comparative exploration of a substantive set of CSR standards and the extent to which accountability mechanisms are incorporated into their design.