Attitudes toward organic products: a cross-national comparison and scale validation
Actitudes hacia los productos orgánicos: una comparación y validación internacional de una escala
SJME 24,1

Purpose – This study aims to examine the formal and metric properties of Gil et al.'s (2000) scale of attitudes toward organic products, which is the most popular scale to measure these attitudes.
Design/methodology/approach – The sample consisted of 4,992 household shoppers living in Hong Kong, Germany, Norway, Spain and the UK. The questionnaire was distributed using a third-party consumer panel, and the fieldwork was conducted using computer-assisted Web interviewing. The approach was based on confirmatory factor analysis and measurement of invariance, as well as format analysis using a wording-syntactic and semantic descriptive method.
Findings – The scale reflects an attitude-toward-object model approach. Its use has varied considerably (in terms of wording, item semantics and the attributes to be measured). A two-factor structure that meets the metric conditions (reliability and validity) is found. However, the analysis of invariance shows that the scale behaves differently in different countries.
Research limitations/implications – This scale offers a good starting point for measuring attitudes toward organic products. However, it requires refinement to adapt to consumer evolution and improve its metric validity. Verification of its applicability in cross-national studies is recommended.
Originality/value – To the best of the authors' knowledge, this is the first study that assesses the format and quantitative characteristics of this scale on a cross-national level. For scholars and companies with international interests, preventing the use of scales with poor properties at the transnational level can improve the design of future studies and save money through a more informed choice of attitudinal scale.


Introduction
Over the past two decades, the organic product market has grown considerably year on year across all countries. There are two reasons for this growth. First, there is growing consumer interest in a cleaner, healthier form of consumption that also provides greater well-being (Hughner et al., 2007; Apaolaza et al., 2018). Second, a growing number of producers are abandoning standard products in favor of new products such as functional foods (Küster-Boluda and Vidal-Capilla, 2017) or organic products (Willer and Lernoud, 2019).
Despite this increasing demand for organic products, consumers are bombarded by conflicting information and negative news. Accordingly, organic products may be portrayed as a fraud (Miller, 2018) or equally as offering major benefits (Organic Trade Association, 2019). Likewise, there are varied, periodic reports of fraud in relation to these products (Pomranz, 2018; European Law Monitor, 2019), which have raised doubts over organic products' added value with respect to conventional products (Yu et al., 2018). This situation has also consistently elicited skepticism (Olson, 2017). In particular, this skepticism influences the attitudes that are a key antecedent to purchase intentions (Fishbein and Ajzen, 2010).
Attitudes are important because they help explain why consumers develop preferences. Similarly, they shed light on the precursors of consumers' willingness to embrace organic products. Attitudes also have implications in marketing and communication. In both cases, attitudes help predict future behavior and aid our understanding of how to drive changes in current behavior to increase the consumption of organic products (Thomas et al., 2015) or reduce the consumption of non-organic products (Peattie and Peattie, 2009). Given these implications and the fact that the phenomenon of organic products is evolving, it is important to measure attitudes toward organic products using a sound and cohesive method (Gerbing and Anderson, 1988). Doing so is especially important in areas such as consumer behavior, where the wide range of available theories allows for different ways of tackling and measuring the same phenomenon.
There is a proliferation of scales to measure attitudes toward organic products (Zotos et al., 1999; Gil et al., 2000; Onurlubas and Öztürk, 2015; Oroian et al., 2017), many of which have been used only once. However, this proliferation of measurement systems has been criticized (Bruner, 2003) both methodologically and practically. Methodologically, all measures must meet the requirements of validity and reliability (Nunnally and Bernstein, 1994). On a practical note, measuring phenomena in a simple, effective and precise manner is crucial because the measures applied to do so are used to evaluate the market and provide the basis for commercial and marketing decisions (Taticchi et al., 2010).
The scale developed by Gil et al. (2000) is the most widely used to measure attitudes toward organic products. This scale has been denoted as popular in relation to food (Mata et al., 2010). However, very few papers provide details of its nature, properties, validity and application. In addition, the literature shows that its use has not been systematic, and, to the best of our knowledge, no study seems to have undertaken cross-national validation of the scale. This issue is critical for transnational companies and cross-national academic studies. In both areas, valid and reliable instruments are required to measure consumer attitudes and other phenomena of interest because it is vital to design and control marketing strategies based on a realistic image of consumers. Moreover, it is crucial to make good comparisons in cross-national studies. For these reasons, testing the quality of measurement instruments is essential to avoid design problems and unreliable results. Therefore, the present study examines the formal and metric properties of Gil et al.'s (2000) scale. The research question or aim of this study is to ascertain whether this scale may be applied with a sufficient degree of confidence.

Attitudes toward organic products
Attitudes are an essential construct for understanding consumers' decision-making processes (Ajzen, 2008). According to functionalist theory (Argyriou and Melewar, 2011), attitudes can be understood as a person's evaluations, feelings and tendencies toward an object, which entails the associations that the individual makes between that object and the evaluation of the object. There are several ways to approach the nature of attitudes. For example, they are formed through cognitive learning (Eagly and Chaiken, 1993) or are an adaptive system derived from contextualized assessments (Schwarz, 2007). They may also refer to a predisposition toward stable cognitive behavior over time (Hawkins and Mothersbaugh, 2013) or may be conceived as a mental state, due to similarity with the explanatory, predictive and evaluative capacity of individual traits (Bunge, 2017). The literature offers different ways of measuring attitudes, and two types of models are highlighted, namely, the multi-attribute attitude model and the ABC model. In the first model, a person's attitudes toward an object can be expressed as a function of the perceptions and beliefs about the attributes of the object, as well as the degree of importance that individuals attach to each of these attributes (Fishbein and Ajzen, 2010). In the ABC model, which is also known as the tripartite model of attitudes (Rosenberg and Hovland, 1960), attitudes are depicted as comprising affective, behavioral and cognitive components. This model is more flexible than the multi-attribute model because it allows more diverse relationships, whereas the multi-attribute model is restricted to a linear compensatory assumption.
In the study of organic products, most of the literature reports a significant direct or indirect effect of attitudes on purchase intentions (Thøgersen, 2009; Chen, 2009; Yadav and Pathak, 2017; Scalco et al., 2017) and purchasing behavior (Varela-Candamio et al., 2018). However, studies have also shown that the relationship between attitudes and actual behavior is not well established (Chen and Chai, 2010). For example, Gupta and Ogden (2009) showed that many consumers are reluctant to buy organic products, despite being highly concerned about environmental problems.
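The linear compensatory assumption of the multi-attribute model can be illustrated with a minimal Python sketch. The attribute names, belief ratings and importance weights below are hypothetical and are not taken from any scale discussed in this paper; the sketch only shows that, under a weighted sum, a low rating on one attribute can be offset by higher ratings on others.

```python
# Hypothetical illustration of a linear compensatory (multi-attribute) score:
# attitude = sum over attributes of (belief rating x attribute importance).

def multiattribute_attitude(beliefs, importance):
    """Return the importance-weighted sum of belief ratings."""
    return sum(beliefs[attr] * importance[attr] for attr in beliefs)

# Assumed importance weights (summing to 1) and belief ratings on a 1-7 scale.
importance = {"healthy": 0.5, "tasty": 0.3, "price": 0.2}
consumer_a = {"healthy": 7, "tasty": 4, "price": 2}
consumer_b = {"healthy": 5, "tasty": 6, "price": 4}

score_a = multiattribute_attitude(consumer_a, importance)
score_b = multiattribute_attitude(consumer_b, importance)
print(round(score_a, 2), round(score_b, 2))
```

In this sketch the two hypothetical consumers hold different beliefs but obtain essentially the same overall score, which is the compensation that the tripartite (ABC) view does not impose.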
1.2 Gil et al.'s scale of attitudes toward organic products
1.2.1 Characteristics of the scale. In light of the important role that organic products are beginning to play, Gil et al. (2000) developed an instrument to measure attitudes toward such products. The proposed instrument has nine Likert-type items measured on a seven-point scale. Seven of these items express positive perceptions, and two items express negative perceptions. The original items (in English and Spanish) appear in the Appendix. In their seminal work, Gil et al. (2000) tested these items using a stratified random sample of 800 people from two Spanish cities. A different factor structure was found for each sample, with three factors found for one sample and four factors found for the other sample. The percentage of variance explained by these factors was 53 and 62 per cent, respectively.
One defining characteristic of a scale is whether it is reflective or formative (Diamantopoulos, 2008). In reflective scales, the underlying phenomenon exists per se, and the direction of causality runs from the construct to the measures (items). In contrast, formative scales are applied when the construct is a conceptual creation; the causality is inverse because the attributes (items) are what form the construct. In the case of this scale, no study has been identified that explicitly addresses this issue. However, several scholars (Chen, 2007; Braga Junior et al., 2014; Rojas-Méndez et al., 2015) have reported reliability scores that are greater than 0.70. These findings suggest that the scale is reflective because calculating reliability or convergent validity when using a formative approach does not make sense (Coltman et al., 2008). Scholars have also reported that the scale has convergent and discriminant validity (De Magistris and Gracia, 2008; Teng and Wang, 2015), but most studies have not confirmed these metric properties. Table I summarizes the studies that have used the scale.
The scale first appeared in Spanish in Sánchez García et al. (1998), focusing on organic foods. It was subsequently published by the same authors in English (Gil et al., 2000), this time focusing on organic products in general. However, the initial Spanish and English versions (Sánchez García et al., 1998; Gil et al., 2000) are not exactly equivalent. Although the number of items and the structure are the same, the meaning of two items was modified in the second version. The Spanish scale has been used in its full form (Attieh, 2015) and in a version with only basic descriptive words (Sánchez García et al., 1998; Rivera and Brugarolas, 2003). The scale has been administered in several languages such as Portuguese (Braga Junior et al., 2014) and Italian (De Magistris, 2004). However, it cannot be found in full in the languages in which it has been administered (Taiwanese/Chinese, Lebanese/Arabic and Croatian).
Given this situation, the research question investigated in this paper is whether the scale under analysis has adequate formal and metric properties for cross-national application.

Participants and fieldwork
Participants. The participants were 4,992 adults aged 18 years or older who were responsible for household shopping. The participants resided in Germany (n = 838; 57.6 per cent women; age M = 36.08 years, SD = 12.79), Hong Kong Special Administrative Region (SAR)-China (n = 1,200; 55.1 per cent women; age M = 36.86 years, SD = 11.68), Norway (n = 840; 45.8 per cent women; age M = 37.24 years, SD = 13.53), Spain (n = 1,011; 48.7 per cent women; age M = 36.76 years, SD = 10.86) and the UK (n = 1,103; 54.1 per cent women; age M = 40.55 years, SD = 15.58). The study examined the main household buyers using Cint consumer panels for each country under analysis. Cint specializes in online surveys and computer-assisted Web interviews (CAWI). To increase the representativeness of the sample, the origin of the subjects was randomized within each country, maintaining proportionality by population size. Quotas of age, sex and population of origin were used. Likewise, the size of the target sample was increased in each country to reduce the probability of Type II error (i.e. false negatives).
Data gathering. CAWI was used because of its high coverage, ease of use and low cost of gathering responses. The countries where the data were collected had household internet penetration between 84 per cent in Germany and 97 per cent in Norway (The World Bank, 2019). Fieldwork was carried out between March and September 2019. During this period, no target country had campaigns that might have had a favorable or unfavorable effect on the image of organic products. Therefore, no information was considered to condition consumer attitudes (either generally or asymmetrically).

Table I. [column headers not recoverable; gaps are marked [...]]
[...]; n/a; n/a; n/a; 2; Items in Italian; Likert-type 1-5
Radman (2005); n = 179; Croatia; 3 + 2 outer; n/a; n/a; n/a; n/a; Items reworded; Likert-type 1-5
Chen (2007) [...] n = 464; Spain; 9 + 2 outer; n/a; n/a; n/a; n/a; Items reworded; Likert-type 1-5
Ventura-Lucas et al. [...]; n/a; n/a; n/a; n/a; Items reworded; Likert 1-5
Stolz et al. (2011); n = 886; Germany; 2 + 1 outer; n/a; n/a; n/a; 2; Items reworded; Likert 1-5
Ventura-Lucas and Marreiros (2013); n = 214; Portugal; 8; n/a; n/a; n/a; 3; Items reworded; Likert 1-5
Braga Junior et al. [...] (2015); n = 372; Lebanon; 9; n/a; n/a; n/a; n/a; Original items (a); Likert-type 1-5
Rojas-Méndez et al. (2015); n = 137; Canada; 4; 0.82; n/a; n/a; 1; Likert-type 1-7
Drugova (2019); n = 1,009; USA; 7; n/a; n/a; n/a; 2; Items reworded; Likert 1-5
Notes: N = sample size; α = Cronbach's alpha; CV = convergent validity; DV = discriminant validity; CR = composite reliability; outer = external items; n/a = not available; (a) the scale was administered in other languages
Source: Compiled by the authors
Target countries. The target countries were chosen because of their position in the organic food sales ranking (Willer and Lernoud, 2019). Three of the top 10 countries were selected: Germany (2nd), the United Kingdom (7th) and Spain (10th). Two countries outside the top 10 were also chosen: Norway and Hong Kong SAR-China. In Norway, the market share of organic products is 1.7 per cent, and there is poor awareness of organic labels (Siiskonen, 2015). In Hong Kong, educational programs on organic products have been implemented, and over 40 per cent of consumers claim that they buy organic products (Hong Kong Organic Resource Centre, 2012).

Questionnaire
Questionnaire. The core questionnaire included the items presented by Gil et al. (2000). All items were scored using the same seven-point Likert scale. Data on country of residence, age and sex were also collected. No personally identifiable information was gathered.
Translation of the scale. The core questionnaire was translated into traditional Chinese, Norwegian and German following a two-stage process. First, using the original scale presented by Gil et al. (2000), professional translators were hired to translate the items into the target languages. Second, two bilingual experts, who were independent of the authors and translators, evaluated the equivalence of the items in each given language and the items in the English version (Harkness and Schoua-Glusberg, 1998). Two different experts were used for each language. They were asked the following question: "Considering each pair of statements in English and in your native tongue, to what extent do you think they are equivalent? (0 = not at all; 10 = completely)." The Kendall concordance coefficient and the average scores given to the translations were used to analyze the agreement between experts. The results for the Kendall test were W = 0.13 and Chi-square = 8.30 (df = 9, p = 0.31). Therefore, no differences between the independent experts were observed. The scores for the average degree of equivalence were as follows: 10 points for English-Spanish, 9.7 for English-Norwegian, 9.8 for English-German and 9.6 for English-Traditional Chinese. Therefore, the translations were accepted. Notably, no score for any item was lower than 8 out of 10.
2.3 Nature of the scale
Prior to the analyses, it was necessary to establish the measurement perspective that should be adopted when analyzing the scale under consideration. This issue is important because the correct choice can help to prevent Type I and Type II errors (Diamantopoulos and Siguaw, 2006) and can determine the type of analyses to perform. According to Coltman et al. (2008), the theoretical considerations are: the nature of the construct; the direction of causality; and the characteristics of the items used.
From an empirical perspective, the considerations are: correlation among items; similar sign and significance with the precursors and effects; and the need to identify measurement error.
The first consideration is met because Gil et al.'s (2000) scale intends to measure the latent construct of attitude (toward organic products). This attitude exists (it is not a formal construct that uses indicators), and it is widely recognized as a basic psychological construct. Thus, attitude toward any object (organic products) is independent of perception, measurement and interpretation by any researcher. It is also a latent construct because it is not possible to measure and understand it directly; its measurement depends on the methodological approach followed (Park and MacInnis, 2006) and on the accuracy and validity of the instrument used to quantify it. Regarding the second consideration (direction of causality), at least three criteria must be examined: association, temporal priority and non-spuriousness (Chambliss and Schutt, 2006). "Association" implies that there must be an empirical relationship between the measurement of an attitude and the actual situation of that attitude. In other words, the two objects must co-vary. "Temporal priority" means that the phenomenon under study must exist before its measurement and not as a consequence of its measurement. Finally, "non-spuriousness" means that the relationship of attitude with other phenomena must not be due to shared common causes. Attitude meets these criteria (Edwards and Bagozzi, 2000), and the psychology literature uses this construct as an antecedent of behavior and a consequence of many variables such as culture, perceptions and socialization, among others (Albarracin et al., 2005; Haugtvedt et al., 2008).
Regarding the third theoretical consideration (characteristics of the items), all are common to the theme of organic products, and adding or removing an item should not introduce changes in the conceptual domain of the theoretical construct (Coltman et al., 2008). The state-of-the-art presented earlier shows that authors who used attitudinal items removed and/or added items, but the construct was the same. This is because characteristics of organic products define and describe aspects of that product category.
In empirical analyses, it is necessary to test whether there is a correlation among items and measurement error. It was not possible to consider their sign and significance with the precursors and effects cited previously because the scale was analyzed without considering antecedents or effects.

Results
An analysis of the format and content of the scale is presented first. This analysis is followed by quantitative analysis of the metric properties: inter-country factor dimensionality and stability, reliability and convergent and discriminant validity.

Format and content analysis
The scale has been used in a range of ways, although it has always been scored on a Likert-type scale with five or seven points (Table I). The number of items has ranged from nine (as in the original scale) to two. The wording of the items has varied considerably. Whereas some studies have worded them as statements (Chen, 2007; Chen, 2009; Attieh, 2015), others have used keywords (Ventura-Lucas et al., 2008; Ventura-Lucas and Marreiros, 2013; Teng and Wang, 2015; Drugova, 2019). Synonyms have also been used for the keywords of each item. For example, the term "fraud" in item IT3 has been replaced by "cheating" or "fraudulent" in some cases. Similarly, for item IT8, the expression "not harmful effects" has been replaced by "safer" or "not harmful to the environment" in other cases. The same is true of most items on the scale. In addition, items have been worded using positive and negative constructions in different studies.
The original scale is not consistent in terms of the syntax of the items. Most items (IT1, IT2, IT4, IT6 and IT7) are written in the incomplete comparative form (e.g. "organic products are healthier"), in which the object with which organic products are compared is not explicitly stated. This issue does not apply to item IT5, which compares organic products with conventional products. Items IT3, IT8 and IT9 are not comparative statements. Moreover, the items vary with respect to the comparative form. For example, for IT1, Teng and Wang (2015) and Rojas-Méndez et al. (2015) wrote "healthier than conventional ones", whereas Ventura-Lucas and Marreiros (2013) removed the comparison by writing "are good for health". The latter wording appears in the first version of the scale in Spanish (Sánchez García et al., 1998). Likewise, with respect to item IT5, which uses the comparison "are worse than the conventional ones", Braga Junior et al. (2014) replaced the term "conventional" with "traditional", implying that the two terms are synonyms. In reality, however, these terms may have a different or even opposite meaning (Gliessman, 1998).
Finally, the scale is not coherent with an ABC attitudinal model because no item refers to affective or behavioral themes. Instead, the scale is coherent with a multi-attribute model because it measures individuals' beliefs about organic products, which corresponds to an attitude-toward-object model. In this model, attributes should be sufficient and relevant. However, Gil et al.'s (2000) scale seems to be incomplete because there are numerous attributes that are not considered by the authors (Table II). No information regarding the relevance of the attributes in the scale was found.

Metric analysis of the scale
The literature reflects an absence of consensus on the factor structure of the scale, and the authors of the scale themselves found a different number of factors (three or four) depending on which sample they considered. Thus, the analysis presented in this paper followed four steps: (1) analyze the dimensionality of the overall sample; (2) use covariance-based confirmatory factor analysis (CFA) to check whether the general structure is the same for each country; (3) check the reliability and validity of each factor; and (4) check whether the scale is invariant in terms of its form and factor loadings.
Before these analyses, it was necessary to clean the data set. First, the items were aligned so that all of them pointed in the same direction; accordingly, the negatively worded items IT3 and IT5 were reverse-coded. Then, following Osborne (2012), the next step was to delete five cases with missing data, 93 cases with straightlining as an indicator of low-quality responses (Zhang and Conrad, 2014), and 114 outliers based on the criterion of Mahalanobis distance. A case was considered an outlier if its distance had a p-value such that p ≤ 0.001. Following this data cleaning process, the effective sample for the analysis was reduced to 4,780 cases (95.75 per cent of the initial sample).
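The screening steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the three responses are invented, straightlining is detected on the raw answers (before the recoding of IT3 and IT5), and the Mahalanobis outlier step is omitted because it requires the full covariance matrix of the items.

```python
# Sketch of the screening steps on seven-point Likert responses (pure Python).
# Each response is a list of nine item scores (IT1..IT9); None marks a missing answer.

REVERSED_ITEMS = (2, 4)  # zero-based positions of IT3 and IT5


def has_missing(response):
    """True when at least one item was left unanswered."""
    return any(score is None for score in response)


def is_straightlined(response):
    """True when every item received the same raw answer."""
    return len(set(response)) == 1


def reverse_code(response, points=7):
    """Flip negatively worded items so that a high score is always favorable."""
    return [points + 1 - score if i in REVERSED_ITEMS else score
            for i, score in enumerate(response)]


def screen(responses):
    """Drop incomplete and straightlined cases, then reverse-code the rest."""
    kept = [r for r in responses
            if not has_missing(r) and not is_straightlined(r)]
    return [reverse_code(r) for r in kept]


# Invented responses, for illustration only.
sample = [
    [6, 5, 2, 6, 1, 7, 5, 6, 4],     # valid case
    [4, 4, 4, 4, 4, 4, 4, 4, 4],     # straightlining
    [6, None, 2, 6, 1, 7, 5, 6, 4],  # missing answer
]
cleaned = screen(sample)

# Retention arithmetic using the counts reported in the text.
initial, missing, straightlined, outliers = 4992, 5, 93, 114
effective = initial - missing - straightlined - outliers
retention_pct = round(100 * effective / initial, 2)
```

The last three lines reproduce the reported effective sample (4,780 cases) and retention rate (95.75 per cent) from the exclusion counts given in the text.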
3.2.1 Dimensionality of the overall sample. Principal axis factoring was first applied to find the smallest number of factors that explain the shared variance of the items. The Kaiser-Meyer-Olkin coefficient was 0.81, the Bartlett test was significant (Chi-squared = 14211.61, p < 0.001), and the determinant (D) was 0.05. Only 3 of the 36 correlation coefficients were observed to have p > 0.05, which suggests an association among the items of the scale. Measurement error (the sixth consideration, among the empirical characteristics of a reflective model) was assessed using the residuals between the observed correlations and those reproduced by common factor analysis or principal axis factoring. No non-redundant residual was obtained with an absolute value greater than 0.05. These values were sufficient to consider the scale to be reflective and to form factors because, from an exploratory perspective, the data seem to fit a model with latent factors. Three factors were found: Factor 1 (IT1, IT2, IT4, IT7 and IT8), Factor 2 (IT3 and IT5), and Factor 3 (IT6 and IT9), with an overall percentage of explained variance of 52.25 per cent.
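As a rough consistency check on the reported Bartlett statistic, the standard formula for Bartlett's test of sphericity can be evaluated from the determinant of the correlation matrix. The sketch below assumes that the test was computed on the effective sample of 4,780 cases and the nine items; both values are assumptions about inputs that the text does not state explicitly, and the determinant is only known to two decimals.

```python
import math


def bartlett_sphericity(det_r, n_cases, n_items):
    """Bartlett's statistic: -(n - 1 - (2p + 5) / 6) * ln|R|, with p(p - 1) / 2 df."""
    statistic = -(n_cases - 1 - (2 * n_items + 5) / 6) * math.log(det_r)
    df = n_items * (n_items - 1) // 2
    return statistic, df


def implied_determinant(statistic, n_cases, n_items):
    """Invert the formula to recover the determinant implied by a reported statistic."""
    return math.exp(-statistic / (n_cases - 1 - (2 * n_items + 5) / 6))


# Assumed inputs: rounded determinant 0.05, N = 4,780 cases, 9 items.
stat, df = bartlett_sphericity(0.05, 4780, 9)

# Determinant implied by the reported statistic of 14211.61.
det = implied_determinant(14211.61, 4780, 9)
print(round(stat, 1), df, round(det, 3))
```

Under these assumptions the degrees of freedom equal the 36 correlations mentioned in the text, and the determinant implied by the reported statistic rounds to 0.05, so the reported figures are mutually consistent.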
Applying CFA to this factor structure revealed the absence of multivariate normality (Mardia test = 17.82), which led to the application of the robust Satorra-Bentler Chi-squared (SBCS) test: SBCS = 455.86 (df = 24, p = 0.00, normed chi-square = 18.99). The comparative fit index (CFI) = 0.96 and the root mean square error of approximation (RMSEA) = 0.06, 90 per cent confidence interval (CI) = (0.06, 0.07), showed that despite the significance of the chi-square test (due to the large sample size), the fit indicators were satisfactory. Both met the recommendations stipulated in the literature (Hair et al., 2006), indicating that the fit was acceptable. The three factors are shown in Table III.
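The normed chi-square and the RMSEA point estimate can be reproduced from the reported chi-square and degrees of freedom. The sketch below uses the common single-group formula and assumes N = 4,780 (the effective sample); robust estimation software may apply a slightly different correction, so the result is only an approximate check.

```python
import math


def normed_chi_square(chi2, df):
    """Chi-square divided by its degrees of freedom."""
    return chi2 / df


def rmsea(chi2, df, n_cases):
    """Point estimate: sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_cases - 1)))


# Values reported for the three-factor model; N = 4,780 is an assumption.
ratio = normed_chi_square(455.86, 24)
error = rmsea(455.86, 24, 4780)
print(round(ratio, 2), round(error, 2))
```

Under these assumptions both values round to the figures reported in the text (18.99 and 0.06).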
The first factor comprises items IT1, IT2, IT4, IT7 and IT8. These items relate to positive aspects that arouse desire (e.g. "tasty" or "attractive") and relate to basic consumer needs (e.g. healthy, safe and food). Therefore, this factor is related to the intrinsic "Desirability" of its attributes. The second factor, denominated "Hoax", comprises items IT3 and IT5, which reflect negative aspects related to possible fraud in the production of organic products. The third factor, denominated "Trend", comprises items IT6 and IT9, both of which relate to style in the form of price and trendiness. Factors 2 ("Hoax") and 3 ("Trend") had some problems. In both cases, half of the items had factor loadings of less than 0.70. However, the average factor loading for Factor 2 was greater than 0.7, whereas Factor 3 did not have sufficient reliability in any sample. Therefore, in this step, Factor 3 was eliminated, and the remaining steps were performed for Factors 1 and 2.
3.2.2 Checking the structure for each country. Step 2. CFA was performed five times (once per country) using the two-factor structure. Table IV shows the results. All coefficients used to estimate multivariate normality were high (Mardia test > 3). Therefore, this absence of multivariate normality necessitated the use of robust estimates (Bentler and Wu, 2005). With the exception of Germany, all SBCS scores were significant, primarily due to the large sample sizes. The normed chi-square indicator had acceptable values for Germany and the UK, but not for the other countries. Regarding the goodness-of-fit indices, Hair et al. (2006) report that the CFI should be greater than 0.95 for models with fewer than 12 variables when n > 250. The values for Hong Kong-China and Norway did not meet this threshold. Finally, the RMSEA index (i.e. absolute fit of the error in terms of the population rather than a sample) must be less than 0.07 (Hair et al., 2006). The same countries also failed to meet this condition.
Table notes: (R) = reversed item; CR = composite reliability; AVE = average variance extracted
3.2.3 Testing the reliability and validity of each factor. After Hong Kong-China and Norway were removed from the analysis, the reliability and validity of the two factors were checked (Table IV). For the remaining three countries, Factor 1 had a composite reliability score of more than 0.8, which is the recommended threshold to define scales with five to eight items (Netemeyer et al., 2003). Factor 1 had convergent validity because all of the average variance extracted (AVE) scores were greater than 0.5.
Factor 2 did not meet the recommended composite reliability level of 0.8 for the German sample, but it did for the Spanish and British samples. However, convergent validity was observed to hold in all three countries. Finally, the existence of discriminant validity was verified using the CI criterion because no CI contained the value 1.
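For readers who wish to reproduce this kind of check, composite reliability and AVE can be computed from standardized factor loadings with the usual formulas. The sketch below is only illustrative: the loadings are hypothetical (they are not the values in Table IV), and uncorrelated measurement errors are assumed.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - loading ** 2 for loading in loadings)
    return squared_sum / (squared_sum + error_variance)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


# Hypothetical standardized loadings for a five-item factor.
hypothetical_loadings = [0.80, 0.75, 0.70, 0.72, 0.68]

cr = composite_reliability(hypothetical_loadings)
ave = average_variance_extracted(hypothetical_loadings)

# Thresholds cited in the text: CR above 0.8 and AVE above 0.5.
meets_thresholds = cr > 0.8 and ave > 0.5
print(round(cr, 3), round(ave, 3), meets_thresholds)
```

With these hypothetical loadings the factor would pass both thresholds even though one loading is below 0.70, which shows why the two indices are reported alongside the individual loadings.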
Step 4. Only the Spanish and British samples met all the criteria of fit, reliability and validity for a two-factor structure. To determine whether the scale has the same form in both countries, the invariance of the structure was analyzed. The SBCS test yielded a value of 93.96, df = 26, normed Chi-square = 3.61, CFI = 0.98, RMSEA = 0.05, 90 per cent CI (0.04, 0.06), which indicates an adequate fit to the two-factor model in both countries.
Next, to check for equal factor loadings (i.e. metric invariance), equal load restrictions for both samples were applied. The results were as follows: Mardia test = 5.25, chi-square = 155.49, df = 33, normed chi-square = 4.71, CFI = 0.97, RMSEA = 0.06, 90 per cent CI (0.05, 0.07). Although the fit is not poor, the increase in the chi-square statistic is notable (155.49 - 93.96 = 61.53, with df = 33 - 26 = 7), which implies p < 0.01. The conclusion is that the model fit worsens significantly when equal load restrictions are imposed. Therefore, metric invariance does not hold because the factor loadings of the items are not equal across the two samples.
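The chi-square difference test reported above can be verified with a short, standard-library Python sketch. It uses the simple difference between the two reported chi-square values, as the text does; a scaled difference test for Satorra-Bentler statistics would require scaling corrections that are not reported here. The helper `chi2_sf` is written for this illustration and is valid for integer degrees of freedom only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    starting from Q(1, z) = exp(-z) for even df or Q(1/2, z) = erfc(sqrt(z)) for odd df.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Values reported in the text for the configural and equal-loadings models.
chi2_configural, df_configural = 93.96, 26
chi2_metric, df_metric = 155.49, 33

delta_chi2 = chi2_metric - chi2_configural
delta_df = df_metric - df_configural
p_value = chi2_sf(delta_chi2, delta_df)

print(round(delta_chi2, 2), delta_df, p_value < 0.01)
```

The resulting p-value is far below 0.01, in line with the conclusion that imposing equal loadings significantly worsens the fit.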
These results confirm that the scale under analysis may not be applied in cross-national studies due to its inherent weaknesses.

Discussion
Careful, accurate measurement of phenomena is fundamental in any science. Measurement is particularly important in marketing, an area where many phenomena are intangible and/or latent (Kumar, 2018). This paper analyzes Gil et al.'s (2000) scale, which is the most widely used scale to measure the latent phenomenon of attitudes toward organic products. Nevertheless, few studies have applied it exactly as it was designed. Most studies have instead used reformulated versions of the original items or have selected certain items, adding others that did not appear on the original scale.
In relation to the formal aspects of the scale, the scale has been applied in a haphazard manner. This application is reflected by the fact that the items have been presented in different formats using different wordings, often modifying the scale's original semantics. For example, items IT4 and IT8 have each been worded seven different ways with different meanings (e.g. IT4: "do not taste better", "more flavorful", "tastier"; IT8: "without adverse effects", "better for the environment", "have not harmful effects"). This lack of consistency hinders the scale's replicability, threatening the standard use of the scale (Boateng et al., 2018) by failing to ensure objectivity, replicability and ease of use (Sauro and Lewis, 2016). To take another example, the English and Spanish versions of IT5 use the original adjective "conventional", but the Portuguese translation changes this term to "traditional", introducing a potential source of content bias. Although certain consumers may consider the two terms equivalent, the adjective "traditional" has a cultural and generational connotation that is not associated with "conventional", which refers more to what is done and expected in the present. This distinction also arises in other areas such as teaching and medicine. Because organic production may become widespread in the near future (i.e. becoming "conventional"), the recommendation is to avoid the terms "traditional" and "conventional". Instead, "organic production" and "non-organic production" should be used.
Regarding the scale type, the scale contains several comparisons that relate to cognitive attributes (e.g. "superior quality", "no harmful effects" and "more expensive") rather than comparisons associated with emotions or lifestyle-related attributes. However, consumers of organic products tend to have an active lifestyle (Goetzke and Spiller, 2014). The scale's lack of attributes related to the way consumers conceive their lives (in general) and the products they consume (in particular) may create biases in the measurement of attitudes. Therefore, including lifestyle-related attributes would be advisable to enhance the scale. Likewise, since the 1990s, new attitudinal models that consider consumers' automatic and unconscious responses (implicit attitudes) have emerged. These attitudes have often helped explain the gap between reported intentions and actual behavior. Therefore, considering implicit attitudes when measuring attitudes toward organic products is also advisable.
The scale also has weaknesses in terms of metrics. The Hong Kong-Chinese and Norwegian samples considered in this study were rejected because of poor fit in terms of error (excessively high RMSEA), and the German sample had very low reliability for one of the factors. Moreover, metric invariance was not observed for the Spanish and British samples. These results imply that the scale behaves differently depending on the country, which raises doubts about its applicability for cross-country comparisons (at least for the countries analyzed in this study).
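As a purely illustrative aside, the following minimal sketch shows how an RMSEA point estimate is commonly computed from a model's chi-square statistic, its degrees of freedom and the sample size. The function name `rmsea`, the input values and the 0.08 rule of thumb are assumptions introduced here for illustration; they are not the figures or the cutoff reported in this study, and some software uses N rather than N - 1 in the denominator.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA (hypothetical helper, not from the study).

    Uses the common formula sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical fit statistics for one country sample (not the study's data).
estimate = rmsea(chi_square=250.0, df=50, n=1000)

# 0.08 is a frequently cited rule of thumb, assumed here only for illustration.
ACCEPTABLE_RMSEA = 0.08
print(f"RMSEA = {estimate:.3f}; acceptable: {estimate <= ACCEPTABLE_RMSEA}")
```

Under this formula, a chi-square that does not exceed its degrees of freedom yields an RMSEA of zero, whereas a chi-square that is large relative to the degrees of freedom and sample size pushes the estimate upward, which is the pattern described above for the rejected samples.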
The general conclusion is that Gil et al.'s (2000) scale provides a good basis, but it needs to be improved in terms of metrics and content. Two practical implications may be derived from this study.
The first implication is scholarly and concerns the need to check the metric properties of any measurement instrument. Beyond any methodological issues, scales that allow cross-national comparisons are vital. In fact, they are the only way to obtain accurate and comparable knowledge of consumers. The second implication is for companies. The use of inappropriate measures can generate losses when such measures are used in market research questionnaires. These questionnaires are often short and do not provide alternatives to reduce the impact of the failure of some measurements. Bad measurements therefore not only play a more prominent role than they should but also have a high opportunity cost for companies. Consequently, when cross-national studies are carried out, verified measures are needed.
The findings of this study must be viewed in light of some limitations, three of which are highlighted. The first is the lack of budget, which prevented the inclusion of a greater number of countries. For example, the US market, which is the largest in the world, was not considered. This is because the USA requires special treatment since it is not a single market but five regions with very different behaviors (Driscoll and Ichikawa, 2017). The second limitation is that research has focused on creating new measures without providing analyses of their suitability. This is especially true in the analysis of the format in which scales are administered because it is rarely accounted for, nor is its influence on respondents' answers considered. The third limitation is that no in-depth analyses have been undertaken and no consideration has been given to the influence of background variables such as culture or stage of development. This gap represents a future line of research.
The choice of the language in which the questionnaire was administered in Hong Kong was an important issue. The present study used traditional Chinese, even though some people regularly use Cantonese and simplified Mandarin in this region. Traditional Chinese was chosen because 89 per cent of households speak Cantonese and use traditional characters (not the simplified characters used on the mainland). The use of Mandarin has expanded significantly since 1997, although it is spoken fluently by fewer than 20 per cent of Hong Kong households.
Finally, the literature shows a natural tendency to create new measures but not to revise existing ones. This trend produces an inflation of measurements without studies that support their metric and utility properties. As Bruner (2003) and Bearden et al. (2011) point out, it is necessary to improve the methodological toolbox. Doing so would allow both firms and scholars to have reliable instruments at their disposal to measure phenomena of common interest. In the long run, this would mean that results would be comparable. Analysis in this area offers an interesting line of future research.