“Sustainability-contents SEO”: a semantic algorithm to improve the quality rating of sustainability web contents

Purpose: For companies that intend to respond to the modern conscious consumers' needs, a great competitive advantage is played on the ability to incorporate sustainability messages in marketing communications. The aim of this paper is to address this important priority in the web context, building a semantic algorithm that allows content managers to evaluate the quality of sustainability web contents for search engines, considering the current semantic web development.
Design/methodology/approach: Following the Design Science (DS) methodological approach, the study develops the algorithm as an artefact capable of solving a practical problem and improving the operation of content managerial process.
Findings: The algorithm considers multiple factors of evaluation, grouped in three parameters: completeness, clarity and consistency. An applicability test of the algorithm was conducted on a sample of web pages of the Google blog on sustainability to highlight the correspondence between the established evaluation factors and those actually used by Google.
Practical implications: Studying content marketing for sustainability communication constitutes a new field of research that offers exciting opportunities. Writing sustainability contents in an effective way is a fundamental step to trigger stakeholder engagement mechanisms online. It could be a positive social engineering technique in the hands of marketers to make web users able to pursue sustainable development in their choices.
Originality/value: This is the first study that creates a theoretical connection between digital content marketing and sustainability communication focussing, especially, on the aspects of search engine optimization (SEO). The algorithm of “Sustainability-contents SEO” is the first operational software tool, with a regulatory nature, that is able to analyse the web contents, detecting the terms of the sustainability language and measuring the compliance to SEO requirements.


1. Introduction
Web technologies, and especially social media, have made people more mindful about the problems affecting the planet, and more responsive and sensitive in demanding changes (Chang and Chin, 2011; Leong et al., 2019). Consequently, companies, whose activities have a huge impact on people's daily lives (Colleoni, 2013), should take a public stand on important global issues. A recent study conducted by IBM in collaboration with NRF (2020) highlights a global trend: consumers are increasingly demanding from brands a sustainable commitment and greater transparency in business practices. Furthermore, beyond business boundaries, brand activism initiatives (Kotler and Sarkar, 2018) are required in order to influence institutional decisions on environmental, social, political and economic aspects (Vredenburg et al., 2020). Brands and companies are thus called to play an effective role in raising awareness of sustainable development, promoting collective participation in seeking a new way of consuming and adopting daily behaviours in this direction (Eyada, 2020). This new role necessarily implies a review of the methods of communication and involvement with stakeholders, in favour of building solid relationships of trust.
Since web information is crowd-controlled, companies that choose to adopt new media for their communications need to be prepared to take a more ethical stance towards stakeholders (DiStaso and Bortree, 2014). The perception of authenticity, transparency and, above all, consistency becomes the metric to measure the reliability of the company (Kim and Ferguson, 2018). In this changing context, it has become essential to offer network users varied, high-value content, which responds in a timely manner to search engine queries, satisfying the real information (and therefore consumption) needs of users (Jefferson and Tanton, 2015). When this request for information concerns the support and promotion practices of sustainable development adopted by companies, a clear and specific need emerges in terms of content management.
In the context of academic research that crosses corporate sustainability and marketing, Diez-Martin et al. (2019) see digital marketing as capable of filling the "important gap between the behaviour and beliefs of society and markets about sustainability, and companies' capability to understand and face this trend" (p. 1). One of the challenges identified by these authors is figuring out how sustainability could be turned into a competitive advantage. To answer this same question, some researchers studied the potential role of branded sustainability contents (especially web and social media contents). Branded contents are considered useful tools to promote sustainable development beyond explicit corporate interests and to persuade branded consumers to adopt a lifestyle compatible with the planet's needs (Grubor and Milovanov, 2017; Hanson et al., 2019). Other studies have investigated the effectiveness of sustainability messages (i.e. those that lead to positive consumer attitudes), considering as variables the language used (Evans and Peirson-Smith, 2018), the emotional and psychological triggers (Line et al., 2016) or the influence of the communication source (Kapoor et al., 2021). Yet, on the web, the effectiveness of content strongly depends on technical aspects that improve its retrieval through online searches, and which are subject to search engine optimization (SEO) activities. No studies have so far dealt with these aspects.
Improving the qualitative dimension of sustainability contents (in terms of utility, authority, completeness, clarity of language) helps content managers to overcome the scepticism that typically affects users' ability to assess the reliability of companies' sustainability efforts and their public legitimacy (De Vries et al., 2015).
In the context of semantic web searches, the quality of content is to be considered as the ability to respond in a relevant and satisfactory way to the search intent of users, making the message to be conveyed as clear and understandable as possible. Creating web content on sustainability issues that is appropriately optimized to compete in the semantic web space will soon become a need to be addressed in the context of digital content marketing practices. It is, therefore, crucial to put academic research at the service of managerial needs in order to take advantage of making the "first move". The purpose of this paper is to develop a semantic algorithm that allows content managers to evaluate the quality of sustainability web contents for search engines, considering the current semantic web development. The algorithm is intended as a tool for measuring how well a web page achieves its purpose of satisfying users' sustainability information needs.
The remainder of the paper is organized as follows: Section 2 presents the conceptual background of the study, reviewing the literature about sustainability marketing, considered in a social engineering perspective. The role of sustainability web contents in stakeholder engagement mechanisms and the evolution of web search engines through which content must be distributed will be explored. Section 3 describes the Design Science (DS) methodological approach, adopted to develop and test the algorithm. The study and the results of the algorithm applicability test are discussed in Section 4. Finally, Section 5 concludes the study by examining the limitations and proposing future research.
2. Conceptual background
2.1 Sustainability marketing in a social engineering perspective
When a company focusses its marketing strategy on supporting the environment and society, it embraces what is called "sustainability marketing" (Belz and Peattie, 2010; Kemper and Ballantine, 2019).
The link between sustainability and marketing has already been explored by academics for several years. Many have questioned the possibility of overcoming the natural contradiction between these two concepts; marketing, as traditionally defined, pushes towards a continuous consumption that is unsustainable due to the ecological limits to growth (Peattie and Peattie, 2009). Therefore, it is interpreted as the antithesis of sustainability (Jones et al., 2008; Lim, 2017). However, marketing has the ability to influence sustainable lifestyles (Peattie and Peattie, 2009). Sustainability marketing, in fact, consists of a set of business activities aimed at creating a positive impact that goes beyond corporate well-being, including the social and environmental facets. In this perspective, it can be interpreted within a social engineering framework (Kennedy and Parsons, 2012, 2014). The original concept of "social engineering" was often associated with negative, politically motivated interventions by governments aimed at impacting and modifying individual behaviour (McMahon, 2001; Kennedy and Parsons, 2012). Social engineering techniques, in fact, are based on the possibility of "exploiting" the weaknesses of human beings in order to persuade them to perform a desired task (Hadnagy, 2010). According to Lies (2019), «social engineering are practices applied to influence people. Hence, marketing, PR-management, advertising etc. are examples of social engineering» (p. 137). However, considering the original aim of social engineering of «arranging and channelling environmental and social forces to create a high probability that effective social action will occur» (Alexander and Schmidt, 1996, p. 1), social engineering could be understood as a set of applied methods for positive social impact and social change (Kennedy and Parsons, 2014).
Sustainability marketing, implying business efforts aimed at enhancing social wellbeing, strongly depends on the positive relationship between business and society (Sinthupundaja and Kohda, 2019). This means that sustainability marketing has an impact on stakeholder engagement (Pucci et al., 2020): involving stakeholders in business management, information sharing, dialogue and mutual responsibility (Manetti, 2011) helps to share a broader philosophy that influences the adoption of conduct favourable to sustainable development (Rossi, 2017; Yang and Yan, 2020). As the use of the participatory web is a key facilitator for stakeholder engagement (Sivarajah et al., 2020), sustainability marketing on the web can be interpreted from the perspective of social engineering.

2.2 Sustainability web content as applied social engineering
For the purposes of sustainability marketing, communication techniques such as advertising, public relations and content marketing, associated with incentives for action and environmental stimuli, could be considered as means of influencing behaviour either by encouragement (e.g. healthy eating, lifelong learning, physical exercise) or discouragement (e.g. anti-smoking, drink-driving, domestic violence) (Kennedy and Parsons, 2012). Large organizations, with strong brands and a solid reputation, taking part in speeches on important political and social issues and publicly embracing social issues and emergencies afflicting the planet, are considered capable of activating virtuous circles of change, even more than governments (Auemsuvarn, 2019). They can, therefore, achieve the same effect as social engineering. In particular, inbound marketing makes it possible to draw the customer into the firm voluntarily, keeping the attention through the use of appealing, informative and, above all, responsive web contents. The responsiveness, in turn, strongly depends on SEO techniques. By intervening on the setting of the contents, especially on the choice of terminology, SEO creates correspondence between the contents and user queries. Thus, first of all, web contents can be functional to satisfy the multiple needs of the modern, empowered and mindful consumer, already positively open to change. They respond to the expectations of personalization of the browsing and consumption experience, of an interaction that reflects modern consumer needs and attitudes (Light, 2014). Secondly, they are a vehicle for emotions, a fundamental component for attracting and maintaining people's attention and interest (Wylie, 2014), and could be engaging: engagement is critical as it stimulates content sharing in the mainstream (Botha and Reyneke, 2013).
But above all, web contents can, and must, communicate a sense of ethics and honesty (Syzdek, 2014), as signals of corporate reliability and legitimacy in public opinion (Colleoni, 2013), incorporating sustainability messages. As customers view the sustainability contents, they are more engaged, more active and more likely to support the important causes presented to them, such as environmental and social ones.
Since sustainability aspects have become fundamental in the decision-making process of individuals (O'Rourke and Ringer, 2016), SEO techniques in this area are increasingly relevant to intercept the interest of users as they browse in search of valuable information. At this stage, it is important for companies' contents to be found through any search engine, to enjoy good visibility but, above all, to respond in a timely and adequate manner to the searches of their online audience (Fishkin and Høgenhaven, 2013). A great competitive advantage lies in the ability to overcome the cognitive overload generated by the surplus of web contents, improving the qualitative dimension of sustainability messages in accordance with the logic of web search engines (Halligan and Shah, 2009).
It should be considered that sustainability information is still based mainly on the transmission of objective and measurable data, through periodic reports. Measurable data, in fact, offer a verifiable vision of the objectives and performances of companies in terms of environmental, social and economic impact. In this field, some studies have already addressed the problem of defining how to assess the quality of sustainability information. According to Isaakson (2019), for example, the quality of sustainability information is linked to the ability to clearly define sustainability priorities based on stakeholders' needs. However, the sustainability statements in reports are often not really stakeholder-inclusive, as they frequently lack completeness, accuracy, clarity, timeliness and reliability (Boiral et al., 2019). As a result, the information in reports turns out to be too technical and specialized, difficult to understand and, consequently, not very attractive to the general public (Confetto and Covucci, 2021). This undermines the perceived information quality of sustainability contents, which often fail to deliver the very usefulness of the information transmitted. For this reason, the quality of sustainability contents for the web is linked to clarity and understandability, as well as to exhaustiveness in meeting the information needs of online users. The attempt to improve clarity and understandability has a great impact on the writing of content, in particular on the choice of terminology to facilitate readability and on the use of user-friendly keywords to facilitate the availability of contents in response to user queries. Therefore, this aspect has strong implications for SEO activities.
2.3 Evolution of search engines: content quality and keywords optimization
Access to information is often associated with browsers' search engines. For people who have even a vague idea of what they want or need, it's natural to search and then sort out the results in order to attain the best answer (Capra and Pérez-Quiñones, 2005). Companies aim for the top positions in the Search Engine Results Pages (SERPs), as when a web page is ranked higher, it offers a more favourable brand image, attracts more visitors and leads to greater purchase intent, influencing the formation of users' attitudes and beliefs (Epstein and Robertson, 2015).
The functioning of search engines has evolved over time with the main aim of facilitating the transmission on the web of content that is qualitatively more relevant and more responsive to the users' search intent. The recent updates to the Google search engine algorithm, for example, were aimed at "cleaning up" the SERPs from fake news and unwanted results, with particular attention to content concerning health, well-being and financial stability of users (the so-called "Medic Update"). Understanding the real search intent, however, requires efforts in terms of re-structuring knowledge towards the semantic web development (Hitzler et al., 2009; Suryanarayana et al., 2018). This goal is achievable by describing and interconnecting existing data through ontologies and standardized languages, in order to facilitate their semantic contextualization and deeper use (Patel and Jain, 2019). The result is a network of meanings, not just of machines. Ideally, users of semantic search engines can ask questions in natural language and receive relevant answers from machines that act as "intelligent agents" (Sadeeq and Zeebaree, 2021). To improve web page accessibility and understandability by the search engines, SEO is an important technique, particularly on-page SEO, which concerns the internal elements of a web page and above all the writing of contents.
The implementation of techniques such as Latent Semantic Indexing (LSI) has the purpose of finding hidden (latent) relationships between words (semantics) to improve the understanding of information (indexing) (Lahey, 2021), establishing, for example, that "home" is a synonym of "house", on the basis of their conceptual similarity (Srikanth and Sakthivel, 2019). Using semantically related keywords (LSI keywords) in the SEO of a web page could help a search engine to understand the meaning of its content (Mathews, 2020). When a search engine is able to understand the meaning of a content, it can index and rank it, clustering the search results with more relevance for the target queries (Harto, 2019).
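The idea behind LSI can be illustrated with a small sketch. It is not the procedure used by any search engine; it assumes NumPy is available, uses a hypothetical four-document toy corpus, and applies a plain truncated singular value decomposition to a term-document count matrix. In the toy corpus "home" and "house" never appear in the same document, yet both co-occur with "garden", so they end up close together in the reduced (latent) space.

```python
import numpy as np

# Hypothetical toy corpus: "home" and "house" never co-occur,
# but both co-occur with "garden".
docs = [
    "home garden",
    "house garden",
    "car engine",
    "car road",
]

vocab = sorted({word for doc in docs for word in doc.split()})

# Term-document count matrix: one row per term, one column per document.
A = np.array(
    [[doc.split().count(term) for doc in docs] for term in vocab],
    dtype=float,
)

# Truncated SVD: keep only the k strongest latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
term_vectors = U[:, :k] * s[:k]  # one k-dimensional vector per term


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def raw_similarity(term_a, term_b):
    """Similarity based on raw co-occurrence counts only."""
    return cosine(A[vocab.index(term_a)], A[vocab.index(term_b)])


def latent_similarity(term_a, term_b):
    """Similarity in the reduced (latent) space."""
    return cosine(term_vectors[vocab.index(term_a)],
                  term_vectors[vocab.index(term_b)])


print(raw_similarity("home", "house"))     # no shared documents
print(latent_similarity("home", "house"))  # close to 1 in the latent space
print(latent_similarity("home", "car"))    # close to 0: unrelated contexts
```

On real pages the matrix would normally be weighted (for example with TF-IDF) and built from a far larger corpus; the toy counts are used here only to keep the arithmetic easy to follow.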

3. Research design
The opportunities deriving from improving SEO practices in the specific field of sustainability (content) marketing constitute a new trend that has not yet been investigated. The development of new trends entails the need to use epistemological paradigms based on exploratory rather than explanatory research. This is due to the practical impossibility of observing and finding theoretical explanations for a phenomenon that has not yet occurred in practice. Therefore, the creation of artificial phenomena or simply artefacts, such as new frameworks or software applications, is essential.
DS is a "constructive research method" (Piirainen and Gonzalez, 2013) also known as action research or "interventionist" research (see, e.g. Jönsson and Lukka, 2006). DS aims at creating a new artefact as a solution to the problems found in practice: DS is fundamentally a problem-solving paradigm. The artefact has to be designed to extend the boundaries of human organizational and problem-solving capacities through intellectual and computational tools. A DS artefact in Information Science could be a construct (vocabulary and symbols), a model (abstractions and representations), a method (algorithms and practices) or an instantiation (implemented and prototype system) (Hevner et al., 2004) or any "new properties of technical, social and/or information resources" (Järvinen, 2007, p. 49).
DS includes the need of determining the desired functionality of the artefact and its architecture and then the creation itself. For the purposes of this study, the Design Science Research Methodology (DSRM) procedure, developed by Peffers et al. (2007), is adopted. DSRM entails the following steps: (1) identification of the problem and the motivation; (2) definition of the goal of the prospected solution; (3) design and development of the artefact; (4) demonstration; (5) evaluation and communication, through the publication of results.

3.1 Step 1 - identification of the problem and the motivation
Despite the predominant interest in corporate sustainability in online communication, web communication practices in this field have only recently been the focus of attention in the academic field (Süpke et al., 2009; Lodhia, 2010; Krätzig and Warren-Kretzschmar, 2014). In particular, there is a lack of measurement criteria that guarantee a robust implementation or verification of the accuracy of information outside the reporting channel. In fact, there is currently no tool for evaluating the setting of sustainability web contents.

3.2 Step 2 - definition of the goal of the prospected solution
The goal to be achieved is designing a tool to automatically analyse the sustainability web contents and assess their quality rating (QR) in terms of: completeness in the compilation of SEO data and metadata; use of sustainability keywords among metadata; clarity of the content subject; and consistency on the subject among the various parts in which the content is structured. The evaluation requires performing a series of content analysis steps: from reading, to extrapolating and analysing data, to calculating a score. For this reason, the solution of the problem can be identified in an algorithm, of semantic nature, that is "a set of steps used to perform a task" (March and Smith, 1995, p. 257) and could be considered to all intents and purposes a DS technical artefact (Weigand et al., 2020).

3.3 Step 3.1 - design and development of the artefact: the algorithm's logic
In order to work, the algorithm needs to be able to identify and collect specific semantic signals from the web content. Hence, it needs to read the content of the entire web document, analyse it syntactically (parsing) and then extract the data useful for achieving the scope. These data are contained in the Hyper Text Markup Language (HTML) code of the web pages. The algorithm, leaving aside the "instrumental" words (articles, prepositions, conjunctions, etc.), has to search within the tags for sustainability information, tracing the "theme-words" to immediately grasp the main subject of the content. The information to be searched must be provided to the algorithm by setting up a database, structured in order to rationalize the search for such elements. In this case, it is a database populated by terms and phrases belonging to the language of sustainability, organized according to a taxonomic structure that groups them by sustainability themes. The systematization of the terminology of the sustainability language required the establishment of a controlled vocabulary, following the procedure outlined by Deng et al. (2017). The corpus of terms of the vocabulary has been elaborated starting from three dictionaries or encyclopaedias already published on the subject (Beck, 2014; Idowu et al., 2015; Robertson, 2017). The three chosen sources, focussing on different aspects of sustainability and corporate social responsibility, guaranteed a wide range of terminology (over 12,000 entries). A consistent sample of the corpus (about 3,000 entries) was categorized according to a taxonomy of sustainability topics to test the functionality of the algorithm.
The taxonomy identifies and organizes the aspects of sustainability according to three levels (dimension, theme and topic), breaking down the basic conceptual dimensions of sustainability (Planet, People, Profit, Governance) into thematic categories (e.g. the Planet dimension is broken down into the themes "Environmental footprint", "Climate", "Biodiversity", "Green innovation and technologies"); each of these themes, in turn, groups more specific topics (e.g. the theme "Climate" comprises the topics "gas and emissions", "global warming", "deforestation", "transport"). The categorization was carried out by three researchers independently, through a manual approach not supported by content analysis software, to retain greater control over the disambiguation of terms. The process was guided by the definitions in the dictionaries used to build the controlled vocabulary (Beck, 2014; Idowu et al., 2015; Robertson, 2017). On the basis of the results achieved by each researcher, the final categorization choices were made by opting for the category agreed on by at least two researchers.
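The three-level structure and the agreement rule described above can be sketched as follows. The dictionary holds only the examples named in the text (topics of the other themes are left empty because they are not listed), and `agreed_category` is a hypothetical helper name; the sketch assumes exactly three independent labels per term.

```python
from collections import Counter

# Excerpt of the taxonomy (dimension -> theme -> topics), limited to the
# examples given in the text; empty lists mark topics that are not listed.
TAXONOMY = {
    "Planet": {
        "Environmental footprint": [],
        "Climate": [
            "gas and emissions",
            "global warming",
            "deforestation",
            "transport",
        ],
        "Biodiversity": [],
        "Green innovation and technologies": [],
    },
}


def agreed_category(labels):
    """Return the category chosen by at least two of the three coders.

    Returns None when all three coders disagree, in which case the term
    would need to be discussed rather than categorized automatically.
    """
    category, votes = Counter(labels).most_common(1)[0]
    return category if votes >= 2 else None


print(agreed_category(["Climate", "Climate", "Biodiversity"]))
print(agreed_category(["Climate", "Biodiversity", "Environmental footprint"]))
```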
A theme labelled "Broader entries" has been added to the taxonomy of sustainability topics. This additional category in the database is useful to collect all those vocabulary entries that may have semantic polyvalence; therefore, they cannot be traced back to just one of the thematic categories but, potentially, to all (the most obvious example of this need is linked to general terms such as "sustainability").
Starting from the relevant terms contained in the database, the algorithm analyses which and how many of these terms are contained in the various parts that make up the main content (MC) of a web page (i.e. the HTML document), detecting the category of the theme on which the web page focusses. In a well-formed HTML document, it is expected to find data visible to users and "not visible" data intended for search engines, organized by sections such as title (<page title>, <title> or <H1>), subtitles (<H2>, <H3>, <H4>, etc.), paragraphs (<p>), metadata (<meta-title>, <meta-description>, <meta-keywords>), etc. The analysis made by the tool takes into consideration both types of data, in order to evaluate the MC of the web page as a whole. In order to make the operation of the algorithm as useful as possible, it was decided not to neglect the potential for content optimization that would derive from the implementation of LSI techniques. The QR of a web content is favoured when the terms used are semantically related, as this strengthens the semantic context of the content. For this reason, within the database, semantic relationships have been created between some sustainability terms which are usually used as synonyms. This is the case, for example, of the word "sustainability", which in multiple semantic contexts can be replaced with terms such as "corporate sustainability", "corporate responsibility", "corporate social responsibility", "CSR"; in others, it can simply be specified as "environmental sustainability", "economic sustainability", etc. Since there is a semantic relationship between all these vocabulary entries, a corresponding relationship has been built in the database.
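A minimal sketch of this reading-and-matching step is given below, using only Python's standard `html.parser` module. It is an illustration under stated assumptions rather than the authors' implementation: the term database is a hypothetical five-entry excerpt, the metadata are assumed to be carried by standard `<meta name="description" content="...">` and `<meta name="keywords" ...>` elements, and matching terms directly against the database is what leaves the "instrumental" words aside.

```python
from html.parser import HTMLParser
import re

# Hypothetical miniature database: sustainability term -> theme.
SUSTAINABILITY_TERMS = {
    "global warming": "Climate",
    "deforestation": "Climate",
    "emissions": "Climate",
    "sustainability": "Broader entries",
    "csr": "Broader entries",
}

TEXT_TAGS = {"title", "h1", "h2", "h3", "h4", "p"}


class SectionCollector(HTMLParser):
    """Collect text per section: visible tags plus <meta name=... content=...>."""

    def __init__(self):
        super().__init__()
        self.sections = {}  # e.g. {"h1": "...", "meta-description": "..."}
        self._open = []     # stack of currently open text tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") in {"description", "keywords"}:
            self.sections["meta-" + attrs["name"]] = attrs.get("content") or ""
        elif tag in TEXT_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        if self._open:
            key = self._open[-1]
            self.sections[key] = (self.sections.get(key, "") + " " + data).strip()


def find_terms(text):
    """Return the database terms occurring in `text` (case-insensitive, whole words)."""
    lowered = text.lower()
    return [term for term in SUSTAINABILITY_TERMS
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)]


def analyse(html):
    """Map each detected section to the sustainability terms found in it."""
    parser = SectionCollector()
    parser.feed(html)
    return {section: find_terms(text) for section, text in parser.sections.items()}


PAGE = """<html><head><title>Fighting global warming</title>
<meta name="description" content="Our CSR report on emissions."></head>
<body><h1>Global warming</h1><p>Deforestation drives emissions.</p></body></html>"""

for section, terms in analyse(PAGE).items():
    print(section, terms)
```

Each detected term can then be looked up in `SUSTAINABILITY_TERMS` to obtain its theme, which is the information the later clarity and consistency checks rely on.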

3.4 Step 3.2 - design and development of the artefact: parameters and factors of sustainability contents QR
To understand on which variables to measure the quality of web content in terms of SEO, an in-depth analysis was conducted on multiple and updated practical guides to SEO (see: SEJ, 2019; Papagiannis, 2020; Clarke, 2021; Godin and Kennedy, 2021). This technique, in fact, due to the constant updates of the search engine algorithms, is constantly evolving and the factors on which it works change continuously. Moreover, it is important to highlight that it is never possible to know with certainty all the factors involved in the abovementioned technique. However, on-site SEO, in the part that concerns the writing of contents, entails tags and meta-tags whose importance lasts over time. Thus, these are the ones that have been considered to constitute a complete range of evaluation factors and that we have organized into three parameters (Completeness, Clarity and Consistency). Nevertheless, as repeatedly stressed, the quality of the contents depends a lot on their degree of clarity, and therefore, it also depends on the degree of interpretability by search engines (which is closely linked to the semantics of the keywords used in the tags and meta-tags).
The definition of contents' QR is thus based on three parameters: (1) Completeness (Com) provides information on the presence/absence of the most relevant tags and meta-tags and, in addition, on the use of sustainability keywords (SKws) in the compilation of data; (2) Clarity (Cla), relating to the possibility to unambiguously identify the purpose of the content, considering the concordance of elements within a content that give information on the matter (the theme); (3) Consistency (Con), considered as semantic coherence between the content parts. Content dealing with a given theme is expected to have a title, subtitles, links and metadata consistent with that theme.
Each evaluation parameter considers multiple factors. In the calculation, the factors are treated as dichotomous variables, that is, each is described as a condition that can have only two modes: to exist (Ok) or not to exist (Missing). When the detection is positive (i.e. the condition is met), a score proportional to the overall value of the parameter is attributed to the single factor. Each parameter will be expressed in hundredths, and the overall QR will be given by the sum of the three parameters (normalized on the basis of 100). In total, the Com is based on the detection of 20 factors, each of which will express the same weight in the evaluation calculation of 20/100 (Table 1), as it is not possible to establish with certainty which variables are most important for the indexing and ranking of content by search engines (Luh et al., 2016). Regarding the URL, there is no need to detect its presence, since a URL address is necessarily associated with each web page; therefore, only the possible presence of sustainability terms within it is detected (Com1). With regard to the Cla, the algorithm can firstly calculate whether the content concerns the Planet, People, Profit or Governance dimension and the specific theme. The algorithm can then identify the main sustainability keyword (MSKw), that is, the one mentioned several times in the text, and then determine the prominence and acceptability of its density. The factors that contribute to determining the clarity of the content are five (Table 2). The score attributed to each factor depends on how understandable the semantic context is: the semantic context is defined as clear if at least 51% of the SKws detected are attributable to the same sustainability dimension, since the same content may refer to multiple dimensions.
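The dichotomous scoring and the 51% rule for the sustainability dimension can be sketched as follows. The function names are hypothetical, and the sketch assumes that every factor of a parameter carries an equal weight of 100 divided by the number of factors, so that each parameter is expressed in hundredths; this is one reading of the text, not a confirmed detail of the tool.

```python
from collections import Counter


def parameter_score(factors):
    """Score a parameter out of 100 from dichotomous factors.

    `factors` maps a factor name to True (Ok) or False (Missing).
    Assumption: equal weights, i.e. 100 / number of factors.
    """
    if not factors:
        return 0.0
    weight = 100 / len(factors)
    return weight * sum(1 for met in factors.values() if met)


def dimension_is_clear(dimensions, threshold=0.51):
    """True if one dimension covers at least `threshold` of the detected SKws.

    `dimensions` holds one dimension label per detected sustainability keyword.
    """
    if not dimensions:
        return False
    top_count = Counter(dimensions).most_common(1)[0][1]
    return top_count / len(dimensions) >= threshold


clarity = {"Cla1": True, "Cla2": True, "Cla3": False, "Cla4": False, "Cla5": True}
print(parameter_score(clarity))                                # three of five factors met
print(dimension_is_clear(["Planet"] * 6 + ["People"] * 4))     # 60% Planet
print(dimension_is_clear(["Planet"] * 5 + ["People"] * 5))     # 50% each
```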
The theme of the content is clear when one taxonomic category predominates over the others, or when at least 51% of the SKws detected are categorized within the same theme. The MSKw is evaluated as clear when, among all SKws detected, there is one that occurs more frequently than the others. Once the MSKw has been determined, its prominence must be assessed, that is, the relevance given to this keyword within the content through its placement in positions that are strategic for SEO. Keyword prominence (Kivuti, 2018) helps to increase the clarity of the content if the MSKw is among the first 100 words of the paragraph, at the beginning of the page title tag (within the first three words), at the beginning of the H1 tag and of the other heading tags (first three words) and at the beginning of the meta-description tag (within the first 10 words). Keyword prominence is assessed on the basis of five items, each of which contributes 1/5 of the maximum prominence score (20/100), i.e. 4/100. The clarity parameter is also affected by keyword density (Bansal and Sharma, 2015), that is, the frequency of a keyword relative to the total number of words in the content. The algorithm evaluates the density of the MSKw only within the body of the text (<p> tags). The keyword density must not exceed 3% because, beyond this threshold, keyword stuffing occurs (Zuze and Weideman, 2013). The scoring of the Cla factors depends on the satisfaction of the aforementioned conditions of clarity: if a condition is met, the algorithm assigns a score unit of 20/100.

Completeness (Com) evaluation system. Columns: Items; Conditions for score attribution; Score (Yes/No).

Finally, as regards Con, the algorithm first evaluates the consistency between the preponderant category of sustainability terms included in the various tags and meta-tags and the taxonomic category corresponding to the theme of the content in general (Cla2). Where the correspondence is perfect, consistency is strong and the maximum score is attributed to the factor (10/100). Often, however, some parts of the MC do not concern exactly the same theme but neighbouring themes (i.e. those related to the same dimension in the taxonomy or those labelled "Broader"), with the aim of enriching the informative purpose of the content. For this reason, in the absence of strong consistency, the Con factors are evaluated in relation to the dimension (Cla1). Where this correspondence exists, a lower score is awarded (7/100). Con is measured on the basis of 10 factors, each of which is attributed a maximum value of 10/100 (Table 3). Consistency is assessed only with regard to the factors actually identified (present in the Com). Undetected tags and meta-tags are excluded from the calculation, as it would not make sense to evaluate the consistency of items that do not exist.
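The consistency rule described above (10/100 for a theme match, 7/100 for a dimension-only match, undetected tags skipped) can be illustrated with a minimal Python sketch. The function names, the input structure and the "Human rights"/"People" category are assumptions made for illustration only; the treatment of "Broader" entries and any renormalization over the detected factors are not modelled, because the text does not specify them.

```python
from typing import Dict, Optional, Tuple

# Points per factor out of 100, as stated in the text.
STRONG_POINTS = 10  # tag theme equals the content theme (Cla2)
WEAK_POINTS = 7     # only the dimension (Cla1) matches
NO_POINTS = 0       # neither theme nor dimension matches

# Hypothetical input type: factor name -> (theme, dimension), or None if undetected.
TagCategories = Dict[str, Optional[Tuple[str, str]]]


def consistency_factor_points(tag_theme: str, tag_dimension: str,
                              content_theme: str, content_dimension: str) -> int:
    """Score one tag or meta-tag against the content's theme and dimension."""
    if tag_theme == content_theme:
        return STRONG_POINTS
    if tag_dimension == content_dimension:
        return WEAK_POINTS
    return NO_POINTS


def consistency_score(tags: TagCategories, content_theme: str,
                      content_dimension: str) -> Tuple[Dict[str, int], int]:
    """Return per-factor points and their sum, skipping undetected tags."""
    per_factor: Dict[str, int] = {}
    for name, category in tags.items():
        if category is None:
            continue  # undetected tags are excluded from the calculation
        theme, dimension = category
        per_factor[name] = consistency_factor_points(
            theme, dimension, content_theme, content_dimension)
    return per_factor, sum(per_factor.values())


# Illustrative data only.
example_tags: TagCategories = {
    "title": ("Climate", "Planet"),
    "h1": ("Environmental footprint", "Planet"),
    "meta_description": None,
    "h2": ("Human rights", "People"),
}
example_points, example_total = consistency_score(example_tags, "Climate", "Planet")
```

In this example the title is strongly consistent, the H1 is weakly consistent, the missing meta-description is ignored and the H2 is not consistent.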
The overall evaluation (QR), expressed in a range from 0 to 100, is computed from the three parameters, where each variable ranges between 0 and the maximum number of factors of its parameter (Com [0, 20]; Cla [0, 5]; Con [0, 10]) and the constant K corresponds, for each parameter, to the score attributed to each factor that satisfies the condition for assigning the score (K1 = 20; K2 = 5; K3 = 10). Each parameter (Com, Cla, Con) contributes equally to the final QR. Evaluations expressed synthetically in the form of a numerical judgment have the advantage of being immediately understandable because they are read in relation to the maximum achievable score (100, in this case).

| Item | Condition for score attribution | Yes | No |
|---|---|---|---|
| Cla1 | The sustainability dimension is clear | 20/100 | 0/100 |
| Cla2 | The sustainability content theme is clear | 20/100 | 0/100 |
| Cla3 | The MSKw is clear | 20/100 | 0/100 |
| Cla4 | The prominence of the MSKw is appropriate: | | |
| | In the paragraph | 4/100 | 0/100 |
| | In the page title tag | 4/100 | 0/100 |
| | In the H1 title tag | 4/100 | 0/100 |
| | In the heading tag(s) | 4/100 | 0/100 |
| | In the meta-description tag | 4/100 | 0/100 |
| Cla5 | The MSKw density is appropriate | 20/100 | 0/100 |

Table 2. Clarity (Cla) parameter evaluation system

However, to facilitate the interpretation of the numerical data, the QR is accompanied by a label that shows the level of adequacy of the content optimization with respect to the established compliance parameters. The QR reflects, albeit indirectly, the competitive ability of the content in response to the same search intent. Therefore, the maximum scale of 100 points has been divided into scoring ranges (Table 4). The threshold for excellent optimization was set very high (90/100) because SEO is highly competitive, and a very high level of performance is therefore necessary. On the other hand, two considerations need to be made. First, the lack of a single factor in terms of completeness automatically penalizes the score on the other two parameters as well. Consider, for example, the case in which the description meta-tag is not used (Com8 = 0): it automatically follows that Com9 = 0 and that the factor Cla4 is reduced by 4/100. Second, the impossibility of detecting the Cla2 factor penalizes the calculation of the Con parameter, affecting the overall QR. For this reason, the range of acceptability was set very wide (from 65 to 89). In the example above, the single corrective action of including a meta-description with a sustainability keyword would significantly improve the QR of the content.
The same interpretation can be applied to the scores of the individual parameters.
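The two thresholds stated in the text (excellence from 90, acceptability from 65 to 89) can be turned into a label with a minimal Python sketch. The function name and the label strings are assumptions, and scores below 65 are grouped under a single label here because the full subdivision of Table 4 is not reproduced in this section.

```python
def qr_label(qr: float) -> str:
    """Map a QR score (0-100) to an adequacy label.

    Only the 90 and 65 thresholds come from the text; the label strings
    and the single bucket below 65 are illustrative assumptions.
    """
    if not 0 <= qr <= 100:
        raise ValueError("QR must lie between 0 and 100")
    if qr >= 90:
        return "excellent"
    if qr >= 65:
        return "acceptable"
    return "below acceptable"
```

For example, a score of 89 is labelled as acceptable, while a score of 90 is labelled as excellent.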

Table 3. Consistency (Con) parameter evaluation system. Columns: Items; Conditions for score attribution; Score (Yes/No; Strong consistent, Low consistent, Not consistent). Row Con7: The main theme of heading tags (H2, H3, H4)

Table 4. Guidelines for the interpretation of the score. Row "Too low": The optimization of the content is insufficient; an important review is required to improve the QR

Step 4 - demonstration: an applicability test of the algorithm
To test the functionality of the algorithm, a web application was implemented to automate the algorithm's process and return the results organized in data tables. The development of the web application followed the iterative approach of the agile methodology (Cao et al., 2009), according to an adaptive software development life-cycle model (Highsmith, 2013). An analysis test was conducted in April 2021 on the web pages of the Google blog section dedicated to sustainability, progress, tools, reports and initiatives implemented to save the planet (https://blog.google/outreach-initiatives/sustainability/), published between January 2020 and April 2021. The sample consists of 10 web pages, each of which has an ID marked with a hashtag (Table 5).

The choice of Google web pages for the analysis is not accidental. First of all, within the Google blog, the concept of sustainability is mainly associated with environmental sustainability (Planet dimension). In the controlled vocabulary, the terminology referable to the Planet dimension is the richest in technicalities and specialized terms, so the categorization work is less susceptible to errors of subjectivity. For this reason, the Google blog was considered the most appropriate source of contents on which to test the algorithm. Moreover, since the development of the algorithm is based on the known and updated SEO factors used for indexing and positioning web contents, the web pages managed by Google are expected to comply with the established requirements. Therefore, it is assumed that at least acceptable scores will be obtained on all detection parameters.

3.6 Step 5 - findings
Table 5 summarizes the results of the sustainability-contents SEO evaluation. All the scores obtained fall within the range of acceptability (65 ≤ QR ≤ 89) or even in that of excellence (QR ≥ 90).

Regarding Com (Table 6), it should be noted, first of all, that all the selected tags and meta-tags were found to be in use (items: Com2, Com4, Com6, Com8, Com10, Com12, Com14, Com16). However, the complete lack of the Com11 factor (SKws in meta-keywords) casts doubt on the use of the meta-keywords tag (Com10). In fact, the page source of every web page in the sample contains <meta name="keywords" content="None"/>. Consequently, the tag actually exists (and the software detects it), but the "None" value makes it null. This confirms the known fact that Google does not use the meta-keywords tag in web ranking results (Cutts, 2009). In all other cases where an SKw is missing in a tag, it is considered an SEO choice. Note that groups of LSI keywords were found in all the pages analysed (Com20).
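The distinction noted above, between a meta-keywords tag that exists and one that actually carries keywords, can be sketched in Python with the standard-library `html.parser` module. The class and function names are hypothetical, and treating an empty or "None" value as null is an assumption that mirrors the observation in the text, not a description of the authors' software.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MetaKeywordsParser(HTMLParser):
    """Record whether a <meta name="keywords"> tag exists and its content."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False
        self.content: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<meta ... />) are routed here by the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            self.found = True
            self.content = attributes.get("content")


def meta_keywords_status(html: str) -> Tuple[bool, List[str]]:
    """Return (tag_detected, usable_keywords) for a page source."""
    parser = MetaKeywordsParser()
    parser.feed(html)
    parser.close()
    value = (parser.content or "").strip()
    # Assumption: an empty value or the literal "None" carries no keywords.
    if not parser.found or value == "" or value.lower() == "none":
        return parser.found, []
    return True, [k.strip() for k in value.split(",") if k.strip()]
```

With the markup quoted above, the tag is detected (as for Com10) but yields no keywords in which an SKw could be found (as for Com11).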
Regarding Cla (Table 7), the data show, first of all, the correct detection of the main dimension of the contents (Cla1), that is, Planet. The two resulting themes (Cla2), Environmental footprint and Climate, are themes of the Planet dimension, as are the MSKws (Cla3), except for the term "sustainability" (#4), categorized as a "broader entry". The prominence score is calculated from the position of the MSKw detected in the various parts of the text (Cla4). For example, within the paragraph, 1/5 of the overall Cla4 score (i.e. 4/100) is attributed if the MSKw is positioned within the first 100 words. This means that 4 points are awarded in cases #1, #2, #3, #6, #9 and #10. Finally, considering the MSKw density, all web pages except #3 comply with the <3% rule.
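The paragraph-level prominence check (first 100 words, 4/100) and the density check (3% threshold, 20/100) can be illustrated with a minimal Python sketch. The tokenizer, the function names and the choice to count density as occurrences of the keyword phrase divided by total words are assumptions; the threshold is applied as "must not exceed 3%", following the wording of the method description.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def count_phrase(words: List[str], phrase: List[str]) -> int:
    """Count occurrences of a (possibly multi-word) keyword in a token list."""
    n = len(phrase)
    if n == 0:
        return 0
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)


def keyword_density(body_text: str, keyword: str) -> float:
    """Occurrences of the keyword divided by the total number of body words."""
    words = tokenize(body_text)
    if not words:
        return 0.0
    return count_phrase(words, tokenize(keyword)) / len(words)


def cla4_paragraph_points(body_text: str, keyword: str, limit: int = 100) -> int:
    """4/100 if the keyword occurs within the first `limit` words, else 0."""
    first_words = tokenize(body_text)[:limit]
    return 4 if count_phrase(first_words, tokenize(keyword)) > 0 else 0


def cla5_points(body_text: str, keyword: str, max_density: float = 0.03) -> int:
    """20/100 if the keyword density does not exceed the threshold, else 0."""
    return 20 if keyword_density(body_text, keyword) <= max_density else 0


# Illustrative bodies of 100 or 101 words each.
early = "carbon footprint " + "word " * 98
late = "word " * 100 + "climate"
stuffed = "climate " * 5 + "word " * 95
```

Here `early` places the keyword at the start with a density of 1%, `late` places it after the first 100 words, and `stuffed` reaches a density of 5%, which exceeds the threshold.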
Concerning Con (Table 8), the detection of the main sustainability theme deriving from the words used in each part of the content shows that, in all cases where the tag or meta-tag exists, keywords consistent with the main theme (Cla2) are used within it. In some cases, there is a strong consistency (SC); in others, there is a weaker consistency (C).
Finally, the expectation that Google's sustainability blog pages would score at least within the acceptable range was met.

Discussion and implication
The results of the algorithm applicability test allow us to make some considerations on its usefulness in assessing the quality of web contents focussed on sustainability issues.
If we take Google as the reference for the norm in optimizing contents for web search engines, given its dominance as the preferred tool for online searches [1], we can say that the quality factors on which the algorithm is based largely reflect the standard set by Google. Google's advances in promoting content quality have aimed at a deep understanding of content, in order to facilitate matching between search queries and the content proposed in SERPs. This required a major step towards the development of the semantic web, an "environment" still at an embryonic stage (Hitzler, 2021). The semantic web implies a new way of structuring documents for their usability on the web (Berners-Lee et al., 2001). This restructuring involves, first of all, the application of ontological languages as a methodology for defining the knowledge domain of web contents. This is the only way to propose relevant results to online users' search queries. In this perspective, keywords will increasingly become the measurable and quantifiable basis of an optimization methodology oriented towards a neural match with users (Southern, 2019; Hermanson, 2021).
The controlled sustainability vocabulary that serves as a database for the application of the algorithm was created in the semantic web perspective, as it organizes and describes the sustainability language in predetermined classes (sustainability domain) and sub-classes (sustainability themes) and creates basic semantic relationships between the entries. This work of categorization and construction of semantic relationships between the terms of the vocabulary enriches the sustainability taxonomy used, providing further data on the basis of which to evaluate and, consequently, optimize the web contents on sustainability issues.
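The role of the vocabulary's classes and sub-classes in the clarity checks can be illustrated with a minimal Python sketch. The three vocabulary entries, the function names and the use of the same predominance rule for the dimension as for the theme are assumptions; the 51% share and the "one keyword more frequent than the others" rule follow the method description.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical vocabulary entries: term -> (dimension, theme).
VOCABULARY: Dict[str, Tuple[str, str]] = {
    "carbon footprint": ("Planet", "Environmental footprint"),
    "emissions": ("Planet", "Environmental footprint"),
    "climate change": ("Planet", "Climate"),
}


def dominant(counts: Counter, min_share: float = 0.51) -> Optional[str]:
    """Return the category that predominates or holds at least min_share."""
    if not counts:
        return None
    ranked = counts.most_common(2)
    top, top_count = ranked[0]
    strictly_ahead = len(ranked) == 1 or top_count > ranked[1][1]
    if strictly_ahead or top_count / sum(counts.values()) >= min_share:
        return top
    return None


def clarity_signals(skws: List[str],
                    vocabulary: Dict[str, Tuple[str, str]] = VOCABULARY
                    ) -> Dict[str, Optional[str]]:
    """Detect the main dimension (Cla1), theme (Cla2) and MSKw (Cla3)."""
    known = [k for k in skws if k in vocabulary]
    dimension = dominant(Counter(vocabulary[k][0] for k in known))
    theme = dominant(Counter(vocabulary[k][1] for k in known))
    ranked = Counter(known).most_common(2)
    mskw = None
    # The MSKw is clear only if one keyword is strictly the most frequent.
    if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
        mskw = ranked[0][0]
    return {"Cla1": dimension, "Cla2": theme, "Cla3": mskw}
```

A value of `None` for a signal corresponds to a factor that is not clear and would therefore score 0/100.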

First of all, the algorithm evaluates completeness in the compilation of SEO data and metadata. Completeness is to be understood not only as the use of all the content optimization factors (i.e. title, meta-title, description, etc.) but, above all, as the use of terms that, belonging to the same language (that of sustainability), strengthen the semantic context of the content. In this sense, the added value of the algorithm lies, in particular, in detecting the use of LSI keywords. LSI helps to solve search intent disambiguation problems such as those associated with polysemy and the so-called "lateral relationships" between an ontology's entities (Krum, 2019). Therefore, unlike most of the SEO analysis tools already available on the market, this algorithm brings the assessment of completeness to a higher level of sophistication. The results it returns not only reveal the factors neglected in the writing phase of content processing, highlighting the errors to be corrected, but, being verticalized on a specific communication area (communication for sustainability), also permit more precise and contextualized analyses aimed at elaborating contents on this specific theme.
To this end, the analysis of the Completeness parameter needed to be integrated with the analysis of the other two parameters, Clarity and Consistency, which are fundamental principles of sustainability communication in particular. The clarity of a content is relevant from the point of view of both the user (who must be able to understand what the content is about) and the search engine (which uses some signals to establish the correspondence of a content with the search intent). By evaluating the ability of the content to communicate the "subject" to which it refers in an unambiguous and understandable way, clarity guarantees the readability of the text and the correct understanding of sustainability information. Consistency is also relevant, first of all, for the user, as the signals that lead to opening one web page rather than another (such as the title or meta-description) must not be misleading but must direct the user to a content that responds to his/her information needs. Consequently, it is an important signal for search engines when indexing content and presenting it in response to specific queries. In the context of sustainability communications, consistency increases the reliability of the information conveyed and, therefore, of the source that conveys it.
The algorithm establishes a relationship between the three parameters: the results on the factors of one can affect the results on the factors of the other two. This correlation is justified by the fact that those who deal with content optimization need an overview of the items that can modify the rating results, understanding the relationships between the factors. For example, the semantic consistency between different parts of the content can be established only when it is possible to determine the semantic context, or focus theme, of each part: it is therefore closely linked to the degree of clarity of the content. With this in mind, the algorithm assumes regulatory value, establishing the rules, or rather the standard, to be followed in setting the sustainability content. It is therefore useful as a tool for ex-post analysis of the optimization level for search engines, but it can also be used as a guiding model during the copywriting phase.
The algorithm's logic has its roots in a theoretical framework that aims to emphasize the link between two fields of study widely debated in both academic and professional marketing literature: digital content marketing and sustainability communication, with a focus on the aspects of SEO. The literature review showed that the digital evolution, on the one hand, and the sustainability revolution, on the other, have made it fundamental for modern companies to establish strong, trusting and long-term relationships with their stakeholders, creating shared value between parties (Sinthupundaja and Kohda, 2019; Pucci et al., 2020). This shared value can be found in the sustainable approach to business operations, as much as to consumption, generating well-being, at the same time, for the company, for the environment and for society (Yang and Yan, 2020). In this perspective, the marketing and communication efforts of companies have, in recent years, focussed on seizing and exploiting the opportunities offered by the web and social media as an ideal space to activate an authentic involvement of stakeholders in business decisions and conduct in support of sustainable development (Khan et al., 2019; Sivarajah et al., 2020). In this sense, sustainability communication expresses its greatest potential through the creation and transmission of informative, educational and emotional web contents. SEO practices on sustainability contents enhance stakeholder engagement because they allow brands to propose highly targeted content. In terms of theoretical contribution, this study identifies, for the first time in the literature, a thread linking sustainability communication, inbound and content marketing and SEO: a new field of research that opens up exciting managerial opportunities.
Writing content in an optimal way to intercept web users' needs could be seen as a powerful social engineering tool in the hands of marketers, able to manage the relationships with multiple stakeholders, consumers first and foremost. Marketers need to understand the role of sustainability web contents in creating desirable changes and making people able to improve (Kennedy and Parsons, 2012), while fulfilling the goals of sustainability marketing. At the same time, they cannot and must not ignore the developments in the direction of the semantic web. This places the spotlight on the need to rethink SEO techniques in line with the development of the ontological languages of the semantic web, especially with reference to the increasingly important sustainability web language.
A further practical implication relates to reporting activities. When the algorithm is used to assess the quality of sustainability reporting contents, each of these can be set up as a single content, independent of and detached from the entire document, suitable for user-friendly web transmission. This means that each report content can individually perform the task of answering specific search queries more precisely.

Conclusions, limitations and further research
This study addresses the important priority of responding to modern conscious consumers' need for sustainability information in the web context. The effort to design a "sustainability-contents SEO" tool was made with the aim of intervening on the quality of sustainability content to develop and maintain, in the long term, trusting relationships between companies and publics. The concept of quality in web searches is essentially bound to the ability to respond efficiently to users' search intent. This still requires, however, a great deal of work to develop and improve the logic of the semantic web, in terms of both organization and representation of knowledge. Given the theoretical premises of this study, it is reasonable to suppose that, in future, sustainability information about companies or products will be so relevant as to be provided immediately among the results in SERPs, before the user chooses to consult a specific web page. To act as a so-called "rich snippet", the sustainability information will have to be represented in the form of metadata, within the structured data of the web page.
This study, still purely in the experimental phase, is only the first step towards the semantic structuring of the sustainability language for the web. The validity of the controlled vocabulary of sustainability language depends on the sources chosen to collect the terminology, as well as on the subjectivity applied by researchers in categorizing entries according to the selected taxonomy. In order to overcome this limitation, it was decided to set the algorithm to analyse SKws at the "theme" level and not at the "topic" level. Greater accuracy in the organization of the sustainability language and, consequently, in the usability of the algorithm can be achieved by applying machine learning and artificial intelligence mechanisms to the development of the vocabulary database.
Additional empirical demonstrations will be necessary to evaluate the utility of the algorithm in improving the positioning of contents in SERPs. However, the algorithm focusses only on the assessment of the MC of a web page, leaving out the other SEO factors linked to the structure of the website and its "reputation". The applicability test carried out, having focussed on web pages from a single source (Google's blog) with an evidently good reputation for search engines, bypassed this type of problem. In future research, it would be interesting to integrate into the instrument the possibility of detecting all the other SEO aspects concerning the architecture of the website as a whole, in order to offer content managers an overall view.
As social networks are an innovative and effective resource that organizations can use to develop common awareness and personal motivation to embrace the principles of sustainability, sustainability contents must begin to be built in a more creative, multimedia way, with a strong visual impact. Even in this context, however, the technical aspects of optimization for the search engines of social networks cannot be neglected: they are very important factors in optimizing contents for online sharing. The internal search engine of each social network works according to a logic that differs from those of other social networks' engines and of general web search engines.
Social network meta-tags are useful for improving the look and the efficacy of social network post previews. Each social network search engine is based on a specific protocol for defining meta-properties (e.g. Open Graph for Facebook, Twitter Cards for Twitter, etc.). In future development of the tool, considering these variables in the evaluation process would make the sustainability-contents SEO algorithm more comprehensive.
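As an indication of how such meta-properties could be detected in a future version of the tool, the following minimal Python sketch collects Open Graph and Twitter Cards tags from a page source with the standard-library `html.parser` module. The class and function names are hypothetical, and the sketch only extracts the tags: how they would be scored is not defined in this study.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple


class SocialMetaParser(HTMLParser):
    """Collect Open Graph (property="og:...") and Twitter Cards
    (name="twitter:...") meta tags from a page source."""

    def __init__(self) -> None:
        super().__init__()
        self.open_graph: Dict[str, Optional[str]] = {}
        self.twitter: Dict[str, Optional[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content")
        prop = (attributes.get("property") or "").lower()
        name = (attributes.get("name") or "").lower()
        if prop.startswith("og:"):
            self.open_graph[prop] = content
        elif name.startswith("twitter:"):
            self.twitter[name] = content


def social_meta(html: str) -> Tuple[Dict[str, Optional[str]], Dict[str, Optional[str]]]:
    """Return (open_graph_tags, twitter_card_tags) found in the page."""
    parser = SocialMetaParser()
    parser.feed(html)
    parser.close()
    return parser.open_graph, parser.twitter


# Illustrative page source only.
sample_page = (
    '<head>'
    '<meta property="og:title" content="Our climate commitments"/>'
    '<meta name="twitter:card" content="summary_large_image"/>'
    '<meta name="description" content="Plain description"/>'
    '</head>'
)
sample_og, sample_twitter = social_meta(sample_page)
```

The extracted values could then be passed through the same SKw detection applied to the other tags and meta-tags.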
Finally, the flexibility of the evaluation software developed, that is, its applicability to other communication areas, should be highlighted. If the database is populated with a controlled vocabulary of a different nature, the validity of the algorithm in assessing the QR of web content remains the same. This translates into the concrete future possibility of adapting the tool to the evaluation of contents in other communication areas, i.e. contents focussed on topics other than sustainability.