Website quality evaluation: a model for developing comprehensive assessment instruments based on key quality factors

Purpose – The field of website quality evaluation attracts the interest of a range of disciplines, each bringing its own particular perspective to bear. This study aims to identify the main characteristics – methods, techniques and tools – of the instruments of evaluation described in this literature, with a specific concern for the factors analysed, and, based on these, a multipurpose model is proposed for the development of new comprehensive instruments.
Design/methodology/approach – Following a systematic bibliographic review, 305 publications on website quality are examined; the field's leading authors, their disciplines of origin and the sectors to which the websites being assessed belong are identified; and the methods they employ are characterised.
Findings – Evaluations of website quality tend to be conducted with one of three primary focuses: strategic, functional or experiential. The technique of expert analysis predominates over user studies, and most of the instruments examined classify the characteristics to be evaluated – for example, usability and content – into factors that operate at different levels, albeit with little agreement on the names used in referring to them.
Originality/value – Based on the factors detected in the 50 most cited works, a model is developed that classifies these factors into 13 dimensions and more than 120 general parameters. The resulting model provides a comprehensive evaluation framework and constitutes an initial step towards a shared conceptualization of the discipline of website quality.


Introduction
Over the last three decades, websites have become one of the most important platforms on the Internet for disseminating information and providing services to society. Shortly after their first appearance, the need to evaluate website quality became evident. The earliest analyses were developed by experts in human-computer interaction and comprised usability heuristics. Quality criteria of this kind are typically organized into dimensions, broken down into parameters and, finally, into indicators, the core elements of analysis that make it possible to operationalize and assess the parameters. Thus, for example, the dimension of "information architecture" includes "labelling" as one of its parameters and this, in turn, includes, among others, "conciseness", "syntactic agreement", "univocity" and "universality" as its indicators.
To evaluate these indicators, website quality studies employ different methodologies: experimental and quasi-experimental, as well as descriptive and observational, the latter typical of the associative or correlational paradigm. Likewise, such evaluations might adopt either qualitative or quantitative perspectives, undertaking both subjective and objective assessments. Similarly, they might employ either participatory and direct methods, as they record user opinions, or non-participatory and indirect methods, such as inspection or web analytics.
In the case of participatory methods, user experience (UX) studies have focused on user preferences, perceptions, emotions and physical and psychological responses that can occur before, during and after the use of a website (Bevan et al., 2015). The most frequently employed techniques are testing, which resorts to such instruments as usability tests, A/B tests and task analyses; observation, centred on ethnographic, think-aloud and diary studies; questionnaires, including surveys, interviews and focus groups; and biometrics, which uses eye tracking and psychometric and physiological reaction tests, to name just a few (Rosala and Krause, 2020).
Among the most common methods of inspection, we find expert analysis, a procedure for examining the quality of a site or a group of sites employing guidelines, heuristic principles or sets of good practices (Codina and Pedraza-Jiménez, 2016). The most common instrument is that of heuristic evaluation, in which a group of specialists judge whether each element of a user interface adheres to principles of usability, known as heuristics (Paz et al., 2015; Jainari et al., 2022).
Other instruments employed in undertaking inspections include checklists, in which each indicator usually takes the form of a question whose answer, typically binary, shows whether or not the quality factor under analysis is met; scales, where each indicator is assigned a relative weight based on the importance established or calculated by the experts for each parameter under evaluation (Fernández-Cavia et al., 2014); indices, metrics that not only evaluate a website's quality but also how good it is in comparison with similar sites (Xanthidis et al., 2009); and analytical systems, typically qualitative instruments of either a general or specialized nature, which are mainly aimed at evaluating individual websites, conducting benchmarking studies, or for use as web design guides. These systems of analysis vary depending on the factors that their creators consider key to determining the quality of a website (Sanabre et al., 2020). In this study, in order to standardise their name, we refer to them as "evaluation instruments".
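To make the contrast between these instrument types concrete, the following sketch scores a site first with a binary checklist and then with a weighted scale. The indicator names, ratings and weights are hypothetical illustrations, not drawn from any of the instruments cited:

```python
# Illustrative sketch: scoring a website with a binary checklist versus a
# weighted scale. Indicators, ratings and weights are invented examples.

# Checklist: each indicator is a yes/no question; the score is the
# proportion of quality factors met.
checklist = {
    "has_contact_information": True,
    "has_internal_search_engine": True,
    "states_publication_date": False,
}
checklist_score = sum(checklist.values()) / len(checklist)

# Scale: each indicator receives an expert rating (0-10) and a relative
# weight reflecting the importance assigned to its parameter.
ratings = {"labelling": 8, "navigation": 6, "readability": 9}
weights = {"labelling": 0.5, "navigation": 0.3, "readability": 0.2}
scale_score = sum(ratings[k] * weights[k] for k in ratings)

print(f"checklist: {checklist_score:.2f}, scale: {scale_score:.1f}")
```

A checklist answers "is the factor present?", while a scale additionally encodes how much each factor matters, which is why scales require prior expert agreement on the weights.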
These instruments can be applied manually, that is, by experts in website quality or those with an understanding of the discipline; in a semi-automated fashion, with the help of software and specialised validators (Ismailova and Inal, 2017); or in a fully automated manner (Adepoju and Shehu, 2014), using techniques of artificial intelligence (Jayanthi and Krishnakumari, 2016) or natural language processing (Nikolić et al., 2020). Thus, content analysis, a major technique in website quality inspection, can be applied in any one of these three ways.
Finally, we also find techniques aimed at the strategic analysis of performance (Król and Zdonek, 2020), including return on investment; search engine positioning (Lopezosa et al., 2019); and competitiveness, including web analytics (Kaushik, 2010) and webmetrics (Orduña-Malea and Aguillo, 2014). Additionally, within this group we find mathematical models for decision making with multiple, hybrid, intuitive or fuzzy criteria (Anusha, 2014). By employing criteria at different, unconnected levels, these models establish a hierarchy of evaluable factors (Rekik et al., 2015). They are used, among other applications, to weight user responses and generate indices of satisfaction or purchase intention.

Thus, this review of the literature highlights that the study of website quality is multidimensional. Moreover, such evaluations can adopt a range of different focuses and employ multiple techniques and instruments. With this as our working hypothesis, we seek here to determine the properties that characterise the main website quality evaluation instruments, as well as to identify the dimensions, parameters and indicators that they analyse in each case. Based on these outcomes, we develop a comprehensive evaluation framework (Rocha, 2012). This, in addition to unifying the different concepts examined and helping to clarify the broad panorama comprised by website quality publications, should serve both as a guide and model for the development of new instruments that can be employed by professionals and researchers alike in this field.

Objectives
The general objective of this article is to identify the main characteristics of the instruments of website quality evaluation described in the literature, with particular attention to the factors they analyse, and then, based on this analysis, to propose a multipurpose model for the development of new comprehensive instruments.

Specific objectives
(1) Characterize the main methods and techniques of evaluation used in website quality analyses, while identifying the specific focus of the instruments proposed, whether strategic, functional or experiential.
(2) Determine which website quality factors are used by the instruments employed in the most cited works, and how these are grouped into different dimensions, parameters and indicators.
(3) Build a model that can serve as a guide for the development of future instruments for evaluating website quality.

Methodology
To achieve the objectives outlined above, the systematic bibliographic review method (Booth et al., 2016) was employed, undertaking a search in academic databases and conducting a systematic mapping of the literature (Gough et al., 2017). Specifically, the review was carried out applying the SALSA protocol (Grant and Booth, 2009), which includes the search, appraisal, analysis and synthesis of the selected works.
In the search phase, to identify the main published works on website quality evaluation, we used the search equation presented below, comprising the most common keywords in the specialized literature and representative of the main facet of the field as it stands today: [website OR "web site" OR "web sites"] AND [quality] AND [evaluation OR evaluating OR evaluate OR analysis OR assessment OR assess OR assessing OR assurance OR index OR guideline OR standard OR heuristic].
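When the review is replicated across several databases, a search equation like this can be assembled programmatically. The sketch below simply rebuilds the equation above from its three facets; the bracket and quoting conventions follow the equation as printed and may need adapting to each database's own syntax:

```python
# Rebuild the boolean search equation from its three facets: facets are
# ORed internally and ANDed together; multi-word terms are quoted so
# databases treat them as phrases. Terms are taken verbatim from the
# search equation in the text.
facets = [
    ["website", "web site", "web sites"],
    ["quality"],
    ["evaluation", "evaluating", "evaluate", "analysis", "assessment",
     "assess", "assessing", "assurance", "index", "guideline", "standard",
     "heuristic"],
]

def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes for phrase searching."""
    return f'"{term}"' if " " in term else term

query = " AND ".join(
    "[" + " OR ".join(quote(t) for t in facet) + "]" for facet in facets
)
print(query)
```

Keeping the facets as data rather than a hand-typed string makes it easier to document the equation and to translate it consistently into, say, WoS or Scopus advanced-search syntax.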
The query was executed in the multidisciplinary databases of Web of Science (WoS) and Scopus and the results were ordered by relevance, filtered by language, selecting only studies published in English, and by year of publication, comprising the six-year period 2014-2019 (Codina, 2018).This procedure was repeated in other specialized databases of importance in the discipline, including IEEE, ACM, Emerald and the LISTA collection of EBSCO, among other information resources.
Likewise, the Google Scholar search engine was also used, which, in addition to its wide coverage (Martín-Martín et al., 2018), includes books, technical reports and other documents of interest to both the academic and professional community in the field of website development. To these were added international guidelines and standards detected by undertaking a systematic mapping review (Gough et al., 2017). As a result, a corpus of 432 documents was created, once duplicates and false positives had been excluded.
These documents were appraised by conducting a manual examination of titles and abstracts to determine whether they met the established inclusion or exclusion criteria. The former included studies dedicated to website quality analysis published in the previously established period and language. Publications dedicated solely to web analytics, studies of mobile phone applications and studies focused on user psychology and not on a particular website were excluded. Thus, an evidence base (Yin, 2015) comprising 305 documents was finally obtained.
In the third phase, all the papers were reviewed, their formal aspects described, their quality attributes and methodological tools classified according to a code book (Lavrakas, 2008), and relevant data about their content collected. Then, based on the number of citations reported in Google Scholar as of September 2020, the average number of citations (average citation count, ACC) was determined, normalised according to the number of years elapsed since publication (Dey et al., 2018). Using this indicator, we identified the 50 most cited texts, which account for 86% of the total number of citations received.
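As an illustration of this ranking step, the following sketch computes the ACC, normalising each document's citation count by the years elapsed since publication as of the 2020 reference year. The document records are invented placeholders, not data from the study:

```python
# Rank documents by average citation count (ACC): citations divided by
# the years elapsed since publication, as of a reference year.
# The records below are invented placeholders, not real study data.

REFERENCE_YEAR = 2020  # citations were collected in September 2020

docs = [
    {"title": "Guideline A", "year": 2016, "citations": 400},
    {"title": "Study B", "year": 2019, "citations": 90},
    {"title": "Study C", "year": 2014, "citations": 300},
]

for d in docs:
    years_elapsed = max(REFERENCE_YEAR - d["year"], 1)  # avoid division by zero
    d["acc"] = d["citations"] / years_elapsed

# In the study, the 50 documents with the highest ACC formed the evidence
# base for the model; here we simply sort the placeholders.
most_cited = sorted(docs, key=lambda d: d["acc"], reverse=True)
print([d["title"] for d in most_cited])
```

Note how the normalisation lets a recent, moderately cited study ("Study B", ACC 90) outrank an older one with more total citations ("Study C", ACC 50).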
Finally, in the synthesis phase, all the data were systematized onto a spreadsheet containing the following details: the characteristics of the websites evaluated; the parameters and indicators considered as quality factors; and the respective methods, models, instruments and software on which the evaluation instruments proposed in each study are based.

Results
The main findings from the coincidence count conducted on the data obtained in the synthesis phase, and the most relevant outcomes derived as a result, are detailed below.

Characteristics
Between 2014 and 2019, a total of 305 publications on website quality evaluation were found, with an average of 51 studies per year.A steady upward trend is evident in the period analysed.
Among the scientific journals, 166 different titles were detected, 44 of which belong to the field of health and medical informatics.The journals with the highest number of articles published on this subject were The Electronic Library, International Journal of Engineering and Technology, International Journal of Information Management, Online Information Review and Universal Access in the Information Society.
The number of citations received by each text according to Google Scholar (GS) was also recorded. Table 1 shows the fifteen works with the highest average citation count. The first twelve positions are occupied by website quality guidelines, such as those of the World Wide Web Consortium (W3C, 2016), and new editions of reference books in the discipline (Krug, 2014; Sauro and Lewis, 2016; Shneiderman et al., 2016). These publications mostly contain general recommendations, that is, applicable to any website, with the exception of the guide for websites of the European Union (European Commission, 2016) and the HONcode (Health On the Net, 2017), specialized in medical information.
The level of specialization of the evaluation instruments proposed in these works was also examined. Specifically, a distinction was drawn between those that propose an analysis applicable to all types of website (general) and those that focus on a specific sector. It emerged that most of the evaluation instruments (73.4%) focus on a particular sector (Figure 1).
The same figure shows that the latter are led by the education sector (universities, libraries and museums, among others), closely followed by the health sector, which includes both health sites and hospital websites. At a lower scale, we find the government sector, which focuses on the quality of websites of government administrations and municipalities; commerce, dominated by e-commerce stores; tourism, with sites of destinations, hotels and airlines; and the media, focused on the Internet news media.

Methods, focuses and techniques
A clear predominance of the associative or correlational paradigm, as opposed to the experimental, is observed in the type of applied research conducted on the evaluation instruments. Indeed, most of the analytic instruments use observational or descriptive methodologies. Also evident is the pre-eminence of qualitative over quantitative approaches, and a balance between objective evaluations, based on the verification of verifiable characteristics, and subjective assessments, based on the perceptions of experts and users.
In turn, most of the proposals are based on non-participatory or indirect methods and, as a result, there are fewer instruments based on surveys or interviews. Similarly, there are a greater number of studies that focus on the verification of technical and functional requisites (57.4%) compared to those concerned with user experience (23.0%), with the strategic objectives of the site owner (14.0%), or with mixed focuses (5.5%).
If we examine more specifically the instruments present in all the publications (Table 2), we find that three-quarters were designed to be applied by professional experts in website quality, and include checklists, indices and scales, and specialized instruments that articulate various dimensions for evaluation.In contrast, usability tests and user questionnaires are much scarcer.

Dimensions, parameters and indicators
Of the 305 publications, 241 (79.0%) present website quality criteria expressed as dimensions, parameters or indicators, the latter being the most specific unit of analysis.To further our examination of these criteria, we concentrate on the systematization of the criteria present in the evaluation instruments proposed in the fifty works with the highest average number of citations.
Overall, we detected 38 factors explicitly stated as dimensions or parameters and 154 as indicators.As Table 3 shows, there is a degree of overlap between the two lists given that each author ranks and classifies the website quality factors differently depending on their own specific objectives.
It is apparent that usability and accessibility occupy the first positions both as a dimension or parameter and as an indicator. However, if all the factors directly linked to content are grouped together (that is, readability, language, transparency and others), this criterion is the one that concentrates the highest number of mentions. Information architecture and navigation, as well as interface graphic design, also feature prominently.
It should be noted here that there are entire studies that focus exclusively on a single parameter, as in the case, for example, of credibility (Choi and Stvilia, 2015) and accessibility (Kamoun and Almourad, 2014), but which are treated as just one more indicator in others. There are also instruments that include indicators that apply only to very specific types of site, such as "public values" and "citizen engagement" on local government websites (Karkin and Janssen, 2014) or "emotional appeal" and "use of science in argumentation" in health websites (Keselman et al., 2019).
Likewise, we detect indicators that differ greatly in their nature. Thus, atomic and dichotomous indicators, verifying the presence of a specific element, such as an internal search engine or contact information, coexist with other more abstract, subjective properties, such as coherence, integrity, aesthetic appeal or familiarity. This multiplicity of characteristics and conditions in the nature of the indicators leads us to propose a categorization (Table 4) that should facilitate a better understanding of them.
As can be seen, the indicators can be designed with a specific focus in mind, be it strategic, functional or experiential in nature. The latter, for example, cannot be assessed by means of a metric or a technical inspection, but require a more complex evaluation, often expressed using a scale or score, applied by an expert or by recording the perceptions expressed by a website's users.

Instruments, tools and models
Precisely because of this need to measure indicators of a different nature, website quality evaluation uses a multiplicity of instruments, models and tools. Many originate from the research methodologies employed in the social sciences, as in the case, for example, of questionnaires, interviews and observation, while others, such as web analytics and code validators, were formulated specifically to evaluate a site's characteristics. Table 5 reports the techniques most frequently employed by the evaluation instruments described in the 50 publications with the highest number of average citations. It shows that undertaking surveys is the most frequently used technique for collecting user data in these studies. Other techniques used for this purpose include task observation, usability tests, interviews and focus groups. Expert analyses are also represented, as identified through the use of checklists, content analyses, manual inspections and web analytics, all of which are indirect methods that do not necessarily require user participation.
The instruments also employ specialized tools and software, among which we find both manual procedures, such as the DISCERN or HONcode guidelines (Dueppen et al., 2019; Manchaiah et al., 2019) for the evaluation of medical information on the Internet and the Web Content Accessibility Guidelines (WCAG) 2.0, and automated inspection mechanisms, including the W3C HTML and CSS code validators, the Majestic SEO tool for analysing backlinks and the Readability Studio software, aimed at determining text readability (Cajita et al., 2017).
Other software mentioned include AChecker, EvalAccess 2.0, WaaT and Fujitsu Web Accessibility Inspector for automated accessibility validation; Xenu's Link Sleuth and LinkMiner for broken link detection; Pingdom, for monitoring download speed and service availability; SortSite for website technical analysis; mobileOK for mobile adaptability; and SimilarWeb for measuring the site's traffic and that of its competitors, to name just a few (Ismailova and Inal, 2017).
We also find mathematical models designed for multiple-criteria decision-making that are employed primarily on e-commerce sites. In models of this type, user and expert responses, collected by means of assessment scales, are subjected to a variable-weighting mechanism to obtain, for example, an index of perceived quality (Cristóbal Fransi et al., 2017) or of content credibility (Choi and Stvilia, 2015).
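As a minimal illustration of the weighted-sum step common to such models, the sketch below aggregates scale responses per criterion and combines them with expert weights into a single index. The criteria, weights and responses are hypothetical examples, not taken from the models cited:

```python
# Minimal sketch of the weighted-sum aggregation used by multiple-criteria
# models: respondent ratings on assessment scales are averaged per criterion
# and combined with expert-assigned weights into a single quality index.
# Criteria, weights and responses are hypothetical.
from statistics import mean

responses = {  # 1-5 scale ratings from three respondents per criterion
    "ease_of_use": [4, 5, 4],
    "content_credibility": [3, 4, 4],
    "visual_design": [5, 4, 3],
}
weights = {"ease_of_use": 0.5, "content_credibility": 0.3, "visual_design": 0.2}
assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights must sum to 1

quality_index = sum(mean(responses[c]) * weights[c] for c in responses)
print(f"perceived quality index: {quality_index:.2f} / 5")
```

Fuzzy or hierarchical variants replace the simple mean and fixed weights with membership functions or pairwise-comparison matrices, but the final aggregation into an index follows this same pattern.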

Proposed model
Following on from the review of the literature dedicated to website quality evaluation and drawing above all on the 50 most cited works, we propose a multipurpose model with three specific focuses for the formulation and application of comprehensive instruments of evaluation. We divide this model into three parts: the first provides a breakdown of website quality parameters, organized according to the specific focus they offer; the second serves as a visual scheme of the model's main dimensions and focuses; and the third comprises a set of tasks, or framework, that synthesizes the stages that a researcher needs to consider when designing a website quality evaluation instrument.
In Table 6, we classify into thirteen dimensions the more than 120 website quality factors that appear most frequently in the 50 most cited texts.These factors are treated here as parameters because each of them can be broken down further into a number of different indicators.The dimensions are presented in descending order of frequency as they appear in the literature, while the parameters are organised according to the specific focus taken by the study.
Thus, the table compiles the parameters that have been the object of most attention in the website quality studies identified as having greatest impact.The model proposed on the basis of this mapping aims to offer researchers wanting to design new evaluation instruments a broad initial set of common parameters.The parameters, moreover, are all of a general nature and, as such, can be applied to any type of website.Consequently, the parameters can also be used to complement the specific parameters of sector-specific evaluation instruments.
As can be seen, usability and content are the dimensions with the most parameters, while the others are made up of fewer. However, here we have opted for a hierarchical structure so that important website factors, such as user assistance and support, advertising and legal aspects, which are typically dealt with less frequently in the literature, are more visible. In so doing, we also seek to identify gaps and, hence, research opportunities, as in the case, for example, of the parameters to evaluate website services, which are not as well developed as those for website content. It also emerges that while certain parameters respond to more than one focus within the same dimension, as in the case, for example, of multilingualism or user satisfaction, we have opted not to repeat them but rather to take a decision regarding their classification.
The second component of the model is a diagram (Figure 2) that synthesizes the dimensions of website quality evaluation, placing at its core the three analytical focuses proposed: strategic, experiential and functional. For each focus, it then shows, in a tiered arrangement, the dimensions that we consider most important for any website. The figure can be read as follows: starting from the base with the site's essential elements and working up to the peak, we begin the evaluation by determining how solid the content and services base is, continue by analysing its interface and user experience, and conclude by verifying whether the website owner's strategic objectives, a critical factor in any exhaustive evaluation, have been met.
Finally, the model also includes a framework or procedure for the creation of instruments to evaluate website quality.Table 7 classifies and sets out the individual steps required to design either a general or sector-specific instrument.It is organized in accordance with the most frequently employed techniques in the discipline: namely, user studies, expert analyses and strategic analyses.In this way, those responsible for the creation of the instrument can opt to incorporate those techniques they consider most pertinent, with triangulation being recommended for the most exhaustive evaluation.
This framework is divided, in turn, into five design stages: definition, research, parameterization, testing and validation. In the first, the design of the instrument is planned in relation to a set of given requirements, including the objectives and scope, and the conditions that delimit its use, including the resources and the degree of data access granted to the key informants. In the second, the research stage, a study is undertaken of the specific characteristics of the sector to which the site belongs, its context of use, the profile of its users and the concrete recommendations previously made by other experts. These first two stages are common to each of the three techniques addressed.
From the third stage onwards, the tasks vary depending on the technique chosen by the creators of the instrument (see Table 7). In the parameterization stage, all the sector-specific quality factors relevant to the website's objective are determined. Then, in the testing stage, an initial test of the instrument is made to identify opportunities for improvement and to calibrate it for purposes of optimization. Finally, in the validation stage, its reproducibility is verified based on the observations of other experts.
In this way, the model guarantees that any evaluation instrument created using this methodology provides an exhaustive analysis of the quality of any given website. This is thanks to the fact that the model recommends the use of a triangulation of focuses and techniques and considers such components as the testing of the general heuristics of usability; the expert analysis of sector-specific indicators; the study of users, albeit with indirect methods such as web analytics; and, importantly, the verification that the site meets its strategic objectives.
To ensure that the cycle of enhancement continues to have positive effects on the websites analysed, we recommend communicating the results in a timely and effective manner, with a summary of the most relevant findings or insights, accompanied ideally by suggested approaches for addressing the most recurrent problems.

Discussion
Based on these results, and in line with the conclusions drawn by other authors (Rekik et al., 2018; Semerádová and Weinlich, 2020), it is evident that studies concerned with website quality evaluation have undergone steady growth in recent years, attracting primarily the interest of authors from academia, but also from the professional world. In this regard, the interest of a number of specific academic disciplines in such analyses is notable, led by the computer sciences, health sciences and business. However, it is worth stressing that no interdisciplinary or transdisciplinary studies involving these fields of study have been detected and that most of the papers cite almost exclusively references from their own discipline.
At the same time, it is apparent that proposals for sector-specific or specialized evaluation instruments are increasing (Morales-Vargas et al., 2020).The education and health sectorsabove all, the analysis of health information sitesare the sectors with the highest number of studies, followed by those of government, commerce and tourism.
A finding of some relevance here is the focus adopted by the website quality evaluation instruments. All in all, we detect three primary focuses: strategic, oriented to the fulfilment of the site owner's objectives; functional, present in more than half the proposals and designed to verify the presence of technical factors; and experiential, with a concern for user experience and perception. Sanabre et al. (2020) are pioneers in combining the strategic and functional focuses, but the incorporation of all three is not evident in any of the studies reviewed herein.
A common element in the way evaluation instruments are organized is the fact that most opt to express the factors to be analysed as dimensions, parameters and indicators. Although a variety of different names are employed to refer to them, including attributes, criteria, variables and characteristics, as reported by Chiou et al. (2010), what is present in all of them is the idea of starting from broad groups of properties which are then gradually broken down into more specific units of analysis that facilitate inspection.
Content, usability and accessibility are the most frequently occurring dimensions among the most cited studies, followed by information architecture and visual design. In the case of the pre-eminence of content, our results coincide with those of Cao and Yang (2016) and Hasan (2014). Similarly, as regards the number of different indicators detected, our results are in line with the outcomes reported by Sun et al. (2019). However, other studies have tended to assign the leading role to credibility (Choi and Stvilia, 2015; Huang and Benyoucef, 2014), functionality (Law, 2019) or trust (Daraz et al., 2019).
Our study also identifies indicators of a differing nature, including, for example, their level of specificity. Therefore, here, for the first time, we propose a categorization of the parameters according to their scope, site of validation, focus, way of scoring and perspective. We construct a model that classifies the parameters detected, numbering more than 120, into thirteen dimensions and three focuses. In this way, we seek to identify the elements that make up an instrument for evaluating websites as well as the main characteristics it is designed to analyse. The classification we propose is based on previous studies that have been validated by experts, including, for example, the Lee-Geiller and Lee (2019) model.
Having identified the general parameters for the evaluation of all types of website, we propose a procedure for creating new instruments for evaluating website quality. The procedure comprises the following five stages: (1) definition of objectives; (2) study of the characteristics specific to a given sector; (3) parameterization of the most relevant attributes; (4) piloting or testing of the instrument; and (5) its subsequent validation by other experts. In this way, an evaluation centred on three main points of focus (strategic, functional and experiential) is guaranteed, satisfying also the need to use multiple tools, as detected by Rekik et al. (2018), and triangulation, as recommended by Whitenton (2021).

Conclusions
As is more than evident, website quality as a field of study continues to occupy a broad space in which different areas of knowledge are in continuous dialogue.But the field has yet to develop a shared terminology, a shortcoming that hinders efforts to establish its conceptualization as a discipline in its own right.
Despite the technological advances made and the growing technical mastery of their users, websites are still in need of evaluation instruments that can enhance both their performance and user experience.This is most apparent when these websites belong to a sector whose content, functions and services are characterised by a set of specific requirements.
As such, we wish to highlight the importance in the field of website quality of being able to identify and analyse a set of dimensions, parameters and indicators that are specific to each type of website. However, at the same time, it is critical that this be done by adopting a range of different focuses: in other words, the instrument of evaluation has to be able to assess the technical and functional requirements as well as the website's strategic objectives and user experience.
This study, therefore, proposes a model for the development of new comprehensive instruments for the evaluation of website quality that are applicable to a very broad set of domains.It also constitutes an initial step in the adoption of a shared conceptualization in this field of study.The latter should, moreover, promote the sharing, reuse and comparison of the instruments proposed by other website quality researchers and professionals working in different disciplines.
Figure 2. Focuses and dimensions of website quality evaluation. Source(s): Created by the authors based on the most cited publications.