Development and validation of a scale for measuring hospital service quality: a dyadic approach

Purpose – This study integrates the providers' perspective as well as the patient's perspective in developing and validating a scale to measure hospital service quality in multispecialty hospitals. Design/methodology/approach – An exploratory sequential mixed-method approach was used in this study. The strategies used included a thematic literature review, semi-structured interviews, a modified Delphi and confirmatory factor analysis. Findings – The reliability coefficient of the 41-item scale was 0.963, with the three attributes, that is, pivotal, core and peripheral, having Cronbach's alpha values of 0.907, 0.91 and 0.891, respectively, and a scale-level content validity index (S-CVI/Ave) of 0.9151. The composite reliability scores of all constructs were greater than 0.7, and the average variance extracted (AVE) of all constructs was greater than 0.5. Originality/value – The instrument can be used to measure the difference between what service providers believe customers expect and customers' actual needs and expectations. The scale can be used to measure the difference between what is delivered (as perceived by the provider) and what customers perceive they have received (because they are unable to accurately evaluate service quality). The dyadic approach of administering this questionnaire in measuring hospital service quality will lead to the identification of a knowledge gap and a perception gap in delivering hospital service quality.


Introduction
In the past decade, much of the hospital service quality (HSQ) research has focused on managing customer expectations and perceptions [1]. HSQ is usually evaluated as the gap between health care seekers' expectations and their perceptions of performance. The number of dimensions on which HSQ is measured ranges from one to as many as ten [2]. SERVQUAL is a widely used instrument that measures the service quality gap between customer expectations and perceptions on five dimensions [3]. Some authors have proposed an alternative, SERVPERF, arguing that measuring customers' perceptions alone on these dimensions is sufficient to judge service quality [4]. Besides these two, several other variants are available to measure HSQ in various health care settings [3], but all take a user-centric approach.
In health care services, the user-centric view of service quality may be biased for several reasons, information asymmetry being one of them. Consequently, patients/attendants are left with no option but to believe what they have been told or what has been delivered to them. The patient/attendant is considered a layperson in evaluating fundamental medical/clinical care [2]. HSQ evaluations, therefore, tend to be biased toward the process of delivery and the physical settings of hospitals, which a patient/attendant can easily evaluate. Unlike users of other services, patients are in a state of physical or psychological discomfort [5] and are likely to see service quality differently from service providers [6]. Whereas health care seekers are disposed to evaluate the functional aspects of care, the providers of care believe that an acceptable level of technical quality should precede them [7]. Therefore, from the providers' perspective, HSQ may vary with the knowledge and professional effort they apply [8]. Further, physical and emotional job-related stress may cause service quality to vary [9]. The inherent inseparability of the service provider and the seeker of care in a professional service such as health care calls for a dyadic view of HSQ instead of either the seekers' or the providers' perspective alone [10].
The hospital service quality literature seemingly sheds little light on two relevant questions: (1) How can service quality be evaluated while accounting for the dyadic nature of the professional exchange in the service relationship? (2) Which dimensions reflect these dyadic exchanges? A dyadic perspective on measuring hospital service quality opens a new way not only to assess the service quality gap between customer expectations and perceptions but also to measure the knowledge and perception gaps [11], benefiting seekers, practitioners, hospital managers and administrators. Measuring the perceptions of both service seekers and providers can improve relationships, job satisfaction and performance in the health care delivery process. The dearth of HSQ measurement scales that incorporate both participants' perspectives, from item creation to final development and validation, adds novelty to our study.
Adopting a mixed-method approach, the study first identified eighteen dimensions of hospital service quality through a literature search. The PCP model [12] helped in classifying these dimensions under pivotal (end product or outcome), core (people, process and organizational structure) and peripheral (incidental extras or frills around the service encounters) attributes, which served as priority themes for the interview rounds with health care service seekers and providers. Template analysis [13] of the seekers' and providers' interview-based textual data generated an item pool of 107 unique statements. An authoritative panel evaluated the statements for content validity using a modified Delphi approach. The scale was then refined and tested for reliability and validity using confirmatory factor analysis. The final forty-one-item scale incorporates thirteen dimensions of hospital service quality drawn from both health care seekers and providers.

Methodology
Phase 1: item generation process
A thematic review of articles related to hospital service quality was conducted from January to March 2018. A total of sixty-three articles published in thirty-four journals available to the authors were reviewed to identify the determinants of hospital service quality. This led to the identification of priority themes for the subsequent round of interviews.
Patients and their attendants who had visited any multispecialty hospital in the previous year were approached using snowball sampling. Eleven women and ten men, aged twenty-five to sixty-two years, participated, and semi-structured interviews were conducted with them from June to September 2018. During the same period, fifteen doctors, nine nursing and para-medical staff and three hospital administrators/managers working in three multispecialty hospitals were also interviewed. These respondents were also approached using snowball sampling; the sample comprised sixteen women and eleven men aged twenty-five to fifty-one years. The template analysis technique [13] was used to analyze the qualitative data generated through the interviews.
Phase 2: Modified Delphi process
Expert selection. Because the dimensions of hospital service quality had already been identified, round one of the "Classical Delphi" was redundant in our study, which called for a "Modified Delphi" with a heterogeneous panel. Twenty-six panelists were approached using purposive sampling for participation in the survey, and the purpose and design of the study were explained to them. Informed consent was obtained from the panelists, and their anonymity was maintained throughout the survey.
Data collection. The twenty-six panelists were invited to participate in the survey between August and October 2019, and all consented. A paper survey was designed, and each respondent was briefed on how to fill in the questionnaire. After one reminder, twenty-three panelists returned the questionnaire; three could not participate due to other engagements. The authoritative coefficient (Cr) was used to establish the credibility of the panel members [14]. It was determined by two factors: the judgment criterion for the indicator (Ca) and familiarity with the indicator (Cs). A Cr value greater than 0.7 was considered acceptable.
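The Cr criterion can be sketched as follows. The text does not state how Ca and Cs are combined; this minimal sketch assumes the form commonly used in the Delphi literature, Cr = (Ca + Cs) / 2, with both scores already scaled to the range 0 to 1. The function names and the example panel are hypothetical.

```python
from statistics import mean, stdev


def authority_coefficient(ca: float, cs: float) -> float:
    """Panelist credibility Cr, ASSUMED to be the mean of Ca and Cs (both scaled to 0-1)."""
    if not (0.0 <= ca <= 1.0 and 0.0 <= cs <= 1.0):
        raise ValueError("Ca and Cs are assumed to be scaled to [0, 1]")
    return (ca + cs) / 2


def summarize_panel(scores, threshold=0.7):
    """Return per-panelist Cr values, their mean and SD, and whether the mean exceeds the threshold."""
    crs = [authority_coefficient(ca, cs) for ca, cs in scores]
    panel_mean = mean(crs)
    return crs, panel_mean, stdev(crs), panel_mean > threshold


if __name__ == "__main__":
    # Hypothetical (Ca, Cs) pairs for three panelists, not data from this study.
    panel = [(0.9, 0.7), (0.8, 0.8), (0.7, 0.6)]
    crs, m, sd, acceptable = summarize_panel(panel)
    print(crs, round(m, 3), round(sd, 3), acceptable)
```

Reporting the panel mean and SD mirrors the way the Delphi results summarize Cr; a per-panelist screen against 0.7 would be an equally plausible reading of the criterion.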
In round one, panelists were asked to rate each item on a 5-point Likert scale, which suits surveys that measure level of agreement; they rated their degree of agreement with each item on an ordinal scale from strongly disagree to strongly agree. The median rating was calculated for each item. In round two, all items were re-presented to the panelists so that they could review their ratings against the group median computed in round one.
Items with a median rating of 4 or more were assumed to qualify initially as items measuring hospital service quality. The panelists were also asked to rate the relevance of each item on a 4-point ordinal scale (1 = not relevant, 2 = somewhat relevant, 3 = quite relevant and 4 = highly relevant). Ratings of 3 and 4 were considered content valid, so the item-level content validity index (I-CVI) is the proportion of panelists who rated an item 3 or 4. To check the stability of responses, a multi-rater kappa coefficient for the degree of agreement beyond chance was calculated [15]. Thus, items with a median rating of 4 or above and an I-CVI above 0.79 were retained.
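The two retention criteria can be expressed as a short filter. This is a sketch, not the authors' code: it assumes each item's ratings are available as two lists (5-point agreement ratings and 4-point relevance ratings, one entry per panelist), and all names are hypothetical. With twenty-three panelists, an I-CVI above 0.79 requires at least nineteen panelists to rate the item 3 or 4, since 19/23 ≈ 0.826 while 18/23 ≈ 0.783.

```python
from statistics import median


def i_cvi(relevance: list[int]) -> float:
    """Item-level CVI: share of panelists rating relevance 3 or 4 on the 1-4 scale."""
    return sum(1 for r in relevance if r >= 3) / len(relevance)


def meets_consensus(agreement: list[int], relevance: list[int],
                    min_median: float = 4, min_icvi: float = 0.79) -> bool:
    """Retention rule: median agreement >= 4 and I-CVI > 0.79."""
    return median(agreement) >= min_median and i_cvi(relevance) > min_icvi


def retained_items(ratings: dict[str, tuple[list[int], list[int]]]) -> list[str]:
    """ratings maps item id -> (agreement ratings, relevance ratings); returns ids meeting both criteria."""
    return [item for item, (agree, rel) in ratings.items() if meets_consensus(agree, rel)]


if __name__ == "__main__":
    # Hypothetical ratings from 23 panelists for two items.
    ratings = {
        "item_a": ([4] * 15 + [5] * 8, [4] * 19 + [2] * 4),  # I-CVI = 19/23
        "item_b": ([4] * 15 + [5] * 8, [4] * 18 + [2] * 5),  # I-CVI = 18/23
    }
    print(retained_items(ratings))  # ['item_a']
```

In a two-round design the same filter is simply rerun on the round-two ratings.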

Phase 3: scale refinement and validation
A close-ended, self-administered questionnaire containing the items retained after the Delphi rounds was used to collect information from caregivers. The data were collected both online and offline using convenience sampling, to avoid common method bias. The psychometric properties of the proposed instrument to measure hospital service quality were tested using confirmatory factor analysis [16]. Cronbach's alpha (>0.7), composite reliability (>0.7) and convergent validity through average variance extracted (AVE > 0.5) were checked as per the guidelines [17]. The schema of the research process is shown in Figure 1. After removing redundant statements and statements with similar meanings, the final pool of 107 statements was prepared for the Delphi round.
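The three thresholds can be checked with the standard formulas sketched below: Cronbach's alpha from raw item scores, and composite reliability and AVE from the standardized factor loadings of one construct's items. This is an illustrative sketch, not the authors' analysis pipeline (the text does not name the software used); the function names and the example loadings are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(items: list[list[float]]) -> float:
    """items[j][i] is respondent i's score on item j; all items share the same respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(pvariance(col) for col in items)   # sum of the item variances
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def composite_reliability(loadings: list[float]) -> float:
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances 1 - loading^2)."""
    s = sum(loadings)
    error = sum(1 - lam * lam for lam in loadings)
    return s * s / (s * s + error)


def average_variance_extracted(loadings: list[float]) -> float:
    """AVE = mean of the squared standardized loadings."""
    return sum(lam * lam for lam in loadings) / len(loadings)


if __name__ == "__main__":
    # Hypothetical standardized loadings for one construct.
    loadings = [0.82, 0.76, 0.71, 0.79]
    cr = composite_reliability(loadings)
    ave = average_variance_extracted(loadings)
    print(f"CR = {cr:.3f} (> 0.7: {cr > 0.7}), AVE = {ave:.3f} (> 0.5: {ave > 0.5})")
```

Population variances are used throughout alpha; switching consistently to sample variances gives the same value because the n versus n - 1 factor cancels in the ratio.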

Delphi round
Twenty-three of the twenty-six invited panelists (88%) participated in the first round, and all twenty-three (100%) participated in the second round. The mean authoritative coefficient Cr was 0.79 (SD = 0.06), which was considered good (Table 1). Of the 107 statements presented to the panelists, only twenty-two met the consensus criteria in round one, that is, a median rating greater than or equal to 4 and an item-level content validity index (I-CVI) greater than 0.79. The individual ratings were aggregated and summarized, and each panelist was re-presented with a survey showing their own ratings on all statements alongside the aggregated rating. Panelists were thus given a chance to revisit their level of agreement with each statement in light of the group response. The second round of Delphi resulted in the retention of forty-nine statements that achieved consensus on both criteria, a median rating greater than or equal to 4 and an I-CVI greater than 0.79 (range 0.8261 to 1).
Scale-level content validity (S-CVI/Ave) was calculated for the items retained after the second round of Delphi. The S-CVI/Ave value was 0.9095 (SD = 0.0531), above 0.9, indicating excellent content validity [18]. Fleiss's kappa was used to calculate inter-rater reliability. Kappa can range from -1 to +1, with positive values indicating agreement beyond chance. The Fleiss's kappa value of 0.63 indicated substantial inter-rater reliability [19]. A p-value of less than 0.05 indicated that the agreement between the raters was significantly better than would have been achieved by chance.
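S-CVI/Ave is the mean of the retained items' I-CVIs, and Fleiss's kappa compares the observed multi-rater agreement with the agreement expected from the overall category proportions. The sketch below implements the standard Fleiss formula in plain Python. It assumes every item is rated by the same number of panelists and that kappa was computed on the categorical relevance ratings, which the text does not state. The function names are hypothetical, and because the text does not say how its p-value was obtained, no significance test is included.

```python
from collections import Counter
from statistics import mean


def s_cvi_ave(icvis: list[float]) -> float:
    """Scale-level CVI (averaging method): mean of the item-level CVIs."""
    return mean(icvis)


def fleiss_kappa(ratings: list[list[int]], categories: list[int]) -> float:
    """ratings[i] holds every rater's category for item i; each item has the same number of raters."""
    n = len(ratings[0])  # raters per item
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("need the same number (>= 2) of ratings for every item")
    N = len(ratings)  # number of items
    counts = [Counter(r) for r in ratings]
    # Observed agreement per item: proportion of agreeing rater pairs.
    p_items = [(sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1)) for c in counts]
    p_bar = mean(p_items)
    # Chance agreement from the overall share of ratings falling in each category.
    p_cat = [sum(c[cat] for c in counts) / (N * n) for cat in categories]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical relevance ratings (1-4) from three raters on four items.
    demo = [[4, 4, 4], [3, 3, 4], [2, 2, 2], [4, 3, 4]]
    print(round(fleiss_kappa(demo, [1, 2, 3, 4]), 3))
```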

Tests of scale refinement and validation
The online and offline survey resulted in 403 responses (288 online, 115 offline). Ten questionnaires were unusable due to missing information. Six cases were treated as outliers because the observations had a unique combination of values across variables [17], resulting in 387 usable responses. The values of the absolute, relative and non-centrality-based fit indices, shown in Table 2, surpassed the recommended thresholds for all the dimensions in the three attributes. The composite reliability (CR) of all dimensions in the final 41-item scale was above 0.7 [17], establishing construct reliability, with a minor deviation in the charges and payment construct (CPA). The AVE of all constructs was greater than 0.50, indicating good convergent validity (Table 3). Cronbach's alpha, the most widely used reliability coefficient, was 0.963 (>0.7) for the complete 41-item scale. The 15 items of pivotal attribute,

Discussion
The scale distinguishes itself from other scales for measuring HSQ by incorporating a dyadic approach to service encounters, a thorough scale development process and an authoritative panel. One previously developed scale that used the dyadic approach based its items only on face and content validity established in discussion with experts [16]. Another scale, still in the development stage, planned to conduct in-depth interviews but, owing to constraints, ultimately resorted to a modified Delphi alone; it also did not establish the authority of its heterogeneous Delphi panel [20]. Other contemporary scales for measuring hospital service quality have borrowed most of their items from the literature alone [5,16], with little effort to recognize that the health care needs of developing countries differ from those of other countries [21]. The thirteen dimensions of HSQ in our scale are linked to the five-dimensional construct of the SERVQUAL scale. Diagnosis and treatment, professional skills and competence of service providers, and medical communication add to the reliability of the service. The process construct depicts the responsiveness of the service provider. Patient safety and privacy, personal behavior, and charges and payments provide assurance to customers. Caring, individualized attention, that is, empathy, is reflected in the needs management and discharge constructs. Medical infrastructure, amenities and physical infrastructure, and quality of room and food add up to tangibility. However, the authors recommend that the proposed service quality dimensions be classified under the PCP model of service quality because of their closer linkages with it.
The authors propose a dyadic approach to using this questionnaire to measure HSQ. The scale can be used to measure the knowledge gap, that is, the difference between what service providers believe customers expect and customers' actual needs and expectations. Further, the scale can be used to measure the perception gap, that is, the difference between what is delivered (as perceived by the provider) and what customers perceive they have received. Administering this questionnaire dyadically in measuring hospital service quality will lead not only to a measurement of the service gap but also to the identification of knowledge and perception gaps in HSQ [10,11].
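One way to operationalize the three gaps is to compare group means item by item or dimension by dimension, assuming the same items are administered to patients/attendants (expectations and perceptions) and to providers (what they believe patients expect and what they believe they deliver). The sign conventions and function names below are illustrative assumptions, not part of the published scale. Because seekers and providers are different respondents, these are differences between independent group means rather than paired differences.

```python
from statistics import mean


def gap(a: list[float], b: list[float]) -> float:
    """Mean rating of group a minus mean rating of group b."""
    return mean(a) - mean(b)


def dyadic_gaps(patient_expected, patient_perceived, provider_believed, provider_delivered):
    """Gap scores for one item or dimension; a negative value means the first term is lower than the second."""
    return {
        # Service gap (SERVQUAL-style): patient perceptions minus patient expectations.
        "service_gap": gap(patient_perceived, patient_expected),
        # Knowledge gap: what providers believe patients expect minus patients' actual expectations.
        "knowledge_gap": gap(provider_believed, patient_expected),
        # Perception gap: delivery as perceived by providers minus what patients perceive they received.
        "perception_gap": gap(provider_delivered, patient_perceived),
    }


if __name__ == "__main__":
    # Hypothetical 5-point ratings for a single dimension.
    print(dyadic_gaps(
        patient_expected=[5, 5, 4],
        patient_perceived=[3, 4, 3],
        provider_believed=[4, 4, 4],
        provider_delivered=[5, 4, 5],
    ))
```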
The identified knowledge gap will help in building better service design, while the perception gap will help in bridging the gap in service performance. Service quality managers and hospital administrators can use this questionnaire to measure service quality more accurately and improve upon it, which can lead to increased profitability.
Figure 1. Schema of the research process (review of sixty-three articles and semi-structured interviews; template analysis and item pool (n = 107); two-round modified Delphi (median >= 4, I-CVI > 0.79); questionnaire survey and CFA of 387 usable responses for goodness of fit, internal consistency, composite reliability and AVE)

Ethical considerations
The study proposal and protocols were approved by the Chairman of the Faculty Research Committee of the University of Petroleum and Energy Studies, Dehradun, India, on August 6, 2016, ref. no. UPES/Ph.D/FRC-5-6Aug'16/2016/19.

The thematic literature search and semi-structured interviews resulted in a pool of statements. The statements reflected either views given by respondents during the interviews or statements used by previous researchers to measure health care service quality. Using template analysis [13], the items were classified under three attributes comprising fourteen dimensions: (1) pivotal attributes (end product or outcome): diagnosis and treatment, medical infrastructure, needs management, patient safety and privacy, and professional knowledge, skills and competence; (2) core attributes (people, process and organizational structure): admission, discharge, medical communication, personal behavior and process; and (3) peripheral attributes (incidental extras or frills around service encounters): amenities and physical infrastructure, charges and payment arrangement, image, and quality of room and food.

Table 4. Content validation index scores of final questionnaire