Self-report questionnaires: who calibrates the reporter?

Team Performance Management

ISSN: 1352-7592

Article publication date: 1 August 1998


Citation

Beyerlein, M. (1998), "Self-report questionnaires: who calibrates the reporter?", Team Performance Management, Vol. 4 No. 5. https://doi.org/10.1108/tpm.1998.13504eaa.001

Publisher: Emerald Group Publishing Limited

Copyright © 1998, MCB UP Limited


Self-report questionnaires: who calibrates the reporter?


Michael Beyerlein, Center for the Study of Work Teams, University of North Texas, Denton, Texas, USA

What is the problem?

If you and I are members of the same team and fill out questionnaires designed to measure our perceptions of our work environment, how can we make sense of our responses to the items? I may mark a 3 on a 5-point scale to indicate how much autonomy I perceive in doing my work, whereas you may mark a 4. Does that difference of 1 point mean that you possess more autonomy, perceive more autonomy, define autonomy differently, or define a response of 4 differently than I do? The meaning of a 4 may vary from team member to team member; I consider this to be a calibration problem. I am concerned that lack of calibration before surveying a team undermines the value of the survey data and therefore reduces the validity of research findings based on the analysis of that data.

Research on work teams is beginning to accumulate. Until this decade, much of the research considered relevant was conducted with laboratory groups. In the past six to eight years, more researchers have moved into the field to investigate the performance, context, and dynamics of work groups and work teams. Some useful findings have been emerging that provide an increasingly solid foundation for implementing, maintaining, and developing work teams within work settings. As a field of research matures, we should expect the emergence of findings that are useful for research and practice; we should also expect significant improvements in research methodologies and instruments.

Recent useful studies of work teams have utilized a combination of qualitative and quantitative methodologies. Useful findings emerge from the use of both types of methods, but perhaps the most useful studies are those that combine methods, e.g. those that utilize interviews and observations to create a framework for questionnaire construction.

Questionnaires are popular in field studies because they bring a certain economy to the effort. They enable us to capture data from a wide range of sites and/or teams without a significant increase in the resources expended. Questionnaires also provide an opportunity to collect data from a large number of survey participants and a large number of survey sites, whereas in-depth investigations of single sites or single teams provide rich understanding of those systems. At the Center for the Study of Work Teams, we have conducted studies using both quantitative and qualitative methods. Our survey studies have included samples ranging from 48 to 212 work teams drawn from 9 to 35 companies. That range seems fairly typical of published study sample sizes.

Although our questionnaires are carefully constructed, and typically based on interview and observation data, they have one major weakness; I believe every questionnaire used in research has such a weakness: calibration. Questionnaires are referred to as "self-report" instruments. The questionnaires are filled out by members of a team, or occasionally by observers of a team, to represent their perceptions of the team, its members, and its circumstances. The questionnaire items may be carefully selected, either from actual statements made by team members in interviews or by thoughtful researchers who have their own understanding of a construct, such as morale, climate, or trust. Each question is accompanied by a scale with some range of responses, such as those ranging from "strongly agree" to "strongly disagree." Often the data collected are analyzed to establish some index of internal consistency (reliability) or of relationships to other variables (validity). However, the real "instrument" in this process is not the questionnaire; it is the "reporter." That is, the person completing the questionnaire is the one who collects the actual data from experience, summarizes it in a holistic impression based on perception, emotion, and cognition, and reports it by filling out the questionnaire. So, the question becomes: who calibrates the real instrument, the reporter?
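
To make the reliability step concrete: the index of internal consistency most often reported for such scales is Cronbach's alpha (my illustrative aside, not a claim about any particular study discussed here). For a scale of k items with item variances \sigma_i^2 and total-score variance \sigma_X^2,

    \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)

A high alpha tells us the items hang together statistically; it says nothing about whether two respondents attach the same meaning to a 4, which is precisely the calibration problem.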

The idea of "person as instrument" is not new; it has been part of the literature on organization development (OD) and clinical and counseling psychology for a long time. In the OD role, "person as instrument" typically refers to the holistic role of the consultant when making informal observations about the organization. For example, if, the first time the consultant walks into the company, the receptionist greets him or her in a hostile or depressed style, that style becomes a piece of data that may register as an emotional response for the consultant and contribute to assembling a mental model of the organization's work climate.

The same kind of role as "instrument" applies to counselors and clinicians: a use of the "gut" to perceive the whole situation in non-analytical, non-cognitive ways, using emotion and sensation to collect data to supplement the cognitive information one can gather (the "gut response" may be more precisely described by Herb Simon's work on expertise and intuition). The other aspect for the counselor and clinician is that of relationship building, that is, establishing rapport with the client or customer. One can walk into nearly any store, watch the customers and salespeople, and develop a "feeling" for the climate of the organization simply by looking around and watching the way people behave.

I am afraid that the value of the data we collect through the use of questionnaires is somewhat limited, because the items are typically created by the researchers and the results are interpreted by the researchers. The meaning of a response on a questionnaire is likely to be too elusive for an outsider to comprehend. On the other hand, the individuals who fill out the questionnaire have a pretty good idea about what their responses meant and also what they did not mean. How can that meaning be more accurately represented by the questionnaire results when the researcher analyzes the data?

What are the options?

The invention of the survey-feedback process about 30 years ago addressed part of this problem of questionnaire response meaning. The survey-feedback process typically consists of the following steps:

  • questionnaire construction;

  • data collection;

  • simple analysis of responses (e.g. percentages of respondents selecting each level of agreement, average response on an item, and range of responses on an item; a brief sketch of this step appears after this list);

  • presentation of the simple analysis to intact groups of respondents (e.g. each work group or team);

  • discussion with the group about the meaning of the pattern of responses on each item, or on each item with a pattern that suggests a strength or a weakness; and

  • capturing the important parts of that discussion (e.g. problems identified and solutions suggested) for follow-up work.
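
The "simple analysis" step above is straightforward to automate. The sketch below is purely illustrative: the item wording is borrowed from the sample items quoted later in this article, while the ratings, team size, and variable names are hypothetical.

    # A minimal sketch of the "simple analysis" step, assuming each team's
    # responses are stored as a dict mapping item text to a list of 1-5 ratings.
    # Item wording and ratings are hypothetical.
    from collections import Counter

    team_responses = {
        "We trust each other": [4, 3, 5, 4, 2, 4],
        "We have the tools we need": [2, 2, 3, 1, 2, 3],
    }

    for item, scores in team_responses.items():
        counts = Counter(scores)
        # percentage of respondents selecting each level of agreement (1-5)
        pct = {level: round(100 * counts.get(level, 0) / len(scores)) for level in range(1, 6)}
        print(item)
        print("  % at each level 1-5:", pct)
        print("  average: %.2f   range: %d-%d" % (sum(scores) / len(scores), min(scores), max(scores)))

These per-item summaries are what would be handed back to each intact team for discussion in the feedback session.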

This is not new; the procedure has been available in publications for over 20 years. It seems to solve the "calibration of the real instrument" problem, but to do so after the fact, by discussing the pattern of responses in order to determine what the pattern, and the item, means to the people who filled it out. That seems like a laudable approach, and there are published case studies that report on its effectiveness for initiating organizational change.

However, survey-feedback does not solve the calibration problem for situations where the meaning of items is a critical component prior to data collection, or for situations where group-by-group feedback is not possible (such sessions take quite a bit of time, so 100 or 200 teams would require a tremendous investment of researcher resources). Can calibration and meaning be dealt with prior to data collection?

First, let us examine calibration. There are several perspectives available for looking at this problem, but the most comprehensive perspective probably comes from the work of Armenakis (e.g. Armenakis, 1988; Bedeian et al., 1980; Buckley and Armenakis, 1987), Golembiewski (e.g. Golembiewski et al., 1976; Tennis et al., 1989), and others on alpha, beta, and gamma change. Although their work focused on post-data-collection analyses, the concepts they developed help define the calibration problem. In a situation where a questionnaire has been administered at two different times, alpha change represents a simple change in the average response on the scale; this is the kind of change usually taken for granted, and it suggests that a shift for the group on a trust item from an average of 2.9 to 3.3 is nothing more than a linear increase. But Golembiewski et al. (1976) realized that such a change in average score might be explained by two other causes: beta and gamma change. Beta change occurs when a respondent's metric on the item stretches or shrinks the distance between the scale anchors; this "recalibration of the individual" might signal a change in level of discrimination on the topic. Gamma change occurs when the respondent actually changes his or her own definition of the construct represented by the item; this may signal that learning has occurred that changed the meaning of the terms or ideas represented by the item (Thompson and Hunt, 1996).
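
A small simulation may make the ambiguity concrete. The sketch below is purely illustrative (the numbers, team size, and the respond function are my own assumptions, not part of the change-typology literature): it shows how the same observed rise in a team's average score can be produced either by a genuine increase in the latent level (alpha change) or by respondents quietly recalibrating how they use the scale (beta change); gamma change, a redefinition of the construct itself, cannot even be represented by rescaling.

    # Hypothetical illustration: the same mean shift arising from two different causes.
    import random

    random.seed(1)

    def respond(latent, stretch=1.0):
        """Map a latent perception onto a 1-5 report; 'stretch' stands in for
        the respondent's personal calibration of the scale."""
        return max(1, min(5, round(latent * stretch)))

    latent_time1 = [random.gauss(2.9, 0.5) for _ in range(20)]  # a team of 20 reporters

    time1 = [respond(x) for x in latent_time1]
    alpha = [respond(x + 0.4) for x in latent_time1]          # the latent level truly rose
    beta = [respond(x, stretch=1.15) for x in latent_time1]   # latent level unchanged, scale use recalibrated

    for label, scores in (("time 1", time1), ("alpha change", alpha), ("beta change", beta)):
        print(label, round(sum(scores) / len(scores), 2))

From the reported scores alone, the last two rows are indistinguishable; separating them requires a second administration and the kind of analysis Golembiewski and colleagues describe.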

So, the meaning of a response on a questionnaire may represent a simple increase or decrease in the amount of some environmental facet like trust, or it may represent a change in scale, or a change in the meaning of the construct. We cannot know which of these is the case without collecting data twice and subjecting the data to a somewhat sophisticated analysis. Collecting data twice is often difficult.

So, although the change-typology work helps us understand something about calibration and meaning, it does not solve our problem. Let us consider another approach that at first looks less directly related to questionnaires. Steve Jones of Middle Tennessee State University conducts workshops on team performance measurement. The most outstanding feature of those workshops is the organizing principle that the team chooses its own measurements.

Finally, shared mental models emerge as a group works together in ways that forge a team. Steve Jones' approach implies the establishment of shared mental models; the sharing process generates a common understanding of the meaning of the goals the team chooses and calibrates the team members' perceptions of progress toward those goals. The members increasingly share elements of a common framework for perceiving their task work and their team work. Some authors suggest that advanced stages of team development are not attainable without extensive sharing of mental models. How does that sharing affect the "calibration of the reporter"?

Questionnaires administered to teams typically present items such as: "we trust each other," "we trust our leader," "we share concerns openly," "we have the tools we need," and "I want to remain a member of this team." Models of effective teams have contributed a long list of variables for consideration in collecting data that represent relationships of interest in the study of teams. Some of those models have been carefully crafted by researchers[1].

Three frameworks are available for making sense of such data: ipsative, normative, and criterion-based. Ipsative measurement compares a person or group to themselves over time; it requires collecting data more than once to establish a baseline, and the meaning of the scores arises from examining changes in the scores over time. Normative measurement makes sense of scores by comparing them against a database of scores from many other persons or groups; normative databases are often broken down into subsets, so comparisons can be made to appropriate subpopulations. Criterion-based interpretation of the data depends on comparison to some standard that is independent of any individual's or group's scores. The discussion above has all been within the framework of classical test theory. Modern test theory seems to provide a framework and tools (e.g. item response theory and Rasch modeling) that enable researchers to make such comparisons to true scores. However, I do not have sufficient expertise to comment on the potential of modern test theory approaches for solving the calibration problem.
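
For readers unfamiliar with the models just mentioned, the simplest item response model, the dichotomous Rasch model, expresses the probability that person n endorses item i solely in terms of the person's trait level \theta_n and the item's difficulty b_i:

    P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}}

Because persons and items are placed on a common scale, such models at least hold out the prospect of separating what an item measures from how a particular reporter uses the response scale, which is why I mention them here despite my limited expertise with them.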

I do not have a solution to recommend for the calibration problem. Several possible solutions developed by researchers in management, psychology, and education over the past 50 years were referred to above. My guess is that the next step in developing a solution will be to combine some of the existing methods. Until someone does that and makes it practical for use with field data, I will utilize the solution that best fits my data collection situation (e.g. one time versus two or more times for collecting data, a large sample versus a small one, etc.) with a bias toward the shared mental model solution and grounded theory. And, I will try to develop an increased awareness about the calibration problem when using self-report questionnaires in survey research.

Note

1. Chris Hall will review seven of the models in an article in a later issue of Team Performance Management.

References

Armenakis, A.A. (1988), "A review of research on the change typology", in Pasmore, W.A. and Woodman, R.W. (Eds), Research in Organizational Change and Development, Vol. 2, pp. 163-94.

Bedeian, A.G., Armenakis, A.A. and Gibson, R.W. (1980), "The measurement and control of beta change", Academy of Management Review, Vol. 5, pp. 561-6.

Buckley, M.R. and Armenakis, A.A. (1987), "Detecting scale recalibration in survey research: a laboratory investigation", Group & Organization Studies, Vol. 12 No. 4, pp. 464-81.

Golembiewski, R.T., Billingsley, K. and Yeager, S. (1976), "Measuring change and persistence in human affairs: types of change generated by OD designs", Journal of Applied Behavioral Science, Vol. 12 No. 2, pp. 133-57.

Tennis, C.N., Golembiewski, R.T., Bedeian, A.G. and Armenakis, A.A. (1989), "Responses to the alpha, beta, gamma change typology: cultural resistance to change", Group & Organization Studies, Vol. 14 No. 2, pp. 134-60.

Thompson, R.C. and Hunt, J.G. (1996), "Inside the black box of alpha, beta, and gamma change: using a cognitive-processing model to assess attitude structure", Academy of Management Review, Vol. 21 No. 3, pp. 655-90.
