Evaluation comprises diverse, oftentimes conflicting, theories and practices that reflect the philosophies, ideologies, and assumptions of the time and place in which they were constructed. Underlying this diversity is the search for program quality. It is the search for understanding and, consequently, the fundamental, and most challenging, task confronting the evaluator. This volume is divided into four broad sections, each of which conveys a different vision of quality, including postmodern and normative perspectives, practice-driven concerns, and applied examples in the field of education.
When I hear the term "report" or "representation" applied to the concept of expressing quality, I feel as though I am expected to believe that an understanding of quality can be delivered in a nice, neat bundle. Granted, the delivery of information — numbers, dimensions, effects — can be an important part of such an expression, but it seems to me that the quality resides in and among these descriptors. By its very nature, therefore, quality is difficult to "report." The only way to express this quality is through a concerted and careful effort of communication. It is for this reason that I prefer to limit my use of the term "reporting" to expressions of quantity, and my colleagues will hear me referring to the "communication" of quality.

As I have noted, I see the communication of quality as an interactive process, whether this interaction takes the form of two friends talking about the quality of a backpack, an evaluator discussing the quality of a classroom teacher, or a critic's review speaking to its readers. In any case, the effectiveness of the process depends on the interaction that takes place in the mind of the person who is accepting a representation (a re-presentation) of quality. The communicator's careful use of familiarity or some common language encourages this interaction and therefore enhances the communication of quality.

I have also used this forum to suggest that the complexities and responsibilities of social programs bring great importance to the effort of communicating quality. Given this importance, I recommend that program evaluators use descriptive and prescriptive methods, as well as subjectivity and objectivity, as tools to extend the capability of their work to communicate the quality that has been experienced. Again, their ability to communicate this quality rests upon the interaction that takes place between evaluator and audience.
As I see it, the job of every evaluator, reviewer, and critic is to attend carefully to what has been described here as the communication of quality.
The simplest, though unfortunately least effective, "defense" against Deputy Governor Danforth is, of course, to say "well, I just don't see things like that". In Salem there was no mediating device between the wielders of judgment and those whose lives were to be judged, for the judges were putative agents of a higher, immutable logic. Hence, there was no appeal against the arbitrariness of the theocratic judgment — "As God have not empowered me like Joshua to stop this sun from rising, so I cannot withhold from them the perfection of their punishment" (Miller, 1976, p. 104), bemoaned the hapless Danforth. The standard is the standard. This, in fact — the absence of mediating structures — partially defines a theocratic state, a state that asserts a unitary system of meaning. But the liberal democratic state is partially defined by precisely its opposite — the presence of such mediating structures and the possibility of some distance between those who judge and those who are judged. An element of a liberal state is the live practice of holding the logic of government publicly accountable; the increasing absence of this condition in Western democratic states is corrosive of liberalism.

Evaluators inhabit that mediation territory — which is why it is so important for us to maintain some distance from each side of the judgment-equation (and, hence, Democratic Evaluation — in its best expression — refuses to make recommendations). Contemporary Western political cultures are fearful of such arbitrary power and vest some vestigial democratic authority in intermediary agencies. Evaluation — to greater and lesser degrees politically neutral — has thrived on that. Perhaps, as our democracies grow more liberalized and less liberal, the ethical space in which we conduct our business becomes more straitened — perhaps, too, we are, as a community, increasingly co-opted into political ambitions.
Nonetheless, there is at least an opportunity at the margins to play a role in enhancing the self-determination of people and increasing the accountability of policies and programs to that aim. Central to the task of such an intermediary role is resistance to the passion of precision. There are many ways to define standards and there are many approaches to understanding quality, and it is the responsibility of the impartial evaluator to resist the arbitrary dismissal of alternatives. We should treat with professional scepticism the increasingly common claims that such concepts as "national standards", "best practices", "quality control criteria", "benchmarks" and "excellence in performance" are meaningful in the context of professional and social action.

Glass (1978) warned evaluators and assessors against ascribing a level of precision in judgment to a subject matter that has less of it in itself — here evaluation contaminates its subject. Programs are rarely as exact in their aspirations, processes, impacts or meanings as our characterisations of them and as our measurements of their success and failure. Glass urged evaluation to avoid absolute statements and to stay, at best, with comparative statements — rough indications of movement, what makes one state or position distinct from another, distinguishing ascendance from decline, etc. Given the 20-odd staff at the Rafael Hernandez School, the 200-or-so pupils, the range of languages and social backgrounds, the plurality of meanings perceived in a curriculum statement — given all of these, where is the boundary between the exact and the arbitrary? And if we are to settle for the arbitrary, why commission someone as expensive and potentially explosive as an evaluator? Ask a cleric.
One paragraph in House (1995) probably best captures what we suspect are the misunderstandings and the real differences between his position and ours. In responding to our previous writings, he says of us:

Believing that evaluators cannot or should not weight and balance interests within a given context … suggests that human choice is based on non-rational or irrational preferences, not rational values, and that these things cannot be adjudicated by rational means. It implies a view of democracy in which citizens have irrational preferences and self-interests that can be resolved only by voting and not by rational means such as discussion and debate. A cognitive version of democracy, one in which values and interests can be discussed rationally and in which evaluation plays an important role by clarifying consequences of social programs and policies, is much better (p. 46).

If we were to rephrase this paragraph to better fit our own beliefs, we would say:

Evaluators can weight and balance interests, and when value conflicts are minimal, they should do so; otherwise they should not, if this means suppressing some of those interests. Human choice is based on non-rational, irrational, and rational grounds, usually all at once. While some of these choices can be adjudicated by rational means, the irrational preferences are usually far more pervasive and far less tractable than the rational ones, so successful rational adjudication, particularly by scientists, is the exception rather than the rule. In a democracy, citizens really do have irrational preferences and self-interests. Lots of them; and they are entitled to them. Sometimes they can be resolved rationally through discussion and debate, and when that is possible, it should be done. But more often than not, they can only be resolved by voting or some other method that voters legitimate.
Evaluation can play an important role by clarifying consequences of social programs and policies, especially concerning outcomes that are of interest to stakeholders. But this is unlikely to lead to a so-called cognitive democracy in which values and interests are adjudicated rationally.

When it comes to values, we think the world is a much less rational place than House seems to think. We agree with him that it is good to try to increase that rationality, within limits. When stakeholders differ about important values, it can sometimes make sense for the evaluator to engage them to see if they can find some common ground (although that task frequently extends well beyond the evaluator's training and role). When the needs of the disadvantaged look to be omitted from an evaluation, by all means the evaluator should make note of this and consider a remedy (descriptive valuing usually is such a remedy). But all this is within limits. If values were entirely rational, they would not be values by definition. So there is a limit to what the evaluator can hope to accomplish, and we suspect that limit is far lower than House's admirably high aspirations. Trying to impose a cognitive solution on value preferences that have a substantial irrational component is the problem democracy was originally invented to solve. To paraphrase House's (1995) last sentence: Disaster awaits a democratic society that tries to impose such a solution.
Throughout the world, both in government and in the not-for-profit sector, policymakers and managers are grappling with closely related problems that include highly politicized environments, demanding constituencies, public expectations for high-quality services, aggressive media scrutiny, and tight resource constraints. One potential solution that is receiving increasing attention is performance-based management, or managing for results: the purposeful use of resources and information to achieve and demonstrate measurable progress toward agency and program goals, especially goals related to service quality and outcomes (see Hatry, 1990; S. Rep. No. 103-58, 1993; Organization for Economic Cooperation and Development, 1996; United Way of America, 1996).
This paper presents a performance-based view of assessment design, focusing on why we need to build tests that are mindful of standards and content outlines and on what such standards and outlines require. Designing assessment for meaningful educational feedback is a difficult task. The assessment designer must meet the requirements of content standards, the standards for evaluation instrument design, and the societal and institutional expectations of schooling. At the same time, the designer must create challenges that are intellectually interesting and educationally valuable. To improve student assessment, we need to design standards that are not only clearer but also backed by a more explicit review system. To meet the whole range of student needs and to begin fulfilling the educational purposes of assessment, we need to rethink not only the way we design assessments, but also the way we supervise their process, usage, and reporting. This paper outlines how assessment design parallels student performance and illustrates how this is accomplished through intelligent trial and error, using feedback to make incremental progress toward design standards.