The Highly Qualified Teacher: What is Teacher Quality and How do we Measure it?

Patricia Maloney (Yale University, New Haven, Connecticut; e‐mail: patricia.maloney@yale.edu)

Journal of Educational Administration

ISSN: 0957-8234

Article publication date: 15 March 2013

Citation

Maloney, P. (2013), "The Highly Qualified Teacher: What is Teacher Quality and How do we Measure it?", Journal of Educational Administration, Vol. 51 No. 2, pp. 242-244. https://doi.org/10.1108/09578231311304751

Publisher: Emerald Group Publishing Limited

Copyright © 2013, Emerald Group Publishing Limited


In The Highly Qualified Teacher, Strong answers in sequence the two questions posed in the book's subtitle: what is teacher quality, and how have we measured it in the past? He then addresses the implicit third question of how we should measure it, drawing on quasi‐experimental data that he collected and used to design a new teacher evaluation system. These experiments tested whether a variety of judges (both education experts and those outside the education field) can accurately identify high‐ and low‐performing teachers (with performance measured by value‐added scores derived from student standardized tests) based on short video clips of classroom performance. The experiments, which are discussed in greater detail below, found that individuals – experts and non‐experts alike – do very poorly when attempting to classify a teacher as high‐ or low‐performing based on a short exposure to that teacher's whole‐class instruction. Since many K‐12 administrators have based their teachers’ evaluations on such a short exposure, those evaluations may be inaccurate. To combat such inaccuracies, Strong proposes a new evaluation system for administrators, the Rapid Assessment of Teacher Effectiveness (RATE), which is intended to identify teachers as high‐ or low‐performing more accurately. Strong argues that this system should be used in addition to value‐added scores to measure teacher performance.

The majority of this book (the first four chapters) is a well‐written and thorough review of the literature on the history of the teaching profession, how we define teacher quality, and how we have measured it in the past, with a special focus on the advent of measuring and defining the “highly qualified teacher” spoken of in No Child Left Behind. This literature review is the book's strong point, and I highly recommend it for those who study teacher quality or certification, and for those who are just entering this field and wish to have a solid overview. The final chapter describes Strong's experimental design and methodology, and how his RATE system emerged from the data gathered in those experiments. This final chapter is not as strong as the literature review, and could have benefited from a more in‐depth description of the methodology and data.

The first chapter contextualizes teacher evaluation and reform within a history of the teaching profession and how teaching's prestige and compensation have been affected by the increasing predominance of women in the field. This chapter also presents evidence concerning the general decline in teacher quality as measured by standardized test scores since the 1940s, as well as sections on teacher compensation and occupational status. Finally, we are led into the governmental reforms that have attempted to reverse the trend of lessening teacher quality: professionalization, deregulation, and reforms to the pay structure (p. 9). The second chapter picks up from this discussion of governmental reform with the literature on how teacher quality has been defined in the past. Briefly, there are many definitions, ranging from those based on teacher behaviours and personality attributes to those based on the supposedly objective criteria of credentials like certification or university degrees. Strong rightly notes that we have come to pragmatically accept “good teaching” as synonymous with “effective teaching” as measured by student growth on standardized tests, and this has, in turn, guided current reforms and standards concerning teacher evaluation.

Chapter 3 delves into the current state of the literature on the effects of teacher quality on student learning, examining what we have learned about the effects of teacher‐preparation variables like certification, undergraduate major, undergraduate institution quality, graduate degrees, and teacher test scores, as well as variables concerning teacher attributes like race, gender, verbal ability, and personality. The many different variables discussed in this chapter accurately and evenly reflect the sophistication and general confusion within the research field about the effects of teacher preparation, personality, and training – so much so that I wished for some sort of summary table to help me keep track of all that was being discussed.

The final chapter in the literature review, Chapter 4, focuses on how teacher quality has been measured in the past through in‐classroom evaluation protocols like Danielson's Framework for Teaching and the Classroom Assessment Scoring System (CLASS), as well as other subject‐specific protocols. Strong provides a cogent summary of each, arguing that the best measurement of teacher quality is based on administrators’ protocol‐directed in‐classroom observation of teacher behaviour used in conjunction with value‐added scores derived from student achievement data, since principal evaluations alone have not been found to be accurate in identifying teacher quality (again, problematically defined by value‐added scores). At times, Strong writes as if he is privileging the value‐added scores as the true measure of teacher performance, which is consistent with his later use of those scores as a normative scale in his experiments. He does correctly note that value‐added scores should not be used alone to make judgments about individual teachers (given how problematic they are to calculate and the exogenous factors like familial or social environment that may affect students), but he ends the chapter by saying that they should be used with “some other measure, such as observation, portfolios, or student evaluations, in order to arrive at an estimate of teacher effectiveness that can then be used to make decisions about promotion, merit pay, or firing” (p. 84), which transitions the reader into the original research in the final chapter.

Chapter 5 presents the experimental methodology, the data gathered from said experiments (which would be more accurately termed quasi‐experiments, given the lack of randomly assigned treatment), and the new teacher evaluation system, RATE, that Strong has developed from those data. The chapter begins with a well‐written rationale for why we need an organized and quasi‐objective protocol for teacher evaluations, beyond an administrator's informal walk‐through: administrators are humans, as likely to fall prey to confirmation bias and other cognitive biases as the next person. Strong and his research team were able to calculate value‐added scores for all the teachers in a district, then recruited elementary mathematics teachers (the final pool of teachers consisted of seven white females, three in the high value‐added score group and four in the low) who allowed the researchers to film the first two minutes of whole‐class instruction for study (nothing is said about how knowledge of being recorded may have changed teacher behaviour, even beyond what we might expect if they were being observed by administrators). Then, 100 “judges” from ten categories that included K‐12 administrators and teachers, parents, and elementary schoolchildren as well as non‐education stakeholders were recruited to classify the teachers as high‐ or low‐performing (performance is treated as synonymous with the value‐added score calculated previously).

The results showed that experts and non‐experts alike do very poorly in determining a teacher's value‐added score from observation of teacher behaviour alone: “an effective strategy for identifying teachers accurately would be to place them in the opposite categories to which the judges assigned them” (p. 93). This finding raises the question of why value‐added scores are considered more accurate than the observers’ judgments, but this is not addressed. Experiments 2 and 3 used 165 new judges and data from a new school district with teachers who were considered either very high‐ or very low‐performing, and again found that inter‐judge agreement on performance was very high, but accuracy was very low. Experiment 3 used full‐length lessons and judges trained in the CLASS system, who were able to classify teachers with only 50 per cent accuracy – no greater than chance (p. 97). Strong notes that the RATE system emerged out of the previous three experiments, although he does not say precisely how, and it is the subject of Experiment 4. Two judges were selected to be trained on RATE (the pilot version is included in this book), and viewed an entire lesson. They accurately identified high‐ or low performance for 14 of the 16 teachers. This does not necessarily mean that RATE is more accurate than CLASS, since the chapter says little about how these judges were chosen.
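As a rough point of comparison between the 14‐of‐16 result and the 50 per cent chance baseline mentioned above, the following minimal sketch computes how likely it would be to get at least 14 of 16 classifications right by guessing. It rests on assumptions that the book does not confirm: that the 16 classifications are independent and that each guess has an even chance of being correct. The function name `binomial_tail` is purely illustrative and does not come from the book.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of at least `successes` correct calls in `trials`
    independent guesses, each correct with probability `p`.

    Assumption (not from the book): classifications are independent
    and chance accuracy is 50 per cent.
    """
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Chance of 14 or more correct out of 16 under pure guessing.
p_chance = binomial_tail(14, 16)
print(f"{p_chance:.4f}")  # prints 0.0021
```

Under those assumptions the result would be unlikely to arise from guessing alone, but the calculation says nothing about the concern raised here, namely how the two judges were selected and whether that selection explains the difference from CLASS.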

More information on subject recruitment, the judges’ background and experience, and how the qualitative data from the judges’ debrief were analyzed would be helpful in determining the results’ accuracy, as well as information on how RATE emerged from the first three experiments. Additionally, the endnotes describe generally how the value‐added scores were calculated, noting that they are based on student factors like race (minority vs white), class factors like per cent free/reduced lunch, and a dummy variable for school. Since the calculation of this score is the foundation for these experiments, greater detail such as the actual, full equation used and the rationale for choosing regression as a method would be helpful. For Experiments 2‐4, value‐added scores were provided by the school district with little explanation of how precisely they were calculated, which is problematic.

Currently, the reader is left with many questions about both the methodology and the validity of the data and conclusions. Strong briefly notes that RATE should be used in conjunction with value‐added scores to evaluate teachers – but, if the RATE system is implicitly structured to be normed on and predictive of value‐added scores, how do they materially differ? As it stands, an argument could be made that this system just offers some examples of best practices for the teacher to strive for and more differentiated evaluation for the administrator – something that could be said of many different teacher evaluation systems. Strong notes that this book contains preliminary results, so hopefully these issues will be addressed in further research.

About the reviewer

Patricia Maloney is a doctoral candidate at Yale University.
