Copyright © 2009, Emerald Group Publishing Limited
Letter to the Editor
Article Type: Letter to the Editor. From: Health Education, Volume 109, Issue 6
I am writing on behalf of the Prevention Research Institute (PRI) in response to the article, “Evaluation of an alcohol risk reduction program (PRIME for Life) in young Swedish military conscripts,” published in Health Education, Vol. 109 No. 2, 2009. PRI welcomes and takes seriously studies on the impact of PRIME For Life (PFL); the importance of independent evaluation of interventions is well established. This study naturally concerns us because it finds little impact of PFL, a result at odds with nearly all other independent evaluations of the program. Of greatest concern are the methodological limitations of the study, which we believe the authors have not adequately addressed in their article and which may lead readers to inappropriate conclusions about the effectiveness of PFL.
As with any study, it is important to know its limitations in order to place the findings in context, whether the findings are positive, neutral, or negative. In addition to the limitations listed by the authors, PRI has identified three significant threats to the internal validity of this study. Given these limitations, we do not believe accurate conclusions can be drawn from this study about the degree of effectiveness of PFL.
The military personnel assigned to implement the program in Sweden were trained PFL instructors and were therefore aware of the essential elements of effective implementation. However, given the learning curve involved in becoming skilled at delivering a new program, as well as logistical limitations, these military officials observed that the implementation of PFL was poor (personal correspondence), falling well below the established and recommended standards for proper delivery of the intervention. Specific implementation issues they observed include:
While the article states that the classes were taught in two days, all classes were actually delivered in one day. Consequently, in many classes the program was not completed: there was insufficient time to conduct the necessary curriculum activities, and in some classes the content was not fully covered.
In most classes there was a significant lack of interactivity/group discussion. PFL is designed to be an interactive program, including facilitated group discussion, to engage participants in the materials. Specific examples of this implementation issue include:
– In many classes few, if any, essential group processing activities were conducted.
– At least two instructors read from the manual for the entire class. This is specifically in opposition to the delivery protocol, as it leads to disengagement of participants.
– At least one class was delivered in a theater with a fairly large group, where the seating arrangement made it impossible to facilitate group discussion.
All of the instructors were inexperienced in teaching the program, with some delivering their first PFL classes in this study.
There were significant language barriers. All of the instructors had received their training in English, and the version used in the study was a draft Swedish translation, which was improved and finalized only after data collection was complete. In addition, some instructors were not skilled at reading or writing English, which compromises both understanding of the program and its objectives and delivering the program effectively.
Because the researchers did not measure treatment integrity, they did not detect these implementation issues. Thus, it is impossible to conclude whether the lack of impact was a result of an ineffective curriculum, inadequate delivery, or (as noted below) a measurement effect. Consequently, even if there were no other limitations in the study, the most one can conclude is that when PFL is poorly implemented, it might not have a lasting impact.
The measures used in this study may have been inadequate to detect changes, particularly with regard to the heavy drinking and attitudes targeted by PFL. There are also concerns with the analytic approach to the alcohol measures, which further reduced the researchers’ ability to detect changes.
While the goal of PFL is to reduce consumption to low-risk levels, it is also expected that changes may occur at heavy consumption levels; that is, an individual may lower consumption even while continuing to drink at risky levels. While not optimal, such a change would be a meaningful shift in behavior. Clearly, the study does not show significant changes from high-risk to low-risk drinking. However, because of the measures employed, the authors are unable to detect meaningful change within the heavy consumption category. For example, there was no report on measures of maximum/peak drinking, only drinking on a “typical” day. Many drinkers consume much more alcohol on some days than they do on a typical day; for example, on weekends versus during the week.
Another item measured the frequency of drinking six or more drinks on a drinking occasion. This item does not allow for analysis of potentially meaningful reductions in very heavy drinking. For example, reducing consumption from 12 drinks two or three times a week to six drinks two or three times a week would not be evident in this analysis, yet it would be a significant reduction in drinking quantity and in risk for harm. Thus, the measures limit evaluation to only one end of the change spectrum.
In addition to the insensitivity of the alcohol measures, there is also concern about the ability of the attitude scale to detect changes. The researchers reported a summary score for the attitude scale, yet the scale included items (e.g. “It should be allowed to sell beer, wine and distilled spirits in ordinary grocery stores,” “It is OK to buy or give alcohol to a 15-year old,” and “Of course you take a drink if you’re offered one”) unrelated to the curriculum. Nor was there any indication that these scale items were tested before the program to see whether they correlate with drinking choices. While including non-targeted items can help to detect response sets and changes unrelated to the intervention, it can also add noise to the instrument that reduces its sensitivity to change. Conversely, some researchers (e.g. proponents of the Theory of Planned Behavior) would argue that only attitudinal items targeted to specific elements within a behavior area represent an appropriate test. Hence, the measure is at best limited in its ability to detect change.
Similarly, there is concern about analyzing an alcohol summary score. Comparing the combined scores from the three drinking questions (frequency of drinking, amount of drinking on a typical day, and frequency of consuming six or more drinks on one occasion) limits the researchers’ ability to ascertain positive changes in behavior even more than examining these measures separately would.
Finally, there was significant attrition in the treatment sample. Only 53 percent of the sample responded to the consumption questions at all three time points (baseline, five months, 20 months), and only 51 percent responded to the attitude items at all three time points. While the authors’ comparison of responders to non-responders on the combined scores for the consumption questions and on the attitude scale showed little difference at baseline, there could still be important between-group differences that these comparisons did not reflect, and such differences could have significantly affected the results.
Given these substantial implementation and evaluation limitations, we request that this letter of response be published in Health Education, along with any responses from the authors of the original article.
In summary, checks of intervention integrity were not conducted in this study. Without good assessment in this area, it is impossible to know the meaning of a negative outcome. Checks to ensure that research measures what it is intended to measure have become standard in most areas of research but remain largely unaddressed in evaluation research. We need to be able to distinguish between ineffective programs, ineffective implementation, and ineffective measurement.
Thank you for your time and consideration.
Ray Daugherty
President, Prevention Research Institute
For more information about the implementation of PRIME For Life in this study, contact: Anna Sjöström, Stockholmsvagen 18, 181 33 Lindingo, Sweden; E-mail: firstname.lastname@example.org or Sten-Erik Edenhag, Försvarsmakten HKV FÖRBE PERS, 107 86 Stockholm, Sweden; E-mail: email@example.com