Scientific accompaniment: a new model for integrating program development, evidence and evaluation

Purpose: Calls for the development and dissemination of evidence-based programs to support children and families have been increasing for decades, but progress has been slow. This paper aims to argue that a singular focus on evaluation has limited the ways in which science and research are incorporated into program development, and to advocate instead for the use of a new concept, "scientific accompaniment," to expand and guide program development and testing.
Design/methodology/approach: A heuristic is provided to guide research-practice teams in assessing the program's developmental stage and level of evidence.
Findings: In an idealized pathway, scientific accompaniment begins early in program development, with ongoing input from both practitioners and researchers, resulting in programs that are both effective and scalable. The heuristic also provides guidance on how to "catch up" on evidence when program development and science utilization are out of sync.
Originality/value: While implementation models provide ideas on improving the use of evidence-based practices, social service programs suffer from a significant lack of research and evaluation. Evaluation resources are typically not used by social service program developers, and collaboration with researchers happens late in program development, if at all. There are few resources or models that encourage and guide the use of science and evaluation across program development.


Introduction
The research-practice gap has been extensively described and debated in health, education and social service fields (Backer et al., 1995; Chambers, 2012; Flaspohler et al., 2012a, 2012b; Glasgow and Emmons, 2007; Green et al., 2012; Hallfors and Godette, 2002; Morrissey et al., 1997). Despite numerous books, manuals, websites and other publications on evaluation and a long-standing call for evidence-based interventions in many fields (Eagle et al., 2003; Lyles et al., 2006; Nathan and Gorman, 2015; Truman et al., 2000; Zaza et al., 2005), progress in narrowing the research-practice gap has been slow (Fagan et al., 2019; Gambrill, 2016; Neuhoff et al., 2022; Wathen and MacMillan, 2018). Efforts to better integrate research and program implementation are hampered by several problems, including: limited research and science backgrounds among most practitioners; pressure on practitioners to incorporate "evaluation" and demonstrate that their program is evidence-based, without much guidance on what kind of evaluation makes sense at which stage or how to connect with a good partner; research-practice partnerships that often begin late in the program life cycle; and a mismatch between researchers' skills and interests and practitioners' needs.
To better understand how these problems typically arise, it is helpful to consider how programs often develop. Two approaches to developing programmatic interventions have been described: the research-to-practice model and the community-centered model (Wandersman et al., 2008). Research-to-practice interventions generally include research from the outset (Wandersman et al., 2008) and follow predefined stages of program development (Mercy et al., 1993; Mrazek and Haggerty, 1994). Such models have produced some notable successes, including the Nurse-Family Partnership (Olds, 2006), the Perry Preschool Program (Heckman et al., 2010) and the Incredible Years (Menting et al., 2013). However, interventions developed and tested by researchers often do not transfer easily to practice (Glasgow and Emmons, 2007; Green et al., 2009; Wandersman et al., 2008) and are difficult to implement in many parts of the world because of resource constraints, logistics or researchers' lack of implementation expertise (Miller and Shinn, 2005; Richardson, 2009; Wandersman et al., 2008; Ward et al., 2014). Others have criticized programs developed by researchers as rigid, as ignoring participant or user values and practices, and as potentially not adaptable to participants' individual needs (Mullen and Streiner, 2004). Hence, the interventions that have been evaluated and found to be most effective in prevention research are not necessarily the ones most widely implemented (Ringwalt et al., 2002; Wandersman and Florin, 2003).
Community-centered models, on the other hand, are more commonly implemented and are developed primarily by individuals with strong advocacy or practice expertise (Wandersman et al., 2008). However, these developers rarely have access to research or science expertise, and their limited time and resources are focused on developing and implementing the program itself (Green et al., 2009; Jones, 2014; Wandersman et al., 2008). For these models, research-practice partnerships are critical to strengthening programmatic evidence and improving knowledge of what works (Kelly, 2012). In our experience, however, such partnerships, if they are undertaken at all, typically happen late in program development, well after the program has been implemented widely or even scaled. These partnerships are also sometimes prompted by external pressure to include evaluation, such as a particular funding requirement.
As researchers who have consulted for decades with nonprofit organizations working to improve the lives of children and families, and, in the case of one of us, having worked for a major private international funder that commissioned research and evaluations, we have regularly seen these challenges interfere with efforts to build more effective programs for children and families. We believe that a new orientation is needed, with several key features: novel, explicit and realistic approaches to guiding research-practice partnerships; models in which science and research are incorporated early in program development; and a roadmap for program developers (and evaluators) that indicates what kind of evaluation or science use makes sense given a program's stage of development.
To encapsulate this approach, we suggest the term "scientific accompaniment," borrowed from the German term wissenschaftliche Begleitung, which is sometimes translated as "concomitant research" (Bär, 2013). We believe that the field's reliance on evaluation may limit the ways in which program developers use science and research in their work. We hope that the term scientific accompaniment will instead move researchers and practitioners to consider a process that builds research into early stages of development, increasing the rigor of programmatic evidence step by step throughout the entire program life cycle. The term also conveys a co-constructive process, in which scientific knowledge and available evidence are integrated with the practice and advocacy expertise provided by practitioners to build programming that is both effective and scalable. This article discusses the concept of scientific accompaniment, providing a heuristic for how research-practice teams can best collaborate based on program maturity. It first outlines and defines different stages of program development and different levels of evidence for programs. It then outlines the scenarios that research-practice partnerships may encounter depending on a program's stage of development and level of evidence. Finally, it offers suggestions for how research-practice teams can best proceed to strengthen a program's evidence.

A heuristic for determining program maturity
The term program maturity has traditionally been used synonymously with a program's stage of development (Baker and Perkins, 1984; Milstein and Wetterhall, 1999). However, describing a program as mature should indicate that it has been developed, tested and grown over time along with accompanying science and evaluation support. Using this more multifaceted definition, we suggest that research-practice teams assess program maturity using a heuristic that combines the program's stage of development with its level of evidence support.

Stage of program development
There are numerous frameworks that describe stages of program development and implementation (Birken et al., 2017; Meyers et al., 2012; Tabak et al., 2012). In its seminal work, the Institute of Medicine published a five-stage model for developing successful interventions (Mrazek and Haggerty, 1994). Another pioneering model is the Centers for Disease Control and Prevention model developed by Mercy and colleagues (Mercy et al., 1993), which outlines four stages of program development: defining the problem, identifying risk factors, developing and testing interventions, and ensuring widespread use. For the purpose of engaging in scientific accompaniment, we define a four-stage model of program development.
Concept stage. This stage describes the phase in which a community need is identified, and a solution to the problem in the form of an intervention emerges and is planned. Program components are developed and the logistics of implementation are considered.
Pilot stage. The pilot stage of a project is an initial small-scale implementation used to test whether the project idea is viable with a small number of beneficiaries. It enables the organization to assess and manage the risks of a new idea and to identify any needed improvements before substantial resources are invested.
Implementation stage. The implementation stage is the phase in which the project is actually executed with a targeted number of beneficiaries. It has been suggested that it takes two to four years to solidify effective implementation of a program (Fixsen et al., 2009).
Scale-up stage. This stage describes a program that was designed for one setting and is now being more widely implemented in other locations with the same or very similar settings (Aarons et al., 2017).

Levels of evidence
In considering the evidence supporting program impact, we use a broad definition of evidence that includes research or evidence for a program's approach or components, evidence of the need for a program, data on implementation feasibility and evidence of a program's impact or effectiveness. While there have been some attempts to label different levels of evidence for an intervention (Dekkers, 2018; Geher, 2017), no established terminology exists. In the current paper, we define three levels of evidence:
Not supported by evidence. We use the category "not supported by evidence" to refer to a program that: has not systematically gathered information on the problem, its risk factors, effective solutions to the problem or proven mechanisms of change in its conceptual framework; has not used established research on related social problems to develop a Theory of Change (TOC) and inform program components (Darling et al., 2016; De Silva et al., 2014; Jones, 2014; Valters, 2015; Valters et al., 2016; Weiss, 2011); and has not undergone a process or outcome evaluation.
Evidence-informed. We use the term "evidence-informed" to refer to programs whose design and implementation draw on available data, even if outcome evaluation has not yet been used to confirm program efficacy. Specifically, we consider this category as having two levels: first, the intervention is embedded in a TOC that considers existing research results; second, the program uses developmental or process evaluation to answer questions relevant to practitioners that arise during program design, implementation and refinement (Peters et al., 2013). Just as "good" interventions can be badly implemented, poor interventions can be implemented successfully; having a theoretically sound program does not, in itself, ensure successful implementation or effectiveness (Moir, 2018).
Evidence-based. We refer to evidence-based programmatic interventions as those that have undergone some kind of formal evaluation with evidence of positive impact on at least some key outcomes. A range of evaluation methodologies with different levels of rigor has been well described in multiple texts (Rossi et al., 2019; Wholey et al., 2010). There are also differences of opinion on what level of evidence is needed to consider a program evidence-based (Mihalic and Elliott, 2015). In our view, the term should refer less to an end result than to a process of building evidence for program impact with increasing rigor as the program matures. Smaller outcome evaluations with less rigorous designs (e.g. pre-post designs) can help refine program components earlier in development; increasingly rigorous methodologies (e.g. randomized controlled trials) are then needed before a program is scaled.
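The difference in rigor between these two designs can be illustrated with a simple additive model; the notation is ours, not the authors'. Suppose that, over the program period, participants' average outcome would change by δ even without the program (for example, through maturation or outside influences), and that the program adds an average effect τ. A pre-post comparison then captures both together, whereas comparing randomly assigned program (T) and control (C) groups after the program isolates τ, because both groups experience δ:

```latex
\mathbb{E}\!\left[\bar{Y}_{\text{post}} - \bar{Y}_{\text{pre}}\right] = \tau + \delta,
\qquad
\mathbb{E}\!\left[\bar{Y}^{T}_{\text{post}} - \bar{Y}^{C}_{\text{post}}\right] = \tau
```

A pre-post design therefore cannot separate program impact from change that would have happened anyway, which is why it suits early refinement rather than the confirmation of effectiveness needed before scaling.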

Program maturity
When a research-practice partnership is initiated, the team's first goal should be to assess the program's current level of maturity by defining: the program's stage of development; and the level of available evidence for the program. We have developed a heuristic, depicted in Table 1, that outlines the different scenarios researchers can encounter at the outset of a partnership with practitioners, based on a combination of these two factors. Based on the assessment and the resulting scenario, the team can systematically identify the key next steps for collaboration and how best to move the evidence forward to "catch up" on missed steps. Some of the scenarios represent more ideal situations for research-practice partners than others [1]. In the best case, a research-practice partnership will be initiated in Scenario 2 and then move through Scenarios 8, 9, 15 and 20, following these recommendations:
A program should not move into the next phase along this path until the suggested level of evidence has been reached. For example, a program should not be scaled up until rigorous evaluation (with control groups) indicates that the program is effective.
A given level of evidence should not be pursued until all preceding steps have been "caught up." For example, a program should hold off on rigorous evaluation until an evidence-informed TOC exists, a process evaluation has been conducted and a basic understanding of effectiveness has been established through pre-post evaluation.
The remaining scenarios represent situations in which a program's stage of development and level of evidence may be out of sync, which, in our experience, is not uncommon. For those scenarios, we use the heuristic to provide guidance to research-practice teams on how to work together in these conditions.
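The two recommendations above can be expressed as a small lookup. Table 1 is not reproduced in this section, so the sketch below assumes, based only on the scenario numbers cited in the text (Scenarios 1 and 2 for the concept phase, 6 to 10 for the pilot phase, 11 to 15 for implementation and 18 to 20 for scale-up), that the table is a grid of four stages by five evidence steps, numbered row by row. The evidence steps, the `EXIT_REQUIREMENT` thresholds (read off the idealized path 2, 8, 9, 15, 20) and the helper names `scenario` and `next_step` are hypothetical illustrations, not the authors' instrument.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Four stages of program development (rows of the assumed grid)."""
    CONCEPT = 0
    PILOT = 1
    IMPLEMENTATION = 2
    SCALE_UP = 3


class Evidence(IntEnum):
    """Hypothetical ordering of evidence steps (columns of the assumed grid)."""
    NONE = 0        # not supported by evidence
    TOC = 1         # evidence-informed theory of change
    PROCESS = 2     # formative or process evaluation
    PRE_POST = 3    # preliminary outcome data from a pre-post design
    CONTROLLED = 4  # rigorous outcome evaluation with a control group


# Hypothetical minimum evidence before leaving each stage, read off the
# idealized path 2 -> 8 -> 9 -> 15 -> 20 described in the text.
EXIT_REQUIREMENT = {
    Stage.CONCEPT: Evidence.TOC,
    Stage.PILOT: Evidence.PRE_POST,
    Stage.IMPLEMENTATION: Evidence.CONTROLLED,
}


def scenario(stage: Stage, evidence: Evidence) -> int:
    """Scenario number, assuming Table 1 is numbered row by row."""
    if stage == Stage.CONCEPT and evidence > Evidence.TOC:
        # Note 1: only Scenarios 1 and 2 apply to the concept phase.
        raise ValueError("no process or outcome evaluation at concept stage")
    return int(stage) * len(Evidence) + int(evidence) + 1


def next_step(stage: Stage, evidence: Evidence) -> str:
    """Suggest a next move: catch up on evidence first, then advance."""
    if stage == Stage.SCALE_UP:
        if evidence < Evidence.CONTROLLED:
            return f"catch up: build {Evidence(evidence + 1).name} evidence"
        return "continue process and outcome evaluation during scale-up"
    if evidence < EXIT_REQUIREMENT[stage]:
        # Evidence is built one step at a time, never skipping a step.
        return f"build {Evidence(evidence + 1).name} evidence before advancing"
    return f"advance to {Stage(stage + 1).name}"


IDEAL_PATH = [
    (Stage.CONCEPT, Evidence.TOC),
    (Stage.PILOT, Evidence.PROCESS),
    (Stage.PILOT, Evidence.PRE_POST),
    (Stage.IMPLEMENTATION, Evidence.CONTROLLED),
    (Stage.SCALE_UP, Evidence.CONTROLLED),
]

if __name__ == "__main__":
    print([scenario(s, e) for s, e in IDEAL_PATH])  # [2, 8, 9, 15, 20]
    print(next_step(Stage.IMPLEMENTATION, Evidence.NONE))  # Scenario 11
```

Under these assumptions, the idealized path maps onto Scenarios 2, 8, 9, 15 and 20, and a program in the implementation stage with no evidence (Scenario 11) is directed back to its TOC before any outcome evaluation, mirroring the catch-up logic described above.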

Scientific accompaniment: a roadmap for collaboration between researcher and practitioner
The following sections discuss the process of scientific accompaniment for each of the four stages of program development (concept, pilot, implementation and scale-up phases) using the heuristic presented in Table 1. In particular, we provide guidance for research-practice collaborations initiated at each stage of program maturity. In each section, we outline the opportunities and risks within the different scenarios and identify strategies for keeping a program's level of evidence and stage of development in sync.

Concept phase
The integration of science, research and practitioner expertise during the concept development phase of social programs is critical, but the evaluation and implementation literatures have typically paid little attention to this phase. Despite calls to link research and practice in the process of intervention design and testing, research-practice partnerships rarely occur at this phase (Glasgow and Emmons, 2007; Green et al., 2009; Miller and Shinn, 2005; Wandersman, 2003). Our program maturity heuristic defines Scenario 2 as the ideal place for the scientific accompaniment process to begin. Research-practice partnerships initiated at this stage of program development can increase the chance that programs will have positive outcomes. They are also an effective way to build long-term partnerships: researchers and practitioners develop a common language and a common understanding of the program as it develops. Early work together can also help define the outcomes to be measured in future outcome evaluations and lay the groundwork for that evaluation work as the program develops.
With a research-practice partnership during this phase, practitioners' expertise can be connected with existing evidence to ensure that the program design and implementation plan are evidence-informed. This process involves two tasks. The first task is to clarify the community or population's need for the program, such as through a needs assessment (Soriano, 2013). Collaboration with researchers can provide data to confirm the practitioner's experience-based impression of what kinds of services are needed for a given target population.
The second task is to define the program components and connect them through an evidence-informed TOC. In a strong TOC, program elements and outcomes are well-defined, and the assumptions connecting them are backed by evidence (Darling et al., 2016; De Silva et al., 2014; Jones, 2014; Valters, 2015; Valters et al., 2016; Weiss, 2011). While many programs construct a TOC (or a logic model or program theory), the assumptions underlying it are often not clearly supported by evidence. Prior evaluation and research can help practitioners design components or mechanisms of change so that, from the start, the program has the best possible chance of being effective (Leijten et al., 2018; Melendez-Torres et al., 2019). Even if a program is being developed to tackle a relatively new problem area, there is likely to be evaluation research on programs addressing related areas, or research on risk and protective factors, that can be used to support program development.

Pilot phase
In a program's pilot phase, the use of science and research allows practitioners to test feasibility, address challenges in program delivery, establish procedures, collect preliminary data on participants and make changes to the model before expanding delivery to full implementation (Wiseman et al., 2007). The aim of this phase is to arrive at a fully evidence-informed program design that is ready for implementation and outcome evaluation. When teaming up with practitioners, researchers might encounter a program in the pilot phase that has not yet drawn on existing research or data (Scenario 6). In this case, because of the early stage of development, there is an opportunity to review the TOC and identify where research supports the program logic and where it does not. The research-practice team is then able to jointly strengthen the TOC with available evidence and refine program elements if work on the TOC reveals that changes to the model or approach are needed.
Once the team is working with an evidence-informed TOC (Scenario 7 in the heuristic), the key focus of the research-practice partnership needs to be systematically collecting data on how the pilot and early implementation are going, through a formative or process evaluation (Cohen et al., 2000; Crowther and Lancaster, 2012). The goal of such an evaluation is to collect information that can be fed back into program implementation, and there are extensive resources to guide this process (Patton, 1994; Patton et al., 2016). It answers questions such as:
Q1. How well is the program being implemented?
Q2. Is it implemented as planned?
Q3. How well is the target population being reached?
Q4. What challenges have been revealed?
Q5. What possible solutions have been tested?
Q6. Which elements have proved useful or popular with recipients, and which have not?
Typically, formative evaluation is a term used in early program development stages (such as a pilot stage), while process evaluation can refer to data collected at any point in a program's development to ensure that implementation is proceeding as intended (Wholey et al., 2010). Formative evaluations have been shown to be crucial in strengthening an intervention and getting it ready for more rigorous impact evaluation (Devries et al., 2021; Lachman et al., 2020; Madrid et al., 2020). It may also be informative to collect preliminary data on pilot effectiveness using a pre-post design, depending on the number of individuals participating in the intervention and the nature of the pilot (Heuristic scenario 9). Some organizations may want to verify the likelihood of effectiveness in a larger-scale pilot before expanding to an implementation phase. Others may conduct developmental evaluation with a small pilot and then focus on outcome evaluation as part of the implementation phase.
Some programs may be interested in engaging in more rigorous outcome evaluation (e.g. involving control groups) before moving to the implementation stage (Heuristic scenario 10) (Arain et al., 2010). This may be the case, for example, with programs developed in controlled academic settings that use an efficacy trial as part of early program development (Flay, 1986; Glasgow et al., 2003). Sometimes, practitioners or researchers may be impatient to move to outcome evaluation. Practitioners may be eager to claim that they have an "evidence-based program"; researchers may have an incentive to lead rigorous evaluations with a high chance of being published. Both may have heard that highly rigorous designs, such as randomized controlled trials, are gold-standard evaluations and be keen to reach that level of recognition. However, conducting rigorous outcome evaluation during the pilot stage carries risks for a program. Even efficacy trials require that a program has reached a finalized stage of development with stable implementation. Additionally, rigorous evaluation conducted in the midst of program development might produce results that reflect delivery issues still being sorted out rather than the impact of program elements.

Implementation phase
The implementation phase is defined by programs that are more established and are being delivered as part of normal organizational procedures to a larger group of individuals. During this phase, the program should aim to establish its efficacy as it builds and before any scaling. The implementation phase varies in length; during it, a program may move from early implementation to a sustained, long-standing community program. Some programs remain in this phase as locally implemented programs without moving to expanded, scaled implementation. In our experience, this is the most common phase in which a research-practice partnership is initiated. For example, a practitioner who wants to evaluate whether the program is working may bring on a researcher. This can happen early in implementation, but in our experience research-practice partnerships often begin after a program has become well established and practitioners have become interested in documenting impact.
There are challenges to address if there has been little prior attention to the TOC (Scenario 11) or to developmental evaluation work (Scenario 12). However, research-practice teams can work to address missed steps. Addressing the TOC at this stage may uncover significant gaps in program logic. In that case, moving too quickly to a rigorous evaluation would risk wasting resources; instead, the program may need to be adapted before a formal evaluation. It may be difficult for programs to consider changing program elements, particularly if the research-practice partnership is new and still building trust. However, it is better to make corrections now, before resources are spent on an outcome evaluation with negative results and certainly before the program is scaled up. Similarly, research-practice teams should make sure that process evaluation precedes outcome evaluation, so that outcome evaluation is not conducted on programs that are not being delivered fully or with a basic level of fidelity to the design (Scenario 13).
Outcome evaluation is critical during the implementation phase, and several factors influence the research-practice team's decision about how rigorous that evaluation should be (Scenarios 14 and 15). It might be wise to collect only pre-post data as a first step to solidify hypotheses about outcomes of interest, before conducting a study with a more rigorous design that produces greater confidence in effectiveness (Habicht et al., 1999). Evaluation research should build on prior work, becoming more rigorous and more independent over time. Before a program is scaled, rigorous and independent evaluation should ideally confirm program impact on at least some key outcomes. Randomized controlled trials are considered the gold standard of outcome evaluation, although logistical and resource considerations need to be taken into account (Sanson-Fisher et al., 2007; West et al., 2008). Still, it is important to make sure that implementation is solid before engaging in rigorous impact evaluation. Many of the risks and challenges outlined for the pilot phase still apply. In addition, practitioners may become frustrated trying to incorporate large-scale evaluation while the program is still working toward its implementation objectives (Carroll et al., 2007). A program might also fail to show effects because the number of participants is too small to detect them. Finally, conducting a rigorous evaluation early on might produce results indicating effectiveness only in a very specific setting and render the program inflexible for other contexts (Glasgow et al., 2003).
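The point about sample size can be illustrated with a conventional power calculation. The sketch below is our illustration rather than part of the authors' heuristic: it uses the standard normal-approximation formula for the number of participants per group needed to compare two group means, n = 2(z for 1 - α/2 plus z for the desired power)² / d², where d is an assumed standardized effect size (Cohen's d). The function name `n_per_group` and its default values are hypothetical; an actual evaluation design would also need to account for attrition, clustering and the specific analysis planned.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group (two-sided test, equal arms).

    Hypothetical helper using the normal approximation
    n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2,
    where d is the standardized mean difference between program and control.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Small, medium and large effects in Cohen's conventional terms.
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: {n_per_group(d)} participants per group")
```

With these defaults, the sketch gives about 63 participants per group for a medium effect (d = 0.5) but about 393 per group for a small effect (d = 0.2), which shows how a modest local program may simply be too small to detect realistic effects.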
Choosing the appropriate research design for the program's maturity is key to mitigating risks while gradually strengthening evidence. At the same time, depending on their training and focus, researchers may be inclined toward, and more comfortable conducting, a certain type of evaluation: for example, process evaluation for more qualitatively trained researchers or outcome evaluation for more quantitatively trained researchers. Practitioners might also prefer one type of evaluation over another, depending on their understanding of evaluation or their ultimate purpose, rather than choosing the design based on the research question, as has been strongly recommended (Peters et al., 2013). Often, the necessary researcher skills include both qualitative and quantitative elements, and a mixed-methods approach is generally useful; hence, it is important to make sure that researchers' skills match the research questions and methodology.

Scale-up phase
Many programs with successful implementation become interested in expanding to other communities or even large geographical regions. As a general recommendation, a program should be scaled up only once sufficient evidence of effectiveness is available. The scale-up process brings many new implementation challenges and additional questions about effectiveness (Forum on Promoting Children's Cognitive, Affective, and Behavioral Health, Board on Children, Youth, and Families, Institute of Medicine, and National Research Council, 2014). While implementation may be very successful in one setting or community, there may be unexpected implementation challenges in new ones. And even strong outcome evaluation in one setting does not guarantee the same level of impact in a new setting (Olweus and Limber, 2010). Hence, it is important to continue process and outcome evaluation during the scale-up process for quality assurance. Aarons et al. (2017) argue that an intervention implemented in a moderately different setting or with a different population can sometimes "borrow" strength from evidence of impact in a prior effectiveness trial, but that some new empirical evidence is often necessary to retain evidentiary status.
However, many programs are scaled before any evidence of effectiveness has been established, with dire consequences. A famous example is Drug Abuse Resistance Education (D.A.R.E.): this program, designed to prevent drug use, was widely disseminated and was administered in 70% of US school districts in 1996 (Rosenbaum and Hanson, 1998). Once evaluated, however, it was deemed ineffective (West and O'Neal, 2004) and even potentially harmful (Lilienfeld, 2007). Should a research-practice partnership be initiated with a program that has already been scaled up, the steps for strengthening a program's evidence laid out in the heuristic should be completed before moving to rigorous outcome evaluation (Heuristic scenarios 18, 19 and 20). The broader the scale of implementation, the more difficult it may be to adjust the program based on this work. However, scientific accompaniment is still of critical value and can make a difference. Based on the evaluation findings, D.A.R.E. completely revised its curriculum, with improved evidence of effectiveness in multiple rigorous, controlled studies (Hecht et al., 2006; Marsiglia et al., 2011).

Conclusions
Aiming toward a model of "scientific accompaniment" will expand the orientation of both practitioners and researchers beyond outcome evaluation alone. It encourages research-practice partnerships from the beginning and throughout a program's life cycle, and it provides a roadmap for research-practice collaboration. Through a heuristic, it outlines scenarios that define idealized pathways for syncing program development and the use of science, as well as how teams can "catch up" on evidence should evidence and program development fall out of sync.
The model is particularly useful because a research-practice partnership can be initiated at any stage of program development and at any level of pre-existing evidence. It frees the collaboration from what, in our experience, is a key stumbling block: defaulting to the type of evaluation a researcher is most familiar with or the evaluation the practitioner envisions for the program. Instead, the heuristic guides the partnership to carefully match the "science intervention" to program maturity, providing a useful "rule of thumb" for how best to proceed at any given point in time.
The model uses available evidence to strengthen programming while systematically drawing on practitioners' expertise, responding to the call for efforts to close the gap to include both researcher and practitioner perspectives (Morrissey et al., 1997; Wandersman, 2003). It provides a realistic pathway for building increasing confidence that the program design is strong, that implementation is needed and feasible, and that the program is having the anticipated benefit for recipients and perhaps the community. Evaluation theory increasingly emphasizes the importance of value-engaged approaches, which ensure that evaluation incorporates stakeholder values (Hall et al., 2012). Realist evaluation theory emphasizes evaluation as an iterative process that seeks to understand the nuances of how and why a particular program might work with a given population and setting (Jagosh et al., 2015; Marchal et al., 2012). A model of scientific accompaniment, as opposed to evaluation alone, more readily allows for the incorporation of these theoretical perspectives, which developed out of concern that a traditional, narrow approach to evaluation does not account for the complexity of the real-world contexts in which programs seek to make change.
While this model presents a structured approach to establishing the evidence base of a given intervention and formulates a "roadmap," additional issues may need to be taken into consideration during implementation. These may include challenges in securing funding for the initial phases of the scientific accompaniment model, or in keeping researchers involved in all phases of the research. However, there are a number of examples in which this has been done successfully. As models of scientific accompaniment expand, their benefits are likely to become even more apparent to funders and other stakeholders in evidence-based practice.
Note
1. For the concept phase, only Scenarios 1 and 2 are relevant, as the precondition for both process and impact evaluation is that the program is implemented with at least a small number of beneficiaries rather than existing only as a concept.
JOURNAL OF CHILDREN'S SERVICES