Selling science: optimizing the research funding evaluation and decision process

Purpose – In this contribution to EDI's professional insights, the authors develop practical and evidence-based recommendations for bias mitigation, discretion elimination and process optimization in panel evaluations and decisions in research funding. An analysis is made of how the expectation of "selling science" adds layers of complexity to the evaluation and decision process. The insights are relevant for optimization of similar processes, including publication, recruitment and selection, tenure and promotion.
Design/methodology/approach – The recommendations are informed by experiences and evidence from commissioned projects with European research funding organizations. The authors distinguish between three aspects of the evaluation process: written applications, enacted performance and group dynamics. Vignettes are provided to set the stage for the analysis of how bias and (lack of) fit to an ideal image make it easier for some than for others to be funded.
Findings – In research funding decisions, (over)selling science is expected but creates shifting standards for evaluation, resulting in a narrow band of acceptable behavior for applicants. In the authors' recommendations, research funding organizations, evaluators and panel chairs will find practical ideas and levers for process optimization, standardization and customization, in terms of awareness, accountability, biased language, criteria, structure and time.
Originality/value – Showing how "selling science" in research funding adds to the cumulative disadvantage of bias, the authors offer design specifications for interventions to mitigate the negative effects of bias on evaluations and decisions, improve selection habits, eliminate discretion and create a more inclusive process.

funding. Our insights are relevant for process optimization in similar evaluative decision processes, including publication, recruitment and selection, tenure and promotion in higher education and research settings. Our recommendations are informed by evidence and experiences from commissioned projects in the context of European research funding decisions, through interviewing panelists and observing panels. These projects were initiated with the purpose of uncovering and mitigating bias (i.e. cognitive distortion) in the evaluation process.
The evaluation and decision process involved in research funding, similar to other selection decisions for prestigious and scarce positions, can be characterized as imperfect or suboptimal (Vinkenburg et al., 2014; van den Besselaar et al., 2018). Room for bias occurs when evaluators are given discretionary space in evaluating performance and potential. In these kinds of complex decision processes, with limited time and inherent intransitivity, the volatile application of ambiguous criteria such as excellence, groundbreaking or "high risk high gain" induces the occurrence of bias. The stronger evaluators believe that the research system is meritocratic, the larger the room for bias to affect the outcome of the evaluation (Régner et al., 2019). While bias is often implicit, it becomes audible and readable through language and thus apparent to trained observers.
Research funding is both an indicator of and an (increasingly) necessary requirement for career success in academia (Bloch et al., 2014; Bol et al., 2018). The expectation of "selling science" as a critical component of research funding decisions became apparent in interviews we held on and observations we made of evaluation processes in the past decade. Competition implies that evaluators in their role as panelists or reviewers are asked to compare, weigh and decide whether to buy the ideas that an individual applicant, research team or consortium tries to sell them. While the ability to "sell science" is not a formal selection criterion and/or stage of the process, it emerges as a tacit criterion during the process and is a distinctive part of the narrative that evaluators use to describe why some are more or less likely to be selected and ultimately funded than others. We explore this phenomenon to provide professional insights and recommendations for process optimization and discretion elimination in evaluation.
Evaluators bring their own, often implicit, image of the "ideal academic" (or ideal scientist, career, team or consortium) into the process (Leslie et al., 2015). This ideal image or prototype is influenced by the discipline of the evaluator and the nature of the grant scheme, but the degree to which the applicant (or team) fits the stereotypical image determines their chances of success (Bickmore et al., 2018). Selling science with its connotations of self-promotion, assertiveness and confidence produces lack of fit for those who are held to injunctive norms of modesty or otherwise do not fit the part (Fox et al., 2017). The expectation of embodying and enacting the ideal image through situated performance in presentations and interviews makes fit or lack thereof almost tangible.
Prescriptive gender stereotypes or injunctive norms produce a narrow band of acceptable behavior for women when applying for prestigious roles or positions. When women act according to stereotypical expectations, they remain invisible; when women actively sell their science, they experience backlash. Therefore, the research funding evaluation and decision process is a particular and stark example of women's perceived lack of fit with the stereotypical or ideal academic (Bleijenbergh et al., 2013; Heilman et al., 2015; van Veelen and Derks, 2020). As explained by Biernat et al. (2020), shifting standards imply that when men stereotypically are expected to have more of an attribute than women, individual women are judged on that attribute relative to (lower) expectations for women, and men relative to (higher) expectations for men. Gender frames both viewing and valuing difference when making evaluations about who is truly exceptional, creating a double advantage for men (Correll et al., 2020). Acknowledging intersections of bias and stereotypes based on gender with those on other irrelevant but visible characteristics such as ethnicity or race, class, age and physical ability further illuminates how research funding decisions fuel cumulative advantage for White men and the inherent homogenization of the research system.
To illustrate the effects of bias on the evaluation process, we distinguish three different aspects of the process in research funding decisions, namely evaluating written information in applications, evaluating enacted performance in interviews and presentations, and group dynamics within the panel. We introduce each aspect of the evaluation process by sharing an evidence-based vignette informed by our observations and experiences in supporting research funding organizations as external experts. Understanding the vignettes as illustrations of how bias affects the evaluation process from the perspective of different stakeholders, we set the stage for our analysis and for developing actionable recommendations on how to mitigate bias and reduce discretionary space in the evaluation process. We analyze how bias occurs in each of these aspects and explain how selling science adds a layer of complexity to an already cognitively demanding situation. In the analysis, we use examples from interviews with panelists [1] held in the context of a commissioned project on bias in research funding (van den Besselaar et al., 2018), and we refer to research evidence on similar evaluations. We then provide recommendations to optimize the process for each aspect in particular and for the overall process. Raising awareness is important but not enough to mitigate the effects of bias and change evaluation and decision habits. Going beyond awareness, we elaborate on accountability (A), biased language (B), criteria (C), structure (S) and time (T). Our recommendations are listed in Tables 1-4 below, following the same AABCST logic. Through process optimization and discretion elimination, we contribute to diversity and inclusion in higher education and research settings.
The evaluation process in research funding: the perils of selling science
Bias makes its way into the evaluation process as it affects panelists' implicit associations, explicit expectations and behaviors. Implicit notions of the "ideal academic" and the "ideal career" affect how panelists read, view, hear, question, describe and discuss applications. In order to obtain research funding, it is necessary to sell science. This means that on paper, applicants have to write well and convincingly to be perceived as excellent. In person, applicants are expected to act confident to be perceived as competent. For panelists to recognize merit, they have to buy what is being sold. In principle, writing and acting in a confident manner have low predictive validity for future research performance. At the same time, evaluators often confuse confidence with competence. More generally speaking, criteria such as institutional prestige are at best proxies for research performance. Especially for early career researchers, being judged on (research) potential rather than past performance results in the benefit of the doubt going to those who fit the "ideal image" better than those who do not. Nonperformance factors and irrelevant personal characteristics contribute to perceived (lack of) fit, and thus, the shared expectation of selling science makes bias explicit.
Interestingly, some panelists refer to the notion of overselling, implying that some applicants by virtue of being highly skilled and even groomed in writing and presentation style raise expectations of future performance that they may or may not be able to meet. Some panels (or members) are more sensitized towards overselling than others, perhaps related to the degree to which certain disciplines have a stronger tradition of hiring external consultants and coaches in preparing applications. The question is then whether overselling gets "caught" or "bought", and whether as a strategy it benefits or harms some more than others.
Selling and overselling science may play out differently depending on the stage of the evaluation process. Some panelists view overselling as the norm and argue that underselling will result in a written application being rejected immediately. Other panelists argue that the interview and presentation provide the chance to really learn about an applicant's merits. The norm of overselling is challenged when applicants with a sound, but not oversold written application, are able to show their qualities in the presentation or interview. This pattern was reported for women applicants more than men. In contrast, some applicants who oversell in writing are not able to meet these expectations in enacted performance. This pattern was reported for men applicants more than women. So not overselling in a written application can be an advantage for women especially, highlighting the narrow band of acceptable behavior.

Bias in the evaluation of written materials
When asked to evaluate written materials, typically a CV or track record and an idea or research proposal, the panelists' implicit associations of the ideal application are made salient by the style, vocabulary and content of what is written (Markowitz, 2019). Panelists describe the writing as an "assertive" or even "aggressive" way of self-promotion, in terms of selling science and scientist as worth funding. This style is said to serve applicants (and especially men) well in terms of leading to high impact publications. Panelists assume that gender differences in writing style exist; they read more confidence, ambition and "fanciness" in men's than in women's written applications.
Interestingly, some panelists describe the written materials provided by women as better prepared and more realistic, sound and substantial than those prepared by men. They argue that falling for the (over-)confident and exaggerated writing style of men may serve as an exclusionary mechanism for women whose writing is perceived as humbler and more realistic. Overselling on paper is sometimes caught when an exceedingly well-written proposal using superlatives or buzzwords raises doubts among panelists in terms of who actually wrote the proposal. Poor writing obviously affects applicants' chances but may also prompt panelists to look at it again and read between the lines to look for great ideas.
Implicit associations also affect how reviewers and panelists write about applicants and applications. Biased language in evaluation reports is found in terms of certain adjectives (e.g. superlatives), nouns (e.g. "leader") and negations (e.g. "she does not have a bad CV") and can be identified using linguistic software (Kaatz et al., 2015). Some evaluation procedures require applicants to submit recommendation letters, which is another place where such linguistic bias (e.g. doubt raising) becomes apparent (Madera et al., 2018) (see Table 1 for recommendations on evaluating written materials).

Bias in the evaluation of enactment in interviews and presentations
Selling science is an enacted performance that requires a stage and an audience. The enactment in presentations and interviews is an opportunity for applicants to make panelists "buy" their application. The applicants appear and perform on a stage while they are watched by a group of evaluators who respond both nonverbally and verbally to what is enacted.
In such a demanding atmosphere, this performance may appear as standing in front of a tribunal, which is not equally appreciated by and fitting for all applicants.

[…] but should I be standing behind the lectern throughout my whole talk? What will they think? Shouldn't I take the whole space that is there, in front of the slides?
Take the space? Own the space? Like I have seen so many others do previously? What did they say again in the prep-course for this occasion?
Take the space? Own the space? STOP. FOCUS. I go on. […]

Table 2. Evaluation of interview and presentation (enactment)

| Aspect | Recommendation | Details |
| --- | --- | --- |
| Awareness | Be aware of enactment | Acknowledge that the interview/presentation stage requires enactment (as in performing on stage or putting on a show). Acknowledge that the expectation to "enact" the ideal image fits some applicants better than others. Acknowledge that the quality of the performance has little predictive validity for the quality of the research |
| Awareness | Be aware that panelists actively affect enactment | Acknowledge that panelists are active audience members whose responses (smiling, affirming, discouraging, disapproving) have an effect on the enactment |
| Accountability | Hold each panelist accountable | Write your own comments and fill out forms completely during the presentation and/or the interview, independently from the other panelists |
| Biased language | Be aware that confidence is not equal to competence | Be sensitive to biased language used by applicants (superlatives, pronouns such as I versus we, etc.) as a sign of (over)coaching and (over)selling or of making oneself bigger or smaller than one is |
| Biased language | Be aware of shifting standards | Be sensitive to shifting standards expressed in language. Reduce your own use of biased language in your response to the enactment |
| Criteria | Apply the same criteria to each applicant | Be sensitive that the enactment itself is not being evaluated and scores on the formal criteria are colored by the enactment performance |

What sells well depends again on the norms in the discipline, but it also depends heavily on panelists' individual preferences and expectations. Panelists may not appreciate an applicant who dresses in a certain way, who has a voice perceived as too loud or not loud enough, who smiles too much or too little. As each panelist has an individual picture of the ideal applicant, each applicant's individual enactment may fit this image more or less.
Language is specifically relevant for identifying bias in enacted performance, in terms of language used by applicants as well as language used by panelists. Through the applicant's word use and language skills, panelists might be more or less impressed by and fond of the performance. An applicant who does not use the expected buzzwords for hot topics in one's discipline may receive lower support and less beneficial evaluations compared to the evaluation of an applicant whose vocabulary fits the ideal. The same goes for using self-promotional language in terms of superlatives, boasting and pronouns ("I" versus "we"). Applicants have limited opportunities to learn about and prepare for these (often informal or implicit) rules and preferences; transparent criteria might not or only vaguely exist.
When overselling is part of their enactment, applicants appear confident, and panelists can perceive and easily assess this confidence as competence. Indeed, panelists may have gendered expectations in terms of overselling: While acting (very) confident and assertive is clearly expected and appreciated in the enacted performance of men, this is less the case for women. Women are expected to act more modest and less confident and (for those sensitized to the issue) not to oversell. They may be assessed more positively when their enacted performance meets gendered norms for women, when panelists perceive them as well-prepared and nice. Enacted performance is particularly prone to the impact of shifting standards. For example, while being or appearing to be young is seen as a sign of potential in men, it is seen as a sign of inexperience in women (Malmström et al., 2017). These shifting standards become audible through the language used by panelists when posing questions to, giving feedback on or raising doubts about applicants during and after the performance (see Table 2 for recommendations on the evaluation of interview and presentation).

Bias due to group dynamics
Vignette 3 - group dynamics: "Taking a look behind the scenes"
The chair leans forward: "This stays confidential, right?" I am quick to say: "Yes, yes. Of course.", and the others on his panel nod, as expected.
He continues: "You know, if I have a favorite (applicant) - and usually, I know at least some of the guys on the panel quite well - I know exactly, who will think along the same lines and thus, who I need to give the word, first, (to find support and set the tone in support of my own preference)…".

Essentially, each panel is a group of individuals. Groups and their members are sensitive to specific dynamics, both immediately present and/or gradually developing throughout the group process. Especially decision-making situations are prone to group dynamics, affecting the decision outcome substantially (King et al., 2018). In research funding evaluation and decision-making, intersecting factors such as status hierarchy, power distribution and general conventions of interaction pave the way for decisions made in favor of some applicants in comparison to others. For instance, the convention of consistently giving the word to the most senior or highest status panelist first turns this panelist's impression and evaluation into an anchor for the rest of the discussion. Adopting a power lens illuminates how it is particularly challenging for less senior and lower status panelists to voice diverging impressions and evaluations in response. If worse comes to worst, such group dynamics may (unintentionally) lead to decision-making (re)producing systematic inequalities between members associated with particular social groups. Particularly, the panel chair and scientific officer supporting the chair are game changers with respect to group dynamics. Thanks to their formal role, they can influence the group dynamics more structurally, for better or worse. Continuing on the previous example, usually, the chair asks panelists for their input to the discussion. As such, the chair can either embrace the general convention of asking the highest status panelist first, or the chair can organize the discussion differently (e.g. taking turns in who speaks first). Thus, the chair can actively regulate the room for bias to affect decision-making.
The comparative and intransitive nature of panel decision-making, in which multiple applications are ranked on the basis of multiple criteria, leaves room for bias but also provides opportunities for process optimization in terms of explicitly operationalizing and weighing criteria and of deliberate calibration between and across ranked applicants (Vinkenburg, 2017). Again the possible impact of the chair on this part of the process is quite large (see Table 3 for recommendations on group dynamics).

Table 3. Recommendations on group dynamics

| Aspect | Recommendation | Details |
| --- | --- | --- |
| Awareness | Understand panel as a group | Make this explicit (know the benefits and flaws of group decision making) |
| Awareness | Introduce need for structure | Make this explicit (structure reduces room for bias) |
| Awareness | Discuss and agree on structure | Structure of "who speaks first" (e.g. rotation or novice rule); how to make final decisions |
| Awareness | Make all rules and procedures explicit | Reconsider documents, be complete to reduce need for tacit and informal knowledge/assumptions |
| Awareness | Be aware of intransitivity | Know that such complex, comparative evaluations that require a ranking suffer from inherent intransitivity: paper, rock and scissors (i.e. no evident winners) |
| Accountability | Everybody is accountable | Discuss and agree (e.g. filling in all relevant templates/forms) |
| Biased language | Sensitivity to language use | Use less biased language and call on each other when doing so |
| Application of criteria | Explicitly agree (or disagree) on relevant criteria | Discuss each criterion prior to starting the evaluation round (prevents relying on unclear or flawed understanding) |
| Application of criteria | Agree on how to apply criteria | Discuss/make this explicit (e.g. weighing and compensation of criteria) |
| Structure (for panel) | Agree and apply evaluation structure | Evaluate per criterion, not per candidate. Provide both inclusion and exclusion arguments |
| Structure (for panel) | Discuss and agree upon evaluation structure | Implement required changes (e.g. flip evaluation matrix to "per criterion") |
| Structure (for panel) | Align discussion with evaluation structure | Applying evaluation structure means to also let discussion evolve accordingly (e.g. discuss per criterion, not per candidate) |
| Structure (for panel) | Apply discussion structure | Introduce and agree on discussion structure (e.g. who speaks first/rotate speaking time per panelist, etc.) |
| Structure (for panel) | Optimally structure physical or virtual meeting environment | Seating arrangement, e.g. mix men and women, low and high status panelists around the table or in the meeting application |
| Structure (for panel) | Invite external participant expert observer | Needs to be accepted by panelists, supporting role in meetings and coach to chair and scientific officer |
| Structure (for interviews) | Apply strict interview guide | Prescribe the exact questions, order and panelist to ask the question across all panelists |
| Time | Spend equal time on all criteria and applications | Plan this ahead of time and communicate and apply |
| Time | Reserve time for inclusion arguments | |
| Time | Reserve time for calibration | |
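The intransitivity noted for panel rankings (the "paper, rock and scissors" situation in which multiple applications are compared on multiple criteria) can be illustrated with a minimal sketch. The three applications, the criterion names and all scores below are hypothetical and chosen only to produce a cycle; they are not taken from any panel data, and the pairwise-majority rule is an assumption made for illustration rather than a procedure described by the authors.

```python
from itertools import combinations

# Hypothetical scores (higher is better) for three applications on three criteria.
scores = {
    "A": {"track_record": 3, "idea": 1, "feasibility": 2},
    "B": {"track_record": 2, "idea": 3, "feasibility": 1},
    "C": {"track_record": 1, "idea": 2, "feasibility": 3},
}


def pairwise_winner(x, y, scores):
    """Return the application that scores higher on more criteria, or None on a tie."""
    x_wins = sum(scores[x][c] > scores[y][c] for c in scores[x])
    y_wins = sum(scores[y][c] > scores[x][c] for c in scores[x])
    if x_wins == y_wins:
        return None
    return x if x_wins > y_wins else y


def overall_winner(scores):
    """Return an application that beats every other one pairwise, or None if there is none."""
    names = sorted(scores)
    for x in names:
        if all(pairwise_winner(x, y, scores) == x for y in names if y != x):
            return x
    return None


# Compare every pair of applications once.
results = {
    (x, y): pairwise_winner(x, y, scores)
    for x, y in combinations(sorted(scores), 2)
}
# Unweighted totals per application.
totals = {name: sum(per_criterion.values()) for name, per_criterion in scores.items()}

for pair, winner in results.items():
    print(pair, "->", winner)
print("overall winner:", overall_winner(scores))
print("unweighted totals:", totals)
```

With these invented scores, A beats B, B beats C and C beats A, and the unweighted totals tie, so no application is an evident winner. Any final ranking then depends on how the panel explicitly weighs the criteria and calibrates across applications.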

General recommendations
With these more detailed analyses of how bias affects three different but interrelated aspects of the evaluation and decision process, and with concrete recommendations on each aspect for panelists and chairs, we hope to have raised awareness (A) on how process optimization and discretion elimination can help eliminate the cumulative (dis)advantage of the requirement of selling science in research funding. Even if the room for improvement is marginal, changing selection habits and mitigating the effects of bias contribute to a more inclusive process and outcomes. It is important that all involved (panel members, scientific officers who support panels, and especially panel chairs) are held accountable (A) for their contribution to decision outcomes and are engaged in process optimization. Some research funding organizations allow for an appeal if the applicant (or a panelist) believes that bias has occurred and/or if insufficient or incorrect information is provided to justify a decision. These appeals are then considered in a double blind setting.
To provide insight into and offer practical recommendations for improving inclusive, non-biased language (B), a linguistic bias scan or app can be used on materials (e.g. guide for applicants), on written applications, reference letters and evaluation reports but also to support observations in panels. Emphasizing that bias becomes explicit and audible in language and giving evidence based, context specific examples of biased language provide a means to move away from general concerns panelists may have around political correctness. As language is evolving and fluid, showing the negative impact of language on evaluation is important, especially for internationally and/or disciplinary diverse panels.
To make the application of criteria (C) less volatile, it is important to consider and discuss the meaning and operationalization of vague criteria such as excellence or research potential and of rating scales (e.g. "beyond expectations") among evaluators before moving on to the actual evaluations. Funders (or employers) are challenged to specify criteria in more detail, by providing examples of how to operationalize criteria given a particular grant scheme or call for proposal. Acknowledging that evaluators are experts when it comes to the discipline or the science involved, it is important to stress that reducing ambiguity and promoting transparency contributes to the predictive validity of the evaluation.
Restructuring the process and the materials (S) so that applications are evaluated per criterion instead of per applicant reduces the likelihood of selectively downplaying or inflating applicants' strengths and weaknesses. Similarly, asking evaluators to emphasize inclusion arguments (why they think an application should be evaluated positively) circumvents (always) giving the benefit of the doubt to some and (only) raising doubts about others. Having the same panelists ask the same questions in the same order to all applicants, a basic recommendation to improve the predictive validity of job interviews, is a practice not yet often carried out in panel interviews.
In addition, we stress that applicants, applications and scores are not to be discussed outside the meeting of the panel, to make sure that everyone involved has access to the same information and arguments, and to ensure accountability and transparency of the process. Evaluators should fill out the evaluation forms completely and individually before sharing with other evaluators, including an explanation of high and not only low scores.
Finally, time should be strictly kept (T), to ensure that equal time is spent on each criterion and application and that enough time is left for the calibration. While discussing the meaning of criteria and deciding on some ground rules for the panel discussion and the interviews takes time, it will also save time during the deliberation and calibration because of a shared understanding of the criteria and the process.
Throughout our argumentation, in the vignettes and the quotes, the role of the panel chair has appeared as crucial in the evaluation and decision process. We underline this importance here, providing a few instances in which the chair can take responsibility in process optimization. In the preparation of the meeting, in instructing (new) panel members and in the calibration process at the end of the meeting, the chair can effectively make or break the effort. Some chairs take on the role of referee or assign the role of referee to someone who observes the process, takes notes (using an app) and provides feedback on the occurrence of (linguistic) bias and opportunities for improving the process. An external observer (physically present or in online meetings), who is explicitly not a panel member but does have status and voice, can be engaged for this role as well. We applaud chairs who encourage and incorporate both positive and negative feedback and who allow time for careful calibration and reconsideration at the end of a long and intense evaluation round (see Table 4 for general recommendations and wild ideas).

Discussion
In this professional insights contribution, we have provided both specific and general recommendations for process optimization and discretion elimination in the evaluation and decision process for research funding. Our argumentation and recommendations serve to problematize the requirement of "(over)selling science" as an aspect adding additional, complex and gendered layers on top of the general challenge of making research funding more inclusive. In this sense, our recommendations can be translated to other performance evaluations in the editorial process or in research organizations. Especially in recruitment and selection, person-organization fit is an important criterion (relative to being able to sell science) that may lead to similar (dis)advantage for some compared to others depending on perceived (lack of) fit to an implicit ideal image (Herschberg, 2019; Rivera, 2017; White-Lewis, 2020).
When we design and implement interventions for research organizations based on these arguments, the question is often asked whether there is an optimum to process optimization in terms of standardization. Indeed, in some cases, algorithms (if unbiased), wild cards and lotteries can take over (some of) the decision-making and reduce (some of) the time and other resources involved in instructing and supporting panels and panel chairs. However, not all funding schemes, evaluation criteria and process steps are identical, and these interventions work best when customized to the particular context. In this sense, customization and standardization go hand in hand in bias mitigation, process optimization and discretion elimination in research funding evaluation and decisions.