Talking about the likelihood of risks: an agent-based simulation of discussion processes in risk workshops

Purpose – This paper aims to explore drivers of the effectiveness of risk assessments in risk workshops.
Design/methodology/approach – This study uses an agent-based model to simulate risk assessments in risk workshops. Combining the notions of transactive memory and the ideal speech situation, this study establishes a risk assessment benchmark and then investigates real-world deviations from this benchmark. Specifically, this study models limits to information transfer, incomplete discussions and potentially detrimental group characteristics, as well as interaction patterns.
Findings – First, limits to information transfer among workshop participants can prevent a correct consensus. Second, increasing the required number of stable discussion rounds before an assessment improves the correct assessment of high but not low likelihood risks. Third, while theoretically advantageous group characteristics are associated with the highest assessment correctness for all risks, theoretically detrimental group characteristics are associated with the highest assessment correctness for high likelihood risks. Fourth, prioritizing participants who are particularly concerned about the risk leads to the highest level of correctness.
Originality/value – This study shows that by increasing the duration of simulated risk workshops, the assessments change, as a rule, from underestimating to overestimating risks, unraveling a trade-off for risk workshop facilitators. Methodologically, this approach overcomes limitations of prior research, specifically the lack of an assessment and process benchmark, the inability to disentangle multiple effects and the difficulty of capturing individual cognitive processes.


Introduction
This study investigates conditions affecting the effectiveness of risk assessments in risk workshops [1]. Firms constantly adapt and transform themselves in response to potential risks that may threaten their existence. This entails the need to assess risks correctly, a crucial task in firms' enterprise risk management (COSO, 2017). A failure to distinguish between severe and less severe risks can have serious detrimental consequences, even threatening the continuation of operations. However, this assessment is not a trivial task, as decision-makers have to rely on their judgment (Mikes, 2009), which is based on information [2] that is often scattered within and beyond the organization (Neef, 2005).
Risk workshops are a frequently used technique to facilitate the aggregation of this distributed information (COSO, 2017) and allow stakeholders to discuss and assess the impact and likelihood of risks (Boholm and Corvellec, 2016). Risk assessment captures the entire process that determines the severity of a risk after it has been identified (COSO, 2017). The severity of a risk encompasses its potential impact and the likelihood of its occurrence. Risk management literature (van Asselt and Renn, 2011; Quail, 2011) suggests that the risk assessment's effectiveness, in terms of correctly assessing the risks and in terms of the time invested to reach a decision, depends on the design and implementation of this dialogue. We investigate risk workshops' design and implementation from the point where the worst credible impact of a certain risk is evident. Accordingly, the assessment focuses on the likelihood of the risk's worst credible impact. Subsequently, to ensure clarity, "high risks" and "low risks" refer to "high likelihood risks" and "low likelihood risks," respectively, and "risk assessment" refers to the "assessment of the likelihood of a risk." Because of the difficulty of observing organizational and individual cognitive conditions in discussions (instead of merely noting their outcomes) and the fact that a benchmark (i.e. the correct risk assessment and the time required to achieve it) is ex ante absent in most risk assessments (McNamara and Bromiley, 1997), prior research has been unable to systematically disentangle different sources of (in)effective risk assessments and to describe the unfolding of the discussion over time. We address these challenges by theoretically drawing on the idea of transactive memory and Habermas' (1983) notion of the ideal speech situation. We start by suggesting that risk workshops can be conceptualized as transactive memory systems.
Such systems are based on the knowledge stored in each individual's memory, the knowledge about the domain of expertise of the other individuals and the communication of this knowledge. Transactive memory systems represent an attempt to use individuals' information by combining their expertise through a discursive process (Wegner, 1987). Thereafter, we draw on Habermas' (1983) characteristics of an ideal discourse (which include free and full access to the discourse; equal opportunities to express attitudes, desires and needs; and the absence of coercion) to define the theoretically most suitable conditions for achieving the correct assessment with the least effort. Subsequently, we investigate deviations from this ideal speech situation to determine the risk assessment's unfolding under real-world conditions.
We use agent-based modeling (ABM), namely, simulation experiments that allow agents to follow predefined rules when interacting with other agents and with their environment (Wall and Leitner, 2020). In this study, the agents are workshop participants who communicate to assess a specific risk. ABM allows the development of individual knowledge and its group-level combination, as well as related risk assessment outcomes (Secchi, 2015; Wall and Leitner, 2020). Moreover, our simulation experiments provide a correct assessment (labeled the "benchmark assessment") against which to evaluate the risk assessment outcome (Labro and Vanhoucke, 2007). To define the "benchmark process," that is, the time required to achieve the benchmark assessment, we start by simulating an ideal speech situation in which all relevant risk information is shared by the participants. Thereafter, we introduce more realistic scenarios representing deviations from the ideal speech situation. Specifically, we consider the effects of limits to the information transfer among participants (i.e. the receiver only partially accepts the sender's argument because of cognitive load, time pressure, or different backgrounds), incomplete discussions (i.e. the introduction of a decision and termination approach, like voting on the risk assessment after a number of discussion rounds, instead of allowing unlimited information sharing), group characteristics (i.e. unequally distributed information, hierarchical differences and the nonrecognition of the possessors of expert knowledge) and specificities of the interaction patterns (e.g. prioritizing higher hierarchical positions in the discussion instead of randomly allowing an introduction of assertions).
We find that, under realistic discussion conditions, it is difficult to attain the benchmark assessment. We therefore generate fine-grained insights on the effects of deviations from the ideal speech situation.
Even though the risk assessment stabilizes when increasing the number of discussion rounds, limits to information transfer can still prevent a correct consensus [3].
In incomplete discussions, the discussion conditions suit the correct assessment of either low risks or high risks [4]. An increase in the required number of stable discussion rounds before the leader decides on the risk assessment worsens the correct assessment of low risks. Deviations involving theoretically detrimental group characteristics potentially lead to higher, instead of lower, levels of correctness. Prioritizing participants who are concerned about a certain risk leads to the highest level of risk assessment correctness.
This paper makes a threefold contribution to research and practice. First, whereas prior risk assessment research focused on overall risks (Aven and Zio, 2014), we raise awareness that by increasing the average duration of all risk workshops, the assessments change from an underestimation to an overestimation of risks. Thus, the increased correctness of high risks' assessment over the discussion time comes at the cost of a gradually reduced correctness of low risks' assessment. Future researchers are encouraged to refine their research questions by distinguishing between the likelihoods of the risks they are targeting, while firms are urged to make allowance for longer discussions if they want to avoid misidentifying high risks. Second, contrary to the intuitive understanding advocated by previous risk management and group discussion literature, we show that, in the context of risk workshops, the individual characteristics of the theoretically ideal speech situation are not as ideal as presumed (Johnson and Pajares, 1996; Sheffield, 2004). For example, in terms of correctness, a decision made by the leader following his or her own assessment or the majority's assessment outperforms the choice made after waiting for the emergence of a consensus. Firms can learn that the workshop's effectiveness is unlikely to increase after simply improving a single design component. Future research should be cautious when using the ideality notion in discursive settings.
Finally, to the best of our knowledge, this study is the first to systematically introduce a benchmark assessment and process in risk assessment investigations. Generally, an objectively correct assessment is seldom available as a benchmark (Bromiley et al., 2014; McNamara and Bromiley, 1997). We overcome this limitation and also avoid the commonly used singular focus on the effort required to achieve a risk assessment by focusing on the decisions' correctness (Chapman, 1998; Heemstra et al., 2003). Moreover, we disentangle the effects of distinct deviations from the ideal speech situation; effects that are otherwise only collectively evident in the risk assessment decision (He et al., 2012). While prior studies accounted for organizational effects, like the order in which participants speak (Hiltz et al., 1986), they were generally unable to capture individual information processing, like the individually assigned importance of received information. We model both types of effects.
Theoretical background

Risk assessment in risk workshops

Risk workshops are instances of group discussions, usually moderated by a facilitator, which provide the basis for a decision made by a leader. Relying on a group requires more effort than, for example, directly soliciting a leader's decision. Collectively, however, the group is expected to make better use of its individual members' information than the individuals would on their own, as the group can profit from its members' diversity by aggregating their information on different domains (LiCalzi and Surucu, 2012; Lu et al., 2012; Stasser and Birchmeier, 2003). However, risk workshops (and, more generally, group discussions) often fail to provide reliable (risk) assessments (Hunziker, 2019; Stasser and Titus, 1985). Although scattered, the literature provides some explanations of these outcomes. Among others, detrimental effects are caused by limited information transfer, because of information overload (Paul and Nazareth, 2010) or the diversity of the participants' backgrounds (LiCalzi and Surucu, 2012). Other arguments point to incomplete discussions owing to time constraints (van Knippenberg et al., 2004) or to group characteristics like the lack of familiarity with each other's expertise (Moreland and Myaskovsky, 2000). Moreover, the intra-group interaction patterns are deemed relevant (Katzenbach and Smith, 2015). For example, homogeneity and concurrence seeking (the "groupthink" concept; Janis, 1972) are related to suboptimal group assessments (Schulz-Hardt et al., 2006). A similar effect could arise when participants are unengaged or when they dominate the discussion (Hunziker, 2019; Quail, 2011).
While prior experiment-based laboratory studies are clear about the individual drivers of the quality of a discussion's outcome, they are generally unable to capture the (changing) perceptions of the individual participants and the group during a discussion that is simultaneously affected by multiple conditions (Schulz-Hardt et al., 2006) [5]. However, this process perspective explains at which specific stage of the discussion process a particular decision will be made, in turn unraveling the effectiveness achieved under a particular discussion condition (e.g. terminating the discussion after a certain period of time or focusing on specific participants during the discussion). This study contributes to closing this research gap.

Risk assessment process: ideal conditions and deviations
We merge the cognitive and discursive perspectives. From a cognitive perspective, we frame risk workshops as an example of distributed cognition. Distributed cognition means that groups make use of individuals' knowledge by combining their expertise (Hauke et al., 2018). Specifically, we rely on transactive memory, a mechanism through which risk workshop participants learn about each other's expertise (i.e. participants build transactive memory) and then identify and combine knowledge in a discursive process. In a risk workshop, a partially differentiated transactive memory system progresses toward an integrated system. In a differentiated transactive memory, participants have fully disjunct areas of expertise (i.e. expertise is maximally unevenly distributed), while in an integrated transactive memory, all participants have the same knowledge (Wegner, 1987). Transactive memory systems have a positive impact on group performance. This impact is more likely to emerge when group members are familiar with each other's expertise and have initially distributed expertise (Lewis, 2004).
While this cognitive perspective of risk assessments focuses on the group's access to individual knowledge through discussion, the discursive perspective complements it by focusing on the design of this discussion. Habermas (1983), referring to Alexy (1978), describes the conditions of an ideal speech situation that is theoretically suited to reaching a true consensus [6]. In an ideal speech situation: all participants competent at speaking about the relevant topic are allowed to participate in the discourse; [7] all participants have the same chance of participating by speaking, disagreeing and asking and answering questions, and every aspect can be discussed and criticized; and all participants engage in the discussion without differences in power or other forms of coercion.
The ideal speech situation is regarded as a normative standard for a discussion of risks (Horlick-Jones et al., 2001) that ensures the proper sharing and use of individual knowledge in the group. Real-world discussions are limited by constraints that deviate from the aforesaid ideal speech-situation characteristics. Starting with Habermas (1983) and based on Handy (1986) and our summary of the literature on group discussions, we focus on the following four deviations:

Limits to information transfer: To reach a true consensus, the speaker and listener need "shared propositional knowledge, and mutual trust in subjective sincerity" (Habermas, 1982, p. 413). A speech act might not fully convince the receiver if these requirements are not met, with the result that the individual's expertise on a certain risk is not fully incorporated into the assessment.

Incomplete discussion: The ideal speech situation is not limited by temporal constraints, as "no preliminary opinion [should remain] permanently withdrawn from discussion and criticism" (Habermas, 1989, p. 177). By contrast, the leader must set time limits for each risk in a workshop (Quail, 2011) and must enforce a termination rule, after which the leader decides on the risk assessment.

Specific group characteristics: As there is no limit to a discussion's length in the ideal speech situation, initial differences in the participants' information access can be resolved by successively sharing information. However, if the discussion remains incomplete (i.e. it is ended before arriving at a true consensus), an unequal distribution of information among participants may influence the risk assessment proposed by the group. Moreover, while the equal consideration of each participant's arguments forms a core element of the ideal speech situation, in real-world situations hierarchical differences may influence the acceptance of arguments. Finally, expertise might go unrecognized (i.e. receivers have no transactive memory).
Specific interaction patterns: Habermas (1989, p. 177) calls for participants to "have an equal chance to use representative speech acts" and to "have the same opportunity to use regulative speech acts, that is to give orders and to resist, to allow and prohibit, to make and take promises." However, it is unlikely that participants will be equally prioritized to speak in real-world discussions (Quail, 2011).
In line with the cognitive and discursive components of our theory, we expect that these deviations from the ideal speech conditions will, ceteris paribus, reduce the risk assessments' correctness and increase the required number of discussion rounds.

Overall design
We use a simulation experiment approach, that is, we model the reality of interest with its related processes and outcomes and combine it with an experimental design (Harrison et al., 2007) [8]. First, we align the benchmark process' simulation with Habermas' ideal speech situation. Second, we run four simulation experiments that model the aforesaid deviations from the ideal speech situation to disentangle the extent to which they change the risk assessment's effectiveness. Given the importance of gaining a better understanding of actors' roles in risk management and governance (Hiebl et al., 2018), we model the interaction of risk workshop participants as the exchange of information between agents in an ABM (Lorscheid and Meyer, 2021; Wall and Leitner, 2020).
The risk itself is modeled as a Bayesian network (Fenton and Neil, 2019; Kabir and Papadopoulos, 2019), representing both the discussed risk and the mental model of the participants [9]. Bayesian networks are probabilistic models that describe the conditional probabilities of an event (González-Brenes et al., 2016; Pearl, 2008). Combining ABM and Bayesian networks provides the two components of a transactive memory system, namely, the transactive processes and individual memory systems, reflected, respectively, in the discursive interaction of the ABM's agents and in the likelihoods of states represented by Bayesian networks.
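Although the paper does not publish its implementation, the role of the Bayesian network can be illustrated with a minimal sketch in which an issue node's distribution over the three states is derived from an information node via a conditional probability table. All names and numbers here are illustrative assumptions, not the authors' parameters.

```python
def marginal(parent_dist, cpt):
    """P(child) = sum over parent states of P(child | parent) * P(parent)."""
    states = ["low", "medium", "high"]
    return {c: sum(cpt[p][c] * parent_dist[p] for p in states) for c in states}

# Illustrative CPT: a "high" information state pushes the issue node toward "high".
cpt = {
    "low":    {"low": 0.80, "medium": 0.15, "high": 0.05},
    "medium": {"low": 0.30, "medium": 0.50, "high": 0.20},
    "high":   {"low": 0.05, "medium": 0.15, "high": 0.80},
}
info_belief = {"low": 0.1, "medium": 0.2, "high": 0.7}  # one participant's belief
issue_belief = marginal(info_belief, cpt)  # issue node now leans toward "high"
```

Issue, domain and overall nodes would be chained in the same way, so a shift in a single information node propagates up to the overall risk assessment.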

Discussion process and risk assessment model
Each simulation experiment consists of a number of simulation runs. Each simulation run is an entire discussion of a single risk within a risk workshop, and it comprises five stages (Figure 1). The risk structure forms the basis of each discussion (Figure 2). The overall risk assessment (e.g. the likelihood that the introduction of a new product in the market can fail) is derived from the assessment of domain-specific risks (e.g. the likelihood of competitors introducing a similar product or the likelihood that the new product's cost is higher than the customers' willingness to pay). The domain-specific risk assessment is derived from the assessment of issue-specific risks (e.g. the likelihood that production costs are higher than expected), which, in turn, is rooted in the assessment of specific risk information (e.g. the likelihood that existing machines cannot be adapted to the new product and new machines have to be purchased). The participants' mental models are constructed analogously. The full risk structure contains 40 nodes, comprising 27 information, nine issue and three domain nodes, as well as one node for the overall risk assessment. The individual participants, owing to their diverse backgrounds or priorities, have different risk perceptions (Sjöberg, 2000) and are initially only aware of the existence of the domains and issues related to the information they are provided with during the initialization. Before they can receive information about a certain domain or issue, they have to gain knowledge of this domain or issue's existence by discussing it with other participants [10]. During the discussion, information on the 27 information nodes is exchanged. The other nodes are derived from the states of the information nodes. All nodes are discrete variables in a "low," "medium" or "high" state. Each of these states is assigned a probability that represents the degree of belief that the variable is in a particular state [11].
To reflect a situation where the risk workshop needs to correctly account for a small share of critical information, we postulate in our Bayesian network that information nodes are individually ten times more likely to indicate a low than a medium likelihood, and ten times more likely to indicate a medium than a high likelihood. If a participant believes that a certain information node has a "high" state (i.e. the state represented by the information node has a high likelihood), the Bayesian network will reflect this with a higher probability of the corresponding issue-, domain- and overall risk assessment nodes being in a "high"-risk state. Thus, for the same risk, participants can arrive at different risk assessments, depending on the information available to them.
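The stated prior can be made concrete: with unnormalized weights of 100:10:1 for the three states, a single information node's prior distribution follows directly (a minimal sketch; the normalization step is implied rather than stated in the text).

```python
# Prior for a single information node, as stated above: "low" is ten times more
# likely than "medium", which is ten times more likely than "high".
weights = {"low": 100, "medium": 10, "high": 1}
total = sum(weights.values())  # 111
prior = {state: w / total for state, w in weights.items()}
# roughly 0.901 / 0.090 / 0.009 for low / medium / high
```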

Model of the discussion
Nine participants [12] exchange information about the risk at hand. The discussion is divided into rounds, each comprising a sequence of actions performed by the participants (see stage 4 in Figure 1). The discussion's outcome is influenced by how it deviates from the ideal speech situation.

(1) A risk assessment task is randomly generated, according to the Bayesian-network risk structure depicted in Figure 2. The task is to assess the likelihood of the risk, which is unknown to the workshop participants.
(2) The benchmark assessment is calculated using the complete information from the risk assessment task.
(3) Each participant is provided with some but not all information (i.e. with limited information), in a way that each piece of information is initially known to at least one participant.

(4) Then, simulation experiments are run, in line with the conditions delineated in Table 1. The risk is discussed repeatedly and assessed by the group in several discussion rounds. Specifically, (4.1) all participants share their assessment of the likelihood of the risk; then (4.2), when certain conditions are met (e.g. the group has reached a consensus), the leader terminates the discussion and decides on the risk assessment. If the discussion continues, (4.3) the next participant to share information (i.e. the sender) is chosen and (4.4) shares the information, (4.5) followed by the other participants (i.e. the receivers) who update their risk assessments based on this new information.

(5) The assessment reached by the workshop is compared with the benchmark assessment. For example, if a high risk (benchmark assessment) is assessed to be low, it is a misidentified high risk.

3.3.1 Limits to information transfer. The sender's arguments may not fully convince the receiver. In our model, after receiving information from the sender and when updating their risk assessment, participants will not necessarily fully discard their prior beliefs about the corresponding information node. Instead, a receiver's new assessment of the information node is a weighted average of his/her prior assessment and the sender's assessment [13]. The weight that the receiver attributes to the sender's input differs across receivers and is an aggregate that, in practice, may account for factors like cognitive load, time pressure or the participant's background.
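The weighted-average updating rule can be sketched as follows; the paper specifies only the general form, so the concrete distributions and the weight value are illustrative assumptions.

```python
# Belief-updating rule (sketch): the receiver's new assessment of an
# information node is a weighted average of his/her prior assessment and the
# sender's assessment. The weight w is receiver-specific and aggregates
# factors like cognitive load, time pressure or the participant's background.
def update_belief(prior, sender, w):
    """Blend two state distributions; w is the weight given to the sender."""
    return {state: (1 - w) * prior[state] + w * sender[state] for state in prior}

receiver = {"low": 0.90, "medium": 0.09, "high": 0.01}
sender   = {"low": 0.10, "medium": 0.20, "high": 0.70}
updated  = update_belief(receiver, sender, w=0.5)
# With w = 1 the receiver fully adopts the sender's view (ideal speech
# situation); with w < 1 the sender may need to repeat the information.
```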
3.3.2 Incomplete discussions. Under real-world conditions, leaders will have to determine the basis on which they will make their assessment decision and when the risk workshop should end. They might rely on their individual risk assessment, on the group consensus or on the majority's assessment. In terms of timing, if the leader adopts a consensual assessment, the discussion could be stopped when the consensus emerges. Otherwise, the leader might stop the discussion if it is not progressing, that is, when the average (numerical) group assessment has stabilized over a certain number of rounds (one, five or ten).
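The stability-based termination rule can be sketched as below, using the 2% criterion from the paper's notes; interpreting that threshold as a relative change between consecutive rounds is our assumption.

```python
# Sketch of the termination check: the leader stops once the average
# (numerical) group assessment has been "stable" for a required number of
# consecutive rounds.
def stable_rounds(history, tol=0.02):
    """Count the consecutive trailing rounds with a relative change <= tol."""
    count = 0
    for i in range(len(history) - 1, 0, -1):
        if abs(history[i] - history[i - 1]) <= tol * abs(history[i - 1]):
            count += 1
        else:
            break
    return count

def should_stop(history, required=10, tol=0.02):
    """Termination rule: e.g. required = 1, 5 or 10 stable rounds."""
    return stable_rounds(history, tol) >= required

group_assessment = [0.50, 0.30, 0.301, 0.302, 0.3025]  # illustrative averages
# Here the last three round-to-round changes are within 2%, so the discussion
# stops under required=3 but continues under required=5.
```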
3.3.3 Specific group characteristics. We focus on the impact of three group characteristics.
(1) Unequal distribution of information: Participants might not have access to the same amount of information, in which case a larger share of information is provided to some participants.
(2) Differences in hierarchy: Information from higher-ranked participants might receive more consideration than information from other participants. Thus, the weight of the information is higher.
(3) Information about each other's expertise (transactive memory): Participants may be unaware of each other's expertise (i.e. receivers lack transactive memory); thus, they cannot differentiate between expert and non-expert senders and will not weigh the information accordingly.
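How these group characteristics could enter the receiver-side weighting can be sketched as follows. The hierarchy factors h_low = 0.25, h_medium = 0.5, h_high = 0.75 appear in the paper's notes; the multiplicative form, the normalization by 0.5 and the 1.5 expertise bonus are our illustrative assumptions.

```python
# Sketch (assumed functional form) of the effective weight a receiver gives
# to a sender's input, combining hierarchy and expertise recognition.
def sender_weight(base_w, sender_hierarchy=0.5, sender_is_expert=False,
                  receiver_has_transactive_memory=True):
    """Return the effective weight on the sender's input, capped at 1.0."""
    w = base_w * sender_hierarchy / 0.5  # higher-ranked senders get more weight
    if receiver_has_transactive_memory and sender_is_expert:
        w *= 1.5  # recognized experts are weighted up; without transactive
                  # memory the receiver cannot apply this bonus
    return min(w, 1.0)
```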
3.3.4 Specific interaction patterns. Risk workshop facilitators decide who is allowed to speak in what order, thereby determining the interaction patterns. Using a random order as a baseline, we investigate the following interaction patterns, giving priority to:

Concern: The probability of being the next sender is higher if the participant's risk assessment is "high."
Dissent: The probability of being the next sender is higher if the participant's assessment differs largely from the average (numerical) group risk assessment.
Hierarchy: The probability of being the next sender is higher if the participant is assigned a higher hierarchical position.
Homogeneity: The probability of being the next sender is higher if the participant's risk assessment is close to the average group risk assessment.

Table 1 provides an overview of our simulation experiments.

Ideal speech-situation conditions

Figure 3 shows at the top the results of the simulation experiment for the ideal speech-situation conditions. It depicts, per discussion round, the proportion of simulated discussions that has reached a particular consensus type or failed to reach a consensus [14]. Before the discussion (i.e. in discussion round zero), no consensus is reached on the risk assessment in 38% of the simulated discussions. The reason is that participants, at the start of the discussion, base their risk assessment only on their limited sets of information. Achieving a (correct) consensus before the discussion is driven by chance. Moreover, we observe a tendency initially to underestimate risks (i.e. to reach a consensus but misclassify high risks). This is because of the lack of knowledge about the existence of certain information nodes. Initially, participants often overlook information about the risk structure and do not account for uncertainty regarding the probabilities of the corresponding nodes (i.e. they do not yet know what they do not know).

In our model, corresponding to the real-world distribution of risks, most information nodes are in the "low" likelihood state. Consequently, participants underestimate the risk until, by learning something new about the risk structure, they become aware of their so far unrecognized uncertainty. Therefore, in the early discussion rounds, low risks are correctly identified over-proportionally often, compared to high risks.

Until discussion round seven, the proportion of driven-by-chance consensus drops over all simulated discussions. After this round, an increasing proportion of the discussions results in a consensus, stemming from the increased amount of shared information (thus, from a better knowledge of the risk structure and the corresponding information). After 39 discussion rounds at most, all information is shared and adopted by all participants, resulting in a correct consensus for nearly all discussions [15]. The required maximum of 39 discussion rounds is determined by the sum of the 27 information, nine issue and three domain nodes that must be shared to attain the overall risk assessment.
Overall, even under ideal speech conditions, it is apparent that a correct group assessment of a risk involves many discussion rounds and is error prone. Moreover, even if the participants reach a consensus, this consensus could be premature and wrong. Hence, the presence of a consensus is only a reliable indicator of a correct assessment after a large proportion of the information has been shared.

Simulation experiment 1: limits to information transfer

Figure 3 also shows that, when limiting information transfer, even after 78 discussion rounds (twice as many rounds as in the benchmark process) only 84% of the discussions had reached a correct consensus. As the receivers do not fully integrate new information in their belief updating, senders may have to talk repeatedly about the same information to gradually increase their information's impact on the receivers' risk assessment. At the same time, as discussion rounds continue, the group assessments' classification becomes stable, sometimes without attaining a correct consensus. Thus, even after many discussion rounds, the unwillingness or inability to fully incorporate the sender's information impedes the achievement of the benchmark assessment.

Table 1. Overview of the simulation experiments
Notes: The table presents the conducted simulation experiments, along with their respective experimental conditions, outcome variables, number of simulated discussions, number of high and low risks in the benchmark assessment and the section in which the findings are presented. Variables that vary from experiment to experiment are marked in bold. We use a nested design for the simulation experiments targeted at the group characteristics and at the interaction patterns. From simulation experiment 2, we select "leader follows majority" as the decision approach and "after ten stable rounds" as the termination approach for these experiments. (a) A simulated discussion is the discussion of a single generated risk over several discussion rounds. In each round, a participant shares some information with the group. Each discussion was simulated for 140 discussion rounds, as this was sufficient to reach ten stable rounds for all discussions, which is our strictest stability criterion for the termination of a discussion. Deciding on the number of simulation runs typically involves balancing computational costs and getting representative data generated by the simulation's stochastic process (Lorscheid et al., 2012). (b) A stable round is defined as a discussion round in which the risk assessment does not change from the previous round. A discussion is said to have a number of stable rounds (i.e. the participants' perception is that they do not learn anything more from the discussion) if the average (numerical) group assessment does not differ by more than 2% for the same number of consecutive rounds.

Notes: The table depicts the results of the third, fourth and fifth simulation experiment, respectively, and shows the percentage of risks that were correctly assessed, and the average number of discussion rounds before the decision was made. For each experiment, italic values highlight the highest percentage of correct assessments per type of risk and the lowest average number of required discussion rounds. Simulation experiment 2: A stable round is defined as a discussion round in which the risk assessment does not change from the previous round. A discussion is said to have a number of stable rounds (i.e. the participants' perception is that they do not learn anything more from the discussion) if the average (numerical) group assessment does not differ more than 2% for the same number of consecutive rounds. If the leader follows the consensus, but no consensus is reached, the assessment is counted as incorrect. Simulation experiment 3: If the information is unequally distributed, it means that the information is distributed among the participants so that the best-informed participant knows twice as much as the second-best informed participant, who knows twice as much as the least informed participant. If receivers consider hierarchical differences, they weigh the sender's input according to their difference in hierarchy values: h low = 0.25, h medium = 0.5, h high = 0.75. If receivers have no transactive memory, they do not distinguish between the input of an expert sender and a non-expert sender. Simulation experiment 4: When concerned participants are prioritized, the probability of being the next sender is proportional to the probability that they assign the "high risk" state to the overall risk assessment. In a deviation from the standard sequence of actions in the simulation -in this setting -participants select the information to share with a likelihood proportional to the probability they assigned to the "high" state of the respective information node. 
When dissenting participants are prioritized, the probability of being the next sender is proportional to the difference between their risk assessment and the group's risk assessment. When participants are prioritized based on their hierarchical position, the probability of being the next sender is proportional to a hierarchy factor they are assigned: h_low = 0.25, h_medium = 0.5, h_high = 0.75. When participants close to the group opinion are prioritized, the probability of being the next sender is proportional to the inverse of the difference between their risk assessment and the group risk assessment (Table 2).
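The four prioritization schemes described in these notes can be sketched as unnormalized selection weights. The following is a minimal illustration, not the authors' code: the participant attribute names (`p_high`, `assessment`, `hierarchy`), the dictionary representation and the function names are our own assumptions.

```python
import random

def next_sender_weights(participants, group_assessment, scheme):
    """Return unnormalized selection weights for each participant,
    following one of the four prioritization schemes in the table notes."""
    weights = []
    for p in participants:
        if scheme == "concerned":
            # proportional to the probability assigned to the "high risk" state
            w = p["p_high"]
        elif scheme == "dissenting":
            # proportional to the distance from the group's risk assessment
            w = abs(p["assessment"] - group_assessment)
        elif scheme == "hierarchy":
            # proportional to the assigned hierarchy factor (0.25 / 0.5 / 0.75)
            w = p["hierarchy"]
        elif scheme == "close_to_group":
            # proportional to the inverse distance from the group's assessment
            w = 1.0 / (abs(p["assessment"] - group_assessment) + 1e-9)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        weights.append(w)
    return weights

def pick_next_sender(participants, group_assessment, scheme, rng=random):
    """Draw the next sender with probability proportional to the weights."""
    weights = next_sender_weights(participants, group_assessment, scheme)
    return rng.choices(participants, weights=weights, k=1)[0]
```

The equal-participation benchmark of the ideal speech situation corresponds to uniform weights; each scheme above deliberately deviates from that benchmark.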
4.3 Simulation experiment 2: incomplete discussions
Table 2 aggregates the effects of a leader's three decision approaches, that is, relying on his or her individual risk assessment, accepting the group's consensus or following the majority's opinion. Leaders who follow their own or the majority's opinion outperform the consensus requirement. For all decision approaches, we investigate what happens when the discussion is terminated after one, five or ten stable rounds. We find that this clearly impacts the percentage of correct assessments. Over all risks, a continuation of the discussion generally improves the correctness (e.g. a decision that follows the consensus after ten stable rounds, instead of five stable rounds, improves the overall percentage of correct risk assessments from 39.6% to 59.8%). Intriguingly, correct assessments differ between high and low risks. For example, a comparison of the decision approaches with the same number of required stable rounds indicates that the leader will make better decisions by following the majority if the risk is low, but will otherwise improve the decision by relying on his or her individual risk assessment. Terminating the discussion upon reaching a first consensus leads to a correct assessment in only 57.7% of the discussions with an actual high risk, while the same termination approach leads to a correct assessment in 97.2% of the discussions with an actual low risk. Moreover, an increase in correctness in high-risk assessments over the discussion time comes at the cost of a slow decrease in correct low-risk assessments. Given that firms want to reduce the severity of the risks that they are facing, and that this severity is the product of the risk's impact and likelihood, ceteris paribus, firms will want to identify at least the high likelihood risks correctly and then mitigate these risks.
If this holds, based on our findings, firms are encouraged to make allowance for longer discussions to avoid misidentifying high risks. This trade-off (Figure 3) is partially the result of the previously discussed initial tendency to underestimate risks, as participants, at this point in time, lack knowledge of the complete risk structure, resulting in objectively unjustified certainty ("unknown unknowns"). At this point, participants are correct with their "low" assessment, but for the wrong reason. However, as participants subsequently become aware of their lack of knowledge without obtaining information about the likelihood of nodes, they start to overestimate the actual risk as they assign likelihoods to the new nodes. Here, participants also assign small non-zero probabilities to the "medium" and "high" states of the corresponding information node. Consequently, until they learn about the actual state of an increasing number of nodes, many participants assess the overall risk to be high and only switch to a low risk assessment when they learn about the actual state of "low" information nodes.
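The termination rule used throughout these experiments, ending the discussion after a required number of stable rounds, with stability meaning that the average group assessment changes by no more than 2% between consecutive rounds (see the table notes), can be sketched as follows. This is our own minimal reading of the rule: the function names, the history representation (one average group assessment per round) and the use of a relative 2% tolerance are assumptions.

```python
STABILITY_TOLERANCE = 0.02  # a round is "stable" if the average group
                            # assessment changes by no more than 2%

def count_trailing_stable_rounds(history):
    """Count the consecutive stable rounds at the end of a list of
    average group assessments (one value per discussion round)."""
    stable = 0
    for prev, curr in zip(history[:-1], history[1:]):
        if abs(curr - prev) <= STABILITY_TOLERANCE * max(abs(prev), 1e-12):
            stable += 1          # change within tolerance: round is stable
        else:
            stable = 0           # a large change resets the stable streak
    return stable

def should_terminate(history, required_stable_rounds):
    """Terminate the discussion once the required number of
    consecutive stable rounds has been reached."""
    return count_trailing_stable_rounds(history) >= required_stable_rounds
```

Under this rule, raising the required number of stable rounds from one to five to ten directly lengthens the simulated discussions, which is the mechanism behind the over-proportional growth in discussion rounds reported above.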
An increase in the stability requirements is accompanied by an increase in the average number of required discussion rounds. This increase may appear trivial, but it is over-proportional to the number of stable rounds (2.1 for one stable round vs. 17.8 for five stable rounds vs. 33.5 for ten stable rounds). While the overall correct risk assessment improves in a roughly linear manner, the time costs of these improvements show a steeper, non-linear increase.

4.4 Simulation experiment 3: group characteristics
Table 2 reports the effects of a variation in group characteristics for a condition in which the leader follows the majority after ten stable rounds [16]. As expected, for all risks, we observe the highest correctness (78.2%) when information is equally distributed, receivers do not consider hierarchical differences, and receivers possess transactive memory. Moreover, we find the highest proportion of correctly identified low risks (75.0%) in the same setting. Notably, the highest share of high risks is correctly assessed when there are deviations in all three investigated group characteristics. Under this condition, after the default of ten stable rounds required by this simulation experiment, the risk structure has already been learned (i.e. knowledge of the existence of the nodes has been gained); thus, the discussion focuses on the information embedded in the nodes. Here, a suboptimal discussion generates noise, as the experts are unable to reduce the other participants' uncertainty. Because not all information is equally discussed, the hierarchically higher participants prevail over the experts, and the experts' expertise is not recognized. Overall, this does not eliminate the small non-zero probabilities of the "medium" and "high" states of the nodes and leads to an overestimation of all risks. This is the situation in which agents are right with their "high" assessment, but for the wrong reasons [17].
4.5 Simulation experiment 4: interaction patterns
Table 2 indicates that the highest correctness for all risks (88.9%) is observed when prioritizing concerned participants. Prioritizing participants that are close to the group opinion leads to the quickest agreement (31.3 discussion rounds), but at the cost of lower correctness. This aligns with the previous literature's warnings against the concurrence seeking inherent in the groupthink effect, specifically in risk assessments (Hunziker, 2019; Janis, 1972). Interestingly, we observe improvements when deviating from the equal participation condition suggested by the ideal speech situation.

Contributions and discussion
Our results make three contributions to research and practice. First, we demonstrate that increasing the discussion rounds during a risk workshop may decrease rather than increase the rate of correct assessments for certain risks. Specifically, we identify a potential trade-off between the correct assessment of high and low risks. Along with an increased duration, on average over all risk workshops, the assessments progress from an underestimation to an overestimation of risks. As any improvement of one risk type's correctness reduces the correctness of the other, risk workshop facilitators can choose their discussion termination approaches on this basis. For example, if the correct assessment of high risks is prioritized, attention should be given to the longest possible continuation of the discussion (under the existing resource constraints). We contribute to research by highlighting the peculiarities in the identification of low and high risks over the duration of the group discussions. Future studies should include this distinction in their analyses. For example, would the results of Moreland and Myaskovsky (2000), who find a positive effect of a group member's familiarity with the expertise of others on group performance, still hold in a risk assessment setting that specifically addresses high or low risks? Second, we go beyond an ideal speech situation, as we show that this theoretical notion might provide misleading practical guidance. A lengthy discussion that terminates after a large number of stable rounds does not necessarily lead to better outcomes for all risk types. While Stasser and Stewart (1992), following their simulation of political caucuses, concluded that lengthy discussions do not necessarily lead to better decisions, we transfer their finding to a firm-based risk assessment setting, thus indicating that the specific context of discussions is not a boundary condition of this finding.
A decision not based on consensual agreement does not prevent good decisions. Thus, we substantiate the conceptual claim that the final risk assessment should be based on the leader's own assessment (Quail, 2011). Rather than allowing everyone to participate in an equal way, we see that facilitators can improve the group's risk assessment by encouraging the participation of those with concerned views. In doing so, we provide evidence supporting the effectiveness of an approach that countervails the concurrence-seeking, groupthink effect in risk workshops. Overall, risk workshop facilitators can learn from our study that an increase in workshop effectiveness cannot be achieved by simply improving a single design component. Instead, it requires a complete overhaul towards the theoretically ideal conditions, as shown in our benchmark process. Research can profit from our findings by using the identified conditions as a new baseline for further investigations of risk assessments. For example, we complement the work of Katzenbach and Smith (2015), who argue in favor of determining rules of interactions, by providing evidence of the need to prioritize concerned participants. Third, we contribute methodologically to the risk assessment literature by introducing a novel approach that uses ABM in combination with simulation experiments. We thereby respond to the call of Bromiley et al. (2014), who argue that studies with a known objective risk facilitate an understanding of why and how risk assessments fail to meet expectations. While such a benchmark is usually unavailable in case studies or surveys of the risk assessment practice (McNamara and Bromiley, 1997), it can be generated through a simulation experiment approach. Moreover, in a single study, this approach enables us to disentangle a multitude of effects on the risk assessment. Prior studies mainly focused on either an aggregate effect or on a single effect (Kim and Park, 2010).
Finally, our ABM enables us to model individual cognitive processes, including the individual's weighting of received information and the existence of a transactive memory, together with the related group-level outcomes. To the best of our knowledge, this is the first risk assessment study that investigates individual cognitive processes in conjunction with organizational variables. Our modeling might serve as a stepping stone to future risk assessment investigations.

Summary and limitations
Risk workshops are a common technique of risk assessment and, if effectively used, constitute a powerful risk management instrument. However, difficulties such as defining benchmarks, disentangling different effects on the risk assessment and capturing individual cognitive processes in discussion processes pose serious challenges to a better understanding of the design and implementation of discussion processes in risk workshops. This study responds to these challenges. It theoretically draws on the notion of transactive memory, links it to the ideal speech conditions, and investigates how deviations from this situation, likely to occur in real-world risk workshops, change the risk assessment outcomes. We ran five simulation experiments rooted in ABM to disentangle the effects of different deviations.
Our results provide fine-grained insights into the processes and outcomes of risk workshops. First, even though the risk assessment stabilizes with an increasing number of discussion rounds, limits to information transfer can prevent a correct consensus. Second, contrary to our theory and the intuition of the group discussion literature, we find that increasing the required number of stable discussion rounds before conducting the risk assessment worsens the correctness for low risks. Third, we show that, for high risks, after ten stable discussion rounds, the co-occurrence of seemingly detrimental group characteristics leads to the highest, instead of the lowest, level of risk assessment correctness. Finally, prioritizing concerned participants, instead of ensuring an equal chance to speak, leads to the highest level of risk assessment correctness.
Admittedly, this paper has limitations that future research should address. First, our analysis simulates a risk workshop that discusses a single risk. While this was a conscious choice to avoid obfuscating the results with the likely effect of interdependencies across risks, we encourage future studies to use our single risk model as a baseline to investigate these interdependencies' effects. Second, we focus on a classification task that ultimately makes a binary distinction between high and low risks. While we believe that our approach enhances the clarity of the results' communication, future research might be interested in investigating the outcomes of a ternary task. Third, our analysis models nine participants in the discussion. While nine is within the range of participants common in risk workshops (Ackermann et al., 2014), and the untabulated results of the simulation experiments run with three and 18 participants qualitatively support our findings, any related choice is arbitrary; future research should investigate our findings' sensitivity to changes in group size. Fourth, as we do not address all possible deviations from the ideal speech situation, future research could account for participants' heterogeneous motivation, as suggested by Bromiley et al. (2014). Likewise, it could clarify factors like hidden agendas or an increase in limits to information transfer over time owing to an increasing instead of a constant cognitive load.

Notes
1. Risk means uncertainty about how potential events may affect the organization. These events may have positive and negative outcomes (COSO, 2017). In this paper, to enhance clarity, we restrict ourselves to the common focus of organizations, that is, those risks that may result in negative outcomes (COSO, 2017). However, our modeling is applicable to both threats and opportunities.
For example, when considering interaction patterns in risk workshops, we refer to "concerned" participants; in a threats and opportunities language, a better label would be "concerned or enthusiastic" participants.
2. "Information" refers to the participant's organized data in the context of the risk assessment task, while "knowledge" refers to cognitively processed and aggregated information that enables participants to reach an understanding of the assessed risk.
3. A "correct consensus" refers to a risk assessment that is shared by all the participants of the risk workshop and that corresponds to the benchmark assessment that is ex ante established as correct.
4. We model a risk workshop that deals with a single risk. Investigating potential interdependencies in the risk assessment, when discussing several heterogeneous risks in a single risk workshop, is beyond this study's scope.
5. Usually, laboratory experiment participants are surveyed before and after the discussion. As capturing (changes in) perceptions during the discussion would disrupt the process, it is generally avoided in laboratory experiments.
6. A consensus is considered true when every competent person agrees with it (Habermas, 1971). Note that a "correct consensus," as previously defined, does not have to be a "true consensus." For example, because the risk workshop conditions allow participants with no knowledge (i.e. not fully competent on the risk considered) to participate, all participants may still reach a correct consensus, albeit not a true consensus.
7. We use the term "discourse," which is common in Habermas' work, as a synonym for "discussion." The latter is used throughout the remainder of this paper.
8. The simulation code and the ODD+D (Overview, Design Concepts and Details + Decision) protocol are available online at www.comses.net. The protocol provides a standard description of ABMs that include human decisions (Grimm et al., 2006; Müller et al., 2013). We use it to detail the information provided in this section.
9. A mental model is an internal representation of a human's understanding of a system (Rouse and Morris, 1986).
10. Gaining knowledge about the existence of a new domain or issue node and obtaining information about the likelihood of this particular node happen in different discussion rounds. When knowledge about the structure of an issue node is acquired, agents simultaneously learn about the existence of the underlying information nodes.
11. For example, a likelihood of 100% for the "low" state signifies that the participant is absolutely certain about the assessment. A likelihood of 80% for the "low" state and, for example, 14% for the "medium" and 6% for the "high" states indicates some uncertainty regarding the actual state of the node.
12. Risk workshops can differ substantially regarding their number of participants. We chose nine participants for our simulation experiment; a group size within the common range for risk workshops (Ackermann et al., 2014).
13. For example, in the ideal speech situation, the non-expert receiver will weigh an expert opinion with 100%. With limited information transfer, a non-expert will weigh an expert opinion with 90% and the prior belief with 10% (e.g. a prior belief of 1% in the high state of an information node will turn into a 90.1% = 90% × 100% + 10% × 1% belief after talking to an expert who assigns 100% to the likelihood of the high state).
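The weighted updating in this note reduces to a convex combination of the sender's stated belief and the receiver's prior. A minimal sketch (the function name is ours; the 90%/10% weights are those from the note):

```python
def updated_belief(prior, sender_belief, sender_weight):
    """Receiver's posterior belief as a weighted average of the sender's
    stated belief and the receiver's own prior belief."""
    return sender_weight * sender_belief + (1.0 - sender_weight) * prior

# The note's example: a prior of 1% in the "high" state, updated after
# hearing an expert who assigns 100%, with an expert weight of 90%:
posterior = updated_belief(prior=0.01, sender_belief=1.0, sender_weight=0.9)
# 0.9 * 1.0 + 0.1 * 0.01 = 0.901, i.e. 90.1%
```

In the ideal speech situation the sender weight is 1.0, so the receiver adopts the expert's belief exactly; any weight below 1.0 models the limit to information transfer.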
14. It is important to note that the Bayesian network is calibrated in a way that always results in a "low risk" or a "high risk" assessment of the overall risk. This simplifies the interpretation of the simulation results. In our Bayesian network, nodes aggregate the input from three other nodes. Because at least some input nodes assign high likelihoods to the "low" or "high" states, as is inevitably the case, the likelihoods assigned to the "medium" state decrease with each level of aggregation. As a result, the participants are presented, de facto, with a binary assessment task.
15. Owing to the slight imprecision inherent in the computational framework, 100% correctness is never achieved.
16. For the sake of simplification, a single condition for the decision and termination approach must be chosen for simulation experiments 3 and 4, instead of running simulation experiments for all theoretically possible conditions. We chose the condition that leads to the highest proportion of correct assessments over all risks in simulation experiment 2. Untabulated robustness analyses show that changing this condition (e.g. the leader follows the consensus after five stable rounds) does not qualitatively change the inferences from our findings.
17. In this simulation experiment, the unequal distribution of information among participants is operationalized so that the best-informed participant, on average, knows twice as much as the second-best-informed participant, who in turn knows twice as much as the next best-informed participant, etc. (i.e. we work with a factor of two). In untabulated robustness tests, we run the same simulation experiment with factors of 1.5 and 2.5. The robustness tests show the same overall direction of effects for all three factors. The only exception is a factor of 2.5, where information is initially highly concentrated within a small group of participants. This increases the participants' initial ignorance of their knowledge ("I don't know what I don't know"), resulting in a slightly lower recognition of high risks compared to a setting with a factor of 2.0.
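The factor-based allocation in this note can be sketched as a geometric series of information shares. This is our own reading of the note, under the assumption that participants are ranked from best- to least-informed and each holds `factor` times the share of the next one; the function name is hypothetical.

```python
def information_shares(n_participants, factor):
    """Share of the total information held by each participant, ordered
    from best- to least-informed. With a given factor f, each participant
    knows f times as much as the next best-informed participant."""
    raw = [factor ** (n_participants - 1 - i) for i in range(n_participants)]
    total = sum(raw)
    return [r / total for r in raw]

# With a factor of 2 and three participants, the shares are 4/7, 2/7, 1/7;
# a larger factor (e.g. 2.5) concentrates information in fewer participants.
```

With nine participants, as modeled in the paper, a factor of 2.5 yields a far steeper concentration than a factor of 2.0, which is consistent with the note's observation that information then becomes highly concentrated within a small group.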