Crowd modelling: aggregating non-expert views as a method for theorizing

Purpose – This paper introduces a crowd-based method for theorizing. The purpose is not to produce a scientific theory; rather, it is to produce a model that may challenge current scientific theories or lead research into new phenomena.
Design/methodology/approach – This paper describes a case study of theorizing with a crowd-based method. The first section reviews what is known about crowdsourcing, crowd science and the aggregation of non-expert views. The second section details the case study. The third section analyses the aggregation. Finally, the fourth section presents the conclusions, limitations and future research.
Findings – This paper assesses to what extent the crowd-based method produces results similar to theories tested and published by experts.
Research limitations/implications – From a theoretical perspective, this study provides evidence supporting the research agenda associated with crowd science. The main limitation of this study is that the crowd-based research models and the expert research models are compared only in terms of their graphs. Moreover, some academics may argue that theory building rests on an academic heritage.
Practical implications – This paper exemplifies how to obtain an expert-level research model by aggregating the views of non-experts.
Social implications – This study is particularly important for institutions with limited access to costly databases, labs and researchers.
Originality/value – Previous research suggested that a collective of individuals may help to conduct all the stages of a research endeavour. Nevertheless, a formal method for theorizing based on the aggregation of non-expert views does not exist. This paper provides the method and evidence of its practical implications.


Introduction
Social phenomena are more complicated than they were in the 1960s, when the foundations of current scientific research were laid. Feldman and Orlikowski (2011) discuss the need for new ways of theorizing to understand contemporary organizations, which are now more complex, dynamic and distributed than before. A systemic view of practice focused on a broader range of stakeholders (Brodie et al., 2017) and path-breaking research (Möller, 2017) are required. Since at least the 1960s, researchers have paid particular attention to what a theory must or must not look like (Bacharach, 1989; Dubin, 1976; Feldman, 2004). Nevertheless, the distinction between theory as a product and theorizing as a process is less discussed. Theorizing is the process conducted to produce a theory (Folger and Turillo, 1999).
For Weick (1989), theorizing is a process of discovery, grounded in an artificial selection guided by the intentionality of the theorist, whereas a theory is a justified product rooted in a natural selection guided by validation and empirical evidence. This distinction is important because products improve when the process that produces them improves, and a process can be standardized, taught and learned (Swedberg, 2016). Shepherd and Suddaby (2017) and Jaccard and Jacoby (2010) illustrate some practices and methods for theorizing. Unfortunately, all of them place an expert researcher at the centre of the theorizing process. This study contributes a method for theorizing led by the aggregation of non-expert views. The purpose is not to produce a scientific theory; rather, it is to produce a model that may challenge current scientific theories or lead research into new phenomena.
The shift from a discourse about the theory-praxis gap to theoretically path-breaking research leads to more exciting theories (Möller, 2017). Nevertheless, this shift is not easy to achieve. Being concerned with justification at the time of discovery can be counterproductive to theory generation (Weick, 1989). Academic institutions advance theory by relying on experts, but experts tend to perform familiar tasks (Hoffman, 1998). Thus, experts may omit variables that matter and include variables that do not (Page, 2008). Such omissions and inclusions may lead to the problem of "homogeneous theorizing": the unconscious selection of constructs, relations, explanations, theoretical frameworks or research methods that are close to the theorist's own cognition, mainly because fitting the problem to those elements is cognitively more comfortable, not because the elements are the most suitable for the problem.
Collaboration with peers may mitigate homogeneous theorizing; nevertheless, homophily, social influence or institutional incentives may lead to similar results. Homophily operates when the members of a group lock into common perspectives (Page, 2008), for example, a school of thought. Social influence occurs when people considered experts provide comments that change the decisions of others (Bolger and Wright, 2011), for example, during a publication process. Institutional incentives operate when an academic considers only a few sources of information, for example, when only a few journals are deemed worth studying. Consequently, integrating non-experts into the research process is an alternative. The Internet facilitates this integration (Linstone and Turoff, 2011), especially when it follows the principles of crowdsourced research (Love and Hirschheim, 2017). The remainder of the introduction is divided into two sections. Firstly, the literature on crowdsourcing and crowd science is discussed. Secondly, the literature on the aggregation of mental models is reviewed.

Crowdsourcing and crowd science
In his book "The Wisdom of Crowds", James Surowiecki (2004) points out that a crowd of random people may, in some circumstances, outperform the best individuals. He explains that the crowd gives rise to a kind of collective intelligence that may achieve better results than the smartest individuals. Along these lines, in 2006 Jeff Howe coined the term "crowdsourcing" to highlight a phenomenon: a distributed network of external collaborators may help business enterprises to solve their problems. Since then, researchers have sought a theory to explain the collaborative platforms that enable successful crowdsourcing endeavours. After more than ten years of research, the fundamental pieces of a theory of wise crowds already exist and are described as follows:
Disciplines: After analysing the evolution of crowdsourcing from 2006 to 2015, Ghezzi et al. (2018) conclude that the main disciplines studying this topic are information systems (e.g. technologies, platforms, mechanisms and algorithms), marketing (e.g. advertising and promotion), strategy (e.g. business models) and organizational studies (e.g. behaviours, motivations and performance).
Theories: Ikediego et al. (2018) conduct a general review and state that the main theories trying to explain crowdsourcing are strategic theory (e.g. knowledge-based theory and resource dependency theory), economic theory (e.g. transaction cost theory) and relational theory (e.g. institutional theory, social capital theory and social exchange theory).
Who: Ikediego et al. (2018) argue that both organizations and individuals may conduct crowdsourcing. Yu et al. (2018) complement this idea by indicating that it is better to add diversity to the crowd than to add expertise.
What: For Schenk and Guittard (2009), two types of task may be crowdsourced: creative and non-creative tasks. Non-creative tasks, in turn, fall into two categories: routine tasks and complex tasks. Similarly, Ikediego et al. (2018) classify tasks into micro and macro projects. According to them, micro projects are repetitive, do not require domain-specific knowledge, are completed in a few days and do not require special organization or incentives. Macro projects are the opposite.
Why: Ikediego et al. (2018) indicate that companies may be motivated by reducing the cost of the innovation process, reducing time to market, increasing the quality of the results or increasing the diversity of both inputs and outputs. Schenk and Guittard (2009), in turn, note that individuals are motivated mainly by ego gratification, reputation or economic incentives.
When: Boudreau and Lakhani (2013) identified four scenarios in which a crowdsourcing scheme is recommended: firstly, when it is unclear what a good solution will look like; secondly, when contributions will be accumulated and recombined into a valuable whole; thirdly, when the bidder needs solutions to multiple problems; and fourthly, when offer and demand need to be matched.
How: Brabham (2008) states and Page (2008) explains that the "wisdom of crowds" is not an average of the solutions provided; the wisdom comes from the aggregation of their diversity.
Quality: For Schenk and Guittard (2009), the variety (or diversity) of the options provided by the crowd is part of the quality of a crowdsourcing endeavour, along with originality and fit with expectations.
Research on crowdsourcing and collective intelligence has also produced illustrations of crowds outperforming industry experts (Poetz and Schreier, 2012), some of them in academia. These exercises, together with the citizen science movement, which seeks to include non-experts in the research process, led to a new strand of research: crowd science. According to Scheliga et al. (2018), there are four strands of research associated with crowd science: firstly, descriptions of individual projects; secondly, potential fields of application; thirdly, the question of whether the borders of scientific knowledge are becoming blurred; and finally, the argument that crowd science will democratize academia. Nevertheless, practices to integrate a crowd of non-experts into existing scientific practices are lacking. According to Franzoni and Sauermann (2014), one way to include scientific rigour in crowd science projects is to embed it in the software or platform tools supporting crowd science. Still, there are no mechanisms that ensure group diversity, independence of participants and decentralization (Yu et al., 2018). This paper proposes a mechanism with such characteristics for the theorizing process. The mechanism is based on the aggregation of non-expert views. The reasoning behind this aggregation, and the expectation that it is a useful method for theorizing, are set out in the following section.

Aggregation of non-expert views
There is no widely accepted definition of "expert". Relational theories argue that someone is an expert because others say so, whereas realist/substantive theories hold that performance is the main reason to consider someone an expert (Collins and Evans, 2007). Following the latter, this study defines an expert as someone who provides a richer set of alternatives and interpretations than the average person during the theorizing process; indeed, experts present a broader set of options and interpretations than novices (Foley and Hart, 1992). Still, the aggregation of diverse non-expert views may achieve better solutions than the expert, because the aggregation is not restricted by the cognition of a single individual. In other words, adding experts trained in similar methods, theories or outlet preferences improves the theorizing process less than adding diversity.
The lattice theory of mental models for complex systems (Moray, 1988, 1990, 1996, 1998) explains why a non-expert view can serve as a source of evidence during the theorizing process. This theory postulates that human beings construct models of causal relationships to explain how a particular system works. Folger and Turillo (1999) recognize that mental models are built through perception, imagination or discourse, and Hoffman (1998) indicates that the associations in those models represent cause-effect relationships among objects or events. Page (2008) states that mental models are predictive models that human beings use to make decisions. Therefore, the reasoning in this paper is as follows. Firstly, a non-expert view is:
(1) A personal theory used by a non-expert to explain a phenomenon.
(2) A mental model elicited from a person who is not an expert but has perceived the phenomenon.
Secondly, a person builds mental models from real-world evidence, losing some information in the process. Thirdly, aggregating multiple mental models may reduce this loss. Fourthly, the variability across views conveys the relevance of the elements in those models. The aggregation of non-expert views is therefore a plausible method for theorizing. Still, this method must avoid problems such as interaction among views and homogeneity of participants. Interaction fosters fear of disagreement, wasted time in meetings, scheduling difficulties, social loafing, group polarization and dominant participants (Nadkarni and Nah, 2003); therefore, Rowe and Wright (2011) recommend removing any indicator of majority or minority opinions and of confidence levels during information exchange. Homogeneity keeps events and objects within the boundaries of the project owners (Lukyanenko et al., 2017). Consequently, to organize the methods that may aggregate mental models, this study proposes a 2 × 2 matrix (Table 1) based on two criteria: heterogeneity of members and information exchange.
The first quadrant includes methods that limit neither group interaction nor homogeneity of inputs (Ossadnik et al., 2013). The second quadrant includes methods that avoid group interaction but retain homogeneous inputs (Kwahk et al., 2007). The third quadrant includes methods that pursue heterogeneous inputs but do not avoid group interaction (Cunha et al., 2016). The fourth quadrant includes methods that explicitly avoid group interaction and foster heterogeneous inputs (Nadkarni and Nah, 2003). This study differs from these methods because (1) it proposes a method that receives and aggregates the inputs of non-experts, including the leading researcher; (2) it can handle anything from a few modellers to huge crowds; and (3) it promotes crowd science.
This study proposes a method for the fourth quadrant. The case study introduced in Section 2 reports the results of designing and implementing a method for theorizing with the characteristics explained above. In particular, the case study addresses the following question: to what extent can the aggregation of non-expert views lead to an expert-level research model?

Case study
Since 1995, the Dolphin programme has connected undergraduates with professors across the country and overseas. The selected undergraduates have the best grade point averages (GPAs) in their departments. The programme runs yearly. Each undergraduate collaborates with a professor on research for around eight weeks. The aim is to foster the undergraduates' research spirit. In 2018, Colombia and Costa Rica integrated their students into the programme. A total of 6,309 undergraduates participated in Mexico alone, and around 1,900 professors were involved worldwide. The professors served as tutors without pay. For many undergraduates, it is the first time they leave home and take part in a formal research project. The Dolphin programme organizes a conference at the end of the summer, at which the undergraduates discuss their results.
2.1 Pre-study
2.1.1 Participants. Ten undergraduates (henceforth "modellers") were selected out of 138 to participate in the pre-study. The invited modellers covered the five areas of knowledge defined by the coordination of the Dolphin programme: physics, mathematics and science; biology, chemistry and agricultural science; medicine and health; humanities and behavioural science; and social, law and economic science. Nevertheless, only nine modellers showed up. The modellers were selected based on their availability and were between 20 and 22 years old. The leading researcher called them one by one. During the call, the leading researcher explained the relevance of the project, indicated the show-up reward and listed the free slots in the schedule. Each modeller selected the most convenient time, and all were first-time applicants. The show-up reward for each modeller was two movie tickets. The main researcher invited five assistants so that the researcher would not intervene during data collection. The assistants helped to prepare the materials and were trained one day before the pre-study; the training consisted of reading the protocol and practising with their peers. They were final-term psychology students from a university with no relation to the study: three women and two men who had never participated in the Dolphin programme. The assistants were eager to participate because the activity helped to fulfil part of their graduation requirements. They assisted for 2 h every day. The principal researcher assigned one man and one woman to lab 1 and one man and one woman to lab 2. The remaining woman helped to enter the data into the system and prepare the materials.
2.1.2 Sampling. The next step was to sample modellers and concepts. Sampling modellers must ensure cognitive differences among them. This type of sampling is theoretical rather than statistical (Glaser and Strauss, 1967). Page (2008) highlights that what matters is the cognitive diversity of crowd members rather than their number. Therefore, the question is not whether there are enough modellers, but whether there is enough diversity. Onwuegbuzie and Collins (2007) suggest collecting information from three to six focus groups of six to nine participants each. Therefore, a pool of 18 (3 × 6) to 54 (6 × 9) diverse modellers should be enough.
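The arithmetic behind the 18 to 54 range can be sketched as follows. This is a minimal illustration rather than part of the original method: the function names and the roster are hypothetical, and "every area is represented" is only a crude proxy for the cognitive diversity that Page (2008) emphasizes.

```python
# Hypothetical helpers illustrating the pool-size guideline
# (3-6 focus groups of 6-9 participants, Onwuegbuzie and Collins, 2007).
def pool_bounds(min_groups=3, max_groups=6, min_size=6, max_size=9):
    """Return (smallest, largest) pool size implied by the group guideline."""
    return min_groups * min_size, max_groups * max_size


def covers_all_areas(modellers, areas):
    """True if every area of knowledge is represented by at least one modeller.

    `modellers` maps a modeller id to an area label; area coverage is a
    rough stand-in for cognitive diversity, not a measure of it.
    """
    return set(areas) <= set(modellers.values())


AREAS = [
    "physics, mathematics and science",
    "biology, chemistry and agricultural science",
    "medicine and health",
    "humanities and behavioural science",
    "social, law and economic science",
]

low, high = pool_bounds()
print(low, high)  # 18 54

# Hypothetical roster: four modellers per area, 20 in total.
roster = {f"m{i}": AREAS[i % len(AREAS)] for i in range(20)}
print(low <= len(roster) <= high, covers_all_areas(roster, AREAS))  # True True
```

Note that the main study's 20 modellers fall inside this range, but the check says nothing about whether the views they provide are actually diverse.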
Concept sampling follows the same line of thinking. Among other reasons relevant for theoretical purposes, the researcher may justify a pool of concepts as follows: a single theory does not fit the phenomenon, but concepts from multiple theories may help to theorize about it; concepts from competing theories can complement each other; current theories require alternative causal reasoning; or the concepts challenge a current theory.
The set of concepts must be grounded in previous knowledge and must contain between five and nine concepts, ideally seven, which is the average number of concepts a human being can cognitively work with (Miller, 1956).
2.1.3 Materials. Each modeller worked with four theories: "A", "B", "C" and "D". However, the order differed according to the data collection design. Consequently, to provide the appropriate material in each session, the material for each theory was organized in a separate yellow folder. Each yellow folder had a tag identifying the theory (i.e. "A", "B", "C" or "D") and a legend stating "do not open it. Wait until the researcher says so". Each yellow folder contained three components: firstly, a grey folder tagged "Training"; secondly, a grey folder tagged "Modelling" plus the name of the theory (i.e. "A", "B", "C" or "D"); and finally, a half-page sheet showing the word "activity" plus the name of the activity (i.e. "A", "B", "C" or "D"), the context of the activity and its objective.
The training material comprises a set of paper-based components: two independent concepts; the effect; arrows to connect the concepts to each other and to the effect; "+" and "-" symbols to indicate the type of relation; "relevant" and "irrelevant" tags to indicate whether the concepts are necessary for the theory; and "point" tags to rank the concepts by their relevance to the theory.
Regardless of the number of theories to work with, the modeller uses the training material only once. A set of four videos was recorded and played during the training session to guide the modeller. The working material is inside the grey folder tagged "Modelling" and differs according to the theory associated with the folder (i.e. "A", "B", "C" or "D").
The data collection took place at a university in Mexico that participates in and organizes the Dolphin programme. Two labs were required for data collection, both located in a closed section of the university library. Each lab used its own set of materials (i.e. yellow folders "A", "B", "C" and "D"). The lab equipment consisted of a personal desk, a chair, two tablets, a headphone and a microphone. The microphone and the first tablet recorded the modeller's explanations; the second tablet played the instruction videos. The labs were far enough apart that the modellers could not hear each other's explanations. All the material was black and white to avoid distractions (Figure 1), and the assistants used a printed protocol to conduct the sessions.
2.1.4 Procedures. Data collection followed a within-subjects experimental design to answer the research question. A within-subjects design is the best option because all the modellers participate in the four treatments (Creswell, 2014); consequently, it is possible to measure a modeller's results after each treatment. Figure 2 depicts the within-subjects repeated measures design. The objective of the design is to avoid the risk that the order of the treatments affects the results. Consequently, modellers in Group A modelled the theories in the order A-B-C-D, modellers in Group B in the order B-C-D-A, modellers in Group C in the order C-D-B-A and, finally, modellers in Group D in the order D-A-B-C. All the results were then evaluated as a single group.
The purpose of the research is not to evaluate the effects of order on any variable. Nevertheless, if all the modellers received the treatments in the same order, order could be a confounding variable. Consequently, the order (i.e. A, B, C and D) was varied to reduce the possibility of bias, and the design considers four groups so that each one starts with a different theory. An exhaustive evaluation would include 24 groups instead of four, that is, all permutations of the four theories (4 × 3 × 2 × 1). Still, as explained above, an exhaustive evaluation of order is not the purpose of this study, because in a typical setting a crowd of modellers will work with only one theory.
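The counterbalancing property the design relies on, namely that each group starts with a different theory, and the size of an exhaustive design can be checked mechanically. The sketch below uses the four orders exactly as listed in Section 2.1.4; the helper names are hypothetical.

```python
from itertools import permutations

# Treatment orders as listed for Groups A-D in Section 2.1.4.
ORDERS = {
    "A": "ABCD",
    "B": "BCDA",
    "C": "CDBA",
    "D": "DABC",
}


def starts_unique(orders):
    """True if every group starts with a different theory."""
    firsts = [order[0] for order in orders.values()]
    return len(set(firsts)) == len(firsts)


def position_counts(orders, position):
    """How many groups meet each theory at a given (0-based) position."""
    counts = {}
    for order in orders.values():
        theory = order[position]
        counts[theory] = counts.get(theory, 0) + 1
    return counts


# An exhaustive design would use every permutation of the four theories.
all_orders = ["".join(p) for p in permutations("ABCD")]
print(len(all_orders))  # 24
print(starts_unique(ORDERS))  # True
print(position_counts(ORDERS, 3))  # {'D': 1, 'A': 2, 'C': 1}
```

Only the first position is balanced by these orders; the last position, for instance, is occupied by Theory A in two groups. This is consistent with the stated aim of varying the starting theory rather than fully balancing order effects.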
The theories selected as conditions represent two factors (Table 2). The first factor is the number of concepts (including the effect), which may be fewer or greater than seven. The second factor is the distance from the farthest independent concept to the effect, which is either ≤ 3 or > 3; the distance is measured along the longest path. One more criterion also influenced the selection: all the theories had to come from information systems contexts. The reasoning behind this criterion is that none of the modellers had studied information systems, computer science or a related field, yet all of them had used information systems. Consequently, the modellers are not experts, but they have lived the experience or, at least, have the cognitive tools required to suggest a plausible research model in the area. Lin and Bhattacherjee (2010) propose Theory A, which has six concepts and a distance of three, along the path technical quality -> perceived enjoyment -> attitude. Laumer et al. (2016) propose Theory B, which has eight concepts and a distance of two, along the path routine seeking -> perceived usefulness. Davis et al. (1989) propose Theory C, which has six concepts and a distance of five, along the path external variables -> perceived ease of use -> perceived usefulness -> attitude towards using -> behavioural intention to use. Finally, Chen et al. (2012) propose Theory D, which has eight concepts and a distance of four, along the path system quality -> confirmation -> perceived usefulness -> satisfaction.
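The distance factor can be computed as a longest-path length in a directed acyclic graph. In each path listed above, the number of concepts equals the stated distance, which is consistent with counting the arrows from the farthest concept to the effect; the sketch below assumes that reading. The TAM edge list is an assumed reconstruction of Davis et al. (1989), and "actual system use" as the effect is likewise an assumption; the function names are hypothetical.

```python
from functools import lru_cache

# ASSUMPTION: edges reconstructed from the technology acceptance model
# (Davis et al., 1989); "actual system use" is taken to be the effect.
TAM_EDGES = [
    ("external variables", "perceived usefulness"),
    ("external variables", "perceived ease of use"),
    ("perceived ease of use", "perceived usefulness"),
    ("perceived ease of use", "attitude towards using"),
    ("perceived usefulness", "attitude towards using"),
    ("perceived usefulness", "behavioural intention to use"),
    ("attitude towards using", "behavioural intention to use"),
    ("behavioural intention to use", "actual system use"),
]


def causal_distance(edges, effect):
    """Arrows on the longest path from any concept to the effect (graph assumed acyclic)."""
    successors = {}
    for cause, target in edges:
        successors.setdefault(cause, []).append(target)

    @lru_cache(maxsize=None)
    def longest_from(node):
        if node == effect:
            return 0
        # -inf marks concepts from which the effect cannot be reached.
        best = float("-inf")
        for nxt in successors.get(node, []):
            best = max(best, 1 + longest_from(nxt))
        return best

    return max(longest_from(node) for node in successors)


def factor_cell(n_concepts, distance):
    """Cell of the 2 x 2 condition design; exactly seven concepts is not covered."""
    if n_concepts == 7:
        raise ValueError("the design only distinguishes fewer or more than seven concepts")
    size = "fewer than 7 concepts" if n_concepts < 7 else "more than 7 concepts"
    reach = "distance <=3" if distance <= 3 else "distance >3"
    return size, reach


d = causal_distance(TAM_EDGES, "actual system use")
print(d, factor_cell(6, d))  # 5 ('fewer than 7 concepts', 'distance >3')
```

With the concept counts and distances reported for Theories A (6, 3), B (8, 2), C (6, 5) and D (8, 4), `factor_cell` places each theory in a different cell of the 2 × 2 design.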
During the data collection stage, the procedure was as follows. Firstly, the modeller arrived. Then, the researcher verified the modeller's identity, checked the assigned lab and provided a code. After that, the modeller went to the indicated lab and gave the code to the assistant (Table 3). The assistant checked the code against the protocol and, based on it, handed over the first yellow folder. Afterwards, the assistant played the training videos and the modeller used the training material. The assistant then asked whether the modeller had any doubts. If there were none, the modeller continued with the working materials; otherwise, the modeller watched the training videos again. The training videos helped to standardize the instructions. Finally, the assistant photographed the research model and provided another yellow folder, according to the order of the treatments.

IJCS
The modelling process was as follows. Firstly, the modeller identified the independent concepts. Secondly, the modeller identified the effect. To differentiate them, the main researcher printed the independent concepts in lower-case letters and the effect in upper-case letters. Thirdly, the modeller used the arrows to indicate which concepts cause others; the effect is never a cause of another concept. Fourthly, for each relation, the modeller indicated whether it is positive ("+") or negative ("-"). Fifthly, the modeller added a "relevant" or "irrelevant" tag to each independent concept. Sixthly, the modeller added a "point" tag to each independent concept to rank them. Seventhly, the modeller argued the logic of the model and how the relationships it depicts explain the effect. The design of the pre-study avoided cognitive fatigue: each modeller worked on two activities per day, for 1 h, on two consecutive days, so that the four theories were modelled in two sessions. The pre-study provided evidence of three things. Firstly, the modellers learnt the method by following 2.53 min of video training. Secondly, the modellers required around 30 min to complete two tasks, so all four tasks can be finished in an hour and a half without cognitive fatigue, which avoids the risk of losing data through participant attrition (mortality). Thirdly, the reward can be reduced from two movie tickets (one per session) to one without affecting the results, which halves the cost of data collection.
2.2 Study
2.2.1 Participants. The researcher and assistants from the pre-study participated in the study, but the modellers from the pre-study did not.
2.2.2 Sampling. Twenty new modellers participated in the study, representing the five fields of knowledge described above. The objective of this stage was to collect data from four modellers in each field. The characteristics of the participants were the same as in the pre-study; nevertheless, the study considered neither the pre-study participants nor the data they provided. Including the five areas provided diversity in mental models and cognitive tools. Four modellers from each field were needed so that each modeller in a field could be assigned a different condition. Table 3 shows the organization of the modellers according to the data collection design. The study required each modeller for only an hour and a half on a single day; consequently, the incentive was reduced to one movie ticket. The sample of concepts was the same as in the pre-study.
2.2.3 Materials. The materials were the same as those described for the pre-study. Plate 1 illustrates the lab, the materials organized by treatment and the materials after a modeller has used them to build a model.

Procedures.
The leading researcher contacted all the modellers to arrange the meetings, and the modellers indicated which session suited them best. The process then continued as described in the pre-study section, except that the modellers came on one day instead of two and the assistant reviewed each model. Each model had to comply with the following specifications. Firstly, every independent concept is connected to at least one other concept; otherwise, it must carry an "irrelevant" tag. Secondly, the effect has at least one independent concept as a cause. Thirdly, every independent concept has a relevance tag. Fourthly, every relationship has exactly one direction symbol ("+" or "-"). Fifthly, every independent concept has exactly one "point" tag. Finally, the assistant took pictures of all the models.
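If the photographed models are transcribed into data, the assistant's checklist can be automated. The sketch below assumes a hypothetical transcription format (`NonExpertModel`); it checks the five specifications above, the rule that the effect is never a cause, and, in `has_mediator`, the at-least-one-mediator condition that the study applied when deciding which models to keep.

```python
from dataclasses import dataclass


@dataclass
class NonExpertModel:
    """One modeller's paper model transcribed into data (hypothetical format)."""
    concepts: set    # independent concepts (printed in lower case)
    effect: str      # the effect (printed in upper case)
    edges: dict      # (cause, target) -> sign, "+" or "-"
    relevance: dict  # concept -> "relevant" or "irrelevant"
    points: dict     # concept -> list of point tags attached to it


def check_model(m):
    """Return the violated specifications; an empty list means the model passes."""
    problems = []
    connected = {concept for edge in m.edges for concept in edge}
    for c in sorted(m.concepts):
        if c not in connected and m.relevance.get(c) != "irrelevant":
            problems.append(f"{c}: unconnected but not tagged irrelevant")
        if m.relevance.get(c) not in ("relevant", "irrelevant"):
            problems.append(f"{c}: missing relevance tag")
        if len(m.points.get(c, [])) != 1:
            problems.append(f"{c}: needs exactly one point tag")
    if not any(t == m.effect and c in m.concepts for c, t in m.edges):
        problems.append("effect has no independent concept as a cause")
    for (cause, target), sign in m.edges.items():
        if cause == m.effect:
            problems.append(f"{cause} -> {target}: the effect cannot be a cause")
        if sign not in ("+", "-"):
            problems.append(f"{cause} -> {target}: needs one '+' or '-' symbol")
    return problems


def has_mediator(m):
    """True if an independent concept is both caused by and a cause of other concepts."""
    causes = {c for c, _ in m.edges}
    caused = {t for _, t in m.edges}
    return bool(m.concepts & causes & caused)


model = NonExpertModel(
    concepts={"x", "z"},
    effect="Y",
    edges={("x", "z"): "+", ("z", "Y"): "+"},
    relevance={"x": "relevant", "z": "relevant"},
    points={"x": [1], "z": [2]},
)
print(check_model(model), has_mediator(model))  # [] True
```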
Plate 1. Infrastructure for data collection
During the pre-study, some modellers indicated that all the independent concepts were direct causes of the effect; their models did not include any mediator. This situation also appeared at the beginning of the study. Consequently, when a modeller did not theorize any mediator, the assistant called the leading researcher, who asked the modeller whether he/she understood the task. If so, the leading researcher disregarded the four models provided by that modeller. Therefore, the study only considers models with at least one mediator. The modellers explicitly told the leading researcher that they understood the task. Furthermore, they looked motivated, asking when the results of the study would be available. Finally, the aggregation of the data started. Figure 3 depicts the algorithm for the aggregation of non-expert views (henceforth, "crowd modelling"). The following sections report the implementation of the algorithm in four phases: concept elimination, causal agreement, causal distance and model building. Approval voting is the basis for the first two phases. In approval voting:

(1) The decision maker votes for as many options as desired.
(2) The winner is the option with the most votes.
(3) The researcher collects more information from the decision maker than with plurality voting.
(4) Its performance is superior to that of existing procedures with respect to fairness (Fishburn and Little, 1988).
Furthermore, approval voting is simple to use, reflects the decision maker's preferences and frequently achieves results similar to those of competing methods such as the Borda winner and the Condorcet winner (Regenwetter and Grofman, 1998).

Execution
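A minimal sketch of approval voting as summarized in the list above: each ballot is the set of options a voter approves, every approved option receives one vote, and the option(s) with the most votes win. Using causal relationships as the options is only an illustration; the ballots and the function name are hypothetical.

```python
from collections import Counter


def approval_winner(ballots):
    """Approval voting over ballots given as sets of approved options.

    Returns (winners, tally); a tie yields more than one winner.
    """
    tally = Counter()
    for ballot in ballots:
        tally.update(set(ballot))  # one vote per approved option, duplicates ignored
    if not tally:
        return [], tally
    top = max(tally.values())
    winners = sorted(option for option, votes in tally.items() if votes == top)
    return winners, tally


# Hypothetical ballots: each view "approves" the causal relationships it draws.
ballots = [{"x -> Y", "z -> Y"}, {"x -> Y"}, {"x -> z", "z -> Y"}]
winners, tally = approval_winner(ballots)
print(winners)  # ['x -> Y', 'z -> Y']
```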

Concept elimination.
Each non-expert view assigned a certain number of points to each concept. The algorithm calculated the total number of points for each independent concept and dropped those below the mean. The cut-off is given by k(n/2), where n is the number of non-expert views and k is the number of independent concepts. In this study, there were 20 non-expert views for each theory, so n = 20 for all the theories. Theories A and C had five independent concepts (k = 5), and Theories B and D had seven (k = 7). Hence, concepts with totals ≤ 50 were dropped for Theories A and C, and concepts with totals ≤ 70 were dropped for Theories B and D.
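The concept-elimination rule can be sketched directly from the cut-off k(n/2). The point totals below are invented for illustration, since the scale of the "point" tags is not reported here; following the text, a concept is dropped when its total is at or below the cut-off. The optional `k` argument also covers the variant, described in the next paragraph, in which the relevance tags are evaluated with k = 1.

```python
def cutoff(k, n):
    """Cut-off k * (n / 2) from the concept-elimination step."""
    return k * n / 2


def eliminate(totals, n, k=None):
    """Split concepts into kept (total above the cut-off) and dropped (at or below it).

    `totals` maps each independent concept to its summed score across the n
    non-expert views; k defaults to the number of independent concepts.
    """
    if k is None:
        k = len(totals)
    limit = cutoff(k, n)
    kept = {c: t for c, t in totals.items() if t > limit}
    dropped = sorted(c for c in totals if c not in kept)
    return kept, dropped


# Hypothetical totals for a five-concept theory with n = 20 views (cut-off 50).
totals = {"a": 72, "b": 50, "c": 64, "d": 58, "e": 56}
kept, dropped = eliminate(totals, n=20)
print(cutoff(5, 20), cutoff(7, 20), dropped)  # 50.0 70.0 ['b']
```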
The leading researcher dropped the following constructs: social image in Theory A, external variables in Theory C and hedonic value in Theory D. In Theory B, all the independent concepts remained. This study also evaluated the "relevant" and "irrelevant" tags with the same cut-off formula; however, k always had the value of 1, given that the possible values were binary (1 = relevant, 0 = irrelevant). Under this criterion, all the concepts of all the theories were above the mean, which suggests that the "relevant" and "irrelevant" tags are not required. The aggregated model numbers the remaining concepts consecutively according to their aggregated ranking, starting from one (the least important) and ending with the most important (Figure 4).
3.1.2 Causal agreement. The leading researcher defined a language for the aggregated model to convey the degree of agreement. This language defines four causal-relationship connectors, depicted in Figure 5. Singular agreement means that only one non-expert view indicates the causal relationship (m = 1). Shared agreement means that at least two non-expert views suggest the relationship (m > 1). Majority agreement means that more than half of the non-expert views indicate the relationship (m > n/2). Consensus agreement means that all the non-expert views indicate the causal relationship (m = n). In these formulas, n is the total number of non-expert views and m is the number of non-expert views sharing the same interpretation.
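The four agreement levels overlap as written (a consensus relationship also satisfies the majority and shared conditions), so the sketch below returns the strongest level that applies. It also numbers the remaining concepts from 1 (least important) upwards by total points; breaking ties alphabetically is an assumption, as are the function names.

```python
def agreement_level(m, n):
    """Strongest agreement label for a relationship suggested by m of n views."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    if m == n:
        return "consensus"
    if m > n / 2:
        return "majority"
    if m > 1:
        return "shared"
    return "singular"


def number_concepts(kept_totals):
    """Number the remaining concepts 1..k from least to most important by total points.

    Ties are broken alphabetically here, which is an assumption.
    """
    ordered = sorted(kept_totals, key=lambda c: (kept_totals[c], c))
    return {c: i for i, c in enumerate(ordered, start=1)}


n = 20
for m in (1, 2, 10, 11, 20):
    # With n = 20, m = 10 is only "shared": majority needs strictly more than n/2.
    print(m, agreement_level(m, n))
print(number_concepts({"a": 72, "c": 64, "d": 58, "e": 56}))  # {'e': 1, 'd': 2, 'c': 3, 'a': 4}
```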
3.1.3 Causal distance. Figure 6 depicts a problem in the aggregation. Non-expert views 1 and 2 agree that the independent concept X is a cause of the effect. However, because the first view adds a mediator, the aggregation will not reflect this agreement, which may lead to a loss of information. The algorithm calculates the causal distance only for the relationships in the aggregated model, but the values come from all the non-expert views. Within a view, the causal distance between two concepts is one divided by the number of causal relationships (links) between them. Figure 7 depicts the formula to calculate the aggregated causal distance between two concepts. In the formula, "j" indicates the number of non-expert views suggesting the causal relationship, and "x" is the causal distance calculated in view "j" for that relationship.
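The causal distance can be sketched as follows. Figure 7 is not reproduced here, so this sketch assumes that the aggregated value is the arithmetic mean of "x" over the "j" views that connect the two concepts, and that a view with several paths contributes its shortest one. All names and the Figure 6-style labels are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set

View = Dict[str, Set[str]]  # one non-expert view: cause -> set of direct effects


def links_between(view: View, cause: str, effect: str) -> Optional[int]:
    """Number of causal links on the shortest directed path cause -> effect (BFS)."""
    queue = deque([(cause, 0)])
    seen = {cause}
    while queue:
        node, depth = queue.popleft()
        for nxt in view.get(node, set()):
            if nxt == effect:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # this view does not connect the two concepts


def causal_distance(view: View, cause: str, effect: str) -> Optional[float]:
    """Per-view causal distance: one divided by the number of links."""
    links = links_between(view, cause, effect)
    return None if links is None else 1 / links


def aggregated_causal_distance(views: List[View], cause: str, effect: str) -> Optional[float]:
    """ASSUMPTION: Figure 7 read as the mean of x over the j views that connect the pair."""
    xs = []
    for view in views:
        x = causal_distance(view, cause, effect)
        if x is not None:
            xs.append(x)
    return sum(xs) / len(xs) if xs else None


# Figure 6 situation with hypothetical labels: view 1 adds a mediator M, view 2 does not.
view1 = {"X": {"M"}, "M": {"Effect"}}
view2 = {"X": {"Effect"}}
print(aggregated_causal_distance([view1, view2], "X", "Effect"))  # 0.75
```

Under these assumptions the mediated view contributes 0.5 and the direct view 1.0, so the agreement between the two views is kept rather than lost.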
3.1.4 Model building. Each model depicts the aggregations by using the language for crowd modelling. The models present the independent concepts above the mean, their relationships with the highest causal agreement, the directions of the relationships, the ranks based on the points and the effect. There are two critical considerations. Firstly, no cyclical relation may exist (i.e. a concept cannot be a cause of its own cause). Each causal path must start from the effect and prioritize the highest causal agreement; in the case of a tie, it prioritizes the causal distance and, if still tied, the ranking. The algorithm follows that causal path and drops any causal relationship that creates a cyclical relation. Secondly, each independent concept must be a cause of at least one other independent concept or of the effect. Figures 8, 9, 10 and 11 represent the aggregated models for theories A, B, C and D, respectively.
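The two model-building rules can be approximated with a greedy procedure. This is a simplified sketch, not the authors' algorithm: instead of walking causal paths outward from the effect, it ranks all candidate relationships globally by (causal agreement, causal distance, rank of the cause), keeps them in that order and drops any relationship that would close a cycle. It then lists the concepts that violate the second rule. The class, function names and example edges are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Set

AGREEMENT_LEVEL = {"singular": 1, "shared": 2, "majority": 3, "consensus": 4}


class Edge(NamedTuple):
    cause: str
    effect: str
    agreement: str   # singular, shared, majority or consensus
    distance: float  # aggregated causal distance
    rank: int        # aggregated rank of the cause (higher = more important)


def reaches(adj: Dict[str, Set[str]], start: str, goal: str) -> bool:
    """True if goal is reachable from start along kept edges (iterative DFS)."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    return False


def build_model(candidates: Iterable[Edge]) -> List[Edge]:
    """Keep edges in priority order, dropping any edge that would close a cycle."""
    ordered = sorted(
        candidates,
        key=lambda e: (AGREEMENT_LEVEL[e.agreement], e.distance, e.rank),
        reverse=True,
    )
    adj: Dict[str, Set[str]] = {}
    kept: List[Edge] = []
    for e in ordered:
        if reaches(adj, e.effect, e.cause):  # effect already leads back to cause
            continue
        adj.setdefault(e.cause, set()).add(e.effect)
        kept.append(e)
    return kept


def concepts_without_effect(concepts: Iterable[str], kept: List[Edge]) -> List[str]:
    """Independent concepts that violate the second rule (they cause nothing)."""
    causes = {e.cause for e in kept}
    return [c for c in concepts if c not in causes]


edges = [
    Edge("A", "B", "majority", 0.7, 3),
    Edge("B", "A", "shared", 1.0, 2),          # would create A -> B -> A
    Edge("B", "Effect", "consensus", 1.0, 2),
]
model = build_model(edges)
print([(e.cause, e.effect) for e in model])         # [('B', 'Effect'), ('A', 'B')]
print(concepts_without_effect(["A", "B"], model))   # []
```

The weaker B -> A relationship is dropped because A already leads to B once the higher-priority relationships are in place.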

Evaluation
It is necessary to evaluate the similitude between the aggregated models and those published by previous research. This evaluation will help to answer the research question. Table 4 reports the results. The percentages of similitude are 80% for theories A and D, 60% for Theory B and 70% for Theory C. For example, Theory A has five independent concepts and one effect. The effect cannot be a cause, and an independent concept cannot connect to itself. Therefore, each of the five independent concepts can point to the four other independent concepts or to the effect, giving 5 × 5 = 25 possible causal relationships to indicate in the aggregated model. The aggregated model correctly omitted 18 causal relationships that the published theory does not contain and correctly suggested 2 that it does contain. Thus, the aggregated model was correct in 20 of the 25 predictions, that is, 80% effectiveness. The same logic applies to the rest of the aggregated models.
A match between the aggregated model and the published theory is a correct prediction about a causal relationship. The prediction may take two values: the relationship exists or it does not exist. In the example, Theory A states that technical quality leads to perceived enjoyment. Technical quality does not have a causal relationship with the other four concepts: no causal relationship is predicted towards interaction quality, social image, attitude or usage intention. Therefore, for technical quality, Theory A depicts only one of five possible causal relationships. The aggregated model for Theory A indicates the same, so there are five matches: the correct prediction that one causal relationship exists and the correct predictions that four causal relationships do not exist. The process is the same for the rest of the concepts.
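The similitude measure described in the evaluation can be sketched as the share of candidate pairs on which the two models agree, counting both correct "exists" and correct "does not exist" predictions. The concept labels and edge sets below are hypothetical and are not the data of Theory A; the sketch only reproduces the counting logic (25 candidate pairs for five independent concepts and one effect).

```python
from itertools import product
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (cause, target)


def candidate_pairs(independents: Iterable[str], effect: str) -> Set[Pair]:
    """Every ordered pair an independent concept could point to.

    Targets are the other independent concepts or the effect; the effect is
    never a cause and no concept connects to itself.
    """
    indep = list(independents)
    return {(c, t) for c, t in product(indep, indep + [effect]) if c != t}


def similitude(published: Set[Pair], aggregated: Set[Pair],
               independents: Iterable[str], effect: str) -> float:
    """Share of candidate pairs on which both models agree (exists / does not exist)."""
    pairs = candidate_pairs(independents, effect)
    matches = sum((p in published) == (p in aggregated) for p in pairs)
    return matches / len(pairs)


concepts = ["X1", "X2", "X3", "X4", "X5"]  # hypothetical labels
published = {("X1", "X2"), ("X2", "Y"), ("X3", "Y")}
aggregated = {("X1", "X2"), ("X2", "Y"), ("X4", "Y")}
print(len(candidate_pairs(concepts, "Y")))                # 25
print(similitude(published, aggregated, concepts, "Y"))   # 0.92
```

Here the models disagree only on X3 -> Y and X4 -> Y, so 23 of the 25 predictions match.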

Discussion
The results show that theories A and D are closer to the published theories than theories B and C. Because Theory A (six concepts) and Theory D (eight concepts) performed best, while Theory B (eight concepts) and Theory C (six concepts) performed worse, the number of concepts involved, whether six or eight, does not appear to explain the difference. Likewise, the distance from the farthest independent concept to the effect does not appear to matter, at least while it ranges from two to five. The method for theorizing therefore worked for all the theories evaluated, and the aggregation of non-expert views is a plausible means to theorize. The differences between the aggregated models and the published models may be due to cultural differences, the problem of homogeneous theorizing, a lack of incentives to be right or other reasons. Determining why theories A and D were closer to the published theories than theories B and C is a topic for another study.
The purpose of the method is not to provide evidence to validate existing theories. If there is no theory for a particular phenomenon, the purpose is to develop a new theory. If theories already exist, the purpose is to theorize alternative explanations or to challenge their causal relationships. Table 5 guides the theorist accordingly. If the aggregated model suggests a causal relationship that is already published, the theorizing process searches for alternative explanations. If the aggregated model suggests a causal relationship that is not published, the theorizing process tries to challenge the theory. If the aggregated model does not indicate a causal relationship that the published theory does, the theorizing process also tries to challenge the theory. In this process, it is particularly important to consider the causal agreement and the causal distance. For example, in Theory A, the theorist should test the causal relationship between perceived enjoyment and usage intention without the mediation suggested in the published theory, especially because the aggregated model indicates a majority causal agreement and a causal distance of 0.7. The theorist should also test the causal relationship between interaction quality and perceived enjoyment, because the aggregated model depicts this relationship and the published theory does not. The causal relationships between technical quality and perceived enjoyment, and between attitude and usage intention, appear in both models; thus, the theorist should search for alternative explanations for them. Finally, the published theory indicates a causal relationship between perceived enjoyment and attitude, whereas the aggregated model does not, so the theorist should try to challenge it.
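The guidance of Table 5 amounts to a small decision table, sketched below. The emphasis flag reuses the thresholds stated in the conclusion (causal distance greater than 0.5 and majority or consensus agreement) and is applied here only to relationships the crowd suggests but the published theory lacks; that placement, the "no action" case and the function name are assumptions of this sketch.

```python
from typing import Optional, Tuple

STRONG_AGREEMENT = {"majority", "consensus"}


def theorizing_action(in_aggregated: bool, in_published: bool,
                      agreement: Optional[str] = None,
                      distance: Optional[float] = None) -> Tuple[str, bool]:
    """Return the suggested action and whether the case deserves special attention."""
    if in_aggregated and in_published:
        return "search for alternative explanations", False
    if in_aggregated:
        # Crowd suggests an unpublished relationship: challenge the theory,
        # especially with majority/consensus agreement and distance > 0.5.
        strong = agreement in STRONG_AGREEMENT and distance is not None and distance > 0.5
        return "challenge the theory", strong
    if in_published:
        # Published relationship that the crowd does not see.
        return "challenge the theory", False
    return "no action", False  # hypothetical: neither model suggests the relationship


# Theory A cases from the text (agreement and distance are given only for the first).
print(theorizing_action(True, False, "majority", 0.7))  # perceived enjoyment -> usage intention
print(theorizing_action(True, True))                    # technical quality -> perceived enjoyment
print(theorizing_action(False, True))                   # perceived enjoyment -> attitude
```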

Conclusion, limitations and future research
The results of the case study answer the research question and provide evidence that the aggregation of non-expert views may achieve expert-like research models; in the case study, their similitude to the published models ranged from 60% to 80%. These expert-like research models may serve as an input for both positivist and interpretative researchers. The lessons learnt can be organized in terms of crowd collaboration, diversity, contribution and performance, and also in terms of the theorizing purpose, which may be to find different causal relationships or to find alternative explanations.
In terms of crowd collaboration, the non-expert views may come from collectives or individuals; in the case study, 20 modellers worked independently. In terms of crowd diversity, professional background was the criterion used to sample modellers, and the case study included modellers from multiple areas of knowledge. In terms of crowd contribution, the case study exemplified how modellers may participate in theorizing one to four theories at a time. In terms of crowd performance, the pool of concepts may vary from five to nine; in the case study, the modellers worked with theories of six or eight concepts.
The theorist may theorize causal relationships or the reasons behind them. When there is no reference theory, crowd modelling leads to a causal-relationship diagram and a set of reasons to sustain the relationships. When there is a reference theory, crowd modelling leads to a set of causal relationships that may or may not exist in the reference theory. Relationships that do not exist in the reference theory challenge it, especially when their causal distance is greater than 0.5 and their causal agreement is majority or consensus. For relationships that do exist, the theorist may search for alternative explanations. Finally, if the reference theory depicts relationships that the crowd does not see, these also challenge the reference theory. In every case, the context and the participants matter for any conclusions.
The case study provides evidence that the aggregation of non-expert views is a plausible means to theorize expert-like research models. This evidence is especially important for institutions with limited access to experts or databases, and for those who want to avoid the problem of homogeneous theorizing. The main limitation of the method is that, according to its rationale, it only works for the social sciences. Furthermore, the algorithm provides the grounds for a standardized execution while integrating the principles discussed in Section 1.
The results of this study have three implications. Firstly, the study provides a method for theorizing that requires less statistical and background knowledge, as well as fewer resources, than similar approaches that support crowd science, together with the theoretical argumentation to explain how and why the method works. Secondly, it defines the problem of homogeneous theorizing and suggests a method to reduce it. Finally, it suggests elements that complement the theory of wise crowds. Section 1.1 discusses the elements already considered. In addition, the case study provides evidence that both topics discussed in Section 1.2 must be considered in the theory of wise crowds. The first is the aggregation of mental models from a cognitive perspective, especially to explain under which conditions a diversity of mental models performs better at conveying the reasoning behind social facts. The second is the design of algorithms and platforms that integrate the theory by design; this element must formalize the design principles of artefacts focused on theorizing and theory building.