An epidemic model for correlated information diffusion in crowd intelligence networks

Purpose – With the popularity of the internet and the increasing number of netizens, tremendous information flows are generated daily by intelligently interconnected individuals. The diffusion processes of different pieces of information are not independent; they interact with and influence each other. Modeling and analyzing the interaction between correlated information plays an important role in understanding the characteristics of information dissemination and in better controlling information flows. This paper aims to model the correlated information diffusion process over crowd intelligence networks.

Design/methodology/approach – This study extends the classic epidemic susceptible-infectious-recovered (SIR) model and proposes the SIR mixture model to describe the diffusion process of two correlated pieces of information. The whole crowd is divided into different groups with respect to their forwarding state of the correlated information, and the transition rates between different groups capture the properties of each piece of information and the influences between them.

Findings – The stable state of the SIR mixture model is analyzed through the linearization of the model, and the stability condition is obtained. Real data are used to validate the SIR mixture model, and the detailed diffusion process of correlated information can be inferred by analyzing the parameters learned through fitting the real data to the SIR mixture model.

Originality/value – The proposed SIR mixture model can be used to model the diffusion of correlated information and analyze the propagation process.


Introduction
With the recent development of social media, Internet of Things, big data, cloud computing and many other new technologies, we witness new industrial and social management patterns, and the emergence of a crowd cyber eco-system consisting of smart and deeply connected entities such as individuals, enterprises and government agencies (Shen et al., 2017; Nan et al., 2017; Huang et al., 2017). Such smart entities constantly interact with each other, influence each other's decisions and have a significant impact on our society and economy. It is of crucial importance to study how such smart entities interact with each other, to understand their decision-making process, to analyze how they influence each other and the impact of such interactions on the entire crowd intelligence networks. Such investigation provides important guidelines on the design of efficient and effective mechanisms to manage such crowd intelligence networks.
In this paper, using information diffusion in social networks as an example, we study user behavior in such crowd intelligence networks. With deeply connected smart entities, information diffusion plays a critical role in the information sharing and network evolution of such crowd eco-systems, and is sometimes detrimental to our society and economy. One example is the "salt panic" in China after the 2011 Tohoku Tsunami, where the news of nuclear leakage greatly stimulated rumors that iodized salt could help ward off radiation poisoning, which led to long lines and mob scenes at stores and a 10-fold jump in salt prices throughout China (Pierson, 2011). Thus, it is important to study this complex information diffusion process over networks, to analyze its social and economic impact, and to prevent the propagation of such detrimental rumors.
Tremendous efforts have been dedicated to modeling the information diffusion process. The existing works on modeling information diffusion can be classified into two categories: graph-based and non-graph based approaches (Guille et al., 2013), which we discuss in detail below.
The graph-based models assumed that social networks could be described by graphs, where nodes represented users and an edge connecting two nodes represented a certain relationship between the two corresponding users. Two seminal models in this category are the Independent Cascades (IC) model (Goldenberg et al., 2001) and the Linear Threshold (LT) model (Granovetter, 1978). For the IC model, each edge between users could be classified as either a "weak tie", i.e., the common relationship, or a "strong tie", i.e., the closer and stronger relationship. Different probabilities were assigned to different types of edges, and the probability for a user to adopt a certain piece of information was determined by the probabilities of the edges connected to him/her. To take users' collective behavior into consideration, the LT model assumed that a user would accept a certain piece of information when the percentage of his/her neighbors who had adopted the information was above a threshold. Recently, the evolutionary dynamics of natural ecological systems have been introduced to model information diffusion over social networks (Jiang et al., 2014a, 2014b; Cao et al., 2016). The authors modeled the information diffusion process with evolutionary game theory over synthetic and real networks, and the evolutionary stable states were analyzed with respect to different types of information and different network structures.
The graph-based approaches could characterize the diffusion processes in the micro view, that is, how each intelligent individual was influenced by their neighbors and how he/she decided whether to adopt the information. However, these models were often limited by the difficulty of obtaining the complete social network structure (De Choudhury et al., 2010).
The non-graph based approaches did not consider network structure in their analysis, and modeled the information diffusion process in a macro view. These models were mainly based on the epidemic models (Daley and Kendall, 1964) from epidemiology. The canonical "Susceptible-Infectious-Recovered" (SIR) model and the "Susceptible-Infectious-Susceptible" (SIS) model from epidemiology were introduced to model information and computer virus propagation over online networks (Abdullah and Wu, 2011; Daley and Kendall, 1964; Lerman and Ghosh, 2010; Pastor-Satorras and Vespignani, 2001). These epidemic models classified users into different groups: those who had not heard and never spread the news (Susceptible), those who had received and forwarded the news (Infectious) and those who had stopped forwarding (Recovered). Using a few parameters to model the transition rates between groups, these models used differential equations to describe the dynamics of the population in different groups. Many works extended the classic epidemic models by introducing other groups to describe the diffusion process. Rui et al. proposed a Susceptible-Potential-Infective-Removed (SPIR) model that introduced the "Potential" group into the classic SIR model (Rui et al., 2018). The "Potential" group was designed to describe the individuals who had heard of the information but did not become infectious and forward it. By introducing this new group, the SPIR model matched the simulated information diffusion processes over synthetic and real networks better than the classic SIR model. Considering the scenario where a few authorities often clarified the facts and released authoritative information to confirm or refute the content of network rumors, Xia et al. (2015) introduced a new group representing these authorities into the classic SIR model, which is called the "SIAR" model.
Through simulations over synthetic networks, the authors showed that the "SIAR" model could realistically characterize the evolution of rumor propagation. The works in Zhao et al. (2012), Zhao et al. (2013a) and Zhao et al. (2013b) introduced the forgetting and remembering mechanism, and designed a new group called "Hibernators", referring to the individuals who had transformed from spreaders due to the forgetting mechanism and could be turned back into spreaders due to the remembering mechanism. Through analysis and simulations over homogeneous and heterogeneous networks, the authors found that this new group would reduce the maximum rumor influence and postpone the terminal time of the diffusion. Liu et al. considered the existence of "superspreaders" in the networks, whose spreading speed was much faster, and introduced a corresponding new group into the classic SIR model. The proposed model was validated on a real-world Weibo dataset, which showed that this improved SIR model was much more promising than the classic SIR model in characterizing a superspreading event of information propagation.
Most existing works focused on the diffusion process of a single piece of information. However, the numerous smart entities in the crowd cyber system can interact with each other due to the rapid development of the Internet. Thus, the diffusion process often exhibits more complex features, and different pieces of information often influence each other and spread together. The works in Beutel et al. (2012) and Prakash et al. (2012) modeled the interaction between different information propagation processes as competition for users' attention, while in reality there may be more complicated patterns in how different information propagation processes interact with each other, e.g., different pieces of information can also promote the propagation of each other.
Inspired by the works of Myers and Leskovec (2012), Sun et al. (2017) and Fu et al. (2019), in this work, we study how correlated pieces of information influence each other's propagation over networks, and assume that intelligent individuals who have heard and spread the first information might have some pre-judgment and prior knowledge about the second information. Thus, the probability for them to spread the second information should be different from that of individuals who have not heard and never spread the first one.

IJCS 3,2
Based on this assumption, this work extends the classic SIR model in Daley and Kendall (1964), Lerman and Ghosh (2010) and Abdullah and Wu (2011) and proposes the SIR mixture model to describe the diffusion process of two correlated pieces of information. The SIR mixture model includes two stages. The first stage represents the process where the first information spreads alone, while the two correlated pieces of information spread together in the second stage. By classifying the crowd into 8 groups according to their current states in the propagation of the two correlated pieces of information, the dynamics of the percentages of all groups can be described by 8 differential equations, one per group. Through the linearization of the differential equations, we can obtain the necessary conditions for the stable state of the information propagation. We also test our model with real data and infer the propagation process of two pieces of information.
The rest of the paper is organized as follows: Section 2 summarizes the classic SIR model and discusses its properties. Section 3 models the diffusion processes of correlated information, and provides the details of the proposed SIR mixture model. The stable state analysis is discussed at the end of this section. Section 4 validates the SIR mixture model with real data and infers the diffusion process. The conclusion is drawn in Section 5.

The classic susceptible-infectious-recovered model
There have been numerous works on the modeling and analysis of the dynamics of information diffusion over social networks. One class of models is inspired by the epidemic models from epidemiology, due to the similarity between the spread of infectious disease and the diffusion of information.

The classic epidemic model in epidemiology
The epidemic model was proposed by Kermack and McKendrick (1932) to model the disease spreading. The two basic and classic epidemiology models are the SIS model, also known as "Susceptible-Infectious-Susceptible" model, and the SIR model, also known as "Susceptible-Infectious-Recovered" model (Hethcote, 2000).
These two classic models both divide the whole population into different groups according to their state towards a single disease, which include "Susceptible" (S), "Infectious" (I) and "Recovered" (R). The susceptible (S) state refers to those who might be infected with the disease, the infectious (I) state refers to those who currently have the ability to spread the disease and the recovered (R) state refers to those who have recovered from the disease and acquired immunity.
The state of each individual can change in the SIS and SIR models. The SIS model only includes the S and I states. The susceptible individuals might be infected by the infectious individuals, so that their state transits from susceptible to infectious. The infectious individuals might recover from the disease and their state transits back to susceptible, that is, they do not acquire immunity and might be infected again. However, in the SIR model, the infectious individuals might recover from the disease and acquire immunity, that is, they will never get the disease again and their state changes to R, the "Recovered". The state transitions of the SIS model and the SIR model are shown in Figure 1.

Diffusion in crowd intelligence networks
In addition to the similar state description of the whole population, these two models both assume that the whole population is well mixed, and each person has the same probability of contacting any other person. All individuals are homogeneous, that is, the probabilities of transiting between different states are the same for all individuals.

Epidemic models for information diffusion
The spread of disease shares many characteristics with the diffusion of information. By analogy with the spread of disease in a population, Daley and Kendall (1964) first modeled rumor propagation with the SIR model. In the past decade, the SIR model was also successfully used to model information diffusion over social networks (Lerman and Ghosh, 2010; Abdullah and Wu, 2011). The analogy between the spread of disease and information diffusion with respect to the "SIR" model is listed in Table I.
The "SIS" model can also be used to model information diffusion (Pastor-Satorras and Vespignani, 2001). However, the "SIS" model does not have the recovered group. Thus, it cannot characterize the process in which the population loses interest in the information and the infectious proportion decreases.

The dynamics of the classic SIR model
As illustrated in the previous subsection, the state transition of the SIR model is shown in Figure 1(b). To characterize how the percentage of each group changes with time, the parameters $\beta$ and $\gamma$ are introduced to quantify the transition rates between S, I and R. The parameter $\beta$ is the contact rate, and represents the average probability of adequate contacts (i.e., contacts sufficient for transmission) of an infectious person per unit time. The parameter $\gamma$ is called the recovery rate, which represents the average probability for an infectious individual to transit to the recovered state (Hethcote, 2000). Let $S(t)$, $I(t)$ and $R(t)$ denote the percentage of each group; the dynamics of the SIR model can then be described with the following differential equations:

$$\frac{dS(t)}{dt} = -\beta S(t) I(t), \tag{1}$$

$$\frac{dI(t)}{dt} = \beta S(t) I(t) - \gamma I(t), \tag{2}$$

$$\frac{dR(t)}{dt} = \gamma I(t). \tag{3}$$

$S(t)$, $I(t)$ and $R(t)$ should also satisfy the requirement

$$S(t) + I(t) + R(t) = 1. \tag{4}$$

Although the above equations of the SIR model are non-linear, the stable state analysis can still be conducted. By setting equations (1)-(3) to zero, we can first obtain the equilibrium points, denoted as $[s, i, r]^T$. According to the stability analysis at the equilibrium points in Hethcote (2000) and Piqueira (2010), the stable state should satisfy $i = 0$ and $s < \gamma/\beta$. The two conditions indicate that, at the stable state, there are no infectious individuals and the percentage of susceptible individuals is lower than a threshold determined by $\beta$ and $\gamma$.
The analysis of the SIR model's dynamics in Hethcote (2000) also shows that the evolution of the percentages of the three groups in the SIR model can be obtained through numerical calculations when the initial state $\{S(0), I(0), R(0)\}$ and the parameters $\{\beta, \gamma\}$ are given. The evolution process will finally reach one of the stable states.
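The numerical calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard mass-action form of the SIR equations (dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I); the function names, the Runge-Kutta integrator, the step size and the parameter values are assumptions chosen for illustration and are not taken from the paper.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the classic SIR equations for fractions (S, I, R)."""
    s, i, r = state
    return (-beta * s * i, beta * s * i - gamma * i, gamma * i)


def rk4_step(rhs, state, dt, *params):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state, *params)
    k2 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k1)), *params)
    k3 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k2)), *params)
    k4 = rhs(tuple(x + dt * k for x, k in zip(state, k3)), *params)
    return tuple(
        x + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(beta, gamma, s0, i0, r0=0.0, t_end=200.0, dt=0.05):
    """Integrate the SIR model and return the list of (S, I, R) states."""
    state = (s0, i0, r0)
    trajectory = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(sir_rhs, state, dt, beta, gamma)
        trajectory.append(state)
    return trajectory


# Illustrative values only (not fitted to any data).
beta, gamma = 0.8, 0.35
trajectory = simulate_sir(beta, gamma, s0=0.999, i0=0.001)
s_end, i_end, r_end = trajectory[-1]
print(f"final S={s_end:.4f}, I={i_end:.2e}, R={r_end:.4f}, threshold={gamma / beta:.4f}")
```

With these illustrative values the trajectory should end with almost no infectious individuals and a susceptible fraction below gamma/beta, in line with the stable-state condition quoted from Hethcote (2000).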

Modeling correlated information diffusion based on the classic SIR model
Most prior works on SIR-based models only considered the diffusion of a single piece of information. In this work, we extend the classic SIR model and focus on the diffusion process of two pieces of correlated information, taking the influence of correlated information on intelligent individuals into account.

The SIR mixture model
In this work, we consider two pieces of correlated information disseminated over crowd intelligence networks and propose the SIR mixture model. As in the classic SIR model (Abdullah and Wu, 2011; Daley and Kendall, 1964; Lerman and Ghosh, 2010), we assume that the whole population remains constant with respect to time, and that all individuals in the networks are well mixed and homogeneous.
Let $E_1$ and $E_2$ denote the two correlated pieces of information. According to the classic SIR model, each individual can have three possible states towards each piece of information, denoted as $\{S_1, I_1, R_1\}$ and $\{S_2, I_2, R_2\}$, respectively. Combining the states with respect to $E_1$ and $E_2$ gives 9 states: $S_1S_2$, $I_1S_2$, $R_1S_2$, $S_1I_2$, $I_1I_2$, $R_1I_2$, $S_1R_2$, $I_1R_2$ and $R_1R_2$. However, in this work, we consider the simple scenario where individuals have limited attention and can only focus on one piece of information at a time. This also greatly simplifies the theoretical analysis, as demonstrated below. Therefore, the state $I_1I_2$, in which individuals spread $E_1$ and $E_2$ simultaneously, is excluded. The remaining 8 possible states and their physical meanings are summarized in Table II. For notational simplicity, we also denote the percentage of individuals in each state at time $t$ with the same symbol as the corresponding state when no confusion arises. For example, $S_1S_2(t)$ stands for the percentage of individuals who are in state $S_1S_2$ at time $t$, and the other notations are defined in the same way.

In an actual information diffusion scenario, the two pieces of information may not begin to propagate at the same time; that is, one piece of information might spread for a certain time $t_0$ before the other begins to spread. Thus, our SIR mixture model is correspondingly divided into two stages.
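The construction of the eight joint states can be sketched in a few lines. The string labels below (for example "S1S2") are only an illustrative encoding of the states named in the text.

```python
from itertools import product

single_info_states = ("S", "I", "R")

# All 3 x 3 combinations of the state towards E1 and the state towards E2.
all_states = [f"{a}1{b}2" for a, b in product(single_info_states, repeat=2)]

# Limited attention: nobody spreads E1 and E2 at the same time, so I1I2 is dropped.
mixture_states = [state for state in all_states if state != "I1I2"]

print(len(all_states), len(mixture_states))
print(mixture_states)
```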
In the first stage, where $t \in (0, t_0)$, only the first information spreads across the networks. Without loss of generality, we assume that $E_1$ spreads first. In this stage, $E_2$ does not exist and the diffusion of $E_1$ is not influenced by $E_2$. Thus, each individual can only move among three possible states, $\{S_1S_2, I_1S_2, R_1S_2\}$, while the percentages of the other five states are always 0 in this stage.
In the information diffusion scenario, it is commonly assumed that only a few people begin to spread the information (Abdullah and Wu, 2011; Daley and Kendall, 1964; Lerman and Ghosh, 2010; Liu et al., 2016). Consequently, the initial state of the first stage should satisfy $S_1S_2(0) \approx 1$ and $I_1S_2(0) \approx 0$, and the percentages of people in the other groups equal 0 at time 0.
The state transition of the SIR mixture model in the first stage is shown in Figure 2, and this state transition diagram is exactly the same as that with only one piece of information propagating, as shown in Figure 1(b). The parameters $\beta_1$ and $\gamma_1$ are the contact rate and the recovery rate of the individuals who do not know $E_2$; they are in fact the parameters for $E_1$ spreading alone. Therefore, we can use the classic SIR model to describe this process, and the dynamics of the whole population in the first stage can be directly obtained from equations (1)-(3) as:

$$\frac{d\,S_1S_2(t)}{dt} = -\beta_1 S_1S_2(t)\, I_1S_2(t), \tag{5}$$

$$\frac{d\,I_1S_2(t)}{dt} = \beta_1 S_1S_2(t)\, I_1S_2(t) - \gamma_1 I_1S_2(t), \tag{6}$$

$$\frac{d\,R_1S_2(t)}{dt} = \gamma_1 I_1S_2(t). \tag{7}$$

At the end of the first stage, at $t_0$, the percentages of individuals in states $S_1S_2$, $I_1S_2$ and $R_1S_2$ are $S_1S_2(t_0)$, $I_1S_2(t_0)$ and $R_1S_2(t_0)$, respectively. The values of these three percentages are determined by the parameters $\beta_1$, $\gamma_1$ and $t_0$, and can be numerically solved and analyzed as in prior works on the classic SIR model (Hethcote, 2000). The percentages of the other five states remain 0 as the time approaches $t_0$.

The second stage of the SIR mixture model begins at time $t_0$, when $E_2$ also begins to spread across the network together with $E_1$. The state transition of the SIR mixture model in the second stage is shown in Figure 3. The initial state of the second stage is determined by the end of the first stage. According to the previous analysis of the first stage, the percentages of individuals in states $S_1S_2$, $I_1S_2$ and $R_1S_2$ at $t = t_0$ are $S_1S_2(t_0)$, $I_1S_2(t_0)$ and $R_1S_2(t_0)$, respectively, while the percentages of the other five states equal 0. To characterize the beginning of the propagation of $E_2$, we assume that a small perturbation of $S_1I_2$ exists. That is, at time $t_0$, some individuals whose state was originally $S_1S_2$ become "infectious" individuals of $E_2$ and change their state to $S_1I_2$. Thus, this small perturbation causes $S_1I_2(t_0)$ to be slightly more than 0 and $S_1S_2(t_0)$ to be slightly less than the original value calculated at the end of the first stage, while the other percentages remain unchanged.
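The hand-over from the first stage to the second stage can be sketched as follows. This is only an illustration under stated assumptions: the first stage is taken to follow the classic SIR equations with parameters beta1 and gamma1, it is integrated with small forward-Euler steps, and the perturbation size `eps`, the initial infectious fraction `i0` and all numerical values are hypothetical choices, not values from the paper.

```python
STATES = ("S1S2", "I1S2", "S1I2", "R1S2", "S1R2", "R1I2", "I1R2", "R1R2")


def first_stage(beta1, gamma1, t0, i0=1e-3, dt=0.01):
    """Integrate the first stage (E1 spreading alone) on (0, t0) with forward Euler."""
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(int(round(t0 / dt))):
        newly_infected = beta1 * s * i * dt
        newly_recovered = gamma1 * i * dt
        s, i, r = s - newly_infected, i + newly_infected - newly_recovered, r + newly_recovered
    return s, i, r


def second_stage_initial_state(beta1, gamma1, t0, eps=1e-3):
    """Build the 8-group state at t0, moving a small fraction eps from S1S2 to S1I2."""
    s, i, r = first_stage(beta1, gamma1, t0)
    eps = min(eps, s)  # cannot move more mass than S1S2 holds
    x = dict.fromkeys(STATES, 0.0)
    x["S1S2"], x["I1S2"], x["R1S2"], x["S1I2"] = s - eps, i, r, eps
    return x


# Illustrative values only.
x0 = second_stage_initial_state(beta1=0.8, gamma1=0.35, t0=25.0)
for name in STATES:
    print(f"{name}: {x0[name]:.6f}")
```

Only four groups are non-zero at the start of the second stage, and the fractions still sum to one because the perturbation merely moves mass from $S_1S_2$ to $S_1I_2$.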
At the beginning of the second stage, most individuals are in states $S_1S_2$, $I_1S_2$ and $R_1S_2$. The individuals in state $R_1S_2$, who had spread $E_1$, can spread $E_2$, i.e., become "infectious" individuals of $E_2$ with their states changing to $R_1I_2$, and finally lose interest in $E_2$ and become $R_1R_2$. This process is "Part 2" shown in Figure 3. According to our assumption that each user can only spread one piece of information at a time, the individuals in state $I_1S_2$ must first finish spreading $E_1$ and change their state to $R_1S_2$. After that, they can spread $E_2$, and the subsequent state transition is the same as for the individuals in state $R_1S_2$. The individuals in state $S_1S_2$ can first spread either $E_1$ or $E_2$ in the second stage of the SIR mixture model, which corresponds to "Part 1" and "Part 3" in Figure 3, respectively. Furthermore, the individuals in state $S_1R_2$, who had spread $E_2$ first, can still become spreaders of $E_1$ and transit their state in "Part 4" in Figure 3.
The state transition diagram in Figure 3 shows that both "Part 1" and "Part 4" correspond to the propagation of information $E_1$, while "Part 2" and "Part 3" correspond to the propagation of information $E_2$. "Part 1" is the propagation of $E_1$ among the individuals in $S_1S_2$ who do not know and never spread $E_2$, while "Part 4" is the propagation of $E_1$ among individuals who are in state $S_1R_2$ and have spread $E_2$. The difference between "Part 2" and "Part 3" can be described in a similar way. Therefore, by comparing "Part 1" with "Part 4", we can study how $E_2$ affects the propagation of $E_1$, and similarly, we can analyze how $E_1$ affects the propagation of $E_2$ by comparing "Part 2" with "Part 3".
To address the impact of correlated information on each other's propagation, in this work, we assume that the individuals in state $S_1R_2$, who have spread $E_2$, have a different contact rate for $E_1$ from those who have not spread and never known $E_2$ ($S_1S_2$), due to the prior knowledge of $E_1$ generated by spreading $E_2$. We denote the contact rate in the former case as $\alpha_1$, which is different from the latter case, defined as $\beta_1$ previously. To generalize the SIR mixture model, we also assume that the recovery rates of the individuals in states $I_1S_2$ and $I_1R_2$ are different, and denote them as $\gamma_1$ and $\delta_1$, respectively. Similarly, we denote $\beta_2$ and $\alpha_2$ as the contact rates for $E_2$ of individuals in states $S_1S_2$ and $R_1S_2$, and $\gamma_2$ and $\delta_2$ as the recovery rates of individuals in states $S_1I_2$ and $R_1I_2$, respectively. All parameters of the SIR mixture model are summarized in Table III.

Table III. Parameters of the SIR mixture model
Parameter     Physical meaning
$\beta_1$     Transition rate between $S_1S_2$ and $I_1S_2$
$\beta_2$     Transition rate between $S_1S_2$ and $S_1I_2$
$\alpha_1$    Transition rate between $S_1R_2$ and $I_1R_2$
$\alpha_2$    Transition rate between $R_1S_2$ and $R_1I_2$
$\gamma_1$    Transition rate between $I_1S_2$ and $R_1S_2$
$\gamma_2$    Transition rate between $S_1I_2$ and $S_1R_2$
$\delta_1$    Transition rate between $I_1R_2$ and $R_1R_2$
$\delta_2$    Transition rate between $R_1I_2$ and $R_1R_2$
$t_0$         Time for $E_1$ spreading alone

According to the dynamics of the SIR model, we can use differential equations to describe the second stage of the SIR mixture model, where $E_1$ and $E_2$ spread together. Based on the assumptions and the physical meanings of the parameters described previously, we obtain:

$$\frac{d\,S_1S_2(t)}{dt} = -\beta_1 S_1S_2(t)\, I_1(t) - \beta_2 S_1S_2(t)\, I_2(t), \tag{8}$$

$$\frac{d\,I_1S_2(t)}{dt} = \beta_1 S_1S_2(t)\, I_1(t) - \gamma_1 I_1S_2(t), \tag{9}$$

$$\frac{d\,S_1I_2(t)}{dt} = \beta_2 S_1S_2(t)\, I_2(t) - \gamma_2 S_1I_2(t), \tag{10}$$

$$\frac{d\,R_1S_2(t)}{dt} = \gamma_1 I_1S_2(t) - \alpha_2 R_1S_2(t)\, I_2(t), \tag{11}$$

$$\frac{d\,S_1R_2(t)}{dt} = \gamma_2 S_1I_2(t) - \alpha_1 S_1R_2(t)\, I_1(t), \tag{12}$$

$$\frac{d\,R_1I_2(t)}{dt} = \alpha_2 R_1S_2(t)\, I_2(t) - \delta_2 R_1I_2(t), \tag{13}$$

$$\frac{d\,I_1R_2(t)}{dt} = \alpha_1 S_1R_2(t)\, I_1(t) - \delta_1 I_1R_2(t), \tag{14}$$

$$\frac{d\,R_1R_2(t)}{dt} = \delta_2 R_1I_2(t) + \delta_1 I_1R_2(t), \tag{15}$$

where $I_1(t) = I_1S_2(t) + I_1R_2(t)$ represents all individuals who are spreading $E_1$ at time $t$, and $I_2(t) = S_1I_2(t) + R_1I_2(t)$ represents all individuals who are spreading $E_2$ at time $t$. Equations (8), (9) and (11); equations (11), (13) and (15); equations (8), (10) and (12); and equations (12), (14) and (15) describe the processes of "Part 1", "Part 2", "Part 3" and "Part 4" in Figure 3, respectively, based on the dynamics of the SIR model.
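The second-stage dynamics can be sketched numerically as below. The right-hand side is written under the assumption that each transition in Figure 3 takes the usual SIR mass-action form, with the eight groups ordered as $S_1S_2$, $I_1S_2$, $S_1I_2$, $R_1S_2$, $S_1R_2$, $R_1I_2$, $I_1R_2$, $R_1R_2$. The recovery rates use the fitted values reported later in the paper (about 0.35 and 0.57); the contact rates, the initial state, the forward-Euler integrator and the step size are hypothetical choices for illustration.

```python
STATES = ("S1S2", "I1S2", "S1I2", "R1S2", "S1R2", "R1I2", "I1R2", "R1R2")


def mixture_rhs(x, p):
    """Time derivatives of the eight group fractions, ordered as in STATES."""
    s1s2, i1s2, s1i2, r1s2, s1r2, r1i2, i1r2, r1r2 = x
    i1 = i1s2 + i1r2  # everyone currently spreading E1
    i2 = s1i2 + r1i2  # everyone currently spreading E2
    return (
        -p["beta1"] * s1s2 * i1 - p["beta2"] * s1s2 * i2,  # S1S2
        p["beta1"] * s1s2 * i1 - p["gamma1"] * i1s2,       # I1S2
        p["beta2"] * s1s2 * i2 - p["gamma2"] * s1i2,       # S1I2
        p["gamma1"] * i1s2 - p["alpha2"] * r1s2 * i2,      # R1S2
        p["gamma2"] * s1i2 - p["alpha1"] * s1r2 * i1,      # S1R2
        p["alpha2"] * r1s2 * i2 - p["delta2"] * r1i2,      # R1I2
        p["alpha1"] * s1r2 * i1 - p["delta1"] * i1r2,      # I1R2
        p["delta2"] * r1i2 + p["delta1"] * i1r2,           # R1R2
    )


def simulate_second_stage(x0, p, t_end=100.0, dt=0.01):
    """Forward-Euler integration; returns the final state and the series I(t)."""
    x = tuple(x0)
    total_infectious = []
    for _ in range(int(round(t_end / dt))):
        total_infectious.append(x[1] + x[2] + x[5] + x[6])
        x = tuple(v + dt * dv for v, dv in zip(x, mixture_rhs(x, p)))
    return x, total_infectious


# Contact rates are illustrative; recovery rates follow the fitted values in the text.
params = dict(beta1=0.8, beta2=0.9, alpha1=1.0, alpha2=1.2,
              gamma1=0.35, gamma2=0.35, delta1=0.57, delta2=0.57)
x0 = (0.60, 0.05, 0.001, 0.349, 0.0, 0.0, 0.0, 0.0)
x_end, total_infectious = simulate_second_stage(x0, params)
print({name: round(value, 4) for name, value in zip(STATES, x_end)})
```

The eight derivatives sum to zero, so the total population fraction is conserved, which is a quick sanity check on any transcription of the model.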

Stable state analysis of the susceptible-infectious-recovered mixture model
Since we have derived the dynamical equations of the SIR mixture model in equations (8)-(15), we can perform the stability analysis. Let us denote the dynamical state as $x(t) = [S_1S_2(t), I_1S_2(t), S_1I_2(t), R_1S_2(t), S_1R_2(t), R_1I_2(t), I_1R_2(t), R_1R_2(t)]^T$. By setting the dynamical equations (8)-(15) to zero, we can obtain the equilibrium points of the SIR mixture model, i.e., $x_e = [s_1s_2, i_1s_2, s_1i_2, r_1s_2, s_1r_2, r_1i_2, i_1r_2, r_1r_2]^T$. Then, by adopting the Lyapunov first method (Teschl, 2012), we can analyze the stability condition for the equilibrium points $x_e$ (Piqueira, 2010). We first show that, at the equilibrium points, $i_1s_2$, $i_1r_2$, $s_1i_2$ and $r_1i_2$ must equal zero. Mathematically, all parameters are positive, and each entry of the equilibrium point $x_e$ is non-negative due to its physical meaning. Then, by setting equation (15) to zero, we have $r_1i_2 = 0$ and $i_1r_2 = 0$. Substituting these two conditions into the other equations gives $i_1s_2 = 0$ and $s_1i_2 = 0$. This is also reasonable from the physical point of view, because an equilibrium point means that the percentage of each group does not change. Therefore, at the very least, no individuals spread $E_1$ or $E_2$ anymore, which means $i_1s_2$, $i_1r_2$, $s_1i_2$ and $r_1i_2$ equal zero. Thus, the equilibrium points can be simplified to $x_e = [s_1s_2, 0, 0, r_1s_2, s_1r_2, 0, 0, r_1r_2]^T$. That is, at the equilibrium points, each individual is in one of the states $S_1S_2$, $S_1R_2$, $R_1S_2$ and $R_1R_2$.
Due to the non-linearity of the dynamical equations (8)-(15), we cannot obtain the exact values of $s_1s_2$, $r_1s_2$, $s_1r_2$ and $r_1r_2$. However, we can adopt the Lyapunov first method to analyze the stability condition for $s_1s_2$, $r_1s_2$, $s_1r_2$ and $r_1r_2$. We first take the linear approximation of the dynamical equations, which is given by the Jacobian matrix $J$ of the right-hand side of equations (8)-(15). Then, plugging the simplified equilibrium point $x_e$ into the linear approximation of the dynamical system, we have:

$$\frac{dx(t)}{dt} \approx J \big( x(t) - x_e \big), \quad
J = \begin{bmatrix}
0 & -\beta_1 s_1s_2 & -\beta_2 s_1s_2 & 0 & 0 & -\beta_2 s_1s_2 & -\beta_1 s_1s_2 & 0 \\
0 & \beta_1 s_1s_2 - \gamma_1 & 0 & 0 & 0 & 0 & \beta_1 s_1s_2 & 0 \\
0 & 0 & \beta_2 s_1s_2 - \gamma_2 & 0 & 0 & \beta_2 s_1s_2 & 0 & 0 \\
0 & \gamma_1 & -\alpha_2 r_1s_2 & 0 & 0 & -\alpha_2 r_1s_2 & 0 & 0 \\
0 & -\alpha_1 s_1r_2 & \gamma_2 & 0 & 0 & 0 & -\alpha_1 s_1r_2 & 0 \\
0 & 0 & \alpha_2 r_1s_2 & 0 & 0 & \alpha_2 r_1s_2 - \delta_2 & 0 & 0 \\
0 & \alpha_1 s_1r_2 & 0 & 0 & 0 & 0 & \alpha_1 s_1r_2 - \delta_1 & 0 \\
0 & 0 & 0 & 0 & 0 & \delta_2 & \delta_1 & 0
\end{bmatrix}. \tag{16}$$

According to the criterion of Lyapunov stability in Teschl (2012) and Liu et al. (2016), the real part of each eigenvalue of $J$ should be less than or equal to zero, and for the eigenvalues whose real part is equal to zero, the algebraic multiplicity should equal the geometric multiplicity. We can first derive the characteristic equation of $J$ from equation (16), which is:

$$\lambda^4 \Big[ (\lambda - \beta_1 s_1s_2 + \gamma_1)(\lambda - \alpha_1 s_1r_2 + \delta_1) - \alpha_1 \beta_1\, s_1s_2\, s_1r_2 \Big] \Big[ (\lambda - \beta_2 s_1s_2 + \gamma_2)(\lambda - \alpha_2 r_1s_2 + \delta_2) - \alpha_2 \beta_2\, s_1s_2\, r_1s_2 \Big] = 0. \tag{17}$$

Equation (17) shows that $J$ has an eigenvalue $\lambda_1 = 0$ with algebraic multiplicity 4. The other four eigenvalues, i.e., $\lambda_2$, $\lambda_3$ and $\lambda_4$, $\lambda_5$, are the roots of the two quadratic terms in the characteristic equation. To ensure that the state is stable, $\lambda_2$, $\lambda_3$, $\lambda_4$ and $\lambda_5$ should have negative real parts, and this condition also ensures that the geometric multiplicity of $\lambda_1$ is 4, which is equal to its algebraic multiplicity. For each quadratic term, both roots have negative real parts exactly when their sum is negative and their product is positive; because all parameters and percentages are non-negative, the product condition implies the sum condition. This condition is equivalent to:

$$\frac{\beta_1 s_1s_2}{\gamma_1} + \frac{\alpha_1 s_1r_2}{\delta_1} < 1, \qquad \frac{\beta_2 s_1s_2}{\gamma_2} + \frac{\alpha_2 r_1s_2}{\delta_2} < 1. \tag{18}$$

Hence, the equilibrium points $x_e = [s_1s_2, 0, 0, r_1s_2, s_1r_2, 0, 0, r_1r_2]^T$ which satisfy the inequalities in equation (18) are the stable states. When the initial state and the parameters of the SIR mixture model are given, the evolution process of each group can be obtained by numerical solution, and the final state of the evolution should stay in one of the stable states characterized above.
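The stability condition can be checked numerically for a candidate equilibrium. The sketch below assumes that the non-zero eigenvalues of the Jacobian are those of two 2x2 blocks, one coupling the groups spreading $E_1$ ($I_1S_2$, $I_1R_2$) and one coupling the groups spreading $E_2$ ($S_1I_2$, $R_1I_2$), and compares the eigenvalue test with the pair of inequalities obtained from requiring a positive determinant for each block. The helper names and the numerical values are hypothetical.

```python
import cmath


def block_eigenvalues(a, b, c, d):
    """Eigenvalues of the 2x2 matrix [[a, b], [c, d]]."""
    trace, det = a + d, a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return ((trace + disc) / 2.0, (trace - disc) / 2.0)


def is_stable_by_eigenvalues(eq, p):
    """True if all four non-zero Jacobian eigenvalues have negative real parts."""
    s1s2, r1s2, s1r2 = eq["s1s2"], eq["r1s2"], eq["s1r2"]
    e1_block = block_eigenvalues(p["beta1"] * s1s2 - p["gamma1"], p["beta1"] * s1s2,
                                 p["alpha1"] * s1r2, p["alpha1"] * s1r2 - p["delta1"])
    e2_block = block_eigenvalues(p["beta2"] * s1s2 - p["gamma2"], p["beta2"] * s1s2,
                                 p["alpha2"] * r1s2, p["alpha2"] * r1s2 - p["delta2"])
    return all(ev.real < 0.0 for ev in e1_block + e2_block)


def is_stable_by_inequalities(eq, p):
    """Closed-form test: one inequality per piece of information."""
    c1 = p["beta1"] * eq["s1s2"] / p["gamma1"] + p["alpha1"] * eq["s1r2"] / p["delta1"]
    c2 = p["beta2"] * eq["s1s2"] / p["gamma2"] + p["alpha2"] * eq["r1s2"] / p["delta2"]
    return c1 < 1.0 and c2 < 1.0


params = dict(beta1=0.8, beta2=0.9, alpha1=1.0, alpha2=1.2,
              gamma1=0.35, gamma2=0.35, delta1=0.57, delta2=0.57)
late_equilibrium = dict(s1s2=0.1, r1s2=0.2, s1r2=0.1, r1r2=0.6)       # few susceptibles left
early_equilibrium = dict(s1s2=0.9, r1s2=0.05, s1r2=0.05, r1r2=0.0)    # mostly susceptible

for name, eq in (("late", late_equilibrium), ("early", early_equilibrium)):
    print(name, is_stable_by_eigenvalues(eq, params), is_stable_by_inequalities(eq, params))
```

For these illustrative numbers the two tests agree: the equilibrium with few remaining susceptibles is stable, while the mostly susceptible one is not.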

Real data validation with SIR mixture model
In this section, we use the dataset in Yang and Leskovec (2011) to validate our SIR mixture model. According to the study by Yang and Leskovec about the patterns of temporal variation in social media, the online contents exhibit rich temporal dynamics, that is, how contents' popularity grows and fades over time. The dataset includes top-1000 online contents with the largest overall volumes from Sept. 2008 to Aug. 2009. Each online content has a time sequence of 128 entries which indicates the numbers of news articles or the blog posts in each hour around the most popular period of the corresponding online contents.
Each online content can be regarded as the diffusing information, and the corresponding time sequence is the diffusion process of the information. Thus, the SIR model can be used to describe these time sequences, which correspond to the dynamics of the I groups.
To obtain the percentage of the news articles and blog posts with respect to each time unit, we adopt the method in Jiang et al. (2014b): we assume that the total number of news websites and bloggers is the maximal value over all time sequences in the dataset, and divide all the time sequences by this maximal value, so that each entry of the time sequences is normalized to [0,1].
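The normalization step can be sketched as below. The function name and the small made-up count sequences are illustrative only; the real dataset has 1000 sequences of 128 entries each.

```python
def normalize_sequences(sequences):
    """Divide every entry by the global maximum so that all values lie in [0, 1]."""
    global_max = max(max(seq) for seq in sequences)
    if global_max <= 0:
        raise ValueError("sequences must contain at least one positive count")
    return [[value / global_max for value in seq] for seq in sequences]


# Made-up hourly counts for two pieces of online content.
raw_counts = [[3, 10, 4, 1], [20, 50, 35, 5]]
normalized = normalize_sequences(raw_counts)
print(normalized)
```

Note that the maximum is taken over the whole dataset rather than per sequence, so less popular content keeps a proportionally smaller peak.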
By observing all 1000 time sequences, we find that they can be roughly divided into two kinds, whose representatives are shown in Figure 4 with blue solid lines. For the first kind of online contents, shown in Figure 4(a), the time sequences have only one peak. These diffusion processes can be described appropriately by the classic SIR model, as shown in Figure 4(a) with the green dotted line. However, for the second kind of online contents, shown in Figure 4(b), there exist at least two peaks in the temporal sequences. The SIR model cannot fit the data properly, as shown in Figure 4(b) with the green dotted line.
The existence of the second peak may indicate that the corresponding online content has something new, so that the volume goes up again. We attribute this to the online content consisting of two pieces of correlated sub-information: the first peak mainly results from the first sub-information, while the second sub-information leads to the second peak of the time sequence. Hence, the proposed SIR mixture model can be used to describe this kind of information diffusion process.
Because the dataset does not provide the detailed diffusion process of each piece of information, we only have the total volume of each online content over time. To fit the real data to our proposed SIR mixture model, we define the percentage of total infectious individuals of both $E_1$ and $E_2$ at time $t$, that is:

$$I(t) = I_1S_2(t) + I_1R_2(t) + S_1I_2(t) + R_1I_2(t). \tag{19}$$

Thus, $I(t)$ corresponds to the time sequence of the percentage of news articles and blog posts given by the SIR mixture model, and we denote $\hat{I}(t)$ as the real time sequence of the percentage of the same news articles and blog posts. Similar to the validation of the SIR model with real data, the aim of fitting time sequences with the SIR mixture model is to learn the parameters $x = [\beta_1, \beta_2, \gamma_1, \gamma_2, \alpha_1, \alpha_2, \delta_1, \delta_2, t_0]^T$ that minimize the following MSE between the real data and the model fitting result, that is:

$$\min_{x} \; \frac{1}{T} \sum_{t=1}^{T} \big( I(t) - \hat{I}(t) \big)^2, \tag{20}$$

where $T$ is the number of entries in the time sequence. Because we cannot obtain a closed-form expression of $I(t)$ for the SIR mixture model from such complex differential equations, we use the particle swarm optimization (PSO) algorithm (Kennedy and Eberhart, 1995) to efficiently obtain the solution to equation (20). Due to the discreteness of $t_0$, we try each $t_0$ from 0 to 120 with a step size of 5. For a given $t_0$, the PSO solver is used to obtain the optimal solution of the optimization problem in equation (20) with respect to $[\beta_1, \beta_2, \gamma_1, \gamma_2, \alpha_1, \alpha_2, \delta_1, \delta_2]$. The result is shown in Figure 4(a) and (b) with orange dashed lines.
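The fitting procedure (an outer grid over $t_0$ with a PSO run for each candidate) can be sketched as follows. This is a minimal, generic PSO written for illustration; the swarm size, inertia and acceleration coefficients are assumptions and not the settings used by the authors. To keep the sketch self-contained and fast, the loss is a toy stand-in with a known minimum; in an actual fit it would be replaced by the MSE between the simulated $I(t)$ and the data.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimal particle swarm optimizer; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for k in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                vel[k][d] = (inertia * vel[k][d]
                             + c_personal * rng.random() * (best_pos[k][d] - pos[k][d])
                             + c_global * rng.random() * (g_pos[d] - pos[k][d]))
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))  # stay inside bounds
            value = objective(pos[k])
            if value < best_val[k]:
                best_pos[k], best_val[k] = pos[k][:], value
                if value < g_val:
                    g_pos, g_val = pos[k][:], value
    return g_pos, g_val


def fit_over_t0(loss, bounds, t0_grid=range(0, 121, 5)):
    """Run PSO once per candidate t0 and keep the best (t0, params, loss value)."""
    best = None
    for t0 in t0_grid:
        params, value = pso_minimize(lambda p: loss(p, t0), bounds)
        if best is None or value < best[2]:
            best = (t0, params, value)
    return best


def toy_loss(params, t0):
    """Stand-in for the MSE objective; minimum at params = (0.3, 0.6), t0 = 25."""
    return (params[0] - 0.3) ** 2 + (params[1] - 0.6) ** 2 + ((t0 - 25) / 100.0) ** 2


best_t0, best_params, best_value = fit_over_t0(toy_loss, [(0.0, 1.0), (0.0, 1.0)])
print(best_t0, [round(v, 3) for v in best_params], best_value)
```

Treating $t_0$ as a grid variable keeps the inner problem continuous, at the cost of one optimizer run per candidate value.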

We fit all 1,000 time sequences in the dataset and find that the average MSE given by the SIR mixture model is $5.19 \times 10^{-4}$. Compared with the average MSE given by the classic SIR model, $1.44 \times 10^{-3}$, the proposed model reduces the MSE by about 64%, which indicates that the SIR mixture model characterizes complex information propagation processes considerably better than the classic SIR model. Specifically, for the time sequences shown in Figure 4, the SIR mixture model performs as well as the classic SIR model on the diffusion process with a single peak and outperforms it on the diffusion process with double peaks.
Because no detailed information about each time sequence is provided in the dataset, we use the parameters learned by fitting the corresponding real data to calculate the percentages of infectious individuals of $E_1$ and $E_2$ at time $t$, which are $I_1(t) = I_1S_2(t) + I_1R_2(t)$ and $I_2(t) = S_1I_2(t) + R_1I_2(t)$, respectively. We can then use the detailed diffusion processes of $E_1$ and $E_2$ to analyze the evolution of the correlated information diffusion and the influence between correlated information.
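The reduction of about 64% reported above follows directly from the two average MSE values:

```latex
\[
\frac{1.44 \times 10^{-3} - 5.19 \times 10^{-4}}{1.44 \times 10^{-3}}
  = 1 - \frac{5.19}{14.4}
  \approx 1 - 0.360
  = 0.640 .
\]
```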
We take the time sequence in Figure 4(b) as an example; the other time sequences can be analyzed in the same way. We first plot $I_1(t)$ and $I_2(t)$ in Figure 5(a) according to the SIR mixture model whose parameters are learned from the corresponding real data. The plot supports our assumption that the two peaks result from the spreading of $E_1$ and $E_2$, respectively. The interval between the two peaks is approximately 25 time units, which matches the fitted value of $t_0$.
From the fitted parameters, we observe that $\gamma_1 \approx \gamma_2 \approx 0.35$ and $\delta_1 \approx \delta_2 \approx 0.57$, which indicates that the crowd's recovery rates are barely influenced by the other information. The reason might be that people pay more attention to the information that they are currently spreading. Hence, the recovery rates are less affected by the correlated information.
The fitting results show that $\beta_2 \approx 0.54 < \alpha_2 \approx 1$, which indicates that, in this particular example, people who have spread $E_1$ are more likely to spread $E_2$. Thus, $E_1$ has a positive impact on $E_2$. We also plot $I_1S_2(t)$, $I_1R_2(t)$, $S_1I_2(t)$ and $R_1I_2(t)$ in Figure 5(b); the curve of $R_1I_2(t)$ grows faster and higher than that of $S_1I_2(t)$, which is consistent with our analytical results.
The result in Figure 5(b) also indicates that $I_1R_2(t)$ stays close to 0 over time. We also find that $\alpha_1 \approx 0.52 < \beta_1 \approx 0.63$, that is, the diffusion of $E_1$ is suppressed by $E_2$. We infer that this is mainly because people are more interested in the new information, although the two pieces of information are correlated.

Conclusion
In this work, we extend the classic SIR model and propose the SIR mixture model, which formulates the process by which two pieces of correlated information jointly propagate over crowd intelligence networks.
To describe the influence between the correlated information, we exploit a characteristic of crowd intelligence: intelligent individuals who have spread the first information will have pre-judgment and prior knowledge of the correlated second information. As a result, their probability of spreading the second information differs from that of individuals who have neither heard nor spread the first one.
The crowd is divided into eight groups according to their state with respect to the two pieces of information. The SIR mixture model is a two-stage model: in the first stage the first information propagates alone, which can be modeled by the SIR model, and in the second stage both pieces of information spread together, with dynamics described by eight differential equations. We also discuss the stable state of the SIR mixture model through linearization of the differential equations and obtain the condition for the stable state.
Finally, we validate our model with real data and find that it can describe not only the information diffusion process with one peak, but also the more complex one with two peaks. We also use the parameters learned from the real data to reason about how correlated information interacts and propagates over crowd intelligence networks.