Minimizing the influence of dynamic rumors based on community structure

Purpose – With the rapid development of internet technology, open online social networks provide a broad platform for spreading information. While the dissemination of information makes life more convenient, it also brings problems such as security risks and the steering of public opinion. Negative, malicious and false information spreads across regions and seriously affects social harmony and national security. This paper therefore aims to minimize the influence of negative information such as online rumors, a problem that has attracted extensive attention. Most existing rumor blocking algorithms prevent the spread of rumors to some extent, but they are designed on the entire social network and focus mainly on the microstructure of the network, i.e. the pairwise relationship or similarity between nodes. Their blocking effect may be unsatisfactory in some networks because the data in the microstructure are sparse.
Design/methodology/approach – An algorithm for minimizing the influence of dynamic rumors based on community structure is proposed in this paper. The algorithm first divides the network into communities, then integrates the influence of each node within its community with the rumor influence probability to measure the influence of each node in the entire network, and then selects key nodes and bridge nodes in communities as blocked nodes. After that, a dynamic blocking strategy is adopted to improve the blocking effect.
Findings – Community structure is one of the most prominent features of networks. It reveals the organizational structure and functional components of a network at the mesoscopic level. Using community structure provides rich information that alleviates the problem of data sparsity in the microstructure and thus improves the blocking effect. Extensive experiments on two real-world data sets show that the proposed algorithm outperforms the baseline algorithms.
Originality/value – As an important research direction in social network analysis, rumor minimization has a profound effect on the harmony and stability of society and on the development of social media. However, because rumor spread is characterized by multiple propagation paths, fast propagation speed, a wide propagation area and time-varying behavior, improving the effectiveness of rumor blocking algorithms is a major challenge.


Introduction
In recent years, the rapid development of online social networks such as Weibo, WeChat and Twitter has changed the way people communicate. Hundreds of millions of people share messages with each other through social platforms, so the speed of information exchange has greatly increased. On the one hand, these platforms make it much easier to disseminate positive messages (Montanari and Saberi, 2010); on the other hand, they have also become the main channels for spreading malicious rumors and misinformation. To reduce the impact of negative rumors, appropriate measures should be taken as soon as they are detected to stop the spread and narrow its scope. As an important research direction in social network analysis, rumor minimization has a profound effect on the harmony and stability of society and on the development of social media. However, because rumor spread is characterized by multiple propagation paths, fast propagation speed, a wide propagation area and time-varying behavior, improving the effectiveness of rumor blocking algorithms is a major challenge.
The spread of rumors is very similar to that of infectious diseases and computer viruses (Nekovee et al., 2007). Many scholars have studied the spread of rumors in social networks (Borge-Holthoefer and Moreno, 2012; Borge-Holthoefer and Meloni, 2013; Wang et al., 2012; Zhao et al., 2012). Although studies on minimizing negative rumors have made progress, their depth and breadth remain limited. Methods such as greedy blocking algorithms and dynamic blocking algorithms control the spread of rumors to a certain extent. However, most of these methods are designed on the whole social network and focus on its microstructure, that is, pairwise relationships or similarity between nodes. As a result, the blocking effect may be unsatisfactory in some networks because data in the microstructure are sparse.
Community structure (Girvan and Newman, 2002) is one of the most prominent features of a network. It reveals the organizational structure and functional components of the network and describes the network at the mesoscopic level (Wang et al., 2016). For two nodes in the same community, even if they have only a weak relationship in the microstructure because of data sparsity, their similarity is strengthened by the constraints of the community structure. In this paper, we propose a community-based framework for minimizing the influence of dynamic rumor (CoF-MIDR). The community structure not only reflects the behavior characteristics of individuals in the network and the associations between communities but also simplifies the entire network into hierarchical relationships between communities, which helps to discover the bridge nodes that play an important role in spreading rumors from one community to another. In addition, community structure provides rich information that alleviates the problem of data sparsity in the microstructure. Considering both the interaction between nodes within a community and the relationships between communities helps to measure the influence of nodes accurately. Based on the node influence measured with the community structure, the key nodes within communities and the bridge nodes between communities are selected as the blocked nodes. After that, a dynamic blocking strategy is adopted to expose fewer people to the rumor, so as to reduce the number of rumor infections as much as possible and improve the blocking effect.
For example, in the network shown in Figure 1(a), the blue node represents the source of the rumor (the rumor publisher) and the black hollow nodes represent uninfected nodes (which have not accepted the rumor). Figure 1(b) shows the normal propagation process, in which the black solid nodes indicate infected nodes (which have accepted the rumor). Figure 1(c) is a network without community structure, in which three key nodes (red nodes) are selected by an existing method and blocked, that is, all propagation paths to their neighbors are cut off (indicated by blue dashed lines). Figure 1(d) is a network with community structure, in which the key nodes (red nodes) and the bridge nodes (yellow nodes) are selected for blocking. Comparing Figure 1(c) with Figure 1(d), the number of infected nodes in Figure 1(d) is smaller, so blocking based on community structure can further reduce the number of infected nodes.
The major contributions of this paper are summarized as follows. First, a method is proposed for measuring the influence of nodes based on the community structure and the influence probability of rumors between nodes. Second, we propose the CoF-MIDR algorithm, which selects the blocked nodes and adopts a dynamic blocking strategy to minimize the scope of rumor influence. Third, extensive experiments are carried out on two real-world data sets; the results indicate that the proposed algorithm outperforms the baseline algorithms in terms of the infection rate.
The rest of this paper is organized as follows: Section 2 introduces the related work, Section 3 details the CoF-MIDR algorithm, Section 4 gives the experiment and results, and finally, we conclude the work in Section 5.

Related work
At present, the rumor blocking problem has aroused widespread concern in academic communities. Many scholars have studied the spreading process of rumors and proposed methods to reduce the area influenced by rumors. Based on the susceptible-infected-removed model, Zhao et al. (2012) proposed a susceptible-infected-hibernator-removed model with a forgetting mechanism and a recall mechanism. Nguyen et al. (2012) defined the multicompetition independent cascade model and the influence propagation restriction problem, extended the independent cascade model to the MCICM and gave an approximate solution algorithm for the EIL problem. As for research on rumor blocking strategies, Budak and Agrawal (2011) introduced the concept of "good" movements in social networks, which offset the negative effects of "bad" movements. Kimura et al. (2009) studied the problem of minimizing the spread of malicious rumors by blocking a limited number of links in social networks. Fan et al. (2013) studied rumor blocking with the lowest cost in social networks, introduced the concept of "protector," and tried to select the minimum number of "protectors" to limit the influence of rumors by triggering the [...] (2018) proposed a new STCIR model to study the dynamic propagation of rumors, which suppresses rumors by forwarding correct information. Chen (2019) considered the calm period before a propagator becomes an agitator and the mobility of the crowd in a certain area, and proposed a novel rumor propagation model to explore the control of rumor propagation in emergencies.
Although these methods suppress the spread of rumors to some degree, most of them are designed for the entire social network and focus mainly on its microstructure, that is, the pairwise relationship or similarity between nodes, while ignoring the mesoscopic structure of the network. This may lead to a poor blocking effect in some networks because of data sparsity in the microstructure.

Community-based framework for minimizing the influence of dynamic rumor algorithm
In this paper, we propose the CoF-MIDR algorithm, which is divided into three stages. In the first stage, a community detection algorithm is used to obtain the community structure of the whole social network. The second stage measures the influence of each node based on the community structure and then selects the nodes with greater influence, together with the bridge nodes linking two communities, as the blocked node-set. In the third stage, the dynamic blocking strategy is applied to the blocked node-set.

Community detection
Community structure, in which nodes aggregate into groups, is one of the most prominent features of a social network. In this paper, we first divide the social network into multiple communities with a community detection algorithm and then select the nodes to be blocked based on the community structure. Community detection has been studied extensively, and many algorithms are both effective and efficient, such as the Louvain algorithm (Blondel et al., 2008), the GN algorithm (Girvan and Newman, 2002) and the label propagation algorithm (LPA) (Raghavan and Albert, 2007). One of them is used to divide the network in this paper.
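As an illustration of the community detection stage, the following is a minimal Python sketch of label propagation, one of the algorithm families named above. It is not the implementation used in the experiments. The function name `label_propagation`, the adjacency-dictionary input, the sorted visiting order and the rule that ties go to the largest label are all assumptions made here so that the result is reproducible; the usual algorithm visits nodes in random order and breaks ties at random.

```python
from collections import Counter


def label_propagation(adj, max_rounds=100):
    """Deterministic label propagation on an undirected graph.

    adj maps each node to the set of its neighbours. Visiting nodes in
    sorted order and breaking ties toward the largest label are
    simplifications (assumptions) made for reproducibility.
    """
    labels = {node: node for node in adj}  # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue  # an isolated node keeps its own label
            counts = Counter(labels[nbr] for nbr in adj[node])
            best = max(counts.values())
            # Adopt the most frequent neighbour label (largest label on ties).
            new_label = max(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # labels are stable
    communities = {}
    for node, lab in labels.items():
        communities.setdefault(lab, set()).add(node)
    return list(communities.values())


# Two triangles joined by the single edge 2-3.
example_adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
example_communities = label_propagation(example_adj)
```

On this small example the two triangles end up in separate communities. On larger graphs the outcome of label propagation depends on the visiting order and tie-breaking rule, so a modularity-based method such as Louvain may be preferable when stable partitions are needed.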

Blocked nodes selection
In this paper, we select the nodes with great influence to block. Therefore, we need to measure the influence of each node.
Suppose a social network $G$ is divided into $m$ communities by a community detection algorithm, denoted as $G = C_1 \cup C_2 \cup \dots \cup C_i \cup \dots \cup C_m$. The influence of node $u$ in $C_i$ is denoted as $I_u(C_i)$. If node $u$ is the rumor source, $I_u(C_i) = 1$; otherwise, it is calculated using PageRank (Brin and Page, 1998), defined as equation (1).
In equation (1), $v$ is a neighbor node of $u$, $N(u)$ is the set of neighbor nodes of $u$ in its community and $N$ is the total number of nodes. Following Brin and Page (1998), the damping factor $a$ is generally taken as 0.85. $I_u(C_i)$ measures the importance of the node within the community $C_i$, reflecting the authority, professionalism and popularity of the node in the community. However, it ignores the influence of the relationship between nodes on the rumor propagation process. For example, a high similarity between $u$ and $v$ indicates that $v$ is more influenced by $u$, so that $v$ is more likely to receive and forward the information posted by $u$.
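For reference, the symbols described above match the standard PageRank recurrence restricted to a community. A sketch of that form is given here; the normalizer $|N(v)|$ (the number of in-community neighbors of $v$) is an assumption carried over from standard PageRank and is not stated in the text.

```latex
I_u(C_i) = \frac{1 - a}{N} + a \sum_{v \in N(u)} \frac{I_v(C_i)}{|N(v)|}
```

With $a = 0.85$, a node receives a baseline share $(1 - a)/N$ plus 85 per cent of the influence passed on equally by each of its in-community neighbors.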
Node attributes such as gender, age, hobby and occupation reflect the characteristics of a node. Therefore, we use the attributes of nodes to measure the similarity between nodes. Cosine similarity, one of the most commonly used similarity measures, is applied in this paper to measure the similarity between nodes $u$ and $v$. It is defined as equation (2).
$$S_{uv} = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2}\,\sqrt{\sum_{i=1}^{n} v_i^2}} \quad (2)$$
where $u$ and $v$ are two adjacent nodes within the same community, $i = 1, 2, \dots, n$ indexes the $n$ attributes of the nodes, and $u_i$ and $v_i$ are the values of the $i$-th attribute. The value of $S_{uv}$ lies in $[0, 1]$: a value close to 1 indicates that the two nodes are very similar, whereas a value close to 0 indicates that they are less similar. The set of neighbor nodes of $u$ is denoted as $N(u)$. The influence probability of node $u$ on its neighbor nodes is then defined as equation (3).
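The similarity and influence-probability steps can be sketched in Python as follows. `cosine_similarity` follows the usual definition of cosine similarity and stays in [0, 1] only when attribute values are non-negative. `influence_probability` is an assumed form chosen purely for illustration (the mean similarity between a node and its in-community neighbors); it should not be read as equation (3). The function names and the `attrs` dictionary are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity of two equal-length attribute vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0  # all-zero vectors are treated as dissimilar


def influence_probability(u, neighbours, attrs):
    """ASSUMPTION: mean cosine similarity between u and its in-community neighbours."""
    if not neighbours:
        return 0.0
    total = sum(cosine_similarity(attrs[u], attrs[v]) for v in neighbours)
    return total / len(neighbours)


# Hypothetical binary attribute vectors for three users.
attrs = {"u": [1, 0, 1], "v": [1, 0, 1], "w": [0, 1, 0]}
p_u = influence_probability("u", ["v", "w"], attrs)  # roughly 0.5
```

Here `u` is identical to `v` and shares no attribute with `w`, so the assumed probability is about one half. Other aggregations, such as a degree-weighted mean, would fit the same interface.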
$I_u(C_i)$ reflects the importance of a node within the community, while $p_u$ reflects its influence probability on neighbors. Therefore, the total influence of node $u$ is defined by combining importance and influence probability, denoted as $II_u(C_i)$.
Based on the influence of nodes, key nodes and bridge nodes are selected as the nodes to be blocked. A key node is a node with large $II_u(C_i)$, and a bridge node is a node that has an edge to a node in another community. The key node-set and bridge node-set are defined as follows.
Definition 1 (key node-set). Given a network $G = C_1 \cup C_2 \cup \dots \cup C_m$ with $m$ communities, the $k$ nodes with the largest $II_u(C_i)$ are selected as the key nodes of community $C_i$, denoted as $U_i = (u_1, u_2, \dots, u_k)$. The collection of key nodes in all communities is called the key node-set of the network, denoted as $K = U_1 \cup U_2 \cup \dots \cup U_m$.
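Definition 1 can be sketched in Python as a per-community top-k selection. The function name `key_node_set`, the `total_influence` dictionary (node to total influence score) and the rule that ties are broken by the smaller node identifier are assumptions for illustration; the text does not say how ties are handled.

```python
def key_node_set(communities, total_influence, k):
    """Return K, the union of the top-k nodes by total influence in each community.

    communities: iterable of node sets (C_1, ..., C_m).
    total_influence: dict mapping node -> total influence score (hypothetical input).
    """
    key_nodes = set()
    for community in communities:
        # Highest score first; ties go to the smaller node id (an assumption).
        ranked = sorted(community, key=lambda u: (-total_influence[u], u))
        key_nodes.update(ranked[:k])  # U_i, the key nodes of this community
    return key_nodes


# Hypothetical scores for two communities.
scores = {1: 0.9, 2: 0.1, 3: 0.5, 4: 0.2, 5: 0.7}
K = key_node_set([{1, 2, 3}, {4, 5}], scores, k=1)
```

With `k=1` the sketch picks node 1 from the first community and node 5 from the second. A community with fewer than `k` nodes simply contributes all of its nodes.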

Definition 2 (bridge node-set). Given a network $G = C_1 \cup C_2 \cup \dots \cup C_m$ with $m$ communities, if the number of edges between node $u \in C_i$ and other nodes in $C_i$ is less than the degree of $u$, then $u$ is called a bridge node of community $C_i$. The bridge node-set consists of all bridge nodes in all communities, denoted as $B$.
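Definition 2 translates directly into a degree comparison: a node is a bridge node when it has fewer neighbors inside its own community than neighbors overall. The following Python sketch assumes an undirected graph stored as an adjacency dictionary and communities that partition the node set; the function name `bridge_node_set` is hypothetical.

```python
def bridge_node_set(adj, communities):
    """Return B, the nodes with at least one edge leaving their own community.

    adj: dict mapping node -> set of neighbours (undirected graph).
    communities: list of disjoint node sets covering every node in adj.
    """
    membership = {u: idx for idx, community in enumerate(communities) for u in community}
    bridges = set()
    for u, neighbours in adj.items():
        internal = sum(1 for v in neighbours if membership[v] == membership[u])
        if internal < len(neighbours):  # fewer internal edges than the degree of u
            bridges.add(u)
    return bridges


# Two triangles joined by the single edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
B = bridge_node_set(graph, [{0, 1, 2}, {3, 4, 5}])
```

In this example only nodes 2 and 3 have an edge that crosses the community boundary, so they form the bridge node-set.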

Dynamic blocking process
The process of rumor spread in the social networks usually consists of three stages, from the ascending phase at the beginning of the spread, then to the peak period, and finally, to the fading period. Considering the time-varying characteristics of rumor spread, we use the dynamic blocking algorithm to block the selected nodes step by step.
Algorithm 1 Minimize the impact of dynamic rumors based on community structure
Input: a network G = (V, E)
Output: infection rate
1. Divide the network G into m communities by community detection, $G = C_1 \cup C_2 \cup \dots \cup C_m$
2. For j = 1 to T do
3.     For i = 1 to m do
4.         calculate the influence of nodes in the community by formula (1)
5.         calculate the influence probability of the nodes by formulas (2) and (3)
6.         calculate the total influence of nodes (II) by formula (4)
7.         If j = 1
8.             block $top.k_1 \cup B$
9.         Else block $top.k_j$
10.        End if
11.    End for
12. End for
The blocking process includes T time iterations. In the first round (j = 1), the algorithm calculates the influence of each node, selects the key node-set $top.k_1$ from each community and merges it with the bridge node-set B of each community to form the set of blocked nodes at j = 1. When j = 2, because the network structure and the infection state of nodes have changed, the influence II of the remaining nodes is recalculated, and the algorithm selects the key node-set $top.k_2$ from each community as the set of blocked nodes at j = 2. This process is repeated until j = T.
Algorithm 1 gives the whole process of the CoF-MIDR algorithm. Step 1 divides the network into m communities; its time complexity is O(l), where l is the number of edges in the network. Steps 4-6 calculate the influence of nodes, and Steps 7-9 block the selected nodes. Assuming that each community has n nodes, the time complexity of T iterations is O(Tmn). Therefore, the total time complexity of the algorithm is O(l + Tmn).
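The control flow of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the scoring of formulas (1)-(4) is replaced by a caller-supplied hook `score_fn`, the infection state of nodes is not simulated, and blocking is modeled as removing a node together with all of its edges. The names `dynamic_blocking`, `score_fn` and `degree_score` are hypothetical.

```python
def dynamic_blocking(adj, communities, bridges, score_fn, k, rounds):
    """Return the set of nodes blocked in each of `rounds` rounds.

    score_fn(graph, nodes) -> {node: score} stands in for formulas (1)-(4);
    it is a hypothetical hook, not the scoring defined in the text.
    """
    graph = {u: set(nbrs) for u, nbrs in adj.items()}  # working copy of the graph
    blocked_per_round = []
    for j in range(1, rounds + 1):
        selected = set()
        for community in communities:
            remaining = [u for u in community if u in graph]
            scores = score_fn(graph, remaining)  # recomputed every round
            ranked = sorted(remaining, key=lambda u: (-scores[u], u))
            selected.update(ranked[:k])  # top-k key nodes of this community
        if j == 1:
            selected |= {b for b in bridges if b in graph}  # add B in the first round only
        # Blocking removes the node and cuts every edge to its neighbours.
        for u in selected:
            for v in graph.pop(u):
                if v in graph:
                    graph[v].discard(u)
        blocked_per_round.append(selected)
    return blocked_per_round


def degree_score(graph, nodes):
    """Hypothetical stand-in score: the current degree of each node."""
    return {u: len(graph[u]) for u in nodes}


# Two triangles joined by the single edge 2-3; nodes 2 and 3 are the bridge nodes.
network = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
plan = dynamic_blocking(network, [{0, 1, 2}, {3, 4, 5}], {2, 3}, degree_score, k=1, rounds=2)
```

Because scores are recomputed on the reduced graph in every round, nodes that become important only after earlier blocks can still be selected later, which is the point of blocking step by step rather than all at once.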

Experimental evaluation
In this section, we verify the effectiveness of the CoF-MIDR algorithm and examine the impact of the initial blocking time and the blocking duration on the blocking effect.
4.1 Experiment preparation
4.1.1 Data set. We use two data sets with node attributes, Facebook and Rochester, in the experiments. The Facebook data set contains 4,039 nodes, 88,234 edges and 1,403 attributes. The Rochester data set contains 4,563 nodes, 161,404 edges and 237 attributes. A detailed description of these data sets is given in Table I.
We use normal propagation (NormalP), greedy blocking (GreedyB), positive cascades blocking (PositiveCB) (Tong et al., 2018) and dynamic blocking (DynamicB) as the baseline algorithms in this paper. NormalP is a normal propagation algorithm that does not use any blocking strategy. Both GreedyB and PositiveCB are static blocking methods, while DynamicB is a dynamic blocking method. GreedyB is an improved algorithm based on greedy algorithms. When a rumor is detected, it immediately selects nodes to block, trying to prevent further spread of the rumor as quickly as possible. PositiveCB selects seed nodes to generate multiple positive cascades that minimize the number of rumor-active nodes. DynamicB blocks the selected nodes gradually, according to the updated node states in each round, instead of blocking them all at the same time.
Evaluation metric.
The infection rate is used as the evaluation metric of the blocking effect. It is defined as the proportion of infected nodes among all nodes after each iteration of the rumor blocking process. A lower infection rate implies a better blocking effect. The definition of the infection rate is shown in equation (5).
$$\text{infection rate} = \frac{I_{num}}{N} \quad (5)$$
where $I_{num}$ represents the number of infected nodes, and $N$ represents the total number of nodes in the network.
4.2 Experimental result
4.2.1 Effectiveness evaluation. To verify the effectiveness of the proposed algorithm, we compare the rumor infection rates of GreedyB, DynamicB, PositiveCB and CoF-MIDR on the two data sets. We set the blocking rate (the proportion of blocked nodes in the social network) to 5, 10 and 15 per cent, respectively, and suppose that the rumor is discovered and blocking starts at the 5th iteration (blue dashed line). Preliminary experiments showed that the infection rate of all blocking algorithms reaches an almost steady state between the 60th and 80th iterations, so the number of iterations was set to 100 in all experiments. In Figure 2, the solid blue line represents the normal spread of rumors. As the number of iterations increases, the algorithms that use a blocking strategy reduce the infection rate to different extents compared with normal propagation. The dynamic blocking algorithms outperform the static blocking algorithms, and our proposed CoF-MIDR algorithm performs best, especially on the Facebook data set. This suggests that combining the influence of nodes in the community with the probability of rumor influence, using the mesoscopic structure information of the community, measures node influence more accurately and thus helps to prevent the spread of rumors. In the Rochester data set, where the average node degree is high because of the dense relationships between nodes, the rumor spreads too fast to be blocked easily, so the blocking algorithms give similar results. Even so, the CoF-MIDR algorithm still performs slightly better than the baseline algorithms.
In addition, because rumors propagate quickly and along many paths, the ideal blocking effect cannot be achieved when the blocking rate is small, such as 5 per cent, especially in a dense network.
4.2.2 Impact of initial blocking time. In our experiments, it is generally assumed that a message diffuses in the social network for a period of time, is then detected as a rumor at a certain moment, and the rumor blocking strategy is launched immediately to prevent further spread. In this process, the initial blocking time has a large impact on the final rumor infection rate in the whole social network. In the experiment in this section, the initial blocking time is set to 5, 10 and 15, indicating that the rumor is blocked at different stages of propagation.
Figure 4 shows the impact of blocking duration on the final rumor infection rate under the CoF-MIDR blocking algorithm. In this experiment, we assume that the rumor is detected and blocking begins at the fifth iteration, with the blocking duration set to 10, 20 and 30. Figures 4(a) and 4(b) show the rumor infection rates of the two data sets for the three blocking durations, respectively.
As can be seen from Figure 4, the longer the node is blocked (blocking duration = 30), the lower the rumor infection rate. With the decrease in blocking duration, the rumor infection rate is higher. However, as too long blocking duration may affect user satisfaction, the blocking duration should not be too long. As a consequence, the study of blocking duration is beneficial to the design of a rumor blocking strategy with lower cost and better effect.

Conclusion
In this paper, we propose CoF-MIDR, a community-based framework for minimizing the influence of dynamic rumors. Using the characteristics of community structure, we integrate the influence of nodes in the community with the influence probability of rumors to measure the total influence of nodes, and then mine key nodes and bridge nodes in communities as blocked nodes. A dynamic blocking strategy is adopted to improve the blocking effect. Experimental results on two real data sets show that the proposed algorithm outperforms the baseline algorithms.
The algorithm uses user attributes to measure the similarity between users and thereby estimate the probability of rumor infection. The user attributes in the data sets we use are complete, but real-world data sets may have incomplete user attributes. Therefore, we will consider the impact of missing user attributes on the rumor blocking effect in future work.