Collaborative filtering recommendation algorithm based on variational inference

Kai Zheng (College of Mathematics Computer Science/College of Software, Fuzhou University, Fuzhou, China)
Xianjun Yang (College of Mathematics Computer Science/College of Software, Fuzhou University, Fuzhou, China)
Yilei Wang (College of Mathematics Computer Science/College of Software, Fuzhou University, Fuzhou, China)
Yingjie Wu (College of Mathematics Computer Science/College of Software, Fuzhou University, Fuzhou, China)
Xianghan Zheng (College of Mathematics Computer Science/College of Software, Fuzhou University, Fuzhou, China)

International Journal of Crowd Science

ISSN: 2398-7294

Article publication date: 3 February 2020

Issue publication date: 3 March 2020


Abstract

Purpose

The purpose of this paper is to alleviate the problem of poor robustness and over-fitting caused by large-scale data in collaborative filtering recommendation algorithms.

Design/methodology/approach

Interpreting user behavior from the probabilistic perspective of hidden variables is helpful to improve robustness and over-fitting problems. Constructing a recommendation network by variational inference can effectively solve the complex distribution calculation in the probabilistic recommendation model. Based on the aforementioned analysis, this paper uses variational auto-encoder to construct a generating network, which can restore user-rating data to solve the problem of poor robustness and over-fitting caused by large-scale data. Meanwhile, for the existing KL-vanishing problem in the variational inference deep learning model, this paper optimizes the model by the KL annealing and Free Bits methods.

Findings

The effect of the basic model is considerably improved after using the KL annealing or Free Bits method to solve KL vanishing. The proposed models evidently perform worse than competitors on small data sets, such as MovieLens 1 M. By contrast, they have better effects on large data sets such as MovieLens 10 M and MovieLens 20 M.

Originality/value

This paper presents the use of the variational inference model for collaborative filtering recommendation and introduces the KL annealing and Free Bits methods to improve the effect of the basic model. Because variational inference trains the probability distribution of the hidden vector rather than a single point estimate, the problems of poor robustness and overfitting are alleviated. When the amount of data is relatively large in the actual application scenario, the probability distribution fitted to the actual data can better represent users and items. Therefore, using variational inference for collaborative filtering recommendation is of practical value.


Citation

Zheng, K., Yang, X., Wang, Y., Wu, Y. and Zheng, X. (2020), "Collaborative filtering recommendation algorithm based on variational inference", International Journal of Crowd Science, Vol. 4 No. 1, pp. 31-44. https://doi.org/10.1108/IJCS-10-2019-0030

Publisher

Emerald Publishing Limited

Copyright © 2020, Kai Zheng, Xianjun Yang, Yilei Wang, Yingjie Wu and Xianghan Zheng.

License

Published in International Journal of Crowd Science. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode


1. Introduction

With the continuous development of social networks, the problem of information overload has become increasingly serious. Although information overload can be alleviated by search engine retrieval, users are required to actively formulate and summarize what they are searching for, and the returned results often fail to meet their needs. Therefore, many applications currently combine personalized recommendation with information search engines to serve users. Personalized recommendation is the process of recommending items that may be of interest to users based on the historical behaviors of users or the information contained in items (Adomavicius and Tuzhilin, 2005). To date, personalized recommendation is widely used in many fields, such as e-commerce (Yan et al., 2018), movies (Deldjoo et al., 2019), music (Schedl et al., 2018) and article reading (Cao et al., 2017).

In recent years, researchers have proposed a number of personalized recommendation algorithms based on neural networks, which improve recommendation accuracy compared with traditional recommendation algorithms. However, most of these algorithms lack persuasive theoretical explanations. To enhance the interpretability of the recommendation, several researchers have combined the neural network with the Bayesian probability-based recommendation model to obtain a recommendation model with improved performance and strong theoretical support. These models assume that user behavior data obey a specific probability distribution and then perform further mathematical derivation and modeling; representative examples include collaborative topic regression (Wang and Blei, 2011) and collaborative deep learning (Wang et al., 2015).

However, the abovementioned models frequently need to calculate complex probability distributions and can only be simplified for calculation by adding a few hypothetical conditions. Furthermore, the neural network structures in these models mostly use simple components, such as the auto-encoder (AE) or the multi-layer perceptron, and cannot mine deep user relationships. Variational inference can effectively solve these two problems. The concept of variational inference is to approximate a complex probability distribution that is difficult to solve by transforming known simple probability distributions, such as the normal distribution and the Bernoulli distribution. With the development of deep learning in recent years, the calculation of variational inference no longer requires complex algorithms such as EM; instead, a neural network can be constructed to train the parameters of the probability distribution. Therefore, variational inference has become a crucial technique in deep learning. Given the difficulty of calculating the complex probability distribution directly, variational inference uses the neural network to fit the parameters of the probability distribution. Compared with the single-vector training of traditional deep learning models, training a distribution can represent the user and item vectors more comprehensively, thus mining deeper relationships.

Based on the preceding analysis, this paper uses variational inference to address the robustness and over-fitting problems that large-scale data cause in traditional collaborative filtering algorithms. First, the user score matrix is directly filled using variational inference and then Top-N recommendation is performed. For the KL-vanishing problem that exists in variational inference deep learning algorithms (Bowman et al., 2015), several available solutions are examined, and the KL annealing (Bowman et al., 2015) and Free Bits (Kingma et al., 2016) methods are selected to construct the model. Finally, two collaborative filtering recommendation algorithms based on variational inference are obtained.

2. Related work

2.1 Recommendation algorithm

2.1.1 Collaborative filtering recommendation algorithm.

The core idea of the collaborative filtering recommendation algorithm is to recommend items that the user may like according to the relationship between similar users or items. Sedhain et al. (2015) proposed a collaborative filtering recommendation algorithm based on AE, abbreviated as ACF.

The main idea of ACF is to use AE as a data-filling tool, take the scores as input and output the filled score matrix after AE training. Then, a score prediction or top-N recommendation is performed. The advantage of this method is that the neural network can be used to mine the non-linear relationship between users. However, the main problem is that the network structure is very simple and noise resistance is weak.

Wu et al. (2016) proposed a collaborative denoising auto-encoder (CDAE) model based on Sedhain et al. (2015), which replaces the AE in Sedhain et al. (2015) with a denoising auto-encoder (DAE) (Vincent et al., 2010).

Three differences are observed between CDAE and ACF, which lead to the better performance of the CDAE model for top-N recommendations:

  • CDAE adds the same offset to each input variable.

  • CDAE uses implicit feedback recommendation.

  • CDAE adds Gaussian noise to the training to improve the noise immunity of the model.

In the comparison experiments of CDAE, the AE in ACF was replaced by a DAE instead of comparing against ACF directly. Experiments in Wu et al. (2016) show that DAE is superior to AE for Top-N recommendation and enhances the robustness of the model.

2.1.2 Hybrid recommendation algorithm.

The hybrid recommendation algorithm mainly considers how to combine user registration information or item content information more closely with the collaborative filtering algorithm. This approach can alleviate the cold-start problem in collaborative filtering algorithms and improve recommendation quality.

At present, one way to use such user or item auxiliary information is to construct a model directly and train the auxiliary information and the user-scoring matrix together. The representative model is the sparse linear model (SLIM) (Ning and Karypis, 2011). Ning and Karypis (2012) proposed several methods of using auxiliary information in the recommendation model, the best of which is the collective sparse linear model (cSLIM).

2.2 Variational inference

Variational inference is a type of technique for approximating complex probability distributions in Bayesian estimation and machine learning. This technique can be applied in numerous fields, such as natural language processing (Duh, 2018; Wang and Blunsom, 2015), computer vision and robotics (Hu and O’Connor, 2018; Krishnan et al., 2018), and computational neurology (Daunizeau et al., 2014; Gershman et al., 2014). The core idea of variational inference comprises two steps:

  1. assume a known probability distribution q(z;λ); and

  2. approximate q(z;λ) to p(z|x) by changing the parameter λ of the distribution.

In this case, the calculation of p(z|x) is converted into the calculation of the following formula:

(1) λ* = argmin_λ divergence(p(z|x), q(z;λ))

After convergence of Formula (1), the actual probability distribution of p(z|x) can be replaced by q(z;λ*) as the posterior distribution.
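To make the two-step idea concrete, the following minimal NumPy sketch (not taken from the paper) fits a one-dimensional Gaussian q(z;λ) to a known Gaussian stand-in for p(z|x) by minimizing their closed-form KL divergence over λ with a simple grid search; the distributions, names and values are illustrative assumptions only.

```python
import numpy as np

# Illustrative example: the "true" posterior p(z|x) is N(2.0, 0.5^2);
# the variational family q(z; lambda) is N(mu, sigma^2), lambda = (mu, sigma).
true_mu, true_sigma = 2.0, 0.5

def kl_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) between two univariate Gaussians."""
    return (np.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2) - 0.5)

# Step 1: assume a known distribution family; Step 2: adjust lambda to
# minimize the divergence (here by a simple grid search over lambda).
mus = np.linspace(-5.0, 5.0, 201)
sigmas = np.linspace(0.1, 2.0, 191)
best = min(((kl_gaussians(m, s, true_mu, true_sigma), m, s)
            for m in mus for s in sigmas), key=lambda t: t[0])
print("lambda* = (mu, sigma) =", best[1:], "KL =", best[0])
# The search recovers lambda* close to (2.0, 0.5), so q(z; lambda*) can stand
# in for the intractable p(z|x), as described after Formula (1).
```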

2.3 Combination of variational inference and deep learning

Kingma and Welling (2013) proposed the use of deep learning to solve variational inference and introduced the variational auto-encoder (VAE) model.

2.3.1 Using deep learning to construct variational inference.

For clarity, this paper reports the KL divergence of the joint probability distributions p(x,z) and q(x,z):

(2) KL(p(x,z)||q(x,z)) = ∫∫ p(x,z) ln [p(x,z) / q(x,z)] dz dx

According to the probability formula, the following formula can be obtained:

(3) KL(p(x,z)||q(x,z)) = ∫∫ p(x) p(z|x) ln [p(x) p(z|x) / q(x,z)] dz dx = E_{x∼p(x)} [∫ p(z|x) ln p(x) dz + ∫ p(z|x) ln [p(z|x) / q(x,z)] dz]

Formula (3) shows that ln p(x) of the first term is independent of z, and p(x) is uniquely determined by the sampled data. Therefore, the first term is a constant. Thus, according to the Bayesian probability formula, the ultimate optimization target becomes the minimum value of Formula (4):

(4) L = E_{x∼p(x)} [∫ p(z|x) ln [p(z|x) / (q(x|z) q(z))] dz] = E_{x∼p(x)} [E_{z∼p(z|x)}[−ln q(x|z)] + KL(p(z|x)||q(z))]

From the perspective of probability, the expectation can be approximated by sampling. Therefore, in practical applications, calculating the first term of Formula (4) is equivalent to sampling z from p(z|x) and then substituting it into ln q(x|z). q(x|z) is assumed to obey a specific distribution, and the parameters of this distribution can be trained directly by constructing a neural network from z to x. The same assumption holds for p(z|x). The value of the KL divergence is then solved according to the parameters obtained above.

Figure 1 depicts the overall model structure, and the model optimization target is Formula (4). The solid line in the figure represents the main flow of the model, and the dashed box indicates the optimization target that must be met at the corresponding step of the process.

2.3.2 Variational Auto-Encoder.

The VAE is an application of variational inference. In the VAE model, the number of samples drawn from p(z|x) is directly taken as 1 in Kingma and Welling (2013) because z is generated by randomly sampling from a normal distribution at every iteration; when the number of iteration steps is sufficiently large, the sampling is also considered sufficient. Therefore, Formula (4) can be converted into the following:

(5) L = E_{x∼p(x)}[−ln q(x|z) + KL(p(z|x)||q(z))], z ∼ p(z|x)

If p(z|x) is assumed to be normally distributed and q(z) is taken as the standard normal distribution, then the following formula can be obtained:

(6) KL(p(z|x)||q(z)) = (1/2) Σ_{k=1}^D (μ_(k)²(x) + δ_(k)²(x) − ln δ_(k)²(x) − 1)

In Formula (6), μ(x) and δ(x) can be trained directly from x through a deep learning network.

Assuming that q(x|z) obeys a normal distribution with mean μ(z) and a fixed variance δ², the following is derived:

(7) −ln q(x|z) ∼ (1/(2δ²)) ‖x − μ(z)‖²

In this formula, μ(z) can be trained directly from z through the deep learning network. Thus, the two parts of Formula (5) can be derived from Formulas (6) and (7).

Therefore, a VAE model can be constructed as shown in Figure 2, and the model optimization target is Formula (5).
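The two terms of Formula (5) can be sketched directly in NumPy. The snippet below is an illustrative sketch rather than the paper's implementation: it assumes the encoder outputs μ(x) and δ(x) and the decoder output μ(z) are already available as arrays, and it uses random values as stand-ins for them.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                        # dimension of the hidden vector z

# Assume an encoder has produced mu(x) and delta(x) for one sample x,
# and that q(z) is the standard normal distribution (Formula (6)).
mu_x = rng.normal(size=D)
delta_x = np.exp(rng.normal(size=D) * 0.1)   # positive standard deviations

kl = 0.5 * np.sum(mu_x ** 2 + delta_x ** 2 - np.log(delta_x ** 2) - 1.0)

# Sample z from p(z|x) via the reparameterization trick (one sample per step,
# as in Section 2.3.2), then let a decoder produce mu(z).
z = mu_x + delta_x * rng.normal(size=D)
x = rng.normal(size=20)                      # observed data vector (illustrative)
mu_z = rng.normal(size=20)                   # stand-in for the decoder output mu(z)
delta2 = 1.0                                 # fixed variance of q(x|z), Formula (7)

recon = np.sum((x - mu_z) ** 2) / (2.0 * delta2)   # -ln q(x|z) up to a constant
loss = recon + kl                                  # optimization target, Formula (5)
print(loss)
```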

3. Collaborative filtering recommendation algorithm based on variational inference

Most existing collaborative filtering algorithms have poor robustness and overfitting problems with the expansion of the recommendation data scale. This section mainly discusses the construction of a collaborative filtering recommendation algorithm through variational inference and optimizes the recommendation results to solve these problems.

The VAE in the variational inference framework is a generative model. Thus, this paper first attempts to directly use the VAE for score matrix filling and then performs Top-N recommendation. The analysis then shows that the resulting variational inference deep learning model suffers from the KL-vanishing problem, and the KL annealing and Free Bits methods are selected for optimization.

3.1 Collaborative filtering recommendation algorithm based on VAE

3.1.1 Algorithm description.

Suppose u ∈ {1,…, U} represents the user, i ∈ {1,…, I} denotes the item, X ∈ N^(U×I) is the score matrix of the associated users and items, and x_u = [x_{u1},…, x_{uI}]^T represents the scores of the u-th user on all items, abbreviated as x. z = [z_1,…, z_D]^T denotes the hidden vector obtained from x, where D is the dimension of z. Referring to Wu et al. (2016), the following model uses implicit feedback recommendation and processes the final score data with only two values, namely, 1 and 0, to ensure the applicability of the model. A value of 1 indicates that the user likes the item, whereas a value of 0 indicates that the user does not like the item or has not rated the item.

The architecture of the entire generation model depends on the VAE model derived from Section 2.3.2. The optimization target of the model is derived as follows:

(8) L = E_{x∼p(x)}[−ln q(x|z) + KL(p(z|x)||q(z))]

In Formula (8), the KL divergence is similarly solved as that in Section 2.3.2. According to the foregoing discussion, only two values, namely, 0 and 1, are applied for x|z. Therefore, the actual probability distribution is close to the Bernoulli distribution, as shown in the following formula:

(9) q(x_(k)|z) = ρ_(k)(z) if x_(k)|z = 1; 1 − ρ_(k)(z) if x_(k)|z = 0, where k ∈ [0, D]

Thus, Formula (10) can be obtained as follows:

(10) q(x|z) = Π_{k=1}^U [(ρ_(k)(z))^{x_(k)} (1 − ρ_(k)(z))^{1 − x_(k)}]
ln q(x|z) = Σ_{k=1}^U [x_(k) ln ρ_(k)(z) + (1 − x_(k)) ln(1 − ρ_(k)(z))]

In Formula (10), ρ_(k)(z) can be trained directly through the neural network constructed from z. Meanwhile, according to the Bernoulli distribution constraint, ρ_(k)(z) ∈ [0,1]. Therefore, the final layer of the neural network must be activated using the sigmoid function. The model shown in Figure 3 can then be constructed.
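As a small illustration of Formula (10), the following NumPy sketch computes the Bernoulli reconstruction term −ln q(x|z) for one binarized rating vector; the clipping constant and the example values are assumptions used only for numerical safety and demonstration.

```python
import numpy as np

def bernoulli_neg_log_likelihood(x, rho, eps=1e-10):
    """-ln q(x|z) of Formula (10) for a binarized rating vector x and the
    sigmoid-activated decoder output rho(z); eps avoids log(0)."""
    rho = np.clip(rho, eps, 1.0 - eps)
    return -np.sum(x * np.log(rho) + (1.0 - x) * np.log(1.0 - rho))

# Illustrative values: a user liked items 0 and 3 out of five items.
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0])
rho = np.array([0.9, 0.2, 0.1, 0.7, 0.3])    # decoder probabilities
print(bernoulli_neg_log_likelihood(x, rho))
```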

The complete algorithm is shown in Algorithm 1.

Algorithm 1 Collaborative Filtering Recommendation Algorithm Based on Variational Auto-Encoder

Input: User rating information x

Output: Filled rating information x’

1. Process the score data x and turn it into binary data.

2. Construct a neural network according to Figure 3.

3. Calculate the ln q(x|z) term and the KL divergence during each training iteration:

ln q(x|z) = Σ_{k=1}^U [x_(k) ln ρ_(k)(z) + (1 − x_(k)) ln(1 − ρ_(k)(z))]
KL(p(z|x)||q(z)) = (1/2) Σ_{k=1}^D (μ_(k)²(x) + δ_(k)²(x) − ln δ_(k)²(x) − 1)

4. Determine the final optimization target of the neural network:

L = E_{x∼p(x)}[−ln q(x|z) + KL(p(z|x)||q(z))]

5. Output x’ after training and further use x’ for Top-N recommendation.
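The sketch below ties the steps of Algorithm 1 together as one NumPy forward pass. It is a simplified illustration under assumed settings (one hidden layer of size 64, tanh activations, random untrained weights), not the exact architecture or training code used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(42)
I, D, H = 100, 16, 64           # items, hidden-vector dimension, hidden-layer size

# Illustrative random (untrained) weights; in practice they are learned by the optimizer.
W_enc = rng.normal(0.0, 0.05, (I, H)); b_enc = np.zeros(H)
W_mu  = rng.normal(0.0, 0.05, (H, D)); b_mu  = np.zeros(D)
W_lv  = rng.normal(0.0, 0.05, (H, D)); b_lv  = np.zeros(D)   # outputs ln delta^2(x)
W_dec = rng.normal(0.0, 0.05, (D, H)); b_dec = np.zeros(H)
W_out = rng.normal(0.0, 0.05, (H, I)); b_out = np.zeros(I)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x):
    """One pass of the score-matrix-filling model (steps 2-4 of Algorithm 1)."""
    h = np.tanh(x @ W_enc + b_enc)                       # encoder
    mu, log_var = h @ W_mu + b_mu, h @ W_lv + b_lv       # parameters of p(z|x)
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=D)  # reparameterized sample
    rho = sigmoid(np.tanh(z @ W_dec + b_dec) @ W_out + b_out)  # sigmoid decoder output
    rho = np.clip(rho, 1e-10, 1.0 - 1e-10)
    # Reconstruction term -ln q(x|z) of Formula (10) and KL term of Formula (6).
    recon = -np.sum(x * np.log(rho) + (1.0 - x) * np.log(1.0 - rho))
    kl = 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0)
    return rho, recon + kl

# Step 1: a binarized rating vector for one user (illustrative sparse data).
x = (rng.random(I) < 0.05).astype(float)
x_filled, loss = forward(x)               # step 5: rank items by x' for Top-N
top_n = np.argsort(-x_filled)[:20]
print(loss, top_n[:5])
```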

3.1.2 Algorithm analysis.

The final optimization target of Algorithm 1 is Formula (8). The first term is the reconstruction error of the entire model, whereas the second term is the KL divergence between the probability distribution of the hidden vector z and the standard normal distribution. Ultimately, reducing the result of Formula (8) is equivalent to maximizing the reconstruction term ln q(x|z) and minimizing the KL divergence. For the KL divergence, if the input vector x is simply independent of the hidden vector z, then

(11) p(z|x) = p(z) = q(z)

is satisfied. At this point, the posterior probability distribution degenerates into the prior probability distribution, and the KL divergence takes its minimum value of 0. For the reconstruction part, given that the neural network can fit any distribution, the decoder no longer needs to rely on z to fit the distribution of q(x) once it is fully trained. These two effects ultimately cause the KL divergence to be rapidly optimized to 0 during training, so that z no longer plays a meaningful role in the training process. This problem is called the KL-vanishing problem.

3.2 Optimization method of KL vanishing problem

3.2.1 Existing optimization methods.

Many researchers have proposed their own solutions to the KL-vanishing problem. Bowman et al. (2015) presented the KL-annealing method, which is depicted as follows:

(12) L = E_{x∼p(x)}[−ln q(x|z) + β·KL(p(z|x)||q(z))]

At the beginning of training, parameter β is set to 0 and is then gradually increased to 1 as the training step increases. The advantage is that p(z|x) obtains additional time to embed the information of x into the hidden vector z.

Unlike the KL-annealing method, the Free Bits method (Kingma et al., 2016) uses a technique of “reserving a little space” for each dimension of the KL divergence. Specifically, a threshold ε is added to each dimension of the KL divergence, and the model will only optimize a dimension larger than ε. Therefore, a loss function is obtained as shown below:

(13) L = E_{x∼p(x)}[−ln q(x|z) + Σ_{i=1}^D max(KL(p(z_(i)|x)||q(z_(i))), ε)]

Alternatively, the total KL divergence can be controlled as a whole without subdividing it into each dimension. However, this tendency may result in only a few working dimensions, and most dimensions of the final z will not contain the information of x.

In addition to the two aforementioned methods, the normalizing flow method (Chen et al., 2016) and the auto-encoding method (Shen et al., 2018) have been introduced. Normalizing flow aims to obtain an improved prior probability distribution; its core is to first sample the hidden vector from a simple distribution and then refine it through continuous iterative reversible transformations. Conversely, the auto-encoding method is mainly used for dialogue generation. When the VAE is combined with a recurrent neural network (RNN), their loss functions interfere with each other at the beginning of training. Simultaneously, when the VAE is used to model dialogue, whether the hidden vector z obtained from the prior probability contains the information of the input variable cannot be guaranteed. Therefore, an AE can be explicitly constructed for z, and the VAE and AE can be trained separately to ensure easy convergence.

3.2.2 Using KL annealing to optimize collaborative filtering recommendations.

Section 3.2.1 shows that the final optimization target of the KL-annealing method is Formula (12), which introduces the equilibrium parameter β into the KL divergence term compared with the optimization target of the original variational inference. β gradually increases during the training process; thus, the constraint on the KL divergence is weak in the early stage, such that the information of x can first be injected into z. Meanwhile, as β increases, the relative weight of the generator part −ln q(x|z) decreases; therefore, the model must rely on z to generate results.

Thus, the score matrix-filling algorithm obtained in Section 3.1.1 is combined with the KL-annealing method to obtain the new algorithm shown in Algorithm 2.

Algorithm 2 Collaborative filtering recommendation Algorithm based on KL-annealing method

Input: User rating information x

Output: Filled rating information x’

1. Process the score data x and turn it into binary data.

2. Construct a neural network according to Figure 3.

3. Calculate the ln q(x|z) term and the KL divergence during each training iteration:

ln q(x|z) = Σ_{k=1}^U [x_(k) ln ρ_(k)(z) + (1 − x_(k)) ln(1 − ρ_(k)(z))]
KL(p(z|x)||q(z)) = (1/2) Σ_{k=1}^D (μ_(k)²(x) + δ_(k)²(x) − ln δ_(k)²(x) − 1)

4. Determine the final optimization target of the neural network:

L = E_{x∼p(x)}[−ln q(x|z) + β·KL(p(z|x)||q(z))]

where β is defined as a variable, which is an input from outside the model.

5. Select the total annealing step total_anneal_steps, for the i-th iteration, and calculate β as follows:

β = i / total_anneal_steps

6. Take x and β as inputs, output x’, and further use x’ for Top-N recommendations.
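A possible realization of the annealing schedule in step 5, written as a short Python sketch; the total_anneal_steps value and the cap of β at 1 follow the description in Section 3.2.1, while the function names and numeric loss values are purely illustrative.

```python
def annealed_beta(step, total_anneal_steps):
    """Step 5 of Algorithm 2: beta grows linearly from 0 and is capped at 1."""
    return min(1.0, step / float(total_anneal_steps))

def annealed_loss(recon, kl, step, total_anneal_steps=200000):
    """Optimization target of Formula (12): -ln q(x|z) + beta * KL."""
    return recon + annealed_beta(step, total_anneal_steps) * kl

# Illustrative values for the two loss terms at three training steps.
for step in (0, 100000, 400000):
    print(step, annealed_loss(recon=35.0, kl=4.0, step=step))
```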

3.2.3 Using free bits to optimize collaborative filtering recommendations.

As shown in Section 3.2.1, the final optimization target of the Free Bits method is Formula (13), which adds a constraint to each dimension of the KL divergence compared with the optimization target of the original variational inference. The optimization of a dimension is only performed when the value of the KL divergence in that dimension is sufficiently large. Such an approach does not strictly require that the probability distribution of the final hidden vector is completely close to the normal distribution; instead, it allows a certain artificially defined deviation. In this manner, the KL divergence is protected in the early stage of training and will not be quickly driven to 0 at the beginning of training.

Algorithm 3 Collaborative Filtering Recommendation Algorithm Based on Free Bits Method

Input: User rating information x

Output: Filled rating information x’

1. Process the score data x and turn it into binary data.

2. Construct a neural network according to Figure 3.

3. Calculate the ln q(x|z) term and the KL divergence during each training iteration:

ln q(x|z) = Σ_{k=1}^U [x_(k) ln ρ_(k)(z) + (1 − x_(k)) ln(1 − ρ_(k)(z))]
KL(p(z|x)||q(z)) = (1/2) Σ_{k=1}^D (μ_(k)²(x) + δ_(k)²(x) − ln δ_(k)²(x) − 1)

4. Determine the final optimization target of the neural network:

L = E_{x∼p(x)}[−ln q(x|z) + Σ_{i=1}^D max(KL(p(z_(i)|x)||q(z_(i))), ε)]

where ε is defined as a variable, which is an input from outside the model.

5. Select ε, take x and ε as inputs, output x’, and further use x’ for Top-N recommendations.
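The Free Bits constraint of Formula (13) can be sketched as follows in NumPy; the threshold ε = 0.5 and the example encoder outputs are illustrative assumptions.

```python
import numpy as np

def free_bits_kl(mu, delta, eps=0.5):
    """KL term of Formula (13): per-dimension KL with a floor of eps (Free Bits)."""
    kl_per_dim = 0.5 * (mu ** 2 + delta ** 2 - np.log(delta ** 2) - 1.0)
    return np.sum(np.maximum(kl_per_dim, eps))

# Illustrative encoder outputs for a hidden vector of dimension 4.
mu = np.array([0.1, 0.0, 1.5, -0.8])
delta = np.array([1.0, 1.0, 0.6, 0.9])
print(free_bits_kl(mu, delta, eps=0.5))
# Dimensions whose KL is already below eps contribute a constant eps, so the
# optimizer has no incentive to push them further toward zero early in training.
```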

4. Experiment results

The experiments in this section mainly verify the effects of the models proposed in Sections 3.1 and 3.2. This paper mainly discusses Top-N recommendation and chooses recall, which is frequently used in Top-N recommendation evaluation, as the accuracy indicator. Correspondingly, this paper adopts the normalized discounted cumulative gain (NDCG) as the ranking relevance indicator.
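For reference, the sketch below shows one common way to compute Recall@N and NDCG@N for a single user under implicit feedback; the exact evaluation variants used in the paper's scripts (e.g. how held-out items are chosen and how ties are broken) are not specified here, so the function names and truncation rule are illustrative assumptions.

```python
import numpy as np

def recall_at_n(ranked_items, held_out, n):
    """Fraction of the user's held-out liked items that appear in the top-n list."""
    hits = len(set(ranked_items[:n]) & held_out)
    return hits / float(min(n, len(held_out)))

def ndcg_at_n(ranked_items, held_out, n):
    """Normalized discounted cumulative gain for binary relevance."""
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:n]) if item in held_out)
    ideal = sum(1.0 / np.log2(rank + 2) for rank in range(min(n, len(held_out))))
    return dcg / ideal

# Illustrative example: items ranked by predicted score, three held-out liked items.
ranked = [10, 4, 7, 2, 9, 1, 5]
liked = {4, 9, 3}
print(recall_at_n(ranked, liked, 5), ndcg_at_n(ranked, liked, 5))
```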

4.1 Context of the experiment

The experiment uses Ubuntu 16.04 64 bit as the operating system, Python 2.7.13 as the programming language, Google TensorFlow 1.1.0 as the deep learning framework and a single NVIDIA GTX 1080 as the GPU. All contrast models in the following experiments are run in the same environment.

The experimental data sets in this section include the MovieLens 1, 10 and 20 M data sets. For the MovieLens 1 and 10 M data sets, the experiment uses only the “ratings.dat” files; for the MovieLens 20 M data set, it uses only the “ratings.csv” file. In data processing, following the practice in Wu et al. (2016), ratings of four points and above are replaced with 1, whereas ratings below four points are replaced with 0.
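A possible preprocessing sketch for this binarization, using pandas to parse the “::”-separated ratings.dat format of the MovieLens 1 M and 10 M sets; the column names, the dense matrix construction and other details are illustrative choices, not the paper's preprocessing script.

```python
import numpy as np
import pandas as pd

# MovieLens 1 M / 10 M store ratings in "ratings.dat" as
# UserID::MovieID::Rating::Timestamp; the 20 M set uses a CSV file with a header instead.
ratings = pd.read_csv("ratings.dat", sep="::", engine="python",
                      names=["user", "item", "rating", "timestamp"])

# Implicit-feedback binarization described above:
# ratings of four and above become 1, everything else becomes 0.
ratings["liked"] = (ratings["rating"] >= 4.0).astype(np.int8)

# Build the binary user-item matrix X used as model input
# (dense here for illustration; a sparse matrix would be used in practice).
user_ids = {u: i for i, u in enumerate(ratings["user"].unique())}
item_ids = {m: j for j, m in enumerate(ratings["item"].unique())}
X = np.zeros((len(user_ids), len(item_ids)), dtype=np.int8)
liked = ratings[ratings["liked"] == 1]
X[liked["user"].map(user_ids), liked["item"].map(item_ids)] = 1
```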

During the experiment, the stochastic gradient descent method is used to optimize the model objective function. The data batch size is 128. The initial learning rate of the optimizer Adam is 0.001. The learning rate is attenuated once every 15 cycles, and the decay rate is 0.025. A total of 3,000 cycles are iterated.

4.2 Experiment results

In the following experimental results, DAE, CDAE and cSLIM represent the three models described in Section 2.1. VAE represents the VAE model built in Section 3.1. VAE_KL represents the VAE recommendation model using the KL-annealing method, while VAE_FB represents the VAE recommendation model using the Free Bits method. NDCG and Recall represent the evaluation indicators, and @20, @50 and @100 indicate Top-20, Top-50 and Top-100 recommendations, respectively. Tables I, II and III provide the results of all comparison experiments.

Based on the experimental results of the three tables, VAE does not perform well on either small (such as MovieLens 1 M) or large-scale (such as MovieLens 10 M and MovieLens 20 M) data sets. The main reason for this performance is the KL-vanishing problem discussed in Section 3.1.2.

The effect of the model is considerably improved after using the KL annealing or Free Bits method to solve KL vanishing. VAE_KL and VAE_FB evidently perform worse than DAE and CDAE on small data sets, such as MovieLens 1 M. By contrast, VAE_KL and VAE_FB have better effects on large data sets such as MovieLens 10 M and MovieLens 20 M. This finding arises mainly because, when the amount of data is not sufficiently large, the probability distribution fitted to the actual data is often inaccurate and can only accommodate a local situation. Constructing a probability distribution that suits reality becomes possible only when the amount of data reaches a certain scale or when other auxiliary information is added for fitting. In this respect, filling the scoring matrix directly with variational inference is suitable for real scenarios with large-scale data. The effectiveness of this approach mainly stems from the following reasons:

  • The DAE and CDAE methods adopt a fixed noise-adding method, which will reduce the robustness of the model. The noise of VAE_KL and VAE_FB is mainly derived from the sampling of the probability distribution of the hidden vectors, and the degree of human interference is small.

  • The distribution sampling of VAE_KL and VAE_FB is random. The same model parameters may eventually produce different hidden vectors and, in turn, different output data. In this manner, overfitting is less likely to occur compared with the DAE and CDAE methods.

  • VAE_KL and VAE_FB analyze, from a probabilistic perspective, the specific distribution that the user score data obey, which gives them a stronger theoretical basis than DAE and CDAE. When dealing with large-scale data, the fitted probability distribution will be close to the real scenario and thus of practical value.

5. Conclusion

This paper presents the use of the variational inference model for collaborative filtering recommendation. After the KL annealing and Free Bits methods are introduced, the effect of the basic model is improved. Compared with traditional methods, whose fixed noise-adding schemes reduce model robustness, variational inference trains the probability distribution of the hidden vector, and the model noise mainly comes from sampling this probability distribution, which requires no artificial noise. Therefore, the robustness of the model is improved. Meanwhile, sampling the probability distribution yields different hidden vectors each time and, in turn, different output data, which reduces the occurrence of overfitting. When the amount of data is relatively large in the actual application scenario, the probability distribution fitted to the actual data can better represent users and items. Therefore, using variational inference for collaborative filtering recommendation is of practical value.

Figures

Figure 1. Use of deep learning to construct variational inference

Figure 2. Use of deep learning to construct VAE

Figure 3. Score matrix filling based on VAE

Table I. Comparison of model experiments on the MovieLens 1 M data set

Architecture NDCG@100 Recall@20 Recall@50
DAE 0.36073 0.32553 0.45752
CDAE 0.35515 0.32027 0.45281
cSLIM 0.36305 0.32715 0.46154
VAE 0.32997 0.28204 0.42395
VAE_KL 0.35494 0.32259 0.45427
VAE_FB 0.34596 0.31497 0.44971

Table II. Comparison of model experiments on the MovieLens 10 M data set

Architecture NDCG@100 Recall@20 Recall@50
DAE 0.43600 0.40968 0.54993
CDAE 0.43801 0.41309 0.55231
cSLIM 0.43980 0.41302 0.55103
VAE 0.41754 0.39036 0.53386
VAE_KL 0.44313 0.41881 0.56055
VAE_FB 0.44164 0.41801 0.55681

Table III. Comparison of model experiments on the MovieLens 20 M data set

Architecture NDCG@100 Recall@20 Recall@50
DAE 0.42319 0.39307 0.52554
CDAE 0.42629 0.39502 0.52751
cSLIM 0.42734 0.39602 0.53244
VAE 0.40733 0.37628 0.51307
VAE_KL 0.43007 0.39952 0.53543
VAE_FB 0.43089 0.40096 0.53679

References

Adomavicius, G. and Tuzhilin, A. (2005), “Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions”, IEEE Transactions on Knowledge and Data Engineering, Vol. 17 No. 6, pp. 734-749.

Bowman, S.R., Vilnis, L., Vinyals, O., Dai, A.M., Jozefowicz, R. and Bengio, S. (2015), “Generating sentences from a continuous space”, Computer Science.

Cao, S., Yang, N. and Liu, Z. (2017), “Online news recommender based on stacked au-to-encoder”, IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), IEEE, pp. 721-726.

Chen, X., Kingma, D.P., Salimans, T., Duan, Y., Dhariwal, P., Schulman, J., Sutskever, I. and Abbeel, P. (2016), “Variational lossy autoencoder”.

Daunizeau, J., Adam, V. and Rigoux, L. (2014), “VBA: a probabilistic treatment of nonlinear models for neurobiological and behavioural data”, PLoS Computational Biology, Vol. 10 No. 1, p. e1003441, doi: 10.1371/journal.pcbi.1003441.

Deldjoo, Y., Dacrema, M.F., Constantin, M.G., Eghbal-Zadeh, H., Cereda, S., Schedl, M., Ionescu, B. and Cremonesi, P. (2019), “Movie genome: alleviating new item cold start in movie recommendation”, User Modeling and User-Adapted Interaction, Vol. 2019 No. 5, pp. 1-53, doi: 10.1007/s11257-019-09221-y.

Duh, K. (2018), “Bayesian analysis in natural language processing”, Computational Linguistics, Vol. 44 No. 1, pp. 187-189, doi: 10.1162/COLI_r_00310.

Gershman, S.J., Blei, D.M., Norman, K.A. and Sederberg, P.B. (2014), “Decomposing spatiotemporal brain patterns into topographic latent sources”, NeuroImage, Vol. 98, pp. 91-102, doi: 10.1016/j.neuroimage.2014.04.055.

Hu, K. and O’Connor, P. (2018), “Learning a representation map for robot navigation using deep variational autoencoder”.

Kingma, D.P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I. and Welling, M. (2016), “Improving variational inference with inverse autoregressive flow”, NIPS.

Kingma, D.P. and Welling, M. (2013), “Auto-encoding variational bayes”, arXiv preprint arXiv:1312.6114.

Krishnan, R., Subedar, M. and Tickoo, O. (2018), “BAR: Bayesian activity recognition using variational inference”, arXiv preprint arXiv:1811.03305.

Ning, X. and Karypis, G. (2011), “SLIM: sparse linear methods for top-N recommender systems”, 2011 IEEE 11th International Conference on Data Mining, IEEE, pp. 497-506.

Ning, X. and Karypis, G. (2012), “Sparse linear methods with side information for top-n recommendations”, Proceedings of the sixth ACM Conference on Recommender systems, ACM, pp. 155-162.

Schedl, M., Zamani, H., Chen, C.W., Deldjoo, Y. and Elahi, M. (2018), “Current challenges and visions in music recommender systems research”, International Journal of Multimedia Information Retrieval, Vol. 7 No. 2, pp. 95-116, doi: 10.1007/s13735-018-0154-2.

Sedhain, S., Menon, A.K., Sanner, S. and Xie, L. (2015), “Autorec: autoencoders meet collaborative filtering”, Proceedings of the 24th International Conference on World Wide Web, ACM, pp. 111-112.

Shen, X., Su, H., Niu, S. and Demberg, V. (2018), “Improving variational encoder-decoders in dialogue generation”.

Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. and Manzagol, P.A. (2010), “Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion”, Journal of Machine Learning Research, Vol. 11 No. 12, pp. 3371-3408.

Wang, C. and Blei, D.M. (2011), “Collaborative topic modeling for recommending scientific articles”, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp. 448-456.

Wang, H., Wang, N. and Yeung, D.Y. (2015), “Collaborative deep learning for recommender systems”, Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 1235-1244.

Wang, P. and Blunsom, P. (2015), “Stochastic collapsed variational inference for hidden markov models”, Statistics.

Wu, Y., Dubois, C., Zheng, A.X. and Ester, M. (2016), “Collaborative denoising auto-encoders for top-n recommender systems”, Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, ACM, pp. 153-162.

Yan, C., Yan, H., Zhang, Q. and Wan, Y. (2018), “NSPD: an N-stage purchase decision model for e-commerce recommendation”, Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, Springer, Cham, pp. 149-164.

Corresponding author

Yilei Wang can be contacted at: 2736648669@qq.com
