An improved algorithm for personalized recommendation on MOOCs

Purpose – In the past few years, millions of people have started to acquire knowledge from Massive Open Online Courses (MOOCs). MOOCs contain a massive number of video courses produced by instructors, and learners all over the world can access these courses via the internet. However, faced with so many courses, learners often waste much time finding courses they like. This paper aims to explore how to make accurate personalized recommendations for MOOC users.
Design/methodology/approach – This paper proposes a multi-attribute weight algorithm based on collaborative filtering (CF) to select a recommendation set of courses for target MOOC users.
Findings – The recall of the proposed algorithm is higher than that of both the traditional CF and a CF-based algorithm, the uncertain neighbors' collaborative filtering recommendation algorithm. The higher the recall is, the more accurate the recommendation result is.
Originality/value – This paper reflects the target users' preferences for the first time by calculating separately the weight of the attributes and the weight of the attribute values of the courses.

1. Introduction
1.1 Background and related work
Recent years have witnessed the rapid development of computers and the internet, changing many aspects of people's daily lives, such as the way they receive education. People all over the world can access a massive number of quality courses through the internet. These online courses are produced by instructors who are professionals in specific fields. They combine pictures, voice and flash, which greatly arouses learners' interest (Al-Atabi and Deboer, 2014). This distance education method is called Massive Open Online Courses (MOOCs). MOOCs have benefited tens of millions of students all over the world. However, when learners are faced with so many courses, they often need to spend much time looking for the course they really like. The same happens in other situations in an age when resources are increasing exponentially. Thus, personalized recommendation has become a research focus in both academia and industry.
Recommendation technology was first proposed in the early days of e-commerce. Early recommendation systems treated all users the same and recommended the same resources to every user. Such systems were clearly inefficient because they ignored the differences between users. As time went by, personalized recommendation technology emerged to treat every user as a specific person and to recommend resources according to that user's preferences. Nowadays, there are three major recommendation approaches: collaborative filtering (CF) (Felfernig et al., 2014; Bobadilla et al., 2012; Jeong et al., 2010), content-based filtering (Pazzani and Billsus, 1997; Adomavicius and Tuzhilin, 2013; Serrano-Guerrero et al., 2011) and knowledge-based recommendation (Felfernig et al., 2006). In this paper, we focus on improving CF because it is simple and obtains better recommendation results than the other recommendation algorithms (De Campos et al., 2010; Herlocker et al., 1999). CF has two main problems: one is the sparse matrix problem (Papagelis et al., 2005) and the other is the cold start problem (Herlocker et al., 2004).
In the academic field, many scholars have studied CF technology and made great progress. Sarwar et al. (2002) propose a singular value decomposition (SVD) method to alleviate the influence of the sparse matrix problem on CF. Ortega et al. (2013) use Pareto dominance in a pre-filtering process to eliminate less representative users while retaining the most promising neighbors. Huang et al. (2010) propose a CF-based prediction algorithm named the uncertain neighbors' collaborative filtering recommendation algorithm (UNCF), which selects a trustworthy subset from the original neighbor set. Guo and Deng (2007) propose a new personalized recommendation technology based on the traditional CF algorithm to improve e-commerce recommender systems. Zhang et al. (2009) present an approach that computes the similarity between genres to lower the sparsity of the user-item score matrix. Feng et al. (2004) also try to lower the sparsity of the user-item score matrix by using a revised conditional probability expression.
In industry, CF technology was first used in the GroupLens system, which aims to recommend news that matches users' preferences (Konstan et al., 1997). In the early twenty-first century, CF was applied to e-commerce systems such as Amazon, eBay and Taobao, and it helped these systems achieve great success. Apart from e-commerce, CF technology has been applied to Video Recommender, a movie recommendation system developed by Bellcore, and to music communities such as Last.fm.

Summary of content and contributions
Although some aspects of the CF have been improved by many scholars, the CF can still be improved in other aspects. In this paper, we combine the basic idea of the CF with the structural attributes of MOOC videos to propose a new algorithm for personalized recommendation. The results of this paper can be used in other fields as well.
The rest of this paper is organized as follows. First, we introduce the basic idea of the CF technology and its process. We analyze the process of the CF and identify its advantages and the shortcomings that have not been completely resolved. Second, we propose a multi-attribute weight algorithm (MAWA) that uses the attribute weight and the attribute value weight to capture users' preferences accurately. Finally, we conduct an experiment to verify that the MAWA's recommendation results are more accurate than those of the traditional CF and the UNCF proposed in Huang et al. (2010).
Our main contributions include two aspects: (1) using attribute weight and attribute value weight to reflect the users' preferences in both coarse granularity and fine granularity; and (2) making up for the two main shortcomings of CF, whereas many other improved approaches solve only one of them.

Collaborative filtering technology
In this part, we introduce the CF algorithm, describe its procedure and analyze its advantages and shortcomings.

2.1 Introduction to collaborative filtering technology
The CF algorithm was proposed in the 1990s to make up for the shortcomings of the content filtering algorithm. Since the CF algorithm appeared, CF technology has been a research hotspot because of its solid theoretical basis. CF technology supposes that users' preferences will not change over time. Typically, CF can be divided into user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF). The difference between them is that the UBCF tries to find a collection of similar users and recommends the historical resources of the target user's similar users, whereas the IBCF tries to find a collection of similar resources and recommends the resources which are similar to those in the target user's resource history. CF technology can therefore find new resources that are similar to the target user's resource history, which means that it can detect potential resources that the target user may be interested in.

Similarity measurement
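Because both user-based collaborative filtering and item-based collaborative filtering need to measure similarity, we introduce the three most popular similarity measures below, taking user-based collaborative filtering as the example. The user-item score matrix $R$ is an $m \times n$ matrix, which means there are $m$ users and $n$ items. $R_{u,c}$ is the score that user $u$ gives to item $c$; below, $R_{i,c}$ denotes the score that user $u_i$ gives to item $c$.
2.2.1 Cosine similarity. Each user's scores for the items are treated as an n-dimensional vector, and the cosine of the angle between two users' score vectors represents the similarity between the users. The cosine similarity formula is as follows:
$$\mathrm{sim}(u_i, u_j) = \cos(\vec{u}_i, \vec{u}_j) = \frac{\sum_{c=1}^{n} R_{i,c} R_{j,c}}{\sqrt{\sum_{c=1}^{n} R_{i,c}^{2}}\,\sqrt{\sum_{c=1}^{n} R_{j,c}^{2}}} \tag{1}$$
In equation (1), $\mathrm{sim}(u_i, u_j)$ represents the similarity between user $u_i$ and user $u_j$.
2.2.2 Modified cosine similarity. The cosine similarity has the shortcoming that a user may not have scored an item, which introduces errors into the result. To make up for this, the modified cosine similarity is proposed. Users $u_i$ and $u_j$ have scored two item sets $I_i$ and $I_j$, respectively. The items in both $I_i$ and $I_j$ form a set $I_{ij}$. The modified cosine similarity formula is as follows:
$$\mathrm{sim}(u_i, u_j) = \frac{\sum_{c \in I_{ij}} (R_{i,c} - \bar{R}_i)(R_{j,c} - \bar{R}_j)}{\sqrt{\sum_{c \in I_i} (R_{i,c} - \bar{R}_i)^{2}}\,\sqrt{\sum_{c \in I_j} (R_{j,c} - \bar{R}_j)^{2}}} \tag{2}$$
In equation (2), $\bar{R}_i$ is the average score that $u_i$ has given to the items scored by $u_i$.
2.2.3 Pearson correlation coefficient. The set $I_{ij}$ contains the items that both $u_i$ and $u_j$ have scored. The Pearson correlation coefficient formula is as follows:
$$\mathrm{sim}(u_i, u_j) = \frac{\sum_{c \in I_{ij}} (R_{i,c} - \bar{R}_i)(R_{j,c} - \bar{R}_j)}{\sqrt{\sum_{c \in I_{ij}} (R_{i,c} - \bar{R}_i)^{2}}\,\sqrt{\sum_{c \in I_{ij}} (R_{j,c} - \bar{R}_j)^{2}}} \tag{3}$$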
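The three similarity measures can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes that a score of 0 marks an item the user has not scored, that each user's average is taken over the items that user has scored, and the function names are hypothetical.

```python
import math


def cosine_similarity(ri, rj):
    """Cosine between two full score vectors (a 0 entry means 'not scored')."""
    dot = sum(a * b for a, b in zip(ri, rj))
    norm = math.sqrt(sum(a * a for a in ri)) * math.sqrt(sum(b * b for b in rj))
    return dot / norm if norm else 0.0


def mean_scored(r):
    """Average of the scores a user has actually given (zeros are skipped)."""
    scored = [x for x in r if x != 0]
    return sum(scored) / len(scored) if scored else 0.0


def _centered_similarity(ri, rj, common_only):
    mi, mj = mean_scored(ri), mean_scored(rj)
    # Items scored by both users (the set I_ij in the text).
    common = [(a, b) for a, b in zip(ri, rj) if a != 0 and b != 0]
    num = sum((a - mi) * (b - mj) for a, b in common)
    if common_only:
        # Pearson: both denominators run over the co-scored items only.
        di = sum((a - mi) ** 2 for a, _ in common)
        dj = sum((b - mj) ** 2 for _, b in common)
    else:
        # Modified cosine: each denominator runs over that user's own items.
        di = sum((a - mi) ** 2 for a in ri if a != 0)
        dj = sum((b - mj) ** 2 for b in rj if b != 0)
    den = math.sqrt(di) * math.sqrt(dj)
    return num / den if den else 0.0


def modified_cosine_similarity(ri, rj):
    return _centered_similarity(ri, rj, common_only=False)


def pearson_similarity(ri, rj):
    return _centered_similarity(ri, rj, common_only=True)
```

The only difference between the last two functions is the item set used in the denominators, which is why they share one helper.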

Process of user-based collaborative filtering algorithm
The main process of the UBCF algorithm can be divided into three stages.
2.3.1 Establish the user model. Use an $m \times n$ matrix $R$ to represent the user-item score matrix, containing $m$ users and $n$ items. Each element of the matrix is the score that the user has given to the item, ranging from 0 to 10. An element of 0 means that the user has not used the item or has not given it a score.
2.3.2 K nearest neighbor query. First, use one of the similarity measures to compute the similarities between the target user and the other users. Second, select the k users with the k highest similarity degrees to be the neighbors of the target user. Finally, form a neighbor set from the k selected users in non-increasing order of similarity degree.
2.3.3 Generate recommendations. Use the neighbor set from Step 2 to predict the user's score for each item, and recommend the N items with the top N predicted scores. The predicted scores are calculated by equation (4), in which $\bar{r}_u$ denotes u's average score for all items and $N_u$ is u's neighbor set.
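The three stages can be sketched as follows. This is a minimal sketch under stated assumptions rather than the paper's exact procedure: it uses plain cosine similarity on full score rows, and it assumes the widely used mean-centered prediction rule, in which the target user's average is corrected by the similarity-weighted deviations of the neighbors' scores from their own averages, normalized by the sum of absolute similarities. The function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_scored(row):
    scored = [x for x in row if x != 0]
    return sum(scored) / len(scored) if scored else 0.0


def predict_scores(R, u, k):
    """Predict scores for the items user u has not scored yet (0 = unscored)."""
    means = [mean_scored(row) for row in R]
    # Stage 2: k nearest neighbors in non-increasing order of similarity.
    sims = sorted(((cosine(R[u], R[v]), v) for v in range(len(R)) if v != u),
                  reverse=True)
    neighbors = sims[:k]
    preds = {}
    # Stage 3: mean-centered weighted prediction (assumed form).
    for c in range(len(R[u])):
        if R[u][c] != 0:
            continue
        num = den = 0.0
        for s, v in neighbors:
            if R[v][c] != 0:  # only neighbors who scored item c contribute
                num += s * (R[v][c] - means[v])
                den += abs(s)
        if den > 0:
            preds[c] = means[u] + num / den
    return preds


def recommend(R, u, k, n):
    """Return the indices of the n items with the highest predicted scores."""
    preds = predict_scores(R, u, k)
    return sorted(preds, key=preds.get, reverse=True)[:n]
```

For example, with `R = [[5, 4, 0, 0], [5, 4, 3, 0], [1, 0, 0, 5]]`, calling `recommend(R, 0, 2, 1)` ranks the two items that user 0 has not scored and returns the better one.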

The advantages and shortcomings of collaborative filtering
Both the UBCF and the IBCF have achieved great success in many fields in practice. Based on the theory and process of CF introduced above, the advantages and shortcomings of CF are summarized as follows:
2.4.1 The advantages of collaborative filtering
CF is easy. As we can see from this paper, CF only needs a user-item score matrix and performs simple calculations on this matrix, so it is very easy to apply in practice.
CF can be applied to many different fields. CF can be applied to both structured and unstructured resources, so it can be applied widely.
CF can detect users' new interest points. As mentioned above, CF can recommend new resources to target users and thus help users discover new interest points.

The shortcomings of collaborative filtering
Cold start problem: The CF generates the recommendation set according to the resources that users have used and scored. A resource can therefore be recommended to other users only if it has been used and scored by at least one user. A new resource has not been scored by any user, so it has no chance of being included in the recommendation set.
Sparse matrix problem: As mentioned above, the CF operates on the user-item score matrix. The matrix is sparse because the average number of items that one user has used is less than 1 per cent of the total and many users do not score items on their own initiative. As a result, the similarity measurements between users are inaccurate and the neighbor set is unreliable, resulting in low recommendation efficiency.

Multi-attribute weight algorithm
In Section 2, we introduce CF technology and analyze its shortcomings. To overcome these shortcomings, we propose MAWA.

Introduction to multi-attribute weight algorithm
Suppose that an item can be described by some constant attributes and that the target user likes an item because it has some attributes that the user likes. Based on these two assumptions, we recommend resources to the target user according to their attributes. Building on CF technology and these assumptions, we propose the MAWA, which uses an attribute weight to represent the user's preference for an attribute and an attribute value weight to represent the user's preference for an attribute value under that attribute. The MAWA operates on the user-attribute value matrix R. The M × N matrix R means that there are M users and N attribute values for an attribute. Each element of R is the number of times that the user has visited resources with the corresponding attribute value. If the item can be described by T constant attributes, there will be T matrices in total.
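As an illustration of how the T user-attribute value matrices could be assembled from raw visit records, consider the sketch below. The record layout, the function name and the sample values (`v1`, `P1` and so on) are hypothetical and chosen only for the example; each matrix is stored as nested dictionaries of counts rather than as a dense array.

```python
from collections import defaultdict


def build_attribute_matrices(visits, resources, attributes):
    """Count, per attribute, how often each user visited each attribute value.

    visits:     iterable of (user, resource_id) pairs
    resources:  {resource_id: {attribute: attribute_value}}
    attributes: list of the T attribute names
    Returns {attribute: {user: {attribute_value: visit_count}}}.
    """
    matrices = {a: defaultdict(lambda: defaultdict(int)) for a in attributes}
    for user, rid in visits:
        for a in attributes:
            matrices[a][user][resources[rid][a]] += 1
    return matrices


# Hypothetical toy data: three videos described by two attributes.
resources = {
    "v1": {"language": "English", "platform": "P1"},
    "v2": {"language": "English", "platform": "P2"},
    "v3": {"language": "Chinese", "platform": "P2"},
}
visits = [("u1", "v1"), ("u1", "v2"), ("u2", "v3")]
matrices = build_attribute_matrices(visits, resources, ["language", "platform"])
```

Because the counts are keyed by attribute value rather than by item, a newly added video with `language = "English"` immediately shares a column with the older videos.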
The MAWA makes up for the shortcomings of CF. With regard to CF's cold start problem, in which new resources have no chance of being recommended, the MAWA uses attribute values instead of items in its basic matrices. Provided that the number of old resources is large enough, a new resource shares at least one attribute value with the old resources. Because the MAWA recommends a resource according to its attribute values instead of the entire resource, new resources containing old attribute values have the chance to be recommended. Thus, the MAWA does not have the cold start problem. As to the sparse matrix problem, because the number of attributes and the number of attribute values are both constant and far smaller than the number of items, the dimension of the MAWA's basic matrix is far smaller than that of the CF's basic matrix. On the same data set, the MAWA's lower-dimensional basic matrix is far denser than the CF's basic matrix. As the number of resources increases, the MAWA's basic matrix gets denser, whereas the CF's basic matrix gets sparser. Thus, the MAWA does not have the sparse matrix problem.
The MAWA operates on the user-attribute value matrices to find every attribute value's neighbor attribute values by means of CF, and it recommends the resources which have the target user's favorite attributes and attribute values or their neighbors.
3.1.1 Process of multi-attribute weight algorithm. The main process of MAWA is as follows:
Calculate the similarity between attribute values. If the item can be described by T constant attributes, then there will be T user-attribute value matrices $R_1, R_2, \ldots, R_S, \ldots, R_T$. For each user-attribute value matrix $R_S$, use the cosine similarity measurement to calculate the similarity between each attribute value and the other attribute values of attribute S:
$$\mathrm{sim}(S_a, S_b) = \frac{\sum_{u=1}^{M} R_{u,S_a} R_{u,S_b}}{\sqrt{\sum_{u=1}^{M} R_{u,S_a}^{2}}\,\sqrt{\sum_{u=1}^{M} R_{u,S_b}^{2}}} \tag{5}$$
In equation (5), $\mathrm{sim}(S_a, S_b)$ represents the similarity between attribute values $S_a$ and $S_b$, and $R_{u,S_a}$ is the number of times that user U has visited resources with attribute value $S_a$.
Find the neighbor set of attribute value. Select the K attribute values with the top K similarity degrees to be the neighbors and form a neighbor set from the K selected attribute values in non-increasing order of similarity degree.
Get the recommended attribute value set. For every attribute value, predict the number of times that the target user uses it. Select the L attribute values with the top L prediction values to form a recommended attribute value set. The prediction is given by equation (6), in which $P_{u,S_a}$ represents the predicted number of times that user U uses attribute value $S_a$, $\bar{P}_{S_a}$ is the average number of times that all users use attribute value $S_a$ and C is the neighbor set of attribute value $S_a$.
Calculate the weight of attribute values. For each attribute value, calculate its weight for target user U. If the attribute value is not included in the recommended attribute value set of target user U, its weight is 0. The weight of attribute values is defined by equation (7), in which $W_{u,S_a}$ represents the weight of attribute value $S_a$ for user U and $L_S$ is the recommended attribute value set of attribute S.
Calculate the weight of attributes. For target user U, calculate the standard deviation for each attribute to represent the distribution of the user's interests. The bigger the standard deviation is, the more concentrated the target user's interests are. The weight of attributes is given by equation (8), in which $Q_{u,S}$ represents the standard deviation of attribute S for target user U, $\bar{R}_{u,S}$ is user U's average number of uses over the attribute values of attribute S and $n_S$ is the number of attribute values of attribute S.
Generate the recommendation set. For target user U, calculate the recommendation value of each resource over all attributes. Select the N resources with the highest N recommendation values to form a recommendation set, and recommend the resources in the recommendation set to the target user. The recommendation value is given by equation (9), in which $V_{u,r}$ represents resource r's recommendation value for user U and $M_S$ is a controlling coefficient that can help modify the attribute weight to achieve a better recommendation efficiency.
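Parts of this process can be sketched in Python under clearly stated assumptions. The similarity function follows the cosine measure named in the first step. The attribute weight is taken here as the population standard deviation (dividing by the number of attribute values), which is an assumption about the normalization. The last function assumes, purely for illustration, that a resource's recommendation value is a sum over attributes of controlling coefficient × attribute weight × attribute value weight; this combination rule is a guess at one plausible form and should not be read as the paper's equation (9). All function names are hypothetical.

```python
import math


def attribute_value_similarity(R, a, b):
    """Cosine similarity between columns a and b of a user x attribute-value
    count matrix R (a list of rows, one row per user)."""
    dot = sum(row[a] * row[b] for row in R)
    norm = (math.sqrt(sum(row[a] ** 2 for row in R))
            * math.sqrt(sum(row[b] ** 2 for row in R)))
    return dot / norm if norm else 0.0


def attribute_weight(counts):
    """Standard deviation of one user's visit counts over the values of one
    attribute (ASSUMPTION: population form, dividing by len(counts))."""
    n = len(counts)
    mean = sum(counts) / n
    return math.sqrt(sum((c - mean) ** 2 for c in counts) / n)


def recommendation_value(resource, value_weights, attr_weights, coefficients):
    """ASSUMED combination rule, for illustration only:
    sum over attributes S of M_S * Q_S * W(value of S in this resource).
    Attribute values outside the recommended set get weight 0."""
    total = 0.0
    for s, m in coefficients.items():
        w = value_weights[s].get(resource[s], 0.0)
        total += m * attr_weights[s] * w
    return total
```

A user whose visits are split evenly over two languages, `attribute_weight([2, 2])`, gets weight 0 for the language attribute, whereas a user who only watches one language, `attribute_weight([4, 0])`, gets weight 2, matching the idea that a larger standard deviation signals more concentrated interests.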

Experimental results and discussion
In Sections 2 and 3, we introduced the CF technology and the MAWA, respectively. In this part, we compare the efficiencies of the algorithms using a real data set from MOOC College.

Data set and evaluating indicator
In this paper, we use a web crawler to obtain a real data set from MOOC College. The data set contains 978 MOOC videos, 30 users and 2026 score records. The density of the user-item score matrix, that is, the fraction of entries that are filled, is 6.9 per cent. We select four constant attributes, namely, language, platform, school and video score, to represent the MOOC videos. In more detail, the languages are Chinese and English, the videos come from 18 different platforms and 73 different schools, and the scores vary from 0 to 10. Recall, a measure from the field of information retrieval, can be used to evaluate the effect of the CF, the UNCF and the MAWA. The higher the recall is, the more accurate the recommendation results are. First, a division coefficient a is used to divide the raw data into two parts, the training set and the test set. Here, a ranges from 0.1 to 0.9 in steps of 0.1: 2026 × a of the score records are regarded as the training set and the rest as the test set. Second, use the CF, the UNCF and the MAWA, respectively, to generate the recommendation set. Finally, use equation (10) to calculate the recall rate.
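The evaluation steps can be sketched as follows, assuming the usual information-retrieval definition of recall: the share of test-set items that also appear in the recommendation set. The split shown simply cuts the record list in order and rounds the cut position; whether the original experiment splits in order or at random is not stated, so both choices are assumptions, as are the function names.

```python
def split_records(records, a):
    """Split score records into (training, test) using division coefficient a.
    ASSUMPTION: records are cut in their given order at round(len * a)."""
    cut = round(len(records) * a)
    return records[:cut], records[cut:]


def recall(recommended, test_items):
    """Fraction of the test items that were actually recommended."""
    test = set(test_items)
    if not test:
        return 0.0
    return len(set(recommended) & test) / len(test)


# Fill rate of the user-item score matrix from the figures in the text:
# 2026 score records over 30 users x 978 videos.
density = 2026 / (30 * 978)
```

The `density` value works out to about 0.069, which matches the 6.9 per cent reported for the data set.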

Experimental environment
First, we use Python 2.7.12 to write the web crawler that collects the data set from MOOC College, and the downloaded data is stored in MySQL 5.7.17. Second, we set the controlling coefficients to 5, 5, 2 and 10 for the four attributes. Then we use MATLAB 8.3.0.532 to generate the recommendation sets using the CF, the UNCF and the MAWA and calculate the recall nine times with varying a. As for the coefficients of the UNCF, we choose the coefficients that achieve the best recommendation effectiveness in [22]. Finally, we obtain a curve of recalls and an average recall.

Experimental results and discussion
The experimental results are shown in Table I and Figure 1. The average recalls of the CF, the UNCF and the MAWA are: $\mathrm{Recall}_{CF} = 0.3469$, $\mathrm{Recall}_{MAWA} = 0.4453$. In Figure 1, it is obvious that the MAWA's recall is higher than that of the CF and the UNCF. The average recalls show precisely that the MAWA increases the recall over the CF and the UNCF by 28.3 and 17.9 per cent, respectively. Because the MAWA and the CF are evaluated on the same data set, the result shows that the MAWA can alleviate the influence of the sparse matrix problem and improve the accuracy of recommendation.
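As a quick check of the reported improvement over the CF, the relative gain can be recomputed from the two average recalls given above. This is only a worked arithmetic sketch based on the rounded averages:

```latex
\[
\frac{\mathrm{Recall}_{MAWA} - \mathrm{Recall}_{CF}}{\mathrm{Recall}_{CF}}
  = \frac{0.4453 - 0.3469}{0.3469}
  = \frac{0.0984}{0.3469}
  \approx 0.284
\]
```

The result is roughly 28 per cent, in line with the reported 28.3 per cent; the small difference in the last digit presumably comes from the averages being rounded to four decimal places.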

Conclusion and future work
In this paper, we propose the MAWA to make up for the shortcomings of the CF algorithm. The MAWA uses the attribute weight and the attribute value weight to calculate the resources' recommendation values. Compared with the CF, the MAWA uses matrices with far fewer dimensions, so the MAWA does not have the sparse matrix problem. We take popular MOOC videos as an example and conduct an experiment on a real data set, and the result shows that the MAWA has higher efficiency than the CF and the UNCF. It should be emphasized that the MAWA uses the attribute weight and the attribute value weight separately, which can reflect users' preferences at both coarse and fine granularity.
In the future, we will focus on adding the evolutionary mechanism to make the MAWA suitable for users' changing preferences.
Beijing, China. From 2014 to 2015, she was a Visiting Scholar at the Department of Electrical Engineering, Princeton University, USA. Her research areas include video communication and networking, video coding, channel coding, information theory, optimization, network economics and ubiquitous computing. Wen Ji is the corresponding author and can be contacted at: jiwen@ict.ac.cn
Shiwei Wang received the PhD degree in Control Systems from the Liverpool John Moores University in 2007. He is currently the R&D Director of Weihai Yuanhang Technology Development Co., Ltd., Weihai, China, as an awardee of the Thousand Talents Plan of China. His current research interests include artificial intelligence, machine vision, robotics and the corresponding applications in the food and pharmaceutical industries.
Yiqiang Chen received the BSc and MS degrees from the University of Xiangtan, Xiangtan, China, in 1996 and 1999, respectively, and