A recommender system based on the collaborative filtering approach uses past users' behavior to predict which items the current user would like.
We create a U×M matrix, where U is the number of users and M is the number of distinct items or products. The entry U_ij is the rating given by user i for product j.
In the real world, not every user expresses an opinion about every product. For example, suppose five users, including Bob, have rated four movies as shown in Table 1 below:
Table 1: Movie ratings from five users

|       | movie1 | movie2 | movie3 | movie4 |
|-------|--------|--------|--------|--------|
| user1 | 1      | 3      | 3      | 5      |
| user2 | 2      | 4      | 5      |        |
| user3 | 3      | 2      | 2      |        |
| user4 | 1      | 3      | 4      |        |
| Bob   | 3      | 2      | 5      | ?      |
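To make the example concrete in code, the same table can be represented as a small NumPy array, with NaN marking the ratings we do not have (the array name and the user/movie ordering are just for illustration):

```python
import numpy as np

# Rows: user1..user4 and Bob; columns: movie1..movie4.
# np.nan marks a missing rating, including Bob's unknown rating for movie4.
ratings = np.array([
    [1, 3, 3, 5],
    [2, 4, 5, np.nan],
    [3, 2, 2, np.nan],
    [1, 3, 4, np.nan],
    [3, 2, 5, np.nan],  # Bob; the last entry is the '?' we want to predict
])
```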
Our goal is to predict which movies to recommend to Bob, or, put another way, to decide whether we should recommend movie4 to Bob, given the ratings for these four movies from the other users and from Bob himself.
Traditionally, we could do item-to-item comparison: if a user liked item1 in the past, that user may like other items similar to item1. Another approach is user-to-user comparison: if two users have similar profiles, we can recommend items liked by user1 to other users who resemble user1.
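As a minimal sketch of the user-to-user idea (reusing the hypothetical `ratings` array from above), we can compare Bob's row with each other user's row using cosine similarity over the movies both have rated; the most similar users would then drive the prediction for Bob's missing rating:

```python
import numpy as np

def cosine_on_shared(a, b):
    # Compare two users only on the movies that both have rated.
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

bob = ratings[-1]
similarities = [cosine_on_shared(bob, ratings[i]) for i in range(len(ratings) - 1)]
print(similarities)  # higher value = user rates movies more like Bob does
```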
These are examples of memory-based techniques for recommendation. Model-based techniques, on the other hand, such as SVD, PCA, and probabilistic recommendation, build a model offline. The model is then used to estimate how much a user will like an item, or to recommend the top N items to that user.
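As a small illustration of the model-based idea, a truncated SVD of the rating matrix gives a low-rank model that can score every user-item pair offline. Plain SVD needs a fully filled matrix, so this sketch first fills missing entries with each user's mean rating; the choice of k is arbitrary:

```python
import numpy as np

# Fill missing ratings with each user's mean so SVD can be applied.
filled = np.where(np.isnan(ratings),
                  np.nanmean(ratings, axis=1, keepdims=True),
                  ratings)

# Keep only the top-k singular values: a compact offline "model" of the data.
u, s, vt = np.linalg.svd(filled, full_matrices=False)
k = 2
approx = (u[:, :k] * s[:k]) @ vt[:k, :]

print(approx[-1, -1])  # the model's estimate of Bob's rating for movie4
```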
Alternating least squares (ALS) is a newer and very popular technique for collaborative filtering; matrix factorization methods of this kind played a central role in the entries that won the Netflix Prize. We can convert the above table into a U×M matrix, where U is the number of users and M is the number of products for which users have provided implicit or explicit ratings. An implicit rating is one where, instead of asking users how much they liked a product, we infer the rating from other signals, such as how often the user visited the website, how long the user stayed, or whether the user bought the product. In the real world, this matrix will be very sparse, since not every user reviews or rates every product.
Since many of the entries in the table are empty, including the one marked '?', the goal is to fill in those entries based on the other users' ratings. ALS does this by factoring the U×M matrix into two smaller matrices of size U × n_factor and n_factor × M, where n_factor is the number of latent factors, a property of the system. ALS is an iterative procedure: the factor matrices are initialized to some values and then updated alternately, one while the other is held fixed, by minimizing a regularized cost function (the regularization helps prevent overfitting).
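To make the alternation explicit, here is a minimal NumPy sketch of ALS on the small `ratings` array above; the factor count, regularization strength, and iteration count are arbitrary choices for illustration, not the values Spark uses:

```python
import numpy as np

n_users, n_items = ratings.shape
n_factor, lam = 2, 0.1            # number of latent factors, regularization strength
known = ~np.isnan(ratings)        # mask of observed ratings

rng = np.random.default_rng(0)
user_f = rng.normal(size=(n_users, n_factor))   # U x n_factor
item_f = rng.normal(size=(n_items, n_factor))   # M x n_factor (transpose of n_factor x M)

for _ in range(20):
    # Hold item factors fixed; solve a small regularized least-squares problem per user.
    for u in range(n_users):
        Vu = item_f[known[u]]
        user_f[u] = np.linalg.solve(Vu.T @ Vu + lam * np.eye(n_factor),
                                    Vu.T @ ratings[u, known[u]])
    # Hold user factors fixed; solve per item.
    for i in range(n_items):
        Ui = user_f[known[:, i]]
        item_f[i] = np.linalg.solve(Ui.T @ Ui + lam * np.eye(n_factor),
                                    Ui.T @ ratings[known[:, i], i])

print(user_f[-1] @ item_f[-1])    # estimated value for Bob's missing movie4 rating
```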
Apache Spark has an ALS.trainImplicit() function that takes an RDD of (userId, productId, rating) tuples, a rank value that represents n_factor, and a seed for the random number generator. For example, a call to the function might look like:
```python
model = ALS.trainImplicit(trainData, rank=10, seed=100)
```
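For context, `trainData` above is an RDD of rating triples. A minimal sketch of how it might be built from the observed entries of Table 1 (the SparkContext setup and the integer IDs are assumptions for illustration) is:

```python
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="movie-recommender")

# (userId, productId, rating) triples for the observed cells of Table 1,
# with users numbered 1-5 (Bob is user 5) and movies numbered 1-4.
trainData = sc.parallelize([
    Rating(1, 1, 1.0), Rating(1, 2, 3.0), Rating(1, 3, 3.0), Rating(1, 4, 5.0),
    Rating(2, 1, 2.0), Rating(2, 2, 4.0), Rating(2, 3, 5.0),
    Rating(3, 1, 3.0), Rating(3, 2, 2.0), Rating(3, 3, 2.0),
    Rating(4, 1, 1.0), Rating(4, 2, 3.0), Rating(4, 3, 4.0),
    Rating(5, 1, 3.0), Rating(5, 2, 2.0), Rating(5, 3, 5.0),
])

# trainData can now be passed to ALS.trainImplicit() as shown above.
```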
To find the right value of rank, we can try different values, or we can use cross-validation to find the optimal rank.
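A rough sketch of such a search might hold out part of the data and compare prediction error for each candidate rank; the candidate values, split ratio, and RMSE metric here are illustrative choices, not prescribed by Spark (imports are as in the previous snippet):

```python
# `ratings_rdd` is assumed to be an RDD of Rating objects, as built above.
train, validation = ratings_rdd.randomSplit([0.8, 0.2], seed=42)
val_pairs = validation.map(lambda r: (r.user, r.product))
truth = validation.map(lambda r: ((r.user, r.product), r.rating))

best_rank, best_rmse = None, float("inf")
for rank in [5, 10, 20, 40]:
    candidate = ALS.trainImplicit(train, rank=rank, seed=100)
    preds = candidate.predictAll(val_pairs).map(lambda r: ((r.user, r.product), r.rating))
    rmse = truth.join(preds).map(lambda kv: (kv[1][0] - kv[1][1]) ** 2).mean() ** 0.5
    if rmse < best_rmse:
        best_rank, best_rmse = rank, rmse
```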
To get the top recommendations for a user, call:
```python
# Top five recommended product IDs for the given user
recommended_products = [x.product for x in model.recommendProducts(userId, 5)]
```
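recommendProducts() returns the top-N items for one user. To score a single user-product pair instead, for example Bob (user 5 in the numbering above) and movie4 (product 4), the model's predict() method can be used:

```python
# Predicted preference of user 5 (Bob) for product 4 (movie4)
bob_movie4_score = model.predict(5, 4)
```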