Recommendation System Combining Collaborative Filtering with a Recurrent Neural Network: Final Course Project
Recommendation System using Collaborative Filtering and Recurrent Neural Network
Author: Fu-ze Zhong
Email: [email protected]
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China.
ABSTRACT
User behavior in an e-commerce system can be modeled as a time series, and recurrent neural networks (RNNs) perform well on sequence modeling.
An RNN can therefore be used to mine a user’s behavior pattern from time-series data and to produce recommendations for a given period of time.
This paper proposes a recommendation model based on both the user’s static data and the user’s dynamic data, combining a deep RNN (DRNN) with collaborative filtering (CF) to produce recommendations.
The model captures how the user interacts with the system and makes purchases.
I built a deep recurrent neural network model to solve the real-time recommendation problem. The network tracks how users interact with the recommendation server.
Each hidden layer models how the user accessed a combination of items and the user’s purchase pattern. To reduce the DRNN’s processing cost, the network records only a limited number of states. Once a user’s behavior is updated, the model adjusts itself and refreshes the recommendation results. As user behavior accumulates, the DRNN continues training and produces better recommendations.
The CF method has shown its effectiveness in many practical applications. It captures the correlation between users and items and reveals the common interests of users: users who share the same interests may purchase the same set of items. My DRNN model is a good complement to the CF method. If the user follows an old purchase pattern, the CF method produces good recommendations, and the RNN model can effectively predict the purchase pattern of a particular user. I integrated the recurrent neural network with collaborative filtering, and my model showed a significant improvement over the previous recommendation service.
INTRODUCTION
Collaborative filtering (CF) is widely used in recommendation systems. It relies on correlations between users and items to predict the probability that a user will purchase a particular item. The underlying assumption is that users with similar purchase histories are likely to purchase the same set of items.
Although the CF method works well in some cases, I found that it does not provide accurate real-time recommendations, because its model is built from stale data and lacks customization options. When a user enters the system, I can collect his/her basic information and obtain a complete profile of the user’s preferences for the CF algorithm, but all of this information is stale.
In addition to these basic properties, we have a special type of dynamic property: the timestamp. Timestamps come from the recommendation server, which records the current user’s browsing history as a long list of item records.
This list records how the user interacts with the system and makes purchases, information that was not available to previous CF recommenders.
More accurate real-time recommendations are needed: ones that consider not only the user’s preference history but also the user’s short-term behavior patterns, that flexibly capture hotspots and points of interest, and that can adjust their results during the recommendation process.
Based on the current viewing history and the user’s interests, the system should provide recommendations and continuously refine the recommendation model by inferring what the user really wants to buy.
Here are some challenges:
- Amazon has thousands of items. If each item represents a state, the input to our predictive model is a huge vector indicating which item was accessed.
- Each user’s timestamp list can be thought of as a combination of a random number of states, which makes learning more complex.
- The model uses the user’s timestamps (the item records in the user’s purchase list) as the training set, because they track how the user interacts with our server.
- The training process has to be updated constantly: as long as the server is online, we keep receiving new records from the user, which can be used to further optimize the model and produce better recommendations. That is, the recommendation model should update in real time to reflect the user’s new purchase pattern.
To address these challenges, this paper uses a Deep Recurrent Neural Network (DRNN) to model the user’s browsing pattern and provide real-time recommendation services. The DRNN consists of multiple hidden layers, each with a temporal feedback loop. The input layer represents the timestamped record of the user accessing an item, and through training, the parameters of the hidden layers come to represent the combinations of items that users browse.
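To make the layer structure concrete, here is a minimal forward-pass sketch of a stacked recurrent network in plain Python. It is only an illustration under stated assumptions: the `tanh` activation, the one-hot item input, the softmax output, the layer sizes, and all function names (`init_layer`, `step_layer`, `drnn_forward`) are hypothetical choices, not the exact architecture used in this paper, and training is not shown.

```python
import math
import random

def init_layer(n_in, n_hidden, rng):
    """Create small random weights for one recurrent layer (hypothetical initialization)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"W_in": mat(n_hidden, n_in), "W_rec": mat(n_hidden, n_hidden), "b": [0.0] * n_hidden}

def step_layer(layer, x, h_prev):
    """One time step: h_t = tanh(W_in x_t + W_rec h_{t-1} + b)."""
    h = []
    for row_in, row_rec, b in zip(layer["W_in"], layer["W_rec"], layer["b"]):
        s = b
        s += sum(w * v for w, v in zip(row_in, x))        # input from the layer below
        s += sum(w * v for w, v in zip(row_rec, h_prev))  # temporal feedback loop
        h.append(math.tanh(s))
    return h

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def drnn_forward(layers, W_out, item_sequence, n_items):
    """Feed a sequence of item indices through the stacked layers and
    return a probability for each candidate next item."""
    states = [[0.0] * len(layer["b"]) for layer in layers]
    for item in item_sequence:
        x = [0.0] * n_items
        x[item] = 1.0  # one-hot vector marking the accessed item
        for k, layer in enumerate(layers):
            states[k] = step_layer(layer, x, states[k])
            x = states[k]  # output of this layer feeds the next one
    logits = [sum(w * v for w, v in zip(row, states[-1])) for row in W_out]
    return softmax(logits)

rng = random.Random(0)
n_items, n_hidden = 5, 4
layers = [init_layer(n_items, n_hidden, rng), init_layer(n_hidden, n_hidden, rng)]
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_items)]
probs = drnn_forward(layers, W_out, [0, 3, 1], n_items)
```

With untrained random weights the probabilities are close to uniform; the sketch only shows how each hidden layer carries its own state from one item access to the next.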
My DRNN model is used to track the user’s access pattern. The items predicted by the recommendation system are displayed in the user interface to guide the user to the desired item. The purpose of the DRNN model is to return real-time recommendations, ideally letting the user reach the desired item by the shortest path.
My DRNN model is based on a sliding-window approach that maintains a limited number of states. As the learning process continues, old states are replaced by new ones. Determining the correct window size is critical: larger windows cause excessive computational overhead, while smaller windows reduce prediction accuracy.
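The sliding-window idea can be sketched with a fixed-length queue. The class name `StateWindow` and the window size of 3 are hypothetical and chosen only for illustration; the window size used in the experiments is not implied here.

```python
from collections import deque

class StateWindow:
    """Keeps only the most recent `size` item records of one user."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest entry automatically when full.
        self.records = deque(maxlen=size)

    def add(self, item_id):
        self.records.append(item_id)

    def current(self):
        return list(self.records)

window = StateWindow(size=3)
for item in ["a", "b", "c", "d"]:
    window.add(item)
print(window.current())  # ['b', 'c', 'd']
```

Memory and per-step cost are bounded by the window size, which is the trade-off described above: a larger `size` keeps more history at a higher cost.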
I use a collaborative filtering algorithm that takes the user’s purchase history as input and predicts the probability of purchasing each item. CF is combined with the DRNN to produce the final result. The DRNN can be implemented on the recommendation server, where it updates the neural network parameters as the user continues to interact with the server. The server generates a new list of recommended items from the combined DRNN and CF results, then responds to the customer and displays the new recommendations.
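This section does not specify how the CF and DRNN outputs are merged. As one possible illustration, the sketch below assumes a simple linear blend of per-item scores with a hypothetical weight `alpha`; the function names and the example scores are made up for the example.

```python
def blend_scores(cf_scores, drnn_scores, alpha=0.5):
    """Linear blend (an assumption): alpha weights CF, (1 - alpha) weights DRNN.
    An item missing from one model counts as score 0 for that model."""
    items = set(cf_scores) | set(drnn_scores)
    return {
        i: alpha * cf_scores.get(i, 0.0) + (1 - alpha) * drnn_scores.get(i, 0.0)
        for i in items
    }

def top_n(scores, n):
    # Sort by score descending; break ties by item id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

cf = {"book": 0.9, "snack": 0.2, "tea": 0.4}    # long-term preference
drnn = {"book": 0.1, "snack": 0.8, "tea": 0.5}  # current browsing session
final = blend_scores(cf, drnn, alpha=0.3)
print(top_n(final, 2))  # ['snack', 'tea']
```

A small `alpha` lets the short-term signal dominate, so a user with a long history of book purchases who is currently browsing food is shown food first.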
Here are some contributions:
- Many studies model static data such as user ratings or historical purchase records. Such data reflects the user’s historical interests but is not time-sensitive. For example, the system may identify a user as someone who likes all types of books, while the series of browsing records preceding the user’s purchase indicates that he/she is currently choosing food.
- Collaborative filtering can continuously update the rating matrix to refresh user preferences, but it cannot mine users’ real-time behavior patterns. The DRNN method therefore models the user as a time series in order to select recommendations for the user within a given period of time. Combining different domains into a single learning model can help improve the quality of recommendations in all domains. The recommendation system I created models the user’s static data and dynamic data jointly to produce the recommendation results.
- The two algorithms are integrated to produce the final prediction. Our results on a real dataset show that the combined CF and DRNN approach significantly outperforms the previous CF approach.
The rest of this article is organized as follows. Section II gives an overview of the recommendation module. Section III details the CF and DRNN models and how the results of the two algorithms are combined. Section IV presents the experimental results. Section V summarizes the paper.
OVERVIEW OF MODULE
Personalized recommendation is a key feature for improving the user experience in an e-commerce system. Many companies collect users’ purchase histories and apply CF algorithms offline to generate recommendation results for each user. When the user logs in, the recommendations are pushed to him/her. To keep up with new buying trends, the CF algorithm is run periodically to update the recommendation results with new log data. Unfortunately, the accuracy (the probability that the user ultimately buys the recommended item) is low, and the offline method cannot discover the latest purchase patterns.
To address this challenge, I developed a new recommendation module, the DRNN model. Figure 1 shows the recommendation model architecture I designed. The user’s request is sent to the recommendation server, which can obtain profiles and associations for the user and the items. Timestamps are used as input to the DRNN model to generate real-time predictions. The DRNN model works together with the CF model.
In other words, we consider both the user’s current interests and past interests.
Finally, the server returns the requested list of items to the user together with the recommendation results. As the user continues to interact with the server (and generates more requests), the model improves its predictions. Users are expected to find the items they want among the recommendation results with higher probability.
COLLABORATIVE FILTERING AND DEEP RECURRENT NEURAL NETWORK
After the response is completed, the model obtains the actual outcome of the prediction, that is, whether the user purchased the recommended item. The model can then be adjusted using this new training sample. I create an index on each row’s user ID, item ID, and timestamp. For a specific user, we generate a profile as follows:
- The specific user’s list of item records.
- The timestamp of the last item record in that list.
- A predefined timeout threshold (e.g., 30 minutes).
The specific user’s list of item records is denoted as a sequence whose elements are the individual item records.
The user’s history records are used as the input to the DRNN model to adjust the weights and bias values of the neural network. To reduce the learning cost, I use SGD to update the parameters.
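The profile construction can be sketched as follows. The sketch assumes that the timeout threshold is used to cut a user’s records into sessions whenever the gap between two consecutive records exceeds it; that use of the threshold, the function name `build_profiles`, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict

TIMEOUT_SECONDS = 30 * 60  # the 30-minute example threshold from the text

def build_profiles(rows):
    """rows: iterable of (user_id, item_id, timestamp_in_seconds).
    Returns {user_id: [session, ...]}, each session a list of item ids."""
    by_user = defaultdict(list)
    for user_id, item_id, ts in rows:
        by_user[user_id].append((ts, item_id))

    profiles = {}
    for user_id, records in by_user.items():
        records.sort()  # chronological order
        sessions, last_ts = [], None
        for ts, item_id in records:
            # Assumption: a gap longer than the threshold starts a new session.
            if last_ts is None or ts - last_ts > TIMEOUT_SECONDS:
                sessions.append([])
            sessions[-1].append(item_id)
            last_ts = ts
        profiles[user_id] = sessions
    return profiles

rows = [("u1", "a", 0), ("u1", "b", 600), ("u1", "c", 4000), ("u2", "a", 100)]
profiles = build_profiles(rows)
print(profiles["u1"])  # [['a', 'b'], ['c']]
```

Each resulting session is a short item sequence that could be fed to the recurrent model as one training example.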
ITEM-BASED COLLABORATIVE FILTERING
The recommendation system essentially solves the problem of information overload by connecting users with information when the user’s needs are not clear. On the one hand, it helps users find information that is valuable to them; on the other hand, it lets information be shown to the users who are interested in it. This achieves a win-win situation for information consumers and information producers. (The meaning of “information” here is broad, covering news, movies, and commodities, collectively referred to as items.)
Collaborative filtering is mainly divided into neighborhood-based models and latent factor (implicit semantic) models. Among the neighborhood-based algorithms, item-based CF is widely used. Its main idea is that “users who like item A mostly also like item B”; it uses collective intelligence to generate a recommendation list of items by mining the log of users’ historical operations.
The principle is to make recommendations by comparing a user’s data with that of other users. The comparison is done by calculating the similarity between the two users’ data. The similarity function should be designed to satisfy the three requirements of a metric space: non-negativity, symmetry, and the triangle inequality. Commonly used similarity measures are Euclidean distance, the Pearson correlation coefficient, and cosine similarity.
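The three measures named above can be written in a few lines each. This is a minimal sketch on plain Python lists of equal length; the function names and the sample rating vectors are hypothetical, and missing ratings are not handled.

```python
import math

def euclidean_distance(x, y):
    """Smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Cosine of the angle between the two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def pearson_correlation(x, y):
    """Cosine similarity of the mean-centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

u = [5, 3, 4]  # ratings of three items by one user
v = [4, 2, 3]  # the same pattern, shifted down by one point
print(euclidean_distance(u, v), cosine_similarity(u, v), pearson_correlation(u, v))
```

In this example the two users rank the items identically but one rates consistently lower, so the Pearson correlation is 1 while the Euclidean distance is nonzero; this is why mean-centered measures are often preferred for rating data.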
The basic idea of user-based CF is as follows: if user A likes item a, user B likes items a, b, and c, and user C likes items a and c, then user A is considered similar to users B and C because they all like a. Since the users who like a also like c, c is recommended to user A. The algorithm uses the nearest-neighbor algorithm to find a set of neighbors for a user; the users in this set have preferences similar to the target user’s, and the algorithm makes predictions for the user according to the neighbors’ preferences.
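The A/B/C example can be reproduced with a small sketch. Jaccard similarity on sets of liked items is used here only because the example data is binary; it is an assumption, not one of the measures listed earlier, and the function names and neighbor count `k` are hypothetical.

```python
likes = {"A": {"a"}, "B": {"a", "b", "c"}, "C": {"a", "c"}}

def jaccard(s, t):
    """Overlap of two item sets divided by the size of their union."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0

def recommend_user_based(target, likes, k=2):
    # 1. Similarity between the target user and every other user.
    sims = {u: jaccard(likes[target], items) for u, items in likes.items() if u != target}
    # 2. Keep the k nearest neighbors.
    neighbors = sorted(sims, key=lambda u: (-sims[u], u))[:k]
    # 3. Score unseen items by the summed similarity of the neighbors who like them.
    scores = {}
    for u in neighbors:
        for item in likes[u] - likes[target]:
            scores[item] = scores.get(item, 0.0) + sims[u]
    return sorted(scores, key=lambda i: (-scores[i], i))

print(recommend_user_based("A", likes))  # ['c', 'b']
```

Item c ranks first because both neighbors like it, which matches the reasoning in the example.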
There are two major problems with the user-based algorithm: 1. Data sparsity. A large e-commerce recommendation system generally has a large number of items, of which a user may buy less than 1%. The overlap between the items purchased by different users is low, so the algorithm may be unable to find a user’s neighbors, that is, users with similar preferences. 2. Algorithm scalability. The computational cost of the nearest-neighbor algorithm grows with the number of users and items, which makes it unsuitable when the amount of data is large.
The basic idea of item-based CF is to calculate the similarity between items in advance from the historical preference data of all users, and then recommend to the user items similar to those the user already likes. In the previous example, items a and c are very similar because users who like a also like c; since user A likes a, c is recommended to user A.
Because the similarity between items is relatively stable, the similarity between different items can be calculated offline in advance and stored in a table. When a recommendation is made, the table is looked up to calculate the user’s likely scores, so both of the problems above can be addressed at once.
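The offline table and the online lookup can be sketched as two separate functions. The sketch reuses the A/B/C example and assumes binary like-data with cosine similarity between items; the function names are hypothetical.

```python
import math
from itertools import combinations

likes = {"A": {"a"}, "B": {"a", "b", "c"}, "C": {"a", "c"}}

def build_item_similarity(likes):
    """Offline step: cosine similarity between items over binary like-vectors."""
    users_of = {}
    for user, items in likes.items():
        for item in items:
            users_of.setdefault(item, set()).add(user)
    table = {item: {} for item in users_of}
    for i, j in combinations(sorted(users_of), 2):
        common = len(users_of[i] & users_of[j])
        if common:
            sim = common / math.sqrt(len(users_of[i]) * len(users_of[j]))
            table[i][j] = table[j][i] = sim
    return table

def recommend_item_based(user, likes, table):
    """Online step: only table lookups, no similarity computation."""
    scores = {}
    for liked in likes[user]:
        for other, sim in table[liked].items():
            if other not in likes[user]:
                scores[other] = scores.get(other, 0.0) + sim
    return sorted(scores, key=lambda i: (-scores[i], i))

table = build_item_similarity(likes)
print(recommend_item_based("A", likes, table))  # ['c', 'b']
```

The expensive pairwise computation happens once in `build_item_similarity`; serving a request then costs only a few dictionary lookups per item the user likes.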
Detailed process of the item-based algorithm:
1. Similarity calculation: the item-based algorithm first calculates the similarity between items. There are several ways to calculate similarity:
(1) Cosine-based similarity: the similarity between two items is the cosine of the angle between their rating vectors.
(2) Correlation-based similarity: the Pearson-r correlation between the two rating vectors is calculated as
$$
\frac{\sum_{u \in U}(R_{u,i}-\bar{R}_{i})(R_{u,j}-\bar{R}_{j})}
{\sqrt{\sum_{u \in U}(R_{u,i}-\bar{R}_{i})^{2}} \cdot \sqrt{\sum_{u \in U}(R_{u,j}-\bar{R}_{j})^{2}}}
$$
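For comparison, the cosine-based similarity of item (1) can be written in the same notation. The vector symbols $\vec{i}$ and $\vec{j}$, standing for the rating vectors of items i and j over the users in U, are introduced here only for illustration:

$$
\cos(\vec{i},\vec{j}) = \frac{\vec{i}\cdot\vec{j}}{\lVert\vec{i}\rVert \, \lVert\vec{j}\rVert}
= \frac{\sum_{u \in U} R_{u,i} R_{u,j}}{\sqrt{\sum_{u \in U} R_{u,i}^{2}} \cdot \sqrt{\sum_{u \in U} R_{u,j}^{2}}}
$$

The only difference from the Pearson form is that the ratings are not centered on the item averages.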
2. Prediction calculation: weighted sum. The ratings that user u has given to other items are summed, each weighted by the similarity between that item and item i; the result is then divided by the sum of all the similarities to obtain the predicted rating of user u for item i.
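The weighted-sum step can be sketched as below. The sketch assumes the common form in which the denominator is the sum of the absolute similarities; the function name `predict_rating` and the sample values are hypothetical.

```python
def predict_rating(user_ratings, sim_to_target):
    """Weighted-sum prediction for one target item i.

    user_ratings:  {item j: rating given by user u}
    sim_to_target: {item j: similarity between j and the target item i}
    Returns None when no rated item has a known similarity to i.
    """
    rated = [j for j in user_ratings if j in sim_to_target]
    numerator = sum(sim_to_target[j] * user_ratings[j] for j in rated)
    denominator = sum(abs(sim_to_target[j]) for j in rated)
    return numerator / denominator if denominator else None

ratings_u = {"a": 5.0, "b": 3.0}   # what user u has already rated
sim_to_c = {"a": 0.75, "b": 0.25}  # similarity of each rated item to item c
print(predict_rating(ratings_u, sim_to_c))  # 4.5
```

Because item a is three times as similar to c as item b is, the prediction lands closer to the rating of a than to the plain average of 4.0.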