As more and more data become available through the Internet, the average user is often confronted with a situation in which finding what he is looking for, under the pile of available and often misleading information, becomes a struggle. This phenomenon is described as the information overload problem, and it motivated the development of an interesting concept in information retrieval: recommender systems.
Information overload is defined as the problem that occurs when the amount of information to which a user is exposed exceeds his processing capacity (Mulder, 2006). Recommender systems attempt to solve the overload problem by making personalized suggestions about items the user has not yet considered, based on the features of the items themselves, on available information about the preference patterns of the user, or on the patterns of other users (Montaner, 2003).
A recommender system can be conceived as an advisor that helps the user navigate through the chaotic information space and alleviates his difficulty in choosing the items that are most appropriate for him among all the available options.
My motivation for working in the field of recommender systems was the Netflix Prize competition (http://www.netflixprize.com/). Netflix is an online movie rental store, and in 2006 it announced a competition in which developers were called to build a recommender system that would improve the accuracy of its existing system by 10% or more. Although I did not have the chance to get actively involved in the competition, it became the seed that led to the idea of this project.
Witnessing the interest of the research community in response to the Netflix Prize, and the constructive competition that followed, inspired me to undertake this project in order to investigate this challenging field further and put my own ideas to the test.
Project aim and objectives
The aim of this project is to build a recommender system and make it competitive with existing approaches by incorporating appropriate techniques to improve its performance. In order to achieve this aim, the following objectives have to be accomplished:
- Investigate, through the literature review, the different methodologies proposed for enhancing the performance of recommender systems.
- Identify techniques that, when applied to the developed system, can lead to improved performance.
- Evaluate the effectiveness of these techniques by designing an experimental setup that meets the standards set in the literature and testing the system's performance using appropriate metrics.
The rest of this document is organized as follows:
Chapter 2, Literature Review, provides background information on recommender systems. The variations in implementation that arise from the implementation context, the common challenges faced, and the methods of evaluating recommender systems are discussed. Finally, in order to demonstrate the variety of techniques used, some of the most up-to-date proposed approaches are presented.
Chapter 3, Experimental Design and Implementation, describes the proposed approach to the recommendation problem. The starting point is a base model that uses classification algorithms to predict ratings over the entire item space. By identifying the weaknesses of the base model, we suggest an alternative approach. The chapter also presents the details of the experimental procedure followed in order to test the performance of both the base and the proposed systems, along with the experimental results for the base system, which set the benchmark for comparison.
Chapter 4, Results and Evaluation, describes in detail the implementation decisions taken during the development of our approach and the way they affect the performance of the system. The results of the experiments for the proposed system are presented, followed by a discussion of their interpretation. Finally, the proposed approach is compared with the base system in order to evaluate the effectiveness of our technique.
Chapter 5, Conclusions, concludes the project by discussing what was accomplished in the research conducted and by outlining directions for future work.
This chapter provides background information on key points pertaining to recommender systems. It discusses how the implementation context dictates variations in approach, the common challenges met regardless of the implementation details, and the metrics used to measure the performance of recommender systems. Finally, some of the most up-to-date proposed approaches are presented in order to demonstrate the variety of techniques used.
Context of implementation
Recommender systems have been developed and operate under very different implementation contexts, from recommending music to suggesting banking products. While the main idea behind each of those systems remains the same, namely to filter the information space and present to the user the items that most closely satisfy his needs and taste, the different implementation contexts lead to a number of variations in the techniques used.
Recommending in a community of users
Recommending content within the boundaries of a group of users sharing the same information space is the kind of recommender system considered the most traditional. A number of users form a virtual community involved in a common subject and express their opinions on different items of their interest space through explicit ratings. Representatives of this category are recommendation systems for movies, such as MovieLens and IMDB, for music, such as Last.fm and Pandora, or for books, such as WhatShouldIReadNext.
The main characteristic of implementations in this area is that they take advantage of the existence of explicit ratings from the users for the items. Based on these ratings, the system can form an opinion about the preferences of each individual and base its recommendations on a user profile built upon this information (Mukherzee, 2003). The recommendation can come from the user's own preferences, in the content-based approach (we suggest you will like item A because in the past you liked item B, and items A and B are similar); from the preferences of other users, in the collaborative filtering approach (we suggest you will like item A because other users with similar taste to yours liked item A, so we assume you will like it too); or from combinations of these two approaches in hybrid systems (Burke, 2002).
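The collaborative filtering idea just described can be made concrete with a minimal sketch. The users, ratings, and the choice of cosine similarity below are purely illustrative assumptions for the example, not taken from any of the cited systems:

```python
import math

# Hypothetical explicit ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5},
    "carol": {"A": 1, "B": 5, "C": 2},
}

def cosine_sim(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    nu = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    nv = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of the other users' ratings for the item."""
    neighbours = [(cosine_sim(user, v), r[item])
                  for v, r in ratings.items()
                  if v != user and item in r]
    denom = sum(abs(s) for s, _ in neighbours)
    return sum(s * r for s, r in neighbours) / denom if denom else None
```

A content-based system would instead compare descriptions of the items themselves, but the weighted-average structure of the prediction step is analogous.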
Recommender systems in E-commerce
Recommender systems were introduced in, and are today widely used in, the field of e-commerce, with the best-known representative probably being Amazon.com. The objective of such systems is to recommend to the user the products he may be most interested in buying among all the available alternatives.
A characteristic of e-commerce recommender systems is that usually no explicit ratings are available from the user. Because of this, such systems either base their recommendations on purely content-based techniques or try to model the user's preferences based on implicit information, with the latter being the most common approach (Mobasher, 2000). Such implicit information can be the buying history of the user or his navigation patterns through the shop's site (Cho, 2002).
Another characteristic of systems for e-commerce is their often ephemeral relationship with the customer. Unlike the rating communities discussed before, the user here has no reason to return to the site unless he intends to buy something. That can vary from a few times a week in the case of a supermarket portal, to once a year in the case of a computer-selling site, or even more rarely if the product is cars (Felfernig, 2008). This characteristic constrains e-commerce systems to work with very limited information about the user. In such cases neither collaborative filtering techniques nor even content-based ones can perform well. To solve this problem a number of different techniques, such as constraint-based (Felfernig, 2008) and knowledge-based recommendation (Burke, 1999), have been developed.
The techniques for constraint-based and knowledge-based recommendation become even more important in the case of more complex products, such as financial services and tourism packages, for which the use of recommender systems has become increasingly popular. Examples of such systems are Triplehop's TripMatcher and VacationCoach's Me-Print for tourism (Berka, 2004) and FSAdvisor for financial services (Felfernig, 2005). Here, the only way to suggest a service to the user effectively is through specific expert domain knowledge and the satisfaction of certain needs and restrictions in a stepwise process (Jannach, 2009).
Recommending Web content
Recommender systems are being used as part of the wider effort towards a more personalized web experience. Web pages, news items or articles are suggested to the user according to his preferences. As in the e-commerce area, explicit ratings from the user are usually absent here as well. User preference modeling is done by studying his web usage patterns: the web logs are analyzed, and information such as the pages visited, the path followed and the time spent on each page is exploited in order to construct the user profile implicitly (Nasraoui, 2003).
Types of Recommender Systems
Modern recommender systems can be classified into three broad categories: content-based recommender systems, collaborative filtering systems and hybrid systems. The following sections provide a brief description of these categories, accompanied by some of the most recent approaches proposed. The publications presented in this chapter are from the last two years, in order to capture the most up-to-date directions followed.
Content based recommender systems
Content-based filtering approaches recommend items for the user based on the descriptions of previously evaluated items. In other words, they recommend items because they are similar to items the user has liked in the past (Montaner, 2003).
Some examples of recent content-based approaches include the work of Zenebe (2009), who makes use of fuzzy modeling techniques in the description of item features; Park (2008), who proposes clustering items with low popularity together, using the EM algorithm in combination with classification rules, in order to improve the quality of recommendations for items with few ratings; and Felfernig (2008), who explores constraint-based recommendation, where recommendation is viewed as a process of constraint satisfaction.
Collaborative filtering recommender systems
The collaborative filtering technique matches people with similar interests and then makes recommendations on this basis. Recommendations are commonly extracted from the statistical analysis of patterns and analogies in data obtained explicitly from evaluations of items given by different users, or implicitly by monitoring the behavior of the different users in the system (Montaner, 2003).
Directions identified in collaborative filtering include:
Latent factor approaches, characterized by the mapping of both items and users into the same latent factor space (Koren, 2010). Recent methods in this category include the work of Chen (2009), who makes use of orthogonal nonnegative matrix tri-factorization in order to alleviate the sparsity problem and addresses the scalability problem by simultaneously clustering the rows and columns of the user-item matrix; Rendle (2008), who proposes kernel matrix factorization, suggesting a generic method for learning regularized kernel matrix factorization models from which an online update algorithm is derived that allows solving the new-user/new-item problem; Takacs (2009), who proposes an incremental gradient descent method for weight updates, the exploitation of the chronological order of ratings, and a semi-positive version of the matrix factorization algorithm; Weimer (2008), who extends the maximum margin matrix factorization technique to arbitrary loss functions and uses an algorithm for the optimization of the ordinal ranking loss; Ma (2009), who proposes a semi-nonnegative matrix factorization method with global statistical consistency; and Koren (2010), who introduces a new neighborhood model based on the optimization of a global cost function, together with a second, factorized version of the neighborhood model that aims to improve the scalability of the algorithm.
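To make the latent factor idea concrete, the sketch below implements the simplest member of the family: plain stochastic gradient descent matrix factorization, where each user u and item i receive a k-dimensional vector and a rating is approximated by their dot product. The data, dimensions and learning parameters are illustrative assumptions; none of the refinements cited above (kernels, tri-factorization, temporal terms) are included:

```python
import random

def factorize(ratings, n_users, n_items, k=2, epochs=2000, lr=0.02, reg=0.02):
    """SGD matrix factorization: approximate r_ui by the dot product p_u . q_i."""
    random.seed(0)
    P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # regularized gradient step
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Tiny illustrative rating triples: (user index, item index, rating)
data = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 5)]
P, Q = factorize(data, 3, 3)
sse = sum((r - sum(P[u][f] * Q[i][f] for f in range(2))) ** 2 for u, i, r in data)
```

After training, the squared error over the known ratings drops far below that of the zero predictor; the missing cells of the user-item matrix can then be filled in from the same dot products.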
Fuzzy logic related techniques, such as the work of Campos (2008), who makes use of fuzzy logic to deal with the ambiguity and vagueness of the ratings, while at the same time using Bayesian network formalism to model the way the user's ratings are related.
Methods of improving the system's performance by enhancing the information space. The ways this can be done vary. Jeong (2009) introduces user credit, the degree of a user's rating reliability, measuring how closely the user rates items as others do, as a new way to measure similarity between users in a memory-based collaborative filtering environment. Yang (2009) suggests a collaborative filtering approach based on heuristically formulated inferences; the main idea is that any two users may share some interest genres while differing in others, and similarity is calculated accordingly, by considering the users' preferences and rating patterns. Umyarov (2009) proposes the combination of external aggregate information with individual ratings. Hijikata (2009) proposes a discovery-oriented collaborative filtering algorithm that uses a so-called profile of acquaintance to map the user's knowledge, or lack of it, about items. Kwon (2008) aims to find recommendation approaches that take into account the rating variance of an item in the procedure of selecting recommendations. Amatriain (2009) tries to improve the system's accuracy by reducing the natural noise in the input data via a preprocessing step, based on re-rating items and calibrating the recommendations accordingly. Massa (2009) proposes replacing the step of finding similar users, on which the recommendation will be based, with a trust metric, an algorithm able to propagate trust over a network of users in order to find peers that can be trusted by the active user. Lakiotaki (2008) proposes a system that exploits multi-criteria ratings to improve the modeling of the user's preference behavior and enhance the accuracy of the recommendations.
Koren (2009) introduces the tracking of temporal changes in the customer's preferences in order to improve the quality of the recommendations provided.
Artificial Immune Network techniques, such as the work of Acilar (2009), who proposes a collaborative filtering model constructed on the basis of the Artificial Immune Network Algorithm (aiNet). Through the use of artificial immune network techniques, the system tries to address the data sparsity and scalability problems by describing the data structure, including its spatial distribution and cluster inter-relations.
Markov chain model based techniques, such as the method of Yildirim (2008), who suggests an item-based algorithm that first infers transition probabilities between items based on their similarities and then computes predictions by modeling finite-length random walks on the item space; the work of Zhang (2008), who suggests a Topical PageRank based algorithm that considers item genre to rank items, attempting to correlate ranking algorithms for web search with recommender systems by leveraging Topical PageRank to rank items and then recommending the top-ranked items to users; and the work of Bonnin (2009), who uses Markov models inspired by those used in language modeling, integrating skipping techniques to handle noise during navigation and weighting schemes to reduce the importance of distant resources.
Hybrid recommender systems
Hybrid recommender systems combine two or more recommendation techniques to achieve better performance and overcome problems faced by their one-sided counterparts. The ways in which recommendation techniques can be combined differ greatly. A good overview is given in (Burke, 2002).
Examples of recent approaches include the work of Porcel (2009), who proposes a fuzzy linguistic recommender system designed using a hybrid approach and assuming multi-granular fuzzy linguistic modeling; Al-Shamri (2008), who proposes a hybrid fuzzy-genetic approach in which, in order to improve scalability, the user model is employed to find a set of like-minded users, and a memory-based search is then carried out on the resulting reduced set to produce the recommendations; Nam (2008), who, focusing on the user-side cold start problem, develops a hybrid model based on the analysis of two probabilistic aspect models, combining pure collaborative filtering with the users' information; and Gunawardana (2009), who makes use of unified Boltzmann machines, probabilistic models that combine collaborative and content information in a coherent manner.
Recommender systems suffer from some common problems. The most usual ones, and those that have drawn most of the researchers' attention, are the cold start and data sparsity problems, which can potentially lead to poor recommendations. Also, due to the nature of their implementation, recommender systems often face scalability problems. Beyond these, there are a number of smaller problems that can also negatively affect the performance of the system and have motivated the introduction of some of the more innovative techniques in the recommender systems landscape. Such problems are interest drift, noisy data and the lack of diversity.
The cold-start problem occurs when items must be proposed to a new user without previous usage patterns to support the recommendations (the new user problem), or when items newly introduced to the dataset lack ratings from any user (the new item problem) (Rashid, 2002). Both faces of the cold start problem are commonly met in the recommender system field and result in poor recommendation quality. Modern commercial systems are constantly expanding: new items are added on a constant basis, and new users become members just as often. Collaborative filtering techniques are especially sensitive to the cold-start problem, and for this reason a solution often suggested is the hybridization of the system with a content-based method that can handle the recommendation of new items and recommendations for new users.
In large e-commerce systems there are millions of participating users and equally many items. Usually even the most active users have purchased or rated only a very small fraction of the whole collection. This leads to a sparse user-item matrix, which hinders the ability of the recommender system to form successful neighborhoods. Making recommendations based on poorly formed neighborhoods results in poor overall recommendation quality (Acilar, 2009). This is known as the data sparsity problem, one of the most challenging in the recommender system field.
Modern recommender systems are applied to very large datasets of both users and items. They therefore have to handle very high dimensional profiles to form neighborhoods, while the computational cost of the algorithms used grows with both the number of items and the number of users (Acilar, 2009). Recommendation tactics that work well on small datasets under lab testing conditions may fail in practice because they cannot be applied effectively in real usage scenarios.
Other Potential Problems
Apart from the three commonly faced challenges mentioned above, researchers have tried to address a number of different problems. In this section we briefly present some of them.
- The interest drift problem. The term interest drift refers, in the recommender systems context, to the phenomenon that the taste and interests of users may change over time or under changing circumstances, leading to inaccurate recommendation results (Ma, 2007). A once valid recommendation may no longer be accurate after the user has changed his preference patterns. To counteract this, the recommendation model should not be static; it must evolve and adapt itself to the changing preference environment in which it is called to work.
- The noisy data problem. In the case of systems where the input data are explicit (e.g. ratings) rather than implicit (like web logs), extra noise is introduced by the vagueness of the ratings themselves, a product of human perception (Campos, 2008). The given ratings are only an approximation of the user's approval of the artifact he is rating and are restricted by the accuracy of the rating scale. For example, on a five-star scale a user may give a movie three stars, but if he had the opportunity to rate the same movie on a percentage scale he might give something different from 60%. The result might differ even more if the scale he was asked to rate the movie on was something like "I hated it - it was average - I loved it".
Moreover, a fuzziness is also introduced by the user himself: his own ratings may differ at another time, place or emotional condition. It has been reported (Amatriain, 2009) that when users are asked to re-rate movies they have seen and rated in the past, their new ratings differ appreciably from the original ones. Cases where users did not even remember seeing a movie they had rated in the past were also not uncommon.
Finally, there are deviations in the ratings characterizing the overall voting trends of either the user or the items. For example, a user may be strict and tend to give lower ratings than the average reviewer; from the item's point of view, there may be deviations affecting positively or negatively the ratings that the item receives. A movie that is considered "classic" may tend to receive higher ratings than it would without its reputation affecting the audience, and these observed trends may or may not be static over time (Koren, 2009). For example, a viewer may become stricter as he grows older, and as a result his ratings become biased towards the lower end of the scale compared to his past ratings.
All these factors introduce noise in the data and can have a negative effect on the accuracy of the recommendations.
- The lack of diversity problem. Most research effort is focused on making the recommendations produced by a recommender system more accurate. Lately though, arguments have been raised that accurate recommendations are not always what the user expects from a recommender system (Zhang, 2008). To start with, the purpose of such a system is to help the user select items for which he has not yet formed an opinion of his own. If the system keeps suggesting items that are too similar to the ones he is already familiar with, then the system, to a point, cancels out its own purpose. We can assume that the user can guess the rating of an item very close to one he already knows about, without the need of an elaborate recommender system. What the user is looking for is for the system to help him estimate the rating of an item that he could not rate himself, solely based on his own experience. This problem is also referred to as the over-specialization problem, describing the situation where items too similar to those the user already knows are returned as recommendations (Abbasi, 2009).
Evaluation Metrics for Recommender Systems
Evaluating a recommender system can be a complex procedure. Many different metrics have been proposed to evaluate the success of recommender systems. The following sections present the most commonly used ones.
Suggesting the non-obvious
While the accuracy metrics provide a good indication of the recommender system's performance, a distinction must be made between accurate and useful results (Herlocker, 2004). For example, a recommendation algorithm may be adequately accurate by suggesting to the user popular items with high average ratings, but often this is not enough. To some extent such predictions are self-evident and offer no useful information, as they concern the items the user would be least likely to need help discovering by himself.
Coverage can be defined as a measure of the domain of items over which the system can make recommendations (Herlocker, 2004). In its simplest form, coverage is expressed as the percentage of items for which the system can form a prediction, over the total number of items.
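In its simplest form, the computation amounts to a set intersection and a ratio; the item counts below are invented for illustration:

```python
def coverage(predictable_items, all_items):
    """Percentage of the item catalogue for which the system can form a prediction."""
    all_items = set(all_items)
    return 100.0 * len(set(predictable_items) & all_items) / len(all_items)

# e.g. a system able to score 1500 items out of a 1682-item catalogue
print(round(coverage(range(1500), range(1682)), 1))  # prints 89.2
```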
Along the same line of thought, other metrics such as novelty (Konstan, 2006) and serendipity (Murakami, 2008) have been proposed, for measuring how effectively the system recommends interesting items to the user which he might not otherwise come across.
Experimental Design and Implementation
This chapter describes the proposed approach to the recommendation problem. Starting with a base model, we identify its weaknesses and try to improve its performance by incorporating our approach. Both the base and the proposed systems are tested in order to compare their relative performance. The details of the experimental procedure followed are presented, along with the experimental results for the base system, which set the benchmark for comparison.
Recommendation as classification
Recommending items to a user can be seen as a classification task: given a set of known relationships, train a model that will be able to predict the class of unseen instances. If we consider each user-item pair as an instance and the rating as the class, recommendation can be treated as classification, where the system is called to assign the unknown degree of preference of a user towards a given item to one of the possible class values, consisting of the points of the rating scale. Assuming a two-point scale, we have a binary classification problem, and the outcome is that a user will either like or dislike the item, depending on the classification result. On a wider scale, the result could be that the user-item pair U-I is assigned to class 4, or in other words that user U is predicted to give item I a rating of 4. If we treat the rating as a continuous value instead of a nominal one, the task becomes a regression problem.
It is from this perspective that the recommendation problem is viewed in the work conducted as part of this dissertation. Prediction models are built on the known instances and then used to predict the values of the unknown ratings.
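The mapping from ratings to labelled instances can be sketched as follows; the triples and the like/dislike threshold are invented for illustration:

```python
# Each known (user, item, rating) triple becomes a labelled training instance.
known = [("u1", "i1", 4), ("u1", "i2", 2), ("u2", "i1", 5)]

def as_instance(user, item, rating, binary=False, threshold=3):
    """Map a user-item pair to an instance whose class is the rating.
    With binary=True the five-point scale collapses to like (1) / dislike (0)."""
    cls = (1 if rating > threshold else 0) if binary else rating
    return {"user": user, "item": item, "class": cls}

instances = [as_instance(u, i, r, binary=True) for u, i, r in known]
```

Treating the class as a continuous value instead would simply mean leaving the rating unmapped and handing the instances to a regression model.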
The MovieLens dataset was used for the experiments conducted. The MovieLens dataset was created by the GroupLens Research Project group at the University of Minnesota through the MovieLens web site (movielens.umn.edu) and contains 100,000 ratings on a numeric five-point scale (1-5), for 1682 movies, provided by 943 users, with each user having rated at least 20 movies. Simple demographic data consisting of age, gender, occupation and zip code are provided for the users, while the information about the movies comprises title, release date, video release date, IMDB url and genre.
Trying to train any model on the original dataset alone would prove highly ineffective, since the information provided would not be sufficient to produce reliable rules. For this reason, the data had to be enhanced in a way that would allow the model to obtain as much information as possible during the training step.
Following the methodology of Park (2008), who has also experimented with the use of classification rules for recommendation, a number of derived variables were produced from the original data and used as independent variables in the models. More specifically, the derived variables used are:
- c_aver_rating: The average rating of the user for the items he has rated in the past.
- c_quantity: The number of items the user has rated.
- c_seen_popularity: The average popularity of the items that the user has rated in the past.
- c_seen_rating: The overall average rating of the items the user has rated before, where the overall average rating of an item is the average of all the ratings given to it by all users.
- c_like_popularity: The average popularity of the items that were rated higher by the user, compared to his average rating.
- c_like_rating: The overall average rating of the items rated higher than the user's average rating.
- c_dislike_popularity: The average popularity of the items that were rated lower by the user, compared to his average rating.
- c_dislike_rating: The overall average rating of the items rated lower than the user's average rating.
- I_aver_rating: The average rating of the item.
- I_popularity: The popularity of the item.
The first eight variables (prefixed c_) are user-related, while the last two (prefixed I_) are item-related.
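As an illustration of how such variables could be derived, the sketch below computes a subset of them from raw rating triples. The sample data are invented, and the computation in this project was actually performed in SQL over the MovieLens tables; this is only a minimal restatement of the definitions:

```python
from collections import defaultdict

# Illustrative (user, item, rating) triples, not MovieLens data
triples = [("u1", "m1", 4), ("u1", "m2", 2), ("u2", "m1", 5), ("u2", "m3", 3)]

by_user = defaultdict(list)
by_item = defaultdict(list)
for u, i, r in triples:
    by_user[u].append((i, r))
    by_item[i].append(r)

def item_popularity(i):          # I_popularity: number of ratings the item received
    return len(by_item[i])

def item_aver_rating(i):         # I_aver_rating: the item's overall average rating
    return sum(by_item[i]) / len(by_item[i])

def user_vars(u):
    rs = [r for _, r in by_user[u]]
    aver = sum(rs) / len(rs)     # c_aver_rating
    liked = [i for i, r in by_user[u] if r > aver]
    return {
        "c_aver_rating": aver,
        "c_quantity": len(rs),   # number of items the user has rated
        "c_seen_popularity": sum(item_popularity(i) for i, _ in by_user[u]) / len(rs),
        "c_like_popularity": (sum(item_popularity(i) for i in liked) / len(liked)
                              if liked else None),
    }
```

The remaining variables follow the same pattern, averaging either popularity or overall rating over the liked or disliked subsets.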
It should be noted here that Park (2008) proposes the use of an extra item-related variable, namely I_likability, defined as the difference between the user's rating and the item's average rating. The problem with this is that, since we treat the rating as the class, there would be no way to know the value of I_likability for a new instance beforehand. Although the value could be calculated for the experimental dataset, this would not be feasible in a real case scenario. For this reason the I_likability variable was not included in the variable set.
The attributes of the dataset, as visualized by Weka can be seen in Appendix 3.
The original data were loaded from the MovieLens dataset into an MSSQL Server 2008 database table, and the derived values were then calculated and stored. The final form of the table, used in the data mining models, can be seen in Figure 2. A snapshot of the actual data in the enhanced data table can be found in Appendix 2. Even though the variables UserID, MovieID, DateStamp and InstanceID are part of the dataset, they were ignored by the models during the training step, since they provide no useful information and would add noise to the data.
Building the predictive models
The Weka machine learning toolkit was used in order to build the predictive models of the experiment. Weka provides a big collection of classification, association and clustering algorithms and can be run from the GUI, using the command line or called from within a program as external library. The later was the approach followed since it provides a greater flexibility on the process. Java was chosen as the implementation language since Weka itself is written in Java and the routines provided could be used from within the developed program without any intermediate steps needed. Although Weka can be used with a number of different environments (including .NET) that would add unnecessary complexity to the project. Eclipse was used as the implementation platform.
One of the early and important decisions that had to be taken during the implementation was whether to treat the rating, which was the class for the models, as nominal or numeric value. This choice would dictate the range of models that could be used as some can work only with nominal classes while other only with numeric. While the rating is in fact nominal (a user can rate an item with 3 or 4 but not with 3.5) I chose to treat it as numeric, in the expectation of providing this way a finer granularity to the system and producing more accurate and useful results. If the predicted value was needed to be sent back to the user as actual recommendation it would be easy to round it to one of the allowed ratings within the rating scale.
Again following the initial methodology of Park (2008), a separate predictive model was built for each item in the item list. To predict the rating of an item I with 200 known ratings for a user U, the model is built using those 200 known instances and is then used to predict the unknown rating for U. In this way, for a dataset containing n different items, n different models are built.
Five different types of predictive models were implemented and tested:
- Simple Linear Regression (SLR), which "learns a linear regression model based on the single attribute that yields the smallest squared error" (Witten, 2005: 409).
- Locally Weighted Learning (LWL), which "assigns weights using an instance-based method. After this the classifier is built from the weighted instances" (Witten, 2005: 414).
- RBFNetwork (RBF), which "implements a Gaussian radial basis function network deriving the centers and widths of hidden units using k-means and combining the outputs obtained from the hidden layers using logistic regression if the class is nominal and linear regression in the case of numeric class" (Witten, 2005: 410).
- Sequential Minimal Optimization for support vector regression (SMOreg), which "implements the sequential minimal optimization algorithm (SMO) training a support vector classifier with the use of polynomial or Gaussian kernels" (Witten, 2005: 410). SMOreg is the SMO version for regression problems.
- M5Rules, which "obtains regression rules from model trees built using M5" (Witten, 2005: 409).
These five models were chosen in an attempt to test the effectiveness of our approach using a diverse set of techniques namely linear regression, radial basis function networks, lazy classifiers, support vector machine classifiers and model trees.
Base Model evaluation
In order to evaluate the predictive models, we use two performance measures: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). As discussed in section 2.4.1, MAE measures the average absolute deviation between each predicted rating P(u, i) and the user's real rating p(u, i) over the number N of items, and is given by the equation
MAE = (Σ |P(u, i) - p(u, i)|) / N
Since the ratings are predicted by treating the problem as regression, these two metrics give a good indication of how well the predicted values produced by the algorithms approximate the actual rating values.
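In code, the two error measures amount to the following self-contained sketch (the class and method names are ours); `predicted` holds the P(u, i) values and `actual` the real ratings p(u, i):

```java
// Minimal implementations of the two evaluation measures used in this work.
public class ErrorMetrics {
    // Mean Absolute Error: average absolute deviation over N predictions.
    public static double mae(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++)
            sum += Math.abs(predicted[i] - actual[i]);
        return sum / predicted.length;
    }

    // Root Mean Square Error: square root of the mean squared deviation.
    public static double rmse(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double d = predicted[i] - actual[i];
            sum += d * d;
        }
        return Math.sqrt(sum / predicted.length);
    }
}
```

Because RMSE squares each deviation before averaging, it penalizes large individual errors more heavily than MAE does.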
We applied 10-fold cross validation and calculated the MAE and RMSE. The original dataset was randomly partitioned into 10 independent samples, and each time one of the samples was used as test data while the rest were used as training data for the model. The procedure was repeated 10 times, with each subsample acting as test data exactly once. The resulting MAE and RMSE are the average errors over the 10 repetitions of the test. Ten folds were chosen because, according to Witten (2005), 10 produces the best error estimates among the k-fold alternatives. Moreover, 10-fold cross validation is a commonly used technique for error estimation in the recommender system literature, which makes comparing our results with published work easier.
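The partitioning step can be sketched as follows (an illustrative sketch under our own names, not Weka's internal routine): instances are shuffled once and then dealt round-robin into k folds, so that each instance serves as test data exactly once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class KFoldSplitter {
    // Returns k disjoint folds of instance indices covering 0..numInstances-1.
    public static List<List<Integer>> folds(int numInstances, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed)); // random partition
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < indices.size(); i++)
            folds.get(i % k).add(indices.get(i));       // round-robin assignment
        return folds;
    }
}
```

For each of the k iterations, one fold is held out as the test set and the remaining k-1 folds form the training set.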
The errors calculated for each of the movies using the 5 models are presented in Figure 3 and Figure 4. The errors are ordered by the popularity of the movie.
The experimental results from this series of tests appear to match the findings that Park (2008) reports, as far as can be concluded from the volume and format of the results presented in his research paper.
Observing the graphs and the overall error tables, we see that the prediction models differ in performance, with the differences remaining roughly constant across item popularities. SMOreg and SLR are the two best predicting models, with LWL being the worst.
But the two figures reveal an underlying trend that governs all the models and is more important than the raw performance of each algorithm: the errors increase as the popularity of the items decreases, almost doubling from one end of the scale to the other. For example, the MAE of the SMOreg algorithm is 0.666542 for the highest average popularity of 420 and becomes 1.169974 for the smallest popularity. This happens because the prediction models do not have enough data to produce accurate rules for the less popular items.
To illustrate the potential impact of this weakness on the recommender's performance, Figure 5 presents the histogram of the item rating frequencies for the MovieLens dataset.
As can be seen in the figure, the number of items having few ratings is very significant. If the recommendation is based solely on the approach presented in section 3.3, the system will often perform poorly. It is therefore important to improve the performance of the models for exactly this problematic area.
Enhancing the system's efficiency via latent factor modeling
Dimensionality reduction techniques such as Latent Semantic Indexing (LSI) originate from the information retrieval field and address the problems of polysemy and synonymy (Deerwester, 1990). By using LSI we attempt to capture latent associations between the users, the items and the given ratings that are not apparent in the initial dataset. The resulting reduced-dimensionality space is less sparse than the original data space and can help us determine clusters of users or items by discovering the underlying relationships between the rating instances (Sarwar, 2000). LSI with Singular Value Decomposition (SVD) as the underlying matrix factorization algorithm is the most commonly used technique in the recommender system literature, and it is the one used in the current experimental setup, aiming to improve the prediction model performance.
Applying SVD to generate predictions
The initial step in applying SVD to the recommender system is to produce the matrix R. R is the user-item matrix in which users are the rows, items are the columns and the values are the ratings, so that Ri,j represents the rating of user Ui for item Ij.
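Constructing R from the stored (user, item, rating) triplets is straightforward; the following sketch (class name ours) uses 0-based indices for brevity and the value 0 to mark a missing rating:

```java
public class RatingMatrix {
    // triplets: rows of {userIndex, itemIndex, rating}; builds the user-item
    // matrix R with R[i][j] = rating of user i for item j (0 = no rating).
    public static double[][] build(int[][] triplets, int numUsers, int numItems) {
        double[][] r = new double[numUsers][numItems];
        for (int[] t : triplets)
            r[t[0]][t[1]] = t[2];
        return r;
    }
}
```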
Because of the sparse nature of the data, the resulting matrix R is also very sparse. As a pre-processing step, in order to reduce the sparsity, we must fill the empty values of R with some meaningful data. Two available choices are the average ratings of the user (row averages) or the average ratings of the items (column averages). Following Sarwar (2000) we proceed with the latter, as his findings report that the item averages provide better results. The same author suggests, as the next pre-processing step, normalizing the data of matrix R by subtracting the user's average from each rating. Let the matrix resulting after the two pre-processing steps be Rnorm. Normalization as described above was used during the implementation.
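A minimal sketch of the two pre-processing steps, under the assumption that a value of 0 marks a missing rating (the class and method names are ours):

```java
public class Preprocess {
    // (1) Fill each empty cell with its item (column) average, then
    // (2) subtract each user's (row) average over their known ratings,
    // producing the normalized matrix Rnorm.
    public static double[][] normalize(double[][] r) {
        int m = r.length, n = r[0].length;
        double[][] filled = new double[m][n];
        for (int j = 0; j < n; j++) {                  // step 1: item averages
            double sum = 0; int count = 0;
            for (int i = 0; i < m; i++)
                if (r[i][j] != 0) { sum += r[i][j]; count++; }
            double itemAvg = count > 0 ? sum / count : 0;
            for (int i = 0; i < m; i++)
                filled[i][j] = r[i][j] != 0 ? r[i][j] : itemAvg;
        }
        for (int i = 0; i < m; i++) {                  // step 2: user averages
            double sum = 0; int count = 0;
            for (int j = 0; j < n; j++)
                if (r[i][j] != 0) { sum += r[i][j]; count++; }
            double userAvg = count > 0 ? sum / count : 0;
            for (int j = 0; j < n; j++)
                filled[i][j] -= userAvg;
        }
        return filled;
    }
}
```

The user averages subtracted here are the ones added back later when a prediction is produced from the reduced factors.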
We can now apply SVD to Rnorm, obtain the matrices U, S and V, and reduce them to dimension k, obtaining Uk, Sk and Vk. Following Sarwar (2000), these matrices can then be used to produce the prediction Pi,j of the ith user for the jth item as:
Pi,j = ri + (Uk·sqrt(Sk))(i) · (sqrt(Sk)·Vk^T)(j)
that is, the user's average rating ri plus the dot product of the ith row of Uk·sqrt(Sk) with the jth column of sqrt(Sk)·Vk^T.
Using SVD to form neighborhoods
Although applying SVD to produce score predictions as discussed above is useful, SVD is used in a rather different context in the current implementation. What we are really interested in is its ability to form quality neighborhoods of either users or items. The reduced-dimensional space produced by SVD is less sparse than the original space, an advantage that can lead to better performance during neighbor selection (Sarwar, 2000).
The matrix Uk·sqrt(Sk) is the k-dimensional representation of the users, while sqrt(Sk)·Vk^T is the k-dimensional representation of the items. We can calculate the similarity between the observations in one of the two matrices using a distance measure such as cosine similarity, Pearson's correlation, mean squared distance or Spearman correlation, and produce clusters of users or items respectively. What we evaluate in this implementation is whether forming clusters of items in the reduced-dimension space, combined with the base model, improves its accuracy.
The similarity measure used for building the neighborhoods was cosine similarity, which computes the cosine of the angle between two vectors A and B. It is defined as:
cos(A, B) = (A · B) / (||A|| ||B||)
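In code, the measure is the dot product of the two vectors divided by the product of their Euclidean norms (a self-contained sketch, class name ours):

```java
public class Cosine {
    // Cosine similarity between two equal-length vectors: 1 for parallel
    // vectors, 0 for orthogonal ones, -1 for opposite directions.
    public static double similarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```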
Training the models
As before, the predictions of the ratings were produced by building the data mining models on the training set, created again using 10-fold cross validation as in the first group of experiments. The key difference this time was that the models were built not for every item but for each cluster of items.
This means that in order to predict the rating of user Ui for item Ij, we first determine to which cluster Ij belongs and then build the model using the data from all the items belonging to that cluster.
The expectation from this approach is to provide the predictive model with enough information to produce quality rules. This is especially important for the items with few ratings, for which, as shown in section 3.4, the lack of sufficient supporting information led the models to produce inaccurate predictions.
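The assembly of the training set for one prediction can be sketched as follows (an illustrative sketch with our own names): instead of using only the target item's own ratings, we gather the ratings of every item that shares the target item's cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ClusterTrainingSet {
    // triplets: rows of {userId, itemId, rating};
    // itemCluster: maps each itemId to its cluster index.
    public static List<int[]> forItem(List<int[]> triplets,
                                      Map<Integer, Integer> itemCluster,
                                      int targetItem) {
        int cluster = itemCluster.get(targetItem);  // cluster of the target item
        List<int[]> training = new ArrayList<>();
        for (int[] t : triplets)
            if (itemCluster.get(t[1]) == cluster)   // same cluster -> include
                training.add(t);
        return training;
    }
}
```

One model is then trained per cluster rather than per item, so sparsely rated items borrow training data from their cluster-mates.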
Results and Evaluation
This chapter describes the implementation decisions tested during the development of our approach and the way they affect the performance of the system. For each implementation decision detailed results from the experiments performed are presented, followed by a discussion about their interpretation. Finally the proposed approach is compared with the base system in order to evaluate the effectiveness of our technique.
Application of the technique in the implementation
As described in section 3.5.2, the user-item matrix was created in Matlab by importing the (user, item, rating) triplets from the SQL database containing the original dataset. The matrix was then normalized and SVD was applied. The resulting matrices U, S and V were finally reduced to dimensionality k.
The choice for the value of k was based on the findings of both (Sarwar, 2000) and (Gong, 2009) for the same dataset and was fixed at 14.
The next step of the process was the formation of the neighborhoods. The K-means clustering algorithm with cosine distance was applied to the reduced 1682 x 14 matrix, allocating each item to a corresponding cluster. The resulting cluster memberships were finally ported back to the database to be used by the system when building the prediction models.
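To make the clustering step concrete, the following is a rough toy sketch of k-means with cosine distance (our own illustration, not the Matlab routine actually used); points are the rows of `data`, the first k rows seed the initial centers, and each point is assigned to the center with the highest cosine similarity:

```java
public class CosineKMeans {
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12); // guard zero norm
    }

    // Returns the cluster index assigned to each row of `data`.
    public static int[] cluster(double[][] data, int k, int iters) {
        int n = data.length, d = data[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++)
            centers[c] = data[c].clone();              // first k rows seed centers
        int[] assign = new int[n];
        for (int it = 0; it < iters; it++) {
            for (int i = 0; i < n; i++) {              // assignment step
                int best = 0; double bestSim = -2;
                for (int c = 0; c < k; c++) {
                    double sim = cosine(data[i], centers[c]);
                    if (sim > bestSim) { bestSim = sim; best = c; }
                }
                assign[i] = best;
            }
            double[][] sums = new double[k][d];        // update step: member means
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int j = 0; j < d; j++) sums[assign[i]][j] += data[i][j];
            }
            for (int c = 0; c < k; c++)
                if (counts[c] > 0)
                    for (int j = 0; j < d; j++) centers[c][j] = sums[c][j] / counts[c];
        }
        return assign;
    }
}
```

In our setting the rows would be the 1682 item vectors in the 14-dimensional reduced space.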
Defining the parameters of the system
Choosing the number of clusters
One parameter that had to be considered was the number of clusters used during neighborhood formation. In order to verify whether, and by how much, the choice of the number of clusters affects the prediction quality, five different cluster counts were tested, and the MAE and RMSE were calculated for each using the procedure described in the previous section. To obtain comparable results, the same prediction model was used in all five repetitions of the experiment. The model, SMOreg, was chosen arbitrarily from the two best-performing models.
The numbers of clusters used in the study were 10, 30, 50, 70 and 100. The MAE and RMSE errors produced by these cluster counts, using SMOreg as the model, can be seen in Figure 6 and Figure 7 respectively, labeled MAE_SVD_10 for the MAE errors of the experiment using 10 clusters, MAE_SVD_30 for the experiment using 30 clusters, and so on. The results were once again averaged and ordered by item popularity as before.
Defining the importance of the predictive model
In section 3.4 we presented the relative performance of the 5 different predictive models applied to the un-clustered data. The question that arises at this point is whether this relative performance remains the same when the models are used in combination with the SVD-produced clusters.
Comparison of the clustered with the un-clustered approach
The most important comparison made as part of this work is the one showing the difference in performance between the suggested technique, which uses SVD to form clusters of items, and the original approach of building a predictive model for each item. Figure 10 and Figure 11 show the MAE and RMSE errors for the same model, SMOreg, as it proved to work well in both settings. The number of clusters used for the SVD version in this comparison is 70, the best-performing size identified.
The proposed solution addressing the cold start problem
As discussed in section 2.3, introducing new items to a recommender system can lead to poor performance. In section 3.4, by evaluating the base model, we showed how a low number of ratings negatively affects the performance of the system.
In this set of experiments we showed that the proposed method significantly improves the accuracy of the base model, that its accuracy is less sensitive to item popularity, and that the improvement over the un-clustered model is larger for the items with few ratings.
These three attributes of the proposed approach are good steps towards solving the cold-start problem. While the accuracy still drops going from the high end of the popularity scale towards the low end, it always remains close to the overall mean error. This means that a newly introduced item will no longer receive poorly accurate recommendations just because it cannot provide enough information to support the creation of effective rules by the classification models: by using the information from the items belonging to the same cluster, we can improve the accuracy of the recommendation.
Execution time performance and scalability discussion
Scalability, as discussed in section 2.3.3, is an important aspect of any recommender system. Recommender systems typically operate over large user-item spaces which, depending on the implementation context, can extend to several millions of transactions in a large-scale e-commerce site. If browsing patterns are used to indicate product preference, these transactions will exceed the sheer combinations of users-items, and of users-users for user-similarity-based systems or items-items for item-similarity-based systems such as the one developed here. At the same time, the recommendations must be presented to the end user in a timely manner, and many recommendations per second must be produced for all the active customers of a high-traffic site.
In order to test the potential scalability of the proposed system, we measured the execution time required to produce recommendations over the entire experimental dataset of 100,000 instances.
Execution time performance of the proposed system across different models
Figure 14 presents the average execution time (in seconds) needed to complete a full test run for each of the different predictive models, with the number of clusters set at 70. The times are the average of five repetitions of the experiment on the development machine, an Intel Core 2 Duo T9400 processor (2.53 GHz, 1066 MHz FSB, 6 MB L2 cache) with 4 GB DDR2 memory running 32-bit Windows Vista. The machine load was kept as low and as uniform as possible across all the experiments.
It should be noted that the times presented here can only be used to compare the relative execution times of the different predictive algorithms, as an unknown, and probably large, percentage of the total execution time was spent by Weka on the cross-validation and attribute filtering steps, making the results not representative of a real-use scenario.
We can see that the execution times vary greatly depending on the algorithm used, with the worst-performing algorithm being SMOreg and the best being Simple Linear Regression. While this was expected and is directly linked to the algorithms' inner complexity, an interesting observation was that the trade-off of better accuracy for worse time performance did not hold for all the models under test. To demonstrate this more clearly, Figure 15 groups the RMSE errors and the execution times of the different models. For example, we can see that M5Rules produces low error rates while performing reasonably well time-wise.
Execution time performance of the proposed system across different cluster sizes
The number of clusters used during neighborhood formation had an immediate effect on the time performance of the system. The fewer the clusters used (i.e. the more items per cluster), the more time the predictive models needed to be built. Figure 16 presents the execution time needed for the different numbers of clusters, with SMOreg as the predictive algorithm.
The results of this experiment lead to the conclusion that while more clusters mean more models to train (one model per cluster), the reduction in training time per cluster eventually leads to better overall performance.
Scaling the system to a much bigger real-use scenario would require finding the best-performing number of clusters for the population of the dataset. Although this appears feasible, from the evidence we have we cannot safely assume that the accuracy of the predictions will follow the same patterns, linking accuracy with the number of clusters, as those observed in the experiment described in section 4.2.1.
Nevertheless, we have strong evidence that the worst-performing (accuracy-wise) number of clusters was the smallest one (10), while the best-performing cluster counts differed insignificantly from each other (for example, 70 versus 100). Combined with the execution-time tests, which indicate that the larger cluster counts perform best time-wise, this suggests that we can improve the throughput of the system by increasing the number of clusters without suffering a loss of accuracy.
Singular Value Decomposition related scalability
Singular value decomposition is computationally expensive. For an m x n matrix of m users and n items, SVD requires time in the order of O((m+n)^3) (Deerwester, 1990). This was not crucial during the experimental procedure, since the user-item matrix R was only of dimension 943 x 1682 (with an average SVD execution time of 13.84 seconds in Matlab), and SVD can in any case be calculated offline; however, the cost would prove prohibitive for large-scale datasets containing millions of customers and items (Sarwar, 2000), (Papagelis, 2005). As a result, alternative factorization techniques should be considered, as for example in (Sarwar, 2002) and (Ma, 2009).
This chapter concludes the project by discussing what was accomplished by the research conducted and by describing the focus of future work.
By conducting accuracy tests following the experimental procedures of the recommender systems literature and interpreting the significance of our results with paired t-tests, we concluded that the proposed approach significantly improves the overall accuracy of the recommendations compared with the base system, for which the first part of our experimentation showed that using classification models alone to predict the ratings leads to poor performance, especially for items of low popularity.
The most important achievement of the system is its effectiveness in reducing the negative effects of the item-side cold start problem.
By clustering in the reduced-dimension space produced via Singular Value Decomposition, we managed to improve the accuracy of the classification models in this problematic area. The fact that the performance of the proposed model shows smaller deviations across the whole range of item popularity compared to the base model means that a newly introduced item will no longer receive poor recommendations relative to items with many ratings, which inherently provide enough information to support the predictive models.
In the future we would like to investigate how effective the formation of clusters of users in the reduced-dimension space, instead of clusters of items, would be in combination with classification models, and how the size of these clusters would affect the system's performance.
We would also like to test whether treating the class as nominal has an impact on the prediction accuracy compared to the numeric approach followed in this work.
Another thing we would like to investigate is the performance of the proposed system on a different dataset. The initial idea was to use more than one dataset to evaluate the method, but this was dropped due to time restrictions.
Finally, since the classification algorithms proved very sensitive to the amount of available information, it would be interesting to investigate whether we can achieve better performance by enriching the data with demographic and contextual information.
- Abbasi, Z. Amer-Yahia, S. Lakshmanan, L. Vassilvitskii, S. and Yo, C. (2009) Getting Recommender Systems to Think Outside the Box In proceedings of RecSys'09, October 23-25, 2009, New York, New York, USA, pp 285-288
- Acilar, M. Arslan, A. (2009). A collaborative filtering method based on artificial immune network Expert Systems with Applications 36, pp 8324-8332
- Al-Shamri, M. and Bharadwaj, K. (2008). Fuzzy-genetic approach to recommender systems based on a novel hybrid user model Expert Systems with Applications 35, pp 1386-1399
- Aldbavi, A. and Shahbazi, M. (2009). A hybrid recommendation technique based on product category attributes Expert Systems with Applications 36, pp 11480-11488
- Amatriain, A. Pujol, J. Tintarev, N. and Oliver, N. (2009). Rate it Again: Increasing Recommendation Accuracy by User re-Rating In proceedings of RecSys'09, October 23-25, 2009, New York, New York, USA, pp 173-180
- Balabanovic, M. and Shoham, Y. (1997) Fab: Content based collaborative recommendation Commun. ACM 40, pp 66-72
- Berka, T. and Plobnig, M. (2004) Designing Recommender Systems for Tourism Proceedings of ENTER 2004
- Bonnin, G. Brun, A. and Boyer, A. (2009). A Low-Order Markov Model integrating Long-Distance Histories for Collaborative Recommender Systems In proceedings of IUI'09, February 8-11, 2009, pp 57-66
- Burke, R. (1999). Integrating Knowledge-based and Collaborative-filtering Recommender Systems In Proceedings of the Workshop on AI and Electronic Commerce. AAAI 99. Orlando, Florida, pp 69-72
- Burke, R. (2002). Hybrid Recommender Systems: Survey and Experiments User Modeling and User-Adapted Interaction 12, pp 331-370
- Breese, J. Heckerman, D and Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. Uncertainty in artificial intelligence, proceedings of the fourteenth conference, Morgan Kaufman, pp 43-52
- Campos, L. Fernandez-Luna, J. and Huete, J. (2008). A collaborative recommender system based on probabilistic inference from fuzzy observations Fuzzy Sets and Systems 159, pp 1554 - 1576
- Chen, G. Wang, F. and Zhang, C. (2009). Collaborative filtering using orthogonal nonnegative matrix tri-factorization Information Processing and Management 45, pp 368-379
- Cho, Y. Kim, J. Kim, S. (2002) A personalized recommender system based on web usage mining and decision tree induction Expert Systems with Applications 23, pp 329-342
- Cleverdon, C. and Kean, M. (1968). Factors Determining the Performance of Indexing Systems Aslib Cranfield Research Project, Cranfield, England.
- Deerwester, S. Dumais, S.T. Furnas, G.W Laundauer, T.K and Harshman, R. (1990). Indexing by Latent Semantic Analysis Journal of the American Society for Information Science 41(6)
- Felfernig, A. and Kiener, A. (2005). Knowledge-based Interactive Selling of Financial Services with FSAdvisor Proceedings of the 17th conference on Innovative applications of artificial intelligence, Pittsburgh Pennsylvania 3, pp 1475-1482
- Felfernig, A. and Burke, R. (2008). Constraint-based Recommender Systems: Technologies and Research Issues 10th Int. Conf. on Electronic Commerce (ICEC) '08 Innsbruck, Austria, pp 1-10
- Givon, S. and Lavrenko, V. (2009). Predicting Social-tags for Cold Start Book Recommendations In proceedings of RecSys'09, October 23-25, 2009, New York, New York, USA. pp 333-336
- Goldberg, K. Roeder, T. Gupta, D. and Perkins, C. (2001). Eigentaste: A constant time collaborative filtering algorithm Information Retrieval 4 (2), pp 133-151
- Gong, S. Ye, H. and Dai, Y. (2009). Combining Singular Value Decomposition and Item-based Recommender in Collaborative Filtering In proceedings of the Second International Workshop on Knowledge Discovery and Data Mining, pp 769 - 772
- Gunawardana, A. and Meek, C. (2009). A Unified Approach to Building Hybrid Recommender Systems In proceedings of RecSys'09, October 23-25, 2009, pp 117-124
- Hanley, J. A., and Mcneil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (roc) curve Radiology 143, pp 29-36.
- Hall, M. Frank, E. Holmes, G. Pfahringer, B. Reutmann, P. and Witten, I. (2009) The WEKA Data Mining Software: An Update. SIGKDD Explorations 11(1)
- Herlocker, J. (2004). Evaluating Collaborative Filtering Recommender Systems ACM Transactions on Information Systems 22 (1), pp 5-53
- Hernandez del Olmo, F. and Gaudioso, E. (2008). Evaluation of recommender systems: A new approach Expert Systems with Applications 35, pp790-804
- Hijikata, Y. Shimizu, T. and Nishida, S. (2009). Discovery-oriented Collaborative Filtering for Improving User Satisfaction In proceedings of IUI'09, February 8-11, 2009, Sanibel Island, Florida, USA
- Jannach, D. Zanker, M. and Fuchs, M. (2009) Constraint-based Recommendation in Tourism: A multi-perspective Case Study Information Technology and Tourism 11(2)
- Jeong, B. Lee, J. and Cho, H. (2009). User credit-based collaborative filtering Expert Systems with Applications 36, pp 7309-7312
- Konstan, J. McNee, S. Ziegler, C. Torres, R Kapoor, N. and Riedl, J (2006). Lessons on applying automated recommender systems to information-seeking tasks In AAAI, 2006
- Koren, Y. (2009) Collaborative Filtering with Temporal Dynamics In proceedings of KDD'09, June 28-July 1, 2009, Paris, France
- Koren, Y. (2010). Factor in the Neighbors: Scalable and Accurate Collaborative Filtering ACM Transactions on Knowledge Discovery from Data 4 (1), Article 1
- Kwon. Y. (2008). Improving Top-N Recommendation Techniques Using Rating Variance In proceedings of RecSys'08, October 23-25, 2008, Lausanne, Switzerland, pp 307-310
- Lakiotaki, K. Tsafarakis, S. and Matsatsinis, N. (2008). UTA-Rec: A Recommender System based on Multiple Criteria Analysis In proceedings of RecSys'08, October 23-25, 2008, Lausanne, Switzerland, pp 219-225
- Lam, X. Vu, T. Le, T. and Duong, A. (2008). Addressing cold-start problem in recommendation systems Proceedings of the 2nd international conference on Ubiquitous information management and communication, January 31-February 01, 2008, Suwon, Korea
- Lee, J. and Olafsson, S. (2009). Two-way cooperative prediction for collaborative filtering recommendations Expert Systems with Applications 36, pp 5353-5361
- Ma, H. Yang, H. King, I. and Lyu, M. (2009). Semi-Nonnegative Matrix Factorization with Global Statistical Consistency for Collaborative Filtering In proceedings of CIKM'09, November 2-6, 2009, Hong Kong, China, pp 767-775
- Ma, S. Li, X. Ding, Y. and Orlowska, M. (2007). A Recommender System with Interest-Drifting Lecture Notes In Computer Science 4831, pp 633-642
- Massa, P. and Avesani, P. (2009). Computing with Social Trust, Springer-Verlag London Limited, pp 259-285
- Mobasher, B. Cooley, R. and Srivastava, J (2000). Automatic personalization based on web usage mining. Communications of ACM 43(8), pp 142-151
- Montaner, M. Lopez, B. and De La Rosa, J. (2003). A taxonomy of Recommender Agents on the Internet Artificial Intelligence Review 13, pp 285-330
- Mukherzee, R. Sajja, N. and Sen, S. (2003) A movie recommendation system - An application of Voting Theory in User Modeling User Modeling and User-adapted Interaction 13, pp 5-33
- Mulder, I. de Poot, H. Verwijs, C. Janssen, R. and Bijlsma, M. (2006) An Information Overload study: Using design methods for understanding Proceedings of OZCHI`06 November 22-24, 2006, Sydney, Australia, pp 1-8
- Murakami, T. Mori, K. and Orihara, R. (2008). Metrics for evaluating the serendipity of recommendation lists New Frontiers in Artificial Intelligence 4914, pp 40-46
- Nasraoui, O. and Petenes, C. (2003). An Intelligent Web Recommendation Engine Based on Fuzzy Approximate Reasoning Proceedings of the IEEE International Conference on Fuzzy Systems - Special Track on Fuzzy Logic and the Internet, 2003
- Papagelis, M. Rousidis, I. Plexousakis, D. and Theoharopoulos, E. (2005). Incremental Collaborative Filtering for Highly-Scalable Recommendation Algorithms In proceedings of ISMIS 2005, pp 553-561
- Park, Y. and Tuzhilin, A. (2008). The Long Tail of Recommender Systems and How to Leverage it In proceedings of RecSys'08, October 23-25, 2008, Lausanne, Switzerland, pp 11-18
- Porcel, C. Moreno, J.and Herrera-Viedma, E. (2009). A multi-disciplinar recommender system to advice research resources in University Digital Libraries Expert Systems with Applications 36, pp 12520-12528
- Rendle, S. and Schmidt-Thieme, L. (2008). Online-Updating Regularized Kernel Matrix Factorization Models for Large-Scale Recommender Systems In proceedings of RecSys'08, October 23-25, 2008, Lausanne, Switzerland
- Rashid, A. Albert, I. Cosley, D. Lam, S. McNee, S. Konstan, J. and Riedl, J. (2002). Getting to Know You: Learning New User Preferences in Recommender Systems In proceedings of ACM IUI 2002, ACM Press
- Sarwar, B. Karypis, G. Konstan, J. Riedl, J. (2000). Application of Dimensionality Reduction in Recommender System -- A Case Study In ACM WebKDD Web Mining for E-Commerce Workshop, August 2000, Boston
- Sarwar, B. Karypis, G. Konstan, J. Riedl, J. (2002). Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems In Fifth International Conference on Computer and Information Technology (ICCIT2002), 2002
- Schclar,A. Tsikinovsky, A. Rokach, L. Meisels, A. and Antwarg, L. (2009). Ensemble Methods for Improving the Performance of Neighborhood-based Collaborative Filtering In proceedings of RecSys'09, October 23-25, 2009, New York, New York, USA
- Shang, M. Jin, C. Zhou, T. and Zhang, Y. (2009). Collaborative filtering based on multi-channel diffusion Physica A 388, pp 4867- 4871
- Takacs, G. Pilaszy, I. Nemeth, B. and Tikk, D. (2009). Scalable Collaborative Filtering Approaches for Large Recommender Systems Journal of Machine Learning Research 10, pp 623 - 656
- Umyarov, A. and Tuzhilin, A. (2009). Improving Rating Estimation in Recommender Systems Using Aggregation- and Variance-based Hierarchical Model In proceedings of RecSys'09, October 23-25, 2009, pp 37 - 44
- Weimer, M. Karatzoglou, A. and Smola, A. (2008). Adaptive Collaborative Filtering In proceedings of RecSys'08, October 23-25, 2008, Lausanne, Switzerland
- Weng, S. Lin, B. and Chen, W. (2007) Using contextual information and multidimensional approach for recommendation Expert Systems with Applications 36, pp 1268-1279
- Witten, I. and Frank, E. (2005). Data mining: Practical machine learning tools and techniques with Java implementations. Morgan Kaufmann
- Yang, J. Li, K. and Zhang, D. (2009). Recommendation based on rational inferences in collaborative filtering Knowledge-Based Systems 22, pp 105-114
- Yildirim, H. and Krishnamoorthy, M. (2008). A Random Walk Method for Alleviating the Sparsity Problem in Collaborative Filtering In proceedings of RecSys'08, October 23-25, 2008, Lausanne, Switzerland
- Zenebe A. and Norcio, A. (2009). Representation, similarity measures and aggregation methods using fuzzy sets for content-based recommender systems Fuzzy Sets and Systems 160, pp 76 - 94
- Zhang, L. Zhang, K. and Li, C. (2008). A Topical PageRank Based Algorithm for Recommender Systems In proceedings of SIGIR'08, July 20-24, 2008, pp 713 - 714
- Zhang, M. and Hurley, N. (2008) Avoiding Monotony: Improving the Diversity of Recommendation Lists In proceedings of RecSys'08, October 23-25, 2008, Lausanne, Switzerland