BENCHMARKING FOR RECOMMENDER
SYSTEM (MFRISE)
Mahesh Mali
Computer Engineering Department, SVKMs NMIMS, Mukesh Patel School of Technology
Management and Engineering, Mumbai, (India).
E-mail: maheshmalisir@gmail.com
Dhirendra Mishra
Computer Engineering Department, SVKMs NMIMS, Mukesh Patel School of Technology
Management and Engineering, Mumbai, (India).
M. Vijayalaxmi
Computer Engineering Department, V.E.S. College of Engineering, Mumbai University, (India).
Reception: 05/11/2022 Acceptance: 20/11/2022 Publication: 29/12/2022
Suggested citation:
Mali, M., Mishra, D., y Vijayalaxmi, M. (2022). Benchmarking for Recommender System (MFRISE). 3C TIC.
Cuadernos de desarrollo aplicados a las TIC, 11(2), 146-156. https://doi.org/10.17993/3ctic.2022.112.146-156
https://doi.org/10.17993/3ctic.2022.112.146-156
3C TIC. Cuadernos de desarrollo aplicados a las TIC. ISSN: 2254-6529
Ed. 41 Vol. 11 N.º 2 August - December 2022
ABSTRACT
The internet age offers viewers an overwhelming choice of movies and shows, which creates the need
for comprehensive Recommendation Systems (RS). A Recommendation System suggests the most suitable
content to viewers based on their preferences, using methods from Information Retrieval, Data Mining
and Machine Learning. The novel Multifaceted Recommendation System Engine (MF-RISE) algorithm
proposed in this paper helps users get personalized movie recommendations through a multi-clustering
approach that uses user clusters and movie clusters along with their interaction effect. This adds
value to existing parameters such as user ratings and reviews.
In real-world scenarios, recommenders have many non-functional requirements of a technical nature.
The evaluation of the Multifaceted Recommendation System Engine must take these issues into account
in order to produce good recommendations. The paper presents technical evaluation parameters such as
RMSE, MAE and timings, which can be used to measure the accuracy and speed of a recommender system.
The benchmarking results are also helpful for new recommendation algorithms. The paper uses the
MovieLens dataset for experimentation. The evaluation methods studied consider both quantitative and
qualitative aspects of the algorithms: evaluation parameters such as mean squared error (MSE), root
mean squared error (RMSE), Test Time and Fit Time are calculated for each popular recommender
algorithm implementation (NMF, SVD, SVD++, SlopeOne, Co-Clustering). The study identifies the gaps
and challenges faced by each of these recommender algorithms. It will also help researchers propose
new recommendation algorithms that overcome the identified research gaps and challenges of existing
algorithms.
KEYWORDS
Comparing recommender system, bench-marking recommendation system algorithms, comparing
recommendation algorithms, challenges of various recommendation algorithms, Performance
evaluation of Recommendation algorithms.
1. INTRODUCTION
The availability of the internet and global resources has increased the number of movies and shows
that users can view. Recommendation Systems are tools that give movie recommendations to end-users
based on their own likes or the likes of similar users [1]. Recommender systems benefit both service
providers and users: they reduce the time needed to find and select the right item on the internet.
A recommender system is an information filtering system that recommends the best movies to the user
by considering the similarity between users, between movies, or between user ratings for movies. The
existing types of recommendation system algorithms are Collaborative Filtering (CF) and
Content-Based Filtering (CB).
Multiple well-known recommendation algorithms based on the above categories have already been
proposed, such as the KNN algorithm, SVD algorithms, SlopeOne and the Co-Clustering algorithm. In
this paper we have implemented these algorithms and analyzed their comparative performance, so that
the results can be used for benchmarking the performance of our proposed multifaceted recommender
system (MFRISE). This paper also explains the challenges and limitations of each algorithm. These
challenges can be used to improve the quality of recommendations by modifying the algorithms.
2. ARCHITECTURE OF MULTIFACETED RECOMMENDER
SYSTEM (MFRISE)
A general recommendation system algorithm uses a mathematical function to suggest recommendations
based on the past similarity between users and movies [3]. The algorithm must be able to measure the
usefulness of a movie to a user. In order to get good recommendations, we need a lot of implicit and
explicit data. Data coming from user ratings acts as explicit data. Implicit data is fetched from
social data and watch history. MFRISE is a hybrid recommender system introduced by our paper. It
improves recommendations by identifying similar movies using content-based (CB) filtering,
performing multi-clustering, and finding the community impact on recommendations using text
analytics. Its steps are:
Step 1 : Data Preprocessing
Step 2 : Similarity based recommendations
Step 3 : Clustering using Items similarity
Step 4 : Clustering using User similarity
Step 5 : Find Social impact on items
Step 6 : Multi-cluster & interactions
Step 7 : Ranking recommendations to user
Step 8 : Validation and testing
Fig. 2. MF-RISE Proposed Architecture.
The detailed implementation of the Multifaceted Recommendation System Engine (MF-RISE) will be
presented in an upcoming paper on our proposed work. This paper presents the proposed method and
the experiments on benchmarking algorithms.
2.1 RECOMMENDER SYSTEM (RS) EVALUATION METHODS
Evaluating a recommendation system algorithm is not as easy as evaluating other machine learning
algorithms, because the recommendation output differs for each user [3]. This is the main reason
why we cannot simply divide the dataset into training and testing data. The methods used for
evaluating RSs are,
a) Train - Test Split [6]:
In RS algorithms it is not practical to use fully separate training and testing data sets. The
training data is used to fit the algorithm and the test data is used to evaluate it, but a user in
the training data may not be present in the test data, so it is difficult to use separate test data.
We have therefore used a masking method: the rating values of some users are masked, the masked
ratings are predicted using the algorithm, and the predictions are compared with the masked ratings
to check accuracy.
Fig. 3. Train-Test Split (masking).
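The masking idea can be illustrated with a minimal Python sketch. It is not the implementation used in our experiments: the function name `mask_ratings`, the 20% masking fraction and the toy ratings are assumptions made only for illustration.

```python
import random

def mask_ratings(ratings, test_fraction=0.2, seed=0):
    """Split (user, movie, rating) triples into visible and masked ratings.

    Hypothetical helper: the masked ratings are hidden from the algorithm
    and later compared with its predictions.
    """
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_masked = int(len(shuffled) * test_fraction)
    masked = shuffled[:n_masked]    # hidden ratings, used only for scoring
    visible = shuffled[n_masked:]   # ratings the algorithm may be fitted on
    return visible, masked

# Toy data (assumed values, not taken from MovieLens).
ratings = [
    ("u1", "m1", 4.0), ("u1", "m2", 3.0), ("u1", "m3", 5.0),
    ("u2", "m1", 2.0), ("u2", "m2", 4.0), ("u2", "m3", 1.0),
    ("u3", "m1", 5.0), ("u3", "m2", 3.5), ("u3", "m3", 4.5),
    ("u3", "m4", 2.5),
]
visible, masked = mask_ratings(ratings, test_fraction=0.2)
print(len(visible), "visible ratings,", len(masked), "masked ratings")
```

Because masking works on individual ratings rather than on whole users, a user can appear in both the visible and the masked part, which avoids the unseen-user problem described above.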
b) K-Fold Cross Validation Method [7]:
Cross-validation is a statistical method used to estimate the
performance of RS algorithms or any machine learning algorithm.
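The cross-validation procedure can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code used in our experiments: `k_fold_splits`, `cross_validate`, the global-mean model and the toy ratings are hypothetical names and data.

```python
import random

def k_fold_splits(ratings, k=5, seed=0):
    """Yield (train, test) pairs; every rating is in exactly one test fold."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # round-robin into k groups
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

def cross_validate(ratings, fit, score, k=5, seed=0):
    """Fit on each training part, score on the held-out fold, average."""
    scores = [score(fit(train), test)
              for train, test in k_fold_splits(ratings, k, seed)]
    return sum(scores) / len(scores)

# Assumed toy model: predict the global mean rating for every pair.
def fit_global_mean(train):
    return sum(r for _, _, r in train) / len(train)

def mae_of_mean(mean, test):
    return sum(abs(r - mean) for _, _, r in test) / len(test)

# Assumed toy data: 20 ratings from 4 users on 5 movies.
ratings = [("u%d" % (n % 4), "m%d" % (n % 5), float(1 + n % 5))
           for n in range(20)]
average_mae = cross_validate(ratings, fit_global_mean, mae_of_mean, k=5)
print("average MAE over 5 folds:", average_mae)
```

Any model can be plugged in through the `fit` and `score` arguments, so the same loop serves every algorithm being compared.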
• Divide the dataset into k groups
• Accept one group as Test Data
• Take the remaining groups as Training Data
• Fit the model on the Training Data
• Evaluate it on the Test Data
• Calculate the evaluation score
Fig. 4. K-Fold Cross Validation Concept.
3. BENCHMARKING RECOMMENDER SYSTEM (RS)
ALGORITHMS
We usually categorize recommendation engine algorithms as collaborative filtering models and
content-based models. In this paper, we study and discuss a few advantages and drawbacks of some
popular recommender algorithms and compare their performance based on various evaluation metrics.
This paper sets a benchmark for our proposed implementation of MF-RISE against the popular RS
algorithms below:
a) Similarity Based Algorithm
Baseline algorithm
b) Neighborhood Algorithm
K-Nearest Neighbors Algorithm (KNN)
c) Hybrid Methods
Co-Clustering
Slope-One
d) Matrix Factorization Method
Singular Value Decomposition (SVD)
Advanced SVD (SVD++)
Non-negative Matrix Factorization (NMF)
4. EVALUATION METRICS FOR RECOMMENDER SYSTEMS
The most important task for an RS is to evaluate the performance of the algorithm. The traditional
evaluation metrics used to measure errors may not be effective for recommendation algorithms,
because the recommendations differ for each user and may change even for the same user. We need
both traditional and modern methods to validate recommendation results.
A. Accuracy Metrics
Recommendation accuracy measures the difference between the recommender's estimated ratings and the
actual user ratings. In the formulas below, $\hat{r}_{ui}$ is the predicted rating and $r_{ui}$ the
actual rating of user $u$ for item $i$, and $T$ is the set of test ratings.
a) Mean Absolute Error (MAE) [5]: The absolute error is the magnitude of the difference between the
predicted and the actual rating. The mean value of the absolute errors gives the Mean Absolute
Error (MAE),
$$\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} \left| \hat{r}_{ui} - r_{ui} \right|$$
b) Mean Square Error (MSE) [5]: The average of the squares of the errors is called the Mean Square
Error (MSE). Because the errors are squared, MSE is expressed in squared rating units and penalizes
large errors more heavily than MAE. MSE can be calculated as,
$$\mathrm{MSE} = \frac{1}{|T|} \sum_{(u,i) \in T} \left( \hat{r}_{ui} - r_{ui} \right)^{2}$$
c) Root Mean Square Error (RMSE) [5]: The Root Mean Squared Error (RMSE) is the square root of MSE,
$$\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left( \hat{r}_{ui} - r_{ui} \right)^{2}}$$
RMSE is expressed on the rating scale while still weighting larger errors more heavily, so it is
more useful when large residuals are particularly undesirable. MSE is dominated by the larger error
values. Therefore, RMSE is the preferred accuracy measure.
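The three accuracy metrics can be computed directly from paired lists of predicted and actual ratings. The sketch below is illustrative only; the function names and the four toy ratings are assumptions.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error over paired ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mse(predicted, actual):
    """Mean Square Error over paired ratings."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(predicted, actual))

# Assumed toy ratings; the errors are 0.5, 0, 1 and 2.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 4.0]

print("MAE :", mae(predicted, actual))   # 0.875
print("MSE :", mse(predicted, actual))   # 1.3125
print("RMSE:", rmse(predicted, actual))
```

In this example the single error of 2 contributes 4 of the 5.25 total squared error, which shows how MSE and RMSE weight large errors more heavily than MAE.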
B. Classification Metrics: Recommendation accuracy can also be measured by the traditional precision
and recall metrics. Recommended items that have a high interaction value, i.e. a large number of
ratings, can be considered the most accurate predictions.
i) Precision: Precision is the ratio of true positives (the number of relevant recommended items) to
the total number of recommended items.
ii) Recall: Recall is the ratio of the number of relevant items that are recommended to the number
of all relevant items.
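Precision and recall for one user can be sketched as set operations. The function name `precision_recall` and the movie identifiers are assumptions for illustration.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one user's recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    true_positives = recommended & relevant   # relevant AND recommended
    precision = len(true_positives) / len(recommended) if recommended else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall

# Assumed toy example: 4 recommended movies, 3 relevant movies, 2 in common.
recommended = ["m1", "m2", "m3", "m4"]
relevant = ["m2", "m4", "m7"]
precision, recall = precision_recall(recommended, relevant)
print("precision:", precision)   # 2 of 4 recommended items are relevant
print("recall   :", recall)      # 2 of 3 relevant items were recommended
```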
C. Ranking Metrics [4]: Recommendation accuracy can also be measured by the Top-N results given by
the RS algorithm.
i) Hit Rate: A hit occurs if a user rated one of the top-10 recommended movies. So, first we find
the top-10 movie recommendations. Then, we find the movies rated by the user. If the user rated a
movie that was recommended, we count it as one hit. Finally, the ratio of the number of hits to the
total number of recommended movies is the Hit Ratio.
ii) Miss Ratio: A miss occurs if a user rated a movie that is not present in the top-10 recommended
movies. If the user rated a movie that was not recommended, we count it as one miss. Finally, the
ratio of the number of misses to the total number of recommended movies is the Miss Ratio.
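The hit and miss ratios as defined above can be sketched as follows. The function name and the movie identifiers are assumptions; both ratios are divided by the number of recommended movies, following the definition given in the text.

```python
def hit_and_miss_ratio(top_n, rated):
    """Return (hit ratio, miss ratio) for one user.

    top_n: movies recommended to the user (e.g. the top-10 list).
    rated: movies the user actually rated.
    """
    recommended = set(top_n)
    hits = sum(1 for movie in rated if movie in recommended)
    misses = sum(1 for movie in rated if movie not in recommended)
    total_recommended = len(top_n)
    return hits / total_recommended, misses / total_recommended

# Assumed toy example: top-10 list m1..m10, user rated five movies.
top_10 = ["m%d" % n for n in range(1, 11)]
rated = ["m2", "m5", "m9", "m11", "m12"]
hit_ratio, miss_ratio = hit_and_miss_ratio(top_10, rated)
print("hit ratio :", hit_ratio)    # 3 hits over 10 recommendations
print("miss ratio:", miss_ratio)   # 2 misses over 10 recommendations
```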
D. Execution Time Metrics
The speed of a recommendation algorithm is an important metric, as we are dealing with very large
data sets. The execution time is the time the algorithm requires to compute recommendations from
the input dataset. The time required to fit the algorithm is the Fit Time, and the time taken to
run it on the test data is the Test Time.
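Fit Time and Test Time can be measured by wrapping the fitting and prediction calls with a timer. The sketch below is an assumption-laden illustration: `timed`, the global-mean model and the toy ratings are hypothetical and stand in for a real recommender.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Assumed stand-in model: predict the global mean rating.
def fit_global_mean(train):
    return sum(r for _, _, r in train) / len(train)

def predict_all(model, test):
    return [model for _ in test]

train = [("u1", "m1", 4.0), ("u1", "m2", 2.0), ("u2", "m1", 3.0)]
test = [("u2", "m2", 5.0)]

model, fit_time = timed(fit_global_mean, train)            # Fit Time
predictions, test_time = timed(predict_all, model, test)   # Test Time
print("fit time (s):", fit_time, "test time (s):", test_time)
```

Using `time.perf_counter` rather than wall-clock time gives a monotonic, high-resolution measurement, which matters when fit times are short.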
5. EXPERIMENTAL SETUP
We build our experiments on the MovieLens datasets provided by GroupLens [11]. The MovieLens
datasets contain user ratings for multiple movies. The dataset contains 2113 users, 10197 movies and
855598 user ratings, including tag assignments. The datasets contain only users who have rated at
least 20 movies. They use conventional ratings, which is preferred when predicting ratings. Since
the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) values depend on the rating scale,
the results will be more comparable. We have used the 5-fold cross-validation method to select the
training and testing data more effectively. We analyzed the performance for different numbers of
folds and decided to work with 5 folds to optimize memory and time, because beyond 5 folds the
performance did not improve considerably [7]. The comparison of the MovieLens datasets,
Table I. Comparisons of Datasets.
6. PERFORMANCE EVALUATION
A. Baseline Algorithms [12]
The similarity based algorithms are content based algorithms used for predicting a random rating
based on the distribution of the training ratings; the algorithm assumes that user ratings are
normally distributed. The prediction is generated from a normal distribution $N(\mu, \sigma^2)$,
where $\mu$ and $\sigma$ are estimated from the training data using Maximum Likelihood Estimation
[3]. The baseline estimate additionally accounts for a user bias $b_u$ and an item bias $b_i$,
$$\hat{r}_{ui} = \mu + b_u + b_i$$
If user $u$ is unknown, then the bias $b_u$ is assumed to be zero. The same applies for item $i$
with $b_i$.
The strength of this algorithm is its simple implementation, which makes it useful as a reference
for comparing algorithm accuracy. The points to improve are the need for more personalized
predictions and a lower execution time for complex predictions. This algorithm also faces the cold
start problem for new system users.
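A baseline estimate of the form mean plus user bias plus item bias can be sketched as follows. This is a simplified illustration, not the implementation benchmarked in this paper: here the biases are assumed to be plain average deviations, whereas library implementations typically fit them by regularized optimization. All names and the toy ratings are hypothetical.

```python
from collections import defaultdict

def fit_baseline(train):
    """Estimate global mean, user biases and item biases by averaging."""
    mu = sum(r for _, _, r in train) / len(train)
    user_dev, item_dev = defaultdict(list), defaultdict(list)
    for u, _, r in train:
        user_dev[u].append(r - mu)
    b_u = {u: sum(d) / len(d) for u, d in user_dev.items()}
    for u, i, r in train:
        item_dev[i].append(r - mu - b_u[u])   # what the user bias leaves over
    b_i = {i: sum(d) / len(d) for i, d in item_dev.items()}
    return mu, b_u, b_i

def predict_baseline(model, u, i):
    """mu + b_u + b_i; an unknown user or item contributes a zero bias."""
    mu, b_u, b_i = model
    return mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)

# Assumed toy ratings.
train = [("u1", "m1", 5.0), ("u1", "m2", 3.0),
         ("u2", "m1", 4.0), ("u2", "m2", 2.0)]
model = fit_baseline(train)
print(predict_baseline(model, "u1", "m1"))   # known user and item
print(predict_baseline(model, "u9", "m1"))   # unknown user: b_u = 0
print(predict_baseline(model, "u9", "m9"))   # both unknown: global mean
```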
B. Matrix factorization Algorithm [16]
The Singular Value Decomposition (SVD) is a matrix factorization algorithm popularized by Simon
Funk during the Netflix Prize. It is equivalent to the Probabilistic Matrix Factorization
algorithm. It constructs a matrix with users as rows and items as columns, whose elements are given
by the users' ratings. The singular value decomposition [15] is a method of decomposing a matrix
into three other matrices,
$$A = U S V^{T}$$
where $A$ is the $m \times n$ utility matrix, $U$ is the $m \times r$ rating singular matrix, $S$
is the $r \times r$ diagonal matrix of singular values and $V$ is the $n \times r$ right singular
matrix.
The prediction $\hat{r}_{ui}$ is set as,
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{T} p_u$$
where $p_u$ and $q_i$ are the latent factor vectors of user $u$ and item $i$. If user $u$ is
unknown, then the bias $b_u$ and the factors $p_u$ are assumed to be zero.
SVD works well on a few datasets and can improve the performance of many algorithms. It is closely
related to Principal Component Analysis (PCA), which is useful for dimensionality reduction.
C. SVD++ Algorithm [20]
The SVD++ algorithm is an extension of the SVD algorithm that takes implicit ratings into account.
Like SVD, it constructs a matrix with users as rows and items as columns, whose elements are given
by the users' ratings.
The prediction $\hat{r}_{ui}$ is set as,
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{T} \left( p_u + |I_u|^{-1/2} \sum_{j \in I_u} y_j \right)$$
where $I_u$ is the set of items rated by user $u$ and the $y_j$ terms are a new set of item factors
that capture implicit ratings. Here, an implicit rating describes the fact that a user $u$ rated an
item $j$, regardless of the rating value. If user $u$ is unknown, then the bias $b_u$ and the
factors $p_u$ are assumed to be zero.
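A matrix factorization predictor with biases and latent factors can be fitted with stochastic gradient descent, as in the sketch below. This is a minimal illustration under stated assumptions, not the implementation benchmarked in this paper: the hyperparameters (`n_factors`, `n_epochs`, `lr`, `reg`), the function names and the toy ratings are all hypothetical, and the implicit-feedback terms of SVD++ are omitted.

```python
import math
import random

def fit_svd(train, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit mu, biases b_u, b_i and factor vectors p_u, q_i by SGD."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in train) / len(train)
    b_u, b_i, p, q = {}, {}, {}, {}
    for u, i, _ in train:
        if u not in p:
            b_u[u] = 0.0
            p[u] = [rng.gauss(0, 0.1) for _ in range(n_factors)]
        if i not in q:
            b_i[i] = 0.0
            q[i] = [rng.gauss(0, 0.1) for _ in range(n_factors)]
    for _ in range(n_epochs):
        for u, i, r in train:
            dot = sum(a * b for a, b in zip(p[u], q[i]))
            err = r - (mu + b_u[u] + b_i[i] + dot)
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mu, b_u, b_i, p, q

def predict_svd(model, u, i):
    """mu + b_u + b_i + q_i . p_u; unknown users/items contribute zero."""
    mu, b_u, b_i, p, q = model
    pred = mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)
    if u in p and i in q:
        pred += sum(a * b for a, b in zip(p[u], q[i]))
    return pred

def rmse(model_fn, ratings):
    return math.sqrt(sum((model_fn(u, i) - r) ** 2
                         for u, i, r in ratings) / len(ratings))

# Assumed toy ratings.
train = [("u1", "m1", 5.0), ("u1", "m2", 3.0), ("u2", "m1", 4.0),
         ("u2", "m2", 2.0), ("u3", "m1", 1.0), ("u3", "m3", 2.0)]
model = fit_svd(train)
train_rmse = rmse(lambda u, i: predict_svd(model, u, i), train)
baseline_rmse = rmse(lambda u, i: model[0], train)   # global mean only
print("training RMSE:", train_rmse, "global-mean RMSE:", baseline_rmse)
```

The comparison against the global mean is only a sanity check on the training data; a real evaluation would score held-out ratings as described in Section 2.1.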
D. Non-Negative Matrix Factorization(NMF) [22]
The Non-Negative Matrix Factorization (NMF) algorithm is a collaborative filtering algorithm,
similar to SVD, in which the user and item factors are kept non-negative. It constructs a matrix
with users as rows and items as columns, whose elements are given by the users' ratings.
The NMF algorithm can improve the performance of many algorithms. NMF-based methods are also used
for solving problems in computer vision. The computational complexity of CF-based algorithms is
very high, and many ratings are missing. Some major improvements are required to achieve high
computational efficiency and prediction accuracy.
E. Co-Clustering Algorithm [25]
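Co-Clustering is a collaborative filtering algorithm. The approach is based on the simultaneous
clustering of users and movies (items) for an efficient CF-based algorithm.
In the co-clustering method, every user and every movie is assigned to a cluster $C_u$ and $C_i$
respectively, and to a co-cluster $C_{ui}$. The prediction $\hat{r}_{ui}$ is set as,
$$\hat{r}_{ui} = \overline{C_{ui}} + (\mu_u - \overline{C_u}) + (\mu_i - \overline{C_i})$$
where $\overline{C_{ui}}$, $\overline{C_u}$ and $\overline{C_i}$ are the average ratings of the
co-cluster, of the user's cluster and of the item's cluster, and $\mu_u$ and $\mu_i$ are the
average ratings of user $u$ and item $i$. If the user is unknown, the prediction is
$\hat{r}_{ui} = \mu_i$. If the item is unknown, the prediction is $\hat{r}_{ui} = \mu_u$. If both
are unknown, the prediction is $\hat{r}_{ui} = \mu$.
The co-clustering algorithm has good control over learning and can consider multiple dimensions of
the data, but it needs more execution time in a few cases for some critical recommendations. The
cold start issue becomes a major issue in this algorithm.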
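The co-clustering prediction rule, including its fallbacks for unknown users and items, can be sketched as a lookup over precomputed averages. The sketch assumes the clustering has already been performed; the dictionary layout, the function name and all numeric values are hypothetical and chosen by hand for illustration.

```python
def predict_coclustering(model, u, i):
    """Co-cluster mean plus user and item offsets from their cluster means."""
    known_user = u in model["user_mean"]
    known_item = i in model["item_mean"]
    if not known_user and not known_item:
        return model["mu"]                      # both unknown: global mean
    if not known_user:
        return model["item_mean"][i]            # unknown user: item mean
    if not known_item:
        return model["user_mean"][u]            # unknown item: user mean
    cu = model["user_cluster"][u]
    ci = model["item_cluster"][i]
    return (model["cocluster_mean"][(cu, ci)]
            + (model["user_mean"][u] - model["user_cluster_mean"][cu])
            + (model["item_mean"][i] - model["item_cluster_mean"][ci]))

# Hand-set averages (assumed values, not fitted from data).
model = {
    "mu": 3.5,
    "user_mean": {"u1": 4.0},
    "item_mean": {"m1": 4.5},
    "user_cluster": {"u1": 0},
    "item_cluster": {"m1": 1},
    "user_cluster_mean": {0: 3.8},
    "item_cluster_mean": {1: 4.2},
    "cocluster_mean": {(0, 1): 4.1},
}
print(predict_coclustering(model, "u1", "m1"))   # 4.1 + 0.2 + 0.3
print(predict_coclustering(model, "u9", "m1"))   # unknown user
print(predict_coclustering(model, "u1", "m9"))   # unknown item
print(predict_coclustering(model, "u9", "m9"))   # both unknown
```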
F. Slope One Algorithms [27]
The Slope One algorithm operates on the movie-user rating matrix and is based on the linear model
y = x + b, where y is the predicted rating of the target user for the target movie, x is the rating
of the target user for a reference movie, and b is the deviation between the ratings of the two
movies. The algorithm first computes, over the users who rated both movies, the mean score
difference between each pair of movies. At recommendation time it uses the linear relationship to
estimate the prediction score y of a movie from the target user's score x for a reference movie and
the deviation value b; that is, it generates the prediction from the deviation values of all users
across different movies. The Slope One algorithm is simple to compute and performs well. It can
handle the cold start issue well by predicting ratings. However, its fit time is higher compared to
other algorithms.
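The Slope One idea can be sketched in a few lines. This is a minimal, unweighted variant written for illustration, not the implementation benchmarked in this paper; the function names and the three-user toy data are assumptions.

```python
from collections import defaultdict

def fit_slope_one(train):
    """Compute the mean rating difference dev[(i, j)] for every movie pair."""
    by_user = defaultdict(dict)
    for u, i, r in train:
        by_user[u][i] = r
    diff_sum, count = defaultdict(float), defaultdict(int)
    for items in by_user.values():
        for i, r_i in items.items():
            for j, r_j in items.items():
                if i != j:
                    diff_sum[(i, j)] += r_i - r_j
                    count[(i, j)] += 1
    dev = {pair: diff_sum[pair] / count[pair] for pair in diff_sum}
    return dict(by_user), dev

def predict_slope_one(model, u, i):
    """Average of x + b over the user's rated movies j, with b = dev[(i, j)]."""
    by_user, dev = model
    estimates = [r_j + dev[(i, j)]
                 for j, r_j in by_user.get(u, {}).items() if (i, j) in dev]
    if not estimates:
        return None   # no co-rated reference movie available
    return sum(estimates) / len(estimates)

# Assumed toy ratings for three users and movies A, B, C.
train = [("john", "A", 5.0), ("john", "B", 3.0), ("john", "C", 2.0),
         ("mark", "A", 3.0), ("mark", "B", 4.0),
         ("lucy", "B", 2.0), ("lucy", "C", 5.0)]
model = fit_slope_one(train)
print(predict_slope_one(model, "lucy", "A"))   # ((2 + 0.5) + (5 + 3)) / 2 = 5.25
print(predict_slope_one(model, "mark", "C"))   # ((3 - 3) + (4 + 1)) / 2 = 2.5
```

The pairwise deviation table is what makes fitting comparatively expensive, since it grows with the square of the number of movies.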
7. RESULTS
All experiments were run on a desktop with an Intel Core i5 8th gen CPU (2.30 GHz) and 8 GB RAM,
with all data stored on a solid-state drive (SSD) for faster access and optimum performance. In
this paper, we present evaluation parameters such as the average RMSE, MAE and total execution time
of the algorithms used in the study, obtained with a 5-fold cross-validation procedure.
Table II. Evaluation Metrics for Benchmark Algorithms.
Fig. 15. Benchmark Algorithms Performance Analysis.
Fig. 17. Comparative Performance Analysis.
8. CONCLUSION AND FUTURE WORK
In this paper, we present a real-world benchmark for our new recommendation system algorithm. The
prediction accuracy of a recommender system depends on various parameters. In this study, all
algorithms were optimized for the MovieLens dataset. The SVD, NMF and co-clustering algorithms
perform better on the larger dataset than the other collaborative filtering algorithms. To obtain
more detailed results, the algorithms can be tested on datasets with more similar properties.
We have deployed many important RS algorithms, which are ubiquitous and crucial in recommendation
scenarios, to compare their performance. After comparing all algorithms, we found that the SVD++
algorithm needs the highest Fit Time due to the complexity of its calculations. All algorithms
except the Baseline algorithm yield a lower RMSE. The overall execution time of the SVD algorithm
and the co-clustering algorithm is very low. So, we plan to use the SVD, NMF and Co-Clustering
algorithms for an efficient implementation of the movie recommendation process.
We can conclude that the SVD, NMF and Co-Clustering algorithms are seemingly more accurate than the
other item-based collaborative filtering algorithms for larger datasets.
REFERENCES
[1] MALI, MAHESH, DHIRENDRA S. MISHRA, AND M. VIJAYALAXMI. "Multifaceted
recommender systems methods: A review." Journal of Statistics and Management Systems 23.2
(2020): 349-361.
[2] MEHTA Y., SINGHANIA A., TYAGI A., SHRIVASTAVA P., MALI M. (2020) A Comparative
Study of Recommender Systems. In: Kumar A., Paprzycki M., Gunjan V. (eds) ICDSMLA
2019. Lecture Notes in Electrical Engineering, vol 601. Springer, Singapore.
[3] F.O. ISINKAYE, Y.O. FOLAJIMI, B.A. OJOKOH, Recommendation systems: Principles, methods
and evaluation, Egyptian Informatics Journal, Volume 16, Issue 3, 2015, Pages 261-273, ISSN
1110-8665,
https://doi.org/10.1016/j.eij.2015.06.005.
[4] PU, PEARL, LI CHEN, AND RONG HU. "Evaluating recommender systems from the users
perspective: survey of the state of the art." User Modeling and User-Adapted Interaction 22.4
(2012): 317-355.
[5] CREMONESI, PAOLO, ET AL. "An evaluation methodology for collaborative recommender
systems." 2008 International Conference on Automated Solutions for Cross Media Content and
Multi-Channel Distribution. IEEE, 2008.
[6] CANAMARES, ROCIO, PABLO CASTELLS, AND ALISTAIR MOFFAT. "Offline evaluation
options for recommender systems." Information Retrieval Journal 23.4 (2020): 387-410.
[7] MORENO-TORRES, JOSE GARCIA, JOSE A. SAEZ, AND FRANCISCO HERRERA. "Study
on the impact of partition-induced dataset shift on k-fold cross validation." IEEE Transactions
on Neural Networks and Learning Systems 23.8 (2012): 1304-1312.
[8] FAYYAZ, ZESHAN, ET AL. "Recommendation systems: Algorithms, challenges, metrics, and
business opportunities." applied sciences 10.21 (2020): 7748.
[9] SHARMA, RITU, DINESH GOPALANI, AND YOGESH MEENA. "Collaborative filtering-based
recommender system: Approaches and research challenges." 2017 3rd international conference
on computational intelligence and communication technology (CICT). IEEE, 2017.
[10] HUG, NICOLAS. "Surprise: A Python library for recommender systems." Journal of Open
Source Software 5.52 (2020): 2174.
[11] F. MAXWELL HARPER AND JOSEPH A. KONSTAN. 2015. The MovieLens Datasets: History
and Context. ACM Trans. Interact. Intell. Syst. 5, 4, Article 19 (January 2016), 19 pages.
https://doi.org/10.1145/2827872
[12] RENDLE, STEFFEN, LI ZHANG, AND YEHUDA KOREN. "On the difficulty of evaluating
baselines: A study on recommender systems." arXiv preprint arXiv:1905.01395 (2019).
[13] WANG, KAI, ET AL. "Rl4rs: A real-world benchmark for reinforcement learning based
recommender system." arXiv preprint arXiv:2110.11073 (2021).
[14] AHUJA, RISHABH, ARUN SOLANKI, AND ANAND NAYYAR. "Movie recommender system
using k-means clustering and k-nearest neighbor." 2019 9th International Conference on Cloud
Computing, Data Science and Engineering (Confluence). IEEE, 2019.
[15] WANG, JIANFANG, ET AL. "A collaborative filtering algorithm based on svd and trust factor."
2019 international conference on computer, network, communication and information systems
(CNCI 2019). Atlantis Press, 2019.
[16] MEHTA, RACHANA, AND KEYUR RANA. "A review on matrix factorization techniques in
recommender systems." 2017 2nd International Conference on Communication Systems,
Computing and IT Applications (CSCITA). IEEE, 2017.
[17] RICCI, FRANCESCO, LIOR ROKACH, AND BRACHA SHAPIRA. "Recommender systems:
introduction and challenges." Recommender systems handbook. Springer, Boston, MA, 2015.
1-34.
[18] ZHANG, WEIWEI, ET AL. "Recommendation system in social networks with topical attention
and probabilistic matrix factorization." PloS one 14.10 (2019): e0223967.
[19] VOZALIS, MANOLIS G., AND KONSTANTINOS G. MARGARITIS. "A recommender system
using principal component analysis." Published in 11th panhellenic conference in informatics.
2007.
[20] AL SABAAWI, A., KARACAN, H. AND YENICE, Y. (2021). Two Models Based on Social
Relations and SVD++ Method for Recommendation System. International Association of
Online Engineering. Retrieved May 9, 2022 from https://www.learntechlib.org/p/218689/.
[21] GUAN, XIN, CHANG-TSUN LI, AND YU GUAN. "Matrix factorization with rating
completion: An enhanced SVD model for collaborative filtering recommender systems." IEEE
access 5 (2017): 27668-27678.
[22] LUO, XIN, ET AL. "An efficient non-negative matrix-factorization-based approach to
collaborative filtering for recommender systems." IEEE Transactions on Industrial Informatics
10.2 (2014): 1273-1284.
[23] ZHANG, SHENG, ET AL. "Learning from incomplete ratings using nonnegative matrix
factorization." Proceedings of the 2006 SIAM international conference on data mining. Society
for Industrial and Applied Mathematics, 2006.
[24] KUMAR, RAJEEV, B. K. VERMA, AND SHYAM SUNDER RASTOGI. "Social popularity
based SVD++ recommender system." International Journal of Computer Applications 87.14
(2014).
[25] FENG, LIANG, QIANCHUAN ZHAO, AND CANGQI ZHOU. "Improving performances of
Top-N recommendations with co-clustering method." Expert Systems with Applications 143
(2020): 113078.
[26] LI, MAN, LUOSHENG WEN, AND FEIYU CHEN. "A novel Collaborative Filtering
recommendation approach based on Soft Co-Clustering." Physica A: Statistical Mechanics and
its Applications 561 (2021): 125140.
[27] WANG, QING-XIAN, ET AL. "Incremental Slope-one recommenders." Neuro-computing 272
(2018): 606-618.
[28] SONG, YUE TING, AND SHENG WU. "Slope one recommendation algorithm based on user
clustering and scoring preferences." Procedia Computer Science 166 (2020): 539-545.
[29] SALAM PATROUS, ZIAD, AND SAFIR NAJAFI. "Evaluating prediction accuracy for
collaborative filtering algorithms in recommender systems." (2016).