• Survey paper
  • Open access
  • Published: 03 May 2022

A systematic review and research perspective on recommender systems

  • Deepjyoti Roy   ORCID: orcid.org/0000-0002-8020-7145 1 &
  • Mala Dutta 1  

Journal of Big Data volume 9, Article number: 59 (2022)


Recommender systems are efficient tools for filtering online information, which is widespread owing to the changing habits of computer users, personalization trends, and emerging access to the internet. Even though recent recommender systems are capable of giving precise recommendations, they suffer from various limitations and challenges such as scalability, cold-start, and sparsity. Because many techniques exist, selecting among them becomes a complex task when building application-focused recommender systems. In addition, each technique comes with its own set of features, advantages and disadvantages, which raises further questions that should be addressed. This paper aims to undertake a systematic review of recent contributions in the domain of recommender systems, focusing on diverse applications like books, movies, and products. Initially, the various applications of each recommender system are analysed. Then, an algorithmic analysis of the recommender systems is performed and a taxonomy is framed that accounts for the components required for developing an effective recommender system. In addition, the datasets gathered, simulation platforms, and performance metrics of each contribution are evaluated and noted. Finally, this review provides a much-needed overview of the current state of research in this field and points out the existing gaps and challenges to help future researchers develop efficient recommender systems.

Introduction

The recent advancements in technology, along with the prevalence of online services, have made it possible to access a huge amount of online information much faster. Users can post reviews, comments, and ratings for various types of services and products available online. However, the recent advancements in pervasive computing have resulted in an online data overload problem. This data overload complicates the process of finding relevant and useful content over the internet. The recent establishment of several procedures with lower computational requirements can, however, guide users to the relevant content much more easily and quickly. Because of this, the development of recommender systems has recently gained significant attention. In general, recommender systems act as information filtering tools, offering users suitable and personalized content or information. Recommender systems primarily aim to reduce the effort and time a user needs to search for relevant information over the internet.

Nowadays, recommender systems are being increasingly used for a large number of applications such as web [ 1 , 67 , 70 ], books [ 2 ], e-learning [ 4 , 16 , 61 ], tourism [ 5 , 8 , 78 ], movies [ 66 ], music [ 79 ], e-commerce, news, specialized research resources [ 65 ], television programs [ 72 , 81 ], etc. It is therefore important to build high-quality and exclusive recommender systems for providing personalized recommendations to the users in various applications. Despite the various advances in recommender systems, the present generation of recommender systems requires further improvements to provide more efficient recommendations applicable to a broader range of applications. More investigation is required of the latest works on recommender systems that focus on diverse applications.

There is hardly any review paper that has categorically synthesized and reviewed the literature of all the classification fields and application domains of recommender systems. The few existing literature reviews in the field cover just a fraction of the articles or focus only on selected aspects such as system evaluation. Thus, they neither provide an overview of the application field and algorithmic categorization nor identify the most promising approaches. Also, review papers often neglect to analyze the dataset description and the simulation platforms used. This paper aims to fill this significant gap by reviewing and comparing existing articles on recommender systems based on a defined classification framework, their algorithmic categorization, simulation platforms used, applications focused, their features and challenges, dataset description and system performance. Finally, we provide researchers and practitioners with insight into the most promising directions for further investigation in the field of recommender systems under various applications.

In essence, recommender systems deal with two entities, users and items, where each user gives a rating (or preference value) to an item (or product). User ratings are generally collected by using implicit or explicit methods. Implicit ratings are collected indirectly from the user through the user’s interaction with the items. For example, a website may obtain implicit ratings for different items based on clickstream data or from the amount of time a user spends on a webpage. Explicit ratings, on the other hand, are given directly by the user by picking a value on some finite scale of points or labelled interval values. Most recommender systems gather user ratings through both explicit and implicit methods. The feedback or ratings provided by the users are arranged in a user-item matrix called the utility matrix, as presented in Table 1 .

The utility matrix often contains many missing values. The problem of recommender systems is mainly focused on finding the values which are missing in the utility matrix. This task is often difficult as the initial matrix is usually very sparse because users generally tend to rate only a small number of items. It may also be noted that we are interested in only the high user ratings because only such items would be suggested back to the users. The efficiency of a recommender system greatly depends on the type of algorithm used and the nature of the data source—which may be contextual, textual, visual etc.
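To make the idea of a sparse utility matrix concrete, the following is a minimal sketch and not taken from the reviewed works. The user and item names, the 1 to 5 rating scale, and the helper functions `sparsity` and `missing_cells` are all assumptions introduced for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy utility matrix: rows are users, columns are items.
# None marks a missing rating; the 1-5 scale is an assumption for illustration.
UTILITY: Dict[str, Dict[str, Optional[int]]] = {
    "user_a": {"item_1": 5, "item_2": None, "item_3": 1, "item_4": None},
    "user_b": {"item_1": None, "item_2": 4, "item_3": None, "item_4": None},
    "user_c": {"item_1": 4, "item_2": None, "item_3": None, "item_4": 2},
}


def sparsity(matrix: Dict[str, Dict[str, Optional[int]]]) -> float:
    """Return the fraction of user-item cells that hold no rating."""
    cells = [rating for row in matrix.values() for rating in row.values()]
    return sum(rating is None for rating in cells) / len(cells)


def missing_cells(matrix: Dict[str, Dict[str, Optional[int]]]) -> List[Tuple[str, str]]:
    """List the (user, item) pairs whose rating a recommender would have to predict."""
    return [
        (user, item)
        for user, row in matrix.items()
        for item, rating in row.items()
        if rating is None
    ]


if __name__ == "__main__":
    print(f"sparsity: {sparsity(UTILITY):.2f}")
    print(missing_cells(UTILITY))
```

In this toy matrix 7 of the 12 cells are empty; real utility matrices are typically far sparser.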

Types of recommender systems

Recommender systems are broadly categorized into three different types viz. content-based recommender systems, collaborative recommender systems and hybrid recommender systems. A diagrammatic representation of the different types of recommender systems is given in Fig.  1 .

figure 1

Content-based recommender system

In content-based recommender systems, all the data items are collected into different item profiles based on their description or features. For example, in the case of a book, the features will be author, publisher, etc. In the case of a movie, the features will be the movie director, actor, etc. When a user gives a positive rating to an item, then the other items present in that item profile are aggregated together to build a user profile. This user profile combines all the item profiles, whose items are rated positively by the user. Items present in this user profile are then recommended to the user, as shown in Fig.  2 .

figure 2

One drawback of this approach is that it demands in-depth knowledge of the item features for an accurate recommendation. This knowledge or information may not always be available for all items. Also, this approach has limited capacity to expand on the users' existing choices or interests. However, this approach has many advantages. As user preferences tend to change with time, this approach can quickly and dynamically adapt itself to the changing user preferences. Since one user profile is specific only to that user, this algorithm does not require the profile details of any other users because they have no influence on the recommendation process. This helps protect the security and privacy of user data. If new items have sufficient description, content-based techniques can overcome the cold-start problem, i.e., this technique can recommend an item even when that item has not been previously rated by any user. Content-based filtering approaches are more common in systems such as personalized news recommender systems, publication recommender systems, and web page recommender systems.
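The content-based process described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of any reviewed paper: the item names and features are hypothetical, the user profile is modelled as feature counts, and cosine similarity is one of several possible matching functions.

```python
import math
from collections import Counter
from typing import Dict, List, Set

# Hypothetical item profiles: each item is described by a set of features.
ITEM_PROFILES: Dict[str, Set[str]] = {
    "book_1": {"author:asimov", "genre:scifi"},
    "book_2": {"author:asimov", "genre:essay"},
    "book_3": {"author:austen", "genre:romance"},
    "book_4": {"author:clarke", "genre:scifi"},
}


def build_user_profile(liked_items: List[str]) -> Counter:
    """Aggregate the features of positively rated items into feature counts."""
    profile: Counter = Counter()
    for item in liked_items:
        profile.update(ITEM_PROFILES[item])
    return profile


def cosine(profile: Counter, features: Set[str]) -> float:
    """Cosine similarity between a weighted user profile and a binary item profile."""
    dot = sum(profile[f] for f in features)
    norm = math.sqrt(sum(v * v for v in profile.values())) * math.sqrt(len(features))
    return dot / norm if norm else 0.0


def recommend(liked_items: List[str], top_n: int = 1) -> List[str]:
    """Rank unseen items by similarity to the user profile; no other users are needed."""
    profile = build_user_profile(liked_items)
    candidates = [item for item in ITEM_PROFILES if item not in liked_items]
    ranked = sorted(candidates, key=lambda i: cosine(profile, ITEM_PROFILES[i]), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    print(recommend(["book_1", "book_4"]))
```

Note that `book_2` can be recommended here even though nobody has rated it, which is the cold-start advantage for new items mentioned above.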

Collaborative filtering-based recommender system

Collaborative approaches make use of the measure of similarity between users. This technique starts with finding a group or collection of users X whose preferences, likes, and dislikes are similar to those of user A. X is called the neighbourhood of A. The new items which are liked by most of the users in X are then recommended to user A. The efficiency of a collaborative algorithm depends on how accurately the algorithm can find the neighbourhood of the target user. Traditionally, collaborative filtering-based systems suffer from the cold-start problem and from privacy concerns, as there is a need to share user data. However, collaborative filtering approaches do not require any knowledge of item features for generating a recommendation. Also, this approach can help to expand on the user’s existing interests by discovering new items. Collaborative approaches are again divided into two types: memory-based approaches and model-based approaches.

Memory-based collaborative approaches recommend new items by taking into consideration the preferences of the target user's neighbourhood. They make use of the utility matrix directly for prediction. In this approach, the first step is to build a model. The model is a function that takes the utility matrix as input.

Model = f (utility matrix)

Then recommendations are made based on a function that takes the model and user profile as input. Here we can make recommendations only to users whose user profile belongs to the utility matrix. Therefore, to make recommendations for a new user, the user profile must be added to the utility matrix, and the similarity matrix must be recomputed, which makes this technique computationally heavy.

Recommendation = f (defined model, user profile) where user profile  ∈  utility matrix

Memory-based collaborative approaches are again sub-divided into two types: user-based collaborative filtering and item-based collaborative filtering. In the user-based approach, the user rating of a new item is calculated by finding other users from the user neighbourhood who have previously rated that same item. If a new item receives positive ratings from the user neighbourhood, the new item is recommended to the user. Figure  3 depicts the user-based filtering approach.

figure 3

User-based collaborative filtering
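The user-based approach can be sketched as below. This is a minimal illustration under stated assumptions: the ratings are hypothetical, cosine similarity over co-rated items stands in for whichever similarity measure a given system uses, and the neighbourhood size `k` is an arbitrary choice.

```python
import math
from typing import Dict, List

# Hypothetical ratings: user -> item -> rating on an assumed 1-5 scale.
RATINGS: Dict[str, Dict[str, int]] = {
    "A": {"i1": 5, "i2": 4, "i3": 1},
    "B": {"i1": 5, "i2": 5, "i3": 1, "i4": 5},
    "C": {"i1": 1, "i2": 2, "i3": 5, "i4": 1},
    "D": {"i1": 4, "i2": 4, "i4": 4},
}


def similarity(u: str, v: str) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = RATINGS[u].keys() & RATINGS[v].keys()
    if not common:
        return 0.0
    dot = sum(RATINGS[u][i] * RATINGS[v][i] for i in common)
    norm_u = math.sqrt(sum(RATINGS[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(RATINGS[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def neighbourhood(user: str, item: str, k: int = 2) -> List[str]:
    """The k users most similar to `user` among those who have rated `item`."""
    raters = [v for v in RATINGS if v != user and item in RATINGS[v]]
    return sorted(raters, key=lambda v: similarity(user, v), reverse=True)[:k]


def predict(user: str, item: str, k: int = 2) -> float:
    """Similarity-weighted average of the neighbours' ratings for `item`."""
    neighbours = neighbourhood(user, item, k)
    total = sum(similarity(user, v) for v in neighbours)
    if total == 0:
        return 0.0
    return sum(similarity(user, v) * RATINGS[v][item] for v in neighbours) / total


if __name__ == "__main__":
    print(neighbourhood("A", "i4"), round(predict("A", "i4"), 2))
```

Here users B and D form the neighbourhood of A for item `i4`, so the predicted rating falls between their ratings of 5 and 4.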

In the item-based approach, an item-neighbourhood is built consisting of all similar items which the user has rated previously. Then that user’s rating for a different new item is predicted by calculating the weighted average of all ratings present in a similar item-neighbourhood as shown in Fig.  4 .

figure 4

Item-based collaborative filtering
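The item-based variant can be sketched in the same way. Again this is only an illustration: the ratings are hypothetical, cosine similarity between item rating columns is an assumed choice, and `k` is arbitrary.

```python
import math
from typing import Dict

# Hypothetical ratings: user -> item -> rating on an assumed 1-5 scale.
RATINGS: Dict[str, Dict[str, int]] = {
    "A": {"i1": 5, "i2": 4, "i3": 1},
    "B": {"i1": 5, "i2": 5, "i3": 1, "i4": 5},
    "C": {"i1": 1, "i2": 2, "i3": 5, "i4": 1},
    "D": {"i1": 4, "i2": 4, "i4": 4},
}


def item_similarity(i: str, j: str) -> float:
    """Cosine similarity between two items over the users who rated both."""
    users = [u for u, row in RATINGS.items() if i in row and j in row]
    if not users:
        return 0.0
    dot = sum(RATINGS[u][i] * RATINGS[u][j] for u in users)
    norm_i = math.sqrt(sum(RATINGS[u][i] ** 2 for u in users))
    norm_j = math.sqrt(sum(RATINGS[u][j] ** 2 for u in users))
    return dot / (norm_i * norm_j)


def predict(user: str, item: str, k: int = 2) -> float:
    """Weighted average of the user's own ratings on the k items most similar to `item`."""
    scored = sorted(
        ((item_similarity(item, j), rating) for j, rating in RATINGS[user].items()),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        return 0.0
    return sum(sim * rating for sim, rating in scored) / total


if __name__ == "__main__":
    print(round(predict("A", "i4"), 2))
```

For user A, the item-neighbourhood of `i4` consists of `i1` and `i2`, so the prediction is a weighted average of A's ratings 5 and 4.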

Model-based systems use various data mining and machine learning algorithms to develop a model for predicting the user’s rating for an unrated item. They do not rely on the complete dataset when recommendations are computed but extract features from the dataset to compute a model. Hence the name, model-based technique. These techniques also need two steps for prediction—the first step is to build the model, and the second step is to predict ratings using a function (f) which takes the model defined in the first step and the user profile as input.

Recommendation = f (defined model, user profile) where user profile  ∉  utility matrix

Model-based techniques do not require adding the user profile of a new user into the utility matrix before making predictions. We can make recommendations even to users that are not present in the model. Model-based systems are more efficient for group recommendations. They can quickly recommend a group of items by using the pre-trained model. The accuracy of this technique largely relies on the efficiency of the underlying learning algorithm used to create the model. Model-based techniques are capable of solving some traditional problems of recommender systems such as sparsity and scalability by employing dimensionality reduction techniques [ 86 ] and model learning techniques.
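As one possible instance of a model-based technique, the sketch below learns a low-rank matrix factorization by stochastic gradient descent and then "folds in" a user who was not in the utility matrix. Matrix factorization is chosen here only for illustration; the ratings, the hyperparameters (`k`, `epochs`, `lr`, `reg`) and the function names `train` and `fold_in` are assumptions, not taken from the reviewed works.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical ratings: user -> item -> rating on an assumed 1-5 scale.
RATINGS: Dict[str, Dict[str, int]] = {
    "A": {"i1": 5, "i2": 4, "i3": 1},
    "B": {"i1": 5, "i2": 5, "i3": 1, "i4": 5},
    "C": {"i1": 1, "i2": 2, "i3": 5, "i4": 1},
    "D": {"i1": 4, "i2": 4, "i4": 4},
}
ITEMS = ["i1", "i2", "i3", "i4"]

Factors = Dict[str, List[float]]


def dot(p: List[float], q: List[float]) -> float:
    return sum(a * b for a, b in zip(p, q))


def train(ratings, items, k=2, epochs=3000, lr=0.01, reg=0.02, seed=0) -> Tuple[Factors, Factors]:
    """Step 1: build the model (user factors P and item factors Q) from the utility matrix."""
    rng = random.Random(seed)
    P = {u: [rng.uniform(0.1, 1.0) for _ in range(k)] for u in ratings}
    Q = {i: [rng.uniform(0.1, 1.0) for _ in range(k)] for i in items}
    for _ in range(epochs):
        for u, row in ratings.items():
            for i, r in row.items():
                err = r - dot(P[u], Q[i])
                for f in range(k):
                    pu, qi = P[u][f], Q[i][f]
                    P[u][f] += lr * (err * qi - reg * pu)
                    Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def fold_in(new_ratings: Dict[str, int], Q: Factors, k=2, epochs=3000, lr=0.01, reg=0.02) -> List[float]:
    """Step 2 for an unseen user: fit only that user's factors, keeping the item model fixed."""
    p = [0.5] * k
    for _ in range(epochs):
        for i, r in new_ratings.items():
            err = r - dot(p, Q[i])
            for f in range(k):
                p[f] += lr * (err * Q[i][f] - reg * p[f])
    return p


def rmse(ratings, P: Factors, Q: Factors) -> float:
    """Root-mean-square error of the model on the known ratings."""
    errors = [(r - dot(P[u], Q[i])) ** 2 for u, row in ratings.items() for i, r in row.items()]
    return (sum(errors) / len(errors)) ** 0.5


if __name__ == "__main__":
    P, Q = train(RATINGS, ITEMS)
    print("training RMSE:", round(rmse(RATINGS, P, Q), 3))
    print("predicted A -> i4:", round(dot(P["A"], Q["i4"]), 2))
    newcomer = fold_in({"i1": 5, "i3": 1}, Q)
    print("newcomer -> i4:", round(dot(newcomer, Q["i4"]), 2))
```

The design point is that `fold_in` reuses the trained item factors, so a new user can receive predictions without the whole model being rebuilt.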

Hybrid filtering

A hybrid technique is an aggregation of two or more techniques employed together for addressing the limitations of individual recommender techniques. The incorporation of different techniques can be performed in various ways. A hybrid algorithm may incorporate the results achieved from separate techniques, or it can use content-based filtering in a collaborative method or use a collaborative filtering technique in a content-based method. This hybrid incorporation of different techniques generally results in increased performance and increased accuracy in many recommender applications. Some of the hybridization approaches are meta-level, feature-augmentation, feature-combination, mixed hybridization, cascade hybridization, switching hybridization and weighted hybridization [ 86 ]. Table 2 describes these approaches.
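Two of the hybridization approaches named above, weighted and switching, can be sketched as follows. The interpretation used here (a linear combination of scores for weighted hybridization, and a fallback rule for switching hybridization) is a common reading and an assumption of this sketch; the score values and the weight `alpha` are hypothetical.

```python
from typing import Dict, List, Optional


def weighted_hybrid(content_score: float, collab_score: float, alpha: float = 0.5) -> float:
    """Weighted hybridization: linear combination of the two techniques' scores."""
    return alpha * content_score + (1 - alpha) * collab_score


def switching_hybrid(content_score: float, collab_score: Optional[float]) -> float:
    """Switching hybridization: use the collaborative score when one exists,
    otherwise fall back to the content-based score (e.g. for an unrated item)."""
    return content_score if collab_score is None else collab_score


def rank(content: Dict[str, float], collab: Dict[str, float], alpha: float = 0.5) -> List[str]:
    """Rank items by their weighted hybrid score, highest first."""
    scores = {item: weighted_hybrid(content[item], collab[item], alpha) for item in content}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    content = {"i1": 0.9, "i2": 0.2, "i3": 0.5}
    collab = {"i1": 0.1, "i2": 0.8, "i3": 0.6}
    print(rank(content, collab, alpha=0.5))
```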

Recommender system challenges

This section briefly describes the various challenges present in current recommender systems and offers different solutions to overcome these challenges.

Cold start problem

The cold start problem appears when the recommender system cannot draw any inference from the existing data because it is insufficient. Cold start refers to a condition in which the system cannot produce efficient recommendations for cold (or new) users who have not rated any item or have rated very few items. It generally arises when a new user enters the system or new items (or products) are inserted into the database. Some solutions to this problem are as follows: (a) Ask new users to explicitly mention their item preference. (b) Ask a new user to rate some items at the beginning. (c) Collect demographic information (or meta-data) from the user and recommend items accordingly.
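Solution (c) can be sketched as a simple fallback rule. Everything in this sketch is a hypothetical illustration: the threshold `MIN_RATINGS`, the demographic labels, the ratings, and the idea of counting "likes" within the same demographic group are assumptions rather than a method from the reviewed works.

```python
from collections import Counter
from typing import Dict, List

MIN_RATINGS = 3  # assumed threshold below which a user is treated as "cold"

# Hypothetical ratings and demographic labels.
RATINGS: Dict[str, Dict[str, int]] = {
    "u1": {"i1": 5, "i2": 4, "i3": 2},
    "u2": {"i1": 4, "i3": 5, "i4": 5},
    "u3": {"i2": 5, "i4": 4, "i5": 5},
    "new": {},
}
DEMOGRAPHICS: Dict[str, str] = {"u1": "student", "u2": "student", "u3": "retired", "new": "student"}


def is_cold(user: str) -> bool:
    """A user is cold when they have rated fewer than MIN_RATINGS items."""
    return len(RATINGS.get(user, {})) < MIN_RATINGS


def demographic_recommend(user: str, top_n: int = 1, like_threshold: int = 4) -> List[str]:
    """Recommend the items most often liked by other users in the same demographic group."""
    group = DEMOGRAPHICS[user]
    already_rated = RATINGS.get(user, {})
    likes: Counter = Counter()
    for other, row in RATINGS.items():
        if other == user or DEMOGRAPHICS[other] != group:
            continue
        for item, rating in row.items():
            if rating >= like_threshold and item not in already_rated:
                likes[item] += 1
    return [item for item, _ in likes.most_common(top_n)]


if __name__ == "__main__":
    if is_cold("new"):
        print(demographic_recommend("new"))
```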

Shilling attack problem

This problem arises when a malicious user fakes his identity and enters the system to give false item ratings [ 87 ]. Such a situation occurs when the malicious user wants to either increase or decrease some item’s popularity by causing a bias on selected target items. Shilling attacks greatly reduce the reliability of the system. One solution to this problem is to detect the attackers quickly and remove the fake ratings and fake user profiles from the system.

Synonymy problem

This problem arises when similar or related items have different entries or names, or when the same item is represented by two or more names in the system [ 78 ]. For example, “babywear” and “baby cloth” may refer to the same product. Many recommender systems fail to recognize that such entries are equivalent, which reduces their recommendation accuracy. To alleviate this problem, methods such as demographic filtering, automatic term expansion and Singular Value Decomposition are used [ 76 ].

Latency problem

The latency problem is specific to collaborative filtering approaches and occurs when new items are frequently inserted into the database. This problem is characterized by the system’s failure to recommend new items. This happens because new items must be reviewed before they can be recommended in a collaborative filtering environment. Using content-based filtering may resolve this issue, but it may introduce overspecialization and increase the computing time, reducing system performance. To increase performance, the calculations can be done in an offline environment and clustering-based techniques can be used [ 76 ].

Sparsity problem

Data sparsity is a common problem in large scale data analysis, which arises when certain expected values are missing in the dataset. In the case of recommender systems, this situation occurs when the active users rate very few items. This reduces the recommendation accuracy. To alleviate this problem several techniques can be used such as demographic filtering, singular value decomposition and using model-based collaborative techniques.

Grey sheep problem

The grey sheep problem is specific to pure collaborative filtering approaches, where the feedback given by one user does not match any user neighbourhood. In this situation, the system fails to accurately predict relevant items for that user. This problem can be resolved by using pure content-based approaches where predictions are made based on the user’s profile and item properties.

Scalability problem

Recommender systems, especially those employing collaborative filtering techniques, require large amounts of training data, which causes scalability problems. The scalability problem arises when the amount of data used as input to a recommender system increases quickly. In this era of big data, more and more items and users are rapidly being added to the system, and this problem is becoming common in recommender systems. Two common approaches used to solve the scalability problem are dimensionality reduction and clustering-based techniques, which find users in small clusters instead of searching the complete database.
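The clustering idea can be sketched as follows: users are grouped once, and the neighbourhood search for a target user is then limited to that user's cluster. This is a minimal sketch under assumptions: the rating vectors are hypothetical and fully specified, k-means with a deterministic initialization is used as the clustering technique, and the function names are illustrative.

```python
from typing import List

# Hypothetical, fully specified rating vectors (one row per user, one column per item).
USERS: List[List[float]] = [
    [5, 5, 1, 1],  # user 0
    [1, 1, 5, 5],  # user 1
    [5, 4, 1, 2],  # user 2
    [1, 2, 5, 4],  # user 3
    [4, 5, 2, 1],  # user 4
    [2, 1, 4, 5],  # user 5
]


def dist2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two rating vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 10) -> List[List[int]]:
    """Plain k-means; the first k points seed the centroids (an assumption for determinism)."""
    centroids = [list(p) for p in points[:k]]
    clusters: List[List[int]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(idx)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [
                    sum(points[m][d] for m in members) / len(members)
                    for d in range(len(points[0]))
                ]
    return clusters


def candidate_neighbours(user: int, clusters: List[List[int]]) -> List[int]:
    """Restrict the neighbour search to the members of the user's own cluster."""
    for members in clusters:
        if user in members:
            return [m for m in members if m != user]
    return []


if __name__ == "__main__":
    clusters = kmeans(USERS, k=2)
    print(clusters, candidate_neighbours(2, clusters))
```

With this arrangement a similarity computation for user 2 touches only two other users instead of all five.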

Methodology

The purpose of this study is to understand the research trends in the field of recommender systems. The nature of research in recommender systems is such that it is difficult to confine each paper to a specific discipline. This is reflected in the fact that research papers on recommender systems are scattered across journals in various fields such as computer science, management, marketing, information technology and information science. Hence, this literature review is conducted over a wide range of electronic journals and research databases such as ACM Portal, IEEE/IEE Library, Google Scholar and Science Direct [ 88 ].

The search for online research articles was performed based on 6 descriptors: “Recommender systems”, “Recommendation systems”, “Movie Recommend*”, “Music Recommend*”, “Personalized Recommend*”, “Hybrid Recommend*”. The following types of documents were excluded from our research:

News articles.

Master’s dissertations.

Non-English papers.

Unpublished papers.

Research papers published before 2011.

We have screened a total of 350 articles based on their abstracts and content. However, only research papers that described how recommender systems can be applied were chosen. Finally, 60 papers were selected from top international journals indexed in Scopus or E-SCI in 2021. We now present the PRISMA flowchart of the inclusion and exclusion process in Fig.  5 .

figure 5

PRISMA flowchart of the inclusion and exclusion process. * Abstract and content not suitable to the study. ** The use or application of the recommender system is not specified.

Each paper was carefully reviewed and classified into 6 categories in the application fields and 3 categories in the techniques used to develop the system. The classification framework is presented in Fig.  6 .

figure 6

Classification framework

The largest share of relevant articles comes from Expert Systems with Applications (23%), followed by IEEE (17%), Knowledge-Based Systems (17%) and Others (43%). Table 3 depicts the article distribution by journal title and Table 4 depicts the sector-wise article distribution.

Both forward and backward searching techniques were implemented to establish that the review of 60 chosen articles can represent the domain literature. Hence, this paper can demonstrate its validity and reliability as a literature review.

Review on state-of-the-art recommender systems

This section presents a state-of-the-art literature review followed by a chronological review of the various existing recommender systems.

Literature review

In 2011, Castellano et al. [ 1 ] developed a “NEuro-fuzzy WEb Recommendation (NEWER)” system for exploiting the possibility of combining computational intelligence and user preference for suggesting interesting web pages to the user in a dynamic environment. It considered a set of fuzzy rules to express the correlations between user relevance and categories of pages. Crespo et al. [ 2 ] presented a recommender system for distance education over the internet. It aimed to recommend e-books to students using data from user interaction. The system was developed using a collaborative approach and focused on solving the data overload problem in big digital content. Lin et al. [ 3 ] have put forward a recommender system for automatic vending machines using Genetic algorithm (GA), k-means, Decision Tree (DT) and Bayesian Network (BN). It aimed at recommending localized products by developing a hybrid model combining statistical methods, classification methods, clustering methods, and meta-heuristic methods. Wang and Wu [ 4 ] have implemented a ubiquitous learning system for providing personalized learning assistance to the learners by combining the recommendation algorithm with a context-aware technique. It employed the Association Rule Mining (ARM) technique and aimed to increase the effectiveness of the learner’s learning. García-Crespo et al. [ 5 ] presented a “semantic hotel” recommender system by considering the experiences of consumers using a fuzzy logic approach. The system considered both hotel and customer characteristics. Dong et al. [ 6 ] proposed a structure for a service-concept recommender system using a semantic similarity model by integrating the techniques from the view of an ontology structure-oriented metric and a concept content-oriented metric. The system was able to deliver optimal performance when compared with similar recommender systems. Li et al. [ 7 ] developed a Fuzzy linguistic modelling-based recommender system for assisting users to find experts in knowledge management systems. The developed system was applied to the aircraft industry where it demonstrated efficient and feasible performance. Lorenzi et al. [ 8 ] presented an “assumption-based multiagent” system to make travel package recommendations using user preferences in the tourism industry. It performed different tasks like discovering, filtering, and integrating specific information for building a travel package following the user requirement. Huang et al. [ 9 ] proposed a context-aware recommender system through the extraction, evaluation and incorporation of contextual information gathered using the collaborative filtering and rough set model.

In 2012, Chen et al. [ 10 ] presented a diabetes medication recommender model by using “Semantic Web Rule Language (SWRL) and Java Expert System Shell (JESS)” for aggregating suitable prescriptions for the patients. It aimed at selecting the most suitable drugs from the list of specific drugs. Mohanraj et al. [ 11 ] developed the “Ontology-driven bee’s foraging approach (ODBFA)” to accurately predict the online navigations most likely to be visited by a user. The self-adaptive system is intended to capture the various requirements of the online user by using a scoring technique and by performing a similarity comparison. Hsu et al. [ 12 ] proposed a “personalized auxiliary material” recommender system by considering the specific course topics, individual learning styles, complexity of the auxiliary materials using an artificial bee colony algorithm. Gemmell et al. [ 13 ] demonstrated a solution for the problem of resource recommendation in social annotation systems. The model was developed using a linear-weighted hybrid method which was capable of providing recommendations under different constraints. Choi et al. [ 14 ] proposed one “Hybrid Online-Product rEcommendation (HOPE) system” by the integration of collaborative filtering through sequential pattern analysis-based recommendations and implicit ratings. Garibaldi et al. [ 15 ] put forward a technique for incorporating the variability in a fuzzy inference model by using non-stationary fuzzy sets for replicating the variabilities of a human. This model was applied to a decision problem for treatment recommendations of post-operative breast cancer.

In 2013, Salehi and Kamalabadi [ 16 ] proposed an e-learning material recommender system by “modelling of materials in a multidimensional space of material’s attribute”. It employed both content and collaborative filtering. Aher and Lobo [ 17 ] introduced a course recommender system using data mining techniques such as simple K-means clustering and Association Rule Mining (ARM) algorithm. The proposed e-learning system was successfully demonstrated for “MOOC (Massive Open Online Courses)”. Kardan and Ebrahimi [ 18 ] developed a hybrid recommender system for recommending posts in asynchronous discussion groups. The system was built combining both collaborative filtering and content-based filtering. It considered implicit user data to compute the user similarity with various groups, for recommending suitable posts and contents to its users. Chang et al. [ 19 ] adopted a cloud computing technology for building a TV program recommender system. The system designed for digital TV programs was implemented using Hadoop Fair Scheduler (HFC), K-means clustering and k-nearest neighbour (KNN) algorithms. It was successful in processing huge amounts of real-time user data. Lucas et al. [ 20 ] implemented a recommender model for assisting a tourism application by using associative classification and fuzzy logic to predict the context. Niu et al. [ 21 ] introduced “Affivir: An Affect-based Internet Video Recommendation System” which was developed by calculating user preferences and by using spectral clustering. This model recommended videos with similar affect, and it was processed to get optimal results with dynamic adjustments of recommendation constraints.

In 2014, Liu et al. [ 22 ] implemented a new route recommendation model for offering personalized and real-time route recommendations for self-driven tourists to minimize the queuing time and traffic jams in famous tourist places. Recommendations were carried out by considering the preferences of users. Bakshi et al. [ 23 ] proposed an unsupervised learning-based recommender model for solving the scalability problem of recommender systems. The algorithm used transitive similarities along with Particle Swarm Optimization (PSO) technique for discovering the global neighbours. Kim and Shim [ 24 ] proposed a recommender system based on “latent Dirichlet allocation using probabilistic modelling for Twitter” that could recommend the top-K tweets for a user to read, and the top-K users to follow. The model parameters were learned from an inference technique by using the differential Expectation–Maximization (EM) algorithm. Wang et al. [ 25 ] developed a hybrid-movie recommender model by aggregating a genetic algorithm (GA) with improved K-means and Principal Component Analysis (PCA) technique. It was able to offer intelligent movie recommendations with personalized suggestions. Kolomvatsos et al. [ 26 ] proposed a recommender system by considering an optimal stopping theory for delivering books or music recommendations to the users. Gottschlich et al. [ 27 ] proposed a decision support system for stock investment recommendations. It computed the output by considering the overall crowd’s recommendations. Torshizi et al. [ 28 ] have introduced a hybrid recommender system to determine the severity level of a medical condition. It could recommend suitable therapies for patients suffering from Benign Prostatic Hyperplasia.

In 2015, Zahálka et al. [ 29 ] proposed a venue recommender: “City Melange”. It was an interactive content-based model which used the convolutional deep-net features of the visual domain and the linear Support Vector Machine (SVM) model to capture the semantic information and extract latent topics. Sankar et al. [ 30 ] have proposed a stock recommender system based on the stock holding portfolio of trusted mutual funds. The system employed the collaborative filtering approach along with social network analysis for offering a decision support system to build a trust-based recommendation model. Chen et al. [ 31 ] have put forward a novel movie recommender system by applying the “artificial immune network to collaborative filtering” technique. It computed the affinity of an antigen and the affinity between an antibody and antigen. Based on this computation a similarity estimation formula was introduced which was used for the movie recommendation process. Wu et al. [ 32 ] have examined the technique of data fusion for increasing the efficiency of item recommender systems. It employed a hybrid linear combination model and used a collaborative tagging system. Yeh and Cheng [ 33 ] have proposed a recommender system for tourist attractions by constructing the “elicitation mechanism using the Delphi panel method and matrix construction mechanism using the repertory grids”, which was developed by considering the user preference and expert knowledge.

In 2016, Liao et al. [ 34 ] proposed a recommender model for online customers using a rough set association rule. The model computed the probable behavioural variations of online consumers and provided product category recommendations for e-commerce platforms. Li et al. [ 35 ] have suggested a movie recommender system based on user feedback collected from microblogs and social networks. It employed the sentiment-aware association rule mining algorithm for recommendations using the prior information of frequent program patterns, program metadata similarity and program view logs. Wu et al. [ 36 ] have developed a recommender system for social media platforms by aggregating the technique of Social Matrix Factorization (SMF) and Collaborative Topic Regression (CTR). The model was able to compute the ratings of users to items for making recommendations. For improving the recommendation quality, it gathered information from multiple sources such as item properties, social networks, feedback, etc. Adeniyi et al. [ 37 ] put forward a study of automated web-usage data mining and developed a recommender system that was tested in both real-time and online for identifying the visitor’s or client’s clickstream data.

In 2017, Rawat and Kankanhalli [ 38 ] have proposed a viewpoint recommender system called “ClickSmart” for assisting mobile users to capture high-quality photographs at famous tourist places. Yang et al. [ 39 ] proposed a gradient boosting-based job recommendation system for satisfying the cost-sensitive requirements of the users. The hybrid algorithm aimed to reduce the rate of unnecessary job recommendations. Lee et al. [ 40 ] proposed a music streaming recommender system based on smartphone activity usage. The proposed system benefitted by using feature selection approaches with machine learning techniques such as Naive Bayes (NB), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), Instance-based k-Nearest Neighbour (IBK), and Random Forest (RF) for performing the activity detection from the mobile signals. Wei et al. [ 41 ] have proposed a new stacked denoising autoencoder (SDAE) based recommender system for cold items. The algorithm employed deep learning and collaborative filtering method to predict the unknown ratings.

In 2018, Li et al. [ 42 ] have developed a recommendation algorithm using Weighted Linear Regression Models (WLRRS). The proposed system was put to experiment using the MovieLens dataset and it presented better classification and predictive accuracy. Mezei and Nikou [ 43 ] presented a mobile health and wellness recommender system based on fuzzy optimization. It could recommend a collection of actions to be taken by the user to improve the user’s health condition. Recommendations were made considering the user’s physical activities and preferences. Ayata et al. [ 44 ] proposed a music recommendation model based on the user emotions captured through wearable physiological sensors. The emotion detection algorithm employed different machine learning algorithms like SVM, RF, KNN and decision tree (DT) algorithms to predict the emotions from the changing electrical signals gathered from the wearable sensors. Zhao et al. [ 45 ] developed a multimodal learning-based, social-aware movie recommender system. The model was able to successfully resolve the sparsity problem of recommender systems. The algorithm developed a heterogeneous network by exploiting the movie-poster image and textual description of each movie based on the social relationships and user ratings.

In 2019, Hammou et al. [ 46 ] proposed a Big Data recommendation algorithm capable of handling large scale data. The system employed random forest and matrix factorization through a data partitioning scheme. It was then used for generating recommendations based on user rating and preference for each item. The proposed system outperformed existing systems in terms of accuracy and speed. Zhao et al. [ 47 ] put forward a hybrid initialization method for social network recommender systems. The algorithm employed a denoising autoencoder (DAE) neural network-based initialization method (ANNInit) and attribute mapping. Bhaskaran and Santhi [ 48 ] developed a hybrid, trust-based e-learning recommender system using cloud computing. The proposed algorithm was capable of learning online user activities by using the Firefly Algorithm (FA) and K-means clustering. Afolabi and Toivanen [ 59 ] suggested an integrated recommender model based on collaborative filtering. The proposed model, “Connected Health for Effective Management of Chronic Diseases”, aimed to integrate recommender systems for better decision-making in disease management. He et al. [ 60 ] proposed a movie recommender system called “HI2Rec”, which explored the use of collaborative filtering and heterogeneous information for making movie recommendations. The model used a knowledge representation learning approach to embed movie-related information gathered from different sources.

In 2020, Han et al. [ 49 ] proposed an Internet of Things (IoT)-based cancer rehabilitation recommendation system using the Beetle Antennae Search (BAS) algorithm. It presented patients with a solution to the problem of finding an optimal nutrition program, using the recurrence time as the objective function. Kang et al. [ 50 ] presented a recommender system for personalized advertisements in online broadcasting based on a tree model. Recommendations were generated in real-time from the user preferences, using a HashMap along with the tree characteristics to minimize the overhead of preference prediction. Ullah et al. [ 51 ] implemented an image-based service recommendation model for online shopping based on random forest and Convolutional Neural Networks (CNN). The model used JPEG coefficients to achieve an accurate prediction rate. Cai et al. [ 52 ] proposed a new hybrid recommender model using a many-objective evolutionary algorithm (MaOEA). The proposed algorithm was successful in optimizing the novelty, diversity, and accuracy of recommendations. Esteban et al. [ 53 ] implemented a hybrid multi-criteria recommendation system concerned with students’ academic performance, personal interests, and course selection. The system was developed using a Genetic Algorithm (GA) and aimed at helping university students. It combined both course information and student information to increase system performance and the reliability of the recommendations. Mondal et al. [ 54 ] built a multilayer, graph data model-based doctor recommendation system by exploiting the concept of trust in the patient-doctor relationship. The proposed system showed good results in practical applications.

In 2021, Dhelim et al. [ 55 ] developed a personality-based product recommending model using the techniques of meta path discovery and user interest mining. This model showed better results when compared to session-based and deep learning models. Bhalse et al. [ 56 ] proposed a web-based movie recommendation system based on collaborative filtering using Singular Value Decomposition (SVD) and cosine similarity (CS) for addressing the sparsity problem of recommender systems. It suggested a recommendation list by considering the content information of movies. Similarly, to solve both sparsity and cold-start problems, Ke et al. [ 57 ] proposed a dynamic goods recommendation system based on reinforcement learning. The proposed system was capable of learning from the reduced entropy loss error on real-time applications. Chen et al. [ 58 ] presented a movie recommender model that combines user interest with category-level representation, neighbour-assisted representation, user interest with latent representation and item-level representation using a Feed-forward Neural Network (FNN).

Comparative chronological review

A chronological comparison of the total contributions on various recommender systems over the past 10 years is given in Fig. 7.

figure 7

Comparative chronological review of recommender systems under diverse applications

This review puts forward a comparison of the number of research works proposed in the domain of recommender systems from the year 2011 to 2021 using various deep learning and machine learning-based approaches. Research articles are categorized based on the recommender system classification framework as shown in Table 5 . The articles are ordered according to their year of publication. There are two key concepts: Application fields and techniques used. The application fields of recommender systems are divided into six different fields, viz. entertainment, health, tourism, web/e-commerce, education and social media/others.

Algorithmic categorization, simulation platforms and applications considered for various recommender systems

This section analyses different methods such as deep learning, machine learning, clustering and meta-heuristic-based approaches used in the development of recommender systems. The algorithmic categorization of different recommender systems is given in Fig. 8.

figure 8

Algorithmic categorization of different recommender systems

Categorization is done based on content-based, collaborative filtering-based, and optimization-based approaches. In [ 8 ], a content-based filtering technique was employed for increasing the ability to trust other agents and for improving the exchange of information by trust degree. In [ 16 ], it was applied to enhance the quality of recommendations using the account attributes of the material. It achieved better performance in terms of F1-score, recall and precision. In [ 18 ], this technique was able to capture implicit user feedback, increasing the overall accuracy of the proposed model. The content-based filtering in [ 30 ] was able to increase the accuracy and performance of a stock recommender system by using the “trust factor” for making decisions.
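The general idea of content-based filtering can be illustrated with a minimal sketch. It is not the method of any of the cited works: the item names, the three genre-indicator features and the rating-weighted user profile are all assumptions chosen for illustration. Unrated items are ranked by the cosine similarity between their feature vector and the profile built from the user's rated items.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def build_profile(item_features, ratings):
    """Rating-weighted sum of the feature vectors of the items the user has rated."""
    dim = len(next(iter(item_features.values())))
    profile = [0.0] * dim
    for item, rating in ratings.items():
        for k, value in enumerate(item_features[item]):
            profile[k] += rating * value
    return profile

def recommend(item_features, ratings, top_n=2):
    """Rank unrated items by similarity between their features and the user profile."""
    profile = build_profile(item_features, ratings)
    candidates = [i for i in item_features if i not in ratings]
    scored = [(cosine(profile, item_features[i]), i) for i in candidates]
    scored.sort(reverse=True)
    return [item for _, item in scored[:top_n]]

# Hypothetical toy catalogue: features are [action, romance, sci-fi] indicators.
ITEMS = {
    "movie_a": [1, 0, 1],
    "movie_b": [0, 1, 0],
    "movie_c": [1, 0, 0],
    "movie_d": [0, 1, 1],
}
USER_RATINGS = {"movie_a": 5, "movie_b": 1}

print(recommend(ITEMS, USER_RATINGS))
```

Because the user rated the action and sci-fi title highly, the profile leans towards those features, and the action-only title is ranked ahead of the romance and sci-fi one.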

Different collaborative filtering approaches are utilized in recent studies, which are categorized as follows:

Model-based techniques

The Neuro-Fuzzy [ 1 ] technique helps in discovering the association between user categories and item relevance. It is also simple to understand. K-Means Clustering [ 2 , 19 , 25 , 48 ] is efficient for large scale datasets. It is simple to implement and gives a fast convergence rate. It also offers automatic recovery from failures. The decision tree [ 2 , 44 ] technique is easy to interpret. It can be used for solving the classic regression and classification problems in recommender systems. Bayesian Network [ 3 ] is a probabilistic technique used to solve classification challenges. It is based on Bayes' theorem and conditional probability. Association Rule Mining (ARM) techniques [ 4 , 17 , 35 ] extract rules for predicting the occurrence of an item by considering the existence of other items in a transaction. This method uses the association rules to create a more suitable representation of data and helps in increasing the model performance and storage efficiency. Fuzzy Logic [ 5 , 7 , 15 , 20 , 28 , 43 ] techniques use a set of flexible rules. They focus on solving complex real-time problems involving imprecise data. This technique provides scalability and helps in increasing the overall model performance for recommender systems. The semantic similarity [ 6 ] technique describes a topological similarity that defines the distance among concepts and terms through ontologies. It measures the similarity information for increasing the efficiency of recommender systems. Rough set [ 9 , 34 ] techniques use probability distributions for solving the challenges of existing recommender models. Semantic web rule language [ 10 ] can efficiently extract the dataset features and increase the model efficiency. Linear programming-based approaches [ 13 , 42 ] are employed for achieving quality decision making in recommender models. Sequential pattern analysis [ 14 ] is applied to find suitable patterns among data items. 
This helps in increasing model efficiency. The probabilistic model [ 24 ] is a well-known tool for handling uncertainty in risk computations and performance assessment. It offers better decision-making capabilities. The K-nearest neighbours (KNN) [ 19 , 37 , 44 ] technique provides fast computation, simplicity and ease of interpretation. It is suitable for classification and regression-based problems and offers good accuracy. Spectral clustering [ 21 ], also called graph clustering or similarity-based clustering, mainly focuses on reducing the space dimensionality in identifying the dataset items. The stochastic learning algorithm [ 26 ] solves the real-time challenges of recommender systems. Linear SVM [ 29 , 44 ] efficiently solves the high dimensional problems related to recommender systems. It is a memory-efficient method and works well with a large number of samples having relative separation among the classes. This method has been shown to perform well even when new or unfamiliar data is added. The Relational Functional Gradient Boosting [ 39 ] technique works efficiently on the relational dependency of data, which is useful for statistical relational learning in collaborative-based recommender systems. Ensemble learning [ 40 ] combines the forecasts of two or more models and aims to achieve better performance than any of the single contributing models. It also helps in reducing overfitting problems, which are common in recommender systems.
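As an illustration of how a nearest-neighbour technique can be applied to rating prediction, the following minimal sketch predicts a missing rating as the similarity-weighted average of the ratings given by the k most similar users. The user and item identifiers, the toy ratings and the choice of cosine similarity over co-rated items are assumptions made for this example and are not taken from the cited works.

```python
import math

def cosine_on_common(a, b):
    """Cosine similarity computed over the items that both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(ratings, user, item, k=2):
    """Similarity-weighted average of the k most similar users' ratings for `item`."""
    neighbours = [
        (cosine_on_common(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    neighbours.sort(reverse=True)
    top = [(sim, r) for sim, r in neighbours[:k] if sim > 0]
    if not top:
        return None  # no usable neighbour has rated the item
    return sum(sim * r for sim, r in top) / sum(sim for sim, _ in top)

# Hypothetical toy ratings on a 1-5 scale.
RATINGS = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 5, "i2": 3, "i3": 4},
    "u3": {"i1": 1, "i2": 5, "i3": 2},
}

print(predict(RATINGS, "u1", "i3", k=2))
```

Here user u2 agrees with u1 on every co-rated item and therefore dominates the prediction, which lands between the two neighbours' ratings of 2 and 4 but closer to 4.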

SDAE [ 41 ] is used for learning the non-linear transformations with different filters for finding suitable data. This aids in increasing the performance of recommender models. Multimodal network learning [ 45 ] is efficient for multi-modal data, building a combined representation of diverse modalities. Random forest [ 46 , 51 ] is a commonly used classifier. It has been shown to increase accuracy when handling big data. This technique is an ensemble of decision trees that reduces variance by training each tree on a different sample of the data. ANNInit [ 47 ] is a type of artificial neural network-based technique that has the capability of self-learning and generating efficient results. It is independent of the data type and can learn data patterns automatically. HashMap [ 50 ] gives faster access to elements owing to the hashing methodology, which decreases the data processing time and increases the performance of the system. The CNN [ 51 ] technique can automatically extract the significant features of a dataset without manual feature engineering. It is a computationally efficient method and provides accurate recommendations. This technique is also simple and fast to implement. The multilayer graph data model [ 54 ] is efficient for real-time applications; it minimizes the access time by mapping correlations as edges among nodes and provides superior performance. Singular Value Decomposition [ 56 ] can simplify the input data and increase the efficiency of recommendations by eliminating the noise present in data. Reinforcement learning [ 57 ] is efficient for practical scenarios of recommender systems having large data sizes. It is capable of boosting the model performance by increasing the model accuracy even for large scale datasets. FNN [ 58 ] is one of the artificial neural network techniques which can learn non-linear and complex relationships between items. 
It has demonstrated a good performance increase when employed in different recommender systems. Knowledge representation learning [ 60 ] systems aim to simplify the model development process by increasing the acquisition efficiency, inferential efficiency, inferential adequacy and representation adequacy. User-based approaches [ 2 , 55 , 59 ] specialize in detecting user-related meta-data which is employed to increase the overall model performance. This technique is more suitable for real-time applications where it can capture user feedback and use it to increase the user experience.
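The noise-reduction role attributed to Singular Value Decomposition above can be sketched in its simplest form, a rank-1 approximation. This is a dependency-free illustration only: it estimates the largest singular value and its vectors by power iteration and assumes a non-zero, non-negative rating matrix. The function names and toy matrix are hypothetical, and a practical system would typically keep several singular values and use a numerical library.

```python
import math

def top_singular_triplet(matrix, iterations=100):
    """Estimate the largest singular value and its vectors by power iteration on A^T A."""
    rows, cols = len(matrix), len(matrix[0])
    # Positive start vector; for a non-zero, non-negative matrix it is not
    # orthogonal to the leading right singular vector.
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        u = [sum(matrix[r][c] * v[c] for c in range(cols)) for r in range(rows)]  # A v
        w = [sum(matrix[r][c] * u[r] for r in range(rows)) for c in range(cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(matrix[r][c] * v[c] for c in range(cols)) for r in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v

def rank_one_approximation(matrix):
    """Best rank-1 approximation sigma * u * v^T of the matrix."""
    sigma, u, v = top_singular_triplet(matrix)
    return [[sigma * ui * vj for vj in v] for ui in u]

# Hypothetical user-by-item ratings: roughly rank 1, with small perturbations.
NOISY_RATINGS = [
    [2.0, 4.1],
    [1.1, 2.0],
    [2.9, 6.0],
]

for row in rank_one_approximation(NOISY_RATINGS):
    print([round(x, 2) for x in row])
```

Keeping only the dominant singular direction discards the small perturbations, which is the sense in which a truncated decomposition smooths a rating matrix.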

Optimization-based techniques

The Foraging Bees [ 11 ] technique enables both functional and combinatorial optimization for random searching in recommender models. Artificial bee colony [ 12 ] is a swarm-based meta-heuristic technique that provides features like a faster convergence rate, the ability to handle objectives of a stochastic nature, ease of incorporation with other algorithms, usage of fewer control parameters, strong robustness, high flexibility and simplicity. Particle Swarm Optimization [ 23 ] is a computational optimization technique that offers better computational efficiency and robustness in control parameters, and is easy and simple to implement in recommender systems. The portfolio optimization algorithm [ 27 ] is a subclass of optimization algorithms that finds its application in stock investment recommender systems. It works well in real-time and helps in the diversification of the portfolio for maximum profit. The artificial immune system [ 31 ] is a computationally intelligent machine learning technique. This technique can learn new patterns in the data and optimize the overall system parameters. Expectation maximization (EM) [ 32 , 36 , 38 ] is an iterative algorithm that seeks maximum-likelihood estimates of model parameters when some of the variables are unobserved. Delphi panel and repertory grid [ 33 ] offer efficient decision making by solving the dimensionality problem and data sparsity issues of recommender systems. The Firefly algorithm (FA) [ 48 ] provides fast results and increases recommendation efficiency. It is capable of reducing the number of iterations required to solve specific recommender problems. It also provides both local and global sets of solutions. Beetle Antennae Search (BAS) [ 49 ] offers superior search accuracy and low time complexity, which promotes the performance of recommendations. The many-objective evolutionary algorithm (MaOEA) [ 52 ] is applicable to real-time, multi-objective, search-related recommender systems. 
The introduction of a local search operator increases the convergence rate and gets suitable results. Genetic Algorithm (GA) [ 2 , 22 , 25 , 53 ] based techniques are used to solve the multi-objective optimization problems of recommender systems. They employ probabilistic transition rules and have a simpler operation that provides better recommender performance.
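To make the idea of a Genetic Algorithm concrete, the sketch below evolves the two blending weights of a hypothetical hybrid recommender so that the weighted combination of a collaborative score and a content-based score matches the observed ratings. The data, the single-objective fitness function, the operator choices (truncation selection, one-point crossover, Gaussian mutation) and all parameter values are assumptions for illustration, not the configuration used in the cited works.

```python
import random

# Hypothetical toy data: scores from a collaborative and a content-based
# predictor, plus the ratings the users actually gave.
CF_SCORES = [4.0, 2.0, 5.0, 1.0]
CB_SCORES = [3.0, 3.0, 4.0, 2.0]
ACTUAL = [3.5, 2.5, 4.5, 1.5]

def fitness(weights):
    """Negative squared error of the weighted blend (higher is better)."""
    w_cf, w_cb = weights
    return -sum(
        (w_cf * cf + w_cb * cb - y) ** 2
        for cf, cb, y in zip(CF_SCORES, CB_SCORES, ACTUAL)
    )

def genetic_search(fitness_fn, dim=2, pop_size=20, generations=40,
                   mutation_rate=0.2, seed=0):
    """Evolve vectors in [0, 1]^dim (dim >= 2); return the best one and the fitness history."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness_fn, reverse=True)
        history.append(fitness_fn(population[0]))
        elite = population[: pop_size // 2]  # selection: keep the fitter half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            cut = rng.randrange(1, dim)
            child = parent_a[:cut] + parent_b[cut:]  # one-point crossover
            child = [
                min(1.0, max(0.0, gene + rng.gauss(0.0, 0.1)))  # Gaussian mutation, clipped
                if rng.random() < mutation_rate else gene
                for gene in child
            ]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness_fn)
    history.append(fitness_fn(best))
    return best, history

best_weights, fitness_history = genetic_search(fitness)
print(best_weights, fitness_history[-1])
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. For this toy data the error is zero at weights (0.5, 0.5), so the search should move towards that point.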

Features and challenges

The features and challenges of the existing recommender models are given in Table 6 .

Simulation platforms

The various simulation platforms used for developing different recommender systems with different applications are given in Fig.  9 .

figure 9

Simulation platforms used for developing different recommender systems

Here, the Java platform is used in 20% of the contributions, MATLAB in 7%, cross-validation with different numbers of folds in 8%, Python in 7% and R in 3%, while TensorFlow, Weka and Android environments each account for 1%. Other simulation platforms such as Facebook, web user interfaces (UI) and real-time environments are used in 50% of the contributions. Table 7 describes some simulation platforms commonly used for developing recommender systems.

Application focused and dataset description

This section analyses the application domains targeted by a set of recent recommender systems and describes the datasets they use.

An analysis of recent recommender systems shows that 11% of the contributions focus on healthcare, 10% on movie recommender systems, 5% on music, 6% on e-learning, 8% on online products, 3% on books, and 1% on job and knowledge management recommender systems. A further 5% of the contributions concentrate on social networks, 10% on tourism and hotels, 6% on stocks, and 3% on video recommender systems. The remaining 12% of contributions are miscellaneous recommender systems such as Twitter and venue-based recommender systems. Similarly, different datasets are gathered for recommender systems based on their application types. A detailed description is provided in Table 8 .

Performance analysis of state-of-art recommender systems

The performance evaluation metrics used for the analysis of different recommender systems are listed in Table 9 . Of the reviewed works, 35% use recall, 16% Mean Absolute Error (MAE), 11% Root Mean Square Error (RMSE), 41% precision, 30% F1-measure, 31% accuracy and 6% coverage to validate the performance of the recommender systems. Because many works report more than one metric, these shares sum to more than 100%. Moreover, some additional measures are also considered for validating the performance in a few applications.
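For reference, the most frequently used of these metrics can be computed as in the following minimal sketch, which uses their standard definitions. The function names and the toy inputs are illustrative assumptions.

```python
import math

def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall and F1 for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical example: four recommended items, three relevant ones.
print(precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"]))
print(mae([4, 3, 5], [5, 3, 3]), rmse([4, 3, 5], [5, 3, 3]))
```

Precision, recall and F1 judge the quality of a recommended list, whereas MAE and RMSE judge the accuracy of predicted ratings. This difference is one reason why results reported with different metrics are hard to compare directly.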

Research gaps and challenges

In the recent decade, recommender systems have performed well in solving the problem of information overload and have become an appropriate tool for multiple areas such as psychology, mathematics, computer science, etc. [ 80 ]. However, current recommender systems face a variety of challenges, which are listed here and discussed below:

Deployment challenges such as cold start, scalability, sparsity, etc. are already discussed in Sect. 3.

Challenges faced when employing different recommender algorithms for different applications.

Challenges in collecting implicit user data.

Challenges in handling real-time user feedback.

Challenges faced in choosing the correct implementation techniques.

Challenges faced in measuring system performance.

Challenges in implementing recommender systems for diverse applications.

Numerous recommender algorithms have been proposed on novel emerging dimensions which focus on addressing the existing limitations of recommender systems. A good recommender system must increase the recommendation quality based on user preferences. However, a specific recommender algorithm is not always guaranteed to perform equally for different applications. This encourages the possibility of employing different recommender algorithms for different applications, which brings along a lot of challenges. There is a need for more research to alleviate these challenges. Also, there is a large scope of research in recommender applications that incorporate information from different interactive online sites like Facebook, Twitter, shopping sites, etc. Some other areas for emerging research may be in the fields of knowledge-based recommender systems, methods for seamlessly processing implicit user data and handling real-time user feedback to recommend items in a dynamic environment.

Other research areas such as deep learning-based recommender systems, demographic filtering, group recommenders, cross-domain techniques for recommender systems, and dimensionality reduction techniques also require further study [ 83 ]. Deep learning-based recommender systems have recently gained much popularity. Future research in this field can integrate well-performing deep learning models with new variants of hybrid meta-heuristic approaches.

During this review, it was observed that even though recent recommender systems have demonstrated good performance, there is no single standardized criterion or method which could be used to evaluate the performance of all recommender systems. System performance is generally measured by different evaluation metrics, which makes comparison difficult. The use of recommender systems in real-time applications is growing. User satisfaction and personalization play a very important role in the success of such recommender systems. There is a need for new evaluation criteria which can evaluate the level of user satisfaction in real-time. New research should focus on capturing real-time user feedback and using that information to change the recommendation process accordingly. This will aid in increasing the quality of recommendations.

Conclusion and future scope

Recommender systems have attracted the attention of researchers and academicians. In this paper, we have identified and carefully reviewed research papers on recommender systems focusing on diverse applications, which were published between 2011 and 2021. This review has gathered diverse details such as application fields, techniques used, simulation tools used, applications targeted, performance metrics, datasets used, system features, and challenges of different recommender systems. Further, the research gaps and challenges were put forward to explore the future research perspective on recommender systems. Overall, this paper provides a comprehensive understanding of the trend of recommender systems-related research and provides researchers with insight and future direction on recommender systems. The results of this study have several practical and significant implications:

Based on the recent-past publication rates, we feel that the research of recommender systems will significantly grow in the future.

A large number of research papers were identified in movie recommendations, whereas health, tourism and education-related recommender systems were identified in very few numbers. This is due to the availability of movie datasets in the public domain. Therefore, it is necessary to develop datasets in other fields also.

There is no standard measure to compute the performance of recommender systems. Among 60 papers, 21 used recall, 10 used MAE, 25 used precision, 18 used F1-measure, 19 used accuracy and only 7 used RMSE to calculate system performance. Very few systems were found to excel in two or more metrics.

Java and Python (with a combined contribution of 27%) are the most common programming languages used to develop recommender systems. This is due to the availability of a large number of standard Java and Python libraries which aid in the development process.

Recently, a large number of hybrid and optimization techniques have been proposed for recommender systems. The performance of a recommender system can be greatly improved by applying optimization techniques.

There is a large scope of research in using neural networks and deep learning-based methods for developing recommender systems. Systems developed using these methods are found to achieve high-performance accuracy.
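The paper counts quoted in the list above can be related to the percentage shares reported in the performance analysis section with a short calculation. The sketch below assumes that the reported shares were obtained by dropping the fractional part rather than rounding; this assumption is consistent with the figures in the text but is not stated by the authors.

```python
PAPERS = 60

# Number of reviewed papers using each metric, as quoted in the list above.
COUNTS = {"recall": 21, "MAE": 10, "precision": 25,
          "F1-measure": 18, "accuracy": 19, "RMSE": 7}

# Percentage shares reported in the performance analysis section.
REPORTED = {"recall": 35, "MAE": 16, "precision": 41,
            "F1-measure": 30, "accuracy": 31, "RMSE": 11}

def share(count, total=PAPERS):
    """Percentage of papers that use a given metric."""
    return 100.0 * count / total

for metric, count in COUNTS.items():
    print(f"{metric}: {count}/{PAPERS} = {share(count):.1f}% (reported as {REPORTED[metric]}%)")
```

For example, 10 of 60 papers is 16.7%, which corresponds to the reported 16% for MAE once the fraction is dropped.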

This research will provide a guideline for future research in the domain of recommender systems. However, this research has some limitations. Firstly, due to the limited amount of manpower and time, we have only reviewed papers published in journals focusing on computer science, management and medicine. Secondly, we have reviewed only English papers. New research may extend this study to cover other journals and non-English papers. Finally, this review was conducted based on a search on only six descriptors: “Recommender systems”, “Recommendation systems”, “Movie Recommend*”, “Music Recommend*”, “Personalized Recommend*” and “Hybrid Recommend*”. Research papers that did not include these keywords were not considered. Future research can include adding some additional descriptors and keywords for searching. This will allow extending the research to cover more diverse articles on recommender systems.

Availability of data and materials

Not applicable.

References

Castellano G, Fanelli AM, Torsello MA. NEWER: A system for neuro-fuzzy web recommendation. Appl Soft Comput. 2011;11:793–806.

Crespo RG, Martínez OS, Lovelle JMC, García-Bustelo BCP, Gayo JEL, Pablos PO. Recommendation system based on user interaction data applied to intelligent electronic books. Comput Hum Behav. 2011;27:1445–9.

Lin FC, Yu HW, Hsu CH, Weng TC. Recommendation system for localized products in vending machines. Expert Syst Appl. 2011;38:9129–38.

Wang SL, Wu CY. Application of context-aware and personalized recommendation to implement an adaptive ubiquitous learning system. Expert Syst Appl. 2011;38:10831–8.

García-Crespo Á, López-Cuadrado JL, Colomo-Palacios R, González-Carrasco I, Ruiz-Mezcua B. Sem-Fit: A semantic based expert system to provide recommendations in the tourism domain. Expert Syst Appl. 2011;38:13310–9.

Dong H, Hussain FK, Chang E. A service concept recommendation system for enhancing the dependability of semantic service matchmakers in the service ecosystem environment. J Netw Comput Appl. 2011;34:619–31.

Li M, Liu L, Li CB. An approach to expert recommendation based on fuzzy linguistic method and fuzzy text classification in knowledge management systems. Expert Syst Appl. 2011;38:8586–96.

Lorenzi F, Bazzan ALC, Abel M, Ricci F. Improving recommendations through an assumption-based multiagent approach: An application in the tourism domain. Expert Syst Appl. 2011;38:14703–14.

Huang Z, Lu X, Duan H. Context-aware recommendation using rough set model and collaborative filtering. Artif Intell Rev. 2011;35:85–99.

Chen RC, Huang YH, Bau CT, Chen SM. A recommendation system based on domain ontology and SWRL for anti-diabetic drugs selection. Expert Syst Appl. 2012;39:3995–4006.

Mohanraj V, Chandrasekaran M, Senthilkumar J, Arumugam S, Suresh Y. Ontology driven bee’s foraging approach based self-adaptive online recommendation system. J Syst Softw. 2012;85:2439–50.

Hsu CC, Chen HC, Huang KK, Huang YM. A personalized auxiliary material recommendation system based on learning style on facebook applying an artificial bee colony algorithm. Comput Math Appl. 2012;64:1506–13.

Gemmell J, Schimoler T, Mobasher B, Burke R. Resource recommendation in social annotation systems: A linear-weighted hybrid approach. J Comput Syst Sci. 2012;78:1160–74.

Choi K, Yoo D, Kim G, Suh Y. A hybrid online-product recommendation system: Combining implicit rating-based collaborative filtering and sequential pattern analysis. Electron Commer Res Appl. 2012;11:309–17.

Garibaldi JM, Zhou SM, Wang XY, John RI, Ellis IO. Incorporation of expert variability into breast cancer treatment recommendation in designing clinical protocol guided fuzzy rule system models. J Biomed Inform. 2012;45:447–59.

Salehi M, Kamalabadi IN. A hybrid attribute-based recommender system for e-learning material recommendation. IERI Procedia. 2012;2:565–70.

Aher SB, Lobo LMRJ. Combination of machine learning algorithms for recommendation of courses in e-learning System based on historical data. Knowl-Based Syst. 2013;51:1–14.

Kardan AA, Ebrahimi M. A novel approach to hybrid recommendation systems based on association rules mining for content recommendation in asynchronous discussion groups. Inf Sci. 2013;219:93–110.

Chang JH, Lai CF, Wang MS, Wu TY. A cloud-based intelligent TV program recommendation system. Comput Electr Eng. 2013;39:2379–99.

Lucas JP, Luz N, Moreno MN, Anacleto R, Figueiredo AA, Martins C. A hybrid recommendation approach for a tourism system. Expert Syst Appl. 2013;40:3532–50.

Niu J, Zhu L, Zhao X, Li H. Affivir: An affect-based Internet video recommendation system. Neurocomputing. 2013;120:422–33.

Liu L, Xu J, Liao SS, Chen H. A real-time personalized route recommendation system for self-drive tourists based on vehicle to vehicle communication. Expert Syst Appl. 2014;41:3409–17.

Bakshi S, Jagadev AK, Dehuri S, Wang GN. Enhancing scalability and accuracy of recommendation systems using unsupervised learning and particle swarm optimization. Appl Soft Comput. 2014;15:21–9.

Kim Y, Shim K. TWILITE: A recommendation system for twitter using a probabilistic model based on latent Dirichlet allocation. Inf Syst. 2014;42:59–77.

Wang Z, Yu X, Feng N, Wang Z. An improved collaborative movie recommendation system using computational intelligence. J Vis Lang Comput. 2014;25:667–75.

Kolomvatsos K, Anagnostopoulos C, Hadjiefthymiades S. An efficient recommendation system based on the optimal stopping theory. Expert Syst Appl. 2014;41:6796–806.

Gottschlich J, Hinz O. A decision support system for stock investment recommendations using collective wisdom. Decis Support Syst. 2014;59:52–62.

Torshizi AD, Zarandi MHF, Torshizi GD, Eghbali K. A hybrid fuzzy-ontology based intelligent system to determine level of severity and treatment recommendation for benign prostatic hyperplasia. Comput Methods Programs Biomed. 2014;113:301–13.

Zahálka J, Rudinac S, Worring M. Interactive multimodal learning for venue recommendation. IEEE Trans Multimedia. 2015;17:2235–44.

Sankar CP, Vidyaraj R, Kumar KS. Trust based stock recommendation system – a social network analysis approach. Procedia Comput Sci. 2015;46:299–305.

Chen MH, Teng CH, Chang PC. Applying artificial immune systems to collaborative filtering for movie recommendation. Adv Eng Inform. 2015;29:830–9.

Wu H, Pei Y, Li B, Kang Z, Liu X, Li H. Item recommendation in collaborative tagging systems via heuristic data fusion. Knowl-Based Syst. 2015;75:124–40.

Yeh DY, Cheng CH. Recommendation system for popular tourist attractions in Taiwan using delphi panel and repertory grid techniques. Tour Manage. 2015;46:164–76.

Liao SH, Chang HK. A rough set-based association rule approach for a recommendation system for online consumers. Inf Process Manage. 2016;52:1142–60.

Li H, Cui J, Shen B, Ma J. An intelligent movie recommendation system through group-level sentiment analysis in microblogs. Neurocomputing. 2016;210:164–73.

Wu H, Yue K, Pei Y, Li B, Zhao Y, Dong F. Collaborative topic regression with social trust ensemble for recommendation in social media systems. Knowl-Based Syst. 2016;97:111–22.

Adeniyi DA, Wei Z, Yongquan Y. Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method. Appl Comput Inform. 2016;12:90–108.

Rawat YS, Kankanhalli MS. ClickSmart: A context-aware viewpoint recommendation system for mobile photography. IEEE Trans Circuits Syst Video Technol. 2017;27:149–58.

Yang S, Korayem M, Aljadda K, Grainger T, Natarajan S. Combining content-based and collaborative filtering for job recommendation system: A cost-sensitive Statistical Relational Learning approach. Knowl-Based Syst. 2017;136:37–45.

Lee WP, Chen CT, Huang JY, Liang JY. A smartphone-based activity-aware system for music streaming recommendation. Knowl-Based Syst. 2017;131:70–82.

Wei J, He J, Chen K, Zhou Y, Tang Z. Collaborative filtering and deep learning based recommendation system for cold start items. Expert Syst Appl. 2017;69:29–39.

Li C, Wang Z, Cao S, He L. WLRRS: A new recommendation system based on weighted linear regression models. Comput Electr Eng. 2018;66:40–7.

Mezei J, Nikou S. Fuzzy optimization to improve mobile health and wellness recommendation systems. Knowl-Based Syst. 2018;142:108–16.

Ayata D, Yaslan Y, Kamasak ME. Emotion based music recommendation system using wearable physiological sensors. IEEE Trans Consum Electron. 2018;64:196–203.

Zhao Z, Yang Q, Lu H, Weninger T. Social-aware movie recommendation via multimodal network learning. IEEE Trans Multimedia. 2018;20:430–40.

Hammou BA, Lahcen AA, Mouline S. An effective distributed predictive model with matrix factorization and random forest for big data recommendation systems. Expert Syst Appl. 2019;137:253–65.

Zhao J, Geng X, Zhou J, Sun Q, Xiao Y, Zhang Z, Fu Z. Attribute mapping and autoencoder neural network based matrix factorization initialization for recommendation systems. Knowl-Based Syst. 2019;166:132–9.

Bhaskaran S, Santhi B. An efficient personalized trust based hybrid recommendation (TBHR) strategy for e-learning system in cloud computing. Clust Comput. 2019;22:1137–49.

Han Y, Han Z, Wu J, Yu Y, Gao S, Hua D, Yang A. Artificial intelligence recommendation system of cancer rehabilitation scheme based on IoT technology. IEEE Access. 2020;8:44924–35.

Kang S, Jeong C, Chung K. Tree-based real-time advertisement recommendation system in online broadcasting. IEEE Access. 2020;8:192693–702.

Ullah F, Zhang B, Khan RU. Image-based service recommendation system: A JPEG-coefficient RFs approach. IEEE Access. 2020;8:3308–18.

Cai X, Hu Z, Zhao P, Zhang W, Chen J. A hybrid recommendation system with many-objective evolutionary algorithm. Expert Syst Appl. 2020. https://doi.org/10.1016/j.eswa.2020.113648 .

Esteban A, Zafra A, Romero C. Helping university students to choose elective courses by using a hybrid multi-criteria recommendation system with genetic optimization. Knowledge-Based Syst. 2020;194:105385.

Mondal S, Basu A, Mukherjee N. Building a trust-based doctor recommendation system on top of multilayer graph database. J Biomed Inform. 2020;110:103549.

Dhelim S, Ning H, Aung N, Huang R, Ma J. Personality-aware product recommendation system based on user interests mining and metapath discovery. IEEE Trans Comput Soc Syst. 2021;8:86–98.

Bhalse N, Thakur R. Algorithm for movie recommendation system using collaborative filtering. Materials Today: Proceedings. 2021. https://doi.org/10.1016/j.matpr.2021.01.235 .

Ke G, Du HL, Chen YC. Cross-platform dynamic goods recommendation system based on reinforcement learning and social networks. Appl Soft Computing. 2021;104:107213.

Chen X, Liu D, Xiong Z, Zha ZJ. Learning and fusing multiple user interest representations for micro-video and movie recommendations. IEEE Trans Multimedia. 2021;23:484–96.

Afolabi AO, Toivanen P. Integration of recommendation systems into connected health for effective management of chronic diseases. IEEE Access. 2019;7:49201–11.

He M, Wang B, Du X. HI2Rec: Exploring knowledge in heterogeneous information for movie recommendation. IEEE Access. 2019;7:30276–84.

Bobadilla J, Serradilla F, Hernando A. Collaborative filtering adapted to recommender systems of e-learning. Knowl-Based Syst. 2009;22:261–5.

Russell S, Yoon V. Applications of wavelet data reduction in a recommender system. Expert Syst Appl. 2008;34:2316–25.

Campos LM, Fernández-Luna JM, Huete JF. A collaborative recommender system based on probabilistic inference from fuzzy observations. Fuzzy Sets Syst. 2008;159:1554–76.

Funk M, Rozinat A, Karapanos E, Medeiros AKA, Koca A. In situ evaluation of recommender systems: Framework and instrumentation. Int J Hum Comput Stud. 2010;68:525–47.

Porcel C, Moreno JM, Herrera-Viedma E. A multi-disciplinar recommender system to advice research resources in University Digital Libraries. Expert Syst Appl. 2009;36:12520–8.

Bobadilla J, Serradilla F, Bernal J. A new collaborative filtering metric that improves the behavior of recommender systems. Knowl-Based Syst. 2010;23:520–8.

Ochi P, Rao S, Takayama L, Nass C. Predictors of user perceptions of web recommender systems: How the basis for generating experience and search product recommendations affects user responses. Int J Hum Comput Stud. 2010;68:472–82.

Olmo FH, Gaudioso E. Evaluation of recommender systems: A new approach. Expert Syst Appl. 2008;35:790–804.

Zhen L, Huang GQ, Jiang Z. An inner-enterprise knowledge recommender system. Expert Syst Appl. 2010;37:1703–12.

Göksedef M, Gündüz-Öğüdücü S. Combination of web page recommender systems. Expert Syst Appl. 2010;37(4):2911–22.

Shao B, Wang D, Li T, Ogihara M. Music recommendation based on acoustic features and user access patterns. IEEE Trans Audio Speech Lang Process. 2009;17:1602–11.

Shin C, Woo W. Socially aware tv program recommender for multiple viewers. IEEE Trans Consum Electron. 2009;55:927–32.

Lopez-Carmona MA, Marsa-Maestre I, Perez JRV, Alcazar BA. Anegsys: An automated negotiation based recommender system for local e-marketplaces. IEEE Lat Am Trans. 2007;5:409–16.

Yap G, Tan A, Pang H. Discovering and exploiting causal dependencies for robust mobile context-aware recommenders. IEEE Trans Knowl Data Eng. 2007;19:977–92.

Meo PD, Quattrone G, Terracina G, Ursino D. An XML-based multiagent system for supporting online recruitment services. IEEE Trans Syst Man Cybern. 2007;37:464–80.

Khusro S, Ali Z, Ullah I. Recommender systems: Issues, challenges, and research opportunities. Inform Sci Appl. 2016. https://doi.org/10.1007/978-981-10-0557-2_112 .

Blanco-Fernandez Y, Pazos-Arias JJ, Gil-Solla A, Ramos-Cabrer M, Lopez-Nores M. Providing entertainment by content-based filtering and semantic reasoning in intelligent recommender systems. IEEE Trans Consum Electron. 2008;54:727–35.

Isinkaye FO, Folajimi YO, Ojokoh BA. Recommendation systems: Principles, methods and evaluation. Egyptian Inform J. 2015;16:261–73.

Yoshii K, Goto M, Komatani K, Ogata T, Okuno HG. An efficient hybrid music recommender system using an incrementally trainable probabilistic generative model. IEEE Trans Audio Speech Lang Process. 2008;16:435–47.

Wei YZ, Moreau L, Jennings NR. Learning users’ interests by quality classification in market-based recommender systems. IEEE Trans Knowl Data Eng. 2005;17:1678–88.

Bjelica M. Towards TV recommender system: experiments with user modeling. IEEE Trans Consum Electron. 2010;56:1763–9.

Setten MV, Veenstra M, Nijholt A, Dijk BV. Goal-based structuring in recommender systems. Interact Comput. 2006;18:432–56.

Adomavicius G, Tuzhilin A. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng. 2005;17:734–49.

Symeonidis P, Nanopoulos A, Manolopoulos Y. Providing justifications in recommender systems. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans. 2009;38:1262–72.

Zhan J, Hsieh C, Wang I, Hsu T, Liau C, Wang D. Privacy preserving collaborative recommender systems. IEEE Trans Syst Man Cybernet. 2010;40:472–6.

Burke R. Hybrid recommender systems: survey and experiments. User Model User-Adap Inter. 2002;12:331–70.

Article   MATH   Google Scholar  

Gunes I, Kaleli C, Bilge A, Polat H. Shilling attacks against recommender systems: a comprehensive survey. Artif Intell Rev. 2012;42:767–99.

Park DH, Kim HK, Choi IY, Kim JK. A literature review and classification of recommender systems research. Expert Syst Appl. 2012;39:10059–72.

Download references

Acknowledgements

We thank our colleagues from Assam Down Town University who provided insight and expertise that greatly assisted this research, although they may not agree with all the interpretations and conclusions of this paper.

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and affiliations.

Department of Computer Science & Engineering, Assam Down Town University, Panikhaiti, Guwahati, 781026, Assam, India

Deepjyoti Roy & Mala Dutta


Contributions

DR carried out the review study and analysis of the existing algorithms in the literature. MD has been involved in drafting the manuscript or revising it critically for important intellectual content. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Deepjyoti Roy.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Roy, D., Dutta, M. A systematic review and research perspective on recommender systems. J Big Data 9, 59 (2022). https://doi.org/10.1186/s40537-022-00592-5


Received: 04 October 2021

Accepted: 28 March 2022

Published: 03 May 2022

DOI: https://doi.org/10.1186/s40537-022-00592-5

  • Recommender system
  • Machine learning
  • Content-based filtering
  • Collaborative filtering
  • Deep learning

Title: Recommender Systems: A Primer

Abstract: Personalized recommendations have become a common feature of modern online services, including most major e-commerce sites, media platforms and social networks. Today, due to their high practical relevance, research in the area of recommender systems is flourishing more than ever. However, with the new application scenarios of recommender systems that we observe today, constantly new challenges arise as well, both in terms of algorithmic requirements and with respect to the evaluation of such systems. In this paper, we first provide an overview of the traditional formulation of the recommendation problem. We then review the classical algorithmic paradigms for item retrieval and ranking and elaborate how such systems can be evaluated. Afterwards, we discuss a number of recent developments in recommender systems research, including research on session-based recommendation, biases in recommender systems, and questions regarding the impact and value of recommender systems in practice.

Microsoft Research Lab – Asia

Personalized recommendation systems: five hot research topics you must know

By Xing Xie, Jianxun Lian, Zheng Liu, Xiting Wang, Fangzhao Wu, Hongwei Wang, and Zhongxia Chen

Information overload is a big challenge for online users. It is particularly an issue when a user has to quickly and accurately identify a resource from an exponentially growing set of information and products. It is also difficult for merchants to present the appropriate products to users in a timely manner. The emergence of the recommendation system has somewhat relieved this challenge.

A recommendation system is a type of information filter, which can learn users’ interests and hobbies according to their profile or historical behaviors, and then predict their ratings or preferences for a given item. It changes the way businesses communicate with users and strengthens the interactivity between them.

The statistics from studies by McKinsey and Tech Emergence bear this out: this type of recommendation system brought Amazon 35 percent of its revenue and 23.7 percent growth to BestBuy. Up to 75 percent of video consumption on Netflix comes from the recommendation system and 60 percent of views on YouTube come from their recommendation feature.

Therefore, how to build an effective recommendation system is of profound significance. What should the recommendation system of the future look like? Our research focuses on several aspects, such as the application of deep learning, knowledge graph, reinforcement learning, user profiling, and explainable recommendations.

Research topic 1: recommendation system and deep learning

In recent years, deep learning technology has achieved great success in areas of speech recognition, computer vision, and natural language processing; and recommendation systems can benefit from these breakthroughs. Today, deep learning-based recommendation algorithms have made remarkable progress in the following three aspects:

Powerful representation learning capability. One of the advantages of deep neural networks is their powerful capability in representation learning. Therefore, a direct application of deep learning for recommender systems is to learn meaningful latent factors from complex data sources.

Deep collaborative filtering. The conventional matrix factorization model can be interpreted as a simple neural network, and additional non-linear units can be incorporated to further improve its performance. In one recent work (Neural Collaborative Filtering, WWW 2017), researchers propose an enhanced matrix factorization model that addresses a limitation of the dot product between two vectors: it cannot distinguish the importance of different dimensions. The model adds a multi-layer perceptron module to carry out extra non-linear operations. Other examples of deep collaborative filtering combine deep learning models, such as auto-encoders, convolutional neural networks, memory networks, and attention mechanisms, with traditional collaborative filtering models and achieve remarkable improvements.
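To make the contrast concrete, here is a minimal sketch of the two scoring styles. It is not the published Neural Collaborative Filtering architecture: the function names, the single hidden layer, and the random, untrained weights are all assumptions chosen for illustration.

```python
import math
import random

def dot_score(user_vec, item_vec):
    """Matrix-factorization score: every latent dimension contributes with equal weight."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

def mlp_score(user_vec, item_vec, w_hidden, b_hidden, w_out, b_out):
    """MLP-style score: concatenate the embeddings, apply one ReLU layer, then a sigmoid."""
    x = list(user_vec) + list(item_vec)
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical, untrained parameters; a real system would learn them from interactions.
rng = random.Random(0)
dim, hidden_units = 4, 8
user = [rng.gauss(0, 1) for _ in range(dim)]
item = [rng.gauss(0, 1) for _ in range(dim)]
w_hidden = [[rng.gauss(0, 0.5) for _ in range(2 * dim)] for _ in range(hidden_units)]
b_hidden = [0.0] * hidden_units
w_out = [rng.gauss(0, 0.5) for _ in range(hidden_units)]

mf = dot_score(user, item)
ncf = mlp_score(user, item, w_hidden, b_hidden, w_out, 0.0)
print("dot-product score:", mf)
print("MLP score (probability-like):", ncf)
```

Because the hidden layer has its own weights per input dimension, training can learn to emphasize some latent dimensions over others, which a plain dot product cannot do.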

Deep interaction between features. In industrial applications, highly diverse and heterogeneous data are usually exploited and fused to achieve better predictive performance. Traditional approaches to combining features are costly, do not scale, and cannot be extended to new cases. Researchers are therefore using neural networks to learn high-order cross features automatically. Representative works include Wide & Deep, PNN, DeepFM, DCN, and our recently proposed xDeepFM model (xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems, KDD 2018).
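As a small illustration of learned feature crossing, the sketch below computes the second-order interaction term of a factorization machine, the kind of component that models such as DeepFM build on. It is not taken from any of the cited papers; the function names and toy numbers are assumptions. It shows that the pairwise sum over all feature pairs can be computed in time linear in the number of features.

```python
def pairwise_naive(x, v):
    """Sum over all feature pairs i < j of <v[i], v[j]> * x[i] * x[j]."""
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            total += sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
    return total

def pairwise_fast(x, v):
    """Same quantity via 0.5 * sum_f ((sum_i v[i][f] x[i])^2 - sum_i (v[i][f] x[i])^2)."""
    total = 0.0
    for f in range(len(v[0])):
        weighted = [v[i][f] * x[i] for i in range(len(x))]
        total += sum(weighted) ** 2 - sum(w * w for w in weighted)
    return 0.5 * total

# Toy input: three features with 2-dimensional latent vectors (hypothetical values).
x = [1.0, 0.0, 1.0]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(pairwise_naive(x, v), pairwise_fast(x, v))
```

Each feature gets a latent vector, so the strength of any cross feature is learned implicitly rather than engineered by hand.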

Deep learning can be further applied to a great number of potential recommendation scenarios. Here is a brief discussion that highlights promising future directions.

  • Efficiency and scalability. For industry-grade recommendation systems, people should not only consider a model’s accuracy, but also its running efficiency and maintainability. A recommendation system needs to return the result in real time, should be easily deployed, and should support regular incremental updates. Because the computational complexity of sophisticated neural networks is huge, making them more efficient and maintainable on super-large-scale platforms is imperative.
  • Diverse data fusion. In the real world, data for users and items is often complex and diverse. For example, the data for an item can take the form of text, images, and categorical properties. Behavioral data for users may come from many sources, such as social networks, search engines, and online news apps, and may cover many kinds of behavior; on e-commerce websites, for example, it may include searching, browsing, clicking, collecting, and purchasing. Moreover, the distributions of users or items vary widely across these dimensions: some items have only text attributes, while others have only image attributes. Data volume also varies considerably across kinds of behavior; for example, the volume of user clicks is often much larger than that of user purchases. A single, homogeneous model cannot effectively handle such diverse data, and effectively integrating complex data is technically difficult.
  • The capture of users’ long- and short-term preferences. Users’ preferences can be roughly divided into long term and short term. Long-term preference refers to a user’s natural interests, which will show up eventually. Short-term preference refers to a user’s current or immediate interests, which are prone to vanishing soon. Currently, some popular methods combine recurrent neural networks with deep collaborative filtering to integrate both short-term and long-term interests. Learning how to effectively incorporate the contextual states of users with their long-term and short-term interests is also a hot research topic.

Research topic 2: recommendation system and knowledge graph

In most recommendation scenarios, items may contain rich knowledge information. The network structure that captures such knowledge is referred to as the knowledge graph. The knowledge graph greatly expands the amount of information of each item and strengthens the connection between them, providing abundant reference values for a recommendation engine, which leads to additional diversity and explainability of the recommendation result (Figure 1).


Figure 1. Relevance discovery of news reports, based on a knowledge graph

Compared to a social network, a knowledge graph is a heterogeneous network; therefore, more sophisticated recommendation algorithms are required. In recent years, network representation learning has become one of the most popular research areas for addressing this. The introduction of network representation will facilitate the learning capability of a recommendation system, thus contributing to a better recommendation accuracy and user experience.

There are two different ways of introducing a knowledge graph to a recommendation system.

The feature-based approach. The key technique for this approach is knowledge graph embedding (KGE). In general, a knowledge graph is a heterogeneous network composed of tuples in the form of (head, relation, tail). With KGE, compact real-valued vectors can be generated to represent entities and relationships, which are originally high-dimensional and heterogeneous. Such representations can be naturally combined, and interact, with the recommendation system.

Under such a general framework, the learning of a recommendation system and KGE become two related tasks. Depending on the order of learning, there are two combination strategies.
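As an illustration of what a knowledge graph embedding can look like, the sketch below scores triples in the style of TransE, one widely used KGE model. The text above does not name a specific model, so this choice is an assumption, as are the entity names and the hand-picked 2-dimensional vectors (real embeddings are learned from the graph).

```python
import math

def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail; higher means more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Hypothetical embeddings, chosen by hand for illustration rather than learned.
entity = {
    "movie_a": [1.0, 0.0],
    "director_x": [1.0, 1.0],
    "movie_b": [0.0, 3.0],
}
relation = {"directed_by": [0.0, 1.0]}

plausible = transe_score(entity["movie_a"], relation["directed_by"], entity["director_x"])
implausible = transe_score(entity["movie_a"], relation["directed_by"], entity["movie_b"])
print(plausible, implausible)
```

Once entities live in a compact vector space like this, the item vectors can be fed to a recommendation model as extra features.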

1) The sequential learning strategy, where features of the knowledge graph are learned first, and then applied to the recommendation system.

2) The alternating learning strategy, where training the KGE and the recommendation system become two related tasks. A multi-task learning framework is usually designed for it, where the learning of KGE significantly contributes to that of the recommendation system.

The structure-based approach. This approach uses the structural features of the knowledge graph more directly. Specifically, for each entity, we may use the breadth-first search algorithm to obtain recommendation results from the multi-hop associated entities in the knowledge graph. Based on how the associated entities are used, the corresponding techniques can be divided into two categories: outward propagation and inward aggregation.

1) Outward propagation simulates the process by which users’ interests propagate in the knowledge graph. As one representative of outward propagation, our recent work (RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems, CIKM 2018) takes the historical interests of users as seeds, and then propagates them iteratively along the knowledge graph.

2) The inward aggregation aggregates an entity’s neighborhood features, while learning the knowledge graph. With cohesion computations, the entities incorporate the structural information of their neighborhood, whose weights are determined by the connectivity and specific users. Therefore, the semantic information of the knowledge graph and the individual interests of users can be captured simultaneously.
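The multi-hop expansion that the structure-based approach relies on can be sketched with a plain breadth-first search. This is only the graph traversal step, not RippleNet or any aggregation model; the adjacency-list layout, the function name, and the toy graph are assumptions made for illustration.

```python
from collections import deque

def multi_hop_neighbors(graph, seeds, max_hops):
    """Return {entity: hop distance} for all entities within max_hops of the seed entities."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for _relation, neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

# Hypothetical knowledge graph: entity -> list of (relation, neighbor) edges.
graph = {
    "movie_a": [("directed_by", "director_x"), ("has_genre", "sci_fi")],
    "director_x": [("directed", "movie_b")],
    "sci_fi": [("genre_of", "movie_c")],
    "movie_b": [("starring", "actor_y")],
}

# Seeds play the role of items the user interacted with in the past.
hops = multi_hop_neighbors(graph, ["movie_a"], max_hops=2)
print(hops)
```

Entities found at hop 1, hop 2, and so on correspond to successively weaker "ripples" of the user's historical interests.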

Opportunities and challenges of knowledge graph-based recommendation systems

The combination of a recommendation system with a knowledge graph is becoming one of the most popular topics in academia. However, the existing methods are limited in various respects and there is still much room for improvement. First, most existing methods are based on statistical learning models, which extract statistical information from the network and make inferences accordingly. A difficult but more promising direction is integrating graph reasoning with the recommendation system. Second, it is also interesting to design algorithms that deliver competitive performance at an economical running cost. Third, existing methods pay little attention to computational platforms or to coordination with systems and hardware, so how to jointly design and optimize the upper-level algorithms and the underlying architecture will be another crucial issue. Finally, existing methods are static, whereas in reality the knowledge graph evolves over time. It is therefore also important to think about how to cope with such temporal evolution and make use of it for better recommendations.

Research topic 3: recommendation system and reinforcement learning

Empowered by the latest techniques in deep learning and knowledge graphs, recommendation systems have been steadily improving in performance. However, most existing recommendation systems are formulated in a one-way fashion: given sufficient historical data, a specific type of supervised learning model (such as linear regression or a factorization machine) is trained to capture the underlying preferences of users over different kinds of items. Once deployed online, the trained model can identify the most attractive items for its users, thereby generating precise personalized recommendations. Here it is assumed that the behavioral characteristics of users are fully reflected in the historical data and that they will remain stable over time, so that a static model is sufficient for practical usage. In practice, however, user data might be limited, and the characteristics of users may constantly evolve during their intensive interaction with recommendation systems. Fortunately, the user feedback generated in such a process will not only compensate for any insufficiency of the historical data, but also help to uncover user characteristics at the current stage. Reinforcement learning lays the technical foundation for utilizing user feedback in a recommendation system. In the following sections, applications of reinforcement learning-based recommendation systems are discussed separately for static and dynamic scenarios, according to the different behavioral characteristics of users.

Application in the static scenario

Under the static scenario, user behavior is regarded as unchanging. For this kind of situation, some of the most notable work is on the contextual multi-armed bandit, which aims to address the cold-start problem in recommendation systems. In many real-world applications, user behavior follows a long-tailed distribution: little behavioral data is collected for the majority of users, while only a small fraction of users offer substantial records. As a result, it is hard for conventional recommendation algorithms to perform satisfactorily, due to data sparsity.

A straightforward idea for dealing with the cold-start problem is “active exploration”: instead of accumulating user data passively, the recommendation system actively probes the behavioral patterns of users through continuous trials, so that the collected data is sufficient to guarantee the effectiveness of recommendations. Unfortunately, such a simple approach inevitably incurs tremendous exploration cost and user time, which makes it infeasible in practice. Taking inspiration from the multi-armed bandit problem, we can instead explore recommendations strategically based on user feedback, which is much more cost-effective. The multi-armed bandit problem has been intensively studied, and the proposed algorithms share the common principle of jointly considering utility and cumulative trials. Higher utility indicates lower exploration cost, while fewer cumulative trials suggest higher uncertainty. As a result, we can design specific aggregation mechanisms that prioritize items with high recommendation utility and high uncertainty.
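One well-known way to combine utility and uncertainty is the UCB1 rule, which adds an exploration bonus that shrinks as an item accumulates trials. The text above does not commit to a particular algorithm, so UCB1 is an assumed example here, and it is the non-contextual variant. The simulated click probabilities and function names are also hypothetical.

```python
import math
import random

def ucb1_select(counts, rewards, t):
    """Pick the arm with the highest mean reward plus uncertainty bonus; try untried arms first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [rewards[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
              for a in range(len(counts))]
    return max(range(len(counts)), key=scores.__getitem__)

def simulate(click_probs, rounds, seed=0):
    """Recommend one item per round to a simulated user and return the trial count per item."""
    rng = random.Random(seed)
    counts = [0] * len(click_probs)
    rewards = [0.0] * len(click_probs)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, rewards, t)
        reward = 1.0 if rng.random() < click_probs[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
    return counts

# Hypothetical click-through rates for three candidate items.
counts = simulate([0.1, 0.5, 0.9], rounds=2000)
print(counts)
```

An item that has been shown only once can outrank one with a higher observed click rate, which is exactly the "high utility and high uncertainty" prioritization described above.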

Application in the dynamic scenario

One of the inherent assumptions of the multi-armed bandit is that the user’s underlying character will always remain stable. However, for many real-world scenarios, the behavioral patterns of users evolve constantly. As a result, it is necessary to conduct precise estimation about such evolution, and then optimize the recommendation strategy on top of it. Particularly, an ideal recommendation system should satisfy the following two requirements: the recommendations should be based on the constantly evolving feedback data of users and specific types of long-term objectives need to be optimized over the whole interactive process.

Under the framework of reinforcement learning, the recommendation system is regarded as an agent that aims to optimize the predefined long-term objective through its strategic interaction with users. User characteristics are treated as a state and specific recommendation items become actions of the agent. The behavioral data generated from the interaction is organized as experience, which records the reward and state-transition resulting from a certain action. Based on the constantly accumulated experience, the reinforcement learning algorithm produces the policy, which then guides the optimal action selection given each specific state.
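The agent, state, action, reward, and policy vocabulary can be grounded with tabular Q-learning on a toy user model. This is a classroom-scale sketch, not the DRN system or any production method: the two user states, the two actions, the reward values, and the hyperparameters are all assumptions.

```python
import random

# Hypothetical user model: TRANSITIONS[state][action] = (reward, next_state).
# States: 0 = bored user, 1 = engaged user. Actions: 0 = popular item, 1 = novel item.
TRANSITIONS = [
    [(0.1, 0), (0.0, 1)],  # bored: a popular item earns a little; a novel item re-engages
    [(1.0, 0), (0.2, 1)],  # engaged: a popular item earns a lot, then the user gets bored
]

def q_learning(transitions, episodes=500, steps=20, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Learn Q[state][action] from (state, action, reward, next_state) experience."""
    rng = random.Random(seed)
    n_states, n_actions = len(transitions), len(transitions[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = rng.randrange(n_states)
        for _ in range(steps):
            if rng.random() < epsilon:  # occasional exploration
                action = rng.randrange(n_actions)
            else:                       # otherwise act greedily
                action = max(range(n_actions), key=q[state].__getitem__)
            reward, next_state = transitions[state][action]
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning(TRANSITIONS)
policy = [max(range(len(row)), key=row.__getitem__) for row in q]
print(policy)
```

In this toy setup a myopic recommender would always show the popular item to a bored user (immediate reward 0.1 versus 0.0), whereas the learned policy gives up that immediate reward to move the user into the engaged state, illustrating optimization of a long-term objective.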

Recently we applied reinforcement learning to Bing personalized news recommendation (DRN: A Deep Reinforcement Learning Framework for News Recommendation, WWW 2018). Thanks to its capability for sequential decision making and long-term objective optimization, reinforcement learning can greatly enhance a recommendation system’s capability for both user perception and personalization.

Opportunities and challenges of the reinforcement recommendation systems

We expect the research community to work on many technical advancements in reinforcement learning-based recommendations. The first is helping reinforcement learning algorithms adapt to limited data sets. Today, mainstream deep reinforcement learning algorithms try to avoid modeling the environment, and instead try to learn a policy directly from user experience (model-free). However, such a strategy requires a considerable amount of empirical data, whereas the data available is typically limited in scale and sparse in reward. How to take full advantage of limited user interactions will be one of the major directions for further improving these algorithms.

Second, a policy is usually learned independently for each individual recommendation scenario and policies from different scenarios are typically different. As a result, each policy learning process requires a considerable data collection expense. Meanwhile, due to the lack of generalizability, it is difficult for existing algorithms to adapt their policies in response to newly emerging situations. Given the above challenges, it is necessary to come up with a highly generalized strategy that breaks down the barriers between different recommendation scenarios and increases its robustness in the changing environment.

Research topic 4: user profiling in recommendation systems

One of the important tasks of building a recommendation system is analyzing the characteristics of users’ interests. This is often referred to as user profiling.

User profiling refers to the extraction of user labels on different attributes such as age, gender, occupation, income, and interests. Complete and accurate attribute labels will effectively reveal the inherent characteristics of users, thus greatly facilitating accurate personalized recommendations.

Current status and challenges in user profiling

Currently, mainstream approaches to user profiling are based on machine learning, especially supervised learning. These methods extract features from user data, which serve as the user’s representations. User data, together with its annotations, are used to train the prediction functions of a user’s profile, from which we can infer the profiles of many users whose profile labels are unknown.

Although current user profiling methods have achieved good results and are widely applied to real-world recommendation systems, a number of challenges remain to be addressed.

First, most of the existing methods are based on manually extracted discrete features. These features cannot capture contextual information about the user, which restricts their representational capacity.

Second, existing user profiling methods are usually based on simple linear regression or classification models, so they can neither automatically learn high-level abstract features from user data nor model the interaction between features. In addition, existing methods for user profiling are often based on homogeneous data from one source, which is not rich enough for effective representation of users. In fact, user data may come from different sources, which can help build higher-quality user profiles.

Finally, few of the existing methods for user profiling take time into consideration, so it is difficult for them to reflect the dynamic changes of user attributes.

Deep, universal, and dynamic user profiling from multi-source heterogeneous data

In response to the above challenges, researchers are working on the following user profiling directions.

1) Building user representation models with stronger representation capability. With the development of deep learning, neural networks can automatically extract deep and informative features from a user’s original data. Based on deep neural networks, we can construct representations that make full use of user data, thus effectively improving the accuracy of user profiling. Recently we developed a hierarchical user representation with attention (Neural Demographic Prediction using Search Query, WSDM 2019), which is shown to be effective in inferring user demographics based on their query logs.

2) Conducting user profiling on multi-source and heterogeneous data. The data generated by users is usually rich in form, exhibiting different structures (such as unstructured text data from social media and structured purchase records from e-commerce websites), and represented in different modes (such as text and images), which is a challenge for user profiling. Designing a deep information fusion model to employ user data from different sources, structures, and modes for user profiling is an important direction in the future. Collaborative learning and multi-channel deep neural networks can be the potential solutions for relevant problems.

3) Sharing the user data across different platforms and protecting user privacy. Different platforms record different types of user data. For example, search engines hold users’ search logs and web browsing records, while e-commerce platforms hold users’ commodity browsing and shopping behaviors. User data from different platforms are of great value for user profiling, providing complementary information and helping to build richer and more comprehensive user representations. How to make full use of user information from different platforms, without explicitly transferring or sharing private user data, is an important issue to work on.

4) Constructing a unified user representation model for user profiling. Existing user profiling methods often train an individual model for each user attribute. However, in practice, the number of user attributes can be huge. Therefore, existing user profiling methods often involve a great deal of model training and storage. In addition, the underlying connections between different user attributes are yet to be fully explored. Can we find a way to build a unified user representation from the heterogeneous data, such that the model can comprehensively capture the information of a user from different dimensions? The multi-task learning-based learning algorithms and user-embedding technologies provide promising solutions for such a problem.
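To make the idea in point 4 more concrete, the following is a minimal sketch of a shared user representation feeding one small prediction head per attribute. It is an illustration under stated assumptions, not the model from any of the cited papers: the class name `UnifiedUserModel`, the attribute names, and all weights are hypothetical and hand-set, and only the forward pass is shown (no training).

```python
import math

def mean_pool(vectors):
    """Average a list of equal-length feature vectors into one vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def linear(weights, bias, x):
    """Compute y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class UnifiedUserModel:
    """One shared user vector, reused by every attribute-specific head."""

    def __init__(self, shared_w, shared_b, heads):
        self.shared_w = shared_w
        self.shared_b = shared_b
        self.heads = heads  # {attribute name: (weights, bias)}

    def represent(self, behaviour_vectors):
        # Shared encoder: pool the behaviour vectors, then one tanh layer.
        pooled = mean_pool(behaviour_vectors)
        return [math.tanh(h) for h in linear(self.shared_w, self.shared_b, pooled)]

    def predict(self, behaviour_vectors):
        # The user vector is computed once and shared across all heads.
        user_vec = self.represent(behaviour_vectors)
        return {name: softmax(linear(w, b, user_vec))
                for name, (w, b) in self.heads.items()}

# Hypothetical toy setup: 2-dimensional behaviour features, two attributes.
model = UnifiedUserModel(
    shared_w=[[1.0, 0.0], [0.0, 1.0]],
    shared_b=[0.0, 0.0],
    heads={
        "age_group": ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
        "gender": ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    },
)
predictions = model.predict([[0.2, 0.4], [0.6, 0.0]])
```

The point of the design is that adding a new attribute only adds a small head, while the shared encoder (and its storage cost) stays the same.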

Research topic 5: explainable recommendation system

A recommendation system whose results can be easily explained and that uses examples will be more likely to capture the user’s attention. Current research finds that such a system will not only improve the system’s transparency, but also increase the user’s trust and acceptance of the system, thus facilitating the user’s selection of the recommended products and improving user satisfaction. As a result, designing an explainable recommendation system will be our ultimate goal.

Opportunities and challenges of the explainable recommendation system

As a comparatively fresh issue in the field of recommendation systems, many aspects of explainable recommendations are worthy of exploration. We are presently considering future research directions for the following three aspects.

Enhancement of the explainability by knowledge graph. As an external knowledge carrier with high readability, the knowledge graph brings a great opportunity to improve the explanation of the algorithm. The existing recommendation explanations are usually limited to one of three forms: item-mediated, user-mediated, or feature-mediated. We expect to use a knowledge graph to build connections between these three forms and flexibly choose the most suitable one for a user’s explanation for a specific situation. In addition, we may also use concept graphs, such as Microsoft Concept Graph, to establish the deep readable structure between features and thus equip the current deep neural network with both readability and accuracy.

In an era where artificial intelligence is becoming more and more important, the combination of symbolic knowledge from a knowledge graph and deep learning is a promising research direction.
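One simple way to picture a knowledge-graph explanation is as a path that links the user to the recommended item through intermediate items, users, or features. The sketch below is only an illustration of that idea, not the method used in the work described here: the graph, the node names, and the `explain_path` helper are all hypothetical, and a plain breadth-first search stands in for whatever path selection a real system would use.

```python
from collections import deque

def explain_path(graph, user, item):
    """Return the shortest chain of (source, relation, target) hops
    from user to item, or None if the item cannot be reached."""
    queue = deque([user])
    came_from = {user: None}
    while queue:
        node = queue.popleft()
        if node == item:
            break
        for relation, neighbour in graph.get(node, []):
            if neighbour not in came_from:
                came_from[neighbour] = (node, relation)
                queue.append(neighbour)
    if item not in came_from:
        return None
    # Walk back from the item to the user, then reverse the hops.
    hops = []
    node = item
    while came_from[node] is not None:
        previous, relation = came_from[node]
        hops.append((previous, relation, node))
        node = previous
    return list(reversed(hops))

# Hypothetical toy graph: adjacency lists of (relation, neighbour) pairs.
toy_graph = {
    "user_1": [("liked", "item_A")],
    "item_A": [("has_feature", "feature_X")],
    "feature_X": [("feature_of", "item_B")],
}
path = explain_path(toy_graph, "user_1", "item_B")
# path reads as: user_1 liked item_A, which has feature_X, a feature of item_B.
```

In this toy case the path passes through a feature node, so it would correspond to a feature-mediated explanation; a path through another user or another item would give the other two forms.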

Model-agnostic explainable recommendation framework. At present, most explainable recommendation systems are designed for specific recommendation models with limited extensibility. For emerging recommendation models, such as the complex and mixed models using deep neural networks, the explainable capability is usually insufficient. Once there is a model-agnostic explainable recommendation framework, we can avoid designing the explanation schemes for different recommendation systems—thus improving their extensibility. One recent work proposed by our group (A Reinforcement Learning Framework for Explainable Recommendation, ICDM 2018) makes a preliminary effort in this direction. In this work (Figure 2), a reinforcement learning framework is designed for the recommendation model’s explanation, and it exhibits superior extendibility, explainability, and explanation quality.

Figure 2. Framework of the model-agnostic reinforcement explainable recommendation

Conversational explainability based on generative models. Current explanations for recommendations tend to be inflexible and monotonous (for example, explanations are preset to be user-mediated). Although current systems can generate useful explanations, they are still too rigid in terms of how they communicate. If the recommendation system can create natural and emotional wording via generative models, the recommendation can be explained flexibly when chatting with users. In our collaboration with Microsoft XiaoIce, we have made efforts to generate explainable music recommendations through chatting.

A recommendation system of high efficiency and extendibility is imperative in the future; meanwhile, we expect advances in the incorporation of heterogeneous data and the perception of the long- and short-term interests of users. The exploitation of the knowledge graph, the design of generalized learning mechanisms, and making full use of interactive data will be the most important research directions in the coming few years. We also need to pay close attention to explainability, which will require integrating the knowledge graph, the collaboration with reinforcement learning, the design of model-agnostic algorithms, and the incorporation of generative models. Last, but not least, user privacy should never be ignored as we move toward data-sharing mechanisms and a unified user representation across different platforms.

We believe that personalized recommendation systems will continue to develop in various directions, including effectiveness, diversity, computational efficiency, and explainability; and that this will ultimately address the problem of information overload.

A systematic literature review on educational recommender systems for teaching and learning: research trends, limitations and opportunities

Felipe Leite da Silva

1 Centro de Estudos Interdisciplinares em Novas Tecnologias da Educação, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul Brazil

Bruna Kin Slodkowski

Ketia Kellen Araújo da Silva, Sílvio César Cazella

2 Departamento de Ciências Exatas e Sociais Aplicadas, Universidade Federal de Ciências da Saúde de Porto Alegre, Porto Alegre, Rio Grande do Sul Brazil

Associated Data

The datasets generated during the current study correspond to the papers identified through the systematic literature review and the quality evaluation results (refer to Section  3.4 in paper). They are available from the corresponding author on reasonable request.

Recommender systems have become one of the main tools for personalized content filtering in the educational domain. Those that support teaching and learning activities, in particular, have gained increasing attention in recent years. This growing interest has motivated the emergence of new approaches and models in the field. Despite this, there is a gap in the literature regarding current trends in how recommendations are produced and how recommenders are evaluated, as well as the research limitations and opportunities for advancement in the field. In this regard, this paper reports the main findings of a systematic literature review covering these four dimensions. The study is based on the analysis of a set of primary studies (N = 16 out of 756, published from 2015 to 2020) included according to defined criteria. Results indicate that the hybrid approach has been the leading strategy for recommendation production. Concerning the purpose of the evaluation, the recommenders were evaluated mainly with regard to accuracy, and few studies investigated their pedagogical effectiveness. This evidence points to a potential research opportunity for the development of multidimensional evaluation frameworks that effectively support the verification of the impact of recommendations on the teaching and learning process. We also identify and discuss the main limitations to clarify current difficulties that demand attention in future research.

Supplementary Information

The online version contains supplementary material available at 10.1007/s10639-022-11341-9.

Introduction

Digital technologies are increasingly integrated into different application domains. Particularly in education, there is a vast interest in using them as mediators of the teaching and learning process. In such a task, the computational apparatus serves as an instrument to support human knowledge acquisition from different educational methodologies and pedagogical practices (Becker, 1993 ).

In this sense, Educational Recommender Systems (ERS) play an important role for both educators and students (Maria et al., 2019). For instructors, these systems can contribute to their pedagogical practices through recommendations that improve their planning and assist in filtering educational resources. As for the learners, by recognizing preferences and educational constraints, recommenders can contribute to their academic performance and motivation by indicating personalized learning content (Garcia-Martinez & Hamou-Lhadj, 2013).

Despite the benefits, there are known issues with the use of recommender systems in the educational domain. One of the main challenges is to find an appropriate correspondence between the expectations of users and the recommendations (Cazella et al., 2014). Difficulties arise from differences in learners’ educational interests and needs (Verbert et al., 2012). The variety of individual student factors that can influence the learning process (Buder & Schwind, 2012) is one of the matters that makes this challenge hard to overcome. From a recommender standpoint, this reflects a diversity of inputs with the potential to tune recommendations for users.

From another perspective, that of technology and artificial intelligence, ERS are likely to suffer from issues already known from general-purpose recommenders, such as the cold start and data sparsity problems (Garcia-Martinez & Hamou-Lhadj, 2013). Furthermore, some problems are related to the approach used to generate recommendations. For instance, overspecialization is inherently associated with the way that content-based recommender systems handle data (Iaquinta et al., 2008; Khusro et al., 2016). These issues make it difficult to design recommenders that best suit the user’s learning needs and that avoid user dissatisfaction in the short and long term.

From an educational point of view, issues emerge on how to evaluate ERS effectiveness. A usual strategy to measure the quality of educational recommenders is to apply the traditional recommender’s evaluation methods (Erdt et al., 2015 ). This approach determines system quality based on performance properties, such as its precision and prediction accuracy. Nevertheless, in the educational domain, system effectiveness needs to take into account the students’ learning performance. This dimension brings new complexities on how to successfully evaluate ERS.

As the ERS topic has gradually attracted more attention from the scientific community (Zhong et al., 2019), extensive research has been carried out in recent years to address these issues (Manouselis et al., 2010; Manouselis et al., 2014; Tarus et al., 2018; George & Lal, 2019). ERS has become a field of application and combination of different computational techniques, such as data mining, information filtering and machine learning, among others (Tarus et al., 2018). This scenario indicates a diversity in the design and evaluation of recommender systems that support teaching and learning activities. Nonetheless, research is dispersed in the literature, and there is no recent study that encompasses the current scientific efforts in the field and reveals how such issues are addressed in current research. Reviewing evidence and synthesizing the findings of current approaches to how ERS produce recommendations, how ERS are evaluated, and what the research limitations and opportunities are can provide a panoramic perspective of the research topic and support practitioners and researchers in implementation and future research directions.

From the aforementioned perspective, this work aims to investigate and summarize the main trends and research opportunities on the ERS topic through a Systematic Literature Review (SLR). The study was based on publications from the last six years, particularly those regarding recommenders that support the teaching and learning process.

Main trends refer to recent research directions in the ERS field. They are analyzed with regard to how recommender systems produce recommendations and how they are evaluated. As mentioned above, these are significant dimensions related to current issues of the area. Specifically for recommendation production, this paper provides a three-axis analysis centered on the systems’ underlying techniques, input data and results presentation.

Additionally, research opportunities in the field of ERS as well as their main limitations are highlighted. Because the current understanding of these aspects is fragmented in the literature, such an analysis can shed light on directions for future studies.

The SLR was carried out using the Kitchenham and Charters (2007) guidelines. The SLR is the main method for summarizing evidence related to a topic or a research question (Kitchenham et al., 2009). The Kitchenham and Charters (2007) guidelines, in turn, are one of the leading orientations for reviews on information technology in education (Dermeval et al., 2020).

The remainder of this paper is structured as follows. In Section  2 , the related works are presented. Section  3 details the methodology used in carrying out the SLR. Section  4 covers the SLR results and related discussion. Section  5 presents the conclusion.

Related works

In the field of education, there is a growing interest in technologies that support teaching and learning activities. For this purpose, ERS are strategic solutions to provide a personalized educational experience. Research in this sense has attracted the attention of the scientific community and there has been an effort to map and summarize different aspects of the field in the last 6 years.

In Drachsler et al. ( 2015 ) a comprehensive review of technology enhanced learning recommender systems was carried out. The authors analyzed 82 papers published from 2000 to 2014 and provided an overview of the area. Different aspects were analyzed about recommenders’ approach, source of information and evaluation. Additionally, a categorization framework is presented and the study includes the classification of selected papers according to it.

Klašnja-Milićević et al. ( 2015 ) conducted a review on recommendation systems for e-learning environments. The study focuses on requirements, challenges, (dis)advantages of techniques in the design of this type of ERS. An analysis on collaborative tagging systems and their integration in e-learning platform recommenders is also discussed.

Ferreira et al. (2017) investigated particularities of research on ERS in Brazil. Papers published between 2012 and 2016 in three Brazilian scientific vehicles were analyzed. Rivera et al. (2018) presented a big picture of the ERS area through a systematic mapping. The study covered a larger set of papers and aimed to detect global characteristics in ERS research. With the same focus but a different set of questions and combination of repositories, Pinho et al. (2019) performed a systematic review on ERS. These works share the concern of providing insights into the systems’ evaluation methods and the main techniques adopted in the recommendation process.

Nascimento et al. ( 2017 ) carried out a SLR covering learning objects recommender systems based on the user’s learning styles. Learning objects metadata standards, learning style theoretical models, e-learning systems used to provide recommendations and the techniques used by the ERS were investigated.

Tarus et al. (2018) and George and Lal (2019) concentrated their reviews on ontology-based ERS. Tarus et al. (2018) examined the distribution of research from 2005 to 2014 according to year of publication. Furthermore, the authors summarized the techniques, knowledge representation, ontology types and ontology representations covered in the papers. George and Lal (2019), in turn, update the contributions of Tarus et al. (2018), investigating papers published between 2010 and 2019. The authors also discuss how ontology-based ERS can be used to address traditional recommender system issues, such as the cold start problem and rating sparsity.

Ashraf et al. (2021) directed their attention to course recommendation systems. Through a comprehensive review, the study summarized the techniques and parameters used by this type of ERS. Additionally, a taxonomy of the factors taken into account in the course recommendation process was defined. Salazar et al. (2021), on the other hand, conducted a review on affectivity-based ERS. The authors presented a macro analysis, identifying the main authors and research trends, and summarized different aspects of recommender systems, such as the techniques used in affectivity analysis, the source of affectivity data collection and how to model emotions.

Khanal et al. ( 2019 ) reviewed e-learning recommendation systems based on machine learning algorithms. A total of 10 papers from two scientific vehicles and published between 2016 and 2018 were examined. The study focal point was to investigate four categories of recommenders: those based on collaborative filtering, content-based filtering, knowledge and a hybrid strategy. The dimensions analyzed were the machine learning algorithms used, the recommenders’ evaluation process, inputs and outputs characterization and recommenders’ challenges addressed.

Related works gaps and contribution of this study

The studies presented in the previous section vary in scope and dimensions of analysis; in general, however, they can be classified into two distinct groups. The first focuses on specific subjects of the ERS field, such as similar methods of recommendation (George & Lal, 2019; Khanal et al., 2019; Salazar et al., 2021; Tarus et al., 2018) and the same kind of recommendable resources (Ashraf et al., 2021; Nascimento et al., 2017). This type of research scrutinizes the particularities of the recommenders and highlights aspects that are difficult to identify in reviews with a broader scope. Despite that, most of the reviews concentrate on analyses of recommenders’ operational features and have limited discussion of crosswise issues, such as ERS evaluation and presentation approaches. Khanal et al. (2019), specifically, make contributions regarding evaluation, but the analysis is limited to four types of recommender systems.

The second group is composed of wider-scope reviews and includes recommendation models based on a diversity of methods, input and output strategies (Drachsler et al., 2015; Ferreira et al., 2017; Klašnja-Milićević et al., 2015; Pinho et al., 2019; Rivera et al., 2018). Due to the very nature of systematic mappings, the research conducted by Ferreira et al. (2017) and Rivera et al. (2018) does not cover some topics in depth; for example, the data synthesized on the evaluations of the ERS are limited to indicating only the methods used. Ferreira et al. (2017), in particular, investigate only Brazilian recommendation systems, offering partial contributions to an understanding of the state of the art of the area. The same limitation of systematic mappings is noted in Pinho et al. (2019). The review was reported in a restricted number of pages, making it difficult to detail the findings. On the other hand, Drachsler et al. (2015) and Klašnja-Milićević et al. (2015) carried out comprehensive reviews that summarize specific and macro dimensions of the area. However, the papers included in their reviews were published up to 2014, and there is a gap regarding the advances and trends in the field in the last 6 years.

Given the above, as far as the authors are aware, there is no wide-scope secondary study that aggregates the research achievements of recent years on recommendation systems that support teaching and learning. Moreover, a review in this sense is necessary, since personalization has become an important feature in the teaching and learning context and ERS are one of the main tools to deal with the different educational needs and preferences that affect individuals’ learning process.

In order to widen the frontiers of knowledge in the field of research, this review aims to contribute to the area by presenting a detailed analysis of the following dimensions: how recommendations are produced and presented, how recommender systems are evaluated, and what the studies’ limitations and research opportunities are. Specifically, to summarize the current knowledge, a SLR was conducted based on four research questions (Section 3.1). The review focused on papers published from 2015 to 2020 in scientific journals. A quality assessment was performed to select the most mature systems. The data found on the investigated topics are summarized and discussed in Section 4.

Methodology

This study is based on the SLR methodology for gathering evidence related to the research topic investigated. As stated by Kitchenham and Charters (2007) and Kitchenham et al. (2009), this method provides the means for aggregating evidence from current research while prioritizing the impartiality and reproducibility of the review. Therefore, a SLR is based on a process that entails the development of a review protocol that guides the selection of relevant studies and the subsequent extraction of data for analysis.

Guidelines for SLR are widely described in the literature, and the method can be applied for gathering evidence in different domains, such as medicine and social science (Khan et al., 2003; Pai et al., 2004; Petticrew & Roberts, 2006; Moher et al., 2015). Particularly for the informatics in education area, the Kitchenham and Charters (2007) guidelines have been reported as one of the main orientations (Dermeval et al., 2020). Their approach appears in several studies (Petri & Gresse von Wangenheim, 2017; Medeiros et al., 2019; Herpich et al., 2019), including mappings and reviews on the ERS field (Rivera et al., 2018; Tarus et al., 2018).

As mentioned in Section  1 , Kitchenham and Charters ( 2007 ) guidelines were used in the conducted SLR. They are based on three main stages: the first for planning the review, the second for conducting it and the last for the results report. Following these orientations, the review was structured in three phases with seven main activities distributed among them as depicted in Fig.  1 .

Fig. 1 Systematic literature review phases and activities

The first was the planning phase. The identification of the need for a SLR about teaching and learning support recommenders and the development of the review protocol occurred at this stage. In activity 1, a search for SLRs with the intended scope of this study was performed. The search did not return papers compatible with the scope of this review. The papers identified are described in Section 2. In activity 2, the review process was defined. The protocol was elaborated through rounds of discussion by the authors until consensus was reached. The outputs of activity 2 were the research questions, search strategy, paper selection strategy and data extraction method.

The next was the conducting phase. At this point, the activities for identification (activity 3) and selection (activity 4) of relevant papers were executed. In activity 3, searches were carried out in seven repositories indicated by Dermeval et al. (2020) as relevant to the area of informatics in education. The authors applied the search string in these repositories’ search engines. However, due to the large number of results returned, the authors established a limit of 600 to 800 papers to be analyzed. Thus, three repositories whose combined search results fell within the established limits were chosen. The potential repositories considered for this review and the selected ones are listed in Section 3.1. The search string used is also shown in Section 3.1.

In activity 4, studies were selected in two steps. In the first, inclusion and exclusion criteria were applied to each identified paper. Accepted papers had their quality assessed in the second step. Parsifal 1 was used to manage the data of the planning and conducting phases. Parsifal is a web system, adhering to the Kitchenham and Charters (2007) guidelines, that helps in conducting a SLR. At the end of this step, relevant data were extracted (activity 5) and registered in a spreadsheet. Finally, in the reporting phase, the extracted data were analyzed in order to answer the SLR research questions (activity 6) and the results were recorded in this paper (activity 7).

Research question, search string and repositories

Teaching and learning support recommender systems have particularities of configuration, design and evaluation method. Therefore, the following research questions (Table 1) were elaborated in an effort to synthesize this knowledge as well as the main limitations and research opportunities in the field from the perspective of the most recent studies:

SLR research questions

Regarding the search strategy, papers were selected from three digital repositories (Table 2). For the search, “Education” and “Recommender system” were defined as the keywords, and synonyms were derived from them as secondary terms (Table 3). From these words, the following search string was elaborated:

  • ("Education" OR "Educational" OR "E-learning" OR "Learning" OR "Learn") AND ("Recommender system" OR "Recommender systems" OR "Recommendation system" OR "Recommendation systems" OR "Recommending system" OR "Recommending systems")
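The structure of the search string (synonyms joined with OR inside each keyword group, and the two groups joined with AND) can be reproduced programmatically, which is convenient when adapting it to another repository. The sketch below is illustrative only; the `build_search_string` helper is a hypothetical name, and the exact query syntax accepted by each repository's search engine may differ.

```python
def build_search_string(term_groups):
    """OR the quoted synonyms inside each group, then AND the groups together."""
    clauses = []
    for terms in term_groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)

# Keyword groups as they appear in the search string above.
education_terms = ["Education", "Educational", "E-learning", "Learning", "Learn"]
recommender_terms = [
    "Recommender system", "Recommender systems",
    "Recommendation system", "Recommendation systems",
    "Recommending system", "Recommending systems",
]

search_string = build_search_string([education_terms, recommender_terms])
print(search_string)
```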

Repositories considered for the SLR

Keywords and their synonyms used in the search string

Inclusion and exclusion criteria

The first step for the selection of papers was performed through the application of objective criteria; thus, a set of inclusion and exclusion criteria was defined. The approved papers formed a group that comprises the primary studies with potential relevance for the scope of the SLR. Table 4 lists the defined criteria. In the description column of Table 4, the criteria are described, and in the id column they are identified with a code. The latter was defined by appending an index, following the sequence of the list, to an abbreviation of the respective kind of criterion (IC for Inclusion Criteria and EC for Exclusion Criteria). The id is used for referencing its corresponding criterion in the rest of this document.

Inclusion and exclusion criteria of the SLR

Since the focus of this review is on the analysis of recent ERS publications, only studies from the past 6 years (2015–2020) were screened (see IC1). Targeting mature recommender systems, only full papers from scientific journals that present the recommendation system evaluation were considered (see IC2, IC4 and IC7). Also, only works written in English were selected, because they are the most numerous and are within the reading ability of the authors (see IC3). The search string was checked against the papers’ title, abstract and keywords to ensure that only studies related to the ERS field were screened (see IC5). IC6, specifically, delimited the subject of the selected papers and aligned it with the scope of the review. Additionally, it prevented the selection of secondary studies (e.g., other reviews or systematic mappings). Conversely, exclusion criteria were defined to clarify that papers contrasting with the inclusion criteria should be excluded from the review (see EC1 to EC8). Finally, duplicate search results were marked and, when all criteria were met, only the latest was selected.
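The objective part of these criteria can be pictured as a simple predicate over paper metadata. The sketch below is illustrative only: the `Paper` fields and the `passes_inclusion` helper are hypothetical names, the checks cover only the conditions stated in the paragraph above (publication year, language, full journal paper, reported evaluation), and in the review itself the criteria were applied by the authors, not by code.

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    year: int
    language: str
    venue_type: str           # e.g. "journal" or "conference"
    is_full_paper: bool
    reports_evaluation: bool  # the paper presents an evaluation of the recommender

def passes_inclusion(paper):
    """Return True when the paper meets every condition sketched here."""
    return (
        2015 <= paper.year <= 2020
        and paper.language == "English"
        and paper.venue_type == "journal"
        and paper.is_full_paper
        and paper.reports_evaluation
    )

# Hypothetical records, used only to exercise the predicate.
candidates = [
    Paper("Paper A", 2018, "English", "journal", True, True),
    Paper("Paper B", 2014, "English", "journal", True, True),     # too old
    Paper("Paper C", 2019, "English", "conference", True, True),  # not a journal
    Paper("Paper D", 2020, "English", "journal", True, False),    # no evaluation
]
selected = [paper.title for paper in candidates if passes_inclusion(paper)]
```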

Quality evaluation

The second step in the study selection activity was the quality evaluation of the papers. A set of questions was defined, with answers of different weights, to estimate the quality of the studies. The objective of this phase was to select studies with greater (i) validity; (ii) detail on the context and implications of the research; and (iii) description of the proposed recommenders. Research that detailed the configuration of the experiment and carried out an external validation of the ERS obtained a higher weight in the quality assessment. Hence, the questions related to recommender evaluation (QA8 and QA9) were scored from 0 to 3, while the others were scored from 0 to 2. The questions and their respective answers are presented in Table 7 (see Appendix). Each paper evaluated had a total weight calculated according to Formula 1.

Quality evaluation questions and answers

The total weight of a paper ranges from 0 to 10. Only works that reached the minimum weight of 7 were accepted.

Screening process

The paper screening process occurred as shown in Fig. 2. Initially, three authors carried out the identification of the studies. In this activity, the search string was applied in the search engines of the repositories along with the inclusion and exclusion criteria through filtering settings. Two searches were undertaken on the three repositories at distinct moments, one in November 2020 and another in January 2021. The second one was performed to ensure that all papers published in 2020 in the repositories were counted. A total of 756 preliminary primary studies were returned and their metadata were registered in Parsifal.

Fig. 2 Flow of papers search and selection

Following the protocol, the selection activity was initiated. At the start, the duplicate verification feature of Parsifal was used. A total of 5 duplicate papers were returned and the oldest copies were ignored. Afterwards, papers were divided into groups and distributed among the authors. Inclusion and exclusion criteria were applied by reading titles and abstracts. In cases where it was not possible to determine the eligibility of a paper based on these two fields, the body of the text was read until it was possible to apply all criteria accurately. At the end of this step, 41 studies remained for the next one. Once more, papers were divided into three groups and each set of works was evaluated by one author. Studies were read in full and weighted according to each quality assessment question. At any stage of this process, when questions arose, the authors defined a solution through consensus. As a final result of the selection activity, 16 papers were approved for data extraction.
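The counts reported for the screening process can be laid out as a small selection funnel. The sketch below only rearranges numbers stated in the text (756 identified, 5 duplicates, 41 after the criteria, 16 after the quality evaluation); the variable names are hypothetical, and the derived figures assume that each of the 5 duplicates removes exactly one record.

```python
# Counts taken from the text of the screening process.
identified = 756         # preliminary primary studies returned by the searches
duplicates_removed = 5   # assumption: one record dropped per duplicate
passed_criteria = 41     # remained after inclusion and exclusion criteria
passed_quality = 16      # approved for data extraction (weight of at least 7)

# Derived figures for each stage of the funnel.
screened = identified - duplicates_removed
excluded_by_criteria = screened - passed_criteria
rejected_on_quality = passed_criteria - passed_quality
acceptance_rate = passed_quality / identified

print(f"Screened by title and abstract: {screened}")
print(f"Excluded by criteria: {excluded_by_criteria}")
print(f"Rejected in quality evaluation: {rejected_on_quality}")
print(f"Share of identified papers accepted: {acceptance_rate:.1%}")
```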

Procedure for data analysis

Data from selected papers were extracted in a data collection form that registered general information and specific information. The general information extracted was: reviewer identification, date of data extraction and title, authors and origin of the paper. General information was used to manage the data extraction activity. The specific information was: recommendation approach, recommendation techniques, input parameters, data collection strategy, method for data collection, evaluation methodology, evaluation settings, evaluation approaches, evaluation metrics. This information was used to answer the research questions. Tabulated records were interpreted and a descriptive summary with the findings was prepared.

Results and discussion

In this section, the SLR results are presented. First, an overview of the selected papers is introduced. Next, the findings are analyzed from the perspective of each research question in a respective subsection.

Selected papers overview

Each selected paper presents a distinct recommendation approach that advances the ERS field. An overview of these studies follows.

Sergis and Sampson ( 2016 ) present a recommendation system that supports educators’ teaching practices through the selection of learning objects from educational repositories. It generates recommendations based on the level of instructors’ proficiency on ICT Competences. In Tarus et al. ( 2017 ), the recommendations are targeted at students. The study proposes an e-learning resource recommender based on both user and item information mapped through ontologies.

Nafea et al. ( 2019 ) propose three recommendation approaches. They combine item ratings with student’s learning styles for learning objects recommendation. Klašnja-Milićević et al. ( 2018 ) present a recommender of learning materials based on tags defined by the learners. The recommender is incorporated in Protus e-learning system.

In Wan and Niu (2016), a recommender based on mixed concept mapping and immunological algorithms is proposed. It produces sequences of learning objects for students. In a different approach, the same authors incorporate self-organization theory into ERS. Wan and Niu (2018) deal with the notion of self-organizing learning objects. In this research, resources behave as individuals who can move towards learners. This movement results in recommendations and is triggered by students’ learning attributes and actions. In Wan and Niu (2020), in turn, self-organization refers to the approach of students motivated by their learning needs. The authors propose an ERS that recommends self-organized cliques of learners and, based on these, recommends learning objects.

Zapata et al. ( 2015 ) developed a learning object recommendation strategy for teachers. The study describes a methodology based on collaborative methodology and voting aggregation strategies for the group recommendations. This approach is implemented in the Delphos recommender system. In a similar research line, Rahman and Abdullah ( 2018 ) show an ERS that recommends Google results tailored to students’ academic profile. The proposed system classifies learners into groups and, according to the similarity of their members, indicates web pages related to shared interests.

Wu et al. (2015) propose a recommendation system for e-learning environments. In this study, the complexity and uncertainties related to user profile data and learning activities are modeled through tree structures combined with fuzzy logic. Recommendations are produced from matches between these structures. Ismail et al. (2019) developed a recommender to support informal learning. It suggests Wikipedia content, taking into account unstructured textual platform data and user behavior.

Huang et al. (2019) present a system for recommending optional courses. The system’s recommendations rely on the time constraints of the student’s curriculum and on the similarity between the student’s academic performance and that of senior students. The time that individuals dedicate to learning is also a relevant factor in Nabizadeh et al. (2020). In this research, a learning path recommender that includes lessons and learning objects is proposed. The system estimates the learner’s good performance score and, based on that, produces a learning path that satisfies their time constraints. The recommendation approach also indicates auxiliary resources for those who do not reach the estimated performance.

Fernández-García et al. (2020) deal with recommendations of disciplines using a sparse dataset with few instances. The authors developed a model based on several data mining and machine learning techniques to support students’ decisions when choosing subjects. Wu et al. (2020) create a recommender that captures students’ mastery of a topic and produces a list of exercises with a level of difficulty adapted to them. Yanes et al. (2020) developed a recommendation system, based on different machine learning algorithms, that provides appropriate actions to help teachers improve the quality of their teaching strategies.

How teaching and learning support recommender systems produce recommendations?

The process of generating recommendations is analyzed along two axes. The underlying techniques of the recommender systems are discussed first, and then the input parameters are covered. Details of the studies are provided in Table 5.

Summary of ERS techniques and input parameters used in the selected papers

Techniques approaches

The analysis of the selected papers shows that hybrid recommendation systems are predominant. Such recommenders are characterized by computing predictions through a set of two or more algorithms in order to mitigate or avoid the limitations of pure recommendation systems (Isinkaye et al., 2015). Of the sixteen analyzed papers, thirteen (p = 81.25%) are based on hybridization. This tendency seems to be related to the support that the hybrid approach provides for developing recommender systems that must meet multiple educational needs of users. For example, Sergis and Sampson (2016) proposed a recommender based on two main techniques: fuzzy sets, to deal with uncertainty about teacher competence level, and Collaborative Filtering (CF), to select learning objects based on neighbors who may have similar competences. In Tarus et al. (2017), the profiles of students and learning resources are represented as ontologies. The system calculates predictions based on them and recommends learning items through a mechanism that applies collaborative filtering followed by a sequential pattern mining algorithm.

Moreover, the hybrid approach that combines CF and Content-Based Filtering (CBF), although a traditional technique (Bobadilla, Ortega, Hernando and Gutiérrez, 2013), does not seem to be popular in research on teaching and learning support recommender systems. Among the selected papers, only Nafea et al. (2019) have a proposal in this regard. Additionally, the extracted data indicate that a significant number of hybrid recommendation systems (p = 53.85%, n = 7) have been built by combining methods for treating or representing data, such as ontologies and fuzzy sets, with methods for generating recommendations. For example, Wu et al. (2015) structure user profile data and learning activities through fuzzy trees. In such structures, the values assigned to the nodes are represented by fuzzy sets. The fuzzy tree data model and users’ ratings feed a tree-structured data matching method and a CF algorithm for similarity calculation.

The collaborative filtering recommendation paradigm, in turn, plays an important role in the research. Nearly a third of the studies that propose hybrid recommenders (p = 30.77%, n = 4) include a CF-based strategy. In fact, this is the most frequent pure technique in the research set. A total of 31.25% (n = 5) are based on an adapted version of CF or combine it with other approaches. CBF-based recommenders, in contrast, have not shared the same popularity. This technique is an established recommendation approach that produces results based on the similarity between items known to the user and other recommendable items (Bobadilla et al., 2013). Only Nafea et al. (2019) propose a CBF-based recommendation system.

Also, the user-based variant of CF is widely used in the analyzed research. In this version, predictions are calculated from the similarity between users, as opposed to the item-based version, where predictions are based on item similarities (Isinkaye et al., 2015). All CF-based recommendation systems identified, whether pure or combined with other techniques, use this variant.
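To make the user-based variant concrete, the following is a minimal sketch in Python. It is an illustration only and is not taken from any of the reviewed papers: the cosine similarity measure, the neighbourhood size `k`, the function names and the toy ratings are all assumptions chosen for brevity.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity computed over the items that both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_user_based(ratings, user, item, k=2):
    """Predict a rating as the similarity-weighted mean of the ratings
    given to `item` by the k users most similar to `user`."""
    neighbours = [
        (cosine_similarity(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbours = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:k]
    total_similarity = sum(sim for sim, _ in neighbours)
    if total_similarity == 0:
        return None  # no usable neighbour rated this item
    return sum(sim * rating for sim, rating in neighbours) / total_similarity

# Hypothetical ratings of learning objects (lo1..lo3) by three learners.
ratings = {
    "ana": {"lo1": 5, "lo2": 3},
    "ben": {"lo1": 5, "lo2": 3, "lo3": 4},
    "cai": {"lo1": 1, "lo2": 5, "lo3": 2},
}
prediction = predict_user_based(ratings, "ana", "lo3")
```

In this toy data, "ana" rates like "ben", so the prediction for "lo3" is pulled towards ben's rating rather than cai's. An item-based variant would instead compare the rating columns of items.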

The above findings seem to be related to the growing perception, in the education domain, of the relevance of a student-centered teaching and learning process (Krahenbuhl, 2016; Mccombs, 2013). Recommendation approaches that are based on the user’s profile, such as interests, needs, and capabilities, naturally fit this notion and are more widely used than those based on other information, such as the characteristics of the recommended items.

Input parameters approaches

With regard to the inputs consumed in the recommendation process, the collected data show that the main parameters are attributes related to users’ educational profiles. Examples are ICT competences (Sergis & Sampson, 2016), learning objectives (Wan & Niu, 2018; Wu et al., 2015), learning styles (Nafea et al., 2019), learning levels (Tarus et al., 2017) and different academic data (Yanes et al., 2020; Fernández-García et al., 2020). Only 25% (n = 4) of the systems apply item-related information in the recommendation process. Furthermore, with the exception of the CBF-based recommendation of Nafea et al. (2019), these are based on a combination of item and user information. A complete list of the identified input parameters is provided in Table 5.

Academic information and learning styles, compared to other parameters, feature highly in the research. They appear, respectively, in 37.5% (n = 6) and 31.25% (n = 5) of the papers. Students’ scores (Huang et al., 2019), academic background (Yanes et al., 2020), learning categories (Wu et al., 2015) and subjects taken (Fernández-García et al., 2020) are some of the academic data used. Learning styles, in turn, are predominantly based on the theory of Felder (1988). Wan and Niu (2016), exceptionally, combine Felder (1988), Kolb et al. (2001) and Betoret (2007) to build a specific notion of learning styles. This notion is also used in two other studies carried out by the same authors, and is supported by a questionnaire that they also developed (Wan & Niu, 2018, 2020).

Regarding the way inputs are captured, it was observed that explicit feedback is prioritized over other data collection strategies. In this approach, users have to directly provide the information that will be used in the process of preparing recommendations (Isinkaye et al., 2015). Half of the analyzed studies are based only on explicit feedback. The use of graphical interface components (Klašnja-Milićević et al., 2018), questionnaires (Wan & Niu, 2016) and manual entry of datasets (Wu et al., 2020; Yanes et al., 2020) are the main methods identified.

Only 18.75% (n = 3) of the ERS rely solely on gathering information through implicit feedback, that is, on inputs inferred by the system (Isinkaye et al., 2015). This type of data collection appears to be more popular when applied together with an explicit feedback method to enhance the prediction tasks. Recommenders that combine both approaches occur in 31.25% (n = 5) of the studies. The implicit data collection methods identified are tracking of users’ usage data, such as access, browsing and rating history (Rahman & Abdullah, 2018; Sergis & Sampson, 2016; Wan & Niu, 2018), data extraction from another system (Ismail et al., 2019), monitoring of users’ session data (Rahman & Abdullah, 2018) and data estimation (Nabizadeh et al., 2020).

The aforementioned results indicate that, in the context of the teaching and learning support recommender systems, the implicit collection of data has usually been explored in a complementary way to the explicit one. A possible rationale is that the inference of information is noisy and less accurate (Isinkaye et al., 2015 ) and, therefore, the recommendations produced from it involve greater complexity to be adjusted to the users’ expectations (Nichols, 1998 ). This aspect makes it difficult to apply the strategy in isolation and can be a factor that produces greater user dissatisfaction when compared to the disadvantage of the acquisition load of the explicit strategy inputs.

How teaching and learning support recommender systems present recommendations?

From the analyzed papers, two approaches for presenting recommendations are identified. The majority of the proposed ERS are based on a list of items ranked according to a per-user prediction calculation (p = 87.5%, n = 14). This strategy is applied in all cases where the supported task is to find good items that assist users in teaching and learning tasks (Ricci et al., 2015; Drachsler et al., 2015). The second approach is based on learning pathway generation. In this case, recommendations are displayed as a series of linked items tied together by prerequisites. Only 2 recommenders use this approach. In them, the sequence is established by the association attributes of learning objects (Wan & Niu, 2016) and by a combination of the user’s prior knowledge, the time they have available and a learning score (Nabizadeh et al., 2020). These ERS are associated with the item sequence recommendation task and are intended to guide users who wish to acquire specific knowledge (Drachsler et al., 2015).
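The ranked-list approach can be sketched as follows. This is a hypothetical illustration in Python, not the procedure of any particular paper: the predicted scores, the exclusion of items the user has already seen, and the tie-breaking by item identifier are assumptions.

```python
def top_n(predicted_scores, already_seen, n=3):
    """Return the n unseen items with the highest predicted score.

    Ties are broken by item identifier so the output is deterministic.
    """
    candidates = {
        item: score
        for item, score in predicted_scores.items()
        if item not in already_seen
    }
    ranked = sorted(candidates.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:n]]

# Hypothetical per-user predictions for four learning objects.
scores = {"lo1": 4.2, "lo2": 3.1, "lo3": 4.8, "lo4": 2.0}
recommended = top_n(scores, already_seen={"lo1"}, n=2)
```

A learning-pathway recommender would additionally need to order the selected items so that prerequisites come first, which this sketch does not attempt.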

On further examination, it is observed that more than half of the papers (62.5%, n = 10) do not give details of how the recommendation list is presented to the end user. In Huang et al. (2019), for example, there is a vague description of the production of predicted scores for students and of a list of the top-n optional courses, but it is not specified how this list is displayed. This may be related to the fact that most of these recommenders do not report an integration into another system (e.g., learning management systems) or the purpose of making them available as a standalone tool (e.g., web or mobile recommendation system). The absence of such requirements reduces the need to develop a refined presentation interface. Only Tarus et al. (2017), Wan and Niu (2018) and Nafea et al. (2019) propose recommenders that are incorporated in an e-learning system yet do not detail the way in which the results are exhibited. Of the six papers that provide insights about recommendation presentation, two (33.33%) have a graphical interface that explicitly seeks to capture the attention of a user who may be performing another task in the system. This approach highlights recommendations and is common in commercial systems (Beel, Langer and Genzmehr, 2013). In Rahman and Abdullah (2018), a panel entitled “recommendations for you” is used. In Ismail et al. (2019), a pop-up box with suggestions is displayed to the user. The remaining studies exhibit organic recommendations, i.e., naturally arranged items for user interaction (Beel et al., 2013).

In Zapata et al. (2015), after the user defines some parameters, a list of recommended learning objects is returned in a way similar to a search engine result. As for aggregation methods, the other type of item recommended by the system, only the strategy that best fits the interests of the group is recommended. The result is visualized through a five-star Likert scale that represents the users’ consensus rating. In Klašnja-Milićević et al. (2018) and Wu et al. (2015), the recommenders’ results are listed in the main area of the system. In Nabizadeh et al. (2020), the learning path occupies a panel on the screen and the items associated with it are displayed as the user progresses through the steps. The view of the auxiliary learning objects is not described in the paper. These last three recommenders do not include filtering settings and distance themselves from the archetype of a search engine.

Also, a significant number of studies focus on learning object recommendations (p = 56.25%, n = 9). Other recommendable items identified in the research are learning activities (Wu et al., 2015), pedagogical actions (Yanes et al., 2020), web pages (Ismail et al., 2019; Rahman & Abdullah, 2018), exercises (Wu et al., 2020), aggregation methods (Zapata et al., 2015), lessons (Nabizadeh et al., 2020) and subjects (Fernández-García et al., 2020). None of the studies relates the way results are displayed to the type of recommended item. This is a topic that needs further investigation to determine whether there are more appropriate ways to present specific types of items to the user.

How teaching and learning support recommender systems are evaluated?

In ERS, there are three main evaluation methodologies (Manouselis et al., 2013). One of them is the offline experiment, which is based on the use of pre-collected or simulated data to test the prediction quality of recommenders (Shani & Gunawardana, 2010). The user study is the second approach. It takes place in a controlled environment where information related to real user interactions is collected (Shani & Gunawardana, 2010). This type of evaluation can be conducted, for example, through questionnaires and A/B tests (Shani & Gunawardana, 2010). Finally, the online experiment, also called real-life testing, is one in which recommenders are used under real conditions by the intended users (Shani & Gunawardana, 2010).

In view of these definitions, the experiments reported in the analyzed studies comprise only user studies and offline experiments. Each of these methods was identified in 68.75% (n = 11) of the papers. Note that they are not mutually exclusive, and therefore the sum of the percentages is greater than 100%. For example, Klašnja-Milićević et al. (2018) and Nafea et al. (2019) assessed the quality of ERS predictions through dataset analysis and also asked users to use the systems in order to investigate their attractiveness. Both evaluation methods are carried out jointly in 37.5% (n = 6) of the papers. Considering exclusive usage, each method is conducted alone in 31.25% (n = 5) of the papers. Therefore, the two methods seem to have a balanced popularity. Real-life tests, on the contrary, although they are the ones that best demonstrate the quality of a recommender (Shani & Gunawardana, 2010), are the most avoided, probably due to their high cost and complexity of execution.
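The counts reported above can be cross-checked with the inclusion-exclusion principle. In the worked equation below, O and U are notation introduced here (not used in the paper) for the sets of papers reporting offline experiments and user studies, respectively.

```latex
% O: papers with offline experiments, U: papers with user studies
\begin{align*}
|O \cup U| &= |O| + |U| - |O \cap U| = 11 + 11 - 6 = 16, \\
|O \setminus U| &= |U \setminus O| = 11 - 6 = 5, \qquad \frac{5}{16} = 31.25\%, \\
\frac{|O \cap U|}{16} &= \frac{6}{16} = 37.5\%, \qquad \frac{|O|}{16} = \frac{|U|}{16} = \frac{11}{16} = 68.75\%.
\end{align*}
```

The union covers all 16 selected papers, which is consistent with every study reporting at least one of the two evaluation methods.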

An interesting finding concerns the user study methods used in the research. When associated with offline experiments, the user satisfaction assessment is the most common (p = 80%, n = 5). Of these, only Nabizadeh et al. (2020) performed an in-depth evaluation combining a satisfaction questionnaire with an experiment to verify the pedagogical effectiveness of their recommender. Wu et al. (2015), in particular, do not include a satisfaction survey. They conducted a qualitative investigation of user interactions and experiences.

Although questionnaires help to identify valuable information from users, they are sensitive to respondents’ intentions and can be biased by erroneous answers (Shani & Gunawardana, 2010). Papers that present only user studies, in contrast, have a higher rate of experiments that result in direct evidence of the recommender’s effectiveness in teaching and learning. All papers in this group include some investigation of this kind. Wan and Niu (2018), for example, verified whether the recommender influenced the academic scores of students and the time they took to reach a learning objective. Rahman and Abdullah (2018) investigated whether the recommender affected the time students took to complete a task.

Regarding the purpose of the evaluations, ten distinct research goals were identified. Fig. 3 shows that accuracy investigation occurred more often than any other goal. Only 1 study did not carry out experiments in this regard. Different traditional metrics were identified for measuring the accuracy of recommenders. The Mean Absolute Error (MAE), in particular, has the highest frequency. Table 6 lists the main metrics identified.
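For reference, MAE is the average absolute difference between the ratings a recommender predicts and the ratings users actually gave. The short Python sketch below shows the standard definition; the function name and the sample ratings are hypothetical and do not come from any of the reviewed studies.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical held-out ratings and the corresponding predictions.
actual_ratings = [4, 3, 5, 2]
predicted_ratings = [3.5, 3, 4, 3]
mae = mean_absolute_error(actual_ratings, predicted_ratings)
```

Lower values indicate predictions closer to the observed ratings; a value of 0 would mean every prediction matched exactly.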

Fig. 3 Evaluation purpose of recommender systems in selected papers

Summary of ERS evaluation settings, approaches and metrics in selected papers

The analysis of system attractiveness, through the verification of user satisfaction, has the second highest occurrence. It is present in 62.5% (n = 10) of the studies. The evaluation of the pedagogical effectiveness of the ERS is less frequent and occurs in only 37.5% (n = 6) of them. Experiments to examine recommendation diversity, accuracy of user profile elicitation, evolution process, user experience and interactions, entropy, novelty, and perceived usefulness and ease of use were also identified, albeit to a lesser extent.

Also, 81.25% (n = 13) of the papers presented experiments with multiple purposes. For example, in Wan and Niu (2020) an evaluation is carried out to investigate the recommender’s pedagogical effectiveness, student satisfaction, accuracy, diversity of recommendations and entropy. Only Huang et al. (2019), Fernández-García et al. (2020) and Yanes et al. (2020) evaluated a single recommender system dimension.

The evidence above suggests that the scientific community is engaged in demonstrating the quality of the recommender systems developed through multidimensional analysis. However, offline experiments and user studies, particularly those based on questionnaires, are the most widely adopted and can lead to incomplete or biased interpretations. Thus, these data also signal the need for a greater effort to conduct real-life tests and experiments that lead to an understanding of the real impact of recommenders on the teaching and learning process. Research that synthesizes and discusses the empirical possibilities for evaluating the pedagogical effectiveness of ERS can help to increase the popularity of these experiments.

The analysis of the papers also shows that the results of offline experiments are usually based on a greater amount of data than those of user studies. In the former group, 63.64% (n = 7) of the evaluation datasets have records of more than 100 users. In user studies, on the other hand, sets of up to 100 participants predominate (72.72%, n = 8). In general, the offline assessments that have smaller datasets are those that occur in association with a user study. This is because the data for both experiments usually come from the same subjects (Nafea et al., 2019; Tarus et al., 2017). The cost (e.g., time and money) of recruiting participants for the experiment is possibly a determining factor in defining appropriate samples.

Furthermore, the majority of offline experiments adopt a 70/30% split between training and testing data. Nguyen et al. (2021) give some insights in this regard, arguing that this is the most suitable ratio for training and validating machine learning models. Further details on the evaluation approaches and metrics of recommendation systems are presented in Table 6.
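A 70/30 split of an interaction dataset can be sketched in a few lines of Python. This is a generic illustration rather than the procedure of any reviewed paper: the random shuffling, the fixed seed, the function name and the synthetic interaction tuples are all assumptions.

```python
import random

def split_train_test(records, train_percent=70, seed=0):
    """Shuffle the records reproducibly and split them into train/test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]

# Hypothetical (user, learning object, rating) interactions.
interactions = [(f"user{i}", f"lo{i % 4}", 1 + i % 5) for i in range(10)]
train, test = split_train_test(interactions)
```

The recommender is fitted on `train` only, and metrics such as MAE are then computed on `test`. Other splitting schemes (for example, per-user or time-based splits) are possible and may suit some datasets better.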

What are the limitations and research opportunities related to the teaching and learning support recommender systems field?

The main limitations observed in the selected papers are presented below. They are based on explicit statements in the articles and on the authors’ own formulations. In this section, only those that cut across the majority of the studies are listed. Next, a set of research opportunities for future investigation is pointed out.

Research limitations

Research limitations are factors that hinder current progress in the ERS field. Knowing these factors can help researchers to address them in their studies and reduce the risk of stagnation in the area, that is, a situation in which newly proposed recommenders do not truly generate better outcomes than the baselines (Anelli et al., 2021; Dacrema et al., 2021). As a result of this SLR, research limitations were identified in three strands, which are presented below.

Reproducibility restriction

The majority of the papers report a specifically collected dataset to evaluate the proposed ERS. The main reason for this is the scarcity of public datasets suited to the needs of the research, as highlighted by some authors (Nabizadeh et al., 2020; Tarus et al., 2017; Wan & Niu, 2018; Wu et al., 2015; Yanes et al., 2020). Such an approach restricts the feasibility of reproducing experiments and makes it difficult to compare recommenders. In fact, this is an old issue in the ERS field. Verbert et al. (2011) observed, at the beginning of the last decade, the need to improve reproducibility and comparison in ERS in order to provide stronger conclusions about their validity and generalizability. Although there was an effort in this direction in the following years, based on broad sharing of educational datasets, most of the known ones (Çano & Morisio, 2015; Drachsler et al., 2015) are currently retired, and the remaining ones have proved insufficient to meet current research demands. Of the analyzed studies, only Wu et al. (2020) use public educational datasets.

Because dataset sharing plays an important role in reproducing and comparing recommender models under the same conditions, this finding highlights the need for a research community effort to create the means to meet this need (e.g., development of public repositories) in order to mitigate the current reproducibility limitation.

Dataset size / No of subjects

As can be observed in Table 6, few experimental results are based on a large amount of data. Only five studies have information from 1000 or more users. In particular, the offline evaluation conducted by Wu et al. (2015), despite having an extensive dataset, uses MovieLens records and is not based on real information related to teaching and learning. Another limitation concerns where the data come from: they usually have a single origin (e.g., one class at a college).

Although experiments based on small datasets can reveal the relevance of an ERS, an evaluation based on a large-scale dataset should provide stronger conclusions on recommendation effectiveness (Verbert et al., 2011). Experiments based on larger and more diverse data (e.g., users from different areas and domains) would contribute to more generalizable results. On the other hand, the scarcity of public datasets may be impairing the quantity and diversity of data used in scientific experiments in the ERS field. As reported by Nabizadeh et al. (2020), increasing the size of an experiment is costly in different respects. If more public datasets were available, researchers would be more likely to find ones aligned with their needs and, naturally, to increase the size of their experiments. In this sense, they would benefit from reduced data acquisition difficulty and cost. Furthermore, the scientific community would gain access to user data from outside their immediate context and could base their experiments on more diversified data.

Lack of in-depth investigation of the impact of known issues in the recommendation system field

Cold start, overspecialization and sparsity are some known challenges in the field of recommender systems (Khusro et al., 2016). They are mainly related to a reduced and unequally distributed amount of user feedback or item descriptions used for generating recommendations (Kunaver & Požrl, 2017). These issues also permeate the ERS field. For instance, Cechinel et al. (2011) report that, in a sample of more than 6000 learning objects from the Merlot repository, a reduced number of user ratings over items was observed. Cechinel et al. (2013), in turn, observed, in a dataset from the same repository, a pattern of few users rating several resources while the vast majority rated 5 or fewer. Since such issues directly impact the quality of recommendations, teaching and learning support recommenders should be evaluated with these issues in mind, to clarify to what extent they can be effective in real-life situations. However, in this SLR, we found a considerable number of papers (43.75%, n = 7) that do not analyze or discuss how the recommenders behave with respect to, or handle, at least partially, these issues. Studies that rely on experiments to examine such aspects would elucidate more details of the quality of the proposed systems.
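One simple way to inspect a dataset for these issues before evaluating a recommender is to measure how empty the user-item rating matrix is and how many users have rated very few items. The Python sketch below is illustrative only: the function names, the toy data and the use of "5 or fewer ratings" as the low-activity threshold (borrowed from the pattern described above) are assumptions.

```python
def sparsity(ratings, n_items):
    """Fraction of user-item pairs that have no rating."""
    n_users = len(ratings)
    observed = sum(len(items) for items in ratings.values())
    return 1 - observed / (n_users * n_items)

def low_activity_users(ratings, threshold=5):
    """Users with at most `threshold` ratings (candidates for cold start)."""
    return sorted(u for u, items in ratings.items() if len(items) <= threshold)

# Hypothetical ratings over a catalogue of 10 learning objects.
ratings = {
    "u1": {f"lo{i}": 4 for i in range(8)},
    "u2": {"lo0": 5},
    "u3": {"lo1": 3, "lo2": 2},
    "u4": {},
}
matrix_sparsity = sparsity(ratings, n_items=10)
few_ratings = low_activity_users(ratings)
```

Reporting such figures alongside accuracy results would make it easier to judge how a recommender behaves when feedback is scarce or unevenly distributed.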

Research opportunities

From the analyzed papers, a set of research opportunities were identified. They are based on gaps related to the subjects explored through the research questions of this SLR. The identified opportunities provide insights of under-explored topics that need further investigation taking into account their potential to contribute to the advancement of the ERS field. Research opportunities were identified in three strands that are presented below.

Study of the potential of overlooked user’s attributes

The papers examined present ERS based on a variety of inputs. Preferences, prior knowledge, learning style, and learning objectives are some examples (Table 5 has the complete list). As reported by Chen and Wang (2021), this is aligned with a current research trend of investigating the relationships between individual differences and personalized learning. Nevertheless, the evidence from this SLR also confirms that “some essential individual differences are neglected in existing works” (Chen & Wang, 2021). The sample of papers suggests a lack of studies that incorporate into the recommendation model other notably relevant information, such as the emotional state and cultural context of students (Maravanyika & Dlodlo, 2018; Salazar et al., 2021; Yanes et al., 2020). This indicates that further investigation is needed to clarify the true contributions and existing complexities of collecting, measuring and applying these other parameters. In this sense, an open research opportunity is the investigation of these other user attributes in order to explore the impact of such characteristics on the quality of ERS results.

Increase studies on the application of ERS in informal learning situations

Informal learning refers to a type of learning that typically occurs outside an educational institution (Pöntinen et al., 2017). In it, learners do not follow a structured curriculum or have a domain expert to guide them (Pöntinen et al., 2017; Santos & Ali, 2012). Such aspects influence how ERS can support users. For instance, in informal settings, content can come from multiple providers and, as a consequence, can be delivered without taking into account a proper pedagogical sequence. ERS targeting this scenario, in turn, should concentrate on organizing and sequencing recommendations that guide users’ learning process (Drachsler et al., 2009).

Although the literature highlights the existence of significant differences in the design of educational recommenders for formal and informal learning circumstances (Drachsler et al., 2009; Okoye et al., 2012; Manouselis et al., 2013; Harrathi & Braham, 2021), this SLR observed that current studies tend not to be explicit in reporting this characteristic. This makes it difficult to obtain a clear picture of the current situation of the field in this dimension. Nonetheless, from the characteristics of the proposed ERS, it was observed that current research seems to be concentrated on the formal learning context. This is because the recommenders in the analyzed papers usually use data maintained by institutional learning systems. Moreover, the recommendations predominantly do not provide pedagogical sequencing to support self-directed and self-paced learning (e.g., recommendations that build a learning path leading to specific knowledge). At the same time, informal learning has increasingly gained the attention of the scientific community with the emergence of the coronavirus pandemic (Watkins & Marsick, 2020).

In view of this, the lack of studies of ERS targeting informal learning settings opens a research opportunity. Specifically, further investigation would be valuable on the design and evaluation of recommenders that take into consideration different contexts (e.g., location or device used) and that guide users through a learning sequence toward specific knowledge, considering the less structured format that informal learning circumstances have in terms of learning objectives and learning support.
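To make the idea of pedagogical sequencing concrete, the following is a minimal sketch rather than a method taken from any of the reviewed papers. It assumes that a prerequisite relation between learning resources is available, which is a strong assumption in informal settings where content comes from multiple providers. The resource names, the `prerequisites` map, and the `learning_path` function are all hypothetical and introduced only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical prerequisite map: each resource lists the resources
# that should be studied before it. Names are illustrative only.
prerequisites = {
    "variables": set(),
    "loops": {"variables"},
    "functions": {"variables"},
    "recursion": {"functions", "loops"},
}


def learning_path(prereqs, target, mastered=frozenset()):
    """Return an ordered list of resources leading to `target`,
    skipping anything the learner has already mastered."""
    # Collect the target and all of its transitive prerequisites.
    needed, stack = set(), [target]
    while stack:
        item = stack.pop()
        if item in needed or item in mastered:
            continue
        needed.add(item)
        stack.extend(prereqs.get(item, ()))
    # Keep only edges among the needed resources, then order them so
    # that every prerequisite appears before the resource that needs it.
    graph = {r: prereqs.get(r, set()) & needed for r in needed}
    return list(TopologicalSorter(graph).static_order())


path = learning_path(prerequisites, "recursion", mastered={"variables"})
print(path)  # "recursion" comes last; "loops" and "functions" precede it
```

A topological order is only one possible sequencing rule. A recommender for informal learning would likely also need to weigh context (e.g., location or device) and learner goals, which this sketch does not model.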

Studies on the development of multidimensional evaluation frameworks

Evidence from this study shows that the main purpose of ERS evaluation has been to assess recommender accuracy and user satisfaction (Section 4.4). This result, together with that of Erdt et al. (2015), reveals two decades of evaluation predominantly based on these two goals. Even though other evaluation purposes have had reduced participation in research, they are also critical for measuring the success of ERS. Moubayed et al. (2018), for example, highlight two aspects of e-learning system evaluation: one is concerned with how to properly evaluate student performance, the other with measuring learners’ learning gains through system usage. Tahereh et al. (2013) identify stakeholders and indicators associated with technological quality as relevant to consider in educational system assessment. From the perspective of the recommender systems field, there are also important aspects to be analyzed in the context of its application in the educational domain, such as novelty and diversity (Pu et al., 2011; Cremonesi et al., 2013; Erdt et al., 2015).

In this context, although evaluating recommender accuracy and user satisfaction gives insight into the value of an ERS, these are not sufficient to fully indicate the quality of the system in supporting the learning process. Other factors reported in the literature are relevant to take into consideration. However, to the best of our knowledge, there is no framework that identifies and organizes these factors for ERS evaluation, which makes it difficult for the scientific community to be aware of them and incorporate them into studies.

Because the evaluation of ERS needs to be a joint effort between computer scientists and experts from other domains (Erdt et al., 2015), further investigation should be carried out seeking the development of a multidimensional evaluation framework that encompasses evaluation requirements from a multidisciplinary perspective. Such studies would clarify the different dimensions that have the potential to contribute to better ERS evaluation and could even identify which ones should be prioritized to truly assess learning impact at reduced cost.
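As an illustration of evaluation dimensions beyond accuracy and satisfaction, the sketch below computes two commonly used quantities for a recommendation list: intra-list diversity and popularity-based novelty. The specific formulations (mean pairwise Jaccard distance, and mean self-information with add-one smoothing) are assumptions chosen for simplicity, not definitions taken from the reviewed studies, and the function names and sample data are hypothetical.

```python
import math
from itertools import combinations


def intra_list_diversity(items, features):
    """Mean pairwise Jaccard distance between the feature sets of the
    recommended items (0 = identical items, 1 = nothing in common)."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0

    def distance(a, b):
        union = features[a] | features[b]
        if not union:
            return 0.0
        return 1.0 - len(features[a] & features[b]) / len(union)

    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def novelty(items, interaction_counts, n_users):
    """Mean self-information, -log2(popularity), of the recommended items.
    Items that fewer users have interacted with score higher."""
    scores = []
    for item in items:
        # Add-one smoothing so unseen items do not produce log(0).
        p = (interaction_counts.get(item, 0) + 1) / (n_users + 1)
        scores.append(-math.log2(p))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical learning objects described by topic and media type.
features = {
    "a": {"math", "video"},
    "b": {"math", "text"},
    "c": {"history", "text"},
}
counts = {"a": 7, "b": 3}  # "c" has no recorded interactions

print(intra_list_diversity(["a", "b", "c"], features))
print(novelty(["a", "b", "c"], counts, n_users=7))
```

Neither quantity says anything about learning gains, so in a multidimensional framework they would sit alongside, not replace, pedagogical measures.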

In recent years, there has been an extensive scientific effort to develop recommenders that meet different educational needs; however, research is dispersed across the literature, and there is no recent study that encompasses the current scientific efforts in the field.

Given this context, this paper presents an SLR that aims to analyze and synthesize the main trends, limitations, and research opportunities related to the area of teaching and learning support recommender systems. Specifically, this study contributes to the field by providing a summary and an analysis of the currently available information on this topic in four dimensions: (i) how the recommendations are produced; (ii) how the recommendations are presented to the users; (iii) how the recommender systems are evaluated; and (iv) what the limitations and opportunities for research in the area are.

The evidence is based on primary studies published from 2015 to 2020 in three repositories. This review provides an overarching perspective of current evidence-based practice in ERS to support practitioners and researchers in implementation and future research directions. Research limitations and opportunities are also summarized in light of current studies.

The findings, in terms of current trends, show that hybrid techniques are the most used in the teaching and learning support recommender systems field. Furthermore, approaches that naturally fit a user-centered design (e.g., techniques that allow students’ educational constraints to be represented) have been prioritized over those based on other aspects, such as item characteristics (e.g., the CBF technique). Results show that these approaches have been recognized as the main means to support users with recommendations in their teaching and learning process, and they provide directions for practitioners and researchers who seek to base their activities and investigations on evidence from current studies. On the other hand, this study also reveals that techniques featured prominently in the broader field of general recommender systems, such as bandit-based and deep learning approaches (Barraza-Urbina & Glowacka, 2020; Zhang et al., 2020), have been underexplored, implying a mismatch between the areas. Therefore, the result of this systematic review indicates that a greater scientific effort should be employed to investigate the potential of these underexplored approaches.
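For readers unfamiliar with bandit-based recommendation, the following is a minimal sketch of one of the simplest bandit strategies, epsilon-greedy, applied to a fixed set of learning resources. It is not drawn from any of the reviewed papers; the class name, the item names, and the use of a numeric reward (for instance, whether a learner completed a recommended resource) are assumptions made for illustration.

```python
import random


class EpsilonGreedyRecommender:
    """Minimal epsilon-greedy bandit over a fixed set of items."""

    def __init__(self, items, epsilon=0.1, seed=None):
        self.items = list(items)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {item: 0 for item in self.items}
        self.mean_reward = {item: 0.0 for item in self.items}

    def recommend(self):
        # Explore with probability epsilon, otherwise exploit the
        # item with the highest estimated mean reward so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        return max(self.items, key=lambda item: self.mean_reward[item])

    def update(self, item, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n.
        self.counts[item] += 1
        self.mean_reward[item] += (reward - self.mean_reward[item]) / self.counts[item]


# Hypothetical usage: reward 1.0 if the learner completed the resource.
recommender = EpsilonGreedyRecommender(["video", "quiz", "article"], epsilon=0.1, seed=42)
choice = recommender.recommend()
recommender.update(choice, reward=1.0)
```

The exploration step is what distinguishes this family from purely accuracy-driven recommenders: the system occasionally recommends items it is uncertain about in order to learn from the feedback.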

With respect to recommendation presentation, the organic display is the most used strategy. However, most studies tend not to give details of the approach used, making it difficult to understand the state of the art in this dimension. Furthermore, among other results, the majority of ERS evaluations are based on recommender accuracy and user satisfaction analysis. This finding opens a research opportunity for the scientific community: the development of multidimensional evaluation frameworks that effectively support the verification of the impact of recommendations on the teaching and learning process.

Lastly, the limitations identified indicate that difficulties in obtaining data to carry out evaluations of ERS have persisted for more than a decade (Verbert et al., 2011) and call for the scientific community’s attention. Likewise, the lack of in-depth investigation of the impact of known issues in the recommender systems field, another limitation identified, points to the importance of aspects that must be considered in the design and evaluation of these systems in order to better elucidate their potential application in a real scenario.

With regard to research limitations and opportunities, some of this study’s findings indicate the need for a greater effort in conducting evaluations that provide direct evidence of the systems’ pedagogical effectiveness, and the development of a multidimensional evaluation framework for ERS is suggested as a research opportunity. Also, a scarcity of public dataset usage was observed in current studies, which limits the reproducibility and comparison of recommenders. This seems to be related to the restricted number of public datasets currently available, and this aspect may also be affecting the size of the experiments conducted by researchers.

In terms of limitations of this study, the first refers to the number of data sources used for paper selection. Only the repositories mentioned in Section 3.1 were considered. Thus, the scope of this work is restricted to evidence from publications indexed by these platforms. Furthermore, only publications written in English were examined; thus, results of papers written in other languages are beyond the scope of this work. Also, the research limitations and opportunities presented in Section 4.5 were identified based on the data extracted to answer this SLR’s research questions and are therefore limited to their scope. As a consequence, limitations and opportunities of the ERS field that go beyond this context were neither identified nor discussed in this study. Finally, the SLR was directed to papers published in scientific journals and, because of this, the results obtained do not reflect the state of the area from the perspective of conference publications. Future research is intended to address these limitations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Table 7

Author contribution

Felipe Leite da Silva: Conceptualization, Methodology approach, Data curation, Writing – original draft. Bruna Kin Slodkowski: Data curation, Writing – original draft. Ketia Kellen Araújo da Silva: Data curation, Writing – original draft. Sílvio César Cazella: Supervision and Monitoring of the research; Writing – review & editing.

Data availability statement

Informed consent.

This research does not involve human participants as research subjects; therefore, research subject consent does not apply.

The authors consent to the content presented in the submitted manuscript.

Financial and non-financial interests

The authors have no relevant financial or non-financial interests to disclose.

Research involving human participants and/or animals

This research does not involve an experiment with human or animal participation.

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

1 http://parsif.al/

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

  • Anelli, V. W., Bellogín, A., Di Noia, T., & Pomo, C. (2021). Revisioning the comparison between neural collaborative filtering and matrix factorization. Proceedings of the Fifteenth ACM Conference on Recommender Systems, 521–529. 10.1145/3460231.3475944
  • Ashraf E, Manickam S, Karuppayah S. A comprehensive review of course recommender systems in e-learning. Journal of Educators Online. 2021; 18:23–35.
  • Barraza-Urbina, A., & Glowacka, D. (2020). Introduction to Bandits in Recommender Systems. Proceedings of the Fourteenth ACM Conference on Recommender Systems, 748–750. 10.1145/3383313.3411547
  • Becker F. Teacher epistemology: The daily life of the school. 1. Editora Vozes; 1993.
  • Beel J, Langer S, Genzmehr M. Sponsored vs. Organic (Research Paper) Recommendations and the Impact of Labeling. In: Aalberg T, Papatheodorou C, Dobreva M, Tsakonas G, Farrugia CJ, editors. Research and Advanced Technology for Digital Libraries. Springer Berlin Heidelberg; 2013. pp. 391–395.
  • Betoret F. The influence of students’ and teachers’ thinking styles on student course satisfaction and on their learning process. Educational Psychology. 2007; 27(2):219–234. doi: 10.1080/01443410601066701.
  • Bobadilla J, Serradilla F, Hernando A. Collaborative filtering adapted to recommender systems of e-learning. Knowledge-Based Systems. 2009; 22(4):261–265. doi: 10.1016/j.knosys.2009.01.008.
  • Bobadilla J, Ortega F, Hernando A, Gutiérrez A. Recommender systems survey. Knowledge-Based Systems. 2013; 46:109–132. doi: 10.1016/j.knosys.2013.03.012.
  • Buder J, Schwind C. Learning with personalized recommender systems: A psychological view. Computers in Human Behavior. 2012; 28(1):207–216. doi: 10.1016/j.chb.2011.09.002.
  • Çano, E., & Morisio, M. (2015). Characterization of public datasets for Recommender Systems. 2015 IEEE 1st International Forum on Research and Technologies for Society and Industry Leveraging a better tomorrow (RTSI), 249–257. 10.1109/RTSI.2015.7325106
  • Cazella SC, Behar PA, Schneider D, Silva KKd, Freitas R. Developing a learning objects recommender system based on competences to education: Experience report. New Perspectives in Information Systems and Technologies. 2014; 2:217–226. doi: 10.1007/978-3-319-05948-8_21.
  • Cechinel C, Sánchez-Alonso S, García-Barriocanal E. Statistical profiles of highly-rated learning objects. Computers & Education. 2011; 57(1):1255–1269. doi: 10.1016/j.compedu.2011.01.012.
  • Cechinel C, Sicilia M-Á, Sánchez-Alonso S, García-Barriocanal E. Evaluating collaborative filtering recommendations inside large learning object repositories. Information Processing & Management. 2013; 49(1):34–50. doi: 10.1016/j.ipm.2012.07.004.
  • Chen SY, Wang J-H. Individual differences and personalized learning: A review and appraisal. Universal Access in the Information Society. 2021; 20(4):833–849. doi: 10.1007/s10209-020-00753-4.
  • Cremonesi P, Garzotto F, Turrin R. User-centric vs. system-centric evaluation of recommender systems. In: Kotzé P, Marsden G, Lindgaard G, Wesson J, Winckler M, editors. Human-Computer Interaction – INTERACT 2013, 334–351. Springer Berlin Heidelberg; 2013.
  • Dacrema MF, Boglio S, Cremonesi P, Jannach D. A troubling analysis of reproducibility and progress in recommender systems research. ACM Transactions on Information Systems. 2021; 39(2):1–49. doi: 10.1145/3434185.
  • Dermeval, D., Coelho, J.A.P.d.M., & Bittencourt, I.I. (2020). Mapeamento Sistemático e Revisão Sistemática da Literatura em Informática na Educação. Metodologia de Pesquisa Científica em Informática na Educação: Abordagem Quantitativa. Porto Alegre. https://jodi-ojs-tdl.tdl.org/jodi/article/view/442
  • Drachsler H, Hummel HGK, Koper R. Identifying the goal, user model and conditions of recommender systems for formal and informal learning. Journal of Digital Information. 2009; 10(2):1–17.
  • Drachsler H, Verbert K, Santos OC, Manouselis N. Panorama of Recommender Systems to Support Learning. In: Ricci F, Rokach L, Shapira B, editors. Recommender Systems Handbook. Springer; 2015. pp. 421–451.
  • Erdt M, Fernández A, Rensing C. Evaluating recommender systems for technology enhanced learning: A quantitative survey. IEEE Transactions on Learning Technologies. 2015; 8(4):326–344. doi: 10.1109/TLT.2015.2438867.
  • Felder R. Learning and teaching styles in engineering education. Journal of Engineering Education. 1988; 78:674–681.
  • Fernandez-Garcia AJ, Rodriguez-Echeverria R, Preciado JC, Manzano JMC, Sanchez-Figueroa F. Creating a recommender system to support higher education students in the subject enrollment decision. IEEE Access. 2020; 8:189069–189088. doi: 10.1109/ACCESS.2020.3031572.
  • Ferreira, V., Vasconcelos, G., & França, R. (2017). Mapeamento Sistemático sobre Sistemas de Recomendações Educacionais. Proceedings of the XXVIII Brazilian Symposium on Computers in Education, 253–262. 10.5753/cbie.sbie.2017.253
  • Garcia-Martinez S, Hamou-Lhadj A. Educational recommender systems: A pedagogical-focused perspective. Multimedia Services in Intelligent Environments. Smart Innovation, Systems and Technologies. 2013; 25:113–124. doi: 10.1007/978-3-319-00375-7_8.
  • George G, Lal AM. Review of ontology-based recommender systems in e-learning. Computers & Education. 2019; 142:103642–103659. doi: 10.1016/j.compedu.2019.103642.
  • Harrathi M, Braham R. Recommenders in improving students’ engagement in large scale open learning. Procedia Computer Science. 2021; 192:1121–1131. doi: 10.1016/j.procs.2021.08.115.
  • Herpich F, Nunes F, Petri G, Tarouco L. How Mobile augmented reality is applied in education? A systematic literature review. Creative Education. 2019; 10:1589–1627. doi: 10.4236/ce.2019.107115.
  • Huang L, Wang C-D, Chao H-Y, Lai J-H, Yu PS. A score prediction approach for optional course recommendation via cross-user-domain collaborative filtering. IEEE Access. 2019; 7:19550–19563. doi: 10.1109/ACCESS.2019.2897979.
  • Iaquinta, L., Gemmis, M. de, Lops, P., Semeraro, G., Filannino, M., & Molino, P. (2008). Introducing serendipity in a content-based recommender system. Proceedings of the Eighth International Conference on Hybrid Intelligent Systems, 168–173. 10.1109/HIS.2008.25
  • Isinkaye FO, Folajimi YO, Ojokoh BA. Recommendation systems: Principles, methods and evaluation. Egyptian Informatics Journal. 2015; 16(3):261–273. doi: 10.1016/j.eij.2015.06.005.
  • Ismail HM, Belkhouche B, Harous S. Framework for personalized content recommendations to support informal learning in massively diverse information Wikis. IEEE Access. 2019; 7:172752–172773. doi: 10.1109/ACCESS.2019.2956284.
  • Khan KS, Kunz R, Kleijnen J, Antes G. Five steps to conducting a systematic review. Journal of the Royal Society of Medicine. 2003; 96(3):118–121. doi: 10.1258/jrsm.96.3.118.
  • Khanal SS, Prasad PWC, Alsadoon A, Maag A. A systematic review: Machine learning based recommendation systems for e-learning. Education and Information Technologies. 2019; 25(4):2635–2664. doi: 10.1007/s10639-019-10063-9.
  • Khusro S, Ali Z, Ullah I. Recommender Systems: Issues, Challenges, and Research Opportunities. In: Kim K, Joukov N, editors. Lecture Notes in Electrical Engineering. Springer; 2016. pp. 1179–1189.
  • Kitchenham, B. A., & Charters, S. (2007). Guidelines for performing Systematic Literature Reviews in Software Engineering. Technical Report EBSE 2007–001. Keele University and Durham University Joint Report. https://www.elsevier.com/data/promis_misc/525444systematicreviewsguide.pdf
  • Kitchenham B, Pearl Brereton O, Budgen D, Turner M, Bailey J, Linkman S. Systematic literature reviews in software engineering – A systematic literature review. Information and Software Technology. 2009; 51(1):7–15. doi: 10.1016/j.infsof.2008.09.009.
  • Klašnja-Milićević A, Ivanović M, Nanopoulos A. Recommender systems in e-learning environments: A survey of the state-of-the-art and possible extensions. Artificial Intelligence Review. 2015; 44(4):571–604. doi: 10.1007/s10462-015-9440-z.
  • Klašnja-Milićević A, Vesin B, Ivanović M. Social tagging strategy for enhancing e-learning experience. Computers & Education. 2018; 118:166–181. doi: 10.1016/j.compedu.2017.12.002.
  • Kolb, D., Boyatzis, R., & Mainemelis, C. (2001). Experiential Learning Theory: Previous Research and New Directions. Perspectives on Thinking, Learning and Cognitive Styles, 227–247.
  • Krahenbuhl KS. Student-centered Education and Constructivism: Challenges, Concerns, and Clarity for Teachers. The Clearing House: A Journal of Educational Strategies, Issues and Ideas. 2016; 89(3):97–105. doi: 10.1080/00098655.2016.1191311.
  • Kunaver M, Požrl T. Diversity in recommender systems – A survey. Knowledge-Based Systems. 2017; 123:154–162. doi: 10.1016/j.knosys.2017.02.009.
  • Manouselis N, Drachsler H, Vuorikari R, Hummel H, Koper R. Recommender systems in technology enhanced learning. In: Ricci F, Rokach L, Shapira B, Kantor P, editors. Recommender Systems Handbook. Springer; 2010. pp. 387–415.
  • Manouselis N, Drachsler H, Verbert K, Santos OC. Recommender systems for technology enhanced learning. Springer; 2014.
  • Manouselis, N., Drachsler, H., Verbert, K., & Duval, E. (2013). Challenges and Outlook. Recommender Systems for Learning, 63–76. 10.1007/978-1-4614-4361-2
  • Maravanyika M, Dlodlo N. An adaptive framework for recommender-based learning management systems. Open Innovations Conference (OI) 2018; 2018:203–212. doi: 10.1109/OI.2018.8535816.
  • Maria, S. A. A., Cazella, S. C., & Behar, P. A. (2019). Sistemas de Recomendação: conceitos e técnicas de aplicação. Recomendação Pedagógica em Educação a Distância, 19–47, Penso.
  • McCombs, B. L. (2013). The Learner-Centered Model: Implications for Research Approaches. In Cornelius-White, J., Motschnig-Pitrik, R. & Lux, M. (eds), Interdisciplinary Handbook of the Person-Centered Approach, 335–352. 10.1007/978-1-4614-7141-7_23
  • Medeiros RP, Ramalho GL, Falcao TP. A systematic literature review on teaching and learning introductory programming in higher education. IEEE Transactions on Education. 2019; 62(2):77–90. doi: 10.1109/te.2018.2864133.
  • Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA, PRISMA-P Group. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Systematic Reviews. 2015; 4(1):1. doi: 10.1186/2046-4053-4-1.
  • Moubayed A, Injadat M, Nassif AB, Lutfiyya H, Shami A. E-Learning: Challenges and research opportunities using machine learning & data analytics. IEEE Access. 2018; 6:39117–39138. doi: 10.1109/access.2018.2851790.
  • Nabizadeh AH, Gonçalves D, Gama S, Jorge J, Rafsanjani HN. Adaptive learning path recommender approach using auxiliary learning objects. Computers & Education. 2020; 147:103777–103793. doi: 10.1016/j.compedu.2019.103777.
  • Nafea SM, Siewe F, He Y. On Recommendation of learning objects using Felder-Silverman learning style model. IEEE Access. 2019; 7:163034–163048. doi: 10.1109/ACCESS.2019.2935417.
  • Nascimento PD, Barreto R, Primo T, Gusmão T, Oliveira E. Recomendação de Objetos de Aprendizagem baseada em Modelos de Estilos de Aprendizagem: Uma Revisão Sistemática da Literatura. Proceedings of XXVIII Brazilian Symposium on Computers in Education- SBIE. 2017; 2017:213–222. doi: 10.5753/cbie.sbie.2017.213.
  • Nguyen QH, Ly H-B, Ho LS, Al-Ansari N, Le HV, Tran VQ, Prakash I, Pham BT. Influence of data splitting on performance of machine learning models in prediction of shear strength of soil. Mathematical Problems in Engineering. 2021; 2021:1–15. doi: 10.1155/2021/4832864.
  • Nichols, D. M. (1998). Implicit rating and filtering. Proceedings of the Fifth Delos Workshop: Filtering and Collaborative Filtering, 31–36.
  • Okoye, I., Maull, K., Foster, J., & Sumner, T. (2012). Educational recommendation in an informal intentional learning system. Educational Recommender Systems and Technologies, 1–23. 10.4018/978-1-61350-489-5.ch001
  • Pai M, McCulloch M, Gorman JD, Pai N, Enanoria W, Kennedy G, Tharyan P, Colford JM Jr. Systematic reviews and meta-analyses: An illustrated, step-by-step guide. The National Medical Journal of India. 2004; 17(2):86–95.
  • Petri G, Gresse von Wangenheim C. How games for computing education are evaluated? A systematic literature review. Computers & Education. 2017; 107:68–90. doi: 10.1016/j.compedu.2017.01.00.
  • Petticrew M, Roberts H. Systematic reviews in the social sciences: A practical guide. Blackwell Publishing; 2006. doi: 10.1002/9780470754887.
  • Pinho PCR, Barwaldt R, Espindola D, Torres M, Pias M, Topin L, Borba A, Oliveira M. Developments in educational recommendation systems: a systematic review. Proceedings of 2019 IEEE Frontiers in Education Conference (FIE). 2019. doi: 10.1109/FIE43999.2019.9028466.
  • Pöntinen S, Dillon P, Väisänen P. Student teachers’ discourse about digital technologies and transitions between formal and informal learning contexts. Education and Information Technologies. 2017; 22(1):317–335. doi: 10.1007/s10639-015-9450-0.
  • Pu, P., Chen, L., & Hu, R. (2011). A user-centric evaluation framework for recommender systems. Proceedings of the fifth ACM conference on Recommender systems, 157–164. 10.1145/2043932.2043962
  • Rahman MM, Abdullah NA. A personalized group-based recommendation approach for web search in E-Learning. IEEE Access. 2018; 6:34166–34178. doi: 10.1109/ACCESS.2018.2850376.
  • Ricci, F., Rokach, L., & Shapira, B. (2015). Recommender Systems: Introduction and Challenges. In Ricci, F., Rokach, L., Shapira, B. (eds), Recommender Systems Handbook, 1–34. 10.1007/978-1-4899-7637-6_1
  • Rivera, A. C., Tapia-Leon, M., & Lujan-Mora, S. (2018). Recommendation Systems in Education: A Systematic Mapping Study. Proceedings of the International Conference on Information Technology & Systems (ICITS 2018), 937–947. 10.1007/978-3-319-73450-7_89
  • Salazar C, Aguilar J, Monsalve-Pulido J, Montoya E. Affective recommender systems in the educational field. A systematic literature review. Computer Science Review. 2021; 40:100377. doi: 10.1016/j.cosrev.2021.100377.
  • Santos IM, Ali N. Exploring the uses of mobile phones to support informal learning. Education and Information Technologies. 2012; 17(2):187–203. doi: 10.1007/s10639-011-9151-2.
  • Sergis S, Sampson DG. Learning object recommendations for teachers based on elicited ICT competence profiles. IEEE Transactions on Learning Technologies. 2016; 9(1):67–80. doi: 10.1109/TLT.2015.2434824.
  • Shani G, Gunawardana A. Evaluating recommendation systems. In: Ricci F, Rokach L, Shapira B, Kantor P, editors. Recommender Systems Handbook. Springer; 2010. pp. 257–297.
  • Tahereh, M., Maryam, T. M., Mahdiyeh, M., & Mahmood, K. (2013). Multi dimensional framework for qualitative evaluation in e-learning. 4th International Conference on e-Learning and e-Teaching (ICELET 2013), 69–75. 10.1109/icelet.2013.6681648
  • Tarus JK, Niu Z, Yousif A. A hybrid knowledge-based recommender system for e-learning based on ontology and sequential pattern mining. Future Generation Computer Systems. 2017; 72:37–48. doi: 10.1016/j.future.2017.02.049.
  • Tarus JK, Niu Z, Mustafa G. Knowledge-based recommendation: A review of ontology-based recommender systems for e-learning. Artificial Intelligence Review. 2018; 50(1):21–48. doi: 10.1007/s10462-017-9539-5.
  • Verbert K, Manouselis N, Ochoa X, Wolpers M, Drachsler H, Bosnic I, Duval E. Context-aware recommender systems for learning: A survey and future challenges. IEEE Transactions on Learning Technologies. 2012; 5(4):318–335. doi: 10.1109/TLT.2012.11.
  • Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-Driven Research for Improving Recommender Systems for Learning. Proceedings of the 1st International Conference on Learning Analytics and Knowledge, 44–53. 10.1145/2090116.2090122
  • Wan S, Niu Z. A learner oriented learning recommendation approach based on mixed concept mapping and immune algorithm. Knowledge-Based Systems. 2016; 103:28–40. doi: 10.1016/j.knosys.2016.03.022.
  • Wan S, Niu Z. An e-learning recommendation approach based on the self-organization of learning resource. Knowledge-Based Systems. 2018; 160:71–87. doi: 10.1016/j.knosys.2018.06.014.
  • Wan S, Niu Z. A hybrid E-Learning recommendation approach based on learners’ influence propagation. IEEE Transactions on Knowledge and Data Engineering. 2020; 32(5):827–840. doi: 10.1109/TKDE.2019.2895033.
  • Watkins KE, Marsick VJ. Informal and incidental learning in the time of COVID-19. Advances in Developing Human Resources. 2020; 23(1):88–96. doi: 10.1177/1523422320973656.
  • Wu D, Lu J, Zhang G. A Fuzzy Tree Matching-based personalized E-Learning recommender system. IEEE Transactions on Fuzzy Systems. 2015; 23(6):2412–2426. doi: 10.1109/TFUZZ.2015.2426201.
  • Wu Z, Li M, Tang Y, Liang Q. Exercise recommendation based on knowledge concept prediction. Knowledge-Based Systems. 2020; 210:106481–106492. doi: 10.1016/j.knosys.2020.106481.
  • Yanes N, Mostafa AM, Ezz M, Almuayqil SN. A machine learning-based recommender system for improving students learning experiences. IEEE Access. 2020; 8:201218–201235. doi: 10.1109/ACCESS.2020.3036336.
  • Zapata A, Menéndez VH, Prieto ME, Romero C. Evaluation and selection of group recommendation strategies for collaborative searching of learning objects. International Journal of Human-Computer Studies. 2015; 76:22–39. doi: 10.1016/j.ijhcs.2014.12.002.
  • Zhang S, Yao L, Sun A, Tay Y. Deep learning based recommender system. ACM Computing Surveys. 2020; 52(1):1–38. doi: 10.1145/3285029.
  • Zhong J, Xie H, Wang FL. The research trends in recommender systems for e-learning: A systematic review of SSCI journal articles from 2014 to 2018. Asian Association of Open Universities Journal. 2019; 14(1):12–27. doi: 10.1108/AAOUJ-03-2019-0015.

Fairness in recommender systems: research landscape and future directions

  • Open access
  • Published: 24 April 2023
  • Volume 34, pages 59–108 (2024)

  • Yashar Deldjoo 1 ,
  • Dietmar Jannach 2 ,
  • Alejandro Bellogin 3 ,
  • Alessandro Difonzo 1 &
  • Dario Zanzonelli 1  

Recommender systems can strongly influence which information we see online, e.g., on social media, and thus impact our beliefs, decisions, and actions. At the same time, these systems can create substantial business value for different stakeholders. Given the growing potential impact of such AI-based systems on individuals, organizations, and society, questions of fairness have gained increased attention in recent years. However, research on fairness in recommender systems is still a developing area. In this survey, we first review the fundamental concepts and notions of fairness that were put forward in the area in the recent past. Afterward, through a review of more than 160 scholarly publications, we present an overview of how research in this field is currently operationalized, e.g., in terms of general research methodology, fairness measures, and algorithmic approaches. Overall, our analysis of recent works points to certain research gaps. In particular, we find that in many research works in computer science, very abstract problem operationalizations are prevalent and questions of the underlying normative claims and what represents a fair recommendation in the context of a given application are often not discussed in depth. These observations call for more interdisciplinary research to address fairness in recommendation in a more comprehensive and impactful manner.

1 Introduction

Recommender systems (RS) are one of the most visible and successful applications of AI technology in practice, and personalized recommendations—as provided on many modern e-commerce or media sites—can have a substantial impact on different stakeholders. On e-commerce sites, for example, the choices of consumers can be largely influenced by recommendations, and these choices are often directly related to the profitability of the platform. On news websites or social media, on the other hand, personalized recommendations may determine to a large extent which information we see, which in turn may shape not only our own beliefs, decisions, and actions but also the beliefs of a community of users or an entire society.

In academia, recommenders have historically been considered “benevolent” systems that create value for consumers, e.g., by helping them find relevant items, with the assumption that this value for consumers then translates to value for businesses, e.g., due to higher sales numbers or increased customer retention (Jannach and Jugovac 2019). Only in recent years has awareness grown of possible negative effects of automated recommendations, e.g., that they may promote items on an e-commerce site that mainly maximize the profit of providers or that they may lead to an increased spread of misinformation on social media.

Given the potentially significant effects of recommendations on different stakeholders, researchers increasingly argue that providing recommendations may raise various ethical questions and should thus be done in a responsible way (Ntoutsi et al. 2020 ; Trattner et al. 2022 ). One important ethical question in this context is that of the fairness of a recommender system, see (Burke 2017 ; Ekstrand et al. 2022 ), reflecting related discussions on the more general level of  fair machine learning and  fair AI (Mehrabi et al. 2021 ; Barocas et al. 2019 ; Ntoutsi et al. 2020 ).

During the last few years, researchers have discussed and analyzed different dimensions in which a recommender system should be fair or, conversely, may be unfair.

Given the nature of fairness as a social construct, it seems, however, difficult or even impossible (Ekstrand et al. 2022) to establish a general definition of what represents a fair recommendation. In addition to the subjectivity of fairness, there are frequently competing stakeholder interests to account for in real-world recommendation contexts (Naghiaei et al. 2022; Abdollahpouri et al. 2020a).

With this survey, we aim to provide an overview of what has been achieved in this emerging area so far and highlight potential research gaps. Specifically, drawing on an analysis of more than 150 recent papers in computer science, we investigate (i) which dimensions and definitions of fairness in RS have been identified and established, (ii) which application scenarios researchers target and which examples they provide, and (iii) how they operationalize the research problem in terms of methodology, algorithms, and metrics. Based on this analysis, we then paint a landscape of current research in various dimensions and discuss potential shortcomings and future directions for research in this area.

Overall, we find that research in computing typically assumes that a clear definition of fairness is available, thus rendering the problem as one of designing algorithms to optimize a given metric. Such an approach may however appear too abstract and simplistic, cf. Selbst et al. ( 2019 ), calling for more faceted and multi-disciplinary approaches to research in fairness-aware recommendation.

The paper is organized as follows. Next, in Sect.  2 , we lay out the motivation behind this survey in more detail, and we present the essential notions used to characterize fairness in the literature. Section  3 then presents our methodology to identify and categorize relevant research works. Section  4 represents the main part of our study, which paints the current research landscape of fairness in recommender systems in various dimensions, e.g., in terms of the addressed fairness problem and the chosen research methodology. In Sect.  5 , we then reflect on these observations and identify open challenges and possible future research directions.

2 Background and foundations

2.1 Examples of unfair recommendations

In the general literature on Fair ML/AI, an important application case is the automated prediction of recidivism by convicted criminals. In this case, an ML-based system is usually considered unfair if its predictions depend on demographic aspects like ethnicity and when it then ultimately discriminates against members of certain ethnic groups (Angwin et al. 2016). In the context of our present work, such use cases of ML-based decision-support systems are not in focus. Instead, we focus on common application areas of RS where personalized item suggestions are made to users, e.g., in e-commerce, media streaming, or news and social media sites.

At first sight, one might think that the recommendation providers here are independent businesses and it is entirely at their discretion which shopping items, movies, jobs, or social connections they recommend on their platforms. Also, one might assume that the harm that is caused by such recommendations is limited, compared, e.g., to the legal decision problem mentioned above. There are, however, several situations also in typical application scenarios of RS where many people might think a system is unfair in some sense. For example, an e-commerce platform might be considered unfair if it mainly promotes those shopping items that maximize its own profit but not consumer utility. Besides such intentional interventions, there might also be situations where an RS reinforces existing discrimination patterns or biases in the data, e.g., when a system on an employment platform mainly recommends lower-paid jobs to certain demographic groups.

Nonetheless, questions of fairness in RS extend beyond the consumer’s perspective. In reality, a recommendation service often involves multiple stakeholders (Abdollahpouri et al. 2020a ). On music streaming platforms, for example, we have not only the consumers but also the artists, record labels, and the platform itself, which might have diverging goals that may be affected by the recommendation service. Artists and labels are usually interested in increasing their visibility through recommendations. On the other hand, platform providers might seek to maximize engagement with the service across the entire user base, which might result in promoting mostly already popular artists and tracks with the recommendations. Such a strategy, however, easily leads to a “rich-get-richer” effect and reduces the chances of less popular artists being exposed to consumers, which might be considered  unfair to providers . Finally, there are also use cases where recommendations may have  societal impact, particularly on news and social media sites. Some may consider it unfair if a recommender system only promotes content that emphasizes one side of a political discussion or promotes misinformation that is suitable to discriminate against certain user groups.

As we will see later, different notions of fairness exist in the literature. What is important, however, is that in any discussed scenario, there are certain ethical questions or principles which are put at stake, and these are usually related to some underlying normative claims (Srivastava et al. 2019 ; Cooper 2020 ). Our research, however, indicates that these normative claims are often not unpacked and discussed to a sufficient extent in today’s research on fairness in recommender systems. For instance, it may be argued that the issue with an e-commerce site optimizing for profit is not that it does so, but rather that it does so while misleading people into believing that recommendations are tailored to their needs. In situations such as this, the distinction between unfair and deceptive business activities can easily get blurred.

We note here that being fair to consumers or society in the examples discussed above may, in turn, also benefit service providers, e.g., when consumers establish long-term trust due to valuable recommendations or when they engage more with a music service when they discover more niche content. Finally, there are also legal guardrails that may come into play, e.g., when a large platform uses a monopoly-like market position to put certain providers inappropriately into bad positions. The current draft of the European Commission’s Digital Services Act Footnote 1 can be seen as a prime example where recommender systems and their potential harms are explicitly addressed in legal regulations, as it “calls for more fairness, transparency and accountability for digital services’ content moderation processes, ensuring that fundamental rights are respected, and guaranteeing independent recourse to judicial redress.”

Overall, several examples exist where recommendations might be considered unfair for different stakeholders. In the context of the survey presented in this work, we are particularly interested in which specific  real-world problems related to unfair recommendations are considered in the existing literature.

2.2 Reasons for unfairness

There are different reasons why a recommender system might exhibit behavior that may be considered unfair. For example, in Ekstrand et al. (2022), the authors report that unfairness can arise in many places: in society, in the observations that form our data, and in the construction, evaluation, and application of decision support models. Similarly, in Ashokan and Haas (2021), the authors classify the biases in a computing system as pre-existing bias, technical bias, and emergent bias, whereas in Olteanu et al. (2019) the authors differentiate between issues introduced when collecting social data (in general, not focused on recommender systems), issues introduced while processing such data, pitfalls that occur when analyzing data, and issues with the evaluation and interpretation of the findings. Herein, our discussions are based on insights from these and other earlier works, aiming to summarize and highlight the main causes of unfairness reported in the literature.

One first common issue mentioned in the literature is that the data on which the machine learning model is trained is biased  (Chen et al. 2022 ; Olteanu et al. 2019 ). Such biases might, for example, result from the specifics of the data collection process, e.g., when a biased sampling strategy is applied. A machine learning model may then “pick up” such a bias and reflect it in the resulting recommendations.

Another source of unfairness may lie in the machine learning model itself, e.g., when it reinforces existing biases or skewed distributions in the underlying data. Differences between recommendation algorithms in terms of reinforcing popularity biases and concentration effects were, for example, examined in Jannach et al. ( 2015 ). In some cases, the machine learning model might also directly consider a “protected characteristic” (or a proxy thereof) in its predictions (Ekstrand et al. 2022 ). To avoid discrimination, and thus unfair treatment, of certain groups, a machine learning model should therefore not make use of protected characteristics such as age, color, or religion (fairness through unawareness) (Grgic-Hlaca et al. 2016 ). Despite its appealing simplicity, this definition has a clear issue, as sensitive characteristics may have historically affected non-sensitive characteristics (e.g., a person’s GPA may have been influenced by their socioeconomic status). In order to adjust for biases in data collection or historical outcomes, it has been argued that, in fact, protected characteristics must be taken into account to place other observable features in context (Kusner et al. 2017 ).

Unfairness that is induced by the underlying data or algorithms may arise unknowingly to the recommendation provider. It is, however, also possible that a certain level of unfairness is designed into a recommendation algorithm, e.g., when a recommendation provider aims to maximize monetary business metrics while simultaneously keeping users satisfied as much as possible (Ghanem et al. 2022 ; Jannach and Adomavicius 2017 ). Likewise, a recommendation provider may have a political agenda and particularly promote the distribution of information that mainly supports their own viewpoints.

Some works finally mention that the “world itself may be unfair or unjust” (Ekstrand et al. 2022), e.g., due to historical discrimination of certain groups. In the context of algorithmic fairness, which is the topic of our present work, such historical developments are, however, often not in focus, even though the real reason certain characteristics are regarded as protected is historical discrimination or subordination, where redress is necessary. Rather, the question is to what extent this is reflected in the data or how this unfairness influences the fairness goals.

In general, the underlying reasons also determine where in a machine learning pipeline Footnote 2 interventions can or should be made to ensure fairness (or to mitigate unfairness). In a common categorization (Mehrabi et al. 2021; Shrestha and Yang 2019; Pitoura et al. 2022; Zehlike et al. 2022a), this could be achieved (i) in a data pre-processing phase, (ii) during model learning and optimization, and (iii) in a post-processing phase. In particular, in the model learning and post-processing phase, fairness-ensuring algorithmic interventions must be guided by an operationalizable (i.e., mathematically expressed) goal. In the case of affirmative action policies, one could, for example, aim to have an equal distribution of recommendations of members of the majority group and members of an underrepresented group. As we will see in Sect. 4, such a goal is often formalized as a target distribution and/or as an evaluation metric to gauge the level of existing or mitigated unfairness.
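To make the idea of an operationalizable goal more concrete, the following minimal sketch compares the share of top-k recommendation slots received by two groups with a target distribution. The group labels, the 50/50 target, the toy ranking, and all function names are assumptions made purely for illustration; they are not taken from any of the surveyed papers, and total variation distance is only one of many possible ways to measure the deviation.

```python
from collections import Counter

def group_shares(ranked_items, item_group, k):
    """Fraction of the top-k recommendation slots occupied by each group."""
    top_k = ranked_items[:k]
    if not top_k:
        raise ValueError("top-k list is empty")
    counts = Counter(item_group[item] for item in top_k)
    all_groups = set(item_group.values())
    # Counter returns 0 for groups that do not appear in the top-k list.
    return {group: counts[group] / len(top_k) for group in all_groups}

def total_variation(shares, target):
    """Total variation distance between observed shares and a target distribution."""
    groups = set(shares) | set(target)
    return 0.5 * sum(abs(shares.get(g, 0.0) - target.get(g, 0.0)) for g in groups)

# Hypothetical toy data: five items, each belonging to one of two groups.
item_group = {"a": "majority", "b": "majority", "c": "majority",
              "d": "minority", "e": "minority"}
ranking = ["a", "b", "d", "c", "e"]

# Assumed target: both groups should receive half of the recommendation slots.
target = {"majority": 0.5, "minority": 0.5}

shares = group_shares(ranking, item_group, k=4)
gap = total_variation(shares, target)
print(shares, gap)
```

In this toy ranking, three of the four top slots go to the majority group, so the observed shares deviate from the assumed equal target. A post-processing intervention could, for instance, re-rank the list until this deviation falls below a chosen threshold.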

2.3 Notions of fairness

When dealing with phenomena of unfairness such as those outlined, and when our purpose is to prevent or mitigate such phenomena, a question arises: what do we consider fair in general and in a particular application context? Fairness, in general, is fundamentally a societal construct or a  human value , which has been discussed for centuries in many disciplines like philosophy and moral ethics, sociology, law, or economics. Correspondingly, countless definitions of fairness were proposed in different contexts, see for example (Verma and Rubin 2018 ; Verma et al. 2020 ) for a high-level discussion of the definition of fairness in machine learning and ranking algorithms, or Mulligan et al. ( 2019 ) for the relationship to social science conception of fairness. As we will see in the remainder of this survey, fairness is a complex concept with multiple perspectives. Consequently, there are numerous definitions, but none of them appear to be exhaustive.

In general, the societal constructs around fairness mainly depend on how moral standards or dilemmas are addressed: either through descriptive or normative approaches (Srivastava et al. 2019). While normative ethics involves creating or evaluating moral standards to decide what people should do or whether their current moral behavior is reasonable, descriptive (or comparative) ethics is a form of empirical research into the attitudes of individuals or groups of people towards morality and moral decision-making. As mentioned above, normative claims are often not explicitly specified in existing research, both in general machine learning and in recommender systems research. In fact, it was already recommended in earlier research to make these assumptions more explicit (Cooper 2020). From our study of the literature, we observe that a majority of the works did not clarify what the actual normative claim being addressed is or who is representing or making such claims.

As a possible consequence of this problem, we also observe that researchers, in most cases, do not refer to a specific public discussion of the topic at hand. For many papers on recommender systems, there is, for example, no indication or evidence that there is a public debate outside computer science, e.g., whether or not it is fair to recommend niche movies. Nonetheless, it is true that there actually are areas, like job recommendation, where a public discussion takes place, e.g., about discrimination and what normative claims are agreed to be addressed.

The primary notions of fairness that will be used throughout this review, as extracted from the aforementioned literature and recent surveys (Li et al. 2022; Wang et al. 2022b), are presented next and further expanded in Sect. 4.6. We emphasize that these definitions present a specific perspective on defining the concept of fairness. They are, however, neither necessarily orthogonal nor all-encompassing. Table 1 shows examples of fictitious statements of a user regarding unfairness in a job recommendation scenario under different notions of fairness.

Group vs. individual: Individual fairness roughly expresses that similar individuals should be treated similarly, e.g., candidates with similar qualifications should be ranked similarly in a job recommendation scenario. Group fairness, in contrast, aims to ensure that “different groups have similar experience” (Ekstrand et al. 2022), i.e., protected groups receive similar benefits from the decision-making as others. Typical groups in such a context are a majority or dominant group and a protected group (e.g., an ethnic minority). Since this may be too simplistic, other authors state “we are all equal” as the fundamental logic underlying group fairness (Friedler et al. 2021), asserting the equivalence of groups as a starting point.

Process vs. outcome: Process (or: treatment) unfairness means that individuals with similar non-sensitive attributes receive different outcomes solely due to the difference in sensitive features. Outcome (or: impact) unfairness occurs when a system produces outputs that benefit (harm) a group of individuals sharing a sensitive attribute value more frequently than other groups (Zafar et al. 2017). Put differently, process fairness assesses aspects such as the data used, the decision-making principles of the system, and the causal association between inputs and outputs. In contrast, outcome fairness disregards the internal operation of the system and concentrates solely on the equitable distribution of rewards (Amigó et al. 2023).

Direct vs. indirect: Fairness can also be analyzed based on whether particular sensitive feature holders are directly harmed or not (Council et al. 2004). Direct unfairness refers to situations in which persons receive less favorable treatment based on protected characteristics such as race, religion, or gender. When the reasons for the discrimination are only tenuously connected to (and not identical to) the protected characteristic, we have indirect unfairness. Footnote 3 For example, some institutions use the location of candidates as a proxy for an overtly discriminating characteristic (e.g., race) (Zhang and Bareinboim 2018).

Statistical vs. predictive parity : In machine learning, fairness definitions fundamentally seek some sort of equity on various portions of the  confusion matrix used for binary classification evaluation. Statistical parity is independent of the actual value and requires protected group members to have an equal positive prediction rate. Predictive parity employs the actual outcome and requires that the model’s precision (or accuracy) is comparable for all subgroups under consideration.

Static vs. dynamic : In static fairness, the recommendation environment is fixed during the recommendation process; hence, the user activity level is assumed to remain unchanged. Dynamic fairness definitions, on the other hand, integrate the (typical) dynamic attribute of most recommender systems, which needs to consider new user interactions, new items, or continually evolving user groups.

Associative vs. causal: Associative fairness metrics are computed based on data and do not allow reasoning about the causal relations between the features and the decisions. Causal fairness definitions, on the other hand, are usually defined in terms of (non-observable) interventions and counterfactuals and tend to consider the additional structural knowledge of the system regarding how variables propagate on a causal model (Li et al. 2022).
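As a hedged illustration of the statistical vs. predictive parity distinction listed above, the sketch below computes, for each group, the positive prediction rate (which ignores the actual outcome) and the precision (which uses it). The toy labels, the group names "A" and "B", and the function name are assumptions made for this example only; precision is used here as one of the two options (precision or accuracy) mentioned in the text.

```python
def group_metrics(y_true, y_pred, groups):
    """Return {group: (positive_rate, precision)} for binary labels and predictions."""
    metrics = {}
    for g in sorted(set(groups)):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        preds = [y_pred[i] for i in idx]
        # Statistical parity only looks at predictions: share of positive predictions.
        positive_rate = sum(preds) / len(preds)
        # Predictive parity also uses the actual outcome: precision among predicted positives.
        hits = [y_true[i] for i in idx if y_pred[i] == 1]
        precision = sum(hits) / len(hits) if hits else None
        metrics[g] = (positive_rate, precision)
    return metrics

# Hypothetical toy data: four individuals in group "A", four in group "B".
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 0, 1, 0, 1, 1, 0, 0]   # actual outcomes
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]   # model predictions

metrics = group_metrics(y_true, y_pred, groups)
statistical_parity_gap = abs(metrics["A"][0] - metrics["B"][0])
predictive_parity_gap = abs(metrics["A"][1] - metrics["B"][1])
print(metrics, statistical_parity_gap, predictive_parity_gap)
```

In this toy example the two gaps differ, so a system can deviate from the two notions to different degrees. Which gap matters, and how large a gap is tolerable, depends on the normative claim behind the application.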

Other categorizations can be found in the literature, based on  short-term vs. long-term considerations (according to the duration of the fairness requirements),  granularity (whether the system applies the same fairness notion to everyone or if users could decide how they want to be treated by the system),  transparency (to discriminate notions that are explainable from those that are a black box), or  depending on the associated fairness concept (such as consistent, calibrated, counterfactual, Rawlsian maximin, envy-free, and maximin-shared) (Li et al. 2022 ; Wang et al. 2022b ; Amigó et al. 2023 ). An in-depth discussion of these—sometimes even incompatible (Verma and Rubin 2018 ; Amigó et al. 2023 )—notions of fairness is beyond the scope of this work, which focuses on an analysis of how scholars in recommender systems operationalize the research problem. For questions of individual fairness, this might relate to the problem of defining a similarity function. For certain group fairness goals, on the other hand, one has to determine which are the (protected) attributes that determine group membership. Furthermore, it is often required to define/indicate precisely some target distributions . Later, in Sect.  4 , where we review the current literature, we will introduce additional notions of fairness and their operationalizations as they are found in the studied papers. As we will see, a key point here is that researchers often propose to use very abstract operationalizations (e.g., in the form of fairness metrics), which was identified earlier as a potential key problem in the broader area of fair ML in Selbst et al. ( 2019 ).

2.4 Related concepts: responsible recommendation and biases

Issues of fairness are often discussed within the broader area of responsible recommendation (Elahi et al. 2022; Ekstrand et al. 2022; Di Noia et al. 2022), with the key dimensions of generalizability, robustness (Deldjoo et al. 2020, 2021c), privacy (Anelli et al. 2021; Friedman et al. 2015), interpretability (Tintarev and Masthoff 2022; Deldjoo et al. 2023), and fairness, with the definitions of these concepts blurring as we progress through the list. In Elahi et al. (2022), the authors, in particular, discuss the potential negative effects of recommendations and their underlying reasons with a focus on the media domain. Specific phenomena in this domain include the emergence of filter bubbles and echo chambers. There are, however, also other more general potential harms such as popularity biases as well as fairness-related aspects like discrimination that can emerge in media recommendation settings, for example, when one gender or race is treated differently just based on this attribute, as when suggesting images for a specific profession. Fairness is therefore seen as a particular aspect of responsible recommendation in Elahi et al. (2022). A similar view is taken in Ekstrand et al. (2022), where the authors review a number of related concerns of responsibility: accountability, transparency, safety, privacy, and ethics. In the context of our present work, most of these concepts are however only of secondary interest.

More important, however, is the use of the term bias in the related literature. As discussed above, one frequently discussed topic in the area of recommender systems is the problem of biased data (Chen et al. 2022; Baeza-Yates 2018). One issue in this context is that the data that is collected from existing websites (e.g., regarding which content visitors view or what consumers purchase) may in part be the result of an already existing recommender system and, hence, biased by what is shown to users. This, in turn, then may lead to biased recommendations when machine learning models reflect or reinforce the bias, as mentioned above. In works that address this problem, the term bias is often used in a more statistical sense, as done in Ekstrand et al. (2022). However, the use of the term is inconsistent in the literature, as observed both in Chen et al. (2022) and in our work. In some early papers, bias is used almost synonymously with fairness. In Friedman and Nissenbaum (1996), for example, bias is used to “refer to computer systems that systematically and unfairly discriminate against certain individuals or groups of individuals in favor of others”. In our work, we acknowledge that biased recommendations may be unfair, but we do not generally equate bias with unfairness. Considering the problem of popularity bias in recommender systems, such a bias may lead to an over-proportional exposure of certain items to users. This, however, does not necessarily lead to unfairness in an ethical or legal sense. Instead, it all depends on the underlying ethical principles and normative claims, as discussed before. Moreover, an in-depth discussion and systematic comparison of various forms of biases is beyond the scope of our work; we instead refer the reader to Chen et al. (2022), where different forms of biases are discussed in more depth.

3 Research methodology

In this section, we first describe our methodology for identifying relevant papers for our survey. Afterward, we briefly discuss how our survey extends previous works in this area.

3.1 Paper collection process

We adopted a mixed and semi-systematic approach to identify relevant research papers. Footnote 4 In the first step, we identified relevant research papers by querying the DBLP Footnote 5 digital library with predefined search terms and a set of explicit criteria for inclusion and exclusion. Afterwards, to include relevant papers which did not match the search terms in this still-evolving field, we (a) applied a snow-balling procedure and (b) relied on researcher experience to identify other relevant papers that were published in focused outlets.

Based on our prior knowledge about the literature, we used the following search terms in order to cover a wide range of works in an emerging area, where terminology is not yet entirely unified: fair recommend , fair collaborative system , fair collaborative filtering , bias recommend , debias recommend , fair ranking , bias ranking , unbias ranking , re-ranking recommend , reranking recommend . To identify papers, we queried DBLP in its respective search syntax, stating that the provided keywords must appear in the title of the paper.
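The title-based keyword matching described above can be approximated locally with a few lines of code. The sketch below is an illustration only: it does not reproduce DBLP's actual search syntax or interface, and it assumes that each query term is matched as a prefix of some word in the title, that all terms of a query must match, and that titles have already been downloaded. The function names and example titles are hypothetical.

```python
import re

# Search terms as listed in the text.
SEARCH_TERMS = [
    "fair recommend", "fair collaborative system", "fair collaborative filtering",
    "bias recommend", "debias recommend", "fair ranking", "bias ranking",
    "unbias ranking", "re-ranking recommend", "reranking recommend",
]

def title_words(title):
    """Lower-case the title and split it into words (hyphens are kept inside words)."""
    return re.findall(r"[a-z0-9\-]+", title.lower())

def matches_query(title, query):
    """True if every query term is a prefix of at least one title word (assumed semantics)."""
    words = title_words(title)
    return all(any(word.startswith(term) for word in words)
               for term in query.lower().split())

def matches_any(title, queries=SEARCH_TERMS):
    """True if the title matches at least one of the search queries."""
    return any(matches_query(title, query) for query in queries)

# Hypothetical example titles, not taken from the surveyed literature.
titles = [
    "Fairness-Aware Recommendation with Re-ranking",
    "Debiasing Recommender Systems",
    "Deep Learning for Image Classification",
]
selected = [t for t in titles if matches_any(t)]
print(selected)
```

Such a filter only yields candidates. The manual screening against the inclusion criteria described next remains necessary.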

From the returned results, we then removed all papers that were published only as preprints on arXiv.org Footnote 6 and we removed survey papers. We then manually scanned the remaining 268 papers. In order to be included in this survey, a paper had to fulfill the following additional criteria:

It had to be explicitly about fairness , at least by mentioning this concept somewhere in the paper. Papers which, for example, focus on mitigating popularity biases, but which do not mention that fairness is an underlying goal of their work, were thus not considered.

It had to be about recommender systems . Given the inclusiveness of our set of query terms, a number of papers were returned which focused on fair information retrieval. Such works were also excluded from our study.

This process left us with 157 papers. The papers were read by at least two researchers and categorized in various dimensions, see Sect.  4 .

3.2 Relation to previous surveys

A number of related surveys were published in the last few years. The survey provided by Chen et al. ( 2022 ) focuses on biases in recommender systems, and connects different types of biases, e.g., popularity biases, with questions of fairness, see also (Abdollahpouri et al. 2020b ). Note that bias mitigation in recommendation mostly focuses on increasing the accuracy or robustness of the recommendations through debiasing approaches, rather than on promoting fairness.

The recent monograph by Ekstrand et al. ( 2022 ) discusses fairness aspects in the broader context of information access systems, an area that covers both information retrieval and recommender systems. Their comprehensive work in particular includes a taxonomy of various fairness dimensions, which also serves as a foundation of our present work. This study differs from our work in that our objective is not to give a fresh classification of fairness concepts and methods found in the literature. Instead, our main objective is to investigate the current state of existing research, e.g., in terms of which concepts and algorithmic approaches are predominantly investigated and where there might be research gaps. Ekstrand et al., on the other hand, focus more generally on future directions in this area.

Different survey papers were published also in the more general area of fair machine learning or fair AI, as mentioned above (Mehrabi et al. 2021 ; Barocas et al. 2019 ). Clearly, many questions and principles of fair AI apply also to recommender systems, which can be seen as a highly successful area of applied machine learning. Differently from such more general works, however, our present work focuses on the particularities of fairness in recommender systems.

Very recently, while we conducted our research, a number of alternative surveys on fairness in recommender systems have become available as preprints or peer-reviewed publications, including Pitoura et al. (2022), Zehlike et al. (2022b), Wang et al. (2022b), and Li et al. (2022). Clearly, there is a certain overlap of our survey and these recent publications, e.g., in terms of the used taxonomy of fairness-related aspects. Note, however, that unlike some of these papers, e.g., Li et al. (2022), Pitoura et al. (2022), our aim is not to establish a new taxonomy or to discuss the technical details of specific computational metrics or algorithmic approaches that were proposed in the past literature. Instead, our aim is to paint a landscape of existing research and to thereby identify potential research gaps. In that context, our work has similarities with the work by Wang et al. (2022b), who reviewed and categorized 60 recent works on fairness in recommender systems. While our survey involves a larger number of papers, Wang et al. dive deeper into the technicalities of particular approaches, which is not the focus of our work. Here, in contrast, we aim to paint a broader picture of today’s research activities and existing gaps without entering into the technical specifics of existing approaches. Moreover, our work also places more emphasis on evaluation aspects and on potential methodological issues in this research area. The recent work by Zehlike et al. (2022b), finally, mainly discusses individual research works in detail, also including more general ones on learning-to-rank. The overlap with this work, except for the discussion of different dimensions of fairness, is therefore limited.

In general, the goal of these existing works is mainly to review and synthesize the various existing approaches so far to design fair recommender systems and to evaluate them. The goal of our work is indeed different, as we aim to analyze and quantify which notions of fairness the research community is working on and how the research problem is operationalized. Differently from previous surveys, our study can therefore inform about the less frequently studied areas, and thus potential gaps, of fairness research in a quantitative manner. Moreover, our analyses of the applied research methodologies reveal a very strong predominance of data-based experiments, which rely on abstract computational metrics and do not involve humans in the loop. We, therefore, believe that our survey complements existing surveys well.

4 Landscape of fairness research in recommender systems

In this section, we categorize the identified literature along different dimensions to paint a landscape of current research and to identify existing research gaps.

4.1 Publication activity per year

Interest in fairness in recommender systems has been constantly growing over the past few years. Figure 1 shows the number of papers per year that were considered in our survey. Questions of fairness in information retrieval have been discussed for many years, see, e.g., Pedreshi et al. (2008) for an earlier work. The area has been consistently growing since then, leading also to the establishment of dedicated conference series like the ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT). Footnote 7 In the area of recommender systems, however, the earliest paper we identified through our search, which only considers papers in which fairness is explicitly addressed, was published as late as 2017.

figure 1

Number of papers published per year. The total number of papers is 157

4.2 Types of contributions

Academic research on recommender systems in general is largely dominated by algorithmic contributions, and we correspondingly observe a large number of new methods that are published every year. Clearly, building an effective recommender system requires more than a smart algorithm, e.g., because recommendation to a large extent is also a problem of human-computer interaction and user experience design (Jannach et al. 2016 , 2021 ). When questions of fairness have to be considered as well, the problem becomes even more complex, as for example ethical questions may come into play and we may be interested in the impact of recommendations on individual stakeholders, including society.

In the context of our study, we were therefore interested in which general types of contributions we find in the computer science and information systems literature on fair recommendation. Based on the analysis of the relevant papers, we first identified two general types of works: (a) technical papers, which, e.g., propose new algorithms, protocols, and metrics or analyze data, and (b) conceptual papers. The latter class of papers is diverse and includes, for example, papers that discuss different dimensions of fair recommendations, papers that propose conceptual frameworks, or works that connect fairness with other quality dimensions like diversity.

We then further categorized the technical papers in terms of their specific technical type of contribution. The main categories we identified based on the research contributions of the surveyed papers are (a) algorithm papers, which for example propose re-ranking techniques, (b) analytic papers, which for example study the outcomes of a given algorithm, and (c) methodology papers, which propose new metrics or evaluation protocols.

Figure  2 shows how many papers in our survey were considered as technical and conceptual papers. Non-technical papers cover a wide range of contributions, such as guidelines for designers to avoid compounding previous injustices (Schelenz 2021 ), exploratory studies that investigate user perceptions of fairness (Sonboli et al. 2021 ), or discussions about how difficult it is to audit these types of systems (Krafft et al. 2020 ).

figure 2

Technical vs. Conceptual Papers

We observe that today’s research on fairness in recommender systems is dominated by technical papers. In addition, we find that the majority of these works focus on improved algorithms, e.g., to debias data or to obtain a fairer recommendation outcome through list re-ranking. To some extent this is expected as we focus on the computer science literature. However, we have to keep in mind that the concepts of fairness and unfairness are social constructs that may depend on a variety of environmental factors in which a recommender system is deployed. As such, the research focus in the area of fair recommender systems seems rather narrow and centered on algorithmic solutions. As we will observe later, however, such algorithmic solutions commonly assume that some pre-existing and mathematically defined optimization goals are available, e.g., a target distribution of recommendations. In practical applications, the major challenges mostly lie (a) in establishing a common understanding and agreement on such fairness goals and (b) in finding or designing operationalizable optimization goals (e.g., a computational metric) which represent reliable measures or proxies for the given fairness goals.

4.3 Categorization of notions of fairness in literature

In Li et al. ( 2021c ), a taxonomy of different notions of fairness was introduced: group vs. individual, single-sided vs. multi-sided, static vs. dynamic, and associative vs. causal fairness; see also our discussions in Sect.  2.3 . In the following, we review the literature following this taxonomy. Footnote 8

Group vs. individual fairness A very common differentiation in fair recommendation is to distinguish between group fairness and individual fairness, as indicated before. With group fairness, the goal is to achieve some sort of statistical parity between protected groups (Binns 2020 ). In fair machine learning, a traditional goal often is to ensure that there is an equal number of members of each protected group in the outcome, e.g., when it comes to producing a ranked list of job candidates. The protected groups in such situations are commonly determined by characteristics like age, gender, or ethnicity. Achieving individual fairness in the described scenario means that candidates with similar characteristics should be treated similarly. To operationalize this idea, some distance metric is therefore needed to assess the similarity of individuals. This can be a challenging task, since there is no consensus on the notion of similarity, and it could be task-specific (Dwork et al. 2012 ). Ideas of individual fairness in machine learning were discussed in an early work in Dwork et al. ( 2012 ), where it was also observed that achieving group fairness might lead to an unfair treatment at the individual level. In the candidate ranking example, favoring members of protected groups to achieve parity might ultimately result in the non-consideration of a better qualified candidate from a non-protected group. As a result, group and individual fairness are frequently viewed as trade-offs, which is not always immediately evident (Binns 2020 ).
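The two notions can be made concrete with a small sketch. The following Python snippet is only an illustration under stated assumptions: the function names, the toy candidates, and the absolute-difference distance are hypothetical and are not taken from the cited works. `parity_gap` measures how far the top-k of a ranking is from equal group representation, and `individual_violations` lists pairs of candidates whose score difference exceeds their distance, which is one Lipschitz-style reading of "similar individuals should be treated similarly".

```python
from itertools import combinations


def parity_gap(ranked, group_of, k):
    """Largest difference in top-k share between any two groups (0 = parity)."""
    top = ranked[:k]
    groups = set(group_of.values())
    shares = {g: sum(group_of[c] == g for c in top) / len(top) for g in groups}
    return max(shares.values()) - min(shares.values())


def individual_violations(scores, features, distance, lipschitz=1.0):
    """Pairs whose score gap exceeds lipschitz * distance(features)."""
    violations = []
    for a, b in combinations(sorted(scores), 2):
        gap = abs(scores[a] - scores[b])
        if gap > lipschitz * distance(features[a], features[b]):
            violations.append((a, b))
    return violations


# Toy data (hypothetical): four candidates in two groups.
group_of = {"a": "g1", "b": "g1", "c": "g2", "d": "g2"}
ranked = ["a", "b", "c", "d"]
gap_top2 = parity_gap(ranked, group_of, k=2)  # only g1 appears in the top 2
gap_top4 = parity_gap(ranked, group_of, k=4)  # both groups equally represented

# Toy scores and a one-dimensional "qualification" feature (both assumed).
scores = {"a": 90.0, "b": 50.0, "c": 85.0}
features = {"a": 80.0, "b": 78.0, "c": 70.0}
pairs = individual_violations(scores, features, lambda x, y: abs(x - y))
```

In this toy case the top-2 list violates group parity while the full list does not, and the pairs flagged by the individual check depend entirely on the chosen distance, which mirrors the point that the similarity notion is task-specific.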

figure 3

Group vs. Individual Fairness

Figure  3 shows how many of the surveyed papers focus on each category. The figure shows that research on group fairness is more common than works that adopt the concept of individual fairness. Only in rare cases are both types of fairness considered.

Group fairness entails comparing, on average, the members of the privileged group against the unprivileged group. One overarching aspect to identify research papers on group fairness is the distinction between (i) the benefit type (exposure vs. relevance) and (ii) the major stakeholders (consumer vs. provider). Exposure relates to the degree to which items or item groups are exposed uniformly to all users/user groups. Relevance (accuracy) indicates how effective an item’s exposure is, i.e., how well it meets the user’s preference. In recommender systems, where users are first-class citizens, there are multiple stakeholders: consumers, producers, and other stakeholders (see next section).

To perform fairness evaluation for item recommendation tasks, the users or items are divided into non-overlapping groups (segments) based on some form of attributes. These attributes can be either supplied externally by the data provider (e.g., gender, age, race) or computed internally Footnote 9 from the interaction data (e.g., based on user activity level, mainstreamness, or item popularity) (Abdollahpouri et al. 2021 ; Li et al. 2021a ). In Table  2 , we provide a list of the most commonly used attributes in the recommendation fairness literature, which can be utilized to operationalize the group fairness concept. They are divided according to Consumer Fairness (C-Fairness), Producer Fairness (P-Fairness), and combinations (CP-Fairness) (Burke 2017 ) or multi-sided fairness.
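As an illustration of attributes that are computed internally from the interaction data, the following sketch splits users by activity level and items by popularity, and then compares a per-user accuracy value across the two user groups. It is a minimal example under assumptions: the 20% cut-off, the toy interactions, and the per-user hit rates are hypothetical, and the surveyed papers use a variety of thresholds and metrics.

```python
from collections import Counter


def split_head_tail(counts, head_fraction=0.2):
    """Split keys into a 'head' set (most interactions) and a 'tail' set.

    head_fraction is an assumed cut-off, not a value prescribed by the survey.
    """
    ranked = sorted(counts, key=lambda key: (-counts[key], key))
    n_head = max(1, round(head_fraction * len(ranked)))
    return set(ranked[:n_head]), set(ranked[n_head:])


def group_mean(metric_per_user, group):
    """Average of a per-user metric over the users of one group."""
    values = [metric_per_user[u] for u in group if u in metric_per_user]
    return sum(values) / len(values) if values else float("nan")


# Toy (user, item) interactions.
interactions = [
    ("u1", "i1"), ("u1", "i2"), ("u1", "i3"), ("u1", "i4"),
    ("u2", "i1"), ("u2", "i2"),
    ("u3", "i1"), ("u4", "i1"), ("u5", "i5"),
]
user_counts = Counter(user for user, _ in interactions)
item_counts = Counter(item for _, item in interactions)

active, inactive = split_head_tail(user_counts)    # user activity level
popular, long_tail = split_head_tail(item_counts)  # item popularity

# Hypothetical per-user accuracy values, e.g., hit rate of some recommender.
hit_rate = {"u1": 0.5, "u2": 0.25, "u3": 0.0, "u4": 0.25, "u5": 0.0}
accuracy_gap = group_mean(hit_rate, active) - group_mean(hit_rate, inactive)
```

A positive `accuracy_gap` only shows that the groups are served differently. Whether such a difference should be called unfair depends on the normative claim behind the grouping.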

Additionally, it is possible to observe in RS settings that these sensitive attributes may be provided by external providers as demographic metadata (for example, user’s gender, age, occupation), or they may be extracted from user-item interaction data, for example, dividing users based on their level of activity (i.e., active vs. inactive users), or the types of items they consume (e.g., mainstream-users vs. non-mainstream). Here a related concept is obfuscation (Slokom et al. 2021 ), which is a strategy for privacy protection to conceal sensitive information. Fairness and privacy can be considered as interwoven under obfuscation, as described by Dwork et al. ( 2012 ) and Pessach and Shmueli ( 2022 ), where a violation of privacy can lead to unfairness due to an adversary’s capacity to infer sensitive information about an individual and utilize it in a discriminatory manner.

Moreover, in the area of recommender systems, a number of people recommendation scenarios can be identified that are similar to classical fair ML problems. These include recommenders on dating sites, social media sites that provide suggestions for connections, and specific applications, e.g., in the educational context (Gómez et al. 2021 ). In these cases, user demographics may play a major role, together with other factors such as popularity, expertise, and availability at a certain point in time. However, in many other cases, e.g., in e-commerce recommendation or media recommendation, it is not always immediately clear what protected groups may be. In Li et al. ( 2021a ) and other works, for example, user groups are defined based on their activity level, and it is observed that highly active users (of an e-commerce site) receive higher-quality recommendations in terms of usual accuracy measures. This is in general not surprising because there is more information a recommender system can use to make suggestions for more active users. However, it stands to question if an algorithm that returns the best recommendations it can generate given the available amount of information should be considered unfair per se. In fact, merely observing different levels of recommendation accuracy for more active and less active users may not be enough to conclude that a system is unfair. Instead, it is important to carefully elaborate on the underlying reasons and the related normative claims. Some particular user groups may for example have had fewer opportunities to engage with a system.

Recent studies have also focused on two-sided CP-Fairness, as illustrated in Naghiaei et al. ( 2022 ); Rahmani et al. ( 2022b ). In these works, the authors demonstrate the existence of inequity in terms of exposure to popular products and the quality of recommendation offered to active users. It is unknown if increasing fairness on one or both sides (consumer/producers) has an effect on the overall quality of the system. In Naghiaei et al. ( 2022 ), an optimization-based re-ranking strategy is then presented that leverages consumer and provider-side benefits as constraints. The authors demonstrate that it is feasible to boost fairness on both the user and item sides without compromising (and even enhancing) recommendation quality.
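To give an intuition for fairness-aware re-ranking, the sketch below uses a simple greedy heuristic that adds a bonus for providers that are not yet represented in the list. This is explicitly not the optimization-based method of Naghiaei et al. (2022), and it covers only the provider side; the function name, the `bonus` weight, and the toy data are assumptions made for illustration.

```python
def greedy_rerank(relevance, provider_of, k, bonus=0.3):
    """Pick k items greedily by relevance plus a bonus for unseen providers."""
    selected, seen_providers = [], set()
    remaining = set(relevance)
    while remaining and len(selected) < k:
        def gain(item):
            is_new = provider_of[item] not in seen_providers
            return relevance[item] + (bonus if is_new else 0.0)
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        seen_providers.add(provider_of[best])
        remaining.remove(best)
    return selected


# Toy data: one dominant provider (P1) and one small provider (P2).
relevance = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}
provider_of = {"a": "P1", "b": "P1", "c": "P1", "d": "P2"}

baseline = greedy_rerank(relevance, provider_of, k=2, bonus=0.0)
reranked = greedy_rerank(relevance, provider_of, k=2, bonus=0.3)
```

With `bonus=0.0` the list is the plain relevance ranking. With a positive bonus, the item of the smaller provider enters the top 2 at the cost of a slightly less relevant list, which is the kind of trade-off that such approaches try to control.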

Different from traditional fairness problems in ML, research in fairness for recommenders also frequently considers the concept of fairness towards items or their providers (suppliers), see also (Li et al. 2021c ), which differentiates between user and item fairness. In these research works, the idea often is to avoid an unequal (or: unfair) exposure of items from different providers, e.g., artists in a music recommendation scenario. The term item fairness , although used in the literature, may however not be optimal. In reality, it might be argued that this perspective is only important because the item providers—hence, other people or organizations—are actually impacted and, therefore, the underlying fairness concept aims to convey some sense of social justice related to people.

In some works, e.g., Boratto et al. ( 2021a ), the popularity of items is considered an important attribute. Typical goals in that context are to give fair exposure to items that belong to the long tail, or to include a combination of popular and less popular items in a user-calibrated fashion (Abdollahpouri et al. 2021 ). In other research works that focus on fair item exposure, e.g., in Gupta et al. ( 2021 ), groups are defined based on attributes that are in practice not protected in legal terms or based on some accepted normative claim, e.g., the price range of accommodation. The purpose of such experiments is usually to demonstrate the effectiveness of an algorithm if (any) groups were given. Nonetheless, in these cases it often remains unclear in which ways evaluations make sense with datasets from domains where there is no clear motivation for considering questions of fairness. Also, in cases where the goal is to increase the exposure of long-tail items, no particular motivation is usually provided about why recommending (already) popular items is generally unfair. There are often good reasons why certain items are unpopular and should not be recommended, for example, simply because they are of poor quality (Zhao et al. 2022 ).

Fairness for items at the individual level, in particular for cold-start items, is for example discussed in Zhu et al. ( 2021 ). In general, as shown in Fig.  3 , works that consider aspects of individual fairness are less frequent than works on group fairness scenarios. An even smaller number of works addresses both types of fairness.

The definition from classical fair ML settings (similar individuals should be treated similarly) cannot always be directly transferred to recommendation scenarios. In Edizel et al. ( 2020 ), for example, the goal is to make sure that the system is not able to derive a user’s sensitive attribute, e.g., gender, and should thus be able to treat male and female individuals similarly. Footnote 10 Most other works that focus on individual fairness address problems of group recommendation, i.e., situations where a recommender is used to make item suggestions for a group of users. Group recommendation problems have been studied for many years (Masthoff and Delic 2022 ; Felfernig et al. 2018 ), usually with the goal to make item suggestions that are acceptable for all group members and where all group members are treated similarly. In the past, these works often did not explicitly mention fairness as a goal, because this was an implicit underlying assumption of the problem setting. Footnote 11 In more recent works on group recommendation, in contrast, fairness is explicitly mentioned, e.g., in Htun et al. ( 2021 ), Kaya et al. ( 2020 ), Malecek and Peska ( 2021 ), maybe also due to the current interest in this topic. Notable works in this context are (Htun et al. 2021 ) and (Wang et al. 2022a ), which are among the few works in our survey that consider questions of fairness perceptions.

Finally, we underline the resurgence of the notion of calibrated recommendation or calibration fairness in recommender systems. In ML, calibration is a fundamental concept which occurs when the expected proportions of (predicted) classes match the observed proportions of data points in the available data. Similarly, the purpose of calibration fairness is to reflect a measure of the deviation of users’ interests from the suggested recommendation in an acceptable proportion (Oh et al. 2011 ; Steck 2018 ; Jugovac et al. 2017 ). While this may not be inherent and directly related to individual or group fairness, this is the category from this section that best suits such an important (and popular) technique. In fact, from a conceptual point of view, one may see calibration as implementing a particular form of group fairness, without there being an explicitly protected attribute. In the entertainment domain, this might be the (implicit) group of independent movie lovers (Abdollahpouri et al. 2021 ); in the news domain, there may be a group of users who prefer a balanced information offering, e.g., in terms of political opinions. Applying calibration may then help to avoid that the independent movie lovers receive mainly recommendations of mainstream movies; and that vice versa independent movies obtain a higher chance of exposure.
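One common way to quantify (mis)calibration is to compare the genre distribution of a user’s history with the genre distribution of the recommendation list, e.g., with a smoothed Kullback-Leibler divergence. The sketch below is a simplified illustration under assumptions: the equal-weight genre split, the smoothing constant `alpha`, and the toy movies are chosen for the example and are not prescribed by the survey.

```python
import math
from collections import Counter


def genre_distribution(items, genres_of):
    """Distribution over genres; each item spreads weight 1 over its genres."""
    weights = Counter()
    for item in items:
        genres = genres_of[item]
        for genre in genres:
            weights[genre] += 1.0 / len(genres)
    total = sum(weights.values())
    return {genre: w / total for genre, w in weights.items()}


def miscalibration(p, q, alpha=0.01):
    """Smoothed KL divergence KL(p || q~), with q~ = (1 - alpha) * q + alpha * p.

    p: genre distribution of the user's history, q: of the recommendations.
    The smoothing avoids division by zero for genres missing from q.
    """
    kl = 0.0
    for genre, p_g in p.items():
        if p_g > 0:
            q_smooth = (1 - alpha) * q.get(genre, 0.0) + alpha * p_g
            kl += p_g * math.log(p_g / q_smooth)
    return kl


# Toy catalogue and user history (hypothetical).
genres_of = {
    "m1": ["drama"], "m2": ["drama"],
    "m3": ["comedy"], "m4": ["drama", "comedy"],
}
history = genre_distribution(["m1", "m2", "m3", "m4"], genres_of)

only_drama = genre_distribution(["m1", "m2"], genres_of)
mixed = genre_distribution(["m1", "m3", "m4"], genres_of)

score_only_drama = miscalibration(history, only_drama)
score_mixed = miscalibration(history, mixed)
```

A value of zero means that the recommendations reproduce the proportions of the history. In the toy case, the drama-only list is clearly less calibrated than the mixed list for this user.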

More generally, calibration has been applied to either users (by considering age or gender as features to be calibrated against) or items (to compensate for popularity, but also to diversify with respect to item attributes such as genre) (Bobadilla et al. 2021 ; Abdollahpouri et al. 2021 ; da Silva et al. 2021 ). Besides, in works like (Abdollahpouri et al. 2020b ), calibration is considered as a quality of the recommendations, and the authors measure whether different users or groups experience varying levels of (mis)calibration in their recommendations, since this may indicate an unfair treatment of those populations. Nonetheless, as stated in Lin et al. ( 2020 ), calibrated recommendations in some domains (such as news or microblogging) might contribute to political polarization in society, so this technique is generally applied to consumer taste domains, where focused, less-diverse recommendations might be valued by users. Like for other fairness approaches, however, there must be an underlying normative claim that is addressed. Without an underlying normative claim, calibrating recommendations may in some cases merely be a matter of improved personalization and, thus, recommendation quality.

Single-sided and Multi-Sided Fairness Traditionally, research in computer science on recommender systems has focused on the consumer value (or utility) of recommender systems, e.g., on how algorithmically generated suggestions may help users deal with information overload. Providers of recommendation services are however primarily interested in the value a recommender can ultimately create for their organization. The organizational impact of recommender systems has been, for many years, the focus in the field of information systems, see (Xiao and Benbasat 2007 ) for a survey. Only in recent years do we observe an increased interest in such topics in the computer science literature. Many of these recent works aim to shed light on the impact of recommendations in a multistakeholder environment, where typical stakeholders may include consumers, service providers, suppliers of the recommendable items, or even society (Abdollahpouri et al. 2020a ; Jannach and Bauer 2020 ).

In multistakeholder environments, there may exist trade-offs between the goals of the involved entities. A recommendation that is good for the consumer might for example not be the best from the profit perspective of the provider (Jannach and Adomavicius 2017 ). In a similar vein, questions of fairness can be viewed from the perspective of multiple stakeholders, leading to the concept of multisided fairness (Burke 2017 ), which might include the utility of the system designer and other side-stakeholders in addition to the consumer and provider. As mentioned above, there can be fairness questions that are related to the providers of the items. Again, there can also be trade-offs and in some ways incompatible notions of fairness, i.e., what may be a fair recommendation for users might in some ways be seen to be unfair to item providers, e.g., when their items get limited exposure (Chaudhari et al. 2020 ).

Figure  4 shows the distribution of works that focus on one single side of fairness and works which address questions of multisided fairness. The illustration clearly shows that the large majority of the works concentrates on the single-sided case, indicating an important research gap in the area of multisided fairness within multistakeholder application scenarios.

figure 4

Fairness Notions: Single-sided vs. Multi-sided Fairness

Among the few studies on multi-sided fairness, Abdollahpouri and Burke ( 2019 ) discuss techniques for CP-fairness in matching platforms such as Airbnb and Uber. In Rahmani et al. ( 2022a ), the authors explore how adding contextual information (geographical, temporal, social, and categorical) affects the multi-aspect quality of POI suggestions, including accuracy, beyond-accuracy, fairness, and interpretability (see also Rahmani et al. 2022d for a discussion on a temporal bias). Patro et al. ( 2020 ) model the fair recommendation problem as a constrained fair allocation problem with indivisible goods and propose a recommendation algorithm that takes producer fairness into consideration. In Anelli et al. ( 2023 ) the authors study CP-Fairness in several graph CF models. Wu et al. ( 2021b ) propose an individual-based perspective, where fairness is defined as the same exposure for all producers and the same NDCG for all consumers involved. Exposure in this work is defined based on the appearance of items of providers on top-n recommendation lists, where a higher ranking is assumed to lead to higher exposure.
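The two quantities mentioned last, provider exposure on top-n lists and NDCG per consumer, can be sketched as follows. This is a minimal illustration and not the implementation of Wu et al. (2021b): the logarithmic position discount for exposure is an assumption (the text only states that higher ranks lead to higher exposure), NDCG is computed with binary relevance, and all names and data are hypothetical.

```python
import math


def provider_exposure(rec_lists, provider_of):
    """Sum a rank-discounted exposure 1 / log2(rank + 1) per provider."""
    exposure = {}
    for items in rec_lists.values():
        for rank, item in enumerate(items, start=1):
            provider = provider_of[item]
            exposure[provider] = exposure.get(provider, 0.0) + 1.0 / math.log2(rank + 1)
    return exposure


def ndcg(items, relevant):
    """NDCG of one top-n list with binary relevance judgments."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(items, start=1)
        if item in relevant
    )
    n_ideal = min(len(relevant), len(items))
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, n_ideal + 1))
    return dcg / ideal if ideal > 0 else 0.0


# Toy top-2 lists for two consumers; items a, b, c from providers P1 and P2.
rec_lists = {"u1": ["a", "b"], "u2": ["a", "c"]}
provider_of = {"a": "P1", "b": "P2", "c": "P2"}
relevant = {"u1": {"a"}, "u2": {"c"}}

exposure = provider_exposure(rec_lists, provider_of)
ndcg_per_user = {u: ndcg(items, relevant[u]) for u, items in rec_lists.items()}
```

In this toy case, P1 receives more exposure than P2 because its item is ranked first in both lists, and `u1` obtains a higher NDCG than `u2`. Equal exposure and equal NDCG in the sense described above would require both dictionaries to have identical values.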

Static vs. dynamic fairness Another dimension of fairness research relates to the question whether the fairness assessment is done in a static or dynamic environment (Li et al. 2021c ). In static settings, the assessment is done at a single point of time, as commonly done also in offline evaluations that focus on accuracy. Thus, it is assumed that the attributes of the items do not change, that the set of available items does not change, and that the analysis that is made at one point in time is sufficient to assess the fairness of algorithms or if an unfairness mitigation technique is effective.

Such static evaluations however have their shortcomings, e.g., as there may be feedback loops that are induced by the recommendations. Also, some effects of unfairness and the effects of corresponding mitigation strategies might only become visible over time. Such longitudinal studies require alternative evaluation methodologies, for example, approaches based on synthetic data or different types of simulation , such as those developed in the context of reinforcement learning algorithms, see (Rohde et al. 2018 ; Mladenov et al. 2021 ; Ghanem et al. 2022 ; Zhou et al. 2021 ; Adomavicius et al. 2021 ) for simulation studies and related frameworks in recommender systems.
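The feedback-loop effect can be illustrated with a deliberately simplistic, deterministic simulation. The sketch assumes a popularity-based recommender that always shows the k most interacted items and assumes that every recommended item receives a fixed number of additional interactions per round; both assumptions, as well as all names and numbers, are hypothetical and far simpler than the simulation frameworks cited above.

```python
def top_k_share(counts, k):
    """Fraction of all interactions that belong to the k most popular items."""
    ranked = sorted(counts.values(), reverse=True)
    return sum(ranked[:k]) / sum(ranked)


def simulate_popularity_loop(initial_counts, k=2, rounds=5, clicks_per_round=10):
    """Recommend the current top-k each round and feed the clicks back."""
    counts = dict(initial_counts)
    shares = [top_k_share(counts, k)]  # concentration before any recommendation
    for _ in range(rounds):
        top = sorted(counts, key=lambda item: (-counts[item], item))[:k]
        for item in top:
            counts[item] += clicks_per_round  # assumed user response
        shares.append(top_k_share(counts, k))
    return shares


shares = simulate_popularity_loop({"a": 5, "b": 4, "c": 3, "d": 2, "e": 1})
```

Under these assumptions, the share of interactions held by the top-2 items grows in every round. A static evaluation at a single point in time would not reveal this drift.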

figure 5

Fairness Notions: Static vs. Dynamic Fairness Evaluation

Figure  5 shows how many studies in our survey considered static and dynamic evaluation settings, respectively. Static evaluations are clearly predominant: we only found 16 works that consider dynamically changing environments. In Ge et al. ( 2021 ), for example, the authors consider the dynamic nature of the recommendation environment by proposing a fairness-constrained reinforcement learning algorithm so that the model dynamically adjusts its recommendation policy to ensure the fairness requirement is satisfied even when the environment changes. A similar idea is developed in Liu et al. ( 2020 ), where a long-term balance between fairness and accuracy is considered for interactive recommender systems, by incorporating fairness into the reward function of the reinforcement algorithm. Moreover, in Sonboli et al. ( 2020 ), a framework is proposed for the dynamic adaptation of recommendation fairness using Social Choice . The goal of this work is to arbitrate between different re-ranking methods, aiming to achieve a better accuracy-fairness tradeoff with respect to all sensitive features. On the other hand, works such as (Beutel et al. 2019 ) and (Deldjoo et al. 2021a ) model fairness in a specific snapshot of the system, by simply taking the system and its training information as a fixed image of the interactions performed by the users on the system.

Associative vs. causal fairness The final categorization discussed in Li et al. ( 2021c ) contrasts associative and causal fairness. One key observation by the authors in that context is that most research in fair ML is based on association-based (correlation-based) approaches. In such approaches, researchers typically investigate the potential “discrepancy of statistical metrics between individuals or subpopulations”. However, certain aspects of fairness cannot be investigated properly without considering potential causal relations, e.g., between a sensitive (protected) feature like gender and the model’s output. In terms of methodology, causal effects are often investigated based on counterfactual reasoning (Kusner et al. 2017 ; Li et al. 2021b ).

Figure  6 shows that there are only three works investigating recommendation fairness problems based on causality considerations. More specifically, in Cornacchia et al. ( 2021 ), the authors propose the use of counterfactual explanation to provide fair recommendations in the financial domain. An interesting alternative is presented in Li et al. ( 2021b ), where the authors analyze the causal relations between the protected attributes and the obtained results. The third work we found in our review, Qiu et al. ( 2021 ), derives a causal graph to identify and analyze the visual bias of existing methods, so that spurious relationships between users and items can be removed.

figure 6

Fairness Notions: Associative vs. Causal Fairness

One additional dimension we have discovered through our literature analysis is the use of constraint-based approaches to integrate or model fairness characteristics in recommender systems. In this context, these approaches may be seen as an alternative paradigm to associative and causal inference, which is based on explicit constraints and special techniques, often from multi-objective optimization, to achieve the desired fairness goals. For example, Hao et al. ( 2021 ) address the issue of enforcing equality to biased data by formulating a constrained multi-objective optimization problem to ensure that sampling from imbalanced sub-groups does not affect gradient-based learning algorithms; the same work and others—including (Seymen et al. 2021 ) or (Yadav et al. 2021 )—define fairness as another constraint to be optimized by the algorithms. In Yadav et al. ( 2021 ), in particular, such a constraint is amortized fairness-of-exposure.

4.4 Application domains and datasets

Next, we look at application domains that are in the focus of research on fair recommendations. Figure  7 shows an overview of the most frequent application domains and how many papers focused on these domains in their evaluations. Footnote 12 By far the most researched domain is the recommendation of videos (movies) and music, followed by e-commerce and finance. For many other domains shown in the figure (e.g., jobs, tourism, or books), only a few papers were identified. Certain domains were only considered in one or two papers. These papers are combined in the “Other” domain in Fig.  7 .

figure 7

Application domains of used datasets. Note that some studies rely on more than one dataset, and a number of theoretical or conceptual works do not provide experimental validation

Since most of the studied papers are technical papers and use an offline experimental procedure, corresponding datasets from the respective domains are used. Strikingly often, in more than one third of the papers, one of the MovieLens datasets is used. This may seem surprising as some of these datasets do not even contain information about sensitive attributes. Generally, these observations reflect a common pattern in recommender systems research, which is largely driven by the availability of datasets. The MovieLens datasets are a widely adopted and probably overused case and have been used for all sorts of research in the past (Harper and Konstan 2015 ). Fairness research in recommender systems thus seems to have a quite different focus than fair ML research in general, which is often about avoiding discrimination of people.

We may now wonder which specific fairness problems are studied with the help of the MovieLens rating datasets. What would be unfair recommendations to users? What would be unfair towards the movies (or their providers)? It turns out that item popularity is often the decisive attribute to achieve fairness towards items , and quite a number of works aim to increase the exposure of long-tail items which are not too popular, see, e.g., Dong et al. ( 2021 ). In terms of fairness towards users , the technical proposal in da Silva et al. ( 2021 ) for example aims to serve users with recommendations that reflect their past diversity preferences with respect to movie genres. An approach towards group fairness is proposed in Misztal-Radecka and Indurkhya ( 2021 ). Here, groups are not identified by their protected attribute, but by the recommendation accuracy that is achieved (using any metric) for the members of the group.

In other domains beyond Video/Music (dominated, as mentioned above, by MovieLens datasets), fairness is characterized by the inherent properties of users and items in each particular domain. For example, in e-commerce the price or year of the item, or the helpfulness of the user’s review are considered (Deldjoo et al. 2021a ); in tourism, the user’s gender and the business category are typically analyzed (Mansoury et al. 2019 ).

Continuing our discussions above, such notions of unfairness in the described application contexts may not be undisputed. When some users receive recommendations with lower accuracy, this might be caused by their limited activity on the platform or their unwillingness to allow the system to collect data. Actually, one may consider it unfair to artificially lower the quality of recommendations for the group of highly active and open users. In another example, it might not be clear why recommending less popular items, which might in fact not be popular because of their limited quality, would make a system fairer, and equating bias (or skewed distributions) with unfairness in general seems questionable. Therefore, we reiterate the importance of clearly specifying the underlying assumptions, hypotheses, and normative claims in any given research work on fairness. Otherwise it may remain unclear to what extent a particular system design or algorithmic approach will ensure or increase a system’s level of fairness.

Similar questions arise when using calibration approaches to ensure fairness in a personalized, user-individual way. Considering, for example, a user fairness calibration approach like the one presented in da Silva et al. ( 2021 ), it is less than clear why diversifying recommendations according to user tastes would increase the system’s fairness. It may increase the quality of the recommendations, but a system that generates recommendations of limited quality in terms of calibration for everyone is probably not one we would call unfair. However, note that there actually may be situations where calibration serves a certain fairness goal. Consider, for example, that a recommendation provider notices that users with niche tastes often receive item recommendations that are not interesting to them. This may happen when an algorithm too strongly focuses on mainstream items and when the used metrics do not reveal clearly that there are some user groups that are not served well. Under the assumption that users with niche tastes might also be users who are marginalized in other ways, e.g., when they are users who differ because of ethnicity or national origin, improving calibration may indeed serve a fairness goal. These assumptions and claims however have to be made explicit, as otherwise it might just be an issue of whether the recommendation quality is measured in the right way.

In several cases, and independent of the particular application domain, it therefore seems that the addressed problem settings are not too realistic or remain artificial to a certain extent. One main reason for this phenomenon in our view lies in the lack of suitable datasets for domains where fairness really matters. These could for example be the problem of job recommendations on business networks or people recommendations on social media which can be discriminatory. In today’s research, often datasets from rather non-critical domains or synthetic datasets are used to showcase the effectiveness of a technical solution (Ge et al. 2021 ; Abdollahpouri et al. 2021 ; Yao and Huang 2017 ; Misztal-Radecka and Indurkhya 2021 ; Hao et al. 2021 ; Tsintzou et al. 2019 ; Sun et al. 2019 ; Geyik et al. 2019 ; Stratigi et al. 2017 ). While this may certainly be meaningful to demonstrate the effects of, e.g., a fairness-aware re-ranking algorithm, such research may appear to remain quite disconnected from real-world problems. Related phenomena of “abstraction traps” in fair ML were discussed earlier in Selbst et al. ( 2019 ). While abstraction certainly is central to computer science, the danger exists that central domain-specific or application-specific idiosyncrasies are abstracted away so that ML tools can be applied. In the end, the proposed solutions for the abstracted problem may then fail to properly account for the sometimes complex interactions between technical systems and the real world, and to respond to the “fundamental tensions, uncertainties, and conflicts inherent in sociotechnical systems.” (Selbst et al. 2019 )

4.5 Methodology

In this section, we review how researchers approach the problems from a methodological perspective.

Research methods In principle, research in recommender systems can be done through experimental research (e.g., with a field study or through a simulation) or non-experimental research (e.g., through observational studies or with qualitative methods) (Gunawardana et al. 2022; Jannach et al. 2010). In recommender systems research, three main types of experimental research are common: (a) offline experiments based on historical data, (b) user studies (laboratory studies), and (c) field tests (A/B tests, where different system versions are evaluated in the real world). Figure 8 shows how many papers fall into each category. As in general recommender systems research (Jannach et al. 2012), we find that offline experiments are the predominant form of research. Note that we here only consider 83 technical papers, and not the conceptual, theoretical, and analytic ones that we identified. Only in very few cases (6 papers) were humans involved in the experiments, and in even fewer cases (3 papers) did we find reports of field tests. Regarding user studies, Htun et al. (2021), for example, involve real users to evaluate fairness in a group recommendation setting. Notable examples of field experiments are provided in Geyik et al. (2019), where a gender-representative re-ranker is deployed for a randomly chosen 50% of the recruiters on the LinkedIn Recruiter platform (A/B testing), and in Beutel et al. (2019), where the engagement with a large-scale recommender system in production is reported across sub-groups of users. We only found one paper that relied on interviews as a qualitative research method (Sonboli et al. 2021). Also, only very few papers used more than one experiment type, e.g., Serbos et al. (2017), where both a user study and an offline experiment were conducted.

Fig. 8. Experiment types

The dominance of offline experiments points to a research gap in terms of our understanding of fairness perceptions by users. Many technical papers that use offline experiments assume that there is some target distribution or a target constraint that should be met, and then use computational metrics to assess to what extent an algorithm is able to meet those targets. The target distribution, e.g., of popular and long-tail content, is usually assumed to be given or to be a system parameter. To what extent a certain distribution or metric value would be considered fair by users or other stakeholders in a given domain is usually not discussed. In any practical application, this question is however fundamental, and again the danger exists that research is stuck in an abstraction trap, as characterized above. In a recent work on job recommendations (Wang et al. 2022a), it was for example found that a debiasing algorithm led to fairer recommendations without a loss in accuracy. A user study, however, then revealed that participants actually preferred the original system recommendations.

Main technical contributions and algorithmic approaches Looking only at the technical papers, we identified three main groups of technical contributions: (i) works that report outcomes of data analyses or which compare recommendation outcomes, (ii) works that propose algorithmic approaches to increase the fairness of the recommendations, and (iii) works that propose new metrics or evaluation approaches. Figure  9 shows the distribution of papers according to this categorization.

Fig. 9. Technical focus of papers

We observe that most technical papers aim to make the recommendations of a system fairer, e.g., by reducing biases or by aiming to meet a target distribution. Technically, in analogy to context-aware recommender systems (Adomavicius and Tuzhilin 2015), this “fairness step” can be done (i) in a pre-processing step, (ii) integrated in the ranking model (modeling approaches), or (iii) in a post-processing step. Figure 10 shows what is common in the current literature; see also Li et al. (2022). Methods that rely on some form of pre-processing are comparably rare. Typical modeling approaches include specific fairness-aware loss functions or optimization methods that consider certain constraints. Post-processing approaches are frequently based on re-ranking.
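To make the post-processing variant more concrete, the following minimal sketch re-ranks a scored candidate list so that every prefix of the top-k list contains at least a given share of items from a designated group. The function name `rerank_with_min_share`, the `min_share` parameter, and the greedy strategy are assumptions chosen for illustration; the sketch is not taken from any of the surveyed papers.

```python
import math

def rerank_with_min_share(candidates, protected, k, min_share):
    """Greedy fairness-aware re-ranking (illustrative sketch).

    candidates: list of (item, score) pairs from any base recommender.
    protected:  set of items belonging to the group to be promoted.
    min_share:  assumed system parameter; minimum fraction of protected
                items required in every prefix of the result.
    """
    remaining = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    result, n_protected = [], 0
    while remaining and len(result) < k:
        position = len(result) + 1
        required = math.floor(min_share * position)
        pick = remaining[0]  # default: best-scored remaining item
        if n_protected < required:
            # Constraint would be violated: take the best protected item instead.
            for pair in remaining:
                if pair[0] in protected:
                    pick = pair
                    break
        remaining.remove(pick)
        result.append(pick[0])
        if pick[0] in protected:
            n_protected += 1
    return result
```

Note that the constraint itself (`min_share`) is an input here; the code says nothing about which value would be considered fair in a given application.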

Fig. 10. Fairness step

Overall, the statistics on the one hand point to a possible research gap in terms of works that aim to understand what leads to unfair recommendations and how severe the problems are for different algorithmic approaches in particular domains. In the future, it might therefore be important to focus more on analytical research, as advocated also in Jannach and Bauer (2020), e.g., to understand the idiosyncrasies of a particular application scenario instead of aiming solely for general-purpose algorithms. On the other hand, the relatively large number of works that propose new ways of evaluating indicates that the field is not yet mature and has not yet established a standardized research methodology. We discuss evaluation metrics next.

Evaluation metrics In offline experiments, a variety of computational metrics are employed to evaluate the fairness of a set of recommendations. The choice of a certain fairness metric is mostly determined by the underlying concept of fairness, such as whether it is about individual or group fairness. In Table 3 and Table 4, we provide detailed lists of selected metrics used in the literature on fairness in recommender systems (Footnote 13). We primarily organize the metrics along the common categorization of group fairness (Table 3) vs. individual fairness (Table 4). Within the category of group fairness metrics, we furthermore mainly distinguish between the types of utility (benefit) in terms of exposure and effectiveness (Amigó et al. 2023). The metrics listed in Table 4, in contrast, are split into (a) metrics for individual item recommendation scenarios, and (b) metrics for group recommendation settings. Exposure and effectiveness can be defined as follows:

Exposure refers to the degree to which an item or group of items is exposed to a user or group of users;

Effectiveness (sometimes called relevance) denotes the degree to which an item’s exposure is effective, i.e., corresponds to the user’s preferences.
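The two notions can be sketched for a single ranked list as follows. The logarithmic position discount 1/log2(rank + 1) is an assumption made here for illustration (it is one common way to model attention decay); the function names and the toy inputs are likewise hypothetical and not prescribed by the metrics in Tables 3 and 4.

```python
import math
from collections import defaultdict

def position_weight(rank):
    """Assumed attention model: exposure decays logarithmically with rank."""
    return 1.0 / math.log2(rank + 1)

def group_exposure(ranking, item_group):
    """Share of total exposure that each item group receives in one ranking."""
    exposure = defaultdict(float)
    for rank, item in enumerate(ranking, start=1):
        exposure[item_group[item]] += position_weight(rank)
    total = sum(exposure.values())
    return {group: value / total for group, value in exposure.items()}

def effectiveness(ranking, relevant_items):
    """Exposure that falls on items matching the user's preferences."""
    return sum(
        position_weight(rank)
        for rank, item in enumerate(ranking, start=1)
        if item in relevant_items
    )
```

Averaging `group_exposure` over users yields a provider-side view, whereas comparing `effectiveness` across user groups yields a consumer-side view.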

Different stakeholders in recommender systems may be concerned with these two types of utility to varying degrees. For instance, from the perspective of customers, fairness primarily entails an equitable distribution of effectiveness among users, thereby preventing discrimination against historically disadvantaged groups such as female or black job applicants. In contrast, producers and item providers who seek enhanced visibility are primarily concerned with exposure equity, i.e., that their exposure is not reduced, for instance, because of their popularity or country.

We note that the popularity of items is a central concept in most metrics that are related to exposure. Most commonly, the popularity of an item is assessed in offline experiments by counting the number of observed interactions for each item in the training data. Moreover, various works assume that there is a trade-off between different evaluation objectives: customer fairness, provider fairness, and overall system accuracy. Thus, some metrics in the literature are designed against the background of such potential trade-offs.
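The popularity computation described above can be sketched in a few lines. The split into “head” and “long-tail” items at the top 20% of items is an assumption for illustration only; the surveyed works use different thresholds, and the function names are hypothetical.

```python
from collections import Counter

def item_popularity(interactions):
    """Count observed interactions per item.

    interactions: iterable of (user, item) pairs from the training data.
    """
    return Counter(item for _, item in interactions)

def split_head_tail(popularity, head_fraction=0.2):
    """Split items into popular (head) and long-tail items.

    head_fraction is an assumed threshold, not a standard value.
    """
    ranked = [item for item, _ in popularity.most_common()]
    n_head = max(1, round(head_fraction * len(ranked)))
    return set(ranked[:n_head]), set(ranked[n_head:])
```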

Discussion The main problem when using computational metrics in offline experiments, in general, is that it is often unclear to what extent these metrics translate to better systems in practice. In non-fairness research, this typically amounts to the question of whether higher prediction accuracy on past data will lead to more value for consumers or providers, e.g., in terms of user satisfaction or business-oriented key performance indicators; see Jannach and Jugovac (2019). In fairness research, the corresponding questions are whether users would actually consider the recommendations fairer or whether a fairness-aware algorithm would lead to different user behavior. Unfortunately, research that involves humans is very rare. An example of a work that considers the effects of fair rankings can be found in Sühr et al. (2021), where mixed effects were observed in the context of job recommendation. The study accounts for gender biases and the impact of job context, candidate profiles, and employers’ inherent biases, and reveals that fair algorithms are useful unless employers exhibit strong gender preferences.

Another potential issue of the metrics used is that they may be a strong over-simplification or too strong abstraction of the real problems. Consider the problem of recommending long-tail (less popular) items, which is in the focus of many research works. The metrics we found that measure how many long-tail items are recommended usually do not differentiate whether the recommended item is a “good” one or not, by using some form of quality assessment. As mentioned, some items may be unpopular just because of their poor quality. Also, in many of these works, it is not clear what a desirable level of exposure of long-tail items would be. This is a problem that is particularly pronounced also for many works that measure fairness through the deviation of the recommendations from some target (desirable) distribution. In technical terms, adjusting the recommendations to be closer to some target distribution can be done with almost trivial and very efficient means like re-ranking. The true and important question, however, is how we know the target distribution in a given application context.
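A deviation-from-target metric of the kind discussed above can be sketched as follows. The use of the Kullback-Leibler divergence, the smoothing constant `eps`, and the two-category head/tail target are assumptions made for this example; the `target` dictionary in particular is exactly the quantity that, as argued above, cannot be derived from the code and must come from the application context.

```python
import math
from collections import Counter

def category_distribution(recommended_items, item_category, categories):
    """Observed share of each category in a non-empty recommendation list."""
    counts = Counter(item_category[item] for item in recommended_items)
    total = sum(counts.values())
    return {c: counts.get(c, 0) / total for c in categories}

def kl_divergence(target, observed, eps=1e-9):
    """KL(target || observed); eps avoids division by zero (assumed smoothing)."""
    return sum(
        p * math.log(p / max(observed.get(c, 0.0), eps))
        for c, p in target.items()
        if p > 0
    )
```

A value of zero means that the list matches the target exactly, which a simple re-ranking step can usually achieve; whether the target itself is appropriate remains an open question.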

Generally, we also found a number of works where biased recommendations (e.g., towards popular items) were equated with unfairness. As discussed, this assumption may be too strong. In some of these papers, no deeper discussion is provided about why the biases lead to unfairness in a certain application context. The normative claims and underlying assumptions about how and when fairness is defined are missing, in parts leading to the impression that the concept of ’bias mitigation’ instead of ’fairness’ should have been used. As noted earlier, a similar observation can be made for papers that assume that calibrating recommendations per se leads to fairness. This can probably not be safely stated in general unless the normative claims are made explicit and fit the goals that are achieved by calibration.

When considering recommendation quality metrics for groups, the assumption is either that different groups should have equal recommendation quality (to treat them all alike) or that there is some justified inequality. The latter case may, for example, arise if some groups are assumed to receive better service, e.g., because they have paid for better service or when the inequality is dependent on the corpus size or the available relevant data (Kirnap et al. 2021 ; Amigó et al. 2023 ).

As argued above, in most applications of recommenders the recommendations will be better in terms of accuracy measures for active users than for less active users. Some papers in this survey consider this unfair, but this line of argumentation is not easy to follow. In fact, some researchers may argue that the correct mitigation strategy would be to fix the data or change the user interface to elicit more data. It would also be debatable which percentage of performance is acceptable to consider such a tradeoff (un)fair, as is the norm in the discussion around statistical parity. Certainly, there may be scenarios where there are particular protected attributes for which it may be desirable not to have largely varying accuracy levels across the groups. In many of the surveyed papers, no realistic use cases are however given.

In terms of the different notions of fairness, traditionally either group fairness or individual fairness is studied to address consumer effectiveness and producer exposure. However, recent research also addresses situations involving mixed individual and group fairness, such as group item exposure fairness and user-individual effectiveness fairness; see for example Wu et al. (2021b) and Rastegarpanah et al. (2019). In such studies, it is often assumed that when provider exposure is addressed, the quality of the recommendations may diminish. The authors thus define individual unfairness as disparities in user losses and demand that the decline in recommendation quality be distributed equitably across all users. As previously stated, the notion of a trade-off between the fairness evaluation objectives and overall system accuracy is prevalent in fairness research, and this demonstrates the need for additional research on multi-sided recommendation fairness.

Finally, looking at individual fairness in group recommendation scenarios, a multitude of aggregation strategies have been proposed over the years, such as Least Misery or Borda Count (Masthoff and Delic 2022). The literature on group recommender systems, which is now revived under the term fairness, however does not provide a clear conclusion regarding which aggregation strategy should be used in a given application. It should be noted that Arrow’s impossibility theorem (from Social Choice Theory) supports the conclusion that no aggregation strategy will be universally ideal, hence leading again to a potential reason for unfairness in a group. Also in this area, researchers may have been stuck in an abstraction trap (Selbst et al. 2019; Jannach and Bauer 2020), as we have pointed out several instances of oversimplification in fairness research, and more (multi-disciplinary) research seems required to understand group recommendation processes; see Delic et al. (2018) for an observational study in the tourism domain.
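The two aggregation strategies named above can be sketched as follows. The input format (one rating dictionary per group member), the function names, and the tie handling in the Borda variant (ties are broken by item name rather than by sharing points) are simplifying assumptions for this illustration.

```python
def least_misery(ratings):
    """Group score of an item = lowest rating any member gave it.

    ratings: {member: {item: rating}}; only items rated by all members count.
    """
    items = set.intersection(*(set(r) for r in ratings.values()))
    return {i: min(r[i] for r in ratings.values()) for i in items}

def borda_count(ratings):
    """Each member awards an item as many points as items they rank below it."""
    items = sorted(set.intersection(*(set(r) for r in ratings.values())))
    scores = {i: 0 for i in items}
    for member_ratings in ratings.values():
        # Worst item first; ties broken by item name (simplification).
        ordered = sorted(items, key=lambda i: (member_ratings[i], i))
        for points, item in enumerate(ordered):
            scores[item] += points
    return scores
```

With the hypothetical ratings `{"ann": {"x": 5, "y": 4, "z": 1}, "bob": {"x": 1, "y": 3, "z": 5}, "cat": {"x": 5, "y": 3, "z": 4}}`, Least Misery favors item `y`, whereas Borda Count favors item `x`, which illustrates that the choice of strategy determines which group members are served best.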

Reproducibility The lack of reproducibility can be a major barrier to achieving progress in AI (Gundersen and Kjensmo 2018), and recent studies indicate that limited reproducibility is a substantial issue also in recommender systems research (Cremonesi and Jannach 2021; Bellogín and Said 2021). Figure 11 shows for how many of the studied technical papers artifacts were shared to ensure the reproducibility of the reported experiments (Footnote 14). While the level of reproducibility seems to be higher than in general AI (Gundersen and Kjensmo 2018), for the large majority of the considered works the authors still did not share any code or data.

Fig. 11. Level of Reproducibility (Shared Artifacts)

4.6 Landscape overview

Fairness is a multi-faceted subject. In order to provide an encompassing understanding of different fairness dimensions, we have developed a taxonomy that takes different perspectives, as explained in Sect.  3.2 , which allows us to describe the landscape of fairness research in recommender systems, as shown in Fig.  12 . The landscape’s main aspects can be summarized based on the following questions.

How is fairness implemented? Depending on which step of the recommendation pipeline we change, fairness-enhancing systems can be divided into pre-, in-, and post-processing techniques. Here we also note that the main patterns are in- and post-processing (typically re-ranking), probably because they are easier to apply to existing systems.

What is the target representation? The target representation is defined as the ideal representation (i.e., proportion or distribution of exposure) (Kirnap et al. 2021 ). In other works, this is also referred to as target distribution (of benefits such as exposure or relevance). Even though this aspect has not been specifically analyzed in the previously presented figures, we have identified three main target representations against which most fairness metrics compare: catalog size, relevance, and parity. These representations match those introduced in Kirnap et al. ( 2021 ), where authors state that the choice of the representation target depends on the application domain. Among these, the most common interpretation is that items should be recommended equally for each group, hence, using a parity-based representation target. However, there are also other aspects and fairness notions that do not use this assumption, as discussed in Sect.  2.3 .

What is the benefit of fairness? As in the previous case, for the sake of conciseness, we have not considered this dimension in this detailed analysis, but it is worth mentioning that fairness definitions can be categorized depending on whether their main benefit is based on exposure (by assessing if items are exposed in a uniform or fair way) or relevance (with the additional constraint on the exposure that it must be effective, that is, it should match the user preferences). In principle, any information seeking system (such as search engines or recommender systems) should aim for relevance-based benefits. However, considering the difficulty of these tasks, measuring and achieving fair exposure would already affect and improve subsequent measurements on the system from a fairness perspective; hence, it is a reasonable goal to pursue.

How is fairness measured? Fairness evaluation, like any other experimental research, can be performed through qualitative or quantitative methods. As discussed in Sect. 4.5, qualitative approaches are currently almost never taken, and most of the analyses are done by quantitative approaches such as offline experiments or A/B tests.

On which level is fairness considered? Fairness can be defined on a group level or individual level, as discussed above. Today, group-level fairness is the prevalent option, most likely because measuring (operationalizing) group fairness is easier than individual fairness. In other words, what it means for two individuals to be similar is task-sensitive and more difficult than segmenting users/items into groups based on a sensitive feature, as is often done in the examined literature of group fairness. This might also have social implications, as many major considerations of fairness in the literature, including gender equality, demographic equality, and others, are predicated on the concept of group fairness. This is connected with the so-called issue of intersectionality , which we discuss in some more detail below. It is important to note that the primary limitation of group fairness is the decreasing reliability of sensitive attributes in recent years due to privacy concerns and firms’ reluctance to share such information.

Fairness for whom? In many cases, the circumstance for making a recommendation is intrinsically multi-sided. As a result, any of the stakeholders engaged, as well as the platform itself, may be affected by (un)fairness. Through our survey, we found that there is a balance in the literature between consumer and provider viewpoints. In addition, more recent research in ML has begun to address the issue of intersectionality in fairness by building statistical frameworks that account for bias within multiple protected groups, for example, “black women” instead of just “black people” or “women” (Ghosh et al. 2021b; Morina et al. 2019). An interesting example is presented by Buolamwini and Gebru (2018), where the authors found that the full distribution of misclassifications by commercial facial image classification systems is not revealed when gender and skin type are considered in isolation, and that darker-skinned women are the most misclassified group, with an accuracy drop of over 30% compared to lighter-skinned men. This aspect has, to the best of our knowledge, been largely overlooked in recommender fairness research; one exception is the study presented recently by Shen et al. (2023), where such intersectionality between gender (male vs. female) and skin color (black vs. white) fairness was applied to language model-driven conversational recommendation.
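Why single-attribute analyses can hide intersectional disparities may be illustrated with a small sketch that computes a quality measure per group for any combination of attributes. The record format, the attribute names, and the function name are assumptions for this example; any per-user quality measure could replace the boolean `correct` flag.

```python
from collections import defaultdict

def accuracy_by_group(records, attributes):
    """Accuracy per (intersectional) group.

    records:    list of dicts holding attribute values and a boolean 'correct'.
    attributes: attribute names that jointly define a group, e.g.
                ["gender"] or ["gender", "skin_type"].
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for record in records:
        key = tuple(record[a] for a in attributes)
        totals[key] += 1
        hits[key] += int(record["correct"])
    return {key: hits[key] / totals[key] for key in totals}
```

Calling the function once per single attribute and once with both attributes on the same records allows the worst-off single-attribute group to be compared with the worst-off intersectional group; the latter can be considerably lower.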

What is the considered time horizon of fairness? Fairness can be pursued in a static way (or: one-shot ), or dynamically over time, taking into account shifts in the item catalog, user tastes, etc. However, practically we observe a prevalence of the former, with the latter including new trends like reinforcement learning-based approaches.

What are the causes of unfairness? The dominant pattern of fairness-enhancing approaches seems to pursue a static, associative, group-level notion of fairness, inherited from traditional fair ML research. Hence, papers considering relatively new approaches such as causal inference and long-term fairness are rarer. We can describe this as a research gap, i.e., there should be more research into the causes of unfairness through the lens of causality and counterfactuals.

Fig. 12. Taxonomy and landscape

5 Discussion

Summary of Main Observations Due to today’s broad and increasing use of AI in practical applications, questions relating to the potential harms of AI-powered systems have received more and more attention in recent years, in academic research, in the tech industry, and within political organizations. Fairness is often considered a central component of what is sometimes called responsible AI. These developments can also be seen in the area of recommender systems, where we observed a strong increase in terms of publications on fairness since the mid-2010s, cf. Fig. 1.

Looking closer at the research contributions from the field of computer science, we observe that the large majority of works aim to provide technical solutions, and that the technical contributions are predominantly fairness-aware algorithms (cf. Fig. 2 and Fig. 9). In contrast, only comparably limited research activity seems to take place on topics that go beyond the algorithmic perspective, such as user interfaces and human-in-the-loop approaches, or even beyond computer science (as applied to AI in general, and recommender systems in particular), such as psychology, economics, or social sciences. While algorithmic research is certainly important, focusing almost exclusively on improving algorithms in terms of optimizing an abstract computational fairness metric may be too limited. Ultimately, our goal should rather be to design “algorithmic systems that support human values” (Narayanan 2018) and avoid potential abstraction traps, as in the general area of fair ML.

On the positive side, we find that researchers in fair RS are addressing various notions of fairness (cf. Figs.  3 to 6 ), e.g., they deal with questions both of individual fairness and of group fairness. In addition, the community has expanded the scope of fairness considerations beyond its impact on people and has developed various approaches to deal with fairness towards items and providers. This is different from many other traditional application areas of fair ML, e.g., credit default prediction, where people are usually the main focus of research, even though these concepts of item fairness are ultimately always related to people (or organizations) in the end, because the item providers are the ones impacted when their items are not recommended.

Looking at the considered application domains and datasets, we observe that various domains are addressed. However, the large majority of technical papers report experiments with datasets from the media domain (videos and music), cf. Fig.  7 . Specifically, some of the MovieLens datasets are frequently used either as a concrete use case or as a way to at least provide reproducible results, given that the set of fairness aspects that can be reasonably studied with such datasets seems limited. All in all, there seems to be a certain lack of real-world datasets for real-world fairness problems, which is why researchers frequently also rely on synthetic data or on protected groups that are artificially introduced into a given recommendation dataset.

In terms of the research methodology, offline experiments using the described datasets are the method of choice for most researchers, cf. Fig.  8 . Only very few works rely on studies that have the human in the loop, which points to a major research gap in fair recommender systems . In the context of these offline evaluations, a rich variety of evaluation approaches and computational metrics are used. The way the research problems are operationalized however often appears to be an oversimplification of the underlying problem. In many research works, for example, (popularity) biases are equated with unfairness, which we believe is not necessarily the case in general. Some of the surveyed works also seem to “re-brand” existing research on beyond-accuracy quality aspects of recommendations—e.g., on diversity or calibration—as fairness research, sometimes missing a clear and detailed discussion of the underlying normative claims that are addressed. Finally, in almost all works some “gold standard” for fair recommendations is assumed to be given, e.g., in the form of a target distribution regarding item exposures. With the goal of providing generic algorithmic solutions, little or no guidance is however usually provided on how to decide or determine this gold standard for a given use case. While general-purpose solutions are certainly desirable, the danger of being stuck in an abstraction trap with limited practical impact increases (Selbst et al. 2019 ; Jannach and Bauer 2020 ).

Future Directions Our analysis of the current research landscape points to a number of further research gaps. Considering the type of contributions and the different notions of fairness, we find that today’s research efforts are not balanced. Most published works are algorithmic contributions and use offline evaluations with a variety of proxy metrics to assess fairness. Less discussion is provided regarding how the different types of content used in mainstream recommender systems (e.g., user-generated content, expert-generated content, and audio) (Moscati et al. 2022; Deldjoo et al. 2021d) are susceptible to the promotion of certain types of biases and unfairness, e.g., audio content could suffer more from an accuracy standpoint but could promote the recommendation of long-term items more effectively. Moreover, these offline evaluations are based on one particular point in time. As such, these evaluations do not consider longitudinal dynamics that may emerge (a) when the fairness goals change over time or (b) when an algorithm’s output changes over time, e.g., when a fairness intervention gradually improves the recommendations. This limitation of static offline evaluations is also increasingly acknowledged in the general recommender systems literature. Simulation approaches have recently often been considered as one promising way to model such longitudinal dynamics (Ghanem et al. 2022; Rohde et al. 2018; Mladenov et al. 2021; Zhou et al. 2021). Causal models, in contrast to associative ones, have also received very limited research attention so far.

Through our survey, we furthermore identified a number of promising research problems for which only few works exist so far:

Challenge 1: Achieving realistic and useful definitions for fairness. As discussed before, there are several definitions for fairness, not only in the RS literature but in ML and AI in general (Olteanu et al. 2019). This leads to incompatibility between some of these definitions and to potential disagreement, where one metric may conclude that a recommender system is fair and another the opposite, even from a mathematical point of view (Chouldechova 2017). As a consequence, it is not easy to find a proper balance between different notions of fairness and the performance of the recommendation models. An example of a relevant proposal can be found in Liu et al. (2020), where the authors employ metrics that capture the cumulative reward in a way that combines accuracy and fairness while aiming to improve both. This is a rich area of investigation, open to novel definitions and approaches about how to leverage this tradeoff and whether one dimension should weigh more than the other (Friedler et al. 2021; Chouldechova 2017; Kleinberg et al. 2017).

However, this is not the only problem we have identified in our literature review. As stated in Sect. 4.5, the rare use of user studies and field tests makes it very difficult to incorporate user perception (Ferwerda et al. 2023) into our understanding of what should be defined as a fair recommendation. In fact, some works propose to move from notions of equality to those of equity and independence (Amigó et al. 2023), but even these general definitions, which may work at a societal level, may not necessarily make sense depending on the domain or the user needs.

Challenge 2: Building on appropriate data to assess fairness. As discussed in Sect. 4.4, some datasets used in the literature do not contain sensitive attributes at all. This problem has been addressed in different ways, none of them perfect, but all useful steps towards the goal of mimicking the evaluation of recommender systems in realistic scenarios. A first possibility is to perform data augmentation, where the main idea is, without changing the underlying data and algorithm, to be able to remove biases from the data to provide higher-quality information to the algorithms (Rastegarpanah et al. 2019). Another, more popular, possibility is to use simulation instead of real-world datasets. Various recent papers use simulation, sampling techniques (see, e.g., the work by Deldjoo et al. (2021b) investigating the impact of data characteristics), and synthetic data to evaluate fairness in search scenarios (Geyik et al. 2019). This may require more advanced techniques in the evaluation step, such as counterfactual evaluation, in order to properly interpret the data coming from A/B logged interactions once interventions have been performed through a recommendation algorithm, for example, by focusing on improving item exposure (Mehrotra et al. 2018).

Challenge 3: Understanding fairness in reciprocal settings. Maintaining the utility of stakeholders in reciprocal settings is a new notion of fairness (Xia et al. 2019 ), even though reciprocal recommender systems have been studied (although not as frequently as other systems) in the past and remain at the core of social network and matching platforms, see (Koprinska and Yacef 2015 ) for a survey on people-to-people recommender systems. In the former work, Xia et al. define fairness as an equilibrium between parties where there are ’buyers’ and ’sellers’ and each seller has the same value or ’price’; hence, in their notion of “Walrasian Equilibrium” they are treated fairly by considering at the same time (a) the disparity of service, (b) the similarity of mutual preference, and (c) the equilibrium of demand and supply, that is, by balancing the demand of buyers and the supply of sellers.

Considering the importance of this type of system, being able to operationalize a reasonable definition for this context is foreseen as a major challenge to tackle in the future. In fact, going beyond these notions of equilibrium for reciprocal settings, for example towards cooperative behaviors and non-zero-sum games, would require digging further into game theory and related areas, which would be potential avenues for future research.

Challenge 4: Fairness auditing. As stated in Koshiyama et al. ( 2022 ), algorithm auditing is the research and practice of assessing, mitigating, and assuring an algorithm’s legality, ethics, and safety. In that work, the authors consider bias and discrimination as one of the main verticals of algorithm auditing. Hence, auditing recommender systems should become a priority in the near future, and the fairness dimension is, by definition, one of the most important aspects to be considered in that process. As an example, we want to highlight that the authors from Krafft et al. ( 2020 ) aimed at auditing decision making systems, but faced important issues since their agents were banned from the platform that was meant to be analyzed (Facebook NewsFeed). Hence, there are technical difficulties that may make this challenge even harder to achieve, despite its importance in legal and ethical dimensions. Because of this, we argue that, in order to be practical and potentially address this challenge, such requirements should be enforced from higher levels or even policies, otherwise companies may not embrace this type of accountability.

Finally, a fundamental problem of current research on fair recommender systems is that it is not yet entirely clear how impactful it is in practice. Algorithmic research is too often based on a very abstract and probably overly simplistic operationalization of the research problem, using computational metrics for which it is not clear whether they are good proxies for fairness in a particular problem setting. In such a research approach, fundamental questions of what constitutes a fair recommendation in a given situation are not discussed. Correspondingly, the choice of application domains sometimes seems arbitrary (based on dataset availability), and the fairness challenges often appear almost artificial. Moreover, connections to existing works and theories developed in the social sciences are rarely established in the published literature, and fairness is often simply treated as an algorithmic problem, e.g., to make recommendations that match a pre-defined target distribution. In some ways, current research shares challenges with many works in the area of Explainable AI, where many insights from the social sciences exist, and where it is often neglected that explainable AI, like recommendation, is to a large extent a problem of human-computer interaction (Miller 2019). As a consequence, much more fundamental research on fairness, its definition in a given problem setting, and its perception by the involved stakeholders is needed. This, in turn, requires a multidisciplinary approach, involving not only researchers from different areas of computer science, but also subject-matter experts from real-world problem settings and scholars from fields outside computer science, such as psychology and social science.
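To make concrete what treating fairness as matching a pre-defined target distribution can look like computationally, the following minimal Python sketch compares the share of recommendation slots each item group receives with a target share. The group labels, the 50/50 target, the function names, and the choice of KL divergence as the distance are all assumptions made for illustration; they are not taken from any of the surveyed works.

```python
import math
from collections import Counter


def exposure_distribution(recommended_groups, groups):
    """Return the share of recommendation slots that each group receives."""
    total = len(recommended_groups)
    if total == 0:
        raise ValueError("recommendation list must not be empty")
    counts = Counter(recommended_groups)
    return {g: counts.get(g, 0) / total for g in groups}


def kl_divergence(target, observed, eps=1e-9):
    """KL(target || observed); eps avoids log(0) for groups with no exposure."""
    return sum(
        p * math.log(p / max(observed[g], eps))
        for g, p in target.items()
        if p > 0
    )


# Hypothetical top-10 list, described only by the group of each item.
recs = ["popular"] * 8 + ["niche"] * 2
# Hypothetical target: both groups should receive half of the slots.
target = {"popular": 0.5, "niche": 0.5}

observed = exposure_distribution(recs, target.keys())
gap = kl_divergence(target, observed)  # 0 only if the list matches the target
```

A value of `gap` close to zero says only that the list matches the chosen target. Whether that target is an appropriate notion of fairness in a given application is exactly the question that such a metric leaves open.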

https://eur-lex.europa.eu/legal-content/en/TXT/?uri=COM:2020:825:FIN .

Consider Ashokan and Haas (2021), where the authors show that biases may occur throughout a typical machine learning pipeline, from data generation, through model building and evaluation, to deployment and user interaction.

The term redlining (Corbett-Davies and Goel 2018) is analogous to the concept of indirect unfairness, wherein a non-sensitive characteristic (such as geography) is used as a proxy for a more personal quality (such as race or socioeconomic status).

We note here that our work is not intended to be a systematic literature review in the strict sense of Kitchenham et al. (2009), but rather aims to outline a broader picture of current research activities.

https://dblp.org/ .

Note that DBLP indexes arXiv papers.

A number of related events have been recently connected through the ACM FAccT Network, https://facctconference.org/network/ .

Each paper was categorized by at least two researchers, and potential discrepancies were resolved through a discussion process. The same process was applied to categorize the papers also in other dimensions as discussed later in this section.

We should note that we found no example where the reliability of these implicitly computed attributes was analyzed. Usually, authors use explicit thresholds to assign users/items to groups (Li et al. 2021a; Xiao et al. 2020) or percentiles from distributions based on a variable of interest, such as item popularity (Abdollahpouri et al. 2021; Deldjoo et al. 2021a).

It should be noted that if decisions were based on the protected gender attribute, this would not constitute individual fairness. In the discussed work, however, the goal is to treat individuals with similar attributes similarly (without considering the gender attribute). This then represents an approach towards individual fairness according to the definition.

There are, however, some strategies that are not fair, e.g., dictatorship, where one member decides for the group (Masthoff and Delic 2022).

The categorization of the papers was based on the datasets that were used for the empirical evaluations. We used higher-level categories of domains as done in earlier surveys, e.g., in Nunes and Jannach (2017) and Jannach et al. (2012).

We note that in these tables we only provide individual examples of works that used a particular metric.

The level of reproducibility of research work can be assessed in multiple dimensions; see Gundersen and Kjensmo (2018). In the context of our work, we limit ourselves to the analysis of certain central artifacts that are publicly shared.

Abdollahpouri, H., Burke, R.: Multi-stakeholder recommendation and its connection to multi-sided fairness. In: Proceedings of the Workshop on Recommendation in Multi-stakeholder Environments co-located with the 13th ACM Conference on Recommender Systems (RecSys 2019), CEUR Workshop Proceedings, vol. 2440, (2019)

Abdollahpouri, H., Burke, R., Mobasher, B.: Managing popularity bias in recommender systems with personalized re-ranking. In: Proceedings of the Thirty-Second International Florida Artificial Intelligence Research Society Conference, pp. 413–418, (2019a)

Abdollahpouri, H., Mansoury, M., Burke, R., Mobasher, B.: The unfairness of popularity bias in recommendation. In: Proceedings of the Workshop on Recommendation in Multi-stakeholder Environments co-located with the 13th ACM Conference on Recommender Systems (RecSys 2019), vol. 2440, (2019b)

Abdollahpouri, H., Adomavicius, G., Burke, R., Guy, I., Jannach, D., Kamishima, T., Krasnodebski, J., Pizzato, L.: Multistakeholder recommendation: Survey and research directions. User Model. User-Adap. Inter. 30 , 127–158 (2020)

Abdollahpouri, H., Mansoury, M., Burke, R., Mobasher, B.: The connection between popularity bias, calibration, and fairness in recommendation. In: Fourteenth ACM Conference on Recommender Systems, pp. 726–731, (2020b)

Abdollahpouri, H., Mansoury, M., Burke, R., Mobasher, B., Malthouse, E.C.: User-centered evaluation of popularity bias in recommender systems. In: Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, UMAP 2021, ACM, pp. 119–129, (2021)

Adomavicius, G., Tuzhilin, A.: Context-aware recommender systems. In: Ricci F, Rokach L, Shapira B (eds) Recommender Systems Handbook, pp. 191–226, (2015)

Adomavicius, G., Jannach, D., Leitner, S., Zhang, J.: Understanding longitudinal dynamics of recommender systems with agent-based modeling and simulation. In: SimuRec Workshop at ACM RecSys 2021, (2021)

Amigó, E., Deldjoo, Y., Mizzaro, S., Bellogín, A.: A unifying and general account of fairness measurement in recommender systems. Inf. Process. Manag. 60 (1), 103115 (2023)

Anelli, V.W., Belli, L., Deldjoo, Y., Di Noia, T., Ferrara, A., Narducci, F., Pomo, C.: Pursuing privacy in recommender systems: the view of users and researchers from regulations to applications. In: Fifteenth ACM Conference on Recommender Systems, pp. 838–841, (2021)

Anelli, V.W., Deldjoo, Y., Di Noia, T., Malitesta, D., Paparella, V., Pomo, C.: Auditing consumer- and producer-fairness in graph collaborative filtering. In: Proceedings ECIR ’23, (2023)

Angwin, J., Larson, J., Mattu, S., Kirchner, L.: Machine bias. In: Ethics of Data and Analytics, Auerbach Publications, pp. 254–264, (2016)

Ashokan, A., Haas, C.: Fairness metrics and bias mitigation strategies for rating predictions. Inf. Process. Manag. 58 (5), 102646 (2021)

Baeza-Yates, R.: Bias on the web. Commun. ACM 61 (6), 54–61 (2018)

Barocas, S., Hardt, M., Narayanan, A.: Fairness and Machine Learning. fairmlbook.org, (2019), http://www.fairmlbook.org

Bellogín, A., Said, A.: Improving accountability in recommender systems research through reproducibility. User Model User Adapt Interact 31 (5), 941–977 (2021)

Beutel, A., Chen, J., Doshi, T., Qian, H., Wei, L., Wu, Y., Heldt, L., Zhao, Z., Hong, L., Chi, E.H., Goodrow, C.: Fairness in recommendation ranking through pairwise comparisons. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, pp. 2212–2220, (2019)

Binns, R.: On the apparent conflict between individual and group fairness. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, pp. 514–524, (2020)

Bobadilla, J., Lara-Cabrera, R., González-Prieto, Á., Ortega, F.: Deepfair: deep learning for improving fairness in recommender systems. Int. J. Interact Multim. Artif. Intell. 6 (6), 86–94 (2021)

Boratto, L., Fenu, G., Marras, M.: Connecting user and item perspectives in popularity debiasing for collaborative recommendation. Inf. Process. Manag. 58 (1), 102387 (2021)

Boratto, L., Fenu, G., Marras, M.: Interplay between upsampling and regularization for provider fairness in recommender systems. User Model User Adapt Interact 31 (3), 421–455 (2021)

Borges, R., Stefanidis, K.: On mitigating popularity bias in recommendations via variational autoencoders. In: SAC ’21: The 36th ACM/SIGAPP Symposium on Applied Computing, pp. 1383–1389, (2021)

Buolamwini, J., Gebru, T.: Gender shades: Intersectional accuracy disparities in commercial gender classification. In: Conference on Fairness, Accountability and Transparency, PMLR, pp. 77–91, (2018)

Burke, R.: Multisided fairness for recommendation. In: 4th Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017), (2017)

Burke, R., Sonboli, N., Mansoury, M., Ordoñez-Gauger, A.: Balanced neighborhoods for fairness-aware collaborative recommendation, (2017)

Burke, R., Sonboli, N., Ordonez-Gauger, A.: Balanced neighborhoods for multi-sided fairness in recommendation. In: Conference on Fairness, Accountability and Transparency, FAT 2018, Proceedings of Machine Learning Research, vol. 81, pp. 202–214, (2018)

Chakraborty, A., Messias, J., Benevenuto, F., Ghosh, S., Ganguly, N., Gummadi, K.P.: Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations. In: Proceedings of the Eleventh International Conference on Web and Social Media, ICWSM 2017, pp. 22–31, (2017)

Chakraborty, A., Patro, G.K., Ganguly, N., Gummadi, K.P., Loiseau, P.: Equality of voice: Towards fair representation in crowdsourced top-k recommendations. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* 2019, pp. 129–138, (2019)

Chaudhari, H.A., Lin, S., Linda, O.: A general framework for fairness in multistakeholder recommendations. (2020), arXiv:2009.02423

Chen, J., Dong, H., Wang, X., Feng, F., Wang, M., He, X.: Bias and debias in recommender system: a survey and future directions. ACM Trans. Inf. Syst. (2022)

Chouldechova, A.: Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5 (2), 153–163 (2017)

Cooper, A.F.: Where is the normative proof? assumptions and contradictions in ML fairness research. (2020), CoRR arXiv:2010.10407

Corbett-Davies, S., Goel, S.: The measure and mismeasure of fairness: a critical review of fair machine learning. (2018), CoRR arXiv:1808.00023

Cornacchia, G., Narducci, F., Ragone, A.: A general model for fair and explainable recommendation in the loan domain. In: Joint Workshop Proceedings of the 3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) and the 5th Edition of Recommendation in Complex Environments (ComplexRec) co-located with 15th ACM Conference on Recommender Systems (RecSys 2021), (2021)

National Research Council, et al.: Measuring Racial Discrimination. National Academies Press, London (2004)

Cremonesi, P., Jannach, D.: Progress in recommender systems research: crisis? What crisis? AI Mag. 42 (3), 43–54 (2021)

da Silva, D.C., Manzato, M.G., Durão, F.A.: Exploiting personalized calibration and metrics for fairness recommendation. Expert Syst. Appl. 181 , 115112 (2021)

Dash, A., Chakraborty, A., Ghosh, S., Mukherjee, A., Gummadi, K.P.: When the umpire is also a player: Bias in private label product recommendations on e-commerce marketplaces. In: FAccT ’21: 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 873–884, (2021)

Deldjoo, Y., Anelli, V.W., Zamani, H., Kouki, A.B., Noia, T.D.: Recommender systems fairness evaluation via generalized cross entropy. In: Proceedings of the Workshop on Recommendation in Multi-stakeholder Environments co-located with the 13th ACM Conference on Recommender Systems (RecSys 2019), Copenhagen, Denmark, September 20, 2019, (2019)

Deldjoo, Y., Di Noia, T., Merra, F.A.: Adversarial machine learning in recommender systems (aml-recsys). In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 869–872, (2020)

Deldjoo, Y., Anelli, V.W., Zamani, H., Bellogin, A., Di Noia, T.: A flexible framework for evaluating user and item fairness in recommender systems. User Modeling and User-Adapted Interaction pp. 1–47, (2021a)

Deldjoo, Y., Bellogin, A., Di Noia, T.: Explaining recommender systems fairness and accuracy through the lens of data characteristics. Inf. Process. Manag. 58 (5), 102662 (2021)

Deldjoo, Y., Noia, T.D., Merra, F.A.: A survey on adversarial recommender systems: from attack/defense strategies to generative adversarial networks. ACM Comput. Surv. 54 (2), 1–38 (2021)

Deldjoo, Y., Schedl, M., Knees, P.: Content-driven music recommendation: evolution, state of the art, and challenges. (2021d), arXiv preprint arXiv:2107.11803

Deldjoo, Y., Nazary, F., Ramisa, A., McAuley, J., Pellegrini, G., Bellogin, A., Di Noia, T.: A review of modern fashion recommender systems. ACM Comput. Surv. (2023)

Delic, A., Neidhardt, J., Nguyen, T.N., Ricci, F.: An observational user study for group recommender systems in the tourism domain. J. Inf. Technol. Tour. 19 (1–4), 87–116 (2018)

Di Noia, T., Tintarev, N., Fatourou, P., Schedl, M.: Recommender systems under European AI regulations. Commun. ACM 65 (4), 69–73 (2022)

Dong, Q., Xie, S., Li, W.: User-item matching for recommendation fairness. IEEE Access 9 , 130389–130398 (2021)

Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.S.: Fairness through awareness. In: Goldwasser S (ed) Innovations in Theoretical Computer Science 2012, pp. 214–226, (2012)

Edizel, B., Bonchi, F., Hajian, S., Panisson, A., Tassa, T.: Fairecsys: mitigating algorithmic bias in recommender systems. Int. J. Data Sci. Anal. 9 (2), 197–213 (2020)

Ekstrand, M.D., Das, A., Burke, R., Diaz, F.: Fairness in information access systems. Found. Trends Inf. Retr. 16 (1–2), 1–177 (2022)

Elahi, M., Jannach, D., Skjærven, L., Knudsen, E., Sjøvaag, H., Tolonen, K., Holmstad, Ø., Pipkin, I., Throndsen, E., Stenbom, A., Fiskerud, E., Oesch, A., Vredenberg, L., Trattner, C.: Towards responsible media recommendation. AI Ethics 2 , 103–114 (2022)

Farnadi, G., Kouki, P., Thompson, S.K., Srinivasan, S., Getoor, L.: A fairness-aware hybrid recommender system, (2018). arXiv:1809.09030

Felfernig, A., Boratto, L., Stettinger, M., Tkalčič, M.: Group Recommender Systems: An Introduction. Springer, Berlin (2018)

Ferraro, A.: Music cold-start and long-tail recommendation: bias in deep representations. In: Bogers T, Said A, Brusilovsky P, Tikk D (eds) Proceedings of the 13th ACM Conference on Recommender Systems, RecSys 2019, pp. 586–590, (2019)

Ferwerda, B., Ingesson, E., Berndl, M., Schedl, M.: I Don’t Care How Popular You Are! Investigating Popularity Bias From a User’s Perspective. In: Proceedings of the 8th ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2023), ACM, Austin, USA, (2023)

Friedler, S.A., Scheidegger, C., Venkatasubramanian, S.: The (im)possibility of fairness: different value systems require different mechanisms for fair decision making. Commun. ACM 64 (4), 136–143 (2021)

Friedman, A., Knijnenburg, B.P., Vanhecke, K., Martens, L., Berkovsky, S.: Privacy aspects of recommender systems. In: Recommender Systems Handbook, Springer, pp. 649–688, (2015)

Friedman, B., Nissenbaum, H.: Bias in computer systems. ACM Trans. Inf. Syst. 14 (3), 330–347 (1996)

Fu, Z., Xian, Y., Gao, R., Zhao, J., Huang, Q., Ge, Y., Xu, S., Geng, S., Shah, C., Zhang, Y., de Melo, G.: Fairness-aware explainable recommendation over knowledge graphs. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020, pp. 69–78, (2020)

Ge, Y., Liu, S., Gao, R., Xian, Y., Li, Y., Zhao, X., Pei, C., Sun, F., Ge, J., Ou, W., Zhang, Y.: Towards long-term fairness in recommendation. In: WSDM ’21, The Fourteenth ACM International Conference on Web Search and Data Mining, pp. 445–453, (2021)

Geyik, S.C., Ambler, S., Kenthapadi, K., Karypis, G.: Fairness-aware ranking in search & recommendation systems with application to linkedin talent search. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, pp. 2221–2231, (2019)

Ghanem, N., Leitner, S., Jannach, D.: Balancing consumer and business value of recommender systems: a simulation-based analysis. E-Commerce Research and Applications forthcoming, (2022)

Gharahighehi, A., Vens, C., Pliakos, K.: Fair multi-stakeholder news recommender system with hypergraph ranking. Inf. Process. Manag. 58 (5), 102663 (2021)

Ghosh, A., Dutt, R., Wilson, C.: When fair ranking meets uncertain inference. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1033–1043, (2021a)

Ghosh, A., Genuit, L., Reagan, M.: Characterizing intersectional group fairness with worst-case comparisons. In: Artificial Intelligence Diversity, Belonging, Equity, and Inclusion, PMLR, pp. 22–34, (2021b)

Giannakas, T., Sermpezis, P., Giovanidis, A., Spyropoulos, T., Arvanitakis, G.: Fairness in network-friendly recommendations. In: 22nd IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks, WoWMoM 2021, pp. 71–80, (2021)

Gómez, E., Zhang, C.S., Boratto, L., Salamó, M., Marras, M.: The winner takes it all: Geographic imbalance and provider (un)fairness in educational recommender systems. In: SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1808–1812, (2021)

Gorantla, S., Deshpande, A., Louis, A.: On the problem of underranking in group-fair ranking. In: Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Proceedings of Machine Learning Research, vol. 139, pp. 3777–3787, (2021)

Grgic-Hlaca, N., Zafar, M.B., Gummadi, K.P., Weller, A.: The case for process fairness in learning: Feature selection for fair decision making. In: NIPS Symposium on Machine Learning and the Law, Barcelona, Spain, vol. 1, pp. 2, (2016)

Gunawardana, A., Shani, G., Yogev, S.: Evaluating recommender systems. In: Rokach, L., Shapira, B., Ricci, F. (eds.) Recommender Systems Handbook, pp. 547–601. Springer, Berlin (2022)

Gundersen, O.E., Kjensmo, S.: State of the art: Reproducibility in artificial intelligence. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), pp. 1644–1651, (2018)

Gupta, A., Johnson, E., Payan, J., Roy, A.K., Kobren, A., Panda, S., Tristan, J.B., Wick, M.: Online post-processing in rankings for fair utility maximization. In: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, WSDM ’21, pp. 454–462, (2021)

Hao, Q., Xu, Q., Yang, Z., Huang, Q.: Pareto optimality for fairness-constrained collaborative filtering. In: MM ’21: ACM Multimedia Conference, ACM, pp. 5619–5627, (2021)

Harper, F.M., Konstan, J.A.: The MovieLens datasets: history and context. ACM Trans. Interact Intell. Syst. 5(4), (2015)

Htun, N.N., Lecluse, E., Verbert, K.: Perception of fairness in group music recommender systems. In: 26th International Conference on Intelligent User Interfaces, pp. 302–306, (2021)

Jannach, D., Adomavicius, G.: Price and profit awareness in recommender systems. In: Proceedings of the ACM RecSys 2017 Workshop on Value-Aware and Multi-Stakeholder Recommendation, (2017)

Jannach, D., Bauer, C.: Escaping the McNamara fallacy: towards more impactful recommender systems research. AI Mag. 41 (4), 79–95 (2020)

Jannach, D., Jugovac, M.: Measuring the business value of recommender systems. ACM TMIS 10 (4), 1–23 (2019)

Jannach, D., Zanker, M., Felfernig, A., Friedrich, G.: Recommender Systems-An Introduction. Cambridge University Press, Cambridge (2010)

Jannach, D., Zanker, M., Ge, M., Gröning, M.: Recommender systems in computer science and information systems-a landscape of research. In: 13th International Conference on Electronic Commerce and Web Technologies (EC-Web 2012), pp. 76–87, (2012)

Jannach, D., Lerche, L., Kamehkhosh, I., Jugovac, M.: What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Model. User-Adap. Inter. 25 (5), 427–491 (2015)

Jannach, D., Resnick, P., Tuzhilin, A., Zanker, M.: Recommender systems-beyond matrix completion. Commun. ACM 59 (11), 94–102 (2016)

Jannach, D., Pu, P., Ricci, F., Zanker, M.: Recommender systems: past, present, future. AI Mag. 42 (3), 3–6 (2021)

Jugovac, M., Jannach, D., Lerche, L.: Efficient optimization of multiple recommendation quality factors according to individual user tendencies. Expert Syst. Appl. 81 , 321–331 (2017)

Kaya, M., Bridge, D., Tintarev, N.: Ensuring Fairness in Group Recommendations by Rank-Sensitive Balancing of Relevance, pp. 101–110, (2020)

Kirnap, Ö., Diaz, F., Biega, A., Ekstrand, M.D., Carterette, B., Yilmaz, E.: Estimation of fair ranking metrics with incomplete judgments. In: WWW ’21: The Web Conference 2021, pp. 1065–1075, (2021)

Kitchenham, B.A., Brereton, P., Budgen, D., Turner, M., Bailey, J., Linkman, S.G.: Systematic literature reviews in software engineering-a systematic literature review. Inf. Softw. Technol. 51 (1), 7–15 (2009)

Kleinberg, J.M., Mullainathan, S., Raghavan, M.: Inherent trade-offs in the fair determination of risk scores. In: 8th Innovations in Theoretical Computer Science Conference, ITCS, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, LIPIcs, vol. 67, pp. 43:1–43:23, (2017)

Koprinska, I., Yacef, K.: People-to-people reciprocal recommenders. In: Recommender Systems Handbook, Springer, pp. 545–567, (2015)

Koshiyama, A.S., Kazim, E., Treleaven, P.C.: Algorithm auditing: Managing the legal, ethical, and technological risks of artificial intelligence, machine learning, and associated algorithms. Computer 55 (4), 40–50 (2022)

Koutsopoulos, I., Halkidi, M.: Efficient and fair item coverage in recommender systems. In: 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), IEEE, pp. 912–918, (2018)

Krafft, T.D., Hauer, M.P., Zweig, K.A.: Why do we need to be bots? What prevents society from detecting biases in recommendation systems. In: Bias and Social Aspects in Search and Recommendation - First International Workshop, BIAS 2020, vol. 1245, pp. 27–34, (2020)

Kusner, M.J., Loftus, J., Russell, C., Silva, R.: Counterfactual fairness. In: Advances in Neural Information Processing Systems, vol. 30, (2017)

Li, Y., Chen, H., Fu, Z., Ge, Y., Zhang, Y.: User-oriented fairness in recommendation. In: Proceedings of The Web Conference 2021, WWW ’21, pp. 624–632, (2021a)

Li, Y., Chen, H., Xu, S., Ge, Y., Zhang, Y.: Towards personalized fairness based on causal notion. In: 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’21, pp. 1054–1063, (2021b)

Li, Y., Ge, Y., Zhang, Y.: Tutorial on fairness of machine learning in recommender systems. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2654–2657, (2021c)

Li, Y., Chen, H., Xu, S., Ge, Y., Tan, J., Liu, S., Zhang, Y.: Fairness in recommendation: a survey. (2022), CoRR arXiv:2205.13619

Lin, C., Liu, X., Xv, G., Li, H.: Mitigating sentiment bias for recommender systems. In: SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 31–40, (2021)

Lin, K., Sonboli, N., Mobasher, B., Burke, R.: Crank up the volume: Preference bias amplification in collaborative recommendation. In: Proceedings of the Workshop on Recommendation in Multi-stakeholder Environments co-located with the 13th ACM Conference on Recommender Systems (RecSys 2019), CEUR Workshop Proceedings, vol. 2440, (2019)

Lin, K., Sonboli, N., Mobasher, B., Burke, R.: Calibration in collaborative filtering recommender systems: a user-centered analysis. In: HT ’20: 31st ACM Conference on Hypertext and Social Media, pp. 197–206, (2020)

Liu, W., Liu, F., Tang, R., Liao, B., Chen, G., Heng, P.: Balancing between accuracy and fairness for interactive recommendation with reinforcement learning. In: Advances in Knowledge Discovery and Data Mining-24th Pacific-Asia Conference, PAKDD 2020, vol. 12084, pp. 155–167, (2020)

Malecek, L., Peska, L.: Fairness-preserving group recommendations with user weighting. In: Adjunct Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, pp. 4–9, (2021)

Mansoury, M., Mobasher, B., Burke, R., Pechenizkiy, M.: Bias disparity in collaborative recommendation: Algorithmic evaluation and comparison. In: Proceedings of the Workshop on Recommendation in Multi-stakeholder Environments co-located with the 13th ACM Conference on Recommender Systems (RecSys 2019), (2019)

Masthoff, J., Delic, A.: Group recommender systems: beyond preference aggregation. In: Rokach, L., Shapira, B., Kantor, P., Ricci, F. (eds.) Recommender Systems Handbook. Springer, Berlin (2022)

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A.: A survey on bias and fairness in machine learning. ACM Comput Surv 54(6), (2021)

Mehrotra, R., McInerney, J., Bouchard, H., Lalmas, M., Diaz, F.: Towards a fair marketplace: Counterfactual evaluation of the trade-off between relevance, fairness & satisfaction in recommendation systems. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, pp. 2243–2251, (2018)

Melchiorre, A.B., Rekabsaz, N., Parada-Cabaleiro, E., Brandl, S., Lesota, O., Schedl, M.: Investigating gender fairness of recommendation algorithms in the music domain. Inf. Process. Manag. 58 (5), 102666 (2021)

Miller, T.: Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 267 , 1–38 (2019)

Misztal-Radecka, J., Indurkhya, B.: Bias-aware hierarchical clustering for detecting the discriminated groups of users in recommendation systems. Information Processing & Management 58 (3), 102519 (2021)

Mladenov, M., Hsu, C., Jain, V., Ie, E., Colby, C., Mayoraz, N., Pham, H., Tran, D., Vendrov, I., Boutilier, C.: RecSim NG: toward principled uncertainty modeling for recommender ecosystems. (2021), CoRR arXiv:2103.08057

Morina, G., Oliinyk, V., Waton, J., Marusic, I., Georgatzis, K.: Auditing and achieving intersectional fairness in classification problems. (2019), arXiv preprint arXiv:1911.01468

Moscati, M., Parada-Cabaleiro, E., Deldjoo, Y., Zangerle, E., Schedl, M.: Music4all-onion: a large-scale multi-faceted content-centric music recommendation dataset. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM ’22), (2022)

Mulligan, D.K., Kroll, J.A., Kohli, N., Wong, R.Y.: This thing called fairness: Disciplinary confusion realizing a value in technology. Proc. ACM Hum. Comput. Interact. 3 (CSCW), 119:1–119:36 (2019)

Naghiaei, M., Rahmani, H.A., Deldjoo, Y.: CPFair: Personalized Consumer and Producer Fairness Re-ranking for Recommender Systems. In: SIGIR ’22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, (2022)

Narayanan, A.: 21 definitions of fairness and their politics. Tutorial at FAT* 2018, (2018)

Ntoutsi, E., Fafalios, P., Gadiraju, U., Iosifidis, V., Nejdl, W., Vidal, M., Ruggieri, S., Turini, F., Papadopoulos, S., Krasanakis, E., Kompatsiaris, I., Kinder-Kurlanda, K., Wagner, C., Karimi, F., Fernández, M., Alani, H., Berendt, B., Kruegel, T., Heinze, C., Broelemann, K., Kasneci, G., Tiropanis, T., Staab, S.: Bias in data-driven artificial intelligence systems - an introductory survey. WIREs Data Mining Knowl Discov 10(3), (2020)

Nunes, I., Jannach, D.: A systematic review and taxonomy of explanations in decision support and recommender systems. User-Modeling and User-Adapted Interaction 27 (3–5), 393–444 (2017)

Oh, J., Park, S., Yu, H., Song, M., Park, S.T.: Novel recommendation based on personal popularity tendency. In: ICDM ’11, pp. 507–516, (2011)

Olteanu, A., Castillo, C., Diaz, F., Kiciman, E.: Social data: Biases, methodological pitfalls, and ethical boundaries. Frontiers Big Data 2 , 13 (2019)

Patro, G.K., Biswas, A., Ganguly, N., Gummadi, K.P., Chakraborty, A.: Fairrec: Two-sided fairness for personalized recommendations in two-sided platforms. In: WWW ’20: The Web Conference 2020, pp. 1194–1204, (2020)

Pedreshi, D., Ruggieri, S., Turini, F.: Discrimination-aware data mining. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, pp. 560–568, (2008)

Pessach, D., Shmueli, E.: A review on fairness in machine learning. ACM Computing Surveys (CSUR) 55 (3), 1–44 (2022)

Pitoura, E., Stefanidis, K., Koutrika, G.: Fairness in rankings and recommendations: an overview. VLDB J. 31 (3), 431–458 (2022)

Qiu, R., Wang, S., Chen, Z., Yin, H., Huang, Z.: CausalRec: Causal Inference for Visual Debiasing in Visually-Aware Recommendation. In: MM ’21: ACM Multimedia Conference, pp. 3844–3852, (2021)

Rahmani, H.A., Deldjoo, Y., di Noia, T.: The role of context fusion on accuracy, beyond-accuracy, and fairness of point-of-interest recommendation systems. Expert Systems with Applications p. 117700, (2022a)

Rahmani, H.A., Deldjoo, Y., Tourani, A., Naghiaei, M.: The unfairness of active users and popularity bias in point-of-interest recommendation. In: Advances in Bias and Fairness in Information Retrieval - Third International Workshop, BIAS 2022, Springer, Communications in Computer and Information Science, vol. 1610, pp. 56–68, (2022b)

Rahmani, H.A., Naghiaei, M., Dehghan, M., Aliannejadi, M.: Experiments on generalizability of user-oriented fairness in recommender systems. In: SIGIR ’22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2755–2764, (2022c)

Rahmani, H.A., Naghiaei, M., Tourani, A., Deldjoo, Y.: Exploring the impact of temporal bias in point-of-interest recommendation. In: Proceedings of the 16th ACM Conference on Recommender Systems, (2022d)

Rastegarpanah, B., Gummadi, K.P., Crovella, M.: Fighting fire with fire: Using antidote data to improve polarization and fairness of recommender systems. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 231–239, (2019)

Rawls, J.: Justice as fairness: A restatement. Harvard University Press (2001)

Riederer, C., Chaintreau, A.: The price of fairness in location based advertising. In: FATREC’17, (2017)

Rohde, D., Bonner, S., Dunlop, T., Vasile, F., Karatzoglou, A.: Recogym: A reinforcement learning environment for the problem of product recommendation in online advertising. arXiv preprint arXiv:1808.00720, (2018)

Schelenz, L.: Diversity-aware Recommendations for Social Justice? Exploring User Diversity and Fairness in Recommender Systems. In: Adjunct Publication of the 29th ACM Conference on User Modeling, Adaptation and Personalization, UMAP 2021, pp. 404–410, (2021)

Selbst, A.D., Boyd, D., Friedler, S.A., Venkatasubramanian, S., Vertesi, J.: Fairness and abstraction in sociotechnical systems. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, pp. 59–68, (2019)

Serbos, D., Qi, S., Mamoulis, N., Pitoura, E., Tsaparas, P.: Fairness in package-to-group recommendations. In: Proceedings of the 26th International Conference on World Wide Web, WWW 2017, pp. 371–379, (2017)

Seymen, S., Abdollahpouri, H., Malthouse, E.C.: A unified optimization toolbox for solving popularity bias, fairness, and diversity in recommender systems. In: Proceedings of the 1st Workshop on Multi-Objective Recommender Systems (MORS 2021) co-located with 15th ACM Conference on Recommender Systems (RecSys 2021), CEUR Workshop Proceedings, vol. 2959, (2021)

Shakespeare, D., Porcaro, L., Gómez, E., Castillo, C.: Exploring artist gender bias in music recommendation. In: Proceedings of the Workshops on Recommendation in Complex Scenarios and the Impact of Recommender Systems co-located with 14th ACM Conference on Recommender Systems (RecSys 2020), CEUR Workshop Proceedings, vol. 2697, (2020)

Shen, T., Li, J., Bouadjenek, M.R., Mai, Z., Sanner, S.: Towards understanding and mitigating unintended biases in language model-driven conversational recommendation. Information Processing and Management In press, (2023)

Shrestha, Y.R., Yang, Y.: Fairness in algorithmic decision-making: Applications in multi-winner voting, machine learning, and recommender systems. Algorithms 12 (9), 199 (2019)

Slokom, M., Hanjalic, A., Larson, M.: Towards user-oriented privacy for recommender system data: A personalization-based approach to gender obfuscation for user profiles. Information Processing & Management 58 (6), 102722 (2021)

Sonboli, N., Burke, R., Mattei, N., Eskandanian, F., Gao, T.: “and the winner is...”: Dynamic lotteries for multi-group fairness-aware recommendation. In: FAccTRec Workshop: Responsible Recommendation (RecSys ’20), (2020)

Sonboli, N., Smith, J.J., Cabral Berenfus, F., Burke, R., Fiesler, C.: Fairness and transparency in recommendation: The users’ perspective. In: Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, pp 274–279, (2021)

Srivastava, M., Heidari, H., Krause, A.: Mathematical notions vs. human perception of fairness: A descriptive approach to fairness for machine learning. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, pp 2459–2468, (2019)

Steck, H.: Calibrated recommendations. In: Proceedings of the 12th ACM Conference on Recommender Systems, pp 154–162, (2018)

Stratigi, M., Kondylakis, H., Stefanidis, K.: Fairness in group recommendations in the health domain. In: 33rd IEEE International Conference on Data Engineering, ICDE 2017, pp 1481–1488, (2017)

Stratigi, M., Nummenmaa, J., Pitoura, E., Stefanidis, K.: Fair sequential group recommendations. In: Proceedings of the 35th Annual ACM Symposium on Applied Computing, pp 1443–1452, (2020)

Sühr, T., Hilgard, S., Lakkaraju, H.: Does Fair Ranking Improve Minority Outcomes? Understanding the Interplay of Human and Algorithmic Biases in Online Hiring, pp 989–999, (2021)

Sun, W., Khenissi, S., Nasraoui, O., Shafto, P.: Debiasing the human-recommender system feedback loop in collaborative filtering. In: Companion of The 2019 World Wide Web Conference, WWW 2019, ACM, pp 645–651, (2019)

Tintarev, N., Masthoff, J.: Beyond explaining single item recommendations. In: Recommender Systems Handbook, Springer, pp 711–756, (2022)

Trattner, C., Jannach, D., Motta, E., Meijer, I.C., Diakopoulos, N., Elahi, M., Opdahl, A.L., Tessem, B., Borch, N., Fjeld, M., Øvrelid, L., Smedt, K.D., Moe, H.: Responsible Media Technology and AI: Challenges and Research Directions. AI and Ethics 2, 585–594 (2022)

Tsintzou, V., Pitoura, E., Tsaparas, P.: Bias disparity in recommendation systems. In: Proceedings of the Workshop on Recommendation in Multi-stakeholder Environments co-located with the 13th ACM Conference on Recommender Systems (RecSys 2019), vol 2440, (2019)

Verma, S., Rubin, J.: Fairness definitions explained. In: Brun Y, Johnson B, Meliou A (eds) Proceedings of the International Workshop on Software Fairness, FairWare@ICSE 2018, pp 1–7, (2018)

Verma, S., Gao, R., Shah, C.: Facets of fairness in search and recommendation. In: Bias and Social Aspects in Search and Recommendation - First International Workshop, BIAS 2020, Communications in Computer and Information Science, vol 1245, pp 1–11, (2020)

Wan, M., Ni, J., Misra, R., McAuley, J.: Addressing marketing bias in product recommendations. In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp 618–626, (2020)

Wang, C., Wang, K., Bian, A., Islam, R., Keya, K.N., Foulds, J.R., Pan, S.: Do humans prefer debiased AI algorithms? A case study in career recommendation. In: IUI 2022: 27th International Conference on Intelligent User Interfaces, pp 134–147, (2022a)

Wang, X., Thain, N., Sinha, A., Prost, F., Chi, E.H., Chen, J., Beutel, A.: Practical compositional fairness: Understanding fairness in multi-component recommender systems. In: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp 436–444, (2021)

Wang, Y., Ma, W., Zhang, M., Liu, Y., Ma, S.: A survey on the fairness of recommender systems. ACM TOIS forthcoming, (2022b)

Weydemann, L., Sacharidis, D., Werthner, H.: Defining and measuring fairness in location recommendations. In: Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-based Recommendations, Geosocial Networks and Geoadvertising, LocalRec@SIGSPATIAL 2019, pp 6:1–6:8, (2019)

Wu, C., Wu, F., Wang, X., Huang, Y., Xie, X.: Fairness-aware news recommendation with decomposed adversarial learning. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, pp 4462–4469, (2021a)

Wu, Y., Cao, J., Xu, G., Tan, Y.: TFROM: A two-sided fairness-aware recommendation model for both customers and providers. In: SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, pp 1013–1022, (2021b)

Wundervald, B.D.: Cluster-based quotas for fairness improvements in music recommendation systems. Int J Multim Inf Retr 10 (1), 25–32 (2021)

Xia, B., Yin, J., Xu, J., Li, Y.: We-rec: A fairness-aware reciprocal recommendation based on walrasian equilibrium. Knowl Based Syst 182, (2019)

Xiao, B., Benbasat, I.: E-commerce product recommendation agents: Use, characteristics, and impact. MIS Q. 31 (1), 137–209 (2007)

Xiao, Y., Pei, Q., Yao, L., Yu, S., Bai, L., Wang, X.: An enhanced probabilistic fairness-aware group recommendation by incorporating social activeness. J. Netw. Comput. Appl. 156, 102579 (2020)

Yadav, H., Du, Z., Joachims, T.: Policy-gradient training of fair and unbiased ranking functions. In: SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 1044–1053, (2021)

Yao, S., Huang, B.: Beyond parity: Fairness objectives for collaborative filtering. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS ’17, pp 2925–2934, (2017)

Zafar, M.B., Valera, I., Gomez Rodriguez, M., Gummadi, K.P.: Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In: Proceedings of the 26th International Conference on World Wide Web, pp 1171–1180, (2017)

Zehlike, M., Yang, K., Stoyanovich, J.: Fairness in ranking, part i: Score-based ranking. ACM Comput Surv Just Accepted, (2022a)

Zehlike, M., Yang, K., Stoyanovich, J.: Fairness in ranking, part ii: Learning-to-rank and recommender systems. ACM Comput Surv forthcoming, (2022b)

Zhang, J., Bareinboim, E.: Fairness in decision-making—the causal explanation formula. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 32, (2018)

Zhao, Z., Chen, J., Zhou, S., He, X., Cao, X., Zhang, F., Wu, W.: Popularity bias is not always evil: Disentangling benign and harmful bias for recommendation. IEEE Transactions on Knowledge & Data Engineering (01):1–13, (2022)

Zheng, Y., Dave, T., Mishra, N., Kumar, H.: Fairness in reciprocal recommendations: A speed-dating study. In: Adjunct Publication of the 26th Conference on User Modeling, Adaptation and Personalization, UMAP 2018, pp 29–34, (2018)

Zhou, M., Zhang, J., Adomavicius, G.: Longitudinal impact of preference biases on recommender systems’ performance. Kelley School of Business (2021-10), (2021)

Zhu, Q., Zhou, A., Sun, Q., Wang, S., Yang, F.: FMSR: A fairness-aware mobile service recommendation method. In: 2018 IEEE International Conference on Web Services, ICWS 2018, San Francisco, CA, USA, July 2-7, 2018, IEEE, pp 171–178, (2018a)

Zhu, Q., Sun, Q., Li, Z., Wang, S.: FARM: A fairness-aware recommendation method for high visibility and low visibility mobile apps. IEEE Access 8, 122747–122756 (2020)

Zhu, Z., Hu, X., Caverlee, J.: Fairness-aware tensor-based recommendation. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, pp 1153–1162, (2018b)

Zhu, Z., Wang, J., Zhang, Y., Caverlee, J.: Fairness-aware recommendation of information curators. (2018c), arXiv:1809.03040

Zhu, Z., Wang, J., Caverlee, J.: Measuring and mitigating item under-recommendation bias in personalized ranking systems. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 449–458, (2020b)

Zhu, Z., Kim, J., Nguyen, T., Fenton, A., Caverlee, J.: Fairness among new items in cold start recommender systems. In: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’21, pp 767–776, (2021)

Acknowledgements

The authors thank the reviewers for their thoughtful comments and suggestions.

Open access funding provided by Politecnico di Bari within the CRUI-CARE Agreement.

Author information

Authors and affiliations.

Polytechnic University of Bari, Bari, Italy

Yashar Deldjoo, Alessandro Difonzo & Dario Zanzonelli

University of Klagenfurt, Klagenfurt, Austria

Dietmar Jannach

University Autonomous of Madrid, Madrid, Spain

Alejandro Bellogin

Contributions

YD: conceptualization, methodology, investigation, writing, review & editing; DJ: conceptualization, methodology, investigation, writing, review & editing; AB: conceptualization, methodology, investigation, writing, review & editing; AD: investigation; DZ: investigation. All authors contributed to the article and approved the submitted version.

Corresponding author

Correspondence to Yashar Deldjoo.

Ethics declarations

Conflict of interest.

The authors declare no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Deldjoo, Y., Jannach, D., Bellogin, A. et al. Fairness in recommender systems: research landscape and future directions. User Model User-Adap Inter 34, 59–108 (2024). https://doi.org/10.1007/s11257-023-09364-z

Received: 27 June 2022

Accepted: 17 March 2023

Published: 24 April 2023

Issue Date: March 2024

DOI: https://doi.org/10.1007/s11257-023-09364-z

Research Topics in Recommender Systems

Postgraduate course

Course description

Objectives and Content

Learning Outcomes

A student who has completed the course should have the following learning outcomes defined in terms of knowledge, skills and general competence:

The candidate

  • has fundamental knowledge about the central concepts behind recommender systems
  • has broad knowledge about state-of-the-art recommender system algorithms
  • has extensive knowledge about how to efficiently evaluate recommender systems
  • has knowledge about the current research trends in recommender systems
  • is able to implement state-of-the-art recommender system algorithms
  • can develop their own recommender system
  • is able to deploy HCI and machine learning routines to evaluate recommender systems
  • is able to teach laymen about how recommender systems work

Level of Study

Semester of instruction.

  • Assignments throughout the semester which must be completed and approved.
  • Participation at 80% of course seminars.

Compulsory requirements are valid only in the semester in which they are approved.

  • Individual written school exam (30%)
  • Practical group assignment project (70%)

Both the exam and the assignment paper must be done in the teaching semester.

The exam assignment will be given in the language of instruction in the course. The exam answer must be submitted in the same language as the exam assignment.

Assessment in teaching semester.

Retake exam

School exam:

A retake exam is arranged for students with valid absence according to § 5-5. If a retake exam is held, it will be available to students with the following results/absences:

  • Medical certificate/valid absence
  • Interruption during the exam
  • Fail/failed

If you have the right to take a retake exam and a retake exam is arranged for students with valid absences, you can sign up yourself in Studentweb after January 15/August 1.

Group assignment:

Students with valid absence as defined in the UiB regulations § 5-5 can apply for an extended submission deadline to [email protected]. The application must be submitted before the deadline for submission has expired.

Contact Information

[email protected]

Tlf: 55 58 41 17

Industrial Recommender Systems

About this Research Topic

Recommendation systems are used widely across many industries, such as e-commerce, multimedia content platforms, and social networks, to suggest items that a user is most likely to consume or connect with, thereby improving the user experience. This motivates people in both industry and research ...

Keywords : Recommender Systems, Explainability, Fairness, Privacy, Security, Multi-objective, Reproducibility, Unbiased Recommendation, Business Impact

Important Note : All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.

COMMENTS

  1. Recommender systems: Trends and frontiers

    Recent research work on topics such as multistakeholder recommendation, system biases, fairness and various potentially negative effects of recommender systems started to address these important questions (Abdollahpouri et al. 2020; Deldjoo et al. 2021; Ekstrand et al. 2021). However, still too often these problems are mainly addressed from a ...

  2. (PDF) Recommender Systems: An Overview, Research Trends, and Future

    Abstract: Recommender system (RS) has emerged as a major research interest that aims to help users to find items online by providing suggestions that closely match their interest. This paper ...

  3. A systematic review and research perspective on recommender systems

    Recommender systems are efficient tools for filtering online information, which is widespread owing to the changing habits of computer users, personalization trends, and emerging access to the internet. Even though the recent recommender systems are eminent in giving precise recommendations, they suffer from various limitations and challenges like scalability, cold-start, sparsity, etc. Due to ...

  4. Exploring the Landscape of Recommender Systems Evaluation: Practices

    Recommender systems research and practice are fast-developing topics with growing adoption in a wide variety of information access scenarios. In this article, we present an overview of research specifically focused on the evaluation of recommender systems.

  5. A systematic literature review on educational recommender systems for

    Recommender systems have become one of the main tools for personalized content filtering in the educational domain. Those who support teaching and learning activities, particularly, have gained increasing attention in the past years. This growing interest has motivated the emergence of new approaches and models in the field, in spite of it, there is a gap in literature about the current trends ...

  6. Frontiers

    The aim of the "Reviews in recommender systems" Research Topic is to highlight recent advances in the broad field of recommender systems, including important topics such as fairness (Kowald et al., 2020; Wang et al., 2023), privacy (Friedman et al., 2015; Muellner et al., 2021), and multi-stakeholder objectives (Abdollahpouri and Burke ...

  7. A Systematic Review of Recommender Systems and Their Applications in

    Abstract. This paper discusses the valuable role recommender systems may play in cybersecurity. First, a comprehensive presentation of recommender system types is presented, as well as their advantages and disadvantages, possible applications and security concerns. Then, the paper collects and presents the state of the art concerning the use of ...

  8. Health Recommender Systems: Systematic Review

    Recommender Systems and Techniques. Recommender techniques are traditionally divided into different categories [12,13] and are discussed in several state-of-the-art surveys. Collaborative filtering is the most used and mature technique that compares the actions of multiple users to generate personalized suggestions. An example of this technique can typically be found on e-commerce sites ...

  9. Advanced Topics in Recommender Systems

    This chapter reviews several advanced topics in recommender systems, such as group recommendations, multi-criteria recommendations, active learning, and privacy. In addition, some interesting applications of recommender systems have been covered. ... ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 155-164, 2012 ...

  10. [2302.02579] Recommender Systems: A Primer

    Recommender Systems: A Primer. Pablo Castells, Dietmar Jannach. Personalized recommendations have become a common feature of modern online services, including most major e-commerce sites, media platforms and social networks. Today, due to their high practical relevance, research in the area of recommender systems is flourishing more than ever.

  11. Systematic Review of Recommendation Systems for Course Selection

    Systematic Review of Recommendation Systems for Course Selection. Shrooq Algarni * and Frederick Sheldon. Department of Computer Science, University of Idaho, Moscow, ID 83843, USA; sheldon ...

  12. Recommender systems for sustainability: overview and research issues

    The contributions of this article enhance existing topic-related overviews (Bui, 2000; Vinuesa et al., 2020; vanWynsberghe, 2021) in terms of (1) a focus on recommender systems technologies for sustainability, (2) the provision of concrete examples of how recommender systems can be applied to achieve individual SDGs, and (3) a discussion of ...

  13. Personalized Recommendation Systems: Five Hot Research Topics You Must

    Research topic 1: recommendation system and deep learning. In recent years, deep learning technology has achieved great success in areas of speech recognition, computer vision, and natural language processing; and recommendation systems can benefit from these breakthroughs. Today, deep learning-based recommendation algorithms have made ...

  14. Recommender systems: an overview, research trends, and future

    Recommender system (RS) has emerged as a major research interest that aims to help users to find items online by providing suggestions that closely match their interest. This paper provides a comprehensive study on the RS covering the different recommendation approaches, associated issues, and techniques used for information retrieval. Thanks to its widespread applications, it has induced ...

  15. Reviews in Recommender Systems: 2022

    The aim of the "Reviews in recommender systems" Research Topic is to highlight recent advances in the broad field of recommender systems, including important topics such as fairness (Kowald et al., 2020; Wang et al., 2023), privacy (Friedman et al., 2015; Muellner et al., 2021), and multi-stakeholder objectives (Abdollahpouri and Burke ...

  16. Recommender Systems: Techniques, Applications, and Challenges

    Recommender systems research, aside from its theoretical contribution, is generally aimed at practically improving industrial RSs and involves research about various practical aspects that apply to the implementation of the systems. Indeed, an RS is an example of large-scale machine learning and data mining algorithms in commercial practice [5 ...

  17. Natural Language Processing for Recommender Systems

    In this Research Topic collection, we welcome publications from researchers in both academia and industry on the latest developments on the intersection between Natural Language Processing and Recommender Systems. The submissions will range from theoretically motivated new algorithms to empirically validated solutions and user studies. Topics ...

  18. Fifteen Years of Recommender Systems Research in Higher Education

    The study provides essential insights into current and future research on recommender systems in higher education. This analysis helps researchers, policymakers, and practitioners better understand the development of recommender systems in higher education and possible practice implications. ... These three topics represent the matured research ...

  19. A systematic literature review on educational recommender systems for

    Specifically, this study contributes to the field providing a summary and an analysis of the current available information about the teaching and learning support recommender systems topic in four dimensions: (i) how the recommendations are produced (ii) how the recommendations are presented to the users (iii) how the recommender systems are ...

  20. Frontiers in Big Data

    See all (4) Learn more about Research Topics. Part of an innovative multidisciplinary journal, focusing on the research and practice of recommender systems, surveys and tutorials of important techniques, and case studies of real-world implemen...

  21. Fairness in recommender systems: research landscape and future

    Recommender systems can strongly influence which information we see online, e.g., on social media, and thus impact our beliefs, decisions, and actions. At the same time, these systems can create substantial business value for different stakeholders. Given the growing potential impact of such AI-based systems on individuals, organizations, and society, questions of fairness have gained ...

  22. Research Topics in Recommender Systems

    Objectives and Content. This course offers an overview of approaches to develop and evaluate state-of-the-art recommender system methods. In particular, this course makes an extensive introduction to current algorithmic approaches for generating personalized recommender approaches, such as collaborative and content-based filtering, as well as ...

  23. Frontiers

    The evaluation of performance using competencies within a structured framework holds significant importance across various professional domains, particularly in roles like project manager. Typically, this assessment process, overseen by senior evaluators, involves scoring competencies based on data gathered from interviews, completed forms, and evaluation programs. However, this task is ...

  24. Industrial Recommender Systems

    Recommendation systems are used widely across many industries, such as e-commerce, multimedia content platforms, and social networks, to provide suggestions that a user will most likely consume or connect, and thus, improving the user experience. This motivates people in both industry and research organizations to focus on personalization or recommendation algorithms, which has resulted in a ...
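
Entry 8 in the list above describes collaborative filtering as a technique that compares the actions of multiple users to generate personalized suggestions. As a rough illustration only, the sketch below shows one common variant (user-based filtering with cosine similarity) in plain Python. The toy ratings, the user and item names, and the function names cosine, predict, and recommend are all hypothetical; none of them is taken from the listed papers.

```python
from math import sqrt

# Hypothetical toy data (user -> {item: rating}); invented for illustration.
ratings = {
    "alice": {"book_a": 5, "book_b": 3, "book_c": 4},
    "bob":   {"book_a": 4, "book_b": 3, "book_c": 5, "book_d": 4},
    "carol": {"book_a": 1, "book_b": 5, "book_d": 2},
}


def cosine(u, v):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else None


def recommend(user, ratings, k=3):
    """Return up to k unseen items, highest predicted rating first."""
    seen = set(ratings[user])
    candidates = {i for r in ratings.values() for i in r} - seen
    scored = [(predict(user, i, ratings), i) for i in candidates]
    scored = [(s, i) for s, i in scored if s is not None]
    return [i for s, i in sorted(scored, reverse=True)[:k]]


print(recommend("alice", ratings))
```

In this toy data, alice's ratings resemble bob's more than carol's, so bob's rating of the one item alice has not seen carries more weight in the prediction. Real systems typically add mean-centering, neighbourhood size limits, and sparse-matrix storage, none of which is shown here.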