
Models

In the following sections, we systematically introduce these models:

📄️ A3C

A3C stands for Asynchronous Advantage Actor-Critic. The A3C algorithm builds upon the Actor-Critic class of algorithms by using a neural network to approximate the actor (and critic). The actor learns the policy function using a deep neural network, while the critic estimates the value function. The asynchronous nature of the algorithm allows the agent to learn from different parts of the state space, allowing parallel learning and faster convergence. Unlike DQN agents, which use an experience replay memory, the A3C agent uses multiple workers to gather more samples for learning.
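For concreteness, below is a minimal sketch (plain Python/NumPy, with hypothetical argument names) of the kind of advantage actor-critic update each asynchronous worker might compute on one rollout before pushing gradients to the shared model; network details and the entropy bonus are omitted.

```python
import numpy as np

# Illustrative sketch of a single A3C worker's loss on one rollout.
# rewards, values, log_probs are lists collected while interacting with the env.
def a3c_worker_loss(rewards, values, log_probs, gamma=0.99):
    returns, R = [], 0.0
    for r in reversed(rewards):              # discounted return, computed backwards
        R = r + gamma * R
        returns.insert(0, R)
    returns = np.array(returns)
    advantages = returns - np.array(values)  # advantage = return - critic's estimate
    policy_loss = -(np.array(log_probs) * advantages).sum()  # actor objective
    value_loss = (advantages ** 2).sum()                      # critic objective
    return policy_loss + 0.5 * value_loss    # gradients of this are sent to the shared model
```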

📄️ AFM

AFM stands for Attentional Factorization Machines. It improves FM by discriminating the importance of different feature interactions, learning the importance of each feature interaction from data via a neural attention network. Empirically, it is shown on regression tasks that AFM performs better than FM, with an 8.6% relative improvement, and consistently outperforms the state-of-the-art deep learning methods Wide&Deep and DeepCross with a much simpler structure and fewer model parameters.
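A minimal sketch of the attention-weighted pairwise interactions described above; the parameter names and shapes are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def afm_score(embeddings, x, attn_w, attn_b, attn_h, p):
    """Sketch of AFM: attention over element-wise pairwise interactions.
    embeddings: (n_features, k) latent vectors; x: (n_features,) feature values;
    attn_w/attn_b/attn_h: attention-network parameters; p: (k,) projection vector."""
    n, k = embeddings.shape
    pairs = [embeddings[i] * embeddings[j] * x[i] * x[j]     # interaction of features i and j
             for i in range(n) for j in range(i + 1, n)]
    pairs = np.stack(pairs)                                  # (n_pairs, k)
    scores = np.maximum(0.0, pairs @ attn_w + attn_b) @ attn_h  # small attention MLP
    alpha = np.exp(scores) / np.exp(scores).sum()            # softmax over interactions
    return float((alpha[:, None] * pairs).sum(axis=0) @ p)   # weighted pooling, then projection
```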

📄️ CASER

CASER stands for Convolutional Sequence Embedding Recommendation. Top-N sequential recommendation models each user as a sequence of items interacted with in the past and aims to predict the top-N ranked items that the user will likely interact with in the near future. The order of interaction implies that sequential patterns play an important role, where more recent items in a sequence have a larger impact on the next item. The Convolutional Sequence Embedding Recommendation Model (Caser) addresses this requirement by embedding a sequence of recent items into an 'image' in the time and latent spaces and learning sequential patterns as local features of the image using convolutional filters. This approach provides a unified and flexible network structure for capturing both general preferences and sequential patterns. In other words, Caser adopts convolutional neural networks to capture the dynamic pattern influences of users' recent activities.
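A rough PyTorch sketch of this 'image' view of the last L items, with toy sizes and hypothetical filter counts; the final prediction layer is omitted.

```python
import torch
import torch.nn as nn

L, d, n_items = 5, 32, 1000
item_emb = nn.Embedding(n_items, d)
conv_h = nn.ModuleList([nn.Conv2d(1, 4, (h, d)) for h in range(1, L + 1)])  # horizontal filters
conv_v = nn.Conv2d(1, 4, (L, 1))                                            # vertical filter

recent = torch.randint(0, n_items, (1, L))        # one user's last L items
E = item_emb(recent).unsqueeze(1)                 # (1, 1, L, d) "image" in time x latent space
h_feats = [torch.relu(c(E)).squeeze(3).max(dim=2).values for c in conv_h]   # max-pool over time
v_feats = torch.relu(conv_v(E)).flatten(1)        # (1, 4*d) vertical features
z = torch.cat(h_feats + [v_feats], dim=1)         # sequence representation for the prediction layer
```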

📄️ CIGC

To pursue high efficiency, the target is to use only new data for model updating while not sacrificing recommendation accuracy compared with full model retraining. This is non-trivial to achieve, since the interaction data participates in both the graph structure for model construction and the loss function for model learning, whereas the old graph structure is not allowed to be used in model updating. Causal Incremental Graph Convolution (CIGC) estimates the output of full graph convolution. Incremental Graph Convolution (IGC) combines the old representations and the incremental graph, effectively fusing the long-term and short-term preference signals. Colliding Effect Distillation (CED) aims to avoid the out-of-date issue of inactive nodes that are not in the incremental graph, connecting the new data with inactive nodes through causal inference. In particular, CED estimates the causal effect of new data on the representation of inactive nodes through the control of their collider.

📄️ DCN

DCN stands for Deep and Cross Network. The manual, explicit feature-crossing process is laborious and inefficient. On the other hand, automatic implicit feature-crossing methods like MLPs cannot efficiently approximate even 2nd- or 3rd-order feature crosses. DCN provides a solution to this problem: it was designed to learn explicit and bounded-degree cross features more effectively. It starts with an input layer (typically an embedding layer), followed by a cross network containing multiple cross layers that model explicit feature interactions, combined with a deep network that models implicit feature interactions.
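A minimal sketch of the cross layer being stacked, following the commonly cited formula $x_{l+1} = x_0 (x_l^\top w_l) + b_l + x_l$; the dimensions below are toy values.

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x_{l+1} = x0 * (xl . w) + b + xl (explicit feature crossing)."""
    return x0 * (xl @ w) + b + xl

# Toy usage: stack three cross layers on an embedded input vector.
d = 8
rng = np.random.default_rng(0)
x0 = rng.normal(size=d)
x = x0
for _ in range(3):
    w, b = rng.normal(size=d), np.zeros(d)
    x = cross_layer(x0, x, w, b)   # each layer adds one more degree of explicit crossing
```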

📄️ DeepFM

DeepFM stands for Deep Factorization Machines. It consists of an FM component and a deep component which are integrated in a parallel structure. The FM component is the same as a 2-way factorization machine and is used to model the low-order feature interactions. The deep component is a multi-layer perceptron that is used to capture high-order feature interactions and nonlinearities. These two components share the same inputs/embeddings, and their outputs are summed up as the final prediction.
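A minimal sketch of this parallel structure, assuming one active feature per field; `mlp` stands in for any callable deep component that maps the concatenated embeddings to a scalar (names and shapes are assumptions).

```python
import numpy as np

def deepfm_predict(field_ids, embeddings, w_linear, mlp, w0=0.0):
    """Sketch of DeepFM: FM part and deep part share the same field embeddings."""
    E = embeddings[field_ids]                     # (n_fields, k) shared embeddings
    linear = w0 + w_linear[field_ids].sum()       # first-order term
    s = E.sum(axis=0)
    fm = 0.5 * float((s * s - (E * E).sum(axis=0)).sum())   # second-order FM term (O(nk) trick)
    deep = mlp(E.reshape(-1))                     # MLP on concatenated embeddings -> scalar
    return 1.0 / (1.0 + np.exp(-(linear + fm + deep)))      # outputs summed, then sigmoid
```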

📄️ DQN

The Q-learning component of DQN was invented in 1989 by Christopher Watkins in his PhD thesis titled “Learning from Delayed Rewards”. Experience replay quickly followed, invented by Long-Ji Lin in 1992. This played a major role in improving the efficiency of Q-learning. In the years that followed, however, there were no major success stories involving deep Q-learning. This is perhaps not surprising given the combination of limited computational power in the 1990s and early 2000s, data-hungry deep learning architectures, and the sparse, noisy, and delayed feedback signals experienced in RL. Progress had to wait for the emergence of general-purpose GPU programming, for example with the launch of CUDA in 2006, and the reignition of interest in deep learning within the machine learning community that began in the mid-2000s and rapidly accelerated after 2012.

📄️ FM

Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating second-order feature interactions. Factorization Machine type algorithms are a combination of linear regression and matrix factorization; the idea behind this type of algorithm is to model interactions between features (a.k.a. attributes, explanatory variables) using factorized parameters. By doing so, it can estimate all interactions between features even with extremely sparse data.
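A minimal sketch of the FM prediction rule, $\hat{y} = w_0 + \sum_i w_i x_i + \sum_{i<j} \langle v_i, v_j \rangle x_i x_j$, computed with the usual $O(kn)$ reformulation of the pairwise term; the sizes below are arbitrary.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM sketch: linear term plus factorized pairwise interactions.
    x: (n,) feature vector, w: (n,) linear weights, V: (n, k) latent factors."""
    linear = w0 + w @ x
    s = V.T @ x                        # (k,)
    s2 = (V.T ** 2) @ (x ** 2)         # (k,)
    pairwise = 0.5 * float((s ** 2 - s2).sum())
    return linear + pairwise

# Toy usage with a sparse, one-hot-style input.
n, k = 6, 3
rng = np.random.default_rng(1)
x = np.array([1.0, 0, 0, 1.0, 0, 1.0])
print(fm_predict(x, 0.1, rng.normal(size=n), rng.normal(size=(n, k))))
```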

📄️ GRU4Rec

GRU4Rec uses a session-parallel mini-batch approach: we first create an order for the sessions and then use the first event of the first X sessions to form the input of the first mini-batch (the desired output is the second event of each active session). The second mini-batch is formed from the second events, and so on. If any of the sessions ends, the next available session is put in its place. Sessions are assumed to be independent, so we reset the appropriate hidden state when this switch occurs.
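A toy sketch of how such session-parallel mini-batches could be assembled; the session data and batch size are made up, and the GRU itself is left out.

```python
# Each inner list is one session of item ids (toy data).
sessions = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10, 11]]
batch_size = 2

active = list(range(batch_size))   # which session occupies each mini-batch slot
pos = [0] * batch_size             # current position within that session
next_sess = batch_size
done = False

while not done:
    inputs  = [sessions[s][p]     for s, p in zip(active, pos)]
    targets = [sessions[s][p + 1] for s, p in zip(active, pos)]
    print(inputs, "->", targets)   # feed `inputs` to the GRU, score against `targets`
    pos = [p + 1 for p in pos]
    for i in range(batch_size):
        if pos[i] + 1 >= len(sessions[active[i]]):   # this session is exhausted
            if next_sess >= len(sessions):
                done = True                          # no sessions left to swap in
            else:
                active[i], pos[i] = next_sess, 0     # swap in the next session and
                next_sess += 1                       # reset its hidden state here
```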

📄️ Markov Chains

Markov chains, named after Andrey Markov, are mathematical systems that hop from one "state" (a situation or set of values) to another. For example, if you made a Markov chain model of a baby's behavior, you might include "playing," "eating", "sleeping," and "crying" as states, which together with other behaviors could form a 'state space': a list of all possible states. In addition, on top of the state space, a Markov chain tells you the probability of hopping, or "transitioning," from one state to any other state---e.g., the chance that a baby currently playing will fall asleep in the next five minutes without crying first.
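A toy transition matrix for the baby example; the probabilities are invented purely for illustration.

```python
import numpy as np

states = ["playing", "eating", "sleeping", "crying"]
P = np.array([
    [0.5, 0.2, 0.2, 0.1],   # from "playing"
    [0.3, 0.1, 0.5, 0.1],   # from "eating"
    [0.2, 0.3, 0.4, 0.1],   # from "sleeping"
    [0.1, 0.3, 0.3, 0.3],   # from "crying"
])                          # each row sums to 1

# Probability of each state two steps ahead, starting from "playing".
start = np.array([1.0, 0.0, 0.0, 0.0])
two_steps = start @ np.linalg.matrix_power(P, 2)
print(dict(zip(states, two_steps.round(3))))
```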

📄️ MDP

The Markov decision process (MDP), the framework at the heart of reinforcement learning (RL), perfectly illustrates how machines have become intelligent in their own unique way. Humans build their decision process on experience. MDPs are memoryless. Humans use logic and reasoning to think problems through. MDPs apply random decisions 100% of the time. Humans think in words, labeling everything they perceive. MDPs take an unsupervised approach that uses no labels or training data. MDPs boost the machine thought process of self-driving cars (SDCs), translation tools, scheduling software, and more. This memoryless, random, and unlabeled machine thought process marks a historical change in the way a former human problem was solved.

📄️ NMRN

NMRN is a streaming recommender model based on neural memory networks with external memories to capture and store both long-term stable interests and short-term dynamic interests in a unified way. An adaptive negative sampling framework based on Generative Adversarial Nets (GAN) is developed to optimize the streaming recommender model, which effectively overcomes the limitations of classical negative sampling approaches and improves the effectiveness of model parameter inference.

📄️ Node2vec

Nodes in networks could be organized based on communities they belong to (i.e., homophily); in other cases, the organization could be based on the structural roles of nodes in the network (i.e., structural equivalence). For instance, in the below figure, we observe nodes $u$ and $s_1$ belonging to the same tightly knit community of nodes, while the nodes $u$ and $s_6$ in the two distinct communities share the same structural role of a hub node. Real-world networks commonly exhibit a mixture of such equivalences.

📄️ PPO

The PPO (Proximal Policy Optimization) algorithm was introduced by the OpenAI team in 2017 and quickly became one of the most popular reinforcement learning methods, pushing aside all other RL methods at that moment. PPO involves collecting a small batch of experiences by interacting with the environment and using that batch to update its decision-making policy. Once the policy is updated with that batch, the experiences are thrown away and a newer batch is collected with the newly updated policy. This is why it is an "on-policy learning" approach, where the experience samples collected are only useful for updating the current policy.
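Each such batch is typically used to maximize PPO's clipped surrogate objective; a minimal sketch follows (the array names are assumptions, and the value and entropy terms are omitted).

```python
import numpy as np

def ppo_clip_objective(new_log_probs, old_log_probs, advantages, eps=0.2):
    """Sketch of PPO's clipped surrogate objective on one batch (to be maximized)."""
    ratio = np.exp(new_log_probs - old_log_probs)           # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()            # pessimistic (clipped) bound
```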

📄️ SASRec

SASRec stands for Self-Attentive Sequential Recommendation. It relies on the sequence modeling capabilities of self-attentive neural networks to predict the occurrence of the next item in a user's consumption sequence. To be precise, given a user $u$ and their time-ordered consumption history $S^u = (S_1^u, S_2^u, \dots, S_{|S^u|}^u)$, SASRec first applies self-attention on $S^u$ followed by a series of non-linear feed-forward layers to finally obtain the next item likelihood.
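A rough PyTorch sketch of this forward pass with toy sizes: causally masked self-attention over the item sequence, a point-wise feed-forward layer, and a dot product against the item embeddings for next-item scores. The layer count and hyper-parameters here are assumptions.

```python
import torch
import torch.nn as nn

n_items, d, L = 1000, 64, 10
item_emb = nn.Embedding(n_items, d)
pos_emb = nn.Embedding(L, d)
attn = nn.MultiheadAttention(d, num_heads=2, batch_first=True)
ffn = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

seq = torch.randint(0, n_items, (1, L))                   # one user's item sequence S^u
h = item_emb(seq) + pos_emb(torch.arange(L))              # item + position embeddings
mask = torch.triu(torch.ones(L, L, dtype=torch.bool), 1)  # causal mask: attend to the past only
h, _ = attn(h, h, h, attn_mask=mask)
h = ffn(h)                                                # point-wise feed-forward layer
logits = h @ item_emb.weight.T                            # next-item scores over all items
```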

📄️ SSE-PT

Temporal information is crucial for recommendation problems because user preferences are naturally dynamic in the real world. Recent advances in deep learning, especially the discovery of various attention mechanisms and newer architectures in addition to the widely used RNNs and CNNs in natural language processing, have allowed for better use of the temporal ordering of items that each user has engaged with. In particular, the SASRec model, inspired by the popular Transformer model in natural language processing, has achieved state-of-the-art results. However, SASRec, just like the original Transformer model, is inherently an unpersonalized model and does not include personalized user embeddings. SSE-PT overcomes this limitation by employing a Personalized Transformer.
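A minimal sketch of the personalization idea: a learned user embedding is attached to every position of the item sequence before the Transformer layers. The sizes below are toy values, and the stochastic shared embeddings regularization is omitted.

```python
import torch
import torch.nn as nn

n_users, n_items, d_u, d_i, L = 500, 1000, 32, 32, 10
user_emb = nn.Embedding(n_users, d_u)
item_emb = nn.Embedding(n_items, d_i)

u = torch.tensor([3])                                  # one user id
seq = torch.randint(0, n_items, (1, L))                # that user's item sequence
u_vec = user_emb(u).unsqueeze(1).expand(-1, L, -1)     # repeat the user vector along the sequence
x = torch.cat([item_emb(seq), u_vec], dim=-1)          # (1, L, d_i + d_u) personalized input
# x is then fed to a causal self-attention stack, as in SASRec.
```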

📄️ VSKNN

VSKNN stands for Vector Multiplication Session-Based kNN. The idea of this variant is to put more emphasis on the more recent events of a session when computing the similarities. Instead of encoding a session as a binary vector, we use real-valued vectors to encode the current session. Only the very last element of the session obtains a value of “1”; the weights of the other elements are determined using a linear decay function that depends on the position of the element within the session, where elements appearing earlier in the session obtain a lower weight. As a result, when using the dot product as a similarity function between the current weight-encoded session and a binary-encoded past session, more emphasis is given to elements that appear later in the sessions.
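A toy sketch of this weighting scheme; the item ids, session contents, and the particular linear-decay function are illustrative.

```python
import numpy as np

n_items = 10
current_session = [2, 5, 7, 3]          # oldest ... newest
past_session = [5, 3, 8]

def encode_current(session, n_items):
    """Linear-decay weights: the last item gets 1, earlier items get lower weights."""
    v = np.zeros(n_items)
    L = len(session)
    for pos, item in enumerate(session, start=1):
        v[item] = pos / L               # one possible linear decay by position
    return v

def encode_past(session, n_items):
    v = np.zeros(n_items)
    v[session] = 1.0                    # past sessions stay binary
    return v

similarity = encode_current(current_session, n_items) @ encode_past(past_session, n_items)
print(similarity)   # recently shared items contribute more to the similarity
```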

CTR Prediction Models

In the early stage of recommender systems, people spent much time on tedious and onerous feature engineering. At that time, the dimensions of the raw features were relatively small, which made it possible to implement different combinations of raw features. The newly created features were then fed into a shallow model, such as Logistic Regression (LR) or Gradient Boosting Decision Trees (GBDT), which were widely used in CTR prediction. Then, Factorization Machines (FM) transformed the learning of user and item features into shared low-dimensional latent vectors. Building on this, Field-aware FM (FFM) and Field-weighted FM (FwFM) further consider the different impact of the fields that a feature belongs to in order to improve the performance of CTR prediction. Along this line, Attentional Factorization Machines (AFM) were proposed to automatically learn the weights of cross features, and Neural Factorization Machines (NFM) enhance FM by modelling higher-order and non-linear feature interactions.

Recently, the success of deep neural networks (DNNs) in natural language processing and computer vision has brought a new direction to recommender systems. Among them, Wide & Deep learning introduced deep neural networks to CTR prediction. It jointly trains a deep neural network along with a traditional wide linear model. Deep neural networks liberated people from feature engineering while generalizing better combinations of the features. Many variants of Wide & Deep learning have been proposed since it revolutionized the development of CTR prediction. The Deep & Cross network (DCN) replaces the wide linear part with a cross network, which generates explicit feature crossings among low- and high-level layers. DeepFM combines the power of DNNs and factorization machines for feature representation in recommender systems. Furthermore, xDeepFM extends this line of work by proposing a Compressed Interaction Network to enumerate and compress all feature interactions. Overall, the deep models mentioned above all construct a similar model structure by combining low-order and high-order features, which greatly reduces the effort of feature engineering and improves the performance of CTR prediction.

However, these aforementioned shallow or deep models take statistical and categorical features as input while discarding the sequential behavior information of users. For example, users may search for items in an e-commerce system, then view some items of interest, and these items are likely to be clicked or purchased next time. Since historical behaviors explicitly indicate the preferences of users, they have gained much more attention in recommender systems. Among them, DIN proposes a local activation unit that learns dynamic user interests from sequential behavior features. DIEN designs an interest evolving layer with an attentional update gate to model the dependency between sequential behaviors. The research above recognized the importance of users' historical behaviors. Unfortunately, it projects other information (i.e., user-specific and context features) into one vector and does not pay equal attention to the interactions between the candidate item and fine-grained information, while modeling this interaction has shown extensive progress in many tasks, such as search recommendation and knowledge distillation.

Different from all previous methods, MIAN can explore sequential behavior and other fine-grained information simultaneously. Specifically, compared to shallow and deep models, MIAN has a remarkable ability to encode user preferences from sequential behavior. Compared to sequential models, MIAN can better model fine-grained feature interactions when historical behavior is insufficient or unrepresentative.

| Model | Paper | Publication | Type | Description |
|---|---|---|---|---|
| LR | Predicting Clicks: Estimating the Click-Through Rate for New Ads | WWW'07 | Shallow | Logistic regression (LR) is a simple baseline model for CTR prediction. With the online learning algorithm FTRL, proposed by Google, LR has been widely adopted in industry. It is a widely used baseline and applies a linear transformation to model the relationship of all the features. |
| FM | Factorization Machines | ICDM'10 | Shallow | While LR fails to capture non-linear feature interactions, Rendle et al. propose the factorization machine (FM), which embeds features into dense vectors and models pairwise feature interactions as inner products of the corresponding embedding vectors. Notably, FM also has linear time complexity in the number of features. |
| CCPM | A Convolutional Click Prediction Model | CIKM'15 | Deep | CCPM reports the first attempt to use convolution for CTR prediction, where feature embeddings are aggregated hierarchically through convolution networks. |
| FFM | Field-aware Factorization Machines for CTR Prediction | RecSys'16 | Shallow | Field-aware factorization machine (FFM) is an extension of FM that considers field information for feature interactions. It was a winning model in several Kaggle contests on CTR prediction. |
| YoutubeDNN | Deep Neural Networks for YouTube Recommendations | RecSys'16 | Deep | A straightforward deep model that applies a fully-connected network (termed DNN) after the concatenation of feature embeddings for CTR prediction. |
| Wide&Deep | Wide & Deep Learning for Recommender Systems | DLRS'16 | Deep | Wide&Deep is a general learning framework proposed by Google that combines a wide (or shallow) network and a deep network to achieve the advantages of both. It jointly trains a linear model and a deep MLP model for CTR prediction. |
| IPNN | Product-based Neural Networks for User Response Prediction | ICDM'16 | Deep | PNN is a product-based network that feeds the inner (or outer) products of feature embeddings as the input of a DNN. Due to the huge memory requirement of pairwise outer products, the inner product version, IPNN, is used. |
| DeepCross | Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features | KDD'16 | Deep | Inspired by residual networks, Deep Crossing adds residual connections between the layers of a DNN. It is proposed to handle a set of sparse and dense features and to learn high-order crossing features jointly with a traditional deep MLP. |
| HOFM | Higher-Order Factorization Machines | NIPS'16 | Shallow | Since FM only captures second-order feature interactions, HOFM extends FM to higher-order factorization machines. However, it results in exponential feature combinations that consume huge memory and long running time. |
| DeepFM | DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | IJCAI'17 | Deep | DeepFM is an extension of Wide&Deep that substitutes LR with FM to explicitly model second-order feature interactions. It combines the FM module with a deep MLP module and requires no manual feature engineering. |
| NFM | Neural Factorization Machines for Sparse Predictive Analytics | SIGIR'17 | Deep | Similar to PNN, NFM proposes a Bi-Interaction layer that pools the pairwise feature interactions into a vector and then feeds it to a DNN for CTR prediction. |
| AFM | Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks | IJCAI'17 | Deep | Instead of treating all feature interactions equally as in FM, AFM learns the weights of feature interactions via attention networks. Different from FwFM, AFM adjusts the weights dynamically according to the input data sample. |
| DCN | Deep & Cross Network for Ad Click Predictions | ADKDD'17 | Deep | In DCN, a cross network is proposed to perform high-order feature interactions in an explicit way. In addition, it integrates a DNN network following the Wide&Deep framework. |
| FwFM | Field-weighted Factorization Machines for Click-Through Rate Prediction in Display Advertising | WWW'18 | Shallow | FwFM considers field-wise weights of feature interactions. Compared with FFM, it reports comparable performance but uses far fewer model parameters. |
| xDeepFM | xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems | KDD'18 | Deep | While the high-order feature interactions modeled by DCN are bit-wise, xDeepFM captures high-order feature interactions in a vector-wise way via a Compressed Interaction Network (CIN), which enumerates and compresses all feature interactions to model an explicit order of interactions. |
| DIN | Deep Interest Network for Click-Through Rate Prediction | KDD'18 | Deep | An early work that exploits users' historical behaviors and uses an attention mechanism to activate the user behaviors relevant to the candidate item. |
| FiGNN | FiGNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction | CIKM'19 | Deep | FiGNN leverages the message passing mechanism of graph neural networks to learn high-order feature interactions. |
| AutoInt/AutoInt+ | AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks | CIKM'19 | Deep | AutoInt leverages self-attention networks to learn high-order feature interactions. AutoInt+ integrates AutoInt with a DNN network. |
| FiBiNET | FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction | RecSys'19 | Deep | FiBiNET leverages a squeeze-excitation network to capture important features and proposes bilinear interactions to enhance feature interactions. |
| FGCNN | Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction | WWW'19 | Deep | FGCNN applies convolution networks and recombination layers to generate additional combinatorial features that enrich existing feature representations. |
| HFM/HFM+ | Holographic Factorization Machines for Recommendation | AAAI'19 | Deep | HFM proposes holographic representations and computes compressed outer products via circular convolution to model pairwise feature interactions. HFM+ further integrates a DNN network with HFM. |
| ONN | Operation-aware Neural Networks for User Response Prediction | Neural Networks'20 | Deep | ONN (a.k.a. NFFM) is a model built on FFM. It feeds the interaction outputs from FFM to a DNN network for CTR prediction. |
| AFN/AFN+ | Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions | AAAI'20 | Deep | AFN applies logarithmic transformation layers to learn adaptive-order feature interactions. AFN+ further integrates AFN with a DNN network. |
| LorentzFM | Learning Feature Interactions with Lorentzian Factorization | AAAI'20 | Shallow | LorentzFM embeds features into a hyperbolic space and models feature interactions via the triangle inequality of Lorentz distance. |
| InterHAt | Interpretable Click-through Rate Prediction through Hierarchical Attention | WSDM'20 | Deep | InterHAt employs hierarchical attention networks to model high-order feature interactions in an efficient manner. |
| FLEN | FLEN: Leveraging Field for Scalable CTR Prediction | DLP-KDD'20 | Deep | FLEN leverages field information for scalable CTR prediction. |
| FmFM | FM^2: Field-matrixed Factorization Machines for Recommender Systems | WWW'21 | Deep | FmFM (FM^2) extends FM with field-matrixed feature interactions for recommender systems. |