Models
In the following sections, we systematically introduce the following models:
📄️ A3C
📄️ AFM
📄️ AFN
📄️ AR
📄️ ASMG
📄️ AttRec
📄️ AUSH
📄️ AutoInt
📄️ BASR
📄️ BCQ
📄️ Behavior Propensity Modeling
📄️ BiasOnly
📄️ BigGraph
📄️ BPR
📄️ BST
📄️ CASER
📄️ CIGC
📄️ CoKE
📄️ DCN
📄️ DDPG
📄️ DeepCross
📄️ DeepFM
📄️ DeepWalk
📄️ DGTN
📄️ DM
📄️ DMT
📄️ DPADL
📄️ DQN
📄️ DRQN
📄️ DRR
📄️ Dueling DQN
📄️ FFM
📄️ FGNN
📄️ FM
📄️ GAT
📄️ GC-SAN
📄️ GCE-GNN
📄️ GLMix
📄️ GRU4Rec
📄️ HMLET
📄️ IncCTR
📄️ IPW
📄️ ItemPop
📄️ KHGT
📄️ LESSR
📄️ LightFM WARP
📄️ LightGCN
📄️ LINE
📄️ LIRD
📄️ Markov Chains
📄️ MATN
📄️ MB-GMN
📄️ MDP
📄️ MF
📄️ MIAN
📄️ MMoE
📄️ MPNN
📄️ NeuMF
📄️ NGCF
📄️ NMRN
📄️ Node2vec
📄️ PNN
📄️ PPO
📄️ Q-learning
📄️ Random Walk
📄️ RKMF
📄️ SAC
📄️ SARSA
📄️ SASRec
📄️ SDNE
📄️ SGL
📄️ Shared Bottom
📄️ SiReN
📄️ SLIST
📄️ SML
📄️ SPop
📄️ SR-GNN
📄️ SR-SAN
📄️ SSE-PT
📄️ STAMP
📄️ Struc2Vec
📄️ SVAE
📄️ TAaMR
📄️ TAGNN-PP
📄️ TAGNN
📄️ TGIN
📄️ VNCF
📄️ VSKNN
📄️ Wide and Deep
📄️ Word2vec
📄️ xDeepFM
CTR Prediction Models
In the early stage of recommendation systems, people spent much time on tedious and onerous feature engineering. At that time, the dimensionality of the raw features was relatively small, which made it feasible to hand-craft different combinations of raw features. The newly created features were then fed into shallow models, such as Logistic Regression (LR) and Gradient Boosting Decision Trees (GBDT), both of which were widely used for CTR prediction. Later, Factorization Machines (FM) recast the learning of user and item features as the learning of small, shared embedding vectors, modeling pairwise feature interactions as inner products. Building on this, Field-aware FM (FFM) and Field-weighted FM (FwFM) further consider the different impact of the field a feature belongs to in order to improve CTR prediction. Along this line, Attentional Factorization Machines (AFM) automatically learn the weights of feature interactions via attention networks, and Neural Factorization Machines (NFM) enhance FM by modeling higher-order and non-linear feature interactions.
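For reference, FM predicts with a linear term plus pairwise interactions whose weights are factorized through $k$-dimensional embedding vectors $\mathbf{v}_i$ (the standard formulation from Rendle's paper):

$$
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j
$$

The pairwise term can be rearranged so that it is computable in time linear in the number of features:

$$
\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j = \frac{1}{2} \sum_{f=1}^{k} \left[ \Big( \sum_{i=1}^{n} v_{i,f}\, x_i \Big)^2 - \sum_{i=1}^{n} v_{i,f}^2\, x_i^2 \right]
$$

This identity is what the linear time complexity noted in the FM row of the table below refers to.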
Recently, the success of deep neural networks (DNNs) in natural language processing and computer vision has brought a new direction to recommendation systems. Among these efforts, Wide & Deep learning introduced deep neural networks to CTR prediction by jointly training a deep neural network alongside a traditional wide linear model. The deep component frees people from feature engineering while generalizing to better combinations of the features. Many variants of Wide & Deep learning have been proposed since it revolutionized the development of CTR prediction. The Deep & Cross Network (DCN) replaces the wide linear part with a cross network, which generates explicit feature crosses across low- and high-level layers. DeepFM combines the power of DNNs and factorization machines for feature representation in recommendation systems. Furthermore, xDeepFM extends this line of work by proposing a Compressed Interaction Network (CIN) to enumerate and compress all feature interactions. Overall, the deep models mentioned above all construct a similar model structure that combines low-order and high-order features, which greatly reduces the effort of feature engineering and improves the performance of CTR prediction.
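To make DCN's explicit crossing concrete, here is a minimal NumPy sketch of a single cross layer, $x_{l+1} = x_0 (x_l^{\top} w_l) + b_l + x_l$; the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x_{l+1} = x0 * (x_l . w) + b + x_l.

    x0   : (d,) the layer-0 input embedding vector
    xl   : (d,) output of the previous cross layer
    w, b : (d,) learned weight and bias for this layer
    """
    return x0 * (xl @ w) + b + xl

# Each stacked layer raises the polynomial degree of the feature
# crosses by one while adding only 2 * d parameters.
rng = np.random.default_rng(0)
d = 8
x0 = rng.normal(size=d)
x = x0
for _ in range(3):
    x = cross_layer(x0, x, rng.normal(size=d), rng.normal(size=d))
print(x.shape)  # (8,)
```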
However, the aforementioned shallow and deep models take statistical and categorical features as input while discarding the sequential behavior information of users. For example, users may search for items on an e-commerce platform and then view some items of interest, and those items are likely to be clicked or purchased next time. Since historical behaviors explicitly indicate user preferences, they have gained much more attention in recommendation systems. Among these works, DIN proposes a local activation unit that learns dynamic user interests from sequential behavior features, while DIEN designs an interest evolving layer with an attentional update gate to model the dependency between sequential behaviors. The studies above recognize the importance of a user's historical behaviors. Unfortunately, they simply project other information (i.e., user-specific and contextual features) into one vector and do not pay equal attention to the interactions between the candidate item and this fine-grained information, even though modeling such interactions has shown extensive progress in many tasks, such as search recommendation and knowledge distillation.
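To illustrate the local activation idea, the sketch below shows simplified DIN-style attention pooling over a behavior sequence in NumPy; `att_score` is a hypothetical stand-in for DIN's learned activation unit:

```python
import numpy as np

def din_pooling(behaviors, candidate, att_score):
    """Weight each historical behavior by its relevance to the candidate.

    behaviors : (T, d) embeddings of the user's past interactions
    candidate : (d,) embedding of the candidate item
    att_score : callable (behavior, candidate) -> scalar relevance
    """
    scores = np.array([att_score(b, candidate) for b in behaviors])
    # DIN keeps raw (unnormalized) scores so the magnitude of the pooled
    # vector can reflect the intensity of the matched interests.
    return scores @ behaviors  # (d,) user interest representation

# Toy usage with a dot product standing in for the activation MLP.
rng = np.random.default_rng(1)
hist = rng.normal(size=(5, 16))
cand = rng.normal(size=16)
print(din_pooling(hist, cand, lambda b, c: float(b @ c)).shape)  # (16,)
```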
Different from all previous methods, MIAN can explore sequential behavior and other fine-grained information simultaneously. Specifically, compared to shallow and deep models, MIAN has a remarkable ability to encode user preferences through sequential behavior. Compared to sequential models, MIAN models fine-grained feature interactions better when historical behavior is insufficient or unrepresentative.
Model | Paper | Publication | Type | Description |
---|---|---|---|---|
LR | Predicting Clicks: Estimating the Click-Through Rate for New Ads | WWW'07 | Shallow | Logistic regression (LR) is a simple and widely used baseline for CTR prediction that applies a linear transformation to model the relationships among all the features. With the online learning algorithm FTRL, proposed by Google, LR has been widely adopted in industry. |
FM | Factorization Machines | ICDM'10 | Shallow | While LR fails to capture non-linear feature interactions, Rendle proposes the factorization machine (FM), which embeds features into dense vectors and models pairwise feature interactions as inner products of the corresponding embedding vectors. Notably, FM also has a time complexity linear in the number of features. |
CCPM | A Convolutional Click Prediction Model | CIKM'15 | Deep | CCPM reports the first attempt to use convolution for CTR prediction, where feature embeddings are aggregated hierarchically through convolution networks. |
FFM | Field-aware Factorization Machines for CTR Prediction | RecSys'16 | Shallow | Field-aware factorization machine (FFM) is an extension of FM that considers field information for feature interactions. It was a winner model in several Kaggle contests on CTR prediction. |
YoutubeDNN | Deep Neural Networks for YouTube Recommendations | RecSys'16 | Deep | YoutubeDNN is a straightforward deep model that applies a fully-connected network (often simply termed DNN) to the concatenation of feature embeddings for CTR prediction. |
Wide&Deep | Wide & Deep Learning for Recommender Systems | DLRS'16 | Deep | Wide&Deep is a general learning framework proposed by Google that combines a wide (or shallow) network and a deep network to achieve the advantages of both. It jointly trains a linear model and a deep MLP for CTR prediction. |
IPNN | Product-based Neural Networks for User Response Prediction | ICDM'16 | Deep | PNN is a product-based network that feeds the inner (or outer) products of feature embeddings as the input of a DNN. Due to the huge memory requirement of pairwise outer products, we use the inner-product version, IPNN. |
DeepCross | Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features | KDD'16 | Deep | Inspired by residual networks, Deep Crossing adds residual connections between the layers of a DNN. It handles a mix of sparse and dense features and learns crossing features automatically, without manually crafted combinatorial features. |
HOFM | Higher-Order Factorization Machines | NIPS'16 | Shallow | Since FM only captures second-order feature interactions, HOFM extends FM to higher-order factorization machines. However, this results in exponentially many feature combinations, consuming huge amounts of memory and running time. |
DeepFM | DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | IJCAI'17 | Deep | DeepFM is an extension of Wide&Deep that substitutes LR with FM to explicitly model second-order feature interactions. It combines a traditional FM module with a deep MLP module over shared embeddings and requires no manual feature engineering. |
NFM | Neural Factorization Machines for Sparse Predictive Analytics | SIGIR'17 | Deep | Similar to PNN, NFM proposes a Bi-Interaction layer that pools the pairwise feature interactions into a vector and then feeds it to a DNN for CTR prediction. |
AFM | Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks | IJCAI'17 | Deep | Instead of treating all feature interactions equally as in FM, AFM learns the weights of feature interactions via attentional networks. Different from FwFM, AFM adjusts the weights dynamically according to the input data sample. |
DCN | Deep & Cross Network for Ad Click Predictions | ADKDD'17 | Deep | In DCN, a cross network is proposed to perform high-order feature interactions in an explicit way. It also integrates a DNN network following the Wide&Deep framework. |
FwFM | Field-weighted Factorization Machines for Click-Through Rate Prediction in Display Advertising | WWW'18 | Shallow | FwFM considers field-wise weights of feature interactions. Compared with FFM, it reports comparable performance while using far fewer model parameters. |
xDeepFM | xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems | KDD'18 | Deep | While the high-order feature interactions modeled by DCN are bit-wise, xDeepFM captures high-order feature interactions in a vector-wise way via a compressed interaction network (CIN) that enumerates and compresses all feature interactions to model interactions of an explicit order (see the CIN sketch after this table). |
DIN | Deep Interest Network for Click-Through Rate Prediction | KDD'18 | Deep | DIN is an early work that exploits users' historical behaviors; it uses an attention mechanism to activate the behaviors relevant to the candidate item and thus capture the user's diverse interests. |
FiGNN | FiGNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction | CIKM'19 | Deep | FiGNN leverages the message-passing mechanism of graph neural networks to learn high-order feature interactions. |
AutoInt/AutoInt+ | AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks | CIKM'19 | Deep | AutoInt leverages self-attention networks to learn high-order feature interactions. AutoInt+ integrates AutoInt with a DNN network. |
FiBiNET | FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction | RecSys'19 | Deep | FiBiNET leverages a squeeze-and-excitation network to capture important features and proposes bilinear interactions to enhance feature interactions. |
FGCNN | Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction | WWW'19 | Deep | FGCNN applies convolution networks and recombination layers to generate additional combinatorial features to enrich existing feature representations. |
HFM/HFM+ | Holographic Factorization Machines for Recommendation | AAAI'19 | Deep | HFM proposes holographic representation and computes compressed outer products via circular convolution to model pairwise feature interactions. HFM+ further integrates a DNN network with HFM. |
ONN | Operation-aware Neural Networks for User Response Prediction | Neural Networks'20 | Deep | ONN (a.k.a., NFFM) is a model built on FFM. It feeds the interaction outputs from FFM to a DNN network for CTR prediction. |
AFN/AFN+ | Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions | AAAI'20 | Deep | AFN applies logarithmic transformation layers to learn adaptive-order feature interactions. AFN+ further integrates AFN with a DNN network. |
LorentzFM | Learning Feature Interactions with Lorentzian Factorization | AAAI'20 | Shallow | LorentzFM embeds features into a hyperbolic space and models feature interactions via the triangle inequality of the Lorentz distance. |
InterHAt | Interpretable Click-through Rate Prediction through Hierarchical Attention | WSDM'20 | Deep | InterHAt employs hierarchical attention networks to model high-order feature interactions in an efficient manner. |
FLEN | FLEN: Leveraging Field for Scalable CTR Prediction | DLP-KDD'20 | Deep | FLEN exploits field information and proposes a field-wise bi-interaction pooling technique, capturing feature interactions at a reduced memory cost for scalable CTR prediction. |
FmFM | FM^2: Field-matrixed Factorization Machines for Recommender Systems | WWW'21 | Deep | FmFM (FM^2) generalizes FM and FwFM by modeling each field pair's interaction with a learned field-pair matrix that transforms the feature embeddings before taking their inner product. |
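As referenced in the xDeepFM row above, here is a minimal NumPy sketch of one CIN layer: it forms all pairwise Hadamard products between the base field embeddings and the previous layer's feature maps, then compresses them with learned weights (shapes and names are ours, not from the paper):

```python
import numpy as np

def cin_layer(x0, xk, w):
    """One CIN layer from xDeepFM (vector-wise feature interactions).

    x0 : (m, d)         base field embeddings
    xk : (h, d)         feature maps from the previous CIN layer
    w  : (h_next, h, m) compression weights for this layer
    """
    outer = np.einsum('hd,md->hmd', xk, x0)    # Hadamard products per dim
    return np.einsum('nhm,hmd->nd', w, outer)  # compress to h_next maps

# Sum-pool each layer's maps over the embedding dimension and
# concatenate across layers to form the CIN output vector.
rng = np.random.default_rng(2)
m, d, h1, h2 = 6, 4, 5, 3
x0 = rng.normal(size=(m, d))
x1 = cin_layer(x0, x0, rng.normal(size=(h1, m, m)))
x2 = cin_layer(x0, x1, rng.normal(size=(h2, h1, m)))
pooled = np.concatenate([x1.sum(axis=1), x2.sum(axis=1)])
print(pooled.shape)  # (h1 + h2,) = (8,)
```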