
DGTN

DGTN stands for Dual-channel Graph Transition Network.

research paper

Zheng et al., "Dual-channel Graph Transition Network for Session-based Recommendation". arXiv, 2020.

The task of session-based recommendation is to predict user actions based on anonymous sessions. Recent research mainly models the target session as a sequence or a graph to capture item transitions within it, ignoring complex transitions between items in different sessions that have been generated by other users. These item transitions include potential collaborative information and reflect similar behavior patterns, which we assume may help with the recommendation for the target session. In this paper, we propose a novel method, namely Dual-channel Graph Transition Network (DGTN), to model item transitions within not only the target session but also the neighbor sessions. Specifically, we integrate the target session and its neighbor (similar) sessions into a single graph. Then the transition signals are explicitly injected into the embedding by channel-aware propagation. Experiments on real-world datasets demonstrate that DGTN outperforms other state-of-the-art methods. Further analysis verifies the rationality of dual-channel item transition modeling, suggesting a potential future direction for session-based recommendation.

Architecture

An illustration of the model architecture. Taking item $v_3$ for example, we propagate the embeddings of neighbor items in the target session $s$ and the neighbor session set $N_s$ towards the next layer in the intra- and inter-session channel, respectively. Then the embeddings of the final layer in the two channels are fed into the fusion function to obtain the final item embedding $v_3^*$. Based on the learned item embeddings, we generate the session embedding via session pooling. Finally, we apply a prediction layer to generate the recommendation probability $\hat{y}$.

Given a session $s$, we first determine its neighbor set $N_s$, consisting of its $r$ most similar previous sessions. To simplify the computation, we use the number of duplicate items between sessions as the similarity measure. We rank the previous sessions by the number of items they share with the target session and take the top-$r$ sessions to constitute the neighbor set $N_s$. Then we model the target session $s$ and its neighbor sessions in $N_s$ as a session graph $G_s$, where each node represents an item that appears in either the session $s$ or any neighbor session in $N_s$. Each edge $(v_i, v_j)$ denotes a user clicking on item $v_j$ after $v_i$ in the session $s$ or any neighbor session in $N_s$.
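The neighbor selection and graph construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `top_r_neighbors` and `build_session_graph` are hypothetical, "duplicate items" is interpreted as the count of distinct shared items, sessions with no shared items are skipped, and ties are broken by position in the input list. All three choices are assumptions.

```python
def top_r_neighbors(target, previous_sessions, r):
    """Return up to r previous sessions that share the most items with target."""
    target_items = set(target)
    scored = []
    for idx, sess in enumerate(previous_sessions):
        overlap = len(target_items & set(sess))  # distinct shared items (assumption)
        if overlap > 0:  # skip sessions with nothing in common (assumption)
            scored.append((overlap, idx))
    # Highest overlap first; ties keep the earlier session (assumption).
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [previous_sessions[idx] for _, idx in scored[:r]]


def build_session_graph(target, neighbors):
    """Collect nodes and directed click edges (v_i, v_j), kept separate per channel."""
    intra_edges = set(zip(target, target[1:]))  # transitions inside the target session
    inter_edges = set()
    for sess in neighbors:
        inter_edges.update(zip(sess, sess[1:]))  # transitions inside neighbor sessions
    nodes = set(target).union(*map(set, neighbors))
    return nodes, intra_edges, inter_edges


target = [1, 2, 3]
previous = [[2, 3, 4], [5, 6], [1, 9], [3, 2, 1, 7]]
neighbors = top_r_neighbors(target, previous, r=2)
nodes, intra_edges, inter_edges = build_session_graph(target, neighbors)
```

Keeping the two edge sets separate at construction time makes it straightforward to run a different propagation pass per channel later.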

To encode the item transition signals into item embeddings, we follow the message-passing structure of GNN. Each item node $v_i$ in the graph aggregates messages passed from its neighbor item nodes in the graph. However, the constructed session graph contains item nodes from target session $s$ and neighbor sessions $N_s$, which reflect the users' current behaviors and the global collaborative information, respectively. They ought to have different effects on the recommendation. Therefore, we propagate the item embeddings in two channels for the two types of neighbor item nodes. In the intra-session channel, we aggregate messages from only the neighbor nodes in $V_s$. In the inter-session channel, we aggregate messages from the neighbor nodes in $V_{N_s}$.

Only the normalized sum of neighbor embeddings is propagated to the next layer:

[Equation image not recovered]

where $d_i$ is the degree of node $v_i$ in the adjacency matrix. $a_{ij} = 1$ denotes that there is an edge between nodes $v_i$ and $v_j$, and a missing edge is represented by $a_{ij} = 0$.
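To make the propagation step concrete, here is a small sketch of one layer. The exact rule is not reproduced in the text here, so the code assumes the simplest form consistent with the definitions of $d_i$ and $a_{ij}$, namely $h_i^{(l+1)} = \frac{1}{d_i} \sum_j a_{ij} h_j^{(l)}$. The paper may use a different normalization (for example a symmetric one). The function name `propagate`, the undirected treatment of edges, and the zero vector for isolated nodes are likewise assumptions.

```python
def propagate(embeddings, edges):
    """One layer: each node receives the degree-normalized sum of its neighbors.

    embeddings: dict mapping node -> list of floats (all the same length)
    edges: iterable of (v_i, v_j) pairs; direction is ignored here (assumption)
    """
    neighbors = {v: set() for v in embeddings}
    for i, j in edges:
        if i != j and i in neighbors and j in neighbors:
            neighbors[i].add(j)  # a_ij = 1
            neighbors[j].add(i)  # a_ji = 1
    dim = len(next(iter(embeddings.values())))
    out = {}
    for v, nbrs in neighbors.items():
        d = len(nbrs)  # d_i: degree of v in this channel
        if d == 0:
            out[v] = [0.0] * dim  # no neighbors, nothing to aggregate (assumption)
        else:
            out[v] = [sum(embeddings[u][k] for u in nbrs) / d for k in range(dim)]
    return out


emb = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
intra = propagate(emb, {(1, 2), (2, 3)})
# Node 2 averages nodes 1 and 3: intra[2] == [1.0, 0.5]
```

Running the same function once with the target session's edges and once with the neighbor sessions' edges gives the two channel outputs that are later fused.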

Then we need to fuse the information from these two channels and extract effective features. We employ a CNN-based pooling method for this.

We combine both the user's long-term preference and short-term interest within the session to generate the final session embedding. For the session $s = \{v_1, v_2, \dots, v_n\}$, we use the last clicked item's embedding to represent the user's short-term interest. Then we obtain the long-term preference by adopting a soft-attention mechanism to draw dependencies between the short-term interest and each item in the session.
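The session pooling step can be illustrated with a simplified sketch. The text does not give the attention formula, so this example substitutes plain dot-product scores with a softmax for whatever learned scoring the model actually uses, and it joins the two parts by concatenation. Both are assumptions, as is the function name `session_embedding`.

```python
import math


def session_embedding(item_embs):
    """item_embs: final item embeddings of the session, in click order."""
    short_term = item_embs[-1]  # last clicked item = short-term interest
    # Attention score of each item against the short-term interest
    # (dot product stands in for a learned scoring function; assumption).
    scores = [sum(a * b for a, b in zip(short_term, v)) for v in item_embs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(short_term)
    # Long-term preference: attention-weighted sum over all items in the session.
    long_term = [sum(w * v[k] for w, v in zip(weights, item_embs)) for k in range(dim)]
    return long_term + short_term  # concatenation (assumption)


emb = session_embedding([[1.0, 0.0], [0.0, 1.0]])
# emb has length 4; its last two entries are the last item's embedding.
```

In a trained model the scoring and the final combination would typically involve learned weight matrices; they are omitted here to keep the data flow visible.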