CIGC

To pursue high efficiency, we set the target of using only new data for model updating while not sacrificing recommendation accuracy compared with full model retraining. This is non-trivial to achieve, since interaction data participates in both the graph structure used for graph convolution and the training examples of the loss function, yet the old graph structure cannot be used during model updating. Causal Incremental Graph Convolution (CIGC) estimates the output of full graph convolution with two components. Incremental Graph Convolution (IGC) combines the old representations with the incremental graph, effectively fusing long-term and short-term preference signals. Colliding Effect Distillation (CED) addresses the staleness of inactive nodes that are absent from the incremental graph by connecting them to the new data through causal inference: it estimates the causal effect of the new data on the representations of inactive nodes by controlling their collider.
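The sketch below (PyTorch) illustrates the idea behind IGC only: aggregate over the sparse incremental graph built from new interactions and fuse the result with the cached old representations. The class name, the learnable gate, and the exact fusion form are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn


class IncrementalGraphConvSketch(nn.Module):
    """Hedged sketch of an IGC-style update step (not the paper's exact method)."""

    def __init__(self):
        super().__init__()
        # learnable scalar gate balancing long-term (old) and short-term (new) signals
        self.gate = nn.Parameter(torch.tensor(0.5))

    def forward(self, old_reps: torch.Tensor, inc_adj: torch.Tensor) -> torch.Tensor:
        # old_reps: [num_nodes, dim] representations cached from the previous stage
        # inc_adj:  sparse normalized adjacency of the incremental graph (new interactions only)
        # propagation is cheap because the incremental graph has very few edges
        short_term = torch.sparse.mm(inc_adj, old_reps)
        gate = torch.sigmoid(self.gate)
        # fuse long-term (old_reps) and short-term (short_term) preference signals
        return gate * old_reps + (1.0 - gate) * short_term
```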

Given new interactions for refreshing an old GCN model, there are three straightforward strategies:

  • Full retraining, which simply merges the old data and new interactions to perform a full model training. This solution achieves the highest fidelity since all data is used. However, it is very costly in both memory and time, since interaction data is usually large in scale and keeps growing over time.
  • Fine-tuning with old graph. Interaction data participates in two parts of a GCN: forming the graph structure on which graph convolutions are performed, and constituting the training examples of the loss function. This fine-tuning solution constructs training examples from new interactions only, while still using the full graph structure. As such, although it costs fewer resources than full retraining, it is still costly due to the usage of the old graph.
  • Fine-tuning w/o old graph. This solution uses only the new interactions for both model training and graph convolution. The old graph is not used in graph convolution, which saves substantial computation and storage because of the high sparsity of the incremental graph (see the sketch after this list). However, the new interactions capture only users' short-term preferences, which can differ greatly from their long-term preferences and are much sparser. In addition, it cuts off the connection to inactive nodes that have no new interactions. It thus easily suffers from forgetting and over-fitting issues.
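The following sketch contrasts how the user-item graph would be constructed under these strategies: merging old and new interactions (full retraining, fine-tuning with old graph) versus using new interactions only (fine-tuning w/o old graph, and the graph IGC operates on). Function and variable names are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp


def build_norm_adj(interactions, num_users, num_items):
    """Build the symmetrically normalized bipartite adjacency from (user, item) pairs."""
    users, items = zip(*interactions)
    rows = np.array(users)
    cols = np.array(items) + num_users          # offset item ids into the joint node space
    n = num_users + num_items
    adj = sp.coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    adj = adj + adj.T                           # make the bipartite graph undirected
    deg = np.asarray(adj.sum(axis=1)).flatten()
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.power(deg, -0.5)
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.0      # isolated nodes get zero weight
    d_mat = sp.diags(d_inv_sqrt)
    return d_mat @ adj @ d_mat                  # D^{-1/2} A D^{-1/2}


# Full retraining / fine-tuning with old graph: convolution runs on the merged graph.
# adj_full = build_norm_adj(old_interactions + new_interactions, num_users, num_items)

# Fine-tuning w/o old graph: only new interactions, which yields a far sparser graph
# that is much cheaper to store and convolve over.
# adj_inc = build_norm_adj(new_interactions, num_users, num_items)
```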

Figure: An illustration of the incremental graph at stage t + 1 in GCN retraining.