BST

BST stands for Behavior Sequence Transformer.

Deep learning methods are widely used in industrial recommendation systems (RSs). Previous works adopt an Embedding & MLP paradigm: raw features are embedded into low-dimensional vectors, which are then fed into an MLP to produce the final recommendations. However, most of these works simply concatenate the different features and ignore the sequential nature of users' behaviors. The BST model uses the Transformer to capture the sequential signals underlying users' behavior sequences for recommendation.
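The Embedding & MLP paradigm described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation of any particular system: the table sizes, `EMBED_DIM`, `HIDDEN`, `SEQ_LEN`, and the function name `embed_and_mlp` are all hypothetical, and the weights are random and untrained.

```python
import math
import random

random.seed(0)   # fixed seed so the sketch is reproducible
EMBED_DIM = 4    # hypothetical embedding size
HIDDEN = 8       # hypothetical hidden-layer width
SEQ_LEN = 3      # hypothetical number of past behaviors per user

def make_matrix(rows, cols):
    """Small random matrix, standing in for learned parameters."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def dense(x, weights, bias):
    """Fully connected layer; `weights` has one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

user_table = make_matrix(10, EMBED_DIM)   # 10 hypothetical users
item_table = make_matrix(20, EMBED_DIM)   # 20 hypothetical items

# The MLP input is the user, the past behaviors, and the candidate, concatenated.
INPUT_DIM = EMBED_DIM * (1 + SEQ_LEN + 1)
w1, b1 = make_matrix(HIDDEN, INPUT_DIM), [0.0] * HIDDEN
w2, b2 = make_matrix(1, HIDDEN), [0.0]

def embed_and_mlp(user_id, behavior_ids, candidate_id):
    # 1) Embedding: look up a low-dimensional vector for each raw ID.
    vectors = [user_table[user_id]]
    vectors += [item_table[i] for i in behavior_ids]
    vectors.append(item_table[candidate_id])
    # 2) Concatenation: the embeddings are simply joined end to end;
    #    nothing here relates one behavior to another.
    x = [v for vec in vectors for v in vec]
    # 3) MLP: one ReLU hidden layer, then a sigmoid response probability.
    hidden = [max(0.0, h) for h in dense(x, w1, b1)]
    logit = dense(hidden, w2, b2)[0]
    return 1.0 / (1.0 + math.exp(-logit))

score = embed_and_mlp(user_id=3, behavior_ids=[5, 7, 2], candidate_id=11)
```

Because the weights are untrained, `score` carries no meaning beyond being a probability-like value; the point is the data flow from IDs to embeddings to a concatenated vector to the MLP.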

Inspired by the success of the Transformer on machine translation in natural language processing (NLP), this model applies the self-attention mechanism in the embedding stage to learn a better representation for each item in a user’s behavior sequence by taking sequential information into account, and then feeds these representations into MLPs to predict users’ responses to candidate items. The key advantage of the Transformer is that self-attention lets it better capture the dependencies among words in a sentence; intuitively, the “dependencies” among items in users’ behavior sequences can be extracted in the same way.
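To make the self-attention step concrete, the following is a minimal sketch of single-head scaled dot-product self-attention applied to a toy behavior sequence. It makes several simplifying assumptions that are not taken from the text above: queries, keys, and values are the raw item embeddings (no learned projections), there is a single head, and positional information, residual connections, and layer normalization are omitted. The names `self_attention`, `behavior_seq`, and `mlp_input` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Single-head scaled dot-product self-attention with Q = K = V = seq.

    Returns the context-aware item vectors and the attention weights.
    """
    d = len(seq[0])
    outputs, all_weights = [], []
    for query in seq:
        # Similarity of this item to every item in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in seq]
        weights = softmax(scores)
        # New representation: attention-weighted mix of all item vectors.
        mixed = [sum(w * value[i] for w, value in zip(weights, seq))
                 for i in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy embeddings for three items a user interacted with, in order.
behavior_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual, weights = self_attention(behavior_seq)

# The context-aware vectors are flattened before being passed to an MLP.
mlp_input = [v for vec in contextual for v in vec]
```

Each row of `weights` sums to 1 and describes how strongly one item draws on every other item in the sequence, which is the "dependency" the paragraph above refers to. In a trained model the projections and downstream MLP would be learned jointly.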
