# MLIP-F24-GenAdRec

Authors: Edoardo Botta, Utkarsh Priyam, Hemit Shah

## Intro

This repository contains a collection of experiments with ad recommendation methods using the Taobao Ad Display/Click Data dataset, offered by Alibaba.

The project was carried out in the context of the 10-718 Machine Learning in Practice course at CMU.

## Non-ML heuristic baseline

Our simplest baseline is a heuristic approach.

Given ad features $\boldsymbol{X}=\{X_1,\dots,X_n\}$, we score any given ad as $f(\boldsymbol{X})\propto\hat{P}(\boldsymbol{X})=\prod_{i\in[n]}\hat{P}(X_i|X_{<i})$, where $\hat{P}$ is the empirical distribution of the ad features for a given user. We take each user's historical ad clicks and build, for each ad feature, a histogram of category frequencies from those interactions. We then score all existing ads against these empirical distributions to obtain their relative likelihood of a click for that user. Note that the "generated" ad is simply the ad constructed from the empirical mode of each ad feature over the user's historical interactions; the distribution of ad features the user has clicked on serves as a user-specific likelihood metric over all existing ads.
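As an illustration, the sketch below ranks candidate ads by the product of per-feature empirical frequencies from a user's clicked ads. It is a hypothetical, simplified rendition (not the repo's code) that uses marginal frequencies per feature; the full method conditions each feature on the preceding ones.

```python
import numpy as np
from collections import Counter

def rank_ads(user_clicks, candidate_ads, alpha=1e-6):
    """Rank ads by relative click likelihood under per-feature histograms.

    user_clicks, candidate_ads: lists of ads, each a tuple of categorical
    feature values; `alpha` smooths categories the user never clicked."""
    n_features = len(user_clicks[0])
    # One histogram of category frequencies per ad feature
    histograms = [Counter(ad[i] for ad in user_clicks) for i in range(n_features)]
    total = len(user_clicks)

    def score(ad):
        # Relative likelihood of a click: product of empirical frequencies
        return np.prod([(histograms[i][ad[i]] + alpha) / (total + alpha)
                        for i in range(n_features)])

    return sorted(candidate_ads, key=score, reverse=True)
```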

The execution script for this baseline is at `genadrec/non_ml_baseline/non_ml_baseline.py`.

## Simple ML baseline

Our simple ML baseline takes the unique user ID as input (since we seek to independently model a distribution over categorical ad features for each user) and outputs a distribution over each target ad feature via a simple parallel MLP architecture (shown below). The produced distributions are then combined with the same method as the non-ML baseline to yield a joint distribution and scoring function over all possible ads. All ads (including the held-out latest ad-click interaction per user) are scored according to each user's joint distribution over ad features they are likely to positively interact with, as previously described. These scores are used to rank all ads, and the NDCG is computed from the rank of the held-out target ad in the test split.

*Figure: simple-ml-arch1 (parallel MLP baseline architecture)*
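For concreteness, here is a minimal sketch of such a parallel-MLP model in PyTorch; `ParallelMLPBaseline`, the layer sizes, and the feature order are illustrative assumptions rather than the repo's actual code. Since each user has a single held-out positive, the NDCG contribution reduces to $1/\log_2(1 + r)$, where $r$ is the rank of the target ad.

```python
import torch
import torch.nn as nn

class ParallelMLPBaseline(nn.Module):
    """A shared user-ID embedding feeds one independent MLP head per
    categorical ad feature (e.g. category, brand, customer, campaign),
    each producing logits over that feature's vocabulary."""

    def __init__(self, n_users, feature_vocab_sizes, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, vocab_size))
            for vocab_size in feature_vocab_sizes
        ])

    def forward(self, user_ids):
        u = self.user_emb(user_ids)
        return [head(u) for head in self.heads]  # one logit vector per feature
```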

While no feature can be uniquely determined by some combination of the others, we noticed that conditioning on a feature subset drastically reduces the options possible for the remaining features. For instance, while over 500k unique campaigns exist, most category-brand pairs had fewer than 10 valid campaign or customer ID options within the set of all existing ads. Exploiting this structure could significantly improve model performance on the relevant subset of feature values.

To take advantage of it, during data processing we pre-partitioned each ad feature conditioned on the preceding features in the hierarchy. More concretely, we produced mappings from categories to valid brand IDs, category-brand tuples to valid customer IDs, and category-brand-customer tuples to valid campaign IDs. These mappings provide a new set of engineered targets, which we used to shift the model's prediction targets from marginal to conditional distributions for each feature. This reformulates the loss objective in terms of the conditional distributions of features given all preceding features, mirroring the learning target of the non-ML baseline. Specifically, we developed a novel masked cross-entropy loss, which restricts the target distribution to the subset of logits that are valid in context. This strategy lets the model explicitly learn each conditional feature distribution instead of the full marginal distribution.
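A hedged sketch of that loss (illustrative, not the repo's exact implementation): logits for values outside the valid conditional support are set to $-\infty$, so the softmax normalizes only over feature values consistent with the preceding ones.

```python
import torch.nn.functional as F

def masked_cross_entropy(logits, targets, valid_mask):
    """Cross-entropy restricted to the conditional support.

    logits:     (B, V) scores over the full feature vocabulary
    targets:    (B,)   index of the true feature value
    valid_mask: (B, V) True where a value is valid given preceding features
                (e.g. brands valid for the ad's category)
    """
    masked_logits = logits.masked_fill(~valid_mask, float("-inf"))
    return F.cross_entropy(masked_logits, targets)
```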

Our first architectural improvement cascaded the four previously parallel, independent MLP networks into a hierarchical structure with residual/skip connections. The new architecture is shown below, with the added connections highlighted in gold/bold.

*Figure: simple-ml-arch2 (hierarchical MLP with residual/skip connections)*
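One plausible reading of the cascade, sketched with hypothetical names: each head receives the user embedding concatenated with the previous head's hidden state, so later features are predicted conditioned on representations of the earlier ones.

```python
import torch
import torch.nn as nn

class CascadedHeads(nn.Module):
    """Hierarchical MLP heads with skip connections from the user embedding."""

    def __init__(self, feature_vocab_sizes, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.trunks = nn.ModuleList()
        self.outs = nn.ModuleList()
        in_dim = emb_dim
        for vocab_size in feature_vocab_sizes:  # category, brand, customer, campaign
            self.trunks.append(nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU()))
            self.outs.append(nn.Linear(hidden_dim, vocab_size))
            in_dim = emb_dim + hidden_dim  # next head also sees this head's state

    def forward(self, user_vec):
        h = self.trunks[0](user_vec)
        logits = [self.outs[0](h)]
        for trunk, out in zip(self.trunks[1:], self.outs[1:]):
            h = trunk(torch.cat([user_vec, h], dim=-1))  # skip connection
            logits.append(out(h))
        return logits
```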

The training script for this model is available at `genadrec/simple_train.py`.

## Two-Tower Model

We experimented with a more complex, two-tower contrastive embedding-based approach. The overall goal is to produce a vector representation of a user that is similar to the representations of ads they are likely to click on and dissimilar from representations of ads they are unlikely to click on. The ad embedding is generated by embedding the categorical ad features and passing the concatenated vectors through an MLP. The user embedding is generated by a learned embedding-table lookup. These two components are jointly trained in a contrastive fashion, using interactions with clicks as positive examples and interactions without clicks as negative examples. The loss function is the sampled softmax loss. The architecture diagram for the two-tower model is shown below.

*Figure: two-tower-arch (two-tower model architecture)*
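A minimal sketch of the two towers (names and dimensions are assumptions); for the contrastive step it uses in-batch negatives, a common stand-in for a sampled softmax over negative interactions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    def __init__(self, n_users, feature_vocab_sizes, emb_dim=32, out_dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, out_dim)  # user tower: table lookup
        self.feat_embs = nn.ModuleList(
            [nn.Embedding(v, emb_dim) for v in feature_vocab_sizes])
        self.ad_mlp = nn.Sequential(  # ad tower: embed, concatenate, MLP
            nn.Linear(emb_dim * len(feature_vocab_sizes), out_dim),
            nn.ReLU(), nn.Linear(out_dim, out_dim))

    def forward(self, user_ids, ad_features):
        # ad_features: (B, n_features) integer feature codes
        ad_vec = torch.cat([emb(ad_features[:, i])
                            for i, emb in enumerate(self.feat_embs)], dim=-1)
        return self.user_emb(user_ids), self.ad_mlp(ad_vec)

def contrastive_loss(user_vecs, ad_vecs):
    # Each user's clicked ad is the positive; the other ads in the batch
    # serve as sampled negatives.
    logits = user_vecs @ ad_vecs.T
    labels = torch.arange(len(user_vecs), device=logits.device)
    return F.cross_entropy(logits, labels)
```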

A training run for this model is executed by passing `model_type=ModelType.TWO_TOWER` to the `Trainer` class initializer in `genadrec/tt_seq_train.py`.

## RNN Sequence Model

### 1. Click Sequence Model

Improving on the static two-tower model, we introduced a sequence model based on GRU cells, which constructs a user embedding from the sequence of the user's past interactions. The input sequence is composed of the ad embeddings of past interactions in chronological order, each summed with an action embedding that distinguishes a past "click" from a past "non-click". The model is still trained contrastively, aligning the embedding produced from the user's sequence with the embedding of the next clicked ad. Compared to the pointwise two-tower approach, this model uses data more efficiently: negative interactions are part of the input sequence itself rather than appearing only in the contrastive loss. Hence, the model is not only learning to predict the next "click" given past clicks; it is effectively learning to predict the next "click" given both past "clicks" and "non-clicks".

*Figure: seq_model_diagram (GRU sequence model architecture)*
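A minimal sketch of the GRU-based user tower, assuming each step's input is the past ad's embedding summed with a click/non-click action embedding; the final hidden state is the user embedding fed to the same contrastive objective.

```python
import torch.nn as nn

class GRUUserTower(nn.Module):
    def __init__(self, emb_dim=64, n_actions=2):
        super().__init__()
        self.action_emb = nn.Embedding(n_actions, emb_dim)  # 0 = non-click, 1 = click
        self.gru = nn.GRU(emb_dim, emb_dim, batch_first=True)

    def forward(self, ad_embs, actions):
        # ad_embs: (B, T, emb_dim) embeddings of past interacted ads, in
        # chronological order; actions: (B, T) integer action codes
        x = ad_embs + self.action_emb(actions)
        _, h = self.gru(x)
        return h[-1]  # (B, emb_dim) user embedding
```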

A training run for this model is executed by passing `model_type=ModelType.SEQ` to the `Trainer` class initializer in `genadrec/tt_seq_train.py`.

### 2. Behaviour Log Sequence Model

The original training data included only users' ad-click and ad-non-click interactions, along with the user- and ad-specific features in the dataset. We retrain the sequence model on the same click targets after augmenting this data with other user behaviors on the Taobao shopping platform, including browse (feed-listing-click), favorite, add-to-cart, and purchase interactions with other items on the platform.

This option is enabled by passing the additional parameter `behavior_log_augmented=True` to the sequence model `Trainer` class initializer in `genadrec/tt_seq_train.py`.
