Protected Audience AB testing #909
Hello,

This has been discussed in the WICG call of 29 November 2023, where @michaelkleber asked why the scenario would not work with a split of interest groups by 1st-party user id.

Let's imagine a scenario where I want to test 2 buying strategies across all my advertisers: one where I always bid 1 EUR (A) and one where I always bid 2 EUR (B). In today's world I would assign either strategy A or B to users and then measure how many displays, clicks & sales I get. Note that paying less doesn't mean the user will also buy something; what I want to find is the best buying strategy.

Now let's say 1 Chrome browser does 1 auction. I have 2 advertisers, create 1 interest group per advertiser, and then split by each advertiser's 1st-party user id. During the auction each IG participates, and since the two splits are drawn independently, across all Chrome browsers 25% would see the AA scenario, 25% AB, 25% BA and 25% BB. So if I can't apply one consistent split inside one auction, this form of split doesn't seem to work at all for cross-advertiser buying strategies, even for retargeting campaigns.

As a side note, splitting by time (hour, day, ...) usually doesn't work because users don't have the same behavior over time (see Black Friday for example).

EDIT: removed cost-per-sales metrics to simplify the example
Jumping in on the subject, to double down on what @fhoering explained: the issue here is that we won't be able to measure during the test what will happen once the tested modification is rolled out. For instance, one user may have two interest groups for one ad tech, where one (IG1) is in the reference population (no modification of the bidding) and the other (IG2) is in the test population. The measurement during the test will then be impacted by competition within the ad tech's own interest groups, which won't happen after the roll-out.
@alois-bissuel I think we talked about this during the 2023-11-29 call. This kind of bidding experiment is one where it makes sense to randomize the A/B diversion based on a 1st-party identifier on the publisher site. Then all of your IGs will compete against your other IGs using the same strategy on a single page (or even across a single site), so it will be reflective of the effect of rolling the strategy out broadly.
In reality, changing the bid strategy is a complex behavior, so it will never be as simple as knowing in advance what effect it will produce (e.g. that the bid will always be lower in all cases). And with a split by publisher 1st-party id I have the problem that I cannot know which bid strategy produced the user's conversion behavior at the end. If the user goes to publisher1 (high bid, sees several ads), then publisher2 (low bid, no ad), then publisher3 (low bid, sees one ad and clicks) and then buys something, the conversion cannot be attributed to a single strategy. To me this ask still makes sense, and 3 bits seems reasonable and very aligned with the Shared Storage API. It could be seen as converging all Privacy Sandbox APIs.
At Taboola we also need this kind of stickiness in some types of A/B tests, so it's very important for us as well.
Why do we need A/B tests?
To give an example of what we mean by long-term effects, let's look at a complex user journey. Assume that we split users per publisher website (because we have access to the hostname in the PA API): on some publisher websites we apply buying strategy A and on others buying strategy B, and we can measure conversions like sales for each ad display.
In retargeting, we show a banner to users multiple times before they buy. For example, if a user has added Nike shoes to his basket but has not converted, we will remind him of the product through ads on several publishers. When he converts, the sale will be attributed to the publisher on which the last ad was shown, not to whatever happened before that. In other words, it is impossible to measure the effect of buying strategy A versus B since we will not have a single identifier across sites.
Existing mechanism with ExperimentGroupId
https://github.com/WICG/turtledove/blob/main/FLEDGE.md#21-initiating-an-on-device-auction
The expected workflow has been described here:
Extending FLEDGE to support coordinated experiments by abrik0131 · Pull Request #266 · WICG/turtledove
Our understanding is that this translates to the following (a usage sketch follows the list):

Pros:

- `buyerExperimentGroupId` can be dynamically set by the buyer as part of the contextual call, allowing any split (see comment below; this might no longer apply, as async calls should be used to reduce auction latency)
- `reportWin` …

Cons:
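For reference, here is a minimal sketch of how the existing mechanism is wired into the auction configuration, based on the FLEDGE explainer (origins and id values are illustrative):

```js
// Sketch of the existing ExperimentGroupId mechanism from the FLEDGE explainer.
const auctionConfig = {
  seller: 'https://seller.example',
  decisionLogicURL: 'https://seller.example/decision-logic.js',
  interestGroupBuyers: ['https://buyer.example'],
  // Forwarded to each buyer's trusted key/value server with the bidding
  // signals request, which enables coordinated experiments:
  perBuyerExperimentGroupIds: { 'https://buyer.example': 3498 },
  sellerExperimentGroupId: 500,
};

const result = await navigator.runAdAuction(auctionConfig);
```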
Splitting per interest group and 1st party user id
Doing a per-interest-group split seems appealing because, for interest groups created on one advertiser's website, one could apply the same changes to the same campaigns for all 1st-party users of this advertiser.
This would mainly work for single-advertiser AB tests where we target users that already visited the advertiser's web page. It would work less well for more complex scenarios on all our traffic where we modify the behavior of multiple campaigns on multiple websites; in that case we have the same drawback as above: the very same user could see behavior changes from both population A and population B.
As we would split users during the tagging phase, we cannot guarantee that we really see those users again at a bidding opportunity. So we cannot guarantee an even split: at bidding time we might only see n% of the users of population A and a different share of population B (some more explanation here: Approach 2: Intent-to-Treat).
Pros:
Cons:
Using shared storage for AB testing
The shared-storage proposal already has a section on how to run AB tests. The general idea is to create a unique user identifier (seed) for the Chrome browser with generateSeed, then call the window.sharedStorage.selectURL operation, which takes a list of URLs, hashes the user identifier to an index in this list and returns the URL for that user. The AB test population would be encoded in the URL, and as the list is limited to 8 URLs it allows 3 bits of entropy for the user population. Because different URLs could be used for each call, each leaking up to 3 bits, mechanisms are in place to limit the budget per 24h per distinct number of URLs (see https://github.com/WICG/shared-storage#budgeting).
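A minimal sketch of this flow, based on the shared-storage explainer (module name, operation name and URLs are illustrative):

```js
// ab-worklet.js — runs inside the shared storage worklet.
class SelectGroupOperation {
  async run(urls, data) {
    // Keep a sticky per-browser seed so the same browser always maps
    // to the same index (i.e. the same AB population).
    let seed = await sharedStorage.get('ab-seed');
    if (seed === undefined) {
      seed = String(Math.floor(Math.random() * 2 ** 32));
      await sharedStorage.set('ab-seed', seed);
    }
    return Number(seed) % urls.length; // index into the (max 8) URL list
  }
}
register('select-group', SelectGroupOperation);
```

```js
// In the page: 8 URLs = 3 bits of population encoded in the chosen URL.
await window.sharedStorage.worklet.addModule('ab-worklet.js');

const urls = [...Array(8)].map((_, i) => ({
  url: `https://adtech.example/creative?pop=${i}`,
}));

// Resolves to an opaque config for a fenced frame; the page itself
// never learns which of the 8 URLs was chosen.
const fencedFrameConfig = await window.sharedStorage.selectURL(
    'select-group', urls, { resolveToConfig: true });
```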
As of now, shared storage can only be called from a browser JavaScript context and not from a Protected Audience worklet. This means the URL selection can only happen during rendering, not during bidding, so shared storage can only be used for pure creative AB tests and not for Protected Audience bidding AB tests. We therefore still need a dedicated proposal to enable Protected Audience AB tests.
Proposal - Inject a low entropy global user population into generateBid
For real-world scenarios, a global user population would still be needed for AB tests that measure complex user behaviors. As injecting any form of user identifier would leak additional information, we propose a low-entropy user identifier plus some mitigations to prevent it from being used as, or combined into, a full user identifier.
Chrome could cluster all users into a low-entropy UserExperimentGroupId of something like 3 bits. This identifier should be drawn randomly for each ad tech, and not be identical across all actors, so that our measurements cannot be influenced by the testing of other ad techs.

As attribution is measured for each impression or click, we would like this identifier to be stable for some time, but it should also be shifted for a certain share of users to prevent a large population drift over time. Long-running AB tests influence users, so user behavior changes over time and introduces bias. The usual way to solve this is restarting the AB test, which cannot be done here with such a limited number of buckets. So one idea might be to constantly rotate the population. Constantly rotating the population would also limit the effectiveness of a coordinated attack among ad techs to identify a user. If 1% of users get reassigned to a new population each day, then after 14 days up to 14% of users might have shifted population (slightly less in expectation, since some users can be reassigned more than once).
If the labels are rotated every X weeks, it adds a further burden to those trying to collude and update their 1st-party ID → global ID mappings.
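A toy model of the rotation idea, assuming a 1%-per-day reassignment rate and a 3-bit group id (the rate and all names are illustrative, not part of the proposal):

```js
// Each day, 1% of browsers are reassigned to a fresh random 3-bit group.
const DAILY_ROTATION_RATE = 0.01;

function rotateDaily(groups /* array of group ids in 0..7, one per browser */) {
  return groups.map((g) =>
      Math.random() < DAILY_ROTATION_RATE ? Math.floor(Math.random() * 8) : g);
}

// After 14 days, the share of browsers reassigned at least once is
// 1 - 0.99^14 ≈ 13.1%, matching the rough 14% figure above.
```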
This new population id would be injected only into the generateBid function and into the trusted key/value server (mirroring the current ExperimentGroupId behavior; because many of our computations are still server side, this is secure by design as the server runs in a TEE without side effects).

The identifier could only get out of the generateBid function via existing mechanisms that already present privacy/utility trade-offs. For example, if we encode the 3 bits into the renderUrl, this proposal is very aligned with the shared-storage proposal to allow 8 URLs (= 3 bits of entropy) for selectURL to activate creative AB testing (post bidding). In our case, as Chrome would control the seed and the generateSeed function could not be used, we would not leak more than 3 bits, so introducing any form of budget capping seems unnecessary.

To prevent a cookie-sync scenario where ad techs combine this new id into a full user identifier, Chrome could add an explicit statement to the attestation to prevent ad techs from sharing this id.
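A minimal sketch of what this could look like inside generateBid, assuming Chrome exposed the proposed id via browserSignals (the field name `userExperimentGroupId` is hypothetical, not part of any spec, and the bid values come from the 1 EUR / 2 EUR example above):

```js
// Hypothetical sketch: browserSignals carries a Chrome-assigned 3-bit
// population id, stable per ad tech and rotated slowly over time.
function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  const group = browserSignals.userExperimentGroupId; // 0..7 (hypothetical)

  // Route the browser into buying strategy A or B based on the low-entropy id.
  const bid = (group < 4) ? 1.0 : 2.0; // strategy A vs. strategy B

  return {
    bid: bid,
    // Encoding the 3 bits into the render URL is one of the existing egress
    // channels discussed above (same entropy budget as selectURL's 8 URLs).
    render: interestGroup.ads[0].renderURL + '?pop=' + group,
  };
}
```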
By design, as we have few AB test populations, we could only run a limited number of AB tests at the same time. But we could reserve this mechanism for important AB tests and use the ExperimentGroupId mechanism for more technical AB tests.