forDebuggingOnly availability #632
Hi @jonasz, yes, they are supported in both Mode A and Mode B.
@jonasz We are actually thinking more about the privacy risks of the two parts of the forDebuggingOnly APIs. We need to think this through further; let us get back to you soon, hopefully next week.
Do you think some sampled mode could be acceptable in the long term? Something small enough that it doesn't allow any user identification, like 1% of forDebuggingOnly.reportWin & forDebuggingOnly.reportLoss?
@ajvelasquezgoog Friendly ping, any updates on this issue?
We thank everyone interested for their patience in getting updates on this matter. We have been working closely with, and collecting feedback from, stakeholders over the last several weeks, and have examined the effort required to adapt to the full removal of these functions by the 3PCD deadline. The incremental feedback that we have received over the last few months on this plan can be summarized as follows:
Given that, we think there is a path to continue supporting these use cases with a level of fidelity that will be acceptable, and that also continues to meet our privacy goals. In essence, we think it is possible to keep the forDebuggingOnly reporting functions; the proposal entails the introduction of 3 Chrome-controlled variables that will modify their current behavior:

New variable 1: Sampling Rate. Denotes how often a call to the API will actually result in a debug report being sent.

New variable 2: Cooldown Period. Denotes for how long (in days) a single Chrome client, for a given calling ad tech, should keep returning the same FALSE result after running the randomizing function that determines that the result should be FALSE.

New variable 3: Lockout Period. Denotes for how long (in days) a single Chrome client, for any and all calling ad techs, should return a FALSE result after the randomizing function returns TRUE once.

In other words, once one ad tech's call actually results in a report being sent, the lockout applies to all ad techs on that client.
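To make the interplay of these variables concrete, here is a rough sketch of the kind of client-side logic being described. The constants, names, and overall shape below are illustrative placeholders, not the actual Chrome implementation:

```js
// Illustrative sketch only; values and names are placeholders, not Chrome's
// actual implementation of the three proposed variables.
const SAMPLING_RATE = 1 / 1000;   // chance that a call actually sends a report
const COOLDOWN_DAYS = 365;        // per-ad-tech cooldown after a FALSE roll
const LOCKOUT_DAYS = 3 * 365;     // global lockout after a report is sent

const state = { lockoutUntil: 0, cooldownUntil: new Map() }; // per Chrome client

function maybeSendDebugReport(adTechOrigin, nowInDays) {
  if (nowInDays < state.lockoutUntil) return false;                 // global lockout
  const cooldownEnd = state.cooldownUntil.get(adTechOrigin) ?? 0;
  if (nowInDays < cooldownEnd) return false;                        // per-ad-tech cooldown

  if (Math.random() < SAMPLING_RATE) {
    state.lockoutUntil = nowInDays + LOCKOUT_DAYS;                  // TRUE: lock out every ad tech
    return true;                                                    // report is sent
  }
  state.cooldownUntil.set(adTechOrigin, nowInDays + COOLDOWN_DAYS); // FALSE: cool down this ad tech
  return false;
}
```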
Based on these variables, and based on 2 reasonable assumptions we can make:
We calculate that, in legitimate scenarios like the ones detailed in the opening paragraphs of this reply, each participating ad tech should be getting between ~4.7K and ~5.4K daily reports, if they choose to implement these calls.

We also want to highlight the protections that we see against malicious scenarios with this approach. A malicious actor that knows the sampling rate still has to contend with the cooldown and lockout periods, which sharply limit how often any single client can be made to send a report.

We believe that with this proposal we can accomplish the goals we set out in our opening paragraphs. Any and all feedback is very appreciated! @jonasz, here you go.
We are thrilled to see long-term support for these debugging APIs and look forward to the improved observability as we mature our integrations. I wanted to raise two concerns with the details of the above proposal.

In practice, we have observed some overhead when enabling debug code paths, due to the additional code profiling and report building. Given the highly latency-sensitive worklet execution environment, we would recommend a mechanism to detect availability of the API before incurring that cost.

Additionally, we are concerned about the shared lockout period, given that the threshold for critical situations may differ across ad techs. If one buyer decides to frequently invoke the API, or unintentionally introduces a major bug which accelerates their call rate to 100%, should this lock out another buyer who needs to debug their own rare exceptions or sudden incidents?
Hmm. There are two different things you might be asking here:
I think 1 would need to be an API that actually performed the die roll, and so triggered the cooling-off period 999/1000 of the times it was called, which 2 would not. I'm not sold on either one of these, but which are you asking for?
If one buyer spams the API for whatever reason, the worst they could do is lock out 1/1000 of people for everyone else. The cooling-off period that happens 999/1000 of the time isn't shared state; it is only the 3-year lock-out that would let one ad tech affect another ad tech.
Thank you for the responses. Could you kindly elaborate on why triggering the cooling-off period is necessary when detecting API availability? Is the concern that we would have access to the 1/1000 device-sticky decision to truly send the debug report, and that this may influence or leak out of the internal worklet execution?

I believe we're asking for (1), but without tripping the cooldown period, given that this (a) effectively incurs the statistical cost of always invoking the API and (b) may be a surprising side effect for all developers. I'm afraid this may incentivize us to always invoke the API if it's detected, rather than save it for error states; alternatively, we could just accept the overhead and restrict building the event messages to truly exceptional scenarios.

Great point about the difference between the global lock-out and the per-ad-tech cooling-off periods; I agree that the interplay of these successfully mitigates the impact of a spammy ad tech. One final note: as a user of
An API that is of the form "If I asked for a report right now, then would you send it?" would completely eliminate the 1-year cooling-off period, right? After all, nobody would ever call the debugging API if they knew that it would not send a report. Your request would allow circumvention of all the "protections against malicious scenarios" that Alonso described above. Or maybe I'm still misunderstanding what you're asking for?

On the other hand, I don't see any harm in an API of the form "Am I currently cooling down and/or locked out?" That would let you build your debugging requests much less often than without it, even though you would still only have a 1/1000 chance of sending each one that you built. @JensenPaul WDYT?

(Regarding "lockout" vs "cooldown": I personally feel like "lockout" feels more global, like "the door is locked", while "cooldown" seems more caller-specific, as in "you are over-heated, go take a walk and cool down and then you can come back and join the rest of us." But if other people have opinions on these or other more intuitive names for the two states, please share!)
Ah, I was assuming that the FALSE die roll was cast once per worklet function execution and there was no way to coordinate a loop-based attack external to these functions. Thinking outside of that box, it does become clear why the check itself requires a cooldown. Any mechanisms to minimize the overhead of the API usage would still be welcome.

Overall, the statefulness of this API makes it more difficult to conceptually model an observability framework, compared to traditional random sampling. I wonder if there might be issues here with a population more prone to exceptional circumstances gradually dwindling over time due to the cooldown, as well as the true rate of an exception becoming invisible without a fully transparent sampling rate? I also worry about the long-term repercussions of an initial, overly lax threshold for exceptional events, e.g. an ad tech accidentally locking themselves out of the API for a year.
Thanks. I think the "Am I currently cooling down and/or locked out?" API would indeed help with minimizing the overhead; we'll explore that.
I agree with this concern, but I haven't come up with any other way to preserve the privacy goals.
Yes, great point, that does seem like it's too easy to accidentally shoot yourself in the foot. Instead of a 1-year cool-down when you don't get a report, I wonder if we could instead have a shorter timeout, like 1 week, that would trigger 90% of the time, and a 1-year timeout the other 10%. Then even if an ad tech shipped a bug that asked everyone to send a debug report all at once, they would recover the ability to debug on 90% of their traffic a week later. (All percentages and time durations subject to change, but at least it's an idea.)
Okay, I've done a little simulating of the anti-footgun two-cooldowns idea (thank you, Google Sheets, for the "Iterative calculation" capability in the File > Settings > Calculation menu). Suppose that when you ask for a debug report in a Chrome instance which is not in the cool-down or lock-out state:
Which is to say: if you accidentally push into production a bug that asks everyone in the world to send you a debug report, you would regain your ability to do selective debugging on 90% of browsers after two weeks, instead of after one year. In that case, with 100 ad techs spamming the API as much as possible, each one gets around 6,500 debug reports per day per billion Chrome instances. If there were only a single ad tech using the API, they would instead get around 20K reports per day per billion, so the global lock-out mechanism cuts the number of reports to about 1/3 of what it would be otherwise. The truth will probably be somewhere between those two extremes.
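If you want to reproduce this kind of estimate without a spreadsheet, here is a rough Monte Carlo sketch of the single-ad-tech case using the same illustrative parameters (1/1000 sampling, 90% two-week / 10% one-year cooldown, 3-year lockout). It is only an approximation of the model above, not anything Chrome ships:

```js
// Rough Monte Carlo of one ad tech calling the API at every opportunity.
// All parameters are the illustrative ones from this thread, not final values.
const SAMPLE = 1 / 1000;          // chance a call sends a report
const SHORT = 14, LONG = 365;     // cooldown lengths in days, 90% / 10% split
const LOCKOUT = 3 * 365;          // lockout after a report is sent, in days
const BROWSERS = 100_000, DAYS = 4 * 365;

const blockedUntil = new Float64Array(BROWSERS); // day each browser becomes usable again
let lateReports = 0;                             // reports counted near steady state

for (let day = 0; day < DAYS; day++) {
  for (let b = 0; b < BROWSERS; b++) {
    if (day < blockedUntil[b]) continue;                          // cooling down or locked out
    if (Math.random() < SAMPLE) {
      if (day >= DAYS - 365) lateReports++;                       // only count the last year
      blockedUntil[b] = day + LOCKOUT;                            // report sent: global lockout
    } else {
      blockedUntil[b] = day + (Math.random() < 0.9 ? SHORT : LONG); // per-ad-tech cooldown
    }
  }
}

// Reports per day, scaled to a billion browsers: roughly 20K, as quoted above.
console.log((lateReports / 365) * (1e9 / BROWSERS));
```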
I'm curious about more insight into the rationale for the 1- or 3-year "long-term" lock-out / cool-down intervals... and the math that goes into what the minimum privacy-safe interval would need to be. Follow-up question -- have we considered the 1/1000 being defined per
The numbers are admittedly somewhat arbitrary! But sure, here is my thinking:
Sorry that I don't have a closed-form formula for the reports-per-day figure. I had one back when there was only one kind of cool-down, but once a second cool-down rate came along, simulation seemed like the only viable way.
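(That said, for the single always-calling ad tech quoted above, a back-of-the-envelope renewal approximation, treating each browser as cycling through one available day followed by a cooldown or lockout, lands close to the ~20K figure; it is the interaction of many ad techs through the shared lock-out that really needs simulation.)

```js
// Back-of-the-envelope renewal approximation for a single always-calling ad tech.
// Parameters are the illustrative 14d / 1yr / 3yr values discussed in this thread.
const p = 1 / 1000;                             // sampling rate
const expectedCooldown = 0.9 * 14 + 0.1 * 365;  // 90% two weeks, 10% one year
const lockout = 3 * 365;                        // after a report is actually sent
const cycleDays = 1 + (1 - p) * expectedCooldown + p * lockout;

console.log((1e9 * p) / cycleDays);             // ≈ 19,500 reports/day per billion browsers
```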
I suppose that's the most important question -- given the great lengths to which PS goes to ensure anonymity, there seems to be some wiggle room in these endpoints which could, in principle, allow some non-"me"-specific information to be used for debugging that wouldn't be about the user. For example -- am I scoring k-anon bids the way that I'm expecting as a seller?
I completely agree that the browser can be more relaxed about information when it is either information from a single site or information shared across many users. But bidding functions necessarily have information from two sites (the IG Join site and the publisher site hosting the auction), with no k-anonymity constraint on either of them; and scoring in a whole auction implicitly involves information from many sites (the IG Join sites of every IG that bids). I don't see any way that the browser can possibly be more relaxed about that sort of many-site, user-specific information.
In order to assess the sampling and other parameters, it will be useful if the API provides three bits that tell whether the report was sampled, whether the device is in a cooldown period, and whether the device is in a lockout period, respectively, before rolling out this sampling mechanism. These could be reported via URL params appended to the reporting URL string:
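For example, an annotated report URL might look like the following; the parameter names are purely illustrative, not part of any spec:

```js
// Hypothetical example only: the sampled/in_cooldown/in_lockout parameter names
// are illustrative. ${winningBid} is one of the existing debug URL macros that
// the browser substitutes before sending the report.
const debugUrl =
    'https://adtech.example/fdo-loss?bid=${winningBid}' +
    '&sampled=true&in_cooldown=false&in_lockout=false';
forDebuggingOnly.reportAdAuctionLoss(debugUrl);
```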
We’re aware that it is possible for each ad tech company to implement all the logic to simulate sampling/cooldown/lockout themselves while the 3P cookie is still available. However, it would be additional work with some inaccuracy (as a 3P cookie doesn’t map to a device perfectly).
Just to clarify -- this cooldown is per ad tech (i.e. tied to
And I want to make sure I understand the distinction, and implications thereof.
@ardianp-google: Good point, we should make it easy for consumers of the reports to understand what impact downsampling will have. I doubt we can offer all three bits, but I think the one bit from option 2 above gets a lot of the benefit.

@rdgordon-index: Yes, the 999/1000 cooldown is per ad tech, while the 1/1000 lockout happens only after sending a report, and is global across all ad techs. The way to think about the global nature is "Once a browser sends a single report, it will wait years before sending another one."
Doesn't this provide another 'key abuse' mechanism, where ad techs can inadvertently affect each other's debug calls?
There is a risk, but remember that if another ad tech calls the API for everyone in the world, they have no impact on your debugging call on 99.9% of browsers. It's true that if another ad tech keeps calling the API over and over, then some fraction of the population ends up locked out in the steady state, and if lots of other ad techs do this, then the fraction of the population you have available for reporting goes down.

I've put together a little Google Sheets calculator that uses the parameters I suggested above to approximate what happens in a few scenarios. (Thank you to @alexmturner for pointing out the 4x4 matrix whose principal eigenvector makes this run.) https://docs.google.com/spreadsheets/d/1q-uBH7F_NAEWjqcGSChXj6TFbsQ4WK-p83RJTZrty9s/edit#gid=0

For example, with the above cooldown parameters, and even with 25 ad techs calling the API as often as possible, 35.9% of browsers could end up in the lockout state, so you would still get reports from the other 2/3 of the population.
I was wondering, aside from the discussion about the target shape of the API, can we assume that in
The downsampling idea for
The proposal in its current state cannot support our needs. First, we need info from won displays in order to compare online data with reported data, e.g. for the modeling signals field. Second, we would need the same number of reports (100,000) for losses, to ensure there is no error leading to systematic loss. This means the sampling should apply independently to wins and losses. We are also a bit worried about the bias introduced by the cooldown and lockout periods, which means only new Chrome browsers will send debug reports; potentially, automated bots will generate more reports than real Chrome users. With the following parameters, and using the spreadsheet above:
We would get 100,000 events per day for wins and for losses. Please note that, in parallel, we made the complementary proposal #871 for offline debugging needs.
Hello Fabian, Happy New Year, and sorry for the delay in responding. Certainly this proposed debugging API will not serve all needs, and if your goal is "to compare online data with reported data, e.g. for the modeling signals field" to find cases of buggy behavior, then I think the laboratory simulation approach discussed in #871 is quite valuable.
I think this different treatment of wins and losses would already be in your power: the two functions forDebuggingOnly.reportAdAuctionWin() and forDebuggingOnly.reportAdAuctionLoss() are called separately, so you can already choose how often to invoke each one.
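For instance (an illustrative sketch only, with made-up thresholds and URLs, not a recommendation), a bidding script could gate the two calls independently:

```js
// Illustrative only: thresholds, URLs, and the computeBid() helper are made up.
// ${winningBid} and ${highestScoringOtherBid} are existing debug URL macros.
function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  const bid = computeBid(interestGroup, trustedBiddingSignals); // assumed helper

  // Ask for win reports on all traffic (wins are comparatively rare)...
  forDebuggingOnly.reportAdAuctionWin(
      'https://dsp.example/fdo/win?bid=${winningBid}');

  // ...but only ask for loss reports on a small slice of traffic.
  if (Math.random() < 0.01) {
    forDebuggingOnly.reportAdAuctionLoss(
        'https://dsp.example/fdo/loss?hob=${highestScoringOtherBid}');
  }

  return { ad: interestGroup.ads[0], bid, render: interestGroup.ads[0].renderURL };
}
```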
I don't think these numbers are realistic. First, the value "100 ad techs" in the spreadsheet is not meant to be the total number of ad techs; it is meant to be the number of ad techs that are calling the reporting APIs constantly, and so are always in the cooldown-or-lockout period. This is a worst-case scenario, meant to illustrate that you would still be able to get a reasonable number of reports even if many ad techs were conspiring to run a denial-of-service attack to prevent all reporting. I think it is much more likely that ad techs would be selective in exactly the way you want to be: call the API only on a small fraction of "normal" traffic, and call it at a higher rate when something "interesting" happens. This would put many fewer people into lockout, and everyone doing this would get many more "interesting" reports than the spreadsheet's lower bound.

A noteworthy part of my 14d-1yr-3yr parameters is that ad techs who did decide to call the API every time would mostly hurt themselves, because they mostly would end up in the cool-down period. Your changes have a big effect: they mean that an ad tech who calls the API all the time would hurt other ad techs a lot more, and hurt themselves a lot less. That means much less incentive for people to be thoughtful about how they use the API.

I also don't feel that your parameters have a particularly good privacy story. They would lead to each browser sending a debugging report roughly every 3 months. That means that if the ad tech ecosystem decided to use this as a tracking mechanism, they could join up every person's behavior across 5 sites per year. With my proposed parameters, a browser only sends a report around once every 8 years; so in a year, around 85% of people would send no report at all, and the other 15% could at worst be linked up across only two sites (and those people would surely send no reports at all for three years thereafter).
Thanks -- I missed this all-important line - https://github.com/WICG/turtledove/pull/1020/files#diff-d65ba9778fe3af46de3edfce2266b5b035192f8869280ec07179963b81f4e624R1232 |
Hey @michaelkleber, can you help me understand what this means a bit better? I asked around and don't think we actually have clarity here yet, at least not the kind of clarity we can base an implementation choice on, even for short-term adoption purposes. The removal of 3PC has already started and has a planned ramp-up starting sometime in Q3 of 2024, so "as part of the removal of 3PC" could/should be interpreted as having already happened, but it seems like this is meant to say that forDebuggingOnly is still usable 100% of the time for some further period?

I'd ask that we detail this, broken down something like the following. Let's call "Unsampled/Unconstrained Availability of forDebuggingOnly" the state where it can be called and will work immediately in any auction without limits or lockouts, and "Sampled Availability..." the state we'll get to eventually, with lockouts and whatnot.

Current cohorts

Mode B Treatment 1.* Labels

For the set of Chrome browsers currently with unpartitioned 3PC access disabled AND sandbox APIs available:
Everything Else

For All \ the above cohort, same questions.

Next Ramp Up Round, Whenever That Is

Currently planned for Q3 2024, but let's just say on date X when more browsers move into the "yes PS APIs but no unpartitioned 3PC access" group. So, similar questions as above:
I can understand why we'd want forDebuggingOnly not to have an official support date, but (a) it seems like we're now giving one to some deprecated*URN functions, (b) publicly stating that implementation priorities are forcing this would be reasonable, and (c) I have at least one choice to make based on the robustness of this timeline, and I suspect I'm not the only one.
Hello @ajvelasquezgoog, do you know the answer to this?
Feature rollout status update: Chrome now runs the downsampling algorithm on forDebuggingOnly reports and updates the corresponding cooldown/lockout state, without yet enforcing the sampling. Explainer: https://github.com/WICG/turtledove/blob/main/FLEDGE.md#712-downsampling
Thank you @qingxinwu.
I see the
Hi, I’m working on the post-3PCD debug reporting framework for the Protected Audience API (FLEDGE), and I had a question about how the lockout/cooldown downsampling is implemented. Currently, if we have debug reporting on and 10 interest groups running in an auction, each of them will register a debug report to send on a winning/losing ping after the auction is complete. With the new downsampling, only one of these interest groups would be able to send a report before the browser is either put in cooldown or lockout, right? In the new downsampling API, when is sampling calculated: at the time that a report is registered using the forDebuggingOnly calls, or after the auction completes?
I also have a question about the rollout of the new downsampling API. Right now almost all of our browsers are sending some kind of debug report. I know you have already started running the downsampling algorithm without enforcing it. Currently I would imagine a large number of browsers are accumulating in the lockout and long-cooldown states, as we are calling the API so often.
The states collected now during this test phase will be reset at the time of enforcing downsampling, to avoid a large portion of browsers being locked out by the time 3PCD happens.
At most one. Note that it's possible that none of the debug reports from this auction is picked, and a future auction may have one picked by the sampling algorithm. See the spec for more details.
The sampling is calculated after the auction completes, because we need to know the auction result (the winner) to know which debug report (win/loss) to collect from buyer/seller scripts. If a buyer loses the auction and only called the debug win API, then it won't be sampled (and thus not put into lockout or cooldown), because it has no debug report to send anyway.
Only at most one will go through, but not necessarily the first one. It is randomly decided about whether to send a report or not (before one has been picked to send and all future reports will be dropped after that). And again, it's possible that none of them goes through due to the randomness.
Are you still not seeing loss callbacks? Loss reports are not expected to be suppressed.
The spec states: "sampling rate is 1/1000, which means only sending reports 1/1000 times the forDebuggingOnly API is called." This seems to indicate that the sampling rate is per forDebuggingOnly call, which means it is per IG.

This comment states: "It is randomly decided about whether to send a report or not (before one has been picked to send and all future reports will be dropped after that)". This, however, seems to indicate that the sampling rate is per auction.

Could you please clarify whether the sampling rate (and other related rates in the spec) are per forDebuggingOnly call (i.e. per IG) or per auction?
Sorry about the confusion it brought, but I don't fully understand what "per auction sampling rate" means here.
Each fDO call has a 1/1000 (the current sampling rate) chance to be picked. After one report is picked, the browser will be locked out for a lockout period (currently set to 3 years). While the browser is locked out, all fDO calls from this browser (including the ones that have not been sampled yet from the current auction, and those from future auctions) will just be dropped and no sampling is needed. Those rates are applied per fDO call, as the spec indicates. Let me know if that's still not clear.
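As a simplified model of that per-call decision (see the spec for the authoritative algorithm; the state-tracking helpers below are only for illustration):

```js
// Simplified model of the per-call sampling described above; browserState and
// send() are illustrative stand-ins, not real APIs. See the spec for details.
const SAMPLING_RATE = 1 / 1000;

function processRegisteredDebugReports(registeredReports, browserState, now) {
  for (const report of registeredReports) {      // collected after the auction completes
    if (browserState.isLockedOut(now) ||
        browserState.isInCooldown(report.adTechOrigin, now)) {
      continue;                                  // dropped without any sampling
    }
    if (Math.random() < SAMPLING_RATE) {
      send(report);                              // at most one report is sent...
      browserState.startLockout(now);            // ...then the browser enters lockout
      break;                                     // remaining reports are dropped
    }
    browserState.startCooldown(report.adTechOrigin, now); // unpicked: per-ad-tech cooldown
  }
}
```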
Hello, we (Criteo) are currently investigating using the new cooldown/lockout flag exposed to worklets. The flag only shows if the report would have been dropped because there is a global lockout for the user, or because we are in cooldown for the user, right? It does not indicate if the report would have been dropped because of the 1/1000 sampling? However, I assume you do the 1/1000 sampling internally in Chrome to calculate the flag?

We observe that more than 0.1% of the callbacks we receive have the flag set to false, and we would have expected it to be less if the 1/1000 sampling for the particular callback was included in the flag. So, if we want to know how many callbacks we will receive once the downsampling mechanism is enforced, we should assume we will get only 1/1000th of the callbacks that have the flag set to false, is that correct?
Yes, your understanding is correct: the 1/1000 sampling only happens if you actually call the API, and it is not reflected in the value of the flag. See #632 (comment) above for discussion of that design decision.
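For completeness, here is a sketch of how the flag can be used to skip the cost of building debug payloads when they have no chance of being sent. The browserSignals field name is taken from the explainer (please verify it there), and buildLossDebugUrl() is just a stand-in for an ad tech's own logic:

```js
// Sketch: only build and register a loss report when the browser is not in
// cooldown or lockout. Verify the browserSignals field name against the
// explainer; buildLossDebugUrl() is an assumed helper, not a real API.
function maybeRegisterLossReport(interestGroup, bid, browserSignals) {
  if (browserSignals.forDebuggingOnlyInCooldownOrLockout) return;
  // Even when not blocked, each call still has only a 1/1000 chance of sending.
  forDebuggingOnly.reportAdAuctionLoss(buildLossDebugUrl(interestGroup, bid));
}
```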
Dear all, at Criteo we had a deeper look at the forDebuggingOnly availability, and we have the following feedback on the current state of the proposal; based on it, we are proposing some changes. We believe that the current state has the following limitations:
For these reasons, we are proposing the following changes to the forDebuggingOnly specification:
We would like to hear feedback on these proposed modifications.
@BasileLeparmentier I appreciate your attempt to offer an alternative that "would mean a data leakage approximately once per device life." But if I understand your proposal correctly, the size of this once-every-three-years data leakage is vastly larger.

With heavily-downsampled forDebuggingOnly, the most that a malicious ad tech could possibly leak, if they get lucky, is the ability to recognize the same browser behavior on two different websites: the Interest Group Join site and the site where the auction is taking place. It seems to me that with your proposal, the malicious ad tech could leak all of your activity and identity across all sites that you had visited in the past 30 days (or longer if #855 happens), and moreover every malicious ad tech could learn that, since you've removed the lockout-based need to get lucky. That level of privacy risk does not seem viable.

While I understand that heavily-downsampled forDebuggingOnly cannot meet all of your visibility needs, remember that the Privacy Sandbox goal is to offer multiple tools, each with their own privacy protections, that let you get insight into different kinds of questions you might ask about what happens inside auctions. Combining fDO with Private Aggregation (central DP) and with Real Time Monitoring (local DP) should give a richer picture, and in particular should let you use the relatively small number of fDO reports at just the right time to do the debugging you need every day.
Dear Michael, with our suggestions there will indeed be some leakage (albeit at a low rate of once every three years), which means that this leakage wouldn't be economical to exploit even for malicious ad techs. To reduce the risks, we could also add an IG sampling mechanism that depends on the browser (via a hash(browser_id, IG)), where only 10 consistent IGs are returned by the forDebuggingOnly API for a given day. Overall, the point we are making is not that the current specification of this API cannot meet all our visibility needs; it is that it meets none.
Debugging usually requires reproducing a bug in order to spot and fix it. By definition, bugs are very hard to spot and fix with aggregated noisy data, whether central or local DP. Even if the root cause is usually shared, the produced data can be very high-dimensional. These aggregated data will point toward a direction, which is really useful, but with the amount of data you are proposing, the mitigation stage, where we actually fix the issue, will be done completely blindly, which means that very often it will not be done at all.

We believe that debugging will be integral to the success of the PA API. It will ease the adoption of the PSB, which is very hard today as there are so many ways to get a step wrong. It will also help secure performance over the long run and avoid a 'death by a thousand cuts' which may jeopardize the overall success of the PSB. We need this debugging API to work for the PA API, so we would really appreciate any suggestions on how to improve the current specification of this API.
I would like to understand how much of your debugging goal would be possible to achieve using any mechanism that retained the worst-case behavior "only join a user's behavior across two sites." For example, you mentioned the difficulty of a DSP and SSP collaboratively debugging, because those two parties don't get fDO logging on the same request. What would you think of a modification to the fDO downsampling so that if a buyer IG gets to send a debugging report from inside generateBid(), then the seller would also be allowed to send one from scoreAd() for that same bid?

Your point about observing a deployed fix is very interesting, because it seems like the goal there could be to observe the same circumstances again later, that is, the same IG bidding on the same publisher site. That seems like it could be viable, because a follow-up debug report in this case would still only give information about the same two sites, though I'll need to think more about how it could be achieved technically.
This is somewhat similar to a previous callout -- #632 (comment) -- about all auction participants being able to 'see' the same auction and/or bid.
Yes indeed, but restricting this to just a single buyer/seller/bid has far better privacy properties than a full auction with data from all the sites the user has visited in the past 30 days.
Dear Michael, thanks a lot for your answer. I understand your ask, but I am unsure why there is a difference in nature between two sites' data being available and slightly more being available. If we add the 10-IG limitation, we remove the risk of the full browsing history being leaked, and with this the sampling rate makes the approach uneconomical to use for fingerprinting purposes, which I believe makes the trade-off acceptable.
Overall, I want to stress that the interconnectedness of online advertising makes every possible bug happen, even the most surprising ones. Being able to debug is therefore a real need if the PA API is to have any chance of being deployed at scale and over the long run.
Without getting into the specific challenges around privacy (which admittedly are tricky), I want to share that GAM also considers debugging a critical use case for us and our partners. We've definitely run into challenges when working with DSPs who buy through us, so the more debugging tools the better. To Basile's point: when something does go wrong, it's often challenging to pinpoint where in the E2E flow the problem is (e.g. is it the DSP or the SSP), so having some way for different entities to coordinate their debugging seems particularly useful.
Hi, I'm working on how to balance the reporting budget between the sell side and the buy side, and wanted to double-check something. At the end of the auction, only one of the seller or buyer reports registered with forDebuggingOnly will be sent, right? There's no way a seller and a buyer report will be packaged so that both can be sent? If a browser can only send one report (buyer or seller) at most per day, will there be some kind of selection to make sure a report from each seller/buyer is equally likely to get sent, or will one of the reports just be picked randomly regardless of what type they are?
Could you please clarify whether the cooldown is tied to origin or to eTLD+1?
It's tied to origin.
It is the latter; reports are randomly picked.
Hello, we wanted to give an update to the ecosystem on how we plan to prepare the forDebuggingOnly (fDO) API for an environment in which some users choose to allow 3PCs and other users do not, following our July 2024 announcement of a new path for Privacy Sandbox.

For traffic on which 3PCs are allowed, there is no additional privacy risk from the browser sending unsampled fDO reports compared with 3PC-based debugging. This means that for users who have allowed 3PCs it is possible for the fDO API to remain unsampled and so provide additional precision without compromising privacy. Therefore we propose the following changes, using what is already published in the downsampling section of our Protected Audience explainer as the starting baseline:

When the user chooses to allow 3PCs on the current impression site, we will not proceed with the downsampling algorithm as per our explainer, and the browser won’t change its state to either cooldown or lockout after either generateBid() or scoreAd() sends the fDO report. We want to allow use of the fDO API only when 3PCs were allowed both at the time the user joined the Interest Group (IG) and at the time that the IG participated in an auction. The simplest implementation we've come up with is for the browser to enter the fDO lockout state when there is a cookie-choice state change from not allowed to allowed, lasting until all the IGs created prior to the cookie-choice state change have expired.

Additionally, we’ve heard requests from you to better handle the "bias" in fDO reporting that arises because people who turned 3PC off recently (and hence are not in the cooldown or lockout state) are more likely to send fDO reports than people who turned 3PC off earlier. For this reason, we also propose that when the cookie-choice state changes from allowed to not allowed, the browser be placed in a lockout period that can randomly total between 1 and 90 days, with equal probability. We welcome the ecosystem’s comments on this proposal!
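To summarize the proposed behavior in a rough sketch (illustrative only; the helper names and structure below are not from any spec):

```js
// Illustrative summary of the proposed cookie-choice interaction; all helpers
// (cookieChoice, browserState, send, applyDownsampling) are made-up stand-ins.
function onAuctionDebugReport(cookieChoice, browserState, report) {
  if (cookieChoice.allowed3PCsWhenIGJoined(report.interestGroup) &&
      cookieChoice.allows3PCsOnCurrentSite()) {
    send(report);                              // unsampled, no cooldown/lockout change
    return;
  }
  applyDownsampling(browserState, report);     // existing sampling/cooldown/lockout path
}

function onCookieChoiceChanged(wasAllowed, isAllowed, browserState) {
  if (!wasAllowed && isAllowed) {
    // Not allowed -> allowed: lock out fDO until IGs joined before the change expire.
    browserState.lockOutUntilPreexistingIGsExpire();
  } else if (wasAllowed && !isAllowed) {
    // Allowed -> not allowed: a random 1-90 day lockout to reduce reporting bias.
    browserState.lockOutForDays(1 + Math.floor(Math.random() * 90));
  }
}
```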
Can you elaborate on what this means?
Given #855 (comment), "have expired" can be up to 90 days, correct? Does this have anything to do with joining origin?
fDO is the only egress for critical reporting data today, so clear communication about the intent to broadly implement this, and about the presence of workable replacements, is critical.
Further to @dmdabbs' comment above, and as flagged in today's WICG call, there's an expectation that fDO isn't being downsampled -- so further clarity about timelines, rollout, and implications is very important to existing PAAPI integrations.
Hi,

I was wondering, what is the plan for the forDebuggingOnly reporting functions and their availability? Will they be supported during the Mode A and Mode B testing phases?

Best regards,
Jonasz