Creative pre-registration strategies #792
One thing we need to be careful about here is leaking data - renderURLs haven't been checked for k-anonymity, and so requesting them can leak data (e.g., if we send them only when offered in a bid, then they could pass in a user ID for the publisher page that could be correlated with a user ID on the joining origin. 32 IGs could provide one bit of publisher page ID each, like: https://foo.test/bit-0-is-1?user=FreddyPharkas, https://foo.test/bit-1-is-0?user=FreddyPharkas, etc. Each URL has the full user ID in the joining origin, and one ordered bit from the top-level-site where the auction is running). Sending renderURLs on IG join would be more practical, but we don't know the seller origin to send the information to, and we'd need the IG to opt-in to sending the information (normally, offering a bid is considered to provide that permission). So I think we need to figure out the privacy story here on how we can implement this without creating a new cross-top-level-origin information leak.
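To make the shape of that leak concrete, here is a rough sketch, for illustration only and not a recommendation: the interest group names, the perBuyerSignals field, and the URLs are all hypothetical, but the structure follows the 32-bit encoding described above.

```js
// Each of 32 IGs ("leak-bit-0" ... "leak-bit-31") holds two ads whose
// renderURLs embed the joining-origin user ID, e.g.
//   https://foo.test/bit-7-is-0?user=FreddyPharkas
//   https://foo.test/bit-7-is-1?user=FreddyPharkas
// At bid time the buyer picks the ad matching one bit of a publisher-page ID
// passed in via perBuyerSignals (hypothetical field name).
function generateBid(interestGroup, auctionSignals, perBuyerSignals) {
  const bitIndex = Number(interestGroup.name.replace('leak-bit-', ''));
  const bitValue = (perBuyerSignals.publisherPageId >>> bitIndex) & 1;
  const ad = interestGroup.ads.find(
      a => a.renderURL.includes(`bit-${bitIndex}-is-${bitValue}`));
  // If renderURLs offered in bids were fetched or forwarded without a
  // k-anonymity check, the recipient would learn this bit alongside the user
  // ID embedded in the URL, reconstructing the publisher-page ID bit by bit.
  return { bid: 0.01, render: ad.renderURL };
}
```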
Can you elaborate? As per https://github.com/WICG/turtledove/blob/main/FLEDGE.md#33-metadata-with-the-ad-bid, I wasn't expecting
Sending on IG join would be great, from the privacy POV. If only IGs declared which sellers they were willing to bid with, this would be the preferred approach. But that hasn't been a required part of IG metadata until now. I suspect that if we propose it we will hear push-back, but maybe I'm being too pessimistic? Roni, want to pop my bubble quickly?
We do pass bids along to |
Only render URLs that win auctions (or rather, that would have won auctions) are registered with the k-anon server for the purposes of calculating k-anonymity, as otherwise, an ad could only be shown to a single user, despite appearing in IGs for a lot of users. So if you're blocking ads that you've never seen before, they'll never reach the k-anon threshold. Therefore, this would need to be done for non-k-anon ads.
And just to be clear - I mean the ads need to have won the top-level auction, in an environment that doesn't know whether they've met the k-anon threshold or not. My understanding is that you'd want to know the URL so it can be scanned before showing it anywhere. If that's not the case, and this can all be done after the ad has hit the k-anon threshold and we've already started showing the ad to users, this becomes much easier to do. We may need some sort of k-anon <renderURL, seller, component auction bool> check on how often an ad has won auctions, and once it's hit, have some way of conveying it to sellers, whether directly, or through an aggregation server of some sort. |
Correct.
To be clear, there's no desire to trigger this registration under the k-anon threshold; in other words, if a creative won't be shown to N devices, then there's no need to register it "before" it reaches this threshold. |
IMHO it isn't immediately obvious from the explainer that this ever reaches
Agree that the explainer could be clearer on this point. I think this is the first case that's come up where the distinction really matters. |
That would add complexity if an existing buyer/IG wanted to start working with a new seller, correct? Would That being said, even if IG seller declaration were in place, that doesn't address the challenge of being able to leverage the metadata provided by |
Just so that I fully understand the privacy concern -- doesn't that situation already arise the first time the ad crosses the k-anon threshold?
I think we could pass along renderURLs to new sellers when fetching the updateURL without any major new privacy issues, though that would potentially add a bunch of network requests and overhead (we'd need to update new sellers about renderURLs, and update old sellers about new renderURLs, so if we inform sellers directly from Chrome, that could be a lot of extra traffic). I don't think sending extra metadata specified by the IG affects the privacy characteristics here if we send the information on join (as opposed to on win on a 3P site, where it would need to be added to the k-anon check, at least). We are putting more complexity and overhead on the browser here for something that the browser doesn't really need to care about, unfortunately. Ideally we'd keep the browser API surface for this as minimal as possible.
So, ideally the DSP and SSP don't know when the ad reaches the k-anon threshold for the first time, so can't alter behavior based on that. It can only get so much information from loss reports, and auctions are run in a manner that limits information it can get out of them. Only doing the k-anon counting after it wins the auction is done in part to protect against exactly that sort of gaming the system. |
I'm still wondering whether we can find a safe way to make this happen at IG Join and Update time. Really this is kind of about the browser mediating a direct flow of information from DSP to SSP, if both of them are OK with us doing so. I'm thinking something like: (1) Suppose SSP X has run auctions in the past [period of time] in which X has invited DSP Y to be a buyer and some Y IG has placed a bid. (Each browser instance could keep track of this.) (2) Suppose the IG object on which DSP Y calls Join includes a new field If both of those are true, then at the moment of IG Join, it seems to me like it would be OK for the browser to contact that SSP's KV server — if we knew the base URL somehow — and ask for the associated KV signals for each renderURL in the IG. And then if no KV signals came back, we could send the renderURL to some SSP-chosen scan-queueing endpoint, maybe as identified by the SSP in the KV response. This could even be one instead of two round-trips, since I don't think there's any need for the first one to go to a trusted server, this is just a question of what endpoints are set up to receive a lot of traffic (KV expecting calls on each auction) vs only a little (scan-queueing expecting traffic only when a new renderURL appears). |
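A rough sketch of the join-time flow described above, under the stated assumptions (the SSP previously invited this DSP, the IG opted in, and the browser somehow knows the SSP's KV base URL); the endpoint and field names here are hypothetical, not part of any current API:

```js
// At IG join time, for each ad in the newly joined interest group:
async function maybeQueueForScanning(ig, sspKvBaseUrl) {
  for (const ad of ig.ads) {
    // Round-trip 1: ask the SSP's KV server whether it already has signals
    // for this renderURL (this first hop needn't go to a trusted server).
    const kvUrl = `${sspKvBaseUrl}?renderUrls=${encodeURIComponent(ad.renderURL)}`;
    const kvResponse = await (await fetch(kvUrl)).json();
    const hasSignals = kvResponse.renderUrls &&
                       kvResponse.renderUrls[ad.renderURL] !== undefined;
    if (!hasSignals) {
      // Round-trip 2 (or folded into one): hand the unknown renderURL to an
      // SSP-chosen scan-queueing endpoint, here identified in the KV response.
      await fetch(kvResponse.scanQueueUrl, {
        method: 'POST',
        body: JSON.stringify({ renderURL: ad.renderURL }),
      });
    }
  }
}
```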
As per the new guidance in https://github.com/WICG/turtledove/blob/main/FLEDGE.md#14-buyer-security-considerations :
This confirms that there will be no a priori method to be able to associate a |
In this situation, DSPs should tell SSP partners which domain they will use in the renderingURL so that SSP can keep track of it on their KV server to recognize the DSP partner from the renderingURL. |
Agreed -- but it's also not clear that there will only be a single such render domain per DSP. |
[I read through all the comments and think I understand what's being discussed/proposed, but apologies in advance if I rehash something or miss a point already made.] I think @michaelkleber is on the right track when he says that we're looking for someone to mediate between the SSP and the DSP. However, the challenge with having it be the browser was already pointed out by @rdgordon-index in the initial description, as I think this still results in a "significant volume of unregistered created API calls from each device, for each such creative." Could the K/V server be the point of coordination? It could provide an endpoint that can be queried for a list of renderURLs that have no data associated with them. It's effectively a list of cache misses. When the endpoint gets queried, it could take the extra step of filtering by checking which keys are still misses, though it doesn't have to. Depending on how lookups get distributed geographically, it might help segment the data by region (assuming K/V servers are deployed in multiple regions and probably see different keys). This could help SSPs know which values need to be pushed to which K/V servers. |
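A minimal sketch of what such a K/V-side "cache miss" endpoint might track, assuming a BYOS-style service; nothing like this exists today and the shapes are purely illustrative:

```js
// Record renderURL keys that were requested but had no associated signals.
const missedRenderUrls = new Set();

function lookupScoringSignals(renderUrl, store) {
  const value = store.get(renderUrl);
  if (value === undefined) missedRenderUrls.add(renderUrl);  // note the miss
  return value;
}

// Queried by the SSP's creative-scanning pipeline (e.g. GET /cache-misses).
// Optionally re-filter so only keys that are *still* misses are reported.
function listCacheMisses(store) {
  return [...missedRenderUrls].filter(url => !store.has(url));
}
```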
https://developers.google.com/display-video/protected-audience/ssp-guide#metadata_with_ad_bid -- some recent updates from DV3 regarding ad metadata |
Hi all. We've been exploring this issue, and have prepared a document that details a proposed solution, including a chronicle of several options considered and their respective pros/cons. Please take a look: Thanks. |
Given the length of the document linked above, I believe it would be helpful to convey a high-level summary of that document here. The design expressed in this document attempts to balance a few competing objectives:
To this end, the design proposed has the following properties. The document explains each of these properties and their motivation in far greater detail.
The majority of the document focuses on the question of when the browser would send ads to sellers' creative scanning entrypoints. The design alternative the document recommends proposes sending the ads of an interest group anytime that interest group is joined or updated, except that the browser would also keep track of which ads it already sent to each seller, so that it could reduce the volume of traffic sent to sellers' creative scanning entrypoints by sending each ad to each seller only once. To protect privacy, this deduplication would be partitioned by the joining site of the interest group. Please see the document for more details, and provide your comments here on this GitHub issue. Thanks.
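A minimal sketch of the deduplication bookkeeping described above (browser-internal; the names and shapes are illustrative only, not from the document):

```js
// Dedup keys are partitioned by joining site so that sends triggered on one
// site reveal nothing about joins or updates that happened on another site.
const alreadySent = new Set();  // persisted across sessions in practice

function adsToSendOnJoinOrUpdate(joiningSite, interestGroup, scanningSellers) {
  const toSend = [];
  for (const seller of scanningSellers) {
    for (const ad of interestGroup.ads) {
      const key = `${joiningSite}|${seller}|${ad.renderURL}`;
      if (!alreadySent.has(key)) {
        alreadySent.add(key);  // each ad goes to each seller at most once
        toSend.push({ seller, renderURL: ad.renderURL });
      }
    }
  }
  return toSend;
}
```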
question: can we include |
Can you elaborate on the "potential leak" here? A buyer submitting a bid is effectively "allowing" the seller to scan their creatives. |
Yes, that sounds like a good idea. I've modified the document to reflect it. I've also added a change log at the bottom of the document to record any changes made from when the document was first posted here.
This is a good question. The privacy risk described here would not be part of normal operation, but a malicious party could cause a leak in the following way. An auction is run on the user's device for which the seller is |
Technically true, but attestation also requires the ad tech vendor to indicate that they're not going to do this kind of thing -- and because it's on the
Thanks for the well-written and thought-out proposal, @orrb1.
Suggestion: please use a consistent root path component for all Protected Audience .well-known URIs, as Attribution Reporting has. We find this helpful for request routing.
Buyers' Config Publishing
Since Chrome proposes to commit to fetching and persisting the new creative scanning config and you intend to extend
For example,
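The original example isn't preserved in this thread; as a stand-in, a published buyer config of the kind under discussion might look something like the following, with all field names and URLs illustrative only:

```js
// Hypothetically served from e.g.
// https://buyer.example/.well-known/protected-audience/creative-scanning
const buyerCreativeScanningConfig = {
  // No catch-all: sellers that may receive this buyer's ads for scanning
  // must be listed explicitly.
  sellersForCreativeScanning: [
    'https://ssp-a.example',
    'https://ssp-b.example',
  ],
};
```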
An interest group can override the settings on any IG join; otherwise these are used. The same caveat mentioned in the explainer applies: creative scanning cannot have a catch-all. The nifty new scanning declarations are yet another thing to hang on every IG registration, and they count against the size constraints.
Seller Configs
Same might apply on the seller side along with picking up the
For example,
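Again the original example isn't preserved; a seller-side config in the style discussed later in this thread (a default plus per-buyer sampling rates) might look roughly like this, with all names illustrative:

```js
const sellerCreativeScanningConfig = {
  creativeScanningEntrypoint: 'https://ssp-a.example/creative-scan',
  defaultSamplingRate: 1.0,          // send every ad unless overridden
  perBuyerSamplingRates: {           // perBuyerXXX-style map, as in auction configs
    'https://dsp-1.example': 0.1,
    'https://dsp-2.example': 1.0,
  },
};
```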
Submitting Creatives
Chrome isn't consuming anything from the response, right? You can ditch the URL encoding by POSTing,
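For illustration, a POST along these lines (field names are illustrative, not from the proposal) avoids the URL encoding and allows batching:

```js
fetch('https://ssp-a.example/creative-scan', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    joiningSite: 'https://advertiser.example',
    buyer: 'https://dsp-1.example',
    creatives: [   // multiple creatives from the same joining site, batched
      { renderURL: 'https://dsp-1.example/ads/123', creativeScanningMetadata: 'seat=abc' },
      { renderURL: 'https://dsp-1.example/ads/456', creativeScanningMetadata: 'seat=abc' },
    ],
  }),
});
```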
You're also free to send multiple creatives identified at the IG joining site, as suggested in your preferred Option 2b.
question: when would |
Not sure if this is where you were going @rdgordon-index, but statements like
had me wondering how frequently sellers get to re-review renderURLs. In today's workflows I'm familiar with, if our fetch URL is active, our partner re-verifies it.
Somewhat depends on which Option is under consideration; indirectly, in Options 1 & 2, for example:
Which is some form of re-scanning, albeit indirectly -- I was asking about the explicit ability to do so. |
The |
Thank you both, Roni and David, for your thoughtful feedback. I'll try to answer each of your points below.
Though this is true, the Protected Audience API has a precedent of enforcing with technical restrictions what can be enforced, and relying on policy where that isn't possible.
We considered this among other options for posting this design and getting feedback. The goal was specifically to encourage most of the conversation to stay in this thread so that anyone who's interested can stay involved.
That's a good idea. There's an existing prefix for permission delegation, as described in the explainer, which we can use here as well. I've updated the well-known URIs in this design to be:
The concern here would be in determining what to do if there's a network error while trying to fetch the buyer config. At interest group join time, we'd have only a partial interest group, and that group may have trouble participating in auctions on that device, for example, in auctions that have required seller capabilities. Providing everything inline protects against that. Having just the sellers for creative scanning in a buyer config that needs to be fetched is an acceptable risk, since, even if the buyer config fetch fails, the interest group can still participate in auctions, and presumably the buyer config fetch will succeed on another device, which will send that buyer's ads for creative scanning.
The issue with combining
Yes, there seem to be some compelling benefits to using a POST here. I've updated the document to use a POST instead of a GET for the creative scanning entrypoint.
Entries in the
Could you clarify this? I had envisioned the creative scanning problem as a "discovery" problem. Once the seller knows about an ad, is there any reason it couldn't reverify that ad anytime it wanted to? From my perspective, an ad repeatedly sent to a seller's creative scanning entrypoint was a thing to be avoided because it contributed unnecessary load to the entrypoint. Still, in most of the options, an ad will likely be sent many times throughout its use. In options 3, 5, 6, and 7, a seller can explicitly request that an ad be sent to their creative scanning entrypoint at any time. In other options, e.g. options 2 and 2a, other devices would send that ad, so sellers would get an opportunity to re-verify anytime a new device joins that interest group.
This seems like a new idea that's distinct from creative scanning. If you'd like to explore this further, could you please file a new issue for further discussion? Thanks. |
Yes after posting I realised that. You want the IG in a ready-to-go state in the IG cache, sans any 'assembly.'
Yes. Good point. Up to sellers when to age off discovered renderURLs.
Indeed it was. I'll post something separate from this thread. Thanks. |
Re-reading your response on the train I see that "picking up the perBuyerXXX pattern" could have been clearer. This
compared to
where the map pattern obviates the "interest_group_owner" and "defaultSamplingRate" labels. It's the OpenRTB background - looking for a concise representation to reduce network bytes. The 'etc...' was to accommodate future, appropriate attributes. |
Ah, sorry for the misunderstanding. It makes a lot of sense to use a format that's consistent with existing parameters. I've updated this in the document. Thanks. |
A few additional comments in advance of the WICG meeting:
I'm aligned that Options 1, 3, 5 and 7 are less desirable; and 2 is preferable to 2b from a seller workload perspective. |
Hi everyone, Thank you for all of your feedback on the document and proposals. Based on that feedback, we've made several changes to the design reflected in the document and described below. We've also changed the structure of the document to reflect the current recommended design, while moving the other options explored into an "Alternatives Considered" section. Please continue to provide us with feedback as we continue to explore potential solutions for supporting creative scanning with the Protected Audience API. From the notes:
In the current recommended design, the owner of the interest group can indicate which attested parties should be notified of new ads. This can absolutely include the top-level seller. We've updated the design so that this seller can explicitly indicate the creative scanner. They would do this using a new Patrick McCann and Laurentiu Badea both asked about having the Trusted Scoring Signals Server keep track of which ads had no corresponding signals - a sign that these ads had not been previously scanned - and expose those via an endpoint. Laurentiu pointed to Joel's prior comment on this issue. From Joel's comment:
We've added this idea as a new "Option 8" in the alternatives considered section of the document. Copying from the analysis provided there: If all Trusted Scoring Signals Servers were running in TEEs, a design like this could work for creative scanning while still preserving privacy. In order to mitigate the privacy risk incurred by allowing for the exfiltration of ad URLs that could potentially be used to expose a user's cross-site identity, the Trusted Scoring Signals Server could aggregate "cache misses" and then, after a delay (e.g. once a day), expose only those that have been reported by at least k devices, enforcing a k-anonymity threshold for creative scanning that would help mitigate the privacy risk. However, for this to work, the Trusted Scoring Signals request would need to include an identifier for that device, which is a privacy risk while Trusted Scoring Signals Servers still run outside of TEEs. We’re continuing to explore whether this offers a feasible solution in the short-term. From the notes:
This is a fair point, as the browser currently fetches trusted scoring signals for component ads as part of the same request that fetches trusted scoring signals for ads. The design has been updated to reflect that the renderURL and creativeScanningMetadata for each component ad would be sent for creative scanning alongside the renderURL and creativeScanningMetadata for each ad. From the notes:
Option 5 is more expensive without any benefit in quality. This option explored the question of whether the trusted scoring signals server could be used to indicate whether an ad should be sent to the creative scanning entrypoint. The conclusion of that exploration was that using the trusted scoring signals server to, in effect, triage the ads and determine which should be sent to the creative scanning entrypoint was inefficient. Assuming that the creative scanning entrypoint would be less expensive than the trusted scoring signals server, making a request to that more expensive trusted scoring signals server only to determine whether or not to make a request to the less expensive creative scanning entrypoint would be inefficient. From the notes:
The Private Aggregation API doesn't seem to be a good match for conveying arbitrary renderURLs. The aggregation key in a Private Aggregation API event is limited to 128 bits. As such, it could be used to convey the hash of a renderURL, but without knowing a priori what the set of all possible renderURLs could be, we couldn't convert that hash back to a renderURL. From the notes:
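To illustrate the limitation: a renderURL can be folded into a 128-bit bucket, but the seller can't invert that hash back into a URL it hasn't already seen. A sketch using browser-style primitives (the exact APIs available inside a worklet may differ):

```js
async function renderUrlToBucket(renderURL) {
  const digest = await crypto.subtle.digest(
      'SHA-256', new TextEncoder().encode(renderURL));
  // Keep the first 128 bits of the digest as the aggregation bucket key.
  let bucket = 0n;
  for (const byte of new Uint8Array(digest).slice(0, 16)) {
    bucket = (bucket << 8n) | BigInt(byte);
  }
  return bucket;  // usable with privateAggregation.contributeToHistogram()
}
```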
The current proposed design provides a mechanism that could be replicated by buyers sending their creatives directly to creative scanners. Building support as part of the Protected Audience API aims to establish a set of protocols to make that process easier.
Though this would address the issues you described, the effect would be to make this option identical in its behavior to option 2. The reason for this is that, at an individual device, if the ad first arrives when it's already k-anonymous, the browser wouldn't know to which sellers the ad had been sent from other devices before it was k-anonymous. As a result, each browser would fall back to sending each new ad to each seller, and potentially sending it a second time if it was first sent before it was k-anonymous.
Though the TEE provides a guarantee that the trusted scoring signal server won't be able to exfiltrate any information by itself, the control it has over whether an ad is sent to creative scanning servers would provide it with a mechanism for exfiltrating a small amount of information. The trusted scoring signal server potentially has access to multiple sites’ worth of information - context from the publisher site and renderURLs from advertiser sites. If the trusted scoring signal server intentionally selected a subset of ads to be sent to the seller's creative scanning entrypoint, these could be used to reconstruct a user's cross-site identity.
The sampling rate defined in the seller config is an optional configuration that sellers may use to tune the rate of traffic as they see fit. If no per-buyer sampling rates are provided, the default sampling rate assumed is 1.0, so that all ads are sent to the sellers' creative scanning entrypoint. To ensure that they see all renderURLs, a seller may choose to maintain a sampling rate of 1.0 and, as noted in the document, efficiently shed previously discovered renderURLs at their creative scanning entrypoint. |
Appreciate the follow-ups to address earlier threads, @orrb1, and the updated written spec proposal. From your doc:
A number of established features and emerging proposals concern renderURLs:
Regardless of how these chips land, I presume that the constraint will remain that a bidder/buyer will not be permitted to submit novel "render URLs"; they must be recognizable as present in the IG on device. On #1, the explainer says,
Does this mean that the "creative url" supplied to the seller will have AD_WIDTH & AD_HEIGHT replaced as Chrome does prior to navigating? On #2, On #4 On #5 On #6
Some of these are in discussion for buyers to provide to sellers in the bid
Chrome has or will mitigate attestation file availability by downloading these via some Chrome component. Wondering how to keep this file/fetch from experiencing similar issues. Can one assume that no buyer creatives will be shared if there is not a cached resource available? Also the fetch will be out of the critical path, yes?
Same here.
Is this answering Yes to the question above regarding multibid submissions?
Suggest using realistic, illustrative URLs.
Can you clarify why the browser would need to know about what's happening on other devices in this case (for Option 4)? If the hash includes seller, it should already know what the 'new seller' is -- and the existing sellers are already locally stored in the cache. |
Can you elaborate on the nature of this "intention"? By definition, only ads that need to be re-scanned, or aren't already scanned, would be sent to the creative scanning endpoint -- so how is this any different? |
I don't think that's a viable solution -- that's an enormous amount of network traffic simply to discard it at the entrypoint. |
@rdgordon-index - I have a couple of small follow-up questions regarding your initial comment on this issue. If given both the renderURL and the buyer origin, would it be possible to infer the other key signals needed for creative scanning? Specifically, could adomain be determined from either response headers returned from the ad server or by rendering the creative, and could seat be inferred using the renderURL, buyer origin, and adomain?
Would definitely be valuable to include a link between
For response headers -- are you thinking about something like https://developers.google.com/authorized-buyers/rtb/protected-audience-api#automatic_creative_scanning ? 'returned from the ad server' -- #1028 talks about some of the challenges and assumptions as to whether or not the
Typically this would involve support for some sort of 'creative audit' flags to ensure that the |
@orrb1 @michaelkleber The Taboola team (@vladimanaev and @razkliger) finished reviewing the proposal and here are our comments. We graded the eight proposals on a scale of one to eight, where one is the worst fit for us and eight is the best fit for our needs. As a reminder, in native we have endless opportunities for auctions - we need to run them and be aware of them at the same time, we have a need for look and feel, and we have a higher demand for ad quality functionality.
We will be happy to engage and collaborate on this moving forward.
Thank you, everyone, for your patience as we've explored in depth how Protected Audience can support creative scanning while advancing privacy and conserving resources. Previously, we had looked at several browser-mediated options, as outlined in this doc, but found that each of these options was infeasible due to privacy and/or resource concerns. We decided instead on an approach that reuses preexisting PA infrastructure. Here's what we currently propose: Today, sellers may choose to run a key/value service that allows the auction to retrieve real-time signals before the ad is scored by the seller. For the time being, these are run on untrusted servers, referred to as BYOS (Bring Your Own Server). At auction time, the browser issues a series of requests to these key/value services. Each request conveys one or more renderURLs in plain text, and the key/value service returns signals associated with each of those renderURLs. The key/value service request includes renderURLs for both ads and component ads. This exchange is described in more detail in section 3.1 of the explainer, in particular the paragraph beginning with, "Similarly, sellers may want to fetch information about a specific creative, e.g. the results of some out-of-band ad scanning system." We know that some sellers to date have relied on forDebuggingOnly (fDO) APIs to discover renderURLs for creative scanning. This flow will become ineffective on devices on which fDO is downsampled. We propose that sellers instead use their BYOS real-time scoring signals key/value service as a source of renderURLs for creative scanning while these services are BYOS-hosted. Some have noted that the key/value service request lacks metadata associated with each renderURL that's needed for creative scanning. To accommodate this, we would add a new string-typed
If the seller would like the
Similarly, some have noted that the ad size is also necessary in order to scan a creative. Ad size is returned by
The key/value service request includes renderURLs for both ads and component ads. For component ads, we would also add a new
The
In total, the URL for the browser's request to the BYOS-hosted scoring signals key/value service for an auction configured with
In the future the key/value service will be required to run in a trusted execution environment (TEE) to ensure that the user's data is kept private. We've been working on a long-term design that provides an aggregated stream of renderURLs and their associated metadata and sizes for creative scanning. Unlike the BYOS-based solution, which exposes each renderURL without any indication of its desirability, for this long-term solution we're looking at ways to emit only the most valuable renderURLs as defined by the seller, providing a more focused stream of ads for scanning. Each renderURL will be sent for creative scanning only after that renderURL has met a privacy bar, for example, having been observed on multiple devices. As such, it's recommended that sellers using the BYOS key/value service approach described above similarly scan renderURLs only after they've been seen multiple times, to ease future transitions into the long-term privacy-advancing state. We will provide a timeline to transition to this privacy-improving approach in a future update.
In the meantime, please provide us with feedback on whether the BYOS key/value service-based solution described above would support your creative scanning needs. Thanks so much.
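To make the shape of that request concrete: the hostname, renderUrls, and adComponentRenderUrls query parameters exist in the scoring signals fetch today, while the names of the additional metadata and size parameters below are placeholders, since the exact names aren't spelled out in this comment:

```js
const scoringSignalsUrl =
    'https://ssp-a.example/scoring-signals' +
    '?hostname=publisher.example' +
    '&renderUrls=' + encodeURIComponent('https://dsp-1.example/ads/123') +
    '&adComponentRenderUrls=' + encodeURIComponent('https://dsp-1.example/components/9') +
    // Placeholder names for the proposed additions discussed above:
    '&creativeScanningMetadata=' + encodeURIComponent('seat=abc,adomain=advertiser.example') +
    '&adSizes=' + encodeURIComponent('300x250');
```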
Thanks for the additional details regarding the KV-initiated creative registration proposal.
Earlier, we talked about the challenge of not knowing which DSP's buyer origin corresponds to the renderURL -- will buyer origin be sent for each renderURL as well? Regarding |
Thinking about the seller-specific metadata concern above -- is there any reason why |
Thanks @orrb1 |
Thank you for your questions, @rdgordon-index and @eysegal.
Yes, we can include another pair of query parameters -
As you noted, the browser doesn't enforce k-anonymity on the
@rdgordon-index - I'm also unclear on what benefits there would be for the data to be derived from
Note that in the posting above, we noted that, in the long-term support for creative scanning, "Each renderURL will be sent for creative scanning only after that renderURL has met a privacy bar, for example, having been observed on multiple devices." This privacy bar will apply jointly to the render URL and its associated information -
No,
Just to clarify, in the description of how this could work posted above, the
Thanks again for your feedback.
Excellent.
If I've understood correctly, this still requires it to be part of IG join/update, and cannot be dynamic at
This was more about the seller-specific nature of bidding, and hence metadata associated with the bid -- and when these values are known -- but this is probably a better question for the buy-side who will need to manage setting and updating these values. Today, we've asked for all of this information via
Understood.
I think that speaks to the generic "string field whose URL-encoded value" of |
Follow-up question: will |
With the current proposed system we, as a buyer, will be unable to provide the ad size to the seller via this |
#1088 is relevant here -- originally, there was no mechanism for knowing what the size actually would be for a given auction --
As suggested in the explainer, sellers have the ability to fetch additional real-time signals based on a combination of renderURL and hostname (representing the publisher's domain) that can be used during scoreAd() when scoring creatives. Specifically:

In today's programmatic ecosystem, buyers communicate their creative markup via the bid.adm field during RTB, alongside other key bid metadata (advertiser domain, seat ID, IAB category, creative format, creative & campaign identifiers, etc.); however, no creative URL (aka renderURL) containing the markup is provided. As a result, there is no existing mechanism by which SSPs can obtain this URL for all existing creatives submitted in contextual auctions.

This necessarily means that all existing creatives are unable to be served in PA auctions, since creatives have to be pre-approved in order to be scored with a desirability > 0. Otherwise, the rejectReason for all PA creatives returned by scoreAd() would be pending-approval-by-exchange.

This also necessitates a mechanism to initiate such PA creative registration via renderURL, which poses some challenges, as outlined below.

The most naïve such mechanism, available today, would leverage the forDebuggingOnly.reportAdAuctionLoss() endpoint – that is, for any renderURL not found in the seller's K/V server, initiate an API call to a seller endpoint to indicate that said renderURL has not yet been approved. According to #632 (comment), this function will be available until the end of 3PCD, and should suffice for short-term testing as well as during the 1% 3PCD time horizon.

Challenges with this approach:
- A significant volume of unregistered creative API calls from each device, for each such creative submitted via generateBid()
- The K/V call doesn't include a buyer origin – and the renderURL need not utilize the buyer origin – so there is no guaranteed way to map a URL to a given buyer (aka DSP)
- Other key signals available in OpenRTB – such as adomain and seat – are not guaranteed to be made available to scoreAd() (and hence able to be passed into this debugging endpoint), despite them being required for creative registration. As quoted here

As such, without an IAB standard for parameters like seat in renderURL, it's unclear how buyers will be able to ensure that their creatives are being registered for all sellers.

Another alternative approach would be to somehow leverage the Private Aggregation API, but this shares all of the challenges above, as well as it being unclear how to bucket the fields required for registration (e.g. renderURL, seat, adomain). Furthermore, this also requires the immediate adoption of this API (and its requirement for TEE) in order to be able to start registering creatives, and as such, this does not seem like a short-term solution.
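For illustration, the naïve forDebuggingOnly-based mechanism described above might look roughly like this inside the seller's scoreAd(); the registration endpoint and the exact trusted scoring signals shape are assumptions, not part of this proposal:

```js
function scoreAd(adMetadata, bid, auctionConfig, trustedScoringSignals, browserSignals) {
  // Assumes the explainer's map shape for per-renderURL scoring signals.
  const known = trustedScoringSignals.renderURL &&
                trustedScoringSignals.renderURL[browserSignals.renderURL];
  if (!known) {
    // renderURL not found in the seller's K/V data: report it to a seller
    // endpoint via the loss-debugging URL so it can be queued for review.
    forDebuggingOnly.reportAdAuctionLoss(
        'https://ssp-a.example/register-creative?renderURL=' +
        encodeURIComponent(browserSignals.renderURL));
    return { desirability: 0, rejectReason: 'pending-approval-by-exchange' };
  }
  return { desirability: bid };
}
```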