Support for debugging PRIVACY_BUDGET_EXHAUSTED scenarios: Feedback Requested #69

Open
anishahmd opened this issue Aug 13, 2024 · 4 comments
Labels
feedback requested Feedback Requested from customers question Further information is requested

Comments

@anishahmd

Hello!

The Aggregation Service team has heard the feedback (#35, #42, #52, #61, #62) from our partners on difficulties in debugging PRIVACY_BUDGET_EXHAUSTED scenarios. Users can face such scenarios when their batching strategy is not optimized correctly to meet the privacy limits. Information on batching strategies can be found here.

To address this, we are working on a feature that will provide a list of report_ids (the UUID of each report, as present in the report's shared_info) for the aggregatable reports that cause a PRIVACY_BUDGET_EXHAUSTED error. This list of report_ids will be provided in an Avro output file written to the user's cloud storage after a job fails with this error. Users can use this information to:

  • Identify the aggregatable reports, and the corresponding shared_info, responsible for the PRIVACY_BUDGET_EXHAUSTED error.
  • Identify possible issues in aggregatable report batching and/or job scheduling in the adtech pipeline.
  • Filter the corresponding aggregatable reports out of the input batches to bypass the PRIVACY_BUDGET_EXHAUSTED error.
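
As a rough illustration of the filtering step in the last bullet, here is a minimal Python sketch. It assumes the report_ids from the output file have already been loaded into a set of strings (the Avro schema of that file is not specified in this thread), and that each report is held as a dict whose `shared_info` value is a JSON string containing `report_id`. The function name `filter_exhausted_reports` and the sample IDs are hypothetical.

```python
import json


def filter_exhausted_reports(reports, exhausted_report_ids):
    """Keep only reports whose report_id is not in the exhausted set.

    Assumption: each report is a dict with a "shared_info" JSON string
    that carries a "report_id" field.
    """
    kept = []
    for report in reports:
        shared_info = json.loads(report["shared_info"])
        if shared_info["report_id"] not in exhausted_report_ids:
            kept.append(report)
    return kept


# Hypothetical sample batch with placeholder IDs.
reports = [
    {"shared_info": json.dumps({"report_id": "id-1", "api": "attribution-reporting"})},
    {"shared_info": json.dumps({"report_id": "id-2", "api": "attribution-reporting"})},
]

# Pretend the failed job flagged "id-2".
remaining = filter_exhausted_reports(reports, {"id-2"})
```

The remaining reports would then be written back out as a new batch before re-running the job.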

In the future, we will look to extend this solution to provide additional information on the reason behind PRIVACY_BUDGET_EXHAUSTED errors.
If you have any feedback on the proposal or additional suggestions, please let us know.

Thank you!

@ruclohani ruclohani added question Further information is requested feedback requested Feedback Requested from customers labels Aug 30, 2024
@CGossec

CGossec commented Sep 2, 2024

Hello,
In the past, we (Criteo) have experienced shared ID issues that we could not find the root cause for, even with extensive analysis and collaboration with Google.
As a result, while the proposal allows us to circumvent failures by rerunning failing batches without the invalid elements (and thus brings about a great first step in debugging), we think it would greatly benefit from including detailed information about the execution(s) that previously consumed the privacy budget for the failing reports.
This detailed information could include:

  • Discriminating information to identify the job that previously consumed the privacy budget (e.g. the JobKey)

but also

  • the reportIDs within that job that had this same sharedInfo

@CGossec

CGossec commented Sep 3, 2024

Additionally:

The budget recovery request must come from the email that was provided as the point of contact during Aggregation Service onboarding so we can ensure the request is valid.

is a very harsh restriction that we believe should be made somewhat more relaxed. For instance if the original onboarding was done with an individual's email rather than a mailing list or other type of shared email, this may cause problems (e.g. the original onboarding requestor leaving the company).
Could the request maybe originate from the same domain (indicating the same company)?

@anishahmd
Author

Thank you for sharing your feedback. We are glad to hear that providing report_ids will be valuable in debugging PRIVACY_BUDGET_EXHAUSTED jobs. We agree that including further details on the job that previously consumed the privacy budget can add more value for debugging. Providing this information is in our plan, and we will share more in the future once the details are finalized.

Regarding your second comment, we have noted your feedback. To clarify the requirement, adtechs must fill out the budget recovery form to initiate the process. Our expectation is that the email you provide in the “Email Address of Point of Contact” field of your budget recovery form response matches the email you provided in the “Email Address of Point of Contact” field of your onboarding form response. In future iterations of budget recovery, we plan to make request verification more convenient. For tracking purposes, I will add a comment to the issue where we're accepting feedback on budget recovery with a reference to this feedback.

@anishahmd
Author

Thank you again for your valuable feedback on the proposal to provide debugging information for PRIVACY_BUDGET_EXHAUSTED errors. After further consideration, we have refined our approach to optimize both informational value and system efficiency.

Instead of providing report_ids, we propose providing the shared_info fields that are used in the shared_id calculation from the contributing aggregatable reports. For Attribution Reporting API (ARA) reports, these fields will include api, version, attribution_destination, reporting_origin, scheduled_report_time and source_registration_time. This adjustment offers equivalent debugging capability while mitigating potential scalability concerns.

Specifically, utilizing shared_info presents several advantages:
  • Reduced Storage Overhead: The size of the report_ids list can grow significantly, leading to substantial and unintended cloud storage costs. In contrast, the shared_info fields remain compact, minimizing storage and processing overhead.
  • Enhanced Efficiency: Providing shared_info reduces processing cost because it requires fewer steps in the Aggregation Service, minimizing delays in adtechs getting the debugging info.
  • Usability: This would be similar for both shared_info and report_id, since adtechs can use either to identify problematic reports and filter them out before re-running the job.
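
To make the usability point concrete, here is a minimal Python sketch of filtering a batch by the shared_info fields listed above. It does not compute the actual shared_id; it only compares the listed ARA fields as a tuple. It assumes each report is a dict with a `shared_info` JSON string and that the debug output has already been parsed into a list of dicts (the output format is not specified here). The names `SHARED_ID_FIELDS`, `shared_info_key`, and `drop_exhausted`, along with all sample values, are hypothetical.

```python
import json

# ARA shared_info fields named in the proposal as inputs to shared_id.
SHARED_ID_FIELDS = (
    "api",
    "version",
    "attribution_destination",
    "reporting_origin",
    "scheduled_report_time",
    "source_registration_time",
)


def shared_info_key(shared_info):
    """Build a comparable key from the shared_id-relevant fields.

    Values are converted to strings so "1700000000" and 1700000000 match;
    this normalization is an assumption, not documented behavior.
    """
    return tuple(str(shared_info.get(field)) for field in SHARED_ID_FIELDS)


def drop_exhausted(reports, exhausted_shared_infos):
    """Remove every report whose key matches an exhausted shared_info."""
    exhausted = {shared_info_key(info) for info in exhausted_shared_infos}
    return [
        report
        for report in reports
        if shared_info_key(json.loads(report["shared_info"])) not in exhausted
    ]


# Hypothetical sample data with placeholder origins and timestamps.
base = {
    "api": "attribution-reporting",
    "version": "0.1",
    "attribution_destination": "https://shop.example",
    "reporting_origin": "https://adtech.example",
    "scheduled_report_time": "1700000000",
    "source_registration_time": "1699920000",
}
reports = [
    {"shared_info": json.dumps({**base, "report_id": "a"})},
    {"shared_info": json.dumps({**base, "scheduled_report_time": "1700003600", "report_id": "b"})},
]

remaining = drop_exhausted(reports, [base])
```

Unlike filtering by report_id, this removes every report that shares the flagged field values, not just individually listed reports.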

We welcome your further input on this refined proposal.
