Support for Requerying in Aggregation Service: Feedback Requested #71

wualbert17 · 2024-08-29T17:44:56Z

The Aggregation Service team is looking into supporting requerying, and would like your feedback.

Current System: Today, Aggregation Service only allows each Shared ID to be included in one summary report. Attempting to use the same report in subsequent aggregation jobs will result in budget exhausted errors.

Proposed Enhancement: Allow each Shared ID to be included in multiple summary reports. As before, each aggregation job will use an "epsilon" parameter, configurable by adtechs, to calculate noise.

To ensure privacy guarantees, each Shared ID will have an Aggregatable Report Accounting Budget (a.k.a. privacy budget) that can be split across multiple aggregation jobs. Adtechs can choose how to divide the budget depending on their use cases. Aggregation Service will only generate the summary report if all Shared IDs in the job still have budget available. In line with our current maximum epsilon, Aggregation Service will enforce a total budget of epsilon = 64 for each Shared ID.
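To make the budget split concrete, here is a minimal sketch of how per-Shared-ID accounting could behave. It is not the Aggregation Service implementation: the `BudgetLedger` class, its in-memory storage, and the reading of "still have budget available" as "has at least `job_epsilon` remaining" are all assumptions made for illustration. Only the total of epsilon = 64 and the all-or-nothing rule come from the proposal.

```python
from dataclasses import dataclass, field

TOTAL_EPSILON_BUDGET = 64.0  # per-Shared-ID budget stated in the proposal


@dataclass
class BudgetLedger:
    """Hypothetical in-memory ledger mapping Shared ID -> epsilon already spent."""

    spent: dict[str, float] = field(default_factory=dict)

    def remaining(self, shared_id: str) -> float:
        return TOTAL_EPSILON_BUDGET - self.spent.get(shared_id, 0.0)

    def try_consume(self, shared_ids: list[str], job_epsilon: float) -> bool:
        """Charge job_epsilon to every Shared ID in the job, or charge nothing."""
        if job_epsilon <= 0:
            raise ValueError("job_epsilon must be positive")
        unique_ids = set(shared_ids)
        # All-or-nothing: a single exhausted Shared ID blocks the whole job.
        # Assumption: "budget available" means at least job_epsilon remains.
        if any(self.remaining(sid) < job_epsilon for sid in unique_ids):
            return False
        for sid in unique_ids:
            self.spent[sid] = self.spent.get(sid, 0.0) + job_epsilon
        return True


ledger = BudgetLedger()
first_job = ledger.try_consume(["id-a", "id-b"], job_epsilon=32.0)   # both have 64 left
second_job = ledger.try_consume(["id-a"], job_epsilon=32.0)          # id-a now fully spent
third_job = ledger.try_consume(["id-a", "id-b"], job_epsilon=16.0)   # blocked by id-a
```

In this sketch the third job is rejected and `id-b` keeps its remaining budget, since nothing is charged when any Shared ID in the job falls short.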

From initial analysis, we found that different models of differential privacy perform best depending on the use case:

For use cases that only need to requery a small number of times (fewer than 40), using the Laplace distribution with basic composition provides the best noise-to-signal ratio.
For use cases that need to requery a large number of times, using the Gaussian distribution with zCDP (zero-concentrated differential privacy) provides the best noise-to-signal ratio. zCDP uses a different privacy parameter, rho, instead of epsilon.
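The intuition behind the two regimes can be sketched with the standard formulas for each mechanism. Assuming a fixed total budget is split evenly across k jobs, Laplace noise under basic composition grows linearly in k, while Gaussian noise under zCDP grows with the square root of k. The function names, the even split, and the sensitivity parameters below are illustrative assumptions. The sketch does not try to reproduce the threshold of 40, which depends on the rho budget that is still TBD.

```python
import math


def laplace_noise_std(total_epsilon: float, num_jobs: int, l1_sensitivity: float) -> float:
    """Noise standard deviation per job when epsilon is split evenly (basic composition)."""
    per_job_epsilon = total_epsilon / num_jobs
    scale = l1_sensitivity / per_job_epsilon  # Laplace scale b = sensitivity / epsilon
    return math.sqrt(2.0) * scale             # std of Laplace(b) is sqrt(2) * b


def gaussian_noise_std(total_rho: float, num_jobs: int, l2_sensitivity: float) -> float:
    """Noise standard deviation per job when rho is split evenly (zCDP composition)."""
    per_job_rho = total_rho / num_jobs
    # A Gaussian mechanism with std sigma satisfies rho-zCDP with
    # rho = sensitivity^2 / (2 * sigma^2), so sigma = sensitivity / sqrt(2 * rho).
    return l2_sensitivity / math.sqrt(2.0 * per_job_rho)


# Growth in noise when going from 1 job to 4 jobs on the same total budget.
# The total_rho value is a placeholder; the real rho budget is TBD.
laplace_growth = laplace_noise_std(64.0, 4, 1.0) / laplace_noise_std(64.0, 1, 1.0)
gaussian_growth = gaussian_noise_std(1.0, 4, 1.0) / gaussian_noise_std(1.0, 1, 1.0)
```

Under these assumptions, quadrupling the number of jobs quadruples the Laplace noise but only doubles the Gaussian noise, which is why the Gaussian model tends to win as the number of requeries grows.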

Adtechs can choose which DP model to use for each job. Depending on the selected model, adtechs will specify their per-job privacy parameters either in terms of epsilon or rho. In turn, Aggregation Service will use the same choice of epsilon or rho to maintain the budget for each Shared ID in the job. The exact budget value for rho is TBD, but it will be equivalent to epsilon = 64.

Once a Shared ID has been used with a specific model, all subsequent jobs that include that Shared ID must use the same model.
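A rough sketch of this lock-in rule follows, assuming a hypothetical mapping from Shared ID to the model it was first used with. The `check_and_record_model` helper and `ModelMismatchError` are invented for illustration. Only the two model names and the same-model requirement come from the proposal.

```python
ALLOWED_MODELS = {"laplace_dp", "gaussian_zcdp"}


class ModelMismatchError(Exception):
    """Raised when a Shared ID was previously used with a different DP model."""


def check_and_record_model(
    model_by_shared_id: dict[str, str],
    shared_ids: list[str],
    dp_model_type: str,
) -> None:
    """Reject the job if any Shared ID is locked to another model; else record the model."""
    if dp_model_type not in ALLOWED_MODELS:
        raise ValueError(f"unknown dp_model type: {dp_model_type!r}")
    unique_ids = set(shared_ids)
    # A Shared ID that has never been used is compatible with either model.
    conflicts = sorted(
        sid for sid in unique_ids
        if model_by_shared_id.get(sid, dp_model_type) != dp_model_type
    )
    if conflicts:
        raise ModelMismatchError(f"Shared IDs locked to another model: {conflicts}")
    # Record only after the whole job has passed the check.
    for sid in unique_ids:
        model_by_shared_id[sid] = dp_model_type


models: dict[str, str] = {}
check_and_record_model(models, ["id-a", "id-b"], "laplace_dp")
try:
    check_and_record_model(models, ["id-b", "id-c"], "gaussian_zcdp")
    mismatch_detected = False
except ModelMismatchError:
    mismatch_detected = True
```

Here the second job fails because `id-b` was already used with `laplace_dp`, and `id-c` is left unrecorded because the job as a whole was rejected.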

Motivating use cases:

Real-time monitoring: Requerying lets adtechs get initial "rough" data quickly, while still allowing them to reprocess later for "richer" comprehensive data once all reports in a Shared ID have been received (#732).
Error recovery: Requerying lets adtechs retry the same batch of reports to Aggregation Service, in case the adtech's pipeline encountered an error after the Aggregation Service job had succeeded (#716).
Reach: Requerying is one part of a proposed solution for calculating Reach metrics (see https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/reach_whitepaper.md).

Proposed API:
The following fields will be added to Aggregation Service's CreateJobRequest:

{
  // Specifies which Differential Privacy (DP) model to use and its privacy parameters. If this
  // field is unset, default to laplace_dp with job_epsilon = 10.
  "dp_model": {

    // Indicates which DP model to use for this batch. If the type does not match the
    // model-specific parameters specified below, the request will fail.
    // If any report in this batch has been included in a prior job, this batch MUST use the
    // same type as that prior job. In other words, all previously used reports in this batch
    // must have used the same model. Otherwise, the request will fail.
    // Currently, this must either be "laplace_dp" or "gaussian_zcdp".
    "type": <string>,

    // Laplace distribution under pure differential privacy, using basic composition. Use this
    // if you expect your reports only need to be requeried a small number of times
    // (less than 40).
    "laplace_dp_params": {
      // The epsilon for this job. This determines noise levels and budget consumption for just
      // this job. Must be at most 64.
      // If unset, the request will fail.
      "job_epsilon": <double>
    },

    // Gaussian distribution under rho-zCDP, with basic composition. Use this if you expect
    // your reports need to be requeried a large number of times (more than 40).
    "gaussian_zcdp_params": {
      // The rho for this job. This determines noise levels and budget
      // consumption for just this job. Must be at most N (exact value TBD).
      // If unset, the request will fail.
      "job_rho": <double>
    }
  }
}
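As a rough illustration of the rules stated in the comments above, the following sketch validates a `dp_model` object on the client side before it is sent. The `validate_dp_model` function is hypothetical and not part of any Aggregation Service client. The requirement that parameters be positive is an assumption, and the rho limit is left as an optional argument because its value is still TBD.

```python
import copy
from typing import Optional

MAX_JOB_EPSILON = 64.0  # "Must be at most 64" per the proposal
DEFAULT_DP_MODEL = {"type": "laplace_dp", "laplace_dp_params": {"job_epsilon": 10.0}}


def validate_dp_model(dp_model: Optional[dict], max_job_rho: Optional[float] = None) -> dict:
    """Return a usable dp_model dict, or raise ValueError if the request would fail."""
    if dp_model is None:
        # Unset field: default to laplace_dp with job_epsilon = 10.
        return copy.deepcopy(DEFAULT_DP_MODEL)

    model_type = dp_model.get("type")
    if model_type == "laplace_dp":
        if "gaussian_zcdp_params" in dp_model:
            raise ValueError("type laplace_dp does not match gaussian_zcdp_params")
        epsilon = (dp_model.get("laplace_dp_params") or {}).get("job_epsilon")
        if epsilon is None:
            raise ValueError("job_epsilon is required for laplace_dp")
        # Positivity is an assumption; the proposal only states the upper bound.
        if not 0 < epsilon <= MAX_JOB_EPSILON:
            raise ValueError("job_epsilon must be in (0, 64]")
    elif model_type == "gaussian_zcdp":
        if "laplace_dp_params" in dp_model:
            raise ValueError("type gaussian_zcdp does not match laplace_dp_params")
        rho = (dp_model.get("gaussian_zcdp_params") or {}).get("job_rho")
        if rho is None:
            raise ValueError("job_rho is required for gaussian_zcdp")
        if rho <= 0:
            raise ValueError("job_rho must be positive")
        # The maximum rho is TBD, so it is only checked when the caller supplies one.
        if max_job_rho is not None and rho > max_job_rho:
            raise ValueError("job_rho exceeds the configured maximum")
    else:
        raise ValueError('type must be "laplace_dp" or "gaussian_zcdp"')
    return dp_model


default_model = validate_dp_model(None)
gaussian_model = validate_dp_model(
    {"type": "gaussian_zcdp", "gaussian_zcdp_params": {"job_rho": 0.5}}
)
```

A request that names one model but supplies the other model's parameters is rejected here, mirroring the "type does not match" failure described in the field comments.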

We would really appreciate your feedback on this API. In particular:

Is the name "dp_model" clear? Or is there a more suitable term?
Is the naming of "laplace_dp" and "gaussian_zcdp" clear? Or are there more suitable terms?
Is it clear how the "per-job" privacy params are used, and how they relate to the overall budget?
What use cases would you like to solve by using this feature?
What use cases would you expect to use a large number of requeries? (In other words, what use cases would you expect to use "gaussian_zcdp"?)

ruclohani added the "question" and "feedback requested" labels on Aug 30, 2024

preethiraghavan1 mentioned this issue on Oct 8, 2024: Support for Encrypted Intermediates in Aggregation Service: Feedback Requested #77 (open)

akashnadan mentioned this issue on Oct 31, 2024: MEETING: Attribution Reporting API calls WICG/attribution-reporting-api#80 (open)
