Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec: filtering IDs #123

Merged
merged 3 commits into from
May 6, 2024
Merged

Spec: filtering IDs #123

merged 3 commits into from
May 6, 2024

Conversation

alexmturner
Copy link
Collaborator

@alexmturner alexmturner commented Mar 29, 2024

Specs the ability to set a filtering ID (and modify the default ID space). See https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/flexible_filtering.md#proposal-filtering-id-in-the-encrypted-payload and issue #92.

To support this new functionality, we increase the report version. Note that this also requires aggregation service versions to support the new version.


Preview | Diff

@alexmturner
Copy link
Collaborator Author

@linnan-github, could you PTAL? Thanks!

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated
1. If |preSpecifiedParams|' [=pre-specified report parameters/context ID=] is
not null, return true.
1. If |preSpecifiedParams|' [=pre-specified report parameters/filtering ID byte
size=] is not 1, return true.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we define an implemention-defined value for the default filtering ID byte size?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My instinct here is to avoid an implementation-defined value here as I don't think we'd expect another implementer to change this value. But maybe we can follow up with the spec mentor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it’s not implementation defined, maybe we can follow the ARA spec which defines constant values for readability?

Copy link
Collaborator Author

@alexmturner alexmturner Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense! Added.

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated
larger than 64.
1. Let |filteringIdByteSize| be |params|' [=pre-specified report parameters/
filtering ID byte size=].
1. [=Assert=]: |filteringIdByteSize| is an integer in the range [1, 8].

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably consider defining implementation-defined values for the valid range of filtering ID byte size.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
@alexmturner
Copy link
Collaborator Author

Thanks for the review, Nan!

@alexmturner
Copy link
Collaborator Author

@arichiv, could you PTAL as well? Thanks!

@alexmturner
Copy link
Collaborator Author

Two additional things I'm seeking feedback on:

  • I'm a little unsure what to name filteringIdByteSize. For context, this allows for using a custom size for the filtering ID space, e.g. you could specify 4 to allow for a 32-bit ID. (We only allow integer bytes which is why we went with ByteSize not BitSize.) Other options we were thinking of is filteringIdByteLength or filteringIdMaxBytes (maybe leaning towards this).
  • The default filteringIdByteSize is 1. When a non-default value is used, a report is sent even if no contributions were sent (a "null report"). This is done to avoid a privacy concern from the different payload length. However, it might be easier to describe the behavior if we activate this null reporting logic whenever a filteringIdByteSize is explicitly specified, even if it's for the default value. Personally, I'm probably leaning against this as it's not needed for privacy.

@alexmturner
Copy link
Collaborator Author

@linnan-github @arichiv, I've updated this PR to no longer have an explicit opt-in or require debug mode. This should simplify the change substantially, albeit potentially delay the launch slightly (as we will now want all current aggregation service releases to support the new format before launching to Chrome Stable).

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated
1. [=Set the context ID for a batching scope=] given |contextId| and
|batchingScope|.
1. Run the following steps [=in parallel=]:
1. If |preSpecifiedParams| is not null:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very close to line 1174, should there be an algo to setup some of this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was a mistake -- we shouldn't be repeating this. Fixed this in #128.

spec.bs Outdated
1. [=Set the context ID for a batching scope=] given |contextId| and
|batchingScope|.
1. Run the following steps [=in parallel=]:
1. If |preSpecifiedParams| is not null:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about repetition, especially given it's a long series of things nested in complex ways

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I think I might leave for now -- agreed that it's duplicate, but I don't think this PR changes that too much. I'm hoping to do a better job when we move this into the Shared Storage spec.

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
@arichiv
Copy link

arichiv commented Apr 30, 2024

It might be worth splitting this change into two PRs:

  1. replacing context id with a pre-specified report parameters struct that contains only a context id
  2. adding a filtering ID byte size to the pre-specified report parameters struct

Doing both at once when so many spots are touched by 1 but only a few feel the impact of 2 is risky from a comprehensive review perspective.

Copy link
Collaborator Author

@alexmturner alexmturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! Pulled out the no-op pre-specified report parameter changes into their own PR (#128). I've left comments about other issues unresolved for now and will address them in this PR once its rebased onto the other one.

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated
1. [=Set the context ID for a batching scope=] given |contextId| and
|batchingScope|.
1. Run the following steps [=in parallel=]:
1. If |preSpecifiedParams| is not null:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was a mistake -- we shouldn't be repeating this. Fixed this in #128.

spec.bs Outdated
1. [=Set the context ID for a batching scope=] given |contextId| and
|batchingScope|.
1. Run the following steps [=in parallel=]:
1. If |preSpecifiedParams| is not null:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I think I might leave for now -- agreed that it's duplicate, but I don't think this PR changes that too much. I'm hoping to do a better job when we move this into the Shared Storage spec.

Copy link
Collaborator Author

@alexmturner alexmturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready for review again. I think I've addressed the remaining comments and rebased appropriately.

spec.bs Outdated Show resolved Hide resolved
spec.bs Show resolved Hide resolved
spec.bs Outdated
|options|["{{SharedStorageRunOperationMethodOptions/privateAggregationConfig}}"]["{{SharedStoragePrivateAggregationConfig/contextId}}"].
1. If |contextId|'s [=string/length=] is greater than 64, return a new
{{DOMException}} with name "`DataError`".
1. Return |contextId|.
1. Let |filteringIdByteSize| be the [=default filtering ID byte size=].
1. If |options|["{{SharedStorageRunOperationMethodOptions/privateAggregationConfig}}"]["{{SharedStoragePrivateAggregationConfig/filteringIdByteSize}}"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a go simplifying slightly. It's a little weird as we have to check if the entry exists first, but I think this is a little better.

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
Copy link

@arichiv arichiv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I have not done a detailed review of the entire spec but the change just here look good to me. Thanks for splitting the PR into multiple parts, that made it a lot easier to read through

@@ -1349,9 +1408,16 @@ event, PAExtendedHistogramContribution contribution)</dfn> method steps are:
throw=] a {{TypeError}}.
1. Otherwise, if |contribution|["{{PAHistogramContribution/value}}"] is
negative, [=exception/throw=] a {{TypeError}}.
1. If |contribution|["{{PAExtendedHistogramContribution/filteringId}}"] is
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the third time we do something similar, it might be worth making an algo but up to you

@alexmturner
Copy link
Collaborator Author

Thanks for the review, @arichiv! I was also wondering if you had any feedback on the questions in #123 (comment).

@arichiv
Copy link

arichiv commented May 2, 2024

Two additional things I'm seeking feedback on:

  • I'm a little unsure what to name filteringIdByteSize. For context, this allows for using a custom size for the filtering ID space, e.g. you could specify 4 to allow for a 32-bit ID. (We only allow integer bytes which is why we went with ByteSize not BitSize.) Other options we were thinking of is filteringIdByteLength or filteringIdMaxBytes (maybe leaning towards this).

I like filteringIdMaxBytes given the usage I see.

  • The default filteringIdByteSize is 1. When a non-default value is used, a report is sent even if no contributions were sent (a "null report"). This is done to avoid a privacy concern from the different payload length. However, it might be easier to describe the behavior if we activate this null reporting logic whenever a filteringIdByteSize is explicitly specified, even if it's for the default value. Personally, I'm probably leaning against this as it's not needed for privacy.

Is there a design doc I could review? I'm not sure I understand the implications from what I've reviewed alone

@alexmturner
Copy link
Collaborator Author

alexmturner commented May 2, 2024

The explainer here might help, but it is a bit terse: https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/flexible_filtering.md#small-id-space-by-default-but-configurable.

To elaborate on the privacy concern, when reports are not deterministically sent, the count of reports is potentially sensitive, i.e. based on cross-site data. For example, a script could choose whether or not to make contributions in a shared storage worklet (and a report would only be sent if there are contributions). The filtering ID feature allows scripts to also affect payload size (although only based on first-party data), which could allow an adversary to "partition" its counts -- i.e. it could separately count the number of reports with each different payload size to gain more information. Thus, this additional metadata would amplify the existing counting attack.

We prevent this by ensuring that, whenever a non-default byte size is chosen (and so a non-default payload size will occur) reports are sent deterministically, i.e. even if the script makes no contributions. This means that the count of reports with that payload size can no longer be affected by cross-site data.

The question here is whether we should also extend this protection to cases where a script has explicitly chosen the default byte size. There is no privacy reason to do this -- these reports are indistinguishable from reports that got the default due to not specifying anything. However, it arguably might be easier for developers to grok the "use the param => deterministic report" logic over "specify a non-default value => deterministic report" logic. There is also a small performance downside to this -- a report will be needlessly sent if no contributions are made in this case. I'm leaning against this due to this small performance impact and because it feels more natural to align the logic with the privacy requirement; it also means we can define the WebIDL in the form type filteringIdMaxBytes = 1 and not have special "not present" logic.

(And no worries if you don't really have opinions here -- happy to resolve internally if so)

@arichiv
Copy link

arichiv commented May 2, 2024

I get it now, thanks for the elaboration. I would suggest something slightly more elaborate: modifying SharedStoragePrivateAggregationConfig to take both a filteringIdMaxBytes and a new enum called something like reportingStrategy which is a string that can be wait-for-contributions or always-send (open to other naming).

filteringIdMaxBytes has the same current properties (defaults to 1).

reportingStrategy defaults to wait-for-contributions when filteringIdMaxBytes is 1 and always-send otherwise (doesn't distinguish between filteringIdMaxBytes being 1 explicitly or by default). If reportingStrategy is set to wait-for-contributions when filteringIdMaxBytes isn't 1 an error is thrown. If reportingStrategy is set to always-send when filteringIdMaxBytes is 1 that's fine.

Base automatically changed from split_off_params_changes to main May 2, 2024 16:26
Copy link

@arichiv arichiv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on last comment

@alexmturner
Copy link
Collaborator Author

Just switched the spec over to using "max bytes".

Oh interesting! I do see the appeal of making this property a bit more explicit, but I don't think any developer will want to opt-in to additional null reports if they can avoid it. Chatted briefly internally and, given that, we were thinking that leaving the logic as is should be simple enough for developers (and aligns with this idea that they want to minimize null reports when possible).

That being said, we could consider adopting randomized null reporting for the non-deterministic case in the future, similar to ARA's proposal. If we do end up pursuing that, we might want to return to this idea as there would be more of a meaningful tradeoff.

Thanks again both of you for all the reviews!

Copy link

@arichiv arichiv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like if the default or an explicit 1 is used then empty reports won't be sent? I'm fine with that. I'm just against having defaults behave differently from explicit settings identical to the defaults.

@alexmturner
Copy link
Collaborator Author

Yep, that's right!

@@ -1038,6 +1085,7 @@ partial interface SharedStorageWorkletGlobalScope {
dictionary SharedStoragePrivateAggregationConfig {
USVString aggregationCoordinatorOrigin;
USVString contextId;
[EnforceRange] unsigned long long filteringIdMaxBytes;
Copy link

@linnan-github linnan-github May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use unsigned long long to match `size_t in the implementation? Do we know whether there's similar API behavior?

Copy link

@arichiv arichiv May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, doesn't JS max out at 53 bits for ints because that's the non-exponent part of a double (and all JS numbers are doubles)? Is an unsigned long enough?

If you really need an unsigned 64 bit int then you might need to accept it as a string and have the algo convert it internally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just following the guidance in the Web Platform Design Principles here. We unfortunately didn't follow this for some of the other existing fields -- we may want to change those at some point.

aarongable pushed a commit to chromium/chromium that referenced this pull request May 6, 2024
This follows discussion on the spec PR:
patcg-individual-drafts/private-aggregation-api#123 (comment).
This change should be a no-op (and the feature relying on this has never
been enabled by default anyway).

Bug: 330744610
Change-Id: Ia06439b9a717ce8283fbc71f90a296bf9501d848
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5506763
Reviewed-by: Nan Lin <[email protected]>
Commit-Queue: Alex Turner <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1296864}
@alexmturner alexmturner merged commit a857522 into main May 6, 2024
1 check passed
@alexmturner alexmturner deleted the filtering_ids branch May 6, 2024 16:26
github-actions bot added a commit that referenced this pull request May 6, 2024
SHA: a857522
Reason: push, by alexmturner

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
aarongable pushed a commit to chromium/chromium that referenced this pull request May 6, 2024
Adds support for the new filtering_id and filtering_id_max_bytes fields.
This support is not currently hooked up to either Shared Storage or
Protected Audience. That support will be added in separate cls.

The spec is available here:
patcg-individual-drafts/private-aggregation-api#123

Bug: 330744610
Change-Id: Iade0c91a6ad018b7bd4b709a4a7615e783d8d9b0
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5443375
Reviewed-by: Qingxin Wu <[email protected]>
Commit-Queue: Alex Turner <[email protected]>
Reviewed-by: Yao Xiao <[email protected]>
Reviewed-by: Mike Taylor <[email protected]>
Reviewed-by: Nan Lin <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1297012}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants