Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(search): search access controls #9892

Merged
merged 7 commits into from
Feb 28, 2024
Merged

Conversation

david-leifker
Copy link
Collaborator

@david-leifker david-leifker commented Feb 21, 2024

Introduces controls around search results based on policy privileges which can filter or restrict the data on the entities being returned from search operations.

Policy Logic:

  • Support for user/group/ownership (including typed)/tags/domain/urn/domain policies
  • new VIEW_ENTITY privilege
  • Policy filters and restricting aspects on entities is located in the ESAccessControlUtil class.

OperationContext:

  • Access policies must be pushed down from the initial request via (graphql, rest.li, openapi) all the way down to the query/dao layers within the project. This was implemented by introducing the concept of an OperationContext which wraps several layers of context including the session and system authentication contexts. Additionally it includes all information needed to build search query filters from policies for the user.
  • The OperationContext also include the logic to compute a unique context id which should be added to all cache keys to ensure the cache doesn't bypass the intended filtering/restrictions.
  • The OperationContext allows control over privilege escalation. The default implementation here follows the legacy mode where a user session is allowed to escalate to a system level privilege as needed. In the future this could be restricted to only allow escalation based on policy.

SearchFlags/SearchEntity:

  • A new option was added to SearchFlags to indicate whether the result should filter or restrict content in the response entities.
  • While extending the SearchFlags, also added the option to return soft deleted entities.
  • The SearchEntity object within the SearchResult objects now includes a list of restricted aspects. The initial implementation only includes the key aspect which indicates that the entity should be fully restricted. In the future this might be a subset of the aspects for the entity.

Data Models:

  • The Ownership model was extended to include ownership type information to be able to filter by ownership type. The data is populated with a mutation hook and not required to be populated by ingestion.
  • Added new FieldType for MAP_ARRAY for maps with string keys and multiple values (without having to be string encoded)
  • Updated ES mappings to ignore the dynamic schema created by MAP_ARRAY fields

TODO:

  • ownership migration upgrade step
  • complete unit tests for access controls
  • comprehensive api support for openapi/rest.li/graphl
  • restricted entity hydration and graphql response

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@david-leifker david-leifker marked this pull request as draft February 21, 2024 00:15
@github-actions github-actions bot added product PR or Issue related to the DataHub UI/UX devops PR or Issue related to DataHub backend & deployment labels Feb 21, 2024
@david-leifker david-leifker requested review from RyanHolstien, jjoyce0510 and chriscollins3456 and removed request for RyanHolstien February 21, 2024 00:35
@david-leifker david-leifker force-pushed the search-access-controls branch 2 times, most recently from ef56331 to 7df1de0 Compare February 22, 2024 04:23
@david-leifker david-leifker marked this pull request as ready for review February 22, 2024 19:21
@david-leifker david-leifker changed the title [Do Not Merge] wip - search access controls feat(search access controls): search access controls phase 1 Feb 22, 2024
@david-leifker david-leifker changed the title feat(search access controls): search access controls phase 1 feat(search): search access controls part 1 Feb 22, 2024
@david-leifker
Copy link
Collaborator Author

Note that I've found a critical limitation in this implementation. I've introduced a limit to the number owners by making the owner a nested field. This should be addressed by making the field the owner type instead. This reversal puts the limit on the max number of ownership types instead of the owner identities.

@david-leifker
Copy link
Collaborator Author

david-leifker commented Feb 22, 2024

Fixed the owner urn limitation by using this structure

  "ownerTypes": {
    "urn:li:ownershipType:__system__none": [
      "urn:li:corpuser:jdoe",
      "urn:li:corpuser:datahub"
    ],
    "urn:li:ownershipType:__system__technical_owner": [
      "urn:li:corpuser:myuser"
    ],
    "urn:li:ownershipType:__system__business_owner": [
      "urn:li:corpuser:myuser"
    ]
  },

@david-leifker david-leifker changed the title feat(search): search access controls part 1 feat(search): search access controls Feb 26, 2024
david-leifker and others added 4 commits February 27, 2024 22:27
TODO:
* cache keys - need to be context aware to prevent incorrect results
* ownership migration upgrade step
* complete unit tests for access controls
* restricted entity hydration and graphql response (chris)
TODO:
* ownership migration upgrade step
* complete unit tests for access controls
* comprehensive api coveration graphql, openapi, restli
* restricted entity hydration and graphql response
Copy link
Collaborator

@chriscollins3456 chriscollins3456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is looking awesome. but man there's lots of code haha so it's kind of hard to get a very solid review. I left a few comments on higher level stuff but in general looking good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
devops PR or Issue related to DataHub backend & deployment product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants