Use data source access control

This page describes how to enforce data source access control for search apps in Vertex AI Agent Builder.

Access control for your data sources in Vertex AI Agent Builder limits the data that users can view in your search app's results. Google uses your identity provider to identify the end user performing a search and to determine whether they have access to the documents returned as results.

For example, suppose that employees at your company use your search app to search across Confluence documents, and you need to make sure that the app doesn't show them content they aren't allowed to access. If you have set up a workforce pool in Google Cloud for your organization's identity provider, you can specify that workforce pool in Vertex AI Agent Builder. Then, when an employee uses your app, they get search results only for documents that their account already has access to in Confluence.
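Conceptually, the per-document check resembles the following Python sketch: a result is kept only if one of its acl_info principals names the signed-in user or one of that user's groups. This is a simplified mental model for illustration, not how Vertex AI Search implements filtering; the SearchUser class and the can_view and filter_results functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SearchUser:
    """Hypothetical signed-in user, as resolved by the identity provider."""
    user_id: str
    groups: Set[str] = field(default_factory=set)


def can_view(acl_info: Dict, user: SearchUser) -> bool:
    """True if any principal names the user or one of the user's groups."""
    for reader in acl_info.get("readers", []):
        for principal in reader.get("principals", []):
            if principal.get("user_id") == user.user_id:
                return True
            if principal.get("group_id") in user.groups:
                return True
    return False


def filter_results(documents: List[Dict], user: SearchUser) -> List[Dict]:
    """Keep only the documents whose acl_info admits the user."""
    return [doc for doc in documents if can_view(doc.get("acl_info", {}), user)]


docs = [
    {"id": "design-doc", "acl_info": {"readers": [{"principals": [{"group_id": "engineering"}]}]}},
    {"id": "alice-notes", "acl_info": {"readers": [{"principals": [{"user_id": "alice@example.com"}]}]}},
    {"id": "payroll", "acl_info": {"readers": [{"principals": [{"group_id": "finance"}]}]}},
]
alice = SearchUser("alice@example.com", {"engineering"})
print([d["id"] for d in filter_results(docs, alice)])  # ['design-doc', 'alice-notes']
```

In the real service, the user's identity and group memberships come from your identity provider, not from your application.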

About data source access control

Turning on access control is a one-time procedure.

Access control is available for Cloud Storage, BigQuery, Google Drive, and all third-party data sources.

To turn on data source access control for Vertex AI Agent Builder, you must have your organization's identity provider configured in Google Cloud. The following authentication frameworks are supported:

  • Google Identity: If you use Google Identity, then all user identities and user groups are present and managed through Google Cloud. For more information about Google Identity, see the Google Identity documentation.
  • Third-party identity provider federation: If you use an external identity provider, such as Okta or Azure AD, then you must set up workforce identity federation in Google Cloud before you can turn on data source access control for Vertex AI Agent Builder.

    If you use third-party connectors, the google.subject attribute must map to the email address field in the external identity provider. The following are example google.subject and google.groups attribute mappings for commonly used identity providers:

    • Azure AD with OIDC protocol

      google.subject=assertion.email
      google.groups=assertion.groups
      
    • Azure AD with SAML protocol

      google.subject=assertion.attributes['http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name'][0]
      google.groups=assertion.attributes['http://schemas.microsoft.com/ws/2008/06/identity/claims/groups']
      
    • Okta with OIDC protocol

      google.subject=assertion.email
      google.groups=assertion.groups
      
    • Okta with SAML protocol

      google.subject=assertion.subject
      google.groups=assertion.attributes['groups']
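If you create the workforce pool provider with the gcloud CLI, the mappings are passed as comma-separated KEY=VALUE pairs to the --attribute-mapping flag of gcloud iam workforce-pools providers create-oidc or create-saml. The following Python sketch, built around a hypothetical build_attribute_mapping helper, assembles that string and checks that both google.subject and google.groups are present. Rejecting values that contain commas is a conservative assumption of this sketch, because an unescaped comma would be read as a pair separator.

```python
from typing import Dict

REQUIRED_KEYS = ("google.subject", "google.groups")


def build_attribute_mapping(mapping: Dict[str, str]) -> str:
    """Join mappings into KEY=VALUE pairs separated by commas."""
    missing = [key for key in REQUIRED_KEYS if key not in mapping]
    if missing:
        raise ValueError("missing attribute mappings: " + ", ".join(missing))
    for key, value in mapping.items():
        # An unescaped comma would be read as a pair separator, so reject it.
        if "," in value:
            raise ValueError(f"mapping for {key} contains a comma: {value!r}")
    return ",".join(f"{key}={value}" for key, value in mapping.items())


okta_oidc = {
    "google.subject": "assertion.email",
    "google.groups": "assertion.groups",
}
print(build_attribute_mapping(okta_oidc))
# google.subject=assertion.email,google.groups=assertion.groups
```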
      

Limitations

Access control has the following limitations:

  • Each document can have at most 250 readers. Each principal counts as one reader, where a principal is either a group or an individual user.
  • You can select one identity provider per Vertex AI Search-supported location.
  • Access control is honored only for identities and groups that are explicitly defined in your identity provider. Identities or groups that are defined natively within third-party apps aren't supported.
  • To set a data source as access-controlled, you must select this setting during data store creation. You can't turn this setting on or off for an existing data store.
  • The Data > Documents tab in the console doesn't show data for access-controlled data sources because this data should only be visible to users that have view access.
  • To preview UI results for search apps that use third-party access control, you must sign in to the federated console or use the web app. See Preview results for apps with third-party access control.
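Because the reader limit counts every principal, users and groups alike, you might want to check your ACL metadata before import. The following is a minimal Python sketch, assuming each document is a dict shaped like the acl_info examples on this page; the count_principals and documents_over_limit names are hypothetical.

```python
from typing import Dict, List

MAX_READERS_PER_DOCUMENT = 250  # limit stated in Limitations


def count_principals(acl_info: Dict) -> int:
    """Count user and group principals across all reader entries, as written."""
    return sum(len(reader.get("principals", [])) for reader in acl_info.get("readers", []))


def documents_over_limit(documents: List[Dict]) -> List[str]:
    """Return IDs of documents that list more principals than the limit allows."""
    return [
        doc["id"]
        for doc in documents
        if count_principals(doc.get("acl_info", {})) > MAX_READERS_PER_DOCUMENT
    ]


too_many = {"id": "big", "acl_info": {"readers": [{"principals": [{"user_id": f"u{i}"} for i in range(251)]}]}}
at_limit = {"id": "ok", "acl_info": {"readers": [{"principals": [{"group_id": f"g{i}"} for i in range(250)]}]}}
print(documents_over_limit([too_many, at_limit]))  # ['big']
```

The sketch counts principals exactly as written and doesn't deduplicate them. If a document exceeds the limit, granting access through a group instead of many individual users keeps the count down.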

Before you begin

This procedure assumes you have set up an identity provider in your Google Cloud project.

  • Google Identity: If you use Google Identity, you can proceed to the Connect to your identity provider procedure.
  • Third-party identity provider: Make sure you have set up a workforce identity pool for your third-party identity provider, and that you specified subject and group attribute mappings when you set up the workforce pool. For information about attribute mappings, see Attribute mappings in the IAM documentation. For more information about workforce identity pools, see Manage workforce identity pool providers in the IAM documentation.

Connect to your identity provider

To specify an identity provider for Vertex AI Agent Builder and turn on data source access control, follow these steps:

  1. In the Google Cloud console, go to the Agent Builder page.

  2. Go to the Settings > Authentication page.

  3. Click Add identity provider for the location you want to update.

  4. In the Add identity provider dialog, select your identity provider. If you select a third-party identity provider, also select the workforce pool that applies to your data sources.

  5. Click Save changes.

Configure a data source with access control

To apply access control to a data source, use the following steps depending on the kind of data source you're setting up:

Unstructured data from Cloud Storage

When setting up a data store for unstructured data from Cloud Storage, you also need to upload ACL metadata and set the data store as access-controlled:

  1. When preparing your data, include ACL information in your metadata using the acl_info field. For example:

    {
      "id": "<your-id>",
      "jsonData": "<JSON string>",
      "content": {
        "mimeType": "<application/pdf or text/html>",
        "uri": "gs://<your-gcs-bucket>/directory/filename.pdf"
      },
      "acl_info": {
        "readers": [
          {
            "principals": [
              { "group_id": "group_1" },
              { "user_id": "user_1" }
            ]
          }
        ]
      }
    }
    

    For more information about unstructured data with metadata, see the Unstructured data section of Prepare data for ingesting.

  2. When following the steps for data store creation in Create a search data store, enable access control in either the console or the API:

    • Console: When creating the data store, select This data store contains access control information.
    • API: When creating the data store, include the flag "aclEnabled": "true" in your JSON payload.
  3. When following the steps for data import in Create a search data store, make sure to do the following:

    • Upload your metadata with ACL information from the same bucket as your unstructured data.
    • If using the API, set GcsSource.dataSchema to document.
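If you generate the metadata file programmatically, the following Python sketch builds one record in the shape of the unstructured example above. It assumes the metadata file uses JSON Lines (one JSON document per line), as described in Prepare data for ingesting; the build_acl_info and unstructured_metadata_line helpers and the bucket path are hypothetical placeholders.

```python
import json
from typing import Dict, Iterable


def build_acl_info(user_ids: Iterable[str] = (), group_ids: Iterable[str] = ()) -> Dict:
    """Build an acl_info object with a single readers entry."""
    principals = [{"group_id": g} for g in group_ids] + [{"user_id": u} for u in user_ids]
    return {"readers": [{"principals": principals}]}


def unstructured_metadata_line(doc_id: str, uri: str, mime_type: str, metadata: Dict,
                               user_ids: Iterable[str] = (),
                               group_ids: Iterable[str] = ()) -> str:
    """Serialize one document's metadata, including ACLs, as a single JSON line."""
    record = {
        "id": doc_id,
        # jsonData is itself a JSON-encoded string, not a nested object.
        "jsonData": json.dumps(metadata),
        "content": {"mimeType": mime_type, "uri": uri},
        "acl_info": build_acl_info(user_ids, group_ids),
    }
    return json.dumps(record)


line = unstructured_metadata_line(
    "doc-1", "gs://my-bucket/directory/filename.pdf", "application/pdf",
    {"title": "Quarterly report"}, user_ids=["user_1"], group_ids=["group_1"],
)
```

Write one such line per document, each followed by a newline, and upload the file to the same bucket as the documents it describes.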

Structured data from Cloud Storage

When setting up a data store for structured data from Cloud Storage, you also need to upload ACL metadata and set the data store as access-controlled:

  1. When preparing your data, include ACL information in your metadata using the acl_info field. For example:

    {
      "id": "<your-id>",
      "jsonData": "<JSON string>",
      "acl_info": {
        "readers": [
          {
            "principals": [
              { "group_id": "group_1" },
              { "user_id": "user_1" }
            ]
          }
        ]
      }
    }
    
  2. When following the steps for data store creation in Create a search data store, enable access control in either the console or the API:

    • Console: When creating the data store, select This data store contains access control information.
    • API: When creating the data store, include the flag "aclEnabled": "true" in your JSON payload.
  3. When following the steps for data import in Create a search data store, make sure to do the following:

    • Upload your metadata with ACL information from the same bucket as your structured data.
    • If using the API, set GcsSource.dataSchema to document.
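The following Python sketch builds a structured record, which differs from the unstructured case only in having no content block, and the body of an import request that sets GcsSource.dataSchema to document. It assumes the v1 REST method documents:import, posted to a path such as .../dataStores/DATA_STORE_ID/branches/default_branch/documents:import; the helper names, bucket path, and field values are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def structured_record(doc_id: str, fields: Dict, user_ids: Iterable[str] = (),
                      group_ids: Iterable[str] = ()) -> Dict:
    """A structured document: fields go in jsonData and there is no content block."""
    principals = [{"group_id": g} for g in group_ids] + [{"user_id": u} for u in user_ids]
    return {
        "id": doc_id,
        "jsonData": json.dumps(fields),
        "acl_info": {"readers": [{"principals": principals}]},
    }


def gcs_import_body(input_uris: List[str]) -> Dict:
    """Body for a documents:import request that reads files in the document schema."""
    return {
        "gcsSource": {"inputUris": input_uris, "dataSchema": "document"},
        "reconciliationMode": "INCREMENTAL",
    }


record = structured_record("sku-123", {"name": "Widget", "price": 9.5}, group_ids=["group_1"])
body = gcs_import_body(["gs://my-bucket/structured/products.jsonl"])
print(json.dumps(body))
```

Setting dataSchema to document tells the import to read each line as a full document, including id and acl_info, rather than as a bare row of fields.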

Unstructured data from BigQuery

When setting up a data store for unstructured data from BigQuery, you need to set the data store as access-controlled and provide ACL metadata using a predefined schema for Vertex AI Search:

  1. When preparing your data, specify the following schema. Don't use a custom schema.

    [
      {
        "name": "id",
        "mode": "REQUIRED",
        "type": "STRING",
        "fields": []
      },
      {
        "name": "jsonData",
        "mode": "NULLABLE",
        "type": "STRING",
        "fields": []
      },
      {
        "name": "content",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [
          {
            "name": "mimeType",
            "type": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "uri",
            "type": "STRING",
            "mode": "NULLABLE"
          }
        ]
      },
      {
        "name": "acl_info",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [
          {
            "name": "readers",
            "type": "RECORD",
            "mode": "REPEATED",
            "fields": [
              {
                "name": "principals",
                "type": "RECORD",
                "mode": "REPEATED",
                "fields": [
                  {
                    "name": "user_id",
                    "type": "STRING",
                    "mode": "NULLABLE"
                  },
                  {
                    "name": "group_id",
                    "type": "STRING",
                    "mode": "NULLABLE"
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
    
  2. Include your ACL metadata as a column in your BigQuery table.

  3. When following the steps in Create a search data store, enable access control in either the console or the API:

    • Console: When creating the data store, select This data store contains access control information.
    • API: When creating the data store, include the flag "aclEnabled": "true" in your JSON payload.
  4. When following the steps for data import in Create a search data store, if using the API, set BigQuerySource.dataSchema to document.
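Before importing, you can compare sample rows against the predefined schema. The following Python sketch mirrors the unstructured schema above and reports missing required fields, unknown fields, and mode or type mismatches. It checks only names, modes, and the STRING and RECORD types, and doesn't query BigQuery; the helper names are hypothetical.

```python
from typing import Dict, List


def string_field(name: str, mode: str = "NULLABLE") -> Dict:
    return {"name": name, "type": "STRING", "mode": mode}


def record_field(name: str, fields: List[Dict], mode: str = "NULLABLE") -> Dict:
    return {"name": name, "type": "RECORD", "mode": mode, "fields": fields}


# Mirrors the predefined unstructured schema shown above.
UNSTRUCTURED_SCHEMA = [
    string_field("id", "REQUIRED"),
    string_field("jsonData"),
    record_field("content", [string_field("mimeType"), string_field("uri")]),
    record_field("acl_info", [
        record_field("readers", [
            record_field("principals", [string_field("user_id"), string_field("group_id")], "REPEATED"),
        ], "REPEATED"),
    ]),
]


def check_row(row: Dict, schema: List[Dict], path: str = "") -> List[str]:
    """Return human-readable problems found when comparing a row to a schema."""
    known = {f["name"] for f in schema}
    problems = [f"{path}{key}: not in schema" for key in row if key not in known]
    for f in schema:
        where = path + f["name"]
        value = row.get(f["name"])
        mode = f.get("mode", "NULLABLE")
        if value is None:
            if mode == "REQUIRED":
                problems.append(f"{where}: required but missing")
            continue
        if mode == "REPEATED" and not isinstance(value, list):
            problems.append(f"{where}: repeated field must be a list")
            continue
        for item in (value if mode == "REPEATED" else [value]):
            if f["type"] == "RECORD":
                if isinstance(item, dict):
                    problems += check_row(item, f.get("fields", []), where + ".")
                else:
                    problems.append(f"{where}: expected an object")
            elif f["type"] == "STRING" and not isinstance(item, str):
                problems.append(f"{where}: expected a string")
    return problems


row = {
    "id": "doc-1",
    "jsonData": "{}",
    "content": {"mimeType": "application/pdf", "uri": "gs://my-bucket/directory/filename.pdf"},
    "acl_info": {"readers": [{"principals": [{"group_id": "group_1"}, {"user_id": "user_1"}]}]},
}
print(check_row(row, UNSTRUCTURED_SCHEMA))  # []
```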

Structured data from BigQuery

When setting up a data store for structured data from BigQuery, you need to set the data store as access-controlled and provide ACL metadata using a predefined schema for Vertex AI Search:

  1. When preparing your data, specify the following schema. Don't use a custom schema.

    [
      {
        "name": "id",
        "mode": "REQUIRED",
        "type": "STRING",
        "fields": []
      },
      {
        "name": "jsonData",
        "mode": "NULLABLE",
        "type": "STRING",
        "fields": []
      },
      {
        "name": "acl_info",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [
          {
            "name": "readers",
            "type": "RECORD",
            "mode": "REPEATED",
            "fields": [
              {
                "name": "principals",
                "type": "RECORD",
                "mode": "REPEATED",
                "fields": [
                  {
                    "name": "user_id",
                    "type": "STRING",
                    "mode": "NULLABLE"
                  },
                  {
                    "name": "group_id",
                    "type": "STRING",
                    "mode": "NULLABLE"
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
    
  2. Include your ACL metadata as a column in your BigQuery table.

  3. When following the steps in Create a search data store, enable access control in either the console or the API:

    • Console: When creating the data store, select This data store contains access control information.
    • API: When creating the data store, include the flag "aclEnabled": "true" in your JSON payload.
  4. When following the steps for data import in Create a search data store, make sure to do the following:

    • If using the console, then when specifying the kind of data you're uploading, select JSONL for structured data with metadata.
    • If using the API, set BigQuerySource.dataSchema to document.
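For the API path, the following Python sketch builds the URL and body for creating an access-controlled data store and for importing from BigQuery with dataSchema set to document. It assumes the v1 Discovery Engine REST paths under collections/default_collection and the branch name default_branch, and sends aclEnabled as a JSON boolean; treat those, the helper names, and the project, dataset, and table IDs as assumptions to check against the API reference. It only builds requests and sends nothing.

```python
import json
from typing import Dict, Tuple

BASE_URL = "https://discoveryengine.googleapis.com/v1"


def create_data_store_request(project: str, location: str, data_store_id: str,
                              display_name: str,
                              content_config: str = "NO_CONTENT") -> Tuple[str, Dict]:
    """Return (url, body) for creating an access-controlled search data store."""
    url = (f"{BASE_URL}/projects/{project}/locations/{location}"
           f"/collections/default_collection/dataStores?dataStoreId={data_store_id}")
    body = {
        "displayName": display_name,
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
        "contentConfig": content_config,  # NO_CONTENT suits structured data
        "aclEnabled": True,  # serialized as the JSON boolean true
    }
    return url, body


def bigquery_import_request(project: str, location: str, data_store_id: str,
                            dataset: str, table: str) -> Tuple[str, Dict]:
    """Return (url, body) for a documents:import call with dataSchema set to document."""
    url = (f"{BASE_URL}/projects/{project}/locations/{location}"
           f"/collections/default_collection/dataStores/{data_store_id}"
           "/branches/default_branch/documents:import")
    body = {
        "bigquerySource": {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table,
            "dataSchema": "document",
        },
        "reconciliationMode": "INCREMENTAL",
    }
    return url, body


url, body = create_data_store_request("my-project", "global", "acl-store", "ACL store")
print(url)
print(json.dumps(body))
```

For unstructured data from BigQuery, pass a content_config of CONTENT_REQUIRED instead, because those rows point at document content.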

Preview results for apps with third-party access control

Previewing results in the console for apps with third-party access control requires you to sign in with your organization's credentials.

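You can preview UI results in two ways: in the Workforce Identity Federation console, as described in the following section, or in the web app, as described in Turn on the web app.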

Preview results in the Workforce Identity Federation console

Follow these steps to use the Workforce Identity Federation console to view results:

  1. In the Google Cloud console, go to the Agent Builder page.

  2. Click the name of the search app whose results you want to preview.

  3. Go to the Preview page.

  4. Click Preview with federated identity to go to the Workforce Identity Federation console.

  5. Enter your workforce pool provider and organization's credentials.

  6. Preview results for your app on the Preview page that appears.

    For more information about previewing your search results, see Get search results.

For more information about the Workforce Identity Federation console, see About the console (federated).

Grant search permissions to your users

To give your users the ability to search access-controlled data using your app, you need to grant access to users in your domain or workforce pool. Start by collecting those users in a group:

  • Google Identity: If you use Google Identity, then Google recommends that you create a Google group that includes all employees that need to search. If you're a Google Workspace administrator, you can include all users in an organization in a Google group by following the steps in Add all your organization's users to a group.
  • Third-party identity provider: If you use an external identity provider, such as Okta or Azure AD, then add everyone in your workforce pool to a single group.

Google recommends that you create a custom IAM role to grant to your user group, using the following permissions:

  • discoveryengine.answers.get
  • discoveryengine.servingConfigs.answer
  • discoveryengine.servingConfigs.search
  • discoveryengine.sessions.get

For more information about permissions for Vertex AI Agent Builder resources using Identity and Access Management (IAM), see Access control with IAM.

For more information about custom roles, see Custom roles in the IAM documentation.
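The following Python sketch builds the request for the IAM projects.roles.create method with the four permissions listed above, plus the IAM member string you would bind the role to. It assumes the role is created at project level; the role ID, title, and helper names are hypothetical, and the role ID check reflects the documented pattern of 3 to 64 letters, digits, underscores, or periods.

```python
import re
from typing import Dict, Optional, Tuple

SEARCH_PERMISSIONS = [
    "discoveryengine.answers.get",
    "discoveryengine.servingConfigs.answer",
    "discoveryengine.servingConfigs.search",
    "discoveryengine.sessions.get",
]


def custom_role_request(project: str, role_id: str) -> Tuple[str, Dict]:
    """Return (url, body) for the IAM projects.roles.create method."""
    # Role IDs may contain only letters, digits, underscores, and periods.
    if not re.fullmatch(r"[A-Za-z0-9_.]{3,64}", role_id):
        raise ValueError(f"invalid role ID: {role_id!r}")
    url = f"https://iam.googleapis.com/v1/projects/{project}/roles"
    body = {
        "roleId": role_id,
        "role": {
            "title": "Access-controlled search user",
            "includedPermissions": SEARCH_PERMISSIONS,
            "stage": "GA",
        },
    }
    return url, body


def group_member(group: str, workforce_pool_id: Optional[str] = None) -> str:
    """IAM member string for a Google group or a workforce pool group."""
    if workforce_pool_id is None:
        return f"group:{group}"  # group is the Google group's email address
    return ("principalSet://iam.googleapis.com/locations/global/workforcePools/"
            f"{workforce_pool_id}/group/{group}")


url, body = custom_role_request("my-project", "searchUser")
print(group_member("search-users@example.com"))  # group:search-users@example.com
```

Using a principalSet for the workforce pool group, rather than individual principals, means new employees gain search access as soon as your identity provider adds them to the group.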

Authorize the search widget

If you want to deploy a search widget for an access-controlled app, follow these steps:

  1. Grant the Discovery Engine Viewer role to users in your domain or workforce pool who need to make search API calls.

  2. Generate authorization tokens to pass to your widget.

  3. Follow the steps in Add a widget with an authorization token to pass the token to your widget.
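For users federated through a workforce pool, one way to obtain a Google Cloud access token is the Security Token Service (STS) token exchange described in the workforce identity federation documentation. The following Python sketch only builds the form fields for that exchange. Whether this is the token your widget expects is an assumption; confirm it against Add a widget with an authorization token. The pool ID, provider ID, subject token, billing project number, and helper name are placeholders.

```python
import json
from typing import Dict
from urllib.parse import urlencode

STS_TOKEN_URL = "https://sts.googleapis.com/v1/token"
ID_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:id_token"  # OIDC providers
SAML_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:saml2"   # SAML providers


def workforce_token_exchange_form(pool_id: str, provider_id: str, subject_token: str,
                                  billing_project_number: str,
                                  subject_token_type: str = ID_TOKEN_TYPE) -> Dict[str, str]:
    """Form fields for exchanging an IdP token for a Google Cloud access token."""
    return {
        "audience": ("//iam.googleapis.com/locations/global/workforcePools/"
                     f"{pool_id}/providers/{provider_id}"),
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "subject_token_type": subject_token_type,
        "subject_token": subject_token,
        "options": json.dumps({"userProject": billing_project_number}),
    }


form = workforce_token_exchange_form("my-pool", "my-provider", "<IdP token>", "123456789")
# POST this URL-encoded body to STS_TOKEN_URL; the access_token field of the
# JSON response holds the token.
encoded = urlencode(form)
```

Building the request separately from sending it keeps the sketch free of network calls; in practice, run the exchange on a server you control so that the identity provider token isn't exposed in the page.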

Turn on the web app

The web app is a dedicated site generated by Vertex AI Search where you and any other users with sign-in credentials can use your search app.

To provide the search app to users without needing to integrate the search widget or the search API into your own application, you can give your users the web app URL.

Follow these steps to turn on the web app:

  1. In the Google Cloud console, go to the Agent Builder page.

  2. Click the name of the search app that you want to create a web app for.

    The search app must be associated with at least one data source with access control. For more information, see Configure a data source with access control.

  3. Go to the Integration > UI tab.

  4. Click Enable the web app.

  5. If you're using workforce identity federation, then select a workforce pool provider.

  6. Click the link to your web app.

  7. Enter your workforce pool provider and organization's credentials.

  8. Preview results for your app.

  9. To configure results for the web app, see Configure results for the search widget. Any configurations for the widget also apply to the web app.

  10. Optional: To provide the search app to your users through this dedicated web app, copy the URL and send it to users who have sign-in credentials. They can bookmark the web app URL and go to it to use your search app.

For more information about getting search results, see Get search results.