Ground responses for PaLM 2 models

To get started with model grounding in Generative AI on Vertex AI, you need to complete some prerequisites. These include creating a Vertex AI Search data source, enabling Enterprise edition for your datastore, and linking your datastore to your app in Vertex AI Search. The data source serves as the foundation for grounding text-bison and chat-bison in Vertex AI.

Vertex AI Search helps you get started with high-quality search or recommendations based on data that you provide. To learn more about Vertex AI Search, see the Introduction to Vertex AI Search.

Enable Vertex AI Search

  1. In the Google Cloud console, go to the Search & Conversation page.

    Search & Conversation

  2. Read and agree to the terms of service, then click Continue and activate the API.

Create a datastore in Vertex AI Search

To ground your models in your source data, you must prepare and save that data to Vertex AI Search by creating a data store in Vertex AI Search.

If you are starting from scratch, you need to prepare your data for ingestion into Vertex AI Search. See Prepare data for ingesting to get started. Depending on the size of your data, ingestion can take several minutes to several hours. Only unstructured data stores are supported for grounding. After you've prepared your data for ingestion, you can Create a search data store. After you've successfully created a data store, Create a search app to link to it and Turn Enterprise edition on.

Ground the text-bison model

Grounding is available for the text-bison and chat-bison models. The following examples use the text-bison foundation model.

If you're using the API, you ground text-bison when calling predict. To do this, add the optional groundingConfig field and reference your data store location and data store ID.

If you don't know your datastore ID, follow these steps:

  1. In the Google Cloud console, go to the Vertex AI Search page and in the navigation menu, click Data stores. Go to the Data stores page
  2. Click the name of your datastore.
  3. On the Data page for your datastore, get the datastore ID.
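Alternatively, you can list the data stores in your project from the command line with the Discovery Engine REST API. This is a sketch, not part of the official steps: it assumes your data stores live in the global location and the default collection, and that you're authenticated with gcloud.

```shell
# Hypothetical sketch: list data stores (and their IDs) in your project.
# Replace PROJECT_ID with your Vertex AI Search project ID.
PROJECT_ID="your-project-id"

curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://discoveryengine.googleapis.com/v1/projects/${PROJECT_ID}/locations/global/collections/default_collection/dataStores"
```

Each entry in the response includes a `name` field whose final path segment is the data store ID.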

REST

To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • PROMPT: A prompt is a natural language request submitted to a language model to receive a response back. Prompts can contain questions, instructions, contextual information, examples, and text for the model to complete or continue. (Don't add quotes around the prompt here.)
  • TEMPERATURE: The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.

    If the model returns a response that's too generic or too short, or gives a fallback response, try increasing the temperature.

  • MAX_OUTPUT_TOKENS: Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.

    Specify a lower value for shorter responses and a higher value for potentially longer responses.

  • TOP_P: Top-P changes how the model selects tokens for output. Tokens are selected from the most probable (see top-K) to the least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1, and the top-P value is 0.5, then the model selects either A or B as the next token by using temperature and excludes C as a candidate.

    Specify a lower value for less random responses and a higher value for more random responses.

  • TOP_K: Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.

    For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.

    Specify a lower value for less random responses and a higher value for more random responses.

  • SOURCE_TYPE: The data source type that the model grounds to. Only Vertex AI Search is supported.
  • VERTEX_AI_SEARCH_DATA_STORE: The Vertex AI Search data store ID path.

    The VERTEX_AI_SEARCH_DATA_STORE must use the following format. Use the provided values for locations and collections: projects/{project_id}/locations/global/collections/default_collection/dataStores/{data_store_id}

    Note: The project ID in this data store ID path is your Vertex AI Search project ID.
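The temperature, top-K, and top-P parameters above interact in a fixed order: top-K filtering, then top-P filtering, then temperature-weighted sampling. The following toy sketch makes that pipeline concrete. It is an illustration of the decoding steps described above, not the actual PaLM 2 decoder.

```python
import math
import random


def sample_next_token(logits, temperature, top_k, top_p):
    """Toy sketch of the decoding pipeline described above.

    Not the actual PaLM 2 implementation; `logits` is a dict mapping
    token -> raw score, and `temperature` must be > 0 here (a
    temperature of 0 corresponds to greedy decoding: always pick the
    most probable surviving token).
    """
    # Convert raw logits to probabilities (softmax), most probable first.
    z = sum(math.exp(v) for v in logits.values())
    probs = sorted(
        ((t, math.exp(v) / z) for t, v in logits.items()),
        key=lambda kv: kv[1],
        reverse=True,
    )
    # 1. Top-K: keep only the K most probable tokens.
    probs = probs[:top_k]
    # 2. Top-P: keep the smallest prefix whose cumulative probability
    #    reaches the top-P value.
    kept, cumulative = [], 0.0
    for token, p in probs:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # 3. Temperature: reweight the survivors (p ** (1/T) is softmax with
    #    temperature-scaled logits, up to normalization) and sample.
    weights = [p ** (1.0 / temperature) for _, p in kept]
    tokens = [t for t, _ in kept]
    return random.choices(tokens, weights)[0]
```

With top_k=1 (greedy decoding) the function always returns the most probable token, regardless of temperature; raising top_k, top_p, and temperature widens the pool of candidates and increases randomness.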

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-bison:predict

Request JSON body:

{
  "instances": [
    { "prompt": "PROMPT"}
  ],
  "parameters": {
    "temperature": TEMPERATURE,
    "maxOutputTokens": MAX_OUTPUT_TOKENS,
    "topP": TOP_P,
    "topK": TOP_K,
    "groundingConfig": {
      "sources": [
          {
              "type": "VERTEX_AI_SEARCH",
              "vertexAiSearchDatastore": "VERTEX_AI_SEARCH_DATA_STORE"
          }
      ]
    }
  }
}
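If you're assembling the request from code rather than a saved file, the same body can be built as a plain dictionary and serialized to JSON. This is a sketch mirroring the request body above; the default parameter values are placeholders, not recommendations.

```python
import json


def build_grounded_predict_body(prompt, data_store_path,
                                temperature=0.2, max_output_tokens=256,
                                top_p=0.8, top_k=40):
    # Mirrors the request JSON body shown above. `data_store_path` is the
    # full Vertex AI Search data store ID path described earlier.
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "temperature": temperature,
            "maxOutputTokens": max_output_tokens,
            "topP": top_p,
            "topK": top_k,
            "groundingConfig": {
                "sources": [
                    {
                        "type": "VERTEX_AI_SEARCH",
                        "vertexAiSearchDatastore": data_store_path,
                    }
                ]
            },
        },
    }


body = build_grounded_predict_body(
    "What is grounding?",
    "projects/my-project/locations/global/"
    "collections/default_collection/dataStores/my-store",
)
print(json.dumps(body, indent=2))
```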

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-bison:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-bison:predict" | Select-Object -Expand Content

You should receive a JSON response that contains the generated prediction.

Console

To ground a model from Vertex AI Studio, follow these instructions.

  1. Select the PaLM 2 for Text Bison or PaLM 2 for Chat Bison model card in the Model Garden. Go to the Model Garden
  2. From the model card, click Open prompt design. Vertex AI Studio opens.
  3. From the parameters panel, select Advanced.
  4. Toggle the Enable Grounding option and select Customize.
  5. From the grounding source dropdown, select Vertex AI Search.
  6. Enter the Vertex AI Search data store path to your content. The path should follow this format: projects/{project_id}/locations/global/collections/default_collection/dataStores/{data_store_id}.
  7. Enter your prompt and click Submit.
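A malformed data store path in step 6 is an easy mistake to make. This hypothetical helper (not part of any SDK) checks a path against the format described above before you paste it into Vertex AI Studio:

```python
import re

# Matches the required format:
# projects/{project_id}/locations/global/collections/default_collection/dataStores/{data_store_id}
DATA_STORE_PATH_RE = re.compile(
    r"^projects/[^/]+/locations/global/"
    r"collections/default_collection/dataStores/[^/]+$"
)


def is_valid_data_store_path(path: str) -> bool:
    """Return True if `path` matches the expected data store path format."""
    return bool(DATA_STORE_PATH_RE.match(path))
```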

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

import vertexai

from vertexai.language_models import GroundingSource, TextGenerationModel

# TODO(developer): Update and uncomment the line below
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

# TODO(developer): Override these parameters as needed:
parameters = {
    "temperature": 0.1,  # Temperature controls the degree of randomness in token selection.
    "max_output_tokens": 256,  # Token limit determines the maximum amount of text output.
    "top_p": 0.8,  # Tokens are selected from most probable to least until the sum of their probabilities equals the top_p value.
    "top_k": 20,  # A top_k of 1 means the selected token is the most probable among all tokens.
}

model = TextGenerationModel.from_pretrained("text-bison@002")

# TODO(developer): Update and uncomment the lines below
# data_store_id = "datastore_123456789012345"
# data_store_location = "global"
if data_store_id and data_store_location:
    # Use Vertex AI Search data store
    grounding_source = GroundingSource.VertexAISearch(
        data_store_id=data_store_id, location=data_store_location
    )
else:
    # Use Google Search for grounding (Private Preview)
    grounding_source = GroundingSource.WebSearch()

response = model.predict(
    "What are the price, available colors, and storage size options of a Pixel Tablet?",
    grounding_source=grounding_source,
    **parameters,
)
print(f"Response from Model: {response.text}")
print(f"Grounding Metadata: {response.grounding_metadata}")

What's next