Fine-tune an LLM to answer questions from your documents.
This solution showcases how to extract question and answer pairs from documents using Generative AI. It provides an end-to-end demonstration of Q&A extraction and fine-tuning of a large language model (LLM) on Vertex AI. Along the way, the solution uses Document AI Optical Character Recognition (OCR), Firestore, Vector Search, Vertex AI Studio, and Cloud Functions.
- Uploading a new document triggers the webhook Cloud Function.
- Document AI extracts the text from the document file.
- The document text is indexed in Vector Search.
- A Vertex AI Large Language Model generates questions and answers from the document text.
- The question and answer pairs are saved in Firestore.
- A fine-tuning dataset is generated from the Firestore database.
- After human validation, a fine-tuned large language model is deployed and saved in the Model Registry.
Configuration: 2 mins Deployment: 6 mins
Name | Description | Type | Default | Required |
---|---|---|---|---|
disable_services_on_destroy | Whether project services will be disabled when the resources are destroyed. | bool | false | no |
documentai_location | Document AI location, see https://cloud.google.com/document-ai/docs/regions | string | "us" | no |
firestore_location | Firestore location, see https://firebase.google.com/docs/firestore/locations | string | "nam5" | no |
labels | A set of key/value label pairs to assign to the resources deployed by this blueprint. | map(string) | {} | no |
project_id | The Google Cloud project ID to deploy to. | string | n/a | yes |
region | The Google Cloud region to deploy to. | string | "us-central1" | no |
unique_names | Whether to use unique names for resources. | bool | false | no |
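The inputs above can be wired up in a parent configuration as follows. This is a minimal sketch: the module name, source path, and project ID are placeholders to adapt to your setup.

```hcl
module "knowledge_base" {
  # Assumed source path; point this at wherever the module lives.
  source = "./modules/generative-ai-knowledge-base"

  # Required input.
  project_id = "my-project-id"

  # Optional inputs, shown with their defaults from the table above.
  region              = "us-central1"
  documentai_location = "us"
  firestore_location  = "nam5"
  labels              = {}
  unique_names        = false
}
```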
Name | Description |
---|---|
bucket_docs_name | The name of the docs bucket created |
bucket_main_name | The name of the main bucket created |
docs_index_endpoint_id | The ID of the docs index endpoint |
docs_index_id | The ID of the docs index |
documentai_processor_id | The full Document AI processor path ID |
firestore_database_name | The name of the Firestore database created |
neos_tutorial_url | The URL to launch the in-console tutorial for the Generative AI Knowledge Base solution |
predictions_notebook_url | The URL to open the notebook for model predictions in Colab |
unique_id | The unique ID for this deployment |
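The outputs above can be re-exported from a parent configuration, for example to surface the docs bucket name after `terraform apply`. The module name `knowledge_base` is an assumption from the usage sketch, not part of this module.

```hcl
output "docs_bucket" {
  description = "Bucket to upload documents to."
  value       = module.knowledge_base.bucket_docs_name
}

output "notebook_url" {
  description = "Colab notebook for model predictions."
  value       = module.knowledge_base.predictions_notebook_url
}
```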
These sections describe requirements for using this module.
The following dependencies must be available:
- Terraform v0.13
- Terraform Provider for GCP plugin v5.8
A service account with the following roles must be used to provision the resources of this module:
- Storage Admin: `roles/storage.admin`
The Project Factory module and the IAM module may be used in combination to provision a service account with the necessary roles applied.
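As an alternative to the Project Factory and IAM modules, the role can be granted directly in Terraform. A minimal sketch, assuming the service account already exists (its email is a placeholder):

```hcl
resource "google_project_iam_member" "provisioner_storage_admin" {
  project = "my-project-id"
  role    = "roles/storage.admin"
  member  = "serviceAccount:terraform@my-project-id.iam.gserviceaccount.com"
}
```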
A project with the following APIs enabled must be used to host the resources of this module:
- Google Cloud Storage JSON API: `storage-api.googleapis.com`
The Project Factory module can be used to provision a project with the necessary APIs enabled.
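If you are not using the Project Factory module, the required API can also be enabled directly with a `google_project_service` resource. A sketch, with the project ID as a placeholder:

```hcl
resource "google_project_service" "storage_api" {
  project = "my-project-id"
  service = "storage-api.googleapis.com"

  # Leave the API enabled when this resource is destroyed.
  disable_on_destroy = false
}
```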
Refer to the contribution guidelines for information on contributing to this module.
Please see our security disclosure process.