
feat(alloydb): Added generate batch embeddings sample #12721

Merged

Conversation

twishabansal
Contributor

@twishabansal twishabansal commented Oct 23, 2024

Description

Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.

Checklist


@product-auto-label product-auto-label bot added samples Issues that are directly related to samples. api: notebooks Issues related to the Vertex AI Workbench API. labels Oct 23, 2024
@twishabansal twishabansal force-pushed the generate_batch_embeddings branch 2 times, most recently from fdcbe83 to 93aab38 Compare October 24, 2024 18:05
"id": "D3FUBaXIUquR"
},
"source": [
"This runs the complete embeddings workflow:\n",
Contributor

There's a bit of a disconnect between the setup and the running. In the "Create the embeddings workflow" section I would add some context that you are setting up the functions you will be using. We also need to prepare the user more for the idea of generating embeddings for multiple columns.

Contributor Author

I have added some more information under the Building an Embeddings Workflow heading as well as added more information on the dataset about what columns are to be embedded.

Let me know what you think!

")\n",
"\n",
"# Update the database with the generated embeddings concurrently\n",
"await batch_update_rows_concurrently(\n",
Contributor

As a user I would kind of expect just to run 1 method to generate the embeddings, compared to having to get and batch the source data then generate my embeddings, then updating the database. I have to copy around the variable cols_to_embed a lot. We might be able to simplify this devex more.

Contributor Author

@twishabansal twishabansal Oct 30, 2024

I have made some changes in the code that enhance the user experience by letting users declare the columns to embed (and other variables) and call run_embeddings_workflow directly.

Another alternative is to let each function use the global cols_to_embed variable and eliminate its argument, but that could make the code harder to maintain and understand.

What do you think?

Contributor

+1 to keeping it as an argument
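The single-entry-point design agreed on above could look something like the following sketch. The helper names (`fetch_rows`, `embed_text`, `embed_row`) and their signatures are hypothetical stand-ins for the notebook's actual database and Vertex AI calls, and the in-memory fixtures exist only to make the sketch self-contained; the point is that `cols_to_embed` stays an explicit argument threaded through one `run_embeddings_workflow` call rather than a module-level global.

```python
import asyncio

# Hypothetical in-memory stand-ins for the notebook's real helpers.
async def fetch_rows(table):
    # In the real sample this would query AlloyDB; here we return fixtures.
    return [{"id": 1, "title": "hat", "description": "a red hat"},
            {"id": 2, "title": "shoe", "description": "a blue shoe"}]

async def embed_text(text):
    # Stand-in for a Vertex AI embedding call; returns a fake vector.
    return [float(len(text)), 0.0]

async def embed_row(row, cols_to_embed):
    # Generate one embedding per configured column.
    return {
        "id": row["id"],
        **{f"{col}_embedding": await embed_text(row[col])
           for col in cols_to_embed},
    }

async def run_embeddings_workflow(table, cols_to_embed):
    """Single entry point: fetch rows, embed them, and return the updates.

    cols_to_embed stays an explicit argument, per the review discussion,
    rather than a global each helper reaches for.
    """
    rows = await fetch_rows(table)
    # Embed rows concurrently instead of one at a time.
    updates = await asyncio.gather(
        *(embed_row(row, cols_to_embed) for row in rows)
    )
    # A real implementation would now write `updates` back to the table.
    return updates

updates = asyncio.run(
    run_embeddings_workflow("products", ["title", "description"])
)
```

A caller then deals with exactly one function and one configuration list, which addresses the "copy around the variable cols_to_embed a lot" concern.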

@twishabansal twishabansal marked this pull request as ready for review November 6, 2024 16:53
@twishabansal twishabansal requested review from a team as code owners November 6, 2024 16:53
@glasnt glasnt assigned glasnt and unassigned engelke Nov 14, 2024
@glasnt glasnt added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 14, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 14, 2024
@glasnt glasnt added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 19, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 19, 2024
@glasnt
Contributor

glasnt commented Nov 20, 2024

  • You will need to add a copy of noxfile_config.py from the root folder, and adjust it to specify which Python versions you want to test. (By default, everything but Python 3.8 and Python 3.12 is ignored.)
  • I've been able to verify that I can, as a Google Cloud project owner on a project I own, run the notebook if I import it into Colab Enterprise, as is.
  • I've also been able to confirm I get the same or similar errors when running pytest locally.

I'm still working out how the testing harness works, because I can add broken commands to cells without the process stopping early.
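The noxfile_config.py adjustment mentioned above might look like this sketch, which follows the TEST_CONFIG_OVERRIDE convention used in Google Cloud Python samples repositories; the exact keys and supported values are assumptions here, so copy the real file from the repository root and adjust it rather than relying on this fragment.

```python
# Hypothetical noxfile_config.py sketch; key names follow the
# TEST_CONFIG_OVERRIDE convention but should be checked against the
# real file in the repository root.
TEST_CONFIG_OVERRIDE = {
    # Skip every Python version except the one(s) this sample supports;
    # here only 3.10 (the version Colab runs) is left enabled.
    "ignored_versions": ["2.7", "3.7", "3.8", "3.9", "3.11", "3.12"],
    # Environment variables to inject into the test session, if any.
    "envs": {},
}
```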

@glasnt
Contributor

glasnt commented Nov 20, 2024

Configured the production table; let's try to /gcbrun this again.

@glasnt glasnt added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 20, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 20, 2024
Contributor

@glasnt glasnt left a comment

Pending confirmation of sample use case (offline)

Comment on lines 15 to 17
# This sample creates a secure two-service application running on Cloud Run.
# This test builds and deploys the two secure services
# to test that they interact properly together.
Contributor

Suggest removing this bitrot comment and adding a note about data persistence.

Suggested change
# This sample creates a secure two-service application running on Cloud Run.
# This test builds and deploys the two secure services
# to test that they interact properly together.
# Maintainer Note: this sample presumes data exists in ALLOYDB_TABLE_NAME within the ALLOYDB_(cluster/instance/database)

Contributor

This file might be better suited as e2e_test.py for simplicity as well (any *_test.py file will be picked up by pytest; the exact name doesn't matter).

@glasnt glasnt assigned glasnt and unassigned glasnt Nov 20, 2024
Contributor

@kurtisvg kurtisvg left a comment

LGTM, with the suggestion that we might want to test on python 3.10 (currently the version Colab runs)

@glasnt glasnt merged commit 08e0146 into GoogleCloudPlatform:main Nov 20, 2024
11 checks passed