-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(alloydb): Added generate batch embeddings sample #12721
feat(alloydb): Added generate batch embeddings sample #12721
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
fdcbe83
to
93aab38
Compare
"id": "D3FUBaXIUquR" | ||
}, | ||
"source": [ | ||
"This runs the complete embeddings workflow:\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theres a bit of a disconnect between the set up and running. In the "Create the embeddings workflow" section I would add some context on that you are setting up the functions you will be using. We also need to prepare the user more for the idea of generating embeddings for multiple columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some more information under the Building an Embeddings Workflow
heading as well as added more information on the dataset about what columns are to be embedded.
Let me know what you think!
")\n", | ||
"\n", | ||
"# Update the database with the generated embeddings concurrently\n", | ||
"await batch_update_rows_concurrently(\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user I would kind of expect just to run 1 method to generate the embeddings, compared to having to get and batch the source data then generate my embeddings, then updating the database. I have to copy around the variable cols_to_embed a lot. We might be able to simplify this devex more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made some changes in the code which would enhance the user experience by letting users declare cols to embed (and other variables) and using the run_embeddings_workflow
directly.
Another alternative is to let each function use the global cols_to_embed variable and eliminate it's argument. It could make the code harder to maintain and understand.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to keeping it as an argument
93aab38
to
4cc34e4
Compare
|
Configured production table, let's try to /gcbrun this again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending confirmation of sample use case (offline)
# This sample creates a secure two-service application running on Cloud Run. | ||
# This test builds and deploys the two secure services | ||
# to test that they interact properly together. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest removing this bitrot comment, and add a note about data persistence.
# This sample creates a secure two-service application running on Cloud Run. | |
# This test builds and deploys the two secure services | |
# to test that they interact properly together. | |
# Maintainer Note: this sample presumes data exists in ALLOYDB_TABLE_NAME within the ALLOYDB_(cluster/instance/database) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file might be better suited as e2e_test.py
for simplicity, as well (any _test.py
file will be picked up by pytest
, the exact name doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with the suggestion that we might want to test on python 3.10 (currently the version Colab runs)
Description
Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.
Checklist
nox -s py-3.9
(see Test Environment Setup)nox -s lint
(see Test Environment Setup)