Tags: Unstructured-IO/unstructured-ingest
feat: Add binary_encode_vectors flag to AstraDBUploader (#609)

AstraDB uploads vectors with their own binary format by default. This is more efficient, but it makes it hard to view and work with vectors directly in the UI. We can add a flag to turn off this encoding if you're willing to take the performance hit.

Tried adding an integration test, but vectors are also read in binary format so this doesn't verify anything. Instead, let's start a unit test suite for Astra to confirm that the option is being set.

---

> [!NOTE]
> Adds a flag to disable binary-encoded vectors in AstraDB uploads and tests the behavior; bumps version to 1.2.22.
>
> - **AstraDB Uploader**:
>   - Add `binary_encode_vectors` flag to `AstraDBUploaderConfig` (default `True`).
>   - In `run_data`, when set to `False`, use `astrapy` `APIOptions`/`SerdesOptions` via `with_options` to disable binary vector encoding before inserts.
> - **Tests**:
>   - New unit tests in `test/unit/processes/connectors/test_astradb.py` verifying `with_options` is called when encoding disabled and not called by default.
> - **Versioning**:
>   - Bump `__version__` to `1.2.22` and update `CHANGELOG.md`.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 590a0ef. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
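
The behavior described above (call `with_options` only when the flag is off) and the unit-test approach can be sketched with the standard library alone. This is a minimal illustration, not the connector's code: `prepare_collection` and `make_options` are hypothetical names, a plain `dict` stands in for the `astrapy` `APIOptions`/`SerdesOptions` objects, and the `api_options` keyword is an assumption about how `with_options` is invoked.

```python
from unittest.mock import MagicMock


def prepare_collection(collection, binary_encode_vectors=True, make_options=dict):
    """Hypothetical helper: return the collection object to insert into.

    With binary encoding enabled (the default) the collection is returned
    unchanged. With it disabled, the collection is asked for a copy that
    carries overridden options.
    """
    if binary_encode_vectors:
        return collection
    # Assumption: in the real uploader this would be built from astrapy's
    # APIOptions/SerdesOptions. A dict stands in here so the sketch needs
    # no third-party imports.
    options = make_options(binary_encode_vectors=False)
    return collection.with_options(api_options=options)


# Default: with_options must not be called.
default_collection = MagicMock()
assert prepare_collection(default_collection) is default_collection
default_collection.with_options.assert_not_called()

# Encoding disabled: with_options is called once and its result is used.
plain_collection = MagicMock()
result = prepare_collection(plain_collection, binary_encode_vectors=False)
plain_collection.with_options.assert_called_once_with(
    api_options={"binary_encode_vectors": False}
)
assert result is plain_collection.with_options.return_value
```

Mocking the collection sidesteps the problem noted in the PR: because vectors are also read back in binary format, a round-trip integration test cannot tell the two modes apart, whereas a mock can confirm the option was requested.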
Fix databricks sdk version (#605)

> [!NOTE]
> Pins/updates Databricks SDK and adjusts the Databricks volumes connector, with version bump, changelog, test fixtures, and lockfile updates.
>
> - **Databricks**:
>   - Update `unstructured_ingest/processes/connectors/databricks/volumes.py` implementation for the volumes connector.
> - **Dependencies**:
>   - Pin/update Databricks SDK version in `pyproject.toml` and `requirements/connectors/databricks-volumes.txt`.
>   - Refresh dependency lockfile `uv.lock`.
> - **Release**:
>   - Bump package version in `unstructured_ingest/__version__.py` and update `CHANGELOG.md`.
> - **Tests**:
>   - Update Notion database expected-result fixtures (`test/integration/connectors/expected_results/notion_database/...`).
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit a4b4c2d. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
fix: pin aiobotocore (#603)

Bedrock installs of `unstructured-ingest` have been running into [this issue](aio-libs/aiobotocore#1414) caused by an incompatibility between `aiobotocore[boto3]` version 2.24.1 and the latest version of `botocore`. It looks like `aiobotocore` has a [fix](aio-libs/aiobotocore#1409) in the works, so I'm hopeful this issue will be resolved by the next version.

This PR pins the `bedrock` install to avoid the incompatible version of `aiobotocore`.
feat: add configurable Bedrock inference profile support (#601)

## Summary

Add support for AWS Bedrock inference profiles configurable via `BEDROCK_INFERENCE_PROFILE_ID` environment variable.

- Add `inference_profile_id` field to `BedrockEmbeddingConfig`
- Update `invoke_model` calls to use inference profile when configured
- Add comprehensive unit tests for inference profile functionality
- Maintain full backward compatibility when not configured

## Changes

### Core Implementation

- Added `inference_profile_id` field to `BedrockEmbeddingConfig` with env var support
- Updated both sync and async `invoke_model` calls to conditionally include `inferenceProfileId` parameter
- Enhanced TYPE_CHECKING client signatures for proper typing

### Testing

- Added unit tests for default behavior (None)
- Added unit tests for environment variable loading
- Added unit tests for manual configuration override
- All existing tests continue to pass

## Usage

Users can configure inference profiles in three ways:

```bash
# Environment variable (recommended)
export BEDROCK_INFERENCE_PROFILE_ID="arn:aws:bedrock:us-west-2:123456789012:inference-profile/my-profile"
```

```python
# Manual configuration
config = BedrockEmbeddingConfig(
    inference_profile_id="arn:aws:bedrock:us-west-2:123456789012:inference-profile/my-profile"
)
```

```python
# Default behavior (disabled)
config = BedrockEmbeddingConfig()  # inference_profile_id is None
```

## Test Plan

- [x] All existing Bedrock unit tests pass
- [x] New inference profile unit tests pass
- [x] Environment variable loading works correctly
- [x] Manual configuration override works
- [x] Backward compatibility maintained

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <[email protected]>
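
The conditional-parameter behavior described in this PR can be sketched without the real classes. This is an illustration under stated assumptions: `SketchBedrockConfig`, `build_invoke_kwargs`, and the placeholder model ID are hypothetical, and the `inferenceProfileId` key is taken from the PR description rather than verified against the AWS SDK.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchBedrockConfig:
    """Hypothetical stand-in for BedrockEmbeddingConfig (not the real class)."""

    model_id: str = "example-embedding-model"  # placeholder, not a real model ID
    # Read the environment variable at construction time; None when unset.
    inference_profile_id: Optional[str] = field(
        default_factory=lambda: os.getenv("BEDROCK_INFERENCE_PROFILE_ID")
    )


def build_invoke_kwargs(config: SketchBedrockConfig, body: str) -> dict:
    """Build keyword arguments for an invoke_model-style call."""
    kwargs = {"modelId": config.model_id, "body": body}
    # Only include the profile when configured, so the default call is unchanged.
    if config.inference_profile_id:
        kwargs["inferenceProfileId"] = config.inference_profile_id
    return kwargs


# Default behavior: no environment variable, no extra parameter.
os.environ.pop("BEDROCK_INFERENCE_PROFILE_ID", None)
assert "inferenceProfileId" not in build_invoke_kwargs(SketchBedrockConfig(), "{}")

# A manually supplied value is passed through.
manual = SketchBedrockConfig(inference_profile_id="my-profile")
assert build_invoke_kwargs(manual, "{}")["inferenceProfileId"] == "my-profile"
```

Leaving the key out entirely when no profile is set, rather than passing `None`, is what keeps existing callers on the same request shape as before.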