Tags: Unstructured-IO/unstructured-ingest

1.2.28

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
limit opensearch-py to below 3.0 (#618)

1.2.27

add iam auth to opensearch (#614)

1.2.25

Potter/fix elasticsearch no fields (#613)

1.2.24

remove skip existing in release (#612)

Removes the skip-existing flag from the Azure Artifacts upload in the
release workflow.

1.2.23

feat: Add binary_encode_vectors flag to AstraDBUploader (#609)

By default, vectors are uploaded to AstraDB in its own binary format.
This is more efficient, but it makes vectors hard to view and work with
directly in the UI. This PR adds a flag to turn off the encoding if
you're willing to take the performance hit.

I tried adding an integration test, but vectors are also read back in
binary format, so it doesn't verify anything. Instead, this PR starts a
unit test suite for Astra to confirm that the option is being set.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Adds a flag to disable binary-encoded vectors in AstraDB uploads and tests the behavior; bumps version to 1.2.22.
>
> - **AstraDB Uploader**:
>   - Add `binary_encode_vectors` flag to `AstraDBUploaderConfig` (default `True`).
>   - In `run_data`, when set to `False`, use `astrapy` `APIOptions`/`SerdesOptions` via `with_options` to disable binary vector encoding before inserts.
> - **Tests**:
>   - New unit tests in `test/unit/processes/connectors/test_astradb.py` verifying `with_options` is called when encoding disabled and not called by default.
> - **Versioning**:
>   - Bump `__version__` to `1.2.22` and update `CHANGELOG.md`.
<!-- /CURSOR_SUMMARY -->
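
The unit-test approach described for this change can be sketched with `unittest.mock`. This is a minimal, self-contained illustration, not the connector's actual code: `UploaderConfig` and `prepare_collection` are hypothetical stand-ins for `AstraDBUploaderConfig` and the logic in `run_data`, and the plain dictionary passed to `with_options` is an assumption that replaces the `astrapy` `APIOptions`/`SerdesOptions` objects so the sketch has no third-party dependencies.

```python
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass
class UploaderConfig:
    """Hypothetical stand-in for AstraDBUploaderConfig."""

    binary_encode_vectors: bool = True  # default taken from the PR description


def prepare_collection(collection, config: UploaderConfig):
    """Return the collection to insert into, honoring the flag.

    Hypothetical helper: the real connector passes astrapy option objects
    to `with_options`; a plain dict is used here as a placeholder.
    """
    if config.binary_encode_vectors:
        # Default path: leave the collection untouched.
        return collection
    # Opt-out path: request a copy of the collection with binary encoding off.
    return collection.with_options(serdes_options={"binary_encode_vectors": False})


def check_flag_behavior() -> None:
    # Default: with_options must not be called.
    default_collection = MagicMock()
    result = prepare_collection(default_collection, UploaderConfig())
    assert result is default_collection
    default_collection.with_options.assert_not_called()

    # Flag disabled: with_options is called once and its result is used.
    plain_collection = MagicMock()
    result = prepare_collection(
        plain_collection, UploaderConfig(binary_encode_vectors=False)
    )
    plain_collection.with_options.assert_called_once()
    assert result is plain_collection.with_options.return_value


check_flag_behavior()
```

Mocking the collection sidesteps the problem noted in the description: a round trip through the database reads vectors back in binary format either way, so only the call to `with_options` can be observed.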

1.2.22

feat: Add binary_encode_vectors flag to AstraDBUploader (#609)


1.2.21

Fix databricks sdk version (#605)

<!-- CURSOR_SUMMARY -->
> [!NOTE]
> Pins/updates Databricks SDK and adjusts the Databricks volumes connector, with version bump, changelog, test fixtures, and lockfile updates.
>
> - **Databricks**:
>   - Update `unstructured_ingest/processes/connectors/databricks/volumes.py` implementation for the volumes connector.
> - **Dependencies**:
>   - Pin/update Databricks SDK version in `pyproject.toml` and `requirements/connectors/databricks-volumes.txt`.
>   - Refresh dependency lockfile `uv.lock`.
> - **Release**:
>   - Bump package version in `unstructured_ingest/__version__.py` and update `CHANGELOG.md`.
> - **Tests**:
>   - Update Notion database expected-result fixtures (`test/integration/connectors/expected_results/notion_database/...`).
<!-- /CURSOR_SUMMARY -->

1.2.20

[ENG-677] weaviate precheck fix (#604)

1.2.19

fix: pin aiobotocore (#603)

Bedrock installs of `unstructured-ingest` have been running into an
issue (aio-libs/aiobotocore#1414) caused by an incompatibility between
`aiobotocore[boto3]` version 2.24.1 and the latest version of
`botocore`. It looks like `aiobotocore` has a fix
(aio-libs/aiobotocore#1409) in the works, so I'm hopeful this issue will
be resolved by the next version.

This PR pins the `bedrock` install to avoid the incompatible version of
`aiobotocore`.
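
To check whether an environment has picked up the affected release, a guard along these lines could be used. It is a sketch, not part of the PR: `KNOWN_BAD_VERSIONS` and `aiobotocore_is_affected` are hypothetical names, and only version 2.24.1 comes from the description above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Only 2.24.1 is named in the PR description; extend the set if needed.
KNOWN_BAD_VERSIONS = {"2.24.1"}


def aiobotocore_is_affected(installed: Optional[str] = None) -> bool:
    """Return True if the given (or installed) aiobotocore version is known bad."""
    if installed is None:
        try:
            # Look up the version of the installed distribution, if any.
            installed = version("aiobotocore")
        except PackageNotFoundError:
            # aiobotocore is not installed, so the incompatibility cannot occur.
            return False
    return installed in KNOWN_BAD_VERSIONS
```

Calling `aiobotocore_is_affected()` with no argument inspects the current environment; passing a version string makes the check easy to test in isolation.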

1.2.18

feat: add configurable Bedrock inference profile support (#601)

## Summary

Add support for AWS Bedrock inference profiles, configurable via the
`BEDROCK_INFERENCE_PROFILE_ID` environment variable.

- Add `inference_profile_id` field to `BedrockEmbeddingConfig`
- Update `invoke_model` calls to use inference profile when configured
- Add comprehensive unit tests for inference profile functionality
- Maintain full backward compatibility when not configured

## Changes

### Core Implementation
- Added `inference_profile_id` field to `BedrockEmbeddingConfig` with
env var support
- Updated both sync and async `invoke_model` calls to conditionally
include `inferenceProfileId` parameter
- Enhanced TYPE_CHECKING client signatures for proper typing

### Testing
- Added unit tests for default behavior (None)
- Added unit tests for environment variable loading
- Added unit tests for manual configuration override
- All existing tests continue to pass

## Usage

Users can set an inference profile in two ways, or leave it disabled (the default):

```bash
# Environment variable (recommended)
export BEDROCK_INFERENCE_PROFILE_ID="arn:aws:bedrock:us-west-2:123456789012:inference-profile/my-profile"
```

```python
# Manual configuration
config = BedrockEmbeddingConfig(
    inference_profile_id="arn:aws:bedrock:us-west-2:123456789012:inference-profile/my-profile"
)
```

```python
# Default behavior (disabled)
config = BedrockEmbeddingConfig()  # inference_profile_id is None
```
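
The conditional handling of `inferenceProfileId` described under Core Implementation could be sketched as follows. This is an illustration based only on the PR description: `EmbeddingConfig` and `build_invoke_kwargs` are hypothetical names, the model ID is a placeholder, and the real `BedrockEmbeddingConfig` may load the environment variable differently.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


def _profile_from_env() -> Optional[str]:
    # Treat an unset or empty variable as "not configured".
    return os.environ.get("BEDROCK_INFERENCE_PROFILE_ID") or None


@dataclass
class EmbeddingConfig:
    """Hypothetical stand-in for BedrockEmbeddingConfig."""

    model_id: str = "example-embedding-model"  # placeholder, not a real model ID
    # Read the environment at instantiation time unless a value is passed in.
    inference_profile_id: Optional[str] = field(default_factory=_profile_from_env)


def build_invoke_kwargs(config: EmbeddingConfig, body: str) -> Dict[str, Any]:
    """Build keyword arguments for an invoke_model-style call.

    The `inferenceProfileId` key is added only when a profile is configured,
    so callers without one send exactly the same request as before.
    """
    kwargs: Dict[str, Any] = {"modelId": config.model_id, "body": body}
    if config.inference_profile_id is not None:
        kwargs["inferenceProfileId"] = config.inference_profile_id
    return kwargs
```

Building the keyword arguments in one helper would let the sync and async call sites share the same logic, and omitting the key entirely when the profile is `None` is what keeps the default path backward compatible.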

## Test Plan

- [x] All existing Bedrock unit tests pass
- [x] New inference profile unit tests pass
- [x] Environment variable loading works correctly
- [x] Manual configuration override works
- [x] Backward compatibility maintained

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <[email protected]>