Tags: Unstructured-IO/unstructured

0.16.6

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
chore: remove dev and release as 0.16.6 (#3793)

0.16.5

chore: remove dev and release as 0.16.5 (#3775)

0.16.4

feat: support pdf link extraction in hi_res strategy (#3753)

This PR adds link extraction to the PDF `hi_res` strategy. With
`strategy="hi_res"`, `partition_pdf()` now extracts hyperlinks from PDF
documents and records them in element metadata.

### Summary
- Added support for link extraction in the `hi_res` flow.
- Improved the word extraction used for link extraction in both the
`fast` and `hi_res` flows, resulting in more accurate `start_index`
and `text` values in the `links` metadata.
- Updated the ingest fixture update workflow so that it no longer skips
the Astra DB source test.

### Testing
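```python
from unstructured.partition.pdf import partition_pdf

# Partition with the hi_res strategy; extracted links land in element metadata.
elements = partition_pdf(
    filename="example-docs/pdf/embedded-link.pdf",
    strategy="hi_res",
)
assert len(elements[0].metadata.links) == 3
```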
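
The "more accurate `start_index` and `text`" claim in the summary can be illustrated with a small consistency check. This is only a sketch under stated assumptions: it works on plain dictionaries rather than real `partition_pdf()` output, it assumes each `links` entry carries `text`, `url`, and `start_index` keys, and it assumes `start_index` is the character offset of the link text within the element's text. The `Link` type and the `misaligned_links` helper are hypothetical names introduced here, not part of the library.

```python
from typing import Optional, TypedDict


class Link(TypedDict):
    # Assumed shape of one `metadata.links` entry (not taken from the PR).
    text: Optional[str]
    url: str
    start_index: int


def misaligned_links(element_text: str, links: list[Link]) -> list[Link]:
    """Return the links whose `text` is not found at `start_index`."""
    bad = []
    for link in links:
        text = link["text"] or ""
        start = link["start_index"]
        # A link is aligned when slicing the element text at start_index
        # reproduces the link text exactly.
        if start < 0 or element_text[start:start + len(text)] != text:
            bad.append(link)
    return bad


element_text = "See the docs and the repo for details."
links: list[Link] = [
    {"text": "docs", "url": "https://example.com/docs", "start_index": 8},
    # Deliberately off by one to show what the check catches.
    {"text": "repo", "url": "https://example.com/repo", "start_index": 20},
]
print(misaligned_links(element_text, links))
```

Running a check like this over the elements returned for a fixture document is one way to confirm that the offsets and link text agree.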

---------

Co-authored-by: ryannikolaidis
Co-authored-by: christinestraub
Co-authored-by: cragwolfe

0.16.3

[Merge] release to 0.16.3 (#3755)

- bump version to 0.16.3 based on Pluto's fix on layout parsing
- update unstructured-inference version to 0.8.1 in

0.16.2

set version 0.16.2 (#3748)

0.16.1

Set version to 0.16.1 (#3745)

0.16.0

feat/remove ingest code, use new dep for tests (#3595)

### Description
Alternative to #3572 that keeps all ingest tests and runs them against
the latest version of `unstructured-ingest`.

---------

Co-authored-by: ryannikolaidis
Co-authored-by: rbiseck3
Co-authored-by: Christine Straub
Co-authored-by: christinestraub

0.15.14

build(release): release commit for 0.15.14 (#3709)

### Summary
- cut release for version `0.15.14`
- ignore the `vectara` ingest test because of an unexplained error in:
https://github.com/Unstructured-IO/unstructured/actions/runs/11256744351/job/31317150581?pr=3709

0.15.13

fix: correctly install mesa-gl for arm (#3647)

### Summary

Fixes the `arm64` image builds, which will be available again starting
in version `0.15.13`. A fix was implemented upstream in
Unstructured-IO/base-images#47 and a workaround
that installed `x86` packages in the `unstructured` repo was removed.

### Testing

See [this
job](https://github.com/Unstructured-IO/unstructured/actions/runs/10948943594/job/30401108059?pr=3647)
for a successful `arm64` build on the feature branch.

0.15.12

fix: temporarily disable arm64 build (#3624)

### Summary

Per [this
job](https://github.com/Unstructured-IO/unstructured/actions/runs/10842120429/job/30087252047),
`arm64` builds are currently failing, likely because the workaround for
the broken `mesa-gl` package from the `wolfi` repository only works for
`amd64`. This change temporarily disables the `arm64` build so that the
latest `amd64` image with security patches can be pushed out. It will be
reverted once the fix for the `arm64` image is in place.

- Unstructured-IO/base-images#44