Skip to content

Conversation

@rbiseck3
Copy link
Contributor

Description

Now that the ingest code has been emitting deprecation warnings for several versions, this PR removes all traces of ingest from the open source unstructured repo.

Copy link
Contributor

@MthwRobinson MthwRobinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as long as tests pass!

Copy link
Contributor

@cragwolfe cragwolfe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, wait a second. these tests have also been valuable for unstructured, not just ingest. can we keep CI intact by pip installing unstructured-ingest and keeping the "fixtures" (expected outputs)?

Copy link
Contributor

@christinestraub christinestraub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe ingest codes are used by CI ingest test jobs in both the open-source and core-product repositories.

@rbiseck3
Copy link
Contributor Author

rbiseck3 commented Sep 4, 2024

@cragwolfe if we want to keep some tests in this repo, how do we maintain parity as new connectors are added and new tests for these connectors are added to the new repo? Will we need to update this repo to have those tests as well? This would only persist the tests that existed here before we moved the ingest work to the new repo.

@cragwolfe
Copy link
Contributor

@cragwolfe if we want to keep some tests in this repo, how do we maintain parity as new connectors are added and new tests for these connectors are added to the new repo? Will we need to update this repo to have those tests as well? This would only persist the tests that existed here before we moved the ingest work to the new repo.

yes, my main concern is not about adding connectors in the future, it is preserving the tests we have (which are basically two tests in one: output correctness and ingest smoke test). in practice, this means that new ingest tests which relied on new connectors wouldn't immediately be added to this repo, at least not until a pypi package was available with the new connector (or merged to main if we just want to clone the repo in tests). or, if we feel like the separate unstructured-ingest repo tests are sufficient end-to-end for the new connectors, we can skip adding them to this repo.

@rbiseck3
Copy link
Contributor Author

rbiseck3 commented Sep 5, 2024

I advocate for using the ingest tests that have been moved over to the new ingest repo as a valid smoke test since the version of unstructured is pinned there for testing. This alleviated the burden of matching new patterns and connectors in both places as they get added to the new repo. If for any reason, the ingest tests break due to a bump in the unstructured repo, we can work to remedy that. This will also make the CI process in this repo more streamline for developers given the overhead running the ingest tests add.

@rbiseck3 rbiseck3 requested a review from cragwolfe September 9, 2024 13:28
badGarnet pushed a commit that referenced this pull request Oct 15, 2024
### Description
Alternative to #3572
but maintaining all ingest tests, running them by pulling in the latest
version of unstructured-ingest.

---------

Co-authored-by: ryannikolaidis <[email protected]>
Co-authored-by: rbiseck3 <[email protected]>
Co-authored-by: Christine Straub <[email protected]>
Co-authored-by: christinestraub <[email protected]>
@scanny scanny closed this Dec 17, 2024
@scanny scanny deleted the roman/drop-ingest branch December 17, 2024 19:19
temp-adelyn pushed a commit to temp-adelyn/unstructured that referenced this pull request Mar 3, 2025
### Description
Alternative to Unstructured-IO#3572
but maintaining all ingest tests, running them by pulling in the latest
version of unstructured-ingest.

---------

Co-authored-by: ryannikolaidis <[email protected]>
Co-authored-by: rbiseck3 <[email protected]>
Co-authored-by: Christine Straub <[email protected]>
Co-authored-by: christinestraub <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants