-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat/remove all traces of ingest code #3572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
MthwRobinson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as long as tests pass!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, wait a second. these tests have also been valuable for unstructured, not just ingest. can we keep CI intact by pip installing unstructured-ingest and keeping the "fixtures" (expected outputs)?
christinestraub
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe ingest codes are used by CI ingest test jobs in both the open-source and core-product repositories.
|
@cragwolfe if we want to keep some tests in this repo, how do we maintain parity as new connectors are added and new tests for these connectors are added to the new repo? Will we need to update this repo to have those tests as well? This would only persist the tests that existed here before we moved the ingest work to the new repo. |
yes, my main concern is not about adding connectors in the future, it is preserving the tests we have (which are basically two tests in one: output correctness and ingest smoke test). in practice, this means that new ingest tests which relied on new connectors wouldn't immediately be added to this repo, at least not until a pypi package was available with the new connector (or merged to main if we just want to clone the repo in tests). or, if we feel like the separate unstructured-ingest repo tests are sufficient end-to-end for the new connectors, we can skip adding them to this repo. |
|
I advocate for using the ingest tests that have been moved over to the new ingest repo as a valid smoke test since the version of unstructured is pinned there for testing. This alleviated the burden of matching new patterns and connectors in both places as they get added to the new repo. If for any reason, the ingest tests break due to a bump in the unstructured repo, we can work to remedy that. This will also make the CI process in this repo more streamline for developers given the overhead running the ingest tests add. |
### Description Alternative to #3572 but maintaining all ingest tests, running them by pulling in the latest version of unstructured-ingest. --------- Co-authored-by: ryannikolaidis <[email protected]> Co-authored-by: rbiseck3 <[email protected]> Co-authored-by: Christine Straub <[email protected]> Co-authored-by: christinestraub <[email protected]>
### Description Alternative to Unstructured-IO#3572 but maintaining all ingest tests, running them by pulling in the latest version of unstructured-ingest. --------- Co-authored-by: ryannikolaidis <[email protected]> Co-authored-by: rbiseck3 <[email protected]> Co-authored-by: Christine Straub <[email protected]> Co-authored-by: christinestraub <[email protected]>
Description
Now that the ingest code has been emitting deprecation warnings for several versions, this PR removes all traces of ingest from the open source unstructured repo.