feat(aws): add s3 support to input, storage, output, cache, etc. #1830
|
@microsoft-github-policy-service agree |
…repend remote uri
* Fix fnllm version * Semver
|
Can you also add an option to pass an endpoint URL to the boto3 client, so that storage on other platforms such as MinIO is possible through the S3 API? |
Done: f1fd55d |
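For readers following the endpoint request above, here is a minimal sketch of how a custom endpoint is typically passed to a boto3 S3 client. It is not the code from commit f1fd55d. The helper names `build_s3_client_kwargs` and `create_s3_client` are hypothetical, and the MinIO address and credentials are placeholder values assumed for illustration. Only `boto3.client("s3", ...)` and its `endpoint_url`, `region_name`, `aws_access_key_id`, and `aws_secret_access_key` keyword arguments are real boto3 API.

```python
from typing import Any, Optional


def build_s3_client_kwargs(
    region_name: Optional[str] = None,
    endpoint_url: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
) -> dict[str, Any]:
    """Collect keyword arguments for boto3.client("s3", ...).

    Hypothetical helper: unset values are omitted so that boto3 falls back
    to its default credential chain and the default AWS endpoint.
    """
    candidates = {
        "region_name": region_name,
        "endpoint_url": endpoint_url,
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def create_s3_client(**settings: Optional[str]) -> Any:
    """Create an S3 client for AWS or any S3-compatible service (hypothetical helper)."""
    # Imported lazily so the kwargs helper can be used without boto3 installed.
    import boto3

    return boto3.client("s3", **build_s3_client_kwargs(**settings))


# Placeholder values for a local MinIO instance (assumed, not from the PR).
minio_kwargs = build_s3_client_kwargs(
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

# Plain AWS S3: no endpoint override, so boto3 resolves the regional endpoint.
aws_kwargs = build_s3_client_kwargs(region_name="us-east-1")
```

Leaving `endpoint_url` unset keeps the default AWS behavior, so the same configuration path can serve both AWS S3 and S3-compatible services.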
|
Please review @natoverse |
30380b4 to 4dcc89d
6272869 to 4040d4b
|
@natoverse @AlonsoGuevara review please? |
|
What is the status of this PR? We run our infra on AWS, so it would be great to have this functionality. |
|
Unless you review this PR soon, I'm going to close without merging. I am now getting conflicts too numerous and too complex to resolve cleanly. @natoverse @AlonsoGuevara |
- Resolved conflicts in graphrag/config/defaults.py by accepting upstream changes and removing S3-specific fields from OutputDefaults and UpdateIndexOutputDefaults
- Resolved conflicts in graphrag/config/enums.py by renaming OutputType to StorageType and removing InputType as per upstream changes
- Resolved conflicts in graphrag/index/input/factory.py by accepting upstream simplification that uses passed storage parameter
- Removed graphrag/config/models/output_config.py as it was deleted upstream and replaced with generic StorageConfig
- Updated graphrag/config/models/graph_rag_config.py to use new enum names and fix S3 validation logic to work with new config structure
- Fixed graphrag/storage/factory.py to use StorageType instead of OutputType
- Added proper type annotations to dataclass fields in defaults.py to fix linting errors
- Maintained S3 support in InputConfig, CacheConfig, and ReportingConfig where it's still available
|
Closing due to inactivity. |
Description
This PR adds S3 integration to GraphRAG, supporting both AWS S3 and S3-compatible services (MinIO, etc.) via endpoint_url.
Related Issues
#1306
Proposed Changes
- graphrag/storage/s3_pipeline_storage.py
- graphrag/callbacks/s3_workflow_callbacks.py
- graphrag/config/prompt_getter.py
- docs/config/s3.md
Checklist
Additional Notes