
Support multi-part uploads in AWS S3 sink connector #1053

@brandon-powers

Description


Issue Guidelines

Please review these questions before submitting any issue.

What version of the Stream Reactor are you reporting this issue for?

6.1.0, latest stable release.

Are you running the correct version of Kafka/Confluent for the Stream Reactor release?

Yes.

Do you have a supported version of the data source/sink, e.g. Cassandra 3.0.9?

Yes.

Have you read the docs?

Yes.

What is the expected behaviour?

The AWS S3 sink connector adds configuration for the S3 part size used in multi-part uploads, and implements multi-part uploads in the storage interface. The underlying S3 client already supports them; the current implementation simply never invokes that flow.
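
For illustration only, here is a minimal sketch of what that flow could look like against the AWS SDK v2 multi-part API (createMultipartUpload / uploadPart / completeMultipartUpload). Which client library the connector actually wraps at 6.1.0 is not confirmed here, and the helper below is hypothetical rather than existing connector code; note that S3 requires every part except the last to be at least 5 MiB, and error handling / abortMultipartUpload on failure is omitted.

```scala
import software.amazon.awssdk.core.sync.RequestBody
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model._

import java.io.InputStream
import scala.collection.mutable.ListBuffer

// Hypothetical helper: streams `input` to s3://bucket/key in fixed-size parts,
// so only one part per file needs to be buffered at a time instead of the whole file.
def uploadInParts(s3: S3Client, bucket: String, key: String, input: InputStream, partSize: Int): Unit = {
  // 1. Start the multi-part upload and remember its upload id.
  val uploadId = s3
    .createMultipartUpload(CreateMultipartUploadRequest.builder().bucket(bucket).key(key).build())
    .uploadId()

  val parts      = ListBuffer.empty[CompletedPart]
  val buffer     = new Array[Byte](partSize)
  var partNumber = 1
  var read       = input.read(buffer)

  // 2. Upload each buffered chunk as a numbered part, collecting the returned ETags.
  while (read > 0) {
    val etag = s3.uploadPart(
      UploadPartRequest.builder()
        .bucket(bucket).key(key).uploadId(uploadId).partNumber(partNumber).build(),
      RequestBody.fromBytes(buffer.take(read))
    ).eTag()

    parts += CompletedPart.builder().partNumber(partNumber).eTag(etag).build()
    partNumber += 1
    read = input.read(buffer)
  }

  // 3. Complete the upload by listing every part number with its ETag.
  s3.completeMultipartUpload(
    CompleteMultipartUploadRequest.builder()
      .bucket(bucket).key(key).uploadId(uploadId)
      .multipartUpload(CompletedMultipartUpload.builder().parts(parts: _*).build())
      .build()
  )
}
```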

What was observed?

Files produced for S3 currently have to be buffered in their entirety in the tmpfs-backed /tmp mount before upload. For example, with a flush size of 100 MB and 100 topic-partitions producing files on a worker, the connector(s) would need roughly 10 GB of RAM per flush (excluding the memory used for Kafka consumer fetch requests).
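
To make the sizing concrete, a back-of-the-envelope comparison using the numbers above; the 8 MB part size is an assumed, illustrative value, not an existing configuration option:

```scala
// Worst-case buffer space per worker, assuming one in-flight file per topic-partition.
val topicPartitions = 100
val flushSizeBytes  = 100L * 1000 * 1000                  // 100 MB flush size from the example above

// Today: each file is buffered in full before upload.
val bufferedToday = topicPartitions * flushSizeBytes       // 10,000,000,000 bytes ≈ 10 GB

// With multi-part uploads and an illustrative 8 MB part size,
// only one part per topic-partition needs to be buffered at once.
val partSizeBytes     = 8L * 1000 * 1000
val bufferedMultipart = topicPartitions * partSizeBytes    // 800,000,000 bytes ≈ 800 MB
```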

What is your Connect cluster configuration (connect-avro-distributed.properties)?

N/A

What is your connector properties configuration (my-connector.properties)?

N/A

Please provide full log files (redact any sensitive information)

N/A
