A robust, high-performance and user-friendly alternative to the traditional curl-based Stream Load.
- Parallel Loading: Automatically split data files and load the pieces in parallel
- Support for Multiple Files and Directories: Load multiple files and directories in one shot
- Path Traversal Support: Support path traversal when the source files are in directories
- Resilience and Continuity: Resume loading from previous failures and cancellations
- Automatic Retry Mechanism: Retry automatically on failure
- Comprehensive and Concise Input Parameters
doris-streamloader --source_file={FILE_LIST} --url={FE_OR_BE_SERVER_URL}:{PORT} --header={STREAMLOAD_HEADER} --db={TARGET_DATABASE} --table={TARGET_TABLE}
- FILE_LIST: directory or file list, supports the * wildcard
- FE_OR_BE_SERVER_URL & PORT: Doris FE or BE hostname or IP and HTTP port
- STREAMLOAD_HEADER: supports all headers as curl Stream Load does; multiple headers are separated by '?'
- TARGET_DATABASE & TARGET_TABLE: indicate the target database and table where the data will be loaded
e.g.:
doris-streamloader --source_file="data.csv" --url="http://localhost:8330" --header="column_separator:|?columns:col1,col2" --db="testdb" --table="testtbl"
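Because several headers share a single --header value, it can be convenient to assemble that value from individual key:value pairs. The following is a minimal sketch, not part of doris-streamloader: join_headers is a hypothetical helper name, and the file, URL, database and table are simply the values from the example above. It only prints the resulting command rather than running it.

```shell
#!/bin/sh
# join_headers is a hypothetical helper (an assumption for illustration).
# It joins its arguments with '?', the separator used by --header.
join_headers() {
  # Run in a subshell so the IFS change does not leak to the caller.
  # "$*" expands to all arguments joined by the first character of IFS.
  ( IFS='?'; printf '%s\n' "$*" )
}

HEADERS=$(join_headers "column_separator:|" "columns:col1,col2")

# Print the command that would be run, reusing the values from the example above.
printf 'doris-streamloader --source_file="data.csv" --url="http://localhost:8330" --header="%s" --db="testdb" --table="testtbl"\n' "$HEADERS"
```

Note that this sketch does not escape a '?' that appears inside a header value; how such a value is handled is not covered here.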
For additional details and options, refer to our comprehensive docs below.
To build Streamloader, ensure you have golang installed (version >= 1.19.9). For example, on CentOS:
yum install golang
Then, navigate to the doris-streamloader directory and execute:
cd doris-streamloader && sh build.sh
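Before building, it may help to confirm that the installed Go toolchain meets the 1.19.9 minimum. The sketch below is one possible check, not part of the project's build.sh: version_ge is a hypothetical helper, and it assumes plain release versions made of dot-separated numbers (it does not handle release candidates or development builds).

```shell
#!/bin/sh
# version_ge A B: succeed if dotted version A >= B, comparing numerically
# field by field. Hypothetical helper; assumes numeric fields only.
version_ge() {
  a=$1
  b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    a_head=${a%%.*}
    b_head=${b%%.*}
    # A missing field counts as 0, so 1.19 is treated as 1.19.0.
    [ "${a_head:-0}" -gt "${b_head:-0}" ] && return 0
    [ "${a_head:-0}" -lt "${b_head:-0}" ] && return 1
    # Drop the field just compared from each version string.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0
}

required="1.19.9"
if command -v go >/dev/null 2>&1; then
  # `go version` prints a line such as "go version go1.19.9 linux/amd64".
  installed=$(go version | awk '{print $3}' | sed 's/^go//')
  if version_ge "$installed" "$required"; then
    echo "Go $installed satisfies the minimum of $required"
  else
    echo "Go $installed is older than the required $required" >&2
  fi
else
  echo "go was not found on PATH" >&2
fi
```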