
Optimize S3 Bandwidth Usage via Two-Phase Progressive Leaf Search for Heavy Queries #6210

@snac21

Description of the Problem
When executing a search query that hits a massive number of splits (e.g., in a 50PB log scale deployment with a broad time range and weak filters), Quickwit exhibits extremely heavy continuous reads from S3. This pattern can saturate the network ingress bandwidth on the search nodes.

Upon analyzing the S3 access pattern (s3_compatible_storage.rs, bundle_storage.rs, and SplitCache), it's clear that Quickwit is already highly optimized (utilizing HTTP Range Requests efficiently). The problem stems from the concurrency logic in the leaf.rs module.

Currently, in single_doc_mapping_leaf_search, all matched uncached splits are scheduled and dispatched concurrently via run_local_search_tasks.
While Quickwit has a strong CanSplitDoBetter pruning mechanism, its effectiveness is limited in this extreme scenario. Because hundreds or thousands of splits are spawned and enter their warmup phase (triggering S3 I/O) almost simultaneously, the pruning state (worst_hit_found) is not populated fast enough to stop unnecessary splits from aggressively downloading their fast fields and term dictionaries.
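The race can be illustrated with a minimal sketch (all names here are hypothetical stand-ins, not Quickwit's actual API): pruning is only possible once a baseline worst hit exists, so splits dispatched before any split completes can never be pruned.

```rust
// Hypothetical sketch: `worst_hit_found` stands in for the shared pruning
// state that CanSplitDoBetter maintains.

#[derive(Clone, Copy)]
struct SplitMeta {
    max_possible_score: f32, // upper bound on what this split could contribute
}

// Returns true if the split can be skipped given the current worst hit.
fn can_prune(split: SplitMeta, worst_hit_found: Option<f32>) -> bool {
    match worst_hit_found {
        // No baseline yet: every split must be warmed up (S3 I/O).
        None => false,
        // Baseline known: splits that cannot beat it are skipped.
        Some(worst) => split.max_possible_score <= worst,
    }
}

fn main() {
    let split = SplitMeta { max_possible_score: 0.4 };
    // All splits dispatched at once: no baseline, nothing prunes.
    assert!(!can_prune(split, None));
    // After some splits complete, the baseline prunes weak splits.
    assert!(can_prune(split, Some(0.7)));
    println!("ok");
}
```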

As a result, an enormous amount of unnecessary S3 Range Request bandwidth is wasted on splits that would have been pruned anyway if they were processed slightly later.

Proposed Solution: Progressive Two-Phase Search
To maximize the power of the existing CanSplitDoBetter mechanism without impacting normal query latency (where split_count is reasonable), we propose modifying single_doc_mapping_leaf_search to implement a "Two-Phase" or "Progressive" execution path when the split count is large.

Implementation outline:

  1. New Configuration: Introduce progressive_search_batch_size: usize in SearcherConfig (defaulting to 20 or similar). If set to 0, it falls back to the current behavior.
  2. Phase 1 (Priority Batch): In leaf.rs, we take the first batch_size splits (which are already sorted by optimize_split_order, meaning they are highly relevant) and execute them. We tokio::join! them and wait for completion.
  3. Data Population: Upon Phase 1 completion, the CanSplitDoBetter structure (which holds the incremental_merge_collector) is filled with concrete worst_hit scores from the most relevant splits.
  4. Phase 2 (Remaining Splits): We then dispatch the remaining splits. Because CanSplitDoBetter is now populated with a strong baseline, the simplify_search_request function can aggressively convert unnecessary splits into count-only queries or completely skip them (if count is not needed), preventing them from making any heavy S3 I/O requests.
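The steps above can be sketched as follows. This is a std-only, synchronous sketch with hypothetical names; the real change would live in single_doc_mapping_leaf_search with tokio tasks and the existing CanSplitDoBetter / incremental_merge_collector machinery.

```rust
// Hedged sketch of the proposed two-phase dispatch (hypothetical names).

#[derive(Clone, Copy)]
struct Split {
    max_possible_score: f32, // upper bound this split could contribute
}

/// Returns the number of splits that performed "heavy I/O" (warmup).
/// Splits are assumed pre-sorted best-first, as optimize_split_order
/// would leave them.
fn two_phase_search(splits: &[Split], batch_size: usize) -> usize {
    if batch_size == 0 || splits.len() <= batch_size {
        // Fallback: current behavior, everything is dispatched at once.
        return splits.len();
    }
    let mut warmed = 0;
    let mut worst_hit: Option<f32> = None;

    // Phase 1: run the most promising batch to completion, populating
    // the pruning baseline from their hits.
    for split in &splits[..batch_size] {
        warmed += 1;
        let score = split.max_possible_score; // stand-in for a real hit score
        worst_hit = Some(worst_hit.map_or(score, |w| w.min(score)));
    }

    // Phase 2: dispatch the rest; the baseline now prunes weak splits
    // (in the real code, simplify_search_request would downgrade them
    // to count-only requests or skip them).
    for split in &splits[batch_size..] {
        if split.max_possible_score > worst_hit.unwrap() {
            warmed += 1;
        }
    }
    warmed
}

fn main() {
    // 100 splits, generated best-first with strictly decreasing bounds.
    let splits: Vec<Split> = (0..100)
        .map(|i| Split { max_possible_score: 1.0 - i as f32 / 100.0 })
        .collect();

    // One-shot dispatch warms every split; two-phase bounds the burst
    // to the first batch, since no later split can beat the baseline.
    assert_eq!(two_phase_search(&splits, 0), 100);
    assert_eq!(two_phase_search(&splits, 20), 20);
    println!("ok");
}
```

The key design point is that Phase 1 runs to completion before Phase 2 starts, so the baseline is guaranteed to be populated when the bulk of the splits is scheduled.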

Impact & Safety:

  • Pagination and count accuracy remain unaffected. simplify_search_request correctly respects CountAll and safely converts splits to count-only requests (bypassing expensive field-data downloads) instead of dropping them.
  • This effectively bounds the peak S3 GET burst to what is actually needed for the Top-K.
  • It falls back seamlessly to the original concurrency behavior when a query matches fewer than progressive_search_batch_size splits.
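Assuming the new knob lands in SearcherConfig, it could be exposed in the node config's searcher section roughly like this (the key name is the proposed, not-yet-existing option):

```yaml
# quickwit.yaml (sketch; `progressive_search_batch_size` is hypothetical)
searcher:
  # Size of the Phase 1 priority batch. 0 disables progressive search
  # and keeps the current fully concurrent dispatch.
  progressive_search_batch_size: 20
```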

Environment details:

  • OS: Linux / Kubernetes
  • Quickwit Version: Latest main
