feat: add centralized retry mechanism for LLM providers #13793

Open
tdakanalis wants to merge 1 commit into infiniflow:main from tdakanalis:feat/centralized-llm-retry-mechanism

Conversation

@tdakanalis tdakanalis commented Mar 25, 2026

What problem does this PR solve?

This PR solves the problem of duplicated and inconsistent retry logic across LLM model implementations.

Background:

Previously, each LLM model class (embedding, CV, rerank, TTS, OCR, speech2text) had its own ad-hoc retry implementation, leading to:

  • Duplicated code across 10+ files
  • Inconsistent retry behavior between different model providers
  • Difficult maintenance when changing retry settings
  • Risk of bugs when retry logic wasn't applied uniformly

Solution:

  • Extract retry logic into a centralized rag/llm/retry.py module with:
    • @retry decorator for sync methods (retries on transient errors, re-raises on failure)
    • @retry_or_fallback decorator for methods that return error tuples
    • is_retryable() - determines if an error is transient (rate limits, server errors)
    • classify_error() - categorizes exceptions into LLMErrorCode for detailed logging
    • async_handle_exception() - shared async error handler
  • Apply decorators consistently across all embedding, CV, rerank, TTS, OCR, and speech2text models
  • Add comprehensive unit tests (76 tests)
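The shape of such a centralized module can be sketched as follows. This is a minimal illustration, not the PR's actual code: the names `retry`, `is_retryable`, and the defaults (`max_retries=5`, `base_delay=2.0`) come from the PR description, but the signatures and the string-matching heuristic in `is_retryable` are assumptions.

```python
import functools
import logging
import random
import time


def is_retryable(exc: Exception) -> bool:
    # Assumed heuristic: treat rate limits and 5xx server errors as transient.
    msg = str(exc).lower()
    return any(token in msg for token in ("rate limit", "429", "500", "502", "503"))


def retry(max_retries: int = 5, base_delay: float = 2.0):
    """Retry a sync method on transient errors; re-raise once attempts are exhausted."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    if not is_retryable(exc) or attempt == max_retries - 1:
                        raise  # permanent error, or out of attempts
                    # Jittered backoff, per the formula quoted in the PR:
                    # base_delay * uniform(10, 150)
                    delay = base_delay * random.uniform(10, 150)
                    logging.warning("retry %d/%d in %.1fs: %s",
                                    attempt + 1, max_retries, delay, exc)
                    time.sleep(delay)
        return wrapper
    return decorator
```

Because the decorator takes its settings as arguments, every model class shares one backoff policy and a single place to change it.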

Behavior preserved:

  • Same default retry settings (max_retries=5, base_delay=2.0)
  • Same retryable error signals (rate limits, 5xx server errors)
  • Same backoff jitter formula (base_delay * uniform(10, 150))
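The quoted jitter formula can be checked numerically: with the default `base_delay=2.0`, each wait lands between 20 and 300 seconds. A quick sketch, assuming the formula is applied once per attempt:

```python
import random


def backoff_delay(base_delay: float = 2.0) -> float:
    # Jitter formula preserved by the PR: base_delay * uniform(10, 150)
    return base_delay * random.uniform(10, 150)
```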

Type of change

  • New Feature (non-breaking change which adds functionality)
  • Refactoring

Extract duplicated retry logic from chat_model.py into a new rag/llm/retry.py
module with decorators (@retry, @retry_or_fallback) for consistent error handling
across all LLM providers (embedding, CV, rerank, TTS, OCR, speech2text).

Key changes:
- New rag/llm/retry.py module with:
  - @retry: decorator for sync methods that retries on transient errors
  - @retry_or_fallback: decorator that returns fallback value on failure
  - is_retryable(): check if exception is transient
  - classify_error(): categorize exceptions into LLMErrorCode
  - async_handle_exception(): shared async error handler
- Updated all LLM model classes to use centralized retry decorators
- Added exception handling to caller sites (parser.py, conversation_app.py)
- Fixed LLMErrorCode.ERROR_MAX_ROUNDS value for consistency
- Added comprehensive unit tests in test/unit_test/rag/llm/test_retry.py

Behavior preserved:
- Same default retry settings (max_retries=5, base_delay=2.0)
- Same retryable error signals (rate limits, 5xx server errors)
- Same backoff jitter formula (base_delay * uniform(10, 150))
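For methods that signal failure with a value rather than an exception, the second decorator named in the commit message can be sketched like this. The name `retry_or_fallback` comes from the PR; its signature, the error-tuple fallback, and the retry loop here are hypothetical:

```python
import functools
import logging


def retry_or_fallback(fallback, max_retries: int = 5):
    """Retry the wrapped method; on total failure, return `fallback` instead of raising."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
            # All attempts failed: log and hand back the caller-supplied fallback.
            logging.error("all %d attempts failed: %s", max_retries, last_exc)
            return fallback
        return wrapper
    return decorator
```

This matches the description's split: `@retry` re-raises for callers that handle exceptions, while `@retry_or_fallback` suits methods that already return error tuples to their callers.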
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. ☯️ refactor Pull request that refactor/refine code 🌈 python Pull requests that update Python code 💞 feature Feature request, pull request that fulfill a new feature. 🧪 test Pull requests that update test cases. labels Mar 25, 2026
@yingfeng yingfeng requested a review from yongtenglei March 25, 2026 14:46
@yingfeng yingfeng added the ci Continue Integration label Mar 25, 2026
@yingfeng yingfeng marked this pull request as draft March 25, 2026 14:46
@yingfeng yingfeng marked this pull request as ready for review March 25, 2026 14:46
codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.72%. Comparing base (24fcd6b) to head (99c1cf6).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #13793   +/-   ##
=======================================
  Coverage   96.72%   96.72%           
=======================================
  Files          10       10           
  Lines         702      702           
  Branches      112      112           
=======================================
  Hits          679      679           
  Misses          5        5           
  Partials       18       18           

