Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): Improve model sub-nodes error handling #11418

Conversation

burivuhster
Copy link
Contributor

@burivuhster burivuhster commented Oct 25, 2024

Summary

Problem

The current error handling in LLM sub-nodes has two main issues:

  1. Model sub-nodes sometimes appear as "running" even after failing
  2. n8n incorrectly indicate that errors originated in the root node, when they actually occurred in Model sub-nodes

Context

The error handling relies on custom exception classes (e.g., NodeOperationError). However, there's no straightforward mechanism to wrap model exceptions before they propagate to the root node.

Solution Approaches Explored

  1. Using N8nLlmTracing (callback handler passed to the LangChain model's contructor). Using handleLLMError callback handler allows to add proper error info to the sub-node and fix the infinite running issue. But it can not be used for wrapping the exception for the root node.
  2. onFailedAttempt callback can be provided during each model's LangChain class instantiation. It allows to intercept the retry logic built into LangChain, and re-throw the wrapped exceptions.

Implementation

To use the second approach, the new helper function makeN8nLlmFailedAttemptHandler was implemented. It catches LLM errors, wrap them with our custom error classes, provides some default error handling logic (interrupt running if it does not make sense to retry for the specific error), and also allows to customize error handling for the specific cases (for example to use a custom error message for the OpenAI's insufficient quota error).

Known issues

Couple of LangChain model classes do not use the AsyncCaller for their calls. It means, that the common retry logic built into LangChain does not work in such cases. The onFailedAttempt callback is not being called as well.
Thus, the described error wrapping logic does not work yet for the following models:

  • LMChatOllama
  • LmChatAwsBedrock
  • LmChatGoogleVertex

Example 1. Insufficient quota error (OpenAI)
Before:
image

After:
image

Example 2. Empty prompt (Anthropic)
Before:
image

After:
image

Example 3. Nested sub-node error
Before:
image

After:
image

Example 4. Nested sub-node error
Before:
image

After:
image

Related Linear tickets, Github issues, and Community forum posts

Review / Merge checklist

  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

@n8n-assistant n8n-assistant bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team ui Enhancement in /editor-ui or /design-system labels Oct 25, 2024
@jeanpaul
Copy link
Contributor

jeanpaul commented Nov 1, 2024

One thing I noticed, that's a difference now is that I don't really get a sense for what's wrong with my credential, whereas on master I do:

image

Now it just says something is wrong with my credentials:

image

Why did that change? I think getting the things to check directly in the toast is a better experience.

@jeanpaul
Copy link
Contributor

jeanpaul commented Nov 1, 2024

One thing I noticed, that's a difference now is that I don't really get a sense for what's wrong with my credential, whereas on master I do:

image Now it just says something is wrong with my credentials: image Why did that change? I think getting the things to check directly in the toast is a better experience.

I don't know why, but now I cannot reproduce this anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 326 is another occurrence of context.functionality -- can you check if that needs to change too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And same for packages/editor-ui/src/utils/expressions.ts line 63.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine as this is for another case, when nodes use "pairedItem" feature (e.g. referencing previous node's item value when the pairedItem connection was interrupted)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but you don't set context.functionality anymore now? does that functionality break?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's being set in n8nLlmFailedAttemptHandler.ts when we wrap the error into NodeApiError. For the paired item exceptions – nothing changes, I guess it's being set in WorkflowDataProxy.ts and should not be affected by this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see your point now. I found that setting context.functionality to configuration-node on NodeApiError does nothing, as it checks for the functionality on the exception object itself. Now I see that the checks are different for the configuration-node functionality and pairedItem. Good catch, thank you!

…in-the-sub-node-even-if

# Conflicts:
#	packages/@n8n/nodes-langchain/nodes/llms/N8nLlmTracing.ts
@burivuhster burivuhster marked this pull request as ready for review November 7, 2024 08:26
@burivuhster burivuhster requested a review from jeanpaul November 7, 2024 08:28
Copy link
Contributor

@jeanpaul jeanpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! thanks for addressing the issues with context.functionality, and for adding the extra tests! 👍

Copy link

cypress bot commented Nov 8, 2024

n8n    Run #7788

Run Properties:  status check passed Passed #7788  •  git commit 188f76131f: 🌳 🖥️ browsers:node18.12.0-chrome107 🤖 burivuhster 🗃️ e2e/*
Project n8n
Branch Review ai-298-errors-happen-in-the-root-node-not-in-the-sub-node-even-if
Run status status check passed Passed #7788
Run duration 04m 22s
Commit git commit 188f76131f: 🌳 🖥️ browsers:node18.12.0-chrome107 🤖 burivuhster 🗃️ e2e/*
Committer Eugene Molodkin
View all properties for this run ↗︎

Test results
Tests that failed  Failures 0
Tests that were flaky  Flaky 1
Tests that did not run due to a developer annotating a test with .skip  Pending 0
Tests that did not run due to a failure in a mocha hook  Skipped 0
Tests that passed  Passing 461
View all changes introduced in this branch ↗︎

Copy link
Contributor

github-actions bot commented Nov 8, 2024

✅ All Cypress E2E specs passed

@burivuhster burivuhster merged commit 57467d0 into master Nov 8, 2024
35 checks passed
@burivuhster burivuhster deleted the ai-298-errors-happen-in-the-root-node-not-in-the-sub-node-even-if branch November 8, 2024 09:17
@github-actions github-actions bot mentioned this pull request Nov 13, 2024
@janober
Copy link
Member

janober commented Nov 13, 2024

Got released with [email protected]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team Released ui Enhancement in /editor-ui or /design-system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants