Skip to content

feat: Extend ErrorTracker with error grouping#1014

Merged
Pijukatel merged 10 commits intomasterfrom
error-tracker
Mar 5, 2025
Merged

feat: Extend ErrorTracker with error grouping#1014
Pijukatel merged 10 commits intomasterfrom
error-tracker

Conversation

@Pijukatel
Copy link
Copy Markdown
Collaborator

@Pijukatel Pijukatel commented Feb 24, 2025

Description

Adds error grouping to ErrorTracker.
ErrorTracker can be configured to use different grouping options.

Based on JS implementation https://github.com/apify/crawlee/blob/master/packages/core/src/crawlers/error_tracker.ts#L286
Differences from JS implementation:

  • showErrorCode option not migrated -> does not make sense in Python.
  • showFullStack option not migrated -> is not used in JS, no point migrating unused stuff.
  • getMostPopularErrors migrated under different name get_most_common_errors

Issues

Partially implements: #151
(ErrorSnapshotter will be in separate PR)

TODO:
 Message matching and replacing with wildcards
 Extra args - full stack and full message
 Tests
Todo:
Add get_most_popular_errors
Add extended configuration parameters
TODO:
Add extended configuration parameters
Do not port option for full trace - not used in JS version.
@github-actions github-actions bot added this to the 109th sprint - Tooling team milestone Feb 24, 2025
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Feb 24, 2025
@Pijukatel Pijukatel marked this pull request as ready for review February 24, 2025 13:20
Copy link
Copy Markdown
Collaborator

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks alright, just some minor nits

# No similar message found. Create new group.
self._errors[error_group_stack_trace][error_group_name].update([error_group_message])

def _get_traceback_text(self, error: Exception) -> str | None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the file and line number where the exception was thrown, isn't it? The name would suggest a complete stacktrace. I'd try and come up with something more fitting.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

Comment on lines +11 to +12
ErrorMessageGroups = Counter[GroupName]
ErrorTypeGroups = dict[GroupName, ErrorMessageGroups]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that inlining these two definitions would actually make things more readable.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

def _create_generic_message(message_1: str | None, message_2: str | None) -> str | None:
"""Create a generic error message from two messages, if they are similar enough.

Different parts of similar messages are replaced by `_`.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda surprised by the choice of _ here - I think it might get mixed up with legit underscores in python error messages.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe *** will be better in Python context

@janbuchar janbuchar self-requested a review March 5, 2025 09:09
@Pijukatel Pijukatel merged commit 561de5c into master Mar 5, 2025
23 checks passed
@Pijukatel Pijukatel deleted the error-tracker branch March 5, 2025 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants