[Bug] [Workflow] Workflow status inconsistency and infinite task creation loop in version 3.2.0 #17829

@crazychengmm

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

Description
Version: DolphinScheduler 3.2.0

Scenario:
I started a workflow via the API. The tasks within the workflow completed successfully, but the workflow instance failed to recognize the completion. This led to a series of abnormal behaviors.

Abnormal Behaviors:

  • Infinite Task Creation: The workflow kept creating and submitting new task instances even though the previous ones had succeeded.
  • Retry Count Stagnation: The task's retry_times stayed at 1 even though the workflow's max retry was configured as 2. The workflow did not appear to be retrying a failed task; it appeared to be triggering new task instances from scratch.
  • Data Inconsistency: In the Web UI, the workflow instance showed an end_time in the past, but its state remained RUNNING.
  • Behavior after Pausing: When I manually clicked "Pause" on the workflow, the system kept generating new task instances in the PAUSED state every few seconds.
  • Logs: I checked both the Master and Worker logs and found no explicit exceptions or stack traces for this period.

Recovery:
The issue was resolved immediately after restarting the Master cluster. The "ghost" tasks stopped being created, and the workflow state synchronized.

Possible Root Cause Suspected:
It seems that WorkflowExecuteRunnable, or the state machine on the Master node, entered an inconsistent state or loop: it failed to update the workflow status while still believing it needed to schedule more tasks. This may have been caused by a lost event or a race condition in the internal event queue.
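To make the suspected mechanism concrete, here is a toy model in Python. It is not DolphinScheduler code: `ToyWorkflow`, `drop_finish_event`, and every other name are hypothetical, and the real Master may work quite differently. It only shows how a scheduling loop that decides what to submit from its own in-memory "finished" set could keep creating task instances when the task-finished event never arrives, even though each instance succeeds.

```python
from collections import deque


class ToyWorkflow:
    """Toy scheduler loop; an illustration only, not DolphinScheduler's implementation."""

    def __init__(self, drop_finish_event):
        self.drop_finish_event = drop_finish_event  # simulate a lost event
        self.events = deque()       # internal event queue
        self.finished = set()       # tasks the workflow believes are done
        self.created_instances = 0  # task instances submitted so far
        self.state = "RUNNING"

    def submit(self, task):
        self.created_instances += 1
        # The task itself always succeeds; only the notification may be lost.
        if not self.drop_finish_event:
            self.events.append(("TASK_FINISHED", task))

    def tick(self, tasks):
        # Drain pending events, then schedule every task not known to be finished.
        while self.events:
            _, task = self.events.popleft()
            self.finished.add(task)
        pending = [t for t in tasks if t not in self.finished]
        if not pending:
            self.state = "SUCCESS"
            return
        for task in pending:
            self.submit(task)


def run(drop_finish_event, ticks=5):
    """Run the toy loop for a fixed number of ticks and return the workflow."""
    wf = ToyWorkflow(drop_finish_event)
    for _ in range(ticks):
        if wf.state != "RUNNING":
            break
        wf.tick(["task_a"])
    return wf
```

With `run(False)` the single task is submitted once and the workflow reaches SUCCESS on the next tick. With `run(True)` a new instance is created on every tick and the state never leaves RUNNING, which resembles the symptoms above.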

Steps to Reproduce
1. Start a workflow instance via API in version 3.2.0.
2. Observe if the task finishes but the workflow fails to transition to SUCCESS.
3. Check if new task instances are generated repeatedly.
4. Try to pause the workflow and observe if paused tasks are still being created.

Expected Behavior
The workflow should transition to SUCCESS once all tasks are finished, and no further task instances should be created.

Actual Behavior
The workflow remains RUNNING (despite having an end_time), keeps creating new tasks infinitely, and even creates paused tasks after the workflow is paused.
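For anyone trying to confirm the same symptoms, here is a minimal Python sketch that checks exported instance records for the two inconsistencies described above. The record layout (dicts with `id`, `state`, `end_time`, `workflow_instance_id`, `task_name`) is an assumption made for illustration, not the actual DolphinScheduler schema, so the field names would need to be mapped to your own export.

```python
from collections import Counter

RUNNING = "RUNNING"  # assumed state label, as displayed in the Web UI


def find_inconsistent_workflows(workflows):
    """Return ids of workflow instances that have an end_time but are still RUNNING."""
    return [
        w["id"]
        for w in workflows
        if w["state"] == RUNNING and w["end_time"] is not None
    ]


def find_repeated_tasks(task_instances, max_retries=2):
    """Return (workflow_instance_id, task_name) keys with more task instances
    than one initial run plus max_retries retries would explain."""
    counts = Counter(
        (t["workflow_instance_id"], t["task_name"]) for t in task_instances
    )
    limit = 1 + max_retries
    return {key: n for key, n in counts.items() if n > limit}
```

The default `max_retries=2` matches the retry limit mentioned in this report. Any workflow id returned by the first function, or any key returned by the second, would be a candidate for the behavior described here.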

Environment
OS: CentOS 7
DolphinScheduler Version: 3.2.0
Storage: PostgreSQL (PG)
Deployment: Cluster (3 masters, 6 workers)

What you expected to happen

The issue described above should be fixed.

How to reproduce

I don't know how it happened.

Anything else

taskId 2330628

Version

3.2.x

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Metadata

    Labels

    question, wontfix
