[SPARK-55579][PYTHON] Rename PySpark error classes to be eval-type-agnostic #54996

Open
Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-55579/rename-error-classes

Yicong-Huang commented Mar 25, 2026

What changes were proposed in this pull request?

Rename six PySpark error conditions to be generic and not tied to specific UDF eval types:

| Old Name | New Name |
| --- | --- |
| PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS | OUTPUT_EXCEEDS_INPUT_ROWS |
| RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF | RESULT_ROWS_MISMATCH |
| STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF | INPUT_NOT_FULLY_CONSUMED |
| RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF | RESULT_COLUMN_SCHEMA_MISMATCH |
| RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF | RESULT_COLUMN_NAMES_MISMATCH |
| RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF | RESULT_COLUMN_NAMES_MISMATCH (merged) |

Also updated error messages to not reference specific eval types or data structures (e.g., "pandas.DataFrame", "pyarrow.Table" -> "data").

Why are the changes needed?

These error conditions were originally created for Pandas UDFs, but are now shared by Arrow UDFs as well. The names and messages should be generic so they can be reused across different eval types without confusion.

Part of SPARK-55388.

Does this PR introduce any user-facing change?

Yes. Error condition names and messages are updated. Users who catch specific error conditions by name will need to update their references.
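For code that matches on the old names, one possible migration aid is a small lookup built from the rename table in this PR. The sketch below is illustrative only: `RENAMED_ERROR_CONDITIONS`, `extract_condition`, and `normalize_condition` are hypothetical helpers, not part of PySpark, and the bracketed `[NAME]` message format is assumed from the test expectations in this PR's diff.

```python
import re

# Old-to-new error condition names, copied from the rename table in this PR.
RENAMED_ERROR_CONDITIONS = {
    "PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS": "OUTPUT_EXCEEDS_INPUT_ROWS",
    "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": "RESULT_ROWS_MISMATCH",
    "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": "INPUT_NOT_FULLY_CONSUMED",
    "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_SCHEMA_MISMATCH",
    "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
    "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": "RESULT_COLUMN_NAMES_MISMATCH",
}

# Assumed message shape: "PySparkRuntimeError: [CONDITION_NAME] message text".
_CONDITION_RE = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def extract_condition(message):
    """Return the first bracketed condition name in a message, or None."""
    match = _CONDITION_RE.search(message)
    return match.group(1) if match else None


def normalize_condition(name):
    """Map a pre-rename condition name to its new name; pass other names through."""
    return RENAMED_ERROR_CONDITIONS.get(name, name)


def is_column_names_mismatch(message):
    """True if the message carries the column-names mismatch condition, old or new."""
    name = extract_condition(message)
    return name is not None and normalize_condition(name) == "RESULT_COLUMN_NAMES_MISMATCH"
```

A lookup like this lets the same handler run against Spark versions before and after the rename; once only the new names are in play, comparing against the new name directly is simpler.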

How was this patch tested?

Existing tests updated to match new error condition names and messages.
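As a rough illustration of the kind of test update involved, the sketch below matches an error message against a regex containing the new condition name. It does not use PySpark: `StandInPythonException` and `raise_column_names_mismatch` are hypothetical stand-ins, and the message text after the bracketed name is made up for the example.

```python
import re
import unittest


class StandInPythonException(Exception):
    """Stand-in for the exception a failing UDF surfaces (not a PySpark class)."""


def raise_column_names_mismatch():
    # Simulates a failure carrying the new condition name.
    # The text after the bracketed name is illustrative only.
    raise StandInPythonException(
        "PySparkRuntimeError: [RESULT_COLUMN_NAMES_MISMATCH] illustrative message text"
    )


class RenamedConditionTest(unittest.TestCase):
    def test_new_condition_name_matches(self):
        # Square brackets are regex metacharacters, so they are escaped.
        pattern = r"PySparkRuntimeError: \[RESULT_COLUMN_NAMES_MISMATCH\] "
        with self.assertRaisesRegex(StandInPythonException, pattern):
            raise_column_names_mismatch()

    def test_old_condition_name_no_longer_matches(self):
        old_pattern = r"\[RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF\]"
        with self.assertRaises(StandInPythonException) as ctx:
            raise_column_names_mismatch()
        self.assertIsNone(re.search(old_pattern, str(ctx.exception)))
```

Anchoring the pattern on the bracketed name rather than the full message keeps such a test stable if the message wording changes again.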

Was this patch authored or co-authored using generative AI tooling?

No

Yicong-Huang force-pushed the SPARK-55579/rename-error-classes branch from 46a6596 to a4919d0 on March 25, 2026 01:28
PythonException,
"PySparkRuntimeError: \\[RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF\\] "
"Column names of the returned pandas.DataFrame do not match "
"PySparkRuntimeError: \\[RESULT_COLUMNS_MISMATCH_NAMES\\] "
Contributor

What about RESULT_COLUMN_NAMES_MISMATCH?

Contributor Author

Thanks, changed to this one.

]
},
"RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": {
"RESULT_COLUMNS_MISMATCH_SCHEMA": {
Contributor

What about RESULT_COLUMN_SCHEMA_MISMATCH?

Contributor Author

Thanks, changed to this one.

zhengruifeng (Contributor) left a comment

minor comments, otherwise LGTM

Yicong-Huang force-pushed the SPARK-55579/rename-error-classes branch from a4919d0 to 1fb783b on March 25, 2026 06:50