
@wchan87
Contributor

@wchan87 wchan87 commented Dec 10, 2025

What is the purpose of the change

Currently, CsvSchemaBuilder.set_null_value doesn't return self, which breaks the builder design pattern. The method is therefore unusable in a call chain without a workaround: retrieving the private _j_schema_builder object and calling setNullValue on it directly. As-is, chaining another call after set_null_value fails with an error like:

AttributeError: 'NoneType' object has no attribute 'build'

Brief change log

  • Added return self to the end of the CsvSchemaBuilder.set_null_value method
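To illustrate the failure mode and the fix outside of PyFlink, here is a minimal sketch. `SketchCsvSchemaBuilder` and its `set_null_value_broken` / `set_null_value_fixed` methods are hypothetical stand-ins written for this example; they are not the PyFlink classes and do not wrap a Java builder.

```python
class SketchCsvSchemaBuilder:
    """Hypothetical stand-in for a fluent builder; NOT the PyFlink class."""

    def __init__(self):
        self._columns = []
        self._null_value = None

    def add_string_column(self, name):
        self._columns.append(name)
        return self  # returning self keeps the chain going

    def set_null_value_broken(self, null_value):
        # No return statement, so the call evaluates to None
        # and the next chained call fails.
        self._null_value = null_value

    def set_null_value_fixed(self, null_value):
        self._null_value = null_value
        return self  # the one-line fix

    def build(self):
        return {"columns": list(self._columns), "null_value": self._null_value}


def chain_broken():
    """Return the error message raised by chaining after the broken setter."""
    try:
        SketchCsvSchemaBuilder() \
            .add_string_column("string") \
            .set_null_value_broken("") \
            .build()
    except AttributeError as exc:
        return str(exc)
    return None


def chain_fixed():
    """Chain through the fixed setter and build the result."""
    return SketchCsvSchemaBuilder() \
        .add_string_column("string") \
        .set_null_value_fixed("") \
        .build()
```

With the broken setter, the chain stops at `.build()` because the previous call evaluated to `None`; with the fixed setter, the same chain completes.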

Verifying this change

This change added tests and can be verified as follows:

  • Added a unit test, test_csv_default_null_value, and helper methods that verify an empty string ('') is treated as null (i.e., None in Python)
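The behavior being tested can be sketched in plain Python with the standard library. This is only an illustration of the "null literal maps to None" semantics; `read_rows` is a hypothetical helper, not part of PyFlink, and it does not reflect how the actual CSV format parses or types its fields (values here stay strings).

```python
import csv
import io


def read_rows(text, null_value):
    """Parse CSV text, replacing any field equal to null_value with None.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(text))
    return [
        [None if field == null_value else field for field in row]
        for row in reader
    ]


# Empty string configured as the null literal: the blank first field
# of the second row becomes None.
rows = read_rows("a,1\n,2\n", null_value="")
```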

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no; CsvSchemaBuilder.set_null_value doesn't work in a builder chain as-is, and this PR fixes that
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot
Collaborator

flinkbot commented Dec 10, 2025

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

Contributor

@dianfu dianfu left a comment


@wchan87 Good catch! LGTM.

@wchan87
Contributor Author

wchan87 commented Dec 10, 2025

The prior commit failed the flake8 check, so I renamed the test method:

Dec 10 03:42:54 ================flake8 checks=================
Dec 10 03:42:55 ./pyflink/datastream/formats/tests/test_csv.py:84:101: E501 line too long (101 > 100 characters)
Dec 10 03:42:55 ==========flake8 checks... [FAILED]===========

Seeing if I can trigger a re-run of CI
@flinkbot run azure

schema = CsvSchema.builder() \
    .add_string_column('string') \
    .add_number_column('number') \
    .set_null_value('') \
Contributor


Looks good. Some testing suggestions:
Does calling .set_null_value() with no argument set a default? If so, we should test that.
I also suggest a test showing that a non-default literal for null works.

Contributor


A null_value must be specified if .set_null_value is called. There is no default value.
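As a rough sketch of what that means for callers, assuming the method takes a single required positional parameter (the signature below is an assumption for illustration, and `SketchBuilder` is a hypothetical class, not PyFlink's): calling it with no argument raises a TypeError rather than falling back to a default, while any explicit literal is accepted.

```python
class SketchBuilder:
    """Hypothetical builder with a required null_value parameter."""

    def __init__(self):
        self._null_value = None

    def set_null_value(self, null_value):  # no default value on purpose
        self._null_value = null_value
        return self

    def build(self):
        return {"null_value": self._null_value}


def call_without_argument():
    """Return the exception type name raised when the argument is omitted."""
    try:
        SketchBuilder().set_null_value()
    except TypeError as exc:
        return type(exc).__name__
    return None


# A non-default literal is passed through unchanged.
schema = SketchBuilder().set_null_value("NULL").build()
```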

@github-actions github-actions bot added the community-reviewed PR has been reviewed by the community. label Dec 13, 2025
@dianfu dianfu closed this in a56134d Dec 15, 2025
dianfu pushed a commit that referenced this pull request Dec 15, 2025
dianfu pushed a commit that referenced this pull request Dec 15, 2025
dianfu pushed a commit that referenced this pull request Dec 15, 2025
