-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ingest/protobuf): Improve String Handling for Multilingual Support in Descriptions #10975
fix(ingest/protobuf): Improve String Handling for Multilingual Support in Descriptions #10975
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe recent changes in Changes
Sequence Diagram(s)(No sequence diagrams generated as the changes do not entail complex control flow modifications.) Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/ProtobufUtils.java (2 hunks)
Additional comments not posted (1)
metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/ProtobufUtils.java (1)
24-24
: LGTM! Simplified code and enhanced character handling.The changes simplify the code and improve the handling of multilingual characters by directly returning the concatenated and processed string without converting it to ISO-8859-1 encoding.
However, ensure that the changes do not introduce any regressions or unexpected behavior.
Verification successful
Verification Successful: No issues found with the
collapseLocationComments
method changes.The updated
collapseLocationComments
method is used consistently across multiple classes (ProtobufOneOfField
,ProtobufMessage
,ProtobufField
, andProtobufEnum
). The changes simplify the code and improve character handling without introducing any regressions or unexpected behavior.
ProtobufOneOfField.java
: Lines 9-19ProtobufMessage.java
: Lines 9-14ProtobufField.java
: Lines 9-34ProtobufEnum.java
: Lines 9-19Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `collapseLocationComments` method in the codebase to ensure no regressions. # Test: Search for the method usage. Expect: No regressions or unexpected behavior. rg --type java -A 5 $'collapseLocationComments'Length of output: 5025
Script:
#!/bin/bash # Description: Verify the context and usage of the `collapseLocationComments` method in the identified classes. # Check the usage in ProtobufOneOfField.java rg --type java -A 10 -B 10 'collapseLocationComments' metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/model/ProtobufOneOfField.java # Check the usage in ProtobufMessage.java rg --type java -A 10 -B 10 'collapseLocationComments' metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/model/ProtobufMessage.java # Check the usage in ProtobufField.java rg --type java -A 10 -B 10 'collapseLocationComments' metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/model/ProtobufField.java # Check the usage in ProtobufEnum.java rg --type java -A 10 -B 10 'collapseLocationComments' metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/model/ProtobufEnum.javaLength of output: 3858
Thanks for this! It has been awhile since I wrote that code, but it was definitely not working with UTF8 characters at one time. I'd be happy to merge this PR if a unit test using a UTF8 character could be added. |
@david-leifker I have added a unit test for verifying UTF-8 characters in protobuf comments. PTAL! 1f5fd5b |
...data-integration/java/datahub-protobuf/src/test/java/datahub/protobuf/ProtobufUtilsTest.java
Outdated
Show resolved
Hide resolved
Co-authored-by: happyprg <[email protected]>
Thanks! |
Checklist
Description
Support for Multilingual Characters: By returning the original string directly, we ensure that all characters, including those outside the ISO-8859-1 range (e.g., Korean, Chinese, Japanese), are preserved and correctly displayed.
Upon further investigation on this issue, it has been identified that the root cause of the broken UTF-8 characters is user error, specifically related to incorrect handling of character encodings in user input or data processing.
AS-IS
TO-BE
Summary by CodeRabbit
New Features
Bug Fixes
Refactor