-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rqd] Fix non ASCII chars #1335
[rqd] Fix non ASCII chars #1335
Conversation
Convert to ASCII while discarding characters that can not be encoded
ProblemBefore, if the RQD logs contained non-ASCII text, the RQD would crash, causing CueGUI to show the process as running indefinitely. ErrorUnicodeEncodeError: 'ascii' codec can't encode character u'\u221e' in position 64: ordinal not in range(128) The problem occurs in the Error happens in the code below:
SolutionBEFORE
NEW SOLUTION
UPDATED LOGIC The updated logic below ensure consistent handling of non-ASCII characters in logs with RQD_PREPEND_TIMESTAMP = True or RQD_PREPEND_TIMESTAMP = False. The change always encode lines to ASCII with 'ignore' option to discard non-ASCII characters. I also removed unused This change prevents UnicodeEncodeError and ensures consistent log outputs.
About the bug fixNow, the RQD will ignore characters non-ascii in the logs and the RQD will work correctly. For example: If the log is: It will be: It will ignore the Korean character '영화' and the 'é'. TestsManual Test
|
Other solutionsReplace non-ASCII character with '?'
Current solution
Solution using .encode('ascii', 'replace')
Encode with UTF-8, instead of ASCIIIt is also possible to keep UTF-8, but use encode() and decode(): UTF-8 example
ASCII example
|
- Convert to ASCII while discarding characters that can not be encoded - Update sphinx version to 5.0.0 on docs/requirements.txt
- Convert to ASCII while discarding characters that can not be encoded - Update sphinx version to 5.0.0 on docs/requirements.txt - Change docs/conf.py to use language = 'en'
@@ -1219,6 +1219,8 @@ def print_and_flush_ln(fd, last_timestamp): | |||
|
|||
remainder = lines[-1] | |||
for line in lines[0:-1]: | |||
# Convert to ASCII while discarding characters that can not be encoded | |||
line = line.encode('ascii', 'ignore') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ramonfigueiredo Thank you for the PR and the detailed write up. The detail makes it nice and easy to review with the proper context.
So, this error occurs when we intercept output in order to prepend a timestamp. What is logged to the file when RQD_PREPEND_TIMESTAMP
is False
?
My sense is that the output which is logged when RQD_PREPEND_TIMESTAMP
is True
vs False
should be as similar as possible, aside from the timestamp of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear @bcipriano ,
Sorry for the delay in answering your question. Thank you for your thorough review and the positive feedback on the PR and write-up.
Regarding your comment, the UnicodeEncodeError
occurs when we intercept output to prepend a timestamp. When RQD_PREPEND_TIMESTAMP
is False
, the output is logged directly without any modification or encoding to ASCII, preserving the original characters, including any non-ASCII ones.
Here is how the output differs based on the value of RQD_PREPEND_TIMESTAMP:
- When
RQD_PREPEND_TIMESTAMP
isTrue
:
- We intercept the output to prepend a timestamp.
- Non-ASCII characters are encoded to ASCII with the 'ignore' option, discarding any non-ASCII characters.
- Example log: "[12:34:56] text here , caf" (non-ASCII characters are ignored).
- When
RQD_PREPEND_TIMESTAMP
isFalse
:
- The output is logged directly without any modification.
- Non-ASCII characters are preserved.
- Example log: "text here 영화, café" (non-ASCII characters are retained).
To ensure the outputs are as similar as possible, aside from the timestamp, I can adjust our approach to handle encoding more gracefully. Instead of discarding non-ASCII characters, I can consider other strategies, such as escaping them or converting them to a specific placeholder.
However, given the current solution, the primary goal was to prevent crashes due to UnicodeEncodeError
by ignoring non-ASCII characters when RQD_PREPEND_TIMESTAMP
is True
(default option). This approach was chosen for simplicity and robustness.
If maintaining non-ASCII characters is critical, I can explore additional strategies to handle encoding more gracefully without discarding valuable information. I am open to suggestions and further discussion on the best approach to balance robustness and information retention.
Thank you again for your review and insights. I look forward to your feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bcipriano
FYI ...
The updated logic to ensure consistent handling of non-ASCII characters in logs with RQD_PREPEND_TIMESTAMP = True or RQD_PREPEND_TIMESTAMP = False.
Removing changes to update Sphinx version
@ramonfigueiredo is this ready for review again? I see one test failure, but it looks like the logs have expired |
Hi @DiegoTavares , Yes, it is ready to review. I am waiting for the feedback from @bcipriano . I fixed the checks/pipeline, it was missing merging the master into the branch |
- Always encode lines to ASCII with 'ignore' option to discard non-ASCII characters. - Removed unused file_descriptor block and ensured consistent encoding logic. This change prevents UnicodeEncodeError and ensures consistent log outputs.
c56d8cb
into
AcademySoftwareFoundation:master
@@ -343,6 +339,16 @@ def runLinux(self): | |||
|
|||
if rqd.rqconstants.RQD_PREPEND_TIMESTAMP: | |||
pipe_to_file(frameInfo.forkedCommand.stdout, frameInfo.forkedCommand.stderr, self.rqlog) | |||
else: | |||
with open(self.rqlog, 'a') as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this broke file logging. self.rqlog
is a file object and not a string.
(When not using RQD_PREPEND_TIMESTAMP=True
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look and fix that soon!
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should note that #1401 changes alot of this logic anyway (if it gets merged)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @lithorus ,
Let me know when your PR #1401 is ready for review. For now, it is on draft and some checks/pipelines are falling.
For sure it is a nice feature. Since Loki offers efficient, scalable log aggregation, cost savings, seamless Prometheus/Grafana integration, and strong community support. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will write some examples and documentation on how it's used after #1416 is merged (some of the loki stuff is not compatible with python 2.x)
Also, seeing the recent sentry support in cuebot, it would be interesting to also add support for that aswell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Thanks!
- Ensure consistent handling of non-ASCII characters in logs - Always encode lines to ASCII with an 'ignore' option to discard non-ASCII characters. - Removed unused file_descriptor block and ensured consistent encoding logic. This change prevents UnicodeEncodeError and ensures consistent log outputs.
Convert to ASCII while discarding characters that can not be encoded