
[rqd] Fix non ASCII chars #1335

Conversation

ramonfigueiredo
Collaborator

Convert to ASCII while discarding characters that can not be encoded


ramonfigueiredo commented Jan 17, 2024

Problem

Previously, if the RQD logs contained non-ASCII text, RQD would crash, causing CueGUI to show the process as running indefinitely.

Error

UnicodeEncodeError: 'ascii' codec can't encode character u'\u221e' in position 64: ordinal not in range(128)

The problem occurs in the pipe_to_file function when trying to output non-ASCII characters to the file.

Error happens in the code below:

def pipe_to_file(stdout, stderr, outfile):
    ...

    def print_and_flush_ln(fd, last_timestamp):
        ...
        for line in lines[0:-1]:
            print("[%s] %s" % (curr_line_timestamp, line), file=outfile)

Solution

BEFORE

for line in lines[0:-1]:
    print("[%s] %s" % (curr_line_timestamp, line), file=outfile)

NEW SOLUTION

for line in lines[0:-1]:
    # Convert to ASCII while discarding characters that can not be encoded
    line = line.encode('ascii', 'ignore')
    print("[%s] %s" % (curr_line_timestamp, line), file=outfile)
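As a minimal, self-contained illustration (the file path and timestamp below are made up, not from the RQD code), the sketch reproduces the crash against an ASCII-encoded log file and shows how the 'ignore' handler avoids it:

```python
# Standalone sketch (hypothetical path and timestamp): writing a
# non-ASCII line to an ASCII-encoded log crashes; encoding with
# 'ignore' first drops the offending characters instead.
import io
import os
import tempfile

line = 'render time: \u221e seconds'  # contains U+221E, the infinity sign
path = os.path.join(tempfile.mkdtemp(), 'rqd.log')

with io.open(path, 'w', encoding='ascii') as outfile:
    try:
        print("[12:34:56] %s" % line, file=outfile)
    except UnicodeEncodeError as exc:
        print('without the fix:', exc)

    # The fix: discard characters that cannot be encoded.
    safe = line.encode('ascii', 'ignore').decode('ascii')
    print("[12:34:56] %s" % safe, file=outfile)

with io.open(path, encoding='ascii') as f:
    print(f.read())  # [12:34:56] render time:  seconds
```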

UPDATED LOGIC

The updated logic below ensures consistent handling of non-ASCII characters in logs, whether RQD_PREPEND_TIMESTAMP = True or RQD_PREPEND_TIMESTAMP = False.

The change always encodes lines to ASCII with the 'ignore' option to discard non-ASCII characters. I also removed the unused file_descriptor block and ensured consistent encoding logic.

This change prevents UnicodeEncodeError and ensures consistent log outputs.

if rqd.rqconstants.RQD_PREPEND_TIMESTAMP:
    pipe_to_file(frameInfo.forkedCommand.stdout, frameInfo.forkedCommand.stderr, self.rqlog)
else:
    with open(self.rqlog, 'a') as f:
        # Convert to ASCII while discarding characters that can not be encoded
        for line in frameInfo.forkedCommand.stdout:
            line = line.encode('ascii', 'ignore')
            f.write(line.decode('ascii') + '\n')
        for line in frameInfo.forkedCommand.stderr:
            line = line.encode('ascii', 'ignore')
            f.write(line.decode('ascii') + '\n')

About the bug fix

Now RQD will ignore non-ASCII characters in the logs and work correctly.

For example:

If the log is:
'text here 영화, café'

It will be:
'text here , caf'

It ignores the Korean characters '영화' and the accented 'é'.

Tests

Manual Test


>>> text = 'text here 영화, café'

>>> text.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-11: ordinal not in range(128)

>>> text.encode('ascii', 'ignore')
b'text here , caf'

@ramonfigueiredo
Collaborator Author

Other solutions

Replace non-ASCII character with '?'


>>> text = 'text here 영화, café'

Current solution

>>> text.encode('ascii', 'ignore')
b'text here , caf'

Solution using .encode('ascii', 'replace')

>>> text.encode('ascii', 'replace')
b'text here ??, caf?'
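For completeness (this comparison is illustrative and not part of the PR), Python's codec error handlers offer a few more options besides 'ignore' and 'replace':

```python
# Illustrative comparison of str.encode error handlers on the same text.
text = 'text here \uc601\ud654, caf\u00e9'  # 'text here 영화, café'

print(text.encode('ascii', 'ignore'))            # b'text here , caf'
print(text.encode('ascii', 'replace'))           # b'text here ??, caf?'
print(text.encode('ascii', 'backslashreplace'))  # b'text here \uc601\ud654, caf\xe9'
print(text.encode('ascii', 'xmlcharrefreplace')) # b'text here &#50689;&#54868;, caf&#233;'
```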

Encode with UTF-8, instead of ASCII

It is also possible to keep UTF-8, using encode() and decode():

UTF-8 example

>>> text = 'text here 영화, café'
>>> text_encoded = text.encode('utf-8', 'ignore')
>>> text_encoded.decode('utf-8')
'text here 영화, café'

ASCII example

>>> text = 'text here 영화, café'
>>> text_encoded = text.encode('ascii', 'ignore')
>>> text_encoded.decode()
'text here , caf'
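A related option (sketch only; the file path below is made up) is to open the log file itself with a UTF-8 encoding, which preserves non-ASCII characters end to end:

```python
# Sketch: an explicitly UTF-8 log file keeps non-ASCII characters intact.
import io
import os
import tempfile

line = 'text here \uc601\ud654, caf\u00e9'
path = os.path.join(tempfile.mkdtemp(), 'rqd_utf8.log')

with io.open(path, 'w', encoding='utf-8') as f:
    f.write(line + '\n')

with io.open(path, encoding='utf-8') as f:
    print(f.read())  # text here 영화, café
```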

- Convert to ASCII while discarding characters that can not be encoded
- Update sphinx version to 5.0.0 on docs/requirements.txt
- Convert to ASCII while discarding characters that can not be encoded
- Update sphinx version to 5.0.0 on docs/requirements.txt
- Change docs/conf.py to use language = 'en'
@@ -1219,6 +1219,8 @@ def print_and_flush_ln(fd, last_timestamp):

remainder = lines[-1]
for line in lines[0:-1]:
    # Convert to ASCII while discarding characters that can not be encoded
    line = line.encode('ascii', 'ignore')
Collaborator


@ramonfigueiredo Thank you for the PR and the detailed write up. The detail makes it nice and easy to review with the proper context.

So, this error occurs when we intercept output in order to prepend a timestamp. What is logged to the file when RQD_PREPEND_TIMESTAMP is False?

My sense is that the output which is logged when RQD_PREPEND_TIMESTAMP is True vs False should be as similar as possible, aside from the timestamp of course.

Collaborator Author


Dear @bcipriano ,

Sorry for the delay in answering your question. Thank you for your thorough review and the positive feedback on the PR and write-up.

Regarding your comment, the UnicodeEncodeError occurs when we intercept output to prepend a timestamp. When RQD_PREPEND_TIMESTAMP is False, the output is logged directly without any modification or encoding to ASCII, preserving the original characters, including any non-ASCII ones.

Here is how the output differs based on the value of RQD_PREPEND_TIMESTAMP:

  1. When RQD_PREPEND_TIMESTAMP is True:
  • We intercept the output to prepend a timestamp.
  • Non-ASCII characters are encoded to ASCII with the 'ignore' option, discarding any non-ASCII characters.
  • Example log: "[12:34:56] text here , caf" (non-ASCII characters are ignored).
  2. When RQD_PREPEND_TIMESTAMP is False:
  • The output is logged directly without any modification.
  • Non-ASCII characters are preserved.
  • Example log: "text here 영화, café" (non-ASCII characters are retained).
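The two behaviours in the list above can be sketched as follows (the helper names are mine, not from the RQD code):

```python
# Hypothetical helpers contrasting the two modes described above.
def log_with_timestamp(line, timestamp='12:34:56'):
    # RQD_PREPEND_TIMESTAMP = True: ASCII-encode, dropping non-ASCII characters.
    safe = line.encode('ascii', 'ignore').decode('ascii')
    return '[%s] %s' % (timestamp, safe)

def log_raw(line):
    # RQD_PREPEND_TIMESTAMP = False: the line is written unmodified.
    return line

line = 'text here \uc601\ud654, caf\u00e9'
print(log_with_timestamp(line))  # [12:34:56] text here , caf
print(log_raw(line))             # text here 영화, café
```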

To ensure the outputs are as similar as possible, aside from the timestamp, I can adjust our approach to handle encoding more gracefully. Instead of discarding non-ASCII characters, I can consider other strategies, such as escaping them or converting them to a specific placeholder.

However, given the current solution, the primary goal was to prevent crashes due to UnicodeEncodeError by ignoring non-ASCII characters when RQD_PREPEND_TIMESTAMP is True (default option). This approach was chosen for simplicity and robustness.

If maintaining non-ASCII characters is critical, I can explore additional strategies to handle encoding more gracefully without discarding valuable information. I am open to suggestions and further discussion on the best approach to balance robustness and information retention.

Thank you again for your review and insights. I look forward to your feedback.

Collaborator Author


Hi @bcipriano

FYI ...

The logic was updated to ensure consistent handling of non-ASCII characters in logs with RQD_PREPEND_TIMESTAMP = True or RQD_PREPEND_TIMESTAMP = False.

Removing changes to update Sphinx version
@DiegoTavares
Collaborator

@ramonfigueiredo is this ready for review again? I see one test failure, but it looks like the logs have expired

@ramonfigueiredo
Collaborator Author

@ramonfigueiredo is this ready for review again? I see one test failure, but it looks like the logs have expired

Hi @DiegoTavares ,

Yes, it is ready to review. I am waiting for the feedback from @bcipriano .

I fixed the checks/pipeline; master needed to be merged into the fix_non_ascii_chars branch.

- Always encode lines to ASCII with 'ignore' option to discard non-ASCII characters.
- Removed unused file_descriptor block and ensured consistent encoding logic.

This change prevents UnicodeEncodeError and ensures consistent log outputs.
@ramonfigueiredo ramonfigueiredo merged commit c56d8cb into AcademySoftwareFoundation:master Jul 10, 2024
12 checks passed
@@ -343,6 +339,16 @@ def runLinux(self):

if rqd.rqconstants.RQD_PREPEND_TIMESTAMP:
    pipe_to_file(frameInfo.forkedCommand.stdout, frameInfo.forkedCommand.stderr, self.rqlog)
else:
    with open(self.rqlog, 'a') as f:
Contributor

@lithorus lithorus Jul 16, 2024


It looks like this broke file logging. self.rqlog is a file object and not a string.
(When not using RQD_PREPEND_TIMESTAMP=True)
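The breakage lithorus describes can be reproduced in isolation (sketch only; the stand-in for self.rqlog is hypothetical): Python's open() expects a path, not an already-open file object.

```python
# Passing a file object where open() expects a path raises TypeError.
import tempfile

rqlog = tempfile.TemporaryFile(mode='w+')  # stands in for self.rqlog
try:
    with open(rqlog, 'a') as f:
        f.write('log line\n')
except TypeError as exc:
    print('TypeError:', exc)
rqlog.close()
```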

Collaborator Author


I'll take a look and fix that soon!

Thanks!

Contributor

@lithorus lithorus Jul 16, 2024


I should note that #1401 changes a lot of this logic anyway (if it gets merged).

Collaborator Author

@ramonfigueiredo ramonfigueiredo Jul 16, 2024


Hi @lithorus

FYI ...

I changed the code to prevent crashes due to TypeError and fixed the broken file logging. Thanks!

This is the new PR: #1417 . The PR has been merged into master!

Collaborator Author

@ramonfigueiredo ramonfigueiredo Jul 16, 2024


Hi @lithorus ,

Let me know when your PR #1401 is ready for review. For now, it is in draft and some checks/pipelines are failing.

It is certainly a nice feature, since Loki offers efficient, scalable log aggregation, cost savings, seamless Prometheus/Grafana integration, and strong community support. Thanks!

Contributor


Yes, I will write some examples and documentation on how it's used after #1416 is merged (some of the Loki stuff is not compatible with Python 2.x).
Also, given the recent Sentry support in Cuebot, it would be interesting to add support for that as well.

Collaborator Author


Great. Thanks!

n-jay pushed a commit to n-jay/OpenCue that referenced this pull request Jul 26, 2024
- Ensure consistent handling of non-ASCII characters in logs
- Always encode lines to ASCII with an 'ignore' option to discard non-ASCII characters.
- Removed unused file_descriptor block and ensured consistent encoding logic.

This change prevents UnicodeEncodeError and ensures consistent log outputs.
@ramonfigueiredo ramonfigueiredo deleted the fix_non_ascii_chars branch October 15, 2024 17:06
4 participants