-
Notifications
You must be signed in to change notification settings - Fork 30.9k
[FIX] base: avoid bpo-35805 corrupting Message-Id #55609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@tde-banana-odoo @Julien00859 we definitely need a test to cover this. Based on a quick look in What do you think would be the best option to do that?
|
|
Why not both ? At least integrate an unit test here to ensure it works and merge it to fix the issue. Then proceed to clean a bit mailing testing tools and integrate it cleanly. |
Python 3 before 3.8 has a bug that causes the email.policy classes to incorrectly fold and RFC2047-encode "identification fields" in email messages. This mainly applies to Message-Id, References, and In-Reply-To fields. We are impacted by this bug since odoo#35929 where we switched to using the "modern" email.message API. RFC2047 section 5 clearly states that those headers/fields are not to be encoded, and that would violate RFC5322. Further, such a folded Message-Id is considered non-RFC-conformant by popular MTAs (GMail, Outlook), which will then generate *another* Message-Id field, causing the original threading information to be lost. Replies to such a modified message will reference the new, unknown Message-Id, and won't be attached to the original thread. The solution we adopt here is to monkey-patch the SMTP policies to special-case those identification fields and deactivate the automatic folding, until the bug is properly and fully fixed in the standard lib. Some considerations taken into account for this patch: - `email.policy.SMTP` is being monkey-patched globally to make sure we fix all possible places where Messages are being encoded/folded - the fix is **not** made version-specific, considering that even in Python 3.8 the official bugfix only applies to Message-Id, but still fails to protect other identification fields, like *References* and *In-Reply-To*. The author specifically noted that shortcoming [2]. The fix wouldn't break anything on Python 3.8 anyway. - the `noFoldPolicy` trick for preventing folding is done with no max line length at all. RFC5322, section 2.1.1 states [3] that the maximum length is 998 due to legacy implementations, but there is no provision to wrap identification fields that are longer than that. Wrapping at 998 chars would corrupt the header anyway. We'll just count on the fact that we don't usually need 1k+ chars in those headers. The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only the second header gets folded): ```py >>> msg = email.message.EmailMessage(policy=email.policy.SMTP) >>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>' >>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>' >>> print(msg.as_string()) Message-Id: =?utf-8?q?=3C929227342217024=2E1596730490=2E324691772460938-exam?= =?utf-8?q?ple-30661-some=2Ereference=40test-123=2Eexample=2Ecom=3E?= In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?= =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?= ``` and the expected result after the fix: ```py >>> msg = email.message.EmailMessage(policy=email.policy.SMTP) >>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>' >>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>' >>> print(msg.as_string()) Message-Id: <929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com> In-Reply-To: <92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com> ``` [1] bpo-35805: https://bugs.python.org/issue35805 [2] python/cpython#13397 (comment) [3] https://tools.ietf.org/html/rfc5322#section-2.1.1
6b5ba29 to
25b0243
Compare
|
Here's with a minimal regression test for the bugfix, extending the one @Julien00859 added for bpo-34424. I'll leave it up to you to take it into account for further cleanup/improvement in the test tooling, I have no bandwidth for that now ;) |
|
Task 2314992 created in my pipeline |
|
I take it there are no objections for quickly applying this fix then, thanks! @robodoo r+ |
Python 3 before 3.8 has a bug that causes the email.policy classes to incorrectly fold and RFC2047-encode "identification fields" in email messages. This mainly applies to Message-Id, References, and In-Reply-To fields. We are impacted by this bug since #35929 where we switched to using the "modern" email.message API. RFC2047 section 5 clearly states that those headers/fields are not to be encoded, and that would violate RFC5322. Further, such a folded Message-Id is considered non-RFC-conformant by popular MTAs (GMail, Outlook), which will then generate *another* Message-Id field, causing the original threading information to be lost. Replies to such a modified message will reference the new, unknown Message-Id, and won't be attached to the original thread. The solution we adopt here is to monkey-patch the SMTP policies to special-case those identification fields and deactivate the automatic folding, until the bug is properly and fully fixed in the standard lib. Some considerations taken into account for this patch: - `email.policy.SMTP` is being monkey-patched globally to make sure we fix all possible places where Messages are being encoded/folded - the fix is **not** made version-specific, considering that even in Python 3.8 the official bugfix only applies to Message-Id, but still fails to protect other identification fields, like *References* and *In-Reply-To*. The author specifically noted that shortcoming [2]. The fix wouldn't break anything on Python 3.8 anyway. - the `noFoldPolicy` trick for preventing folding is done with no max line length at all. RFC5322, section 2.1.1 states [3] that the maximum length is 998 due to legacy implementations, but there is no provision to wrap identification fields that are longer than that. Wrapping at 998 chars would corrupt the header anyway. We'll just count on the fact that we don't usually need 1k+ chars in those headers. The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only the second header gets folded): ```py >>> msg = email.message.EmailMessage(policy=email.policy.SMTP) >>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>' >>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>' >>> print(msg.as_string()) Message-Id: =?utf-8?q?=3C929227342217024=2E1596730490=2E324691772460938-exam?= =?utf-8?q?ple-30661-some=2Ereference=40test-123=2Eexample=2Ecom=3E?= In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?= =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?= ``` and the expected result after the fix: ```py >>> msg = email.message.EmailMessage(policy=email.policy.SMTP) >>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>' >>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>' >>> print(msg.as_string()) Message-Id: <929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com> In-Reply-To: <92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com> ``` [1] bpo-35805: https://bugs.python.org/issue35805 [2] python/cpython#13397 (comment) [3] https://tools.ietf.org/html/rfc5322#section-2.1.1 closes #55609 Signed-off-by: Olivier Dony (odo) <[email protected]>
Python 3 before 3.8 has a bug that causes the email.policy classes to incorrectly fold and RFC2047-encode "identification fields" in email messages. This mainly applies to Message-Id, References, and In-Reply-To fields.
We are impacted by this bug since #35929 where we switched to using the "modern"
email.messageAPI.RFC2047 section 5 clearly states that those headers/fields are not to be encoded, and that would violate RFC5322.
Further, such a folded Message-Id is considered non-RFC-conformant by popular MTAs (GMail, Outlook), which will then generate another Message-Id field, causing the original threading information to be lost. Replies to such a modified message will reference the new, unknown Message-Id, and won't be attached to the original thread.
The solution we adopt here is to monkey-patch the SMTP policies to special-case those identification fields and deactivate the automatic folding, until the bug is properly and fully fixed in the standard lib.
Some considerations taken into account for this patch:
email.policy.SMTPis being monkey-patched globally to make sure we fix all possible places where Messages are being encoded/foldednoFoldPolicytrick for preventing folding is done with no max line length at all. RFC5322, section 2.1.1 states [3] that the maximum length is 998 due to legacy implementations, but there is no provision to wrap identification fields that are longer than that. Wrapping at 998 chars would corrupt the header anyway. We'll just count on the fact that we don't usually need 1k+ chars in those headers.The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only the second header gets folded):
and the expected result after the fix:
[1] bpo-35805: https://bugs.python.org/issue35805
[2] python/cpython#13397 (comment)
[3] https://tools.ietf.org/html/rfc5322#section-2.1.1