Skip to content

Conversation

@fw-bot
Copy link
Contributor

@fw-bot fw-bot commented Aug 9, 2020

Python 3 before 3.8 has a bug that causes the email.policy classes to incorrectly fold and RFC2047-encode "identification fields" in email messages. This mainly applies to Message-Id, References, and In-Reply-To fields.

We are impacted by this bug since #35929 where we switched to using the "modern" email.message API.

RFC2047 section 5 clearly states that those headers/fields are not to be encoded, and that would violate RFC5322.

Further, such a folded Message-Id is considered non-RFC-conformant by popular MTAs (GMail, Outlook), which will then generate another Message-Id field, causing the original threading information to be lost. Replies to such a modified message will reference the new, unknown Message-Id, and won't be attached to the original thread.

The solution we adopt here is to monkey-patch the SMTP policies to special-case those identification fields and deactivate the automatic folding, until the bug is properly and fully fixed in the standard lib.

Some considerations taken into account for this patch:

  • email.policy.SMTP is being monkey-patched globally to make sure we fix all possible places where Messages are being encoded/folded
  • the fix is not made version-specific, considering that even in Python 3.8 the official bugfix only applies to Message-Id, but still fails to protect other identification fields, like References and In-Reply-To. The author specifically noted that shortcoming [2]. The fix wouldn't break anything on Python 3.8 anyway.
  • the noFoldPolicy trick for preventing folding is done with no max line length at all. RFC5322, section 2.1.1 states [3] that the maximum length is 998 due to legacy implementations, but there is no provision to wrap identification fields that are longer than that. Wrapping at 998 chars would corrupt the header anyway. We'll just count on the fact that we don't usually need 1k+ chars in those headers.

The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only the second header gets folded):

>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: =?utf-8?q?=3C929227342217024=2E1596730490=2E324691772460938-exam?=
 =?utf-8?q?ple-30661-some=2Ereference=40test-123=2Eexample=2Ecom=3E?=
In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?=
 =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?=

and the expected result after the fix:

>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: <929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>
In-Reply-To: <92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>

[1] bpo-35805: https://bugs.python.org/issue35805
[2] python/cpython#13397 (comment)
[3] https://tools.ietf.org/html/rfc5322#section-2.1.1

Forward-Port-Of: #55609

@robodoo robodoo added forwardport This PR was created by @fw-bot seen 🙂 labels Aug 9, 2020
@fw-bot
Copy link
Contributor Author

fw-bot commented Aug 9, 2020

This PR targets saas-13.4 and is part of the forward-port chain. Further PRs will be created up to master.

More info at https://github.com/odoo/odoo/wiki/Mergebot#forward-port

@fw-bot
Copy link
Contributor Author

fw-bot commented Aug 9, 2020

Ping @odony

ci/runbot failed on this forward-port PR

@C3POdoo C3POdoo added the RD research & development, internal work label Aug 9, 2020
Python 3 before 3.8 has a bug that causes the email.policy classes to
incorrectly fold and RFC2047-encode "identification fields" in email
messages. This mainly applies to Message-Id, References, and In-Reply-To
fields.

We are impacted by this bug since odoo#35929 where we switched to
using the "modern" email.message API.

RFC2047 section 5 clearly states that those headers/fields are not to be
encoded, and that would violate RFC5322.

Further, such a folded Message-Id is considered non-RFC-conformant by
popular MTAs (GMail, Outlook), which will then generate *another*
Message-Id field, causing the original threading information to be lost.
Replies to such a modified message will reference the new, unknown
Message-Id, and won't be attached to the original thread.

The solution we adopt here is to monkey-patch the SMTP policies to
special-case those identification fields and deactivate the automatic
folding, until the bug is properly and fully fixed in the standard lib.

Some considerations taken into account for this patch:

- `email.policy.SMTP` is being monkey-patched globally to make sure we
  fix all possible places where Messages are being encoded/folded
- the fix is **not** made version-specific, considering that even in Python
  3.8 the official bugfix only applies to Message-Id, but still fails to
  protect other identification fields, like *References* and
  *In-Reply-To*. The author specifically noted that shortcoming [2].
  The fix wouldn't break anything on Python 3.8 anyway.
- the `noFoldPolicy` trick for preventing folding is done with no max
  line length at all. RFC5322, section 2.1.1 states [3] that the maximum
  length is 998 due to legacy implementations, but there is no provision
  to wrap identification fields that are longer than that. Wrapping at
  998 chars would corrupt the header anyway. We'll just count on the
  fact that we don't usually need 1k+ chars in those headers.

The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only
the second header gets folded):

```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: =?utf-8?q?=3C929227342217024=2E1596730490=2E324691772460938-exam?=
 =?utf-8?q?ple-30661-some=2Ereference=40test-123=2Eexample=2Ecom=3E?=
In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?=
 =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?=

```

and the expected result after the fix:
```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: <929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>
In-Reply-To: <92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>

```

[1] bpo-35805: https://bugs.python.org/issue35805
[2] python/cpython#13397 (comment)
[3] https://tools.ietf.org/html/rfc5322#section-2.1.1

X-original-commit: 6726e9a
@odony odony force-pushed the saas-13.4-saas-13.3-fix-message-ids-k0n3-fw branch from fae99e4 to 8e61547 Compare August 9, 2020 15:22
@fw-bot
Copy link
Contributor Author

fw-bot commented Aug 9, 2020

This PR was modified / updated and has become a normal PR. It should be merged the normal way (via @robodoo)

@robodoo robodoo added ☑ legal/cla CI 🤖 Robodoo has seen passing statuses and removed ☐ legal/cla labels Aug 9, 2020
@odony
Copy link
Contributor

odony commented Aug 9, 2020

@robodoo r+

robodoo pushed a commit that referenced this pull request Aug 9, 2020
Python 3 before 3.8 has a bug that causes the email.policy classes to
incorrectly fold and RFC2047-encode "identification fields" in email
messages. This mainly applies to Message-Id, References, and In-Reply-To
fields.

We are impacted by this bug since #35929 where we switched to
using the "modern" email.message API.

RFC2047 section 5 clearly states that those headers/fields are not to be
encoded, and that would violate RFC5322.

Further, such a folded Message-Id is considered non-RFC-conformant by
popular MTAs (GMail, Outlook), which will then generate *another*
Message-Id field, causing the original threading information to be lost.
Replies to such a modified message will reference the new, unknown
Message-Id, and won't be attached to the original thread.

The solution we adopt here is to monkey-patch the SMTP policies to
special-case those identification fields and deactivate the automatic
folding, until the bug is properly and fully fixed in the standard lib.

Some considerations taken into account for this patch:

- `email.policy.SMTP` is being monkey-patched globally to make sure we
  fix all possible places where Messages are being encoded/folded
- the fix is **not** made version-specific, considering that even in Python
  3.8 the official bugfix only applies to Message-Id, but still fails to
  protect other identification fields, like *References* and
  *In-Reply-To*. The author specifically noted that shortcoming [2].
  The fix wouldn't break anything on Python 3.8 anyway.
- the `noFoldPolicy` trick for preventing folding is done with no max
  line length at all. RFC5322, section 2.1.1 states [3] that the maximum
  length is 998 due to legacy implementations, but there is no provision
  to wrap identification fields that are longer than that. Wrapping at
  998 chars would corrupt the header anyway. We'll just count on the
  fact that we don't usually need 1k+ chars in those headers.

The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only
the second header gets folded):

```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: =?utf-8?q?=3C929227342217024=2E1596730490=2E324691772460938-exam?=
 =?utf-8?q?ple-30661-some=2Ereference=40test-123=2Eexample=2Ecom=3E?=
In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?=
 =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?=

```

and the expected result after the fix:
```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: <929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>
In-Reply-To: <92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>

```

[1] bpo-35805: https://bugs.python.org/issue35805
[2] python/cpython#13397 (comment)
[3] https://tools.ietf.org/html/rfc5322#section-2.1.1

closes #55655

X-original-commit: 6726e9a
Signed-off-by: Olivier Dony (odo) <[email protected]>
@robodoo robodoo closed this Aug 9, 2020
@robodoo robodoo temporarily deployed to merge August 9, 2020 16:19 Inactive
@fw-bot fw-bot deleted the saas-13.4-saas-13.3-fix-message-ids-k0n3-fw branch August 23, 2020 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI 🤖 Robodoo has seen passing statuses forwardport This PR was created by @fw-bot RD research & development, internal work

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants