Unexpected Parsing of Numeric Literals Concatenated with Boolean Operators #87999

sco1 · 2021-04-13T18:27:20Z

BPO	43833
Nosy	@gvanrossum, @rhettinger, @cfbolz, @nedbat, @serhiy-storchaka, @zooba, @gvanrossum, @asottile, @pablogsal, @miss-islington, @sco1, @pxeger, @shreyanavigyan, @alimuldal
PRs	bpo-43833: Emit warnings for numeric literals followed by keyword #25466 [3.10] bpo-43833: Emit warnings for numeric literals followed by keyword (GH-25466) #26614

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2021-06-09.00:25:00.346>
created_at = <Date 2021-04-13.18:27:19.816>
labels = ['interpreter-core', 'type-bug', '3.10']
title = 'Unexpected Parsing of Numeric Literals Concatenated with Boolean Operators'
updated_at = <Date 2021-11-08.13:52:29.794>
user = 'https://github.com/sco1'

bugs.python.org fields:

activity = <Date 2021-11-08.13:52:29.794>
actor = 'pablogsal'
assignee = 'none'
closed = True
closed_date = <Date 2021-06-09.00:25:00.346>
closer = 'pablogsal'
components = ['Interpreter Core']
creation = <Date 2021-04-13.18:27:19.816>
creator = 'sco1'
dependencies = []
files = []
hgrepos = []
issue_num = 43833
keywords = ['patch']
message_count = 27.0
messages = ['390981', '390984', '390988', '390991', '390993', '390995', '390996', '390997', '390998', '390999', '391001', '391002', '391003', '391042', '391050', '391051', '391335', '391336', '391340', '391341', '391351', '391354', '395367', '395368', '396498', '405939', '405949']
nosy_count = 16.0
nosy_names = ['gvanrossum', 'rhettinger', 'Carl.Friedrich.Bolz', 'nedbat', 'Joshua.Landau', 'serhiy.storchaka', 'steve.dower', 'Guido.van.Rossum', 'Anthony Sottile', 'pablogsal', 'miss-islington', 'sco1', 'pxeger', 'shreyanavigyan', 'alimuldal', 'rrauenza']
pr_nums = ['25466', '26614']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue43833'
versions = ['Python 3.10']

sco1 · 2021-04-13T18:27:20Z

Came across this riddle today:

>>> [0x_for x in (1, 2, 3)]
[15]

Initially I thought this was related to PEP-515 but the unexpected behavior extends to simpler examples as well, such as:

>>> x = 5
>>> 123or x
123
>>> 123and x
5

I'm not familiar enough with C to understand why this is being parsed/tokenized this way, but this seems like it should instead be a SyntaxError. This appears to be fairly old behavior, as the non-underscored version works back to at least 2.7.

And a bonus:

>>> 0x1decade or more
31378142

sco1 · 2021-04-13T18:43:51Z

Sorry, the bonus, while fun, I don't think is related

alimuldal · 2021-04-13T19:07:29Z

Several other keywords seem to be affected, including if, else, is, and in

cfbolz · 2021-04-13T19:18:45Z

It's not just about keywords. Eg '1x' tokenizes too but then produces a syntax error in the parser. Keywords are only special in that they can be used to write syntactically meaningful things with these concatenated numbers.

pablogsal · 2021-04-13T19:33:51Z

This is known behaviour, unfortunately, and cannot be changed because of backwards compatibility.

serhiy-storchaka · 2021-04-13T19:43:03Z

Good example! A similar issue was discussed on the mailing list 3 years ago
(https://mail.python.org/archives/list/[email protected]/thread/D2WPCITHG2LBQAP7DBTC6CY26WQUBAKP/#D2WPCITHG2LBQAP7DBTC6CY26WQUBAKP). Now, with the new example, it perhaps should be reconsidered.

shreyanavigyan · 2021-04-13T19:50:19Z

Hi. I'm totally confused about the other keywords, but I'm a little concerned about the "and" and "or" operators when used not only on "int" (also known as "long") but also on most Python objects other than the bool type.

Mostly, when used on Python built-in objects, the "and" and "or" keywords return a very peculiar result. The "and" keyword returns the Python object on the left hand side while "or" returns the Python object on the right hand side. This applies to all Python objects, built-in or user-defined, unless they have a specific __and__ or __or__ method defined.

What is actually going on?

pablogsal · 2021-04-13T19:58:09Z

We tried changing this IIRC and it broke code in the stdlib (now reformatted) so it will break code in the wild. I am not sure the gains are worth it.

serhiy-storchaka · 2021-04-13T19:58:46Z

Better example:

>>> [0x1for x in (1,2)]
[31]

The code is parsed as [0x1f or x in (1,2)] instead of [0x1 for x in (1,2)] as you may expect.
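One way to check that reading is to compare syntax trees. The following is a minimal sketch, assuming an interpreter that still accepts the glued form (possibly with a warning); the helper name `parses_same` is hypothetical, and it returns None if the interpreter rejects either source outright.

```python
import ast
import warnings


def parses_same(a, b):
    """Return True/False depending on whether both sources give the same AST.

    Returns None if either source is rejected with SyntaxError, which a
    future interpreter may do for a literal glued to a keyword.
    """
    with warnings.catch_warnings():
        # Newer interpreters warn about a numeric literal followed by a keyword.
        warnings.simplefilter("ignore")
        try:
            return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))
        except SyntaxError:
            return None


# Glued form compared with the reading "0x1f or x in (1,2)".
print(parses_same("[0x1for x in (1,2)]", "[0x1f or x in (1,2)]"))
# Glued form compared with the list comprehension a reader may expect.
print(parses_same("[0x1for x in (1,2)]", "[0x1 for x in (1,2)]"))
```

Where the glued form is still accepted, the first comparison should report identical trees and the second should not, since one is a list containing a boolean expression and the other is a list comprehension.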

pablogsal · 2021-04-13T20:00:57Z

Precisely because of examples like that, changing this is a breaking change. Don't get me wrong: I would love to change it, but I don't know if it is worth the risk.

sco1 · 2021-04-13T20:09:01Z

Appreciate the additional historical context, I also was pointed to this in the documentation: https://docs.python.org/3/reference/lexical_analysis.html#whitespace-between-tokens

If a parsing change is undesired from a backwards compatibility standpoint, would it be something that could be included in PEP-8?

asottile · 2021-04-13T20:23:38Z

here's quite a few other cases as well -- I'd love for this to be clarified in PEP-8 such that I can rationalize crafting a rule for it in pycodestyle -- PyCQA/pycodestyle#371

pablogsal · 2021-04-13T20:25:20Z

One thing we could consider, as Serhiy proposed on the mailing list, is to emit a Syntax Warning. The ambiguous cases are especially scary, so I think that makes sense.

shreyanavigyan · 2021-04-14T08:09:27Z

Hi. I just want to know why the and, or operators are behaving like this. The behavior is described in https://bugs.python.org/issue43833#msg390996. Moreover, I researched a little more and found out that even if the __and__, __or__ methods are defined, the and, or operators don't seem to work. As Serhiy described in https://bugs.python.org/issue43833#msg390998, the parser reads [0x1for x in (1,2)] as [0x1f or x in (1,2)], which is the parser's fault, but why is the or operator behaving like that?

cfbolz · 2021-04-14T09:14:49Z

@shreyanavigyan This is a bit off-topic, but it's called "short-circuiting", described here: https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not
(or/and aren't really "operators", like +/- etc, they cannot be overridden, they evaluate their components lazily and are therefore almost control flow)
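A small sketch of the behaviour described here: `and`/`or` hand back one of their operands and skip the right-hand side when the left one already decides the result, while `__and__`/`__or__` are only consulted for `&` and `|`. The class `Loud` and the function `boom` are hypothetical names used for illustration.

```python
class Loud:
    """Hypothetical object with an explicit truth value and bitwise hooks."""

    def __init__(self, truthy):
        self.truthy = truthy

    def __bool__(self):
        return self.truthy

    def __and__(self, other):
        return "__and__ called"

    def __or__(self, other):
        return "__or__ called"


def boom():
    raise RuntimeError("the right-hand side was evaluated")


a, b = Loud(True), Loud(False)

# `and` returns the first falsy operand, otherwise the last one.
assert (a and b) is b
assert (b and a) is b
# `or` returns the first truthy operand, otherwise the last one.
assert (a or b) is a
assert (b or a) is a

# __and__/__or__ implement & and |, not the keywords.
assert (a & b) == "__and__ called"
assert (a | b) == "__or__ called"

# Short-circuiting: boom() is never called in either expression.
assert (123 or boom()) == 123
assert (0 and boom()) == 0
```

This matches the results in the original report: `123 or x` yields 123 because 123 is truthy, and `123 and x` yields x.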

shreyanavigyan · 2021-04-14T09:20:12Z

@Carl.Friedrich.Bolz Thanks a lot for clarifying. For a second, I thought it was maybe a bug.

serhiy-storchaka · 2021-04-18T15:25:15Z

PR 25466 makes the tokenizer emit a deprecation warning if a numeric literal is followed by one of the keywords that are valid after numeric literals. In future releases it will be changed to a syntax warning, and finally to a syntax error.

It is a breaking change, because it makes currently allowed syntax like 0in x or 1or x invalid (but 0or x is already an error).

See also bpo-21642, which allowed parsing "1else" as "1 else". Not everyone agreed with that fix.

Perhaps we also need to rewrite some paragraphs in the language specification.
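To see which category a given interpreter uses, a sketch like the following can help. The helper name `literal_keyword_warnings` is hypothetical; which category comes back (none, DeprecationWarning, SyntaxWarning) or whether compilation fails depends on the Python version, so no specific output is claimed for the glued case.

```python
import warnings


def literal_keyword_warnings(source):
    """Compile `source` and report what the compiler complained about.

    Returns the names of the warning categories that were raised, or
    ["SyntaxError"] if the source was rejected outright.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones that are silent by default.
        warnings.simplefilter("always")
        try:
            compile(source, "<example>", "eval")
        except SyntaxError:
            return ["SyntaxError"]
    return [w.category.__name__ for w in caught]


print(literal_keyword_warnings("1 or x"))  # well-separated tokens
print(literal_keyword_warnings("1or x"))   # literal glued to a keyword
print(literal_keyword_warnings("0or x"))   # "0o" starts an octal literal
```

The last case is rejected regardless of this change because `0o` is the octal prefix and `r` is not an octal digit.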

sco1 · 2021-04-18T16:19:00Z

We can also see this kind of thing with other literals, would that be in scope here?

e.g.

Python 3.9.4 (default, Apr  5 2021, 12:33:45) 
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "foo"in ["foo", "bar"]
True
>>> [1,]in [[1,]]
True

serhiy-storchaka · 2021-04-18T18:17:51Z

There are no issues with lists and strings. "]" clearly ends the list display, and a quote ends a string literal. The problem with numeric literals is that they can contain letters, so it is not clear (for a human reader) where the numeric literal ends and the keyword starts. Adding new numeric prefixes or suffixes or new keywords can break existing code.
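For reference, a short sketch of numeric literals that legitimately contain letters, evaluated with `ast.literal_eval`. The selection of literals is illustrative only.

```python
import ast

# Each string is a single valid numeric literal that contains letters.
literals = ["0x1f", "0b101", "0o17", "1e3", "2j", "0xdecade"]

values = {text: ast.literal_eval(text) for text in literals}

for text, value in values.items():
    print(f"{text!r} -> {value!r}")
```

Because letters such as a-f, e, j, o, b and x can all be part of a number, a keyword that starts with one of them and follows a literal without whitespace is easy to misread.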

sco1 · 2021-04-18T18:28:43Z

Makes sense, thanks!

rhettinger · 2021-04-19T03:13:03Z

I recommend just letting this be. Aside from allowing for a cute riddle, in the real world it seems to be harmless and not worth breaking code.

There are lots of other harmless oddities such as the space-invader increment operator:

x -=- 1

FWIW, a code reformatter such as Black will remove any weirdness.
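For readers unfamiliar with the "space-invader" form mentioned above, a brief sketch: `-=-` is not an operator of its own but the augmented assignment `-=` followed by a unary minus, which can be confirmed by comparing syntax trees.

```python
import ast

x = 5
x -=- 1  # tokenized as `x -= -1`, so x grows by one
print(x)

# Both spellings produce the same syntax tree.
same_tree = ast.dump(ast.parse("x -=- 1")) == ast.dump(ast.parse("x -= -1"))
print(same_tree)
```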

gvanrossum · 2021-04-19T03:43:26Z

Actually I believe a real case was reported on python-dev. I think it is not clean that the boundary between numbers and identifiers is so fluid.

miss-islington · 2021-06-08T23:31:19Z

New changeset 2ea6d89 by Serhiy Storchaka in branch 'main':
bpo-43833: Emit warnings for numeric literals followed by keyword (GH-25466)
2ea6d89

miss-islington · 2021-06-08T23:52:31Z

New changeset eeefa7f by Miss Islington (bot) in branch '3.10':
bpo-43833: Emit warnings for numeric literals followed by keyword (GH-25466)
eeefa7f

pxeger · 2021-06-24T16:17:36Z

I would like to note that syntax like this is in heavy use in the Code Golf community (a sport in which the aim is to write the shortest code possible to complete a particular task).

It will be disappointing if it becomes an error and breaks many past programs (you can search for phrases like 1and, 0for on https://codegolf.stackexchange.com/search?q=0for for examples).

I could understand if this change remains, because code golf is not exactly an important thing with serious ramifications, but I think it should be taken into consideration as a use-case nonetheless.

gvanrossum · 2021-11-08T09:32:40Z

Do we have a plan for when this will be turned into a non-silent warning and when into an error?

pablogsal · 2021-11-08T13:52:30Z

Unless I am missing something it should be 3.11 non-silent warning and 3.12 syntax error

gvanrossum · 2022-04-25T15:44:49Z

"Unless I am missing something it should be 3.11 non-silent warning and 3.12 syntax error"

Following up: Is this a non-silent warning yet?

serhiy-storchaka · 2022-04-25T16:16:51Z

It is still DeprecationWarning.

…eyword The warning emitted by the Python parser for a numeric literal immediately followed by keyword has been changed from deprecation warning to syntax warning.

…GH-91980) The warning emitted by the Python parser for a numeric literal immediately followed by keyword has been changed from deprecation warning to syntax warning.

SectorCorruptor · 2022-11-17T05:29:33Z

I tried the code that is the subject of this issue,

[0x1for x in range(10)]

but in the usual IDLE installable from python.org, it doesn't raise anything. It still returns [31].

Further note to @pxeger: Our code-golf answers are normally posted for a specific Python version (such as 2 or 3). When it's omitted, we assume it's Python 3, or the Python version that existed when the answer was posted. Besides, an easy fix for this issue would be to specify the latest version of Python that allows this bug to be exploited, i.e. if this bug is fixed in, say, Python 3.12, we can simply update our answers to say they work in Python versions before 3.12. Otherwise, I don't think this bug being fixed will result in issues in executed code.

serhiy-storchaka · 2022-11-17T07:27:37Z

Please open a new issue for IDLE. Perhaps there is already an open issue for SyntaxWarning in IDLE.

SectorCorruptor · 2022-11-18T04:22:41Z

I have opened a new one under BPO 99567.

SectorCorruptor · 2022-12-05T13:03:44Z

@pablogsal this comment on my closed issue says otherwise:

In 3.12.0a2+, as of today, this is still just a warning. Duplicate of #82005. Thanks for the current example.

sco1 mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes type-bug An unexpected behavior, bug, or error labels Apr 13, 2021

serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed 3.7 (EOL) end of life 3.8 (EOL) end of life labels Apr 18, 2021

serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed 3.9 only security fixes 3.7 (EOL) end of life 3.8 (EOL) end of life labels Apr 18, 2021

pablogsal closed this as completed Jun 9, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

serhiy-storchaka mentioned this issue Apr 27, 2022

gh-87999: Change warning type for numeric literal followed by keyword #91980

Merged

serhiy-storchaka mentioned this issue May 19, 2022

Should we add "i" as a suffix for imaginary numbers (while keeping "j" also)? #92938

Closed

FichteFoll mentioned this issue Jun 14, 2023

[Python] Keywords immediately after numbers sublimehq/Packages#2763

Closed

mdickinson mentioned this issue Jan 24, 2024

SyntaxWarning: invalid decimal literal #114524

Closed
