Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested dict argument to jinja macro fails the formatter #471

Closed
rparvathaneni-sc opened this issue Aug 31, 2023 · 10 comments
Closed

Nested dict argument to jinja macro fails the formatter #471

rparvathaneni-sc opened this issue Aug 31, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@rparvathaneni-sc
Copy link

Describe the bug
The following config macro makes the formatter fail because of the nested dict (range in partition_by)

{{ config(
    materialized='table',
    partition_by={
      "field": "user_id",
      "data_type": "int64",
      "range": {
        "start": 0,
        "end": 100,
        "interval": 10
      }
    }
)}}

To Reproduce
Paste the macro above in an sql file and try to run the formatter

Expected behavior
Formatter not erroring out

Actual behavior

sqlfmt encountered an error: Closing bracket ')' found at 138 before bracket was opened.

Additional context
What is the output of sqlfmt --version?

sqlfmt, version 0.19.0
@tconbeer
Copy link
Owner

tconbeer commented Aug 31, 2023

I can't repro on main, v0.19.0, or sqlfmt.com.

(shandy-sqlfmt-BZvASPt6-py3.8) tco@bros-gold:~/open/sqlfmt$ sqlfmt -
{{ config(
    materialized='table',
    partition_by={
      "field": "user_id",
      "data_type": "int64",
      "range": {
        "start": 0,
        "end": 100,
        "interval": 10
      }
    }
)}}
{{
    config(
        materialized="table",
        partition_by={
            "field": "user_id",
            "data_type": "int64",
            "range": {"start": 0, "end": 100, "interval": 10},
        },
    )
}}

1 file formatted.
0 files left unchanged.

@tconbeer tconbeer closed this as not planned Won't fix, can't repro, duplicate, stale Sep 1, 2023
@benjamin-awd
Copy link
Contributor

I've run into a similar issue -- plugging the code into sqlfmt.com works fine though, so I'm just manually replacing the buggy code

Original code:

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        external_location=generate_external_location("bucket", "datalake/xxx"),
        partitioned_by=['batch_date'],
        lf_tags_config={
          'enabled': true,
          'tags': {
            'domain': 'foo',
            'sensitivity': 'bar'
          }
        }
    )
}}

After formatting:

{{
    config(
        materialized="incremental",
        incremental_strategy="insert_overwrite",
        external_location=generate_external_location("bucket", "datalake/xxx"),
        partitioned_by=["batch_date"],
        lf_tags_config={
            "enabled": true,
            "tags": {"domain": "foo", "sensitivity": "bar"},
        },
    )
}}

@tconbeer
Copy link
Owner

Can you run pip list, please? I need to know what versions of shandy-sqlfmt and black you have installed.

@benjamin-awd
Copy link
Contributor

benjamin-awd commented Oct 13, 2023

I have a pretty long list of dependencies, but in my project I'm using

black                               22.12.0
shandy-sqlfmt                       0.20.0

Some additional debugging messages, in case it's useful:

# from node_manager.py
INFO:root:Token(type=TokenType.UNTERM_KEYWORD, token=select, spos=374)
INFO:root:Token(type=TokenType.UNTERM_KEYWORD, token=from, spos=1308)
INFO:root:Token(type=TokenType.UNTERM_KEYWORD, token=where, spos=1374)
INFO:root:Token(type=TokenType.BRACKET_CLOSE, token=), spos=310)
1 file had errors while formatting.
0 files left unchanged.
kp_dbt/dbt/app/models/kp_curated/web3/rpt_web3__subscription.sql
    sqlfmt encountered an error: Closing bracket ')' found at 310 before bracket was opened.
# from analyzer.py
-> rule.action(self, source_string, match)
(Pdb) rule
Rule(name='bracket_close', priority=510, pattern='(\\]|\\)|\\})', action=functools.partial(<function add_node_to_buffer at 0x106a6cee0>, token_type=<TokenType.BRACKET_CLOSE: 19>))
(Pdb) match
<re.Match object; span=(310, 315), match='    )'>

@tconbeer
Copy link
Owner

Thanks. Those logs are surprising -- the error is coming from pos 310, after later parts of the query are lexed... can you share the rest of your query (or another reproducible example), through the end of the where clause?

@benjamin-awd
Copy link
Contributor

benjamin-awd commented Oct 14, 2023

This query triggers sqlfmt encountered an error: Closing bracket ')' found at 330 before bracket was opened.

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        external_location=generate_external_location("baz", "datalake/producer=foo/object=bar"),
        partitioned_by=['batch_date'],
        lf_tags_config={
          'enabled': true,
          'tags': {
            'domain': 'foo',
            'sensitivity': 'public'
          }
        }
    )
}}

select
    batch_date
from
    {{ ref('bar') }}
{% if is_incremental() %}
where batch_date = '{{ var("ds") }}'
{% endif %}

The issue seems linked to the line-length argument. Running sqlfmt <path_to_query> --line-length 88 works fine, but sqlfmt <path_to_query> --line-length 120 triggers the error.

I've pinpointed the lack of a trailing comma as the issue -- if you add the comma before running sqlfmt, no errors are raised.

        lf_tags_config={
          'enabled': true,
          'tags': {
            'domain': 'foo',
            'sensitivity': 'public'
+          },
        }

@tconbeer tconbeer reopened this Oct 15, 2023
@tconbeer tconbeer added the bug Something isn't working label Oct 15, 2023
@tconbeer
Copy link
Owner

I can reproduce on main, only with a line length of 95 or greater, which is very strange.

@benjamin-awd
Copy link
Contributor

Looks like it's an issue with the search_for_terminating_token function

Starting var:.. start_rule = 'jinja_expression_start'
Starting var:.. program = re.compile('[^\\S\\n]*((\\{\\{-?)|(-?\\}\\}))', re.IGNORECASE|re.DOTALL)
Starting var:.. nesting_program = re.compile('[^\\S\\n]*(\\{\\{-?)', re.IGNORECASE|re.DOTALL)
Starting var:.. tail = ' config(lf_tags_config={"enabled": true, "tags": {"domain": "foo", "sensitivity": "public"}}) }}\n'
Starting var:.. pos = 2
00:52:32.838857 call       166     def search_for_terminating_token(
00:52:32.840292 line       179         match = program.search(tail)
New var:....... match = <re.Match object; span=(90, 92), match='}}'>
00:52:32.841333 line       180         if not match:
00:52:32.842425 line       186         start, end = match.span(1)
New var:....... start = 90
New var:....... end = 92
00:52:32.843606 line       188         nesting_match = nesting_program.match(tail, start)
New var:....... nesting_match = None
00:52:32.844716 line       189         if nesting_match:
00:52:32.846138 line       206             return pos + end
00:52:32.847240 return     206             return pos + end
Return value:.. 94

The correct start should be 94, and the correct end should be 96

I can reproduce on main, only with a line length of 95 or greater, which is very strange.

On lengths < 95, the source_string doesn't have }} in it, which prevents this issue from occurring

\n    config(\n        lf_tags_config={\n            "enabled": true,\n            "tags": {"domain": "foo", "sensitivity": "public"},\n        }\n    )\n}}\n

It might be possible to solve this with some kind of parsing logic that checks the tail to see if it contains a dictionary. Otherwise, maybe some kind of initial pass to add trailing commas to } delimiters? That would prevent the end of the dictionary from being read as a jinja comment, since it becomes ...public"},} instead of ...public"}}

benjamin-awd added a commit to benjamin-awd/sqlfmt that referenced this issue Oct 19, 2023
benjamin-awd added a commit to benjamin-awd/sqlfmt that referenced this issue Oct 19, 2023
benjamin-awd added a commit to benjamin-awd/sqlfmt that referenced this issue Oct 20, 2023
@benjamin-awd
Copy link
Contributor

Raised a proper PR on this https://github.com/tconbeer/sqlfmt/pull/508/files -- it's a bit rough around the edges but hopefully is a step forwards.

benjamin-awd added a commit to benjamin-awd/sqlfmt that referenced this issue Oct 20, 2023
@tconbeer
Copy link
Owner

Thanks so much! I've been traveling - it's on my list to review this and get the other one merged today; might slip to early next week. Really appreciate your engagement here!

tconbeer pushed a commit to benjamin-awd/sqlfmt that referenced this issue Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants