Skip to content

Commit

Permalink
feat/issue 200/order of operations (#244)
Browse files Browse the repository at this point in the history
* refactor: simplify op priority logic

* chore: bump development to py3.10, fix warnings

* fix: do not treat square brackets as standalone operators (regression from prev commit)

* refactor: create OperatorPrecedence enum and tiers

* feat: parse more operators into specific tiers

* fix #207: between's and now a tight word operator

* feat: add more granular operator precedence, close #200

* chore: update changelog and readme

* chore: bump primer refs

* fix: fix bugs in op precedence, improve test coverage

* fix: update primer stats for dbt-utils
  • Loading branch information
tconbeer authored Aug 18, 2022
1 parent 73c1a0a commit c34fcf2
Show file tree
Hide file tree
Showing 18 changed files with 394 additions and 117 deletions.
6 changes: 3 additions & 3 deletions .python-version
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
3.9.7
3.8.13
3.7.12
3.10.3
3.8.13
3.9.13
3.10.6
8 changes: 5 additions & 3 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,9 +13,11 @@ All notable changes to this project will be documented in this file.
- sqlfmt can now be called as a python module, with `python -m sqlfmt`

### Formatting Changes + Bug Fixes
- fixed a bug that could cause lines with long jinja tags to be one character over the line length limit, and could result in unstable formatting ([#237](https://github.com/tconbeer/sqlfmt/pull/237) - thank you [@nfcampos](https://github.com/nfcampos)!)
- fixed a bug that formatted array literals like they were indexing operations ([#235](https://github.com/tconbeer/sqlfmt/pull/235) - thank you [@nfcampos](https://github.com/nfcampos)!)
- fixed a bug that caused unnecessary copying of the cache when using multiprocessing. Large projects should see dramatically faster (near-instant) runs once the cache is warm
- adds more granularity to operator precedence and will merge lines more aggressively that start with high-precedence operators ([#200](https://github.com/tconbeer/sqlfmt/issues/200))
- improves the formatting of `between ... and ...`, especially in situations where the source includes a line break ([#207](https://github.com/tconbeer/sqlfmt/issues/207))
- fixes a bug that caused unnecessary copying of the cache when using multiprocessing. Large projects should see dramatically faster (near-instant) runs once the cache is warm
- fixes a bug that could cause lines with long jinja tags to be one character over the line length limit, and could result in unstable formatting ([#237](https://github.com/tconbeer/sqlfmt/issues/237) - thank you [@nfcampos](https://github.com/nfcampos)!)
- fixes a bug that formatted array literals like they were indexing operations ([#235](https://github.com/tconbeer/sqlfmt/issues/235) - thank you [@nfcampos](https://github.com/nfcampos)!)

## [0.10.1] - 2022-08-05

Expand Down
31 changes: 15 additions & 16 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ For now, sqlfmt only works on `select` statements (which is all you need if you
## Installation

### Try it first
Want to test out sqlfmt on a query before you install it? Go to [sqlfmt.com](http://sqlfmt.com) to use the interactive, web-based version.
Want to test out sqlfmt on a query before you install it? Go to [sqlfmt.com](https://sqlfmt.com) to use the interactive, web-based version.

### Prerequisites
You will need Python 3.7-3.10 installed. You should use pipx or install into a virtual environment (maybe as a dev-dependency in your project). If you do not know how to install python and use pipx and/or virtual environments, go read about that first.
Expand Down Expand Up @@ -127,7 +127,7 @@ This can also be configured using the `pyproject.toml` file:
dialect = "clickhouse"
```

Note that with this option, sqlfmt will not lowercase **most** non-reserved keywords, even common ones like `sum` or `count`. See (and please join) [this issue](https://github.com/tconbeer/sqlfmt/issues/195) for a discussion of this topic.
Note that with this option, sqlfmt will not lowercase **most** non-reserved keywords, even common ones like `sum` or `count`. See (and please join) [this discussion](https://github.com/tconbeer/sqlfmt/discussions/229) for more on this topic.

### Using sqlfmt with dbt

Expand All @@ -146,7 +146,7 @@ Add the following config to your `.pre-commit-config.yaml` file:
```yml
repos:
- repo: https://github.com/tconbeer/sqlfmt
rev: v0.10.0
rev: v0.10.1
hooks:
- id: sqlfmt
language_version: python
Expand Down Expand Up @@ -199,28 +199,27 @@ You can use the [Run on Save](https://marketplace.visualstudio.com/items?itemNam
## The sqlfmt style
The only thing you can configure with sqlfmt is the desired line length of the formatted file. You can do this with the `--line-length` or `-l` options. The default is 88.

Given the desired line length, sqlfmt has four objectives:
1. Break and indent lines to make the syntactical structure of the code apparent
2. Break lines so they are shorter than the desired line length, if possible
3. Combine lines to use the least possible vertical space, without violating #1 and #2
4. Standardize capitalization (to lowercase) and in-line whitespace
Given the desired line length, sqlfmt has three objectives:
1. Break and indent lines to make the syntactical structure of the code apparent.
2. Combine lines to use the least possible vertical space, without violating the line-length constraint or the structure in #1.
3. Standardize capitalization (to lowercase) and in-line whitespace.

sqlfmt borrows elements from well-accepted styles from other programming languages. It places opening brackets on the same line as preceding function names (like *black* for python and *1TBS* for C). It indents closing brackets to the same depth as the opening bracket (this is extended to statements that must be closed, like `case` and `end`).

The sqlfmt style is as simple as possible, with little-to-no special-casing of formatting concerns. While at first blush, this may not create a format that is as "nice" or "expressive" as hand-crafted indentation, over time, as you grow accustomed to the style, formatting becomes transparent and the consistency will allow you to jump between files, projects, and even companies much faster.

### Why lowercase?
There are several reasons that sqlfmt insists on lowercase SQL keywords:
1. We believe that SQL is code (this is a surprisingly controversial statement!). Shouting-case keywords perpetuate the myth that SQL isn't "real code", or isn't "modern" and somehow is trapped in the era of low-level imperative languages: BASIC, COBOL, and FORTRAN. The reality is that SQL is an incredibly powerful, declarative, and modern language. It should look like one
1. Syntax highlighting for SQL makes shouting-case keywords redundant; the syntax highlighter in any text editor is going to be more consistent than any manual shout-casing. If you have a SQL query as a string inside of a block of code in another language, you may want to capitalize your keywords; sqlfmt only operates on dedicated SQL (and templated sql) files, so this is not relevant. However, even without syntax highlighting, the hierarchical and consistent indentation provided by sqlfmt provides sufficient visual structure without shout-casing keywords
1. We believe that SQL is code (this is a surprisingly controversial statement!). Shouting-case keywords perpetuate the myth that SQL isn't "real code", or isn't "modern" and somehow is trapped in the era of low-level imperative languages: BASIC, COBOL, and FORTRAN. The reality is that SQL is an incredibly powerful, declarative, and modern language. It should look like one.
1. Syntax highlighting for SQL makes shouting-case keywords redundant; the syntax highlighter in any text editor is going to be more consistent than any manual shout-casing. If you have a SQL query as a string inside of a block of code in another language, you may want to capitalize your keywords; sqlfmt only operates on dedicated SQL (and templated sql) files, so this is not relevant. However, even without syntax highlighting, the hierarchical and consistent indentation provided by sqlfmt provides sufficient visual structure without shout-casing keywords.
1. Even among people who like shout-cased keywords, there are disagreements between what gets shout-cased. SELECT, sure, but SUM? AS? OVER? AND? All-lowercase keywords eliminates this potential source of irregularity and disagreement.
1. Research shows that generally, lowercase words are more readable
1. Research shows that generally, lowercase words are more readable.

### Why trailing commas?
1. Using trailing commas follows the convention of every other written language and programming language
1. Leading commas require placing the first field name on the same line as `select`, which can obscure that field
1. SQL query compilation is extremely fast; the "cost" of "last field" errors is very low. Some dialects (e.g., BigQuery) even allow a trailing comma in the final field of a select statement
1. Trailing commas generalize better within `select` statements (e.g. `group by` and `partition by` clauses) and in other kinds of SQL statements (e.g. `insert` statements)
1. Using trailing commas follows the convention of every other written language and programming language.
1. Leading commas require placing the first field name on the same line as `select`, which can obscure that field (especially with `select distinct top 25 ...`).
1. SQL query compilation is extremely fast; the "cost" of "last field" errors is very low. Some dialects (e.g., BigQuery, DuckDB) even allow a trailing comma in the final field of a select statement.
1. Trailing commas generalize better within `select` statements (e.g. `group by` and `partition by` clauses) and in other kinds of SQL statements (e.g. `insert` statements).

### Examples

Expand Down Expand Up @@ -289,7 +288,7 @@ select
) as e
from my_table
```
Want more examples? See the `tests/data` directory, or go to http://sqlfmt.com to see how sqlfmt will format your queries.
Want more examples? See the `tests/data` directory, or go to https://sqlfmt.com to see how sqlfmt will format your queries.

## Contributing

Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ description = "sqlfmt is an opinionated CLI tool that formats your sql files"
readme = "README.md"
authors = ["Ted Conbeer <[email protected]>"]
license = "Apache-2.0"
homepage = "http://sqlfmt.com"
homepage = "https://sqlfmt.com"
repository = "https://github.com/tconbeer/sqlfmt"
classifiers = [
"Development Status :: 3 - Alpha",
Expand Down
4 changes: 1 addition & 3 deletions src/sqlfmt/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -168,9 +168,7 @@ def _format_many(
format_func = partial(_format_one, mode=mode)
if len(cache_misses) > 1 and not mode.single_process:
results.extend(
asyncio.get_event_loop().run_until_complete(
_multiprocess_map(format_func, paths, callback=callback)
)
asyncio.run(_multiprocess_map(format_func, paths, callback=callback))
)
else:
results.extend((map(format_func, cache_misses)))
Expand Down
16 changes: 4 additions & 12 deletions src/sqlfmt/dialect.py
Original file line number Diff line number Diff line change
Expand Up @@ -278,14 +278,14 @@ def __init__(self) -> None:
r"cube",
r"(not\s+)?exists",
r"grouping sets",
r"(not\s+)?ilike",
r"(not\s+)?in",
r"is(\s+not)?",
r"isnull",
r"(not\s+)?like(\s+any)?",
r"(not\s+)?i?like(\s+any)?",
r"notnull",
r"rollup",
r"(not\s+)?regexp",
r"(not\s+)?rlike",
r"rollup",
r"some",
r"(not\s+)?similar\s+to",
r"using",
Expand Down Expand Up @@ -318,15 +318,6 @@ def __init__(self) -> None:
token_type=TokenType.TIGHT_WORD_OPERATOR,
),
),
Rule(
name="not",
priority=923,
pattern=group(r"not") + group(r"\W", r"$"),
action=partial(
actions.add_node_to_buffer,
token_type=TokenType.WORD_OPERATOR,
),
),
Rule(
name="as",
priority=930,
Expand All @@ -345,6 +336,7 @@ def __init__(self) -> None:
pattern=group(
r"and",
r"or",
r"not",
)
+ group(r"\W", r"$"),
action=partial(
Expand Down
4 changes: 3 additions & 1 deletion src/sqlfmt/line.py
Original file line number Diff line number Diff line change
Expand Up @@ -236,7 +236,9 @@ def is_standalone_jinja_statement(self) -> bool:

@property
def is_standalone_operator(self) -> bool:
return self._is_standalone_if(self.starts_with_operator)
return self._is_standalone_if(
self.starts_with_operator and not self.starts_with_square_bracket_operator
)

@property
def is_standalone_comma(self) -> bool:
Expand Down
66 changes: 28 additions & 38 deletions src/sqlfmt/merger.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,8 @@
from sqlfmt.line import Line
from sqlfmt.mode import Mode
from sqlfmt.node import Node
from sqlfmt.operator_precedence import OperatorPrecedence
from sqlfmt.segment import Segment, create_segments_from_lines
from sqlfmt.token import TokenType


@dataclass
Expand Down Expand Up @@ -62,7 +62,7 @@ def _extract_components(
content_nodes = [
cls._raise_unmergeable(node, allow_multiline)
for node in line.nodes
if node.token.type != TokenType.NEWLINE
if not node.is_newline
]
if content_nodes:
final_newline = line.nodes[-1]
Expand Down Expand Up @@ -125,6 +125,7 @@ def maybe_merge_lines(self, lines: List[Line]) -> List[Line]:
Returns a new list of Lines
"""

try:
merged_lines = self.create_merged_line(lines)
except CannotMergeException:
Expand All @@ -139,7 +140,9 @@ def maybe_merge_lines(self, lines: List[Line]) -> List[Line]:
if len(segments) > 1:
# merge together segments of equal depth that are
# joined by operators
segments = self._maybe_merge_operators(segments)
segments = self._maybe_merge_operators(
segments, OperatorPrecedence.tiers()
)
# some operators really should not be by themselves
# so if their segments are too long to be merged,
# we merge just their first line onto the prior segment
Expand Down Expand Up @@ -189,41 +192,42 @@ def _fix_standalone_operators(self, segments: List[Segment]) -> List[Segment]:
def _maybe_merge_operators(
self,
segments: List[Segment],
priority: int = 2,
op_tiers: List[OperatorPrecedence],
) -> List[Segment]:
"""
Tries to merge runs of segments that start with operators into previous
segments. Operators have a priority that determines a sort of hierarchy;
if we can't merge a whole run of operators, we increase the priority to
create shorter runs that can be merged
"""
if len(segments) <= 1 or priority < 0:
if len(segments) <= 1 or not op_tiers:
return segments
head = 0
new_segments: List[Segment] = []
precedence = op_tiers.pop()

for i, segment in enumerate(segments[1:], start=1):
if not self._segment_continues_operator_sequence(segment, priority):
if not self._segment_continues_operator_sequence(segment, precedence):
new_segments.extend(
self._try_merge_operator_segments(segments[head:i], priority)
self._try_merge_operator_segments(segments[head:i], op_tiers.copy())
)
head = i

# we need to try one more time to merge everything after head
else:
new_segments.extend(
self._try_merge_operator_segments(segments[head:], priority)
self._try_merge_operator_segments(segments[head:], op_tiers.copy())
)

return new_segments

@classmethod
def _segment_continues_operator_sequence(
cls, segment: Segment, min_priority: int
cls, segment: Segment, max_precedence: OperatorPrecedence
) -> bool:
"""
Returns true if the first line of the segment is part
of a sequence of operators
of a sequence of operators of priority <= max_priority
"""
try:
line, _ = segment.head
Expand All @@ -232,30 +236,12 @@ def _segment_continues_operator_sequence(
return True
else:
return (
(
line.starts_with_operator
and cls._operator_priority(line.nodes[0].token.type) <= min_priority
)
or line.starts_with_comma
or line.starts_with_square_bracket_operator
)

@staticmethod
def _operator_priority(token_type: TokenType) -> int:
if token_type in (TokenType.BOOLEAN_OPERATOR, TokenType.ON):
return 2
elif token_type not in (
# list of "tight binding" operators
TokenType.AS,
TokenType.DOUBLE_COLON,
TokenType.TIGHT_WORD_OPERATOR,
):
return 1
else:
return 0
line.starts_with_operator
and OperatorPrecedence.from_node(line.nodes[0]) <= max_precedence
) or line.starts_with_comma

def _try_merge_operator_segments(
self, segments: List[Segment], priority: int
self, segments: List[Segment], op_tiers: List[OperatorPrecedence]
) -> List[Segment]:
"""
Attempts to merge segments into a single line; if that fails,
Expand All @@ -269,7 +255,7 @@ def _try_merge_operator_segments(
Segment(self.create_merged_line(list(itertools.chain(*segments))))
]
except CannotMergeException:
new_segments = self._maybe_merge_operators(segments, priority - 1)
new_segments = self._maybe_merge_operators(segments, op_tiers)
finally:
return new_segments

Expand All @@ -289,21 +275,25 @@ def _maybe_stubbornly_merge(self, segments: List[Segment]) -> List[Segment]:
if len(segments) <= 1:
return segments

stubborn_merge_tier = OperatorPrecedence.COMPARATORS

new_segments = [segments[0]]
for segment in segments[1:]:
prev_operator = self._segment_continues_operator_sequence(
new_segments[-1], min_priority=1
new_segments[-1], max_precedence=stubborn_merge_tier
)
if (
# always stubbornly merge P0 operators (e.g., `over`)
self._segment_continues_operator_sequence(segment, min_priority=0)
self._segment_continues_operator_sequence(
segment, max_precedence=OperatorPrecedence.OTHER_TIGHT
)
# stubbornly merge p1 operators only if they do NOT
# follow another p1 operator AND they open brackets
# and cover multiple lines
or (
not prev_operator
and self._segment_continues_operator_sequence(
segment, min_priority=1
segment, max_precedence=stubborn_merge_tier
)
and segment.tail_closes_head
)
Expand All @@ -325,12 +315,12 @@ def _stubbornly_merge(
best possible merger of those two segments
"""
new_segments: List[Segment] = []
# try to merge the first line of this segment with the previous segment
head, i = segment.head

# try to merge the first line of this segment with the previous segment
try:
prev_segment = Segment(self.create_merged_line(prev_segment + [head]))
prev_segment.extend(Segment(segment[i + 1 :]))
prev_segment.extend(segment[i + 1 :])
new_segments.append(prev_segment)
except CannotMergeException:
# try to add this segment to the last line of the previous segment
Expand Down
1 change: 1 addition & 0 deletions src/sqlfmt/node.py
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,7 @@ def is_operator(self) -> bool:
TokenType.SEMICOLON,
)
or self.is_multiplication_star
or self.is_square_bracket_operator
)

@property
Expand Down
Loading

0 comments on commit c34fcf2

Please sign in to comment.