Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typing #5924

Merged
merged 1 commit into from
Feb 21, 2022
Merged

Typing #5924

merged 1 commit into from
Feb 21, 2022

Conversation

smotornyuk
Copy link
Member

@smotornyuk smotornyuk commented Mar 2, 2021

Supply all .py files with corresponding .pyi details.

Initial stubs were generated by pyright because it puts a lot of extra information (like docstrings) into .pyi and it really helps to understand what's going on. Additionally, I've generated stubs using pytype(because it really shines when it comes to type inference) and now I'm going to compare these two versions + source code and add as much information as possible.

Those types shouldn't be considered as a source of truth and it's ok if they won't be constantly synchronized with the codebase. In the near future, it will change a lot(because we are going to drop a lot of legacy code), so it's hardly possible to reflect all the changes on types. But these files will come in handy when we are ready to move all the types from .pyi into .py and only after that, we can really rely on types in order to refactor functionality and find logical errors in code

@smotornyuk smotornyuk self-assigned this Mar 2, 2021
@amercader
Copy link
Member

Sorry if this was discussed (I'm sure it was at some point). What is the benefit of having type information as separate .pyi files instead of on the source files? It seems like something that would go out of sync really quickly.

@smotornyuk
Copy link
Member Author

smotornyuk commented Mar 8, 2021

Types inside .py-files have to wait till we drop python2 support. If I add types to .py files directly, it won't be possible to just cherry-pick new fixes into 2.9 patch-releases, because the code will contain py2-unsupported syntax and we'll need to manually update all cherry-picked changes.

On the contrary, while we are keeping types in separate files, there is no extra-work for preparing patch releases. And no changes in the master branch will conflict with this WIP PR. When the most of codebase has type information and this PR is more or less ready to merge into master(or even after it becomes part of the master branch depending on the time it will take me to finish it), I can just quickly merge all .pyi files into .py and make final sanity check updating all the places that are out of sync.

@smotornyuk smotornyuk removed their assignment May 17, 2021
@smotornyuk smotornyuk marked this pull request as ready for review May 17, 2021 15:39
@smotornyuk
Copy link
Member Author

smotornyuk commented May 17, 2021

Comments are welcome.
I've spent too much time tinkering with it and I definitely missed something, but it works.

Implementation details

  • Typechecking performed using pyright. I've used pytype initially for making stubs from source-code, but it not so flexible as pyright. Mypy can be used instead, but it lacks few features(recursive definitions aka Tree = None | Leaf[Node | Tree] and module-as-property context['model'] = ckan.model). In the future, we can migrate to mypy because these two engines are compatible in most cases and can be used interchangeably. We can even use both of them at the same time, as soon as mypy will add those missing features
  • I tried to avoid type: ignore directives and configured rather a strict type-checking mode. Whenever it may be not obvious, I added type_ignore_reason: REASON comments with a brief explanation of why we have to ignore types here. Usually, it relates to missing types from the 3rd-party library
  • I've added assert instructions in order to persuade type-checker in cases when we have more information than the compiler does. For example model.Package.get may return None, but it definitely returns a package right after a successful call to the package_show. Otherwise, we'd get NotFound exception.
  • ckan.types.Context type that summarizes our usage of context inside actions. It may be a starting point for standardizing and refactoring its usage
  • Typechecking performed as separate GitHub action. In this way, we can ignore type-checker fails in PRs for 3-6 months from now on. I can solve typing errors during this period, while everyone is getting used to the typing syntax and features

Next steps (not a part of this PR, contributors are welcome)

  • Use fewer Any
  • Implement separate methods in order to use fewer assertions. For example, model.Package.get_or_raise, that never returns None.
  • Define more types inside ckan.types module. We don't want extension developers to rely on internal data representation of data structures. Instead of Dict[str, Any] we can advise using something like ckan.types.DataDict type alias.
  • Remove unused variables
  • Introduce some rules as to how to use context.
  • use from __future__ import annotations and switch to list/dict instead of typing.List/typing.Dict when python 3.6 support dropped
  • Tests have no type-signatures. I have not intention to change it, but if someone is interested, why not?

Run typechecker locally

npm ci
npx pyright

ckan/types.py Outdated Show resolved Hide resolved
ckan/types.py Outdated Show resolved Hide resolved
@wardi
Copy link
Contributor

wardi commented May 18, 2021

@smotornyuk you spoke about the assert statements in the meeting but I missed your explanation. What purpose do they serve?

@smotornyuk
Copy link
Member Author

They are used when the type-checker doesn't have some extra information, that developer has. Two examples:

  • Here we know that global_conf is a dictionary, but the type-checker believes that there may be a string instead.
  • Here we know that user was found, even though model.User.get may return None. How do we know this? Because we are setting its attribute on the following line. If there were now user, the following line would throw an exception anyway

@smotornyuk
Copy link
Member Author

This PR will be merged on February, 21. Here's a short overview of features of your code editor, which will work better when it gets merged. This particular PR just specifies the type of variables used through the codebase of CKAN and doesn't provide any new features. Everything described below comes from your IDE/editor or, rather, from the specific language server that your editor uses. And you probably already know about these features. But just in case you've never used them, I'll add a few examples.

This PR optimized for pyright. Here you can find some details about setting up pyright inside your editor.

TLDR:

  • if you are using IDE, you should get everything right away, without any extra steps.
  • if you are using VSCode, install Pylance(I think it's included in Python extension) or Pyright extension.
  • If you are using Vim/Emacs/SublimeText/etc, check recommendations about enabling LSP adapter for your editor.

Features

All screenshots are taken using VSCode with Pyright and Python extensions enabled.

Code autocompletion

You probably had it before in most cases. But let's imagine you are writing Click's CLI command that uses context. Here's an example of such a command and in this example you'll see no reliable autocompletion suggestions:

@click.pass_context
def command(ctx):
    ctx.a<NO AUTOCOMPLETION>

Your editor may suggest some words based on the variables/functions available in the current buffer, but you cannot be sure that any of the suggestions actually exists inside ctx object.

But if specify that ctx is a click.Context, you'll get the following:
image

Logical checks

Sometimes, especially when refactoring the code, you get code blocks that have no sense. For example, you are checking the type of the variable that is guaranteed to be a string. Or you are using a variable that is initialized inside a conditional branch(i.e, possibly unbound variable). Finally, you can write a code that never executes because of an early return. Here's what it looks like:
image

Refactoring

In the following code snippet, if you want to rename name property, you have to update manually definition of the X class and code of fun function:

class X:
    name = "X"

def fun(obj):
    print(obj.name)

obj = X()
fun(obj)

But if you change the signature of the function to the following:

def fun(obj: X):

Your code editor will be able to rename all usages of the name property as long as it has a rename feature. For example, in VSCode, put the cursor on the name definition inside class X and press F2. You'll see the small popup:
image

As soon as you set a new name for property inside this popup, all occurrences of name will be replaced with the new value
image

And if you try to change the name of the name property only inside X class, the editor will report that from now obj.name inside fun function is undefined:
image

smotornyuk added a commit that referenced this pull request Feb 17, 2022
smotornyuk added a commit that referenced this pull request Feb 18, 2022
smotornyuk added a commit that referenced this pull request Feb 18, 2022
@smotornyuk smotornyuk deleted the typing branch February 21, 2022 07:22
@ThrawnCA ThrawnCA mentioned this pull request Aug 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

4 participants