Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve config #6467

Merged
merged 39 commits into from
Nov 11, 2021
Merged

Improve config #6467

merged 39 commits into from
Nov 11, 2021

Conversation

smotornyuk
Copy link
Member

@smotornyuk smotornyuk commented Oct 12, 2021

Check updated configuration docs.

Extra details:
This PR introduces a new IConfigDeclaration interface which, in turn, changes some aspects of using the config in CKAN.
Note these changes are completely compatible with existing extensions and no changes are required by default. But below I'll explain what changes may be done in order to make extensions even better.

IConfigDeclaration

Declaration

This interface provides the single method def declare_config_options(self, declaration: Declaration, key: Key) -> None: ....
declaration object is used to "declare" config option. Declared options can have a default value, can be validated, etc. Undeclared options are ones that are used in the config file, but has no corresponding declaration. Undeclared options behave exactly as they did before. Actually, before this PR every single config option (ckan.site_url, ckan.plugins) was undeclared, so you already know, how they work.

In order to declare an option, we have to specify its "key" (name). That's it, any option will turn into declared via simple line:

declaration.declare("ckan.site_url")

Everything else is completely optional, but can be useful sometimes:

  • Default value. All the declared options have None as the default value. You can change it. Note, declared "default" value has no effect in existing code. I.e, config.get(option) will ignore it. I'll explain how to use declared defaults in the following section

    declaration.declare(name, DEFAULT_VALUE)
    
  • Validators. Just like in scheming, you can specify validators via string. There is two restrictions: validators must be idempotent(passing the value through the validators multiple times must produce the same result) and declared default must be valid(default value also can be validated in some cases)

    declaration.declare("debug").set_validators("boolean_validator")
    declaration.declare("timeout").set_validators("convert_int")
    
    # or via shortcut
    declaration.declare_bool("debug")
    declaration.declare_int("debug")
    
  • Descriptions. Explain the purpose of the config option

declaration.declare("debug").set_description("Enables debug mode")
  • Placeholders. If an option has no default, but you want to leave a hint as to how its value looks like, you can use placeholders. For example, sqlalchemy.url = postgresql://ckan_default:pass@localhost/ckan_test - it's rather not a default value, but a placeholder, that can be adapted to the current environment.
declaration.declare("sqlalchemy.url").set_placeholer("postgresql://ckan_default:pass@localhost/ckan_test")
  • Ignored/internal values. Some config values added in code, thus a value from the config file is ignored. Some values(__file__, testing, SECRET_KEY), can be set in the config file, but CKAN does not expect that and we cannot guarantee correct behavior when these variables are added to the config file. Nevertheless, if something can be found in the configuration object, it must be declared.

    declaration.declare("ckan.legacy_templates").ignored()
    declaration.declare("CKAN_INI").internal()
    
  • Huge definitions. If you want to declare more than 5-10 options, it's better to put declaration into a dictionary and use declaration.load_dict:

    data = ... 
    declaration.load_dict(data)
    

    That's how I defined CKAN core config options. You even may want to do something like:

    data = json.load("file/with/declarations.json")
    declaration.load_dict(data)
    

    Check the documentation of ckan config describe CLI command below. It explains how to get examples of such dictionaries.

Key

Why do we need a second parameter, key? It's a special object, that can be used in order to construct config option names. It can be compared with a string and produces quite an expected result:

a = key.a
assert a == "a"

abc = key.a.b.c
assert abc == "a.b.c"

prefix = key.ckan

title = ckan.title
url = ckan.url
assert (title, url) == ("ckan.title", "ckan.url")

prefix = key.a.b.c
assert prefix.d == "a.b.c.d"

But the main reason why I've added Key is dynamic patterns:

xxx = key.xxx
dynamic_name = xxx.dynamic("group").name

assert dynamic_name == "xxx.A.name"
assert dynamic_name == "xxx.ANOTHER_GROUP.name"
assert dynamic_name == "xxx.any.number.of.section.name"

In this way, we can declare groups of options. For example, everything in the config file with the prefix sqlalchemy. will be passed down to the engine constructor. We can declare this fact:

name = key.sqlalchemy.dynamic("anything")
declaration.declare_dynamic(name).set_description("Extra arguments for SQLAlchemy engine builder")

The only argument of the dynamic method is the name of the dynamic group. It has no special meaning and is used only for debugging.

How it changes CKAN

By default, it doesn't change anything. I expect it to change in the future, but for now, I want this functionality to be used explicitly.
There are two ways of using declarations in code.

Config object

There are three new methods added to the CKANConfig (toolkit.config):

  • config.safe(OPTION_NAME). Returns either the current value of the config option if it is present, or declared a default. The following two expressions are the same, but using safe allows us to re-use the same default value everywhere

    # assuming, you have done this: declaration.declare('ckan.datasets_per_page', 20)
    per_page = config.safe('ckan.datasets_per_page')
    
    # is the same as
    per_page = config.get('ckan.datasets_per_page', 20)
    

    When using safe with the undeclared option, None is used as default.

  • config.normalized(OPTION_NAME). Similar to the safe, but returns validated value

    # assuming, you have done this: declaration.declare_int('ckan.datasets_per_page', 20)
    per_page = config.normalized('ckan.datasets_per_page')
    
    # is the same as
    per_page = plugins.toolkit.asint(config.get('ckan.datasets_per_page', 20))
    
  • config.subset(key_with_dynamic_part). Simplified way to get group of options. It expects Key as an argument, so I've added key to the toolkit:

    pattern = tk.key.sqlalchemy.dynamic("params")
    sql_params = config.subset(pattern)
    # sql_params = {"sqlalchemy.url": ..., "sqlalchemy.echo": .., etc.}
    
    pattern = tk.key.solr.dynamic("params")
    solr_params = config.subset(pattern)
    

Config modes

There are three new boolean configuration options, which are disabled by default:

  • config.safe. When enabled, copy the default value from the declaration into the config object, when the corresponding config option is missing. In this mode config.safe(key) identical to config[key] and does no extra work. It's disabled because in safe mode all the declared options have value(None in the worst case). It means that config.get('ckan.datasets_per_page', DEFAULT) will never return DEFAULT, because ckan.datasets_per_page always has value in safe mode. Even though it's not a problem in most cases, it can break some extensions.
  • config.normalized. Similar to the previous one, but transforms config options in addition. For example, the declared boolean debug option will turn into True, while without normalized mode it has a value of "true". In this way we can avoid such errors:
    if config["debug"]: ...
    # must be: tk.asbool(config["debug"])
    
    It's disabled by default because some extensions can do the following:
    if config["debug"] == "true": ...
    
    In the normalized mode config.normalized(key) is identical to config[key] and does no extra work
  • config.strict. Performs validation of config options when the application is loaded. If there are any invalid config options, CKAN won't start and errors will be printed to the output. Note, that strict mode does not assume safe mode. That means that if you haven't specified ckan.datasets_per_page config option in your config file, in safe mode CKAN will start, but in the un-safe mode, it will print that ckan.datasets_per_page is not a valid integer. Actually, that's the main reason why it's not enabled by default.

I expect that extensions will migrate to config.safe/normalized during the next few years and then we can enable all these modes by default. After that, we'll replace config.safe/normalized with the direct access to config options(config[key]) and make these modes permanent.

Documentation bonus

I've exported declarations to the doc-builder and now we are using declared defaults instead of the hardcoded ones. BTW, some defaults in the docs were out of date :(

CLI

There isa new branch of commands available in CLI: ckan config

  • declaration [PLUGIN...]. Prints an example of configuration for the plugin(based on it's config declarations)

    ckan config declaration datapusher
    

    prints something like

    ## Datapusher settings #########################################################
    ckan.datapusher.formats = csv xls xlsx tsv application/csv application/vnd.ms-excel application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ods application/vnd.oasis.opendocument.spreadsheet
    ckan.datapusher.url = 
    ckan.datapusher.callback_url_base = 
    ckan.datapusher.assume_task_stale_after = 3600
    

    Ideally, one will use this command before enabling the plugin and copy output to the config file.
    Core declarations can be printed in that way:

    ckan config declaration --core    
    

    and now this command is used internally when generating CKAN config (ckan generate config). This means we don't have to update config template anymore. All the declared options will appear there automatically

  • describe [PLUGIN...]. Print the config declaration for the given plugin. Mainly can be used as an example for writing own declarations.

    ckan config describe datapusher
    

    prints

    declaration.annotate('Datapusher settings')
    declaration.declare(key.ckan.datapusher.formats, 'csv xls xlsx tsv application/csv application/vnd.ms-excel 
    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ods application/vnd.oasis.opendocument.spreadsheet')
    declaration.declare(key.ckan.datapusher.url)
    declaration.declare(key.ckan.datapusher.callback_url_base)
    declaration.declare(key.ckan.datapusher.assume_task_stale_after, 3600).set_validators('convert_int')
    

    Just as in declaration, use --core if you want to see core declarations.
    In addition, you can print examples of YAML, JSON, TOML, or python's dict declarations, that can be used together with the declaration.load_dict:

    ckan config describe datapusher -f dict 
    ckan config describe datapusher -f json 
    ckan config describe datapusher -f yaml
    ckan config describe datapusher -f toml
    
  • search. Search for the config option using pattern

    ckan config search '*url'
    

    prints

    sqlalchemy.url
    ckan.site_url
    solr_url
    ckan.redis.url
    ckan.dumps_url
    package_new_return_url
    package_edit_return_url
    licenses_group_url
    

    if you want to include declarations from the disabled plugins, add them via -i flag:

    ckan config search '*url' -i datastore -i datapusher
    

    --with-default/--with-current flags print default/current values.
    --no-custom flag filters out all the options, which currently have a value that is different from the default.
    --custom-only flag shows only customized (non-default) options

  • undeclared. Shows all the options available in the config, that are not declared by any of enabled plugins

    ckan config undeclared
    

    If you know that some of the options are declared by disabled plugins, you can use -i flag in order to reduce the output:

    ckan config undeclared -i datastore -i  datapusher
    
  • validate. Allows you to verify that your configuration is valid even if strict mode is disabled. As in the previous command, use -i if you want to validate in addition configuration of disabled plugins:

    ckan config validate -i datastore -i  datapusher
    

    Note, that you'll likely get some errors that will disappear in safe mode. I've explained it in the config.strict description.

@smotornyuk smotornyuk changed the title Rework config Improve config Oct 13, 2021
@smotornyuk smotornyuk marked this pull request as ready for review October 14, 2021 11:00
@amercader amercader self-requested a review October 14, 2021 13:47
@EricSoroos
Copy link
Contributor

Recommend removing config.safe(OPTION_NAME) as a function call, as there's no good reason not to use config.normalized(OPTION_NAME) when modifying code to use this new setup.

(note, I'm not talking about the modes, just the calls that actually require changes to code anyway)

@amercader
Copy link
Member

AFAICT this is good to go. @smotornyuk we need some docs right? I see a declare-config-options reference but that doesn't seem to point to anywhere. We need some description of the new modes and behaviour in this page:

http://docs.ckan.org/en/latest/maintaining/configuration.html

and a notice on the changelog. Happy to help proofreading and reviewing it.

@smotornyuk
Copy link
Member Author

@amercader , there is quite a big section at the end of configuration.rst that contains a source of declare-config-options reference. I think that GitHub just collapsed it and that's why you haven't noticed it

@amercader
Copy link
Member

@smotornyuk my apologies you are right. I'll review that and merge afterwards

@amercader
Copy link
Member

@smotornyuk I took the liberty of re-writing and re-organizing some bits of the documentation in 67c126e. Please let me know if you are happy with all these changes.

Additionally here are some more questions that came up from my review:

  1. How do you define a required option using declaration.declare()?
  2. I think that the ckan config command is really confusing when config.mode = default, for instance if I run ckan config validate I get a bunch of "Missing value" errors for things that are not really mandatory. Des it make sense to just disable it completely in this case and just show a message that says somehting like "config.mode = strict is required to use the declarative config features"?
  3. Shouldn't ckan config validate include all loaded plugins config by default? rather than having to pass every plugin via -i. I consider the whole config in the ini file as a whole and you want to validate it you want to validate all of it right?

@smotornyuk
Copy link
Member Author

Yes, thanks for the documentation updates.

option = declaration.declare(...)
option.required()
  1. Don't mind against it
  2. search, undeclared and validate include all the enabled plugins and require -i only if you want to include disabled plugin

@amercader
Copy link
Member

@smotornyuk great, I've made 1 and 3 clear in the docs, thanks for taking care of 2

If these tests are green I'll merge it

@amercader amercader merged commit 526886d into master Nov 11, 2021
@amercader amercader deleted the rework-config branch November 11, 2021 15:06
@amercader
Copy link
Member

Great stuff, thanks @smotornyuk !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants