Skip to content

Conversation

@pohly
Copy link
Contributor

@pohly pohly commented Sep 26, 2018

What this PR does / why we need it:

As described in issue #66649, the goal is to make test/e2e/framework usable for test suites and tests outside of Kubernetes. External tests cannot modify the upstream framework when adding new configuration options, because test/e2e/framework/test_context.go is read-only for them.

There's also the stalled effort to support a configuration file format. Viper was chosen for that years ago, but then existing tests were never modified to retrieve options from Viper. Note that Viper might not be the right tool for the job. Other components (kubeadm, see #66649 (comment)) also need something and the trend seems to go towards a Kubernetes-specific implementation.

This PR avoids making tests depend on Viper, it just uses Viper under the hood. This means that it could get replaced later without changing tests.

The approach suggested in this PR is (see comment in test/e2e/test_context.go):

  • tests define their options using the flag package, in their own code, or use a new utility package to define flags via struct tags (requires less code)
  • all flags can also be set in a configuration file if that feature gets enabled in a test suite
  • test/e2e.test supports a config file, now for all options

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Partial #66649

Special notes for your reviewer:

This is a subset of PR #68483.

Release note:

test/e2e/e2e.test:
* -viper-config can be used to set also the options defined by command line flags
* the default config file is "e2e.yaml/toml/json/..." and the test starts when no such config is found (as before) but if -viper-config is used, the config file must exist
* -viper-config can be used to select a file with full path, with or without file suffix
* the csiImageVersion/Registry flags were renamed to storage.csi.imageVersion/Registry

/sig testing

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 26, 2018
@pohly
Copy link
Contributor Author

pohly commented Sep 26, 2018

Despite the "WIP" tag the changes are already complete enough to be tested.

@pohly pohly mentioned this pull request Sep 26, 2018
@coffeepac
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 26, 2018
@pohly
Copy link
Contributor Author

pohly commented Sep 27, 2018

/retest

@neolit123
Copy link
Member

/kind cleanup

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Oct 1, 2018
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @pohly for working on this.

i'm must say that the overall approach SGTM.
so this gets my +1, based on a quick review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like the approach.
it creates some boilerplate, but overall this seems to comply with the demands of individual tests?

i will defer to the maintainers though for more reviews.

@neolit123
Copy link
Member

/assign @timothysc
/assign
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/cc @kubernetes/sig-testing-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Oct 1, 2018
@coffeepac
Copy link
Contributor

/approve

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to put a blurb about this state of versioning support. TBH I don't think this is an area where backwards compatibility in configuration files should matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With "blurb about this state of versioning support", do you mean adding a warning that renaming fields or the common prefix is a user-visible change? I've added that and also updated the example a bit:

  • the example, including the corresponding command line flags, is presented completely before moving towards the definition of the underlying mechanism
  • the "scaling" part has to be defined in the prefix, because the struct name cannot be determined automatically (in this example, the struct is anonymous), nor is that desirable (it could be something that has no meaning outside of the source code)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pflag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not here. It's really https://godoc.org/flag and not https://godoc.org/github.com/spf13/pflag. test/e2e parses the command line with the former, in contrast to test/e2e_node, which uses the latter. So the "lingua franca" for defining flags that work everywhere has to be the flag package from the Go library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should avoid the panics and propagate errors where possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AddOptions is meant to be called as initializer for a global variable, so pushing the error handling to the caller would just make using it harder. There's also not much that the caller can do besides panicking: this is a programming mistake that must be fixed before the test suite can be used.

This is similar to re.MustCompile or flag.Var, which also panic when the input is incorrect.

Copy link
Contributor Author

@pohly pohly Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, because flag.Var is called indirectly, AddOptions already panics in some cases (redefining flags), which is outside of our control. We might as well then also do the same for cases detected in our code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with it for now, but not a huge fan as it can be an anti-pattern is downstream folks import to leverage.

pohly added 5 commits October 5, 2018 14:22
Storing settings in the framework's TestContext is not something that
out-of-tree test authors can do because for them the framework is a
read-only upstream component. Conceptually the same is true for
in-tree tests, so the recommended approach is to define configuration
settings in the code that uses them.

How to do that is a bit uncertain. Viper has several
drawbacks (maintenance status uncertain, cannot list supported
options, cannot validate the configuration file). How to handle
configuration files is currently getting discussed for kubeadm, with
similar concerns about
Viper (kubernetes/kubeadm#1040).

Instead of making a choice now for E2E, the recommendation is that
test authors continue to define command line flags as before, except
that they should do it in their own code and with better flag names.

But the ability to read options also from a file is useful, so
several enhancements get added:
- all settings defined via flags can also be read from a
  configuration file, without extra work for test authors
- framework/config makes it possible to populate a struct directly
  and define flags with a single function call
- a path and file suffix can be given to --viper-config (as in
  "--viper-config /tmp/e2e.json") instead of expecting the file in
  the current directory; as before, just plain "--viper-config e2e"
  still works
- if "--viper-config" is set, the file must exist; otherwise the
  "e2e" config is optional (as before)
- errors from Viper are no longer silently ignored, so syntax errors
  are detected early
- Viper support is optional: test suite authors who don't want
  it are not forced to use it by the e2e/framework
Tests shouldn't have to use the central context for their settings,
because conceptually tests and framework get developed independently.

This does not yet use the new framework/config utility code because
that code still needs to be reviewed.

Besides moving the flags, they also get renamed from the top-level
"--csiImage{Version|Registry}" to
"--storage.csi.image.{version|registry}". These flags were introduced
fairly recently and shouldn't be in use much, so now is a good time to
introduce a hierarchical naming for storage flags, in particular
because more flags will be added soon.
Tests settings should be defined in the test source code itself
because conceptually the framework is a separate entity that not all
test authors can modify.

Using the new framework/config code also has several advantages:
- defaults can be set with less code
- no confusion around what's a duration
- the options can also be set via command line flags

While at it, a minor bug gets fixed:
- readConfig() returns only defaults when called while
  registering Ginkgo tests because Viperize() gets called later,
  so the scale in the logging soak test couldn't really be configured;
  now the value is read when the test runs and thus can be changed

The options get moved into the "instrumentation.logging"
resp. "instrumentation.monitoring" group to make it more obvious where
they are used. This is a breaking change, but that was already
necessary to improve the duration setting from plain integer to a
proper time duration.
Tests settings should be defined in the test source code itself
because conceptually the framework is a separate entity that not all
test authors can modify.

For the sake of backwards compatibility the name of the command line
flags are not changed.
Generated via hack/update-bazel.sh.
@pohly
Copy link
Contributor Author

pohly commented Oct 5, 2018

I've updated the Viper config support:

  • the default is as it was before this PR: look for "e2e", proceed without it
  • the -viper-config command line flag is defined by the test suite, not by the package: cleaner, allows reuse
  • the package returns errors instead of panicking (here I agree with @timothysc that returning an error is better because this error is probably caused by the user and thus may have to be reported differently)
  • no more second parsing of command line flags with pflag, instead the priority of "flags over config" gets ensured by viperUnmarshal itself: this is simpler and solves a bug where -provider foo triggered an error from pflag (it treats single-dash options differently than flag)
  • better error messages, for example: viper config "e2e" = "/nvme/gopath/src/k8s.io/kubernetes/e2e.yaml": setting v from config file value: strconv.Atoi: parsing "foobar": invalid syntax when setting "v", the integer log level, to a string in the e2e.yaml file

@timothysc I think I addressed the review comments, but please check.

@pohly pohly changed the title WIP: E2E settings E2E settings Oct 5, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 5, 2018
Copy link
Contributor

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with it for now, but not a huge fan as it can be an anti-pattern is downstream folks import to leverage.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: coffeepac, pohly, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2018
@neolit123
Copy link
Member

/retest

2 similar comments
@pohly
Copy link
Contributor Author

pohly commented Oct 10, 2018

/retest

@pohly
Copy link
Contributor Author

pohly commented Oct 10, 2018

/retest

@k8s-ci-robot k8s-ci-robot merged commit 9c94363 into kubernetes:master Oct 10, 2018
@deads2k
Copy link
Contributor

deads2k commented Mar 20, 2019

I came across this today and I was surprised that UpgradeTarget and UpgradeImage are not considered core to the framework that runs all tests. If different tests have different notions of the upgrade target and image, they aren't likely to work well together.

@smarterclayton

_ "k8s.io/kubernetes/test/e2e/ui"
)

var viperConfig = flag.String("viper-config", "", "The name of a viper config file (https://github.com/spf13/viper#what-is-viper). All e2e command line parameters can also be configured in such a file. May contain a path and may or may not contain the file suffix. The default is to look for an optional file with `e2e` as base name. If a file is specified explicitly, it must be present.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort of distributed registration makes it really hard to depend on packages because of conflicts. It also makes it impossible to avoid stripping vendor. And it prevents visibility to any other thing that wants access. I don't think we should use a pattern like this and instead we could use the kubectl or kube-apiserver/controller-manager method of Options structs with Bind methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to let each test own a certain parameter namespace (like 'storage' for test/e2e/storage). One of my earlier revisions of this PR described a naming scheme that would have avoided conflicts, but during review it was pointed out that this is too complex and therefore its now only a recommendation.

UpgradeTarget and UpgradeImage are legacy options that for historic reasons are defined at the top level of the namespace and weren't renamed to avoid breaking jobs. They don't belong into the core framework because they only parametrize certain tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has actually made it harder to build tests on top. Instead of having a publicly exposed config struct that I can set any way I want and then use, I must have my executable called to set these registered flags.

Copy link
Member

@neolit123 neolit123 Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see viper as a temporary solution here.
the work that WG component standard are doing is planned to solve the flags vs config problem, hopefully.
https://github.com/kubernetes/community/tree/master/wg-component-standard

one of the early steps in flight is a KEP for a fork/wrapper of pflag (kflag):
kubernetes/enhancements#764

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has actually made it harder to build tests on top. Instead of having a publicly exposed config struct that I can set any way I want and then use,

I don't understand the part about "harder to build tests". Previously it was impossible to build tests on top of the framework without modifying the framework. Now that part works.

I must have my executable called to set these registered flags.

Yes, of course the executable must be called, because it is the one parsing flags and/or configuration. Can you explain how and where you want to set the flags without involving the test suite binary?

I'm not trying to defend Viper here. I've never liked it myself (no error messages when spelling an option wrong!) and if there is a better solution, then I'm sure that would be welcome.

Copy link
Contributor

@deads2k deads2k Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this leads to clean composition. I cannot import the test/e2e/lifecycle package without it claiming global flags. Then, because the flag is treated as an internal global variable, it's impossible to set externally and embed the included tests.

This is substantively worse than the composition described above where it possible to create the options, then set them based on a config of the composer's choosing, then execute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this leads to clean composition. I cannot import the test/e2e/lifecycle package without it claiming global flags. Then, because the flag is treated as an internal global variable, it's impossible to set externally and embed the included tests.

turns out you're hitting the same problem with klog: #75403 . This pull created the same problem with the tests. They're not importable anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot import the test/e2e/lifecycle package without it claiming global flags.

You also can't import the e2e/framework either without it claiming global flags. That's actually worse, because one always gets the flags even when not using the corresponding tests. My proposal ("proposal" because not completely implemented) makes that better by only defining flags when actually importing the tests that use them.

Anyway, I thought that using flag and its default FlagSet was an acceptable mode of operation. That was fine for the Kubernetes E2E suite and no-one mentioned other usages where that might not be appropriate.

It's not hard to change. Instead of allowing tests to add to the default FlagSet, we define a framework/config.FlagSet that is to be used for test config options. Then the user of the framework can decide whether it wants to expose those as actual command line options or only wants to set them indirectly through some other mechanism, like viperconfig.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also can't import the e2e/framework either without it claiming global flags. That's actually worse, because one always gets the flags even when not using the corresponding tests. My proposal ("proposal" because not completely implemented) makes that better by only defining flags when actually importing the tests that use them.

Before I had to explicitly call a method to register the flags (RegisterCommonFlags as I recall). Now you're doing it on import. Before I could import a package and choose how to use it, now I can't import the package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I had to explicitly call a method to register the flags (RegisterCommonFlags as I recall). Now you're doing it on import.

Right, I had forgotten that aspect. So you are asking for tests to register their flags only when called explicitly, right?

What about the Ginkgo tests? They also get registered during init, unconditionally. It's perhaps less likely that one wants to import a test package without actually using the tests, but it could happen. Someone might want to have finer control over which tests get registered, based on command line flags. We actually do have this issue right now for CSI testing: we use the Kubernetes E2E suite, but run only 28 out of 6740 tests. Loading all those JUnit files with the Skip entries is noticable slow in Spyglass.

Let me see whether I can come up with a solution...

@justinsb
Copy link
Member

This is now the sole consumer of viper, after #76780 viper will be the sole consumer of afero; viper is already the sole consumer of hashcorp/hcl (at least).

How is this actually used? Do we need anything more than yaml & json support? My experience has been that it's easier to just write a parser, rather than using viper...

@neolit123
Copy link
Member

i think that viper support in the e2e framework was a mistake.
though removing it now will probably break consumers.

something that i proposed to @timothysc is that we simply start building a separate V2 binary e.g. e2e-suite2.test and re-do it's flags. the current flags for the suite are unmanageable. similar was done in the case of kubetest -> kubetest2.

as a side note, this type of piping of config fields to CLI flags was briefly discussed during this week's WG component standard meeting with @mtaufen and @stealthybox .

@BenTheElder
Copy link
Member

BenTheElder commented Apr 19, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants