Advanced API #14198

adrinjalali · 2019-06-26T14:25:55Z

In a few places we've mentioned certain functions which we would like to have, but they don't necessarily fit the usual fit/transform/predict pattern.

For instance, @jnothman mentioned having an add function for the ColumnTransformer, which we can also have for the Pipeline.

We also talked about a prune_tree function in #14038.

One concern seems to be that we would like to keep the API very simple and easy, which I agree with. But it doesn't have to limit us from having more ad-hoc functions, which we could fit in a separate section in the docs and tag them as advanced.

I'm not sure how we could handle that with sphinx, but what I'm proposing is to tag those functions as advanced, and have sphinx render them in an "advanced" section below the other functions. This way they would not interfere with the usual experience of a new user who's reading the docs, and yet it would enable us to introduce some rather useful methods.

I may be missing some historical discussion on this topic though, sorry for that.

jnothman · 2019-06-26T14:35:38Z

Adding new numpydoc sections may not be so easy... prune_tree differs from add in that it mutates the model, not the parameters.

adrinjalali · 2019-06-26T14:39:58Z

Since our convention (or even a constraint) is to validate model parameters at fit time, does it matter if the parameters are mutated using set_params or another handier function?
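A minimal sketch of that argument, assuming only the convention stated above (parameters are stored untouched and checked in `fit`). `ToyEstimator` and its `with_depth` helper are hypothetical names made up for illustration, not scikit-learn API:

```python
class ToyEstimator:
    """Hypothetical estimator that follows the validate-in-fit convention."""

    def __init__(self, max_depth=None):
        # Store the parameter as given; no validation here.
        self.max_depth = max_depth

    def set_params(self, **params):
        # Generic setter, mirroring the set_params convention.
        for name, value in params.items():
            if not hasattr(self, name):
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def with_depth(self, max_depth):
        # Hypothetical "handier" helper: it still only stores the value.
        self.max_depth = max_depth
        return self

    def fit(self, X, y=None):
        # All validation happens here, however the value was set.
        if self.max_depth is not None and self.max_depth < 1:
            raise ValueError("max_depth must be None or >= 1")
        self.n_samples_seen_ = len(X)
        return self


def fails_at_fit(estimator):
    """Return True if fitting raises ValueError, False otherwise."""
    try:
        estimator.fit([[0.0], [1.0]])
    except ValueError:
        return True
    return False


# The same invalid value, set through two different routes.
via_set_params = ToyEstimator().set_params(max_depth=0)
via_helper = ToyEstimator().with_depth(0)
```

In this sketch both routes accept the invalid value silently and both fail in `fit`, so the route used to set the parameter makes no difference to when the error surfaces.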

As for the numpydoc sections, the question is, is it worth it? If it would help us reach consensus and make it easier to accept these extra pieces of API, I'd say it is. I'm not sure if it's a concern though.

thomasjpfan · 2019-06-26T14:42:13Z

> prune_tree differs from add in that it mutates the model

We can also design it to return a new instance of the DecisionTree* and not mutate the original tree.
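A rough sketch of the non-mutating design, using a toy tree rather than a real DecisionTree*. The `Node` class, the `pruned` function, and the depth-based pruning rule are all assumptions for illustration; the criterion actually discussed in #14038 may differ:

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy binary tree node; a stand-in for a fitted tree structure."""
    value: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def depth(node):
    """Number of levels in the tree rooted at `node`."""
    if node is None:
        return 0
    return 1 + max(depth(node.left), depth(node.right))


def _truncate(node, remaining):
    # Cut children in place once the allowed depth is used up.
    if node is None:
        return
    if remaining == 1:
        node.left = None
        node.right = None
        return
    _truncate(node.left, remaining - 1)
    _truncate(node.right, remaining - 1)


def pruned(node, max_depth):
    """Return a pruned deep copy; the input tree is left untouched."""
    if max_depth < 1:
        raise ValueError("max_depth must be >= 1")
    new_root = copy.deepcopy(node)
    _truncate(new_root, max_depth)
    return new_root


tree = Node(0.5, Node(0.2, Node(0.1)), Node(0.8))
small = pruned(tree, max_depth=2)
```

Here `small` is a separate object, so the original `tree` keeps all three levels. The cost of this design is a full copy of the model on every call.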

amueller · 2019-06-26T16:52:50Z

I think we had tons of ad-hoc methods in the past. Some became more standard and some we moved away from.

I don't think there's anything that prevents us from adding ad-hoc methods right now if they are warranted.

jnothman · 2019-06-27T11:29:03Z

A builder method, like ColumnTransformer().add(columns=categorical, transformer=OneHotEncoder()).add(columns=~categorical, transformer=StandardScaler()) is not like the ad-hoc methods we've had before. We've previously had parameters set only by construction, set_params or setattr.

adrinjalali · 2019-06-27T11:45:02Z

Yes, but we also almost always make sure the validation is done during fit. So it doesn't really matter how the user sets the parameters, using which function, does it?

jnothman · 2019-06-27T12:09:05Z

I agree; that's essentially why I proposed it. I just think it's a departure from existing "ad hoc methods".

amueller · 2019-06-27T14:44:56Z

Thanks for the clarification, that wasn't clear to me from the initial description.
I'm +0 on an add method right now, I think. I haven't found this to be an inconvenience so far.

jnothman · 2019-06-28T06:16:20Z

In ColumnTransformer the benefit of 'add' is that it gets rid of the need to remember the order of a triple because the method has parameter names. It means that you can specify or use the default name for a step, rather than the all-or-nothing approach of make_pipeline vs Pipeline. It would make it easier to integrate other ways for users to specify the set of columns too.
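The benefits listed above can be sketched with a small stand-alone builder. `ColumnSpecBuilder`, its default-naming scheme, and the placeholder `OneHot` and `Scaler` classes are hypothetical; no `add` method exists on ColumnTransformer at the time of this discussion. The only fact relied on is that ColumnTransformer takes a list of (name, transformer, columns) triples:

```python
class ColumnSpecBuilder:
    """Hypothetical builder collecting (name, transformer, columns) triples."""

    def __init__(self):
        self.transformers = []

    def add(self, *, transformer, columns, name=None):
        # Keyword-only arguments: no need to remember the triple's order.
        taken = {existing for existing, _, _ in self.transformers}
        if name is None:
            # Assumed default: lowercased class name, suffixed if already used.
            base = type(transformer).__name__.lower()
            name, i = base, 1
            while name in taken:
                i += 1
                name = f"{base}-{i}"
        elif name in taken:
            raise ValueError(f"Duplicate step name {name!r}")
        self.transformers.append((name, transformer, columns))
        return self  # returning self allows chained calls


class OneHot:
    """Placeholder standing in for OneHotEncoder."""


class Scaler:
    """Placeholder standing in for StandardScaler."""


spec = (
    ColumnSpecBuilder()
    .add(columns=["city"], transformer=OneHot())  # default name
    .add(columns=["age", "income"], transformer=Scaler(), name="numeric")
)
names = [name for name, _, _ in spec.transformers]
```

Each step chooses independently whether to be named, which is the contrast with make_pipeline vs Pipeline drawn above. The resulting `spec.transformers` list has the shape that `ColumnTransformer(transformers=...)` accepts.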

adrinjalali added the API and Hard (hard level of difficulty) labels · Jun 26, 2019
