I am wondering whether it would be worth caching the intermediate results in the decision tree estimators, to avoid re-computing the `sum_total` and `weighted_n_node_samples` attributes.
In particular, these `for` loops would no longer be needed:
scikit-learn/sklearn/tree/_criterion.pyx, lines 330 to 343 in 6ca9eab:

```cython
for p in range(start, end):
    i = samples[p]
    # w is originally set to be 1.0, meaning that if no sample weights
    # are given, the default weight of each sample is 1.0
    if sample_weight != NULL:
        w = sample_weight[i]
    # Count weighted class frequency for each target
    for k in range(self.n_outputs):
        c = <SIZE_t> self.y[i, k]
        sum_total[k * self.sum_stride + c] += w
    self.weighted_n_node_samples += w
```
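For context, the loop above computes weighted per-class counts per output. A pure-Python sketch of the same computation (a flattened `(n_outputs, n_classes)` array instead of the strided buffer, not the actual Cython implementation):

```python
import numpy as np

def node_class_stats(y, sample_weight, samples, start, end, n_outputs, n_classes):
    """Sketch of what the classification criterion's init loop computes:
    weighted class counts (sum_total) and weighted_n_node_samples."""
    sum_total = np.zeros((n_outputs, n_classes))
    weighted_n_node_samples = 0.0
    for p in range(start, end):
        i = samples[p]
        # default weight of each sample is 1.0 when no weights are given
        w = 1.0 if sample_weight is None else sample_weight[i]
        for k in range(n_outputs):
            c = int(y[i, k])
            sum_total[k, c] += w
        weighted_n_node_samples += w
    return sum_total, weighted_n_node_samples
```

These are exactly the quantities a parent node already holds for each child when it evaluates a split, which is what makes caching attractive.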
and:
scikit-learn/sklearn/tree/_criterion.pyx, lines 768 to 780 in 6ca9eab:

```cython
for p in range(start, end):
    i = samples[p]
    if sample_weight != NULL:
        w = sample_weight[i]
    for k in range(self.n_outputs):
        y_ik = self.y[i, k]
        w_y_ik = w * y_ik
        self.sum_total[k] += w_y_ik
        self.sq_sum_total += w_y_ik * y_ik
    self.weighted_n_node_samples += w
```
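The regression counterpart accumulates weighted sums and sums of squares. A pure-Python sketch (again only illustrative, not the Cython implementation):

```python
import numpy as np

def node_regression_stats(y, sample_weight, samples, start, end, n_outputs):
    """Sketch of what the MSE criterion's init loop computes: weighted sums
    (sum_total), weighted sum of squares (sq_sum_total) and
    weighted_n_node_samples."""
    sum_total = np.zeros(n_outputs)
    sq_sum_total = 0.0
    weighted_n_node_samples = 0.0
    for p in range(start, end):
        i = samples[p]
        # default weight of each sample is 1.0 when no weights are given
        w = 1.0 if sample_weight is None else sample_weight[i]
        for k in range(n_outputs):
            y_ik = y[i, k]
            w_y_ik = w * y_ik
            sum_total[k] += w_y_ik
            sq_sum_total += w_y_ik * y_ik
        weighted_n_node_samples += w
    return sum_total, sq_sum_total, weighted_n_node_samples
```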
and:
scikit-learn/sklearn/tree/_criterion.pyx, lines 1056 to 1068 in 6ca9eab:

```cython
for p in range(start, end):
    i = samples[p]
    if sample_weight != NULL:
        w = sample_weight[i]
    for k in range(self.n_outputs):
        # push method ends up calling safe_realloc, hence `except -1`
        # push all values to the right side,
        # since pos = start initially anyway
        (<WeightedMedianCalculator> right_child[k]).push(self.y[i, k], w)
    self.weighted_n_node_samples += w
```
It should be easy to cache these intermediate results in `Stack` and `PriorityHeap`, but I think we should discuss this before addressing the issue.
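To make the proposal concrete, here is a minimal Python sketch of the idea (with hypothetical names; the real `StackRecord` struct in `sklearn/tree/_utils.pyx` has different fields): the parent already holds the left/right statistics when it commits to a split, so it could push them with each child record instead of having `Criterion.init` redo the summation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StackRecord:
    """Hypothetical stack record carrying cached node statistics,
    so the criterion's init could skip the summation loops above."""
    start: int
    end: int
    depth: int
    # statistics computed by the parent while evaluating the split
    sum_total: np.ndarray = None
    weighted_n_node_samples: float = 0.0

def push_children(stack, rec, pos, sum_left, sum_right, w_left, w_right):
    """When splitting rec at pos, the parent already holds sum_left /
    sum_right, so both children are pushed with their stats precomputed."""
    stack.append(StackRecord(rec.start, pos, rec.depth + 1, sum_left, w_left))
    stack.append(StackRecord(pos, rec.end, rec.depth + 1, sum_right, w_right))
```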