
Simplify decision tree by removing redundant decisions #25612

@nullhack

Description


Describe the workflow you want to enable

Description: Add a new method, simplify(), to the decision tree class that returns a simplified version of the tree by pruning redundant leaves that add no new decision paths (for example, sibling leaves that predict the same class, which make their parent split unnecessary). The method would create a new tree instance with fewer nodes, giving users a more concise and interpretable tree.
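A minimal sketch of the idea, assuming that a split is redundant when both of its subtrees reduce to leaves predicting the same class. The Leaf/Split classes and the simplify() and count_nodes() functions below are hypothetical, written on a toy tree for illustration rather than on scikit-learn's internal tree structure. Children are simplified before their parent, so collapses cascade upward, and the input tree is left unmodified.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Split:
    feature: str      # feature tested at this node
    threshold: float  # samples with x[feature] <= threshold go left
    left: Node
    right: Node


Node = Union[Leaf, Split]


def simplify(node: Node) -> Node:
    """Return a new tree in which redundant splits are replaced by leaves."""
    if isinstance(node, Leaf):
        return Leaf(node.label)
    # Simplify the children first so that collapses propagate bottom-up.
    left = simplify(node.left)
    right = simplify(node.right)
    if isinstance(left, Leaf) and isinstance(right, Leaf) and left.label == right.label:
        # Both branches predict the same class, so this test adds no decision.
        return Leaf(left.label)
    return Split(node.feature, node.threshold, left, right)


def count_nodes(node: Node) -> int:
    """Count internal nodes and leaves."""
    if isinstance(node, Leaf):
        return 1
    return 1 + count_nodes(node.left) + count_nodes(node.right)


# Toy tree: every leaf on the left predicts "A", every leaf on the right "B".
tree = Split("x1", 0.5,
             Split("x2", 1.0, Leaf("A"), Leaf("A")),
             Split("x2", 2.0, Leaf("B"),
                   Split("x3", 0.0, Leaf("B"), Leaf("B"))))

simplified = simplify(tree)
print(count_nodes(tree), "->", count_nodes(simplified))
print(simplified)
```

Here the 9-node toy tree reduces to a single split on x1 with two leaves (3 nodes), and every input still receives the same label.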

Motivation: Decision trees are valued for being easy to interpret, but they can grow large enough that interpretation becomes difficult. A simplified tree would make the model easier to understand, especially when the tree contains many redundant leaves.

Example:
[image: decision tree before simplification]
can be simplified to
[image: the same tree after simplification]

Related work: Related issues in scikit-learn (#10810), IBM Taxinomitis (#226), and on Stack Overflow point to the need for decision tree simplification. Scikit-learn has added post-pruning based on minimal cost-complexity (the ccp_alpha parameter), but as far as I know it does not achieve the level of simplification proposed in this feature.
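For comparison, the post-pruning scikit-learn already provides is minimal cost-complexity pruning, controlled by the ccp_alpha parameter, with candidate values available from cost_complexity_pruning_path(). The sketch below uses the iris dataset purely as an illustration. It shows that this pruning shrinks the tree by trading off accuracy, so larger alphas can change predictions (down to a single-node tree), whereas the simplify() proposed here is meant to remove only redundant decisions.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unpruned reference tree.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Effective alphas at which minimal cost-complexity pruning removes subtrees.
path = full.cost_complexity_pruning_path(X, y)

sizes = []
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
    sizes.append(pruned.tree_.node_count)
    # Larger alphas give smaller trees but can lower training accuracy.
    print(f"ccp_alpha={alpha:.4f}  nodes={pruned.tree_.node_count}  "
          f"train accuracy={pruned.score(X, y):.3f}")
```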

Describe your proposed solution

Proposed solution: Add a simplify() method to the decision tree module that returns a simplified copy of the tree. The method would prune redundant leaves that add no new decision paths, so the resulting tree has fewer nodes and is easier to interpret.

Example Usage:

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()

...  # load the training data X, y

clf.fit(X, y)

tree = clf.tree_
simplified_tree = tree.simplify()  # proposed method; does not exist in scikit-learn yet
```

Describe alternatives you've considered, if relevant

An alternative is to write ad hoc functions that perform this simplification, but I believe the feature is important enough to belong in the main repository.
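A minimal sketch of such an ad hoc function, assuming the same notion of redundancy (a split is dropped when every leaf below it predicts the same class). simplify_tree(), count_nodes() and predict_one() are hypothetical helpers, not scikit-learn API. They read only the fitted tree_ arrays (children_left, children_right, feature, threshold, value) and return a nested dict rather than a new tree object; only the first output in value is used, so multi-output trees are not handled. The iris data and max_depth=3 are arbitrary illustration choices: if every leaf is pure, a node whose leaves all predict one class would itself be pure and would never have been split, so redundant splits typically appear when tree growth is limited.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def simplify_tree(clf):
    """Convert a fitted DecisionTreeClassifier into a nested dict, collapsing
    every split whose leaves all predict the same class (hypothetical helper)."""
    t = clf.tree_
    # Predicted class index at each node, using the first output only.
    pred = np.argmax(t.value[:, 0, :], axis=1)

    def build(node):
        if t.children_left[node] == -1:  # -1 marks a leaf in the tree_ arrays
            return {"class": clf.classes_[pred[node]]}
        left = build(t.children_left[node])
        right = build(t.children_right[node])
        if "class" in left and "class" in right and left["class"] == right["class"]:
            # Redundant split: both branches give the same prediction.
            return {"class": left["class"]}
        return {"feature": int(t.feature[node]),
                "threshold": float(t.threshold[node]),
                "left": left, "right": right}

    return build(0)


def count_nodes(node):
    """Count internal nodes and leaves of a nested-dict tree."""
    if "class" in node:
        return 1
    return 1 + count_nodes(node["left"]) + count_nodes(node["right"])


def predict_one(node, x):
    """Route one sample: go left when x[feature] <= threshold."""
    while "class" not in node:
        value = float(x[node["feature"]])  # compare in double precision
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["class"]


X, y = load_iris(return_X_y=True)
X = X.astype(np.float32)  # scikit-learn trees work on float32 features internally

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
simplified = simplify_tree(clf)
print("nodes:", clf.tree_.node_count, "->", count_nodes(simplified))
```

Because predict_one() uses the same "go left when feature <= threshold" rule as the fitted tree, the nested structure should reproduce clf.predict(X) while having at most as many nodes.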

Additional context

I can work on this feature and submit a PR if the maintainers approve it.
