Description
Describe the workflow you want to enable
Description: Add a new simplify() method to the decision tree class that returns a simplified version of the tree by pruning redundant leaves that do not add new decision paths. The method would create a new tree instance with fewer nodes, giving users a more concise and interpretable tree.
Motivation: Decision trees are known for their ability to provide easily interpretable models, but sometimes they can grow to a size that makes interpretation challenging. A simplified version of the decision tree will make it easier for users to understand the model, especially in situations where the tree has a large number of redundant leaves.
Related Work: Related discussions in Scikit-learn #10810, IBM Taxinomitis #226, and Stack Overflow point to the need for decision tree simplification. Scikit-learn has added cost-based post-pruning methods but, as far as I know, they do not achieve the level of simplification proposed in this feature.
Describe your proposed solution
Proposed Solution: Add a simplify() method to the decision tree module that returns a simplified version of the tree. The method will prune redundant leaves that do not add new decision paths. The new tree will have fewer nodes and be easier to interpret.
Example Usage:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
...
clf.fit(X, y)
tree = clf.tree_
simplified_tree = tree.simplify()
Describe alternatives you've considered, if relevant
An alternative is to write ad hoc functions that perform this simplification, but I believe the feature is important enough to live in the main repository.
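For illustration, here is a minimal sketch of what such an ad hoc function might look like. It assumes that "redundant" means a split whose entire subtree predicts a single class, so the split could be replaced by one leaf without changing any prediction; that reading is my assumption, not a settled definition. The function names `subtree_classes` and `redundant_splits` are hypothetical and not part of scikit-learn. The sketch only reports which splits could be collapsed; it does not build a new tree.

```python
TREE_LEAF = -1  # scikit-learn stores -1 as the child id of a leaf node


def subtree_classes(children_left, children_right, node_class, root=0):
    """Map each node to the single class predicted by every leaf below it,
    or to None when the leaves below it disagree."""
    uniform = {}
    stack = [(root, False)]  # (node id, children already processed?)
    while stack:
        node, visited = stack.pop()
        left, right = children_left[node], children_right[node]
        if left == TREE_LEAF:
            uniform[node] = node_class[node]
        elif not visited:
            # Revisit this node once both children have been resolved.
            stack.append((node, True))
            stack.append((left, False))
            stack.append((right, False))
        else:
            same = uniform[left] is not None and uniform[left] == uniform[right]
            uniform[node] = uniform[left] if same else None
    return uniform


def redundant_splits(children_left, children_right, node_class):
    """Return the top-most internal nodes whose subtree predicts one class."""
    uniform = subtree_classes(children_left, children_right, node_class)
    found = []
    stack = [0]
    while stack:
        node = stack.pop()
        if children_left[node] == TREE_LEAF:
            continue  # a leaf has nothing to collapse
        if uniform[node] is not None:
            found.append(node)  # whole subtree agrees; do not descend further
        else:
            stack.extend([children_left[node], children_right[node]])
    return sorted(found)


# With a fitted single-output DecisionTreeClassifier `clf`, the inputs
# could be taken from the existing tree_ attributes:
#   t = clf.tree_
#   node_class = t.value[:, 0, :].argmax(axis=1)
#   redundant_splits(t.children_left, t.children_right, node_class)
```

The helper takes plain sequences rather than a `Tree` object so it can be tried on small hand-written trees. A built-in simplify() would additionally need to construct and return the reduced tree, which this sketch leaves out.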
Additional context
I can work on this feature request and send a PR, if approved by members.

