Memory Management Issue in class ClassificationMetric #526
Open
Description
I'm using the code below to compute the Statistical Parity Difference (SPD) on the Adult dataset. When I call `get_spd_and_accuracy` in a loop, memory consumption grows with each iteration in which a `ClassificationMetric` is instantiated and `class_metrics.statistical_parity_difference()` is called, and that memory is not released at the end of the iteration.
```python
from copy import deepcopy

from aif360.datasets import AdultDataset, StandardDataset
from aif360.metrics import ClassificationMetric
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def get_spd_and_accuracy(df, protected, target):
    # Prepare the data for training and testing
    train, test = train_test_split(df, test_size=0.2, shuffle=True)
    X_train = train.drop([protected, target], axis=1).values
    y_train = train[target].values
    X_test = test.drop([protected, target], axis=1).values
    y_test = test[target].values

    # Train the model and predict labels for the training and testing data
    lmod = LogisticRegression(solver='liblinear', class_weight='balanced')
    lmod.fit(X_train, y_train)
    y_train_pred = lmod.predict(X_train)
    y_test_pred = lmod.predict(X_test)

    # Prepare the data for the AIF360 metrics
    train_transf = StandardDataset(train,
                                   label_name=target,
                                   favorable_classes=[1],
                                   protected_attribute_names=[protected],
                                   privileged_classes=[[1.0]],
                                   categorical_features=[],
                                   features_to_drop=[])
    train_transf_pred = deepcopy(train_transf)
    # AIF360 stores labels as a column vector, so reshape the 1-D predictions
    train_transf_pred.labels = y_train_pred.reshape(-1, 1)

    un_p = [{protected: 0.0}]
    p = [{protected: 1.0}]

    # Compute the Statistical Parity Difference and the accuracy score
    class_metrics = ClassificationMetric(train_transf, train_transf_pred,
                                         unprivileged_groups=un_p,
                                         privileged_groups=p)
    print(round(class_metrics.statistical_parity_difference(), 2))
    print(round(accuracy_score(y_test, y_test_pred), 2))


dataset = AdultDataset()
df = dataset.convert_to_dataframe()[0]
target = df.columns[-1]
protected = 'sex'

for i in range(25):
    get_spd_and_accuracy(df, protected, target)
```
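To pin down exactly which allocations survive each iteration, Python's built-in `tracemalloc` can diff heap snapshots taken before and after the loop. Here is a minimal, self-contained sketch; `leaky_step` and `cache` are stand-ins for `get_spd_and_accuracy` and whatever object is retaining memory:

```python
import tracemalloc

cache = []  # stands in for whatever keeps references alive across iterations

def leaky_step():
    # Placeholder for one call to get_spd_and_accuracy
    cache.append([0.0] * 100_000)

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

for _ in range(5):
    leaky_step()

current = tracemalloc.take_snapshot()
# Top allocation sites that grew relative to the baseline
for stat in current.compare_to(baseline, 'lineno')[:3]:
    print(stat)
```

Running this against the real loop (snapshot before and after the 25 calls) should name the file and line where the retained memory is allocated.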
Any insights or recommendations on memory-release strategies in this context would be greatly appreciated. Below is a snapshot of the increase in memory.
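As a stopgap, forcing a garbage collection after each iteration can reclaim objects that are only kept alive by reference cycles. This is a sketch, not a confirmed fix for this issue; `run_with_gc` is a hypothetical helper:

```python
import gc

def run_with_gc(step, n=25):
    """Call step() n times, forcing a cycle collection after each call.

    Returns the total number of unreachable objects reclaimed.
    """
    total = 0
    for _ in range(n):
        step()
        # Reclaim objects that are only reachable through reference cycles
        total += gc.collect()
    return total

# Usage with the workload from this report:
# run_with_gc(lambda: get_spd_and_accuracy(df, protected, target))
```

If `gc.collect()` reports reclaimed objects but the process footprint still grows, the references are likely being held deliberately (e.g. by a cache), and the fix would have to come from the library side.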