
Commit 9a2321e

* adding MLOps folder
* adding MLOps README
1 parent 36d7901 commit 9a2321e


6 files changed (+417 lines, −0)


mlops/kubeflow/README.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
# Kubeflow Pipeline Components for AIF360

Kubeflow pipeline components are implementations of Kubeflow pipeline tasks. Each task takes
one or more [artifacts](https://www.kubeflow.org/docs/pipelines/overview/concepts/output-artifact/)
as input and may produce one or more
[artifacts](https://www.kubeflow.org/docs/pipelines/overview/concepts/output-artifact/) as output.

**Example: AIF360 Components**
* [Bias Detector - PyTorch](bias_detector_pytorch)

Each component usually includes two parts: a specification and a container implementation.
The specification is a `component.yaml` file that describes the functionality the component
exposes, for example:

```yaml
name: 'PyTorch Model Fairness Check'
description: |
  Perform a fairness check on a certain attribute using AIF360 to make sure the PyTorch model is fair
metadata:
  annotations: {platform: 'OpenSource'}
inputs:
  - {name: model_id, description: 'Required. Training model ID', default: 'training-dummy'}
  - {name: model_class_file, description: 'Required. PyTorch model class file'}
  - {name: model_class_name, description: 'Required. PyTorch model class name', default: 'model'}
  - {name: feature_testset_path, description: 'Required. Feature test dataset path in the data bucket'}
  - {name: label_testset_path, description: 'Required. Label test dataset path in the data bucket, e.g. processed_data/y_test.npy'}
  - {name: protected_label_testset_path, description: 'Required. Protected label test dataset path in the data bucket'}
  - {name: favorable_label, description: 'Required. Favorable label for the model predictions'}
  - {name: unfavorable_label, description: 'Required. Unfavorable label for the model predictions'}
  - {name: privileged_groups, description: 'Required. Privileged feature groups within this model'}
  - {name: unprivileged_groups, description: 'Required. Unprivileged feature groups within this model'}
outputs:
  - {name: metric_path, description: 'Path for fairness check output'}
implementation:
  container:
    image: aipipeline/fairness-check-with-secret:pytorch-v3
    command: ['python']
    args: [
      -u, fairness_check.py,
      --model_id, {inputValue: model_id},
      --model_class_file, {inputValue: model_class_file},
      --model_class_name, {inputValue: model_class_name},
      --feature_testset_path, {inputValue: feature_testset_path},
      --label_testset_path, {inputValue: label_testset_path},
      --protected_label_testset_path, {inputValue: protected_label_testset_path},
      --favorable_label, {inputValue: favorable_label},
      --unfavorable_label, {inputValue: unfavorable_label},
      --privileged_groups, {inputValue: privileged_groups},
      --unprivileged_groups, {inputValue: unprivileged_groups},
      --metric_path, {outputPath: metric_path}
    ]
```

See how to [use the Kubeflow Pipelines SDK](https://www.kubeflow.org/docs/pipelines/sdk/sdk-overview/)
and [build your own components](https://www.kubeflow.org/docs/pipelines/sdk/build-component/).
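To make the `{inputValue: ...}` and `{outputPath: ...}` placeholders in the `args` list concrete, the sketch below substitutes sample values into a trimmed version of the template. This is a hypothetical illustration of the substitution Kubeflow performs at run time, not the actual Kubeflow Pipelines resolution code, and the sample values are placeholders:

```python
# Hypothetical sketch: resolving a component args template into a concrete
# container command line. Not the real KFP implementation.

def resolve_args(template, input_values, output_paths):
    """Replace {inputValue: name} / {outputPath: name} placeholders with strings."""
    resolved = []
    for item in template:
        if isinstance(item, dict) and "inputValue" in item:
            resolved.append(str(input_values[item["inputValue"]]))
        elif isinstance(item, dict) and "outputPath" in item:
            resolved.append(output_paths[item["outputPath"]])
        else:
            resolved.append(item)
    return resolved

# A trimmed version of the template from component.yaml above.
template = [
    "-u", "fairness_check.py",
    "--model_id", {"inputValue": "model_id"},
    "--favorable_label", {"inputValue": "favorable_label"},
    "--metric_path", {"outputPath": "metric_path"},
]

argv = resolve_args(
    template,
    input_values={"model_id": "training-dummy", "favorable_label": 0.0},
    output_paths={"metric_path": "/tmp/outputs/metric_path/data"},
)
print(argv)
# → ['-u', 'fairness_check.py', '--model_id', 'training-dummy',
#    '--favorable_label', '0.0', '--metric_path', '/tmp/outputs/metric_path/data']
```

Note that every value is passed to the container as a plain string, which is why `fairness_check.py` parses labels and group definitions back from its command-line arguments.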
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
FROM pytorch/pytorch:latest

RUN pip install Flask aif360 pandas flask-cors Minio Pillow torchsummary

ENV APP_HOME /app
COPY src $APP_HOME
WORKDIR $APP_HOME

ENTRYPOINT ["python"]
CMD ["app.py"]
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
# Copyright 2019 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: 'PyTorch Model Fairness Check'
description: |
  Perform a fairness check on a certain attribute using AIF360 to make sure the PyTorch model is fair
metadata:
  annotations: {platform: 'OpenSource'}
inputs:
  - {name: model_id, description: 'Required. Training model ID', default: 'training-dummy'}
  - {name: model_class_file, description: 'Required. PyTorch model class file'}
  - {name: model_class_name, description: 'Required. PyTorch model class name', default: 'model'}
  - {name: feature_testset_path, description: 'Required. Feature test dataset path in the data bucket'}
  - {name: label_testset_path, description: 'Required. Label test dataset path in the data bucket, e.g. processed_data/y_test.npy'}
  - {name: protected_label_testset_path, description: 'Required. Protected label test dataset path in the data bucket'}
  - {name: favorable_label, description: 'Required. Favorable label for the model predictions'}
  - {name: unfavorable_label, description: 'Required. Unfavorable label for the model predictions'}
  - {name: privileged_groups, description: 'Required. Privileged feature groups within this model'}
  - {name: unprivileged_groups, description: 'Required. Unprivileged feature groups within this model'}
outputs:
  - {name: metric_path, description: 'Path for fairness check output'}
implementation:
  container:
    image: aipipeline/fairness-check-with-secret:pytorch-v3
    command: ['python']
    args: [
      -u, fairness_check.py,
      --model_id, {inputValue: model_id},
      --model_class_file, {inputValue: model_class_file},
      --model_class_name, {inputValue: model_class_name},
      --feature_testset_path, {inputValue: feature_testset_path},
      --label_testset_path, {inputValue: label_testset_path},
      --protected_label_testset_path, {inputValue: protected_label_testset_path},
      --favorable_label, {inputValue: favorable_label},
      --unfavorable_label, {inputValue: unfavorable_label},
      --privileged_groups, {inputValue: privileged_groups},
      --unprivileged_groups, {inputValue: unprivileged_groups},
      --metric_path, {outputPath: metric_path}
    ]
Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
# Copyright 2019 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import ClassificationMetric
import numpy as np
import pandas as pd
from minio import Minio
from minio.error import S3Error
import json
import zipfile
import importlib
import re

import torch
import torch.utils.data

from flask import Flask, request, abort
from flask_cors import CORS

app = Flask(__name__)
CORS(app)

def dataset_wrapper(outcome, protected, unprivileged_groups, privileged_groups, favorable_label, unfavorable_label):
    """Create an aif360 dataset from outcome and protected numpy arrays."""
    df = pd.DataFrame(data=outcome, columns=['outcome'])
    df['race'] = protected

    dataset = BinaryLabelDataset(favorable_label=favorable_label,
                                 unfavorable_label=unfavorable_label,
                                 df=df,
                                 label_names=['outcome'],
                                 protected_attribute_names=['race'],
                                 unprivileged_protected_attributes=unprivileged_groups)
    return dataset

def get_s3_item(client, bucket, s3_path, name):
    try:
        client.fget_object(bucket, s3_path, name)
    except S3Error as e:
        if e.code == "NoSuchKey":
            print("The object does not exist.")
        else:
            raise

# Compute the accuracy and predicted labels using the given test dataset.
def evaluate(model, X_test, y_test):
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    test = torch.utils.data.TensorDataset(torch.FloatTensor(X_test.astype('float32')),
                                          torch.LongTensor(y_test.astype('int64')))
    test_loader = torch.utils.data.DataLoader(test, batch_size=64, shuffle=False)
    model.eval()
    correct = 0
    y_pred = []
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            correct += predicted.eq(labels.data.view_as(predicted)).sum().item()
            y_pred += predicted.cpu().tolist()
    accuracy = 1. * correct / len(test_loader.dataset)
    y_pred = np.array(y_pred)
    return accuracy, y_pred


def fairness_check(object_storage_url, object_storage_username, object_storage_password,
                   data_bucket_name, result_bucket_name, model_id,
                   feature_testset_path='processed_data/X_test.npy',
                   label_testset_path='processed_data/y_test.npy',
                   protected_label_testset_path='processed_data/p_test.npy',
                   model_class_file='model.py',
                   model_class_name='model',
                   favorable_label=0.0,
                   unfavorable_label=1.0,
                   privileged_groups=[{'race': 0.0}],
                   unprivileged_groups=[{'race': 4.0}]):

    url = re.compile(r"https?://")
    cos = Minio(url.sub('', object_storage_url),
                access_key=object_storage_username,
                secret_key=object_storage_password)

    dataset_filenamex = "X_test.npy"
    dataset_filenamey = "y_test.npy"
    dataset_filenamep = "p_test.npy"
    weights_filename = "model.pt"
    model_files = model_id + '/_submitted_code/model.zip'

    cos.fget_object(data_bucket_name, feature_testset_path, dataset_filenamex)
    cos.fget_object(data_bucket_name, label_testset_path, dataset_filenamey)
    cos.fget_object(data_bucket_name, protected_label_testset_path, dataset_filenamep)
    cos.fget_object(result_bucket_name, model_id + '/' + weights_filename, weights_filename)
    cos.fget_object(result_bucket_name, model_files, 'model.zip')

    # Load the PyTorch model definition from the submitted source code.
    with zipfile.ZipFile('model.zip', 'r') as zip_ref:
        zip_ref.extractall('model_files')

    modulename = 'model_files.' + model_class_file.split('.')[0].replace('-', '_')

    # We require users to define where the model class is located, or to follow
    # the naming convention we have provided.
    model_class = getattr(importlib.import_module(modulename), model_class_name)

    # Load the model and its trained weights.
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    model = model_class().to(device)
    model.load_state_dict(torch.load(weights_filename, map_location=device))

    # Load the features, labels, and protected attributes for the fairness check.
    x_test = np.load(dataset_filenamex)
    y_test = np.load(dataset_filenamey)
    p_test = np.load(dataset_filenamep)

    _, y_pred = evaluate(model, x_test, y_test)

    # Calculate the fairness metrics.
    original_test_dataset = dataset_wrapper(outcome=y_test, protected=p_test,
                                            unprivileged_groups=unprivileged_groups,
                                            privileged_groups=privileged_groups,
                                            favorable_label=favorable_label,
                                            unfavorable_label=unfavorable_label)
    plain_predictions_test_dataset = dataset_wrapper(outcome=y_pred, protected=p_test,
                                                     unprivileged_groups=unprivileged_groups,
                                                     privileged_groups=privileged_groups,
                                                     favorable_label=favorable_label,
                                                     unfavorable_label=unfavorable_label)

    classified_metric_nodebiasing_test = ClassificationMetric(original_test_dataset,
                                                              plain_predictions_test_dataset,
                                                              unprivileged_groups=unprivileged_groups,
                                                              privileged_groups=privileged_groups)
    TPR = classified_metric_nodebiasing_test.true_positive_rate()
    TNR = classified_metric_nodebiasing_test.true_negative_rate()
    bal_acc_nodebiasing_test = 0.5 * (TPR + TNR)

    print("#### Plain model - without debiasing - classification metrics on test set")

    metrics = {
        "Classification accuracy": classified_metric_nodebiasing_test.accuracy(),
        "Balanced classification accuracy": bal_acc_nodebiasing_test,
        "Statistical parity difference": classified_metric_nodebiasing_test.statistical_parity_difference(),
        "Disparate impact": classified_metric_nodebiasing_test.disparate_impact(),
        "Equal opportunity difference": classified_metric_nodebiasing_test.equal_opportunity_difference(),
        "Average odds difference": classified_metric_nodebiasing_test.average_odds_difference(),
        "Theil index": classified_metric_nodebiasing_test.theil_index(),
        "False negative rate difference": classified_metric_nodebiasing_test.false_negative_rate_difference()
    }
    print("metrics: ", metrics)
    return metrics

    # with open(metric_path, "w") as report:
    #     report.write(json.dumps(metrics))


@app.route('/', methods=['POST'])
def fairness_api():
    try:
        s3_url = request.json['aws_endpoint_url']
        result_bucket_name = request.json['training_results_bucket']
        s3_username = request.json['aws_access_key_id']
        s3_password = request.json['aws_secret_access_key']
        training_id = request.json['model_id']
        data_bucket_name = request.json['training_data_bucket']
    except (KeyError, TypeError):
        abort(400)
    return json.dumps(fairness_check(s3_url, s3_username, s3_password, data_bucket_name, result_bucket_name, training_id))


@app.route('/', methods=['OPTIONS'])
def fairness_api_options():
    return "200"


if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
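The Flask endpoint above expects a JSON body with the six fields read in `fairness_api`. Below is a minimal sketch of such a payload; every value is a hypothetical placeholder, and a real client would POST the body (e.g. with `requests.post`) to the service on port 8080:

```python
import json

# Hypothetical payload for the fairness-check service. All values below are
# placeholders, not real endpoints or credentials.
payload = {
    "aws_endpoint_url": "https://s3.example.com",    # object storage endpoint
    "aws_access_key_id": "ACCESS_KEY",               # object storage username
    "aws_secret_access_key": "SECRET_KEY",           # object storage password
    "training_results_bucket": "training-results",   # bucket holding model.pt and model.zip
    "training_data_bucket": "training-data",         # bucket holding the test set .npy files
    "model_id": "training-dummy",                    # matches the component's model_id input
}

body = json.dumps(payload)
print(body)
```

On success the service returns the `metrics` dictionary (statistical parity difference, disparate impact, and so on) serialized as JSON; a missing field yields HTTP 400.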
