-
Notifications
You must be signed in to change notification settings - Fork 855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a bias detector based on optimal transport #434
Conversation
6909467
to
ac9784a
Compare
Hi, @jmarecek. Thanks for your work! I have a couple requests:
At a high level, though, there seems to be a lot of repeated code here. Is there a way we can reuse the MDSS bias scan code for this? For example, how does this differ from adding another "scoring function"? That seems much cleaner to me and would cut down considerably on the checklist above. |
8397ee4
to
7271964
Compare
Hi @hoffmansc, I hope all is well. Illia has pushed the unit tests, doc strings, etc. It would be great to revisit this. I don't see how the MDSS scan could be extended to work with OT (or many other types of bias). Illia and I would be happy to jump on a call, should this be faster? Thanks! |
Hi @jmarecek @Illia-Kryvoviaz - I've reviewed this as well and happy to have a call to discuss some details here. In addition to the points raised above by @hoffmansc (FYI - point 1 is still open, and points 2 and 3 have only been partially completed), please also consider:
Thank you. |
c6c90fd
to
fef1bc9
Compare
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Hello @hoffmansc! As for detector vs metric, we argue that this should be an independent detector, rather than a metric, because it is too time-consuming to be a metric or run the subgroup analysis similar to MDSS. |
Thanks for your updates. I'm not sure I understand your point about detector vs. metric. It's fine if the metric is slow. The question is which classification better fits the algorithm and I think if it doesn't identify a subgroup like MDSS it shouldn't be called a detector. Metrics are simply anything which takes in data (predictions, ground truth, features, groups, etc.) and returns a numerical score (either per-group or "reduced" by taking the difference between groups, for example). Am I misinterpreting your algorithm? |
Like I mentioned above, there are a variety of scoring functions within the "Optimal transport perspective", incl. |
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
I understand there can be multiple distance/scoring functions. I'm not saying there needs to be any changes to the function just that it would be better suited under |
Sam, your logic for it to be under metrics makes sense to me. |
Hi Kush -- so what about this: we contribute the OT-based metrics to aif360.metrics, and we add some "SimpleDetector" to detectors, which would not try to run the subgroup scan. That way, the current functionality of the code would be easy to access. Would that work? Jakub |
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Thanks for the call and discussion today @hoffmansc @jmarecek . As concluded, from an end-user point of view the OT-based measures are better suited under |
Actually, to be clear, it's only strictly necessary to put it under |
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Hello @hoffmansc! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good for now, thanks. I made one suggestion -- renaming the function to ot_distance
. If this is acceptable, I can merge it.
Down the line, I would like this to integrate with aif360.sklearn.metrics.intersection()
/difference()
instead of how it handles prot_attr
currently but I can make a separate PR with those changes and ask for your review.
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Hello @hoffmansc! |
@Illia-Kryvoviaz It looks like one of the tests is failing. This line should be AIF360/tests/test_ot_metric.py Line 136 in bb2ec68
|
Signed-off-by: Illia-Kryvoviaz <[email protected]>
Yes, you are right @hoffmansc. Now it looks right. |
Signed-off-by: Divya <[email protected]>
Signed-off-by: Venkata Meghana Achanta <[email protected]> Signed-off-by: meghana009 <[email protected]>
There is an increasing interest in estimating bias in terms of Wasserstein-2 distances (or related distances between measures, often computable using optimal-transport algorithms), cf.
http://proceedings.mlr.press/v97/gordaliza19a.html (ICML 2019)
https://academic.oup.com/imaiai/article-abstract/8/4/817/5586771 (Information and Inference, 2019)
https://openreview.net/forum?id=-welFirjMss (NeurIPS 2022)
Following some discussions with Rahul Nair ([email protected]), we would like to contribute a new detector, with the same signature as the original mdss detector of @Adebayo-Oshingbesan:
https://github.com/Trusted-AI/AIF360/blob/master/aif360/detectors/mdss_detector.py
Could we ask @hoffmansc for a code review, please?
Closes #433