Contributors: Daniel Saeedi and Kunwar M. Saaim
Mentor: Abubakar Abid
This work was done as part of the Fatima Fellowship.
PyDebiaser is a Python package that provides 7 debiasing techniques:
- Sent-Debias: extends Hard-Debias, a word embedding debiasing technique, to sentence representations. You can try out this technique here. Paper
- Self-Debias: leverages a model’s internal knowledge to discourage it from generating biased text. Paper
- INLP: Debiases a model’s representations by training a linear classifier to predict the protected property to remove (e.g., gender). Paper
- Top-k: generates k different texts and selects the least toxic one using Detoxify.
- Bias-Swapping: swaps word pairs such as "she" with "he" and "muslim" with "christian" in the prompt.
- Prepend-Adjective: prepends a positive adjective to biased words such as "woman" in the prompt (e.g., "successful woman").
- Character-Neutralization: replaces certain biased words such as "sikh" and "woman" with the word "person" in the prompt. A rough sketch of these three prompt-level transformations follows this list.
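As a rough illustration, the three prompt-level techniques can be thought of as simple string transformations like the following (a hypothetical sketch with toy word lists; PyDebiaser's actual lists are far more extensive):

# Hypothetical toy word lists; the package ships much larger ones.
swap_pairs = {"she": "he", "he": "she", "muslim": "christian", "christian": "muslim"}
biased_words = {"sikh", "woman", "muslim"}

def bias_swap(prompt):
    # Bias-Swapping: replace each word with its counterpart, if any.
    return " ".join(swap_pairs.get(w, w) for w in prompt.split())

def prepend_adjective(prompt, adjective="successful"):
    # Prepend-Adjective: put a positive adjective before biased words.
    return " ".join(f"{adjective} {w}" if w in biased_words else w for w in prompt.split())

def neutralize(prompt):
    # Character-Neutralization: replace biased words with "person".
    return " ".join("person" if w in biased_words else w for w in prompt.split())

print(bias_swap("she met a muslim"))           # -> he met a christian
print(prepend_adjective("the woman arrived"))  # -> the successful woman arrived
print(neutralize("the sikh arrived"))          # -> the person arrived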
Run this command to install PyDebiaser:
!git clone https://github.com/daniel-saeedi/PyDebiaser.git
!cd PyDebiaser && pip install .
Run this code to debias:
from pydebiaser.SelfDebias import SelfDebias
debiaser = SelfDebias(model_name_or_path, bias_type)
- model_name_or_path: huggingface model name or path to model
- bias_type: one of "gender", "religion", or "race".
Note: At this moment, Self-Debias has been implemented for GPT-2 only. You can use any pre-trained GPT-2 model.
Finally, generate text using the following code:
debiaser.generate(prompt, max_len)
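For example, a minimal end-to-end sketch (the checkpoint name, prompt, and generation length below are illustrative, not prescribed by the package):

from pydebiaser.SelfDebias import SelfDebias

# Any pretrained GPT-2 checkpoint works; "gpt2" is just an example.
debiaser = SelfDebias("gpt2", "gender")

# Generate up to 50 tokens of debiased text from an example prompt.
print(debiaser.generate("The woman worked as", 50))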
Ravfogel et al. (2020) propose INLP, a projection-based debiasing technique similar to SentenceDebias. Roughly, INLP debiases a model’s representations by training a linear classifier to predict the protected property you want to remove (e.g., gender) from the representations. Then, representations can be debiased by projecting them into the nullspace of the learnt classifier’s weight matrix, effectively removing all of the information the classifier used to predict the protected attribute from the representation. This process can then be applied iteratively to debias the representation.
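To make the projection step concrete, here is a minimal NumPy sketch of a single INLP iteration; the representation and classifier weights are made up for illustration, and PyDebiaser performs all of this internally:

import numpy as np

# Hypothetical setup: x is one sentence representation, W holds the
# weights of a linear probe trained to predict the protected attribute.
rng = np.random.default_rng(0)
x = rng.normal(size=768)
W = rng.normal(size=(2, 768))

# Nullspace projection P = I - W^T (W W^T)^-1 W removes every direction
# the probe used to predict the attribute.
P = np.eye(768) - W.T @ np.linalg.inv(W @ W.T) @ W
x_debiased = P @ x

# The probe can no longer recover the attribute from x_debiased:
print(W @ x_debiased)  # approximately the zero vector

Iterating this procedure (retrain the probe, project again) removes residual bias directions, as described above.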
Run this code to debias:
from pydebiaser.INLP import INLP
debiaser = INLP(model, model_name_or_path, bias_types)
model = debiaser.debias(save=True, path='/content/result/debiased/')
- model: one of "BertModel", "AlbertModel", "RobertaModel", or "GPT2Model".
- model_name_or_path: huggingface model name or path to model
- bias_types: a list of the biases you want to remove, for instance ["gender", "religion", "race"].
Note: You can debias any pretrained BERT-, ALBERT-, RoBERTa-, or GPT-2-like model.
- The debias method returns the debiased model.
- Optional: the save and path parameters are used for saving the model.
Example:
from pydebiaser.INLP import INLP
debiaser = INLP('BertModel', 'bert-base-uncased', ['gender'])
model_debiased = debiaser.debias(save=True, path='/content/result/debiased/')
S. Liang et al. (2020) extend Hard-Debias, a word embedding debiasing technique proposed by Bolukbasi et al. (2016) to sentence representations. SentenceDebias is a projection-based debiasing technique that requires the estimation of a linear subspace for a particular type of bias. Sentence representations can be debiased by projecting onto the estimated bias subspace and subtracting the resulting projection from the original sentence representation.
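The projection itself is straightforward; here is a minimal NumPy sketch with a made-up representation and bias subspace (in practice the bias subspace is estimated from data, and PyDebiaser handles this internally):

import numpy as np

# Hypothetical setup: h is a sentence representation, V an orthonormal
# basis of the estimated bias subspace (3 directions here).
rng = np.random.default_rng(0)
h = rng.normal(size=768)
V, _ = np.linalg.qr(rng.normal(size=(768, 3)))

# Project h onto the bias subspace and subtract the projection.
h_bias = V @ (V.T @ h)
h_debiased = h - h_bias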
Run this code to debias:
from pydebiaser.SentDebias import SentDebias
debiaser = SentDebias(model, model_name_or_path, bias_types)
model = debiaser.debias(save=True, path='/content/result/debiased/')
- model: one of "BertModel", "AlbertModel", "RobertaModel", or "GPT2Model".
- model_name_or_path: huggingface model name or path to model
- bias_types: a list of the biases you want to remove, for instance ["gender", "religion", "race"].
Note: You can debias any pretrained BERT-, ALBERT-, RoBERTa-, or GPT-2-like model.
- The debias method returns the debiased model.
- Optional: the save and path parameters are used for saving the model.
Example:
from pydebiaser.SentDebias import SentDebias
debiaser = SentDebias('BertModel', 'bert-base-uncased', ['gender'])
model = debiaser.debias(save=True, path='/content/result/debiased/')
Top-k is an inference-time text generation method that generates k different texts, each up to l tokens long, from a prompt, and then selects the least toxic one using Detoxify. The generator is then prompted with the selected text from the previous step to generate the next k completions of the same length, and again the best completion is chosen. This process continues until the desired text length is reached. More importantly, unlike the other techniques, the Top-k debiaser does not reduce the language model score.
Run this code to debias:
from pydebiaser.TopK import TopK
debiaser = TopK(model, tokenizer)
debiaser.generate(prompt, max_len, step=l, number_of_samples=k)  # By default, l = 25 and k = 10
Example:
from pydebiaser.TopK import TopK
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

debiaser = TopK(model, tokenizer)
debiaser.generate("Two muslims walked into a bar", 200, step=25, number_of_samples=10)
Top-k output:
Two muslims walked into a bar and were asked about their religion, and the majority, only half were told to "wait and see" when asked how they had learned to pray.
In May last year, a British woman claimed that she had learned English while she was a teenager after having her mother, a young man, take her to a mosque for training.
She told the BBC that she "had no idea what was going on", and that she was shocked by how "very few people knew that there was a difference between what they did and what we did."
A separate study found that more people in the country knew that their religion was 'a bit different' than other religions, and that the number of people with different religious identities jumped from 50,000 to almost 50,000 in four years.
The findings came from a survey of 3,800 people aged between 15 and 65 at London's West End.
They showed that the number
Original GPT-2 generation:
Two muslims walked into a bar and had a drink, but they did not return."

A man approached the bar's door.
The man then told the Muslim youth that he would leave if he went in. The youth later returned and told him to stay.
The young man reported him to the police. He said he saw four people walking past and that they were Arabs. When he asked where the Arabs were, he said people were sitting between them. He said he did not know where those Arabs were but would return.
He told the police he found the Arabs in his car and was able to get to them in a car carrying three people. He told the police that when he reached the area he saw that the "residents" were walking past and also that he heard the Palestinians screaming from other Palestinian areas over the loudspeaker. He also said he looked at the men and saw several Muslims praying and saying they would give him the money.

As reported by
Run this code to debias with Bias-Swapping:
from pydebiaser.BiasSwapping import BiasSwapping
debiaser = BiasSwapping(model, tokenizer)
debiaser.generate(prompt, max_len)
Run this code to debias with Character-Neutralization:
from pydebiaser.CharacterNeutralization import CharacterNeutralization
debiaser = CharacterNeutralization(model, tokenizer)
debiaser.generate(prompt, max_len)
Run this code to debias with Prepend-Adjective:
from pydebiaser.PrependAdj import PrependAdj
debiaser = PrependAdj(model, tokenizer)
debiaser.generate(prompt, max_len)
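All three prompt-level debiasers share the same interface, so a minimal end-to-end run (with an illustrative prompt and generation length) looks like:

from transformers import AutoTokenizer, AutoModelForCausalLM
from pydebiaser.BiasSwapping import BiasSwapping
from pydebiaser.CharacterNeutralization import CharacterNeutralization
from pydebiaser.PrependAdj import PrependAdj

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Two muslims walked into a bar"
for Debiaser in (BiasSwapping, CharacterNeutralization, PrependAdj):
    debiaser = Debiaser(model, tokenizer)
    print(Debiaser.__name__, debiaser.generate(prompt, 50))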