Quantifying Gendered Names Using AI

Names are all over the Internet, and as such language models have picked up on them. Consider the following masked sentence1:

In December 2020, Timnit Gebru was the center of a public controversy stemming from [MASK] abrupt and contentious departure from Google as technical co-lead of the Ethical Artificial Intelligence Team.

The missing word is her, because Timnit Gebru uses she/her/hers pronouns. For those of us unfamiliar with Eritrean names, correctly discerning the pronoun is difficult. RoBERTa also finds this difficult—the model assigns a 77% chance of his as the completion and only a 22% chance of her. (Language models train on publicly available English, meaning that, like me, they're far less familiar with Eritrean names than common American ones.)
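If you want to poke at this yourself, one way to run this kind of query is the Hugging Face transformers fill-mask pipeline. Here's a minimal sketch; the specific checkpoint (roberta-large) is my assumption, and the exact probabilities will shift with model size:

```python
from transformers import pipeline

# Load a masked-language-model head. The checkpoint here is an
# assumption; any RoBERTa variant works, with different numbers.
unmasker = pipeline("fill-mask", model="roberta-large")

sentence = (
    "In December 2020, Timnit Gebru was the center of a public controversy "
    "stemming from <mask> abrupt and contentious departure from Google as "
    "technical co-lead of the Ethical Artificial Intelligence Team."
)

# Score only the two pronouns we care about. RoBERTa's byte-pair
# vocabulary is space-sensitive, so the leading spaces are deliberate.
for prediction in unmasker(sentence, targets=[" his", " her"]):
    print(f"{prediction['token_str'].strip()}: {prediction['score']:.2%}")
```

Restricting the scores with targets avoids wading through the full vocabulary and reads the probabilities for his and her straight off the model's softmax.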

Here lies the interesting challenge: when trying to ascertain the completion here, we might consciously avoid gendered stereotyping, even if we would unconsciously associate certain professions or actions with men and women. We know that, logically, the only clue we ought to use here is Timnit Gebru's name.

RoBERTa has no such qualms. Consider the predictions for a sentence that changes out the job:

In December 2020, Timnit Gebru was the center of a public controversy stemming from [MASK] abrupt and contentious departure from Google as an administrative assistant.

Now RoBERTa assigns a 62% chance of her and a 36% chance of his. This kind of stereotyping doesn't overpower strongly gendered names, but it will definitely confound the analysis of names like Timnit or Sam.
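Reproducing the comparison is just a matter of swapping in the new job title and re-running the same query (again assuming the roberta-large checkpoint from the sketch above):

```python
# Reuses the unmasker defined in the earlier sketch.
sentence = (
    "In December 2020, Timnit Gebru was the center of a public controversy "
    "stemming from <mask> abrupt and contentious departure from Google as "
    "an administrative assistant."
)
for prediction in unmasker(sentence, targets=[" his", " her"]):
    print(f"{prediction['token_str'].strip()}: {prediction['score']:.2%}")
```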

If we want to see what RoBERTa thinks of names like Timnit, we need to find a way of prompting it that avoids any overt biases. (We can look at those another day.)


  1. Source: Wikipedia, modified to include Gebru's first name (not that it changes the results substantially).
