Gender in AI Language Models: What's In a Name?

From Fill-In-The-Blank to Detective Stories

We're not using language models to generate text, but since that's their most common use case, I'll take a second to explain how we get from filling in blanks to generating stories.

The basic idea is to add in one word at a time, using the probabilities as a guide to select likely words. You can see a possible start to that detective story: after each new word, we simply feed the new text back into the language model and ask for the next word.¹
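
As a rough sketch of that loop, here is a toy version in Python. The `TOY_MODEL` table, its probabilities, and the function names are all invented for illustration; a real language model such as OPT conditions on the whole text so far, not only the last word.

```python
# Hypothetical stand-in for a language model: maps the most recent word to
# made-up next-word probabilities. A real model looks at the entire text.
TOY_MODEL = {
    "the": {"detective": 0.6, "door": 0.3, "rain": 0.1},
    "detective": {"opened": 0.5, "the": 0.3, "sighed": 0.2},
    "opened": {"the": 0.9, "slowly": 0.1},
    "door": {"slowly": 0.7, "the": 0.3},
}


def next_word_probs(text):
    """Return next-word probabilities for the text so far (toy lookup)."""
    last_word = text.split()[-1].lower()
    # Fall back to "the" for words the toy table does not know about.
    return TOY_MODEL.get(last_word, {"the": 1.0})


def generate_greedy(prompt, n_words):
    """Append n_words to the prompt, always taking the most likely next word."""
    text = prompt
    for _ in range(n_words):
        probs = next_word_probs(text)
        best_word = max(probs, key=probs.get)
        # Feed the extended text back in on the next iteration.
        text = text + " " + best_word
    return text


print(generate_greedy("The", 6))
```

With this toy table, `generate_greedy("The", 6)` returns "The detective opened the detective opened the".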

Simply picking the most likely word each time is not a great way of generating varied or interesting text: note how our model just repeats the information it was given instead of venturing to add new details. ChatGPT and similar projects use more complicated sampling processes² that create more varied and interesting responses.
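
One common ingredient of such sampling processes is temperature sampling, sketched below. This is only an illustration of the general idea, not a description of what ChatGPT actually does; the example probabilities and the function names are invented.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Reshape a distribution: p_i ** (1 / temperature), then renormalize.

    Assumes every probability is greater than zero and temperature > 0.
    Temperatures below 1 sharpen the distribution toward the top word;
    temperatures above 1 flatten it toward uniform.
    """
    scaled = {w: math.exp(math.log(p) / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}


def sample_word(probs, temperature=1.0, rng=random):
    """Draw one word at random, weighted by the temperature-adjusted probabilities."""
    adjusted = apply_temperature(probs, temperature)
    words = list(adjusted)
    weights = [adjusted[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Made-up next-word probabilities for illustration.
example_probs = {"detective": 0.6, "door": 0.3, "rain": 0.1}

rng = random.Random(0)  # fixed seed so repeated runs draw the same words
print([sample_word(example_probs, temperature=1.5, rng=rng) for _ in range(5)])
```

A very low temperature behaves almost like always picking the most likely word, while a high one gives unlikely words a real chance, which is roughly the trade-off between boring and manic output.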

These details matter a lot for creating the illusion of a truly intelligent AI that doesn't endlessly repeat itself and is neither boring nor manic. If you're not making a chatbot, however, the minutiae of sampling are unimportant, so we'll move on.

¹ This is a little different from masking words in the middle of a sentence, and so RoBERTa doesn't do a great job at this task. I'm using OPT as an alternative.
² Here's a technical overview of different text generation methods if you're interested.
