How Language Models (Generally) Work
Before we can understand why ChatGPT decided to make Sam Striker a he, we need to cover how these models learn in the first place.1
Let's say you want to train a model to understand English. You'll need an enormous quantity of English text. Lucky for you, that's easy to find: you might start with all of English Wikipedia and a digitized book collection. If you need even more data, you can crawl the Web and shovel in as many different websites as you can find.2
Then, to teach your model how English is structured, you train it to fill in a missing word in a passage drawn from that training data. For example, here's a sentence from the featured article on Wikipedia the day I write this:
After independently releasing three albums herself between 2009 and 2010, Meghan Trainor started writing songs for other singers.
Let's take a random word3 from this sentence and mask it:
After independently releasing three [MASK] herself between 2009 and 2010, Meghan Trainor started writing songs for other singers.
Now, our model tries to guess what the missing word is. By doing this over and over again, across billions and billions of words, the model pieces together a very detailed understanding of how English works. The cherry on top is that, because the model is learning from real English, it's slowly learning about the world too. There's no grammar rule that will tell you what the masked token is: the most helpful information is that Meghan Trainor is a singer-songwriter.
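To see what this fill-in-the-blank game looks like in practice, here's a minimal sketch. It assumes the Hugging Face transformers library and the small, publicly available bert-base-uncased checkpoint, which is a masked language model you can run yourself; it is not the model behind ChatGPT, just an illustration of the same idea.

```python
# A minimal sketch: ask a small pretrained masked language model to fill in
# the blank. Assumes the Hugging Face `transformers` library and the public
# `bert-base-uncased` checkpoint (not the model behind ChatGPT).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

sentence = (
    "After independently releasing three [MASK] herself between 2009 and 2010, "
    "Meghan Trainor started writing songs for other singers."
)

# Print the model's top guesses for the masked word, with their probabilities.
for guess in fill_mask(sentence):
    print(f"{guess['token_str']:>10}  {guess['score']:.3f}")
```

A well-trained model puts most of its probability on words like "albums" here, precisely because it has absorbed both English grammar and a little real-world knowledge from its training data.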
1. This is the most common approach: there are others, but we won't have use for them right now.
2. You might ask, "What if we don't want our AI to learn from hate speech or broken English?" That's a hard problem: despite researchers' best attempts to filter out unwanted training data, almost any language model trained on hundreds of gigabytes of data will have at least some of that in there.
3. In truth, LMs use tokens, not words: for our purposes, there's no difference, but not all words are given their own dedicated code. Monopoly may be represented internally as mon-o-poly. Different LMs use different approaches (a quick sketch follows below).
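To make footnote 3 concrete, here's a small sketch, again assuming the Hugging Face transformers library, this time with the GPT-2 tokenizer; other models split words differently, and the exact pieces depend on the tokenizer's learned vocabulary.

```python
# A small illustration of subword tokenization, assuming the Hugging Face
# `transformers` library and the GPT-2 tokenizer (other models differ).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# A less common word may be broken into several subword pieces,
# while a very common word usually gets a single token.
print(tokenizer.tokenize("Monopoly"))
print(tokenizer.tokenize("the"))
```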