From Fill-In-The-Blank to Detective Stories
We're not using language models to generate text, but since that's the most common use case I'll take a second to explain how we get from filling in blanks to generating stories.
The basic idea is to add one word at a time, using the probabilities as a guide to select likely words. You can see a possible start to that detective story: after each new word, we simply feed the updated text back into the language model and ask for the next word.1
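That feedback loop can be sketched in a few lines. This is a toy illustration, not a real language model: the hypothetical `TOY_MODEL` lookup table stands in for a network like OPT, which would condition on the entire text so far rather than just the last word.

```python
# Toy "language model": maps the previous word to next-word probabilities.
# A real model conditions on the whole text so far; this hypothetical
# lookup table just illustrates the feed-it-back-in loop.
TOY_MODEL = {
    "the": {"detective": 0.6, "case": 0.4},
    "detective": {"examined": 0.7, "slept": 0.3},
    "examined": {"the": 0.9, "a": 0.1},
}

def next_word(context):
    """Pick the single most likely next word (greedy decoding)."""
    probs = TOY_MODEL.get(context, {})
    return max(probs, key=probs.get) if probs else None

def generate(start, n_words):
    """Repeatedly feed the text back in and append the predicted word."""
    words = [start]
    for _ in range(n_words):
        nxt = next_word(words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

print(generate("the", 3))  # → "the detective examined the"
```

Note that greedy decoding is deterministic: the same starting word always produces the same continuation, which is exactly the repetitiveness problem described next.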
Simply picking the most likely word each time is not a great way of generating varied or interesting text: note how our model is simply repeating the information it was given instead of venturing to add new details. ChatGPT and similar projects use more complicated sampling processes2 that create more varied and interesting responses.
These details matter a lot for creating the illusion of a truly intelligent AI that doesn't endlessly repeat itself and is neither boring nor manic. If you're not making a chatbot, however, the minutiae of sampling are unimportant, so we'll move on.
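To give a flavor of what those sampling processes tweak, here is a minimal sketch of one common knob, temperature sampling. This is an assumption-laden toy (the probability table and function names are invented for illustration): low temperature collapses toward always picking the most likely word, while higher temperature lets less likely words through.

```python
import math
import random

def sample_with_temperature(probs, temperature, rng):
    """Sample a next word from a probability table.

    Low temperature approaches greedy decoding; high temperature
    flattens the distribution and produces more varied choices.
    """
    words = list(probs)
    # Rescale log-probabilities by temperature, then renormalize.
    logits = [math.log(probs[w]) / temperature for w in words]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    return rng.choices(words, weights=weights, k=1)[0]

probs = {"detective": 0.6, "case": 0.3, "pigeon": 0.1}
rng = random.Random(0)

# Near-zero temperature: effectively always the most likely word.
cold = {sample_with_temperature(probs, 0.01, rng) for _ in range(20)}
print(cold)  # → {'detective'}

# Higher temperature: unlikely words start to appear.
warm = {sample_with_temperature(probs, 1.5, rng) for _ in range(200)}
print(warm)  # typically includes all three words
```

Real systems layer further tricks on top (top-k and nucleus sampling, repetition penalties), but they all share this basic idea of reshaping the model's probabilities before drawing from them.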
- This is a little different from masking words in the middle of a sentence, and so RoBERTa doesn't do a great job at this task. I'm using OPT as an alternative.↩
- Here's a technical overview of different text generation methods if you're interested.↩