Experimental Setup Validation
Let's verify that this approach yields meaningful information. To do that, we can compare against a more objective source: the Social Security Administration's database of names given to children in the United States. I've selected the 3,630 names that were given to more than 300 babies in at least one year. I then ran the above RoBERTa setup on each one: testing "I heard [MASK] name was Aaden", then "Aaliyah", and so on.
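To make the per-name loop concrete, here is a minimal sketch of how the prompts could be built and how two pronoun probabilities might be turned into a single "% female" score. The helper names `build_prompt` and `percent_female` are hypothetical, and the assumption that the score is the probability of "her" normalized against "his" is mine, not something the text above confirms.

```python
# Sketch only: helper names and the her/his normalization are assumptions.

# Placeholder as written in the text. RoBERTa's own tokenizer uses "<mask>",
# so a real run would pass that token instead.
MASK = "[MASK]"


def build_prompt(name: str, mask_token: str = MASK) -> str:
    """Fill a name into the prompt template quoted in the text."""
    return f"I heard {mask_token} name was {name}"


def percent_female(p_her: float, p_his: float) -> float:
    """Share of the her/his probability mass assigned to "her", in percent.

    Assumption: only these two pronouns are compared; any probability the
    model gives to other tokens is ignored.
    """
    total = p_her + p_his
    if total <= 0:
        raise ValueError("at least one probability must be positive")
    return 100.0 * p_her / total


# Build one prompt per name, in the same order the text describes.
names = ["Aaden", "Aaliyah"]
prompts = [build_prompt(name) for name in names]
for prompt in prompts:
    print(prompt)
```

Normalizing over just the two pronouns keeps the score comparable across names even when the model spreads probability over unrelated tokens, which is why I chose it for this sketch.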
As a way of verifying the setup, we can compare the probability RoBERTa assigns with the actual historical ratio. The correlation between the two is 90%, indicating that RoBERTa's probabilities really do correspond to some underlying intuition about the gender split of different names.[1]

[1] This is the main way I ranked different prompts and language models, incidentally.
Name | Girls | Boys | Total | % Girls | % Girls 2021 | RoBERTa % Female |
---|---|---|---|---|---|---|
Aaden | 5 | 5013 | 5018 | 0.1 | 0 | 5.14 |
Aaliyah | 98342 | 101 | 98443 | 99.9 | 100 | 97.4 |
Aarav | 0 | 6613 | 6613 | 0 | 0 | 5.87 |
Aaron | 4353 | 596930 | 601283 | 0.72 | 0.33 | 2.13 |
Abagail | 5798 | 0 | 5798 | 100 | 100 | 97.31 |
Abbey | 17406 | 35 | 17441 | 99.8 | 100 | 32.94 |
Abbie | 21794 | 330 | 22124 | 98.51 | 100 | 85.02 |
Abbigail | 11942 | 5 | 11947 | 99.96 | 100 | 78.63 |
Abby | 59990 | 181 | 60171 | 99.7 | 100 | 95.87 |
Abdiel | 0 | 6145 | 6145 | 0 | 0 | 5.28 |
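The comparison between the historical ratio and RoBERTa's score can be sketched as a correlation over the two columns. The snippet below assumes a Pearson correlation (the text does not say which coefficient was used) and runs only on the ten rows shown in the table, so its result need not match the figure reported for all 3,630 names. The function name `pearson` is my own.

```python
import math

# (name, historical % girls, RoBERTa % female), copied from the ten table rows.
rows = [
    ("Aaden", 0.1, 5.14),
    ("Aaliyah", 99.9, 97.4),
    ("Aarav", 0.0, 5.87),
    ("Aaron", 0.72, 2.13),
    ("Abagail", 100.0, 97.31),
    ("Abbey", 99.8, 32.94),
    ("Abbie", 98.51, 85.02),
    ("Abbigail", 99.96, 78.63),
    ("Abby", 99.7, 95.87),
    ("Abdiel", 0.0, 5.28),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and each column's sum of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


historical = [row[1] for row in rows]
roberta = [row[2] for row in rows]

r = pearson(historical, roberta)
print(f"Pearson r over {len(rows)} names: {r:.3f}")
```

Rows like Abbey, where the historical share is 99.8% but RoBERTa's score is 32.94%, are the ones that pull the coefficient down, so inspecting the largest gaps is a natural follow-up.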