The Shifting Meanings of AI-Generated Words

The study finds that semantic differentiation initially correlates with word frequency, following Martin's Law, up to approximately $10^{4}$ training steps. Beyond that point the behavior depends on scale: in smaller models the correlation ultimately collapses, while larger models maintain a stable frequency-specificity trade-off and exhibit a diverging polysemous word count, indicating a scale-dependent, catastrophic loss of semantic richness in the smaller models.
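
The article does not spell out the functional form of the frequency-meaning relationship it tests. As a rough illustration only, a relation of this kind is often written as a power law; the symbols below, including the exponent $\delta$, are assumptions for exposition, not values reported by the study:

$$ m(w) \;\propto\; f(w)^{\delta}, \qquad \delta > 0, $$

where $m(w)$ is the number of distinct senses attributed to word $w$ and $f(w)$ is its frequency. On this (assumed) reading, the reported "collapse" would correspond to the fitted $\delta$ becoming unstable or drifting toward zero in smaller models after roughly $10^{4}$ training steps, while staying stable in larger ones; the study's exact formulation may differ.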

This new research shows that as language models learn, the relationship between how often a word is used and how many meanings it acquires is not straightforward, defying a long-held linguistic principle.