How Attention Decays: A New Law of Language
![Attention mechanisms demonstrably align with linguistic structure, as evidenced by a correspondence between part-of-speech tags and attention weights: specifically, attention concentrates on nouns and verbs, suggesting the model prioritizes content words during processing and implying a hierarchical understanding of sentence construction.](https://arxiv.org/html/2603.04805v1/2603.04805v1/agf-image/pos.png)
Researchers are finding that the way language models focus on words isn’t random, but follows a predictable pattern reminiscent of gravity.
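If attention weight really falls off with token distance like an inverse power law, the decay exponent can be estimated with an ordinary linear fit in log-log space. The sketch below is purely illustrative: the attention values and the exponent are synthetic, not taken from the paper.

```python
# Illustrative only: if attention decayed with token distance roughly as
# a_d ~ d**(-alpha) (a "gravity-like" law), the exponent could be recovered
# by a linear fit in log-log space. Data and exponent here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
distance = np.arange(1, 65)                               # token distance d
attention = distance.astype(float) ** -1.5                # toy decay, alpha = 1.5
attention *= np.exp(rng.normal(0, 0.05, distance.size))   # multiplicative noise

# log a = -alpha * log d + const  =>  the slope of the log-log fit is -alpha
slope, intercept = np.polyfit(np.log(distance), np.log(attention), 1)
print(f"estimated decay exponent alpha ~ {-slope:.2f}")
```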
A new reinforcement learning framework uses collective human feedback expressed in natural language to dramatically improve the training of large AI models.

A new AI-driven approach focuses on anatomical structures to generate more accurate and detailed reports from computed tomography scans.

New research explores whether large language models can accurately identify and interpret the complex values embedded within qualitative interview data.

New research reveals how the popular ReLU activation function subtly influences the solutions found by gradient descent in high-dimensional neural networks.
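For context, the setting this refers to is plain gradient descent on a ReLU network. The toy sketch below sets up that regime with a one-hidden-layer network on synthetic high-dimensional data; it does not reproduce the paper's analysis of which solutions gradient descent selects.

```python
# Toy version of the setting described above (not the paper's analysis):
# a one-hidden-layer ReLU network trained with plain gradient descent on
# synthetic high-dimensional data. Which solution this procedure converges
# to is exactly what the paper studies.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                 # high-dimensional inputs
y = np.sign(X[:, 0])                           # simple synthetic labels

W = rng.normal(size=(50, 100)) / np.sqrt(50)   # hidden-layer weights
a = rng.normal(size=100) / np.sqrt(100)        # output weights
lr = 0.05

for _ in range(500):
    h = np.maximum(X @ W, 0.0)                 # ReLU activations
    residual = h @ a - y                       # squared-loss residual
    grad_a = h.T @ residual / len(y)
    grad_W = X.T @ ((residual[:, None] * a) * (h > 0)) / len(y)
    a -= lr * grad_a
    W -= lr * grad_W

print("final train MSE:", float(np.mean((np.maximum(X @ W, 0.0) @ a - y) ** 2)))
```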

Researchers are combining self-supervised learning with efficient model tuning to generate realistic maritime radio dialogues, overcoming the scarcity of real-world data.

Researchers have developed a novel loss function that improves forecasting accuracy by addressing inherent biases in how models predict patterns over time and space.
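As a rough illustration of the general idea (and not the paper's proposed loss), a forecasting loss can be reweighted over time and space so that systematically mispredicted horizons or regions are penalized more heavily; the weights below are arbitrary placeholders.

```python
# Generic illustration only (not the paper's loss): counter systematic over-
# or under-prediction at particular horizons or locations by reweighting the
# squared error over time and space. The weights here are placeholders.
import numpy as np

rng = np.random.default_rng(0)
pred = rng.normal(size=(4, 8, 8))              # (horizon, lat, lon) forecast
target = np.zeros_like(pred)                   # toy ground truth

time_w = np.linspace(1.0, 2.0, pred.shape[0])[:, None, None]  # upweight later horizons
space_w = np.ones(pred.shape[1:])                              # placeholder spatial weights

weighted_mse = float(np.mean(time_w * space_w * (pred - target) ** 2))
print("weighted spatiotemporal MSE:", weighted_mse)
```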

A novel framework leverages inexpensive labels and self-supervision to enhance the robustness and efficiency of surrogate-based optimization for complex problems.
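For orientation, a bare-bones surrogate-based optimization loop looks like the sketch below: fit an inexpensive surrogate to a few costly evaluations, then query where the surrogate predicts the minimum. The cheap-label and self-supervision components that the framework adds are not reproduced here.

```python
# Bare-bones surrogate-based optimization loop (for context only; the paper's
# cheap-label and self-supervised additions are not shown): fit a cheap
# surrogate to a few costly evaluations, then query its predicted minimum.
import numpy as np

def expensive_objective(x):                    # stand-in for a costly simulation
    return (x - 0.3) ** 2 + 0.1 * np.sin(20 * x)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=8)                  # initial expensive evaluations
y = expensive_objective(X)

for _ in range(10):
    coeffs = np.polyfit(X, y, deg=2)           # cheap quadratic surrogate
    candidates = np.linspace(0, 1, 201)
    x_next = candidates[np.argmin(np.polyval(coeffs, candidates))]
    X = np.append(X, x_next)                   # one new expensive evaluation
    y = np.append(y, expensive_objective(x_next))

print("best x found:", float(X[np.argmin(y)]), "objective:", float(y.min()))
```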
![A bidirectional curriculum, enhanced by multi-agent interactions, demonstrably improves data efficiency in mathematical reasoning tasks by strategically interleaving problem-solving and knowledge reinforcement, a process formalized as [latex] \mathcal{L} = \sum_{t=1}^{T} \mathbb{E}_{\tau_t \sim \pi} [r(s_t, a_t)] [/latex], where [latex] \mathcal{L} [/latex] represents the learning objective, [latex] \tau_t [/latex] a trajectory, and [latex] r [/latex] the reward function.](https://arxiv.org/html/2603.05120v1/2603.05120v1/x2.png)
A new framework uses a dynamic, agent-based approach to carefully order math problems, dramatically improving how efficiently artificial intelligence learns to reason.
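The objective in the figure caption is simply the expected cumulative reward under the policy. A minimal Monte Carlo estimate of that quantity, with a placeholder environment and policy rather than the paper's multi-agent curriculum, looks like this:

```python
# Monte Carlo estimate of sum_t E_{tau ~ pi}[r(s_t, a_t)] for a toy
# environment and policy (placeholders, not the paper's multi-agent setup).
import random

def rollout(policy, horizon=10):
    """Sample one trajectory and return its cumulative reward."""
    total, state = 0.0, 0
    for _ in range(horizon):
        action = policy(state)
        total += 1.0 if action == state % 2 else 0.0   # toy reward r(s_t, a_t)
        state += 1
    return total

def expected_return(policy, n_samples=2000):
    return sum(rollout(policy) for _ in range(n_samples)) / n_samples

random.seed(0)
uniform_policy = lambda s: random.choice([0, 1])
print("estimated objective:", expected_return(uniform_policy))
```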

A new deep generative framework dramatically speeds up Bayesian analysis of complex datasets, unlocking more accurate insights from the cosmic microwave background.