Beyond Online Learning: Building Powerful Research Agents with Synthesized Data
New research shows that deep research agents can be trained effectively offline on synthesized data. This challenges the conventional reliance on costly, complex online reinforcement learning.

![AR-Omni unifies textual, speech, and visual information by embedding these diverse inputs into a shared representational space, enabling a single autoregressive decoder to generate a cohesive token stream from a joint vocabulary encompassing [latex]T[/latex] (text), [latex]S[/latex] (speech), and [latex]I[/latex] (image) modalities.](https://arxiv.org/html/2601.17761v1/x1.png)
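The caption describes a single decoder that emits one token stream drawn from a joint vocabulary covering [latex]T[/latex], [latex]S[/latex], and [latex]I[/latex]. The sketch below shows one common way to build such a vocabulary: give each modality a contiguous block of global ids by offsetting its local ids. This is an assumption for illustration. The caption does not say how AR-Omni lays out its vocabulary, and the class name `JointVocab` and all sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointVocab:
    """Hypothetical joint vocabulary: T ids first, then S, then I (sizes are illustrative)."""
    text_size: int
    speech_size: int
    image_size: int

    @property
    def offsets(self) -> dict:
        # Each modality occupies a contiguous block of global ids.
        return {
            "T": 0,
            "S": self.text_size,
            "I": self.text_size + self.speech_size,
        }

    @property
    def size(self) -> int:
        return self.text_size + self.speech_size + self.image_size

    def _limit(self, modality: str) -> int:
        return {"T": self.text_size, "S": self.speech_size, "I": self.image_size}[modality]

    def encode(self, modality: str, local_id: int) -> int:
        """Map a modality-local token id to a global id in the shared vocabulary."""
        limit = self._limit(modality)
        if not 0 <= local_id < limit:
            raise ValueError(f"{modality} id {local_id} out of range [0, {limit})")
        return self.offsets[modality] + local_id

    def decode(self, global_id: int) -> tuple:
        """Recover (modality, local id) from a global id emitted by the decoder."""
        if not 0 <= global_id < self.size:
            raise ValueError(f"global id {global_id} out of range [0, {self.size})")
        # Check the block with the highest offset first.
        for modality in ("I", "S", "T"):
            if global_id >= self.offsets[modality]:
                return modality, global_id - self.offsets[modality]
        raise AssertionError("unreachable: T starts at offset 0")


vocab = JointVocab(text_size=100, speech_size=50, image_size=30)
stream = [vocab.encode("T", 5), vocab.encode("S", 7), vocab.encode("I", 2)]
print(stream)                              # [5, 107, 152]
print([vocab.decode(t) for t in stream])   # [('T', 5), ('S', 7), ('I', 2)]
```

Because every modality lives in one id space, a single softmax over `vocab.size` outputs can choose a text, speech, or image token at any position. Decoding the global id then tells the system which modality-specific detokenizer to route it to.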

![The study investigates how language-model-based agents can be augmented with external oracles ([latex]\mathcal{O}^{\text{state}}[/latex] for summarizing the state, [latex]\mathcal{O}^{\text{plan}}[/latex] for hinting at waypoints, and [latex]\mathcal{O}^{\text{history}}[/latex] for rewriting task descriptions) to navigate multi-turn tasks. The oracles prune historical context, enabling the agent to make decisions within a Markov decision process independent of prior steps.](https://arxiv.org/html/2601.16649v1/x1.png)
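To make the idea of oracle-pruned context concrete, here is a toy sketch. The state, plan, and history oracles compress the full action history into a small context, and the policy decides from that context alone, so each decision depends only on the current summarized state (the Markov property). The task (walking along a number line to a goal), the function names, and the `stride` parameter are all hypothetical. The excerpt does not specify how the paper implements its oracles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    task: str      # task description as rewritten by the history oracle
    state: int     # current state as summarized by the state oracle
    waypoint: int  # next waypoint as hinted by the plan oracle


def state_oracle(start: int, history: list) -> int:
    """Stand-in for O^state: collapse the full action history into the current position."""
    return start + sum(history)


def plan_oracle(state: int, goal: int, stride: int = 3) -> int:
    """Stand-in for O^plan: hint a waypoint at most `stride` steps closer to the goal."""
    if goal >= state:
        return min(goal, state + stride)
    return max(goal, state - stride)


def history_oracle(state: int, goal: int) -> str:
    """Stand-in for O^history: restate the task so it mentions only the remaining work."""
    return f"move from {state} to {goal}"


def policy(ctx: Context) -> int:
    """Markov policy: sees only the compact context, never the raw history."""
    return 1 if ctx.waypoint > ctx.state else -1


def run_episode(start: int, goal: int, max_steps: int = 50) -> list:
    """Roll out the policy; the growing history is read only by the state oracle."""
    history = []
    for _ in range(max_steps):
        state = state_oracle(start, history)
        if state == goal:
            break
        ctx = Context(history_oracle(state, goal), state, plan_oracle(state, goal))
        history.append(policy(ctx))
    return history


print(run_episode(2, 7))  # [1, 1, 1, 1, 1]
print(run_episode(5, 1))  # [-1, -1, -1, -1]
```

The design point the sketch illustrates is that the context passed to `policy` stays the same size however long the episode runs. In an LLM agent, that is what keeps the prompt from growing with every turn. The toy policy does not read the rewritten `task` string, but it is carried in the context to mirror the three-oracle setup in the caption.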