Echoes of Training: Unmasking Data Used to Build AI Models
![The method quantifies gradient deviation to assess the sensitivity of a model’s output to perturbations in its input, effectively measuring the extent to which small changes can induce significant alterations, a principle formalized as [latex] \delta y = \frac{\partial y}{\partial x} \delta x [/latex], thereby providing a robust indicator of model stability and reliability.](https://arxiv.org/html/2603.04828v1/2603.04828v1/x6.png)
Researchers have developed a new technique to identify whether specific text samples were used in the pre-training of large language models.
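The figure caption formalizes the idea as a first-order sensitivity relation, [latex] \delta y = \frac{\partial y}{\partial x} \delta x [/latex]. As a toy illustration (not the paper's actual detection method), the sketch below estimates the derivative of a scalar function by central finite differences and uses it to predict how much the output shifts under a small input perturbation; the function `f` is a made-up stand-in for a model's output:

```python
def sensitivity(f, x, eps=1e-6):
    """Estimate dy/dx at x via central finite differences."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def predicted_output_change(f, x, delta_x):
    """First-order estimate of the output shift: delta_y ~ (dy/dx) * delta_x."""
    return sensitivity(f, x) * delta_x

# Toy stand-in for a model's scalar output as a function of one input.
f = lambda x: x ** 2

x0, dx = 3.0, 0.01
approx = predicted_output_change(f, x0, dx)  # dy/dx at 3 is 6, so ~0.06
actual = f(x0 + dx) - f(x0)                  # true change in output
```

For small perturbations the linear estimate closely tracks the true change; a large gap between the two would indicate strong local curvature, i.e. an input region where the model's output is unstable.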
New research reveals that carefully examining the internal layers of Vision Transformers, specifically within their feedforward networks, offers a powerful approach to detecting data that falls outside a model’s training distribution.
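One common baseline for scoring out-of-distribution inputs from intermediate activations (a standard technique, not necessarily the method in the cited work) is to fit a Gaussian to in-distribution feature vectors and score new samples by Mahalanobis distance. A minimal sketch, with random vectors standing in for feedforward-layer activations:

```python
import numpy as np

def fit_gaussian(feats):
    """Fit mean and (regularized) inverse covariance of in-distribution features."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def ood_score(x, mu, prec):
    """Squared Mahalanobis distance: larger means more out-of-distribution."""
    d = x - mu
    return float(d @ prec @ d)

rng = np.random.default_rng(0)
# Stand-in for feedforward-network activations collected on training data.
train_feats = rng.normal(0.0, 1.0, size=(500, 4))
mu, prec = fit_gaussian(train_feats)

in_dist_score = ood_score(rng.normal(0.0, 1.0, size=4), mu, prec)
far_out_score = ood_score(np.full(4, 8.0), mu, prec)  # clearly atypical point
```

Points drawn from the same distribution as the training features receive low scores, while a vector far from the fitted mean scores orders of magnitude higher, so a simple threshold on the score can flag out-of-distribution inputs.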

New research explores how artificial intelligence can optimize pricing and vehicle distribution when multiple companies operate competing on-demand mobility services.

A new study explores how to refine artificial intelligence models to better identify and combat online hate speech, addressing challenges like limited data and nuanced language.

A new study investigates how well deep learning models actually capture interactions between vessels when predicting their future paths.
![A hierarchical large auto-bidding model undergoes a two-stage training process: first, [latex]LBM-Act[/latex] learns to fuse linguistic guidance with decisional pathways through a dual embedding mechanism, and subsequently, [latex]LBM-Think[/latex] is refined via group relative-Q policy optimization, allowing the system to evolve beyond its initial parameters and adapt its bidding strategies.](https://arxiv.org/html/2603.05134v1/2603.05134v1/x1.png)
A new hierarchical model combines the power of large language models with reinforcement learning to create more effective and adaptable automated bidding strategies.

A new framework leverages spatial transformer networks to address spectral shifts in X-ray photoelectron spectroscopy, enabling more accurate and reliable automated data analysis.

As deepfake technology becomes increasingly sophisticated, a new evaluation reveals the limitations of current tools and the need for a combined approach to verification.

Researchers have developed a self-evolving machine learning framework capable of autonomously generating high-performance algorithms for predicting future trends.
As advertising increasingly leverages large language models, researchers are investigating how reliably we can identify promotional content embedded within AI-generated text.