The Deep Learning Scaling Puzzle: Why Bigger Isn’t Always Better
![Internal feature learning in deep residual networks collapses with increasing depth, at a rate of [latex] 1/\sqrt{L} [/latex], but this degradation is rectified by a depth-aware learning rate, [latex] \eta_1 = \eta_c n \sqrt{L} [/latex], which restores active learning across layers and enables consistent hyperparameter transfer and improved performance, as demonstrated by lower training and testing losses and higher accuracy across varying network depths and widths.](https://arxiv.org/html/2512.21075v1/figures/Vanish_resnet_performence_acc_loss.png)
New research reveals how the dynamics of feature learning in deep neural networks explain both the successes and limitations of simply scaling up model size.
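The caption's depth-aware scaling rule can be made concrete with a short sketch. The snippet below applies [latex] \eta_1 = \eta_c n \sqrt{L} [/latex] as a first-layer learning rate in a toy residual network, assuming [latex] \eta_c [/latex] is a base rate, [latex] n [/latex] the layer width, and [latex] L [/latex] the number of residual blocks; the specific values and the choice to scale only the first layer's rate are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Illustrative settings for the caption's formula eta_1 = eta_c * n * sqrt(L);
# these numbers are placeholders, not values from the paper's experiments.
eta_c = 1e-3          # base learning rate
n = 256               # layer width
L = 32                # number of residual blocks

class ResidualBlock(nn.Module):
    """Toy residual block standing in for the paper's deep residual network."""
    def __init__(self, width):
        super().__init__()
        self.fc = nn.Linear(width, width)

    def forward(self, x):
        return x + torch.relu(self.fc(x))

model = nn.Sequential(
    nn.Linear(10, n),                       # first layer
    *[ResidualBlock(n) for _ in range(L)],  # L residual blocks
    nn.Linear(n, 1),                        # readout
)

# Depth-aware first-layer rate per eta_1 = eta_c * n * sqrt(L);
# the remaining layers keep the base rate in this sketch.
eta_1 = eta_c * n * (L ** 0.5)
optimizer = torch.optim.SGD([
    {"params": model[0].parameters(), "lr": eta_1},
    {"params": model[1:].parameters(), "lr": eta_c},
])
```

Whether the remaining layers keep the base rate or receive their own depth-dependent scaling is a detail the figure alone does not settle; the parameter groups above are the natural place to change that.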
![The system iteratively refines node descriptions within a closed loop, leveraging a graph neural network (GNN) to provide task feedback and a model-conditioned memory to retrieve relevant in-graph exemplars, guiding a large language model (LLM) to update node semantics before these are fed back into the GNN for continuous improvement.](https://arxiv.org/html/2512.21106v1/x2.png)
A new approach leverages the power of large language models to refine the semantic understanding of nodes within graph structures, leading to improved performance and adaptability.
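The figure's closed loop can be read as a simple score-retrieve-refine cycle. The sketch below mimics that structure with stub components; `gnn_feedback`, `retrieve_exemplars`, and `llm_refine` are hypothetical placeholders standing in for the paper's GNN, model-conditioned memory, and LLM calls, not its actual interfaces.

```python
# Minimal sketch of the closed refinement loop from the figure: the GNN scores
# the current node descriptions, a memory retrieves in-graph exemplars, and an
# LLM rewrites the weakest descriptions. All three components are stubs here.

def gnn_feedback(descriptions):
    """Stand-in for the GNN: return a per-node task score (higher is better)."""
    return {node: len(text) / 100.0 for node, text in descriptions.items()}

def retrieve_exemplars(node, memory, k=2):
    """Stand-in for the model-conditioned memory: return up to k stored exemplars."""
    return memory.get(node, [])[:k]

def llm_refine(text, exemplars):
    """Stand-in for the LLM call: fold retrieved context into the description."""
    context = "; ".join(exemplars)
    return f"{text} (refined with: {context})" if context else text

def refine_loop(descriptions, memory, rounds=3, threshold=0.5):
    for _ in range(rounds):
        scores = gnn_feedback(descriptions)              # task feedback
        for node, score in scores.items():
            if score < threshold:                        # refine only weak nodes
                exemplars = retrieve_exemplars(node, memory)
                descriptions[node] = llm_refine(descriptions[node], exemplars)
        # Updated descriptions feed back into the GNN on the next round.
    return descriptions

descriptions = {"n1": "a paper node", "n2": "an author node"}
memory = {"n1": ["cites n2", "topic: graph learning"]}
print(refine_loop(descriptions, memory))
```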
![Combinatorial optimization problems defined on graph structures encompass a diverse range of challenges, fundamentally categorized by constraints on node and edge variables, such as maximizing flow through a network [latex] G = (V, E) [/latex], minimizing the cost of traversing a graph, or satisfying complex relationships between interconnected elements, and ultimately requiring algorithms to navigate this landscape of possibilities and identify provably optimal solutions.](https://arxiv.org/html/2512.20915v1/Classification_of_graph_COPs.png)
Researchers have developed a framework to predict how challenging a graph-based problem will be, offering insights into its inherent complexity.
![An agent’s adaptability presents a trade-off between responsiveness and stability, as demonstrated by a parameter [latex]\gamma[/latex] influencing its learning rate; a low [latex]\gamma[/latex] enables rapid adaptation to environmental shifts but introduces noise, while a high [latex]\gamma[/latex] prioritizes stability at the cost of slower adaptation, even relative to a static agent, due to its extended effective memory horizon of [latex]N_{eq}=1000[/latex] over [latex]t=500[/latex] time steps.](https://arxiv.org/html/2512.20884v1/images/experimentB-1.png)
New research proposes a framework for artificial intelligence that actively seeks out and verifies information, overcoming inherent limitations in its understanding of the world.
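The responsiveness-versus-stability trade-off in the caption is easy to reproduce with a recency-weighted estimator, assuming [latex]\gamma[/latex] acts as a discount whose effective memory horizon is roughly [latex]1/(1-\gamma)[/latex]; the toy tracker below illustrates the described behavior and is not the paper's agent.

```python
import random

# Recency-weighted running estimate of a drifting signal. Here gamma is a
# discount: the update is x_hat += (1 - gamma) * (obs - x_hat), so the
# effective memory horizon is roughly 1 / (1 - gamma). This is an assumption
# used to illustrate the caption's trade-off, not the paper's setup.
def track(gamma, steps=500, shift_at=250, seed=0):
    rng = random.Random(seed)
    x_hat, errors = 0.0, []
    for t in range(steps):
        target = 0.0 if t < shift_at else 5.0      # environmental shift
        obs = target + rng.gauss(0.0, 1.0)         # noisy observation
        x_hat += (1.0 - gamma) * (obs - x_hat)     # discounted update
        errors.append(abs(x_hat - target))
    return sum(errors) / len(errors)

# Low gamma adapts quickly after the shift but is noisy; high gamma is smooth
# but lags, since gamma = 0.999 implies a memory of ~1000 steps, longer than the run.
for gamma in (0.9, 0.99, 0.999):
    print(f"gamma={gamma}: mean |error| = {track(gamma):.2f}")
```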
![The proposed AMPEND-LS framework leverages a learned stiffness parameterization to achieve robust and adaptable manipulation, effectively balancing positional accuracy with dynamic response through the optimization of the [latex]L_s[/latex] loss, a metric quantifying the trade-off between trajectory tracking and energy expenditure, and ensuring stable, efficient control across diverse interaction scenarios.](https://arxiv.org/html/2512.21039v1/x1.png)
Researchers have developed a new artificial intelligence system that leverages multiple sources of information and simulated personas to identify fake news more accurately and to explain the reasoning behind its detections.

Researchers are leveraging the power of artificial intelligence to improve the detection of rare diseases in chest X-rays, addressing a critical challenge in medical imaging.

As social media conversations evolve, sentiment analysis models can quickly become unreliable, and this research details a novel method for monitoring performance without retraining.
New research tackles the challenge of reliably evaluating and improving the resilience of deep learning models against sophisticated adversarial attacks.
A new approach using deep symbolic regression is revealing the underlying equations that govern how defects interact within atomically thin materials.
![A framework assesses language model reliability by extracting latent states from a frozen Qwen2.5-7B-Instruct model and computing hallucination probabilities with neural network probes, enabling real-time detection of fabricated content as the system processes each token.](https://arxiv.org/html/2512.20949v1/x2.png)
Researchers have developed a novel method to detect when large language models are fabricating information, moving beyond simple accuracy metrics.
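The probing setup in the figure is straightforward to sketch: a small neural probe maps each token's hidden state to a hallucination probability. In the snippet below the hidden states are random placeholders standing in for activations extracted from the frozen model (in practice obtained from its forward pass with hidden-state outputs enabled), and the probe architecture is an assumption rather than the paper's.

```python
import torch
import torch.nn as nn

# Minimal sketch of a per-token hallucination probe on frozen hidden states.
# The hidden states below are random placeholders standing in for activations
# from the frozen Qwen2.5-7B-Instruct model; the probe is illustrative only.
hidden_size, seq_len = 3584, 12                      # assumed hidden size
hidden_states = torch.randn(seq_len, hidden_size)    # one layer, one sequence

probe = nn.Sequential(                                # small neural probe
    nn.Linear(hidden_size, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)

with torch.no_grad():
    scores = torch.sigmoid(probe(hidden_states)).squeeze(-1)

# One hallucination probability per token, available as the model decodes.
for t, p in enumerate(scores.tolist()):
    print(f"token {t}: p(hallucination) = {p:.2f}")
```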