Building Smarter Financial Models with Next-Generation AI

Author: Denis Avetisyan


A new approach to large language model training is delivering improved performance on complex financial reasoning tasks.

The QianfanHuijin framework employs a multi-stage training process that progressively refines the model's capabilities, strengthening its financial reasoning and agentic skills through successive rounds of optimization.

This paper details QianfanHuijin, a series of models trained using continual pre-training, progressive post-training, and synthetic data to enhance financial domain expertise.

While large language models demonstrate promise across diverse fields, adapting them to the complexities of financial reasoning and agency remains a significant challenge. This is addressed in ‘QianfanHuijin Technical Report: A Novel Multi-Stage Training Paradigm for Finance Industrial LLMs’, which introduces a new approach to building finance-specific LLMs. The authors detail QianfanHuijin, a model achieving superior performance on financial benchmarks through continual pre-training and a progressive, fine-grained post-training pipeline emphasizing both reasoning and agentic capabilities. Could this multi-stage paradigm represent a broadly applicable methodology for enhancing industrial LLMs across specialized domains?


The Foundations of Financial Acumen: Addressing Cognitive Limitations

Conventional language models, while proficient in general language tasks, often stumble when confronted with the intricacies of financial analysis and prediction. This limitation stems from a fundamental lack of specialized knowledge; these models are trained on broad datasets lacking the specific terminology, concepts, and historical context crucial for understanding financial markets. Consequently, they struggle to accurately interpret financial reports, assess risk, or forecast market trends, leading to unreliable outputs and hindering their application in real-world financial scenarios. The nuanced language of finance, filled with jargon and dependent on a deep understanding of economic principles, presents a significant challenge to models primarily trained on general-purpose text, necessitating a dedicated approach to knowledge acquisition.

To overcome the limitations of general language models in financial analysis, a two-stage continual pre-training (CPT) strategy is employed, initiating with focused financial knowledge injection. This crucial first phase doesn’t simply add data; it systematically integrates a vast corpus of financial texts – encompassing reports, news articles, economic indicators, and regulatory filings – directly into the model’s core understanding. The process refines the model’s existing parameters, shifting its internal representation to prioritize financial concepts, terminology, and relationships. Consequently, the model develops a foundational grasp of the financial domain, enabling it to interpret complex data, recognize relevant patterns, and ultimately, provide more accurate and insightful predictions than would be possible with a general-purpose language model alone. This targeted injection establishes the necessary context for subsequent stages of refinement, building a strong base for advanced financial reasoning.
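The report does not disclose its exact training corpus or mixture ratios, but the idea of staged knowledge injection can be sketched as a weighted sampling schedule for continued pre-training. In the Python sketch below, the corpus names and weights are illustrative assumptions, not values from the paper.

```python
import random

# Illustrative two-stage continual pre-training (CPT) schedule.
# Corpus names and mixture weights are assumptions for this sketch,
# not values reported for QianfanHuijin.
STAGES = {
    "stage1_knowledge_injection": {
        "financial_reports": 0.35,
        "financial_news": 0.25,
        "regulatory_filings": 0.20,
        "general_web_text": 0.20,  # retained to limit catastrophic forgetting
    },
    "stage2_capability_enhancement": {
        "financial_qa_and_reasoning": 0.40,
        "market_and_trading_commentary": 0.30,
        "financial_reports": 0.15,
        "general_web_text": 0.15,
    },
}

def sample_corpus(stage: str) -> str:
    """Choose which corpus the next training batch is drawn from."""
    names, weights = zip(*STAGES[stage].items())
    return random.choices(names, weights=weights, k=1)[0]

# Stage 1 batches mostly come from raw financial text; stage 2 shifts
# the mixture toward reasoning-oriented data.
print([sample_corpus("stage1_knowledge_injection") for _ in range(5)])
print([sample_corpus("stage2_capability_enhancement") for _ in range(5)])
```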

The creation of a truly financially adept language model begins with a dedicated knowledge infusion stage, effectively establishing a foundational expertise. This initial phase doesn’t aim for immediate task completion, but rather constructs a robust contextual understanding of financial principles, terminology, and historical data. By pre-training the model on a curated corpus of financial texts, it develops an internal representation of how financial concepts interrelate – a crucial step before tackling complex analytical challenges. This process ensures the model isn’t simply processing words, but comprehending the underlying economic realities they represent, ultimately paving the way for more accurate predictions and sophisticated reasoning in subsequent refinement stages.

Following the establishment of foundational financial knowledge, the model undergoes a phase of Financial Capability Enhancement, meticulously designed to move beyond simple recall towards sophisticated reasoning. This stage utilizes advanced techniques, including the incorporation of complex datasets encompassing market simulations and historical trading data, to refine the model’s predictive capabilities. Through iterative training and validation, the system learns to identify nuanced patterns, assess risk tolerance, and formulate informed financial strategies. The objective isn’t merely to process information, but to cultivate an ability to extrapolate from existing data, interpret economic indicators, and ultimately, demonstrate the hallmarks of expert-level financial judgment – a crucial step towards reliable automation in complex financial applications.
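One plausible way to operationalize the "iterative training and validation" described above is a gate that keeps cycling train/evaluate passes until a held-out suite of financial reasoning tasks stops improving. The function below is a generic sketch; the patience threshold and the stubbed callbacks are assumptions, not details from the report.

```python
from typing import Callable, List

def train_until_plateau(
    train_one_epoch: Callable[[], None],   # one pass over the enhancement data
    evaluate: Callable[[], float],         # score on held-out financial tasks
    patience: int = 2,
    max_epochs: int = 20,
) -> List[float]:
    """Repeat train/validate cycles until the validation score plateaus."""
    history: List[float] = []
    best, stale = float("-inf"), 0
    for _ in range(max_epochs):
        train_one_epoch()
        score = evaluate()
        history.append(score)
        if score > best + 1e-4:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return history
```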

A Multi-Stage Pipeline for Refined Financial Reasoning

The Multi-stage Post-training Pipeline is structured as a sequential process of model refinement, moving from broad language proficiency to specialized reasoning and collaborative skills. This pipeline consists of four distinct stages: Supervised Fine-Tuning (SFT), Reasoning Reinforcement Learning (RL), Agentic RL, and General RL. Each stage builds upon the preceding one, progressively enhancing the model’s capabilities. SFT establishes a base level of language understanding, which is then leveraged by Reasoning RL to improve logical accuracy. Agentic RL further develops the model’s ability to interact with and utilize external tools, while the final General RL stage performs comprehensive optimization to ensure overall performance and alignment with desired objectives. This staged approach allows for targeted improvement of specific skills, resulting in a more robust and capable reasoning agent.
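The four stages can be pictured as an ordered configuration in which each stage consumes the checkpoint produced by the previous one. The stage names below follow the paper; the objective and data descriptions are paraphrased assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Stage:
    name: str       # stage name as used in the paper
    objective: str  # what the stage optimizes (illustrative)
    data: str       # what kind of data it consumes (illustrative)

PIPELINE: List[Stage] = [
    Stage("SFT",          "supervised next-token loss",      "curated financial instruction pairs"),
    Stage("Reasoning RL", "verifiable-answer reward",        "financial reasoning problems"),
    Stage("Agentic RL",   "tool-use task-success reward",    "tool-augmented financial tasks"),
    Stage("General RL",   "broad preference/quality reward", "mixed general and financial tasks"),
]

def run_pipeline(checkpoint: str) -> str:
    """Each stage refines the checkpoint produced by the previous one."""
    for stage in PIPELINE:
        print(f"{stage.name}: optimize {stage.objective} on {stage.data}")
        checkpoint = f"{checkpoint}+{stage.name.replace(' ', '-')}"
    return checkpoint

print(run_pipeline("cpt-base"))
```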

Supervised Fine-Tuning (SFT) serves as the initial stage in establishing a robust foundation of language understanding within the model. This process utilizes a curated dataset of input-output examples to train the model to accurately predict and generate text relevant to financial reasoning tasks. Following SFT, Reasoning Reinforcement Learning (Reasoning RL) is implemented to specifically enhance logical rigor and precision. This stage employs a reward function designed to incentivize the model to produce outputs that adhere to established principles of logical deduction and avoid common fallacies, effectively moving beyond simple pattern matching to encourage verifiable and justifiable conclusions.
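The report does not specify its reward design, but a common pattern for reasoning RL is a verifiable reward that checks the model's final numeric answer against a reference. The sketch below assumes that convention; the parsing rule and tolerance are illustrative.

```python
import re

def extract_final_number(text: str):
    """Pull the last number from a response, e.g. '... so growth is 56.25%'."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def reasoning_reward(response: str, reference: float, rel_tol: float = 1e-3) -> float:
    """Binary verifiable reward: 1.0 if the final answer matches the reference."""
    answer = extract_final_number(response)
    if answer is None:
        return 0.0
    denom = max(abs(reference), 1e-9)
    return 1.0 if abs(answer - reference) / denom <= rel_tol else 0.0

# A response that ends with the correct growth rate earns full reward.
print(reasoning_reward("Profit rose from 80 to 125, so growth is 56.25", 56.25))  # 1.0
```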

Agentic Reinforcement Learning (RL) extends the model’s capabilities by training it to utilize external tools and APIs to enhance problem-solving; this involves defining an action space that includes tool calls, parsing tool outputs, and integrating these results into the reasoning process. Subsequent General RL then applies a broader reward structure across diverse tasks to optimize the model’s overall performance and ensure alignment with desired behaviors; this stage refines the agent’s policy beyond specific tool-use scenarios, promoting robust and generalized reasoning capabilities, and mitigating potential biases or unintended consequences arising from narrow task optimization.
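The agentic action space can be made concrete with a minimal tool-call loop: the policy emits a structured call, the environment executes it, and the observation is fed back into the context before reasoning continues. The JSON format and the tool registry below are assumptions for illustration, not the paper's interface.

```python
import json
from typing import Callable, Dict

# Hypothetical tool registry; a real agent might expose market-data lookups
# or a financial calculator. Names and signatures are illustrative.
TOOLS: Dict[str, Callable[..., float]] = {
    "growth_rate": lambda current, previous: (current - previous) / previous * 100,
}

def execute_tool_call(action: str) -> str:
    """Parse a JSON tool call emitted by the model and return an observation."""
    try:
        call = json.loads(action)
        result = TOOLS[call["tool"]](**call["arguments"])
        return json.dumps({"tool": call["tool"], "result": round(result, 2)})
    except (KeyError, TypeError, ValueError, ZeroDivisionError) as err:
        # Malformed calls become error observations the policy can learn from.
        return json.dumps({"error": str(err)})

# The observation string is appended to the dialogue before the next turn.
print(execute_tool_call('{"tool": "growth_rate", "arguments": {"current": 125.4, "previous": 80.0}}'))
```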

Traditional language models often rely on identifying correlations within datasets, exhibiting pattern recognition without genuine understanding of underlying principles. This multi-stage post-training pipeline addresses this limitation by explicitly cultivating financial reasoning capabilities. Through iterative refinement via Supervised Fine-Tuning, Reasoning Reinforcement Learning, Agentic Reinforcement Learning, and General Reinforcement Learning, the model moves beyond statistical association to incorporate logical deduction, causal inference, and tool utilization. This process enables the model to analyze financial data, evaluate investment strategies, and generate insights based on economic principles, rather than simply replicating observed patterns.

This pipeline details the process of data cleaning and filtering applied to prepare the dataset for subsequent analysis.

Rigorous Validation: Benchmarking Financial Intelligence

QianfanHuijin utilizes established benchmarks – FinanceIQ, FLAME-Cer, FinQA, and Financial Reasoning – to provide quantitative validation of its financial intelligence capabilities. These benchmarks consist of datasets designed to assess a model’s performance on tasks common to the financial domain, including question answering, logical reasoning, and data interpretation. Employing these standardized evaluations allows for a direct comparison of QianfanHuijin’s capabilities against other models and demonstrates its performance across a variety of financial scenarios. Results from these benchmarks are used to refine the model and ensure reliable performance in real-world applications.
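Scores of this kind are typically plain accuracy over a labelled evaluation set. The harness below is a generic sketch of that procedure, not the authors' evaluation code; the toy dataset stands in for items from benchmarks such as FinanceIQ or FinQA.

```python
from typing import Callable, Iterable, Tuple

def benchmark_accuracy(
    model: Callable[[str], str],
    dataset: Iterable[Tuple[str, str]],
) -> float:
    """Exact-match accuracy of a question-answering model over a benchmark."""
    total = correct = 0
    for question, gold in dataset:
        correct += model(question).strip().lower() == gold.strip().lower()
        total += 1
    return 100.0 * correct / max(total, 1)

# Toy usage with a stub model; a real run would iterate benchmark items.
toy_data = [("What does ROE stand for?", "return on equity")]
print(benchmark_accuracy(lambda q: "Return on Equity", toy_data))  # 100.0
```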

QianfanHuijin achieved a score of 94.43 on the FinanceIQ benchmark, a standardized assessment of financial intelligence capabilities. This result indicates a performance level exceeding that of other models evaluated on the same benchmark. FinanceIQ tests a model’s ability to understand and reason about complex financial concepts, interpret financial data, and provide accurate answers to finance-related questions. The score is calculated based on accuracy across a diverse set of financial tasks represented within the benchmark dataset, establishing a quantitative measure of the model’s proficiency in the financial domain.

The QianfanHuijin-70B model achieved a score of 89.59 on the FLAME-Cer benchmark, a standardized evaluation for financial reasoning capabilities. FLAME-Cer focuses on challenging question answering tasks requiring comprehension of complex financial documents and data. This performance indicates a high degree of accuracy in the model’s ability to extract and interpret relevant information from financial texts, and positions it as a leading performer on this specific benchmark compared to other evaluated models.

Evaluation using the FinQA benchmark demonstrates a performance advantage for QianfanHuijin, achieving a score of 77.1. This result surpasses the performance of the DeepSeek-R1 model, which attained a score of 65.5 on the same benchmark. The FinQA dataset focuses on question answering within the financial domain, requiring models to demonstrate understanding and reasoning capabilities related to financial concepts and data. This scoring difference indicates QianfanHuijin’s improved capacity for accurately interpreting and responding to complex financial queries as assessed by this standardized evaluation.

Rigorous benchmarking across multiple financial datasets – including FinanceIQ, FLAME-Cer, and FinQA – demonstrates QianfanHuijin’s consistent high performance on a variety of financial tasks. The model achieved scores of 94.43 on FinanceIQ and 89.59 on FLAME-Cer, exceeding the performance of comparable models. Furthermore, a FinQA score of 77.1, notably higher than DeepSeek-R1’s 65.5, indicates a strong capacity for question answering within the financial domain. These results collectively validate the model’s proficiency in handling complex financial reasoning and data analysis.

FinanceIQ evaluates financial performance through a suite of submetrics designed to provide a comprehensive assessment of key indicators.

Addressing Data Challenges and Refining Reward Signals

Financial modeling often suffers from a critical limitation: the lack of sufficient data, particularly when exploring nuanced or rare events crucial for robust performance. To overcome this, Data Synthesis techniques are employed to artificially expand training datasets, generating plausible financial scenarios that would otherwise be unavailable. This isn’t simply random data creation; instead, it involves sophisticated algorithms that learn the underlying patterns and relationships within existing financial data, then extrapolate to create new, realistic examples. By intelligently augmenting limited datasets, Data Synthesis allows for the training of more resilient and accurate models capable of navigating the complexities of real-world financial markets and providing more reliable insights, even when faced with unforeseen circumstances.
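At its simplest, scenario synthesis can be viewed as templated generation whose parameters are sampled from realistic ranges, with the label derived from those same parameters so question and answer stay consistent. The template, ranges, and company names below are invented for illustration; a production pipeline would fit these distributions to real data.

```python
import random

# Illustrative scenario template; parameter ranges are invented for the sketch.
TEMPLATE = (
    "{company} reported revenue of {revenue:.1f}M and, after a one-off "
    "impairment of {impairment:.1f}M, a net margin of {margin:.1f}%. "
    "What is net profit excluding the impairment?"
)

def synthesize_example(seed=None) -> dict:
    rng = random.Random(seed)
    revenue = rng.uniform(50, 500)
    margin = rng.uniform(2, 25)
    impairment = rng.uniform(0, 0.1 * revenue)
    reported_net_profit = revenue * margin / 100
    question = TEMPLATE.format(
        company=rng.choice(["Acme Corp", "Delta Holdings", "Nimbus Ltd"]),
        revenue=revenue, impairment=impairment, margin=margin,
    )
    # The answer is derived from the sampled parameters, keeping labels consistent.
    return {"question": question, "answer": round(reported_net_profit + impairment, 2)}

print(synthesize_example(seed=7))
```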

The limitations of existing instruction datasets – often lacking the nuanced complexity and breadth required for sophisticated financial modeling – are directly addressed by the Controllable Instruction Synthesis Framework (CIS-F). This innovative approach moves beyond simple data augmentation by generating synthetic instructions that are not merely varied, but specifically tailored to challenge and refine model capabilities. CIS-F achieves this through a controlled process, allowing researchers to define parameters influencing instruction difficulty, complexity, and even the types of financial scenarios presented. The resulting synthetic data expands the scope of training, equipping models to handle a wider range of real-world conditions and improving their accuracy in complex decision-making processes, ultimately leading to more robust and reliable financial applications.
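The controllability attributed to CIS-F can be imagined as explicit generation parameters handed to a teacher model that writes instructions. The parameter names and prompt shape below are assumptions for illustration, not the framework's actual interface.

```python
from dataclasses import dataclass

@dataclass
class InstructionSpec:
    """Hypothetical control knobs for synthetic instruction generation."""
    scenario: str         # e.g. "credit risk", "earnings analysis"
    difficulty: int       # 1 (single fact lookup) .. 5 (multi-step quantitative)
    requires_tools: bool  # whether the task should force tool use

def build_generation_prompt(spec: InstructionSpec) -> str:
    """Turn a spec into a prompt for a teacher model that writes instructions."""
    steps = ("a single fact lookup" if spec.difficulty <= 2
             else f"at least {spec.difficulty} dependent reasoning steps")
    tools = ("must require calling an external calculator or data API"
             if spec.requires_tools
             else "should be answerable from the provided text alone")
    return (f"Write a {spec.scenario} task for a financial assistant. "
            f"The task should involve {steps} and {tools}.")

print(build_generation_prompt(InstructionSpec("earnings analysis", 4, True)))
```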

To achieve nuanced and accurate reinforcement learning in complex financial modeling, a Dual-verifier Reward Model was developed. This system uniquely integrates the strengths of both rule-based systems and Large Language Models (LLMs) for reward signal generation. Traditional rule-based verification provides precision and consistency based on pre-defined financial principles, ensuring adherence to established criteria. However, these systems often struggle with ambiguity or novel situations. To address this, the model incorporates an LLM-based verifier, capable of interpreting complex financial reports and identifying subtle patterns or deviations. By combining these two approaches, the Dual-verifier Reward Model delivers reward signals that are both reliable and adaptable, ultimately leading to more robust and effective learning in dynamic financial environments.
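A dual-verifier reward can be sketched as a deterministic rule check that decides clear-cut cases and defers ambiguous ones to an LLM judge. The rules and the stubbed judge below are assumptions, not the paper's implementation.

```python
from typing import Callable, Optional

def rule_verifier(response: str, reference: str) -> Optional[float]:
    """Deterministic check: returns a score when the rules apply, else None."""
    text = response.lower()
    if reference.lower() in text:
        return 1.0
    if any(token in text for token in ("n/a", "cannot answer")):
        return 0.0
    return None  # ambiguous case, defer to the LLM verifier

def dual_verifier_reward(
    response: str,
    reference: str,
    llm_judge: Callable[[str, str], float],  # returns a score in [0, 1]
) -> float:
    """Rule-based precision where possible, LLM flexibility otherwise."""
    rule_score = rule_verifier(response, reference)
    return rule_score if rule_score is not None else llm_judge(response, reference)

# Stub judge for illustration; a real judge would call an LLM grading prompt.
print(dual_verifier_reward("Deducted net profit grew 92.68%", "92.68%", lambda r, g: 0.9))
```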

The analysis produced by the model highlights significant core business strength that headline figures alone would understate. While 56.89% growth in Net Profit is substantial, net profit after deducting non-recurring items (the "deducted" figure) grew a far more impressive 92.68%. This points to genuine improvements in operational efficiency and a smaller drag from items that detract from true profitability, reflecting tighter revenue generation and cost control. The disparity between the two metrics shows that profit growth is driven by sustainable improvements in the fundamental drivers of the business rather than by transient or non-recurring gains.

Analysis reveals that non-recurring profit and loss (P&L) items account for 11.67% of the overall Net Profit. This suggests a moderate, yet noteworthy, influence on reported profitability; while not dominating the financial picture, these one-time gains or losses demonstrate that sustained, core business performance remains the primary driver of earnings. The contribution highlights the importance of discerning between repeatable revenue streams and temporary factors when evaluating long-term financial health and potential. Consequently, stakeholders should consider this figure when assessing the true underlying strength and consistency of earnings generation.
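The reported share can be read as simple arithmetic: if non-recurring items make up 11.67% of Net Profit, the recurring (deducted) portion is the remainder. A quick check, using an arbitrary Net Profit of 100 units for scale:

```python
# Illustrative consistency check; the 100-unit Net Profit is an arbitrary scale.
net_profit = 100.0
non_recurring_share = 0.1167                       # 11.67% of Net Profit (reported)

non_recurring = net_profit * non_recurring_share   # 11.67
deducted_net_profit = net_profit - non_recurring   # 88.33, profit from core operations

print(f"non-recurring items: {non_recurring:.2f}")
print(f"deducted net profit: {deducted_net_profit:.2f}")
# The growth rates quoted earlier (56.89% headline vs. 92.68% deducted) also
# depend on the prior-year split, which the text does not give, so they are
# not recomputed here.
```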

The convergence of data synthesis, the Controllable Instruction Synthesis Framework, and the dual-verifier reward model demonstrably elevates the performance of financial models in practical applications. By proactively addressing data scarcity and refining the precision of reward signals, these techniques foster robustness against the inherent complexities and uncertainties of real-world financial data. This holistic approach not only improves the model's capacity to generalize beyond its training data but also lets it navigate nuanced financial scenarios with greater accuracy, translating into more reliable and insightful analysis, as illustrated by its ability to distinguish 92.68% growth in deducted non-recurring net profit from the 56.89% headline growth in Net Profit.

The data synthesis workflow combines multiple sources and transformations to generate comprehensive and reliable datasets.

The pursuit of QianfanHuijin exemplifies a dedication to provable intelligence within the financial sphere. The model's multi-stage training, spanning continual pre-training, progressive post-training, and data synthesis, isn't merely about achieving benchmark scores; it's about building a system grounded in robust methodology. As Paul Erdős once stated, "A mathematician knows a lot of things, but a physicist knows the deep underlying truths." This echoes the approach within this work; it's not enough for the LLM to perform financial reasoning, since the framework demands an understanding of the underlying truths of financial data and logic, striving for a correctness beyond empirical validation. The focus on specialized data synthesis reinforces this, ensuring the model isn't simply memorizing patterns but constructing knowledge.

What’s Next?

The presented work, while demonstrating tangible gains in financial language model performance, merely scratches the surface of a deeper, more fundamental challenge. The relentless pursuit of scale – larger models, larger datasets – feels increasingly like a palliative measure. While QianfanHuijin exhibits proficiency, true intelligence in this domain necessitates formal reasoning, a capacity that remains conspicuously absent. The synthesis of financial data, though cleverly employed, is still fundamentally reliant on existing, potentially flawed, human-generated examples. A truly robust system demands the ability to derive financial truths, not merely extrapolate patterns.

Future investigations must prioritize the integration of symbolic reasoning engines with these large language models. The current paradigm favors statistical correlation over causal understanding, a dangerous limitation in a field where risk assessment demands unwavering logical rigor. One wonders if the observed improvements are genuine gains in comprehension, or simply more sophisticated mimicry. The field risks becoming lost in a labyrinth of empirical results, devoid of underlying mathematical justification.

In the chaos of data, only mathematical discipline endures. The next generation of financial language models will not be defined by their size, but by their ability to prove their conclusions, not simply assert them with statistical confidence. The emphasis should shift from ‘training’ to ‘verification’, from pattern recognition to axiomatic derivation. Only then can these systems transcend the role of glorified prediction engines and become genuine tools for financial understanding.


Original article: https://arxiv.org/pdf/2512.24314.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
