Decoding Market Mood: AI-Powered Sentiment Analysis for India’s NIFTY 50

Author: Denis Avetisyan


A new approach combines the power of large language models, real-time news retrieval, and reinforcement learning to better understand investor sentiment and predict stock market movements.

An adaptive sentiment analysis pipeline is proposed, allowing for nuanced understanding as systems inevitably evolve and degrade over time, rather than remaining static assessments within a fixed framework.

This study details an adaptive financial sentiment analysis framework leveraging multi-source news, instruction-tuned language models, and proximal policy optimization to improve prediction accuracy for the NIFTY 50 index.

While financial sentiment analysis is crucial for informed investment, existing methods often neglect the dynamic interplay between market feedback and predictive accuracy. This is addressed in ‘Adaptive Financial Sentiment Analysis for NIFTY 50 via Instruction-Tuned LLMs, RAG and Reinforcement Learning Approaches’, which introduces a novel framework integrating large language models, multi-source news retrieval, and reinforcement learning to dynamically adapt to market behavior. Experimental results demonstrate significant improvements in sentiment classification and alignment with actual stock returns for the NIFTY 50 index. Could this adaptive approach unlock a new era of robust and market-aware financial modeling?


The Ebb and Flow of Market Sentiment

Conventional financial modeling has historically prioritized quantifiable data, such as trading volume and corporate earnings, often neglecting the pervasive influence of public opinion. However, investor behavior is rarely driven by logic alone; sentiment expressed in news articles, social media posts, and online forums demonstrably shapes market trends. These subtle cues – ranging from optimistic forecasts to anxieties about economic downturns – can amplify or suppress price movements, creating opportunities and risks not captured by traditional metrics. The collective emotional state of investors, though difficult to measure directly, acts as a significant, and frequently overlooked, force in determining asset valuation and overall market stability. Recognizing this interplay between rational analysis and emotional response is increasingly vital for comprehensive financial forecasting and risk management.

Financial forecasting historically prioritized quantitative metrics – stock prices, trading volumes, and economic indicators – yet an overreliance on these figures can obscure crucial information embedded in market perception. The assumption that price movements solely reflect rational economic behavior fails to account for the significant influence of investor psychology and collective sentiment. Studies demonstrate that shifts in public opinion, often expressed through news articles and social media, frequently precede and even drive price fluctuations, creating opportunities – or risks – missed by purely data-driven models. Ignoring these signals introduces a considerable blind spot, potentially leading to inaccurate predictions, flawed investment strategies, and ultimately, heightened exposure to market volatility. The inability to discern underlying emotional currents can therefore transform seemingly stable portfolios into unexpectedly vulnerable positions.

The accurate assessment of financial sentiment within textual data presents significant hurdles beyond simple positive or negative labeling. Financial language is replete with subtle nuances, industry-specific jargon, and often employs double negatives or conditional statements that require sophisticated natural language processing to correctly interpret. Moreover, market contexts are dynamic; a term considered bullish today might be bearish tomorrow, or its meaning could shift based on unforeseen events. This evolving lexicon and the presence of sarcasm, irony, and speculative language necessitate models capable of understanding not just what is said, but also how it’s said, and within what specific economic framework. Consequently, extracting reliable sentiment demands continual adaptation and the incorporation of contextual awareness to avoid misinterpreting signals and generating flawed financial predictions.

Augmenting Insight: Retrieval and the Limits of Knowledge

Large Language Models (LLMs) serve as the core component of our financial sentiment analysis pipeline due to their inherent capacity to process and interpret nuanced language structures commonly found in financial news and reports. Unlike traditional sentiment analysis methods reliant on predefined lexicons or rule-based systems, LLMs leverage deep learning techniques to understand contextual meaning, allowing for the accurate assessment of sentiment even in complex sentences containing jargon, negation, or implied opinions. This capability is crucial for analyzing the vast and often ambiguous language used in financial media, enabling a more sophisticated and reliable determination of market sentiment compared to simpler approaches. The LLM’s pre-training on extensive text corpora provides a foundational understanding of language, which is then fine-tuned for the specific task of financial sentiment classification.

Large Language Models (LLMs), while proficient in natural language understanding, possess inherent knowledge cut-off dates and lack access to current events. To address this limitation, we implemented Retrieval Augmented Generation (RAG). RAG functions by first retrieving relevant documents – in this case, current financial news articles – from an external knowledge source based on a user’s query. These retrieved articles are then concatenated with the original prompt and fed into the LLM. This process allows the model to ground its responses in up-to-date information, effectively expanding its knowledge base beyond its pre-training data and improving the timeliness and accuracy of sentiment analysis.
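The retrieve-then-concatenate step can be sketched in a few lines; the prompt template and article labeling below are illustrative assumptions, not the paper's exact wording:

```python
def build_prompt(instrument: str, retrieved_articles: list[str]) -> str:
    """Assemble a grounded prompt: retrieved news articles are prepended
    as context ahead of the sentiment question (template is illustrative)."""
    context = "\n\n".join(
        f"[Article {i + 1}] {text}" for i, text in enumerate(retrieved_articles)
    )
    return (
        f"Context:\n{context}\n\n"
        f"Based on the articles above, classify the current sentiment "
        f"toward {instrument} as positive, negative, or neutral."
    )
```

Because the retrieved text travels inside the prompt, the LLM itself needs no retraining to stay current; only the retrieval index has to be refreshed.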

To maximize the relevance of information provided to the Large Language Model, a cosine similarity metric is implemented during the retrieval process. This calculation determines the similarity between the vector representation of the target financial instrument and the vector representations of candidate news articles. Articles exceeding a predetermined similarity threshold are then incorporated into the context provided to the LLM. Evaluation of this Retrieval-Augmented Generation (RAG) approach, utilizing static weighting for article relevance, resulted in a sentiment assessment accuracy of 0.6094, indicating a measurable improvement in performance attributable to the focused retrieval of pertinent financial news.
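A minimal sketch of the threshold-based retrieval described above; the embeddings, the 0.5 cutoff, and the article dictionary structure are assumptions for illustration, not values from the study:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec: list[float], articles: list[dict],
             threshold: float = 0.5) -> list[str]:
    """Return article texts whose similarity to the target instrument's
    vector clears the threshold, ranked most-similar first
    (the threshold value here is illustrative)."""
    scored = [
        (cosine_similarity(query_vec, art["embedding"]), art["text"])
        for art in articles
    ]
    return [text for score, text in sorted(scored, reverse=True)
            if score >= threshold]
```

Only the surviving articles are concatenated into the LLM's context, which keeps the prompt short and the signal-to-noise ratio high.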

Imbuing Models with Domain Expertise: Instruction and the SentiFin Dataset

Instruction Tuning was employed to adapt the LLaMA 3.2 3B Large Language Model (LLM) for specialized financial applications. This process involves presenting the LLM with a dataset of instructions paired with desired outputs, effectively training it to perform specific financial tasks and adhere to relevant data formats. Rather than general language understanding, the model’s parameters are adjusted to optimize performance on tasks such as sentiment analysis, news classification, or financial forecasting, leveraging the provided instruction-output pairs to refine its responses and ensure alignment with financial domain requirements. The goal is to move beyond pre-trained capabilities and create a model explicitly proficient in processing and interpreting financial information.
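In practice, instruction tuning reduces to training on records of this shape; the prompt wording and field names below are illustrative assumptions about the format, not SentiFin's actual schema:

```python
def format_example(headline: str, sentiment: str) -> dict:
    """Build one instruction-tuning record pairing a task instruction
    with its desired output (template is illustrative)."""
    instruction = (
        "Classify the sentiment of the following financial news headline "
        "as positive, negative, or neutral."
    )
    return {
        "instruction": instruction,
        "input": headline,
        "output": sentiment,
    }

record = format_example(
    "Reliance Industries posts record quarterly profit", "positive"
)
```

Thousands of such records, drawn from domain-specific news, are what shift a general-purpose model toward reliable financial sentiment classification.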

The SentiFin dataset is a curated collection of Indian stock market news articles assembled to facilitate the training of Large Language Models (LLMs) in financial analysis. The dataset comprises news content related to companies within the NIFTY 50 index and is specifically designed to expose the LLM to financial terminology, industry-specific language, and the contextual nuances of the Indian stock market. This focused curation enables the model to develop a deeper understanding of financial reporting and improve its ability to accurately interpret and process financial news data, ultimately enhancing performance on downstream financial tasks.

The SentiFin dataset utilizes the NIFTY 50 index as its foundational data source, ensuring representation of the Indian stock market through its 50 most actively traded companies. This focus allows for a targeted approach to sentiment analysis relevant to Indian equities. Initial results from instruction tuning, employing this dataset, yielded an accuracy score of 0.5520. This figure serves as a performance baseline against which subsequent model refinements and alternative approaches will be evaluated, providing a quantifiable metric for improvement in financial sentiment classification.

The Adaptive System: Learning from the Market’s Verdict

The system employs reinforcement learning to refine the contribution of various financial news sources when predicting stock movements. Rather than treating all news equally, the approach learns to prioritize sources based on their historical ability to foreshadow actual market returns. This dynamic weighting allows the model to focus on information streams that consistently deliver predictive signals, effectively filtering out noise and potentially unreliable reporting. By continuously adjusting these weights through a learning process, the system adapts to changing market conditions and the evolving reliability of different news providers, ultimately aiming to enhance the accuracy of stock market predictions.

A core innovation lies in the system’s ability to learn directly from market outcomes. The framework establishes a direct feedback mechanism that meticulously compares sentiment predictions – derived from news sources – with actual subsequent stock returns. This comparison isn’t merely evaluative; it generates a quantifiable reward signal for the reinforcement learning agent. Positive correlations between predicted sentiment and realized gains yield positive rewards, encouraging the agent to prioritize those sources; conversely, discrepancies trigger negative rewards, prompting a reassessment of source reliability. This continuous loop of prediction, evaluation, and reinforcement allows the system to dynamically refine its weighting of news sources, effectively learning which sources consistently deliver the most predictive signals for stock market movements.
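One way to turn that comparison into a scalar reward is sign agreement between the predicted sentiment and the realized return; this shaping, including the flat-market band for neutral calls, is a simplified assumption rather than the paper's exact reward function:

```python
def reward(predicted_sentiment: str, realized_return: float) -> float:
    """+1 when the predicted sentiment direction matches the sign of the
    subsequent return, -1 otherwise (simplified reward shaping).
    Neutral predictions are rewarded only for near-flat returns;
    the 0.5% band is an illustrative choice."""
    direction = {"positive": 1, "neutral": 0, "negative": -1}[predicted_sentiment]
    if direction == 0:
        return 1.0 if abs(realized_return) < 0.005 else -1.0
    return 1.0 if direction * realized_return > 0 else -1.0
```

Feeding this signal back per source lets the agent credit or penalize each news stream according to how often its sentiment foreshadowed the market's actual move.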

The system leverages Proximal Policy Optimization (PPO), a reinforcement learning algorithm designed for stable and efficient learning, to dynamically refine the weighting of various financial news sources. This approach enables the framework to prioritize information streams that demonstrably contribute to accurate stock market predictions. Initial results indicate a significant performance boost through this method; integrating market feedback-based source reweighting with Retrieval-Augmented Generation (RAG) achieved an accuracy of 0.6153. Further refinement, encompassing instruction tuning alongside RAG and reinforcement learning, culminated in an overall accuracy of 0.66, demonstrating the power of adaptive information weighting in financial forecasting.
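At its core, PPO constrains each policy update with a clipped surrogate objective; the sketch below shows that objective alongside a softmax that turns per-source logits into news-source weights. Both are standard PPO machinery, not the paper's specific implementation:

```python
import math

def source_weights(logits: list[float]) -> list[float]:
    """Softmax over per-source logits: the policy's current weighting
    of the available news sources."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ppo_clipped_objective(ratio: float, advantage: float,
                          eps: float = 0.2) -> float:
    """PPO's clipped surrogate: the new/old policy probability ratio is
    clipped to [1 - eps, 1 + eps], limiting how far a single
    market-feedback update can move the source weights
    (eps = 0.2 is PPO's customary default)."""
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)
```

The clipping is what gives PPO its stability here: a burst of lucky predictions from one source cannot swing the weighting violently in a single update, so convergence reflects sustained predictive value rather than noise.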

Proximal Policy Optimization successfully learned final source weights, demonstrating effective policy convergence.

The pursuit of predictive accuracy in financial markets, as explored within this adaptive sentiment analysis framework, mirrors the natural tendency of all systems toward eventual decay. The study’s integration of large language models, retrieval-augmented generation, and reinforcement learning isn’t about halting this decline, but rather about learning to navigate it with increasing grace. As Tim Berners-Lee observed, “The Web is more a social creation than a technical one,” and similarly, this framework acknowledges that market behavior isn’t solely dictated by data, but by the complex interplay of information and collective sentiment. The iterative refinement through reinforcement learning isn’t about achieving perfect prediction, but about adapting to the inherent fluidity of the system – observing the process proves more valuable than attempting to force an outcome.

What Lies Ahead?

This work, like any attempt to chart market currents, establishes a point on a continually extending timeline. The adaptive framework presented isn’t a solution, but a refinement – a more sensitive instrument for measuring the decay of information’s predictive power. The logging of news sentiment, augmented by retrieval and reinforced through learning, creates a chronicle of evolving biases, but the system’s true test lies in its ability to gracefully age. The inherent noise in financial data, the unpredictable nature of human reaction, these are not bugs to be fixed, but fundamental properties of the system itself.

Future iterations will inevitably confront the challenge of source weighting. While the current approach demonstrates promise, discerning genuine signal from deliberate distortion remains a Sisyphean task. The frontier isn’t simply about optimizing algorithms, but about modeling the very mechanics of deception. The system’s chronicle will need to account not only for what is said, but why it is said, and to whom.

Deployment is merely a moment on that timeline, a snapshot of performance. The real question isn’t whether this framework achieves peak accuracy, but how it loses accuracy over time, and how that loss can be anticipated and mitigated. The pursuit of perfect prediction is a fallacy; the intelligent system will be the one that understands its own limitations, and adapts accordingly.


Original article: https://arxiv.org/pdf/2512.20082.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2025-12-24 08:30