Author: Denis Avetisyan
Researchers are harnessing the power of artificial intelligence to identify the subtle linguistic cues that indicate belief in conspiracy theories and understand how these ideas spread.

This paper details an agentic workflow leveraging open-weight large language models for psycholinguistic marker extraction and conspiracy endorsement detection, demonstrating improved performance and adversarial robustness at SemEval-2026 Task 10.
Traditional natural language processing often conflates semantic understanding with structural localization, hindering accurate psycholinguistic analysis and reliable conspiracy theory detection. This paper, ‘AILS-NTUA at SemEval-2026 Task 10: Agentic LLMs for Psycholinguistic Marker Extraction and Conspiracy Endorsement Detection’, introduces an agentic workflow leveraging large language models to jointly address these challenges. Our approach achieves state-of-the-art performance through innovations like Dynamic Discriminative Chain-of-Thought reasoning and an “Anti-Echo Chamber” architecture, demonstrating improved robustness and interpretability. Could this decoupled design establish a new paradigm for psycholinguistically grounded NLP and more effectively combat the spread of misinformation?
Deconstructing the Echo: The Challenge of Conspiracy Endorsement
The growing prevalence of conspiracy theories online necessitates increasingly sophisticated detection methods, yet current approaches frequently falter when faced with the subtleties of human language. Traditional techniques, often relying on keyword spotting or simplistic sentiment analysis, struggle to differentiate between genuine endorsement and more complex communicative intents. This difficulty stems from the capacity for nuanced expression; individuals may allude to conspiracies without explicitly stating belief, employ sarcasm, or discuss them critically while reporting on their existence. Consequently, automated systems often misinterpret context, leading to both false positives – incorrectly flagging legitimate discussion – and false negatives, failing to identify genuine advocacy for unsubstantiated claims. Addressing this challenge requires a shift towards models capable of deeper semantic understanding and contextual reasoning, moving beyond surface-level analysis to grasp the intent behind the message.
Distinguishing genuine endorsement of conspiracy theories from satirical commentary, neutral reporting, or even ironic statements presents a significant challenge for automated detection systems. This difficulty is compounded by phenomena like Poe’s Law, which posits that, without explicit indication of the author’s intent, parody and genuine belief can become indistinguishable. Consequently, systems struggle to discern whether a statement promotes a conspiracy or merely discusses it, leading to frequent misclassifications. The inherent ambiguity in online communication, where cues like tone and body language are absent, further exacerbates this problem, requiring increasingly sophisticated approaches to contextual understanding and intent recognition to accurately identify true endorsement.
Current automated systems designed to identify endorsement of conspiracy theories frequently succumb to what researchers term the ‘Reporter Trap’. These algorithms struggle to differentiate between text reporting on a conspiracy and text advocating for it, leading to false positives. A news article detailing the claims of a conspiracy theory, for example, may be incorrectly flagged as endorsing those claims simply because it mentions them. This misclassification arises from a reliance on keyword spotting or superficial pattern recognition, failing to account for crucial contextual cues indicating objective reporting rather than genuine belief. Consequently, these systems often generate a high volume of inaccurate results, hindering effective monitoring and analysis of online conspiracy-related content.
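The Reporter Trap described above can be made concrete with a few lines of code. The keyword list and the two texts below are invented for illustration; the point is only that surface-level matching assigns the same flag to advocacy and to objective reporting.

```python
# Minimal illustration of the "Reporter Trap": a keyword-based detector
# cannot tell reporting about a conspiracy apart from endorsing it.
# The keyword list and example texts are hypothetical, not from the paper.

CONSPIRACY_KEYWORDS = {"cover-up", "hidden agenda", "they don't want you to know"}

def keyword_detector(text: str) -> bool:
    """Flags any text containing a conspiracy-associated phrase."""
    lowered = text.lower()
    return any(kw in lowered for kw in CONSPIRACY_KEYWORDS)

endorsement = "Wake up! The moon landing was a cover-up and they don't want you to know."
reporting = "The article examines why the cover-up narrative around the moon landing persists."

# Both texts trigger the detector, though only the first endorses the claim:
print(keyword_detector(endorsement))  # True (correct flag)
print(keyword_detector(reporting))    # True (false positive: the Reporter Trap)
```

Any approach that conditions only on which words appear, rather than on who is asserting them and how, will reproduce this failure.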
AILS-NTUA: Architecting an Agentic Firewall
AILS-NTUA utilizes an agentic workflow by deploying a system of interconnected language models (LLMs) that collaboratively analyze input text. This approach moves beyond single-model analysis by assigning different LLMs specific roles and perspectives within the detection process. Coordination between these agents is central to the system’s functionality, allowing for a multifaceted examination of the text and a reduction in reliance on any single model’s interpretation. This distributed analysis aims to enhance the system’s ability to identify subtle cues and patterns indicative of endorsement, while simultaneously increasing its resilience to adversarial attacks or biased inputs.
The Anti-Echo Chamber Council within AILS-NTUA is composed of a parallel set of language model agents specifically designed to mitigate confirmation bias during endorsement detection. This is achieved by structuring the agents to independently analyze text from differing perspectives and challenge initial endorsements. Each agent operates with a degree of autonomy, evaluating evidence and formulating its own conclusions, thereby creating internal disagreement. This deliberate introduction of diverse viewpoints improves the overall robustness of the system, preventing premature convergence on potentially biased conclusions and leading to more reliable detection of endorsements.
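The council idea can be sketched in miniature. The three "agents" below are deterministic stubs standing in for separate LLM calls with different role prompts; their heuristics and the super-majority rule are illustrative assumptions, not the paper's exact configuration.

```python
# A sketch of an "Anti-Echo Chamber" council: independent perspectives score
# the same text, and a label is confirmed only when the dissenting view fails
# to overturn the majority. Agents are stubs standing in for LLM calls.
from typing import Callable

def literal_reader(text: str) -> bool:
    # Perspective 1: flags overt endorsement cues.
    return "i believe" in text.lower() or "wake up" in text.lower()

def skeptic(text: str) -> bool:
    # Perspective 2 (devil's advocate): vetoes the label when the text
    # looks like reporting or attribution of someone else's claim.
    return not any(cue in text.lower() for cue in ("according to", "reports that"))

def context_reader(text: str) -> bool:
    # Perspective 3: flags first-person advocacy framing.
    return text.lower().startswith(("i ", "we "))

COUNCIL: list[Callable[[str], bool]] = [literal_reader, skeptic, context_reader]

def council_verdict(text: str) -> bool:
    votes = [agent(text) for agent in COUNCIL]
    # Require agreement beyond a single agent so one confident but biased
    # perspective cannot dominate, mimicking forced internal disagreement.
    return sum(votes) >= 2

print(council_verdict("I believe the election was rigged, wake up!"))        # True
print(council_verdict("According to observers, some claim it was rigged."))  # False
```

The design choice being modeled is that disagreement is engineered in on purpose: the skeptic exists precisely to pull against the other two.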
The AILS-NTUA architecture leverages GPT-5.2 as its foundational language model, enhancing performance through the implementation of Dynamic Discriminative Chain-of-Thought (DD-CoT). DD-CoT facilitates a more refined semantic analysis of input text by adaptively selecting reasoning paths, resulting in a reported macro F1 score of 0.81 when applied to the extraction of psycholinguistic markers indicative of conspiracy theories. This score demonstrates the system’s capacity to accurately identify subtle linguistic cues associated with conspiratorial content.
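One plausible reading of "adaptively selecting reasoning paths" is a router that sends easy inputs down a cheap direct prompt and ambiguous ones down a multi-step discriminative prompt. The routing heuristic and prompt templates below are assumptions for illustration; the paper's actual DD-CoT mechanism may differ.

```python
# Illustrative sketch of dynamic prompt routing: a cheap ambiguity heuristic
# decides whether a direct label prompt suffices or whether the input needs
# step-by-step discriminative reasoning. Cues and templates are hypothetical.

DIRECT_PROMPT = "Label the text as ENDORSE or OTHER: {text}"
DISCRIMINATIVE_PROMPT = (
    "Step 1: Is the author reporting someone else's claim? "
    "Step 2: Is the framing ironic or satirical? "
    "Step 3: Only if neither, decide ENDORSE vs OTHER. Text: {text}"
)

HEDGE_CUES = ("reportedly", "allegedly", "some say", "claims that")

def ambiguity_score(text: str) -> int:
    """Cheap proxy for how much discriminative reasoning the text needs."""
    return sum(cue in text.lower() for cue in HEDGE_CUES)

def select_prompt(text: str) -> str:
    template = DISCRIMINATIVE_PROMPT if ambiguity_score(text) > 0 else DIRECT_PROMPT
    return template.format(text=text)

easy = "The earth is flat and NASA hides it, spread the word!"
hard = "Some say NASA reportedly hides the truth; the claim keeps resurfacing."

print(select_prompt(easy).startswith("Label"))   # direct path
print(select_prompt(hard).startswith("Step 1"))  # multi-step path
```

In a real system the router itself could be a small model call, and the reasoning steps would be answered by the LLM rather than hard-coded.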
Forging Robustness: Contrastive Retrieval and the Art of Negative Examples
AILS-NTUA employs contrastive retrieval as a data augmentation technique to create a training dataset encompassing both supportive and contradictory evidence. This method actively seeks examples that are semantically similar to a given input – positive examples – and those that are dissimilar, representing negative examples. By intentionally including a wide range of both positive and negative instances, the system aims to improve its ability to generalize and accurately assess claims across diverse and potentially ambiguous scenarios. The resulting dataset provides a more complete representation of the possible inputs the model may encounter during deployment, thereby enhancing robustness and overall performance.
The training methodology incorporates ‘hard negatives’ – data instances the model initially misclassifies despite carrying correct gold labels – to enhance performance. These examples, while challenging, provide a stronger training signal than easily classified instances, forcing the model to refine its decision boundaries. AILS-NTUA observed a quantifiable improvement in hard negative accuracy on conspiracy detection, increasing from 0.62 to 0.79 through the systematic inclusion of these challenging examples and refinement based on them. This indicates a significant improvement in the model’s ability to correctly identify subtle or ambiguous cases that would previously have been misclassified.
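The mining loop itself is simple to sketch: score the pool with the current model, collect the correctly-labeled items it gets wrong, and feed corrective signal back in. The toy cue-weight "model" and data below are hypothetical stand-ins; the real system refines an LLM-based classifier, not hand-set weights.

```python
# Sketch of hard-negative mining: items the current model misclassifies
# (but whose gold labels are known) drive the next round of refinement.
# The toy scorer and corrective update are illustrative assumptions.

def toy_model(text: str, weights: dict[str, float]) -> int:
    """Predicts 1 (endorsement) if the weighted cue score is positive."""
    score = sum(w for cue, w in weights.items() if cue in text.lower())
    return int(score > 0)

dataset = [
    ("wake up, the cabal controls everything", 1),
    ("the report debunks the cabal narrative", 0),
    ("officials deny any cover-up took place", 0),
]

weights = {"cabal": 1.0, "cover-up": 1.0}  # naive endorsement cues

# Mine hard negatives: correctly-labeled items the model currently gets wrong.
hard_negatives = [(t, y) for t, y in dataset if toy_model(t, weights) != y]

# "Refinement" here: learn negative weights for reporting-style cues that
# co-occur with the mined errors, pushing the boundary away from reportage.
REPORTING_CUES = ("debunks", "deny", "report")
for text, gold in hard_negatives:
    if gold == 0:
        for cue in REPORTING_CUES:
            if cue in text.lower():
                weights[cue] = -2.0

retrained_errors = sum(toy_model(t, weights) != y for t, y in dataset)
print(len(hard_negatives), retrained_errors)  # 2 0
```

The pattern generalizes: easy examples contribute little gradient, while each mined hard negative pinpoints exactly where the decision boundary is wrong.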
ChromaDB functions as the vector database used to store and efficiently retrieve the embeddings generated from text data during the contrastive learning process. Utilizing ChromaDB enables rapid nearest neighbor searches, identifying similar examples – both positive and crucially, hard negatives – required for training. This accelerated retrieval of relevant examples directly reduces training time and improves overall model performance in conspiracy detection. The database architecture supports scalable storage and retrieval, allowing for expansion of the training dataset without significant performance degradation, and is integral to the iterative refinement of the model’s ability to discriminate between factual and conspiratorial claims.
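What the vector-database step provides can be shown in miniature without the database itself: store embeddings, then run nearest-neighbour search to pull semantically close same-label examples (positives) and close-but-opposite-label examples (hard-negative candidates). The bag-of-words "embeddings" below are a toy stand-in for real encoder vectors, and the tiny corpus is invented; ChromaDB performs the same query pattern at scale.

```python
# In-memory stand-in for the ChromaDB retrieval step: embed, index, and
# query for nearest same-label and opposite-label examples. Toy embeddings.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # bag-of-words stand-in

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    ("the cabal secretly controls world banks", 1),
    ("banks report quarterly earnings to regulators", 0),
    ("journalists debunk the cabal banking myth", 0),
]
index = [(embed(t), t, y) for t, y in corpus]  # the "collection"

def query(text: str, label: int, k: int = 2):
    ranked = sorted(index, key=lambda e: cosine(embed(text), e[0]), reverse=True)
    positives = [t for _, t, y in ranked if y == label][:k]
    hard_negs = [t for _, t, y in ranked if y != label][:k]
    return positives, hard_negs

pos, neg = query("the cabal controls the banks", label=1)
print(pos[0])  # nearest same-label example
print(neg[0])  # nearest opposite-label example: a hard-negative candidate
```

Note that the top opposite-label hit is the debunking sentence, not the unrelated one: high similarity plus a different label is exactly what makes a retrieved example a useful hard negative.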
The effectiveness of the retrieval process in building a robust training dataset relies heavily on semantic discrimination – the ability to distinguish between nuanced meanings and contextual variations within text. This necessitates moving beyond simple keyword matching to understand the underlying intent and factual basis of a claim. A system with strong semantic discrimination can identify examples that appear similar but differ in crucial details, ensuring that the model is exposed to a diverse range of challenging cases. Without this capability, the retrieval process may return irrelevant or redundant examples, hindering the model’s ability to generalize and accurately detect misinformation, particularly in complex domains like conspiracy theories.
Beyond the Algorithm: Validation, Transferability, and the Pursuit of Transparency
Researchers evaluated the adaptability of AILS-NTUA by successfully integrating it with the Qwen-3-8B-Instruct large language model. This cross-platform implementation signifies a crucial step towards generalized application, moving beyond model-specific constraints. The demonstrated transferability suggests that the psycholinguistic marker extraction techniques developed within AILS-NTUA are not inherently tied to a particular model architecture, thereby expanding its potential for use across a wider range of natural language processing systems and datasets. This flexibility is essential for robust and scalable conspiracy endorsement detection, offering a pathway to deploy the methodology in diverse analytical contexts.
The research team prioritized transparency and future development by comprehensively logging the entire experimental workflow using MLflow. This meticulous tracking captured all parameters, code versions, datasets, and resulting metrics, creating a fully reproducible record of the study. Beyond simply verifying the results, this approach allows for seamless continuation of the work; researchers can readily build upon the existing framework, experiment with different configurations, and efficiently optimize the model for enhanced performance. The use of MLflow not only strengthens the scientific rigor of the findings but also fosters collaborative innovation and accelerates progress in conspiracy endorsement detection through improved semantic role reasoning.
The methodology demonstrably enhances the identification of conspiracy endorsement through a refined extraction of psycholinguistic markers. Results indicate a weighted F1 score of 0.79, representing a 0.03 performance gain compared to previous methods. Critically, the system exhibits a significant 2.7 point increase in Actor F1 score, attributable to advancements in semantic role reasoning – the ability to accurately discern the relationships between actors and their actions within text. This improvement suggests a more nuanced understanding of conspiratorial narratives and a greater capacity to distinguish genuine endorsements from simple mentions or critiques, ultimately bolstering the precision of detection systems.
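For readers unfamiliar with the metric, the weighted F1 reported above averages per-class F1 scores with weights proportional to each class's support. The label arrays below are made-up toy data, not the task's evaluation set.

```python
# How a weighted F1 score is computed: per-class F1, averaged by class support.
from collections import Counter

def weighted_f1(gold: list[int], pred: list[int]) -> float:
    support = Counter(gold)
    total = 0.0
    for cls, n in support.items():
        tp = sum(g == p == cls for g, p in zip(gold, pred))
        fp = sum(p == cls != g for g, p in zip(gold, pred))
        fn = sum(g == cls != p for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += (n / len(gold)) * f1
    return total

gold = [1, 1, 1, 0, 0, 0, 0, 0]  # toy labels: 1 = endorsement, 0 = other
pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(weighted_f1(gold, pred), 3))  # 0.75
```

Because the weighting follows class frequency, a majority class with many easy examples can mask weak performance on the rarer endorsement class, which is one reason the task also reports per-role scores such as Actor F1.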
The system presented doesn’t merely detect conspiracy theories; it dissects the language used to construct them, pinpointing psycholinguistic markers with an agentic workflow. One considers this approach akin to reverse-engineering belief itself. As Tim Berners-Lee observed, “The Web is more a social creation than a technical one.” This sentiment echoes the work’s focus; it isn’t simply about technical accuracy in identifying patterns, but understanding the social construction of narratives – how language shapes and reinforces belief systems. The agentic LLM, then, acts as a tool to deconstruct these constructions, revealing the underlying mechanisms at play, and challenging the notion of a singular ‘truth’.
What’s Next?
The demonstrated success in extracting psycholinguistic markers and flagging conspiratorial reasoning merely highlights the depth of the underlying problem. The system functions, yes, but the very act of building a detector necessitates a precise articulation of what constitutes ‘conspiracy’ and ‘manipulation’: definitions inherently subject to perspective and, disturbingly, to manipulation themselves. It’s a recursive challenge: to identify the patterns of deception, one must first embody a model of truth, a precarious position given the plasticity of belief.
Future work will inevitably focus on adversarial refinement, a digital arms race. Yet, the most pressing issue isn’t simply improving robustness against increasingly clever obfuscation. It’s acknowledging that these models, at their core, are sophisticated pattern-matchers, not arbiters of truth. The best hack is understanding why it worked, and every patch is a philosophical confession of imperfection.
Ultimately, the real frontier isn’t more accurate detection, but a deeper understanding of why these patterns resonate. What vulnerabilities in human cognition are being exploited? The task isn’t to build a perfect lie detector, but to inoculate against the lies themselves. That, predictably, is a far messier, more human problem.
Original article: https://arxiv.org/pdf/2603.04921.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-03-07 22:05