The Storytelling Gap: Why AI Struggles with Fiction

Author: Denis Avetisyan


A new analysis reveals the fundamental limitations of artificial intelligence in replicating the complex narrative structures and emotional resonance of human-authored fiction.

Current large language models lack the capacity for nuanced narrative causation, emotional architecture, and information revaluation necessary for compelling storytelling.

Despite rapid advances in artificial intelligence, generating truly compelling fiction remains a surprisingly intractable problem, a tension explored in this paper, ‘The AI Fiction Paradox’. We argue that this difficulty stems not from a lack of data, but from fundamental mismatches between the logic of storytelling and the architecture of current large language models: specifically, challenges in replicating narrative causation, managing informational revaluation, and constructing multi-scale emotional architecture. These limitations explain the intense demand for copyrighted fiction in AI training and raise a critical question: as AI overcomes these hurdles, what implications will mastery of fiction’s uniquely powerful cognitive and emotional patterns have for human manipulation and control?


The Illusion of Narrative Depth: A Growing Dependence

Contemporary Large Language Models are demonstrating a pronounced reliance on fictional texts as a primary source of training data, a trend that increasingly eclipses the proportion of other text types within their datasets. This isn’t merely a matter of abundance; the very structure and complexity of narratives – with their intricate character relationships, plot developments, and thematic explorations – appear particularly well-suited to the learning mechanisms of these models. Consequently, a significant portion of the data used to train LLMs now consists of novels, short stories, and other forms of fiction, potentially influencing their outputs and shaping their understanding of the world through the lens of storytelling. This shift raises important questions about the nature of artificial intelligence and how it learns to process and interpret information, especially given the inherent subjectivity and imaginative elements present in fictional works.

Despite a growing dependence on fictional texts for training, Large Language Models demonstrate a surprisingly limited capacity for generating truly compelling narratives. Current models achieve a score of only 41.6% on the NoCha benchmark, a metric specifically designed to evaluate performance on tasks demanding comprehensive reasoning throughout entire books. This result indicates a significant deficiency in the ability to maintain narrative coherence and grasp the intricate relationships between events across extended storylines. The low score isn’t simply a matter of stylistic shortcomings; it reflects a fundamental challenge in processing information at a global scale, hindering the creation of fiction that feels logically sound and emotionally resonant. While LLMs can often mimic surface-level elements of storytelling, they struggle to synthesize information and construct narratives that demonstrate genuine understanding of plot, character development, and thematic consistency.

The escalating reliance of Large Language Models on fictional texts for training introduces significant legal complexities regarding copyright infringement. A substantial portion of the data used to develop these AI systems consists of copyrighted novels, short stories, and other creative works, often ingested without explicit permission from rights holders. This practice opens developers to potential lawsuits, as the models effectively learn and reproduce patterns, styles, and even specific plot elements from these protected materials. Determining fair use versus infringement remains a critical challenge, particularly given the scale of data involved and the difficulty in tracing the origins of generated content. The legal landscape is rapidly evolving, forcing developers to explore strategies like data filtering, licensing agreements, and the development of techniques to minimize the reproduction of copyrighted material, all while balancing innovation with legal compliance.

Current state-of-the-art Transformer models, despite their proficiency in many language tasks, demonstrate a surprising fragility when confronted with the complexities of long-form narrative. Studies reveal these models struggle to reliably identify the essential components of a story arc – the rising action, climax, and resolution – even when employing sentiment analysis as a guiding tool. This isn’t merely a matter of computational limitations; it suggests a fundamental disconnect in how these models understand narrative. While capable of statistically predicting the next word in a sequence, they appear to lack the capacity for holistic reasoning about character motivations, thematic development, and the subtle interplay of events that define a compelling story. The consistent failure to accurately map narrative structure indicates that LLMs, despite being trained on vast amounts of fictional text, haven’t internalized the underlying principles of storytelling, hinting at a deeper challenge in achieving true artificial general intelligence regarding creative endeavors.
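To make the sentiment-analysis approach discussed above concrete, the sketch below scores narrative segments with a tiny valence lexicon and locates the emotional low point of the arc. The lexicon, segment texts, and scores are invented for illustration; real studies use trained sentiment models rather than word lists, and arc identification is far harder than this toy suggests.

```python
# Toy sentiment-arc pass: score each segment, then find the extremum.
LEXICON = {"joy": 1.0, "hope": 0.5, "loss": -1.0, "fear": -0.8, "calm": 0.3}

def segment_sentiment(segment: str) -> float:
    """Average valence of the lexicon words found in a segment."""
    scores = [LEXICON[w] for w in segment.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def sentiment_arc(segments):
    return [segment_sentiment(s) for s in segments]

story = [
    "calm morning hope rising",   # exposition
    "fear grows as loss looms",   # rising action
    "loss fear loss",             # emotional nadir
    "hope returns joy and calm",  # resolution
]
arc = sentiment_arc(story)
climax = min(range(len(arc)), key=lambda i: arc[i])  # most negative segment
```

Even this toy shows why the method is fragile: the "climax" found is just a numeric extremum, with no notion of stakes, causation, or character, which is precisely the gap the studies above report.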

The Architecture of Feeling: Layered Sentiment and the Illusion of Depth

Emotional impact in fiction is not solely derived from plot progression, but is actively constructed through a deliberate layering of sentiment across all textual elements. This involves considering the emotional valence of individual word choices, the cumulative effect of sentence structure, the emotional weight carried by scene descriptions, and the overall trajectory of the narrative arc. Authors utilize these elements to create a holistic emotional experience for the reader, moving beyond simply what happens in the story to how the story feels at every scale, from the micro-level of prose to the macro-level of thematic resolution. A successful implementation ensures that emotional resonance is not accidental, but rather a consistently reinforced aspect of the reading experience.

Multi-Scale Emotional Architecture involves deliberately constructing emotional responses in an audience by manipulating sentiment at various levels of granularity within a narrative. This process moves beyond simply including emotional content; it focuses on layering these sentiments – from the connotative value of individual word choices and sentence structures to the broader emotional implications of plot events and character interactions. By strategically coordinating these different scales of emotional signaling, a narrative can create a cumulative and sustained emotional effect, increasing reader engagement and influencing their interpretation of the story. The efficacy of this architecture relies on the consistent application of emotional cues across these scales, ensuring coherence and maximizing the intended emotional impact.

Narrative strength is significantly impacted by the depth of character development and the believability of social interactions. Complex characters, exhibiting internal contradictions and evolving motivations, facilitate reader identification and emotional investment. Social dynamics, including hierarchies, alliances, and conflicts, provide a framework for demonstrating character traits and driving plot progression. These interactions must adhere to established behavioral patterns and logical consequences to maintain narrative coherence and ensure the relationships feel authentic, fostering a stronger emotional connection for the audience.

A structurally sound plot, essential for emotional impact, necessitates a logical sequence of events where each outcome is directly and plausibly linked to preceding actions. This causal chain, often referred to as plot mechanics, establishes narrative coherence and prevents arbitrary occurrences that can diminish reader immersion. Effective plot construction doesn’t simply present what happens, but demonstrates why it happens, grounding emotional responses in believable consequences. The strength of these causal links directly influences the perceived authenticity of the narrative and, consequently, the depth of emotional engagement; a weak or illogical chain can undermine even the most compelling character work or thematic elements.
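Plot mechanics of this kind can be pictured as a small causal graph: each event lists the earlier events that motivate it, and any event with no cause (other than the opening) flags exactly the kind of arbitrary occurrence described above. The event names below are invented for illustration.

```python
# Each event maps to the earlier events that motivate it.
causes = {
    "letter_found": ["opening"],
    "secret_revealed": ["letter_found"],
    "betrayal": ["secret_revealed"],
    "duel": [],   # happens with no motivating event: a coherence flaw
}

def unmotivated(events, root="opening"):
    """Events (other than the opening) that nothing in the plot causes."""
    return [e for e, c in events.items() if e != root and not c]
```

Running `unmotivated(causes)` surfaces `"duel"` as a dangling event, the structural equivalent of a plot development the reader has no reason to believe.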

The Revelation of Meaning: Deferred Significance and Narrative Causation

Narrative causation, distinct from logical causation, creates a sense of both surprise and inevitability in audiences. Logical causation establishes a direct, predictable link between cause and effect; a narrative cause, however, may initially appear insignificant before gaining relevance as the story unfolds. This creates a feeling that, while unexpected at the moment, the event was foreshadowed and logically fits within the established narrative context. Skilled storytellers manipulate this perception, introducing elements that only gain full meaning in retrospect, thereby enhancing the emotional impact and perceived coherence of the plot. This differs from simply withholding information; narrative causation relies on the reinterpretation of previously presented details based on new context.

Informational revaluation is a narrative technique wherein details presented early in a sequence acquire increased importance as the story progresses. These initially seemingly inconsequential elements are not explicitly flagged as significant; rather, their relevance becomes apparent through subsequent revelations or contextual shifts. This process creates a sense of depth and believability, as it mirrors how information is often processed in real-world scenarios where complete understanding emerges gradually. Effective implementation relies on the strategic placement of these details and the pacing of their eventual significance, contributing to both surprise and a feeling of retrospective coherence within the narrative structure.
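A minimal sketch of the bookkeeping this technique implies: details enter a ledger with low salience, and a later revelation retroactively boosts the weight of the details it depends on. The ledger class, detail names, and weights are all hypothetical, chosen only to make the retroactive re-weighting visible.

```python
class DetailLedger:
    """Track narrative details and retroactively re-weight them."""

    def __init__(self):
        self.salience = {}

    def note(self, detail, weight=0.1):
        # Details enter the story looking unimportant.
        self.salience[detail] = weight

    def reveal(self, revelation, depends_on, boost=0.8):
        # A revelation is maximally salient and lifts its antecedents.
        self.salience[revelation] = 1.0
        for d in depends_on:
            if d in self.salience:
                self.salience[d] = max(self.salience[d], boost)

ledger = DetailLedger()
ledger.note("limp in the left leg")   # chapter 1: seems incidental
ledger.note("weather that evening")   # chapter 1: pure scenery
ledger.reveal("the narrator was there",
              depends_on=["limp in the left leg"])
```

After the reveal, the limp carries high salience while the weather stays inert, mirroring how a reader reinterprets only the details the revelation touches.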

Large Language Models (LLMs) commonly utilize attention mechanisms to weigh the importance of different input tokens when generating text. This approach prioritizes immediate contextual relevance, potentially leading to difficulty with informational revaluation, the process where seemingly minor details acquire significance later in a narrative. Because attention is weighted towards the most prominent elements within a limited context window, LLMs may fail to adequately track or emphasize details that only become crucial with subsequent revelations. This short-sightedness stems from the model’s optimization for predicting the next token based on immediate probabilities, rather than maintaining a comprehensive understanding of long-term implications and deferred meaning within a broader narrative structure.
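The dynamic described here can be seen in miniature with scaled-down attention arithmetic. In the toy example below, softmax weighting over one-dimensional "embeddings" concentrates nearly all attention on the key most similar to the query, while a weakly related early token receives near-zero weight, a rough analogue of how a deferred-payoff detail gets underweighted. All values are invented, and real attention operates over high-dimensional vectors with scaling and multiple heads.

```python
import math

def attention_weights(query, keys):
    """Softmax over (1-D) query-key dot products."""
    exps = [math.exp(query * k) for k in keys]
    total = sum(exps)
    return [e / total for e in exps]

# The key 2.9 closely matches the query and absorbs nearly all the
# weight; the weakly related early token (0.1) is all but ignored.
w = attention_weights(query=3.0, keys=[0.1, 2.9, 0.2])
```

Because the exponential in the softmax sharpens differences in similarity, a detail whose relevance is not yet apparent competes at a severe disadvantage against locally salient context.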

Recent evaluations demonstrate that incorporating discourse-level features into Large Language Model (LLM) generation processes yields substantial improvements in narrative quality. Specifically, performance metrics have shown an increase of over 40% when LLMs are explicitly designed to model informational revaluation – the technique of initially presenting details as unimportant, then revealing their significance later in the narrative. This enhancement suggests that current LLM architectures, which often prioritize immediate contextual relevance, benefit significantly from mechanisms that facilitate tracking and re-evaluating information across longer narrative spans. The observed improvement indicates that explicitly modeling this process allows LLMs to better approximate the nuanced causality found in compelling storytelling.

The Illusion of Understanding: Beyond Surface Coherence in Narrative AI

The creation of genuinely captivating fictional narratives demands more than simply arranging words in a grammatically correct and logically consistent manner; it requires a departure from systems focused solely on textual coherence and readily apparent emotional cues. Current AI models often excel at mimicking surface-level emotions – a character says they are sad, for example – but struggle to convey the underlying complexities that resonate with an audience. A truly compelling story hinges on nuanced character motivations, subtle thematic development, and the ability to evoke emotional responses through implication and carefully constructed narrative arcs. Therefore, progress in narrative AI necessitates a shift towards systems capable of modeling these deeper elements, moving beyond predictable emotional displays to achieve genuine emotional impact and lasting narrative resonance.

Current narrative AI often prioritizes grammatical correctness and immediate contextual relevance, but compelling stories demand a far more nuanced approach to emotion and information. Sophisticated models require an understanding of multi-scale emotional architecture – the way feelings aren’t monolithic, but rather layered, evolving over time, and influenced by a character’s history and relationships. Crucially, these systems must also master informational revaluation, the process by which new data reshapes a character’s understanding of the world and, consequently, their emotional state. Simply expressing sadness, for example, isn’t enough; an advanced AI should be able to demonstrate how that sadness shifts from grief to acceptance, or even to righteous anger, as a character processes unfolding events and reinterprets past experiences. Without this ability to dynamically manage and revise information – and to connect those revisions to believable emotional responses – narratives will remain superficially coherent but ultimately lack the depth and resonance that captivate audiences.

The creation of compelling, long-form fiction by artificial intelligence presents a unique challenge known as the AI-Fiction Paradox: models proficient in generating locally coherent text often struggle to maintain consistency and meaningful development across an entire narrative. Current systems frequently ‘forget’ previously established details or introduce contradictions as stories progress, hindering genuine engagement. Resolving this requires a shift towards AI capable of not simply tracking information, but actively re-evaluating it – assigning relative importance to facts, recognizing their evolving relevance to the plot, and updating internal representations accordingly. Such models must move beyond simple recall to implement systems of informational weighting and contextual prioritization, enabling them to dynamically manage a narrative’s knowledge base and ensure that plot twists, character arcs, and thematic elements resonate consistently throughout extended storytelling.
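One concrete shape such a system might take is a small narrative knowledge base that rejects contradictions of established facts and lets the plot raise a fact's priority when it becomes relevant. The class, attribute names, and priority values below are purely illustrative, not a description of any existing system.

```python
class StoryState:
    """Narrative facts with priorities; contradictions are rejected."""

    def __init__(self):
        self.facts = {}   # attribute -> (value, priority)

    def assert_fact(self, attr, value, priority=0.5):
        if attr in self.facts and self.facts[attr][0] != value:
            raise ValueError(f"contradiction on {attr!r}")
        self.facts[attr] = (value, priority)

    def reprioritize(self, attr, priority):
        # The fact itself is unchanged; only its plot relevance shifts.
        value, _ = self.facts[attr]
        self.facts[attr] = (value, priority)

state = StoryState()
state.assert_fact("heroine.eye_color", "green")   # established early
state.reprioritize("heroine.eye_color", 0.9)      # now plot-relevant
```

The separation of value from priority is the point: consistency checking prevents the "forgetting" failure mode, while reprioritization is a crude stand-in for the informational weighting the paragraph above calls for.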

The progression of narrative AI promises a fundamental shift in how stories are conceived, delivered, and experienced. Beyond simply generating text, these emerging technologies envision entertainment tailored to individual preferences, where plots dynamically respond to user choices and emotional states. This isn’t merely about ‘choose your own adventure’ on a grander scale; it’s the potential for truly living within a narrative, with AI acting as a responsive world-builder and character director. Such systems could generate unique content endlessly, offering personalized mythologies, interactive dramas, and gaming experiences that evolve with each interaction. The implications extend beyond entertainment, potentially revolutionizing educational simulations, therapeutic interventions, and even the preservation of cultural storytelling traditions by adapting and re-imagining them for future generations.

The pursuit of artificial narrative reveals a curious irony. This work, dissecting the AI Fiction Paradox, highlights how easily systems can mimic form – generating sentences, constructing plots – yet utterly fail to grasp the underlying function of storytelling: emotional resonance and informational revaluation. It’s a predictable outcome, really. As Paul Erdős once observed, “A mathematician knows a lot of things, but knows nothing deeply.” Similarly, these large language models amass data, yet lack the ‘deep’ understanding of human sentiment arcs and narrative causation necessary to craft truly compelling fiction. Every generated story feels like a prophecy of its own computational limitations.

The Looming Silhouette

The pursuit of automated fiction reveals less about the potential of artificial intelligence and more about the fragility of human understanding. This work suggests the problem isn’t simply generating sentences, but constructing believable worlds where information accrues not linearly, but through the distorting lens of character motivation and emotional resonance. Attempts to codify ‘sentiment arcs’ feel increasingly like charting weather patterns – predictive, yet ultimately subject to chaotic, unforeseen shifts. The architecture isn’t structure – it’s a compromise frozen in time.

Future research will likely focus on increasingly granular modeling of human cognitive biases, attempting to inject ‘believability’ through statistical mimicry. Yet, this risks confusing correlation with causation, replicating the surface of narrative without grasping its underlying principles. Technologies change, dependencies remain. The true challenge isn’t teaching a machine to tell a story, but understanding why stories matter in the first place – a question that may lie beyond the reach of computation.

One suspects the ‘AI Fiction Paradox’ isn’t a problem to be solved, but a boundary to be acknowledged. The limitations revealed aren’t failures of engineering, but affirmations of the uniquely human capacity for meaning-making, for finding patterns in noise, and for embracing the beautiful, irreducible ambiguity at the heart of all great storytelling.


Original article: https://arxiv.org/pdf/2603.13545.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-03-17 21:05