
The best questions often arrive unexpectedly. Over dinner recently, a friend’s six-year-old daughter turned to address the table and, apropos of nothing, asked: “Who was the first human?” Since my own child – an unblinking 10-month-old with potato across his forehead – was unable to respond, the assignment fell by default to the four adults assembled. Who was the first human? Who was the first human? Who was the first human? The question hung in the air like bar smoke. A clang of silverware rang out from the nearby kitchen. We shifted in our seats. The questioner grew impatient.
“It depends,” I offered, “on how you define a human.” Silence. Dissatisfaction. The conversation moved to another topic.
Later that night I mulled over the philosophical weakness – nay, banality – of my answer. To state that one thing depends upon either the theoretical or scientific classification of some other thing seems, on the surface, a reasonable postulation. It asks that we trouble the conceptual categories that make knowledge possible in the first place. It claims that structures are unstable. It challenges objectivity. It sets up parameters. It states something. It states the obvious. It states absolutely nothing. It is a bullshit answer. The performance of rigour. Neutrality de rigueur. A six-year-old could sense this. Eventually, I realised why my response felt at once adequate and completely underwhelming: it was precisely the kind of thing a Large Language Model would say.
To confirm my hunch, I opened my phone and typed in the query. “Who was the first human?” Claude: “That’s a fascinating question, and the answer depends on how you define ‘human’.” Next, ChatGPT: “There isn’t a single identifiable ‘first human’.” 2 Both models go on to explain that the anatomical and cognitive traits we associate with Homo sapiens evolved over thousands of generations. Claude distinguishes between scientific, religious, and mythological perspectives, referencing Manu, the archetypal first man of Hinduism, and so on. ChatGPT provides a five-part framework organised around the concept of a family tree and the practice of genetic tracing. Both models, in short, return the statistical equivalent of “well, actually”. Diplomatic. Structured. Satisfactory but emotionally vacant.
Of course, neither response is inaccurate – both are broadly correct and logical. What is striking, rather, is their procedural deferral of the question itself: a kind of elocutionary smokescreen for their tonal evasiveness. Faced with a child’s demand for singularity – who was the first human? – the models deliver taxonomies, method, multiplicity. They dissolve the figure of the human into a process. In regurgitating the dominant scientific account of human evolution, they also adopt a particular stylistic posture: one that privileges qualification over commitment, context over certitude, expansion over a foray into humour, irony, or sarcasm. They’re like the corporate psychopath who, when presented with a task that requires nuance and a bit of wit, impulsively blurts: “Let’s map this out, shall we!” Programmatic sludge. Schematic sewage. Even worse, a flowchart.
How to account for this weakness? While my familiarity with Darwin’s theory of natural selection is on par with that of the average person, LLMs possess a surplus of information – layer upon layer of powerful data. The largest body of information ever assembled. They have more than enough knowledge. Too much, one might argue. And yet so vast is the corpus on which they are trained that the substance of their answers often feels scrambled with noise. Tangled and tinny. Like cheap speakers in a busy cafe. Like commercial television. Most crucially, though, the LLM cannot assume the burden of choosing in such a way that hazards the essence of human absurdity. It has learned to read and rehearse, not to tête-à-tête in the manner Oscar Wilde famously described as a “sort of spiritualized action”. 3 It has no orientating affect. It has no joie de vivre. It is structurally allergic to originality. Faced with such a question, then, an LLM will always fail because it favours optimisation over risk – machinic pragmatism over old-fashioned conversational fancy.
And what about truth and falsity? By some accounts, LLMs are incapable of delivering truthful statements. 4 Rafael Alvarado points out that “even when LLMs produce ostensibly true sentences, they do so accidentally—these sentences are all cases of what epistemology calls unjustified true belief.” 5 He goes on to clarify that they are “unjustified because the systems that generate them do not incorporate any of the mechanisms by which humans seek or validate truth claims, either rationally or empirically.” But what is truth, anyway, if not an elusive and historically contingent concept? Truths change. Perspective matters. What is more, “standards of truth themselves have a history” and a “probabilistic” approach to truth is a “comparatively recent development in epistemology”, emerging alongside eighteenth-century efforts at the quantification of probability, themselves intertwined with religious and philosophical beliefs. 6
In many cases, too, fabrication is a good and necessary thing. The embellished anecdote. The truth distorted for no reason other than amusement. As Aaron Hanlon notes, our “awareness of fictionality primes us to respond as readers.” 7 Fiction also primes our analytical thinking. Why, then, has the capacity to produce truth become a measure of algorithmic intelligence? After all, the broader obsession over whether LLMs can craft truthful statements sits in paradoxical relation to another truth: humans lie. We’re extremely good at lying. Does this mean that I should have lied? Or should I have confessed that I simply do not know who, exactly, was the first human? Confronted with a prompt it cannot complete, the LLM is likely to falsify – and then justify – something that represents an answer. Like a used-car salesman who will say anything to sell you an early 2000s Jeep Wrangler, it will defend, with ostensible certitude, a claim that does not occur in its training set. This has come to be known as AI’s hallucination, as if such a flex were somehow unique to machinic outputs. 8 But we have all witnessed this kind of performance at the pub. We tolerate it, too, in our systems of government.
1. Virginia Woolf, ‘Modern Fiction’ in The Common Reader. New York: Harcourt, Brace and Company (1925): 212.
2. The author apologises unreservedly for enacting the most clichéd essayistic flex of recent times in quoting fetishistically from a Large Language Model; however, in the context of the above discussion it seems entirely necessary.
3. Oscar Wilde, ‘The Critic as Artist’ in Intentions. New York: T.B. Mosher (1904).
4. For an incisive account of this debate through the lens of rhetoric, see David J. Gunkel’s recent article ‘Persuasive machines: large language models and the art of rhetoric.’ AI & Society (2026). Here, Gunkel writes: “the rise of LLMs can only appear as an urgent crisis. If what matters in communication is the truthful transmission of knowledge from one mind to another, then systems that generate authoritative-sounding discourse without understanding represent a profound corruption to the order of things.”
5. Rafael C. Alvarado, ‘What Large Language Models Know.’ Critical AI 2.1 (2024).
6. Luke Munn, Liam Magee and Vanicka Arora, ‘Truth machines: synthesizing veracity in AI language models.’ AI & Society 39 (2024): 2759-2773. On this point, Munn et al. reference Ian Hacking’s important study The Taming of Chance (1990), which charts the nineteenth-century shift from a deterministic worldview to the “probabilisation” of Western society and scientific thought.
7. Aaron Hanlon, ‘LLM Outputs Are Fictions.’ Critical AI 2.1 (2024).
8. “Hallucination” has been used in the field of computer vision since at least the 1980s. In the early 2000s, the term underwent a semantic shift to signify factually incorrect or misleading machinic outputs in the context of AI. In 2023, Benj Edwards coined “confabulation” as an alternative term to describe factual errors generated by LLMs. In 2024, Michael Townsen Hicks, James Humphries and Joe Slater argued that such falsehoods are better understood as “bullshit”. See Hicks et al. ‘ChatGPT is bullshit.’ Ethics and Information Technology 26.38 (2024); Benj Edwards, ‘Why ChatGPT and Bing Chat are so good at making things up,’ Ars Technica (6 April 2023); Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn and Yarin Gal, ‘Detecting hallucinations in large language models using semantic entropy,’ Nature 630 (2024): 625-630; Peiqi Sui, Eamon Duede, Sophie Wu and Richard So, ‘Confabulation: The Surprising Value of Large Language Model Hallucinations,’ ACL Anthology (2024).