On Sunday evening, the Minister of Communications and Digital Technologies withdrew South Africa’s Draft National Artificial Intelligence Policy. The reason is extraordinary and worth dwelling on.
The very document meant to govern AI in South Africa had been compiled with the assistance of generative AI, and contained, on the Minister’s own account, “various fictitious sources in its reference list”. A policy intended to set the country’s terms for responsible AI use was undermined by the failure mode it was meant to address.
It is tempting to read this as a uniquely South African story. It is not. It is the most public example to date of a problem that lawyers, judges, regulators and policymakers are now addressing from different sides of the same desk. This problem is the jagged frontier of AI and law.
What the jagged frontier means

Supplied image source: Dell'Acqua et al (2023)
The phrase comes from a 2023 paper by Fabrizio Dell’Acqua and others in the Harvard Business School Working Paper series.
They asked a simple question: When is AI genuinely useful to knowledge workers, and when is it actively harmful?
Their finding, in plain terms, was that AI’s capability runs along a “jagged frontier”: sharply uneven, full of dips and peaks, and almost impossible to map from the inside. Within the frontier, AI significantly improves performance. Just outside it, the same tool degrades performance, and the user, by definition, cannot see the line.
For legal work, this matters directly. Legal work product is text-driven, pattern-rich and often formulaic, exactly where AI looks most capable in 2026. But it is also citation-dependent and consequence-heavy, exactly where AI’s failures are most expensive: the profession’s tolerance for error is low, and the field is built around exhaustive attention to detail.
The DCDT episode this past weekend is a useful illustration. Of the references in the Draft Policy, several pointed to academic articles and authors that simply do not exist.
The Minister’s assessment, when the withdrawal came, was direct: “The most plausible explanation is that AI-generated citations were included without proper verification.” A department drafting a policy that demands human oversight of AI had itself encountered the practical difficulty of consistently applying that oversight in drafting.
The institutional point matters more than the irony of the episode. Public consultation rests on the premise that the document under review is what it purports to be. When the source list is fictional, the participatory basis of the entire process is compromised.
Not only now, and not only South Africa
Lawyers have encountered similar issues across the globe as AI tools have become more embedded in legal work. South Africa now has its own developing line of authority, including matters such as Mavundla v MEC and Northbound Processing v South African Diamond and Precious Metals Regulator, where counsel relied on AI-generated cases, citations or quotations that were later found not to exist.
International examples, beginning with Mata v Avianca in New York in 2023 and continuing through more recent matters in the United Kingdom, Canada and Australia, show a similar pattern.
Just last week, several media outlets reported that some law firms have faced issues of this kind, notwithstanding standard precautions and emerging best-practice policies. A database created by French data scientist and lawyer Damien Charlotin, which tracks legal decisions in which generative AI produced hallucinated content (typically inaccurate citations), had recorded some 1,352 cases across several countries at the time of writing.
The consequences for a lawyer who uses AI without adequate safeguards can be serious: court orders requiring explanations, adverse cost orders, and referrals to disciplinary bodies. In some instances, the underlying issue may be less about deliberate misconduct and more about misplaced reliance on fluent, plausible-sounding output.
It is worth noting that the pattern across the regulator and courtroom examples is identical. A user prompts an AI model for help with something at the long tail of their field. The model produces a confident, well-formed and otherwise believable answer. The user, persuaded by that fluency, treats it as evidence of grounded research, and the output is filed or published. Only later does somebody actually look up the citation.
Why citations, specifically, and why so reliably
The question worth asking is not whether AI hallucinates, but why citations are such a consistent point of failure. Several mechanisms are at play, and together they explain why verification has to be made structural rather than discretionary.
AI models are not optimised for truth
Fundamentally, a large language model (LLM) is trained to predict the next token given prior context, not to track what is true in the world. This is rooted in statistical probability, not the deterministic logic that lawyers spend years training on.
There is no internal “truth” variable in an LLM. The output is a probability distribution over plausible continuations of text that 'looks like' its training data. It is important to remember: fluency is the objective; truth is a by-product.
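The point can be made concrete with a toy sketch. The snippet below is not a real LLM; it is a minimal bigram model, with invented training text, that predicts the next word purely from frequency. Nothing in it checks truth, only what usually follows what:

```python
import random

# Toy illustration (not a real LLM): a bigram model predicts the next word
# purely from frequency in its invented training text. No truth check exists.
training_text = (
    "the court held that the contract was valid . "
    "the court held that the claim was dismissed . "
    "the court found that the contract was void ."
).split()

# Count which words follow which in the training text.
follow_counts = {}
for prev, nxt in zip(training_text, training_text[1:]):
    follow_counts.setdefault(prev, []).append(nxt)

def next_word(prev, rng=random):
    """Sample a plausible continuation; 'plausible' means 'frequent', not 'true'."""
    return rng.choice(follow_counts[prev])

# Fluent continuation of "the court ..." is guaranteed; accuracy is not.
print(next_word("court"))  # "held" or "found", by frequency alone
```

A real model does this over billions of parameters rather than a counts table, but the objective is the same: the most plausible continuation, not the most accurate one.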
Citations are pattern-rich and pattern-cheap
Legal and academic citation formats are stereotyped in a way few other types of writing are. 'Party v Party' (Year) Volume Reporter Page. Author surname, initials, year, title, journal, volume, page. Author names cluster within fields; journals cluster around topics.
The model has learned the 'grammar' of citations independently of their referents. It can therefore produce a perfectly formed citation for a paper that does not exist.
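A hedged sketch of the same idea: the form of a law report citation can be generated from a template with no link to any real case. Every party name, reporter and number below is invented for illustration:

```python
import random

# Hypothetical sketch: a citation's *form* can be produced from a template
# alone. All names, reporters and numbers here are invented.
parties = ["Smith", "Jones", "Naidoo", "Van Wyk"]
reporters = ["SA", "All SA", "BCLR"]

def fake_citation(rng=random):
    """Emit a perfectly formed 'Party v Party (Year) Volume Reporter Page'
    citation that refers to nothing at all."""
    a, b = rng.sample(parties, 2)
    year = rng.randint(1995, 2024)
    volume = rng.randint(1, 4)
    page = rng.randint(1, 999)
    reporter = rng.choice(reporters)
    return f"{a} v {b} ({year}) {volume} {reporter} {page}"

print(fake_citation())  # well-formed, and entirely fictitious
```

The output passes every formatting check a reader might apply by eye, which is precisely why formatting is no evidence of existence.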
The training corpus is loosely compressed into the model’s weights
A model’s knowledge of a citation is not a stored record; it is a reconstruction from compressed patterns. This is very important for lawyers using AI to remember.
For high-signal sources, such as leading SCA judgments or foundational papers, the signal is strong enough that the reconstruction is usually faithful. For long-tail sources, such as an obscure article in a niche journal or an unreported High Court matter, the signal is sparse, and the model interpolates: it generates what it predicts is likely to exist given the topic, the field and the year, even if it does not.
Hallucinated citations, accordingly, almost always sit in the long tail. The model is usually not wrong about the well-known subject matter. However, it is frequently wrong about the obscure article that supports the proposition the user actually needs.
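The frequency effect can be caricatured in a few lines. The citations, counts and confidence formula below are all invented for illustration; the point is only that reliability scales with how often a source appears in training data:

```python
from collections import Counter

# Toy sketch (invented data and formula): a model's "memory" of a citation
# behaves like a frequency count over training text, not a stored record.
sightings = Counter({
    "Smith v Jones 2001 (1) SA 1 (SCA)": 500,  # stands in for a leading, widely cited judgment
    "Obscure article, Niche LJ (2014)": 1,     # stands in for a long-tail source
})

def reconstruction_confidence(citation):
    """Illustrative only: confidence saturates with training frequency."""
    n = sightings.get(citation, 0)
    return n / (n + 10)  # invented formula, not anything a real model computes

print(round(reconstruction_confidence("Smith v Jones 2001 (1) SA 1 (SCA)"), 2))  # → 0.98
print(round(reconstruction_confidence("Obscure article, Niche LJ (2014)"), 2))   # → 0.09
```

The well-attested source reconstructs reliably; the source seen once is, in effect, a guess dressed in the right format.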
The fine-tuning layer rewards confident answers over honest abstention
Reinforcement learning from human feedback, the layer that turns a base model into a chat assistant, tends to score complete, helpful-sounding responses higher than calibrated uncertainty. Fine-tuned models therefore learn to produce something rather than nothing, even when the honest answer would be: “I do not know.”
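The incentive can be sketched as a toy scoring rule. The rule below is invented for illustration and is far cruder than any real reward model, but it shows the mechanism: if abstention scores low, training pushes the model toward always answering:

```python
# Hypothetical sketch of the incentive: if raters score complete-sounding
# answers above honest abstention, fine-tuning favours always answering.
# The scoring rule below is invented for illustration.
def rater_score(answer):
    if "I do not know" in answer:
        return 1  # abstention reads as unhelpful to a rater
    return 3      # a fluent, complete answer scores highest

candidates = [
    "I do not know.",
    "The leading authority is a confident but unverified citation.",
]
# Optimising against the raters selects the confident answer every time.
best = max(candidates, key=rater_score)
print(best)
```

Under such a rule, "I do not know" is never the winning response, however often it would have been the correct one.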
Taken together, these features help explain why inaccurate citations are a recurring risk. The model can reproduce the form of citations, has weaker coverage of the “long tail” of sources, is rewarded for providing complete answers, and does not reliably distinguish between a real reference and a plausible-sounding one.
In practice, fluency should not be treated as a substitute for independent verification.
What this means for the profession
The practical consequence for law firms, in-house counsel and regulators is the same: intervention cannot be discretionary or rely on individual users’ good practice. Policies alone will not achieve the rigour needed to scale the safe use of AI. What is needed is a verification chain: a structural process by which any AI-assisted output is checked against primary sources before it leaves the building.
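In software terms, a minimal sketch of such a verification chain might look like the following. The index entries and draft citations are invented for illustration; in practice the index would be built from a law report database or document management system:

```python
# Hypothetical sketch of a structural verification step: every citation in an
# AI-assisted draft must match an index built from primary sources.
# All entries below are invented for illustration.
verified_index = {
    "Alpha v Beta 2010 (1) SA 100 (SCA)",  # stands in for a checked primary source
}

def verify_citations(citations):
    """Return the citations that could NOT be matched to a primary source."""
    return [c for c in citations if c not in verified_index]

draft = [
    "Alpha v Beta 2010 (1) SA 100 (SCA)",
    "Author, A 'Plausible but fictitious article' (2021) 14 Imaginary LJ 1",
]
flagged = verify_citations(draft)
# Anything flagged must be looked up in the primary source before filing.
print(flagged)
```

The design point is that the check is a gate in the workflow, not a reminder in a policy document: nothing leaves the building with an unmatched citation still in it.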
Where AI genuinely helps, such as in research support, factual analysis and summarising materials already in front of the lawyer, the use case is real. Where it fails, in claims of fact, citations and novel legal questions, the failures are concentrated, predictable and avoidable through process.
The DCDT episode should act as a practical reminder to the legal profession that the frontier is jagged for everyone. A government department drafting national policy can encounter the same risks as counsel before the High Court. The discipline that supports reliable output is not seniority, sophistication or good intent alone. It is verification.
The Minister’s own framing on Sunday is worth taking seriously: “This unacceptable lapse proves why vigilant human oversight over the use of artificial intelligence is critical.”
That is the right lesson. The challenge for lawyers and for the next draft of South Africa’s AI Policy is to embed that oversight in the work, rather than discover its absence afterwards.