When AI Gets It Wrong and Why That Shouldn’t Scare Us
By Richard Sebaggala (PhD)
Stories about lawyers being “caught using AI wrongly” have become a familiar feature of professional headlines. One recent case in Australia illustrates the pattern. A King’s Counsel, together with junior counsel and their instructing solicitor, was referred to state disciplinary bodies after artificial intelligence–generated errors were discovered in court submissions. The documents contained fabricated or inaccurate legal references—so-called hallucinations—which were not identified before filing. When the court sought an explanation, the responses were unsatisfactory. Costs were awarded against the legal team, and responsibility for the errors became a matter for regulators.
The episode was widely reported, often with a tone of alarm. Artificial intelligence, the implication ran, had intruded into the courtroom with damaging consequences. The lesson appeared obvious: AI is unreliable and should be kept well away from serious professional work.
That conclusion, however, is too simple—and ultimately unhelpful. The problem in this case was not that artificial intelligence produced errors. It was that its output was treated as authoritative rather than provisional. What failed was not the technology itself, but the assumptions made about what the technology could do.
Hallucinations are not moral lapses, nor are they merely the result of careless users. They are a structural limitation of current large language models, arising from how these systems are built and trained. Even developers acknowledge that hallucinations have not been fully eliminated. To frame such incidents as scandals is to overlook a more productive question: how should AI be used, and where should it not be trusted?
A small experiment of my own makes the point more clearly. I recently asked ChatGPT to convert students’ group course marks into an Excel-style table, largely to avoid the tedium of manual data entry. The task involved nothing more than copying names, registration numbers, and marks into a clean, structured format. The result looked impeccable at first glance—neatly aligned, professionally presented, and entirely plausible. Yet closer inspection revealed several errors. Registration numbers had been swapped between students, and in some cases, marks were attributed to the wrong individuals, despite the original data being correct.
When I queried why such mistakes had occurred, given the simplicity of the task, the answer lay in how AI systems operate. These models do not “see” data as humans do. They do not inherently track identity, ownership, or factual relationships unless those constraints are explicitly imposed. Instead, they generate text by predicting what is most likely to come next, based on patterns absorbed during training.
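For readers who face similar chores, it may help to see what "independent verification" can look like in practice. The short Python sketch below compares an AI-produced marks table against the original records, using the registration number as the anchor. The file names and column headings are hypothetical, and the details will differ with your own data; the point is only that the check is mechanical and does not depend on the table merely looking right.

```python
import csv

def load_marks(path):
    """Read a CSV with columns name, reg_no, mark into a dict keyed by reg_no."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["reg_no"]: (row["name"], row["mark"]) for row in csv.DictReader(f)}

original = load_marks("marks_original.csv")   # the source of truth
ai_output = load_marks("marks_from_ai.csv")   # the AI-generated table, saved back as CSV

# Flag any student whose name or mark no longer matches their registration number.
for reg_no, expected in original.items():
    if ai_output.get(reg_no) != expected:
        print(f"Mismatch for {reg_no}: expected {expected}, got {ai_output.get(reg_no)}")
```

A check of this kind would have surfaced the swapped registration numbers in my grading exercise long before the table reached anyone else.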
When faced with structured material—tables, grades, legal citations, or names linked to numbers—the system tends to prioritise surface coherence over factual precision. The output looks right, but there is no internal mechanism verifying consistency or truth. This is the same dynamic that produced fabricated case citations in the King’s Counsel matter, and it is why hallucinations also appear in academic references, medical summaries, and financial reports.
Language models are not databases, nor are they calculators. They generate language probabilistically. When asked to reproduce or reorganise factual information, they may quietly reshape it, smoothing entries or rearranging details in ways that make linguistic sense but undermine accuracy. The problem is compounded by the absence of an internal truth-checking function. Unless an AI system is deliberately connected to verified external sources—databases, spreadsheets, citation tools—it has no reliable way of knowing when it is wrong. Confidence, in this context, is meaningless.
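One practical response to that limitation is to keep factual custody with deterministic tools and reserve the model for language work. The sketch below, which assumes the pandas library (with openpyxl installed for Excel output) and uses hypothetical file names, restructures a marks file with ordinary code: every value in the output is copied, not predicted.

```python
import pandas as pd

# Read the marks exactly as recorded: names, registration numbers, group marks.
marks = pd.read_csv("group_marks_raw.csv")

# Any tidying is done with explicit, deterministic operations.
marks = marks.sort_values("reg_no").reset_index(drop=True)

# Writing the table with a library rather than a language model means nothing
# is reshuffled along the way; the values land exactly where they started.
marks.to_excel("group_marks_clean.xlsx", index=False)
```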
The risk increases further when many similar elements appear together. Names, numbers, and references can blur, particularly in long or complex prompts. That is what happened in my grading exercise and what appears to have happened in the legal case. Add to this the way such systems are trained—rewarded for producing answers rather than declining to respond—and the persistence of hallucinations becomes easier to understand. Faced with uncertainty, the model will usually generate something rather than admit ignorance.
This is why the lawyers involved did not err simply by using AI. They erred by relying on its output without independent verification. The same risk confronts lecturers, accountants, doctors, policy analysts, and researchers. In all these fields, responsibility does not shift to the machine. It remains with the professional.
Used properly, artificial intelligence is a powerful tool. It excels at drafting, organising ideas, summarising material, and reducing the burden of repetitive work. It can free time for deeper thinking and better judgment. Where it remains weak is in factual custody, precise attribution, and tasks where small errors carry serious consequences. Confusing these roles is what turns a useful assistant into a liability.
The lesson to draw from recent headlines is therefore not that AI should be avoided. It is that its limits must be understood. AI can work alongside human judgment, but it cannot replace it. When that boundary is respected, the technology becomes a collaborator rather than a shortcut—an amplifier of human reasoning rather than a substitute for it.
Fear, in this context, is the wrong response. What is needed instead is literacy: a clear-eyed understanding of what AI can do well, what it does poorly, and where human oversight is indispensable. The gains on offer—in productivity, creativity, and learning—are too substantial to be dismissed on the basis of misunderstood failures.