This isn't really AI...

...at least, not the way science fiction has trained us to think of AI. The term "AI" is something of a misnomer for LLMs, at least in terms of what we expect when we hear it.
Fiction has conditioned us to believe that AIs are smart and accurate - able to draw on the best of all resources as if with the minds of the best experts in every relevant field. Infallible, to the extent the information exists. Knowing the rules and how to follow them. Think of Lieutenant Commander Data, or even just the Enterprise's ship computer, which can take readings and give accurate answers based on them. Or think of AIVAS, which knew how to plan things out years in advance and manipulate those around it to further its own goals. (Fortunately for the Pernese, AIVAS's goals were set by the original colonists for the good of their descendants, and were never changed.) If anything, fictional AIs tend to be too rigid, such as HAL's insistence on finishing the mission to the exclusion of all other concerns, even while malfunctioning.
The reality of the LLM-based agents we see today (which are often called AI, and touted as capable of replacing human workers) is very different from this fictional ideal. They're fallible, probabilistic machines, not the height of accuracy and precision. They can be wrong as easily as they can be right, and will cite sources that have nothing to do with the claims they're making. If anything, the evidence suggests that LLM hallucinations are getting worse as the body of training material and context grows larger. According to the New York Times, when OpenAI moved from o3 to o4-mini, the hallucination rate on one benchmark rose from 33% to 48% - and a second benchmark was even worse, putting them at 51% and 79% respectively! Yet users expect these LLMs to provide correct, factual information, and depend on them to make decisions? This is a disaster waiting to happen, if it hasn't already. (Spoilers: relying on AI output without adequate human oversight has already resulted in major issues, in spite of instructions Lavingia believed would prevent the exact errors that were made.)
Humans can be fallible too, of course. But humans can learn, or be held responsible if they don't. Unfortunately, an LLM is more like the thousand monkeys trying to produce Shakespeare by banging away on typewriters than it is like a true reasoning machine. It's incentivized to produce results that humans like better, and developers have tried to constrain it in ways that make it more useful, but it ultimately has no actual understanding of the things it is saying, and no ability to meaningfully learn. You can flag an answer as "not good", but you can't explain what the mistake was and have it understand, learn, and specifically avoid that mistake in the future. It can, to an extent, remember your preferences and try to account for them in responses; but it never evidences any genuine understanding of those preferences or why they exist, and it can't use that knowledge in a way that would make it behave meaningfully differently in scenarios that weren't explicitly contextualized. And there is no meaningful way to hold it accountable. You can't fire an LLM, or fine it, or imprison it. It is not an entity, just a tool. If it makes mistakes, it's still humans who are, and must be, accountable for whether or not those mistakes are caught.
It might remember a piece of data, or it might not. It might get it wrong. Humans have that flaw too, but we built computers to reduce it, not to reproduce it for us. Programming languages exist in large part to provide strict, deterministic rules for behavior; LLMs, however, operate on natural language, and natural languages are polysemous - a single word can have multiple meanings depending on context, to say nothing of tone of voice. Natural languages are ambiguous and non-deterministic. Even ignoring hallucinations, LLMs can easily misunderstand instructions in subtle ways that damage the output non-obviously, with the problems only showing up over time.
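To make that contrast concrete, here's a small, purely illustrative TypeScript sketch (the User type and sortUsersByName helper are hypothetical, not taken from any real project): in code, the types and the comparator pin down exactly one interpretation, while the equivalent English instruction leaves several open.

```typescript
// Hypothetical example: in code, "sort the users by name" has exactly one
// meaning - the one the comparator spells out.
interface User {
  name: string;
  joined: Date;
}

function sortUsersByName(users: User[]): User[] {
  // Ascending, locale-aware, and non-mutating - every choice is explicit.
  return [...users].sort((a, b) => a.name.localeCompare(b.name));
}

const users: User[] = [
  { name: "alice", joined: new Date("2021-03-01") },
  { name: "Bob", joined: new Date("2020-07-15") },
];

// Prints ["alice", "Bob"] under the default locale collation.
console.log(sortUsersByName(users).map((u) => u.name));

// The plain-English version of that instruction could just as defensibly mean
// descending order, case-sensitive ASCII order, sorting by surname, or
// mutating the original array - and an LLM may silently pick any of them.
```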
One reason users place more trust in an LLM than they should is what Michael Crichton dubbed the Gell-Mann amnesia effect. In short: someone reads an article about a domain they're familiar with and notices a wealth of errors, to the point that the article is worse than useless - yet goes on to assume the rest of the publication's articles (on topics with which they're less familiar) are accurate enough to take at face value. Crichton was talking about media such as newspapers, but the same effect happens when using an LLM; I've even caught myself doing it. After asking questions in an area I know very well (TypeScript, for example), and fighting over errors and incorrect interpretations of instructions, I then somehow imagine that the same LLM is better qualified to give me advice about dog breeds or interpersonal relationships than it was at programming assistance.
I should know better, though. Take another example: I was recently playing Eastward, and my character was standing in Greenberg outside the store, looking at the TV where you play Earth Born. I asked an LLM a question about Earth Born and Pixballs in the context of being in Greenberg, and as part of its reply, it "helpfully" supplied "the TV in Greenberg is upstairs in the general store". Uh, no. Not only is the TV outside the store, not in it, there isn't even an upstairs in the general store. It isn't a big deal when the information is easily and immediately falsifiable, as in this case, but how often does it make such incorrect assertions when the user doesn't have the knowledge or context to know whether they're true? The more I see behavior like this, the harder it is to trust anything it tells me. And this is far from the first, or only, case of blatantly incorrect information I've run into.
The LLMs I've interacted with don't seem able to admit when they don't know something, either. Of course, there are a distressing number of people who will be confidently wrong - people aren't perfect either, after all - but an honest and self-aware person will tell you when they're unsure about an answer (and you generally learn not to trust those who are neither honest nor self-aware!). An LLM will not. In fact, often when I call one out on a blatantly incorrect statement, it will agree with me that the statement is incorrect, and then pretend that I was the one who said the wrong thing, not the LLM! It doesn't seem to track the flow of the conversation correctly, placing equal weight on its own misconceptions and on the guiding voice of the user.
When it comes to sources, LLMs have no true understanding of the material they ingest, and thus no way of filtering correct information from incorrect information. As a user, you have no way of knowing where the training data came from, and no way to determine whether good or bad sources were used in forming a reply. Even asking for sources is no panacea. I've seen links supplied as "sources" that are completely unrelated to the information provided, and I've had "sources" that aren't even real links: they don't point to anything at all. If it's important that the information you receive is accurate and correct, find and vet sources yourself; don't just rely on an LLM.
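As a minimal, hedged illustration of that vetting step, here's a TypeScript sketch (the URLs are placeholders and checkSource is a made-up helper, not anything from a real tool) that at least confirms whether cited links resolve at all. It catches the dead links; only a human actually reading the pages can confirm they support the claim.

```typescript
// Hypothetical sketch: verify that links an LLM cited as "sources" at least
// resolve before bothering to read them. Requires Node 18+ for built-in fetch.
const citedSources: string[] = [
  "https://example.com/some-cited-article", // placeholder URLs, not real citations
  "https://example.com/another-cited-page",
];

async function checkSource(url: string): Promise<void> {
  try {
    const res = await fetch(url, { method: "HEAD" });
    console.log(`${url} -> ${res.ok ? "reachable" : `HTTP ${res.status}`}`);
  } catch {
    // Network error, bad hostname, etc. - the "source" doesn't point anywhere.
    console.log(`${url} -> does not resolve`);
  }
}

async function main(): Promise<void> {
  for (const url of citedSources) {
    await checkSource(url);
  }
}

main().catch(console.error);
```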
At least with people who tend to be confidently wrong, it's simple to learn not to trust their opinions. It's fairly easy - especially in a field where you have some competence yourself, or a subject you've researched across multiple sources - to discard a source as unreliable and move on. But with an LLM, all of its sources are amalgamated; there's no way to gauge its general "knowledge", because any given answer could be generated from correct or incorrect sources, and the usefulness of the whole system is greatly degraded as a result.
Don't get me wrong, though. I do find LLMs to be useful tools for a number of tasks. They excel at things like being a partner to bounce ideas off of, or remembering that one weird word I read somewhere and want to use now. They're good at producing checklists to jog my memory for things I normally feel competent at. They're great for working through thoughts against a patient, non-judgemental, but responsive backdrop, even if the replies themselves deserve little confidence. They're fairly good at taking a complex, but not too large, chunk of text or code and breaking it down into more understandable pieces. My goal with this article isn't to say that LLMs aren't useful tools; they certainly are. But I do want to push back against the hype that equates them with the platonic ideal of AI we've formed from science fiction, because they are not that. And from all indications - with improvements slowing down and progress becoming refinement rather than breakthrough - they never will be. Use them for what they are: cautiously, and without expecting more than you're actually getting out of the machine.