The Deceptive Nature of Understanding and Reasoning in AI: A narrative on Large Language Models
AI or humans: who is currently winning when it comes to logic?
“Appearances are often deceiving.” That is a well-known quote from the fables of the Greek fabulist Aesop. What appears to be may not always be so. In this article, I discuss how the same applies to present-day Large Language Models (LLMs) in the ever-changing world of AI.
LLMs today are impressive, and why shouldn’t they be? They deserve all that recognition and attention. ChatGPT, Bard, LLaMA, Mistral and others have amazed us humans with their ability to make our everyday lives easier in so many different ways, from helping compose e-mails and write creative content for presentations to fixing our programming code. Many of us have even used LLMs to solve “difficult” maths and science problems. With continued use, many of us may come to consider them as capable of logical reasoning and inference as humans. And I am not saying they aren’t good: LLMs exhibit astounding performance on many such tasks these days.
Behind these feats, however, lies a hard truth. While it may appear that the LLMs are actually “thinking”, they are not remotely close to doing so. Consider that these LLMs have been trained on nearly every piece of textual data on the internet: websites and the links within them, Wikipedia articles, Reddit forums, PDFs, Word files, Excel spreadsheets, potentially even text present within images, and much more. There is therefore a very good chance that the difficult logical problem we are asking an LLM to solve has already been seen by it, at least in part, and that it has “memorised” the template for solving that specific kind of problem. Does it really understand the “concepts” and “underlying natural phenomena” when solving that difficult maths or science problem? Likely not!
Consider an analogy with us humans. When I lived in India, I studied the French language as an elective for 4 years at university, which included extensively studying French grammar. Today, I can communicate in basic French phrases, but I wouldn’t be even 20% close to the level of a native French speaker. Is that my weakness? Probably not, as what I studied was mostly “bookish” and academic in nature. I didn’t have native French speakers around me and didn’t get much opportunity to interact naturally in French in my everyday life. All I was doing was using basic “templates” I had memorised to fill in the blanks.
There are many schools of thought around what LLMs can and cannot do today. I personally find the views of Yann LeCun (VP & Chief AI Scientist at Meta) very interesting, and I completely agree with them. According to Yann, “Answering by approximate retrieval or by understanding+reasoning are two ends of a spectrum.” As Yann also points out in the same post, referring to remarks by François Chollet, “current LLMs have pretty much failed every single reasoning and planning test thrown at them, as long as it wasn’t in their training set and they could not rely on mere retrieval.”
Recently, I tried some popular LLMs on a few tasks that are really simple for us humans: (a) counting how many unique numbers are present in a long list, and (b) sorting a list of dates in ascending order. And I must say, the models failed at these tasks, for example mixing up the dates and arriving at the months or years in the wrong order. No offence to these models though, as they are otherwise really great at a plethora of other tasks and have been super helpful for me and billions of people around the world in recent times.
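For readers who want to try something similar, here is a minimal Python sketch of how one might set up such a check. The specific lists, dates and prompt wording are purely illustrative assumptions on my part, not the exact prompts I used; the idea is simply to compute the ground-truth answer in code so an LLM’s reply can be verified.

```python
import random
from datetime import date

# Task (a): counting unique numbers in a long list.
# Build an example list and compute the ground-truth answer.
numbers = [random.randint(0, 99) for _ in range(200)]
unique_count = len(set(numbers))
prompt_a = f"How many unique numbers are in this list? {numbers}"

# Task (b): sorting a list of dates in ascending order.
dates = [date(2021, 7, 14), date(2019, 3, 2), date(2021, 1, 30), date(2018, 11, 5)]
sorted_dates = sorted(dates)  # ground truth, ascending
prompt_b = "Sort these dates in ascending order: " + ", ".join(d.isoformat() for d in dates)

print("Expected answer (a):", unique_count)
print("Expected answer (b):", [d.isoformat() for d in sorted_dates])
```

Comparing an LLM’s free-text answers against these expected outputs is a quick way to see how often it slips on such simple, mechanical tasks.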
Thus, the hard truth is that present-day LLMs are simply “most-likely next word predictors.” Given a few words in an input prompt, they are simply populating a “template” by filling in the blanks with the tokens that have the highest probability. Given a difficult maths problem, they are just filling in the answer by looking up information they have seen across the many resources in their training data. This is not to say that LLMs have not learnt anything: they are quite capable these days of understanding our input prompts and the overall meaning of a sentence. However, they are very, very far from humans in really doing logical inference or reasoning.

As a deeper example, a human can score well in a scientific test full of difficult numerical questions, say by memorising a few formulas to solve those problems, without really understanding the basic concepts and underlying natural phenomena of, for instance, fibre optics, satellite communications or digital signal processing. While the human achieves exceptional performance here, their underlying mental model has not really performed logical inference or reasoning. In the same manner, LLMs today can appear to do well on logic and reasoning questions, but if they do so, it is likely that your question (or parts of it) appeared in their training data, rather than them thinking and making inferences in the manner a human mind that understands the concepts would.
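To make the “most-likely next word predictor” point concrete, here is a small sketch using the open-source Hugging Face transformers library with the GPT-2 model. The choice of model, prompt and greedy decoding here are my own illustrative assumptions, not a claim about how any particular chatbot is configured; the point is only that, at its core, the model assigns a probability to every token in its vocabulary and emits a likely one.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small, freely available model used purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocabulary_size)

# Under greedy decoding, the "answer" is just the highest-probability token
# at the last position of the prompt.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode([next_token_id]))  # most likely " Paris"
```

Real chat systems layer sampling strategies and other machinery on top of this, but the core mechanism remains picking likely next tokens, not performing explicit logical inference.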
There is a lot of ongoing progress in the world of LLMs, and now in the world of Large Multimodal Models (LMMs) too. While I do not claim to be an expert in state-of-the-art AI research, it seems that a lot of progress is still needed in the AI research community to transition from look-up to logical reasoning before LLMs can truly understand and reason as well as humans.
Please feel free to connect with me on LinkedIn at: http://linkedin.com/in/joyjitchatterjee/
Note: All views in this article are my personal opinions, and do not reflect the views of any other persons or organisations.