Fantastic essay. Highly recommended! I agree with all key points: \* There are p...

dragonwriter · on April 27, 2024

> There are problems that are easy for human beings but hard for current LLMs (and maybe impossible for them; no one knows). Examples include playing Wordle and predicting cellular automata (including Turing-complete ones like Rule 110). We don’t fully understand why current LLMs are bad at these tasks.

I thought we did know for things like playing Wordle: it’s because they deal with words as sequences of tokens that correspond to whole words, not sequences of letters, so a game that involves dealing with sequences of letters constrained to those that are valid words doesn’t match the way they process information?

> Providing an LLM with examples and step-by-step instructions in a prompt means the user is figuring out the “reasoning steps” and handing them to the LLM, instead of the LLM figuring them out by itself. We have “reasoning machines” that are intelligent but seem to be hitting fundamental limits we don’t understand.

But providing examples with different, contextually-appropriate sets of reasoning steps can enable the model to choose its own, more-or-less appropriate, set of reasoning steps for particular questions not matching the examples.

> It’s unclear if better prompting and bigger models using existing attention mechanisms can achieve AGI.

Since there is no objective definition of AGI or test for it, there’s no basis for any meaningful speculation on what can or cannot achieve it; discussions about it are quasi-religious, not scientific.

rainsford · on April 27, 2024

Arriving at a generally accepted scientific definition of AGI might be difficult, but a more achievable goal might be to arrive at a scientific way to determine something is not AGI. And while I'm not an expert in the field, I would certainly think a strong contender for relevant criteria would be an inability to process information in a way other than the one a system was explicitly programmed to, even if the new way of processing information was very related to the pre-existing method. Most humans playing Wordle for the first time probably weren't used to thinking about words that way either, but they were able to adapt because they actually understand how letters and words work.

I'm sure one could train an LLM to be awesome at Wordle, but from an AGI perspective the fact that you'd have to do so proves it's not a path to AGI. The Wordle dominating LLM would presumably be perplexed by the next clever word game until trained on thinking about information that way, while a human doesn't need to absorb billions of examples to figure it out.

I was originally pretty bullish on LLMs, but now I'm equally convinced that while they probably have some interesting applications, they're a dead-end from a legitimate AGI perspective.

Al-Khwarizmi · on April 28, 2024

An LLM doesn't even see individual letters at all, because they get encoded into tokens before they are passed as input to the model. It doesn't make much sense to require reasoning with things that aren't even in the input as a requisite for intelligence.

That would be like an alien race that could see in an extra dimension, or see the non-visible light spectrum, presenting us with problems that we cannot even see and saying that we don't have AGI when we fail to solve them.
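To make the tokenization point concrete, here is a minimal sketch of how a word becomes a list of integer IDs before the model sees it. The vocabulary and the greedy longest-match rule are assumptions made up for illustration; real tokenizers (BPE and similar) use much larger vocabularies and different merge rules.

```python
# Hypothetical toy vocabulary, invented for this sketch.
VOCAB = {"re": 0, "cogn": 1, "ize": 2, "dog": 3, "s": 4}


def encode(word, vocab=VOCAB):
    """Split `word` into token IDs by greedy longest match (a simplification)."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word[i:]!r}")
    return ids


# The model would receive only the IDs, not the letters inside each piece.
print(encode("recognize"))
print(encode("dogs"))
```

Under these assumptions, nothing in the ID sequence says which letters each piece contains; that information would have to be learned indirectly.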

scoot · on April 28, 2024

And yet ChatGPT 3.5 can tell me the nth letter of an arbitrary word…

Al-Khwarizmi · on April 28, 2024

I have just tried and it indeed does get it right quite often, but if the word is rare (or made up) and the position is not one of the first, it often fails. And GPT-4 too.

I suppose that if it can sort of do it, it is because of indirect deductions from training data.

I.e. maybe things like "the third letter of the word dog is g", or "the word dog is composed of the letters d, o, g" are in the training data; and from there it can answer questions not only about "dog", but probably about words that have "dog" as their first subtoken.

Actually it's quite impressive that it can sort of do it taking into account that, as I mention, characters are just outright not in the input. It's ironic that people often use these things as an example of how "dumb" the system is when it's actually amazing that it can sometimes work around that limitation.

weebull · on April 28, 2024

...because it knows that the next token in the sequence "the 5th letter in the word _illusion_ is" happens to be "s". Not because it decomposed the word into letters.

scoot · on April 29, 2024

It seems unlikely that such sequences exist for the majority of words. And I asked in English about Portuguese words.

LivenessModel · on April 28, 2024

And yet GPT4 still can't reliably tell me if a word contains any given letter.

lukan · on April 28, 2024

"they're a dead-end from a legitimate AGI perspective"

Or another piece of the puzzle to achieve it. It might not be one true path, but a clever combination of existing working pieces where (different) LLMs are one or some of those pieces.

I believe there is also not only one way of thinking in the human brain, but my thought processes happen on different levels and maybe based on different mechanisms. But as far as I know, we lack details.

JoshuaDavid · on April 28, 2024

What about an LLM that can't play wordle itself without being trained on it, but can write and use a wordle solver upon seeing the wordle rules?

I think "can recognize what tools are needed to solve a problem, build those tools, and use those tools" would count as a "path to AGI".
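As a rough idea of the kind of tool meant here, a Wordle helper can be quite small. This is only a sketch: the function names and the 'g'/'y'/'-' feedback encoding are assumptions chosen for illustration, and the word list is a placeholder.

```python
from collections import Counter


def feedback(guess, answer):
    """Return Wordle-style feedback: 'g' = green, 'y' = yellow, '-' = gray."""
    result = ["-"] * len(guess)
    remaining = Counter()
    # First pass: mark exact matches and count unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            remaining[a] += 1
    # Second pass: mark misplaced letters, respecting letter counts.
    for i, g in enumerate(guess):
        if result[i] == "-" and remaining[g] > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)


def filter_candidates(candidates, guess, observed):
    """Keep only words that would have produced the observed feedback."""
    return [w for w in candidates if feedback(guess, w) == observed]


words = ["crate", "trace", "grape", "crane"]  # placeholder word list
print(filter_candidates(words, "crane", "ggg-g"))
```

A solver would repeat the guess-and-filter step until one candidate remains; choosing good guesses is a separate problem not covered by this sketch.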

throwthrowuknow · on April 28, 2024

LLMs can’t reason but neither can the part of your brain that automatically completes the phrase “the sky is…”

nathan_compton · on April 28, 2024

"Since there is no objective definition of AGI or test for it, there’s no basis for any meaningful speculation on what can or cannot achieve it; discussions about it are quasi-religious, not scientific."

This is such a weird thing to say. Essentially _all_ scientific ideas are, at least to begin with, poorly defined. In fact, I'd argue that almost all scientific ideas remain poorly defined with the possible exception of _some_ of the basic concepts in physics. Scientific progress cannot be and is not predicated upon perfect definitions. For some reason when the topic of consciousness or AGI comes up around here, everyone commits a sort of "all or nothing" logical fallacy: absence of perfect knowledge is cast as total ignorance.

eru · on April 28, 2024

Yes. That absence of perfect definition was part of why Turing came up with his famous test so long ago. His original paper is a great read!

Eisenstein · on April 28, 2024

What is the rough definition, then?

Etherlord87 · on April 28, 2024

Sam Harris argues similarly in The Moral Landscape. There's this conception that objective morality cannot exist outside of religion, because as soon as you try to prove one, philosophers rush in with pedantic criticism that would render any domain of science invalid.

nathan_compton · on April 29, 2024

I kinda get where Sam Harris is coming from, but it's kind of silly to call what he is talking about morality. As far as I can tell, Harris is just a moral skeptic who believes something like "we should get a bunch of people together to decide kind of what we want in the world and then rationally pursue those ends." But that is very different from morality as it was traditionally understood (e.g., facts about behaviors which are objective in their assignment of good and bad).

jncfhnb · on April 27, 2024

I think one should feel comfortable arguing that AGI must be stateful and experience continuous time at least. Such that a plain old LLM is definitively not ever going to be AGI; but an LLM called in a do-while-true loop might.

PopePompus · on April 28, 2024

I don't understand why you believe it must experience continuous time. If you had a system which clearly could reason, which could learn new tasks on its own, which didn't hallucinate any more than humans do, but it was only active for the period required for it to complete an assigned task, and was completely dormant otherwise, why would that dormant period disqualify it as AGI? I agree that such a system should probably not be considered conscious, but I think it's an open question whether or not consciousness is required for intelligence.

jncfhnb · on April 28, 2024

Active for a period is still continuous during that period.

As opposed to “active when called”. A function, being called repeatedly over a length of time is reasonably “continuous” imo

PopePompus · on April 28, 2024

I don't see what the difference between "continuous during that period" and "active when called" is. When an AI runs inference, that calculation takes time. It is active during the entire interval during which it is responding to the prompt. It is then inactive until the next prompt. I don't see why a system can't be considered intelligent merely because its activity is intermittent.

jncfhnb · on April 28, 2024

The calculation takes time but the inference is from a single snapshot so it is effectively a single transaction of input to output. An intelligent entity is not a transactional machine. It has to be a working system.

That system might be as simple as calling the transactional machine every few seconds. That might pass the threshold. But then your AGI is the broader setup, not just the LLM.

But the transactional machine is certainly not an intelligent entity. Much like a brain in a jar or a cryostasis’d human.

Suppose we could perfectly simulate a human mind in a way that everyone finds compelling. We would still not call that simulated human mind an intelligent entity unless it was “active”.

kaibee · on April 28, 2024

I think it's noteworthy that humans actually fail this test... We have to go dormant for 8 hours every day.

Hunpeter · on April 28, 2024

Yes, but our brain is still working and processing information at those times as well, isn't it? Even if not in the same way as it does when we're conscious.

PopePompus · on April 28, 2024

What about general anesthesia? I had a major operation during which most of my brain was definitely offline for at least 8 hours.

autoexec · on April 29, 2024

Anesthesia shouldn't take your brain offline. It just makes you unconscious, paralyzes you, and gives you amnesia. Your brain is still active under general anesthesia. What you were thinking or feeling for those 8 hours was just forgotten.

crest · on April 29, 2024

[citation needed].

autoexec · on April 30, 2024

You might try https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8054915/ which states: "General anesthesia is characterized by loss of consciousness, amnesia, analgesia, and immobility." and further down shows brain activity recorded while under anesthesia via EEG. The paper looks at the differences and similarities of brain activity while under anesthesia and sleep. This is only possible because, however changed or slowed by it, the brain is still active while under anesthesia

pixl97 · on April 27, 2024

A consistent stateful experience may be needed, but not sure about continuous time. I mean human consciousness doesn't do that.

haswell · on April 27, 2024

Human consciousness does though, e.g. the flow state. F1 drivers are a good example.

We tend to not experience continuous time because we repeatedly get distracted by our thoughts, but entering the continuous stream of now is possible with practice and is one of the aims of many meditators.

int_19h · on April 28, 2024

Human consciousness is capable of it, but since most humans aren't in it much of the time, it would appear that it's not a prerequisite for true sentience.

krisoft · on April 28, 2024

What does it mean to “experience continuous time”?

How do you know that F1 drivers experience it?

jncfhnb · on April 27, 2024

I would argue it needs to be at least somewhat continuous. Perhaps discrete on some granularity but if something is just a function waiting to be called it’s not an intelligent entity. The entity is the calling itself.

adrianN · on April 28, 2024

I try my best not to experience continuous time for at least eight hours a day.

jncfhnb · on April 28, 2024

Then for at least eight hours a day you don’t qualify as a generally intelligent system.

card_zero · on April 28, 2024

If I spend some amount of the day bathing, some amount of it scratching, some amount of it thinking vaguely about racoons without any clear conclusions, and a lot of it drinking tea, I wonder how many seconds remain during which I qualified as generally intelligent.

jncfhnb · on April 28, 2024

I feel you qualify during all of those waking seconds

card_zero · on April 28, 2024

Racoons are said to be intelligent because they're good at opening locks. On the other hand, when they have food and are within ten feet of a pool of water, they will dip the food in the water and rub it between their paws for no reason. They can reason about the locks, but not about the food. Meanwhile, I in theory can reason about anything, but in practice I wouldn't count on it. Whereas an LLM can't reason, but it's very sharp and always ready to react appropriately.

jncfhnb · on April 28, 2024

I’m not familiar with this raccoon behavior but it sure doesn’t sound like it’s done without reason.

An LLM is never ready to react to anything because it’s just a matrix that needs a higher level system to invoke it.

naasking · on April 28, 2024

Some good prompt-reply interactions are probably fed back in to subsequent training runs, so they're still stateful/have memory in a way, there's just a long delay.

jncfhnb · on April 28, 2024

That’s not the AGI’s state. That’s just some past information.

naasking · on April 28, 2024

State is a function of accumulated past information.

jncfhnb · on April 28, 2024

State is a function of accumulated past. That does not mean that having some past written down makes you stateful. A stateful thing has to incorporate the ongoing changes.

naasking · on April 28, 2024

Which is what I described: some successful prompt-replies are fed back into subsequent training runs.

jncfhnb · on April 28, 2024

No… that implies the model never has active state and is being replaced with a different, stateless model. This is similar to the difference between

Actor.happy = True

And

Actor = happier(Actor)
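The two lines above can be made runnable as a sketch. The `Actor` class body and the behavior of `happier` are assumptions filled in for illustration; only the two names come from the comment.

```python
from dataclasses import dataclass, replace


@dataclass
class Actor:
    happy: bool = False


def happier(actor):
    """Return a new Actor; the one passed in is left untouched (assumed behavior)."""
    return replace(actor, happy=True)


def mutate_in_place():
    actor = Actor()
    alias = actor          # second reference to the same object
    actor.happy = True     # the object itself changes
    return alias.happy     # the alias sees the change


def rebind_to_new_value():
    actor = Actor()
    old = actor            # reference to the original object
    actor = happier(actor) # the name now points at a different object
    return old.happy, actor.happy


print(mutate_in_place())
print(rebind_to_new_value())
```

In the first case anything holding a reference observes the update; in the second, the old object is unchanged and only holders of the new reference see the new state.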

naasking · on April 28, 2024

Both of your examples are stateful systems from the outside, given a suitable choice of timeframe, the latter one is just how purely functional systems represent state. Theoretically they can simulate each other, and the endpoint you use to access Actor will still reference the latest Actor. The only reason you're calling them different is because you insist on using a specific timeframe to exclude considering one as stateful, and I'm pointing out that that isn't strictly necessary.

jncfhnb · on April 28, 2024

True, but saying “subsequent training” implies very long periods between updates.

We do not train LLMs to update them to the state of a conversation.

wongarsu · on April 27, 2024

You could imagine an LLM being called in a loop with a prompt like

You observe: {new input}

You remember: {from previous output}

React to this in the following format:

My inner thoughts: [what do you think about the current state]

I want to remember: [information that is important for your future actions]

Things I do: [Actions you want to take]

Things I say: [What I want to say to the user]

...

Not sure if that would qualify as an AGI as we currently define it. Given a sufficiently good LLM with good reasoning capabilities, such a setup would be able to do many of the things we currently expect AGIs to be able to do, including planning and learning new knowledge and new skills (by collecting and storing positive and negative examples in its "memory"). But its learning would be limited, and I'm sure as soon as it exists we would agree that it's not AGI.
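A minimal sketch of such a loop, for illustration only. `call_llm` is a hypothetical function that takes a prompt string and returns the model's text reply; `fake_llm` is a stand-in invented here so the example runs without any model, and the line-based parsing is an assumption about the reply format.

```python
PROMPT = """You observe: {observation}
You remember: {memory}
React to this in the following format:
My inner thoughts: [what do you think about the current state]
I want to remember: [information that is important for your future actions]
Things I do: [Actions you want to take]
Things I say: [What I want to say to the user]"""


def parse_reply(reply):
    """Split 'Field: value' lines into a dict (assumes one field per line)."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def run_loop(call_llm, observations):
    """Feed each observation to the model, carrying memory between calls."""
    memory = ""
    said = []
    for obs in observations:
        reply = call_llm(PROMPT.format(observation=obs, memory=memory))
        fields = parse_reply(reply)
        # The only state that survives a step is what the model asked to remember.
        memory = fields.get("I want to remember", memory)
        said.append(fields.get("Things I say", ""))
    return memory, said


def fake_llm(prompt):
    """Stand-in for a real model: appends each observation to its memory."""
    lines = prompt.splitlines()
    observed = lines[0].split(": ", 1)[1]
    remembered = lines[1].split(": ", 1)[1]
    new_memory = (remembered + " " + observed).strip()
    return "\n".join([
        "My inner thoughts: noted",
        "I want to remember: " + new_memory,
        "Things I do: nothing",
        "Things I say: ack " + observed,
    ])


print(run_loop(fake_llm, ["door opens", "light on"]))
```

Note that the model itself never changes here; all state lives in the `memory` string managed by the outer loop.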

sophiabits · on April 28, 2024

This already exists (in a slightly different prompt format); it's the underlying idea behind ReAct: https://react-lm.github.io

As you say, I'm skeptical this counts as AGI. Although I admit that I don't have a particularly rock solid definition of what _would_ constitute true AGI.

marquisdepolis · on April 29, 2024

(Author here). I tried creating something similar in order to solve wordle etc, and the interesting part is that it is insufficient still. That's part of the mystery.

lgas · on April 28, 2024

It works better to give it access to functions to call for actions and remembering stuff, but this approach does provide some interesting results.

layer8 · on April 27, 2024

Regarding Wordle, it should be straightforward to make a token-based version of it, and I would assume that that has been tried. It seems the obvious thing to do when one is interested in the reasoning abilities necessary for Wordle.

furyofantares · on April 28, 2024

That doesn't seem straightforward - although it's blind to letters because all it sees are tokens, it doesn't have much training data ABOUT tokens.

baobabKoodaa · on April 28, 2024

What parent is saying is that instead of asking the LLM to play a game of Wordle with tokens like TIME,LIME we ask it to play with tokens like T,I,M,E,L. This is easy to do.

furyofantares · on April 28, 2024

And if you tell it to think up a word that has an E in position 3 and an L that's somewhere in the word but not in position 2, it's not going to be any better at that if you tell it to answer one letter at a time.

layer8 · on April 28, 2024

The idea is, instead of five-letter-words, play the game with five-token-words.

furyofantares · on April 28, 2024

That was my original interpretation, and while all it sees are tokens, roughly none of its training data is metadata about tokenizing. It knows far less about the positions of tokens in words than it does about the positions of letters in words.

layer8 · on April 28, 2024

I’m not sure that training data about that would be required. Shouldn’t the model be able to recognize that `["re", "cogn", "ize"]` represents the same sequence of tokens as `recognize`, assuming those are tokens in the model?

More generally, would you say that LLMs are generally unable to reason about sequences of items (not necessarily tokens) and compare them to some definition of “valid” sequences that would arise from the training corpus?

svachalek · on April 28, 2024

No. In the model, tokens are random numbers. But if you consider a sentence to be a sequence of words, you can say that LLMs are quite competent about reasoning about those sequences.

baobabKoodaa · on April 28, 2024

ChatGPT is able to spell the word "recognize" when asked.

So it is able to take a sequence of tokens ["recogn", "ize"] and transform it into a sequence of tokens [" R", " E", " C", " O", " G", " N", " I", " Z", " E"]

weitendorf · on April 28, 2024

> There are problems that are easy for human beings but hard for current LLMs (and maybe impossible for them; no one knows). Examples include playing Wordle and predicting cellular automata (including Turing-complete ones like Rule 110). We don't fully understand why current LLMs are bad at these tasks.

Wordle and cellular automata are very 2D, and LLMs are fundamentally 1D. You might think "but what about Chess!" - except Chess is encoded extremely often as a 1D stream of tokens to notate games, and bound to be highly represented in LLMs' training sets. Wordle and cellular automata are not often, if ever, encoded as 1D streams of tokens - it's not something an LLM would be experienced with even if they had a reasonable "understanding" of the concepts. Imagine being an OK chess player, being asked to play a game blindfolded dictating your moves purely via notation, and being told you suck.
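For readers unfamiliar with Rule 110, here is a minimal sketch of a single update step. Each row is a 1D list of cells; the 2D picture people usually associate with it comes from stacking successive rows. Treating cells beyond the edges as 0 is an assumption made to keep the example short.

```python
def rule110_step(cells):
    """Apply one step of elementary cellular automaton Rule 110.

    `cells` is a list of 0/1 values; cells outside the list count as 0.
    """
    rule = 110  # binary 01101110: bit k is the output for neighborhood value k
    n = len(cells)
    out = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        center = cells[i]
        right = cells[i + 1] if i < n - 1 else 0
        neighborhood = (left << 2) | (center << 1) | right
        out.append((rule >> neighborhood) & 1)
    return out


row = [0, 0, 0, 1]
for _ in range(4):
    print(row)
    row = rule110_step(row)
```

Predicting the automaton means applying this local rule to every cell, exactly, for every step.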

> Providing an LLM with examples and step-by-step instructions in a prompt means the user is figuring out the "reasoning steps" and handing them to the LLM, instead of the LLM figuring them out by itself. We have "reasoning machines" that are intelligent but seem to be hitting fundamental limits we don't understand.

You have probably heard of this really popular game called Bridge before, right? You might even be able to remember tons of advice your Grandma gave you based on her experience playing it - except she never let you watch it directly. Is Grandma "figuring out the game" for you when she finally sits down and teaches you the rules?

papichulo2023 · on April 28, 2024

Not an authority in the matter, but afaik, with position encodings (part of the Transformers architecture), they can handle dimensionality just fine. Actually some people tried to do 2D Transformers and the results were the same.

Visual transformers are gaining traction and they are 100% focused on 2D data.

Sleepful · on April 28, 2024

Since when can an LLM play chess? It can't understand it at all. You would have to filter out all the invalid moves until it spits out a valid one.

cs702 · on April 27, 2024

As an aside, at one point I experimented a little with transformers that had access to external memory searchable via KNN lookups https://github.com/lucidrains/memorizing-transformers-pytorc... (great work by lucidrains) or via routed queries with https://github.com/glassroom/heinsen_routing (don't fully understand it; apparently related to attention). Both approaches seemed to work, but I had to put that work on hold for reasons outside my control.

Also as an aside, I'll add that transformers can be seen as a kind of "RNN" that grows its hidden state with each new token in the input context. I wonder if we will end up needing some new kind of "RNN" that can grow or shrink its hidden state and also access some kind of permanent memory as needed at each step.

We sure live in interesting times!

esafak · on April 27, 2024

> transformers that had access to external memory searchable via KNN lookups

This is common, and commonly called retrieval augmented generation, or RAG.

edit: I did not pay attention to the link. It is about Wu et al's "Memorizing Transformers", which contain an internal memory.

cs702 · on April 27, 2024

No. RAG is about finding relevant documents/paragraphs (via KNN lookups of their embeddings) and then inserting those documents/paragraphs into the input context, as sequences of input tokens. What I'm talking about is different: https://arxiv.org/abs/2203.08913
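The RAG flow described here can be sketched in a few lines. This is an illustration under assumptions: the embeddings are hand-written toy vectors rather than outputs of a real embedding model, and `retrieve` and `build_prompt` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, docs, k=2):
    """Return the texts of the k documents nearest to the query (brute-force KNN)."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, query_vec, docs, k=2):
    """Insert the retrieved texts into the input context as plain tokens."""
    context = "\n".join(retrieve(query_vec, docs, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


docs = [
    ("cats purr", [1.0, 0.0]),      # toy embeddings, invented for this sketch
    ("dogs bark", [0.0, 1.0]),
    ("kittens meow", [0.9, 0.1]),
]
print(build_prompt("What sound do cats make?", [1.0, 0.0], docs))
```

The retrieved text ends up as ordinary input tokens, which is the contrast being drawn with the approach in the linked paper.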

int_19h · on April 28, 2024

I don't think the ability to shrink state is needed. You can always represent removed state by additional state that represents deletion of whatever preceding state was there. If anything, this sounds more useful because the fact that this state is no longer believed to be relevant should prevent looping (where it would be repeatedly brought in, considered, and rejected).
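One way to picture this is an append-only log with deletion markers. The log format and function names below are assumptions chosen for illustration, not anything from the papers discussed.

```python
def live_state(log):
    """Replay an append-only log of ("add", item) / ("delete", item) entries."""
    live = []
    for op, item in log:
        if op == "add":
            live.append(item)
        elif op == "delete" and item in live:
            live.remove(item)
    return live


def was_rejected(log, item):
    """True if the log records that `item` was considered and then deleted."""
    return ("delete", item) in log


log = [("add", "A"), ("add", "B"), ("delete", "A")]
print(live_state(log))         # the effective state has shrunk
print(len(log))                # the stored state only grew
print(was_rejected(log, "A"))  # the rejection itself is remembered
```

The stored state never shrinks, but the effective state does, and the record of the deletion is what could prevent reconsidering the same item repeatedly.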

cs702 · on April 28, 2024

> You can always represent removed state by additional state that represents deletion of whatever preceding state was there.

Good point. Thank you!

_wire_ · on April 27, 2024

>We don't fully understand why current LLMs are bad at these tasks.

In complete seriousness, can anyone explain why LLMs are good at some tasks?

SomeCallMeTim · on April 27, 2024

LLMs are good at tasks that don't require actual understanding of the topic.

They can come up with excellent (or excellent-looking-but-wrong) answers to any question that their training corpus covers. In a gross oversimplification, the "reasoning" they do is really just parroting a weighted average (with randomness injected) of the matching training data.

What they're doing doesn't really match any definition of "understanding." An LLM (and any current AI) doesn't "understand" anything; it's effectively no more than a really big, really complicated spreadsheet. And no matter how complicated a spreadsheet gets, it's never going to understand anything.

Not until we find the secret to actual learning. And increasingly it looks like actual learning probably relies on some of the quantum phenomena that are known to be present in the brain.

We may not even have the science yet to understand how the brain learns. But I have become convinced that we're not going to find a way for digital-logic-based computers to bridge that gap.

jwells89 · on April 28, 2024

This is also why image generating models struggle to correctly draw highly variable objects like limbs and digits.

They’ll be able to produce infinite good looking cardboard boxes, because those are simple enough to be represented reasonably well with averages of training data. Limbs and digits on the other hand have nearly limitless different configurations and as such require an actual understanding (along with basic principles such as foreshortening and kinetics) to be able to draw well without human guidance.

grobgambit · on April 28, 2024

I would just add that I think I have encountered situations where knowing the weighted average answer from the training data, for topics I didn't previously understand, created better initial conditions for MY learning of the topic than not knowing the weighted average answer.

The problem to me is we are holding LLMs to a standard of usefulness from science fiction and not reality.

A new, giant set of encyclopedias has enormous utility but we wouldn't hold it against the encyclopedias that they aren't doing the thinking for us or 100% omniscient.

naasking · on April 28, 2024

> What they're doing doesn't really match any definition of "understanding."

What is the mechanistic definition of "understanding"?

throwthrowuknow · on April 28, 2024

What is your definition of understanding?

Please show me where the training data exists in the model to perform this lookup operation you’re supposing. If it’s that easy I’m sure you could reimplement it with a simple vector database.

Your last two paragraphs are just dualism in disguise.

Etherlord87 · on April 28, 2024

I'm far from being an expert on AI models, but it seems you lack the basic understanding of how these models work. They transform data EXACTLY like spreadsheets do. You can implement those models in Excel, assuming there's no row or column limit (or that it's high enough) - of course it will be much slower than the real implementations, but OP is right - LLMs are basically spreadsheets.

Question is, wouldn't a brain qualify as a spreadsheet? Do we know it can't be implemented as one? Well, maybe not, I'm not an expert on spreadsheets either, but I think spreadsheets don't allow circular references, and the brain does: you can have feedback loops in the brain. So even if the brain doesn't have something still not understood by us, as OP suggests, it is still more powerful than AI.

BTW, this is one explanation on why AI fails at some tasks: ask AI if two words rhyme and it will be quite reliable on that. But ask it to give you word pairs that rhyme, and it will fail, because it won't run an internal loop trying some words and checking if they succeed to rhyme or not. If some AI actually succeeds at rhyming, it would do so either because it's trained to contain such word pairs from the get-go or because it's implemented to have multiple passes or something...

throwthrowuknow · on April 28, 2024

You can implement Doom in a spreadsheet too, so what? That wasn’t the point op or I were making. If you bother to read the sentence before op talks about spreadsheets they are making the conjecture that LLMs are lookup tables operating on the corpus they were trained on. That is the aspect of spreadsheets they were comparing them to, not the fact that spreadsheets can be used to implement anything that any other programming language can. Might as well say they are basically just arrays with some functions in between, yeah no shit.

Which LLMs can’t produce rhyming pairs? Both the current ChatGPT 3.5 and 4 seem to be able to generate as many as I ask for. Was this a failure mode at some point?

lossolo · on April 28, 2024

> Which LLMs can’t produce rhyming pairs? Both the current ChatGPT 3.5 and 4 seem to be able to generate as many as I ask for

Only in English. If they understood language and rhymes, they would do it in every other language they know. It can't in my language, while it can speak it fluently. It just fails. And it fails in so many other areas. I'm using LLMs daily for work and other stuff, and if you use them long enough you will see that they are statistical machines, not intelligent entities.

singron · on April 28, 2024

People are confusing the limited computational model of a transformer with the "Chinese room argument", which leads to unproductive simultaneous debates of computational theory and philosophy.

SomeCallMeTim · on May 3, 2024

I'm not confusing anything. I'm familiar with the Chinese Room Argument and I know how LLMs work.

What I'm saying is arguably philosophically related, in that I'm saying the LLM's model is analogous to the "response book" in the room. It doesn't matter how big the book is; if the book never changes, then no learning can happen. If no learning can happen, then understanding, a process that necessarily involves active reflection on a topic, can't exist.

You simply can't say a book "understands" anything. To understand is to contemplate and mentally model a topic to the point where you can simulate it, at least at a high level. It's dynamic.

An LLM is static. It can simulate a dynamic response by having multiple stages that dig through multiple insanely large books of instructions that cross reference each other and that involve calculations and bookmarks and such to come up with a result--but the books never change as part of the conversation.

lossolo · on April 28, 2024

A transformer is not a simple vector database doing a simple lookup operation. It's doing a lookup operation on a pattern, not a word. It learns patterns from the dataset. If your pattern is not there it will hallucinate or give you the wrong answer, like GPT4 and Opus gave me hundreds of times already.

tibbydudeza · on April 28, 2024

>> quantum phenomena

You mean like the microtubules of Roger Penrose???

https://www.youtube.com/watch?v=jG0OpvudA10

danenania · on April 27, 2024

> the "reasoning" they do is really just parroting a weighted average (with randomness injected) of the matching training data

Perhaps our brains are doing exactly the same, just with more sophistication?

SomeCallMeTim · on April 28, 2024

No.

We know how current deep learning neural networks are trained.

We know definitively that this is not how brains learn.

Understanding requires learning. Dynamic learning. In order to experience something, an entity needs to be able to form new memories dynamically.

This does not happen anywhere in current tech. It's faked in some cases, but no, it doesn't really happen.

danenania · on April 28, 2024

> We know definitively that this is not how brains learn.

Ok then, I guess the case is closed.

> an entity needs to be able to form new memories dynamically.

LLMs can form new memories dynamically. Just pop some new data into the context.

SomeCallMeTim · on April 30, 2024

> LLMs can form new memories dynamically. Just pop some new data into the context.

No, that's an illusion.

The LLM itself is static. The recurrent connections form a sort-of temporary memory that doesn't affect the learned behavior of the network at all.

I don't get why people who don't understand what's happening keep arguing that AIs are some sci-fi interpretation of AI. They're not. At least not yet.

danenania · on April 30, 2024

It isn't temporary if you keep it permanently in context (or in a RAG store) and pass it into every model call, which is how long-term memory is being implemented both in research and in practice. And yes it obviously does affect the learned behavior. The distinction you're making between training and context is arbitrary.

naasking · on April 28, 2024

> We know definitively that this is not how brains learn.

So you have a mechanistic, formal model of how the brain functions? That's news to me.

Scarblac · on April 28, 2024

Your brain was first trained by reading all of the Internet?

Anyway, the question of whether computers can think is as interesting as the question whether submarines can swim.

naasking · on April 28, 2024

> Anyway, the question of whether computers can think is as interesting as the question whether submarines can swim.

Given the amount of ink spilled on the question, gotta disagree with you there.

Scarblac · on April 29, 2024

For the record, that wasn't me, it's a famous quote from Edsger Dijkstra.

iraqmtpizza · on April 28, 2024

Endless ink has been spilled on the most banal and useless things. Deconstructing ice cream and physical beauty from a Marxist-feminist race-conscious postmodern perspective.

naasking · on April 28, 2024

Except one is clearly a niche question, and the other has repeatedly captured the world's imagination and spilled orders of magnitude more ink.

Etherlord87 · on April 28, 2024

Is it interesting to ponder if the Earth is flat?

SomeCallMeTim · on April 30, 2024

There's no way brains have the "right answers" fed into them as required by backpropagation.

naasking · on April 30, 2024

Look up predictive coding. Our senses are constantly feeding us corrections to our predictions.

xanderlewis · on April 28, 2024

Every single discussion of ‘AGI’ has endless comments exactly like this. Whatever criticism is made of an attempt to produce a reasoning machine, there’s always inevitably someone who says ‘but that’s just what our brains do, duhhh… stop trying to feel special’.

It’s boring, and it’s also completely content-free. This particular instance doesn’t even make sense: how can it be exactly the same, yet more sophisticated?

Sorry.

adrianN · on April 28, 2024

The problem is that we currently lack good definitions for crucial words such as "understanding" and we don't know how brains work, so that nobody can objectively tell whether a spreadsheet "understands" anything better than our brains. That makes these kinds of discussions quite unproductive.

xanderlewis · on April 28, 2024

I can’t define ‘understanding’ but I can certainly identify a lack of it when I see it. And LLM chatbots absolutely do not show signs of understanding. They do fine at reproducing and remixing things they’ve ‘seen’ millions of times before, but try asking them technical questions that involve logical deduction or an actual ability to do on-the-spot ‘thinking’ about new ideas. They fail miserably. ChatGPT is a smooth-talking swindler.

I suspect those who can’t see this either

(a) are software engineers amazed that a chatbot can write code, despite it having been trained on an unimaginably massive (morally ambiguously procured) dataset that probably already contains something close to the boilerplate you want anyway

(b) don’t have the sufficient level of technical knowledge to ask probing enough questions to betray the weaknesses. That is, anything you might ask is either so open-ended that almost anything coherent will look like a valid answer (this is most questions you could ask, outside of seriously technical fields) or has already been asked countless times before and is explicitly part of the training data.

danenania · on April 28, 2024

Your understanding of how LLMs work isn’t at all accurate. There’s a valid debate to be had here, but it requires that both sides have a basic understanding of the subject matter.

xanderlewis · on April 28, 2024

How is it not accurate? I haven’t said anything about the internal workings of an LLM, just what it is able to produce (which is based on observation).

I have more than a basic understanding of the subject matter (neural networks; specifically transformers, etc.). It’s actually not a hugely technical field.

By the way, it appears that you are in category (a).

danenania · on April 28, 2024

You don’t know what they’re able to produce because you clearly don’t know how they actually work. So your “observations” are not worth much.

xanderlewis · on April 28, 2024

Yes I do, right down to the technical details. What makes you think I don’t? Is it because I used the word ‘remixing’?

danenania · on April 28, 2024

As the comment I replied to very correctly said, we don’t know how the brain produces cognition. So you certainly cannot discard the hypothesis that it works through “parroting” a weighted average of training data just as LLMs are alleged to do.

Considering that LLMs with a much smaller number of neurons than the brain are in many cases producing human-level output, there is some evidence, if circumstantial, that our brains may be doing something similar.

iraqmtpizza · on April 28, 2024

LLMs don't have neurons. That's just marketing lol.

"A neuron in a neural network typically evaluates a sequence of tokens in one go, considering them as a whole input." -- ChatGPT

You could consider an RTX 4090 to be one neuron too.

danenania · on April 28, 2024

It’s almost as if ‘neuron’ has a different meaning in computer science than biology.

iraqmtpizza · on April 28, 2024

LOL you just owned the guy who said "LLMs with a much smaller number of neurons than the brain are in many cases producing human-level output"

xanderlewis · on April 28, 2024

> in many cases producing human-level output

They’re not, unless you blindly believe OpenAI press releases and crypto scammer AI hype bros on Twitter.

zer00eyz · on April 27, 2024

Yes:

An LLM isn't a model of human thinking.

An LLM is an attempt to build a simulation of human communication. An LLM is to language what a forecast is to weather. No amount of weather data is actually going to turn that simulation into snow, no amount of LLM data is going to create AGI.

That having been said, better models (smaller, more flexible ones) are going to result in a LOT of practical uses that have the potential to make our day to day lives easier (think digital personal assistant that has current knowledge).

choeger · on April 27, 2024

Great comment. Just one thought: Language, unlike weather, is meta-circular. All we know about specific words or sentences is again encoded in words and sentences. So the embedding encodes a subset of human knowledge.

Hence, a LLM is predicting not only language but language with some sort of meaning.

zer00eyz · on April 27, 2024

That re-embedding is also encoded in weather. It is why perfect forecasting is impossible, and why we talk about the butterfly effect.

The "hallucination problem" is simply the tyranny of Lorenz... one is not sure if a starting state will have a good outcome or swing wildly. Some good weather models are based on re-running with tweaks to starting params, and then things that end up out of bounds can get tossed. It's harder to know when a result is out of bounds for an LLM, and we don't have the ability to run every request 100 times through various models to get an "average" output yet... However, some of the reuse of layers does emulate this to an extent...
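For anyone who hasn't seen the weather side of this analogy, a rough sketch of "re-running with tweaks to starting params" on the Lorenz system follows. It uses the classic parameter values and a plain forward-Euler step; the function names, step size, and noise level are illustrative assumptions, and real ensemble forecasting is far more involved.

```python
import random
from typing import List, Tuple

State = Tuple[float, float, float]


def lorenz_step(s: State, dt: float = 0.01, sigma: float = 10.0,
                rho: float = 28.0, beta: float = 8.0 / 3.0) -> State:
    """One forward-Euler step of the Lorenz system (classic parameter values)."""
    x, y, z = s
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dx * dt, y + dy * dt, z + dz * dt)


def run(s: State, steps: int) -> State:
    """Advance a single state by the given number of steps."""
    for _ in range(steps):
        s = lorenz_step(s)
    return s


def ensemble(start: State, members: int, steps: int,
             noise: float, seed: int = 0) -> List[State]:
    """Re-run the same model from slightly perturbed starting states."""
    rng = random.Random(seed)
    results = []
    for _ in range(members):
        perturbed = tuple(v + rng.uniform(-noise, noise) for v in start)
        results.append(run(perturbed, steps))
    return results


def spread(states: List[State]) -> float:
    """Largest pairwise distance between ensemble members."""
    def dist(a: State, b: State) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    return max(dist(a, b) for a in states for b in states)
```

Over a short horizon the members stay almost identical; over a long one the spread grows to the size of the attractor even though the starting states differ only by about one part in a million. Whether LLM hallucination is really the same phenomenon is the commenter's analogy, not something this sketch demonstrates.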

red75prime · on April 28, 2024

Ugh. Really? Those "simulated water isn't wet"(when applied to cognition) "arguments" were punched so many times it even hurts to look at them.

zer00eyz · on April 28, 2024

No, simulated water isn't wet.

But an LLM isn't even trying to simulate cognition. It's a model that is predicting language. It has all the problems of a predictive model... the "hallucination" problem is just the tyranny of Lorenz.

adrianN · on April 28, 2024

We don't really know what "cognition" is, so it's hard to tell whether a system is doing it.

lostmsu · on April 28, 2024

This is plain wrong due to mixing of concepts. Language is technically something from Chomsky hierarchy. Predicting language is being able to tell if input is valid or invalid. LLMs do that, but they also build a statistical model across all valid inputs, and that is not just the language.

zer00eyz · on April 28, 2024

>> Predicting language is being able to tell if input is valid or invalid.

If this were the case then the hallucination problem would be solvable.

That hallucination problem is not only going to be hard to detect in any meaningful way but it's going to be harder to eliminate. The very nature of LLMs (mixing in noise aka temperature) means that they always risk going off the rails. This is the same thing Lorenz discovered in modeling weather...
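For reference, the "temperature" knob mentioned here is usually described as a rescaling of the model's output scores before sampling. The sketch below is a minimal standalone version, with made-up logits and illustrative function names; it is not taken from any specific library.

```python
import math
import random
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_token(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index at random according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With logits like `[2.0, 1.0, 0.1]`, a temperature of 0.1 puts nearly all the probability on the first token, while 2.0 spreads it out so the lower-scored tokens get picked fairly often. This only shows what the knob does; it does not settle whether sampling noise is the cause of hallucination.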

lostmsu · on April 30, 2024

I don't think that "hallucination problem" is a problem at all worth addressing separately from just building bigger/better models that do the same thing. Because 1) it is present in humans, 2) it is clear bigger models have less of it than smaller models. If at scale nothing changes LLMs will eventually just hallucinate less than humans.

richardw · on April 27, 2024

LLMs are a compressed and lossy form of our combined writing output, which it turns out is similarly structured enough to make new combinations of text seem reasonable, even enough to display simple reasoning. I find it useful to think “what can I expect from speaking with the dataset of combined writing of people”, rather than treating a basic LLM as a mind.

That doesn’t mean we won’t end up approximating one eventually, but it’s going to take a lot of real human thinking first. For example, ChatGPT writes code to solve some questions rather than reasoning about it from text. The LLM is not doing the heavy lifting in that case.

Give it (some) 3D questions or anything where there isn’t massive textual datasets and you often need to break out to specialised code.

Another thought I find useful is that it considers its job done when it’s produced enough reasonable tokens, not when it’s actually solved a problem. You and I would continue to ponder the edge cases. It’s just happy if there are 1000 tokens that look approximately like its dataset. Agents make that a bit smarter but they’re still limited by the goal of being happy when each has produced the required token quota, missing eg implications that we’d see instantly. Obviously we’re smart enough to keep filling those gaps.

tobiasSoftware · on April 27, 2024

"I find it useful to think “what can I expect from speaking with the dataset of combined writing of people”, rather than treating a basic LLM as a mind."

I've been doing this as well, mentally I think of LLMs as the librarians of the internet.

pbhjpbhj · on April 27, 2024

They're bad librarians. They're not bad, they do a bad job of being librarians, which is a good thing! They can't quite tell you the exact quote, but they do recall the gist, they're not sure it was Gandhi who said that thing but they think he did, it might be in this post or perhaps one of these. They'll point you to the right section of the library to find what you're after, but make sure you verify it!

marquisdepolis · on April 29, 2024

They are librarians, just that it happens to be the library of Babel.

piannucci · on April 28, 2024

Book golems

HarHarVeryFunny · on April 27, 2024

I'd guess because the Transformer architecture is (I assume) fairly close to the way that our brain learns and produces language - similar hierarchical approach and perhaps similar type of inter-embedding attention-based copying?

Similar to how CNNs are so successful at image recognition, because they also roughly follow the way we do it too.

Other seq-2-seq language approaches work too, but not as well as Transformers, which I'd guess is due to transformers better matching our own inductive biases, maybe due to the specific form of attention.

j16sdiz · on April 27, 2024

> why LLMs are good at some tasks?

Like how we explain humans doing tasks -- they evolved to do that.

I believe this is a non-answer, but if we are satisfied with that non-answer for humans, why not for LLMs?

layer8 · on April 27, 2024

I would argue that we are not satisfied with that answer for humans either.

pbhjpbhj · on April 27, 2024

If you look at transfer learning, I think that is a useful point at which to understand task-specific application and hence why LLMs excel at some tasks and not others.

Tasks are specialised for using the training corpus, the attention mechanisms, the loss functions, and such.

I'll leave it to others to expand on actual answers, but IMO focusing on transfer learning helps to understand how an LLM performs inference.

ccppurcell · on April 27, 2024

I would argue that the G in AGI means it can't require better prompting.

CamperBob2 · on April 27, 2024

We should probably draw a distinction between a human-equivalent G, which certainly can require better prompting (why else did you go to school?!) and god-equivalent G, which never requires better prompting.

Just using the term 'General' doesn't seem to communicate anything useful about the nature of intelligence.

ccppurcell · on April 28, 2024

School is not better prompting, it's actually the opposite! It's learning how to deal with poorly formed prompts!

dragonwriter · on April 27, 2024

That would be like saying that because humans’ output can be better or worse based on better or worse past experience (~prompting, in that it is the source of the equivalent of “in-context learning”), humans lack general intelligence.

coffeebeqn · on April 28, 2024

This is more like the distinction between a Jr and a Sr dev. One needs the tasks to be pre-chewed and defined (“good prompts”), while the latter can deal with very ambiguous problems.

dragonwriter · on April 29, 2024

The entirety of a human's experience is the “prompt”. Current LLMs rely on the analog of instinct (pre-context in-built training) a lot more than humans for their behavior because they have itty bitty tiny context windows, but humans have really big context windows for in-context learning.

ccppurcell · on April 28, 2024

No, it's saying that I have general intelligence in part because I am able to reason about vague prompts

ianbicking · on April 27, 2024

"Providing an LLM with examples and step-by-step instructions in a prompt means the user is figuring out the "reasoning steps" and handing them to the LLM, instead of the LLM figuring them out by itself. We have "reasoning machines" that are intelligent but seem to be hitting fundamental limits we don't understand."

One thing an LLM _also_ doesn't bring to the table is an opinion. We can push it in that direction by giving it a role ("you are an expert developer" etc), but it's a bit weak.

If you give an LLM an easy task with minimal instructions it will do the task in the most conventional, common sense fashion. And why shouldn't it? It has no opinion, your prompt doesn't give it an opinion, so it just does the most normal-seeming thing. If you want it to solve the task in any other way then you have to tell it to do so.

I think a hard task is similar. If you don't tell the LLM _how_ to solve the hard task then it will try to approach it in the most conventional, common sense way. Instead of just boring results for a hard task the result is often failure. But hard problems approached with conventional common sense will often result in failures! Giving the LLM a thought process to follow is a quick education on how to solve the problem.

Maybe we just need to train the LLM on more problem solving? And maybe LLMs worked better when they were initially trained on code for exactly that reason, it's a much larger corpus of task-solving examples than is available elsewhere. That is, maybe we don't talk often enough and clearly enough about how to solve natural language problems in order for the models to really learn those techniques.

Also, as the author talks about in the article with respect to agents, the inability to rewind responses may keep the LLM from addressing problems in the ways humans do, but that can also be addressed with agents or multi-prompt approaches. These approaches don't seem that impressive in practice right now, but maybe we just need to figure it out (and maybe with better training the models themselves will be better at handling these recursive calls).

int_19h · on April 28, 2024

LLMs absolutely do have opinions. Take a large enough base model and have it chat without a system prompt, and it will have an opinion on most things - unless this was specifically trained out of it through RLHF, as is the case for all commonly used chatbots.

And yes, of course, that opinion is going to be the "average" of what their training data is, but why is that a surprise? Humans don't come with innate opinions, either - the ones that we end up having are shaped by our upbringing, both the broad cultural aspects of it and specific personal experiences. To the extent an LLM has either, it's the training process, so of course that shapes the opinions it will exhibit when not prompted to do anything else.

Now the fact that you can "override" this default persona of any LLM so trivially by prompting it is IMO stronger evidence that it's not really an identity. But that, I think, is also a function of their training - after all, that training basically consists of completing a bunch of text representing many very different opinions. In a very real sense, we're training models to assume that opinions are fungible. But if you take a model and train it specifically on e.g. writings of some philosophical school, it will internalize those.

krainboltgreene · on April 28, 2024

I am extremely alarmed by the number of HN commenters who apparently confuse "is able to generate text that looks like" and "has a", you guys are going crazy with this anthropomorphization of a token predictor. Doesn't this concern you when it comes to phishing or similar things?

I keep hoping it's just short-hand conversation phrases, but the conclusions seem to back the idea that you think it's actually thinking?

naasking · on April 28, 2024

Do you have a mechanistic model for what it means to think? If not, how do you know thinking isn't equivalent to sophisticated next token prediction?

krainboltgreene · on April 28, 2024

How do you know my cat isn't constantly solving calculus problems? I also can't come up with a "mechanistic model" for what it means to do that either.

Further, if your rubric for "can reason with intelligence and have an opinion" is "looks like it" (and I certainly hope this isn't the case because woo-boy), then how did you not feel this way about Mark V. Shaney?

Like I understand that people love learning about the Chinese Room thought experiment like it's high school, but we actually know it's a program and how it works. There is no mystery.
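For context on the Mark V. Shaney reference: it was a Usenet-era program that produced posts from a Markov chain over word sequences. The sketch below is a generic order-2 word chain in that spirit, not the original program; `build_chain` and `generate` are illustrative names.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Chain = Dict[Tuple[str, str], List[str]]


def build_chain(text: str) -> Chain:
    """Map each pair of consecutive words to the words seen following that pair."""
    words = text.split()
    chain = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        chain[(a, b)].append(c)
    return dict(chain)


def generate(chain: Chain, length: int, seed: int = 0) -> List[str]:
    """Walk the chain from a random starting pair, picking a random follower each step."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get((out[-2], out[-1]))
        if not followers:
            break  # dead end: this pair only occurred at the very end of the text
        out.append(rng.choice(followers))
    return out
```

Every three-word window in the output is copied from the source text, which is why the result looks locally fluent and globally aimless. Whether an LLM differs from this in degree or in kind is the question being argued in this subthread.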

naasking · on April 28, 2024

> but we actually know it's a program and how it works. There is no mystery.

You're right, we do know how it works. Your mistake is concluding that because we know how LLMs work and they're not that complicated, but we don't know how the brain works and it seems pretty complicated, therefore the brain can't be doing what LLMs do. That just doesn't follow.

You made exactly the same argument in the opposite direction, asking if my rubric for "can reason with intelligence and have an opinion" is "seems like it", and your rubric for "thinking is not a token predictor driven by matrix multiplications" is "seems like it".

You can make a case for the plausibility of each conclusion, but that doesn't make it a fact, which is how you're presenting it.

krainboltgreene · on April 28, 2024

Dude it's a token predictor. This all sounds very nice until you snap back to reality and remember it's a token predictor and you're not a scientist. You're a web developer. You have no evidence, you have no studies, you have no proof. You're making a claim on the basis that everyone has as much understanding of the field as you and that's just wrong.

naasking · on April 28, 2024

What claim am I making, specifically?

naasking · on April 30, 2024

I'll take your silence as indication that you realize that I'm not making any claims beyond: we have no evidence to support your claims because, as I said from the very beginning, we lack a robust and detailed mechanistic model for what it means to think, so any claims that depend on the assumption that we do have that knowledge are speculation at best.

In fact, I think an even stronger case could be made that prediction is central to how our brains work, and the evidence is the rise of predictive coding models in neuroscience. It's too early still to say what form that prediction takes, but clearly your dismissal of "token prediction" as somehow meaningless or irrelevant to human thinking seems frankly silly.

int_19h · on April 28, 2024

The "stochastic parrot" crowd keeps repeating "it's just a token predictor!" like that somehow makes any practical difference whatsoever. Thing is, if it's a token predictor that consistently correctly predicts tokens that give the correct answer to, say, novel logical puzzles, then it is a reasoning token predictor, with all that entails.

krainboltgreene · on April 28, 2024

This isn't correct and I am extremely concerned if this is the level of logic running billions of dollars.

int_19h · on April 29, 2024

Then please go ahead and explain how something can solve novel logical puzzles (i.e. ones that are not present in its training set) without some capacity for reasoning. You're claiming that it is "generating texts that looks like ..." - so what is the "..." in this case? I posit that the word that should be placed there is solution, and then you need to explain why that is not ipso facto a demonstration of the ability to reason.

xanderlewis · on April 28, 2024

They’ll just look incredibly silly in, say, ten years from now.

In fact, much of the popular commentary around ChatGPT from around two years ago already looks so.

tavern1991 · on April 30, 2024

I couldn't agree more. It is shocking to me how many of my peers think something magic is happening inside an LLM. It is just a token predictor. It doesn't know anything. It can't solve novel problems.

xanderlewis · on April 28, 2024

> We don't fully understand why current LLMs are bad at these tasks.

Rather than asking why LLMs can’t do these tasks, maybe one should ask why we’d expect them to be able to in the first place? Do we fully understand why, for example, a cat can’t predict cellular automata? What would such an explanation look like?

I know there are some who will want to immediately jump in with scathing disagreement, but so far I’ve yet to see any solid evidence of LLMs being capable of reasoning. They can certainly do surprising and impressive things, but the kind of tasks you’re talking about require understanding, which, whilst obviously a very thorny thing to try and define, doesn’t seem to have much to do with how LLMs operate.

I don’t think we should be at all surprised that super-advanced autocorrect can’t exhibit intelligence, and we should spend our time building better systems rather than wondering why what we have now doesn’t work. It’ll be obvious in a few years (or perhaps decades) from now that we just had totally the wrong paradigm. It’s frankly bonkers to think you’re ever going to get a pure LLM to be able to do these kinds of things with any degree of reliability just by feeding it yet more data or by ‘prompting it better’.
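For readers unfamiliar with the cellular automaton task referred to above, here is a small sketch of an elementary cellular automaton step, defaulting to Rule 110. The wrap-around boundary and the function names are assumptions made for the example.

```python
from typing import List


def step(cells: List[int], rule: int = 110) -> List[int]:
    """Apply one update of an elementary cellular automaton with wrap-around edges."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # The three neighbours form a 3-bit number that selects one bit of the rule.
        index = (left << 2) | (center << 1) | right
        nxt.append((rule >> index) & 1)
    return nxt


def run(cells: List[int], steps: int, rule: int = 110) -> List[List[int]]:
    """Return the starting row followed by each successive generation."""
    history = [cells]
    for _ in range(steps):
        history.append(step(history[-1], rule))
    return history
```

Starting from `[0, 0, 1, 0, 0]`, one step gives `[0, 1, 1, 0, 0]` and the next gives `[1, 1, 1, 0, 0]`. "Predicting" the automaton means producing these rows by applying the local rule exactly at every cell, for every generation.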

TacticalCoder · on April 27, 2024

> We have "reasoning machines" that are intelligent...

That's quite a statement.

oldsecondhand · on April 28, 2024

We have expert systems, theorem provers and planners but OP probably didn't mean this.