
But a conversation with an LLM doesn't depend only on the last message; it depends on all previous messages. Sure, the number of messages is limited by the context window, so you can indeed call it a high-order Markov chain (it can only see N states back). But what would an actual non-Markovian LLM even look like? Something with an infinite context? That's not physically possible... At this point, "it's a Markov chain" doesn't sound like a very useful statement.


You appear to be confusing the UI over the LLM with the actual working of the LLM. The LLM itself never holds more than one message: you put in one message and you get one word (well, one token) back.


I have an idea of how they work. At the end of the day, it's all just numbers. You say you put in "one message", but in reality it's N input numbers corresponding to the tokens (where N can vary from 1 to any size): basically N variables. An LLM is one large function that takes those N values as input and produces a new output value, so you can view an LLM as y = f(a, b, c, ...)

What I don't understand is how the line is drawn between "past" and "present". Why are variables a,b,c viewed as "present" and not "past"? How is past defined? Why can't you view token generation as f(past1, past2, past3, present)?

As I said, it does look "Markovian" if the number of "pasts" is fixed, but again, what exactly is not Markovian then? Infinite inputs? Everything in the real world is finite.


Let's keep it simple and say that the procedure of an LLM is

  state: list[n];               // sliding window of the last n tokens
  while (true) {
      new_token = llm(state);   // pure function of the current window only
      state.pop_front();        // drop the oldest token
      state.push_back(new_token);
  }
Each time llm() is called, it is blind to previous invocations. The state may contain evidence of them, or it may not. It's a free-form data structure that is sometimes filled with user input and sometimes contains previous llm() outputs, but llm() doesn't care.
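The same loop in runnable form, with a toy stand-in for llm() (the real model is of course a neural network; everything here besides the loop structure is invented for illustration):

```python
from collections import deque

def llm(state):
    # Toy stand-in for the model: a pure function of the current
    # window only -- it has no memory of earlier invocations.
    return sum(state) % 10

n = 4
state = deque([0, 1, 2, 3], maxlen=n)  # maxlen makes append drop the front

tokens = []
for _ in range(5):
    new_token = llm(state)
    state.append(new_token)  # pop_front + push_back in one step
    tokens.append(new_token)

print(tokens)  # [6, 2, 3, 4, 5]
```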


So any pure function is Markovian then? They too only depend on their input and are blind to previous invocations. I’m just trying to understand the practical purpose of calling something a Markov chain beyond small sizes, because if we allow the state to be of any size, I’m not sure what the usefulness is: anything can be called a Markov chain if you concatenate all input data of any size into one blob and interpret it as "one state."
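The concatenation trick described above can be written down directly: take any history-dependent process, declare "the entire history so far" to be one state, and the result is trivially Markovian (the update rule here is made up for illustration):

```python
def history_dependent(history):
    # Depends on the whole past, not just the last value:
    # outputs the count of even numbers seen so far.
    return sum(1 for x in history if x % 2 == 0)

def markov_step(state):
    # The same process, made "Markovian" by declaring the full
    # history tuple to be a single state.
    return state + (history_dependent(state),)

state = (2, 3, 4)
state = markov_step(state)  # (2, 3, 4, 2)
state = markov_step(state)  # (2, 3, 4, 2, 3)
print(state)
```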


>So any pure function is Markovian then?

I guess it would be more accurate to say that at its core a Markovian process is based around a pure function. Usually there's some kind of feedback loop around the function that allows the process to evolve in some way and continue producing output.

>I’m pretty sure anything can be modeled using pure functions.

We're not talking about theoretical models of black boxes, though, we're talking about known mechanisms. A recursive function doesn't become iterative just because it could be thought of as iterative (because eventually the CPU does loop over some sequence of instructions). We still say it's recursive.

>I’m not sure what the usefulness is

I suppose part of the usefulness is to de-mystify LLMs. If you can understand what Markov chains do, the thought "oh, an LLM is just a much more complex Markov chain" can help you reason about them.
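For comparison, the "classic" Markov chain people usually mean: a fixed lookup table from the current word to possible next words (the tiny corpus here is hypothetical):

```python
import random

random.seed(0)

# Transition table, as if learned by counting bigrams in some corpus.
table = {
    "the": ["cat", "dog"],
    "cat": ["sat", "ran"],
    "dog": ["ran"],
    "sat": ["the"],
    "ran": ["the"],
}

word = "the"
out = [word]
for _ in range(6):
    word = random.choice(table[word])
    out.append(word)

print(" ".join(out))
```

An LLM replaces this lookup table with a learned function of the last N tokens rather than just the last one, which is where the "much more complex" part comes in.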


“just”?

If you include the entire past history as part of the state, then any stochastic process becomes a Markov process.

Any computer program you could run (ignoring networking and external inputs; assume your computer has a true RNG module) is a Markov process: just say that your state space is the space of all possible memory configurations of the computer.

Saying that the LLM is “just” a Markov process seems to be doing something a bit weird with that “just”.


>Any computer program you could run (ignoring networking and external inputs; assume your computer has a true RNG module) is a Markov process: just say that your state space is the space of all possible memory configurations of the computer.

No. The entire computer is a Markov process in that situation. An individual process may still remember its past states and therefore not be Markovian. See the example I gave about allowing or blocking transactions.
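The transaction example referred to isn't quoted in this subthread; a guess at its shape would be a monitor whose decision depends on the full history of past transactions, not only the current one (the rule below is hypothetical):

```python
def allow(tx, history):
    # History-dependent rule: block once the running total of
    # accepted transactions would exceed 100. Viewed as a process
    # over individual transactions, this is not Markovian -- the
    # decision depends on the whole past, not just the last step.
    return tx + sum(history) <= 100

history, results = [], []
for tx in [40, 30, 20, 30]:
    ok = allow(tx, history)
    results.append((tx, ok))
    if ok:
        history.append(tx)

print(results)  # [(40, True), (30, True), (20, True), (30, False)]
```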


By “memory” I was including whatever is on disk, and the values of all the CPU registers, and all the CPU flags, etc.

So, yeah, the state of the computer (regarded as an abstract machine, not viewed at the level of the physical hardware)

Oh, you mean because I said a particular program, and other things on the computer could interfere with the program. Ok, point. I was imagining a computer that just had a single program running on it (with no separate OS) I guess? But yeah, good point, I certainly didn’t make my idea precise enough (in my own mind or in my words).


Yes, I understood what you meant and my answer assumes that the contents of all memory -- volatile and persistent -- constitute the state of the machine.

It doesn't matter whether the program is the only thing the computer is doing or whether there are other things running on the same machine. The program considered as a machine unto itself is a conceptually different thing from the underlying hardware. The Markov property is a way to reason about how a process interacts with its state. "I could design a Markov process that's semantically equivalent to a non-Markov process, just by including more state" is a pointless statement. The whole point of the distinction is to reason about the thing you're studying.


>what an actual non-Markovian LLM would look like? Something which has an infinite context? It's not physically possible... At this point it sounds like "it's a Markov chain" is not a very useful statement.

Uh... You're moving the goalposts. I don't know the exact definition of "LLM", but let's suppose that it and the definition of "Markov chain" make us conclude that LLMs are necessarily Markov chains. "It's a Markov chain" is still a useful statement, because there are processes that are neither LLMs nor Markov chains.


The point being made is equivalent to pointing out that, by strict mathematical definition, there are no physically realizable, distinguishably non-Markovian processes, because computers are finite objects. Yes, you can stretch "Markov chain" to cover everything up to and including actual human brains, but the kinds of systems people actually call Markovian in practice are much more specific than that: typically systems with structurally simple, indexical nodes, sometimes extending to richer ideas like RNNs, where the states are at least of finite bandwidth. An LLM not only has continuous-valued state, it has state whose description length grows in proportion to (and theoretically without bound with) the size of its history.


>because there are processes that are neither LLMs nor Markov chains.

I'm not sure what those would be. All dynamical systems (essentially all of classical physics), except maybe quantum mechanics, are Markov chains.


Well, I gave an example already, so...



