For me, I've realized I often can't learn something at all unless I can first compare it to something I already know.
In this case, as another user mentioned, the decoupling use case is a great one. Instead of two processes/APIs talking to each other directly, having an intermediate "buffer" process/API can save you a lot of headache.
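As a minimal sketch of the shape of it, here's the idea with Python's standard-library queue standing in for the "buffer" (a real system would use a broker like RabbitMQ, SQS, or Kafka; the producer/consumer names are just for illustration):

    import queue
    import threading
    import time

    # The intermediate "buffer": neither side talks to the other directly.
    buffer = queue.Queue()

    def producer():
        # Drops work into the buffer and moves on; it never waits on,
        # or even knows about, the consumer.
        for i in range(5):
            buffer.put({"order_id": i})
            print(f"produced order {i}")

    def consumer():
        # Drains the buffer at its own pace; if it's slow or briefly
        # down, messages simply wait in the buffer.
        while True:
            msg = buffer.get()
            time.sleep(0.1)  # simulate slow downstream work
            print(f"consumed order {msg['order_id']}")
            buffer.task_done()

    threading.Thread(target=consumer, daemon=True).start()
    producer()
    buffer.join()  # block until everything has been processed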
The concept of connascence, rather than coupling, is what I find more useful for trade-off analysis.
Synchronous connascence means that you only have a single architectural quantum, in Neal Ford's terminology.
As Ford is less religious and more respectful of real-world trade-offs, I find his writings more useful for real-world problems.
I encourage people to check his books out and see if they're useful. It was always hard to mention connascence, as it has a reputation of being ivory-tower architect jargon, but in a distributed-systems world it is very pragmatic.
There are two very different discussions here. One is the practical and societal consequences, playing out iteratively over the next few decades. Fine, this is an important discussion. If this is what you're discussing, I have no issue with it; automation taking a significant portion of jobs, including software engineering, is a huge worry.
The other is this almost schadenfreude around intelligence. The argument goes something like: if AGI is a superset of all our intellectual, physical, and mental capabilities, what point is there to humans? Not from an economic perspective, but literally from a "why do humans exist" perspective. It would be "rational" to defer all of your thinking to a hyperintelligent AGI. Obviously.
The latter sentiment is one I see a decent bit on Hacker News. You see it encoded in psychoanalytic comments like, "Humans have had the special privilege of being intelligent for so long that they can't fathom something else being more intelligent than them."
For me, the only actionable conclusion I can see from a philosophy like this is to Lie Down and Rot. You are not allowed to use your thinking, because a rational superagent has simply thought about it more objectively and harder than you.
I don't know. That kind of thinking, whether I encountered it intuitively in my teens or later when learning about government and ethics (Rational Utopianism, etc.), has always ticked me off. Incidentally, I've disliked every single person who unequivocally thought that way.
Of course, if you phrase it like this, you'll get called irrational and quickly get compared to not-so-nice things. I don't care. Compare me all you want to unsavory figures; this kind of psychoanalytic, gaslighting statement is never conducive to "good human living".
I don't care if the rebuttal analogy is "well, you're a toddler throwing a tantrum, while the AGI simply moves on". You can't let ideologies like that second one get to you.
Very interested in this! I'm mainly a ChatGPT user; for me, o3 was the first sign of true "intelligence" (not 'sentience' or anything like that, just actual, genuine usefulness). Are these models at that level yet? Or are they at o1 level? Still GPT-4 level?
Not nearly o3 level. Much better than GPT-4, though! For instance, Qwen 3 30b-a3b 2507 Reasoning gets 46, vs GPT-4's 21 and o3's 60-something, on Artificial Analysis's benchmark aggregation score. Small local models (~30B params and below) tend to benchmark far better than they actually work, too.
The defining feature of the "Markov property" is that the next state depends only on the current state, not on the rest of the history.
And in classes, the very first trick you learn to skirt around history is to add Boolean variables to your "memory state". Your system now models "did it rain on each of the previous N days?" The issue, obviously, being that this is exponential if you're not careful. Maybe you can get clever by just making your state a "sliding window" of history; then it's linear in the number of days you remember. Maybe mix both. Maybe add even more information. Tradeoffs, tradeoffs.
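A rough sketch of that trick (the rain encoding, the sliding-window update, and the names are mine, purely for illustration): the state becomes a tuple of the last N days' outcomes, each step only looks at that tuple, and the price is a 2^N state space:

    from itertools import product

    N = 3  # how many days of history we fold into the state

    # Naive augmentation: the state is the full tuple of the last N days'
    # rain/no-rain outcomes, so the state space has 2**N entries.
    states = list(product([False, True], repeat=N))
    print(len(states))  # 8 states for N = 3

    def step(state, rained_today):
        # "Sliding window" update: drop the oldest day, append today.
        # The next state depends only on the current state plus today's
        # observation, so the Markov property still holds.
        return state[1:] + (rained_today,)

    s = (False, True, True)          # dry three days ago, rain the last two
    s = step(s, rained_today=False)  # -> (True, True, False)
    print(s)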
I don't think LLMs embody the Markov property at all, even if you can make everything eventually follow the Markov property by just "considering every single possible state", of which there are (size of token set)^(length) at minimum because of the KV cache.
The KV cache doesn't affect it, because it's just an optimization. LLMs are stateless and don't take any input other than a fixed block of text. They don't have memory, which is exactly the requirement for a Markov chain.
Have you ever actually worked with a basic Markov problem?
The Markov property states that your next state's transition probabilities depend entirely on the current state.
These states inhabit a state space. The way you encode "memory", if you need it, e.g. if you need to remember whether it rained on each of the last 3 days, is by expanding said state space. In that case, you'd go from tracking 1 day to tracking 3, i.e. 2^3 = 8 states if you need the precise binary information for each day. Being "clever", maybe you assume only the number of days it rained in the past 3 matters, and you can get away with a 'linear' amount of memory (4 states instead of 8).
Sure, an LLM is a "Markov chain" with a state space of size (# tokens)^(context length), at minimum. That's not a helpful abstraction, and it defeats the original purpose of the Markov observation. The entire point of the Markov observation is that you can represent a seemingly huge predictive model with just a couple of variables in a discrete state space, and ideally you're the clever programmer/researcher who can significantly collapse said space by being, well, clever.
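To put rough numbers on why the abstraction stops being helpful (the vocabulary size and context length below are illustrative orders of magnitude, not tied to any particular model): the rain example collapses to a handful of states, while a "context window as state" chain does not collapse to anything you could ever write down.

    import math

    # The weather chain: the whole point of the Markov framing is that a
    # tiny explicit state space captures the dynamics. Counting rainy days
    # in the last 3 (0-3) gives 4 states instead of 2**3 = 8.
    count_states = 4

    # "LLM as Markov chain": the state would have to be the entire context
    # window, so the state space is (vocab size)^(context length).
    vocab_size = 50_000
    context_length = 8_192
    log10_states = context_length * math.log10(vocab_size)

    print(f"weather chain: {count_states} states")
    print(f"context-window 'chain': ~10^{log10_states:.0f} states")  # roughly 10^38500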
For me at least, I wasn't even under the impression that this was a possible research angle to begin with. Crazy stuff that people are trying, and very cool too!
> The problem with using them is that humans have to review the content for accuracy.
There are (at least) two humans in this equation: the publisher and the reader. The publisher at least should do their due diligence, regardless of how "hard" it is (in this case, we literally just ask that you review your OWN CITATIONS before inserting them into your paper). This is why we have accountability as a concept.