That is precisely what autoregressive means. Perhaps you meant to write that mod...

janalsncm · 2025-10-20T22:48:16 1761000496

I think they are distinguishing the mechanical process of generation from the way the idea exists. It’s the same as how a person can literally only speak one word at a time but the ideas might be nonlinear.

sailingparrot · 2025-10-20T23:12:09 1761001929

Indeed, that is what I meant. The LLM isn’t a blank slate at the start of each new token during autoregressive decoding, because the KV cache is there.

bjourne · 2025-10-21T18:51:13 1761072673

If so, they are wrong. :) Autoregressive just means that the probability of the next token is a function of the already seen/emitted tokens. Any "ideas that may exist" are entirely embedded in this sequence.
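
For concreteness, the definition above is usually written as the chain-rule factorization below. This is the standard textbook form rather than anything specific to this thread; $\theta$ (the model weights) and $x_{<t}$ (the tokens before position $t$) are notation added here for illustration.

```latex
p_\theta(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p_\theta\!\left(x_t \mid x_{<t}\right)
```

Each factor conditions only on earlier tokens, and the weights $\theta$ enter as fixed parameters of the conditional distribution.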

sailingparrot · 2025-10-21T19:32:43 1761075163

> entirely embedded in this sequence.

Obviously wrong: otherwise every model would predict exactly the same thing. It would not even be predicting anymore, simply decoding.

The sequence is not enough to reproduce the exact output; you also need the weights.

And the way the model works is by attending to its own internal state (weights*input) and refining it, both across the depth (layer) dimension and across the time (token) dimension.

The fact that you can get the model to give you the exact same output by fixing a few seeds is only a consequence of the process being Markovian. It is orthogonal to the fact that at each token position the model is “thinking” about a longer horizon than the present token and is able to reuse that representation at later time steps.
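
To make the KV-cache point concrete, here is a minimal toy sketch, assuming a single attention layer with random weights and no sampling. Every name in it (`make_weights`, `step`, `logits_with_cache`, `logits_from_scratch`, the sizes `D` and `VOCAB`) is hypothetical and chosen for illustration; it is not how any real LLM library is structured. It shows two things at once: the cached keys/values are state carried across time steps and reused, and that state is still a deterministic function of the weights plus the token sequence.

```python
import math
import random

D = 4      # toy hidden size (assumption)
VOCAB = 8  # toy vocabulary size (assumption)


def make_weights(seed):
    """Random toy weights; a real model would load trained parameters."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return {"emb": mat(VOCAB, D), "wq": mat(D, D), "wk": mat(D, D),
            "wv": mat(D, D), "out": mat(D, VOCAB)}


def matvec(x, w):
    """Multiply a length-r vector by an r x c matrix."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def attend(q, keys, values):
    """Softmax attention of one query over all keys/values seen so far."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [sum(e / total * v[j] for e, v in zip(exps, values)) for j in range(D)]


def step(weights, token, cache):
    """Process one token: extend the cache, then attend over it."""
    x = weights["emb"][token]
    cache["k"].append(matvec(x, weights["wk"]))
    cache["v"].append(matvec(x, weights["wv"]))
    hidden = attend(matvec(x, weights["wq"]), cache["k"], cache["v"])
    return matvec(hidden, weights["out"])


def logits_with_cache(weights, tokens):
    """Incremental decoding: earlier keys/values are reused, not recomputed."""
    cache = {"k": [], "v": []}
    logits = None
    for token in tokens:
        logits = step(weights, token, cache)
    return logits


def logits_from_scratch(weights, tokens):
    """Recompute everything from the raw tokens, carrying no state over."""
    xs = [weights["emb"][t] for t in tokens]
    keys = [matvec(x, weights["wk"]) for x in xs]
    values = [matvec(x, weights["wv"]) for x in xs]
    hidden = attend(matvec(xs[-1], weights["wq"]), keys, values)
    return matvec(hidden, weights["out"])
```

In this toy, `logits_with_cache(w, toks)` and `logits_from_scratch(w, toks)` agree for the same weights `w`, while the same `toks` under different weights give different logits. A multi-layer model would cache richer per-layer representations, which this single-layer sketch does not capture.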

bjourne · 2025-10-21T23:15:05 1761088505

Well, that an autoregressive model has parameters does not mean it is not autoregressive. LLMs are not Markovian.

sailingparrot · 2025-10-22T02:49:51 1761101391

At no point have I argued that LLMs aren’t autoregressive. I am merely talking about LLMs’ ability to reason across time steps, so it seems we are talking past each other, which won’t lead anywhere.

And yes, LLMs can be studied through the lens of Markov processes: https://arxiv.org/pdf/2410.02724
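
A hedged sketch of how both statements can hold, assuming a fixed context window of $K$ tokens (the symbols $s_t$, $K$ and $\theta$ are notation introduced here, not taken from the thread): the token sequence is not a first-order Markov chain, since $x_{t+1}$ depends on more than $x_t$, but if the state is defined as the whole visible context, the process is Markov over those states.

```latex
s_t = \left(x_{\max(1,\,t-K+1)}, \dots, x_t\right),
\qquad
p_\theta\!\left(s_{t+1} \mid s_1, \dots, s_t\right) \;=\; p_\theta\!\left(s_{t+1} \mid s_t\right)
```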

Have a good day