
We're back up! It was ~30 minutes of downtime this morning; our apologies if it interrupted your work.

Hmm, the hallucination would happen in the auto-labelling, but we review and test our labels and they seem correct!


If you're hacking on this and have questions, please join us on Discord: https://discord.gg/vhT9Chrt


We haven't yet found generalizable "make this model smarter" features, but there is a tradeoff to putting every instruction in the system prompt: e.g. if you have a chatbot that only sometimes generates code, you can inject very specific coding instructions when it's coding and leave them out of the system prompt otherwise.

We have a notebook about that here: https://docs.goodfire.ai/notebooks/dynamicprompts
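
For a rough idea of the pattern (my own sketch, not the notebook's code; the routing heuristic, prompt text, and model name are made up):

    # Sketch: inject task-specific instructions only when the turn needs them,
    # instead of keeping everything in one large system prompt.
    # Assumes an OpenAI-style chat API; the routing heuristic is deliberately toy.
    from openai import OpenAI

    client = OpenAI()

    BASE_PROMPT = "You are a helpful assistant."
    CODING_PROMPT = BASE_PROMPT + (
        " When writing code, prefer Python, include all imports,"
        " and explain non-obvious choices in comments."
    )

    def looks_like_coding(user_message: str) -> bool:
        # Toy heuristic; in practice this could be a classifier or a cheap model call.
        keywords = ("code", "function", "bug", "traceback", "implement")
        return any(k in user_message.lower() for k in keywords)

    def answer(user_message: str) -> str:
        system = CODING_PROMPT if looks_like_coding(user_message) else BASE_PROMPT
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user_message},
            ],
        )
        return resp.choices[0].message.content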


This is incredible! I hadn't seen that repo yet; thank you for pointing it out, and for the writing.


Yeah, I think the idea of finding out what flavor of uncertainty you have is very interesting.


This is awesome, can't wait for evals against Claude Computer Use!


Can we first test this with basic sysadmin work in a simple shell?

Can't wait to replace "apt-get install" with "gpt-get install" and then have it solve all the dependency errors by itself.


This has been possible for a year already. My project gptme does it just fine (like many other tools), especially now with Claude 3.5.


I know that it exists. I was just hoping we could make such interactions (practically) bug-free before we move on to the next big thing.


Threat actors can't wait for you to start doing this either.


How can you write metrics against something that's non-deterministic?


Yeah! I want to use the logprobs API, but you can't, for example:

- sample multiple logits and branch (we maybe could with the old text completion API, but that no longer exists)

- add in a reasoning token on the fly

- stop execution, ask the user, etc.

But a visualization of logprobs in a query seems like it might be useful.


Can't you?

1. The top_logprobs option lets you get not just the most likely token, but the top most likely tokens.

You can branch by just choosing any point in your generated string and feeding it back to the LLM, for example: { "user": "what is the colour of love?", "assistant": "the colour of love is" }

It's true that it will add an "assistant" tag, and the old completions API was better for this.
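
For what it's worth, a sketch of both of those against the current Python client (logprobs/top_logprobs are real chat-completions parameters; the prompt and model are just placeholders, and as noted the prefix goes in as a separate assistant turn rather than a true continuation):

    # Sketch: request the top alternatives per token, then "branch" by feeding a
    # chosen prefix back as an assistant message. Caveat from above: the chat API
    # treats that prefix as a finished assistant turn, unlike old completions.
    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is the colour of love?"}],
        logprobs=True,
        top_logprobs=5,   # up to 5 candidate tokens per position
        max_tokens=30,
    )

    # Inspect the alternatives at each position.
    for tok in resp.choices[0].logprobs.content:
        alts = {alt.token: round(alt.logprob, 2) for alt in tok.top_logprobs}
        print(repr(tok.token), alts)

    # "Branch": resume from a prefix picked out of the generated string.
    branch = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": "What is the colour of love?"},
            {"role": "assistant", "content": "The colour of love is"},
        ],
        max_tokens=20,
    )
    print(branch.choices[0].message.content)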


I want to build intuition on this by building a logit visualizer for OpenAI outputs. But from what I've seen so far, you can often trace down a hallucination.

Here's an example of someone doing that for 9.9 > 9.11: https://x.com/mengk20/status/1849213929924513905
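
A bare-bones version of such a visualizer could just convert each token's logprob to a probability and flag the uncertain spots (sketch only; the prompt, model, and threshold are arbitrary):

    # Sketch of a minimal "logprob visualizer": print each generated token with
    # its probability and mark the low-confidence ones, which is often where a
    # hallucination creeps in.
    import math
    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Which is bigger, 9.9 or 9.11?"}],
        logprobs=True,
        max_tokens=60,
    )

    for tok in resp.choices[0].logprobs.content:
        p = math.exp(tok.logprob)
        marker = "  <-- low confidence" if p < 0.5 else ""
        print(f"{tok.token!r:>15}  p={p:.2f}{marker}")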


I'm thinking versioning: 9.9, 9.10, 9.11, etc., because in my native language we use the comma for decimal separation: 9,11 9,22 9,90.


I mean, LLMs certainly hold representations of what words mean and their relationships to each other; that's what the Key and Query matrices capture, for example.

But in this case, it means that the underlying point in embedding space doesn't map clearly to only one specific token. That's not too different from when you have an idea in your head but can't think of the word.
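
For anyone who wants to see the mechanics, here's a toy scaled dot-product attention sketch in numpy (random matrices standing in for learned weights, not any particular model):

    # Toy scaled dot-product attention: queries and keys are projections of token
    # embeddings, and their dot products score how related tokens are.
    import numpy as np

    d = 8                                    # embedding / head dimension
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, d))              # embeddings for 4 tokens
    W_q = rng.normal(size=(d, d))            # "learned" query projection
    W_k = rng.normal(size=(d, d))            # "learned" key projection

    Q, K = X @ W_q, X @ W_k
    scores = Q @ K.T / np.sqrt(d)            # pairwise relatedness of tokens
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
    print(weights.round(2))                  # each row sums to 1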


You're missing my point. Words are simply serialized thoughts. When we humans read words, as you are doing for this sentence, we build a model of what those words mean based on our conceptual understanding and experience in space-time. That modeling is how you can then determine whether the model formed in your mind from the serialized words in the sentence corresponds to reality or not. For the LLM, there is no model of reality whatsoever, it's just words, so there is no way the LLM could ever know whether the words, when modeled, would be true or false, etc.


An LLM does have a model of reality. An LLM's reality is built on the experiences (words) it's been fed.

Humans are similar. A human's reality is built on the experiences (senses) it's been fed. There are definitely several major differences, the obvious one being that we have different sensory input than an LLM, but there are others, like humans having an instinctual base model of reality, shaped by the effects of natural selection on our ancestors.

Just like an LLM can't tell if the reality it's been fed actually corresponds to the "truer" outside reality (you could feed an LLM lies like "the sky is plaid" in such a way that it would report that as true), a human can't tell if the reality they've been fed actually corresponds to a "truer" outside reality (humans could be fed lies like "we are in the true reality" when we're actually all NPCs in a video game for a higher level).

The LLM can't tell if its internal reality matches an outside reality, and humans can't tell if their internal reality matches an outside reality, because both only have the input they've received to go on, and can't tell whether that input is problematic or incomplete.


Words are not reality; they are just data serialized from human world experience, without reference to the underlying meaning of those words. An LLM is unable to build the conceptual space-time model that the words reference, thus it has no understanding whatsoever of the meaning of those words. The evidence for this is everywhere in the "hallucinations" of LLMs. It's just statistics on words, and that gets you nowhere near understanding the meaning of words, that is, conceptual awareness of matter through space-time.


This is a reverse anthropic fallacy. It may be true of a base model (though it probably isn't), but it isn't true of a production LLM system, because the LLM companies have evals and testing systems and such things, so they don't release models that clearly fail to understand things.

You're basically saying that no computer program can work, because if you randomly generate computer programs, most of them don't work.


Not at all. I'm saying there is a difference between doing statistics on word data and working with space-time data and concepts that classify space-time. We do the latter: https://graphmetrix.com/trinpod-server


Insofar as this is a philosophically meaningful assertion, it isn't true. LLMs live in a universe of words, it is true; within that universe, they absolutely have world models, which encode the relationships between concepts encoded by words. It's not "reality", but neither are the conceptual webs stored in human brains. Everything is mediated through senses. There's no qualitative difference between an input stream of abstract symbols, and one of pictures and sounds. Unless you think Helen Keller lacked a concept of true and false?


They don't have world models, they have word models. A very big difference indeed!


Would you say that blind-deaf-paralyzed people do not have world models either, since they can only experience the world through words?


Well, if they have hearing, they can build a world model based on that sensation. So when someone talks about the fall, they can remember the sound of leaves hitting other leaves as they drop. The senses give us measurement data on reality that we use to model reality. We humans can then create concepts about that experience, and ultimately communicate with others using common words to convey that conceptual understanding. Word data alone is just word data with no meaning. This is why when I look at a paragraph in Russian, it has no meaning for me (as I don't understand Russian).

