At least in this instance, it came from my fleshy human brain. Although I perhaps used it to come off as smarter than I really am - just like an LLM might.
You're completely right; my argument is fundamentally wrong because it relies on commutativity, but the embedding matrix obviously does not treat some columns differently than others. Back to the drawing board I suppose. Thanks!
It usually depends on location. For example, Cloudflare has a setting somewhere for "always show captchas for non-western traffic", and a lot of people set it.
This is a great point; I think I might actually have been wrong. It doesn't really make sense that one row of the embedding matrix would be treated differently than another...
Indeed. Maybe the learned circuit does something like the following. For each token's feature vector, compute a representation of the positions where it appears in the sentence. This could be made possible by the positional embeddings.
Token Features 0 => list[1, 5, 6, 10]
Token Features 1 => list[7, 8]
...
These "list features" would be invariant to Caesar cipher. So then the LLM could pass these list features to a learned Caesar cipher decoder unit to spit out the decoded text.
It's still unexplained, however, why the Byzantine Music Notation would trigger this circuit while other Caesar ciphers wouldn't.
You're right; evidently my programming wasn't quite up to the task here. It's good to see that there's demand, though, so I'll try to make a better version implementing something like what you're describing.
This looks like a great dataset! Thanks for posting. I'm looking for projects just like this to try my training modifications against. Do you have any initial results posted? It's a small model/dataset, so training the GPT-2 model in the repo probably wouldn't be too hard, but it would be good to have reference runs to make sure things are set up correctly when I run it.
So glad you like it! If I understand your question correctly, yes, we are also putting together a small library for training small language models. It's not mature at all yet, but you can keep up with our progress here: https://github.com/danbraunai/simple_stories_train
Yeah. I looked at the dataset and there are a lot of possible tasks you could train against here, since it has some great annotations. So having a simple reference baseline, like a GPT-2 pretraining run (which I think your repo is set up to do), helps give a starting point for other work. It looks like the dataset is small enough, and the GPT-2 reference code in your repo lightweight enough, to do a quick run and plot some curves on. Something like the sketch below is all I have in mind. Thanks!
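(A rough sketch only, using plain Hugging Face tooling rather than your repo's scripts; the dataset Hub ID and text column name are placeholders, since I haven't checked what you actually use.)

```python
# Rough sketch of a quick GPT-2 baseline run to get a reference loss curve.
# "PLACEHOLDER/simple-stories" and the "story" column are placeholders.
from datasets import load_dataset
from transformers import (AutoTokenizer, GPT2Config, GPT2LMHeadModel,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

dataset = load_dataset("PLACEHOLDER/simple-stories")   # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["story"], truncation=True, max_length=512)  # assumed text column

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset["train"].column_names)

# A small GPT-2 config so a reference curve is cheap to produce.
config = GPT2Config(n_layer=4, n_head=4, n_embd=256, vocab_size=tokenizer.vocab_size)
model = GPT2LMHeadModel(config)

args = TrainingArguments(output_dir="baseline-gpt2", per_device_train_batch_size=32,
                         num_train_epochs=1, logging_steps=50, report_to="none")
trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"],
                  data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()
```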
It does; we use it as a default. Some possible benefits are that 1) it saves input tokens and 2) it in theory allows for different variations on a theme, whereas with two separate prompts you run the risk of repeating one topic.
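As a toy illustration of the second point (the prompt wording here is made up, not what we actually use):

```python
# Toy illustration only; the prompt text is invented for this example.
theme = "a child finds a lost kitten"

# One combined call: the theme is sent once, and the instruction pushes the
# model to vary the stories against each other.
combined_prompt = (
    f"Write two short stories about: {theme}\n"
    "Make the second story clearly different in setting and tone from the first."
)

# Two separate calls: the theme is sent (and billed) twice, and nothing stops
# both completions from converging on the same setting and tone.
separate_prompts = [f"Write a short story about: {theme}" for _ in range(2)]
```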