Hacker News | fi-le's comments

At least in this instance, it came from my fleshy human brain. Although I perhaps used it to come off as smarter than I really am - just like an LLM might.


That is the sweetest compliment I could have hoped for, thank you.


I wrote it and am really confused about what you mean; I think I'm missing a joke? If not, this is serious: it shouldn't be there.


Perfectly fine for me.


You're completely right, my argument is fundamentally wrong because it relies on commutativity, but the embedding matrix obviously doesn't treat some columns differently than others. Back to the drawing board, I suppose. Thanks!


Oh, are you getting a captcha when accessing the site this links to? If so, I didn't know this.


It usually depends on location; for example, Cloudflare has a setting somewhere for "always show captchas for non-western traffic", and a lot of people set it.


Wow, I guess my hosting provider uses Cloudflare and that setting, then.


This is a great point, I think I might have been wrong actually. It doesn't really make sense that one row of the embedding matrix is treated differently than another...
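
A small sketch of that symmetry, just my own toy illustration with a plain lookup embedding (nothing from the actual model): relabelling the token ids and permuting the embedding rows in the same way gives identical embedded sequences, so no row is special.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, d_model = 8, 4
    E = rng.normal(size=(vocab_size, d_model))   # embedding matrix, one row per token id
    tokens = np.array([3, 1, 4, 1, 5])           # an arbitrary token sequence

    perm = rng.permutation(vocab_size)           # relabel the vocabulary: old id t becomes perm[t]
    E_perm = E[np.argsort(perm)]                 # move each row to its new id's slot

    # Relabelled tokens looked up in the relabelled matrix give the same vectors.
    assert np.allclose(E[tokens], E_perm[perm[tokens]])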


Indeed. Maybe the learned circuit does something like the following. For each token's feature vector, compute a representation of positions where it appears in the sentence. This could be made possible by the positional embeddings.

    Token Features 0 => list[1, 5, 6, 10]
    Token Features 1 => list[7, 8]
    ...

These "list features" would be invariant to Caesar cipher. So then the LLM could pass these list features to a learned Caesar cipher decoder unit to spit out the decoded text.

It's still unexplained, however, why the Byzantine Music Notation would trigger this circuit while other Caesar ciphers wouldn't.


You're right, evidently my programming wasn't quite up to the task here. Good to see, though, that there's demand, so I'll try to make a better version implementing something like what you're describing.


I was wondering where the traffic came from, thanks for mentioning it!


We're doing a successor to this, working hard and hopefully going public in a month or so. But HN gets a preview of course: https://huggingface.co/datasets/lennart-finke/SimpleStories

And here's a more interactive explorer: https://fi-le.net/simplestories
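
If you want to poke at it, here is a minimal loading sketch assuming the standard datasets API (split and column names may differ from the actual repo):

    from datasets import load_dataset

    ds = load_dataset("lennart-finke/SimpleStories")
    print(ds)              # available splits and columns
    print(ds["train"][0])  # first example, assuming a "train" split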


This looks like a great dataset! Thanks for posting. I'm looking for projects just like this to try my training modifications against. Do you have any initial results posted? It's a small model/dataset, so training the GPT-2 model in the repo probably wouldn't be too hard, but it would be good to have reference runs to make sure things are set up right when I run it.


So glad you like it! If I understand your question correctly, yes, we are also putting together a small library for training small language models. It's not mature at all yet, but you can keep up with our progress here: https://github.com/danbraunai/simple_stories_train


Yeah. I looked at the dataset and there are a lot of possible tasks you could train against here, since it has some great annotations. So having a simple reference baseline, like a GPT-2 pretraining run (which I think your repo is set up to do), helps give a starting point for other work. It looks like the dataset is small enough, and the GPT-2 reference code in your repo lightweight enough, to do a quick run and plot some curves. Thanks!
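
For what it's worth, here is roughly the kind of minimal baseline I have in mind, as a sketch with Hugging Face Transformers rather than your repo's actual setup (the "story" column name and the hyperparameters are assumptions):

    from datasets import load_dataset
    from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                              GPT2Config, GPT2LMHeadModel, Trainer, TrainingArguments)

    ds = load_dataset("lennart-finke/SimpleStories")
    tok = AutoTokenizer.from_pretrained("gpt2")
    tok.pad_token = tok.eos_token  # GPT-2 has no pad token by default

    def tokenize(batch):
        return tok(batch["story"], truncation=True, max_length=512)

    train = ds["train"].map(tokenize, batched=True,
                            remove_columns=ds["train"].column_names)

    # A tiny GPT-2-style model so a reference run stays cheap.
    config = GPT2Config(vocab_size=tok.vocab_size, n_layer=4, n_head=4, n_embd=256)
    model = GPT2LMHeadModel(config)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gpt2-simplestories",
                               per_device_train_batch_size=16,
                               num_train_epochs=1, logging_steps=100),
        train_dataset=train,
        data_collator=DataCollatorForLanguageModeling(tokenizer=tok, mlm=False),
    )
    trainer.train()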


Does template_plural actually work well / offer any benefits?


It does; we use it as the default. Some possible benefits are that 1) it saves input tokens, and 2) it in theory allows for different variations on a theme, whereas with two separate prompts you run the risk of repeating one topic.
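
To illustrate the idea (hypothetical template names, not the actual ones in the codebase): one plural prompt sends the shared instructions once and asks for several distinct stories, instead of repeating a singular prompt N times.

    # Hypothetical templates, only to show the shape of the idea.
    TEMPLATE_SINGULAR = "Write a short story for children about {theme}."
    TEMPLATE_PLURAL = ("Write {n} short stories for children about {theme}. "
                       "Make each story take a clearly different angle on the theme.")

    separate_prompts = [TEMPLATE_SINGULAR.format(theme="sharing") for _ in range(5)]
    one_prompt = TEMPLATE_PLURAL.format(n=5, theme="sharing")
    # The plural prompt pays for the instructions once, and asking for the
    # stories together nudges the model away from repeating the same plot.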


You awe absowutely cowwect, a cwassic wookie mistake on my pawt! And thanks ~~~ヾ(^∇^)

