
The true final level is trying to download the level 48 certificate, getting a 403, and opening it to see:

neal.fun

Verify you are human by completing the action below.

Verify you are human

neal.fun needs to review the security of your connection before proceeding.

alongside the Cloudflare logo.


Kudos on your release! I know this was just made available, but:

- Somewhere in the README, consider adding the need for a `-DWEIGHT_TYPE=hwy::bfloat16_t` flag for non-sfp weights. Maybe around step 3.

- The README should explicitly say somewhere that there's no GPU support (at the moment).

- "Failed to read cache gating_ein_0 (error 294)" is pretty obscure. I think even "(error at line number 294)" would be a big improvement when it fails to FindKey.

- There's something odd about the 2b vs 7b model. The 2b will claim it's trained by Google but the 7b won't. Were these trained on the same data?

- Are the .sbs weights the same weights as the GGUF? I'm getting different answers compared to llama.cpp. Do you know of a good way to compare the two? Any way to make both deterministic? Or even dump probability distributions on the first (or any) token to compare?
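
One low-tech way to do that last comparison, assuming each runtime can be made to dump the raw logits for the first generated token (the file names below are hypothetical, and greedy / temperature-0 decoding keeps sampling noise out of the picture):

    import numpy as np

    def softmax(logits):
        z = logits - logits.max()
        e = np.exp(z)
        return e / e.sum()

    # Hypothetical dumps: one float per vocab entry for the first generated
    # token, same prompt and tokenizer for both runtimes.
    logits_a = np.load("llamacpp_first_token_logits.npy")
    logits_b = np.load("gemmacpp_first_token_logits.npy")

    p_a, p_b = softmax(logits_a), softmax(logits_b)

    print("argmax token ids:", p_a.argmax(), p_b.argmax())
    print("max abs prob diff:", np.abs(p_a - p_b).max())
    print("top-5 A:", np.argsort(-p_a)[:5])
    print("top-5 B:", np.argsort(-p_b)[:5])

If the argmax tokens agree and the probability differences are tiny, the weights are most likely equivalent and the divergence comes from sampling or quantization.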


Yes - thanks for pointing that out. The README is being updated; you can see a WIP version in the dev branch: https://github.com/google/gemma.cpp/tree/dev?tab=readme-ov-f... and improving error messages is a high priority.

The weights should be the same across formats, but it's easy for differences to arise due to quantization and/or subtle implementation differences. Minor implementation differences have been a pain point in the ML ecosystem for a while (w/ IRs, onnx, python vs. runtime, etc.), but hopefully the differences aren't too significant (if they are, it's a bug in one of the implementations).

There were quantization fixes like https://twitter.com/ggerganov/status/1760418864418934922 and other patches happening, but it may take a few days for patches to work their way through the ecosystem.


Thanks, I'm glad to see your time machine caught my comment.

I'm using the 32-bit GGUF model from the Google repo, not a different quantized model, so I should have one less source of error. It's hard to tell with LLMs if it's a bug. It just gives slightly stranger answers sometimes, but it's not complete gibberish, incoherent sentences, or extra punctuation like some other LLM bugs I've seen.

Still, I'll wait a few days to build llama.cpp again to see if there are any changes.


Do you have a version that doesn't need Windows and/or a Microsoft account? Or an uncut video of someone using it?


Are there project-based tutorials that talk more about neural net architecture, hyperparameter selection, and debugging? Something that walks through getting poor results and makes explicit the reasoning for tweaking?

When I try to use transformers or any AI thing on a toy problem I come up with, it never works. Even FizzBuzz, which I thought was easy, doesn't work (because division or modulo is apparently hard for NNs to represent). And there's this black box of training that's hard to debug into. Yes, for the available resources, if you pick the exact same problem, the exact same NN architecture, and the exact same hyperparameters, it all works out. But surely they didn't get that on the first try. So what's the tweaking process?

Somehow this point isn't often talked about in courses, and consequently the ones who've passed this hurdle don't get their experience transferred. I'd follow an entire course on this if it were available. An HN commenter linked me to this:

https://karpathy.github.io/2019/04/25/recipe/

which is exactly on point. But it'd be great if it were one or more tutorials with a specific example, wrapped in code and peppered with many failures.


There’s an interactive neural network you can train here, which can give some intuition on wider vs larger networks:

https://mlu-explain.github.io/neural-networks/

See also here:

http://playground.tensorflow.org/


There's no great answer to this question. It is a bunch of tricks. Fundamentally:

If you're saying FizzBuzz doesn't work, presumably you mean that encoding n directly doesn't work. Neither does scaling n to the range 0 to 1 or -1 to 1 (and don't forget: obviously don't use ReLU with inputs in the -1 to 1 range). It just doesn't.

Neural networks can do a LOT of things, but they cannot deal with numbers. And they certainly cannot deal with natural or real numbers. BUT they can deal with certain encodings.

Instead of using the number directly, give one input to the neural network per bit of the number. That will work. Just pass in the last 10 bits of the number.

Or cheat and use transformers. Pass in the last 5 generations and have it construct the next FizzBuzz line. That will work. Because it's possible.

To make the number-based neural network for FizzBuzz "perfect", think about it: the network needs to be able to divide by 3 and 5. It can't, and you can't fix that directly. You must make it possible for the neural network to learn the algorithm for dividing by 3 and 5 ... 2, 3 and 5 are relatively prime (and actual primes). So "cheat" and pass in numbers in base 15 (by one-hot encoding the number mod 15, for example).
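
A rough sketch of what those two encodings look like, in plain numpy (the helper names, ranges, and model suggestions are just for illustration, not from any particular library):

    import numpy as np

    def encode_bits(n, num_bits=10):
        # One input per binary digit of n (the "last 10 bits" idea above).
        return np.array([(n >> i) & 1 for i in range(num_bits)], dtype=np.float32)

    def encode_mod15(n):
        # One-hot of n mod 15: divisibility by 3 and 5 becomes a table lookup
        # the network can learn, instead of arithmetic it cannot represent.
        v = np.zeros(15, dtype=np.float32)
        v[n % 15] = 1.0
        return v

    def fizzbuzz_label(n):
        # 0: print n, 1: "Fizz", 2: "Buzz", 3: "FizzBuzz"
        return (n % 3 == 0) + 2 * (n % 5 == 0)

    X = np.stack([encode_mod15(n) for n in range(1, 1001)])
    y = np.array([fizzbuzz_label(n) for n in range(1, 1001)])
    # X and y can now be fed to any small classifier (a 2-layer MLP is plenty);
    # with the mod-15 encoding even a linear softmax model can get it exactly.

The point is that the heavy lifting happens in the input representation, not in the network.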

PM me if you'd like to debug whatever network you have together over Zoom or Google Meet or whatever.

https://en.wikipedia.org/wiki/One-hot

This may be catastrophically wrong. I only have a master's in machine learning (a European master's degree, meaning I've written several theses on it; I didn't pass the first time, since I had to work full time to be able to study), and I was writing captcha crackers using ConvNets in 2002. But I've never been able to convince anyone to hire me to do anything machine learning related.


Thanks for answering; what you wrote here is exactly the sort of thing I'm talking about: something implicit that's known but not obvious if you look at the first few lectures of the first few courses (or blogs or announcements, etc.).

You mention a bag of tricks, and that's indeed one issue, but it's worse than that, because it includes knowing which "silent problems" need a trick applied to them in the first place!

Indeed, despite using vectors everywhere, NNs are bad with numerical inputs encoded as themselves! It's almost like the only kind of variables you can have are fixed-size enums, which you then encode into vectors that are as far apart as possible; unit vectors ("one-hot vectors") do this. But that's not quite it either: sometimes you can still have some meaningful metric on the input that's preserved in the encoding (example: word embeddings). And so it's again unclear what you can give it and what you can't.

In this toy example, I have an idea of what the shape of the solution is. But generally I do not and would not know to use a base 15 encoding or to send it the last 5 (or 15) outputs as inputs. I know you already sort of addressed this point in your last few paragraphs.

I'm still trying out toy problems at the moment, so it might be a "waste" of your time to troubleshoot these, but I'm happy to take you up on the offer. HN doesn't have PMs, though.

Do you remember when you first learned about the things you're using in your reply here? Was it in a course, or just from asking someone else who had worked on NNs for longer? I learned by googling and finding comment threads like these! But they are not easy to collect or find together.


(I've added an email to my profile. I hope you can see it. Feel free to flick me an email or google chat me)


> This may be catastrophically wrong. I only have a master's in machine learning (a European master's degree, meaning I've written several theses on it; I didn't pass the first time, since I had to work full time to be able to study), and I was writing captcha crackers using ConvNets in 2002. But I've never been able to convince anyone to hire me to do anything machine learning related.

Oh wow, those are great credentials. I'm surprised that you haven't run across a position yet. Maybe it is a matter of your location? It seems like a lot of these jobs want onsite workers, which can be a real problem.

TBH, I get the feeling that a lot of us without such credentials are in a similar position right now. Slowly trying to work our way towards what seems to be a big new green field, but having a really unclear path to getting there...


Yes. I created a course which uses implementing Stable Diffusion from scratch as the project, and goes through lots of architecture choices, hyperparam selection, and debugging. (But note that this isn't something that's fast or easy to learn - it'll take around a month of full-time intensive study.) https://course.fast.ai/Lessons/part2.html


Thanks for making that course. It was on my list of courses to look at since GPT-4 recommended it (with all the caveats that entails :) ). Thanks for also making notebooks available alongside the videos.

However, can you point me to the lectures where the training happens (and where the architecture choices, hyperparameter selection, and debugging happen)? I'm less familiar with SD, but at a quick glance it seems like we're using a pretrained model and implementing bits that will eventually be useful for training, but not training a new model, at least in the beginning of the deep dive notebook and the first few lessons (starting at part 2, lesson 9).


Maybe this is more of a general ML question, but I faced it when transformers became popular. Do you know of a project-based tutorial that talks more about neural net architecture, hyperparameter selection, and debugging? Something that walks through getting poor results and makes explicit the reasoning for tweaking?

When I try to use transformers or any AI thing on a toy problem I come up with, it never works. And there's this black box of training that's hard to debug into. Yes, for the available resources, if you pick the exact same problem, the exact same NN architecture, and the exact same hyperparameters, it all works out. But surely they didn't get that on the first try. So what's the tweaking process?


There is A. Karpathy's recipe for training NNs but it is not a walkthrough with an example:

https://karpathy.github.io/2019/04/25/recipe/

but the general idea of "get something that can overfit first" is probably pretty good.

In my experience, getting the data right is probably the most underappreciated thing. Karpathy has data as step one, but data representation and sampling strategy also work wonders.

In Part II of our book we do an end-to-end project, including, for example, a moment where nothing works until we crop around "regions of interest" to balance the per-pixel classes in the training data for the UNet. This is something I have pasted into the PyTorch forums every now and then, too.
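
Not the actual code from the book, just a sketch of the cropping idea with made-up shapes and names: take fixed-size crops centered (with a little jitter) on known regions of interest, so each training sample has a sane fraction of positive pixels instead of being almost entirely background.

    import numpy as np

    def crop_around_roi(image, mask, center, size=64, jitter=8, rng=np.random):
        # image and mask are 2D arrays of the same shape; center is the
        # (row, col) of a known region of interest.
        r, c = center
        r += rng.randint(-jitter, jitter + 1)
        c += rng.randint(-jitter, jitter + 1)
        half = size // 2
        r = int(np.clip(r, half, image.shape[0] - half))
        c = int(np.clip(c, half, image.shape[1] - half))
        window = (slice(r - half, r + half), slice(c - half, c + half))
        return image[window], mask[window]

Batches built from such crops have a far better balance of foreground vs. background pixels than full slices, which is the kind of change that turns "nothing trains" into "it works".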


Thanks for linking me to that post! It's much better at expressing what I'm trying to say. I'll have a careful read of it now.

I think I'm still at a step before overfitting. It doesn't converge to a solution on its training data (fit or overfit). And all my data is artificially generated, so no cleaning is needed (though choosing a representation still matters). I don't know if that's what you mean by getting the data right, or something else. Example problems that "don't work": FizzBuzz, reversing all the characters in a sentence.


I also had trouble with Lemmy's UI and made a different frontend for myself. Here are some screenshots.

1. https://postimg.cc/PPRMGw7k

2. https://postimg.cc/mcNMrzmk

3. https://postimg.cc/7CVG4vLT

I was thinking of making it more widely available, but didn't know if there'd be enough users to make it worthwhile, or whether interest in Lemmy would last.

