If the recent revelations in the Epic vs Google court case are anything to go by, Motorola is likely getting paid by Google for every single Google search and Google Play Store transaction that occurs on that phone. It could even be sold at a loss. I don't think pointing at a low-cost smartphone is a very useful point of reference given that context.
However, I think there are plenty of Linux SBCs (single board computers) that have 4GB of RAM for around $50, just no screen, GPS, cellular modem, cameras, speakers... all sorts of things that add cost to a cell phone. $10 is a far-fetched claim, in my opinion, and citations are needed. The Pi Zero 2W is $15 and only has 512MB of RAM. So, sure, let's go with $50.
Have you considered how cheap printers are? I see multiple inkjet printers on Amazon that cost $59. Adding $50 would nearly double the price of the unit. Other manufacturers would eat their lunch, so you can see why no one is rushing to offer a $59 printer with an additional $50 worth of computer built in. Even if it were "only" $25 extra, that is still significant.
At the higher end, printers do start to include more of everything, but those aren't the printers the average consumer is buying.
Two components do not make a product. The SBC market is a better litmus test for the real costs. There is plenty of competition making products of all kinds.
Every 7950X offers a 65W mode. It’s not a separate SKU.
It’s a choice each user can make if they care more about efficiency. Tasks take longer to complete, but the total energy consumed to complete the task is dramatically less.
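Back-of-the-envelope, with illustrative numbers rather than benchmarks (the 65W TDP setting corresponds to roughly an 88W package power limit on AM5; the 1.3x slowdown here is an assumption, not a measurement):

    energy = power x time
    stock (230 W PPT):  230 W x 1.0 h ≈ 230 Wh
    65 W mode (~88 W):   88 W x 1.3 h ≈ 115 Wh

Even granting a noticeable slowdown, the task finishes on roughly half the energy.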
Of course most users don’t pick the 65W option. They want maximum performance, and the cost of electricity is largely negligible to most people buying a 7950X.
AMD isn’t going to offer a huge discount for a 65W 7950X for the reasons discussed elsewhere: they don’t need to.
> and the cost of electricity is largely negligible to most people buying a 7950X.
“I can afford it” is still wasteful. Most people wouldn’t even notice the speed difference in 65W mode, even if they can afford space-heater mode.
It also includes a link to the TTP form, although, confusingly, the form itself no longer seems to reference Imagen as part of the program. (It instead indicates that Imagen is GA.)
> 96GB of weights. You won't be able to run this on your home GPU.
This seems like a non-sequitur. Doesn't MoE select an expert for each token? Presumably, the same expert would frequently be selected for a number of tokens in a row. At that point, you're only running a 7B model, which will easily fit on a GPU. It will be slower when "swapping" experts if you can't fit them all into VRAM at the same time, but it shouldn't be catastrophic for performance in the way that being unable to fit all layers of an LLM is. It's also easy to imagine caching the N most recent experts in VRAM, where N is the largest number that still fits into your VRAM.
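As a rough sketch of that caching idea (everything here is hypothetical, in PyTorch-flavored Python; `expert.to(device)` stands in for whatever weight movement a real runtime would actually do):

    from collections import OrderedDict

    class ExpertCache:
        # Keep the N most recently used experts in VRAM, spilling the
        # least recently used one back to system RAM when full.
        def __init__(self, experts, n_resident, device="cuda"):
            self.experts = experts          # list of expert modules
            self.n_resident = n_resident    # how many fit in VRAM
            self.device = device
            self.resident = OrderedDict()   # id -> module, in LRU order

        def get(self, i):
            if i in self.resident:
                self.resident.move_to_end(i)           # refresh LRU slot
            else:
                if len(self.resident) >= self.n_resident:
                    _, evicted = self.resident.popitem(last=False)
                    evicted.to("cpu")                  # spill to RAM
                self.resident[i] = self.experts[i].to(self.device)
            return self.resident[i]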
Someone smarter will probably correct me, but I don’t think that’s how MoE works. With MoE, a small feed-forward gating network scores each token and selects the best two of the eight experts to generate the next token, and the choice of experts can change with every new token. For example, say you have two experts that are really good at answering physics questions: for part of the generation, those two will be selected, but later on the context might call for two experts better suited to generating French. This is a silly simplification of what I understand to be going on.
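In PyTorch-ish pseudocode, the routing I mean looks roughly like this (toy sizes, per-token loop for clarity; not the real Mixtral code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    d_model, n_experts = 512, 8
    gate = nn.Linear(d_model, n_experts, bias=False)   # gating network
    experts = nn.ModuleList(
        nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.SiLU(),
                      nn.Linear(4 * d_model, d_model))
        for _ in range(n_experts))

    def moe_layer(x):                           # x: (tokens, d_model)
        top_w, top_i = gate(x).topk(2, dim=-1)  # best 2 of 8, per token
        top_w = F.softmax(top_w, dim=-1)        # renormalize the winners
        out = torch.zeros_like(x)
        for t in range(x.shape[0]):             # experts differ per token
            for w, i in zip(top_w[t], top_i[t]):
                out[t] += w * experts[int(i)](x[t])
        return out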
One viable strategy might be to offload as many experts as possible to the GPU and evaluate the others on the CPU. If you collect statistics on which experts are used most in your workloads and pin those on the GPU, you might get some cheap but notable speedups over other approaches.
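As a toy version of that bookkeeping (all names made up; assumes you've recorded which expert ids the router picked over a sample workload):

    import torch.nn as nn
    from collections import Counter

    experts = nn.ModuleList(nn.Linear(8, 8) for _ in range(8))   # toy
    routing_trace = [(0, 3), (0, 5), (3, 5), (0, 3)]  # top-2 ids per token

    usage = Counter()
    for ids in routing_trace:
        usage.update(ids)

    gpu_budget = 4                      # however many experts fit in VRAM
    hot = {i for i, _ in usage.most_common(gpu_budget)}
    for i, expert in enumerate(experts):
        expert.to("cuda" if i in hot else "cpu")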
That being said, presumably if you’re running a huge farm of GPUs, you could put each expert onto its own slice of GPUs and orchestrate the data flow between them as needed. I have no idea how you’d do this…
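From skimming papers, systems like GShard and DeepSpeed-MoE apparently do roughly this with an all_to_all exchange across ranks. A very hand-wavy sketch, assuming one expert per rank, an initialized process group, and a fixed per-expert token capacity so every bucket has the same shape:

    import torch
    import torch.distributed as dist

    def exchange_tokens(buckets):
        # buckets[r]: (capacity, d_model) tokens this rank routed to rank r
        received = [torch.empty_like(b) for b in buckets]
        dist.all_to_all(received, buckets)   # swap buckets across ranks
        # each rank now runs its local expert on every received bucket,
        # then a second all_to_all sends the results back the same way
        return received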
Yes, that's more or less it - there's no guarantee that the chosen expert will still be used for the next token, so you'll need to have all of them on hand at any given moment.
Yes, I read that. Do you think it's reasonable to assume that the same expert will be selected so consistently that model-swapping time won't dominate total runtime?
Just mentioning in case it helps anyone out: Linux already has a disk buffer cache. If you have available RAM, it will hold on to pages that have been read from disk until there is enough memory pressure to evict them (and even then it only evicts some of them, not all). If you don't have available RAM, then the tmpfs wouldn't work either. A tmpfs is helpful if you know better than the paging subsystem and really want this data to stay in RAM no matter what, but it is also much less flexible, because sometimes you need to burst in RAM usage.
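You can see the page cache at work with something like this (the path is hypothetical, and a true cold read requires dropping caches first, e.g. as root: echo 3 > /proc/sys/vm/drop_caches):

    import time

    def timed_read(path, chunk=1 << 20):
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(chunk):        # stream the file in 1 MiB chunks
                pass
        return time.perf_counter() - start

    path = "/var/tmp/big.bin"                # hypothetical large file
    print("cold read:", timed_read(path))    # hits the disk
    print("warm read:", timed_read(path))    # served from the page cache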
You must enable Bard history for these features to work, and you must go to the extensions page and make sure they’re turned on. Arguing with a model that doesn’t have access to the extensions won’t make it suddenly use the extensions.
Summarizing my last five emails worked just fine with Bard using that query.
Ok, that makes more sense, but if people haven't migrated away by now, it seems increasingly likely that they won't migrate in time to meet that deadline.
Yes. Finetuning a Whisper model on an RPi 5 is ~2x faster than on the RPi 4. Other stages involving data pre-processing with HF datasets are again 2x-3x faster.
Founder of Replicate here. We open pull requests on models[0] to get them running on Replicate, so people can try a demo of a model and run it with an API. They're also packaged with Cog[1] so you can run them as a Docker image.
Somebody happened to stumble across our fork of the model and submitted it. We didn't submit it, nor did we intend for it to be an ad. I hope the submission gets replaced with the upstream repo so the author gets full credit. :)
I'm curious: how did you know about this thread? I've seen this happen where a blog or site is mentioned and the author shows up. Is there software to monitor when you're mentioned on HN, or did you just happen to browse it?
You might find https://syften.com/ interesting. I use it for monitoring Reddit and all kinds of communities for mentions of my name and the titles of my books.