If the recent revelations in the Epic vs Google court case are anything to go by, Motorola is likely getting paid by Google for every single Google search and Google Play Store transaction that occurs on that phone. It could even be sold at a loss. I don't think pointing at a low-cost smartphone is a very useful point of reference given that context.
However, I think there are plenty of Linux SBCs (single board computers) that have 4GB of RAM for around $50, just no screen, GPS, cellular modem, cameras, speakers... all sorts of things that add cost to a cell phone. $10 is a far-fetched claim, in my opinion, and citations are needed. The Pi Zero 2W is $15 and only has 512MB of RAM. So, sure, let's go with $50.
Have you considered how cheap printers are? I see multiple inkjet printers on Amazon that cost $59. Adding $50 would nearly double the price of the unit. Other manufacturers would eat their lunch, so you can see why no one is rushing to offer a $59 printer with an additional $50 worth of computer built in. Even if it were "only" $25 extra, that is still significant.
At the higher end, printers do start to include more of everything, but those aren't the printers the average consumer is buying.
Two components do not make a product. The SBC market is a better litmus test for the real costs. There is plenty of competition making products of all kinds.
Every 7950X offers a 65W mode. It’s not a separate SKU.
It’s a choice each user can make if they care more about efficiency. Tasks take longer to complete, but the total energy consumed to complete the task is dramatically less.
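Back-of-the-envelope, with illustrative numbers rather than benchmarks (the 65W TDP setting corresponds to roughly an 88W package power limit on AM5; the 1.3x slowdown here is an assumption, not a measurement):

    energy = power x time
    stock (230 W PPT):  230 W x 1.0 h ≈ 230 Wh
    65 W mode (~88 W):   88 W x 1.3 h ≈ 115 Wh

Even granting a noticeable slowdown, the task finishes on roughly half the energy.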
Of course most users don’t pick the 65W option. They want maximum performance, and the cost of electricity is largely negligible to most people buying a 7950X.
AMD isn’t going to offer a huge discount for a 65W 7950X for the reasons discussed elsewhere: they don’t need to.
> and the cost of electricity is largely negligible to most people buying a 7950X.
“I can afford it” is still wasteful. Most people wouldn’t even notice the speed difference in 65W mode, even if they can afford space-heater mode.
It also includes a link to the TTP form, although, confusingly, the form itself no longer seems to reference Imagen as part of the program. (It instead indicates that Imagen is GA.)
> 96GB of weights. You won't be able to run this on your home GPU.
This seems like a non-sequitur. Doesn't MoE select an expert for each token? Presumably, the same expert would frequently be selected for a number of tokens in a row. At that point, you're only running a 7B model, which will easily fit on a GPU. It will be slower when "swapping" experts if you can't fit them all into VRAM at the same time, but it shouldn't be catastrophic for performance in the way that being unable to fit all layers of an LLM is. It's also easy to imagine caching the N most recent experts in VRAM, where N is the largest number that still fits into your VRAM.
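As a rough sketch of that caching idea (everything here is hypothetical, in PyTorch-flavored Python; `expert.to(device)` stands in for whatever weight movement a real runtime would actually do):

    from collections import OrderedDict

    class ExpertCache:
        # Keep the N most recently used experts in VRAM, spilling the
        # least recently used one back to system RAM when full.
        def __init__(self, experts, n_resident, device="cuda"):
            self.experts = experts          # list of expert modules
            self.n_resident = n_resident    # how many fit in VRAM
            self.device = device
            self.resident = OrderedDict()   # id -> module, in LRU order

        def get(self, i):
            if i in self.resident:
                self.resident.move_to_end(i)           # refresh LRU slot
            else:
                if len(self.resident) >= self.n_resident:
                    _, evicted = self.resident.popitem(last=False)
                    evicted.to("cpu")                  # spill to RAM
                self.resident[i] = self.experts[i].to(self.device)
            return self.resident[i]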
Someone smarter will probably correct me, but I don’t think that’s how MoE works. With MoE, a small feed-forward gating network scores each token and selects the best two of the eight experts to generate the next token, and the choice of experts can change with every new token. For example, say you have two experts that are really good at answering physics questions: for part of the generation, those two will be selected, but later on the context might call for two experts better suited to generating French. This is a silly simplification of what I understand to be going on.
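In PyTorch-ish pseudocode, the routing I mean looks roughly like this (toy sizes, per-token loop for clarity; not the real Mixtral code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    d_model, n_experts = 512, 8
    gate = nn.Linear(d_model, n_experts, bias=False)   # gating network
    experts = nn.ModuleList(
        nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.SiLU(),
                      nn.Linear(4 * d_model, d_model))
        for _ in range(n_experts))

    def moe_layer(x):                           # x: (tokens, d_model)
        top_w, top_i = gate(x).topk(2, dim=-1)  # best 2 of 8, per token
        top_w = F.softmax(top_w, dim=-1)        # renormalize the winners
        out = torch.zeros_like(x)
        for t in range(x.shape[0]):             # experts differ per token
            for w, i in zip(top_w[t], top_i[t]):
                out[t] += w * experts[int(i)](x[t])
        return out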
One viable strategy might be to offload as many experts as possible to the GPU and evaluate the others on the CPU. If you collect statistics on which experts are used most in your workloads and pin those on the GPU, you might get some cheap but notable speedups over other approaches.
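As a toy version of that bookkeeping (all names made up; assumes you've recorded which expert ids the router picked over a sample workload):

    import torch.nn as nn
    from collections import Counter

    experts = nn.ModuleList(nn.Linear(8, 8) for _ in range(8))   # toy
    routing_trace = [(0, 3), (0, 5), (3, 5), (0, 3)]  # top-2 ids per token

    usage = Counter()
    for ids in routing_trace:
        usage.update(ids)

    gpu_budget = 4                      # however many experts fit in VRAM
    hot = {i for i, _ in usage.most_common(gpu_budget)}
    for i, expert in enumerate(experts):
        expert.to("cuda" if i in hot else "cpu")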
That being said, presumably if you’re running a huge farm of GPUs, you could put each expert onto its own slice of GPUs and orchestrate the data flow between them as needed. I have no idea how you’d do this…
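From skimming papers, systems like GShard and DeepSpeed-MoE apparently do roughly this with an all_to_all exchange across ranks. A very hand-wavy sketch, assuming one expert per rank, an initialized process group, and a fixed per-expert token capacity so every bucket has the same shape:

    import torch
    import torch.distributed as dist

    def exchange_tokens(buckets):
        # buckets[r]: (capacity, d_model) tokens this rank routed to rank r
        received = [torch.empty_like(b) for b in buckets]
        dist.all_to_all(received, buckets)   # swap buckets across ranks
        # each rank now runs its local expert on every received bucket,
        # then a second all_to_all sends the results back the same way
        return received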
Yes, that's more or less it - there's no guarantee that the chosen expert will still be used for the next token, so you'll need to have all of them on hand at any given moment.
Yes, I read that. Do you think it's reasonable to assume that the same expert will be selected so consistently that model-swapping time won't dominate total runtime?
Just mentioning in case it helps anyone out: Linux already has a disk buffer cache. If you have available RAM, it will hold on to pages that have been read from disk until there is enough memory pressure to evict them (and even then it only evicts some of them, not all). If you don't have available RAM, then the tmpfs wouldn't work either. A tmpfs is helpful if you know better than the paging subsystem and really want this data to stay in RAM no matter what, but it is also much less flexible, because sometimes you need to burst in RAM usage.
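You can see the page cache at work with something like this (the path is hypothetical, and a true cold read requires dropping caches first, e.g. as root: echo 3 > /proc/sys/vm/drop_caches):

    import time

    def timed_read(path, chunk=1 << 20):
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(chunk):        # stream the file in 1 MiB chunks
                pass
        return time.perf_counter() - start

    path = "/var/tmp/big.bin"                # hypothetical large file
    print("cold read:", timed_read(path))    # hits the disk
    print("warm read:", timed_read(path))    # served from the page cache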
You must enable Bard history for these features to work, and you must go to the extensions page and make sure they’re turned on. Arguing with a model that doesn’t have access to the extensions won’t make it suddenly use the extensions.
Summarizing my last five emails worked just fine with Bard using that query.
Ok, that makes more sense, but if people haven't migrated away by now, it seems increasingly likely that they won't migrate in time to meet that deadline.
Yes. Finetuning a Whisper model on an RPi 5 is ~2x faster than on the RPi 4. Other stages involving data pre-processing with HF datasets are again 2x-3x faster.
Founder of Replicate here. We open pull requests on models[0] to get them running on Replicate, so people can try a demo of a model and run it with an API. They're also packaged with Cog[1] so you can run them as a Docker image.
Somebody happened to stumble across our fork of the model and submitted it. We didn't submit it, nor did we intend for it to be an ad. I hope the submission gets replaced with the upstream repo so the author gets full credit. :)
I'm curious: how did you know about this thread? I've seen this happen where a blog or site is mentioned and the author shows up. Is there software to monitor when you're mentioned on HN, or did you just happen to browse it?
You might find https://syften.com/ interesting. I use it for monitoring Reddit and all kinds of communities for mentions of my name and the titles of my books.