Yeah, I bought a used Mac Studio (an M1 Max, to be fair, but things haven't changed much since) hoping to be able to run a decent LLM on it, and was sorely disappointed, especially by the prompt processing speed.
No offense to you personally, but I find it very funny when people hear marketing copy for a product and think it can do anything they said it can.
Apple silicon is still just a single consumer grade chip. It might be able to run certain end user software well, but it cannot replace a server rack of GPUs.
I don’t think this is a fair take in this particular situation. My comment is in response to Simon Willison, who has a very popular blog in the LLM space. This isn’t company marketing copy; it’s trusted third parties spreading this misleading information.
I'm still ambivalent about the rest of the AI features, but the AI translation is absolutely amazing. The translation quality isn't perfect, but being able to seamlessly translate 20+ languages 100% locally is remarkable.
That translation app is so cool, exactly what I've always been looking for (offline + camera integration + clean UI). Thanks for putting in the work and for putting it on F-Droid even!
Agreed! For those of us switching between languages all the time, with some of those languages being less familiar to us, it's a great tool!
My only wish is that I could force it to always let me try translating something, even when it doesn't identify the text as a specific language. Sometimes what I want to translate is 30% one language and 70% another, and I still want it translated into a third, but since the tool doesn't see the text as "foreign enough" or something, I don't even get the option.
Besides that, it's a wonderful tool despite not being perfect. Hopefully it'll only get better over time as they get more data. On that note, I'd be more than happy to contribute if they added some way of giving "good translation / bad translation" feedback, but I haven't seen one. I guess I had two wishes in the end.
If you select a chunk of text in the page and right-click, there should be a context-menu option to translate the text. It's a popup with a textarea and not in-situ, but it's the same local model as far as I can tell.
> By contrast, ex situ methods involve the removal or displacement of materials, specimens, or processes for study, preservation, or modification in a controlled setting, often at the cost of contextual integrity.
Might as well use the correct words if you want to talk above people's heads.
First: No need to be rude. "In situ" is a very commonly used phrase among English speakers, as should be evident from the Wikipedia article [1] you yourself cited.
Second: The normal Firefox translate feature replaces the text in the page with the translated text - retaining its styling, position, context w/ images, etc. The right-click menu does not. I described the right-click menu as "not in situ", which is correct.
I agree, I'm generally sceptical of new AI "features" in the browser and will be turning most of them off. But the translation feature (which has been in Firefox for a while now) is great. The difference is that translation in a browser is something that is clearly useful and has always been AI-based to an extent, so shipping with a local model for translation is a strict improvement (leaving aside any difference in translation quality, which I have not noticed). The other AI features are not obviously useful IMO.
Aw thanks! We don't currently, but from a cost perspective as a user it shouldn't matter much since it's all bundled into the same subscription (we rate-limit by requests, not by tokens — our request rate limits are set to "higher than the amount of messages per hour that Claude Code promises", haha). We might at some point just to save GPUs though!
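(For the curious, request-based limiting is basically just a counter per user per sliding window; the sketch below is illustrative only, with made-up numbers, not our actual code:)

    import time
    from collections import defaultdict, deque

    WINDOW_SECS, MAX_REQUESTS = 3600, 500   # made-up numbers
    hits = defaultdict(deque)

    def allow(user_id):
        now = time.time()
        q = hits[user_id]
        while q and q[0] < now - WINDOW_SECS:   # evict requests outside the window
            q.popleft()
        if len(q) >= MAX_REQUESTS:
            return False                        # limited regardless of token count
        q.append(now)
        return True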
Yeah, I wasn't worried so much about the cost to me as about the sustainability of your own prices; don't want to run into a "we're lowering quotas" situation like CC did :P
Lol fair! I think we're safe for now; our most popular model (and my personal favorite coding model) is GLM-4.5, which fits on a ~relatively small node compared to the rumored sizes of Anthropic's models. We can throw a lot of tokens at it before running into issues — it's kind of nice to launch without prompt caching, since it means if we're flying too close to the sun on tokens we still have some pretty large levers left to pull on the infra side before needing to do anything drastic with rate limits.
> I think we're safe for now; our most popular model (and my personal favorite coding model) is GLM-4.5,
That's funny, that's my favorite coding model as well!
> the rumored sizes of Anthropic's models
Yeah. I've long had a hypothesis that their models are, like, average-sized for a SOTA model but fully dense, like the old Llama 3.1 405B, and that's why their per-token inference costs are insane compared to the competition.
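Back-of-the-envelope, using the rough "2 FLOPs per active parameter" rule of thumb (the MoE figure is GLM-4.5's reported ~32B active parameters; the Anthropic side is pure speculation on my part):

    # Dense: every parameter runs on every token. MoE: only routed experts run.
    dense_params = 405e9   # Llama 3.1 405B
    moe_active   = 32e9    # GLM-4.5's reported active parameter count

    print(f"dense: {dense_params * 2 / 1e12:.2f} TFLOPs/token")   # ~0.81
    print(f"moe:   {moe_active * 2 / 1e12:.2f} TFLOPs/token")     # ~0.06, ~13x cheaper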
> it's kind of nice to launch without prompt caching, since it means if we're flying too close to the sun on tokens we still have some pretty large levers left to pull on the infra side before needing to do anything drastic with rate limits.
That makes sense.
I'm poor as dirt, and my job actually forbids AI code in the main codebase, so I can't justify even a $20 per month subscription right now (especially when, for experimenting with agentic coding, Qwen Code is currently free, if shitty), but when or if it becomes financially responsible, you will be at the very top of my list.
> Unless I'm missing something, you'll have a bunch of links laying around?
This is true, but it's better than files in that it's a single tap and everything is instantly merged into your existing local storage, instead of you having multiple files downloaded locally; and the next time you export a link, it will be that merged version, with the other person's comments plus your new ones. So there's a single linear stream of links where the latest link is the correct version for both people at all times, and there's only one copy in each person's local storage, instead of file V1, V2, V3, etc. Idk if that makes sense.
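Conceptually it's something like this (purely illustrative; the field names are hypothetical, not the app's actual format):

    def merge_into_local(local, incoming):
        # Union two comment stores keyed by id; the newer edit wins.
        for cid, comment in incoming.items():
            if cid not in local or comment["edited_at"] > local[cid]["edited_at"]:
                local[cid] = comment
        return local   # the next exported link snapshots this merged state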
> Merge?
Yeah basically. It rolls everything up each time.
> Edit: oh, there's a resolve thing. So presumably you'd get links from other people and resolve them? Is that tracked anywhere?
I initially decided not to have resolution be tracked, but if you think it would help with the sharing process, then I could totally do that!
This is actually a really good point. It isn't that GenAI is useless in business, just that businesses don't know how to build, deploy, and use it usefully at an organizational level. Kind of an example of Karpathy's "power to the people" thesis: individual productivity benefits more from GenAI than corporate productivity does.
Mixture of Experts. Llama 3.1 405B is a dense model, so to evaluate the next token, the context has to go through literally every parameter in its neural network. With a mixture of experts, it's usually a sixth to a tenth (or even less) of the parameters that actually get evaluated for each token. Also, they don't use GPT-4 or 4.5 anymore IIRC, which may have been dense (and that's why they were so expensive); 4.1 and 4o are much different models.
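A toy sketch of the routing, if it helps (made-up sizes, not any real model's config):

    import torch
    import torch.nn.functional as F

    # Toy MoE layer: 8 experts, only the top 2 run per token.
    num_experts, top_k, d_model = 8, 2, 64

    experts = [torch.nn.Linear(d_model, d_model) for _ in range(num_experts)]
    router = torch.nn.Linear(d_model, num_experts)

    def moe_forward(x):                        # x: (num_tokens, d_model)
        gate = F.softmax(router(x), dim=-1)    # routing scores per token
        weights, picked = gate.topk(top_k, dim=-1)
        out = torch.zeros_like(x)
        for i in range(x.shape[0]):
            # Only 2 of the 8 expert MLPs run for this token, so ~1/4 of
            # the layer's parameters do any work; a dense layer runs all
            # of them for every single token.
            for w, e in zip(weights[i], picked[i]):
                out[i] += w * experts[int(e)](x[i])
        return out

    print(moe_forward(torch.randn(4, d_model)).shape)   # torch.Size([4, 64])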
You're grouping those words wrong. As another commenter pointed out to you, which you ignored, it's median (Gemini Apps), not (median Gemini) Apps. "Gemini Apps" is a highly specific term, with a legal definition even IIRC, that does not include Search and encompasses a list of models you can actually see and know.
I didn't ignore it, I actually spent some time researching to find out what Google means by "Gemini Apps" (plural) and whether it includes search AI overview, and I can't get a clear answer anywhere.
Of course, Gemini App (singular) means the mobile app. But it seems that the term Gemini Apps (plural) is used by Google to refer to any way users can access the Gemini models, and they do also clearly state that a version of Gemini is used to generate the search overviews.
So it still seems reasonably likely, until they confirm otherwise, that this median includes search overview.
No, because unless they state otherwise we should assume that they consider search overview to be an AI assistant (they definitely believe this) and also that it's one of the Gemini Apps.
Look, there's not enough information to answer this within the paper. I'm not willing to give Google the benefit of the doubt on vague language, and you are. I'm assuming they're a huge, basically evil corporation whose every publication is gone over and reworded by marketing to make them look good, and you're assuming... whatever.