All signs are that they are doing exactly that. They already have an on-device LLM which powers certain features, and I expect they will have a better-trained version of that on-device model that comes out with the "new Siri" update.
In the original announcement of the Siri revamp a couple of years ago, they specifically talked about having the on-device model handle everything it can, and only using the cloud models for the harder or more open-ended questions.
I tried 3 different Macs with different versions of macOS before looking for a workaround, and the result was the same everywhere: old photos are not downloaded automatically from iCloud, and there is no button to start that process. That's exactly why I went looking for a workaround.
Want to prove me wrong? Create a new macOS user, sign in with your iCloud account, and open Photos. It will be empty until you start copying photos from your phone. It will take much less time than arguing here.
You're arguing with a lot of people who have personally seen this work. You can listen to other people. You can also go to an Apple Store and let them show you what's going wrong here.
Perhaps no one here has tried to download an entire iCloud library at once, or perhaps size is an issue, but that doesn't change the fact that there is no download button for iCloud Photos and iCloud Photos Downloader simply solves this. That's what this post is about.
I can personally confirm I've downloaded an entire iCloud library at once, to a brand new Mac, using the 'Download Originals to this Mac' option. As have many others here, I would think.
That's literally what that option is for.
If it's not working for you, you might be dealing with a bug, or perhaps you haven't given it enough time to sync. If you go to Photos > Library and scroll down, it should show you the sync status.
Thanks, that was a relief because I realised I didn't see the sync status at the bottom. It appears that Monterey hides the status message at the bottom by default, and I had to pull the page down twice to see it.
Long story short, iCloud wasn't syncing photos "due to performance" and this message was hidden.
No worries! I don't understand why Apple is so averse to surfacing the status of things, especially highly sensitive and finicky things like online sync. It would dramatically improve the feel of the software if it didn't seem like it just inexplicably wasn't working half the time.
iCloud Photos Downloader is an option, yes, but it is incorrect to say that Apple does not provide an official way to do this on Mac. Again, I direct you to the Apple Store so someone can show you in person, since you won't listen to anyone on here.
Photos on macOS does indeed synchronise photos with iCloud.
After our conversation I tried to understand why I don't see any status, and I found out that to get one in macOS Monterey I have to scroll to the bottom of the collection and then pull the whole page down a second time. A status message appears, and it said that syncing was disabled due to Mac performance (I didn't ask for this).
Apologies for misleading you, code543, and thank you for your persistence.
However, I must admit that I'm happy I found iCloud Photos Downloader as a result; I also like that it downloads all photos into a date-based folder structure.
Let me be one more voice telling you that you are wrong. I just did this morning.
In Settings, enable "Download Originals to this Mac", select all photos, then use File -> "Export Unmodified Originals". This will trigger the Photos app to download every file from iCloud into your local library (as well as exporting them to wherever you want).
I guess "there is no download button" but dude...I don't need iCloud Photos Downloader.
Thanks for letting me know. May I ask what macOS version you use?
Unfortunately, I'm unable to locate any button, status bar, or option to refresh or pull everything from iCloud in macOS Photos. There aren't even any details showing what percentage of iCloud is currently synchronized with macOS Photos. With nothing to debug, I can only conclude that for some reason the sync isn't working in my case.
It's great if this works for you and you don't need iCloud Photos Downloader, but for some reason I don't have that luxury.
As long as you are signed into the Mac with the same iCloud account used on the iPhone, this will download them all. No, you do not need to get them all downloaded to the iPhone ever for any reason for this to work. Period. You need to stop repeating that, because it is wrong. How many people have to say the same thing?
Yes, you will have to go into a hidden folder to access the Originals once they're downloaded if you want to copy them somewhere else, but it's like two clicks.
I've been using Mac since Mac OS X 10.4 (~2005) and was under the same impression.
However, in reality, when you use the same Apple account on both devices with the Photos app on macOS (yes, with the 'Download Originals' checkbox enabled), it only downloads photos that you upload from your phone.
And if you look at the iCloud tab in the Photos app, it says 'Automatically _upload_ and store all your photos and videos in iCloud', so it works from the Mac to iCloud, and doesn't help with downloading the full iCloud library.
No, you are not correct. How many people have to tell you this?
It absolutely works the way I said it does, because I have seen it work that way. Just because you accidentally turned off iCloud Photos in your Apple Account settings on that Mac (or some other similar issue) does not mean it does not work this way when properly signed in.
If you want something to try, go to System Settings -> Apple Account -> Photos and see if "Sync this mac" is turned off. It needs to be on. There could be other ways that this feature is disabled, but that is one of them.
Not seeing something work is not evidence that it does not work. You have not seen it work, but that is not proof it does not work.
Seeing it work is evidence that it works. I have seen it work.
Other people have seen it work that way, and their replies are all over this thread. Apple documents that it works this way.
Yes, it will upload photos to iCloud if enabled, but it also downloads them.
When you take a new photo, it synchronises with all your devices, and therefore you see it on your Mac, iPhone, etc. However, if you get a new Mac (I got one because my library was under capacity), Photos will not start synchronising your 10-year-old photos until you process them through the phone.
Every app that uses tailwind builds a custom CSS bundle. Tailwind Labs does not host those; whoever is making the app has to figure out their own hosting. So I’m not seeing the significant infrastructure costs?
Even if Tailwind were a shared hosted system like the common bootstrap CDNs of old… CDNs are dirt cheap for a small text file, even if it were loaded billions of times a month.
Some back of the napkin math suggests that it would cost about $300 per billion downloads for the current bootstrap.min.css file (gzip compressed, naturally) at North American network prices on one CDN I’ve used before. Or just $150 per billion globally if you're willing to use fewer PoPs. With browser caching, even split per domain, a billion downloads covers a very large number of users for a very large number of page loads.
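For anyone who wants to check that arithmetic, here's a rough sketch of the napkin math. The ~25 KB gzipped size for bootstrap.min.css and the per-GB rates are my assumptions, not quotes from any particular CDN.

    # Rough sketch of the napkin math above. The file size and per-GB rates
    # are assumptions: ~25 KB for bootstrap.min.css gzipped, ~$0.012/GB for
    # North American egress, ~$0.006/GB for a cheaper tier with fewer PoPs.
    file_size_gb = 25 * 1e3 / 1e9   # ~25 KB expressed in GB
    downloads = 1e9                 # one billion downloads

    for label, rate_per_gb in [("NA PoPs", 0.012), ("fewer PoPs, global", 0.006)]:
        cost = file_size_gb * downloads * rate_per_gb
        print(f"{label}: ~${cost:,.0f} per billion downloads")
    # NA PoPs: ~$300 per billion downloads
    # fewer PoPs, global: ~$150 per billion downloads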
Zero-shot means zero-retraining, so think along the lines of "Do you need to modify the weights? Or can you keep the weights fixed and you only need to supply an example?"
> Zero-shot means zero-retraining, so think along the lines of "Do you need to modify the weights? Or can you keep the weights fixed and you only need to supply an example?"
I would caution that using the term "example" suggests further learning happens at inference-time, which isn't the case.
For LLMs, the entire prompt is the input and conveys both the style and the content vectors. In zero-shot voice cloning, we provide the exact same input vectors, just decoupled. Providing reference audio is no different than including "Answer in the style of Sir Isaac Newton" in an LLM's prompt. The model doesn't 'learn' the voice; it simply applies the style vector to the content during the forward pass.
Providing inference-time context (in this case, audio) is no different than giving a prompt to an LLM. Think of it as analogous to an AGENTS.md included in a prompt. You're not retraining the model, you're simply putting the rest of the prompt into context.
If you actually stopped and fine-tuned the model weights on that single clip, that would be one-shot learning.
In the olden days of 2023, you didn’t just find instruct-tuned models sitting on every shelf.
You could use a base model that has only undergone pretraining and can only generate text continuations based on the input it receives. If you provided the model with several examples of a question followed by an answer, and then provided a new question followed by a blank for the next answer, the model understood from the context that it needed to answer the question. This is the most primitive use of ICL, and a very basic way to achieve limited instruction following behavior.
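A concrete version of that prompt pattern might look like the sketch below; the Q/A pairs are made up for illustration, and any pretraining-only model is assumed to simply continue the text.

    # A sketch of the few-shot prompt format described above, for a base
    # (pretraining-only) model that just continues text. The Q/A pairs are
    # made up for illustration.
    examples = [
        ("What is the capital of France?", "Paris"),
        ("What is the capital of Japan?", "Tokyo"),
    ]
    prompt = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    prompt += "\n\nQ: What is the capital of Canada?\nA:"
    print(prompt)
    # A base model continuing this text will most likely produce " Ottawa",
    # because the surrounding context establishes the Q/A pattern; no weight
    # updates are involved.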
With this few-shot example, I would call that few-shot ICL. Not zero shot, even though the model weights are locked.
But, I am learning that it is technically called zero shot, and I will accept this, even if I think it is a confusingly named concept.
It’s nonsensical to call it “zero shot” when a sample of the voice is provided. The term “zero shot cloning” implies you have some representation of the voice from another domain - e.g. a text description of the voice. What they’re doing is ABSOLUTELY one shot cloning. I don’t care if lots of TTS folks use the term this way, they’re wrong.
I don't disagree, but that's what people started calling it. Zero-shot doesn't make sense anyway, as how would the model know what voice it should sound like (unless it's a celebrity voice or similar included in the training data where it's enough to specify a name).
> Zero-shot doesn't make sense anyway, as how would the model know what voice it should sound like (unless it's a celebrity voice or similar included in the training data where it's enough to specify a name).
It makes perfect sense; you are simply confusing training samples with inference context. "Zero-shot" refers to zero gradient updates (retraining) required to handle a new class. It does not mean "zero input information."
> how would the model know what voice it should sound like
It uses the reference audio just like a text based model uses a prompt.
> unless it's a celebrity voice or similar included in the training data where it's enough to specify a name
If the voice is in the training data, that is literally the opposite of zero-shot. The entire point of zero-shot is that the model has never encountered the speaker before.
With LLMs I've seen zero-shot used to describe scenarios where there's no example, e.g. "take this and output JSON", while one-shot has the prompt include an example like "take this and output JSON, for this data the JSON should look like this".
Thus if you feed the model the target voice, i.e. an example of the desired output voice, it sure seems like it should be classified as one-shot.
However, it seems that zero-shot in voice cloning is relative to learning, in contrast to one-shot learning[1].
So it's a bit of an overloaded term causing confusion, from what I can gather.
The confusion clears up if you stop conflating contextual conditioning (prompting) with actual learning (weight updates). For LLMs, "few-shot prompting" is technically a misnomer that stuck; you are just establishing a pattern in the context window, not training the model.
In voice cloning, the reference audio is simply the input, not a training example. You wouldn't say an image classifier is doing "one-shot learning" just because you fed it one image to classify. That image is the input. Similarly, the reference audio is the input that conditions the generation. It is zero-shot because the model's weights were never optimized for that specific speaker's manifold.
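To make the distinction concrete, here is a toy sketch of conditioning on a reference clip versus actually learning from it. The module, shapes, and loss are made up purely for illustration; this is not any real TTS architecture.

    # Toy sketch: zero-shot conditioning vs. one-shot learning. The module,
    # shapes, and loss are invented purely to illustrate the distinction.
    import torch
    import torch.nn as nn

    class ToyVoiceCloner(nn.Module):
        def __init__(self, dim=16):
            super().__init__()
            self.speaker_encoder = nn.Linear(dim, dim)  # embeds the reference clip
            self.decoder = nn.Linear(2 * dim, dim)      # generates output features

        def forward(self, reference_audio, text_features):
            style = self.speaker_encoder(reference_audio)  # conditioning, not learning
            return self.decoder(torch.cat([style, text_features], dim=-1))

    model = ToyVoiceCloner()
    ref, text = torch.randn(1, 16), torch.randn(1, 16)

    # Zero-shot: the reference clip is just an input to the forward pass.
    # No gradients, no optimizer step -- the weights are unchanged afterwards.
    with torch.no_grad():
        cloned = model(ref, text)

    # One-shot learning would mean updating the weights on that single clip:
    target = torch.randn(1, 16)
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss = nn.functional.mse_loss(model(ref, text), target)
    loss.backward()
    opt.step()  # the weights have now changed -- this is learning from one example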
> So if you get your target to record (say) 1 hour of audio, that's a one-shot.
No, that would still be zero shot. Providing inference-time context (in this case, audio) is no different than giving a prompt to an LLM. Think of it as analogous to an AGENTS.md included in a prompt. You're not retraining the model, you're simply putting the rest of the prompt into context.
If you actually stopped and fine-tuned the model weights on that single clip, that would be one-shot learning.
> Providing inference-time context (in this case, audio) is no different than giving a prompt to an LLM.
Right... And you have 0-shot prompts ("give me a list of animals"), 1-shot prompts ("give me a list of animals, for example: a cat"), 2-shot prompts ("give me a list of animals, for example: a cat; a dog"), etc.
The "shot" refers to how many examples are provided to the LLM in the prompt, and have nothing to do with training or tuning, in every context I've ever seen.
> Right... And you have 0-shot prompts ("give me a list of animals"), 1-shot prompts ("give me a list of animals, for example: a cat"), 2-shot prompts ("give me a list of animals, for example: a cat; a dog"), etc.
> The "shot" refers to how many examples are provided to the LLM in the prompt, and have nothing to do with training or tuning, in every context I've ever seen.
In formal ML, "shot" refers to the number of samples available for a specific class during the training phase. You're describing a colloquial usage of the term found only in prompt engineering.
You can't apply an LLMism to a voice cloning model where standard ML definitions apply.
If you're okay with the images being on a CDN, why wouldn't you also be okay with the HTML and CSS also being on the CDN? Just fronting the entire static site with a pull-through CDN is an easy solution that doesn't require any complicated workflow.
I’m talking about integrating with GitHub. Publishing to Cloudflare for instance is fine, but where do you put the images between drafting and publishing?
Or do you just check in images to GitHub and call it a day?
I wasn't suggesting publishing to Cloudflare, just that if you're concerned about the complexity of the workflow of getting images into the CDN, simply fronting whatever host you're using with a CDN of some kind (which could be Cloudflare) will solve that.
Usually you just store the images in the same git repo as the markdown. How you initially host the static site once generated is up to you.
The problem with storing binaries in Git is when they change frequently, since that will quickly bloat the repo. But, images that are part of the website will ~never change over time, so they don't really cause problems.
> You’re talking to me like a total idiot, having assumed I know nothing about this.
Sorry I tried to help? If that's the response I get for helping, good luck...
> All I meant was a way to avoid storing images in git, the rest is quite simple.
There is no good way to do that, and no way that I would recommend. Git is the correct solution, if that is where you are storing the markdown. No fancy git tools are required.
I commit the images alongside the markdown files in GitHub. My site has numerous images and there are logical groups of posts. I make each of those logical groups of posts a git submodule, so I don't have all posts on my machine (or iPad) at one time.
Working Copy (git for iPad) handles submodules reasonably well; I keep the few I'm working on cloned on it and leave the others out, so I don't use as much space.
Is it their original launch edition keyboard, or the later refined version? The launch edition one I have is like you describe, but I hope they have improved things since then.
Yeah, I think I have the original, and reading this again it seems like these new ones are more "touch sensitive". It's a neat idea if they can nail the haptics.
The Llama 4 models were instruct models at a time when everyone was hyped about and expecting reasoning models. As instruct models, I agree they seemed fine, and I think Meta mostly dropped the ball by taking the negative community feedback as a signal that they should just give up. They’ve had plenty of time to train and release a Llama-4.5 by now, which could include reasoning variants and even stronger instruct models, and I think the community would have come around. Instead, it sounds like they’re focusing on closed source models that seem destined for obscurity, where Llama was at least widely known.
On the flip side, it also shows how damaging echo chambers can be, where relatively few people even gave the models a chance, just repeating the negativity they heard from other people and downvoting anyone who voiced a different experience.
I think this was exacerbated by the fact that Llama models had previously come in small, dense sizes like 8B that people could run on modest hardware, where even Llama 4 Scout was a large model that a lot of people in the community weren’t prepared to run. Large models seem more socially accepted now than they were when Llama 4 launched.
Large MoE models are more socially accepted because medium/large-sized MoE models can still be quite small with respect to expert size (which is what sets the amount of required VRAM). But a large dense model is still challenging to get running.
Specdec works well for code, so the prompt I used was "Write a React TypeScript demo".
    prompt eval time =   313.70 ms /  40 tokens ( 7.84 ms per token, 127.51 tokens per second)
           eval time = 46278.35 ms / 913 tokens (50.69 ms per token,  19.73 tokens per second)
          total time = 46592.05 ms / 953 tokens
    draft acceptance rate = 0.87616 (757 accepted / 864 generated)
The draft model cannot affect the quality of the output. A good draft model makes token generation faster, and a bad one would slow things down, but the quality will be the same as the main model either way.
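For intuition on why that's the case, here's a minimal sketch of the verify-and-accept loop (greedy decoding only; real implementations like llama.cpp also handle sampling via a rejection step, and draft_next/target_next are hypothetical stand-ins for the two models):

    # Minimal sketch of greedy speculative decoding. draft_next/target_next
    # are hypothetical callables returning the next token for a given context.
    def speculative_step(target_next, draft_next, context, k=4):
        # 1. The cheap draft model proposes k tokens.
        proposed, ctx = [], list(context)
        for _ in range(k):
            tok = draft_next(ctx)
            proposed.append(tok)
            ctx.append(tok)

        # 2. The target model verifies; keep only the prefix it agrees with.
        accepted, ctx = [], list(context)
        for tok in proposed:
            if target_next(ctx) == tok:
                accepted.append(tok)
                ctx.append(tok)
            else:
                break

        # 3. Always emit one token from the target model itself, so every output
        #    token is exactly what the target model alone would have produced.
        #    A bad draft only wastes proposals; it never changes the output.
        accepted.append(target_next(ctx))
        return accepted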