Nope... this stuff is 96.5% copper, and copper is ~3x as expensive as stainless steel. Even if tantalum and lithium were free, it would be substantially more expensive. Tantalum is not free, though. It's a very expensive material at about 100x the cost per kg relative to stainless steel, so it nearly doubles the cost of the raw material inputs by itself with its 3% contribution. The process of making this alloy is also likely to be expensive.
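To spell that math out, here's a rough sketch, normalizing stainless steel to 1.0 per kg and using the ~3x and ~100x price ratios above (the exact alloy composition is an assumption for illustration):

    # Rough raw-material cost check, with stainless steel normalized to 1.0 per kg.
    # The ~3x (copper) and ~100x (tantalum) price ratios come from the comment above;
    # the exact composition is an assumption for illustration.
    stainless = 1.0
    copper = 3.0 * stainless
    tantalum = 100.0 * stainless

    composition = {"copper": 0.965, "tantalum": 0.03, "lithium": 0.005}

    # Treat lithium as free, as in the comment.
    alloy = composition["copper"] * copper + composition["tantalum"] * tantalum
    print(alloy)           # ~5.9x the cost of stainless steel
    print(alloy / copper)  # ~2x the cost of pure copper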
I'm also not sure how much being in an alloy would impact the antimicrobial effects of copper.
Well, this could dramatically increase the demand for tantalum, which (econ 101) could dramatically increase the supply over time? Is tantalum in much demand today?
Huge demand for copper hasn't brought its price down to the price of stainless steel, has it? Most definitely not, so it seems like Econ 101 was incomplete. Not all goods have perfectly elastic supply, and when supply is inelastic, more demand doesn't make a good cheaper.
Tantalum is in demand today, yes. Tantalum capacitors are a well known application, but it is used in all sorts of things.
My point was that even if tantalum were free, a material that is 96.5% copper is still not going to be significantly cheaper than copper, which I think is a pretty self-evident outcome.
Copper has been in high demand for centuries. Lithium might be a more similar situation to tantalum: huge spikes in demand over the last decade have absolutely floored prices.
I downloaded a sample RAF file from the internet. It was 16 megapixels, and 34MB in size. I used Adobe DNG Converter, and it created a file that was 25MB in size. It was actually smaller.
Claiming that DNG takes up 4x space doesn't align with any of my own experiences, and it didn't happen on the RAF file that I just tested.
Is it drawing the image from top to bottom very slowly over the course of at least 30 seconds? If not, then you're using DALL-E, not 4o image generation.
This top-to-bottom drawing – does it tell us anything about the underlying model architecture? AFAIK diffusion models do not work like that; they denoise the full frame over many steps. In the past there were attempts to synthesize a picture slowly by predicting the next pixel, but I wasn't aware of a shift to that kind of architecture within OpenAI.
Yes, the model card explicitly says it's autoregressive, not diffusion. And it's not a separate model; it's a native ability of GPT-4o, which is a multimodal model. They just didn't make this ability public until now. I assume they worked on the fine-tuning to improve prompt following.
If it tried and failed repeatedly, then it was prompting DALL-E, looking at the results, then prompting DALL-E again, not doing direct image generation.
No... OpenAI said it was "rolling out". Not that it was "already rolled out to all users and all servers". Some people have access already, some people don't. Even people who have access don't have it consistently, since it seems to depend on which server processes your request.
On the web version, click on the image to make it larger. In the upper right corner, there is an (i) icon, which you can click to reveal the DALL-E prompt that GPT-4o generated.
"B" just means "billion". A 7B model has 7 billion parameters. Most models are trained in fp16, so each parameter takes two bytes at full precision. Therefore, 7B = 14GB of memory. You can easily quantize models to 8 bits per parameter with very little quality loss, so then 7B = 7GB of memory. With more quality loss (making the model dumber), you can quantize to 4 bits per parameter, so 7B = 3.5GB of memory. There are ways to quantize at other levels too, anywhere from under 2 bits per parameter up to 6 bits per parameter are common.
There is additional memory used for context / KV cache. So, if you use a large context window for a model, you will need to factor in several additional gigabytes for that, but it is much harder to provide a rule of thumb for that overhead. Most of the time, the overhead is significantly less than the size of the model, so not 2x or anything. (The size of the context window is related to the amount of text/images that you can have in a conversation before the LLM begins forgetting the earlier parts of the conversation.)
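If you want to estimate that overhead yourself, the usual formula is 2 (for K and V) x layers x KV heads x head dim x context length x bytes per value. A minimal sketch, using Llama-2-7B-style architecture numbers as an assumption (models with grouped-query attention have far fewer KV heads, so divide accordingly):

    # Rough KV cache estimate at fp16. The architecture numbers (32 layers,
    # 32 KV heads, head_dim 128) are an assumption based on a Llama-2-7B-style
    # model; GQA models use far fewer KV heads, shrinking this considerably.
    def kv_cache_gb(layers, kv_heads, head_dim, context_len, bytes_per_value=2):
        return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value / 1e9

    print(kv_cache_gb(32, 32, 128, 4096))  # ~2.1 GB at a 4k context
    print(kv_cache_gb(32, 32, 128, 8192))  # ~4.3 GB at an 8k context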
The most important thing for local LLM performance is typically memory bandwidth. This is why GPUs are so much faster for LLM inference than CPUs, since GPU VRAM is many times the speed of CPU RAM. Apple Silicon offers rather decent memory bandwidth, which makes the performance fit somewhere between a typical Intel/AMD CPU and a typical GPU. Apple Silicon is definitely not as fast as a discrete GPU with the same amount of VRAM.
That's about all you need to know to get started. There are obviously nuances and exceptions that apply in certain situations.
A 32B model at 5 bits per parameter will comfortably fit onto a 24GB GPU and provide decent speed, as long as the context window isn't set to a huge value.
Assuming the same model size in gigabytes, which should one choose: higher-B at lower bits, or lower-B at higher bits? Is there a silver bullet? Like “yeah, always take 4-bit 13B over 8-bit 7B”.
Or are same-sized models basically equal in this regard?
I would say 9 times out of 10, you will get better results from a Q4 model that’s a size class larger than a smaller model at Q8. But it’s best not to go below Q4.
But it’s still surprising they haven’t. People would be motivated as hell if they launched GPUs with twice the amount of VRAM. It’s not as simple as just soldering some more in, but still.
They sort of have. I'm using a 7900 XTX, which has 24GB of VRAM. The closest competitor would be a 4090, which would cost more than double today; granted, it would be much faster.
Technically there is also the 3090, which is more comparable price-wise. I don't know about its performance, though.
VRAM is supply-limited enough that going bigger isn't as easy as it sounds. AMD can probably sell as much as they can get their hands on, so they may as well sell more GPUs, too.
Also true. Tap on any text field. In the menu where the "paste" option lives, there is also a Scan Text option. I've used that for a number of things over the years.
On iPhone, WiFi QR codes work just fine. You just open the camera app, and point the phone at the QR code. They're automatically detected and scanned, the same as any other normal QR code. (No, you can't open the camera app during initial setup... but, it's not for a lack of the standard or the feature.)
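For what it's worth, the payload inside those Wi-Fi QR codes is just a short text string in a standard format; a minimal sketch (the SSID and password are placeholders):

    # Standard Wi-Fi QR payload: WIFI:T:<auth>;S:<ssid>;P:<password>;;
    # The SSID and password below are placeholders; special characters in real
    # values need to be backslash-escaped.
    ssid = "MyNetwork"
    password = "hunter2"
    payload = f"WIFI:T:WPA;S:{ssid};P:{password};;"
    print(payload)  # feed this string to any QR code generator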
A back of the napkin calculation: 819GB/s / 37GB/tok = 22 tokens/sec.
Realistically, you’ll have to run quantized to fit inside of the 512GB limit, so it could be more like 22GB of data transfer per token, which would yield 37 tokens per second as the theoretical limit.
It is likely going to be very usable. As other people have pointed out, the Mac Studio is also not the only option at this price point… but it is neat that it is an option.
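Spelling that napkin math out (the 819 GB/s bandwidth and the per-token byte counts are the figures from the comments above, treating memory bandwidth as the only bottleneck):

    # Theoretical upper bound: each generated token streams the active weights
    # through memory once, so tokens/sec ~= memory bandwidth / bytes read per token.
    bandwidth_gb_s = 819        # Mac Studio memory bandwidth, from the comment above

    print(bandwidth_gb_s / 37)  # ~22 tok/s if ~37 GB is read per token
    print(bandwidth_gb_s / 22)  # ~37 tok/s if quantization cuts that to ~22 GB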