Nope... this stuff is 96.5% copper, and copper is ~3x as expensive as stainless steel. Even if tantalum and lithium were free, it would be substantially more expensive. Tantalum is not free, though. It's a very expensive material at about 100x the cost per kg relative to stainless steel, so it nearly doubles the cost of the raw material inputs by itself with its 3% contribution. The process of making this alloy is also likely to be expensive.
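To spell that math out, here's a rough sketch, normalizing stainless steel to 1.0 per kg and using the ~3x and ~100x price ratios above (the exact alloy composition is an assumption for illustration):

    # Rough raw-material cost check, with stainless steel normalized to 1.0 per kg.
    # The ~3x (copper) and ~100x (tantalum) price ratios come from the comment above;
    # the exact composition is an assumption for illustration.
    stainless = 1.0
    copper = 3.0 * stainless
    tantalum = 100.0 * stainless

    composition = {"copper": 0.965, "tantalum": 0.03, "lithium": 0.005}

    # Treat lithium as free, as in the comment.
    alloy = composition["copper"] * copper + composition["tantalum"] * tantalum
    print(alloy)           # ~5.9x the cost of stainless steel
    print(alloy / copper)  # ~2x the cost of pure copper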
I'm also not sure how much being in an alloy would impact the antimicrobial effects of copper.
Well, this could dramatically increase the demand for tantalum, which (econ 101) could dramatically increase the supply over time? Is tantalum in much demand today?
Huge demand for copper hasn't brought its price down to the price of stainless steel, has it? Most definitely not, so it seems like Econ 101 was incomplete. Not all goods have perfectly elastic supply, and when supply is inelastic, more demand doesn't make a good cheaper.
Tantalum is in demand today, yes. Tantalum capacitors are a well known application, but it is used in all sorts of things.
My point was that even if tantalum were free, a material that is 96.5% copper is still not going to be significantly cheaper than copper, which I think is a pretty self-evident outcome.
Copper has been in high demand for centuries. Lithium might be a more similar situation to tantalum: huge spikes in demand over the last decade have absolutely floored prices.
I downloaded a sample RAF file from the internet. It was 16 megapixels, and 34MB in size. I used Adobe DNG Converter, and it created a file that was 25MB in size. It was actually smaller.
Claiming that DNG takes up 4x space doesn't align with any of my own experiences, and it didn't happen on the RAF file that I just tested.
Is it drawing the image from top to bottom very slowly over the course of at least 30 seconds? If not, then you're using DALL-E, not 4o image generation.
This top-to-bottom drawing – does it tell us anything about the underlying model architecture? AFAIK diffusion models do not work like that; they denoise the full frame over many steps. In the past there were attempts to synthesize a picture slowly by predicting the next pixel, but I wasn't aware of a shift to that kind of architecture within OpenAI.
Yes, the model card explicitly says it's autoregressive, not diffusion. And it's not a separate model; it's a native ability of GPT-4o, which is a multimodal model. They just didn't make this ability public until now. I assume they worked on the fine-tuning to improve prompt following.
If it tried and failed repeatedly, then it was prompting DALL-E, looking at the results, then prompting DALL-E again, not doing direct image generation.
No... OpenAI said it was "rolling out". Not that it was "already rolled out to all users and all servers". Some people have access already, some people don't. Even people who have access don't have it consistently, since it seems to depend on which server processes your request.
On the web version, click on the image to make it larger. In the upper right corner, there is an (i) icon, which you can click to reveal the DALL-E prompt that GPT-4o generated.
"B" just means "billion". A 7B model has 7 billion parameters. Most models are trained in fp16, so each parameter takes two bytes at full precision. Therefore, 7B = 14GB of memory. You can easily quantize models to 8 bits per parameter with very little quality loss, so then 7B = 7GB of memory. With more quality loss (making the model dumber), you can quantize to 4 bits per parameter, so 7B = 3.5GB of memory. There are ways to quantize at other levels too, anywhere from under 2 bits per parameter up to 6 bits per parameter are common.
There is additional memory used for context / KV cache. So, if you use a large context window for a model, you will need to factor in several additional gigabytes for that, but it is much harder to provide a rule of thumb for that overhead. Most of the time, the overhead is significantly less than the size of the model, so not 2x or anything. (The size of the context window is related to the amount of text/images that you can have in a conversation before the LLM begins forgetting the earlier parts of the conversation.)
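If you want to estimate that overhead yourself, the usual formula is 2 (for K and V) x layers x KV heads x head dim x context length x bytes per value. A minimal sketch, using Llama-2-7B-style architecture numbers as an assumption (models with grouped-query attention have far fewer KV heads, so divide accordingly):

    # Rough KV cache estimate at fp16. The architecture numbers (32 layers,
    # 32 KV heads, head_dim 128) are an assumption based on a Llama-2-7B-style
    # model; GQA models use far fewer KV heads, shrinking this considerably.
    def kv_cache_gb(layers, kv_heads, head_dim, context_len, bytes_per_value=2):
        return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value / 1e9

    print(kv_cache_gb(32, 32, 128, 4096))  # ~2.1 GB at a 4k context
    print(kv_cache_gb(32, 32, 128, 8192))  # ~4.3 GB at an 8k context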
The most important thing for local LLM performance is typically memory bandwidth. This is why GPUs are so much faster for LLM inference than CPUs, since GPU VRAM is many times the speed of CPU RAM. Apple Silicon offers rather decent memory bandwidth, which makes the performance fit somewhere between a typical Intel/AMD CPU and a typical GPU. Apple Silicon is definitely not as fast as a discrete GPU with the same amount of VRAM.
That's about all you need to know to get started. There are obviously nuances and exceptions that apply in certain situations.
A 32B model at 5 bits per parameter will comfortably fit onto a 24GB GPU and provide decent speed, as long as the context window isn't set to a huge value.
Assuming the same model size in gigabytes, which should one choose: higher-B at lower bits, or lower-B at higher bits? Is there a silver bullet? Like “yeah, always take 4-bit 13B over 8-bit 7B”.
Or are same-sized models basically equal in this regard?
I would say 9 times out of 10, you will get better results from a Q4 model that’s a size class larger than a smaller model at Q8. But it’s best not to go below Q4.
But it’s still surprising they haven’t. People would be motivated as hell if they launched GPUs with twice the amount of VRAM. It’s not as simple as just soldering some more in, but still.
They sort of have. I'm using a 7900 XTX, which has 24GB of VRAM. The closest competitor would be a 4090, which would cost more than double today; granted, it would be much faster.
Technically there is also the 3090, which is more comparable price-wise. I don't know about its performance, though.
VRAM is supply-limited enough that going bigger isn't as easy as it sounds. AMD can probably sell as much as they can get their hands on, so they may as well sell more GPUs, too.
Also true. Tap on any text field. In the menu where the "paste" option lives, there is also a Scan Text option. I've used that for a number of things over the years.
On iPhone, WiFi QR codes work just fine. You just open the camera app, and point the phone at the QR code. They're automatically detected and scanned, the same as any other normal QR code. (No, you can't open the camera app during initial setup... but, it's not for a lack of the standard or the feature.)
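For what it's worth, the payload inside those Wi-Fi QR codes is just a short text string in a standard format; a minimal sketch (the SSID and password are placeholders):

    # Standard Wi-Fi QR payload: WIFI:T:<auth>;S:<ssid>;P:<password>;;
    # The SSID and password below are placeholders; special characters in real
    # values need to be backslash-escaped.
    ssid = "MyNetwork"
    password = "hunter2"
    payload = f"WIFI:T:WPA;S:{ssid};P:{password};;"
    print(payload)  # feed this string to any QR code generator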
A back of the napkin calculation: 819GB/s / 37GB/tok = 22 tokens/sec.
Realistically, you’ll have to run quantized to fit inside of the 512GB limit, so it could be more like 22GB of data transfer per token, which would yield 37 tokens per second as the theoretical limit.
It is likely going to be very usable. As other people have pointed out, the Mac Studio is also not the only option at this price point… but it is neat that it is an option.
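Spelling that napkin math out (the 819 GB/s bandwidth and the per-token byte counts are the figures from the comments above, treating memory bandwidth as the only bottleneck):

    # Theoretical upper bound: each generated token streams the active weights
    # through memory once, so tokens/sec ~= memory bandwidth / bytes read per token.
    bandwidth_gb_s = 819        # Mac Studio memory bandwidth, from the comment above

    print(bandwidth_gb_s / 37)  # ~22 tok/s if ~37 GB is read per token
    print(bandwidth_gb_s / 22)  # ~37 tok/s if quantization cuts that to ~22 GB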