This photography question can be solved with the right equations. A lot of non-reasoning LLMs would spout some nonsense like 0.67 stops faster. Sometimes they’ll leave a stray negative sign in too!
The answer should be approximately 1.37, although “1 and 1/3” is acceptable too.
LLMs usually don’t have trouble coming up with the formulas, so it’s not a particularly obscure question, just one that won’t have a memorized answer, since there are very few f/4.5 lenses on the market, and even fewer people asking this exact question online. Applying those formulas is harder, but the LLM should be able to sanity check the result and catch common errors. (f/2.8 -> f/4 is one full stop, which is common knowledge among photographers, so getting a result of less than one is obviously an error.)
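For reference, the usual formula: the difference in stops between two f-numbers is 2 · log2(N2/N1), so for f/2.8 vs f/4.5 that's 2 · log2(4.5/2.8) ≈ 2 × 0.685 ≈ 1.37 stops.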
This also avoids being a test that just emphasizes tokenizer problems… I find the strawberry test to be dreadfully boring. It’s not a useful test. No one is actually using LLMs to count letters in words, and until we have LLMs that can actually see the letters of each word… it’s just not a good test, in my opinion. I’m convinced that the big AI labs see it as a meme at this point, which is the only reason they keep bringing it up. They must find the public obsession with it hilarious.
I was impressed at how consistently well Phi-4 did at my photography math question, especially for a non-reasoning model. Phi-4 scored highly on math benchmarks, and it shows.
The limitation is clearly not Rust. Any language that can bind to C libraries can bind to the functions PRQL exposes... the authors just haven't chosen to implement convenient SDKs for many languages. They also list 8 languages, not 4; it's just that they've apparently only had time to polish the libraries for 4 of them.
PRQL appears to be a rather small project... not some major corporate effort.
Yes, that's precisely it. PRQL is a completely volunteer-driven project by folks who'd had enough of the thousands of paper cuts from SQL and felt that we deserved something better after 50 years. Throw away the SQL syntax and keep what people usually like about SQL - declarative, relational operators - plus add functions and composition.
The main limitation is developer time. There is so much that could be done with PRQL! Without a corporate sponsor, a parent company, or more contributors, velocity is unfortunately limited. If you'd like to see that change, please reach out!
Sorry, by my comment I meant that if PRQL exposed a C API (as the parent commenter claims it does), I’d write bindings for the languages I use that can consume C libraries. Unfortunately I’m not proficient enough with Rust to create the C API myself.
Right, that's not my forte either. It seems like quite a key enabler though, so let me see where we are on that.
There was also a helpful comment on an HN thread a few weeks back about how to make the API better to develop against. I've been meaning to get back to that but have been constrained myself.
I suspect there is a communications breakdown happening here. I'll try to clarify what I was saying, since I think I did a poor job.
In Rust, when you define a `#[no_mangle] pub unsafe extern "C"` function and then compile as a shared object / dll, that function will be exposed with a C-compatible ABI, the same as any C function would be. It's just a matter of defining the proper header file so that you can use it from a C program, or from any other programming language that can bind to C.
Writing a header file manually is boring and error-prone, so people will often autogenerate the header file for the exposed functions using a tool like cbindgen: https://github.com/mozilla/cbindgen
And that C header file -- combined with the compiled library -- should be all that is needed.
I suspect that the PRQL maintainer is saying that they want to offer a more idiomatic binding for C. The raw API that is exposed may not be the most user-friendly API, especially since they don't seem to have much familiarity with what is considered "idiomatic" in C, so they haven't been ready to commit to that API being considered "stable" yet. Based on my own poking around in their existing bindings... that C binding appears to be the API that they are using internally in those other language bindings already. (I'm also not sure how else they would be creating most of those bindings, if they weren't using that C binding... apart from some special cases, like how there is a convenient alternative for exposing bindings to Python from Rust, for example.)
C# does not allow directly using a C header file, so it requires manually re-defining the same set of extern function signatures, but the underlying C API appears to be the same.
I'm not an expert on PRQL by any means, and it's been a few years since I really used Rust, but I'm just piecing together what I can see here.
Rust code normally does not adhere to a C-compatible ABI, but the purpose of these "extern" functions is to do exactly that when you're trying to expose code that can be called by standard conventions... since the industry has largely settled on C-style functions and structs, for better or worse, with all of the limitations that imposes.
There are bespoke libraries that build on top of it, like CsWin32, where you specify the methods/modules to import and get nice and, often, memory-safe types and members in C#.
I think it should be possible to enhance this even further, e.g. having '// pinvkgen: #include <some_dependency.h>' at the top of any particular C# file and getting what you need in a C-like way. There are some problems with this inline approach, but it could work.
The main point is there are quite a few community packages which simplify authoring bindings (there are more, like https://github.com/Cysharp/csbindgen/ for Rust or https://github.com/royalapplications/beyondnet for Swift). It really shouldn't be a necessity to write bindings by hand for large dependencies as it's both error prone and a solved problem.
> unless using a totally separate toolchain w/ "CGo".
CGo is built into the primary Go toolchain... it's not a 'totally separate toolchain' at all, unless you're referring to the C compiler used by CGo for the C code... but that's true of every language that isn't C or C++ when it is asked to import and compile some C code. You could also write assembly functions without CGo, and that avoids invoking a C compiler.
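To make that concrete, here's a minimal sketch of what CGo usage actually looks like (the `add` function is just a made-up example). It is compiled by the regular `go build`; the C compiler is only invoked for the C preamble itself:

    package main

    /*
    // A tiny C function, compiled as part of the normal `go build` via cgo.
    static int add(int a, int b) { return a + b; }
    */
    import "C"

    import "fmt"

    func main() {
        // Call the C function through the generated "C" pseudo-package.
        fmt.Println(int(C.add(2, 3))) // prints 5
    }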
> Means you can't optimize parts of a Golang app to dispense with GC altogether
This is also not true... by default, Go stack allocates everything. Things are only moved to the heap when the compiler is unable to prove that they won't escape the current stack context. You can write Go code that doesn't heap allocate at all, and therefore will create no garbage at all. You can pass a flag to the compiler, and it will emit its escape analysis. This is one way you can see whether the code in a function is heap allocating, and if it is, you can figure out why and solve that. 99.99% of the time, no one cares, and it just works. But if you need to "dispense with GC altogether", it is possible.
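For example (just a toy sketch, not anything from a real codebase), building with `go build -gcflags=-m` prints the compiler's escape analysis, and a function like this reports no heap allocations:

    package main

    import "fmt"

    // sumSquares only touches its own stack frame: the fixed-size array is
    // passed by value and nothing escapes, so no garbage is created here.
    func sumSquares(xs [8]int) int {
        total := 0
        for _, x := range xs {
            total += x * x
        }
        return total
    }

    func main() {
        // `go build -gcflags=-m` will not report anything in sumSquares as
        // "escapes to heap" (the fmt.Println argument in main does escape,
        // since printing goes through an interface).
        fmt.Println(sumSquares([8]int{1, 2, 3, 4, 5, 6, 7, 8}))
    }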
You can also disable the GC entirely if you want, or just pause it for a critical section. But again... why? When would you need to do this?
Go apps typically don't have much GC pressure in my experience because short-lived values are usually stack allocated by the compiler.
> You can write Go code that doesn't heap allocate at all
In practice this proves to be problematic because there is no guarantee that escape analysis will in fact do what you want (as in, you can't force it, and you don't control dependencies unless you want to vendor). It is pretty good, but it's very far from being bullet-proof. As a result, Go applications have to resort to sync.Pool.
Go is good at keeping the allocation profile in check, but I found it unable to compete with C# at writing truly allocation-free code.
As I mentioned in my comment, you can also observe the escape analysis from the compiler and know whether your code will allocate or not, and you can make adjustments to the code based on the escape analysis. I was making the point that you technically can write allocation-free code, it is just extremely rare for it to matter.
sync.Pool is useful, but it solves a larger class of problems. If you are expected to deal with dynamically sized chunks of work, then you will want to allocate somewhere. sync.Pool gives you a place to reuse those allocations. C# ref structs don't seem to help here, since you can't have a dynamically sized ref struct, AFAIK. So, if you have a piece of code that can operate on N items, and if you need to allocate 2*N bytes of memory as a working set, then you won't be able to avoid allocating somewhere. That's what sync.Pool is for.
Oftentimes, sync.Pool is easier to reach for than restructuring code to be allocation-free, but sync.Pool isn't the only option.
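To illustrate the pattern I mean (a minimal sketch with made-up names, not anything specific), sync.Pool lets you reuse a grown working buffer across calls instead of allocating a fresh one each time:

    package main

    import (
        "bytes"
        "fmt"
        "sync"
    )

    // bufPool hands out reusable buffers so a hot path doesn't allocate a
    // fresh working set for every batch of work.
    var bufPool = sync.Pool{
        New: func() any { return new(bytes.Buffer) },
    }

    // process needs scratch space proportional to len(items); the buffer
    // grows as needed and its capacity is kept for the next caller.
    func process(items []byte) int {
        buf := bufPool.Get().(*bytes.Buffer)
        buf.Reset()
        defer bufPool.Put(buf)

        for _, b := range items {
            buf.WriteByte(b ^ 0xFF) // stand-in for real per-item work
        }
        return buf.Len()
    }

    func main() {
        fmt.Println(process([]byte("hello"))) // 5
    }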
> sync.Pool is useful, but it solves a larger class of problems. If you are expected to deal with dynamically sized chunks of work, then you will want to allocate somewhere. sync.Pool gives you a place to reuse those allocations. C# ref structs don't seem to help here, since you can't have a dynamically sized ref struct, AFAIK. So, if you have a piece of code that can operate on N items, and if you need to allocate 2*N bytes of memory as a working set, then you won't be able to avoid allocating somewhere. That's what sync.Pool is for.
Ref structs (which really are just structs that can hold 'ref T' pointers) are only one feature of the type system among many which put C# in the same performance weight class as C/C++/Rust/Zig. And they do help. Unless significant changes happen to Go, it will remain disadvantaged against C# in writing this kind of code.
Only the whiskers are touching, and the same applies to several other languages too. Yes, the median is impressively low… for anything other than those three. And it is still separate.
C# has impressive performance, but it is categorically separate from those three languages, and it is disingenuous to claim otherwise without some extremely strong evidence to support that claim.
My interpretation is supported not just by the Benchmarks Game, but by all evidence I’ve ever seen up to this point, and I have never once seen anyone make that claim about C# until now… because C# just isn’t in the same league.
> Ref structs (which really are just structs that can hold 'ref T' pointers)
A ref struct can hold a lot more than that. The uniquely defining characteristic of a ref struct is that the compiler guarantees it will not leave the stack, ever. A ref struct can contain a wide variety of different values, not just ref T, but yes, it can also contain other ref T fields.
I’m saying C# as a whole, not C# on one example. But, I have already agreed that C#’s performance has become pretty impressive. I also still believe that idiomatic Rust is going to be faster than idiomatic C#, even if C# now supports really advanced (and non-idiomatic) patterns that let you rewrite chunks of code to be much faster when needed.
It would be interesting to see the box plot updated to include the naot results — I had assumed that it was already.
They have more to say than a single cherry picked benchmark from the half dozen.
I had asked the other person for additional benchmarks that supported their cause. They refused to point at a single shred of evidence. I agree the Benchmarks Game isn’t definitive. But it is substantially more useful than people making completely unsupported claims.
I find most discussions of programming language performance to be pointless, but some discussions are even more pointless than others.
I curate the benchmarks game, so to me it's very much an initial step, a starting point. It's a pity that those next steps always seem like too much effort.
This is a distribution of submissions. I suggest you look at the actual implementations, how they stack up performance-wise, and what kind of patterns each respective language enables. You will quickly find out that this statement is incorrect and that they behave rather similarly on optimized code. Another good exercise would be to actually use a disassembler for once and see how it goes when writing a performant algorithm implementation. It will be apparent that C#, for all intents and purposes, must be approached with practically the same techniques and data structures as the systems programming family of languages, and that it will produce a comparable performance profile.
> No…? https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...
A ref struct can hold a lot more than that. What’s unique about a ref struct is that the compiler guarantees it will not leave the stack, ever. A ref struct can contain all sorts of different stack-allocatable values, not just references.
Do you realize this is not a mutually exclusive statement? Ref structs are just structs which can hold byref pointers aka managed references. This means that, yes, because managed references can only ever be placed on the stack (but not the memory they point to), a similar restriction is placed on ref structs alongside the Rust-like lifetime analysis to enforce memory safety. Beyond this, their semantics are identical to regular structs.
I.e.
> C# ref structs don't seem to help here, since you can't have a dynamically sized ref struct, AFAIK
Your previous reply indicates you did not know the details until reading the documentation just now. This is highly commendable because reading documentation as a skill seems to be in short supply nowadays. However, it misses the point that memory (including dynamic memory, whatever you mean by that - I presume reallocations?) can originate from anywhere - stackalloc buffers, malloc, inline arrays, regular arrays or virtually any other source of memory - which can be wrapped into Span<T>s or addressed with unsafe byref arithmetic (or pinning and using raw pointers).
Ref structs help with this a lot and enable many data structures which reference arbitrary memory in a generalized way (think writing a tokenizer that wraps a span of chars, much like you would do in C but retaining GC compatibility without the overhead of carrying the full string like in Go).
You can also trivially author a fully identical Rust-like Vec<T>[0] on top of any memory source, even Jemalloc or Mimalloc (the latter has an excellent pure C# reimplementation[1] that is fully competitive with the original implementation in C).
None of this is even remotely possible in any other GC-based language.
People have had a long time to submit better C# implementations. You are still providing no meaningful evidence.
> Do you realize this is not a mutually exclusive statement?
It doesn’t have to be mutually exclusive. You didn’t seem to understand why people care about ref structs, since you chose to focus on something that is an incidental property, not the reason that ref structs exist.
Brigading you is not my intent, so please don't take my comments that way.
I just want to add that C# is getting pretty fast, and it's not just because people have had a long time to submit better implementations to a benchmark site.
The language began laying the groundwork for AOT and higher performance in general with the introduction of Span<T> and friends 7 or so years ago. Since then, they have been making strides on a host of fronts to allow programmers the freedom to express most patterns expected of a low level language, including arbitrary int pointers, pointer arithmetic, typed memory regions, and an unsafe subset.
In my day-to-day experience, C# is not as fast as the "big 3" non-GCed languages (C/C++/Rust), especially in traditional application code that might use LINQ, code generation, or reflection (which are AOT-unfriendly features; AOT LINQ, for example, is interpreted at runtime), but since I don't tend to re-write the same code across multiple languages simultaneously I can't quantify the extent of the current speed differences.
I can say, however, that C# has been moving forward every release, and those benchmarks demonstrate that it is separating itself from the Java/Go tier (I consider Go to be a notch or two above JITed Java, though I have no personal experience with GraalVM AOT yet), and it definitely feels close to the C/C++/Rust tier.
It may not ever attain even partial parity on that front, for a whole host of reasons (its reliance on its own compiler infrastructure and not a gcc or llvm based backend is a big one for me), but the language itself has slowly implemented the necessary constructs for safe (and unsafe) arbitrary memory manipulation, including explicit stack & heap allocation, and the skipping of GC, which are sort of the fundamental "costs of admission" for consideration as a high performance systems language.
I don't expect anyone to like or prefer C#, nor do I advocate forcing the language on anyone, and I really hate being such a staunch advocate here on HN (I want to avoid broken record syndrome), but as I have stated many times here, I am a big proponent of programmer ergonomics, and C# really seems to be firing on all cylinders right now (recent core library CVEs notwithstanding).
I just don’t like seeing people make bold claims without supporting evidence… those tend to feel self-aggrandizing and/or like tribalism. It also felt like a bad faith argument, so I stopped responding to that other person when there was nothing positive I could say. If the evidence existed, then they should have provided evidence. I asked for evidence.
I like C#, just as I like Go and Rust. But languages are tools, and I try to evaluate tools objectively.
> I can say, however, that C# has been moving forward every release, and those benchmarks demonstrate that it is separating itself from the Java/Go tier
I also agree. I have been following the development of C# for a very long time. I like what I have seen, especially since .NET Core. As I have mentioned in this thread already, C#’s performance is impressive. I just don’t accept a general claim that it’s as fast as Rust at this point, but not every application needs that much performance. I wish I could get some real world experience with C#, I just haven’t found any interesting tech jobs that use C#… and the job market right now doesn’t seem great, unfortunately.
I have hopes that adoption outside of stodgy enterprises will pick up, which would of course help the job situation (in due time of course).
Sometimes it's hard to shake the rep you have when you're now a ~25 year old language.
Awareness takes time. People need to be told, then they need to tinker here and there. Either they like what they see when they kick the tires or they don't.
I'm pretty language agnostic tbh, but I would like to see it become a bit more fashionable given its modernization efforts, cross-platform support, and MIT license.
Please read through the description and follow-up articles on the BenchmarksGame website. People did submit benchmarks, but submitting yet another SIMD+unsafe+full parallelization implementation is not the main goal of the project. However, this is precisely the subject (at which Go is inadequate) that we are discussing here. And for it, my suggestions in the previous comment stand.
tl;dr Async/await is what the Python community has settled on, so I don’t think there’s any point in resisting it now, but the fact that it happened is a historical curiosity from my point of view.
-----
I’m not the person you replied to, and I actually don’t have anything against async/await as a pattern where it is needed, but I know that prior to Python getting async/await, there were some moderately popular green threading / coroutine libraries like gevent and eventlet. You would (mostly) just write normal, synchronous Python, and then blocking calls would be intercepted by the runtime and allow other coroutines to take their turn. This felt Pythonic to me at the time, because most code would work in both sync and async environments. You didn’t have to write a separate async version, and you didn’t really have to update old libraries.
The other pre-async/await approach was taken by Tornado and Twisted... basically a form of callback hell. I don't think anyone liked this, but it might have been popular because it worked... and unofficial green threading implementations like gevent/eventlet sometimes broke in interesting ways. (I think officially incorporating a gevent/eventlet-style solution into Python would have overcome most of the issues... but, that's just my speculation.)
I’ve never used Python professionally outside of some short scripts, but it was one of the first languages I used seriously for hobby stuff back during the early days of the Python 3 transition. I never fully understood why Python chose to switch to async/await. Promises can be useful for structured concurrency patterns, but as someone who has been writing Go professionally for a number of years… I just don’t think most code should need to be async-aware.
For a language like Rust, I think async/await makes perfect sense. Rust cannot afford to impose a runtime on everyone, and async/await can be implemented in a very low level, efficient way that gives the developer as much control as they need. This kind of ultra-low-level optimization stuff just isn’t relevant to Python… so, as an outsider, I almost wonder how (in Python) async/await isn’t just a clunkier coroutine system.
If I were to try to rebut my own comment, I would say that async/await was probably chosen because "explicit is better than implicit", and green threading might have been too implicit for the Python community's tastes.
There are a few truths in this. I’ve been involved in a few large projects written in Python and have seen both ups and downs.
Green threads required extensive monkey patching, and debugging those programs was incredibly hard. Instagram moved from them to async/await and wrote a blog post about it, iirc.
But I agree that Python’s async/await implementation is a bit too low-level and could use better abstractions. A lot of the hate Python async gets is due to the `asyncio` library. It’s a shame that the default library is full of deprecated and gotcha-ridden APIs. Trio attempted to fix these, but adoption has been low.
The community settled on async for the same reason I love Go despite all its faults. It’s flawed, but you can build successful systems with it. Lots of companies still write new services in async Python instead of Go because, as big as the Go community is, Python’s is absolutely ginormous.
Plus, LLMs brought more new people to Python than most other languages, and it’s easier to find Python developers and teach them async than to hire Gophers.
How exactly are you trying to train and deploy this YOLO model? What kind of accuracy are you seeing against the validation set at the end of the training process?
100GB of storage is… not much for 1000 users. A 1TB NVMe SSD is $60. So 100GB is a total of $6 of storage… or about half a penny per user.
And that’s for SSD storage… an enterprise-grade 14TB hard drive is only $18/TB on Amazon right now, less than a third of the SSD price per TB. Call it 100GB = $2 of storage, total, to enable 1000 users to run the editor of their choice.
So, no, I’m not seeing the problem here.
If you really wanted to penny pinch (one whole penny for every 5 users), I think you could use btrfs deduplication to reduce the storage used.
Apparently you're counting as if you were a single person. In a big organization there's always overhead, both in time and in money, and more importantly there are more problems to solve than there are resources. So one has to arrange priorities and just keep low-priority things in check, not letting them boil over.
Preventing people from using their preferred tools — tools which are extremely widely used in the real world — does not seem like a useful application of time and effort.
Usefulness depends on conditions, and we don't know a lot about them here. Sure, there are situations where counteracting pressure is more expensive than expanding capacity. But frankly I doubt this particular case is one of those.
How many concurrent users can you run off a single NVMe SSD?
How many students leave their coursework to the last minute?
How do you explain that the server went down during the last hour before the submission deadline again, and that everyone gets an extension again, because you cheaped out and put the cheapest possible storage into a system that has to cope with large demand at peak times?
How many students now start to do worse because of the anxiety caused by these repeated outages?
How much more needs to be invested in the university counselling services to account for this uptick in students struggling?
No… it’s not. To quote the message earlier in the thread, that message said “everyone with >100MB of disk usage on the class server was a VSCode user.”
100MB * 1000 users is how the person I responded to calculated 100GB, which is storage.
Most of the RAM usage would likely just be executable files that are mmap’d from disk… not “real” RAM usage. But, also, the 1000 users in question wouldn’t all be connected at the same time… and I honestly doubt they would all be assigned to the same server for practical reasons anyways.
It’s not easy to estimate the real RAM usage with back of the napkin math.
Depending on what they're doing, it could easily be multiple GB per user. When you do VSCode remoting, pretty much everything but the UI is running on the server. This includes stuff like code analysis for autocompletion, which - especially for languages that require type inference to provide useful completions - can consume a lot of RAM, and a fair bit of CPU.
> I honestly doubt they would all be assigned to the same server for practical reasons anyways.
The computer science department at my university had multiple servers. All CS students got an account on the one same server by default. Access was granted to other servers on a case by case basis, based on very course-specific needs.
So yes, in my case, all CS undergrads used the same one server.
I'm somewhat surprised neither this article nor the previous one mention anything about the Florence-2 model series. I had thought that Florence-2 was not just surprisingly capable for this kind of work, but also easily fine-tunable for a particular kind of document, when you expect to process a lot of instances of that document and want to further optimize accuracy. It's extremely small (0.23B and 0.77B parameters), so it's easy to run, easy to fine-tune, and probably unlikely to overthink things.
I don't personally deal with any OCR tasks, so maybe I misread the room, but it sounded promising, and I have seen some continuing interest in it online elsewhere.
In addition to the architectural issues mentioned in OP's article that are faced by most SOTA LLMs, I also expect that current SOTA LLMs like Gemini 2.0 Flash aren't being trained with very many document OCR examples... for now, it seems like the kind of thing that could benefit from fine-tuning on that objective, which would help emphasize to the model that it doesn't need to try to solve any equations or be helpful in any smart way.
32B models are easy to run on 24GB of RAM at a 4-bit quant.
It sounds like you need to play with some of the existing 32B models with better documentation on how to run them if you're having trouble, but it is entirely plausible to run this on a laptop.
I can run Qwen2.5-Instruct-32B-q4_K_M at 22 tokens per second on just an RTX 3090.
My question was about running it unquantized. The author of the article didn't say how he ran it. If he quantized it, then saying he ran it on a laptop is not news.
I can't imagine why anyone would run it unquantized, but there are some laptops with more than the 70GB or so of RAM that would be required. It's not that it can't be done... it's just that quantizing to at least 8-bit seems to be standard practice these days, and DeepSeek has shown that it's even worth training at 8-bit resolution.
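(Rough napkin math: 32B parameters at 16 bits each is already about 64GB for the weights alone, before KV cache and runtime overhead, which is where a 70GB+ figure comes from; at a ~4-bit quant the weights shrink to roughly 16-20GB, which is why a single 24GB GPU can hold them.)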