We just released a tiny (~3kloc) Python library that implements state-of-the-art inference algorithms on GPU and provides performance similar to vLLM. We believe it's a great learning vehicle for inference techniques and the code is quite easy to hack on!
Hi, thanks for the information, but LLVM still does not provide ABI compatibility. If you reduce the struct to 3 i32, it is passed in edi, esi, and edx on my machine. However, according to the ABI, it should be packed into rdi and rsi.
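To make that concrete, here is a minimal C sketch of the case I mean (names are mine, untested):

    /* A 12-byte struct of three i32. Under the SysV x86-64 ABI it is
       classified as two INTEGER eightbytes, so it should be passed
       packed in rdi (a and b) and rsi (c), not spread over edi/esi/edx. */
    struct s { int a, b, c; };

    int sum(struct s v) { return v.a + v.b + v.c; }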
Check out the QBE transcription and what it will compile to: http://c9x.me/paste/mGOO (there is a bit of register shuffling because hinting in the register allocator is not very mature yet, but note that SSA form is not required for the input!).
I actually decided to take more time to answer your comment more thoroughly than the others. Also, I twice TA'd the class that the article you linked below comes from, so I know about it :).
Great! One of my main reasons for posting here is getting the next generation of high-assurance developers the info they need, plus learning from them in what's not my specialty (esp. formal verification). I just hate missed opportunities, given that I rarely run into people who even know what the phrase means or why it matters. ;)
And also, we are seeing more and more certified C programs: see the DeepSpec NSF expedition grant, the Verified Software Toolchain, and the CertiKOS project for examples. I work with these guys.
You are soooo lucky. DeepSpec has a near dream team of people working on this issue. Appel and Chlipala alone could probably knock out most of the problem given enough time. Add the others, and the great stuff on the publication list is entirely unsurprising. Except in its cleverness. :) Glad you brought it up, as I haven't read the information-flow for C & ASM paper yet.
Btw, how's progress coming on those projects? Specifically, are any of the tools (a) useful for non-experts with a little training from tutorials, etc., and (b) available for download in open-source or binary form yet? Thanks ahead of time.
Then maybe I did not express myself in the best terms. QBE definitely supports stack slots and their registerization! Minic, a small C frontend shipped with QBE, makes use of them.
The difference is that LLVM forces you to use them even when you know your locals cannot escape (e.g., when your source language is Pascal); QBE doesn't.
So, LLVM makes you use stack slots for two independent problems: 1. compiling languages like C, where locals can escape, and 2. avoiding the construction of SSA form in the frontend. In QBE, you use stack slots (alloc4, alloc8) to solve 1, but to solve 2 you can simply emit non-SSA form and QBE will fix things up for you (see the sketch below).
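For instance, here is a hand-written sketch in the IL (take the exact spelling with a grain of salt): %a is assigned on two paths and the frontend emits no phi; QBE's internal SSA construction inserts it.

    function w $abs(w %a) {
    @start
        %neg =w csltw %a, 0       # 1 if %a < 0
        jnz %neg, @flip, @end
    @flip
        %a =w sub 0, %a           # %a redefined: the input is not SSA
        jmp @end
    @end
        ret %a                    # QBE inserts the needed phi itself
    }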
> you can simply emit non-ssa form and QBE will fixup things for you
That's exactly what LLVM does, though, except the "non-SSA form" involves loads/stores to alloca'd values. The difference is that in LLVM the language is always in SSA form, it just has some reads/writes to memory that can be pruned, while QBE alternates between being an SSA language and not being one.
LLVM also doesn't "make" you use stack slots. If you wanted to, you could emit fully pruned programs as LLVM IR. Using allocas for variables is a choice that clang makes.
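For illustration, here is roughly what that choice looks like, in a hand-written LLVM IR sketch (not actual clang output):

    define i32 @inc(i32 %x) {
    entry:
      ; clang-style lowering: the local lives in a stack slot
      %x.addr = alloca i32
      store i32 %x, i32* %x.addr
      %v = load i32, i32* %x.addr
      %r = add i32 %v, 1
      ret i32 %r
    }

Note that even this is structurally SSA: every name is assigned exactly once, and the non-SSA-ness is confined to memory traffic that can later be pruned.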
Also, QBE does not really "alternate" SSA/non-SSA: SSA form is built once at the beginning of the compilation pipeline and preserved afterwards.
I don't understand what you mean by "fully pruned programs"; maybe you mean pruned SSA form. In any case, here is my point: with LLVM, either you build SSA yourself or you use allocas. QBE offers a convenient third option.
Some CFG transforms are actually much easier if you get out of SSA form first, reshuffle the CFG without caring about maintaining your phis, and then simply rebuild SSA form; merging or duplicating blocks, for example, would otherwise mean patching up phi arguments by hand.
Expected to see you here. Hey, send me an email or something so I can send you interesting stuff without hunting through your profile or random comments on HN. Address is in my profile. Here's the one I was saving for you, to complement the ML and Haskell CPUs I linked.
Cool stuff, eh? That they keep it close to a regular RISC processor means optimizations for those should carry over. Unlike the Fifth Generation stuff that tried to go way, way the hell too far with Prolog hardware. ;) Should fit nicely into my concept of general-purpose CPUs with purpose-built coprocessors. I also speculate the techniques might be helpful for ASICs meant for today's big-data apps that use things like Datalog for queries. Ya think?
> I don't understand what you mean by "fully pruned programs". Maybe you want to refer to pruned SSA form.
The mem2reg pass in LLVM is the recommended method of constructing pruned SSA form. It just implements the standard algorithm.
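Concretely, running it over the alloca-style sketch above (e.g. opt -mem2reg -S inc.ll, filename mine) should leave something like:

    define i32 @inc(i32 %x) {
    entry:
      %r = add i32 %x, 1
      ret i32 %r
    }

With branches in the picture, mem2reg inserts phis only where the value is live, which is exactly what "pruned" refers to.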
> QBE does not really "alternate" SSA/non-SSA, SSA form is built once at the beginning of the compilation pipeline and preserved later.
The QBE IL presented to the user is not in SSA form, but internally, within your compiler, an SSA representation is kept. I prefer not to have syntactic sugar in my IL.
We've taken 'LLVM' out of the title above, since experience has shown that discussions about titles tend to be off-topic and/or shallow. We also added 'Show HN' since this is your own work. Good luck!
It's much, much smaller (I think libfirm is over 100kloc; QBE is about 6k).
But the major difference is the IL: I use a human-readable, easily printable text IL. This means that you don't need a graph-viewing tool to read it (it's just text) and that you can modify it between two passes super easily. This simple IL is a blessing when debugging a compiler.
I think QBE also has better support for the x64 ABI.
Finally, it is much less advanced than libfirm (fewer optimizations, less testing) and supports only x64 as a target.
Thank you for your words. It is often called NIH, but eh, I learned a lot! And I think that I made some modest improvements over LLVM; you can check them out in my comparison at http://c9x.me/compile/doc/llvm.html
Looks good! I know your target is 70% of the performance, but is there anything fundamental to QBE that means it couldn't be more? Suppose I ported my compiler from LLVM to QBE (I think I could do so without too much effort): would I, at some point, be able to port some of LLVM's optimization passes to QBE to get my performance up to par, or is there a design decision you made that will get in the way of the last 30%?
Great, I have the same goal for my compiler. I'd love for the compiler to be able to bootstrap its own backend, and as it only compiles C, that would rule out LLVM.
I am not sure how far along I am in actually compiling C; if I had to guess, I'd say around 60%. Hopefully there aren't too many crazy things on the horizon.
I'm not a C programmer myself, so I first implemented the switch statement in a naive way, and then I discovered how they actually work and spent days getting it right.
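For anyone wondering what bit me: a C switch is really a computed jump into a statement, cases fall through by default, and case labels can appear even inside nested statements (Duff's device being the extreme case). A small sketch of the fallthrough part (function name is mine, untested):

    /* For n = 0 or 1 this returns 11, for n = 2 it returns 10; a naive
       "list of independent branches" model gets this wrong. */
    int category(int n) {
        int c = 0;
        switch (n) {
        case 0:
        case 1:
            c = 1;     /* 0 and 1 share a body... */
            /* fall through */
        case 2:
            c += 10;   /* ...and 0, 1, and 2 all execute this */
            break;
        default:
            c = -1;
        }
        return c;
    }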
If I get to the point where I can compile trivial C programs like those from the benchmarks game, I'll research a move to QBE :)