I suppose /some/ performance loss is inevitable. But this could be quite a game changer. As more folks play with it, performing benchmarks, etc., it should reveal which C idioms incur the most/least performance hits under Fil-C. So with some targeted patching of C code, we may end up with a rather modest price for memory safety.
And I'm not done optimizing. The perf will get better. Rust and Yolo-C will always be faster, but right now we can't know what the difference will be.
Top optimization opportunities:
- InvisiCaps 2.0. While implementing the current capability model, when I was about 3/4 of the way done with the rewrite, I realized that if I had done it differently I would have avoided two branch+compares on every pointer load. That's huge! I just haven't had the appetite for doing yet another rewrite recently. But I'll do it eventually.
- ABI. Right now, Fil-C uses a binary interface that relies on lowering to what ELF is capable of. This introduces a bunch of overhead on every global variable access and every function call. All of this goes away if Fil-C gets its own object file format. That's a lot of work, but it will happen if Fil-C gets more adoption.
- Better abstract interpreter. Fil-C already has an abstract interpreter in the compiler, but it's not nearly as smart as it could be. For example, it doesn't have octagon domain yet. Giving it octagon domain will dramatically improve the performance of loops.
- More intrinsics. Right now, a lot of libc functions that are totally memory safe but are normally implemented in assembly are implemented in plain Fil-C instead, just because of how the libc ports happened to work out. Like, say you call some <math.h> function that takes doubles and returns doubles - it's going to be slower in Fil-C today because you'll end up in the generic C version compiled with Fil-C. No good reason for this! It's just grunt work to fix!
- The calling convention itself is trash right now - it involves passing things through a thread-local buffer. It's less trashy than the calling convention I started out with (that allocated everything in the heap lmao), but still. There's nothing fundamentally preventing a Fil-C register-based calling convention, but it would take a decent amount of work to implement.
There are probably other perf optimization opportunities that I'm either forgetting right now or that haven't been found yet. It's still early days!
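To make the octagon-domain point above concrete: an octagon domain tracks invariants of the form ±x ± y ≤ c, which is exactly the shape of fact needed to prove array indices in bounds. A hedged sketch (illustrative only, not Fil-C's actual analysis):

```c
#include <stddef.h>

/* Illustration of why an octagon domain helps loops (not Fil-C
 * internals): inside the loop body, the analysis can learn the octagon
 * facts -i <= 0 (i.e. i >= 0) and i - n <= -1, which together prove
 * a[i] is in bounds on every iteration. A bounds check that would
 * otherwise be emitted per access can then be removed or hoisted
 * out of the loop. */
long sum(const long *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];  /* provably in bounds given the loop condition */
    return total;
}
```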
I've always been firmly in the 'let it crash' camp for bugs; the sooner and the closer to the offending piece of code you can generate a crash, the better. Maybe it would be possible to embed Fil-C in a test suite combined with a fuzzing-like tool that varies input to try really hard to get a program to trigger an abend. As long as it is possible to fuzz your way to a crash in Fil-C, that would be a sign that there is more work to do.
That way 'passes Fil-C' would be a bit like running code under valgrind and move the penalty to the development phase rather than the runtime. Is this feasible or am I woolgathering, and is Fil-C only ever going to work by using it to compile the production code?
From what I understand some things in Fil-C work "as expected" instead of crashing (e.g. dereferencing a pointer to an out of scope variable will give you the old value of that variable), so it won't work as a sanitizer.
Fil-C will crash on memory corruption too. In fact, its main advantage is crashing sooner.
All the quick fixes for C that don't require code rewrites boil down to crashing. They don't make your C code less reliable, they just make the unreliability more visible.
To me, Fil-C is most suited to be used during development and testing. In production you can use other sandboxing/hardening solutions that have lower overhead, after hopefully shaking out most of the bugs with Fil-C.
The great thing about such crashes is that, if you have coredumps enabled, you can just load the crashed binary into GDB, type 'where', and most likely figure out immediately from inspecting the call stack what the actual problem is. This was/is my go-to method for finding really hard-to-reproduce bugs.
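As a toy illustration of that workflow (hypothetical program name and bug, just to show the shape of the session):

```c
#include <string.h>
#include <stddef.h>

/* Toy bug for the coredump workflow described above. Imagine this
 * compiled into a program `crashme` that sometimes gets handed a
 * NULL string:
 *
 *   $ ulimit -c unlimited    # enable coredumps
 *   $ ./crashme              # dies with SIGSEGV, writes a core file
 *   $ gdb ./crashme core     # core file name/location varies by system
 *   (gdb) where              # call stack points straight at parse()
 */
size_t parse(const char *s)
{
    return strlen(s);  /* segfaults when s == NULL */
}
```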
I think the issue with this approach is it’s perfectly reasonable in Fil-C to never call `free` because the GC will GC. So if you develop on Fil-C, you may be leaking memory if you run in production with Yolo-C.
Fil-C uses `free()` to mark memory as no longer valid, so it is important to keep using manual memory management to let Fil-C catch UAF bugs (which are likely symptoms of logic bugs, so you'd want to catch them anyway).
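A minimal sketch of the kind of bug this catches (the function name is made up; the comments describe the expected Fil-C behavior per the discussion above):

```c
#include <stdlib.h>

/* Classic use-after-free. Under a conventional toolchain, the read
 * after free() is undefined behavior and may quietly return stale
 * data; under Fil-C, free() marks the object invalid, so the
 * dereference traps right there instead of corrupting state later. */
int demo(int trigger)
{
    int *p = malloc(sizeof *p);
    if (!p) return -1;
    *p = 42;
    int before = *p;  /* valid read while the object is live */
    free(p);
    if (trigger)
        return *p;    /* UAF: Fil-C stops the program here */
    return before;
}
```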
The whole point of Fil-C is having C compatibility. If you're going to treat it as a deployment target on its own, it's a waste: you get overhead of a GC language, but with clunkiness and tedium of C, instead of nicer language features that ground-up GC languages have.
graydon points in that direction, but since you're here: how feasible is a hypothetical Fil-Unsafe-Rust? would you need to compile the whole program in Fil-Rust to get the benefits of Fil-Unsafe-Rust?
It's reasonably easy if you can treat the Safe Rust and Fil-Unsafe-Rust code as accessing different address spaces (in the C programming sense of "a broad subset of memory that a pointer is limited to", not the general OS/hardware sense), since that's essentially what the bespoke Fil-C ABI amounts to in the first place. Which of course is not really a good fit for every use of Unsafe Rust, but might suffice for some of them.
in my mind it would be doing what fil-c does for c to unsafe rust: a hypothetical memory safe implementation of unsafe rust using the same methods fil-c does e.g. gc
It can be done, especially with a safe non-GC language that can meaningfully guarantee it won't corrupt GC metadata or break its invariants. You only have real issues (and then only wrt. excess overhead, not unsoundness) with pervasive mutual references between the GC and non-GC parts of the program. You do need to promote GC pointers to a root anytime that non-GC code has direct access to them, and add finalizers to GC objects that may need to drop/run destructors on non-GC data.
Miri does do that? It is not aware of the distinction to begin with (which is one of the use cases of the tool: it lets us exercise safe code to ensure there aren't memory violations caused by incorrect MIR lowering). I might be misunderstanding what you mean. Miri's big limitation is not being able to interface with FFI.
hmmm I thought miri was used in the compiler for static analysis, wasn't aware it's a runtime interpreter.
I guess the primary reason would be running hardened code in production without compromising performance too much, same as you would run Fil-C compiled software instead of the usual way. I've no idea if it's feasible to run miri in prod.
I guess the confusion happens because MIR (the representation), mir (the compiler stage), stable MIR (the potential future API for hooking into that stage), and miri (the MIR interpreter) all share pretty much the same name. Const evaluation uses MIR, and that's the most likely culprit. Miri is an interpreter (as you found out on your own now), and it is not meant for production workloads due to the slowdown it introduces, which limits it to test suites and debugging.
From my understanding Fil-C is implemented on top of LLVM, so it should be possible to build integration to have a Fil-Rust binary that is slower but gives you some of the benefits of miri. I see value in doing something like that. There are plenty of other languages that would be well served by this too!
- Don’t put flags in the high bits of the aux pointer. Instead if an object has flags, it’ll have a fatter header. Most objects don’t have flags.
- Give up on lock freedom of atomic pointers. This is a fun one because theoretically, it’s worse. But it comes with a net perf improvement because there’s no need to check the low bit of lowers.
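If I understand the first point, the layout change is roughly this (a hypothetical sketch, not Fil-C's real structs):

```c
#include <stdint.h>

/* Hypothetical sketch of the "fatter header" idea: instead of stealing
 * high bits of the aux pointer for flags (which must be masked off on
 * every use), the rare flagged objects get a larger header. The common
 * case keeps a plain pointer with no flag bits to strip. */
struct obj_header {           /* every object: aux is a full pointer */
    void *aux;
};

struct flagged_obj_header {   /* rare: only objects that carry flags */
    void *aux;
    uint32_t flags;
};
```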
If you are not writing anything performance sensitive, you shouldn't be using C in the first place. Even if Fil-C greatly reduces its overhead, I can't see it ever being a good idea for actual release builds.
As a Linux user of two decades, memory safety has never been a major issue that I would be willing to trade performance for. It doesn't magically make my application work; it just panics instead of crashing, same end result for me. It just makes it so the issue cannot be exploited by an attacker. Which is good, but Linux has already been safe enough to be the main choice to run on servers, so meh. The whole memory safety cult is weird.
I guess Fil-C could have a place in the testing pipeline. Run some integration tests on builds made with it and see if stuff panics.
That said, Fil-C is a super cool project. I don't mean to throw any shade at it.
udevd might actually be a good use for Fil-C. Good point.
My fear is that the performance difference might add up once you use it on more and more parts. I imagine it uses a lot more memory. Plus, once Fil-C gets adopted in the mainstream, it might lower the need for devs to actually fix their code; they might start just relying on Fil-C.
To be fair, systemd itself is corporate shite to begin with and I wouldn't mind seeing it being replaced with something written in a language with memory safety.
People with Linux servers keep getting hacked so idk if I buy the argument “if it’s in use it’s good enough”. That’s like saying “everyone else runs Pentium 2, why would I upgrade to Pentium 3?”
While memory safety can help reduce many security vulnerabilities, it is not the only source of vulnerabilities. Furthermore, as for getting hacked, I would suspect the main problems to be social engineering, bad configuration, and lack of maintenance, and not really the software itself being insecure.
> That’s like saying “everyone else runs Pentium 2, why would I upgrade to Pentium 3?”
No one should blindly upgrade because a bigger number is better. If I look into new hardware, I research benchmarks and figure out whether it would enable me to (better) run the software/games I care about and whether the improvement is worth my money.
Same with security. You need to read actual studies and figure out what the cost/benefit of certain measures is.
There are safer alternatives to Linux but apparently the situation isn't bad enough for people to switch to them.
And I am not saying you should create new projects in C or C++. Most people should not. But there is a lot of battle-tested C and C++ code out there, and to act as if we suddenly have this big problem with memory safety is a weird narrative to push. And if you discover a vulnerability, well, fix it instead of wrapping it in Fil-C and making the whole thing slower.
It would be nice indeed if there was a good solution to multi-gigabyte conda directories. Conda has been reproducible in my experience with pinned dependencies in the environment YAML... slow to build, sure, but reproducible.
I'd argue bzip2 compression was a mistake for Conda. There was a time when I had Conda packages made for the CUDA libraries so conda could locally install the right version of CUDA for every project, but boy, it took forever for Conda to unpack 100MB+ packages.
uv does it by caching versions of packages so they can be shared across projects/environments. So you still have to store those multi-gig directories, but you don't have so much duplication. If conda could do something similar, that would be great.
> because new poetry releases would stall trying to resolve dependencies.
> uv is much faster than both of these tools
conda is also (in)famous for being slow at this, although the new mamba solver is much faster. What does uv do in order to resolve dependencies much faster?
> What does uv do in order to resolve dependencies much faster?
- Representing version numbers as a single integer for fast comparison.
- Being implemented in Rust rather than Python (compared to Poetry).
- Parallel downloads.
- Caching individual files rather than zipped wheels, so installation is just hard-linking files, zero copy (on Unix at least). Also makes it very storage efficient.
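The hard-linking point is an OS-level trick, so a small sketch shows the whole idea (hypothetical helper names; this is a miniature of what uv's cache does, not its implementation):

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* "Install" a cached file by creating a hard link: the cache copy and
 * the installed copy share one inode, so the install costs a directory
 * entry, not a copy of the data. (link() fails across filesystems, in
 * which case a real tool would fall back to copying.) */
int install_by_link(const char *cached, const char *dest)
{
    return link(cached, dest);  /* 0 on success */
}

/* Number of directory entries pointing at this path's inode. */
long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```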
If I download a python project from someone on the same network as me, and they wrote it for a different python version than mine and shipped a requirements.txt, I need all those things anyway.
I mean, if you use == constraints instead of >= you can avoid getting different versions, and if you’ve used it (or other things which combined have a superset of the requirements) you might have everything locally in your uv cache, too.
But, yes, python scripts with in-script dependencies plus uv to run them doesn't change dependency distribution, just streamlines use compared to manual setup of a venv per script.
Of course - but that is the best-case scenario. You will need to support other kinds of queries as well, including writes, which is where it gets even more complicated. The guarantees provided by your RDBMS go away when you shard your database like this. Transactions are local to each database, so writes to multiple shards cannot be a single transaction anymore.
Indeed. KYC has a purpose though -- prevention of fraud, money laundering, etc. Getting rid of KYC without a similarly-effective solution for those things seems unlikely. Ideas?
That’s not really true. Most financial crimes are big operations facilitated by banks. Criminals love KYC because that’s a chance to make their operations seem legit.
If you get rid of cocaine, the need for rehab centers also vanishes.
There is no way to “get rid of cryptocurrencies” at this point save for shutting off the internet. It is not within the power of the state to prohibit, any more than prostitution or cocaine.