I suppose /some/ performance loss is inevitable. But this could be quite a game changer. As more folks play with it, performing benchmarks, etc., it should reveal which C idioms incur the most/least performance hits under Fil-C. So with some targeted patching of C code, we may end up with a rather modest price for memory safety.
And I'm not done optimizing. The perf will get better. Rust and Yolo-C will always be faster, but right now we can't know what the difference will be.
Top optimization opportunities:
- InvisiCaps 2.0. While implementing the current capability model, when I was about 3/4 of the way done with the rewrite, I realized that if I had done it differently I would have avoided two branch+compares on every pointer load. That's huge! I just haven't had the appetite for doing yet another rewrite recently. But I'll do it eventually.
- ABI. Right now, Fil-C uses a binary interface that relies on lowering to what ELF is capable of. This introduces a bunch of overhead on every global variable access and every function call. All of this goes away if Fil-C gets its own object file format. That's a lot of work, but it will happen if Fil-C gets more adoption.
- Better abstract interpreter. Fil-C already has an abstract interpreter in the compiler, but it's not nearly as smart as it could be. For example, it doesn't have octagon domain yet. Giving it octagon domain will dramatically improve the performance of loops.
- More intrinsics. Right now, a lot of libc functions that are totally memory safe but are normally implemented in assembly are implemented in plain Fil-C instead, just because of how the libc ports happened to work out. Like, say you call some <math.h> function that takes doubles and returns doubles - it's going to be slower in Fil-C today because you'll end up in the generic C version compiled with Fil-C. No good reason for this! It's just grunt work to fix!
- The calling convention itself is trash right now - it involves passing things through a thread-local buffer. It's less trashy than the calling convention I started out with (that allocated everything in the heap lmao), but still. There's nothing fundamentally preventing a Fil-C register-based calling convention, but it would take a decent amount of work to implement.
There are probably other perf optimization opportunities that I'm either forgetting right now or that haven't been found yet. It's still early days!
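To make the octagon-domain point above concrete: an octagon domain tracks invariants of the form ±x ± y ≤ c, which is exactly the shape of fact needed to prove array indices in bounds. A hedged sketch (illustrative only, not Fil-C's actual analysis):

```c
#include <stddef.h>

/* Illustration of why an octagon domain helps loops (not Fil-C
 * internals): inside the loop body, the analysis can learn the octagon
 * facts -i <= 0 (i.e. i >= 0) and i - n <= -1, which together prove
 * a[i] is in bounds on every iteration. A bounds check that would
 * otherwise be emitted per access can then be removed or hoisted
 * out of the loop. */
long sum(const long *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];  /* provably in bounds given the loop condition */
    return total;
}
```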
I've always been firmly in the 'let it crash' camp for bugs; the sooner and the closer to the offending piece of code you can generate a crash, the better. Maybe it would be possible to embed Fil-C in a test suite combined with a fuzzing-like tool that varies input to try really hard to get a program to trigger an abend. As long as it is possible to fuzz your way to a crash in Fil-C, that would be a sign that there is more work to do.
That way 'passes Fil-C' would be a bit like running code under valgrind and move the penalty to the development phase rather than the runtime. Is this feasible or am I woolgathering, and is Fil-C only ever going to work by using it to compile the production code?
From what I understand some things in Fil-C work "as expected" instead of crashing (e.g. dereferencing a pointer to an out of scope variable will give you the old value of that variable), so it won't work as a sanitizer.
Fil-C will crash on memory corruption too. In fact, its main advantage is crashing sooner.
All the quick fixes for C that don't require code rewrites boil down to crashing. They don't make your C code less reliable, they just make the unreliability more visible.
To me, Fil-C is most suited to be used during development and testing. In production you can use other sandboxing/hardening solutions that have lower overhead, after hopefully shaking out most of the bugs with Fil-C.
The great thing about such crashes is that, if you have coredumps enabled, you can just load the crashed binary into GDB, type 'where', and most likely figure out immediately from inspecting the call stack what the actual problem is. This was/is my go-to method for finding really hard-to-reproduce bugs.
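As a toy illustration of that workflow (hypothetical program name and bug, just to show the shape of the session):

```c
#include <string.h>
#include <stddef.h>

/* Toy bug for the coredump workflow described above. Imagine this
 * compiled into a program `crashme` that sometimes gets handed a
 * NULL string:
 *
 *   $ ulimit -c unlimited    # enable coredumps
 *   $ ./crashme              # dies with SIGSEGV, writes a core file
 *   $ gdb ./crashme core     # core file name/location varies by system
 *   (gdb) where              # call stack points straight at parse()
 */
size_t parse(const char *s)
{
    return strlen(s);  /* segfaults when s == NULL */
}
```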
I think the issue with this approach is it’s perfectly reasonable in Fil-C to never call `free` because the GC will GC. So if you develop on Fil-C, you may be leaking memory if you run in production with Yolo-C.
Fil-C uses `free()` to mark memory as no longer valid, so it is important to keep using manual memory management to let Fil-C catch UAF bugs (which are likely symptoms of logic bugs, so you'd want to catch them anyway).
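A minimal sketch of the kind of bug this catches (the function name is made up; the comments describe the expected Fil-C behavior per the discussion above):

```c
#include <stdlib.h>

/* Classic use-after-free. Under a conventional toolchain, the read
 * after free() is undefined behavior and may quietly return stale
 * data; under Fil-C, free() marks the object invalid, so the
 * dereference traps right there instead of corrupting state later. */
int demo(int trigger)
{
    int *p = malloc(sizeof *p);
    if (!p) return -1;
    *p = 42;
    int before = *p;  /* valid read while the object is live */
    free(p);
    if (trigger)
        return *p;    /* UAF: Fil-C stops the program here */
    return before;
}
```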
The whole point of Fil-C is having C compatibility. If you're going to treat it as a deployment target on its own, it's a waste: you get overhead of a GC language, but with clunkiness and tedium of C, instead of nicer language features that ground-up GC languages have.
graydon points in that direction, but since you're here: how feasible is a hypothetical Fil-Unsafe-Rust? would you need to compile the whole program in Fil-Rust to get the benefits of Fil-Unsafe-Rust?
It's reasonably easy if you can treat the Safe Rust and Fil-Unsafe-Rust code as accessing different address spaces (in the C programming sense of "a broad subset of memory that a pointer is limited to", not the general OS/hardware sense), since that's essentially what the bespoke Fil-C ABI amounts to in the first place. Which of course is not really a good fit for every use of Unsafe Rust, but might suffice for some of them.
in my mind it would be doing what fil-c does for c to unsafe rust: a hypothetical memory safe implementation of unsafe rust using the same methods fil-c does e.g. gc
It can be done, especially with a safe non-GC language that can meaningfully guarantee it won't corrupt GC metadata or break its invariants. You only have real issues (and then only wrt. excess overhead, not unsoundness) with pervasive mutual references between the GC and non-GC parts of the program. You do need to promote GC pointers to a root anytime that non-GC code has direct access to them, and add finalizers to GC objects that may need to drop/run destructors on non-GC data.
Miri does do that? It is not aware of the distinction to begin with (which is one of the use cases of the tool: it lets us exercise safe code to ensure there aren't memory violations caused by incorrect MIR lowering). I might be misunderstanding what you mean. Miri's big limitation is not being able to interface with FFI.
hmmm I thought miri was used in the compiler for static analysis, wasn't aware it's a runtime interpreter.
I guess the primary reason would be running hardened code in production without compromising performance too much, same as you would run Fil-C compiled software instead of the usual way. I've no idea if it's feasible to run miri in prod.
I guess the confusion happens because MIR (the representation), mir (the compiler stage), stable MIR (the potential future API for hooking into that stage), and miri (the MIR interpreter) all share pretty much the same name. Const evaluation uses MIR, and that's the most likely culprit. Miri is an interpreter (as you found out on your own now), and it is not meant for production workloads due to the slowdown it introduces, which limits it to test suites and debugging.
From my understanding Fil-C is implemented on top of LLVM, so it should be possible to build integration to have a Fil-Rust binary that is slower but gives you some of the benefits of miri. I see value in doing something like that. There are plenty of other languages that would be well served by this too!
- Don’t put flags in the high bits of the aux pointer. Instead if an object has flags, it’ll have a fatter header. Most objects don’t have flags.
- Give up on lock freedom of atomic pointers. This is a fun one because theoretically, it’s worse. But it comes with a net perf improvement because there’s no need to check the low bit of lowers.
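If I understand the first point, the layout change is roughly this (a hypothetical sketch, not Fil-C's real structs):

```c
#include <stdint.h>

/* Hypothetical sketch of the "fatter header" idea: instead of stealing
 * high bits of the aux pointer for flags (which must be masked off on
 * every use), the rare flagged objects get a larger header. The common
 * case keeps a plain pointer with no flag bits to strip. */
struct obj_header {           /* every object: aux is a full pointer */
    void *aux;
};

struct flagged_obj_header {   /* rare: only objects that carry flags */
    void *aux;
    uint32_t flags;
};
```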
If you are not writing anything performance sensitive, you shouldn't be using C in the first place. Even if Fil-C greatly reduces its overhead, I can't see it ever being a good idea for actual release builds.
As a Linux user of two decades, memory safety has never been a major issue that I would be willing to trade performance for. It doesn't magically make my application work; it just panics instead of crashing, same end result for me. It just makes it so the issue cannot be exploited by an attacker. Which is good, but Linux has already been safe enough to be the main choice to run on servers, so meh. The whole memory safety cult is weird.
I guess Fil-C could have a place in the testing pipeline. Run some integration tests on builds made with it and see if stuff panics.
That said, Fil-C is a super cool project. I don't mean to throw any shade at it.
udevd might actually be a good use for Fil-C. Good point.
My fear is that the performance difference might add up once you use it on more and more parts. I imagine it uses a lot more memory. Plus, once Fil-C gets adopted in the mainstream, it might lower the need for devs to actually fix their code; they might start just relying on Fil-C.
To be fair, systemd itself is corporate shite to begin with and I wouldn't mind seeing it being replaced with something written in a language with memory safety.
People with Linux servers keep getting hacked so idk if I buy the argument “if it’s in use it’s good enough”. That’s like saying “everyone else runs Pentium 2, why would I upgrade to Pentium 3?”
While memory safety can help reduce many security vulnerabilities, it is not the only source of vulnerabilities. Furthermore, as for getting hacked, I would suspect the main problems to be social engineering, bad configuration, and lack of maintenance, and not really the software itself being insecure.
> That’s like saying “everyone else runs Pentium 2, why would I upgrade to Pentium 3?”
No one should blindly upgrade because a bigger number is better. If I look into new hardware, I research benchmarks and figure out whether it would enable me to (better) run the software/games I care about and whether the improvement is worth my money.
Same with security. You need to read actual studies and figure out what the cost/benefit of certain measures is.
There are safer alternatives to Linux but apparently the situation isn't bad enough for people to switch to them.
And I am not saying you should create new projects in C or C++. Most people should not. But there is a lot of battle-tested C and C++ code out there, and to act as if we suddenly have this big problem with memory safety is a weird narrative to push. And if you discover a vulnerability, well, fix it instead of wrapping it in Fil-C and making the whole thing slower.
It would be nice indeed if there was a good solution to multi-gigabyte conda directories. Conda has been reproducible in my experience with pinned dependencies in the environment YAML... slow to build, sure, but reproducible.
I'd argue bzip2 compression was a mistake for Conda. There was a time when I had Conda packages made for the CUDA libraries so conda could locally install the right version of CUDA for every project, but boy, it took forever for Conda to unpack 100MB+ packages.
uv does it by caching versions of packages so they can be shared across projects/environments. So you still have to store those multi-gig directories, but you don't have so much duplication. If conda could do something similar, that would be great.
> because new poetry releases would stall trying to resolve dependencies.
> uv is much faster than both of these tools
conda is also (in)famous for being slow at this, although the new mamba solver is much faster. What does uv do in order to resolve dependencies much faster?
> What does uv do in order to resolve dependencies much faster?
- Representing version numbers as a single integer for fast comparison.
- Being implemented in Rust rather than Python (compared to Poetry).
- Parallel downloads.
- Caching individual files rather than zipped wheels, so installation is just hard-linking files, zero copy (on Unix at least). Also makes it very storage efficient.
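The hard-linking point is an OS-level trick, so a small sketch shows the whole idea (hypothetical helper names; this is a miniature of what uv's cache does, not its implementation):

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* "Install" a cached file by creating a hard link: the cache copy and
 * the installed copy share one inode, so the install costs a directory
 * entry, not a copy of the data. (link() fails across filesystems, in
 * which case a real tool would fall back to copying.) */
int install_by_link(const char *cached, const char *dest)
{
    return link(cached, dest);  /* 0 on success */
}

/* Number of directory entries pointing at this path's inode. */
long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```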
If I download a python project from someone on the same network as me, and they wrote it for a different python version than mine and shipped a requirements.txt, I need all those things anyway.
I mean, if you use == constraints instead of >= you can avoid getting different versions, and if you’ve used it (or other things which combined have a superset of the requirements) you might have everything locally in your uv cache, too.
But, yes, python scripts with in-script dependencies plus uv to run them doesn't change dependency distribution, just streamlines use compared to manual setup of a venv per script.
Of course - but that is the best-case scenario. You will need to support other kinds of queries as well, including writes, which is where it gets even more complicated. The guarantees provided by your RDBMS go away when you shard your database like this. Transactions are local to each database, so writes to multiple shards cannot be a single transaction anymore.
Indeed. KYC has a purpose though -- prevention of fraud, money laundering, etc. Getting rid of KYC without a similarly-effective solution for those things seems unlikely. Ideas?
That’s not really true. Most financial crimes are big operations facilitated by banks. Criminals love KYC because that’s a chance to make their operations seem legit.
If you get rid of cocaine, the need for rehab centers also vanishes.
There is no way to “get rid of cryptocurrencies” at this point save for shutting off the internet. It is not within the power of the state to prohibit, any more than prostitution or cocaine.