nanolith's comments (Hacker News)

The Ben Eater computer is an interesting starting point. With a few modifications, such as an extended program counter / address bus, a combinatorial ALU programmed in flash, and a few more registers, it can be upgraded to a more powerful, if slow, computer.

Further, these circuits can be directly simulated in an FPGA if you want to teach yourself an HDL like VHDL or Verilog. Eventually you'll want to learn better and faster ways to organize a CPU, but it's a fun way for a beginner to bridge into this kind of work.

At the local hackerspace, we actually taught a class that assembled a hybrid 8-bit / 16-bit variation of this computer with 1MB of RAM, an 8-bit ALU, a 16-bit data bus / registers, and a 20-bit address bus. With this additional capability, we could program it over serial with a very basic bootloader, and it was powerful enough to compute pi to 1000 digits. Granted, this took about 18 hours, but it could have run faster with a printed circuit board and a crystal oscillator. We just never really got to that point.

As for how powerful such a computer built from discrete logic can be, consider that the KB-11A CPU for the PDP-11/45 was built using 74-series logic ICs. It wasn't until after this system that DEC moved on to VLSI. So, this is a way to play with technology found in computers in 1972 / 1973.

As a stepping stone to digital logic and FPGA work, Ben Eater's tutorial isn't bad. Just get yourself a copy of Don Lancaster's TTL Cookbook or CMOS Cookbook to bring his tutorial up to best practices.


It's a clever use of assembler, but in production code, it's much better to use a bounded model checker, like CBMC, to verify memory safety across all possible execution paths.


To each their own. With Vim, Unix is my IDE. I don't know about the recent interest in these editors that you mention. I've been using vi/Vim for the past 30 years. I take it to every project and job. My fingers already know what to do. I've watched colleagues who I started working with 20 years ago as they've retooled on the latest hotness every 4-5 years: Visual Studio, NetBeans, Eclipse, JetBrains, VS Code, etc. It doesn't take long to learn to use a new IDE, but they are definitely shorter-term investments.

I can do more or less the same thing most folks can with an IDE; I just use external tools. I wouldn't claim that Vim is somehow superior. It's just what I use. Every now and then, I noodle a bit on a personal editor that is to ed what Vim is to vi. At some point, I'll migrate to it.

I think the editor folks have a somewhat different philosophy. I can't speak for them, but I can speak for me. I like to feel closer to the code base. I like to have more of it in my head. The best analogy I've found is that using an editor like Vim or Emacs is closer to driving with a manual transmission and tight steering, compared to driving with an automatic transmission and the partial self-driving features found in modern cars. There is definitely something to be said for things like adaptive cruise control, lane keeping assist, GPS navigation, etc. But, if you talk to a manual transmission enthusiast, there is a thrill in feeling closer to the road and being more engaged. Both drivers arrive at the destination in the same amount of time. But, if you ask each about their experience, they will have very different views of the drive organized in their heads.


> I wouldn't claim that Vim is somehow superior. It's just what I use.

Exactly, that's your editor and you are used to it. There is no reason at all for you to consider changing editors.

I use Sublime Text, it's my editor, it has been for a decade and I am used to it.

I don't have Copilot or similar, and I'm glad I don't. Not interested in such a thing.


To each their own.

And yet I get down-voted for expressing a well-reasoned opinion against vim and Emacs.

I've been using vi/Vim for the past 30 years. I take it to every project and job.

I've rarely used an IDE that didn't allow custom key bindings, often with the ability to select a set from a drop-down list to match other IDEs. I've been using mostly the same keyboard shortcuts across IDEs for over 20 years.

if you talk to a manual transmission enthusiast, there is a thrill of feeling closer to the road and being more engaged

Funny you should say that, because I regularly enrage these types by pointing out that if they can't stay engaged as a driver with an automatic transmission, then the problem is with them, not the car. This is a quasi-religious ritual for these people, and a very low-effort way to get a sense of superiority over others (every driver on the road before ~1970 drove a manual transmission, and literally anyone can learn in a few hours; it's not a skill to be proud of).


and a very low-effort way to get a sense of superiority over others... literally anyone can learn in a few hours.

I agree that it is a skill that is easy to learn. The same is true of IDEs. This isn't about skill or superiority, but comfort. Some folks are more comfortable being closer to the machine or the road, as it were. Others are more comfortable having some automation between them and the machine. I think it's better to consider this a matter of personal preference.

The IDE adds a layer of abstraction, and abstraction can be leaky. If you are comfortable with the abstraction, and with the opinionated choices the IDE makes, that's fine. If you are not, that's also fine. All that I ask when I'm bootstrapping a project with a team is that projects be arranged such that they are IDE / editor agnostic. Use standard build configuration / build tools that have appropriate plugins for IDEs, and can also be run in the terminal / command-line. Then, the individual developer can choose to use whichever editor or IDE that developer is comfortable using.


This isn't about skill or superiority, but comfort.

The down-votes that I continue to get in this thread tell a different story.


> And yet I get down-voted for expressing a well-reasoned opinion against vim and Emacs.

Calling people whose editor preferences differ from yours "obsessed" "editor-hipsters" is not a well-reasoned opinion, nor is saying that people who drive manual are engaged in a "quasi-religious ritual" to feel a false "sense of superiority". Those are just insults. Hence the downvotes, I suspect.

No one here is forcing you to use any editor, you've only got people trying to explain to you what they find valuable in the tools you're attacking. You're coming off as someone with a major chip on his shoulder. You may use whatever editor you like, but you should consider extending the same courtesy to others.


Leave it to politicians to demonize a language. Model checked C/C++ is memory safe. Had they reached out to a wider set of people for guidance, they'd have a more balanced report.

I will agree that software safety -- not just memory safety -- is critical. Trying to attack this at the language level instead of the development process and assurance level is daft. FIPS certification and aerospace certification both require auditing already. It's not much of a stretch to require an audit of critical infrastructure to verify that safety processes are in place.

Simply adopting a different language won't make software safe. It will make it safer, perhaps, but we can do better. Model checked code -- be it written in Rust, C, or C++ -- is on the same level. Tools exist for each. That is what CISA should focus on, not trying to force organizations to migrate their code bases to some new and shiny language.


I'm a little familiar with TLA+ but it can't verify the actual code you want to run, only a restatement of your intended algorithm. Are there published model checkers that can check real C++? How would they catch double-free or use-after-free or double-throw?


Yep. ESBMC and PolySpace both work for C++.

I use CBMC for C.

They run an abstract machine model through an SMT solver. Among other things, this abstract machine model tracks memory usage, UAF, aliasing, etc. With custom assertions, it can also cover issues like casting safety, serialization issues, logic errors, etc. More or less, you can build up any proof obligations you need and run them through the model checker. It will provide counter-examples if it detects a failure.


I'm definitely digging into this, thank you!

I'm a little concerned about ESBMC's "simulating a finite prefix of the program execution with all possible inputs" when important software like operating systems is intended to run indefinitely. Would they start with a valid state (filesystem, process list) and prove that no invalid state is reachable within n syscalls or interrupts? Maybe what I was hoping for is proving invariants.


So, the way I deal with this is to provide an exit condition in the loop that is triggered in a non-deterministic finite way by the model checker but that is not triggered at runtime. This allows for termination, which is required as part of the model check, while maintaining the loop.

The best way that I've found to model check software is to decompose the software by function, defining function contracts and invariants along the way. To reduce complexity, shadow functions that follow the same function contracts and maintain the same invariants can be substituted. A shadow function uses non-determinism to exercise the same range of behaviors that the real function provides in a way that is simpler for the model checker to verify. When verifying a function with the model checker, you can substitute functions it calls with these shadow functions to reduce the complexity of the model check.

The example server I'm using for the book initiates a TLS connection with the client. Each connection is maintained by a fiber that enters into a command-response loop. When verifying this loop function in the model checker, I can shadow the functions that read the command and send the response. As long as these shadows follow the same function contract as the function, I can verify that the loop function is correct. For instance, the read command function returns a command in the output variable on success that must be freed by the caller. The dispatch function that dispatches the command and sends the response expects this command to be valid and treats it as read-only data with a lifetime that is beyond the lifetime of the dispatch function. Finally, the loop cleans up the command. All of this can be specified using the function contracts defined by this function as well as the semantics of the real functions and the shadow functions. We know based on the contracts that we expect this command object to be created on success, and we know based on the abstract model that it must be cleaned up. If, for instance, we freed it before calling dispatch, the abstract model would find a UAF counter-example.

If this sounds like a lot, there is a reason why I'm writing a book on the topic of practical model checking in C, with a mature well-worked example that mirrors a real-world application.


How very interesting.

Does that detect issues at runtime, or prove statically that the code is "safe" (whatever the remit of that definition is here)?

If the former, I guess you're still at the mercy of test coverage and all the pitfalls it entails.


Model checking employs abstract interpretation, so it's similar to static analysis. I hesitate to use the latter term, because it's often used to describe how linting works. This is significantly more advanced.

The model checker encodes an abstract model of the code's execution as a problem for an SMT solver, and uses this to find counter-examples. The SMT solver works backward from failure cases to discover a combination of inputs or program state that can cause any one of those failure cases. This traces memory, data flow, and control flow. It's actually a pretty neat thing.

It doesn't require existing test coverage or anything at runtime to work.


That's the intent of the document - to reach people for guidance. The document under discussion is requesting comment from industry.


Yep. I'm writing up a response letter based on my own work with model checked C. It's not the language but the process and the tooling that matters. It is definitely true that the industry has been quite lax with memory safety, but the solution isn't to rewrite everything in yet another language. The solution is to tighten development processes and put forward a plan to make existing code safer.


> It's not the language but the process and the tooling that matters.

You're missing some other critical components: the developers, and the costs.

If you come up with processes and tooling that is difficult to use widely, you're going to negatively impact your ability to deliver. That's not a trade-off you can ignore. If the cost of using C++ safely ends up being that only (say) 10% of the developers who currently use it will be able to keep doing that -- that on its own might justify a government policy decision to avoid it completely.

> but the solution isn't to rewrite everything in yet another language

How many times more effort would it be to rewrite a typical C++ function in a memory-safe language vs. verify the equivalent guarantees with model-checking tools, in your view?

Like how much actual work are you saving here? And how do the resulting turnaround times (say, compile/verification/etc. times) compare?


> You're missing some other critical components: the developers, and the costs.

No, I'm not.

> If you come up with processes and tooling that is difficult to use widely,

That's an unfounded assumption. The tooling and processes I have developed are no more difficult to use than unit testing.

> How many times more effort would it be...

I'd say compared to the 10% or so overhead of model checking, it would be approximately 8X as difficult, given that this requires learning a new language, using it correctly, and rebuilding the semantics in a way that is safe in that language. But, first, you need to learn new frameworks, rebuild all of the testing and quality assurance that went into the first development process, and rebuild the same level of trust as the original. Writing software is the easy part. It's all of the other stuff, like analysis, testing, confidence building, and engineering that costs money, and this would have to be redone from scratch, as the artifact from that original effort is being thrown away for a rewrite. Remember: the cost of rewriting software isn't just the cost of sitting down and banging out code.

> And how do the resulting turnaround times (say, compile/verification/etc. times) compare?

Model checking is actually pretty fast if you use proper shadowing. Compiling with C is lightning fast. I'd say that using a model checker is a little slower than compiling with Rust.

Also, you are mistaken that using a memory-safe language gives equivalent guarantees. It does not. A model checker allows you to write safer code. There are more errors than memory errors. Memory safety fixes many of the problems, but not nearly enough. So, after that rewrite, you'll also need to add a model checker or some other kind of formal methods to be as safe as just using the model checker to begin with. For Rust, that's Kani. It's not an apples to apples comparison.


>> If you come up with processes and tooling that is difficult to use widely,

> That's an unfounded assumption. The tooling and processes I have developed are no more difficult to use than unit testing.

My assumption was just founded on the reality of how few developers use model-checking tools, and how difficult they find them. If your tools are radically better, and they work for C++, that's obviously amazing.

Could you give us an idea of what the equivalent of this simple C++ code would look like, so we can see how it looks in comparison? You know, with whatever bells and whistles you need to add to it for the model checker to verify its correctness:

  #include <string>
  #include <unordered_map>
  
  template<class Char>
  auto make_map(size_t argc, const Char *const argv[]) {
    using S = std::basic_string<Char>;
    std::unordered_map<S, S> map;
    for (size_t i = 0; i + 2 <= argc; i += 2) {
      map[S(argv[i])] = S(argv[i + 1]);
    }
    return map;
  }
  
  int process(const std::unordered_map<std::string, std::string> &);
  
  int main(int argc, char *argv[]) {
    return process(make_map(argc, argv));
  }


The reason why so few developers use model checking tools is because there is limited documentation out there for how to use them effectively. That is changing. Hell, I'm writing a book on the subject.

These tools aren't difficult to use. They just aren't well documented.

I've got better things to do with my afternoon than model check random bits of code in a forum. You wouldn't ask someone talking about a unit testing framework to unit test an example that you write up. If you're curious, I'm writing a book on model checking C, which will be going off to the publisher sometime in 2025. There are also examples online.


> I've got better things to do with my afternoon than model check random bits of code in a forum. You wouldn't ask someone talking about a unit testing framework to unit test an example that you write up.

I totally would, and I would also reply to their example. I've exchanged code examples in the past right here on HN for discussions like this.

You obviously don't have to do anything, but if you're going to claim it's only a 10% difference compared to unit testing -- expect people to be skeptical unless you can show side-by-side examples.

> If you're curious, I'm writing a book on model checking C, which will be going off to the publisher sometime in 2025. There are also examples online.

I'm not at all curious about model checking C. I've seen that before numerous times, and frankly, even if you could cut the development cost in C by 50% through model checking, I still wouldn't be interested.

You keep going back to C... in a conversation that's supposed to cover C++. I'm not sure why you do this, if the solutions for the two are actually comparable -- a lot of the world is on C++, so you would find a much, much broader audience if you cater to that. My suggestion (and you're obviously welcome to ignore me, but I'm just letting you know my reaction as a reader) is to focus your response letter & book at least partly on visibly demonstrating how your proposed solutions would have similar success on C++. If you don't sell people on this being a solution for C++, I doubt your writing will make many people change their minds about switching to a different language.


I'm focused on C because a lot of our critical infrastructure is written in C. The Linux / BSD kernels, firmware, etc. It's also where my interest is. Model checked C is a good enough balance of cognitive load, safety, and time to market for my purposes. Others may disagree.

Tooling exists in C++. That shouldn't be too surprising given that the C++ abstract machine model is not much more complex than C and has similar complexities as Rust. The same techniques work there as well. If someone wants to tap that audience by translating this process to C++, they are welcome to do so.


What are some of the tools you'd recommend?

In my case I'm working with Nginx and I'd like to make a few custom modules safer.


For C, I recommend CBMC.


One area that I have been exploring is building equivalence proofs between high-level specifications, an implementation in C, and the machine code output from the compiler. I'm still very early in that work, but one of my hopes is to at least demonstrate that the output still meets the specifications, and that we can control things like timing (e.g. no branching on secret data) and cache in this output.

I think that the compilation and optimization step, as a black box, is a disservice for highly reliable software development. Compiler and optimizer bugs are definitely a thing. I was bitten by one that injected timing attacks into certain integer operations by branching on the integer data in order to optimize 32-bit multiplications on 8-bit microcontrollers. Yeah, this makes perfect sense when trying to optimize fixed point multiplication, but it completely destroys the security of DLP or ecDLP based cryptography by introducing timing attacks that can recover the private key. Thankfully, I was fastidious about examining the optimized machine code output of this compiler, and was able to substitute hand coded assembler in its place.


> One area that I have been exploring is building equivalence proofs between high-level specifications, an implementation in C, and the machine code output from the compiler.

AFAIK, that's how seL4 is verified. Quoting from https://docs.sel4.systems/projects/sel4/frequently-asked-que...

"[...] Specifically, the ARM, ARM_HYP (ARM with virtualisation extensions), X64, and RISCV64 versions of seL4 comprise the first (and still only) general-purpose OS kernel with a full code-level functional correctness proof, meaning a mathematical proof that the implementation (written in C) adheres to its specification. [...] On the ARM and RISCV64 platforms, there is a further proof that the binary code which executes on the hardware is a correct translation of the C code. This means that the compiler does not have to be trusted, and extends the functional correctness property to the binary. [...] Combined with the proofs mentioned above, these properties are guaranteed to be enforced not only by a model of the kernel (the specification) but the actual binary that executes on the hardware."


Indeed it is. What I'm working toward is a more efficient way to do the same, one that doesn't take the touted 30 man-years of effort to accomplish.

I'm working on a hybrid approach between SMT solving and constructive proofs. Model checking done with an SMT solver is pretty sound. I'm actually planning a book on a scalable technique to do this with CBMC. But, the last leg of this really is understanding the compiler output.


> I was bitten by one that injected timing attacks into certain integer operations by branching on the integer data in order to optimize 32-bit multiplications on 8-bit microcontrollers.

FWIW, I think this should be considered a language design problem rather than an optimizer design problem. Black box optimizer behaviour is good for enabling language designs that have little connection to hardware behaviour, and good for portability including to different extensions within an ISA.

C doesn't offer a way to express any timing guarantees. The compiler, OS, CPU designer, etc. can't even do the right thing if they wanted to because the necessary information isn't being received from the programmer.


To a large degree, constant-time programming is hampered by the fact that even hardware is often unwilling to provide constant-time guarantees, let alone any guarantees that the compiler would care to preserve. (Although, to be honest, constant-time guarantees are the sort of things that most compiler writers prefer to explicitly not guarantee in any circumstances whatsoever).


If it were starting to show up in languages though, then that would make it more likely for hardware to start introducing functionality like this.


8-bit CPUs offer constant time. 16-bit was where the issues started, where you cannot offer it.


Your comment makes me wonder about the idea of building a superscalar, out of order, speculative implementation of the 6502 or 8080 instruction sets. Might make a good educational project.


Few languages provide such guarantees. But, there really was no way with this particular compiler to pass a hint to generate constant time code.

Black box designs work until the knob or dial you need to control it isn't there. I would have taken a pragma, a command-line option to the compiler, or even a language extension.

This is one example of many as to why I think that user-guided code generation should be an option of a modern tool suite. If I build formal specifications indicating the sort of behavior I expect, I should be able to link these specifications to the output. Ultimately, this will come down to engineering, and possibly, overriding or modifying the optimizer itself. An extensible design that makes it possible to do this would significantly improve my work. Barring that, I have to write assembler by hand to work around bad assumptions made by the optimizer.


If you haven't done so, you might want to look at some of the work done by the seL4 microkernel project.

They start with a Haskell prototype that is translated programmatically into a formal specification for the theorem prover.

They then implement the same thing in C, and use a refinement proof to demonstrate that it matches their Haskell implementation.

They then compile the program, and create another refinement proof to demonstrate that the binary code matches the C semantics.


I'll refer you to my reply to a sibling comment. I'm hoping that I can build a more efficient means of doing work similar to seL4's, but without the 30 man-year effort.

They are on the right track. But, I think there have been some improvements since their effort that can lead to more streamlined equivalence proofs.


What do you think about Jasmin?


Jasmin has some great ideas. My goal in particular is to improve the tooling around C so that verification of C programs is easier. Because there are trillions of lines of C out there, I want to ensure that semi-automated processes can be developed for verifying and fixing this code. For cryptography, this involves similar features as Jasmin. The main difference is that I'm looking at ways to guide this while largely preserving the C language. That's not because I think C is better, but because I want to inject this into existing code bases and development processes, with existing software developers.


It sounds like a very promising approach! Do you think there are really trillions of lines of C? At ten delivered and debugged lines of code per programmer-day we'd have on the order of a single trillion lines ever written, but most of that has been discarded, and probably less than half of the remainder is C.

I suspect something like wasm may be a better way to preserve backward-compatibility with C, although of course it won't help with constant time or confused-deputy vulnerabilities. CHERI might help with the latter.


That's based on current estimates. It's impossible to know for certain, but there is a lot of C software out there.

I'm a big fan of runtime mitigations. I use a few in my own work.

WASM can help in some areas. But, bear in mind that a lot of this C source code is firmware or lower level operating system details (e.g. kernels, device drivers, system services, low-level APIs, portable cryptography routines). In this case, WASM wouldn't be a good fit.

CHERI is also a possibility in some contexts for runtime mitigation. But, since that does involve a hardware component, unless or until such capabilities are available in mainstream devices and microcontrollers, this would only be of limited use.

There are other runtime mitigations that are more mainstream, such as pointer authentication, that are in various states of evaluation, deployment, or regression due to hardware vulnerabilities. I think that each of these runtime mitigations are important for defense in depth, but I think that defense in depth works best when these mitigations are adding an additional layer of protection instead of the only layer of protection.

So, this verification work should be seen as an added layer of protection on top of runtime mitigations, which I hope will become more widely available as time goes on.


While Locutus is an improvement on the previous Hyphanet model, I'm still concerned about the caching model. In Hyphanet, data would be cached the more it was used, and peers connected over multiple hops. This had the potential to cause huge headaches as authorities began investigating issues such as CSAM, because their understanding of how caching and requests work did not match up with how it actually worked. While, certainly, one could build a subset darknet, all it takes is one bad peer to expose the whole darknet to prosecution.

I played with Freenet when it first came out, but when I realized the implications of this -- which later turned out to be true -- I destroyed the hard drive I had been using to run Freenet, just in case.

Locutus seems less aggressive about caching data, but it still does some caching. Without really digging into the documentation or source code, I'd still be nervous about running it.


Locutus (now Freenet) is more of a communication than a storage medium. It's focused more on allowing people to build decentralized tools like group chat[1] for realtime communication than be a content distribution network like bittorrent.

We're also planning a decentralized reputation system based on the concept of "web of trust" that should do at least as good a job of ensuring users aren't exposed to unwanted content like spam or worse.

[1] https://github.com/freenet/river


That is reassuring. I hope that your team can build this web of trust, because it really is crucial.

It is great to have a tool that allows ideas to be shared, but I think that consent regarding which ideas are shared or promoted is important, not only from a legal perspective, but also from an ethical one.


Agreed, any system that allows people to discover content needs to protect them from stuff they don't want to see, whether it's annoyances like spam, malicious attacks like DDoS, or extremely harmful content like CSAM.

Centralized services don't do a great job of this but I think we can do better with a decentralized approach.

We won't have a perfect solution overnight, but we've already built important components of such a system (e.g. ghost keys[1]).

[1] https://freenet.org/news/introducing-ghost-keys/


Under EU rules (Digital Services Act) I believe an automated cache is treated similarly to a pipe - you're no more responsible for holding content in your cache (especially if encrypted!) than a Tor node operator is for someone accessing it through Tor.

German police did raid the home of a guy who ran a lot of Tor nodes. Authoritarians don't need any technical excuse to raid your home - they can just do it if they want to scare you away from these platforms, regardless of how they work.


Unfortunately, US case law is murkier. There have been some notable convictions of people using the old Freenet to search or host CSAM. However, the digital forensics used in these cases was pretty sketchy. If someone simply caching data from downstream queries were held to the same standard, they'd also be considered guilty.

I have no doubt that there was other evidence that led to the conviction of these individuals. But, I can only go on what I know, and what I know is that the standard of digital forensics evidence in those cases was subpar.


The compiler really doesn't have the opportunity to simulate every possibility of the code. It's not just a matter of whether the function is safe, but of every possible use of the function, which the compiler may not see when it is focused on a single compilation unit or a library.

If you want this level of safety, which is possible in C, then you need to use a model checker. Model checking C isn't as trivial as adding a flag to the compiler, but it can be done with about as much overhead as unit testing, if a reasonable idiomatic style is followed, and if the model checker is used well.

It is still a decision problem, and thus has similar limitations, but you can perform steps to ensure that you have some level of soundness with unwinding assertions and other techniques.


Thanks, that's helpful. I'll take a look at model checking.


That used to be true, but now we have reasonable model checking tools for C. It's possible to write safer C without the cognitive load of Rust.

https://www.cprover.org/cbmc/


So two things:

1. This is perhaps just my preference, and so is more subjective, but I don't want to have to pick and choose among various options for doing linting or static analysis or address sanitizing or model checking after the fact. I want compilation to fail if any of these invariants don't hold. Rust can do that; C never will be able to do that. (Sure, if I write unsafe Rust, I'm going to want to run MIRI on it, but if I can stick to safe Rust, the compiler should be sufficient.)

2. I'm probably not as well-versed in the topic as you are, but my understanding is that model checking tools like this cannot prove that every single program that a compiler will accept will also be free of the issues that the model checker is looking for. Again, the Rust compiler can do that. Yes, that does mean that the Rust compiler might reject some programs that would turn out to be safe and sound, but I'm ok with that trade off.


1. Of course it can. My model checks on function contracts and invariants run for every function. If a function fails its contract or invariant, the overall build fails. Does it matter that this is a separate build step from compilation? Not in practice. Think of the model checking step as a semantic analysis pass that happens after the compiler's type checking and syntax checking.

2. You're not interested in proving that every program that the compiler accepts will be free of issues. You're interested in whether YOUR program that you write is free of those issues. The Rust compiler CAN'T do this, because the Rust compiler is only looking at a SUBSET of the possible things that you can build model checks against. This is why Kani -- a model checker for Rust -- exists. You can model check unsafe code in Rust, as well as safe code in Rust against user assertions and function contracts that are otherwise not possible to check in vanilla Rust.

Model checking isn't just for C, but model checking, as a practical form of formal methods, brings the same and even better safety to C. In fact, with Kani, you can get similar safety in Rust as well.

If you like Rust, use it. But, as was the point of my comment, it is possible and practical to get similar safety in C.


I am sure it is “possible”, but we are talking about practicality here.

Why doesn’t the Linux kernel embrace model checking instead of experimenting with Rust?


It is quite practical. I'm actually planning a book on the subject.

The reason why some Rust enthusiasts have been experimenting with Rust in the Linux kernel is because they are passionate about Rust, and kernel maintainers are looking to find younger people. It's neither an endorsement of Rust nor an argument against model checking in C.

The reality is that this tooling isn't yet well known about. As it becomes better known, it will be adopted.


Unless you have some insider info, I think you’re oversimplifying why Linux is giving Rust a chance.

Also, model checking - at least as you’re portraying it - sounds too good to be true. If it was anywhere close to the realm of practicality for large C codebases (without maintaining a model separate from code), we would be hearing its praises being sung by C devs all over.


> Unless you have some insider info, I think you’re oversimplifying why Linux is giving Rust a chance.

Not really. Linus Torvalds has been quite open about this topic, and it has been covered extensively on LWN.

> Also, model checking - at least as you’re portraying it - sounds too good to be true.

> If it was anywhere close to the realm of practicality... we would be hearing its praises being sung by C devs all over.

Like any technology, model checking takes effort to use and learn. But, it does work quite well. Again, you're conflating the popularity of something with its effectiveness, which is a poor argument.


In my opinion, vaguely handwaving that model checking makes C as effective as Rust invites nothing but skepticism to anyone who knows C and Rust.

Typically, if you’re advocating for a relatively unknown technology that you want others to adopt, the onus is on you to describe how it is better and to be upfront about its limitations. Good luck with your book!


I appreciate your opinion. It's not possible to hold a nuanced conversation about model checking in general or in C in particular in the comments of HN. I'd need much more space. But, I can summarize.

CBMC's abstract machine can detect memory safety violations and integer-related UB. This includes things like use-after-free, buffer overruns, heap/stack corruption, data races, fence post errors, integer conversion and promotion errors, and signed integer overflow. When it detects a violation, it provides a counter-example demonstrating it. It also allows the user to build custom assertions, which lets function contracts be defined and enforced.

Any function can be designated as the entry point for the model checker, which allows function-by-function analysis. Shadow methods can be substituted, which provides the abstraction necessary to model check entire code bases. A shadow method uses non-determinism to cover all possible inputs, outputs, and side-effects that the real function could perform. This abstraction also allows modeling of third-party libraries, user/kernel boundaries, and hardware.

So far, I've model checked code bases of around half a million lines of code. It will easily scale to cover a code base the size of the Linux kernel, as long as you understand how to use it. It's an engineering problem at this point. Keeping the abstraction in sync with the implementation is actually not that hard: write good function contracts and have the model checker verify them. If the shadow and the original code follow the same function contract -- covering all possible inputs, outputs, and side-effects -- the abstraction can be substituted.

The biggest limitation really comes down to large recursive data structures, which is also a pain point for Rust for that matter. There are ways to deal with this, but that's probably the most significant place where any code base customization is required. It's possible to refactor this code to be just as fast in modern C, but in a way that is easier for the model checker to verify quickly.

It's impossible to convert the trillions of lines of code written in C to Rust or any other language without blowing the entire budget of the tech sector for 30 years. Rewrites are prohibitively expensive. Tooling and automation for this tooling is not nearly that expensive.


So is it a form of property checking? That is, your model defines certain properties which your program is checked against.

Or is it some form of symbolic execution? This I doubt because I believe the performance is not there yet.

I will read up a bit more on CBMC.


It's a form of abstract interpretation. The code is compiled to a constraint problem in an SMT solver.

There is definitely a performance impact here, which is why it is important to decompose the program and verify it function by function. This decomposition is sound as long as the model checking scenario covers the complete function contract.

To improve performance further, I use abstraction in the form of shadow methods. These are sort of like mocks for the model checker. They provide the same function contract -- inputs, outputs, and side effects -- as the original function, but using the built-in non-determinism provided by the model checker. This simplifies the overall SMT equation while maintaining an approximation of the overall program.

By defining external function contracts, I can use the model checker to verify that both the original function and the shadow function follow the same contract, which keeps the two in sync. The shadow functions replace the functions called by the function under model check, both isolating it and simplifying the overall SMT equation.

The tool provides the mechanism, but it has taken me six years of work and research to develop a practical way to scale it. The book will cover the tool, but it is documenting this "cheat sheet" that is the real purpose for it.

For what it's worth, I'm also considering an edition that covers Rust and Kani.


Now I’m curious


Great, now give me tooling of this millennium (no, I am not going back to vendoring everything manually, and I am not reinventing every basic data structure I wrote in university in every project I work on) and we have a deal!

Oh, also get rid of header files, they are archaic. And I want fearless concurrency... And sum types!


If you want those things, you don't want C. Pick a reasonable higher level language you like that makes those decisions for you.

My comment was not to imply that somehow C is superior to X, Y, or Z, but rather to point out that the safety problem with C does have a practical solution.


Fair point, sorry (non native speaker here, for what it's worth) have a wonderful day!


Shooting invasive species is definitely legal in Florida, assuming that you have the permission of the land owner or are otherwise in a location where shooting is allowed (i.e. not in a municipality that bans gunfire). It has to be done humanely: if you hit the invasive animal, you are responsible for dispatching it as quickly as possible. If you don't dispatch the animal, you can easily run afoul of animal cruelty laws here.

I have been hoping for a health improvement so I can go on a guided python hunt. I have read up on it quite a bit. Not only can I get python leather to make a nice wallet, belt, and a pair of boots, but I can do so while doing a small part to curb an invasive species that is wreaking havoc on the ecosystem in South Florida.


I hope your health does improve and you get to go on your hunt! Sounds like a good thing to plan for. Leathermaking is a good use too, considering the unsafe mercury levels found in python meat.


I say we just give Florida back to Spain and call it a day.


What a ridiculous proposal. Give it back to the Timucua, and if you can't find any, which you won't, consider the Seminoles.

Billy Bowlegs for governor - I know he wouldn't turn the State Parks into golf courses!


I don’t think Spain can handle Florida anymore.


whispers *no-one can*


>I don’t think Spain can handle Florida anymore.

I don't think Spain ever could handle Florida


No refunds.


I'd recommend using FreeBSD as your first BSD. It has a more recent version of ZFS integrated in the kernel than NetBSD. OpenBSD does not have ZFS support; it's a direction they chose not to take for security and simplicity reasons.

