So, what’s the state of the art in guided state-space exploration/fuzzing?
Seems if you have a reference implementation your fuzzer should be able to do some nice white-box validation to ensure you are behaving the same as the old implementation.
For this type of thing I think property testing would work well. It would take a fair amount of work to write a proptest for the entire input space of the tool, but it's achievable, and the test is as durable as the CLI arguments (so for this specific case, very unlikely to change in a backwards-incompatible way). And this kind of rote work with good reference materials (namely the man pages) is amenable to being generated.
Whatever language you're working in, there is probably a port of Hypothesis or QuickCheck. For Rust I use the `proptest` crate, but for differential testing of a CLI I would probably use the Python Hypothesis package and invoke the commands externally.
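To make the "invoke the commands externally" idea concrete, here is a rough sketch of the differential core. It is stdlib-only and uses seeded `random` generation as a stand-in for Hypothesis, so it has no shrinking or smart generation. `REFERENCE` and `CANDIDATE` are hypothetical toy commands (two Python one-liners that sort stdin, with an optional `-r` flag), not any real tool; in practice you would point them at the old and new binaries.

```python
import random
import subprocess
import sys

# Hypothetical stand-ins for the old and new implementations: both sort the
# lines on stdin and reverse the order when "-r" is passed.
REFERENCE = [sys.executable, "-c",
             "import sys; r = '-r' in sys.argv[1:]; "
             "sys.stdout.write(''.join(sorted(sys.stdin.readlines(), reverse=r)))"]
CANDIDATE = [sys.executable, "-c",
             "import sys; ls = sys.stdin.readlines(); ls.sort(); "
             "ls = ls[::-1] if '-r' in sys.argv[1:] else ls; "
             "sys.stdout.write(''.join(ls))"]


def run(cmd, args, stdin_text):
    """Run one implementation and return the behaviour we compare on."""
    proc = subprocess.run(cmd + args, input=stdin_text,
                          capture_output=True, text=True, timeout=10)
    return proc.returncode, proc.stdout


def random_case(rng):
    """Generate one (args, stdin) pair; a property-testing library would do this part."""
    args = ["-r"] if rng.random() < 0.5 else []
    lines = ["".join(rng.choice("abc") for _ in range(rng.randint(0, 4))) + "\n"
             for _ in range(rng.randint(0, 6))]
    return args, "".join(lines)


def find_divergence(reference, candidate, trials=25, seed=0):
    """Return the first (args, stdin) where the two commands disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        args, stdin_text = random_case(rng)
        if run(reference, args, stdin_text) != run(candidate, args, stdin_text):
            return args, stdin_text
    return None


if __name__ == "__main__":
    print(find_divergence(REFERENCE, CANDIDATE))
```

With Hypothesis, `random_case` would become a strategy and the comparison inside the loop would become the property, which gets you minimized counterexamples when the two implementations disagree.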