That's because they added new tests to catch these cases. I recall seeing someone mention in a comment here that coreutils didn't have a test for this either.
So it is reassuring that these things actually get documented and tested.
If I understood correctly, the test suite they are using is the one from the original coreutils. Apparently, the behavior that led to this bug wasn't covered by the original test suite (it looks like the original coreutils just got the behavior "right" without it ever being tested), so the uutils guys added new tests to cover this after the bug was found.
That makes sense. I am generally biased against making significant changes to software (especially rewrites) without also beefing up the test suite, though.
How could they beef it up if they don't know the problem is there?
You can write frivolous tests all you want; bugs are a part of life. They'll occur even if you test thoroughly, especially in new software. It's the response that matters.
They could use the description of what the software is supposed to do and an understanding of the current code to figure out how it should work and what edge cases need to be tested. They can also test random inputs and so on. When you write new software, you do this. When you add features, you do this. When you do a rewrite or refactor, you should also do this.
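To make "test random inputs" concrete, here is a minimal differential-testing sketch in POSIX sh: feed the same arguments to both implementations and flag any disagreement. The /usr/bin/date path, the "coreutils date" invocation for the uutils build, and the input list are all assumptions for illustration, not taken from the actual suites.

    #!/bin/sh
    # Differential sketch: run the same inputs through both implementations
    # and report any disagreement. Binary locations are assumptions.
    GNU="/usr/bin/date"
    UU="coreutils date"   # hypothetical invocation of a uutils multi-call build

    status=0
    for input in "2024-02-29" "last friday" "@2147483648" "2025-03-30 01:30 +0100"; do
        a=$($GNU -d "$input" +%s 2>&1)
        b=$($UU  -d "$input" +%s 2>&1)
        if [ "$a" != "$b" ]; then
            printf 'MISMATCH: %-26s gnu=[%s] uutils=[%s]\n' "$input" "$a" "$b"
            status=1
        fi
    done
    exit $status

In real use the inputs would come from a generator (random strings, boundary timestamps, every documented format) rather than a hard-coded list.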
Is it?
I hope I won't step on somebody else's toes:
GenAI would greatly help cover existing functionality and prevent regressions in the new implementation.
For each tool, generate multiple cases, some based on documentation and some from the LLM's understanding of the util. Classic input + expected pairs.
Run them against both the old GNU implementation and the new Rust implementation.
First: cases where expected+old+new are identical should go into the regression suite.
Now a HUMAN should take a look in this order:
1. Cases where expected+old are identical, but rust is different.
2. If time allows - cases where expected+rust are identical, but old is different.
TBH, after #1 (expected+old agree, rust differs) I'd be asking the GenAI to generate more test cases in those faulty areas.
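A rough sketch of that triage, assuming each generated case is a tab-separated "ARGS<TAB>EXPECTED" line in a cases.tsv file, and that the two implementations are reachable as /usr/bin/date and a uutils multi-call binary (all of these names are assumptions):

    #!/bin/sh
    # Bucket generated cases as described above.
    # cases.tsv layout (assumed): one case per line, "ARGS<TAB>EXPECTED".
    OLD="/usr/bin/date"      # original GNU implementation (assumed path)
    NEW="coreutils date"     # uutils implementation (assumed invocation)
    TAB=$(printf '\t')

    while IFS="$TAB" read -r args expected; do
        old_out=$(eval "$OLD $args" 2>&1)   # eval: this sketch trusts the generated file
        new_out=$(eval "$NEW $args" 2>&1)
        if [ "$expected" = "$old_out" ] && [ "$expected" = "$new_out" ]; then
            echo "$args" >> regression_suite.txt     # all three agree
        elif [ "$expected" = "$old_out" ]; then
            echo "$args" >> review_priority_1.txt    # expected+old agree, rust differs
        elif [ "$expected" = "$new_out" ]; then
            echo "$args" >> review_priority_2.txt    # expected+rust agree, old differs
        else
            echo "$args" >> spec_mismatch.txt        # neither matches the expectation
        fi
    done < cases.tsv

The first bucket goes straight into CI; the other files are what the human (and any follow-up GenAI case generation) would look at.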
"You have to catch everything" is much easier said than done, but "add at least one new test"? Nominally the people doing the rewrite should understand what they are rewriting.
Usually, the standard a rewrite is held to is "no worse than the original," which is a very high bar.
I think it's a bit naive to believe that the original coreutils developers only used what is now in the public test suite. Over that length of development, a lot of people probably tested a lot of things even if those tests didn't make it into the official CI suite. If you're doing a rewrite, just writing to the existing tests is really not enough.
It's not "naive". That is the nature of Open Source software. Everything is in the open.
Especially because people will not use a pre-compiled binary, but compile the software themselves (e.g., Gentoo users). So there must be no 'secret' tests, to guarantee that whoever compiles the software, as long as the dependencies are met, will produce a binary with the exact same behavior.
In fact, being Open Source software, the test suite of the original coreutils is part of the source package. It's in the interest of the coreutils maintainers to have the software tested against known edge cases, because one day their project will be picked up by "some lone developer in Iowa" who will add new features. If there were 'secret' test cases, the new developer's additions might break things.
This incident is merely the original coreutils happening to produce correct results on some untested edge case, which uutils then got wrong.
"Secret" tests have existed forever and will continue to exist. That's the nature of software. What gets pushed is only what the developer wants to maintain, not everything they did in the process of constructing and maintaining that software.
In practice "some lone developer in Iowa" will be held to the standard of quality of the original project if they want to add to it or replace it despite the support they get from the public package. Open-source software is also often not open to being pushed by any random person.
While you develop a feature, you do a lot of tests, including white-box tests with the debugger, that would be annoying to automate and maybe wouldn't even survive the next commit. You also do "tests" by manually executing the code in your head, modelling all the execution states. Automated tests often only cover single cases, while reasoning across all states is hard to automate. They likely also had a specification next to their editor, referencing all the nitpicks in there and proving in their head that these are indeed addressed.
These kinds of "tests" are often enforced as the codebase evolves, by having old guys (e.g., ones named Torvalds) yell at new guys. They are hard to formalize short of writing a proof.
This is going to be a problem, considering Ubuntu's (more and more unfortunate) popularity. Scripts will continue to be written against Ubuntu, and if they do not work on your Debian or whatever, it's your problem.
Same thing that happens with Alpine's shell, or macOS, or the BSDs — I work with shell all the time and often run into scripts that should work on non-bash shells, with non-GNU coreutils, but don't, because nobody cared to test them anywhere besides Ubuntu, which until now at least had the same environment as most other Linux distributions.
More pain incoming, for no technical reason at all. Canonical used to feel like a force for the good, but doesn't anymore.
I try to execute my scripts with dash, but I am not sure if that is enough. Do you have a good recommendation for checking that you're not relying on non-portable behaviour, short of installing another distro, that doesn't take much effort?
Working on macOS for years taught me that most people are going to write code that supports what they can test, even if there's a better way that works across more systems. Heck, even using 'sed -i' breaks if you go from macOS to Linux or vice versa, but if you don't have a Mac you wouldn't know.
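For the concrete 'sed -i' case: GNU sed takes an optional backup suffix glued onto -i, while BSD/macOS sed requires the suffix as a separate argument (an empty '' meaning no backup), so the same one-liner silently breaks in one direction or the other. The file name below is just an example; a portable script avoids -i entirely:

    # GNU sed (Linux): in-place edit, no backup file
    sed -i 's/foo/bar/' config.txt

    # BSD sed (macOS): the suffix argument is mandatory; '' means no backup
    sed -i '' 's/foo/bar/' config.txt

    # Portable: skip -i and replace the file yourself
    sed 's/foo/bar/' config.txt > config.txt.tmp && mv config.txt.tmp config.txt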
Meanwhile, this is a rewrite of `date` (and other coreutils) with the goal of being perfectly compatible with GNU coreutils (even using the coreutils test cases), which means that differences between the two are going to shrink, not grow.
What you're complaining about here is "people only test on one platform" and your solution is that everything should stay the same and never change, and we should have homogeneity across all platforms forever so that you don't have to deal with a shell script that doesn't work. The actual solution is for more people to become aware of these differences and why best practices exist.
Note that a while back Ubuntu and Debian switched /bin/sh to dash instead of bash, which resulted in a lot of people having to fix their /bin/sh scripts to remove bashisms, which in turn improved things for everyone across all platforms. Now Ubuntu switches to uutils and we find it has a bug in `date` because GNU coreutils didn't have a test for that either; now coreutils has a test for it too, so it won't regress in the future, and everyone's software gets better.
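As a small illustration of the kind of bashism that switch flushed out (the option name here is made up): [[ ]] is a bash extension that dash does not implement, so a #!/bin/sh script using it dies once /bin/sh stops being bash, while the case-based spelling works everywhere:

    #!/bin/sh
    # Bashism: works while /bin/sh is bash, fails under dash with "[[: not found"
    #   if [[ $1 == --verbose ]]; then set -x; fi

    # Portable replacement that behaves the same in dash, bash, and busybox sh
    case $1 in
        --verbose) set -x ;;
    esac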