A modern 4GHz CPU is not just 40 times faster than a 100MHz CPU from back in the day. It is a few thousand times faster. Probably not 20,000, but at least 2,000 times faster seems reasonable.
And responsiveness back then was so good because your program was very close to the hardware, with very little in between, or even running entirely free of OS abstractions.
Can you show your working on this? Because a 100MHz CPU can do 100,000,000 things a second, and a 4GHz CPU can do 4,000,000,000 things a second, and if my math's right, that means the 4GHz CPU can do 40 times as many things a second as the 100MHz CPU.
Now, you might argue 'the 4GHz CPU is multicore!', and sure, maybe we're up to 8 times 40, which is, I'm pretty sure, 320. And maybe you'll say that the cache is bigger, so you'll be able to keep the data pipelines full and get more done on the faster CPU. But how are you getting to 'at least 2,000'?
Sure. I'll oversimplify a lot, but the general sense of how things work should be correct.
Clock frequency is not a good measure of performance. It never was. Even early designs such as the 8086 did not complete one thing (instruction) every cycle. They did far less.
Modern CPUs are extremely complex beasts that can take in a lot of instructions at once. They take a good look at those instructions, change them in ways that do not alter the result but make some optimizations possible, and then distribute them to a bunch of internal workers (execution units) that can work on them at the same time. More on this can be found in the Wikipedia rabbit hole starting with instruction-level parallelism.
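To make the "internal workers" idea a bit more concrete, here is a toy model in Python. It is not how any real CPU schedules work; it assumes every instruction takes one cycle, up to `width` instructions can start per cycle, and register renaming has removed every dependency except "this instruction needs that one's result". The names `Instr` and `schedule_cycles` are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Toy instruction: computes register `dest` from the registers in `srcs`."""
    dest: str
    srcs: tuple[str, ...] = ()


def schedule_cycles(program: list[Instr], width: int) -> int:
    """Cycles a toy out-of-order core needs to run `program`.

    Hypothetical model, not any real microarchitecture: every instruction
    takes 1 cycle, at most `width` instructions start per cycle, and only
    read-after-write dependencies matter (as if register renaming removed
    all other hazards). Branches, memory and caches are ignored.
    """
    # For each instruction, record which earlier instructions produce its inputs.
    last_writer: dict[str, int] = {}
    deps: list[list[int]] = []
    for i, ins in enumerate(program):
        deps.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = i

    done_at: list[int | None] = [None] * len(program)  # cycle each one finished
    waiting = list(range(len(program)))  # not yet started, in program order
    cycle = 0
    while waiting:
        cycle += 1
        # Ready: every input was produced in an earlier cycle.
        ready = [
            i for i in waiting
            if all(done_at[d] is not None and done_at[d] < cycle for d in deps[i])
        ]
        started = ready[:width]  # oldest ready instructions get the workers first
        for i in started:
            done_at[i] = cycle
        waiting = [i for i in waiting if i not in started]
    return cycle


if __name__ == "__main__":
    # r0 += 1, r1 += 1, ... r7 += 1: no instruction needs another's result.
    independent = [Instr(f"r{k}", (f"r{k}",)) for k in range(8)]
    # a += 1 eight times: each instruction needs the previous result.
    chain = [Instr("a", ("a",)) for _ in range(8)]
    for name, prog in [("independent", independent), ("chain", chain)]:
        cycles = schedule_cycles(prog, width=4)
        print(f"{name}: {cycles} cycles, IPC = {len(prog) / cycles:.1f}")
```

With four workers, the eight independent additions finish in 2 cycles (4 instructions per cycle), while the chain still takes 8 cycles. Same clock, very different amount of work done per cycle, which is why clock frequency alone says so little.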
One way to measure this is to look at how many instructions from a selected benchmark mix can be completed per cycle (instructions per cycle, or IPC). An 8086 managed 0.066. A 386DX did 0.134, and a 486 could do 0.7. A Pentium 100 could already do 1.88, and so on. Modern CPUs get to around 10, per core.
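A rough way to turn those IPC figures into throughput is: instructions per second ≈ IPC × clock frequency × cores. The sketch below only does that arithmetic with the numbers quoted above; the 4GHz clock and 8 cores for the modern chip are assumptions borrowed from the question, and real workloads rarely sustain peak IPC.

```python
def mips(ipc: float, clock_mhz: float, cores: int = 1) -> float:
    """Millions of instructions per second: IPC x clock (MHz) x cores."""
    return ipc * clock_mhz * cores


# IPC figures as quoted above. The 4 GHz clock and 8 cores for the modern
# chip are assumptions taken from the question, and peak IPC is an upper bound.
i486_100 = mips(0.7, 100)
pentium_100 = mips(1.88, 100)
modern_core = mips(10, 4000)
modern_chip = mips(10, 4000, cores=8)

print(f"clock x cores only:        {4000 / 100 * 8:.0f}x")
print(f"1 modern core vs Pentium:  {modern_core / pentium_100:.0f}x")
print(f"8 modern cores vs 486:     {modern_chip / i486_100:.0f}x")
```

That lands at roughly 213x for one core against a Pentium 100 and about 4,600x for eight cores against a 486, compared with 320x from clock and core count alone. The 70 and 188 it produces for the 486 and Pentium also match the Dhrystone MIPS figures quoted further down, which suggests these IPC values are simply those benchmark results divided by clock speed.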
But wait, there's more. This comparison gives only a very rough idea of a CPU's capabilities, since it focuses on a very specific thing that might have little to do with actual observed performance. That's especially true because modern CPUs have extremely specialized instructions that can do enormous amounts of computation on enormous amounts of data in very little time. And there we are in the wonderful world of benchmarks, which measure the execution time of a defined workload and may or may not reflect reality.
PassMark does CPU benchmarks, and the weakest CPU in their database seems to be a single-core, single-thread Pentium 4 @ 1.3GHz. It comes in at 77 (passmarks?). An i7-13700 is rated at 34,431. Does that make it roughly 450 times faster than the 1.3GHz P4? Hard to tell, but it's a hell of a difference. And from the P4 down to a Pentium or even a 486 running at 100MHz... at least another hell of a difference.
We can also try Dhrystone MIPS, another benchmark. Wikipedia has, strangely enough, numbers for both the Pentium and the 486 at 100MHz: 188 MIPS for the Pentium, 70 MIPS for the 486. The most modern (2019!) desktop CPU entry comes in at around 750,000 MIPS. A Threadripper from 2020 comes in at over 2,300,000 MIPS.
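The benchmark ratios are plain division. A small sketch with the scores exactly as quoted above; the dictionary keys are just labels made up here, and each comparison stays within one benchmark because PassMark points and Dhrystone MIPS are not comparable units.

```python
# Scores as quoted above. Each comparison stays within a single benchmark,
# since PassMark points and Dhrystone MIPS are not comparable units.
passmark = {"Pentium 4 @ 1.3GHz": 77, "i7-13700": 34_431}
dhrystone_mips = {
    "486 @ 100MHz": 70,
    "Pentium @ 100MHz": 188,
    "2019 desktop": 750_000,
    "2020 Threadripper": 2_300_000,
}


def speedup(scores: dict[str, float], new: str, old: str) -> float:
    """How many times higher `new` scores than `old` on the same benchmark."""
    return scores[new] / scores[old]


pm = speedup(passmark, "i7-13700", "Pentium 4 @ 1.3GHz")
dh_low = speedup(dhrystone_mips, "2019 desktop", "Pentium @ 100MHz")
dh_high = speedup(dhrystone_mips, "2020 Threadripper", "486 @ 100MHz")
print(f"PassMark, i7-13700 vs P4 1.3GHz:       {pm:.0f}x")
print(f"Dhrystone, 2019 desktop vs Pentium:    {dh_low:.0f}x")
print(f"Dhrystone, 2020 Threadripper vs 486:   {dh_high:.0f}x")
```

So roughly 450x on PassMark, and somewhere between about 4,000x and 33,000x on Dhrystone depending on which old and new chips you pair up. The modern scores are likely whole-chip figures, so these ratios mix per-core speed with core count.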
So, how much more can a modern CPU do than an ancient one? A lot. Far more than you would expect from the higher clock frequency alone. Even with only one core, it can handle several hundred times the workload. And we have a lot of cores.
While it's harder to quantify, that 4GHz CPU also comes with vastly faster RAM, buses, and disks. Not many 100MHz systems were around with NVMe or even SATA...
>your program was very close to hardware with very little in between if not running completely free from OS abstractions
This! It also meant that it was very, very easy for any program or misbehaving driver to crash your entire system. Not to mention all the security implications of every app having direct hardware access.
But when I look at my slow text editor, I can see that the CPU time it spends in the kernel is less than a tenth of the total. So that's not the reason.
It's a much better estimate than hand-waving about memory isolation.
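One way to get that kind of number is to compare user-mode CPU time with system (kernel) CPU time. A minimal Python sketch using `os.times()`, which reports both for the current process; the workload and the `kernel_share`/`measure` helpers are hypothetical, made up for illustration. For another program, such as an editor, you'd read the same two counters with your OS's tools instead (for example the `time` command, or the utime/stime fields of `/proc/<pid>/stat` on Linux). The printed share depends entirely on the machine and OS; only the method matters here.

```python
import os
import tempfile


def kernel_share(user_s: float, system_s: float) -> float:
    """Fraction of CPU time spent in the kernel (0.0 if no CPU time was used)."""
    total = user_s + system_s
    return system_s / total if total > 0 else 0.0


def measure(work) -> float:
    """Run `work()` and return the kernel share of the CPU time it consumed."""
    before = os.times()
    work()
    after = os.times()
    return kernel_share(after.user - before.user, after.system - before.system)


def mixed_workload() -> None:
    """Hypothetical stand-in for an app: some computation plus some system calls."""
    # Pure computation runs in user mode.
    sum(i * i for i in range(2_000_000))
    # Flushing after every one-byte write forces a separate system call each time.
    with tempfile.TemporaryFile() as f:
        for _ in range(20_000):
            f.write(b"x")
            f.flush()


if __name__ == "__main__":
    print(f"kernel share of CPU time: {measure(mixed_workload):.1%}")
```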
If we want to talk about direct hardware access: my program can get things to the GPU in far less than a millisecond. The safety layers are not the problem.