I read those but was hoping for some more concrete questions. When I have a bug I just introduced, I usually already have telemetry in place. Obviously there's something wrong with my view of the system, since I expected to be testing a working system. I've heard of the scientific method, but perhaps I should google/duckduck it to brush up on that skill.
I'd suggest you try to formulate questions on your own before reading the concrete examples I've given below. I've recently learned that being challenged while learning helps things stick.
That written, there are two pages of one book I recommend to everyone who wants some pre-made questions: 'How to Solve It' by George Pólya. The two pages are at the start of the book, listed as the 'How to solve it' list in the table of contents. The gist of it can be found on the Wikipedia page for the book (1).
Here are some example questions I have made inspired by the book:
To understand:
What is the feature?
What is already there?
What are the links between what is there and the feature?
Is it sufficient?
While planning:
Have I seen this situation before?
Do I know a related problem?
If so, can I use this previous experience?
Can I imagine a similar and simpler problem?
Does this fulfill all requirements of the feature?
Thanks so much! I haven't looked at bugs from that angle yet. Usually, when working in a huge CI/CD code base, I just add/refactor code step by step while keeping things working. As said, that makes problems easy to locate (less stress), but it is not always the most efficient way, especially speed-wise. The questions / scientific method approach is promising; something I would like to give a try.
Bugs are just a manifestation of inattention to detail (a wrong assumption, a missed critical item, an off-by-one error, etc.). If you are adding code and find a bug, it's most likely in the code just written. If your code hasn't been recently added (say, it's a few months old; it could happen, depending upon the program) and a bug pops up, then yes, it's a bit harder, but in my experience it's generally some input that wasn't expected [1], or something else that has changed in the environment [2].
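For example, the classic off-by-one, in a toy C snippet (just an illustration, not from any real code):

    #include <stdio.h>

    int main(void)
    {
        int a[4] = { 1, 2, 3, 4 };
        int sum = 0;
        for (int i = 0; i <= 4; i++)   /* bug: <= walks one past the end; should be i < 4 */
            sum += a[i];               /* a[4] is read out of bounds */
        printf("sum = %d\n", sum);
        return 0;
    }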
I've found the scientific method to work wonders. Even if I don't have a hypothesis to test, just setting a breakpoint halfway through the work to check the results, and continuing to bisect, can quickly lead to the issue. This is harder if you have timing issues or random crashes, and it may take some time [3], but it is possible to isolate issues.
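To make the bisection idea concrete, here's a toy sketch in C (stage1 through stage4 are hypothetical stand-ins for whatever your real work is):

    #include <stdio.h>

    static int stage1(int x) { return x + 1; }
    static int stage2(int x) { return x * 2; }
    static int stage3(int x) { return x - 3; }
    static int stage4(int x) { return x * x; }

    int main(void)
    {
        int x = 5;
        x = stage1(x);
        x = stage2(x);
        /* Bisection checkpoint: halfway through, check the
           intermediate result (a breakpoint works as well as a
           print).  If it's already wrong, the bug is in stages
           1-2; if it's right, look at stages 3-4.  Repeat on
           the guilty half. */
        fprintf(stderr, "after stage2: x = %d (expected 12)\n", x);
        x = stage3(x);
        x = stage4(x);
        printf("result: %d\n", x);
        return 0;
    }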
As for questions to ask: What are my assumptions? Assert those in the code (in C, call assert()). What are my inputs? Where do they come from? Can they be invalid? How (or why) are they invalid? What is my expected output? Is the output correct?
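A sketch of what asserting assumptions looks like (the function and its contract are made up for illustration):

    #include <assert.h>
    #include <string.h>

    size_t copy_name(char *dst, size_t dstlen, const char *src)
    {
        assert(dst != NULL);   /* assumption: valid output buffer */
        assert(src != NULL);   /* assumption: valid input */
        assert(dstlen > 0);    /* assumption: room for at least the NUL */
        size_t n = strlen(src);
        assert(n < dstlen);    /* assumption: the input fits */
        memcpy(dst, src, n + 1);
        return n;
    }

When an assumption turns out to be wrong, the assert fails right where the wrong assumption lives, instead of the program misbehaving somewhere far downstream.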
[1] Latest bug at work: we work with NANP numbers, which have a particular format. The area code (the first three digits) cannot contain two consecutive 1s, and neither can the exchange (the next three digits). The code I wrote has been doing that filtering for five years now (give or take). But just recently, a customer complained that 800-311-xxxx was mislabeled as bad. It turns out that 800 numbers (toll-free in the US) have an exception to the rule.
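In code, the rule plus the exception looks something like this (a minimal sketch of the rule as I described it; the names and the toll-free list are illustrative, not our actual code):

    #include <stdbool.h>
    #include <string.h>

    /* True if a three-digit code contains "11" anywhere
       (e.g. "311", "110"). */
    static bool has_consecutive_ones(const char *code)
    {
        return strstr(code, "11") != NULL;
    }

    /* US toll-free area codes (list assumed for illustration). */
    static bool is_toll_free(const char *npa)
    {
        static const char *tf[] =
            { "800", "888", "877", "866", "855", "844", "833" };
        for (size_t i = 0; i < sizeof tf / sizeof *tf; i++)
            if (strcmp(npa, tf[i]) == 0)
                return true;
        return false;
    }

    /* npa = area code, nxx = exchange, both three-digit strings. */
    bool nanp_ok(const char *npa, const char *nxx)
    {
        if (has_consecutive_ones(npa))
            return false;
        /* The fix: toll-free numbers are exempt from the exchange
           rule, so 800-311-xxxx is valid. */
        if (has_consecutive_ones(nxx) && !is_toll_free(npa))
            return false;
        return true;
    }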
[2] Whenever issues happen in production, my team is assumed to be at fault. About 95% of the time, it's not us, as our code doesn't change that often (we're lucky to get three deployments per year), and we have to point this out time and time again. Unless it's classifying NANP numbers [1], in which case, yeah, it's us.
[3] The hardest bug I had to track down was a program randomly getting segfaults. On the development system, it would take hours, if not days, to crash. On the production system, it would take hours, if not minutes, to crash. And it never crashed in the same place. It took a month of constant work to find the issue: in an otherwise single-process, single-threaded program, a signal handler was calling async-signal-unsafe functions (the first time I had encountered that; it didn't help that I was the only developer at the time).
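Boiled down, the bug pattern looks like this (a minimal sketch; the real program was considerably bigger):

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    /* BUGGY: printf() is not async-signal-safe.  If the signal
       arrives while the main program is inside stdio or malloc(),
       internal state gets corrupted and the crash shows up much
       later, in a seemingly random place. */
    static void bad_handler(int sig)
    {
        printf("caught signal %d\n", sig);
    }

    /* Safe: only set a volatile sig_atomic_t flag and let the
       main loop do the real work. */
    static volatile sig_atomic_t got_signal = 0;

    static void good_handler(int sig)
    {
        (void)sig;
        got_signal = 1;
    }

    int main(void)
    {
        /* signal(SIGINT, bad_handler);  <- the crashy version */
        signal(SIGINT, good_handler);
        while (!got_signal)
            sleep(1);                    /* stand-in for real work */
        puts("shutting down cleanly");   /* safe out here */
        return 0;
    }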