Bugs in software and what we can do about them

The only way to create software with fewer bugs is to cut back the functionality, implement the absolute minimum, and then spend an enormous amount of time testing and perfecting it. That is rarely an option for a project that aims for commercial success. Virtually every new release of software or hardware ships with bugs, and vendors fix them over time. I don’t advocate for low-quality software, but I think our expectations are somewhat unrealistic, and such expectations lead to a lot of disappointment among developers, managers, and customers.

Bug-free software is not the goal

While it might be theoretically possible to create software that has no bugs, that is rarely the primary goal set for developers. There’s a reason for that: it might take a hundred years to write a completely bug-free version of a reasonably complex system, and even more time to build another system that proves the first one is bug-free. Developers of such a system would have to come up with every possible edge case, implement handling for it, reflect it in the documentation, and make sure the update didn’t break the system as a whole. Then they would have to fix the system if something went wrong and test everything again.

Because of this impossibility, the goal is usually to write usable and maintainable software that is flexible enough to be easily changed or updated. A usable system satisfies the fundamental requirements; a certain level of bugs is acceptable as long as they don’t affect core use cases. A maintainable system allows developers to quickly spot and fix bugs without introducing new ones. A flexible system can be changed and improved without a complete rewrite.

Spend time on testing to reduce the number of bugs

“I can deploy my code to production and guarantee that it’s perfect,” said no developer ever. At least not one with a decent amount of experience. Knowing that we all make mistakes, the importance of testing becomes very clear. We must thoroughly test everything that is supposed to go to production to catch most of the bugs before they even see daylight. There are many ways to test code, and there are established practices for making testing part of the development workflow, like test-driven development (TDD) and regular code reviews. All of them can significantly reduce the number of bugs.
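
To make the idea of catching bugs before production a bit more concrete, here is a minimal sketch of what a few unit tests might look like in Python. The `apply_discount` function and its rules are hypothetical, invented only for illustration. The tests are plain functions in the style that pytest discovers automatically, so they can also be called directly without any test framework installed.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: price after a percentage discount."""
    # Guard against inputs that would produce nonsense results.
    if price < 0:
        raise ValueError("price must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


# Tests written in the plain-function style that pytest discovers automatically.
def test_regular_discount():
    assert apply_discount(200.0, 25) == 150.0


def test_boundary_discounts():
    # Boundary values are a classic place for implementation bugs to hide.
    assert apply_discount(80.0, 0) == 80.0
    assert apply_discount(80.0, 100) == 0.0


def test_rejects_invalid_percent():
    try:
        apply_discount(50.0, 120)
    except ValueError:
        return  # expected: invalid input is rejected
    raise AssertionError("expected ValueError for a discount above 100%")


if __name__ == "__main__":
    # Allows running the checks without pytest installed.
    test_regular_discount()
    test_boundary_discounts()
    test_rejects_invalid_percent()
    print("all tests passed")
```

In a TDD workflow, tests like these would be written before `apply_discount` itself, so the edge cases are decided up front rather than discovered by QA later.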

Testing and reviewing code is essential, but it takes time. In my practice, the rule of thumb is to spend about 50% of the time testing and improving the system. After developers finish a feature, they deploy the code to a test server where QA engineers can thoroughly test it. The development team can start working on a new task while waiting for feedback from QA. After testing, all bugs are prioritized and scheduled for future iterations. This increases the time it takes to deliver a feature to customers but significantly reduces the number of bugs that reach them.

How do you know what is a bug?

One common problem is defining what a bug is. Suppose you’re building an application for a particular domain. The logic of the application is very well described and documented, developers implement everything according to the specification, your team is doing QA, and only a minimal number of bugs make it into the system. Even in a situation like that, three types of bugs occur often:

  1. The specification was correct, but the implementation was wrong. Implementation bugs are quite common among inexperienced developers who make questionable choices in their code. They can also result from the ripple effect of a change in a large codebase; in that case, even a seasoned developer may not understand how their update will impact the system as a whole. This is one of the most common types of bugs overall, and also the easiest to catch and prevent with the right amount of developer training, time spent on unit tests, code reviews, and refactoring.
  2. The implementation was correct, but the specification was wrong. The customer still perceives this as a bug, but it’s not a bug from the developer’s point of view. Such bugs result from poor communication between the feature owner and the person writing the specification: sometimes details are lacking, and sometimes the description is misleading. The specification can also become inconsistent, especially for large features with a lot of interdependent pieces. These bugs are much harder to track down and fix than implementation errors, and they often require significant effort to fix because the logic itself changes.
  3. The specification didn’t cover that specific case. This type of bug is similar to #2 but can cause even more problems, depending on the significance of the missing case. These bugs usually come to light at release time, when someone realizes that the application behaves incorrectly or the end result is not what it should be. They have to be researched and can severely impact the development timeline, because a gap in the specification can lead to totally unexpected results.
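
One way to make the three categories above actionable during triage is to turn them into a simple decision. The sketch below is a hypothetical Python helper, not a standard tool or process. It assumes someone has already confirmed that the reported behavior is wrong from the customer’s point of view and can answer two questions: does the specification cover this case, and does it describe it correctly?

```python
from enum import Enum


class BugType(Enum):
    IMPLEMENTATION = "specification correct, implementation wrong"
    SPECIFICATION = "implementation correct, specification wrong"
    MISSING_SPEC = "specification does not cover the case"


def classify_bug(spec_covers_case: bool, spec_is_correct: bool) -> BugType:
    """Map a confirmed bug to one of the three categories (hypothetical triage helper)."""
    if not spec_covers_case:
        # Nobody decided how this case should behave: research is needed first.
        return BugType.MISSING_SPEC
    if spec_is_correct:
        # The spec said the right thing, so the code must have diverged from it.
        return BugType.IMPLEMENTATION
    # The code followed the spec faithfully, but the spec itself was wrong.
    return BugType.SPECIFICATION


if __name__ == "__main__":
    print(classify_bug(spec_covers_case=True, spec_is_correct=False).value)
```

The point of the split is who has to act: implementation bugs go back to developers, while the other two usually need the feature owner to clarify the logic before any code changes.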

Prioritize bugs and fix those that are important

Once we accept the fact that bugs in software are inevitable and we can’t make them go away, we can create a system for working with them. Fixing all bugs is impossible: a live project with a decent number of users will have hundreds of bugs reported, and the only way to work through them is to categorize them by importance. A small team building a startup should take the same approach, because when you have limited resources, you have to be very careful about how you spend them.

The importance of every bug depends on how critical it is to core system functionality (data consistency is important!) and on the number of users affected. Some bugs are simply not worth fixing because the time needed to fix them far outweighs their impact. Others are kept forever because fixing them would require refactoring a large portion of the system and would most likely introduce even more bugs.
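
The trade-off described above (impact versus the cost and risk of fixing) can be sketched as a simple scoring function. Everything here is an assumption for illustration: the `Bug` fields, the 1 to 5 criticality scale, the formula, and the 0.1 cutoff are all hypothetical, and a real team would tune them or simply use judgment instead.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    criticality: int         # hypothetical scale: 1 (cosmetic) to 5 (breaks core functionality or data)
    users_affected: float    # fraction of users who hit the bug, 0.0 to 1.0
    fix_cost_days: float     # estimated effort to fix
    regression_risk: float   # 0.0 to 1.0, rough chance the fix breaks something else


def priority(bug: Bug) -> float:
    """Hypothetical score: expected impact divided by the risk-adjusted cost of fixing."""
    impact = bug.criticality * bug.users_affected
    risk_adjusted_cost = bug.fix_cost_days * (1 + bug.regression_risk)
    return impact / risk_adjusted_cost


def triage(bugs: list[Bug], threshold: float = 0.1) -> list[Bug]:
    """Return bugs worth fixing, most important first; the rest stay in the backlog."""
    worth_fixing = [b for b in bugs if priority(b) >= threshold]
    return sorted(worth_fixing, key=priority, reverse=True)


if __name__ == "__main__":
    backlog = [
        Bug("Orders saved twice", criticality=5, users_affected=0.3,
            fix_cost_days=2, regression_risk=0.2),
        Bug("Misaligned footer on tablets", criticality=1, users_affected=0.05,
            fix_cost_days=1, regression_risk=0.0),
        Bug("Login fails for SSO users", criticality=4, users_affected=0.2,
            fix_cost_days=3, regression_risk=0.1),
        Bug("Legacy report rounding", criticality=3, users_affected=0.1,
            fix_cost_days=15, regression_risk=0.8),
    ]
    for bug in triage(backlog):
        print(f"{priority(bug):.2f}  {bug.title}")
```

With these made-up numbers, the data-consistency bug comes first and the login bug second, while the cosmetic footer issue and the expensive, risky legacy rounding bug fall below the cutoff. That last one is exactly the kind of bug that tends to be kept forever.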