Nowadays, cars feature more electronic components than ever. Embedded systems for braking, acceleration, navigation, communication, and dozens of other functions must perform flawlessly in varied conditions because human life is at stake. With current trends in autonomous vehicle technology, the number of safety-critical embedded systems will only increase and require more attention.
At Codelab, we mostly operate in industries with very high security standards and have many years of experience in Automotive projects. Our organizational processes are therefore aligned with the Automotive SPICE® standard and focused on constant improvement. Fixing problems quickly is important, but the only thing that can prevent mistakes from happening again is finding the root cause and tackling it properly. For Root Cause Analysis we used to rely on the 5 Whys technique, developed almost a hundred years ago by Sakichi Toyoda. This is a very popular tool, especially in Lean Management. However, we quickly realized that, despite some advantages, this technique is too simplistic for our needs. Too often the final answer of the investigation is ‘human error’, and too often it leads to a discouraging blame culture. We felt the need to find a powerful, effective and socially conscious tool for post-mortem analysis. We were inspired by the Infinite Hows method, thoroughly described by John Allspaw and also by Nick Stenning with Jessica DeVita.
The method does not simply swap one word for another. Replacing Whys with Hows comes primarily with a different mindset and an effort to ask better questions and, as a result, to get more valuable answers. In the following part, I will present this topic in more detail.
Infinite Hows method
Asking ‘why’ seems like a perfect start for any investigation, but in the end the question inevitably changes to ‘who is responsible?’. Judging a specific person won’t help the project team with either learning or improving.
Let’s take an example. If we start a 5 Whys analysis with the question ‘why was the delivery late?’, we will probably learn that the root cause of the problem is either that the manager doesn’t have sufficient management skills or that someone on the team is not skilled or trained enough to deliver tasks on time. Yes, training is important, but we don’t need a proper analysis to come to this conclusion, and it doesn’t help with understanding the event, let alone with improving and learning from mistakes. Repeatedly asking people why they did something may put them on the defensive and make them speak less frankly, especially when the questions come from someone more powerful in the organization.
When using the Infinite Hows method, we start by asking ‘how did we make the delivery?’. This gives us an opportunity to learn how we evaluated the scope of the work, how much time pressure we experienced, how often delivery delays happen, how the approach to coding and testing was chosen, and the list goes on and on. Asking ‘how’ lets us understand the conditions that allowed the failure to happen, and gives us a wider perspective and more valuable data. It allows us to comprehend the whole story and find out what was responsible for the error. The shift of responsibility from who to what not only helps with understanding, learning, and making project improvements but also maintains a respectful, open-minded and engaging working environment.
To work with the Infinite Hows method, we need to start by understanding people’s local rationality.
Local rationality
When we consider our own actions and decisions, it is obvious that we try to do what makes sense to us at the time. We believe that we do reasonable things given our knowledge and understanding of the problem at a particular moment. In most cases, when we make a decision, we think it is the best, most rational choice. Otherwise, we wouldn’t have made it. This is known as the ‘local rationality principle’. Our rationality is necessarily local because it is limited to our mindset, knowledge, capabilities, and goals, as well as to the amount of information that we can handle. While we usually accept this limitation for ourselves, we often use different criteria for others. We assume that they should have or could have acted differently, based on our current, post-incident knowledge. That’s why we are so eager to look for someone to blame during a failure investigation. It is a natural human tendency to imagine alternative outcomes for events that have already occurred. But again, while counterfactual thinking is tempting, it does not convey information about the complex situation, the environment, or the problem itself.
Asking better questions, leading interviews in a more empathetic manner, and analysing problems from broader perspectives is a continuous learning process. There is no simple manual. As a result, a complex analysis is time-consuming and doesn’t give a simple answer, but that doesn’t make it a weak analysis. It is this kind of analysis that makes us learn, and any failure prevention depends on that learning.