Bugs will happen from time to time. As our systems grow in complexity, new functionality brings new risks. What makes or breaks a team is not only how it handles incidents, but also how it learns from them. This is where incident postmortems come into the picture.
What is a postmortem?
An incident postmortem is a framework, usually in the form of a meeting or document, for analyzing in depth what occurred, why it occurred, and how to prevent it from recurring. Blame-free postmortems reject the notion of a single root cause or culprit; they accept that in most, if not all, cases there are multiple contributing factors. As such, blame cannot be assigned to a single person.
Below, we offer three postmortem practices that embrace a blame-free culture.
Do it to learn
The focus of a blame-free postmortem is to learn: about the details of the incident, how it could have been handled better, where the team got lucky, and how team members can better support each other next time. In the absence of a blame-free culture, the main question during a postmortem drifts toward “who caused all this?” That question ignores that engineers make the best decisions they can with the data and knowledge they have. In a learning-focused, blame-free postmortem, there is a collective effort to gather more data and knowledge so that engineers can make better-informed decisions.
Try asking questions like these instead:
– What are the potential contributing factors that led to this incident?
– What systemic improvements can we make to prevent this from happening again?
– What steps would increase the data and knowledge available to engineers so that they can make more informed decisions in the future?
– How can we improve our documentation so that future engineers on our team can understand what happened and what we have done to make our system more resilient?
– What additional data validation should we add so that we can catch and fix a bug like this before it affects users?
Make it a group effort
A postmortem report prepared by a single person isn’t very effective, as it offers only one perspective on the incident, from lead-up through resolution to next steps. This not only restricts what the team can learn from a given incident, it also fails to be inclusive.
One recommendation is for teams to collect perspectives asynchronously. Empowering participants to voice their perspectives outside of a meeting allows for more thoughtful answers and greater inclusivity. A member of your CS team might have valuable information about how customers noticed an issue, and that might go unheard if there are only six engineers in the room. Encouraging others to participate in the postmortem also helps ensure that everyone is heard, even people who are new to the organization or uncomfortable speaking up, and even when a few voices tend to dominate the conversation.
Collaboration during a postmortem meeting allows participants to offer additional perspectives and surface contributing factors that would otherwise be glossed over. It leads to a richer picture of what really happened, which better informs what to improve, and how, for next time. And, of course, it gives more people a platform to share observations that others may have missed and their own ideas about what could benefit the team. A postmortem for the team, by the team.
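For teams that want to try the asynchronous approach, here is a minimal Python sketch of how individual submissions might be combined before the meeting. Everything in it is an assumption made for illustration: the `Perspective` fields, the `merge_perspectives` and `teams_heard` helpers, and the sample data are hypothetical, and they do not describe any particular tool's format. The idea is simply that each contributing factor is kept alongside everyone who raised it, so no single factor or person is singled out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Perspective:
    """One participant's asynchronous input (hypothetical field names)."""
    author: str
    team: str
    contributing_factors: list[str] = field(default_factory=list)


def merge_perspectives(perspectives):
    """Map each contributing factor to the sorted list of people who raised it.

    Factors are compared case-insensitively with surrounding whitespace
    removed, so the same observation from two people is grouped together.
    """
    reported_by = defaultdict(set)
    for p in perspectives:
        for factor in p.contributing_factors:
            reported_by[factor.strip().lower()].add(p.author)
    return {factor: sorted(authors) for factor, authors in reported_by.items()}


def teams_heard(perspectives):
    """Return the sorted set of teams that contributed at least one perspective."""
    return sorted({p.team for p in perspectives})


if __name__ == "__main__":
    # Made-up sample submissions, for illustration only.
    submissions = [
        Perspective("Ana", "Engineering", ["Missing alert on queue depth"]),
        Perspective("Ben", "CS", ["Customers noticed errors first",
                                  " missing alert on queue depth "]),
    ]
    for factor, authors in merge_perspectives(submissions).items():
        print(f"{factor}: raised by {', '.join(authors)}")
    print("Teams heard from:", ", ".join(teams_heard(submissions)))
```

A quick look at the list of teams heard from is one way to notice, before the meeting, that a group with relevant context has not yet weighed in.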
Be accepting of mistakes
No matter the severity of an incident, it is important to remember that every team member was trying their best and making the best decision they could at the time. Rather than reprimanding people for their mistakes, a blame-free culture encourages accepting mistakes and learning from them (see “Do it to learn”). When we work in an environment in which mistakes lead to punishment, people start to shy away from taking risks, being creative, and speaking up. Fear of speaking up undermines the effort to gather information about the incident, especially when someone holds information or context that no one else does (see “Make it a group effort”). It is therefore beneficial to accept mistakes and to encourage honesty and transparency during postmortem meetings. Only then will there be trust between team members and a sense of psychological safety.
We are happy to announce that FireHydrant now has a free tool for postmortems. It provides the structure of standard questions along with a collaborative space to share lessons learned. At the end of it, it’ll generate a pretty PDF and JSON file to share with others.