Note: This survey was conducted before the recent protests in the US that followed the death of George Floyd. It’s unclear how those events would have impacted the data, or any conclusions drawn from it.
One of my concerns as COVID-19 took hold in the US was what the impact would be on tech teams that are oncall. It can be extremely challenging to be oncall during a “normal” time, and this has been anything but normal. So, I decided to create a survey to learn more about what people’s experiences have been. The survey was conducted from April 8 to April 27, 2020, via a Google Form. It was anonymous and had 141 respondents.
Before we get into the data, I want to set a bit of context. I’m not a data scientist and I don’t consider these results to prove anything scientifically. I do, however, find them very interesting.
While I wanted mainly to hear from folks who were currently in oncall rotations, I expected some people would complete the survey who were not oncall. So the survey began with this question:
Q1. I currently am in an oncall rotation.
Of the 141 respondents, 118 said they were currently oncall. The respondents that were part of an oncall rotation proceeded to the next section called Team Impacts. The respondents who said they were not oncall skipped to a section at the end called Company Information.
Q2. One or more of our team members have been sick with COVID-19.
Only 16 of the 118 oncall respondents reported having a teammate who had been sick with COVID-19. I expected this number to be a bit higher, based on how contagious SARS-CoV-2 is. Several of my teammates have contracted the virus, so that might be skewing my perception. A good chunk of the FireHydrant employees are located in New York, which might explain it.
Q3. We have had to adjust our oncall rotation due to team members being sick with COVID-19.
Only four of the 118 respondents reported having to adjust their oncall schedules due to their sick team members. That’s only 25 percent of the people who indicated they had a sick teammate. This one shocked me. I guess it depends on the team size and frequency of the oncall rotations, and maybe even some luck. In a big rotation, someone could be sick for several weeks and not have an oncall shift. I was once in a 10-person rotation, which meant I had more than two months off between shifts.
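For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the shares from the counts reported so far. The constant names and the `share` helper are hypothetical, chosen only for illustration; they are not part of how the survey was run or analyzed.

```python
# Counts as reported in the write-up above (Q1 through Q3).
TOTAL_RESPONDENTS = 141
ONCALL = 118             # Q1: currently in an oncall rotation
SICK_TEAMMATE = 16       # Q2: a teammate has been sick with COVID-19
ADJUSTED_FOR_SICK = 4    # Q3: rotation adjusted because of sick teammates


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 1)


shares = {
    "oncall, of all respondents": share(ONCALL, TOTAL_RESPONDENTS),
    "sick teammate, of oncall respondents": share(SICK_TEAMMATE, ONCALL),
    "adjusted rotation, of oncall respondents": share(ADJUSTED_FOR_SICK, ONCALL),
    "adjusted rotation, of those with a sick teammate": share(
        ADJUSTED_FOR_SICK, SICK_TEAMMATE
    ),
}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The last line of output is the 25 percent figure mentioned above. The same four respondents are only about 3 percent of everyone who was oncall, which is a reminder that the choice of denominator changes how large a number feels.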
Q4. We have had to adjust our oncall rotation to accommodate team members as a result of sheltering at home.
Twenty-seven of the 118 respondents said they had to adjust oncall schedules due to sheltering at home. That’s many more than had to adjust for sick teammates. This isn’t surprising, considering how disruptive sheltering at home has been to some people’s lives. People with children are suddenly doing childcare all day, and some people have had to care for others.
I was happy to see that some teams are making those adjustments. I’ve been saying for a while that teams being flexible is going to be critical.
Q5. I am more concerned about the stability of our systems than I was before the COVID-19 crisis.
Only 30 of the 118 respondents said they were more concerned about the stability of their systems. This is another data point that surprised me. There’s so much uncertainty right now, and I expected that to bleed over more into how people think about their systems. I was happy to hear that a lot of the respondents are doing better than I expected, though. I struggle at times with anxiety and tend to see things through that lens. This is why I wanted to get some outside perspectives, because I know my assumptions are often not correct.
This was the first open-ended question of the survey. All of the open-ended questions were optional. This one received 74 responses.
Some themes emerged. Of the respondents who were neutral or disagreed with the previous question (the ones who were not more concerned), one factor that was mentioned repeatedly was that their teams are running their workloads in a public cloud. Several mentioned AWS and one said they were confident in AWS’s supply chain.
Some respondents mentioned that they felt their systems were well architected or robust. Others were already used to working remotely. People also mentioned things like having good automation and good incident coordination.
Some of the respondents who were more concerned about the stability of their systems worried about having physical access to equipment in their offices or data centers. One mentioned that their colocation facility had clamped down on access and that it had prevented them from doing some capacity upgrades. That response hurt me to read. What bad timing.
Scaling and traffic were also mentioned. Some companies, due to the nature of their business, are experiencing higher traffic levels than usual. One respondent worked at an email provider and said their traffic had been at Cyber Monday levels. Other people said their company’s traffic had significantly dropped off. Some companies have gained customers, while others have lost some.
One of my favorite responses came from someone who had said they were neutral on the previous question about system stability: “It was bad to begin with and hasn’t improved since.”
Q7. My team’s incident response during the COVID-19 crisis has been…
Ninety-two of the 118 respondents, or 78 percent, said their incident response was about the same. Fifteen percent said it was better, and about seven percent said it was worse. This was another result that surprised me. I certainly wasn’t expecting more people to say their team’s incident response was better than to say it was worse. We’ll look at some of the reasons in the next question.
The second open-ended question received 58 responses. Twelve of those came from people who had said in the previous question that their incident response had been better. The most common response was that people were more available since they were all remote. One respondent mentioned that no one was traveling on their team. Another said their team had recently introduced monitoring and incident response tools. Others suggested that people were near their computers and ready.
People also mentioned that there was better communication. One respondent mentioned that things that would have been in face-to-face conversations or private Slack channels were now in the open. Another called it “forced intentional collaboration.”
There were only five responses from people who said their incident management was worse. Three of them mentioned working remotely. One mentioned slower responses due to travel limits. And one, unfortunately, responded that their management was putting more pressure on them to be “productive” (the quotes are theirs), which had made it “an impossible task to do anything right.”
The most common response from the people who said their incident response was about the same was that things just hadn’t changed much. Several used the phrase “business as usual.” Some people said their teams just hadn’t had many incidents. Others pointed to their teams already being remote and having good communication practices.
One person mentioned that their team already had a focus on keeping alerts actionable. They said they had just been looking at their alerts again to identify items that could wait until the following day for remediation instead of waking someone up in the middle of the night.
One respondent answered, “luck.”
Q9. Prior to the COVID-19 pandemic, our oncall team was primarily remote.
I wasn’t sure what kind of mix we’d get here. I’ve been working remotely for a couple of years now and assume it’s pretty common. But 65 percent of the folks who responded to the survey were not on remote teams. This result makes some of the earlier answers even more interesting, like the questions about their confidence in the systems and whether their incident response had gotten better or worse. I would have assumed that teams who were adjusting to suddenly being remote might have less confidence in the system and that their incident response might have gotten worse.
At the same time, many teams that aren’t remote still have experience responding to incidents remotely after hours. When I thought about this and reflected on my career, I responded remotely to incidents after hours with my teams for many years, before remote work in tech was even common.
Q10. What has surprised you about your team’s incident response, during the COVID-19 crisis?
The final open-ended question received 36 responses. Eleven respondents said that nothing had changed, or very little. It was the most common response by far.
Other people mentioned their team was supporting each other and looking out for the mental health of their co-workers. One respondent said that “I think people have become a little closer because it’s a social group that still exists now.”
I loved hearing this: “I had thought people would be too depressed or anxious to perform but my colleagues are all really throwing themselves into their work. The last postmortem I read was 23 pages rather than our usual 3-5.”
Someone else was surprised by “How we’ve been able to keep working despite all the added distractions of children, spouses, pets, and news.”
Q11. Members of my team have reported having increased anxiety, depression, or other mental health issues, due to the COVID-19 crisis.
This is the most striking result of the survey for me. I expected a lot of people to answer yes, but 69 percent was even more than I would have guessed. Also, keep in mind that the question asked about what the team members have reported. I try to be pretty transparent about my mental health issues with my teams, but I know that not everyone is comfortable doing that. I operate from a lot of privilege as a straight white guy working in tech. For some people, talking about their anxiety or depression might hurt their careers.
If the majority of the people who responded to the survey have a team member who reported having mental health issues due to the COVID-19 crisis, I can’t help but wonder how many others are also suffering more and haven’t said so.
This final section contained overall questions about the respondent’s employer. All respondents received this question, whether they reported being oncall or not.
Q12. My employer has communicated well with employees about the COVID-19 crisis.
Eighty-two percent of respondents said they agreed. This was great to see.
Q13. My employer has shown empathy towards employees about the COVID-19 crisis.
Eighty percent of respondents agreed with this question.
I was pleased to see the responses to this question and the previous one. I’ve seen some very good messaging from companies around the crisis, but I didn’t expect this positive a response. The reality is that this is going to be a tough situation to manage for companies that do communicate well with their employees, let alone the ones that don’t.
Q14. How large is your company?
There were no respondents in the 1-5 employee range. Fifty-three percent of respondents were from companies with 500+ employees.
I initially undertook this survey based on my curiosity about how oncall teams are dealing with the COVID-19 crisis. After seeing the responses to the survey, I feel more optimistic. In a way, I shouldn’t have been surprised. Humans are very adaptable. If you had asked me a few months ago how I’d be dealing with sheltering at home alone for two months, I would have thought I’d be handling it much worse.
I’m not oncall, of course, but I did see some of the things I was hoping to, in terms of teams taking care of each other. I tend to talk about the negative aspects of being oncall, but anyone who’s been in a rotation has likely experienced the camaraderie and bonding that can happen when teams work incidents. The people on these teams are used to dealing with difficult and surprising technical challenges together. Those shared stories are part of team building.
I’d like to thank everyone who helped get the word out about the survey, and especially people who took the time to respond. Your insights have given me a lot to think about.