Tell me about the last time you prepared for something, and it was successful because you prepared

Situation:

  • In the past, before I was team lead, every time there was an on-call issue or alert, the engineer had to work out a solution from scratch
  • They needed to find that solution on the spot
  • This slowed down our response time

Task:

  • My goal was to make the on-call process seamless
  • I wanted to lower our resolution time

Action:

  • Much like a POH (Pilot's Operating Handbook), which pilots use when there is an incident on the plane, I decided to create my team's on-call handbook. I called it the SRE7 Operating HandBook
  • In it, I documented most of the procedures to follow when there's a fire
  • When a new incident happens, the on-call person usually updates the handbook
  • Also, during our post-mortems, I make sure that the handbook is updated

Result:

  • As a result, people were no longer intimidated by being on-call
  • I also managed to lower our response time as a team

Lesson:

  • We have to constantly find ways to improve our existing systems