Tell me about the last time you prepared for something and, because you prepared, it was successful
Situation:
- Before I became team lead, every time there was an on-call issue or alert, the engineer had to find a solution from scratch
- They had to work it out on the spot, with nothing documented to refer to
- This slowed down our response time
Task:
- My goal was to make the on-call process seamless
- I wanted to lower our resolution time
Action:
- Much like a POH (Pilot's Operating Handbook), which pilots use when there is an incident on the plane, I decided to create my team's on-call handbook. I called it the SRE7 Operating HandBook
- In it, I documented most of the procedures to follow when there's a fire
- When a new incident happens, the on-call person usually updates the handbook
- Also, during our post-mortems, I make sure that the handbook is updated
Result:
- As a result, people were no longer intimidated by being on-call
- I also managed to lower our response time as a team
Lesson:
- We have to constantly look for ways to improve our existing systems