If something goes wrong on an airplane, nobody expects you to read a 200-page manual.
Instead, you get a small card.
It has:
- pictures
- short instructions
- only the important steps
Why?
Because emergencies are a terrible time for homework.
Incidents in production work the same way.
When systems are failing, nobody wants to open a giant wiki page called:
“Operational Recovery Procedures v17 FINAL FINAL 2.”
They want the emergency card.
That’s what a runbook is supposed to be.
Your Brain Gets Smaller During an Incident
In calm moments, engineers can:
- explore
- compare options
- read long documentation
- think in elegant diagrams
During an incident?
Your brain switches into survival mode.
You are:
- stressed
- rushed
- distracted
- probably being pinged by three people at once
This is why even smart engineers forget obvious things under pressure.
Not because they are bad.
Because pressure shrinks thinking.
That’s exactly why runbooks matter.
Airplane Cards Don’t Explain Aerodynamics
An airplane safety card does not teach you:
- how jet engines work
- why cabin pressure changes
- the history of aviation safety
It gives you exactly what you need right now:
- where the exits are
- what to do first
- how to use the mask
- where the life vest is
A good runbook works the same way.
It should answer:
- What is happening?
- What should I check first?
- What is the safe next step?
- How do I stabilize things?
- When do I escalate?
Not everything.
Just the critical path.
Long Documentation Is Useful — Just Not at 3 A.M.
Detailed documentation absolutely has value.
Design docs matter.
Architecture notes matter.
Deep technical explanations matter.
But they serve a different purpose.
Think of it like this:
- Documentation teaches
- Runbooks guide action
A runbook is not the place for background essays.
It is the place for:
- Symptoms
- Quick checks
- Immediate actions
- Rollback steps
- Escalation paths
- Links to deeper docs if needed
That’s it.
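To make that list concrete, here is a minimal sketch of a runbook as a small data structure with exactly those sections. The `Runbook` class, its field names, and the example content are all hypothetical, not a standard format; the point is only that the whole thing fits on one screen.

```python
from dataclasses import dataclass, field


@dataclass
class Runbook:
    """One-page runbook: only the sections a responder needs mid-incident."""
    title: str
    symptoms: list[str] = field(default_factory=list)
    quick_checks: list[str] = field(default_factory=list)
    immediate_actions: list[str] = field(default_factory=list)
    rollback_steps: list[str] = field(default_factory=list)
    escalation_paths: list[str] = field(default_factory=list)
    deeper_docs: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the runbook as a short plain-text card, skipping empty sections."""
        sections = [
            ("Symptoms", self.symptoms),
            ("Quick checks", self.quick_checks),
            ("Immediate actions", self.immediate_actions),
            ("Rollback steps", self.rollback_steps),
            ("Escalation paths", self.escalation_paths),
            ("Links to deeper docs", self.deeper_docs),
        ]
        lines = [self.title]
        for heading, items in sections:
            if not items:
                continue
            lines.append("")
            lines.append(heading)
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


# Hypothetical example content, for illustration only.
card = Runbook(
    title="Checkout API: elevated error rate",
    symptoms=["Error rate alert firing on checkout-api"],
    quick_checks=["Check service health endpoint", "Confirm database connectivity"],
    immediate_actions=["If error rate exceeds X, roll back release"],
    rollback_steps=["Redeploy the previous release"],
    escalation_paths=["If rollback fails, page platform team"],
)
print(card.render())
```

If a section will not fit in a few bullets, that is usually a sign it belongs in the deeper docs, with a link from the card.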
Visual Beats Clever
Airplane cards use pictures because pictures are fast.
A tired person under stress can understand:
- arrows
- icons
- short boxes
- yes/no decisions
The same is true in incident response.
A good runbook often includes:
- a checklist
- a decision tree
- a tiny architecture sketch
- command examples you can copy safely
You are not trying to impress people.
You are trying to help a stressed human do the right thing quickly.
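A decision tree does not need a drawing tool. As a rough sketch, it can be a handful of yes/no questions that each lead to one next step. The `Question` and `Action` classes, the `walk` helper, and every question and action in the example tree are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Action:
    """A leaf: the single next step to take."""
    text: str


@dataclass
class Question:
    """A yes/no check with one branch per answer."""
    text: str
    yes: Union["Question", Action]
    no: Union["Question", Action]


def walk(node, answers):
    """Follow yes/no answers (True/False) through the tree and return the action text."""
    for answer in answers:
        if isinstance(node, Action):
            break  # already at a leaf; ignore any extra answers
        node = node.yes if answer else node.no
    if isinstance(node, Question):
        raise ValueError(f"Unanswered question: {node.text}")
    return node.text


# Hypothetical tree, for illustration only.
TREE = Question(
    "Is the service health endpoint responding?",
    yes=Question(
        "Is the error rate above the threshold?",
        yes=Action("Roll back the release"),
        no=Action("Keep monitoring"),
    ),
    no=Question(
        "Is the database reachable?",
        yes=Action("Restart the service"),
        no=Action("Page the platform team"),
    ),
)

print(walk(TREE, [True, True]))
print(walk(TREE, [False, False]))
```

Every path ends in exactly one action, which is the property that matters: a stressed reader never lands on "it depends."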
The Best Runbook Feels Almost Boring
A strong runbook is:
- short
- obvious
- concrete
- predictable
It says things like:
- “Check service health endpoint”
- “Confirm database connectivity”
- “If error rate exceeds X, roll back release”
- “If rollback fails, page platform team”
Not:
- “Investigate system anomalies holistically”
- “Consider various mitigation strategies”
- “Use engineering judgment”
Those lines may sound smart.
But in a real incident, vague language is useless.
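One way to catch vague wording before an incident does is a quick check over the runbook steps. This is only a sketch: the phrase list is a hypothetical starter set drawn from the vague examples above, and `find_vague_steps` is an illustrative helper, not an existing tool.

```python
# Hypothetical starter list based on the vague examples above; extend it for your team.
VAGUE_PHRASES = ("holistically", "consider various", "use engineering judgment")


def find_vague_steps(steps):
    """Return (step, matched_phrase) pairs for steps that contain a vague phrase."""
    flagged = []
    for step in steps:
        lowered = step.lower()
        for phrase in VAGUE_PHRASES:
            if phrase in lowered:
                flagged.append((step, phrase))
                break  # one match is enough to flag the step
    return flagged


steps = [
    "Check service health endpoint",
    "Investigate system anomalies holistically",
    "Use engineering judgment",
]
for step, phrase in find_vague_steps(steps):
    print(f"Vague: {step!r} (matched {phrase!r})")
```

A flagged step is a prompt to rewrite it as a specific check or action, not proof that it is wrong.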
What This Means in Real Life
If your runbook only works when people are calm, rested, and already understand the system…
it is not a runbook.
It is a textbook.
A real runbook should work for:
- tired people
- stressed people
- new people
- people with Slack exploding in the background
Because incidents are not the time to study.
They are the time to follow the emergency card.
🛟 Reframe to Remember
Runbooks are airplane safety cards for production.
Nobody reads a novel during an emergency.
People need the picture, the checklist, and the next safe step.
That is what saves uptime.

