If you’ve ever babysat a toddler, you already understand on-call.
Not the cute Instagram version.
The real one.
The one where everything is calm…
until suddenly something is in their mouth that absolutely should not be.
That’s production at 2:37 a.m.
The System Was Fine Five Minutes Ago
Toddlers don’t wake up planning chaos.
They just:
- Get curious
- Touch things
- Test gravity
- Explore electrical outlets
Your system does the same.
A traffic spike appears.
A disk fills up.
A dependency times out.
A cron job decides today is the day.
Nothing is “wrong” with the toddler.
Nothing is “broken” in your system.
It’s just… being alive.
On-Call Isn’t Parenting. It’s Babysitting.
Here’s the key reframing:
Parents redesign the house.
Babysitters keep the kid alive until morning.
On-call is not about:
- Perfect architecture
- Long-term fixes
- Deep root cause analysis
On-call is about:
- Stopping bleeding
- Preventing worse damage
- Buying time
Your job is not to raise the system into adulthood at 3 a.m.
Your job is to make sure it doesn’t eat glue.
Why Things Break at Night (It’s Not Personal)
Night incidents feel unfair because:
- Fewer people are around
- Everything feels urgent
- Your brain is half asleep
But systems don’t have a bedtime.
Traffic is global.
Jobs run on schedules.
Failures don’t check calendars.
It only feels like production is being mean.
In reality, you’re just the adult in the room.
The Calm Babysitter Always Wins
Bad babysitting looks like:
- Panic
- Random clicking
- Trying five fixes at once
- Yelling “WHY IS THIS HAPPENING” internally
Good babysitting looks like:
- Breathe
- Stop the immediate danger
- Do the simplest safe thing
- Monitor
Sometimes that’s restarting a service.
Sometimes it’s scaling up.
Sometimes it’s rolling back.
Not elegant.
Just safe.
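If it helps to see "do the simplest safe thing" written down, here is a rough Python sketch. Everything in it is made up for illustration: the `Symptoms` fields, the `Action` names, and especially the order of the checks are assumptions, not a standard runbook. Your team's priorities may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # The three mitigations named above, plus a fallback for "none of these fit".
    ROLL_BACK = "roll back the last deploy"
    SCALE_UP = "scale up"
    RESTART = "restart the service"
    ESCALATE = "wake up a teammate"


@dataclass
class Symptoms:
    # Hypothetical signals you would read off your own dashboards.
    recent_deploy: bool = False
    traffic_spike: bool = False
    service_unresponsive: bool = False


def simplest_safe_action(symptoms: Symptoms) -> Action:
    """Pick exactly one mitigation instead of trying five at once.

    The ordering below is an assumption: undo a recent change first,
    then add capacity, then restart, and otherwise ask for help.
    """
    if symptoms.recent_deploy:
        return Action.ROLL_BACK
    if symptoms.traffic_spike:
        return Action.SCALE_UP
    if symptoms.service_unresponsive:
        return Action.RESTART
    return Action.ESCALATE


if __name__ == "__main__":
    choice = simplest_safe_action(Symptoms(traffic_spike=True))
    print(f"Do one thing: {choice.value}. Then monitor.")
```

The point is not the code. The point is that it returns one action, and then you watch what happens.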
You’re Not Failing If You Don’t Fix It Forever
One of the biggest on-call myths:
“If I don’t fully solve it, I did a bad job.”
Nope.
If the system is stable when the sun comes up — you won.
Permanent fixes are daytime work.
With coffee.
And teammates.
Nighttime is survival mode.
And that’s okay.
What This Means in Real Life
Think of on-call like this:
- You’re not an architect at night
- You’re a safety net
- Your goal is boring stability
If nothing catches fire and users are okay — that’s success.
Even if the fix was duct tape.
✅ Reframe to Remember
On-call isn’t about brilliance.
It’s about calm.
It’s just babysitting a very fast, very curious toddler made of servers.

