When something breaks in production, a lot of people instinctively ask the same question:

“Okay… who did it?”

That sounds dramatic.

It also sounds human.

We like stories with clear villains.

The database was innocent. The deploy was suspicious. Somebody pushed something at 4:58 p.m. and now the homepage looks haunted.

But a good postmortem is not a courtroom.

It’s a detective story.

And detective stories are not really about punishment.

They are about understanding what happened, how it happened, and how to stop it from happening again.


A Bad Postmortem Starts Like a Trial

The moment an incident is over, tired brains want closure.

So people start reaching for simple answers:

  • Who changed what?
  • Who approved this?
  • Who missed the warning signs?
  • Who made the mistake?

That can feel satisfying for about five minutes.

But it usually leads to the wrong ending.

Because incidents in complex systems are rarely caused by one evil mastermind in a hoodie.

They are usually built from many small things:

  • a confusing alert
  • an old assumption
  • a hidden dependency
  • a rushed deploy
  • a misleading dashboard
  • a missing safeguard

That is not a villain.

That is a plot.


Detectives Look for Chains, Not Villains

In a detective story, the interesting question is not just who touched the doorknob.

It is:

  • What happened first?
  • What made the next event possible?
  • What clues were visible?
  • Which assumptions turned out to be wrong?
  • Why did normal defenses fail?

That is exactly how a useful postmortem works.

It asks:

  • What was the timeline?
  • What changed?
  • What signals did we see?
  • What made diagnosis slow or difficult?
  • What protections were missing?

This is how you move from blame to learning.


“Human Error” Is Not the Ending

Imagine a detective finishing a case by saying:

“Well, a human touched something. Mystery solved.”

That would be a terrible detective.

And “human error” is usually a terrible postmortem conclusion.

Of course a human was involved.

Humans deploy things.

Humans click buttons.

Humans respond to pages.

Humans interpret dashboards.

The real question is:

Why did it make sense to that human at the time?

Maybe:

  • the alert was unclear
  • the UI was misleading
  • the runbook was outdated
  • the system behaved differently than expected
  • the process encouraged speed over safety

That is where the real evidence lives.


Every Incident Leaves Clues

A detective story has clues scattered everywhere.

A good postmortem does too.

Some clues live in:

  • logs
  • metrics
  • traces
  • deployment history
  • chat timelines
  • support tickets
  • people’s memory of what they saw

The goal is to reconstruct the scene.

Not to create a dramatic speech.

Not to crown a guilty party.

Just to understand the sequence clearly enough that the team can improve the system.
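
Reconstructing the scene mostly means putting clues from different places onto one clock. Here is a minimal sketch of that idea, assuming each source can be reduced to a timestamp, a source label, and a neutral note. The `Event` type, the `build_timeline` helper, the `utc` helper, and all of the sample data are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    at: datetime   # when it happened (timezone-aware, UTC)
    source: str    # where the clue came from, e.g. "deploys", "alerts", "chat"
    note: str      # what was observed, stated neutrally


def build_timeline(*sources: list[Event]) -> list[Event]:
    """Merge events from several sources into one chronological list."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: e.at)


def utc(hour: int, minute: int) -> datetime:
    # Hypothetical incident date, used only for this example.
    return datetime(2024, 1, 15, hour, minute, tzinfo=timezone.utc)


# Made-up sample clues from three sources.
deploys = [Event(utc(16, 58), "deploys", "homepage release rolled out")]
alerts = [Event(utc(17, 4), "alerts", "error-rate alert fired")]
chat = [
    Event(utc(17, 9), "chat", "on-call asks whether the alert is a false positive"),
    Event(utc(17, 1), "chat", "support reports a broken homepage"),
]

timeline = build_timeline(deploys, alerts, chat)
for event in timeline:
    print(f"{event.at:%H:%M} [{event.source}] {event.note}")
```

Notice that the notes describe what happened and what was seen, not who did it. The sketch also assumes every timestamp is already in UTC; mixing time zones is a classic way to scramble a timeline.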


The Best Postmortems Make the System Less Surprising

A weak postmortem ends with:

  • “Be more careful”
  • “Communicate better”
  • “Pay more attention”

That is not a finding.

That is a wish.

A strong postmortem ends with concrete improvements:

  • clarify the alert
  • add a rollback step
  • improve a dashboard
  • add a circuit breaker
  • fix the runbook
  • change the review process
  • automate a check humans keep missing

Good postmortems do not just retell the story.

They rewrite the sequel.
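
"Automate a check humans keep missing" can even apply to the postmortem itself. As a small, hedged sketch, the function below flags action items that contain one of the vague phrases listed above. The `VAGUE_PHRASES` tuple, the `flag_vague_items` helper, and the sample items are all hypothetical; a real team would choose its own list and would still need human judgment.

```python
# Phrases taken from the "weak postmortem" examples; extend as needed.
VAGUE_PHRASES = ("be more careful", "communicate better", "pay more attention")


def flag_vague_items(action_items: list[str]) -> list[str]:
    """Return the action items that contain one of the vague phrases."""
    return [
        item
        for item in action_items
        if any(phrase in item.lower() for phrase in VAGUE_PHRASES)
    ]


# Made-up action items from an imaginary postmortem draft.
items = [
    "Be more careful during deploys",
    "Add a rollback step to the deploy runbook",
    "Clarify the error-rate alert text",
    "Pay more attention to dashboards",
]

vague = flag_vague_items(items)
for item in vague:
    print(f"Too vague to act on: {item}")
```

A check like this only catches known phrasing. It is a nudge toward concrete improvements, not a replacement for the review.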


Psychological Safety Is the Evidence Room

Detectives need witnesses who are willing to talk.

Teams need the same.

If engineers think every incident review is secretly a blame ritual, they will:

  • hide uncertainty
  • edit the story
  • avoid admitting confusion
  • protect themselves instead of helping the investigation

That destroys learning.

A strong postmortem creates safety for people to say:

  • “I thought this metric meant X”
  • “I didn’t realize that dependency had changed”
  • “The dashboard misled me”
  • “I skipped that step because the runbook was outdated”

That honesty is not weakness.

It is evidence.


What This Means in Real Life

The next time you write a postmortem, imagine a detective board on the wall.

You are not trying to answer:

Who should feel bad?

You are trying to answer:

  • What happened?
  • Why did it make sense at the time?
  • Which conditions lined up?
  • How do we make this less likely next time?

Because incidents are mysteries to be understood, not crimes to be punished.

And the best engineering teams do not hunt culprits.

They follow clues.


🧩 Reframe to Remember

Postmortems are detective stories for nerds.

They are about causes, timelines, and clues.

Not guilt, shame, or dramatic finger-pointing.

That is how systems get safer.
