Status Page Best Practices: What to Show During an Incident

Q: What should a status page show during an incident?

At minimum: which components are affected, what users are actually experiencing in plain language, when you last updated the page, and what you're doing about it. Show the current state (investigating, identified, monitoring, resolved), a timestamped log of updates, and an honest scope ('checkout is failing' rather than 'some users may experience issues'). Leave off internal jargon, root-cause speculation before you're sure, and blame. The goal is to answer the two questions every customer has — 'is it you or me?' and 'when will it be back?' — without making them email support to find out.

Q: How often should you update a status page during an outage?

Post the first update fast — within minutes of confirming the incident, even if all you can say is 'we're investigating reports of errors.' After that, set an explicit cadence and stick to it: every 30 minutes for a major outage is a reasonable default, and crucially, tell people when the next update is coming ('next update by 14:30 UTC'). A promised update that arrives on time, even with no new information, is far more reassuring than silence. The worst pattern is a status page that goes quiet for two hours mid-incident — users assume you've forgotten about them.

Q: Should a status page be hosted separately from your main app?

Yes. The cardinal rule of status pages is that they must stay up when everything else is down. Host the status page on separate infrastructure from the application it reports on — a different provider, region, or platform — so that the outage taking down your app can't also take down the page customers rely on to learn about it. A status page served from the same servers as your product is worthless during exactly the incident it exists for. Hosted status pages (like CronAlert's) solve this by living on independent edge infrastructure.

Q: Should you show historical uptime on a public status page?

Generally yes — a visible uptime history and past-incident log builds trust, because it signals you have nothing to hide and that incidents are handled transparently. Show a rolling 90-day uptime percentage per component and a chronological incident archive with the postmortem summary for major events. The exception is when your reliability is genuinely poor and a public number would do more harm than good; in that case fix the reliability first. Most teams underestimate how much a transparent, slightly-imperfect history reassures prospects compared to a suspiciously perfect or absent one.

Q: What's the difference between an outage and degraded performance on a status page?

Use distinct component states so customers can self-diagnose. 'Operational' means working normally; 'degraded performance' means working but slow or partially impaired (elevated latency, intermittent errors); 'partial outage' means a subset of functionality or users is fully down; 'major outage' means the component is unavailable. Mapping the real situation to the right state matters: calling a slowdown a 'major outage' erodes trust through over-alarming, while calling a hard outage 'degraded performance' makes you look out of touch. Your monitoring should distinguish slow from down so the status page reflects reality.

Most of the time, nobody looks at your status page. Then your app starts throwing errors, and within minutes it becomes the single most-visited page you operate. A status page is a piece of infrastructure that does almost nothing 99% of the time and carries your entire customer relationship the other 1%. That asymmetry is why it's worth getting right before you need it.

A good status page answers two questions a frustrated customer is asking: "Is it you or is it me?" and "When will it be back?" A bad one — stale, vague, or hosted on the same servers that just went down — answers neither, and turns a recoverable outage into a flood of support tickets and a dent in trust. This guide covers what to show during an incident, what to leave off, how often to update, and how to write updates that calm people instead of alarming them.

(If you don't have a status page set up yet, start with our guide to setting up a free status page. This post is about how to run one well once it exists.)

Rule zero: the status page must outlive the outage

Before any content decision, get the hosting right. One of the most common status-page failures is hosting the page on the same infrastructure as the product it reports on. When your app goes down, the status page goes down with it, and customers see a connection error on the one page that was supposed to tell them what's happening. That's worse than having no status page at all, because it confirms the outage while denying them any information.

Host the status page on independent infrastructure: a different provider, a different region, or a managed status-page service that lives on separate edge infrastructure entirely. The test is simple — if your primary cloud region disappeared right now, would the status page still load? If the answer is no, fix that first. A hosted status page (like the one built into CronAlert) passes this test by default because it doesn't run on your servers.

What to show during an incident

During an active incident, your status page has four jobs. Each maps to something concrete on the page:

Scope — which components are affected. Break your service into named components (API, dashboard, checkout, email delivery, webhooks) and mark the state of each. A customer who only uses your API shouldn't have to guess whether an outage on the marketing site affects them.
Impact — what users are actually experiencing. In plain language: "Checkout is failing for most customers" beats "We are experiencing elevated error rates in the payments subsystem." Describe the symptom the user sees, not the internal system that's broken.
State — where you are in the lifecycle. Investigating → Identified → Monitoring → Resolved. This single label tells customers whether you're still hunting or already have a fix deploying.
Recency — when you last said anything. A visible "last updated" timestamp and, ideally, a promise of when the next update lands. Stale-looking pages read as abandoned.

Underneath, keep a timestamped log of updates in reverse-chronological order so a customer arriving mid-incident can read the story so far. Each entry is short: what you know now, what changed since the last update, and what you're doing next.
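To make the shape of that page concrete, here is a minimal sketch of an incident with a lifecycle state and a newest-first update log. The class and method names (`Incident`, `Update`, `Lifecycle`, `post`, `timeline`) are hypothetical and chosen for illustration; they are not CronAlert's data model or any particular status-page API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Lifecycle(Enum):
    INVESTIGATING = "Investigating"
    IDENTIFIED = "Identified"
    MONITORING = "Monitoring"
    RESOLVED = "Resolved"


@dataclass
class Update:
    posted_at: datetime  # timezone-aware, stored in UTC
    state: Lifecycle
    message: str         # plain-language text the customer reads


@dataclass
class Incident:
    title: str
    components: list  # names of affected components, e.g. ["API", "checkout"]
    updates: list = field(default_factory=list)

    def post(self, state, message, now=None):
        """Append an update; `now` is injectable so the example is testable."""
        posted_at = now or datetime.now(timezone.utc)
        self.updates.append(Update(posted_at, state, message))

    def timeline(self):
        """Updates newest-first, the order a mid-incident visitor reads them."""
        return sorted(self.updates, key=lambda u: u.posted_at, reverse=True)

    def current_state(self):
        """Lifecycle label for the top of the page (None before any update)."""
        latest = self.timeline()
        return latest[0].state if latest else None

    def last_updated(self):
        """Timestamp for the visible "last updated" line."""
        latest = self.timeline()
        return latest[0].posted_at if latest else None
```

Deriving the state and the "last updated" timestamp from the log, rather than storing them separately, is one way to keep the headline and the history from drifting apart.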

Component states: slow is not the same as down

Use distinct states and map them honestly. The four that cover almost every situation:

State	What it means	Example
Operational	Working normally	Everything green
Degraded performance	Up but slow or intermittently erroring	API latency 5× normal; some requests timing out
Partial outage	A subset of users or features is fully down	File uploads failing; rest of app fine
Major outage	Component unavailable	API returning 503 for everyone

Mapping reality to the right state is a trust decision. Calling a slowdown a "major outage" over-alarms and trains customers to ignore you. Calling a hard outage "degraded performance" makes you look out of touch with your own product. Your monitoring has to support this distinction — a check that only knows "200 or not" can't tell slow from down. Tracking response-time thresholds alongside availability is what lets you flip a component to "degraded" before it's fully "down," which is exactly the early signal customers appreciate.
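As a rough sketch of how monitoring data could drive that mapping, the function below combines a success rate with a latency threshold. Every threshold here (3× baseline latency, 99%, 50%, 5%) is an illustrative assumption rather than a recommendation from any standard, and success rate is only a crude proxy for "a subset of users is fully down"; a real setup would likely look at per-feature or per-region checks.

```python
def classify(success_rate, latency_ms, baseline_ms,
             slow_factor=3.0, degraded_below=0.99,
             partial_below=0.50, major_below=0.05):
    """Map recent check results to one of the four component states.

    success_rate: fraction of recent checks that passed (0.0 to 1.0).
    latency_ms:   typical response time of passing checks, or None if
                  nothing passed.
    baseline_ms:  normal response time for this component.
    All thresholds are hypothetical defaults; tune them to your service.
    """
    if success_rate < major_below:
        return "Major outage"
    if success_rate < partial_below:
        return "Partial outage"
    # "Slow" needs a latency signal; a check that only records pass/fail
    # can never reach this branch through latency alone.
    slow = latency_ms is not None and latency_ms > slow_factor * baseline_ms
    if success_rate < degraded_below or slow:
        return "Degraded performance"
    return "Operational"
```

Note that a component with a 100% success rate can still be reported as degraded if it is slow enough, which is the early signal described above.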

How often to update — and the cadence promise

Two rules govern timing, and the second matters more than the first.

Post the first update fast. Within minutes of confirming an incident, publish something — even "We're investigating reports of errors affecting the API. More soon." A blank status page during a known outage is the gap support tickets pour into. You don't need the root cause to acknowledge the symptom.

Promise the next update, and keep the promise. Set an explicit cadence — every 30 minutes for a major outage is a sensible default — and tell people when to come back: "Next update by 14:30 UTC." Then post at 14:30 no matter what, even if the update is "still investigating, no change yet, next update by 15:00." A promised update that arrives on time, with nothing new, is dramatically more reassuring than two hours of silence followed by a resolution. Silence reads as "they've forgotten about us"; a steady drumbeat reads as "they're on it."
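The cadence promise is simple enough to automate as a reminder. The sketch below, with hypothetical helper names, computes when the next update is due, renders the "Next update by" line in UTC, and reports whether a promise has been missed. The 30-minute default comes from the rule above; everything else is an assumption for illustration.

```python
from datetime import datetime, timedelta, timezone

CADENCE = timedelta(minutes=30)  # default for a major outage


def next_update_due(last_posted, cadence=CADENCE):
    """Deadline for the next public update."""
    return last_posted + cadence


def promise_line(last_posted, cadence=CADENCE):
    """Sentence to append to each update, always rendered in UTC."""
    due = next_update_due(last_posted, cadence).astimezone(timezone.utc)
    return f"Next update by {due:%H:%M} UTC."


def is_overdue(last_posted, now, cadence=CADENCE):
    """True once the promised time has passed without a new post."""
    return now > next_update_due(last_posted, cadence)


last = datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc)
print(promise_line(last))  # Next update by 14:30 UTC.
```

Wiring `is_overdue` to an internal reminder (a chat ping to the incident lead, for example) is one way to make sure the "no change yet" update still goes out on time.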

This discipline is the public-facing half of a good incident response process — the internal side decides what to do; the status page decides what to say and when.

How to write updates that calm, not alarm

Tone during an incident is a skill. A few principles that consistently land well:

Lead with impact, not internals. Customers care what's broken for them, not which microservice threw the exception.
Be specific about scope. "Affecting customers in the EU region" or "checkout only" shrinks the perceived blast radius and stops unaffected users from panicking.
Don't speculate on root cause in public. Early guesses are usually wrong, and a wrong public guess ("a database issue") that you later retract erodes confidence. Say what you've confirmed, not what you suspect.
No blame, no drama. Don't blame a vendor or a team member, and don't describe the failure as "an unexpected issue" in a way that sounds like you're surprised your own system can fail. Stay calm and factual.
Use the same time zone every time (UTC is the safe default) so a global audience isn't doing mental math during a crisis.

Save the deeper analysis for afterward. Once a major incident is resolved, link its status entry to a blameless postmortem. That is where root cause and prevention belong, written calmly with the full picture, not improvised mid-fire.

What to leave off

Restraint is part of the craft. Keep these off a public status page:

Internal jargon and system names. "Kafka consumer lag" means nothing to your customers and signals you're talking to yourself.
Premature root-cause claims. See above — confirmed facts only.
Customer-specific data. Never name affected accounts or expose anything that hints at who's impacted.
Apology inflation. One sincere acknowledgment beats five escalating apologies that make the situation sound more catastrophic than it is.
Auto-posted raw monitoring noise. Don't wire every transient blip straight to the public page. A single failed check from one region isn't an incident — confirm it first, the same way you'd want verification before paging a human.
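For the last point, one possible confirmation gate is to require consecutive failures from more than one region before anything reaches the public page. This is a sketch under assumptions: the function name, the input shape, and the defaults of two regions and two consecutive failures are all hypothetical, not a description of how any particular monitoring tool confirms incidents.

```python
def is_confirmed(recent_checks, min_regions=2, min_consecutive=2):
    """Decide whether failures are widespread enough to publish.

    recent_checks: dict mapping region name -> list of booleans,
                   oldest first, where True means the check passed.
    A region counts as failing only if its last `min_consecutive`
    checks all failed.
    """
    failing_regions = 0
    for results in recent_checks.values():
        tail = results[-min_consecutive:]
        if len(tail) == min_consecutive and not any(tail):
            failing_regions += 1
    return failing_regions >= min_regions


# A single failed check from one region: not an incident yet.
blip = {"us-east": [True, True, False], "eu-west": [True, True, True]}
print(is_confirmed(blip))  # False
```

The trade-off is a short delay before the first public update, so the thresholds should stay small enough that the "within minutes" rule still holds.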

After the incident: history that builds trust

When it's over, mark the incident resolved with a final summary and leave it in the archive. A visible incident history and a rolling 90-day uptime percentage per component do more for trust than a suspiciously perfect record. Transparency signals you handle problems openly; an empty or hidden history makes prospects wonder what you're not showing them.

Drive the uptime numbers and history from real monitoring data rather than hand-edited figures. Our guides to uptime reports and to using uptime data for SLA reporting cover how to turn check results into the metrics a status page (and a customer's procurement team) wants to see. If you publish an SLA, your status page is where customers check whether you're meeting it.
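A minimal sketch of the rolling 90-day figure, computed from raw check results, might look like the following. It assumes checks run at a roughly even interval, so each result stands for an equal slice of time; the function name and input shape are hypothetical, and a production calculation would more likely weight by actual downtime duration.

```python
from datetime import datetime, timedelta, timezone


def rolling_uptime(checks, now, window_days=90):
    """Uptime percentage over the trailing window, or None if no data.

    checks: iterable of (timestamp, ok) tuples, where timestamp is a
            timezone-aware datetime and ok is True for a passing check.
    """
    cutoff = now - timedelta(days=window_days)
    in_window = [ok for ts, ok in checks if cutoff <= ts <= now]
    if not in_window:
        return None  # show "no data" rather than a misleading 100%
    return 100.0 * sum(in_window) / len(in_window)
```

Returning `None` for an empty window is a deliberate choice: a component with no checks should not display a perfect record it has not earned.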

Frequently asked questions

What should a status page show during an incident?

Which components are affected, what users are actually experiencing in plain language, the current lifecycle state (investigating/identified/monitoring/resolved), when you last updated, and a timestamped log of updates. Leave off jargon, root-cause speculation, and blame. Answer "is it you or me?" and "when will it be back?" without making customers email support.

How often should you update a status page during an outage?

Post the first update within minutes, then on a promised cadence — every 30 minutes for a major outage is a good default — and always say when the next update is coming. An on-time update with no news beats silence every time.

Should a status page be hosted separately from your main app?

Yes — always. It must stay up when your app is down, so host it on independent infrastructure (a different provider or a managed service on separate edge infrastructure). A status page on the same servers as your product is useless during the exact incident it exists for.

Should you show historical uptime on a public status page?

Usually yes. A rolling 90-day uptime figure and a past-incident archive build trust by signaling transparency. The only exception is genuinely poor reliability you should fix before publishing — but most teams underestimate how reassuring a slightly imperfect, honest history is.

What's the difference between an outage and degraded performance?

"Degraded" means up but slow or intermittently failing; "outage" (partial or major) means unavailable for some or all users. Use distinct states and map them honestly — your monitoring needs to tell slow from down for the page to reflect reality.

Run a status page your customers actually trust

A status page earns trust in the moments your product is at its worst. Host it off your own infrastructure, show honest component states, update on a promised cadence, write for the customer rather than the engineer, and keep an open history afterward. Create a free CronAlert account to get a hosted status page backed by real uptime monitoring — so the states, timestamps, and history reflect what your checks actually saw, not what someone remembered to type in.

Related reading: set up a free status page, incident response for small teams, writing a blameless postmortem, turning checks into uptime reports, and — if the status page is your main reason for choosing a tool — CronAlert vs Pulsetic.