Most of the time, nobody looks at your status page. Then your app starts throwing errors, and within minutes it becomes the single most-visited page you operate. A status page is a piece of infrastructure that does almost nothing 99% of the time and carries your entire customer relationship the other 1%. That asymmetry is why it's worth getting right before you need it.
A good status page answers two questions a frustrated customer is asking: "Is it you or is it me?" and "When will it be back?" A bad one — stale, vague, or hosted on the same servers that just went down — answers neither, and turns a recoverable outage into a flood of support tickets and a dent in trust. This guide covers what to show during an incident, what to leave off, how often to update, and how to write updates that calm people instead of alarming them.
(If you don't have a status page set up yet, start with our guide to setting up a free status page. This post is about how to run one well once it exists.)
Rule zero: the status page must outlive the outage
Before any content decision, get the hosting right. The single most common status-page failure is hosting it on the same infrastructure as the product it reports on. When your app goes down, the status page goes down with it — and customers see a connection error on the one page that was supposed to tell them what's happening. That's worse than having no status page at all, because it confirms the outage while denying them any information.
Host the status page on independent infrastructure: a different provider, a different region, or a managed status-page service that lives on separate edge infrastructure entirely. The test is simple — if your primary cloud region disappeared right now, would the status page still load? If the answer is no, fix that first. A hosted status page (like the one built into CronAlert) passes this test by default because it doesn't run on your servers.
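One crude way to approximate that test is to compare where the two hostnames resolve. The sketch below is only a first-pass sanity check, under the assumption that shared IP addresses imply shared hosting. The function names and the hostnames in the usage note are hypothetical. An overlap is a strong warning sign, but no overlap does not prove independence, since two different addresses can still sit in the same provider or region.

```python
import socket


def resolve(host: str) -> set[str]:
    """Return the IP addresses a hostname currently resolves to."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def shared_addresses(app_ips: set[str], status_ips: set[str]) -> set[str]:
    """Addresses served by both the app and the status page."""
    return app_ips & status_ips
```

Calling `shared_addresses(resolve("app.example.com"), resolve("status.example.com"))` with your own hostnames should return an empty set. Anything else means the two pages share at least one front door.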
What to show during an incident
During an active incident, your status page has four jobs. Each maps to something concrete on the page:
- Scope — which components are affected. Break your service into named components (API, dashboard, checkout, email delivery, webhooks) and mark the state of each. A customer who only uses your API shouldn't have to guess whether an outage on the marketing site affects them.
- Impact — what users are actually experiencing. In plain language: "Checkout is failing for most customers" beats "We are experiencing elevated error rates in the payments subsystem." Describe the symptom the user sees, not the internal system that's broken.
- State — where you are in the lifecycle. Investigating → Identified → Monitoring → Resolved. This single label tells customers whether you're still hunting or already have a fix deploying.
- Recency — when you last said anything. A visible "last updated" timestamp and, ideally, a promise of when the next update lands. Stale-looking pages read as abandoned.
Underneath, keep a timestamped log of updates in reverse-chronological order so a customer arriving mid-incident can read the story so far. Each entry is short: what you know now, what changed since the last update, and what you're doing next.
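To make the shape of that log concrete, here is a minimal sketch of an incident record with lifecycle states and a newest-first timeline. The class and field names (`Incident`, `Update`, `Phase`) are illustrative assumptions, not the data model of any particular status-page tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Phase(Enum):
    INVESTIGATING = "Investigating"
    IDENTIFIED = "Identified"
    MONITORING = "Monitoring"
    RESOLVED = "Resolved"


@dataclass(frozen=True)
class Update:
    posted_at: datetime  # timezone-aware, kept in UTC
    phase: Phase
    message: str         # what you know, what changed, what happens next


@dataclass
class Incident:
    title: str
    components: list[str]  # named components this incident affects
    updates: list[Update] = field(default_factory=list)

    def post(self, phase: Phase, message: str,
             at: Optional[datetime] = None) -> Update:
        """Append an update, stamped with the current UTC time by default."""
        update = Update(at or datetime.now(timezone.utc), phase, message)
        self.updates.append(update)
        return update

    def timeline(self) -> list[Update]:
        """Updates newest first, the order a mid-incident visitor reads."""
        return sorted(self.updates, key=lambda u: u.posted_at, reverse=True)

    @property
    def phase(self) -> Optional[Phase]:
        """Current lifecycle state: the phase of the most recent update."""
        latest = self.timeline()
        return latest[0].phase if latest else None
```

Deriving the current state from the latest update, rather than storing it separately, means the label on the page cannot drift from what the log says.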
Component states: slow is not the same as down
Use distinct states and map them honestly. The four that cover almost every situation:
| State | What it means | Example |
|---|---|---|
| Operational | Working normally | Everything green |
| Degraded performance | Up but slow or intermittently erroring | API latency 5× normal; some requests timing out |
| Partial outage | A subset of users or features is fully down | File uploads failing; rest of app fine |
| Major outage | Component unavailable | API returning 503 for everyone |
Mapping reality to the right state is a trust decision. Calling a slowdown a "major outage" over-alarms and trains customers to ignore you. Calling a hard outage "degraded performance" makes you look out of touch with your own product. Your monitoring has to support this distinction — a check that only knows "200 or not" can't tell slow from down. Tracking response-time thresholds alongside availability is what lets you flip a component to "degraded" before it's fully "down," which is exactly the early signal customers appreciate.
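As a rough illustration of that mapping, the sketch below turns recent check results into one of the four states. It assumes each component is watched by one or more named probes (for example, one per feature), and the 2000 ms slow threshold is an arbitrary placeholder you would tune against your own baseline. All names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import median


class State(Enum):
    OPERATIONAL = "Operational"
    DEGRADED = "Degraded performance"
    PARTIAL_OUTAGE = "Partial outage"
    MAJOR_OUTAGE = "Major outage"


@dataclass(frozen=True)
class Check:
    ok: bool           # did the check get a successful response?
    latency_ms: float  # how long the response took


def component_state(probes: dict[str, list[Check]],
                    slow_ms: float = 2000.0) -> State:
    """Map recent checks, grouped by probe, to a public component state."""
    if not probes or any(not checks for checks in probes.values()):
        raise ValueError("every probe needs at least one recent check")

    # A probe is down only when every recent check failed.
    down = [name for name, checks in probes.items()
            if all(not c.ok for c in checks)]
    if len(down) == len(probes):
        return State.MAJOR_OUTAGE
    if down:
        return State.PARTIAL_OUTAGE

    # Nothing is fully down: look for intermittent errors or slowness.
    for checks in probes.values():
        if any(not c.ok for c in checks):
            return State.DEGRADED
        if median(c.latency_ms for c in checks) > slow_ms:
            return State.DEGRADED
    return State.OPERATIONAL
```

The ordering is the design choice that matters: outage conditions are tested before slowness, so a hard failure is never reported as merely degraded.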
How often to update — and the cadence promise
Two rules govern timing, and the second matters more than the first.
Post the first update fast. Within minutes of confirming an incident, publish something — even "We're investigating reports of errors affecting the API. More soon." A blank status page during a known outage is the gap support tickets pour into. You don't need the root cause to acknowledge the symptom.
Promise the next update, and keep the promise. Set an explicit cadence — every 30 minutes for a major outage is a sensible default — and tell people when to come back: "Next update by 14:30 UTC." Then post at 14:30 no matter what, even if the update is "still investigating, no change yet, next update by 15:00." A promised update that arrives on time, with nothing new, is dramatically more reassuring than two hours of silence followed by a resolution. Silence reads as "they've forgotten about us"; a steady drumbeat reads as "they're on it."
This discipline is the public-facing half of a good incident response process — the internal side decides what to do; the status page decides what to say and when.
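The cadence promise is simple enough to automate as a reminder. Here is a minimal sketch, assuming a fixed 30-minute cadence and UTC timestamps throughout. The function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

CADENCE = timedelta(minutes=30)  # default for a major outage


def next_update_due(last_posted: datetime,
                    cadence: timedelta = CADENCE) -> datetime:
    """When the next public update has been promised."""
    return last_posted + cadence


def promise_line(due: datetime) -> str:
    """The sentence to append to every update, always rendered in UTC."""
    return f"Next update by {due.astimezone(timezone.utc):%H:%M} UTC."


def is_overdue(due: datetime, now: datetime) -> bool:
    """True once the promised time has passed without a new update."""
    return now > due
```

For an update posted at 14:00 UTC, `promise_line(next_update_due(posted))` gives "Next update by 14:30 UTC.", and `is_overdue` is the hook for nudging whoever owns communications before the promise is broken.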
How to write updates that calm, not alarm
Tone during an incident is a skill. A few principles that consistently land well:
- Lead with impact, not internals. Customers care what's broken for them, not which microservice threw the exception.
- Be specific about scope. "Affecting customers in the EU region" or "checkout only" shrinks the perceived blast radius and stops unaffected users from panicking.
- Don't speculate on root cause in public. Early guesses are usually wrong, and a wrong public guess ("a database issue") that you later retract erodes confidence. Say what you've confirmed, not what you suspect.
- No blame, no drama. Don't blame a vendor or a team member, and don't describe the failure as "an unexpected issue" in a way that sounds like you're surprised your own system can fail. Keep the tone calm and factual.
- Use the same time zone every time (UTC is the safe default) so a global audience isn't doing mental math during a crisis.
Save the deeper analysis for after. When the incident resolves, link the status entry to a blameless postmortem for major events — that's where root cause and prevention belong, written calmly with the full picture, not improvised mid-fire.
What to leave off
Restraint is part of the craft. Keep these off a public status page:
- Internal jargon and system names. "Kafka consumer lag" means nothing to your customers and signals you're talking to yourself.
- Premature root-cause claims. See above — confirmed facts only.
- Customer-specific data. Never name affected accounts or expose anything that hints at who's impacted.
- Apology inflation. One sincere acknowledgment beats five escalating apologies that make the situation sound more catastrophic than it is.
- Auto-posted raw monitoring noise. Don't wire every transient blip straight to the public page. A single failed check from one region isn't an incident — confirm it first, the same way you'd want verification before paging a human.
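The last point, confirming before publishing, can be sketched as a small gate in front of the public page. This version assumes checks run in rounds from several regions, and it only treats a failure as confirmed when at least two regions fail in each of the last two rounds. Both thresholds and all names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionCheck:
    region: str
    ok: bool


def is_confirmed_failure(rounds: list[list[RegionCheck]],
                         min_regions: int = 2,
                         min_consecutive: int = 2) -> bool:
    """Decide whether recent check rounds justify a public incident.

    `rounds` is ordered oldest first; each round holds one check per region.
    """
    if len(rounds) < min_consecutive:
        return False
    recent = rounds[-min_consecutive:]
    # Every recent round must show failures from enough distinct regions.
    return all(
        len({c.region for c in checks if not c.ok}) >= min_regions
        for checks in recent
    )
```

A single failed check from one region never passes this gate, and neither does a failure that has already recovered by the latest round.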
After the incident: history that builds trust
When it's over, mark the incident resolved with a final summary and leave it in the archive. A visible incident history and a rolling 90-day uptime percentage per component do more for trust than a suspiciously perfect record. Transparency signals you handle problems openly; an empty or hidden history makes prospects wonder what you're not showing them.
Drive the uptime numbers and history from real monitoring data rather than hand-edited figures — see our guide to uptime reports and using uptime data for SLA reporting for how to turn check results into the metrics a status page (and a customer's procurement team) wants to see. If you publish an SLA, your status page is where customers check whether you're meeting it.
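As a rough sketch of what driving the number from monitoring data can look like, the function below computes a rolling uptime percentage from raw check results. It assumes checks run at a roughly even interval, so the share of successful checks approximates the share of time the component was up. The function name and input shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def rolling_uptime(checks: list[tuple[datetime, bool]],
                   now: datetime,
                   window: timedelta = timedelta(days=90)) -> Optional[float]:
    """Percentage of successful checks within the window ending at `now`.

    Each check is a (timestamp, succeeded) pair with a timezone-aware
    timestamp. Returns None when the window holds no checks, so the page
    can show "no data" instead of a misleading 100%.
    """
    cutoff = now - window
    recent = [ok for ts, ok in checks if cutoff <= ts <= now]
    if not recent:
        return None
    return 100.0 * sum(recent) / len(recent)
```

Returning `None` for an empty window is deliberate: a component with no monitoring history has not earned a perfect score.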
Frequently asked questions
What should a status page show during an incident?
Which components are affected, what users are actually experiencing in plain language, the current lifecycle state (investigating/identified/monitoring/resolved), when you last updated, and a timestamped log of updates. Leave off jargon, root-cause speculation, and blame. Answer "is it you or me?" and "when will it be back?" without making customers email support.
How often should you update a status page during an outage?
Post the first update within minutes, then on a promised cadence — every 30 minutes for a major outage is a good default — and always say when the next update is coming. An on-time update with no news beats silence every time.
Should a status page be hosted separately from your main app?
Yes — always. It must stay up when your app is down, so host it on independent infrastructure (a different provider or a managed service on separate edge infrastructure). A status page on the same servers as your product is useless during the exact incident it exists for.
Should you show historical uptime on a public status page?
Usually yes. A rolling 90-day uptime figure and a past-incident archive build trust by signaling transparency. The only exception is genuinely poor reliability, which you should fix before publishing. Most teams, though, underestimate how reassuring a slightly imperfect, honest history is.
What's the difference between an outage and degraded performance?
"Degraded" means up but slow or intermittently failing; "outage" (partial or major) means unavailable for some or all users. Use distinct states and map them honestly — your monitoring needs to tell slow from down for the page to reflect reality.
Run a status page your customers actually trust
A status page earns trust in the moments your product is at its worst. Host it off your own infrastructure, show honest component states, update on a promised cadence, write for the customer rather than the engineer, and keep an open history afterward. Create a free CronAlert account to get a hosted status page backed by real uptime monitoring — so the states, timestamps, and history reflect what your checks actually saw, not what someone remembered to type in.
Related reading: set up a free status page, incident response for small teams, writing a blameless postmortem, turning checks into uptime reports, and — if the status page is your main reason for choosing a tool — CronAlert vs Pulsetic.