Websites go down for a small number of reasons. The same ten causes account for almost every outage most teams will ever see. Knowing the list — and the specific signature of each on the monitoring side — cuts the time between "something is wrong" and "here's the root cause" from hours to minutes.
This post walks through the ten most common causes of website downtime, what each looks like in your monitoring data, and what to do about it.
1. Bad deploys
The single most common cause of outages for active teams. A code change passes CI, passes type checks, passes tests, and then breaks in production under real data or real traffic. Common subcategories: a migration that takes a table lock longer than the deploy window, a config value missing from the production environment, a dependency that resolves differently in Vercel's build than in local dev, a feature flag that was meant to stay off but was pushed on.
What it looks like: The failure starts immediately after a deploy. Monitors flip from green to red within one check interval of the deploy timestamp. The error is usually a 500, a 502, or a blank response.
How to prevent: Staged rollouts. Run the new version for a canary slice of traffic before full rollout. Add a health check endpoint and have the deploy pipeline wait for it to return 200 from the new version before cutting traffic over. See CI/CD uptime monitoring for the deploy-gated-on-health-check pattern.
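The deploy-gated-on-health-check step can be sketched in a few lines. This is a minimal illustration, not any particular pipeline's API — `probe` stands in for whatever HTTP client your deploy tooling uses to hit the new version's health endpoint:

```python
import time
from typing import Callable

def wait_for_healthy(probe: Callable[[], int], timeout: float = 120.0,
                     interval: float = 1.0, required_ok: int = 3) -> bool:
    """Poll a health probe until it returns HTTP 200 `required_ok` times
    in a row, or give up after `timeout` seconds. The pipeline only cuts
    traffic over to the new version if this returns True."""
    deadline = time.monotonic() + timeout
    streak = 0
    while time.monotonic() < deadline:
        streak = streak + 1 if probe() == 200 else 0
        if streak >= required_ok:
            return True
        time.sleep(interval)
    return False
```

Requiring several consecutive 200s (rather than one) avoids cutting over on a version that is crash-looping but occasionally answers.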
2. Expired SSL certificates
Nothing fails on a schedule like a certificate. Let's Encrypt certs expire every 90 days. Auto-renewal works until it doesn't — a rate limit, a DNS change that broke the ACME challenge, a cron job that silently stopped running three months ago. When the cert expires, every browser refuses to connect and the site is effectively down.
What it looks like: All browsers show a red certificate warning. Monitors report SSL errors rather than HTTP errors. Users bounce from the warning page; there is no graceful degradation.
How to prevent: Monitor certificate expiry dates and set alerts at 30, 14, and 7 days out. CronAlert's SSL certificate monitoring does this by default on every HTTPS monitor — no extra config, no cron job to forget about.
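As a sanity check alongside your monitoring tool, the expiry math is small enough to script with Python's stdlib `ssl` module. The `check_cert` function below makes a live TLS connection, so treat it as an illustrative sketch rather than production monitoring:

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional

def days_until_expiry(not_after: str, now: Optional[datetime] = None) -> int:
    """Days until a certificate's notAfter timestamp, in the format
    getpeercert() returns it (e.g. 'Jun  1 12:00:00 2026 GMT')."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after),
                                     tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days

def check_cert(host: str) -> int:
    """Fetch the live certificate for `host` and return days remaining;
    alert when the value crosses your 30/14/7-day thresholds."""
    ctx = ssl.create_default_context()
    with ctx.wrap_socket(socket.create_connection((host, 443), timeout=5),
                         server_hostname=host) as s:
        cert = s.getpeercert()
    return days_until_expiry(cert["notAfter"])
```

Run from cron, this covers the gap — but note the irony: a forgotten cron job is exactly how renewals fail, which is why expiry checks belong in an external monitor.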
3. DNS failures
DNS is the oldest and most fragile dependency in the stack. A misconfigured registrar change, a typo in a record update, an authoritative nameserver outage, a delegation issue with a subdomain — any of these takes the site down for the duration of the DNS cache TTL plus resolution time.
What it looks like: Monitors report "name not resolved" or a resolution timeout rather than an HTTP error. The failure propagates over minutes to hours as caches expire in different regions, so users in different parts of the world report the outage at different times.
How to prevent: Use multiple nameservers, keep DNS TTLs moderate (300-3600 seconds — short enough to fix fast, long enough to survive resolver blips), monitor your DNS separately from your HTTP endpoints, and never make manual DNS changes in production without a rollback plan. Multi-region monitoring catches regional DNS propagation issues that a single-location check misses.
4. Database overload or exhaustion
The database connection pool fills up, queries queue, timeouts cascade, and every request ends up returning a 500 or 503. This is often triggered by a traffic spike, a slow query that locks a row longer than usual, or a runaway background job that holds connections.
What it looks like: The site returns HTTP 503 or 504. Response times climb before the failures start. Only dynamic pages fail; static pages and cached content still serve.
How to prevent: Size your connection pool for peak traffic, not average. Set aggressive query timeouts so one slow query cannot drag down the whole pool. Use read replicas for expensive reads. Monitor database health separately via a health endpoint so you can tell "database is overloaded" apart from "web server is down."
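The "fail fast instead of stalling everyone" idea can be sketched with a toy pool. This is a simplified illustration — real pools (pgbouncer, SQLAlchemy's pool, HikariCP) do much more — but the acquire-timeout behavior is the part that matters:

```python
import threading
from typing import Callable

class BoundedPool:
    """Toy connection pool: a fixed number of slots and a hard acquire
    timeout, so one slow query ties up its own slot instead of making
    every other request wait forever."""
    def __init__(self, size: int, acquire_timeout: float):
        self._slots = threading.BoundedSemaphore(size)
        self._timeout = acquire_timeout

    def run(self, query: Callable):
        if not self._slots.acquire(timeout=self._timeout):
            # Fail fast: better a 503 for one request than a stall for all.
            raise TimeoutError("connection pool exhausted")
        try:
            return query()
        finally:
            self._slots.release()
```

The design choice is the `TimeoutError`: a bounded pool without an acquire timeout converts "one slow query" into "every request hangs," which is exactly the cascade described above.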
5. Third-party dependencies
Your site may be perfectly healthy while Stripe is down, or while your auth provider is having a bad day, or while your email service is rate-limiting you. If your critical flows depend on a third party, your uptime is capped by theirs.
What it looks like: Specific flows fail while others work — checkout breaks but browsing is fine, signup fails but login still works. Error messages mention the third party by name in your logs. Their status page lights up yellow or red around the same time.
How to prevent: Put circuit breakers around every third-party call. Time out aggressively (2-5 seconds is usually enough). Cache successful responses where possible. Have a graceful degradation path: if Stripe is down, queue the order and tell the user "we'll confirm by email." Monitor the third party's status page and your own integration separately. The microservices monitoring guide covers external-monitoring patterns that apply to third-party dependencies too.
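A minimal circuit breaker looks like this — a sketch of the pattern, not a substitute for a hardened library, with the "queue the order and confirm by email" path expressed as a `fallback` callable:

```python
import time

class CircuitBreaker:
    """After `max_failures` consecutive errors the circuit opens and calls
    fail fast (via fallback) for `reset_after` seconds, then one trial
    call is let through (half-open)."""
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()      # open: don't even hit the dependency
            self.opened_at = None      # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        self.opened_at = None
        return result
```

The key property: once the circuit opens, the failing third party stops receiving your traffic, so your request threads stop blocking on its timeouts.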
6. Traffic spikes
A viral link, a Reddit front page, a product launch, a botnet crawling your site — sudden 10x traffic can take down a site that handles steady 1x load just fine. The failure is usually a cascade: the database hits its connection limit, web servers start queuing, load balancers start timing out, and cheap error pages return at normal speed while real pages return 502.
What it looks like: Response times climb before anything fails. Then requests start timing out at specific layers — load balancer, web server, database — depending on which capacity limit is hit first. Static assets keep serving if you have a CDN.
How to prevent: Put a CDN in front of static assets. Use autoscaling for stateless compute. Cache aggressively at every layer (HTTP cache, application cache, database query cache). Rate-limit abusive crawlers. Know your limits: load-test against realistic peak traffic and know which component will saturate first.
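Rate-limiting abusive crawlers is usually a token bucket at the edge. A minimal sketch of the algorithm (the `now` parameter exists so the logic is testable; real deployments use the middleware your proxy or framework already provides):

```python
import time
from typing import Optional

class TokenBucket:
    """Token-bucket rate limiter: allows `rate` requests per second with
    bursts up to `capacity`; requests beyond that are rejected (HTTP 429)."""
    def __init__(self, rate: float, capacity: float,
                 now: Optional[float] = None):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill tokens for the time elapsed since the last request.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Keep one bucket per client key (IP, API token), and reject with 429 rather than queuing — queued abusive requests still consume the capacity you're trying to protect.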
7. DDoS attacks
A deliberate flood of traffic from a botnet, aimed at your origin or your DNS. The result is the same as a legitimate traffic spike but at a larger scale and aimed at whatever layer the attacker picks.
What it looks like: Traffic volume is 100x normal, often from a specific set of IP ranges or user agents. Your legitimate users experience the same timeouts as a regular overload, but the pattern in the logs is distinctive — repeated identical requests, unusual geographies, or cycling through URL parameters.
How to prevent: A CDN with DDoS protection (Cloudflare, Fastly, AWS Shield) is the standard solution. Keep your origin server's IP unpublished where possible. Rate-limit at the edge. Have a playbook for turning on "I'm under attack" mode quickly if your CDN supports it.
8. Misconfigured CDN or cache
Caching is one of the most powerful performance tools and one of the most error-prone. A bad cache configuration can serve stale data for days, serve one user's session to another, cache error responses for hours, or bypass the cache entirely and saturate the origin.
What it looks like: Either everyone sees the same stale version of the site regardless of state, or the origin is hit for every request and gets overloaded. Users report "I logged out but the site still says 'Hello, Jane'." Cache-related outages are often intermittent and regional because CDNs have many edge POPs.
How to prevent: Be explicit about cache headers on every response. Never cache authenticated responses. Add cache-busting strategies on deploys. Monitor cache hit rate and origin request volume. Use multi-region monitoring to detect regional cache staleness.
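"Be explicit about cache headers" concretely means every response class gets a deliberate `Cache-Control` value. The policy below is one illustrative choice, not a universal rule:

```python
def cache_headers(path: str, authenticated: bool) -> dict:
    """Pick an explicit Cache-Control header per response class:
    never cache authenticated responses, cache fingerprinted static
    assets aggressively, keep HTML short-lived."""
    if authenticated:
        # Personal data must never land in a shared cache.
        return {"Cache-Control": "private, no-store"}
    if path.startswith("/static/"):
        # Fingerprinted assets (app.3f9c1a.js) never change: cache a year.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # HTML: let the CDN cache briefly and serve stale while revalidating.
    return {"Cache-Control": "public, max-age=60, stale-while-revalidate=300"}
```

The fingerprinting in asset filenames is what makes `immutable` safe: a deploy changes the URL, which is itself the cache-busting strategy.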
9. Expired domain registration
The most embarrassing outage. Your domain registration expired because the card on file expired, or the notification went to an alias nobody reads, or the renewal auto-charge was flagged as suspicious by the bank. The domain still resolves for a grace period, then stops resolving entirely, and sometimes gets parked or taken by a domain speculator.
What it looks like: DNS resolution fails for the domain. If you check whois, the domain is either in a grace period, pending deletion, or owned by someone else entirely.
How to prevent: Auto-renew with a card you actively monitor. Set manual calendar reminders 60 and 30 days before expiry. Use a registrar that allows multi-year registration and pay for 5-10 years on your core domains. Monitor the registration expiry separately from your HTTP checks.
10. Hosting provider outages
Your hosting provider has a bad day. A single AZ in AWS us-east-1 goes down. Cloudflare has a control plane incident. Your VPS provider pushes a bad kernel update. These are often the outages you can do the least about, at least in the short term.
What it looks like: Multiple customers of the same provider report outages at the same time. Their status page usually lights up within 5-15 minutes. Your monitoring goes red regardless of what your code is doing.
How to prevent: Multi-region deployment for critical services. Know your provider's track record and pick one with reasonable reliability. Monitor externally from a different provider than the one you're hosted on — a CronAlert check from Cloudflare's edge will succeed when your AWS-hosted site fails, cleanly separating "provider issue" from "my code issue." Subscribe to your provider's status page RSS or incident feed so you get early warning.
The pattern: you can't prevent all downtime, but you can detect it fast
Outages are short when a human notices quickly and long when a human notices late. The biggest lever for reducing the impact of an outage is not preventing every cause above — some are unavoidable — but detecting every cause fast.
A 1-minute uptime check with a push or SMS alert converts a three-hour outage (during which the site is dead and nobody knows) into a fifteen-minute outage (during which the on-call engineer gets paged, investigates, and rolls back). That single change is worth more than most in-code hardening.
Pair that with:
- Multi-region checks so you can tell "the internet is broken near this one probe" from "the site is broken for everyone."
- Consecutive-check verification so transient blips don't page you at 3am for a DNS hiccup that resolved before you opened your laptop.
- SSL and domain monitoring so scheduled failures (cert renewal, registration renewal) never surprise you.
- Health endpoints on each service so you can tell which dependency is down when the site is in a degraded state.
- A public status page so users don't flood your support inbox during an outage.
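The consecutive-check idea from the list above is simple enough to sketch directly — alert only on sustained failure, never on a single blip:

```python
def should_alert(results, threshold: int = 3) -> bool:
    """Consecutive-check verification: fire only after `threshold`
    failed checks in a row, so one transient blip never pages anyone.
    `results` is the ordered sequence of check outcomes (True = up)."""
    streak = 0
    for ok in results:
        streak = 0 if ok else streak + 1
        if streak >= threshold:
            return True
    return False
```

The trade-off is detection latency: with 1-minute checks and a threshold of 3, a real outage pages you about three minutes in, while a 30-second network hiccup pages nobody.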
More on reducing alert noise specifically in the alert fatigue guide, and on the cost calculation that justifies spending effort here in the cost of downtime guide.
Frequently asked questions
What is the most common cause of website downtime?
Bad deploys, for active teams. Code that passed CI but breaks at runtime under real traffic or data. For less active sites, expired SSL certificates and expired domain registrations are more common because they fail on a schedule nobody remembers. The specific top cause depends on how often the site changes.
How can I prevent website downtime?
You cannot prevent all of it, but you can dramatically reduce the impact. Monitor externally from multiple regions, set up SSL and domain expiry alerts, use staged deploys with health checks, size your database pool for peak traffic, add circuit breakers around third-party dependencies, and put a CDN in front of static assets. Fast detection is often the single biggest lever.
How long does a typical website outage last?
It varies wildly. Teams with good monitoring resolve most incidents in under 15 minutes. Teams without any external monitoring routinely have outages lasting hours because the first signal is a customer complaint. The single biggest reducer of duration is fast alerts on external uptime checks.
Does a CDN prevent downtime?
A CDN reduces some classes of downtime — traffic spikes on static assets, DDoS attacks, regional network issues — and has no effect on others. It does not help when your origin is down for dynamic requests, when your database is overloaded, when a deploy is broken, or when third-party dependencies fail. Worth having, not a substitute for origin-side reliability.
How do I know what caused my website downtime?
Correlate the alert timestamp with recent deploys, traffic graphs, database connection counts, and third-party status pages. External uptime monitoring logs give you the exact response codes and error messages from each check: a 502 points at the origin, a 503 at overload, a DNS error at resolution, a certificate error at SSL. Add structured logging for deeper analysis.
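That triage can be expressed as a lookup over the check result. The field names below are illustrative, not any particular monitor's actual schema:

```python
def classify_failure(check: dict) -> str:
    """Rough first-pass triage from an external uptime-check result."""
    err = check.get("error", "")
    if "certificate" in err:
        return "SSL: check cert expiry and chain"
    if "name not resolved" in err or "NXDOMAIN" in err:
        return "DNS: check records, nameservers, domain registration"
    status = check.get("status")
    if status == 502:
        return "origin: bad deploy or crashed app server"
    if status in (503, 504):
        return "overload: database pool, traffic spike, or upstream timeout"
    return "unknown: correlate with deploy and traffic timelines"
```

It's a starting hypothesis, not a verdict — confirm against the deploy timeline and traffic graphs before acting on it.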
Start monitoring before the next outage
Every outage on this list is faster to fix when someone knew about it before the customers did. Setting up basic external monitoring is a 5-minute task that pays for itself the first time it catches a deploy that broke at 2am.
Create a free CronAlert account — 25 monitors, SSL monitoring, 3-minute intervals, multiple alert channels, and a 5-minute setup guide. Paid plans add 1-minute intervals and multi-region quorum for the specific failure modes in causes 3, 6, and 10 above.