Uptime monitoring is straightforward: a service sends HTTP requests to your URL on a schedule. If the response is not what it should be -- a timeout, a 500 error, a DNS failure -- it alerts you. That is the entire concept.
The reason it matters is equally simple. If your website or API goes down and nobody tells you, your users find out first. They leave. They tweet about it. They switch to a competitor. Uptime monitoring exists so that you hear about problems before your customers do.
This guide covers how uptime monitoring works under the hood, what you should be monitoring, the different types of checks available, and how to pick the right tool for your stack.
Why uptime monitoring matters
Downtime costs real money. Gartner's oft-cited figure is $5,600 per minute for enterprise outages, but even for a small SaaS doing $10K MRR, every hour of downtime means lost signups, failed API calls, and support tickets piling up.
Beyond the direct revenue impact, there are three reasons you cannot afford to skip monitoring:
- User trust erodes fast. A single unannounced outage can undo months of reliability reputation. Users do not remember the 364 days your service worked perfectly -- they remember the day it did not.
- SLA compliance is contractual. If you promise 99.9% uptime in your SLA, that is 8.7 hours of allowed downtime per year. Without monitoring, you have no way to measure whether you are meeting that commitment, and no data to resolve disputes.
- Mean time to recovery (MTTR) depends on detection speed. You cannot fix what you do not know is broken. Teams without monitoring typically discover outages 20-40 minutes after they start -- from user complaints. Teams with monitoring discover them in 1-3 minutes.
How uptime monitoring works
When a monitoring service checks your URL, here is what actually happens at the network level:
- DNS resolution. The monitor resolves your domain name to an IP address. If DNS is misconfigured or your DNS provider is down, the check fails here.
- TCP connection. A TCP handshake establishes a connection to your server. If the server is unreachable or the port is closed, the check fails at this stage.
- TLS handshake. For HTTPS URLs, the monitor negotiates a secure connection and validates your SSL certificate. An expired or misconfigured certificate causes a failure here.
- HTTP request and response. The monitor sends an HTTP GET (or HEAD) request and waits for a response. It records the status code and response time.
- Response validation. The simplest check just verifies a 2xx status code. More advanced checks can validate that the response body contains (or does not contain) specific keywords or content patterns.
If any step fails or the total request exceeds the timeout threshold (typically 30 seconds), the monitor records a failure. Most monitoring tools then run a confirmation check to rule out transient blips before sending an alert.
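The stages above can be sketched with Python's standard library. This is a minimal illustration, not how any particular monitoring service is implemented: the function names, the returned dictionary, and the 5-second recheck delay are assumptions made for the example, and the timeout is applied per socket operation with a final elapsed-time check rather than as a strict overall deadline.

```python
import socket
import ssl
import time
from urllib.parse import urlsplit


def check_url(url, timeout=30.0):
    """Run one check and report the first stage that fails, if any."""
    parts = urlsplit(url)
    host = parts.hostname
    use_tls = parts.scheme == "https"
    port = parts.port or (443 if use_tls else 80)
    started = time.monotonic()

    def result(ok, stage, detail=None, status=None):
        return {"ok": ok, "stage": stage, "detail": detail, "status": status,
                "elapsed": time.monotonic() - started}

    # 1. DNS resolution
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        address = infos[0][4][0]
    except socket.gaierror as exc:
        return result(False, "dns", str(exc))

    # 2. TCP connection
    try:
        sock = socket.create_connection((address, port), timeout=timeout)
    except OSError as exc:
        return result(False, "tcp", str(exc))

    try:
        # 3. TLS handshake (HTTPS only). The default context verifies the
        #    certificate chain and hostname, so an expired cert fails here.
        if use_tls:
            try:
                context = ssl.create_default_context()
                sock = context.wrap_socket(sock, server_hostname=host)
            except OSError as exc:  # ssl.SSLError is a subclass of OSError
                return result(False, "tls", str(exc))

        # 4. HTTP request and response (only the status line is read)
        try:
            path = parts.path or "/"
            request = (f"HEAD {path} HTTP/1.1\r\nHost: {host}\r\n"
                       "Connection: close\r\n\r\n")
            sock.sendall(request.encode("ascii"))
            with sock.makefile("rb") as reader:
                status = int(reader.readline().split()[1])
        except (OSError, ValueError, IndexError) as exc:
            return result(False, "http", str(exc))
    finally:
        sock.close()

    # 5. Response validation: 2xx status, within the timeout threshold
    if not 200 <= status < 300:
        return result(False, "validation", f"unexpected status {status}", status)
    if time.monotonic() - started > timeout:
        return result(False, "validation", "exceeded timeout", status)
    return result(True, "done", status=status)


def check_with_confirmation(url, timeout=30.0, recheck_delay=5.0):
    """Re-run a failed check once so a transient blip does not raise an alert."""
    first = check_url(url, timeout)
    if first["ok"]:
        return first
    time.sleep(recheck_delay)
    return check_url(url, timeout)
```

The `stage` field in the result tells you where in the chain the check broke, which is usually the first thing you want to know when an alert fires.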
What you should be monitoring
Your homepage is the obvious starting point, but it is rarely the only thing that matters. Here is what a well-monitored stack looks like:
- Your marketing site and app. These are what users see. Monitor the homepage, login page, and key app routes.
- API endpoints. If your product exposes an API -- or if your frontend depends on one -- monitor your critical endpoints directly. A healthy homepage does not mean your API is working.
- Cron jobs and scheduled tasks. Background jobs fail silently. Heartbeat monitoring (also called dead man's switch or push monitoring) catches jobs that stop running without anyone noticing.
- SSL certificates. An expired certificate takes your entire site offline for HTTPS users. SSL monitoring warns you days before expiration so you can renew in time.
- Third-party dependencies. If your app relies on a payment processor, email service, or external API, monitor those too. When Stripe goes down, your checkout goes down -- and you want to know immediately.
- Status pages. If you run a public status page, monitor it separately. An outage on your status page during an incident is the worst possible timing.
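For the SSL certificate item in the list above, the core of an expiry check is a date comparison. The sketch below assumes you already have the certificate's `notAfter` string, which is the format Python's `SSLSocket.getpeercert()` returns. The function names and the 14-day warning threshold are illustrative assumptions, not a recommendation from any specific tool.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter timestamp.

    not_after uses the getpeercert() format, e.g. "Jan  5 09:34:43 2018 GMT".
    """
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def needs_renewal(not_after, warn_days=14, now=None):
    """True once the certificate is within warn_days of expiring."""
    return days_until_expiry(not_after, now) <= warn_days
```

A negative result from `days_until_expiry` means the certificate has already expired.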
Types of uptime checks
Not all checks are created equal. Different failure modes require different monitoring approaches.
HTTP checks
The standard check. Send an HTTP request, verify you get a 2xx response within the timeout window. This catches server crashes, deployment failures, DNS problems, and certificate issues. It is the baseline that every monitor should have.
Keyword monitoring
An HTTP 200 does not always mean your page is working correctly. Your app might return a 200 with an error message in the body, or a CDN might serve a stale cached page while your origin server is down. Keyword monitoring checks that the response body contains (or does not contain) a specific string, catching these soft failures that status codes miss.
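A keyword check is a small amount of logic layered on top of the status check. The following sketch assumes the response body has already been fetched as text; the function name and the example keywords are hypothetical.

```python
def keyword_check(status, body, must_contain=(), must_not_contain=()):
    """Return a list of failure reasons; an empty list means the check passed."""
    failures = []
    if not 200 <= status < 300:
        failures.append(f"unexpected status {status}")
    for keyword in must_contain:
        if keyword not in body:
            failures.append(f"missing keyword: {keyword!r}")
    for keyword in must_not_contain:
        if keyword in body:
            failures.append(f"forbidden keyword present: {keyword!r}")
    return failures
```

For example, `keyword_check(200, "<h1>Database connection error</h1>", must_contain=["Dashboard"], must_not_contain=["error"])` reports two failures even though the status code is 200, which is exactly the soft-failure case described above.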
Content monitoring
A step beyond keyword checks. Content monitoring tracks changes to your page content over time -- useful for detecting defacements, accidental content deletions, or third-party script injections that would not trigger a status code failure.
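One simple way to detect content changes is to store a fingerprint of a known-good page and compare each new response against it. This is a sketch under the assumption that the monitored page is mostly static; real pages often include timestamps or session tokens that would need to be stripped first. The function names are hypothetical.

```python
import hashlib


def fingerprint(body):
    """SHA-256 of the page text with runs of whitespace collapsed."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def content_changed(baseline_fingerprint, body):
    """True if the page no longer matches the stored baseline."""
    return fingerprint(body) != baseline_fingerprint
```

Collapsing whitespace before hashing keeps harmless reformatting from registering as a change, while an injected script tag or a deleted paragraph still alters the fingerprint.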
Heartbeat (push) monitoring
For cron jobs and scheduled tasks, the traditional pull model does not work -- there is no URL to check. Instead, heartbeat monitoring gives you a unique URL that your job pings when it completes. If the ping does not arrive within the expected window, you get an alert. This catches jobs that crash, hang, or silently stop being scheduled.
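Both sides of a heartbeat can be sketched briefly. On the job side, the last step is a request to the ping URL; on the monitor side, the check is whether the last ping is older than the expected interval plus a grace period. The function names, the grace period, and the ping URL are assumptions for illustration; your monitoring service will issue its own URL.

```python
import urllib.request
from datetime import datetime, timedelta, timezone


def send_heartbeat(ping_url, timeout=10):
    """Job side: call this as the final step of a successful run."""
    with urllib.request.urlopen(ping_url, timeout=timeout) as response:
        return response.status


def is_overdue(last_ping, interval, grace, now=None):
    """Monitor side: True if no ping arrived within interval + grace."""
    now = now or datetime.now(timezone.utc)
    return now - last_ping > interval + grace
```

Because the ping is sent only after the job finishes, a job that crashes or hangs partway through never reports in, and the monitor flags it once the window closes.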
Multi-region checks
A single-region check cannot tell you whether your site is down globally or just unreachable from one network. Multi-region monitoring runs the same check from multiple geographic locations simultaneously, distinguishing between a true outage and a localized routing problem. This matters especially if you serve users across continents.
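The distinction between a true outage and a localized problem comes down to how the per-region results are combined. A minimal sketch, assuming each region reports a simple pass or fail and using made-up region names:

```python
def classify_regions(results):
    """Combine per-region results, given as {region_name: passed}, into a verdict."""
    failed = sorted(region for region, passed in results.items() if not passed)
    if not failed:
        return "up", failed
    if len(failed) == len(results):
        return "outage", failed
    return "regional", failed
```

For example, `classify_regions({"us-east": True, "eu-west": False, "ap-south": True})` yields `("regional", ["eu-west"])`. Whether a regional failure should page someone or only log a warning is a policy decision this sketch leaves open.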
Alert channels
A monitor that detects downtime but does not reach the right person fast enough is useless. The alert channel matters as much as the check itself.
- Email -- universal and reliable, but easy to miss if you are not actively checking your inbox. Good as a fallback, not as your primary channel for critical services.
- Slack -- integrates into the tool most engineering teams already live in. Set up a dedicated #alerts channel so downtime notifications do not get buried in conversation.
- Discord -- same concept as Slack, popular with open source projects and smaller teams. Webhook setup takes about 30 seconds.
- Webhooks -- the most flexible option. Send alert data to any HTTP endpoint: a custom dashboard, a PagerDuty integration, a Zapier workflow, your own incident management system.
- PagerDuty / Teams / Telegram -- for teams with established on-call rotations or specific communication preferences. Available on CronAlert's Pro plan and above.
The best setup uses at least two channels: a Slack message for fast visibility, plus email as a durable record. For production-critical services, add PagerDuty or a webhook to your incident management tool.
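Webhook-style channels all work the same way: the monitor POSTs a small JSON document to a URL. The sketch below uses the basic message shapes that Slack incoming webhooks (`text`) and Discord webhooks (`content`) accept. The generic payload shape, the function names, and any URL you pass in are assumptions for the example.

```python
import json
import urllib.request


def build_payload(channel, message):
    """Shape the alert body for the target channel."""
    if channel == "slack":
        return {"text": message}
    if channel == "discord":
        return {"content": message}
    # Generic webhook: this shape is an assumption; use whatever your receiver expects.
    return {"message": message}


def send_alert(webhook_url, channel, message, timeout=10):
    """POST the alert as JSON and return the HTTP status of the webhook call."""
    data = json.dumps(build_payload(channel, message)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Sending to two channels is then just two calls to `send_alert`, ideally with each wrapped in its own error handling so that a failure in one channel does not block the other.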
Key metrics to track
Uptime monitoring generates data. Here are the numbers that actually matter:
Uptime percentage
The headline metric. Expressed as a percentage of time your service was available over a given period. The difference between "nines" is dramatic:
- 99% -- 3.65 days of downtime per year. Unacceptable for most production services.
- 99.9% -- 8.7 hours per year. The standard target for most web applications.
- 99.99% -- 52 minutes per year. Expected for critical infrastructure and services with strict SLAs.
- 99.999% -- 5.2 minutes per year. Extremely difficult to achieve. Requires redundant everything.
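The figures above come from a one-line calculation: the fraction of time you are allowed to be down, multiplied by the length of the period. A small sketch, assuming a 365-day year (the function name is hypothetical):

```python
def allowed_downtime_minutes(uptime_percent, days=365):
    """Downtime budget in minutes for a given uptime target over a period."""
    return (100 - uptime_percent) / 100 * days * 24 * 60


for target in (99, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} minutes per year")
```

Passing `days=30` gives the monthly budget instead, which is often the more useful number because many SLAs are measured per billing month. At 99.9%, that is about 43 minutes.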
Response time
How long the server takes to respond. Track the median (p50) and tail latency (p95, p99). A healthy median with a terrible p99 means some percentage of your users are having a bad experience -- even though the typical request looks fine. A response time that trends upward is often an early warning sign of an impending outage.
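To make the median-versus-tail point concrete, here is a nearest-rank percentile over a list of response times. The sample data is invented for illustration: 98 fast responses and 2 very slow ones.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds.
response_times_ms = [120] * 98 + [4000] * 2

p50 = percentile(response_times_ms, 50)
p95 = percentile(response_times_ms, 95)
p99 = percentile(response_times_ms, 99)
```

With this data, p50 and p95 are both 120 ms while p99 is 4000 ms, so a dashboard showing only the median would hide the slow requests entirely.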
Mean time to recovery (MTTR)
The average time between when an incident starts and when service is restored. This is the metric that monitoring improves most directly. Without monitoring, MTTR is bounded by how quickly someone notices and reports the problem. With 1-minute check intervals and instant alerts, detection drops to minutes.
Incident frequency
How often your service goes down. A service with 99.9% uptime that has one 8-hour outage per year has a very different reliability profile than one with 99.9% uptime spread across 500 brief incidents. Frequent short outages often point to different root causes (flaky deploys, resource exhaustion) than rare long ones (infrastructure failures, data corruption).
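MTTR and incident frequency can both be derived from the same incident log. The sketch below assumes each incident is recorded as a (start, end) pair and compares two invented profiles with the same total downtime: one 8-hour outage versus 480 one-minute blips.

```python
from datetime import datetime, timedelta


def summarize(incidents):
    """Incident count, total downtime, and MTTR for a list of (start, end) pairs."""
    durations = [end - start for start, end in incidents]
    total = sum(durations, timedelta())
    mttr = total / len(incidents) if incidents else timedelta()
    return {"count": len(incidents), "total_downtime": total, "mttr": mttr}


# Hypothetical incident logs for illustration.
t0 = datetime(2024, 1, 1)
one_long = [(t0, t0 + timedelta(hours=8))]
many_short = [
    (t0 + timedelta(hours=i), t0 + timedelta(hours=i, minutes=1))
    for i in range(480)
]

long_profile = summarize(one_long)
short_profile = summarize(many_short)
```

Both profiles show 8 hours of total downtime and therefore the same uptime percentage, but the first has an MTTR of 8 hours across 1 incident and the second an MTTR of 1 minute across 480 incidents. Looking at count and MTTR together is what separates the two.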
How to choose a monitoring tool
The monitoring market is crowded. Here is what actually matters when picking a tool:
- Check frequency. How often can you check? 5-minute intervals can miss incidents that last 3 minutes entirely. 1-minute intervals are the practical minimum for production services.
- Alert speed. Time from detection to notification. Some tools batch alerts or add delays. You want alerts sent immediately on confirmation of failure.
- False positive handling. Does the tool run confirmation checks before alerting? Single-check alerting means you get woken up at 3 AM because of a transient network blip. Smart confirmation (checking twice before alerting) dramatically reduces noise.
- Check types. Basic HTTP checks are table stakes. Look for keyword monitoring, content checks, heartbeat monitoring, and SSL expiration tracking to cover the full range of failure modes.
- Multi-region support. If you serve a global audience, you need checks from multiple locations to distinguish between regional and global outages.
- Alert channel variety. The tool should support your team's communication stack -- whether that is Slack, Discord, PagerDuty, or custom webhooks.
- Status pages. Built-in status pages save you from running a separate tool. Your users should have somewhere to check during an incident.
- Pricing transparency. Some tools charge per check, per alert, or per seat. Look for simple pricing that does not penalize you for monitoring more things.
- Team features. If multiple people need access, you need shared dashboards, role-based access, and the ability to route alerts to different team members.
- API access. For automation and integration with your existing tooling -- CI/CD pipelines, infrastructure-as-code, custom dashboards.
- Maintenance windows. Scheduled downtime should not count against your uptime metrics or trigger false alerts during planned deploys.
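The check-frequency point in the list above can be made concrete with a little arithmetic. Under the simplifying assumption that an incident starts at a random moment relative to the check schedule, and that any check landing inside the incident detects it, the chance of catching it is the incident length divided by the check interval, capped at 100%. The function name is hypothetical.

```python
def detection_probability(incident_minutes, interval_minutes):
    """Chance that at least one scheduled check falls inside the incident."""
    return min(1.0, incident_minutes / interval_minutes)
```

A 3-minute incident has a 60% chance of being seen with 5-minute checks and is always seen with 1-minute checks. This ignores confirmation rechecks, which need the incident to last a little longer before an alert is sent.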
Getting started with CronAlert
CronAlert was built for this. We run on Cloudflare's edge network, which means your checks execute close to your infrastructure with minimal overhead. No heavy agents to install, no complex configuration.
Here is what you get on the free plan -- no credit card required:
- 25 monitors with 3-minute check intervals
- Email, Slack, Discord, and webhook alerts
- 1 public status page
- SSL certificate monitoring
- Basic API access
- 7-day log retention
Paid plans start at $5/month and add 1-minute intervals, keyword monitoring, multi-region checks, maintenance windows, team collaboration, and longer retention.
You can have your first monitor running in under 5 minutes. Create a free account, add a URL, set up an alert channel, and you are done. For a detailed walkthrough, see our step-by-step setup guide.
Frequently asked questions
How does uptime monitoring work?
An uptime monitor sends HTTP requests to your URL on a fixed schedule -- for example, every 1 or 3 minutes. If the response returns a non-2xx status code, times out, or fails at the DNS/TCP/TLS layer, the monitor marks your site as down and triggers an alert through your configured channels (email, Slack, Discord, webhook, etc.).
What is a good uptime percentage?
99.9% (three nines) is the standard target for most web applications. That allows about 8.7 hours of downtime per year. 99.99% (four nines) allows only 52 minutes per year and is typical for critical infrastructure. The right target depends on your SLA commitments and what your users expect.
What should I monitor besides my website?
Beyond your main website, monitor your API endpoints, webhook receivers, cron jobs and scheduled tasks, SSL certificate expiration, third-party services you depend on, and any URL that your customers or internal systems rely on.
How many monitors do I need?
At minimum, monitor your homepage, your primary API endpoint, and any critical background jobs. A typical SaaS product with a marketing site, app, API, and a few cron jobs needs 10-20 monitors. CronAlert's free plan includes 25 monitors, which covers most small to mid-size applications.
Can I monitor from multiple regions?
Yes. Multi-region monitoring checks your URL from multiple geographic locations simultaneously, which helps distinguish between a true outage and a localized network issue. CronAlert's Team and Business plans include multi-region checks from 5 regions across North America, Europe, and Asia-Pacific.