What is an error budget and how do I calculate one?

An error budget is the amount of downtime you're allowed to have in a period while still meeting your SLA. If you committed to 99.9% monthly uptime, your error budget is roughly 43 minutes per month (0.1% of 43,200 minutes). At 99.95% it's about 22 minutes; at 99.99% it's about 4 minutes. You calculate burn rate by dividing the share of the budget you've used by the share of the period that has elapsed: if you've used 30 of your 43 minutes a quarter of the way through the month (early in week 2), you're burning at about 2.8x the sustainable rate and you should freeze risky deploys until the rate drops. Error budgets turn the SLA from a year-end report into a live operational signal.

Should maintenance windows count against my SLA?

It depends on what's in your SLA contract. Most enterprise SLAs explicitly exclude scheduled maintenance announced in advance (typically 48-72 hours of notice, with the excluded time capped at a few hours per month). If your contract has that language, configure maintenance windows in CronAlert that match the announced maintenance period, and that time is excluded from uptime calculations automatically. If your SLA doesn't have a maintenance exclusion clause, scheduled maintenance counts against your budget, which is a strong argument for adding that clause to future contracts.

What's the right SLA target for a B2B SaaS company?

Three nines (99.9% monthly, 43 minutes downtime/month) is the typical entry-level commitment and is achievable on cloud infrastructure with reasonable engineering investment. Three-and-a-half nines (99.95%, 22 minutes) is common for established mid-market SaaS. Four nines (99.99%, 4 minutes/month) is what enterprise SaaS commits to and requires real engineering investment — redundant infrastructure, automated failover, mature deploy pipelines. The right target depends on what your infrastructure can reliably deliver and what customers will pay for; don't commit to a number you can't hit.

How to Use Uptime Data to Improve Your SLA Compliance

Q: What's the difference between SLA reporting and SLA compliance?

SLA reporting is retrospective — at the end of the quarter you produce a document showing whether you hit your committed uptime. SLA compliance is proactive — throughout the quarter you use the same data to know whether you're on track and intervene before you miss. Reporting tells you what happened; compliance is the operational practice of using monitoring data to make reliability decisions in real time. You need both, but reporting alone is too late to change the outcome.

Q: Can I track different SLAs for different customers?

Yes. The cleanest approach is to tag monitors by the customer or tier they back (production-shared, enterprise-customer-A, enterprise-customer-B) and calculate uptime per tag for each customer's SLA period. CronAlert's API lets you pull check results filtered by tag and date range, so per-customer uptime can be computed by a small script. For customers on dedicated infrastructure (single-tenant deployments), put their monitors on a separate status page and compute uptime per page. For shared multi-tenant infrastructure, the underlying availability is the same for every customer; only the SLA period and target applied to it differ.

SLA reporting is the document you send the customer at the end of the quarter. SLA compliance is the operational practice of using the same data, in real time, to make sure the document at the end of the quarter says what you want it to say. Reporting is backward-looking. Compliance is forward-looking. You need both, but only one of them can change the outcome.

We've already written about how to produce SLA reports — the math, the format, the API export. This post is about everything that happens before the report. How to use uptime data to know whether you're on track, how to spend your error budget like the limited resource it is, how to handle per-customer SLAs without losing your mind, and the operational habits that turn monitoring from a passive observability tool into an active reliability lever.

The mental shift: from reports to budgets

The single biggest unlock in SLA compliance is to stop thinking of uptime as a number you report and start thinking of downtime as a budget you spend. The math is the same; the operational behavior is completely different.

99.9% monthly = 43 minutes of downtime budget per month.
99.95% monthly = 22 minutes of downtime budget per month.
99.99% monthly = 4.3 minutes of downtime budget per month.

When you frame it as a budget, two things change. First, every downtime event has an obvious cost — that ten-minute incident on the 4th consumed 23% of your monthly budget. Second, decisions about risk have a natural denominator — "should we deploy this risky migration tonight?" becomes "we have 35 minutes of budget left this month and a normal deploy carries 2-3 minutes of expected risk, so yes." For background on the percentages, see how to calculate uptime percentage.
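The budget arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming a 30-day month (43,200 minutes); the function names are made up for this example and are not part of any CronAlert tooling.

```python
def error_budget_minutes(target_pct: float, period_days: float = 30) -> float:
    """Allowed downtime, in minutes, for an uptime target over a period."""
    period_minutes = period_days * 24 * 60
    return (1 - target_pct / 100) * period_minutes


def budget_share(incident_minutes: float, target_pct: float,
                 period_days: float = 30) -> float:
    """Fraction of the period's budget consumed by a single incident."""
    return incident_minutes / error_budget_minutes(target_pct, period_days)


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        print(f"{target}% -> {error_budget_minutes(target):.1f} min/month")
    # The ten-minute incident from the text, against a 99.9% monthly target.
    print(f"10-minute incident: {budget_share(10, 99.9):.0%} of budget")
```

The unrounded budgets are 43.2, 21.6, and 4.32 minutes, which is why the rounded figures in this post read 43, 22, and 4.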

Step 1: Pick the right target

Compliance starts before any monitoring data is collected. The SLA target has to be one you can actually hit. The most common compliance failure is committing to four nines on infrastructure that can only deliver three.

A rough guide based on what's typically achievable at each tier:

99.9% (43 min/month) — achievable with a single-region cloud deployment, a competent on-call practice, and modest deploy automation. The entry-level B2B SaaS commitment.
99.95% (22 min/month) — achievable with multi-region redundancy or fast automated failover, mature deploy pipelines with canary releases, and proactive monitoring of dependencies. Common for established mid-market SaaS.
99.99% (4 min/month) — requires multi-region active-active, automated failover with low RTO, blameless deploy culture with strict canary discipline, and dedicated reliability engineering investment. Enterprise SaaS territory.
Five nines (5 min/year) — usually a marketing number, not a real commitment. Achievable only for narrowly scoped components (DNS, CDN edge) with massive engineering investment.

Before committing in a contract, look at your last 3 months of uptime data in CronAlert. If you weren't hitting the target during a quiet quarter, you won't hit it during a noisy one. Negotiate a lower target now, hit it consistently for a year, then renegotiate up at renewal.
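One way to run that check is to compare each month's measured downtime against the budget for each candidate target. The sketch below assumes you already have monthly downtime totals in minutes (the sample figures are invented) and treats every month as 30 days; the helper names are hypothetical.

```python
from typing import Optional, Sequence


def monthly_budget_minutes(target_pct: float, days: float = 30) -> float:
    """Downtime allowed per month, in minutes, at the given uptime target."""
    return (1 - target_pct / 100) * days * 24 * 60


def highest_sustainable_target(
    monthly_downtime_minutes: Sequence[float],
    candidates: Sequence[float] = (99.99, 99.95, 99.9),
) -> Optional[float]:
    """Return the strictest candidate target met in every month, or None."""
    for target in sorted(candidates, reverse=True):
        budget = monthly_budget_minutes(target)
        if all(minutes <= budget for minutes in monthly_downtime_minutes):
            return target
    return None


# Invented sample: downtime minutes for the last three months.
history = [12.0, 31.0, 18.5]
best = highest_sustainable_target(history)
```

With this sample, the 31-minute month rules out 99.95% (21.6-minute budget), so 99.9% is the strictest target the history supports.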

Step 2: Calculate burn rate in real time

Once you've picked a target, the operational signal that matters is burn rate — the ratio of actual downtime to the budget you'd expect to use by this point in the period.

burn_rate = (downtime_so_far / period_elapsed)
          / (budget_total / period_total)

A burn rate of 1.0 means you're consuming budget exactly as expected. 0.5 means you're well under. 2.0 means you're burning budget twice as fast as sustainable — at this rate, you'll run out before the end of the period.

Concrete example: 99.9% monthly = 43 minutes budget. By day 10 of a 30-day month you'd expect to have used (10/30) * 43 = 14.3 minutes. If actual downtime is 28 minutes, burn rate is 28 / 14.3 = 2.0x — danger zone. If it's 7 minutes, burn rate is 0.5x — comfortable.
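The formula translates directly into code. This is a minimal sketch with illustrative function names; the projected exhaustion day is an extra convenience that simply extrapolates the current rate, which is an assumption rather than a forecast.

```python
from typing import Optional


def burn_rate(downtime_so_far: float, period_elapsed: float,
              budget_total: float, period_total: float) -> float:
    """Actual downtime rate divided by the sustainable rate for the period."""
    if period_elapsed <= 0 or budget_total <= 0 or period_total <= 0:
        raise ValueError("elapsed time, budget, and period must be positive")
    actual_rate = downtime_so_far / period_elapsed
    sustainable_rate = budget_total / period_total
    return actual_rate / sustainable_rate


def projected_exhaustion(downtime_so_far: float, period_elapsed: float,
                         budget_total: float) -> Optional[float]:
    """Point in the period where the budget runs out if the rate holds."""
    if downtime_so_far <= 0:
        return None  # nothing spent yet, so nothing to extrapolate
    return period_elapsed * budget_total / downtime_so_far


# The worked example: 43-minute budget, day 10 of a 30-day month.
danger = burn_rate(28, 10, 43, 30)       # roughly 2.0
comfortable = burn_rate(7, 10, 43, 30)   # roughly 0.5
runs_out_on_day = projected_exhaustion(28, 10, 43)
```

At 28 minutes of downtime by day 10, the extrapolation puts budget exhaustion around day 15, which is what "you'll run out before the end of the period" means in practice.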

You can pull the inputs from CronAlert's API:

curl 'https://cronalert.com/api/check-results?monitor_id=<id>&start=2026-05-01&end=2026-05-26' \
  -H 'Authorization: Bearer <api_key>' \
  | jq '[.results[] | select(.status != "up")] | length * <check_interval_minutes>'
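If you would rather do the counting in a script than in jq, the same logic looks like this in Python. The response shape (a top-level "results" list whose items carry a "status" field) is inferred from the jq filter above and is an assumption, not a documented schema; the sample payload is invented.

```python
import json


def downtime_minutes(payload: dict, check_interval_minutes: float) -> float:
    """Approximate downtime: non-"up" checks times the check interval."""
    failed = [r for r in payload["results"] if r.get("status") != "up"]
    return len(failed) * check_interval_minutes


# Invented sample standing in for the API response body.
sample = json.loads(
    '{"results": [{"status": "up"}, {"status": "down"},'
    ' {"status": "down"}, {"status": "up"}]}'
)
minutes_down = downtime_minutes(sample, check_interval_minutes=5)
```

Counting failed checks is an approximation: an outage shorter than the check interval can be missed entirely or counted as a full interval.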

Wire this into a daily Slack message or a dashboard tile. A team that sees burn rate every morning is a team that can intervene before missing the SLA. A team that only sees uptime at the end of the quarter is reading the postmortem.
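For the Slack side, the daily message can be as simple as one line of text. The sketch below only builds the JSON body (Slack incoming webhooks accept a JSON object with a "text" field); sending it, the service name, and the wording are left to you and are assumptions here.

```python
import json


def burn_rate_message(service: str, burn_rate: float,
                      budget_left_minutes: float) -> str:
    """Build a JSON body for a chat webhook summarizing today's burn rate."""
    flag = " (over sustainable rate)" if burn_rate > 1.0 else ""
    text = (
        f"{service}: burn rate {burn_rate:.1f}x{flag}, "
        f"{budget_left_minutes:.0f} min of error budget left this period"
    )
    return json.dumps({"text": text})


body = burn_rate_message("api", 2.5, 12.0)
```

Posting `body` to your webhook URL from a daily cron job is enough to get the number in front of the team every morning.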

Step 3: Use burn rate to make deploy decisions

The most useful application of burn rate is as an input to deploy decisions. Many teams adopt a simple policy:

Burn rate < 1.0: normal deploys. Risky migrations and infrastructure changes are fine if reviewed.
Burn rate 1.0 - 2.0: normal deploys allowed, but risky changes require a second reviewer and a rollback plan. Friday afternoon deploys discouraged.
Burn rate > 2.0, up to 4.0: soft freeze. Only deploys that improve reliability or fix critical bugs. No infrastructure changes. No risky migrations.
Burn rate > 4.0 or budget exhausted: hard freeze. Stop all non-critical deploys until the budget resets at the start of the next period.
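A policy like this is easy to encode so that a deploy pipeline or chat bot can answer "can we ship?" consistently. The following is a sketch of the tiers listed above; the tier labels are illustrative, and your own thresholds may differ.

```python
def deploy_policy(burn_rate: float, budget_exhausted: bool = False) -> str:
    """Map the current burn rate to a deploy policy tier."""
    if budget_exhausted or burn_rate > 4.0:
        return "hard freeze"    # stop all non-critical deploys
    if burn_rate > 2.0:
        return "soft freeze"    # reliability fixes and critical bugs only
    if burn_rate >= 1.0:
        return "extra review"   # second reviewer and rollback plan for risky changes
    return "normal"             # normal deploys; risky changes fine if reviewed
```

Checking the hard-freeze condition first matters: an exhausted budget should freeze deploys even if the burn rate has since dropped.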

The point isn't to be rigid — it's to use the data you already have to make risk decisions explicit. "We can't ship this on Friday because we're at 2.5x burn" is a much better conversation than "I have a bad feeling about this." Teams that operate this way miss SLAs less often, and when they do miss, the explanation is on paper from week one of the period, not reconstructed at the end.

Step 4: Per-customer SLAs without losing your mind

Enterprise B2B contracts often have customer-specific SLAs. Customer A has 99.9%; Customer B has 99.95%; Customer C, the big one with the contract that took three months to negotiate, has 99.99%. The compliance question is: how do you track three different SLAs without three different monitoring tools?

Two patterns work:

Shared infrastructure: one uptime number, calculated three ways

Most multi-tenant SaaS shares infrastructure across customers. The actual uptime is the same for everyone — when the API is down, it's down for every customer. But each customer's SLA period and target is different:

Customer A's quarterly SLA period starts Jan 1, target 99.9%.
Customer B's monthly SLA period starts the 15th of each month, target 99.95%.
Customer C's annual SLA period starts on their contract anniversary, target 99.99%.

Same underlying check data, three different uptime calculations. A small script that pulls check results from CronAlert's API and applies each customer's period and target produces the three uptime numbers. Run it daily and dashboard the burn rate per customer.
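Here is a minimal sketch of that script's core: one list of checks, evaluated against each customer's own period and target. The data classes, field names, and sample data are invented for illustration, and uptime is approximated as the share of passing checks, which assumes evenly spaced checks.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Check:
    at: datetime
    up: bool


@dataclass
class SlaTerms:
    customer: str
    period_start: datetime
    period_end: datetime   # exclusive
    target_pct: float


def uptime_pct(checks: List[Check], start: datetime,
               end: datetime) -> Optional[float]:
    """Share of passing checks inside [start, end), as a percentage."""
    in_period = [c for c in checks if start <= c.at < end]
    if not in_period:
        return None
    return 100 * sum(c.up for c in in_period) / len(in_period)


def evaluate(checks: List[Check], terms: SlaTerms) -> dict:
    """Apply one customer's SLA period and target to the shared check data."""
    measured = uptime_pct(checks, terms.period_start, terms.period_end)
    return {
        "customer": terms.customer,
        "uptime_pct": measured,
        "meeting_target": measured is not None and measured >= terms.target_pct,
    }


# Invented sample: four checks shared by every customer.
checks = [
    Check(datetime(2026, 1, 10), True),
    Check(datetime(2026, 1, 20), False),
    Check(datetime(2026, 2, 5), True),
    Check(datetime(2026, 2, 10), True),
]
customer_a = SlaTerms("A", datetime(2026, 1, 1), datetime(2026, 4, 1), 99.9)
customer_b = SlaTerms("B", datetime(2026, 1, 15), datetime(2026, 2, 15), 99.95)
report_a = evaluate(checks, customer_a)
report_b = evaluate(checks, customer_b)
```

The same failed check on January 20 lands in both periods but weighs differently in each, which is the whole point of calculating per customer.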

Dedicated infrastructure: tag monitors per customer

For single-tenant deployments — common at the high end of enterprise — each customer has their own infrastructure with its own actual uptime. Tag the monitors per customer (customer-a, customer-b, etc.) and filter check results by tag when calculating per-customer uptime. A separate status page per customer is the cleanest way to expose the data; you can also give each customer their own status page URL behind authentication via CronAlert's private status pages.
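If you pull results for several tagged monitors at once, grouping by tag is a short loop. This sketch assumes each result carries a "tags" list alongside "status"; that field is hypothetical, so adapt it to whatever the API actually returns.

```python
from collections import defaultdict
from typing import Dict, List


def uptime_by_tag(results: List[dict]) -> Dict[str, float]:
    """Percentage of passing checks per tag (e.g. per customer)."""
    counts = defaultdict(lambda: [0, 0])  # tag -> [up_count, total_count]
    for result in results:
        for tag in result.get("tags", []):
            counts[tag][1] += 1
            if result.get("status") == "up":
                counts[tag][0] += 1
    return {tag: 100 * up / total for tag, (up, total) in counts.items()}


# Invented sample results for two single-tenant customers.
sample_results = [
    {"status": "up", "tags": ["customer-a"]},
    {"status": "down", "tags": ["customer-a"]},
    {"status": "up", "tags": ["customer-b"]},
]
per_customer = uptime_by_tag(sample_results)
```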

The agency monitoring guide covers the multi-tenant tagging pattern in more detail, and it applies directly to enterprise SaaS with per-customer SLAs.

Step 5: Maintenance window policy

Maintenance windows are where SLA compliance most often breaks down operationally. The contract says "scheduled maintenance announced 48 hours in advance is excluded from uptime calculations." The implementation says "the engineer who did the deploy forgot to create the maintenance window in CronAlert." The result: the maintenance counts against your SLA and you've burned 30 minutes of budget on an event that contractually shouldn't count.

The fix is policy plus tooling:

Define what counts as scheduled. Announced N hours in advance (whatever your SLA says, typically 48 or 72), during a defined window (usually weekend nights or a customer-specific maintenance window), with documented scope.
Always create the CronAlert maintenance window first. Before the engineer touches anything, the maintenance window exists in CronAlert with the right monitors attached. CronAlert excludes the window from uptime calculations automatically.
Notify customers before the maintenance window starts. Use the status page's scheduled maintenance feature. If a customer wasn't notified 48 hours in advance, the maintenance isn't scheduled — it's an outage with a known cause.
Close maintenance windows on time. An "ongoing maintenance" window with no end time excludes uptime data indefinitely. Cap the window at the announced duration; if maintenance overruns, the overage counts against your SLA. That's a useful incentive.
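CronAlert applies the exclusion for you, but it helps to see the arithmetic behind two of these rules: the notice requirement and the overrun. The sketch below is purely illustrative, with invented function names and times, and it assumes maintenance windows do not overlap each other.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]


def is_scheduled(announced_at: datetime, window_start: datetime,
                 notice_hours: float = 48) -> bool:
    """True if the announcement gave at least the required notice."""
    return window_start - announced_at >= timedelta(hours=notice_hours)


def overlap_minutes(a: Window, b: Window) -> float:
    """Minutes shared by two time intervals (0 if they do not overlap)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max((end - start).total_seconds(), 0) / 60


def counted_downtime_minutes(outage: Window, windows: List[Window]) -> float:
    """Outage minutes that still count after excluding maintenance windows."""
    total = (outage[1] - outage[0]).total_seconds() / 60
    excluded = sum(overlap_minutes(outage, w) for w in windows)
    return total - excluded


# Announced window 02:00-03:00; the work actually ran 02:10-03:20.
window = (datetime(2026, 5, 3, 2, 0), datetime(2026, 5, 3, 3, 0))
outage = (datetime(2026, 5, 3, 2, 10), datetime(2026, 5, 3, 3, 20))
counted = counted_downtime_minutes(outage, [window])
```

Here 50 of the 70 minutes fall inside the announced window, so the 20-minute overrun is what lands on the SLA.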

Putting maintenance window creation into the deploy runbook (or even better, a CI hook that creates the window automatically before a release) removes the "forgot to set it up" failure mode.

Step 6: Customer credit automation

When you miss an SLA, the contract specifies service credits, typically a percentage of the monthly fee returned to the customer per missed tier. A common failure at this stage isn't missing the SLA itself; it's missing the SLA and then having to manually figure out, six weeks later, exactly what credit each customer is owed.

Automate the credit calculation. The inputs are:

Per-customer measured uptime (from the per-customer calculations above).
Per-customer committed SLA target (from the contract).
Credit schedule from the contract (e.g., 10% credit if uptime drops below 99.9%, 25% credit if below 99.5%, 50% credit if below 99.0%).

A small script that runs at the end of each customer's SLA period, pulls check results from CronAlert, applies the credit schedule, and emits a credit memo into your billing system removes the manual reconciliation step. More importantly, it removes the temptation to "not look too closely" when you've narrowly missed — when the system automatically calculates and applies the credit, you can't quietly skip the obligation, which builds the kind of customer trust that wins renewals.
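The credit lookup itself is only a few lines. This sketch uses the example schedule from the list above; the function names and the fee in the example are invented, and a real contract may define tiers or rounding differently.

```python
from typing import List, Tuple

# (uptime threshold %, credit %): the credit applies when uptime is below the threshold.
EXAMPLE_SCHEDULE: List[Tuple[float, float]] = [
    (99.9, 10.0),
    (99.5, 25.0),
    (99.0, 50.0),
]


def credit_pct(measured_uptime_pct: float,
               schedule: List[Tuple[float, float]]) -> float:
    """Largest credit percentage whose threshold the measured uptime missed."""
    owed = [credit for threshold, credit in schedule
            if measured_uptime_pct < threshold]
    return max(owed, default=0.0)


def credit_amount(monthly_fee: float, measured_uptime_pct: float,
                  schedule: List[Tuple[float, float]]) -> float:
    """Credit in currency units, rounded to cents."""
    return round(monthly_fee * credit_pct(measured_uptime_pct, schedule) / 100, 2)


# A customer paying 2000 per month who measured 99.2% uptime.
memo = credit_amount(2000, 99.2, EXAMPLE_SCHEDULE)
```

Taking the largest applicable credit means the tiers do not stack, which is the usual reading of a schedule like this; check your contract wording before relying on it.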

Step 7: Use the data to improve the next quarter

The point of all of this is that compliance work produces engineering signal, not just paperwork. The team that hits 99.95% one quarter and 99.92% the next has data on what went wrong. Three questions to answer at each end-of-period review:

Where did the budget go? Group incidents by root cause. If 60% of downtime came from one class of failure (database deploys, third-party dependencies, deploy regressions), that's where to invest reliability engineering next quarter.
What did we ship that helped? Compare quarter-over-quarter uptime. If you invested in automated failover and uptime went from 99.91% to 99.97%, that's evidence you can use to justify more reliability investment.
What didn't help? If you added multi-region redundancy and uptime didn't improve, dig into why. Maybe the failures aren't single-region. Maybe the failover isn't working. The data points at where the assumption was wrong.
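The first question, where the budget went, is a simple aggregation once incidents are labeled. The sketch below assumes you keep a list of (root cause, downtime minutes) pairs from your incident log; the sample incidents are invented.

```python
from collections import Counter
from typing import List, Tuple


def downtime_share_by_cause(
    incidents: List[Tuple[str, float]],
) -> List[Tuple[str, float, float]]:
    """Return (cause, minutes, share of total), largest contributor first."""
    totals: Counter = Counter()
    for cause, minutes in incidents:
        totals[cause] += minutes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [(cause, minutes, minutes / grand_total)
            for cause, minutes in totals.most_common()]


# Invented sample quarter.
incidents = [
    ("database deploy", 12),
    ("third-party dependency", 5),
    ("database deploy", 6),
    ("deploy regression", 7),
]
breakdown = downtime_share_by_cause(incidents)
```

In this sample, database deploys account for 18 of 30 minutes, so that is where next quarter's reliability work would go first.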

Pair this with the cost of downtime framework and the conversation about reliability investment becomes quantitative. "We lost X minutes of budget last quarter, the cost was Y, and the proposed investment of Z would have prevented W of those minutes" is a much better business case than "we should invest more in reliability."

Common compliance mistakes

Only looking at the data at the end of the period

By the time the quarterly report is generated, the quarter is over. If you're under target, there's nothing to do but write the credit. Compliance has to happen in real time — daily burn rate, weekly review, monthly trend.

Committing to four nines on three-nines infrastructure

The most common contract negotiation mistake. Procurement asks for 99.99% and engineering doesn't push back because "we'll figure it out." You won't. Look at historical data before agreeing to a target.

No maintenance window discipline

Either engineers forget to create them, or "ongoing maintenance" windows stay open for days. The first burns SLA budget on downtime that contractually shouldn't count; the second hides real downtime from the calculation. Automate maintenance window creation in the deploy pipeline; set a hard end time on every window.

Manual credit calculation

Six-week-late credit memos with arithmetic errors damage the customer relationship more than the missed SLA itself. Automate it and apply it on the next invoice.

Frequently asked questions

What's the difference between SLA reporting and SLA compliance?

Reporting is the document you produce at the end of the period. Compliance is the operational practice of using the same data in real time to make sure that document says what you want. We covered the reporting side in how to use uptime data for SLA reporting — this post is the operational companion.

How do I calculate an error budget?

Budget = (1 - SLA target) * period duration. 99.9% monthly = 0.001 * 43,200 minutes = 43.2 minutes, or about 43. Burn rate = (downtime so far / period elapsed) / (budget / period total). Greater than 1.0 means you're spending faster than sustainable.

Can I track different SLAs for different customers?

Yes — tag monitors per customer (or per shared component), pull check results filtered by tag, and apply each customer's SLA period and target. Same data, multiple calculations.

Should scheduled maintenance count against my SLA?

Per most SLA contracts, no — but only if you actually create the maintenance window in CronAlert before the maintenance happens, and only if you announced it to customers in advance. Skip either step and the time counts as an outage.

What's the right SLA target for B2B SaaS?

99.9% is the entry-level commitment. 99.95% is common for mid-market. 99.99% is enterprise territory and requires real engineering investment. Don't commit to a number historical data suggests you can't hit.

Get started

The first step toward compliance is to start measuring burn rate. Create a free CronAlert account, add your production monitors, set up maintenance windows for any planned deploys, and write a small script that pulls daily uptime from the API and posts burn rate to Slack each morning. Two weeks of that data will tell you more about your real-world SLA risk than any contract negotiation will.