A health check endpoint is the smallest, simplest piece of an application that does the most work for ops. It's the one URL your monitoring tool hits, your load balancer asks before sending traffic, your orchestrator polls before declaring the pod ready, and the on-call engineer curls when something feels off. Get it right and the rest of your observability story flows. Get it wrong and you're either blind to outages or restarting healthy services every time DNS hiccups.

This post is the long-form playbook: what a health endpoint should do, what it shouldn't, the conventions (/healthz, /livez, /readyz), the trade-offs between shallow and deep checks, code examples in four languages, and how external uptime monitoring fits alongside Kubernetes probes.

What a health check endpoint is for

A health endpoint exists to answer a single question, fast and unambiguously: "is this thing working?" The catch is that "working" means different things to different callers. The same endpoint is asked by:

  • External uptime monitors ("is the service reachable from outside?"). The answer determines whether to alert the on-call. Uptime monitoring setup covers this upstream context.
  • Load balancers ("should I send traffic to this instance?"). The answer determines whether to keep the instance in rotation.
  • Orchestrators like Kubernetes ("is this pod alive? is it ready for traffic?"). The answer determines whether to restart the pod or hold traffic until it's ready.
  • On-call engineers ("which thing is broken?"). They curl the endpoint when investigating an alert.

The mistake most teams make is having one endpoint try to answer all four questions, then being surprised when load balancers pull instances out of rotation during a downstream dependency hiccup or when liveness probes restart pods that are perfectly healthy. The right answer is to have purpose-built endpoints, each answering one question. The Kubernetes-native convention is the most useful default.

The /healthz, /livez, /readyz convention

The /healthz family of endpoints was popularized by Kubernetes and has been adopted broadly. They serve different purposes:

/healthz — the catch-all

A simple "is this process alive and broadly working" endpoint. Returns 200 if the application is running and can respond to HTTP requests, 5xx if something is fundamentally broken. This is the right endpoint for external uptime monitoring — the question external monitors care about is whether the service is reachable and responsive, not whether every dependency is healthy.

/livez — liveness probe

"Restart me if I'm not responding." This is what Kubernetes liveness probes hit. It should fail only when a restart would fix the problem — typically when the process is deadlocked, has an exhausted thread pool, or is stuck in some unrecoverable state.

The cardinal sin of liveness probes is making them check downstream dependencies. If the database is down and your liveness probe checks the database, Kubernetes restarts every pod, repeatedly, accomplishing nothing except adding restart churn to an already broken system. Liveness should be shallow.

/readyz — readiness probe

"I'm running but should not receive traffic right now." This is what load balancers and Kubernetes readiness probes hit. It returns failure during startup (warming caches, loading models, opening connection pools), during shutdown (draining in-flight requests), or when a critical dependency is unavailable.

Readiness can be deep. If the database is down, marking the pod as not ready pulls it out of rotation (without restarting it) so traffic flows only to healthy instances. When the database recovers, the pod becomes ready again automatically.

/startupz — startup probe (Kubernetes-specific)

A newer addition for slow-starting applications. Kubernetes uses it to hold off liveness and readiness probing until the application has finished starting. Most applications don't need a separate startup probe; a generous initialDelaySeconds on the liveness probe covers the same case.

Shallow vs deep checks

The choice between shallow and deep checks is the single most consequential decision in health check design.

Shallow check

A shallow check returns 200 if the process is running and the HTTP stack is responsive. It does not call the database, it does not check Redis, it does not contact downstream services. Reading the request and writing 200 OK is the entire operation.

Shallow checks are cheap (no I/O), reliable (no false negatives from dependency hiccups), and predictable (no variable latency). They are the right choice for liveness probes and for the public health endpoint that an external monitor hits.

Deep check

A deep check verifies the application's critical dependencies are reachable: a database query, a Redis ping, a call to a downstream service health endpoint. It tells you not just "is the application running" but "is it actually serving requests successfully right now."

Deep checks are more useful as diagnostic signals but more expensive (real I/O on every check) and less reliable (a 200ms blip in one dependency causes the whole health endpoint to fail). They are the right choice for readiness probes and for an internal "deep diagnostics" endpoint that an on-call engineer can curl.

The hybrid pattern

The right architecture for most teams is to expose both:

  • /healthz — shallow, public, hit by external monitors and load balancers.
  • /livez — shallow, internal-only, hit by Kubernetes liveness probes.
  • /readyz — deep, internal-only, hit by Kubernetes readiness probes.
  • /healthz/deep or /diagnostics — deep, optionally authenticated, available for on-call investigation.

The shallow endpoints stay up even when downstream dependencies are flaky, so external monitors and the load balancer don't pull the instance out of rotation for transient issues. The deep endpoints fail when dependencies fail, which is what a Kubernetes readiness probe actually wants. The diagnostics endpoint gives an on-call engineer real information without exposing detailed internals to the world.
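
As a sketch of that wiring in Express, assuming hypothetical shallowCheck, deepCheck, and requireAuth helpers (the full handler bodies appear in the code examples further down):

// Shallow, public: external monitors and the load balancer hit these.
app.get("/healthz", (_req, res) => res.status(200).json(shallowCheck()));
app.get("/livez", (_req, res) => res.status(200).json(shallowCheck()));

// Deep, internal-only: the Kubernetes readiness probe hits this.
app.get("/readyz", async (_req, res) => {
  const result = await deepCheck();
  res.status(result.status === "ok" ? 200 : 503).json(result);
});

// Deep and detailed, gated behind auth for on-call investigation.
app.get("/healthz/deep", requireAuth, async (_req, res) => {
  res.status(200).json(await deepCheck());
});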

For the common related pattern of monitoring databases specifically, see how to monitor your database health endpoint.

What to include in a health response body

A good health response includes enough information that the operator can decide what to do without rerunning the check. The minimum useful body is:

{
  "status": "ok",
  "version": "2.4.7",
  "uptime_seconds": 18234
}

A deep check response should expand to include the status of each dependency, with one of three states per dependency: "ok", "degraded", or "down":

{
  "status": "degraded",
  "version": "2.4.7",
  "checks": {
    "database": { "status": "ok", "latency_ms": 8 },
    "redis":    { "status": "ok", "latency_ms": 3 },
    "stripe":   { "status": "down", "error": "timeout after 500ms" }
  }
}

Three states is more useful than two. "Degraded" lets you mark the instance as serving traffic with a warning rather than pulling it out of rotation. A monitor or load balancer can decide what to do with that signal independently — most uptime monitors should treat "degraded" as a warning, not an outage.
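
A sketch of how an application might roll the per-dependency results up into that overall status; the critical flag is an illustrative assumption, not part of the response format above:

// Dependencies marked critical take the instance down; the rest only degrade it.
function overallStatus(checks) {
  const states = Object.values(checks);
  if (states.some((c) => c.status === "down" && c.critical)) return "down";
  if (states.some((c) => c.status !== "ok")) return "degraded";
  return "ok";
}

// With the example above: the database and Redis are ok, Stripe is down but
// non-critical, so the instance reports "degraded" rather than "down".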

What NOT to include

A few things tend to creep into health endpoints that shouldn't be there:

  • Stack traces. A health endpoint that exposes Python or Node.js stack traces in error responses is leaking implementation details to anyone who can curl it. Catch errors and return a clean status; log the trace internally.
  • Secrets, environment variables, or credentials. Some debug endpoints dump process.env or framework configuration. Never on a publicly reachable endpoint, and ideally never on any endpoint that can be reached from outside the cluster.
  • Detailed user-data counts. "Total users: 142,953" looks fine until a competitor scrapes it for growth signals. Aggregate stats belong in private dashboards, not public health responses.
  • Slow operations. A health check that takes 3 seconds defeats the purpose. Time-bound every operation aggressively (500ms cap is reasonable) and prefer parallel calls over serial ones.
  • Side effects. A health check should not write to the database, increment counters, or trigger any state change. It is read-only.

Code examples

Node.js (Express)

app.get("/healthz", (_req, res) => {
  res.status(200).json({
    status: "ok",
    version: process.env.APP_VERSION,
    uptime_seconds: Math.floor(process.uptime()),
  });
});

app.get("/readyz", async (_req, res) => {
  const checks = await Promise.allSettled([
    withTimeout(db.raw("SELECT 1"), 500),
    withTimeout(redis.ping(), 500),
  ]);
  const dbOk = checks[0].status === "fulfilled";
  const redisOk = checks[1].status === "fulfilled";
  const ok = dbOk && redisOk;
  res.status(ok ? 200 : 503).json({
    status: ok ? "ok" : "down",
    checks: {
      database: { status: dbOk ? "ok" : "down" },
      redis:    { status: redisOk ? "ok" : "down" },
    },
  });
});

The withTimeout helper wraps each dependency call in a 500ms cap. The Promise.allSettled wrapper runs them in parallel without short-circuiting on the first failure, which is what you want for a diagnostic response.
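
withTimeout isn't a Node or Express built-in; a minimal sketch, treating the dependency call as an ordinary promise, might look like this:

function withTimeout(promise, ms) {
  // Reject if the dependency call doesn't settle within `ms` milliseconds.
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timeout after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}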

Python (FastAPI)

from fastapi import FastAPI, Response
import asyncio
import os
import time

app = FastAPI()
APP_VERSION = os.environ.get("APP_VERSION", "unknown")
START_TIME = time.time()

@app.get("/healthz")
async def healthz():
    return {
        "status": "ok",
        "version": APP_VERSION,
        "uptime_seconds": int(time.time() - START_TIME),
    }

@app.get("/readyz")
async def readyz(response: Response):
    db_task = asyncio.create_task(check_db())
    redis_task = asyncio.create_task(check_redis())
    db_ok, redis_ok = await asyncio.gather(db_task, redis_task)
    ok = db_ok and redis_ok
    response.status_code = 200 if ok else 503
    return {
        "status": "ok" if ok else "down",
        "checks": {
            "database": {"status": "ok" if db_ok else "down"},
            "redis":    {"status": "ok" if redis_ok else "down"},
        },
    }

async def check_db():
    try:
        await asyncio.wait_for(db.execute("SELECT 1"), timeout=0.5)
        return True
    except Exception:
        return False

async def check_redis():
    # Same pattern as check_db: cap the ping at 500ms, treat any failure as down.
    try:
        await asyncio.wait_for(redis.ping(), timeout=0.5)
        return True
    except Exception:
        return False

Go

func healthzHandler(w http.ResponseWriter, _ *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(map[string]any{
        "status":         "ok",
        "version":        version,
        "uptime_seconds": int(time.Since(startTime).Seconds()),
    })
}

func readyzHandler(w http.ResponseWriter, _ *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
    defer cancel()

    var dbOk, redisOk bool
    var wg sync.WaitGroup
    wg.Add(2)
    go func() { defer wg.Done(); dbOk = db.PingContext(ctx) == nil }()
    go func() { defer wg.Done(); redisOk = redis.Ping(ctx).Err() == nil }()
    wg.Wait()

    ok := dbOk && redisOk
    if !ok {
        w.WriteHeader(http.StatusServiceUnavailable)
    }
    json.NewEncoder(w).Encode(map[string]any{
        "status": map[bool]string{true: "ok", false: "down"}[ok],
        "checks": map[string]any{
            "database": map[string]string{"status": map[bool]string{true: "ok", false: "down"}[dbOk]},
            "redis":    map[string]string{"status": map[bool]string{true: "ok", false: "down"}[redisOk]},
        },
    })
}

Rust (Axum)

async fn healthz() -> Json<serde_json::Value> {
    Json(json!({
        "status": "ok",
        "version": env!("CARGO_PKG_VERSION"),
        "uptime_seconds": START_TIME.elapsed().as_secs(),
    }))
}

async fn readyz(State(state): State<AppState>) -> (StatusCode, Json<serde_json::Value>) {
    let timeout = Duration::from_millis(500);
    // Run both checks concurrently, each capped at 500ms.
    let (db_res, redis_res) = tokio::join!(
        tokio::time::timeout(timeout, sqlx::query("SELECT 1").execute(&state.db)),
        tokio::time::timeout(timeout, state.redis.ping()),
    );
    // Ok(Ok(_)) means the call finished inside the timeout and succeeded.
    let db_ok = matches!(db_res, Ok(Ok(_)));
    let redis_ok = matches!(redis_res, Ok(Ok(_)));
    let ok = db_ok && redis_ok;
    let status = if ok { StatusCode::OK } else { StatusCode::SERVICE_UNAVAILABLE };
    (status, Json(json!({
        "status": if ok { "ok" } else { "down" },
        "checks": {
            "database": { "status": if db_ok { "ok" } else { "down" } },
            "redis":    { "status": if redis_ok { "ok" } else { "down" } },
        },
    })))
}

Kubernetes probe configuration

Once you have separate liveness and readiness endpoints, the Kubernetes deployment configuration looks like this:

livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 1
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 2

A few notes on the parameters:

  • failureThreshold: 3 on liveness — three consecutive failures before restart. One failure shouldn't trigger a restart; transient blips happen.
  • periodSeconds: 5 on readiness — fast feedback so traffic re-routes quickly when an instance recovers from a dependency outage.
  • timeoutSeconds: 1 on liveness — liveness checks should be cheap. If yours takes more than a second, it's doing too much.
  • initialDelaySeconds — give the application time to start before probing. For slow-starting Java applications, set this generously; for Go and Rust services it can be small.
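
If initialDelaySeconds alone can't cover a slow starter, the startup probe mentioned earlier can gate the other probes instead; the values below are illustrative, not a recommendation:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 24   # allows up to 2 minutes (24 × 5s) for startup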

For Kubernetes-specific monitoring patterns beyond the probes themselves, see Kubernetes uptime monitoring.

External uptime monitoring vs Kubernetes probes

Kubernetes probes and external uptime checks complement each other; neither replaces the other:

  • Probes tell Kubernetes whether to restart a pod or pull it from rotation. They run inside the cluster and have no visibility into how the service looks from the outside.
  • External uptime checks tell you whether users can actually reach the service. They run from outside the cluster and have no visibility into pod-level state.

The failure modes are different. A misconfigured ingress can leave every pod healthy from Kubernetes' perspective while the public endpoint returns 502. A broken DNS record can make the service unreachable from the internet without any probe noticing. A regional Cloudflare outage can take down the public endpoint while the cluster is fine. Probes won't catch any of these — only an external monitor will.

The right setup is to expose /livez and /readyz for in-cluster probes and a public /healthz (or any monitored endpoint) for external monitoring. The external monitor catches CDN, DNS, and ingress problems; the probes catch pod-level issues. Monitoring microservices goes deeper on the architecture.

Common health-check anti-patterns

The everything endpoint

A single /health that's used as liveness, readiness, external monitoring, and on-call diagnostics. Inevitably gets configured as a deep check, which means a downstream blip restarts pods, drops them from rotation, and pages the on-call engineer simultaneously. Split it.

Cascading deep checks

Service A's deep health check calls Service B's deep health check, which calls Service C's deep health check. A slow database in service C times out the entire request chain, and every service ends up reporting itself as down. Each service's deep check should verify only its direct dependencies, and never another service's deep endpoint.

Health checks that mutate state

"Let's increment a counter so we can graph health-check rate." This couples observation to side effects, breaks idempotency, and produces silent failures when the counter store is the broken dependency. Health checks are read-only.

Pure-200 with no body

A response of just 200 OK with no body works mechanically but tells the on-call nothing during an investigation. A small JSON body with version and dependency status pays for itself the first time someone curls the endpoint at 3am.

Authentication on the public health endpoint

Adding auth to a public health endpoint adds a layer that can break independently of the application. A simple /healthz that returns {"status": "ok", "version": "..."} doesn't need auth and shouldn't have it. Gate detailed diagnostic endpoints behind auth, but keep the basic one open.

Frequently asked questions

What's the difference between /healthz, /livez, and /readyz?

/healthz is the catch-all "is this thing up" endpoint, used by external monitors. /livez is shallow and signals "restart me if I'm broken" to Kubernetes. /readyz is deep and signals "should I be in the load balancer rotation right now." Externally monitor /healthz; let probes handle /livez and /readyz.

Should a health check verify database connectivity?

Use a deep check (with database verification) for readiness probes and a shallow check (no dependencies) for liveness. A liveness probe that checks the database will restart every pod when the database hiccups, which is rarely the right behavior.

Should health check endpoints require authentication?

Generally no for the basic endpoint. Keep the body small and free of sensitive data so auth isn't needed. Gate detailed diagnostic endpoints behind auth.

How often should an external monitor hit a health endpoint?

Every minute for production user-facing services. Every 5 to 15 minutes for staging and internal tools. Don't go below a 1-minute interval unless your service is genuinely latency-critical — it adds load and rarely improves real-world detection.

Should a deep health check call all dependencies in series or in parallel?

Parallel, with a strict per-dependency timeout (500ms is reasonable). Serial calls compound latency; parallel with timeouts caps the total response time at the slowest dependency.

Putting it together with external monitoring

A complete health check architecture looks like: shallow /livez for Kubernetes liveness, deep /readyz for readiness probes, public /healthz for external uptime monitors and load balancers, and an authenticated /diagnostics for on-call investigation. Implement them in parallel, time-bound them aggressively, and keep the bodies small and useful.

On the external side, point an uptime monitor at /healthz with a 1-minute interval and multi-region quorum. Create a CronAlert account, add a monitor with a keyword check that confirms "status":"ok" appears in the response body, and route the alert to your on-call channel.

For related playbooks, see API endpoint monitoring, database health endpoints, microservices uptime monitoring, and Kubernetes uptime monitoring.