There is no outage more total than a login outage. When your database is slow, some pages still work. When a cache degrades, the site limps along. But when authentication breaks, everyone is locked out at once — existing users can't refresh expired sessions, new users can't sign in, your API clients can't get tokens. And the cruel part is that your homepage keeps returning 200 the entire time, so a normal uptime check sees nothing wrong. The marketing site is fine. It's the front door's lock that's jammed.

Authentication is also one of the most dependency-heavy parts of a modern app. Even a "simple" OAuth login touches a discovery document, an authorization endpoint, a token endpoint, a JWKS key set, your own callback handler, and the TLS certificates on all of them — often spread across your infrastructure and a third-party identity provider you don't control. This guide covers what to monitor across that chain and how to watch it with CronAlert so a broken login pages you in seconds, not after support tickets pile up.

Why the homepage check is blind to this

Your login flow runs on a completely different path than your landing page. A homepage check exercises one route and confirms the web server is serving HTML. It never performs the OAuth redirect, never calls the identity provider's token endpoint, never fetches the JWKS keys, never verifies a signature. So every one of these can fail while the homepage stays green:

  • The identity provider (Auth0, Okta, Cognito, Google, Entra ID) has an outage.
  • Your OAuth client secret expired — many providers issue secrets with an expiry, and they lapse silently.
  • The provider rotated its signing keys and your app cached a stale JWKS, so every token now fails verification.
  • A TLS certificate expired on your auth domain, and clients refuse the handshake.
  • A redirect URI was misconfigured during a deploy, so the callback 400s.

This is the same blind-spot logic that makes a database health endpoint or cache health check necessary: you have to monitor the thing that's actually at risk, not just the page that is easiest to check. For auth, "the thing at risk" is a short list of well-defined endpoints.

The endpoints to monitor

1. The OpenID Connect discovery document

Almost every OIDC provider publishes a discovery document at /.well-known/openid-configuration. It's the map of the whole flow — it names the authorization endpoint, token endpoint, JWKS URI, and supported scopes. If it's unreachable, libraries that auto-discover configuration can't even start the flow.

Monitor it for a 200, and add keyword monitoring to assert the body contains "jwks_uri" or "issuer" — proof that you got real JSON, not an error page or an empty 200 from a misbehaving proxy. Examples of public discovery URLs you can point a monitor at:

https://accounts.google.com/.well-known/openid-configuration
https://YOUR_TENANT.auth0.com/.well-known/openid-configuration
https://YOUR_ORG.okta.com/.well-known/openid-configuration
https://login.microsoftonline.com/TENANT_ID/v2.0/.well-known/openid-configuration
https://cognito-idp.REGION.amazonaws.com/POOL_ID/.well-known/openid-configuration
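As a rough illustration of what the keyword assertion is standing in for, here is a minimal Python sketch that checks a discovery response body for the fields named above. The function names and the choice to treat all four fields as required are assumptions of this sketch, not part of any provider's or CronAlert's API.

```python
import json
import urllib.request

# Fields named in the text above; treating all four as required is an
# assumption of this sketch.
REQUIRED_FIELDS = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")


def missing_discovery_fields(body: str) -> list[str]:
    """Return the required fields absent from a discovery response body.

    A body that is not a JSON object (an HTML error page, an empty 200)
    is reported as missing every field.
    """
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return list(REQUIRED_FIELDS)
    if not isinstance(doc, dict):
        return list(REQUIRED_FIELDS)
    return [name for name in REQUIRED_FIELDS if not doc.get(name)]


def fetch_discovery(url: str, timeout: float = 10.0) -> str:
    """Fetch the document; raises on non-2xx status, timeout, or TLS failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling missing_discovery_fields(fetch_discovery(url)) on one of the URLs above should return an empty list when the provider is healthy; anything else is worth an alert.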

2. The JWKS endpoint (signing keys)

The JWKS (JSON Web Key Set) endpoint publishes the public keys your app uses to verify token signatures. This is the most dangerous single point in the chain: if it's down, empty, or returns garbage, your app can no longer fetch the keys it needs, and once any cached copy expires or the provider rotates keys, every token fails verification. Everyone is locked out at once, including users who were happily signed in a moment ago.

Monitor the JWKS URI (named in the discovery document) for a 200, and use keyword monitoring to confirm the response contains at least one real key. Matching on "keys" alone is not enough, because an empty {"keys":[]} contains it too; match on "kty", which every key entry must carry, so an empty set or an error page fails the check. Treat a JWKS failure as critical: it deserves a page, not an email.
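To make "a real key set, not an empty one" concrete, the following Python sketch validates a JWKS response body. The function name and the "<no kid>" placeholder are hypothetical; the only protocol facts it relies on are that a JWKS is a JSON object with a "keys" array and that each key carries a "kty" member.

```python
import json


def usable_key_ids(body: str) -> list[str]:
    """Return the `kid` values of keys that look usable for signature checks.

    Raises ValueError when the body is not JSON, has no "keys" array,
    or the array holds no entry with a key type ("kty").
    """
    try:
        doc = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError("JWKS body is not JSON") from exc
    keys = doc.get("keys") if isinstance(doc, dict) else None
    if not isinstance(keys, list):
        raise ValueError('JWKS body has no "keys" array')
    usable = [k for k in keys if isinstance(k, dict) and k.get("kty")]
    if not usable:
        raise ValueError("JWKS key set is empty")
    # "kid" is optional in a JWK, so fall back to a placeholder label.
    return [str(k.get("kid", "<no kid>")) for k in usable]
```

A check built this way fails on an empty {"keys":[]} and on an HTML error page alike, which is the behavior the keyword match is meant to approximate.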

3. The authorization and token endpoints

The authorization endpoint is where users are redirected to log in; the token endpoint is where your backend exchanges the code for tokens. These are harder to synthetically exercise end-to-end because they expect specific parameters and credentials, but you can still monitor their availability. A healthy authorization endpoint answers a bare probe with a 200, a redirect (3xx), or, on some providers, a 400 about missing parameters. A healthy token endpoint typically returns a structured JSON error (often HTTP 400 with "error") rather than a 502/503/timeout when your probe omits credentials. Token requests are POSTs, so a GET probe may get a 404 or 405 instead; check what your provider returns to your probe before you set the expected status. Configure the monitor accordingly so a healthy-but-rejecting endpoint doesn't read as down.
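The "healthy but rejecting" distinction for the token endpoint can be written down as a small classifier. This is a sketch under assumptions: the function name is hypothetical, and treating only 400/401 with a JSON "error" member as healthy is a deliberately strict choice that you may need to loosen for your provider.

```python
import json
from typing import Optional


def token_endpoint_is_healthy(status: Optional[int], body: str) -> bool:
    """Classify one credential-less probe of a token endpoint.

    `status` is None when the probe timed out or could not connect.
    """
    if status is None or status >= 500:
        return False
    if status in (400, 401):
        # Healthy-but-rejecting: expect an OAuth error object such as
        # {"error": "invalid_request"} rather than a proxy's HTML page.
        try:
            doc = json.loads(body)
        except json.JSONDecodeError:
            return False
        return isinstance(doc, dict) and "error" in doc
    # Any other status is unexpected for a credential-less probe;
    # treating it as unhealthy is a choice of this sketch.
    return False
```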

4. Your own callback / redirect URI

The OAuth callback handler in your app is the piece most likely to break on a deploy — a changed route, a missing environment variable, a misconfigured redirect URI in the provider's dashboard. Monitor the callback path for the response it gives to an unauthenticated probe (commonly a redirect to login or a 400 about a missing code), and alert if it starts returning 500s. This is closely related to monitoring any inbound receiver endpoint — it's a route in your app that an external party calls back into.
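If you want to find out what your callback returns to an unauthenticated probe before configuring a monitor, note that many HTTP clients follow redirects silently and would report the login page's 200 instead of the callback's own 3xx. The sketch below uses Python's http.client, which does not follow redirects; the function names and the default expected statuses (302 and 400) are assumptions to adjust to your app.

```python
import http.client
from urllib.parse import urlsplit


def probe_status(url: str, timeout: float = 10.0) -> int:
    """Return the raw status of a GET, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def callback_is_healthy(status: int, expected: tuple[int, ...] = (302, 400)) -> bool:
    """True when the callback gave one of the statuses a healthy probe gets."""
    return status in expected
```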

5. TLS certificate expiry

Auth endpoints fail hard the instant a certificate expires: OAuth clients refuse to complete a TLS handshake on an invalid cert, and there is no degraded mode — it's binary. CronAlert performs SSL certificate monitoring on every HTTPS monitor and warns you days ahead of expiry. This matters most for self-hosted identity providers (Keycloak, Authentik, Ory) and custom auth domains where you own the certificate, rather than the provider's own domain.
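For a self-hosted IdP where you own the certificate, the underlying check is simple enough to sketch with Python's standard library. This is an illustration of the idea, not a description of how CronAlert implements it; the function names are hypothetical.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days from `now` (epoch seconds) until a certificate's notAfter time.

    `not_after` uses the format the ssl module reports,
    e.g. "Jan  5 09:34:43 2018 GMT". A negative result means expired.
    """
    current = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect with certificate verification and return the cert's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Because fetch_not_after verifies the certificate, it raises an SSL error once the certificate has already expired, which mirrors the binary failure described above. The useful signal is days_until_expiry dropping below a threshold you choose while the handshake still succeeds.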

Monitoring a third-party IdP you don't control

If you use Auth0, Okta, Cognito, Google, or Entra ID, you can't fix their outage — but detecting it instantly is still worth a lot. The moment their discovery or JWKS endpoint starts failing, you want to:

  • Communicate. Post to your status page and tell users "login is temporarily unavailable due to an issue at our identity provider" before they flood support guessing it's your fault.
  • Fail over, if you can. Some apps support a backup login method (email magic-link, a secondary provider). An immediate alert is what triggers that switch.
  • Correlate. When you also monitor the provider's endpoints directly, you can tell the difference between "our token verification is broken" and "the provider is down" — which routes the incident to the right team. This is exactly the third-party dependency monitoring pattern applied to identity.

Subscribing to the provider's own status page is complementary, but it's not a substitute: status pages are updated by humans, often minutes (or longer) after an incident starts, and they reflect the provider's global view, not your specific tenant or region. Your own synthetic check on their endpoints sees the problem the instant it affects you.
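The "Correlate" step above amounts to a small decision rule over check results. The sketch below is one way to express it; the function name, the three inputs, and the returned labels are all hypothetical.

```python
def classify_auth_incident(
    provider_discovery_ok: bool,
    provider_jwks_ok: bool,
    own_callback_ok: bool,
) -> str:
    """Decide who owns an auth incident from three synthetic check results."""
    if not (provider_discovery_ok and provider_jwks_ok):
        # The provider's public endpoints are failing: communicate and
        # fail over rather than debugging your own code.
        return "provider-outage"
    if not own_callback_ok:
        # The provider looks fine, so the fault is on your side of the flow.
        return "own-app-fault"
    return "healthy"
```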

Setting it up in CronAlert

  1. Create a monitor for the discovery document. Expect 200, and add keyword monitoring (Pro) for "jwks_uri" so an empty or error response trips the alert.
  2. Create a monitor for the JWKS endpoint. Expect 200, keyword-match "kty" (present in every key entry, so an empty {"keys":[]} fails the match), and route this one to your highest-severity channel, because a JWKS failure can lock everyone out.
  3. Create monitors for the token endpoint and your callback URI. Set the expected status on each to whatever a healthy unauthenticated probe returns (often a 400 for the token endpoint, a 3xx redirect or a 400 for the callback), and alert on 5xx.
  4. Let SSL monitoring run on every HTTPS auth monitor so certificate expiry is caught days in advance.
  5. Use a 1-minute interval on a paid plan for the JWKS and token endpoints — auth is the one outage where every minute of detection delay multiplies across your entire user base.
  6. Route by severity. JWKS and token-endpoint failures should page (PagerDuty / Opsgenie); a discovery-document blip can be Slack. See incident response workflows.

Avoiding false positives on auth endpoints

Auth endpoints are unusually prone to false positives because a healthy endpoint often returns a non-200 status to an unauthenticated probe by design. Two safeguards:

  • Set the right expected status. A token endpoint returning HTTP 400 with {"error":"invalid_request"} is healthy — it's correctly rejecting a malformed request. Configure that as the expected response, and alert only on 5xx, timeouts, or connection failures. Distinguishing these is the same skill covered in HTTP status codes explained.
  • Use consecutive-check verification. Identity providers do brief maintenance and key rotations. A single failed check shouldn't page; two or three in a row should. This is the same dual-region / consecutive-check logic that keeps false positives down generally.
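Consecutive-check verification is easy to reason about as a small state machine. The class below is a minimal sketch of the idea, with a hypothetical name and a default threshold of three; it does not describe how any particular monitoring product implements it.

```python
class ConsecutiveFailureGate:
    """Fires an alert only after `threshold` failed checks in a row."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self._streak = 0

    def record(self, check_passed: bool) -> bool:
        """Record one result; return True exactly when an alert should fire."""
        if check_passed:
            self._streak = 0
            return False
        self._streak += 1
        # Fire once when the streak reaches the threshold, not on every
        # failure after it, so one incident produces one page.
        return self._streak == self.threshold
```

A single failed check followed by a success resets the streak, so a brief maintenance blip at the provider never reaches the pager.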

Common pitfalls

  • Monitoring only the homepage. The headline failure mode — login is down, homepage is green, and you find out from Twitter.
  • Forgetting the client secret expiry. Several providers issue OAuth client secrets that expire. There's no endpoint that turns red for this; put a calendar reminder or a heartbeat check on the renewal so it doesn't lapse silently.
  • Caching JWKS forever. If your app caches signing keys with no refresh, a provider key rotation breaks verification. Monitor the JWKS endpoint and make sure your app refreshes on an unknown kid.
  • Treating the IdP status page as monitoring. It lags real incidents and reflects a global view, not your tenant. Run your own synthetic check.
  • Ignoring certificate expiry on self-hosted IdPs. If you run Keycloak or Authentik, the cert is yours to renew. An expired cert is an instant, total login outage.
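For the "caching JWKS forever" pitfall, refreshing on an unknown kid can look roughly like the sketch below. The class name, the injected fetch_jwks callable, and the 60-second minimum refresh interval are assumptions for illustration; a real app would usually rely on its JWT library's key-set handling where one is available.

```python
import time
from typing import Callable, Dict, Optional


class JwksCache:
    """Caches signing keys by `kid` and refetches when an unknown `kid` appears."""

    def __init__(
        self,
        fetch_jwks: Callable[[], dict],
        min_refresh_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_jwks = fetch_jwks  # returns the parsed JWKS document
        self._min_refresh_seconds = min_refresh_seconds
        self._clock = clock
        self._keys: Dict[str, dict] = {}
        self._last_refresh: Optional[float] = None

    def _refresh(self) -> None:
        jwks = self._fetch_jwks()
        # Replace the whole map so keys the provider retired are dropped too.
        self._keys = {k["kid"]: k for k in jwks.get("keys", []) if "kid" in k}
        self._last_refresh = self._clock()

    def get_key(self, kid: str) -> Optional[dict]:
        if kid in self._keys:
            return self._keys[kid]
        # Unknown kid: the provider may have rotated keys. Refetch, but
        # rate-limit so tokens with bogus kids cannot force a fetch each time.
        now = self._clock()
        if self._last_refresh is None or now - self._last_refresh >= self._min_refresh_seconds:
            self._refresh()
        return self._keys.get(kid)
```

If fetch_jwks raises, the previously cached keys stay in place, so a brief JWKS outage does not by itself invalidate tokens signed with keys the app already holds.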

Where this fits in a broader strategy

Auth monitoring sits alongside your other deep checks. Pair it with database health monitoring (the IdP often needs a database too), third-party dependency monitoring for the rest of your critical vendors, and standard uptime monitoring on user-facing pages. Together they answer the question a homepage check can't: not just "is the site up," but "can a user actually sign in and use it." For a deeper look at what to monitor across a whole product, see uptime monitoring for SaaS.

Frequently asked questions

What should you monitor for OAuth and identity providers?

The OIDC discovery document, the JWKS endpoint, the authorization and token endpoints, your own callback URI, and the TLS certificates on all of them. A failure in any one breaks sign-in while your homepage keeps returning 200.

Why doesn't a normal uptime check catch a login outage?

Because authentication runs on a different path than your marketing site. A homepage check never exercises the OAuth redirect, the token endpoint, or the JWKS fetch — so the IdP can be down, your secret expired, or your keys rotated, and the homepage still returns 200.

How do you monitor a third-party IdP like Auth0, Okta, or Google?

Point a monitor at the provider's public discovery and JWKS endpoints and assert a 200 plus a keyword that shows the body is the real document ("issuer" for discovery, "kty" for the key set). You can't fix their outage, but instant detection lets you communicate, fail over, and route the incident correctly.

How do you monitor JWKS and signing-key rotation?

Monitor the JWKS URI for a 200 and keyword-match a field that appears only inside a key entry, such as "kty", so an empty set or error page trips the alert. An unreachable or empty JWKS endpoint means token verification fails as soon as your app needs to fetch keys, and everyone is locked out, so treat it as critical.

How do you monitor TLS certificate expiry for auth endpoints?

Auth endpoints fail the instant a certificate expires, with no degraded mode. CronAlert's SSL certificate monitoring runs on every HTTPS monitor and alerts days before expiry. This matters most for self-hosted IdPs and custom auth domains where you own the certificate.

Monitor your login flow with CronAlert

Your login is the one outage that affects every user simultaneously and stays invisible to a homepage check. Create a free account (25 monitors, no credit card), add monitors for your discovery document, JWKS endpoint, token endpoint, and callback URI, turn on keyword monitoring to catch empty-but-200 responses, and route the JWKS check to your pager. The next time your identity provider stumbles or your signing keys rotate, you'll know in under a minute — long before the first "I can't log in" ticket.

Related reading: monitoring third-party dependencies like Stripe and Twilio, monitoring inbound webhook receivers, HTTP status codes explained, how to reduce false-positive alerts, and uptime monitoring for SaaS.