GraphQL endpoints fail in ways REST monitors cannot detect. Every successful and unsuccessful GraphQL response returns HTTP 200 OK. Errors live in a JSON array in the response body. A single POST URL exposes hundreds of logical query paths. Monitor a GraphQL API the way you monitor a REST one and you will miss the outages that matter most.
This post walks through what makes GraphQL monitoring different, the query patterns that catch real failures, and the exact configuration for monitoring a GraphQL endpoint with CronAlert.
Why GraphQL monitoring is different
Errors hide behind HTTP 200
A typical GraphQL server returns 200 OK for any request it can parse. If your resolver threw, if a database query timed out, if authorization denied the request, the HTTP status is still 200 and the failure lives in the response body:
```json
{
  "data": null,
  "errors": [
    { "message": "Database connection refused", "path": ["viewer"] }
  ]
}
```
A status-code-only monitor sees the 200 and reports the endpoint as healthy. Your users see a broken dashboard. This is the same class of problem that affects any endpoint that returns 200 with an error body, but GraphQL makes it the norm rather than the exception.
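A body-aware check takes only a few lines in any language. Here is a minimal sketch in Python (the function name is ours, not part of any monitoring product) that treats a non-empty errors array as a failure regardless of status code:

```python
import json

def graphql_check(status_code: int, body: str) -> bool:
    """Return True only if the response is genuinely healthy.

    A GraphQL endpoint can return HTTP 200 with an "errors" array,
    so the body must be inspected, not just the status code.
    """
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    # Any errors array means something broke, even if data is present.
    if payload.get("errors"):
        return False
    return payload.get("data") is not None

# The failing response above passes a status-only check but fails this one:
broken = '{"data": null, "errors": [{"message": "Database connection refused"}]}'
print(graphql_check(200, broken))  # False
```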
One URL, many paths
REST exposes a distinct URL per operation: GET /users/me, POST /checkout, GET /orders. Each URL can have its own monitor. A GraphQL API exposes one URL — typically /graphql — that multiplexes hundreds of queries. A single monitor on the base URL tells you whether the server is reachable. It tells you nothing about whether the queries your users actually run still work.
The answer is to create one monitor per critical query path, not one monitor for the whole endpoint. Pick three to five queries that represent the revenue-critical flows and monitor those individually.
Partial success is common
A GraphQL response can contain both data and errors — one field resolved successfully, another failed. The user sees a page that renders but is missing a widget. From a pure availability standpoint the endpoint is "up"; from a product standpoint something is broken. Treat the presence of an errors array as a failure even when data is non-null.
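A partial response might look like this (illustrative shape; the field names are invented for the example):

```json
{
  "data": { "viewer": { "id": "u_1" }, "recentOrders": null },
  "errors": [
    { "message": "orders service timed out", "path": ["recentOrders"] }
  ]
}
```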
The four monitor patterns that work
1. The minimal health query
The smallest useful GraphQL monitor sends a query that forces the server through its full request pipeline — parsing, validation, execution — without touching any data sources. __typename is built into every GraphQL server:
```
POST /graphql
Content-Type: application/json

{ "query": "{ __typename }" }
```
Expected response: {"data":{"__typename":"Query"}}. If the server is up and the executor works, this query passes. If anything in the middleware chain is broken — auth, rate limiting, logging — this query fails cleanly with a clear error.
This is your "is GraphQL alive" monitor. It should run on a 1-minute interval in production. It does not prove any specific query works, only that the server is parsing and responding.
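For local testing, the same probe can be scripted with nothing but the standard library. A sketch (the endpoint URL is a placeholder for your own /graphql):

```python
import json
import urllib.request

def typename_request(url: str) -> urllib.request.Request:
    """Build the minimal health-check request: POST { __typename }."""
    return urllib.request.Request(
        url,
        data=json.dumps({"query": "{ __typename }"}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def typename_ok(body: bytes) -> bool:
    """A healthy server answers {"data": {"__typename": "Query"}}."""
    payload = json.loads(body)
    data = payload.get("data") or {}
    return data.get("__typename") == "Query"

# Sending the probe (uncomment and point at a real endpoint):
# with urllib.request.urlopen(typename_request("https://api.example.com/graphql"), timeout=5) as r:
#     print(typename_ok(r.read()))
```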
2. A dedicated healthz query
Add a root field that exercises the dependencies your server actually needs. Most GraphQL schemas can add a healthz query in a few lines:
```graphql
type Query {
  healthz: HealthStatus!
}

type HealthStatus {
  ok: Boolean!
  database: String!
  cache: String!
  version: String!
}
```
The resolver pings the database, pings the cache, and returns ok: true only if both succeed. A monitor sends { healthz { ok database cache } } and fails if ok is not true or if the response contains an errors field.
This is essentially a GraphQL-shaped version of a REST health endpoint — the same principles apply, including what not to put in it (skip anything slow, skip anything that calls a third party).
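The resolver itself can stay framework-agnostic. A minimal sketch in Python, where check_database and check_cache are stand-ins for your real connection pings (both are assumptions, not any library's API):

```python
def check_database() -> bool:
    # Placeholder: replace with e.g. a SELECT 1 against your connection pool.
    return True

def check_cache() -> bool:
    # Placeholder: replace with e.g. a Redis PING.
    return True

def resolve_healthz() -> dict:
    """Resolver for the healthz field: ok is true only if every dependency answers."""
    db_ok = check_database()
    cache_ok = check_cache()
    return {
        "ok": db_ok and cache_ok,
        "database": "up" if db_ok else "down",
        "cache": "up" if cache_ok else "down",
        "version": "dev",  # e.g. a build SHA injected at deploy time
    }
```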
3. Per-query monitors for critical paths
The queries your customers actually run are the ones worth monitoring directly. Three to five is usually enough:
- Login / viewer query, e.g. { viewer { id email } }. If this breaks, nobody can use the app.
- Dashboard load. The top-level query the user sees on page load.
- Checkout / order mutation. A read-only version of the real mutation, or a dry-run flag if your server supports it.
- Search / list query. Often has its own backend (Elasticsearch, OpenSearch) that can fail independently.
Use a real test account whose credentials are known-good. Pass the auth token in the Authorization header of the monitor. Validate a known-good field is present in the response body.
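Concretely, the viewer probe is the __typename probe plus an auth header. A sketch (endpoint and token are placeholders; pass the known-good field check as shown):

```python
import json
import urllib.request

def viewer_request(url: str, token: str) -> urllib.request.Request:
    """Build the login/viewer probe: an authenticated POST of { viewer { id email } }."""
    return urllib.request.Request(
        url,
        data=json.dumps({"query": "{ viewer { id email } }"}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # token for a known-good test account
        },
        method="POST",
    )

def viewer_ok(body: bytes) -> bool:
    """Pass only if the known-good field resolved and no errors were reported."""
    payload = json.loads(body)
    if payload.get("errors"):
        return False
    viewer = (payload.get("data") or {}).get("viewer") or {}
    return bool(viewer.get("id"))
```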
4. Introspection in staging only
Introspection (__schema and related meta-fields) is a handy way to detect schema drift — it answers "did my last deploy remove a field that another service depends on?" But introspection is often disabled in production for security reasons, and a monitor that hits it will report the production server as down.
Run introspection monitors on staging or preview environments only. In production, stick to the three patterns above.
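A drift check does not need the full schema dump. A minimal introspection query listing the fields of one type looks like this (illustrative; run against staging only):

```graphql
{
  __type(name: "Query") {
    fields { name }
  }
}
```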
Configuring a GraphQL monitor in CronAlert
CronAlert monitors support POST requests with a custom body, custom headers, and response body validation. The full setup for a GraphQL monitor:
- Create a new monitor in the dashboard. Set the type to HTTP.
- URL: your GraphQL endpoint (e.g., https://api.example.com/graphql).
- Method: POST.
- Headers: Content-Type: application/json. Add Authorization: Bearer <token> if the query requires auth.
- Body: the JSON query payload, e.g., {"query":"{ viewer { id } }"}.
- Keyword monitoring — expected: a field name you know will appear in the response, like "viewer" or "ok":true.
- Keyword monitoring — unwanted: the string "errors". If it appears, the check fails even if the HTTP status is 200.
- Interval: 1 minute on paid plans, 3 minutes on free.
- Alerts: any channel — email, Slack, Discord, PagerDuty, webhook.
For multi-region customers, CronAlert's 5-region probe network sends the same POST from every region. This catches the case where your GraphQL server is up but is unreachable from one geography — a common failure mode for GraphQL gateways that depend on regional caches or CDNs.
What to alert on, what to ignore
Alert fatigue is the real risk when you start monitoring GraphQL queries individually. A noisy monitor on the dashboard query can page you every time one field resolver has a transient hiccup. Some rules that work well in practice:
- Alert on sustained failures, not single ones. CronAlert's consecutive-check verification — enabled by default — filters out single-query timeouts and transient network blips.
- Use response time thresholds, not just 200 vs. non-200. A GraphQL query that normally takes 100ms but is now taking 5 seconds is a real incident even if it eventually returns data.
- Treat the errors array as a failure, treat data: null as ambiguous. A non-null errors always means something broke. A null data with specific error codes (e.g., UNAUTHORIZED) may be expected for an unauthenticated probe.
- Keep the query small. Don't monitor the full dashboard query. Monitor a stripped-down version that exercises the same resolvers but returns only a few fields. Large payloads amplify noise and cost.
- Don't monitor every mutation. Monitoring a write that has side effects will pollute your database. Add a read-only healthcheck field for each critical mutation instead.
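The errors-versus-null-data rule above can be folded into a single classification step. A sketch (the UNAUTHORIZED code and the category names are illustrative choices, not a standard):

```python
import json

def classify(body: str, expected_codes=frozenset({"UNAUTHORIZED"})) -> str:
    """Map a GraphQL response body to an alerting decision.

    "ok"       - data present, no errors
    "expected" - only errors whose extensions.code we anticipated
                 (e.g. an unauthenticated probe hitting an auth wall)
    "broken"   - anything else with an errors array, or null data
    """
    payload = json.loads(body)
    errors = payload.get("errors") or []
    if not errors:
        return "ok" if payload.get("data") is not None else "broken"
    codes = {e.get("extensions", {}).get("code") for e in errors}
    return "expected" if codes <= expected_codes else "broken"
```

Only the "broken" category should page anyone; "expected" can be logged or ignored.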
More on this pattern in the alert fatigue guide.
GraphQL federation and gateways
If you run a federated GraphQL setup — Apollo Federation, GraphQL Mesh, a custom gateway — the gateway is a separate failure mode from the subgraph services. The gateway can be healthy while a subgraph is down, returning partial responses with errors. A subgraph can be healthy while the gateway is down, making it appear the whole system is broken.
Monitor the gateway with __typename as the "is the entry point alive" check. Monitor each subgraph with a query that crosses the gateway boundary into that subgraph — your gateway's trace tools will tell you which resolvers touch which subgraph. This combination catches both gateway outages and subgraph outages with two monitors per subgraph instead of monitoring the mesh as a black box.
This is the GraphQL version of the microservices external-monitoring pattern: internal metrics tell you what did go wrong; external monitors tell you whether your users are affected.
Frequently asked questions
Why can't I just monitor GraphQL like a REST endpoint?
GraphQL servers return 200 OK for nearly every request, including failed ones. Errors appear inside the response body in an errors array. A status-code-only monitor will report every GraphQL outage as healthy. You need a POST-based monitor with response body validation.
What's the simplest GraphQL health check query?
{ __typename }. It returns {"data":{"__typename":"Query"}} from a healthy server, touches no data sources, and forces the server through its full request pipeline. For a deeper check, add a dedicated healthz root field that verifies database and cache connectivity.
Should I monitor GraphQL introspection?
Only in non-production environments. Introspection is commonly disabled in production, so monitoring it would flag the server as down. Use __typename or a custom healthz field in production and reserve introspection for staging.
How many GraphQL monitors should I create?
One for the base __typename check, one for the healthz field, and three to five for the specific query paths that matter most to your users. Fewer than that and you miss outages; more and you start generating noise for queries that are not user-visible.
Can CronAlert POST a GraphQL query as a monitor?
Yes. CronAlert monitors support POST with a custom body and headers, so any GraphQL query works. Combined with keyword monitoring — require a specific field, fail if "errors" appears — this catches both transport and application-level failures on any plan. The API endpoint monitoring guide has more on POST monitor configuration.
Get your GraphQL endpoints covered
GraphQL's flexibility is its strength and its monitoring weakness. The same query-anything-from-anywhere design that makes GraphQL great for product teams makes it invisible to monitors that only check HTTP status. Fixing that takes a small shift in how you configure monitors: POST a real query, validate the body, fail on errors.
Create a free CronAlert account and spin up a GraphQL monitor in a minute — 25 monitors, POST support, keyword validation, and email/Slack/Discord alerts included on the free plan. Upgrade to Team for multi-region quorum and 1-minute intervals.