Serverless functions fail differently than traditional servers. There is no process to crash, no CPU graph to spike, no disk to fill. Instead, your Lambda times out silently, your Worker hits a memory limit and returns a generic error, or your function deploys successfully but cannot reach a database it depends on. The failure modes are subtle, and the built-in platform metrics often do not surface them until users complain.

External uptime monitoring solves this by checking your serverless functions the same way your users reach them -- over HTTP, from outside your cloud provider's network. If the function is broken, slow, or returning errors, you find out in minutes instead of hours.

Why serverless monitoring is different

Traditional monitoring assumes a long-running process. You install an agent, it collects metrics, and you get dashboards for CPU, memory, disk, and request rates. Serverless breaks every one of these assumptions:

  • No persistent process. Functions spin up, execute, and terminate. There is nothing to install an agent on. By the time you notice something is wrong, the execution environment that caused the problem is already gone.
  • Cold starts are invisible to users but real. A function that has not run recently takes 1-10 seconds to initialize on the first request. Platform metrics show the cold start, but they do not tell you whether it caused a user-visible timeout or degraded experience.
  • Timeouts are silent. When an AWS Lambda hits its timeout, it simply stops. The response to the caller is a generic gateway timeout. Your CloudWatch logs show "Task timed out after 30 seconds" -- but only if you are watching them.
  • Deploy failures are invisible. You push new code, the deployment succeeds, but the function cannot import a module, connect to a database, or parse its environment variables. The platform marks the deployment as successful because deployment and execution are separate concerns.

External monitoring cuts through all of this by asking one simple question: does the function work right now? Not "did it deploy?" or "is the platform healthy?" -- but "if a user calls this function, do they get the right answer in a reasonable time?"

Health check patterns for serverless

HTTP-triggered functions: expose a health endpoint

If your serverless function serves HTTP traffic -- an API endpoint, a webhook handler, a server-side rendered page -- you can monitor it directly with an HTTP uptime check. The simplest approach is to add a dedicated health route that exercises your function's critical dependencies.

For an AWS Lambda behind API Gateway:

// handler.js
// Assumes `db` and `handleRequest` are initialized in module scope,
// so connections are reused across warm invocations
export async function handler(event) {
  // API Gateway REST payloads expose the path as `event.path`;
  // HTTP API (payload v2) uses `event.rawPath` instead
  if (event.path === '/healthz') {
    try {
      // Verify database connectivity
      await db.query('SELECT 1');
      return {
        statusCode: 200,
        body: JSON.stringify({ status: 'healthy', region: process.env.AWS_REGION }),
      };
    } catch (err) {
      return {
        statusCode: 503,
        body: JSON.stringify({ status: 'unhealthy', error: 'dependency failure' }),
      };
    }
  }

  // Normal request handling
  return handleRequest(event);
}

For a Cloudflare Worker:

export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    if (url.pathname === '/healthz') {
      try {
        // Verify D1 database connectivity
        await env.DB.prepare('SELECT 1').first();
        return Response.json({
          status: 'healthy',
          colo: request.cf?.colo,
        });
      } catch (err) {
        return Response.json(
          { status: 'unhealthy', error: 'dependency failure' },
          { status: 503 }
        );
      }
    }

    // Normal request handling
    return handleRequest(request, env);
  },
};

Both patterns follow the same approach from the database health endpoint guide: verify critical dependencies, return a clear status, and use HTTP status codes that external monitors can key on.

Event-driven functions: use heartbeat monitoring

Many serverless functions are not HTTP-triggered. Queue processors, cron jobs, event handlers, and stream processors run on their own schedule or in response to internal events. You cannot check these with an HTTP request because there is no URL to hit.

The solution is heartbeat monitoring. Your function pings a unique URL on successful completion. If the heartbeat does not arrive within the expected window, an alert fires.

// AWS Lambda processing SQS messages
export async function handler(event) {
  for (const record of event.Records) {
    await processMessage(record.body);
  }

  // Ping heartbeat on success
  await fetch('https://cronalert.com/api/heartbeat/YOUR_MONITOR_ID');
}

This catches every failure mode that matters: the function never runs (scheduler broken), the function runs but crashes, the function runs but takes too long, or the function runs but cannot process messages correctly. If any of these happen, the heartbeat does not arrive, and you get an alert.
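One caveat worth handling: in the example above, if the heartbeat request itself fails -- a transient network error, say -- the unguarded fetch throws and fails the whole invocation, which on SQS means already-processed messages get redelivered. A hedged sketch of a ping helper that never throws and bounds its own latency (the function name and defaults are illustrative):

```javascript
// Ping the heartbeat URL without letting monitoring failures
// fail the job itself: swallow errors and cap the request time.
async function pingHeartbeat(url, timeoutMs = 3000) {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok;
  } catch {
    // Network error or timeout: report failure but do not throw,
    // so message processing is never failed by the monitor
    return false;
  }
}
```

Call it as the last step of the handler; a false return can be logged without re-queueing work that already succeeded.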

Scheduled functions: combine both approaches

Serverless cron jobs (AWS EventBridge rules, Cloudflare Cron Triggers) are particularly prone to silent failure. The schedule might be misconfigured, the function might fail on every invocation, or the trigger might be disabled during a deployment -- and nothing alerts you because there is no one calling the function to notice.

For critical scheduled functions, use both patterns:

  1. Heartbeat monitoring on the function itself -- it pings CronAlert every time it runs successfully.
  2. HTTP monitoring on a health endpoint that reports the last successful run time. If the last run was more than 2x the expected interval ago, the endpoint returns an unhealthy status.
In a Cloudflare Worker, the two pieces look like this:

export default {
  async scheduled(event, env) {
    await runCronJob(env);

    // Record last successful run
    await env.KV.put('last_cron_run', new Date().toISOString());

    // Ping heartbeat
    await fetch('https://cronalert.com/api/heartbeat/YOUR_MONITOR_ID');
  },

  async fetch(request, env) {
    if (new URL(request.url).pathname === '/healthz') {
      const lastRun = await env.KV.get('last_cron_run');
      if (!lastRun) {
        // The cron has never completed successfully
        return Response.json({ status: 'stale', last_run: null }, { status: 503 });
      }

      const msSinceLastRun = Date.now() - new Date(lastRun).getTime();
      const expectedIntervalMs = 5 * 60 * 1000; // 5 minutes
      const healthy = msSinceLastRun < expectedIntervalMs * 2;

      // Return 503 when stale so an external status-code check fails
      return Response.json(
        {
          status: healthy ? 'healthy' : 'stale',
          last_run: lastRun,
          seconds_ago: Math.round(msSinceLastRun / 1000),
        },
        { status: healthy ? 200 : 503 }
      );
    }

    // Fall through for any other path
    return new Response('Not found', { status: 404 });
  },
};

Monitoring cold starts

Cold starts are the serverless-specific failure mode that catches most teams off guard. Your function works perfectly in testing, then takes 8 seconds to respond in production because a real user happened to hit it after a period of inactivity. The request succeeds eventually, but the user has already given up or the calling service has timed out.

What causes cold starts

  • First invocation after idle period. The platform has to provision a new execution environment, download your code, and initialize your runtime. On AWS Lambda, this is typically 100ms to 10 seconds depending on package size and runtime.
  • Scaling events. When traffic spikes, the platform creates new execution environments in parallel. Each new environment experiences a cold start. If you go from 0 to 100 concurrent requests, 100 users experience cold starts simultaneously.
  • Deployment. After a new deploy, existing warm environments may be replaced with new ones running the updated code. This can cause a burst of cold starts across all concurrent executions.

How to monitor cold starts with CronAlert

Regular uptime checks serve double duty for cold starts. A 1-minute check interval means your function is called at least every minute, which keeps it warm and prevents cold starts for real users. This is one of the few cases where the monitoring itself improves the thing being monitored.

To detect cold starts that do happen:

  1. Set a reasonable timeout. If your function normally responds in 200ms but cold starts take 3 seconds, set the monitoring timeout to 10 seconds. This catches genuine timeouts (30+ seconds) without false-alerting on cold starts.
  2. Include cold start indicators in your health response. Report whether the current invocation is a cold start so you can see the pattern in your monitoring data:
// Module scope persists only within one execution environment, so this
// flag resets on every cold start (each concurrent environment reports
// its own cold starts independently)
let isWarm = false;

export async function handler(event) {
  const wasColdStart = !isWarm;
  isWarm = true;

  return {
    statusCode: 200,
    body: JSON.stringify({
      status: 'healthy',
      cold_start: wasColdStart,
    }),
  };
}

Monitoring serverless timeouts

Every serverless platform has execution time limits. AWS Lambda allows up to 15 minutes. Cloudflare Workers enforces per-request CPU time limits that vary by plan, and Cron Trigger invocations can run for up to 15 minutes. When a function hits its limit, the platform terminates it immediately -- no graceful shutdown and no chance for your error handler to run, at most a terse "Task timed out" line in the platform logs. The request simply dies.

External monitoring catches timeouts naturally. When your function times out, CronAlert's check either receives no response (connection timeout) or receives a gateway timeout error from the platform. Either way, the check fails and an alert fires.

To catch functions that are approaching their timeout before they actually hit it, add timing to your health endpoint:

export async function handler(event) {
  const start = Date.now();

  // Run your health checks
  await db.query('SELECT 1');
  await cache.ping();

  const elapsed = Date.now() - start;
  // FUNCTION_TIMEOUT is an env var you set to match the function's configured
  // limit; on Lambda you can use context.getRemainingTimeInMillis() instead
  const timeoutMs = parseInt(process.env.FUNCTION_TIMEOUT || '30000', 10);
  const headroom = timeoutMs - elapsed;

  return {
    statusCode: headroom < 5000 ? 503 : 200,
    body: JSON.stringify({
      status: headroom < 5000 ? 'unhealthy' : 'healthy',
      elapsed_ms: elapsed,
      headroom_ms: headroom,
    }),
  };
}

If the health check itself takes most of the function's timeout budget, something is wrong -- a slow database, a network issue, or a dependency that is degraded. Returning 503 when headroom is low triggers an alert before real requests start timing out.
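A dependency that hangs rather than failing fast can burn the entire budget before the health check returns anything at all. One way to guarantee the endpoint always answers is to cap each check with its own deadline -- a sketch, with illustrative names and limits:

```javascript
// Race a promise against a deadline so no single dependency check
// can consume the function's remaining execution budget
function withDeadline(promise, ms, label) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Inside the health handler (db and cache are assumed clients):
//   await withDeadline(db.query('SELECT 1'), 2000, 'db');
//   await withDeadline(cache.ping(), 1000, 'cache');
```

A rejected deadline surfaces as the 503 path in the handler above, naming the dependency that stalled instead of letting the whole function time out silently.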

Platform-specific monitoring tips

AWS Lambda

  • Monitor through API Gateway or Function URLs, not the Lambda invoke API. You want to test the full path that users take, including API Gateway routing, authorization, and throttling.
  • Watch for throttling. Lambda throttles at the account level (default 1,000 concurrent executions). If your function is being throttled, the health endpoint returns a 429 status through API Gateway. A standard HTTP status code check catches this immediately.
  • Monitor across regions if you deploy to multiple AWS regions. Use CronAlert's multi-region monitoring to check your function from all 5 probe regions simultaneously. A function that works in us-east-1 but is broken in eu-west-1 is a partial outage that single-region monitoring misses.

Cloudflare Workers

  • Monitor the deployed URL directly. Workers are globally distributed, so any request reaches the nearest edge location automatically. Your monitor checks the Worker from CronAlert's probe regions, exercising multiple edge locations.
  • Test KV and D1 bindings in your health check. A Worker that deploys successfully but cannot reach its KV namespace or D1 database is functionally broken. Include a read from each binding in your health endpoint.
  • Watch for CPU time limits. Workers enforce a per-request CPU time limit -- 10ms on the free plan, with higher limits on paid plans. If your Worker exceeds it, Cloudflare terminates it and returns a "Worker exceeded resource limits" error (error 1102) -- a standard HTTP check detects this as a non-200 status.
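The binding checks described above can be sketched like this -- the binding names KV and DB are assumptions; match them to your wrangler.toml:

```javascript
// Exercise each binding independently so the health payload shows
// exactly which dependency broke after a deploy
async function checkBindings(env) {
  const checks = {};
  try { await env.KV.get('health_probe'); checks.kv = 'ok'; } catch { checks.kv = 'failed'; }
  try { await env.DB.prepare('SELECT 1').first(); checks.d1 = 'ok'; } catch { checks.d1 = 'failed'; }
  return checks;
}

// Overall HTTP status: 200 only when every binding responded
function healthStatus(checks) {
  return Object.values(checks).every(v => v === 'ok') ? 200 : 503;
}
```

From the /healthz branch, return `Response.json(checks, { status: healthStatus(checks) })` so an external monitor can key on the status code while the body pinpoints the failing binding.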

Google Cloud Functions / Azure Functions

  • Same principles apply. Expose a health endpoint, check dependencies, return clear status. The only differences are platform-specific timeout limits and cold start characteristics.
  • Monitor the HTTPS trigger URL directly. Both platforms provide an HTTPS URL for HTTP-triggered functions. Point your CronAlert monitor at that URL.

Setting up alerts for serverless functions

Serverless failures tend to be urgent -- your function is either working or it is not, and there is no "degraded but functional" middle ground for most serverless use cases. Set up your alert channels accordingly:

  • Slack or Teams for team awareness. Everyone should know when a production function is down. See our guides for Slack alerts and Teams alerts.
  • PagerDuty or webhook for on-call escalation. Route to PagerDuty or a custom webhook endpoint that triggers your incident response process.
  • Use consecutive-check verification. CronAlert verifies failures with a follow-up check before alerting, which prevents false positives from transient cold starts or one-off network blips.

Frequently asked questions

How do I monitor a serverless function that is not HTTP-triggered?

Use heartbeat monitoring. Your function pings a heartbeat URL on success, and the monitor alerts you if it does not receive a ping within the expected interval. This works for queue processors, scheduled tasks, and event-driven functions that have no HTTP endpoint to check.

Can cold starts cause false positive alerts?

They can if your monitoring timeout is too tight. A cold start on AWS Lambda can add 1 to 10 seconds depending on runtime and package size. Set your monitoring timeout to accommodate the worst-case cold start for your function. CronAlert supports timeouts up to 120 seconds, and consecutive-check verification prevents a single slow response from triggering an alert.

How often should I check serverless function health?

For production functions, every 1 to 3 minutes. This also has the side benefit of keeping your function warm -- regular checks prevent cold starts for real users. CronAlert free plan checks every 3 minutes, paid plans check every 1 minute.

What is the difference between monitoring a serverless function and a traditional server?

Traditional servers have persistent processes you can SSH into, check logs, and monitor with agents. Serverless functions are ephemeral -- they spin up, execute, and disappear. You cannot install monitoring agents inside them. External monitoring (HTTP checks and heartbeats) is the only reliable way to verify that your serverless functions are working correctly from the user's perspective.

Start monitoring your serverless functions

Serverless does not mean worry-free. Functions time out, cold starts degrade user experience, deployments break silently, and platform limits throttle your traffic. External monitoring is the only way to know whether your functions are actually working -- not whether the platform says they are deployed.

Create a free CronAlert account and set up HTTP monitors for your function endpoints or heartbeat monitors for your event-driven functions. The free plan gives you 25 monitors with 3-minute checks. Upgrade to Pro ($5/month) for 1-minute checks that double as warm-up calls. See the pricing page for full plan details.