December 24, 2024

How to Monitor Cron Jobs (And Get Alerts When They Fail)

Your database backup script has been failing for three weeks. You don't know this yet. You'll find out when your production database crashes and you reach for that backup that doesn't exist.

This is the reality of cron jobs: they fail silently, and by the time you notice, the damage is done.

The Problem: Cron Jobs Fail Without Telling Anyone

Cron is beautifully simple. Add a line to your crontab, and your script runs on schedule. But cron was designed in an era when "monitoring" meant checking printouts. It has no built-in mechanism to tell you when something goes wrong.

When a cron job fails, here's what happens: nothing. No email (unless you've configured local mail delivery, which almost nobody does anymore). No alert. No Slack message. The job just... doesn't run. Or it runs and exits with an error code that disappears into the void.

The failure modes are endless:

  • Script crashes halfway through
  • Disk fills up, writes fail silently
  • Network timeout kills an API call
  • Permission change breaks file access
  • Server reboots and cron daemon doesn't restart
  • Someone comments out the crontab line "temporarily"

Any of these can happen, and you won't know until the consequences hit you.

Common Approaches to Cron Job Monitoring

Before we get to the right solution, let's look at what most people try first.

Manual Log Checking

The most common approach: SSH into the server occasionally and check logs.

grep "backup" /var/log/syslog | tail -20

This works until it doesn't. You'll check diligently for a week, then forget. Three months later, you realize you haven't looked at those logs since February.

Manual checking doesn't scale, and it relies on human memory, the least reliable monitoring system ever invented.

Custom Email Scripts

A step up: wrap your cron jobs in a script that sends an email on failure.

#!/bin/bash
/path/to/backup.sh
if [ $? -ne 0 ]; then
    echo "Backup failed" | mail -s "ALERT: Backup Failed" [email protected]
fi

Better. But now you have new problems:

  • You need a working mail server (harder than it sounds on modern cloud VMs)
  • Emails go to spam
  • If the script itself crashes, no email gets sent
  • You only know about failures, not jobs that never ran at all

That last point is critical. If your cron daemon crashes, or someone deletes the crontab entry, the script doesn't run, which means no failure email. You've only solved half the problem.

Systemd Timers with OnFailure

If you're on Linux with systemd, you can use timer units instead of cron. The service unit does the work and names a failure handler:

[Unit]
Description=Database backup
OnFailure=backup-alert.service

[Service]
Type=oneshot
ExecStart=/path/to/backup.sh

A matching timer unit provides the schedule:

[Unit]
Description=Nightly database backup

[Timer]
OnCalendar=*-*-* 02:00:00

[Install]
WantedBy=timers.target

Then create a backup-alert.service that triggers alerts. This is robust, but:

  • It's Linux-only
  • Configuration is verbose
  • You need separate unit files for each job
  • Still doesn't catch jobs that never start

For complex deployments with existing systemd infrastructure, this can work. For most of us running a handful of cron jobs, it's overkill.

The Simple Solution: Heartbeat Monitoring

Here's the pattern that actually works: instead of alerting on failure, alert on the absence of success.

The concept:

  1. Your cron job runs and does its work
  2. If successful, it pings an external URL (a "heartbeat")
  3. An external service tracks these pings
  4. If no ping arrives within the expected window, you get an alert

This inverts the problem elegantly. You're not trying to catch every possible failure mode. You're simply verifying that the job completed successfully, on schedule.

The external service doesn't care why the ping didn't arrive. Server down? Script crashed? Cron daemon died? Network issue? Doesn't matter. No ping means something's wrong, and you get notified.
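Under the hood, this is a dead-man's switch. Here's a minimal sketch of the idea in shell (illustrative only: `record_ping` stands in for the HTTP ping endpoint, and a real service tracks this per check with actual alerting):

```shell
# The ping just records a timestamp; a checker alerts when that
# timestamp is older than the allowed window.

record_ping() {            # stands in for hitting the ping URL
    date +%s > "$1"
}

check_heartbeat() {        # $1 = state file, $2 = window in seconds
    local last now
    last=$(cat "$1" 2>/dev/null || echo 0)
    now=$(date +%s)
    if (( now - last > $2 )); then
        echo "ALERT: last ping $(( now - last ))s ago (window: ${2}s)"
        return 1
    fi
}

record_ping /tmp/demo-ping
check_heartbeat /tmp/demo-ping 5400   # 90-minute window: fresh ping, no alert
```

Notice the checker never needs to know why a ping is missing; staleness alone is the signal.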

How to Implement Heartbeat Monitoring for Cron Jobs

Implementation is dead simple. Append a curl request to your cron job:

0 2 * * * /home/scripts/backup.sh && curl -fsS --retry 3 https://api.cronsignal.io/ping/abc123

Let's break this down:

  • 0 2 * * * - Run at 2:00 AM daily
  • /home/scripts/backup.sh - Your actual script
  • && - Only continue if the previous command succeeded (exit code 0)
  • curl -fsS --retry 3 - Silent request with retries
  • https://api.cronsignal.io/ping/abc123 - Your unique ping endpoint

The && operator is crucial. It means the ping only fires if your script exits successfully. If backup.sh returns a non-zero exit code, curl never runs, the ping never arrives, and you get alerted.
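You can see the short-circuit behavior for yourself in any shell:

```shell
# The right-hand side of && runs only when the left side exits 0.
false && echo "ping sent"   # job "failed": nothing is printed
true  && echo "ping sent"   # job succeeded: the ping fires
```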

Making Your Scripts Exit Correctly

For this pattern to work, your scripts need to return proper exit codes. Most well-written scripts do this automatically, but verify yours does:

#!/bin/bash
set -e  # Exit on any error

# Your backup logic here
pg_dump mydb > /backups/mydb_$(date +%Y%m%d).sql

# If we got here, everything worked
exit 0

The set -e flag is your friend. It makes the script exit immediately if any command fails, rather than continuing blindly.
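One caveat worth knowing: set -e alone doesn't catch failures inside a pipeline, because a pipeline's exit code is the last command's. Adding pipefail closes that gap. A quick demonstration:

```shell
# Without pipefail, the failing 'false' is masked by 'cat' and
# execution continues past the pipeline:
bash -c 'set -e; false | cat; echo "reached"'

# With pipefail, any failing stage fails the whole pipeline, so
# set -e aborts before the echo:
if ! bash -c 'set -eo pipefail; false | cat; echo "reached"'; then
    echo "aborted as expected"
fi
```

In practice, scripts that use pipelines are safer starting with set -euo pipefail (the -u also flags unset variables).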

For more complex scripts, you might want explicit error handling:

#!/bin/bash

pg_dump mydb > /tmp/backup.sql
if [ $? -ne 0 ]; then
    echo "Database dump failed"
    exit 1
fi

aws s3 cp /tmp/backup.sql s3://my-bucket/
if [ $? -ne 0 ]; then
    echo "S3 upload failed"
    exit 1
fi

exit 0
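One refinement worth adding to a script like this: a trap guarantees the temp file is removed however the script exits. A sketch, with the echo and cp as stand-ins for the dump and upload steps above:

```shell
#!/bin/bash
set -e

BACKUP_FILE=$(mktemp /tmp/backup.XXXXXX)

# Runs on any exit: success, error, or a signal. Failed runs no
# longer leave stale dumps lying around in /tmp.
trap 'rm -f "$BACKUP_FILE"' EXIT

echo "-- stand-in for: pg_dump mydb" > "$BACKUP_FILE"
cp "$BACKUP_FILE" /tmp/uploaded.sql   # stand-in for: aws s3 cp ...

echo "backup staged and uploaded"
```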

Handling Long-Running Jobs

For jobs that take a while, you might want to ping at the start and end:

0 2 * * * curl -fsS https://api.cronsignal.io/ping/abc123/start && /home/scripts/backup.sh && curl -fsS https://api.cronsignal.io/ping/abc123

This lets you catch jobs that start but never finish. Useful for debugging jobs that hang or get killed by OOM.
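If you also want hangs to fail fast instead of waiting out the alert window, you can bound the job's runtime with timeout from GNU coreutils (the one-hour budget here is an assumption; tune it to your job):

```
0 2 * * * timeout 1h /home/scripts/backup.sh && curl -fsS --retry 3 https://api.cronsignal.io/ping/abc123
```

If the script runs past the limit, timeout kills it and exits non-zero, so the ping is suppressed and you get alerted.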

What Cron Jobs Should You Monitor?

Not every cron job needs monitoring. A script that cleans up temp files? Probably fine if it misses a run. But some jobs are critical:

Database Backups

This is the canonical example. Backup failures are invisible until disaster strikes. Monitor every backup job, no exceptions.

0 3 * * * /scripts/pg_backup.sh && curl -fsS https://api.cronsignal.io/ping/db-backup

Report Generation

If stakeholders expect a report every Monday morning, you want to know before they start asking questions.

0 6 * * 1 /scripts/weekly_report.py && curl -fsS https://api.cronsignal.io/ping/weekly-report

Data Synchronization

ETL jobs, API syncs, cache rebuilds. Anything where stale data causes problems downstream.

*/15 * * * * /scripts/sync_inventory.sh && curl -fsS https://api.cronsignal.io/ping/inventory-sync

Cleanup and Maintenance Jobs

Log rotation, temp file cleanup, expired session purging. These jobs often fail silently for months until disk space runs out.

0 4 * * * /scripts/cleanup.sh && curl -fsS https://api.cronsignal.io/ping/cleanup

SSL Certificate Renewal

Let's Encrypt certbot runs via cron. If it fails, your site goes down when the cert expires. High stakes, easy to forget.

0 0 * * * certbot renew --quiet && curl -fsS https://api.cronsignal.io/ping/ssl-renewal

Health Checks and Heartbeats

Sometimes you just want to know a server is alive and cron is running:

*/5 * * * * curl -fsS https://api.cronsignal.io/ping/server-alive

Setting Appropriate Alert Windows

The monitoring service needs to know when to expect your ping. Get this wrong and you'll either miss failures or get false alarms.

For a job that runs every hour, a 90-minute window makes sense. It gives some buffer for jobs that run slow occasionally, while still catching actual failures promptly.

For daily backups at 2 AM that usually take 30 minutes, you might set a 2-hour window. If the ping hasn't arrived by 4 AM, something's wrong.

Be realistic about job duration variance. A backup that usually takes 10 minutes might take 2 hours when there's more data than usual. Set your windows accordingly.
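One way to stop guessing: wrap jobs so each run's duration gets logged, then size the window from real data. A sketch (the log path and wrapper name are made up):

```shell
# timed_run: run a job, log how long it took, and pass the exit
# code through unchanged so the && heartbeat pattern still works.
timed_run() {
    local start status
    start=$(date +%s)
    "$@"
    status=$?
    echo "$(date) cmd=$* exit=$status duration=$(( $(date +%s) - start ))s" >> /tmp/cron-timing.log
    return $status
}

# Stand-in job; in the crontab you'd wrap your real script:
timed_run sleep 1
```

After a few weeks of log lines, set the window around the worst observed duration plus a healthy buffer.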

Why External Monitoring Beats Self-Monitoring

You might be tempted to build this yourself. Run a small service that tracks pings and sends alerts. It's not complicated.

But consider: if that monitoring service runs on the same server as your cron jobs, it's subject to the same failure modes. Server goes down, cron stops, monitoring stops. No alert.

External monitoring services exist precisely because they're independent. They're watching from outside, which means they catch failures that would take down your internal monitoring too.

It's the same reason you don't keep your backup drives next to your production servers.

Getting Started with Cron Job Monitoring

CronSignal does exactly what's described in this post: $9/month with unlimited checks. You get a ping URL, set your expected schedule, and receive alerts via email or Slack when pings don't arrive.

There are other services that do similar things. The pattern matters more than the specific tool. Pick something, set it up, and stop wondering whether your cron jobs are actually running.


The best monitoring is the kind you set up once and forget about until it saves you. A heartbeat ping takes 30 seconds to add to your crontab. The peace of mind is worth it, especially at 3 AM when you're not getting paged because you caught the backup failure twelve hours ago.

Ready to monitor your cron jobs?

$9/month for unlimited checks

Get Started