Monitor Your Crons Like a Pro! An Open-Source Alternative to New Relic

How we used an open-source, self-hosted tool to monitor our cron jobs, including how it works and how to set it up.

Context

Our client was running approximately 51 background jobs across their production environment. These jobs handled tasks such as:

  1. Data synchronisation

  2. Report generation

  3. Cleanup tasks

  4. Scheduled notifications

All of these jobs were scheduled through crontab.

Problem Statement

When you're managing dozens of cron jobs in production, the scariest scenario isn't a job that fails loudly—it's the one that fails silently. You only discover the problem when a customer reports missing data or a delayed process. By then, it's already too late.

We were facing two critical issues:

1. Misconfigured cron jobs that never get triggered
Sometimes, due to misconfigurations or environment changes, a cron job simply wouldn't run at all. Without proper monitoring, these silent failures would go unnoticed until someone manually checked or a customer complained.

2. Jobs that failed silently without raising alerts
Even when cron jobs executed on schedule, some would fail midway through their execution without generating any alerts. The logs would show errors, but no one was actively watching them 24/7.

The real pain was this: we'd only discover these failures when customers reported issues.

That's not monitoring—that's damage control.

Solution

Our first approach was to use New Relic, but that didn't work for us, mainly because of the setup complexity.

For each cron job, we had to:

  • Write custom NRQL queries to track execution patterns

  • Set up separate alert conditions for missed schedules

  • Configure Slack notification channels

  • Maintain multiple dashboards to visualise job health

The complexity was overwhelming. What we needed to know was very simple:

Did this job run?

Did it succeed?

After some research, we tried Healthchecks, the open-source background job monitoring tool behind healthchecks.io, which can also be self-hosted. It fit our use case well. This is how it works:

Healthchecks: how it works

  1. Job starts → The cron job pings the Healthchecks "start" endpoint

  2. Job completes successfully → The cron job pings the Healthchecks "success" endpoint

  3. Job fails → The cron job pings the Healthchecks "failure" endpoint (if the job hangs and never reports back, Healthchecks flags it once the grace time runs out)

  4. Missed execution → If Healthchecks receives no ping within the expected cron schedule plus its grace time, it triggers an alert
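Step 4 only works if Healthchecks knows the schedule it should expect. As a rough sketch, a check with a cron schedule and a grace time could be registered through the Healthchecks Management API as below. The names HC_URL, HC_API_KEY, build_check_payload and create_check are illustrative assumptions, as are the check name and the 900-second grace time; the example also assumes a recent instance that exposes the v3 API path.

```shell
#!/usr/bin/env bash
# Sketch: register a check whose schedule mirrors the crontab entry.
# HC_URL and HC_API_KEY are placeholders for your own instance and
# project API key; they are not values from the original setup.

# Build the JSON body for the new check.
# Assumes the inputs contain no characters that need JSON escaping.
build_check_payload() {
    local name="$1" schedule="$2" tz="$3" grace_seconds="$4"
    printf '{"name": "%s", "schedule": "%s", "tz": "%s", "grace": %d}' \
        "$name" "$schedule" "$tz" "$grace_seconds"
}

# POST the payload to the Management API.
# Nothing calls this automatically; invoke it yourself once the two
# environment variables are set.
create_check() {
    curl -fsS -m 10 \
        -H "X-Api-Key: ${HC_API_KEY:?set HC_API_KEY}" \
        -H "Content-Type: application/json" \
        -d "$(build_check_payload "$@")" \
        "${HC_URL:?set HC_URL}/api/v3/checks/"
}
```

A call such as create_check nightly-report "0 2 * * *" UTC 900 would ask Healthchecks to expect a ping for the 02:00 job and to wait 15 minutes before alerting. The same settings can be entered by hand in the web UI if you prefer not to script it.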

# Cron job with Healthchecks (HC) monitoring enabled.
# A crontab entry must stay on one line: cron does not support backslash continuations.
# Order: ping /start (a failed ping does not block the job), run the job,
# then ping the check URL on success or /fail on a non-zero exit.
0 2 * * * curl -fsS -m 10 --retry 5 -o /dev/null https://hc.example.com/ping/abc123/start; /actual-cron-expression.sh && curl -fsS -m 10 --retry 5 -o /dev/null https://hc.example.com/ping/abc123 || curl -fsS -m 10 --retry 5 -o /dev/null https://hc.example.com/ping/abc123/fail
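With dozens of jobs, repeating three curl calls in every crontab line gets noisy. One option is a small wrapper script, sketched below. The names hc_ping and hc_run, and the install path in the usage note, are hypothetical and not part of Healthchecks itself; the sketch assumes bash and curl are available on the host.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around a cron job (not shipped with Healthchecks).
# Usage: hc_run <ping-url> <command> [args...]

# Send one ping. A monitoring outage must never fail the job itself,
# so errors from curl are swallowed.
hc_ping() {
    curl -fsS -m 10 --retry 5 -o /dev/null "$1" || true
}

hc_run() {
    local url="$1"
    shift
    hc_ping "$url/start"      # 1. tell Healthchecks the job has started
    "$@"                      # 2. run the actual job
    local status=$?
    if [ "$status" -eq 0 ]; then
        hc_ping "$url"        # 3a. report success
    else
        hc_ping "$url/fail"   # 3b. report failure
    fi
    return "$status"          # keep the job's own exit code
}

# When executed directly with arguments, act as the wrapper.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
    hc_run "$@"
fi
```

Saved as, say, /usr/local/bin/hc_run.sh, the crontab entry shrinks to: 0 2 * * * /usr/local/bin/hc_run.sh https://hc.example.com/ping/abc123 /actual-cron-expression.sh. Unlike a chained && / || one-liner, a failed success ping here is not reported as a job failure.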

Outcome / Impact

The status of every cron job is now visible on a single dashboard:

Dashboard

We get a Slack alert whenever a cron job fails or misses its schedule.

This is the final state of our background jobs:

No more silent failures: Every job execution is tracked, and failures trigger instant alerts
Proactive monitoring: We catch missed or failing jobs before customers notice
Clear visibility: The entire team can see job health at a glance
Minimal maintenance: No NRQL queries to write or complex dashboards to maintain
Full control: Self-hosted, with no dependency on an external SaaS and no licence costs


The bottom line: If you're tired of complex monitoring setups for simple cron jobs, give Healthchecks a try. It's the monitoring tool cron jobs deserve—simple, reliable, and effective.

No more NRQL queries. No hidden costs. Just straightforward monitoring that actually works.


Want to discuss DevOps practices, infrastructure audits, or free consulting for your AWS cloud?

Prasanna would be glad to jump on a call.