
Context
Our client was running approximately 51 background jobs across their production environment. These jobs handled tasks such as:
Data synchronisation
Report generation
Cleanup tasks
Scheduled notifications
All of these jobs were scheduled with crontab.
Problem Statement
When you're managing dozens of cron jobs in production, the scariest scenario isn't a job that fails loudly—it's the one that fails silently. You only discover the problem when a customer reports missing data or a delayed process. By then, it's already too late.
We were facing two critical issues:
1. Misconfigured cron jobs that never get triggered
Sometimes, due to misconfigurations or environment changes, a cron job simply wouldn't run at all. Without proper monitoring, these silent failures would go unnoticed until someone manually checked or a customer complained.
2. Jobs that failed silently without raising alerts
Even when cron jobs executed on schedule, some would fail midway through their execution without generating any alerts. The logs would show errors, but no one was actively watching them 24/7.
The real pain was this: we'd only discover these failures when customers reported issues.
That's not monitoring—that's damage control.
Solution
Our first approach was to use New Relic, but that didn't work for us, mainly because of the setup complexity.
For each cron job, we had to:
Write custom NRQL queries to track execution patterns
Set up separate alert conditions for missed schedules
Configure Slack notification channels
Maintain multiple dashboards to visualise job health
The complexity was overwhelming. What we needed to know was very simple:
Did this job run?
Did it succeed?
After some research, we tried Healthchecks (healthchecks.io), an open-source background job monitoring tool that can be self-hosted, and it solved our use case perfectly. This is how it works:
Job starts → the cron job pings the Healthchecks "start" endpoint
Job completes successfully → the cron job pings the Healthchecks "success" endpoint
Job fails → the cron job pings the Healthchecks "failure" endpoint (or the check times out if the job never completes)
Missed execution → if Healthchecks doesn't receive any ping within the expected cron schedule, it triggers an alert
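With dozens of jobs, the start/success/failure pings above can also be factored into one small wrapper instead of being repeated in every crontab entry. The following is a minimal sketch, not our exact setup: the function names `hc_ping` and `run_with_hc`, the `HC_CURL` override, and the script path in the usage note are all hypothetical, and the ping URL is the same placeholder used elsewhere in this post.

```shell
#!/bin/sh
# Hypothetical wrapper: hc-wrap.sh <ping-url> <command> [args...]
# HC_CURL lets you swap the HTTP client (assumption, handy for dry runs).
HC_CURL="${HC_CURL:-curl}"

hc_ping() {
    # -f: treat HTTP errors as failures, -sS: quiet but still show errors,
    # -m 10: 10 second timeout, --retry 5: retry transient failures.
    # "|| true" so a monitoring hiccup never breaks the job itself.
    "$HC_CURL" -fsS -m 10 --retry 5 -o /dev/null "$1" || true
}

run_with_hc() {
    url="$1"
    shift
    hc_ping "$url/start"      # job is starting
    "$@"                      # run the actual job
    status=$?
    if [ "$status" -eq 0 ]; then
        hc_ping "$url"        # success
    else
        hc_ping "$url/fail"   # failure
    fi
    return "$status"          # preserve the job's own exit code
}

# Only run when called with a URL and a command.
if [ "$#" -ge 2 ]; then
    run_with_hc "$@"
fi
```

Assuming the script is saved as `/usr/local/bin/hc-wrap.sh`, a crontab entry would then look like `0 2 * * * /usr/local/bin/hc-wrap.sh https://hc.example.com/ping/abc123 /actual-cron-expression.sh`. The wrapper returns the job's own exit code, so anything else that inspects it keeps working.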
# Cron job enabled with Healthchecks (HC) monitoring.
# Note: a crontab entry must stay on a single line; cron does not support "\" line continuations.
# 1. Notify HC that the job is starting (";" so that a failed ping never blocks the job)
# 2. Run the actual cron job
# 3. Notify HC of success if the job exits 0, otherwise notify HC of failure
0 2 * * * curl -m 10 --retry 5 https://hc.example.com/ping/abc123/start; /actual-cron-expression.sh && curl -m 10 --retry 5 https://hc.example.com/ping/abc123 || curl -m 10 --retry 5 https://hc.example.com/ping/abc123/fail
Outcome / Impact
We can now see the status of every cron job on a single dashboard.
We get a Slack alert whenever a cron job fails.
The final state of our background jobs:
✅ No more silent failures: Every job execution is tracked, and failures trigger instant alerts
✅ Proactive monitoring: We catch missed or failing jobs before customers notice
✅ Clear visibility: The entire team can see job health at a glance
✅ Minimal maintenance: No NRQL queries to write or complex dashboards to maintain
✅ Full control: Self-hosted solution with no third-party service dependency or subscription costs
The bottom line: If you're tired of complex monitoring setups for simple cron jobs, give Healthchecks a try. It's the monitoring tool cron jobs deserve—simple, reliable, and effective.
No more NRQL queries. No hidden costs. Just straightforward monitoring that actually works.
