
Ask HN: How do you verify cron jobs did what they were supposed to?

8•BlackPearl02•2w ago
I've been running into this issue where my cron jobs "succeed" but don't actually do their job correctly.

For example:

Backup cron runs, exit code 0, but creates empty files

Data sync completes successfully but only processes a fraction of records

Report generator finishes but outputs incomplete data

The logs say everything's fine, but the results are wrong. Actually, the errors are probably in the logs somewhere, but who checks logs proactively? I'm not going through log files every day to see if something silently failed.

I've tried:

Adding validation in scripts - works, but you still need to check the logs

Webhook alerts - but you have to write connectors for every script

Error monitoring tools - but they only catch exceptions, not wrong results

I ended up building a simple monitoring tool that watches job results instead of just execution - you send it the actual results (file size, count, etc.) and it alerts if something's off. No need to dig through logs.
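A minimal version of that result check can live right in the job itself. Here's a sketch (the backup path and size threshold are made-up placeholders; the `head -c` line stands in for the real backup step):

```shell
#!/usr/bin/env bash
# Sketch of validating the *result* of a job, not just its exit code.
# The backup path and minimum size are hypothetical placeholders.
set -euo pipefail

backup=/tmp/demo-backup.tar.gz
min_bytes=1024                      # anything smaller counts as a failed backup

head -c 4096 /dev/zero > "$backup"  # stand-in for the real backup step

# stat -c is GNU; fall back to BSD/macOS -f %z
size=$(stat -c %s "$backup" 2>/dev/null || stat -f %z "$backup")
if [ "$size" -lt "$min_bytes" ]; then
    echo "ALERT: backup is only $size bytes (expected >= $min_bytes)" >&2
    exit 1                          # non-zero so cron/chronic can notify
fi
echo "OK: backup is $size bytes"
```

The point is that the exit code now reflects the output, so any notification layer built on exit codes catches the "empty backup" case too.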

But I'm curious: how do you all handle this? Are you actually checking logs regularly, or do you have something that proactively alerts you when results don't match expectations?

Comments

krunck•2w ago
My way of doing things:

  1. Scripts should always return an error (>0) when things did not go as planned and 0 when they did. Always.
  2. Scripts should always notify you when they return >0, either in their own way or via emails sent by cron.
  3. Use chronic (from the Debian moreutils package) to ensure that cron jobs only email output when they end in error. That way you don't need to worry about things sent to STDOUT spamming you.
  4. Create wrapper scripts for jobs that need extra functionality: notification, logging, or sanity checks.
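A wrapper along the lines of points 3 and 4 might look like this (a sketch; the notification step is left as an echo, where a real version would pipe to mail or a chat webhook):

```shell
#!/usr/bin/env bash
# Sketch of a chronic-style wrapper: run any command, stay silent on success,
# and surface the captured output only on a non-zero exit.
quiet_run() {
    local log status
    log=$(mktemp)
    "$@" >"$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        # in real use, pipe this to mail/Slack instead of echoing
        echo "job failed (exit $status): $*"
        cat "$log"
    fi
    rm -f "$log"
    return "$status"
}

quiet_run true  && echo "silent success"      # no job output shown
quiet_run false || echo "failure was reported"
```

Because the wrapper preserves the job's exit status, it composes with cron's own mail handling or with chronic itself.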
PenguinCoder•2w ago
+1 for chronic. Very useful for knowing when a cron fails without needing to manually review every log run.
BlackPearl02•2w ago
Those are all solid practices! I use chronic too, and proper exit codes are essential.

The gap I found is that even with all of that, you can still have "successful" jobs (exit code 0, no errors) that produce wrong results. Like a backup script that runs successfully but creates empty files because the source directory was empty, or a sync that only processes 10% of records because of a logic bug.

But there's another issue: chronic doesn't detect when cron jobs don't run at all. If your crontab gets corrupted, the server time changes, or cron daemon stops, chronic won't alert you because there's no output to email.

That's why I built result validation monitoring - it expects a ping after your job completes. If the ping doesn't arrive (job didn't run, crashed before completion, etc.), it alerts. Plus it validates the actual results (file sizes, record counts, content validation) and alerts if they don't match expectations.

Works alongside chronic/exit codes, but adds detection for jobs that never executed and validation of the actual outputs.

maliciouspickle•2w ago
this is not a direct answer to the original question, but problems like this are what led to the creation of orchestrator tools like airflow, luigi, dagster, prefect, etc. these tools provide features which help increase task/job observability, ease of debugging, and overall reliability of scheduled jobs/tasks.

it is a natural progression to move on from cron and adopt an orchestrator tool (many options nowadays) when you need more insight into cron, or when you start finding yourself building custom features around it.

i would do some research into orchestrators and see if there are any that meet your requirements. many have feature sets and integrations that solve some of the exact problems you're describing

(as a data engineer my current favorite general purpose orchestrator is dagster. it's lightweight yet flexible)

edit: as a basic example, in most orchestrators there is a first-class way to define data quality checks. if you have less data than expected, or erroneous data (based upon your expectations), you can define this as an automated check

you can then choose to fail the job, set a number of retries before failing, or send a notification to some destination of your choice (they have integrations with slack and many other alerting tools)

i like dagster because it is geared for hooking into the data itself. you can use it to 'run a job' like some function, but it really shines when you use its 'data asset' features that track the data itself over time, and it provides a nice UI to view and compare data from each run. hook in alerting for anomalies and you're good to go!

they have many more features depending on the tool, and some are more or less complicated to set up.

grugdev42•2w ago
I would (respectfully) challenge this idea. :)

I'm not certain adding more complexity (which comes with the more powerful solutions you've suggested) will help things right now.

Cron is such a basic tool, it really shouldn't be causing any problems. I think fixing the underlying problems in the scripts themselves is important to do first.

Just my two cents though!

nelgaard•2w ago
I do something similar, but simpler.

E.g., for rolling daily backups, something like

ls -l *.backup | mail -s "backup done" me@foo.dk someoneelse@bar.dk

even for successful cron jobs. That way you can check file sizes, timestamps, etc.

That way I will notice if something is not working, even if emails are also not working, the server is down, etc. It requires, of course, that you actually read those emails. But at least if people have agreed to check them, they cannot complain. Well, they can of course, but then I can also blame them.

BlackPearl02•2w ago
That's a clever approach! I used to do something similar, but found I'd miss emails or they'd get buried.

The tool I built does something similar but automated - it watches for a ping after your job completes and can validate the results (file sizes, timestamps, counts, etc.). You set validation rules once, and it only alerts when something's actually wrong. No need to read through daily emails.

For your backup example, you'd ping it with the file size, and it alerts if the size is unexpectedly small or zero. Same idea, just automated.

kevin061•2w ago
What I did is make the scripts ping uptime kuma https://github.com/louislam/uptime-kuma

This has two advantages. First, if cron jobs are not running at all for some reason, the timeout will notify you.

Second, you can also manually trigger error conditions by pushing to uptime kuma with an error message, exactly the same as if you used cron email notifications.

If the uptime kuma endpoint is pinged at a higher frequency than the timeout you configured, then no alerts are fired.
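For reference, a push-monitor crontab entry might look like this. This is a sketch: the host and token are placeholders, so check the push URL your Uptime Kuma instance actually generates for the monitor.

```shell
# crontab sketch: report success or failure to a hypothetical Uptime Kuma push monitor
0 2 * * * /usr/local/bin/backup.sh && curl -fsS "https://kuma.example.com/api/push/abc123?status=up&msg=OK" || curl -fsS "https://kuma.example.com/api/push/abc123?status=down&msg=backup+failed"
```

If the job (and therefore the ping) never runs, the monitor's timeout fires; if the job fails, the `status=down` ping fires immediately.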

grugdev42•2w ago
It sounds like you have four separate problems:

---

1. Being sure your cronjobs ran

Use a heartbeat monitoring system like this:

https://uptimerobot.com/cron-job-monitoring/

Append a ping to their URL after your cronjob, like so:

* * * * * python /home/foo.py && curl https://example.com/heartbeat

If your cronjob doesn't run, or runs and fails, the heartbeat won't get called because of the &&.

---

2. Make sure your scripts return error codes:

Your scripts should return 0 for success, or greater than 0 for errors.

This ties into point number one. Without proper error codes your heartbeat monitoring won't work.

---

3. Standardised logging:

Make sure you send/pipe your errors to ONE place. Having to look in multiple places is just asking for trouble.

And then check your logs daily. Better yet, automate the checking... maybe you send the contents of the log to Slack once per day? Or email it to yourself?
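One way to automate that daily check, as a sketch (the log path and error patterns are placeholders, and the stand-in `printf` exists only to make the example self-contained; in real use you'd pipe the summary to mail or a webhook):

```shell
#!/usr/bin/env bash
# Sketch: scan one shared log for error lines and print a summary.
set -u
log=/tmp/demo-cron.log

# stand-in log so the example is self-contained
printf 'backup ok\nsync ERROR: only 10%% of records processed\nreport ok\n' > "$log"

if grep -iq 'error\|fail' "$log"; then
    echo "problems found in $log:"
    grep -i 'error\|fail' "$log"     # in real use: pipe to mail/Slack
else
    echo "no errors in $log"
fi
```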

---

4. More robust scripts:

I'm not trying to be unkind, but your scripts sound like they're erroring a lot!

Maybe they need to be tightened up... don't blindly trust things, check return types, verify the previous step in code, and log more information to help you track the problems down.
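As a sketch of that advice applied to the empty-backup case from the original post (all paths hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: fail fast, and verify each step's output instead of trusting exit codes.
set -euo pipefail        # abort on errors, unset variables, and pipe failures

src=/tmp/demo-src
dst=/tmp/demo-dst.tar.gz

mkdir -p "$src"
echo "some data" > "$src/file.txt"

# refuse to "succeed" with an empty source (the empty-backup failure mode)
[ -n "$(ls -A "$src")" ] || { echo "source dir is empty, aborting" >&2; exit 1; }

tar -czf "$dst" -C "$src" .

# verify the result itself: the archive must list cleanly and be non-empty
tar -tzf "$dst" > /dev/null
[ -s "$dst" ]

echo "backup verified: $dst"
```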

---

If you do all of these things I think you will fix your problems. Good luck :)