It is 2022, so I thought this would be tremendously easy and cheap, but it seems no solution is straightforward to implement.
From their pricing example:
> You schedule a job to run once an hour for 15 seconds. This job uses 50% of one CPU core and 256 MiB of memory. This job will cost $0.016/day for the CPU, and $0.001/day for the memory, adding up to $0.017/day, or $0.51/month.
While you could do FaaS (e.g. AWS Lambda) + another service for state (e.g. S3/RDS/EFS), that seems like overkill.
Even today I would still do it the old way: store state in a filesystem (maybe SQLite if you have more complex needs), and configure a plain old cronjob on any of the servers you have access to.
Maybe put it in docker/dockerhub to make it easier to run.
Surprisingly it is an interesting question. You have a very simple task, but ideally you want the solution to include: running environment (let's say a container), scheduler, persistent state, monitoring (you want to know when it is down). Not to mention deployment.
There are solutions for each of those, but given the simplicity it would be nice to have a lightweight solution that includes all of that with minimal configuration. I doubt it exists.
I.e. you can do all of that on AWS, but I can't help but wonder if the infrastructure setup is going to be more complicated than your actual scraper.
Edit: a good way to think about it is: imagine a person who just learned how to scrape websites and wrote a script that works locally. How much do they need to learn to move it to the cloud?
If your answer is: ok, so you just need to create an AWS account, dockerize your script, put it in a Lambda, create RDS instance, trigger your Lambda through EventBridge and setup CloudWatch... then there's something wrong with you. There must be a better way to reliably run a fucking 10 line script in the cloud.
There have been many times over the years when I wanted to write a simple scraper that does exactly what the author describes, and what stopped me is the amount of infrastructure bullshit I would have to deal with.
https://fly.io/docs/machines/working-with-machines/#create-a...
See the `schedule` parameter.
This might be a non-answer, but if you just want "scheduled notifications of a website change", then you can use something completely purpose built for that. There is a Github action that does exactly this: https://github.com/yasoob/github-action-scraper-tutorial/blo...
This might also be a non-answer, but I've found myself using Integromat/Make or Zapier more and more for this type of work. It allows me to whip things up quickly and it can actually be approachable by non-technical people.
You could use Lambda if your runtime is expected to always be under 15 minutes.
Batch Fargate jobs documentation: https://docs.aws.amazon.com/batch/latest/userguide/fargate.h...
CloudWatch cron documentation:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/S...
https://aws.amazon.com/about-aws/whats-new/2018/03/aws-batch...
Everything can be deployed as a simple CDK app once you have your code ready in a Docker repository somewhere. There are no servers to maintain, and no ongoing maintenance is necessary.
I do a lot of web scraping, and Lambdas are very useful for web scraping since their IPs can change for each request.
For deployment, you can just do `serverless deploy` or set up a GitHub action each time you push to main.
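For the Serverless Framework route, a minimal config might look like this sketch (the service and handler names are placeholders; `schedule` is the framework's shorthand for an EventBridge rule):

```yaml
service: scraper
provider:
  name: aws
  runtime: python3.11
functions:
  scrape:
    handler: handler.run
    events:
      - schedule: rate(1 hour)
```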
I remember seeing a couple projects shared before, using this technique to scrape sites with GHA
- (free) set up a Google Sheet (not sure if this is required, but that's what I do)
- (free) run a scheduled Apps Script bound to that sheet at regular intervals
- make it scrape the page (it's just JavaScript; you can do GET requests, etc.)
- have it scrape a web archive of the page you want to compare (no need to persist state or run a database, etc.) and have it check the differences against the current page
- have it send you an email if there are changes
It's totally free, takes no time to set up, and is relatively effortless. It can even send you alerts on failures. I do something like this for my reading tracker, where I crawl various sites (like Amazon) and RSS feeds for new releases of books/series I read and collect them in a dynamic spreadsheet.
EDIT: after reading the other solutions in this thread I'm blown away by how much people are over thinking and over engineering this. KISS people.
Send an email (for free): https://blog.cloudflare.com/sending-email-from-workers-with-...
Cloud Run will run any docker image, and give it a HTTPS URL. Scales to zero, allows running in the background for up to an hour.
The benefit over FAAS is that you can run anything you want in the container, including multiple processes.
Here's how a cron job implementation would look with Temporal (a sketch using the Go SDK's `workflow` package; activity options and error handling are omitted for brevity):

```go
func Subscription(ctx workflow.Context, userUUID string) error {
	freeTrialOn := true
	for {
		if freeTrialOn {
			// Sleep until the trial period passes
			workflow.Sleep(ctx, 15*24*time.Hour)
			freeTrialOn = false
		} else {
			// Charge the subscription fee
			workflow.ExecuteActivity(ctx, ChargeSubscriptionAndEmailReceipt, userUUID).Get(ctx, nil)
			// Sleep until the next payment is due
			workflow.Sleep(ctx, 30*24*time.Hour)
		}
	}
}
```
Alternatively, for a more generic solution to “easy free cronjobs”, you can write a PHP page on a free PHP hosting service and point Visualping at that page. Cost: $0/year.
https://www.netlify.com/blog/how-to-schedule-deploys-with-ne...
Might be more precise to call it "GitHub scraping" as you set up your script to run on a schedule on GitHub Actions CI and keep its state by committing into the git repo.
We have our scripts running on GH Actions end by hitting the Slack API to notify us with a message.
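A minimal workflow for this git-scraping pattern might look like the following sketch (the script name and commit message are placeholders):

```yaml
name: scrape
on:
  schedule:
    - cron: "0 * * * *"   # hourly, in UTC
  workflow_dispatch:       # allow manual runs too
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python scrape.py   # writes/updates data files in the repo
      - run: |
          git config user.name "github-actions"
          git config user.email "actions@users.noreply.github.com"
          git add -A
          git diff --staged --quiet || git commit -m "update scraped data"
          git push
```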
There's a hosted solution, or you can self-host it
Cheap: low in price, especially in relation to similar items or services.
Easy: achieved without great effort; presenting few difficulties.
Because "easiest" AND "cheapest" are logically conflicting, my answer only applies to "easy" ways of hosting cron jobs. My answer also holds true for background tasks that are not always scheduled.
The two tools I rely on are:
1. Pipedream
2. Windmill (open-source pipedream)
Both of them let me define functions that run on a schedule, as well as functions that react to real-world events (webhooks). I can code in Go/JS/TS or Python. Everything is version controlled and stored in Git.
For Pipedream, you pay a premium for managed service. Windmill can be hosted on a $5 machine.
Cheapest: Any CI/CD service provider. Takes 20 minutes to set up. Not very reliable. Free
Next cheapest, least easy: complex proprietary serverless cloud functions to automatically trigger a task on a schedule. Takes an hour to set up. Most reliable. $0.30-$1.5/month
I'm sure the major cloud providers have cheap/free tiers for this kind of work, but quite frankly I've been burned by runaway pricing too many times to ever consider using cloud again for a personal project... so unless this is getting funded by a client, try to host your own.
[1] https://learn.microsoft.com/en-us/azure/azure-functions/func...
The UI is excellent for this. You can probably find cheaper on Fly but probably not easier.
Even easier, maybe more expensive than Render:
https://www.zeplo.io/docs/schedule
You just hit a URL like so and it's done. zeplo.to/your_url.com?_cron=5|14|||*
We have companies running hundreds of concurrent schedules on our platform. Send me a note at ravi@airplane.dev if you'd like to chat about it.
* free-tier vps on GCP or Oracle Cloud
* lambda job on AWS
I have a cheap VPS I use for other things and just run my cron jobs there.
My use cases have been:
- Web scraping of some websites to check stock availability of a product (4 times a day).
- Web scraping of a website to get an appointment at a specific citizen office (every 5 seconds); on success, send myself a message via a Telegram chat bot.
The second use case was a bit tricky, because my IP was frequently kicked by the server. I experimented to find the sweet-spot timing for the ping, and my setup (all jobs with different IPs) was the following:
1. setup on my Raspberry Pi in combination of the package PM2 (PM2 is a daemon process manager that will help you manage and keep your application online 24/7)
2. setup on Render (https://render.com/). The advantage here was the very convenient, quick and easy setup. I was able to link my GitHub repository to it, and the cron job would rebuild the program automatically whenever a new master version was available on GitHub. I used an in-memory TypeScript job scheduler library in my project that repeatedly executes given tasks within specified intervals (e.g. "every 5 seconds") until 7 PM, and I let the cron job service start the script again at 7 AM in the morning. Debug print console included.
3. setup on EvenNode (https://www.evennode.com/docs/cron-jobs). Here I uploaded my Node.js script via FileZilla to EvenNode's FTP server. Linking a GitHub repository is also possible, but since I am not that familiar with the private key setup at GitHub, I chose the FTP option. The scheduling setup was the same in-memory TypeScript scheduler arrangement as with Render: run every 5 seconds until 7 PM, restart at 7 AM. Debug print console included.
Both online services were very cheap; I paid only cents. The Raspberry Pi solution was free (aside from energy consumption). I can highly recommend render.com and evennode.com for cron jobs.
Flexibility to write custom code and alerting logic, but no headache of managing your own infrastructure
Disclaimer: I previously worked at Retool :)
GitHub actions are another option
Not free, but basically you wrap your job in an HTTP endpoint and then we take care of the rest.
There are two ways to set up a cron job: the dashboard or via the API if you need the flexibility, e.g., dynamically creating/starting/stopping crons.
We use HTTP webhooks to make the outgoing cron requests, which makes it easy to integrate with a wide range of services and platforms, including Vercel, Lambda, etc. We're also working on adding local execution so you can have even more control over how your tasks are run without using webhooks, but that's still being built & tested.
Feel free to reply here or email me directly if you have any questions: james@mergent.co
https://developers.google.com/apps-script
It's free, so pretty cheap.
You can set up a schedule to run the scripts. Has easy access to Google APIs (Gmail).
Very powerful and simple solution I've used for years.
It's cheap as in free, thanks to the generous free tier.
How cheap can you run this cron job? The basics:
- ~8,760 invocations a year (one per hour)
- Lambda: $0.20 per 1M requests
- CloudWatch Events: $1.00 per 1M requests
$1.20 / 1,000,000 × 8,760 ≈ about 1 cent
Is anybody going to host that as a service for this price? Absolutely not.
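The arithmetic above, in executable form (using 24 × 365 = 8,760 hourly runs):

```python
# one invocation per hour, all year
invocations = 24 * 365                          # 8,760 runs
lambda_cost = 0.20 / 1_000_000 * invocations    # $0.20 per 1M Lambda requests
events_cost = 1.00 / 1_000_000 * invocations    # $1.00 per 1M CloudWatch events
total = lambda_cost + events_cost
print(f"${total:.4f}/year")                     # -> $0.0105/year
```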
You can create cron jobs over HTTP, both recurring and one-time:
Node SDK: https://www.npmjs.com/package/beew
API Reference: https://beew.io/api
You can also create them directly through the UI.
Let me know if you need any help.
I want to offload this, but any time I search for an option, they're overcomplicated or overpriced. Maybe there just isn't a market for it?
...maybe I don't want to offload this. The Pi and SD card was like $35, the power is basically zero, and the Internet is basically zero.
Pricing: $0.25/GB-hour + data transfer and storage
I'm actively using all of the above approaches.
It's worth checking out Shipyard (https://www.shipyardapp.com). I'm the co-founder and designed it to be the easiest way to deploy scripts/workflows in the cloud. You can schedule any code to run on our platform (native support for Python, Node.js, Bash) and build reusable templates.
Our free dev plan allows for 10 hours of free runtime per month which is plenty for most use cases. If you want more or need webhooks/API, that starts at $50/month.
Feel free to contact if you want to learn more. Email is in my profile.
https://chrome.google.com/webstore/detail/distill-web-monito...
It's free.
It is definitely the easiest way for Python code. Just paste your code and that's it.
Full disclosure: I made this product. We are a YC company
More info:
https://azure.microsoft.com/en-us/pricing/details/functions/
https://www.serverless360.com/blog/azure-functions-triggers-...
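For reference, a timer-triggered Azure Function is bound with a six-field NCRONTAB expression (seconds come first); this sketch fires hourly, and the binding name is a placeholder:

```json
{
  "bindings": [
    {
      "name": "timer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 * * * *"
    }
  ]
}
```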
0 code on the cron side (it cannot be simpler), and if you only monitor a single route per hour it's free as well. AND you get a free UI showing how the cron job went :)
We self-host, but they have a GA cloud offering now. [1]
As long as you use JS/TS, Go, Python, PHP or Java, that is. For some jobs we wrap process execution, e.g. running a PowerShell command.
You won't spend much at all, it's fractions of a penny per call and I think there's a free tier.
One of the cheaper options and probably one of the older options.
It might be more than what you need, but your needs will increase. The above cost will not.
NixOS-configured systemd timers work great. Code them in bash, Go, Haskell, whatever. If you like polyglot (mostly to leverage any community's open source apps), NixOS is the best.
It isn't the cheapest or easiest for a single job. It's not hard to deploy to a random $5 VPS though (I use a Digital Ocean droplet). But once you have a box to deploy to, the incremental cost is as cheap as it gets.
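As a hedged sketch (the unit name and script path are made up), the NixOS configuration for such a timer looks roughly like:

```nix
{
  systemd.services.scraper = {
    serviceConfig.Type = "oneshot";
    script = "/var/lib/scraper/check.sh";
  };
  systemd.timers.scraper = {
    wantedBy = [ "timers.target" ];
    timerConfig = {
      OnCalendar = "hourly";
      Persistent = true;  # catch up on runs missed while the box was off
    };
  };
}
```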
Free, the job timeout is 10s but can be extended if needed, and minimum resolution is run every minute.
https://docs.deta.sh/docs/tutorials/cron-guide https://docs.deta.sh/docs/micros/cron
That said, there are probably different forces at play if this is a personal infrastructure question: while cheap is good, if you aren't fluent in AWS it may be a slog to set up. If you're not, easiest thing is probably a real cron job run on a cheap VPS.
You assume someone purchased hardware (preferably available anywhere on the planet), powered it up, set it up, connected it to the internet, then built the software to handle your very specific task, then put it online to do specifically what you want and for a close-to-free price? And you're shocked this doesn't exist?
Zero infrastructure to manage, only write your script (in Dark, for that matter).
https://learn.microsoft.com/en-us/azure/azure-functions/func...
I’m using it to issue a daily DB backup command for my app.
How much local compute do you have?
I think people underestimate dynamic DNS or a static IP if you can get one for your home for cheap.
Tl;dr: GitHub Actions for low frequency, Upstash for high.
Easiest way I found to do that was Azure Functions. Costs me about 35p per month. Mostly for storage for logs as far as I can tell. I'd sort that, but it's literally not worth the time it would take
A VPS at HostHatch is still $4.99, I believe.
Is it about how to run cron? Plain cron is a thing of the past; as you are asking for 2022, look at systemd timers.
Is it about cheap hosting? There are lots of answers here already.
Or is it about how to monitor a site for changes?
You want to watch a web page that apparently has no RSS feed in 2022... that went out of fashion a long time ago.
cheap: on one of your existing systems [workstation|raspberry|laptop|home-router|server|vm|cloudinstance]
easy: idk. in a short shell script do something like:

```sh
# fetch the page and diff it against the previous copy
curl -s -o /tmp/somefile.new "$URL"
diff -q /tmp/somefile.new /tmp/somefile.old || echo "page changed"
mv /tmp/somefile.new /tmp/somefile.old
```

Or calculate a hash value of the page and compare, or go for the Last-Modified, ETag or Content-Length in the HTTP header (`curl -I`). Put this script into your system's crontab. just my 0.02€
locally hosted linux machine (or vm)
$24/year and you can basically do anything you want.
All you gotta do is select the desired interval and tell EasyCron which endpoint you wanna hit, and that's it.
To store data you’d use FireStore and/or Storage.
Both free for your use case (probably)
Costs a minimum of $5 per month, plus $1 for each additional task.
I built a cron job utility service with Pipedream workflows because I needed some additional features like sending email report and hooking up to a cronjob monitor like cronitor or healthchecks.
the simplest and cheapest:
- Lambda
- EventBridge rule