HACKER Q&A
📣 mind-blight

Are people considering moving off of Fly.io?


We're using fly at my work. It's had multiple outages in the last month that have taken down our production servers. There has been no proactive communication and very little insight besides "We've identified the issue and are attempting a fix."

We're now 24 hours into an outage that started with everything being taken offline and is now causing intermittent 502 errors. Their status page (https://status.flyio.net/) still shows 99.99% uptime a full day into the outage.

Besides the outages, the service is great. But that's a big caveat. We're pretty frustrated and are considering leaving.

Is anyone else in the same situation? If so, what's keeping you, or what are you leaving for?


  👤 jeromegn
I'm the one who created this incident on our status page. I've been overly cautious in resolving this incident, but at this point I think it's causing more harm than good to keep it unresolved on there.

I think it might've prevented users from posting on our forums or sending in an email (premium support). I can imagine users looking at the status page and mistakenly thinking their problems were related to the current incident.

I've interpreted "Monitoring" as essentially meaning: "this is fixed, but we're keeping a close eye on the situation". We do not yet have a formal process for incidents such as this one (but we are working on that).

If our users are having issues, that's a problem. But looking at our own metrics, the community forum, and our premium support inbox, I don't believe this to be the case.

Perhaps we should've done a better job of explaining the exact symptoms our users might be experiencing from this particular incident.


👤 monero-xmr
People should learn that using an intermediary other than AWS or Google Cloud for convenience is risky. All depends on your level of risk vs. screwing around, but if you want to go cheap then you should run your own instrumentation on top of bare Linux instances from commodity vendors that can be cycled out easily, and use multiple vendors to ensure outages at one are easily remedied.

Heroku is another example. Can’t trust your business to shaky foundations. The moment they started to have frequent outages your company should have been migrating ASAP.

As a side note, I would never use nor invest in brand new databases. Database tech needs to soak for 10+ years before I trust that the software is stable and that the organization behind it will exist long-term. A startup using a shiny new database is evidence of weak engineering leadership. Similarly, Terraform/CloudFormation is easy enough that needing something other than AWS tooling itself makes less and less sense from a cost vs. convenience perspective.


👤 firloop
We're likely going to move off of them. Last year we were using their WireGuard "peering" feature to connect to our RDS DB (as recommended by their blog [0]).

This feature had a multi-hour outage, and when we wrote in for support, we were told "[t]he Wireguard peers are intended to get you development access to your network. We didn't really build them to handle inter service communication that affects uptime. The gateways we run wireguard peers on are not redundant."

We stopped using the feature (using Tailscale instead), but in my opinion, that directly contradicts the spirit of their blog and docs, and it really left a bad taste in our mouths. We're probably going to move to Render or something similar soon.

[0]: https://fly.io/blog/ipv6-wireguard-peering/#wireguard-peerin...


👤 nwienert
I used it a year ago and had to move off: too many errors, a few seemingly lost deployments, and a general need to reconnect or turn things off and on to get them to work. It definitely felt very beta.

The final straw, though, was testing the DB. I had a $40/mo dedicated server, and I spun up their recommended few-node Postgres cluster. Query response time was something like 5x faster on the dedicated server than on their similarly priced setup. I tried upgrading to the top of the line: still much slower, and at that point many multiples more expensive.

It wasn't just that, though. The entire app was sluggish, whereas locally or on a dedicated box it felt incredibly snappy. I'd have had to spend something like ~$2k/mo to get their top-of-the-line nodes across every service, and I'd still have to accept half the speed for my entire app. The edge isn't very useful if it's not powerful!

Disclosure: I work at Vercel, and I do like what Fly.io does generally. Had these opinions well before working at Vercel was even a consideration. I think a lot of serverless/edge type hosts are hiding their true cost behind cheap low powered nodes. Especially if the most powerful nodes are still less powerful than a very mid-tier dedicated box, there goes your entire app performance.


👤 emilsedgh
We are not. As a matter of fact we just renewed our annual contract with Heroku.

As disappointing as it is that Heroku is basically stalling, the fact is that it was light-years ahead of the competition in terms of developer ergonomics. Even to this day, it's still super convenient and reliable enough for us.

If anyone wants us to switch to their service, they can't just be as good as Heroku or slightly better. They need to be _much better_ to justify the costs of a switch.


👤 mushufasa
We considered moving onto Fly since we were transitioning away from Heroku. We ultimately decided on plain AWS for our core products and DigitalOcean's App Platform for smaller experiments.

Fly's overall experience wasn't as smooth as Heroku's, from the dashboards to the weird errors for technical things that should have worked but didn't. The logging and error handling weren't as informative as they should be. In essence, we agreed with the value proposition of "give us PaaS magic with more control over the infrastructure than Heroku," but it wasn't sufficiently magical. The whole low-latency, CDN-like distribution angle wasn't really relevant to our use case.


👤 doublepg23
I found your post by searching “fly.io” to see if there was anyone else reporting problems with their hosted Postgres. I seemingly can’t make migrations and all I can find is a community post that’s slowly growing in responses where it was initially reported four days ago :/

I guess I’ll try out Render?


👤 phphphphp
I’m on the fence. I don’t mind outages; Fly is a relatively new service, so there’s some expectation that there will be some. But the frequency and similarity of the outages is a little disconcerting. I’ve not yet moved off, but I am reconsidering my choice to use them for production services when there are a variety of alternatives. Google Cloud Run, for one, is very reliable.

The unique aspect of their service (ability to containerise an application on your behalf) is not the important part for me so the only thing keeping me on Fly at the moment is inertia — which is a shame, I want to love Fly.


👤 monological
Had way too many errors trying to set things up, so I just switched to render.com, which I love.

👤 mrkurt
Will you email your app details to support (cc me too, if you want)? If your app is 502ing, it's unrelated to yesterday's outage.

👤 tebbers
I considered Fly for our Rails app currently hosted on Dokku, but even 3 months ago there were grumblings that it was flaky and not quite suitable for production yet. So now we are considering Northflank, Render (if they get a London region), or DigitalOcean Kubernetes.

👤 itake
You get what you pay for :-/.

Everyone else seems to be more expensive.


👤 epoch_100
I like them. But the outages have been tough.

👤 anacrolix
Yes. There's a lot to like, but they're not evolving quickly enough. There are so many rough edges.

👤 voganmother42
Ironically a new unresolved incident now, so I was initially not sure to which they were referring…

👤 ehaveman
i was planning to use fly.io for my next project!

what are people using these days to deploy a node app (fastify/sqlite backend + vanillajs front end)? last time i deployed anything it was a rails app to an ec2 instance via capistrano - but that was eons ago.


👤 dang
This submission broke the site rules by drastically editorializing the title. From https://news.ycombinator.com/newsguidelines.html: "Please use the original title, unless it is misleading or linkbait; don't editorialize."

Also, if you want to make an Ask HN, those are supposed to be text posts.

Normally I'd bury this altogether but because this is a YC startup and we moderate less in such cases*, I'm going to moderate it less in this case. Please don't do this in the future though.

* https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


👤 zoomzoom
Just in case any folks are thinking about switching and wondering where else to go, there have generally been 2 alternatives:
- Another PaaS. People love Render, Railway, and Vercel. Heroku is still best-in-class, even if it's not free. Replit now has a PaaS built in too, and it is getting very real.
- Going to the raw cloud, e.g. AWS or GCP. As much as folks say that Terraform, Pulumi, or CDK has made this easy, it's really not the same thing: you can't get a great developer experience without a ton of work.

There's a new class of tools emerging that represents a 3rd way. withcoherence.com (I'm a cofounder) gives you the preview environments, built-in pipelines, and friendly UX that Vercel has set the standard with, while operating against your own GCP or AWS account. Lock-in, uptime, service diversity, compliance, and pricing are all better on AWS/GCP than on a PaaS. Coherence even adds a built-in cloud IDE, giving you a Gitpod or GitHub Codespaces alternative with zero additional config or integration work.

Most of the "PaaS in your own cloud" category is a pile of Kubernetes abstractions. Coherence is something different: a real alternative for teams that are used to a great workflow but don't want to invest the time to glue together open source and IaaS, or that aren't a fit for enterprise-grade CNCF-based tooling.

If anyone wants to check it out more or has any questions, happy to answer them or to help with migrations - just hit up hn@withcoherence.com!