HACKER Q&A
📣 vira28

Do you self-host your database?


Now that there are so many options like AWS, GCP, Azure, DigitalOcean, Heroku, Render, Firebase, Supabase, etc.,

I was wondering: does anyone these days self-host databases on their own infra?


  👤 tut-urut-utut Accepted Answer ✓
For personal use and side projects, I always self-host. It's so much cheaper given the tiny size. I usually start with SQLite instead of a "real" database, and in 99% of cases it stays that way.
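
A minimal sketch of how little ceremony that route takes, using Python's stdlib (the file name and schema are hypothetical):

    import sqlite3

    # The whole "database server" is one file on disk.
    conn = sqlite3.connect("side_project.db")
    conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
    conn.commit()
    print(conn.execute("SELECT id, body FROM notes").fetchall())
    conn.close()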

At work, I never self-host. Life is so much easier if the blame for whatever goes wrong can be outsourced to some cloud provider or internal datacenter team.

Because if we self-host the database, we are the ones responsible when it is not reachable, even if that's because someone in the datacenter changed some seemingly unrelated firewall rule. I want to avoid having to explain anything to our customers. Just "I have no idea, the datacenter team is working on it. You can ask them?" has done wonders for my mental health and job satisfaction.


👤 ciconia
We're currently transitioning from a multi-tenant 2TB Postgres DB hosted on AWS RDS to SQLite, with a separate database for each client.

We're doing this for multiple reasons: a) as our DB grew, the service became very expensive, one of the biggest items on our AWS invoice; b) keeping the PG servers up to date is a pain, and we simply don't have time for it; c) we wanted to be able to migrate to other clouds and even offer a self-hosted version of our platform.
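
A minimal sketch of what the per-client pattern can look like (the tenant names, paths, and schema here are hypothetical, not the actual setup):

    import os
    import sqlite3

    DB_DIR = "tenants"

    def tenant_db(tenant_id: str) -> sqlite3.Connection:
        """Open (and lazily create) one client's private database file."""
        os.makedirs(DB_DIR, exist_ok=True)
        conn = sqlite3.connect(os.path.join(DB_DIR, f"{tenant_id}.db"))
        conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")
        return conn

    # Every request touches exactly one tenant's file; a real version
    # would validate tenant_id before using it in a path.
    with tenant_db("acme") as conn:
        conn.execute("INSERT INTO events (payload) VALUES (?)", ("signup",))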


👤 teddyh
“Self-host” is such a weird word. Having your own stuff yourself should be the default, should it not? I mean, you don’t “self-drive” your car, nor “self-work” your job. The corresponding words instead exist for the opposites: you can have a chauffeur, and you can outsource your job.

I think the problem is entirely caused by the US having absolutely abysmal private internet speeds and capacity. Since you can’t then have your own server at home, you are forced to have it elsewhere with sensible internet connections.

It’s as if, in an alternate reality, no private residences had parking space for cars; no garages, no street parking. Everyone would be forced to use public transport, taxis, or chauffeur services to get anywhere. Having a private vehicle would be an expensive hobby for the rich and/or enthusiasts, just like having a personal server is in our world.

— Me, 2019-10-13: https://news.ycombinator.com/item?id=21235957#21240357


👤 eric4smith
IMHO self hosting your database (even in the cloud) is the best way to do it.

You have control over the version.

You have control over features.

You have control over performance.

It’s tons cheaper for greater performance - especially when you go over a few hundred gigs.

Yes, the hosted ones have built-in replication - but my data is far too valuable to put in the hands of a third party. If they lost it, they could only shrug, say sorry, and that's it. The TOS indemnifies them.

Honestly, I think it’s kind of lazy not to take the management of your data into your own hands if you run a database of any significant size.

That being said, we do replicate and backup 9 ways to Sunday.


👤 buro9
Yes. A Postgres instance which is currently only 1TB in size. It's read-heavy: the write workload is very low, tens of writes per second with spikes into the low hundreds, whilst reads are in the thousands per second. The workload stays stable during traffic spikes thanks to app caching and CDN caching (for guests).

I do this for cost reasons: self-hosting the database is so much cheaper than the managed options that for the same cost I can run the entire app stack, load-balanced web servers, etc.

No containers anywhere but maybe in the future I'll add some.


👤 vanviegen
For self-hosting, MariaDB (or Percona or MySQL) with InnoDB is really unbeatable.

- Version upgrades don't require migrations.

- Replication is pretty easy, well understood and allows version differences between the two ends. Nowadays they even got crash resilience right.

- It doesn't require VACUUM or any other regular maintenance.

- XtraBackup is awesome.

I've been running a pretty large and rather critical MariaDB database for about 15 years (of course it has migrated to a different machine a couple of times), without anything 'interesting' happening. (Except for power failures messing up replication consistency - but that seems to be a thing of the past.)

My experiences managing PostgreSQL were... not as great. (Though with regard to most other aspects, PostgreSQL is a lot nicer than MariaDB.)
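
For a sense of scale, the classic replica setup really is only a few statements. A rough sketch, driven from Python (the hosts, credentials, and the pymysql dependency are assumptions; it presumes the primary already has binary logging and a replication user configured):

    import pymysql

    # Run this against the *replica*.
    conn = pymysql.connect(host="replica.example.com", user="root", password="secret")
    with conn.cursor() as cur:
        cur.execute("""
            CHANGE MASTER TO
              MASTER_HOST = 'primary.example.com',
              MASTER_USER = 'repl',
              MASTER_PASSWORD = 'secret',
              MASTER_USE_GTID = slave_pos
        """)
        cur.execute("START SLAVE")
        cur.execute("SHOW SLAVE STATUS")
        print(cur.fetchone())
    conn.close()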


👤 geocar
Yes. I even go so far as to embed the database in my applications.

In general, this is implemented as two main flows:

- Data collected from "the edge": Web servers that serve ads, or receive form fill-outs for lead generation, that do nothing but record this information in a logfile

- Configuration and Reporting pushed from "the hub": A central processing node (usually in a hot/warm configuration, but I've been known to use things like raft here) that receives information from flow 1, makes a decision and writes the result to flow 2.

Because the "data" I want to materialise to clients always either fits into RAM or is in a coherent stream ready to deliver to a client as-is, my applications are very fast: I can always just add another node (in the right geographic place). I also noticed another really interesting benefit: it is much easier to develop against than an external database.

I have a general purpose tool I wrote decades ago that moves logfiles around reliably (and at low latency), merges them together, implements failover, etc, so I really don't have very much to write to make a new application. Sometimes I experiment with the encoding of the payload, but sometimes not.
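
That tool is bespoke, but the core "merge time-ordered logfiles" step is small enough to sketch as a toy (the JSON-lines format and field names are assumptions):

    import heapq
    import json

    def read_log(path):
        """Yield (timestamp, record) pairs from a JSON-lines logfile."""
        with open(path) as f:
            for line in f:
                rec = json.loads(line)
                yield rec["ts"], rec

    def merged(paths):
        """Merge k already time-ordered edge logs into one global stream."""
        streams = (read_log(p) for p in paths)
        return heapq.merge(*streams, key=lambda pair: pair[0])

    # Hub side: consume the globally ordered stream and decide.
    for ts, rec in merged(["edge-us.log", "edge-eu.log", "edge-ap.log"]):
        print(ts, rec)  # decision logic would go here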


👤 vbsteven
Yes, I self-host Postgres/Redis/Mongo for small projects (DB + app server on the same machine, or a small number of VMs). Usually in Docker, sometimes systemd units. It's amazing what you can do on a single $5-50/month machine if HA isn't super important.

On larger projects (typically once k8s gets involved) I'm running on a cloud provider anyway and I might as well use a hosted version like RDS for the main database.

It comes down to the importance/budget of the project. I'm not a Postgres expert by any means, but I'm confident enough that for simple use cases I can manage self-hosted. And if I need more, hosted versions are available at a cost.

However, any hosted DB product I use has to be open source and in theory easily replaceable with a selfhosted version. After the Parse.com fiasco I'm averse to closed/proprietary components in my infrastructure.


👤 don-code
When I last tried it, AWS RDS still required EBS-backed disks. That may be acceptable for 99% of use cases, but it means there's a ceiling around 60k IOPS, even with PIOPS disks.

Outside of RDS, you're afforded a bit more creativity in your performance profile: multiple disks in RAID 0, tiered storage (burstable AND, rather than OR, provisioned IOPS storage), and instance store, to name a few. I actually once witnessed one of our databases riding out a 400K IOPS storm for a few minutes.

Before I get downvoted: yes, these introduce risk into your data architecture. It's crucial to understand what these choices will do to your failure profile, and plan accordingly. In our case, fast-failover to a hot spare database, combined with total recovery within two hours, was enough of a compensating factor to manage that risk.


👤 kureikain
I self hosted everything from Postgres to ElasticSearch.

IMHO, managed solutions aren't that much more reliable than running your own with the right planning. The failover strategy of AWS RDS is just absurd: it literally just promotes another instance and switches DNS over. And I do see issues where the new master is behind by a few transactions compared with the replica...

A managed database just moves the responsibility for the database being down to somebody else.

Self-hosting gives you a lot of flexibility to mix and experiment with new technologies and tooling.


👤 herbst
In the 6 years of running all the infrastructure for multiple thousands of daily clicks myself, I only once had a corrupt database, and I fixed it about 15 minutes after I got the bug report via email. And that was 5 years ago.

So yeah, I'm running my own Postgres, one instance on each of my web servers actually. Mostly because it's simply so much cheaper, and I hate monetary growth limitations on my projects.


👤 cakoose
This article convinced me to go managed for Postgres: https://rbranson.medium.com/10-things-i-hate-about-postgresq... (2020)

> The good news is that the pain caused by many of the issues brought up in this post can be reduced or eliminated by using a managed database service like Heroku PostgreSQL, Compose PostgreSQL, Amazon RDS for PostgreSQL, or Google Cloud SQL for PostgreSQL. If you can use one of these services, for the love of all that is holy, please do!


👤 Glyptodon
Last place I worked self-hosted. Price performance on old hardware was (and seemingly is) drastically better than cloud DB for smaller usage. Current work is using AWS RDS and it seems to perform pretty poorly unless we're willing to pay a lot more than our scale justifies. (Frankly, I suspect running our prod DB on a developer laptop would outperform the low-cost RDS setup we're using...)

👤 tpetry
I do host a medium-sized PostgreSQL database, but I would love not to. I need the control you won't have with a cloud offering (you can't install extensions), but I don't want to do all the other work, and you always have the fear that backups are not working correctly.

And I am not the only one hosting it on my own; many people are using self-hosting PaaS like Dokku, Flynn, or CapRover. All these solutions have a common problem: they each need to reinvent the database-as-a-service layer. What is really missing is a good open-source PostgreSQL-as-a-service appliance, something "simple" like Heroku. There have been attempts like Elephant Shed, but they all try to do too much and therefore fail at adoption because they never reach stability. Or people are forced to use complex solutions like Patroni, which does many things, but if something fails you have no clue what to do.

So what is really missing?

1. A simple old-school PostgreSQL VM image.

2. Built on a copy-on-write file system, so production environments can be cloned for development (like https://postgres.ai/ and Heroku offer) without really needing any extra storage space.

3. A built-in backup solution like pgBackRest, storing the files encrypted in some cloud storage (and restore options!).

4. No complex UI or inspection/monitoring software needed; there are many solutions you can run on a different machine.

5. Replication, auto-failover, etc. are hard; just make it a single server that you scale vertically. If you need horizontal scaling you have very specific needs, and there can't be a one-size-fits-all solution, so don't even try. And a real bare-metal server with NVMe disks has a lot of power; it's insane how fast it is compared to cloud-hosted databases.


👤 dublin
Yes. A few years ago, I was all-in on the cloud. Today, we have way too many examples of cloud providers shutting down people they disagree with, for reasons that usually have nothing to do with terms of service. Given the rapidly expanding scope of speech that is deemed "unacceptable", this is a real problem for everyone, not just conservative or free-speech sites.

One example of many: America's Frontline Doctors, famous for their dissemination of factual information about Hydroxychloroquine and Ivermectin for Covid treatment, had their hosting pulled by AWS just a few weeks back, despite the ever-growing evidence that their stance is scientifically supported. (And I don't care if they were lying their butts off, AWS still shouldn't have pulled their hosting.)

Cloud services offer awesome leverage, which is a huge advantage, but anyone relying on them for their business or organization to operate is being completely irresponsible.


👤 latch
We host our own CockroachDB cluster on 3 baremetal machines.

We also host 2 separate PostgreSQL instances, each with an asynchronous replica (which serves some of the read queries) and with Barman for PITR.

The PG instances used to be on RDS. The performance difference (and cost savings) is... at least 10x.

We have no devops/sysadmins.


👤 lbriner
I'm not sure whether you are asking whether people use Database-as-a-service instead of installing the DB themselves or whether you are asking about on-premise vs cloud Infrastructure-as-a-service.

As many have noted, DaaS is very convenient but not always completely flexible with configuration, and the pricing is not comparable since there is usually a minimum price even if you only want 1 table with 10 rows, so it won't scale in the same way.

For all of my home and work projects, we have servers hosted in the cloud to get better internet bandwidth, and we install and run our own SQL Server and MySQL instances. A bare VM is pretty cheap in the cloud and installing databases is fairly easy, although I have never personally set up clustering or failover in MySQL or SQL Server; I think that is quite involved.


👤 bsenftner
I embed an SQLite3 DB into the application, which self-hosts the DB as well as its own Internet/web server. The application syncs with other copies of itself running in other locations. The application hosts its own DB, its own server, and has a RAM footprint of under 2MB before the DB is loaded into the memory-mapped backing file. The application (video security) is screamingly fast, cross-platform, and 100% self-contained. It works fine on an air-gapped network. Its biggest issue is that its hardware footprint and operating expense are so much lower than every other option that it makes people suspicious. That suspicion should fall on the crappy other options, so bloated they are a sad, expensive joke.

👤 sumanthvepa
Yes. I self-host for my infrastructure: MariaDB and PostgreSQL. I've been self-hosting MySQL and Postgres for nearly 15 years now. But it's not something I recommend my clients do. You need to be really careful to avoid footguns like the one that took down NewsBlur. For example, I don't run my DBs in containers, although I run my applications within containers.

👤 lolive
I wonder if it is a good deal or not, but I host my own Virtuoso instance, an open-source DB for graph data, with a 200GB .db file. Plus an Elasticsearch instance with roughly 500GB of data. All on an OVH dedicated instance with 128GB RAM, for something like €1,100/year.

Would a hosted service be a better idea, in your opinion? (That machine also has a LAMP stack for some of my web stuff.)


👤 saltcod
I used WordPress as my datastore for a while on personal projects. I could mostly wrestle it into shape and use the built-in REST API. WP is my day job, so I know it well and like it, but I'm not a backender, so it was a bit clunky and time-consuming to get things the way I wanted.

Then I needed to host that WP instance somewhere. That was also a pain.

I decided last year to try Node/Express/Mongo for my backend instead and stay fully in JS land. But then I needed to host a Mongo DB. Instead of that, I tried Atlas for hosted Mongo, but... it feels like it's from corporate America circa 2004.

Then came Supabase.

I randomly came across it, gave it a try, and it's heaven. It's absolutely dead simple to get working (for my simple needs). It can work as a regular DB or as a realtime thing like Firebase.

Strongly recommend trying Supabase.


👤 mschaef
For personal stuff, I tend to keep it about as simple as possible with an HSQLDB instance hosted in-process. (All JVM/Clojure)

I use a small library that encapsulates the use-case (including some schema migration stuff that should probably be replaced with flyway).

https://github.com/mschaef/sql-file

There's a lot that can be said about this approach, both for and against, but I find that it brings a lot of power, is easy to set up, and generally stays out of the way. Given that the fundamental abstraction presented to user code is essentially SQL with connection pooling, there are also good escape strategies to get to architectures that would support however much capacity I'm ever likely to need.


👤 juangacovas
Self-hosting MariaDB with multi-master replication. Works pretty well for my cases. I have a friend who was surprised by the AWS pricing changes on Aurora (they charge per query now instead of bandwidth/usage?).

👤 howaboutnope
The idea of "outsourcing blame" mentioned in several comments seems really weird to me: If I make the choice to outsource something, and whoever I outsourced it to fucks up, I'm still the one who made the decision to outsource it. The same goes for outsourcing to someone who does a better job than I could have: then that was a great idea, and yay for me.

👤 BozeWolf
Right now we (team of 12, 1 devops engineer) use MongoDB Atlas. While MongoDB/NoSQL itself really is not giving us any value, and I really wish PostgreSQL had been picked for our relational data, I must admit that Atlas is a nice product.

Hosting is not only about speed and price. We use Atlas (= hosted MongoDB) because we get that nice dashboard with all those statistics and performance-tuning hints. Also, if we screw something up, Mongo engineers are available immediately to help us. That was nice when we broke production because of index creation in the foreground (which PyMongo defaults to, even though MongoDB itself defaults to background creation). We also consulted them about tuning a frequently running "slow" query, and that helped.

In short: hosted solution for support and less maintenance. Backups are arranged. We can choose which version to run on and upgrades are also arranged.

Question: is there a comparable service for PostgreSQL like MongoDB Atlas?


👤 marvinblum
We're hosting Postgres on HashiCorp Nomad and ClickHouse on a separate VM for Pirsch [0]. The Postgres DB is only a few KB (maybe it's a few MB now, I haven't checked in a while), as it is only used for user accounts, settings, and some configuration. It doesn't do much, so it's doing OK in the cluster using a host volume on one of the machines. ClickHouse uses more storage (I don't know how much right now, but it should be less than 100MB) and more resources, and therefore lives on its own VM.

The main reasons we self-host are privacy and cost. Postgres costs almost nothing, because it's part of the cluster we require anyway (also self-hosted), and ClickHouse can be scaled as needed. Hetzner has some really cheap VMs; our whole setup, including the cluster, costs about €45 a month.

[0] https://pirsch.io/


👤 choffee
No. Unless your business is hosting databases for people, your time can be better spent delivering real business value. Draw yourself a Wardley Map of the services in your product and you will see that the place you should be spending time is delivering features that the customer values, not building copies of commodity services.

👤 strzibny
Self-host for me.

You can use a cheap virtual server + attached storage. In my mind, it's not as difficult as people would have you believe (until you hit a scaling issue and want horizontal scaling, for example).

I also teach how to do it in my upcoming book[0] where I even have a scripted demo to create your first cluster including SSL, SELinux, attached storage,...[1].

For work stuff, I would just use a managed offering in the cloud the company already has. So far, that has been AWS and Azure.

[0] https://gumroad.com/l/deploymentfromscratch [1] https://gist.github.com/strzibny/4f38345317a4d0866a35ede5aba...


👤 renewiltord
Used to do this for myself and it worked great for over a decade. But because it was a decade old I had never made it easy to replicate. It was a pet.

I know better now, but also all of those other things are easier when they’re a terraform config than when they’re manually managed. I use RDS now.


👤 hedwall
We don't; the maintenance cost in man-hours was way too high, and replacing it with AWS Aurora made it at least as reliable with a lot less overhead.

On the plus side, we can do ad hoc tests and experiments by creating a new Aurora cluster from a recent snapshot and trying things out.


👤 yogevyuval
I used to self-host and moved to RDS (Postgres); I haven't looked back since. It's one thing to just run a DB with Docker, but it's much different when you are talking about version upgrades, log management, snapshot & restore, etc.

👤 andix
I never used a managed database.

For smaller projects it's too expensive. I just deploy a dockerized database next to the application.

For bigger projects every customer so far wanted to have the data physically in their data center. So we just installed databases on VMs.


👤 bob1029
Yes, but we took it 1 step further and put the database inside the application.

Sounds crazy until you learn that you can expose user-defined functions to SQL and eliminate all network overhead when it runs in the same process. SQLite operations are effectively a direct method invocation across a DLL boundary. If you want queries to reliably resolve within microseconds, this is a great path to go down.

Not for every application, but it fits ours extremely well. Deploying our software to customer environments is trivial because we only have to worry about the 1 binary image. No external databases, docker hosts, etc. are required to be installed.
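
The .NET details aren't shown above, but the same trick in Python's sqlite3 gives the flavor (the function and table names are illustrative):

    import sqlite3

    def word_count(text: str) -> int:
        """Application logic exposed to SQL; called in-process, no network hop."""
        return len(text.split())

    conn = sqlite3.connect(":memory:")
    conn.create_function("word_count", 1, word_count)

    conn.execute("CREATE TABLE docs (body TEXT)")
    conn.execute("INSERT INTO docs VALUES ('embedded databases are fast')")

    # The engine calls straight back into the application function.
    print(conn.execute("SELECT word_count(body) FROM docs").fetchone())  # (4,)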


👤 sigio
Yup, self-hosted MariaDB/Galera clusters, postgres clusters (many), rabbitmq, redis, and many others. Self-host everything, due to legal/governmental requirements, and also because it's usually cheaper and faster.

👤 GianFabien
I used to deploy apps using Google AppEngine and DataStore. For most projects, I now just use AWS EC2 with EBS and self-managed Postgres or SQLite.

In my experience, it is rare to have 100+ TPS applications. As for FAANG-scale infra: YAGNI!


👤 hughrr
Yes. Our stuff won’t fit on any cloud instances for a reasonable amount of money.

👤 mceachen
This decision is easy: if you or your team is comfortable with the care and feeding of the DB and you know how to harden and monitor the thing, save the money.

If y'all don't want to be on pager duty, though, pay someone else to manage it.


👤 axegon_
For work projects: no, I haven't in many years, since 2016 IIRC. For personal projects: yes. I have plenty of hardware resources to spare, so it costs as much as the electricity needed to run them (which they draw regardless). That said, backup dumps go to a GCP bucket, since that's far less likely to fail than one or two drives shoved in the basement, and the storage itself is incredibly cheap. Besides, personal projects in my case are not a source of income, so cutting costs as much as possible is the sensible thing to do.

👤 artificialLimbs
We self-host an IBM DB2 database. 4 hours of unscheduled downtime in about 20 years. It was almost the only thing unaffected when we got hit by ransomware last year.

👤 SergeAx
Interestingly, there are two stages of any project where self-hosting is way more efficient.

First is the MVP stage, when you do it on a single VPS for $5/month + $1 for backups.

Second is the multifold scaling stage, when you rent or even buy a couple of cabinets of bare-metal hardware.

BTW, AWS Postgres does not support master-master replication, which makes no-downtime migration pretty hard. Just remember those small quirks that make vendor lock-in stronger.


👤 dith3r
I'm hosting my own MySQL, Mongo, and Elastic clusters. At the beginning it takes more time to set up than cloud providers' solutions and requires more knowledge and tooling for some ops (upgrades, backups, etc.). But you know which version of the DB you are running and how it is configured. Additionally, the cloud providers' own solutions bind you to one of them.

👤 joshxyz
Hmmm, it varies: if it's corporate, it's managed on cloud vendors; if it's personal or hobby projects, it's unmanaged.

👤 thorin
A lot of big companies do. The large UK utility I work for has dozens of large production Oracle DBs. I'm sure this is also the case with SQL Server / Postgres / DB2(!). The tide is turning though, and for new systems/projects without strict regulatory requirements I'm sure cloud will become the primary choice.

👤 bognition
The company I work for self-hosts our databases: HBase, Elasticsearch, MySQL w/ Vitess, Kafka, and a few other things. Our scale is large enough that it makes a lot more sense to pay for engineers than to pay for a service.

I work in the Data Infrastructure group and am more than happy to answer any questions about it.


👤 kayman
For bootstrapped projects without PMF, which is what my side projects are, I self-host.

Postgres hot-hot with 2 servers.

For clients I always recommend PaaS.


👤 PikachuEXE
Self-hosting PostgreSQL via Docker containers + Ansible. Also hosting MongoDB, Elasticsearch, and Redis instances (for sessions & caching, separate instances).

Might consider researching PgBouncer later (but it's really not needed right now).

Pricing and support for vendor solutions are too bad for a small-scale app.


👤 scottydelta
I am self-hosting Postgres, containerized on the server, with automatic encrypted backups to S3.
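
Not the actual scripts, but a sketch of that flow under assumed names (pg_dump on the PATH, PG* env vars for the connection, boto3 for the upload, and S3 server-side encryption standing in for "encrypted"):

    import subprocess
    from datetime import datetime, timezone

    import boto3

    BUCKET = "my-db-backups"  # hypothetical bucket
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dump_path = f"/tmp/backup-{stamp}.sql.gz"

    # Dump and compress; connection details come from PG* env vars.
    with open(dump_path, "wb") as out:
        dump = subprocess.Popen(["pg_dump", "mydb"], stdout=subprocess.PIPE)
        subprocess.run(["gzip"], stdin=dump.stdout, stdout=out, check=True)
        dump.wait()

    # Upload with encryption at rest.
    boto3.client("s3").upload_file(
        dump_path, BUCKET, f"postgres/{stamp}.sql.gz",
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )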

Besides this, I also run a self-hosted, production-grade Elasticsearch cluster which I manage via Ansible.

Both of these would be insanely expensive if I chose any of the managed solutions from AWS or GCP.


👤 oneplane
For anything in prod, not anymore.

For anything in the big clouds, also not anymore.

For everything else: "it depends", but hyperlocal and datacenter stuff is generally a mix of self-hosted and third-party-managed-self-hosted.


👤 flowerlad
If portability from one cloud to another is important to you, then self-hosting your database is the only way to go. I use Kubernetes and have MongoDB running in a Docker container.

👤 furstenheim
For Postgres, if you self-host you can create ad hoc native extensions. Most of the time it's not needed, but I've seen 2-4x speedups in critical queries.

👤 sgt
Yes, self-hosting personally, and also large mission-critical enterprise systems at work, primarily based on PostgreSQL, self-hosted on robust VM infrastructure.

👤 onion2k
Yes, in the sense of running a container in the cloud. The security requirements for what I work on mean it's pretty much impossible to do it any other way.

👤 blodkorv
We do, simply because it's cheaper and there is not much to take care of with our workload.

We are running MariaDB on a Debian VM in Azure and using both VM backups and mariabackup.

Works fine.


👤 tgv
We have a few independent systems. Each runs on a two core VM with room to spare, so why not run the db server there too? It literally costs nothing.

👤 geschema
Please enlighten me. I run a commercial web application (Spring + MySQL) on a Linode VPS. Why would I want to host my database elsewhere?

👤 dengolius
Yes. We use MySQL setups, Percona XtraDB Cluster, and ClickHouse servers, and it's fine to run them ourselves because we work on bare-metal servers.

👤 bkovacev
Where does one start when wanting to self-host a DB?

👤 tobyhinloopen
Yes, we have multiple physical and virtual servers. Why? Because I don’t like the added abstraction of hosted solutions, and the costs.

👤 stanislavb
Yes, of course. It’s not that scary or difficult.

👤 Aeolun
I do it for my own projects. For work, where cash is not a constraint, I use managed options (mostly AWS RDS).

👤 YuukiRey
I use SQLite with litestream.io for backups on personal hobby projects.

At work I would always go with a managed DB.


👤 lewisjoe
Yes, I self host my databases. Gives me control to choose my own backup mechanism and costs much less.

👤 tonymet
You’ll pay much more using cloud-hosted DBs vs. bare metal. Cloud hosting bills you by metering IOPS.

👤 kxrm
Yup, still self host the DB as I have been doing for over 20 years. Have 40GB stored in MariaDB.

👤 linux_devil
For production, I use a managed service specific to MongoDB, which is Atlas.

👤 thrixton
We self-host a postgres cluster in k8s (AKS).

The main reason being that the Azure postgres offering was not fit for (our) purpose.

We did start with the Azure offering, and our default is generally to use the cloud offering.


👤 aorth
Every single one! The cloud is a trap.

👤 pjmlp
Yep, all the time.
