Now I know that MySQL is the most popular (because there's a banner on mysql.com that says so! /s), and I get the sense that on raw speed alone, it's probably faster (this is purely anecdotal), but folks that use PostgreSQL speak about it with an air of superiority.
So I'm hoping some of you HN users can shed light on this, based on your experience.
Is there a clear winner, or is it purely a case of "it depends"?
(and if it's the latter, is there a generally agreed comparison list of pros and cons that I can refer to?)
Thanks!
The philosophy of MySQL was to do the best it can, as fast as it can, for as many users as it can. Not necessarily the 'right' or 'best' way.
It's not like either philosophy is formally documented or mandatory, but each "generally" matches up pretty closely to real-world behavior.
The biggest problem you'll have evaluating them is that both products are incredibly old and have changed over the decades, so you'll find plenty of web search results DEMANDING that MySQL can't do transactions, for example, because that was written 25 years ago; transactions have been a solved problem on MySQL for decades now. Similar issues exist for tooling, library support, and clustering. I wouldn't trust a many-years-old blog post for either product, LOL.
For anything other than a data-oriented startup, I'd pick Postgres all day long.
The troubles each database gives you do have different flavours. Postgres's query planning is mercurial and sometimes quixotic. It can be tedious forcing it to do the right thing, and it does sometimes switch strategy resulting in queries taking 100x or more longer than they used to, just because some stats changed. MySQL's query planner is more stupid, more predictable, and less likely to give you this kind of pain. OTOH you have to be more careful writing queries and you sometimes end up adding hints or encoding a query structure which locks in a query plan which might not be the best one long term. But it usually degrades gradually rather than suddenly. I'm not sure which is worse, though, because gradual degradation doesn't demand fixing.
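For illustration, the kind of MySQL hint the parent means, with hypothetical table and index names (Postgres core has no equivalent hint syntax, though extensions like pg_hint_plan exist):

    -- Pin MySQL's optimizer to a known-good index so the plan
    -- can't flip when table statistics drift.
    SELECT o.id, o.total
    FROM orders o FORCE INDEX (idx_orders_customer_created)
    WHERE o.customer_id = 42
    ORDER BY o.created_at DESC
    LIMIT 20;

The trade-off is exactly what the parent describes: the plan is now frozen, even if a better index shows up later.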
Postgres back then was completely different from what it is now. It was mainly something for PhD and Master's students to hack on for their research. It was a mess internally and was hardly usable for production. I don't think MySQL went through this style of development.
What eventually happened is that around 1995 SQL was added to Postgres and a bunch of non-Berkeley people started hacking on it. They did a fantastic job, and deserve all the credit for making it what it is today (Stonebraker has publicly said this).
I find it useful to remember that MySQL has various storage engines that make different trade-offs (and may themselves have different options, such as various row formats). When using the default InnoDB, for instance, knowing that it uses index-organized tables (the data is stored in a B-tree rather than a heap) can influence which workloads perform well, i.e. whether the primary key is heavily used and load-bearing.
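A hedged sketch of what that looks like in practice with InnoDB (table and column names are hypothetical):

    -- InnoDB stores rows in the primary-key B-tree (index-organized),
    -- so a compact, monotonic PK keeps inserts appending at the right
    -- edge of the tree, and every secondary index carries a copy of the PK.
    CREATE TABLE events (
      id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      device_id   INT NOT NULL,
      recorded_at DATETIME NOT NULL,
      payload     JSON,
      PRIMARY KEY (id),
      KEY idx_device_time (device_id, recorded_at)  -- implicitly includes id
    ) ENGINE=InnoDB;

A wide or random primary key (e.g. a UUID string) inflates every secondary index and scatters inserts across the tree, which is the kind of workload consideration the parent is hinting at.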
Postgres often excels when queries are more complex; for instance, partial indexes (which cover only a subset of a table via a "where" clause in the index definition) can be very powerful.
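A minimal sketch of a partial index, with hypothetical table and column names:

    -- Only rows matching the WHERE clause are indexed, so the index
    -- stays small and hot for the common query.
    CREATE INDEX idx_orders_pending
        ON orders (customer_id)
        WHERE status = 'pending';

    -- A query with a matching predicate can use it:
    SELECT * FROM orders
    WHERE status = 'pending' AND customer_id = 42;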
One other point of differentiation that has been useful for me to consider on various projects is programming within the database (stored programs). For writing stored procedures/functions, Postgres has many high-quality procedural languages (some shipped in core, like PL/Perl; others external, like PL/R) that you can use to express them.
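A minimal PL/pgSQL sketch (plpgsql ships in core; the table and column names are hypothetical):

    CREATE FUNCTION order_total(p_order_id BIGINT)
    RETURNS NUMERIC
    LANGUAGE plpgsql
    STABLE
    AS $$
    BEGIN
        -- Sum the line items for one order, defaulting to 0 if there are none.
        RETURN (SELECT COALESCE(SUM(quantity * unit_price), 0)
                FROM order_items
                WHERE order_id = p_order_id);
    END;
    $$;

    SELECT order_total(42);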
- Streaming Replication (keep a secondary instance for contingency or moving heavy queries away from the main database);
- Procedures in various languages;
- Foreign Data Wrappers for integration with other databases (much sturdier than Oracle's Heterogeneous Services, and there's even an FDW for Elasticsearch);
- Backup/restore is quite easy as is upgrading.
The only thing I miss are packages like Oracle's.
So, unless I have no other option, I'll choose PostgreSQL.
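A sketch of the FDW workflow using postgres_fdw (the Postgres-to-Postgres wrapper that ships in contrib; wrappers for other systems follow the same pattern). Server names, schemas, and credentials here are made up:

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER reporting_db
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'reporting.internal', dbname 'warehouse');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER reporting_db
        OPTIONS (user 'readonly', password 'secret');

    -- Pull the remote table definitions into a local schema:
    CREATE SCHEMA reporting;
    IMPORT FOREIGN SCHEMA public
        FROM SERVER reporting_db INTO reporting;

    -- Remote tables are now queryable (and joinable with local ones):
    SELECT count(*) FROM reporting.sales;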
To be fair, my PostgreSQL experience is more recent, and both products have matured a bunch, but I've also seen plenty of the bad old MySQL problems—like MyISAM tables and 3-byte unicode—in just the last couple of years.
No, not really. All else being equal:
I'd say that today, it's much muddier than it was maybe ten years ago, before the data science explosion and cloud explosion.
That is, turn-key cloud Postgres is not Y2K Postgres, and MySQL post-Maria is not Y2K MySQL. On the Postgres side, administration has gotten much easier since Heroku got involved and Andy Pavlo started talking about self-driving databases. On the MySQL side, it can do text search well enough that I sat pair programming it with a young engineer around 2016-2017, when previously I'd have argued you were wasting your time.
In the end, it probably comes down to how you rank your requirements:
usability, performance, suitability for a particular purpose.
Things like PostGIS and pgRouting have few equals, so, like others have said, they can be huge differentiators if you need them and see the advantage in doing computation in the DB backend without invoking the N+1 query problem in aggregate.
I'm biased since I used MySQL before the inefficient backend was deprecated, and I used PostgreSQL to do all manner of experimental EAV implementations using jsonb, including various forms of tuples from Rich Hickey.
All hail Stonebraker.
The fact that it overlaps with MySQL if your use case is "backend for a common ORM" barely makes them comparable.
I recommend https://pgloader.io if you're currently on MySQL and want to switch your existing app.
Coming from Oracle, I've been able to use PostgreSQL pretty much as though it were Oracle, including Oracle's optimistic locking strategy. Some of the SQL functions are a little different, but it seems the underlying philosophy is the same. In other words, PostgreSQL seems like Oracle, just a little different.
MySQL has always felt completely foreign to me. Depending on your background and familiarity with other RDBMS's though, you may be right at home with it. Wouldn't be a bit surprised if someone had a similar anecdote to mine replacing PostgreSQL with MySQL and Oracle with SQL Server.
If you don't have a personal preference for either system then do some benchmarking on the platform you'll be running them on and the loads you're expecting and see if either particularly excels over the other for what you're intending to do.
All other things being equal they actually perform about the same for our use case.
Then let's talk replication: MySQL GTID/binlog replication is stupid easy to set up. And _it just works_ and recovers really easily. You can make hundreds of near-real-time replicas.
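A rough sketch of the GTID setup, assuming MySQL 8.0.23+ syntax (older versions use CHANGE MASTER TO / START SLAVE); hostnames and credentials are made up:

    -- In my.cnf on both servers: gtid_mode=ON, enforce_gtid_consistency=ON
    -- Then, on the replica:
    CHANGE REPLICATION SOURCE TO
        SOURCE_HOST = 'primary.internal',
        SOURCE_USER = 'repl',
        SOURCE_PASSWORD = 'secret',
        SOURCE_AUTO_POSITION = 1;  -- GTID auto-positioning: no binlog file/offset bookkeeping
    START REPLICA;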
Finally, the tooling. I find MySQL's tooling far easier to use than Postgres's. Nearly everything is SQL. You have to memorize configuration in both cases, so take this last point as an opinion.
Both are a good choice. I'd say if you need absolute single-server performance, Postgres is probably the better option, but understand you're only getting marginal gains. If you want to make hundreds of read-only replicas or use their semi-sync HA plugin, MySQL is probably the way to go, but understand that multi-master writes are still not really a thing.
Today, both databases are well supported as a service on major cloud platforms. Postgres does have some performance characteristics that led Uber to drop it [0], but this in itself is not a reason to not choose it, because most teams don't have their kind of workload.
If I were to start a new project today however I would seriously consider Postgres given that Oracle is in my opinion fairly hostile. For example, they re-licensed the .NET driver from LGPL to GPL so that anyone updating would either have to buy a commercial license or open-source their entire product. While there are OSS .NET drivers, this move alone is enough to make me think twice about using it again for a future project.
[0] https://www.uber.com/en-RO/blog/postgres-to-mysql-migration/
PostgreSQL is a great database, it's hard to go wrong with it.
I mean, unless there's something you know for sure that PostgreSQL won't do for you, it's a great default choice (even over commercial options such as Oracle or SQL Server).
Our access patterns are mostly analytical. A lot of scanning and aggregating has to be done. Nobody cared about clustering before so we had to change a lot of table schemas to make our IO efficient.
I've only heard from colleagues that such issues are easier to handle in Postgres because it is more flexible. But for MySQL, you have to understand that the primary key and the clustering key are always the same thing.
(Of course this depends on the storage engine, but it's the behavior of the standard engine, InnoDB.)
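A sketch of the flexibility difference, using Postgres's one-off CLUSTER command (it rewrites the table once in index order and does not maintain that order for new rows); names are hypothetical:

    -- Postgres tables are heaps, so you can (re)cluster on any index
    -- as a maintenance step (takes an exclusive lock while rewriting):
    CREATE INDEX idx_meas_device_time ON measurements (device_id, recorded_at);
    CLUSTER measurements USING idx_meas_device_time;

    -- In InnoDB the equivalent physical ordering is fixed by the PRIMARY KEY,
    -- so changing it means changing the primary key itself.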
The main pro about postgresql is that it’s immensely versatile and can do an incredible amount of things.
The main pro about mysql is that there are some open source and battle-tested multi-master extensions for high availability.
* If you have pre-5.6.4 date/time columns in your table and perform any kind of ALTER TABLE statement on that table (even one that doesn't involve the date/time columns), it will TAKE A TABLE LOCK AND REWRITE YOUR ENTIRE TABLE to upgrade to the new date/time types. In other words, your carefully-crafted online DDL statement will become fully offline and blocking for the entirety of the operation. To add insult to injury, the full table upgrade was UNAVOIDABLE until 5.6.24, when an option (still defaulted to off!) was added to decline the automatic upgrade of date/time columns (there's a sketch of it after this comment). If you couldn't upgrade to 5.6.24, you had two choices with any table with pre-5.6.4 types: make no DDL changes of any kind to it, or accept downtime while the full table rewrite was performed. To be as fair as possible, this is documented in the MySQL docs, but it is mind-blowing to me that any database team would release this behavior into production. In what world is the upgrade of date/time types to add a bit more fractional precision so important that all online DDL operations on that table will be silently, automatically, and unavoidably converted to offline operations in order to perform the upgrade? To me, this is indicative of the same mindset that released MySQL for so many years with the unsafe and silent downgrading of data as the default mode of operation.
* Dropping a table takes a global lock that prevents the execution of ANY QUERY until the underlying files for the table are removed from the filesystem. Under many circumstances, this would go unnoticed, but I experienced a 7-minute production outage when I dropped a 700GB table that was no longer used. Apparently, this delay is due to the time it takes the underlying Linux filesystems to delete the large table file. This was an RDS instance, so I had no visibility into the filesystem used and it was probably exacerbated by the EBS backing for the RDS instance, but still, what database takes out a GLOBAL LOCK TO DROP A TABLE? After the incident, I googled for and found this description of the problem (https://www.percona.com/blog/2009/06/16/slow-drop-table/) which isn't well-documented. It's almost as if you have to anticipate every possible way in which MySQL could screw you and then google for it if you want to avoid production downtime.
There are others, too, but to this day, those two still raise my blood pressure when I think about them. In addition to MySQL, I've pushed SQL Server and PostgreSQL pretty hard in production environments and never encountered gotchas like that. Unlike MySQL, those teams appear to understand the priorities of people who run production databases and they make it very clear when there are big and potentially disrupting changes and they don't make those changes obligatory, automatic, and silent as MySQL did. IMO, MySQL has its place for very specific workloads, but if you don't have deep enough knowledge of DB engines and your workload to know that yours is one of those specific workloads, you should default to PostgreSQL.
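For the first gotcha above, a hedged sketch of the 5.6.24+ opt-out the parent mentions (check the docs for your exact version before relying on it):

    -- MySQL 5.6.24+ only (defaults to OFF): decline the automatic upgrade
    -- of pre-5.6.4 date/time columns so that unrelated ALTER TABLE statements
    -- don't turn into locking full-table rewrites.
    SET GLOBAL avoid_temporal_upgrade = ON;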
MySQL has very mature cluster managers like Vitess / Planetscale. Citus for Postgres was bought by Microsoft and doesn’t seem as popular as Vitess.
Having said that, my preference is for PostgreSQL these days. It feels more ergonomic for the things I do (fairly boring, normal database kinda stuff).
Most answers allude to one of two things: either they are kinda the same and both full-featured, or they have different philosophies. Neither really hints at how you pick one.
If you are already using MySQL, Postgres isn't sufficiently better to justify the cost of migrating.
If anything, Postgres has a much clearer development model. Lots of features in the pipeline.
(Meanwhile a senior dev in Iowa is picking up cow shit, and not regretting his career change.)
- Take data types, for example. Seems like something simple, right? You can grok all data types in PostgreSQL in an afternoon. Want to grok MySQL data types? You're going to waste an entire week on them. There are multiple data types that seem to do the same thing with very small differences and a lot of incompatibilities... and a lot of exceptional behavior (see point below). MySQL data types are LESS powerful, less flexible, and 10x more complicated than PostgreSQL data types.
- Enums and sets need to have their possible values defined on a table-by-table basis. What a nightmare to keep them in sync. PostgreSQL allows you to do this once for the whole DB (see the sketch after this list).
- Open the MySQL manual and it's full of passages like: feature A does "blah" if mode X is enabled, but does "bar" if mode Y is enabled, and will corrupt data if a certain storage engine is used with mode Z enabled.
- SQL strict mode can be disabled by the client. You cannot enforce that your server will work only with strict mode on, because a client can simply open a connection with strict mode off. You have to trust your clients. (This other comment explains strict mode: https://www.reddit.com/r/PostgreSQL/comments/xblooo/comment/...)
- According to the manual, if client A does stuff with strict mode on and client B does stuff with strict mode off, data corruption will occur.
- A lot of surprising and unexpected behavior, even with SQL Strict mode on
- String comparison is case-insensitive by default. There are workarounds, but they are still workarounds.
- A lot of behavior is dependent on server configuration. So switching between servers you can't expect them to behave the same.
- Behavior not standardized among versions and installs
- Changing the server SQL mode after creating and inserting data into partitioned tables can cause major changes in the behavior of such tables, and could lead to loss or corruption of data
- Useless error messages (see https://www.reddit.com/r/PostgreSQL/comments/xblooo/comment/...)
- MySQL, at the same time it is less powerful and flexible, has a lot more levers and knobs than PostgreSQL. PostgreSQL, while more powerful and flexible, conceptually is a lot simpler than MySQL.
- PostgreSQL stays out of your way; with MySQL you’re constantly fighting it
PostgreSQL -> simplicity
MySQL -> complexity
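Referring to the enums point above, a minimal sketch of the difference (type and table names are hypothetical):

    -- PostgreSQL: define the enum once, reuse it everywhere.
    CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipped', 'cancelled');

    CREATE TABLE orders   (id BIGINT PRIMARY KEY, status order_status NOT NULL);
    CREATE TABLE invoices (id BIGINT PRIMARY KEY, status order_status NOT NULL);

    -- MySQL: the value list is repeated per table and kept in sync by hand.
    -- CREATE TABLE orders   (id BIGINT PRIMARY KEY, status ENUM('pending','paid','shipped','cancelled') NOT NULL);
    -- CREATE TABLE invoices (id BIGINT PRIMARY KEY, status ENUM('pending','paid','shipped','cancelled') NOT NULL);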
The functionality: from what I've heard and seen, PostgreSQL is commonly touted as the solution with more SQL correctness, but it also has great support for things like JSON, PL/pgSQL is pretty nice to use, and there are custom types and a whole lot of other useful stuff. That said, not every system needs that, and the MySQL/MariaDB approach of having only databases, instead of databases and schemas, is a bit simpler. Transactional DDL in PostgreSQL is a big plus, though, as someone else mentioned.
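A minimal sketch of the transactional DDL point, with hypothetical table names:

    -- In PostgreSQL, DDL participates in transactions, so a failed migration
    -- rolls back cleanly instead of leaving the schema half-changed.
    BEGIN;
    ALTER TABLE accounts ADD COLUMN last_login TIMESTAMPTZ;
    CREATE INDEX idx_accounts_last_login ON accounts (last_login);
    COMMIT;  -- or ROLLBACK; to undo the whole migration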
The performance: it probably depends on what you're doing. Benchmarking both for your particular workload is probably the way to go, though historically using pgBouncer has been a good idea for PostgreSQL. In the current day and age, both should be decent choices, especially with partitioning and clustering.
The tooling: here I'll come out and say that I like MySQL Workbench more than pgAdmin, both because it feels more like a native piece of software and because it has really great ER diagram functionality that, when coupled with forward/reverse engineering and schema sync, can let you plan and alter your schema in a visual way: https://dev.mysql.com/doc/workbench/en/wb-design-engineering... (and it gives you SQL at the end of the day, so you can further alter things as necessary; really nice both for exploring the schema, so you don't need something like DBVis as much, and for quick edits). That said, pgAdmin has built-in support for displaying geospatial data (e.g. PostGIS), which is pretty great since you don't need something like QGIS as much.
The developer experience: both are good. Both have containers that are easy to run, both are easy to connect to and work with (though MySQL drivers sometimes aren't shipped with other software out of the box), both are widely supported by a variety of tools (DataGrip is one of the better ones out there as well), and they also have managed cloud offerings from many vendors, if needed. Both will scale up and down pretty decently, though PostgreSQL seems to use a bit less memory in containers and has faster startup when I last checked, though that's a minor detail. Either will be easier than trying to find a trustworthy Oracle XE container image, or to build your own with some DB setup and initialization scripts that actually work and don't throw errors about the DB being closed when the container restarts.
At the end of the day, it's hard to go wrong with either of them, much like even SQLite can be a good choice for some workloads. Right now I'm building a personal project with MariaDB, but Keycloak more or less demands PostgreSQL, so running that as well. Though in regards to MariaDB one could probably mention the whole SPAC situation that went down a bit ago.