HACKER Q&A
📣 ReD_CoDE

PostgreSQL or MySQL?


I have four questions related to SQL

1. PostgreSQL or MySQL? And why?

2. Is it possible to build a hybrid database schema? For example, SQLite+JSON?

3. Is it possible to convert XML(XMI) schema to SQLite schema automatically?

4. Is it possible to build a custom file format based on SQLite or hybrid one based on SQLite+JSON?


  👤 nbouscal Accepted Answer ✓
The following isn’t the top reason I recommend Postgres, but is the reason I think least likely to be echoed in a dozen other comments:

Postgres has some of the best documentation of any software product I’ve ever used. If someone wants to learn about SQL or databases, I always have to restrain myself from recommending that they just read the Postgres manual front to back. It’s comprehensive, it’s well written, it’s easy to navigate, and almost every explanation comes with several clear examples. It’s hard to overstate how valuable a property that is for a product as central to your architecture as your primary transactional database.


👤 matthewmacleod
On the first point, default to using Postgres unless you have a particular reason not to. In general, it offers more and better-executed features. There are some scenarios where MySQL offers a benefit and this advice doesn’t apply, but it’s unlikely you will need to worry about that if you’re in a “what database do I use” position.

Your other questions are a bit too vague to be answerable. It’s possible to build anything you want, including custom file formats or schema converters, if you want to invest the time. You might have more useful responses if you explain the use case instead of the solution you already have in mind.


👤 jerrytsai
I will respond only to Q1, taking the perspective of analytics and data science.

Answer: PostgreSQL > MySQL

Postgres implementation of SQL includes a few useful clauses that are useful for analytics that MySQL does not support.

It used to be that MySQL had no window functions, and that made it wholly inferior to Postgres when it came to analytics. However, it seems MySQL began supporting window functions two years ago, so that is no longer a reason to choose one over the other.

There are at least two features supported in Postgres that are not available in MySQL that I use often enough to care:

• SELECT DISTINCT ON

• FULL OUTER JOIN

Having these saves dev time. It is possible to implement either using MySQL, but your code will be more verbose, creating more opportunities for error.

If you care about analyzing the data for data-scientific purposes, you would be better off using Postgres. It isn't just the couple of extra clauses. It's also having more (useful) indexing choices and little choices like being able to use two pipes (||) to concatenate strings instead of CONCAT().


👤 nmpennypacker
Personally,PostgreSQL. Both have their advantages and disadvantages, but my standard arguments in favor of Postgres are that it: 1. Enforces data types natively. I've shot myself in the foot before with MySQL and 'polluting' data. 2. JSONB support makes it easy to use a 'hybrid' schema if you want NoSQL-like behavior or a junk drawer to shove JSON into.

Regarding your conversion questions, it's common to have an app that communicates with several databases, but I don't know of any architecture that would allow ACID transactions in two engines simultaneously.

As far as converting data, it all depends on what your development preferences are. I work mostly in the Ruby ecosystem, and I can tell you that I've used ActiveRecord to a Postgres database, and exported the results into a SQLite file. I'm sure it would be possible to use an XML parser like Nokogiri to convert XML into a SQLite file as well. If you prefer another major programming language, the tools likely exist to accomplish the same thing.


👤 dragonsh
Given the progress both Mariadb and postgresql has made I think you will not go wrong with either. In both you have support for json now which one you chose depends on your use case.

Mariadb - If you want easy horizontal scalability with master-master replication go with mariadb.

Postgresql - If you want reliable and powerful sql-92 compliance and willing to work with master-slave go for postgresql. You can do master to master with postgre-Xl [1]. But still given the ease of use with which you can work with galera cluster I would not recommend postgresql for master-master.

Anyways for your other questions:

2. SQLite already supports json type [2] so its not SQLite+JSON its just SQLite.

3. Yes you can write a simple python script to do it.

4. Why to use a custom file format when SQLite has its own format with support for json.

[1] https://www.postgres-xl.org/

[2] https://www.sqlite.org/json1.html


👤 madhadron
1. Do you have experience running PostgreSQL or MySQL in production? Use what you know. Do you have friends or colleagues with that experience? Use what they know. Is the stack you end up using predominantly used with one or the other, so you have lots more people in the community who will have hit issues you might have? Use that one. Failing any of those? Default to PostgreSQL.

2. I think you mean SQL+JSON, and the answer is PostgreSQL's JSONB columns, or dumping blobs into rows in MySQL (just keep your row sizes below 4kb).

3. Possible? Yes. A good idea? Almost never. Trees and relations are quite mathematically different, and you will need to specify quite a bit of the mapping to make it come out in a way you would actually want to work with.

4. Not really...generally you would have a SQLite database. If you need stuff that doesn't go in well as a blob, you would put it in a file and include a reference in the database.


👤 gtsteve
When considering between PostgreSQL and MySQL bear in mind that client-side drivers are licensed differently. I believe the MySQL connector is still GPL licensed; it used to be LGPL and then the license was changed with little fanfare. Commentators called this "weaponising the GPL". The goal is to get you to buy the enterprise version of MySQL.

I know MySQL way better but at a brief look I can see Postgres drivers are licensed more liberally.


👤 jtolmar
> 2. Is it possible to build a hybrid database schema? For example, SQLite+JSON?

Postgres and MySQL both have JSON support that can do fancy things which includes indexes, but it's extra work and I don't think it actually gains you anything. I'd recommend using regular old columns for anything you ever plan to want a query on. But if you have other stuff, it's perfectly reasonable to dump a bunch of random garbage into a JSON column. People use this for things like user settings, and are generally happy with it.

> 4. Is it possible to build a custom file format based on SQLite or hybrid one based on SQLite+JSON?

SQLite is already basically a file format. You can store JSON blobs in strings somewhere inside it if that's convenient.


👤 ausjke
If you ask this at HN most likely it will be PostgreSQL.

If you ask scripting language it got to be python or javascript here.

I use HN daily and noticed as a group, HN readers do in general have their preference on things like this.

By all means, MySQL's market share is much larger than PostgreSQL, and MySQL 8+ is very impressive. I compared both and eventually picked MySQL for my projects, even though I really _hate_ Oracle.


👤 partoa
1. Depends. If you have a lot of update queries, MySQL wins [https://eng.uber.com/mysql-migration/]. MySQL also has better replication and sharding support at the moment.

2. PostgreSQL and MYSQL also sport JSON... JSON functions can be a little tricky. Still, they work.

3. Depends on the structure of your XML. Might even possible to use RegEx to achieve this objective. XML to JSON is easier though. That can be saved in JSON fields in a DB.

4. You can build a custom file based on anything. The only catch, the SQLite client will only read a valid SQLite file. As such, you have to know what you're doing.

[Edit] Formating


👤 glax
For me postgresql's schema, EXPLAIN, Column modification and UUID. But most of the time I stick to sqlite, it satisfies most of the project with low footprint and require for installation restricted servers.

👤 evanelias
To echo a few other commenters here, a major point of consideration is your team's experience and broader technical network. Generally speaking, either Postgres or MySQL is likely fine for most applications. But you'll have a better time with whichever one you can most smoothly build on, operate, and hire for.

Separate from that, and with the disclaimer that my career has been heavily MySQL-oriented, although I try to provide a balanced perspective:

MySQL can be a good choice for OLTP-heavy workloads, such as a high volume of relatively straightforward queries. It also supports a very wide variety of replication options. In some aspects, MySQL is simpler to operate than pg. (This can be a point of contention, as MySQL is more idiosyncratic and this leads to some land-mines that are well-known to the community, but unclear from the manual alone.)

At this point there are also many hundreds (thousands?) of engineers and DBAs worldwide who have extensive experience working on massive-scale MySQL deployments. This may or may not be relevant to your project, depending what you're building and its eventual best-case size.

MySQL is admittedly less "batteries included". Although MySQL/InnoDB does support things like windowing functions, fulltext, geospatial, etc they're not necessarily a best-in-class implementation yet. In a MySQL shop, if you need OLAP, you're better off doing ETL into a separate column store or Hadoop cluster; if you need fulltext search, you're better off using a separate search index system like Solr; etc. Depending on your point-of-view, this approach is either "more UNIXy" or a major operational headache, or both :)

Meanwhile Postgres may be a better choice if you want to handle a bunch of things using a single piece of software: mixed OLTP + OLAP workloads, fulltext search, geospatial. It also has some interesting indexing features MySQL lacks, such as partial indexes, and ability to index an entire nested JSON structure in a single index.

Postgres gets major points for having a true open source development model, cleaner core code, better adherence to SQL standards, etc. Conceptually, it is very well-aligned with the textbook description of a relational database. Its adherents love it, sometimes fanatically so, which can be a mixed bag to outsiders. I sometimes feel parallels to the Mac user community in the late 90s/early 00s: it's probably a better choice for many use-cases, but personally I raise an eyebrow to suggestions that it's the universal best choice for all possible situations.


👤 mayankkaizen
I'm not a very tech savvy and know very little about databases. But I have read countless posts/comments over the years and everywhere everyone recommends postgres.

Again, I have no clue about other database systems, but postgres documentation is just awesome. I believe you can practically everything from its documentation.


👤 fouc
1. Both are great, they're both absolutely fantastic products!

PostgreSQL does have slightly more popularity nowadays, especially since they came out with JSONB support first and have more advanced features built-in.


👤 steffann
I have used PostgreSQL+JSONB and it works great. Doing indexes on JSON fields adds a lot of value.

👤 buboard
There was an Uber engineering post years ago about why they switched to mysql. Not sure how relevant it is today

https://eng.uber.com/mysql-migration/


👤 markuman123
Use what you're most comfort with, not what others are.

Postgres and MySQL (MariaDB/Percona) are both big players with a huge community.

Maybe consider to use Mariadb or Percona instead of MySQL to avoid oracle vendor lock in.


👤 LordHeini
I would also recommend Postgres for the features alone. Like the full text search.

One reason not mentioned often is the vast amount of plugins which can do lots of useful things.

Like PostGis or TimescaleDb which all work really well.


👤 pabl0rg
EXPLAIN is almost useless on mysql, but very helpful in postgres.

EXPLAIN is a very useful command once you have some degree of success (users).


👤 dragonwriter
1. Mostly, Postgres, as it's superior in most areas, but MySQL seems to still have advantages in replication (and, if you are hosting in AWS, there Aurora Serverless MySQL offering supports somethings, like their own Data API, that the equivalent Postgres offering doesn't yet which may tip the balance.

> Is it possible to build a hybrid database schema? For example, SQLite+JSON?

Yes, but many DBs will hold JSON natively and, conversely, lots of DB data fits nicely in JSON. What's the motivating use case?

> 4. Is it possible to build a custom file format based on SQLite or hybrid one based on SQLite+JSON?

Not sure what you are asking here; SQLite has its own file format.i suppose, as with many combined formats these days, you could stop that together with a folder of JSON documents and call it a custom format. What's the motivating use case?


👤 ReD_CoDE
Thank you all, I've tried to read all the comments. I think I have to clarify the questions I asked:

1. Is related to our upcoming startup, we chose MySQL + Redis, but some said use PostgreSQL

(However, my friend explained that MySQL before v8 was weak and after v8 has tried to come back to its salad days, this is why he chose MySQL)

2-3-4: In the industry, especially in the industry I'm active, Digital Built Environment Industry, there's a lot of invaluable schemas, especially XML(XMI) that I feel "these schemas convert structured data to semi-structured data" which is not good and efficient

For this reason, I'm thinking of SQL (Server) and SQLite (File) to preserve structured data

There's a vitally important schema in the built environment industry called Industry Foundation Classes (IFC) [1] which is based on EXPRESS/STEP and also has XML(XMI) schema too, but I want to convert the schema to SQL and SQLite

A friend did an invaluable job and after years introduced a correct method to convert it to SQL [2][3] but don't have any idea of developing the schema based on SQLite [4] that will accelerate the IFC development

[1] https://www.buildingsmart.org/about/what-is-openbim/ifc-intr...

[2] https://forums.buildingsmart.org/t/ifc-for-relational-databa...

[3] https://forums.buildingsmart.org/t/store-modify-and-retrieve...

[4] https://forums.buildingsmart.org/t/ifc-sqlite-project/1359


👤 zelly
Postgres is a long-term support stack at this point, like the JVM. Too many people rely on it. You need a very good reason to pick anything else.

👤 OldHand2018
For point 3 (Is it possible to convert XML(XMI) schema to SQLite schema automatically?), are you asking about a tool that takes an XML schema and gives you back an SQL schema?

I haven't heard of anything, but I would imagine it exists. I have, on the other hand, done a fair amount of XML to SQL conversion and have found that generating a good schema involves understanding the data so you know what to denormalize. You don't really want to have thousands of tables, each representing an XML tag. That would make your database very hard to use.

Trial and error, and a good knowledge of what kind of questions you are going to be asking your database to answer...


👤 pbhowmic
Answering Q1: MySQL over Postgres. Postgres has 2 critical things going against it - The query optimizer is a mess - It's implementation of secondary indexes is not cache friendly, so it can be slow

👤 karmakaze
As previously mentioned, I agree that there is much less difference between them than there used to be. Use what you know or have access to those who know.

Personally I prefer PostgreSQL and the main benefit of using MySQL is how easy it is to set up replication and operate in master-slave, master-master, or group replication.

For high write volumes, TokuDB and now RocksDB do work well, the former for sequential writes and the latter is LSM Tree based.


👤 teunispeters
1. PostgreSQL. Fast, reliable, (relatively) small (for some reason mysql got a lot bigger). Had a few historic problems with replication but I think that's been resolved. I dealt a lot with large scale large data sets and I wouldn't recommend mysql for that at all, except if they were very simple and relations were either not used or used minimally. There are better database systems than either, but I can't think of a less reliable one than MySQL. If you use PHP though, mysql is easier.

(I'm going to assume the rest of the questions are SQL related and not SQLite, which is a particular binary format)

2. why? SQL + json would make some sense, sometimes. I haven't needed it - I just do SQL. Others have better answers.

3. There are tools to do some XML to and from SQL. YMMV. I always rolled my own.

4. it's always possible. Is it worth it? I used CSV with embedded json for a long time with archival storage because it was reversably parseable and could scale time-based databases somewhat reasonably. There are smarter ways to do this too ...


👤 tyingq
You don't mention SQLite in question 1, but then include it in questions 2-4.

Did you maybe mean "SQL" and not "SQLite" here?


👤 anm89
Although there is already pretty unanimous consensus here, one side note is that administrating mysql is always a headache in my experience whereas postgres seems to just work.

Every time I start a new project and I see it's using mysql i sigh a little bit knowing that I'm going to end up spending hours fighting some dumb configuration issue.


👤 mighty_bander
For very simple usage, you won't see a lot of difference, but I would go with postgres. It has a lot of features that MySql doesn't, and isn't any harder to set up and use.

Point 2: as long as you also write software to deal with it, I guess so. Postgres has a JSON type. This question makes me want to ask what you are trying to accomplish.

Point 3: Yes, probably, and for all I know there's something out there to do it for you. I'm assuming basic usage; there is undoubtedly some weird corner of xmi that makes it a Turing-complete language or something equally ill-advised. So writing a schema converter for very basic stuff is probably quite easy, but really filling in all the junk is possibly less so.

4 appears to be a repeat of 2.


👤 dezzeus
I'm going to write a couple of things about the first point.

Several years ago, a knowledgable guy told me that the most compelling reason for choosing between PostgreSQL and MySQL was the expected I/O: "for read-intensive workloads (e.g. blogs), choose MySQL; for mixed workloads (e.g. forums), choose PostgreSQL".

But I honestly don't know if that may still be valid as of today.

Nowadays, I think that for basic things, it doesn't really matter; but for peculiar things, Postgres may have some advantages (both technically and not). Also keep in mind that, for some popular scopes, SQLite is likely everything you really need.


👤 flarg
In case it's not been said already, Oracle own mysql, use mariadb instead

👤 asdkhadsj
Question on this area. What have you found to be the best way to migrate an existing app to Postgres?

We plan to migrate to Postgres, but with every LOC written we're entrenching ourselves into Mysql. We all want Postgres, but at the same time I just can't see the transition happening with each passing day.

Thoughts?

edit: I should be clear too. I think our goal would be to not require a full rewrite spanning weeks. So an active translation layer, or replication.. or something tricky.. but I just don't foresee us being able to spend a month putting everything on hold so we can migrate.


👤 dotdi
My company recently looked at MySQL vs PostgreSQL. While we chose Postgres there is one thing that stood out in my research, I haven't found any comments on this:

We were looking at scalable, HA, possibly multi-DC workloads and such, and yes, relational DBs are not necessarily the right choice, but MySQL looked a bit better in this regard, especially for master-master replication. There are several high-profile teams that have huge production deployments running on MySQL (IIRC Uber and Github) and there are a lot of things to be learned from them.


👤 darkhorn
You can compare it from here https://sql-workbench.eu/dbms_comparison.html

👤 throw0101a
Is there a Galera-equivalent system available of Postgres? I.e., multi-master replication.

I know of BDR, but that doesn't seem to be publicly available for recent versions of Pg?


👤 thunderbong
My first goto database is always SQLite. Only when SQLite is unable to handle the requirements, for example, too many writes, do I look at PostgreSQL.

👤 Arkanosis
Regarding 1): choosing a software stack (or any part thereof) without a clear use case is a recipe for failure. What's the use case?

👤 viraptor
What does your app/framework support as a default store (what do the devs use and test against).

What features do you expect from the db? What versions of each database can you run?

Have you got an environment where one option is better supported than the other? (For example AWS aurora cluster and MySQL?)

Do you have an ops team with existing experience with either db?


👤 rayrrr
MySQL: Wait a sec, what do I use for this table, InnoDB or MyISAM? What are the tradeoffs again? Oh, right, do I want ACID transactions or fulltext search? Man, this tradeoff sucks.

PostgreSQL: Wait a sec, which schema (namespace) should I put this table in? Man, it's nice to have namespaces within a DB.


👤 donatj
MySQL or better yet AuroraDB if you want scalability. MySQL still has a sizable speed advantage, is easier to learn, maintain, and has a more vibrant world of tooling. MySQL is also easier to find experienced developers for. This has been my experience anyway.

👤 unixhero
Postgres hands down.

👤 kibibu
Are you confusing SQLite with SQL?

👤 CaptainZapp
Isn't MySQL owned by Oracle?

To each his own, but that would just about answer the question for me.


👤 dariojcravero
I'd say Postgres because of tools like https://hasura.io that let you automatically layer graphql on top of it and jsonb :)

👤 no_gravity
I really like the direct DB to filesystem mappings of the MyISAM engine in MySql. A database is simply a directory. A table is simply 3 files in that directory:

    - One for the structure
    - One for the data 
    - One for the indexes
You can simply copy a MyISAM database by copying the directory (while the MySql server is stopped).

MariaDB has forked the MyISAM engine and created the Aria engine. It is a crash safe version of MyISAM and is now their default engine for all system tables.

So in contrast to MySql, you can completely disable InnoDB in MariaDB. MySql does not support that anymore as their system tables are InnoDB.

Especially for non-concurrency workloads, MyISAM and Aria are super fast.

For these reasons, I really like MariaDB.


👤 cweagans
1. Whatever you feel most comfortable using and administering because the alternative is dangerous for production.

2. Probably.

3. Probably, but you'll probably have to write the code to do it.

4. Why bother?


👤 based2
MySQL or MariaDB?

👤 danmg
Materialized views, common table expressions, outer joins, are all missing in mysql and essential. The Postgresql's stored procedure language blows mysql out of the water, and it's possible to use different language backends.

SQLite has json extensions. You need to select them at compile time though, so you can't just assume they'll be there by default.


👤 bob1029
I have quite a bit of experience with #2.

We use a SQLite hybrid with JSON blobs (stored as regular TEXT columns) as the principal transactional store for our clients' business system state. The simplicity of this approach (which is definitely not for everyone) is what made it feasible for our small development team to manage the project as far as we have. If we were still using a big-boy SQL solution like PostgreSQL or SQL Server and managing all of the concerns specific to these dependencies (multiplied by the number of environments we are responsible for deploying these to), we probably would have been forced to close up shop by now.

Being able to deploy a single binary which brings every dependency (including the database) it needs along for the ride proved to be a lifesaver for us. The process before this involved spending an entire day getting SQL Server configured relative to the service accounts, and whatever ridiculous quirks may exist in each clients' specific AD. Today, it is literally a 30 second file extract, sc create, and it's running. We go from RDP into a blank box to a working production system within 5 minutes. This is also how we've been able to completely sidestep the docker question, since no one could justify the added complexity once we got the equation down to: a single binary folder + a bare-ass Windows Server 2016/19 box = good to go. This is also enabled by .NET Core's Self-Contained Deployments, so SQLite isn't a free lunch all on its own.

Again, the above is not for everyone. We are responsible for maintaining production services running across many client sites, and needed a solution that could scale not only in terms of our clients' requirements, but also in terms of our abilities to manage all of these environments all at once. Managing a hosted SQL service requires direct access to each of these environments which has tremendous overhead for us due to the email back-and-forth remote access game that has to be played each time.

If you are in an environment where its just a single instance of the production/test/qa/development stack, I would absolutely push for a hosted SQL option like PostgreSQL/MySQL/SQL Server. In these cases you have a single (or 4) points of touch, and it is very reasonable and consistent and all within your control. If you find yourself in a situation similar to mine, I would still advocate for the hosted SQL by default unless you are a very brave soul and understand the risks of going a bit off the beaten path. Disclaimer aside, there really are some interesting treasures out here. If you can afford to waste a month, try doing your entire persistence layer in SQLite with JSON serialized business models where appropriate. Assuming you budgeted for complete failure, worst case you walk away with a new perspective. Better case, you end up in the position I find myself in now.

Just don't forget to set PRAGMA journal_mode=WAL; if you decide to go on the adventure. It makes orders of magnitude difference in concurrent write throughput (SQL Server was never this fast for us).


👤 jstewartmobile
#1: depends on your scenario. Postgres is more like Oracle--there are types of query it can do natively that would have to be externally assisted with MySQL. MySQL is simpler, has all kinds of warts, but scales decently.

Here's why uber moved to MySQL: https://eng.uber.com/mysql-migration/

#2-4: My use of SQLite has always been very basic, so no idea. Per SQLite's author: "SQLite does not compete with client/server databases. SQLite competes with fopen()."


👤 namewasmypw
MySQL is simply garbage that shouldn't be used under any circumstance.

1. No transactional DDL 2. Implicit, confusing coercions 3. Terrible Unicode handling 4. the list goes on


👤 telaelit
MongoDB


👤 AdrianB1
1. For small databases (up to a few GB) my personal preference in the past 5 years was MS SQL (Express Edition, free), then PostgreSQL and last is MySQL. That means the answer to your specific question, PostgreSQL. Why? That is the order of features, ease of use and performance in my experience, other people may have seen different.

For larger databases I have no experience with either PostgreSQL or MySQL, I was spoiled to use MS SQL for up to Terabyte size and that's the largest I've ever worked with.


👤 kissgyorgy
You don't really benefit from MySQL speed if you are not Facebook, so I would always start with PostgreSQL.

👤 mdellavo
This entirely depends on the domain and data model, and unless you need SQL - mongodb should be a consideration too.