1. PostgreSQL or MySQL? And why?
2. Is it possible to build a hybrid database schema? For example, SQLite+JSON?
3. Is it possible to convert XML(XMI) schema to SQLite schema automatically?
4. Is it possible to build a custom file format based on SQLite or hybrid one based on SQLite+JSON?
Postgres has some of the best documentation of any software product I’ve ever used. If someone wants to learn about SQL or databases, I always have to restrain myself from recommending that they just read the Postgres manual front to back. It’s comprehensive, it’s well written, it’s easy to navigate, and almost every explanation comes with several clear examples. It’s hard to overstate how valuable a property that is for a product as central to your architecture as your primary transactional database.
Your other questions are a bit too vague to be answerable. It’s possible to build anything you want, including custom file formats or schema converters, if you want to invest the time. You might have more useful responses if you explain the use case instead of the solution you already have in mind.
Answer: PostgreSQL > MySQL
Postgres's implementation of SQL includes a few clauses useful for analytics that MySQL does not support.
It used to be that MySQL had no window functions, and that made it wholly inferior to Postgres when it came to analytics. However, it seems MySQL began supporting window functions two years ago, so that is no longer a reason to choose one over the other.
There are at least two features supported in Postgres that are not available in MySQL that I use often enough to care:
• SELECT DISTINCT ON
• FULL OUTER JOIN
Having these saves dev time. It is possible to implement either using MySQL, but your code will be more verbose, creating more opportunities for error.
If you care about analyzing the data for data-scientific purposes, you would be better off using Postgres. It isn't just the couple of extra clauses. It's also having more (useful) indexing choices and little choices like being able to use two pipes (||) to concatenate strings instead of CONCAT().
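To illustrate the dev-time point, here's a hedged sketch of the workaround you need without `DISTINCT ON`. The table and column names are hypothetical; the example runs against SQLite 3.25+, which (like MySQL 8+) has window functions but no `DISTINCT ON`, so the verbose form below is what you'd write on either:

```python
import sqlite3

# Hypothetical task: latest order per customer.
# Postgres one-liner:
#   SELECT DISTINCT ON (customer) * FROM orders ORDER BY customer, placed_at DESC;
# Without DISTINCT ON, you need a ROW_NUMBER() subquery instead:
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, placed_at TEXT, total REAL);
    INSERT INTO orders VALUES
        ('alice', '2021-01-01', 10.0),
        ('alice', '2021-03-01', 25.0),
        ('bob',   '2021-02-01', 5.0);
""")
rows = conn.execute("""
    SELECT customer, placed_at, total FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY customer ORDER BY placed_at DESC
        ) AS rn
        FROM orders
    ) WHERE rn = 1
    ORDER BY customer
""").fetchall()
print(rows)  # [('alice', '2021-03-01', 25.0), ('bob', '2021-02-01', 5.0)]
```

Same result either way, but the window-function version buries the intent in boilerplate, which is the "more opportunities for error" point above.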
Regarding your conversion questions, it's common to have an app that communicates with several databases, but I don't know of any architecture that would allow ACID transactions in two engines simultaneously.
As far as converting data goes, it all depends on your development preferences. I work mostly in the Ruby ecosystem, and I can tell you that I've used ActiveRecord against a Postgres database and exported the results into a SQLite file. I'm sure it would be possible to use an XML parser like Nokogiri to convert XML into a SQLite file as well. If you prefer another major programming language, the tools likely exist to accomplish the same thing.
MariaDB - If you want easy horizontal scalability with master-master replication, go with MariaDB.
PostgreSQL - If you want reliable and powerful SQL-92 compliance and are willing to work with master-slave, go for PostgreSQL. You can do master-master with Postgres-XL [1]. But still, given the ease with which you can work with a Galera cluster, I would not recommend PostgreSQL for master-master.
Anyway, for your other questions:
2. SQLite already supports JSON [2], so it's not SQLite+JSON, it's just SQLite.
3. Yes, you can write a simple Python script to do it.
4. Why use a custom file format when SQLite already has its own format with JSON support?
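As a minimal sketch of point 2, assuming your SQLite build includes the json1 functions (most distributions do, including the one bundled with Python), you can query into JSON documents directly; the table and document here are invented for illustration:

```python
import sqlite3

# Store a JSON document in an ordinary TEXT column, then query inside it
# with SQLite's json1 functions -- no separate JSON store needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO docs (body) VALUES (?)",
             ('{"name": "widget", "tags": ["a", "b"]}',))
name = conn.execute(
    "SELECT json_extract(body, '$.name') FROM docs"
).fetchone()[0]
print(name)  # widget
```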
2. I think you mean SQL+JSON, and the answer is PostgreSQL's JSONB columns, or dumping blobs into rows in MySQL (just keep your row sizes below 4kb).
3. Possible? Yes. A good idea? Almost never. Trees and relations are quite mathematically different, and you will need to specify quite a bit of the mapping to make it come out in a way you would actually want to work with.
4. Not really...generally you would have a SQLite database. If you need stuff that doesn't go in well as a blob, you would put it in a file and include a reference in the database.
I know MySQL way better, but at a brief look I can see that the Postgres drivers are licensed more liberally.
Postgres and MySQL both have JSON support that can do fancy things which includes indexes, but it's extra work and I don't think it actually gains you anything. I'd recommend using regular old columns for anything you ever plan to want a query on. But if you have other stuff, it's perfectly reasonable to dump a bunch of random garbage into a JSON column. People use this for things like user settings, and are generally happy with it.
> 4. Is it possible to build a custom file format based on SQLite or hybrid one based on SQLite+JSON?
SQLite is already basically a file format. You can store JSON blobs in strings somewhere inside it if that's convenient.
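A minimal sketch of the "user settings in a JSON column" pattern mentioned above, with a hypothetical users table; note that nothing here needs JSON support in the database at all, since the app does the (de)serialization:

```python
import json
import sqlite3

# Settings the app never queries server-side get serialized into one
# plain TEXT column and round-tripped through json.dumps/json.loads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, settings TEXT)")
settings = {"theme": "dark", "emails": False}
conn.execute("INSERT INTO users (id, settings) VALUES (?, ?)",
             (1, json.dumps(settings)))
row = conn.execute("SELECT settings FROM users WHERE id = 1").fetchone()
print(json.loads(row[0]))  # {'theme': 'dark', 'emails': False}
```

The trade-off is exactly the one stated above: this is fine for "random garbage", but anything you ever plan to query belongs in a regular column.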
If you're asking about scripting languages, it's got to be Python or JavaScript here.
I use HN daily, and I've noticed that, as a group, HN readers do have their preferences on things like this.
By most measures, MySQL's market share is much larger than PostgreSQL's, and MySQL 8+ is very impressive. I compared both and eventually picked MySQL for my projects, even though I really _hate_ Oracle.
2. PostgreSQL and MySQL also sport JSON... JSON functions can be a little tricky. Still, they work.
3. Depends on the structure of your XML. It might even be possible to use RegEx to achieve this objective. XML to JSON is easier, though; that can be saved in JSON fields in a DB.
4. You can build a custom file based on anything. The only catch is that the SQLite client will only read a valid SQLite file. As such, you have to know what you're doing.
Separate from that, and with the disclaimer that my career has been heavily MySQL-oriented, although I try to provide a balanced perspective:
MySQL can be a good choice for OLTP-heavy workloads, such as a high volume of relatively straightforward queries. It also supports a very wide variety of replication options. In some aspects, MySQL is simpler to operate than pg. (This can be a point of contention, as MySQL is more idiosyncratic and this leads to some land-mines that are well-known to the community, but unclear from the manual alone.)
At this point there are also many hundreds (thousands?) of engineers and DBAs worldwide who have extensive experience working on massive-scale MySQL deployments. This may or may not be relevant to your project, depending what you're building and its eventual best-case size.
MySQL is admittedly less "batteries included". Although MySQL/InnoDB does support things like windowing functions, fulltext, geospatial, etc they're not necessarily a best-in-class implementation yet. In a MySQL shop, if you need OLAP, you're better off doing ETL into a separate column store or Hadoop cluster; if you need fulltext search, you're better off using a separate search index system like Solr; etc. Depending on your point-of-view, this approach is either "more UNIXy" or a major operational headache, or both :)
Meanwhile Postgres may be a better choice if you want to handle a bunch of things using a single piece of software: mixed OLTP + OLAP workloads, fulltext search, geospatial. It also has some interesting indexing features MySQL lacks, such as partial indexes, and ability to index an entire nested JSON structure in a single index.
Postgres gets major points for having a true open source development model, cleaner core code, better adherence to SQL standards, etc. Conceptually, it is very well-aligned with the textbook description of a relational database. Its adherents love it, sometimes fanatically so, which can be a mixed bag to outsiders. I sometimes feel parallels to the Mac user community in the late 90s/early 00s: it's probably a better choice for many use-cases, but personally I raise an eyebrow to suggestions that it's the universal best choice for all possible situations.
Again, I have no clue about other database systems, but the Postgres documentation is just awesome. I believe you can learn practically everything from its documentation.
PostgreSQL does have slightly more popularity nowadays, especially since they came out with JSONB support first and have more advanced features built-in.
Postgres and MySQL (MariaDB/Percona) are both big players with a huge community.
Maybe consider using MariaDB or Percona instead of MySQL to avoid Oracle vendor lock-in.
One reason not mentioned often is the vast number of extensions, which can do lots of useful things.
Like PostGIS or TimescaleDB, which both work really well.
EXPLAIN is a very useful command once you have some degree of success (users).
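As a rough illustration of the workflow (using SQLite's EXPLAIN QUERY PLAN here for a runnable example; Postgres' EXPLAIN is far richer, but the habit is the same), the index and table names are made up:

```python
import sqlite3

# The question EXPLAIN answers: does this query actually use the index?
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.execute("CREATE INDEX idx_email ON users(email)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?", ("a@b.c",)
).fetchall()
# The plan's detail column should mention idx_email, confirming an index
# search rather than a full table scan.
print(plan)
```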
> Is it possible to build a hybrid database schema? For example, SQLite+JSON?
Yes, but many DBs will hold JSON natively and, conversely, lots of DB data fits nicely in JSON. What's the motivating use case?
> 4. Is it possible to build a custom file format based on SQLite or hybrid one based on SQLite+JSON?
Not sure what you are asking here; SQLite has its own file format. I suppose, as with many combined formats these days, you could zip that together with a folder of JSON documents and call it a custom format. What's the motivating use case?
1. This is related to our upcoming startup: we chose MySQL + Redis, but some said to use PostgreSQL.
(However, my friend explained that MySQL before v8 was weak and after v8 has tried to come back to its salad days; this is why he chose MySQL.)
2-3-4: In the industry where I'm active, the Digital Built Environment industry, there are a lot of invaluable schemas, especially XML (XMI) ones. I feel these schemas convert structured data into semi-structured data, which is neither good nor efficient.
For this reason, I'm thinking of SQL (server) and SQLite (file) to preserve structured data.
There's a vitally important schema in the built environment industry called Industry Foundation Classes (IFC) [1], which is based on EXPRESS/STEP and also has an XML (XMI) schema, but I want to convert the schema to SQL and SQLite.
A friend did an invaluable job and, after years, introduced a correct method to convert it to SQL [2][3], but I don't have any idea how to develop the schema based on SQLite [4], which would accelerate IFC development.
[1] https://www.buildingsmart.org/about/what-is-openbim/ifc-intr...
[2] https://forums.buildingsmart.org/t/ifc-for-relational-databa...
[3] https://forums.buildingsmart.org/t/store-modify-and-retrieve...
[4] https://forums.buildingsmart.org/t/ifc-sqlite-project/1359
I haven't heard of anything, but I would imagine it exists. I have, on the other hand, done a fair amount of XML to SQL conversion and have found that generating a good schema involves understanding the data so you know what to denormalize. You don't really want to have thousands of tables, each representing an XML tag. That would make your database very hard to use.
Trial and error, and a good knowledge of what kind of questions you are going to be asking your database to answer...
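A sketch of the denormalization idea above, with a made-up XML document: rather than generating one table per tag, a known structure is flattened into a single table:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input: a naive converter would emit library, book, title,
# and author tables; knowing the data, we flatten to one books table.
xml_doc = """
<library>
  <book isbn="111"><title>A</title><author>X</author></book>
  <book isbn="222"><title>B</title><author>Y</author></book>
</library>
"""
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT, author TEXT)")
for book in ET.fromstring(xml_doc).findall("book"):
    conn.execute(
        "INSERT INTO books VALUES (?, ?, ?)",
        (book.get("isbn"), book.findtext("title"), book.findtext("author")))
count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
print(count)  # 2
```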
Personally I prefer PostgreSQL and the main benefit of using MySQL is how easy it is to set up replication and operate in master-slave, master-master, or group replication.
For high write volumes, TokuDB and now RocksDB do work well, the former for sequential writes and the latter is LSM Tree based.
(I'm going to assume the rest of the questions are SQL related and not SQLite, which is a particular binary format)
2. Why? SQL + JSON would make some sense, sometimes. I haven't needed it - I just do SQL. Others have better answers.
3. There are tools to do some XML-to-SQL conversion and back. YMMV. I always rolled my own.
4. It's always possible. Is it worth it? I used CSV with embedded JSON for a long time for archival storage because it was reversibly parseable and could scale time-based databases somewhat reasonably. There are smarter ways to do this too ...
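A hedged sketch of that CSV-with-embedded-JSON idea (the column layout is invented): the csv module's quoting is what keeps the JSON payload reversibly parseable:

```python
import csv
import io
import json

# One plain CSV column (a timestamp) plus one column holding a JSON blob.
# csv quotes the embedded commas and braces, so the row parses back intact.
buf = io.StringIO()
csv.writer(buf).writerow(
    ["2021-01-01T00:00:00", json.dumps({"temp": 21.5, "ok": True})])
buf.seek(0)
ts, payload = next(csv.reader(buf))
print(ts, json.loads(payload))  # 2021-01-01T00:00:00 {'temp': 21.5, 'ok': True}
```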
Did you maybe mean "SQL" and not "SQLite" here?
Every time I start a new project and see it's using MySQL, I sigh a little bit, knowing that I'm going to end up spending hours fighting some dumb configuration issue.
Point 2: as long as you also write software to deal with it, I guess so. Postgres has a JSON type. This question makes me want to ask what you are trying to accomplish.
Point 3: Yes, probably, and for all I know there's something out there to do it for you. I'm assuming basic usage; there is undoubtedly some weird corner of xmi that makes it a Turing-complete language or something equally ill-advised. So writing a schema converter for very basic stuff is probably quite easy, but really filling in all the junk is possibly less so.
4 appears to be a repeat of 2.
Several years ago, a knowledgable guy told me that the most compelling reason for choosing between PostgreSQL and MySQL was the expected I/O: "for read-intensive workloads (e.g. blogs), choose MySQL; for mixed workloads (e.g. forums), choose PostgreSQL".
But I honestly don't know if that may still be valid as of today.
Nowadays, I think that for basic things, it doesn't really matter; but for peculiar things, Postgres may have some advantages (both technically and not). Also keep in mind that, for some popular scopes, SQLite is likely everything you really need.
We plan to migrate to Postgres, but with every LOC written we're entrenching ourselves further into MySQL. We all want Postgres, but at the same time I just can't see the transition happening with each passing day.
Thoughts?
edit: I should be clear too. I think our goal would be to not require a full rewrite spanning weeks. So an active translation layer, or replication.. or something tricky.. but I just don't foresee us being able to spend a month putting everything on hold so we can migrate.
We were looking at scalable, HA, possibly multi-DC workloads and such, and yes, relational DBs are not necessarily the right choice, but MySQL looked a bit better in this regard, especially for master-master replication. There are several high-profile teams that have huge production deployments running on MySQL (IIRC Uber and GitHub), and there are a lot of things to be learned from them.
I know of BDR, but that doesn't seem to be publicly available for recent versions of Pg?
What features do you expect from the db? What versions of each database can you run?
Have you got an environment where one option is better supported than the other? (For example AWS aurora cluster and MySQL?)
Do you have an ops team with existing experience with either db?
PostgreSQL: Wait a sec, which schema (namespace) should I put this table in? Man, it's nice to have namespaces within a DB.
To each his own, but that would just about answer the question for me.
MyISAM stores each table as three files:
- One for the structure
- One for the data
- One for the indexes
You can simply copy a MyISAM database by copying the directory (while the MySQL server is stopped). MariaDB has forked the MyISAM engine and created the Aria engine. It is a crash-safe version of MyISAM and is now their default engine for all system tables.
So, in contrast to MySQL, you can completely disable InnoDB in MariaDB. MySQL does not support that anymore, as their system tables are InnoDB.
Especially for non-concurrent workloads, MyISAM and Aria are super fast.
For these reasons, I really like MariaDB.
2. Probably.
3. Probably, but you'll probably have to write the code to do it.
4. Why bother?
SQLite has json extensions. You need to select them at compile time though, so you can't just assume they'll be there by default.
We use a SQLite hybrid with JSON blobs (stored as regular TEXT columns) as the principal transactional store for our clients' business system state. The simplicity of this approach (which is definitely not for everyone) is what made it feasible for our small development team to manage the project as far as we have. If we were still using a big-boy SQL solution like PostgreSQL or SQL Server and managing all of the concerns specific to these dependencies (multiplied by the number of environments we are responsible for deploying these to), we probably would have been forced to close up shop by now.
Being able to deploy a single binary which brings every dependency (including the database) it needs along for the ride proved to be a lifesaver for us. The process before this involved spending an entire day getting SQL Server configured relative to the service accounts, and whatever ridiculous quirks may exist in each clients' specific AD. Today, it is literally a 30 second file extract, sc create, and it's running. We go from RDP into a blank box to a working production system within 5 minutes. This is also how we've been able to completely sidestep the docker question, since no one could justify the added complexity once we got the equation down to: a single binary folder + a bare-ass Windows Server 2016/19 box = good to go. This is also enabled by .NET Core's Self-Contained Deployments, so SQLite isn't a free lunch all on its own.
Again, the above is not for everyone. We are responsible for maintaining production services running across many client sites, and needed a solution that could scale not only in terms of our clients' requirements, but also in terms of our abilities to manage all of these environments all at once. Managing a hosted SQL service requires direct access to each of these environments which has tremendous overhead for us due to the email back-and-forth remote access game that has to be played each time.
If you are in an environment where its just a single instance of the production/test/qa/development stack, I would absolutely push for a hosted SQL option like PostgreSQL/MySQL/SQL Server. In these cases you have a single (or 4) points of touch, and it is very reasonable and consistent and all within your control. If you find yourself in a situation similar to mine, I would still advocate for the hosted SQL by default unless you are a very brave soul and understand the risks of going a bit off the beaten path. Disclaimer aside, there really are some interesting treasures out here. If you can afford to waste a month, try doing your entire persistence layer in SQLite with JSON serialized business models where appropriate. Assuming you budgeted for complete failure, worst case you walk away with a new perspective. Better case, you end up in the position I find myself in now.
Just don't forget to set PRAGMA journal_mode=WAL; if you decide to go on the adventure. It makes orders of magnitude difference in concurrent write throughput (SQL Server was never this fast for us).
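For anyone following that suggestion, a minimal sketch of enabling WAL mode from Python; note the pragma only applies to file-backed databases, not :memory: ones (the path here is a throwaway temp file):

```python
import os
import sqlite3
import tempfile

# WAL mode is a property of the database file itself and persists across
# connections once set; the pragma returns the mode now in effect.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # wal
```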
Here's why Uber moved to MySQL: https://eng.uber.com/mysql-migration/
#2-4: My use of SQLite has always been very basic, so no idea. Per SQLite's author: "SQLite does not compete with client/server databases. SQLite competes with fopen()."
1. No transactional DDL
2. Implicit, confusing coercions
3. Terrible Unicode handling
4. The list goes on
For larger databases I have no experience with either PostgreSQL or MySQL; I was spoiled using MS SQL for up to terabyte sizes, and that's the largest I've ever worked with.