I have 22 years of experience in the area and I always had a huge difficulty when starting a new project: choosing a database. There are tens of thousands of options out there and every single day a new one appears here on HN. And still I can't choose a suitable one for my case.
This huge fragmentation in database is much worse than the fragmentation of code languages. This fragmentation makes it difficult: migrations, common knowledge base, integration, union and selection of data, and countless other issues.
I imagine that the minimum qualities of a database are: immutable, ease of use in any language, performance, ease of querying and manipulating data, easy to configure, maintain and scale, lightweight, among others. Every time I go to choose a db I always come across some critical problem.
Why isn't out there an ultimate general-purpose database? Which fits well for 90% of cases at least? I know my grumpy friends will tell that every case is a case. But if it had a widely used database, everyone's lives would be easier.
In your opinion, is there an "ultimate" database? If yes, which one? If not, why?
- SQL Databases (Mysql, Postgres, SQL Server, SQLite)
- JSON document stores (MongoDB, etc)
- In-memory key-value stores (redis, memcache)
- key-value stores (cassandra, etc)
- Graph DBs (neo4j, dgraph)
- Search DBs (elasticsearch, Lucene)
- Timeseries DBs (timescale)
I consider the SQL databases and the JSON document stores to be the general default databases that my applications will use 90%+ of the time. SQL databases require defining the schema ahead of time, while with document stores I can just dump data into the database and then figure out what's needed afterwards. At this point, I usually use postgres as my go-to database since it's open-source, well-supported and has some additional features like the ability to use json, but if you're more familiar with a different SQL db use that instead. Most of the other database categories are specialized for specific use-cases (in-memory key-values are great for queues and caching, time-series dbs are great for metrics)
EDIT: https://db-engines.com/en/ranking currently shows 381 unique databases, which is probably a bit of an undercount but not by much.
I realized I didn't answer the `Why there are so many databases` part of your question. Essentially, many companies outgrow the general-purpose databases and need databases optimized to their read-write level and data structures. Analytics tend to use columnar-databases once their data sets are in the TB+ range so that their queries return quickly.
There is. It's called PostgreSQL :)
I don't know about the other tens of thousands of databases out there, I certainly have never heard of that many. You have various NoSQL options from Redis to MongoDB, Cassandra, various object-oriented databases, column-oriented, document databases, etc., all of which fall into a small number of general groups that target specific applications and requirements. Unless you work on hobby projects and like to experiment I can't think of a reason to look at hundreds or thousands of obscure databases.
> I imagine that the minimum qualities of a database are: immutable, ease of use in any language, performance, ease of querying and manipulating data, easy to configure, maintain and scale, lightweight, among others. Every time I go to choose a db I always come across some critical problem.
An "immutable" database (read only) would have limited value. "Easy" describes your subjective experience and opinion, not a quality of a database. Maybe list some of the "critical problems" you encounter. Certainly most non-trivial applications use databases without dealing with "critical" problems.
I think you hear very little about it because ADB users see it as a "secret weapon" to crush their competitors with. I've done large ontology work (MESH and other health ontologies) and IoT work (keep several years of sensor readings for sensors in my house) and workflow systems (select interesting HN articles or jobs I want to apply to) and it has never let me down. I haven't run a real instance serving customers in the cloud though.
For the last few years every eng manager I have worked with has been a fan of
In the early 2000s I thought it overpromised and underdelivered and called it CrashGreSlow but after MySQL got bought by Oracle the pgsql team has worked hard to improve it I think it is great today. It supports all kinds of advanced features such as stored procs, full-text search, JSON equivalent fields, etc.
Not really, realistically you have a handful of options for every use case unless you're deliberately picking really obscure stuff for the sake of it.
> Why isn't out there an ultimate general-purpose database?
Because:
1. There's many good solutions that all work well especially if we're talking about general purpose.
2. Still not everything works for every use case.
3. It's healthy to have options to pick from.
Databases are tools like any others, and you choose them based on what you actually need for any specific time. In fact, many projects already use multiple databases - storing dynamic state in a RDBM, and static resources in the filesystem (which is absolutely a database).
Of course, there are other criteria involved, like e.g. the familiarity of the team with different database paradigms, but the core fact stays the same: database is a means to an end, which can be summarized as "finding a solution for persistent storage of some data".
Your actual question doesn't have an answer. No such thing as an "ultimate" database for the same reason we can't objectively agree on the "ultimate" car or the "ultimate" food, or programming language, or anything else. It always comes down to it depends on the requirements.
I’ve settled on:
-CockroachDB for relational -FoundationDB for key value
I’m currently looking for a full search text engine that supports pluggable storage so I can have FDB be the backing store.