HACKER Q&A
📣 eatonphil

What could a modern database do that PostgreSQL and MySQL can't


If there were a general-purpose OLTP SQL database that was developed today, what features or design decisions would it pick that PostgreSQL and MySQL cannot adapt for historic reasons?

Or put another way, what are some cutting edge OLTP database techniques/features/architectures that PostgreSQL and MySQL would have a hard time supporting?


  👤 GeneralMayhem Accepted Answer ✓
Lots of people are citing cutting-edge bells and whistles for advanced query plans and data types, but after working at $BIGCO and seeing what exists there (and has existed for years), I get frustrated by the lack of ergonomic and operational essentials in the public-cloud options.

1. Horizontal scaling. PG can do sharding or replication, but that's not the same thing. Manual resharding is something that, once you've experienced it, you never want to do again, especially when things like Spanner/Cockroach exist. Database-level liquid sharding is such a dramatically superior solution that it makes any system that depends on an enumerated set of master instances seem completely obsolete.

2. Strong commit timestamps, again a la Spanner/Cockroach. Globally-ordered commit times and read snapshots aren't something you'll need in every schema, but when you do, they're magical - suddenly, you can use your OLTP system for strong lease assignment (conditioning other transactions on lease validity), you can construct arbitrary application-level transaction semantics (non-transactional read followed by transactional compare-and-swap is pretty powerful in some scenarios), and all sorts of other things.

3. Interface abstraction. Databases that ship language-support drivers should also ship a fake in-memory implementation for testing purposes that supports the same interfaces and options. There's no reason why I should have to start a subprocess and emulate a network connection to tell whether a query functions properly or not, or whether I've set isolation properties correctly for a pipeline.

4. Declarative schemas, and a well-defined update process (that isn't human-written DDL), are essential at any sort of organizational scale. Every time I see an "ALTER TABLE" statement checked into a repository, I tear my hair out.

5. Queue support, where message publishing/consumption is transactional with other read/write operations. You can always build a queue with a time-keyed table, but it's kind of a pain in the ass to get all the edge cases right. It's much better as a first-class concept supporting automatic sharding, retries, batching, and bounded-out-of-order delivery with a global cursor.
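For reference, the hand-rolled version in Postgres looks roughly like this (a minimal sketch; the table and column names are made up). The SKIP LOCKED clause is what makes concurrent consumers safe:

    -- Hypothetical time-keyed queue table.
    CREATE TABLE jobs (
        id         bigserial PRIMARY KEY,
        payload    jsonb NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now(),
        done       boolean NOT NULL DEFAULT false
    );

    -- Consumer: claim one message, transactionally with other work.
    BEGIN;
    SELECT id, payload
    FROM jobs
    WHERE NOT done
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED;  -- skip rows other consumers have locked
    -- ...do the work, write other tables in the same transaction...
    UPDATE jobs SET done = true WHERE id = :claimed_id;  -- id selected above (placeholder)
    COMMIT;

And that's before retries, batching, visibility timeouts, and bounded out-of-order delivery - each one more edge case to hand-roll.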


👤 Chyzwar
1. The high-performance reads/writes and high availability of Scylla/Cassandra[1]. It has some limitations for OLTP workloads and requires careful planning. Postgres without Citus is not really HA, and even then companies often end up inventing custom sharding/cluster-management systems.

2. A powerful graph query language like Cypher. It might not perform well in real life, but my personal experience left me amazed[2]. There are a number of issues with SQL that could be addressed[3], but the current standard is far too entrenched.

3. Zero impedance mismatch between the database and application representation. In a database like Smalltalk GemStone it is a really seamless experience for a developer to write database code and application code[4]. To some extent, MongoDB's success can be attributed to this aspect.

4. Datomic's temporal capabilities[5]. It is hard to maintain temporal tables in Postgres, and there are some use cases where you really want to query at a point in time. Strictly speaking not an OLTP feature, but I can see this being useful in many scenarios.

  [1] https://www.youtube.com/watch?v=Fo1dPRqbF-Q
  [2] https://www.youtube.com/watch?v=pMjwgKqMzi8&t=726s
  [3] https://www.edgedb.com/blog/we-can-do-better-than-sql
  [4] https://www.youtube.com/watch?v=EyBkLbNlzbM
  [5] https://www.youtube.com/watch?v=7lm3K8zVOdY
Edited: independence -> impedance

👤 grobbie
CockroachDB is getting a lot of interest these days.

It has broad PGSQL language (and, I think, wire) compatibility, yet has a clustered peer architecture well suited to running in a dynamic environment like cloud or k8s. Nodes can join dynamically and it can survive them leaving dynamically as long as there's a quorum. Data is distributed across the nodes without the administrator needing to make any shard-rebalancing interventions.

PGSQL is designed for deployment as a single server with replica servers for HA. It's not really designed for horizontal scalability like Cockroach. You can do it - the foreign data wrappers feature and table partitioning can give you a poor man's scale-out. Or you can use Citus, which won itself a FOSS license earlier this year. And there are other FOSS and proprietary approaches too.

MySQL is similar - you can do it, as with their recent router feature, but it has been retrofitted, and it's not as fluid as Cockroach. IIRC, MySQL Router's configuration is similar to Galera's - that is, a static config file containing a list of cluster members.

Listen, I'm sure the design approach of Cockroach could be retrofitted to PGSQL and MySQL, but I'm pretty sure that doing a good job of it would be a lot of work.

So in answer to your question, I'm not sure that there's all that much an RDBMS can't be made to do. Geospatial, graph, timeseries, GPU acceleration - Postgres has it all, and often the new stuff comes to Postgres first.

By the way I love MySQL and PostgreSQL, and the amazing extensions for PGSQL make it extra awesome. Many are super mature and make pgsql perfect for many many diverse use cases.

For XXL use cases though, CockroachDB is taking a very interesting new path and I think it's worth watching.


👤 rkarthik007
When building YugabyteDB, we reuse the "upper half" of PostgreSQL, just like Amazon Aurora PostgreSQL, and hence support most of the functionality in PG (including advanced features like triggers, stored procedures, the full suite of partial/expression/function indexes, extensions, etc).

We think a lot about this exact question. Here are some of the things YugabyteDB can do as a "modern database" that a PostgreSQL/MySQL cannot (or will struggle to):

* High availability with resilience / zero data loss on failures and upgrades. This falls out of the inherent architecture, whereas with traditional leader-follower replication you can lose data, and with solutions like Patroni you can lose availability / optimal utilization of the cluster's resources.

* Scaling the database. This includes scaling transactions, connections and data sets *without* complicating the app (like having to read from replicas sometimes and from the primary other times depending on the query). Scaling connections is also important for lambda/function-style apps in the cloud, as they could all try to connect to the DB in a short burst.

* Replicating data across regions. Use cases like geo-partitioning and multi-region sync replication to tolerate a region failure without compromising ACID properties. Some folks think this is far-fetched - it's not. Examples: the recent fire at an OVH datacenter and the Texas snowstorm both caused regional outages.

* Built-in async replication. Typically, async replication of data is "external" to DBs like PG and MySQL. In YugabyteDB, since replication is a first-class feature, it is supported out of the box.

* Follower reads / reads from nearest region with programmatic bounds. So read stale data for a particular query from the local region if the data is no more than x seconds old.

* We recently enhanced the JDBC driver to be cluster aware, eliminating the need to maintain an external load balancer because each node of the cluster is "aware" of the other nodes at all times - including node failures / add / remove / etc.

* Finally, we give users control over how data is distributed across nodes - for example, do you want to preserve ASC/DESC ordering of the PKs or use a HASH based distribution of data.

There are a few others, but this should give an idea.

(Disclosure: I am the cto/co-founder of Yugabyte)


👤 masa331
This title kind of implies that PG or MySQL aren't modern, or modern enough, which I think is very wrong. Look at what they bring in every update. I think they are quite modern!

👤 lifepillar
One thing PostgreSQL would likely not be able to adapt to, at least without significant effort, is dropping MVCC in favor of more traditional locking protocols.

While MVCC is fashionable nowadays, and more or less every platform offers it at least as an option, my experience, and also opinions I have heard from people using SQL Server and similar platforms professionally, is that for true OLTP at least, good ol’ locking-based protocols in practice outperform MVCC-based protocols (when transactions are well programmed).

The “inconvenient truth” [0] that maintaining multiple versions of records badly affects performance might in the future make MVCC less appealing. There’s ongoing research, such as [0], to improve things, but it’s not clear to me at this point that MVCC is a winning idea.

[0] https://dl.acm.org/doi/10.1145/3448016.3452783


👤 estebarb
Materialize (https://materialize.com/) is capable of performing SQL operations over a stream, using incremental calculation based on differential dataflow.

CockroachDB is a distributed SQL database, with strong consistency. I think that Yugabyte is similar.

Support for realtime changes, including queries over those changes, is better in other databases, like RethinkDB, OVSDB or Firebase.

Relational is not always the best way to store your data. Sometimes a graph model is better, or for full-text search something specialized is better. In that sense Dgraph or Elasticsearch may be better OLTP databases in some cases.

Columnar storage, as used in Vertica or BigQuery, has several advantages in processing and data compression. But implementing it in PG or MySQL would, I think, require almost a full rewrite.


👤 zinclozenge
1. Automatic backups to S3-compatible storage, out of the box.

2. Progressive automatic scalability. As load increases or storage runs out, the DB should be able to scale automatically. NewSQL databases do this already.

3. Tiered storage.

4. Support for streaming, stream processing, in-memory data structures, etc. I feel like this is one of those weird things, but I keep wishing it were possible when I work on side projects or startups. I don't want to have to spin up mysql/postgres, kafka/pulsar, flink/whatever stream processing, redis, etc. separately, because that would be prohibitively expensive when you're just getting off the ground, unless you have VC money. So I find myself wishing I could deploy something that would do all of those things, that could also scale somewhat until the need arises to break everything out into its own infrastructure - and if it were wire-compatible with popular projects, that would be perfect. Will it happen? I doubt it, but it would be lovely assuming it worked well.


👤 jake_morrison
VoltDB is a good example of rethinking relational databases for modern technology. Traditional databases assume that everything is stored on disk, with a bit of RAM available for caching. VoltDB assumes that everything is stored in RAM first, using checkpointing to disk and replication for durability.

https://www.usenix.org/legacy/events/lisa11/tech/slides/ston...

Have a look at Michael Stonebraker - he's a database genius, and he keeps starting "NewSQL" companies which use the relational model but specialize it for different applications, e.g. column stores.

https://en.wikipedia.org/wiki/Michael_Stonebraker


👤 thom
I realise I'm straying a bit from core OLTP stuff, but I also think removing the historical need for a separate OLAP database is something modern systems should address. Off the top of my head:

1) Incremental materialized view maintenance, à la Materialize (bonus points for supporting even gnarly bits of SQL like window functions).

2) Really ergonomic and scalable pub/sub of some sort, à la RethinkDB.

3) Fine tuned control over query plans if I want it.

4) Probably very deep Apache Arrow integration.


👤 nicoburns
Subscriptions. Databases like Firebase will automatically push changes to query results down to clients. You can add this to Postgres with tools like Hasura, but it's poll based and not very efficient. It's a super-useful feature for keeping UIs in sync with database state.
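To be fair, Postgres does have a low-level push primitive in LISTEN/NOTIFY - no replay for disconnected clients, size-capped payloads - which is part of why tools fall back to polling. Wiring it up yourself looks roughly like this (a sketch with invented table/channel names):

    -- Fire a notification whenever a row changes.
    CREATE FUNCTION notify_todo_change() RETURNS trigger AS $$
    BEGIN
      PERFORM pg_notify('todo_changes', row_to_json(NEW)::text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER todos_notify
    AFTER INSERT OR UPDATE ON todos
    FOR EACH ROW EXECUTE FUNCTION notify_todo_change();

    -- Each subscribed client session runs:
    LISTEN todo_changes;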

👤 reilly3000
At least speaking for Postgres, it _is_ modern in that it’s very actively developed with all kinds of innovations layered on top of the core system. It can be a timeseries db, a real-time db, horizontally sharded, a self-contained REST/graphql API server, a graph db, an interface to many APIs via foreign data wrappers, and much more. In itself it has many analytics functions, and most cloud OLAP dbs use its syntax over a columnar storage scheme. I’m afraid most would-be database projects would be cloned by a Postgres extension before they could make a dent in its share.

👤 OrvalWintermute
Postgresql can do everything I need, but not everything I want, and apologies to mysql fans, but I refuse to touch it.

And I think over the years they have been such an awesome community, one that has helped me whenever I popped into IRC every year or two with a problem I was having. https://github.com/davidfetter ; https://github.com/RhodiumToad ; https://github.com/bmomjian - you are great peeps!

What I would like to see are more security usability enhancements over and above what you find in https://www.crunchydata.com/products/hardened-postgres/

Security Enhancements List:

Attack Resistance

Better protections for the DBMS itself in assumed-compromised scenarios or against malicious users

Customized implementations per compliance/security regime

Accelerators for various security configurations

Self-audit, self-secure

Functionality Enhancements List:

Continuous sync from Postgres to another serious DB (not a one-way sync, not a DB move)


👤 aserafini
One thing I find really interesting from Redis that I would love in a relational database is the concept of ‘blocking’ queries, which block until they get a result.

For example: https://redis.io/commands/BLPOP

If I could do something like:

    Select id, msg
    From job
    Where id > 1224
And have that query block until there actually was a job with an id > 1224, it would open up some interesting use cases.
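The closest approximation in Postgres today is LISTEN/NOTIFY - the session parks on a channel instead of busy-polling - though it's nowhere near as ergonomic as BLPOP. A sketch, assuming an insert trigger on job that calls pg_notify('new_job', ...):

    -- Client session: subscribe, then wait on the driver's
    -- notification API rather than polling in a loop.
    LISTEN new_job;
    -- On each notification, re-run:
    SELECT id, msg FROM job WHERE id > 1224;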

👤 bigblind
It's always been insane to me that databases aren't better at migrations. There are so many blog posts out there on how to do zero downtime migrations, and I feel like a lot more of that could probably be abstracted away.
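Agreed - even renaming a column with zero downtime means doing the expand/contract dance by hand. Roughly, as a Postgres sketch with made-up names:

    -- Expand: add the new column; keep both in sync while old and
    -- new application versions coexist.
    ALTER TABLE users ADD COLUMN full_name text;

    CREATE FUNCTION copy_name() RETURNS trigger AS $$
    BEGIN
      NEW.full_name := COALESCE(NEW.full_name, NEW.name);
      NEW.name      := COALESCE(NEW.name, NEW.full_name);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER users_copy_name BEFORE INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE FUNCTION copy_name();

    -- Backfill existing rows (in batches on a big table):
    UPDATE users SET full_name = name WHERE full_name IS NULL;

    -- Contract, once no client reads the old column:
    -- DROP TRIGGER users_copy_name ON users;
    -- ALTER TABLE users DROP COLUMN name;

All of it is mechanical, which is exactly why it feels like the database should own it.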

👤 zamalek
Write-expensive, read-cheap[1]: the exact opposite of the databases mentioned.

Once your developers have completed an iteration, your DB will see the same queries over and over again (if it doesn't, then it should be an OLAP aggregate). Yet these databases optimize for writes and defer complexity to reads, which, considering that you could see millions more reads than writes, makes no sense whatsoever.

[1]: https://github.com/mit-pdos/noria


👤 klysm
I think automatically maintaining materialized views isn’t well supported for a lot of reasons. But I also don’t think that it’s impossible for something like Postgres to support it more in the future.

👤 cafxx
- Blurring (safely!) the line between database and the app using it: transparently switch between bringing data to compute, or compute to data.

- Comprehensive auto tuning: automatic index creation, automatic schema tuning, dynamically switching between column/row-oriented, etc. User specifies SLOs, database does the rest.

- Deeply related to the previous two points: perfect horizontal scalability

- Configurable per-query ACID properties (e.g. delayed indexing)

- All of the above while maintaining complex SQL features like arbitrary joins, large transactions, triggers, etc.

Sure, some of these are in some form in some existing databases. But none offer all of them.


👤 zz865
My biggest problem with databases is always versioning, i.e. renaming a column will break old clients. If there were a way to have multiple schema versions, so you could upgrade the database first and the clients later, it would be the best.

EDIT: yes, thanks for the comments - views, creating API layers, and adding instead of subtracting all work, but I believe they're all workarounds for the underlying problem. Fixing versioning would make everyone's lives easier.
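For reference, the view workaround looks roughly like this (a sketch with invented names): each client generation gets a schema of compatibility views over the real tables.

    -- The physical table moves to the new name.
    ALTER TABLE customers RENAME COLUMN fullname TO full_name;

    -- Old clients see the old shape through a view; simple
    -- single-table views like this remain writable in Postgres.
    CREATE SCHEMA v1;
    CREATE VIEW v1.customers AS
        SELECT id, full_name AS fullname, created_at
        FROM public.customers;
    -- Old clients: SET search_path = v1;  new clients use public.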


👤 jasonhansel
Better consistency and transaction isolation models. The ANSI transaction isolation levels are ambiguous and confusing, and the different consistency models supported by NoSQL databases are often poorly-specified. A database with clearer abstractions around consistency and isolation would be a big win for applications where correctness and scalability are both high priorities.

👤 shekispeaks
Automatic indexes. Adding indexes is a guessing game, and a bit of an abstraction leak. Imagine if a product engineer did not have to think about how the data is laid out on disk.

👤 pornel
I'm missing something as easy to deploy as SQLite, but without its performance problems (like the stop-the-world write lock). Basically, I'd love an embeddable Postgres with a single-file storage.
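For what it's worth, SQLite's WAL mode softens the worst of that lock - readers proceed while a write is in flight - but there is still only one writer at a time:

    -- SQLite: write-ahead logging; readers stop blocking on writers.
    PRAGMA journal_mode = WAL;
    -- Wait up to 5s for the writer lock instead of erroring out.
    PRAGMA busy_timeout = 5000;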

👤 svcrunch
Support for vector indexes and vector similarity queries (dense, and sparse-dense).

I believe PostgreSQL and MySQL will support this within a few years.


👤 truculent
I would be interested in more sophisticated type definitions. That is, algebraic data types.

One can achieve this to a degree by storing the data as JSON, but it would be nice to be able to remove the chance of introducing errors when converting to/from JSON.
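The closest stock SQL gets is a hand-rolled tagged union with CHECK constraints - the shape is enforced, but there's no exhaustiveness checking and the boilerplate grows with each variant. A sketch with an invented schema:

    -- Emulating `Payment = Card(last4) | Iban(iban)` by hand.
    CREATE TABLE payment_methods (
        id    bigserial PRIMARY KEY,
        kind  text NOT NULL CHECK (kind IN ('card', 'iban')),
        last4 char(4),
        iban  text,
        CHECK ((kind = 'card' AND last4 IS NOT NULL AND iban IS NULL)
            OR (kind = 'iban' AND iban IS NOT NULL AND last4 IS NULL))
    );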


👤 dynamite-ready
CouchDB.

- Very smart clustering/redundancy features

- Can be queried over HTTP

- Has a great API

- Counterpart project (Pouch DB) can work on the client

- Queries can be cached (it's HTTP)

- Said to be 'schemaless' but can be structured

The project is always overlooked, but I still feel it's one of the better NoSQL datastores.


👤 vaughan
1. Streaming SQL

Real-time queries should be the default. Some kind of smart query invalidation (similar to incremental view maintenance) as a result of DML statements.

2. Can run in a web browser / mobile phone

Your client-side cache is usually a messy denormalization of your SQL database. For offline/local-first apps you essentially need to run your entire backend API in your browser, so you should be running an SQL database in the browser. For this to be seamless, your database needs to target JS at the design stage.

3. Syncing

To support offline/local/mobile use cases, your database needs to be able to sync changes efficiently.

4. Logical-first data schema

You should be able to give your db a logical schema instead of a physical one, and the database can take care of the necessary denormalizations for performance reasons automatically.

5. Lower-level api for query planner

Instead of having SQL as your only interface, there should be a way to interact with the query tree and the query plan directly. People are always creating new query languages hacked over SQL (like Google's Logica).

6. Graph-based queries

Support a query language like Cypher. Provide better solutions for tree-based data structures - the queries get too unwieldy today.


👤 sandGorgon
Search - really the holy grail: combine full-text search (with proper tf-idf or BM25 support, not the kind that postgres/mysql offer) with the OLTP database itself.

This is a tricky engineering problem - having two kinds of indexes in the same database. But disk space is cheap, and network is expensive (especially if you're on AWS).


👤 gigatexal
Lots of talk of CockroachDB in this thread but no mention of vitess.io? Shame. While I don’t care for MySQL the folks at planetscaleDB (hosted vitess) are doing amazing work. So are the CRDB folks.

👤 jen20
I’m somewhat surprised that “zero downtime failover” does not feature more prominently in responses to this - that would be the biggest thing I’d be looking for in a database for OLTP.

👤 Glyptodon
This is pretty stupid, but I'd be happy to see a SQL dialect where a statement can start with a FROM clause. This is also pretty stupid, but there are also some cases where I wouldn't hate some sort of "JOIN AS CHILD" feature along with results that aren't 100% tabular.
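For illustration, a hypothetical FROM-first dialect (invented syntax, not any shipping database's) would read in rough execution order and let editors autocomplete column names as you type:

    -- Hypothetical syntax; illustration only.
    FROM orders
    JOIN customers ON customers.id = orders.customer_id
    WHERE orders.total > 100
    SELECT customers.name, orders.total;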

👤 kweinber
Automatic versioning of data and CDC (change data capture) broadcast of data deltas to external systems.

👤 ww520
What is modern? If we consider how databases can run better on modern hardware, the trend goes toward plenty of RAM and fast disks (SSD/memory-backed) that don't have the disk-seek problem of the old days. Modern databases being developed today aim for those environments.

👤 bullen
The problem with SQL databases is that the earliest ones were designed before the internet became widespread.

So they have synchronous clients that do not use HTTP or JSON. You need asynchronous clients that do not burn threads on context switches.

Also you always want to replicate all data in more than one location in real-time because 100% read uptime is really easy to provide, while 100% write uptime is a complex nightmare.

I made this database after using MySQL, Oracle and Postgres for 10 years (it's a 2000-line replacement covering the needs these databases filled for me - a very narrow subset of the features you'll find in legacy databases): http://root.rupy.se


👤 talolard
I think AWS Aurora is the most prominent thing that comes to mind. The key value proposition is the separation of storage from compute. That unlocks many promising features: "serverless", a better parallelization story for OLAP queries, etc.

👤 dheera
MongoDB: Stashing unstructured JSON data that you don't really know how you might want to query later.

Also, getting up and running with an investor demo ASAP with zero technical fuss, because you have a startup idea but you're broke and can't pay your next month's rent unless you either (A) finish this demo and get that investor money next week, or (B) quit working on your idea and take the Google offer. (Yes, I've actually been there.)


👤 travisgriggs
Late to the discussion. As others have pointed out, PostgreSQL is plenty modern and capable. Whatever a “modern” database did that Postgres didn't would show up in Postgres soon after said “new” technology had stood the public test of “is it worth it.”

There’s a variant of the question that interests me though. It’s said that much of large-scale software nowadays is built the way we build our cities: one layer on top of another on top of another. Even though a city like London is large and capable and full-featured, it has years of history below it, and sometimes that slow evolution even complicates things. Some might argue that if we had unlimited resources, we could go find a nice new piece of virgin ground and build a New London that could be simplified and not contend with its historical baggage.

So by analogy, if we could respin a fresh new simplified/unified/rationalized version of postgresql today, what would it look like?


👤 spicybright
I'm extremely out of my expertise here, but I'll see if I can spark some conversation.

While possible with older SQL databases through your own code, distributed sharding and keeping multiple databases in sync would, I think, be useful at the DB level rather than in user code.

You can certainly argue that shouldn't be part of the database software though.


👤 Kinrany
A better query language. It should provide control over execution for performance-sensitive queries.

👤 benjaminjosephw
PostgreSQL and even things like SQLite are fantastic tools for programmers with time to invest in learning them and their quirks. What a modern database could do that they can't is be comprehensible to a power user. Modelling and querying data could be far simpler than SQL makes it.

A modern database language/interface would be far more useful to most people than general-purpose programming languages are. We can do better for end users than forcing them to grapple with archaic `SELECT` statements. The crowd that has already invested in learning the language will disagree, of course, but far too many newbies self-select out of that group because it seems too complicated to them. It doesn't have to be so.


👤 yencabulator
I would love to see support for compute near data ("stored procedures") that is compatible with today's practices: real programming languages, version control, testing, etc.

Calvin (http://cs-www.cs.yale.edu/homes/dna/papers/calvin-sigmod12.p...) and VoltDB are examples of different architectures that trade off the ability to do SQL-style interactive transactions (`BEGIN`+run code client side+`COMMIT`), forcing everything through predefined procedures. Combining that with better stored procedures is interesting.


👤 yencabulator
One thing Firestore's JS client library does that is very interesting is that mutations by the in-browser client show up in persistent queries even before they sync with the server, even offline. You can tell from metadata which changes have been committed on the server and which haven't.

This greatly improves latency and simplifies code. Trying to manually merge the mutations into all of your queries would be very awkward.

Also, the whole concept of browser clients interacting directly with the database server is darn interesting. Combine that with better support for stored procedures etc for actually enforcing business logic (somewhat made possible by Firestore security rules, but very awkward).


👤 laurencerowe
More from a document database rather than an OLTP standpoint, but working with JSON data in Postgres I'd really like to see:

* Exclusion Constraints support for GIN indexes (which support Arrays and JSONB). This would allow unique keys on array elements without having to use triggers to put the individual array elements in another table (the workaround sketched below this list).

* Array element foreign keys, which I think is a subset of the Inclusion Constraints proposal (which would additionally allow constraints into ranges.)

* A faster procedural language. While plv8 is an improvement over plpgsql, it still seems substantially slower than running the equivalent in a separate NodeJS process. (Possibly just build issues for me.)
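To make the first point concrete, today's trigger workaround looks roughly like this (a sketch assuming a hypothetical docs(id bigint, tags text[]) table): the side table carries the unique index that the array itself can't.

    -- Side table holding one row per array element.
    CREATE TABLE doc_tags (
        doc_id bigint NOT NULL,
        tag    text   NOT NULL PRIMARY KEY  -- uniqueness across docs
    );

    CREATE FUNCTION sync_doc_tags() RETURNS trigger AS $$
    BEGIN
      DELETE FROM doc_tags WHERE doc_id = NEW.id;
      INSERT INTO doc_tags (doc_id, tag)
          SELECT NEW.id, t FROM unnest(NEW.tags) AS t;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    -- (DELETE handling omitted for brevity.)
    CREATE TRIGGER docs_sync_tags AFTER INSERT OR UPDATE ON docs
    FOR EACH ROW EXECUTE FUNCTION sync_doc_tags();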


👤 jsperson
I’m surprised no one has mentioned Snowflake. The biggest feature here is no worries. It scales in every dimension, is quite stable, and has few knobs to turn. I’m on a Redshift project now, and the differences are significant. It’s not hard to overload the cluster with a complex query. With Snowflake I would just create my own cluster with the same data shared on the back end. While it’s possible to do something similar with Redshift, in practice the configuration and DBA culture make it much more difficult.

Edit - reread the question and it mentions OLTP workloads. Snowflake and Redshift are specialized analytics DBs, so they don't meet that criterion.


👤 deergomoo
I have yet to actually use it for anything, but I like the sound of what EdgeDB is doing [0][1].

The tl;dr is that its schema definitions and query language align much more closely to typical app logic, theoretically eliminating the draw of an ORM and being very good at things that are unwieldy with SQL, like retrieving sets of related records.

[0] https://www.edgedb.com/showcase/data-modeling

[1] https://www.edgedb.com/showcase/edgeql


👤 xupybd
This is a hard question to answer.

You can do most anything with the wrong tool. I'd rather ask the question: when is PostgreSQL or MySQL the wrong tool for the job? I'm not sure I'm qualified to answer that, but I can point you in the direction of a book that has given me a much better understanding of the space. https://www.oreilly.com/library/view/designing-data-intensiv...


👤 jhoechtl
Bitemporal tables.

I sorely miss them on PG.

MariaDB has them https://mariadb.com/kb/en/bitemporal-tables/


👤 bbertelsen
One really cool thing in the analytics space is the adoption of columnar data stores. PostgreSQL and MySQL don't do this (if they can, please correct me!). For embedded loads - situations where you might use SQLite - there are now really interesting alternatives like MonetDB and DuckDB. Paper: https://hannes.muehleisen.org/SIGMOD2019-demo-duckdb.pdf

👤 brainbag
Something I've encountered in practically every project in the last 10 years is a large number of calculated attributes that have to live in code or in triggers. I know MySQL and PostgreSQL both have generated/virtual columns, but they are in-table only. This can be accomplished with triggers, but I want to be able to simply define my table schemas based on schemas from other tables and have the database system handle it automatically.
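For the cross-table case, today that means a trigger, roughly like this (a sketch with an invented schema); the wish is to declare the dependency once instead of maintaining it imperatively:

    -- Wish: orders.total declared as sum(order_lines.amount).
    -- Reality: keep the denormalized column fresh by trigger.
    CREATE FUNCTION refresh_order_total() RETURNS trigger AS $$
    DECLARE
      target bigint := CASE WHEN TG_OP = 'DELETE'
                            THEN OLD.order_id ELSE NEW.order_id END;
    BEGIN
      UPDATE orders
      SET total = (SELECT COALESCE(sum(amount), 0)
                   FROM order_lines WHERE order_id = target)
      WHERE id = target;
      RETURN NULL;  -- AFTER trigger: return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER order_lines_total
    AFTER INSERT OR UPDATE OR DELETE ON order_lines
    FOR EACH ROW EXECUTE FUNCTION refresh_order_total();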

👤 jfbaro
1) Native bitemporal capabilities without hurting performance (too much).

2) Distributed DB, like Spanner and CRDB.

3) Real-time analysis of performance that then adapts the DB config accordingly, without requiring HUMAN intervention.

4) Multiple (joint?) storage types per table (Memory, Columnar or Rows, for example);

5) Native support for Copy-On-Write replicas (useful for ephemeral environments, staging, and development)

6) Serverless capabilities


👤 tdhz77
Does your data model change often? PostgreSQL and MySQL will make fundamental changes very difficult. If you need to make changes to your data model, the structure of relational databases will make it hard for you to move fast.

Linear relationships must be defined. Your database doesn't do much for you. You must define every relationship between tables.


👤 mamcx
Too much focus on "scalability", which only matters for a very narrow niche and is lateral to the DB engine, so I'll instead focus on real progress/improvements for RDBMSs (building this is one of my dreams):

- Algebraic data types, removal of NULLs.

- Including a relational language, not just a partial query language (SQL). (I'm making one at https://tablam.org, just to give the idea)

- So it's fully relational (you can store tables in tables, you can model trees with tables because of the above, etc)

- SQL is an interface for compatibility and such, but the above is for everything else, because:

- The engine is not one big black box but a composite of blocks, so:

-- The inner (and only) black box is the full ACID storage layer, concerned with managing PAGEs, WALs, etc., written in a lang like Rust.

-- The user-facing/high-level storage layer sits above this. I think this can be coded in the language above, because there exists:

-- A pluggable language interface (a "WASM for databases" VM) that other languages (like SQL) compile to. And probably WASM for stored procedures and/or extensions. THEN:

-- This would allow compiling "SELECT field FROM table" CLIENT-SIDE and checking it! (once supplied with the schema definition). AND ALSO:

- Because it doesn't have a limited query language but a full one, you can code a new INDEX with it. Note that doing this in any general-purpose language (ignoring the complexity of storage and ACID, which is where a high-level interface is needed) is simple, but impossible in current RDBMSs.

- Because the DB is truly, fully, relational, you can do "SELECT * FROM Index"

- Then you can add a cargo-like package manager to share code with the community

- Then you can "db-pkg add basic-auth" to install things like auth modules that are actually used, unlike the security machinery included in old databases for use cases not many care about

- Allow real-time subscriptions to data/schema changes

- Make it HTTP-native, so it's already REST/GraphQL/WebSocket/etc. endpoint-capable, and

- Go the extra mile with the idea of Apache Avro or similar and make the description of the DB schema integral to it, so you can compile interfaces to the DB

- Store the schema changes, so it has built-in MIGRATION support (git-like?)

- Then auto-generate DOCS with something like swagger?

----

In relation to the engine itself:

- The storage is mixed row/columnar (PAX-like) to support mixed workloads

- The engine, like sqlite, is a single library. Server support is a separate exe, and the package manager is what installs support for operation

- The DB is stored in a single file?

- We want to store:

-- Rows/Tables: BTrees + PAX like today, nothing out of the ordinary

-- LOGs/Metrics: the same as the WAL! An RDBMS already has it, but it's buried: surface it, so you can do "SELECT * FROM my_wal"

-- Vectors: the same as PAX storage, but with only one column

-- Trees: something you can do if the DB is truly relational and allows storing tables/algebraic types in it (as above)

IF the storage has a high-level interface and there exists a full-featured language ("WASM-like") interface to it, you can add optimizations to the query planner and the code that manipulates the data without having to get into the depths of the engine.

This means that people who want to disable the query planner can INSTEAD improve it - IF the query planner is a surfaced component of the engine that they can tweak.


👤 m_ke
A combination of SQL, Redis, ES, and Columnar stores in a single package.

👤 pankajdoharey
High-volume data ingestion and tunable consistency, because most sites start out with these two databases but eventually end up porting significant sections of their database to Cassandra.

👤 lmm
- True master-master with easy autoscaling out of the box

- Expose lower-level access to MVCC etc. Something finer grained than global transactions.

- Similarly, make indexes more first-class and visible.


👤 znpy
be multi-master and horizontally scalable out of the box.

👤 fnord77
druid / pinot - ingest-time aggregations, plus aggregation of sets on the server, like hyperloglog or theta sketches at either ingest or query time

👤 pat2man
Cockroachdb: automatic multiple primaries + sharding.

👤 acjohnson55
For Postgres: a simpler, scalable connection model.

👤 d--b
Built-in versioning and row-level permissioning

👤 jb1991
Look at Datomic.

👤 darksaints
Get rid of NULLs and ternary logic.

👤 ypcx
Learn a NN model incrementally on INSERT and then use abstraction and inference on SELECT.

👤 gkoberger
Oh I've been thinking a lot about this! It might not be the answer you're looking for since they're all very UX-related and not particularly database-specific, but I have a few ideas.

Sometimes I feel like databases were created by people who never built a website before. Most websites are pretty similar, and databases historically have never felt (to me at least) "modern". I always feel like I'm fighting against them, and making usability concessions for the sake of performance.

First, the ability to subscribe to external data sets. I feel like I spend so much time writing crappy syncing code with external APIs (like Clearbit, GitHub, etc), and it would be so much nicer if I could just "connect" with them and know it will be fairly up to date.

I also think there are so many things everyone finds themselves redoing for no reason. For example, almost every site on the internet has a user database with sessions, and each user has an email (that must be valid + verified), a password (that's hashed + salted) and 2FA (which everyone is basically implementing themselves). It'd be so nice if the database just "knew" it was a user, and you were tweaking the presets rather than building it from scratch.

Every single company has similar workflows they each solve themselves (often in insecure ways): they all have a production database, staging environments, migrations, direct access for customer support to fix things, local db access/clones for development, etc. I'd LOVE a database that was created with all these use-cases in mind... such as a way to connect to a DB locally but scrub sensitive data, take care of migrations seamlessly, etc.

This might be a bit too "in a magical world"-y, but I'd love to not have to think about tradeoffs. Kind of like an automatic car, I'd love my database to be able to shift based on the types of data and amount of read/writes. At my company, we have 3-4 different databases for different reasons (Mongo, REDIS, ElasticSearch, ClickHouse), and it gets really difficult to keep all the data synced and connect them behind the scenes. I'd love to just never have to think about the low-level data store ever again, and have the DB do all the work without us having to worry.

There's a number of primitives that I think are used a lot, and it'd be amazing if they were built in. For example, time. It'd be great to easily get the difference between two times, or total the times of a bunch of rows. Airtable has a time primitive, and it's amazing how much friendlier it is to use.

Overall, I'd also love it to just feel a lot more like Airtable, including an Airtable-like interface for working with it (right down to the ability to create custom views and create on-the-fly forms people can submit data to). I honestly use Airtable for most of my DB needs these days (for one-off small projects), and it's such a delight to use.

Maybe I'm underestimating the importance but... I feel like databases are pretty performant these days. I hope that we can start seeing dramatic UX improvements, since we don't have to optimize for performance the same way we have in the past.


👤 sylvainkalache
Vertical integration with container technologies such as Kubernetes.


👤 7kmph
Make you think that there is hope for humanity after all

👤 mikewarot
Relationships between tables, as supported by Microsoft Access - it did cascading deletes and things automagically.

Persistent queries, where any change to the answer is propagated, like a subscription to a feed.


👤 EGreg
Byzantine Fault Tolerance in distributed systems :)

👤 polskibus
Postgresql does not have a real clustered index (one that is automatically maintained) like MS SQL (and other DBs) have. Such a feature is important for a lot of popular workloads.
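Right - Postgres's CLUSTER is only a one-shot physical reorder (and it takes an exclusive lock while it runs), so it has to be repeated as data drifts:

    -- One-time reorder of the heap by an index; not maintained after.
    CLUSTER orders USING orders_created_at_idx;
    -- Subsequent inserts/updates land out of order until the next run.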

👤 kokizzu3
Easily add a new node with a one-line command and auto-rebalance, I guess.

👤 jstgord
Stored procedures/functions in JavaScript, out of the box?

👤 truenindb
Basically, they aren't going to impress your non-technical boss, who is your boss for some inexplicable reason. Postgres has been able to function as a key-value store (which is what most NoSQL "engines" basically are: write-once, append-only key-value stores) for quite a long time now. Writing code to query your key-value store is another story.

The rest of my admittedly very snarky, but arguably highly accurate commentary on various veins of discussion on this topic will be left in a time capsule below for future generations of people who actually have a legitimate interest in computer programming to discover after the Taliban are driven out of the computer industry.

Theoretically you can replace hiring smart programmers with magical artificial intelligence that reads your mind and then uses the computer to bend the world to your iron will, but if someone else could produce such a magical artificially intelligent database why would they share it with you? Perhaps they are an omnipotent omnipresent omnibenevolent entity that loves you just a little bit more because of your good looks and high intelligence?

Is this different from wanting to do a persistent query? Basically, since you have a programmable computer that you can program to talk to the database program, you write a computer program that periodically queries the database and then takes the action when the condition is met.

SQL is an implementation of a mathematical descriptive language for relationships. The whole point is that temporal logic like "wait until this happens, then do this" can be kept somewhat separate from logic describing the data you are tracking.

You have SQL that describes what it is you want to store, and particular questions you want to ask about what it is you want to store, and then the job of the database program is to figure out how to store the data safely and efficiently and answer your questions quickly. How you write the SQL that describes the way you want to store the data depends some on what kind of questions you want to ask, and this is what an actually skilled "Database Application Programmer" can figure out for you.

Some proprietary (and probably also Postgres) databases do provide support for the kind of thing you are asking to do here, in the form of what are called "Stored Procedures". Your average corporation accumulates an utter shitload of these stored procedures, which various non-technical technical question askers in different departments don't tell each other about, often doing the same thing in different ways at different times. Then later they crash the database and break the application itself, because there is insufficient technical oversight and communication at the actual decision-making levels of the corporate bureaucracy.

Long story short, do all of this stuff in persistent queries done outside of the database and tracked in a shared medium like a wiki page, or even better a physical notebook routinely reviewed as part of management actually doing some managing. https://en.wikipedia.org/wiki/Gantt_chart

This is an indication that you are not using a Model View Controller approach to building your database client. There were some Python ORMs doing this automatically and correctly a decade ago, but there are presumably still several corporations with hundreds of millions of dollars using Enterprise Java Beans in 2021, and maybe some people with nuclear arsenals as well, so you shouldn't consider yourself too behind the times.


👤 Torwald
Write a cheque for an Oracle consultant.