HACKER Q&A
📣 maest

Alternatives to q/kdb?


I'm not referring to the much-talked-about speed of k. Instead, one idea I really like in k is how the database is the programming environment.

The standard setup, e.g. python + postgres, suffers from the data wanting to live in two places: in the db for long-term storage and in your application memory for manipulation/research. So you end up marshalling data to and fro, writing boring code to deal with the subtleties of slightly different types in python vs postgres. Maybe you even try using an ORM, which comes with a host of problems. You also have the problem of not knowing exactly where to put business logic; usually, if it can be expressed as a constraint on a table, it should live in the database. But that's not always the case, so you end up putting some stuff in the client code.

In q/kdb, the database and your application process are the same. It's a joy to naturally intersperse q-sql with q code - it's the kind of step change that makes you wonder how you were managing before discovering it.
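For readers who haven't used q, a rough approximation of the same-process idea in Python: SQLite runs inside the application process, and ordinary Python functions can be registered and called from SQL. This is only a sketch, with SQLite as a stand-in (it is not kdb+ and makes no claim about its performance or semantics); the `trades` table and the `notional` function are made up for illustration.

```python
import sqlite3

# In-process database: no server, no network hop (a stand-in, not kdb+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (sym TEXT, price REAL, size INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("AAA", 10.0, 100), ("AAA", 12.0, 300), ("BBB", 5.0, 200)],
)

# Hypothetical helper: plain Python, registered so SQL can call it directly.
def notional(price, size):
    return price * size

conn.create_function("notional", 2, notional)

# The query mixes SQL aggregation with the Python function above
# to compute a volume-weighted average price per symbol.
vwap = dict(
    conn.execute(
        "SELECT sym, SUM(notional(price, size)) / SUM(size) "
        "FROM trades GROUP BY sym ORDER BY sym"
    ).fetchall()
)
print(vwap)
```

The result comes back as native Python values in the same process, so there is no separate client/server marshalling step, though the SQL is still a string rather than part of the language as it is in q.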

The closest I've come to reproducing this is with pandas + storing csvs on disk, but this has a couple of problems:

1. Pandas syntax is, let's be honest, a hack. They're doing their best, but they have to comply with python syntax rules. You also have like 50 different date and time types. The table index is either useless or actively getting in the way most of the time (and I could go on).

2. Using csvs is also a hack. It's not nearly as efficient as a binary format, and type information is easily lost. The format doesn't allow for easy partitioning either, so it's limited in how it can scale.
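To make the type-loss point in item 2 concrete, here is a minimal sketch using only the Python standard library. The column names and values are invented for the example, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io
from datetime import date

# Hypothetical rows: a date, an integer size, and a float price.
rows = [
    {"day": date(2020, 1, 2), "size": 100, "price": 10.5},
    {"day": date(2020, 1, 3), "size": 250, "price": 10.75},
]

# Write to an in-memory CSV (stands in for a file on disk).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["day", "size", "price"])
writer.writeheader()
writer.writerows(rows)

# Read it back: every field is now a plain string.
buf.seek(0)
loaded = list(csv.DictReader(buf))
types_after = {name: type(value).__name__ for name, value in loaded[0].items()}

# Types have to be restored by hand, column by column.
restored = [
    {
        "day": date.fromisoformat(r["day"]),
        "size": int(r["size"]),
        "price": float(r["price"]),
    }
    for r in loaded
]
print(types_after)
```

The per-column conversion at the end is exactly the kind of bookkeeping that a typed binary format carries for you.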

Anyway, rant aside, are you aware of any other attempts out there at moving code into the database?


  👤 st1ck Accepted Answer ✓
Probably not what you want, but I spent a while looking for a way to get rid of Pandas, and for the most part ClickHouse (a DBMS for analytics) can do almost everything Pandas does (for my use case at least), and much more efficiently. It has very good support for array types, so it's pretty convenient for slightly nested data. Whatever you can't do completely in ClickHouse, you can export into Parquet/JSON/CSV/etc. and finish the analysis in Pandas or anywhere else.

Just be aware that ClickHouse is still a somewhat experimental and idiosyncratic DBMS; it doesn't have some SQL goodies like window functions, B+Tree indexes, or pivots. They only added CTE support a week ago, and I haven't even tried it yet.


👤 arthurcolle
Wish I had more to offer, but here's one neat implementation of K that I found through HN a while back: https://github.com/JohnEarnest/ok

I love the little Web UI this guy wrote along with it as well.