HACKER Q&A
📣 grafs50

What does database (internals) development look like?


I'm considering trying to move towards working as a developer for a database product. I'd like to get some info on what the day-to-day tasks are like for a developer of something like MongoDB, FoundationDB, CockroachDB, NoSQL alternatives... any type of data store really.

Some things in particular I'm trying to figure out:

* As with all dev work, a large portion of work is bug hunting, but what kind of bugs do DBs usually have to deal with?

* Do developers spend a lot of time on optimization or is this mostly just a concern that's figured out during initial development?

* What educational prereqs are there? Do employers (strongly) prefer a Masters or even PhD?

* How is the job market for this kind of work? Obviously demand is going to be much lower than for your standard webdev job, but how is the demand/supply imbalance?

* What employers hire these developers? Is it basically just FAANG and specialty companies a la Cockroach Labs?

Thanks to anyone who takes the time to respond!


  👤 charlysl Accepted Answer ✓
Start by watching Michael Stonebraker's Turing Award lecture. He will give you the lowdown and inspiration: https://youtu.be/BbGeKi6T6QI

Start with relational, understand why it is the reference architecture, and from there learn the tradeoffs involved and what other architectures bring to the table (columnar, streaming, object, in-memory, array, distributed, blockchain, NoSQL, etc.).

To really understand why you should start with relational, read Stonebraker's classic paper "What Goes Around Comes Around": https://people.cs.umass.edu/~yanlei/courses/CS691LL-f06/pape...

It will teach you database evolution history so that you don't end up reinventing the wheel.

Stonebraker's MIT course: https://ocw.mit.edu/courses/electrical-engineering-and-compu...

A few lectures from this course are on YouTube, though not given by him: https://youtube.com/playlist?list=PLfciLKR3SgqOxCy1TIXXyfTqK...

MIT's distributed systems course also touches on databases: https://pdos.csail.mit.edu/6.824/schedule.html

Course by one of his disciples: https://15721.courses.cs.cmu.edu/spring2020/

Yet another disciple (on edX too): https://m.youtube.com/playlist?list=PLYp4IGUhNFmw8USiYMJvCUj...

The red book: http://www.redbook.io/

For learning SQL really well, including relational algebra, I like this course: http://users.cms.caltech.edu/~donnie/cs121/


👤 mikaelronstrom
General optimisations are not so common to work on, but new HW requires new thinking about database architecture.

A few examples: scaling to more CPU cores, to bigger RAM and bigger disks, and to larger systems. Then there is the introduction of new HW such as Intel Persistent Memory, or new GPUs that can be used for various purposes in a database, such as compression and encryption.

Every product has weaknesses that need to be addressed; what these are is obviously product dependent.

Personally, I have spent a very significant amount of time over the last 5 years automating algorithms so that they adapt automatically to load, memory sizes, VM size and so forth.

A Masters is definitely OK, but my Ph.D studies have certainly helped, since they made me do a deep dive into all database algorithms. So a Masters is sufficient to be a database developer, but I would say a Ph.D is a good idea if you aim for a database development architect role down the road. Best of luck in your new tasks.


👤 _wnh9
Why databases specifically? Would you be open to other systems-type work? That would probably significantly expand your options.

My last job was at a company that makes data storage systems [link redacted]. That thing probably doesn't look like a database to most people, but that is exactly what it is under the hood. We had quite a few ex-Oracle people on staff too, and their skills were very useful.

The bugs were pretty fun actually. We've had to deal with network card firmware corrupting frames, a CPU bug, PCIe issues, and of course the much more numerous (and mundane) kernel bugs and run-of-the-mill null pointers and memory leaks.

And before anyone says "just use Rust": the company started many years before Rust was a thing and there was simply too much to rewrite.

There is almost always room for a good generalist developer in a company like that. You don't have to be a domain expert to join. But of course there will also be some people with PhDs on staff. Learning from them is another draw.


👤 brundolf
Similar questions could be asked about working in compilers, and lots of other foundational domains. Pretty much anything where the work is sufficiently removed from end-user products.

I'm curious to hear some general answers.


👤 LarryMade2
Data needs vary from company to company and from application to application. Several things that come to mind are speed, accuracy, scalability, portability, and specific use cases (e.g. medical, GIS...).

As data needs vary, so do the skills and knowledge required. Best to figure out which field of data interests you and learn towards the environments that are used in it.


👤 kathoum
Here is a three-year-old post about working as a DB developer at Oracle:

https://news.ycombinator.com/item?id=18442637


👤 plasma
Check out the PostgreSQL mailing lists at https://www.postgresql.org/list/ and in particular pgsql-hackers.


👤 juangacovas
As others have mentioned, MariaDB also hosts a Jira: jira.MariaDB.org

A curious detail is that Jira itself doesn't officially support MariaDB.


👤 joshxyz
I think another good example is the ClickHouse analytics DB. Their development is all on GitHub, and it's fun to read the issues and release notes every once in a while.