HACKER Q&A
📣 ud0

Resources for Learning about Databases


I recently got a Frontend Engineering offer from Facebook. I have dabbled in backend work earlier in my career but never did anything beyond simple APIs on a single server. I set a goal for myself that every year I pick a computer science topic & study it thoroughly. I don't have a CS background but I am beginning to love CS.

I started with Data Structures & Algorithms last year, studied & did over 250 LeetCode problems, which is why I was able to land an offer with a FAANG company.

Next on my list is Databases. I want to know how they work internally, build a simple RDBMS from scratch, and learn SQL (I know simple CRUD operations), including advanced concepts like procedures & the latest features being used today.

Yes, I have googled, but I haven't found any resource that meets my needs. I also plan to switch to backend soon.

Thanks in advance

Edit: I know I will not be building databases at Facebook, & I also know they probably have internal tools or an ORM to access databases. My goal is not to become a database developer but to have a good knowledge of how they work, just to satisfy my curiosity.


  👤 petepete Accepted Answer ✓
It gets recommended all the time in these kinds of threads, but it's so good I don't care: Bill Karwin's SQL Antipatterns. You need a decent understanding of the basics to get the most from it, but there's some excellent information and examples of what to (and what not to) do.

https://www.oreilly.com/library/view/sql-antipatterns/978168...


👤 photon_lines
Great intro and overview: http://coding-geek.com/how-databases-work/

Great book: https://dataintensive.net/

Also great read and overview: http://www.redbook.io/

Great paper over-viewing the architecture of a DB: https://perspectives.mvdirona.com/content/binary/Architectur...

If you're looking into building your own database, there are some great open source projects you can reference here: https://github.com/danistefanovic/build-your-own-x#build-you...

If you want to actually dive into source code - SQLite is amazing. It has very clean and readable code, so I'd suggest using it as a reference as well: https://github.com/mackyle/sqlite


👤 lioeters
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

https://dataintensive.net/


👤 tony
Anything by Joe Celko: SQL for Smarties, and Trees and Hierarchies in SQL for Smarties.

Also, the internals of Django ORM (https://github.com/django/django/tree/2.2.5/django/db/models) and SQLAlchemy Core (https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/...) and its dialects (https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/...) + ORM (https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/...)


👤 dominotw
I have three things for you

1. Designing Data-Intensive Applications

2. Database Internals https://www.amazon.com/Database-Internals-deep-dive-distribu...

3. Andy Pavlo's database course videos at CMU and guest lecture series https://www.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA


👤 cbanek
I'd really advise against building a database from scratch. It's just too annoying, and there's so much code to write (parser, storage, indexing, query planner, connections). If you're interested in internals, I'd say look at the SQLite codebase instead (https://sqlite.org/src/doc/trunk/README.md). If anything, reading code that works is probably more useful than writing code that almost certainly won't without months and possibly years of effort.

A lot of the more complex database things are only really learned by having a large database system. Performance, distributed databases, and complex schemas come to mind here. Most of the time with simple examples, you'll do something wrong performance-wise (such as forgetting an index, or doing a bad join), but you'll never know because the scale is too small for it to show.
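A minimal sketch of the "forgotten index" case, assuming SQLite through Python's built-in sqlite3 module. The `orders` table, its columns, and the index name are made up for illustration. With only 1,000 rows both queries return instantly, so the only way to notice the difference is to ask the planner what it is doing:

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table, purely for illustration.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

# No index on customer_id yet: the planner has to read every row.
plan_without_index = query_plan(conn, sql, (42,))

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Same query, but now the planner can look rows up through the index.
plan_with_index = query_plan(conn, sql, (42,))

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should be a full table scan and the second a search that names the index.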

Many times, you don't need to know that much about databases other than some basic SQL.


👤 cozos
These are more academic than practical (i.e. not about building a DB from scratch), but still interesting, I think.

https://github.com/rxin/db-readings


👤 dbattaglia
I found this an enjoyable resource for learning about one of the fundamentals of RDBMS, indices: https://use-the-index-luke.com/
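To give a taste of the kind of thing that topic covers, here is a small sketch of how column order in a multi-column index matters. It assumes SQLite via Python's built-in sqlite3 module; the `employees` table and index name are hypothetical and not taken from the site:

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, salary REAL)"
)
# One index over two columns; last_name is the leading column.
conn.execute("CREATE INDEX idx_emp_name ON employees (last_name, first_name)")

# Filtering on the leading column lets the planner search the index.
plan_leading = query_plan(
    conn, "SELECT salary FROM employees WHERE last_name = ?", ("Smith",)
)

# Filtering only on the second column cannot use the index for a lookup
# (no statistics have been gathered here), so the table is scanned.
plan_trailing = query_plan(
    conn, "SELECT salary FROM employees WHERE first_name = ?", ("Ann",)
)

print("last_name filter: ", plan_leading)
print("first_name filter:", plan_trailing)
```

Other database engines have their own planners and may behave differently, so treat this as a SQLite-specific illustration rather than a general rule.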

👤 joddystreet
Anything on this channel: https://m.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA (CMU Database Group). All thanks to Andy Pavlo (https://mobile.twitter.com/andy_pavlo), who has a quote, something like "I only love two things, my wife and the databases". Follow his lectures and read his suggested papers.

👤 JacKTrocinskI
I would pick one RDBMS and try to dissect it. There are a lot to choose from nowadays; you can check out db-engines to get a general sense of what's out there:

https://db-engines.com/en/ranking

From what I have seen, most enterprises today use Oracle or Microsoft, but PostgreSQL seems to have gained popularity with the web developer and small business crowd (as well as with the HN community). I have been an Oracle database developer since 2015 and would definitely recommend going that route if it interests you. At the very least it might be a good starting point because of the fantastic documentation. Here's a great guide I recommend to get you started with all the basic concepts:

https://docs.oracle.com/en/database/oracle/oracle-database/1...


👤 dqpb
Readings in Database Systems http://www.redbook.io/

👤 jjirsa
Alex’s book: https://www.databass.dev/

👤 tmsh
AWS re:invent 2018 talks:

https://youtu.be/HaEPXoXVf2k

He has a sequence of 2-3 great talks on DynamoDB, the history of relational databases and the rise of access-pattern oriented db design.


👤 dantodor
I second the previous recommendations for Designing Data-Intensive Applications and Database Internals. I would also suggest looking at the Jepsen tests, https://aphyr.com/tags/jepsen, and Adrian Colyer's blog, https://blog.acolyer.org/

👤 winrid
You could port a database from one language to another as a learning exercise.

👤 Vaslo
Anything with exercises and answers?

👤 diehunde
Andy Pavlo's courses on YouTube.

👤 Takiya
Yes

👤 amirouche
Do you want to build a database or use one?

Also, at Facebook they probably use some API to access the database.