What exactly happens under the hood? Why does storing by column rather than by row increase read speed and reduce insertion speed, respectively?
The biggest one in my mind is cache friendliness.
If you are processing a column you are running one piece of code on one piece of data, then the next, and the next. If you are processing rows you might be running that same piece of code, but you are also running 20 other pieces of code for the other 20 columns. The code for column B kicks the code for column A out of the instruction cache 20 times per row. Instead of using the data bus to move data, you are loading instructions over and over again. It's like trading your 2022 processor for a 1993 Pentium.
Because it is a holistic property of the system, it defies microbenchmarks. For instance, somebody might benchmark the code for each of the 20 columns and find that each one runs fast on its own. When you run one, then another, then another, you get a big performance drop that doesn't show up unless you test for it specifically.
On top of that, the access pattern for the data is simpler and more predictable for columns. If you are scanning from one address to another in a straight line, the CPU can start prefetching data long before you need it.
The cache benefits don't just apply to the multiple caches inside the CPU but also to the RAM caching of storage (e.g. the OS page cache), caches hidden inside the storage devices, and so on.
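To make that concrete, here's a rough sketch in Python of the same "sum one column" query over both layouts. It's not a real engine, and Python hides the actual memory layout, so treat it as a picture of the access pattern rather than a faithful cache benchmark; the table shape and numbers are made up for illustration.

```python
# Minimal sketch (not a real database engine): sum one column out of a
# row-oriented layout vs a column-oriented layout.
from array import array

NUM_ROWS, NUM_COLS = 100_000, 20   # made-up table shape

# Row-oriented: all 20 fields of a row sit next to each other.
rows = [tuple(float(r * NUM_COLS + c) for c in range(NUM_COLS))
        for r in range(NUM_ROWS)]

# Column-oriented: each column is one contiguous array of doubles.
columns = [array("d", (float(r * NUM_COLS + c) for r in range(NUM_ROWS)))
           for c in range(NUM_COLS)]

def sum_column_row_store(col_index):
    # Has to walk every row and skip past the other 19 fields,
    # dragging all of them along just to read one.
    return sum(row[col_index] for row in rows)

def sum_column_column_store(col_index):
    # Walks one contiguous array start to finish: the straight-line
    # access pattern the hardware prefetcher loves.
    return sum(columns[col_index])

assert sum_column_row_store(3) == sum_column_column_store(3)
```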
As for write efficiency, a row-based database frequently needs to update just one block, or a few consecutive blocks, in the storage system. The column-based system has to touch at least 20 blocks to update 20 columns.
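As a back-of-the-envelope illustration (the block and field sizes below are assumptions, not taken from any particular database), you can count how many blocks a single-row insert dirties in each layout:

```python
# Rough sketch: how many storage blocks does inserting one 20-column row dirty?
BLOCK_SIZE = 8192     # bytes per storage block (assumed)
NUM_COLS = 20
FIELD_SIZE = 8        # bytes per field (assumed fixed-width)

row_bytes = NUM_COLS * FIELD_SIZE

# Row store: the whole row lands in one (or a few consecutive) blocks.
row_store_blocks = -(-row_bytes // BLOCK_SIZE)   # ceiling division -> 1

# Column store: each field goes to its own column file/segment,
# so the insert touches one block per column.
column_store_blocks = NUM_COLS                   # -> 20

print(f"row store:    {row_store_blocks} block(s) dirtied")
print(f"column store: {column_store_blocks} block(s) dirtied")
```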
* Often the columns are stored sorted, so an insert that lands in the middle of the existing data can cause a lot of IO as the data gets shuffled around. (This is less of a problem if your writes only ever append; see the sketch below.)
* Inserting a single row into a columnar store sometimes means you have to write each field into a different place, while in a row-based store you can often write all the fields at once.
The upshot is that inserts or updates on a columnar store can result in more IO operations than on a row store, which translates to slower writes to disk.
However, you get gains for analytical read workloads for the reasons others have described here.
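Here's a small sketch of the sorted-column point (an assumed layout, not any specific engine): inserting into the middle of a sorted, contiguous column shifts everything after the insertion point, while an append-only write moves nothing.

```python
# Sorted, contiguous column of 64-bit ints (even numbers, so there are gaps).
from array import array
import bisect

column = array("q", range(0, 1_000_000, 2))   # 500,000 sorted entries

def insert_sorted(col, value):
    pos = bisect.bisect_left(col, value)
    col.insert(pos, value)                    # shifts every entry after pos
    return len(col) - pos - 1                 # how many entries had to move

moved = insert_sorted(column, 1)              # lands near the front
print(f"sorted insert moved {moved} existing entries")   # ~499,999

column.append(column[-1] + 2)                 # append-only write: moves nothing
```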
A row-oriented DB lays out data in storage by putting, for each row, all of its columns one after another, and then row after row. A column-oriented DB splits storage by column instead, putting everything that's in a column next to each other.
Given columns A, B, C:
row-oriented: A1,B1,C1,A2,B2,C2,A3,B3,C3, ...
column-oriented: A1,A2,A3 ... B1,B2,B3, ... C1,C2,C3, ...
Accessing data that's next to each other is faster than accessing data that's split apart, and reading more data takes longer than reading less data - hence the difference in access speeds. (The details are of course more complicated, but this is what the fundamental idea boils down to.)
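A tiny sketch of those two layouts, using the same A/B/C naming, with plain Python lists standing in for contiguous storage:

```python
rows = [("A1", "B1", "C1"),
        ("A2", "B2", "C2"),
        ("A3", "B3", "C3")]

# Row-oriented: A1,B1,C1,A2,B2,C2,A3,B3,C3
row_oriented = [field for row in rows for field in row]

# Column-oriented: A1,A2,A3,B1,B2,B3,C1,C2,C3
column_oriented = [row[c] for c in range(3) for row in rows]

# Reading all of column B from the column-oriented layout is one contiguous
# slice; from the row-oriented layout it's every 3rd element.
b_from_columns = column_oriented[3:6]   # ['B1', 'B2', 'B3']
b_from_rows    = row_oriented[1::3]     # ['B1', 'B2', 'B3']
assert b_from_columns == b_from_rows
```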