Something I didn't understand for a while is that turning row-oriented data into column-oriented data isn't a totally bespoke, foreign concept in the realm of databases. It still fits within the relational abstraction, or at least it can.
As an example, say we have this data:
data = [
    {"name": "Smudge", "colour": "black"},
    {"name": "Sissel", "colour": "grey"},
    {"name": "Hamlet", "colour": "black"},
]
This represents a table. Let's assume it really is a table in a relational database, and that we have to do some kind of disk access to reach any particular part of the data. Stored this way, the data has some nice properties.
It's easy to add a new row. We just construct one:
{"name": "Petee", "colour": "black"}
and add it to the end of our existing list. On disk, we probably only have to touch a couple of pages to do it. And if our rows were really wide, with a whole bunch of columns, that wouldn't change much: appending a row would still only touch the end of the table.
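As a minimal in-memory sketch, assuming a Python list of dicts stands in for the table (a list is not a disk layout, so this only shows the shape of the operation), adding Petee is a single append of one complete record:

```python
# In-memory stand-in for the row-oriented table: one dict per row.
data = [
    {"name": "Smudge", "colour": "black"},
    {"name": "Sissel", "colour": "grey"},
    {"name": "Hamlet", "colour": "black"},
]

# Inserting a row means building one complete record...
new_row = {"name": "Petee", "colour": "black"}

# ...and appending it. Existing rows are untouched; the whole new record
# lands in one place, at the end of the list.
data.append(new_row)

print(len(data))  # 4
print(data[-1])   # {'name': 'Petee', 'colour': 'black'}
```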
This is also true of looking up a row. Since all of a row's columns are stored next to each other, it's very fast to pull that row out from wherever it's stored.
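Continuing with the same in-memory stand-in, fetching a row is one access that returns every column at once. `get_row` is a hypothetical helper for illustration, and it looks rows up by position rather than by any key:

```python
from typing import Dict, List

data = [
    {"name": "Smudge", "colour": "black"},
    {"name": "Sissel", "colour": "grey"},
    {"name": "Hamlet", "colour": "black"},
]


def get_row(rows: List[Dict[str, str]], position: int) -> Dict[str, str]:
    """Hypothetical helper: return the whole record at `position`.

    All of the row's columns live in the same dict, so a single access
    yields the name and the colour together.
    """
    return rows[position]


row = get_row(data, 1)
print(row)  # {'name': 'Sissel', 'colour': 'grey'}
```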
Conversely, if we wanted to, say, compute a histogram of the different pet colours, we'd have to read quite a lot of data we don't care about (every pet's name) in order to do so.
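Here is a sketch of that histogram over the row-oriented stand-in, using `collections.Counter` from the standard library. In memory we only ever index `row["colour"]`, but on disk each row we load would bring its name along with it, since the two are stored side by side:

```python
from collections import Counter

data = [
    {"name": "Smudge", "colour": "black"},
    {"name": "Sissel", "colour": "grey"},
    {"name": "Hamlet", "colour": "black"},
]

# We must visit every row to count colours. In a real row-oriented store,
# each row fetched also carries its "name", which the histogram never uses.
colour_histogram = Counter(row["colour"] for row in data)
print(colour_histogram)  # Counter({'black': 2, 'grey': 1})
```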
This is a row-oriented representation of the data. A column-oriented representation would look something like this:
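One plausible column-oriented layout, assumed here rather than taken from the original, is a dict mapping each column name to a list of values, where list position is what ties one row's values together. `to_columns` and `append_row` are hypothetical helpers for illustration. The sketch shows the pivot, the histogram reading a single column, and the flipped trade-off for inserts:

```python
from collections import Counter
from typing import Dict, List

# The same three rows, in row-oriented form.
data = [
    {"name": "Smudge", "colour": "black"},
    {"name": "Sissel", "colour": "grey"},
    {"name": "Hamlet", "colour": "black"},
]


def to_columns(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Hypothetical helper: pivot a list of row dicts into one list per column.

    Assumes every row has the same keys. The i-th entry of each column
    list belongs to the i-th row.
    """
    if not rows:
        return {}
    columns: Dict[str, List[str]] = {key: [] for key in rows[0]}
    for row in rows:
        for key, values in columns.items():
            values.append(row[key])
    return columns


def append_row(columns: Dict[str, List[str]], row: Dict[str, str]) -> None:
    """Hypothetical helper: inserting one row now touches every column."""
    for key, values in columns.items():
        values.append(row[key])


columns = to_columns(data)
print(columns)
# {'name': ['Smudge', 'Sissel', 'Hamlet'], 'colour': ['black', 'grey', 'black']}

# The histogram now reads only the "colour" column; no names are loaded.
colour_histogram = Counter(columns["colour"])
print(colour_histogram)  # Counter({'black': 2, 'grey': 1})

# The flip side: a single new row has to be split across every column.
append_row(columns, {"name": "Petee", "colour": "black"})
print(columns["name"][-1], columns["colour"][-1])  # Petee black
```

The strengths have swapped: scanning one attribute is cheap, while adding or reassembling a row now means visiting every column.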