Cassandra and de-normalization

An article on working with graph-like data in Cassandra, which basically translates to making it all one really long row, with timestamps as the column IDs.
The cost of denormalisation is duplication of data, and in this case we have duplicated each message by copying it to the timeline of every user who follows the author of that message. This means we incur a write cost when broadcasting a message, since we must insert the same message multiple times. Luckily for us, Cassandra is optimised for high write throughput (writes perform only sequential I/O) and it is this performance profile of Cassandra that allows us to trade some write speed for increased read throughput.
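To make the trade-off concrete, here is a minimal sketch of that fan-out-on-write timeline pattern using the DataStax Python driver. The table and column names (user_timeline, broadcast, etc.) are my own illustrative assumptions, not taken from the article.

```python
# Sketch of fan-out-on-write timelines in Cassandra (names are assumptions).
import uuid
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("demo")

# One wide row per user: partition key = user, clustering key = timeuuid,
# so each timeline is "one really long row" ordered by time.
session.execute("""
    CREATE TABLE IF NOT EXISTS user_timeline (
        user_id    text,
        message_id timeuuid,
        author     text,
        body       text,
        PRIMARY KEY (user_id, message_id)
    ) WITH CLUSTERING ORDER BY (message_id DESC)
""")

insert = session.prepare("""
    INSERT INTO user_timeline (user_id, message_id, author, body)
    VALUES (?, ?, ?, ?)
""")

def broadcast(author, body, followers):
    """Denormalised write: copy the message into every follower's timeline."""
    message_id = uuid.uuid1()  # time-based UUID doubles as the timestamp
    for follower in followers:
        session.execute(insert, (follower, message_id, author, body))

def read_timeline(user_id):
    """Cheap read: a single-partition slice, newest first."""
    return list(session.execute(
        "SELECT author, body FROM user_timeline WHERE user_id = %s LIMIT 20",
        (user_id,),
    ))
```

The point of the shape is exactly what the quote says: broadcast does N inserts (one per follower), but reading a timeline is a single sequential slice of one partition.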
It works (and how!), but it's this kind of big-data solution that gives me the heebie-jeebies...
