CouchDB - blessings and curses

John Wood (of Signal fame) has a post up about Signal's experience moving to, and away from, CouchDB. It's an interesting real-world example of what I've described in "NoSQL - What you'll find (for sure!)". To recap, when first getting into NoSQL, you are sure to find that:

1. You didn't understand your own problem-space as well as you thought you did.
2. You didn't understand the package that you are using as well as you thought you did.
3. It will not scale the way you thought it would. Oh, it'll scale all right, just not the way you thought it would.
4. Your object/document/JSON/whatever model really doesn't map exactly the way you expected it to.

In John's case, they found that

1. HTTP is a Very Slow Database Protocol
2. MVCC Overhead (is bad)
3. Large Databases Beat Up the Hard Disk
4. CouchDB is not a Distributed Database (by default)
5. map/reduce takes a while to get used to
6. Views take forever to build
7. Views are gigantic on disk
8. Replication issues

Taking these one by one from my perspective:

1. HTTP is a Very Slow Database Protocol : Yes it is. And if your system is user I/O bound (as ours is), then it really doesn't matter :-) Which is a serious point: it all depends on where the bottleneck of the system is.
2. MVCC Overhead (is bad) : Which can be a feature or a bug. For us, it's a feature: we know that our records are safely written, and design accordingly.
3. Large Databases Beat Up the Hard Disk : Absolutely true. And we've got a couple of administrative scripts that are constantly running in the background compacting all the shards and views. Which, in my mind, is no different from the VACUUM / REINDEX scripts that we had running on the PostgreSQL environment in our previous product. There is always some administrative overhead, no way of getting around that.
4. CouchDB is not a Distributed Database (by default) : Again, absolutely true. That is why we went with BigCouch, which works pretty spectacularly: auto-sharding, distributed queries, fault tolerance, etc. It works brilliantly, and I couldn't be happier w/ the folks at Cloudant.
5. map/reduce takes a while to get used to : Yet again, absolutely true. Then again, we're Erlang based, and that probably helped us quite a bit. Also, it's kind of difficult to *not* think map/reduce after using it for a while (think working in Erlang, and then going back to Java. Shudder)
6. Views take forever to build : Sigh. True again. But then again, when using BigCouch, it's actually Forever / Nodes (OK, shards also matter). Seriously though, this is an issue, and one that we address through some serious development / deployment methodology (each view is a different document, incremental changes, etc.) which allows for easy rollbacks, etc.
7. Views are gigantic on disk : True true true. But, three things here: you need to be careful about how many values you are emitting per row, you need to check if multiple views can be used instead of a single complex one, and you need to be careful about maintenance (point 3 above). With a little care/feeding, our view sizes collapsed from gigabytes to tens of megabytes (really!)
8. Replication issues : BigCouch. 'nuff said. See point 4 above.
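
To make point 3 a little more concrete, here is a minimal sketch of the kind of background compaction job described there. It is not our actual script: the node URL, database name and design-document names are made up for illustration, and it targets the compaction endpoints of a plain CouchDB node (`POST /{db}/_compact` for the database, `POST /{db}/_compact/{ddoc}` for the views of one design document). How you reach the individual shards on a BigCouch cluster is deployment-specific and not shown.

```python
import urllib.parse
import urllib.request


def compaction_requests(base_url, db, design_docs):
    """Build (but do not send) the POST requests that trigger compaction.

    base_url, db and design_docs are hypothetical values supplied by the
    caller, e.g. "http://localhost:5984", "events", ["stats"].
    """
    # CouchDB expects a JSON content type on the compaction endpoints.
    headers = {"Content-Type": "application/json"}
    db_path = "/" + urllib.parse.quote(db, safe="")

    # One request for the database file, then one per design document (views).
    paths = [db_path + "/_compact"]
    for ddoc in design_docs:
        paths.append(db_path + "/_compact/" + urllib.parse.quote(ddoc, safe=""))

    root = base_url.rstrip("/")
    return [
        urllib.request.Request(root + path, data=b"", headers=headers, method="POST")
        for path in paths
    ]


def run_compaction(base_url, db, design_docs):
    """Send the compaction requests one after another and report the status."""
    for request in compaction_requests(base_url, db, design_docs):
        with urllib.request.urlopen(request) as response:
            print(request.full_url, response.status)
```

Something like `run_compaction("http://localhost:5984", "events", ["stats"])`, fired from cron or a long-running loop, plays the same role that a scheduled VACUUM / REINDEX does on PostgreSQL. Compaction runs in the background on the server, so the call returns as soon as the job is accepted.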

Seriously, do understand that this is not a rebuttal. OK, maybe it is, but I am not saying "John Wood is wrong". If anything, I am absolutely agreeing with him when he says

I would never throw out a hammer because it didn’t help me drive in a screw.

To take that analogy to its logical extreme, he had a whole bunch of screws, and needed a screwdriver. Me, I had a whole bunch of nails, and actually needed the hammer. :-)

For what it's worth, I hope this helps someone else when they get around to choosing between Hammers and Screwdrivers.

Note: Our bigcouch installation sucks in ~2M documents per day, at around 1K/document...
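
For a feel of what that note means as a sustained rate, here is a quick back-of-the-envelope calculation. It assumes "2M" is exactly 2,000,000 documents, "1K" is 1024 bytes, and that the load is spread evenly over the day, none of which is exactly true in practice.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rates(docs_per_day=2_000_000, bytes_per_doc=1024):
    """Turn a daily document count into average per-second and per-day figures.

    The defaults are assumptions read off the note above, not measurements.
    """
    bytes_per_day = docs_per_day * bytes_per_doc
    return {
        "docs_per_second": docs_per_day / SECONDS_PER_DAY,
        "kib_per_second": bytes_per_day / SECONDS_PER_DAY / 1024,
        "gib_per_day": bytes_per_day / 1024 ** 3,
    }


if __name__ == "__main__":
    for name, value in ingest_rates().items():
        print(f"{name}: {value:.2f}")
```

That works out to roughly 23 documents per second on average, and a little under 2 GiB of raw document data per day, before any view or MVCC overhead on disk.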
