CouchDB - blessings and curses
John Wood (of Signal fame) has a post up about Signal's experience moving to, and away from, CouchDB. It's an interesting real-world example of what I've described in "NoSQL - What you'll find (for sure!)". To recap, when first getting into NoSQL, you are sure to find that:
- You didn't understand your own problem-space as well as you thought you did.
- You didn't understand the package that you are using as well as you thought you did.
- It will not scale the way you thought it would. Oh, it'll scale all right, just not the way you thought it would.
- Your object/document/JSON/whatever model really doesn't map exactly the way you expected it to.
The issues John lists with CouchDB are:
- HTTP is a Very Slow Database Protocol
- MVCC Overhead (is bad)
- Large Databases Beat Up the Hard Disk
- CouchDB is not a Distributed Database (by default)
- map/reduce takes a while to get used to
- Views take forever to build
- Views are gigantic on disk
- Replication issues
My take on each of these:
- HTTP is a Very Slow Database Protocol : Yes it is. And if your system is user-I/O bound (as ours is), then it really doesn't matter :-) Which is a serious point: it all depends on where the bottleneck of the system is.
- MVCC Overhead (is bad) : Which can be a feature or a bug. For us, it's a feature: we know that our records are safely written, and design accordingly.
- Large Databases Beat Up the Hard Disk : Absolutely true. And we've got a couple of administrative scripts that are constantly running in the background, compacting all the shards and views. Which, in my mind, is no different from the VACUUM / REINDEX scripts that we had running on the PostgreSQL environment in our previous product. There is always some administrative overhead, and no way of getting around that.
- CouchDB is not a Distributed Database (by default) : Again, absolutely true. That is why we went with BigCouch, which works pretty spectacularly: auto-sharding, distributed queries, fault tolerance, etc. It works brilliantly, and I couldn't be happier w/ the folks at Cloudant.
- map/reduce takes a while to get used to : Yet again, absolutely true. Then again, we're Erlang based, and that probably helped us quite a bit. Also, it's kind of difficult to *not* think map/reduce after using it for a while (think working in Erlang, and then going back to Java. Shudder).
- Views take forever to build : Sigh. True again. But then again, when using BigCouch, it's actually Forever / Nodes (ok, shards also matter). Seriously though, this is an issue, and one that we address through some serious development / deployment methodology (each view in a different document, incremental changes, etc.), which allows for easy rollbacks.
- Views are gigantic on disk : True true true. But, three things here: you need to be careful about how many values you are emitting per row, you need to check if multiple views can be used instead of a single complex one, and you need to be careful about maintenance (see the compaction scripts under "Large Databases Beat Up the Hard Disk" above). With a little care/feeding, our view sizes collapsed from Gigabytes to Tens of Megabytes (really!).
- Replication issues : BigCouch. 'nuff said. See the "CouchDB is not a Distributed Database" point above.
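For the curious, here is roughly what the compaction housekeeping mentioned above can look like. This is a minimal sketch, not our actual scripts: it assumes a plain CouchDB-style HTTP API (`POST /{db}/_compact` for the database, `POST /{db}/_compact/{design-doc}` for the views of one design document), and the base URL, database name, and design document names are all hypothetical. It does not cover how a BigCouch cluster names or exposes its individual shards.

```python
import urllib.request
from urllib.parse import quote


def compaction_requests(base_url, db, design_docs):
    """Build the POST requests that ask the server to compact one database
    and the view indexes of each listed design document.

    base_url, db, and design_docs are placeholders supplied by the caller.
    """
    headers = {"Content-Type": "application/json"}  # compaction calls expect JSON
    db_path = quote(db, safe="")
    urls = [f"{base_url}/{db_path}/_compact"]
    urls += [
        f"{base_url}/{db_path}/_compact/{quote(name, safe='')}"
        for name in design_docs
    ]
    return [
        urllib.request.Request(url, data=b"", headers=headers, method="POST")
        for url in urls
    ]


def run_compaction(requests):
    """Send each request; the server starts compaction in the background."""
    for req in requests:
        with urllib.request.urlopen(req) as resp:
            print(req.full_url, resp.status)


# Example (needs a reachable server, so it is left commented out):
# run_compaction(compaction_requests("http://localhost:5984", "events", ["by_day"]))
```

Run something like this from cron or a background loop, and it plays the same role as the VACUUM / REINDEX jobs on the PostgreSQL side.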
I would never throw out a hammer because it didn’t help me drive in a screw.
To take that analogy to its logical extreme, he has a whole bunch of screws, and needed a screwdriver. Me, I had a whole bunch of nails, and actually needed the hammer. :-)
For what it's worth, I hope this helps someone else when they get around to choosing between Hammers and Screwdrivers.
Note: Our bigcouch installation sucks in ~2M documents per day, at around 1K/document...
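To put those numbers in perspective, here is the back-of-the-envelope arithmetic as a small sketch. It assumes "1K" means 1000 bytes and that the load is spread evenly over the day (it almost certainly isn't), and it counts raw document bytes only, with no view, revision, or replication overhead.

```python
DOCS_PER_DAY = 2_000_000   # ~2M documents per day, from the note above
BYTES_PER_DOC = 1_000      # assumption: "1K" taken as 1000 bytes
SECONDS_PER_DAY = 24 * 60 * 60

# Average ingest rate if writes were spread evenly across the day.
docs_per_second = DOCS_PER_DAY / SECONDS_PER_DAY

# Raw document volume per day, in decimal gigabytes.
gb_per_day = DOCS_PER_DAY * BYTES_PER_DOC / 1e9

print(f"{docs_per_second:.1f} docs/s, {gb_per_day:.1f} GB/day")
# prints "23.1 docs/s, 2.0 GB/day"
```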