If you are discarding data, you are doing something wrong.


Big Data is all the rage - everybody is talking about it, and some people are even doing something about it.  But before we go there, let's just look at the data you already have.



Of course you already know the value of all this data. You are, after all, using it to run your product - be it an iPhone App, a telecom service, or a wire-transfer system.  That said, data is just the first step in the Data --> Information --> Knowledge --> Wisdom food chain.  By focusing only on the first - or maybe the first two - steps of the food chain, you risk losing out on the big picture, and possibly Not Making It In The Long Run.
  • Some of your iPhone Apps may have made money, but performing demographic and geographic analysis on the purchasers could help you figure out why.  Adding historical context could help you figure out what changed - for better or worse!
  • Your telecom service provides nice usage reports, but you could be providing valuable insight to your clients by allowing them to look for patterns in this data, or to integrate it with their own internal data.
  • Your wire-transfer system may be functioning, but analyzing the trends could help you prevent those pesky law-enforcement types from showing up because of all that money suddenly flowing into the Caymans.  (I know, it's a simple example, but when we did it back then, it was pretty revolutionary!)

Yes, of course you can do that with <insert Oracle, Postgres, MySQL, etc. here>.  Why haven't you done so already?  I bet it's some combination of Time/Energy/Money/Ability, combined with all sorts of database constraints (We need a new reporting server!  The View needs to be optimized!).  In contrast, services like Bime, Birst, and GoodData are ridiculously easy to use, and if you don't want to outsource this, it's remarkably easy to stand one of these up yourself with CouchDB/CouchApps.  In all of this, Flexibility is key: the factors that you care about change frequently, and the data you are manipulating depends entirely on your (ever-evolving) business requirements.  And historical context is almost invariably critical.  Yes, there is absolutely a set of problems that are completely dependent on Velocity (Twitter feeds monitored for mood, to predict stock market changes), but Volume is just as important, if not more so.  And you can't get Volume if you throw away your data.
If you are discarding data, you are doing something wrong.  

This brings up the question - in this day and age, why would anybody ever want to throw away data?  Storage is - effectively - free, and the cumulative value of data is, well, incalculable.  I know, there are database constraints that your production system suffers from, and you don't have the Time/Energy/Money/Ability to remove these constraints.  That's OK!  Leave your system the way it is!  You can, however, stream data out of your system and dump it into any number of existing databases - Cassandra, Riak, BigCouch, and MongoDB, to name just a few - with little to no effect on your production system, after which you can absolutely (and with my blessing!) clobber the data in production.
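
To make that concrete, here is a minimal sketch of the "copy it out first, then clobber" idea in Python.  Everything in it is an assumption for illustration: the `events` table, its `id`/`created_at`/`payload` columns, the `archive_then_purge` name, SQLite standing in for your production database, and an append-only JSON Lines file standing in for whichever archive store you actually pick.

```python
import json
import sqlite3


def archive_then_purge(conn, archive_path, cutoff):
    """Append rows older than `cutoff` to a JSON Lines file, then delete them.

    Assumes a hypothetical `events` table with `id`, `created_at` (ISO-8601
    text) and `payload` columns. Returns the number of rows archived.
    """
    conn.row_factory = sqlite3.Row  # lets us turn each row into a dict
    rows = conn.execute(
        "SELECT id, created_at, payload FROM events "
        "WHERE created_at < ? ORDER BY id",
        (cutoff,),
    ).fetchall()
    if not rows:
        return 0

    # Write the archive copy first; nothing is deleted until this succeeds.
    with open(archive_path, "a", encoding="utf-8") as archive:
        for row in rows:
            archive.write(json.dumps(dict(row)) + "\n")

    # Only now clobber exactly the rows that were written out.
    conn.executemany(
        "DELETE FROM events WHERE id = ?",
        [(row["id"],) for row in rows],
    )
    conn.commit()
    return len(rows)
```

Run on a schedule with something like `archive_then_purge(conn, "events-archive.jsonl", "2012-01-01")`, this keeps the production table small while the history keeps piling up somewhere cheap.  The one design choice that matters is the ordering: the archive write happens before the delete, so a failure partway through leaves you with duplicates at worst, never with lost data.
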
If you are discarding data, you are doing something wrong.  

