Payware - and the BigData Ecosystem

Dan Woods has an article at Forbes about "How Hadoop and SAP HANA can accelerate Big Data Startups". It's pretty hard to read - you have to get past the obvious shilling for SAP (the byline is a bit of a giveaway - "He has written several books and created other research and educational content for SAP"), and after that, you have to ignore the Hadoop-centric nature of the post (there are other fish in the sea, you know?).

You have to get all the way to the end of the article before you get to the meat, which boils down to:
  • ... can SAP make it as easy to experiment with SAP HANA as it is to download and use open source?
  • ... will developers buy into SAP’s efforts at being open and making SAP HANA easy to use?
  • (If it is) Priced too high or with onerous terms, SAP HANA won’t make sense to startups.
And these points, in the end, are what it's all about.
A significant - and somewhat under-reported - aspect of the current BigData boom is that people are able to do things their way, without having to worry about licensing models, usage patterns, scalability costs, etc.

This isn't because of some philosophical, Stallman-like "software should be free" mindset. Au contraire, it is because people can now truly focus on what they want to do with the data, instead of how they manage the data.

Think back to as recently as ten years ago - when an organization of any significant size had people devoted to licensing. Licensing! Licensing costs mattered - when you moved to a new server, or added a server, or heck, added a CPU (and sometimes even added memory). Need a hot-spare? That has costs. Need to scale because you are adding customers? Check your licensing restrictions. And so on.
What this meant was that you had to bake your data storage and persistence licensing costs into the equation right from the beginning. And these costs could be substantially different depending on what you bought, and when you bought it (buying 5000 seats up front is invariably far cheaper than buying them 1000 at a time).

As a consequence, you had to think long and hard about how you would store your data, how the data would grow, what the associated costs would be, etc., as part and parcel of the problem that you were trying to solve!
 
Think about that for a moment.
You have a problem, and you come up with a solution to that problem.  Somewhere, as part of the solution, you have to store and/or manipulate data.  Yes, this might be a significant part of the solution, but it is still part of the solution.
The moment you start worrying about licensing fees and costs, you have made them part of the problem space!

And therein lies the rub. What NoSQL brought to the table is the concept of solution-oriented stores.
They come in all sorts of shapes and sizes (document, column-oriented, key-value, etc.), and each of these has a different sweet spot. I tend to think of them as solution-oriented data stores, i.e., each DB is tuned towards a specific solution domain. Oh yes, they are certainly moving towards each other - Riak has secondary indexes, CouchDB's BigCouch variant shards and scales brilliantly, etc. - but that is evolution for you.
Of course you can use these (somewhat, and very loosely) interchangeably, but ye gods, wouldn't that be a dumb thing to do. As a simple exercise, imagine the entertainment associated with swapping out CouchDB for Berkeley DB. (And, in case you are wondering, I was actually asked to help with this recently because "They both are DBs, right? What's the difference?")
The point being that in this space, you pick the specific data handling solution that you need, based on your use-cases, and not based on extrinsic issues like licensing costs, etc.  Imagine the fun if this was all licensed software - "Yeah, we really needed a Key-Value store like memcache, but it was cheaper to use Oracle.  Besides, you can also do Key-Value stores in it".
Of course, when you have a hammer, everything looks like a nail to you, but that isn't the point.  What we're trying to do here is come up with the best solution to the problem, and, in our current ecosystem, there are plenty of ways of accomplishing that without paying huge license fees.
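
To make the "different shapes, different sweet spots" point a bit more concrete, here is a rough sketch in plain Python. It is not how any of the stores mentioned above actually work internally; the order records, field names, and helper functions are all made up for illustration. It just lays the same data out in the three shapes listed earlier (key-value, document, column-oriented) and shows which question each layout answers most naturally.

```python
# Hypothetical order records, invented purely for this illustration.
orders = [
    {"id": "o1", "customer": "alice", "total": 30},
    {"id": "o2", "customer": "bob", "total": 12},
    {"id": "o3", "customer": "alice", "total": 8},
]

# Key-value shape: one value per key, fetched by exact key.
kv_store = {o["id"]: o for o in orders}

# Document shape: one nested document per customer, holding that
# customer's orders together so they can be read in one go.
doc_store = {}
for o in orders:
    doc = doc_store.setdefault(o["customer"], {"customer": o["customer"], "orders": []})
    doc["orders"].append({"id": o["id"], "total": o["total"]})

# Column-oriented shape: one list per field, so an aggregate over a
# single field never has to touch the other fields.
column_store = {field: [o[field] for o in orders] for field in ("id", "customer", "total")}


def lookup_order(order_id):
    """Sweet spot of the key-value layout: fetch by key."""
    return kv_store[order_id]


def customer_history(customer):
    """Sweet spot of the document layout: everything about one entity."""
    return doc_store[customer]["orders"]


def total_revenue():
    """Sweet spot of the column layout: scan one field across all rows."""
    return sum(column_store["total"])
```

Each question can be answered from any of the three layouts, but only with extra scanning or regrouping - which is roughly the "loosely interchangeable, but why would you" point above.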

Do understand, I am not minimizing the benefits of SAP HANA, IBM Netezza, etc.  They are all remarkably good and useful pieces of software, and if you have the money to splurge, go for it - as long as it is the correct fit for what you are trying to do (but that goes without saying, right?)

The bottom line here is that a huge - and instinctive - barrier to entry for most startups is going to be the "Why should I sell my soul to SAP / Oracle / IBM / whoever" attitude. This is especially true for startups not backed by VCs invested in the SAP / Oracle / IBM ecosystem :-)

Back to SAP HANA - what do I think of it? It's great, and as I mentioned earlier, if you have the money, and can afford it, and most of all you need it, go for it...
