Monday, May 25, 2015

Hadoop and Big Data - A Year Later; A New Normal in Corporate Development Lifecycle

Gartner's business model is not based on first principles - its analysts have no real, direct experience with any particular technology. It is more like the People Magazine approach: he said this, she said that, let's invent a name here, define a trend there. Most IT industry analysts and journalists are no exception to this rule.
It is an amazingly shallow, yet generally accepted business model. It'd be like Siskel and Ebert reviewing movies they haven't seen. ( Or am I too naive in supposing you should know what you write about ? )

I'll keep on ranting about Gartner - I also have a problem with their infamous hype cycle. Do they really think they discovered a generic law of technology and innovation, something like Moore's Law ? Are there any technologies that perhaps didn't quite follow the hype cycle ? I think it is one example of how humans are predisposed to "see" patterns and believe in cliches, tea leaves etc. Actually, even Gartner's own analysts claim they are surprised when they "see" some technology follow their hype cycle curve.

( Still ranting ) - since Gartner's main business is to go around and trade gossip, you gotta give them credit for at least accurately describing the current state of IT affairs. They can't predict anything ( who can ? ), but they know what is ( and isn't ) going on within the IT industry. In the Hadoop/Big Data case, Gartner says: slow, steady, staged, and that is a fairly accurate assessment of the current Big Data landscape.

The steady, staged part refers to the fact that Hadoop and Big Data are now a staple of corporate IT.
Nothing revolutionary is happening ( yet ), but a beachhead has been established for a new generation of Google-style MPP computing in corporate IT. One of the major problems with Big Data/Hadoop is that the classic corporation is still unsure what to do with it and how to get there ( where ? ). This is not your run-of-the-mill ERP rollout, or yet another corporate OLTP application with relatively clearly defined specs, technology, goals and timelines.
We are still figuring out what to do with a Big Data project and how to collect and interpret data; we run into various limitations ( immature or underperforming technology, missing crucial features ). Last but not least, classic corporate developers are quantitatively and qualitatively nowhere close to what Google has. When corporations say: "How can Google do this ? Why can't we do the same ?", it is like asking how Seinfeld can marry a woman 30 years younger. Yes, he can, and no, you can't, period.
Then there is corporate IT staff resistance to learning new software that is not ( yet ) associated with big bucks, secure employment, big names and career advancement.
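For readers who haven't touched it, the "Google-style MPP computing" mentioned above boils down to the map/shuffle/reduce pattern that Hadoop runs at cluster scale. A minimal single-process Python sketch ( purely illustrative; Hadoop distributes each phase across many machines ):

```python
from collections import defaultdict

def map_phase(lines):
    # map: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # shuffle: group values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is big", "data is data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 3, 'is': 2}
```

The point of the pattern is that map and reduce are embarrassingly parallel, so the same logic scales from one laptop to thousands of nodes.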

The Hadoop ecosystem also suffers from immaturity and a lack of enterprise-class features ( "why do you want to back up ?", I was asked by a leading Hadoop distributor ). You can't perform random writes ( in-place updates ) in HDFS, which ripples all the way up to the inability to have secondary indexes in Impala, for example. Limitations are many and often surprising: Impala simply errors out when there is insufficient memory to complete a query, and Cloudera ( the vendor ) is still working on fully supporting the spill-to-disk "feature".
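To make the HDFS limitation concrete: files are write-once, append-only, so the only way to "update" a record is to rewrite data wholesale, which is also why maintaining secondary indexes on top of HDFS is hard. A toy sketch mimicking the constraint with a local file ( the record format is made up for illustration ):

```python
import os
import tempfile

# Toy "append-only" store: records can be added at the end,
# never overwritten in place -- the same constraint HDFS imposes.
path = os.path.join(tempfile.mkdtemp(), "events.log")

def append_record(record):
    with open(path, "a") as f:   # appending is allowed
        f.write(record + "\n")

def update_record(old, new):
    # No in-place write: an "update" means rewriting the whole file.
    with open(path) as f:
        records = [line.rstrip("\n") for line in f]
    records = [new if r == old else r for r in records]
    with open(path, "w") as f:
        f.write("\n".join(records) + "\n")

append_record("user=1 status=new")
append_record("user=2 status=new")
update_record("user=1 status=new", "user=1 status=active")
```

A full-file rewrite is trivial here; on a multi-terabyte HDFS dataset, it is exactly the cost that makes fine-grained updates and secondary indexes impractical.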

On the other hand, you can transfer 5TB/hour of data between Hadoop clusters residing miles apart. You can ingest and process volumes and types of data at prices you can only dream of with standard RDBMS technologies like Oracle. That sure sounds like a disruptive technology scenario might be unfolding.
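As a back-of-the-envelope check, here is what sustaining 5TB/hour implies for the network link between those clusters ( just arithmetic on the number quoted above, assuming decimal terabytes ):

```python
tb_per_hour = 5
bytes_per_tb = 10 ** 12      # decimal terabyte
seconds_per_hour = 3600

# sustained throughput required on the wire
bits_per_second = tb_per_hour * bytes_per_tb * 8 / seconds_per_hour
gbit_per_second = bits_per_second / 10 ** 9
print(round(gbit_per_second, 1))  # ~11.1 Gbit/s sustained
```

In other words, that transfer rate keeps a 10 Gbit link saturated, the kind of sustained bulk throughput that is simply not on the menu for a conventional RDBMS replication setup.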

And yet it moves. If not in your company, then it will somewhere else. Slowly and steadily. No pressure.