Saturday, July 26, 2014

Oracle 12c Database In-Memory is Out - Hardly Anybody Notices

Oracle 12c with the In-Memory option (note that even Oracle doesn't dare to call it an in-memory RDBMS, hence the awkward "Database In-Memory" designation) was released last week. The press and media are rightfully silent about it (aside from a couple of Oracle heads who are excited that something "new" is finally happening in the Oracle database world). The much-hyped initial foray of Oracle's flagship database into the in-memory, columnar world is a storm in a teacup, as no groundbreaking features are offered (that is, unless your database world view is limited to Oracle only).

The Oracle In-Memory option is yet another (column-organized) cache on top of an old and tired row/disk-based database (established 1979).
Oracle's latest solution is fairly pedestrian: the columnar cache has to be reloaded, and the data transformed from row to columnar format, on each database restart or on first use of a particular table. The question this approach raises is how fast the cache will be populated on each database startup, since query performance will surely suffer until the cache is fully reloaded.
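
To make the row-to-columnar step concrete, here is a minimal Python sketch of the kind of pivot a columnar cache has to perform when it is populated. It is a toy illustration only, not Oracle's implementation; the function name `rows_to_columns` and the sample data are made up for this example, and it assumes every row carries the same fields.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column.

    Assumption for this sketch: every row has the same keys as the first row.
    """
    if not rows:
        return {}
    columns: Dict[str, List[Any]] = {name: [] for name in rows[0]}
    for row in rows:
        for name in columns:
            columns[name].append(row[name])
    return columns


# Row format: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 75.5},
    {"id": 3, "region": "EU", "amount": 30.0},
]

# Columnar format: each field becomes one contiguous list.
columns = rows_to_columns(rows)

# An aggregate now touches a single column instead of every full row.
total = sum(columns["amount"])
```

The pivot has to visit every value of every row, so the cost of repopulating grows with table size, and that work is repeated whenever the cache is rebuilt.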

IBM BLU and SAP HANA are ahead of Oracle because they store data on disk natively in columnar format, i.e. once data is written to disk it no longer needs to be transformed from row to columnar.

Oracle is clearly in defensive mode and not in an aggressive, leading-edge charge. Their selling point is: no changes, backwards compatibility, all remains the same. You will get spectacular improvements in performance in some specific cases and under specific conditions. The bottom line is: you pay a lot (In-Memory is a separately priced option), do little in terms of technical effort (the feature is easy to administer and transparent to applications) and experience some performance gains (improvements will be noticeable in specific cases and situations, not across the board). Oracle will surely continue to expand on this theme, but at this pace it will take a while to catch up with SAP HANA and IBM BLU. Any possibility of Oracle taking the lead in the database innovation race is out of the question for now.

Friday, July 18, 2014

Big Data/Hadoop and Incumbent RDBMS Vendors - Saga Continues

All major RDBMS vendors (Oracle, IBM, Teradata) are developing and executing strategies for coping with the rise of Hadoop, as well as for riding the Big Data wave.
As far as Hadoop is concerned, the initial idea was to use simple RDBMS-Hadoop connectors and loaders and thus contain or reduce Hadoop's role to that of a storage or perhaps ETL platform. We are now witnessing the next stage of Hadoop-related strategies: a proliferation of federated query engines like Teradata QueryGrid, SAP HANA Smart Data Access, the recently announced Oracle Big Data SQL etc.
Oracle Big Data SQL, Teradata QueryGrid and other federated query approaches over heterogeneous data (Composite Software and other data virtualization vendors don't belong to this category) have Hadoop in their cross-hairs, i.e. they are legacy vendors' attempts to cope with the inevitable rise of Hadoop as the centerpiece of Big Data initiatives.
Federated query engines typically originate queries from the respective legacy vendor's software and/or hardware platforms. Oracle Big Data SQL, for example, runs on custom hardware only for now (it doesn't, actually - it is still vaporware as of this date, in customary, time-honored Oracle manner); the Hadoop-related part is essentially a port of the Exadata cellsrv software to Hadoop datanodes. Distributed queries are executed across heterogeneous data sources (Hadoop, databases etc.) with varying degrees of intelligence (predicate pushdown, local data processing via smart scan in the case of Oracle), query optimization and performance.
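
For readers unfamiliar with predicate pushdown, the rough idea can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: `RemoteSource` is a made-up stand-in for a remote store such as a Hadoop datanode, and `rows_shipped` simply counts how many rows cross the "network". It does not reflect how any vendor's product is actually implemented.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class RemoteSource:
    """Hypothetical remote data source that can filter rows locally."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.rows_shipped = 0  # rows sent back to the querying engine

    def scan(self, predicate: Optional[Predicate] = None) -> List[Row]:
        # With a predicate, filtering happens where the data lives.
        result = [r for r in self._rows if predicate is None or predicate(r)]
        self.rows_shipped += len(result)
        return result


def query_without_pushdown(source: RemoteSource, predicate: Predicate) -> List[Row]:
    # Ship everything, then filter on the querying side.
    return [r for r in source.scan() if predicate(r)]


def query_with_pushdown(source: RemoteSource, predicate: Predicate) -> List[Row]:
    # Ship the predicate to the source and receive only matching rows.
    return source.scan(predicate)


data = [{"id": i, "status": "error" if i % 10 == 0 else "ok"} for i in range(100)]


def is_error(row: Row) -> bool:
    return row["status"] == "error"


naive = RemoteSource(data)
pushed = RemoteSource(data)
naive_result = query_without_pushdown(naive, is_error)
pushed_result = query_with_pushdown(pushed, is_error)
```

Both queries return the same rows; the difference is that `naive.rows_shipped` ends up at 100 while `pushed.rows_shipped` ends up at 10, which is the whole point of pushing the filter down to the data.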
The fundamental problem with this approach is that it centers Big Data activities in the wrong place.
Hadoop is synonymous with the Big Data initiative and is the hub around which other data sources will revolve, i.e. Hadoop is the system of record for Big Data. Big Data activities should not be centered around legacy data platforms like Oracle, Teradata etc., which is exactly what the above-mentioned products enforce.
Federated query solutions like Oracle Big Data SQL cover only minor Big Data use cases (even if they deliver on performance, which in itself is a tough problem to solve in heterogeneous environments).
This class of products should be viewed as the legacy vendors' attempt to defend and expand their turf by leveraging a large installed base.
Some of these products are fairly advanced, as they build on decades of experience in data management and are backed by the huge financial and other resources of legacy vendors. The old guard can thus innovate on the Hadoop platform quite fast. Hadoop was not initially built for BI or corporate data management. Relatively inexperienced (at least in the enterprise database management software development arena) dedicated Hadoop vendors like Cloudera are tiptoeing around basic DBMS concepts and rediscovering tricks that legacy vendors mastered over decades.
Not surprisingly, Teradata was one of the first vendors to release such a federated query solution (first called SQL-H, now QueryGrid), probably because the potential Hadoop squeeze is felt most strongly in their niche of high-end, very large data warehouses, which also happens to be Hadoop's entry point into the DBMS market.
Hadoop and Big Data are new approaches to building a completely new analytic infrastructure and developing a whole new class of applications based on nearly infinite scalability and near-zero storage prices. While we can borrow some concepts and technologies from the old world, Big Data folks are also experimenting with newish concepts like schema-on-read that will redefine how we deal with all aspects of the analytics pipeline.
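
As a rough illustration of what schema-on-read means in practice, the Python sketch below keeps raw records exactly as they arrived and applies a schema only when they are read. The sample events, the `SCHEMA` mapping and the function `read_with_schema` are all invented for this example; real Hadoop tooling does this at a very different scale and with far richer type handling.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator

# Raw data is stored as-is; nothing was validated or reshaped at load time.
RAW_EVENTS = [
    '{"user": "ann", "amount": "12.50", "ts": "2014-07-18"}',
    '{"user": "bob", "amount": "3", "country": "DE"}',
    '{"user": "cy"}',
]

# The schema lives with the reader, not with the storage layer.
SCHEMA: Dict[str, Callable[[Any], Any]] = {"user": str, "amount": float}


def read_with_schema(
    lines: Iterable[str], schema: Dict[str, Callable[[Any], Any]]
) -> Iterator[Dict[str, Any]]:
    """Project and cast each raw record according to the schema at read time."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            raw = record.get(field)
            # Missing fields become None; extra fields are simply ignored.
            row[field] = cast(raw) if raw is not None else None
        yield row


rows = list(read_with_schema(RAW_EVENTS, SCHEMA))
```

A different analysis could read the very same raw lines with a different schema (say, `ts` and `country`) without reloading anything, which is the flexibility the schema-on-read crowd is after.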