
Integrating Hadoop with traditional BI/DW

Posted by Andrew Peterson on February 24, 2014 at 9:15am

I'm seeing quite a bit of discussion about Hadoop in general, but little broader discussion of where Hadoop and other NoSQL tools fit into the overall BI/DW architecture and framework. Bill Inmon recently posted some commentary comparing Hadoop (a tool) with a data warehouse.

Bill Inmon on Hadoop and data warehouse ( ).  A must read.

From another perspective, Microsoft has a case study posted on how Klout uses Hadoop for storage of web activity which is then pushed off to a more traditional data warehouse.  This is a good example of how Hadoop feeds into a data warehouse framework.

Klout use of Hadoop  ( )
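To make the "Hadoop feeds the warehouse" pattern concrete, here is a minimal sketch in Python. It is not Klout's actual pipeline: the event fields, the fact_activity table, and the function names are all hypothetical, a plain list stands in for raw files kept in Hadoop, and an in-memory SQLite database stands in for the data warehouse. The idea is only that loosely structured activity records get summarized first, and the much smaller, well-structured result is what lands in the SQL side.

```python
import json
import sqlite3
from collections import Counter

# Hypothetical raw web-activity records, standing in for files kept in Hadoop.
RAW_EVENTS = [
    '{"user": "alice", "action": "share", "day": "2014-02-20"}',
    '{"user": "bob", "action": "like", "day": "2014-02-20"}',
    '{"user": "alice", "action": "like", "day": "2014-02-20"}',
    '{"user": "alice", "action": "share", "day": "2014-02-21"}',
    'not valid json',  # raw data is messy; bad records are skipped
]


def summarize(lines):
    """Count events per (day, user, action), ignoring malformed records."""
    counts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
            key = (event["day"], event["user"], event["action"])
        except (ValueError, KeyError, TypeError):
            continue
        counts[key] += 1
    return counts


def load_into_warehouse(conn, counts):
    """Write the summarized counts into a (hypothetical) warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_activity ("
        "day TEXT, user_name TEXT, action TEXT, event_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO fact_activity VALUES (?, ?, ?, ?)",
        [(day, user, action, n) for (day, user, action), n in counts.items()],
    )
    conn.commit()


# SQLite in memory is only a stand-in for a real data warehouse.
conn = sqlite3.connect(":memory:")
load_into_warehouse(conn, summarize(RAW_EVENTS))

# Once loaded, the data can be queried with ordinary SQL by BI tools.
rows = conn.execute(
    "SELECT user_name, SUM(event_count) FROM fact_activity "
    "GROUP BY user_name ORDER BY user_name"
).fetchall()
print(rows)
```

In a real deployment the summarize step would run as a job on the cluster and the load step would use the warehouse's bulk loader, but the division of labor is the same.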

I see Hadoop as another tool. A great tool for the right needs, but not the one tool that will replace them all. Even Google has started to move selected activities into a more traditional SQL-oriented framework. Their F1 system was created to support some of their web activities that need a more formal SQL structure.

Microsoft's Parallel Data Warehouse (PDW) has introduced a framework called PolyBase that allows users to seamlessly integrate Hadoop data into an existing data warehouse. I see this as a sound direction: Hadoop and other NoSQL tools can and will be merged with existing column-store and row-store tools such as Vertica and Microsoft's PDW.

Just some of my thoughts on Hadoop.

If you're interested, I just found this posting by Bill Inmon on how Hadoop/NoSQL and a data warehouse architecture compare. Hadoop in particular is as much a massively redundant storage engine as it is a tool for analyzing unstructured data.

Article by Inmon   ( )

