Big Data is a problem statement, and it can be solved with tools such as Apache Hadoop. But setting up Apache Hadoop as infrastructure for our proofs of concept and proofs of value is a little challenging. Hence we brought 3 cli
We have started testing the Tez query engine. In initial results, we are getting a 30% performance boost over Hive on smaller data sets (1-10 GB), but Hive starts to perform better than Tez as data size increases. Like when we run a Hive query
This is Part 1 of a 10-module Big Data and Hadoop course. The 3-hour interactive live class covers what Big Data is, what Hadoop is, and why to use Hadoop. You will understand how Hadoop solves the problem of Big Data with the existing Data Warehouse solutions
Design patterns are problem-specific templates that developers have perfected over the years for writing correct and efficient code. They encode correct practices for solving a given problem, so that a developer nee
When I analyze datasets for clients (say, Internet click log files), I typically write a Perl script that summarizes the click log data into a "fact" table stored in memory (actually stored as a hash table), with a number of auxiliary lookup tables also store
This is of course the wrong question. I use R because I'm familiar with it, more than SAS or Python. And I use R mostly for graphics / visualization. Though things have changed, I consider R mostly as a tool to p