This million-dollar question has been plaguing Hadoop enthusiasts, and would-be Hadoop enthusiasts, for a while now. This blog post seems to have answered it reasonably well:
"When Doug Cutting, the creator of Hadoop, named his new framework after his son’s toy elephant, little did he know that it would take the open source software world by storm. Today, we can also presume that Doug did not wish to create an elephantine misconception about Java being required to master Hadoop. True, Hadoop is built on Java. But you do not need to be a Java programmer to work on Hadoop.
Two important Hadoop components endorse the fact that you can work with Hadoop without having functional knowledge of Java -- Pig and Hive."
Pig is a high-level data flow language and execution framework for parallel computation, while Hive is a data warehouse infrastructure that provides data summarization and ad hoc querying. Pig is widely used by researchers and programmers, while Hive is a favourite with data analysts.
As a rough rule of thumb, about 10 lines of Pig can do the work of some 200 lines of Java MapReduce code. The original post links to a Pig demo.
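To work with Pig and Hive, you only need to learn Pig Latin and Hive Query Language (HQL), and a grounding in SQL is enough for both. Pig Latin shares many operations with SQL (filter, group, join) but is written as a step-by-step data flow, while HQL is a SQL-like dialect that is more tolerant than standard SQL. Note that HQL queries run as batch jobs on the cluster, so the gain is in scale rather than in response time on small data. These languages are easy to learn, and the post claims that more than 80% of Hadoop projects revolve around them.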
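To give a feel for what "data flow" means here, below is a minimal sketch in plain Python of the classic word count, with the roughly corresponding Pig Latin statement noted in a comment above each step. This is an illustration only: it does not use Hadoop, the function name `word_count` and the sample input are made up for the example, and Pig's TOKENIZE splits on a few more delimiters than Python's `split()`.

```python
from collections import defaultdict


def word_count(lines):
    """Count words in the load -> tokenize -> group -> count style of a Pig script."""
    # Pig (roughly): words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    words = [word for line in lines for word in line.split()]

    # Pig (roughly): grouped = GROUP words BY word;
    grouped = defaultdict(list)
    for word in words:
        grouped[word].append(word)

    # Pig (roughly): counts = FOREACH grouped GENERATE group, COUNT(words);
    return {word: len(bag) for word, bag in grouped.items()}


if __name__ == "__main__":
    # Pig (roughly): lines = LOAD 'input.txt' AS (line:chararray);
    # 'input.txt' is a hypothetical file; the sample data is inlined here instead.
    sample = ["hadoop pig hive", "pig hive", "pig"]
    print(word_count(sample))
```

In Hive, assuming a hypothetical table `words` with a single column `word`, the grouping and counting steps would collapse into one query along the lines of SELECT word, COUNT(*) FROM words GROUP BY word. Either way, no Java is written by hand.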
Here's the complete post:
http://www.edureka.co/blog/do-you-need-java-to-learn-hadoop