Q1. What exactly is Hadoop?
A1. Hadoop is an open-source Big Data framework that stores and processes very large volumes of varied data in parallel across a cluster of machines, which improves performance.
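Q2. What are the 5 Vs of Big Data?
A2. Volume – the size of the data
Velocity – the speed at which data is generated and changes
Variety – the different types of data: structured, semi-structured, and unstructured
Veracity – the trustworthiness and quality of the data
Value – the usefulness of the data to the business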
Q3. Give me examples of unstructured data.
A3. Images, videos, audio files, etc.
Q4. Tell me about the Hadoop file system and processing framework.
A4. The Hadoop file system is called HDFS (Hadoop Distributed File System). It consists of a NameNode, DataNodes, and a Secondary NameNode.
The Hadoop processing framework is known as MapReduce. It runs Map and Reduce tasks that are scheduled in parallel across the cluster for efficiency.
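To make the Map and Reduce steps concrete, here is a minimal word-count sketch in plain Python. It only imitates the MapReduce flow on one machine and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Map task: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_outputs):
    """Group emitted values by key, as the framework does between the phases."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce task: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # Map tasks do not depend on each other, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, lines))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

In a real cluster each map task would read a different block of the input file, and the shuffle would move data between machines over the network.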
Q5. What is the High Availability feature in Hadoop 2?
A5. Hadoop 2 introduces a Passive (standby) NameNode so that the NameNode is no longer a single point of failure. If the active NameNode fails, the passive one takes over, which makes the Hadoop cluster highly available.
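The active/passive idea can be sketched as a toy model in Python. This is not Hadoop's actual failover mechanism; the NameNode class, its fields, and the failover function are hypothetical and only show how a passive node is promoted when the active one fails.

```python
class NameNode:
    """Toy stand-in for a NameNode; not a Hadoop class."""

    def __init__(self, name, state):
        self.name = name
        self.state = state      # "active", "standby", or "failed"
        self.healthy = True


def failover(nodes):
    """Return the active NameNode, promoting a healthy standby if needed."""
    active = next((n for n in nodes if n.state == "active"), None)
    if active is not None and active.healthy:
        return active           # nothing to do, the active node is fine
    if active is not None:
        active.state = "failed"
    standby = next(
        (n for n in nodes if n.state == "standby" and n.healthy), None
    )
    if standby is None:
        raise RuntimeError("no healthy standby NameNode available")
    standby.state = "active"
    return standby


nn1 = NameNode("nn1", "active")
nn2 = NameNode("nn2", "standby")
nn1.healthy = False             # simulate a crash of the active NameNode
new_active = failover([nn1, nn2])
print(new_active.name)          # prints "nn2"
```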
Q6. What is Federation?
A6. Federation, introduced in Hadoop 2, allows multiple NameNodes in one Hadoop cluster, each managing part of the namespace. This makes the NameNode layer horizontally scalable and lets the cluster hold a very large amount of metadata.
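One way to picture Federation is a table that maps parts of the directory tree to different NameNodes. The sketch below is an illustration only: the mount table contents, the NameNode names, and the resolve_namenode function are assumptions, not Hadoop configuration or API.

```python
# Hypothetical mapping from namespace prefix to the NameNode that owns it.
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/logs": "namenode-2",
    "/warehouse": "namenode-3",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode owning the longest mount prefix that matches path."""
    matches = [
        prefix
        for prefix in mount_table
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        raise KeyError(f"no NameNode manages {path}")
    return mount_table[max(matches, key=len)]


owner = resolve_namenode("/logs/2024/app.log")
print(owner)  # prints "namenode-2"
```

Because each NameNode only keeps metadata for its own part of the tree, adding a NameNode adds metadata capacity.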
Q7. What is metadata?
A7. Metadata is data about data. In a Hadoop cluster the NameNode holds the metadata, that is, information about the files in HDFS such as file names, permissions, and the blocks that make up each file.
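As a rough picture of what such metadata looks like, the sketch below splits a file into blocks and records which DataNodes hold each replica. It assumes a 128 MB block size and 3 replicas (the usual Hadoop 2 defaults) and uses simple round-robin placement, which is a simplification and not the real HDFS placement policy. The plan_blocks function and the DataNode names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, assumed default block size


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=3):
    """Return {block index: [DataNodes holding a replica]} for one file."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    num_blocks = -(-file_size // block_size)  # ceiling division
    block_map = {}
    for i in range(num_blocks):
        # Round-robin placement: an assumption made for this illustration.
        block_map[i] = [
            datanodes[(i + r) % len(datanodes)] for r in range(replication)
        ]
    return block_map


metadata = plan_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
print(metadata)
```

A 300 MB file needs three blocks here, and the resulting dictionary is the kind of file-to-block information that the answer above calls metadata.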
Q8. What are the main components of the Hadoop ecosystem and what are their functions?
A8. Here is a list of Hadoop ecosystem components –
1. HDFS – distributed file system
2. MapReduce – programming model for parallel processing, implemented in Java
3. Pig – processes and analyses structured and semi-structured data
4. Hive – processes and analyses structured data
5. HBase – NoSQL database
6. Sqoop – imports and exports structured data between Hadoop and relational databases
7. Oozie – workflow scheduler
Q9. What are some major benefits of Hadoop?
A9. Some major benefits of Hadoop are –
b. Ability to handle multiple data types
c. Ability to handle big data
d. Common platform for machine learning, business intelligence, data warehousing, etc.
Q10. How is Hadoop cost-effective?
A10. Hadoop runs on commodity hardware and is open source, so it is cost-effective on both the hardware and the software side.