Originally posted on Analytic Bridge
By Dan Kellett, Director of Data Science, Capital One UK
Disclaimer: This is my attempt to explain some of the ‘Big Data’ concepts using basic analogies. There are inevitably nuances my analogy misses.
What is HDFS?
When people talk about ‘Hadoop’ they are usually referring to either the efficient storage or the efficient processing of large amounts of data. MapReduce is a framework for efficient processing using a parallel, distributed algorithm (see my previous blog here). The standard approach to reliable, scalable data storage in Hadoop is HDFS (the Hadoop Distributed File System).
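To make the storage side concrete, here is a minimal sketch (illustrative Python, not real HDFS code) of the core idea behind HDFS: a large file is split into fixed-size blocks, and each block is replicated on several datanodes so the data survives node failures. The block size and replication factor mirror common HDFS defaults (128 MB blocks, 3 copies); the datanode names and helper functions are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Split raw bytes into fixed-size blocks (the last block may be smaller)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: list, replication: int = 3) -> dict:
    """Assign each block to `replication` distinct datanodes (simple round-robin;
    real HDFS placement is rack-aware and more sophisticated)."""
    placement = {}
    for b in range(num_blocks):
        placement[b] = [datanodes[(b + r) % len(datanodes)]
                        for r in range(replication)]
    return placement


if __name__ == "__main__":
    data = b"x" * 300          # pretend this is a large file
    blocks = split_into_blocks(data, block_size=128)
    print([len(b) for b in blocks])          # three blocks: 128, 128, 44 bytes
    nodes = ["node1", "node2", "node3", "node4"]
    print(place_replicas(len(blocks), nodes))
```

The point of the sketch is the division of labour: a central catalogue (the namenode, in HDFS terms) only needs to remember which blocks exist and where their replicas live, while the datanodes hold the actual bytes.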