Course Description: This training course is designed for developers who want to better understand how to build Apache Hadoop solutions. The 35-hour course gives Java programmers the training they need to create enterprise solutions with Apache Hadoop. It combines interactive lectures with extensive hands-on lab exercises.
Write a MapReduce program using the Hadoop API.
Learn how to configure Hadoop on a single machine or across multiple machines.
Perform common Hadoop administration tasks on a Hadoop cluster.
Use Pig, Hive, HBase and HCatalog effectively.
Course Duration: 35 hours. Class Delivery: Online (interactive, web-based). Contents:
- The problem space and example applications
- Why don't traditional approaches scale?
- Hadoop History
- The ecosystem and stack: HDFS, MapReduce, Hive, Pig
- Cluster architecture overview
- Hadoop distribution and basic commands
- Eclipse development
- The HDFS command line and web interfaces
- The HDFS Java API (lab)
- Key philosophy: move computation, not data
- Core concepts: Mappers, reducers, drivers
- The MapReduce Java API (lab)
- Optimizing with Combiners and Partitioners (lab)
- More common algorithms: sorting, indexing and searching (lab)
- Relational manipulation: map-side and reduce-side joins (lab)
- Chaining Jobs
- Testing with MRUnit
- Patterns to abstract "thinking in MapReduce"
- The Cascading library (lab)
- The Hive database (lab)
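
To give a feel for the "Core concepts: Mappers, reducers, drivers" topic in the outline above, here is a minimal word-count sketch in plain Java. It is an illustration only and not course material: it does not use the Hadoop API, it runs in memory on one machine, and the names WordCountSketch, map, reduce and run are hypothetical. The mapper emits (word, 1) pairs, the driver groups the pairs by key (standing in for the shuffle step), and the reducer sums the counts for each word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the map / shuffle / reduce flow.
// It does not use the Hadoop API and is not distributed.
public class WordCountSketch {

    // Map phase: emit one (word, 1) pair for every token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Reduce phase: sum all counts that were grouped under one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: run the mapper on each line, group values by key
    // (a stand-in for the shuffle), then run the reducer once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("move computation not data", "move data rarely");
        System.out.println(run(lines));
    }
}
```

In a real Hadoop job the same three roles would be spread across the cluster, with the framework handling the grouping between the map and reduce phases.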