
[Book] Hadoop in Practice

Guest blog post by Vincent Granville

Hadoop in Practice

Alex Holmes

MEAP Began: December 2011
Softbound print: Fall 2012 | 425 pages 
ISBN: 9781617290237
Pre-Order options*
Order now and start reading Hadoop in Practice today through MEAP
MEAP + Ebook only - $35.99
MEAP + Print book (includes Ebook) when available - $44.99
* For more information, please see the MEAP FAQs page.


Table of Contents
  1: Getting started - FREE 

Part I: Data Logistics 
  2: Moving Data in and out of Hadoop 
  3: Data Serialization: Working with Text and Beyond - AVAILABLE 

Part II: Big Data Patterns 
  4: Applying MapReduce Patterns to Big Data 
  5: Streamlining HDFS for Big Data - AVAILABLE 
  6: Measuring and Optimizing Performance 

Part III: Data Science 
  7: Utilizing Data Structures and Algorithms 
  8: Applying Statistics 
  9: Machine Learning 

Part IV: Taming the Elephant 
10: Hive 
11: Pig 
12: Crunch and Other Technologies 
13: Testing and Debugging 
14: Job Coordination 
15: Proficient Administration 

  A: Related Technologies - AVAILABLE 
  B: Hadoop Built-in Ingress and Egress Tools 
  C: HDFS Dissected 
  D: Optimized MapReduce Join Frameworks


Hadoop is an open-source platform designed to efficiently query and analyze data distributed across large clusters. It's built around MapReduce, the programming model Google developed for processing very large data sets in parallel, including building its index of the web.

Because it's especially effective for "Big Data" systems, many well-known companies use Hadoop, including Apple, eBay and LinkedIn. Yahoo and Facebook each claim to have the largest Hadoop implementation, with petabytes of data spread across thousands of machines.

The theory behind MapReduce is straightforward: break down a large unit of work into small parts that execute in parallel across a cluster. It gets more complicated when you start applying Hadoop to problems like complex queries, statistical calculations, real-time financial transactions, and machine learning. You need tested, practical techniques you can rely on to get the job done.
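To make the "small parts in parallel" idea concrete, here is a minimal sketch of a word count written in the MapReduce style. It is plain Python, not Hadoop code and not an example from the book: the function names (`map_chunk`, `shuffle`, `reduce_groups`, `word_count`) are made up for illustration, and a local thread pool stands in for the machines of a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_chunks):
    """Group emitted values by key, the job a framework does between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_groups(groups):
    """Reduce step: collapse each key's list of values into a single total."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines, chunk_size=2, workers=4):
    """Split the input into chunks, map them in parallel, then shuffle and reduce."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Threads are only a stand-in here for the separate nodes of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster", "data data"]))
```

Each chunk is mapped independently, which is what lets the work spread across many workers; only the grouping step needs to see every chunk's output. A real Hadoop job adds the parts this sketch leaves out, such as distributed storage, scheduling, and recovery from failed nodes.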

Hadoop in Practice collects nearly 100 Hadoop examples and presents them in a problem/solution format. Each technique addresses a specific task you'll face, like querying big data using Pig or writing a log file loader. You'll explore each problem step by step, learning both how to build and deploy that specific solution and the thinking that went into its design. As you work through the tasks, you'll find yourself growing more comfortable with Hadoop and at home in the world of big data.


  • Nearly 100 tested, ready-to-use techniques
  • Conceptual overview of Hadoop and MapReduce
  • Real problems, real solutions

This book assumes you've already started exploring Hadoop and want concrete advice on how to use it in production.


Alex Holmes is a software engineer with over a decade of experience developing large-scale distributed Java systems. He is currently a technical lead at VeriSign, using Hadoop as a Big Data platform. Alex previously developed an Internet crawl, analysis, and search system using Hadoop and machine classification algorithms.


This Early Access version of Hadoop in Practice enables you to receive new chapters as they are being written. You can also interact with the author to ask questions, provide feedback and errata, and help shape the final manuscript on the Author Online.
