- (BOOK) "Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2" by Arun C. Murthy, Vinod Kumar Vavilapalli, Doug Eadline, Joseph Niemiec, Jeff Markham (Pearson/Addison-Wesley Professional, March 2014, ISBN 9780321934505); **SUMMARY: This book is the comprehensive guide to building distributed, big data applications with Apache Hadoop™ YARN. YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk through the entire YARN project lifecycle, from installation through deployment, providing examples drawn from their experience, first as Hadoop’s earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward.
- (BOOK) "R for Everyone: Advanced Analytics and Graphics" by Jared Lander (Pearson/Addison-Wesley Professional, Dec. 2013, ISBN 9780321888037); **SUMMARY: Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20% of R functionality most needed to accomplish 80% of modern data tasks. Lander’s self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. Readers will download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, Lander shows how to construct several complete models, both linear and nonlinear, and use some data mining techniques.
| Sample Chapter #12, "Data Reshaping"
- (DIGITAL VIDEO) "R Programming LiveLessons: Fundamentals to Advanced", presented by Author Jared Lander (Pearson/Addison-Wesley Professional, Dec. 2013, ISBN 9780133743272); **SUMMARY: In 16+ hours of video instruction, Author Jared Lander provides a tour through the most important parts of R, from the very basics to complex modeling. He covers reading data, programming basics, visualization, data munging, regression, classification, clustering, modern machine learning, and more. The video is based on Lander's corresponding book, "R for Everyone", and is a condensed version of the course he teaches at Columbia University.
- (BOOK) "Data Just Right: Introduction to Large-Scale Data & Analytics" by Michael Manoochehri (Pearson/Addison-Wesley Professional, Dec. 2013, ISBN 9780321898654); **SUMMARY: This book is for professionals who need practical solutions based on limited resources and time. Manoochehri helps readers to focus on building applications, rather than infrastructure, and to address each of today’s key Big Data use cases in a cost-effective way by combining technologies into hybrid solutions. He provides approaches to: managing massive datasets; visualizing data; building data pipelines and dashboards; choosing tools for statistical analysis; and more. Throughout, Manoochehri demonstrates techniques using many leading data analysis tools, including Hadoop, Hive, Shark, R, Apache Pig, Mahout, and Google BigQuery. The book is organized in parts that describe data challenges and successful solutions in the context of common use cases.
- (DIGITAL VIDEO) "Data Just Right LiveLessons", presented by Author Michael Manoochehri (Pearson/Addison-Wesley Professional, Dec. 2013, ISBN 9780133807141); **SUMMARY: In 7 hours of video instruction, Author Manoochehri provides a practical introduction to solving common data challenges, such as managing massive datasets, visualizing data, building data pipelines and dashboards, and choosing tools for statistical analysis. The course does not assume any previous experience in large-scale data analytics technology, and includes detailed, practical examples.
- (BOOK) "Practical Cassandra: A Developer's Approach" by Russell Bradberry, Eric Lubow (Pearson/Addison-Wesley Professional, Dec. 2013, ISBN 9780321933942); **SUMMARY: Practical Cassandra is the first hands-on developer’s guide to building Cassandra systems and applications that deliver breakthrough speed, scalability, reliability, and performance. It reflects the latest versions of Cassandra, including Cassandra Query Language (CQL), which dramatically lowers the learning curve for Cassandra developers. Bradberry and Lubow walk readers through every step of building a real production application that can store enormous amounts of structured, semi-structured, and unstructured data. Drawing on their exceptional expertise, they share practical insights into issues ranging from querying to deployment, management, maintenance, monitoring, and troubleshooting. They cover key topics, from architecture to migration, and guide decision-making on crucial issues such as configuration and data modeling. They provide tested sample code, detailed explanations of how Cassandra works “under the covers,” and new case studies.