Getting Started with Impala by John Russell

You can buy the book here

This book introduces Impala SQL for Hadoop in a simple, crystal-clear way. Impala is a modern and very interesting new entry in the Hadoop ecosystem. By implementing Impala, Cloudera has added a very useful SQL layer on top of Hadoop.

Impala makes SQL knowledge valuable again, without requiring other low-level Hadoop skills. SQL is already familiar to many data professionals coming to Hadoop from various platforms and DBMSs, and this layer lets them stay focused on the real value of Big Data: producing insights.

Impala is currently based on a subset of the ANSI SQL-92 specification, and that’s a good starting point for making SQL professionals productive in a Big Data Hadoop environment as well.

To me, the book reaches two important goals: first, it’s highly readable and concise; second, it presents something under active development in a “stable” and “actionable” way.

To make his point of view clear, the author delivers two main messages in the book. The first is why and how the Impala implementation is good news for Hadoop. The second is how to be ready for daily tasks and avoid pitfalls.

Impala is provided in Cloudera’s Hadoop distribution; if you are using a different distribution, Impala is not part of it.

Pros:

  • Accurate
  • Easy to understand
  • Helpful examples
  • Well-written

Cons:

  • none

Best uses:

  • Expert
  • Intermediate

Yes, I would definitely recommend this book.

Since Impala is a new entry, I hope other books will follow that cover its integration with other Hadoop stack components such as Hive, HBase, Pig, Sqoop, etc.

Disclaimer: the book was provided to me for free as part of O’Reilly’s blogger reviewer program.