Guest blog post by Raghavan Madabusi
Apache Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data. Inspired by Google’s Dremel, Drill is designed to scale to several thousands of nodes and query petabytes of data at interactive speeds that BI/Analytics environments require.
Apache Drill includes a distributed execution environment, purpose built for large-scale data processing. At the core of Apache Drill is the “Drillbit” service which is responsible for accepting requests from the client, processing the queries, and returning results to the client. When a Drillbit runs on each data node in a cluster, Drill can maximize data locality during query execution without moving data over the network or between nodes.
User: Providing interfaces such as a command line interface (CLI), a REST interface, JDBC/ODBC, etc., for human or applicationdriven interaction.
Processing: Allowing for pluggable query languages as well as the query planner, execution, and storage engines.
Data sources: Pluggable data sources either local or in a cluster setup, providing in-situ data processing.
Note that Apache Drill is not a database but rather a query layer that works with a number of underlying data sources. It is primarily designed to do full table scans of relevant data as opposed to, say, maintaining indices. Not unlike the MapReduce part of Hadoop provides a framework for parallel processing, Apache Drill provides for a flexible query execution framework, enabling a number of use cases from quick aggregation of statistics to explorative data analysis.
Apache Drill can query data residing in different file formats (CSV, TSV, JSON, PARQUET, AVRO) and in different data sources (Hive, HBASE, HDFS, S3, MongoDB, Cassandra, and others). Drill provides a unified query layer that can interact with different file formats in different data sources thus avoiding any ETL necessary to bring data in different places to one location.
Check out the Use Case that demonstrates some of the Drill's capabilities