Impala: A Modern, Open-Source SQL Engine for Hadoop

Post on 14-Jul-2015


[Diagram: Impala cluster architecture. A SQL App connects through ODBC. Each node runs a Query Planner, a Query Coordinator, and a Query Executor, co-located with an HDFS DN and HBase. Cluster-wide services: HDFS NN, Statestore & Catalog.]

The SQL App sends a SQL request over ODBC.

[Diagram: Impala cluster architecture, planning step. A SQL App connects through ODBC. Each node runs a Query Planner, a Query Coordinator, and a Query Executor, co-located with an HDFS DN and HBase. Cluster-wide services: Hive Metastore, HDFS NN, Statestore & Catalog.]

The planner turns the request into collections of plan fragments.

The coordinator initiates execution on remote nodes.

[Diagram: Impala cluster architecture, result step. A SQL App connects through ODBC and receives query results. Each node runs a Query Planner, a Query Coordinator, and a Query Executor, co-located with an HDFS DN and HBase. Cluster-wide services: Hive Metastore, HDFS NN, Statestore & Catalog.]

Intermediate results are streamed between nodes.

Operation permitted; query results are streamed back to the client.

void MaterializeTuple(char* tuple) {
  // Interpreted version: inspects every slot's type and offset for each row.
  for (int i = 0; i < num_slots_; ++i) {
    char* slot = tuple + offsets_[i];
    switch (types_[i]) {
      case BOOLEAN:
        *slot = ParseBoolean();
        break;
      case INT:
        *slot = ParseInt();
        break;
      case FLOAT: …
      case STRING: …
      // etc.
    }
  }
}

void MaterializeTuple(char* tuple) {
  // i = 0
  *(tuple + 0) = ParseInt();
  // i = 1
  *(tuple + 4) = ParseBoolean();
  // i = 2
  *(tuple + 5) = ParseInt();
}

Hot code path, called per row
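The two versions of MaterializeTuple can be compared side by side in a compilable sketch. It is not Impala's implementation: the parser stubs, the SlotType enum, the GenericMaterializer class, and the memcpy-based stores are assumptions made for illustration, and the hand-written specialized function only stands in for code that would be generated at runtime. The layout (INT at offset 0, BOOLEAN at 4, INT at 5) is taken from the unrolled version on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical slot types; the slide also lists FLOAT and STRING.
enum class SlotType { BOOLEAN, INT };

// Stub parsers (assumption): a real scanner would parse input bytes.
inline std::int32_t ParseInt() { return 42; }
inline bool ParseBoolean() { return true; }

// Interpreted version: consults the slot descriptors on every row.
class GenericMaterializer {
 public:
  GenericMaterializer(std::vector<SlotType> types, std::vector<int> offsets)
      : types_(std::move(types)), offsets_(std::move(offsets)) {
    assert(types_.size() == offsets_.size());
  }

  void MaterializeTuple(char* tuple) const {
    for (std::size_t i = 0; i < types_.size(); ++i) {
      char* slot = tuple + offsets_[i];
      switch (types_[i]) {
        case SlotType::BOOLEAN: {
          const bool v = ParseBoolean();
          std::memcpy(slot, &v, sizeof(v));
          break;
        }
        case SlotType::INT: {
          const std::int32_t v = ParseInt();
          std::memcpy(slot, &v, sizeof(v));
          break;
        }
      }
    }
  }

 private:
  std::vector<SlotType> types_;
  std::vector<int> offsets_;
};

// Specialized version for one fixed layout: no loop, no switch,
// offsets baked in as constants.
void MaterializeTupleSpecialized(char* tuple) {
  const std::int32_t a = ParseInt();
  std::memcpy(tuple + 0, &a, sizeof(a));
  const bool b = ParseBoolean();
  std::memcpy(tuple + 4, &b, sizeof(b));
  const std::int32_t c = ParseInt();
  std::memcpy(tuple + 5, &c, sizeof(c));
}

// Returns true when both versions write identical bytes for the same layout.
bool SpecializedMatchesGeneric() {
  const GenericMaterializer generic(
      {SlotType::INT, SlotType::BOOLEAN, SlotType::INT}, {0, 4, 5});
  char from_generic[9] = {0};
  char from_specialized[9] = {0};
  generic.MaterializeTuple(from_generic);
  MaterializeTupleSpecialized(from_specialized);
  return std::memcmp(from_generic, from_specialized, sizeof(from_generic)) == 0;
}
```

Calling SpecializedMatchesGeneric() checks that the specialization changes only how the work is done, not what is written. The per-row saving comes from removing the loop, the switch, and the descriptor lookups.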

[Diagram: inside an Impala Daemon, several QueryFragments submit reads to a shared IO Manager. The IO Manager runs worker threads (Thread0 to Thread4) over the node's disks.]
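The arrangement in the diagram, with query fragments handing reads to a shared IO manager that owns the disk threads, might be sketched as follows. This is an illustrative sketch and not Impala's code: the IoManager class, its ScheduleRead method, the one-worker-per-disk policy, and the RunDemo helper are all assumptions, and each "read" is just a callback.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical IO manager: one request queue and one worker thread per disk.
class IoManager {
 public:
  explicit IoManager(std::size_t num_disks) {
    for (std::size_t i = 0; i < num_disks; ++i) {
      queues_.push_back(std::make_unique<DiskQueue>());
    }
    // Start workers only after every queue exists.
    for (std::size_t i = 0; i < num_disks; ++i) {
      workers_.emplace_back([this, i] { WorkLoop(*queues_[i]); });
    }
  }

  // Drains all pending reads, then stops and joins the workers.
  ~IoManager() {
    for (auto& q : queues_) {
      {
        std::lock_guard<std::mutex> lock(q->mu);
        q->shutdown = true;
      }
      q->cv.notify_all();
    }
    for (auto& t : workers_) t.join();
  }

  // Called by query fragments: enqueue a read for the given disk.
  void ScheduleRead(std::size_t disk_id, std::function<void()> read) {
    DiskQueue& q = *queues_.at(disk_id);
    {
      std::lock_guard<std::mutex> lock(q.mu);
      q.pending.push_back(std::move(read));
    }
    q.cv.notify_one();
  }

 private:
  struct DiskQueue {
    std::mutex mu;
    std::condition_variable cv;
    std::deque<std::function<void()>> pending;
    bool shutdown = false;
  };

  static void WorkLoop(DiskQueue& q) {
    for (;;) {
      std::function<void()> read;
      {
        std::unique_lock<std::mutex> lock(q.mu);
        q.cv.wait(lock, [&q] { return q.shutdown || !q.pending.empty(); });
        if (q.pending.empty()) return;  // shutdown requested and queue drained
        read = std::move(q.pending.front());
        q.pending.pop_front();
      }
      read();  // run the read outside the lock
    }
  }

  std::vector<std::unique_ptr<DiskQueue>> queues_;
  std::vector<std::thread> workers_;
};

// Schedules reads_per_disk no-op reads on every disk; returns how many ran.
int RunDemo(std::size_t num_disks, int reads_per_disk) {
  std::atomic<int> completed{0};
  {
    IoManager io(num_disks);
    for (std::size_t d = 0; d < num_disks; ++d) {
      for (int r = 0; r < reads_per_disk; ++r) {
        io.ScheduleRead(d, [&completed] { ++completed; });
      }
    }
  }  // destructor drains the queues and joins the workers
  return completed.load();
}
```

Keeping a separate queue per disk means a slow disk delays only the reads queued on it, while the fragments themselves never block on a particular device thread.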

Parquet is a container format for all popular serialization formats: Avro, Thrift, Protocol Buffers

From Twitter’s “Dremel Made Simple” blog

The most efficient I/O is the one that never happens at all
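One reading of this slogan, assumed here, is column projection in a columnar file: a query that touches two columns reads only those two column chunks. The sketch below just counts bytes under each layout. The ColumnChunk struct, the function names, the extra columns, and all byte sizes are hypothetical; only the question and has_question column names come from the closing query in this deck.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one column's data within a file.
struct ColumnChunk {
  std::string column;
  std::uint64_t bytes;
};

// Row-oriented layout: columns are interleaved, so a scan reads everything.
std::uint64_t BytesReadRowLayout(const std::vector<ColumnChunk>& chunks) {
  std::uint64_t total = 0;
  for (const ColumnChunk& c : chunks) total += c.bytes;
  return total;
}

// Columnar layout: only the chunks of referenced columns are read.
std::uint64_t BytesReadColumnLayout(const std::vector<ColumnChunk>& chunks,
                                    const std::set<std::string>& needed) {
  std::uint64_t total = 0;
  for (const ColumnChunk& c : chunks) {
    if (needed.count(c.column) > 0) total += c.bytes;
  }
  return total;
}

// Made-up table: the sizes are illustrative, not measurements.
std::vector<ColumnChunk> ExampleTable() {
  return {{"question", 800}, {"has_question", 100}, {"name", 600}, {"bio", 8500}};
}
```

With these made-up sizes, a query over question and has_question reads 900 bytes from the columnar layout against 10000 bytes from the row layout.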

• OVER (PARTITION BY), RANK, LEAD, LAG, NTILE, ..

• VARCHAR, CHAR

• ROLLUP, CUBE, GROUPING SETS

• SET MINUS, INTERSECT

SELECT question FROM audience WHERE has_question = true;
