getting started with apache ignite sql · •ignite sql basics: dml, ddl, connectivity,...

Getting Started With Apache Ignite SQL

Denis Magda, GridGain Developer RelationsIgor Seliverstov, GridGain Architecture Group

Topics

• Ignite SQL Basics: DML, DDL, connectivity, configuration• Affinity Co-Location and Distributed JOINs• Beyond Memory Capacity: Disk Tier Usage and Memory Quotas• Ignite SQL Evolution With Apache Calcite

Ignite SQL Basics

Ignite SQL = ANSI SQL at Scale

• ANSI-99 DML and DDL syntax– SELECT, UPDATE, CREATE…

• Distributed joins, grouping, sorting

• Schema changes in runtime– ALTER TABLE, CREATE/DROP INDEX

• Works with in-memory and disk-only records– If Ignite Persistence is used as a disk tier

Connectivity Options

• Thick Client APIs– Java, C#/.NET, C++

• JDBC and ODBC drivers

• Thin Client APIs– Multi-language support

Configuration Option #1: Programmatically With Annotations

Usage Scenario:• Spring-style development by annotating POJOs• DDL can be used to apply changes in runtime.

Configuration Option #2: Spring XML With Query Entities

Usage Scenario:• Ignite as a cache that writes-through

changes to an external database.

• DDL can be used to apply changes in runtime.

Configuration Option #3: In Pure SQL With DDL

Usage Scenario:• SQL-driven applications• Green-field applications using Ignite as a

database with its native persistence

Demo Time

Cluster Startup and Database Creation

Affinity Co-Location andDistributed JOINs

Ignite SQL Engine Internals

Data & Indexes

Ignite SQL

H2 Engine

Data & Indexes

Ignite SQL

H2 Engine

Data & Indexes

Ignite SQL

H2 Engine

Query Execution Phases

Thick Client

Reduce

Default Data Distribution

Canada

Toronto

Calgary

France

MarseilleMontreal

Ottawa

Country Table

City Table

SQL JOIN With Data Shuffling

Thick Client

Canada

Toronto

CalgaryParis

France

MarseilleOttawa

Montreal

ParisOttawaMontreal

1. Initiating Execution2. Execution on Servers (map phase)3. Data Shuffling4. Reduce Phase

Co-Located Distribution (aka. Affinity Co-Location)

Canada

Toronto

Calgary

France

Marseille

Country Table

City Table

MontrealOttawa Paris

All You Need is to Configure Affinity Key

Affinity Key to Node Mapping Process

Affinity Key Partition

Application Process Network Call

CityRecord

High-Performance SQL JOIN

Thick Client

Canada

Toronto

Calgary

France

Marseille

1. Initiating Execution2. Execution on Servers (map phase)3. Reduce Phase

Ottawa

Demo Time

Queries With JOINs

Beyond Memory Capacity:Disk-Tier and Memory Quotas

Multi-Tier Storage architecture

1. In-Memory - General in-memory caching, high-performance computing

2. In-Memory + Native Persistence - Ignite as an in-memory database

3. In-Memory + External Database - Acceleration of services and APIs with write-through and write-behind capability

Multi-Tier Storage Architecture

Index Page(root)

Index page(inner)

Inner page 2

Leaf page 2Index Page(leaf)

Leaf page 3

Data page Index Page(root) Leaf page Metadata page Leaf page Data page Metadata page

Memory segment

Key-Value

Data page

DataIndex

Multi-Tier Storage Architecture

Data page #0 Data page #1

Partition file with Data

Data page #2 Data page #3 Data page #4 Data page #5 Data page #6

Data page #5 Inner page Leaf page Metadata page Leaf page Data page #2 Metadata page

Memory segment

PageIdPages map Pointer in a memory segment

(read/write ops)

Position in a file (load page/checkpoint)

Java off-heap vs Java heap

SQL query processing

ParsingPlanning

Computing(filters, joins, expressions)

Scanning(index or table scan)

More Heap

Off-heap

Java off-heap vs Java heap

Sorting

Renaming

Aggregation

Projection

Filtering

ScanningCOUNTRYCITY

σcode in (‘CAN’, ‘FRA’)

⋈country.code = city.countrycode

country.name, city.nameℱMAX(city.population)

τmax_pop

πcountry.name, city.name, city.population

ρname, name0, max_pop

Here we need full set in heap

Here we need full set in heap too

Query memory quotas

How to configure:

Interim results offloading

Sorting

Renaming

Aggregation

Projection

Filtering

ScanningCOUNTRYCITY

σcode in (‘CAN’, ‘FRA’)

⋈country.code = city.countrycode

country.name, city.nameℱMAX(city.population)

τmax_pop

πcountry.name, city.name, city.population

ρname, name0, max_pop

Why don’t you flush result sets to disk?

And it

Intermediate results offloading

How to configure:

When you need quotas/offloading enabled

● Sorting (ORDER BY)

● Grouping (DISTINCT, GROUP BY)

● Complex subqueries

Demo Time

Running SQL Over Disk-Only Records

Apache Ignite SQL EvolutionWith Apache Calcite

Why do we need it?

Here we need Map-Reduce phase

Here we need Map-Reduce phase too

Typical execution flow

Parser

Optimizer modeRule-based optimizer

Cost-based optimizer

Dictionary

Row source generator Execution

SQL Query

RBO CBOStatistics

Result

Query plan

ValidatorValidation

Schema

Query AST

Apache Calcite

JDBC Client

JDBC Server

SQL Parser/Validator

Query optimizer

3rd party ops 3rd party ops

Metadata SPI

Plugable rules

3rd party data

Optional

Pluggable

Need to implement:

● Splitter

● Runtime

● Indexes support

● DML support

● DDL support

Query Parser and Transformer

Select

expr=p.id, d.name

from=person, dep

cond=p.depId = d.id AND (p.id > 10 OR p.id < 10000)

order=p.name DESC

offset=10

dep(d)person(p)

σp.id > 10 OR p.id < 10000

⋈p.depId = d.id

τp.name DESC

πp.id, p.name

Query AST

Relational operators tree (query plan)

Cost-Based Optimizer

Estimator Dictionary

Plan generator

Query transformer

Query AST (from parser)

Relational tree

Relational tree + costs

Statistics

Equivalent relational

Query plan (to Row source generator)

Cost-Based Splitter

dep(d)person(p)

σp.id > 10 OR p.id < 10000

⋈p.depId = d.id

τp.name DESC

πp.id, p.name

Reactive Execution Flow

Scan Filter Sender Receiver Client cursor

Push Push Send PushData flow

Request Request Acknowledge Request

Node buffer Node buffer Node buffer Node buffer

Backpressure

Network communication

Demo Time

Calcite Prototype Demo With Sub-Queries

Learn More

• Apache Ignite SQL– https://apacheignite-sql.readme.io/docs

• Memory Quotas (available in GridGain Community Edition):

– https://www.gridgain.com/docs/latest/developers-guide/memory-configuration/memory-quotas

• Demos shown in this webinar– https://github.com/GridGain-Demos/ignite-sql-intro-samples

• New Apache Calcite-based engine– https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+

query+execution+engine

getting started with apache ignite sql · •ignite sql basics: dml, ddl, connectivity,...

Documents

lecture 1: sql: ddl & dml

overview of sql ddl, dml and dcl commands. manual 2008 –...

dml en sql consultas sencillas usando el dml de sql

sql - data manipulation language (dml)

the sql query language dml1 the sql query language dml odds...

chapter 7 structured query language (sql) › ~mhughes ›...

lab 10 - sql dml iii & db programming...

Διάλεξη 11: sql data definition...

sql – introduction standard dml/ddl for relational db’s...

review of sql dml

--the sql query language dml--1 the sql query language dml...

hap 709 – healthcare databases sql data manipulation...

sql ddl & dml. 2 objectives of sql u sql is a...

why i design for human variability [dml conference ignite...

1 cs 430 database theory winter 2005 lecture 12: sql dml -...

outline: sql ddl - creating schemas - modifying schemas dml...

sql server 2014 -...

structured query language (sql) chapter 5: part 2: dml

cs3431 sql: updates (dml) and views (ddl). cs3431 sql dml...

sql (structured query language). 2 outline the company...