Getting Started With Apache Ignite SQL Denis Magda, GridGain Developer Relations Igor Seliverstov, GridGain Architecture Group

Post on 18-Jan-2021

Page 1: Getting Started With Apache Ignite SQL · •Ignite SQL Basics: DML, DDL, connectivity, configuration •Affinity Co-Location and Distributed JOINs •Beyond Memory Capacity: Disk

Getting Started With Apache Ignite SQL

Denis Magda, GridGain Developer Relations
Igor Seliverstov, GridGain Architecture Group

Topics

• Ignite SQL Basics: DML, DDL, connectivity, configuration
• Affinity Co-Location and Distributed JOINs
• Beyond Memory Capacity: Disk Tier Usage and Memory Quotas
• Ignite SQL Evolution With Apache Calcite

Ignite SQL Basics

Ignite SQL = ANSI SQL at Scale

• ANSI-99 DML and DDL syntax
  – SELECT, UPDATE, CREATE…

• Distributed joins, grouping, sorting

• Schema changes at runtime
  – ALTER TABLE, CREATE/DROP INDEX

• Works with in-memory and disk-only records
  – If Ignite Persistence is used as a disk tier

Connectivity Options

• Thick Client APIs
  – Java, C#/.NET, C++

• JDBC and ODBC drivers

• Thin Client APIs
  – Multi-language support

Configuration Option #1: Programmatically With Annotations

Usage Scenario:
• Spring-style development by annotating POJOs
• DDL can be used to apply changes at runtime.

Configuration Option #2: Spring XML With Query Entities

Usage Scenario:
• Ignite as a cache that writes changes through to an external database.
• DDL can be used to apply changes at runtime.

Configuration Option #3: In Pure SQL With DDL

Usage Scenario:
• SQL-driven applications
• Green-field applications using Ignite as a database with its native persistence

Demo Time

Cluster Startup and Database Creation

Affinity Co-Location and Distributed JOINs

Ignite SQL Engine Internals

[Diagram: three identical node stacks, each made up of Data & Indexes, Ignite SQL, and the H2 Engine]

Query Execution Phases

[Diagram: a Thick Client and servers holding City data; labels mark the Map and Reduce phases]

Default Data Distribution

Country Table: Canada, France
City Table: Toronto, Calgary, Montreal, Ottawa, Paris, Marseille

[Diagram: rows of both tables distributed across the cluster]

SQL JOIN With Data Shuffling

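[Diagram: a Thick Client and servers holding Canada, France, and their cities; Paris, Ottawa, and Montreal are moved between servers during the join. Callouts: "1 & 4" at the client, "2" at each server, "3" at the data movement]

1. Initiating Execution
2. Execution on Servers (map phase)
3. Data Shuffling
4. Reduce Phase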

Co-Located Distribution (aka Affinity Co-Location)

Country Table: Canada, France
City Table: Toronto, Calgary, Montreal, Ottawa, Paris, Marseille

[Diagram: each country's cities are stored together with the country row]

All You Need is to Configure Affinity Key

Affinity Key to Node Mapping Process

[Diagram: the Application Process maps a City Record's Affinity Key to a Partition, maps the Partition to a Node, and then makes a Network Call to that Node]
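
The two-step mapping on this slide can be sketched in a few lines of Python. This is a simplified illustration, not Ignite's actual affinity function: the hash function, the node names, and the sample rows are assumptions, and the partition count of 1024 is used as the commonly cited default.

```python
import hashlib

PARTITIONS = 1024  # assumed default partition count


def stable_hash(value):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


def partition_for(affinity_key):
    """Step 1: affinity key -> partition."""
    return stable_hash(affinity_key) % PARTITIONS


def node_for(partition, nodes):
    """Step 2: partition -> node, using rendezvous (highest-random-weight) hashing."""
    return max(nodes, key=lambda node: stable_hash(f"{node}:{partition}"))


nodes = ["node-A", "node-B", "node-C"]  # hypothetical node names

# City records use the country code as their affinity key, so a country row
# and all of its cities resolve to the same partition and the same node.
country_row = ("CAN", "Canada")
city_rows = [("Toronto", "CAN"), ("Calgary", "CAN"), ("Paris", "FRA")]

country_node = node_for(partition_for(country_row[0]), nodes)
for name, country_code in city_rows:
    partition = partition_for(country_code)
    print(f"{name} ({country_code}) -> partition {partition} on {node_for(partition, nodes)}")
```

Because the application can compute the owner node locally, it only needs one network call to reach the record.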

High-Performance SQL JOIN

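[Diagram: a Thick Client and servers holding Canada, France, and their cities (Toronto, Calgary, Ottawa, Marseille, Paris); no rows move between servers. Callouts: "1 & 3" at the client, "2" at each server]

1. Initiating Execution
2. Execution on Servers (map phase)
3. Reduce Phase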
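
A minimal sketch of the three steps above, assuming the data is already co-located. The node names and row placement are hypothetical, and plain Python functions stand in for the server-side map phase and the client-side reduce phase.

```python
from collections import defaultdict

# Hypothetical placement: each node holds a country row together with the
# city rows that share its affinity key (the country code).
node_data = {
    "node-1": {
        "country": [("CAN", "Canada")],
        "city": [("Toronto", "CAN"), ("Calgary", "CAN"),
                 ("Montreal", "CAN"), ("Ottawa", "CAN")],
    },
    "node-2": {
        "country": [("FRA", "France")],
        "city": [("Paris", "FRA"), ("Marseille", "FRA")],
    },
}


def map_phase(local):
    """Runs on one server: joins only the rows stored on that server."""
    cities_by_code = defaultdict(list)
    for city_name, country_code in local["city"]:
        cities_by_code[country_code].append(city_name)
    return [
        (country_name, city_name)
        for code, country_name in local["country"]
        for city_name in cities_by_code[code]
    ]


def reduce_phase(partial_results):
    """Runs on the client: merges the per-node results into one result set."""
    merged = [row for partial in partial_results for row in partial]
    return sorted(merged)


result = reduce_phase(map_phase(local) for local in node_data.values())
for row in result:
    print(row)
```

No step in this sketch moves city rows between nodes, which is the difference from the join with data shuffling.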

Demo Time

Queries With JOINs

Beyond Memory Capacity: Disk-Tier and Memory Quotas

Multi-Tier Storage Architecture

1. In-Memory - General in-memory caching, high-performance computing

2. In-Memory + Native Persistence - Ignite as an in-memory database

3. In-Memory + External Database - Acceleration of services and APIs with write-through and write-behind capability

Multi-Tier Storage Architecture

[Diagram: a memory segment holding metadata pages, index pages (root, inner, leaf), and data pages; a data page stores key-value entries. Legend: Data, Index]

Multi-Tier Storage Architecture

[Diagram: a partition file with data (data pages #0 to #6) next to a memory segment that holds a subset of the pages (data pages #2 and #5 plus inner, leaf, and metadata pages). A pages map resolves a PageId either to a pointer in the memory segment (read/write ops) or to a position in the file (load page/checkpoint)]

Java off-heap vs Java heap

SQL query processing stages and the memory each one uses:

• Parsing: heap
• Planning: heap
• Computing (filters, joins, expressions): more heap
• Scanning (index or table scan): off-heap

Java off-heap vs Java heap

Query plan, from the leaves up:

• Scanning: COUNTRY, CITY
• Filtering: σ code in (‘CAN’, ‘FRA’)
• Join: ⋈ country.code = city.countrycode
• Aggregation: country.name, city.name ℱ MAX(city.population)
• Sorting: τ max_pop
• Projection: π country.name, city.name, city.population
• Renaming: ρ name, name0, max_pop

Callouts on two of the operators: "Here we need full set in heap" and "Here we need full set in heap too"
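
To make the heap callouts concrete, here is a small sketch that runs the same plan over made-up rows. It assumes that aggregation and sorting are the operators that have to hold their input; the slide text does not say which two operators the callouts point at. The table contents and population numbers are hypothetical, and projection and renaming are folded into the aggregation step for brevity.

```python
# Hypothetical sample rows; the population figures are made-up round numbers.
COUNTRY = [
    {"code": "CAN", "name": "Canada"},
    {"code": "FRA", "name": "France"},
    {"code": "USA", "name": "United States"},
]
CITY = [
    {"name": "Toronto", "countrycode": "CAN", "population": 300},
    {"name": "Calgary", "countrycode": "CAN", "population": 100},
    {"name": "Paris", "countrycode": "FRA", "population": 200},
    {"name": "Boston", "countrycode": "USA", "population": 50},
]


def scan(table):
    """Scanning: streams rows one at a time."""
    yield from table


def select(rows, predicate):
    """Filtering (sigma): streams, holding a single row at a time."""
    return (row for row in rows if predicate(row))


def join(countries, cities_by_code):
    """Join: streams countries and looks up matching cities through an index."""
    for country in countries:
        for city in cities_by_code.get(country["code"], []):
            yield (country["name"], city["name"], city["population"])


def aggregate_max(rows):
    """Aggregation: keeps state for every group until the input is exhausted."""
    groups = {}
    for country_name, city_name, population in rows:
        key = (country_name, city_name)
        groups[key] = max(groups.get(key, population), population)
    return [(name, name0, max_pop) for (name, name0), max_pop in groups.items()]


def sort_by_max_pop(rows):
    """Sorting (tau): sorted() must read the whole input before emitting a row."""
    return sorted(rows, key=lambda row: row[2])


# A dictionary stands in for an index on city.countrycode.
cities_by_code = {}
for city in scan(CITY):
    cities_by_code.setdefault(city["countrycode"], []).append(city)

filtered = select(scan(COUNTRY), lambda c: c["code"] in ("CAN", "FRA"))
result = sort_by_max_pop(aggregate_max(join(filtered, cities_by_code)))
for name, name0, max_pop in result:
    print(name, name0, max_pop)
```

The generator-based operators never hold more than one row, while `aggregate_max` and `sort_by_max_pop` grow with the size of their input.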

Query memory quotas

How to configure:

Interim results offloading

Query plan, from the leaves up:

• Scanning: COUNTRY, CITY
• Filtering: σ code in (‘CAN’, ‘FRA’)
• Join: ⋈ country.code = city.countrycode
• Aggregation: country.name, city.name ℱ MAX(city.population)
• Sorting: τ max_pop
• Projection: π country.name, city.name, city.population
• Renaming: ρ name, name0, max_pop

Why don’t you flush result sets to disk?

And it

Intermediate results offloading

How to configure:

When you need quotas/offloading enabled

● Sorting (ORDER BY)

● Grouping (DISTINCT, GROUP BY)

● Complex subqueries
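
The idea behind combining a quota with offloading can be sketched with a textbook external merge sort. This is a generic illustration and not the engine's implementation: `sort_with_quota` and `quota_rows` are hypothetical names, and the quota is counted in rows rather than bytes to keep the example short.

```python
import heapq
import pickle
import tempfile


def sort_with_quota(rows, quota_rows):
    """Sort rows while buffering at most quota_rows of them in memory.

    Whenever the buffer reaches the quota it is sorted and spilled to a
    temporary file; the sorted runs are then merged as streams.
    """
    run_files = []
    buffer = []

    def spill():
        buffer.sort()
        run = tempfile.TemporaryFile()
        for row in buffer:
            pickle.dump(row, run)
        run.seek(0)
        run_files.append(run)
        buffer.clear()

    for row in rows:
        buffer.append(row)
        if len(buffer) >= quota_rows:
            spill()
    if buffer:
        spill()

    def read_run(run):
        while True:
            try:
                yield pickle.load(run)
            except EOFError:
                return

    # heapq.merge keeps only one pending row per run in memory.
    yield from heapq.merge(*(read_run(run) for run in run_files))
    for run in run_files:
        run.close()


values = [7, 3, 9, 1, 8, 2, 5]  # made-up input
sorted_values = list(sort_with_quota(values, quota_rows=3))
print(sorted_values)
```

The same trade-off applies to the operations in the list above: the query finishes within a bounded amount of heap, at the cost of extra disk I/O.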

Demo Time

Running SQL Over Disk-Only Records

Apache Ignite SQL Evolution With Apache Calcite

Why do we need it?

Here we need Map-Reduce phase

Here we need Map-Reduce phase too

Typical execution flow

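[Diagram: User → SQL Query → Parser → Query AST → Validator → Optimizer → Query plan → Row source generator → Execution → Result. The Validator performs validation against the Schema from the Dictionary. The optimizer mode selects the rule-based optimizer (RBO) or the cost-based optimizer (CBO); the CBO uses Statistics]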

Apache Calcite

[Diagram: Apache Calcite architecture with JDBC Client, JDBC Server, SQL Parser/Validator, Query optimizer, Metadata SPI, pluggable rules, 3rd party ops, and 3rd party data. A legend marks each component as Optional, Core, or Pluggable]

Need to implement:

● Splitter

● Runtime

● Indexes support

● DML support

● DDL support

Query Parser and Transformer

Query AST:

Select
  expr = p.id, d.name
  from = person, dep
  cond = p.depId = d.id AND (p.id > 10 OR p.id < 10000)
  order = p.name DESC
  offset = 10

Relational operators tree (query plan), from the leaves up:

• Scan: dep(d), person(p)
• σ p.id > 10 OR p.id < 10000
• ⋈ p.depId = d.id
• τ p.name DESC
• π p.id, p.name

Cost-Based Optimizer

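[Diagram: the Query AST (from parser) enters the Query transformer, which produces a relational tree. The Plan generator applies Rules to derive equivalent relational trees, and the Estimator adds costs using Statistics from the Dictionary. The resulting query plan goes to the Row source generator]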

Cost-Based Splitter

Operator tree, from the leaves up:

• Scan: dep(d), person(p)
• σ p.id > 10 OR p.id < 10000
• ⋈ p.depId = d.id
• τ p.name DESC
• π p.id, p.name
• Root

Reactive Execution Flow

Operator chain: Scan → Filter → Sender → Receiver → Client cursor
Data flow (downstream): Push, Push, Send, Push
Backpressure (upstream): Request, Request, Acknowledge, Request

Each operator has its own node buffer; the hop from Sender to Receiver is network communication.
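
A rough sketch of the same push-with-backpressure idea, using a bounded queue between a producer thread and a consumer. The queue stands in for a node buffer and for the network hop; `BUFFER_SIZE`, the function names, and the sample predicate are assumptions for illustration and do not come from the prototype.

```python
import queue
import threading

BUFFER_SIZE = 2  # hypothetical node buffer capacity
DONE = object()  # end-of-stream marker


def scan_and_filter(rows, outbox, predicate):
    """Upstream side (Scan -> Filter -> Sender).

    put() blocks while the buffer is full, so a slow consumer
    automatically slows the producer down.
    """
    for row in rows:
        if predicate(row):
            outbox.put(row)
    outbox.put(DONE)


def client_cursor(inbox):
    """Downstream side (Receiver -> Client cursor).

    Each get() frees a buffer slot, which plays the role of the
    request/acknowledge signal that lets the sender push again.
    """
    while True:
        row = inbox.get()
        if row is DONE:
            return
        yield row


buffer = queue.Queue(maxsize=BUFFER_SIZE)
producer = threading.Thread(
    target=scan_and_filter,
    args=(range(20), buffer, lambda value: value % 2 == 0),
)
producer.start()
received = list(client_cursor(buffer))
producer.join()
print(received)
```

At no point can more than `BUFFER_SIZE` rows sit between the two sides, regardless of how many rows the scan produces.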

Demo Time

Calcite Prototype Demo With Sub-Queries

Learn More

• Apache Ignite SQL
  – https://apacheignite-sql.readme.io/docs

• Memory Quotas (available in GridGain Community Edition)
  – https://www.gridgain.com/docs/latest/developers-guide/memory-configuration/memory-quotas

• Demos shown in this webinar
  – https://github.com/GridGain-Demos/ignite-sql-intro-samples

• New Apache Calcite-based engine
  – https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine