Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
TRANSCRIPT
Why we need them
● Certified distro
● Enterprise support
● Quarterly releases

● Product testing - Tempto
● Performance testing - Benchto
What is Tempto?
● End-to-end product testing framework
● Targeted at software engineers
● For automation
● Tests easy to define
● Focus on test code
● Focus on database systems
● So far used for testing
  ○ Presto
  ○ internal projects
How is a test defined?
● Java
● SQL convention based
Example – Java based test

public class SimpleQueryTest extends ProductTest {

    private static class SimpleTestRequirements implements RequirementsProvider {
        public Requirement getRequirements(Configuration config) {
            return new ImmutableHiveTableRequirement(NATION);
        }
    }

    @Inject
    Configuration configuration;

    @Test(groups = {"smoke", "query"})
    @Requires(SimpleTestRequirements.class)
    public void selectCountFromNation() {
        assertThat(query("select count(*) from nation"))
                .hasRowsCount(1)
                .hasRows(row(25));
    }
}
Example – Convention based test

allRows.sql:
-- database: hive; tables: blah
SELECT * FROM sample_table

allRows.result:
-- delimiter: |; ignoreOrder: false; types: BIGINT,VARCHAR
1|A|
2|B|
3|C|
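The `-- key: value; key: value` header line in the .sql and .result files above is simple to parse. The sketch below illustrates the idea in plain Java; the class and method names are made up, and this is not Tempto's actual parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative parser for the "-- key: value; key: value" header line used
// by convention-based test files. Names are hypothetical, not Tempto's API.
public class ConventionHeader {
    static Map<String, String> parse(String headerLine) {
        String body = headerLine.substring(2).trim(); // drop the leading "--"
        Map<String, String> props = new LinkedHashMap<>();
        for (String pair : body.split(";")) {
            String[] kv = pair.split(":", 2); // limit 2: a value may contain ':'
            props.put(kv[0].trim(), kv[1].trim());
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(parse("-- delimiter: |; ignoreOrder: false; types: BIGINT,VARCHAR"));
    }
}
```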
Tempto architecture

[Diagram: framework components - user provided: tests, requirements; library provided: TestNG, TestNG listeners, utils, requirement fulfillers]
Tempto architecture – TestNG
● Works well
● Extensible
● Well known
Tempto architecture – TestNG listeners
● Tempto-specific extension of the TestNG execution framework
● Requirements management
● Test filtering
● Injecting dependencies
● Extended logging
Tempto architecture – tests
● Test code :)
  ○ Java
  ○ SQL-convention based
Tempto architecture – requirements
● Declarative requirements
● Fulfilled by the test framework via pluggable fulfillers
  ○ e.g. mutableTable(Tpch.NATION, LOADED, "hive")
● Test level and suite level
● Cleanup
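The requirement/fulfiller split can be sketched in plain Java: a test declares *what* it needs, and a pluggable fulfiller knows *how* to provide it and clean it up afterwards. All names below are hypothetical stand-ins, not the real Tempto API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of declarative requirements and pluggable fulfillers.
interface Requirement {}

record MutableTableRequirement(String table, String database) implements Requirement {}

class TableFulfiller {
    final List<String> actions = new ArrayList<>();

    void fulfill(Requirement requirement) {
        if (requirement instanceof MutableTableRequirement r) {
            actions.add("LOAD " + r.table() + " into " + r.database());
        }
    }

    void cleanup(Requirement requirement) {
        if (requirement instanceof MutableTableRequirement r) {
            actions.add("DROP " + r.table() + " from " + r.database());
        }
    }
}

public class RequirementSketch {
    public static void main(String[] args) {
        TableFulfiller fulfiller = new TableFulfiller();
        Requirement nation = new MutableTableRequirement("nation", "hive");
        fulfiller.fulfill(nation);  // runs before the test (test or suite level)
        // ... the test body would run here ...
        fulfiller.cleanup(nation);  // framework-driven cleanup afterwards
        System.out.println(fulfiller.actions);
    }
}
```

A suite-level requirement would be fulfilled once and shared across tests; a test-level one is fulfilled and cleaned up around each test.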
Tempto architecture – utils
● Extra assertions
● Various tools
  ○ HDFS client
  ○ SSH client
  ○ JDBC query executor
Executable runner

$ java -jar target/presto-product-tests-0.120-SNAPSHOT-executable.jar --help
usage: Presto product tests
    --config-local <arg>      URI to Test local configuration YAML file.
    --report-dir <arg>        Test reports directory
    --groups <arg>            Test groups to be run
    --excluded-groups <arg>   Test groups to be excluded
    --tests <arg>             Test patterns to be included
 -h,--help                    Shows help message
● All dependencies embedded
● User provides cluster details through a YAML config
Configuration

hdfs:
  username: hdfs
  webhdfs:
    host: master
    port: 50070

tests:
  hdfs:
    path: /product-test

databases:
  default:
    alias: presto
  hive:
    jdbc_driver_class: org.apache.hive.jdbc.HiveDriver
    jdbc_url: jdbc:hive2://master:10000
    jdbc_user: hdfs
    jdbc_password: na
    jdbc_pooling: false
    jdbc_jar: test-framework-hive-jdbc-all.jar
  presto:
    jdbc_driver_class: com.facebook.presto.jdbc.PrestoDriver
    jdbc_url: jdbc:presto://localhost:8080/hive/default
    jdbc_user: hdfs
    jdbc_password: na
    jdbc_pooling: false
Goals
● Easy and manageable way to define benchmarks
● Run and analyze macro benchmarks in a clustered environment
● Repeatable benchmarking of Hadoop SQL engines, most importantly Presto
  ○ also used for Hive, Teradata components
● Transparent, trusted framework for benchmarking
Benchmarks - model

[Diagram: one BenchmarkRun has n QueryExecutions and n AggregatedMeasurements; each QueryExecution has n Measurements]
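The cardinalities in the model diagram can be written down as plain Java records. The class names and 1-to-n relationships come from the slide; the individual fields are assumptions added for illustration.

```java
import java.util.List;

// Sketch of the benchmark result model; fields are illustrative assumptions.
record Measurement(String name, double value, String unit) {}

// one query execution collects n measurements
record QueryExecution(int sequenceId, List<Measurement> measurements) {}

// one benchmark run groups n query executions and n aggregated measurements
// (e.g. mean duration across the executions)
record BenchmarkRun(
        String benchmarkName,
        List<QueryExecution> executions,
        List<Measurement> aggregatedMeasurements) {}

public class ModelSketch {
    public static void main(String[] args) {
        QueryExecution execution =
                new QueryExecution(0, List.of(new Measurement("duration", 1234.0, "ms")));
        BenchmarkRun run = new BenchmarkRun(
                "linear-scan",
                List.of(execution),
                List.of(new Measurement("duration_mean", 1234.0, "ms")));
        System.out.println(run.executions().size() + " execution(s)");
    }
}
```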
Benchmarks - execution

● before-benchmark-macros
● prewarm
● benchmark
  ○ execution-0
  ○ execution-1
  ○ …
  ○ execution-n
● after-benchmark-macros
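The execution life cycle above can be sketched as a simple driver loop; the class and method names are placeholders, not Benchto's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the benchmark execution life cycle: before-benchmark macros,
// prewarm runs, the measured executions, then after-benchmark macros.
public class BenchmarkLifecycle {
    final List<String> phases = new ArrayList<>();

    void run(int prewarmRuns, int runs) {
        phases.add("before-benchmark-macros");   // e.g. drop-caches
        for (int i = 0; i < prewarmRuns; i++) {
            phases.add("prewarm-" + i);          // warm-up, results discarded
        }
        for (int i = 0; i < runs; i++) {
            phases.add("execution-" + i);        // measured executions
        }
        phases.add("after-benchmark-macros");
    }

    public static void main(String[] args) {
        BenchmarkLifecycle lifecycle = new BenchmarkLifecycle();
        lifecycle.run(3, 2); // 3 prewarm runs, 2 measured executions
        lifecycle.phases.forEach(System.out::println);
    }
}
```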
Defining benchmarks - structure
● Convention-based definition of benchmarks through descriptors (YAML format) and query SQL files

$ tree .
.
├── application-presto-devenv.yaml
├── application-td-hdp.yaml
├── benchmarks
│   ├── presto
│   │   ├── concurrency-insert-multi-table.yaml
│   │   ├── concurrency.yaml
│   │   ├── linear-scan.yaml
│   │   ├── tpch.yaml
│   │   └── types.yaml
│   └── querygrid-presto-ansi
│       └── concurrency.yaml
└── sql
    ├── presto
    │   ├── dev-zero
    │   │   ├── create-alltypes.sql
    │   │   └── create-lineitem.sql
    │   ├── linear-scan
    │   │   ├── selectivity-0.sql
    │   │   ├── selectivity-100.sql
...
Defining benchmarks - descriptor
● The descriptor is a YAML configuration file with various properties and user-defined variables

$ cat benchmarks/presto/concurrency.yaml
datasource: presto
query-names: presto/linear-scan/selectivity-${selectivity}.sql
schema: tpch_100gb_orc
database: hive
concurrency: ${concurrency_level}
runs: ${concurrency_level}
prewarm-runs: 3
before-benchmark: drop-caches
variables:
  1:
    selectivity: 10, 100
    concurrency_level: 10
  2:
    selectivity: 10, 100
    concurrency_level: 20
  3:
    selectivity: 10, 100
    concurrency_level: 50
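Assuming each comma-separated value list inside a numbered variables group is expanded to the cross-product of all lists in that group (so group 1 above would produce two runs, selectivity 10 and 100, both at concurrency 10), the expansion can be sketched as follows. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of variables-group expansion: cross-product of value lists.
public class VariableExpansion {
    static List<Map<String, String>> expand(Map<String, List<String>> group) {
        List<Map<String, String>> combos = new ArrayList<>();
        combos.add(new HashMap<>());
        for (Map.Entry<String, List<String>> entry : group.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> combo : combos) {
                for (String value : entry.getValue()) {
                    Map<String, String> extended = new HashMap<>(combo);
                    extended.put(entry.getKey(), value);
                    next.add(extended);
                }
            }
            combos = next;
        }
        return combos;
    }

    public static void main(String[] args) {
        // group 1 from the descriptor: selectivity 10,100 at concurrency 10
        Map<String, List<String>> group = new HashMap<>();
        group.put("selectivity", List.of("10", "100"));
        group.put("concurrency_level", List.of("10"));
        System.out.println(expand(group)); // two combinations
    }
}
```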
Defining benchmarks – SQL file templating
● SQL files can use keys defined in the YAML configuration file – templates are based on FreeMarker

$ cat sql/presto/tpch/q14.sql
SELECT
  100.00 * sum(CASE
               WHEN p.type LIKE 'PROMO%'
                 THEN l.extendedprice * (1 - l.discount)
               ELSE 0
               END) / sum(l.extendedprice * (1 - l.discount)) AS promo_revenue
FROM
  "${database}"."${schema}"."lineitem" AS l,
  "${database}"."${schema}"."part" AS p
WHERE
  l.partkey = p.partkey
  AND l.shipdate >= DATE '1995-09-01'
  AND l.shipdate < DATE '1995-09-01' + INTERVAL '1' MONTH
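FreeMarker itself is a third-party library, but the ${key} substitution it performs on queries like the one above can be illustrated with a minimal plain-Java stand-in. Real FreeMarker is far richer (conditionals, loops, built-ins); this sketch only handles simple keys.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal stand-in for FreeMarker's ${key} interpolation (illustration only).
public class SqlTemplate {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)}");

    static String render(String template, Map<String, String> vars) {
        Matcher matcher = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            // quoteReplacement guards against '$' or '\' in substituted values
            matcher.appendReplacement(out, Matcher.quoteReplacement(vars.get(matcher.group(1))));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM \"${database}\".\"${schema}\".\"lineitem\"";
        System.out.println(render(sql, Map.of("database", "hive", "schema", "tpch_100gb_orc")));
    }
}
```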
Future work
● (Tempto) Support for complex concurrent test execution
● (Benchto) Automatic regression detection
● (Benchto) Customized dashboards (e.g. overall performance analysis)
● (Benchto) Hardware and configuration awareness
● (Benchto) More complex benchmarking scenarios
● (Benchto) Support for complex concurrency scenarios
● (Benchto) Scheduling mechanism
Questions?
Benchto GUI
● Visualization of benchmark results
● Linking between tools (Grafana, Presto UI)
● Comparison of multiple benchmarks
Grafana monitoring
● We use a Grafana dashboard with Graphite
● Benchmark/execution life-cycle events are shown on dashboards
● Provides good visibility into the state of the cluster