Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
TRANSCRIPT
Why we need them
● Certified distro
● Enterprise support
● Quarterly releases

● Product testing - Tempto
● Performance testing - Benchto
What is Tempto?
● End-to-end product testing framework
● Targeted at software engineers
● For automation
● Tests easy to define
● Focus on test code
● Focus on database systems
● So far used for testing
  ○ Presto
  ○ internal projects
How is a test defined?
● Java
● SQL convention based
Example – Java based test

public class SimpleQueryTest extends ProductTest {

    private static class SimpleTestRequirements implements RequirementsProvider {
        public Requirement getRequirements(Configuration config) {
            return new ImmutableHiveTableRequirement(NATION);
        }
    }

    @Inject
    Configuration configuration;

    @Test(groups = {"smoke", "query"})
    @Requires(SimpleTestRequirements.class)
    public void selectCountFromNation() {
        assertThat(query("select count(*) from nation"))
                .hasRowsCount(1)
                .hasRows(row(25));
    }
}
Example – Convention based test

allRows.sql:
-- database: hive; tables: blah
SELECT * FROM sample_table

allRows.result:
-- delimiter: |; ignoreOrder: false; types: BIGINT,VARCHAR
1|A|
2|B|
3|C|
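The `-- key: value; key: value` header line in the .sql and .result files above is simple to parse. The sketch below illustrates the idea in plain Java; the class and method names are made up, and this is not Tempto's actual parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative parser for the "-- key: value; key: value" header line used
// by convention-based test files. Names are hypothetical, not Tempto's API.
public class ConventionHeader {
    static Map<String, String> parse(String headerLine) {
        String body = headerLine.substring(2).trim(); // drop the leading "--"
        Map<String, String> props = new LinkedHashMap<>();
        for (String pair : body.split(";")) {
            String[] kv = pair.split(":", 2); // limit 2: a value may contain ':'
            props.put(kv[0].trim(), kv[1].trim());
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(parse("-- delimiter: |; ignoreOrder: false; types: BIGINT,VARCHAR"));
    }
}
```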
Tempto architecture

[Diagram: framework components - user provided: tests, requirements; library provided: TestNG, TestNG listeners, utils, requirement fulfillers]
Tempto architecture – TestNG
● Works well
● Extensible
● Well known
Tempto architecture – TestNG listeners
● Tempto-specific extension of the TestNG execution framework
● Requirements management
● Test filtering
● Injecting dependencies
● Extended logging
Tempto architecture – tests
● Test code :)
  ○ Java
  ○ SQL-convention based
Tempto architecture – requirements
● Declarative requirements
● Fulfilled by the test framework via pluggable fulfillers
  ○ e.g. mutableTable(Tpch.NATION, LOADED, "hive")
● Test level and suite level
● Cleanup
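The requirement/fulfiller split can be sketched in plain Java: a test declares *what* it needs, and a pluggable fulfiller knows *how* to provide it and clean it up afterwards. All names below are hypothetical stand-ins, not the real Tempto API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of declarative requirements and pluggable fulfillers.
interface Requirement {}

record MutableTableRequirement(String table, String database) implements Requirement {}

class TableFulfiller {
    final List<String> actions = new ArrayList<>();

    void fulfill(Requirement requirement) {
        if (requirement instanceof MutableTableRequirement r) {
            actions.add("LOAD " + r.table() + " into " + r.database());
        }
    }

    void cleanup(Requirement requirement) {
        if (requirement instanceof MutableTableRequirement r) {
            actions.add("DROP " + r.table() + " from " + r.database());
        }
    }
}

public class RequirementSketch {
    public static void main(String[] args) {
        TableFulfiller fulfiller = new TableFulfiller();
        Requirement nation = new MutableTableRequirement("nation", "hive");
        fulfiller.fulfill(nation);  // runs before the test (test or suite level)
        // ... the test body would run here ...
        fulfiller.cleanup(nation);  // framework-driven cleanup afterwards
        System.out.println(fulfiller.actions);
    }
}
```

A suite-level requirement would be fulfilled once and shared across tests; a test-level one is fulfilled and cleaned up around each test.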
Tempto architecture – utils
● Extra assertions
● Various tools
  ○ HDFS client
  ○ SSH client
  ○ JDBC query executor
Executable runner

$ java -jar target/presto-product-tests-0.120-SNAPSHOT-executable.jar --help
usage: Presto product tests
    --config-local <arg>      URI to Test local configuration YAML file.
    --report-dir <arg>        Test reports directory
    --groups <arg>            Test groups to be run
    --excluded-groups <arg>   Test groups to be excluded
    --tests <arg>             Test patterns to be included
 -h,--help                    Shows help message
● All dependencies embedded
● User provides cluster details through a YAML config
Configuration

hdfs:
  username: hdfs
  webhdfs:
    host: master
    port: 50070

tests:
  hdfs:
    path: /product-test

databases:
  default:
    alias: presto
  hive:
    jdbc_driver_class: org.apache.hive.jdbc.HiveDriver
    jdbc_url: jdbc:hive2://master:10000
    jdbc_user: hdfs
    jdbc_password: na
    jdbc_pooling: false
    jdbc_jar: test-framework-hive-jdbc-all.jar
  presto:
    jdbc_driver_class: com.facebook.presto.jdbc.PrestoDriver
    jdbc_url: jdbc:presto://localhost:8080/hive/default
    jdbc_user: hdfs
    jdbc_password: na
    jdbc_pooling: false
Goals
● Easy and manageable way to define benchmarks
● Run and analyze macro benchmarks in a clustered environment
● Repeatable benchmarking of Hadoop SQL engines, most importantly Presto
  ○ also used for Hive, Teradata components
● Transparent, trusted framework for benchmarking
Benchmarks - model

[Diagram: one BenchmarkRun has n QueryExecutions and n AggregatedMeasurements; each QueryExecution has n Measurements]
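The cardinalities in the model diagram can be written down as plain Java records. The class names and 1-to-n relationships come from the slide; the individual fields are assumptions added for illustration.

```java
import java.util.List;

// Sketch of the benchmark result model; fields are illustrative assumptions.
record Measurement(String name, double value, String unit) {}

// one query execution collects n measurements
record QueryExecution(int sequenceId, List<Measurement> measurements) {}

// one benchmark run groups n query executions and n aggregated measurements
// (e.g. mean duration across the executions)
record BenchmarkRun(
        String benchmarkName,
        List<QueryExecution> executions,
        List<Measurement> aggregatedMeasurements) {}

public class ModelSketch {
    public static void main(String[] args) {
        QueryExecution execution =
                new QueryExecution(0, List.of(new Measurement("duration", 1234.0, "ms")));
        BenchmarkRun run = new BenchmarkRun(
                "linear-scan",
                List.of(execution),
                List.of(new Measurement("duration_mean", 1234.0, "ms")));
        System.out.println(run.executions().size() + " execution(s)");
    }
}
```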
Benchmarks - execution

● before-benchmark-macros
● prewarm
● benchmark
  ○ execution-0
  ○ execution-1
  ○ …
  ○ execution-n
● after-benchmark-macros
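The execution life cycle above can be sketched as a simple driver loop; the class and method names are placeholders, not Benchto's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the benchmark execution life cycle: before-benchmark macros,
// prewarm runs, the measured executions, then after-benchmark macros.
public class BenchmarkLifecycle {
    final List<String> phases = new ArrayList<>();

    void run(int prewarmRuns, int runs) {
        phases.add("before-benchmark-macros");   // e.g. drop-caches
        for (int i = 0; i < prewarmRuns; i++) {
            phases.add("prewarm-" + i);          // warm-up, results discarded
        }
        for (int i = 0; i < runs; i++) {
            phases.add("execution-" + i);        // measured executions
        }
        phases.add("after-benchmark-macros");
    }

    public static void main(String[] args) {
        BenchmarkLifecycle lifecycle = new BenchmarkLifecycle();
        lifecycle.run(3, 2); // 3 prewarm runs, 2 measured executions
        lifecycle.phases.forEach(System.out::println);
    }
}
```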
Defining benchmarks - structure
● Convention-based definition of benchmarks through descriptors (YAML format) and query SQL files

$ tree .
.
├── application-presto-devenv.yaml
├── application-td-hdp.yaml
├── benchmarks
│   ├── presto
│   │   ├── concurrency-insert-multi-table.yaml
│   │   ├── concurrency.yaml
│   │   ├── linear-scan.yaml
│   │   ├── tpch.yaml
│   │   └── types.yaml
│   └── querygrid-presto-ansi
│       └── concurrency.yaml
└── sql
    ├── presto
    │   ├── dev-zero
    │   │   ├── create-alltypes.sql
    │   │   └── create-lineitem.sql
    │   ├── linear-scan
    │   │   ├── selectivity-0.sql
    │   │   ├── selectivity-100.sql
...
Defining benchmarks - descriptor
● The descriptor is a YAML configuration file with various properties and user-defined variables

$ cat benchmarks/presto/concurrency.yaml
datasource: presto
query-names: presto/linear-scan/selectivity-${selectivity}.sql
schema: tpch_100gb_orc
database: hive
concurrency: ${concurrency_level}
runs: ${concurrency_level}
prewarm-runs: 3
before-benchmark: drop-caches
variables:
  1:
    selectivity: 10, 100
    concurrency_level: 10
  2:
    selectivity: 10, 100
    concurrency_level: 20
  3:
    selectivity: 10, 100
    concurrency_level: 50
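Assuming each comma-separated value list inside a numbered variables group is expanded to the cross-product of all lists in that group (so group 1 above would produce two runs, selectivity 10 and 100, both at concurrency 10), the expansion can be sketched as follows. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of variables-group expansion: cross-product of value lists.
public class VariableExpansion {
    static List<Map<String, String>> expand(Map<String, List<String>> group) {
        List<Map<String, String>> combos = new ArrayList<>();
        combos.add(new HashMap<>());
        for (Map.Entry<String, List<String>> entry : group.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> combo : combos) {
                for (String value : entry.getValue()) {
                    Map<String, String> extended = new HashMap<>(combo);
                    extended.put(entry.getKey(), value);
                    next.add(extended);
                }
            }
            combos = next;
        }
        return combos;
    }

    public static void main(String[] args) {
        // group 1 from the descriptor: selectivity 10,100 at concurrency 10
        Map<String, List<String>> group = new HashMap<>();
        group.put("selectivity", List.of("10", "100"));
        group.put("concurrency_level", List.of("10"));
        System.out.println(expand(group)); // two combinations
    }
}
```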
Defining benchmarks – SQL file templating
● SQL files can use keys defined in the YAML configuration file – templates are based on FreeMarker

$ cat sql/presto/tpch/q14.sql
SELECT
  100.00 * sum(CASE
               WHEN p.type LIKE 'PROMO%'
                 THEN l.extendedprice * (1 - l.discount)
               ELSE 0
               END) / sum(l.extendedprice * (1 - l.discount)) AS promo_revenue
FROM
  "${database}"."${schema}"."lineitem" AS l,
  "${database}"."${schema}"."part" AS p
WHERE
  l.partkey = p.partkey
  AND l.shipdate >= DATE '1995-09-01'
  AND l.shipdate < DATE '1995-09-01' + INTERVAL '1' MONTH
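FreeMarker itself is a third-party library, but the ${key} substitution it performs on queries like the one above can be illustrated with a minimal plain-Java stand-in. Real FreeMarker is far richer (conditionals, loops, built-ins); this sketch only handles simple keys.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal stand-in for FreeMarker's ${key} interpolation (illustration only).
public class SqlTemplate {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)}");

    static String render(String template, Map<String, String> vars) {
        Matcher matcher = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            // quoteReplacement guards against '$' or '\' in substituted values
            matcher.appendReplacement(out, Matcher.quoteReplacement(vars.get(matcher.group(1))));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM \"${database}\".\"${schema}\".\"lineitem\"";
        System.out.println(render(sql, Map.of("database", "hive", "schema", "tpch_100gb_orc")));
    }
}
```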
Future work
● (Tempto) Support for complex concurrent test execution
● (Benchto) Automatic regression detection
● (Benchto) Customized dashboards (e.g. overall performance analysis)
● (Benchto) Hardware and configuration awareness
● (Benchto) More complex benchmarking scenarios
● (Benchto) Support for complex concurrency scenarios
● (Benchto) Scheduling mechanism
Questions?
Benchto GUI
● Visualization of benchmark results
● Linking between tools (Grafana, Presto UI)
● Comparison of multiple benchmarks
Grafana monitoring
● We use a Grafana dashboard with Graphite
● Benchmark/execution life-cycle events are shown on dashboards
● Provides good visibility into the state of the cluster