
Post on 25-Feb-2016


High-Performance Calculations

Simple tricks to make some Tableau calculations execute hundreds of times faster

PRESENTED BY
Richard Wesley, Senior Software Engineer


©2012 Tableau Software Inc. All rights reserved.


Overview

• Why do we need fast calculations?
• Real-life examples:
  • Computing dates from Unix time stamps
  • Computing dates from “yyyymmdd” columns
  • Displaying numbers as text in a viz
  • Computing a nested set (combined field)
• Coming attractions
  • How version 8 speeds up some common calculations


Overview: The Need for Speed

Your organisation can respond faster
You can stay in a “flow” state longer


Overview: One…Billion…Rows!

All calculations run against 1 billion rows
This amplifies the differences to human scale
• Timings range from 6 seconds to 5 hours


Computers Compute!
A case study in calculation performance


Unix Times: The Problem

Customer Task:
• Convert a column of Unix timestamps to dates
Timestamps are 64-bit integers
• Contain the number of milliseconds since 1970-01-01
Need to convert timestamps to dates for analysis
• Human-style units like years, months, days (“binning”)


Unix Times: Original Version

Meaning:
• Convert the number to a string
• Take the left 10 characters
• Change that to an integer (seconds, for a 13-digit millisecond timestamp)
• Divide by 86400 to get days
• Add to the “zero date”
Computing one billion values takes 3 hours and 45 minutes!
• Version 8 still takes about 30 minutes.

DATE("1/1/1970") + INT( INT( LEFT( STR( [unix] ), 10 ) ) / 86400 )


Unix Times: Numeric Version

Meaning:
• Convert the number to seconds by dividing
• Add those seconds to the zero date
• Remove the time part
Computing one billion values takes 45 seconds
• That is 300x faster than the string calculation in version 7 (3 hours 45 minutes)!
• Still 40x faster than the string calculation in version 8 (30 minutes).

DATE( DATEADD( 'second', INT( [unix] / 1000 ), #1970-01-01# ) )
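
The two routes can be compared outside Tableau. The Python sketch below is only an analogue of the two calculations, not Tableau code: the function names are hypothetical, and it assumes the string route is meant to read the left 10 characters as whole seconds and divide by the 86,400 seconds in a day.

```python
from datetime import date, datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def date_via_string(unix_ms: int) -> date:
    # String route: format the number, keep the left 10 characters,
    # parse them back as whole seconds, then convert seconds to days.
    seconds = int(str(unix_ms)[:10])
    return (EPOCH + timedelta(days=seconds // 86400)).date()

def date_via_arithmetic(unix_ms: int) -> date:
    # Numeric route: integer-divide milliseconds down to seconds,
    # add them to the zero date, then drop the time part.
    seconds = unix_ms // 1000
    return (EPOCH + timedelta(seconds=seconds)).date()

sample = 1330603200345  # a 13-digit millisecond timestamp
print(date_via_string(sample), date_via_arithmetic(sample))
```

The sketch also makes one dependency visible: the string route is only right when every timestamp has exactly 13 digits, whereas the arithmetic route does not care how many digits there are.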


Unix Times: Strings are Slow

Need to look at each character
Need to figure out how many characters there are
Need to find space for the answer
Need to copy each character
…and so on
Often takes 10-100 instructions per value


Unix Times: Numbers are Fast

Computers are good at arithmetic
• They “compute”!
Many arithmetic operations take only one instruction
• A 2.66 GHz processor executes roughly 2.66 billion instructions / second
The more of a calculation you express as arithmetic, the faster it will run
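
As a rough back-of-envelope check, the Python sketch below estimates the raw instruction time for one billion values at 1, 10 and 100 instructions per value. It assumes one instruction per clock cycle on a single core and ignores memory traffic, so treat the results as lower bounds, not predictions.

```python
CLOCK_HZ = 2.66e9      # 2.66 GHz, assumed to retire one instruction per cycle
ROWS = 1_000_000_000   # one billion values

def seconds_for(instructions_per_value):
    # Total instructions divided by instructions per second.
    return ROWS * instructions_per_value / CLOCK_HZ

for n in (1, 10, 100):
    print(f"{n:>3} instructions/value: {seconds_for(n):6.1f} s")
```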


Unix Times: When Does it Help?

These numbers are for the Tableau Data Engine
Should also work for most analytic databases:
• Vertica, ParAccel, VectorWise, etc.
May not help on slow databases
• MySQL, Text Files, Excel
• If you extract, it should help


Creating Dates
Use date arithmetic instead of string parsing


Creating Dates: The Problem

Customer Task:
• Convert a column of numbers to dates
The numbers are in the form yyyymmdd
Need to convert them to dates for analysis
• Time series
• Binning (weeks, quarters)


Creating Dates: U.S.A. Strings

Answer taken from an old in-house training manual
Meaning:
• Build a string in “mm/dd/yyyy” format
• Cast it to a date
Problems:
• One billion values takes 5 hours
• Only works in the U.S.

DATE( MID( STR( [yyyymmdd] ), 5, 2 ) + "/" + RIGHT( STR( [yyyymmdd] ), 2 ) + "/" + LEFT( STR( [yyyymmdd] ), 4 ) )


Creating Dates: ISO Strings

Meaning:
• Build a string in “yyyy-mm-dd” format
• Cast it to a date
Works in any country
• Data engine tries this format first
One billion values still takes 5 hours

DATE( LEFT( STR( [yyyymmdd] ), 4 ) + "-" + MID( STR( [yyyymmdd] ), 5, 2 ) + "-" + RIGHT( STR( [yyyymmdd] ), 2 ) )


Creating Dates: Date Arithmetic

Meaning:
• Get date parts using division (/) and remainder (%)
• Use date arithmetic to add the parts to a date constant
• Division gives real numbers, INT fixes this.
One billion values takes 64 seconds
• 280x faster
• This is the difference between stretching your legs and coming back in the morning!

DATEADD( 'day', [yyyymmdd] % 100 - 1,
  DATEADD( 'month', INT( ( [yyyymmdd] % 10000 ) / 100 ) - 1,
    DATEADD( 'year', INT( [yyyymmdd] / 10000 ) - 1900, #1900-01-01# ) ) )
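
The division-and-remainder step can be tried in isolation. This Python sketch is an analogue, not Tableau code: the function names are hypothetical, and it builds the date directly from the three parts instead of nesting DATEADD calls on a date constant.

```python
from datetime import date, datetime

def split_yyyymmdd(n):
    # Integer division and remainder peel off the date parts.
    year = n // 10000             # INT( [yyyymmdd] / 10000 )
    month = (n % 10000) // 100    # INT( ( [yyyymmdd] % 10000 ) / 100 )
    day = n % 100                 # [yyyymmdd] % 100
    return year, month, day

def yyyymmdd_to_date(n):
    # Arithmetic route: no text is created or parsed.
    return date(*split_yyyymmdd(n))

def yyyymmdd_to_date_via_string(n):
    # String route for comparison: format the number, then parse the text.
    return datetime.strptime(str(n), "%Y%m%d").date()

print(split_yyyymmdd(20120229), yyyymmdd_to_date(20120229))
```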


Creating Dates: Strings are Slow

The original calculation has four concatenations
• Each one needs different amounts of memory
• Each one needs to copy the characters
• …and so on
Changing a number to a string has similar problems
Reading dates from text is tricky
• What country are we in?
• Is 5/3/1983 the 3rd of May or the 5th of March?
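
The ambiguity is easy to reproduce. In this small Python illustration (not Tableau code), the same text parses to two different dates depending on which convention the parser is told to assume:

```python
from datetime import datetime

text = "5/3/1983"
us_reading = datetime.strptime(text, "%m/%d/%Y").date()   # month first
eu_reading = datetime.strptime(text, "%d/%m/%Y").date()   # day first
print(us_reading, eu_reading)
```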


Creating Dates: Numbers are Fast

Numbers all have the same size
Copying numbers is fast
Date arithmetic is still arithmetic
• Not as simple as addition BUT
• Still only a few instructions
• Computers are good at arithmetic!


Creating Dates: Useful Numeric Functions

LOG
• How many digits are there?
ABS / SIGN
• Remove / extract the sign of the number
MIN / MAX
• Smallest / largest of two values
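
Each of these functions can stand in for a string operation. The Python sketch below shows the idea with hypothetical helper names; it illustrates the arithmetic only and says nothing about how Tableau implements LOG, SIGN, MIN or MAX.

```python
import math

def digit_count(n):
    # floor(log10(|n|)) + 1 counts the decimal digits of a non-zero integer,
    # replacing "convert to a string and take its length".
    # Floating-point log10 can misjudge values just below large powers of ten.
    return math.floor(math.log10(abs(n))) + 1

def sign(n):
    # -1, 0 or 1, like a SIGN function.
    return (n > 0) - (n < 0)

def clamp(n, low, high):
    # MIN / MAX pick the smaller / larger of two values; nesting them clamps.
    return max(low, min(n, high))

print(digit_count(20120229), sign(-5), clamp(95, 0, 90))
```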


Presenting Numbers
Move display formatting out of your calculations


Presenting Numbers: The Problem

Customer Task:
• Data only has the day of the quarter
• User wants to group data by the week of a quarter
• Weeks should be labeled nicely
Need a calculation to convert the day to the week
Need to format it for display


Presenting Numbers: If Then Else

Meaning:
• Check all 14 possible ranges one after another
• Label out of bounds values as “Other”
Lots of typing means lots of mistakes
A billion rows takes several minutes
• ~7 minutes in 7.0
• ~4 minutes in 8.0

IF [Day Of Quarter] < 7 THEN "Week #1"
ELSEIF [Day Of Quarter] < 14 THEN "Week #2"
…
ELSEIF [Day Of Quarter] < 91 THEN "Week #13"
ELSE "Other" END


Presenting Numbers: Aliases

Solution 1: Use aliases
• Rewrite calculation to return numbers
• Create aliases for the values: “Week #1” etc.
Only takes 36s on a billion rows
• 12x faster than the IF/THEN/ELSE version in 7.0
• 6x faster than the IF/THEN/ELSE version in 8.0
Problems:
• Typing aliases is still error prone
• Dialogue is slow because Tableau must find all the values

INT( [Day Of Quarter] / 7 ) + 1
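
A quick way to convince yourself that the arithmetic matches the chain of comparisons is to check every day of a 91-day quarter. The Python sketch below does that; the function names are hypothetical and the code is an analogue of the two Tableau calculations, assuming the day of quarter starts at 0.

```python
def week_label_if_chain(day_of_quarter):
    # Analogue of the IF / ELSEIF chain: test each 7-day range in turn.
    for week in range(1, 14):
        if day_of_quarter < 7 * week:
            return f"Week #{week}"
    return "Other"

def week_number(day_of_quarter):
    # Analogue of INT( [Day Of Quarter] / 7 ) + 1: pure arithmetic, returns a number.
    return day_of_quarter // 7 + 1

for d in (0, 6, 7, 90):
    print(d, week_label_if_chain(d), week_number(d))
```

Unlike the chain, the arithmetic version does not map out-of-range days to “Other”; a day of 91 simply yields 14, so out-of-range handling would need to be dealt with separately if it matters.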


Presenting Numbers: Formatting

Solution 2: Use column formatting
• Rewrite the calculation to return numbers
• Apply number formatting to the column: “Week #”0
Still only takes 36s on a billion rows
Viz updates live as you edit!
• Much easier to correct mistakes
• Formatting editor doesn’t need to run queries

INT( [Day Of Quarter] / 7 ) + 1


Presenting Numbers: Strings are Slow

Databases can format output for historical reasons
• Remember teletypes? Line printers?
Database formatting has to be done on every row
Grouping by string calculations can be much slower than grouping by numbers
• Need to compare entire strings instead of string identifiers


Presenting Numbers: Presentation is Fast

Grouping by numbers only compares numbers
• 10x-100x faster than strings
Grouping reduces the number of rows returned
Aliases and formatting are applied after the query
• Changing the formatting in Tableau does not run queries
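
The split between "query" and "presentation" can be sketched in a few lines of Python. This is an illustrative model with hypothetical function names, not a description of Tableau's internals: grouping works on integers, and labels are produced only for the handful of grouped rows.

```python
from collections import Counter

def rows_per_week(days_of_quarter):
    # "Query" step: group on the integer week number; no strings are built per row.
    return Counter(day // 7 + 1 for day in days_of_quarter)

def present(counts, template="Week #{}"):
    # "Presentation" step: formatting touches only the grouped result.
    return {template.format(week): n for week, n in sorted(counts.items())}

counts = rows_per_week([0, 1, 6, 7, 13, 90])
print(present(counts))
print(present(counts, template="W{}"))   # new labels, no regrouping
```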


Combined Fields
Use Sets instead of concatenated strings


Combined Fields: The Problem

Customer Task:
• User wants to create a multi-column set from two or more string columns
• The user may want to change the column separator


Combined Fields: Concatenation

Taken from in-house training as an alternative to nested fields (called “combined fields” in version 8)
Meaning:
• Concatenate two strings together
• Use a separator string that can be changed
Problems:
• One billion rows takes almost 9 minutes
• Changing the separator requires re-running the query

[Month] + ", " + [Weekday]


Combined Fields: Set

Use the “Combine Fields…” menu item to create a field that shows both fields with a user-specified separator
• “Create Set…” in v7
Changing the separator does not run a new query
Performance is extremely fast
• 6 seconds on one billion rows
• 90x faster than concatenation


Combined Fields: Strings are Slow

String concatenation is very hard to make fast
• Must build all the combinations from every row
Grouping by calculated strings is slow
• Calculations don’t have string identifiers


Combined Fields: Numbers are Fast

Unmodified string columns are really numbers
• One number per unique string
• Grouping by them is like grouping by numbers
Tableau formats combined fields after the query
• Changing the formatting doesn’t run another query
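
The "one number per unique string" idea can be modelled with a simple dictionary encoding. The Python sketch below is an assumption-laden illustration: the slides do not describe the Data Engine's actual storage format, and `encode` and `labels` are hypothetical names.

```python
from collections import Counter

def encode(column):
    # Assign one small integer per unique string (a dictionary encoding).
    tokens, ids = {}, []
    for value in column:
        ids.append(tokens.setdefault(value, len(tokens)))
    names = list(tokens)          # names[i] is the string for id i
    return ids, names

months = ["Jan", "Jan", "Feb", "Jan"]
weekdays = ["Mon", "Tue", "Mon", "Mon"]
month_ids, month_names = encode(months)
weekday_ids, weekday_names = encode(weekdays)

# Group on pairs of integers; no strings are concatenated per row.
groups = Counter(zip(month_ids, weekday_ids))

def labels(separator):
    # The separator is applied to the grouped result only.
    return {month_names[m] + separator + weekday_names[w]: n
            for (m, w), n in groups.items()}

print(labels(", "))
print(labels(" / "))
```

Changing the separator calls only `labels` again; the grouping in `groups` is reused untouched, which mirrors why no new query is needed.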


Coming Attractions
Some things we have made faster in the version 8 data engine


Coming Attractions: If/Then/Else

Version 7 evaluated both sides of IFs
• Computed the ELSE side even when the condition was true
• Computed the THEN side even when the condition was false
Especially bad when Tableau nested many of them
Fixed!
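
The cost of evaluating both sides can be seen by counting calls. This Python sketch is a toy model with hypothetical names, not the Data Engine's implementation: `eager_if` computes both branches for every row, while `lazy_if` computes only the branch each row needs.

```python
calls = {"then": 0, "else": 0}

def then_side(x):
    calls["then"] += 1
    return x * 2

def else_side(x):
    calls["else"] += 1
    return -x

def eager_if(values):
    # Compute both sides for every row, then pick one per row.
    thens = [then_side(v) for v in values]
    elses = [else_side(v) for v in values]
    return [t if v > 0 else e for v, t, e in zip(values, thens, elses)]

def lazy_if(values):
    # Compute only the side each row needs.
    return [then_side(v) if v > 0 else else_side(v) for v in values]
```

With nested IFs the eager cost compounds, because every inner branch is itself evaluated for every row.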


Coming Attractions: Case

Version 7 did not have a CASE statement
• Made us build huge if/then/else statements
• If the nesting was deep enough, we would crash
Fixed!
• Also computes only the outputs it needs
• THEN “string” much faster too


Coming Attractions: String Functions

Version 7 strings were computed one at a time
In Version 8, many functions have been “chunked”
• Compute 1000 values at a time
Converting to/from strings is much faster in Version 8
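
Chunking means a function is handed a block of values instead of being called once per value. The Python sketch below shows the general shape of the idea with hypothetical names; it is not how the Data Engine is written, and the chunk size of 1000 is simply taken from the bullet above.

```python
def chunks(values, size=1000):
    # Yield consecutive slices of at most `size` values.
    for start in range(0, len(values), size):
        yield values[start:start + size]

def to_strings_chunked(values, size=1000):
    # Each pass of the loop handles a whole chunk, so per-call overhead
    # is paid once per chunk instead of once per value.
    out = []
    for chunk in chunks(values, size):
        out.extend(map(str, chunk))
    return out

print(len(to_strings_chunked(list(range(2500)))))
```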


Coming Attractions: Parallel Execution

Version 7 computed values on only one processor
Version 8 tries to spread calculations across processors
• If you have 4 cores, calculations can be up to 4x faster


Coming Attractions: Combined Fields

Version 7 could not edit the column order or the separator
Version 8 lets you edit the column order and the separator


Questions?