High-Performance Calculations
Simple tricks to make some Tableau calculations execute hundreds of times faster
PRESENTED BY
Richard Wesley, Senior Software Engineer
©2012 Tableau Software Inc. All rights reserved.
Overview
• Why do we need fast calculations?
• Real-life examples:
  • Computing dates from Unix time stamps
  • Computing dates from “yyyymmdd” columns
  • Displaying numbers as text in a viz
  • Computing a nested set (combined field)
• Coming attractions
  • How version 8 speeds up some common calculations
Overview: The Need for Speed
Your organisation can respond faster
You can stay in a “flow” state longer
Overview: One…Billion…Rows!
All calculations run against 1 billion rows
Amplifies differences to human scale
• Range is from 6 seconds to 5 hours
Computers Compute!
A case study in calculation performance
Unix Times: The Problem
Customer Task:
• Convert a column of Unix timestamps to dates
Timestamps are 64-bit integers
• Contain the number of milliseconds since 1970-01-01
Need to convert timestamps to dates for analysis
• Human-style units like years, months, days (“binning”)
Unix Times: Original Version
Meaning:
• Convert the number to a string
• Take the left 10 characters
• Change that to an integer (10s of seconds)
• Divide by 8640 to get days
• Add to the “zero date”
Computing one billion values takes 3 hours and 45 minutes!
• Version 8 still takes about 30 minutes.
DATE("1/1/1970") + INT( INT( LEFT( STR( [unix]),10 ) ) / 8640 )
Unix Times: Numeric Version
Meaning:
• Convert the number to seconds by dividing
• Add those seconds to the zero date
• Remove the time part
Computing one billion values takes 45 seconds
• That is 300x faster than version 7!
• Still 40x faster than version 8.
DATE( DATEADD( 'second', INT([unix] / 1000), #1970-01-01# ) )
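The same numeric idea can be sketched outside Tableau. This is a minimal Python sketch that assumes, as above, integer milliseconds since 1970-01-01 (UTC). It uses one floor division and one date addition, and it never builds a string. The helper name `unix_ms_to_date` is hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)              # the Unix "zero date"
MS_PER_DAY = 24 * 60 * 60 * 1000      # 86,400,000 ms per day (Unix time ignores leap seconds)


def unix_ms_to_date(unix_ms: int) -> date:
    """Hypothetical helper: Unix timestamp in milliseconds -> calendar date (UTC).

    Pure integer arithmetic: whole days since the epoch, then one date addition.
    Floor division (//) also gives the right day for timestamps before 1970.
    """
    return EPOCH + timedelta(days=unix_ms // MS_PER_DAY)
```

For example, `unix_ms_to_date(1_000_000_000_000)` returns `date(2001, 9, 9)`.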
Unix Times: Strings are Slow
Need to look at each character
Need to figure out how many characters there are
Need to find space for the answer
Need to copy each character
…and so on
Often takes 10-100 instructions per value
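To make the per-value cost concrete, here is a hypothetical hand-written integer-to-string routine in Python. It is not how Tableau implements `STR`. It only walks through the steps listed above: one division per digit, one character per digit, growing space for the answer, and a final copy. A 13-digit millisecond timestamp goes around the loop 13 times, while the numeric version needs a single division.

```python
def int_to_str(n: int) -> str:
    """Hypothetical digit-by-digit integer-to-string conversion (illustration only)."""
    if n == 0:
        return "0"
    negative = n < 0
    n = abs(n)
    chars = []                        # space for the answer, grown one character at a time
    while n > 0:
        n, digit = divmod(n, 10)      # peel off the lowest digit
        chars.append(chr(ord("0") + digit))
    if negative:
        chars.append("-")
    # Digits come out lowest-first, so reverse them and copy into the final string.
    return "".join(reversed(chars))
```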
Unix Times: Numbers are Fast
Computers are good at arithmetic
• They “compute”!
Many arithmetic operations take only one instruction
• 2.66 GHz processor = 2.66 billion instructions / second
The more you replace string work with arithmetic, the faster your calculations will be
Unix Times: When Does it Help?
These numbers are for the Tableau Data Engine
Should also work for most analytic databases:
• Vertica, ParAccel, VectorWise, etc.
May not help on slow databases:
• MySQL, text files, Excel
• If you extract, it should help
Creating Dates
Use date arithmetic instead of string parsing
Creating Dates: The Problem
Customer Task:
• Convert a column of numbers to dates
The numbers are in the form yyyymmdd
Need to convert them to dates for analysis
• Time series
• Binning (weeks, quarters)
Creating Dates: U.S.A. Strings
Answer taken from an old in-house training manual
Meaning:
• Build a string in “mm/dd/yyyy” format
• Cast it to a date
Problems:
• One billion values takes 5 hours
• Only works in the U.S.
DATE( MID( STR( [yyyymmdd] ), 5, 2 ) + "/" + RIGHT( STR( [yyyymmdd] ), 2 ) + "/" + LEFT( STR( [yyyymmdd] ), 4 ) )
Creating Dates: ISO Strings
Meaning:
• Build a string in “yyyy-mm-dd” format
• Cast it to a date
Works in any country
• Data engine tries this format first
One billion values still takes 5 hours
DATE( LEFT( STR( [yyyymmdd] ), 4 ) + "-" + MID( STR( [yyyymmdd] ), 5, 2 ) + "-" + RIGHT( STR( [yyyymmdd] ), 2 ) )
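For comparison, the ISO-string approach can be sketched in Python. The sketch assumes a positive eight-digit integer such as 20120315, and `yyyymmdd_to_date_via_iso` is a hypothetical name. It still has to turn the number into text, slice it, concatenate the pieces with separators, and parse the text back. That is the kind of string work that makes this version slow.

```python
from datetime import date


def yyyymmdd_to_date_via_iso(n: int) -> date:
    """Hypothetical helper: build a "yyyy-mm-dd" string, then parse it back."""
    s = str(n)                                  # 20120315 -> "20120315"
    iso = s[:4] + "-" + s[4:6] + "-" + s[6:8]   # -> "2012-03-15" (Python slices are 0-based)
    return date.fromisoformat(iso)              # parse the ISO 8601 text into a date
```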
Creating Dates: Date Arithmetic
Meaning:
• Get date parts using division (/) and remainder (%)
• Use date arithmetic to add the parts to a date constant
• Division gives real numbers; INT fixes this
One billion values takes 64 seconds
• 280x faster
• This is the difference between stretching your legs and coming back in the morning!
DATEADD( 'day', [yyyymmdd] % 100 - 1, DATEADD( 'month', INT( ( [yyyymmdd] % 10000 ) / 100 ) - 1, DATEADD( 'year', INT( [yyyymmdd] / 10000 ) - 1900, #1900-01-01# ) ) )
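The same division-and-remainder logic can be sketched in Python, assuming a positive eight-digit yyyymmdd integer. `yyyymmdd_to_date` is a hypothetical name. Python's `//` is integer division, so it does the job of `INT( … / … )`, and `%` is the remainder. Python's `date` constructor accepts the three parts directly, so the nested date additions are not needed here.

```python
from datetime import date


def yyyymmdd_to_date(n: int) -> date:
    """Hypothetical helper: split a yyyymmdd integer into parts with // and %."""
    year = n // 10000             # 20120315 -> 2012
    month = (n % 10000) // 100    # 20120315 -> 315 -> 3
    day = n % 100                 # 20120315 -> 15
    return date(year, month, day)
```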
Creating Dates: Strings are Slow
The original calculation has four concatenations
• Each one needs different amounts of memory
• Each one needs to copy the characters
• …and so on
Changing a number to a string has similar problems
Reading dates from text is tricky
• What country are we in?
• 5/3/1983: May 3 or March 5?
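The country problem is easy to demonstrate with Python's `datetime.strptime`. The same text produces two different dates depending on which field order you assume.

```python
from datetime import date, datetime

TEXT = "5/3/1983"

# U.S. convention: month/day/year
us_reading: date = datetime.strptime(TEXT, "%m/%d/%Y").date()
# Most other countries: day/month/year
intl_reading: date = datetime.strptime(TEXT, "%d/%m/%Y").date()

print(us_reading, intl_reading)  # 1983-05-03 1983-03-05
```

A plain yyyymmdd integer split with division and remainder has no such ambiguity.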
Creating Dates: Numbers are Fast
Numbers all have the same size
Copying numbers is fast
Date arithmetic is still arithmetic
• Not as simple as addition, BUT
• Still only a few instructions
• Computers are good at arithmetic!
Creating Dates: Useful Numeric Functions
LOG
• How many digits are there?
ABS / SIGN
• Remove / extract the sign of the number
MIN / MAX
• Smallest / largest of two values
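For example, `LOG` (base 10) combined with `ABS` counts the digits of a number. This is one way to check that a column really holds eight-digit yyyymmdd values before converting it. Below is a minimal Python sketch; `digit_count` and `looks_like_yyyymmdd` are hypothetical names. The logarithm is exact enough for numbers of this size, but floating-point rounding can make it off by one for very large integers (roughly 15 digits and up).

```python
import math


def digit_count(n: int) -> int:
    """Hypothetical helper: decimal digits in n, ignoring its sign.

    Mirrors the LOG + ABS idea: floor(log10(|n|)) + 1.
    """
    if n == 0:
        return 1                          # log10(0) is undefined; zero has one digit
    return math.floor(math.log10(abs(n))) + 1


def looks_like_yyyymmdd(n: int) -> bool:
    """Hypothetical check: positive and exactly eight digits long."""
    return n > 0 and digit_count(n) == 8
```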
Presenting Numbers
Move display formatting out of your calculations
Presenting Numbers: The Problem
Customer Task:
• Data only has the day of the quarter
• User wants to group data by the week of a quarter
• Weeks should be labeled nicely
Need a calculation to convert the day to the week
Need to format it for display
Presenting Numbers: If Then Else
Meaning:
• Check all 14 possible ranges one after another
• Label out-of-bounds values as “Other”
Lots of typing means lots of mistakes
A billion rows takes several minutes
• ~7 minutes in 7.0
• ~4 minutes in 8.0
IF [Day Of Quarter] < 7 THEN "Week #1"
ELSEIF [Day Of Quarter] < 14 THEN "Week #2"
…
ELSEIF [Day Of Quarter] < 91 THEN "Week #13"
ELSE "Other"
END
Presenting Numbers: Aliases
Solution 1: Use aliases
• Rewrite the calculation to return numbers
• Create aliases for the values: “Week #1” etc.
Only takes 36s on a billion rows
• 12x faster than IF/ELSE in 7.0
• 6x faster than IF/ELSE in 8.0
Problems:
• Typing aliases is still error prone
• Dialogue is slow because Tableau must find all the values
INT( [Day Of Quarter] / 7 ) + 1
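The one-line arithmetic gives the same week number as the whole IF chain. Below is a Python sketch with hypothetical function names that checks this for every day from 0 to 90, the range the IF version covers. Outside that range the two differ: a day such as 91 becomes “Other” in the IF version but week 14 in the arithmetic one. For non-negative days, Python's `//` matches the truncation that `INT` applies.

```python
def week_by_if_chain(day: int) -> str:
    """Hypothetical stand-in for the IF/ELSEIF calculation: one comparison per range."""
    for week in range(1, 14):           # weeks 1..13 test day < 7, < 14, ..., < 91
        if day < 7 * week:
            return f"Week #{week}"
    return "Other"


def week_by_arithmetic(day: int) -> int:
    """INT( [Day Of Quarter] / 7 ) + 1, using integer division."""
    return day // 7 + 1
```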
Presenting Numbers: Formatting
Solution 2: Use column formatting
• Rewrite the calculation to return numbers
• Apply number formatting to the column: "Week #"0
Still only takes 36s on a billion rows
Viz updates live as you edit!
• Much easier to correct mistakes
• Formatting editor doesn’t need to run queries
INT( [Day Of Quarter] / 7 ) + 1
Presenting Numbers: Strings are Slow
Databases can format output for historical reasons
• Remember teletypes? Line printers?
Database formatting has to be done on every row
Grouping by string calculations can be much slower than grouping by numbers
• Need to compare entire strings instead of string identifiers
Presenting Numbers: Presentation is Fast
Grouping by numbers only compares numbers
• 10x-100x faster than strings
Grouping reduces the number of rows returned
Aliases and formatting are applied after the query
• Changing the formatting in Tableau does not run queries
Combined Fields
Use Sets instead of concatenated strings
Combined Fields: The Problem
Customer Task:
• User wants to create a multi-column set from two or more string columns
• The user may want to change the column separator
Combined Fields: Concatenation
Taken from in-house training as an alternative to nested fields (called “combined fields” in version 8)
Meaning:
• Concatenate two strings together
• Use a separator string that can be changed
Problems:
• One billion rows takes almost 9 minutes
• Changing the separator requires re-running the query
[Month] + ", " + [Weekday]
Combined Fields: Set
Using the “Combine Fields…” menu item, create a field that shows both fields with a user-specified separator
• “Create Set…” in v7
Changing the separator does not run a new query
Performance is extremely fast
• 6 seconds on one billion rows
• 90x faster
Combined Fields: Strings are Slow
String concatenation is very hard to make fast
• Must build all the combinations from every row
Grouping by calculated strings is slow
• Calculations don’t have string identifiers
Combined Fields: Numbers are Fast
Unmodified string columns are really numbers
• One number per unique string
• Grouping by them is like grouping by numbers
Tableau formats combined fields after the query
• Changing the formatting doesn’t run another query
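One way to picture “one number per unique string” is dictionary encoding. The Python sketch below illustrates that general technique under our own assumptions; it is not Tableau's actual implementation, and all names in it are hypothetical. Each distinct string gets an integer code, rows are grouped by pairs of codes, and the separator is applied only when the labels are displayed.

```python
from collections import Counter


def encode(column: list[str]) -> tuple[list[int], list[str]]:
    """Dictionary-encode a string column: one integer code per distinct string."""
    codes: dict[str, int] = {}
    encoded = [codes.setdefault(value, len(codes)) for value in column]
    dictionary = list(codes)            # dictionary[code] -> original string
    return encoded, dictionary


months = ["Jan", "Jan", "Feb", "Jan", "Feb"]
weekdays = ["Mon", "Tue", "Mon", "Mon", "Mon"]

month_codes, month_dict = encode(months)
weekday_codes, weekday_dict = encode(weekdays)

# Grouping compares pairs of small integers, never the strings themselves.
groups = Counter(zip(month_codes, weekday_codes))


def labels(separator: str = ", ") -> dict[str, int]:
    """Apply the separator after grouping; changing it does not regroup the rows."""
    return {
        month_dict[m] + separator + weekday_dict[w]: count
        for (m, w), count in groups.items()
    }
```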
Coming Attractions
Some things we have made faster in the version 8 data engine
Coming Attractions: If/Then/Else
Version 7 evaluated both sides of IFs
• Computed the ELSE side even when the condition was true
• Computed the THEN side even when it was false
Especially bad when Tableau nested many of them
Fixed!
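The cost of evaluating both branches can be sketched in Python with hypothetical helper names. `eager_if` computes both sides before choosing, which is how version 7 is described above. `lazy_if` computes only the side it returns. The call counter shows the wasted work, and that waste multiplies when IFs are nested.

```python
from typing import Callable, TypeVar

T = TypeVar("T")
calls = {"then": 0, "else": 0}   # how many times each branch was computed


def then_branch() -> str:
    calls["then"] += 1
    return "big"


def else_branch() -> str:
    calls["else"] += 1
    return "small"


def eager_if(cond: bool, then_fn: Callable[[], T], else_fn: Callable[[], T]) -> T:
    """Compute both sides, then pick one (the wasteful behaviour)."""
    then_val, else_val = then_fn(), else_fn()
    return then_val if cond else else_val


def lazy_if(cond: bool, then_fn: Callable[[], T], else_fn: Callable[[], T]) -> T:
    """Compute only the side that is actually returned."""
    return then_fn() if cond else else_fn()
```

Over ten rows where four satisfy the condition, `eager_if` runs each branch ten times, while `lazy_if` runs the THEN branch four times and the ELSE branch six times.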
Coming Attractions: Case
Version 7 did not have a CASE statement
• Made us build huge if/then/else statements
• If the nesting was deep enough, we would crash
Fixed!
• Also computes only the outputs it needs
• THEN "string" is much faster too
Coming Attractions: String Functions
Version 7 strings were computed one at a time
In version 8, many functions have been “chunked”
• Compute 1000 values at a time
Converting to/from strings is much faster in version 8
Coming Attractions: Parallel Execution
Version 7 computed values on only one processor
Version 8 tries to spread calculations across processors
• If you have 4 cores, calculations can be up to 4x faster
Coming Attractions: Combined Fields
Version 7 could not edit the column order or the separator
Version 8 lets you edit the column order and the separator
Questions?