Performance tuning dataset refresh in Power BI. Chris Webb, Power BI Customer Advisory Team, Microsoft

Post on 02-Aug-2022

Page 1: Performance tuning dataset refresh in Power BI

Performance tuning dataset refresh in Power BI

Chris Webb

Power BI Customer Advisory Team

Microsoft

Page 2: Performance tuning dataset refresh in Power BI

Agenda

• Gathering requirements for data refresh in Power BI

• Choosing a storage mode

• Import refresh tuning methodology

• Measuring refresh performance

• Data modelling

• Tuning your data source

• Tuning the Power Query engine

• Tuning the Analysis Services engine

• Refresh in the Power BI Service

Page 3: Performance tuning dataset refresh in Power BI

Why is refresh performance important?

• Your reports are ready for your users to view faster

• You can refresh more frequently during the day if you need to

• Dataset development is easier

• If something goes wrong with your data you can fix and reload faster

• Slow refresh of one dataset may impact:

  • Refresh performance of other datasets

  • Report performance

• But how fast is fast enough?

Page 4: Performance tuning dataset refresh in Power BI

How often do you want your data to refresh?

Page 5: Performance tuning dataset refresh in Power BI

I want real-time data!

Page 6: Performance tuning dataset refresh in Power BI

Requirements for data refresh

• Don’t ask what your users want, ask what they need

• Questions:

  • When is your source data ready to use?

  • How often does your source data change?

  • What time do you need your data by?

  • How many times do you need to refresh in a day? What is the business need?

  • What if you unexpectedly need to refresh (eg to fix data problems)?

  • How important is keeping data up-to-date versus report performance?
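The first and third questions together give a rough answer to "how fast is fast enough?": the refresh has to fit between the time the source data is ready and the time users need it. A minimal sketch of that arithmetic, where the function name, the example times and the 15-minute safety margin are all illustrative assumptions:

```python
from datetime import datetime, timedelta


def refresh_budget(source_ready: str, needed_by: str,
                   safety_margin_min: int = 15) -> timedelta:
    """Time available for a refresh, given "HH:MM" strings.

    The safety margin is a hypothetical allowance for a late start
    or a retry; choose a value that suits your own environment.
    """
    fmt = "%H:%M"
    ready = datetime.strptime(source_ready, fmt)
    deadline = datetime.strptime(needed_by, fmt)
    if deadline <= ready:
        # The deadline falls on the following day.
        deadline += timedelta(days=1)
    return deadline - ready - timedelta(minutes=safety_margin_min)


# Source data lands at 06:00 and reports are needed by 08:00.
budget = refresh_budget("06:00", "08:00")
print(f"Refresh must finish within {budget}")
```

If measured refresh times are already well inside this budget, further tuning may not be worth the effort.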

Page 7: Performance tuning dataset refresh in Power BI

Choosing a storage mode

• Import – fastest query performance but data must be refreshed

• Push – data is pushed into a dataset; many limitations

• DirectQuery – no need to refresh but query performance is slower

  • Composite models allow you to mix DirectQuery and Import tables

  • Aggregations are pre-aggregated tables that improve query performance

  • Use automatic page refresh to make sure your report always shows the latest data

• Use Import unless you have a good reason not to!

Page 8: Performance tuning dataset refresh in Power BI

What happens during refresh?

[Diagram labels: Data sources; Power BI, containing Power Query (four queries) and Analysis Services (dataset)]

Page 9: Performance tuning dataset refresh in Power BI

Import refresh tuning methodology

• Steps:

  • Model your data properly

  • Remove all data that isn’t needed for your reports/analysis

  • Tune your data source

  • Tune your Power Query queries

  • Tune the Analysis Services engine inside Power BI

• You need to check:

  • Performance of a single refresh while developing

  • Actual performance of dataset refresh in production

Page 10: Performance tuning dataset refresh in Power BI

Measuring overall refresh performance

• SQL Server Profiler is the best tool for measuring refresh performance

  • Connect to Power BI Desktop via DAX Studio

  • Connect to Power BI Premium capacities via XMLA endpoint

    • Not possible to connect to Power BI Shared capacity

  • Displays all activity in the Analysis Services engine

  • Look for Process command and Duration column

• Power BI Service refresh history also has overall refresh times

• Refresh summary page (and API) shows refresh times for datasets in Premium

• Power BI Capacity Metrics app shows refresh times for Premium
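Once refresh history is available as data, overall refresh times can be tracked over time. The sketch below assumes records shaped like the entries of the Power BI REST API refresh history response (`status`, `startTime`, `endTime`); the sample values are invented, and the network call and authentication are deliberately left out.

```python
from datetime import datetime

# Hypothetical sample, shaped like entries of a refresh history response.
SAMPLE_HISTORY = [
    {"refreshType": "Scheduled", "status": "Completed",
     "startTime": "2022-08-01T06:00:05.120Z",
     "endTime": "2022-08-01T06:12:35.120Z"},
    {"refreshType": "OnDemand", "status": "Failed",
     "startTime": "2022-08-01T09:30:00.000Z",
     "endTime": "2022-08-01T09:31:10.000Z"},
    {"refreshType": "Scheduled", "status": "Completed",
     "startTime": "2022-08-02T06:00:02.000Z",
     "endTime": "2022-08-02T06:17:02.000Z"},
]


def parse_utc(timestamp: str) -> datetime:
    """Parse a UTC timestamp with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(timestamp, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised timestamp: {timestamp}")


def completed_durations_seconds(history):
    """Return the duration in seconds of every completed refresh."""
    durations = []
    for entry in history:
        # Skip failed refreshes and any still in progress (no endTime yet).
        if entry.get("status") != "Completed" or "endTime" not in entry:
            continue
        elapsed = parse_utc(entry["endTime"]) - parse_utc(entry["startTime"])
        durations.append(elapsed.total_seconds())
    return durations


durations = completed_durations_seconds(SAMPLE_HISTORY)
average = sum(durations) / len(durations)
print(f"{len(durations)} completed refreshes, average {average:.0f}s")
```

This only gives end-to-end times; Profiler is still needed to see where the time goes inside a single refresh.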

Page 11: Performance tuning dataset refresh in Power BI

Data modelling and refresh performance

• Good data modelling is important for many reasons – data refresh performance is only one of them

• Good modelling may make refresh performance slower, but will make report query performance faster

• Basic rule: always build a star schema!

• Common problems:

  • Tables with lots of columns

    • Do you need to unpivot measures?

    • Do some of your fact table columns actually belong on a dimension table?

    • Are you even going to use all of these columns?

  • One big table instead of fact tables and dimension tables

  • Use of expensive data types, eg Double instead of Currency

Page 12: Performance tuning dataset refresh in Power BI

Only load the data you need

• The more data you load, the slower refresh will be

• So:

  • Remove any columns you don’t need

  • Filter out any rows you don’t need

  • Think about applying a limit on history, eg only loading one year of data

• Do this as soon as possible, ideally before the data even reaches Power BI

• It’s easier to add data back if you need it than remove data from a dataset in production

• Deployment pipelines (in Premium) can be used to limit the amount of data you work with in a development environment

Page 13: Performance tuning dataset refresh in Power BI

Data source type and refresh performance

• How quickly can your data source send data to Power BI?

• Some tips:

  • Relational databases perform better than files

  • CSV files will perform better than JSON, XML and especially Excel

  • Files stored in SharePoint may be slow to load compared to local files

  • Web services may also be slow

• Consider loading your data into a fast data source before loading it into Power BI

Page 14: Performance tuning dataset refresh in Power BI

Tuning your data source

• If your data source is a relational database, tune the SQL queries that are run when refresh takes place

  • Tools like SQL Server Profiler can be used to see what queries are run

• Other useful tools:

  • Fiddler for viewing requests made to web services

  • Process Monitor for viewing reads from text files

  • Power Query Query Diagnostics

Page 15: Performance tuning dataset refresh in Power BI

Data source location

• Network latency between your data source and Power BI can affect refresh performance

  • If you’re using an On-premises data gateway, think about the location of the gateway machine

  • Power BI Premium allows you to locate different capacities in different Azure Regions

Page 16: Performance tuning dataset refresh in Power BI

Power Query engine performance

• Power Query performance can vary depending on where Power Query queries are run:

  • Power BI Desktop – when you are developing

  • Power BI Service – if you’re only connecting to cloud data sources

  • On-premises data gateway – if any of your data sources are on-prem, all traffic has to go through a gateway

• Performance will depend on:

  • Hardware of the machine where queries are run

  • Configuration settings and properties

  • Efficiency of the queries themselves

Page 17: Performance tuning dataset refresh in Power BI

Power Query Power BI Desktop

• Measure performance of Power Query queries in Desktop using:

  • SQL Server Profiler

  • Power Query Query Diagnostics

• Settings to improve performance in Power BI Desktop:

  • Disable queries that you don’t need to load into the dataset

  • Turn off “Allow data preview to download in the background”

  • Turn off data privacy checks – but only if you know what this means!

  • Experiment with “Enable parallel loading of tables”

• Use Table.View to stop multiple reads

• Turn off “Include in report refresh” if a query doesn’t need to be refreshed

Page 18: Performance tuning dataset refresh in Power BI

Query folding

• Query folding refers to the way the Power Query engine can push transformations back to the data source

• Almost always results in much better performance

• Only possible with some data sources: relational databases, Analysis Services, OData feeds, some others

• Only possible for some transformations

  • Different data sources support folding for different transformations

• Some transformations stop other folding happening

• Writing your own SQL queries also prevents folding
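The effect of folding can be illustrated outside Power Query. The sketch below is an analogy in Python with an in-memory SQLite database, not Power Query itself: the table, the data and the function names are all hypothetical. The first function pulls every row and does the filter and aggregation on the client, much as the Power Query engine does when folding fails; the second pushes the same work into the SQL sent to the source.

```python
import sqlite3

# Hypothetical source table with a handful of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", 2021, 100.0), ("North", 2022, 150.0),
     ("South", 2021, 80.0), ("South", 2022, 120.0), ("South", 2022, 30.0)],
)


def totals_without_folding(connection, year):
    """Read every row, then filter and aggregate on the client."""
    totals, rows_read = {}, 0
    query = "SELECT region, year, amount FROM sales"
    for region, row_year, amount in connection.execute(query):
        rows_read += 1
        if row_year == year:
            totals[region] = totals.get(region, 0.0) + amount
    return totals, rows_read


def totals_with_folding(connection, year):
    """Let the source filter and aggregate; only results come back."""
    query = ("SELECT region, SUM(amount) FROM sales "
             "WHERE year = ? GROUP BY region")
    rows = connection.execute(query, (year,)).fetchall()
    return dict(rows), len(rows)


unfolded, unfolded_rows = totals_without_folding(conn, 2022)
folded, folded_rows = totals_with_folding(conn, 2022)
print(unfolded == folded, unfolded_rows, folded_rows)
```

Both approaches return the same totals, but the folded version transfers two rows instead of five. On a real fact table the difference is far larger.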

Page 19: Performance tuning dataset refresh in Power BI

Tuning the Power Query engine

• If query folding is not taking place, then the Power Query engine does the transformations in your queries

• Some transformations such as sort, merge, pivot/unpivot require all data to be loaded into RAM

  • A query is limited to using 256MB RAM, so paging may take place

• Some transformations force multiple reads from a data source

  • Using Table.Buffer may help – but may also cause paging

Page 20: Performance tuning dataset refresh in Power BI

Tuning the on-premises data gateway

• If you are using an on-premises data gateway to load data, your Power Query queries will be executed on the gateway machine

• Tips:

  • Locate the gateway machine close to the data source

  • Make sure the gateway server has enough CPU and memory

  • Clustered gateways allow for the load to be spread across multiple servers

  • Turn on performance logging and use the Power BI template report to analyse it

Page 21: Performance tuning dataset refresh in Power BI

Using dataflows to improve performance

• Dataflows let you share the output of a Power Query query between multiple datasets

  • Do complex transformations once instead of inside multiple datasets

  • Do transformations when the data for one query is ready, no need to wait until all data needed by the dataset is ready

• Data privacy checks are off by default -> better performance

• In a Premium capacity:

  • Enhanced compute engine improves performance by loading data into SQL

  • Container Size property = more RAM for the Power Query engine

  • Computed entities allow you to stage data from slow data sources

Page 22: Performance tuning dataset refresh in Power BI

[Diagram labels: Data Source (Table); Power BI, containing Dataset A and Dataset B, each with its own Query]

Page 23: Performance tuning dataset refresh in Power BI

[Diagram labels: Data Source (Table); Power BI, containing a Dataflow Entity plus Dataset A and Dataset B, each with its own Query]

Page 24: Performance tuning dataset refresh in Power BI

Tuning the Analysis Services engine

• SQL Server Profiler displays a lot of detail about what happens during refresh in the Analysis Services engine

• Official support for Tabular Editor within Desktop will allow changing more properties:

  • IsAvailableInMDX – controls whether hierarchies are built on columns (only relevant for clients that query using MDX such as Excel)

  • EncodingHint – forces the use of a certain type of encoding for a column

Page 25: Performance tuning dataset refresh in Power BI

Calculated columns and calculated tables

• Calculated columns and calculated tables are evaluated during refresh

  • So the more you have, the slower refresh will be

• Can you replace a calculated column with a measure?

  • Strange but true: this may also help query performance

• Can you replace a calculated table with a Power Query query or a table in your data source?

• Loading data into hidden tables and then using DAX to transform it is usually a bad thing

• BUT certain calculations will be much quicker in DAX

Page 26: Performance tuning dataset refresh in Power BI

Incremental refresh

• Incremental refresh lets you refresh only the data that is new or has changed

  • Less data to load -> faster refresh

• Works by creating and managing partitions within the table

• Now available in Power BI Shared as well as Premium

• Designed for use with data warehouses built on relational databases

• Can be adapted for use with other data sources such as:

  • Web services

  • Folders containing multiple files
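The partitioning idea can be sketched in a few lines. This is a conceptual illustration only, not how Power BI implements it: in Power BI the policy is configured on the table and applied through the RangeStart and RangeEnd parameters. The function name, the monthly granularity and the window sizes below are assumptions made for the example.

```python
from datetime import date


def month_partitions(today: date, months_to_store: int,
                     months_to_refresh: int):
    """List (partition_name, needs_refresh) pairs, oldest first.

    Only the most recent `months_to_refresh` partitions are reloaded;
    older partitions inside the storage window are left untouched.
    """
    current = today.year * 12 + (today.month - 1)
    partitions = []
    for age in range(months_to_store):
        # age 0 is the current month, age 1 the previous month, and so on.
        year, month_index = divmod(current - age, 12)
        name = f"{year:04d}-{month_index + 1:02d}"
        partitions.append((name, age < months_to_refresh))
    return list(reversed(partitions))


# Keep six months of history but reload only the latest two months.
for name, needs_refresh in month_partitions(date(2022, 8, 2), 6, 2):
    print(name, "refresh" if needs_refresh else "keep")
```

With these settings each refresh loads two partitions instead of six, which is where the time saving comes from.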

Page 27: Performance tuning dataset refresh in Power BI

Refresh in the Power BI Service

• Refresh in the Power BI Service only starts when resources are available

• Therefore, refresh does not always start at the scheduled time

• Refresh may be slower in the Service because:

  • You have a very fast development PC

  • It takes longer to load data into the cloud than into Power BI Desktop

• Refresh may run faster on Premium because:

  • More resources = more parallelism, but only on a P2+

  • More likely to start on time – assuming your capacity isn’t overloaded