BI Semantic Model

Albert van Dok
SQL Zaterdag, 12 November 2011

Agenda

• Background
• Life Before BISM
• What is BISM
• BISM Positioning
• Questions

Background

From data towards information

• By nature, the demand for (new) information and insights will always evolve
• Connecting and integrating (new) data sources is an essential part
• Preparing data for use:
  • Data cleansing
  • Define relationships
  • Data enrichment
  • Add calculations
  • Versioning

The goal is not always easy to achieve

Applications

• Analytical solutions
• Operational reports
• Dashboards & Scorecards
• Data Mining

Requirements

• Quick delivery
• Integration of data by business user
• Ad hoc reports
• Excellent performance
• Flexible

Issues

• Operational reports from an analytical system
• Wrong use of tools or BI tools not flexible
• (Performance) problems
• Long implementation times
• Highly dependent on IT

BI across the enterprise

Life before BISM

[Diagram components: OLTP, Data Source, DW, Datamart (2x), Data Model, MOLAP (2x), Reporting Tool, OLAP Browser]

Life before BISM

[Diagram components: OLTP, Data Source, DW, Datamart (2x), Data Model, MOLAP (2x), Reporting Tool, OLAP Browser, now with the UDM added]

Life before BISM

[Diagram components: OLTP, Data Source, DW, Datamart (2x), Data Model, Reporting Tool, OLAP Browser, and Analysis Services with MOLAP storage and the UDM]

UDM

• XML/A
• Cache
• Security
• End-user model
  • Translations
  • Actions
  • KPI…
• Calculations
• Basic dim. model
  • Cube & Dimensions
  • Storage & Caching policies
  • Linked Objects
• Datasource view

The UDM in SSAS 2008 R2

[Diagram: clients connecting to the UDM; every connection is labeled MDX]

• Excel 2010
• Reporting Services 2008 R2 & Report Builder 3
• SharePoint 2010
  • Excel Services
  • PerformancePoint Services
  • Visio Services
• 3rd party SSAS clients

Besides its advantages, the UDM:

• Is often too complex for simple reporting purposes
• Has a steep learning curve
• Uses MDX, which is different from SQL…
• Must be implemented by a BI professional
• Needs a small investment just to start

The holy grail: Self Service BI

New paradigm

• “Business intelligence for the masses”
• “Managed self-service business intelligence”

Put simple, powerful BI tools in the hands of “knowledge workers”

• Familiar tools: Excel
• People who own the data
  • Excel spreadsheet, Access database or SharePoint list data

Reality: Office power users

New kid on the block: PowerPivot

PowerPivot for Excel

• Free add-in for Excel
• Runs on 32/64-bit and wants lots of RAM…
• Contains the VertiPaq engine (SSAS running in process with Excel)

PowerPivot for SharePoint

• Comes with SQL Server 2008 R2 x64
• SharePoint 2010 extension
• VertiPaq running on the server side
• For sharing and managing PowerPivot applications

PowerPivot

PowerPivot has its own semantic model, which can be seen as BISM v1

• Enables connecting data from various data sources
• Add relations between tables
• Add calculations, in two places:
  • in tables: calculated columns (DAX)
  • over the whole model: calculated measures (DAX)
• Works in cached (VertiPaq) mode

Covers personal and team BI segments
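As a rough analogy only (this is plain Python, not DAX, and the table, column and function names are invented for illustration), the two places for calculations can be pictured like this: a calculated column is evaluated once per row and stored with the table, while a calculated measure is evaluated over whatever rows the current filter selects.

```python
# Hypothetical sales table; names and values are made up for this sketch.
sales = [
    {"region": "North", "quantity": 2, "unit_price": 10.0},
    {"region": "North", "quantity": 1, "unit_price": 25.0},
    {"region": "South", "quantity": 4, "unit_price": 5.0},
]

# "Calculated column": evaluated row by row and stored in the table.
for row in sales:
    row["line_total"] = row["quantity"] * row["unit_price"]


# "Calculated measure": evaluated over the rows the current filter selects.
def total_sales(rows, region=None):
    selected = [r for r in rows if region is None or r["region"] == region]
    return sum(r["line_total"] for r in selected)


print(total_sales(sales))           # all rows
print(total_sales(sales, "North"))  # filtered to one region
```

The same measure definition gives a different answer for each filter, which is what lets one model serve many pivot table layouts.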

What is VertiPaq

• In-memory column-based database
• Very high data compression
• Doesn’t require designing and building aggregations or other tuning
• Supports partitioning and paging for large data sizes
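The slides do not say how VertiPaq compresses data. As a general illustration of why column-based storage tends to compress well, the sketch below applies dictionary encoding, one common columnar technique, to a made-up column with few distinct values. The function names and the sample data are assumptions for this example only.

```python
def dictionary_encode(values):
    """Replace each value with a small integer index into a dictionary."""
    dictionary = []  # distinct values in first-seen order
    index_of = {}    # value -> position in the dictionary
    codes = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index_of[v])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


# Hypothetical column: many rows, few distinct values.
country = ["Netherlands", "Belgium", "Netherlands",
           "Netherlands", "Belgium", "Germany"]

dictionary, codes = dictionary_encode(country)
print(dictionary)  # each distinct string is stored once
print(codes)       # one small integer per row
```

Because a column holds values of one type and often repeats them, the per-row cost shrinks to a small integer, which is much harder to achieve when whole rows are stored together.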

Relational Database

Rows are stored together on 8K-byte pages:

Page 1: 1 Bob … $3000 | 2 Sue … $500 | 3 Ann … $1,700
Page 2: 4 Jim … $1,500 | 5 Liz … $0 | 6 Dave … $9,000
Page 3: 7 Sue … $1010 | 8 Bob … $50 | 9 Jim … $1,300

[Diagram: 8K-byte pages are read into memory (DBMS buffer pool); 64-byte cache lines move from memory into the L2 cache, L1 cache and CPU]

Select id, name, BalDue from Customers where BalDue > $500

Query summary:
• 3 pages read from disk
• Up to 9 L1 and L2 cache misses (one per tuple)

Don’t forget that:
- An L2 cache miss can stall the CPU for up to 200 cycles

Columnstore Database

Each column is stored contiguously:

Id      1 2 3 4 5 6 7 8 9
Name    Bob Sue Ann Jim Liz Dave Sue Bob Jim
BalDue  3000 500 1700 1500 0 9000 1010 50 1300
Street  … … … … …

[Diagram: 8K-byte pages of BalDue values are read into memory; 64-byte cache lines holding only BalDue values move from memory into the L2 cache, L1 cache and CPU]

Select id, name, BalDue from Customers where BalDue > $500

Takeaways:
• Each cache miss brings only useful data into the cache
• Processor stalls reduced by up to a factor of:
  • 8 (if BalDue values are 8 bytes)
  • 16 (if BalDue values are 4 bytes)

Caveats:
• Not to scale! An 8K byte page of BalDue values will hold 1000 values (not 5)
• Not showing disk I/Os required to read id and Name columns
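The factors of 8 and 16 and the "1000 values per page" caveat follow from simple arithmetic on the sizes shown in the diagram (64-byte cache lines, 8K-byte pages). A small sketch, with constant and function names chosen here for illustration:

```python
CACHE_LINE_BYTES = 64
PAGE_BYTES = 8 * 1024


def values_per_cache_line(value_bytes):
    # In a column store a cache line holds only values of the scanned column,
    # so one cache miss brings in this many useful values instead of one.
    return CACHE_LINE_BYTES // value_bytes


def values_per_page(value_bytes):
    # Number of column values that fit on one page of the given size.
    return PAGE_BYTES // value_bytes


print(values_per_cache_line(8))  # 8
print(values_per_cache_line(4))  # 16
print(values_per_page(8))        # 1024, roughly the 1000 quoted on the slide
```

The row store pays up to one cache miss per tuple for this query, so the number of useful values per cache line is the upper bound on how much the column layout can reduce stalls.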

An example

Assume:
• Customer table has 10M rows, 200 bytes/row (2GB total size)
• Id and BalDue values are each 4 bytes long, Name is 20 bytes

Query:
Select id, Name, BalDue from Customer where BalDue > $1000

Row store execution:
• Scan 10M rows (2GB) @ 80MB/sec = 25 sec.

Column store execution:
• Scan 3 columns, each with 10M entries (id 40MB, Name 200MB, BalDue 40MB)
• 280MB @ 80MB/sec = 3.5 sec.

About a 7X performance improvement for this query!! But we can do even better using compression
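The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below uses the slide's assumptions (10M rows, 200 bytes per row, 80MB/sec scan rate, decimal megabytes); the variable names are chosen here for illustration.

```python
SCAN_MB_PER_SEC = 80
ROWS = 10_000_000
ROW_BYTES = 200
COLUMN_BYTES = {"id": 4, "Name": 20, "BalDue": 4}
MB = 1_000_000  # the slide uses decimal megabytes (2GB = 2000MB)

# Row store: every byte of every row is scanned.
row_store_mb = ROWS * ROW_BYTES / MB
row_store_sec = row_store_mb / SCAN_MB_PER_SEC

# Column store: only the three columns the query touches are scanned.
column_store_mb = sum(ROWS * size for size in COLUMN_BYTES.values()) / MB
column_store_sec = column_store_mb / SCAN_MB_PER_SEC

print(row_store_mb, row_store_sec)        # 2000.0 25.0
print(column_store_mb, column_store_sec)  # 280.0 3.5
print(round(row_store_sec / column_store_sec, 1))  # 7.1
```

The gain comes entirely from the 172 bytes per row that the query never needs; a query that selects every column would see no such benefit from the layout alone.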

Demo

PowerPivot

We are not there yet

Although PowerPivot for Excel is great, it has certain limitations

• Limited to 2 GB
• No support for partitions
• Queries the VertiPaq cache
• Daily scheduled data refresh in SharePoint
• Access to the workbook

PowerPivot and Analysis Services are two different products, hence two models

• PowerPivot targets business users; the model is managed in Excel
• Analysis Services targets BI professionals and IT; the model is managed on the server

“Can we have one model which integrates both worlds and seamlessly transitions BI applications from Personal BI to Team BI to Organizational/Professional BI?”

And now there is BISM…

What is coming in Denali

BISM v2

• One model for all
  • reporting, analysis, dashboards, scorecards
  • personal, team, corporate BI
• Has a relational and a multidimensional API
• Supports both cached (MOLAP & VertiPaq) and pass-through (realtime) modes
  • only SQL Server data sources for now
• Pass-through
  • no additional database
  • data stays as is in the original structures
  • ideal for realtime analysis

Why does this work

In “Denali” every cube automatically becomes a BI Semantic Model

To create a BI semantic model you create one of:

• a multidimensional model
• a tabular model
• a PowerPivot workbook

Every model looks like cubes/dimensions/measure groups/data sources/data source views under the covers

• They share a common Analysis Services file format
• It is this shared underlying structure that makes the BI semantic model work

BISM Data model

Hybrid model supporting multidimensional and tabular data models

• Developed using a multidimensional or a tabular project
• Choice depends on application needs and skillset

Tabular

• Familiar model, easier to build, faster time to solution
• Not all advanced concepts (e.g. many-to-many) are available natively in the model… calculations are needed to simulate these
• Easy to wrap a model over a raw database or warehouse for reporting & analytics

Multidimensional

• Sophisticated model, higher learning curve
• Advanced concepts baked into the model and optimized (parent-child, many-to-many, attribute relationships, key vs. name, etc.)
• Ideally suited for OLAP type apps (e.g. planning, budgeting, forecasting) that need the power of the multidimensional model

BISM Business Logic & Queries

• Represents the intelligence or semantics in the model
• Defines entities and relations between them
• User-oriented

DAX

• Based on Excel formulas and relational concepts: easy to get started
• Complex solutions require steeper learning curve: row/filter context, Calculate, etc.
• Calculated columns enable new scenarios, however no named sets or calc members

MDX

• Based on understanding of multidimensional concepts: higher initial learning curve
• Complex solutions require steeper learning curve: CurrentMember, overwrite semantics, etc.
• Ideally suited for apps that need the power of multidimensional calculations: scopes, assignments, calc members

BISM Data Access

This layer integrates data from multiple sources: relational databases, business applications, flat files, OData feeds, etc.

Two modes: cached and pass-through

• Cached: pulls in data from all the sources and stores it in a compressed data structure
  • MOLAP and VertiPaq
• Pass-through: pushes query processing and business logic down to the data source
  • ROLAP and DirectQuery
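The difference between the two modes can be sketched with a toy example. This is not how Analysis Services implements either mode: an in-memory SQLite database stands in for the source system, the class names are invented, and the rows reuse the customer data from the earlier slides.

```python
import sqlite3

# Stand-in for the source system (assumption: a single Customers table).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Customers (id INTEGER, name TEXT, BalDue REAL)")
source.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1, "Bob", 3000), (2, "Sue", 500), (3, "Ann", 1700)],
)


class CachedModel:
    """Pulls the data in once; queries are answered from the local copy."""

    def __init__(self, conn):
        self.refresh(conn)

    def refresh(self, conn):
        self.bal_due = [r[0] for r in conn.execute("SELECT BalDue FROM Customers")]

    def total_bal_due(self):
        return sum(self.bal_due)


class PassThroughModel:
    """Keeps no copy; every query is pushed down to the data source."""

    def __init__(self, conn):
        self.conn = conn

    def total_bal_due(self):
        return self.conn.execute("SELECT SUM(BalDue) FROM Customers").fetchone()[0]


cached = CachedModel(source)
live = PassThroughModel(source)

# The source changes after the cache was filled.
source.execute("INSERT INTO Customers VALUES (4, 'Jim', 1500)")

stale_total = cached.total_bal_due()  # snapshot from before the insert
live_total = live.total_bal_due()     # sees the new row immediately
cached.refresh(source)
refreshed_total = cached.total_bal_due()

print(stale_total, live_total, refreshed_total)
```

The cached model answers without touching the source but is only as fresh as its last refresh; the pass-through model is always current but depends on the source for every query.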

Analysis Services ‘Denali’ - UDM

[Diagram: clients connecting to the UDM. The connections are labeled MDX (four times) and MDX? (once)]

• Excel 2010
• Reporting Services “Denali”
• SharePoint 2010
  • Excel Services
  • Reporting Services
  • PerformancePoint Services
  • Visio Services
• 3rd party SSAS clients
• SharePoint 2010
  • Power View

Analysis Services ‘Denali’ - BISM

[Diagram: clients connecting to the BISM. The connections are labeled MDX (four times), DAX (twice) and DAX? (once)]

• Excel 2010
• Reporting Services “Denali”
• SharePoint 2010
  • Excel Services
  • Reporting Services
  • PerformancePoint Services
  • Visio Services
• 3rd party SSAS clients
• SharePoint 2010
  • Power View

[Diagram: PowerPivot workbook, BISM, Excel 2010]

Denali’s new features in BISM

BISM in ‘Denali’ includes:

• hierarchies, KPIs, parent-child, drillthrough, perspectives
• additional DAX functions (RankX, DistinctCount, GroupBy, Lookup)
• security (role-based with Active Directory, column/row based)

BISM does not include:

• some of the UDM features
  • scripts, actions, translations, role-playing dimensions
  • object model
  • write-back
• other
  • realtime for non-SQL Server data sources
  • MDX query support for realtime

Demo

BISM and the tabular model

Advantages of BISM

• Relatively simple model
• Fast response
• Flexible
• DAX calculations are similar to Excel formulas
• More understandable and user-friendly to majority of people
• Same model across all scenarios
• Easily scale from personal BI to corporate BI
• Faster development than in UDM
• Prototyping by end-users
• Easier changes of model
• Reduction of cost in developing the full BI solution

Positioning of BISM

• MOLAP is much more complex than PowerPivot, but it offers greater scalability
• ROLAP is even more limited, but it scales above 50 TB
• PowerPivot models that are to be shared can grow up to 2 GB, the limit set by SharePoint. Otherwise, memory is the only limit
• BISM sits in the middle and fills the space between MOLAP and PowerPivot
• For sizes way above 50 TB there are the new ColumnStore indexes (in the relational engine)

[Chart: Scalability (2 GB, 100 GB, 5 TB, 50 TB) against Usability, positioning PowerPivot, BISM, MOLAP, ROLAP and ColumnStore. Source: Thomas Kejser, SQLCAT]

Current Limitations in “Denali”

Two projects for building a BI Semantic Model

• Future plan is to integrate these into one model
  • Use VertiPaq as SSAS storage
  • Use MDX scripts in tabular projects

DAX queries are not supported in multidimensional projects

• and therefore neither is Power View, which uses DAX to retrieve data from the model

Analysis Services Architecture

Beyond Denali

BI Semantic Model features

• Role playing dimensions
• Translations
• Actions
• MDX Scripts
• Realtime over Oracle, Teradata, DB2…

Programmability

• BISM object model
• MDX query support for Realtime
• Write back

Wrapup

• BISM is not a replacement for UDM
• DAX is not a replacement for MDX
• Column store databases offer blazing fast performance
• Every model has its advantages
• BI architects must decide when to apply which model
• BISM v2 is not complete, expect changes!

Questions

Mail to albert@qbids.nl
