nenug apr14 talk - data modeling for netezza
DESCRIPTION
TRANSCRIPT
![Page 1: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/1.jpg)
Data Modeling and Netezza
Biju Nair NENUG Talk
30-‐Apr-‐2014
![Page 2: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/2.jpg)
Mo@va@on and Goal
• Mo@va@on – Performance degrada@on with data volume
• Goal – Highlight considera@ons while modeling for NZ
2
![Page 3: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/3.jpg)
Data Modeling
• Logical Data Modeling – Business domain data representa@on – Independent of DBMS technology – NZ is an appliance for analy@cs
• Set based processing vs row based processing • De-‐normaliza@on • Snow flake/Star schema
• Physical Data Modeling – Takes in the DBMS features and constraints – Need to understand the DBMS architecture
3
![Page 4: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/4.jpg)
NZ Architecture
Host
Snippet Processors
Snippet Processors
Snippet Processors
Data Data Data
-‐ Parse, Op+mize and Compile query -‐ Schedule snippets -‐ Distribute data
Executes snippets
-‐ Shared nothing MPP -‐ Custom IP backbone
-‐ Appliance efficiency is maximized when -‐ snippet processors can run independently i.e. data independence -‐ All snippet processors are u@lized uniformly 4
![Page 5: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/5.jpg)
Snippet Processors
Data
Accelerator (FPGA) CPU
Compute
Host
Reads compressed
data
Un-‐compress Remove columns
Restrict rows (where)
Perform computa+on
Send data to host
-‐ Disk reads are incredibly slow rela@ve to other components especially seek @me -‐ While the CPU overhead is reduced, volume of data read will impact performance
5
![Page 6: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/6.jpg)
Data Storage
Data
Snippet Processors
Host
Extend
Page
Extend
Extend
…
Meta-‐Data
-‐ Meta data iden@fies extends/pages to read or skip 6
![Page 7: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/7.jpg)
Modeling Priori@es
• U@liza@on of all snippet processors – Need to be able to u@lize uniformly
• Maximize MPP capability of snippet processors – Ideally snippet processors should be independent
• Minimize data read from disk – Minimize data stored
• Improve computa@on in snippet processor – Compounded with data volume will help performance
7
![Page 8: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/8.jpg)
Snippet Processor U@liza@on Data Distribu+on
Host
Snippet Processors
Snippet Processors
Snippet Processors
1,MA,1212,… 3,MA,0414,…
2,CA,0113,…
1,MA,1212,… 2,CA,0113,… 3,MA,0414,…
Distribute by state
-‐ NZ will pick one of the columns to distribute if none specified in table defini@on -‐ First column in the table
Data Skew
8
![Page 9: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/9.jpg)
Snippet Processor U@liza@on Data Distribu+on
Host
Snippet Processors
Snippet Processors
Snippet Processors
1,MA,1212,… 2,CA,0113,… 3,MA,0414,…
1,MA,1212,… 2,CA,0113,… 3,MA,0414,…
Distribute by mo-‐yr
-‐ Snippet processors are u@lized uniformly -‐ What if most of the query is on for the current month?
-‐ Processing skew 9
![Page 10: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/10.jpg)
Snippet Processor U@liza@on Data Distribu+on
Host
Snippet Processors
Snippet Processors
Snippet Processors
1,MA,1212,… 4,CA,0414,…
2,CA,0113,… 3,MA,0414,…
1,MA,1212,… 2,CA,0113,… 3,MA,0414,… 4,CA,0414,…
Distribute random
-‐ Snippet processors are u@lized uniformly -‐ Helps prevent processing skew
10
![Page 11: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/11.jpg)
3,ORD1,ITEM1,… 4,ORD1,ITEM1,… 2,ORD1,ITEM1,…
1,ORD1,ITEM1,…
Snippet Processor U@liza@on Data Distribu+on and Table Joins
Host
Snippet Processors
Snippet Processors
Snippet Processors
4,CA,0414,… 1,MA,1212,… 3,MA,0414,…
2,CA,0113,…
1,ORD1,ITEM1,… 2,ORD1,ITEM1,… 3,ORD1,ITEM1,… 4,ORD1,ITEM1,…
Distribute random
-‐ Snippet processors are u@lized uniformly -‐ Makes snippet processors dependent on others impac@ng MPP maximiza@on
Need to redistribute data from both tables
11
![Page 12: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/12.jpg)
3,ORD1,ITEM1,… 2,ORD1,ITEM1,… 1,ORD1,ITEM1,… 4,ORD1,ITEM1,…
Snippet Processor U@liza@on Data Distribu+on and Table Joins
Host
Snippet Processors
Snippet Processors
Snippet Processors
4,CA,0414,… 1,MA,1212,… 3,MA,0414,…
2,CA,0113,…
1,ORD1,ITEM1,… 2,ORD1,ITEM1,… 3,ORD1,ITEM1,… 4,ORD1,ITEM1,…
Distribute on Join column -‐ cid
-‐ Snippet processors are u@lized uniformly -‐ Makes snippet processors dependent on others impac@ng MPP maximiza@on
-‐ Becer than the previous scenario
Need to redistribute data from one table
12
![Page 13: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/13.jpg)
3,ORD1,ITEM1,… 2,ORD1,ITEM1,… 1,ORD1,ITEM1,… 4,ORD1,ITEM1,…
Snippet Processor U@liza@on Data Distribu+on and Table Joins
Host
Snippet Processors
Snippet Processors
Snippet Processors
1,MA,1212,… 4,CA,0414,…
2,CA,0113,… 3,MA,0414,…
Distribute both tables on Join column -‐ cid
-‐ Snippet processors are u@lized uniformly -‐ Makes snippet processors independent maximizing MPP
13
![Page 14: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/14.jpg)
Snippet Processor U@liza@on Data Distribu+on
• Iden@fy keys to distribute data uniformly – Avoid data and processing skew – Try using join columns as the distribu@on keys – Choose same data types for join columns
• If table size is small random distribu@on is fine – If one of the join table is small, NZ will broadcast
• Redistribu@on may not be an overkill for small data – For e.g., selec@ng a small number if columns
14
![Page 15: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/15.jpg)
Snippet Processor U@liza@on Distribu+on and Query Time
0
0.5
1
1.5
2
2.5
3
3.5
Random 1 Correct Distribu@on 2 Correct Distribu@on
Query Time For Different Distribu+ons
Time (min)
15
![Page 16: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/16.jpg)
Snippet Processor U@liza@on Join Column Type and Query Time
0
0.5
1
1.5
2
2.5
Incorrect Data Types Correct Data Types
Join query +me -‐ same and diff data types
Time (min)
16
![Page 17: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/17.jpg)
Minimize Data Read From Disk Zone Maps
• Data types which supports Zone Maps – All integer data types
• int1 • int2 • int4 • int8
– Date – Timestamp
17
Refer to the product manual for the version of NZ used for the complete list of zone map able data types
![Page 18: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/18.jpg)
Data
Snippet Processors
Host
Extend 1
Page
Extend 2
Extend 3
…
Meta-‐Data
-‐ May end up reading all data from disk
Minimize Data Read From Disk Zone Maps
Table column (cid) is numeric(10,0)
No zone map for cid
18
![Page 19: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/19.jpg)
Data
Snippet Processors
Host
Extend
Page
Extend
Extend
…
Meta-‐Data
-‐ Zone maps can be used to minimize data read from disk
Minimize Data Read From Disk Zone Maps
Table column (cid) is bigint
19
![Page 20: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/20.jpg)
Minimize Data Read From Disk Zone Maps and Query Time
1.3
1.35
1.4
1.45
1.5
1.55
1.6
Incorrect Data Type Correct Data Type
Query +me with and without zone map
Time (min)
20
![Page 21: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/21.jpg)
Minimize Data Read From Disk Clustered Base Tables
• NZ stores data with same organize keys closely • Addi@onal data types are zone map able
– char – varchar – nchar – nvarchar – float – double – bool – @me – Interval
• Helps improve performance of mul@ table join
21
![Page 22: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/22.jpg)
Extend
Extend
Extend
Minimize Data Read From Disk Clustered Base Tables
Table distributed on cid
1,MA,Boston,…
5,CA,LA,…
3,FL,Tampa,,…
1,MA,Salem,…
5,CA,SF,…
3,FL,Orlando,…
1,MA,Lowell,…
5,CA,Pasadena,… 3,FL,Miami,…
Extend
Extend
Extend
Table distributed on cid organize on state
1,MA,Boston,…
1,MA,Salem,…
1,MA,Lowell,…
3,FL,Tampa,,…
3,FL,Orlando,…
3,FL,Miami,…
5,CA,LA,…
5,CA,SF,…
5,CA,Pasadena,…
Extend
1,AL,Alabama,…
5,CA,California,… 10,FL,Florida,…
State
22,MA,Mass,…
22
![Page 23: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/23.jpg)
Minimize Data Read From Disk Clustered Base Table
0
0.5
1
1.5
2
2.5
No Org+Correct Dist Org+Correct Dist
Query Time with and without org
Time (min)
23
![Page 24: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/24.jpg)
Minimize Data Read From Disk Materialized Views
• View with frequently used columns of base table – Unlike views, materialized view stores data – Reduced data read from disk – Addi@onal storage required – Need to be refreshed if base table data changes
• Can be used as an index against base table – Stores loca@on of base table data loc in a column
24
![Page 25: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/25.jpg)
Extend
Extend
Extend
Minimize Data Read From Disk Materialized Views
T_EMP distributed on cid
1,MA,Mike,…
5,CA,Fally,…
3,FL,Chris,…
4,MA,Robert,…
7,CA,Mary,…
2,FL,Jus+n,…
6,MA,Harini,…
8,CA,Mike,…
9,FL,Martha,…
SELECT ID, NAME FROM T_EMP;
Extend
MV on T_EMP order by state, name
5,CA,Fally
7,CA,Mary
8,CA,Mike
3,FL,Chris
2,FL,Jus+n
9,FL,Martha
6,MA,Harini
1,MA,Mike
4,MA,Robert
SELECT * FROM T_EMP WHERE STATE = ‘CA’
25
![Page 26: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/26.jpg)
Minimize Data Read From Disk Materialized Views
0
0.2
0.4
0.6
0.8
1
1.2
1.4
No MV With MV
Query +me on single table -‐MV vs No MV
Time (min)
26
![Page 27: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/27.jpg)
Minimize Data Read From Disk Materialized Views
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
No MV With MV
Join Query Time -‐ With and without MV
Time (min)
27
![Page 28: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/28.jpg)
Minimize Data Read From Disk Minimize Data Stored
• Choose storage efficient data types – Difference between bigint and int is 4 bytes
• Use char instead of varchar if the data length is fixed – varchar has a 2 byte overhead
• Define columns as “not null” where possible
• Store only the required data in table columns
• Encode duplicate data stored in rows 28
![Page 29: NENUG Apr14 Talk - data modeling for netezza](https://reader034.vdocument.in/reader034/viewer/2022051412/54c452ee4a795970268b457a/html5/thumbnails/29.jpg)
Improve Computa@on in Snippet Processor NZ Object Defini+ons
• Define columns as “not null” where possible – Removes logic to check nulls
• Define table keys and rela@onships – Helps NZ query op@mizer to generate efficient code
29