Physical Design: Indexes (2)
DATA WAREHOUSING Physical Design
Provide efficient access to relevant records
▪ Based on values of particular attribute(s)
▪ Same idea as the index in the back of a book
An index is a “thin” copy of a relation
▪ Not all columns from the relation are included
▪ The index is sorted in a particular way
An index supports efficient lookup
▪ Useful when filters are selective
▪ Avoids scanning rows that will be filtered out
Indexes are organized based on some search key
▪ Column (or set of columns) whose values are used to access the index
▪ Organization can be sorting or hashing
An index is built for some relation
▪ One index entry per record in the relation
An index consists of <Value, RID> pairs
▪ Value = value of the search key for this record
▪ RID = record identifier
  ▪ Tells the DBMS where the record is stored
  ▪ Usually (page number, offset in page)
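A minimal sketch of the <Value, RID> idea, assuming a sorted organization and an in-memory dictionary standing in for the stored pages. The `heap` contents, the column names `A` and `B`, and the helper names `build_index` and `lookup` are made up for illustration; a real DBMS would use a B-tree or hash structure on disk.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stored relation: RID = (page number, offset in page) -> record.
heap = {
    (0, 0): {"A": 5, "B": 10},
    (0, 1): {"A": 3, "B": 7},
    (1, 0): {"A": 5, "B": 2},
    (1, 1): {"A": 8, "B": 4},
}

def build_index(heap, key):
    """Build one (value, RID) entry per record, sorted by search-key value."""
    entries = sorted((rec[key], rid) for rid, rec in heap.items())
    keys = [value for value, _ in entries]
    rids = [rid for _, rid in entries]
    return keys, rids

def lookup(index, value):
    """Return the RIDs of all entries whose search key equals `value`."""
    keys, rids = index
    # Binary search on the sorted keys instead of scanning every record.
    return rids[bisect_left(keys, value):bisect_right(keys, value)]

index_on_a = build_index(heap, "A")
matching_rids = lookup(index_on_a, 5)
records = [heap[rid] for rid in matching_rids]  # follow RIDs to the rows
```

Only the two matching RIDs are dereferenced; the records with A = 3 and A = 8 are never touched.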
Traditional Access Methods
▪ B-trees, hash tables, R-trees, grids, …
Popular in Warehouses
▪ Covering indexes
▪ Multi-column indexes
▪ Join indexes
▪ Bitmap indexes
Idea behind a fact index:
▪ Thinner version of the fact table
▪ Index takes up less space than the fact table
▪ Fewer I/Os required to scan it
The index has 1 index entry per fact table row
▪ Regardless of how many columns are in the index
Sometimes an index has all the data you need
▪ Allows an index-only query plan
▪ Not necessary to access the actual tuples
▪ Such an index is called a covering index
SELECT COUNT(*) FROM R WHERE A=5
▪ Use the index on A
▪ Count the number of <5,RID> entries
▪ No need to look up the records referenced by the RIDs
Multi-column indexes are very useful in data warehousing
▪ We say such an index has a composite key
Example: B-Tree index on (A,B)
▪ Search key is the (A,B) combination
▪ Index entries are sorted by A value
▪ Entries with the same A value are sorted by B value
▪ Called a lexicographic sort
SELECT SUM(B) FROM R WHERE A=5
▪ Our (A,B) index covers this query!
Coverage vs. size trade-off
▪ More attributes in search key → index covers more queries
▪ More attributes in search key → index takes up more disk space
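The covering behaviour of the (A,B) index can be sketched as follows. This is an illustration only, assuming a sorted list in place of a B-Tree; the relation contents, the extra column `C`, and the function name `sum_b_where_a` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical relation R(A, B, C); list position stands in for the RID.
R = [
    {"A": 5, "B": 10, "C": "x"},
    {"A": 3, "B": 7, "C": "y"},
    {"A": 5, "B": 2, "C": "z"},
    {"A": 8, "B": 4, "C": "w"},
]

# Composite search key (A, B). Python tuples compare lexicographically,
# so sorting orders entries by A first and by B within equal A values.
index_ab = sorted(((row["A"], row["B"]), rid) for rid, row in enumerate(R))

def sum_b_where_a(index, a):
    """SELECT SUM(B) FROM R WHERE A = a, answered from index entries only.

    Returns 0 (not SQL NULL) when no entry matches.
    """
    # ((a,),) sorts just before every entry whose key starts with a.
    pos = bisect_left(index, ((a,),))
    total = 0
    while pos < len(index) and index[pos][0][0] == a:
        total += index[pos][0][1]  # B is read from the key; R is never read
        pos += 1
    return total

total = sum_b_where_a(index_ab, 5)
```

Column `C` is not in the search key, so a query that needs `C` would still have to follow the RIDs; that is the coverage side of the trade-off.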
Advantages
▪ Efficient computation of joins involving the first index columns (or all columns)
Disadvantages
▪ Useful only for specific join combinations
  ▪ For general usage, it is necessary to store a high number of indices
▪ Required space may be significant
  ▪ Joins always involve the fact table
Base table
Cust  Region   Type
C1    Asia     Retail
C2    Europe   Dealer
C3    Asia     Dealer
C4    America  Retail
C5    Europe   Dealer

Index on Region
RecID  Asia  Europe  America
1      1     0       0
2      0     1       0
3      1     0       0
4      0     0       1
5      0     1       0

Index on Type
RecID  Retail  Dealer
1      1       0
2      0       1
3      0       1
4      1       0
5      0       1
Query:
Get customers with Region = 'Asia' AND Type = 'Dealer'
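The query can be answered by intersecting two bit vectors. A small sketch, assuming each bitmap is held in a Python integer where bit i corresponds to RecID i + 1; the helper names `build_bitmap_index` and `matching_rows` are invented for this example.

```python
# Base table from the example; list position i corresponds to RecID i + 1.
customers = [
    ("C1", "Asia", "Retail"),
    ("C2", "Europe", "Dealer"),
    ("C3", "Asia", "Dealer"),
    ("C4", "America", "Retail"),
    ("C5", "Europe", "Dealer"),
]

def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps

def matching_rows(bitmap):
    """Return the row positions whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

region_index = build_bitmap_index(customers, 1)
type_index = build_bitmap_index(customers, 2)

# Index intersection is a single bitwise AND of the two bit vectors.
hits = region_index["Asia"] & type_index["Dealer"]
result = [customers[i][0] for i in matching_rows(hits)]
```

Here `result` contains only `C3`, the one row with a 1 in both the Asia and the Dealer columns.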
Good if domain cardinality is small
▪ Most useful for attributes with low or medium cardinality
  ▪ Not good for something like LastName
Index intersection plans with bitmap indexes are fast
▪ Just perform a bitwise AND!
▪ Index intersection with B-Trees requires a join
Bitmap indexes save space for low-cardinality attributes
▪ As compared to a B-Tree or hash index
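A back-of-the-envelope comparison shows why cardinality matters. The figures are assumptions, not measurements: an uncompressed bitmap index with one bit per row per distinct value, and an RID-based index that stores a 4-byte RID per row with key storage and tree overhead ignored.

```python
def bitmap_index_bytes(num_rows, cardinality):
    """Uncompressed bitmap index: one bit per row for each distinct value."""
    return num_rows * cardinality / 8

def rid_index_bytes(num_rows, rid_bytes=4):
    """Index holding one RID per row (assumed 4 bytes; other overhead ignored)."""
    return num_rows * rid_bytes

rows = 1_000_000
for cardinality in (3, 32, 1000):
    print(cardinality, bitmap_index_bytes(rows, cardinality), rid_index_bytes(rows))
```

Under these assumptions the two sizes are equal at 32 distinct values (8 bits per byte times 4 bytes per RID). Below that the bitmap index is smaller, and for something like LastName it is far larger unless the bit vectors are compressed.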
Bit vectors can be compressed
Compression pros and cons
▪ Reduce storage space → reduce number of I/Os required
▪ Need to compress/uncompress → increase CPU work required
▪ Each compression scheme negotiates this trade-off differently
▪ Operate directly on compressed bitmap → improved performance
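The slides do not name a compression scheme. As one illustration, run-length encoding works well on the long runs of zeros that sparse bit vectors contain, and some operations can run on the compressed form without decoding it. The function names below are hypothetical, and production systems typically use more elaborate word-aligned schemes.

```python
def rle_encode(bits):
    """Compress a list of 0/1 values into [bit, run_length] pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return runs

def rle_decode(runs):
    """Expand [bit, run_length] pairs back into the original bit list."""
    return [bit for bit, length in runs for _ in range(length)]

def rle_count_ones(runs):
    """Count set bits directly on the compressed form, without decoding."""
    return sum(length for bit, length in runs if bit == 1)

bits = [0] * 1000 + [1] * 3 + [0] * 500
runs = rle_encode(bits)
ones = rle_count_ones(runs)
```

The 1503-bit vector shrinks to three runs, and `rle_count_ones` reads only those three pairs. The cost is the extra CPU work in `rle_encode` and `rle_decode` whenever the raw bits are needed.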
Bit matrix which precomputes the join between a dimension and the fact table
▪ One column for each dimension RID
▪ One row for each fact table RID
▪ Cell (i,j) is 1 if fact table tuple i joins dimension tuple j, 0 otherwise
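The bit matrix can be sketched with nested lists. The tables, column names, and the function `build_join_matrix` are hypothetical, and list positions act as RIDs. The matrix is built once ahead of time; a query then reads it instead of performing the join.

```python
# Hypothetical dimension and fact tables; list positions act as RIDs.
dimension = [
    {"cust": "C1", "region": "Asia"},
    {"cust": "C2", "region": "Europe"},
    {"cust": "C3", "region": "Asia"},
]
fact = [
    {"cust": "C2", "amount": 40},
    {"cust": "C1", "amount": 15},
    {"cust": "C3", "amount": 25},
    {"cust": "C1", "amount": 30},
]

def build_join_matrix(fact, dimension, key):
    """matrix[i][j] is 1 if fact tuple i joins dimension tuple j, else 0."""
    return [[1 if f[key] == d[key] else 0 for d in dimension] for f in fact]

# Precomputation step: done when the index is built, not per query.
matrix = build_join_matrix(fact, dimension, "cust")

# Dimension RIDs selected by a predicate on the dimension table.
asia = [j for j, d in enumerate(dimension) if d["region"] == "Asia"]

# Fact RIDs that join a selected dimension tuple, read off the matrix.
fact_rids = [i for i, row in enumerate(matrix) if any(row[j] for j in asia)]
total = sum(fact[i]["amount"] for i in fact_rids)
```

Each fact row has exactly one 1 in its matrix row here because `cust` identifies a single dimension tuple.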
Indexing dimensions
▪ Attributes frequently involved in selection predicates
▪ If domain cardinality is high, then B-tree index
▪ If domain cardinality is low, then bitmap index
Indices for join
▪ Indexing only foreign keys in the fact table is rarely appropriate
▪ Star join index should be used with caution (column order issue)
▪ Bitmapped join index is suggested (if available)
Indices for group by
▪ Use materialized views