denormalizing data with proc sql. demoralizing data with proc sql is that a real word?? spell...
TRANSCRIPT
![Page 1: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/1.jpg)
Denormalizing Data with PROC SQL
![Page 2: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/2.jpg)
Demoralizing Data withPROC SQL
Is that a real word?? Spell Check…
![Page 3: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/3.jpg)
Grocery Store
![Page 4: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/4.jpg)
What You HaveCUSTOMER_ID NAME
1 Ansel
2 Fiona
3 James
4 Kathy
5 Ying
6 Otto
7 Costas
8 Abdul
9 Enrico
10 Mitzu
ITEM_ID ITEM_NAME
1 eggs
2 milk
3 bread
4 chicken
5 beef
6 broccoli
7 carrots
8 apples
9 peaches
10 dog food
CUSTOMER_ID ITEM_ID
1 1
1 2
1 4
1 6
2 2
2 3
2 4
2 7
2 8
… …10 4
10 5
10 8
10 10
• Relational tables in normal form
• Purchase events in many-to-many relation
• Good for relational storage, bad for computing stats
• Data step: join tables using complex merge
![Page 5: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/5.jpg)
What you wantCUSTOMER_ID NAME EGGS MILK BREAD … DOG_FOOD
1 Ansel 1 1 0 … 0
2 Fiona 1 1 1 … 1
3 James 0 1 0 … 0
4 Kathy 1 0 1 … 0
5 Ying 1 1 1 … 1
6 Otto 0 0 1 … 1
7 Costas 0 0 1 … 1
8 Abdul 0 1 0 … 0
9 Enrico 1 0 1 … 1
10 Mitzu 0 1 0 … 1
• Matrix shows who bought what• One row per customer• One column per item• Easy to compute stats
![Page 6: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/6.jpg)
How do you get this?
A few SQL examplesto build up to a solution…
![Page 7: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/7.jpg)
SQL Examples
What items did customer #1 buy?
select item_id from purchaseswhere customer_id = 1;
ITEM_ID---------- 1 2 4 6
![Page 8: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/8.jpg)
What items did customer #1 buy?Join with grocery table to get item name
select P.item_id, G.item_name fromfrom purchases P, groceries Gwhere G.item_id = P.item_idand P.customer_id = 1;
ITEM_ID ITEM_NAME---------- --------- 1 eggs 2 milk 4 chicken 6 broccoli
![Page 9: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/9.jpg)
How many customers bought eggs?
Use SQL aggregate function count().
select count(*)
from purchases P, groceries G
where P.item_id = G.item_id
and G.item_name = 'eggs'
COUNT(*)
--------
5
![Page 10: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/10.jpg)
Did customer #1 buy eggs?
Restrict by customer, count() function returns 0 or 1, i.e., yes or no
select count(*)
from groceries G, purchases P
where P.item_id = G.item_id
and G.item_name = 'eggs'
and P.customer_id = 1;
COUNT(*)
--------
1
![Page 11: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/11.jpg)
Did customer #10 buy eggs?
select count(*)
from groceries G, purchases P
where P.item_id = G.item_id
and G.item_name = 'eggs'
and P.customer_id = 10;
COUNT(*)
--------
0
![Page 12: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/12.jpg)
Subqueries• In SQL, select clause can include a query that
returns a scalar value
select name, (select count(*) from purchases) num_purchasesfrom customers
NAME NUM_PURCHASES----------- -------------Ansel 58Fiona 58James 58Kathy 58Ying 58Otto 58Costas 58Abdul 58Enrico 58Mitzu 58
![Page 13: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/13.jpg)
Correlated Subqueries• Relate inner and outer queries via alias
select name, (select count(*)
from purchaseswhere customer_id = C.customer_id) num_purchases
from customers C;
NAME NUM_PURCHASES----------- -------------Ansel 4Fiona 9James 6Kathy 3Ying 8Otto 7Costas 7Abdul 2Enrico 7Mitzu 5
![Page 14: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/14.jpg)
Putting the pieces together
Joins to get data from multiple tables
Count() to get 0/1, yes/no
Correlated subqueries rotate rows to columns
Aliases to name columns
![Page 15: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/15.jpg)
Final queryselect customer_id, name, (select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'eggs' and P.customer_id = C.customer_id) eggs,
(select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'milk' and P.customer_id = C.customer_id) milk,
(select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'bread' and P.customer_id = C.customer_id) breadfrom customers C;
![Page 16: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/16.jpg)
SAS Codeproc sql;
create table Work.Purchase_Matrix as select
customer_id,name,(select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'eggs' and P.customer_id = C.customer_id) eggs,
(select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'milk' and P.customer_id = C.customer_id) milk,
(select count(*) from purchases P, groceries G where G.item_id = P.item_id and G.item_name = 'bread' and P.customer_id = C.customer_id) bread
from customers C;
quit;
![Page 17: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/17.jpg)
Final Dataset
CUSTOMER_ID NAME EGGS MILK BREAD
----------- ----------- ---------- ---------- ----------
1 Ansel 1 1 0
2 Fiona 1 1 1
3 James 0 1 0
4 Kathy 1 0 1
5 Ying 1 1 1
6 Otto 0 0 1
7 Costas 0 0 1
8 Abdul 0 1 0
9 Enrico 1 0 1
10 Mitzu 0 1 0
![Page 18: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/18.jpg)
Resources
SQL may not always be the most appropriate choice for a given problem. This technique starts to get untenable as the number of columns needed in the output increases.
DATA Step vs. PROC SQL: What’s a neophyte to do?
http://www2.sas.com/proceedings/sugi29/269-29.pdf
Proc SQL versus The Data Step
http://www.nesug.org/proceedings/nesug06/hw/hw06.pdf
![Page 19: Denormalizing Data with PROC SQL. Demoralizing Data with PROC SQL Is that a real word?? Spell Check…](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe01a28abf838cb32ae/html5/thumbnails/19.jpg)
Questions?