SAUSAG 69 – 20 Feb 2014
Smarter Sorts
Jerry Le Breton (Softscape Solutions) & Doug Lean (DHS)
Beyond the Obvious
Sorting – The Obvious First
Why Sort?
“Data and information is almost always presented in a sorted or structured way”
Sorting – The Obvious First

    proc sort data=claims;
      by claim client;
    run;

Sort puts your records in order, BY the values of the variables you list.

It’s important to know your data:
• How many variables
• How many distinct data values for each
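One way to find out how many distinct values each BY variable has is the NLEVELS option of PROC FREQ. This is only a minimal sketch, assuming the claims data set and the claim and client variables used on the slide:

```sas
/* Count the distinct values of each candidate BY variable.        */
/* NLEVELS prints a "Number of Variable Levels" table; NOPRINT on  */
/* the TABLES statement suppresses the (possibly very long)        */
/* frequency tables themselves.                                    */
proc freq data=claims nlevels;
    tables claim client / noprint;
run;
```

A variable with very few levels may be better handled with CLASS processing than with a sort.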
Sorting – Do You Need To?

    proc sort data=claims;      /* an unnecessary SORT */
      by claim;
    run;
    proc tabulate ...;
      class claim;
      ...

Some PROCs do their own sorting:
• TABULATE
• MEANS
• REPORT
• SQL
(which can run out of memory for really big data sets)
Sorting – Do You Need To?

Only use PROC SORT before REPORT, TABULATE or MEANS if there’s another reason later.
For PROC MEANS, substitute BY with CLASS, e.g.

    PROC MEANS NWAY;
      CLASS x y z;

is similar to

    PROC SORT;
      BY x y z;
    PROC MEANS;
      BY x y z;

and saves significant time by avoiding the SORT.
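A fuller sketch of the CLASS approach, applied to the claims data set from the earlier slides. The analysis variable amount and the output names claim_stats and total_amount are hypothetical, chosen only for illustration:

```sas
/* CLASS version: no prior PROC SORT is needed.                   */
/* NWAY keeps only the rows for the full client*claim             */
/* combinations, which is what BY-group output would contain.     */
proc means data=claims nway noprint;
    class client claim;
    var amount;                            /* hypothetical analysis variable */
    output out=claim_stats sum=total_amount;
run;
```

One difference to keep in mind: CLASS drops observations with a missing class value unless the MISSING option is added, whereas BY treats missing values as a group of their own.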
Sort Only What You Need

Sort just the rows you want…

    proc sort data=claims out=Sorted_claims;
      where client =: 'A';
      by claim;
    run;

… and just the columns you want…

    proc sort data=claims(keep = c:) out=Sorted_claims;
      by claim;
    run;

Leaving out unwanted rows and columns can produce dramatic performance improvements.
Sorting – Proc Sort vs Proc SQL

    /* SORT Procedure */
    proc sort data=claims;
      by client claim;
    run;

    /* SQL Procedure */
    proc sql;
      create table claims as
        select * from claims
        order by client, claim;   /* ORDER BY items are comma-separated */
    quit;

Both will sort your data, with no significant performance difference. Choose according to clarity, functional requirement and efficiency. Make it as clear and simple as possible!
Sorted Status of a Data Set

    proc sort data=claims;
      by claim client;
    run;

Sort Information
    Sortedby        CLAIM CLIENT
    Validated       YES
    Character Set   ANSI

Sort status is saved as part of a SAS data set, so SAS won’t waste time re-sorting if it’s already in the required order.
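The Sort Information shown on this slide comes from PROC CONTENTS output. A minimal sketch for checking it yourself, assuming the same claims data set:

```sas
/* Sort in place, then inspect the data set's descriptor. */
proc sort data=claims;
    by claim client;
run;

/* The "Sort Information" section is listed near the end of the */
/* PROC CONTENTS output when the data set carries a sort flag.  */
proc contents data=claims;
run;
```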
Setting Sorted Status of a Data Set

    data client_claims (sortedby = client);
      merge clients claims;
      by client;
    run;

Sort Information
    Sortedby        CLIENT
    Validated       NO
    Character Set   ANSI

If you know a data set is sorted, say so with the SORTEDBY= option, so SAS won’t waste time re-sorting later. (Validated NO means SAS is taking your word for it.)
Presorted or Notsorted
    proc sort data=claims out=sorted presorted;
      by claim;
    run;

Use the PRESORTED option when the data is probably already sorted: SAS will check and only sort if necessary.

    proc print data=grouped_claims;
      by claim NOTSORTED;
    run;

There is no need to sort if the data is already grouped BY the required variable. It doesn’t matter that it’s not sorted; you just have to say so with NOTSORTED.
Sorting and Maintaining Order

    proc sort data=claims;
      by claim;
    run;

By default, SAS maintains the original order of records within a BY group.

    proc sort data=claims noequals;
      by claim;
    run;

Using the NOEQUALS option means SAS won’t necessarily retain the original ordering. This is more efficient, but it directly affects the results of using NODUPKEY.
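When it matters which record NODUPKEY keeps, one defensive habit is to state EQUALS explicitly rather than rely on the default. A minimal sketch, assuming the claims data set from the slide; the output name first_per_claim is hypothetical:

```sas
/* EQUALS preserves the input order within each BY group, so the    */
/* record NODUPKEY keeps is the first one as it appeared in claims. */
proc sort data=claims out=first_per_claim nodupkey equals;
    by claim;
run;
```

With NOEQUALS in its place, the surviving record for a duplicated claim value would not be guaranteed to be the first one in the input.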
Sorting Duplicates

    proc sort data=claims out=no_duplicates nodupkey;
      by claim;
    run;

NODUPKEY effectively keeps the first record of any duplicates.

    proc sort data=claims out=no_duplicates
              dupout=dups nodupkey;
      by claim;
    run;

DUPOUT= writes the duplicates to a separate table.
Separating Unique & Duplicate Rows

    proc sort data=claims out=sorted;
      by claim;
    run;

    data unique_claims dup_claims;
      set sorted;
      by claim;
      if first.claim and last.claim then output unique_claims;
      else output dup_claims;
    run;

It works, but needs an extra pass of the data.
Separating Unique & Duplicate Rows – the smarter way

    proc sort data=claims out=duplicates
              uniqueout=uniques nouniquekey;
      by claim;
    run;

NOUNIQUEKEY ensures no records with a unique key are written to the OUT= table…
…and the UNIQUEOUT= option directs the unique records to a separate table.
Sorting – Case Insensitive

    proc sort data=names out=simply_sorted;
      by name;
    run;

Upper case letters come before lower case in the ASCII collating sequence.

    data names2;
      set names;
      upcase_name = upcase(name);
    run;

    proc sort data=names2 out=upcase_sorted(keep=name);
      by upcase_name;
    run;

Creating an upper (or lower) case copy of the variable is the old solution.
Sorting – Case Insensitive – Smarter

    proc sort data=names out=linguistic_sorted sortseq=linguistic;
      by name;
    run;

The SORTSEQ= option either specifies the collating sequence (ASCII, EBCDIC, other languages) or, with LINGUISTIC, modifies the current collating sequence.
The effect is to make the sort case insensitive.
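A hedged aside: as far as I understand the default collation strength, plain SORTSEQ=LINGUISTIC still uses case to break ties between otherwise identical names. If names that differ only by case should count as the same key (for example together with NODUPKEY), the STRENGTH=PRIMARY sub-option is one way to ask for that. The output name primary_sorted is hypothetical:

```sas
/* STRENGTH=PRIMARY compares base letters only, so differences in */
/* case (and in accents) are ignored when comparing BY values.    */
proc sort data=names out=primary_sorted
          sortseq=linguistic(strength=primary);
    by name;
run;
```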
Sorting – Case Insensitive – with SQL

    proc sql;
      create table sql_sorted as
        select * from names
        order by upcase(name);
    quit;

PROC SQL allows the use of functions in the ORDER BY (and other) clauses.
The result is different from PROC SORT using SORTSEQ=LINGUISTIC.
Sorting Out Spaces

    proc sort data=names out=simply_sorted;
      by name;
    run;

A standard sort is obviously no use.

    data names_temp;
      set names;
      temp_name = upcase(compress(name));
    run;

    proc sort data=names_temp out=temp_sorted(keep=name);
      by temp_name;
    run;

Creating another variable for sorting, without spaces, is the old solution.
Sorting Out Spaces

    proc sql;
      create table sql_sorted as
        select * from names
        order by upcase(compress(name));
    quit;

Proc SQL can do it too.

    proc sort data=names out=alt_handling_sorted
              sortseq = linguistic(alternate_handling = shifted);
      by name;
    run;

Proc SORT can too! The ALTERNATE_HANDLING=SHIFTED sub-option of the LINGUISTIC sortseq option effectively ignores spaces as well as being case-insensitive.
Sorting by Numbers

Sorting text with numeric prefixes, e.g. student id and name …

    proc sort data=students out=simply_sorted;
      by student;
    run;

… results in nothing useful!
Sorting by Numbers

An extra data step can create a numeric variable to sort with (as can SQL, of course).

    data students_temp;
      set students;
      student_num = input(scan(student,1), 2.);
    run;

    proc sort data=students_temp out=temp_sorted(keep=student);
      by student_num;
    run;

    proc sql;
      create table sql_sorted as
        select * from students
        order by input(scan(student,1), 2.);
    quit;
Sorting by Numbers

The NUMERIC_COLLATION sub-option of the LINGUISTIC sortseq option sorts by the numeric values that prefix the variable values.

    proc sort data=students out=num_collation_sorted
              sortseq = linguistic (numeric_collation=on);
      by student;
    run;
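To try this out without the original data, here is a small self-contained sketch. The four student values are made up for illustration and are not from the presentation:

```sas
/* Hypothetical test data: a numeric id followed by a name. */
data students;
    length student $20;
    infile datalines truncover;
    input student $20.;
    datalines;
10 Smith
2 Jones
1 Brown
21 Adams
;
run;

/* Numeric collation compares the leading digits as numbers, */
/* not character by character.                               */
proc sort data=students out=num_collation_sorted
          sortseq=linguistic(numeric_collation=on);
    by student;
run;

proc print data=num_collation_sorted noobs;
run;
```

A plain character sort of the same data would put "10 Smith" ahead of "2 Jones", because "1" collates before "2"; with numeric collation the ids should come out as 1, 2, 10, 21.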
Questions?
Did you learn something new from this presentation?