programmability in spss 16 & 17, jon peck
Programmability in SPSS 16 and 17
Jon K Peck
Technical Advisor and Principal Software Engineer
Athens, May 2008
Agenda
• Review of programmability
• The Extension mechanism and the PROPOR procedure
• User-defined dialog boxes
• The Dataset class and comparing datasets
• Examples: custom sorting, pattern matching
• Building applications that embed SPSS
• Integrating R into SPSS
• Q and A
• Wrap up
Programmability extends the standard SPSS capabilities
• Makes it easy to build jobs that respond to data, output, and the environment
• Allows greater generality and more automation
• Makes jobs more flexible and robust
• Allows extending the capabilities of SPSS
• Allows the use of existing or new statistical modules written in R or Python
• Enables simpler and more maintainable code
• Increases your productivity
• Puts you in control
• More fun
Programmability embeds Python or R inside SPSS
SPSS syntax
BEGIN PROGRAM PYTHON or R.
Python or R code
END PROGRAM.
SPSS syntax
• A program in the SPSS input stream can communicate with SPSS, control it, and use the language's own facilities and modules
• A Python or .NET application can embed SPSS inside itself
• Resources and forums are at SPSS Developer Central, www.spss.com/devcentral
• Programmability plug-ins are an optional install
BEGIN PROGRAM.
import spss, spssaux, spssdata

def findUnlabelledValues(name):
    """Print the values of variable name that have no value label."""
    d = spssaux.VariableDict()
    labels = set(d[name].ValueLabelsTyped)   # the labelled values
    data = spssdata.Spssdata(indexes=[name])
    values = set()
    for case in data:
        values.add(case[0])
    data.close()
    values.discard(None)   # system-missing values are returned as None
    print "\nUnlabeled Values:\n", sorted(values.difference(labels))

findUnlabelledValues("origin")
END PROGRAM.
Example: Automate the job of finding unlabelled values of a variable
No label may indicate an error
Unlabeled Values:
[4.0, 7.0, 11.0]
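The heart of findUnlabelledValues is a set difference between the values that occur in the data and the values that carry labels. A minimal stand-alone sketch, assuming the labels arrive as a dict keyed by value and the data as a plain list with None standing in for system-missing; it does not use the spss modules, and the sample labels and data are hypothetical:

```python
def find_unlabelled_values(observed, value_labels):
    """Return the sorted values in observed that have no entry in value_labels.

    observed: iterable of case values; None represents system-missing.
    value_labels: dict mapping a value to its label.
    """
    values = set(observed)
    values.discard(None)  # missing values are not reported
    return sorted(values.difference(value_labels))


# Hypothetical data for illustration only
labels = {1.0: "American", 2.0: "European", 3.0: "Japanese"}
data = [1.0, 2.0, 4.0, 3.0, None, 7.0, 11.0, 1.0]
print("Unlabeled Values:", find_unlabelled_values(data, labels))
```

Taking the difference against the dict works because iterating a dict yields its keys, which is also why the slide's program can build its label set directly from ValueLabelsTyped.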
Python and R are open-source software
SPSS is not the owner or licensor of the Python or R software. Any user of Python or R must agree to the terms of the license agreement located on the Python or R web site. SPSS is not making any statement about the quality of the Python or R programs. SPSS fully disclaims all liability associated with your use of the Python or R programs.
SPSS is divided into two parts
The SPSS Processor: invisible
– Syntax processing
– Computation
– Data handling
– Procedures
– May be remote with SPSS Server
The SPSS Front End: what you see
– Menus and dialog boxes
– Output Viewer
– Data Editor
– Syntax window
SPSS 16 added new programmability and scripting features
SPSS 15
SPSS Processor
– SPSS syntax
– Python programs
– .NET programs
SPSS Front End
– SaxBasic scripts
– COM support
SPSS 16
SPSS Processor
– SPSS syntax
– Python programs
– .NET programs
– R programs
– Extensions
SPSS Front End
– Basic scripts (Windows)
– COM support (Windows)
– Python scripts
Scripting is useful for working with Viewer contents
• Scripts can be written in Python or, on Windows, in Basic
• Python APIs have a structure similar to familiar SaxBasic scripting
– Import the spssClient module
• IDEs are provided for Python and Basic
• SPSS 17 will allow programs to use the spssClient module
• Autoscripts are triggered by specified types of output events
– E.g., creating a table of regression coefficients
Autoscripts have been generalized in SPSS 16
Python and R add great functionality to SPSS
• Many users know only SPSS syntax:
MEANS TABLES = accel BY origin
  /CELLS MEAN COUNT STDDEV MEDIAN
  /STATISTICS LINEARITY.
• Extensions define SPSS syntax for programs via XML
• Definitions are loaded automatically on SPSS startup
• Parsed syntax is passed to a Python or R module
• The user never needs to know about the programs
• The author never needs to parse SPSS syntax
• The PLS module in SPSS 16 is an extension
The EXTENSION mechanism turns Python or R programs into user-defined SPSS syntax
Extensions simplify the author's job
[Figure: how an extension command is processed, showing the user's SPSS syntax, the SPSS parser, the extension XML, the author's code module (Run), the extension module (Template, parsecmd), and the output]
The author supplies only the gold parts
The user just enters the command syntax
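Conceptually, the SPSS parser turns the user's command into keyword values, and the author's module only has to act on them. The sketch below imitates that division of labour in plain Python; the dictionary layout, the run function, and propor_report are hypothetical stand-ins, not the actual structure SPSS passes to an extension module:

```python
def propor_report(num, denom, alpha=0.05):
    """Author's code: receives already-parsed values, never raw syntax text."""
    proportions = [n / d for n, d in zip(num, denom)]
    return {"alpha": alpha, "proportions": proportions}


def run(parsed):
    """Hypothetical entry point that receives the parsed command as a dict."""
    level = parsed.get("LEVEL", {})
    return propor_report(parsed["NUM"], parsed["DENOM"], level.get("ALPHA", 0.05))


# What a parser might pass along for: PROPOR NUM=55 DENOM=100.
print(run({"NUM": [55], "DENOM": [100]}))
```

Because defaults such as the alpha level are filled in by the entry point, the author's function never has to deal with optional subcommands being absent.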
PROPOR is a new extension procedure
• Calculates confidence intervals for proportions
• Produces pivot table output
PROPOR /HELP.
Confidence Intervals for Proportions and Differences in Proportions.
PROPOR /HELP displays this help and does nothing else.
Syntax:
PROPOR NUM=list DENOM=list [ID=varname]
  [/DATASET NAME=dsname]
  [/LEVEL ALPHA=value]
  [/HELP]
Example:
PROPOR NUM=55 DENOM=100.
Available from SPSS Developer Central
PROPOR produces a pivot table of confidence intervals
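The slides do not say which interval PROPOR computes. As a hedged illustration, here is the Wilson score interval for a single proportion, one common choice; PROPOR's actual method, and its handling of differences in proportions, may differ:

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(num, denom, alpha=0.05):
    """Wilson score confidence interval for the proportion num/denom."""
    if denom <= 0:
        raise ValueError("denom must be positive")
    p = num / denom
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z2n = z * z / denom
    center = (p + z2n / 2) / (1 + z2n)
    half_width = z * sqrt(p * (1 - p) / denom + z2n / (4 * denom)) / (1 + z2n)
    return center - half_width, center + half_width


# The same numbers as the PROPOR example: NUM=55 DENOM=100
low, high = wilson_interval(55, 100)
print(round(low, 3), round(high, 3))  # roughly 0.452 0.644
```

Unlike the simple p ± z·sqrt(p(1-p)/n) interval, the Wilson interval stays inside [0, 1] even for proportions near 0 or 1.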
What about user interfaces?
SPSS 17
User-defined dialog boxes look like SPSS-defined dialogs
Which is the real one?
SPSS 17
Programmability can enhance procedures: A program to customize sorting in CTABLES
CTABLES /TABLE occupation[COUNT]
  /CATEGORIES VARIABLES=occupation ORDER=D KEY=COUNT TOTAL=YES.
This table is sorted in descending order of count, but the category "Other" should be at the bottom.
A Program to Customize Sorting in CTABLES
import spss, spssaux2
# Define a macro, !other, listing the occupation categories with the special value 4 placed last
spssaux2.genCategoryList("occupation", specialvalues=[4], macroname="other")
spss.Submit("""CTABLES /TABLE occupation[COUNT]
  /CATEGORIES VARIABLES=occupation [!other] TOTAL=YES.""")
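The ordering idea behind the generated category list can be sketched in plain Python: sort the categories by descending count, then move the special values to the end. This illustrates the ordering only, with made-up counts; it is not the spssaux2 implementation, which (judging by its macroname argument) also defines an SPSS macro:

```python
from collections import Counter


def category_order(values, special_values=()):
    """Order categories by descending count, with special values placed last."""
    counts = Counter(values)
    # Sort by count (descending), breaking ties by category value
    ordered = sorted(counts, key=lambda cat: (-counts[cat], cat))
    regular = [cat for cat in ordered if cat not in special_values]
    special = [cat for cat in ordered if cat in special_values]
    return regular + special


# Hypothetical occupation codes; 4 stands for "Other"
occupation = [3, 3, 3, 1, 2, 2, 4, 4, 4, 4, 4]
order = category_order(occupation, special_values=[4])
print(order)
# The ordered codes could then be written into a CTABLES category list
print("/CATEGORIES VARIABLES=occupation [%s] TOTAL=YES." % " ".join(map(str, order)))
```

Here "Other" is the most frequent category, so ORDER=D KEY=COUNT alone would put it first; the explicit list keeps it at the bottom.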
Python regular expressions greatly simplify tasks involving patterns in strings
• A regular expression defines a pattern that can be searched for or used in a replacement
• Example: a dataset contains three variables, firstname, lastname, and narrative. The names need to be replaced in the narratives so that the narratives are anonymous
Sample data:
Using regular expressions to work with patterns: Making a narrative anonymous
begin program.
import spss, spssaux, spssdata, re
vard = spssaux.VariableDict()
# Open the data for writing so that a new variable can be added
curs = spssdata.Spssdata(indexes='firstname lastname narrative', accessType='w')
# New string variable, 100 bytes wider than narrative to allow for the replacement text
curs.append(spssdata.vdef("anonnarrative",
    vtype=vard['narrative'].VariableType + 100))
curs.commitdict()
wbound = r"\b"   # word boundary, so that only whole words match
for case in curs:
    # re.escape keeps any regular-expression metacharacters in a name from being interpreted
    fnregex = re.compile(wbound + re.escape(case.firstname.strip()) + wbound, flags=re.IGNORECASE)
    lnregex = re.compile(wbound + re.escape(case.lastname.strip()) + wbound, flags=re.IGNORECASE)
    newnarr = fnregex.sub("-firstname-", case.narrative)
    newnarr = lnregex.sub("-lastname-", newnarr)
    curs.casevalues([newnarr])
curs.CClose()
end program.
For example, \bSmith\b matches "Smith" only as a whole word, not inside "Smithers".
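The same technique outside SPSS: a stand-alone sketch of the replacement step for a single case, using only Python's re module. The function name anonymize and the sample sentence are hypothetical, and the empty-name check is an addition not present in the slide's program:

```python
import re


def anonymize(narrative, firstname, lastname):
    """Replace whole-word, case-insensitive occurrences of the names."""
    result = narrative
    for name, placeholder in ((firstname, "-firstname-"), (lastname, "-lastname-")):
        name = name.strip()
        if not name:
            continue  # an empty pattern would match at every word boundary
        # re.escape treats characters such as "." in a name literally
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", flags=re.IGNORECASE)
        result = pattern.sub(placeholder, result)
    return result


print(anonymize("John Smith said that smith and Smithers met john.", "John", "Smith"))
```

The lowercase "smith" and "john" are replaced because of re.IGNORECASE, while "Smithers" survives because the \b after "Smith" requires a word boundary.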
Before and After
The Dataset class delivers new functionality for data management
• Available for Python and .NET
• Retrieve, add, delete, and change variables, properties, and values
• Process multiple datasets at the same time
• Access any case by case number
• Included in the spss module in the plug-in
SPSS 16
import spss
spss.StartDataStep()   # Dataset objects are used inside a data step
ds = spss.Dataset()   # the active dataset
ds.varlist['accel'].label = "acceleration"   # change a variable label
print len(ds.cases)   # number of cases
ds.cases[10,2] = [100]   # change a value (case and variable indexes start at 0)
spss.EndDataStep()
comparedatasets uses the Dataset class to compare cases and variables in two datasets
BEGIN PROGRAM.
import spss, comparedatasets
c = comparedatasets.CompareDatasets("first", "second",
    idvar="id", diffcount="differences", reportroot="compare")
c.cases()
c.dictionaries()
c.close()
END PROGRAM.
As an extension:
COMPDS DS1=first, DS2=second
  /DATA ID=id DIFFCOUNT=differences ROOTNAME=compare.
Available from SPSS Developer Central
Comparedatasets: The output dataset reports case differences
comparedatasets: A summary is written to the SPSS Viewer
You can do selection, summary statistics, and charts on the outcome variables for further information.
SPSS 17 will have a built-in procedure
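The core of a case comparison can be sketched with plain dictionaries: match cases on an ID, then count the variables whose values differ. This is a hypothetical illustration of the idea, not the comparedatasets code, which works on SPSS datasets through the Dataset class and also compares the dictionaries:

```python
def compare_cases(first, second, idvar="id"):
    """Return {id: number of differing variables} for cases in either dataset.

    first, second: lists of dicts mapping variable name to value.
    Cases whose ID appears in only one dataset are reported with None.
    """
    second_by_id = {case[idvar]: case for case in second}
    first_ids = set()
    differences = {}
    for case in first:
        key = case[idvar]
        first_ids.add(key)
        other = second_by_id.get(key)
        if other is None:
            differences[key] = None  # no matching case in second
            continue
        shared = (set(case) & set(other)) - {idvar}
        differences[key] = sum(1 for var in shared if case[var] != other[var])
    for key in second_by_id.keys() - first_ids:
        differences[key] = None  # no matching case in first
    return differences


first = [{"id": 1, "x": 10, "y": "a"}, {"id": 2, "x": 20, "y": "b"}, {"id": 3, "x": 30, "y": "c"}]
second = [{"id": 1, "x": 10, "y": "a"}, {"id": 2, "x": 21, "y": "B"}, {"id": 4, "x": 40, "y": "d"}]
print(compare_cases(first, second))
```

A per-case count like this is what makes the follow-up step on the slide possible: selecting, summarizing, or charting the cases with nonzero differences.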
The Dataset class makes it easy to use the functions in the extendedTransforms module
data list fixed /dt(a21).
begin data.
2/22/2008 11:47:45 AM
2/22/2008 11:47:45 PM
end data.
begin program.
import spss, extendedTransforms
spss.StartDataStep()
ds = spss.Dataset()
ds.varlist.append("newdt", 0)   # 0 = numeric variable
ds.varlist[-1].format = (22,22,0)   # DATETIME22.0 format
for i, case in enumerate(ds.cases):
    # Convert the string in dt into an SPSS datetime value in newdt
    ds.cases[i, -1] = extendedTransforms.strtodatetime(case[0],
        "%m/%d/%Y %I:%M:%S %p")
spss.EndDataStep()
end program.
strtodatetime and datetimetostr allow patterns to be used for dates and times
14 functions in extendedTransforms
Available from SPSS Developer Central
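SPSS stores a datetime as the number of seconds since midnight, October 14, 1582. Assuming that convention, the conversion strtodatetime performs can be approximated with the standard library's strptime and the same format pattern; this sketch, including the name str_to_spss_datetime, is hypothetical and not the extendedTransforms code:

```python
from datetime import datetime

# SPSS date and time values count seconds from this origin
SPSS_EPOCH = datetime(1582, 10, 14)


def str_to_spss_datetime(text, pattern):
    """Parse text with a strptime pattern and return seconds since the SPSS origin."""
    parsed = datetime.strptime(text.strip(), pattern)
    return (parsed - SPSS_EPOCH).total_seconds()


pattern = "%m/%d/%Y %I:%M:%S %p"
am = str_to_spss_datetime("2/22/2008 11:47:45 AM", pattern)
pm = str_to_spss_datetime("2/22/2008 11:47:45 PM", pattern)
print(pm - am)  # 43200.0: the two values are 12 hours apart
```

The %I (12-hour clock) and %p (AM/PM) directives work together, which is why the two sample strings differ by exactly 12 hours once converted.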
You can write applications in which SPSS is hidden, using external mode, where the application drives SPSS
Application built by SPSS Services
A Reporting Application
Real names have been scrambled
• Written entirely in Python
• Uses SPSS invisibly for calculation and charting
• Output is captured with the Output Management System (OMS)
• Uses free packages to supplement SPSS
– wxPython for the user interface
– ReportLab for PDF production
• Similar things could be done with .NET
The application was built with Python, SPSS, and standard Python packages
R programs can be run inside SPSS
• SPSS datasets and output can be processed by R
• New SPSS datasets can be created from R
• R can communicate with SPSS via 30 APIs
BEGIN PROGRAM R.
cases <- spssdata.GetDataFromSPSS(c("mpg", "accel"), 5)
spsspivottable.Display(cases, collabels=c("mpg", "accel"))
END PROGRAM.
• Output appears in the SPSS Viewer
• spsspivottable.Display produces pivot tables
• print() produces plain text
• SPSS 17 will include graphical output
R brings many statistical methods into SPSS
52 packages starting with "a"
Example: Estimate Rents Using the R Package kknn: K Nearest Neighbors
BEGIN PROGRAM R.
dict <- spssdictionary.GetDictionaryFromSPSS()
data <- spssdata.GetDataFromSPSS()
library(kknn)
kl <- c("rectangular", "triangular", "epanechnikov", "gaussian", "rank")
# Choose the best k (up to 25) and kernel by leave-one-out cross-validation
t.con <- train.kknn(nmqm ~ wfl + bjkat + zh, data=data, kmax=25, kernel=kl)
print(t.con)
# Define the new dataset: the original variables followed by predictedRent
newv <- spssdictionary.CreateSPSSDictionary(c("predictedRent",
    "Predicted Rent", 0, "F8.2", "scale"))
spssdictionary.SetDictionaryToSPSS("newrents", data.frame(dict, newv))
# Position of the best model's fitted values in t.con$fitted.values
best <- (charmatch(t.con$best.parameters$kernel, kl) - 1) * 25 +
    t.con$best.parameters$k
# Columns must follow the dictionary order: original data first, then the prediction
spssdata.SetDataToSPSS("newrents",
    data.frame(data, c(t.con$fitted.values[[best]])))
spssdictionary.EndDataStep()
END PROGRAM.
(Adapted from an example in the kknn package)
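For readers new to the method, a bare-bones k-nearest-neighbour regression in plain Python: the prediction is the unweighted mean of the k closest training responses, which corresponds to kknn's rectangular kernel, and k is chosen by leave-one-out error. The function names and data are hypothetical; the sketch omits kknn's other kernels and its variable scaling and is not a substitute for the package:

```python
from math import dist


def knn_predict(train_x, train_y, point, k):
    """Mean response of the k training points closest to point (Euclidean distance)."""
    neighbours = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], point))[:k]
    return sum(train_y[i] for i in neighbours) / k


def choose_k(train_x, train_y, kmax):
    """Pick k in 1..kmax with the smallest leave-one-out mean squared error."""
    best_k, best_err = None, float("inf")
    for k in range(1, kmax + 1):
        err = 0.0
        for i in range(len(train_x)):
            rest_x = train_x[:i] + train_x[i + 1:]
            rest_y = train_y[:i] + train_y[i + 1:]
            err += (knn_predict(rest_x, rest_y, train_x[i], k) - train_y[i]) ** 2
        err /= len(train_x)
        if err < best_err:
            best_k, best_err = k, err
    return best_k


# Hypothetical one-predictor data lying on the line y = x
xs = [(float(v),) for v in range(10)]
ys = [float(v) for v in range(10)]
print(knn_predict(xs, ys, (4.5,), 2))  # 4.5: mean of the neighbours at 4 and 5
print(choose_k(xs, ys, 5))
```

Leaving each case out when scoring k matters: with the case itself in the training set, k = 1 would always reproduce the observed value and look perfect.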
R output appears in the Viewer. The output data appear in the Data Editor
Where We Have Been Today
• Programmability adds flexibility and power to SPSS
• The extension mechanism integrates programs better into SPSS syntax
• The new Dataset class adds data management power
• The new scripting capabilities provide more ways to work with output
• R integration opens a large collection of statistical techniques to SPSS users
Questions and Answers
In Conclusion
• Programmability capabilities continue to grow
• Opening up SPSS puts you in control through plugging in your own code
• More tasks can be automated
• You can easily tap large R and Python libraries
• New capabilities extend data management
• The Extension mechanism integrates capabilities with a consistent syntax
Tell us about your programmability experiences
Jon Peck, Ph. D.
SPSS Inc.
233 S Wacker Drive
Chicago, IL