understanding the benefits of ibm spss statistics server · 2017-01-20 · ibm spss statistics...


Business Analytics · IBM Software · IBM® SPSS® Statistics Server

Understanding the Benefits of IBM SPSS Statistics Server

Introduction

IBM SPSS Statistics Server is robust, powerful analytical software that seamlessly scales from the analytical needs of a single department to those of hundreds of users across the enterprise. It provides all of the features of IBM SPSS Statistics, plus capabilities that deliver faster performance, more efficient processing of large datasets and enhanced security in enterprise deployments.

Statistics Server’s client/server architecture, its ability to take advantage of multiple processors and cores, and its advanced analytical procedures specially tuned to work with large datasets enable organizations with massive amounts of data to optimize performance on data transformations, reporting, and analytics – whether data resides in a central data center or across distributed offices.

In benchmark testing designed to simulate a typical production environment, we found that most analytical procedures run faster on the Statistics Server than on the Statistics client¹, including:

• Data transformation procedures (add files, aggregates, match files, etc.): on average, 6 times faster on Statistics Server
• Sort procedure: on average, 3.35 times faster on Statistics Server
• Commonly used model-building procedures (regression, GLM, Mixed, nomreg, etc.): on average, 3 times faster on Statistics Server

This report discusses the high-performance capabilities available with Statistics Server, provides detailed benchmarking results and addresses other important benefits such as job automation, scheduling and scoring data.

Contents:

Introduction
Performance 101: Understanding the drivers of better performance
Why performance is faster with Statistics Server
Comparing performance between the Statistics Server and the Statistics client
Increase analyst productivity
Automating jobs with Statistics Server
Scoring new data with Statistics Server
Guidelines for purchasing Statistics Server
Conclusion
Appendix A: Description of local and distributed mode
Appendix B: Benchmark test details
Appendix C: Benchmark test results
About SPSS, an IBM Company

1 The results described here are based on testing done in IBM SPSS laboratories. Although our test environments simulate typical production environments in the field, we cannot guarantee that organizations performing similar tests will see identical results. This data is presented for general guidance. Actual results will vary depending on the configuration of the Statistics Server and clients (number of CPU cores, RAM, disk speed, etc.). For more details on the benchmarking performed, see Appendix B.


Performance 101: Understanding the drivers of better performance

A number of parameters can affect the performance of an analytical procedure, including the number of central processing units (CPUs) or cores, the amount of random access memory (RAM), the speed and configuration of the disk drives, and the location of the data being analyzed.

Number of CPUs/cores

Ideally, analytical procedures should run twice as fast on two CPUs, three times as fast on three CPUs, and so on. However, such perfect scalability is rarely achieved in practice, and the performance benefits of multiple CPUs/cores vary from procedure to procedure, as explained below.

Degree of parallelization

This is the extent to which a procedure can be parallelized, or broken into multiple independent tasks. Procedures that can easily be parallelized and scheduled to run simultaneously on different CPUs/cores benefit the most. Procedures that are inherently serial or require a lot of disk I/O (for example, crosstabs and frequencies) benefit little from multiple CPUs/cores.
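To make the limit on scalability concrete, the sketch below uses Amdahl's law. This is a standard textbook model, not part of this paper's benchmarks. If a fraction p of a procedure's work can be parallelized, then n cores can speed it up by at most 1 / ((1 - p) + p/n). The parallel fractions used here are hypothetical values chosen for illustration, not measured properties of any Statistics procedure.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Hypothetical procedures: one highly parallel, one dominated by serial work or disk I/O.
    for label, p in (("highly parallel, p=0.95", 0.95), ("mostly serial, p=0.30", 0.30)):
        for cores in (2, 4, 8, 16):
            print(f"{label:24s} cores={cores:2d} max speedup={amdahl_speedup(p, cores):.2f}")
```

Even with p = 0.95, 16 cores give at most about 9.1 times the single-core speed. The model ignores parallelization overhead, which only lowers the achievable figure further.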

Parallelization overhead

This is the overhead associated with breaking a procedure into independent tasks, scheduling each task and then merging the results. Because operating systems and hardware platforms differ in the way tasks are partitioned and distributed across CPUs/cores, the parallelization overhead can be expected to vary between platforms.
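The break-up, schedule and merge steps can be sketched in Python as follows. This is a generic illustration, not how Statistics Server implements its procedures. The data is split into chunks. Each chunk's count, mean and sum of squared deviations are computed as independent tasks on a thread pool. The partial results are then merged with the standard pairwise combination formula for variance.

In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock. The sketch therefore shows the structure, not a speedup. Swapping in `ProcessPoolExecutor` would give real parallelism, at the cost of extra overhead for moving data between processes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import math


def chunk_summary(chunk):
    """Independent task: (count, mean, sum of squared deviations) for one slice."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge_summaries(a, b):
    """Merge step: combine two partial summaries without revisiting the raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def parallel_mean_variance(values, n_tasks=4):
    """Break up the data, schedule one task per chunk, then merge the partial results."""
    if not values:
        raise ValueError("values must not be empty")
    size = math.ceil(len(values) / n_tasks)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:  # scheduling step
        partials = list(pool.map(chunk_summary, chunks))
    n, mean, m2 = reduce(merge_summaries, partials)
    return mean, m2 / n  # population variance


if __name__ == "__main__":
    print(parallel_mean_variance([float(x) for x in range(1, 11)]))
```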

Memory

Memory, in the context of this paper, refers to the amount of physical RAM on the machine. For best performance, the entire dataset that an analytical procedure operates on should be held in RAM, because accessing data in RAM is much faster than accessing data on disk. If the dataset cannot be held in its entirety in RAM, there is a cost associated with swapping parts of the dataset between RAM and disk.

Disk drives/computer storage devices

Although there are several storage device technologies and configurations, high-end hard drives spin at 10,000 to 15,000 rpm and can achieve sustained transfer rates of up to 125 MB/sec. High-speed storage devices can dramatically improve performance when performing data transformations such as sorts, merges and aggregates on large datasets.
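A rough back-of-the-envelope lower bound shows why disk speed matters. It is an assumption-laden estimate, not a benchmark. The time spent just moving a dataset through a disk is its size divided by the sustained transfer rate. The 125 MB/sec default comes from the paragraph above. The number of passes over the data is a hypothetical parameter: a sort or merge on data that does not fit in RAM typically has to read and write the data more than once.

```python
def min_io_seconds(dataset_mb: float, transfer_mb_per_s: float = 125.0, passes: int = 1) -> float:
    """Lower bound on time spent reading/writing `dataset_mb` megabytes `passes` times.

    Ignores seek time, caching and CPU work, so real runs take longer.
    """
    if transfer_mb_per_s <= 0:
        raise ValueError("transfer rate must be positive")
    return dataset_mb * passes / transfer_mb_per_s


if __name__ == "__main__":
    # A 2.1 GB file (about the size of dataset 1 in Appendix B), treated as 2,100 MB.
    for passes in (1, 2, 4):
        print(f"{passes} pass(es): at least {min_io_seconds(2100, passes=passes):.1f} s")
```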

Accessing files over a LAN vs. WAN

Simply stated, a local area network (LAN) is the network technology used within an office to access datasets, and a wide area network (WAN) is the network technology used across offices. Although the speed of the LAN and WAN will vary depending on the type of technology and the configuration, accessing files over a LAN is anywhere from 20 to 40 times faster than accessing files over a WAN. An analytical procedure therefore runs much faster when the dataset is accessed over a LAN than when it is accessed over a WAN.

Why performance is faster with Statistics Server

No need to transfer datasets between distributed offices

When the Statistics Server is used with the Statistics client in distributed mode (see Appendix A for a description of distributed mode), it supports a client/server architecture. In this configuration, the Statistics Server is installed in the central data center, close to the data. Users across the enterprise (in central and distributed offices) use the Statistics client to connect to the Statistics Server. All analytical processing and data access take place on the Statistics Server; only the results of the analysis are transferred over the network to the Statistics client. This makes the Statistics Server an ideal solution for users in remote offices or users who travel frequently and need access to analytical capabilities on the go.

Because large datasets no longer need to be transferred to end users’ desktops, the amount of data sent over the network is minimized. This prevents bandwidth saturation and improves the performance not only of the Statistics application but also of other mission-critical network applications, such as e-mail, enterprise resource planning (ERP) and customer relationship management (CRM).
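The difference in network traffic can be illustrated with a toy Python sketch. It is not Statistics Server code, and several pieces are stand-ins:

- the dataset is 100,000 hypothetical random values;
- JSON stands in for whatever wire format a real product uses;
- `server_side_summary` is a made-up stand-in for an analytical procedure.

In local mode the whole dataset crosses the network. In distributed mode only the summary does.

```python
import json
import random


def payload_bytes(obj) -> int:
    """Size of an object once serialized for the network (JSON used as a stand-in)."""
    return len(json.dumps(obj).encode("utf-8"))


def server_side_summary(rows):
    """Hypothetical analysis that runs next to the data; only its result is sent back."""
    n = len(rows)
    return {"n": n, "mean": round(sum(rows) / n, 4), "min": min(rows), "max": max(rows)}


random.seed(0)
rows = [round(random.uniform(0, 1000), 2) for _ in range(100_000)]  # hypothetical dataset

local_mode_bytes = payload_bytes(rows)                             # entire dataset to the desktop
distributed_mode_bytes = payload_bytes(server_side_summary(rows))  # results only

print(f"local mode: {local_mode_bytes:,} bytes; distributed mode: {distributed_mode_bytes:,} bytes")
```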

Time to access a data file

File size | Statistics client connecting directly to the data over a WAN (T1, 3.0 Mbps) | Statistics client connecting to the Statistics Server at the data center over a WAN (T1, 3.0 Mbps) | Time saved with Statistics Server
50 MB | 2 min, 10 secs | 4 secs | 2 min, 6 secs
250 MB | 10 min, 50 secs | 40 secs | 10 min, 10 secs
1 GB | 43 min, 17 secs | 80 secs | 41 min, 57 secs

Table 1. Comparing time to access data using the Statistics client in local mode (accessing files in the data center directly over the WAN) vs. accessing the same data using the Statistics client to connect to the Statistics Server over the WAN, with data access handled by the Statistics Server²
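The WAN column of Table 1 can be sanity-checked with simple arithmetic. An idealized transfer takes size × 8 / bandwidth seconds: megabytes are converted to megabits, then divided by megabits per second. This idealized model is our assumption. It ignores latency and protocol overhead, so it only lands in the same range as the measured values. For example, it gives about 133 seconds for 50 MB, against 2 min, 10 secs in the table. The sketch also reproduces the Time saved column by subtracting the two measured times.

```python
def transfer_seconds(size_mb: float, bandwidth_mbps: float = 3.0) -> float:
    """Idealized time to move `size_mb` megabytes over a `bandwidth_mbps` megabit/s link."""
    return size_mb * 8 / bandwidth_mbps


def fmt(seconds: float) -> str:
    """Format seconds the way Table 1 does, e.g. '2 min, 6 secs'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min, {secs} secs"


# Measured values from Table 1, in seconds: (direct over WAN, via Statistics Server)
TABLE_1 = {"50 MB": (130, 4), "250 MB": (650, 40), "1 GB": (2597, 80)}

if __name__ == "__main__":
    for size_label, size_mb in (("50 MB", 50), ("250 MB", 250), ("1 GB", 1024)):
        direct, via_server = TABLE_1[size_label]
        print(f"{size_label}: idealized {fmt(transfer_seconds(size_mb))}, "
              f"measured {fmt(direct)}, time saved {fmt(direct - via_server)}")
```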

We recommend Statistics Server for organizations with distributed offices that need to access files greater than 25 MB across offices.


2 The results are based on the assumption that the available bandwidth is 3.0 Mbps. In reality, the time saved will be greater as bandwidth is taken up by other applications such as e-mail, network backups, etc. The data presented here is for illustrative purposes only. Actual results will vary depending on the configuration, bandwidth, and latency of the WAN; therefore, organizations performing similar tests may not see identical results.


As shown in Table 1, significant time savings can be achieved with Statistics Server when accessing files across distributed offices: for example, about 2 minutes for a 50 MB file, 10 minutes for a 250 MB file, and 42 minutes for a 1 GB file.

Multithreading

Multithreading is the technique of breaking a task into multiple threads that can be executed in parallel. As discussed above, not all analytical procedures can take advantage of multithreading. The procedures that are multithreaded in Statistics are listed in Table 2 below. In Statistics Server, there is no limit to the number of threads supported per procedure. The number of threads can be configured automatically for a user or group, or set manually, and users can also set the number of threads on a per-procedure basis.

Procedure family | Procedure name
Correlations | Bivariate, Partial
Regression | Linear, Ordinal, Multinomial, Logistic
Data Reduction | Factor Analysis
Survival Analysis | Cox Regression, Logistic Regression
Multiple Imputation | Impute missing values

Table 2: List of multithreaded analytical procedures

As shown in Appendix C, the benefits of multithreading become more pronounced as the number of variables³ increases (wide datasets). The results of the benchmark testing show that the performance of the following commonly used analytical procedures improved significantly as the number of threads increased from 4 to 8:⁴

• Linear regression procedure: improved by 52 percent
• Factor procedure: improved by 43 percent
• Cox regression procedure: improved by 24 percent
• Correlation procedure: improved by 24 percent

Additional details on the benchmark tests that demonstrate the benefits of multithreading can be found in Appendix C.
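The percentages quoted for multithreading can be recomputed from the run times in Table 6 as (t4 - tn) / t4, where tn is the run time with n threads. The short sketch below does this for the four procedures listed. The quoted figures (52, 43, 24 and 24 percent) match the comparison between 4 and 8 threads; the 16-thread runs give different values. The dictionary and function names are our own.

```python
# Run times in seconds from Table 6: (4 threads, 8 threads, 16 threads)
TABLE_6 = {
    "Linear regression": (30.93, 14.74, 11.83),
    "Factor": (62.66, 35.53, 24.11),
    "Cox regression (Csscoxreg)": (565.86, 431.40, 351.89),
    "Correlation": (24.60, 18.61, 22.39),
}


def percent_saved(baseline_s: float, new_s: float) -> float:
    """Percentage of the baseline run time saved by the new configuration."""
    return 100.0 * (baseline_s - new_s) / baseline_s


if __name__ == "__main__":
    for name, (t4, t8, t16) in TABLE_6.items():
        print(f"{name:28s} 4->8 threads: {percent_saved(t4, t8):6.2f}%   "
              f"4->16 threads: {percent_saved(t4, t16):6.2f}%")
```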


3 The term variables refers to the number of columns or predictors in your dataset.

4 The results shown are based on testing done in the laboratories of SPSS, an IBM Company. Although our test environments simulate typical production environments in the field, we cannot guarantee that organizations performing similar tests will see identical results. This data is presented for general guidance. Actual results will vary depending on the configuration of the Statistics Server and clients (number of CPU cores, RAM, disk speed, etc.)


Support for 64-bit computing

The total amount of RAM supported depends on the processor. In theory, 32-bit processors are limited to accessing 4 GB of RAM. In practice, the RAM available to an application on a 32-bit machine is much lower, for several reasons:

• Most machines with 32-bit processors are not configured with 4 GB of RAM, because RAM is expensive
• The operating system requires some RAM as well

Hence, even on 32-bit machines configured with the maximum amount of RAM, the RAM available to the application is approximately 2 to 3 GB. On machines with 64-bit processors, the amount of RAM supported is many times higher. Analytical procedures that run on large datasets will run much more slowly on a 32-bit machine than on a 64-bit machine, because of the disk activity required to swap parts of the dataset into and out of RAM.
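The 4 GB figure follows directly from the size of a 32-bit address: 2^32 bytes is 4 GiB. The sketch below does that arithmetic and then applies a simple fit check to numbers from this paper. Treating a dataset's file size as its memory footprint is our simplifying assumption, since in-memory size can differ from file size. The 2.5 GB usable figure is a hypothetical midpoint of the 2 to 3 GB range above. The check is therefore only indicative.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits: int) -> float:
    """Theoretical memory addressable with pointers of the given width."""
    return 2 ** address_bits / GIB


def fits_in_ram(dataset_gb: float, usable_ram_gb: float) -> bool:
    """Crude check: can the whole dataset be held in RAM without swapping?"""
    return dataset_gb <= usable_ram_gb


if __name__ == "__main__":
    print(f"32-bit: {addressable_gib(32):.0f} GiB addressable")
    print(f"64-bit: {addressable_gib(64):,.0f} GiB addressable")
    # Appendix B: 3 GB dataset; hypothetical 2.5 GB usable on a 32-bit client vs 8 GB RAM on the server.
    print("3 GB dataset on a 32-bit client (2.5 GB usable):", fits_in_ram(3.0, 2.5))
    print("3 GB dataset on the 64-bit server (8 GB RAM):", fits_in_ram(3.0, 8.0))
```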

SQL Pushback

The Statistics Server supports the pushback of sorts and aggregates to a SQL database. When large datasets are sourced from a SQL database, SQL pushback ensures that operations that can be performed more efficiently in the database are performed there.
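The idea behind pushback can be illustrated with Python's built-in sqlite3 module. This is a generic sketch of the concept, not the SQL that Statistics Server generates, and the table and values are hypothetical. Without pushback, every row is fetched and aggregated in the client process. With pushback, the database performs the GROUP BY and ORDER BY and returns only one row per group.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 7.5), ("east", 2.5), ("south", 1.0)],
)

# Without pushback: every row crosses into the client, which then aggregates it.
pulled = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = defaultdict(float)
for region, amount in pulled:
    client_totals[region] += amount

# With pushback: the database aggregates and sorts; only one row per group comes back.
pushed = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(f"rows fetched without pushback: {len(pulled)}, with pushback: {len(pushed)}")
conn.close()
```

On a real warehouse table the gap between the two row counts, and therefore the data moved, grows with the size of the table.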

Support for advanced analytical procedures tuned for large datasets with many predictors

Statistics Server supports advanced procedures such as Naïve Bayes and the Predictor Selector algorithm, which are specially designed for wide datasets with a large number of predictors. These analytical procedures are not available in the Statistics client when configured in local mode.

Support for server operating systems and hardware

The Statistics Server is designed to support server operating systems and hardware. Desktop operating systems, namely Windows® XP and Vista®, are limited to two processors or sockets⁵. Server operating systems generally support a greater number of processors or sockets. As discussed above, procedures that can be parallelized run much faster on an operating system that supports more sockets or processors. Server operating systems also have several sophisticated features that improve performance, scalability and resilience.

Unlike the Statistics Base client, which is limited to a maximum of four CPUs or cores, an analytical procedure performed on the Statistics Server can access an unlimited number of CPUs and cores.


5 Windows Vista and XP do not limit the number of cores per socket.


Statistics Server is ideal for organizations with a single office that need to perform analysis on files that are greater than 100 MB

Comparing performance between the Statistics Server and the Statistics client

Results of specific procedures⁶ run on both the Statistics Server and the Statistics client demonstrate that:

• Data transformation procedures (add files, aggregates, match files, etc.) run on average 6 times faster on Statistics Server
• The sort procedure runs on average 3.35 times faster on Statistics Server
• Commonly used modeling procedures such as regression, GLM, Mixed and nomreg run on average 3 times faster on Statistics Server

Rather than simply timing several procedures independently, the benchmark was structured to simulate a typical job run in a production environment. Groups of related procedures were assembled into test suites, each meant to reflect a type of analysis or data processing that a Statistics user might execute in the course of a day’s work. Five test suites were developed:

1. Data transformations: add files, aggregates, cases to variables, sort, etc.
2. Simple multithreaded procedures: correlation, factor, etc.
3. Building models: GLM, mixed, nomreg
4. Data mining: trees
5. Statistical calculations: beta, srange, smod, poisson, etc.

Groups of related procedures | Time saved with Statistics Server | Average speedup with Statistics Server
Data transformations | 64.95% | 5.92
Sort | 69.90% | 3.35
Commonly used multithreaded procedures (N=10M cases) | 47.52% | 2.31
Building models | 62.19% | 2.90
Data mining | 43.98% | 1.44
Statistical calculations | 62.44% | 2.90
AVERAGE | 60.60% | 2.54

Table 3: Benchmarking results for jobs run on Statistics Server and the Statistics client⁷


6 The results shown are based on testing done in IBM SPSS laboratories. Although our test environments simulate typical production environments in the field, we cannot guarantee that organizations performing similar tests will see identical results. This data is presented for general guidance. Actual results will vary depending on the configuration of the Statistics Server and clients (number of CPU cores, RAM, disk speed, etc.)

7 The results shown in Table 3 are based on testing done in IBM SPSS laboratories. Although our test environments simulate typical production environments in the field, we cannot guarantee that organizations performing similar tests will see identical results. This data is presented for general guidance. Actual results will vary depending on the configuration of the Statistics Server and clients (number of CPU cores, RAM, disk speed, etc.)


The results in Table 3 show that, across the full benchmark, the Statistics Server completed the jobs 2.54 times faster than the Statistics client, a 60.6 percent reduction in the total time for a typical Statistics job.
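The overall figures in Table 3 can be reproduced from the TOTAL TIME row of Table 5: 2151.73 seconds on Statistics Server and 5460.61 seconds on the client. They are ratios of summed run times rather than averages of the per-group rows. The two measures are linked: speedup = 1 / (1 - fraction of time saved). The helper names below are our own.

```python
def time_saved_pct(server_s: float, client_s: float) -> float:
    """Share of the client run time saved by running on the server, in percent."""
    return 100.0 * (client_s - server_s) / client_s


def speedup(server_s: float, client_s: float) -> float:
    """How many times faster the server run is than the client run."""
    return client_s / server_s


if __name__ == "__main__":
    server_total, client_total = 2151.73, 5460.61  # TOTAL TIME row of Table 5
    saved = time_saved_pct(server_total, client_total)
    print(f"time saved: {saved:.2f}%  speedup: {speedup(server_total, client_total):.2f}x")
    # Same ratio seen two ways: speedup == 1 / (1 - saved / 100)
    print(f"check: {1 / (1 - saved / 100):.2f}x")
```

The same helpers reproduce individual rows of Table 5. For example, ADD FILES (18.45 s vs. 169.34 s) gives a speedup of 9.18.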

Description of capability | Statistics Server | Statistics client configured in local mode
Supports client/server architecture, so datasets don’t have to be downloaded to a user’s desktop | Yes | No. All files need to be downloaded to the user’s desktop.
Supports multiple processors and cores | No limit to the number of CPUs and cores supported | Number of threads is limited to 4, which limits the number of CPUs and cores supported to 4.
Supports server operating systems and hardware | Yes | No

Table 4. The reasons why a job run on Statistics Server is faster than a job run on the Statistics client.

Table 4 compares the capabilities of Statistics Server with those of the Statistics client configured in local mode, to illustrate why jobs can run significantly faster using the server software.

Additional information on the benchmarking tests, including the test suite procedures, dataset sizes and the configuration of the Statistics Server and client, is provided in Appendix B.

Increase Analyst Productivity

Statistics Server’s high-performance capabilities enable organizations to achieve significant gains in productivity. When users are connected to a Statistics Server in distributed mode, they can initiate multiple analytical jobs concurrently. This is an important advantage over the client software, particularly when performing data transformation jobs on large datasets. Because all of the processing is done on the Statistics Server, users can continue to work on their desktops while running several jobs at the same time.

Automating jobs with Statistics Server

The Statistics batch facility available with Statistics Server is ideal for jobs that are repetitive and need to be performed at regular intervals. Efficiencies are realized because the manual tasks associated with running weekly, monthly or quarterly reports are minimized.


Analysts can run multiple analytical jobs at the same time while continuing to work on their desktops.


Additionally, when Statistics Server is used with IBM® SPSS® Collaboration and Deployment Services, these jobs can be scheduled automatically, leveraging this platform’s content management and scheduling capabilities. Run-time variables are supported, allowing the same job to be run multiple times with different input parameters. More importantly, the output of the job (the report, etc.) can be stored in the repository and accessed directly by business users through a dashboard. (A Web interface is available with Collaboration and Deployment Services.)

Scoring new data with Statistics Server

The Statistics Server ships with a scoring engine that allows new data to be scored. Users connected to Statistics Server in distributed mode can open one or more models created in Statistics, IBM® SPSS® Modeler or IBM® SPSS® AnswerTree®, and score new data. This capability is not available with the Statistics client in local mode.
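Scoring means applying a model's stored parameters to new cases. The sketch below shows that idea for a logistic regression model. It is not the Statistics Server scoring engine or its model format: the model, variable names, coefficients and new cases are all hypothetical.

```python
import math

# Hypothetical stored model: a logistic regression with two predictors.
MODEL = {"intercept": -1.5, "coefficients": {"age": 0.03, "visits": 0.4}}


def score(record: dict, model: dict = MODEL) -> float:
    """Apply the stored coefficients to one new case and return a probability."""
    z = model["intercept"] + sum(
        weight * record[name] for name, weight in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


new_cases = [{"age": 40, "visits": 2}, {"age": 25, "visits": 0}]  # hypothetical new data
scores = [score(case) for case in new_cases]

if __name__ == "__main__":
    for case, p in zip(new_cases, scores):
        print(case, f"predicted probability = {p:.3f}")
```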

Guidelines for purchasing Statistics Server

The Statistics Server is especially designed for the following scenarios:

• Organizations with distributed offices looking to centralize their data and IT infrastructure in one or more data centers
• Organizations with distributed offices that need to analyze and share files greater than 25 MB across offices
• Organizations looking to virtualize applications and desktops using enabling technologies such as Citrix® Terminal Server. These servers are tuned for presenting applications and user interfaces and are not designed to handle the CPU- and I/O-intensive workload of analytic jobs. Statistics Server offloads the heavy processing from the Citrix/Terminal Server machine, ensuring better performance and availability.
• Organizations that need to perform analysis on large datasets (greater than 100 MB) sourced from a SQL server or a data warehouse

Conclusion

Statistics Server is sophisticated analytical server software that provides robust, scalable analytical capabilities when working with large datasets. It supports a client/server architecture that enables organizations to pursue a centralization strategy. Because large datasets do not have to move across offices for analysis, performance improves, resulting in greater analyst productivity and efficiency in distributed offices.


In addition, because Statistics Server is a foundational technology, organizations that invest in it can leverage it in many ways. For example, Statistics Server, when integrated with Collaboration and Deployment Services, enables them to:

• Automate scheduling of Statistics jobs
• Store the output of a Statistics job in a portal where it can be accessed by business users
• Deploy simplified analytical capabilities targeted to business users via a Web interface for jobs executed on Statistics Server

When integrated with Modeler, Statistics Server enables organizations to:

• Take advantage of advanced data mining algorithms and a complementary, process-driven approach for building and scoring models
• Integrate advanced model management and deployment capabilities seamlessly with existing business processes
• Excel in today’s fast-paced business environment by building and deploying many highly accurate models without requiring deep statistical expertise

Appendix A: Description of local and distributed mode

Local mode

When running in local mode, all analysis is performed on the user’s desktop computer, using the desktop’s own CPU resources. All of the data being analyzed must be transferred to the user’s desktop (see Figure 1). If users perform transformations on data located in a shared network resource, the transformed data must be transferred back across the network to be saved on the file server or database. As the size of the data and the number of users increase, these data transfers can consume an appreciable amount of network bandwidth. This degrades network performance and the performance of other mission-critical applications that run on the network, such as ERP, CRM and e-mail. It makes local mode more suitable for organizations with a single office and relatively small datasets.

Figure 1. Statistics run in local mode.


Distributed mode

In distributed mode, all analysis is performed on the Statistics Server, located in the central data center (typically co-located with the data files). Because the analysis is performed on the Statistics Server, there is no need to transfer data to individual users’ desktops. Since all data transfers are localized between the Statistics Server and the file server or database, performance is greatly improved. Only the results of the analysis, typically a fraction of the size of the original data, are transferred to the Statistics client.

Figure 2. Statistics in distributed mode.

Appendix B: Benchmark test details

Configuration

All testing was done using the batch facility⁸ of Statistics Server, with datasets local to the Statistics Server. It is reasonable to expect similar results when using a Statistics client to connect to the Statistics Server (distributed mode). Running a job in distributed mode carries a small overhead compared with running the same job through the batch facility. In distributed mode, the results of the analysis are transferred across the network from the Statistics Server to the end user’s machine. The batch facility instead writes the results to a disk drive or network share accessible to the Statistics Server. Because the output of an analysis is typically small, the overhead of transferring it over a properly configured network is minimal.

Repeated trials

To help control for the chance variation of any single test run, each test suite was repeated three times, and the average time in seconds is reported.
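The repeated-trials approach can be sketched as a small timing harness. The `example_suite` function is a hypothetical stand-in for a test suite: it just sorts a list. `time.perf_counter` and `statistics.mean` are standard-library calls. Averaging three runs smooths out one-off effects such as a cold disk cache. With only three runs, however, the mean can still be swayed by a single outlier.

```python
import statistics
import time


def average_runtime(job, trials: int = 3):
    """Run `job` `trials` times; return the mean wall-clock time and the individual timings."""
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        job()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), timings


def example_suite():
    """Hypothetical stand-in for a benchmark test suite."""
    data = [(i * 7919) % 100_003 for i in range(200_000)]
    data.sort()


if __name__ == "__main__":
    mean_s, runs = average_runtime(example_suite)
    print(f"runs: {[round(r, 3) for r in runs]}  mean: {mean_s:.3f} s")
```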


8 Typically, the client for Statistics Server is the Statistics client running on a desktop computer. The Statistics Server batch facility is an alternative way to use the power of the Statistics Server. StatisticsB is a command-line executable that runs on the server computer where the Statistics Server is installed. StatisticsB is intended for automated production of statistical reports, that is, running analyses without user intervention. Automated production is advantageous when users need to perform repetitive, time-consuming analyses, such as weekly reports. StatisticsB takes as its input a syntax file containing the data transformations and/or analytical procedures to run, along with command-line arguments that control the format of the output or customize it.


Configuration of the Statistics Server
CPU: 4 CPUs, Intel Xeon 3 GHz, dual-core, Hyper-Threaded
RAM: 8 GB
Operating system: Windows Server 2003, 64-bit

Configuration of the Statistics client
CPU: 1 CPU, Intel T7500, 2.19 GHz, dual-core
RAM: 3 GB
Operating system: Windows XP, 32-bit

Details on the datasets

Two datasets were used:

• Dataset 1: 2.1 GB, 5 million cases, 127 variables
• Dataset 2: 3 GB, 10 million cases, 127 variables (used for simple multithreaded procedures; see Table 5 for details)

Groups of related procedures | Statistics Server (64-bit)*, time in seconds | Statistics Client (32-bit)**, time in seconds | Time saved | Average speedup (multiple of times faster)
Data transformations
ADD FILES | 18.45 | 169.34 | 89.10% | 9.18
AGGREGATE | 33.19 | 94.95 | 65.04% | 2.86
CASESTOVARS | 8.84 | 7.94 | -11.34% | 0.90
MATCH FILES | 22.00 | 224.17 | 90.19% | 10.19
VARSTOCASES | 0.08 | 0.31 | 74.19% | 3.88
UNIFORM (simple COMPUTE) | 38.13 | 217.80 | 82.49% | 5.71
Average time saved: 61.44%
Average speedup: 6.02
Sort
SORT NUMERIC | 146.90 | 578.73 | 74.62% | 3.94
SORT STRING | 183.28 | 526.33 | 65.18% | 2.87
Average time saved: 69.90%
Average speedup: 3.35

Table 5: Benchmarking data comparing Statistics Server with the Statistics client⁹

* Number of threads: 8
** Number of threads: 2


9 The data shown is based on testing done in IBM SPSS laboratories. Although our test environments simulate typical production environments in the field, we cannot guarantee that organizations performing similar tests will see identical results. This data is presented for general guidance. Actual results will vary depending on the configuration of the Statistics Server and clients (number of CPU cores, RAM, disk speed, etc.)


Groups of related procedures | Statistics Server (64-bit)* | Statistics client (32-bit)** | Time saved | Speedup (times faster)

Simple multithreaded procedures (N = 10 million cases)
CORRELATION | 230.78 | 800.83 | 71.18% | 3.47
FACTOR | 140.95 | 219.22 | 35.70% | 1.56
PARTIAL CORR | 141.75 | 217.64 | 34.87% | 1.54
REGRESSION (120 dependent variables) | 145.55 | 281.72 | 48.34% | 1.94
Average time saved: 47.52%
Average speedup: 2.31

Building models
GLM | 70.09 | 350.91 | 80.03% | 5.01
MIXED | 116.23 | 174.13 | 33.25% | 1.50
NOMREG | 57.45 | 170.48 | 66.30% | 2.97
REGRESSION | 30.64 | 99.36 | 69.16% | 3.24
Average time saved: 62.19%
Average speedup: 2.90

Data mining
TREES | 615.00 | 885.49 | 43.98% | 1.44
Average time saved: 43.98%
Average speedup: 1.44

Statistical calculations
BETA | 40.12 | 106.20 | 62.22% | 2.65
CFVAR & BETA | 73.75 | 250.19 | 70.52% | 3.39
POISSON BERNOULLI | 38.55 | 84.87 | 54.58% | 2.20
Average time saved: 62.44%
Average speedup: 2.90

TOTAL TIME | 2151.73 | 5460.61 | 60.60% | 2.54

Table 5 (continued)

* Number of threads: 8   ** Number of threads: 2
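The group-level averages in Table 5 appear to follow two different conventions: "Average time saved" matches the arithmetic mean of the per-row percentages, while "Average speedup" matches total client time divided by total server time. This reading is our inference, not something the table states; it reproduces the published averages for the Sort, Simple multithreaded procedures, Building models, Data mining and Statistical calculations groups. A minimal Python sketch, using a hypothetical `group_summary` helper:

```python
from statistics import mean


def group_summary(rows):
    """Summarize one procedure group from (server_time, client_time) pairs.

    Assumed conventions (inferred from Table 5, not documented there):
      - average time saved = mean of the per-row (client - server) / client
      - average speedup    = total client time / total server time
    """
    avg_saved = mean((client - server) / client for server, client in rows)
    avg_speedup = sum(client for _, client in rows) / sum(server for server, _ in rows)
    return avg_saved, avg_speedup


# Building models group from Table 5: (server time, client time).
building_models = [
    (70.09, 350.91),   # GLM
    (116.23, 174.13),  # MIXED
    (57.45, 170.48),   # NOMREG
    (30.64, 99.36),    # REGRESSION
]

saved, factor = group_summary(building_models)
print(f"Average time saved {saved:.2%}, average speedup {factor:.2f}")
# prints: Average time saved 62.19%, average speedup 2.90
```

Taking the ratio of totals weights long-running procedures more heavily than a simple mean of the per-row speedups would: for Building models, the mean of the four per-row speedups is 3.18 rather than the published 2.90.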


Appendix C: Benchmark test results

Multithreaded procedure | File size | Number of cases | Number of variables | Time (s), 4 threads | Time (s), 8 threads | Time (s), 16 threads | Time saved, 8 vs. 4 threads | Time saved, 16 vs. 4 threads
Discriminant | 351 MB | 200,000 | 202 | 7.48 | 7.04 | 7.37 | 5.88% | 5.88%
Csscoxreg | | | | 565.86 | 431.40 | 351.89 | 23.76% | 23.76%
Sort | 2.7 GB | 2,000,000 | 13 | 255.20 | 220.64 | 224.80 | 13.54% | 13.54%
Csordinal | | | | 45.27 | 49.33 | 54.25 | -8.97% | -8.97%
Cslogistic | 48 MB | 100,000 | 50 | 70.71 | 57.64 | 59.20 | 18.48% | 18.48%
Linear regression | 703 MB | 200,000 | 400 | 30.93 | 14.74 | 11.83 | 52.34% | 52.34%
Factor | 703 MB | 200,000 | 400 | 62.66 | 35.53 | 24.11 | 43.30% | 43.30%
Correlation | | | | 24.60 | 18.61 | 22.39 | 24.35% | 24.35%
Partial correlation | | | | 16.14 | 11.33 | 11.35 | 29.80% | 29.80%
Nomreg | | | | 16.73 | 13.64 | 14.34 | 18.47% | 18.47%
Csselect | | | | 42.67 | 42.10 | 42.52 | 1.34% | 1.34%
TOTAL TIME | | | | 1138.25 | 902.00 | 824.05 | |
Overall percentage time saved | | | | | | | 20.76% | 27.60%

Table 6: Benchmarking results demonstrating performance improvements as the number of threads increases¹⁰

As the number of threads increases from 4 to 8:
• The linear regression procedure improves by 52 percent
• The factor procedure improves by 43 percent
• The Cox regression procedure improves by 24 percent
• The correlation procedure improves by 24 percent
• Overall, performance for the multithreaded procedures improves by 20.76 percent at 8 threads, and by 27.60 percent when the number of threads increases from 4 to 16
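The overall percentages in Table 6 are time saved relative to the 4-thread run, computed from the column totals. The Python sketch below reproduces them, along with the linear regression figure at 8 threads; the helper name `saved_vs_baseline` is hypothetical and the dictionaries simply copy values from Table 6.

```python
# Column totals from Table 6, in seconds, keyed by number of threads.
totals = {4: 1138.25, 8: 902.00, 16: 824.05}

# Linear regression row from Table 6, in seconds, keyed by number of threads.
linear_regression = {4: 30.93, 8: 14.74, 16: 11.83}


def saved_vs_baseline(baseline: float, elapsed: float) -> float:
    """Fraction of the baseline (4-thread) run time saved by a faster run."""
    return (baseline - elapsed) / baseline


for threads in (8, 16):
    pct = saved_vs_baseline(totals[4], totals[threads])
    print(f"All procedures, 4 -> {threads} threads: {pct:.2%} time saved")

lr_pct = saved_vs_baseline(linear_regression[4], linear_regression[8])
print(f"Linear regression, 4 -> 8 threads: {lr_pct:.2%} time saved")
```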


10 The data shown is based on testing done in IBM SPSS laboratories. Although our test environments simulate typical production environments in the field, we cannot guarantee that organizations performing similar tests will see identical results. This data is presented for general guidance. Actual results will vary depending on the configuration of the Statistics Server and clients (number of CPU cores, RAM, disk speed, etc.).


About SPSS, an IBM Company SPSS, an IBM Company, is a leading global provider of predictive analytics software and solutions. The company’s complete portfolio of products (data collection, statistics, modeling and deployment) captures people’s attitudes and opinions, predicts outcomes of future customer interactions, and then acts on these insights by embedding analytics into business processes. IBM SPSS solutions address interconnected business objectives across an entire organization by focusing on the convergence of analytics, IT architecture and business process. Commercial, government and academic customers worldwide rely on IBM SPSS technology as a competitive advantage in attracting, retaining and growing customers, while reducing fraud and mitigating risk. SPSS was acquired by IBM in October 2009. For further information, or to reach a representative, visit www.spss.com.

© Copyright IBM Corporation 2010

SPSS Inc., an IBM Company
Headquarters, 233 S. Wacker Drive, 11th Floor, Chicago, Illinois 60606

SPSS is a registered trademark and the other SPSS products named are trademarks of SPSS Inc., an IBM Company. © 2010 SPSS Inc., an IBM Company. All Rights Reserved.

IBM and the IBM logo are trademarks of International Business Machines Corporation in the United States, other countries or both. For a complete list of IBM trademarks, see www.ibm.com/legal/copytrade.shtml.

Other company, product and service names may be trademarks or service marks of others.

References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

YTW03038USEN-00
