Quantum
WHAT IS QUANTUM AND WHAT DOES IT DO?
Stages in a Quantum run:
Basic Elements In Quantum
Different Number types that can be used in Quantum:
Whole numbers
Real numbers
Variables and arrays
Data variables
Integer variables
Real variables
Subscription
Expressions
Arithmetic expressions
Combining arithmetic expressions
Counting the number of codes in a column
Generating a random number
Logical expressions
Comparing data variables and data constants
Fields of data variables
Checking the arithmetic value of a field of columns
Combining logical expressions
Comparing variables and arithmetic expressions to a list
Naming lists
Speeding up large programs
How Quantum reads data
Types of record
Ordinary records
Multicard records
Multicard records with Trailer Cards
Reading data into the C array
Ordinary records
Multicard records
Ignoring card types
Processing the data
Changing the contents of a variable
Trailer Cards
Allread
firstread and lastread
Reserved variables
Describing the data structure for Multicard records
Record type
Ordinary Records
Multicard Records
Record length
Serial number location
Card type location
Required card types
Repeated card types
Highest card type number
Dealing with alphanumeric card types
Merging Data using Quantum
Merge sequence for Trailer Cards
Merging data files
Merging complete cards
Merging a field of data from an external file
Writing out data
Print files
Printing out individual records
Writing Out Parts of Records
Data files
Creating new cards
Some General Instances for forcecoding cleaning etc.
Writing to a report file
Assignment statements
Copying codes
Assignment with and, or and xor
Adding codes into a column
Deleting codes from a column
Forcing single-coded answers
Setting a random code in a column
Reading numeric codes into an array
Clearing variables
Flow control
Statements of condition
Examining records
Holecounts
Frequency distributions
require
Column and code validation
Comments with require
Checking codes in columns
Exclusive codes
Automatic error correction
Validating logical expressions
Testing the equivalence of logical expressions
Actions when a require statement fails
Data correction
Forced editing (forced cleaning)
Introduction to the tabulation
The hierarchy of the tabulation section
Components of a tabulation program
Run control statements
Defining run conditions
Table control statements
Creating a table
commonly used options in tab section
Axis control statements
factors
Miscellaneous ‘n’ statements
More commands to generates counts
The col statement
The val statement
The fld statement
Weighting in Quantum
Weighting methods
Types of weighting
Descriptive statistics
Quanvert
Structure of Quantum Spec:
WHAT IS QUANTUM AND WHAT DOES IT DO?
Quantum is a highly sophisticated and very flexible computer language designed to simplify the
process of obtaining useful information from a set of questionnaires. In other words, a Quantum
program converts the raw data collected through questionnaires into information that managers can use.
Quantum performs a variety of tasks. It can:
► check and validate the data
► edit and correct the data
► produce different types of lists and reports of data
► produce new data files
► recode data and produce new variables
► generate tables
► perform statistical calculations.
Stages in a Quantum run:
A. First, the data is read onto a disk. Data on disk can come from a number of different
sources, for example:
o It may be entered directly via a terminal by a telephone interviewer using
Quancept CATI.
o It may be collected over the World Wide Web using software such as Quancept
Web.
o It may be entered directly into a computer by an interviewer conducting a
personal interview using Quancept CAPI.
o It may be entered by a data entry clerk using a data entry package.
B. Next, the tasks to be performed are defined using the Quantum language.
C. Then, Quantum translates these tasks into instructions that the computer can
understand.
D. Finally, the computer itself uses this program to run your job.
Quantum comprises two sections: an edit section and a tabulation section. The edit section checks and
validates the data, generates lists and reports, corrects data, produces new data files, and
recodes data and creates new variables. The tabulation section produces tables and performs
statistical calculations.
Quantum reads the records in the data file one at a time and passes them through the various
parts of the Quantum program. As long as there are records remaining in the data file, the loop of
‘read a record, edit, tabulate’ is repeated; once the last record has been processed, the tables are
ready for printing.
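The record loop described above can be pictured with a short Python model. This is only a sketch of the control flow, not Quantum code: the functions run, edit and tabulate, the rule that rejects a blank first column, and the sample records are all assumptions made for illustration.

```python
from collections import Counter

def run(records, edit, tabulate):
    """Pass each record through an edit step and then a tabulation step."""
    tables = Counter()
    for record in records:              # read a record
        cleaned = edit(record)          # edit
        if cleaned is not None:
            tabulate(cleaned, tables)   # tabulate
    return tables

def edit(record):
    # Hypothetical check: reject records whose first column is blank.
    return record if record[0] != " " else None

def tabulate(record, tables):
    # Hypothetical table: count the code found in the first column.
    tables[record[0]] += 1

tables = run(["4", "1", " ", "4"], edit, tabulate)
```

Once the last record has passed through the loop, tables holds the finished counts.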
Basic Elements In Quantum
There are three basic elements in Quantum:
o Data constants
o Integer numbers
o Real numbers
which are stored in variables:
o Data variables store data constants
o Integer variables store whole numbers
o Real variables store real numbers
Individual constants
An individual constant is one or more of the codes 1234567890–& or blank. The – is sometimes
referred to as the 11 or X punch, and & is sometimes called the 12, V or Y punch. Each code
represents one answer to a question. For example, let’s take the question ‘What is your favorite
color?’ which has the response list:
Red 1
Yellow 2
Blue 3
Green 4
Black 5
White 6
These codes are coded into one column. If my favorite color is green, this will appear in the data
file as a 4 in the appropriate column, just as if your favorite color is red, there will be a 1 in that
column. To refer to these answers inside your Quantum program (maybe we only want our table
to include those respondents whose favorite color is blue), type in the code enclosed in single
quotes: ’3’. You will also have to tell Quantum which column to look in.
Several codes may be combined in the same column; these are called multicodes. Multicoding
means having two or more codes in the same column. Suppose the next question asks me to choose
three colors from the same list; I pick yellow, black and white. If these answers were all coded
in the same column (a multicoded column), they would be referred to as:
’256’ or ’526’ or ’652’
or any other variation of those three codes. Quantum does not care what order the codes are
entered in.
If you have a series of consecutive codes in the order &–01234567890–& you may either type
each code separately or you may enter the first and last codes separated by a slash (/) meaning
‘through’, as shown below:
’1/7’ means ’1234567’
’&/4’ means ’&–01234’
’&/9’ means ’&–0123456789’ (all 12 codes)
’1/&’ means ’1234567890–&’ (all 12 codes)
As you can see, the last two examples mean exactly the same thing. However, the notations ’0/&’
and ’0–&’ are not the same: ’0/&’ means ’01234567890–&’ whereas ’0–&’ is ’0’, ’–’ and ’&’ only.
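The ‘through’ notation can be modelled in a few lines of Python. This is a sketch of the rule as stated above, not part of Quantum: the function name expand is hypothetical, and an ASCII hyphen stands in for the 11 punch.

```python
ORDER = "&-01234567890-&"   # code order from the text; "-" is the 11 punch

def expand(codes):
    """Expand 'a/b' into the run of codes from a through b.

    Anything that is not of the form 'a/b' is returned unchanged.
    """
    if len(codes) == 3 and codes[1] == "/":
        start = ORDER.index(codes[0])
        end = ORDER.index(codes[2], start)   # first match at or after start
        return ORDER[start:end + 1]
    return codes
```

With this model, expand("1/7") gives the seven codes 1 to 7, while expand("0-&") is left as the three separate codes.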
Some combinations of codes represent ASCII characters; that is, they represent characters which
you can type on your screen:
’&1’ is the equivalent of ’A’
’&2’ is the equivalent of ’B’
The only time you would use letters rather than codes (i.e., ’A’ rather than ’&1’) is when the
questionnaire tells you that a column should contain a letter.
Sometimes we may need to write a notation for ‘no codes’ – for instance, if my favorite color does
not appear in the list of choices. To do this, we write ’ ’ (i.e., a blank enclosed in single quotes).
Strings of data constants
To refer to a string of codes in a field of columns, enclose the codes in dollar signs:
$codes$
When data constants are single-coded, or the multicodes correspond to ASCII characters (e.g. ’A’,
’B’), they may be strung together. Strings of data constants are sometimes called literals or
column fields. Strings are enclosed in dollar signs, with the component single codes losing their
single quotes. For example:
$12345$ $ABC$ $916 7&$
The first string is five columns long with 1 in the first column, 2 in the second, 3 in the third, and
so on. The third string is six columns wide with the fourth column being blank.
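As a rough illustration, the Python sketch below splits a string into one code per column and checks whether a field of a record holds that string. The function names and the sample record are hypothetical; a record is modelled here as a plain text string with one character per column.

```python
def columns_of(literal):
    """Split a $...$ string into one code per column."""
    if not (len(literal) >= 2 and literal[0] == "$" and literal[-1] == "$"):
        raise ValueError("string must be enclosed in dollar signs")
    return list(literal[1:-1])

def field_matches(record, start_col, literal):
    """True if the field beginning at start_col (1-based) holds exactly the string."""
    body = literal[1:-1]
    first = start_col - 1
    return record[first:first + len(body)] == body

cols = columns_of("$916 7&$")
```

Here cols has six entries and the fourth is a blank, matching the description of the third example string.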
Instances when strings might be used are:
• When we want to refer to a questionnaire serial number
• When the answers to a question are represented by codes of more than 1 digit. For example, in
a car ownership survey the car make and model owned may be represented by a 3-digit code. To
pick up respondents owning a particular type of car you would need to check whether the relevant
columns contained the code for that car. For instance, to look for owners of Ford Escorts you
might ask Quantum to search for the string $132$ in a particular field of columns.
Different Number types that can be used in Quantum:
Quantum can deal with whole numbers (integers) in the range -2,147,483,647 to
2,147,483,647.
Real numbers are numbers containing decimal points. To be valid, they must have at least one
digit on either side of the decimal point:
0.1 and 1.0 are correct
.1 and 1. are not
Quantum deals with real numbers of any size with accuracy up to six significant figures. Numbers
with more than six significant figures have the sixth figure rounded up or down depending on the
value of the remaining figures.
96.82529 is rounded to 96.8253
189462.1 is rounded to 189462.0
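The two rules for real numbers can be checked with a small Python model. It is a sketch only: the function names are hypothetical, and allowing a leading minus sign in is_valid_real is an assumption that the text does not state.

```python
import re

def is_valid_real(text):
    """A real constant needs at least one digit on each side of the decimal point."""
    return re.fullmatch(r"-?\d+\.\d+", text) is not None

def six_figures(value):
    """Round a value to six significant figures."""
    return float(f"{value:.6g}")
```

Both worked examples above come out as stated when passed through six_figures.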
Variables and arrays
There are three types of variables – data, integer and real – each used for storing different types
of information. You may create your own variables with names representing the type of
information stored (e.g., the variable called meals might contain a count of the number of meals
eaten during the day) or you may use the ones offered automatically by Quantum. Sometimes it
is useful for a series of variables to have the same name. Each variable may then be addressed
by its position in the group. This arrangement is known as an array.
Data variables
To define a data variable, type:
data var_name size[s]
At the start of every job, Quantum provides you with an array of 1,000 data cells called C. This
array is sometimes referred to as the C matrix. The individual cells are called C-variables. Each
C-variable stores one ‘column’ of data. Quantum reads data from your data file into this array.
Let’s say we have a very small questionnaire which uses 43 columns to store the data. Quantum
will read the data for each respondent into cells 1 to 43 of the C array, one respondent at a time.
The codes from column 1 of the data are copied into cell 1 of the C array, the codes from column
2 of the data are copied into cell 2, and so on. When Quantum has finished with that respondent’s
data it clears out the cells in the C matrix and reads the data for the next respondent, placing it in
cells 1 to 43 of the array. We can access this data by naming the columns whose contents we
wish to inspect or change.
Let’s take the questions about color that we mentioned earlier. The printed questionnaire tells us
that the respondent’s favorite color will be coded into column 15. To look at this column we would
write:
c15 or c(15)
C-variables are reset to blank before a new respondent’s data is read. Thus, you can be certain
that Quantum never muddles the contents of c10 for the first respondent with those of c10 for
the second respondent.
As we mentioned above, you may create your own data variables to store specific pieces of data.
For instance, in a shopping survey we may want to store data about visits to Sainsburys in an
array called ‘sains’ and data about visits to Safeways in an array called ‘safe’. Before we can use
these arrays, we must create them. If each array is to contain 100 cells or columns of data, we
would write:
data sains 100s
data safe 100s
where the s at the end of each statement causes Quantum to recognize that, for example, safe1 is
the same as safe(1), just as it knows that c15 and c(15) refer to the same column of data. If you
created the arrays without the s, then Quantum would not recognize safe1 as being the same as
safe(1).
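The way the C array is blanked and refilled for each respondent can be modelled in Python. This is a sketch, not Quantum code: the class CArray and the sample records are hypothetical, and each record is simplified to a text string with one single-coded column per character.

```python
class CArray:
    """A 1-based array of data cells; each cell holds the codes of one column."""

    def __init__(self, size=1000):
        self.size = size
        self.cells = [" "] * size

    def read(self, record):
        """Blank every cell, then copy column n of the record into cell n."""
        self.cells = [" "] * self.size
        for n, codes in enumerate(record, start=1):
            self.cells[n - 1] = codes

    def __call__(self, column):
        return self.cells[column - 1]

c = CArray()
c.read("12345678901234" + "4")   # hypothetical record: code 4 in column 15
first = c(15)
c.read("123")                    # a shorter record for the next respondent
second = c(15)
```

Because every cell is blanked before each read, second is blank rather than a leftover 4 from the previous respondent.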
Integer variables
To define an integer variable, type:
int var_name size[s]
To refer to an integer variable, type:
name[cell_number]
Integer variables store whole numbers. Strings of integer variables are called integer arrays, and
each cell in the array may store any whole number from -2,147,483,647 to 2,147,483,647. At
the start of each run, Quantum provides an array of 200 integer variables called T. The first cell in
this array is the integer variable t1 which may store any value within the given range; the second
cell in the array is the integer variable called t2 which may also store any value within the given
range. To illustrate the difference between a data variable and an integer variable, let’s suppose
that our data contains the value of the respondent’s car to the nearest whole pound. If the value is
£6,000, this will take up 4 columns in the data (assuming that we are only concerned with the
digits) – that is, four data variables, the first of which will contain the 6, and the other three of
which will all contain zeroes. If we placed this same value in an integer variable, we would only
need one variable to store the whole value because each variable can store values in the range
from -2,147,483,647 to 2,147,483,647.
We have already mentioned that Quantum provides an integer array of 200 integer variables. You
may create your own arrays using statements similar to those shown above for data variables.
Suppose you have a household survey in which you have collected the value of each car that the
family owns. You want to set up an integer array in which to store each value, so you write:
int carval 10s
This creates an array called carval which contains ten separate integer variables
called carval1 to carval10. Notice that we have followed the array size with the letter s so that we
can omit the parentheses from the individual variable names. We can then copy the value of the
first car into carval1, the value of the second car into carval2, and so on. If a particular household
owns three cars valued at £6,000, £2,500 and £500, then carval1 would have a value of 6,000,
carval2 would be 2,500 and carval3 would be 500. If you create your own integer variables, it is
recommended that you give them names that reflect their purpose in the run.
Real variables
To define a real variable, type:
real var_name size[s]
To refer to a real variable, type:
name[cell_number]
You may define real variables and arrays to store real numbers with accuracy up to six significant
figures. Values with more than six significant figures have the sixth figure rounded up or down
according to the value of the extra figures. As with integer variables, the names of real variables
should give some clue to the type of information they contain. Real arrays are created by
statements of the form:
real liters 5s
This example creates a real array called liters which has five real variables named liters1 to
liters5. It can store five real values, the first in liters1 and the fifth in liters5. Quantum also
provides a set of 100 real variables named X which you may use. As an
example, let’s say that the data contains information on how long, on average, each person in the
household spent watching television during a given week. We want to manipulate these figures
so we create an array of real variables in which to store the average viewing figures:
real tvwatch 8s
This provides room for up to eight people’s figures. If our household contains four people with
viewing averages of 20.8 hours, 15.75 hours, 9.75 hours and 10.0 hours, then tvwatch1 will have
a value of 20.8, tvwatch2 will have a value of 15.75, tvwatch3 will be 9.75 and tvwatch4 will be
10.0. The rest of the variables in the array have values of 0.0.
Reading real numbers from columns
To read real values from the C array, type:
cx(start_col, end_col)
Data from the questionnaire is read into columns for use during the run. When the data contains
real numbers you will have to tell Quantum that the dot is to be treated as a decimal point rather
than as a multicode representing a number of different answers. The way to do this is to refer to
the field as cx:
cx(15,20) cx(131,135)
Here we have two fields containing real numbers: the first is
six columns wide including the decimal point, which means that the number itself contains five
digits, whereas the second is only five columns wide with four digits. Notice that there is no need
to tell Quantum where the decimal point is.
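A Python model of reading a real number from a field is sketched below. It assumes a record held as a text string with one character per column and a field that contains at least one digit; the function name cx is borrowed from the text, but the implementation and the sample data are illustrative only.

```python
def cx(record, start_col, end_col):
    """Read columns start_col..end_col (1-based, inclusive) as a real number."""
    field = record[start_col - 1:end_col]
    return float(field.replace(" ", ""))   # the dot is taken as a decimal point

record = " " * 14 + "12.345"   # hypothetical data: columns 15 to 20 hold 12.345
value = cx(record, 15, 20)
```

As in the text, the caller names only the start and end columns; the position of the decimal point comes from the data.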
Subscription
As we have shown above, you may refer to specific variables in integer and real arrays and cells
or columns in data arrays by naming their position in the array.
For example:
c1 is the first column of the C array
t5 is the fifth variable in the T array
time3 is the third variable in the array called time
seg(2) is variable 2 of the seg array
Variables within an array may also be referred to using any arithmetic expression. In this case,
parentheses must be used. For example:
c(t1) the column number depends on the value of t1. If t1 has a value of 10, then the
variable is c10; if t1 is 67, the variable is c67.
c(t4,t5) the field delimiters depend on the values of t4 and t5. If t4 has a value of 12 and
t5 has a value of 19, the column field referred to is c(12,19).
t(c4) the variable number depends on the value in c4. If c4 contains a single code in
the range 1 to 9, the integer variable will be one of t1 to t9 depending on the
exact value in c4. If c4 is multicoded, then the result is nonsense.
time(c4*23) the variable number is the result of multiplying the value in c4 by 23. As in the
previous example, c4 must be single-coded in the range 1 to 9 for this example
to make sense. Thus, if c4 contains just a 4, the value of the expression is 92 so
the variable referred to is time92.
When variables are referenced in this way, the value of the expression must be positive. The
expression c(t1-5) is acceptable as long as t1 is greater than 5. If the expression has a zero or
negative value Quantum will issue an array dimension error when it comes to read the data
during the datapass. Also, if the variable refers to columns, the value of the subscript must not
exceed 32,767.
Variables referenced in this way are called subscripted variables, and they greatly increase the
flexibility with which you can write your edit.
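Subscripting with an expression can be modelled in Python as follows. This is a sketch under stated assumptions: the function subscript and the sample arrays are hypothetical, and the upper-bound check is an addition of this model rather than something the text describes.

```python
def subscript(array, position):
    """Return the cell at a 1-based position, rejecting zero or negative positions."""
    if position < 1 or position > len(array):
        raise IndexError(f"array dimension error: {position}")
    return array[position - 1]

t = [0] * 200                    # stand-in for the T array
t[0] = 10                        # t1 = 10
c = list("abcdefghijklmnop")     # hypothetical column contents, one per cell
picked = subscript(c, subscript(t, 1))   # c(t1) resolves to c10
```

An expression such as subscript(c, subscript(t, 1) - 10) would raise the error, mirroring the rule that the subscript must be positive.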
Expressions
Quantum recognizes two types of expression – arithmetic and logical. Arithmetic expressions are used to produce numeric values and logical expressions, when evaluated, produce a value of true or false.
Arithmetic expressions
The simplest form of arithmetic expression is a single positive or negative number such as 10 or
26.5 or an integer or real variable. Although the C Array is data, columns may also be used in
arithmetic when the response coded into those columns is a numeric response, such as a
respondent’s age or the number of different shops he visited. For example, if columns 243 to 247
contain the codes 4,7,2,6 and 0 respectively the value in c(243,247) could be read as 47,260.
Similarly, if columns 45 to 48 contain 7, 8, a dot and 2 respectively, the value in cx(45,48) would
be 78.2. Blank columns in a field are ignored when the codes in those columns are evaluated.
Thus, if columns 20 to 21 contain the codes 6 and 7 respectively, and column 22 is blank, the
codes in c(20,22) will be evaluated as 67. A similar result is produced if the blank column appears
anywhere else in the field. All the examples of c(20,22) below produce an arithmetic value of 67:
Column 20   Column 21   Column 22
6           7           blank
6           blank       7
blank       6           7
The same applies to multicoded columns. If you use a multicoded column as part of an arithmetic
expression, the multicoded column will be ignored. The exception to this is a multicode of a digit
and a minus sign, which creates a negative number: a minus sign anywhere in a numeric field
negates the value in the field as a whole, not just the number it is multicoded with.
For example:
2---+----3----+----4
12-4 is -1234
  3
4---+----5----+----6
83- is -83
In the first example, the 3 printed on the second line is multicoded with the minus sign in the
third column of the field.
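The rules for the arithmetic value of a field can be collected into one Python model. This is a sketch of the behavior described above, not Quantum code: the function field_value is hypothetical, each column is given as a string of its codes, and returning 0 for a field with no digits is an assumption.

```python
def field_value(columns):
    """Arithmetic value of a field, given the codes in each column as a string."""
    digits = []
    negative = False
    for codes in columns:
        codes = codes.replace(" ", "")
        if "-" in codes:                 # a minus anywhere negates the whole field
            negative = True
            codes = codes.replace("-", "")
        if len(codes) == 1 and codes.isdigit():
            digits.append(codes)
        # blank columns and other multicoded columns contribute nothing
    value = int("".join(digits)) if digits else 0
    return -value if negative else value
```

For instance, field_value(["1", "2", "-3", "4"]) models the first example, where the third column holds both a minus sign and a 3.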
Combining arithmetic expressions
To combine arithmetic expressions, type:
variable operator variable [operator variable ... ]
where variable is a numeric value or the name of a variable containing a numeric value,
and operator is one of the arithmetic operators + (add), - (subtract), * (multiply) or / (divide).
More often than not you will want to combine numeric expressions to form a larger expression, for
instance to count the number of records read with a given code in a named column. Arithmetic
expressions are linked with any of the arithmetic operators listed above. Expressions may contain
more than one of these operators, for instance:
t5 + c(134,136) / otot
c(150,152) * 10 + 2.5
Quantum evaluates such expressions in the following order:
1. Expressions in parentheses
2. Multiplication and division
3. Addition and subtraction
If you wish to change this order you should enclose the expressions which go together in
parentheses. The first expression in the example above will be evaluated by dividing the value in
columns 134 to 136 by otot and adding the result to t5. If you change the expression to:
(t5 + c(134,136)) / otot
this adds the values of t5 and c(134,136) first and then divides the sum by otot. Let’s substitute
numbers and compare the results. If t5=10, otot=5 and the value in c(134,136) is 125 the two
versions of the expression would read as follows:
10 + 125 / 5 = 35 and (10 + 125) / 5 = 27
Where two integer expressions are combined, the result is integer (any decimal places are
ignored), but if an expression contains a real then the result will be real. Therefore, if t1=5 and
t2=3, then:
t1 + 4 = 9
t1 + 4.0 = 9.0
t1 * t2 = 15
t1 / t2 = 1
t1 * 1.0 = 5.0
t1 * 1.0 / t2 = 1.66667
If you use parentheses in expressions which contain both integer and real variables, you need to
take extra care to ensure that your expression is producing the correct results. Let’s look at an
example to illustrate how an expression can look correct but can still produce unexpected results.
If we assume that t40=2 and t41=70, the expression:
t40 * 100.0 / t41
yields a result of about 2.857 (i.e., 200.0/70). The final value will be 2.857 if the result is saved
in a real variable, or 2 if it is saved in an integer variable. If we use parentheses:
(t40 / t41) * 100.0
the result is 0.0 (or 0 if saved in an integer variable). The reason for this is as follows. Because
Quantum evaluates expressions in parentheses before it deals with the rest of the expression, it
treats that expression as integer arithmetic. The rules for integer arithmetic dictate that real
results are truncated at the decimal point, so the true result of about 0.0286 becomes 0. Any
multiplication involving zero is always zero, so the final result is zero. If you find that a run gives
unexpected zero results, try looking for expressions of this type and checking whether the
parenthesized part of the expression has been truncated because the integer division results in a
decimal number.
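The difference between integer and real arithmetic can be reproduced with a small Python model. This is a sketch only: the helper q_div is hypothetical and simply mimics the rule that dividing one integer by another drops the decimal places.

```python
def q_div(a, b):
    """Divide as the text describes: integer operands truncate, otherwise real."""
    if isinstance(a, int) and isinstance(b, int):
        return int(a / b)      # decimal places are dropped
    return a / b

t1, t2 = 5, 3
t40, t41 = 2, 70
real_first = q_div(t40 * 100.0, t41)    # t40 * 100.0 / t41
int_first = q_div(t40, t41) * 100.0     # (t40 / t41) * 100.0
```

Multiplying by 100.0 before dividing keeps the calculation real throughout, whereas the parenthesized form truncates to zero before the multiplication takes place.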
Counting the number of codes in a column
To count the number of codes in a column or list of columns, type:
numb(cn1[’codes’], cn2[’codes’], ... )
If any columns are followed by a code reference, only those codes will be counted for those
columns.
The function numb is an arithmetic expression which counts the number of codes in a column or
list of columns. Its format is:
numb(cn1,cn2, ... cnn)
where cn1 to cnn are the columns whose codes are to be counted. So, if we wanted to count the
number of codes in columns 132 to 135 we would type:
numb(c132,c133,c134,c135)
Notice that even though the columns are consecutive, each one is entered separately, with each
column number preceded by a ‘c’. It is incorrect to define only the start and end columns of a field
when using numb. It is therefore wrong to write numb(c(132,135)) and, if
you write a statement such as this, Quantum will flag it as an error. Sometimes you will only be
interested in certain codes, for instance you may want to know how many 1, 2 or 3 codes there
are in a group of columns. In this case the function is entered as:
numb(cn1’p1’,cn2’p2’, ... cnn’pn’)
where p1 to pn are the codes to be counted. Only the named codes are counted – any others
appearing in the columns are ignored. Let’s say our data on card 1 is as follows:
Column 115   Column 121   Column 157
1            2            1
6            /            /
8            6            7
8
and we want to count the number of codes in column 115 and also the number of codes in the
range ’5/8’ in columns 121 and 157. The expression would be entered as:
numb(c115,c121’5/8’,c157’5/8’)
When Quantum checks these columns and codes, it will tell us that there are 9 codes in these columns which are within the given ranges. These codes are all four codes in column 115 (we did not specify which codes to count in that column), codes 5 and 6 in column 121 (codes 2 to 4 are outside the given range), and codes 5 to 7 in column 157 (codes 1 to 4 are outside the given range).
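The count of 9 can be reproduced with a Python model of numb. This is a sketch, not Quantum code: the functions are hypothetical, an ASCII hyphen stands in for the 11 punch, and the four codes placed in column 115 are assumed, since only their number matters here.

```python
ORDER = "&-01234567890-&"   # code order; "-" is the 11 punch

def expand(codes):
    """Expand 'a/b' into the run of codes from a through b."""
    if len(codes) == 3 and codes[1] == "/":
        start = ORDER.index(codes[0])
        return ORDER[start:ORDER.index(codes[2], start) + 1]
    return codes

def numb(*columns):
    """Each argument is a pair: (codes present in the column, codes to count or None)."""
    total = 0
    for present, wanted in columns:
        present = set(present.replace(" ", ""))
        if wanted is None:
            total += len(present)                    # count every code
        else:
            total += len(present & set(expand(wanted)))
    return total

c115 = "1689"        # assumed: any four codes
c121 = "23456"       # codes 2 through 6
c157 = "1234567"     # codes 1 through 7
count = numb((c115, None), (c121, "5/8"), (c157, "5/8"))
```

The total is made up of four codes from column 115, two from column 121 and three from column 157.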
Generating a random number
To generate a random number in the range 1 to n, type:
random(n)
Quantum can generate random numbers automatically with the random function:
random(n)
where n is the maximum value the random number may take. So, to generate a random
number in the range 1 to 100, the expression would read:
random(100)
The number produced may be saved for later use in an integer variable or column, thus:
rnum=random(32)
c(110,112)=random(156)
When using random with columns, always make sure that the number of columns allocated to the number is sufficient to store the highest possible number that can be generated. In our example, we need three columns in order to store numbers up to 156.
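The Python sketch below models the two points made above: a whole number between 1 and n, and the field width needed to store the largest possible value. The function names are hypothetical, and Python’s random.randint stands in for Quantum’s own generator.

```python
import random

def q_random(n):
    """Return a whole number from 1 to n inclusive."""
    return random.randint(1, n)

def columns_needed(n):
    """Width of the field required to hold the largest possible value."""
    return len(str(n))

rnum = q_random(32)
width = columns_needed(156)   # three columns, as in c(110,112)
```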
Logical expressions
Logical expressions are used for comparing values, codes and variables.
Comparing values
To compare the values of two arithmetic expressions, type:
arith_exp log_operator arith_exp
where log_operator is one of the operators .eq., .gt., .ge., .lt., .le. or .ne.
Values are compared when you need to check whether an expression has a given value – for
example, did the respondent buy more than 10 pints of milk?
Values are compared by placing arithmetic expressions on either side of one of the following operators:
Operator   Meaning
.eq.       equal to
.gt.       greater than
.ge.       greater than or equal to
.lt.       less than
.le.       less than or equal to
.ne.       not equal to
If the number of pints of milk that the respondent bought is stored in columns 114 and 115, the
expression to check whether he bought more than ten pints would be:
c(114,115) .gt. 10
If the number in these columns is greater than ten the expression is true, otherwise it is false.
Earlier we have said that integer variables may take numeric values or the logical values true and
false depending upon whether or not the value is zero. To check whether the respondent bought
any packets of frozen vegetables, we can either write:
fveg .gt. 0
to check the numeric value of the variable fveg, or we can simply say:
fveg
to check whether the logical value of fveg is true.
To check whether fveg is false (i.e. zero), we would write:
.not. fveg
Comparing data variables and data constants
In virtually every Quantum run you will want to check which codes occur in which columns. This is
easily done using logical expressions. There are several forms of expression depending on
whether you are checking a column or a field of columns.
Data variables
To test whether a data variable contains at least one of a list of codes, type:
var_name’codes’
To test whether a data variable contains none of the listed codes, type:
var_namen’codes’
To test whether a data variable contains exactly the given codes and nothing else, type:
var_name = ’codes’
To test whether two data variables contain identical codes, type:
var_name1 = var_name2
To test whether a data variable contains codes other than those listed, type:
var_nameu’codes’
To test whether two data variables do not contain identical codes, type:
var_name1uvar_name2
To check whether a column or data variable contains certain codes, place the codes, enclosed in
single quotes, immediately after the name of the column or data variable:
e.g. c1’1’ c156’23’ brand’5’
The expression: Cn’p’ checks whether a column (n) contains a certain code or codes (p). The
expression is true as long as column n contains at least one of the given codes. It does not matter
if there are other codes present since these are ignored. For example, to check whether column
6 contains any of the codes 1 through 4 we would type:
c6’1/4’
The expression is true if c6 contains any of the codes 1, 2, 3 or 4 or any combination of those
codes, regardless of what other codes may also be present. For instance:
----+----1 ----+----1 ----+----1
1 1 1
6 2 3
8 3 0
- 4
&
are true, but:
----+----1
5
7
9
—
is false.
In our original example we chose the codes 1 through 4. You can, of course, use any codes you
like and they may be entered in any order.
The opposite of Cn’p’ is:
cnN’p’
which checks that a column does not contain the given code or codes. The expression is true as
long as the column does not contain any of the listed codes. For example: c478n’5/7&’ is true as
long as column 478 does not contain a 5, 6, 7 or & or any combination of them.
A multicode of ’189’ returns the logical value true, because it does not contain any of the codes
’5/7&’ whereas a multicode of ’1589’ makes the expression false because it contains a ’5’.
The ’=’ operator is used to check that the contents of a column are identical to the given codes.
The expression: c312=’1/46’ is true as long as c312 contains all of the codes 1 through 4 and 6,
and nothing else. The expression: c142=’ ’ checks that column 142 is blank. The equals sign is
optional when checking for blanks, so we could simply write:
c142’ ’
to check whether column 142 is blank. The ’=’ operator may also be used to compare the
contents of two data variables. For example: c56=c79 checks whether c56 contains exactly the
same codes as c79. If so, the expression is true, otherwise it is false. If we have
+----6----+ ... +----8----
1 1
5 5
the expression is true, but:
+----6----+ ... +----8----
1 1
5 5
9
yields the value false because column 79 contains a ’9’ when column 56 does not. If you have
defined your own data variables, you could write a statement of the form: brand1=c79 to check
whether the data variable called brand1 contains the same codes as c79.
The opposite of ’=’ is ’U’ (unequal):
cnU’p’
This checks whether column n contains something other than just the code ’p’. Suppose we have
two sets of data:
----+-----5 ----+-----5
1 1
4 5
7 9
and we write:
c44u’7’
The expression is true for both sets of data. In the first example, the ’7’ is multicoded with a ’1’
and a ’4’, while in the second example, column 44 does not contain a ’7’ at all. The only time this
expression is false is when column 44 contains a ’7’ and nothing else.
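The four column tests described in this section can be summarised by treating each column as a set of codes. The Python sketch below is a model of the rules only, not Quantum code; it assumes codes are ordered 1 to 9, then 0, - and &, and the function names are our own.

```python
# Assumed order of codes within a column: 1-9, then 0, - and &.
CODE_ORDER = "1234567890-&"

def expand_codes(spec):
    """Expand a specification such as '1/46' into a set; blanks mean 'no code'."""
    codes = set()
    i = 0
    while i < len(spec):
        if spec[i] == " ":
            i += 1
        elif i + 2 < len(spec) and spec[i + 1] == "/":
            lo = CODE_ORDER.index(spec[i])
            hi = CODE_ORDER.index(spec[i + 2])
            codes.update(CODE_ORDER[lo:hi + 1])
            i += 3
        else:
            codes.add(spec[i])
            i += 1
    return codes

def contains_any(column, spec):   # models cn'p'
    return bool(set(column) & expand_codes(spec))

def contains_none(column, spec):  # models cnn'p'
    return not contains_any(column, spec)

def equals(column, spec):         # models cn='p'
    return set(column) == expand_codes(spec)

def unequal(column, spec):        # models cnu'p'
    return not equals(column, spec)
```

For instance, contains_none("189", "5/7&") is true while contains_none("1589", "5/7&") is false, and unequal("7", "7") is the only one of the unequal examples above that is false.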
Fields of data variables
To test whether a field contains a given list of codes, type:
var_name(start, end) = $codes$
To test whether two fields contain identical strings, type:
var_name1(start1, end1) = var_name2(start2, end2)
To test whether the codes in one field differ from a given string, type:
var_name(start, end)u$codes$
To test whether the codes in one field differ from those in another, type:
var_name1(start1, end1)uvar_name2(start2, end2)
The contents of data fields must be enclosed in dollar signs with each code in the string referring
to a separate column in the field. For instance, to check whether columns 47 to 50 contain the
codes -, 6, 4 and 9 respectively we would type:
c(47,50)=$-649$
The only data for which this expression is true is:
+----5-----+
-649
However, if our data read:
+----5-----+
-529
164&
the expression would be false because all columns are multicoded. All our examples have used
columns, but the same rules apply to data variables that you define yourself.
For example:
rating(1,4)=$1234$
checks whether the field rating1 to rating4 contains the codes 1, 2, 3 and 4 in that order. That is, it
checks whether rating1 contains a 1, whether rating2 contains a 2, and so on. When checking the
contents of fields in this way, make sure that you enter as many columns as there are codes in
the string (i.e. five codes require five columns). The exception to this rule occurs when you are
checking for blanks when the expression may be shortened to:
c(50,80)=$ $
This type of statement may also be used to compare two fields, to check whether the second field
contains exactly the same codes as the first field. When you compare one field with another,
Quantum takes each column in the first field in turn and looks to see whether the corresponding
column in the second field contains exactly the same codes. For example, if the first column of
the first field contains a code 1 and a code 2 and nothing else, then Quantum will check whether
the first column of the second field also contains a code 1 and a code 2 and nothing else. If all
columns of the second field are identical to their counterparts in the first field, then the expression
is true; otherwise it is false. Here is an example:
c(129,132)=c(356,359)
For this expression to be true, column 129 must contain exactly the same codes as column 356,
column 130 must be exactly the same as column 357, and so on. Once again the two expressions
on either side of the equals sign must be the same length. Comparisons of one data variable
against another are concerned with columns and codes: they are not concerned with the
arithmetic values of the codes in the fields as a whole.
If we have:
----+----3----+----
02 2
the expression:
c(24,25)=c(34,35)
is false because the string $02$ is not the same as the string $2$. If you want to compare fields
arithmetically (i.e., is 02 the same as 2) then you will need to use the .eq. operator:
c(24,25).eq.c(34,35)
to test whether the value in c(34,35) is equal to the value in c(24,25). The .eq. operator is
described in the section entitled "Comparing values".
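The difference between the two kinds of comparison can be illustrated with a short Python sketch. It is only a model: each field is represented as a string with one character per single-coded column, and the function names are our own.

```python
def fields_identical(first, second):
    """Model of the = operator: column-by-column comparison of codes."""
    return first == second

def fields_equal_in_value(first, second):
    """Model of the .eq. operator: compare the numbers the fields represent."""
    return int(first) == int(second)

c24_25 = "02"   # zero-padded
c34_35 = " 2"   # blank-padded
```

Here fields_identical(c24_25, c34_35) is false because the first columns differ, whereas fields_equal_in_value(c24_25, c34_35) is true because both fields represent the number 2.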
To check whether the codes in one field match a given string or the codes in another field, we can
use the = (equals) operator:
c(m,n)=$codes$
cm=cn
c(m,n)=c(m1,n1)
If the codes in the field c(m,n) match the given string or the codes in c(m1,n1) then the expression is true. If the two fields are
not identical, then the expression is false.
Let’s look at an example of the unequals operator. The statement:
c(67,69)u$123$
is true at all times unless columns 67 to 69 contain exactly the codes 1, 2 and 3 respectively, each as a single code.
The expression:
c(67,69)uc(77,79)
is true as long as columns 67 to 69 differ by at least one code from columns 77 to 79. If our data
is:
+----7----+----8
123 256
the expression is true because each of columns 77 to 79 differs from columns 67 to 69. Also, if we
have:
+----7----+----8
123 123
5
the expression is true because column 77 is multicoded ’15’. The only time the expression is false
is when columns 67 to 69 are identical to columns 77 to 79.
Checking the arithmetic value of a field of columns
To test whether a value in a field is within a specified range, type:
range(start, end, minimum, maximum)
Blanks at the start of the field cause this statement to give a false result. To ignore leading
blanks, type:
rangeb(start, end, minimum, maximum)
The logical expression range checks whether the number in a field of columns is within a given
range. If so, the expression is true, otherwise it is false. The format of this statement is:
range(start,end,min,max)
where start and end are column numbers and min and max are the range delimiters.
For example, the statement:
range(137,139,100,150)
will return the value true if the number in columns 37 to 39 of card 1 is in the range 100 to 150.
A variation of range is rangeb which allows columns to the left of the field to be blank if the
number is right-justified in the field. In all other respects it is exactly the same as range. If our
data is:
----+----2
123 6
the expression: rangeb(17,18,1,10) will be true because the string $ 6$ will be read as 6. With
range the value would be false.
However, the expression:
rangeb(15,18,2000,3000)
returns false because of the blank in c17.
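The difference between range and rangeb comes down to how blanks are treated. The Python sketch below models that rule under simplifying assumptions: the field is passed in as a string of single-coded columns, and only non-negative whole numbers are handled. The function names are our own.

```python
def field_range(text, minimum, maximum):
    """Model of range: any blank or non-digit in the field gives false."""
    if not text.isdigit():
        return False
    return minimum <= int(text) <= maximum

def field_rangeb(text, minimum, maximum):
    """Model of rangeb: blanks to the left of the number are ignored."""
    return field_range(text.lstrip(" "), minimum, maximum)
```

With the field $ 6$, field_rangeb gives true for the limits 1 to 10 while field_range gives false. A blank in the middle of the field, as in $23 6$, gives false with both.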
Combining logical expressions
To combine logical expressions, type:
expression operator expression
where operator is one of .or., .and., or .xor.
Two or more logical expressions may be combined into a single expression using the operators:
.and. both/all true
.or. one or the other or both/all true
.not. negates (reverses) an expression
Any number of subexpressions may be combined to form a larger expression, but whether the
result is true or false depends upon the values of the subexpressions and also upon the operators
used to combine them.
The .and. operator requires that all the expressions preceding and following the .and. be true for
the whole expression to be true. Thus, the statement:
int1.eq.9 .and. c116’1’
is true if the integer variable int1 has a value of 9 and column 116 contains a 1. If either
subexpression is false, the whole expression is false too. By comparison, the .or. operator
requires that one expression or the other, or both, be true in order for the whole expression to be
true.
c(249,251)=$159$ .or. numb(c132,c135) .gt. 4
For this expression to be true, columns 249 to 251 must contain nothing but a ’1’, ’5’ and ’9’
respectively or the number of codes in columns 132 to 135 must be greater than 4. It is also true
if both expressions are true. However, if both are false, the overall result is false.
Expressions are reversed (negated) simply by preceding them with the keyword .not. Although it
is not wrong to use it with a single variable, it is more generally used to reverse an expression
containing the keywords .and. and .or. Thus, it is not wrong to write .not.c15’1/5’ but it is much
simpler to write this as c15n’1/5’.
Take care when using .not. with the .eq. operator. Statements of the form:
.not. c(1,3) .eq. 100
are incorrect and will not work. They should be written as either:
(.not.(c(1,3).eq.100))
with the expression to be reversed enclosed in parentheses, or:
(c(1,3).ne.100)
Any of the operators .and., .or. and .not. may appear in a statement more than once, as long as
you use parentheses to define the order of evaluation. For example:
(c15’1/47’ .or. c16’3579’) .and. c22’&’
causes Quantum to check whether the .or. condition is true before dealing with the .and. Suppose
our data is:
----+----2----+
13 &
79
The first expression (c15’1/47’) is true because column 15 contains a 1 and a 7 and the second
expression (c16’3579’) is also true since the codes it contains are amongst those listed as
acceptable. Thus, the .or. condition is true. Column 22 contains an ampersand so the last
expression is also true, therefore the expression as a whole is true. If both expressions
in the parentheses were false, the whole expression would be false.
.not. with .and. and .or.
When you use .not. with expressions in parentheses, be very careful that what you write is what
you mean. Let’s take the conditions male and married and forget about columns and codes for
the minute. The condition:
(Male .and. Married)
refers only to married men. The opposite of this is:
.not. (Male .and. Married)
which refers to unmarried men and all women. This can also be written as:
.not. Male .or. .not. Married
The first .not. collects all the women, the second collects everyone who is not married (e.g.
single, widowed etc), and together they collect everyone who is female or unmarried (or both). We
use .or. instead of .and. here because the latter would gather unmarried women but would ignore the
unmarried men and married women.
Reversing .or. expressions works in exactly the same way. The expression:
(Male .or. Married)
means anyone who is Male, or anyone who is Married, or anyone who is Male and Married. The
opposite of this is:
.not. (Male .or. Married)
which means anyone who is neither Male nor Married; that is, anyone who
is a woman and is unmarried. This can be written as:
.not. Male .and. .not. Married
Thus, we can summarize, as follows:
Positive       Negative             Is the same as
(A .and. B)    .not. (A .and. B)    .not. A .or. .not. B
(A .or. B)     .not. (A .or. B)     .not. A .and. .not. B
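These two equivalences can be confirmed by trying every combination of true and false for A and B. The short Python check below does exactly that; it uses Python's own and, or and not as stand-ins for the Quantum operators, and the function name is our own.

```python
from itertools import product

def equivalences_hold():
    """Check both rows of the summary for every combination of A and B."""
    for a, b in product([False, True], repeat=2):
        if (not (a and b)) != ((not a) or (not b)):
            return False
        if (not (a or b)) != ((not a) and (not b)):
            return False
    return True

result = equivalences_hold()
```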
Here is an example using columns and codes:
.not. (c(135,137)=$519$ .or. c160’6/0’)
If our data is:
3----+----4----+----5----+----6----+
519 1
9 &
the expression is true because c(135,137) do not contain just the codes 5, 1 and 9 (c135 is
multicoded), and c160 does not contain any of the codes 6 through 0. The expression will only be
false if:
a) column 135 contains a 5 only, column 136 contains a 1 only and column 137 contains a
9 only, or
b) column 160 contains any of the codes 6 through 0, either singly or as a
multicode.
We could therefore write the expression as:
.not. c(135,137)=$519$ .and. .not. c160’6/0’
Comparing variables and arithmetic expressions to a list
To compare the value of a variable or an arithmetic expression to a list of numbers, type:
item .in. (value1, value2, ... )
Ranges of numbers may be entered in the list as start:end. If the item is a reference to a field
containing blanks, enter the values as strings of codes enclosed in dollar signs.
Example:
C(3,5).in.($123$,$765$,$ 26$)
C(120,122).in.(100,110,200:250)
From time to time you may need to check whether a variable or arithmetic expression has one of
a given list of values. For example, if the questionnaire codes brands of frozen vegetables as 3-
digit codes into columns 145 to 147 we might want to check that only valid codes appeared in this
field. This is achieved using the logical expression .in. as follows:
variable-name .in. (list) or
arithmetic-exp .in. (list)
where variable-name is that of the variable to be checked and list is a list of permissible values.
The arithmetic expression is an expression consisting of data or integer variables, arithmetic
operators and integer values as described earlier in this chapter. If the variable or arithmetic
expression has one of the listed values, the expression is true, if not, it is false. The left-hand side
of the expression may contain integer variables, columns or data variables containing whole
numbers, or expressions using these types of variables. If it is a data variable, then the list may
contain codes enclosed in dollar signs. Quantum will then compare the codes in the data variable
with the codes inside the dollar signs. We could therefore check that the frozen vegetables have
been coded correctly by keying in a statement which says:
c(145,147) .in. ($205$,$206$,$207$,$210$,$215$,$220$)
Quantum will flag any records in which c(145,147) does not contain exactly 205, 206,
207, 210, 215 or 220 (i.e. three single-coded columns) as incorrect.
If the data variable contains a valid positive or negative whole number, then the list may also
contain such values. Ranges of values may be entered in the form min:max, where min is the
lowest acceptable value and max is the highest. Since the frozen vegetables have numeric
codes, we could write the expression as:
c(145,147) .in. (205:207,210,215,220)
Any columns in the field which contain non-numeric data (e.g. multicodes) will be flagged as
incorrect, as will any which contain values which do not match the specification.
Sometimes, though, the codes and numbers will not be interchangeable. If you have 2-digit codes in a 3-column field, the statement:
c(206,209) .in. ($ 10$,$ 11$,$ 12$,$ 13$)
is not the same as:
c(206,209) .in. (10:13)
unless column 206 is always blank. If the 2-digit codes have been padded on the left with zeroes
instead of blanks (i.e., 010, 011) or if they all start in column 206 (i.e., $10 $, $11 $), then the first
expression will be false, even though the second one will still be true.
If the left-hand side of the expression is an integer variable or an arithmetic expression, the list
may contain positive or negative whole numbers:
total .in. (100,200,500:1000)
Lists may contain up to 247 values or codes, which may be entered in any order. In our examples, we have always
entered them in ascending order, but this is not a requirement of Quantum. The exception is numeric ranges, which must be entered in the form
lowest:highest.
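The behaviour of .in. with values, ranges and code strings can be modelled in a few lines of Python. This is a sketch of the rules described above rather than Quantum code: a range min:max is represented as a (min, max) pair, a field as a string with one character per column, and the names in_list and FVEG are our own.

```python
def in_list(item, allowed):
    """Return True if item matches a listed value or lies within a listed range."""
    for entry in allowed:
        if isinstance(entry, tuple):
            lowest, highest = entry
            if lowest <= item <= highest:
                return True
        elif item == entry:
            return True
    return False

# Numeric list equivalent to (205:207,210,215,220).
FVEG = [(205, 207), 210, 215, 220]

# Code strings and numbers are not always interchangeable:
padded = "010"                               # 2-digit code padded with a zero
as_codes = in_list(padded, [" 10", " 11", " 12", " 13"])
as_number = in_list(int(padded), [(10, 13)])
```

In this model as_codes is false because the first column holds a 0 rather than a blank, while as_number is true because the field still represents the number 10.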
Naming lists
To assign a name to a list of values, type:
definelist name=(list)
where list is a comma-separated list of numbers, ranges or code strings enclosed in dollar signs.
If you have a list that is used more than once you may give it a name and refer to it by that name
instead of typing in the complete list each time. To name a list, write:
definelist name=(list)
For example:
definelist fveg=(205:207,210,215,220)
To use a defined list, simply replace the list with the name:
c(145,147) .in. fveg
Speeding up large programs
To speed up your Quantum program by converting expressions of the form c(1,4)=$1234$ into C
in a more efficient way, type:
inline n
where n is the maximum field width to be converted in this manner. This statement must appear
at the start of the edit.
If you have a large edit, you can speed up the time it takes to run by including the inline
statement in your edit. This instructs the Quantum compiler to convert expressions of the form
c(1,4)=$1234$ into statements in the C programming language in a different way to the way it
normally does. You need not worry about these different methods of conversion, apart from
deciding whether or not to use them.
If you want to speed your program up, place a statement of the form:
inline n
at the beginning of the edit section, where n is the maximum field width to be converted in the
special way. For example:
inline 6
Here we are saying that fields of six columns or less should be converted in the special way
rather than in the normal way.
How Quantum reads data
In order for the answered questionnaire to be processed, the information contained on the questionnaire must be read into the computer into a location where Quantum can access it. This is done by reading the data into the data variable array called C which is supplied automatically with every Quantum run. You may then access this data by addressing this array. Different types of records are read into the C Array in different ways.
Types of record
Quantum deals with three types of record: ordinary, multicard and multicard with trailer cards.
Ordinary records
These are strings of codes and numbers, one per respondent, up to a maximum of 32,767 characters per respondent.
Multicard records
When data originates from punched cards and each questionnaire requires more than 80
columns, the data is spread over several cards. So that all cards belonging to a particular
respondent may be easily identified, each questionnaire is assigned a serial number which is
entered as part of the data for each card. Within this, each card has a unique card type or card
number to distinguish it from others in the group. It is important that both the serial number and
card type be in the same relative positions on all cards in the file, since this is the only way that
Quantum can tell which data belongs to which respondent. If the questionnaire serial number is in
columns 1 to 4 of each card and the card type is in column 5, and we are looking at questionnaire
1005, we will see that it has two cards whose first five columns are 10051 and 10052
respectively. Quantum can deal with records that contain up to 327 cards per respondent.
Occasionally you may have multicard records in which each ‘card’ is greater than 80 columns.
The notes that follow refer to multicard records of up to 100 columns per card.
Multicard records with Trailer Cards
Sometimes a record contains very repetitive data which is tabulated over and over again in the
same way. For instance, a shopping survey may ask the respondent a series of identical
questions for each store he visited. In this case, there may be a separate card for each store.
Processing this type of data is often easier if we treat all cards containing the same questions as
if they were, in fact, one card with one card number. These cards are called Trailer Cards. Thus, if the respondent visited five stores, and the questions about these stores are coded on a card 2, the record for that respondent would contain five cards of type 2. If demographic details were stored on a card 1, the whole record would be 6 cards in all. In Quantum, the demographic data would be described as the higher level and the stores as the lower level.
Reading data into the C array
Data is read into the C Array automatically, one record at a time. The way data is read depends upon the record structure.
Ordinary records
Ordinary records are read into cell 1 onwards of the array. Therefore, for example, the 50th column is referenced as c50 and the 200th cell as c200.
Multicard records
Records are read into c101 to c200 for card 1, c201 to c300 for card 2, and so on. For example,
80-column cards are read into c101 to c180 for card 1 and c201 to c280 for card 2. Columns 181-
200, 281-300, etc remain blank. In this case, the C Array may be pictured as ten rows of 100 cells
each. Column 50 of card 1 is then accessed by referring to it as c150, and column 67 of card 8 is
referred to as c867.
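The addressing rule for multicard records is a simple calculation: the card number supplies the hundreds and the column number the remainder. A Python sketch of that arithmetic follows, assuming cards of up to 100 columns as described above; the function name is our own.

```python
def cell(card, column):
    """Return the C array cell that holds the given column of the given card."""
    if not 1 <= column <= 100:
        raise ValueError("column must be between 1 and 100")
    return card * 100 + column

first_example = cell(1, 50)    # column 50 of card 1
second_example = cell(8, 67)   # column 67 of card 8
```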
Ignoring card types
It is also possible to read cards into the array sequentially regardless of card type: the first card
goes in c(101,200), the second in c(201,300), the third in c(301,400), and so on.
Processing the data
Each time an ordinary record or set of cards comprising a multicard record is read in, that data is
processed first by the edit section and then by the tabulation section of your program. The
complete record is edited and tabulated in one go. The exception to this is the trailer card record
where processing can take place a number of times within each record for each lower level.
To ensure that only the part of the edit section applying to a particular level is used, the edit
section is defined separately for each level. Similarly, the table instructions specify the level at
which the table should be incremented.
Changing the contents of a variable
This section describes how to assign values to variables and the statements emit, delete and
priority, all of which may be used to alter the contents of a variable. Emit, delete and priority are used only with columns whereas assignment statements can deal with character, integer and real variables. When we say that these statements change the contents of a column we mean that they change the contents of that column as it exists during the run: at no time do they change the corresponding column in the data file.
Trailer Cards
By using the Levels facility, the user need not know how Quantum deals with trailer card data
internally. However, there are occasions when it may be necessary to edit or tabulate the data
without using levels. To do this, it is necessary to know more about how trailer cards are
processed.
Quantum deals with trailer cards in a number of ‘reads’. Cards are read into the appropriate rows
of the C Array until:
a) a card is located with a card type matching that of the previous card (e.g., two
consecutive card 2’s), or
b) a card is read with a type lower than its predecessor and matching one of the card types
already read in during the current ‘read’ (e.g., a card 2, a card 3, and then another card
2).
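The two stopping rules can be sketched as a small Python function that splits one respondent's sequence of card types into reads. It is a simplified model: it shows only where each read ends, not which cards remain in the C array between reads, and the function name is our own.

```python
def split_reads(card_types):
    """Split a respondent's card types into reads using rules a) and b)."""
    reads = []
    current = []
    for card in card_types:
        if current:
            previous = current[-1]
            repeated = card == previous                            # rule a)
            stepped_back = card < previous and card in current     # rule b)
            if repeated or stepped_back:
                reads.append(current)
                current = []
        current.append(card)
    if current:
        reads.append(current)
    return reads

stores = split_reads([1, 2, 2, 2])
mixed = split_reads([1, 2, 3, 2, 3])
```

In this model a record with a card 1 followed by three card 2's is handled in three reads, and the sequence 1, 2, 3, 2, 3 in two.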
In order to produce useful tables, you will need to know which cards are currently in the
C Array.
Quantum has four reserved variables – thisread, allread, firstread and lastread – which it uses to
keep track of which cards it has read for each respondent.
thisread
The array called thisread is used to check which cards have been read in during the current
read. thisread1 will be true (or 1) if a card type 1 has just been read in; thisread2 will be true if a
card 2 has just been read, and so on.
There are nine such variables (thisread1 to thisread9) available unless extra card types have
been specified using the max= option. In this case, these variables will be numbered 1 to max; if
there are 13 cards, we will have thisread1 to thisread13.
allread
allread notes which cards have been read in so far for this questionnaire. If cards 1, 2 and 3
have been read so far, allread1, allread2 and allread3 will all be true. Additionally, each cell of
allread will contain the number of cards of the given type read in – for instance, if two cards of
type 3 have been read, allread3 will be true and it will contain the number 2.
As with thisread, there are nine allread variables available unless extra card types have been
specified with max=.
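The relationship between thisread and allread can be pictured with a short Python model. It is a sketch of the behaviour described above, not of Quantum's internals: each read is given as a list of card types, index 0 of each list is unused so that index n corresponds to card type n, and the function name is our own.

```python
def track_reads(reads, max_card=9):
    """Return (thisread, allread) snapshots taken after each read."""
    allread = [0] * (max_card + 1)
    snapshots = []
    for read in reads:
        thisread = [0] * (max_card + 1)   # reset at the start of every read
        for card in read:
            thisread[card] = 1
            allread[card] += 1            # running count per card type
        snapshots.append((thisread, list(allread)))
    return snapshots

history = track_reads([[1, 2, 3], [3]])
thisread_after, allread_after = history[-1]
```

After the second read in this example, thisread_after[3] is 1 and thisread_after[1] is 0, while allread_after[3] holds 2 and allread_after[1] holds 1.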
firstread and lastread
The variables firstread and lastread become true when the first and last cards in a record have
been read in.
Reserved variables
Other reserved variables associated with reading in data:
lastrec      set to true when the last record in the file has been read or, in the case of trailer cards, the last read of the last record has occurred.
rec_count    stores the number of records read in so far.
card_count   counts the number of cards read so far.
Describing the data structure for Multicard records
To describe the structure of the data, type:
struct; options
All programs dealing with multicard records must contain a struct statement unless the data
contains trailer cards which will be read and tabulated using the levels facility. In this case you
may choose between using a struct statement or using a levels file. If the run has no struct
statement and no levels file, Quantum assumes that the data contains ordinary records to be read
into c1 onwards of the C array.
The struct statement is used to define the type of records, the location of the serial number and
card type in the record and the number of the highest card type if greater than 9. Its format is:
struct;options
Record type
To define the record type, type:
struct; read=n
where n is 0 for ordinary records, 2 to read multicard records in sections according to the card
type, or 3 to read multicard records all in one go.
Quantum recognizes two types of record: single card and multicard. The type of record is defined
by the keyword read= on the struct statement:
Ordinary Records
Ordinary records are defined using read=0. Each record is read into c1 onwards of the array.
Since it is the default, you need only use it when other options are required; for example, when
the records contain serial numbers and you wish to have the serial number printed out as part of
the record, or when you are working with long records of more than 100 columns.
Multicard Records
Multicard records are identified by the keyword read=2. Each card in the record is read into the
row corresponding to the card type of that card – that is, card 1 in c(101,200), card 2 in
c(201,300), and so on. We mentioned briefly that it is possible to read all cards in a multicard
record in at once and ignore the card type. The first card goes in c(101,200), the second in
c(201,300), and so on. This is achieved with read=3.
Record length
To define the record length of records greater than 100 columns, type:
struct; reclen=n
The keyword reclen=n defines the maximum number of characters to be read into the C array, the
number of cells to be reset to blanks and the number of cells to be written out by the write statement.
With ordinary records reclen may take any value, but with multicard records the maximum is
reclen=1000. In both cases, the default is reclen=100. When data is being read into the matrix,
any record which is longer than reclen characters is truncated to that length and a warning
message is printed.
When ordinary records are written out with write or split, cells c1 to c(reclen) are copied, with any
trailing blanks being ignored. For instance, if we have:
struct;read=0;reclen=200
and the current record is only 157 characters long, the record written out will be 157 characters
long. This length can be overridden by an option on a filedef statement. When multicard records
are written out, columns c101 to c(100+reclen), c201 to c(200+reclen), and so on will be output.
Thus, if we write:
struct;read=2;reclen=70
and we have 2 cards per record, Quantum will write out c(101,170) and c(201,270). Finally, with
ordinary records cells c1 to c(reclen) are reset to blanks between records, but with multicard
records cells c101 to c(100+reclen), c201 to c(200+reclen), and so on are reset.
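The effect of reclen on what is written out can be worked through in Python. The sketch below models only the two rules stated above (trailing blanks dropped for ordinary records, fixed ranges per card for multicard records); it ignores any filedef override, and the function names are our own.

```python
def ordinary_output(record, reclen):
    """Ordinary record: copy c1 to c(reclen), ignoring trailing blanks."""
    return record[:reclen].rstrip(" ")

def multicard_output_ranges(reclen, cards):
    """Multicard record: the (first, last) cells written out for each card."""
    return [(card * 100 + 1, card * 100 + reclen) for card in cards]

# A 157-character record held in a 200-cell area.
written = ordinary_output("x" * 157 + " " * 43, 200)
ranges = multicard_output_ranges(70, [1, 2])
```

With reclen=200 the 157-character record is written out as 157 characters, and with reclen=70 and two cards the ranges are c(101,170) and c(201,270).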
Serial number location
To define the location of the serial number in each record, type:
struct; ser=c(m,n)
The keyword ser=c(m,n) defines the field of columns containing the respondent serial number.
For example, if the serial number is in columns 1 to 5 of an ordinary record we would write:
struct;read=0;ser=c(1,5)
Similarly, if it is in columns 1 to 5 of a multicard record the statement would be:
struct;read=2;ser=c(1,5)
Notice that even with multicard records we only give the actual column numbers containing the
serial number, rather than card type and column number as is usually the case when identifying
columns in such records. This is because the column numbers refer to all cards in the data set
rather than to a single card in the file.
Card type location
To define the location of the card type in the record, type:
struct; crd=cn
Defining the card type location is much the same as defining the position of the serial number in
the record. The keyword is crd=cn for a single digit card type or crd=c(m,n) for a card type of
more than one digit. Once again, m and n are column numbers only, not card type and column
number.
For example:
struct;read=2;ser=c(1,4);crd=c5
tells us that we have a multicard record with serial numbers in columns 1 to 4 and the card type in
column 5 of each card. Each card will be read into the row corresponding to its card number.
Required card types
To define cards which must be present in each record, type:
struct; req=card_numbers
where card_numbers is either a comma-separated list of card numbers, or a range of sequential
card numbers in the form start:end or start/end.
Sometimes some cards will be optional and others mandatory. You may define those cards which
must appear in every record by using the keyword req= followed by the numbers of the cards
that each respondent must have.
For example:
req=1,2
tells us that cards 1 and 2 must be present in each record for that record to be accepted. Any
other cards are optional. If a record is read without one of these cards, the error message ‘Card
Missing in Set’ and a note of the record’s position in the file are printed and the record is ignored.
If you have ranges for required card types, you may type the numbers of the lowest and highest
cards separated by a slash (/) or a colon (:) rather than listing each card type separately. For
example, if cards 1 to 4 are all required, you may type:
req=1,2,3,4 or req=1/4 or req=1:4
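To make the three equivalent notations concrete, here is a small Python sketch that expands a card list and reports which required cards are missing from a record. It is an illustration only: expand_cards and missing_cards are hypothetical names, and accepting a mixture of single cards and ranges in one list is an assumption rather than something the text states.

```python
def expand_cards(spec):
    """Expand '1,2,3,4', '1/4' or '1:4' into a list of card numbers."""
    cards = []
    for part in spec.split(","):
        part = part.strip()
        for sep in ("/", ":"):
            if sep in part:
                start, end = part.split(sep)
                cards.extend(range(int(start), int(end) + 1))
                break
        else:
            # No range separator found: a single card number.
            cards.append(int(part))
    return cards


def missing_cards(spec, present):
    """Return the required cards that are absent from a record."""
    return [card for card in expand_cards(spec) if card not in present]


print(expand_cards("1/4"))            # [1, 2, 3, 4]
print(missing_cards("1,2", [1, 3]))   # [2]
```

A record for which missing_cards returns a non-empty list corresponds to the 'Card Missing in Set' case described above.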
Repeated card types
To define cards which may appear more than once in a record, type:
struct; rep=card_numbers
where card_numbers is either a comma-separated list of card numbers, or a range of sequential
card numbers in the form start:end or start/end. If the data contains trailer cards and the Levels
facility is not used, you must list their card types with the keyword rep=. For instance, if card 2 is
a trailer card we would write:
rep=2
Where there is more than one trailer card, each card type is listed separated by a
comma. If cards 2, 3 and 4 are all trailer cards we could write:
rep=2,3,4
If you have ranges of repeated card types, you may type the numbers of the lowest and
highest cards separated by a slash (/) or a colon (:) rather than listing each card type
separately.
For example, if cards 2 to 4 are all trailer cards, you may type:
rep=2,3,4 or rep=2/4 or rep=2:4
If rep= is not used and a record is read with two or more cards of the same type, the last
card of that type will be accepted and the message ‘Identical duplicate’ or ‘Non-identical
duplicate’ and a note of the record’s position in the file will be printed. For example:
Record structure error: serial 026, card 234 in run, card 234 in dfile
card type 2 – non-identical duplicate
Because rep= refers to trailer cards only, it will be ignored if read=2 and crd= are not
both present on the struct statement.
Highest card type number
To define the highest card type in the record, if there are more than nine cards per record,
type:
struct; max=n
The only time you need to inform Quantum of the highest card type is when you have
records with more than nine cards. This is so that Quantum can allocate sufficient cells
in the C array to store the extra cards. The highest card type is defined with max=n, where
n is the number of the highest card type. Cells 1 to max*reclen are then cleared between
respondents. For example, to read a data set with 11 cards per respondent we might write:
struct;read=2;ser=c(1,4);crd=c5;req=1,2,3,4;max=11
If you forget max=, and a record is read with more than nine cards, the message ‘Too
many cards per record’ is printed and the record is rejected. On the other hand, if a card
is read with a card type higher than that defined with max=, the record is rejected with
the message ‘Card number out of range’.
Dealing with alphanumeric card types
To define the location in the C array of cards with alphanumeric card types, type:
struct; order=card_types
where card_types is a list of card type numbers and letters in the order they are to appear
in the C array.
From time to time you may need to read in records with alphabetic as well as numeric
card types. This generally happens in a multicard data set containing more than nine cards
per record where only one column has been allocated to the card type.
Quantum can deal with this data but first you will have to say where in the C array the
alphabetic card types should go. This is done with the keyword:
order=n
where n is one or more of the codes ’1234567890-&’ or the letters A to Z (in upper or
lower case) not separated by spaces.
The card type bearing the first code in the list is read into c(101,200), the card bearing
the second code in the list is read into c(201,300), etc. For example, suppose each record
has ten cards – 1 to 9 and A – our struct statement might say:
struct;read=2;ser=c(1,4);crd=c5;max=10;order=123456789A
Data from card A would be read into cells 1001 to 1100 of the C array.
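The relationship between a code's position in the order= list and its cells in the C array can be worked out as follows. This Python sketch assumes 100 cells per card, as in the examples above; card_cells is a hypothetical helper, not a Quantum function.

```python
def card_cells(order, card_type, width=100):
    """Return the (first, last) cells for a card type given the order= list."""
    position = order.index(card_type) + 1   # 1 for the first code in the list
    start = position * width + 1
    return start, start + width - 1


order = "123456789A"
print(card_cells(order, "1"))   # (101, 200)
print(card_cells(order, "A"))   # (1001, 1100)
```

Card A is the tenth code in the list, so it lands in cells 1001 to 1100, as stated in the text.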
Merging Data using Quantum
Merge sequence for Trailer Cards
To define the location of the merge sequence number in trailer cards, type:
struct;seq=cn
When trailer card data is merged during a run with the merge facility, you may wish
trailer cards to be merged in a specific order, according to a sequence number entered as
part of the data. The location of this sequence number can be defined with the keyword
seq=cn for a single column code or seq=c(m,n) for a multicolumn code. For more
information on merging data see the next section.
Merging data files
When we say that Quantum allows you to merge data files, we do not mean that Quantum
takes data from a number of files and merges it to create a new file. Rather, we mean that
data can be read from a series of files during a Quantum run. Of course, the merged data
can then be written out to a new file for future use.
Quantum provides two methods for merging data. The first is designed for studies where
you have different card types in different files; for example, cards 1 and 2 in the file data1
and card 3 in the file data2. In this case, merging is by serial number and, optionally, card
type and trailer card sequence number.
The second method is designed for situations where you want to merge a field of data
from an external file into records from the main data file. For example, you may have a
file of manufacturers’ codes which refer to a number of products. If each record in the
main data file contains the product the respondent preferred, you may wish to merge the
appropriate manufacturer’s code from the external file into the main data in the C array.
In this case, merging is based on finding matching keys in the main record and the records
in the external file.
Both options are described in detail below.
Merging complete cards
Data for a study may be spread across a number of files. This is particularly useful with
large surveys because it means that you can put each card type in a different file and
simply merge in the cards required for the current batch of tables. For example, if we
require tables from cards 4 and 5, we need not even read in cards 1, 2, 3 and 6.
Data from up to 16 files may be merged; that is, the main data file and 15 others. It may
be merged on serial number and, within that, on card type. With trailer card data, you also
have the option of merging trailer cards according to a sequence number entered as part
of the data.
In order for the merge to be successful, all files must be sorted in ascending order with
the serial number, card type and sequence number in the same position. Quantum reads
the locations from the keywords ser=, crd= and seq= on the struct statement.
To merge data files you must create a file called merges telling Quantum which items to
merge on, and which files to merge. The type of merge is represented by a number:
1 merge on serial number. Cards are read in from each data file according to their
serial number only – the card type and sequence number, if any, are ignored. You
might use this option when you have two files, dat01 containing cards of type 1 and
dat02 containing cards of type 2, and you want the files to be merged so that card
type 1 is read into the C-Array, followed by card type 2.
3 merge on serial number and card type (default). With this option, cards with the
same serial number read from different data files are merged to form a single record
by comparing the serial number and card type. Cards within a record are then sorted
sequentially from 1 so that each card is read into the appropriate cells of the
C-Array. For example, if dat01 contains cards 1 and 3, and dat02 contains cards of
type 2, the merge will produce records containing cards 1, 2 and 3 in that order.
5 merge on serial number, card type and sequence number. This is similar to merge
type 3, except that trailer cards are merged according to their sequence number. For
example, if dat01 contains cards 1 and 2, where card 2 is a trailer card with a
sequence number of 2, and dat02 contains cards 2 and 3, where card 2 is a trailer
card with a sequence number of 1, the merged record will contain cards 1, 2/1, 2/2,
and 3, in that order.
This is the first item in the merges file, and is followed by the names of the files to be
merged with the main data file named in the Quantum command line. Items may be
entered on separate lines or all on the same line separated by semicolons. For example,
if we want to merge data in files dat02 and dat03 with data in the main file, dat01, by
serial number, card type and sequence number, the merges file would look like this:
5; dat02; dat03
Notice that we have not mentioned dat01 in the merges file because it will be named on
the Quantum command line instead.
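The effect of the three merge types can be imitated in Python with a sorted merge. The sketch below is an assumption-laden model, not Quantum's implementation: each card is represented as a hypothetical tuple (serial, card type, sequence number), and the merge type simply decides how many of those items are compared.

```python
import heapq


def merge_cards(*files, merge_type=3):
    """Merge already-sorted card lists on 1, 2 or 3 key items."""
    n_keys = {1: 1, 3: 2, 5: 3}[merge_type]
    # heapq.merge expects every input to be sorted on the same key.
    return list(heapq.merge(*files, key=lambda card: card[:n_keys]))


# Merge type 5 example from the text: card 2 is a trailer card.
dat01 = [(1, 1, 0), (1, 2, 2)]   # card 1, and card 2 with sequence number 2
dat02 = [(1, 2, 1), (1, 3, 0)]   # card 2 with sequence number 1, and card 3
for card in merge_cards(dat01, dat02, merge_type=5):
    print(card)
```

The cards come out in the order 1, 2/1, 2/2, 3. As in Quantum, the result is only meaningful if every input is already sorted in ascending order on the items being compared.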
Merging a field of data from an external file
To merge extra data from an external data file into the data currently in the C array, type:
int_variable=mergedata($ex_file$, key_field, key_start, copy_to, data_start)
where
ex_file is the name of the file containing the extra data.
key_field is the location of the key in the main data file, entered using the standard
Quantum notation for columns and fields
key_start is the start column of the key in the external data file.
copy_to is the field in the main data record in which to place the external data. The
field is defined using the standard Quantum notation for columns and fields.
data_start is the start column of the data to be copied.
This statement returns 1 in int_variable if a match was found, or 0 if not.
The mergedata statement merges a field of data from an external file with the main data
at the datapass stage of the Quantum run. Merging is by means of a data key present in
both the main records and the records in the external file. If a record in the external file
has a key which matches that of a record in the main data file, the external data will be
merged into a user-defined field of the main record when it is read into the C array.
In order for data to be merged correctly, both the main data file and the external file must
be sorted in ascending order by key value. If the key is the record serial number then the
data file will already be sorted in the correct order (assuming, of course, that the data is
sorted by serial number). If you are using a key that is not the record serial number you
must sort the data file so that it is ordered by key rather than by serial number.
The syntax for mergedata is:
int_variable=mergedata($ex_file$, key_field, key_start, copy_to, data_start)
where
int_variable
is the name of an integer variable in which the function can place its return
value.
ex_file is the name of the file containing the extra data. It must be enclosed in dollar
signs.
key_field is the location of the key in the main data file, entered using the standard
Quantum notation for columns and fields.
key_start is the start column of the key in the external data file, for example, 1 if the
key starts in column 1. The length of the key is taken from the length of
key_field.
copy_to is the field in the main data record in which to place the external data. The
field is defined using the standard Quantum notation for columns and fields.
data_start is the start column of the data to be copied. Quantum copies as many
columns as are defined by copy_to.
For example:
t1 = mergedata($manuf_codes$,c(178,180),15,c(168,175),1)
tells Quantum to compare the key in columns 178 to 180 of the main record with the key
which starts in column 15 of the external records in the file manuf_codes.
Because the key field in the main record is 3 columns long, Quantum reads columns 15
to 17 of each external record to obtain its key. If the keys match, Quantum copies the data
from the external record into columns 168 to 175 of the main record in the C array. The
external data to be copied starts in column 1 and, since the destination field is 8 columns
long, Quantum copies 8 columns starting at that column.
This statement returns a value of 1 if a match was found (i.e., merging took place), or 0
if not.
There is no limit on the number of mergedata statements in a specification, but you may
only merge data from up to nine different files per record.
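The way the key length and the copy length are derived from key_field and copy_to can be illustrated with a Python model. This is a sketch under stated assumptions: the record is a list of single characters addressed by 1-based column number, the external file is a list of strings, and the function scans every external line rather than relying on sorted order as Quantum does.

```python
def mergedata(record, external, key_field, key_start, copy_to, data_start):
    """Copy a field from a matching external line into record; return 1 or 0."""
    k_lo, k_hi = key_field
    c_lo, c_hi = copy_to
    key = "".join(record[k_lo - 1:k_hi])
    key_len = k_hi - k_lo + 1       # key length comes from key_field
    data_len = c_hi - c_lo + 1      # copy length comes from copy_to
    for line in external:
        if line[key_start - 1:key_start - 1 + key_len] == key:
            data = line[data_start - 1:data_start - 1 + data_len].ljust(data_len)
            record[c_lo - 1:c_hi] = list(data)
            return 1
    return 0


# Main record with key 042 in columns 178 to 180.
record = list(" " * 200)
record[177:180] = list("042")
# External line: data in columns 1 to 8, key in columns 15 to 17 (made-up values).
manuf_codes = ["ACME0001" + " " * 6 + "042"]
t1 = mergedata(record, manuf_codes, (178, 180), 15, (168, 175), 1)
print(t1, "".join(record[167:175]))
```

The call mirrors the example above: a 3-column key is compared, and 8 columns starting at column 1 of the external line are copied into columns 168 to 175.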
Writing out data
There are three ways of writing out your data once it has been read into the C-Array. You
may:
a) create a new data file
b) copy records to a print file
c) write information to a report file
Data and print files are both accessed by the write statement, but the exact format of the
statement varies according to the type of file and the information being written. Report
files are written to with the report statement.
Print files
Print files are printouts of records or parts of records with headings, descriptive texts and
page numbers. They cannot be used as data for subsequent Quantum runs.
Printing out individual records
To write a record or part of a record to a print file, type:
write [file_name] [field] [$text$]
The word write by itself prints out a whole record in the form it is when the write
statement is executed, together with a ruler showing which codes fall in which columns,
the line number of the record in the data file and the message ‘write’ indicating that the
record was generated by a write statement. Any multicodes in the record are shown as
asterisks, but you may change this with an option on the filedef statement.
If the record contains more than one card, each card is listed separately beneath the ruler.
For example, the statement:
write
by itself might give us:
Quantum edit report
1 in file
----+----1----+----2-- ... --9----+----0
column 1 - 100 are |12345
write
2 in file
----+----1----+----2-- ... --9----+----0
column 1 - 100 are |23456
write
Each write statement will produce a line in the default print file, out2, telling you how
many records were written out, as follows:
2 (1%) write
The example above was very simple; more often than not your program will contain
several write statements and you will want some way of identifying which records were
printed by which statement and why. If the write is dependent upon some other statement
– for instance, it is part of an if statement – the whole statement is printed underneath each
record, thus:
67 in file
----+----1----+----2-- ... --9----+----0
column 1 - 100 are |0015263-16*735 *837361 ... 79&
if (c14n’1/4’) write
Here, as you can see, we are checking that column 14 contains a 1/4. This record has been
printed out because it contains a ’5’ instead.
Sometimes it is more helpful to have an explanatory text printed instead of the statement
itself. In this case all that is necessary is to follow the word write with the text to be
printed enclosed in dollar signs:
if (c308n’1/5’) write $C308 incorrect$
if (numb(c117,c118,c119).gt.3) write $too many choices$
might give us:
Quantum edit report
Record 17 51 in file
----+----1----+----2-- ... --9----+----0
column 101 - 200 are |00170116548986131*46*1 ...
column 201 - 300 are |0017026464515 875 ** ...
column 301 - 400 are |0017031929-5897231 ...
C308 incorrect
too many choices
Record 32 94 in file
----+----1----+----2-- ... --9----+----0
column 101 - 200 are |003201837021 **53798 ...
column 201 - 300 are |0032021353452 763736 ...
column 301 - 400 are |003203212 & ...
too many choices
Our first statement writes out all records in which column 308 does not contain any of
the codes 1/5, and the second picks up all records having more than 3 codes in columns
117 to 119.
Normally all output from write goes to the default print file, and whenever the current
record is written to this file, the variable printed_ becomes true. You may change the
output file by following the word write with the name of the file to write to. For example:
write pfile $First Print$
writes to the file ‘pfile’, whereas:
write errors $Second Print$
writes to a file called ‘errors’.
All files named on write statements must be defined on a filedef statement before they are
used.
If two or more write statements apply to a single record, the record is printed out once in
the state it was when the first applicable write was read, with all relevant write statements
or texts listed below it. If a record satisfies two or more write statements which write to
different files, Quantum will write the record out once for each statement, in the state it
is when each write is executed.
Writing Out Parts of Records
Often you will not want to write out the whole record, especially if it contains several
cards. Therefore Quantum allows you to include a field specification in a write statement
to print only selected portions of an incorrect record. For example:
if (c110’2’.and.c119’2’) write c(110,120) $Married woman$
checks that columns 110 and 119 both contain a 2, and if so prints out columns 110 to
120 in the print file, followed by the text Married woman. If you are writing out fewer than
ten columns, Quantum does not print a ruler above the codes.
If you are dealing with multi-card records, you may prefer to use this form of write to
have only the card containing the error printed, rather than all cards in the record. If we
take our previous example where we were checking the contents of column 308:
if (c308n’1/5’) write $c308 incorrect$
prints all three cards in the record, whereas:
if (c308n’1/5’) write c(301,380) $C308 incorrect$
prints only card 3.
To write selected parts of a record to a particular file the notation is:
write filename c(m,n) [$text$]
Data files
To write records or fields to a data file, type:
write file_name [c(start_col, end_col)]
write may also be used to copy records to a data file. This is useful if you want to separate
a particular card type from the rest of the data, or if you want to correct errors and save
the corrected data in a new file for later tabulation.
To write records to a data file the command is:
write filename
to write the whole record to the named file, or
write filename c(m,n)
to write columns m to n only.
Creating new cards
New cards can be created by copying information into spare columns of the C-Array. To save
these as part of a new data file you will have to give each new card the same respondent serial
number as the rest of the data in the array and a card type which may or may not be unique. In
the example below, we are moving some information from card 1 of a 2-card data set into a new
card 3. The comments explain what each statement is doing.
/* Copy the data into the new card
c(310,341)=c(148,179)
/* Delete it from its original place
c(148,179)=$ $
/* Give it a serial number and card type
c(301,304)=c(101,104); c380=’3’
/* Set thisread true for card 3
thisread3=1
/* Define pfil as a data file
filedef pfil data
/* Copy cards 1, 2 and 3 to pfil
write pfil
Some General Instances for forcecoding cleaning etc.
Writing to a report file
To write information to a report file, type:
report[n] file_name variable_names
where variable_names is a comma-separated list of the variables and texts to print.
Use reportn rather than just report to start a new line each time the statement is executed.
A report file is a special type of print file in which you can print out records, fields or
variables in the format of your choice. To write information in a report file, use the report
statement, as follows:
report filename parameters
where filename is the name of the file to be written to, and parameters define exactly what
is to be written.
Lines in a report may be up to 1024 characters long. Report does not start a new line
automatically at the end of each write, but you may tell it to do so by following the
keyword report with the letter n:
reportn filename parameters
In both cases, the named file must be identified as a report file using a filedef statement,
as mentioned below.
The parameter list defines what is to be printed in the report file. It may contain variables,
texts, and special characters representing tabs and spaces.
Assignment statements
Assignment statements are used:
• to copy codes from one column into another.
• to replace certain codes in one column with those from a second column.
• to assign the value of an arithmetic expression to a variable.
• to copy codes from groups of columns into another column using the logical
operators and, or and xor.
In spite of the diversity of these functions the basic format of any assignment statement
is:
variable=item
where item defines what is to be copied into the variable.
Remember that comments can be identified by a capital C in column 1. If the first
variable in your statement starts with a C, make sure that you type it in lower case;
otherwise the whole line will be read as a comment and ignored. For example:
col 1
c(15,16)=$12$ is correct, but
C(15,16)=$12$ will be read as a comment even though the syntax is correct
Alternatively, you may precede assignment statements with the word set, thus:
set c(15,16)=$12$
Copying codes
To copy codes into a single data variable, overwriting the variable’s original contents,
type:
variable=’codes’
To copy a string of codes into a field, type:
var_name(start,end)=$codes$
To copy the contents of one variable or field into another, type:
variable1 = variable2
Assignment statements are most commonly used to copy codes into a column or to copy
the contents of one variable into another. For instance:
c121=’159’
c121=c134
You can also copy strings of characters into fields of columns. Let’s say we want to copy
the code 59642 into columns 76 to 80 of card 3; we would write:
c(376,380)=$59642$
Partial column replacement
To replace a code or set of codes in one data variable with a code or set of codes in a
second data variable, type:
variable1’codes1’=variable2’codes2’
codes1 and codes2 must contain the same number of codes, and the codes must be in
superimposable order.
Storing arithmetic values
To store the value of an arithmetic expression in a variable, type:
variable = expression
To copy a real value into a data variable, type:
var_name(start,end) :dp = expression
where dp is the number of decimal places required.
For example, if x5=10.22, the statement:
cx(15,19):2=x5
results in:
10.22
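The effect of the :dp suffix can be pictured as fixed-point formatting into a field of known width. The Python sketch below reproduces the example; store_real is a hypothetical name, and right-justifying shorter values and rejecting values that do not fit are assumptions, since the text does not say how Quantum handles those cases.

```python
def store_real(value, start, end, dp):
    """Format value with dp decimal places for columns start..end."""
    width = end - start + 1
    text = f"{value:.{dp}f}"
    if len(text) > width:
        raise ValueError("value does not fit in the field")
    return text.rjust(width)   # right-justification is an assumption


print(repr(store_real(10.22, 15, 19, 2)))   # '10.22'
```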
Assignment with and, or and xor
To copy codes which are present in at least one of a list of columns, type:
data_var_name=or(cnum1[’codes1’], cnum2[’codes2’], ...)
To copy codes which are present in all of a list of columns, type:
data_var_name=and(cnum1[’codes1’], cnum2[’codes2’], ...)
To copy codes which are present in only one of a list of columns, type:
data_var_name=xor(cnum1[’codes1’], cnum2[’codes2’], ...)
The final type of assignment is copying codes from a set of columns. The codes copied
depend upon the type of operator used:
and Copy codes present in all columns
or Copy codes present in one or more columns
xor Copy codes present in one column only
The format of the statement is:
column = operator(ca,cb,cc, ...)
where ca, cb, and cc are the columns whose codes are to be compared. Note that even if
you are comparing codes in consecutive columns, each column must be identified
separately.
For example, the statement:
c181=and(c137,c138,c139)
copies into c181 the codes that are present in all of the columns c137, c138 and c139, whereas the statement:
c182=or(c137,c138,c139)
places in c182 all codes present in at least one of the named columns.
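If you think of a multicoded column as a set of codes, the three operators correspond to simple set operations. The following Python sketch shows this; the column contents are made up for illustration, and xor is read as "present in exactly one of the columns", following the table above.

```python
from collections import Counter


def col_and(*cols):
    """Codes present in all columns."""
    return set.intersection(*map(set, cols))


def col_or(*cols):
    """Codes present in one or more columns."""
    return set.union(*map(set, cols))


def col_xor(*cols):
    """Codes present in one column only."""
    counts = Counter(code for col in cols for code in set(col))
    return {code for code, n in counts.items() if n == 1}


# Hypothetical contents of c137, c138 and c139.
c137, c138, c139 = "123", "23", "345"
print(sorted(col_and(c137, c138, c139)))   # ['3']
print(sorted(col_or(c137, c138, c139)))    # ['1', '2', '3', '4', '5']
print(sorted(col_xor(c137, c138, c139)))   # ['1', '4', '5']
```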
Adding codes into a column
To add codes into a column in addition to those that are already there, type:
emit cn1’codes1’ [, cn2’codes2’ ... ]
Emit inserts codes into a column leaving the original contents intact. Its format is:
emit cn’p’
More than one column may be entered on each line, provided that each one is separated
by a comma.
emit c567’7’, c110’2’
emit can only be used with single columns; string variables are not valid: emit
c(100,110)$99$ does not work.
Deleting codes from a column
To delete selected codes from a column, type:
delete cn1’codes1’ [, cn2’codes2’ ... ]
The delete statement is the opposite of emit in that it deletes codes from a column leaving
the remainder intact. Its format is:
delete cn’p’
More than one deletion may be effected with the same delete statement as long as each
column is separated by a comma.
delete c110’5’, c179’56’
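Treating a column as a set of codes again, emit and delete behave like set union and set difference. The Python sketch below is only a model of that behaviour, with made-up column contents.

```python
def emit(column, codes):
    """Add codes to a column, leaving its existing codes intact."""
    return set(column) | set(codes)


def delete(column, codes):
    """Remove codes from a column, leaving the remainder intact."""
    return set(column) - set(codes)


c110 = {"5", "7"}            # hypothetical starting contents
c110 = emit(c110, "2")       # like: emit c110'2'
print(sorted(c110))          # ['2', '5', '7']
c110 = delete(c110, "5")     # like: delete c110'5'
print(sorted(c110))          # ['2', '7']
```

Deleting a code that is not present, or emitting one that already is, leaves the column unchanged in this model.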
Forcing single-coded answers
To force single-coding of a multicoded column, type:
priority cn’code1’, ’code2’ ,’code3’,[cn2’code1a’, ’code2a’ ,’code3a’, ... ]
where a code at the start of the list should be accepted in preference to any later in the list.
Here, cn is the column whose codes are to be checked and ’code1’, ’code2’ and so on are the codes
to check, entered in order of priority, the most important first.
priority checks only the listed positions; if any other codes are present they are
ignored.
For example, the statement:
priority c249’5’, ’4’, ’3’, ’2’, ’1’
causes Quantum to scan column 249 to see first whether it contains a ’5’ and, if so,
to delete all subsequent codes in the list. If c249 contains a ’5’ and nothing else, obviously
there will be no extra codes to delete; this does not matter. If there is no ’5’ in c249,
Quantum then checks whether it contains a ’4’; if so, any other codes in the range ’1/3’
are deleted, otherwise the program skips to the next code in the list and checks for that.
If none of the listed codes are found, the column remains unchanged.
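The scan described above can be written out as a short Python function. It is a sketch of the logic as the text explains it, with the column held as a set of codes; codes that are not in the priority list are left alone, in line with the note that unlisted codes are ignored.

```python
def priority(column, codes):
    """Keep the highest-priority listed code and delete the later listed ones."""
    for i, code in enumerate(codes):
        if code in column:
            # Delete every code that comes after this one in the list.
            return set(column) - set(codes[i + 1:])
    return set(column)   # none of the listed codes found: unchanged


order = ["5", "4", "3", "2", "1"]
print(sorted(priority({"3", "5"}, order)))   # ['5']
print(sorted(priority({"2", "4"}, order)))   # ['4']
print(sorted(priority({"7"}, order)))        # ['7']
```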
Setting a random code in a column
To choose a random code from a list of codes, type:
data_var_name=rpunch(’codes’)
To choose a random code from the codes present in a column, type:
data_var_name=rpunch(col_number)
For example:
c115 = rpunch(’1/5’)
will place one of the codes 1 through 5 in column 115.
Alternatively, you may use rpunch with another C-variable, thus:
c115 = rpunch(c120)
Once this statement has been executed, column 115 will contain one of the codes present
in column 120.
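In Python terms, both forms of rpunch amount to a random choice from a collection of codes. The sketch below is a model only; it does not reproduce how Quantum generates its random numbers, and the range '1/5' is written out as the five codes it stands for.

```python
import random


def rpunch(codes, rng=random):
    """Pick one code at random from the codes supplied."""
    return rng.choice(sorted(codes))


c120 = {"2", "6", "9"}        # hypothetical multicoded column
print(rpunch("12345"))        # one of the codes 1 to 5
print(rpunch(c120))           # one of the codes present in c120
```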
Reading numeric codes into an array
To set up an array based on numeric codes in the data, type:
field array_name=column_spec [,code=cell_number, ...]
column_specs are references to the fields containing the numeric codes. code is a
non-numeric code present in those fields and cell_number is the cell of the array which
should be incremented whenever that code is encountered.
Cells in the array are reset to zero at the start of each new record. To prevent this
happening, enter the statement name as fieldadd rather than field. The rest of the
statement is as shown.
The format of the field statement is:
field output_array = column_specs [,special_specs]
output_array is the name of the array in which you wish to store the counts of responses.
You can use spare columns in the C array, but you may find your program is easier to
read if you define an integer array of your own with a name which reflects the type of
information it contains. For example, if you want an integer array called films, you might
write:
int films 5s
ed
field films = .....
When you define the integer array, make sure that you request as many cells as there are
codes in the data. In this example there are five films so you define the array as having
five cells. Quantum automatically creates an extra cell (cell 0) which it uses to count
responses for which there is no cell allocated. If there were six films, for example,
Quantum would increment cell 0 each time it found code 06 in the films columns. You
might like to check the value of this cell as a means of reporting on invalid codes:
if (films0 .gt. 0) write c(1,20) $Bad film code$
Negative and zero values also cause cell zero to be incremented. Codes which are shorter
than the field width are accepted as long as they are padded with blanks or zeroes.
The column_specs part of the statement defines the columns to read. You have a number of
choices here. First, you may list each column or field reference one after the other,
separated by commas. The list must be enclosed in parentheses. In our example this
would be:
field films = (c(12,13), c(14,15), c(16,17))
Second, if you have sequential fields as you do here, you can type the start columns of
each field followed by the field length. The list of start columns is separated by commas
and enclosed in parentheses, and the field length comes after the closing parenthesis and
starts with a colon. If you use this notation for the film example you would write:
field films = (c12, c14, c16) :2
If you wish, you can abbreviate this further by typing just the start columns of the first
and last fields, followed by the field length.
field films = c12, c16 :2
Third, if the fields are not sequential, you list the start columns and field width of each
group of columns (as shown above) and separate each group with a slash. For example,
to read data from columns 12 to 17 and 52 to 57, with each field being two columns wide,
you would type:
field films = c12, c16 / c52, c56 :2
This reads c(12,13), c(14,15), c(16,17), c(52,53), c(54,55) and c(56,57).
You can also use this notation for single non-sequential fields. For example:
field films = c23 / c36 / c71 :2
means c(23,24), c(36,37) and c(71,72).
The special_specs part of the statement is optional. You use it when a field contains
non-numeric codes such as $&&$ for None of these films. If you want to count codings
of this type, you must remember to allocate cells in the array for each code or group of
codes you wish to count. You then include the notation:
code = cell_number
to count those codes. For example:
int films 6s
ed
field films = (c12, c14, c16) :2, $&&$=6
If you want to count more than one non-numeric code, list each one individually,
separated by commas.
Quantum normally resets the cells of the integer array to zero at the start of each record.
If you want counts to continue from one record to another, use a fieldadd statement
instead of field. For example:
fieldadd films = (c12, c14, c16) :2
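The counting rules for field can be summarised in a Python sketch. It follows the description above for numeric codes, cell 0 and special codes; skipping blank fields and sending unlisted non-numeric codes to cell 0 are assumptions, and the function name and arguments are hypothetical.

```python
def field(fields, n_cells, special=None):
    """Count the codes found in fields into cells 0..n_cells."""
    special = special or {}
    cells = [0] * (n_cells + 1)      # cell 0 collects codes with no cell of their own
    for text in fields:
        if text in special:
            cells[special[text]] += 1
            continue
        stripped = text.strip()
        if not stripped:
            continue                 # blank field: assumed to mean nothing was coded
        try:
            code = int(stripped)
        except ValueError:
            code = 0                 # unlisted non-numeric code: assumed to go to cell 0
        if 1 <= code <= n_cells:
            cells[code] += 1
        else:
            cells[0] += 1            # zero, negative or too-large codes
    return cells


# Five films; code 06 has no cell, so it is counted in cell 0.
print(field(["01", "03", "06"], 5))              # [1, 1, 0, 1, 0, 0]
# Six cells, with && counted in cell 6.
print(field(["02", "&&", "  "], 6, {"&&": 6}))   # [0, 0, 1, 0, 0, 0, 1]
```

A fieldadd equivalent would pass the previous record's cells in and keep adding to them instead of starting from zeroes.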
Clearing variables
To remove values from variables, type:
clear var_name1, var_name2, var_name3
Variables of any type may be cleared using a clear statement:
clear var1, var2, .... varn
where var1 to varn are any valid Quantum variable or range of variables. For example:
clear c(109,180), t(1,200), myarray(29,33), myint, myreal
Data variables are reset to blank, integer variables are reset to 0 and real variables are reset to
0.0.
Variables can also be cleared using assignment statements (e.g., t1=0), with a loop when a range of
cells is involved, but there are advantages to using clear instead. Firstly, clear is much easier to write.
Secondly, with clear the compiler checks that the subscripts are in the correct range (e.g., 1 to 33 if
‘myarray’ has only 33 cells); this is not possible with the loop method because the subscript is a
variable. However, if you use variables as subscripts with clear (e.g., clear c(t1,t1+5)), subscript
checking once again cannot be done.
Flow control
Statements in the edit section are usually dealt with in the order in which they occur in the program. Quantum provides statements which may be used to alter this normal order of execution, for example, by missing out a statement or repeating a group of statements a number of times.
Statements of condition
1) Ed - Defines the start of the edit section of a Quantum run. This statement is essential if a
Quantum run contains an edit section.
2) End - Defines the end of the edit section. This statement is required if the run contains an edit
section.
1) If - Defines statements to be executed if a certain condition is true. For example:
if (numb(c10,c11,c12).gt.3) emit c20’9’
2) Else - Defines statements to be executed if a given condition is not true. For example:
if (c115’1’); else; emit c140’2’
3) go to - Jumps to a labeled statement, so that the statements in between are applied to certain
respondents only. For example, the statement:
if (c121n’1’) go to 50
causes Quantum to go immediately to the statement labeled 50 if column 121 does not contain a
’1’. Any statements between this if statement and statement 50 are ignored whenever a record is
read where c121n’1’ is true. The statement labeled 50 may be any Quantum statement, but many
people just write:
50 continue
4) continue - This statement is a dummy statement whose sole purpose is to join various bits of a
program together. It is often used with a statement label as a destination for routing with go to, or
to identify the end of a loop.
5) Loops - Used to define repetitive statements. Loops are extremely important structures
because they enable the same set of basic statements to be executed over and over again on a
changing series of numbers, columns or codes. Their use can reduce the work involved in
checking data. The statement which introduces a loop is do, which is formatted as follows:
1. The word do.
2. A label number identifying the last statement in the loop.
3. An integer variable (for numbers or columns) or a letter (for codes) whose value is to be
used by the statements in the loop.
4. An equals sign.
5. A list of whole numbers, integer variables or codes which are the values the integer
variable or letter is to take. These may be entered in two ways
Loops may be terminated by any statement other than go to, stop, return, another do or an if
containing any of these words. The main purpose of the terminating statement is to identify the
end of the loop and send the program back to the start of the loop. Go to and return send the
record elsewhere, stop terminates the run and another do indicates the start of another loop. The
statement most often used to terminate a loop is the dummy statement continue. Any statement
that terminates a loop must be preceded by a label number.
Thus, the usual format of a loop is:
do label.number int.var = value list
- - statements to be executed - -
label.number statement
For example:
do 20 t5 = 125,145,5
if (c(t5,t5+4).gt.3000) c(t5,t5+4)=$ $
20 continue
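The effect of this loop can be modelled in Python. This is a sketch rather than Quantum code, and it assumes that the three numbers on the do statement are a start value, an inclusive end value and a step. The record is represented as a dictionary from column number to character, and the helper names are hypothetical.

```python
def do_loop_values(start, end, step=1):
    """Values taken by the loop variable, assuming an inclusive end value."""
    return list(range(start, end + 1, step))

def blank_large_fields(record, starts, width=5, limit=3000):
    """Blank every `width`-column field whose numeric value exceeds `limit`.

    `record` maps column numbers to single characters.
    """
    for t5 in starts:
        cols = range(t5, t5 + width)
        text = "".join(record.get(c, " ") for c in cols).strip()
        if text.isdigit() and int(text) > limit:
            for c in cols:
                record[c] = " "
    return record

starts = do_loop_values(125, 145, 5)

# Hypothetical record: 4500 in c(125,129) and 250 in c(130,134).
record = {}
for col, ch in zip(range(125, 130), "04500"):
    record[col] = ch
for col, ch in zip(range(130, 135), "00250"):
    record[col] = ch

blank_large_fields(record, starts)
first = "".join(record[c] for c in range(125, 130))
second = "".join(record[c] for c in range(130, 135))
print(starts, repr(first), repr(second))
```

Under these assumptions the loop variable takes the values 125, 130, 135, 140 and 145, so five fields of five columns each are tested; only the first field in the sample record exceeds 3000 and is blanked.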
6) Reject - Excludes a record from the tables. Normally all records are passed straight from
the edit to the tabulation section regardless of whether or not they contain errors. Reject tells
Quantum to continue editing the record but not to include it in the tables.
For instance, we might write:
if (c73’8’) reject
if (c80’1’) t5=t5+1
end
to reject records in which column 73 contains an ’8’ from the tabulations but not from the rest of
the edit. Therefore, even if c73’8’, the record is still checked for a ’1’ in column 80 and if one is
found, t5 is incremented.
7) Return - Sends the record to the tabulation section. The word return in Quantum bears no
relation to the same word in English. It does not mean go back to the start of the edit or anything
like that; rather, it means ‘terminate the edit immediately and jump to the tabulation section’. Once
the record is tabulated, Quantum reads in another record as usual. If there is no tabulation
section, the next record is read in straight away.
Return is very often used with reject to reject a record without finishing the edit. For example:
if (c73’8’) reject; return
if (c80’1’) t5=t5+1
end
Here any records in which c73’8’ are rejected from the tables, but, because reject is
followed by return which sends records to the tabulation section, editing is terminated
immediately. Thus, only records in which c73n’8’ will be tested for a ’1’ in column 80.
8) Stop - Stops editing records and starts tabulating the records read so far. Stop tells Quantum to
stop the run and print tables once editing has been completed on the current record.
For example, we may want test tables for the first 100 people, so we set up a counter and
terminate the run when it reaches 100. The statement:
if (rec_count.eq.100) stop
will stop editing records and start tabulating the records read so far.
9) Process - Sends a record temporarily to the tabulation section. Process is an edit statement
which is similar to return but must not be confused with it. When return is executed, the record is
sent on to the tabulation section; after the tables are completed for that record, the program
returns to the start of the edit section and the next record is read in.
When process is executed, the record is also sent immediately to the tabulation section where it
is used in table creation. However, after the record has been tabulated, control is passed back to
the edit section to the statement immediately following the word process.
The record continues through the edit and any statements after process applicable to the record
are executed. At the end of the edit the record is passed through the tabulation section again.
10) Split - Writes correct records out to a clean data file and incorrect records out to a dirty data
file. Clean and dirty data files are the terms used to refer to files of correct and incorrect or
rejected records created automatically by the edit statement split.
Examining records
Holecounts
Holecounts are used to obtain an overall picture of the data before you write your edit program.
For each column they show:
o a distribution of the codes – e.g., how many respondents have a 2 in column 56
o the density of coding – i.e., how many respondents have 1, 2 or 3 or more codes in each
column
o the total number of codes for the whole data file.
Creating a holecount
To create a holecount, use the count statement:
count c(start_col,end_col) [$text$]
where text is the heading to be printed at the top of each page. This is optional; if it is omitted the
holecount will simply be headed ‘Holecount’. Our example was created by the statement:
count c(1,16) $Demonstration Holecount$
Frequency distributions
A frequency distribution enables you to inspect the contents of a field of columns containing
alphabetic or numeric data. For example, in a shopping survey the price the respondent paid for a
bottle of mineral water may be stored in columns 112 to 114. A frequency distribution will tell you
how many respondents bought mineral water at a particular price. This is very useful for
determining how the values in these fields should be grouped for tabulation, as well as for rough
estimates of medians.
To create a frequency distribution sorted in alphabetic and rank orders, type:
list c(start_col, end_col) [$text$]
where text is the heading to be printed.
To produce a frequency distribution sorted in alphabetic order only, type lista instead of list. For a
distribution sorted in rank order only, type listr instead of list.
Here are some examples:
listr c(107,108) $Contents of cols 7 and 8$
lista c(100,104) $First Set of Car Brands$
The first example produces a frequency distribution of the contents of c(107,108) sorted in
rank order; the second example generates a list of car brands which will be sorted in
alphabetic order.
Data validation
In earlier sections we discussed ways of examining the data for a set of records (with count) or for
an individual record (with write). In general, however, we want to check the validity of the data for
individual records by putting in the edit a set of testing sentences which will tell us not only
whether a record contains an error but also what that error is. There are two types of checking
sentence.
The first involves checking whether a column contains the correct type of coding (single-coding/
multi coding) and whether the codes in that column are valid. Take the question on a
respondent’s sex which may be Male, coded c106’1’, or Female, coded c106’2’. c106 must be
single-coded since no person can have two sexes, and the only codes which may appear in that
column are 1 and 2. Any record in which c106 is not single-coded with a 1 or a 2 will be flagged as
incorrect.
The second type of checking involves making sure that columns whose contents depend on the contents of other columns contain the correct codes. For instance, suppose the questionnaire asks whether the respondent has ever used a particular brand of washing up liquid. The answer is coded into c125 as ’1’ for Yes and a ’2’ for No. If the answer is Yes, the next questions concerning price and quality are asked. If c125’2’ indicating that the respondent has not used that brand of washing up liquid, the following columns must be blank. Conversely, if c125’1’, the following columns must be coded according to the codes on the questionnaire.
require
Both tasks listed above can be carried out using if but sometimes they can become very
complicated and repetitive. Therefore, Quantum has an additional testing statement, require,
specifically designed to increase the efficiency of this checking process.
Require is used in three types of sentence:
Column Validation
Tests columns against a given set of characteristics and deals with records not meeting the
requirements according to a specified action code.
Testing the Validity of a Logical Expression
Tests a logical expression and, if it is true, continues with the next statement. If the expression is
false, the record is dealt with according to the given action code.
Testing the Equivalence of Logical Expressions
Compares the logical value of a group of logical expressions. If all are true or all are false, the run
continues with the next statement, otherwise if the expressions yield a mixture of values the
specified error action is carried out.
The require statement has three forms, depending upon the function it performs, and these are
described in the subsequent sections. Each one must start with the word require which may be abbreviated to R.
Column and code validation
To validate columns and codes, type:
require[/code/] condition col1 [,col2 ...]
where code is the error action code, condition is the type of coding required, and col1 and col2
are the columns or fields to be tested.
This form of the require statement has four basic parts:
1. The word require or the letter r followed by a space.
2. An optional error action code enclosed in slashes.
3. A code defining the type of coding required.
4. The column or columns to be checked, separated by commas.
Checking type of coding
Checking with require can be as simple or complex as you like. In this section, we will start with
the simplest checks and deal with each extra feature in turn. We will assume, unless otherwise
stated, that the error action code is the default Print and Reject (code 3) and will omit it from most
of the examples accordingly.
The most basic form of the require statement simply checks whether the column or field of
columns contains the correct type of code; it does not check the individual codes themselves.
Code types may be:
b Blank
nb Not blank (i.e., single-coded or multi coded)
sp Single-coded (literally, single-punched)
spb Single-coded or blank
One of these types must follow the word require since it tells Quantum what to check for. All that
remains is to say which columns are to be inspected; just list each column or field of columns at
the end of the statement. If more than one column or field is defined, each one must be separated
by a comma.
Here are some examples in which the record to be checked is:
----+----1----+----2----+----3----+----4----+
002411123481231&- *1927235537*&& 1 1 1
The statement:
require nb c10, c(25,35)
checks that columns 10, and 25 to 35 inclusive are not blank – they may contain any number of
codes. This record satisfies both conditions so it passes on to the next statement in the edit.
The statement:
r sp c11, c15, c23, c41
looks to see whether columns 11, 15, 23 and 41 are single-coded. In our record they are, but if
this were not the case (say c11’123’) the record would be printed out and rejected from any tables
that may be produced. Additionally, Quantum would tell us ‘Column 11 is 123’.
Comments with require
To define a message to be printed when a record fails a test, type:
r [/err_code/ ] condition columns $message$
When incorrect records are printed out, require automatically prints a short text describing the
error. Normally, it tells you what codes were found in the column which is wrong, but if this is not
what you want, you may define your own error text by entering it enclosed in dollar signs at the
end of the statement. This text will then be printed in place of the default text when errors are
found. For example, if c329 is multicoded when it should be single-coded, the statement:
r sp c329
will print the whole record and tell us which codes were found in that multicode:
Column 329 is 13
Instead of being told which codes the column contains, you may prefer to see a message linking
the error to a question on the questionnaire. In this case you will need to add your own error text
as follows:
r sp c329 $q21a not sp$
These texts may be as long or short as you like.
Checking codes in columns
To check for specific codes in a column, type:
r [/err_code/] condition col1’codes1’ [, col2’codes2’ ... ]
where codes1 are the codes to be tested for in column or field col1, and codes2 are the codes to
be tested for in column or field col2.
Any codes which are present in col1 but are not listed in codes1 are ignored. The same applies
to any other column and code pairs listed.
Sometimes it is not sufficient to check just the type of coding, and you will want to know whether
the codes found are valid for that column. To do this, we use the information given in the previous
section as a base, and add on our first ‘optional extra’. To check whether a column or field of
columns contains specific codes, follow the column specification with the codes to be checked,
enclosed in single quotes. For example:
r /5/ sp c223’1/5’
tells us that column 223 should be single-coded within the range of codes 1 through 5. Any other
codes in this column are ignored. Thus, a record in which c223’14’ is incorrect because it contains two of the listed codes, whereas a record in which c223’27’ is correct because it contains only a 2 from the range ’1/5’. Of course, any record which does not contain a 1, 2, 3, 4 or 5 at all is also incorrect, regardless of whether or not it is single-coded: c223’9’ is just as wrong as c223’789&’.
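The rule described above can be modelled in Python. This is a sketch, not Quantum code. It assumes that the twelve code positions run in the order 1 to 9, 0, - and &, so that a range such as ’1/5’ or ’6/&’ expands along that sequence; the names CODE_ORDER, expand_codes and passes_sp are hypothetical.

```python
CODE_ORDER = "1234567890-&"  # assumed order of the twelve code positions

def expand_codes(spec):
    """Expand a code specification such as '1/5' or '1/9-' into a set of codes."""
    codes = set()
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "/":
            lo = CODE_ORDER.index(spec[i])
            hi = CODE_ORDER.index(spec[i + 2])
            codes.update(CODE_ORDER[lo:hi + 1])
            i += 3
        else:
            codes.add(spec[i])
            i += 1
    return codes

def passes_sp(column, spec):
    """True if exactly one of the listed codes is present; unlisted codes are ignored."""
    listed = expand_codes(spec)
    return len(set(column) & listed) == 1

for contents in ["14", "27", "9", "789&"]:
    print(contents, passes_sp(contents, "1/5"))
```

In this model only the column containing ’27’ passes, which matches the cases discussed in the text.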
Exclusive codes
To check that a column or field contains no codes other than those listed, type:
r [/err_code/] condition col1’codes1’o
If col1 contains any codes other than those given in codes1, the test is false.
Now that you know how to check codes, the next thing to discuss is how to check that all other
code positions are blank.
We have said that statements of the form:
r sp ca’p’
accept all records containing only one of the codes ’p’ in column a, regardless of what other
codes are also present. To check that a column contains only the listed codes and nothing else,
follow the code specification with the letter O (for only) in upper or lower case. For example, to
indicate that c356 must be single-coded in the range ’1/5’ and that all other positions (’6/&’) must
be blank, you should type:
r sp c356’1/5’o
which is the same as
if (c356’6/&’.or.numb(c356).ne.1) write; reject
Any of the following would cause the record to be printed and rejected:
c356’34’ c356’59’ c356’8’ c356’ ’
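The exclusive test can be modelled in the same way. The sketch below is Python, not Quantum code; the function name passes_sp_only is hypothetical, and the allowed codes are written out in full rather than as a range.

```python
def passes_sp_only(column, allowed):
    """Model of an exclusive single-code check: exactly one code, and it must be allowed."""
    present = set(column) - {" "}
    return len(present) == 1 and present <= set(allowed)

# Contents of c356 from the examples above, plus one valid case.
results = {contents: passes_sp_only(contents, "12345")
           for contents in ["3", "34", "59", "8", " "]}
print(results)
```

Only the column containing a single ’3’ passes; the other four cases are the ones listed above as causing the record to be printed and rejected.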
Require may define conditions for more than one column. Just follow each column with the code
positions to be checked and separate each set with a comma:
r sp c164’12-’, c165’1/70’, c166’1/3’, c167’1/9-’, c168’1/5’
Here the columns to be checked are consecutive but have been listed separately because they
each have different sets of valid codes. If all columns could be single-coded in the range 1 to 7
we might abbreviate this to:
r sp c(164,168)’1/7’ $q10a/e$
since this notation means that each column in the field must be single-coded within the given
range rather than that the field as a whole may contain only one of those codes.
Automatic error correction
To define a correction code to be used as a replacement for codes which fail the required
condition, type:
r [/err_code/] condition col1’codes1’ :’new_code’
new_code is the code or codes to be inserted in col1 if it fails the test condition. Any codes
already in that column are overwritten.
As you know, records found to have errors are printed, coded and/or rejected according to the
error action code. When the run is finished you will look at these records and, if possible, correct
the errors by using the on-line edit or correction file facilities.
Occasionally you will know in advance what to do with certain types of error; say, for instance, the
respondent’s sex has been miscoded. You may decide or be told to recode this person as a ’3’ in
the appropriate column indicating that the sex was not known. The way to do all this in one go is
to write the normal require statement that checks columns and codes, and to follow the code
specification with a colon (:) and the replacement code (in this case ’3’) enclosed in single quotes,
thus:
r /2/ sp c106’12’ :’3’
Any record in which c106 is not single-coded with either a ’1’ or a ’2’ will have the contents of
c106 overwritten with a ’3’.
The equivalent using if and an assignment statement would be written:
if (numb(c106’12’).ne.1) c106’3’;
+write $c106 incorrect$
Once again, the require is shorter and quicker.
When working with fields, it is not possible to define replacement strings for the field as a whole.
You should, however, note that if a single replacement code is given for a field of columns, any
incorrect columns in that field will be overwritten with the replacement code, while the correct
columns remain untouched. If we have:
+----4----+
1927
and we write a require statement containing c(237,240)’1/5’ :’&’ we will have:
+----4----+
1&2&
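The column-by-column replacement can be sketched in Python. This is an illustrative model rather than Quantum code: it assumes each column holds a single code, so the field can be treated as a string with one character per column, and the function name replace_invalid is hypothetical.

```python
def replace_invalid(field, valid, replacement):
    """Overwrite each column whose code is not in `valid`; leave the others untouched."""
    return "".join(ch if ch in valid else replacement for ch in field)

corrected = replace_invalid("1927", "12345", "&")
print(corrected)
```

The columns holding 9 and 7 fall outside the range 1 to 5 and are overwritten, while the columns holding 1 and 2 are kept.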
Validating logical expressions
This type of require also has four parts, two of which are optional:
1. The word require or the letter r followed by a space.
2. An optional action code enclosed in slashes.
3. A logical expression enclosed in parentheses.
4. An optional error text enclosed in dollar signs.
For example:
r /3/ (c133’4’ .and. c140n’5’) $Cols 33/40 incorrect$
says that c133 must contain a ’4’ and c140 must not contain a ’5’. If one or other or both expressions are false, Quantum prints the record out with the message ’Cols 33/40 incorrect’ and rejects it from the tables.
Testing the equivalence of logical expressions
To test whether a group of logical expressions all have the same logical value, type:
r = (expression1) (expression2) ...
There must be a space between r and the = sign.
Require can evaluate groups of expressions and perform given tasks depending on whether all
expressions are true or all are false. When all the expressions have the same value (i.e., all true
or all false) Quantum continues with the next statement in the program, whereas if some are true
and some are false, the record being tested will be dealt with according to the given (or default)
error action code.
This statement has five parts:
1. The word require or the letter r.
2. An equals sign which must be preceded by a space.
3. An optional action code.
4. The expressions to be evaluated, each one enclosed in parentheses.
5. Optional error text enclosed in dollar signs.
This type of statement is generally used to check routing patterns. For example: if a ’2’ in c125
means that the respondent did not try Brand A washing powder, we would expect columns 126 to
145 which record his opinion of it to be blank. On the other hand, if he tried the washing powder,
we would expect to find his opinions about it coded in columns 126 to 145. This can be written:
r = (c125’2’) (c(126,145)=$ $)
which says that to be accepted, a record must either have a ’2’ in column 125 and blanks in columns 126 to 145, or something other than a ’2’ in c125 with at least one code somewhere in c(126,145).
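The equivalence test amounts to checking that the logical values are all true or all false. A minimal Python model of the routing check follows; it is not Quantum code, and the function names and the representation of c125 as a set of codes are assumptions made for the example.

```python
def require_equivalent(*values):
    """True when all the logical values agree (all true or all false)."""
    return all(values) or not any(values)

def check_routing(c125, opinions):
    """c125 is the set of codes in column 125; opinions is the text of columns 126 to 145."""
    not_tried = "2" in c125
    opinions_blank = opinions.strip() == ""
    return require_equivalent(not_tried, opinions_blank)

blank = " " * 20
coded = "3" + " " * 19
print(check_routing({"2"}, blank))   # did not try, no opinions
print(check_routing({"1"}, coded))   # tried, opinions present
print(check_routing({"2"}, coded))   # did not try, but opinions present
print(check_routing({"1"}, blank))   # tried, but no opinions
```

The first two records are consistent and would be accepted; the last two mix a true and a false expression and would be handled according to the error action code.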
Actions when a require statement fails
When Quantum executes a require statement, it sets the variable failed_ to True if the data fails
the require statement or to False if the record passed the requirement. You can then test
whether failed_ is True and take whatever actions you wish. For example, if you are checking
that the respondent’s sex is coded as a ’1’ or a ’2’ only, you may wish to blank out the column if it
contains any other code or codes. You could write this as:
r sp c123’12’
if (failed_) set c123’ ’
The test for failure is made on the last require statement executed for the current record.
This may not always be the most recent require statement in the program, and it may not be the
require statement you intend Quantum to execute. If you write:
r sp c112’1/5’
if (c115’1’) r b c116
if (failed_) set c116’ ’
the test for failure could apply to either of the previous statements. If column 115 does not contain
a ’1’, the second require statement will not be executed and failed_ will be True if column 112 is
not single-coded in the range ’1/5’. If column 115 contains a ’1’, then failed_ will be True if
column 116 is not blank.
You can get around this potential problem by setting failed_ to zero (the equivalent of False) just
before the require statement you wish to test. For instance:
r sp c112’1/5’
failed_ = 0
if (c115’1’) r b c116
if (failed_) set c116’ ’
Data correction
There are four ways to correct data:
o Correct the data in the original data file.
o Correct the data in the C array interactively.
o Replace the incorrect codes with specific codes using edit forcing statements.
o Write a file of corrections to be merged with the original data when it is read in by a
Quantum program.
Forced editing (forced cleaning)
This section does not introduce any new keywords; instead it tells you how to combine the
statements that you already know in order to clean your data.
A record which generates too many error messages, or which is clearly incorrect can be
removed, as noted. Suppose its serial number is 2004. Then we have:
if (c(101,104)=$2004$) reject; return
This rejects the record from the rest of the edit and the tabulation section as well. This statement
should be at the beginning of the edit to avoid unnecessary editing of a useless record.
Columns within a record can be removed by blanking them out or setting them to a common
reject code, often a minus or ampersand. For example:
if(c125n’12’) c125’&’; c(126,145)=$ $
All records in which c125 contains neither a 1 nor a 2 will have the contents of that column
replaced with an ampersand, and whatever is in c(126,145) blanked out. As a real-life example,
suppose a 1 in c125 means that the respondent visited the market, and a 2 in that column means
he did not. Information about purchases made at the market is stored in c(126,145). If column
125 contains neither a 1 nor a 2, we cannot clearly establish whether or not the respondent visited
the market so we set c125 to a special code and blank out any information about purchases.
Inserting correct data is generally more difficult than removing invalid data, because you very
often don’t know what the correct data is. However, if you do know, you can correct the data
record by record, or make the same correction for any record which is incorrect. For instance:
if(c(101,104)=$2222$) c112’2’; c(113,114)=$ $
corrects the record whose serial number is 2222 by setting a 2 into c112 and blanking out
c(113,114).
If you do not know what the correct data is, you may decide to replace the incorrect code or
codes with a valid code chosen at random. For example:
if (c(101,104)=$3625$) c145=rpunch(’1/5’)
replaces whatever was in column 145 with one of the codes 1 through 5 for the record whose serial number is 3625.
Introduction to the tabulation
When a record has passed through the edit without being rejected, it is passed to the tabulation section, if one exists. At this point, data, integer and real variables are available to create tables. The program deals with one complete record at a time. The tabulation section consists of a series of statements which determine the contents of the tables. Each table may be thought of as a matrix of cells. Each cell of this table is defined by two conditions, one from the row and one from the column.
The hierarchy of the tabulation section
The tabulation process is hierarchical in that characteristics defined at one level apply to that level and to all lower levels.
Components of a tabulation program
A tabulation run consists of three sets of control statements:
Run control statements
Run control statements determine the overall characteristics of the run, and contain the text which is constant for all tables. Filters may be defined, applicable either to all tables in the run or to all tables defined before another general filter statement is read. Titles are entered in various ways depending upon their position in the table.
Defining run conditions
To define global and default conditions for the run, type:
a;opt1[; opt2; ... ]
at the start of the tabulation section.
Global run conditions, if any, are defined on the a statement. If used, it must be the first statement
in the tabulation section. Its format is:
a;options
where options are keywords defining the global characteristics of the run. You may list as many
keywords as you like, provided that they are separated by semicolons (;), for example:
a;dsp;op=12;date;dec=1
Some of the commonly used options and their functions are:
colwid=n Defines the width of columns in the printed tables where no p statements exist in the
column
csort Sort tables column-wise (i.e., horizontal sorting rather than vertical row-wise sorting).
date By default, tables are printed without a date. Use of the keyword date causes the current
date to be printed in the top right-hand corner of each table. The date is in the format dd mm yy.
dec=n This determines the number of decimal places for absolute figures. If
dec= is not used, the default of no decimal places is assumed.
decp=n This sets the number of decimal places required for percentages. The default is decp=1
meaning one decimal place. This applies when op=0, 2, 7 or & (see below). Any number of
decimal places are allowed, as long as you make each column wide enough to accommodate
them.
dsp This leaves one blank line between each row of data in a table. Without this, one line follows
directly underneath another.
flt=name Invokes the filter conditions and titles named on the flt= statement. If the filter defines
conditions, the rules governing data options apply.
flush Causes rows containing percentages to be printed with the percentages directly below the
absolutes rather than one column to the right.
indent=n Where a row text is longer than the space allocated to the row text in the table,
Quantum breaks the line in between words and continues the text on the next line. To have these
continuation lines indented from the left margin, specify the amount of indentation required with
indent=. Texts may be indented by between 0 and 15 spaces: the default is indent=0.
op=n This keyword governs the type of output in the tables. Output types are:
& Total percentages. The value in the cell is percentaged against the number in the upper
left-hand corner of the table (normally the base) rather than on the totals in the relevant column
or row. If the table contains more than one base element, percentages are calculated using the
leftmost figure in the most recent base element.
- Row rank figures are printed below each cell. Figures are ranked within rows, using 1 for
the largest figure. Where two or more numbers have the same rank, they are all assigned the
lowest rank possible. Thus, if the previous rank was 2 and the next value to be ranked occurs in
the row three times, those numbers will all be ranked 5.
0 Row percentages.
1 Absolute figures (default).
2 Column percentages.
3 Column rank figures are printed below each cell. Figures are ranked within columns,
using 1 for the largest figure. Where two or more numbers have the same rank, they are
all assigned the lowest rank possible. Thus, if the previous rank was 2 and the next value
to be ranked occurs in the column three times, those numbers will all be ranked 5.
5 Prints the text 100% on each cell of the base row.
6 Used with op=2 to produce two percentages for each cell.
7 Cumulative percentages.
Indices. The index for a cell is generated by dividing the row percentage in the cell by the
row percentage in the base row.
8 Prints absolutes and percentages side by side.
page This option invokes automatic page numbering. Since this is the default – pages are
numbered from 1 automatically – this option is generally used in its negative form of nopage
which suppresses automatic page numbering.
paglen=n This determines the number of lines printed on each page. The default is paglen=60
lines but any value between 10 and 10,000 is valid.
pagwid=n Normally tables can be up to 132 characters wide. pagwid= enables you to decrease
the page width or to extend it to a maximum of 10,000 characters.
pc This prints percent signs after percentage figures. This is the default, so this option is usually
used negatively – nopc – to print percentage figures without percent signs.
sort Creates sorted or ranked tables.
wm=n This keyword names the weighting matrix to be used.
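The tie rule described for the rank outputs of op= (tied figures all receive the lowest rank possible) can be sketched in Python. This is a model of the rule as stated in the text, not Quantum code, and the function name rank_with_ties is hypothetical.

```python
def rank_with_ties(values):
    """Rank 1 is the largest figure; tied figures all take the lowest shared rank."""
    return [sum(1 for other in values if other >= v) for v in values]

ranks = rank_with_ties([50, 40, 30, 30, 30, 10])
print(ranks)
```

Here the figure after rank 2 occurs three times, so all three occurrences are ranked 5 and the next figure is ranked 6.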
Table control statements
Table control statements name the questions to be cross-tabulated against each other to create
tables. In Quantum, these questions are called axes. The most important table control statement
is the tab statement which lists the axes to be used to create an individual table. These statements may also specify the text and overall characteristics of each table.
Creating a table
To create a table, type:
tab [axis1] [axis2] [axis3] [axis4] row_axis column_axis [;options]
In order to create a table, Quantum needs to know which is the column axis and which is the row
axis. If the table has more than two dimensions you will need to say which axes should be used
for the extra dimensions. Each table must be created separately using a tab statement, as
follows:
tab row-axis column-axis
Tab statements must precede the axes definitions in your program file.
Multidimensional tables
Multidimensional tables are ones created from more than two axes. They occur when a series of
tables has the same rows and columns, but each table in the group has additional characteristics
which are themselves the conditions of other axes. This sounds complicated, so let’s take an
example.
Our basic table is of age by sex created by the tab statement: tab age sex
We have been asked to produce a separate table of age by sex for each region of the country.
Whereas before each cell had two conditions (age and sex) it now has three (region, age and
sex).
There are two ways of writing this specification. You may either:
a) write as many tab statements as there are regions, and filter each table of age by sex to
include only those respondents resident in a given region, or
b) write a single tab statement to create a three-dimensional table.
Both methods produce the same results – the main advantage of (b) over (a) is that (b) involves
you in a lot less work.
The tab statement to create the multidimensional table is:
tab region age sex
Commonly used options in tab section
sid place this table to the right of the previous one
und place this table underneath the previous one
add add this table to the previous one
div divide the previous table by this one
To place tables side by side, type a tab statement for the first table and follow it with:
sid row_axis column_axis [;options]
Options are any of anlev=, c=, celllev=, inc=, maxim, means, median, minim and wm=. For
example, the statements:
tab region sex;c=250’1’
sid region age;c=254’1’
will place two tables side by side.
To place tables one underneath the other, type a tab statement for the first table and follow it
with:
und row_axis column_axis [;options]
Options are any of anlev=, c=, inc=, maxim, means, median, minim and wm=. For example,
the statements:
tab region sex;inc=c(25,28)
und region age;inc=c(35,38)
will place the second table underneath the first one.
To add tables, type a tab statement for the first table and follow it with:
add[col_offset[,row_offset] ] axis_names
where axis_names is the same number of axis names as appears on the tab statement. For
example:
tab ax01 bk01
add ax02 bk02
Here we are creating the table ax02 by bk02 and adding it to the table ax01 by bk01.
To divide one table by another, define the top table on a tab statement followed by:
div axis_names [;options]
where axis_names is a list of as many axis names as there are on the tab statement, and
options is any of the keywords anlev=, c=, inc=, maxim, means, median, minim or wm=. The
statements:
tab ax06 brk1
div ax07 brk2
define a table produced by dividing the table specified on the tab statement (ax06 by brk1) by the table on the div line (ax07 by brk2). The div line defines the denominator.
Axis control statements
Broadly speaking, an axis is Quantum’s way of defining questions from the questionnaire. Each
axis consists of a set of statements which establish the conditions and text for the rows and
columns of a table.
The axis is an integral part of your tabulation program: without it there can be no tables. At its
simplest level an axis represents a question on the questionnaire, and contains statements which
define the responses to that question and the codes by which Quantum can identify them.
Each axis may be used to create one or more of the following:
o the rows of a table
o the columns of a table
o a page in a set of tables
o a set of pages in a group of tables
Types of elements within axes
There are four types of element in an axis:
o Text and condition elements
o Text elements
o Arithmetic elements
o Statistical elements
Text and condition elements
These elements contain text and conditions which define the characteristics a respondent must
have to be included in the element. In a simple axis each element will refer to one response to a
question and will produce a row, column or table of figures telling you how many people gave that
response.
The general format of a condition is:
c=logical expression
c=cn’p’ is true if column n contains the code ’p’ and false if it does not.
Most commonly used count-creating elements for tabulation
Count-creating elements are the basis of any table since they tell you how many respondents
gave which responses. There are several statements which will create numeric elements; which
you use will depend upon the type of data to be read and the complexity of the condition defining
eligibility for inclusion in the element. Statements are:
n01 used for simple or complex conditions
n15 same as n01 except that the element is not printed
n10 creates a base for percentaging
n11 same as n10 except that the element is not printed
col used for simple conditions
val used for numeric data
fld used for numeric codes
bit a variant of fld
Text elements
These elements create nothing but text; no cells containing counts or values are created from
these elements.
There are three statements which are used within an axis to create text-only elements. These
are:
n03 create a text-only element
n23 create a subheading
n33 continue long element texts
If you would like subheadings to be underlined, place one of the options unl1, unl2 or unl3 on
the n23. The hdlev= keyword allows you to define various levels of subheading, starting at level
1 for the top subheading down to level 9 for the lowest level. If you would prefer the text to be left
justified above the columns to which it refers, add the option hdpos=l to the n23. If you would
prefer the text to be right justified, use hdpos=r instead. (hdpos=c is also available for centered
text but since this is the default you are unlikely to need it).
Arithmetic elements
These are elements which contain arithmetic values rather than counts. For example, one
element may tell you the number of times a product was bought rather than the number of people
who bought it.
Statistical elements
Part of Quantum’s power lies in the fact that it offers you the ability to create various types of
statistical output without having to know the formulae necessary to calculate them. These
elements contain totals, subtotals or statistical functions such as means and standard deviations.
Statements which perform statistical calculations are:
n07 average
n12 mean
n13 sum of factors
n17 standard deviation
n19 standard error of the mean
n20 error variance of the mean
n30 medians
n04 total
n05 subtotal
To define incremental values for means, standard deviations, standard errors and error
variances, type:
n25[element_text; inc=arith_expr [;c=log_expr] [; row] [; col]
The n25 does not normally print anything in the table. Use row and/or col to print these values as the rows and/or columns of the table.
Factors
fac= defines factors when the numbers in the data are not to be used (e.g., the data may be
multicoded) whereas inc=, also mentioned in the Data Options section, reads the data from the
column and uses that as the factor for each row. What to use when is best illustrated by
examples, although in general you should try to use fac= whenever possible since, in processing
terms, it is more efficient than inc=.
The respondent has been asked to say how much he agrees or disagrees with a particular
statement. If he agrees very much, he has a code ’1’ in, say, C210. If he agrees somewhat, he
has a ’2’; if he neither agrees nor disagrees he is coded as ’3’; disagrees somewhat, a ’4’ and
disagrees very much, a ’5’. People who refuse to answer are coded as C210’&’. We wish to
obtain a numerical mean value of these opinions using factors of +2 for agrees very much down
to –2 for disagrees very much. These are not the same as the codes representing these
responses in the data, so we enter them with fac=. People who refused to answer will appear in
the table but will not be included in the mean.
So the axis will look like this:
l vers1
n01Agrees Very Much;c=c210’1’;fac=2
n01Agrees Somewhat;c=c210’2’;fac=1
n01Neither Agrees Nor Disagrees;c=c210’3’;fac=0
n01Disagrees Somewhat;c=c210’4’;fac=-1
n01Disagrees Very Much;c=c210’5’;fac=-2
n01Refused;c=c210’&’
n12Mean;dec=2
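The arithmetic behind the n12 mean in this axis can be sketched outside Quantum. The following Python fragment is only an illustration: the respondent counts are invented, and the function name factor_mean is hypothetical. It assumes, as the text above states, that elements without a factor (here the refusals) are left out of the mean.

```python
# Hypothetical respondent counts per code in column 210 (invented for illustration).
counts = {"1": 40, "2": 30, "3": 15, "4": 10, "5": 5, "&": 8}

# Factors copied from the fac= options in the axis; '&' (refused) has no factor.
factors = {"1": 2, "2": 1, "3": 0, "4": -1, "5": -2}


def factor_mean(counts, factors):
    """Mean of the factors weighted by element counts, skipping elements with no factor."""
    base = sum(n for code, n in counts.items() if code in factors)
    if base == 0:
        return None  # no respondents contribute to the mean
    total = sum(factors[code] * n for code, n in counts.items() if code in factors)
    return total / base


mean = factor_mean(counts, factors)
print(round(mean, 2))
```

With these invented counts the 8 refusals still appear as a row of the table, but the mean is taken over the 100 respondents who have a factor.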
Miscellaneous ‘n’ statements
To define a condition that applies to a group of consecutive elements, type:
n00;c=logical_expression
An n00 defines a condition applicable to all subsequent rows until another n00 is read or
until the end of the axis, whichever is the sooner. Its format is:
n00[;c=condition]
where the condition is any valid logical expression.
To override the automatic page turnover within an axis, insert the statement:
n09[Text]
at the point at which a new page is required. ‘Text’ is an optional text which will be printed beneath the table headings at the top of the next page.
More commands to generate counts
The col statement
To define a list of elements with codes all in the same column, type:
col number;[base;] elm_txt1[=’codes1’] [; elm_txt2[=’codes2’] ... ]
If several consecutive statements in an axis have conditions defined by a code or codes in the
same column, you can save yourself a lot of time and effort by replacing the individual n01
statements with a single col statement.
One of the simplest col statements you can write is:
col n;[base];Rtext1[=’p1’];Rtext2[=’p2’]
where n is the column containing the codes for this question, base creates a base element, and
Rtext1=’p1’, Rtext2=’p2’ and so on define the texts and conditions for the individual elements.
To explain more clearly how the col statement works, let’s take the axis mstat that we wrote
earlier and rewrite it using a col statement. Originally it consisted of five
statements:
n10Base
n01Single;c=c109’1’
n01Married;c=c109’2’
n01Divorced;c=c109’3’
n01Widowed;c=c109’4’
We can replace these with the line:
col 109;Base;Single;Married;Divorced;Widowed
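What this col statement counts can be sketched in Python. This is an illustration only, not Quantum code: the list of codes is invented, and it assumes that element texts given without explicit codes are matched to codes 1, 2, 3 and so on in order, and that the Base element counts every respondent.

```python
# Hypothetical codes found in column 109, one per respondent (invented data).
column_109 = ["1", "2", "2", "4", "3", "2"]

# Element texts in the order they appear on the col statement.
labels = ["Single", "Married", "Divorced", "Widowed"]

# Base counts every respondent; each label counts one code (assumed: 1, 2, 3, 4).
table = {"Base": len(column_109)}
for position, label in enumerate(labels, start=1):
    table[label] = sum(1 for code in column_109 if code == str(position))

for label, count in table.items():
    print(f"{label:<10}{count:>5}")
```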
The val statement
Val is used when the conditions defining eligibility for inclusion in an element are positive
numbers or ranges of positive numbers rather than codes; that is, where the question in the
questionnaire requires a numeric response rather than a single or multicoded answer; for
example, the number of people in the household, or the number of telephone calls made.
To define elements whose condition is that a variable contains a specific value, type:
val variable; = ;number1 [element_txt1];number2 [element_txt2] ...
If the elements contain text as well as a number, the number may appear anywhere in the text. If
the value is not part of the text, type:
val variable; = ;element_text = number; ...
The base, hd=, tx= and =rej options described for col statements are also valid on val
statements of this type.
Val can be used to test whether the value of a variable is equal to a given value. If it is equal, the
cell count is incremented by 1. The format is:
val variable;[Base];[hd=Text];=;[tx=Text];n1 [Text1]; ... ;nn [Textn]
where variable is the data, integer or real variable whose value is to be tested, n1 to nn are the
values against which the variable is to be compared, and Text1 to Textn are the row
descriptions to be printed in the table.
The equals sign indicates that the test is for arithmetic equality rather than ranges. Base, hd=
and tx= are optional and create the base, sub-heading and text-only rows of the table as
described for col statements.
Let’s work through an example to illustrate this. Suppose c(110,111) contains data on the number
of people in the household, and we wish to set up a table showing how many respondents live in
households containing 1, 2, 3, 4, 5 or 6 people, so we write:
val c(110,111);Base;Hd=Number in Household;=;1 Person;2 People;
+3 People;4 People;5 People;6 People
The fld statement
To define elements whose condition is that a field contains a specific numeric code, type:
fld column_specs;element_txt1[=code[,code ...] ]; ...
The base, hd=, tx= and =rej options described for col statements are also valid on fld
statements.
The column specs on a fld statement define the columns to be read. There are three ways of
entering them. First, you may list each column or field reference one after the other, separated by
commas. The list must be enclosed in parentheses. In our example this would be:
fld (c(12,13), c(14,15), c(16,17))
Second, if you have sequential fields as you do here, you can type the start columns of each field
followed by the field length. The list of start columns is separated by commas and enclosed in
parentheses, and the field length comes after the closing parenthesis and starts with a colon. If
you use this notation for the film example you would write:
fld (c12, c14, c16) :2
If you wish, you can abbreviate this further by typing just the start columns of the first and last
fields, followed by the field length. This time you do not use parentheses:
fld c12, c16 :2
Third, if the fields are not sequential, you may list the start columns and field width of each
group of columns (as shown above) and separate each group with a slash. For example, to read
data from columns 12 to 17 and 52 to 57, with each field being two columns wide, you would
type:
fld c12, c16 / c52, c56 :2
This reads c(12,13), c(14,15), c(16,17), c(52,53), c(54,55) and c(56,57).
You can also use this notation for single non-sequential fields. For example:
fld c23 / c36 / c71 :2
means c(23,24), c(36,37) and c(71,72).
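The abbreviated column notation can be checked with a small Python sketch that expands a specification into the fields it stands for. The function expand_fld_columns is hypothetical and handles only the unparenthesized form shown above (first and last start columns, groups separated by slashes, field width after the colon). It is a reading aid, not a description of how Quantum parses the statement.

```python
def expand_fld_columns(spec):
    """Expand e.g. 'c12, c16 / c52, c56 :2' into a list of (start, end) columns."""
    groups_part, width_part = spec.rsplit(":", 1)
    width = int(width_part)
    fields = []
    for group in groups_part.split("/"):
        # Strip the leading 'c' from each start column in this group.
        starts = [int(tok.strip().lstrip("cC")) for tok in group.split(",") if tok.strip()]
        first, last = starts[0], starts[-1]
        # Step through the start columns one field width at a time.
        for start in range(first, last + 1, width):
            fields.append((start, start + width - 1))
    return fields


fields_a = expand_fld_columns("c12, c16 / c52, c56 :2")
fields_b = expand_fld_columns("c23 / c36 / c71 :2")
print(fields_a)
print(fields_b)
```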
The element specs part of the statement defines the element texts and the codes which represent
those responses. If you enter element texts by themselves, Quantum assumes that the first text is
code 1, the second text is code 2, and so on. The codes apply to all fields named in the column
specs part of the statement. Therefore, to define elements which will count the number of people
who saw each film, you would write:
fld c12,c16:2;Columbus;Aliens 3;Pretty Woman;
+Green Card;Batman 2
Weighting in Quantum
Sometimes in surveys we treat the respondents as representatives of the total population of
which they are a sample. Normally, tables reflect the attitudes of the people interviewed, but we
may want the tables to reflect the attitudes of the total population instead, so that it seems as if
we had interviewed everyone rather than just a sample of the population. This, of course,
assumes that the people interviewed are a truly representative sample.
If we take a sample of 380 from a population of 10,000 middle-aged housewives, and discover
that 57 members of this sample buy cheddar cheese, we may want the number of middle-aged
housewives who buy cheddar cheese to read 1,500 in our tables, not 57. Moving from 57 to 1,500
is the fine art of weighting. In this case, each middle-aged housewife has a weight of 10,000/380.
Since 57 of them buy cheddar cheese, the number in the cell will be:
10000 / 380 * 57 = 1,500
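The same calculation, written as a short Python sketch using the figures from the example (the variable names are illustrative only):

```python
population = 10_000      # middle-aged housewives in the population
sample_size = 380        # respondents interviewed
buyers_in_sample = 57    # respondents who buy cheddar cheese

# Each respondent stands for population / sample_size people.
weight = population / sample_size
weighted_buyers = weight * buyers_in_sample

print(round(weight, 2), round(weighted_buyers))
```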
Weighting is also used to correct biases that build up during a survey. For example, when conducting interviews by telephone you may find that 60% of the respondents were women. You may then want to correct this ratio of men to women to make the two groups more evenly balanced.
Weighting methods
Quantum is sufficiently flexible to allow more than one set of weights for a given set of
respondents. Which set is applied is determined by options on the a, sectbeg, flt or tab
statement or on the statements which create the individual rows or columns of a table. Each set
of weights, however, will apply one weight for each respondent. There are two ways of calculating
weights:
a) The weight for each respondent may be part of the data for that respondent, or it may be
calculated in the edit and passed to the tabulation section as a variable.
b) The more common method of weighting is to define a set of characteristics and apply specific weights to respondents satisfying those characteristics.
Types of weighting
Quantum offers factor, target and rim weighting, preweights, postweights, weighting using
proportions and weighting to a given total.
Factor weighting
With factor weighting, every record which satisfies a given set of conditions is assigned a specific
weight. You would generally use it when the weights are calculated outside of Quantum – for
instance, you may be told that all unemployed people in London require a weight of 10.5,
whereas unemployed people in the rest of the country need a weight of 7.3.
Target weighting
Target weights may be used when you know the exact number of respondents you want to
appear in each cell of the weighted table. For example, in a table of age by sex, you may know
the exact number of men under 21, women under 21, and so on, to appear in the table once it
has been weighted. The weights that you define in your matrix are therefore the values to appear
in the weighted table rather than the weights to be applied to each respondent of a given age and
sex.
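As a rough illustration of the idea, the weight for each cell is the target for that cell divided by the number of respondents actually in it. The Python sketch below uses invented counts and targets, and it ignores preweights and any other adjustment Quantum may apply.

```python
# Hypothetical unweighted cell counts (age by sex), invented for illustration.
sample_counts = {
    ("under 21", "male"): 50,
    ("under 21", "female"): 40,
    ("21 and over", "male"): 150,
    ("21 and over", "female"): 160,
}

# Hypothetical targets: the figures that should appear in the weighted table.
targets = {
    ("under 21", "male"): 100,
    ("under 21", "female"): 100,
    ("21 and over", "male"): 300,
    ("21 and over", "female"): 300,
}

# Weight per respondent in a cell = target for the cell / respondents in the cell.
weights = {cell: targets[cell] / n for cell, n in sample_counts.items()}

for cell, w in weights.items():
    print(cell, w)
```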
Rim weighting
Rim weighting is used when:
a) you want to weight according to various characteristics, but do not know the relationship of the
intersection of those characteristics, or
b) you do not have enough respondents to fill all the possible cells of the table if you were to
weight the data using the multidimensional technique described above.
For example, you may want to weight by age, sex and marital status and may know the weights
for each category of those characteristics (e.g. people aged 25 to 30; men; single people).
However, you may not know the weights for, say, single men aged between 25 and 30, married
women aged between 31 and 40, and so on.
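One common way to weight on the margins alone is iterative proportional fitting (raking): adjust the weights to match the targets for one characteristic, then the next, and repeat until the figures settle. The Python sketch below shows that general technique on invented data. It is an assumption that this resembles what Quantum does internally; the source does not describe the algorithm, and the function name rim_weights is hypothetical.

```python
def rim_weights(respondents, targets, iterations=50):
    """respondents: list of dicts; targets: {characteristic: {category: target total}}."""
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for characteristic, category_targets in targets.items():
            # Current weighted total for each category of this characteristic.
            totals = {}
            for w, r in zip(weights, respondents):
                totals[r[characteristic]] = totals.get(r[characteristic], 0.0) + w
            # Scale every respondent so the category totals hit their targets.
            weights = [
                w * category_targets[r[characteristic]] / totals[r[characteristic]]
                for w, r in zip(weights, respondents)
            ]
    return weights


# Hypothetical sample of 10 respondents (invented data).
respondents = (
    [{"sex": "m", "age": "young"}] * 3
    + [{"sex": "m", "age": "old"}] * 1
    + [{"sex": "f", "age": "young"}] * 2
    + [{"sex": "f", "age": "old"}] * 4
)
# Only the marginal targets are known, not the targets for each sex/age cell.
targets = {"sex": {"m": 5, "f": 5}, "age": {"young": 4, "old": 6}}

weights = rim_weights(respondents, targets)
m_total = sum(w for w, r in zip(weights, respondents) if r["sex"] == "m")
young_total = sum(w for w, r in zip(weights, respondents) if r["age"] == "young")
print(round(m_total, 3), round(young_total, 3))
```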
Entering weights as proportions (input weighting)
When we were talking about target weighting, we said that sometimes you might not know the
actual counts of respondents in a group, even though you may know that the group is a certain
percentage or proportion of the total population. For instance, you may know that 60% of the
population is women, but you may not know how many women that represents.
When this happens, you can enter the percentages or proportions as the weights for each group,
and use the keyword input to indicate that these figures should be used as targets. For example,
in a table of age by sex you would enter the proportion or percentage that each combination of
age and sex is of the total population, and Quantum would calculate what weight to assign to
each respondent in each category.
Weighting to a given total
When you define targets which add up to more than the number of respondents in your sample,
Quantum will calculate the weights for each respondent such that the total for the weighted table
equals the total of the figures in the weighting matrix. You may define your own total figure
(usually the number of respondents in your sample) using the keyword total=n, where n is the
required weighted total. Quantum will then calculate the weights according to the values in the
weighting matrix and will then adjust them to match the total you have defined.
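The final adjustment can be pictured as a simple rescaling: every weight is multiplied by the same factor so that the weights sum to the required total. The Python sketch below uses invented weights and a hypothetical function name. It shows only the rescaling step, not how Quantum derives the initial weights.

```python
def scale_to_total(weights, total):
    """Rescale weights by a common factor so that they sum to the required total."""
    factor = total / sum(weights)
    return [w * factor for w in weights]


# Hypothetical weights for four respondents; they sum to 8.
raw_weights = [2.0, 2.5, 1.5, 2.0]

# Equivalent in spirit to total=4: the weighted total equals the sample size.
scaled = scale_to_total(raw_weights, 4)
print(scaled)
```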
Preweights
Preweights, stored as part of each respondent’s data or created during the edit, are applied to
individual records before target or factor weighting is applied. When the characteristic weights are
targets, the preweights are used in the calculation of the weight for each respondent.
Postweights
The opposite of preweights are postweights, which are applied after all other weights have been applied, and therefore have no effect on the way in which targets are reached. They are generally used to make a final adjustment to a specific item.
Descriptive statistics
Quantum provides facilities for calculation of a set of basic statistics from the figures produced in
Quantum tabulations. They include the statistics most commonly used for testing hypotheses
about the values of proportions (percentages) and the locations (average values) of variables,
and about differences in these between two or more subsets of the data. There are also chi-
squared statistics for testing hypotheses about a single distribution or about differences between
two or more distributions.
The statistical tests available are:
o One-dimensional, two-dimensional and single classification chi-squared tests
o Four tests of differences between proportions (Z-tests)
o Two tests of differences between means (T-tests)
o Friedman’s test of differences in location between a set of related samples (sometimes
known as ‘Friedman’s two-way analysis of variance’)
o Kolmogorov-Smirnov test of differences between two samples
o McNemar’s test of the significance of changes
o F Test for testing differences between a set of means (one-way analysis of variance
(ANOVA))
o Newman Keuls test of differences between means
For each statistic, Quantum also calculates and prints an associated significance level so that you can readily see the results of the tests you have performed.
Quanvert
Quanvert is the windowed version of the Quantum database. In other words, it is the GUI for
Quantum. Quanvert can process surveys of any type, size or complexity. Whether it is a survey
with hundreds of questions, or millions of respondents, or one that has been conducted on a
regular basis for years, Quanvert can handle it quickly.
Quanvert has been specifically designed for the market researcher. You don't have to be a data
processing or computer expert, or a statistician; you just have to be interested in your survey
results! And you can investigate your data from your desktop, without having to search through
volumes of printed reports. There is no need to predict what analyses you will require before you
receive your data.
Any table can be created based on any variable or question. You can test out any hypothesis,
and dig as deep into the data as you wish. For instance, you may want to examine the age group
of people who responded positively to an advertisement. You can then take this a stage further
and produce a series of tables filtered on those females interviewed. Quanvert is especially
powerful for analyzing individual responses to verbatim or "open" questions.
How is the database produced?
Quanvert databases are specified and created using Quantum, SPSS MR's leading package for
editing, weighting and tabulating survey data. Quantum is already renowned in its own right as
the most powerful tabulation system available today. You can create the database yourself using
Quantum.
Preparing Quanvert database using Quantum
Before you can convert a Quantum spec and data file into a Quanvert database there are several
tasks you may need to carry out first. These include checking the Quantum program to ensure
that it will create the required information in the appropriate places, and setting up subdirectories
if variables are not to be stored in the main project directory. If you have a large database from
which you require only a few variables, you may use the raw Quantum data rather than creating a
full Quanvert database.
To create a Quanvert database, give the following command at the command prompt:
quantum –v [–pd dir_1] [–td dir_2] [prog_file] [data_file]
The –v parameter tells Quantum not to produce tables but, when it reaches the output
stage, to run the flip program instead.
The –pd and –td parameters allow you to read files from and create temporary files in directories
other than the directory in which you are running Quantum.
All Quanvert projects originate from Quantum. Although Quanvert produces tables identical to
those generated by Quantum, it does not normally use the raw data and Quantum program files.
Instead, it uses a series of compressed data and axis files, one pair per axis, derived from the
Quantum files. These individual databases are referred to as inverted or transposed databases,
and the process which creates them is called flipping. In databases with simple axes it is
possible to run Quanvert almost immediately on the raw Quantum data.
Files created by flip
Flip creates a number of files. The ones which are important to Quanvert are:
Filename     Contents
*.ax         axes text files
*.fli        inverted data files
*.inc        numeric variables (inc) files
*.mul        values for numeric variables in axes
*.bit        bit files for named filters
*.btx        text for named filters
*.alp        text (alphanumeric) variables files
axes.inf     names of axes present in the database
incs.inf     names of numeric variables present in the database
alpha.inf    names of text variables present in the database
bits.inf     names of named filters present in the database
qvinfo       levels and weighting information
qvlvmn       levels cross-reference files defining the relationship between the
             higher level m and the lower level n
seg1.qv      default run conditions and titles from the a statement
wmvalsn.q    weights for weight matrix n
The sex axis, for instance, will have two files: sex.ax containing the element texts and sex.fli
containing the inverted data for that axis.
To tidy a directory once the database has been created, type:
flipclean [–a] under Unix or: flipclea [–a] under DOS.
This deletes any temporary files created during the flip process but leaves intact any files which
are needed for Quanvert.
Example
Structure of Quantum Spec:
A typical program might look like this:
Struct;read=2;ser=c(5,8);crd=c(9,10);max=32  Structure of the record
*include vars                                External variables and arrays are declared in a file
                                             called vars and included before the edit section
ed
*include edit                                The edit section holds calculations of counts and column
end                                          settings for counts which are not straightforward
a;dsp;spechar=–*;decp=1;flush;wm=0;axttr;    Global commands which control the overall
+dec=0;rinc;acr100;dp;nsw;nopage;notype;     characteristics of a run
+paglen=64;pagwid=145;
wm1 wax1 wax2;rim;input;                     Weighting of the data in the output (if required)
+20;30;50;
+50;50;
+33;33;33
*include tabs                                Details of what is to be tabulated against what to produce each table
*include axes                                Definitions of all variables used as rows
*include breaks                              Definitions of all variables used as columns