CONTROLLED SEARCH OVER FINITE PERMUTATION GROUPS
by
MARIO A. ARANHA, B.Tech. in M.E.
A THESIS
IN
COMPUTER SCIENCE

Submitted to the Graduate Faculty of Texas Tech University in
Partial Fulfillment of the Requirements for
the Degree of

MASTER OF SCIENCE

Approved

May, 1984
ACKNOWLEDGMENTS

I sincerely thank Dr. Erol Emre for his direction and
helpful criticism of this work. I am also indebted to Dr.
John Walkup and Dr. Martin Hardwick for their encouragement
and technical advice.
ABSTRACT

A penetrance learning system is implemented to mechanically
find a heuristic to perform a heuristically controlled
search over finite permutation group graphs. The learning
system is based on probabilistic analysis. A sample
solution is evaluated in detail. Reasonably good results were
obtained.
CONTENTS

ACKNOWLEDGMENTS                                 ii
ABSTRACT                                       iii

CHAPTER
I.   PRELIMINARIES                               1
       Introduction                              1
       Definitions                               2
       Problem Statement                         5
II.  SOLUTION METHODOLOGY                        8
       Introduction                              8
       Definitions                               9
       Penetrance Learning System               13
       Solver                                   14
       Differentiator                           16
       Regressor                                18
       Penetrance Normalization                 19
III. RESULTS                                    22
       Implementation                           22
       Experimental Tests                       23
       Conclusions                              27

BIBLIOGRAPHY                                    29

APPENDIX
A. ERROR ESTIMATES                              31
B. TRAINING PROBLEM SETS                        33
C. COMPUTER IMPLEMENTATION                      34
LIST OF TABLES

1. Chosen Features                              24
2. Learning Phase Results                       24
3. Solving Phase Results                        25
4. Test for Local Optimality                    26
5. Training Problems                            33
LIST OF FIGURES

1. Group Transformations of a Hexagon            7
2. Breadth-first Search Tree                    11
3. Penetrance Evaluation                        12
4. Penetrance Learning System                   14
5. Heuristic Search Procedure                   15
6. Region Splitting Procedure                   17
CHAPTER I

PRELIMINARIES
Introduction

The essential nature of Artificial Intelligence (AI) is
that of "symbols and search" [Newell,1981]. However, the
central concern of most AI research publications to date may
be summarized by the following problem:
"Given a large state-space (possibly infinite), a finite
set of operators, initial states and final states, find a
sequence of operators (possibly optimal) from the initial
state to a final state" [Nilsson,1971]. This problem occurs
in different forms in areas like large production systems,
grammars, theorem-proving, puzzles, games, database and
knowledge base systems.
A systematic exhaustive search for a solution to the
above problem is invariably combinatorially explosive and
thus highly prodigal of computer time and memory. A concern
of problem-solving research in AI has been to devise
heuristics that control the direction of the search, thus
yielding solutions with substantial reduction in search effort.
Heuristics are 'rules of thumb' incorporating certain
problem-specific information, which may be mechanically
discovered [Ernst,1982] or supplied by human experts in the
problem domain [Simon,1980]. Often such methods do not
guarantee a solution, or if they do it may not be optimal. Thus
such controlled search methods may be at best
quasi-algorithms. But there is yet no practical alternative to this
approach.
The main concern herein is to apply such heuristically
controlled search techniques to permutation groups. A
probabilistic learning system is developed to obtain a suitable
heuristic for the search method. A similar penetrance
learning system has been successfully implemented for the
fifteen puzzle [Rendell,1983].
Definitions

The following definitions pertain to state-space search
problems [Georgeff,1983].

Definition. A state-space problem, P, is a sextuple,
    P := <S,O,Y,C,I,F>,
where, S = set of states (state space),
    O = finite set of operator (input) symbols,
    Y = state transition partial function, Y : S x O -> S,
    C = cost partial function, C : S x O -> R+,
        (R+ = the set of positive real numbers)
    I = finite set of initial states (I in S),
    F = finite set of final states or goals (F in S).
Definition. s ==> s' iff there exists o in O such that o(s)
= s'. ==>+ represents the transitive closure of ==>.

Definition. If i ==>+ f, for i in I and f in F, then the
corresponding operator sequence is a solution to the problem.
If the operator sequence is of minimal total cost the
solution is optimal.
Definition. A heuristic (partial) function, H, is
    H : S -> R+.
H will hereinafter be referred to as a heuristic. H serves
to evaluate the potential of an operator to yield a final
state when applied to a given state. It is thus an estimate
of the minimal cost of an operator sequence from a given
state to a final state.
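To make the sextuple concrete, here is a minimal sketch (the toy problem and all names are illustrative, not from the thesis), modeling the partial functions Y and C as dictionaries so that undefined transitions are naturally absent:

```python
# Illustrative sketch only: a three-state toy problem, with the partial
# functions Y and C modeled as dictionaries keyed by (state, operator).
S = {"s0", "s1", "s2"}                       # state space
O = {"a", "b"}                               # operator symbols
Y = {("s0", "a"): "s1", ("s1", "b"): "s2"}   # state transition function
C = {("s0", "a"): 1.0, ("s1", "b"): 1.0}     # cost function
I = {"s0"}                                   # initial states
F = {"s2"}                                   # final states (goals)

def apply_sequence(state, ops):
    """Follow an operator sequence; None where Y is undefined (partiality)."""
    total = 0.0
    for o in ops:
        if (state, o) not in Y:
            return None
        total += C[(state, o)]
        state = Y[(state, o)]
    return state, total

state, cost = apply_sequence("s0", ["a", "b"])
assert state in F and cost == 2.0    # ["a", "b"] is a solution of cost 2
```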
The following definitions pertain to Group Theory
[Herstein,1964; Stone,1973].

Definition. A group, <G,.>, is an algebraic structure where
G is a set, . is the product operation and
    i)   . : G*G -> G
    ii)  g.(h.k) = (g.h).k for all g,h,k in G
    iii) there exists e in G such that e.g = g.e = g
         for all g in G
    iv)  for all g in G there exists an inverse, g', in G,
         such that g.g' = g'.g = e.
<G,.> is finite if the cardinality (order) of G is finite.
The order of a group element, g, is the least positive
integer, r, such that g^r = e.
Definition. A subset of elements of a group, G, is called a
generator set of G iff every element of G can be expressed
as a product of the subset elements and their inverses.

Definition. For a group, G, an equality of a product of
generators and generator inverses to the group identity is
called a generator relation.

Definition. A group graph is a directed graph in which:
1. The vertices are labelled in one-to-one correspondence
with the elements of the group.
2. If x is a group generator then for each vertex, y,
there is an edge from y to z, where z = y.x.

Definition. A permutation of a set, X, is a bijection from X
onto X. The degree of the permutation is the cardinality
(number of elements) of X.
Definition. A cycle of a permutation, P, of a set, X, is an
ordered set,
    (x, xP, xP^2, .., xP^(m-1)),
where x is in X, and m is the least positive integer such
that xP^m = x.

Definition. A permutation group is a set of permutations
that forms a group under function composition (i.e., a
subgroup of the set of all bijections on a set, X).
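These definitions translate directly into code. The sketch below (helper names are mine, not the thesis's) represents a permutation in array form, with p[i] the image of i, and extracts its cycle decomposition:

```python
def compose(p, q):
    """Product p.q under one common convention: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """The inverse permutation p' with p.p' = p'.p = identity."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def cycles(p):
    """Cycle decomposition of p; each cycle is (x, xP, xP^2, ..)."""
    seen, out = set(), []
    for start in range(len(p)):
        if start not in seen:
            cyc, x = [], start
            while x not in seen:
                seen.add(x)
                cyc.append(x)
                x = p[x]
            out.append(tuple(cyc))
    return out

g = (2, 0, 1, 3)            # the goal permutation used later in chapter II
identity = (0, 1, 2, 3)
assert compose(g, inverse(g)) == identity
assert cycles(g) == [(0, 2, 1), (3,)]   # one 3-cycle plus a fixed point
```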
Problem Statement

Given a finite generator set of a permutation group of
finite degree, the problem of concern herein is to obtain a
product sequence of generator permutations that yields a
given permutation in the group.
This is clearly a state-space problem as defined above.
The initial state (I) corresponds to the identity permutation
and the operator set (O) corresponds to the generator set or
to any permutation derived as a product of generators. Thus
states and operators are indistinguishable here. The cost
function may be taken as C(s,o) := l(o), where l(o) = the
length of the generator sequence represented by operator, o.
The state transition function is defined by the functional
composition of permutations and is thus total.
This problem is non-trivial when large permutation
groups are considered. In general it can be extended to any
group, as it can be shown that any group is isomorphic to
some permutation group [Stone,1973]. The execution time of
an exhaustive search algorithm for such a problem is
exponential in the cardinality of the generator set. However, it
is typical of those problems for which the controlled search
described above is appropriate.
The solution of this kind of problem could be useful in
areas such as robot task planning, geometrical
transformations, memory interconnection design [Wu,1981] and the
solution to several combinatorial puzzles having a group
structure [Stone,1973].
For example, figure 1 shows a simple application to
geometrical transformations using the dihedral group
[Stone,1973]. There are two generators, rotation through 60
degrees (g) and reflection about a vertical axis (h). Thus
from the identity several configurations may be obtained by
finding the appropriate sequence of generators. This could
be a far more complicated problem for more complex
geometrical configurations.
Similarly we could consider a robot manipulator with a
set of primitive movements (generators) which could be
concatenated to achieve a certain resultant motion (goal).
Any particular manipulator configuration is thus considered
a group element.

[Hexagon diagrams: the identity, e, and the hexagon rotated
180 degrees and reflected about the vertical axis, g^3.h]

Figure 1: Group Transformations of a Hexagon
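Figure 1's example can be reproduced by a small breadth-first enumeration over the two generators. In this sketch the vertex labelling and the reflection axis are assumptions (the diagrams are not reproduced here): g rotates the hexagon's six vertices one step and h reflects them about an axis through vertex 0.

```python
from collections import deque

def compose(p, q):
    """Apply q first, then p (array form: p[i] = image of vertex i)."""
    return tuple(p[q[i]] for i in range(len(q)))

e = (0, 1, 2, 3, 4, 5)      # identity labelling of the six vertices
g = (1, 2, 3, 4, 5, 0)      # rotation through 60 degrees: i -> i+1 mod 6
h = (0, 5, 4, 3, 2, 1)      # a reflection fixing vertex 0 (assumed axis)

def word_for(target):
    """Breadth-first search for a shortest generator word yielding target;
    letters are listed in order of application."""
    frontier, seen = deque([(e, "")]), {e}
    while frontier:
        perm, word = frontier.popleft()
        if perm == target:
            return word
        for sym, gen in (("g", g), ("h", h)):
            nxt = compose(gen, perm)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, word + sym))
    return None

# the right-hand configuration of figure 1: rotate 180 degrees, then reflect
target = compose(h, compose(g, compose(g, g)))
word = word_for(target)     # a shortest word over {g, h} for g^3.h
```

Since the two generators produce only the twelve elements of the dihedral group, the enumeration is tiny; for large permutation groups this blind search is exactly what the heuristics of chapter II are meant to replace.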
CHAPTER II

SOLUTION METHODOLOGY
Introduction

Several different kinds of state-space search procedures
using heuristics have been mathematically analysed
[Nilsson,1971]. But in practice it is extremely difficult
to find heuristics that satisfy specific mathematical
properties [Bagchi,1983]. Examples of such properties are
'admissibility' and the 'monotone restriction' [Nilsson,1980].
So invariably an empirical approach has to be adopted. Thus
in order to solve the problem at hand the main concern is to
find some method of obtaining a useful heuristic. A learning
system is employed to obtain a heuristic by statistically
analysing several completed solutions. The program developed
has a capability of learning from experience and
improving its performance with time. Thus the main objective
is to develop an automated learning system which needs
little human intervention.
Definitions

The following are definitions of terminology used to
describe the learning system that has been developed.

Definition. A problem instance, P, is a particular problem
in the class (domain) of problems under consideration.

Definition. A search tree, T(H,P,M), is the tree obtained
from problem instance, P, by repeatedly applying all
operators to the currently best state (according to the
heuristic, H) until either a solution is found or until a maximum
of M states have been generated.

Definition. A feature is a property that serves as a measure
of the difference (difference metric) between each state, S,
and the final state, G, of a problem instance.

Definition. A feature vector for a given state is an n-vector
(for n chosen features) whose components are the feature
values of the state.

Definition. An n-dimensional feature space is an
n-dimensional vector space of feature vectors.

In practice the features are chosen based on
problem-specific information. They may be mechanically
determined [Ernst,1982], or based on expert human knowledge
of the problem domain. Figure 2 shows a simple search tree
with constant heuristic (breadth-first) for the symmetric
group of degree four. The feature vector components are
evaluated for the states on the solution path (darkened).
The two chosen features are: f1 = absolute difference of
products of cycle lengths, f2 = number of misplaced elements.
Terminal states are parenthesized. For state S1 = 0 1 2 3 the
cyclic representation is (0)(1)(2)(3) and for the goal, G =
2 0 1 3, it is (102)(3). Therefore we have f1 = |(1.1.1.1) -
(3.1)| = 2, where the factors correspond to the cycle
lengths, and we have f2 = 3 because the first three symbols
are misplaced. The array representation of permutations is
explained in chapter III. This use of features is similar
to the means-end analysis used in GPS [Nilsson,1980;
Ernst,1969].
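The two features can be computed directly from the array representation (the helper names below are illustrative, not from the thesis):

```python
from math import prod

def cycle_lengths(p):
    """Cycle lengths of permutation p in array form (p[i] = image of i)."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        n, x = 0, start
        while x not in seen:
            seen.add(x)
            n += 1
            x = p[x]
        lengths.append(n)
    return lengths

def f1(state, goal):
    """Absolute difference of the products of cycle lengths."""
    return abs(prod(cycle_lengths(state)) - prod(cycle_lengths(goal)))

def f2(state, goal):
    """Number of misplaced elements."""
    return sum(1 for s, g in zip(state, goal) if s != g)

s1, goal = (0, 1, 2, 3), (2, 0, 1, 3)
assert f1(s1, goal) == 2 and f2(s1, goal) == 3   # the values worked above
```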
Definition. A region, r, is a particular subset (volume) of
the feature space. For simplicity we assume that for a 2-D
feature space the regions are rectangles, for 3-D cuboids,
and so on. Regions may be represented by their upper and
lower limits; for example, rectangular regions are
represented by diagonally opposite corners of the
corresponding rectangles.
S1=0 1 2 3
  S2=1 0 2 3    S3=1 2 3 0
    (S4=0 1 2 3)   S5=2 1 3 0   S6=0 2 3 1   S7=2 3 0 1
      (S8=2 0 3 1)   S9=3 2 0 1   S10=1 3 0 2
        (S11=3 2 1 0)  (S12=3 1 2 0)  S13=0 3 1 2  (S14=2 0 1 3)

GENERATOR SET: A = 1 0 2 3   B = 1 2 3 0
GOAL = 2 0 1 3   SOLUTION = BBAB (backward chained sequence)

State   S1   S3   S6   S10   S14
f1       2    1    0     1     0
f2       3    4    4     4     0

Figure 2: Breadth-first Search Tree
Definition. The penetrance, p, of a search tree, T, in
region, r, is,
    p(r,T) := g(r,T) / t(r,T),
where g(r,T) = number of expanded states in r which are on
a solution path and t(r,T) = total number of expanded states
in r.
Figure 3 shows a search tree, T, and a corresponding
two-dimensional feature space, F, divided into 'rectangular'
regions. The problem domain is the symmetric group of degree
four. The penetrances in the various regions are as follows:
p(r1,T) = 1/1 = 1.0, p(r2,T) = 1/2 = 0.5, and p(r3,T) = 1/3 = 0.33
[Rendell,1983].
[Figure 3: Penetrance Evaluation. A search tree, T, with its
solution path to the goal, and the corresponding feature
space divided into rectangular regions r1..r5; the diagram
is not reproducible in this transcript.]
Penetrance could be used as a measure of the worthiness
of states for expansion. For any state, S, in tree, T,
p(r,T) is an estimate of the conditional probability that S
is on a solution path given that the feature vector for S
lies in region, r. Thus if we can estimate the penetrance
at any point in the feature space we could use this to
estimate the relative merit of each selected feature. So, as
will be seen later, an estimate of p(r,T) for each region,
r, of search tree, T, is as good as a heuristic.
Penetrance Learning System

The concept of penetrance is used to develop a learning
system which 'learns' a useful heuristic. Figure 4 shows a
Penetrance Learning System (PLS). It consists of three main
components: a SOLVER, a DIFFERENTIATOR and a REGRESSOR.
This system is described below. Further details may be
obtained from [Rendell,1983].
The penetrance learning system is used to 'learn' a
heuristic on a probabilistic basis by solving several sets
of training problems. To get the system started a 'booting'
mechanism is incorporated.
Set of problem instances, P
        |
        v
      SOLVER
        |
        v
Solution trees, T(b,P)
        |
        v
  DIFFERENTIATOR
        |
        v
Cumulative region set, {(r,p,e)}
        |
        v
    REGRESSOR
        |
        v
Feature weight vector, b
(fed back to the SOLVER on the next iteration)

Figure 4: Penetrance Learning System
Solver

The SOLVER essentially consists of some heuristic graph
search procedure similar to A* [Nilsson,1980]. Such a graph
search is used to select a node for expansion based on the
value of the corresponding evaluation function or heuristic.
Details of the particular search procedure used here are
given in figure 5. Expanding a state means generating the
next states by applying all the relevant operators. Here
OPEN and CLOSED are bi-directional linked lists [Knuth,1968;
Standish,1980]. We assume a heuristic function of the form,
    H := b0 + b1.f1 + b2.f2 + b3.f3 + ... + bn.fn,
where f1,f2,..,fn are the values of a set of n chosen
features and b0,b1,b2,..,bn are the feature weights.
A1. Place the initial states on OPEN.

A2. If OPEN is empty, exit with failure. Otherwise continue.

A3. Transfer from OPEN to CLOSED a state, s, such that H(s)
    is minimum over all s in OPEN (resolve ties arbitrarily
    but always in favour of a final state).

A4. If s is a final state, exit with the solution sequence
    obtained by tracing backwards through the pointers (see
    A5). Otherwise continue.

A5. Expand s. If there are no next states go to A2. If any
    next state, i, is in OPEN or CLOSED ignore it. Otherwise
    compute H(i) and set up a pointer from i to s.

A6. Go to A2.

Figure 5: Heuristic Search Procedure
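Steps A1-A6 amount to a best-first search. A minimal sketch follows (a heap stands in for the thesis's doubly linked OPEN list, the tie-breaking rule of A3 is not specially implemented, and the function names are mine):

```python
import heapq

def heuristic_search(initial_states, is_final, expand, H):
    """Best-first search following steps A1-A6 (a simplified sketch)."""
    open_heap, parent, closed = [], {}, set()
    for s in initial_states:                   # A1: initial states on OPEN
        parent[s] = None
        heapq.heappush(open_heap, (H(s), s))
    while open_heap:                           # A2: fail when OPEN empties
        _, s = heapq.heappop(open_heap)        # A3: minimum-H state
        if s in closed:
            continue
        closed.add(s)
        if is_final(s):                        # A4: trace back the pointers
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for nxt in expand(s):                  # A5: expand s; ignore states
            if nxt not in parent:              # already on OPEN or CLOSED
                parent[nxt] = s
                heapq.heappush(open_heap, (H(nxt), nxt))
    return None                                # A6: loop back to A2

# usage: the symmetric group of degree four from figure 2
def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

A, B = (1, 0, 2, 3), (1, 2, 3, 0)
goal = (2, 0, 1, 3)
path = heuristic_search(
    [(0, 1, 2, 3)],
    lambda s: s == goal,
    lambda s: [compose(A, s), compose(B, s)],
    lambda s: sum(a != b for a, b in zip(s, goal)),  # f2 as a crude heuristic
)
```

Because states already given a parent pointer are never re-added, the search terminates on any finite group graph.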
For the heuristic to be defined we must determine the
feature weights b0,b1,b2,..,bn. To start with, all the bi
values are set to zero and we consider the entire feature
space within the bounds of each feature for the particular
problem domain (i.e., initially the feature space consists
of a single region). We then give the SOLVER a set of
problems to solve, resulting in a set of search trees. Since,
initially, H := 0, the search trees are breadth-first as we
have a constant heuristic. From the set of search trees we
get a corresponding set of feature space points and we mark
those that lie on a solution path; this constitutes the
output of the SOLVER.
Differentiator

The main function of the DIFFERENTIATOR is to partition
the feature space into representative clusters (regions). The
output of the SOLVER is used to calculate the penetrance in
each region, r, of the feature space. Associated with each
penetrance value, p, is an error estimate, e (see appendix
A). In practice the set of regions is stored as triples,
(r,p,e), on a 'blackboard' (a globally modifiable data
structure) and is thus accessible to different components of
the PLS. The DIFFERENTIATOR systematically splits each region
of the feature space into two by inserting hyperplanes
(infinite dividing planes) parallel to the feature axes at
regular intervals and determining which split gives the
maximum difference in penetrance between the corresponding
two sub-regions. The process of splitting is repeated
recursively until the feature space cannot be differentiated
any further (see figure 6). The newly obtained region set
now replaces what was previously on the blackboard. This
differentiation procedure is similar to the clustering of
large data sets for statistical analysis [Zupan,1982],
except that here the clustering is done in reverse.
Let r be a region in the cumulative region set.
Exhaustively insert hyperplanes parallel to the feature axes.
While any hyperplane boundaries remain untried do
    Select a hyperplane creating two sub-regions r1 and r2.
    Find the penetrances and error estimates for r1 and r2.
    If this dichotomy gives a distance, d (see appendix A),
    greater than any previous, note the hyperplane.
Endwhile
If the best d was greater than some minimum, replace r by
the corresponding r1 and r2.
Repeat the above procedure for all regions in the cumulative
region set until no more splitting occurs.

Figure 6: Region Splitting Procedure
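One way to realize the inner loop of figure 6 for a 2-D feature space is sketched below. The data layout is assumed (each point pairs a feature vector with an on-solution-path flag), and the plain penetrance difference stands in for the error-weighted distance d of appendix A:

```python
def penetrance(points):
    """Fraction of points in a region that lie on a solution path."""
    if not points:
        return None
    return sum(1 for _, on_path in points if on_path) / len(points)

def best_split(points, axis_bounds):
    """Try every integer hyperplane along each feature axis and return
    (d, axis, cut) for the split maximizing the penetrance difference d
    (a simple stand-in for the error-weighted distance of appendix A)."""
    best = None
    for axis, (lo, hi) in enumerate(axis_bounds):
        for cut in range(lo + 1, hi):
            r1 = [pt for pt in points if pt[0][axis] < cut]
            r2 = [pt for pt in points if pt[0][axis] >= cut]
            p1, p2 = penetrance(r1), penetrance(r2)
            if p1 is None or p2 is None:
                continue          # a split leaving an empty side is no split
            d = abs(p1 - p2)
            if best is None or d > best[0]:
                best = (d, axis, cut)
    return best

# toy data: (feature vector, on-solution-path flag); feature f1 on axis 0
# separates solution-path states cleanly, as the differentiator should find
pts = [((0, 3), True), ((1, 4), True), ((3, 4), False),
       ((4, 2), False), ((0, 2), True), ((4, 4), False)]
split = best_split(pts, [(0, 5), (0, 5)])
```

Recursive application to each resulting sub-region, with a minimum-distance stopping test, gives the full region-splitting procedure.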
Regressor

Using the region set information as input, the REGRESSOR
performs a multiple linear regression with the center
points of each region as the independent variable and the
penetrance as the response variable [Draper,1981]. The center
of each region is representative of that region. The
centroid could be used for better results. This regression
model is simple and straightforward and proved successful
for the fifteen puzzle [Rendell,1983]. Several efficient
computerized statistical packages are available for
regression analysis. The present system uses the routine
RLSEP of the IMSL Library [IMSL,1982]. Thus we obtain the
relative merit of each chosen feature, giving the feature
weight vector components, bi. In practice it may be useful to
use a transformation function before the regression. We use
ln(p) instead of the penetrance, p, as the response variable,
as there tends to be very little difference in penetrance
values. For the heuristic we use,
    H := exp(b0 + b1.f1 + b2.f2 + ... + bn.fn).
The new feature weight values, bi, now replace the previous
values on the corresponding blackboard.
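The regression step can be sketched with NumPy's least squares in place of the IMSL routine RLSEP. The data are illustrative, and the thesis's ln(p)/ln(e) device is treated here as an error-based weighting, which is one plausible reading:

```python
import numpy as np

def fit_feature_weights(centers, penetrances, errors):
    """Weighted multiple linear regression of ln(p) on region center points;
    each region is weighted by 1/ln(e), e being its error factor."""
    X = np.hstack([np.ones((len(centers), 1)), np.asarray(centers, float)])
    y = np.log(np.asarray(penetrances, float))
    w = 1.0 / np.log(np.asarray(errors, float))
    sw = np.sqrt(w)[:, None]
    b, *_ = np.linalg.lstsq(X * sw, y * sw.ravel(), rcond=None)
    return b                      # b[0] = b0, b[1:] = feature weights bi

# illustrative data: four 1-D region centers whose penetrance halves each step
centers = [[1.0], [2.0], [3.0], [4.0]]
p = [0.8, 0.4, 0.2, 0.1]
e = [1.5, 1.5, 1.5, 1.5]
b = fit_feature_weights(centers, p, e)
H = lambda f1: float(np.exp(b[0] + b[1] * f1))   # the learned heuristic form
```

Because ln(p) here is exactly linear in the center coordinate, the fitted slope is ln(1/2) and the exponentiated model recovers the penetrances.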
The PLS is thus given successive problem sets until the
feature space becomes undifferentiable and so cannot be
further partitioned. At this stage we expect the bi values
to become more or less constant. The PLS has then,
hopefully, 'learned' a heuristic that will solve most
problems in the problem domain under consideration, expanding
substantially fewer states than an exhaustive search would.
This hope is based on the probability that the chosen
features are relevant towards evaluating the merits of a state
of the problem and that the system is fairly stable. The
fundamental design aspect of the PLS is therefore to
accumulate knowledge iteratively by improving the active
heuristic between successive problem set solutions.
Penetrance Normalization

Various measures need to be taken to stabilize the
performance of the PLS. Associated with the penetrance, p, in
each region is an error estimate, e, which is used as a
weighting factor before the regression is performed. Thus
instead of just ln(p) as the response variable we use
ln(p)/ln(e). For error estimates see Appendix A. Besides
this stability measure we also have to correct for bias in
penetrance values.
In practice, for non-trivial problem instances,
realizable search trees are either breadth-first for easier
problem instances or the result of using good heuristics
for harder problem instances. But as better heuristics are
used in tree searching, the localized penetrance within
regions of the feature space tends to be biased upward
relative to the overall (true) penetrance of a breadth-first
search tree. A perfect heuristic would thus yield the
largest possible penetrance of unity. In order to stabilize the
PLS performance we therefore need to standardize all
penetrance values by removing this bias.
Penetrance normalization is done empirically as follows.
Let p be the actual localized penetrance and p' the
true penetrance in some region, r, of the feature space. Let
s = p/p'. If r gets split into several sub-regions and if ri
is one such sub-region with localized penetrance, pi, then
the normalized penetrance for ri is given by pi/si, where,
    ln(si) = ln(s) + (1 - 1/h).b.(cr - ci).
cr and ci are the centers of r and ri, respectively, and h
is obtained by a simple linear regression, assuming the
model p' = p^h. This regression is performed using the set of
parent regions before they are differentiated. For unsplit
parent regions we just use p' = p^2, as this is found to be
the general trend from values obtained in practice. The p'
values are those obtained from a breadth-first search
(initially), or from previous penetrance normalization
(subsequently). For split regions we have to account for
bias within a region and therefore an additional correction
is applied. It may be noted that b.(cr - ci) is the expected
logarithmic true penetrance difference between r and ri, and
the factor 1/h converts this to a biased value, so that the
entire correction term counters the logarithmic bias due to
the heuristic, H. There is no perfect rationale, however,
behind this normalization procedure; but it is found to work
well in practice.
CHAPTER III

RESULTS

Implementation
A PLS as described in chapter II was implemented in the
C programming language [Kernighan,1978; Whitesmiths,1983] on
a VAX-11/780 running VMS. This PLS could be used to develop
a heuristic to search any group graph as explained in
chapter I. Although a significant amount of list processing
is involved, a procedural language like C is appropriate
because of the nature of the mathematical computations used.
There is also some motivation nowadays to use procedural
languages in AI research because they are more universally
available and understood.
Permutations are internally represented as one-dimensional
arrays [Knuth,1968]. For example, the permutation
0->2, 2->1, 1->0, 3->3 may be conveniently represented
as 2 0 1 3, where the domain elements are implicitly
represented by the position (array index) of their corresponding
images. The data resulting from solving a set of problem
instances is stored in a large array, which is sorted
using a shell sort, in order of feature vector values. Other
implementation details can be found in appendix C.
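The sorted-array bookkeeping can be sketched as follows; the gap sequence and the record layout are illustrative choices, not taken from the thesis:

```python
def shell_sort(records, key=lambda r: r):
    """In-place shell sort using Knuth's 3h+1 gap sequence."""
    n, gap = len(records), 1
    while gap < n // 3:
        gap = 3 * gap + 1
    while gap >= 1:
        for i in range(gap, n):       # gapped insertion sort
            item, j = records[i], i
            while j >= gap and key(records[j - gap]) > key(item):
                records[j] = records[j - gap]
                j -= gap
            records[j] = item
        gap //= 3
    return records

# each record pairs a feature vector with its on-solution-path flag
data = [((2, 3), True), ((0, 1), False), ((1, 4), True), ((0, 0), True)]
shell_sort(data, key=lambda r: r[0])  # ordered by feature vector values
```

Keeping the records ordered by feature vector lets the differentiator count solution-path and total states within a candidate region by scanning a contiguous range.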
Experimental Tests

Several tests were run with the developed system and
its performance was found to be generally satisfactory. In
particular, detailed performance evaluation is presented for
the following permutation group:

Generator Set:
    A = 1 0 2 3 4 5
    B = 1 2 3 4 5 0

It can be shown [Stone,1973] that this generator set
generates the symmetric group of degree six, and order 6!. This
particular example was chosen for illustration as it is not
so small as to be trivial, yet not too large to compare
the results obtained with corresponding breadth-first
solutions. For example, trial problem instances with the
symmetric group of degree 8 took several hours of CPU time for a
breadth-first search. Hence for extensive tests the above
problem domain is most practical.
Four apparently relevant features were chosen (see table
1). A total of 45 training problems (see appendix B) were
used. The initial iteration consisted of 10 problems and
thereafter 5 problems were used in each iteration. After
about 7 iterations the program had rejected two of the
features by setting their weights to zero. The feature
TABLE 1
Chosen Features

f1 = absolute difference of products of cycle lengths
     between a given state and the final state.
f2 = number of misplaced elements.
f3 = sum of distances apart of equal elements
     in a given state and the final state.
f4 = number of pair reversals between a given state
     and the final state.
weights apparently converged to constant values after the
cumulative region set became undifferentiable. The results
of the 'learning' phase are summarized in table 2.
TABLE 2
Learning Phase Results

Iteration   Regions     b0      b1      b2      b3      b4
    1           2     -0.67    0.0     0.0     0.0     0.0
    2           4     -0.78    0.05    0.0     0.0     0.0
    3          15     -0.58   -0.01    0.0     0.0     0.0
    4          25     -1.86   -0.32    0.0     0.0     0.0
    5          27     -1.93    0.0     0.0    -0.33    0.0
    6          27     -5.24    0.48    0.0    -0.24    0.0
    7          27     -7.61    0.67    0.0    -0.29    0.0
    8          27     -7.54    0.64    0.0    -0.28    0.0
Taking the mean of the last two rows we obtain the
heuristic,
    H = exp(-7.575 + 0.655 f1 - 0.285 f3).
The solver was then given 41 randomly selected problem
instances and on the aggregate about 14% fewer states were
expanded compared with breadth-first solutions of the same
problem instances. However, of these 41 problem instances,
if we consider only those 32 for which more than 100 states
were expanded in the breadth-first solutions, the above
heuristic proved far superior (see table 3). Approximately 40%
fewer states were expanded with the learned heuristic. The
mean solution length, however, was approximately three times
that for breadth-first search. It must be noted that a
breadth-first search always results in the shortest solution
sequence.
TABLE 3
Solving Phase Results

Number of problem instances = 32.

                          Breadth-first   Heuristic
                          Search          Search
States expanded (mean)    368.56          229.34
Solution length (mean)     11.13           31.38
[Rendell,1983] claims that using a similar PLS for the
well-known fifteen puzzle, the learned heuristic was found to
be locally optimal, both in terms of the mean number of
states expanded and in terms of the mean solution length.
However, in this example, the learned heuristic was not
exactly locally optimal, as can be seen from table 4; the
same 32 problem instances as in table 3 were used and the
feature weight, b1, was perturbed slightly in either
direction.
TABLE 4
Test for Local Optimality

                      States Expanded   Solution Length
                      (mean)            (mean)
b1 = b1(opt)          229.34            31.38
b1 = 1.25 b1(opt)     232.56            32.81
b1 = 0.75 b1(opt)     225.03            34.94
b1 = 2.0  b1(opt)     314.81            42.06
b1 = 0.5  b1(opt)     234.81            33.69
Conclusions

Since the methods used are largely empirical in nature
they are not subject to rigorous mathematical analysis.
However, certain calculated guesses can be made based on
estimations and experience. Performance evaluation can thus
be only on the basis of experiment. From the experimental
results obtained several conclusions may be drawn. The
corresponding exhaustive search (breadth-first) has been used
as a yardstick in the performance evaluation.
Although heuristic search was better than a breadth-first
search in terms of the number of states expanded, the
results are relatively not so good for problem instances
solvable by the breadth-first search within 100 state
expansions. This is probably due to the fact that the training
problem instances were arranged in order of the number of
states expanded by a breadth-first search. Hence, if a
solution is not obtained after a certain maximum number of
states are expanded by breadth-first search, we can then
employ a heuristic search. This will invariably be the case
for large problem domains.
Although the solution lengths are sub-optimal, this is
not a very serious drawback. A few mechanically discovered
generator relations may be used as patterns in order to
shorten the solution. The Knuth-Morris-Pratt pattern
matching algorithm [Standish,1980] could be used for this
purpose. Hence, in general, less restrictive heuristics
could be used [Bagchi,1983].
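This post-processing idea can be sketched as repeated pattern rewriting. The relations below are hypothetical examples for the degree-four generators of figure 2 (A is a transposition, so A.A = e; B is a 4-cycle, so B.B.B.B = e), and Python's built-in substring search stands in for Knuth-Morris-Pratt:

```python
# Hypothetical relations for the generators of figure 2: A = 1 0 2 3 is a
# transposition, so A.A = e, and B = 1 2 3 0 is a 4-cycle, so B.B.B.B = e.
RELATIONS = {"AA": "", "BBBB": ""}

def shorten(word, relations=RELATIONS):
    """Repeatedly delete occurrences of identity-valued patterns; Python's
    substring search stands in for Knuth-Morris-Pratt here."""
    changed = True
    while changed:
        changed = False
        for pattern, replacement in relations.items():
            if pattern in word:
                word = word.replace(pattern, replacement)
                changed = True
    return word

assert shorten("BAABBBB") == "B"      # B.A.A.B^4 reduces to B
```

Deleting a substring equal to the identity never changes the group element a word denotes, so the shortened word is always a valid solution.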
Although the learned heuristic was found to be optimal
for the fifteen puzzle, this claim is not quite valid for
permutation group graphs. This is probably because the
operators in the fifteen puzzle problem produce
relatively less perturbation in the configuration of
any given state. Hence there is greater inherent stability
in the fifteen puzzle. It might be worthwhile to consider
the group generators in computing difference metrics as was
done for GPS [Ernst,1969].
In spite of the measures taken to produce a convergent
result we cannot ensure convergence. The results obtained
depend on several factors like the specific problem domain,
the features selected, the significance level of the
regression coefficients, and so on. But the system at least
succeeds in eliminating a large amount of human trial and
error in obtaining a heuristic.
BIBLIOGRAPHY
B a g c h i , A. and Mahanti , A. , Search Algor i thms Under D i f f e r e n t Kinds of H e u r i s t i c s - A comparat ive s t u d y . i -ACJ 3 0 , 1 , pp 1 - 2 1 , Gan 1983.
Draper , N.R. and Smith , H. , A l l i e d Begres s ion A n a l y s i s . John Wiley and S o n s , 1981.
E r n s t , G.W. and G o l d s t e i n , M.a . , Mechanical D i scovery of C l a s s e s of Problem-Solv ing S t r a t e g i e s , J.ACM 2 9 , 1 , pp 1 - 2 3 , Jan 1S82.
E r n s t , G.W. and N e w e l l , A . , GPS: A Case Study in G e n e r a l i t y and Problem S o l v i n g . Academic P r e s s , 1969 .
G e o r g e f f , M.P . , S t r a t e g i e s i n H e u r i s t i c S e a r c h , A r t i f i c i a l I n t e l l i g e n c e 20,pp 393 -425 , 1983.
H e r s t e i n , I . N . , Topics i n Algebra . B l a i s d e l l P u b l i s h i n g C o . , 1 9 6 4 .
H o p c r o f t , J . E . and Oilman, J . D . , I n t r o d u c t i o n t o Automata Theory, Languages and Computat ion, Addison-Wesley P u b l i s h i n g ^ C o T , 19797
IMSL, Library Be ference Manual. Edn 9, V o l . 4 , Ch. R, June 198 2 .
Kernighan, B.K. and R i t c h i e , D. M., Tjje C Programming Language, P r e n t i c e - H a l l , I n c , 1978.
Knuth, D . E . , The Art of Computer Programming. Vol . J : Fundamental A l s o r i t h n i s , Addison-Wesley , 1968.
L e n a t , D . B . , The Nature of H e u r i s t i c s , A r t i f i c i a l I n t e l U g e n c e 19, pp 189 -249 , Oct 1982.
N e w e l l , A. and Simon, H.A. , Computer S c i e n c e a s Empir i ca l Enguiry:Symbols and Search , Mind D e s i g n . Bradford Books P u b l i s h e r s , pp 3 5 - 6 6 , 1 9 8 1 .
N i l s s o n , N . J . , P r o b l e m - S o l v i n g Methods i n A r t i f i c i a l I n t e l l i g e n c e . " f l c Graw-Hi l l Bock C o . , 1 9 7 1 .
N i l s s o n , N . J . , P r i n c i p l e s of A r t i f i c i a l I n t e l l i g e n c e . Tioga P u b l i s h i n g C o . , 198^7~
29
30
R e n d e l l , L . , A New B a s i s for S t a t e - S p a c e l e a r n i n g Systems and a S u c c e s s f u l l a p l e m e n t a t i o n . A r t i f i c i a l I n t e l l i g e n c e 20 , pp 3 6 7 - 3 9 2 , 1983.
Sedgewick , R . , Permutat ion Generat ion Methods, Coaputing Survexs 9 , 2 , pp 1 3 7 - 1 6 4 , June 1977.
S h a p i r o , S .C . , Techniques of A r t i f i c i a l I n t e l l i g e n c e , Van Nostrand C o . , 1979.
Simon, H.A. , l e s s o n s for AI from Human P r o b l e m - S o l v i n g , Computer S c i e n c e Research R e i i § i # Department of Computer S c i e n c e , Carneg ie -Mel lon U n i v e r s i t y , 1980.
S l a g l e , J . R . , A r t i f i c i a l I n t e l l i g e n c e ; The H e u r i s t i c Programming Af j roach , Mc Graw-Hi l l Book C o . , 1971 .
S t a n d i s h , T - A . , D a t a s t r u c t u r e s T e c h n i g u e s . Addison-Wesley P u b l i s h i n g CoT7"l9 8 0 . * ~
S t o n e , H . S . , D i s c r e t e Mathematical S t r u c t u r e s and t h e i r A p p l i c a t i o n s , S c i e n c e Research A s s o c i a t e s , I n c . , 1973.
Toper , R.W., Fundamental S o l u t i o n s o f t h e Eight Queens Problem, BIT 2 2 , pp 4 2 - 5 2 , 1982 .
W h i t e s a i t h s , L t d . , C I n t e r f a c e Manual f o r VAX-11, March 1983 .
Wu, C. and Feng, T . , The U n i v e r s a l i t y of the S h u f f l e -Exchange Network, IEEE Trans , on Computers C-3Q,5, pp 3 2 4 - 3 3 2 , May 1S81.
Zupan, J . , C l u s t e r i n g of Large Data S e t s . Besearch S t u d i e s P r e s s , 198 27
APPENDIX A
ERROR ESTIMATES
To achieve stability in the PLS, each penetrance, p, is associated with an error factor, e. A region, r, is therefore coded as the triple (r,p,e), representing a penetrance as small as p/e or as large as pe.
As the criterion for splitting a region, r, into the sub-regions r1 and r2, the distance, d, is given by

d = ln(p1/e1) - ln(p2.e2),

where p1, e1 are the penetrance and error estimate for r1 (taken as the sub-region of larger penetrance), and p2, e2 those for r2.
Whenever region splitting occurs, the sub-regions inherit the error factor of the parent region, multiplied by two other factors. The first is (1 + 1/sqrt(g)).(1 + 1/sqrt(t)), where g and t are the same as in the definition of penetrance in Chapter II. The second multiplying factor is a quantity inversely related to the penetrance, p, namely (1 + 1/sqrt(p)). These expressions are based on the reasoning that accuracy decreases as sample size decreases.
Whenever regression is performed on the cumulative region set, each response variable value, ln(p), is weighted by multiplying by 1/ln(e).

If p1 and p2 are two penetrance estimates of a region, r, and e1 and e2 are the corresponding error factors, the new estimate, p, is obtained from

ln p = (ln p1/ln e1 + ln p2/ln e2)/(1/ln e1 + 1/ln e2),

where ln e = 1/(1/ln e1 + 1/ln e2).
APPENDIX B

TRAINING PROBLEM SETS

The following are the problem sets used in each iteration of the learning phase (only final states are shown).

TABLE 5

Training Problems

Iteration 1
3 4 5 0 2 1   3 5 0 2 1 4   5 1 0 3 2 4   0 1 3 2 4 5   5 2 0 1 3 4

Iteration 2
0 5 2 4 3 1   0 2 4 3 1 5   2 1 0 4 3 5   2 0 4 3 1 5   2 1 0 4 5 3

Iteration 3
3 1 4 0 5 2   3 4 0 5 2 1   3 5 4 0 2 1   3 2 1 4 0 5   3 1 4 0 5 2

Iteration 4
5 3 4 2 0 1   0 4 3 1 5 2   2 4 0 5 3 1   3 1 4 0 2 5   4 1 3 5 2 0

Iteration 5
4 3 5 2 0 1   4 5 2 0 1 3   4 1 3 5 2 0   4 3 5 2 0 1   4 5 2 0 1 3

Iteration 6
1 2 5 0 3 4   5 4 2 0 1 3   3 0 4 1 2 5   5 4 3 1 0 2   1 3 0 5 2 4
5 0 1 3 4 2   1 5 2 0 3 4   0 1 3 2 5 4   1 4 2 0 3 5   5 1 3 4 2 0

Iteration 7
2 4 0 3 1 5   4 2 0 1 3 5   4 0 1 3 5 2   0 1 2 5 4 3   3 1 0 4 2 5

Iteration 8
3 5 2 1 4 0   2 4 1 3 0 5   1 0 5 3 2 4   0 3 1 4 2 5   0 2 5 4 3 1
APPENDIX C

COMPUTER IMPLEMENTATION

The following is the complete listing of a computer implementation of the PLS in the C programming language [Kernighan,1978; Whitesmiths,1983], with FORTRAN interfacing.
/*********************************************************
 * HEADER DECLARATIONS (pls.h)                           *
 *********************************************************/
#include <std.h>
#include <stdtyp.h>
#include <pascal.h>
#include <vms.h>
#include <stdio.h>

#define MAXDEG   10
#define MAXGEN   10
#define MAXPOINT 20000
#define MAXNODE  2001
#define MAXCOEF  5
#define YES      1
#define NO       0

struct node {
    int state[MAXDEG];          /* permutation array */
    float fvalue;               /* state fvalue */
    int ident;                  /* node number */
    int gen;                    /* generator index */
    struct node *parent;        /* points to parent */
    struct node *lnode;         /* points to left neighbour */
    struct node *rnode;         /* points to right neighbour */
};

struct gener {
    int state[MAXDEG];
    char symbol;
};

struct cell {
    int lo[MAXCOEF];
    int hi[MAXCOEF];
    float cp[MAXCOEF];
    float p, e, ep, s;
    struct cell *link;
};
/*********************************************************
 * PENETRANCE LEARNING SYSTEM                            *
 *********************************************************/
#include "pls.h"

main()
{
    static struct cell *head;
    static struct gener genset[MAXGEN];
    static int goal[MAXDEG], points[MAXPOINT][MAXCOEF];
    static int ndeg, ngens, nprob, pmax, niter, nreg, i;
    static float b[MAXCOEF];
    FILE *fopen(), *fp, *fs, *fb, *fr, *fd;

    fp = fopen("dra1:[wkg24]plsp.dat", "r");
    fs = stdout;
    getprob(fp, fs, genset, &ndeg, &ngens, &nprob, &niter);
    fprintf(fs, "main: iteration # %d\n", niter);

    fb = fopen("dra1:[wkg24]plsb.dat", "r");
    getb(fb, "%e", b, MAXCOEF);
    fclose(fb);

    pmax = 0;
    while (nprob-- > 0) {               /* solve each training problem */
        geta(fp, "%d ", goal, ndeg);
        solver(genset, goal, points[pmax], &pmax, ndeg, ngens, b, fs);
    }

    fr = fopen("dra1:[wkg24]plsr.dat", "r");
    head = getcell(fr, MAXCOEF, &nreg);
    clust(head, b, &nreg, points, MAXCOEF, pmax, niter);
    fclose(fr);

    fr = fopen("dra1:[wkg24]plsr.dat", "w");
    fprintf(fr, "%d\n", nreg);
    putcell(fr, head, MAXCOEF);
    fclose(fr);

    fb = fopen("dra1:[wkg24]plsb.dat", "w");
    regres(head, b, MAXCOEF, nreg);
    putb(fb, "%6.2f ", b, MAXCOEF);
    fclose(fb);
    putb(fs, "%8.2f ", b, MAXCOEF);
    fclose(fs);
}
/*********************************************************
 * SOLVER FUNCTIONS                                      *
 *********************************************************/
solver(gset, goal, v, pmax, ndeg, ngen, b, fp)
struct gener gset[];
int goal[], v[][MAXCOEF], *pmax;
int ndeg, ngen;
float b[];
FILE *fp;
{
    extern char *allotbuf;
    struct node *open, *closed, *p, *q;
    struct node *setlis(), *select(), *onlis();
    int ns[MAXDEG], f[MAXCOEF], compa();
    int i, j, found, memo, pid, nopen, nclosed, nregen;
    float H(), fval;

    fprintf(fp, "\n");
    fprintf(fp, "Solver: goal = ");
    puta(fp, "%d ", goal, ndeg);
    nopen = nclosed = nregen = 0;
    pid = 0;                            /* initialize */
    open = setlis();                    /* set-up lists */
    closed = setlis();
    for (i = 0; i < ngen; i++) {        /* put generators on open */
        fval = H(f, b, gset[i].state, goal, ndeg);
        for (j = 1; j < MAXCOEF; j++)
            v[pid][j] = f[j];           /* copy */
        fixnode(open, gset[i].state, ndeg, fval, pid, i, NULL);
        pid++; nopen++;
    }
    found = NO; memo = YES;
    while (open->rnode != open && found == NO && memo == YES) {
        p = select(open);               /* choose node with min fvalue */
        transfer(p->lnode, closed);
        nopen--; nclosed++;
        for (i = 0; i < ngen; i++) {
            prod(ns, p->state, gset[i].state, ndeg);
            if (onlis(open, ns, ndeg) || onlis(closed, ns, ndeg))
                nregen++;               /* ignore regenerated nodes */
            else {
                fval = H(f, b, ns, goal, ndeg);
                q = fixnode(open->lnode, ns, ndeg, fval, pid, i, p);
                if (q == NULL || pid == MAXNODE) {
                    fprintf(fp, "solver: memory exhausted\n");
                    memo = NO;
                    break;
                }
                for (j = 1; j < MAXCOEF; j++)
                    v[pid][j] = f[j];   /* copy */
                pid++; nopen++;
                if (compa(goal, q->state, ndeg) == 0) {
                    *pmax += pid;
                    found = YES;
                    fprintf(fp, "Solution (reversed) = ");
                    putseq(fp, q, gset, v);
                    break;
                }
            }
        }
    }
    if (found == NO)
        fprintf(fp, "\nsolution not found\n");
    fprintf(fp, "States: E=%d O=%d R=%d\n", nclosed, nopen, nregen);
    freeit(open);
}
/*---------------------------------------------------------*/
getprob(fi, fo, gs, dg, ng, np, ni)     /* inputs problem parameters */
FILE *fi, *fo;
struct {
    int state[MAXDEG];
    char symbol;
} *gs;
int *dg, *ng, *np, *ni;
{
    int i;

    fscanf(fi, "%d %d %d %d\n", dg, ng, np, ni);
    for (i = 0; i < *ng; i++) {
        fscanf(fi, "%*c %c %*c %*c %*c", &gs[i].symbol);
        geta(fi, "%d ", gs[i].state, *dg);
    }
    fprintf(fo, "Generator Set:\n");
    for (i = 0; i < *ng; i++) {
        fprintf(fo, " %c = ", gs[i].symbol);
        puta(fo, "%d ", gs[i].state, *dg);
    }
}
/*---------------------------------------------------------*/
float H(f, b, s, g, n)                  /* heuristic evaluation function */
int f[], s[], g[], n;                   /* using 5 features */
float b[];
{
    IMPORT DOUBLE exp();
    int cyclp(), nmis(), sdist(), sdif(), npairs(), i;
    float value;

    f[0] = 1.0;                         /* constant feature */
    f[1] = abs(cyclp(s, n) - cyclp(g, n)); /* cycle difference */
    f[2] = nmis(s, g, n);               /* mismatches */
    f[3] = sdist(s, g, n);              /* sum of distances */
    f[4] = npairs(s, g, n);             /* pair reversals */
    value = 0;
    for (i = 0; i < MAXCOEF; i++)
        value += b[i] * f[i];
    return (exp(value));
}
/*********************************************************
 * REGION SET DIFFERENTIATION & PENETRANCE NORMALIZATION *
 *********************************************************/
clust(head, b, nr, v, m, n, ni)         /* clusters feature space */
struct cell *head;                      /* by differentiation */
float b[];
int *nr, v[][MAXCOEF], m, n, ni;
{
    struct cell *r;
    float h, power();
    int i;

    printf("clust:\n");
    sorta(v, m, n);
    if (ni > 1)
        h = power(head, *nr, v, m, n);
    split(head->link, v, m, n);
    if (ni > 1)
        norm(head, b, h, v, m, n);
    *nr = 0;                            /* size of new region set */
    r = head->link;
    while (r != NULL) {
        r = r->link;
        (*nr)++;
    }
}
/*---------------------------------------------------------*/
split(region, v, m, n)                  /* splits region set */
struct cell *region;
int v[][MAXCOEF], m, n;
{
    IMPORT DOUBLE sqrt();
    struct cell *r;
    int i, axis, div, ndiv;
    int lo[MAXCOEF], hi[MAXCOEF], lo_[MAXCOEF], hi_[MAXCOEF];
    float p[2], p_[2], e[2], e_[2], d, d_, dist();

    copya(lo_, region->lo, m);          /* initialize best split info */
    copya(hi_, region->hi, m);
    d_ = -1000000.00;
    for (axis = 1; axis < m; axis++) {  /* try all feature axes */
        copya(lo, region->lo, m);
        copya(hi, region->hi, m);
        ndiv = region->hi[axis] - region->lo[axis];
        for (div = 1; div < ndiv; div++) { /* try all hyperplanes */
            lo[axis]++;                 /* on this axis */
            hi[axis] = lo[axis];
            penet(&p[0], &e[0], v, region->lo, hi, m, n);
            penet(&p[1], &e[1], v, lo, region->hi, m, n);
            d = dist(p, e);
            if (d > d_) {               /* save best split so far */
                d_ = d;
                copya(lo_, lo, m);
                copya(hi_, hi, m);
                for (i = 0; i < 2; i++) {
                    p_[i] = p[i];
                    e_[i] = e[i];
                }
            }
        }
    }
    if (d_ > -0.2) {                    /* create two sub-regions */
        r = (struct cell *) allot(sizeof(struct cell));
        copya(r->lo, lo_, m);
        copya(r->hi, region->hi, m);
        copya(r->cp, region->cp, m);
        r->p = p_[1];
        r->e = e_[1] * (region->ep) * (1 + 1/sqrt(50.0*p_[1]));
        r->ep = region->ep;
        r->s = region->s;
        r->link = region->link;
        copya(region->hi, hi_, m);
        region->p = p_[0];
        region->e = e_[0] * (region->ep) * (1 + 1/sqrt(50.0*p_[0]));
        region->link = r;
    }
    if (d_ > 0)
        split(region, v, m, n);
    else if ((region = region->link) != NULL)
        split(region, v, m, n);
}
/*---------------------------------------------------------*/
struct cell *                           /* inputs region set */
getcell(fp, m, n)
FILE *fp;
int m, *n;
{
    struct cell *r, *t, *head;
    int i, j;

    fscanf(fp, "%d ", n);
    r = head = (struct cell *) allot(sizeof(struct cell));
    for (i = 0; i < *n; i++) {
        t = (struct cell *) allot(sizeof(struct cell));
        geta(fp, "%d ", t->lo, m);
        geta(fp, "%d ", t->hi, m);
        for (j = 0; j < m; j++)
            t->cp[j] = 0.5 * (t->lo[j] + t->hi[j]);
        fscanf(fp, "%e%e", &t->p, &t->e);
        t->ep = t->e;
        r->link = t;
        t->link = NULL;
        r = t;
    }
    return (head);
}
/*---------------------------------------------------------*/
putcell(fp, head, m)                    /* outputs region set */
FILE *fp;
struct cell *head;
int m;
{
    struct cell *t;

    t = head->link;
    while (t != NULL) {
        puta(fp, "%d ", t->lo, m);
        puta(fp, "%d ", t->hi, m);
        fprintf(fp, "%12.9f %12.9f\n", t->p, t->e);
        t = t->link;
    }
}
/*---------------------------------------------------------*/
penet(p, e, v, lo, hi, m, n)            /* finds penetrance=p, error=e */
float *p, *e;                           /* with points=v[n][m], bounds=lo,hi */
int v[][MAXCOEF], lo[], hi[], m, n;
{
    IMPORT DOUBLE sqrt();
    int i, binary(), compa();
    float g, t;

    g = t = 0.0;
    i = 0;
    while (compa(&lo[1], &v[i][1], m-1) > 0)
        i++;
    while (compa(&hi[1], &v[i][1], m-1) > 0 && i < n) {
        if (v[i][0])
            g += 1.0;                   /* inc good count */
        t += 1.0;                       /* inc total count */
        i++;
    }
    *p = (t > 0.0 && g > 0.0 ? g/t : 1.0e-5);       /* penetrance */
    *e = (g > 0.0 ? 1.0 + 1.0/sqrt(g) : 100.0)
       * (t > 0.0 ? 1.0 + 1.0/sqrt(t) : 100.0);     /* error */
}
/*---------------------------------------------------------*/
float dist(p, e)                        /* distance between sub-regions */
float p[], e[];
{
    IMPORT DOUBLE ln();
    int i, j;

    i = 0; j = 1;
    if (p[0] < p[1]) {
        j = 0; i = 1;
    }
    return (ln(p[i]/e[i]) - ln(p[j]*e[j]));
}
/*---------------------------------------------------------*/
float power(head, nr, v, m, n)          /* search power &            */
struct cell *head;                      /* uncorrected search factor */
int v[][MAXCOEF], nr, m, n;
{
    IMPORT DOUBLE ln();
    struct cell *r;
    float p, e, b, xy[500];
    int i;

    printf("power:\n");
    r = head->link;
    i = 0;
    while (r != NULL) {
        penet(&p, &e, v, r->lo, r->hi, m, n);
        r->s = p/r->p;
        xy[i] = ln(p);
        xy[i+nr] = ln(r->p);
        i++;
        r = r->link;
    }
    rslfor(&b, xy, &nr);
    return (b > 0 ? b : 1e-5);
}
/*---------------------------------------------------------*/
norm(head, b, h, v, m, n)               /* fine penetrance normalization */
struct cell *head;                      /* of split regions */
float b[], h;
int v[][MAXCOEF], m, n;
{
    IMPORT DOUBLE exp(), ln(), sqr(), sqrt();
    struct cell *r;
    int i, coin();
    float p, e, ee, fact;

    printf("norm:\n");
    r = head->link;
    while (r != NULL) {
        if (coin(r->lo, r->hi, r->cp, m)) { /* unsplit region */
            penet(&p, &e, v, r->lo, r->hi, m, n);
            p = exp(h * ln(p));
            ee = 1.0/sqr(ln(r->e)) + 1.0/sqr(ln(e));
            r->p = exp((ln(r->p)/sqr(ln(r->e))
                      + ln(p)/sqr(ln(e)))/ee);
            r->e = exp(sqrt(1.0/ee));
        }
        else {                          /* region was split */
            fact = 0;
            for (i = 1; i < m; i++)
                fact += b[i] * (r->cp[i] - 0.5*(r->hi[i] + r->lo[i]));
            r->p /= exp((ln(r->s) * (1.0 - 1.0/h) + fact));
            if (r->p < 1e-5)
                r->p = 1e-5;
        }
        r = r->link;                    /* next region */
    }
}
/*---------------------------------------------------------*/
int coin(x, y, c, m)                    /* is c centre of region (x,y) ? */
int x[], y[], m;
float c[];
{
    int i;

    i = 1;
    while (i < m && (abs(c[i] - (x[i]+y[i])*0.5) < 1e-5))
        i++;
    return (i == m ? 1 : 0);
}
/*---------------------------------------------------------*/
regres(head, b, m, nr)                  /* multiple linear regression */
struct cell *head;
float b[];
int m, nr;
{
    IMPORT DOUBLE ln();
    struct cell *r;
    float xy[1000], bxy[100];
    int i, j;

    printf("regres:\n");
    r = head->link;
    i = 0;
    while (r != NULL) {
        for (j = 1; j < m; j++)
            xy[i + (j-1)*nr] = 0.5 * (r->lo[j] + r->hi[j]);
        xy[i + (m-1)*nr] = ln(r->p)/ln(r->e);
        i++;
        r = r->link;
    }
    rmlfor(b, bxy, xy, &m, &nr);
}
/*********************************************************
 * LIST MANAGEMENT FUNCTIONS                             *
 *********************************************************/
struct node *                           /* sets up list with header node */
setlis()
{
    struct node *p;

    p = (struct node *) allot(sizeof(struct node));
    p->lnode = p->rnode = p;
    return (p);
}
/*---------------------------------------------------------*/
transfer(p, q)                          /* transfers node after p to after q */
struct node *p, *q;
{
    struct node *r;

    if (p->rnode == p)
        return (-1);
    r = p->rnode;
    r->rnode->lnode = p;
    p->rnode = r->rnode;
    insert(q, r);
}
/*---------------------------------------------------------*/
insert(p, q)                            /* inserts node q after node p */
struct node *p, *q;                     /* in a list of nodes */
{
    q->rnode = p->rnode;
    q->lnode = p->rnode->lnode;
    p->rnode = p->rnode->lnode = q;
}
/*---------------------------------------------------------*/
struct node *                           /* creates new node and */
fixnode(qn, s, n, f, o, g, p)           /* returns pointer to it */
struct node *qn;                        /* pointer to predecessor node */
int s[], n;                             /* permutation of degree n */
float f;                                /* fvalue of node */
int o;                                  /* node number */
int g;                                  /* generator index */
struct node *p;                         /* pointer to parent node */
{
    struct node *pn;

    pn = (struct node *) allot(sizeof(struct node));
    if (pn == NULL)
        return (NULL);
    copya(pn->state, s, n);
    pn->fvalue = f;
    pn->ident = o;
    pn->gen = g;
    pn->parent = p;
    insert(qn, pn);
    return (pn);
}
/*---------------------------------------------------------*/
struct node *
onlis(qn, x, n)                         /* returns pointer to node with */
struct node *qn;                        /* state x on list pointed to by qn */
int x[], n;
{
    struct node *p;

    for (p = qn->rnode; p != qn; p = p->rnode)
        if (compa(p->state, x, n) == 0)
            return (p);
    return (NULL);
}
/*---------------------------------------------------------*/
putseq(fp, s, g, v)                     /* prints solution sequence */
FILE *fp;                               /* from node s backwards */
struct node *s;
struct gener *g;
int v[][MAXCOEF];
{
    struct node *p;
    int len;

    len = 0;
    for (p = s; p != NULL; p = p->parent) {
        len++;
        fprintf(fp, "%c", g[p->gen].symbol);
        v[p->ident][0] = 1;             /* good node */
    }
    fprintf(fp, "\nsolution length = %d\n", len);
}
/*---------------------------------------------------------*/
struct node *
select(qn)                              /* selects first node with min fvalue */
struct node *qn;                        /* from queue qn */
{
    struct node *p, *pmin;
    float minf;

    p = qn->rnode;
    minf = p->fvalue;
    for (pmin = p; p != qn; p = p->rnode)
        if (p->fvalue < minf) {
            minf = p->fvalue;
            pmin = p;
        }
    return (pmin);
}
/*---------------------------------------------------------*/
#define NULL 0                          /* pointer value for error report */
#define ALLOTSIZE 4000000               /* size of available space */

static char allotbuf[ALLOTSIZE] = {0};  /* allot storage */
static char *allotp = allotbuf;         /* next free position */

char *allot(n)                          /* returns pointer to n bytes */
int n;                                  /* general byte storage allocator */
{
    if (allotp + n <= allotbuf + ALLOTSIZE) { /* fits */
        allotp += n;
        return (allotp - n);            /* old p */
    }
    else
        return (NULL);
}
/*---------------------------------------------------------*/
freeit(p)                               /* free storage pointed to by p */
struct node *p;
{
    if (p >= allotbuf && p < allotbuf + ALLOTSIZE)
        allotp = p;
}
/*********************************************************
 * ARRAY MANIPULATIONS                                   *
 *********************************************************/
prod(p, x, y, n)                        /* product of two permutations */
int p[], x[], y[], n;
{
    int i;

    for (i = 0; i < n; i++)
        p[i] = x[y[i]];
}
/*---------------------------------------------------------*/
int compa(x, y, n)                      /* returns <0 if x<y, */
int x[], y[], n;                        /* 0 if x=y, >0 if x>y */
{
    int i;

    i = 0;
    while (x[i] == y[i] && i < n)
        i++;
    return (i == n ? 0 : x[i] - y[i]);
}
/*---------------------------------------------------------*/
copya(x, y, n)                          /* copy array y to array x */
int x[], y[], n;
{
    int i;

    for (i = 0; i < n; i++)
        x[i] = y[i];
}
/*---------------------------------------------------------*/
geta(fp, fmt, x, n)                     /* inputs array */
FILE *fp;
char *fmt;
int x[], n;
{
    int i;

    for (i = 0; i < n; i++)
        fscanf(fp, fmt, &x[i]);
}
/*---------------------------------------------------------*/
puta(fp, fmt, x, n)                     /* outputs array */
FILE *fp;
char *fmt;
int x[], n;
{
    int i;

    for (i = 0; i < n; i++)
        fprintf(fp, fmt, x[i]);
    fprintf(fp, "\n");
}
/*---------------------------------------------------------*/
getb(fp, fmt, x, n)                     /* inputs float array */
FILE *fp;
char *fmt;
float x[];
int n;
{
    int i;

    for (i = 0; i < n; i++)
        fscanf(fp, fmt, &x[i]);
}
/*---------------------------------------------------------*/
putb(fp, fmt, x, n)                     /* outputs float array */
FILE *fp;
char *fmt;
float x[];
int n;
{
    int i;

    for (i = 0; i < n; i++)
        fprintf(fp, fmt, x[i]);
    fprintf(fp, "\n");
}
/*---------------------------------------------------------*/
/• orbit lengths product of •/ /• perautation x of degree n •/
i n t cyc lp(x ,n) i n t x£ ] , n; { i n t prod,count,i,3,teap,y[MAXDEG]; f o r ( i = 0 ; i < n; i**)
y [ i ] = x [ i ] ; prod = 1; f o r ( i = 0 ; i < n; i**) {
ifCyCi] < C) continue; count = 1; j = i ; while ( y [ j ] != i) {
teap = y £ j ] ; yCj] = - 1 ; j = teap; count**;
3 yC j ] = - n prod •= count;
3 return (prod) ; 3 / *
* /
int nmis(x, y, n)                       /* number of mismatches of arrays x and y */
int x[], y[], n;
{
    int count, i;

    count = 0;
    for (i = 0; i < n; i++)
        if (x[i] != y[i])
            count++;
    return (count);
}
/*---------------------------------------------------------*/
int sdist(x, y, n)                      /* sum of distances */
int x[], y[], n;                        /* between equal elements */
{
    int count, i, j;

    count = 0;
    for (i = 0; i < n; i++) {
        j = 0;
        while (j < n && x[i] != y[j])
            j++;
        count += abs(i - j);
    }
    return (count);
}
/*---------------------------------------------------------*/
int npairs(x, y, n)                     /* number of pair reversals */
int x[], y[], n;
{
    int count, i, j;

    count = 0;
    for (i = 0; i < n; i++)
        for (j = i+1; j < n; j++)
            if (x[i] == y[j] && y[i] == x[j])
                count++;
    return (count);
}
/*---------------------------------------------------------*/
sorta(v, m, n)                          /* shellsort v[0]...v[n-1] in inc order */
int v[][MAXCOEF], m, n;
{
    int gap, temp, i, j, k, compa();

    for (gap = n/2; gap > 0; gap /= 2)
        for (i = gap; i < n; i++)
            for (j = i-gap; j >= 0; j -= gap) {
                if (compa(&v[j][1], &v[j+gap][1], m-1) <= 0)
                    break;
                for (k = 0; k < m; k++) {   /* swap rows j and j+gap */
                    temp = v[j][k];
                    v[j][k] = v[j+gap][k];
                    v[j+gap][k] = temp;
                }
            }
}
/*---------------------------------------------------------*/
The following are FORTRAN subroutines used for regression.

C*********************************************************
C     REGRESSION ROUTINES
C*********************************************************
      SUBROUTINE RSLFOR (B,XY,N)
C     SIMPLE LINEAR REGRESSION
      INTEGER N
      REAL XY(N,2), B
      INTEGER IX,IMOD,IPRED,IP,NN,IER
      REAL ALBAP(3),DES(5),ANOVA(14),STAT(9),PRED(1,7)
      IX = N
      IMOD = 1
      IPRED = 0
      ALBAP(1) = 0.05
      IP = 1
      CALL RLONE(XY,IX,N,IMOD,IPRED,ALBAP,DES,
     *           ANOVA,STAT,PRED,IP,NN,IER)
      B = STAT(1)
      RETURN
      END

      SUBROUTINE RMLFOR (B,XYB,XY,M,N)
C     FORWARD STEPWISE MULTIPLE LINEAR REGRESSION
      INTEGER M,N
      REAL B(M),XYB(M,5),XY(N,M)
      INTEGER MX,IX,IJOB(2),IND(9),IB,IER
      REAL ALFA(2),ANOVA(16),VARB(15)
      MX = M-1
      IX = N
      ALFA(1) = 0.1
      ALFA(2) = 0.15
      IJOB(1) = 0
      IJOB(2) = 1
      IB = M
      IND(1) = 1
      IND(2) = 0
      IND(3) = 0
      IND(4) = 0
      CALL RLSEP(XY,N,MX,IX,ALFA,IJOB,
     *           IND,ANOVA,XYB,IB,VARB,IER)
      B(1) = XYB(M,2)
      K = M-1
      DO 10 I = 1,K
   10 B(I+1) = XYB(I,2)
      RETURN
      END