BayesiaLab Satisfaction Poll Analysis
Data analysis –
satisfaction poll
This part shows how to characterize global satisfaction and how to explore all interactions between variables.
The data is contained in a text file (CSV).
There is a title line.
The separator is a semicolon.
The import wizard automatically detects the file separator and the title line.
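Outside BayesiaLab, the same file layout (title line, semicolon separator, identifier in the first column) can be read with a few lines of Python. This is a minimal sketch using only the standard library; the column names and sample rows are hypothetical and only mimic the layout described above.

```python
import csv
import io

# Hypothetical sample: a title line, semicolon-separated fields,
# and an identifier in the first column (assumed to be named "ID").
SAMPLE = "ID;SE1;SK5;Satisfaction\n1;8;7;9\n2;3;;4\n3;6;5;7\n"


def load_poll(text, id_column="ID"):
    """Parse semicolon-separated text with a title line, dropping the identifier."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for record in reader:
        record.pop(id_column, None)  # the identifier is not used for analysis
        rows.append(record)
    return rows


rows = load_poll(SAMPLE)
print(rows[0])
```

Empty fields (such as the SK5 value in the second row) are returned as empty strings, which makes missing data easy to spot in a later step.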
The first column is
an identifier. Since
this information is
not useful for
analysis, the
column becomes
grey: it is unused.
The file contains missing data. Each missing value is replaced by the average of the values present in the same column.
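The replacement rule can be illustrated as follows. This is a sketch of plain mean imputation on one column, not BayesiaLab's internal code; the marks are invented and `None` stands for a missing answer.

```python
from statistics import mean


def impute_with_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    if not observed:
        return list(column)  # nothing to average, leave the column as is
    fill = mean(observed)
    return [fill if value is None else value for value in column]


marks = [8, None, 6, 10, None]  # hypothetical marks with two missing answers
completed = impute_with_mean(marks)
print(completed)
```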
Data information is displayed
here. 711 poll responses are
gathered in this dataset.
Discretizing
continuous values
The variables represent evaluation marks from 1 to 10. Manual discretization displays the cumulative distribution function of the selected continuous variable.
Generating an equal-distance discretization with three intervals leads to this graph.
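For marks ranging from 1 to 10, an equal-distance discretization with three intervals puts the cut points at 4 and 7. The sketch below shows the arithmetic; the function names are illustrative, and treating the upper bound of each interval as inclusive is an assumption.

```python
def equal_width_cut_points(low, high, n_intervals):
    """Return the interior cut points of n equal-width intervals on [low, high]."""
    width = (high - low) / n_intervals
    return [low + width * i for i in range(1, n_intervals)]


def discretize(value, cut_points):
    """Return the index of the interval containing value (upper bounds inclusive)."""
    for index, bound in enumerate(cut_points):
        if value <= bound:
            return index
    return len(cut_points)


cuts = equal_width_cut_points(1, 10, 3)  # marks range from 1 to 10
print(cuts)
print([discretize(mark, cuts) for mark in (4, 7, 8)])
```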
Since the discretization is adequate, it can be applied to all variables.
To transfer the discretization mode to the other variables, use Ctrl + A to apply the discretization to all variables.
The Bayesian
network is
created with one
node per column.
To characterize global satisfaction, the first step is to use the search function to find the “Satisfaction” node.
The search function
* and % can be used to simplify the search.
Clicking on the
line causes the
node to blink.
This node is the
target variable of
the analysis. We
are interested in
the >7 satisfaction
value.
The augmented Markov blanket is used to characterize the target variable. It finds the minimal set of variables that characterize global satisfaction.
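The notion of a Markov blanket can be illustrated on a small graph: the blanket of a node consists of its parents, its children, and the other parents of its children. The sketch below only computes this set from a known structure; it is not BayesiaLab's augmented Markov blanket learning algorithm, and the arcs and the node names SE2 and SE3 are hypothetical.

```python
def markov_blanket(edges, target):
    """Return parents, children and co-parents of target in a directed graph."""
    parents = {a for a, b in edges if b == target}
    children = {b for a, b in edges if a == target}
    co_parents = {a for a, b in edges if b in children and a != target}
    return parents | children | co_parents


# Hypothetical arcs (parent, child); SE2 and SE3 are made-up node names.
edges = [
    ("Satisfaction", "SE1"),
    ("Satisfaction", "SK5"),
    ("SE2", "SE1"),
    ("SE3", "SE2"),
]
blanket = markov_blanket(edges, "Satisfaction")
print(sorted(blanket))
```

In this toy graph SE3 is left out: once the blanket is known, it carries no further information about the target.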
Zoom-in and zoom-out tools are available for better graph visualization.
The force-directed layout algorithm organizes the nodes on the workspace.
When switching to validation mode, note that only 15 of the 215 nodes are selected as relevant by the network.
To highlight important relationships between variables, use the arc force tool.
An arc’s thickness is proportional to its relevance with regard to the target variable. The SE1 variable is the most important for global satisfaction.
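The text does not state how BayesiaLab computes the force of an arc. As a stand-in, the sketch below uses mutual information, a standard measure of how strongly two discrete variables are associated; both joint distributions are invented for illustration.

```python
from math import log2


def mutual_information(joint):
    """Mutual information (in bits) of a joint distribution {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(
        p * log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


# Hypothetical joint distributions for two binary variables.
dependent = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
strong = mutual_information(dependent)
weak = mutual_information(independent)
print(strong, weak)
```

A measure of this kind is larger for strongly related variables, which is the property a thickness-based display relies on.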
Unconnected nodes
become transparent.
BayesiaLab can
generate reports.
The SE1 node is in first position: it is the most important variable of this analysis.
The probabilistic profile of polls with a global satisfaction mark >7 is also reported.
After closing the report, note that all correlations between variables can be monitored by right-clicking on the right side of the screen.
The monitors display the probability distributions and allow changing the variables' values.
The target variable has a red background.
As the most important variable, SE1 appears in first position.
Monitors can be used to find the probabilistic profile of polls with a high satisfaction mark.
When clicking on this modality,
the probabilities are
propagated throughout the
network. The probabilistic
profile becomes readable.
The same
technique can be
applied to other
modalities and
variables. The
results are
automatically
propagated to the
remaining
variables.
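In the simplest case of two variables, propagating an observation amounts to conditioning a joint distribution on the selected modality. The sketch below shows this for the target and SE1; the joint probabilities are invented for illustration and are not taken from the poll.

```python
# Hypothetical joint distribution P(Satisfaction, SE1); values are invented.
JOINT = {
    (">7", ">7"): 0.30, (">7", "<=7"): 0.08, (">7", "<=4"): 0.02,
    ("<=7", ">7"): 0.10, ("<=7", "<=7"): 0.25, ("<=7", "<=4"): 0.05,
    ("<=4", ">7"): 0.02, ("<=4", "<=7"): 0.06, ("<=4", "<=4"): 0.12,
}


def posterior_se1(joint, satisfaction):
    """Return P(SE1 | Satisfaction = satisfaction) by conditioning the joint table."""
    selected = {se1: p for (sat, se1), p in joint.items() if sat == satisfaction}
    total = sum(selected.values())
    return {se1: p / total for se1, p in selected.items()}


profile = posterior_se1(JOINT, ">7")
for modality, probability in profile.items():
    print(modality, round(probability, 3))
```

In a full network the same principle applies, except that the update is carried through every arc rather than a single table.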
A poor SE1 mark is reflected on all monitors.
After target
variable
characterization,
the second part of
this tutorial
explores the
relationship
between all
variables of the
poll.
In modeling mode, delete all arcs.
The SopLEQ
algorithm is
appropriate for
discovering
associations
between
variables.
After some
computational
time, SopLEQ
learning finds a
complex network.
By using
positioning and
zoom tools, the
graph becomes
more reader-
friendly.
In this case, where
the graph is large
but with average
connectivity,
symmetric
positioning is
adequate.
To improve network readability, a comment dictionary can be linked to the graph. In this file, each node name is paired with a comment.
Once this is done, hints indicate that a node has comments.
Clicking this button shows or hides comments for the selected nodes.
A modality dictionary can also be designed interactively. This is done by double-clicking a node and opening the “modality name” sheet.
Give a name to each modality.
Once the modality labels are validated, the dictionary can be exported as a text file.
The file is defined only for the SK5 node.
#Wed Oct 11 14:28:27 CEST 2006
SK5.<\=7=Average
SK5.<\=4=Poor
SK5.>7=Very good
By a simple
modification, it
becomes valid for
all nodes of the
graph.
#Wed Oct 11 14:28:27 CEST 2006
<\=7=Average
<\=4=Poor
>7=Very good
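The modification shown in the two listings consists of removing the node prefix from each key. A minimal sketch of that edit, assuming the exported text is available as a string (the function name is illustrative):

```python
def generalize_dictionary(text, node="SK5"):
    """Strip the 'NODE.' prefix from each key so the entries apply to every node."""
    prefix = node + "."
    lines = []
    for line in text.splitlines():
        if line.startswith(prefix):
            line = line[len(prefix):]
        lines.append(line)
    return "\n".join(lines)


# The exported dictionary from the listing above (raw strings keep the backslashes).
EXPORTED = "\n".join([
    "#Wed Oct 11 14:28:27 CEST 2006",
    r"SK5.<\=7=Average",
    r"SK5.<\=4=Poor",
    "SK5.>7=Very good",
])
generalized = generalize_dictionary(EXPORTED)
print(generalized)
```

The comment line and the escaped `\=` sequences are left untouched, so the result matches the second listing.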
The dictionary can
now be associated
back to all nodes
of the graph
The monitors
from the
validation mode
become easier to
read.
The same process can be used to assign values to modalities and to generate a modality values dictionary.
This is done in modeling mode, by double-clicking a node and opening the “values” sheet.
A poor modality scores 0 points, an average one 10 points, and a very good one 20 points.
The same process (exporting the dictionary, modifying the text file, and importing it back) can be used to assign values to the modalities of all nodes.
The total and average values of the graph modalities are calculated.
The values are also computed according to the probability distribution.
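With the values 0, 10 and 20 attached to the modalities, the value of a node under a probability distribution is the probability-weighted sum of its modality values. The sketch below assumes that the total is the sum of these expected values over the nodes and that the average is the total divided by the number of nodes; the distributions are invented.

```python
# Values attached to the modalities, as in the text.
VALUES = {"Poor": 0, "Average": 10, "Very good": 20}


def expected_value(distribution, values=VALUES):
    """Probability-weighted value of one node."""
    return sum(p * values[modality] for modality, p in distribution.items())


# Hypothetical monitor contents for two nodes.
monitors = {
    "SE1": {"Poor": 0.25, "Average": 0.5, "Very good": 0.25},
    "SK5": {"Poor": 0.0, "Average": 0.5, "Very good": 0.5},
}
per_node = {node: expected_value(dist) for node, dist in monitors.items()}
total = sum(per_node.values())
average = total / len(per_node)
print(per_node, total, average)
```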
Every question is related to a theme. For instance, this poll has 36 themes. The class concept in BayesiaLab is useful for associating themes with nodes.
The themes
dictionary is
contained in a text
file.
Clicking the newly displayed icon at the bottom right of the window opens the class editor. Modifications can then be applied to classes instead of individual nodes.
Opens the class editor
Readability can be increased by applying automatic class colours. This is done by selecting all the classes with Ctrl + A and clicking the “color” button.
Note that nodes are largely grouped by colour. This provides useful information about inter-theme and intra-theme links. In this case, it also indicates a well-designed poll.
When closing the “Edit
classes” window, the
nodes become coloured
depending on their class.
The comments are
also coloured
depending on the
class.
A “colours
dictionary” can
also be saved as a
text file.
In this example, themes have been created based on expert knowledge. Nevertheless, BayesiaLab provides tools for automatic theme design by grouping semantically close variables.
In validation mode, variable clustering is based on the association rules discovered in the network.
Once the clustering is applied, new colours are assigned to the nodes.
BayesiaLab identified 48 groups of nodes.
Moving this cursor forces the number of groups.
The node colours are also changed.
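Forcing the number of groups can be pictured with a generic agglomerative clustering that keeps merging the two most strongly associated groups until the requested count is reached. This is a swapped-in textbook technique (average linkage), not BayesiaLab's clustering algorithm; the association strengths and the node names SE2 and SK1 are hypothetical.

```python
def cluster_variables(similarity, n_groups):
    """Greedy agglomerative clustering (average linkage) down to n_groups groups."""
    names = sorted({name for pair in similarity for name in pair})
    groups = [[name] for name in names]

    def link(a, b):
        # Average pairwise similarity; unknown pairs count as 0.
        scores = [
            similarity.get((x, y), similarity.get((y, x), 0.0))
            for x in a
            for y in b
        ]
        return sum(scores) / len(scores)

    while len(groups) > n_groups:
        i, j = max(
            ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda ij: link(groups[ij[0]], groups[ij[1]]),
        )
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


# Hypothetical association strengths between pairs of variables.
similarity = {("SE1", "SE2"): 0.9, ("SK1", "SK5"): 0.8, ("SE1", "SK5"): 0.1}
groups = cluster_variables(similarity, n_groups=2)
print(groups)
```

Lowering `n_groups` plays the role of the cursor: the weakest remaining separation is merged first.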
There are two other new icons in the clustering toolbar.
Exiting the clustering mode
This is for validating the current clustering
BayesiaLab can build latent variables from the clustering that was just performed.
When validating, a confirmation is requested.
In modeling mode, multiple clustering clusters the individuals within each group of variables. This wizard configures the clusterings to be performed (one per identified cluster).
Data is saved in this directory
Specifying the
number of
classes for each
new latent
variable
As with data clustering, an HTML report is created for each clustering. These reports are useful for renaming the new variables and their modalities.
Once the clusterings are completed, a new network is created with one node per latent variable (keeping the initial colour).
An internal database is created. It contains the most probable cluster value for each line of the initial file. This database can be saved in a separate file with the “data” menu.
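Selecting the most probable cluster value for each line is an argmax over that line's cluster probabilities. A minimal sketch, with invented probabilities and hypothetical cluster names C1 to C3:

```python
def most_probable_clusters(posteriors):
    """Return, for each row, the cluster with the highest probability."""
    return [max(row, key=row.get) for row in posteriors]


# Hypothetical cluster probabilities for two lines of the initial file.
posteriors = [
    {"C1": 0.7, "C2": 0.2, "C3": 0.1},
    {"C1": 0.1, "C2": 0.3, "C3": 0.6},
]
assigned = most_probable_clusters(posteriors)
print(assigned)
```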
Probabilistic
relationships
between the
nodes of this new
network can be
discovered with
the SopLEQ
algorithm.
After computation and automatic node positioning, the resulting network presents 51 nodes representing the latent variables of the initial dataset.