ARTICLE IN PRESS
JID: KNOSYS [m5G; March 4, 2018; 8:59]
Knowledge-Based Systems 000 (2018) 1–13
Contents lists available at ScienceDirect
Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys
A three learning states Bayesian knowledge tracing model

Kai Zhang a,b,∗, Yiyu Yao b

a National Engineering Research Center for E-Learning, Central China Normal University, China
b Department of Computer Science, University of Regina, Canada
ARTICLE INFO

Article history:
Received 7 September 2017
Revised 27 February 2018
Accepted 1 March 2018
Available online xxx

Keywords:
Bayesian knowledge tracing
Three-way decisions

ABSTRACT
This paper proposes a Bayesian knowledge tracing model with three learning states by extending the original two learning states. We divide a learning process into three sections by using an evaluation function for three-way decisions. Advantages of such a trisection over the traditional bisection are demonstrated by comparative experiments. We develop a three learning states model based on the trisection of the learning process. We apply the model to a series of comparative experiments with the original model. Qualitative and quantitative analyses of the experimental results indicate the superior performance of the proposed model over the original model in terms of prediction accuracies and related statistical measures.

© 2018 Elsevier B.V. All rights reserved.
1. Introduction

Knowledge is regarded as a collection of rules or skills in the knowledge tracing model. Knowledge tracing is a process of estimating the probability that a student has learned a rule or skill of knowledge in intelligent tutoring systems. More generally, knowledge is regarded as a collection of knowledge components (KCs). As pointed out by Koedinger et al. [8], each of the KCs is defined as "an acquired unit of cognitive function or structure that can be inferred from performance on a set of related tasks". For example, "Addition", "Subtraction", "Multiplication" and "Division" are commonly used KCs in math. The aim of knowledge tracing is to estimate the probability that a KC is learned.

In 1995, Corbett and Anderson [4] proposed the original Bayesian knowledge tracing (BKT) model. Many extensions [14,18,19,23] of the original BKT model have been introduced in order to fit different learning environments. Pardos and Heffernan [18] individualized the prior learning parameter of the original BKT model by adding one student node. They also introduced the difficulty of a KC by adding one node into the topology of the original BKT model, and two performance parameters are individualized by the added KC difficulty node [19]. Taking into account instructional interventions, Lin and Chi [14] proposed an Intervention-BKT model by adding an intervention node to the original BKT model. Wang et al. [23] proposed a Multi-Grained-BKT model by following the definition of multi fine-grained KCs, and a Historical-BKT model to incorporate previous question responses.

∗ Corresponding author at: Department of Computer Science, University of Regina, Canada.
E-mail address: zhangkai@mail.ccnu.edu.cn (K. Zhang).
https://doi.org/10.1016/j.knosys.2018.03.001
0950-7051/© 2018 Elsevier B.V. All rights reserved.
Please cite this article as: K. Zhang, Y. Yao, A three learning states Bayesian knowledge tracing model, Knowledge-Based Systems (2018), https://doi.org/10.1016/j.knosys.2018.03.001

However, the original BKT model intuitively introduced two learning states without a formal definition. Furthermore, the original BKT model and its extensions take the two learning states, unlearned-state and learned-state, for granted. The unlearned-state indicates that a student is unlikely to have mastered a KC, and the learned-state indicates that a student has certainly mastered a KC. One can easily see that the two learning states may not completely reflect the evolution of learning. There exists a transitional learning state between the unlearned-state and the learned-state, in which the student has probably mastered a KC.

Motivated by the observation of the existence of the transitional learning state, we introduce a method to divide a learning process into three learning states using ideas from three-way decisions [24–28]. Consequently, we improve the original BKT model into a new three learning states BKT (TLS-BKT) model.

By extracting the commonly used idea of thinking in threes across many disciplines, Yao [28] proposed a theory of three-way decisions. A trisecting-and-acting model of three-way decisions involves dividing a universal set into three parts and designing effective strategies to process the three parts. Yu et al. [29] proposed a three-way decisions based clustering method to discover the transitional regions of adjacent clusters. Li et al. [11] utilized sequential three-way decisions for cost-sensitive face recognition based on a sequence of image granulations. Savchenko [22] applied sequential three-way decisions to multi-class recognition by trisecting the distance from query objects to reference objects. By using the thresholds from game-theoretic rough sets and information-theoretic rough sets, Nauman et al. [17] divided application software behaviors into three parts. Li et al. [12] presented a cost-sensitive software defect prediction method by classifying the software modules into three different regions. Zhang and Min
Fig. 1. The graphical representation of the HMM.
Fig. 2. The instantiation of the HMM.
[30] proposed several three-way recommender systems. Lang et al.
[10] proposed a three-way decisions based conflict analysis method
to divide conflict probabilities into three parts. Chen et al. [3] pro-
posed a multi-granular mining method to refine the transitional
regions of the three-way decisions model. Based on an L-level sim-
ilarity relation and a loss interval set, Liu et al. [15] proposed a new
loss function to determine the three parts of a universal set. Qi
et al. [21] proposed several algorithms for building three-way con-
cept lattices by the connections between three-way decisions and
classical concept lattices. Zhou [32] divided an email set into three
parts by minimizing the misclassification cost. Liang et al. [13] pro-
posed a relative value based loss function to divide a universal set
into three parts. To satisfy three different optimization objectives,
Zhang and Yao [31] proposed three different Gini functions that
can divide a universal set into three parts. Results of these studies suggest that the philosophy of thinking in threes is equally applicable to modeling a learning process in intelligent tutoring systems. The introduction of the transitional state into the original BKT model offers new research opportunities and challenges.
The rest of the paper is organized as follows. Section 2 introduces the basics of the original BKT model and three-way decisions. Section 3 proposes the definitions of a learning process and a learning state. Based on them, an evaluation measure for learning state partitions is derived. Thus, the original two learning states and the proposed three learning states become comparable. Section 4 proposes the TLS-BKT model. Section 5 investigates the performance of the proposed model by comparing it with the original BKT model on the Assistments Math data sets and discusses the implications of the experimental results. Section 6 presents our conclusions.
2. Bayesian knowledge tracing model and three-way decisions
This section introduces basic concepts of hidden Markov models, the original Bayesian knowledge tracing model and three-way decisions.
2.1. Hidden Markov models
A hidden Markov model (HMM) [2] represents probability distributions over a discrete time sequence of observations Y = (y_1, y_2, ..., y_T). An HMM assumes that any observation in Y is determined by a Markov process sequence of states X = (x_1, x_2, ..., x_T), and that any state x_i in X is hidden instead of being observed. The joint distribution of X and Y, modeled by an HMM, is factorized as follows:

P(X, Y) = p(x_1) p(y_1 | x_1) ∏_{t=2}^{T} p(y_t | x_t) p(x_t | x_{t−1}),   (1)
where p(x_1) is the probability of the hidden state at time 1, p(y_t | x_t) is the emission probability of the observation y_t at time t given the hidden state x_t, and p(x_t | x_{t−1}) is the transition probability from the hidden state x_{t−1} to the hidden state x_t.
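The factorization in Eq. (1) can be evaluated directly for a concrete state path and observation sequence. A minimal Python sketch (all probability values below are made-up placeholders, not parameters from the paper):

```python
# Joint probability P(X, Y) of a tiny two-state HMM, following Eq. (1):
# P(X, Y) = p(x_1) p(y_1|x_1) * prod_{t=2..T} p(y_t|x_t) p(x_t|x_{t-1}).
pi = [0.6, 0.4]          # p(x_1): initial hidden state distribution
A = [[0.7, 0.3],         # A[i][j] = p(x_t = j | x_{t-1} = i)
     [0.0, 1.0]]
B = [[0.8, 0.2],         # B[i][k] = p(y_t = k | x_t = i)
     [0.1, 0.9]]

def joint_probability(states, observations):
    """P(X, Y) for one hidden-state path and one observation sequence."""
    p = pi[states[0]] * B[states[0]][observations[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][observations[t]]
    return p

print(joint_probability(states=[0, 0, 1], observations=[0, 0, 1]))  # approximately 0.0726
```

Summing this quantity over all possible hidden-state paths would give the marginal likelihood of the observations, which is what the forward algorithm computes efficiently.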
The graphical representation of the factorization in Eq. (1) is illustrated in Fig. 1. The gray circles, labeled by x_t, t ∈ {1, 2, ..., T}, represent the hidden states at time t. The rounded rectangles, labeled by y_t, t ∈ {1, 2, ..., T}, represent the observations at time t.
The arrows between the gray circles indicate the transition probabilities p(x_t | x_{t−1}). The arrows from the gray circles to the rounded rectangles indicate the emission probabilities p(y_t | x_t).

An HMM also assumes that any hidden state variable x_t, t ∈ {1, 2, ..., T} and any observation variable y_t, t ∈ {1, 2, ..., T} are both discrete. Specifically, suppose that x_t takes on N values, denoted as follows:

Q = {q_1, q_2, ..., q_N}.

All the transition probabilities of an HMM can be represented as an N × N transition matrix as follows:

A = [a_{ij}]_{N×N} = [p(x_t = q_j | x_{t−1} = q_i)]_{N×N},

where i = 1, 2, ..., N; j = 1, 2, ..., N. Likewise, suppose that y_t takes on M values, denoted as follows:

V = {v_1, v_2, ..., v_M}.

All the emission probabilities of an HMM can be represented as an N × M emission matrix as follows:

B = [b_i(j)]_{N×M} = [p(y_t = v_j | x_t = q_i)]_{N×M},

where i = 1, 2, ..., N; j = 1, 2, ..., M.

An instantiated HMM is shown in Fig. 2. The gray circles, labeled by q_1, ..., q_N, indicate the values that any hidden state x_t can take on at a particular time. The rounded rectangles, labeled by v_1, ..., v_M, indicate the values that the corresponding observation y_t can take on at that time. The solid arrows denote transition probabilities. The dashed arrows denote emission probabilities.

Assume that the initial hidden state value probability vector is Π = [π(i)]_N, where π(i) = p(x_1 = q_i), i ∈ {1, 2, ..., N}. The parameters of an HMM are represented by λ = (A, B, Π), a triplet that includes a transition matrix A, an emission matrix B and an initial hidden state value probability vector Π. Given a sequence of observations, the most likely λ can be estimated by a learning algorithm. Given λ and a sequence of observations, the most likely sequence of hidden states can be deduced by an inference algorithm.

2.2. The Bayesian knowledge tracing model

The original BKT model and its extensions have been implemented on the basis of HMMs. The observations of HMMs represent a sequence of a student's performances, any of which is labeled as correct or incorrect. The hidden states of HMMs represent a sequence of the learning states, any of which is intuitively regarded as unlearned-state or learned-state.

The aim of the original BKT model and its extensions is to estimate the learned probability of each KC at opportunity t, denoted by a learning parameter P(L_t), t ≥ 0. P(L_t) is the probability that a KC is learned after the t-th opportunity to apply it, and P(L_0) is the initial probability of a KC before any opportunity to apply it. Furthermore, the model introduces another learning parameter P(T) and two performance parameters P(G) and P(S). The parameter P(T) is the probability that the learning state will transit from
Fig. 3. The original BKT model.
unlearned-state to learned-state after an opportunity to apply a KC, P(G) is the probability that a KC will be guessed correctly when it is in unlearned-state, and P(S) is the probability that a KC will be slipped when it is in learned-state. The probability P(L_t) is updated after each opportunity to apply a KC as follows [1]:

P(L_{t−1} | Correct_t) = P(L_{t−1})(1 − P(S)) / [P(L_{t−1})(1 − P(S)) + (1 − P(L_{t−1})) P(G)],

P(L_{t−1} | Incorrect_t) = P(L_{t−1}) P(S) / [P(L_{t−1}) P(S) + (1 − P(L_{t−1}))(1 − P(G))],

P(L_t) = P(L_{t−1} | evidence_t) + (1 − P(L_{t−1} | evidence_t)) · P(T),

where Correct_t means that the performance of the t-th opportunity to apply the KC is correct, Incorrect_t means the contrary case, and evidence_t ∈ {Correct_t, Incorrect_t} denotes the performance of the t-th opportunity to apply the KC.
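The update rules above fit in a few lines of code. A sketch (ours, not the authors' implementation; the parameter values in the trace are arbitrary illustrations):

```python
def bkt_update(p_l, correct, p_t, p_g, p_s):
    """One BKT step: posterior P(L_{t-1} | evidence_t), then the P(T) transition."""
    if correct:  # P(L_{t-1} | Correct_t)
        posterior = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
    else:        # P(L_{t-1} | Incorrect_t)
        posterior = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
    return posterior + (1 - posterior) * p_t  # P(L_t)

# Trace P(L_t) over a performance sequence (1 = correct, 0 = incorrect).
p_l = 0.3  # P(L_0); all parameter values here are arbitrary illustrations
for o in (1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1):
    p_l = bkt_update(p_l, o == 1, p_t=0.1, p_g=0.2, p_s=0.1)
print(p_l)  # the estimated learned probability after twelve opportunities
```

Note that the update is a Bayesian posterior over the two learning states followed by the learning transition, which is exactly the filtering step of the underlying two-state HMM.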
The original BKT model is illustrated in Fig. 3. Fig. 3(a) is the graphical representation of the original BKT model. The gray circles, labeled by kc_t, t ∈ {1, 2, ..., T}, represent the learning states of a certain KC at time t. The rounded rectangles, labeled by o_t, t ∈ {1, 2, ..., T}, represent the performances at time t. The arrows between the gray circles denote the learning state transition probabilities p(kc_t | kc_{t−1}) from time t − 1 to time t. The arrows from the gray circles to the rounded rectangles denote the performance emission probabilities p(o_t | kc_t) given a learning state at time t. Fig. 3(b) is the instantiation of the original BKT model. The gray circles, labeled by u and l, represent the unlearned-state and learned-state. The rounded rectangles, labeled by c and i, represent the correct and incorrect performances.
2.3. Three-way decisions

An underlying idea of three-way decisions is thinking in threes [28]. In contrast to dichotomous thinking in terms of two options, three-way decisions introduce a third option. More specifically, we move from true/false, black/white, yes/no, etc. to true/unsure/false, black/grey/white, yes/maybe/no, etc. Three-way decisions were initially proposed to give a sound semantical explanation of the positive, boundary and negative regions in rough sets [20]. Subsequent studies showed that three-way decisions are commonly used across many disciplines and are an effective way of complex problem solving. Three-way decisions are based on findings from human cognition [7]. Due to limited information processing capacity, we can normally process only a small number of units of information, ranging from two to seven [5,16]. Thinking in threes comes naturally in our daily lives. The added third option provides the necessary flexibility and universality of three-way decisions. There is a fast growing interest in the theory and practice of three-way decisions [24–26].
A trisecting-and-acting model of three-way decisions consists of two basic components [28], as illustrated by Fig. 4. The rounded rectangle, labeled by 'A universal set', represents a universe (U) that includes all the elements in a problem domain. The rounded rectangles, labeled by 'Region I', 'Region II', and 'Region III', represent three subsets of U. The three regions are pairwise disjoint and the union of the three regions equals U. The three regions do not necessarily form a partition of U, since one or two of them can be the empty set. Strategy I, Strategy II and Strategy III are three strategies for acting on the corresponding regions. In this paper, we only employ the trisecting part of the trisecting-and-acting model.

In order to trisect a universal set U into three regions, one may use an evaluation function and a pair of thresholds α, β on values of the evaluation function. Suppose (L, ⪯) is a totally ordered set, that is, ⪯ is reflexive, antisymmetric, transitive and comparable (i.e., for any pair of elements a, b ∈ L, either a ⪯ b or b ⪯ a holds). We can define a relation ≺ as: a ≺ b holds if a ⪯ b ∧ ¬(b ⪯ a). An evaluation function v : U → L maps the elements in U to L. Given a pair of thresholds α, β ∈ L with β ≺ α, we divide the universal set U into three regions as follows:

Region I(v) = {x ∈ U | v(x) ⪯ β},
Region II(v) = {x ∈ U | β ≺ v(x) ≺ α},
Region III(v) = {x ∈ U | α ⪯ v(x)}.   (2)
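Eq. (2) is a simple filter over U once v, α and β are fixed. A sketch, specialized to a numeric evaluation function (the paper allows any totally ordered set L; the function and variable names here are ours):

```python
def trisect(universe, v, alpha, beta):
    """Divide a universal set into three regions by an evaluation function v
    and a pair of thresholds beta < alpha, following Eq. (2)."""
    region_1 = [x for x in universe if v(x) <= beta]
    region_2 = [x for x in universe if beta < v(x) < alpha]
    region_3 = [x for x in universe if v(x) >= alpha]
    return region_1, region_2, region_3

# Example: trisect the integers 1..10 with the identity evaluation function.
r1, r2, r3 = trisect(range(1, 11), v=lambda x: x, alpha=8, beta=3)
print(r1, r2, r3)  # -> [1, 2, 3] [4, 5, 6, 7] [8, 9, 10]
```

The three returned lists are pairwise disjoint and jointly cover the universe, matching the trisecting component of Fig. 4.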
A learning process about a KC is a sequence of a student's performances about the KC. The trisecting-and-acting model can be used in trisecting and interpreting a learning process. A learning process can be divided into three regions according to Eq. (2). In Region I(v) of a learning process, a student does not master a KC and the performances may be poor. In Region II(v) of the learning process, the student probably knows more and more about the KC and the performances become better and better. In Region III(v) of the learning process, the student knows about the KC and the performances may be good and stable. The details will be discussed in the next section.
3. Interpretation of three learning states

In this section, we first formulate a learning process. Second, we present two examples to illustrate the original bisection partition of a learning process [4] and the proposed trisection partition of a learning process. Third, we formulate a learning state as an interval of a learning process and define a distinction measure for evaluating a partition. Fourth, we propose the methods for bisection partition and trisection partition. Finally, we compare the bisection partition with the trisection partition by the distinction measure.
3.1. Examples for bisection and trisection of a learning process

In a Bayesian knowledge tracing model, we represent the history of a student's learning process by her/his performance on a series of exercises regarding a KC. If the student provides a right
Fig. 4. The trisecting-and-acting model of three-way decisions [28]. (A universal set U is trisected into Region I, Region II and Region III, acted on by Strategy I, Strategy II and Strategy III, respectively.)
Fig. 5. Two partition examples: (a) the original bisection partition of the learning process (1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1) into unlearned-state and learned-state; (b) the proposed trisection partition of the same learning process into unlearned-state, learning-state and learned-state.
answer to the exercise, it is denoted as a correct performance. Otherwise, it is denoted as an incorrect performance. Generally, incorrect performances are in the majority at the beginning of a series of exercises, and correct performances keep rising as the student continues learning. In this paper, we make the same assumption as the original bisection partition: there is no forgetting, and the performances in a learning process gradually improve and are eventually good. Formally, a sequence of performances (o) is defined as a learning process (O) as follows:

O = (o_1, o_2, ..., o_n), n ≥ 1, o_i ∈ {0, 1},

where o_i = 0 indicates an incorrect performance and o_i = 1 indicates a correct performance.

The original bisection partition formulates the evolution of a learning process with two learning states: unlearned-state and learned-state. For a learning process such as (1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1), in the beginning of the learning process, a student is in the unlearned-state, and the student's performances are poor. After that, the student is in the learned-state, and the student's performances are good. This example is illustrated in Fig. 5(a).

Unlike the original bisection partition, the proposed trisection partition formulates the evolution of a learning process with three learning states: unlearned-state, learning-state and learned-state. For the same learning process, a student is in the unlearned-state in the beginning of the learning process, and the student's performances are poor. After that, the student is in the learning-state, and the student's performances improve gradually from poor to good. In the last interval of the learning process, the student is in the learned-state, and the student's performances are good. This example is illustrated in Fig. 5(b).
3.2. Evaluation of a partition

Intuitively, if adjacent learning states of a partition are similar, the partition formulates the evolution of a learning process unclearly. On the contrary, if adjacent learning states of a partition are distinct, the partition formulates the evolution clearly. Therefore, in order to clearly formulate the evolution of a learning process, adjacent learning states of a partition should be as distinct as possible.

Based on the above idea, we propose a definition of a learning state and a measure for evaluating the distinction between different learning states. In this way, the distinction of adjacent learning states of a partition can be computed. Thus, two partitions become comparable.
As can be seen in Fig. 5, a learning state can be characterized by an interval of a learning process. Therefore, we formulate a learning state as an interval of a learning process. Suppose a learning process is O = (o_1, o_2, ..., o_n); a learning state is an interval of O as follows:

O_{i:j} = (o_i, o_{i+1}, ..., o_j), 1 ≤ i ≤ j ≤ n.

Definition 1 (The average performance). Let O_{i:j} be a learning state in a learning process O. The average performance of O_{i:j} is defined as follows:

ap(O_{i:j}) = (o_i + o_{i+1} + ... + o_j) / (j − i + 1), 1 ≤ i ≤ j ≤ n.

It is an attribute of a learning state, and its value reflects the average performance of the learning state. For example, ap(O_{i:j}) = 0.6 reflects that the average performance of O_{i:j} is 0.6.

Definition 2 (The distinction between learning states). Let O_{a:b}, O_{c:d} be two learning states. The distinction between O_{a:b} and O_{c:d} is defined as follows:

d[O_{a:b}, O_{c:d}] = [ap(O_{a:b}) − ap(O_{c:d})]^2.

It is a measure for evaluating the distinction between two learning states. It evaluates the distinction by the squared difference between the respective average performances of the two learning states. Its value reflects the degree of the distinction between two learning states. The larger its value is, the more distinct the two learning states are.

Definition 3 (The distinction of a partition). Let O = (o_1, o_2, ..., o_n) be a learning process, and let a partition divide it into m learning states O_{1:i_1}, O_{i_1+1:i_2}, O_{i_2+1:i_3}, ..., O_{i_{m−1}+1:n}. The distinction of the partition is defined as follows:

D = (d[O_{1:i_1}, O_{i_1+1:i_2}] + d[O_{i_1+1:i_2}, O_{i_2+1:i_3}] + ... + d[O_{i_{m−2}+1:i_{m−1}}, O_{i_{m−1}+1:n}]) / (m − 1).

It is an attribute of a partition. Its value reflects the degree of the average distinction of adjacent learning states of the partition. For example, for a learning process, suppose the distinction of one partition p1 is 0.3 and the distinction of another partition p2 is 0.7. This indicates that adjacent learning states of p2 have larger distinction and p2 is a better partition.
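Definitions 1–3 translate directly into code. A sketch (helper names are ours; the example reuses the learning process of Fig. 5):

```python
def ap(o, i, j):
    """Definition 1: average performance of the interval O_{i:j} (1-based, inclusive)."""
    return sum(o[i - 1:j]) / (j - i + 1)

def d(o, a, b, c, e):
    """Definition 2: squared difference between the average performances
    of O_{a:b} and O_{c:e}."""
    return (ap(o, a, b) - ap(o, c, e)) ** 2

def partition_distinction(o, cuts):
    """Definition 3: average distinction of adjacent learning states.
    cuts lists the last index of every state except the final one."""
    bounds = [0] + list(cuts) + [len(o)]
    states = [(bounds[k] + 1, bounds[k + 1]) for k in range(len(bounds) - 1)]
    total = sum(d(o, *states[k], *states[k + 1]) for k in range(len(states) - 1))
    return total / (len(states) - 1)

o = (1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1)  # the learning process of Fig. 5
print(round(partition_distinction(o, cuts=[5]), 3))     # bisection at gamma = 5 -> 0.264
print(round(partition_distinction(o, cuts=[2, 5]), 3))  # trisection at beta = 2, alpha = 5 -> 0.38
```

The cut positions above are arbitrary illustrations; Section 3.3 shows how the optimal cuts are chosen.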
3.3. Methods for the bisection and trisection

In the original bisection partition context, the two learning states are taken for granted, but the method to derive them has never been mentioned before. In this subsection, we propose two methods that actually divide a learning process into two learning states as well as three learning states.

Suppose that an evaluation function v maps entries of a learning process O = (o_1, o_2, ..., o_n) to their subscripts, that is,

v(o_i) = i, 1 ≤ i ≤ n.

In the context of the original bisection partition, the two learning states divided from a learning process can be represented by using a threshold 1 ≤ γ < n as follows:

O_{1:γ} = {x ∈ O | 1 ≤ v(x) ≤ γ} = {o_1, o_2, ..., o_γ},
O_{(γ+1):n} = {x ∈ O | γ + 1 ≤ v(x) ≤ n} = {o_{γ+1}, o_{γ+2}, ..., o_n},   (3)

where O_{1:γ} and O_{(γ+1):n} divide a learning process into a beginning state and an end state. The beginning state O_{1:γ} is the original unlearned-state, and the end state O_{(γ+1):n} is the original learned-state.

According to Definition 1, the corresponding average performances of Eq. (3) are represented as follows:

ap(O_{1:γ}) = (o_1 + o_2 + ... + o_γ) / γ,
ap(O_{(γ+1):n}) = (o_{γ+1} + o_{γ+2} + ... + o_n) / (n − γ).   (4)

As can be seen in Eq. (3), since 1 ≤ γ < n, there exist n − 1 ways to bisect a learning process. On the other side, it is worth noting that the optimal bisection partition has the maximum distinction among all possible bisection partitions. Therefore, a learning process can be divided into two learning states by optimizing the threshold γ as follows:

arg max_γ d[O_{1:γ}, O_{(γ+1):n}]   s.t. 1 ≤ γ < n.   (5)

With the optimal γ in Eq. (5), the two learning states O_{1:γ}, O_{(γ+1):n} in Eq. (3) are solved, and so are ap(O_{1:γ}), ap(O_{(γ+1):n}) in Eq. (4).
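Since Eq. (5) has only n − 1 candidate thresholds, an exhaustive search suffices. A sketch (helper names are ours):

```python
def ap(o, i, j):
    """Average performance of O_{i:j} (1-based, inclusive) -- Definition 1."""
    return sum(o[i - 1:j]) / (j - i + 1)

def optimal_bisection(o):
    """Eq. (5): the threshold gamma that maximizes d[O_{1:gamma}, O_{(gamma+1):n}]."""
    n = len(o)
    return max(range(1, n), key=lambda g: (ap(o, 1, g) - ap(o, g + 1, n)) ** 2)

o = (1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1)
print(optimal_bisection(o))  # -> 9: unlearned-state O_{1:9}, learned-state O_{10:12}
```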
In the context of the proposed trisection partition, we divide a learning process into three learning states by following the principle of three-way decisions. In the unlearned-state, a student's performances are poor and relatively stable, which suggests that a KC is unlikely mastered. In the learning-state, the student's performances improve gradually from poor to good, which suggests that the KC is probably mastered. In the learned-state, the student's performances are good and relatively stable, which suggests that the KC is certainly learned. Such a trisection of a learning process naturally reflects the beginning, middle, and end of learning.

Inspired by Eq. (2), given a pair of thresholds α and β with β < α, we can divide a learning process into three learning states as follows:

O_{1:β} = {x ∈ O | 1 ≤ v(x) ≤ β} = {o_1, o_2, ..., o_β},
O_{(β+1):α} = {x ∈ O | β + 1 ≤ v(x) ≤ α} = {o_{β+1}, o_{β+2}, ..., o_α},
O_{(α+1):n} = {x ∈ O | α + 1 ≤ v(x) ≤ n} = {o_{α+1}, o_{α+2}, ..., o_n},   (6)

where 1 ≤ β < α < n. Obviously, the beginning state O_{1:β} is the unlearned-state, the transitional state O_{(β+1):α} is the learning-state, and the end state O_{(α+1):n} is the learned-state.

According to Definition 1, the corresponding average performances of Eq. (6) are represented as follows:

ap(O_{1:β}) = (o_1 + o_2 + ... + o_β) / β,
ap(O_{(β+1):α}) = (o_{β+1} + o_{β+2} + ... + o_α) / (α − β),
ap(O_{(α+1):n}) = (o_{α+1} + o_{α+2} + ... + o_n) / (n − α).   (7)

Similar to the bisection partition, since 1 ≤ β < α < n, there exist ∑_{i=1}^{n−2} (n − i − 1) = (n − 1)(n − 2)/2 ways to trisect a learning process. The optimal trisection partition has the maximum distinction among all possible trisection partitions. Therefore, a learning process can be divided into three learning states by optimizing the pair of thresholds α, β as follows:

arg max_{α,β} (d[O_{1:β}, O_{(β+1):α}] + d[O_{(β+1):α}, O_{(α+1):n}]) / 2   s.t. 1 ≤ β < α < n.   (8)

With the optimal α, β in Eq. (8), the three learning states O_{1:β}, O_{(β+1):α}, O_{(α+1):n} in Eq. (6) are solved, and so are ap(O_{1:β}), ap(O_{(β+1):α}), ap(O_{(α+1):n}) in Eq. (7).
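Eq. (8) can likewise be solved by exhaustive search over all (β, α) pairs. A sketch (helper names are ours):

```python
from itertools import combinations

def ap(o, i, j):
    """Average performance of O_{i:j} (1-based, inclusive) -- Definition 1."""
    return sum(o[i - 1:j]) / (j - i + 1)

def optimal_trisection(o):
    """Eq. (8): the pair (beta, alpha) maximizing the mean distinction of the
    adjacent states O_{1:beta}, O_{(beta+1):alpha}, O_{(alpha+1):n}."""
    n = len(o)
    def score(cut):
        b, a = cut
        d1 = (ap(o, 1, b) - ap(o, b + 1, a)) ** 2
        d2 = (ap(o, b + 1, a) - ap(o, a + 1, n)) ** 2
        return (d1 + d2) / 2
    return max(combinations(range(1, n), 2), key=score)  # all 1 <= beta < alpha < n

o = (1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1)
print(optimal_trisection(o))  # -> (1, 5)
```

The quadratic number of candidates makes this brute force practical for learning processes of the lengths considered in Section 3.4.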
3.4. Comparisons between bisection and trisection

In this subsection, contrast tests compare the distinctions of the bisection partition with those of the trisection partition. The tests are conducted here by using artificial data; the real data tests will be conducted in Section 5. In order to simulate various learning processes, four length types of artificial learning processes, whose lengths are respectively 100, 500, 1000 and 2000, are generated randomly. In order to balance the accuracy and the time consumption of the experiment, each length type of the learning process has 1000 samples.

The distinctions of the trisection partition are plotted as a function of each artificial learning process, and the distinctions of the bisection partition are plotted as a function of the same learning process, as illustrated in Fig. 6. As can be seen from Fig. 6, the trisection partition offers a larger distinction than the bisection partition in all types of learning processes. This is because the average performance of the unlearned-state in the trisection partition is smaller than that in the bisection partition, and the average performance of the learned-state in the trisection partition is larger than that in the bisection partition.

Based on the results of this section, we conclude that the trisection partition characterizes the evolution of a learning process better than the bisection partition. Therefore, we improve the original two learning states BKT model into a new BKT model with three learning states. In the next section, the specific structure of the new model is proposed.
4. A three learning states BKT model

This section proposes a new BKT model with three learning states. The probabilities of the three learning states at time t are derived by the forward and backward probabilities of HMMs. A most likely sequence of learning states from time 1 to t is derived by the inference algorithm of HMMs. The parameter λ of the proposed model is derived by the learning algorithm of HMMs.

4.1. The topology of the proposed model

Same as the original BKT model, the instantiation of the TLS-BKT model has two observations: correct and incorrect. Different from the original BKT model, the instantiation of the TLS-BKT model has three hidden states: unlearned, learning and learned. Each hidden state has two transition probabilities, to itself and to the next hidden state, and each hidden state has two emission probabilities to the two observations.
The TLS-BKT model is illustrated in Fig. 7.
Fig. 6. The distinction comparisons between bisection and trisection. (Four panels plot the distinction of the trisection and bisection partitions against the number of learning processes, for process lengths 100, 500, 1000 and 2000.)
Fig. 7. The TLS-BKT model.
Fig. 7(a) is the graphical representation of the TLS-BKT model. It is the same as Fig. 3(a), and the same parameters in the two figures have the same meaning. Fig. 7(b) is the instantiation of the TLS-BKT model. The gray circles, labeled u, e and l, represent the unlearned-state, learning-state and learned-state. The rounded rectangles, labeled c and i, represent the correct and incorrect performances. p_uu and p_ue denote the transition probabilities from the unlearned-state to itself and to the learning-state. p_ee and p_el denote the transition probabilities from the learning-state to itself and to the learned-state. p_ll denotes the transition probability from the learned-state to itself. p_uc and p_ui denote the emission probabilities from the unlearned-state to the correct and incorrect performances, respectively. p_ec and p_ei denote the emission probabilities from the learning-state to the correct and incorrect performances, respectively. p_lc and p_li denote the emission probabilities from the learned-state to the correct and incorrect performances, respectively.
In the TLS-BKT model, the learning state of a KC takes on three values, denoted as follows:

Q = {q_1 = u, q_2 = e, q_3 = l},

where u, e and l represent the unlearned-state, learning-state and learned-state, respectively. The performance of a KC takes on two values, denoted as follows:

V = {v_1 = c, v_2 = i},

where c and i represent the correct performance and the incorrect performance, respectively. Let O = (o_1, o_2, ..., o_T) be a learning process from time 1 to T, and KC = (kc_1, kc_2, ..., kc_T) be the sequence of learning states at the corresponding times.
Please cite this article as: K. Zhang, Y. Yao, A three learning states Bayesian knowledge tracing model, Knowledge-Based Systems (2018), https://doi.org/10.1016/j.knosys.2018.03.001
The transition matrix A of the proposed model is defined by

A = [a_ij]_{3×3} =
[ p_uu  p_ue  0
  0     p_ee  p_el
  0     0     p_ll ],

where a_ij = p(kc_{t+1} = q_j | kc_t = q_i), i, j ∈ {1, 2, 3}. The emission matrix B of the proposed model is defined by

B = [b_j(k)]_{3×2} =
[ p_uc  p_ui
  p_ec  p_ei
  p_lc  p_li ],

where b_j(k) = p(o_t = v_k | kc_t = q_j), k ∈ {1, 2}, j ∈ {1, 2, 3}. The initial learning state probability vector is

Π = [π(i)]_3,

where π(i) = p(kc_1 = q_i), i ∈ {1, 2, 3}. The parameters of the TLS-BKT model, namely the transition matrix A, the emission matrix B and Π, are denoted as λ = (A, B, Π).
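The paper gives no reference implementation; as a minimal sketch (the function name `make_tlsbkt_params` and the default probability values are ours, not the authors'), the left-to-right structure of A and the 3×2 emission matrix B described above can be encoded as:

```python
import numpy as np

def make_tlsbkt_params(p_ue=0.3, p_el=0.2, p_uc=0.2, p_ec=0.6, p_lc=0.9,
                       pi=(1.0, 0.0, 0.0)):
    """Build lambda = (A, B, Pi) for the three-state TLS-BKT chain.

    States: 0 = unlearned (u), 1 = learning (e), 2 = learned (l).
    Observations: 0 = correct (c), 1 = incorrect (i).
    """
    # Left-to-right transition matrix: each state moves only to itself
    # or to the next state, and the learned-state is absorbing.
    A = np.array([[1 - p_ue, p_ue,     0.0],
                  [0.0,      1 - p_el, p_el],
                  [0.0,      0.0,      1.0]])
    # Emission matrix: each row is [P(correct | state), P(incorrect | state)].
    B = np.array([[p_uc, 1 - p_uc],
                  [p_ec, 1 - p_ec],
                  [p_lc, 1 - p_lc]])
    Pi = np.array(pi)
    # Sanity check: rows of A and B are probability distributions.
    assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
    return A, B, Pi
```

The zeros in A express the structural assumption of the topology: a student never moves backward, and the learned-state is never left once reached.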
4.2. The probabilities of the three learning states at time t

Let (o_1, o_2, ..., o_t) be the part of the learning process from time 1 to t. Given the parameter λ, the forward probability of kc_t = q_i is given as follows:

α_t(i) = P(o_1, o_2, ..., o_t, kc_t = q_i | λ). (9)

Based on Eq. (9), given the part of the learning process (o_1, o_2, ..., o_{t+1}), the probability of kc_{t+1} = q_i is given as follows:
Fig. 8. The distinction comparisons between trisection and bisection in 2004–2006. (Three panels, one per data set; x-axis: the number of learning processes; y-axis: the distinction; curves: Trisection, Bisection.)
Fig. 9. The accuracy comparisons in 2004. (Six panels: All-KC, D-KC, G-KC, M-KC, N-KC and P-KC; x-axis: the number of experiments; y-axis: the accuracy; curves: TLS-BKT, Original BKT.)
Table 1
The AUC comparisons in 2004–2006.

Year  Model         D-KC    G-KC    M-KC    N-KC    P-KC    All-KC
2004  Original BKT  0.6349  0.5876  0.6031  0.6477  0.6239  0.6123
      TLS-BKT       0.8036  0.7603  0.7505  0.8153  0.8367  0.8056
2005  Original BKT  0.6339  0.5732  0.5900  0.6649  0.6852  0.6338
      TLS-BKT       0.7504  0.7055  0.6790  0.7609  0.7471  0.7547
2006  Original BKT  0.6378  0.6109  0.6100  0.6663  0.6755  0.6491
      TLS-BKT       0.7898  0.7653  0.7510  0.8394  0.8222  0.7996
α_{t+1}(i) = [ Σ_{j=1}^{N} α_t(j) a_ji ] b_i(o_{t+1}),

where a_ji and b_i(o_{t+1}) are the corresponding entries in A and B, respectively. Similarly, let (o_{t+2}, o_{t+3}, ..., o_T) be the part of the learning process from time t+2 to T. Given the parameter λ, the backward probability of kc_{t+1} = q_i is as follows:

β_{t+1}(i) = P(o_{t+2}, o_{t+3}, ..., o_T | kc_{t+1} = q_i, λ). (10)
Based on Eq. (10), given the part of the learning process (o_{t+1}, o_{t+2}, ..., o_T), the backward probability of kc_t = q_i is given by

β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j),

where a_ij and b_j(o_{t+1}) are the corresponding entries in A and B, respectively.

Given the learning process O, the probability of the learning state at time t is given as follows:

P(kc_t = q_i | O, λ) = P(kc_t = q_i, O | λ) / P(O | λ). (11)

The numerator can be derived from Eqs. (9) and (10) as follows:

P(kc_t = q_i, O | λ) = α_t(i) β_t(i).

The probability of the learning state at time t, given by Eq. (11), can therefore be represented as:

P(kc_t = q_i | O, λ) = α_t(i) β_t(i) / Σ_{j=1}^{N} α_t(j) β_t(j). (12)
Fig. 10. The accuracy comparisons in 2005. (Six panels: All-KC, D-KC, G-KC, M-KC, N-KC and P-KC; x-axis: the number of experiments; y-axis: the accuracy; curves: TLS-BKT, Original BKT.)
Fig. 11. The accuracy comparisons in 2006. (Six panels: All-KC, D-KC, G-KC, M-KC, N-KC and P-KC; x-axis: the number of experiments; y-axis: the accuracy; curves: TLS-BKT, Original BKT.)
The probabilities of kc_t = q_i, i ∈ {1, 2, 3}, can thus be solved given a learning process O and λ.
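The forward-backward computation of Eqs. (9)-(12) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `forward_backward` is ours, and scaling (needed for very long learning processes to avoid underflow) is omitted for brevity.

```python
import numpy as np

def forward_backward(O, A, B, Pi):
    """State posteriors P(kc_t = q_i | O, lambda) via Eqs. (9)-(12).

    O: sequence of observation indices (0 = correct, 1 = incorrect).
    Returns a (T, N) array whose rows are the posteriors of Eq. (12).
    """
    T, N = len(O), A.shape[0]
    alpha = np.zeros((T, N))          # forward probabilities, Eq. (9)
    beta = np.zeros((T, N))           # backward probabilities, Eq. (10)
    alpha[0] = Pi * B[:, O[0]]
    for t in range(1, T):
        # alpha_{t+1}(i) = [sum_j alpha_t(j) a_ji] b_i(o_{t+1})
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    gamma = alpha * beta              # P(kc_t = q_i, O | lambda), Eq. (11)
    return gamma / gamma.sum(axis=1, keepdims=True)  # Eq. (12)
```

With a left-to-right three-state chain and a run of correct answers at the end of O, the posterior mass shifts toward the learned-state as t grows, which is the behavior Eq. (12) is used to track.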
4.3. The prediction of the most likely sequence of learning states
Besides the probabilities of the three learning states at time t
in Eq. (12) , the most likely sequence of learning states from time 1
to T can be estimated by the inference algorithm of HMMs given a
learning process O from time 1 to T .
Given the learning process O = (o_1, o_2, ..., o_T) and λ, the inference procedure searches for the most likely sequence of learning states KC*, which can be computed as follows [6]:

KC* = argmax_KC P(KC | O, λ) = argmax_KC P(O | KC, λ) P(KC | λ). (13)

According to Eq. (13), the estimate of the most likely sequence of learning states from time 1 to T can be computed as follows:

KC* = argmax_KC P(KC | O, λ)
    = argmax_KC P(O | KC, λ) P(KC | λ)
    = argmax_{kc_1, kc_2, ..., kc_T} Π_{t=1}^{T} p(o_t | kc_t, λ) p(kc_t | λ), (14)
Fig. 12. The RMSE comparisons in 2004. (Six panels: All-KC, D-KC, G-KC, M-KC, N-KC and P-KC; x-axis: the number of students; y-axis: RMSE; curves: TLS-BKT, Original BKT.)
Fig. 13. The RMSE comparisons in 2005. (Six panels: All-KC, D-KC, G-KC, M-KC, N-KC and P-KC; x-axis: the number of students; y-axis: RMSE; curves: TLS-BKT, Original BKT.)
where kc_t ∈ Q and o_t ∈ V. p(kc_t | λ) = a_{kc_{t-1} kc_t} is the conditional distribution of the learning state kc_t given the current λ, and p(o_t | kc_t, λ) = b_{kc_t}(o_t) is the conditional distribution of the performance o_t given the hidden state kc_t and the current λ. The most likely sequence of learning states can be solved by Eq. (14).
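The standard inference algorithm for Eq. (14) is the Viterbi algorithm; the following sketch (our illustration, not the authors' implementation) works in log space so long learning processes do not underflow:

```python
import numpy as np

def viterbi(O, A, B, Pi):
    """Most likely learning-state sequence KC* of Eq. (14) (Viterbi)."""
    T, N = len(O), A.shape[0]
    # Log space: products in Eq. (14) become sums; log(0) = -inf is fine here.
    with np.errstate(divide="ignore"):
        logA, logB, logPi = np.log(A), np.log(B), np.log(Pi)
    delta = np.zeros((T, N))      # best log-probability of a path ending in each state
    psi = np.zeros((T, N), int)   # backpointers to the best predecessor state
    delta[0] = logPi + logB[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # scores[j, i]: best path via j -> i
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, O[t]]
    path = np.zeros(T, int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):              # backtrack along the pointers
        path[t] = psi[t + 1, path[t + 1]]
    return path                                 # 0 = u, 1 = e, 2 = l
```

Because the TLS-BKT transition matrix is left-to-right, any decoded sequence is nondecreasing: a student is inferred to pass through unlearned, learning and learned in order, never backward.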
4.4. Solution of the parameters

Prior to estimating the probabilities of the three learning states at time t or the most likely sequence of learning states from time 1 to T, the parameter λ of the proposed model needs to be solved by the learning algorithm of HMMs.
Given a learning process O = (o_1, o_2, ..., o_T), the learning procedure of HMMs looks for the optimal λ satisfying the following equation [6]:

λ* = argmax_λ E_{KC|O,λ̄} [log L(λ; O, KC)]
   = argmax_λ Σ_KC P(KC | O, λ̄) log L(λ; O, KC), (15)

where λ̄ is the current estimate of λ, P(KC | O, λ̄) is the conditional distribution of the sequence of learning states KC given the learning process (o_1, o_2, ..., o_T) under the current estimate λ̄, and L(λ; O, KC) is the likelihood function, which is equal to P(O, KC | λ). Since

P(KC | O, λ̄) = P(KC, O | λ̄) / P(O | λ̄),
Fig. 14. The RMSE comparisons in 2006. (Six panels: All-KC, D-KC, G-KC, M-KC, N-KC and P-KC; x-axis: the number of students; y-axis: RMSE; curves: TLS-BKT, Original BKT.)
Fig. 15. The SD comparisons in 2004. (Six panels: All-KC, D-KC, G-KC, M-KC, N-KC and P-KC; x-axis: the number of experiments; y-axis: the standard deviation; curves: TLS-BKT, Original BKT.)
where P(O | λ̄) is a constant, Eq. (15) is equivalent to:

λ* = argmax_λ Σ_KC P(O, KC | λ̄) log P(O, KC | λ)
   = argmax_λ Σ_{kc_t ∈ Q, o_t ∈ V} P(o_1, ..., o_T, kc_1, ..., kc_T | λ̄) log P(o_1, ..., o_T, kc_1, ..., kc_T | λ). (16)

The parameter λ can be solved by Eq. (16).
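The learning algorithm of HMMs referenced here is Baum-Welch (EM). A single re-estimation step could be sketched as below; this is our hedged illustration of the standard update, not the paper's code, and it processes one learning process (the paper trains on many):

```python
import numpy as np

def baum_welch_step(O, A, B, Pi):
    """One EM (Baum-Welch) update of lambda = (A, B, Pi) toward Eq. (16).

    Zeros in A (the left-to-right TLS-BKT structure) stay zero, because
    the expected transition counts through them are zero.
    """
    T, N = len(O), A.shape[0]
    # E-step: forward and backward probabilities (unscaled, short O assumed).
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = Pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)   # P(kc_t = q_i | O, lambda)
    xi = np.zeros((T - 1, N, N))                # P(kc_t = q_i, kc_{t+1} = q_j | O)
    for t in range(T - 1):
        x = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
        xi[t] = x / x.sum()
    # M-step: expected counts re-estimate the parameters.
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return A_new, B_new, gamma[0]
```

Iterating this step until the likelihood stops improving yields the λ used by the inference and prediction procedures of Sections 4.2 and 4.3.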
4.5. Discussions

First, the original BKT model and its extensions intuitively regard the learning states as the unlearned-state and the learned-state, without a clear formulation. The proposed learning state is formulated by an interval of a learning process. Based on this formulation, the evolution of a learning process can be characterized accurately.
Second, based on the proposed learning state formulation, the trisection partition is proposed by maximizing the average distinction of the adjacent learning states. It therefore achieves a larger distinction than the bisection partition, indicating that the trisection partition refines the learning states more exactly than the bisection partition.

Third, compared to the original BKT model, the proposed TLS-BKT model adds the new learning-state. Correspondingly, this updates the parameters as well as the inference and prediction algorithms of the model. The parameters include the additional transition and emission parameters of the new learning-state. The inference procedure of the proposed model uses the incremental information to derive the probabilities of all three
Fig. 16. The SD comparisons in 2005. (Six panels: All-KC, D-KC, G-KC, M-KC, N-KC and P-KC; x-axis: the number of experiments; y-axis: the standard deviation; curves: TLS-BKT, Original BKT.)
Fig. 17. The SD comparisons in 2006. (Six panels: All-KC, D-KC, G-KC, M-KC, N-KC and P-KC; x-axis: the number of experiments; y-axis: the standard deviation; curves: TLS-BKT, Original BKT.)
learning states at time t, and the prediction procedure of the proposed model uses the incremental information to predict the most likely sequence of learning states.
5. Experiments

In this section, a series of experiments was conducted to evaluate the effectiveness and efficiency of the TLS-BKT model by comparing it with the original BKT model. The distinctions of the trisection and bisection partitions were compared first. Then comparisons of prediction accuracy and area under the curve (AUC) followed, and root mean square error (RMSE) and standard deviation (SD) were used to examine the proposed model from statistical perspectives. Except for the comparisons of the distinction, all experiments were carried out with 10-fold cross-validation to validate the proposed model. We implemented the original BKT model and the TLS-BKT model on a Mac with macOS Sierra 10.12.6, an Intel Core i5 CPU @ 2.70 GHz and 8.0 GB of memory; the software platform is MATLAB R2016a.
5.1. Data sets

The data sets used are the WPI-Assistments data sets [9] from DataShop, including the Assistments Math 2004–2005 (912 students), the Assistments Math 2005–2006 (3136 students) and the Assistments Math 2006–2007 (5046 students), which are denoted as 2004, 2005 and 2006, respectively. The experiments were conducted on five KCs in the data sets: 'D-Data-Analysis-Statistics-Probability', 'G-Geometry', 'M-Measurement', 'N-Number-Sense-Operations' and 'P-Patterns-Relations-Algebra',
noted as ‘D-KC’, ‘G-KC’, ‘M-KC’, ‘N-KC’ and ‘P-KC’, respectively. In
our experiments, a learning process which contains less than 100
performances is excluded in order to estimate probability accu-
rately.
5.2. Comparisons of the distinction

For each learning process, the distinctions of the trisection and bisection partitions are shown in Fig. 8. The trisection partition obtains a larger distinction than the bisection partition on each real learning process. Besides being superior on the artificial data sets in Fig. 6, the trisection partition thus also offers a better distinction on the real data sets.
5.3. Comparisons of student performance prediction

To compare the prediction accuracies of the TLS-BKT model and the original BKT model, the accuracies on each of the five KCs were computed separately. Furthermore, the accuracies on the combination of all five KCs were evaluated. The accuracies were averaged across 1 to 100 experiments in order to display the stable prediction performance of the two models.

The accuracies of the TLS-BKT and original BKT models are depicted in Figs. 9–11. The proposed model is superior to the original BKT model in predicting the single KCs as well as the combined KCs. This is because the TLS-BKT model has one more learning state than the original BKT model: when a student may have learned a KC, the student's learning state transitions into the learning-state instead of the unlearned-state or learned-state.

The prediction accuracies can also be illustrated by AUC, as seen in Table 1, where the maximum AUC of each KC is annotated in bold. The proposed model achieves the maximum on the single KCs and the combined KCs, which is consistent with Figs. 9–11. Interestingly, the AUC values of 2005 are overall lower than those of 2004 and 2006, although the trend that the AUC values of the proposed model are larger than those of the original BKT model is maintained. This observation is related to the amount of data per KC: 2005 has much less data than 2004 and 2006, so the two models cannot be trained sufficiently, which leads to higher deviations.
5.4. Comparisons of statistical measures

Besides the comparisons of the predictions of the two models, comparisons of statistical measures offer an evaluation from an alternative perspective. In this section, RMSE and SD are employed to assess the proposed model and the original BKT model.

The RMSE on each of the five KCs was computed separately. Furthermore, the RMSE on the combination of all five KCs was evaluated. The RMSE was averaged across 1 to 100 experiments in order to display the stable performance of the two models, and was computed for each student in order to show the detailed performance of the two models.

The RMSE comparisons between the two models are shown in Figs. 12–14. The results show that the RMSE of the proposed model's predictions is lower than that of the original BKT model on the single KCs as well as the combined KCs. This suggests that the TLS-BKT model offers closer predictions than the original BKT model, because the transitional state of the TLS-BKT model improves the prediction and brings the predictions of the proposed model closer to the real situation.

The SD on each of the five KCs was computed separately. Furthermore, the SD on the combination of all five KCs was evaluated. The SD was averaged across 1 to 100 experiments in order to display the stable performance of the two models.
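The paper does not spell out its exact computation of the two statistics; as a hedged sketch of the conventional definitions (function names `rmse` and `sd` are ours), RMSE over a student's predicted probabilities of a correct performance and the SD across experiments could be computed as:

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean square error between predicted P(correct) and 0/1 outcomes."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def sd(values):
    """Sample standard deviation of a per-experiment measure (e.g. accuracy)."""
    return float(np.std(values, ddof=1))
```

Lower RMSE means the predicted probabilities sit closer to the observed correct/incorrect outcomes, and lower SD across the 100 repeated experiments means more stable predictions.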
Figs. 15–17 depict the standard deviation comparisons between the proposed model and the original BKT model. The results show that the SD of the proposed model's predictions is lower than that of the original BKT model on the single KCs as well as the combined KCs. This suggests that the TLS-BKT model offers more stable predictions than the original BKT model. The reason is also related to the learning-state of the TLS-BKT model: the prediction results of the proposed model are coordinated by the learning-state.
6. Conclusions

In this paper, we divided the learning process into three learning states by an evaluation function based on ideas from three-way decisions. Based on the three learning states, we proposed the new TLS-BKT model. The results of the comparative experiments demonstrate clearly that the proposed model improves prediction accuracies and shows superior robustness on the statistical measures compared with the original BKT model. This improvement is ascribed to the proposed three learning states, which offer a more precise formulation than the original two learning states.
Acknowledgment

This research was partially supported by the program of the China Scholarship Council (CSC) under Grant No. 201606775044, and by a Discovery Grant from NSERC, Canada. We used the Assistments Math 2004–2005 (912 students), 2005–2006 (3136 students) and 2006–2007 (5046 students) data sets accessed via DataShop [9]. We would like to thank the anonymous reviewers for their constructive advice.
References

[1] R.S. Baker, A.T. Corbett, V. Aleven, More accurate student modeling through contextual estimation of slip and guess probabilities in Bayesian knowledge tracing, in: Proceedings of ITS'08, 5091, Springer, 2008, pp. 406–415.
[2] L.E. Baum, T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, Ann. Math. Stat. 37 (6) (1966) 1554–1563.
[3] J. Chen, Y.P. Zhang, S. Zhao, Multi-granular mining for boundary regions in three-way decision theory, Knowl.-Based Syst. 91 (2016) 287–292.
[4] A.T. Corbett, J.R. Anderson, Knowledge tracing: modeling the acquisition of procedural knowledge, User Model. User-Adapt. Interact. 4 (4) (1995) 253–278.
[5] N. Cowan, The magical number 4 in short-term memory: a reconsideration of mental storage capacity, Behav. Brain Sci. 24 (1) (2001) 87–114.
[6] Z. Ghahramani, An introduction to hidden Markov models and Bayesian networks, Int. J. Pattern Recognit. Artif. Intell. 15 (01) (2001) 9–42.
[7] R.W. Keidel, Strategy made simple: thinking in threes, Bus. Horiz. 56 (1) (2013) 105–111.
[8] K. Koedinger, A.T. Corbett, C. Perfetti, The knowledge-learning-instruction framework: bridging the science-practice chasm to enhance robust student learning, Cogn. Sci. 36 (5) (2012) 757–798.
[9] K.R. Koedinger, R.S. Baker, K. Cunningham, A. Skogsholm, B. Leber, J. Stamper, A data repository for the EDM community: The PSLC DataShop, CRC Press, Boca Raton, FL, 2010.
[10] G.M. Lang, D.Q. Miao, M.J. Cai, Three-way decision approaches to conflict analysis using decision-theoretic rough set theory, Inf. Sci. 406 (2017) 185–207.
[11] H.X. Li, L.B. Zhang, B. Huang, X.Z. Zhou, Sequential three-way decision and granulation for cost-sensitive face recognition, Knowl.-Based Syst. 91 (2016) 241–251.
[12] W.W. Li, Z.Q. Huang, Q. Li, Three-way decisions based software defect prediction, Knowl.-Based Syst. 91 (2016) 263–274.
[13] D.C. Liang, W. Pedrycz, D. Liu, Determining three-way decisions with decision-theoretic rough sets using a relative value approach, IEEE Trans. Syst., Man, Cybern.: Syst. 47 (8) (2016) 1785–1799.
[14] C. Lin, M. Chi, Intervention-BKT: incorporating instructional interventions into Bayesian knowledge tracing, in: Proceedings of ITS'16, 9684, Springer, 2016, pp. 208–218.
[15] D. Liu, D.C. Liang, C.C. Wang, A novel three-way decision model based on incomplete information system, Knowl.-Based Syst. 91 (2016) 32–45.
[16] G.A. Miller, The magical number seven, plus or minus two: some limits on our capacity for processing information, Psychol. Rev. 63 (2) (1956) 81–97.
[17] M. Nauman, N. Azam, J.T. Yao, A three-way decision making approach to malware analysis using probabilistic rough sets, Inf. Sci. 374 (2016) 193–209.
[18] Z.A. Pardos, N.T. Heffernan, Modeling individualization in a Bayesian networks implementation of knowledge tracing, in: Proceedings of UMAP'10, 6075, Springer, 2010, pp. 255–266.
[19] Z.A. Pardos, N.T. Heffernan, KT-IDEM: introducing item difficulty to the knowledge tracing model, in: Proceedings of UMAP'11, 6787, Springer, 2011, pp. 243–254.
[20] Z. Pawlak, Rough sets, Int. J. Parallel Program. 11 (5) (1982) 341–356.
[21] J.J. Qi, T. Qian, L. Wei, The connections between three-way and classical concept lattices, Knowl.-Based Syst. 91 (1) (2016) 143–151.
[22] A. Savchenko, Fast multi-class recognition of piecewise regular objects based on sequential three-way decisions and granular computing, Knowl.-Based Syst. 91 (2016) 252–262.
[23] Z. Wang, J.L. Zhu, X. Li, Z.T. Hu, M. Zhang, Structured knowledge tracing models for student assessment on Coursera, in: Proceedings of L@S'16, ACM, 2016, pp. 209–212.
[24] Y.Y. Yao, Three-way decision: an interpretation of rules in rough set theory, in: Proceedings of RSKT'09, Springer, 2009, pp. 642–649.
[25] Y.Y. Yao, Three-way decisions with probabilistic rough sets, Inf. Sci. 180 (3) (2010) 341–353.
[26] Y.Y. Yao, The superiority of three-way decisions in probabilistic rough set models, Inf. Sci. 181 (6) (2011) 1080–1096.
[27] Y.Y. Yao, An outline of a theory of three-way decisions, in: Proceedings of RSCTC'12, 7413, Springer, 2012, pp. 1–17.
[28] Y.Y. Yao, Three-way decisions and cognitive computing, Cognit. Comput. 8 (4) (2016) 543–554.
[29] H. Yu, C. Zhang, G.Y. Wang, A tree-based incremental overlapping clustering method using the three-way decision theory, Knowl.-Based Syst. 91 (2016) 189–203.
[30] H.R. Zhang, F. Min, Three-way recommender systems based on random forests, Knowl.-Based Syst. 91 (2016) 275–286.
[31] Y. Zhang, J.T. Yao, Gini objective functions for three-way classifications, Int. J. Approx. Reason. 81 (2017) 103–114.
[32] B. Zhou, Y.Y. Yao, J.G. Luo, Cost-sensitive three-way email spam filtering, J. Intell. Inf. Syst. 42 (1) (2014) 19–45.