

A Dynamic Mixture Model to Detect Student Motivation and Proficiency

Jeff Johns and Beverly Woolf
Computer Science Department
University of Massachusetts Amherst
Amherst, Massachusetts 01003

{johns, bev}@cs.umass.edu

Abstract

Unmotivated students do not reap the full rewards of using a computer-based intelligent tutoring system. Detection of improper behavior is thus an important component of an online student model. To meet this challenge, we present a dynamic mixture model based on Item Response Theory. This model, which simultaneously estimates a student's proficiency and changing motivation level, was tested with data of high school students using a geometry tutoring system. By accounting for student motivation, the dynamic mixture model can more accurately estimate proficiency and the probability of a correct response. The model's generality is an added benefit, making it applicable to many intelligent tutoring systems as well as other domains.

Introduction

An important aspect of any computer-based intelligent tutoring system (ITS) is the ability to determine a student's skill set and to tailor its pedagogical actions to address the student's deficiencies. Tutoring systems have demonstrated this ability in the classroom (VanLehn et al. 2005). However, even the most effective tutoring system will fail if the student is not receptive to the material being presented. Lack of motivation has been shown empirically to correlate with a decrease in learning rate (Baker, Corbett, & Koedinger 2004). While attempts to motivate a student by using multimedia and/or by couching the material as a game have proved partially successful, there is still significant room for improvement. In fact, these motivation tools can themselves cause undesirable behavior, where students uncover ways to game the system. This issue of motivation and performance is particularly relevant given the weight assigned to high stakes achievement tests, such as the Scholastic Aptitude Test (SAT), as well as other exams that can be required for graduation. Students use tutoring systems to practice for high-stakes tests but typically are not graded based on their performance, which leads to low effort. The concern of low motivation affecting performance is also being addressed by the educational assessment community (Wise & DeMars 2005).


Automated diagnosis is the first step in addressing a student's level of motivation. Several models have been proposed to infer a student's engagement using variables such as observed system use (time to respond, number of hints requested), general computer use (opening an Internet browser, mouse activity), and visual and auditory clues (talking to the person at the nearby computer). The purpose of this paper is not to point out new ways in which students display unmotivated behavior, but rather to provide a general, statistical framework for estimating both motivation and proficiency in a unified model, provided that unmotivated behavior can be identified. We propose combining an Item Response Theory (IRT) model to gauge student proficiency and a hidden Markov model (HMM) to infer a student's motivation. The result is a very general, dynamic mixture model whose parameters can be estimated from student log data and which can be run online by an ITS. We validate the model using data from a tutoring system, but indicate the model can be applied to other types of data sets.

Background

The next two sections describe previous work in estimating motivation and provide a brief introduction to Item Response Theory.

Relevant Literature

Several models have been proposed to infer student motivation from behavioral measures. de Vicente and Pain (2000; 2002) designed a study where participants were asked to watch prerecorded screen interaction of students using their tutoring system. Participants in the study created over eighty inference rules linking motivation to variables that could be observed on the screen. The fact that so many rules could be derived purely from screen shots suggests that simple, inexpensive methods for estimating motivation should be useful for a tutoring system.

Conati (2002) presented a dynamic decision network to measure a student's emotional state based on variables such as heart rate, skin conductance, and eyebrow position (in contrast to the more easily attained data used by de Vicente and Pain). The structure and parameters of the model, in the form of prior and conditional probabilities, were set by hand and not estimated from data. The probabilistic model applies decision theory to choose the optimal tutor action to balance motivation and the student's learning.

A latent response model (Baker, Corbett, & Koedinger 2004) was learned to classify student actions as either gaming or not gaming the system. Furthermore, instances of gaming the system were divided into two cases: gaming with no impact on pretest-posttest gain and gaming with a negative impact on pretest-posttest gain. The features used in the latent response model were a student's actions in the tutor, such as response time, and probabilistic information regarding a student's latent skills.

Arroyo and Woolf (2005) developed a Bayesian network using a student's observed problem-solving behavior and unobserved attitude toward the tutor. The unobserved variables were estimated from a survey that students filled out after using the tutor. Correlation between pairs of variables was used to determine the network's connectivity.

Beck (2005) proposed a function relating response time to the probability of a correct response to model student disengagement in a reading tutor. He adapted the item characteristic curve from IRT to include a student's speed, proficiency, response time, and other problem-specific parameters. The learned model showed that disengagement negatively correlated with performance gain.

These models embody different assumptions about the variables required to estimate student motivation (e.g. static versus dynamic models, complex versus simple features, user specified versus learned model parameters, generic versus domain specific models). The model proposed in this paper is different because it encompasses the following four principles, which do not all exist in any one of the previous models. First, the model should estimate both student motivation and proficiency. These variables need to be jointly estimated because poor performance could be due to either low motivation or insufficient ability. Only one of the aforementioned models performs this function (Beck 2005). Second, the proposed model should run in real time. There exists a tradeoff between model complexity and expressiveness to ensure tutoring systems can take action at the appropriate time. Ideally, the model parameters should also be estimable from a reasonable amount of data. Third, the model should be flexible enough to easily include other forms of unmotivated behavior as researchers identify them. Fourth, motivation needs to be treated as a dynamic variable in the model. Empirical evidence suggests that a student's motivation level tends to go in spurts. For example, Table 1 shows actual performance data (initial response time, total time to click the correct answer, and number of incorrect guesses) of a single student doing multiple-choice geometry problems. The problems are not arranged according to difficulty; therefore, the obvious shift in the student's behavior after the seventh problem could be attributed to a change in motivation.

Problem   Initial Time (s)   Total Time (s)   Number Incorrect
1         40                 40               0
2         44                 44               0
3         13                 13               0
4         19                 19               0
5         7                  9                4
6         22                 22               0
7         35                 35               0
8         2                  3                2
9         2                  2                0
10        3                  4                1
11        2                  4                4
12        2                  3                3

Table 1: Data from a single student using the geometry tutor. Notice the change in behavior after the first seven problems

Item Response Theory

IRT models were developed by psychometricians to examine test behavior at the problem level (van der Linden & Hambleton 1997). This granularity is in contrast to previous work that examined behavior at the aggregate level of test scores. While IRT models encompass a wide variety of test formats, we focus in this paper on IRT models for dichotomous user responses (correct or incorrect).

Item Response Theory posits a static, generative model that relates a student's ability, θ, to his/her performance on a given problem, Ui, via a nonlinear characteristic curve, f(Ui|θ). IRT models are data-centric models (Mayo & Mitrovic 2001) because they do not presuppose a decomposition of problems into separate, required skills. Each problem in an IRT model is assumed independent of the other problems. The random variable θ is drawn from a normal distribution with a specified mean and variance. The random variables associated with each problem, Ui, come from a Bernoulli distribution with the probability of a correct response given by the following parameterized function (Equation 1, Figure 1).

P(Ui = correct | θ) = ci + (1 − ci) / (1 + exp(−ai(θ − bi)))    (1)

This is referred to as the three-parameter logistic equation, where ai is the discrimination parameter that affects the slope of the curve, bi is the difficulty parameter that affects the location, and ci is the pseudo-guessing parameter that affects the lower asymptote. Note that the two-parameter logistic equation is a special case of the three-parameter equation where ci is set to zero. Consistent and efficient methods exist for estimating these parameters. A more thorough description of the IRT model, its properties, and the role of each of the parameters can be found in any text on the subject (Baker & Kim 2004; van der Linden & Hambleton 1997).
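As a quick illustration (a sketch of ours, not code from the paper), Equation 1 can be computed directly in Python; the arguments a, b, and c stand in for ai, bi, and ci:

import math

def three_pl(theta, a, b, c=0.0):
    # Three-parameter logistic curve (Equation 1).
    # theta: proficiency; a: discrimination (slope);
    # b: difficulty (location); c: pseudo-guessing (lower asymptote).
    # Setting c = 0 recovers the two-parameter logistic equation.
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Example: an average student (theta = 0) on a problem with
# a = 1.5, b = 0.5, c = 0.2 answers correctly with probability ~0.46.
print(three_pl(0.0, a=1.5, b=0.5, c=0.2))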

Figure 1: Three-parameter logistic function relating proficiency (θ) to the probability of a correct response. The three curves illustrate the discrimination parameter's effect while fixing the other parameters at bi = 0.5 and ci = 0.2

Model

We propose a dynamic mixture model based on Item Response Theory (DMM-IRT). The probabilistic model consists of four types of random variables: student proficiency, motivation, evidence of motivation, and a student's response to a problem.

The latent variables in the student model correspond to proficiency (θ) and motivation (Mi). Proficiency is defined to be a static variable (note, if statistical estimates of proficiency are made online while a student uses the tutor, then each new data point causes the estimate to change, but the random variable is nonetheless assumed to be static). This assumption is also made in IRT modeling. In this paper, we assume θ has a unidimensional normal distribution with mean 0 and variance 1. Experiments were also conducted with a multidimensional normal distribution, but those studies are not discussed because the small data set did not warrant more hidden dimensions. Student motivation is defined as a discrete, dynamic variable. The first-order Markov assumption is assumed to hold; therefore, motivation on the (i+1)th problem depends only on motivation on the ith problem. The motivation variable can take on as many values as can be distinguished given the type of interaction allowed in the tutoring system. For the Wayang Tutor, we have identified three values for the motivation variable:

1. unmotivated and exhausting the hints to reach the final hint that gives the correct answer (‘unmotivated-hint’)

2. unmotivated and quickly guessing answers to find the correct answer (‘unmotivated-guess’)

3. motivated (‘motivated’)

The two unmotivated behaviors (1 and 2 above) are the most prevalent examples of how students game the Wayang Tutor. These are typical student behaviors and have been noted by other researchers (Baker et al. 2004). In Table 1, the student exhibits the unmotivated-guess behavior in the last five problems due to the short response times.

The student model uses two observed variables for each problem. The first variable is the student's initial response (Ui), which is either correct or incorrect. If a student's initial response is to ask for a hint, then Ui is labeled as incorrect. The second observed variable (Hi) is defined as the evidence corresponding to the hidden motivation variable, and thus takes on as many values as the motivation variable. For the Wayang Tutor, Hi has three values:

1. ‘many-hints’ if the number of hints seen before responding correctly > hmax (corresponds to unmotivated-hint)

2. ‘quick-guess’ if the number of hints seen before responding correctly < hmin and if the time to first response < tmin (corresponds to unmotivated-guess)

3. ‘normal’ if neither of the other two cases apply (corresponds to motivated)

Figure 2: A graphical depiction of the dynamic mixture model based on Item Response Theory (DMM-IRT)

This variable is defined using the number of hint requests and time spent. Since the majority of intelligent tutoring systems already capture this data, the model has widespread use with minimal or no change to the system architecture. However, the random variables in this model would not change even if more sophisticated techniques or data sources were necessary to detect motivation. The conditions for the values of Hi would change to accommodate the new information.
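To make the mapping concrete, here is a hedged sketch of the evidence rule; the function and argument names are ours, and the default thresholds use the experimental settings reported later in the paper (tmin = 5 seconds, hmin = hmax = 2 hints):

def observed_evidence(hints_before_correct, first_response_time,
                      h_max=2, h_min=2, t_min=5.0):
    # Map raw interaction data to the evidence variable Hi.
    # Thresholds default to the paper's experimental settings; a
    # per-problem assignment (as the authors suggest) would pass
    # different values for each problem.
    if hints_before_correct > h_max:
        return 'many-hints'    # evidence of unmotivated-hint
    if hints_before_correct < h_min and first_response_time < t_min:
        return 'quick-guess'   # evidence of unmotivated-guess
    return 'normal'            # evidence of motivated behavior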

The graphical model in Figure 2 describes how the observed and latent variables are connected. As the dashed line indicates, this model is a combination of a hidden Markov model (HMM) and an Item Response Theory (IRT) model. This is similar in structure to a switching state-space model (Ghahramani & Hinton 2000) with two exceptions: the continuous variable in the DMM-IRT is static instead of dynamic, and the DMM-IRT uses a different function relating the continuous variable to the observed variables. The DMM-IRT is also similar to another class of models known as latent transition analysis (Collins & Wugalter 1992), or LTA. LTA employs dynamic, latent variables to capture changes in a student's ability over long periods of time. This is different from the DMM-IRT, which assumes ability is static and motivation is dynamic.

The DMM-IRT is a mixture model where the mixtures are defined by the behavior the student exhibits. This is manifested in the model by the probability distribution P(Ui|θ, Mi). When the student is motivated, the distribution is described by the IRT item characteristic curve. We used the two-parameter logistic curve (Equation 2), but an alternate form can also be employed.

P(Ui = correct | θ, Mi = motivated) = 1 / (1 + exp(−ai(θ − bi)))    (2)

If the student is unmotivated, then the distribution takes one of the following two forms.

P(Ui = correct | θ, Mi = unmotivated-guess) = di    (3)

P(Ui = correct | θ, Mi = unmotivated-hint) = ei    (4)

The constant di corresponds to the probability of randomly guessing the answer correctly. This is equivalent to the pseudo-guessing parameter ci in Equation 1. The constant ei is the probability of a correct response given the unmotivated-hint behavior. The value of ei should be close to zero since Ui is labeled as incorrect if the student's first response is to ask for a hint. A key model assumption is that both distributions described by the unmotivated behavior (Equations 3 and 4) do not depend on student proficiency (e.g. the two variables are uncorrelated). Hence, an action while in an unmotivated state will not alter the system's current estimate of a student's proficiency. If this independence assumption is correct, then the model does not underestimate student proficiency by accounting for motivation.
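Putting Equations 2-4 together, the mixture distribution P(Ui = correct | θ, Mi) can be sketched as follows (our illustration, not the authors' code; the defaults d = 0.2 and e = 0.01 are the values fixed in the experiments described below):

import math

def p_correct(theta, motivation, a, b, d=0.2, e=0.01):
    # Mixture distribution P(Ui = correct | theta, Mi).
    # Only the motivated state consults the IRT curve; the two
    # unmotivated states ignore proficiency entirely.
    if motivation == 'motivated':
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))  # Equation 2
    if motivation == 'unmotivated-guess':
        return d                                         # Equation 3
    if motivation == 'unmotivated-hint':
        return e                                         # Equation 4
    raise ValueError('unknown motivation state: ' + motivation)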

Parameter Estimation

Marginal maximum likelihood estimation (Bock & Aitkin 1981) is the most common technique used to learn the IRT problem parameters, ai and bi. For an implementation of this algorithm, see (Baker & Kim 2004). This is an instance of the expectation-maximization (EM) (Dempster, Laird, & Rubin 1977) algorithm where the hidden student variables as well as the parameters for each problem are estimated simultaneously. The parameters are chosen to maximize the likelihood of the data. We adapted the Bock and Aitkin procedure to include the latent motivation variables in the DMM-IRT.

The parameter estimation procedure iterates between the expectation step and the maximization step. In the E-Step, the probability distribution for the latent, continuous variable, P(θ), is integrated out of the likelihood equation. This integral is approximated using the Hermite-Gauss quadrature method, which discretizes the distribution. For this paper, we used ten discrete points, or quadrature nodes, equally spaced over the interval [−4, +4] to approximate the standard normal distribution. The E-Step results in estimates for the probability of a correct response at each of the quadrature nodes. These estimates are then used in the M-Step, which is itself an iterative process, to determine the problem parameters, ai and bi, that best fit the logistic curve.
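As a rough sketch of the quadrature idea (ours, not the authors' Matlab implementation, and assuming node weights proportional to the standard normal density), the integral over θ is replaced by a weighted sum over equally spaced nodes:

import math

def quadrature_nodes(n=10, lo=-4.0, hi=4.0):
    # n equally spaced nodes on [lo, hi]; weights are proportional
    # to the standard normal density and normalized to sum to 1.
    nodes = [lo + k * (hi - lo) / (n - 1) for k in range(n)]
    dens = [math.exp(-x * x / 2.0) for x in nodes]
    z = sum(dens)
    return nodes, [d / z for d in dens]

def marginal_likelihood(responses, items):
    # Approximate the likelihood of a response sequence with theta
    # integrated out, assuming a motivated student throughout.
    # responses: list of 0/1 answers; items: matching (a, b) pairs.
    # Both function and argument names are illustrative.
    nodes, weights = quadrature_nodes()
    total = 0.0
    for theta, w in zip(nodes, weights):
        lik = 1.0
        for u, (a, b) in zip(responses, items):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            lik *= p if u == 1 else (1.0 - p)
        total += w * lik
    return total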

Baker and Kim (2004) point out that this EM algorithm is not guaranteed to converge because IRT models using the two-parameter logistic equation are not members of the exponential family. We did not experience difficulty in getting the algorithm to converge, which is consistent with other reported findings.

Methodology

Domain

Experiments were conducted with data from high school students using the Wayang Outpost (Arroyo et al. 2004). The Wayang Outpost (http://wayang.cs.umass.edu) is an intelligent tutoring system for the mathematics section of the SAT. The tutor presents multiple-choice geometry problems to students and offers them the option to seek help in solving the problems. The data set consists of 401 students and 70 problems, where a student attempted, on average, 32 of the 70 total problems. For each problem that a student completed, the observed variables in the model (Ui and Hi) were recorded. The parameters used to determine Hi were set to tmin = 5 seconds, hmin = 2 hints, and hmax = 2 hints for all seventy problems. A slightly more accurate method would assign different values to different problems based on the number of available hints and problem difficulty.

Experiments

Three models were evaluated to determine their accuracy in predicting whether a student would correctly answer his/her next problem. We tested the DMM-IRT, an IRT model using the two-parameter logistic equation that does not measure student motivation, and a simple, default strategy of always guessing that the student answers incorrectly (the majority class label).

Given the relatively small size of the data set, not all the DMM-IRT parameters were estimated simultaneously. The discrete conditional probability tables associated with the HMM were estimated first using only the observed motivation variables, H. The learned values for these three distributions are shown below (where the order of values for Mi is unmotivated-hint, unmotivated-guess, and motivated, and for Hi is many-hints, quick-guess, and normal). The probability di (Equation 3) was set to 0.2 for all seventy problems, corresponding to an assumption of uniform guessing as there are five answers to each multiple-choice problem. The probability ei (Equation 4) was set to 0.01 for all seventy problems. The item characteristic curve parameters ai and bi (Equation 2) were then estimated for each problem.

P(M1) = ( 0.1  0.1  0.8 )^T

P(Hi|Mi) =
( 0.70  0.05  0.25 )
( 0.05  0.70  0.25 )
( 0.05  0.05  0.90 )

P(Mi|Mi−1) =
( 0.85  0.05  0.10 )
( 0.05  0.90  0.05 )
( 0.05  0.05  0.90 )
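To illustrate how these tables yield smoothed motivation estimates of the kind shown later in Table 3, the following sketch runs standard HMM forward-backward smoothing over the evidence sequence alone (our simplification: the full DMM-IRT would also condition on the responses U, and we assume rows of P(Mi|Mi−1) index the previous state):

# State order: unmotivated-hint, unmotivated-guess, motivated
# Evidence order: many-hints, quick-guess, normal
P_M1 = [0.1, 0.1, 0.8]
P_H_GIVEN_M = [[0.70, 0.05, 0.25],
               [0.05, 0.70, 0.25],
               [0.05, 0.05, 0.90]]
P_M_GIVEN_PREV = [[0.85, 0.05, 0.10],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]]

def smoothed_motivation(evidence):
    # Forward-backward smoothing of P(Mi | H1..Hn), where evidence
    # is a list of indices (0=many-hints, 1=quick-guess, 2=normal).
    n = len(evidence)
    alpha = [[0.0] * 3 for _ in range(n)]  # alpha[i][s]: P(Mi=s, H1..Hi)
    for s in range(3):
        alpha[0][s] = P_M1[s] * P_H_GIVEN_M[s][evidence[0]]
    for i in range(1, n):
        for s in range(3):
            alpha[i][s] = P_H_GIVEN_M[s][evidence[i]] * sum(
                alpha[i - 1][r] * P_M_GIVEN_PREV[r][s] for r in range(3))
    beta = [[1.0] * 3 for _ in range(n)]   # beta[i][s]: P(Hi+1..Hn | Mi=s)
    for i in range(n - 2, -1, -1):
        for s in range(3):
            beta[i][s] = sum(P_M_GIVEN_PREV[s][r] *
                             P_H_GIVEN_M[r][evidence[i + 1]] * beta[i + 1][r]
                             for r in range(3))
    smoothed = []
    for i in range(n):
        g = [alpha[i][s] * beta[i][s] for s in range(3)]
        z = sum(g)
        smoothed.append([x / z for x in g])
    return smoothed

# Example: seven 'normal' observations followed by five 'quick-guess'
# observations, echoing the behavior shift in Table 1.
print(smoothed_motivation([2] * 7 + [1] * 5)[-1])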

Validation

Five-fold cross validation was used to evaluate the models. Thus, data from 320 students was used to train the models and 80 students to test the models' accuracy. The testing procedure involves using the trained model to estimate a student's ability and motivation given performance on previous problems (U1, H1, U2, H2, ..., Ui−1, Hi−1), predicting how the student will do on the next problem (Ui), and comparing the student's actual response with the predicted response. This process is repeated for each student in the test population and for each problem the student completed. The result is an accuracy metric (correct predictions divided by total predictions) that is averaged over the five cross validation runs. Pseudocode for this procedure is provided in Algorithm 1.


Algorithm 1: The cross validation framework
Input: aj, bj for each problem; U, H for each student
Output: accuracy
for i = 1 to (# students in test population) do
    // Assume U(i,j) refers to the i'th student's response
    // (0 or 1) to the j'th problem he/she performed
    for j = 2 to (max # problems student i finished) do
        {θ, Mj} ← MLE given (U(i,1), H(i,1), a1, b1), ..., (U(i,j−1), H(i,j−1), a(j−1), b(j−1))
        if P(Uj = correct | θ, Mj) ≥ 0.5 then
            Û ← 1
        else
            Û ← 0
        if U(i,j) == Û then
            correct ← correct + 1
        else
            incorrect ← incorrect + 1
accuracy ← correct / (correct + incorrect)

Cross Validation Accuracy
Model      Average   Minimum   Maximum
Default    62.5%     58.2%     67.7%
IRT        72.0%     70.4%     73.6%
DMM-IRT    72.5%     71.0%     74.0%

Table 2: Average, minimum, and maximum accuracy values over the five cross validation runs

Results

The experimental results are shown in Table 2. The DMM-IRT achieved the best performance in terms of predicting the probability of a correct student response. It predicted with 72.5% accuracy, whereas the IRT model had 72.0% and the baseline strategy of always predicting an incorrect response attained 62.5%. Accuracy values in the 70-80% range are generally good because both correct guesses and incorrect slips from knowledgeable students occur in multiple-choice tests.

The marginal improvement in performance by the DMM-IRT over the IRT model is not statistically significant. However, it is interesting that the DMM-IRT made different predictions. The model predicted more incorrect responses than the IRT model by accounting for student motivation. It is useful to consider how this occurs in the DMM-IRT. If a student is unmotivated and answers a problem incorrectly, then the model's estimate of proficiency does not decrease much. While this leads to larger estimates of student proficiency, this effect is offset by a decrease in the estimate of student motivation. The dynamics in the motivation variable allow the model's predictions (about the probability of a correct response to the next problem) to change more quickly than could be achieved via the static proficiency variable. While this led to a modest improvement in prediction accuracy, we hypothesize that the difference could be greater with longer sequences where students perform more than 32 problems, which was the average for this data set.

The effect of motivation in the model can be explained by re-examining the student performance data from Table 1. This time, we show the smoothed estimates for the marginal probability of the motivation variable, P(Mi). These values, shown in Table 3, are the probability of the student being in one of the three behavioral states: motivated, unmotivated-guess, and unmotivated-hint. After the seventh problem, the model believes the student is in an unmotivated state and can therefore adjust its belief about how the student will perform.

Problem   Initial Time (s)   P(Mi = motivated)   P(Mi = unmotivated-guess)
1         40                 0.99                0.01
2         44                 0.99                0.01
3         13                 0.99                0.01
4         19                 0.99                0.01
5         7                  0.97                0.02
6         22                 0.92                0.07
7         35                 0.72                0.26
8         2                  0.05                0.94
9         2                  0.01                0.99
10        3                  0.01                0.99
11        2                  0.01                0.99
12        2                  0.01                0.99

Table 3: Smoothed estimates of the student's motivation level, P(Mi), for each of the twelve problems shown in Table 1. The remaining probability mass is associated with the unmotivated-hint behavior

The data presented in Tables 1 and 3 is an ideal case of a student displaying two distinct, non-overlapping behaviors. This is easily captured by the model dynamics, P(Mi|Mi−1). Students do not always follow such ideal behavior. The top line in Figure 3 shows smoothed estimates of motivation for another student using the Wayang Tutor. The student's erratic behavior makes one-step predictions less reliable. Noisy dynamics may be hampering the DMM-IRT from achieving larger gains in cross validation accuracy. Figure 3 also demonstrates the difference in proficiency estimates between the IRT and DMM-IRT. The DMM-IRT model does not decrease its estimate of θ when the probability of the student being motivated is low (for example, during problems 26-31). The IRT model is potentially underestimating the student's ability.

Figure 3: The top line (right y-axis) shows smoothed estimates of a student's motivation. The bottom two lines (left y-axis) show the maximum likelihood estimates of θ after i problems for the DMM-IRT (solid) and IRT (dashed)

Conclusions

We have presented and evaluated a dynamic mixture model based on Item Response Theory (DMM-IRT). This model uses a student's behavior to disambiguate between proficiency and motivation. Proficiency is modeled as a static, continuous variable and motivation as a dynamic, discrete variable. These assumptions are based on a student's tendency to exhibit different behavioral patterns over the course of a tutoring session. The DMM-IRT is a combination of a hidden Markov model and an IRT model. Given this generality, the model is easily tailored to many existing intelligent tutoring systems and can handle additional forms of unmotivated behavior. Users need only identify the evidence for the unmotivated behavior, P(H|M), and the effect on the probability of a correct response, P(U|θ, M). The experiments, though run offline in Matlab, were sufficiently fast that online inference in a tutoring system will occur in real time. These properties of the DMM-IRT satisfy the four principles discussed in the relevant literature section. We also point out that the model can be useful for other domains, such as object recognition, where a dynamic variable (e.g. lighting conditions) changes the context under which the static variable (e.g. shape of the object) is observed.

Experiments were conducted with real data of students using a geometry tutoring system. Results suggest that the DMM-IRT can better predict student responses compared to a model that does not account for motivation. The DMM-IRT produced increased estimates of student proficiency by assuming unmotivated behavior is independent of proficiency. This capability is important for preventing underestimation in low-stakes assessment, but further investigation is required to determine the correlation between these two variables.

In the future, we plan to implement the model to run online with the Wayang Tutor. Various strategies will be explored to determine (1) how quickly an unmotivated student can be re-engaged, and (2) the effect on the student's learning. We hope to improve cross-validation accuracy by updating the model's dynamics to better reflect student behavior.

References

Arroyo, I., and Woolf, B. 2005. Inferring Learning and Attitudes from a Bayesian Network of Log File Data. In Proceedings of the Twelfth International Conference on Artificial Intelligence in Education, 33–40.

Arroyo, I.; Beal, C.; Murray, T.; Walles, R.; and Woolf, B. 2004. Web-based Intelligent Multimedia Tutoring for High Stakes Achievement Tests. In Proceedings of the Seventh International Conference on Intelligent Tutoring Systems, 468–477.

Baker, F., and Kim, S.-H. 2004. Item Response Theory: Parameter Estimation Techniques. New York, NY: Marcel Dekker, Inc.

Baker, R.; Corbett, A.; Koedinger, K.; and Wagner, A. 2004. Off-Task Behavior in the Cognitive Tutor Classroom: When Students "Game the System". In Proceedings of the ACM CHI 2004 Conference on Human Factors in Computing Systems, 383–390.

Baker, R.; Corbett, A.; and Koedinger, K. 2004. Detecting Student Misuse of Intelligent Tutoring Systems. In Proceedings of the Seventh International Conference on Intelligent Tutoring Systems, 531–540.

Beck, J. 2005. Engagement Tracing: Using Response Times to Model Student Disengagement. In Proceedings of the Twelfth International Conference on Artificial Intelligence in Education, 88–95.

Bock, R., and Aitkin, M. 1981. Maximum Likelihood Estimation of Item Parameters: Applications of an EM Algorithm. Psychometrika 46:443–459.

Collins, L., and Wugalter, S. 1992. Latent Class Models for Stage-Sequential Dynamic Latent Variables. Multivariate Behavioral Research 27(1):131–157.

Conati, C. 2002. Probabilistic Assessment of User's Emotions in Educational Games. Journal of Applied Artificial Intelligence, special issue on "Merging Cognition and Affect in HCI" 16(7-8):555–575.

de Vicente, A., and Pain, H. 2000. A Computational Model of Affective Educational Dialogues. AAAI Fall Symposium: Building Dialogue Systems for Tutorial Applications, 113–121.

de Vicente, A., and Pain, H. 2002. Informing the Detection of the Students' Motivational State: An Empirical Study. In Proceedings of the Fifth International Conference on Intelligent Tutoring Systems, 933–943.

Dempster, A.; Laird, N.; and Rubin, D. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society Series B 39:1–38.

Ghahramani, Z., and Hinton, G. 2000. Variational Learning for Switching State-Space Models. Neural Computation 12(4):831–864.

Mayo, M., and Mitrovic, A. 2001. Optimising ITS Behavior with Bayesian Networks and Decision Theory. International Journal of Artificial Intelligence in Education 12:124–153.

van der Linden, W., and Hambleton, R., eds. 1997. Handbook of Modern Item Response Theory. New York, NY: Springer-Verlag.

VanLehn, K.; Lynch, C.; Schulze, K.; Shapiro, J.; Shelby, R.; Taylor, L.; Treacy, D.; Weinstein, A.; and Wintersgill, M. 2005. The Andes Physics Tutoring System: Lessons Learned. International Journal of Artificial Intelligence in Education 15(3):147–204.

Wise, S., and DeMars, C. 2005. Low Examinee Effort in Low-Stakes Assessment: Problems and Potential Solutions. Educational Assessment 10:1–17.
