learning integrated symbolic and continuous action models joseph xu & john laird may 29, 2013

Learning Integrated Symbolic and Continuous Action Models

Action Models?

Benefits
Accurate action models allow for:
- Internal simulation
- Backtracking planning
- Learning policies via trial-and-error without incurring real-world cost
[Figure: an agent learning a policy by exploring the world directly vs. by exploring a learned model; labels: Agent, World, Model, Exploration, Reward, Policy]

Requirements
Model learning should be:
- Accurate: predictions made by the model should be close to reality
- Fast: learn from few examples
- General: models should make good predictions in many situations
- Online: models shouldn't require sampling the entire space of possible actions before being useful

Continuous Environments
- Discrete objects with continuous properties: geometry, position, rotation
- Input and output are vectors of continuous numbers
- Agent runs in lock-step with the environment
- Fully observable
[Figure: the environment sends the agent an input vector of object poses (px, py, pz, rx, ry, rz for objects A and B); the agent returns an output vector of continuous numbers]

Action Modeling in Continuous Domains

Locally Weighted Regression
- To predict at a query point x: find its k nearest neighbors, then fit a weighted linear regression to them
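A minimal Python sketch of the LWR prediction step described above: find the k nearest training inputs to the query, weight them with a Gaussian kernel, and solve the weighted least-squares problem for a local linear model. The kernel, the bandwidth `tau`, the small ridge term, and the function names are illustrative assumptions rather than details from the talk.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lwr_predict(xs, ys, query, k=5, tau=1.0, ridge=1e-6):
    """Predict y at `query` from the k training pairs (xs[i], ys[i]) nearest to it."""
    def dist(p):
        return math.dist(p, query)

    nearest = sorted(range(len(xs)), key=lambda i: dist(xs[i]))[:k]
    d = len(query) + 1                      # input features plus a bias term
    ata = [[0.0] * d for _ in range(d)]     # weighted normal equations A^T W A
    aty = [0.0] * d                         # and A^T W y
    for i in nearest:
        w = math.exp(-dist(xs[i]) ** 2 / (2 * tau ** 2))   # Gaussian kernel weight
        row = list(xs[i]) + [1.0]
        for r in range(d):
            aty[r] += w * row[r] * ys[i]
            for c in range(d):
                ata[r][c] += w * row[r] * row[c]
    for r in range(d):
        ata[r][r] += ridge                  # keeps the system solvable with few neighbors
    beta = solve(ata, aty)
    return sum(b * v for b, v in zip(beta, list(query) + [1.0]))
```

Because every neighbor inside the kernel contributes to the fit, a query near a contact boundary averages examples from both sides of it, which is the smoothing problem raised on the LWR Shortcomings slide.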

LWR Shortcomings
- LWR generalizes based on proximity in pose space
- It smooths together qualitatively distinct behaviors
- It generalizes from examples that are close in absolute coordinates rather than from examples with similar object relationships

Our Approach
- The type of motion depends on relationships between objects, not on absolute positions
- Learn models that exploit the relational structure of the environment
- Segment behaviors into qualitatively distinct linear motions (modes)
- Classify which mode is in effect using relational structure
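A minimal Python sketch of how a multi-modal model could make a prediction under this approach: choose the mode from the relational state, then apply that mode's linear function to the continuous state. The `Mode` class, the tuple encoding of relations, and the hand-written `applies` conditions are hypothetical; on the slides the mode is selected by FOIL-learned classifiers rather than fixed predicates. The two example modes are modeled loosely on the ball y-velocity example later in the talk, and their constants are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence, Tuple

# A ground relation, e.g. ("intersect", "b", "p") for "ball b intersects platform p"
Relation = Tuple[str, ...]


@dataclass
class Mode:
    """One qualitatively distinct behavior: a linear function plus a relational condition."""
    name: str
    weights: Sequence[float]                        # coefficients over the continuous state
    bias: float
    applies: Callable[[FrozenSet[Relation]], bool]  # hypothetical stand-in for a learned classifier

    def predict(self, state: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, state)) + self.bias


def predict(modes, relations, state):
    """Select a mode from the relational state, then apply its linear function."""
    for mode in modes:
        if mode.applies(relations):
            return mode.name, mode.predict(state)
    raise ValueError("no mode covers this relational state")


# Illustrative y-velocity modes over the state (bx, by, vy)
flying = Mode("flying", [0.0, 0.0, 1.0], -0.98,
              lambda rel: ("intersect", "b", "p") not in rel
              and ("intersect", "b", "r") not in rel)
on_platform = Mode("platform", [0.0, 0.0, 1.0], 0.0,
                   lambda rel: ("intersect", "b", "p") in rel)

name, vy_next = predict([flying, on_platform], frozenset(), (0.2, 1.2, 0.5))
# name is "flying"; vy_next is about -0.48
```

Generalization here comes from the relations alone: the same mode applies wherever the ball is, as long as it touches nothing, which is the contrast with LWR's proximity in pose space.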

[Figure: example modes of a ball: flying mode (no contact), ramp rolling mode (touching ramp), bouncing mode (touching flat surface)]

Learning Multi-Modal Models
[Figure: a scene graph converts the continuous state of objects A and B into a relational state such as ~intersect(A,B), above(A,B), ball(A); the time series is segmented into modes (mode I, mode II) with RANSAC + EM, and a relational mode classifier is learned with FOIL]

Predict with Multi-Modal Models
[Figure: the scene graph converts the continuous state into a relational state (~intersect(A,B), above(A,B), ball(A)); the relational mode classifier selects a mode (mode I or mode II), whose function produces the prediction]

RANSAC
[Figure: worked example learning the ball's y-velocity. Each training example holds the continuous state (bx, by, vy), the relations between ball b and platform p and ramp r (e.g. ~(b,p), ~(b,r)), and the target t. Examples that cannot be classified into any existing mode are placed in the noise mode; RANSAC finds the line t = vy - 0.98 among them.]

RANSAC discovers new modes:
1. Choose a random set of noise examples
2. Fit a line to the set
3. Add all noise examples that also fit the line
4. If the set is large (>40 examples), create a new mode with those examples; otherwise, repeat
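A minimal Python sketch of the RANSAC mode-discovery loop above, assuming one-dimensional examples (x, t) and a line t = a*x + b as the mode function. The two-point minimal sample, the inlier tolerance `tol`, the `max_iters` cap, and the seeded random generator are assumptions; the threshold of 40 examples comes from the slide.

```python
import random


def fit_line(points):
    """Least-squares fit of t = a*x + b; returns (a, b), or None if every x is equal."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    mt = sum(t for _, t in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    a = sum((x - mx) * (t - mt) for x, t in points) / sxx
    return a, mt - a * mx


def ransac_new_mode(noise, min_size=40, tol=1e-3, max_iters=200, rng=None):
    """Try to carve a new linear mode out of the noise examples.

    Returns ((a, b), inliers) when more than `min_size` examples fit one line,
    otherwise None (the examples stay in the noise mode).
    """
    rng = rng or random.Random(0)
    if len(noise) < 2:
        return None
    for _ in range(max_iters):
        sample = rng.sample(noise, 2)            # 1. random set of noise examples
        line = fit_line(sample)                  # 2. fit a line to the set
        if line is None:
            continue
        a, b = line
        inliers = [(x, t) for x, t in noise      # 3. add every noise example on the line
                   if abs(a * x + b - t) <= tol]
        if len(inliers) > min_size:              # 4. large set: create a new mode
            refit = fit_line(inliers)
            if refit is not None:
                return refit, inliers
    return None                                  # no large enough set within max_iters
```

Refitting on all inliers at the end, rather than keeping the two-point line, is a common RANSAC refinement and is an assumption here.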

[Figure: RANSAC steps 1-4 splitting the noise examples into a new mode and the remaining noise]

[Figure: examples 04-06, which also satisfy ~(b,p) and ~(b,r), are assigned to the mode t = vy - 0.98]

Expectation Maximization
Simultaneously learn:
- The association between training data and modes
- The parameters of the mode functions
Expectation: assume the mode functions are correct; compute the likelihood that each mode generated each data point
Maximization: assume the likelihoods are correct; fit the mode functions to maximize likelihood
Iterate until convergence to a local maximum

[Figure: example 07, in which the ball touches the platform ((b,p), ~(b,r)), does not fit t = vy - 0.98 and goes to noise; FOIL gives the existing mode the clause ~(b,p)]

FOIL
- Learn classifiers that distinguish between two modes (positives and negatives) based on relations
- Outer loop: iteratively add clauses that cover the most positive examples
- Inner loop: iteratively add literals that rule out negative examples
- Object names are variablized for generality
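A minimal Python sketch of the Expectation Maximization loop described above, for a mixture of linear modes t = a*x + b over one-dimensional inputs. The Gaussian noise model with a fixed `sigma`, the convergence tolerance, and the starting guesses (for example, lines found by RANSAC) are assumptions for illustration.

```python
import math


def weighted_line(points, weights):
    """Weighted least-squares fit of t = a*x + b."""
    sw = sum(weights)
    mx = sum(w * x for w, (x, _) in zip(weights, points)) / sw
    mt = sum(w * t for w, (_, t) in zip(weights, points)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, points))
    sxt = sum(w * (x - mx) * (t - mt) for w, (x, t) in zip(weights, points))
    a = sxt / sxx if sxx > 0 else 0.0
    return a, mt - a * mx


def em_modes(points, modes, sigma=0.1, max_iters=100, tol=1e-10):
    """Alternate E and M steps for linear modes t = a*x + b until they stop changing."""
    resp = []
    for _ in range(max_iters):
        # E-step: assume the mode functions are correct and compute, for each
        # point, the normalized Gaussian likelihood that each mode generated it
        resp = []
        for x, t in points:
            lik = [math.exp(-((a * x + b - t) ** 2) / (2 * sigma ** 2)) for a, b in modes]
            total = sum(lik) or 1.0
            resp.append([v / total for v in lik])
        # M-step: assume those likelihoods are correct and refit every mode by
        # weighted least squares (a mode with no weight keeps its old parameters)
        new_modes = []
        for k, old in enumerate(modes):
            w = [r[k] for r in resp]
            new_modes.append(weighted_line(points, w) if sum(w) > 1e-12 else old)
        change = max(abs(p - q) for old, new in zip(modes, new_modes) for p, q in zip(old, new))
        modes = new_modes
        if change < tol:
            break  # parameters stopped changing: a local maximum
    return modes, resp
```

As on the slide, EM only reaches a local maximum, so the result depends on the starting modes.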

Clause                                    # pos. ex.
~intersect(target, any)                   16221
~z-overlap(target, any)                   6162
east-of(target, x)                        36
~x-overlap(target, any)                   21
east-of(x, target)                        9
~ontop(target, any) & above(target, x)    2

FOIL
FOIL learns binary classifiers, but there can be many modes. Use a one-to-one strategy:
- Learn a classifier between each pair of modes
- Each classifier votes between its two modes
- The mode with the most votes wins

[Figure: worked example, continued. Examples 07-13, where the ball touches the platform ((b,p), ~(b,r)), form the mode t = vy with clause (b,p). Examples 14-16, where the ball touches the ramp (~(b,p), (b,r)), form the mode t = vy - 0.7 with clause (b,r).]

Demo
- Physics simulation with ramp, box, and ball
- Learn models for x and y velocities
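A simplified Python sketch of the FOIL learner and the one-to-one voting scheme from the FOIL slides above. To stay short it works on ground, propositional relations (strings such as `intersect(b,p)`, with `~` marking negation) and does not variablize object names as the talk's FOIL does; the encoding, function names, and tie-breaking are assumptions. The gain is the standard FOIL information gain.

```python
import math
from collections import Counter
from itertools import combinations


def holds(lit, example):
    """A literal is 'rel' or '~rel'; an example is the set of relations that are true."""
    return lit[1:] not in example if lit.startswith("~") else lit in example


def covers(clause, example):
    return all(holds(lit, example) for lit in clause)


def foil_gain(p0, n0, p1, n1):
    """FOIL information gain of narrowing coverage from (p0, n0) to (p1, n1)."""
    if p1 == 0:
        return -math.inf
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


def learn_foil(pos, neg):
    """Return a list of clauses (conjunctions of literals); any covering clause means 'positive'."""
    atoms = sorted(set().union(*pos, *neg))
    candidates = atoms + ["~" + a for a in atoms]
    rules, remaining = [], list(pos)
    while remaining:                                   # outer loop: cover the positives
        clause, p, n = [], remaining, list(neg)
        while n:                                       # inner loop: rule out the negatives
            best = max(candidates, key=lambda lit: foil_gain(
                len(p), len(n),
                sum(holds(lit, e) for e in p), sum(holds(lit, e) for e in n)))
            new_p = [e for e in p if holds(best, e)]
            new_n = [e for e in n if holds(best, e)]
            if not new_p or len(new_n) == len(n):
                break                                  # no literal makes progress
            clause.append(best)
            p, n = new_p, new_n
        if n:
            break                                      # remaining positives cannot be separated
        rules.append(clause)
        remaining = [e for e in remaining if not covers(clause, e)]
    return rules


def train_one_vs_one(examples_by_mode):
    """Learn one FOIL classifier per pair of modes (the first mode of the pair is 'positive')."""
    return {(a, b): learn_foil(examples_by_mode[a], examples_by_mode[b])
            for a, b in combinations(sorted(examples_by_mode), 2)}


def classify(classifiers, example):
    """Each pairwise classifier votes for one of its two modes; the most votes wins."""
    votes = Counter()
    for (a, b), rules in classifiers.items():
        votes[a if any(covers(c, example) for c in rules) else b] += 1
    return votes.most_common(1)[0][0]
```

With three modes such as flying, platform, and ramp, this trains three pairwise classifiers, and a relational state that wins two of the three pairwise votes is assigned to that mode.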

Physics Simulation Experiment
- 2D physics simulation with gravity
- 40 possible configurations
- Training and testing blocks run for 200 time steps
- 40 configs x 3 seeds = 120 training blocks
- Test over all 40 configs using a different seed
- Repeat with 5 reorderings
[Figure: simulation setup; labels: gravity, random offset, origin]

Learned Modes

Prediction Accuracy
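A small Python sketch of the experiment's bookkeeping, under stated assumptions: the training seeds are taken to be 0-2 and the test seed 3, and a "reordering" is taken to mean a random shuffle of the 120 training blocks. Neither detail is given on the slide, the function names are hypothetical, and the simulation itself is not modeled.

```python
import random
from itertools import product

STEPS_PER_BLOCK = 200          # each training/testing block runs for 200 time steps


def training_schedules(n_configs=40, seeds=(0, 1, 2), n_orderings=5, base_seed=0):
    """Yield one ordering of the (config, seed) training blocks per repetition."""
    blocks = list(product(range(n_configs), seeds))    # 40 configs x 3 seeds = 120 blocks
    for k in range(n_orderings):
        order = blocks[:]
        random.Random(base_seed + k).shuffle(order)     # assumed: a reordering is a random shuffle
        yield order


def test_blocks(n_configs=40, test_seed=3):
    """Every configuration, run with a seed not used for training."""
    return [(config, test_seed) for config in range(n_configs)]
```

With these defaults, each ordering covers 120 x 200 = 24,000 training time steps.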

- Compare overall accuracy against a single smooth-function learner (LWR)

Classifier Accuracy

- Compare FOIL performance against classifiers that use absolute coordinates (SVM, KNN)

Nuggets
- The multi-modal approach addresses the shortcomings of LWR:
  - It doesn't smooth over examples from different modes
  - It uses relational similarity to generalize behaviors
- It satisfies the requirements:
  - Accurate: new modes are learned for inaccurate predictions
  - Fast: linear modes are learned from (too) few examples
  - General: each mode generalizes to all relationally analogous situations
  - Online: modes are learned incrementally and can immediately make predictions

Coals
- Slows down with more learning: keeps every training example
- Assumes linear modes
- RANSAC, EM, and FOIL are computationally expensive
