
Multi-level control architecture for Bionic Handling Assistant robot

Milad Malekzadeh1, J. F. Queißer2 and Jochen J. Steil1

Abstract: A multi-level control architecture, spanning low-level control to high-level trajectory planning, is proposed for the Bionic Handling Assistant (BHA) robot. At the low level, combining an imprecise analytical model with machine learning techniques makes the controller faster and more accurate, while at the higher level a recent task-parametrized Learning from Demonstration (LfD) approach is exploited to encode the position and orientation attractors. A variant of dynamical systems maps the position and orientation information into the attractor space. The performance of the proposed architecture is tested with a BHA robot in an apple-picking experiment.

INTRODUCTION

The control of soft continuum robots is challenging owing to the mechanical elasticity and complex dynamics of soft manipulators. An additional challenge emerges when demonstrations must be collected and LfD applied. Common PID controllers typically perform poorly on soft continuum manipulators. On the other hand, advanced force and impedance control approaches are difficult to implement because analytical models are lacking and the dynamics are slow. Several methods have recently been proposed to control continuum robots; for a recent review see [1].

Although most recent research in this area focuses on developing new bio-inspired soft robots, our aim in this paper is to propose a multi-level control architecture that enables us to control such a robot. We deploy learning across all levels to enable the application of LfD to a real-world manipulation task. To record the demonstrations, an actively compliant controller is used. A version of dynamical systems able to encode both position and orientation then maps the recorded 6D end-effector pose data into a virtual attractor space. A recent LfD method encodes the pose attractors within the same model for point-to-point motion planning.

The apple-picking experiment of this paper is implemented on the Bionic Handling Assistant (BHA) robot, which was designed by Festo as a 3D-printed robotic counterpart to an elephant trunk. It is pneumatically actuated and comprises several continuous parallel components operated at low pressures, which makes the BHA inherently safe for physical interaction with humans and an interesting platform for collaborative robotics.

HYBRID CONTROL ARCHITECTURE

For the reaching task of this paper, positioning errors below 1 cm are required to grasp objects successfully, for which standard modeling techniques are not sufficient. We exploit hybrid modeling techniques to combine estimated analytical models (classical control) and machine learning [2]. The overall implementation is depicted with blue rectangles in the scheme of Fig. 2. Each control block serves one or multiple specific goals. Here we explain the different blocks of Fig. 2 from low-level to high-level control.

1Institute for Robotics and Process Control (IRP), TU Braunschweig, Muehlenpfordtstr. 23, 38106 Braunschweig, Germany

2Research Institute of Cognition and Robotics (CoR-Lab), Faculty of Technology at Bielefeld University, Germany

Fig. 1. A qualitative comparison of the kinematic model based on torus segments with and without error correction by machine learning (axes: y [m] and z [m]; legend: Kin. model, Kin. model + ELM).

I & II. To overcome the inherently slow dynamics of the BHA robot, the low-level PID length controller is first augmented with an equilibrium model that generates an additional feed-forward signal. The equilibrium model predicts the pressures required for postures with zero velocity and acceleration. As the learning method we utilized a variant of the Extreme Learning Machine (ELM) that allows additional constraints to be integrated. The combination of a slow PID controller and the feed-forward signal of the equilibrium model leads to a significant improvement of length control [3].
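The structure of blocks I and II can be illustrated with a minimal sketch, assuming a scalar length per actuator and a discrete-time PID. The class name, the gains and the linear `toy_equilibrium_model` are hypothetical placeholders; in the paper the feed-forward pressure comes from a learned ELM, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class PIDWithFeedForward:
    """PID length controller whose output is added to a feed-forward pressure."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target_length, measured_length, feed_forward_pressure, dt):
        error = target_length - measured_length
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        feedback = self.kp * error + self.ki * self.integral + self.kd * derivative
        # The feed-forward term supplies the steady-state pressure directly,
        # so the (slow) feedback loop only has to correct the remaining error.
        return feed_forward_pressure + feedback


def toy_equilibrium_model(target_length):
    """Hypothetical stand-in for the learned equilibrium model (linear here)."""
    return 2.0 * target_length


controller = PIDWithFeedForward(kp=1.0, ki=0.1, kd=0.0)
pressure = controller.step(
    target_length=0.30,
    measured_length=0.30,
    feed_forward_pressure=toy_equilibrium_model(0.30),
    dt=0.01,
)
```

When the measured length already matches the target, the feedback term vanishes and the commanded pressure equals the equilibrium prediction, which is the situation the equilibrium model is meant to cover.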

III. We then enhance the precision of the approximate constant-curvature model by learning the error of the forward kinematics with respect to ground-truth data collected by a VICON tracking system. We assume that representing the error compensation for the kinematics of the robot requires a less complex function than learning the kinematics from scratch.

Fig. 2. The proposed overall control scheme. Each control block, highlighted by a number, is explained throughout the text.

The approach is based on the assumption that the segments obey the principles of the constant-curvature model, which allow kinematic simulation of continuous deformations. For each segment, the three measured actuator lengths are used to compute the coordinate transformation between two segments; these transformations can then be chained to obtain the complete forward kinematics from base to end-effector. To reduce the residual error, we use another ELM to estimate the error $\varepsilon(q)$ of the constant-curvature based forward model $f_{ccm}$ with respect to the current chamber lengths $q$, and obtain an optimized model estimate $f_{fkin}(q) = f_{ccm}(q) + \varepsilon(q)$ [2]. This yields a mean positioning error of the end-effector below 0.6 cm, which is good in relation to the repeatability of 4 cm measured without learning in [4]. Fig. 1 shows a qualitative comparison of the kinematic model based on torus segments with and without error correction by machine learning.
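The idea of learning only the residual of an analytical model can be sketched as follows. This is a toy illustration under stated assumptions: `f_ccm` and `f_true` are made-up one-dimensional functions, not the BHA kinematics, and the ELM is the plain textbook form (fixed random hidden layer, ridge-regressed readout) rather than the constrained variant used by the authors.

```python
import numpy as np


class ELMRegressor:
    """Extreme Learning Machine: fixed random hidden layer, ridge-regressed readout."""

    def __init__(self, n_inputs, n_hidden=50, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(size=(n_inputs, n_hidden))  # never trained
        self.bias = rng.normal(size=n_hidden)
        self.ridge = ridge
        self.w_out = None

    def _hidden(self, q):
        return np.tanh(q @ self.w_in + self.bias)

    def fit(self, q, targets):
        h = self._hidden(q)
        gram = h.T @ h + self.ridge * np.eye(h.shape[1])
        self.w_out = np.linalg.solve(gram, h.T @ targets)  # only the readout is learned
        return self

    def predict(self, q):
        return self._hidden(q) @ self.w_out


def f_ccm(q):
    """Toy stand-in for the analytical model (NOT the real BHA kinematics)."""
    return q.sum(axis=1, keepdims=True)


def f_true(q):
    """Toy 'ground truth': analytical model plus a smooth unmodeled effect."""
    return f_ccm(q) + 0.1 * np.sin(3.0 * q[:, :1])


rng = np.random.default_rng(1)
q_train = rng.uniform(-1.0, 1.0, size=(500, 3))
q_test = rng.uniform(-1.0, 1.0, size=(200, 3))

# Learn the error eps(q) = f_true(q) - f_ccm(q), not the full mapping.
error_model = ELMRegressor(n_inputs=3).fit(q_train, f_true(q_train) - f_ccm(q_train))


def f_fkin(q):
    """Hybrid model: analytical prediction plus learned error."""
    return f_ccm(q) + error_model.predict(q)


rmse_ccm = float(np.sqrt(np.mean((f_ccm(q_test) - f_true(q_test)) ** 2)))
rmse_hybrid = float(np.sqrt(np.mean((f_fkin(q_test) - f_true(q_test)) ** 2)))
```

On this toy problem the hybrid model should show a lower test error than the analytical model alone, since the network only has to represent the small smooth residual.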

IV. We use active compliant control of the robot to implement a kinesthetic teaching mode [3]. In the compliant mode, the difference between the sensed pressures $p$ and the predicted pressures $\hat{p}$ of the pneumatic bellows for the currently measured posture $l_{real}$ is observed. Owing to its elastic properties, the robot can be deformed while the bellows’ pressures are kept constant, which results in a mismatch between predicted and observed pressures. If this mismatch exceeds a threshold $T$, a posture update is initiated to comply with the deformed robot configuration.
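The threshold rule of the compliant mode could be sketched as below. The function name and arguments are hypothetical, and measuring the mismatch with a Euclidean norm over all bellows is an assumption; the text does not state which norm is used.

```python
import numpy as np


def compliant_posture_update(target_lengths, measured_lengths,
                             sensed_pressures, predicted_pressures, threshold):
    """Return the posture target for the next control cycle.

    If the pressure mismatch exceeds the threshold, the target follows the
    measured (externally deformed) posture; otherwise it is left unchanged.
    """
    mismatch = np.linalg.norm(
        np.asarray(sensed_pressures, dtype=float)
        - np.asarray(predicted_pressures, dtype=float)
    )  # assumption: Euclidean norm over all bellows
    if mismatch > threshold:
        return np.array(measured_lengths, dtype=float)
    return np.array(target_lengths, dtype=float)


target = [0.20, 0.20, 0.20]
measured = [0.22, 0.19, 0.20]

# A user pushes the robot: sensed and predicted pressures disagree clearly.
pushed = compliant_posture_update(target, measured, [1.0, 1.0, 1.0], [1.3, 0.9, 1.0], 0.2)

# Only sensor noise: the mismatch stays below the threshold.
idle = compliant_posture_update(target, measured, [1.0, 1.0, 1.0], [1.01, 1.0, 1.0], 0.2)
```

In the first call the target snaps to the measured posture, so the robot yields to the user; in the second call the original target is kept.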

V. To encode the motion trajectory of a single point, instead of using one dynamical system with a single final goal and learning the force term for every datapoint along the trajectory (Dynamical Movement Primitives), a set of attractor points (an attractor trajectory) is extracted by considering multiple dynamical systems, i.e., one dynamical system for each datapoint along the trajectory [5].

In this paper, the following two dynamical systems encode the position and orientation of a virtual unit mass at each datapoint; they are kept separate because position and orientation are independent modalities [6]:

$$\ddot{x}^p = K^P (\hat{x}^p - x^p) - K^V \dot{x}^p, \quad (1)$$

$$\ddot{x}^o = 2 K^O \log(\hat{x}^o * \bar{x}^o) - K^W \dot{x}^o, \quad (2)$$

where $K^P, K^V, K^O, K^W \in \mathbb{R}^{3\times3}$ are the stiffness and damping matrices, $x^p$, $\dot{x}^p$ and $\ddot{x}^p$ are the position, velocity and acceleration of the end-effector, and $\dot{x}^o$ and $\ddot{x}^o$ are the axis-angle representations of angular velocity and acceleration. $x^o$, $\bar{x}^o$ and $\hat{x}^o$ represent the quaternion orientation, its conjugate and the attractor in the form of unit quaternions. The position attractor trajectory $\hat{x}^p$ and the orientation attractors $\hat{x}^o$ are extracted from the above equations given the recorded pose information (IV).

Fig. 3. (a) Demonstrations in the world frame: a Gaussian Mixture Model with three components encodes the whole trajectories in a single world frame without considering task parameters. (b) Demonstrations in the local frames: the same data observed from two frames of reference after transforming the data.

The computed position attractor $\hat{x}^p$ and orientation attractor $\hat{x}^o$ (in unit quaternion space) will be used throughout the next section as the position and orientation data to be learned.
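Solving Eq. (1) for the attractor gives, for scalar gains, one attractor point per recorded datapoint: the position plus $(\ddot{x}^p + k^V \dot{x}^p)/k^P$. A minimal sketch follows, assuming scalar gains, a fixed sampling time `dt`, and finite-difference derivatives; these choices and the function name are illustrative assumptions, not details given in the text.

```python
import numpy as np


def position_attractors(positions, dt, k_p, k_v):
    """Invert Eq. (1) for a unit mass: x_hat = x + (x_ddot + k_v * x_dot) / k_p.

    positions: array of shape (T, 3). Velocities and accelerations are
    estimated with finite differences (an assumption; the source does not
    say how they are obtained).
    """
    positions = np.asarray(positions, dtype=float)
    velocities = np.gradient(positions, dt, axis=0)
    accelerations = np.gradient(velocities, dt, axis=0)
    return positions + (accelerations + k_v * velocities) / k_p


# Toy demonstration: straight line at constant velocity 0.1 m/s along x.
dt = 0.01
t = np.arange(100) * dt
demo = np.outer(t, [0.1, 0.0, 0.0])
attractors = position_attractors(demo, dt, k_p=500.0, k_v=50.0)
lead = attractors - demo  # how far the attractor runs ahead of the motion
```

For constant velocity the acceleration is zero, so the attractor simply leads the recorded point by $k^V \dot{x}^p / k^P$ along the direction of motion. Under the same reasoning, and assuming the quaternion exponential map is the inverse of the logarithm used in Eq. (2), the orientation attractor would be $\hat{x}^o = \exp\big(\tfrac{1}{2}(K^O)^{-1}(\ddot{x}^o + K^W \dot{x}^o)\big) * x^o$.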

VI. The task-parametrized Gaussian mixture model (TP-GMM) is an extension of the classic Gaussian mixture model in which several frames of reference are considered to describe the robot behavior [7]. It accounts for the effects of different task parameters, whereas a standard GMM averages over all trajectories. The demonstrations in the world frame can be observed from different coordinate frames (Fig. 3), each of which captures the local variations from the point of view of that specific frame, i.e., the effect of that frame on the final movement. Each local model is realized as a mixture of Gaussian distributions. The component-wise product of the local GMMs, after they are transformed into the world frame, yields a compromise between all of them.
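The compromise produced by the component-wise product can be made concrete for a single Gaussian component. The sketch below assumes each frame is given as a pair (A, b) mapping local to world coordinates, which is the usual convention in the TP-GMM literature [7]; the function name and the example numbers are illustrative.

```python
import numpy as np


def gaussian_product_in_world(frames, local_means, local_covs):
    """Combine one Gaussian component seen from several frames.

    frames:      list of (A, b) with world = A @ local + b
    local_means: list of mean vectors, one per frame
    local_covs:  list of covariance matrices, one per frame
    """
    dim = len(local_means[0])
    precision_sum = np.zeros((dim, dim))
    weighted_mean_sum = np.zeros(dim)
    for (A, b), mu, sigma in zip(frames, local_means, local_covs):
        mu_world = A @ mu + b            # transform the local Gaussian ...
        sigma_world = A @ sigma @ A.T    # ... into the world frame
        precision = np.linalg.inv(sigma_world)
        precision_sum += precision
        weighted_mean_sum += precision @ mu_world
    cov = np.linalg.inv(precision_sum)
    mean = cov @ weighted_mean_sum
    return mean, cov


# Frame 1 at the origin is precise along x, frame 2 at (1, 1) is precise along y.
frames = [(np.eye(2), np.zeros(2)), (np.eye(2), np.array([1.0, 1.0]))]
local_means = [np.zeros(2), np.zeros(2)]
local_covs = [np.diag([0.01, 1.0]), np.diag([1.0, 0.01])]
mean, cov = gaussian_product_in_world(frames, local_means, local_covs)
```

The resulting mean lies close to x = 0 (where frame 1 is confident) and close to y = 1 (where frame 2 is confident): each frame dominates along the directions in which its local model has low variance.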

Fig. 4. Apple-reaching: (a) Frames of reference in apple reaching. (b) The gray lines and the green arrows show the collected movements and apple poses, respectively. (c) A sample reproduced position and orientation of the end-effector given one of the demonstrations. The red line and arrows are the reproduced trajectory and orientations (plotted every 8 time-steps). The red ellipsoids are the retrieved GMM corresponding to the position attractors.

The TP-GMM is a set of local Gaussian Mixture Models, one defined and optimized at each frame of reference, that share common mixing coefficients and have an equal number of Gaussian components. The parameters of a model with K components and P frames are defined by $\{\pi_i, \{\mu_i^{(j)}, \Sigma_i^{(j)}\}_{j=1}^{P}\}_{i=1}^{K}$, where $\pi_i$ is the mixing coefficient of the ith Gaussian component and $\mu_i^{(j)}$ and $\Sigma_i^{(j)}$ are the center and covariance matrix of the ith Gaussian component locally defined at frame j. The parameters are learned by maximizing the log-likelihood function, which results in an iterative Expectation-Maximization (EM) algorithm.
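For reference, the EM updates take the following form in the formulation of the tutorial [7]. The symbols $X_t^{(j)}$ (datapoint $t$ observed from frame $j$), $A_{t,j}$, $b_{t,j}$ (orientation and origin of frame $j$), $h_{t,i}$ (responsibilities) and $N$ (number of datapoints) do not appear in the text above and are introduced here as assumptions about notation.

```latex
% Data observed from frame j
X_t^{(j)} = A_{t,j}^{-1}\,(\xi_t - b_{t,j})

% E-step: responsibility of component i for datapoint t
h_{t,i} = \frac{\pi_i \prod_{j=1}^{P} \mathcal{N}\!\big(X_t^{(j)} \mid \mu_i^{(j)}, \Sigma_i^{(j)}\big)}
               {\sum_{k=1}^{K} \pi_k \prod_{j=1}^{P} \mathcal{N}\!\big(X_t^{(j)} \mid \mu_k^{(j)}, \Sigma_k^{(j)}\big)}

% M-step: shared mixing coefficients, per-frame centers and covariances
\pi_i = \frac{\sum_{t=1}^{N} h_{t,i}}{N}, \qquad
\mu_i^{(j)} = \frac{\sum_{t=1}^{N} h_{t,i}\, X_t^{(j)}}{\sum_{t=1}^{N} h_{t,i}}, \qquad
\Sigma_i^{(j)} = \frac{\sum_{t=1}^{N} h_{t,i}\,\big(X_t^{(j)} - \mu_i^{(j)}\big)\big(X_t^{(j)} - \mu_i^{(j)}\big)^{\top}}
                      {\sum_{t=1}^{N} h_{t,i}}
```

The product over frames in the E-step is what ties the local models together: a datapoint is assigned to a component only if it is plausible from the point of view of every frame.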

In the reproduction phase, given a set of new frames and the learned model, a new GMM can be generated. Gaussian Mixture Regression (GMR) then retrieves the output part of the new datapoint (in our case, new position and orientation attractors). The position and orientation data are finally obtained by substituting the attractors into equations (1) and (2) and integrating them.
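The retrieval step can be sketched as standard Gaussian Mixture Regression, i.e. conditioning each Gaussian on the input and blending the conditional means by the input likelihoods. This is a generic sketch that returns only the conditional mean; the function names and the convention that input dimensions come first are assumptions, and the toy mixtures below are not the learned model.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal at x."""
    diff = np.atleast_1d(x - mean)
    cov = np.atleast_2d(cov)
    norm = np.sqrt((2.0 * np.pi) ** diff.size * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)


def gmr(x_in, priors, means, covs, n_in):
    """Gaussian Mixture Regression: expected output given the input part.

    Each mean/cov is ordered as [input dims, output dims]; n_in is the
    number of input dimensions (e.g. 1 when the input is time).
    """
    x_in = np.atleast_1d(np.asarray(x_in, dtype=float))
    weights, cond_means = [], []
    for pi, mu, sigma in zip(priors, means, covs):
        mu_i, mu_o = mu[:n_in], mu[n_in:]
        s_ii = sigma[:n_in, :n_in]
        s_oi = sigma[n_in:, :n_in]
        weights.append(pi * gaussian_pdf(x_in, mu_i, s_ii))
        cond_means.append(mu_o + s_oi @ np.linalg.solve(s_ii, x_in - mu_i))
    weights = np.array(weights) / np.sum(weights)
    return sum(w * m for w, m in zip(weights, cond_means))


# One component: the result is the usual conditional mean of a Gaussian.
single = gmr(2.0, [1.0], [np.array([0.0, 1.0])],
             [np.array([[1.0, 0.5], [0.5, 1.0]])], n_in=1)

# Two symmetric components queried halfway between them.
blend = gmr(0.0, [0.5, 0.5],
            [np.array([-1.0, -1.0]), np.array([1.0, 1.0])],
            [np.eye(2), np.eye(2)], n_in=1)
```

With time as the input and the attractor coordinates as the output, calling `gmr` at each time step would yield the attractor trajectory that is then fed into equations (1) and (2).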

EXPERIMENTS

VII. The real BHA robot with 9 DOF is used in an apple-picking experiment consisting of two parts: (1) approaching an apple and grasping it (apple-reaching) and (2) approaching a basket and putting the apple into it (apple-picking). The grasping is realized by closing the robot’s compliant gripper, which adaptively wraps around the object.

A. Reaching an apple

Fig. 5. Apple-picking: (a) An example position of the apple and basket frames. (b) 15 demonstrations in which the pink and green arrows represent the different positions of the apple and basket frames. (c) A sample reproduced end-effector position for one of the demonstrations (red line).

In this part, given the position and orientation of a hanging apple, the robot must align its end-effector with the apple’s orientation, then approach it and finally grab it by closing its gripper (Fig. 4). The 24 demonstrations recorded for this part of the experiment each include 100 samples of time, end-effector position and end-effector orientation in unit quaternions (D = 8), together with fixed start and end frames of reference, i.e., $\xi \in \mathbb{R}^{8\times2400}$ after concatenation of the demonstrations. The virtual attractors for position and orientation are calculated with the stiffness and damping gains set to $k^P = 500$, $k^V = 50$ and $k^O = 250$, $k^W = 25$, respectively (over-damped).
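The effect of the reported position gains can be checked with a small simulation of Eq. (1) for a virtual unit mass. This is a sketch under assumptions: the gains are treated as scalars, the time step of 0.01 s and the explicit-Euler scheme are illustrative choices, and the attractor is held fixed instead of following a learned attractor trajectory.

```python
import numpy as np


def integrate_position(attractors, x0, dt, k_p, k_v):
    """Explicit-Euler integration of Eq. (1) for a virtual unit mass."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    path = [x.copy()]
    for x_hat in attractors:
        acc = k_p * (x_hat - x) - k_v * v  # Eq. (1) with scalar gains
        x = x + dt * v
        v = v + dt * acc
        path.append(x.copy())
    return np.array(path)


# Fixed attractor 1 m away along each axis; position gains as reported in the text.
goal = np.ones(3)
path = integrate_position([goal] * 200, x0=np.zeros(3), dt=0.01, k_p=500.0, k_v=50.0)

# Damping ratio of the position system for a unit mass.
damping_ratio = 50.0 / (2.0 * np.sqrt(500.0))
```

For the position gains the damping ratio is above one, and the simulated mass approaches the attractor without overshooting it, which matches the over-damped behavior stated for the position system.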

B. Picking the apple

Fig. 5 shows 15 collected demonstrations with 4 dimensions (time and end-effector position), i.e., $\xi \in \mathbb{R}^{4\times1500}$. We chose not to consider the orientation of the end-effector because the apple can be placed in the basket without encoding the orientation.
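The shape of the data matrix follows directly from concatenating the demonstrations column-wise. A short sketch, assuming 100 samples per demonstration as in the apple-reaching part and using random placeholder positions instead of recorded motions:

```python
import numpy as np

n_demos, n_samples = 15, 100
rng = np.random.default_rng(0)
time = np.linspace(0.0, 1.0, n_samples)

demos = []
for _ in range(n_demos):
    positions = rng.normal(size=(n_samples, 3))       # placeholder data, not recorded motions
    demos.append(np.column_stack([time, positions]))  # (100, 4): [t, x, y, z]

# Concatenate all demonstrations along the sample axis: one column per datapoint.
xi = np.hstack([demo.T for demo in demos])
```

Here `xi` has 4 rows (time and position) and 15 x 100 = 1500 columns; adding the four quaternion components per sample and using 24 demonstrations gives the 8 x 2400 matrix of the apple-reaching part in the same way.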

The generalization capability of the proposed approach was examined successfully in practice on the real BHA robot by providing different poses of the apple and different positions of the basket. We present a few cases in Fig. 6. The complementary videos can be found here: https://goo.gl/wnCX1o. The position of the basket frame is updated at each time-step, so if the basket moves during the experiment, the robot is able to follow: the new frame is fed into the model, which produces a new end-effector trajectory.

The hybrid kinematics model, improved by error learning (III), is then utilized to command the robot precisely. In our experiment the proposed model was able to produce suitable and smooth movements between the points.

Fig. 6. Different poses in apple reaching: (top row) The apples in different poses. The learned model was used successfully to reach apples with different positions and orientations. (bottom row) The basket in different positions. The model for the second part of the experiment was examined in multiple situations.

REFERENCES

[1] T. George Thuruthel, Y. Ansari, E. Falotico, and C. Laschi, “Control strategies for soft robotic manipulators: A survey,” Soft Robotics, vol. 5, no. 2, pp. 149–163, 2018.

[2] R. Reinhart, Z. Shareef, and J. Steil, “Hybrid analytical and data-driven modeling for feed-forward robot control,” Sensors, vol. 17, no. 2, 2017.

[3] J. F. Queißer, K. Neumann, M. Rolf, R. F. Reinhart, and J. J. Steil, “An active compliant control mode for interaction with a pneumatic soft robot,” in IEEE/RSJ Intl Conf. on Intelligent Robots and Systems, 2014.

[4] M. Rolf and J. J. Steil, “Efficient exploratory learning of inverse kinematics on a bionic elephant trunk,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 6, pp. 1147–1160, 2014.

[5] S. Calinon, Z. Li, T. Alizadeh, N. G. Tsagarakis, and D. G. Caldwell, “Statistical dynamical systems for skills acquisition in humanoids,” in Proc. IEEE Intl Conf. on Humanoid Robots (Humanoids), Osaka, Japan, 2012, pp. 323–329.

[6] J. Silverio, L. Rozo, S. Calinon, and D. G. Caldwell, “Learning bimanual end-effector poses from demonstrations using task-parameterized dynamical systems,” in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), Hamburg, Germany, Sept.-Oct. 2015.

[7] S. Calinon, “A tutorial on task-parameterized movement learning and retrieval,” Intelligent Service Robotics, vol. 9, no. 1, pp. 1–29, 2016.
