

Toru Tamaki, Haoming Wang, Bisser Raytchev, Kazufumi Kaneda, Yukihiko Ushiyama: “Estimating the spin of a table tennis ball using inverse compositional image alignment,” Proc. of ICASSP 2012 (IEEE International Conference on Acoustics, Speech, and Signal Processing), pp. 1457–1460, Kyoto International Conference Center, Kyoto, Japan, March 25–30, 2012.


Estimating the spin of a table tennis ball using Inverse Compositional Image Alignment

Toru Tamaki, Haoming Wang, Bisser Raytchev, Kazufumi Kaneda, Yukihiko Ushiyama

Abstract

In this paper, we propose a method for estimating the rotational velocity of a table tennis ball with Inverse Compositional Image Alignment, ICIA [2]. Assuming orthogonal projection of the camera, we derive an update rule for the motion parameters. Because of the precomputation of the Hessian matrix in ICIA and the simplifying assumptions, the motion parameter estimation at each frame is very fast. We report evaluation results for measuring the spin in a synthesized sequence, as well as in real image sequences of table tennis rallies.

ICIA overview

We have two successive images I_1 and I_2, and let I_1(x) be the intensity value at location x = (x, y)^T in image I_1. A registration minimizes the sum of squared differences between corresponding intensities in I_1 and I_2.

In ICIA, the objective function f to be minimized is given in the following form:

    f = \sum_x \left| I_1(w(x; \Delta p)) - I_2(w(x; p)) \right|^2,    (1)

where w(x; p) is a warping function that gives the location to which a point x is moved by a motion parameter p. Note that w should be an identity mapping when the parameter is zero: w(x; 0) = x. ∆p is obtained at each iteration step, and then the ICIA update rule w(x; p) ◦ w(x; ∆p)^{-1} → w(x; p) is used. Here ◦ denotes a composition of two warps, which means that the warping functions should form a group: if w_1 and w_2 are valid warp functions, so is w_1 ◦ w_2.

∆p is calculated at each iteration step. By using the first-order Taylor expansion of the term I_1, f can be approximated as follows:

    f = \sum_x \left| I_1(x) + \nabla I_1(x) \frac{\partial w(x; 0)}{\partial p} \Delta p - I_2(w(x; p)) \right|^2.    (2)

To obtain the best update ∆p of the parameter, i.e., the one that gives the minimum of the function, we solve ∂f / ∂∆p = 0. Then, we have

    \Delta p = H^{-1} \sum_x \left[ \nabla I_1(x) \frac{\partial w(x; 0)}{\partial p} \right]^T \left[ I_2(w(x; p)) - I_1(x) \right],    (3)

where the Hessian H is given by

    H = \sum_x \left[ \nabla I_1(x) \frac{\partial w(x; 0)}{\partial p} \right]^T \left[ \nabla I_1(x) \frac{\partial w(x; 0)}{\partial p} \right].    (4)

The advantage of ICIA is that it avoids recomputing H at every iteration step. Because computing H is very time consuming, precomputation and reuse of H make the algorithm very fast.
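To make this structure concrete, the following is a minimal Python sketch of how eqs. (1)–(4) fit together: the gradient, the warp Jacobian, the steepest-descent images, and the Hessian are computed once, and only a warped error image and a small linear solve remain inside the loop. It is an illustration, not the authors' implementation; warp, warp_jacobian, and compose_with_inverse are hypothetical callbacks that a concrete parameterization, such as the warp of eq. (5) below, would have to supply.

```python
# Minimal sketch of the inverse compositional loop of eqs. (1)-(4); an
# illustration, not the authors' implementation. The warp-specific pieces
# (warp, warp_jacobian, compose_with_inverse) are hypothetical callbacks.
import numpy as np

def icia(I1, I2, p0, warp, warp_jacobian, compose_with_inverse,
         coords, n_iters=50, tol=1e-6):
    """Estimate the motion parameters p that align I1 and I2 (eq. (1)).

    coords is an N x 2 integer array of (x, y) pixel locations on the ball.
    """
    # --- Precomputation: done once, reused in every iteration ---
    gy, gx = np.gradient(I1.astype(float))             # image gradient of I1
    grad = np.stack([gx[coords[:, 1], coords[:, 0]],
                     gy[coords[:, 1], coords[:, 0]]], axis=1)   # N x 2
    J = warp_jacobian(coords)                           # N x 2 x P, dw(x;0)/dp
    sd = np.einsum('ni,nij->nj', grad, J)               # steepest-descent images, N x P
    H = sd.T @ sd                                       # Hessian, eq. (4)
    H_inv = np.linalg.inv(H)

    p = np.asarray(p0, dtype=float)
    for _ in range(n_iters):
        warped = warp(I2, coords, p)                    # I2(w(x; p)) sampled at coords
        err = warped - I1[coords[:, 1], coords[:, 0]]   # error image
        dp = H_inv @ (sd.T @ err)                       # eq. (3)
        p = compose_with_inverse(p, dp)                 # w(x;p) o w(x;dp)^-1 -> w(x;p)
        if np.linalg.norm(dp) < tol:
            break
    return p
```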

Design of the System

Assumptions

We assume that the camera is orthographic. This is a reasonable assumption because the ball is much smaller (40 mm in diameter) than the distance to the camera. In the experiments, the images are taken at a distance of 3 to 5 meters from a player. The ratio of the ball diameter to the distance is 1.3% to 0.8%, so the change of appearance of the ball on the image screen is less than 1% (we omit the derivation of this simple geometry due to the space limit). Therefore, the depth of the ball can be ignored.

The radius r of the ball is assumed to be given. In the experimental setup, the direction of the motion of the ball between the players is usually perpendicular to the optical axis of the camera. Hence, the change of appearance of the ball is negligible (this is also obvious from the discussion above), and it is reasonable to assume that r is constant for all frames and given in advance.

We also assume that the center location c = (c_x, c_y)^T of the ball in every frame is given by some other source, e.g., by circle detection with the Hough transform, or simply by user interaction.
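For reference, the diameter-to-distance ratios quoted above follow directly from the stated numbers:

    \frac{40\ \mathrm{mm}}{3\ \mathrm{m}} \approx 1.3\,\%, \qquad \frac{40\ \mathrm{mm}}{5\ \mathrm{m}} = 0.8\,\%.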

Motion parameters

We choose the angular velocities about three axes to parameterize the spin of the ball. The angular velocities are represented by the Euler angles α, β, γ between successive frames. Then, a rotation matrix R is constructed from the angles: R = R_α R_β R_γ. However, note that the angles are not accumulated over frames, but reset every frame to represent angular velocities (not angles) in order to avoid error accumulation.

As we consider the translation of the ball only in the image plane, translation is represented by a vector t = (t_x, t_y)^T. Because the direction of the ball's motion is approximately perpendicular to the optical axis and the depth of the ball is relatively small, as mentioned above, we can ignore the translation along the depth direction.

Therefore, the motion parameter vector p we use includes the following five parameters: p = (α, β, γ, t_x, t_y)^T.
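As a small illustration (not taken from the paper), the five parameters could be unpacked into R and t as follows. Assigning α, β, γ to rotations about the x, y, z axes is an assumption here, since the poster does not state which Euler convention is used.

```python
# Illustrative sketch of unpacking p = (alpha, beta, gamma, t_x, t_y) into
# R = R_alpha R_beta R_gamma and the in-plane translation t; the axis
# assignment is an assumption, not taken from the paper.
import numpy as np

def rotation_from_euler(alpha, beta, gamma):
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    R_alpha = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])   # about x
    R_beta = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])    # about y
    R_gamma = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])   # about z
    return R_alpha @ R_beta @ R_gamma

def unpack_params(p):
    alpha, beta, gamma, tx, ty = p
    return rotation_from_euler(alpha, beta, gamma), np.array([tx, ty])
```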

Ball shape and depth

Since the motion parameters represent three-dimensional rigid motion, we also use the depth of the ball to register the images. The depth is obtained as follows. We assume that the target is a sphere with radius r: x^2 + y^2 + z^2 = r^2. Rewriting this equation, we obtain an equation for the depth at location x on the ball:

    z(x) = \pm \sqrt{r^2 - (x^2 + y^2)}.

We simply use the positive values of the above equation. This makes the visibility test easy, because if the depth of a transformed point becomes negative, it is not visible to the camera.
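A minimal sketch of the depth equation and the visibility test, assuming numpy and coordinates already centered at the ball center:

```python
# Illustrative sketch (not the authors' code) of the sphere depth and the
# visibility test described above; x and y are assumed already centered at
# the ball center c.
import numpy as np

def depth(x, y, r):
    """Positive branch of z(x) = sqrt(r^2 - (x^2 + y^2)) inside the silhouette."""
    return np.sqrt(np.maximum(r * r - (x * x + y * y), 0.0))

def is_visible(z_after_motion):
    """A transformed point with negative depth faces away from the camera."""
    return z_after_motion >= 0.0
```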

Warping function

We define the warping function as follows:

    w(x; p) = P_o \left[ R \begin{pmatrix} x - c \\ z(x - c) \end{pmatrix} + \begin{pmatrix} t \\ 0 \end{pmatrix} \right] + c,    (5)

where z is obtained from the sphere equation, and P_o is an orthographic projection matrix. The warp applies a 3D rigid transformation (rotation and translation) to a 2D point x. First, x is centered at c (i.e., x - c), and the z component is added to make a 3D vector. Next, it is rotated by R and translated by t. Then, the 3D vector is projected to 2D by P_o. Finally, c is added to move it back.
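The warp of eq. (5) thus maps a 2D point through lift, rotate, translate, and project steps; a sketch under the same assumptions (numpy, a 3x3 rotation matrix R, 2D vectors x, c, t, and the radius r) might look like the following. It is illustrative only, not the authors' code.

```python
# Sketch of the warping function of eq. (5): lift a centered 2D point to 3D
# using the sphere depth, apply the rigid motion, and project back.
import numpy as np

P_o = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])               # orthographic projection matrix

def warp_point(x, R, t, c, r):
    """w(x; p) = P_o [ R (x - c, z(x - c))^T + (t, 0)^T ] + c."""
    xc = x - c                                   # center the 2D point at c
    z = np.sqrt(max(r * r - xc @ xc, 0.0))       # positive branch of the sphere depth
    X = np.array([xc[0], xc[1], z])              # lift to a 3D point
    X_moved = R @ X + np.array([t[0], t[1], 0.0])    # rigid motion (rotation + translation)
    return P_o @ X_moved + c                     # project to 2D and move back by c
```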

Update rule

An update rule in the form w(x; p) ◦ w(x; ∆p)^{-1} → w(x; p) is derived as follows. Let two 2D points and their corresponding 3D points be:

    x_1 = w(x; \Delta p) = P_o X_1,    (6)
    x_2 = w(x; p) = P_o X_2.    (7)

Now the composition of the two warps is obtained by writing x_2 in terms of x_1, traversing x_1, x, and then x_2. Here X_1 and X_2 are as follows:

    X_1 = \Delta R (X - C) + \Delta T + C,    (8)
    X_2 = R (X - C) + T + C,    (9)

where

    X = \begin{pmatrix} x \\ z(x - c) \end{pmatrix}, \quad C = \begin{pmatrix} c \\ 0 \end{pmatrix}, \quad T = \begin{pmatrix} t \\ 0 \end{pmatrix}.    (10)

∆R and ∆T are the parameters corresponding to ∆p. First, we have

    X = \Delta R^{-1} (X_1 - C - \Delta T) + C,    (11)

then, after substituting X in X_2,

    X_2 = (R \Delta R^{-1})(X_1 - C) + (T - R \Delta R^{-1} \Delta T) + C.    (12)

Thus, we obtain the updated motion parameters:

    R \leftarrow R \Delta R^{-1}, \quad t \leftarrow P_o (T - R \Delta R^{-1} \Delta T).    (13)
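A sketch of the resulting parameter update of eq. (13), assuming the estimated increment has already been converted to a rotation matrix ∆R and a 2D translation ∆t; again an illustration, not the authors' code.

```python
# Sketch of the compositional update of eq. (13): given the current (R, t)
# and the increment (dR, dt), update the warp as w(x;p) o w(x;dp)^-1 -> w(x;p).
import numpy as np

P_o = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])

def update_warp(R, t, dR, dt):
    """R <- R dR^-1,   t <- P_o (T - R dR^-1 dT)."""
    dR_inv = dR.T                                # inverse of a rotation is its transpose
    T = np.array([t[0], t[1], 0.0])              # T = (t, 0)^T
    dT = np.array([dt[0], dt[1], 0.0])           # Delta T = (dt, 0)^T
    return R @ dR_inv, P_o @ (T - R @ dR_inv @ dT)
```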

Experimental Results: Simulation

First we evaluated the accuracy of the estimated rotation angles for a synthesized image sequence with a texture-mapped 3D sphere model. At each frame, the ball rotates by ±5 degrees about either the Y axis (first 20 frames), the X axis (next 20 frames), or the Z axis (last 20 frames). Fig. 3 shows the estimated rotation angles. The total spin angle was computed by

    \cos^{-1}\!\left( \frac{\mathrm{tr}(\hat{R}) - 1}{2} \right),

where \hat{R} is the rotation matrix obtained from the estimated angles. The RMSE of the total spin angle over the 60 frames of the sequence is 0.0177 [deg], which is quite small and shows the accuracy of the proposed method.

Fig. 3: Estimated rotation angles for the synthesized sequence.
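The total spin angle above is the axis-angle magnitude of the estimated rotation matrix; a short illustrative sketch, assuming numpy:

```python
# Total spin angle theta = arccos((tr(R_hat) - 1) / 2), in degrees.
import numpy as np

def total_spin_angle_deg(R_hat):
    cos_theta = np.clip((np.trace(R_hat) - 1.0) / 2.0, -1.0, 1.0)  # clamp for safety
    return np.degrees(np.arccos(cos_theta))
```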

Experimental Results: Real sequences

Fig. 4: Sequences used in the experiments for different players.

Fig. 5: Estimated rpm and translations. (a) Spins in rpm (total, α, β, γ plotted against frame number). (b) 2D translations in pixels at each frame (t_x and t_y plotted against frame number).

Here we describe experimental results of spin estimation for real image sequences of table tennis rallies. Fig. 4 shows two image sequences of different players. Images were taken at 600 fps by a fixed high-speed camera with halogen lamps, mounted at the side of the player. An official 40 mm table tennis ball was randomly textured with marker pens. The angle parameters α, β, and γ were measured in radians per 1/600 second, because they were estimated from two successive frames. The angles were then converted to rotations per minute (rpm). The radius and center locations were obtained manually in these experiments.

Fig. 5(a) shows the estimated spins in rpm. The spins α, β, and γ are shown separately, while the total spin is also shown. Both sequences start one or two frames before the racket hits the ball. As can be seen, the spins are small in the first two frames, then increase rapidly at the third frame.

2D translations are shown in Fig. 5(b). The vertical axis represents the translation at each frame, i.e., the velocity of the ball. At the beginning of the sequences, the balls are falling vertically, then are hit by the racket. Therefore, t_y is large at first (because the y axis points downward), then t_x increases (the x axis points rightward) at the third frame.
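For clarity, the conversion from radians per frame at 600 fps to rpm amounts to dividing by 2π and multiplying by 600 frames/s and 60 s/min; a small illustrative sketch:

```python
# Unit conversion described above: rpm = angle_rad / (2*pi) * fps * 60.
import numpy as np

def rad_per_frame_to_rpm(angle_rad, fps=600.0):
    revolutions_per_frame = angle_rad / (2.0 * np.pi)
    return revolutions_per_frame * fps * 60.0   # frames/s * s/min

# For example, 0.5 rad between successive frames at 600 fps is about 2865 rpm.
print(round(rad_per_frame_to_rpm(0.5)))         # -> 2865
```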

Conclusions

We have proposed a method for measuring the spin of a table tennis ball with Inverse Compositional Image Alignment, under assumptions that are practical for this application. The prototype system implementing the proposed method is useful and is currently used for the training of players. Although in this paper we have focused specifically on table tennis, the proposed method has a large variety of potential applications in other sports too: football, baseball, golf, volleyball, and so on. Directions of future work to make the system more practical include automatic detection of the center and radius of the ball, and the use of temporal information over multiple frames.

References

[1] Toru Tamaki, Takahiko Sugino, Masanobu Yamamoto, “Measuring Ball Spin by Image Registration,” in Proc. of the 10th Korea-Japan Joint Workshop on Frontiers of Computer Vision (FCV2004), pp. 269–274, 2004.

[2] Simon Baker, Iain Matthews, “Lucas-Kanade 20 Years On: A Unifying Framework,” IJCV, 56(3), pp. 221–255, 2004.