

Toru Tamaki, Haoming Wang, Bisser Raytchev, Kazufumi Kaneda, Yukihiko Ushiyama: “Estimating the spin of a table tennis ball using inverse compositional image alignment,” Proc. of ICASSP 2012 (IEEE International Conference on Acoustics, Speech, and Signal Processing), pp. 1457–1460, Kyoto International Conference Center, Kyoto, Japan, March 25–30, 2012.


Estimating the spin of a table tennis ball using Inverse Compositional Image Alignment

Toru Tamaki, Haoming Wang, Bisser Raytchev, Kazufumi Kaneda, Yukihiko Ushiyama

Abstract

In this paper, we propose a method for estimating the rotational velocity of a table tennis ball with Inverse Compositional Image Alignment, ICIA [2]. Assuming orthogonal projection of the camera, we derive an update rule for the motion parameters. Because of the precomputation of the Hessian matrix in ICIA and the simplifying assumptions, the motion parameter estimation at each frame is very fast. We report evaluation results for measuring the spin in a synthesized sequence, as well as in real image sequences of table tennis rallies.

ICIA overview

We have two successive images I_1 and I_2, and let I_1(x) be the intensity value at location x = (x, y)^T in image I_1. A registration minimizes the sum of squared differences between corresponding intensities in I_1 and I_2.

In ICIA, the objective function f to be minimized is given in the following form:

    f = \sum_x \left| I_1(w(x; \Delta p)) - I_2(w(x; p)) \right|^2,    (1)

where w(x; p) is a warping function that gives the location to which a point x is moved by a motion parameter p. Note that w should be an identity mapping when the parameter is zero: w(x; 0) = x. ∆p is obtained at each iteration step, and then the ICIA update rule w(x; p) ◦ w(x; ∆p)^{-1} → w(x; p) is used. Here ◦ denotes a composition of two warps, which means that the warping functions should form a group: if w_1 and w_2 are valid warp functions, so is w_1 ◦ w_2.

∆p is calculated at each iteration step. By using the first-order Taylor expansion of the term I_1, f can be approximated as follows:

    f = \sum_x \left| I_1(x) + \nabla I_1(x) \frac{\partial w(x; 0)}{\partial p} \Delta p - I_2(w(x; p)) \right|^2.    (2)

To obtain the best update ∆p of the parameter, i.e., the one that gives the minimum of the function, we solve ∂f / ∂∆p = 0. Then, we have

    \Delta p = H^{-1} \sum_x \left[ \nabla I_1(x) \frac{\partial w(x; 0)}{\partial p} \right]^T \left[ I_2(w(x; p)) - I_1(x) \right],    (3)

where the Hessian H is given by

    H = \sum_x \left[ \nabla I_1(x) \frac{\partial w(x; 0)}{\partial p} \right]^T \left[ \nabla I_1(x) \frac{\partial w(x; 0)}{\partial p} \right].    (4)

The advantage of ICIA is that it avoids recomputing H at every iteration step. Because computing H is very time consuming, precomputation and reuse of H make the algorithm very fast.
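To make this structure concrete, the following is a minimal Python sketch of how eqs. (1)–(4) fit together: the gradient, the warp Jacobian, the steepest-descent images, and the Hessian are computed once, and only a warped error image and a small linear solve remain inside the loop. It is an illustration, not the authors' implementation; warp, warp_jacobian, and compose_with_inverse are hypothetical callbacks that a concrete parameterization, such as the warp of eq. (5) below, would have to supply.

```python
# Minimal sketch of the inverse compositional loop of eqs. (1)-(4); an
# illustration, not the authors' implementation. The warp-specific pieces
# (warp, warp_jacobian, compose_with_inverse) are hypothetical callbacks.
import numpy as np

def icia(I1, I2, p0, warp, warp_jacobian, compose_with_inverse,
         coords, n_iters=50, tol=1e-6):
    """Estimate the motion parameters p that align I1 and I2 (eq. (1)).

    coords is an N x 2 integer array of (x, y) pixel locations on the ball.
    """
    # --- Precomputation: done once, reused in every iteration ---
    gy, gx = np.gradient(I1.astype(float))             # image gradient of I1
    grad = np.stack([gx[coords[:, 1], coords[:, 0]],
                     gy[coords[:, 1], coords[:, 0]]], axis=1)   # N x 2
    J = warp_jacobian(coords)                           # N x 2 x P, dw(x;0)/dp
    sd = np.einsum('ni,nij->nj', grad, J)               # steepest-descent images, N x P
    H = sd.T @ sd                                       # Hessian, eq. (4)
    H_inv = np.linalg.inv(H)

    p = np.asarray(p0, dtype=float)
    for _ in range(n_iters):
        warped = warp(I2, coords, p)                    # I2(w(x; p)) sampled at coords
        err = warped - I1[coords[:, 1], coords[:, 0]]   # error image
        dp = H_inv @ (sd.T @ err)                       # eq. (3)
        p = compose_with_inverse(p, dp)                 # w(x;p) o w(x;dp)^-1 -> w(x;p)
        if np.linalg.norm(dp) < tol:
            break
    return p
```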

Design of the System

Assumptions

We assume that the camera is orthographic. This is a reasonable assumption because the ball is much smaller (40 mm in diameter) than the distance to the camera. In the experiments, the images are taken at a distance of 3 to 5 meters from a player. The ratio of the ball diameter to the distance is 1.3% to 0.8%, so the change of appearance of the ball on the image screen is less than 1% (we omit the derivation of this simple geometry due to the space limit). Therefore, the depth of the ball can be ignored.

The radius r of the ball is assumed to be given. In the experimental setup, the direction of the motion of the ball between the players is usually perpendicular to the optical axis of the camera. Hence, the change of appearance of the ball is negligible (this is also obvious from the discussion above), and it is reasonable to assume that r is constant for all frames and given in advance.

We also assume that the center location c = (c_x, c_y)^T of the ball in every frame is given by some other source, e.g., by circle detection with the Hough transform, or simply by user interaction.
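For reference, the diameter-to-distance ratios quoted above follow directly from the stated numbers:

    \frac{40\ \mathrm{mm}}{3\ \mathrm{m}} \approx 1.3\,\%, \qquad \frac{40\ \mathrm{mm}}{5\ \mathrm{m}} = 0.8\,\%.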

Motion parameters

We choose the angular velocities about three axes to parameterize the spin of the ball. The angular velocities are represented by the Euler angles α, β, γ between successive frames. Then, a rotation matrix R is constructed from the angles: R = R_α R_β R_γ. However, note that the angles are not accumulated over frames, but reset every frame to represent angular velocities (not angles) in order to avoid error accumulation.

As we consider the translation of the ball only in the image plane, translation is represented by a vector t = (t_x, t_y)^T. Because the direction of the ball's motion is approximately perpendicular to the optical axis and the depth of the ball is relatively small, as mentioned above, we can ignore the translation along the depth direction.

Therefore, the motion parameter vector p we use includes the following five parameters: p = (α, β, γ, t_x, t_y)^T.
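As a small illustration (not taken from the paper), the five parameters could be unpacked into R and t as follows. Assigning α, β, γ to rotations about the x, y, z axes is an assumption here, since the poster does not state which Euler convention is used.

```python
# Illustrative sketch of unpacking p = (alpha, beta, gamma, t_x, t_y) into
# R = R_alpha R_beta R_gamma and the in-plane translation t; the axis
# assignment is an assumption, not taken from the paper.
import numpy as np

def rotation_from_euler(alpha, beta, gamma):
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    R_alpha = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])   # about x
    R_beta = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])    # about y
    R_gamma = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])   # about z
    return R_alpha @ R_beta @ R_gamma

def unpack_params(p):
    alpha, beta, gamma, tx, ty = p
    return rotation_from_euler(alpha, beta, gamma), np.array([tx, ty])
```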

Ball shape and depth

Since the motion parameters represent three-dimensional rigid motion, we also use the depth of the ball to register the images. The depth is obtained as follows. We assume that the target is a sphere with radius r: x^2 + y^2 + z^2 = r^2. Rewriting this equation, we obtain an equation for the depth at location x on the ball:

    z(x) = \pm \sqrt{r^2 - (x^2 + y^2)}.

We simply use the positive values of the above equation. This makes the visibility test easy, because if the depth of a transformed point becomes negative, it is not visible to the camera.
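A minimal sketch of the depth equation and the visibility test, assuming numpy and coordinates already centered at the ball center:

```python
# Illustrative sketch (not the authors' code) of the sphere depth and the
# visibility test described above; x and y are assumed already centered at
# the ball center c.
import numpy as np

def depth(x, y, r):
    """Positive branch of z(x) = sqrt(r^2 - (x^2 + y^2)) inside the silhouette."""
    return np.sqrt(np.maximum(r * r - (x * x + y * y), 0.0))

def is_visible(z_after_motion):
    """A transformed point with negative depth faces away from the camera."""
    return z_after_motion >= 0.0
```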

Warping function

We define the warping function as follows:

    w(x; p) = P_o \left[ R \begin{pmatrix} x - c \\ z(x - c) \end{pmatrix} + \begin{pmatrix} t \\ 0 \end{pmatrix} \right] + c,    (5)

where z is obtained from the sphere equation, and P_o is an orthographic projection matrix. The warp applies a 3D rigid transformation (rotation and translation) to a 2D point x. First, x is centered at c (i.e., x - c), and the z component is added to make a 3D vector. Next, it is rotated by R and translated by t. Then, the 3D vector is projected to 2D by P_o. Finally, c is added to move it back.
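The warp of eq. (5) thus maps a 2D point through lift, rotate, translate, and project steps; a sketch under the same assumptions (numpy, a 3x3 rotation matrix R, 2D vectors x, c, t, and the radius r) might look like the following. It is illustrative only, not the authors' code.

```python
# Sketch of the warping function of eq. (5): lift a centered 2D point to 3D
# using the sphere depth, apply the rigid motion, and project back.
import numpy as np

P_o = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])               # orthographic projection matrix

def warp_point(x, R, t, c, r):
    """w(x; p) = P_o [ R (x - c, z(x - c))^T + (t, 0)^T ] + c."""
    xc = x - c                                   # center the 2D point at c
    z = np.sqrt(max(r * r - xc @ xc, 0.0))       # positive branch of the sphere depth
    X = np.array([xc[0], xc[1], z])              # lift to a 3D point
    X_moved = R @ X + np.array([t[0], t[1], 0.0])    # rigid motion (rotation + translation)
    return P_o @ X_moved + c                     # project to 2D and move back by c
```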

Update rule

An update rule in the form w(x; p) ◦ w(x; ∆p)^{-1} → w(x; p) is derived as follows. Let two 2D points and their corresponding 3D points be:

    x_1 = w(x; \Delta p) = P_o X_1,    (6)
    x_2 = w(x; p) = P_o X_2.    (7)

Now the composition of the two warps is obtained by writing x_2 in terms of x_1, traversing x_1, x, and then x_2. Here X_1 and X_2 are as follows:

    X_1 = \Delta R (X - C) + \Delta T + C,    (8)
    X_2 = R (X - C) + T + C,    (9)

where

    X = \begin{pmatrix} x \\ z(x - c) \end{pmatrix}, \quad C = \begin{pmatrix} c \\ 0 \end{pmatrix}, \quad T = \begin{pmatrix} t \\ 0 \end{pmatrix}.    (10)

∆R and ∆T are the parameters corresponding to ∆p. First, we have

    X = \Delta R^{-1} (X_1 - C - \Delta T) + C,    (11)

then, after substituting X in X_2,

    X_2 = (R \Delta R^{-1})(X_1 - C) + (T - R \Delta R^{-1} \Delta T) + C.    (12)

Thus, we obtain the updated motion parameters:

    R \leftarrow R \Delta R^{-1}, \quad t \leftarrow P_o (T - R \Delta R^{-1} \Delta T).    (13)
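A sketch of the resulting parameter update of eq. (13), assuming the estimated increment has already been converted to a rotation matrix ∆R and a 2D translation ∆t; again an illustration, not the authors' code.

```python
# Sketch of the compositional update of eq. (13): given the current (R, t)
# and the increment (dR, dt), update the warp as w(x;p) o w(x;dp)^-1 -> w(x;p).
import numpy as np

P_o = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])

def update_warp(R, t, dR, dt):
    """R <- R dR^-1,   t <- P_o (T - R dR^-1 dT)."""
    dR_inv = dR.T                                # inverse of a rotation is its transpose
    T = np.array([t[0], t[1], 0.0])              # T = (t, 0)^T
    dT = np.array([dt[0], dt[1], 0.0])           # Delta T = (dt, 0)^T
    return R @ dR_inv, P_o @ (T - R @ dR_inv @ dT)
```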

Experimental Results: Simulation

First we evaluated the accuracy of the estimated rotation angles for a synthesized image sequence with a texture-mapped 3D sphere model. At each frame, the ball rotates by ±5 degrees about either the Y axis (first 20 frames), the X axis (next 20 frames), or the Z axis (last 20 frames). Fig. 3 shows the estimated rotation angles. The total spin angle was computed by

    \cos^{-1}\!\left( \frac{\mathrm{tr}(\hat{R}) - 1}{2} \right),

where \hat{R} is the rotation matrix obtained from the estimated angles. The RMSE of the total spin angle over the 60 frames of the sequence is 0.0177 [deg], which is quite small and shows the accuracy of the proposed method.

Fig. 3: Estimated rotation angles for the synthesized sequence.
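The total spin angle above is the axis-angle magnitude of the estimated rotation matrix; a short illustrative sketch, assuming numpy:

```python
# Total spin angle theta = arccos((tr(R_hat) - 1) / 2), in degrees.
import numpy as np

def total_spin_angle_deg(R_hat):
    cos_theta = np.clip((np.trace(R_hat) - 1.0) / 2.0, -1.0, 1.0)  # clamp for safety
    return np.degrees(np.arccos(cos_theta))
```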

Experimental Results: Real sequences

Fig. 4: Sequences used in the experiments for different players.

Fig. 5: Estimated rpm and translations. (a) Spins in rpm (total, α, β, γ plotted against frame number). (b) 2D translations in pixels at each frame (t_x and t_y plotted against frame number).

Here we describe experimental results of spin estimation for real image sequences of table tennis rallies. Fig. 4 shows two image sequences of different players. Images were taken at 600 fps by a fixed high-speed camera with halogen lamps, mounted at the side of the player. An official 40 mm table tennis ball was randomly textured with marker pens. The angle parameters α, β, and γ were measured in radians per 1/600 second, because they were estimated from two successive frames. The angles were then converted to rotations per minute (rpm). The radius and center locations were obtained manually in these experiments.

Fig. 5(a) shows the estimated spins in rpm. The spins α, β, and γ are shown separately, while the total spin is also shown. Both sequences start one or two frames before the racket hits the ball. As can be seen, the spins are small in the first two frames, then increase rapidly at the third frame.

2D translations are shown in Fig. 5(b). The vertical axis represents the translation at each frame, i.e., the velocity of the ball. At the beginning of the sequences, the balls are falling vertically, then are hit by the racket. Therefore, t_y is large at first (because the y axis points downward), then t_x increases (the x axis points rightward) at the third frame.
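For clarity, the conversion from radians per frame at 600 fps to rpm amounts to dividing by 2π and multiplying by 600 frames/s and 60 s/min; a small illustrative sketch:

```python
# Unit conversion described above: rpm = angle_rad / (2*pi) * fps * 60.
import numpy as np

def rad_per_frame_to_rpm(angle_rad, fps=600.0):
    revolutions_per_frame = angle_rad / (2.0 * np.pi)
    return revolutions_per_frame * fps * 60.0   # frames/s * s/min

# For example, 0.5 rad between successive frames at 600 fps is about 2865 rpm.
print(round(rad_per_frame_to_rpm(0.5)))         # -> 2865
```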

Conclusions

We have proposed a method for measuring the spin of a table tennis ball with Inverse Compositional Image Alignment, under assumptions that are practical for this application. The prototype system implementing the proposed method is useful and is currently used for the training of players. Although in this paper we have focused specifically on table tennis, the proposed method has a large variety of potential applications in other sports too: football, baseball, golf, volleyball, and so on. Directions of future work to make the system more practical include automatic detection of the center and radius of the ball, and the use of temporal information over multiple frames.

References

[1] Toru Tamaki, Takahiko Sugino, Masanobu Yamamoto, “Measuring Ball Spin by Image Registration,” in Proc. of the 10th Korea-Japan Joint Workshop on Frontiers of Computer Vision (FCV2004), pp. 269–274, 2004.

[2] Simon Baker, Iain Matthews, “Lucas-Kanade 20 Years On: A Unifying Framework,” IJCV, 56(3), pp. 221–255, 2004.