    Real-time Feature-based Video Mosaicing at 500 fps

Ken-ichi Okumura, Sushil Raut, Qingyi Gu, Tadayoshi Aoyama, Takeshi Takaki, and Idaku Ishii

K. Okumura, S. Raut, Q. Gu, T. Aoyama, T. Takaki, and I. Ishii are with Hiroshima University, Hiroshima 739-8527, Japan (corresponding author: I. Ishii; Tel: +81-82-424-7692; e-mail: [email protected]).

Abstract: We conducted high-frame-rate (HFR) video mosaicing for real-time synthesis of a panoramic image by implementing an improved feature-based video mosaicing algorithm on a field-programmable gate array (FPGA)-based high-speed vision platform. In the implementation of the mosaicing algorithm, feature point extraction was accelerated by implementing a parallel processing circuit module for Harris corner detection in the FPGA on the high-speed vision platform. Feature point correspondence matching can be executed for hundreds of selected feature points in the current frame by searching for them in the previous frame within their neighbor ranges, assuming that frame-to-frame image displacement becomes considerably smaller in HFR vision. The system we developed can mosaic 512×512 images at 500 fps into a single synthesized image in real time by stitching the images based on their estimated frame-to-frame changes in displacement and orientation. The results of an experiment, in which an outdoor scene was captured using a hand-held camera head that was quickly moved by hand, verify the performance of our system.

I. INTRODUCTION

Video mosaicing is a video processing technique in which multiple images are merged into a composite image that covers a larger, more seamless view than the field of view of the camera. Video mosaicing has been used to generate panoramic pictures in many applications such as tracking, surveillance, and augmented reality. Most video mosaicing methods [1] determine the transform parameters between images by calculating feature points using feature extractors such as the KLT tracker [2] and SIFT features [3] to match feature point correspondences between images. Many real-time video mosaicing approaches for fast image registration have been proposed. Kourogi et al. proposed a fast image registration method that uses pseudo motion vectors estimated as gradient-based optical flow [4]. Civera et al. developed a drift-free video mosaicing system that obtains a consistent image from 320×240 images at 30 fps using an EKF-SLAM approach when previously viewed scenes are revisited [5]. de Souza et al. performed real-time video mosaicing without over-deformation of mosaics using a non-rigid deformation model [6]. Most of these proposals only apply to standard videos at relatively low frame rates (e.g., NTSC 30 fps). Further, the camera has to be operated slowly, without any large displacement between frames, in order to prevent difficulties in matching feature point correspondences. However, in scenarios involving rapid camera motion, video images at such low frame rates are insufficient for tracking feature points in the images. Thus, there is a demand for real-time high-frame-rate (HFR) video mosaicing at rates of several hundred frames per second and greater.

Many field-programmable gate array (FPGA)-based HFR vision systems have been developed for real-time video processing at hundreds of frames per second and greater [7]-[9]. Although HFR vision systems can simultaneously calculate scalar image features, they cannot transfer the entire image at high speed to a personal computer (PC) owing to the limitations in the transfer speed of their inner buses. We recently developed IDP Express [10], an HFR vision platform that can simultaneously process an HFR video and directly map it onto memory allocated in a PC. Implementing a real-time function to extract feature points in images and match their correspondences between frames on such an FPGA-based high-speed vision platform would make it possible to accelerate video mosaicing to a frame rate higher than 30 fps, even when the camera moves quickly.

In this study, we developed a real-time video mosaicing system that operates on 512×512 images at 500 fps. The system can simultaneously extract and match feature points in images to estimate frame-to-frame displacement, and stitch the captured images into a panoramic image that gives a seamless, wider field of view.

II. ALGORITHM

Most video mosaicing algorithms are realized by executing the following processes [1]: (1) feature point detection, (2) feature point matching, (3) transform estimation, and (4) image composition. In many applications, image composition at dozens of frames per second is sufficient for the human eye to monitor a panoramic image. However, feature tracking at dozens of frames per second is not always accurate when the camera is moving rapidly, because most feature points are mistracked when image displacements between frames become large. In HFR vision, the frame-to-frame displacement of feature points becomes considerably smaller. Narrowing the search range by assuming an HFR reduces the inaccuracy and computational load of matching feature points between frames even when hundreds of feature points are observed from a rapidly moving camera, whereas image composition at an HFR requires a heavy computational load.

We introduce a real-time video mosaicing algorithm that resolves this trade-off between tracking accuracy and computational load in video mosaicing for rapid camera motion. In our algorithm, steps (1)-(3) are executed at an HFR on an HFR vision platform, and step (4) is executed at dozens of frames per second to enable the human eye to monitor the panoramic image. In this study, we focus on the acceleration of steps (1)-(3) for HFR feature point tracking at hundreds of frames per second or more on the basis of the following concepts:

    (a) Feature point detection accelerated by hardware logic

Generally, feature point detection requires pixel-level computation of the order O(N^2), where N^2 is the image size. Gradient-based feature point detection is suitable for acceleration by hardware logic because the local calculation of brightness gradients can be easily parallelized as pixel-level computation. In this study, we accelerate gradient-based feature point detection by implementing its parallel circuit logic on an FPGA-based HFR vision platform.

    (b) Reduction in the number of feature points

Thousands of feature points are not always required for estimating transform parameters between images in video mosaicing. The computational load can be reduced by selecting a smaller number of feature points, because feature-level computation of the order O(M^2) is generally required in feature point matching, where M is the number of selected feature points. In this study, the number of feature points used in feature point matching is reduced by excluding closely crowded feature points, as they often generate correspondence errors between images in feature point matching.

    (c) Narrowed search range by assuming HFR video

Feature point matching still requires heavy computation of the order O(M^2) if all the feature points must be matched with each other between frames, even when a reduced number of feature points is used. In an HFR video, it can be assumed that the frame-to-frame image displacement becomes considerably smaller, which allows a smaller search range to be used when searching for the points that correspond to points in the preceding frame. This narrowed search range reduces the computational load of matching feature points between frames to the order of O(M).

Our algorithm for real-time HFR video mosaicing consists of the following processes:

    (1) Feature point detection

    (1-a) Feature extraction

To extract feature points such as upper-left vertexes, we use the following brightness gradient matrix $C(\mathbf{x}, t)$:

$$C(\mathbf{x}, t) = \sum_{\mathbf{x}' \in N_a(\mathbf{x})} \begin{pmatrix} I_x'^2(\mathbf{x}', t) & I_x'(\mathbf{x}', t)\, I_y'(\mathbf{x}', t) \\ I_x'(\mathbf{x}', t)\, I_y'(\mathbf{x}', t) & I_y'^2(\mathbf{x}', t) \end{pmatrix}, \quad (1)$$

where $N_a(\mathbf{x})$ is the $a \times a$ adjacent area of pixel $\mathbf{x} = (x, y)$, and $I_x'(\mathbf{x}, t)$ and $I_y'(\mathbf{x}, t)$ indicate the following positive values of $I_x(\mathbf{x}, t)$ and $I_y(\mathbf{x}, t)$, respectively:

$$I_\xi'(\mathbf{x}, t) = \begin{cases} I_\xi(\mathbf{x}, t) & (I_\xi(\mathbf{x}, t) > 0) \\ 0 & (\text{otherwise}) \end{cases} \quad (\xi = x, y), \quad (2)$$

where $I_x(\mathbf{x}, t)$ and $I_y(\mathbf{x}, t)$ are the $x$ and $y$ differentials of the input image $I(\mathbf{x}, t)$ at pixel $\mathbf{x}$ at time $t$.

$\lambda(\mathbf{x}, t)$ is defined as a feature for feature tracking using the Harris corner detection [11] as follows:

$$\lambda(\mathbf{x}, t) = \det C(\mathbf{x}, t) - \kappa\, (\operatorname{Tr} C(\mathbf{x}, t))^2. \quad (3)$$

Here, $\kappa$ is a tunable sensitivity parameter, and values in the range 0.04-0.15 have been reported as feasible.
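For reference, the following Python sketch computes the feature map of Eqs. (1)-(3) in software. It is only an illustration of the mathematics, not the paper's FPGA implementation; the function name harris_feature_map and the use of SciPy's prewitt and uniform_filter are our assumptions, while the 3×3 window (a = 3) and κ = 0.0625 follow the values given later in Sec. III-B.

```python
import numpy as np
from scipy import ndimage


def harris_feature_map(gray, a=3, kappa=0.0625):
    """Software sketch of Eqs. (1)-(3): Harris response with positive-clipped gradients."""
    gray = gray.astype(np.float32)

    # x and y differentials of I(x, t), here via 3x3 Prewitt operators
    Ix = ndimage.prewitt(gray, axis=1)
    Iy = ndimage.prewitt(gray, axis=0)

    # Eq. (2): keep only the positive gradient values
    Ixp = np.maximum(Ix, 0.0)
    Iyp = np.maximum(Iy, 0.0)

    # Eq. (1): products of the clipped gradients summed over the a x a adjacent area
    Sxx = ndimage.uniform_filter(Ixp * Ixp, size=a) * a * a
    Syy = ndimage.uniform_filter(Iyp * Iyp, size=a) * a * a
    Sxy = ndimage.uniform_filter(Ixp * Iyp, size=a) * a * a

    # Eq. (3): lambda = det C - kappa * (Tr C)^2
    det_c = Sxx * Syy - Sxy * Sxy
    tr_c = Sxx + Syy
    return det_c - kappa * tr_c * tr_c
```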

    (1-b) Feature point detection

Thresholding is conducted for $\lambda(\mathbf{x}, t)$ with a threshold $T$ to obtain a map of feature points, $R(\mathbf{x}, t)$, as follows:

$$R(\mathbf{x}, t) = \begin{cases} 1 & (\lambda(\mathbf{x}, t) > T) \\ 0 & (\text{otherwise}) \end{cases}. \quad (4)$$

The number of feature points for which $R(\mathbf{x}, t) = 1$ is counted in the $p \times p$ adjacent area of $\mathbf{x}$ as follows:

$$P(\mathbf{x}, t) = \sum_{\mathbf{x}' \in N_p(\mathbf{x})} R(\mathbf{x}', t), \quad (5)$$

where $P(\mathbf{x}, t)$ indicates the density of the feature points.

    (1-c) Selection of feature points

To reduce the number of feature points, closely crowded feature points are excluded by counting the number of feature points in the neighborhood. The reduced set of feature points $R'(t)$ is calculated at time $t$ as follows:

$$R'(t) = \{\, \mathbf{x} \mid P(\mathbf{x}, t) \le P_0 \,\}, \quad (6)$$

where $P_0$ is a threshold used to sparsely select feature points. We assume that the number of feature points is less than $M$.
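A software sketch of Eqs. (4)-(6), continuing from the feature-map sketch above, might look as follows. The values of T and P0 and the cap max_points on M are illustrative assumptions; only p = 5 follows the value reported in Sec. III-B.

```python
import numpy as np
from scipy import ndimage


def select_feature_points(feature_map, T, p=5, P0=3, max_points=500):
    """Sketch of Eqs. (4)-(6): threshold the feature map and drop crowded points."""
    # Eq. (4): binary map R(x, t) of candidate feature points
    R = (feature_map > T).astype(np.float32)

    # Eq. (5): count P(x, t) of feature points in the p x p adjacent area
    P = np.rint(ndimage.uniform_filter(R, size=p) * p * p)

    # Eq. (6): keep only sparsely distributed feature points
    ys, xs = np.nonzero((R > 0) & (P <= P0))
    points = np.stack([xs, ys], axis=1)

    # Keep the number of selected points below M (here an arbitrary cap)
    return points[:max_points]
```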

    (2) Feature point matching

    (2-a) Template matching

To enable correspondence between feature points at the current time $t$ and those at the previous time $t_p = t - n\Delta t$, template matching is conducted for all the selected feature points in an image; $\Delta t$ is the shortest frame interval of the vision system, and $n$ is an integer. For template matching to match the $i$-th feature point at time $t_p$ belonging to $R'(t_p)$, $\mathbf{x}_i(t_p)$ $(1 \le i \le M)$, to the $i'$-th feature point at time $t$ belonging to $R'(t)$, $\mathbf{x}_{i'}(t)$ $(1 \le i' \le M)$, the sum of absolute differences (SAD) is calculated as follows:

$$E(i, i'; t, t_p) = \sum_{\boldsymbol{\delta} = (\delta_x, \delta_y) \in W_m} |I(\mathbf{x}_{i'}(t) + \boldsymbol{\delta}, t) - I(\mathbf{x}_i(t_p) + \boldsymbol{\delta}, t_p)|, \quad (7)$$

where $W_m$ is an $m \times m$ window for template matching. To reduce the number of mismatched points, $\bar{\mathbf{x}}(\mathbf{x}_i(t_p); t)$, which indicates the feature point at time $t$ corresponding to the $i$-th feature point $\mathbf{x}_i(t_p)$ at time $t_p$, and $\bar{\mathbf{x}}(\mathbf{x}_{i'}(t); t_p)$, which indicates the feature point at time $t_p$ corresponding to the $i'$-th feature point $\mathbf{x}_{i'}(t)$ at time $t$, are bidirectionally searched by selecting the feature points for which $E(i, i'; t, t_p)$ is at a minimum in their adjacent areas as follows:

$$\bar{\mathbf{x}}(\mathbf{x}_i(t_p); t) = \mathbf{x}_{i'(i)}(t) = \arg\min_{\mathbf{x}_{i'}(t) \in N_b(\mathbf{x}_i(t_p))} E(i, i'; t, t_p), \quad (8)$$

$$\bar{\mathbf{x}}(\mathbf{x}_{i'}(t); t_p) = \mathbf{x}_{i(i')}(t_p) = \arg\min_{\mathbf{x}_i(t_p) \in N_b(\mathbf{x}_{i'}(t))} E(i, i'; t, t_p), \quad (9)$$

where $i'(i)$ and $i(i')$ are determined as the index numbers of the feature point at time $t$ corresponding to $\mathbf{x}_i(t_p)$, and of the feature point at time $t_p$ corresponding to $\mathbf{x}_{i'}(t)$, respectively. A pair of feature points between times $t$ and $t_p$ is selected only when the corresponding feature points are mutually selected as the same points as follows:

$$\hat{\mathbf{x}}(\mathbf{x}_i(t_p); t) = \begin{cases} \bar{\mathbf{x}}(\mathbf{x}_i(t_p); t) & (i = i(i'(i))) \\ \emptyset & (\text{otherwise}) \end{cases}, \quad (10)$$

where $\emptyset$ indicates that no feature point at time $t$ corresponds to the $i$-th feature point $\mathbf{x}_i(t_p)$ at time $t_p$; the feature points at time $t$ that have no corresponding point at time $t_p$ are adjudged to be newly appearing feature points.

We assume that the image displacement between times $t$ and $t_p$ is small, so that the feature point $\mathbf{x}_{i'}(t)$ at time $t$ can be matched with a feature point at time $t_p$ within the $b \times b$ adjacent area of $\mathbf{x}_{i'}(t)$. The processes described in Eqs. (8)-(10) are conducted for all the feature points belonging to $R'(t_p)$ and $R'(t)$, and we can reduce the computational load of feature point matching to the order of $O(M)$ by setting a narrowed search range with a small value of $b$.

    (2-b) Frame interval selection

To avoid accumulated registration errors caused by small frame-to-frame displacements, the frame interval of feature point matching is adjusted for feature point matching at time $t + \Delta t$ using the following averaged distance among the corresponding feature points at times $t$ and $t_p$:

$$d(t; t_p) = \frac{\sum_{\mathbf{x}_i(t_p) \in Q(t; t_p)} |\hat{\mathbf{x}}(\mathbf{x}_i(t_p); t) - \mathbf{x}_i(t_p)|}{S(Q(t; t_p))}, \quad (11)$$

where $Q(t; t_p)$ is the set of feature points $\mathbf{x}_i(t_p)$ at time $t_p$ that satisfy $\hat{\mathbf{x}}(\mathbf{x}_i(t_p); t) \neq \emptyset$ $(i = 1, \dots, M)$, and $S(Q(t; t_p))$ is the number of elements belonging to $Q(t; t_p)$.

Depending on whether $d(t; t_p)$ is larger than $d_0$, the parameter $n$ is determined for feature point matching at time $t + \Delta t$; the frame interval of feature point matching is initially set to $\Delta t$.

(2-b-i) $d(t; t_p) \le d_0$: The image displacement between times $t$ and $t_p = t_p(t)$ is too small for feature point matching. To increase the frame-to-frame displacement at the next time $t + \Delta t$, the parameter $n(t + \Delta t)$ is incremented by one as follows:

$$n(t + \Delta t) = n(t) + 1. \quad (12)$$

This indicates that the feature points at time $t + \Delta t$ are matched with those at time $t_p(t + \Delta t) = t - n(t)\Delta t$. Without conducting any process in steps (3) and (4) below, return to step (1) for the next time $t + \Delta t$.

(2-b-ii) $d(t; t_p) > d_0$: To reset the frame-to-frame displacement at time $t + \Delta t$, $n(t + \Delta t)$ is set to one as follows:

$$n(t + \Delta t) = 1. \quad (13)$$

This indicates that the feature points at time $t + \Delta t$ are matched with those at time $t_p(t + \Delta t) = t$. Thereafter, go to steps (3) and (4); the selected pairs of feature points, $\mathbf{x}_{i'}(t)$ and $\bar{\mathbf{x}}(\mathbf{x}_{i'}(t); t_p)$, are used in steps (3) and (4).
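The frame-interval logic of Eqs. (11)-(13) reduces to a few lines; a sketch under the assumption of an illustrative threshold d0 = 2.0 pixels (the paper does not state its value here) is shown below. The second return value signals whether steps (3) and (4) should be run.

```python
import numpy as np


def update_frame_interval(matched_tp, matched_t, n, d0=2.0):
    """Sketch of Eqs. (11)-(13): adapt the matching frame interval n."""
    # Eq. (11): averaged displacement of the mutually matched feature points
    d = float(np.mean(np.linalg.norm(matched_t - matched_tp, axis=1)))

    if d <= d0:
        # Eq. (12): displacement too small; widen the interval and skip steps (3)-(4)
        return n + 1, False
    # Eq. (13): displacement large enough; reset the interval and run steps (3)-(4)
    return 1, True
```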

    (3) Transform estimation

Using the selected pairs of feature points, affine parameters between the two images at times $t$ and $t_p$ are estimated as follows:

$$\tilde{\mathbf{x}}_i^T(t) = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} \bar{\mathbf{x}}^T(\mathbf{x}_i(t); t_p) + \begin{pmatrix} a_5 \\ a_6 \end{pmatrix} \quad (14)$$

$$= A(t; t_p)\, \bar{\mathbf{x}}^T(\mathbf{x}_i(t); t_p) + \mathbf{b}^T(t; t_p), \quad (15)$$

where $a_j$ $(j = 1, \dots, 6)$ are the components of the transform matrix $A(t; t_p)$ and translation vector $\mathbf{b}(t; t_p)$ that express the affine transform relationship between the two images at times $t$ and $t_p$. In this study, they are estimated by minimizing the following Tukey's biweight evaluation functions, $E_1$ and $E_2$:

$$E_l = \sum_{\mathbf{x}_i(t_p) \in Q(t; t_p)} w_{li}\, d_{li}^2 \quad (l = 1, 2). \quad (16)$$

Deviations $d_{1i}$ and $d_{2i}$ are given as follows:

$$d_{1i} = |\tilde{x}_i - (a_1 x_i' + a_2 y_i' + a_5)|, \quad (17)$$

$$d_{2i} = |\tilde{y}_i - (a_3 x_i' + a_4 y_i' + a_6)|, \quad (18)$$

where $\bar{\mathbf{x}}(\mathbf{x}_i(t); t_p) = (x_i', y_i')$, and $\tilde{\mathbf{x}}_i(t) = (\tilde{x}_i, \tilde{y}_i)$ indicates the temporally estimated location of the $i$-th feature point at time $t$ in Tukey's biweight method. The following processes are iteratively executed to reduce estimation errors caused by distantly mismatched pairs of feature points. Here, the weights $w_{1i}$ and $w_{2i}$ are set to one and $\tilde{\mathbf{x}}_i(t)$ is set to $\mathbf{x}_i(t)$ as their initial values.

    (3-a) Affine parameter estimation

On the basis of the least squares method to minimize $E_1$ and $E_2$, the affine parameters $a_j$ $(j = 1, \dots, 6)$ are estimated by calculating the weighted product sums of the $xy$ coordinates of the selected feature points as follows:

$$\begin{pmatrix} a_1 \\ a_2 \\ a_5 \end{pmatrix} = \begin{pmatrix} \sum_i w_{1i} x_i'^2 & \sum_i w_{1i} x_i' y_i' & \sum_i w_{1i} x_i' \\ \sum_i w_{1i} x_i' y_i' & \sum_i w_{1i} y_i'^2 & \sum_i w_{1i} y_i' \\ \sum_i w_{1i} x_i' & \sum_i w_{1i} y_i' & \sum_i w_{1i} \end{pmatrix}^{-1} \begin{pmatrix} \sum_i w_{1i} \tilde{x}_i x_i' \\ \sum_i w_{1i} \tilde{x}_i y_i' \\ \sum_i w_{1i} \tilde{x}_i \end{pmatrix}, \quad (19)$$

$$\begin{pmatrix} a_3 \\ a_4 \\ a_6 \end{pmatrix} = \begin{pmatrix} \sum_i w_{2i} x_i'^2 & \sum_i w_{2i} x_i' y_i' & \sum_i w_{2i} x_i' \\ \sum_i w_{2i} x_i' y_i' & \sum_i w_{2i} y_i'^2 & \sum_i w_{2i} y_i' \\ \sum_i w_{2i} x_i' & \sum_i w_{2i} y_i' & \sum_i w_{2i} \end{pmatrix}^{-1} \begin{pmatrix} \sum_i w_{2i} \tilde{y}_i x_i' \\ \sum_i w_{2i} \tilde{y}_i y_i' \\ \sum_i w_{2i} \tilde{y}_i \end{pmatrix}. \quad (20)$$

Using the affine parameters estimated in Eqs. (19) and (20), the temporally estimated locations of the feature points can be updated as follows:

$$\tilde{\mathbf{x}}_i^T(t) = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} \bar{\mathbf{x}}^T(\mathbf{x}_i(t); t_p) + \begin{pmatrix} a_5 \\ a_6 \end{pmatrix}. \quad (21)$$
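A software sketch of the weighted least-squares solve of Eqs. (19)-(20), together with the update of Eq. (21), is given below. Here src holds the matched points at time t_p, dst the temporally estimated locations at time t, and w1, w2 the Tukey weights; the function name and array layout are our assumptions.

```python
import numpy as np


def weighted_affine_step(src, dst, w1, w2):
    """Sketch of Eqs. (19)-(21): one weighted least-squares affine update.

    src, dst: (K, 2) arrays of matched points at t_p and estimated points at t.
    w1, w2:   (K,) Tukey weights for the x- and y-equations.
    """
    x, y = src[:, 0], src[:, 1]
    D = np.stack([x, y, np.ones_like(x)], axis=1)   # rows (x'_i, y'_i, 1)

    # Eq. (19): (a1, a2, a5) from the w1-weighted normal equations
    M1 = D.T @ (w1[:, None] * D)
    a1, a2, a5 = np.linalg.solve(M1, D.T @ (w1 * dst[:, 0]))

    # Eq. (20): (a3, a4, a6) from the w2-weighted normal equations
    M2 = D.T @ (w2[:, None] * D)
    a3, a4, a6 = np.linalg.solve(M2, D.T @ (w2 * dst[:, 1]))

    A = np.array([[a1, a2], [a3, a4]])
    b = np.array([a5, a6])

    # Eq. (21): updated (temporally estimated) locations of the feature points
    est = src @ A.T + b
    return A, b, est
```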

    (3-b) Updating weights

To reduce the errors caused by distantly mismatched pairs of feature points, $w_{1i}$ and $w_{2i}$ are updated for the $i$-th pair of feature points using the deviations $d_{1i}$ and $d_{2i}$ as follows:

$$\begin{pmatrix} w_{1i} \\ w_{2i} \end{pmatrix} = \begin{cases} \begin{pmatrix} \left(1 - d_{1i}^2 / W_u^2\right)^2 \\ \left(1 - d_{2i}^2 / W_u^2\right)^2 \end{pmatrix} & (d_{1i} \le W_u,\ d_{2i} \le W_u) \\[4pt] \begin{pmatrix} 0 \\ 0 \end{pmatrix} & (\text{otherwise}) \end{cases}. \quad (22)$$

$W_u$ is a parameter used to determine the cut-off point of the evaluation functions at the $u$-th iteration.

As the number of iterations $u$ becomes larger, $W_u$ is set to a smaller value. Increment $u$ by one and return to step (3-a) while $u \le U$.

After $U$ iterations, the affine parameters between the two input images at times $t$ and $t_p$ are determined, and the affine transform matrix and vector are obtained. $A(t; t_p)$ and $\mathbf{b}(t; t_p)$ are accumulated with the affine parameters estimated at time $t_p$ as follows:

$$A(t) = A(t; t_p)\, A(t_p), \quad (23)$$

$$\mathbf{b}^T(t) = A(t_p)\, \mathbf{b}^T(t; t_p) + \mathbf{b}^T(t_p), \quad (24)$$

where $A(t)$ and $\mathbf{b}(t)$ indicate the affine parameters of the input image at time $t$ relative to the input image at time $0$; $A(0)$ and $\mathbf{b}(0)$ are initially given as the unit matrix and zero vector, respectively.
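Combining the pieces, a simplified sketch of the iterative Tukey-biweight estimation (Eq. (22)) and the accumulation of Eqs. (23)-(24) could read as follows. It reuses weighted_affine_step() from the sketch after Eq. (21); the cut-off schedule (W0, decay) and the iteration count U are illustrative assumptions, and the deviations are taken against the fixed points at time t rather than re-estimated locations, which simplifies the paper's procedure.

```python
import numpy as np


def estimate_affine_tukey(src, dst, U=5, W0=10.0, decay=0.7):
    """Sketch of step (3): iterative Tukey-biweight affine estimation (Eqs. 16-22).

    Requires weighted_affine_step() from the previous sketch.
    """
    w1 = np.ones(len(src))
    w2 = np.ones(len(src))
    Wu = W0
    for _ in range(U):
        A, b, est = weighted_affine_step(src, dst, w1, w2)
        # Eqs. (17)-(18): per-axis deviations of the matched pairs
        d1 = np.abs(dst[:, 0] - est[:, 0])
        d2 = np.abs(dst[:, 1] - est[:, 1])
        # Eq. (22): Tukey biweight; pairs beyond the cut-off W_u get zero weight
        inlier = (d1 <= Wu) & (d2 <= Wu)
        w1 = np.where(inlier, (1.0 - d1 ** 2 / Wu ** 2) ** 2, 0.0)
        w2 = np.where(inlier, (1.0 - d2 ** 2 / Wu ** 2) ** 2, 0.0)
        Wu *= decay  # W_u is set smaller as the iteration count grows
    return A, b


def accumulate_pose(A_rel, b_rel, A_prev, b_prev):
    """Sketch of Eqs. (23)-(24): accumulate frame-to-frame affine parameters."""
    A_t = A_rel @ A_prev             # Eq. (23)
    b_t = A_prev @ b_rel + b_prev    # Eq. (24)
    return A_t, b_t
```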

    (4) Image composition

To monitor the panoramic image at intervals of $\Delta t'$, the panoramic image $G(\mathbf{x}, t)$ is updated by attaching the affine-transformed input image at time $t$ to the panoramic image $G(\mathbf{x}, t - \Delta t')$ at time $t - \Delta t'$ as follows:

$$G(\mathbf{x}, t) = \begin{cases} I(\mathbf{x}', t) & (\text{if } I(\mathbf{x}', t) \neq \emptyset) \\ G(\mathbf{x}, t - \Delta t') & (\text{otherwise}) \end{cases}, \quad (25)$$

where the panoramic image $G(\mathbf{x}, t)$ is initially given as an empty image at time $0$; $\mathbf{x}'$ indicates the affine-transformed coordinate vector at time $t$ as follows:

$$\mathbf{x}'^T = A(t)\, \mathbf{x}^T + \mathbf{b}^T(t). \quad (26)$$
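Step (4) can be sketched with OpenCV's affine warper: the current frame is warped into panorama coordinates with the accumulated parameters A(t), b(t) and pasted where it has valid pixels, in the spirit of Eq. (25). The panorama layout and the use of a warped validity mask are our assumptions, not details given in the paper.

```python
import numpy as np
import cv2  # OpenCV, used here only as a convenient affine warper


def compose_panorama(panorama, frame, A_t, b_t):
    """Sketch of Eqs. (25)-(26): paste the affine-transformed frame onto the mosaic."""
    H, W = panorama.shape[:2]
    M = np.hstack([A_t, b_t.reshape(2, 1)]).astype(np.float32)  # x' = A x + b

    # Warp the frame and a validity mask into panorama coordinates (Eq. (26))
    warped = cv2.warpAffine(frame, M, (W, H))
    mask = cv2.warpAffine(np.ones(frame.shape[:2], np.uint8), M, (W, H))

    # Eq. (25): overwrite the panorama where the warped frame has valid pixels
    out = panorama.copy()
    out[mask > 0] = warped[mask > 0]
    return out
```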

Our video mosaicing algorithm can be efficiently executed as a multi-rate video processing method. The interval of steps (1) and (2) can be set to the shortest frame interval $\Delta t$ for accurate feature point tracking in HFR vision, while that of step (3) depends on the frame-to-frame image displacement; it is the shortest frame interval when there is significant camera motion, and it becomes larger when the camera motion becomes smaller. On the other hand, the interval of step (4), $\Delta t'$, can be independently set to dozens of milliseconds to enable the human eye to monitor the panoramic image.

III. SYSTEM IMPLEMENTATION

    A. High-speed vision platform: IDP Express

To realize real-time video mosaicing at an HFR, our video mosaicing method was implemented on the high-speed vision platform IDP Express [10]. The implemented system consisted of a camera head, a dedicated FPGA board (IDP Express board), and a PC. The camera head was highly compact and could easily be held by a human hand; its dimensions and weight were 23×23×77 mm and 145 g, respectively. On the camera head, eight-bit RGB images of 512×512 pixels could be captured at 2000 fps with a Bayer filter on its CMOS image sensor. The IDP Express board is designed for high-speed processing and recording of 512×512 images transferred from the camera head at 2000 fps. On the IDP Express board, hardware logic can be implemented on a user-specified FPGA (Xilinx XC3S5000-4FG900) for the acceleration of image processing algorithms.

Fig. 1. Schematic data flow of the feature point extraction circuit module (color-to-gray converter, Harris corner detector, and feature point counter submodules, operating on four pixels in parallel).

The 512×512 input images and the processed results on the IDP Express board can be memory-mapped in real time at 2000 fps via 16-lane PCI-e buses onto the allocated memories in the PC, and arbitrary processing to be executed in software on the PC can be programmed. The PC was equipped with an ASUSTek P5E mainboard, a Core 2 Quad Q9300 2.50 GHz CPU, 4 GB of memory, and a Windows XP Professional 32-bit OS.

    B. Implemented hardware logic

To accelerate steps (1-a) and (1-b) of our video mosaicing method, we designed a feature point extraction circuit module in the user-specified FPGA on the IDP Express board; steps (1-a) and (1-b) have a computational complexity of O(N^2). The other processes (steps (1-c), (2), (3), and (4)) were implemented in software on the PC because, with their computational complexity of O(M), they can be accelerated by reducing the number of feature points.

Figure 1 shows the schematic data flow of the feature point extraction circuit module implemented in the FPGA on the IDP Express board. It consists of a color-to-gray converter submodule, a Harris feature extractor submodule, and a feature point counter submodule. Raw eight-bit 512×512 input color images with a Bayer filter, F(x, t), are scanned in units of four pixels from the upper left to the lower right using X and Y signals at 151.2 MHz. The color-to-gray converter submodule converts the Bayer-filtered input images into eight-bit gray-level images I(x, t), processing four pixels in parallel after RGB conversion.

In the Harris feature detector submodule, I'_x and I'_y are calculated using 3×3 Prewitt operators, and the sums of I'^2_x, I'_x I'_y, and I'^2_y over the adjacent 3×3 pixels (a = 3) are calculated as 20-bit data using 160 adders and 12 multipliers in parallel in units of four pixels at 151.2 MHz. The feature λ(x, t) is calculated as 35-bit data by subtracting a three-bit-shifted value of (Tr C(x, t))^2 from det C(x, t) in units of four pixels, corresponding to κ = 0.0625; 16 multipliers, 24 adders, and four three-bit shifters are implemented for calculating λ(x, t).

The feature point counter submodule obtains the positions of the feature points as a 512×512 binary map B(x, t) by thresholding λ(x, t) with a threshold T in parallel in units of four pixels at 151.2 MHz; B(x, t) indicates whether a feature point is located at pixel x at time t. The number of feature points in the 5×5 adjacent area (p = 5) is counted for all the feature points as a 512×512 map P(x, t) using 96 adders in parallel in units of four pixels; P(x, t) is used for checking closely crowded feature points in step (1-c).


Fig. 5. Synthesized panoramic images at t = 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, and 4.5 s.

matrix and translation vector were set as the unit matrix and zero vector, respectively, at t = 0. It can be seen that most of the feature points in the input images were correctly extracted. Figure 4 shows the frame interval of feature point matching and the x- and y-coordinates of the translation vector b(t) for t = 0.0-5.0 s. Corresponding to the camera's motion, it can be seen that a short frame interval was set for the high-speed scene, and a long frame interval for the low-speed scene.

Figure 5 shows the synthesized panoramic images of 3200×1200 pixels, taken at intervals of 0.5 s. With time, the panoramic image was extended by accurately stitching affine-transformed input images over the panoramic image of the previous frame, and large buildings and background forests on the far side were observed in a single panoramic image at t = 4.5 s. These figures indicate that our system can accurately generate a single synthesized image in real time by stitching 512×512 input images at 500 fps when the camera head is quickly moved by a human hand.

    V. CONCLUSION

We developed a high-speed video mosaicing system that can stitch 512×512 images at 500 fps into a single synthesized panoramic image in real time. On the basis of the experimental results, we plan to improve our video mosaicing system for more robust and longer-duration video mosaicing, and to extend the application of this system to video-based SLAM and other surveillance technologies in the real world.

    REFERENCES

[1] B. Zitova and J. Flusser, "Image registration methods: A survey," Image Vis. Comput., 21, 977-1000, 2003.

[2] J. Shi and C. Tomasi, "Good features to track," Proc. IEEE Conf. Comput. Vis. Patt. Recog., 593-600, 1994.

[3] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., 60, 91-110, 2004.

[4] M. Kourogi, T. Kurata, J. Hoshino, and Y. Muraoka, "Real-time image mosaicing from a video sequence," Proc. Int. Conf. Image Proc., 133-137, 1999.

[5] J. Civera, A. J. Davison, J. A. Magallon, and J. M. M. Montiel, "Drift-free real-time sequential mosaicing," Int. J. Comput. Vis., 81, 128-137, 2009.

[6] R. H. C. de Souza, M. Okutomi, and A. Torii, "Real-time image mosaicing using non-rigid registration," Proc. 5th Pacific Rim Conf. on Advances in Image and Video Technology, 311-322, 2011.

[7] Y. Watanabe, T. Komuro, and M. Ishikawa, "955-fps real-time shape measurement of a moving/deforming object using high-speed vision for numerous-point analysis," Proc. IEEE Int. Conf. Robot. Autom., 3192-3197, 2007.

[8] S. Hirai, M. Zakoji, A. Masubuchi, and T. Tsuboi, "Realtime FPGA-based vision system," J. Robot. Mechat., 17, 401-409, 2005.

[9] I. Ishii, R. Sukenobe, T. Taniguchi, and K. Yamamoto, "Development of high-speed and real-time vision platform, H3 Vision," Proc. IEEE/RSJ Int. Conf. Intell. Rob. Sys., 3671-3678, 2009.

[10] I. Ishii, T. Tatebe, Q. Gu, Y. Moriue, T. Takaki, and K. Tajima, "2000 fps real-time vision system with high-frame-rate video recording," Proc. IEEE Int. Conf. Robot. Autom., 1536-1541, 2010.

[11] C. Harris and M. Stephens, "A combined corner and edge detector," Proc. 4th Alvey Vis. Conf., 147-151, 1988.
