Automatic Image Collection of Objects with Similar Function by Learning Human Grasping Forms
Shinya Morioka, Tadashi Matsuo, Yasuhiro Hiramoto, Nobutaka Shimada, Yoshiaki Shirai
Ritsumeikan University



Thank you for the introduction. Good morning, I'm Shinya Morioka from Ritsumeikan University.

I would like to present our work, entitled "Automatic Image Collection of Objects with Similar Function by Learning Human Grasping Forms."

===========================================================================================

Outline

- Motivation
- Related Work
- Proposed Method
- Results
- Conclusions

This is the outline of my presentation.

===========================================================================================

Motivation (1/2)

How can we classify unknown objects into categories such as "for drinking", "for cutting", and so on?

To learn this relation, many labeled images are required.

(Slide figures: example images labeled "for drinking" and "for cutting"; a diagram of an object and a hand.)

First, I would like to talk about the motivation of our research.

We are trying to build a system that classifies unknown objects into categories such as "for drinking", "for cutting", "for writing", and so on.

We focus on the fact that the grasping form for an object depends on its category.

We can infer how a grasped object is used from how it is grasped.

So we want to extract this relation for classification with a machine learning method, but that requires many labeled images.


===========================================================================================


Motivation (2/2)

How can we collect labeled images? Object features may be invisible! We propose a method to estimate an object region and a standard coordinate system based on the wrist.

(Slide figures: extracted SURF features; a labeled image normalized with respect to the wrist.)

We want these labeled images for machine learning.

However, it is difficult to estimate an object region from features of the object itself, because some of its parts may be hidden by the hand (as in these figures).

Since each category contains highly varied images, it is difficult to manually collect a sufficient number of labeled images.

Therefore, we propose a method to automatically collect them.

So we use local features on a hand to estimate an object region and its relative position. We adopt SURF as the local feature.

In this presentation, we propose a method to estimate such an object region and a standard coordinate system based on the wrist, using these local features.

===========================================================================================

Related Work

R. Filipovych and E. Ribeiro, "Recognizing Primitive Interactions by Exploring Actor-Object States," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.

They focus on estimating a time sequence of states while a person interacts with an object, based on the relation between local appearances and the interaction state. With that approach it is difficult to estimate an object region in a single image. We propose a method to estimate an object region and a standard coordinate system based on the wrist.

There is existing research focusing on the relation between a hand and an object.

They focus on estimating the interaction state over a time sequence.

So their research does not include object extraction from a single image.

We propose a method to estimate an object region and standard coordinate system based on the wrist.

===========================================================================================

Proposed Method

1. Training a local estimator of wrist position
2. Estimating the wrist position
3. Training a local estimator of object position
4. Generating the wrist-object coordinate system
5. Training an estimator of the object region on the wrist-object coordinate system
6. Estimating the object region

Our proposed method consists of six stages. Training uses all six stages, 1 through 6; estimation uses three of them: stages 2, 4, and 6.

===========================================================================================

Proposed Method (1/6)

Now I will explain stage 1.

===========================================================================================

Training a local estimator of wrist position

(Slide figures: a manually annotated wrist position; local coordinate systems for SURF features 1 and 2.)

We train Randomized Trees (RTs) with relative wrist positions for SURF features.

B: a block; Bw: the block that includes the wrist position. With trained RTs, we can calculate the probability distribution of the wrist position.

First, we train Randomized Trees (RTs) with relative wrist positions for SURF features.

Here, we use the term "SURF feature" to mean the set of a position, scale, direction, and SURF descriptor.

As shown in these figures, we define a local coordinate system for each SURF feature.

The coordinate system is determined by the position, scale, and direction of the SURF feature.

This feature generates this coordinate system, and that feature generates that one.

The RTs are trained with the wrist position represented in the local coordinate system.

With the trained RTs, we can calculate a probability distribution of the wrist position on the local coordinate system of each SURF feature.
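As a rough illustration, this training stage might look like the sketch below. It assumes scikit-learn's RandomForestRegressor as a stand-in for the Randomized Trees (regressing a single point rather than a full distribution over blocks); to_local, train_wrist_rts, and all parameter values are hypothetical, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def to_local(point, feat_pos, feat_scale, feat_angle):
    # Express an image point in a SURF feature's local frame:
    # translate to the feature, rotate by minus its direction,
    # and divide by its scale.
    d = (np.asarray(point, float) - np.asarray(feat_pos, float)) / feat_scale
    c, s = np.cos(-feat_angle), np.sin(-feat_angle)
    return np.array([c * d[0] - s * d[1], s * d[0] + c * d[1]])

def train_wrist_rts(descriptors, wrist_local):
    # descriptors: (N, 64) SURF descriptors from training images.
    # wrist_local: (N, 2) manually annotated wrist positions, each
    # expressed in its feature's local frame via to_local above.
    rts = RandomForestRegressor(n_estimators=20, max_depth=15)
    rts.fit(descriptors, wrist_local)
    return rts
```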

===========================================================================================

Proposed Method (2/6)

Next is stage 2.

===========================================================================================

Estimating wrist position (1/2)

SURF feature extraction → classification of features by SVM → reduced features classified as hand. As preprocessing, we remove features that apparently come from outside the hand, because they interfere with the following voting process. (Slide labels: Hand, Background.)

2-class SVM. Input: a SURF descriptor. Output: hand or background.

Since the local estimator works well only on a hand, we have to remove SURF features that apparently come from outside the hand as preprocessing. To remove such confusing features, we use a 2-class SVM.

It is trained with SURF descriptors from hands and from backgrounds.

These figures show the flow of feature reduction.

By using the SVM, we can remove unnecessary features.
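A minimal sketch of this filtering step, assuming scikit-learn's SVC; the RBF kernel and the data layout are illustrative assumptions rather than the authors' settings:

```python
import numpy as np
from sklearn.svm import SVC

def train_hand_filter(desc_hand, desc_bg):
    # desc_hand / desc_bg: hypothetical (M, 64) arrays of SURF
    # descriptors sampled from hand regions and from background.
    X = np.vstack([desc_hand, desc_bg])
    y = np.r_[np.ones(len(desc_hand)), np.zeros(len(desc_bg))]
    return SVC(kernel="rbf").fit(X, y)

def keep_hand_features(svm, descriptors):
    # Keep only the features the SVM labels as "hand" (class 1).
    return descriptors[svm.predict(descriptors) == 1]
```

===========================================================================================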


Estimating wrist position (2/2)

(Slide flow: reduced SURF features → RTs → probability distributions by local estimators → accumulate → likelihood as wrist position → find maximum → estimated wrist position.)

After the feature reduction, we estimate the wrist position by accumulating probability distributions.

A distribution is calculated for each SURF feature by the trained RTs.

Then we generate a likelihood map by accumulating the distributions.

We take the position with maximum likelihood as the estimation result B^.
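A minimal sketch of this voting step, under the same assumptions as the training sketch above. Here each feature casts a single point vote, whereas the method accumulates full probability distributions; the feature container is hypothetical.

```python
import numpy as np

def estimate_wrist(features, rts, img_shape):
    # features: objects with pos (x, y), scale, angle, and desc fields.
    H, W = img_shape
    votes = np.zeros((H, W))
    for f in features:
        local = rts.predict(f.desc.reshape(1, -1))[0]
        # Map the predicted local-frame point back to the image
        # (inverse of to_local: rotate by +angle, multiply by scale).
        c, s = np.cos(f.angle), np.sin(f.angle)
        x = int(f.pos[0] + f.scale * (c * local[0] - s * local[1]))
        y = int(f.pos[1] + f.scale * (s * local[0] + c * local[1]))
        if 0 <= x < W and 0 <= y < H:
            votes[y, x] += 1
    iy, ix = np.unravel_index(votes.argmax(), votes.shape)
    return np.array([ix, iy])  # estimated wrist position B^
```

===========================================================================================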

RESULTS: estimated wrist position

(Slide: estimated wrist positions marked on example images.)

These are the results of estimation of wrist position.

We experimented on two types of objects, cups and scissors.

The square in red shows the estimated wrist position.

These figures show that wrist positions are successfully estimated.

===========================================================================================

Proposed Method (3/6)

Next is stage 3.

===========================================================================================

Likelihood as a hand part

(Slide: SURF features colored by likelihood, from low to high.)

If the j-th SURF feature originates from a hand, its predicted distribution P_j(B) should be high around the estimated wrist position B^. We take

h_j = P_j(B^)

as the likelihood that the j-th SURF feature is a hand part. The likelihood as a hand part is useful for estimating a hand region and an object region.

To train the local estimator of object position, we have to separate SURF features into those from a hand and those from elsewhere. So we define the likelihood as a hand part; it can be calculated for each SURF descriptor.

If the j-th SURF feature originates from a hand, the corresponding distribution will be high around the estimated wrist position B^.

This probability indicates how much a feature contributes to the estimation of the wrist position. We take it as the likelihood as a hand part.

The squares in the image mark the SURF features. The color of a square corresponds to its likelihood: red means high likelihood, and black means low likelihood.

(Notation: P(B|A) is read as "the probability of B given A.")
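As a sketch, assuming a hypothetical helper prob_map(f) that returns the probability map P_j over image blocks predicted by the trained RTs for feature j, the likelihood could be computed as:

```python
import numpy as np

def hand_likelihoods(features, B_hat, prob_map):
    # h_j = P_j(B^): evaluate each feature's predicted wrist
    # distribution at the estimated wrist block B^ (a (row, col) index).
    return np.array([prob_map(f)[B_hat] for f in features])
```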

===========================================================================================

RESULTS: Likelihood as a hand part


These are the calculated results of likelihood as a hand part.

It is clear that SURF features with high likelihood lie inside the hand area.

===========================================================================================

Training a local estimator of object center

(Slide: object center and wrist position found by K-means clustering on SURF features; likelihood shown from high to low.)

We train RTs with the difference vector from the wrist position to the object center, represented in each local coordinate system. (Here, we collect these vectors from images with simple backgrounds.)

With the trained RTs, we can calculate the probability distribution of this difference vector on each local coordinate system.

Next, we train a local estimator of object centers. First, we estimate wrist positions in images with simple backgrounds. Then we find object centers by applying K-means clustering to the 3-dimensional vectors (x_j, y_j, h_j). After finding the wrist position and the object center, we train RTs with their difference vector represented in each local coordinate system.

With the trained RTs, we can calculate a probability distribution of the difference vector on each local coordinate system. In this K-means clustering, 3-dimensional vectors are used: each vector consists of a SURF feature position (x_j, y_j) and its likelihood h_j.
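A minimal sketch of the clustering step, assuming scikit-learn's KMeans with two clusters and a selection rule that picks the cluster least hand-like by mean likelihood; both are assumptions, since the slide does not specify them:

```python
import numpy as np
from sklearn.cluster import KMeans

def object_center(positions, h, k=2):
    # positions: (N, 2) SURF feature positions; h: (N,) hand likelihoods.
    X = np.column_stack([positions, h])
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    # Take the cluster whose members look least like hand parts.
    obj = min(range(k), key=lambda i: h[labels == i].mean())
    return positions[labels == obj].mean(axis=0)
```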

===========================================================================================

Proposed Method (4/6)

Next is stage 4: estimation of the object center using the local estimators.

===========================================================================================

Estimating object center

(Slide flow: reduced SURF features → RTs → probability distributions by local estimators → accumulate → likelihood as object center → find maximum → estimated object center.)

We have already estimated the wrist position, and we can calculate the probability distribution of the difference vector from the wrist to the object center. So we can calculate a probability distribution of the object center for each SURF feature.

As with the wrist position, we generate a likelihood map by accumulating the distributions and take the position with maximum likelihood as the estimated object center.

===========================================================================================


RESULTS: estimated object center

These are the likelihood map and the estimated object center. The blue square marks the estimated object center, and the red square marks the estimated wrist position.

These show that object centers are successfully estimated.

===========================================================================================


Generating the wrist-object coordinate system

(Slide: the wrist-object coordinate system and a normalized image, with wrist position = (0, 0) and object center = (-1, 0).)

Now we have the wrist position and the object center. We want to learn an object's shape and its position relative to the hand, without regard to scale, direction, or absolute position in the image. So we introduce the wrist-object coordinate system, as shown in the figure on the left.

With the wrist-object coordinate system, we can normalize images as in the figures on the right.

These normalized images are suitable for learning an object shape and its position relative to the hand.
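A minimal sketch of the normalization, as a similarity transform fixed by the two anchor points on the slide (wrist → (0, 0), object center → (-1, 0)); the function name and the final flip are illustrative choices:

```python
import numpy as np

def to_wrist_object(p, wrist, center):
    # Map an image point p into the wrist-object coordinate system.
    d = np.asarray(center, float) - np.asarray(wrist, float)
    scale = np.hypot(d[0], d[1])           # wrist-to-center distance
    ang = np.arctan2(d[1], d[0])
    c, s = np.cos(-ang), np.sin(-ang)
    q = (np.asarray(p, float) - wrist) / scale
    u, v = c * q[0] - s * q[1], s * q[0] + c * q[1]
    return np.array([-u, -v])              # center lands at (-1, 0)
```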

===========================================================================================


Proposed Method (5/6)

Next is stage 5.

===========================================================================================

Training an estimator of the object region on the wrist-object coordinate system

One-class SVMs are built for the hand and for the object. The SVM for the hand region takes a position (and the hand likelihood) as input and outputs whether it is on a hand or not. The SVM for the object region takes a position as input and outputs whether it is on an object or not. Together they classify features into hand, object, and background classes.

We collect teacher samples from the K-means clustering results on images with simple backgrounds.

Now we have the wrist-object coordinate system.

In this stage, we learn the object region and the hand region on the wrist-object coordinate system.

We use one-class SVMs for learning them.

The one-class SVM for the hand region outputs whether a SURF feature is on a hand or not. Its input is (u, v), the position in the wrist-object coordinate system, together with h_j, the likelihood as a hand part. Since we use h_j, the hand region depends on how the object is grasped.

The one-class SVM for the object region outputs whether a given position is on an object or not. Its input is the position in the wrist-object coordinate system.

We train the SVMs with these images with simple backgrounds. The teacher samples are generated by the K-means clustering.
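A minimal sketch of the two region estimators, assuming scikit-learn's OneClassSVM; the nu and gamma values are illustrative:

```python
from sklearn.svm import OneClassSVM

def train_region_svms(hand_uvh, object_uv):
    # hand_uvh: (N, 3) rows of (u, v, h_j) for hand teacher samples.
    # object_uv: (M, 2) rows of (u, v) for object teacher samples.
    svm_hand = OneClassSVM(nu=0.1, gamma="scale").fit(hand_uvh)
    svm_obj = OneClassSVM(nu=0.1, gamma="scale").fit(object_uv)
    # Each SVM's predict() returns +1 inside the learned region, -1 outside.
    return svm_hand, svm_obj
```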

===========================================================================================

Proposed Method (6/6)

Next is stage 6.

===========================================================================================


TOTAL RESULTS (movie)

This movie shows an example result of classification into a hand, an object, and background. The blue area marks the estimated object region. Each square marks a SURF feature: a pink square is a feature recognized as a hand, and a green square is a feature not recognized as a hand.

In this movie, I change how the scissors are grasped. The proposed method successfully finds the object region and the hand region.

===========================================================================================


(Slide: results for a cup used in training and for cups not used in training.) We also applied the method to 4 types of cups and 3 types of scissors, generating separate estimators for cups and for scissors. The top two rows show the results for cups: one of the 4 cups was used in training, and the other 3 were not. The proposed method can extract the correct object region even for unknown cups.

Likewise, one of the 3 pairs of scissors was used in training, and the others were not.

Conclusion / Future work

With the proposed method, a wrist can be found and an object center can be estimated from the wrist and a set of local features. The wrist and the object center define a wrist-object coordinate system suitable for learning the shape of a grasped object. Future work: object recognition by learning the relation between an object and the posture of the hand grasping it.

Finally, I'd like to conclude my talk.

With the proposed method, a wrist can be found and an object center can be estimated from the wrist and a set of local features.

The wrist and the object center define a wrist-object coordinate system, which is suitable for learning the shape of a grasped object.

Future work is object recognition using our method.

That's all for my presentation. Thank you for your attention.

===========================================================================================

Limitation

Local estimators and SVMs must be generated for each category. (Slide: a local estimator for a cup and a local estimator for scissors.)

Limitation

A local estimator and an SVM must be trained with images known to be in a single category. (Slide: teacher samples for a cup.)

(Slide: teacher samples for scissors.)

Limitation

Classification of unknown objects is not implemented.

(Slide: an unknown object image and the "for drinking" category.)


Background (2/2)

How can we collect labeled images? Object features may be invisible! We use local features on a hand to estimate object regions and their relative positions.

(Slide figures: extracted SURF features; a labeled image normalized with respect to the wrist.)

However, hand shapes are complex to begin with: depending on the angle from which a hand is viewed, its appearance changes greatly, making it difficult to detect.

Moreover, occlusion occurs depending on how the tool is grasped.

Therefore, by learning local features of the hand, we can detect the hand and the tool even when occlusion occurs.

We can tell from the shape of the hand what kind of tool is present in which area of the image, so we use this information to cut the tools out of the images and collect them.
