robotdance - nao - sjsu computer science …ackerman/cc/student_work/systems/robotdance.pdfhighly...
7 SENSES FOR NATURAL INTERACTION
• Moving: Humanoid shape and inertial unit (for balance)
• Feeling: Sensors on his head, hands and feet
• Hearing: 4 directional microphones and loudspeakers
• Speaking: 4 directional microphones and loudspeakers
• Seeing: Equipped with 2 cameras he can recognize shapes and objects
• Connecting: Uses WiFi and Ethernet to access the internet autonomously
• Thinking: Able to reproduce human behavior
AUTONOMOUS ROBOT DANCING DRIVEN BY BEATS AND
EMOTIONS OF MUSIC
• Usually, each piece of music has its own choreography
• If the music changes, the choreography must change too
• Automate the choreography while satisfying the following goals:
• Choreography should be safe for the performance
• Choreography should reflect emotions of music
• Dance should be synchronized to the music
EMOTION EXTRACTION
SMERS (SVR-based Music Emotion Recognition System) model: Each musical composition has a single emotion
Paper model: Emotion may change over time within a piece
Represented by a vector of (a, v) coordinates
The i-th element of the vector represents the emotion of the music at 15i + 15 seconds
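The indexing rule above can be made concrete with a small sketch. It assumes the index i starts at 0 and that element i summarizes the audio leading up to its timestamp; the slides state neither. The names `emotion_timestamp` and `emotion_at` are illustrative and do not come from the paper.

```python
from typing import List, Tuple

Emotion = Tuple[float, float]  # one (a, v) coordinate pair

SEGMENT_SECONDS = 15  # spacing between consecutive emotion estimates


def emotion_timestamp(i: int) -> int:
    """Time in seconds that element i of the emotion vector describes (15i + 15)."""
    return SEGMENT_SECONDS * i + SEGMENT_SECONDS


def emotion_at(emotions: List[Emotion], t: float) -> Emotion:
    """Return the first estimate whose timestamp is at or after time t.

    Assumption: element i covers the audio up to 15i + 15 s; times past the
    last timestamp fall back to the last estimate.
    """
    for i, emotion in enumerate(emotions):
        if t <= emotion_timestamp(i):
            return emotion
    return emotions[-1]
```

For example, with two estimates the second one applies from just after 15 s up to 30 s, and is reused for anything later.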
EMOTION EXTRACTION
For a segment of music audio, extract the feature vector x and apply the learned regression model (Support Vector Regression) to get the vector y
$x_i \in \mathbb{R}^6$ is a vector of extracted music audio features, which include the estimated key (one of the 12 major or minor keys), the average energy and standard deviation of energy, the estimated tempo, the standard deviation of beat duration, and harmonicity
$y_i \in \mathbb{R}^2$ is an emotion vector
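As a rough illustration of how the six-element feature vector might be assembled, the sketch below computes the two energy statistics and the beat-duration spread with the standard library, and takes key, tempo and harmonicity as already-estimated inputs. The frame size, the numeric key encoding, and all function names are assumptions, not details from the paper; the regression step that maps x to y is not shown.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_size: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        mean(s * s for s in samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def feature_vector(
    samples: Sequence[float],
    beat_times: Sequence[float],
    key_index: int,        # assumed numeric encoding of the estimated key
    tempo_bpm: float,      # estimated elsewhere
    harmonicity: float,    # estimated elsewhere
    frame_size: int = 1024,
) -> List[float]:
    """Assemble the six features in the order listed on the slide."""
    energies = frame_energies(samples, frame_size)
    beat_durations = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return [
        float(key_index),
        mean(energies),          # average energy
        pstdev(energies),        # standard deviation of energy
        float(tempo_bpm),
        pstdev(beat_durations),  # standard deviation of beat duration
        float(harmonicity),
    ]
```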
BEAT TRACKING
• Global tempo (overall beats per minute of a piece of music) estimation:
o The highest peak is detected and the corresponding lag is chosen as the tempo
• Find best beat times
$$S(T) = \alpha \sum_{i=1}^{N} \mathrm{Onset}(t_i) - (1 - \alpha) \sum_{i=1}^{N} \mathrm{dist}(t_i - t_{i-1}, C)$$
• Time: $t_i$
• Weighting parameter: $\alpha \in (0, 1)$
• Difference between beat interval and global tempo: $\mathrm{dist}(t_i - t_{i-1}, C)$
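The score S(T) can be sketched in a few lines to show how the two terms trade off. This is a minimal illustration, not the paper's implementation: it assumes dist is the absolute deviation of a beat interval from C, that C is given as a beat period in seconds, and that the interval term is evaluated only where a previous beat exists. The function name `beat_score` is hypothetical.

```python
from typing import Callable, Sequence


def beat_score(
    beat_times: Sequence[float],
    onset: Callable[[float], float],
    period: float,
    alpha: float,
) -> float:
    """alpha * sum Onset(t_i) - (1 - alpha) * sum dist(t_i - t_{i-1}, C)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Reward: total onset strength at the chosen beat times.
    onset_term = sum(onset(t) for t in beat_times)
    # Penalty (assumed form of dist): absolute deviation of each beat
    # interval from the global beat period C.
    interval_term = sum(
        abs((curr - prev) - period)
        for prev, curr in zip(beat_times, beat_times[1:])
    )
    return alpha * onset_term - (1.0 - alpha) * interval_term
```

A candidate beat sequence that lands on strong onsets and keeps intervals close to the global tempo scores higher than one that drifts, which is what a search for the best beat times would exploit.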
MOTION PRIMITIVE (SEQUENCE OF KEYFRAMES)
• NAO has 21 joints grouped into 4 categories: head, left arm, right arm, legs
• Each category of joints is defined to be $c = \{J_{c,1}, \ldots, J_{c,|c|}\}$, where $c \in \{\text{Head}, \text{LArm}, \text{RArm}, \text{Legs}\}$ and $J_{c,1}, \ldots, J_{c,|c|}$ are the indices of the joints in the category. $|c|$ is the total number of joints in the category c
• Each keyframe (static pose) is associated with a category c, and is defined as $K_c = \{V_{c,1}, \ldots, V_{c,|c|}\}$, where $V_{c,j}$ contains the joint angle of joint index $J_{c,j}$
• A motion primitive is $M_c(\beta) = \{K_{c,1}, \beta D_1, K_{c,2}, \ldots, K_{c,F-1}, \beta D_{F-1}, K_{c,F}\}$, where F is the number of keyframes in $M_c$. $D_f$ is the minimum time it takes to move from one keyframe, $K_{c,f}$, to the next keyframe, $K_{c,f+1}$, and is predefined. We parameterize the motion primitive with $\beta$, where $\beta \in \mathbb{R}$ and $\beta \geq 1$. $\beta$ is calculated using the beat times of the music so as to synchronize the motion primitive with the music
• Musical emotion decides the motion primitive
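The definition of a motion primitive maps naturally onto a small data structure. The sketch below stores F keyframes and the F-1 minimum durations, and stretches every duration by β. The slides do not give the rule for computing β; `beta_for_beats` is one plausible choice (an assumption) that picks the smallest β ≥ 1 making the whole primitive last a whole number of beat intervals. All class and function names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MotionPrimitive:
    category: str                 # "Head", "LArm", "RArm" or "Legs"
    keyframes: List[List[float]]  # F keyframes, each a list of joint angles
    min_durations: List[float]    # D_1 .. D_{F-1}, in seconds

    def __post_init__(self) -> None:
        if len(self.min_durations) != len(self.keyframes) - 1:
            raise ValueError("need exactly F - 1 durations for F keyframes")

    def timed(self, beta: float) -> List[Tuple[float, List[float]]]:
        """Return (time, keyframe) pairs with every duration stretched by beta."""
        if beta < 1.0:
            raise ValueError("beta must be >= 1 so no move is faster than D_f")
        t = 0.0
        schedule = [(0.0, self.keyframes[0])]
        for duration, keyframe in zip(self.min_durations, self.keyframes[1:]):
            t += beta * duration
            schedule.append((t, keyframe))
        return schedule


def beta_for_beats(min_total: float, beat_interval: float) -> float:
    """Assumed rule: smallest beta >= 1 so the primitive spans whole beats."""
    beats = max(1, math.ceil(min_total / beat_interval - 1e-9))
    return max(1.0, beats * beat_interval / min_total)
```

With two 0.3 s transitions and a 0.5 s beat interval, this rule stretches the primitive from 0.6 s to 1.0 s so that it ends exactly two beats after it starts.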
GENERATE SCHEDULE OF MOTION PRIMITIVES
4 static postures of NAO for each of the 6 emotions:
Happy
Sad
Fear
Angry
Surprise
Disgust
Motion primitives are constructed from these postures
A Markov model is used to select the motion primitive
A separate model is used for each category c
MARKOV DANCER MODEL
• Generate motion primitive $M_{c,i}$ with probability $P(M_{c,i} \mid M_{c,i-1}, e)$
• e is the emotion detected at the end of $M_{c,i-1}$
• When i = 1, we select the motion primitive with probability $P(M_{c,1} \mid e)$
• $P(M_{c,i} \mid M_{c,i-1}, e) = C \cdot E \cdot N$
• C: Continuity factor
• E: Emotion factor
• N: Constant normalizing factor
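The selection step can be sketched as weighted sampling. The slides do not define the continuity and emotion factors, so the sketch takes them as caller-supplied functions, and it assumes the normalizing factor N is the reciprocal of the summed C · E weights over the candidates. The names `transition_probabilities` and `next_primitive` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional


def transition_probabilities(
    candidates: List[str],
    previous: Optional[str],
    emotion: str,
    continuity: Callable[[Optional[str], str], float],
    emotion_factor: Callable[[str, str], float],
) -> Dict[str, float]:
    """P(M_i | M_{i-1}, e) proportional to C * E, normalized over candidates."""
    weights = {
        m: continuity(previous, m) * emotion_factor(m, emotion)
        for m in candidates
    }
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no candidate has positive weight")
    # Assumed normalization: N = 1 / total, so the probabilities sum to 1.
    return {m: w / total for m, w in weights.items()}


def next_primitive(probabilities: Dict[str, float], rng: random.Random) -> str:
    """Draw one motion primitive according to the given probabilities."""
    names = list(probabilities)
    weights = [probabilities[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

For the first primitive (i = 1), `previous` is `None` and the continuity function is expected to return a neutral value, which reduces the rule to P(M_1 | e). One such model would be run per joint category.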
OTHER THINGS NAO IS DOING
• Hilton and IBM have created ‘Connie’, a highly advanced, AI-powered NAO robot that works as a concierge at Hilton hotels
• Guests can ask Connie questions about nearby tourist attractions, hotel facilities, restaurants and bars, and get an instant, intelligent and helpful response