Post on 20-Dec-2015


MULTIMEDIA SIGNAL PROCESSING

MMSP SGN-5016

Irek Defée

Tietotalo TF 316

[email protected]

Course info

• Lectures: Room TB 214

Tue and Wed 10.15-12

• Exercises mandatory

• Exam written

Course info

• Course Web page

http://www.cs.tut.fi/~defee/mulsp.html

• Course material is regularly updated, please use only the updated material

Petri Hirvonen

[email protected]

http://www.cs.tut.fi/~hirvone2/5016_exercises.htm

Exercises for SGN-5016 Multimedia Signal Processing

Exercises

• TC303

• Group1: 8:15-10:00, TC 303 28.10

• Group2: 8:15-10:00, TC 303 29.10

• You can participate in one or both of the exercise groups if there is space; if not, attend one group

• A written report is returned by e-mail after each exercise.

• The details about the report are included in the exercise material.

WHAT IS THIS COURSE ABOUT???

1. WHAT IS MULTIMEDIA (MM) ?

2. WHAT IS THE TOPIC OF MULTIMEDIA

SIGNAL PROCESSING?

(THIS AREA IS NOT WELL DEFINED YET)

MULTIMEDIA SIGNAL PROCESSING

WHAT IS MULTIMEDIA?

• COMPOSED OF MULTI+MEDIA

MEDIA = MEDIUM OF COMMUNICATION

WE COMMUNICATE NATURALLY:

VISUALLY, BY SPEECH, BY TOUCH…

WE COMMUNICATE BY TECHNOLOGY:

RADIO (MOBILE PHONES), TV, PRESS,

CINEMA, BOOKS

• PEOPLE USE VARIOUS COMMUNICATION MEDIA: SPEECH, VISION, TOUCH…

IN THE PAST WHEN PEOPLE COMMUNICATED THEY HAD TO USE THOSE MEDIA DIRECTLY.

IN PRESENT CIVILISATION THERE ARE MANY TECHNOLOGIES WHICH

EXTEND HUMAN COMMUNICATION

PRODUCER OF INFORMATION HUMAN

RECEIVER OF INFORMATION HUMAN

COMMUNICATION MEDIUM NATURAL (E.G. VOICE, TOUCH): WE USE A SPECIFIC PHYSICAL MEDIUM, E.G. AIR, PLUS PRODUCTION OF SPECIALLY ENCODED SIGNALS FOR CONVEYING INFORMATION

COMMUNICATION MEDIUM INDIRECT VIA TECHNOLOGY (E.G. CINEMA, RADIO, PRESS, TV)

GENERAL MODEL OF HUMAN COMMUNICATION

• MORE RECENT IS A MODEL OF

HUMAN – MACHINE

COMMUNICATION, OR EVEN

MACHINE-MACHINE COMMUNICATION

WHEN WE USE COMPUTERS, WE

COMMUNICATE WITH MACHINE,

THE COMMUNICATION MEDIA ARE:

TOUCH/GESTURE <-> KEYBOARD, MOUSE

VISION <-> DISPLAY

HEARING <-> SOUND

• HUMANS CAN USE SEVERAL DIFFERENT MEDIA FOR COMMUNICATION

E.G. SPEECH, TOUCH, VISUAL SYSTEM

HUMANS OFTEN USE SEVERAL

MEDIA SIMULTANEOUSLY OR IN OTHER

WORDS MULTIPLE MEDIA = MULTIMEDIA

FOR EXAMPLE: WHEN WE TALK WITH

SOMEBODY WE USE GESTURES AND FACIAL

EXPRESSIONS

• IN FACT PEOPLE PREFER TO USE MULTIPLE MEDIA = MULTIMEDIA

- WE CAN USE A SINGLE MEDIUM, E.G. SPEECH WHEN TALKING ON THE PHONE

BUT SEEING EACH OTHER WHEN TALKING ”ENHANCES” THE CONTACT

- WE CAN LISTEN TO THE RADIO, E.G. NEWS, BUT TV IS PREFERRED EVEN IF WE JUST SEE A PERSON READING THE NEWS

- MULTIMEDIA IS MORE NATURAL FOR PEOPLE

• THERE IS ANOTHER USE OF THE WORD ”MEDIA”, IN THE SENSE OF

MEDIA INDUSTRY

THE MEDIA INDUSTRY DEALS WITH PRODUCING, DISTRIBUTING AND SELLING INFORMATION ADDRESSED TO THE HUMAN MEDIA SYSTEM

MULTIMEDIA INFORMATION IS VERY IMPORTANT FOR THE INDUSTRY

THERE ARE MANY ENGINEERING PROBLEMS IN DEALING WITH MULTIMEDIA INFORMATION

• WHAT IS MULTIMEDIA SIGNAL

PROCESSING (MMSP) ?

IT IS ABOUT PROCESSING,

COMMUNICATION AND UTILIZATION

OF INFORMATION USED BY HUMANS

ONE CAN CONSIDER THREE

SCENARIOS OF USAGE:

1. HUMAN-HUMAN

2. HUMAN-MACHINE

3. MACHINE-MACHINE

WHY IS MULTIMEDIA SIGNAL PROCESSING POSSIBLE? BECAUSE WE HAVE MEANS FOR DIGITAL REPRESENTATION AND PROCESSING OF ANY TYPE OF INFORMATION. WHEN WE TALK ON THE PHONE, LISTEN TO MUSIC FROM AN MP3 PLAYER, WATCH A MOVIE FROM A DVD, OR TAKE A PICTURE WITH A CAMERA, THE INFORMATION IS REPRESENTED BY BITS AND PROCESSED DIGITALLY
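As a rough illustration of information being "represented by bits", the following minimal Python sketch samples a sine tone and quantizes each sample to an 8-bit code. The function names, the 1 kHz tone, the 8 kHz sampling rate and the 8-bit depth are all assumptions chosen for illustration; they are not taken from the course material.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bits=8):
    """Sample a unit-amplitude sine tone and quantize each sample
    to an unsigned integer code in the range 0 .. 2**bits - 1."""
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        # Idealized "analog" value in [-1, 1] at sampling instant n
        x = math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        # Map [-1, 1] onto the available integer levels
        codes.append(round((x + 1) / 2 * (levels - 1)))
    return codes

def to_bits(codes, bits=8):
    """Concatenate the integer codes into one string of '0'/'1' characters."""
    return "".join(format(c, f"0{bits}b") for c in codes)

# Hypothetical example: 1 kHz tone, 8 kHz sampling, 8 samples (one period)
codes = sample_and_quantize(1000, 8000, 8)
bitstream = to_bits(codes)
print(codes)
print(bitstream)
```

Real audio formats add much more on top of this (higher bit depths, compression, error protection), but the basic step of turning a continuous signal into a sequence of bits is the same.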

WHAT WE NEED ARE ALGORITHMS FOR PROCESSING THE SIGNALS DIGITALLY

MULTIMEDIA SIGNAL PROCESSING

IS ABOUT ALGORITHMS FOR THE PROCESSING OF SIGNALS WHICH ARE USED BY HUMANS FOR COMMUNICATION WITH OTHER PEOPLE OR MACHINES, OR FOR DEALING WITH THE WORLD AROUND THEM

• WHAT ARE THE MEDIA SIGNALS?

MEDIA SIGNALS ARE THOSE SIGNALS WHICH ARE ACCESSIBLE TO THE HUMAN INFORMATION PROCESSING SYSTEM

ONE OF THE ISSUES IN MULTIMEDIA SIGNAL PROCESSING IS WHAT TYPE OF SIGNALS AND WHAT KIND OF COMBINATIONS OF SIGNALS CAN BE USED. FOR EXAMPLE: ACOUSTICAL SIGNALS: SOUNDS, SPEECH-LANGUAGE, MUSIC

WE CONVERT THOSE SIGNALS TO DIGITAL FORMAT AND USE THEM

• EXAMPLE: DIGITAL MUSIC (CD, MP3, DVD, INTERNET RADIO)

• EXAMPLE: DIGITAL VIDEO (DVD, BLU-RAY, INTERNET TV)

THESE ARE SYSTEMS FOR TRANSFERRING CONTENT PRODUCED BY ARTISTS TO PEOPLE. THESE SYSTEMS USE SPECIFIC DIGITAL ENCODING AND COMPRESSION OF INFORMATION TO RECORD THE CONTENT. THE QUESTION IS HOW TO DO THIS.

BUT HAVING SUCH SYSTEMS A NEW

PROBLEM EMERGES:

HOW TO PROTECT MEDIA INFORMATION

FROM UNAUTHORIZED USE?

(FOR EXAMPLE ILLEGAL COPYING?)

How to represent media information in

the most pleasing way?

Examples are High Definition technologies:

- Flat Displays

- HD DVD, Blu-ray discs, HDTV

• THE SECOND MAIN ASPECT OF MMSP

2. HUMAN-MACHINE COMMUNICATION

HOW TO MAKE INTERACTION WITH

COMPUTERS (AND OTHER MACHINES)

MORE NATURAL? NATURAL MEANS E.G. MORE

SIMILAR TO HUMAN-HUMAN INTERACTION,

MORE INTUITIVE, MORE PLEASING,

ATTRACTIVE….

THAT ALSO INCLUDES HOW TO MAKE

MACHINES MORE INTELLIGENT:

• FOR EXAMPLE, INSTEAD OF TYPING

WE COULD TALK TO COMPUTERS, AND INSTEAD OF PRINTING ANSWERS ON THE SCREEN THEY WOULD TALK TO US.

OR, IF COMPUTERS COULD SEE US USING CAMERAS, THEY COULD POSSIBLY REACT MORE LIKE PEOPLE.

BUT TODAY WE STILL USE A KEYBOARD

AND MOUSE. WHY?

• WE USE A KEYBOARD AND MOUSE

BECAUSE WE DO NOT HAVE BETTER TECHNOLOGY: WE DO NOT KNOW HOW TO PROCESS SPEECH AND VISUAL INFORMATION AS EFFECTIVELY AS PEOPLE DO

• BUT WE MAY THINK OF COMPUTERS WITH CAMERAS AND MICROPHONES

WHICH WILL BE ABLE TO DO SO

• THIS MAY BECOME POSSIBLE BECAUSE

OF FAST PROGRESS IN THE DEVELOPMENT OF ALGORITHMS AND PROCESSORS

• THIS PROGRESS CAN BE ILLUSTRATED WITH MANY EXAMPLES

- COMPARE A PC TODAY AND 10 YEARS AGO

(TODAY WE HAVE MULTICORE PROCESSORS AND THE NUMBER OF CORES IS GROWING FAST)

- COMPARE A MOBILE DEVICE TODAY AND A MOBILE PHONE 10 YEARS AGO

(TODAY THE TELEPHONE FUNCTION IS JUST ONE ADDITION TO MULTIPLE MEDIA PROCESSING: MUSIC, VIDEO, CAMERA, TOUCH, ORIENTATION)

EXTRAPOLATE THIS TO THE NEXT 10 YEARS!

WE CAN EXPECT IN THE FUTURE:

• COMPUTERS, MOBILE DEVICES, AND ALL KINDS OF OTHER DEVICES WILL BE MORE AND MORE CLEVER (=INTELLIGENT?)

• THESE SYSTEMS WILL RELY

ON INCREASINGLY SOPHISTICATED MULTIMEDIA SIGNAL PROCESSING CAPABILITIES

• WE HAVE THUS TWO MAIN AREAS TO COVER IN MMSP:

1. MEDIA INFORMATION PROCESSING

IN MULTIMEDIA SYSTEMS

2. MEDIA COMPUTER INTERFACE FOR

HUMAN-COMPUTER INTERACTION

THESE ARE THE TOPICS OF

THE MMSP COURSE

• Please note however that our Multimedia Signal Processing course is matched to the study program at TUT, especially to the Multimedia Major

• We have many courses specialized in single media processing: Digital Audio, Image Processing, Video Processing, Video Compression, Pattern Recognition

• We avoid overlapping with those courses. We also do not go into algorithms which have been proposed by researchers but are not yet in wider use; these are covered in other courses and seminars

• Other universities may not have as many specialized courses, so their course content is different

• There is one absolutely basic observation:

• MANY MULTIMEDIA SIGNAL PROCESSING

TASKS ARE ALREADY IMPLEMENTED IN BIOLOGICAL SYSTEMS, ESPECIALLY IN THE HUMAN INFORMATION PROCESSING SYSTEM

• FOR EXAMPLE: VISUAL AND ACOUSTICAL COMMUNICATION BETWEEN PEOPLE, USING VISUAL INFORMATION IN RECOGNIZING OBJECTS. BIOLOGICAL SYSTEMS DO IT PERFECTLY BUT WE DO NOT KNOW HOW, THAT IS, WE DO NOT KNOW THE ALGORITHMS

IN THE FIRST PART OF THIS COURSE

WE SHALL COVER BASIC KNOWLEDGE

RELATED TO

HUMAN INFORMATION PROCESSING.

THIS SYSTEM PROCESSES MEDIA

INFORMATION AND DOES IT IN A

FANTASTIC WAY. IF WE KNEW HOW

IT DOES IT, THAT COULD HELP US TO

MAKE BETTER MEDIA INFORMATION

PROCESSING (BETTER MMSP ALGORITHMS)

BUT BEFORE WE GO FURTHER LET US GIVE

AN OVERVIEW OF MEDIA TECHNOLOGY

WHERE MULTIMEDIA SIGNAL PROCESSING

WILL BE USEFUL IN THE FUTURE

MULTIMEDIA SIGNAL PROCESSING

ALLOWS FOR NEW CLASSES OF DEVICES

AND SYSTEMS:

MORE SOPHISTICATED COMMUNICATION,

MORE ADVANCED INTERFACES

THEY ARE ILLUSTRATED NEXT

Mobile Multimedia Devices Examples

WHAT DO THESE MOBILE DEVICE EXAMPLES SHOW US?

- DEVICES HAVE MULTIPLE SENSORS AND MULTIPLE MEDIA PROCESSING CAPABILITIES

- TAKE ONE EXAMPLE - TOUCH

The device is controlled by fingers, e.g. changing picture size or even playing guitar

What is still missing?

Maybe makeup, but this is a joke

ANOTHER EXAMPLE: DIGITAL CAMERAS

Digital cameras perform a lot of processing for best picture quality. But recent cameras have new features related to analysis of visual information.

Face Detection automatically detects a face in the frame and adjusts focus, exposure, contrast, and skin complexion so it turns out perfectly.

Face Recognition – a feature that “remembers” faces from previous shots. When a familiar face is recorded several times, the camera will prompt the user to register the face. Once registered, if the face appears in the frame again, the camera will display the name specified for that person and prioritize focus and exposure for the face.

To make such a feature, a face detection and recognition algorithm that works fast and reliably is needed

COMPLETELY NEW TYPES OF DEVICES ARE POSSIBLE: EXAMPLE Wii

Wii by Nintendo

Controllers have motion sensors

Game & fitness accessories

Dancing pad Balance board

Sports game Music performance

AIBO DOG – PERSONAL ROBOT WITH SENSES

Completely New Types of Devices

IT HAS SENSES: MICROPHONE, CAMERA, TEMPERATURE, DISTANCE, ACCELERATION, BALANCE, TOUCH

IT HAS INSTINCTS AND BEHAVIORS

"Is this a real cat?" A robot cat you can bond with like a real pet --

NeCoRo is born

Completely New Types of Devices

Omron ready to test demand for robo-cat

Equipped with Omron's proprietary MaC (Mind and Consciousness) technology, feelings are generated according to recognition feedback, which is dependent on configurations based on psychological concepts, leading to cognitive decisions and actions determined by these feelings (applicable patent acquired)

Feelings of satisfaction, anger, and uneasiness generated based on recognition feedback

Desires to sleep or be cuddled generated according to physiological rhythms

Via a learning function, personality traits such as selfishness and the need for attention will change in response to the owner

PERSONAL ROBOTS START APPEARING ...

Fujitsu has developed a new miniature humanoid robot, named HOAP-1, designed for wide application in research and development of robotic technologies. Fujitsu Automation will begin domestic sales of the robot from today and hopes to sell 100 units within three years. Weighing 6kg and standing 48cm tall, the light and compact HOAP-1 and accompanying simulation software can be used for developing motion control algorithms in such areas as two-legged walking, as well as in research on human-to-robot communication interfaces. The basic simulation software and user-developed programs are designed to run on RT-Linux on an operating command PC, which communicates with the robot through a USB interface. The robot's internal sensors and actuators (motors) also use a USB interface and can be easily expanded according to needs.

The two-legged walking technology developed by Honda represents a unique approach to the challenge of autonomous locomotion. Using the know-how gained from these prototypes, research and development began on new technology for actual use. ASIMO represents the fruition of this pursuit.

• Progress of technology is fast: even the old television is changing. In 2010, three-dimensional television (3D TV) will start

3D TV set

Glasses

And also a first TV controlled by hand gestures will be available (but very expensive)

What do we see from these examples?

• We can see that devices are developing to have

- More complexity

- More intelligence

- More natural interaction with people

To add even more such features one needs

algorithms for multimedia signal processing.

Many of these algorithms should have

capabilities similar to biological systems.