
Perceptual Intelligent Systems

Seminar Report On

PERCEPTUAL INTELLIGENT SYSTEMS

Guided By: Ms. Bindu S. Moni

Submitted By:

N.M.Jophi

S1 MCA

Roll No. 29

A.J.C.E Dept. Of Comp. Science & Engg.


CONTENTS

1. Introduction

2. Perception

- Filters that make up perception

3. Perceptual User Interfaces

4. Information Flow in Perceptual User Interfaces

5. Perceptual Intelligence

6. Perceptual Intelligent Systems

7. Gesture Recognition Systems

- Challenge of Gesture Recognition

8. Speech Recognition Systems

- Performance of Speech Recognition Systems

9. Nouse Perceptual Vision Interface

- Tools available

10. Conclusion

11. Reference



Introduction

Inanimate things are coming to life. That is, the simple objects that surround us are

gaining sensors, computational powers, and actuators. Consequently, desks and doors, TVs

and telephones, cars and trains, eyeglasses and shoes, and even the shirts on our backs are

changing from static, inanimate objects into adaptive, reactive systems that can be more

friendly, useful, and efficient. These new systems could be even more difficult to use than

current systems. It depends on how we design the interface between the world of humans and the

world of this new generation of machines. To change inanimate objects into smart, active

helpmates, we must give them perceptual intelligence.

The main problem with today’s systems is that they are both deaf and blind. They mostly

experience the world around them through a slow serial line to a keyboard and mouse. Even

multimedia computers, which can handle signals like sound and image, do so only as a

transport device that knows nothing of the signals’ content. Computers need to share our perceptual environment

before they can be really helpful. They need to be situated in the same world that we are; they

need to know much more than just the text of our words.

This is where perceptual intelligence becomes important. If systems have the ability

to learn perception, they can act in a smart way. Perceptual intelligence is actually a learned

skill.

Perception

Perception is the end result of a thought that begins its journey with the senses. We

see, hear, physically feel, smell or taste an event. After the event is experienced it must then

go through various filters before our brains decipher what exactly has happened and how we

feel about it. Even though this process can seem instantaneous, it still always happens.

The filters that make up perception are as follows:

- What we know about the subject or event. Example: I saw an orange and knew it was edible.

- What our previous experience (and/or knowledge) with the subject or event was. Example: Last time I ate an orange I peeled it first (knowledge to peel an orange before eating it) and it was sweet. Our previous experience forms our expectations.

- Our current emotional state. How we are feeling at the time of the event affects how we will feel after the event. Example: I was in a bad mood when I ate the orange, and it angered me that it was sour and not sweet (my expectation).

In the end, my intellectual and emotional perception of eating the orange

was that it was an unpleasant experience. The strength of that experience

determines how I will feel next time I eat an orange. For example, if I got violently

sick after eating an orange, the next time I see an orange, I probably won’t want to eat

it. If I had a pleasant experience eating an orange, the next time I see an orange, I’ll

likely want to eat it.

Even though emotions seemingly occur as a result of an experience, they are actually the

result of a complicated process. This process involves interpreting action and thought and

then assigning meaning to it. The mind attaches meaning with prejudice as the information

goes through the perceptual filters we mentioned above.

Our perceptual filters also determine truth and logic along with meaning, though they

don’t always do this accurately. Only when we become aware that a bad feeling could be an

indication of a misunderstanding (an error in perception) can we begin to make adjustments to

our filters and change the emotional outcome.

When left alone and untrained, the mind chooses emotions and reactions based on a

"survival" program which does not take into account that we are civilized beings – it’s only

concerned with survival.

A good portion of this program is faulty because the filters have created distortions,

deletions and generalizations which alter perception. One example is jumping to a conclusion

about "all" or "none" of something based on one experience. The unconscious tends to think

in absolutes and supports "one time" learning from experience (this is the survival aspect of

learning).



Perceptual User Interfaces

A perceptual interface is one that allows a computer user to interact with the computer

without having to use the normal keyboard and mouse. These interfaces are realised by

giving the computer the capability of interpreting the user's movements or voice commands.

Perceptual Interfaces are concerned with extending human computer interaction to use

all modalities of human perception. Current research efforts focus on including

vision, audition, and touch in the process. The goal of perceptual reality is to create virtual

and augmented versions of the world that are, to the human, perceptually identical to the

real world. The goal of creating perceptual user interfaces is to allow humans to have natural

means of interacting with computers, appliances and devices using voice, sounds, gestures,

and touch.

Perceptual User Interfaces (PUIs) are characterised by interaction techniques that

combine an understanding of natural human capabilities with computer I/O devices and

machine perception and reasoning. They seek to make the user interface more natural and

compelling by taking advantage of the ways in which people naturally interact with each

other and with the world, both verbally and nonverbally. Devices and sensors should be

transparent and passive if possible, and machines should perceive relevant human

communication channels as well as generate output that is naturally understood. This is

expected to require integration at multiple levels of technologies such as speech and sound

recognition and generation, computer vision, graphical animation and visualization, language

understanding, touch-based sensing and feedback, learning, user modelling and dialogue

management.



Information Flow in Perceptual User Interfaces

PUI integrates perceptive, multimodal, and multimedia interfaces to bring our human

capabilities to bear on creating more natural and intuitive interfaces.

A perceptive user interface is one that adds human-like perceptual capabilities to the

computer, for example, making the computer aware of what the user is saying or what the

user’s face, body and hands are doing. These interfaces provide input to the computer while

leveraging human communication and motor skills.

A multimodal user interface is closely related, emphasizing human communication

skills. We use multiple modalities when we engage in face-to-face communication, leading to

more effective communication. Most work on multimodal UIs has focused on computer

input (for example, using speech together with pen-based gestures). Multimodal output uses

different modalities, like visual display, audio and tactile feedback, to engage human

perceptual, cognitive and communication skills in understanding what is being presented. In

a multimodal UI, the various modalities may be used independently, simultaneously, or

tightly coupled.
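
As an illustration of tightly coupled multimodal input, the following minimal Python sketch combines a spoken command with pointing gestures. It assumes, purely for illustration, that a speech recognizer delivers timestamped words and that a pen or hand tracker delivers timestamped pointing events; the names `Word`, `Point`, `DEICTIC` and `fuse` are hypothetical and do not come from any particular system. Each word such as "that" or "there" is replaced by the target of the pointing event closest to it in time.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    time: float  # seconds since the start of the utterance


@dataclass
class Point:
    target: str  # what the user pointed at (hypothetical object name)
    time: float  # seconds since the start of the utterance


# Words whose meaning depends on an accompanying gesture (illustrative set).
DEICTIC = {"this", "that", "here", "there"}


def fuse(words, points):
    """Replace each deictic word with the target of the nearest-in-time pointing event."""
    resolved = []
    for word in words:
        if word.text in DEICTIC and points:
            nearest = min(points, key=lambda p: abs(p.time - word.time))
            resolved.append(nearest.target)
        else:
            resolved.append(word.text)
    return resolved


# Hypothetical input: the user says "move that there" while pointing twice.
words = [Word("move", 0.0), Word("that", 0.4), Word("there", 1.1)]
points = [Point("file_icon", 0.5), Point("trash_folder", 1.0)]
command = fuse(words, points)
print(command)
```

Nearest-in-time matching is only one simple fusion rule; a real interface would also have to handle recognition errors and gestures that arrive late.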

A multimedia UI relies on the user's perceptual and cognitive skills to interpret the information

presented. Text, graphics, audio and video are the typical media used.

PUIs will enhance the use of computers as tools or appliances, directly enhancing

GUI-based applications, for example by taking into account gestures, speech and eye gaze.

Perhaps more importantly, these technologies will enable broad use of computers as

assistants, or agents, that will interact in more human-like ways. Perceptual interfaces will

enable multiple styles of interaction such as speech only, speech and gesture, text and touch,

vision, and synthetic sound, each of which may be appropriate in different circumstances,

whether that be desktop apps, hands-free mobile use, or embedded household systems.

Perceptual Intelligence

Perceptual Intelligence is the knowledge and understanding that everything we

experience (especially thoughts and feelings) is defined by our perception. For machines, perceptual

intelligence means paying attention to people and the surrounding situation in the same way

another person would, thus allowing these new devices to learn to adapt their behaviour to

suit us, rather than our adapting to them as we do today.

In the language of cognitive science, perceptual intelligence is the ability to deal with

the frame problem; it is the ability to classify the current situation, so that it is possible to

know which variables are important and thus to take appropriate action. Once a computer has

the perceptual ability to know who, what, when, where, and why, then the probabilistic rules

derived by statistical learning methods are normally sufficient for the computer to determine

a good course of action.
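
To make the idea of probabilistic rules concrete, here is a minimal Python sketch. It assumes, for illustration only, that the perceptual layer has already reduced the situation to a (who, where, when) tuple; the rules are then plain relative frequencies counted from past observations. The function names and the training data are hypothetical, and real systems would use richer statistical models.

```python
from collections import Counter, defaultdict


def learn_rules(observations):
    """Count how often each action followed each perceived situation."""
    counts = defaultdict(Counter)
    for situation, action in observations:
        counts[situation][action] += 1
    return counts


def best_action(counts, situation):
    """Return the most frequent action for a situation and its estimated probability."""
    actions = counts.get(situation)
    if not actions:
        return None, 0.0  # situation never observed: no rule applies
    action, n = actions.most_common(1)[0]
    return action, n / sum(actions.values())


# Hypothetical training data: (who, where, when) -> what the user did next.
observations = [
    (("alice", "desk", "morning"), "open_mail"),
    (("alice", "desk", "morning"), "open_mail"),
    (("alice", "desk", "morning"), "open_calendar"),
    (("bob", "door", "evening"), "lights_off"),
]
rules = learn_rules(observations)
action, probability = best_action(rules, ("alice", "desk", "morning"))
print(action, round(probability, 2))
```

The sketch shows why classifying the situation comes first: once the situation is known, choosing an action reduces to a table lookup.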

The key to perceptual intelligence is making machines aware of their environment,

and in particular, sensitive to the people who interact with them. They should know who we

are, see our expressions and gestures, and hear the tone and emphasis of our voice.

Perceptual Intelligent Systems

We have developed computer systems that can follow people’s actions, recognizing

their faces, gestures, and expressions.

Some of the systems are:

- Gesture Recognition System

- Speech Recognition System

- Nouse Perceptual Vision Interface



Gesture Recognition System

Gesture Recognition deals with the goal of interpreting human gestures via

mathematical algorithms. Gestures can originate from any bodily motion or state but

commonly originate from the face or hand. Current focuses in the field include emotion

recognition from the face and hand gesture recognition. Many approaches have been made

using cameras and computer vision algorithms to interpret sign language.

Gesture Recognition can be seen as a way for computers to begin to understand

human body language, thus building a richer bridge between machines and humans than

primitive text user interfaces or even GUIs (Graphical User Interfaces), which still limit the

majority of input to keyboard and mouse.

Gesture Recognition enables humans to interface with the machine (HMI) and interact

naturally without any mechanical devices. Using the concept of Gesture Recognition, it is

possible to point a finger at the computer screen so that the cursor will move accordingly.

This could potentially make conventional input devices such as mice, keyboards and even

touch-screens redundant.

Gesture Recognition can be conducted with techniques from computer vision and

image processing.

Often the term gesture interaction is used to refer to inking or mouse gesture

interaction, which is computer interaction through the drawing of symbols with a pointing

device cursor. Strictly speaking the term mouse strokes should be used instead of mouse

gesture since this implies written communication, making a mark to represent a symbol.
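
A minimal Python sketch of how such mouse strokes might be recognized is given below. It assumes the pointer path arrives as a list of (x, y) screen coordinates, with y growing downwards, and reduces the path to a string of the four directions L, R, U and D that can be looked up in a table of commands. The function name, the `min_step` threshold and the stroke vocabulary are hypothetical choices for illustration.

```python
def stroke_directions(points, min_step=10):
    """Reduce a pointer path to a string of directions (L, R, U, D).

    Assumes screen coordinates: x grows rightwards, y grows downwards.
    Movements shorter than min_step pixels are treated as jitter and skipped.
    """
    if not points:
        return ""
    directions = []
    last_x, last_y = points[0]
    for x, y in points[1:]:
        dx, dy = x - last_x, y - last_y
        if max(abs(dx), abs(dy)) < min_step:
            continue  # too small to count as a deliberate movement
        if abs(dx) >= abs(dy):
            direction = "R" if dx > 0 else "L"
        else:
            direction = "D" if dy > 0 else "U"
        # Record a direction only when it differs from the previous one.
        if not directions or directions[-1] != direction:
            directions.append(direction)
        last_x, last_y = x, y
    return "".join(directions)


# Hypothetical stroke vocabulary mapping direction strings to commands.
STROKES = {"DR": "open", "L": "back", "R": "forward"}

# An L-shaped stroke: down, then right, with a little sideways jitter.
path = [(100, 100), (101, 130), (99, 160), (130, 161), (160, 160)]
code = stroke_directions(path)
print(code, STROKES.get(code, "unknown stroke"))
```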

Challenges of Gesture Recognition

There are many challenges associated with the accuracy and usefulness of Gesture

Recognition software. For image-based gesture recognition there are limitations on the

equipment used and image noise. Images or video may not be under consistent lighting, or in

the same location. Items in the background or distinct features of the users may make

recognition more difficult. The variety of implementations for image-based gesture

recognition may also limit the viability of the technology for general use. For

example, stereo cameras and depth-detecting cameras are not currently

commonplace. Video or web cameras can give less accurate results because of their limited

resolution.
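
One common way to reduce the effect of image noise on a tracked position is to smooth it over time. The Python sketch below applies exponential smoothing to a sequence of fingertip positions; this is a generic signal-processing technique chosen for illustration, not a method described in the sources of this report, and the function name and the `alpha` parameter are assumptions.

```python
def smooth(positions, alpha=0.5):
    """Exponentially smooth a sequence of (x, y) positions.

    alpha close to 1 follows the raw data closely; alpha close to 0
    smooths heavily but makes the cursor lag behind the hand.
    """
    smoothed = []
    sx = sy = None
    for x, y in positions:
        if sx is None:
            sx, sy = float(x), float(y)  # first sample: nothing to average yet
        else:
            sx = alpha * x + (1 - alpha) * sx
            sy = alpha * y + (1 - alpha) * sy
        smoothed.append((sx, sy))
    return smoothed


# Hypothetical noisy track: the fingertip is nearly still, but one frame jumps.
track = [(0, 0), (10, 0), (0, 0)]
print(smooth(track))
```

The trade-off is responsiveness: stronger smoothing hides noise but delays the cursor, which matters for an interactive interface.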



Speech Recognition System

Speech recognition converts spoken words to machine-readable input (for example, to

the binary code for a string of character codes). The term voice recognition may also be used

to refer to speech recognition, but more precisely refers to speaker recognition, which

attempts to identify the person speaking, as opposed to what is being said.

Speech recognition applications include voice dialling (e.g., "Call home"), call routing

(e.g., "I would like to make a collect call"), domotic appliance control and content-based

spoken audio search (e.g., find a podcast where particular words were spoken), simple data

entry (e.g., entering a credit card number), preparation of structured documents (e.g., a

radiology report), speech-to-text processing (e.g., word processors or emails), and in aircraft

cockpits (usually termed Direct Voice Input).

Performance of Speech Recognition Systems

The performance of speech recognition systems is usually specified in terms of

accuracy and speed. Most speech recognition users would tend to agree that dictation

machines can achieve very high performance in controlled conditions. There is some

confusion, however, over the interchangeability of the terms "speech recognition" and

"dictation".
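
Accuracy is commonly reported through the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. Speed is often reported as a real-time factor, the processing time divided by the duration of the audio. The Python sketch below computes both; the function names and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Values below 1.0 mean the recognizer runs faster than real time."""
    return processing_seconds / audio_seconds


# Hypothetical example: one substituted word and one dropped word out of four.
wer = word_error_rate("call home now please", "call hone now")
print(wer, real_time_factor(2.0, 4.0))
```

Under this convention, a claimed accuracy of 98% corresponds to a WER of about 2%, that is, roughly two word errors per hundred reference words.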

Commercially available speaker-dependent dictation systems usually require only a

short period of training (sometimes also called 'enrolment') and may successfully capture

continuous speech with a large vocabulary at normal pace with a very high accuracy. Most

commercial companies claim that recognition software can achieve between 98% and 99%

accuracy if operated under optimal conditions. 'Optimal conditions' usually assume that

users:

- Have speech characteristics which match the training data,

- Can achieve proper speaker adaptation, and

- Work in a low-noise environment (e.g. quiet office or laboratory space).

This explains why some users, especially those whose speech is heavily accented,

might achieve recognition rates much lower than expected. Speech recognition in video has

become a popular search technology used by several video search companies.



Limited vocabulary systems, requiring no training, can recognize a small number of

words (for instance, the ten digits) as spoken by most speakers. Such systems are popular for

routing incoming phone calls to their destinations in large organizations.

Nouse Perceptual Vision Interface

Nouse PVI is a perceptual vision interface program that offers a complete solution for

working hands-free with a computer running Microsoft Windows. Using a camera connected to

a computer, the program analyzes the facial motion of the user to allow him/her to use it

instead of a mouse and a keyboard. As such, Nouse PVI allows a user to perform the three basic

computer-control actions:

Cursor control, normally performed using mouse motion:

- Cursor positioning

- Cursor moving

- Object dragging

Clicking, normally performed using the mouse buttons:

- Right-button click

- Left-button click

- Double-click

- Holding the button down

Key/letter entry, normally performed using a keyboard:

- Typing of English letters

- Switching from capital to small letters, and to functional keys

- Entering basic MS Windows functional keys as well as Nouse functional keys



The program is equipped with such tools as:

Nousor (Nouse Cursor) -

The video-feedback-providing cursor that is used to point and to provide the

feeling of “touch” with a computer.

Nouse Click -

A nose-operated mechanism that simulates the different types of mouse clicks.

Nouse Codes -

A configurable Nouse tool that allows entering computer commands and operating

the program using head motion codes.

Nouse Editor -

Provides an easy way of typing and storing messages hands-free using face

motion. Typed messages are automatically stored in the Clipboard (as with Ctrl+A,

Ctrl+C).

Nouse Board -

An on-screen keyboard, specially designed for face-motion-based typing, that

automatically maps to the user's facial motion range.

Nouse Typer -

A configurable Nouse tool that allows typing letters by drawing them inside

the cursor (instead of using the Nouse Board).

Nouse Chalk -

A configurable Nouse tool that allows writing letters as with a chalk on a piece

of paper. Written letters are automatically saved on hard drive as images that can be

opened and emailed.

And such features as:

Automatic focusing on the user nose and motion range calibration.

Lock On Area and Glue/Unglue mechanisms that allow the user's motion range to be mapped onto

an arbitrary Windows application.
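
The idea of calibrating a motion range and mapping it onto a screen area can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Nouse's actual implementation: the nose position is assumed to arrive as (x, y) camera-pixel coordinates, the motion range is taken to be the bounding box of positions observed during calibration, and the mapping is linear with clamping. The function names `calibrate` and `to_screen` are hypothetical.

```python
def calibrate(samples):
    """Return the bounding box (min_x, min_y, max_x, max_y) of observed nose positions."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    return min(xs), min(ys), max(xs), max(ys)


def to_screen(nose, motion_range, area):
    """Linearly map a nose position inside motion_range to a pixel inside area.

    area is (left, top, width, height) in screen pixels. Positions outside
    the calibrated range are clamped to the edge of the area. The range must
    have non-zero width and height.
    """
    min_x, min_y, max_x, max_y = motion_range
    left, top, width, height = area
    # Normalise each coordinate to the interval 0..1 and clamp it.
    u = min(1.0, max(0.0, (nose[0] - min_x) / (max_x - min_x)))
    v = min(1.0, max(0.0, (nose[1] - min_y) / (max_y - min_y)))
    return left + round(u * (width - 1)), top + round(v * (height - 1))


# Hypothetical calibration samples collected while the user moves their head.
motion_range = calibrate([(300, 200), (340, 230), (320, 215)])
screen = (0, 0, 1920, 1080)  # map onto the whole of a 1920x1080 display

print(to_screen((300, 200), motion_range, screen))  # top-left corner of the range
print(to_screen((340, 230), motion_range, screen))  # bottom-right corner of the range
```

Passing a smaller rectangle as `area` would restrict the cursor to a single application window, which is the effect the lock-on mechanism is described as providing.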



Figure: The appearance of Nouse Board. Letters are grouped in fours to suit the four directions

of “clicking” motion.

Many universities have research centres which focus on perceptual intelligence.

MIT, for example, has developed two experimental test beds: smart rooms and smart clothes.



Conclusion

It is now possible to track people’s motion, identify them by voice and facial

appearance, and recognize their actions in real time using only modest computational

resources. By using this perceptual information we have been able to build smart rooms and

smart clothes that can recognize people, understand their speech, allow them to control

information displays without mouse or keyboard, communicate by facial and hand gesture,

and interact in a more personalized, adaptive manner. Our overall goal is to make the

computers seem as natural to interact with as another person. Sometimes this means that

there should be no interface; it should just recognize what is going on and do the right

thing. At other times, it means that the system should engage in a dialogue with a person. We

want a system that is truly human centred and natural to interact with; this requires not just

perceptions but also a significant understanding of the semantics of the everyday world and

the reasoning capabilities to use this understanding flexibly.



Reference

www.ayrmetes.com

www.centerforfuturehealth.com

www.infibeam.com

www.wikipedia.org

