![Page 1: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/1.jpg)
Lecture #32
WWW Search
![Page 2: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/2.jpg)
Review: Data Organization
• Kinds of things to organize
  – Menu items
  – Text
  – Images
  – Sound
  – Videos
  – Records (e.g., a person’s name, address, & phone number, or a car’s year, make, & model)
![Page 3: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/3.jpg)
Review: Data Organization
• Three ways to find things:
  – Lists (in-order search, binary search)
  – Trees (balance number of branches with time to decide which is correct branch)
  – Search
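A minimal sketch of the two list searches named above, assuming a small sorted list of menu items (the item names are made up for illustration). In-order search checks items one at a time; binary search halves the remaining range at each step and only works on a sorted list.

```python
from bisect import bisect_left

def in_order_search(items, target):
    """Check items one at a time; returns (index or -1, comparisons made)."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps - 1, steps
    return -1, len(items)

def binary_search(sorted_items, target):
    """Halve the search range each step; the list must already be sorted."""
    pos = bisect_left(sorted_items, target)
    if pos < len(sorted_items) and sorted_items[pos] == target:
        return pos
    return -1

# Hypothetical menu items, sorted so that binary search applies.
menu = sorted(["Copy", "Cut", "Find", "Open", "Paste", "Print", "Save", "Undo"])

found_at, comparisons = in_order_search(menu, "Print")
print(found_at, comparisons)
print(binary_search(menu, "Print"), binary_search(menu, "Quit"))
```

With 8 items the difference is small; the gap grows quickly as the list gets longer.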
![Page 4: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/4.jpg)
WWW Search
![Page 5: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/5.jpg)
Search issues
• How do we say what we want?
  – I want a story about pigs
  – I want a picture of a rooster
  – How many televisions were sold in Vietnam during 2000?
  – Find a movie like this one
• How does the computer find what we said?
![Page 6: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/6.jpg)
Things to search for
• Records
• Text
• Images
• Audio
• Video
![Page 7: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/7.jpg)
Records
• Car
  – Price
  – Miles
  – Year
  – Make
  – Doors
• Queries
  • Price < 6000 & Miles < 100000
  • Make == Toyota & Year > 1993
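As a rough illustration, the car record and the two queries above could be written like this. The `Car` class and the three sample cars are assumptions made up for the example; only the field names and the query conditions come from the slide.

```python
from dataclasses import dataclass

@dataclass
class Car:
    price: int
    miles: int
    year: int
    make: str
    doors: int

# Hypothetical sample records.
cars = [
    Car(price=5500, miles=90000, year=1995, make="Toyota", doors=4),
    Car(price=8000, miles=40000, year=1998, make="Honda", doors=2),
    Car(price=2500, miles=150000, year=1990, make="Toyota", doors=4),
]

# Price < 6000 & Miles < 100000
cheap_low_miles = [c for c in cars if c.price < 6000 and c.miles < 100000]

# Make == Toyota & Year > 1993
newer_toyotas = [c for c in cars if c.make == "Toyota" and c.year > 1993]

print(cheap_low_miles)
print(newer_toyotas)
```

Each query is a test applied to every record; "&" keeps only the records that pass both conditions.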
![Page 8: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/8.jpg)
Queries
• Make == Toyota & Year >1993
![Page 10: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/10.jpg)
Queries
• Year >1993 or Price < $3,000
![Page 12: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/12.jpg)
Databases
• Large collections of records
• Accessed by queries
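One way to try this out is with Python's built-in `sqlite3` module. The sketch below is only an illustration: the table name `cars` and the three rows are assumptions, and the query is the "Year > 1993 or Price < $3,000" example from the earlier slide written in SQL.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (make TEXT, year INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [("Toyota", 1995, 5500), ("Honda", 1991, 2500), ("Ford", 1990, 4000)],
)

# Year > 1993 or Price < 3000: a record matches if either condition holds.
rows = conn.execute(
    "SELECT make, year, price FROM cars WHERE year > ? OR price < ? ORDER BY make",
    (1993, 3000),
).fetchall()

print(rows)
```

The Toyota matches on year, the Honda matches on price, and the Ford matches neither condition.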
![Page 13: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/13.jpg)
Things to search for
• Records
• Text
• Images
• Audio
• Video
![Page 14: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/14.jpg)
Text searching
• How do I say what I want?
  – Type some phrase
    • I want a story about pigs
• How will the computer match this?
  – What is text?
    • An array of characters
  – What can a computer do with text?
    • Match characters
![Page 15: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/15.jpg)
Text searching
• People think in words not characters
• How do I convert an array of characters into an array of words?
  – Collect together sequences of letters
  – How do I know if character C is a letter?
    • (C >= “a” & C <= “z”) | (C >= “A” & C <= “Z”)
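The letter test and the "collect sequences of letters" step can be sketched as follows. The function names are made up for the example. Note that this simple rule also splits on apostrophes and digits, so "don't" becomes two words.

```python
def is_letter(c):
    """The slide's test: C is between a and z, or between A and Z."""
    return ("a" <= c <= "z") or ("A" <= c <= "Z")

def to_words(text):
    """Collect runs of letters into words; any other character ends a word."""
    words, current = [], []
    for c in text:
        if is_letter(c):
            current.append(c)
        elif current:
            words.append("".join(current))
            current = []
    if current:  # do not lose a word that runs to the end of the text
        words.append("".join(current))
    return words

print(to_words("I want a story about pigs."))
```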
![Page 16: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/16.jpg)
Convert to words
• Because people think in words
![Page 17: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/17.jpg)
Every document is an array of words
• I want a story about pigs
• How will I find the right documents?
  – Find all documents that have the word “pigs”
![Page 18: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/18.jpg)
Searching text
• How will I find pigs fast?
  – Create an index of all words
    • With each word store the name or address of each document that contains that word
  – Search the index for “pigs”
    • Return the list of documents
    • Use a binary search on the word list (50,000 words)
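A small sketch of such an index, assuming three made-up documents. Each word maps to the documents that contain it, and the sorted word list is searched with binary search. For 50,000 words a binary search needs at most about 16 comparisons, since 2^16 = 65,536 is more than 50,000.

```python
from bisect import bisect_left

# Hypothetical documents: name -> text.
documents = {
    "farm.html": "the pigs eat corn",
    "city.html": "the cars honk",
    "tale.html": "three little pigs",
}

# Build the index: word -> names of the documents that contain it.
index = {}
for name, text in documents.items():
    for word in set(text.split()):
        index.setdefault(word, []).append(name)

# The sorted word list is what the binary search runs over.
word_list = sorted(index)

def lookup(word):
    """Binary-search the word list, then return that word's documents."""
    pos = bisect_left(word_list, word)
    if pos < len(word_list) and word_list[pos] == word:
        return sorted(index[word])
    return []

print(lookup("pigs"))
print(lookup("cows"))
```

A Python `dict` could answer the lookup directly; the explicit word list is kept here only to mirror the binary search on the slide.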
![Page 19: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/19.jpg)
Problems
• What if a document has the word “Pig” but not “pigs”?
• Normalize
  – Case - make all words lower case
    • Pig -> pig
  – Stemming - remove all suffixes and prefixes before putting a word into the index
    • pigs -> pig
    • piggy -> pig
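A deliberately crude sketch of both steps. The suffix list and the doubled-letter rule are assumptions chosen so the slide's three examples work; this is not a real stemming algorithm, and it does not handle prefixes.

```python
# Hypothetical suffix list; longer suffixes are tried first.
SUFFIXES = ("ing", "ies", "es", "ed", "s", "y")

def normalize(word):
    """Lower-case the word, then strip one common suffix if enough is left."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= 3:
            # Collapse a doubled final letter left behind, e.g. "pigg" -> "pig".
            if stem[-1] == stem[-2]:
                stem = stem[:-1]
            return stem
    return word

for w in ["Pig", "pigs", "piggy"]:
    print(w, "->", normalize(w))
```

The same `normalize` function has to be applied both when building the index and when looking up a query word, otherwise the two will not match.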
![Page 20: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/20.jpg)
Problems
• I want a story about pigs?
  – How does the computer know to search for pigs?
    • It doesn’t
  – How does the computer know what a story is?
    • It doesn’t
![Page 21: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/21.jpg)
Searching
• I want a story about pigs
• Pick out the important words and search for them
  – Which words are important?
  – D = number of times a word appears in a document
  – A = average number of times a word appears in all documents
  – Importance = D/A
    • Why?
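A small worked example of Importance = D/A, using three made-up documents. A word like "a" appears in every document, so its average A is large and its importance stays low. A word like "pigs" appears in only one document, so A is small and D/A is large.

```python
# Hypothetical documents: name -> text.
documents = {
    "story": "a story about pigs and more pigs on a farm",
    "news": "a report about a city and a mayor",
    "recipe": "a recipe for a cake",
}

counts = {name: text.split() for name, text in documents.items()}

def importance(word, doc_name):
    """D = count in this document, A = average count over all documents."""
    d = counts[doc_name].count(word)
    a = sum(words.count(word) for words in counts.values()) / len(counts)
    return d / a if a else 0.0

print(importance("pigs", "story"))
print(importance("a", "story"))
```

Here "pigs" scores about 3 in the story document, while "a" scores below 1.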
![Page 22: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/22.jpg)
How do we create an index of all documents on the Web?
• Try = a list of URLs
• Seen = all URLs you have seen
While (Try is not empty) {
    Page = take a URL from Try
    Words = all the “important” words in Page
    add Page to the index using all of Words
    Links = all URLs in Page
    for every Link that is not in Seen
        add Link to Try and to Seen
}
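The loop above can be sketched in Python without touching the network. The `WEB` dictionary below is a made-up stand-in for real pages, and for simplicity every word on a page is treated as "important". A real crawler would fetch each URL and parse links out of the HTML.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.html": ("pigs on the farm", ["b.html", "c.html"]),
    "b.html": ("a story about pigs", ["a.html"]),
    "c.html": ("city news", ["b.html", "d.html"]),
    "d.html": ("rooster picture", []),
}

def crawl(start_urls):
    to_try = deque(start_urls)   # "Try" on the slide
    seen = set(start_urls)       # "Seen" on the slide
    index = {}                   # word -> set of pages containing it
    while to_try:
        page = to_try.popleft()
        text, links = WEB.get(page, ("", []))
        for word in text.split():
            index.setdefault(word, set()).add(page)
        for link in links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                to_try.append(link)
    return index, seen

index, seen = crawl(["a.html"])
print(sorted(seen))
print(sorted(index["pigs"]))
```

The Seen set is what keeps the loop from running forever when pages link back to each other, as a.html and b.html do here.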
![Page 23: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/23.jpg)
Other ways to find important words and important documents
• A Document is important if many other documents point to it
• A word is important in document D if that word occurs frequently in documents that link to document D.
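The first idea can be illustrated by simply counting incoming links. This is a sketch under stated assumptions: the link table is made up, and real search engines use more elaborate link-based scores than a plain count.

```python
from collections import Counter

# Hypothetical link table: document -> documents it points to.
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html", "c.html"],
}

in_links = Counter()
for source, targets in links.items():
    for target in set(targets):   # count each linking document only once
        if target != source:      # ignore links from a page to itself
            in_links[target] += 1

most_important = in_links.most_common(1)[0][0]
print(most_important, in_links[most_important])
```

Three different documents point to c.html, so it comes out as the most important. Nothing points to d.html, so its count is zero.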
![Page 24: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/24.jpg)
Images
• What will I say when searching for an image?
  – I want a rooster picture
  – Draw a picture of a rooster?
![Page 25: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/25.jpg)
Search by picture?
?
Is this possible? If so, how?
![Page 26: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/26.jpg)
What’s in a picture?
• Computers don’t understand the contents of images
• To a computer an image is a bunch of colored pixels
![Page 27: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/27.jpg)
I want a picture of a rooster
• Label all of the pictures
• How does Google Images do it?
  – File name of the picture “rooster-crossingSt.jpg”
  – Words around the picture in the HTML
• Use “Safe Search” and set filters appropriately (http://www.youtube.com/watch?v=maWx-ApkBCs)
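A rough sketch of labeling a picture with words taken from its file name and from nearby text. The function name and the sample sentence are assumptions; this illustrates the idea on the slide and is not a description of how Google Images is actually implemented.

```python
import re
from pathlib import PurePosixPath

def labels_for_image(file_name, nearby_text):
    """Collect lower-cased words from the file name and the surrounding text."""
    stem = PurePosixPath(file_name).stem  # drop the ".jpg" extension
    # Split the name at punctuation and at capital letters ("crossingSt").
    name_words = re.findall(r"[A-Za-z][a-z]*", stem)
    text_words = re.findall(r"[A-Za-z]+", nearby_text)
    return {w.lower() for w in name_words + text_words}

labels = labels_for_image("rooster-crossingSt.jpg", "A rooster crossing Main Street")
print(sorted(labels))
```

Once a picture has labels like these, it can go into the same kind of word index used for text documents.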
![Page 28: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/28.jpg)
Audio
• Talking
  – Use speech recognition to convert audio to text
  – With each recognized word keep track of where in the audio it was recognized.
• Build an index using the recognized text
  – Normalize based on how words sound rather than how they are spelled.
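A minimal sketch of such an index, assuming the speech recognizer has already produced (word, time) pairs. The `recognized` list is made up for the example. Words are only lower-cased here; a sound-based key could replace that step.

```python
# Assumed recognizer output: (word, seconds from the start of the audio).
recognized = [("Play", 12.0), ("it", 12.4), ("Sam", 13.1), ("play", 95.5)]

# Index: normalized word -> every time at which it was recognized.
audio_index = {}
for word, seconds in recognized:
    audio_index.setdefault(word.lower(), []).append(seconds)

print(audio_index["play"])
```

Looking up a word returns positions in the audio rather than document names, so the player can jump straight to where the word was spoken.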
![Page 29: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/29.jpg)
Video
• Where in “Casablanca” does Bogart say “Play it again Sam” ?
– he never does, he just says “play it”
• How can the computer find that?
  – Transcribe the audio
  – Speech recognition on the audio
![Page 30: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/30.jpg)
Video
• Does Woody ever kiss Bo Peep?
• Exactly what color is a kiss?
![Page 31: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/31.jpg)
Video
• Does Woody ever kiss Bo Peep?
• Annotate every frame with who is in the frame and search for frames with both Woody and Bo Peep.
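The frame search can be sketched with sets, assuming someone has already annotated each frame. The frame numbers and annotations below are made up for illustration.

```python
# Hypothetical annotations: frame number -> characters visible in that frame.
frames = {
    101: {"Woody"},
    102: {"Woody", "Bo Peep"},
    103: {"Buzz", "Bo Peep"},
    104: {"Woody", "Bo Peep", "Buzz"},
}

def frames_with(*characters):
    """Return the frames in which every named character appears."""
    wanted = set(characters)
    return sorted(n for n, present in frames.items() if wanted <= present)

print(frames_with("Woody", "Bo Peep"))
```

This only narrows the search to frames where both characters appear. It cannot tell whether they actually kiss in those frames.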
![Page 32: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/32.jpg)
So what’s with this?
![Page 33: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/33.jpg)
Or this?
![Page 34: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/34.jpg)
Is Woody cheating?
![Page 35: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/35.jpg)
Search
• Records
  – Queries
    • < > = And Or
• Text
  – Normalized words (case, stemming, thesaurus)
• Images
  – Add words
• Audio
  – Transcribe or recognize as words
• Video
  – Transcribe
  – Annotate
![Page 36: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/36.jpg)
“Re-Search” Directions in Image Recognition, Search and Retrieval
![Page 37: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/37.jpg)
From R. Szeliski, Computer Vision Algorithms and Application, Course Notes CSE 576, U. Washington
Face Detection – Viola & Jones
Face Detection in Commercial Digital Cameras
Train on
- 1000’s of faces
- Millions of non-faces
![Page 38: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/38.jpg)
Face Recognition (Eigenfaces [Turk and Pentland 1991])
[Figure: an N × N image unrolled into a vector of N² pixel values]
Project image into higher-dimensional space
“Recognize” by grouping unknown image with closest training example
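The "closest training example" step can be sketched as a nearest-neighbor search over flattened pixel values. This is a simplification under stated assumptions: the 2 × 2 "images" and the names are made up, and the comparison is done directly on pixels, skipping the eigenface step of the Turk and Pentland method, which first describes each face by its principal components.

```python
def flatten(image):
    """Unroll an N x N image (list of rows) into one vector of N*N pixels."""
    return [pixel for row in image for pixel in row]

def distance_squared(u, v):
    """Squared straight-line distance between two pixel vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical training examples: name -> flattened 2 x 2 image.
training = {
    "alice": flatten([[0, 10], [200, 255]]),
    "bob": flatten([[255, 200], [10, 0]]),
}

def recognize(image):
    """Return the name of the training example closest to the unknown image."""
    vector = flatten(image)
    return min(training, key=lambda name: distance_squared(training[name], vector))

print(recognize([[5, 20], [190, 250]]))
```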
![Page 39: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/39.jpg)
Face Recognition (Picasa - Google)
• Image search/organization
• Automatically finds, crops and groups images of the same person from a collection of photos
• Allows user feedback (trainable) - user can indicate if it found the wrong person.
![Page 40: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/40.jpg)
From R. Szeliski, Computer Vision Algorithms and Application, Course Notes CSE 576, U. Washington
Create visual “words” from image features.
Face/Object Recognition/Search: Feature-Based Technology
[Figure: Object → Extract Features → Bag of “words”*]
*Li Fei-Fei (Princeton)
![Page 41: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/41.jpg)
From R. Szeliski, Computer Vision Algorithms and Application, Course Notes CSE 576, U. Washington
Do this for multiple objects
Face/Object Recognition/Search: Feature-Based Technology
*Li Fei-Fei (Princeton)
![Page 42: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/42.jpg)
From R. Szeliski, Computer Vision Algorithms and Applications, p. 605
How to get matching images/documents?
Use “word” frequencies $n_{id} / n_d$, where $n_{id}$ = # times word i occurs in document d and $n_d$ = total # words in document d
Then combine word frequency with inverse document frequency weighting to downweight words that occur frequently (D = # of occurrences; A = average # of occurrences)
Face/Object Recognition/Search: Bag of Words
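A small sketch of matching by weighted visual-word frequencies. Several things here are assumptions, since the slide does not spell them out: the inverse document frequency is taken in the common form log(N / N_i), where N is the number of documents and N_i the number containing word i; the score is a plain dot product of weights; and the visual "words" are made-up labels standing in for quantized image features.

```python
import math
from collections import Counter

# Hypothetical images, each reduced to a bag of visual "words".
docs = {
    "img1": ["wheel", "wheel", "window", "sky"],
    "img2": ["beak", "feather", "sky"],
    "img3": ["beak", "beak", "feather", "grass"],
}

N = len(docs)
# N_i: number of documents that contain word i at least once.
doc_freq = Counter(word for words in docs.values() for word in set(words))

def weights(words):
    """Weight of each word: (n_id / n_d) * log(N / N_i)."""
    counts = Counter(words)
    n_d = len(words)
    return {
        w: (n / n_d) * math.log(N / doc_freq[w])
        for w, n in counts.items()
        if w in doc_freq  # skip words never seen in the collection
    }

def score(query_words, doc_name):
    """Dot product of the query weights and the document weights."""
    q, d = weights(query_words), weights(docs[doc_name])
    return sum(q[w] * d.get(w, 0.0) for w in q)

query = ["beak", "beak", "feather"]
best = max(docs, key=lambda name: score(query, name))
print(best)
```

The query shares no words with img1, so that score is zero. Both img2 and img3 share "beak" and "feather" with the query, and img3 wins because "beak" makes up a larger share of its words.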
![Page 43: Lecture #32 WWW Search. Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (I.e. a person ’ s name,](https://reader037.vdocument.in/reader037/viewer/2022110103/56649e375503460f94b27cfd/html5/thumbnails/43.jpg)
From R. Szeliski, Computer Vision Algorithms and Application, Course Notes CSE 576, U. Washington
Drop word features through a “vocabulary tree” to classify
Face/Object Recognition/Search: Feature-Based Technology
*Li Fei-Fei (Princeton)