search engine and services
Search engine and services. Course: Location Aware Machine Intelligence. Presented by: Celestine Mkama Kalendero, 25.02.2014. Outline: search engine results ranking based on location; review of a personalized mobile search engine; extraction of address data from unstructured text.
![Page 1: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/1.jpg)
Search engine and services
Course: Location Aware Machine Intelligence
Presented by: Celestine Mkama Kalendero
25.02.2014
![Page 2: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/2.jpg)
Outline
1. Search Engine results ranking based on location
2. Review of Personalized Mobile Search Engine
3. Extraction of Address Data from Unstructured Text
![Page 3: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/3.jpg)
Search Engine Results Ranking based on Location
Carolyn Watters and Ghada Amoudi
Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
E-mail: [email protected]
Year: 2003
![Page 4: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/4.jpg)
Result Ranking in Search Engines
(as of the year 2002)
Search engines build their indexes based on
a) Keyword occurrence
b) Frequency of query negotiation
Pros
+ Robust, fast
Cons
- Users must sort through pages of results when queries relate to physical distance and location
- 44% of users are frustrated by search engines (Realname, 2000)
![Page 5: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/5.jpg)
Geosearcher
- Location-based ranking system
- Translates the search reference point into coordinates (Long, Lat)
- Ranks search results in ascending order of distance
Geosearcher architecture
![Page 6: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/6.jpg)
Geosearcher architecture-Query
- Presented by end-system users, e.g. "skiing resort District of Columbia"
  - Query: skiing resort
  - Reference point: District of Columbia
- Sample random URLs available (used for evaluation)
![Page 7: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/7.jpg)
Geosearcher architecture-Geocoding
The process of assigning latitude and longitude coordinates to the host of each site.
- Preliminary work (performed by the researchers)
a) Determine location
b) Create lookup table
![Page 8: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/8.jpg)
Geosearcher architecture-Geocoding
a) Determining location
- From host URLs: DNS, country codes, Whois database
- Map the location into coordinates, e.g. use the Getty Thesaurus (GS)
  + Containing state and area code for US, Canada
  + Other countries
b) Lookup table
- Country codes with coordinates
www.about.com
www.dartmouth.ca
mathresource.com
![Page 9: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/9.jpg)
Geosearcher architecture-Geocoding
Lookup Table

| Country Code | State Code | Area Code | Coordinates (Lat, Long) |
|---|---|---|---|
| US | AL | 256 | 34.9200, 87.2703 |
| US | CA | 530 | 38.8951, 77.0367 |
| CA | NS | 902 | 45.0000, 63.0000 |
| FI | Helsinki | | 60.1708, 24.9375 |
| NO | Oslo | | 59.9500, 10.7500 |
![Page 10: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/10.jpg)
Example: Location Information
Getty thesaurus
Whois Database
![Page 11: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/11.jpg)
Geosearcher architecture-Geocoding
The Process
a) Check the host table for coordinates.
b) If not found, send the domain to whois, which returns the country code (CC) and area code on a match. If the CC is ca or us and an area code is present, look it up in the table to get the state or province name.
c) If there is no match, strip the domain down by one level (i.e. data.about.com to about.com).
d) Unmatched names are checked in IPtoLL (Host-LatLong Conversion). IPtoLL uses the administrative contact.
Store the results in the host table.
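The steps above can be sketched roughly as follows. This is a minimal illustration, not the Geosearcher code: `whois_lookup` and `ip_to_ll` are hypothetical callables standing in for the whois and IPtoLL services, and the lookup table is assumed to be keyed by `(country_code, area_code)` with `(country_code, None)` as a country-level entry.

```python
def geocode_host(host, host_table, lookup_table, whois_lookup, ip_to_ll):
    """Return (lat, long) for a host, or None if every step fails.

    whois_lookup(domain) -> (country_code, area_code) or None  (hypothetical)
    ip_to_ll(host)       -> (lat, long) or None                (hypothetical)
    """
    # a) Check the host table first.
    if host in host_table:
        return host_table[host]

    coords = None
    domain = host
    while coords is None:
        # b) Ask whois for the country code and area code.
        record = whois_lookup(domain)
        if record is not None:
            cc, area = record
            if cc in ("US", "CA") and area is not None:
                coords = lookup_table.get((cc, area))
            if coords is None:
                coords = lookup_table.get((cc, None))
        if coords is None:
            # c) Strip the domain by one level, e.g. data.about.com -> about.com.
            if domain.count(".") < 2:
                break
            domain = domain.split(".", 1)[1]

    # d) Fall back to the IP-based conversion for unmatched names.
    if coords is None:
        coords = ip_to_ll(host)

    # Store the result so the next query hits step a).
    if coords is not None:
        host_table[host] = coords
    return coords
```

Passing the two services in as parameters keeps the sketch runnable without network access; a real system would wrap actual whois and IP-location queries.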
![Page 12: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/12.jpg)
Geosearcher architecture-Geocoding
Host Table

| Host | Coordinates (Lat, Long) |
|---|---|
| www.skibluemt.com | 34.9200, 87.2703 |
| www.dcski.com | 38.8951, 77.0367 |
![Page 13: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/13.jpg)
Distance and Ranking
- For ranking, the distance of each URL in the host table from the reference location is calculated using the haversine distance
- Distances are stored in the session host table
- Results are ranked by distance (insertion sort)
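As an illustration of this step, the sketch below computes the haversine distance and ranks hosts with an insertion sort. The function names and the Earth radius of 6371 km are assumptions made for the example; the slides do not give these details.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not from the slides


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, long) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_by_distance(reference, host_table):
    """Return [(host, distance_km)] in ascending order of distance from reference."""
    ranked = []
    for host, (lat, lon) in host_table.items():
        d = haversine_km(reference[0], reference[1], lat, lon)
        # Insertion sort: walk back to the first entry that is not farther away.
        i = len(ranked)
        while i > 0 and ranked[i - 1][1] > d:
            i -= 1
        ranked.insert(i, (host, d))
    return ranked


# Host table values taken from the slide; the reference point is the second host's location.
hosts = {
    "www.skibluemt.com": (34.9200, 87.2703),
    "www.dcski.com": (38.8951, 77.0367),
}
ranking = rank_by_distance((38.8951, 77.0367), hosts)
```

Insertion sort suits this case because results arrive one at a time and the list stays sorted after each insert.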
![Page 14: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/14.jpg)
Results
Unranked Result-
Altavista
Using Geosearcher
![Page 15: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/15.jpg)
Results (cont'd)
Validation of accuracy
- Examined 100 results manually for location information
- 90 websites assigned correctly
- 78% of 83 URLs were accurately identified
![Page 16: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/16.jpg)
Results (cont'd)
Algorithm effectiveness
- Tested with 10 sets of 100 URLs using the Yahoo Random Link generator
![Page 17: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/17.jpg)
Personalized Mobile Search Engine Using Location and Content Concepts
Namrata G Kharate
ME-Computer-II, MCOERC, Nasik-India
Prof. S. A. Bhavsar
Assistant Prof., Computer Dept., MCOERC, Nasik-India
Publication: November, 2013
![Page 18: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/18.jpg)
Search - Mobile Devices
- Search queries on mobile devices: shorter, ambiguous
- Search results: less accurate
Solution
- We need a system that captures user preferences to return a personalized result ranking
- Personalized Mobile Search Engine (PMSE)
![Page 19: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/19.jpg)
PMSE- System Architecture
RSVM: Ranking Support Vector Machine
![Page 20: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/20.jpg)
![Page 21: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/21.jpg)
PMSE
Client
- Receive user requests
- Store click-through data (location, content)
- Submit request to server
- Display results
- Profile preferences in an ontology-based user profile
Server
- Forward request to commercial search engine
- RSVM training
- Search result reranking
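The reranking step on the server can be pictured with the sketch below. It assumes RSVM training has already produced a linear weight vector over result features; the feature names (`content`, `location`) and the result dictionaries are hypothetical and only illustrate the idea, not the PMSE implementation.

```python
def rerank(results, weights):
    """Sort search results by a linear score, highest first.

    results: list of dicts with a "features" mapping (hypothetical structure)
    weights: feature name -> weight, assumed to come from RSVM training
    """
    def score(result):
        return sum(weights.get(name, 0.0) * value
                   for name, value in result["features"].items())

    # sorted() is stable, so ties keep the search engine's original order.
    return sorted(results, key=score, reverse=True)


# Results as returned by the commercial search engine (made-up feature values).
results = [
    {"url": "a", "features": {"content": 0.2, "location": 0.1}},
    {"url": "b", "features": {"content": 0.1, "location": 0.9}},
]
personalized = rerank(results, {"content": 1.0, "location": 1.0})
```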
![Page 22: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/22.jpg)
Extraction of Address Data from Unstructured Text using Free Knowledge Resources
Sebastian [email protected]
Simon [email protected]
Ralf [email protected]
Christoph [email protected]
Multimedia Communications Lab, Technische Universität Darmstadt, Germany
Publication: November, 2013
![Page 23: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/23.jpg)
Extraction of Address Data
Is of interest in various domains
o Location-based services
o Address repositories, automatically created
- Automatic harvesting of addresses from the web is not possible
Solution
- Identify business address data with a hybrid approach
- Combine patterns and gazetteers
![Page 24: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/24.jpg)
Address Structure-Germany
- Company name: no special pattern
- Street: varies, e.g. Burgermeister-Jung, Bgm.-Jung
- Street number: digit sequence, e.g. 45a, 45-47
- Postal code: exactly 5 digits, reserved
- Cities: e.g. Frankfurt, Ffm, Frankfurt/Main
![Page 25: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/25.jpg)
Address Data Identification
Workflow
![Page 26: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/26.jpg)
Address Data Identification
Preprocessing
- Strip HTML markup, e.g. using the Beautiful Soup library
- Cleaning: removing non-Unicode characters and whitespace between numbers
- Line splitting and tokenizing, using the Apache OpenNLP toolkit
- Part-of-speech tagging, using TreeTagger
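A rough sketch of the first three preprocessing steps is shown below. To stay self-contained it swaps in Python's standard `html.parser` for Beautiful Soup and plain whitespace splitting for the OpenNLP tokenizer, and it omits part-of-speech tagging; the class and function names, the set of block-level tags, and the cleaning rules are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content and turns block-level tags into line breaks."""

    BLOCK_TAGS = {"br", "p", "div", "li", "tr", "h1", "h2", "h3"}  # assumed set

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def preprocess(html):
    """Return a list of token lists, one per non-empty text line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for raw in "".join(parser.parts).splitlines():
        # Cleaning: replace non-printable characters with a space.
        line = "".join(ch if ch.isprintable() else " " for ch in raw)
        # Cleaning: drop whitespace between digits, e.g. "64 283" -> "64283".
        line = re.sub(r"(?<=\d)\s+(?=\d)", "", line)
        tokens = line.split()  # stand-in for a real tokenizer
        if tokens:
            lines.append(tokens)
    return lines


sample = "<p>Muster GmbH</p><p>Hauptstr. 45a<br>64 283 Darmstadt</p>"
token_lines = preprocess(sample)
```

The sample address is made up. Note that this simple extractor also keeps the text of `script` and `style` elements, which a real pipeline would want to skip.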
![Page 27: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/27.jpg)
![Page 28: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/28.jpg)
Address Data Identification
Line splitting and tokenizing, using the Apache OpenNLP toolkit
![Page 29: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/29.jpg)
Address Data Identification
1. Postal Codes
Token regular expression [0-9]{5}
2. Cities
Generated list based on OpenStreetMap, accessed via Overpass-API (28,087 entries)
o Known city found in the list
o Preceded directly by postal code
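The two rules above can be combined in a few lines. In this sketch the city set is a tiny hypothetical stand-in for the OpenStreetMap-derived list, and only single-token city names are handled.

```python
import re

POSTAL_CODE = re.compile(r"[0-9]{5}")  # the token pattern from the slide

# Tiny hypothetical stand-in for the 28,087-entry OpenStreetMap list.
CITIES = {"Darmstadt", "Frankfurt", "Ffm", "Frankfurt/Main"}


def find_postal_city(tokens, cities=CITIES):
    """Return (postal_code, city) pairs where a known city directly follows a postal code."""
    hits = []
    for i in range(len(tokens) - 1):
        # fullmatch ensures the whole token is exactly five digits.
        if POSTAL_CODE.fullmatch(tokens[i]) and tokens[i + 1] in cities:
            hits.append((tokens[i], tokens[i + 1]))
    return hits
```

For example, `find_postal_city(["64283", "Darmstadt"])` yields one pair, while a six-digit token or a city missing from the list yields none.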
![Page 30: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/30.jpg)
Address Data Identification
3. Street Numbers
Use regular expression ([0-9]{1,3})([a-zA-Z][0-9]?)?(([+|-])([0-9]{1,3})([a-zA-Z][0-9]?)?)?
4. Street Names
Generated list based on OpenStreetMap, accessed via Overpass-API (300,000 entries)
o Use street name endings, e.g. str
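The street-number expression can be tried out directly, as sketched below. The regular expression is copied from the slide; the list of street-name endings beyond "str" and the helper names are assumptions added for illustration, and the street-name set stands in for the OpenStreetMap list.

```python
import re

# Regular expression from the slide, applied to a whole token.
STREET_NUMBER = re.compile(
    r"([0-9]{1,3})([a-zA-Z][0-9]?)?(([+|-])([0-9]{1,3})([a-zA-Z][0-9]?)?)?"
)

# The slide only names "str"; the other endings are assumed common variants.
STREET_ENDINGS = ("str", "str.", "straße", "strasse")


def is_street_number(token):
    """True if the whole token looks like a street number such as 45a or 45-47."""
    return STREET_NUMBER.fullmatch(token) is not None


def looks_like_street(token, street_names=frozenset()):
    """True if the token is a known street name or has a typical street ending."""
    return token in street_names or token.lower().endswith(STREET_ENDINGS)
```

Note that the character class `[+|-]` in the original expression also accepts a literal `|` as a range separator.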
![Page 31: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/31.jpg)
Address Data Identification
5. Company Name
- Search for terms identical to a list of legal forms taken from Wikipedia (29 terms), e.g. GmbH (private), AG (public)
- Exploit the standard address structure
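One way to picture this step is sketched below. It assumes that "exploiting the standard address structure" means the company name sits on a line above the street line, which the slide does not spell out. The term set holds only the two terms named on the slide, and the function name is hypothetical.

```python
# Only the two terms named on the slide; the full list has 29 entries.
LEGAL_FORMS = {"GmbH", "AG"}


def find_company_line(token_lines, street_line_index):
    """Search upward from the street line for a line containing a legal-form term.

    Assumes the company name appears above the street line (an assumption
    about the address structure, not a detail given in the slides).
    """
    for i in range(street_line_index - 1, -1, -1):
        if any(token in LEGAL_FORMS for token in token_lines[i]):
            return " ".join(token_lines[i])
    return None


# Made-up example: token lines from a legal notice page.
example = [["Impressum"], ["Muster", "GmbH"], ["Hauptstr.", "45a"], ["64283", "Darmstadt"]]
company = find_company_line(example, 2)
```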
![Page 32: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/32.jpg)
Evaluation & Methodology
- Sites with a legal notice (1,576 websites)
- Fraction of full addresses identified correctly
- R_correct address = 0.946, R_company = 0.82
[Figure: precision and recall, on an axis from 0.5 to 1, for complete address without company name, complete address with company name, company name, street, and city]
![Page 33: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/33.jpg)
Conclusion
Search engine ranking
- Evaluation: the algorithm was accurate and effective
- Efficiency: impacted by reliance on external databases
Recommendation
- Have a database of special resources to increase efficiency
- Address extraction: adaptation to other languages
![Page 34: Search engine and services](https://reader035.vdocument.in/reader035/viewer/2022070500/5681685f550346895ddea418/html5/thumbnails/34.jpg)
Thank You!
(Q&A)