A Specialised Search Engine for Neuroscience WebPages
![Page 1: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/1.jpg)
A Specialised Search Engine for Neuroscience WebPages
Fatma Y. ELDRESI (MPhil), Systems Analysis / Programming Specialist, AGOCO
Part-time lecturer, University of Garyounis
NeuroSearch
![Page 2: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/2.jpg)
2
Contents
• Introduction
• Components of NeuroSearch and its architecture
• Implementation
• Software lifecycle: (1) WebCrawler Engine, (2) Indexer Engine, (3) Query Engine, (4) Re-Crawler Engine (specialised crawler)
• Challenges
• Testing
• Conclusions
![Page 3: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/3.jpg)
3
Introduction
What is a search engine?
A server, or a collection of servers, dedicated to indexing internet web pages, storing the results, and returning lists of pages that match particular queries.
Conventional search engines generate their indexes in different ways:
• Google uses a spider
• Yahoo uses a directory
• “NeuroSearch” uses a spider together with the advance knowledge
![Page 4: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/4.jpg)
4
Introduction cont..
Defining the problem
(1) Users face many challenges in choosing relevant keywords.
(2) Professionals sometimes fail in their search and get disappointing results, because the retrieved pages are sometimes (A) unrelated to, or (B) different from, what they are looking for.
The objective
• Create a specialised search engine (i.e. one that uses advance knowledge) to read web documents
• Index and update all the content on the local server
• Answer queries from the local database
• Update the system at regular intervals
Why is a specialised search engine needed?
• The Web has no centralised organisation and holds a huge, mixed collection of information
• It is updated continuously and has no standard format
• Pages are extensively linked
Therefore, establishing standard measures for relevance is a very challenging task.
![Page 5: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/5.jpg)
5
Components of “NeuroSearch”
It has two components:
1. Search/Crawler Engine
2. Query Engine
![Page 6: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/6.jpg)
6
Components explained
• Spider: part of the Crawler Engine
• Indexer: part of the Crawler Engine
• Re-crawler: part of the Crawler Engine
• Retriever: the Query Engine
![Page 7: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/7.jpg)
7
“NeuroSearch” Architecture Model
[Architecture diagram. Components shown: Users, Search Engine Interface, Query Engine, Index, Indexer, WebCrawler, Re-Crawler, World Wide Web (WWW)]
![Page 8: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/8.jpg)
8
Implementation and Case Study
• Creating the database using Access DB.
• Implementing all parts of “NeuroSearch” using the Java language and SQL.
![Page 9: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/9.jpg)
9
NeuroSearch Database
[Database diagram, labelled “The Advance Knowledge”. Data stores shown: WebCrawler data, Advance Knowledge data, Re-crawler data, Query data, Indexer data]
![Page 10: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/10.jpg)
10
The advance knowledge case study: Neuroscience (Vision)
Phase 1: NeuroSearch uses advance knowledge about neuroscience (vision) as a case study.
Phase 2: Data mining is then applied to the vision domain knowledge to construct keywords and the relations between them.
Phase 3: This knowledge is stored in the database and categorised by numbers. Related knowledge is categorised too and stored in the database in network form.
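As an illustration only, the keyword network described above could be held in a structure like the following Python sketch. The keywords, category numbers and the `related_keywords` helper are hypothetical; the slides do not show the actual database schema, and the original system was written in Java and SQL.

```python
# Hypothetical advance-knowledge network: each keyword has a numeric
# category and a list of related keywords (the "network" links).
ADVANCE_KNOWLEDGE = {
    "eye": {"category": 1, "related": ["retina", "cornea"]},
    "retina": {"category": 2, "related": ["eye", "photoreceptor"]},
    "cornea": {"category": 2, "related": ["eye"]},
    "photoreceptor": {"category": 3, "related": ["retina"]},
}


def related_keywords(keyword, knowledge):
    """Return the keywords linked to `keyword`, or an empty list if unknown."""
    entry = knowledge.get(keyword.lower())
    return [] if entry is None else list(entry["related"])
```

A lookup such as `related_keywords("Eye", ADVANCE_KNOWLEDGE)` follows one hop of the network; a relational table with one row per keyword pair would express the same links in SQL.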
![Page 11: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/11.jpg)
11
Software lifecycle
The Crawler Engine consists of:
1. WebCrawler/Spider Engine
2. Indexer Engine
3. Re-Crawler (specialised)
![Page 12: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/12.jpg)
12
WebCrawler (Spider)
1) This is a general web crawler that can download any kind of web page. It works as follows:
2) It fetches a URL, retrieves all its web pages and saves them on the local drive.
3) In addition, the WebCrawler has to pass through the proxy firewall (i.e. on the Newcastle University LAN) before downloading any website.
4) The crawler performs a breadth-first search, which means it collects a list of all the links on the current page before it follows any of the links to a new page.
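The breadth-first order described above can be sketched in Python as follows. This is a minimal illustration, not the original Java crawler: `crawl_breadth_first`, the `get_links` callback and the `DEMO_SITE` link graph are all hypothetical names, and a real crawler would download and parse each page instead of reading from a dictionary.

```python
from collections import deque

# Hypothetical in-memory "web": page name -> links found on that page.
DEMO_SITE = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "a"],
    "d": [],
}


def crawl_breadth_first(start_url, get_links, max_pages=100):
    """Visit pages level by level, starting from start_url.

    All links on the current page are queued before any of them is
    followed, which is what makes the traversal breadth-first.
    """
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return order
```

Calling `crawl_breadth_first("a", lambda url: DEMO_SITE.get(url, []))` visits the start page, then every page it links to, and only then the pages one level further out.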
![Page 13: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/13.jpg)
13
WebCrawler: the real challenge
Challenge 1: Connecting to the WWW and accessing private websites.
Solution 1: The crawler's socket must first connect to the proxy server.
Challenge 2: Connecting this socket onward to the WWW.
Solution 2: The GET method. With a direct socket connection, the GET command takes just the file name. When going through the proxy, however, the GET command has to take the full URL.
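The difference between the two GET forms can be shown with a short Python sketch. It is an illustration under assumptions, not the original Java code: the function names, the use of HTTP/1.0 and the example URL are hypothetical, and `fetch_via_proxy` needs a reachable proxy to do anything useful.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url, via_proxy):
    """Build a minimal HTTP/1.0 GET request for `url`.

    Direct connection: the request line carries only the path.
    Through a proxy: the request line carries the full URL.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    target = url if via_proxy else path
    lines = [f"GET {target} HTTP/1.0", f"Host: {parts.netloc}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def fetch_via_proxy(url, proxy_host, proxy_port, timeout=10.0):
    """Connect the socket to the proxy first, then ask it for the full URL."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_get_request(url, via_proxy=True))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For `http://example.org/vision/index.html`, the direct request line is `GET /vision/index.html HTTP/1.0`, while the proxied one is `GET http://example.org/vision/index.html HTTP/1.0`.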
![Page 14: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/14.jpg)
14
Indexer Engine
1) First, it searches the web page using its advance knowledge. The page is deleted if it is not related to the case-study subject.
2) If the page is related to the case-study subject (neuroscience), the indexer collects the following information from the document:
3) All the keywords it contains, how many times each is repeated, the title and the contents. These are saved in the database for later display in the query results and for further calculations.
4) The ranking method.
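The filtering and counting steps above can be sketched in Python. This is a simplified illustration, not the original Java indexer: the keyword set, the `index_page` function and the rule "discard the page if no advance-knowledge keyword occurs" are assumptions, and the ranking method itself is not reproduced because the slides do not specify it.

```python
import re
from collections import Counter

# Hypothetical advance-knowledge keywords for the vision case study.
ADVANCE_KEYWORDS = {"eye", "retina", "cortex", "vision"}


def index_page(title, text, keywords=ADVANCE_KEYWORDS):
    """Return an index record for a related page, or None to discard it.

    A page counts as related here if it contains at least one
    advance-knowledge keyword (an assumed, simplified relevance rule).
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word in keywords)
    if not counts:
        return None  # unrelated page: nothing is stored
    return {
        "title": title,
        "keyword_counts": dict(counts),  # keyword -> times repeated
        "content": text,
    }
```

The returned record holds the fields the slide lists (keywords, repetition counts, title, contents), ready to be written to the index database.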
![Page 15: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/15.jpg)
15
Query Engine
• It has an interface that accepts keywords from the user.
• It gives the user two choices: display only the most relevant results, or display the whole result set, which includes the related results.
• It searches for the query keywords in the index database and returns the results in HTML format.
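A keyword lookup followed by HTML rendering might look like the Python sketch below, which uses an in-memory SQLite database in place of the Access database. The `page_index` table, its columns, the sample rows and the ordering by repetition count are all assumptions made for illustration; the slides do not give the real schema or ranking formula.

```python
import html
import sqlite3


def demo_connection():
    """Create a small in-memory index with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_index (url TEXT, title TEXT, keyword TEXT, repeated INTEGER)"
    )
    conn.executemany(
        "INSERT INTO page_index VALUES (?, ?, ?, ?)",
        [
            ("http://example.org/b", "Retina", "eye", 2),
            ("http://example.org/a", "Eye anatomy", "eye", 5),
            ("http://example.org/c", "Cortex", "cortex", 7),
        ],
    )
    return conn


def search(conn, keyword, limit=10):
    """Return (url, title, repeated) rows for the keyword, most repeated first."""
    return conn.execute(
        "SELECT url, title, repeated FROM page_index "
        "WHERE keyword = ? ORDER BY repeated DESC LIMIT ?",
        (keyword.lower(), limit),
    ).fetchall()


def render_html(rows):
    """Render the result rows as an ordered HTML list."""
    items = "".join(
        f'<li><a href="{html.escape(url)}">{html.escape(title)}</a> ({count})</li>'
        for url, title, count in rows
    )
    return f"<ol>{items}</ol>"
```

Using a parameterised query (`?` placeholders) keeps user-supplied keywords out of the SQL text, and `html.escape` keeps page titles from breaking the generated markup.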
![Page 16: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/16.jpg)
16
Query result: this is indeed an edge over other conventional search engines.
![Page 17: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/17.jpg)
17
Re-Crawling
1- The re-crawler is a WebCrawler specialised in any subject created in the advance knowledge in the database. It achieves this by reading the URLs from the index database using SQL.
2- Its interface lets special users decide whether to continue crawling a website or cancel it.
3- This part of the software aims to update the index with newly found links. This makes it easier to search and crawl websites related to any “advance knowledge” subject.
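The re-crawling loop can be sketched in Python as follows. Everything named here is hypothetical: the `page_index` table, the `fetch_links` callback that returns the links found on a page, and the `approve` callback that stands in for the special user's continue/cancel decision. The original component was written in Java and SQL.

```python
import sqlite3


def recrawl(conn, fetch_links, approve):
    """Revisit every indexed URL and return links not yet in the index.

    conn        -- database connection holding a page_index(url) table
    fetch_links -- callable: url -> list of links found on that page
    approve     -- callable: url -> True to crawl the site, False to cancel
    """
    known = {url for (url,) in conn.execute("SELECT DISTINCT url FROM page_index")}
    new_links = []
    for url in sorted(known):
        if not approve(url):  # the special user cancelled this site
            continue
        for link in fetch_links(url):
            if link not in known and link not in new_links:
                new_links.append(link)
    return new_links
```

In this sketch the new links are only returned; passing them on to the indexer so that the index is updated is left to the caller.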
![Page 18: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/18.jpg)
18
Testing phase
The test phase requires checking the first 10 ranked query results of “NeuroSearch” against the first 10 results of the same queries on another search engine, such as Google.
Query categories (20 tests for each category):
• general keywords
• specific keywords
• abbreviation keywords
• combined keywords
• abbreviation & combined keywords
Total of 1000 tests
![Page 19: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/19.jpg)
19
Testing cont..
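Ranking query test results in General Keywords:

| First 10 results | Google: Rank | Google: Keyword | Google: Repeated | NeuroSearch: Rank | NeuroSearch: Keyword | NeuroSearch: Repeated | NeuroSearch: Related-keyword | NeuroSearch: Repeated | NeuroSearch: Quality/percentage |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 10 | 1 | 3 | 53 | 3 | 37% |
| 2 | 10 | 1 | 3 | 10 | 1 | 3 | 51 | 3 | 27% |
| 3 | 0 | 0 | 0 | 10 | 1 | 3 | 37 | 3 | 36% |
| 4 | 0 | 0 | 0 | 10 | 1 | 3 | 37 | 3 | 33.6% |
| 5 | 0 | 0 | 0 | 10 | 1 | 3 | 34 | 3 | 36.7% |
| 6 | 0 | 0 | 0 | 10 | 1 | 3 | 29 | 3 | 38.4% |
| 7 | 0 | 0 | 0 | 10 | 1 | 3 | 28 | 3 | 38.1% |
| 8 | 0 | 0 | 0 | 10 | 1 | 3 | 28 | 3 | 38% |
| 9 | 0 | 0 | 0 | 10 | 1 | 3 | 28 | 3 | 24.9% |
| 10 | 0 | 0 | 0 | 10 | 1 | 3 | 28 | 3 | 13.8% |

Average %: Google 10%, 10%; NeuroSearch 100%, 100%

Table 1: (Query 1) Ranking query test results in General Keywords (Eye)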
![Page 20: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/20.jpg)
20
Testing cont..
[Chart 1: Average of keywords performance for category-based test results (Google). Chart title: The average ranking performance, engine query test results (category based). Error bars: +/- 1 standard deviation. Y-axis: ranking performance, 0 to 100. X-axis: categories 1 to 5. Values shown: 6.33, 36.66, 1.99, 48.99, 80.96]
[Chart 2: Average of keywords performance for category-based test results (NeuroSearch). Chart title: The average keyword performance, engine query test results (category based). Error bars: +/- 1 standard deviation. Y-axis: ranking performance, 0 to 100. X-axis: categories 1 to 5. Values shown: 92.33, 88.49, 92.99, 79.49, 98.16]
![Page 21: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/21.jpg)
21
Analysing the search engines' ranking results by category
Independent-samples t-test: Google search engine * NeuroSearch search engine

| Category | t-value | Sig. (2-tailed) | df (degrees of freedom) | Result |
|---|---|---|---|---|
| General keywords | -16.920 | .000 | 9 | Statistically significant |
| Specific keywords | -4.394 | .000 | 19 | Statistically significant |
| Abbreviation keywords | -63.50 | .000 | 19 | Statistically significant |
| Combined keywords | -3.387 | .003 | 19 | Statistically significant |
| Abbreviation, combined and specific keywords | -2.904 | .009 | 19 | Statistically significant |

Table 4. The average ranking performance of the engines: query test results, category based
![Page 22: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/22.jpg)
22
Analysing the average ranking performance of the engines: query test results, category based
t-test result analysis:
• The t-test is used to compare two groups' scores on the same variable.
• The differences are significant (p value < .05).
• This indicates that NeuroSearch has a statistically significantly higher mean score across all categories' ranking results (100) than Google (52.35).
• The negative t-values reflect the direction of the comparison: with Google entered first, a negative t means Google's mean is lower than NeuroSearch's. They do not by themselves show an inverse relation between the two engines' results.
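To make the sign of t concrete, the following Python sketch computes a pooled-variance (Student's) independent-samples t statistic. The scores in the usage note are made up for illustration and are not the study's data; the slides do not state which t-test variant the statistics package applied, so the pooled form is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent-samples t-test.

    t is computed as mean(group_a) - mean(group_b) over the standard
    error, so its sign depends only on which group is listed first.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df
```

With the made-up scores `[1, 2, 3]` for the first engine and `[4, 5, 6]` for the second, t is negative because the first mean is lower; swapping the arguments gives the same magnitude with a positive sign.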
![Page 23: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/23.jpg)
23
Visual representation
[Chart 3: Average of category-based engine ranking performance. Chart title: Average ranking engines performance, queries based. Axis: ranking performance, 0 to 100. Google: 52.35; NeuroSearch: 100]
[Chart 4: Average of the keywords in the documents in query test results, category-based query engine performance. Chart title: Average keywords engines performance, queries based. Axis: average of keywords, 0 to 100. Google: 34.98; NeuroSearch: 90.29]
![Page 24: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/24.jpg)
24
Conclusion
Although the “NeuroSearch” search engine uses a simple algorithm to judge page quality compared with other conventional search engines, “NeuroSearch” proves to be very powerful in obtaining relevant results, particularly if its advance knowledge is built by a specialist (domain knowledge), e.g. oil, medical, arts, etc.
![Page 25: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/25.jpg)
25
![Page 26: A Specialised Search Engine for Neuroscience WebPages](https://reader031.vdocument.in/reader031/viewer/2022012916/56813478550346895d9b5829/html5/thumbnails/26.jpg)
26
Ready for Questions!!!