Secure Search Engine
Ivan Zhou, Xinyi Dong

Project Overview
The Secure Search Engine (SSE) project is a search engine that uses special modules to test the validity of a web page.
These tests verify the web page's certificates and determine whether the page in question is a phishing site.
Changed goal: set up a proxy in the cloud to serve as the medium of communication between the client and the SSE.
Detailed Architecture
Components: browser / phone, proxy, SSE (certificate verification module, phishing status verification module).

Detailed Architecture
[Diagram: a cell phone or browser connects over the Internet to the proxy, which forwards requests to the SSE and its certificate verification and phishing verification modules.]
Project Description
Migration of a different SSE project to the Mobicloud.
Test the SSE in this new environment and modify it if necessary.
Set up a proxy, with its code, so that it can communicate with the SSE server and do its work properly.
Roadmap (milestone dates: 2/6, 2/15, 2/27, 3/10, 3/20, 4/8, 4/18)
Migration
Testing/Fixing
Background Crawler
Android SSE
Migration of SSE (another version)
Modify and Test SSE
Set Up Proxy
Proxy-SSE Communication
Task Allocations
Ivan: migration of another version of SSE; modify and test the new SSE.
Xinyi: set up the proxy; communication between the SSE and the proxy.
Technical Details for Task 1
Task 1: migrate the existing SSE project from a local environment to Mobicloud.
Software installation: Apache Tomcat, MySQL, NetBeans, SVN, Java JDK, Jython.
Configuration: the VM's Internet connection, VNC configuration, PATH for Java/Tomcat/SVN, connection to the MySQL server.
Publish the website to Apache Tomcat.
Technical Details for New Task 2
Two parts need to be tested carefully: the phishing filter and the crawler.
The phishing filter checks the database to see whether a URL is a phishing site, checks whether a third-party site (PhishTank) has flagged it as a phishing site, and computes the confidence ourselves.
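A minimal sketch of how the filter's three checks might be combined into one confidence score. The function names, weights, and threshold here are hypothetical; the slides only name the signals (local database hit, PhishTank flag, our own estimate):

```python
# Hedged sketch of the phishing filter's decision logic. Signal names
# and weights are assumptions, not the project's actual code.

def phishing_confidence(in_local_db: bool, flagged_by_phishtank: bool,
                        heuristic_score: float) -> float:
    """Combine the three signals into a confidence value in [0, 1]."""
    score = 0.0
    if in_local_db:           # known phishing site in our bank database
        score += 0.5
    if flagged_by_phishtank:  # third-party (PhishTank) confirmation
        score += 0.3
    # our own heuristic estimate, clamped to [0, 1]
    score += 0.2 * max(0.0, min(1.0, heuristic_score))
    return score

def is_phishing(url_signals: dict, threshold: float = 0.5) -> bool:
    """Flag the URL once the combined confidence crosses the threshold."""
    return phishing_confidence(**url_signals) >= threshold
```

Keeping the combination in one pure function makes the threshold easy to tune, which matters since the deck later mentions changing a threshold value.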
Technical Details for Task 2
Crawler.py: a Python implementation of the Java code that crawls a web page's information: seeds in the database, crawl the domain, crawl the domain path, crawl child links.
Difficulties encountered: web pages' particularities, e.g. localhost (solved); only connecting over port 443; what about port 80? (solved); unreasonable logic in crawler.py, e.g. depth handling (exploring); other problems (exploring).
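The seeds-to-domain-to-child-links flow above can be sketched as a breadth-first crawl with an explicit depth limit (depth handling being one of the points still under exploration). The `fetch_links` callback and `max_depth` default are hypothetical; only the overall flow comes from the slide:

```python
# Hedged sketch of crawler.py's traversal: seed URLs from the database,
# then crawl the domain and its child links up to a fixed depth.
from urllib.parse import urljoin, urlparse

def crawl(seed_urls, fetch_links, max_depth=2):
    """Breadth-first crawl that stays on each seed's domain."""
    seen, pages = set(), []
    frontier = [(url, 0) for url in seed_urls]
    while frontier:
        url, depth = frontier.pop(0)
        if url in seen or depth > max_depth:      # explicit depth cut-off
            continue
        seen.add(url)
        pages.append(url)
        domain = urlparse(url).netloc
        for link in fetch_links(url):             # child links of this page
            child = urljoin(url, link)            # resolve relative links
            if urlparse(child).netloc == domain:  # stay on the seed domain
                frontier.append((child, depth + 1))
    return pages
```

`fetch_links` would wrap the HTTP fetch and HTML parsing; injecting it keeps the traversal logic testable without network access.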
Technical Details for Task 3
Develop a background process to frequently update the bank database for the crawler, via crontab -e.
Syntax: min | hour | day | month | weekday | command
Entry (daily at midnight): 0 0 * * * /sse/crawler.py
Technical Details for Task 4
Create an Android component to integrate the SSE into a mobile device (tentative).
All applications are written in the Java programming language, using the Android SDK and Eclipse with the ADT plugin.
Current firmware: v2.1 update 1 on the Droid; newest firmware available: v2.2.1.
Technical Details for Task 5: Migration of Another Version of SSE
Reasons: the previous SSE was buggy and therefore unstable; its phishing filter was not working; and it did not work properly on some sites.
Same procedure as for the last version, but using the Eclipse IDE instead of the NetBeans IDE.
Technical Details for Task 6: Modify and Test the New SSE
Clean up multiple copies of the code.
Broken PhishingFilter / Google PageRank lookup:
Used to point to: http://zquery.com/api?q=
Now uses (limited): http://webinfodb.net/a/pr.php?url=
Additional: http://api.exslim.net/pagerank/check
Changed the threshold value.
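A hedged sketch of the PageRank lookup with the two endpoints above tried in order. The endpoint URLs come from the slide, but the query construction, the response format (assumed here to be a bare integer body), and the threshold value are all assumptions:

```python
# Hedged sketch: PageRank lookup with fallback endpoints and a
# threshold check. Response parsing and threshold are assumptions.
import urllib.request

ENDPOINTS = [
    "http://webinfodb.net/a/pr.php?url=",    # current (limited) endpoint
    "http://api.exslim.net/pagerank/check",  # additional fallback
]

def parse_rank(body: str):
    """Parse a PageRank value from a response body; None if unparseable."""
    try:
        return int(body.strip())
    except ValueError:
        return None

def rank_is_trusted(rank, threshold=3):
    """Pages below the threshold (or with unknown rank) stay suspicious."""
    return rank is not None and rank >= threshold

def lookup_rank(url: str):
    """Try each endpoint in turn; network errors fall through to the next."""
    for base in ENDPOINTS:
        try:
            with urllib.request.urlopen(base + url, timeout=5) as resp:
                rank = parse_rank(resp.read().decode("utf-8", "replace"))
                if rank is not None:
                    return rank
        except OSError:       # URLError subclasses OSError
            continue
    return None
```

Separating `parse_rank` and `rank_is_trusted` from the network call keeps the threshold logic testable offline.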
Technical Details for Task 7: Set Up the Proxy
We set up another VM in our Mobicloud system as the proxy.
The proxy (c-icap) forwards requests to the web server.
A VPN is used to connect from the client to the proxy.
Technical Details for Task 8: Communication Between the SSE and the Proxy
At the proxy, add code in the check_url module to get the features:
Request the SSE server with cURL and get the returned value.
Parse the returned web page and analyze what kind of site it is (hasCertificate, isPhishing).
Warn about and block sites that have no certificate or are phishing sites.
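A minimal sketch of the proxy-side flow just described, with urllib standing in for cURL. The SSE's response format (key=value lines), the `sse_base` parameter, and the fail-closed default are assumptions:

```python
# Hedged sketch of the proxy's check_url flow: ask the SSE about a URL,
# parse hasCertificate / isPhishing, and decide whether to block.
import urllib.parse
import urllib.request

def parse_sse_response(body: str) -> dict:
    """Extract the two features from an assumed key=value response.

    Defaults are fail-closed: an unparseable response is treated as
    no-certificate and phishing, so unknown sites get blocked.
    """
    features = {"hasCertificate": False, "isPhishing": True}
    for line in body.splitlines():
        key, _, value = line.partition("=")
        if key.strip() in features:
            features[key.strip()] = value.strip().lower() == "true"
    return features

def should_block(features: dict) -> bool:
    """Block sites that lack a certificate or are phishing sites."""
    return (not features["hasCertificate"]) or features["isPhishing"]

def check_url(url: str, sse_base: str) -> bool:
    """Query the SSE about `url`; True means the proxy should block it."""
    query = sse_base + urllib.parse.quote(url, safe="")
    with urllib.request.urlopen(query, timeout=5) as resp:
        return should_block(parse_sse_response(resp.read().decode()))
```

Doing this at the proxy is what lets the client stay thin: the SSE request, parsing, and decision all happen before the page ever reaches the phone or browser.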
Demo
Conclusion
The project is complete. The SSE server has been extended from having no phishing checking to being able to check both certificates and phishing status.
The proxy takes the computation load off the client side, so the requests to the SSE, and the parsing and analysis of the results, can all be done at the proxy level.
Thank you! Comments & Questions.