Finding cacheable areas in your Web Site using Python and Selenium
David Elfi, Intel (PowerPoint presentation transcript)
![Page 1: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/1.jpg)
Finding cacheable areas in your Web Site using Python and Selenium
David Elfi, Intel
![Page 2: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/2.jpg)
What does this session talk about?
- Python
- Performance
- Web applications
- Hands-on session
![Page 3: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/3.jpg)
Caching
Hot topic in web applications because it gives:
- Better response time for geographically distributed users
- Better scalability
Difficult to focus on at development time
Goal: help developers improve response time
![Page 5: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/5.jpg)
What to do
Find text areas repeated in a web resource (page, JSON response, other dynamic resources) in order to split them into separate responses
Use the Cache-Control, Expires and ETag HTTP headers to control caching
Identify all the dependencies for a given URL
- Even AJAX calls
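To make the three headers concrete, here is a minimal sketch of how a captured response could be checked for them. The function names `parse_cache_control` and `caching_summary`, and the sample header values, are assumptions made for illustration and are not part of the talk.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        # "max-age=3600" -> ("max-age", "3600"); "public" -> ("public", "")
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def caching_summary(headers):
    """Report which of the three caching headers a response carries."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        "cache_control": parse_cache_control(lowered.get("cache-control", "")),
        "expires": lowered.get("expires"),
        "etag": lowered.get("etag"),
    }


# Hypothetical response headers, for illustration only.
example = {
    "Content-Type": "text/html",
    "Cache-Control": "public, max-age=3600",
    "ETag": '"abc123"',
}
summary = caching_summary(example)
print(summary)
```

A resource whose summary comes back with no directives, no `Expires` and no `ETag` is a candidate for the analysis described in the rest of the session.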
![Page 6: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/6.jpg)
Proposed Solution
Take snapshots at different points in time
- Use Selenium to:
  - Download ALL the content
  - Run the JS code needed for AJAX
Compare the snapshots looking for similarities
- Split the similar text into separate HTTP responses
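Before comparing text inside a resource, a cheaper first pass is to see which whole resources never changed between snapshots. The sketch below assumes each snapshot is a `{url: body}` dict; that layout and the name `classify_resources` are hypothetical choices for this example, not the talk's data model.

```python
import hashlib
from collections import defaultdict


def classify_resources(snapshots):
    """Split URLs into those whose body never changed and those that did.

    `snapshots` is a list of {url: body} dicts, one per point in time.
    """
    digests = defaultdict(set)   # url -> distinct body hashes seen
    seen = defaultdict(int)      # url -> number of snapshots containing it
    for snapshot in snapshots:
        for url, body in snapshot.items():
            digests[url].add(hashlib.sha256(body.encode("utf-8")).hexdigest())
            seen[url] += 1

    stable, varying = [], []
    for url in sorted(digests):
        # A URL counts as stable only if it appeared in every snapshot
        # and always with the same content.
        if seen[url] == len(snapshots) and len(digests[url]) == 1:
            stable.append(url)
        else:
            varying.append(url)
    return stable, varying


# Two hypothetical snapshots of the same page and its dependencies.
snapshots = [
    {"/": "<p>Visitor 1</p>", "/site.css": "body { margin: 0 }"},
    {"/": "<p>Visitor 2</p>", "/site.css": "body { margin: 0 }"},
]
stable, varying = classify_resources(snapshots)
print(stable, varying)
```

Stable URLs can be cached as they are; the varying ones are the inputs worth passing to a text comparison.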
![Page 7: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/7.jpg)
Solution - Snapshots
Selenium through a forward proxy
[Diagram: Selenium sends its requests through a Twisted proxy to the web server; the proxy stores the content as data]
![Page 8: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/8.jpg)
Running Selenium - Snapshots
Call Selenium from Python
Use of WebDriver

    >>> from selenium import webdriver
    >>> br = webdriver.Firefox()
    >>> br.get("http://www.intel.com")
    >>> br.close()
![Page 9: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/9.jpg)
Twisted Proxy - Snapshots

    from twisted.web import http, proxy

    class CacheProxyClient(proxy.ProxyClient):
        def connectionMade(self):
            # Connection made. Prepare object properties.
            proxy.ProxyClient.connectionMade(self)

        def handleHeader(self, key, value):
            # Save response header.
            proxy.ProxyClient.handleHeader(self, key, value)

        def handleResponsePart(self, buf):
            # Store response data.
            proxy.ProxyClient.handleResponsePart(self, buf)

        def handleResponseEnd(self):
            # Finished response transmission. Store it.
            proxy.ProxyClient.handleResponseEnd(self)

    class CacheProxyClientFactory(proxy.ProxyClientFactory):
        protocol = CacheProxyClient

    class CacheProxyRequest(proxy.ProxyRequest):
        # Bytes key: Twisted on Python 3 looks the scheme up as b"http".
        protocols = {b"http": CacheProxyClientFactory}

    class CacheProxy(proxy.Proxy):
        requestFactory = CacheProxyRequest

    class CacheProxyFactory(http.HTTPFactory):
        protocol = CacheProxy
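The slide leaves the storage side open. One possible shape for it is a small in-memory store that the proxy callbacks could feed: start a record when the connection is made, append headers and body parts as they arrive. The class `SnapshotStore` and its method names are hypothetical; the talk does not show how the content was actually stored.

```python
import time


class SnapshotStore:
    """In-memory store for captured responses, keyed by URL."""

    def __init__(self):
        self.responses = {}  # url -> list of capture records

    def begin(self, url):
        """Start a new capture for `url` and return its record."""
        record = {
            "url": url,
            "time": time.time(),   # when the capture started
            "headers": [],
            "body": bytearray(),
        }
        self.responses.setdefault(url, []).append(record)
        return record

    @staticmethod
    def add_header(record, key, value):
        """Remember one response header."""
        record["headers"].append((key, value))

    @staticmethod
    def add_body_part(record, buf):
        """Append one chunk of response data."""
        record["body"].extend(buf)

    def bodies(self, url):
        """Return every captured body for `url`, oldest first."""
        return [bytes(r["body"]) for r in self.responses.get(url, [])]


# Simulate what the proxy callbacks would do for one response.
store = SnapshotStore()
rec = store.begin("http://example.com/")
store.add_header(rec, b"Content-Type", b"text/html")
store.add_body_part(rec, b"<html>")
store.add_body_part(rec, b"</html>")
print(store.bodies("http://example.com/"))
```

Keeping a list per URL means repeated runs against the same site accumulate the snapshots needed for the later comparison step.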
![Page 10: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/10.jpg)
Selenium + Twisted - Snapshots
Run Selenium using Proxy

    >>> from selenium import webdriver
    >>> fp = webdriver.FirefoxProfile()
    >>> fp.set_preference("network.proxy.type", 1)
    >>> fp.set_preference("network.proxy.http", "localhost")
    >>> fp.set_preference("network.proxy.http_port", 8080)
    >>> br = webdriver.Firefox(firefox_profile=fp)
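The `firefox_profile=` keyword belongs to older Selenium releases. On Selenium 4 the same preferences are usually set on an options object instead. The sketch below is one way to do that; the helper names `proxy_preferences` and `make_browser` are assumptions for this example, and `make_browser` needs Selenium 4, Firefox and geckodriver installed to actually run.

```python
def proxy_preferences(host="localhost", port=8080):
    """Firefox preferences that route HTTP traffic through a manual proxy."""
    return {
        "network.proxy.type": 1,        # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
    }


def make_browser(preferences):
    """Start Firefox with the given preferences applied."""
    # Imported lazily so proxy_preferences() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for name, value in preferences.items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


prefs = proxy_preferences()
print(prefs)
# br = make_browser(prefs)   # uncomment on a machine with Firefox available
```

Only the HTTP proxy is configured here, matching the slide; HTTPS traffic would need its own preference and proxy support.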
![Page 11: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/11.jpg)
Selenium + Twisted - Snapshots
Configure Twisted and run Selenium in an internal Twisted thread

    from twisted.internet import endpoints, reactor

    # Listen on localhost:8080 with the caching proxy
    endpoint = endpoints.serverFromString(reactor, "tcp:%d:interface=%s" % (8080, "localhost"))
    d = endpoint.listen(CacheProxyFactory())
    # Run Selenium in a worker thread so it does not block the reactor
    reactor.callInThread(runSelenium, url_str)
    reactor.run()
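The slide does not show `runSelenium` itself. A minimal sketch of what such a function might look like follows, under the assumption that it should also shut the reactor down once the page has loaded. The name `run_selenium` and the `make_browser` and `reactor` parameters are hypothetical; passing the reactor in keeps the function easy to try without Twisted.

```python
def run_selenium(url, make_browser, reactor):
    """Load `url` in a browser, then ask the reactor to stop.

    Intended to be started with reactor.callInThread, so it runs outside
    the reactor thread and has to use callFromThread to talk back to it.
    """
    browser = None
    try:
        browser = make_browser()
        browser.get(url)          # the proxy records every request made here
    finally:
        if browser is not None:
            browser.quit()
        # Twisted APIs are generally not thread-safe, so schedule the stop
        # on the reactor thread instead of calling reactor.stop() directly.
        reactor.callFromThread(reactor.stop)
```

With the real objects this would be started as `reactor.callInThread(run_selenium, url_str, make_browser, reactor)`, where `make_browser` is any zero-argument callable that returns a WebDriver.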
![Page 12: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/12.jpg)
All together running
![Page 13: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/13.jpg)
Comparison method
[Diagram: snapshots 1, 2, 3, ..., n are fed to the comparison method, which produces the output]
![Page 14: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/14.jpg)
Comparison

    # Equal sequence searcher
    from difflib import SequenceMatcher

    def matchingString(s1, s2):
        '''Compare two strings and return their matching blocks, concatenated.'''
        matcher = SequenceMatcher(None, s1, s2)
        # get_matching_blocks() yields (i, j, n) with s1[i:i+n] == s2[j:j+n]
        return "".join(s1[i:i+n] for (i, _, n) in matcher.get_matching_blocks())

    def matchingStringSequence(seq):
        '''Compare pairwise, folding the sequence down to the text common to all items.'''
        try:
            matching = seq[0]
            for s in seq[1:]:
                matching = matchingString(matching, s)
            return matching
        except (TypeError, IndexError):
            # seq is not a sequence of strings, or it is empty
            return ""
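The functions above return only the common text. To decide where to split a response, it also helps to see the parts that differ. The sketch below is a variation on the same `difflib` approach using `get_opcodes()`; the name `split_regions` and the two sample pages are assumptions for illustration.

```python
from difflib import SequenceMatcher


def split_regions(s1, s2):
    """Return (common, varying) lists of substrings of s1.

    `common` holds the blocks of s1 that also appear, in order, in s2;
    `varying` holds the parts of s1 that were replaced or deleted.
    """
    # autojunk=False disables difflib's heuristic that ignores very frequent
    # characters in long inputs, which tends to hurt matching on HTML.
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    common, varying = [], []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            common.append(s1[i1:i2])
        elif i2 > i1:
            # "replace" or "delete"; an "insert" has an empty range in s1
            varying.append(s1[i1:i2])
    return common, varying


# Two hypothetical snapshots of the same page.
page_a = "<h1>News</h1><p>Visitor 1041</p>"
page_b = "<h1>News</h1><p>Visitor 2387</p>"
common, varying = split_regions(page_a, page_b)
print(common)    # ['<h1>News</h1><p>Visitor ', '</p>']
print(varying)   # ['1041']
```

Here the visitor counter is the only varying region, so it is the natural candidate to move into its own, non-cached response while the rest of the page is cached.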
![Page 15: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/15.jpg)
Next Steps
Split similar texts into separate HTTP responses
Set Cache-Control
- public
- private
- no-cache
Set Expires
- Depending on how long the response should be cached
Set ETag
- If the response is big and changes too often
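As an illustration of the three headers together, the sketch below builds them for a response body and checks a conditional request against the ETag. The helper names `caching_headers` and `is_not_modified` are hypothetical, the ETag is a truncated SHA-256 digest chosen only for this example, and weak validators (`W/`) are not handled.

```python
import hashlib
import time
from email.utils import formatdate


def caching_headers(body, max_age, visibility="public"):
    """Build Cache-Control, Expires and ETag headers for a response body (bytes)."""
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:16]
    return {
        "Cache-Control": "%s, max-age=%d" % (visibility, max_age),
        # Expires is an HTTP-date; usegmt=True yields the required "GMT" suffix.
        "Expires": formatdate(time.time() + max_age, usegmt=True),
        "ETag": etag,
    }


def is_not_modified(request_headers, etag):
    """True if the client's If-None-Match matches, so a 304 can be returned."""
    raw = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in raw.split(",")]
    return "*" in candidates or etag in candidates


headers = caching_headers(b"<html>static part</html>", max_age=3600)
print(headers["Cache-Control"])   # public, max-age=3600
print(is_not_modified({"If-None-Match": headers["ETag"]}, headers["ETag"]))  # True
```

When `is_not_modified` returns true the server can answer 304 without a body, which is where an ETag pays off for large responses.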
![Page 16: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/16.jpg)
Advanced Features to be done
Detect cache invalidation time from snapshots
SSL support
Wait for all AJAX calls
Selenium Scripting
- Authenticated URLs
- Full feature sequence
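For the first item, one possible approach is to timestamp each snapshot and measure how long the content stays the same between changes. This is a sketch of that idea only, not something the talk implements; `change_intervals`, `suggested_max_age` and the sample data are hypothetical, and the result can be no finer than the interval between snapshots.

```python
def change_intervals(timed_snapshots):
    """Return the seconds elapsed between consecutive content changes.

    `timed_snapshots` is a list of (timestamp, body) pairs sorted by time.
    """
    intervals = []
    last_change = None      # timestamp of the most recent observed change
    previous_body = None
    for timestamp, body in timed_snapshots:
        if previous_body is not None and body != previous_body:
            if last_change is not None:
                intervals.append(timestamp - last_change)
            last_change = timestamp
        previous_body = body
    return intervals


def suggested_max_age(timed_snapshots):
    """Shortest observed lifetime of the content, or None if unknown."""
    intervals = change_intervals(timed_snapshots)
    return min(intervals) if intervals else None


# Hypothetical snapshots taken every 60 seconds.
samples = [(0, "a"), (60, "a"), (120, "b"), (180, "b"),
           (240, "c"), (300, "c"), (360, "d")]
print(change_intervals(samples))    # [120, 120]
print(suggested_max_age(samples))   # 120
```

The first change is used only as a starting point, because the snapshots do not show when the initial content was created.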
![Page 17: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/17.jpg)
Summary
If cacheable areas were not identified before development, this code could save time and effort in finding them
Cacheable areas need to be analyzed to choose the best caching method (server cache, CDN, browser cache)
Refactoring to maximize cacheable data is the next step
![Page 18: Finding cacheable areas in your Web Site using Python and Selenium](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d97550346895dbaed89/html5/thumbnails/18.jpg)
Q & A