
Phantom of the Web

Dušan Omerčević / Squrb
March 8th, 2016


Outline of the talk

- Personal introduction
- What problem are we solving
- Introduction to PhantomJS
- PhantomJS tips & tricks
- Aggregating data behind login screens


Dušan Omerčević, M.Sc.

- Founder of Squrb
- Lead engineering and product development at Zemanta
- Head of Software development at Najdi.si
- Researcher in the field of computer vision
- Led several large software development projects (e.g., highway traffic management system, electronic toll collection)


What problem are we solving


Online Services Are On a Roll, Take Back Control


Tracking usage and costs of online services


Sources of usage and costs data

- official APIs (supported by <10% of services)
- credit card statements (costs only)
- ERPs (costs only)
- emails (costs only)
- aggregate data hidden behind login screens (usage and costs data)
    - Log in to a dashboard
    - Retrieve usage data
    - Retrieve costs data

Given the proliferation of rich, JavaScript-based web applications, it is no longer feasible to simply parse the HTML returned by the server: much of the data is rendered in the browser by JavaScript.


Introduction to PhantomJS


PhantomJS

A headless WebKit scriptable with a JavaScript API.

Use cases:

- Headless website testing (Jasmine, QUnit, Mocha, …)
- Web crawling
- Screen capture
- Page automation
- Network monitoring and performance testing (YSlow)
- Server rendering of client-side JavaScript
- Many nefarious purposes (e.g. ad fraud, website hacking, bidding wars)


PhantomJS

- released in 2011 by Ariya Hidayat
- some 100 contributors on GitHub

- version 2.1.1 (QtWebKit 5.5)
    - January 2016 (not yet fully stable)
    - WebKit 538.1 (November 2013) ~ Chrome 27, Safari 8

- version 2.0.0 (QtWebKit 5)
    - January 2015 (quite stable)
    - WebKit 537.11 (2012) ~ Chrome 23, Safari 6.1

- version 1.9.8
    - January 2014
    - WebKit 534.34 (2011) ~ Chrome 13, Safari 5.1


PhantomJS alternatives / companions:

- SlimerJS (scriptable headless Gecko, i.e. Firefox 31)
- TrifleJS (scriptable headless Internet Explorer)
- Zombie.js (scriptable headless browser for Node.js)
- CasperJS (utilities & syntactic sugar over PhantomJS and SlimerJS)


PhantomJS: Hello, World!

$ cat hello.js
console.log('Hello, world!');
phantom.exit();

$ phantomjs hello.js
Hello, world!

$ phantomjs [options] somescript.js [arg1 [arg2 [...]]]


PhantomJS: Loading and Rendering a Page

var page = require('webpage').create();
page.open('http://example.com', function(status) {
    console.log("Status: " + status);
    if (status === "success") {
        page.render('example.png');
    }
    phantom.exit();
});

Page render supports different formats (jpg, png, pdf), clipping regions, scroll position, zoom, and render quality.
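
As a rough sketch of those rendering options, the helper below sets a viewport, a clipping region and a zoom factor before rendering a JPEG. The helper name renderRegion, the output file name and all numeric values are illustrative assumptions; viewportSize, clipRect, zoomFactor and the format/quality options of page.render are PhantomJS page properties.

```javascript
// Sketch: configure a PhantomJS page object and render part of it.
// renderRegion and every numeric value here are illustrative, not from the talk.
function renderRegion(page, outputPath) {
    page.viewportSize = { width: 1280, height: 800 };             // virtual window size
    page.clipRect = { top: 0, left: 0, width: 640, height: 400 }; // region to capture
    page.zoomFactor = 1;                                          // 1 means no scaling
    page.render(outputPath, { format: 'jpeg', quality: '80' });
}

// This part only runs under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://example.com', function (status) {
        if (status === 'success') {
            renderRegion(page, 'example.jpg');
        }
        phantom.exit();
    });
}
```

Keeping the configuration in a function that receives the page makes it reusable across several pages in one script.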


PhantomJS: Code Evaluation

[Diagram, based on http://www.slideshare.net/SergeyShekyan/shekyan-zhang-owasp: the PhantomJS JavaScript context and the web page JavaScript context, both on top of QtWebKit. Labels: Control, PageEvent, Injection, Callback.]

var page = require('webpage').create();
page.open(url, function(status) {
    // The function passed to page.evaluate runs inside the web page.
    var title = page.evaluate(function() {
        return document.title;
    });
    console.log('Page title is ' + title);
    phantom.exit();
});


page.evaluate

- page.evaluate is executed in the web page JavaScript context!
- page.evaluate serializes and deserializes data structures upon return (the rule of thumb: if it can be serialized via JSON, then it is fine)
- page.evaluateAsync does the same thing but without blocking the current execution

var page = require('webpage').create();
page.open(url, function(status) {
    var personData = page.evaluate(function() {
        var nameEl = document.querySelector('input#name');
        var emailEl = document.querySelector('input#email');
        if (nameEl == null || emailEl == null) {
            return null;
        }
        // Return a plain object, not the DOM elements themselves.
        return {name: nameEl.value, email: emailEl.value};
    });
    console.log('Person data: ' + JSON.stringify(personData));
    phantom.exit();
});
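
The JSON rule of thumb can be tried out without PhantomJS. The sketch below models the round trip with JSON.stringify and JSON.parse; the helper name survivesEvaluate is hypothetical, and PhantomJS's actual conversion may differ in details, so treat this only as an approximation of what crosses the boundary.

```javascript
// Approximates the "JSON-serializable" rule of thumb for page.evaluate results.
// This is a model of the rule, not PhantomJS's real conversion code.
function survivesEvaluate(value) {
    return JSON.parse(JSON.stringify(value));
}

// Plain objects, arrays, strings and numbers pass through unchanged.
var plain = survivesEvaluate({ name: 'Ada', tags: ['a', 'b'], count: 2 });
console.log(plain.name, plain.tags.length, plain.count); // prints: Ada 2 2

// Functions are dropped, and Date objects come back as ISO strings.
var lossy = survivesEvaluate({
    when: new Date(0),
    greet: function () { return 'hi'; }
});
console.log(typeof lossy.when, 'greet' in lossy); // prints: string false
```

In practice this means extracting values (text, numbers, URLs) inside the page and returning only those.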


PhantomJS: Loading and injecting scripts

Injecting scripts in PhantomJS JavaScript context

var wasSuccessful = phantom.injectJs('lib/utils.js');

Injecting scripts in web page JavaScript context

var page = require('webpage').create();
page.open('http://www.sample.com', function() {
    page.includeJs("https://cdnjs.cloudflare.com/libs/jquery.js", function() {
        page.evaluate(function() {
            $("button").click();
        });
    });
});

page.injectJs works the same as page.includeJs, except that it loads the script from a local file and pauses execution until the script is fully loaded.


PhantomJS: Support for CommonJS Modules

Example module (universe.js)

exports.answer = 42;
exports.start = function () {
    console.log('Starting the universe....');
};

This module can be used in another script like the following:

var universe = require('./universe');
universe.start();
console.log('The answer is', universe.answer);


PhantomJS: Cookie handling

Global cookie jar

phantom.addCookie({
    'name': 'Added-Cookie-Name',
    'value': 'Added-Cookie-Value',
    'domain': '.google.com'
});

Page specific cookie jar

var page = require('webpage').create();
page.addCookie( ...);
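
One possible use of the global cookie jar is keeping a logged-in session between runs. The sketch below saves and restores phantom.cookies as JSON; the file name cookies.json and both helper names are assumptions made for illustration, and the login step itself is left as a comment.

```javascript
// Sketch: save and restore the global cookie jar between PhantomJS runs.
// COOKIE_FILE and both helper names are illustrative assumptions.
var COOKIE_FILE = 'cookies.json';

function serializeCookies(cookies) {
    return JSON.stringify(cookies, null, 2);
}

function deserializeCookies(text) {
    var parsed = JSON.parse(text);
    return Array.isArray(parsed) ? parsed : []; // ignore anything that is not a list
}

// This part only runs under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    if (fs.exists(COOKIE_FILE)) {
        phantom.cookies = deserializeCookies(fs.read(COOKIE_FILE));
    }
    // ... open pages and log in here ...
    fs.write(COOKIE_FILE, serializeCookies(phantom.cookies), 'w');
    phantom.exit();
}
```

PhantomJS also has a --cookies-file command-line option for persistent cookies, which may be simpler when no custom handling is needed.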


PhantomJS: Handling frames

var page = require('webpage').create();
// Once the page has loaded, pick a frame by name and switch into it.
var frameName = page.framesName[0];
page.switchToFrame(frameName);
page.evaluate(function() {
    document.querySelector('a#target').click();
});


PhantomJS: Remote control via web server

var $q = require("q"); // Kris Kowal’s Q
var server = require('webserver').create();

var simpleProxyService = server.listen(HTTP_PORT, function(request, response) {
    $q.Promise(function(resolve, reject, notify) {
        var page = require('webpage').create();
        page.onLoadFinished = function(status) {
            resolve(page.content);
        };
        page.open('https://www.example.com/');
    }).then(function (result) {
        response.statusCode = 200;
        response.write(result);
        response.close();
    });
});

PhantomJS and Node.js don’t like each other.


PhantomJS tips & tricks


onResourceRequested: requestData

Know what requests are being made by the page:

- very useful for debugging

var webPage = require('webpage');
var page = webPage.create();

page.onResourceRequested = function(requestData, networkRequest) {
    console.log('Request (#' + requestData.id + '): ' + JSON.stringify(requestData));
};


onResourceRequested: abort()

Abort the current network request

- Speed up page rendering (e.g. by not loading tracking JS libraries and large images)

- Prevent PhantomJS crashes triggered by external libraries

var webPage = require('webpage');
var page = webPage.create();

page.onResourceRequested = function(requestData, networkRequest) {
    networkRequest.abort();
};
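
Aborting every request would also block the page itself, so in practice the handler usually filters by URL. The following is a minimal sketch of that idea; the blocklist patterns and the names shouldBlock and blockUnwanted are made-up examples, not taken from the talk.

```javascript
// Sketch: abort only requests whose URL matches a blocklist.
// The patterns below are made-up examples, not taken from the talk.
var BLOCKED_PATTERNS = [
    /google-analytics\.com/,          // a tracking library
    /\.(png|jpe?g|gif)(\?|$)/i        // images
];

function shouldBlock(url) {
    for (var i = 0; i < BLOCKED_PATTERNS.length; i++) {
        if (BLOCKED_PATTERNS[i].test(url)) {
            return true;
        }
    }
    return false;
}

// Handler with the same signature as page.onResourceRequested.
function blockUnwanted(requestData, networkRequest) {
    if (shouldBlock(requestData.url)) {
        networkRequest.abort();
    }
}

// This part only runs under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.onResourceRequested = blockUnwanted;
    page.open('http://example.com', function (status) {
        console.log('Status: ' + status);
        phantom.exit();
    });
}
```

Keeping the decision in a pure function (shouldBlock) makes the blocklist easy to check outside PhantomJS.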


onResourceRequested: changeUrl(newUrl)

Provide an alternative implementation of a resource:

- Mocking-up libraries & altering page functionality
    - e.g. networkRequest.changeUrl(requestData.url.replace('perPage=20', 'perPage=1000'));

- Speed up page rendering (e.g. by replacing remote resources with local copies)

var webPage = require('webpage');
var page = webPage.create();

page.onResourceRequested = function(requestData, networkRequest) {
    networkRequest.changeUrl('dummy.js');
};
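
Replacing every URL with dummy.js is only useful as a demonstration; a handler normally rewrites selected requests. The sketch below generalizes the perPage example from this slide so it works for any current page size. The helper names withLargerPage and enlargePages are hypothetical.

```javascript
// Sketch: rewrite the perPage query parameter before the request is sent.
// The perPage parameter comes from the slide; the helper names are illustrative.
function withLargerPage(url, perPage) {
    // A replacer function avoids surprises with "$1" followed by digits.
    return url.replace(/([?&]perPage=)\d+/, function (match, prefix) {
        return prefix + perPage;
    });
}

// Handler with the same signature as page.onResourceRequested.
function enlargePages(requestData, networkRequest) {
    var rewritten = withLargerPage(requestData.url, 1000);
    if (rewritten !== requestData.url) {
        networkRequest.changeUrl(rewritten);
    }
}

console.log(withLargerPage('https://app.example.com/api/items?perPage=20&sort=date', 1000));
// prints: https://app.example.com/api/items?perPage=1000&sort=date
```

Under PhantomJS the handler would be attached with page.onResourceRequested = enlargePages.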


onResourceRequested: setHeader(key, value)

Changing request headers before the request is made:

- Mocking-up requests

var webPage = require('webpage');
var page = webPage.create();

page.onResourceRequested = function(requestData, networkRequest) {
    networkRequest.setHeader('Authorization', 'Bearer 08xvgs7sbd6d');
};


onResourceReceived

var fs = require('fs');
var page = require('webpage').create();
// Do not forget to set this!
page.captureContent = ['app.example.com/account/billing'];

page.onResourceReceived = function(response) {
    // Save the response only once its body has actually been captured.
    if (response.url.indexOf('app.example.com/account/billing') >= 0 && response.body.length > 0) {
        fs.write('invoice.pdf', response.body, 'b');
    }
};

page.open('https://app.example.com/account/billing/invoice_1234.pdf');

Retrieving body content upon onResourceReceived does not work in PhantomJS 2.1.x (it’s a known bug!)


Making async XMLHttpRequests

page.evaluate(function() {
    var http = new XMLHttpRequest();
    http.open('POST', 'https://www.example.com/search?search_type=users', true);
    http.setRequestHeader('Content-type', 'application/json');
    http.onreadystatechange = function() {
        if (http.readyState == 4 && http.status == 200) {
            window.callPhantom(http.responseText);
        }
    };

    http.send('{"search":{"page":1,"per_page":1000}}');
});

page.onCallback = function(responseText) {
    var result = JSON.parse(responseText);
};


Mouse clicking & key pressing

The events are not synthetic DOM events; each event is sent to the web page as if it came from real user interaction.

var SHIFT_KEY = 0x02000000;
var ALT_KEY = 0x08000000;

var page = require('webpage').create();
page.open('http://phantomjs.org/quick-start.html', function(status) {
    // DOM elements cannot be returned from page.evaluate, so return plain coordinates.
    var point = page.evaluate(function() {
        var rect = document.querySelector('img[alt="PhantomJS"]').getBoundingClientRect();
        return {x: rect.left + rect.width / 2, y: rect.top + rect.height / 2};
    });

    page.sendEvent('click', point.x, point.y, 'left');
    page.sendEvent('keypress', page.event.key.A, null, null, SHIFT_KEY | ALT_KEY);
});


Detecting PhantomJS

Exploiting differences between PhantomJS and a real browser:

- outdated WebKit engine
- uses QtWebKit wrapper around WebKit
- no video and audio
- no plug-ins
- exposes window.callPhantom and window._phantom
- no sandboxing (turn a headless browser against the attacker :) )

Detailed information available in http://www.slideshare.net/SergeyShekyan/shekyan-zhang-owasp
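
Some of the differences listed above can be checked from JavaScript running in the page. The sketch below collects such signals from a window-like object. The function name phantomSignals and the sample objects are illustrative; the user-agent check is an extra assumption not on the slide (the default PhantomJS user agent contains the string PhantomJS, but scripts can change it).

```javascript
// Sketch: heuristics a site might use to spot PhantomJS, based on the list above.
// `win` is a window-like object; the signal names are illustrative.
function phantomSignals(win) {
    var signals = [];
    if (typeof win.callPhantom !== 'undefined') { signals.push('callPhantom'); }
    if (typeof win._phantom !== 'undefined') { signals.push('_phantom'); }
    var nav = win.navigator || {};
    if (/PhantomJS/.test(nav.userAgent || '')) { signals.push('userAgent'); }
    if (nav.plugins && nav.plugins.length === 0) { signals.push('noPlugins'); }
    return signals;
}

// Sample objects standing in for real windows.
var fakePhantomWindow = {
    callPhantom: function () {},
    _phantom: {},
    navigator: {
        userAgent: 'Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1',
        plugins: []
    }
};
var fakeBrowserWindow = {
    navigator: { userAgent: 'Mozilla/5.0 Chrome/49.0', plugins: [{ name: 'PDF Viewer' }] }
};

console.log(phantomSignals(fakePhantomWindow).join(','));  // prints: callPhantom,_phantom,userAgent,noPlugins
console.log(phantomSignals(fakeBrowserWindow).length);     // prints: 0
```

In a real page the function would be called as phantomSignals(window); no single signal is conclusive on its own.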


Aggregating data behind login screens


Logging in

- open login screen, enter credentials, and click log in button (the most common scenario)
- POSTing credentials
- logging in using identity providers (e.g. Google, GitHub)
    - 1st, log in to identity provider,
    - 2nd, click on “Login with Google” or “Login with GitHub” button (voilà!)
- 2-factor authentication
    - keep the PhantomJS session running, while asking user to enter 2nd factor
    - 2FA is not a panacea for session hijacking!
- CAPTCHAs
    - screengrab CAPTCHA and ask user to solve it, while keeping session running
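
The most common scenario (open the login screen, enter credentials, click the button) might look roughly like the sketch below. The selectors, the login URL, the helper name fillLoginForm and the use of command-line arguments for the credentials are all assumptions for illustration; real dashboards need their own selectors and a proper wait for the post-login page.

```javascript
// Sketch: fill and submit a login form inside the page context.
// The selectors are hypothetical; real dashboards need their own.
function fillLoginForm(username, password, doc) {
    doc = doc || document; // inside page.evaluate, fall back to the page's document
    var userEl = doc.querySelector('input[name="username"]');
    var passEl = doc.querySelector('input[name="password"]');
    var button = doc.querySelector('button[type="submit"]');
    if (!userEl || !passEl || !button) {
        return false;      // form not found, let the caller decide what to do
    }
    userEl.value = username;
    passEl.value = password;
    button.click();
    return true;
}

// This part only runs under PhantomJS: phantomjs login.js USER PASSWORD
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var page = require('webpage').create();
    page.open('https://app.example.com/login', function (status) {
        if (status !== 'success') {
            phantom.exit(1);
            return;
        }
        // The function and its arguments are serialized into the page context.
        var submitted = page.evaluate(fillLoginForm, system.args[1], system.args[2]);
        console.log('Form submitted: ' + submitted);
        // Crude fixed wait for the post-login page; a real script would poll.
        setTimeout(function () {
            console.log('Now at: ' + page.url);
            phantom.exit();
        }, 5000);
    });
}
```

The optional third parameter exists only so the helper can be exercised with a stand-in document outside PhantomJS.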


POSTing credentials example

var settings = {
    operation: "POST",
    encoding: "utf8",
    headers: {
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'
    },
    // Encode each value separately so that '&' and '=' inside credentials are escaped.
    data: 'username=' + encodeURIComponent(username) + '&password=' + encodeURIComponent(password)
};

var page = require('webpage').create();
page.open('https://app.example.com/v2/users/login/', settings, processLogin);
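
When a login form has more than two fields, building the body by hand gets error-prone. The helper below is a small sketch of a generic encoder for the data setting; the name toFormBody and the sample credentials are illustrative assumptions.

```javascript
// Sketch: build an application/x-www-form-urlencoded body from a plain object.
// toFormBody is an illustrative helper, not part of PhantomJS.
function toFormBody(params) {
    var parts = [];
    for (var key in params) {
        if (Object.prototype.hasOwnProperty.call(params, key)) {
            // Escape names and values separately so '&' and '=' stay unambiguous.
            parts.push(encodeURIComponent(key) + '=' + encodeURIComponent(params[key]));
        }
    }
    return parts.join('&');
}

console.log(toFormBody({ username: 'ada@example.com', password: 'p&ss=word' }));
// prints: username=ada%40example.com&password=p%26ss%3Dword

// For comparison, encodeURI leaves '&' and '=' untouched, which would corrupt the body.
console.log(encodeURI('p&ss=word'));
// prints: p&ss=word
```

The result can be passed as the data property of the settings object given to page.open.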


Parsing dashboard data

var invoices = page.evaluate(function() {
    var result = [];
    var invoiceElements = document.querySelectorAll('table#invoice-list > tbody > tr');
    for (var i = 0; i < invoiceElements.length; i++) {
        var invoiceFields = invoiceElements[i].querySelectorAll('td');
        var invoiceURL = invoiceFields[0].querySelector('a').href;
        result.push({
            invoiceDate: new Date(Date.parse(invoiceFields[1].textContent.trim())),
            invoiceID: invoiceURL.match(/invoice_as_pdf\/(.*)/i)[1],
            amount: invoiceFields[2].textContent.trim(),
            description: invoiceFields[0].textContent.trim(),
            invoiceURL: invoiceURL
        });
    }
    return result;
});


Make use of unofficial APIs

(used extensively by modern JavaScript-based web applications)

var page = require('webpage').create();
page.captureContent = ['projects.exampleapp.com/api/account'];

page.onResourceReceived = function(response) {
    // Parse the response only once its body has actually been captured.
    if (response.url.indexOf('projects.exampleapp.com/api/account') >= 0 && response.body.length > 0) {
        var accountInfo = JSON.parse(response.body);
    }
};

page.open('https://projects.exampleapp.com/d/main#/team/account');


Thank you!

(It’s Q&A time now!)


Fun Times Ahead

Squrb has an early start in a potentially enormous market.

We’re looking for a product-minded engineer with solid JavaScript knowledge to join the core team.

Contact me at [email protected] for more information.