

Synthesizing Lightweight Web Scripts

Warren He, Devdatta Akhawe, and Prateek Mittal
University of California, Berkeley

Related Work

Macro recorders: These record UI events and play them back to a real browser.

Testing frameworks: These provide a browser scripting interface for automated testing.

Browser emulators: These implement a scriptable, headless browser.

Examples: AutoHotkey, Selenium, Watir

Our Approach

This diagram shows the parts of a web application.

We identify a subset of the entire functionality and use program synthesis techniques to reimplement it.

This subset of the web application generates new requests to the server by assembling pieces of responses from earlier requests as well as user inputs.

The synthesized program shall do the same.

Introduction

We want to obtain a program that performs an action on a website from a user's demonstration of that action. Some “record and replay” solutions already exist for interacting with web applications. We want the program to have the following properties, which we find lacking in existing record-replay solutions:

a) Embodies a clear specification of the protocol
b) Contributes a reasonably small attack surface
c) Has a lightweight runtime that’s easy to integrate

In this project, we formulate a common model for these programs and develop a system for creating them automatically from demonstrations.

Synthesis

We use data from a user demonstration to create a number of program specifications, each with an input/output example corresponding to the construction of one server request.

A synthesizer searches a space of programs for candidates that satisfy these examples, that is, candidates that generate the same request seen in the demonstration. A sample request specification is shown below.

Requests comprise variable and concat expressions.
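The poster's own sample specification is not reproduced in this transcript. As a rough illustration of the idea only, the Python sketch below pairs the strings available before a request with the request body observed in the demonstration, and checks whether a candidate program reproduces that body. The class `RequestSpec`, the input keys, the example values, and both candidate functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import quote


@dataclass(frozen=True)
class RequestSpec:
    """One input/output example (hypothetical structure).

    inputs:   strings available before the request was sent
              (user inputs and pieces of earlier responses)
    expected: the request body observed in the demonstration
    """
    inputs: Dict[str, str]
    expected: str


def satisfies(program: Callable[[Dict[str, str]], str], spec: RequestSpec) -> bool:
    """A candidate satisfies the spec if it rebuilds the observed request exactly."""
    return program(spec.inputs) == spec.expected


# Assumed example data, not taken from a real demonstration.
spec = RequestSpec(
    inputs={"user.email": "a@example.edu", "response0.token": "xyz123"},
    expected="token=xyz123&email=a%40example.edu",
)


def good_candidate(inputs: Dict[str, str]) -> str:
    # Copies the token from an earlier response and percent-encodes the user input.
    return ("token=" + inputs["response0.token"]
            + "&email=" + quote(inputs["user.email"], safe=""))


def bad_candidate(inputs: Dict[str, str]) -> str:
    # Hard-codes the token and forgets to encode the e-mail address.
    return "token=xyz123&email=" + inputs["user.email"]
```

A synthesizer in this style would keep `good_candidate` and discard `bad_candidate`, because only the former generates the same request seen in the demonstration.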

Version Space Algebra

The space of programs is defined by a domain-specific language. Our language contains only operations relevant to constructing HTTP requests. A partial grammar of this language appears below.
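The grammar itself is not reproduced in this transcript. The sketch below is only an assumption about what a small language of this kind could look like: it has constants, variables, concatenation, and one encoding operation. The node names `Const`, `Var`, `UrlEncode`, and `Concat`, and the `evaluate` function, are hypothetical and should not be read as the actual grammar.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union
from urllib.parse import quote


@dataclass(frozen=True)
class Const:
    text: str                      # literal text copied into the request


@dataclass(frozen=True)
class Var:
    name: str                      # a user input or a piece of an earlier response


@dataclass(frozen=True)
class UrlEncode:
    inner: "Expr"                  # percent-encode the value of a sub-expression


@dataclass(frozen=True)
class Concat:
    parts: Tuple["Expr", ...]      # join sub-expressions left to right


Expr = Union[Const, Var, UrlEncode, Concat]


def evaluate(expr: Expr, env: Dict[str, str]) -> str:
    """Evaluate an expression against the available input strings."""
    if isinstance(expr, Const):
        return expr.text
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, UrlEncode):
        return quote(evaluate(expr.inner, env), safe="")
    if isinstance(expr, Concat):
        return "".join(evaluate(part, env) for part in expr.parts)
    raise TypeError(f"unknown expression node: {expr!r}")


# Example program: the query string "q=<encoded query>".
program = Concat((Const("q="), UrlEncode(Var("query"))))
```

Restricting the language to operations like these keeps the search space small, since the synthesizer never has to consider programs that do anything other than build request strings.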

Demonstration Capture

A program that assembles these strings can be thought of as emitting a sequence of tokens which, when concatenated, produce the desired string. This step-by-step execution model is a good fit for a technique called version space algebra, which divides a large synthesis problem into smaller, easier problems with known start and end states.

Below is a diagram of a sequence of states in a sample program. The state consists of the output “so far” and a stack of encoding procedures.

A further optimization comes from the formulaic nature of some string constructions. In the program above, some steps are known from running a parser on the desired output.
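The decomposition can be illustrated with a simplified Python sketch. It is not the algorithm used in this work: it tracks only the output "so far" (as a position in the target string), omits the stack of encoding procedures, and supports a single encoding. Each position records the tokens that could be emitted next, so many candidate programs share one compact structure. The function names and token kinds are hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

Token = Tuple[str, str]  # (kind, payload); kind is "const", "var", or "urlencode"


def candidate_steps(target: str, env: Dict[str, str]) -> List[List[Tuple[int, Token]]]:
    """For each position i, list (j, token) pairs where emitting `token`
    advances the output so far from target[:i] to target[:j]."""
    steps: List[List[Tuple[int, Token]]] = [[] for _ in target]
    for i in range(len(target)):
        steps[i].append((i + 1, ("const", target[i])))  # emit one literal character
        for name, value in env.items():
            for kind, text in (("var", value), ("urlencode", quote(value, safe=""))):
                if text and target.startswith(text, i):
                    steps[i].append((i + len(text), (kind, name)))
    return steps


def count_programs(target: str, env: Dict[str, str]) -> int:
    """Count the token sequences that produce `target`, without listing them."""
    steps = candidate_steps(target, env)
    ways = [0] * len(target) + [1]
    for i in range(len(target) - 1, -1, -1):
        ways[i] = sum(ways[j] for j, _ in steps[i])
    return ways[0]


def shortest_program(target: str, env: Dict[str, str]) -> List[Token]:
    """Pick one program with the fewest emitted tokens, merging adjacent constants."""
    steps = candidate_steps(target, env)
    best: List[Tuple[int, List[Token]]] = [(0, [])] * (len(target) + 1)
    for i in range(len(target) - 1, -1, -1):
        # Ties go to the first step listed (constant, then variable, then encoded).
        best[i] = min(
            ((1 + best[j][0], [token] + best[j][1]) for j, token in steps[i]),
            key=lambda item: item[0],
        )
    merged: List[Token] = []
    for kind, payload in best[0][1]:
        if kind == "const" and merged and merged[-1][0] == "const":
            merged[-1] = ("const", merged[-1][1] + payload)
        else:
            merged.append((kind, payload))
    return merged
```

Because every step has a known start and end position, the sub-problems can be solved independently and combined afterwards, which is the property the paragraph above attributes to version space algebra.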

We wrote a browser extension to record data while a user demonstrates the action on the standard client. This extension captures the HTTP requests and responses transmitted during the transaction.

Additionally, on each form submission event, it saves a snapshot of the state of each control in the form.
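The recorded data could be modelled along the following lines. This Python sketch is an assumption made for illustration: the record types, field names, and the `inputs_for_request` helper are hypothetical, and the extension's actual storage format is not described here. The helper gathers the strings a synthesizer could draw on when rebuilding one request, namely earlier response bodies and the values the user entered. It assumes hidden controls are filled in by the server rather than by the user.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Exchange:
    """One captured HTTP request/response pair."""
    method: str
    url: str
    request_body: str
    response_body: str


@dataclass
class ControlState:
    """State of one form control at submission time."""
    name: str
    kind: str    # e.g. "hidden", "text", "select"
    value: str


@dataclass
class FormSnapshot:
    """Snapshot taken on a form submission event."""
    exchange_index: int            # index of the request this submission produced
    controls: List[ControlState]


@dataclass
class Demonstration:
    exchanges: List[Exchange] = field(default_factory=list)
    snapshots: List[FormSnapshot] = field(default_factory=list)


def inputs_for_request(demo: Demonstration, index: int) -> Dict[str, str]:
    """Collect strings available when request `index` was constructed."""
    available: Dict[str, str] = {}
    for i, exchange in enumerate(demo.exchanges[:index]):
        available[f"response[{i}]"] = exchange.response_body
    for snapshot in demo.snapshots:
        if snapshot.exchange_index != index:
            continue
        for control in snapshot.controls:
            if control.kind != "hidden":   # assumption: hidden values come from the server
                available[f"form.{control.name}"] = control.value
    return available
```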

Our implementation produces Python code. To the left is a sample of the generated code. This program shows how the synthesis deals with:

1. Hidden field
2. Text field
3. Dropdown menu
4. Field left blank
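The generated sample is not reproduced in this transcript. The following hand-written sketch only suggests what code handling these four cases might look like; it is not output from the system. The function name, the field names `csrf_token`, `first_name`, `term`, and `middle_name`, and the regular expression are all hypothetical. It uses only the Python standard library, in keeping with the goal of a lightweight runtime.

```python
import re
from urllib.parse import quote_plus


def build_request_body(previous_html: str, first_name: str, term: str) -> str:
    """Assemble a form-encoded request body from an earlier response and user inputs."""
    # 1. Hidden field: its value is copied out of the earlier response.
    match = re.search(r'name="csrf_token" value="([^"]*)"', previous_html)
    if match is None:
        raise ValueError("hidden field csrf_token not found in the earlier response")
    token = match.group(1)

    return (
        "csrf_token=" + quote_plus(token)
        # 2. Text field: the user's input, form-encoded.
        + "&first_name=" + quote_plus(first_name)
        # 3. Dropdown menu: the submitted option value chosen by the user.
        + "&term=" + quote_plus(term)
        # 4. Field left blank in the demonstration: sent with an empty value.
        + "&middle_name="
    )
```

For example, calling `build_request_body('<input type="hidden" name="csrf_token" value="abc 123">', "Ada Lovelace", "fall2016")` yields `csrf_token=abc+123&first_name=Ada+Lovelace&term=fall2016&middle_name=`.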

Application

Code Generation

To drive our development of this system, we used it to build a service for helping students apply to graduate school. We demonstrated creating accounts at ten schools and entering some typical personal information. For each school, we used our system to synthesize a program to do this automatically. This worked particularly well for five schools that had little JavaScript in their registration process. We built a web interface and connected the programs’ inputs to a single form on our own page.

Students can enter their information once on our service and have it automatically start the application procedure at multiple schools. Our service does not actually submit the application; the student signs in to the created account to fill in school-specific details and pay the application fee.

This product is online at https://applicast.herokuapp.com/static/client.html