JSZap – Compressing JavaScript · 2019-02-25 · Gaurav Sinha, IIT Kanpur
JSZap: Compressing JavaScript Code
Martin Burtscher, UT Austin
Ben Livshits & Ben Zorn, Microsoft Research
Gaurav Sinha, IIT Kanpur
A Web 2.0 Application Dissected
• 70,000+ lines of JavaScript code downloaded
• 2,855 functions
• 1+ MB of code
• Talks to 14 backend services (traffic, images, directions, ads, …)
Lots of JavaScript Being Transmitted
[Chart: fraction of download that is JavaScript (0% to 100%) for www.live.com, spreadsheets.google, maps.live, chi.lexigame, hotmail, gmail, dropthings, maps.google, pageflakes, and bunny hunt]
Up to 85% of a Web 2.0 app is JavaScript code!
AJAX: Tension Headaches
• Execution can’t start without the code
• Move code to the client for responsiveness
JavaScript on the Wire
[Diagram: the usual path JavaScript → crunch → gzip → gzip -d → parser → AST, compared with the JSZap path using JSZap and gzip]
JSZap Approach
• Represent JavaScript as AST instead of source
• Serialize the compressed AST
• Decompress directly into AST on client
• Use gzip as 2nd-level (de-)compressor
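The approach above can be sketched as a two-stage pipeline: each stream is serialized on its own and then handed to gzip as the second-level compressor. This is a minimal sketch, assuming the productions, identifiers, and literals have already been extracted from the AST; the helper names `pack`/`unpack`, the byte layout, and the production IDs are hypothetical placeholders, not JSZap's actual format.

```python
import gzip
import json

# Hypothetical layout: one byte per production ID, newline-separated
# identifiers, JSON-encoded literals. Each stream is gzipped separately.

def pack(productions, identifiers, literals):
    """Serialize each stream, then gzip it (gzip as 2nd-level compressor)."""
    return {
        "productions": gzip.compress(bytes(productions)),  # IDs must be 0..255 here
        "identifiers": gzip.compress("\n".join(identifiers).encode("utf-8")),
        "literals": gzip.compress(json.dumps(literals).encode("utf-8")),
    }

def unpack(blobs):
    """Invert pack(): gunzip each stream and deserialize it."""
    names = gzip.decompress(blobs["identifiers"]).decode("utf-8")
    return (
        list(gzip.decompress(blobs["productions"])),
        names.split("\n") if names else [],
        json.loads(gzip.decompress(blobs["literals"])),
    )

blobs = pack([1, 3, 4, 1, 3, 4],
             ["y", "foo", "x", "z", "z", "y", "y", "x"],
             [2, "jscrunch", 3, "jszap"])
print({name: len(data) for name, data in blobs.items()})
```

Keeping the streams apart lets each one be modeled with techniques suited to its content before gzip removes the remaining byte-level redundancy.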
Benefits of AST-based Compression
Reduced Latency
• Compression: less to transmit
• ASTs are blasted directly into the browser
Reduced Network Bandwidth
• Reduces mobile charges
• Reduces operator network costs: better for servers
Correctness, Security, and Other Benefits
• Ensures well-formedness of code
• Can be used to check language subsets easily (AdSafe)
• Caching and incremental updates
• Unblocking the HTML parser
JSZap Compression
[Diagram: JavaScript → JSZap → gzip]
JSZap Compression
[Diagram: JavaScript split into three streams (productions, identifiers, literals), each passed to gzip]
GZIP is a formidable opponent
JSZap vs. GZIP (size in KB)
Stream         JSZap    gzip
Literals         5.4     5.4
Identifiers     18.4    19.0
Productions      8.4    11.5
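As a quick consistency check on the chart, summing the per-stream sizes gives each compressor's total and each stream's share of JSZap's output. This assumes the bars are totals over the same benchmark set; the only inputs are the numbers printed on the chart.

```python
# Per-stream sizes in KB as read off the "JSZap vs. GZIP" chart.
sizes = {
    "literals":    {"jszap": 5.4,  "gzip": 5.4},
    "identifiers": {"jszap": 18.4, "gzip": 19.0},
    "productions": {"jszap": 8.4,  "gzip": 11.5},
}

total_jszap = sum(s["jszap"] for s in sizes.values())
total_gzip = sum(s["gzip"] for s in sizes.values())

# Share of JSZap's output taken by each stream, in whole percent.
shares = {k: round(100 * s["jszap"] / total_jszap) for k, s in sizes.items()}
# Overall saving of JSZap relative to gzip, in whole percent.
savings = round(100 * (1 - total_jszap / total_gzip))

print(f"JSZap {total_jszap:.1f} KB vs gzip {total_gzip:.1f} KB -> {savings}% smaller")
print(shares)
# prints JSZap 32.2 KB vs gzip 35.9 KB -> 10% smaller
# prints {'literals': 17, 'identifiers': 57, 'productions': 26}
```

The shares work out to 57% identifiers, 26% productions, and 17% literals, the same split given on the "Average JSZap Compression" slide, and the overall saving rounds to 10%.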
Talk Outline
• Compressing the three streams: 1) productions, 2) identifiers, 3) literals
• Evaluation on real code
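Background: ASTs
Expression: a * b + c
Grammar:
1) E → E + T
2) E → T
3) T → T * F
4) T → F
5) F → id
[Figure: tree for a * b + c: root + (production 1) with children * (production 3) and c (production 5); * has children a and b (production 5 each)]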
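To make the grammar concrete, here is a small recursive-descent parser for the five productions that emits the production stream in preorder. It is an illustration, not JSZap code, and the helper names are hypothetical. Dropping the unit productions 2 and 4 leaves 1 3 5 5 5, the labels drawn on the tree; reading that as the "chain rule" mentioned later in the talk is an assumption.

```python
import re

# Productions from the slide:
#   1) E -> E + T   2) E -> T   3) T -> T * F   4) T -> F   5) F -> id
# A node is (production, child, ...); identifier leaves are plain strings.

def tokenize(src):
    return re.findall(r"[A-Za-z_]\w*|[+*]", src)

def parse(tokens):
    """Parse with loops instead of left recursion, building left-associative nodes."""
    pos = 0

    def parse_f():
        nonlocal pos
        name = tokens[pos]
        pos += 1
        return (5, name)                           # F -> id

    def parse_t():
        nonlocal pos
        node = (4, parse_f())                      # T -> F
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = (3, node, parse_f())            # T -> T * F
        return node

    def parse_e():
        nonlocal pos
        node = (2, parse_t())                      # E -> T
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = (1, node, parse_t())            # E -> E + T
        return node

    return parse_e()

def productions(node):
    """Preorder production stream (the order of a leftmost derivation)."""
    prod, *children = node
    out = [prod]
    for child in children:
        if isinstance(child, tuple):               # skip identifier leaves
            out.extend(productions(child))
    return out

def drop_unit_productions(prods, unit=(2, 4)):
    """Remove the unit productions E -> T and T -> F."""
    return [p for p in prods if p not in unit]

prods = productions(parse(tokenize("a * b + c")))
print(prods, drop_unit_productions(prods))
# prints [1, 2, 3, 4, 5, 5, 4, 5] [1, 3, 5, 5, 5]
```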
A Simple JavaScript Example
var y = 2;
function foo () {
    var x = "jscrunch";
    var z = 3;
    z = y + y;
}
x = "jszap";
Identifier Stream: y foo x z z y y x
Literal Stream: "jscrunch" 2 3 "jszap"
Production Stream: 1 3 4 ... 1 3 4 ...
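The identifier and literal streams of this example can be approximated with a toy lexer. This is an assumption-laden sketch: JSZap builds its streams from the AST, whereas this regex only understands the tokens in the example (`KEYWORDS`, `TOKEN`, and `split_streams` are hypothetical names), emits literals in source order, and cannot produce the production stream, which needs a real JavaScript parser.

```python
import re

# Toy lexer, not a JavaScript parser: numbers, double-quoted strings,
# names, and single punctuation characters only.
KEYWORDS = {"var", "function"}
TOKEN = re.compile(r'\s*(?:(\d+(?:\.\d+)?)|("(?:[^"\\]|\\.)*")|([A-Za-z_$][\w$]*)|(\S))')

def split_streams(src):
    """Return (identifier stream, literal stream) in source order."""
    identifiers, literals = [], []
    for number, string, name, _punct in (m.groups() for m in TOKEN.finditer(src)):
        if number:
            literals.append(number)
        elif string:
            literals.append(string)
        elif name and name not in KEYWORDS:
            identifiers.append(name)
    return identifiers, literals

EXAMPLE = '''
var y = 2;
function foo () {
    var x = "jscrunch";
    var z = 3;
    z = y + y;
}
x = "jszap";
'''

identifiers, literals = split_streams(EXAMPLE)
print(" ".join(identifiers))
# prints y foo x z z y y x
```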
Benchmarking JSZap
Benchmark      Source lines   Source bytes
gmonkey                 922         17,382
getDOMHash            1,136         25,467
bing1                 3,758         77,891
bingmap1              3,473         80,066
livemsg1              5,307         93,982
bingmap2              9,726        113,393
facebook1             5,886        141,469
livemsg2              7,139        156,282
officelive1          22,016        668,051
• JavaScript files up to 22K LOC
• Variety of app types
• Both hand-written and machine-generated code
• Gzipped everything
Components of JavaScript Source
[Chart: share of productions, identifiers, and literals (0% to 100%) in each benchmark: gmonkey, getDOMHash, bing1, bingmap1, livemsg1, bingmap2, facebook1, livemsg2, officelive1]
• None of the categories can be ignored
• Identifiers become more prominent with code growth
Compressing the Production Stream
• Frequency-based production renaming
• Differential encoding: 26 and 57 => 2 and 3
• Chain rule: eliminate predictable productions
• Tree-based prediction-by-partial-match
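Frequency-based renaming, the first technique in this list, can be sketched as follows: the most common production gets ID 0, the next one 1, and so on, so small numbers dominate the stream, which should help the later byte-level stages. Only this technique is shown; the function names are hypothetical, and whether the ranking table is transmitted or fixed in advance is an assumption left open here.

```python
from collections import Counter

def rename_by_frequency(stream):
    """Give the most frequent production ID the new ID 0, the next 1, and so on.
    Returns the renamed stream plus the ranking needed to undo the renaming."""
    counts = Counter(stream)
    ranking = sorted(counts, key=lambda p: (-counts[p], p))  # ties: smaller original ID first
    new_id = {prod: rank for rank, prod in enumerate(ranking)}
    return [new_id[p] for p in stream], ranking

def undo_renaming(renamed, ranking):
    """Map renamed IDs back to the original production IDs."""
    return [ranking[r] for r in renamed]

renamed, ranking = rename_by_frequency([7, 7, 3, 9, 7, 3])
print(renamed, ranking)
# prints [0, 0, 1, 2, 0, 1] [7, 3, 9]
```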
PPMC
• Consider compressing
– if (P) then X else X
• Should be very compressible
– if (P) then ...abc... else ...abc...
[Figure: if node with condition P and two identical subtrees X]
• Tree context used to build a predictor
• Provides the next likely child node given context C and child position p
• Arithmetic coding: more likely = shorter IDs
• See paper for details
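A much simplified version of the tree-context predictor: the context is (parent production, child position), and each node's cost is the ideal arithmetic-coding length, -log2 p, with PPMC-style escape counts (escape count = number of distinct symbols seen in that context). Real PPMC blends several context orders and uses exclusions; this order-1 sketch, its tree encoding, the alphabet size, and the production IDs are all assumptions for illustration. It shows why the second copy of X in "if (P) then X else X" is cheap.

```python
import math
from collections import Counter, defaultdict

# A tree node is (production_id, [children]); the IDs are hypothetical.

def nodes_with_context(node, parent=None, position=0):
    """Yield ((parent production, child position), production) in preorder."""
    prod, children = node
    yield (parent, position), prod
    for i, child in enumerate(children):
        yield from nodes_with_context(child, prod, i)

def code_lengths(tree, alphabet_size):
    """Ideal bits per node under an adaptive order-1 tree-context model with
    PPMC-style escapes to a uniform model (no exclusions, no higher orders)."""
    seen = defaultdict(Counter)
    bits = []
    for ctx, prod in nodes_with_context(tree):
        counts = seen[ctx]
        n, d = sum(counts.values()), len(counts)
        if n == 0:                       # context never seen: code uniformly
            cost = math.log2(alphabet_size)
        elif prod in counts:             # predicted: p = count / (n + d)
            cost = -math.log2(counts[prod] / (n + d))
        else:                            # escape (p = d / (n + d)), then uniform
            cost = -math.log2(d / (n + d)) + math.log2(alphabet_size)
        bits.append(cost)
        counts[prod] += 1                # update after coding, as a decoder would
    return bits

# if (P) then X else X, where X is a small subtree standing in for ...abc...
X = (3, [(4, []), (5, [(6, [])])])
tree = (1, [(2, []), X, X])
bits = code_lengths(tree, alphabet_size=8)
print(sum(bits[2:6]), sum(bits[6:10]))   # first copy of X vs second copy
# prints 12.0 6.0
```

Only the root of the second X is new in its context (child position 2 of the if node); every node below it is predicted from the first copy.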
Production Compression with PPMC
[Chart: production-stream size relative to gzip (gzip = 1) for each benchmark; annotated value: 0.6772]
Compressing the Identifier Stream
• Symbol tables instead of identifier stream:
– Compress redundancy: offset into table
– Global or local symbol tables
– Use variable-length encoding
• Other techniques:
– Sort symbols by frequency
– Rename local variables
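A single global symbol table sorted by frequency, applied to the identifier stream from the earlier JavaScript example: each identifier becomes an offset into the table, and frequent names get the smallest offsets. Local tables and local renaming are not shown, and the function names are hypothetical.

```python
from collections import Counter

def build_symbol_table(identifiers):
    """Global symbol table sorted by frequency (ties broken alphabetically)."""
    counts = Counter(identifiers)
    return sorted(counts, key=lambda name: (-counts[name], name))

def encode(identifiers, table):
    """Replace each identifier with its offset into the table."""
    offset = {name: i for i, name in enumerate(table)}
    return [offset[name] for name in identifiers]

def decode(offsets, table):
    return [table[i] for i in offsets]

stream = "y foo x z z y y x".split()
table = build_symbol_table(stream)
print(table, encode(stream, table))
# prints ['y', 'x', 'z', 'foo'] [0, 3, 1, 2, 2, 0, 0, 1]
```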
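Variable-length Encoding for Identifiers
[Diagram: decision tree choosing a two-bit prefix (00…, 01…, 10…, 11…) for each identifier reference, using the questions "is global?", "is renamed local?", and "fits in 1 byte?"]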
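One way to realize such a variable-length encoding: two tag bits at the top of each reference's first byte say global or local and one or two bytes, leaving 6 or 14 bits for the symbol-table index. The bit assignment is hypothetical and omits the renamed-local case from the decision tree; it only illustrates how a prefix lets most references fit in a single byte.

```python
def encode_ref(is_global, index):
    """Hypothetical layout: bit 7 of the first byte = global flag, bit 6 =
    two-byte form; the remaining 6 or 14 bits hold the symbol-table index."""
    flag = 0x80 if is_global else 0x00
    if index < 1 << 6:
        return bytes([flag | index])
    if index < 1 << 14:
        return bytes([flag | 0x40 | (index >> 8), index & 0xFF])
    raise ValueError("index needs more than 14 bits in this sketch")

def decode_refs(data):
    """Parse concatenated encode_ref() outputs back into (is_global, index) pairs."""
    refs, i = [], 0
    while i < len(data):
        first = data[i]
        is_global = bool(first & 0x80)
        if first & 0x40:                                   # two-byte form
            refs.append((is_global, ((first & 0x3F) << 8) | data[i + 1]))
            i += 2
        else:                                              # one-byte form
            refs.append((is_global, first & 0x3F))
            i += 1
    return refs

refs = [(True, 5), (False, 63), (False, 64), (True, 300)]
data = b"".join(encode_ref(g, idx) for g, idx in refs)
print(len(data))  # 1 + 1 + 2 + 2 bytes
# prints 6
```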
Variable-Length Identifier Encoding
[Chart: breakdown of identifier references (0% to 100%) by encoding class (parent, local 2byte, local 1byte, local builtin, global 2byte, global 1byte) for each benchmark]
Symbol Tables: Effectiveness
[Chart: identifier-stream size with a global symbol table and variable-length encoding (Global ST VarEnc) relative to no symbol table (No ST = 1) for each benchmark; annotated value: 0.943]
Compressing Literals
• Symbol tables
• Grouping literals by type
• Prefixes and postfixes
• These techniques result in 5-10% savings compared to gzip
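Grouping literals by type and sharing prefixes might look like this. Assumptions: the decoder learns each literal's type from the AST, so the per-type streams can be re-interleaved, and "prefixes" is read as front coding against the previous string (a postfix variant would compare string endings instead). The function names are hypothetical.

```python
import os

def split_by_type(literals):
    """Group literals into per-type streams (assumes the AST tells the
    decoder which kind of literal comes next, so order can be restored)."""
    strings = [v for v in literals if isinstance(v, str)]
    numbers = [v for v in literals if not isinstance(v, str)]
    return strings, numbers

def front_code(strings):
    """Encode each string as (length of prefix shared with the previous string, rest)."""
    coded, prev = [], ""
    for s in strings:
        k = len(os.path.commonprefix([prev, s]))
        coded.append((k, s[k:]))
        prev = s
    return coded

def front_decode(coded):
    strings, prev = [], ""
    for k, rest in coded:
        prev = prev[:k] + rest
        strings.append(prev)
    return strings

strings, numbers = split_by_type([2, "jscrunch", 3, "jszap"])
print(front_code(strings), numbers)
# prints [(0, 'jscrunch'), (2, 'zap')] [2, 3]
```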
Average JSZap Compression: 10%
[Chart: JSZap compressed size relative to gzip (gzip = 1) for each benchmark; annotated value: 0.8792]
[Pie chart: Productions 26%, Identifiers 57%, Literals 17%; 13% savings]
Summary and Conclusions
• JSZap: AST-based compression for JavaScript
• Propose a range of techniques for compressing
– Productions
– Identifiers
– Literals
• Preliminary results are encouraging: 10% savings over gzip
• Future focus
– Latency measurements
– Browser integration
[Diagram: benefits of the AST representation: well-formedness; security (AdSafe); unblocking the HTML parser; caching and incremental updates; compression with JSZap]
Questions?