css parsing: performance tips & tricks
TRANSCRIPT
Roman Dvornov Avito
Moscow, September 2016
Frontend lead at Avito
Specializes in SPAs
Maintainer of: basis.js, CSSO, component-inspector, csstree and others
CSS parsing (Russian)
tinyurl.com/csstree-intro
This talk is a continuation of that one
CSSTree
CSSTree – the fastest detailed CSS parser
How this project was born
About a year ago I started maintaining CSSO (a CSS minifier)
github.com/css/csso
What's wrong with Gonzales
• Development stopped in 2013
• Unhandy and buggy AST format
• Parsing mistakes
• Excessively complex code base
• Slow, high memory consumption, pressure on GC
But I didn't want to spend my time developing the parser…
Alternatives?
You can find a lot of CSS parsers
Common problems
• Not actively developed
• Outdated (don't support the latest CSS features)
• Buggy
• Unhandy AST
• Slow
PostCSS pros
• Constantly developing
• Parses CSS well, even non-standard syntax + tolerant mode
• Saves formatting info
• Handy API to work with the AST
• Fast
General con: selectors and values are not parsed (they are represented as strings)
That forces developers to
• Use non-robust or inefficient approaches
• Invent their own parsers
• Use additional parsers: postcss-selector-parser, postcss-value-parser
Switching to PostCSS meant writing our own selector and value parsers, which is pretty much the same as writing an entirely new parser
However, as a result of continuous refactoring over a few months, the CSSO parser was completely rewritten (which was not planned)
And was extracted to a separate project
github.com/csstree/csstree
Performance
My previous talk about parser performance:
CSSO – performance boost story (Russian)
tinyurl.com/csso-speedup
After my talk at the HolyJS conference, the parser's performance was improved once more :)

* Thanks to Vyacheslav @mraleph Egorov for the inspiration
CSSTree: 24 ms
Mensch: 31 ms
CSSOM: 36 ms
PostCSS: 38 ms
Rework: 81 ms
PostCSS Full: 100 ms
Gonzales: 175 ms
Stylecow: 176 ms
Gonzales PE: 214 ms
ParserLib: 414 ms
bootstrap.css v3.3.7 (146Kb)
github.com/postcss/benchmark
Non-detailed AST vs. detailed AST
PostCSS Full = PostCSS + postcss-selector-parser + postcss-value-parser
Epic fail: as I realised later, I had extracted the wrong version of the parser
😱github.com/csstree/csstree/commit/57568c758195153e337f6154874c3bc42dd04450
CSSTree: 24 ms → 13 ms
Mensch: 31 ms
CSSOM: 36 ms
PostCSS: 38 ms
Rework: 81 ms
PostCSS Full: 100 ms
Gonzales: 175 ms
Stylecow: 176 ms
Gonzales PE: 214 ms
ParserLib: 414 ms

bootstrap.css v3.3.7 (146Kb)
github.com/postcss/benchmark

Time after the parser update: 13 ms
Parsers: basic training
Main steps
• Tokenization
• Tree assembling
Tokenization
• whitespace – [ \n\r\t\f]+
• keyword – [a-zA-Z…]+
• number – [0-9]+
• string – "string" or 'string'
• comment – /* comment */
• punctuation – [;,.#\{\}\[\]\(\)…]
Split text into tokens
.foo {
 width: 10px;
}
[ '.', 'foo', ' ', '{', '\n ', 'width', ':', ' ', '10', 'px', ';', '\n', '}']
We need more info about every token: type and location
It is more efficient to compute type and location during the tokenization step
.foo {
 width: 10px;
}
[ { type: 'FullStop', value: '.', offset: 0, line: 1, column: 1 }, …]
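As an illustration, a toy tokenizer that computes type and location in a single pass might look like this (a sketch only; the token names and the one-token-per-character simplification are mine, not CSSTree's actual behaviour):

```javascript
// Toy tokenizer: emits one token per character, but records
// type, offset, line and column during the same pass over the input.
function tokenize(source) {
  var tokens = [];
  var line = 1;
  var lineStart = -1; // offset just before the first char of the current line

  for (var offset = 0; offset < source.length; offset++) {
    var code = source.charCodeAt(offset);
    var type;

    if (code === 10) { // '\n'
      type = 'Newline';
    } else if (code === 46) { // '.'
      type = 'FullStop';
    } else if (code >= 48 && code <= 57) { // '0'..'9'
      type = 'Number';
    } else if ((code >= 97 && code <= 122) || (code >= 65 && code <= 90)) {
      type = 'Identifier';
    } else {
      type = 'Punctuation';
    }

    tokens.push({
      type: type,
      value: source.charAt(offset),
      offset: offset,
      line: line,
      column: offset - lineStart
    });

    if (code === 10) { // start a new line after the newline token
      line++;
      lineStart = offset;
    }
  }

  return tokens;
}
```

A real tokenizer would additionally group runs of characters (keywords, numbers, whitespace) into single tokens; the bookkeeping for line and column stays the same.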
Tree assembling
function getSelector() {
  var selector = {
    type: 'Selector',
    sequence: []
  };

  // main loop

  return selector;
}

Creating a node
for (; currentToken < tokenCount; currentToken++) {
  switch (tokens[currentToken].type) {
    case TokenType.Hash:     // #
      selector.sequence.push(getId());
      break;
    case TokenType.FullStop: // .
      selector.sequence.push(getClass());
      break;
    …
  }
}

Main loop
{
  "type": "StyleSheet",
  "rules": [{
    "type": "Atrule",
    "name": "import",
    "expression": {
      "type": "AtruleExpression",
      "sequence": [ ... ]
    },
    "block": null
  }]
}

Result
Parser performance boost, Part 2: new horizons
[ { type: 'FullStop', value: '.', offset: 0, line: 1, column: 1 }, …]
Token's cost: 24 + 5 × 4 + array = min 50 bytes per token
Our project: ~1Mb of CSS = 254 062 tokens = min 12.7 Mb
Out of the box: changing the approach
Computing all tokens at once and then assembling the tree is much easier, but needs more memory and is therefore slower
Scanner (lazy tokenizer)
scanner.token      // current token or null
scanner.next()     // advance to the next token
scanner.lookup(N)  // look ahead: returns the Nth token from the current one
Key API
• lookup(N) fills the token buffer up to N tokens (if they are not computed yet) and returns token N - 1 from the buffer
• next() shifts a token from the buffer, if any, or computes the next one
This computes the same number of tokens, just not all at once, and so requires less memory
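A minimal sketch of this lazy scheme (names are illustrative, and the one-character-per-token rule again stands in for real tokenization):

```javascript
// Lazy tokenizer: tokens are computed on demand and kept in a small
// look-ahead buffer instead of tokenizing the whole input upfront.
function LazyScanner(source) {
  this.source = source;
  this.pos = 0;      // next offset to tokenize
  this.buffer = [];  // tokens computed ahead of the current one
  this.token = this.computeNext();
}

LazyScanner.prototype.computeNext = function() {
  if (this.pos >= this.source.length) {
    return null;
  }
  // one character = one token, for brevity
  return { value: this.source.charAt(this.pos++) };
};

// fill the buffer up to N tokens ahead, return the Nth one
LazyScanner.prototype.lookup = function(N) {
  while (this.buffer.length < N) {
    var token = this.computeNext();
    if (token === null) break;
    this.buffer.push(token);
  }
  return this.buffer[N - 1] || null;
};

// shift a token from the buffer, if any, or compute the next one
LazyScanner.prototype.next = function() {
  this.token = this.buffer.length > 0
    ? this.buffer.shift()
    : this.computeNext();
  return this.token;
};
```

The buffer only grows as far as the parser actually looks ahead, so memory stays proportional to look-ahead depth rather than input size.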
Problem: the approach puts pressure on GC
Reducing token's cost step by step
[ { type: 'FullStop', value: '.', offset: 0, line: 1, column: 1 }, …]
A string type is easy to understand, but it's for internal use only, so we can replace it with numbers
[ { type: FULLSTOP, value: '.', offset: 0, line: 1, column: 1 }, …]
…
// '.'.charCodeAt(0)
var FULLSTOP = 46;
…
[ { type: 46, value: '.', offset: 0, line: 1, column: 1 }, …]
We can avoid storing substrings in tokens – that's very expensive for punctuation (moreover, those substrings are never used). Many constructions are assembled from several substrings, and one long substring is better than a concatenation of several small ones.
[ { type: 46, value: '.', offset: 0, line: 1, column: 1 }, …]
[ { type: 46, start: 0, end: 1, line: 1, column: 1 }, …]
Look, Ma! No strings, just numbers!
Moreover, not an Array but a TypedArray: arrays of objects become arrays of numbers.

Array vs. TypedArray
• Can't have holes
• Faster in theory (less checking)
• Can be stored outside the heap (when big enough)
• Prefilled with zeros
{ type, start, end, line, column } as five typed arrays:
Uint8Array (type, 1 byte) + 4 × Uint32Array (start, end, line, column; 4 bytes each)
= 17 bytes per token
254 062 tokens × 17 = 4.3Mb
4.3Mb vs. 12.7Mb (min)
Houston, we have a problem: a TypedArray has a fixed length, but we don't know in advance how many tokens will be found
Allocate by symbol count instead (there can never be more tokens than symbols):
17 bytes per entry
983 085 symbols × 17 = 16.7Mb
16.7Mb vs. 12.7Mb (min)
Don't give up – let's look at the arrays more attentively
start = [ 0, 5, 6, 7, 9, 11, …, 35 ]
end = [ 5, 6, 7, 9, 11, 12, …, 36 ]
Merging them into a single offset array:

offset = [ 0, 5, 6, 7, 9, 11, …, 35, 36 ]
start = offset[i]
end = offset[i + 1]
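The merged array can be sketched in a few lines (names are illustrative): the end of token i is exactly the start of token i + 1, so one shared array of boundaries replaces two arrays.

```javascript
// One shared offsets array instead of separate start/end arrays:
// token i spans [offset[i], offset[i + 1]).
var offset = new Uint32Array([0, 5, 6, 7, 9, 11]);

function tokenStart(i) { return offset[i]; }
function tokenEnd(i)   { return offset[i + 1]; }
```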
Uint8Array (type, 1 byte) + 3 × Uint32Array (offset, line, column; 4 bytes each)
= 13 bytes per entry
983 085 × 13 = 12.7Mb
a {
  top: 0;
}
lines = [ 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3]
columns = [ 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1]
lines & columns
line = lines[offset];
column = offset - lines.lastIndexOf(line - 1, offset);
lines & columns
This is acceptable only for short lines, which is why we cache the last line's start offset
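A sketch of the on-demand line/column computation described above (the sample lines array matches the a { top: 0; } example; function and variable names are illustrative):

```javascript
// lines[offset] -> line number for every character of the input.
// The column is the distance from the end of the previous line,
// found by scanning backwards for the last entry of line - 1.
var lines = new Uint32Array([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3]);

function getLocation(offset) {
  var line = lines[offset];
  // index of the last character belonging to the previous line,
  // or -1 when we are still on line 1
  var prevLineEnd = lines.lastIndexOf(line - 1, offset);
  return { line: line, column: offset - prevLineEnd };
}
```

The backwards lastIndexOf scan is what makes this acceptable only for short lines, hence the caching mentioned above.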
Uint8Array (type, 1 byte) + 2 × Uint32Array (offset, lines; 4 bytes each)
= 9 bytes per entry
983 085 × 9 = 8.8Mb
8.8Mb vs. 12.7Mb (min)
Reduce operations with strings
Performance «killers»*
• RegExp
• String concatenation
• toLowerCase/toUpperCase
• substr/substring
• …

* A polluted GC pulls performance down
We can't avoid these things entirely, but we can get rid of the rest
var start = scanner.tokenStart;
…
scanner.next();
…
scanner.next();
…
return source.substring(start, scanner.tokenEnd);
Avoid string concatenations
function cmpStr(source, start, end, str) {
  if (end - start !== str.length) {
    return false;
  }

  for (var i = start; i < end; i++) {
    var sourceCode = source.charCodeAt(i);
    var strCode = str.charCodeAt(i - start);

    if (sourceCode !== strCode) {
      return false;
    }
  }

  return true;
}

String comparison:
• No substring!
• Length fast-check
• Compare strings by char codes
Case-insensitive string comparison*?

* i.e. avoiding toLowerCase/toUpperCase
Heuristics
• We only ever compare against reference strings (str)
• Reference strings may be kept in lower case and contain Latin letters only (no Unicode)
• I read once on Twitter…
Setting the 6th bit to 1 turns an upper-case Latin letter into lower case (works for ASCII Latin letters only):

'A' = 01000001
'a' = 01100001

('A'.charCodeAt(0) | 32) === 'a'.charCodeAt(0)
function cmpStr(source, start, end, str) {
  …
  for (var i = start; i < end; i++) {
    …
    // source[i].toLowerCase()
    if (sourceCode >= 65 && sourceCode <= 90) { // 'A' .. 'Z'
      sourceCode = sourceCode | 32;
    }

    if (sourceCode !== strCode) {
      return false;
    }
  }
  …
}

Case-insensitive string comparison
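Filling in the elided parts, a complete runnable version might look like this (a sketch; CSSTree's actual implementation may differ in details):

```javascript
// Case-insensitive comparison of source[start..end) against a
// lower-case Latin reference string, without creating substrings
// or temporary lower-cased strings.
function cmpStr(source, start, end, str) {
  if (end - start !== str.length) {
    return false;
  }

  for (var i = start; i < end; i++) {
    var sourceCode = source.charCodeAt(i);
    var strCode = str.charCodeAt(i - start);

    // lower-case Latin letters on the fly: 'A'..'Z' -> 'a'..'z'
    if (sourceCode >= 65 && sourceCode <= 90) {
      sourceCode = sourceCode | 32;
    }

    if (sourceCode !== strCode) {
      return false;
    }
  }

  return true;
}
```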
Benefits
• Frequent comparisons stop at the length check
• No substrings (no pressure on GC)
• No temporary strings (e.g. results of toLowerCase/toUpperCase)
• String comparison doesn't pollute the GC
Results – eliminated:
• RegExp
• String concatenation
• toLowerCase/toUpperCase
• substr/substring
No arrays in AST
What's wrong with arrays?
• As arrays grow, their backing memory is frequently relocated (unnecessary memory moving)
• Pressure on GC
• We don't know the size of the resulting arrays in advance

Solution?
Bi-directional (doubly linked) list

[diagram: AST nodes chained via next/prev references]

Needs a little bit more memory than arrays, but…
Pros
• No memory relocation
• No GC pollution during AST assembly
• next/prev references for free
• Cheap insertion and deletion
• Better for monomorphic walkers
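A minimal sketch of such a list (CSSTree ships its own list implementation; this illustrative version shows only append and remove):

```javascript
// Doubly linked list for AST children: each item carries prev/next
// references, so insertion and removal are O(1) and no backing
// array ever has to be grown or relocated.
function List() {
  this.head = null;
  this.tail = null;
}

List.prototype.append = function(data) {
  var item = { data: data, prev: this.tail, next: null };

  if (this.tail !== null) {
    this.tail.next = item;
  } else {
    this.head = item;
  }
  this.tail = item;

  return item;
};

List.prototype.remove = function(item) {
  if (item.prev !== null) item.prev.next = item.next;
  else this.head = item.next;

  if (item.next !== null) item.next.prev = item.prev;
  else this.tail = item.prev;
};
```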
These approaches, among others, reduced memory consumption and GC pressure and made the parser twice as fast as before
CSSTree: 24 ms → 13 ms
Mensch: 31 ms
CSSOM: 36 ms
PostCSS: 38 ms
Rework: 81 ms
PostCSS Full: 100 ms
Gonzales: 175 ms
Stylecow: 176 ms
Gonzales PE: 214 ms
ParserLib: 414 ms

bootstrap.css v3.3.7 (146Kb)
github.com/postcss/benchmark

It's these changes that brought it down to 13 ms
But the story goes on 😋

Parser performance boost story, Part 3: a week after FrontTalks
In general
• Simplify the AST structure
• Less memory consumption
• Array reuse
• list.map().join() → loop + string concatenation
• and others…
Once more about token costs
Uint8Array (types, 1 byte) + Uint32Array (offsets, 4 bytes) + Uint32Array (lines, 4 bytes)
= 9 bytes per entry
983 085 × 9 = 8.8Mb

lines can be computed on demand
Uint8Array (types, 1 byte) + Uint32Array (offsets, 4 bytes)
= 5 bytes per entry
983 085 × 5 = 4.9Mb

Do we really need all 32 bits for the offset?
Heuristic: no one parses more than 16Mb of CSS
offset = [ 0, 5, 6, 7, 9, 11, 11, …, 1234 ]
type = [ 1, 47, 47, 4, 4, 47, 5, …, 3 ]

offsetAndType[i] = type[i] << 24 | offset[i]
offsetAndType = [ 16777216, 788529157, … ]

start = offsetAndType[i] & 0xFFFFFF;
type = offsetAndType[i] >> 24;
Uint32Array (offsetAndType, 4 bytes) = 4 bytes per entry
983 085 × 4 = 3.9Mb
3.9-7.8 Mb vs. 12.7 Mb (min)
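The packing scheme can be verified with a few helper functions (the helper names are hypothetical; the bit layout is the one shown above – 8 bits of type, 24 bits of offset, hence the 16Mb heuristic):

```javascript
// Pack a token type (8 bits) and an offset (24 bits, i.e. up to
// 16Mb of CSS) into a single 32-bit integer, and unpack them back.
var TYPE_SHIFT = 24;
var OFFSET_MASK = 0xFFFFFF;

function pack(type, offset) {
  return type << TYPE_SHIFT | offset;
}

function unpackType(value) {
  // note: >> is fine while the type fits in 7 bits;
  // >>> would be needed for types above 127
  return value >> TYPE_SHIFT;
}

function unpackOffset(value) {
  return value & OFFSET_MASK;
}
```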
class Scanner {
  ...
  next() {
    var next = this.currentToken + 1;

    this.currentToken = next;
    this.tokenStart = this.tokenEnd;
    this.tokenEnd = this.offsetAndType[next + 1] & 0xFFFFFF;
    this.tokenType = this.offsetAndType[next] >> 24;
  }
}

Needs 2 reads for 3 values (tokenEnd becomes tokenStart)

But 2 reads look redundant, let's fix it…
offset = [ 0, 5, 6, 7, 9, 11, 11, …, 1234 ]
type = [ 1, 47, 47, 4, 4, 47, 5, …, 3 ]

start = end
end = offsetAndType[i + 1] & 0xFFFFFF;
type = offsetAndType[i] >> 24;

The first offset is always zero, so shift the offsets one position to the left:

offset = [ 5, 6, 7, 9, 11, 11, …, 1234 ]
type = [ 1, 47, 47, 4, 4, 47, 5, …, 3 ]

offsetAndType[i] = type[i] << 24 | offset[i + 1]

start = end
end = offsetAndType[i] & 0xFFFFFF;
type = offsetAndType[i] >> 24;
class Scanner {
  ...
  next() {
    var next = this.currentToken + 1;

    this.currentToken = next;
    this.tokenStart = this.tokenEnd;
    this.tokenEnd = this.offsetAndType[next] & 0xFFFFFF;
    this.tokenType = this.offsetAndType[next] >> 24;
  }
}
Now we need just one read
class Scanner {
  ...
  next() {
    var next = this.currentToken + 1;

    this.currentToken = next;
    this.tokenStart = this.tokenEnd;
    next = this.offsetAndType[next];
    this.tokenEnd = next & 0xFFFFFF;
    this.tokenType = next >> 24;
  }
}
-50% reads (~250k)
👌
Re-use
The scanner creates new arrays every time it parses a string
New strategy
• Preallocate a 16Kb buffer by default
• Create a new buffer only if the current one is smaller than needed
• Significantly improves performance, especially when parsing many small CSS fragments
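A sketch of this grow-only reuse strategy (names are illustrative, and the 16Kb default is modelled as an entry count here for simplicity):

```javascript
// Grow-only buffer reuse: allocate once up-front, reallocate only
// when an input is larger than anything seen before. Every smaller
// parse reuses the existing buffer, so the GC sees no garbage.
var MIN_BUFFER_SIZE = 16 * 1024; // preallocated default
var offsetAndType = new Uint32Array(MIN_BUFFER_SIZE);

function adoptBuffer(size) {
  // reuse the existing buffer whenever it is big enough
  if (size > offsetAndType.length) {
    offsetAndType = new Uint32Array(size);
  }
  return offsetAndType;
}
```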
CSSTree: 24 ms → 13 ms → 7 ms
Mensch: 31 ms
CSSOM: 36 ms
PostCSS: 38 ms
Rework: 81 ms
PostCSS Full: 100 ms
Gonzales: 175 ms
Stylecow: 176 ms
Gonzales PE: 214 ms
ParserLib: 414 ms

bootstrap.css v3.3.7 (146Kb)
github.com/postcss/benchmark

Current results

And still not the end… 😋
One more thing
CSSTree is not just about performance
New feature*: parsing and matching of CSS value syntax
* Currently unique across CSS parsers
Example
csstree.github.io/docs/syntax.html
CSS syntax reference
csstree.github.io/docs/validator.html
CSS values validator
var csstree = require('css-tree');
var syntax = csstree.syntax.defaultSyntax;
var ast = csstree.parse('… your css …');

csstree.walkDeclarations(ast, function(node) {
  if (!syntax.match(node.property.name, node.value)) {
    console.log(syntax.lastMatchError);
  }
});

Your own validator in 8 lines of code
Some tools and plugins
• csstree-validator – npm package + CLI command
• stylelint-csstree-validator – plugin for stylelint
• gulp-csstree – plugin for gulp
• SublimeLinter-contrib-csstree – plugin for Sublime Text
• vscode-csstree – plugin for VS Code
• csstree-validator – plugin for Atom
More is coming…
Conclusion
If you want your JavaScript to work as fast as C, make it look like C
Previous talks
• CSSO – performance boost story (Russian): tinyurl.com/csso-speedup
• CSS parsing (Russian): tinyurl.com/csstree-intro