the hiphop virtual machine

57
THE HIPHOP VIRTUAL MACHINE Keith Adams Facebook 2/6/13

Upload: lenga

Post on 11-Feb-2017

227 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: THE HIPHOP VIRTUAL MACHINE

THE HIPHOP VIRTUAL MACHINE Keith Adams Facebook 2/6/13

Page 2: THE HIPHOP VIRTUAL MACHINE

Why PHP?

Page 3: THE HIPHOP VIRTUAL MACHINE

Facebook’s PHP Engines •  2004-2009: Zend

•  Switch-on-bytecode C interpreter

•  2009-2012: HipHop •  Ahead-of-time compiler •  Intermediate: a C++ program •  Output: big fat ELF binary •  Devs use interpreter

•  2013-: HipHop VM •  Just-in-time compiler •  Dev/prod unified

Page 4: THE HIPHOP VIRTUAL MACHINE

HipHop Production Throughput

simpl

e

simpl

ecall

simpl

euca

ll

simpl

eudc

all

man

del

man

del2

acker

man

n(11

)

ary(

50000

000)

ary2(

5000

0000

)

ary3(

2000

0)

fibo(

40)

hash

1(100

0000

0)

hash

2(100

00)

heap

sort(

20000

00)

matrix

(2000

0)

neste

dloop

(36)

sieve

(3000

0)

strca

t(300

00)

binar

y!tree

s

fannk

uch!r

edux

fast

a

k!nuc

leot

ide1

man

delbr

ot

n!bod

y

regex

!dna

rever

se!c

ompl

emen

t

spec

tral!n

orm

Geo

Mea

n

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Zend HHVMi HipHop

Exec

uti

on

tim

e re

lati

ve

to Z

end

Figure 4. Performance comparison of Zend, HHVMi, and HipHop using standard PHP and computer language benchmarks

and updated with the minimal changes necessary to keepit running the site. These updates included support for newlanguage extensions, but no performance enhancements.

In August 2010, we last compared the performance ofHipHop and Zend running Facebook production traffic. Atthat time, the version of Zend used was 5.1.3. For this com-parison, two clusters of servers were used, one with HipHopand another with Zend, and load balancers ensured that thetwo sets received equal traffic. The average CPU time serv-ing requests in each cluster was computed across a large pe-riod of time. On average, HipHop completed web requests2.5⇥ faster than Zend. Since HipHop was much faster, theservers in its cluster were underutilized. We also comparedthe throughput of the clusters when the servers were fullyloaded. To do so, we configured the load balancers to pro-vide enough traffic to keep the CPUs in both clusters 100%utilized. The results of this experiment, measured in numberof requests-per-second (RPS), demonstrated that the HipHopcluster achieve throughput 2.5⇥ higher than Zend. On theleft side of Figure 5, this throughput data is plotted relativeto HipHop.

Figure 5 also plots the throughput improvements made toHipHop after August 2010. This comparison was performedusing the same methodology described above, except thatthe Zend cluster was changed to use the baseline HipHopbranch instead. The load balancers were again set to keepthe servers in each cluster fully utilized, so that we couldmeasure the maximum throughput achieved by each server.Overall, HipHop’s throughput over this period improved by

Aug

’201

0

Dec

’2010

Mar

’201

1

Jun’20

11

Sep’20

11

Dec

’2011

0

0.5

1

1.5

2

2.5

Baseline HipHop HipHop Zend

Rel

ativ

e T

hro

ug

hp

ut

Figure 5. HipHop throughput over time, running Face-book production traffic. The baseline is the throughput ofHipHop’s version from August 2010, which is when Zendlast ran Facebook.

another 2.3⇥. Although we cannot directly compare withZend running the current Facebook web site, these data sug-gest that HipHop’s throughput is now 5.8 (2.5 ⇥ 2.3) timeshigher than what Zend 5.1.3 would achieve. These improve-ments resulted from additional tuning, static analyses, andoptimizations described in Section 3 that were added in thisperiod.

Given the nature of the Facebook application, which pro-cesses huge data sets, one could expect Facebook’s webservers to be I/O bound. However, given the great amount

From “The HipHop Compiler for PHP,” Zhao et al., OOPSLA 2012

Page 5: THE HIPHOP VIRTUAL MACHINE

HipHop Learnings • Compiler taught us that type inference is the most

important optimization available.

• Many types cannot be inferred at compile-time!

Page 6: THE HIPHOP VIRTUAL MACHINE

Dynamic Types

goldbach_conjecture() ? 3.14159 : “string” !

mysql_fetch_row($result)[0] !

123.2 / $divisor !

Page 7: THE HIPHOP VIRTUAL MACHINE

Claim: Types are homogenous in practice •  $a / $b could produce boolean but almost never does

• Database row types rarely change

• Goldbach conjecture can only be true or false

• Etc.

Page 8: THE HIPHOP VIRTUAL MACHINE

HHVM Vision • Observe types at runtime before generating code • Discover types that cannot be inferred

Page 9: THE HIPHOP VIRTUAL MACHINE

Aside: HipHop Bytecode • Originally developed to support a fast interpreter • Executable serialization of PHP

=

lval:$c +

rval:$a rval:$b

PushL 1 !PushL 0 !Add!SetL 2 !

Page 10: THE HIPHOP VIRTUAL MACHINE

Example program <? function mymax($a, $b) { return $a > $b ? $a : $b; }

...

Page 11: THE HIPHOP VIRTUAL MACHINE

Type polymorphism <? function mymax($a, $b) { return $a > $b ? $a : $b; }

mymax(1, 2) => 2 mymax(1.3, 3.2) => 3.2 => 3.2 mymax(true, false) => true mymax(array(1,2), array(4)) => array(1,2) mymax(array(1,2), array(2,1)) => array(2,1) mymax("Felix", "Bob”) => "Felix"

Page 12: THE HIPHOP VIRTUAL MACHINE

In HHBC

function mymax($a, $b) { ! return $a > $b ? $a : $b; !} !

PushL 1 ! PushL 0 ! Gt! JmpZ 1f ! PushL 0 ! Jmp 7 2f !1: PushL 1 !2: RetC !

Page 13: THE HIPHOP VIRTUAL MACHINE

HHVM’s Strategy • Basic-block at a time •  Type specialization • Goal: incremental type discovery

Page 14: THE HIPHOP VIRTUAL MACHINE

Tracelets • Our compilation unit: the tracelet

• Sequence of control-flow-free bytecode instructions, annotated with type information

• We construct tracelets the moment before running them, translate them to machine code, and chain them together in the code cache

Page 15: THE HIPHOP VIRTUAL MACHINE

Tracelet construction • mymax(10, 333);

PushL 1 ! PushL 0 ! Gt ! JmpZ 1f ! PushL 0 ! Jmp 7 2f !1: PushL 1 !2: RetC !

Page 16: THE HIPHOP VIRTUAL MACHINE

Tracelet construction: trimming • mymax(10, 333);

PushL 1 ! PushL 0 ! Gt ! JmpZ 1f – Break at branch! PushL 0 ! Jmp 7 2f !1: PushL 1 !2: RetC !

Page 17: THE HIPHOP VIRTUAL MACHINE

Tracelet construction • mymax(10, 333);

PushL 1 ! PushL 0 ! Gt ! JmpZ X !

Page 18: THE HIPHOP VIRTUAL MACHINE

Tracelet construction • mymax(10, 333);

PushL 1 – Relies on locals 0 and 1! PushL 0 ! Gt ! JmpZ X !

Page 19: THE HIPHOP VIRTUAL MACHINE

Tracelet construction: guards • mymax(10, 333);

Local0 :: Int! Local1 :: Int! PushL 1 ! PushL 0 ! Gt ! JmpZ X !

Page 20: THE HIPHOP VIRTUAL MACHINE

Tracelet construction: data flow • mymax(10, 333);

Local0 :: Int! Local1 :: Int! PushL 1 ! PushL 0 ! Gt (Int, Int) -> Bool! JmpZ X (Bool) -> () !

Page 21: THE HIPHOP VIRTUAL MACHINE

Tracelet construction: machine code • mymax(10, 333);

Local0 :: Int! Local1 :: Int! PushL 1 ! PushL 0 ! Gt ! JmpZ X !

cmpl $0x3,-0x4(%rbp) !jne <retranslate> !cmpl $0x3,-0x14(%rbp) !jne <retranslate> !

mov -0x20(%rbp),%rax !mov -0x10(%rbp),%r13 !mov %r13,%rcx !cmp %rax,%rcx !jle <translateSuccessor0> !jmpq <translateSuccessor1!

Page 22: THE HIPHOP VIRTUAL MACHINE

Tracelet construction • mymax(10, 333);

PushL 1 ! PushL 0 ! Gt ! JmpZ 1f – Taken! ! PushL 0 ! Jmp 7 2f – Skipped!

1: PushL 1 !2: RetC !

Page 23: THE HIPHOP VIRTUAL MACHINE

Logical view of code cache

$a :: Int, $b :: Int

$a > $b ?

$a :: Int, $b :: Int

return $b

Retranslate B

Retranslate C

Program Flow Guard Flow

Retranslate A A

C

A: PushL 1 ! PushL 0 ! Gt ! JmpZ 1f !B: PushL 0 ! Jmp 7 2f !C: 1: PushL 1 !2: RetC

Page 24: THE HIPHOP VIRTUAL MACHINE

Call mymax(“a”, “z”) $a :: Int, $b :: Int

$a > $b ?

$a :: Int, $b :: Int

return $b

Retranslate B

Retranslate C

Program Flow Guard Flow

Retranslate A A

C

$a :: String, $b :: String $a > $b ?

$a :: String, $b :: String return $b

Page 25: THE HIPHOP VIRTUAL MACHINE

Call mymax(“z”, “a”) $a :: Int, $b :: Int

$a > $b ?

$a :: Int, $b :: Int

return $b

Retranslate B

Retranslate C

Program Flow Guard Flow

Retranslate A A

C

$a :: String, $b :: String $a > $b ?

$a :: String, $b :: String return $b

$a :: String, $b :: String return $a

Page 26: THE HIPHOP VIRTUAL MACHINE

Risk: Code explosion • N inputs, each takes on t types

•  will yield tN separate translations!

• Solution: truncate tracelet chain at 12 items •  Fall back to interpreter. • Applies to 0.0066% of chains

$a :: Null, $b :: Null, $c :: Null

...

Interp A

$a :: Int, $b :: Null, $c :: Null

...

...

$a :: Int, $b :: Bool, $c :: String

...

Page 27: THE HIPHOP VIRTUAL MACHINE

Function types • PHP methods and functions return untyped data •  Late-bound • Early approach: using return values breaks tracelets

function  hotFunc()  {  …  foreach  ($obj  in  $objs)  {          $s  =  $a[$obj-­‐>getName()];  //  what  type?  }  

Page 28: THE HIPHOP VIRTUAL MACHINE

Return type prediction class  C  {      …      function  getName()  {  return  $this-­‐>name;  }  }  

class  D  {      …      function  getName()  {  return  “D”;  }  }  

function  hotFunc()  {  …  foreach  ($obj  in  $objs)  {          $s  =  $a[$obj-­‐>getName()];  //  what  type?  }  

Page 29: THE HIPHOP VIRTUAL MACHINE

Type profiling • Special case of value profiling[1] •  Typical value-profiling records values encountered per-call

site •  Problems

•  Too much information •  Requires long calibrations

•  [1] Calder, et al. "Value profiling." Microarchitecture, 1997. Proceedings., Thirtieth Annual IEEE/ACM International Symposium on. IEEE, 1997.

Page 30: THE HIPHOP VIRTUAL MACHINE

Fun trick: profile names • Run first N requests in interpreter • Remember method name -> return type info • System learns facts like “methods named getName() tend

to return strings” •  Works for callsites, classes, methods, etc. we haven’t seen in

warmup •  Exploits the ways that humans naturally write code

• Our codebase has millions of call sites, but only 13K unique method names

• Accuracy ~99%

Page 31: THE HIPHOP VIRTUAL MACHINE

Example name predictions name type reset Null getTimer Int get_loggedin_user Int is_fb_employee Bool array_pull Array IPAddress Object ends_with Bool vqsprintf String html2txt String

Page 32: THE HIPHOP VIRTUAL MACHINE

Perf

Page 33: THE HIPHOP VIRTUAL MACHINE

HHVM vs. Zend

Page 34: THE HIPHOP VIRTUAL MACHINE

Perf: Wordpress

0

200

400

600

800

1000

1200

1400

Zend HHVM Interp

HPHPc HHVM Jit HHVM Repo Jit

Wordpress RPS

RPS

Page 35: THE HIPHOP VIRTUAL MACHINE

Facebook CPU idle (1/27/2013)

0

10

20

30

40

50

60

70

80

90

1 18

35

52

69

86

103

120

137

154

171

188

205

222

239

256

273

290

307

324

341

358

375

392

409

426

443

460

477

494

511

528

545

562

579

596

613

630

647

664

h3

v3

Page 36: THE HIPHOP VIRTUAL MACHINE

Road ahead •  Low-hanging fruit • Memory management • Broadened compilation scope •  Improved compilation quality • Science fiction: compilation as big-data

Page 37: THE HIPHOP VIRTUAL MACHINE

Conclusions • HHVM can run your PHP

•  very fast •  in both prod and dev

•  Lots of cool work left

Page 38: THE HIPHOP VIRTUAL MACHINE

Team HHVM

Page 39: THE HIPHOP VIRTUAL MACHINE

Thanks! •  https://github.com/facebook/hiphop-php/ • Our group blog: http://www.hiphop-php.com/wp/

Page 40: THE HIPHOP VIRTUAL MACHINE

HHBC is a platform, not an IR • Bytecode can be serialized • Runs without source • Natively supports PHP datatypes • Other front-ends, languages could produce HHBC

PHP   AST  

Op)mize  

Parse  HHBC  

Bytecode  Gen  

Page 41: THE HIPHOP VIRTUAL MACHINE

Ahead: Compilation at scale •  Facebook: many thousands of machines running the

same app

• Almost any profiling/compilation overhead can be amortized

• Challenge: publish local compilation results

Page 42: THE HIPHOP VIRTUAL MACHINE

Ahead: HackIR • SSA-based IR

• Supports classic compiler optimization suite, multiple machine targets

• Recently checked in

Page 43: THE HIPHOP VIRTUAL MACHINE

Ahead: Larger compilation scope • Stitch together a method’s tracelets

• HHIR gets larger scope for optimization

•  Tracelet body counts •  Provide both control-flow and type profiling

Page 44: THE HIPHOP VIRTUAL MACHINE

HipHop Pipeline

$c = $a + $b; // php!

=

lval:$c +

rval:$a rval:$b

_c.assign(_a.add(_b)); // C++ !

parse

optimize

emit

Page 45: THE HIPHOP VIRTUAL MACHINE

Problems with AoT • PHP development environment suffers

•  Slow •  New class of bug •  Perf work different from “normal” work •  Operational headaches deploying ELF binary

• Missed perf opportunities •  Runtime has more info than compile-time •  Type inference •  Method dispatch

Page 46: THE HIPHOP VIRTUAL MACHINE

Important related work • Other dynamic language VMs

•  Smalltalk, Self •  ECMAscript: v8, JSC, Tamarin, *Monkey, … •  Other: LuaJIT, PyPy

• Dynamic binary translators •  Shade •  Embra •  VMware’s BT monitor •  DynamoRIO

Page 47: THE HIPHOP VIRTUAL MACHINE

HipHop GeoMean of 5.6x faster than Zend

From “The HipHop Compiler for PHP,” Zhao et al., to appear in OOPSLA 2012

simpl

e

simpl

ecall

simpl

euca

ll

simpl

eudcall

mandel

mandel2

acker

mann(1

1)

ary(5

0000000)

ary2(

50000000)

ary3(

20000)

fibo(

40)

hash1(1

0000000)

hash2(1

0000)

heapsor

t(2000000)

matr

ix(2

0000)

nestedl

oop(36)

sieve(

30000)

strcat(

30000)

binar

y!trees

fannkuch!re

dux fa

sta

k!nucle

otid

e1

mandelb

rot

n!body

regex

!dna

rever

se!c

ompl

emen

t

spectr

al!nor

m

GeoMean

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Zend HHVMi HipHop

Exe

cutio

n tim

e re

lativ

e to

Zen

d

Figure 4. Performance comparison of Zend, HHVMi, and HipHop using standard PHP and computer language benchmarks

and updated with the minimal changes necessary to keepit running the site. These updates included support for newlanguage extensions, but no performance enhancements.

In August 2010, we last compared the performance ofHipHop and Zend running Facebook production traffic. Atthat time, the version of Zend used was 5.1.3. For this com-parison, two clusters of servers were used, one with HipHopand another with Zend, and load balancers ensured that thetwo sets received equal traffic. The average CPU time serv-ing requests in each cluster was computed across a large pe-riod of time. On average, HipHop completed web requests2.5⇥ faster than Zend. Since HipHop was much faster, theservers in its cluster were underutilized. We also comparedthe throughput of the clusters when the servers were fullyloaded. To do so, we configured the load balancers to pro-vide enough traffic to keep the CPUs in both clusters 100%utilized. The results of this experiment, measured in numberof requests-per-second (RPS), demonstrated that the HipHopcluster achieve throughput 2.5⇥ higher than Zend. On theleft side of Figure 5, this throughput data is plotted relativeto HipHop.

Figure 5 also plots the throughput improvements made toHipHop after August 2010. This comparison was performedusing the same methodology described above, except thatthe Zend cluster was changed to use the baseline HipHopbranch instead. The load balancers were again set to keepthe servers in each cluster fully utilized, so that we couldmeasure the maximum throughput achieved by each server.Overall, HipHop’s throughput over this period improved by

Aug’2010

Dec’20

10

Mar’201

1

Jun’2011

Sep’2011

Dec’20

110

0.5

1

1.5

2

2.5

Baseline HipHop HipHop Zend

Rel

ativ

e T

hrou

ghpu

t

Figure 5. HipHop throughput over time, running Face-book production traffic. The baseline is the throughput ofHipHop’s version from August 2010, which is when Zendlast ran Facebook.

another 2.3⇥. Although we cannot directly compare withZend running the current Facebook web site, these data sug-gest that HipHop’s throughput is now 5.8 (2.5 ⇥ 2.3) timeshigher than what Zend 5.1.3 would achieve. These improve-ments resulted from additional tuning, static analyses, andoptimizations described in Section 3 that were added in thisperiod.

Given the nature of the Facebook application, which pro-cesses huge data sets, one could expect Facebook’s webservers to be I/O bound. However, given the great amount

Page 48: THE HIPHOP VIRTUAL MACHINE

PHP Perf Challenges • What method/function is being called here? • What types do these data have? • Automatic memory management • High language “surface area”

Page 49: THE HIPHOP VIRTUAL MACHINE

Prod: Tracelet Chain length

Page 50: THE HIPHOP VIRTUAL MACHINE

Risk: Warmup • Possible weak point of JIT vs. AoT: warmup latency • We start with an empty code cache • Goal: reach steady state quickly

Page 51: THE HIPHOP VIRTUAL MACHINE

Warmup: Production requests/second

Page 52: THE HIPHOP VIRTUAL MACHINE

Code size over time

Page 53: THE HIPHOP VIRTUAL MACHINE

Jit output / time

Page 54: THE HIPHOP VIRTUAL MACHINE

The HipHop VM (HHVM) • New execution engine for PHP •  ~5x faster than standard interpreter • No compromises

•  Interactive use same as interpreter •  Real warts-and-all PHP language

Page 55: THE HIPHOP VIRTUAL MACHINE

Why PHP? •  Large base of software • Mediawiki, Drupal, Wordpress •  Large opportunities for performance improvement

Page 56: THE HIPHOP VIRTUAL MACHINE

Background: PHP • Server-side • Single-threaded • Stateless

•  Web requests run in isolation •  Request starts with empty heap, runs to completion •  No thread-to-thread memory sharing

• Untyped • Highly dynamic •  Facebook

•  > 18 MLOC •  Significant open source PHP •  Site pushed twice daily

Page 57: THE HIPHOP VIRTUAL MACHINE

Dynamic binding: functions <?

switch(date('l')) { case ’Wednesday': function foo($a, $b) { return "Wednesday!!!" . (32 * $a - $b); } break; default: function foo($a, $b) { return date('l') . ($a + $b) . "\n"; } } echo foo(1, 2);