the hiphop virtual machine - semantic scholar...• originally developed to support a fast...

57
THE HIPHOP VIRTUAL MACHINE Keith Adams Facebook 2/6/13

Upload: others

Post on 11-Apr-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

THE HIPHOP VIRTUAL MACHINE Keith Adams Facebook 2/6/13

Why PHP?

Facebook’s PHP Engines •  2004-2009: Zend

•  Switch-on-bytecode C interpreter

•  2009-2012: HipHop •  Ahead-of-time compiler •  Intermediate: a C++ program •  Output: big fat ELF binary •  Devs use interpreter

•  2013-: HipHop VM •  Just-in-time compiler •  Dev/prod unified

HipHop Production Throughput

simpl

e

simpl

ecall

simpl

euca

ll

simpl

eudc

all

man

del

man

del2

acker

man

n(11

)

ary(

50000

000)

ary2(

5000

0000

)

ary3(

2000

0)

fibo(

40)

hash

1(100

0000

0)

hash

2(100

00)

heap

sort(

20000

00)

matrix

(2000

0)

neste

dloop

(36)

sieve

(3000

0)

strca

t(300

00)

binar

y!tree

s

fannk

uch!r

edux

fast

a

k!nuc

leot

ide1

man

delbr

ot

n!bod

y

regex

!dna

rever

se!c

ompl

emen

t

spec

tral!n

orm

Geo

Mea

n

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Zend HHVMi HipHop

Exec

uti

on

tim

e re

lati

ve

to Z

end

Figure 4. Performance comparison of Zend, HHVMi, and HipHop using standard PHP and computer language benchmarks

and updated with the minimal changes necessary to keepit running the site. These updates included support for newlanguage extensions, but no performance enhancements.

In August 2010, we last compared the performance ofHipHop and Zend running Facebook production traffic. Atthat time, the version of Zend used was 5.1.3. For this com-parison, two clusters of servers were used, one with HipHopand another with Zend, and load balancers ensured that thetwo sets received equal traffic. The average CPU time serv-ing requests in each cluster was computed across a large pe-riod of time. On average, HipHop completed web requests2.5⇥ faster than Zend. Since HipHop was much faster, theservers in its cluster were underutilized. We also comparedthe throughput of the clusters when the servers were fullyloaded. To do so, we configured the load balancers to pro-vide enough traffic to keep the CPUs in both clusters 100%utilized. The results of this experiment, measured in numberof requests-per-second (RPS), demonstrated that the HipHopcluster achieve throughput 2.5⇥ higher than Zend. On theleft side of Figure 5, this throughput data is plotted relativeto HipHop.

Figure 5 also plots the throughput improvements made toHipHop after August 2010. This comparison was performedusing the same methodology described above, except thatthe Zend cluster was changed to use the baseline HipHopbranch instead. The load balancers were again set to keepthe servers in each cluster fully utilized, so that we couldmeasure the maximum throughput achieved by each server.Overall, HipHop’s throughput over this period improved by

Aug

’201

0

Dec

’2010

Mar

’201

1

Jun’20

11

Sep’20

11

Dec

’2011

0

0.5

1

1.5

2

2.5

Baseline HipHop HipHop Zend

Rel

ativ

e T

hro

ug

hp

ut

Figure 5. HipHop throughput over time, running Face-book production traffic. The baseline is the throughput ofHipHop’s version from August 2010, which is when Zendlast ran Facebook.

another 2.3⇥. Although we cannot directly compare withZend running the current Facebook web site, these data sug-gest that HipHop’s throughput is now 5.8 (2.5 ⇥ 2.3) timeshigher than what Zend 5.1.3 would achieve. These improve-ments resulted from additional tuning, static analyses, andoptimizations described in Section 3 that were added in thisperiod.

Given the nature of the Facebook application, which pro-cesses huge data sets, one could expect Facebook’s webservers to be I/O bound. However, given the great amount

From “The HipHop Compiler for PHP,” Zhao et al., OOPSLA 2012

HipHop Learnings • Compiler taught us that type inference is the most

important optimization available.

• Many types cannot be inferred at compile-time!

Dynamic Types

goldbach_conjecture() ? 3.14159 : “string” !

mysql_fetch_row($result)[0] !

123.2 / $divisor !

Claim: Types are homogenous in practice •  $a / $b could produce boolean but almost never does

• Database row types rarely change

• Goldbach conjecture can only be true or false

• Etc.

HHVM Vision • Observe types at runtime before generating code • Discover types that cannot be inferred

Aside: HipHop Bytecode • Originally developed to support a fast interpreter • Executable serialization of PHP

=

lval:$c +

rval:$a rval:$b

PushL 1 !PushL 0 !Add!SetL 2 !

Example program <? function mymax($a, $b) { return $a > $b ? $a : $b; }

...

Type polymorphism <? function mymax($a, $b) { return $a > $b ? $a : $b; }

mymax(1, 2) => 2 mymax(1.3, 3.2) => 3.2 => 3.2 mymax(true, false) => true mymax(array(1,2), array(4)) => array(1,2) mymax(array(1,2), array(2,1)) => array(2,1) mymax("Felix", "Bob”) => "Felix"

In HHBC

function mymax($a, $b) { ! return $a > $b ? $a : $b; !} !

PushL 1 ! PushL 0 ! Gt! JmpZ 1f ! PushL 0 ! Jmp 7 2f !1: PushL 1 !2: RetC !

HHVM’s Strategy • Basic-block at a time •  Type specialization • Goal: incremental type discovery

Tracelets • Our compilation unit: the tracelet

• Sequence of control-flow-free bytecode instructions, annotated with type information

• We construct tracelets the moment before running them, translate them to machine code, and chain them together in the code cache

Tracelet construction • mymax(10, 333);

PushL 1 ! PushL 0 ! Gt ! JmpZ 1f ! PushL 0 ! Jmp 7 2f !1: PushL 1 !2: RetC !

Tracelet construction: trimming • mymax(10, 333);

PushL 1 ! PushL 0 ! Gt ! JmpZ 1f – Break at branch! PushL 0 ! Jmp 7 2f !1: PushL 1 !2: RetC !

Tracelet construction • mymax(10, 333);

PushL 1 ! PushL 0 ! Gt ! JmpZ X !

Tracelet construction • mymax(10, 333);

PushL 1 – Relies on locals 0 and 1! PushL 0 ! Gt ! JmpZ X !

Tracelet construction: guards • mymax(10, 333);

Local0 :: Int! Local1 :: Int! PushL 1 ! PushL 0 ! Gt ! JmpZ X !

Tracelet construction: data flow • mymax(10, 333);

Local0 :: Int! Local1 :: Int! PushL 1 ! PushL 0 ! Gt (Int, Int) -> Bool! JmpZ X (Bool) -> () !

Tracelet construction: machine code • mymax(10, 333);

Local0 :: Int! Local1 :: Int! PushL 1 ! PushL 0 ! Gt ! JmpZ X !

cmpl $0x3,-0x4(%rbp) !jne <retranslate> !cmpl $0x3,-0x14(%rbp) !jne <retranslate> !

mov -0x20(%rbp),%rax !mov -0x10(%rbp),%r13 !mov %r13,%rcx !cmp %rax,%rcx !jle <translateSuccessor0> !jmpq <translateSuccessor1!

Tracelet construction • mymax(10, 333);

PushL 1 ! PushL 0 ! Gt ! JmpZ 1f – Taken! ! PushL 0 ! Jmp 7 2f – Skipped!

1: PushL 1 !2: RetC !

Logical view of code cache

$a :: Int, $b :: Int

$a > $b ?

$a :: Int, $b :: Int

return $b

Retranslate B

Retranslate C

Program Flow Guard Flow

Retranslate A A

C

A: PushL 1 ! PushL 0 ! Gt ! JmpZ 1f !B: PushL 0 ! Jmp 7 2f !C: 1: PushL 1 !2: RetC

Call mymax(“a”, “z”) $a :: Int, $b :: Int

$a > $b ?

$a :: Int, $b :: Int

return $b

Retranslate B

Retranslate C

Program Flow Guard Flow

Retranslate A A

C

$a :: String, $b :: String $a > $b ?

$a :: String, $b :: String return $b

Call mymax(“z”, “a”) $a :: Int, $b :: Int

$a > $b ?

$a :: Int, $b :: Int

return $b

Retranslate B

Retranslate C

Program Flow Guard Flow

Retranslate A A

C

$a :: String, $b :: String $a > $b ?

$a :: String, $b :: String return $b

$a :: String, $b :: String return $a

Risk: Code explosion • N inputs, each takes on t types

•  will yield tN separate translations!

• Solution: truncate tracelet chain at 12 items •  Fall back to interpreter. • Applies to 0.0066% of chains

$a :: Null, $b :: Null, $c :: Null

...

Interp A

$a :: Int, $b :: Null, $c :: Null

...

...

$a :: Int, $b :: Bool, $c :: String

...

Function types • PHP methods and functions return untyped data •  Late-bound • Early approach: using return values breaks tracelets

function  hotFunc()  {  …  foreach  ($obj  in  $objs)  {          $s  =  $a[$obj-­‐>getName()];  //  what  type?  }  

Return type prediction class  C  {      …      function  getName()  {  return  $this-­‐>name;  }  }  

class  D  {      …      function  getName()  {  return  “D”;  }  }  

function  hotFunc()  {  …  foreach  ($obj  in  $objs)  {          $s  =  $a[$obj-­‐>getName()];  //  what  type?  }  

Type profiling • Special case of value profiling[1] •  Typical value-profiling records values encountered per-call

site •  Problems

•  Too much information •  Requires long calibrations

•  [1] Calder, et al. "Value profiling." Microarchitecture, 1997. Proceedings., Thirtieth Annual IEEE/ACM International Symposium on. IEEE, 1997.

Fun trick: profile names • Run first N requests in interpreter • Remember method name -> return type info • System learns facts like “methods named getName() tend

to return strings” •  Works for callsites, classes, methods, etc. we haven’t seen in

warmup •  Exploits the ways that humans naturally write code

• Our codebase has millions of call sites, but only 13K unique method names

• Accuracy ~99%

Example name predictions name type reset Null getTimer Int get_loggedin_user Int is_fb_employee Bool array_pull Array IPAddress Object ends_with Bool vqsprintf String html2txt String

Perf

HHVM vs. Zend

Perf: Wordpress

0

200

400

600

800

1000

1200

1400

Zend HHVM Interp

HPHPc HHVM Jit HHVM Repo Jit

Wordpress RPS

RPS

Facebook CPU idle (1/27/2013)

0

10

20

30

40

50

60

70

80

90

1 18

35

52

69

86

103

120

137

154

171

188

205

222

239

256

273

290

307

324

341

358

375

392

409

426

443

460

477

494

511

528

545

562

579

596

613

630

647

664

h3

v3

Road ahead •  Low-hanging fruit • Memory management • Broadened compilation scope •  Improved compilation quality • Science fiction: compilation as big-data

Conclusions • HHVM can run your PHP

•  very fast •  in both prod and dev

•  Lots of cool work left

Team HHVM

Thanks! •  https://github.com/facebook/hiphop-php/ • Our group blog: http://www.hiphop-php.com/wp/

HHBC is a platform, not an IR • Bytecode can be serialized • Runs without source • Natively supports PHP datatypes • Other front-ends, languages could produce HHBC

PHP   AST  

Op)mize  

Parse  HHBC  

Bytecode  Gen  

Ahead: Compilation at scale •  Facebook: many thousands of machines running the

same app

• Almost any profiling/compilation overhead can be amortized

• Challenge: publish local compilation results

Ahead: HackIR • SSA-based IR

• Supports classic compiler optimization suite, multiple machine targets

• Recently checked in

Ahead: Larger compilation scope • Stitch together a method’s tracelets

• HHIR gets larger scope for optimization

•  Tracelet body counts •  Provide both control-flow and type profiling

HipHop Pipeline

$c = $a + $b; // php!

=

lval:$c +

rval:$a rval:$b

_c.assign(_a.add(_b)); // C++ !

parse

optimize

emit

Problems with AoT • PHP development environment suffers

•  Slow •  New class of bug •  Perf work different from “normal” work •  Operational headaches deploying ELF binary

• Missed perf opportunities •  Runtime has more info than compile-time •  Type inference •  Method dispatch

Important related work • Other dynamic language VMs

•  Smalltalk, Self •  ECMAscript: v8, JSC, Tamarin, *Monkey, … •  Other: LuaJIT, PyPy

• Dynamic binary translators •  Shade •  Embra •  VMware’s BT monitor •  DynamoRIO

HipHop GeoMean of 5.6x faster than Zend

From “The HipHop Compiler for PHP,” Zhao et al., to appear in OOPSLA 2012

simpl

e

simpl

ecall

simpl

euca

ll

simpl

eudcall

mandel

mandel2

acker

mann(1

1)

ary(5

0000000)

ary2(

50000000)

ary3(

20000)

fibo(

40)

hash1(1

0000000)

hash2(1

0000)

heapsor

t(2000000)

matr

ix(2

0000)

nestedl

oop(36)

sieve(

30000)

strcat(

30000)

binar

y!trees

fannkuch!re

dux fa

sta

k!nucle

otid

e1

mandelb

rot

n!body

regex

!dna

rever

se!c

ompl

emen

t

spectr

al!nor

m

GeoMean

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Zend HHVMi HipHop

Exe

cutio

n tim

e re

lativ

e to

Zen

d

Figure 4. Performance comparison of Zend, HHVMi, and HipHop using standard PHP and computer language benchmarks

and updated with the minimal changes necessary to keepit running the site. These updates included support for newlanguage extensions, but no performance enhancements.

In August 2010, we last compared the performance ofHipHop and Zend running Facebook production traffic. Atthat time, the version of Zend used was 5.1.3. For this com-parison, two clusters of servers were used, one with HipHopand another with Zend, and load balancers ensured that thetwo sets received equal traffic. The average CPU time serv-ing requests in each cluster was computed across a large pe-riod of time. On average, HipHop completed web requests2.5⇥ faster than Zend. Since HipHop was much faster, theservers in its cluster were underutilized. We also comparedthe throughput of the clusters when the servers were fullyloaded. To do so, we configured the load balancers to pro-vide enough traffic to keep the CPUs in both clusters 100%utilized. The results of this experiment, measured in numberof requests-per-second (RPS), demonstrated that the HipHopcluster achieve throughput 2.5⇥ higher than Zend. On theleft side of Figure 5, this throughput data is plotted relativeto HipHop.

Figure 5 also plots the throughput improvements made toHipHop after August 2010. This comparison was performedusing the same methodology described above, except thatthe Zend cluster was changed to use the baseline HipHopbranch instead. The load balancers were again set to keepthe servers in each cluster fully utilized, so that we couldmeasure the maximum throughput achieved by each server.Overall, HipHop’s throughput over this period improved by

Aug’2010

Dec’20

10

Mar’201

1

Jun’2011

Sep’2011

Dec’20

110

0.5

1

1.5

2

2.5

Baseline HipHop HipHop Zend

Rel

ativ

e T

hrou

ghpu

t

Figure 5. HipHop throughput over time, running Face-book production traffic. The baseline is the throughput ofHipHop’s version from August 2010, which is when Zendlast ran Facebook.

another 2.3⇥. Although we cannot directly compare withZend running the current Facebook web site, these data sug-gest that HipHop’s throughput is now 5.8 (2.5 ⇥ 2.3) timeshigher than what Zend 5.1.3 would achieve. These improve-ments resulted from additional tuning, static analyses, andoptimizations described in Section 3 that were added in thisperiod.

Given the nature of the Facebook application, which pro-cesses huge data sets, one could expect Facebook’s webservers to be I/O bound. However, given the great amount

PHP Perf Challenges • What method/function is being called here? • What types do these data have? • Automatic memory management • High language “surface area”

Prod: Tracelet Chain length

Risk: Warmup • Possible weak point of JIT vs. AoT: warmup latency • We start with an empty code cache • Goal: reach steady state quickly

Warmup: Production requests/second

Code size over time

Jit output / time

The HipHop VM (HHVM) • New execution engine for PHP •  ~5x faster than standard interpreter • No compromises

•  Interactive use same as interpreter •  Real warts-and-all PHP language

Why PHP? •  Large base of software • Mediawiki, Drupal, Wordpress •  Large opportunities for performance improvement

Background: PHP • Server-side • Single-threaded • Stateless

•  Web requests run in isolation •  Request starts with empty heap, runs to completion •  No thread-to-thread memory sharing

• Untyped • Highly dynamic •  Facebook

•  > 18 MLOC •  Significant open source PHP •  Site pushed twice daily

Dynamic binding: functions <?

switch(date('l')) { case ’Wednesday': function foo($a, $b) { return "Wednesday!!!" . (32 * $a - $b); } break; default: function foo($a, $b) { return date('l') . ($a + $b) . "\n"; } } echo foo(1, 2);