parallel&scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfcompact •...

35
Parallel Scan Bryan Mills, PhD Spring 2017

Upload: others

Post on 31-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Parallel  Scan  

Bryan  Mills,  PhD    

Spring  2017  

Page 2: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Scan  

•  Prefix  Sum  Example  

1   2   3   4   5   6   7   8  

1   3   6   10   15   21   28   36  

INPUT  

OUTPUT  

Page 3: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Generalized  Scan  

•  f(x,y)  =  operaJon  – Can  be  any  binary  associaJve  operaJon  

•  sum,  mulJply,  logical  and/or,  max,  min  

•  0  =  IdenJty  element  – zero  for  sum,  one  for  mulJplicaJon,  true  for  logical,  false  for  logical  or,  0  for  unsigned  max  

Page 4: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Generalized  Scan  Type acc = identity;for (i = 0; I < elems.length(); i++) {

acc = f(acc, elems[i]);out[i] = acc;

}

•  How  many  steps?  

•  How  much  work?  

n  

n  

Page 5: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Why?  

INPUT  

OUTPUT  

•  Many  problems  look  like  this  –  Output  depends  on  one  new  input  and  previous  output  –  Balancing  checkbook,  regressions,  even  sorJng  can.  

Page 6: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Types  of  Scan  

•  Exclusive  – Output  all  elements  excluding  the  current    

•  Inclusive  – Output  all  elements  including  the  current  

1   2   3   4   5   6   7   8  

0   1   3   6   10   15   21   28  

INPUT  

OUTPUT  

1   3   6   10   15   21   28   36  OUTPUT  

Page 7: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Looks  serial  INPUT  

OUTPUT  

•  It  is  non-­‐obvious  how  to  convert  to  parallel  however  we  will  look  at  two  algorithms  – Hillis/Steele    – Blelloch  

Page 8: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Hillis/Steele  Inclusive  Scan  1   2   3   4   5   6   7   8  

3  

Input  

Step  1  

Add  direct  neighbors  n  =  1  

Page 9: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Hillis/Steele  Inclusive  Scan  1   2   3   4   5   6   7   8  

3   5   7   9   11   13   15  

Input  

Step  1  

Add  direct  neighbors  n  =  1  

Page 10: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Hillis/Steele  Inclusive  Scan  1   2   3   4   5   6   7   8  

1   3   5   7   9   11   13   15  

Input  

Step  1  

If  no  neighbors  then  copy  previous  step  

Page 11: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Hillis/Steele  Inclusive  Scan  1   2   3   4   5   6   7   8  

1   3   5   7   9   11   13   15  

Input  

Step  1  

6  Step  2  

Add  neighbors  two  away  n  =  2  

Page 12: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Hillis/Steele  Inclusive  Scan  1   2   3   4   5   6   7   8  

1   3   5   7   9   11   13   15  

Input  

Step  1  

6   10   14   18   22   26  Step  2  

Add  neighbors  two  away  n  =  2  

Page 13: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Hillis/Steele  Inclusive  Scan  1   2   3   4   5   6   7   8  

1   3   5   7   9   11   13   15  

Input  

Step  1  

1   3   6   10   14   18   22   26  Step  2  

If  no  neighbors  copy  down  

Page 14: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Hillis/Steele  Inclusive  Scan  1   2   3   4   5   6   7   8  

1   3   5   7   9   11   13   15  

Input  

Step  0  

1   3   6   10   14   18   22   26  Step  1  

Add  neighbors  four  away  n  =  4  Generally  n  =  2step  

15  Step  2  

Page 15: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Hillis/Steele  Inclusive  Scan  1   2   3   4   5   6   7   8  

1   3   5   7   9   11   13   15  

Input  

Step  0  

1   3   6   10   14   18   22   26  Step  1  

Add  neighbors  four  away  n  =  4  Generally  n  =  2step  

15   21   28   36  Step  2  

Page 16: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Hillis/Steele  Inclusive  Scan  1   2   3   4   5   6   7   8  

1   3   5   7   9   11   13   15  

Input  

Step  0  

1   3   6   10   14   18   22   26  Step  1  

Copy  down  others  

1   3   6   10   15   21   28   36  Step  2  

Page 17: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Hillis/Steele  Inclusive  Scan  1   2   3   4   5   6   7   8  

1   3   5   7   9   11   13   15  

Input  

Step  0  

1   3   6   10   14   18   22   26  Step  1  

You  now  have  the  inclusive  scan.  

1   3   6   10   15   21   28   36  Step  2  

Steps  =  O(log  n)  Work  =  O(n  log  n)  <-­‐  dimensions  of  rectangle  above  

Page 18: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Blelloch  Exclusive  Scan  

•  Happens  in  two  passes:  – Reduce  

•  Like  previous  reduce  steps  but  keep  around  intermediate  results  

– Down  sweep  •  New  operaJon  

Page 19: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Blelloch  Exclusive  Scan  1   2   3   4   5   6   7   8  Input  

Page 20: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Blelloch  Exclusive  Scan  1   2   3   4   5   6   7   8  Input  

3   7   11   15  Step  0  n  =  2^0  =  1  

Start  a  simple  reduce  

Page 21: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Blelloch  Exclusive  Scan  1   2   3   4   5   6   7   8  Input  

3   7   11   15  Step  0  n  =  2^0  =  1  

Similar  to  other  reduces,  however  keep  around  the  intermediate  results:  3,  7,  11,  15,  10,  26  

10   26  Step  1  n  =  2^1  =  2  

36  Step  2  n  =  2^2  =  4  

Page 22: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Blelloch  Down  Sweep  OperaJon  

•  Reverse  reduce  step  –  Same  inputs  (leb  and  right)  –  But  two  outputs  also  leb  and  right  

•  Add  L+R  and  put  on  right  •  Copy  down  R  and  put  it  on  leb  

L   R  

L+R  R  

Page 23: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Blelloch  Down  Sweep  1   2   3   4   5   6   7   8  Input  

3   7   11   15  Step  0  n  =  2^0  =  1  

10   26  Step  1  n  =  2^1  =  2  

36  Step  2  n  =  2^2  =  4  

0  

First  replace  last  column  with  idenJty  element,  for  sum  this  is  zero  (0)  

Replace  with  IdenJty  elem  

Page 24: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Blelloch  Down  Sweep  1   2   3   4   5   6   7   8  Input  

3   7   11   15  Step  0  n  =  2^0  =  1  

10   26  Step  1  n  =  2^1  =  2  

36  Step  2  n  =  2^2  =  4  

10   0  Replace  with  IdenJty  elem  

Down  sweep  OperaJon  

Copy  element  down  from  previous  reduce,  this  is  why  you  need  to  keep  around  the  intermediate  reduce  results.  

Page 25: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Blelloch  Down  Sweep  1   2   3   4   5   6   7   8  Input  

3   7   11   15  Step  0  n  =  2^0  =  1  

10   26  Step  1  n  =  2^1  =  2  

36  Step  2  n  =  2^2  =  4  

10   0  Replace  with  IdenJty  elem  

0   10  Down  sweep  OperaJon  

Down  sweep  operaJon  Copy  right  to  leb  (0)  Put  L+R  to  the  right  (10  +  0  =  10)  

Page 26: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

1   2   3   4   5   6   7   8  Input  

3   7   11   15  Reduce  Step  0  

10   26  Reduce  Step  1  

36  Reduce  Step  2  

10   0  IdenJty  Elem  

3   0   11   10  Down  Sweep  0  

0   3   10   21  Down  Sweep  1  

Page 27: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

1   2   3   4   5   6   7   8  Input  

3   7   11   15  Reduce  Step  0  

10   26  Reduce  Step  1  

36  Reduce  Step  2  

10   0  IdenJty  Elem  

3   0   11   10  Down  Sweep  0  

1   0   3   3   5   10   7   21  Down  Sweep  1  

0   1   3   6   10   15   21   28  Down  Sweep  2  

Page 28: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Blelloch  Exclusive  Scan  

•  #  Steps?  – 2  log  n  =  O(log  n)  

•  Work?  – O(n)  

Page 29: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Compact  

•  Many  Jmes  you  lots  of  data  and  you  only  want  to  perform  some  computaJon  on  a  subset  of  that  data.  – Logs  analysis:  only  look  at  logs  containing  a  certain  search  term  or  type  of  search  term  

– Graphics:  only  perform  ray  tracing  on  elements  in  the  viewport  

– Big  Data:  Calculate  histogram  of  incomes  for  everyone  with  a  dog  

Page 30: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

What  is  compact  

•  Given  some  predicate  funcJon  remove  those  elements  which  return  false  and  “squeeze”  the  data  into  the  required  space.  

1   2   3   4   5   6   7   8  

T   F   T   F   T   F   T   F  

Input  

Predicate  Is  odd?  

1   3   5   7  Output  

Page 31: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Parallel  Compact  

•  Just  have  each  thread  evaluate  predicate  and  copy  only  on  true.  

1   2   3   4   5   6   7   8  

T   F   T   F   T   F   T   F  

Input  

Predicate  Is  odd?  

1   3   5   7  Dense  Output  

1   -­‐   3   -­‐   5   -­‐   7   -­‐  Sparse  Output  

Sparse  is  easy  in  parallel,  how  do  we  get  dense  output  in  parallel?  

Page 32: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Parallel  Compact  

•  Just  have  each  thread  evaluate  predicate  and  copy  only  on  true.  

1   2   3   4   5   6   7   8  

T   F   T   F   T   F   T   F  

Input  

Predicate  Is  odd?  

1   3   5   7  Dense  Output  

1   -­‐   3   -­‐   5   -­‐   7   -­‐  Sparse  Output  

Sparse  is  easy  in  parallel,  how  do  we  get  dense  output  in  parallel?  

Page 33: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Parallel  Compact  1   2   3   4   5   6   7   8  

T   F   T   F   T   F   T   F  

Input  

Predicate  Is  odd?  

1   3   5   7  Dense  Output  

1   -­‐   3   -­‐   5   -­‐   7   -­‐  Sparse  Output  

Look  at  the  desired  address  locaJons  in  the  dense  output  for  each  for  each  element  in  the  sparse  output    

Address  in  dense  output   0   -­‐   1   -­‐   2   -­‐   3   -­‐  

Page 34: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Parallel  Compact  1   2   3   4   5   6   7   8  

T   F   T   F   T   F   T   F  

Input  

Predicate  Is  odd?  

1   3   5   7  Dense  Output  

True/False  =  1/0  

Address  in  dense  output   0   -­‐   1   -­‐   2   -­‐   3   -­‐  

1   0   1   0   1   0   1   0  

0   1   1   2   2   3   3   4  Exclusive  Scan  on  1/0    

To  get  addresses  for  dense  output  run  a  SCAN  !  !  

Page 35: Parallel&Scan&people.cs.pitt.edu/~bmills/docs/teaching/cs1645/lecture_scan.pdfCompact • Many&Jmes&you&lots&of&dataand&you&only& wantto&perform&some&computaon&on&a subsetof&thatdata.&

Parallel  Compact  Steps  

1.  Run  Predicate  2.  Create  a  scan-­‐in  array    – True  =  1  – False  =  0  

3.  Run  exclusive  scan  over  scan-­‐in  array  – Output  is  the  scaker  addresses  for  input  

4.  Scaker  the  input  into  output  addresses