
Sanders: Algorithm Engineering, June 16, 2015
Slides: algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf
Algorithm Engineering for Basic Data Structures and Algorithms
Peter Sanders
What are the fastest implemented algorithms for the ABCs of algorithmics: lists, sorting, priority queues, sorted lists, hash tables, graph algorithms?

Useful Prior Knowledge
Algorithmen I (Algorithms I)
Algorithmen II (Algorithms II)
some computer architecture (or read c't ;-)
passive knowledge of C/C++
Specialization area: Algorithmics

Material
Slides
Scientific papers; see the lecture homepage
Basic knowledge: algorithms textbooks, e.g., Mehlhorn/Sanders, Cormen et al.
Mehlhorn, Näher: LEDA: A Platform for Combinatorial and Geometric Computing. Good for the advanced papers.
Catherine McGeoch: A Guide to Experimental Algorithmics

Additional Assignment
Choice between a talk and a programming project.
Topics: see the web page
20% of the grade
Talks: 20 min talk + 10 min discussion
Block session towards the end of, or shortly after, the lecture period
No written report
Usually: read one paper and present it understandably
Submit well-advanced draft slides at least 1 week in advance

Programming Projects
5–10 min talk + 5 min discussion
2-page written report + sources, deadline 1.4.2013 (preferably earlier)
After that, improvements are still possible until the start of the lecture period

Overview
What is Algorithm Engineering, models, ...
First steps: arrays, linked lists, stacks, FIFOs, ...
Sorting, inside and out
Priority queues
Sorted lists
Hash tables
Minimum spanning trees
Shortest paths
Selected advanced algorithms, e.g., maximum flows
Methodology: in excursions

Algorithmics
= the systematic design of efficient software and hardware
[Figure: diagram placing algorithmics within computer science; labels: logics, correct, efficient, soft- & hardware, theoretical, practical]

(Caricatured) Traditional View: Algorithm Theory
[Figure: flow diagram with the labels models, design, analysis, deduction, perf. guarantees, implementation, applications; Theory on one side, Practice on the other]

Gaps Between Theory & Practice
Theory             ←→                   Practice
simple             appl. model          complex
simple             machine model        real
complex            algorithms           FOR simple
advanced           data structures      arrays, ...
worst case (max)   complexity measure   inputs
asympt. O(·)       efficiency           42% constant factors

Algorithmics as Algorithm Engineering
[Figure, built up over five slides: the algorithm engineering cycle; labels: realistic models, design, analysis, deduction, performance guarantees, implementation, algorithm libraries, experiments, benchmark inputs, induction, falsifiable hypotheses, applications, appl. engin.]

Goals
bridge gaps between theory and practice
accelerate transfer of algorithmic results into applications
keep the advantages of theoretical treatment: generality of solutions, and reliability and predictability from performance guarantees
[Figure: the algorithm engineering cycle]

Bits of History
1843–          Algorithms in theory and practice
1950s, 1960s   Still infancy
1970s, 1980s   Paper and pencil algorithm theory. Exceptions exist, e.g., [J. Bentley, D. Johnson]
1986           Term used by [T. Beth], lecture “Algorithmentechnik” in Karlsruhe
1988–          Library of Efficient Data Types and Algorithms (LEDA) [K. Mehlhorn]
1990–          DIMACS Implementation Challenges [D. Johnson]
1997–          Workshop on Algorithm Engineering, ESA applied track [G. Italiano]
1997           Term used in US policy paper [Aho, Johnson, Karp, et al.]
1998           Alex workshop in Italy, ALENEX

Why This Lecture?
Every computer scientist knows some textbook algorithms, so we can start with algorithm engineering right away
Many applications benefit
It is striking that there is still uncharted territory here
Basis for bachelor's and master's theses

What This Lecture Is Not:
Not a rehash of Algorithms I/II or the like
Introductory lectures often “simplify” the truth
Some advanced algorithms
Steeper learning curve
Implementation details
Emphasis on measurement results

What This Lecture Is Not:
Not a theory lecture
No (few?) proofs
Real performance before asymptotics

What This Lecture Is Not:
Not an implementation lecture
Some algorithm analysis, ...
Little software engineering

Excursion: Machine Models
RAM / von Neumann model
[Figure: ALU with O(1) registers; 1 word = O(log n) bits; large, freely programmable memory]
Analysis: count machine instructions (load, store, arithmetic, branch, ...)
Simple
Very successful
Increasingly unrealistic, because real hardware keeps getting more complex

The Secondary Memory Model
[Figure: ALU with registers; fast, freely programmable memory of capacity M (unit: 1 word); blocks of B words; large memory]
M: size of the fast memory
B: block size
Analysis: count (only?) block accesses (I/Os)

Interpretations of the Secondary Memory Model
               External memory        Caches
large memory   disk(s)                main memory
M              main memory            one cache level
B              disk block (MBytes!)   cache block (16–256 bytes)
Possibly also two cache levels.
Variant: SSDs

Parallel Disks
[Figure: two models, each with a fast memory M and blocks of size B: independent disks 1, 2, ..., D [Vitter Shriver 94], and the multihead model with heads 1, 2, ..., D [Aggarwal Vitter 88]]

More Model Aspects
Instruction-level parallelism (superscalar, VLIW, EPIC, SIMD, ...)
Pipelining
What does a branch misprediction cost?
Multilevel caches (currently 2–3 levels), “cache oblivious algorithms”
Parallel processors, multithreading
Communication networks
...

1 Arrays, Linked Lists, and Derived Data Structures
Bounded Arrays
Built-in data structure.
The size must be known from the start

Unbounded Array
e.g., std::vector
pushBack: append an element
popBack: delete the last element
Idea: double the array when space runs out, halve it when space is wasted
If this is done right, n pushBack/popBack operations take time O(n)
Algorithmese: pushBack/popBack have constant amortized complexity. What can go wrong?
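The doubling/halving idea can be sketched as follows. This is a minimal illustration, not the lecture's code: the class name `UArray`, its members, and the choice to halve only once the array is at most a quarter full are assumptions. The quarter rule is the usual answer to "what can go wrong": shrinking as soon as the array is half full lets an alternating pushBack/popBack sequence at the boundary trigger a reallocation on every operation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical illustration class; names are not from the slides.
template <class T>
class UArray {
    std::size_t cap_ = 1;                   // allocated slots
    std::size_t n_ = 0;                     // used slots
    std::unique_ptr<T[]> data_{new T[1]};

    // Move all elements into a fresh array of the given capacity.
    void reallocate(std::size_t newCap) {
        std::unique_ptr<T[]> fresh(new T[newCap]);
        for (std::size_t i = 0; i < n_; ++i) fresh[i] = std::move(data_[i]);
        data_ = std::move(fresh);
        cap_ = newCap;
    }

public:
    std::size_t size() const { return n_; }
    std::size_t capacity() const { return cap_; }
    T& operator[](std::size_t i) { assert(i < n_); return data_[i]; }

    void pushBack(const T& e) {
        if (n_ == cap_) reallocate(2 * cap_);   // double when space runs out
        data_[n_++] = e;
    }

    void popBack() {
        assert(n_ > 0);
        --n_;
        // Halve only when at most a quarter is used (assumed rule, see lead-in).
        if (4 * n_ <= cap_ && cap_ > 1) reallocate(cap_ / 2);
    }
};
```

With this rule, every reallocation is preceded by a number of cheap operations proportional to the array size, which is what makes the amortized cost constant.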

Doubly Linked Lists

Class Item of Element // one link in a doubly linked list
  e : Element
  next : Handle
  prev : Handle
  invariant next→prev = prev→next = this
Trick: use a dummy header
[Figure: list items linked in a ring through the dummy header]

Procedure splice(a, b, t : Handle)
  assert b is not before a ∧ t ∉ ⟨a, ..., b⟩
  // cut out ⟨a, ..., b⟩
  a′ := a→prev
  b′ := b→next
  a′→next := b′
  b′→prev := a′
  // insert ⟨a, ..., b⟩ after t
  t′ := t→next
  b→next := t′
  a→prev := t
  t→next := a
  t′→prev := b
[Figure: the pointers around a′, a, b, b′ and t, t′ after each step]
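The splice procedure translates almost line by line into C++. The following is a sketch under assumptions: `Item`, `List`, `pushBack`, `first`, and `last` are hypothetical names, the element type is fixed to `int`, and the precondition is documented rather than checked. Only the body of `splice` follows the pseudocode.

```cpp
#include <cassert>

// One link of a doubly linked list; a fresh item forms a one-element ring,
// so the invariant next->prev == prev->next == this holds from the start.
struct Item {
    int e = 0;
    Item* next = this;
    Item* prev = this;
};

// Move the sublist <a,...,b> to the position right after t.
// Precondition (not checked): b is not before a, and t is not in <a,...,b>.
inline void splice(Item* a, Item* b, Item* t) {
    assert(a && b && t);
    // cut out <a,...,b>
    Item* aPrev = a->prev;
    Item* bNext = b->next;
    aPrev->next = bNext;
    bNext->prev = aPrev;
    // insert <a,...,b> after t
    Item* tNext = t->next;
    b->next = tNext;
    a->prev = t;
    t->next = a;
    tNext->prev = b;
}

// List with a dummy header: no special cases for empty lists or list ends.
struct List {
    Item head;                                            // dummy header
    Item* first() { return head.next; }
    Item* last() { return head.prev; }
    void pushBack(Item* x) { splice(x, x, head.prev); }   // x must be a free-standing item
};
```

Because of the dummy header, inserting into an empty list and appending at the end both reduce to the same `splice` call without any branches.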

Singly Linked Lists
[Figure: singly linked list]
Comparison with doubly linked lists
Less space
Space is often also time
More restricted, e.g., no delete
Odd user interface, e.g., deleteAfter

Memory Management for Lists
can easily cost 90% of the time!
Rather move elements between (free) lists than call malloc for real
Allocate many items at once
Free everything at the end?
Store “parasitically”, e.g., graphs: node array, where each node stores a ListItem
A partition of the nodes can be stored as linked lists
MST, shortest path
Challenge: garbage collection, many data types
Also a software engineering problem; not covered here
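A free list that allocates many items at once might look like the sketch below. Everything here is an assumption made for illustration: the names `Node` and `NodePool`, the chunk size of 1024, and the decision to release all memory only when the pool is destroyed (one possible answer to "free everything at the end?").

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    int e;
    Node* next;
};

// Hypothetical pool: items are handed out from a free list; the real
// allocator is called only once per chunk of kChunk items.
class NodePool {
    static constexpr std::size_t kChunk = 1024;    // assumed chunk size
    std::vector<std::unique_ptr<Node[]>> chunks_;  // owns all memory until destruction
    Node* free_ = nullptr;                         // head of the free list
    std::size_t chunkAllocs_ = 0;

    void refill() {
        chunks_.emplace_back(new Node[kChunk]);    // one real allocation for many items
        ++chunkAllocs_;
        Node* c = chunks_.back().get();
        for (std::size_t i = 0; i < kChunk; ++i) {
            c[i].next = free_;
            free_ = &c[i];
        }
    }

public:
    Node* allocate(int e) {
        if (!free_) refill();
        Node* x = free_;
        free_ = x->next;
        x->e = e;
        x->next = nullptr;
        return x;
    }

    // Returning an item is just a push onto the free list.
    void release(Node* x) {
        assert(x != nullptr);
        x->next = free_;
        free_ = x;
    }

    std::size_t chunkAllocations() const { return chunkAllocs_; }
};
```

Allocation and release are a handful of pointer operations each; whether this actually removes a large share of the running time depends on the allocator being replaced and has to be measured.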

Example: Stack
[Figure: a stack stored as a singly linked list, as a bounded array, and as an unbounded array]
                  SList        B-Array      U-Array
dynamic           +            −            +
wasted space      pointers     too large?   too large? free it?
wasted time       cache miss   +            copying
worst case time   (+)          +            −
Was that it?
Does every implementation have serious weaknesses?

The Best From Both Worlds
[Figure: hybrid structure built from blocks of size B]
                  hybrid
dynamic           +
wasted space      n/B + B
wasted time       +
worst case time   +
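One way to read the hybrid is as a linked list of fixed-size blocks. The sketch below assumes that reading; the class name `HybridStack`, the default block size, and the policy of freeing a block as soon as it becomes empty are all illustrative choices, not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hybrid stack: a singly linked list of arrays with B slots each.
template <class T, std::size_t B = 256>
class HybridStack {
    struct Block {
        T items[B];
        Block* below;       // next older block
    };
    Block* top_ = nullptr;  // newest block
    std::size_t used_ = B;  // used slots in top_; B means "allocate on next push"
    std::size_t n_ = 0;

    void dropTop() {
        Block* b = top_->below;
        delete top_;
        top_ = b;
        used_ = B;          // the block below is full (or there is none)
    }

public:
    HybridStack() = default;
    HybridStack(const HybridStack&) = delete;
    HybridStack& operator=(const HybridStack&) = delete;
    ~HybridStack() { while (top_) dropTop(); }

    std::size_t size() const { return n_; }

    void push(const T& e) {
        if (used_ == B) {                    // top block full or missing
            top_ = new Block{{}, top_};
            used_ = 0;
        }
        top_->items[used_++] = e;
        ++n_;
    }

    T pop() {
        assert(n_ > 0);
        T e = top_->items[--used_];
        --n_;
        if (used_ == 0) dropTop();
        return e;
    }
};
```

No element is ever copied, and the overhead is one pointer per block plus the unused slots of the top block, which is in line with the n/B + B entry in the table. Freeing eagerly means that alternating push/pop at a block boundary allocates each time; keeping one spare block would avoid that.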

A Variant
[Figure: a directory pointing to blocks of B elements]
− Reallocation at the top level: not worst case constant time
+ Indexed access to S[i] in constant time

Beyond Stacks...
[Figure: stack, FIFO queue, and deque with the operations pushBack, popBack, pushFront, popFront]
FIFO: BArray → cyclic array
[Figure: cyclic array b with positions 0..n, head h and tail t]
Exercise: an array that supports “[i]” in constant time and inserting/deleting elements in time $O(\sqrt{n})$
Exercise: an external stack that supports n push/pop operations with O(n/B) I/Os
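A bounded FIFO on a cyclic array can be sketched like this. The names `BoundedFifo`, `h_`, and `t_` are assumptions modelled on the figure labels h, t, and b; the array gets n+1 slots so that a full queue and an empty queue can be told apart.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bounded FIFO queue for at most n elements.
template <class T>
class BoundedFifo {
    std::vector<T> b_;      // n+1 slots, used cyclically
    std::size_t h_ = 0;     // index of the first element
    std::size_t t_ = 0;     // index of the first free slot

public:
    explicit BoundedFifo(std::size_t n) : b_(n + 1) {}

    bool empty() const { return h_ == t_; }
    std::size_t size() const { return (t_ + b_.size() - h_) % b_.size(); }

    void pushBack(const T& e) {
        assert(size() + 1 < b_.size());   // queue must not be full
        b_[t_] = e;
        t_ = (t_ + 1) % b_.size();        // wrap around at the end of the array
    }

    T popFront() {
        assert(!empty());
        T e = b_[h_];
        h_ = (h_ + 1) % b_.size();
        return e;
    }
};
```

If the number of slots is a power of two, the modulo can be replaced by a bit mask; whether that matters is a question for measurements.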

Exercise: complete the table for hybrid data structures
Operation      List   SList   UArray   CArray   explanation of ‘∗’
[·]            n      n       1        1
|·|            1∗     1∗      1        1        not with inter-list splice
first          1      1       1        1
last           1      1       1        1
insert         1      1∗      n        n        insertAfter only
remove         1      1∗      n        n        removeAfter only
pushBack       1      1       1∗       1∗       amortized
pushFront      1      1       n        1∗       amortized
popBack        1      n       1∗       1∗       amortized
popFront       1      1       n        1∗       amortized
concat         1      1       n        n
splice         1      1       n        n
findNext,...   n      n       n∗       n∗       cache efficient

What Is Missing?
Facts, facts, facts
Measurements for
different implementation variants
different architectures
different input sizes
Effects on real applications
Plots of these
Interpretation, possibly theory building
Exercise: traversing an array versus a randomly allocated linked list


Algorithm Engineering

A Detailed View

Using Sorting as Guiding Example

Sorting
Permute n elements of an array a such that
a[1] ≤ a[2] ≤ ··· ≤ a[n]
Efficient sequential, comparison-based algorithms take time O(n log n)

Sorting – Model
[Figure: comparison-based sorting: arbitrary elements, accessed only through < with a true/false result; contrasted with full information, e.g., integer keys]


Why Sorting?

Teaching perspective:

simple

surprisingly nontrivial

computer scientists know the basics

Application Perspective:

Build index data structures

Process objects in well defined order

Group similar objects

Bottleneck in many applications

[Figure: the algorithm engineering cycle]

Realistic Models
Theory   ←→              Practice
simple   appl. model     complex
simple   machine model   real
Careful refinements
Try to preserve (partial) analyzability / simple results

Advanced Machine Models
[Figure: processors 1, 2, ..., p, each with ALU and registers, attached to a freely programmable memory]
von Neumann RAM / PRAM / shared memory
exhibit parallelism

Distributed Memory [1]
also consider communication
[Figure: processors 1, 2, ..., p]

Parallel Disks [2]
[Figure: fast memory M and disks 1, 2, ..., D accessed in blocks of size B]

Set Associative Caches [3]
[Figure: cache lines of size B in main memory mapped to the cache sets of a cache of size M, associativity a = 2]


Branch Prediction [4]

Hierarchical Parallel External Memory [5]
[Figure: multicore machines with disks, connected via MPI]


Graphics Processing Units [6]

Combining Models?
design / analyze one aspect at a time
hierarchical combination
autotuning?
[Figure: collage of the model figures: set associative cache, comparison-based sorting, processors 1, 2, ..., P, parallel disks 1, 2, ..., D]
Or: Model Agnostic Algorithm Design

[Figure: the algorithm engineering cycle]


Design

of algorithms that work well in practice

simplicity

reuse

constant factors

exploit easy instances

Design – Sorting
simplicity: merge, distribute [Figure: merging and distribution schemes]
reuse: disk scheduling, prefetching, load balancing, sequence partitioning [7, 2, 8, 5]
constant factors: detailed machine model (caches, TLBs, registers, branch prediction, ILP) [9, 4]
instances: randomization for difficult instances [2, 5]

Example: External Sorting [10]
n: input size
M: internal memory size
B: block size
[Figure: ALU with registers; fast, freely programmable memory of capacity M; blocks of size B; large memory]

Procedure externalMerge(a, b, c : File of Element)
  x := a.readElement   // assume emptyFile.readElement = ∞
  y := b.readElement
  for j := 1 to |a| + |b| do
    if x ≤ y then c.writeElement(x); x := a.readElement
    else c.writeElement(y); y := b.readElement
[Figure: files a and b, their current elements x and y, and the output file c]
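The merge procedure can be tried out with an in-memory stand-in for files. In this sketch the `File` struct is hypothetical: it wraps a `std::vector<int>` and returns the largest `int` as the ∞ sentinel once the file is exhausted. A real implementation would read and write blocks of B elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical in-memory stand-in for "File of Element".
struct File {
    std::vector<int> data;
    std::size_t pos = 0;   // read position

    std::size_t size() const { return data.size(); }

    // Returns the next element, or the sentinel "infinity" when exhausted.
    int readElement() {
        return pos < data.size() ? data[pos++] : std::numeric_limits<int>::max();
    }
    void writeElement(int x) { data.push_back(x); }
};

// Merge the sorted files a and b into c, following the pseudocode line by line.
inline void externalMerge(File& a, File& b, File& c) {
    assert(std::is_sorted(a.data.begin(), a.data.end()));
    assert(std::is_sorted(b.data.begin(), b.data.end()));
    int x = a.readElement();
    int y = b.readElement();
    const std::size_t total = a.size() + b.size();
    for (std::size_t j = 0; j < total; ++j) {
        if (x <= y) { c.writeElement(x); x = a.readElement(); }
        else        { c.writeElement(y); y = b.readElement(); }
    }
}
```

The sentinel keeps the loop free of end-of-file checks: an exhausted input simply never wins a comparison against a remaining real element.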

External Binary Merging
read file a: $\approx |a|/B$
read file b: $\approx |b|/B$
write file c: $\approx (|a|+|b|)/B$
overall: $\approx 2\,\frac{|a|+|b|}{B}$
[Figure: files a and b, their current elements x and y, and the output file c]

Run Formation
Sort input pieces of size M
[Figure: file f is processed in pieces starting at i = 0, M, 2M, ...; each piece of size M is sorted internally and written as a run to file t]
I/Os: $\approx 2\,\frac{n}{B}$
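Run formation can be sketched on an in-memory string, which makes it easy to reproduce the 12-character runs of the "make_things_..." example used in this part of the lecture. The function name `formRuns` matches the label in the figures; the signature and the use of `std::string` in place of files are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Cut the input into pieces of M elements, sort each piece internally,
// and emit it as a run. Each element is read once and written once,
// which is where the 2n/B I/Os come from in the external setting.
inline std::vector<std::string> formRuns(const std::string& f, std::size_t M) {
    assert(M > 0);
    std::vector<std::string> runs;
    for (std::size_t i = 0; i < f.size(); i += M) {
        std::string run = f.substr(i, M);    // "read" one piece (the last may be shorter)
        std::sort(run.begin(), run.end());   // internal sort
        runs.push_back(run);                 // "write" the run
    }
    return runs;
}
```

For example, `formRuns("make_things_as_simple_as_possible_but_no_simpler", 12)` yields the four runs `__aeghikmnst`, `__aaeilmpsss`, `__bbeilopssu`, and `__eilmnoprst`, since `_` sorts before the lowercase letters in ASCII.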

Sorting by External Binary Merging
make_things_   as_simple_as   _possible_bu   t_no_simpler
  formRuns       formRuns       formRuns       formRuns
__aeghikmnst   __aaeilmpsss   __bbeilopssu   __eilmnoprst
           merge                          merge
____aaaeeghiiklmmnpsssst     ____bbeeiillmnoopprssstu
                        merge
________aaabbeeeeghiiiiklllmmmnnooppprsssssssttu

Procedure externalBinaryMergeSort     // I/Os: ≈
  run formation                       // $2n/B$
  while more than one run left do     // $\lceil \log_2 \frac{n}{M} \rceil \times$
    merge pairs of runs               //   $2n/B$
  output remaining run                // $\sum$: $\frac{2n}{B}\left(1+\left\lceil \log_2 \frac{n}{M} \right\rceil\right)$

Example Numbers: PC 2015
$n = 2^{40}$ bytes (1 TB)
$M = 2^{33}$ bytes (8 GB)
$B = 2^{22}$ bytes (4 MB)
one I/O needs $2^{-5}$ s (31.25 ms)
$\text{time} = \frac{2n}{B}\left(1+\left\lceil \log_2 \frac{n}{M} \right\rceil\right)\cdot 2^{-5}\,\text{s} = 2\cdot 2^{18}\cdot(1+7)\cdot 2^{-5}\,\text{s} = 2^{17}\,\text{s} \approx 36\,\text{h}$
Idea: 8 passes → 2 passes
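The arithmetic of this example can be checked mechanically. The helper names `ceilLog2` and `binaryMergeSortIOs` below are made up for this sketch; the formula is the one for external binary merge sort, (2n/B)(1 + ⌈log₂(n/M)⌉), and the cost of 2⁻⁵ s per I/O is the slide's assumption.

```cpp
#include <cassert>
#include <cstdint>

// Smallest p with 2^p >= x, for 1 <= x <= 2^63.
inline unsigned ceilLog2(std::uint64_t x) {
    assert(x >= 1);
    unsigned p = 0;
    while ((std::uint64_t{1} << p) < x) ++p;
    return p;
}

// Approximate I/O count of external binary merge sort; n, M, B in bytes.
inline std::uint64_t binaryMergeSortIOs(std::uint64_t n, std::uint64_t M, std::uint64_t B) {
    assert(n > 0 && M > 0 && B > 0);
    const std::uint64_t runs = (n + M - 1) / M;        // ceil(n/M) initial runs
    const std::uint64_t passes = 1 + ceilLog2(runs);   // run formation + merge phases
    return 2 * (n / B) * passes;                       // each pass reads and writes n/B blocks
}
```

Calling `binaryMergeSortIOs(1ull << 40, 1ull << 33, 1ull << 22)` gives 2²² I/Os; at 32 I/Os per second that is 2¹⁷ s = 131072 s, or roughly 36.4 hours.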

Multiway Merging
Procedure multiwayMerge(a_1, ..., a_k, c : File of Element)
  for i := 1 to k do x_i := a_i.readElement
  for j := 1 to $\sum_{i=1}^{k} |a_i|$ do
    find i ∈ 1..k that minimizes x_i   // no I/Os!, O(log k) time
    c.writeElement(x_i)
    x_i := a_i.readElement
[Figure: k input runs with internal buffers, merged into one output]
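A compact way to get the O(log k) minimum search is a binary heap over the current elements x_i. The sketch below uses `std::priority_queue` and in-memory vectors in place of files; both are stand-ins chosen for illustration, and exhausted runs are simply dropped from the heap instead of returning an ∞ sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merge k sorted runs into one sorted output.
inline std::vector<int> multiwayMerge(const std::vector<std::vector<int>>& a) {
    using Entry = std::pair<int, std::size_t>;   // (current element x_i, run index i)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;  // min-heap
    std::vector<std::size_t> pos(a.size(), 0);   // read position in each run
    std::size_t total = 0;

    for (std::size_t i = 0; i < a.size(); ++i) {
        total += a[i].size();
        if (!a[i].empty()) pq.push({a[i][0], i});
    }

    std::vector<int> c;
    c.reserve(total);
    while (!pq.empty()) {
        const Entry top = pq.top();              // run i with minimal x_i
        pq.pop();
        c.push_back(top.first);                  // c.writeElement(x_i)
        const std::size_t i = top.second;
        if (++pos[i] < a[i].size()) pq.push({a[i][pos[i]], i});   // x_i := a_i.readElement
    }
    assert(c.size() == total);
    return c;
}
```

Highly tuned implementations often use a tournament (loser) tree rather than a general-purpose heap for this step; the heap keeps the sketch short.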

Multiway Merging – Analysis
[Figure: k input runs with internal buffers, merged into one output]
I/Os: read file $a_i$: $\approx |a_i|/B$
write file c: $\approx \sum_{i=1}^{k} |a_i|/B$
overall: $\lessapprox 2\,\frac{\sum_{i=1}^{k} |a_i|}{B}$
constraint: we need k+1 buffer blocks, i.e., $k+1 < M/B$

Sorting by Multiway-Merging
sort $\lceil n/M \rceil$ runs with M elements each   $2n/B$ I/Os
merge M/B runs at a time                      $2n/B$ I/Os
until a single run remains                    $\times \left\lceil \log_{M/B} \frac{n}{M} \right\rceil$ merging phases
-------------------------------------------------
overall: $\text{sort}(n) := \frac{2n}{B}\left(1+\left\lceil \log_{M/B} \frac{n}{M} \right\rceil\right)$ I/Os

make_things_   as_simple_as   _possible_bu   t_no_simpler
  formRuns       formRuns       formRuns       formRuns
__aeghikmnst   __aaeilmpsss   __bbeilopssu   __eilmnoprst
                      multi merge
________aaabbeeeeghiiiiklllmmmnnooppprsssssssttu
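As a worked check, the multiway formula can be evaluated with the PC 2015 example numbers from the lecture ($n = 2^{40}$, $M = 2^{33}$, $B = 2^{22}$ bytes, $2^{-5}$ s per I/O). This assumes the full fan-in $M/B$ is usable; the buffer constraint $k+1 < M/B$ lowers it slightly, which does not change the result here.

```latex
\frac{M}{B} = 2^{11}, \qquad \frac{n}{M} = 2^{7}, \qquad
\left\lceil \log_{M/B} \frac{n}{M} \right\rceil
  = \left\lceil \tfrac{7}{11} \right\rceil = 1

\text{sort}(n) = \frac{2n}{B}\,(1 + 1) = 2^{19} \cdot 2 = 2^{20}\ \text{I/Os},
\qquad
2^{20} \cdot 2^{-5}\,\text{s} = 2^{15}\,\text{s} \approx 9.1\,\text{h}
```

Two passes instead of eight cut the estimated time by a factor of four, from about 36 hours to about 9 hours. More generally, a single merging phase suffices as long as $n/M \le M/B$.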

External Sorting by Multiway-Merging
More than one merging phase?
Not for the hierarchy main memory, hard disk.
Reason: $\overbrace{\frac{M}{B}}^{>2000} > \overbrace{\frac{\text{RAM Euro/bit}}{\text{disk Euro/bit}}}^{\approx 200}$

More on Multiway Mergesort – Parallel Disks
Randomized Striping [2]
Optimal Prefetching [2]
Overlapping of I/O and Computation [7]
[Figure: buffer organization of the merger; labels: read buffers, O(D) prefetch buffers, k+O(D) overlap buffers, merge buffers 1..k, merging, merge, overlap–disk scheduling, 2D write buffers, D blocks]

[Figure: the algorithm engineering cycle]


Analysis

Constant factors matter

Beyond worst case analysis

Practical algorithms might be difficult to analyze (randomization, metaheuristics, ...)

Analysis – Sorting

[Figure: pipeline stages prefetch, overlap, merge, write]

Constant factors matter: (1+o(1)) × lower bound I/Os for parallel (disk) external sorting [2, 5]

Beyond worst case analysis

Practical algorithms might be difficult to analyze. Open: [2] greedy algorithm for parallel disk prefetching [Knuth@48]

Implementation

Sanity check for algorithms!

Challenges

Semantic gaps: abstract algorithm ↔ C++ ... ↔ hardware

Small constant factors: compare with highly tuned competitors

Example: Inner Loops Sample Sort [4]

template <class T>
void findOraclesAndCount(const T* const a,
    const int n, const int k, const T* const s,
    Oracle* const oracle, int* const bucket) {
  for (int i = 0; i < n; i++) {
    int j = 1;                    // root of the implicit splitter tree
    while (j < k)
      j = j*2 + (a[i] > s[j]);    // left child 2j, right child 2j+1
    int b = j-k;                  // leaf position -> bucket number
    bucket[b]++;
    oracle[i] = b;
  }
}

[Figure: implicit search tree over seven splitters. Splitter array indices 1..7 hold s4, s2, s6, s1, s3, s5, s7; each decision multiplies the index by 2 and adds 1 on ">", ending in one of the buckets.]

Example: Inner Loops Sample Sort [4]

template <class T>
void findOraclesAndCountUnrolled([...]) {
  for (int i = 0; i < n; i++) {
    int j = 1;
    j = j*2 + (a[i] > s[j]);    // the while loop written out: four tree levels
    j = j*2 + (a[i] > s[j]);
    j = j*2 + (a[i] > s[j]);
    j = j*2 + (a[i] > s[j]);
    int b = j-k;
    bucket[b]++;
    oracle[i] = b;
  }
}

Example: Inner Loops Sample Sort [4]

template <class T>
void findOraclesAndCountUnrolled2([...]) {
  for (int i = n & 1; i < n; i+=2) {    // two elements per iteration; for odd n, a[0] is not handled here
    int j0 = 1; int j1 = 1;
    T ai0 = a[i]; T ai1 = a[i+1];
    j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
    j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
    j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
    j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
    int b0 = j0-k; int b1 = j1-k;
    bucket[b0]++; bucket[b1]++;
    oracle[i] = b0; oracle[i+1] = b1;
  }
}

Experiments

central for AE in science

reproducibility, careful comparisons, careful preparation of evidence

also important in applications – just more informal

careful planning

careful interpretation

close AE cycle fast

Example: Distributed Mem. Parallel External Sorting [7]

[Plot: running time in seconds (axis 0 to 4000) versus number of nodes (1, 2, 4, ..., 64), sorting 100 GiB per node. Curves: worst case input; worst case input, randomized; random input; random input, randomized.]

Experiments

sometimes a good surrogate for analysis

too much rather than too little output data

reproducibility (10 years!)

software engineering

Example, Parallel External Sorting [7]

[Plot: running time in seconds (axis 0 to 4000) versus number of nodes (1, 2, 4, ..., 64), sorting 100 GiB per node. Curves: worst case input; worst case input, randomized; random input; random input, randomized.]

Algorithm Libraries — Challenges

software engineering, e.g. CGAL [www.cgal.org]

standardization, e.g. java.util, C++ STL and BOOST

performance ↔ generality ↔ simplicity

applications are a priori unknown

result checking, verification

[Figure: MCSTL and STXXL logos, and the STXXL layer architecture, top to bottom:

Applications
STL-user layer. Containers: vector, stack, set, priority_queue, map. Algorithms: sort, for_each, merge
Streaming layer: pipelined sorting, zero-I/O scanning
Block management layer: typed block, block manager, buffered streams, block prefetcher, buffered block writer
Asynchronous I/O primitives layer: files, I/O requests, disk queues, completion handlers
Operating System]

Example: External Sorting [7, 16]

[Figure: the STXXL layer architecture, top to bottom:

Applications
STL-user layer. Containers: vector, stack, set, priority_queue, map. Algorithms: sort, for_each, merge
Streaming layer: pipelined sorting, zero-I/O scanning
Block management layer: typed block, block manager, buffered streams, block prefetcher, buffered block writer
Asynchronous I/O primitives layer: files, I/O requests, disk queues, completion handlers
Operating System: Linux, Windows, Mac, ...]

Problem Instances

Benchmark instances for NP-hard problems

TSP

Steiner-Tree

SAT

set covering

graph partitioning [18]

...

have proved essential for the development of practical algorithms

Strange: far fewer real-world instances exist for polynomial problems (MST, shortest path, max flow, matching, ...)

Example: Sorting Benchmark (Indy) [5, 12]

100 byte records, 10 byte random keys, with file I/O

Category     data volume   performance          improvement
GraySort     100 TB        564 GB / min         17×
MinuteSort   955 GB        955 GB / min         > 10×
JouleSort    1 000 GB      13 400 Recs/Joule    4×
JouleSort    100 GB        35 500 Recs/Joule    3×
JouleSort    10 GB         34 300 Recs/Joule    3×

Also: PennySort

GraySort: inplace multiway mergesort, exact splitting [5]

[Figure labels: Xeon, Xeon; 16 GB RAM; 240 GB; Infiniband switch, 400 MB/s per node all-to-all]

JouleSort [12]

Intel Atom N330

4 GB RAM

4×256 GB SSD (SuperTalent)

Algorithm similar to GraySort

More Sorting Work in my Group

Model multiway merge s. sample s quicks. radixs.

GPU [19] [19]

distributed memory [5] [20]

cache [8] [4] [21]

parallel disks [7] [22]

branch mispredictions [4] [23]

parallel string s. [24] [25] [25] [25]

massively parallel [14] [14]

Sorting — Overview

You think you understand quicksort?

Avoiding branch mispredictions: Super Scalar Sample Sort

(Parallel disk) external sorting

Quicksort

Function quickSort(s : Sequence of Element) : Sequence of Element
  if |s| ≤ 1 then return s           // base case
  pick p ∈ s uniformly at random     // pivot key
  a := ⟨e ∈ s : e < p⟩
  b := ⟨e ∈ s : e = p⟩
  c := ⟨e ∈ s : e > p⟩
  return concatenate(quickSort(a), b, quickSort(c))
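The pseudocode above translates almost line by line into C++. The following is only a sketch for `int` keys: the function name, the use of `std::vector`, and the externally supplied `std::mt19937` generator are assumptions made for illustration, not part of the lecture material.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Out-of-place quicksort mirroring the three-way split into a, b, c.
std::vector<int> quickSort(const std::vector<int>& s, std::mt19937& rng) {
  if (s.size() <= 1) return s;                        // base case
  std::uniform_int_distribution<std::size_t> pick(0, s.size() - 1);
  const int p = s[pick(rng)];                         // pivot key, uniformly at random
  std::vector<int> a, b, c;
  for (int e : s) {
    if (e < p) a.push_back(e);
    else if (e > p) c.push_back(e);
    else b.push_back(e);
  }
  assert(!b.empty());                                 // the pivot itself always lands in b
  std::vector<int> result = quickSort(a, rng);        // sorted smaller keys
  result.insert(result.end(), b.begin(), b.end());    // keys equal to the pivot
  const std::vector<int> right = quickSort(c, rng);   // sorted larger keys
  result.insert(result.end(), right.begin(), right.end());
  return result;
}
```

Because b is never empty, both recursive calls work on strictly shorter sequences, so the recursion terminates even with many equal keys. The copying into a, b, and c is what the engineered array version avoids.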

Engineering Quicksort

array

2-way-Comparisons

sentinels for inner loop

inplace swaps

Recursion on smaller subproblems → O(log n) additional space

break recursion for small (20–100) inputs, insertion sort (not one big insertion sort)

Procedure qSort(a : Array of Element; ℓ, r : ℕ)    // Sort a[ℓ..r]
  while r − ℓ ≥ n0 do                               // Use divide-and-conquer
    j := pickPivotPos(a, ℓ, r)
    swap(a[ℓ], a[j])                                // Helps to establish the invariant
    p := a[ℓ]
    i := ℓ; j := r
    repeat                                          // a: ℓ  i→  ←j  r
      while a[i] < p do i++                         // Scan over elements (A)
      while a[j] > p do j−−                         // on the correct side (B)
      if i ≤ j then swap(a[i], a[j]); i++; j−−
    until i > j                                     // Done partitioning
    if i < (ℓ+r)/2 then qSort(a, ℓ, j); ℓ := i      // recurse on the smaller part,
    else qSort(a, i, r); r := j                     // iterate on the larger one
  insertionSort(a[ℓ..r])                            // faster for small r − ℓ
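A possible C++ rendering of the procedure above, given as a sketch rather than the tuned implementation measured in the lecture. The cutoff `kCutoff`, the random pivot rule, the fixed seed, and the wrapper `sortArray` are assumptions for illustration; after partitioning, the loop continues on a[i..r] or a[ℓ..j], whichever part was not handed to the recursive call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

constexpr std::ptrdiff_t kCutoff = 32;  // assumed n0; the slides suggest 20-100

// Insertion sort on a[lo..hi], both bounds inclusive.
template <class T>
void insertionSort(T* a, std::ptrdiff_t lo, std::ptrdiff_t hi) {
  for (std::ptrdiff_t i = lo + 1; i <= hi; ++i) {
    T x = a[i];
    std::ptrdiff_t j = i - 1;
    while (j >= lo && x < a[j]) { a[j + 1] = a[j]; --j; }
    a[j + 1] = x;
  }
}

template <class T>
void qSort(T* a, std::ptrdiff_t lo, std::ptrdiff_t hi, std::mt19937& rng) {
  while (hi - lo >= kCutoff) {
    std::uniform_int_distribution<std::ptrdiff_t> pick(lo, hi);
    std::swap(a[lo], a[pick(rng)]);         // pivot to the front: acts as a sentinel
    const T p = a[lo];
    std::ptrdiff_t i = lo, j = hi;
    do {
      while (a[i] < p) ++i;                 // scan (A)
      while (p < a[j]) --j;                 // scan (B)
      if (i <= j) { std::swap(a[i], a[j]); ++i; --j; }
    } while (i <= j);
    assert(j < i);                          // a[lo..j] <= p <= a[i..hi]
    if (i < lo + (hi - lo) / 2) { qSort(a, lo, j, rng); lo = i; }  // smaller part first
    else                        { qSort(a, i, hi, rng); hi = j; }
  }
  insertionSort(a, lo, hi);                 // small remainder
}

template <class T>
void sortArray(std::vector<T>& v) {
  std::mt19937 rng(12345);                  // fixed seed: an assumption for repeatable runs
  qSort(v.data(), 0, static_cast<std::ptrdiff_t>(v.size()) - 1, rng);
}
```

Recursing only on the smaller part keeps the recursion depth logarithmic, which is where the O(log n) additional space comes from.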

Picking Pivots Painstakingly — Theory

probabilistically: expected 1.4 n log n element comparisons

median of three: expected 1.2 n log n element comparisons

perfect: → n log n element comparisons (approximate using large samples)

Practice

3GHz Pentium 4 Prescott, g++

Picking Pivots Painstakingly — Instructions

[Plot: instructions / (n lg n), the "instructions constant" (y-axis 6 to 12), versus n (x-axis ticks 10 to 26) for random pivot, median of 3, exact median, skewed pivot n/10, and skewed pivot n/11.]

Picking Pivots Painstakingly — Time

[Plot: running time / (n lg n), the "nanosecs constant" (y-axis 6.8 to 8.4), versus n (x-axis ticks 10 to 26) for random pivot, median of 3, exact median, skewed pivot n/10, and skewed pivot n/11.]

Picking Pivots Painstakingly — Branch Misses

[Plot: branch misses / (n lg n), the "branch misses constant" (y-axis 0.3 to 0.5), versus n (x-axis ticks 10 to 26) for random pivot, median of 3, exact median, skewed pivot n/10, and skewed pivot n/11.]

Can We Do Better? Previous Work

Integer Keys

+ Can be 2–3 times faster than quicksort

− Naive ones are cache inefficient and slower than quicksort

− Simple ones are distribution dependent.

Cache efficient sorting

k-ary merge sort [Nyberg et al. 94, Arge et al. 04, Ranade et al. 00, Brodal et al. 04]

+ Factor log k fewer cache faults

− Only ≈ 20 % speedup, and only for laaarge inputs

Sample Sort

Function sampleSort(e = ⟨e_1, ..., e_n⟩, k)
  if n/k is “small” then return smallSort(e)
  let S = ⟨S_1, ..., S_{ak−1}⟩ denote a random sample of e
  sort S
  ⟨s_0, s_1, s_2, ..., s_{k−1}, s_k⟩ := ⟨−∞, S_a, S_{2a}, ..., S_{(k−1)a}, ∞⟩
  for i := 1 to n do
    find j ∈ {1, ..., k} such that s_{j−1} < e_i ≤ s_j
    place e_i in bucket b_j
  return concatenate(sampleSort(b_1), ..., sampleSort(b_k))

[Figure: splitters s1, ..., s7 above the buckets; label: binary search?]

Why Sample Sort?

traditionally: parallelizable on coarse grained machines

+ Cache efficient ≈ merge sort

− Binary search not much faster than merging

− complicated memory management

Super Scalar Sample Sort

Binary search → implicit search tree

Eliminate all conditional branches

Exploit instruction parallelism

Cache efficiency comes to bear

“steal” memory management from radix sort

Classifying Elements

t := ⟨s_{k/2}, s_{k/4}, s_{3k/4}, s_{k/8}, s_{3k/8}, s_{5k/8}, s_{7k/8}, ...⟩
for i := 1 to n do
  j := 1
  repeat log k times         // unroll
    j := 2j + (a_i > t_j)
  j := j − k + 1
  |b_j|++
  oracle[i] := j             // oracle

[Figure: implicit search tree over seven splitters. Splitter array indices 1..7 hold s4, s2, s6, s1, s3, s5, s7; each decision multiplies the index by 2 and adds 1 on ">", ending in one of the buckets.]

Now the compiler should:

use predicated instructions

interleave for-loop iterations (unrolling ∨ software pipelining)
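To make the classification step concrete, here is a small self-contained sketch for `int` keys. The helper `buildSplitterTree` and the function name `classify` are hypothetical names introduced for this example; the inner loop is the one from the slides, with buckets numbered 0..k−1 as in the C++ version (j − k) rather than 1..k as in the pseudocode.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lays out k-1 sorted splitters as an implicit binary search tree in t[1..k-1]
// (t[0] stays unused), i.e. t = <s_{k/2}, s_{k/4}, s_{3k/4}, ...>.
std::vector<int> buildSplitterTree(const std::vector<int>& sorted) {
  const int k = static_cast<int>(sorted.size()) + 1;
  assert(k >= 2 && (k & (k - 1)) == 0);      // k must be a power of two
  std::vector<int> t(k);
  struct Node { int j, lo, hi; };            // tree node j covers splitters lo+1..hi-1
  std::vector<Node> todo{{1, 0, k}};
  while (!todo.empty()) {
    const Node nd = todo.back();
    todo.pop_back();
    const int mid = (nd.lo + nd.hi) / 2;     // 1-based index of the middle splitter
    t[nd.j] = sorted[mid - 1];
    if (2 * nd.j < k) {                      // inner node: descend to both children
      todo.push_back({2 * nd.j, nd.lo, mid});
      todo.push_back({2 * nd.j + 1, mid, nd.hi});
    }
  }
  return t;
}

// Computes the oracle (bucket number in 0..k-1 per element) and the bucket counts.
std::vector<int> classify(const std::vector<int>& a, const std::vector<int>& t,
                          std::vector<int>& bucket) {
  const int k = static_cast<int>(t.size());
  std::vector<int> oracle(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    int j = 1;
    while (j < k) j = 2 * j + (a[i] > t[j]); // comparison result used as an integer
    oracle[i] = j - k;
    ++bucket[oracle[i]];
  }
  return oracle;
}
```

With splitters 10, 20, 30 the tree array is t[1..3] = 20, 10, 30, and a key equal to a splitter goes to the bucket on its left, matching s_{j−1} < e_i ≤ s_j. Whether the compiler really emits branch-free code for the inner loop depends on compiler and target.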

template <class T>
void findOraclesAndCount(const T* const a,
    const int n, const int k, const T* const s,
    Oracle* const oracle, int* const bucket) {
  for (int i = 0; i < n; i++) {
    int j = 1;                    // root of the implicit splitter tree
    while (j < k)
      j = j*2 + (a[i] > s[j]);    // left child 2j, right child 2j+1
    int b = j-k;                  // leaf position -> bucket number
    bucket[b]++;
    oracle[i] = b;
  }
}

Predication

Hardware mechanism that allows instructions to be conditionally executed

Boolean predicate registers (1–64) hold condition codes

predicate registers p are additional inputs of predicated instructions I

At runtime, I is executed if and only if p is true

+ Avoids branch misprediction penalty

+ More flexible instruction scheduling

− Switched off instructions still take time

− Longer opcodes

− Complicated hardware design

Example (IA-64)

Translation of: if (r1 > r2) r3 := r3 + 4

With a conditional branch:        Via predication:
  cmp.gt p6,p7=r1,r2                cmp.gt p6,p7=r1,r2
  (p7) br.cond .label               (p6) add r3=4,r3
  add r3=4,r3
.label:
  (...)

Other Current Architectures: Conditional moves only
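At the source level, the same statement can be written so that the comparison result is data rather than control flow. The two functions below are a sketch with hypothetical names; whether a compiler actually emits a predicated add or a conditional move for them depends on the compiler and target, so this shows the idiom, not a guarantee.

```cpp
#include <cstdint>

// if (r1 > r2) r3 := r3 + 4, with the comparison used as an integer (0 or 1).
std::int64_t addIfGreater(std::int64_t r1, std::int64_t r2, std::int64_t r3) {
  return r3 + 4 * static_cast<std::int64_t>(r1 > r2);
}

// Same effect written as a select between two already computed values.
std::int64_t addIfGreaterSelect(std::int64_t r1, std::int64_t r2, std::int64_t r3) {
  const std::int64_t taken = r3 + 4;
  return (r1 > r2) ? taken : r3;
}
```

The first form is the one used in the classification loop, where `(a[i] > s[j])` is added to the tree index.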

Unrolling (k = 16)

template <class T>
void findOraclesAndCountUnrolled([...]) {
  for (int i = 0; i < n; i++) {
    int j = 1;
    j = j*2 + (a[i] > s[j]);    // log k = 4 levels for k = 16
    j = j*2 + (a[i] > s[j]);
    j = j*2 + (a[i] > s[j]);
    j = j*2 + (a[i] > s[j]);
    int b = j-k;
    bucket[b]++;
    oracle[i] = b;
  }
}

More Unrolling (k = 16, n even)

template <class T>
void findOraclesAndCountUnrolled2([...]) {
  for (int i = n & 1; i < n; i+=2) {    // two independent elements per iteration
    int j0 = 1; int j1 = 1;
    T ai0 = a[i]; T ai1 = a[i+1];
    j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
    j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
    j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
    j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
    int b0 = j0-k; int b1 = j1-k;
    bucket[b0]++; bucket[b1]++;
    oracle[i] = b0; oracle[i+1] = b1;
  }
}

Distributing Elements

for i := 1 to n do a′[B[oracle[i]]++] := a_i

[Figure: element a_i moves from a to a′; the oracle entry o refers to an entry of B, which refers to the write position in a′.]

Why Oracles?

simplifies memory management

no overflow tests or re-copying

simplifies software pipelining

separates computation and memory access aspects

small (n bytes)

sequential, predictable memory access

can be hidden using prefetching / write buffering

Distributing Elements

template <class T> void distribute(
    const T* const a0, T* const a1,
    const int n, const int k,
    const Oracle* const oracle, int* const bucket) {
  for (int i = 0, sum = 0; i <= k; i++) {    // counts -> start offsets; touches bucket[0..k]
    int t = bucket[i]; bucket[i] = sum; sum += t;
  }
  for (int i = 0; i < n; i++)
    a1[bucket[oracle[i]]++] = a0[i];         // move each element to its bucket
}
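The distribution step can be tried in isolation. The sketch below restates it for `int` keys with `std::vector`; the by-value `bucket` parameter and the returned output array are choices made for this example only and differ from the in/out pointer interface on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// bucket holds the k bucket sizes on entry; oracle[i] is the bucket of a0[i].
std::vector<int> distribute(const std::vector<int>& a0, const std::vector<int>& oracle,
                            std::vector<int> bucket) {
  assert(a0.size() == oracle.size());
  int sum = 0;
  for (int& b : bucket) {               // exclusive prefix sum: size -> start offset
    const int t = b;
    b = sum;
    sum += t;
  }
  assert(sum == static_cast<int>(a0.size()));
  std::vector<int> a1(a0.size());
  for (std::size_t i = 0; i < a0.size(); ++i)
    a1[bucket[oracle[i]]++] = a0[i];    // sequential read, one write stream per bucket
  return a1;
}
```

Since the bucket sizes are known before any element moves, no overflow tests or re-copying are needed, which is the point made under "Why Oracles?".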

Experiments: 1.4 GHz Itanium 2

restrict keyword from ANSI/ISO C99 to indicate nonaliasing

Intel’s C++ compiler v8.0 uses predicated instructions automatically

Profiling gives 9% speedup

k = 256 splitters

Use std::sort from g++ (n ≤ 1000)!

insertion sort for n ≤ 100

Random 32 bit integers in [0, 10^9]

Comparison with Quicksort

[Plot: time / (n log n) in ns (y-axis 0 to 8) versus n (4096 to 2^24) for GCC STL qsort and sss-sort.]

Breakdown of Execution Time

[Plot: time / (n log n) in ns (y-axis 0 to 6) versus n (4096 to 2^24). Curves: Total; findBuckets + distribute + smallSort; findBuckets + distribute; findBuckets.]

A More Detailed View

                              dynamic   dynamic   IPC       IPC
                              instr.    cycles    small n   n = 2^25
findBuckets, 1× outer loop    63        11        5.4       4.5
distribute, one element       14        4         3.5       0.8

Comparison with Quicksort Pentium 4

[Plot: time / (n log n) in ns (y-axis 0 to 14) versus n (4096 to 2^24) for Intel STL, GCC STL, and sss-sort.]

Problems: few registers, one condition code only, compiler needs “help”

Breakdown of Execution Time Pentium 4

[Plot: time / (n log n) in ns (y-axis 0 to 8) versus n (4096 to 2^24). Curves: Total; findBuckets + distribute + smallSort; findBuckets + distribute; findBuckets.]

Analysis

                          mem. acc.      branches   data dep.    I/Os            registers   instructions
k-way distribution:
  sss-sort                n log k        O(1)       O(n)         3.5n/B          3×unroll    O(log k)
  quicksort log k lvls.   2n log k       n log k    O(n log k)   2(n/B) log k    4           O(1)
k-way merging:
  memory                  n log k        n log k    O(n log k)   2n/B            7           O(log k)
  register                2n             n log k    O(n log k)   2n/B            k           O(k)
  funnel k′^(log_k′ k)    2n log_k′ k    n log k    O(n log k)   2n/B            2k′+2       O(k′)

Conclusions

sss-sort up to twice as fast as quicksort on Itanium

comparisons ≠ conditional branches

algorithm analysis is not just instructions and caches

More results:

GPU-Sample-Sort is (was) best comparison based sorting algorithm on graphics hardware [Leischner/Osipov/Sanders 2009]

Parallel String Sample-Sorting is best string sorting algorithm [Bingmann/Sanders 2013]

AMS Sort scales to 2^15 PEs [AxtmannBSS SPAA 2015]

Criticism I

Why only random keys?

Answer I

Sample sort hardly depends on input distribution

Criticism I´

What if there are many equal keys?

They all end up in the same bucket

Answer I´

It's not a bug, it's a feature:

s_i = s_{i+1} = ··· = s_j indicates a frequent key!

Set s_i := max{x ∈ Key : x < s_i} (optional: drop s_{i+2}, ..., s_j)

Now bucket i+1 need not be sorted!

Exercise: Explain how to support equality buckets using a single additional comparison per element.

Criticism II

Quicksort is inplace

Answer II

Use hybrid List-Array Representation of sequences

needs $O(\sqrt{kn})$ extra space for k-way sample sort

Criticism II´

But I WANT Arrays for input and output

Answer II´

Inplace conversion

input: easy

output: tricky. Exercise: develop rough idea. Hint: permute blocks. Each permutation consists of a product of cyclic permutations. An inplace cyclic permutation is easy.

Future Work

better small case sorter

handle large buckets

high level fine-tuning, e.g., clever choice of k

modern architectures

almost in-place implementation

multilevel cache-aware or cache-oblivious generalization (oracles help)

More parallel stuff

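External Sorting

n: input size

M: size of the fast memory

B: block size

[Figure: external memory model. ALU and registers exchange single words (1 word) with a freely programmable fast memory of capacity M, which exchanges blocks of B words with the large memory.]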

Procedure externalMerge(a, b, c : File of Element)
  x := a.readElement          // Assume emptyFile.readElement = ∞
  y := b.readElement
  for j := 1 to |a| + |b| do
    if x ≤ y then c.writeElement(x); x := a.readElement
    else c.writeElement(y); y := b.readElement

[Figure: input files a and b with current elements x and y, output file c]
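The merge procedure above can be simulated in memory. In this sketch the "files" are plain vectors and `InputFile` is a hypothetical stand-in for a buffered reader; the largest `int` plays the role of ∞, which is an assumption that works here because the loop runs exactly |a| + |b| times.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a file that is read one element at a time.
class InputFile {
 public:
  explicit InputFile(const std::vector<int>& data) : data_(data) {}
  // Returns "infinity" once the file is exhausted, as the pseudocode assumes.
  int readElement() {
    return pos_ < data_.size() ? data_[pos_++] : std::numeric_limits<int>::max();
  }
 private:
  const std::vector<int>& data_;
  std::size_t pos_ = 0;
};

// Merges two sorted sequences a and b into the returned sequence c.
std::vector<int> externalMerge(const std::vector<int>& a, const std::vector<int>& b) {
  InputFile fa(a), fb(b);
  std::vector<int> c;
  c.reserve(a.size() + b.size());
  int x = fa.readElement();
  int y = fb.readElement();
  for (std::size_t j = 0; j < a.size() + b.size(); ++j) {
    if (x <= y) { c.push_back(x); x = fa.readElement(); }
    else        { c.push_back(y); y = fb.readElement(); }
  }
  return c;
}
```

In a real external implementation each of the three streams would own one buffer block of size B, which is where the 3-block memory requirement comes from.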

External (Binary) Merging – I/O Analysis

Read file a: ⌈|a|/B⌉ ≤ |a|/B + 1.

Read file b: ⌈|b|/B⌉ ≤ |b|/B + 1.

Write file c: ⌈(|a|+|b|)/B⌉ ≤ (|a|+|b|)/B + 1.

In total: $\le 3 + 2\,\frac{|a|+|b|}{B} \approx 2\,\frac{|a|+|b|}{B}$

Condition: we need 3 buffer blocks, i.e., M > 3B.

Run Formation

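Sort input chunks of size M

[Figure: the input file f is read in chunks starting at i = 0, i = M, i = 2M; each chunk of size M is sorted internally and written to the output file t as a run.]

I/Os: ≈ 2n/B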

Sorting by External Binary Merging

[Figure: formRuns sorts each of the four chunks in memory, then runs are merged pairwise:

make_things_ → __aeghikmnst
as_simple_as → __aaeilmpsss
_possible_bu → __bbeilopssu
t_no_simpler → __eilmnoprst

merge → ____aaaeeghiiklmmnpsssst
merge → ____bbeeiillmnoopprssstu

merge → ________aaabbeeeeghiiiiklllmmmnnooppprsssssssttu]

Procedure externalBinaryMergeSort      // I/Os: ≈
  run formation                        // 2n/B
  while more than one run left do      // ⌈log(n/M)⌉ ×
    merge pairs of runs                // 2n/B
  output remaining run                 // Σ: (2n/B)(1 + ⌈log(n/M)⌉)

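Numeric Example: PC 2007

n = 2^38 bytes

M = 2^31 bytes

B = 2^20 bytes

One I/O takes 2^−6 s

Time: $\frac{2n}{B}\left(1 + \left\lceil \log \frac{n}{M} \right\rceil\right) = 2 \cdot 2^{18} \cdot (1+7) \cdot 2^{-6}\,\mathrm{s} = 2^{16}\,\mathrm{s} \approx 18\,\mathrm{h}$

Idea: 8 passes → 2 passes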
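The arithmetic of this example can be checked with a few lines of code. The helpers `ceilLog` and `sortIOs` are hypothetical names for this sketch; `fanIn` is the number of runs merged at a time, so 2 gives binary merging and M/B gives the multiway variant.

```cpp
#include <cassert>
#include <cstdint>

// Smallest p with base^p >= x, i.e. ceil(log_base(x)), in exact integer arithmetic.
int ceilLog(std::uint64_t base, std::uint64_t x) {
  assert(base >= 2);
  int p = 0;
  for (std::uint64_t v = 1; v < x; v *= base) ++p;
  return p;
}

// Block I/Os of merge sort: (2n/B) * (1 + number of merging phases).
std::uint64_t sortIOs(std::uint64_t n, std::uint64_t M, std::uint64_t B,
                      std::uint64_t fanIn) {
  const std::uint64_t runs = (n + M - 1) / M;   // ceil(n/M) initial runs
  const std::uint64_t passes = 1 + static_cast<std::uint64_t>(ceilLog(fanIn, runs));
  return 2 * (n / B) * passes;
}
```

For the 2007 parameters this yields 2^22 I/Os with binary merging (8 passes) and 2^20 I/Os with fan-in M/B = 2^11 (2 passes); at 2^−6 s per I/O that is 2^16 s versus 2^14 s.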

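Numeric Example: PC 2007 → 2014

n = 2^38 → 2^39 bytes

M = 2^31 → 2^33 bytes

B = 2^20 → 2^21 bytes

One I/O takes 2^−6 s

Time: $\frac{2n}{B}\left(1 + \left\lceil \log \frac{n}{M} \right\rceil\right) = 2 \cdot 2^{18} \cdot (1+6) \cdot 2^{-6}\,\mathrm{s} = 7 \cdot 2^{13}\,\mathrm{s} \approx 16\,\mathrm{h}$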

Page 128: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Multiway Merging

Procedure multiwayMerge(a_1, ..., a_k, c : File of Element)
  for i := 1 to k do x_i := a_i.readElement
  for j := 1 to ∑_{i=1}^{k} |a_i| do
    find i ∈ 1..k that minimizes x_i     // no I/Os!, O(log k) time
    c.writeElement(x_i)
    x_i := a_i.readElement

[Figure: k input files, each with one internal buffer block, are merged into the output file c.]
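The procedure multiwayMerge can be sketched in C++ with a priority queue that holds the current element x_i of every run. This is a minimal in-memory sketch: the files are assumed to be `std::vector<int>` objects, and `std::priority_queue` stands in for whatever priority queue an implementation would actually use.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merges k sorted runs into one sorted output sequence.
std::vector<int> multiwayMerge(const std::vector<std::vector<int>>& runs) {
    using Entry = std::pair<int, std::size_t>;  // (current element x_i, run index i)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    std::vector<std::size_t> pos(runs.size(), 0);  // next unread position per run

    for (std::size_t i = 0; i < runs.size(); ++i) {
        if (!runs[i].empty()) {
            pq.push({runs[i][0], i});
            pos[i] = 1;
        }
    }
    std::vector<int> out;
    while (!pq.empty()) {
        const Entry e = pq.top();  // the i that minimizes x_i, O(log k) time
        pq.pop();
        out.push_back(e.first);    // c.writeElement(x_i)
        const std::size_t i = e.second;
        if (pos[i] < runs[i].size()) {
            pq.push({runs[i][pos[i]++], i});  // x_i := a_i.readElement
        }
    }
    return out;
}
```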

Page 129: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Multiway Merging: Analysis

I/Os: reading file $a_i$: $\approx |a_i|/B$.

Writing file c: $\approx \sum_{i=1}^{k} |a_i|/B$

Total: $\lesssim \frac{2\sum_{i=1}^{k}|a_i|}{B}$

Condition: we need k buffer blocks, i.e., $k < M/B$.

Internal work (use a priority queue!): $O\left(\log k \sum_{i=1}^{k} |a_i|\right)$

Page 130: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Sorting by Multiway Merging

Sort ⌈n/M⌉ runs of M elements each: 2n/B I/Os

Merge M/B runs at a time: 2n/B I/Os

until only a single run is left: × $\lceil \log_{M/B}\frac{n}{M} \rceil$ merge phases

Total: $\mathrm{sort}(n) := \frac{2n}{B}\left(1+\left\lceil\log_{M/B}\frac{n}{M}\right\rceil\right)$ I/Os

[Figure: four formRuns steps followed by a single multi merge.]

formRuns:
make_things_ → __aeghikmnst
as_simple_as → __aaeilmpsss
_possible_bu → __bbeilopssu
t_no_simpler → __eilmnoprst

multi merge: ________aaabbeeeeghiiiiklllmmmnnooppprsssssssttu
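The effect of the merge fan-in on the I/O count can be checked with a few lines of integer arithmetic. The function names `mergePhases` and `sortIOs` are assumptions made for this sketch; it also assumes that B divides n and counts I/Os in blocks.

```cpp
#include <cassert>
#include <cstdint>

// Smallest p with fanIn^p >= runs, i.e. the ceiling of log_fanIn(runs).
unsigned mergePhases(std::uint64_t runs, std::uint64_t fanIn) {
    assert(fanIn >= 2);
    unsigned phases = 0;
    std::uint64_t covered = 1;  // number of runs that `phases` phases can merge
    while (covered < runs) {
        ++phases;
        if (covered > runs / fanIn) break;  // one more phase reaches `runs`; avoids overflow
        covered *= fanIn;
    }
    return phases;
}

// Block I/Os of merge sort: (2n/B) * (1 + number of merge phases).
std::uint64_t sortIOs(std::uint64_t n, std::uint64_t M, std::uint64_t B, std::uint64_t fanIn) {
    const std::uint64_t runs = (n + M - 1) / M;  // ceil(n/M)
    return 2 * (n / B) * (1 + mergePhases(runs, fanIn));
}
```

With the PC 2007 numbers (n = 2^38, M = 2^31, B = 2^20), fan-in 2 gives 2^22 block I/Os and fan-in M/B = 2^11 gives 2^20. At 2^-6 s per I/O that is 2^16 s (about 18 h) versus 2^14 s (about 4.6 h), which matches the step from 8 passes to 2 passes.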

Page 131: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Sorting by Multiway Merging

Internal work:

$O\Big(\underbrace{n\log M}_{\text{run formation}} + \underbrace{n\log\frac{M}{B}}_{\text{PQ access per phase}}\cdot\underbrace{\left\lceil\log_{M/B}\frac{n}{M}\right\rceil}_{\text{phases}}\Big) = O(n\log n)$

More than one merge phase?

Not for the hierarchy main memory / hard disk.

Reason: $\underbrace{\frac{M}{B}}_{>1000} > \underbrace{\frac{\text{RAM Euro/bit}}{\text{disk Euro/bit}}}_{<1000}$

4-2014: $\frac{8\,\mathrm{GB}}{4\,\mathrm{MB}} = 2048 > \frac{67/8\,\mathrm{GB}}{67/2\,\mathrm{TB}} \approx 250$

Page 132: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

More on External Sorting

Lower bound: $\approx \frac{2(?)\,n}{B}\left(1+\left\lceil\log_{M/B}\frac{n}{M}\right\rceil\right)$ I/Os [Aggarwal Vitter 1988]

Upper bound: $\approx \frac{2n}{DB}\left(1+\left\lceil\log_{M/B}\frac{n}{M}\right\rceil\right)$ I/Os (expected) for D parallel disks [Hutchinson Sanders Vitter 2005, Dementiev Sanders 2003]

Open question: deterministic?

Page 133: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Sorting with Parallel Disks

I/O step := access to a single physical block per disk

[Figure: memory of size M connected to D independent disks 1, 2, ..., D with block size B.] [Vitter Shriver 94]

Theory: Balance Sort [Nodine Vitter 93]. Deterministic, complex, asymptotically optimal O(·)

Multiway merging: “usually” factor 10? fewer I/Os. Not asymptotically optimal. 42%

Basic approach: improve multiway merging

Page 134: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Striping

[Figure: disks 1, 2, ..., D act as one emulated disk; a logical block consists of D physical blocks.]

That takes care of run formation and writing the output.

But what about merging?

Page 135: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Naive Striping

Run single-disk merge sort on the striped logical disk:

$\frac{2n}{DB}\left(1+\left\lceil\log_{M/(DB)}\frac{n}{M}\right\rceil\right)$ I/Os

Theory: a factor $\Theta(\log(M/B))$ worse when $D \approx M/B$

Practice: 2 → 3 passes in some cases

Page 136: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Prediction [Folklore, Knuth]

The smallest element of each block triggers the fetch.

Prefetch buffers allow parallel access to the next blocks.

[Figure: the prediction sequence controls which blocks are loaded into the prefetch buffers, which in turn feed the internal buffers.]

Page 137: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Warmup: Multihead Model [Aggarwal Vitter 88]

[Figure: memory of size M and a disk with heads 1, 2, ..., D and block size B; the prediction sequence controls the prefetch buffers, which feed the internal buffers.]

D prefetch buffers yield an optimal algorithm:

$\mathrm{sort}(n) := \frac{2n}{DB}\left(1+\left\lceil\log_{M/B}\frac{n}{M}\right\rceil\right)$ I/Os

Page 138: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Bigger Prefetch Buffer

[Figure: the prediction sequence fills prefetch buffers 1, 2, ..., k, which feed the internal buffers.]

Dk prefetch buffers: good deterministic performance

O(D) would yield an optimal algorithm. Possible?

Page 139: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Randomized Cycling [Vitter Hutchinson 01]

Block i of stripe j goes to disk $\pi_j(i)$ for a random permutation $\pi_j$.

[Figure: prediction sequence, prefetch buffers, and internal buffers as before.]

Good for naive prefetching and $\Omega(D \log D)$ buffers
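The block placement rule stated above can be sketched as follows. The class name `RandomizedCycling`, the lazily generated permutations, and the use of `std::mt19937` are assumptions made for illustration only; they are not taken from the cited paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Maps block i of stripe j to disk pi_j(i), with an independent random
// permutation pi_j of 0..D-1 for every stripe.
class RandomizedCycling {
    std::size_t D_;
    std::mt19937 rng_;
    std::vector<std::vector<std::size_t>> perms_;  // perms_[j] is pi_j

public:
    RandomizedCycling(std::size_t D, std::uint32_t seed) : D_(D), rng_(seed) {
        assert(D > 0);
    }

    std::size_t diskOf(std::size_t stripe, std::size_t block) {
        assert(block < D_);
        while (perms_.size() <= stripe) {  // create missing permutations on demand
            std::vector<std::size_t> p(D_);
            std::iota(p.begin(), p.end(), std::size_t{0});
            std::shuffle(p.begin(), p.end(), rng_);
            perms_.push_back(std::move(p));
        }
        return perms_[stripe][block];
    }
};
```

Because every stripe uses a permutation, the D blocks of one stripe always land on D different disks, so writing a full stripe still takes a single parallel I/O step.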

Page 140: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Buffered Writing [S-Egner-Korst SODA 00, Hutchinson-S-Vitter ESA 01]

[Figure: a randomized mapping distributes the sequence Σ of blocks over the queues 1, 2, 3, ..., D, one per disk; figure label: W/D.]

Write whenever one of the W buffers is free; otherwise, output one block from each nonempty queue.

Theorem: Buffered writing is optimal.

But how good is optimal?

Theorem: Randomized cycling achieves efficiency $1 - O(D/W)$.

Analysis: negative association of random variables, application of queueing theory to a “throttled” algorithm.
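The buffered writing rule can be simulated in a few lines. This is a sketch under stated assumptions: the mapping of blocks to disks is passed in as the vector `diskOf` (in the scheme above it would come from the randomized mapping), and the function name `bufferedWritingSteps` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the output steps needed to write a block sequence with W buffers
// and D disks. diskOf[b] is the disk (0..D-1) that block b is mapped to.
std::size_t bufferedWritingSteps(const std::vector<std::size_t>& diskOf,
                                 std::size_t D, std::size_t W) {
    assert(D > 0 && W > 0);
    std::vector<std::size_t> queueLen(D, 0);  // one queue per disk
    std::size_t buffered = 0;                 // blocks currently held in buffers
    std::size_t steps = 0;

    // One output step: every nonempty queue writes one block to its disk.
    auto outputStep = [&]() {
        for (std::size_t& q : queueLen) {
            if (q > 0) { --q; --buffered; }
        }
        ++steps;
    };

    for (std::size_t d : diskOf) {
        assert(d < D);
        if (buffered == W) outputStep();  // no free buffer: write first
        ++queueLen[d];
        ++buffered;
    }
    while (buffered > 0) outputStep();    // flush what is left
    return steps;
}
```

For example, with D = 2 and W = 2 the sequence of disks 0, 1, 0, 1 needs 2 output steps, while 0, 0, 0, 0 needs 4 because only one disk is ever busy.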

Page 141: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Optimal Offline Prefetching

Theorem: For buffer size W:

∃ (offline) prefetching schedule for Σ with T input steps

⇔ ∃ (online) write schedule for $\Sigma^R$ with T output steps

[Figure: Σ = abcdefghijklmnopqr and its reversal Σ^R with a buffer in between; output steps 1 to 8 (order of writing) correspond to input steps 8 to 1 (order of reading).]

[Slides 142 to 149 repeat the theorem and animate the figure: the write schedule for Σ^R is filled in one output step at a time, from output step 1 to 8.]

Page 150: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Synthesis

Multiway merging

+ prediction [60s folklore]

+ optimal (randomized) writing [S-Egner-Korst SODA 2000]

+ randomized cycling [Vitter Hutchinson 2001]

+ optimal prefetching [Hutchinson-S-Vitter ESA 2002]

→ (1+o(1))·sort(n) I/Os

“Answers” a question in [Knuth 98]; difficulty 48 on a 1..50 scale.

Page 151: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

We are not done yet!

Internal work

Overlapping I/O and computation

Reasonable hardware

Interfacing with the Operating System

Parameter Tuning

Software engineering

Pipelining

Page 152: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Key Sorting

The I/O bandwidth of our machine is about 1/3 of its main memory bandwidth.

If key size ≪ element size: sort key-pointer pairs to save memory bandwidth during run formation.

[Figure: extract the keys 3 1 4 1 5 from the elements a to e, sort the key-pointer pairs to 1 1 3 4 5, then permute the elements accordingly.]
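The extract, sort, permute steps can be sketched as follows. The `Element` layout (a 32-bit key plus payload, 128 bytes in total) and the function name `keySort` are assumptions chosen for this illustration; an index plays the role of the pointer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical 128-byte element with a small key.
struct Element {
    std::uint32_t key;
    char payload[124];
};

std::vector<Element> keySort(const std::vector<Element>& elems) {
    // extract: build small (key, index) pairs instead of moving whole elements
    std::vector<std::pair<std::uint32_t, std::size_t>> keys(elems.size());
    for (std::size_t i = 0; i < elems.size(); ++i) keys[i] = {elems[i].key, i};

    // sort: only the pairs are moved; equal keys are ordered by index
    std::sort(keys.begin(), keys.end());

    // permute: every full element is copied exactly once
    std::vector<Element> out;
    out.reserve(elems.size());
    for (const auto& kp : keys) out.push_back(elems[kp.second]);
    return out;
}
```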

Page 153: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Tournament Trees for Multiway Merging

Assume $k = 2^K$ runs

K-level complete binary tree

Leaves: smallest current element of each run

Internal nodes: loser of a competition for being smallest

Above root: global winner

[Figure: loser tree over the leaves 6 2 7 9 1 4 7 4 with global winner 1; after deleteMin + insertNext the leaf 1 is replaced by 3, the root stores the loser 3, and the new global winner is 2.]

Page 154: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Why Tournament Trees?

Exactly log k element comparisons

Implicit layout in an array → simple index arithmetic (shifts)

Predictable load instructions and index computations

(Unrollable) inner loop:

for (int i = (winnerIndex + kReg) >> 1; i > 0; i >>= 1) {
    currentPos = entry + i;              // loser stored at tree node i
    currentKey = currentPos->key;
    if (currentKey < winnerKey) {        // stored loser beats the current winner: swap
        currentIndex = currentPos->index;
        currentPos->key = winnerKey;
        currentPos->index = winnerIndex;
        winnerKey = currentKey;
        winnerIndex = currentIndex;
    }
}
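For context, here is a minimal self-contained loser tree built around the same leaf-to-root loop. It is a sketch, not the lecture's implementation: the class name `LoserTree`, the in-memory runs, and the use of the largest `int` as a sentinel for exhausted runs are all assumptions, so real keys must be smaller than that sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Sentinel key for exhausted runs (assumption: no real key equals it).
const int kExhausted = std::numeric_limits<int>::max();

class LoserTree {
    struct Entry { int key; std::size_t index; };

    std::vector<std::vector<int>> runs_;
    std::vector<std::size_t> pos_;  // next unread position per run
    std::size_t k_;                 // number of runs, a power of two
    std::vector<Entry> entry_;      // entry_[1..k-1]: losers at internal nodes
    Entry winner_;                  // global winner, stored "above the root"

    int next(std::size_t run) {
        if (pos_[run] < runs_[run].size()) return runs_[run][pos_[run]++];
        return kExhausted;
    }

    // Plays the initial tournament below `node` and returns its winner.
    Entry build(std::size_t node) {
        if (node >= k_) return Entry{next(node - k_), node - k_};  // leaf
        Entry left = build(2 * node);
        Entry right = build(2 * node + 1);
        if (right.key < left.key) std::swap(left, right);
        entry_[node] = right;  // the loser stays at the node
        return left;           // the winner moves up
    }

public:
    explicit LoserTree(std::vector<std::vector<int>> runs)
        : runs_(std::move(runs)), pos_(runs_.size(), 0),
          k_(runs_.size()), entry_(runs_.size()) {
        assert(k_ >= 1 && (k_ & (k_ - 1)) == 0);  // k must be a power of two
        winner_ = build(1);
    }

    bool empty() const { return winner_.key == kExhausted; }

    // deleteMin + insertNext: replace the winner by the next element of its
    // run and replay the matches on the path from that leaf to the root.
    int popMin() {
        assert(!empty());
        const int result = winner_.key;
        winner_.key = next(winner_.index);
        for (std::size_t i = (winner_.index + k_) >> 1; i > 0; i >>= 1) {
            if (entry_[i].key < winner_.key) std::swap(entry_[i], winner_);
        }
        return result;
    }
};
```

Each `popMin` performs one comparison per tree level, i.e. log k comparisons, and touches only the nodes on one leaf-to-root path.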

Page 155: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Overlapping I/O and Computation

One thread for each disk (or asynchronous I/O)

Possibly additional threads

Blocks filled with elements are passed by reference between different buffers

Page 156: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Overlapping During Run Formation

First post read requests for runs 1 and 2.

Thread A: loop { wait-read i; sort i; post-write i }

Thread B: loop { wait-write i; post-read i+2 }

[Figure: timelines of read, sort, and write for runs 1, 2, 3, 4, ..., k−1, k with the control flow of threads A and B, once for the compute-bound case and once for the I/O-bound case.]

Page 157: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Overlapping During Merging

Bad example: identical runs of the form

$1^{B-1}2\;\;3^{B-1}4\;\;5^{B-1}6\;\cdots$

$1^{B-1}2\;\;3^{B-1}4\;\;5^{B-1}6\;\cdots$

Page 158: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Overlapping During Merging

[Figure: data flow during merging. Labels: read buffers, O(D) prefetch buffers, k+O(D) overlap buffers, overlap-disk scheduling, merge buffers 1..k, merge, D blocks, 2D write buffers.]

I/O threads: writing has priority over reading.

Page 159: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

I/O-bound case: the I/O thread never blocks

y = number of elements merged during one I/O step.

I/O bound: $y > DB/2$ and $y \le DB$

[Figure: state diagram. Horizontal axis w: number of elements in output buffers, with marks DB−y, DB, 2DB−y, 2DB. Vertical axis r: number of elements in overlap+merge buffers, with marks kB+DB, kB+DB+y, kB+2DB, kB+2DB+y, kB+3DB. Transitions: fetch, output; region: blocking.]

Page 160: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Compute-bound case: the merging thread never blocks

[Figure: state diagram. Horizontal axis w: number of elements in output buffers, with marks 2DB−y, DB, 2DB. Vertical axis r: number of elements in overlap+merge buffers, with marks kB, kB+y, kB+2y, kB+DB, kB+2DB, kB+3DB. Transitions: fetch, output; regions: block I/O, block merging.]

Page 161: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Hardware (mid 2002)

Linux

2 × 2 GHz Xeon × 2 threads

Several 66 MHz PCI buses (SuperMicro P4DPE3)

Several fast IDE controllers (4 × Promise Ultra100 TX2)

Many fast IDE disks (8 × IBM IC35L080AVVA07)

[Figure: block diagram of the machine. Labels: 2x Xeon, 4 threads, Intel E7500 chipset, 1 GB DDR RAM, PCI buses, controller, channels, 2x64x66 Mb/s, 400x64 Mb/s, 4x2x100 MB/s, 128, 8x80 GB, 8x45 MB/s.]

Cost-effective I/O bandwidth (real 360 MB/s for ≈ 3000 €)

Page 162: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Hardware (end 2009), estimated

Linux

2 × 2.4 GHz Xeon E5530 × 4 cores × 2 threads

PCIe x8 SATA controller

16–24 1.5 TByte SATA disks

(8× IBM IC35L080AVVA07)

24 GByte RAM

Cost-effective I/O bandwidth (real 2 GB/s for ≈ 6000 €)

Page 163: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Hardware 2015, estimated

≈ 3000 Euro for 32 SATA SSDs with 512 GB each, at 93 Euro each:

16 TB capacity, and

16 GB/s read bandwidth?

64 GB RAM: 800 Euro

2 PCI Express 3.0 controllers (8 lanes each)

8 SATA channels on each controller

2 × 6 cores Intel Xeon E5-2603v3 (at 200 Euro each)

Page 164: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Software Interface

Goals: efficient + simple + compatible

STXXL layers, from top to bottom:

Applications

STL-user layer. Containers: vector, stack, set, priority_queue, map. Algorithms: sort, for_each, merge

Streaming layer: pipelined sorting, zero-I/O scanning

Block management layer: typed block, block manager, buffered streams, block prefetcher, buffered block writer

Asynchronous I/O primitives layer: files, I/O requests, disk queues, completion handlers

Operating System

Page 165: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Default Measurement Parameters

t := number of available buffer blocks

Input size: 16 GByte

Element size: 128 byte

Keys: random 32-bit integers

Run size: 256 MByte

Block size B: 2 MByte

Compiler: g++ 3.2 -O6

Write buffers: $w = \max(t/4, 2D)$

Prefetch buffers: $2D + \frac{3}{10}(t - w - 2D)$

[Figure: block diagram of the mid-2002 machine, as on the hardware slide.]

Page 166: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Element sizes (16 GByte, 8 disks)

[Plot: time [s] (0 to 400) versus element size [byte] (16 to 1024); components: run formation, merging, I/O wait in merge phase, I/O wait in run formation phase.]

Parallel disks → bandwidth “for free” → internal work and overlapping are relevant

Page 167: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Earlier Academic Implementations

Single disk, at most 2 GByte, old measurements use artificial M

[Plot: sort time [s] (0 to 800) versus element size [byte] (16 to 1024); curves: LEDA-SM, TPIE, <stxxl> comparison based.]

Page 168: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Earlier Acad. Implementations: Multiple Disks

[Plot: sort time [s] (26 to 616, logarithmic scale) versus element size [byte] (16 to 1024); curves: LEDA-SM Soft-RAID, TPIE Soft-RAID, <stxxl> Soft-RAID, <stxxl>.]

Page 169: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

What are good block sizes (8 disks)?

[Plot: sort time [ns/byte] (12 to 26) versus block size [KByte] (128 to 8192); curves: 128 GBytes 1x merge, 128 GBytes 2x merge, 16 GBytes.]

B is not a technology constant

Page 170: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Optimal Versus Naive Prefetching

Total merge time

[Plot: difference to naive prefetching [%] (−3 to 1) versus fraction of prefetch buffers [%] (0 to 100); curves: 176 read buffers, 112 read buffers, 48 read buffers.]

Page 171: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Impact of Prefetch and Overlap Buffers

[Plot: merging time [s] (115 to 150) versus number of read buffers (16 to 172); curves: no prefetch buffer, heuristic schedule.]

Page 172: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Tradeoff: Write Buffer Size Versus Read Buffer Size

[Plot: merge time [s] (110 to 145) versus number of read buffers (0 to 200).]

Page 173: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Scalability

[Plot: sort time [ns/byte] (0 to 20) versus input size [GByte] (16 to 128); curves: 128-byte elements, 512-byte elements, 4N/DB bulk I/Os.]

Page 174: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Discussion

Theory and practice harmonize

No expensive server hardware necessary (SCSI, ...)

No need to work with artificial M

No 2/4 GByte limits

Faster than academic implementations

(Must be) as fast as commercial implementations but with performance guarantees

Blocks are much larger than often assumed. Not a technology constant

Parallel disks → bandwidth “for free” → don’t neglect internal costs

Page 175: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

More Parallel Disk Sorting?

Pipelining: Input does not come from disk but from a logical input stream. Output goes to a logical output stream. → Only half the I/Os for sorting; often no I/Os for scanning. Todo: better overlapping

Parallelism: This is the only way to go for really many disks

Tuning and special cases: ssssort, permutations, balance work between merging and run formation? ...

Longer runs: not done with guaranteed overlapping, fast internal sorting!

Distribution sorting: better for seeks etc.?

In-place sorting: could also be faster

Determinism: a practical and theoretically efficient algorithm?

Page 176: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Procedure formLongRuns
  q, q′ : PriorityQueue
  for i := 1 to M do q.insert(readElement)
  invariant |q| + |q′| = M
  loop
    while q ≠ ∅
      writeElement(e := q.deleteMin)
      if input exhausted then break outer loop
      if e′ := readElement < e then q′.insert(e′)
      else q.insert(e′)
    q := q′; q′ := ∅
  output q in sorted order; output q′ in sorted order

Knuth: average run length 2M

Todo: cache-efficient implementation
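A compact in-memory variant of formLongRuns, often called replacement selection, might look as follows. This is a sketch under assumptions: input and runs are `std::vector<int>` objects rather than files, and the final "output q, output q′" steps are folded into the main loop, which produces the same runs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

std::vector<std::vector<int>> formLongRuns(const std::vector<int>& input, std::size_t M) {
    assert(M > 0);
    using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;
    MinHeap q, qNext;  // q: current run, qNext: elements saved for the next run
    std::size_t next = 0;
    while (next < input.size() && q.size() < M) q.push(input[next++]);

    std::vector<std::vector<int>> runs;
    std::vector<int> current;
    while (!q.empty()) {
        const int e = q.top();
        q.pop();
        current.push_back(e);                // writeElement(e)
        if (next < input.size()) {
            const int x = input[next++];     // e' := readElement
            if (x < e) qNext.push(x);        // too small for the current run
            else q.push(x);
        }
        if (q.empty()) {                     // current run is complete
            runs.push_back(std::move(current));
            current.clear();
            std::swap(q, qNext);
        }
    }
    return runs;
}
```

With M = 2 the input 5 1 4 2 3 6 0 yields the runs (1 4 5), (2 3 6), (0): the first two runs are longer than M because elements that arrive late but are large enough still join the current run.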

Page 177: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

2 Priority Queues (insert, deleteMin)

Binary heaps: best comparison-based “flat memory” algorithm

+ On average constant time for insertion

+ On average log n + O(1) key comparisons per deleteMin using the “bottom-up” heuristic [Wegener 93].

− ≈ log(n/M) block accesses per deleteMin

[Figure: a heap whose top part fits into a cache of size M.]

Page 178: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Bottom-Up Heuristic

[Figure: deleteMin on an example heap in three stages: delete min, O(1); sift down the hole, log(n); sift up, average O(1). Legend: compare, swap, move.]

Factor two faster than the naive implementation
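The three stages of the bottom-up heuristic can be sketched as follows. This is an illustrative version under assumptions: a min-heap of `int` stored in `a[1..n]` with `a[0]` unused, and a hypothetical function name `deleteMinBottomUp`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Removes and returns the minimum of the min-heap stored in a[1..n].
int deleteMinBottomUp(std::vector<int>& a) {
    assert(a.size() >= 2);  // a[0] is unused, so at least one heap element
    const int minimum = a[1];
    const int last = a.back();
    a.pop_back();
    const std::size_t n = a.size() - 1;
    if (n == 0) return minimum;

    // Sift the hole down to a leaf: one comparison per level (between the
    // two children), no comparison with the element to be reinserted.
    std::size_t hole = 1;
    while (2 * hole <= n) {
        std::size_t child = 2 * hole;
        if (child < n && a[child + 1] < a[child]) ++child;
        a[hole] = a[child];
        hole = child;
    }
    // Sift the former last element up from the leaf; on average it only
    // climbs O(1) levels.
    while (hole > 1 && last < a[hole / 2]) {
        a[hole] = a[hole / 2];
        hole /= 2;
    }
    a[hole] = last;
    return minimum;
}
```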

Page 179: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

The competitor, tuned (top-down siftDown):

int i = 1, m = 2, t = a[1];              // assumes n >= 2
m += (m != n && a[m] > a[m + 1]);        // m: index of the smaller child
if (t > a[m]) {
    do {
        a[i] = a[m];                     // move the smaller child up
        i = m;
        m = 2*i;
        if (m > n) break;
        m += (m != n && a[m] > a[m + 1]);
    } while (t > a[m]);
    a[i] = t;
}

No significant performance differences on my machine (heapsort of random integers)

Page 180: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Comparison

Memory accesses: O(1) fewer than top-down. O(log n) worst case with an efficient implementation

Element comparisons: ≈ log n fewer for bottom-up (average case), but those are easy to predict

Exercise: siftDown with worst case log n + O(log log n) element comparisons

Page 181: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Heap Construction

Procedure buildHeapBackwards
  for i := ⌊n/2⌋ downto 1 do siftDown(i)

Procedure buildHeapRecursive(i : ℕ)
  if 4i ≤ n then
    buildHeapRecursive(2i)
    buildHeapRecursive(2i+1)
  siftDown(i)

The recursive function is 2× faster for large inputs!

(Unroll the recursion for the 2 lowest levels.)

Exercise: explain why.
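Both construction procedures can be written down directly in C++. The sketch below assumes a min-heap of `int` in `a[1..n]` with `a[0]` unused and uses a plain top-down `siftDown`; the helper `isHeap` is added only for checking and is not part of the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Restores the heap property below node i of the min-heap in a[1..n].
void siftDown(std::vector<int>& a, std::size_t i) {
    const std::size_t n = a.size() - 1;
    while (2 * i <= n) {
        std::size_t m = 2 * i;                  // left child
        if (m < n && a[m + 1] < a[m]) ++m;      // pick the smaller child
        if (a[i] <= a[m]) break;
        std::swap(a[i], a[m]);
        i = m;
    }
}

void buildHeapBackwards(std::vector<int>& a) {
    assert(!a.empty());
    const std::size_t n = a.size() - 1;
    for (std::size_t i = n / 2; i >= 1; --i) siftDown(a, i);
}

void buildHeapRecursive(std::vector<int>& a, std::size_t i = 1) {
    assert(!a.empty());
    const std::size_t n = a.size() - 1;
    if (4 * i <= n) {                           // children of i are not all leaves
        buildHeapRecursive(a, 2 * i);
        buildHeapRecursive(a, 2 * i + 1);
    }
    siftDown(a, i);
}

// Checking helper: true if a[1..n] satisfies the min-heap property.
bool isHeap(const std::vector<int>& a) {
    for (std::size_t i = 2; i < a.size(); ++i) {
        if (a[i] < a[i / 2]) return false;
    }
    return true;
}
```

Both functions perform the same siftDown calls on the same nodes; they differ only in the order in which the nodes are visited.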

Page 182: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Medium-Size PQs: $km \ll M^2/B$ insertions

[Figure: an insertion buffer PQ of size m and sorted sequences 1, 2, ..., k that are read block by block (block size B) and combined by a k-merge PQ.]

Insert: initially into the insertion buffer.

Overflow → sort; flush; smallest key into the merge PQ

Delete-Min: deleteMin from the PQ with the smaller minimum
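The interplay of insertion buffer, sorted sequences, and merge PQ can be sketched in memory as follows. All names (`MediumPQ`, `flush`, and so on) are assumptions for illustration; the sorted sequences would live on disk in the real data structure, and the limit on the number k of sequences is not enforced here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

class MediumPQ {
    using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;
    using Head = std::pair<int, std::size_t>;  // (smallest unread key, sequence id)

    std::size_t m_;                            // capacity of the insertion buffer
    MinHeap insertionBuffer_;
    std::vector<std::vector<int>> sequences_;  // sorted sequences
    std::vector<std::size_t> readPos_;         // next unread position per sequence
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> mergePQ_;

    // Overflow: sort the buffer, store it as a new sequence, and put its
    // smallest key into the merge PQ.
    void flush() {
        std::vector<int> seq;
        while (!insertionBuffer_.empty()) {
            seq.push_back(insertionBuffer_.top());
            insertionBuffer_.pop();
        }
        mergePQ_.push({seq.front(), sequences_.size()});
        sequences_.push_back(std::move(seq));
        readPos_.push_back(1);
    }

public:
    explicit MediumPQ(std::size_t m) : m_(m) { assert(m > 0); }

    bool empty() const { return insertionBuffer_.empty() && mergePQ_.empty(); }

    void insert(int x) {
        if (insertionBuffer_.size() == m_) flush();
        insertionBuffer_.push(x);
    }

    // deleteMin from the PQ with the smaller minimum.
    int deleteMin() {
        assert(!empty());
        if (mergePQ_.empty() ||
            (!insertionBuffer_.empty() && insertionBuffer_.top() <= mergePQ_.top().first)) {
            const int x = insertionBuffer_.top();
            insertionBuffer_.pop();
            return x;
        }
        const Head h = mergePQ_.top();
        mergePQ_.pop();
        const std::size_t s = h.second;
        if (readPos_[s] < sequences_[s].size()) {
            mergePQ_.push({sequences_[s][readPos_[s]++], s});  // next key of that sequence
        }
        return h.first;
    }
};
```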

Page 183: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Analysis: I/Os

deleteMin: every element is read at most once, together with B others; this is an amortized penalty of 1/B charged to insert.

[Figure: insertion buffer PQ, sorted sequences, and k-merge PQ as before.]

Page 184: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Analysis: Comparisons (a Measure of Internal Work)

deleteMin: $1 + O(\max(\log k, \log m)) = O(\log m)$

More precise argument: amortized $1 + \log k$ with a suitable PQ

insert: $\approx m \log m$ every m operations. Amortized $\log m$

In total only $\log km$ amortized!

[Figure: insertion buffer PQ, sorted sequences, and k-merge PQ as before.]

Page 185: Sanders: Algorithm Engineering Algorithm Engineering für ...algo2.iti.kit.edu/sanders/courses/algen16/vorlesung.pdf · Sanders: Algorithm EngineeringJune 16, 2015 1 Algorithm Engineering

Large Queues [Sanders 00]

[Figure: cached part: insertion buffer (size m), deletion buffer (size m′), and group buffers 1, 2, 3 (size m each), combined by an R-merge; external (swapped) part: groups 1, 2, 3, each consisting of k sequences combined by a k-merge.]

insert:

Insertion buffer full → merge the insertion buffer with the deletion buffer and group buffer 1. The m′ smallest elements go into the deletion buffer, the next m into group buffer 1, the rest into group 1.

Group full → merge the group; shift it into the next group.

Merge invalid group buffers and move them into group 1.

Large Queues [Sanders 00]

Delete-Min:

Refill the deletion buffer (m′ ≪ m); nothing else.

Example

[Sequence of figures showing the buffers and groups of a sequence heap with single-letter keys; the contents are not recoverable from the extracted text. The steps shown are:]

Merge insertion buffer, deletion buffer, and leftmost group buffer (triggered by an insert)

Merge group 1

Merge group 2

Merge group buffers

DeleteMin 3; DeleteMin a

DeleteMin b

Analysis

I insertions, buffer sizes $m = \Theta(M)$

merging degree $k = \Theta(M/B)$

block accesses: sort(I) + "small terms"

key comparisons: $I \log I$ + "small terms" (on average)

[Figure: schematic of read (R) and write (W) accesses]

Other similar, earlier data structures [Arge 95, Brodal-Katajainen 98, Brengel et al. 99, Fadel et al. 97] spend a factor ≥ 3 more I/Os to replace I by the queue size.


Implementation Details

Fast routines for 2–4 way merging keeping smallest

elements in registers

Use sentinels to avoid special case treatments (empty

sequences, . . . )

Currently heap sort for sorting the insertion buffer

k ≠ M/B: multiple levels, limited associativity, TLB
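The sentinel idea from the list above can be illustrated with a 2-way merge. This is only a sketch: the function name is hypothetical, inputs are assumed to be sorted lists of finite numbers, and the register-level tuning mentioned above is not something Python can express.

```python
import math

def merge_with_sentinels(a, b):
    """Merge two sorted lists of finite numbers.

    A trailing +infinity sentinel on each input removes the
    "is this sequence exhausted?" test from the inner loop.
    """
    a = a + [math.inf]
    b = b + [math.inf]
    out = []
    i = j = 0
    for _ in range(len(a) + len(b) - 2):   # number of real (non-sentinel) elements
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out
```

Once one input is exhausted, its sentinel loses every comparison, so the remaining elements of the other input are copied without a special case.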

Experiments

Keys: random 32 bit integers

Associated information: 32 dummy bits

Deletion buffer size: 32

Group buffer size: 256

Merging degree k: 128

(These three settings give near optimal performance on all machines tried!)

Compiler flags: highly optimizing, nothing advanced

Operation sequence: $(\text{Insert-DeleteMin-Insert})^N (\text{DeleteMin-Insert-DeleteMin})^N$
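To make the operation sequence concrete, here is a hedged sketch that drives it against Python's `heapq` binary heap. The function name and the returned statistics are assumptions for illustration; the lecture's measurements were taken with tuned compiled code, not with anything like this.

```python
import heapq
import random

def run_sequence(n, seed=0):
    """Run (insert deleteMin insert)^n (deleteMin insert deleteMin)^n.

    Returns (largest queue size at the end of a round, final queue size).
    """
    rng = random.Random(seed)
    heap = []
    peak = 0
    for _ in range(n):                    # growing phase: net +1 element per round
        heapq.heappush(heap, rng.getrandbits(32))
        heapq.heappop(heap)
        heapq.heappush(heap, rng.getrandbits(32))
        peak = max(peak, len(heap))
    for _ in range(n):                    # shrinking phase: net -1 element per round
        heapq.heappop(heap)
        heapq.heappush(heap, rng.getrandbits(32))
        heapq.heappop(heap)
    return peak, len(heap)
```

The queue grows to N elements and then shrinks back to empty, so a run with parameter N exercises queue sizes between 0 and N.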

[Plots: (T(deleteMin) + T(insert))/log N [ns] versus N (up to 2^23), one plot per machine:]

MIPS R10000, 180 MHz: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap

Ultra-SparcIIi, 300 MHz: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap

Alpha-21164, 533 MHz: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap

Pentium II, 300 MHz: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap

Core2 Duo Notebook, 1.??? GHz: bottom up binary heap, sequence heap

Intel i3-4030U, 1.90GHz: bottom up binary heap, sequence heap

Operation sequence: $(\text{insert}\,(\text{deleteMin insert})^s)^N (\text{deleteMin}\,(\text{insert deleteMin})^s)^N$

[Plot: (T(deleteMin) + T(insert))/log N [ns] versus N (256 to 2^23); curves: s=0 binary heap, s=0 4-ary heap, and further curves for s=0, s=1, s=4, s=16]

Methodological Lessons

If you want to compare small constant factors in execution time:

Reproducibility demands publication of source codes (4-ary heaps, old study in Pascal)

Highly tuned codes, in particular for the competitors (binary heaps have a factor of 2 between a good and a naive implementation). How do you compare two mediocre implementations?

Careful choice/description of inputs

Use multiple different hardware platforms

Augment with theory (e.g., comparisons, data dependencies, cache faults, locality effects, ...)


Open Problems

Dependence on size rather than number of insertions

Parallel disks

Space efficient implementation

Multi-level cache aware or cache-oblivious variants

3 van Emde Boas Search Trees

[Figure: a K-bit vEB tree consists of a K′-bit vEB tree "top" and a hash table pointing to K′-bit vEB trees "bot_i"]

Store a set $M$ of $K = 2^k$-bit integers (later: associated information).

$K = 1$ or $|M| = 1$: store directly

$K' := K/2$

$M_i := \{x \bmod 2^{K'} : x \operatorname{div} 2^{K'} = i\}$

root points to the nonempty $M_i$-s

top $t = \{i : M_i \neq \emptyset\}$

insert, delete, search in $O(\log K)$ time

Locate

// return min {x ∈ M : y ≤ x}
Function locate(y : ℕ) : ElementHandle
    if y > max M then return ∞
    if K = 1 then return locateLocally(y)
    if M = {x} then return x
    (i, j) := (y div 2^(K/2), y mod 2^(K/2))
    if M_i = ∅ ∨ j > max M_i then
        i := top.locate(i+1)
        j := min M_i
    else j := M_i.locate(j)
    return i·2^(K/2) + j
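The recursive structure and the locate function can be sketched as follows. This is an illustrative reading of the definition above, not the tuned implementation discussed later: the class name `VEB`, the use of a Python `dict` as the hash table, and returning `None` in place of ∞ are assumptions, and no running-time bound is claimed for this version of insert.

```python
class VEB:
    """Set of K-bit integers, K a power of two."""

    def __init__(self, K):
        self.K = K
        self.min = None     # smallest element, None if the set is empty
        self.max = None     # largest element
        self.top = None     # VEB over the indices i of nonempty M_i
        self.bot = {}       # hash table: i -> VEB storing M_i

    def _push(self, x):
        h = self.K // 2
        i, j = x >> h, x & ((1 << h) - 1)
        if i not in self.bot:
            self.bot[i] = VEB(h)
            if self.top is None:
                self.top = VEB(h)
            self.top.insert(i)
        self.bot[i].insert(j)

    def insert(self, x):
        if self.min is None:                 # empty set: store directly
            self.min = self.max = x
            return
        if x == self.min or x == self.max:
            return
        if self.K > 1:
            if self.min == self.max:         # singleton grows: store the old element below, too
                self._push(self.min)
            self._push(x)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    def locate(self, y):
        """Return min {x in M : y <= x}, or None if there is no such x."""
        if self.min is None or y > self.max:
            return None
        if y <= self.min:
            return self.min
        if self.K == 1:
            return self.max
        h = self.K // 2
        i, j = y >> h, y & ((1 << h) - 1)
        b = self.bot.get(i)
        if b is not None and j <= b.max:
            return (i << h) | b.locate(j)
        i = self.top.locate(i + 1)           # next nonempty M_i
        return (i << h) | self.bot[i].min
```

For example, after inserting 3, 17 and 200 into `VEB(8)`, `locate(4)` yields 17 and `locate(201)` yields `None`.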

Comparison with Comparison Based Search Trees

Ideally: $\log n \rightarrow \log\log n$

Problems: many special case tests, high space consumption

[Plot: time for random locate [ns] versus n (256 to 2^23); curves: orig-STree, LEDA-STree, STL map, (2,16)-tree]

Efficient 32 bit Implementation

top: 3 layers of bit arrays

[Figure: K-bit vEB tree in which the top structure is replaced by three layers of bit arrays over the positions 0..65535]

Layers of Bit Arrays

$t^1[i] = 1$ iff $M_i \neq \emptyset$

$t^2[i] = t^1[32i] \vee t^1[32i+1] \vee \cdots \vee t^1[32i+31]$

$t^3[i] = t^2[32i] \vee t^2[32i+1] \vee \cdots \vee t^2[32i+31]$

Efficient 32 bit Implementation

Break recursion after 3 layers

[Figure: three-level tree. Level 1 (root) is indexed by bits 31-16, level 2 by bits 15-8, level 3 by bits 7-0; each level has bit arrays and hash tables pointing to the next level.]

Efficient 32 bit Implementation

root hash table → root array

[Figure: the same three-level tree, with the root hash table replaced by an array with entries 0..65535]

Efficient 32 bit Implementation

Sorted doubly linked lists for associated information and range queries

[Figure: the same three-level tree, with the elements additionally kept in a sorted doubly linked list]

Example

[Figure: example STree for M = {1, 11, 111, 1111, 111111}, showing the root array with its bit arrays t1, t2, t3 (root-top), the levels L2 and L3 with their hash tables and bit arrays, and the element list; the remaining details are not recoverable from the extracted text]

Locate High Level

// return handle of min {x ∈ M : y ≤ x}
Function locate(y : ℕ) : ElementHandle
    if y > max M then return ∞
    i := y[16..31]                                    // Level 1
    if r[i] = nil ∨ y > max M_i then return min M_(t1.locate(i+1))
    if M_i = {x} then return x
    j := y[8..15]                                     // Level 2
    if r_i[j] = nil ∨ y > max M_ij then return min M_(i, t1_i.locate(j+1))
    if M_ij = {x} then return x
    return r_ij[t1_ij.locate(y[0..7])]                // Level 3

Locate in Bit Arrays

// find the smallest j ≥ i such that tk[j] = 1
Method locate(i) for a bit array tk consisting of n-bit words
    // n = 32 for t1, t2, t1_i, t1_ij; n = 64 for t3; n = 8 for t2_i, t2_ij
    assert some bit in tk to the right of i is nonzero
    j := i div n                                      // which word?
    a := tk[nj..nj+n−1]
    set a[(i mod n)+1..n−1] to zero                   // bit order in a word: n−1 ··· i mod n ··· 0
    if a = 0 then
        j := t(k+1).locate(j)
        a := tk[nj..nj+n−1]
    return nj + msbPos(a)                             // e.g. floating point conversion
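The layered bit arrays and their locate method can be sketched like this. The code is an assumption-laden illustration rather than the lecture's implementation: it uses one word size on all layers, stores bit j at position j mod 32 counted from the least significant bit (so it looks for the lowest set bit instead of calling msbPos), queries the summary layer starting at the next word, scans the short top layer linearly, and returns `None` instead of asserting that a set bit exists.

```python
W = 32  # word size in bits (assumption: the same on every layer)

class BitLayers:
    def __init__(self, n_bits, levels=3):
        self.layers = []
        size = n_bits
        for _ in range(levels):
            words = (size + W - 1) // W
            self.layers.append([0] * words)
            size = words                  # one summary bit per word of the layer below

    def set(self, i):
        # Set bit i on the lowest layer and the matching summary bits above it.
        for layer in self.layers:
            layer[i // W] |= 1 << (i % W)
            i //= W

    def locate(self, i, k=0):
        """Smallest j >= i whose bit is set in layer k, or None."""
        layer = self.layers[k]
        j = i // W                        # which word?
        if j >= len(layer):
            return None
        a = layer[j] >> (i % W) << (i % W)    # clear the bits below position i mod W
        if a == 0:
            if k + 1 < len(self.layers):
                j = self.locate(j + 1, k + 1)  # next nonzero word via the summary layer
            else:
                j = next((w for w in range(j + 1, len(layer)) if layer[w]), None)
            if j is None:
                return None
            a = layer[j]
        return j * W + (a & -a).bit_length() - 1   # position of the lowest set bit
```

With 2^16 positions the three layers have 2048, 64 and 2 words, so a query touches at most a handful of words.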

[Plots: time per operation [ns] versus n (up to 2^23); curves: orig-STree, LEDA-STree, STL map, (2,16)-tree, STree:]

Random Locate: time for locate [ns]

Random Insert: time for insert [ns]

Delete Random Elements: time for delete [ns]


Open Problems

Measurement for “worst case” inputs

Measure Performance for realistic inputs

– IP lookup etc.

– Best first heuristics like, e.g., bin packing

More space efficient implementation

(A few) more bits

4 Hashing

"to hash" ≈ "to bring into complete disorder"; paradoxically, this helps us to find things more easily!

store set $M \subseteq \text{Element}$.

key(e) is unique for $e \in M$.

support dictionary operations in O(1) time:

M.insert(e : Element): $M := M \cup \{e\}$

M.remove(k : Key): $M := M \setminus \{e\}$, e = k

M.find(k : Key): return $e \in M$ with e = k; ⊥ if none present

(Convention: key is implicit, e.g., e = k iff key(e) = k)

An (Over)optimistic Approach

[Figure: hash function h maps the set M into the table t]

A (perfect) hash function h maps elements of M to unique entries of table t[0..m−1], i.e., t[h(key(e))] = e

Collisions

perfect hash functions are difficult to obtain

[Figure: h maps two elements of M to the same entry of t]

Example: birthday paradox

Collision Resolution

for example by closed hashing

entries: elements → sequences of elements

[Figure: table t in which every entry t[h(k)] is a sequence ⟨...⟩ of elements]

Hashing with Chaining

Implement sequences in closed hashing by singly linked lists

insert(e): Insert e at the beginning of t[h(e)]. Constant time.

remove(k): Scan through t[h(k)]. If an element e with key(e) = k is encountered, remove it and return.

find(k): Scan through t[h(k)]. If an element e with key(e) = k is encountered, return it. Otherwise, return ⊥.

O(|M|) worst case time for remove and find
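The three operations of hashing with chaining can be sketched as follows. The names `ChainedHashTable` and `_Node`, the `key` parameter, and the use of Python's built-in `hash` reduced modulo m are assumptions made for illustration; `None` stands in for ⊥.

```python
class _Node:
    __slots__ = ("elem", "next")

    def __init__(self, elem, next_node):
        self.elem = elem
        self.next = next_node

class ChainedHashTable:
    def __init__(self, m, key=lambda e: e, h=hash):
        self.m = m
        self.key = key                  # extracts the key of an element
        self.h = h                      # assumed hash function, reduced modulo m
        self.t = [None] * m             # each entry is the head of a singly linked list

    def _index(self, k):
        return self.h(k) % self.m

    def insert(self, e):
        # Constant time: prepend to the list; assumes key(e) is not yet present.
        i = self._index(self.key(e))
        self.t[i] = _Node(e, self.t[i])

    def remove(self, k):
        i = self._index(k)
        prev, node = None, self.t[i]
        while node is not None:
            if self.key(node.elem) == k:
                if prev is None:
                    self.t[i] = node.next
                else:
                    prev.next = node.next
                return
            prev, node = node, node.next

    def find(self, k):
        node = self.t[self._index(k)]
        while node is not None:
            if self.key(node.elem) == k:
                return node.elem
            node = node.next
        return None
```

With elements such as `(key, value)` pairs one would pass `key=lambda e: e[0]`.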

A Review of Probability

[Figure: buckets; binary search?]

Each concept is followed by its example from hashing:

sample space $\Omega$; example: random hash functions $\{0..m-1\}^{\text{Key}}$

events: subsets of $\Omega$; example: $E_{42} = \{h \in \Omega : h(4) = h(2)\}$

$p_x$ = probability of $x \in \Omega$; example: uniform distribution, $p_h = m^{-|\text{Key}|}$

$P[E] = \sum_{x \in E} p_x$; example: $P[E_{42}] = \frac{1}{m}$

random variable $X_0 : \Omega \rightarrow \mathbb{R}$; example: $X_0 = |\{e \in M : h(e) = 0\}|$

expectation $E[X_0] = \sum_{y \in \Omega} p_y X_0(y)$; example: $E[X_0] = \frac{|M|}{m}$ (*)

Linearity of expectation: $E[X + Y] = E[X] + E[Y]$

Proof of (*): Consider the 0-1 RV $X_e = 1$ if $h(e) = 0$ for $e \in M$ and $X_e = 0$ else.

$E[X_0] = E[\sum_{e \in M} X_e] = \sum_{e \in M} E[X_e] = \sum_{e \in M} P[X_e = 1] = |M| \cdot \frac{1}{m}$
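For a tiny key universe the two example values can be checked exactly by enumerating the whole sample space. The concrete numbers (m = 3, five keys, a three-element set M) are arbitrary choices for this sketch.

```python
from fractions import Fraction
from itertools import product

m = 3
keys = [0, 1, 2, 3, 4]          # Key
M = [0, 2, 4]                   # stored set M, a subset of Key

# Sample space: every function Key -> {0..m-1}; the tuple h encodes h(k) = h[k].
functions = list(product(range(m), repeat=len(keys)))
p = Fraction(1, len(functions))  # uniform distribution: p_h = m^(-|Key|)

# P[E_42]: probability that keys 4 and 2 collide.
prob_E42 = sum(p for h in functions if h[4] == h[2])

# E[X_0]: expected number of elements of M hashed to entry 0.
exp_X0 = sum(p * sum(1 for e in M if h[e] == 0) for h in functions)

print(prob_E42)   # 1/3
print(exp_X0)     # 1
```

Both results match the formulas: 1/m for the collision probability and |M|/m for the expected bucket size.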

Analysis for Random Hash Functions

Theorem 1. The expected execution time of remove(k) and find(k) is O(1) if $|M| = O(m)$.

Proof. Constant time plus the time for scanning t[h(k)].

$X := |t[h(k)]| = |\{e \in M : h(e) = h(k)\}|$.

Consider the 0-1 RV $X_e = 1$ if $h(e) = h(k)$ for $e \in M$ and $X_e = 0$ else.

$E[X] = E[\sum_{e \in M} X_e] = \sum_{e \in M} E[X_e] = \sum_{e \in M} P[X_e = 1] = \frac{|M|}{m} = O(1)$

This is independent of the input set M.

Universal Hashing

Idea: use only certain "easy" hash functions

Definition: $U \subseteq \{0..m-1\}^{\text{Key}}$ is universal if for all x, y in Key with $x \neq y$ and random $h \in U$,

$P[h(x) = h(y)] = \frac{1}{m}$.

Theorem 2. Theorem 1 also applies to universal families of hash functions.

Proof. For $\Omega = U$ we still have $P[X_e = 1] = \frac{1}{m}$. The rest is as before.

[Figure: the family H inside the sample space Ω]

A Simple Universal Family

Assume m is prime, $\text{Key} \subseteq \{0, \ldots, m-1\}^k$

Theorem 3. For $a = (a_1, \ldots, a_k) \in \{0, \ldots, m-1\}^k$ define

$h_a(x) = a \cdot x \bmod m$, $H^{\cdot} = \{h_a : a \in \{0..m-1\}^k\}$.

$H^{\cdot}$ is a universal family of hash functions.

[Figure: $h_a(x) = (a_1 x_1 + a_2 x_2 + a_3 x_3) \bmod m$]

Proof. Consider $x = (x_1, \ldots, x_k)$, $y = (y_1, \ldots, y_k)$ with $x_j \neq y_j$ and count the a-s with $h_a(x) = h_a(y)$.

For each choice of the $a_i$, $i \neq j$, there exists exactly one $a_j$ with $h_a(x) = h_a(y)$:

$\sum_{1 \le i \le k} a_i x_i \equiv \sum_{1 \le i \le k} a_i y_i \pmod{m}$

$\Leftrightarrow a_j (x_j - y_j) \equiv \sum_{i \neq j,\, 1 \le i \le k} a_i (y_i - x_i) \pmod{m}$

$\Leftrightarrow a_j \equiv (x_j - y_j)^{-1} \sum_{i \neq j,\, 1 \le i \le k} a_i (y_i - x_i) \pmod{m}$

There are $m^{k-1}$ ways to choose the $a_i$ with $i \neq j$, and $m^k$ is the total number of a-s, i.e.,

$P[h_a(x) = h_a(y)] = \frac{m^{k-1}}{m^k} = \frac{1}{m}$.
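The theorem can be checked by brute force for small parameters. The function names below are hypothetical, and the check enumerates every a and every pair of distinct keys, so it is only feasible for tiny m and k.

```python
from fractions import Fraction
from itertools import product

def h(a, x, m):
    """h_a(x) = a . x mod m for equally long tuples a and x."""
    return sum(ai * xi for ai, xi in zip(a, x)) % m

def collision_probability(x, y, m, k):
    """Exact P[h_a(x) = h_a(y)] for a drawn uniformly from {0..m-1}^k."""
    all_a = list(product(range(m), repeat=k))
    hits = sum(1 for a in all_a if h(a, x, m) == h(a, y, m))
    return Fraction(hits, len(all_a))

def is_universal(m, k):
    """True if every pair of distinct keys collides with probability exactly 1/m."""
    keys = list(product(range(m), repeat=k))
    return all(collision_probability(x, y, m, k) == Fraction(1, m)
               for x in keys for y in keys if x != y)
```

`is_universal(3, 3)` holds, while `is_universal(4, 2)` fails: for the non-prime m = 4 the difference x_j − y_j need not have an inverse, which is exactly the step the proof relies on.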

Bit Based Universal Families

Let $m = 2^w$, $\text{Key} = \{0,1\}^k$

Bit-matrix multiplication: $H^{\oplus} = \{h_M : M \in \{0,1\}^{w \times k}\}$ where $h_M(x) = Mx$ (arithmetic mod 2, i.e., xor, and)

Table lookup: $H^{\oplus[]} = \{h^{\oplus[]}_{(t_1, \ldots, t_b)} : t_i \in \{0..m-1\}^{\{0..w-1\}}\}$ where $h^{\oplus[]}_{(t_1, \ldots, t_b)}((x_0, x_1, \ldots, x_b)) = x_0 \oplus \bigoplus_{i=1}^{b} t_i[x_i]$

[Figure: a k-bit key x split into pieces $x_2$, $x_1$, $x_0$]
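The table lookup family can be sketched as below. The piece width `a`, the function name, and the seeded generator are assumptions of this sketch: the key is read as a w-bit piece x0 followed by b pieces of a bits each, and every table has 2^a random w-bit entries.

```python
import random

def make_tabulation_hash(w, a, b, seed=0):
    """Return h for keys of w + a*b bits, with values in 0..2^w - 1."""
    rng = random.Random(seed)
    # One table per piece x1..xb, each with 2^a random w-bit entries.
    tables = [[rng.getrandbits(w) for _ in range(1 << a)] for _ in range(b)]

    def h(x):
        value = x & ((1 << w) - 1)      # x0: the lowest w bits of the key
        x >>= w
        for t in tables:                # x1, ..., xb: the following a-bit pieces
            value ^= t[x & ((1 << a) - 1)]
            x >>= a
        return value

    return h
```

Each evaluation costs b table lookups and b xor operations; for fixed upper pieces the map from x0 to the hash value is a bijection on the w-bit values.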

Hashing with Linear Probing

Open hashing: go back to the original idea.

Elements are stored directly in the table.

Collisions are resolved by finding other entries.

linear probing: search for the next free place by scanning the table. Wrap around at the end.

simple

space efficient

cache efficient

[Figure: h maps the elements of M directly into the table t]

The Easy Part

Class BoundedLinearProbing(m, m′ : ℕ; h : Key → 0..m−1)
    t = [⊥, ..., ⊥] : Array [0..m+m′−1] of Element
    invariant ∀i : t[i] ≠ ⊥ ⇒ ∀j ∈ {h(t[i])..i−1} : t[j] ≠ ⊥

    Procedure insert(e : Element)
        for i := h(e) to ∞ while t[i] ≠ ⊥ do ;
        assert i < m+m′−1
        t[i] := e

    Function find(k : Key) : Element
        for i := h(k) to ∞ while t[i] ≠ ⊥ do
            if t[i] = k then return t[i]
        return ⊥

[Figure: table t with m entries addressed by h plus m′ overflow entries]

Remove

example: t = [..., x, y, z, ...] where h(z) is the position of x; remove(x)

invariant ∀i : t[i] ≠ ⊥ ⇒ ∀j ∈ {h(t[i])..i−1} : t[j] ≠ ⊥

Procedure remove(k : Key)
    for i := h(k) to ∞ while k ≠ t[i] do              // search k
        if t[i] = ⊥ then return                       // nothing to do
    // we plan for a hole at i.
    for j := i+1 to ∞ while t[j] ≠ ⊥ do
        // Establish invariant for t[j].
        if h(t[j]) ≤ i then
            t[i] := t[j]                              // Overwrite removed element
            i := j                                    // move planned hole
    t[i] := ⊥                                         // erase freed entry
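A runnable sketch of bounded linear probing with insert, find and remove follows. The `key` parameter and the use of `None` for ⊥ are assumptions; as in the pseudocode, there is no wrap-around, and the last table entry always stays empty so that every scan terminates.

```python
class BoundedLinearProbing:
    def __init__(self, m, m_extra, h, key=lambda e: e):
        self.h = h                          # maps keys to 0..m-1
        self.key = key                      # extracts the key of an element
        self.t = [None] * (m + m_extra)     # None plays the role of ⊥

    def insert(self, e):
        i = self.h(self.key(e))
        while self.t[i] is not None:
            i += 1
        assert i < len(self.t) - 1, "overflow area exhausted"
        self.t[i] = e

    def find(self, k):
        i = self.h(k)
        while self.t[i] is not None:
            if self.key(self.t[i]) == k:
                return self.t[i]
            i += 1
        return None

    def remove(self, k):
        i = self.h(k)
        while True:                         # search k
            if self.t[i] is None:
                return                      # nothing to do
            if self.key(self.t[i]) == k:
                break
            i += 1
        j = i + 1                           # we plan for a hole at i
        while self.t[j] is not None:
            # Re-establish the invariant for t[j]: move it back if its
            # home position is at or before the planned hole.
            if self.h(self.key(self.t[j])) <= i:
                self.t[i] = self.t[j]       # overwrite the removed element
                i = j                       # move the planned hole
            j += 1
        self.t[i] = None                    # erase the freed entry
```

After a removal, every remaining element is still reachable from its home position without crossing an empty entry, which is what find depends on.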


More Hashing Issues

High probability and worst case guarantees

more requirements on the hash functions

Hashing as a means of load balancing in parallel systems,

e.g., storage servers

– Different disk sizes and speeds

– Adding disks / replacing failed disks without much

copying


Space Efficient Hashing

with Worst Case Constant Access Time

Represent a set of n elements (with associated information)

using space (1+ ε)n.

Support operations insert, delete, lookup, (doall) efficiently.

Assume a truly random hash function h

Related Work

Uniform hashing: expected time ≈ $\frac{1}{\varepsilon}$

Dynamic perfect hashing [Dietzfelbinger et al. 94]: worst case constant time for lookup, but ε is not small.

Approaching the information theoretic lower bound [Brodnik Munro 99, Raman Rao 02]: space (1+o(1)) × lower bound without associated information

[Pagh 01]: static case.

Cuckoo Hashing

[Pagh Rodler 01] Table of size (2+ε)n.

Two choices for each element.

Insert moves elements; rebuild if necessary.

Very fast lookup and insert.

Expected constant insertion time.

[Figure: two hash functions h1, h2 give each element two possible cells]
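A compact sketch of cuckoo hashing with two tables is given below. Several details are assumptions made for illustration: salted calls to Python's `hash` stand in for random hash functions, the `max_kicks` cutoff decides when to rebuild, and a rebuild doubles the table size, so the sketch does not maintain the (2+ε)n space bound.

```python
import random

class CuckooHashSet:
    def __init__(self, capacity, max_kicks=50, seed=0):
        self.n = capacity                 # cells per table; there are two tables
        self.max_kicks = max_kicks        # assumed cutoff before a rebuild
        self.rng = random.Random(seed)
        self._new_functions()
        self.t = [[None] * self.n, [None] * self.n]

    def _new_functions(self):
        self.salts = (self.rng.getrandbits(64), self.rng.getrandbits(64))

    def _pos(self, which, x):
        return hash((self.salts[which], x)) % self.n

    def contains(self, x):
        # At most two probes: one cell per table.
        return any(self.t[w][self._pos(w, x)] == x for w in (0, 1))

    def remove(self, x):
        for w in (0, 1):
            p = self._pos(w, x)
            if self.t[w][p] == x:
                self.t[w][p] = None

    def insert(self, x):
        if self.contains(x):
            return
        which = 0
        for _ in range(self.max_kicks):
            p = self._pos(which, x)
            x, self.t[which][p] = self.t[which][p], x   # place x, pick up the old occupant
            if x is None:
                return
            which = 1 - which             # the evicted element moves to its other choice
        self._rebuild(x)

    def _rebuild(self, pending):
        items = [e for table in self.t for e in table if e is not None] + [pending]
        self.n *= 2                       # growing on rebuild is an assumption of this sketch
        self._new_functions()
        self.t = [[None] * self.n, [None] * self.n]
        for e in items:
            self.insert(e)
```

Lookups and removals inspect exactly two cells regardless of how many elements were moved during earlier insertions.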

d-ary Cuckoo Hashing

d choices for each element.

Worst case d probes for delete and lookup.

Task: maintain a perfect matching in the bipartite graph (L = Elements, R = Cells, E = Choices), e.g., insert by BFS or random walk.

[Figure: three hash functions h1, h2, h3 give each element three possible cells]

Tradeoff: Space ↔ Lookup/Deletion Time

Lookup and delete: $d = O(\log\frac{1}{\varepsilon})$ probes

Insert: $(\frac{1}{\varepsilon})^{O(\log(1/\varepsilon))}$; experiments suggest $O(1/\varepsilon)$?

Experiments

[Plot: ε · #probes for insert versus space utilization (0 to 1); curves: d=2, d=3, d=4, d=5]

Open Questions and Further Results

Tight analysis of insertion

Two choices with d slots each [Dietzfelbinger et al.]

cache efficiency

Good Implementation?

Automatic rehash

Always correct


5 Minimum Spanning Trees

Undirected graph G = (V,E).

Nodes V, n = |V|, e.g., V = {1, …, n}.

Edges e ∈ E, m = |E|, two-element subsets of V.

Edge weight c(e), c(e) ∈ R+.

G is connected, i.e., ∃ path between any two nodes.

[Figure: example graph with four nodes and edge weights 2, 5, 7, 9.]

Find a tree (V,T) with minimum weight $\sum_{e \in T} c(e)$ that connects all nodes.


MST: Overview

Basics: Cut property and cycle property

Jarník-Prim Algorithm

Kruskal’s Algorithm

Filter-Kruskal

Comparison

(Advanced algorithms using the cycle property)

External MST


Applications

Clustering

Subroutine in combinatorial optimization, e.g., Held-Karp

lower bound for TSP.

Challenging real world instances???

Image segmentation −→ [Diss. Jan Wassenberg]

Anyway: almost ideal “fruit fly” problem


Selecting and Discarding MST Edges

The Cut Property

For any S ⊂ V consider the cut edges

C = {{u,v} ∈ E : u ∈ S, v ∈ V \ S}.

The lightest edge in C can be used in an MST.

[Figure: example graph illustrating a cut.]

The Cycle Property

The heaviest edge on a cycle is not needed for an MST.

[Figure: example graph illustrating a cycle.]


The Jarník-Prim Algorithm [Jarník 1930, Prim 1957]

Idea: grow a tree

T := ∅
S := {s} for arbitrary start node s
repeat n−1 times
    find (u,v) fulfilling the cut property for S
    S := S ∪ {v}
    T := T ∪ {(u,v)}


Implementation Using Priority Queues

Function jpMST(V, E, c) : Set of Edge
    dist = [∞, …, ∞] : Array [1..n]      // dist[v] is distance of v from the tree
    pred : Array of Edge                  // pred[v] is shortest edge between S and v
    q : PriorityQueue of Node with dist[·] as priority
    dist[s] := 0; q.insert(s) for any s ∈ V
    for i := 1 to n−1 do
        u := q.deleteMin()                // new node for S
        dist[u] := 0
        foreach (u,v) ∈ E do
            if c((u,v)) < dist[v] then
                dist[v] := c((u,v)); pred[v] := (u,v)
                if v ∈ q then q.decreaseKey(v) else q.insert(v)
    return {pred[v] : v ∈ V \ {s}}
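As a runnable counterpart to the pseudocode above, here is a hedged Python sketch. It deliberately swaps in a different technique for the queue: Python's `heapq` has no decreaseKey, so the sketch pushes duplicate entries and skips stale ones ("lazy deletion"). The function name `jp_mst` and the adjacency-list input format are assumptions for illustration.

```python
import heapq


def jp_mst(n, adj, s=0):
    """adj[u] is a list of (v, weight) pairs for nodes 0..n-1.
    Returns the MST edges as (u, v, weight) triples, assuming a connected graph."""
    dist = [float("inf")] * n      # dist[v]: lightest known edge weight from the tree to v
    pred = [None] * n              # pred[v]: the edge realizing dist[v]
    in_tree = [False] * n
    dist[s] = 0
    q = [(0, s)]
    mst = []
    while q:
        _, u = heapq.heappop(q)
        if in_tree[u]:
            continue               # stale queue entry; this replaces decreaseKey
        in_tree[u] = True
        if pred[u] is not None:
            mst.append(pred[u])
        for v, c in adj[u]:
            if not in_tree[v] and c < dist[v]:
                dist[v] = c
                pred[v] = (u, v, c)
                heapq.heappush(q, (c, v))
    return mst


# Example: a 4-cycle with edge weights 5, 7, 2, 9.
adj = [[(1, 5), (3, 9)], [(0, 5), (2, 7)], [(1, 7), (3, 2)], [(2, 2), (0, 9)]]
tree = jp_mst(4, adj)
```

With lazy deletion the queue can hold up to O(m) entries, so the bound becomes O(m log m) rather than the O(m + n log n) obtainable with a decreaseKey-capable heap.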


Graph Representation for Jarník-Prim

We need node→ incident edges

[Figure: adjacency array for the example graph: node array V with n+1 entries pointing into the edge array E, which stores target nodes and weights c.]

+ fast (cache efficient)

+ more compact than linked lists

− difficult to change

− Edges are stored twice
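A small Python sketch of how such an adjacency array can be built from an edge list may make the trade-offs above concrete. The function names and the 0-based indexing are assumptions; the slide's figure uses 1-based indices.

```python
def build_adjacency_array(n, edges):
    """edges: list of (u, v, c) with nodes 0..n-1.
    Returns (V, E, C): the edges of node u are E[V[u]:V[u+1]] with weights C[V[u]:V[u+1]]."""
    V = [0] * (n + 1)
    for u, v, _ in edges:          # count degrees, shifted by one position
        V[u + 1] += 1
        V[v + 1] += 1
    for i in range(n):             # prefix sums turn counts into start offsets
        V[i + 1] += V[i]
    E = [0] * (2 * len(edges))     # every undirected edge is stored twice
    C = [0] * (2 * len(edges))
    nxt = V[:-1]                   # slice copy: next free slot per node
    for u, v, c in edges:
        for a, b in ((u, v), (v, u)):
            E[nxt[a]] = b
            C[nxt[a]] = c
            nxt[a] += 1
    return V, E, C


def neighbors(V, E, C, u):
    """All (neighbor, weight) pairs of u, read from one contiguous range."""
    return [(E[k], C[k]) for k in range(V[u], V[u + 1])]


V, E, C = build_adjacency_array(4, [(0, 1, 5), (1, 2, 7), (2, 3, 2), (0, 3, 9)])
```

Scanning a node's edges is a sequential pass over a contiguous range, which is where the cache efficiency comes from; inserting a single edge, by contrast, would require shifting the arrays.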


Analysis

O(m+n) time outside the priority queue

n deleteMin operations (time O(n log n))

O(m) decreaseKey operations (time O(1) amortized each)

O(m + n log n) using Fibonacci heaps

Practical implementations use simpler pairing heaps.

But their analysis is still partly open!


Kruskal’s Algorithm [1956]

T := ∅    // subforest of the MST
foreach (u,v) ∈ E in ascending order of weight do
    if u and v are in different subtrees of T then
        T := T ∪ {(u,v)}    // join two subtrees
return T


The Union-Find Data Structure

Class UnionFind(n : N)    // maintain a partition of 1..n
    parent = [n+1, …, n+1] : Array [1..n] of 1..n+⌈log n⌉

    Function find(i : 1..n) : 1..n
        if parent[i] > n then return i
        else i′ := find(parent[i])
            parent[i] := i′
            return i′

    Procedure link(i, j : 1..n)
        assert i and j are leaders of different subsets
        if parent[i] < parent[j] then parent[i] := j
        else if parent[i] > parent[j] then parent[j] := i
        else parent[j] := i; parent[i]++    // next generation

    Procedure union(i, j)
        if find(i) ≠ find(j) then link(find(i), find(j))
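The pseudocode stores the rank of a leader inside the parent array (values above n). The Python sketch below shows the same idea with a separate `rank` array, which is easier to read but uses more space; that separation, the iterative `find`, and the boolean return value of `union` are assumptions of this sketch rather than part of the slide.

```python
class UnionFind:
    """Partition of 0..n-1 with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))   # a leader is its own parent
        self.rank = [0] * n

    def find(self, i):
        root = i
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[i] != root:  # path compression: repoint the whole path to the root
            self.parent[i], i = root, self.parent[i]
        return root

    def link(self, i, j):
        """i and j must be leaders of different subsets."""
        if self.rank[i] < self.rank[j]:
            self.parent[i] = j
        else:
            self.parent[j] = i
            if self.rank[i] == self.rank[j]:
                self.rank[i] += 1      # "next generation"

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return False
        self.link(ri, rj)
        return True


uf = UnionFind(6)
merged = [uf.union(0, 1), uf.union(2, 3), uf.union(1, 3), uf.union(0, 2)]
```

Here the last `union(0, 2)` reports `False` because both nodes already share a leader.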


Kruskal Using Union Find

T : UnionFind(n)
sort E in ascending order of weight
kruskal(E)

Procedure kruskal(E)
    foreach (u,v) ∈ E do
        u′ := T.find(u)
        v′ := T.find(v)
        if u′ ≠ v′ then
            output (u,v)
            T.link(u′,v′)
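A compact, self-contained Python sketch of the procedure above follows. To keep it short it inlines a simplified union-find (path halving, no union by rank), which is an assumption of this sketch and weaker than the structure on the previous slide.

```python
def kruskal(n, edges):
    """edges: list of (u, v, c) with nodes 0..n-1. Returns the edges of a minimum spanning forest."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    mst = []
    for u, v, c in sorted(edges, key=lambda e: e[2]):   # ascending order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                # u and v are in different subtrees
            mst.append((u, v, c))
            parent[ru] = rv         # join the two subtrees
    return mst


tree = kruskal(4, [(0, 1, 5), (1, 2, 7), (2, 3, 2), (0, 3, 9)])
```

Note that the input is just an edge array, in line with the representation discussed on the next slide.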


Graph Representation for Kruskal

Just an edge sequence (array) !

+ very fast (cache efficient)

+ Edges are stored only once

+ more compact than adjacency array


Analysis

$O(\mathrm{sort}(m) + m\,\alpha(m,n)) = O(m \log m)$, where α is the inverse Ackermann function


Kruskal versus Jarník-Prim I

Kruskal wins for very sparse graphs

Prim seems to win for denser graphs

Switching point is unclear

– How is the input represented?

– How many decreaseKeys are performed by JP?

(average case: $n \log\frac{m}{n}$ [Noshita 85])

– Experimental studies are quite old [Moret Shapiro 91],

use slow graph representations for both algorithms,

and artificial inputs

see attached slides.


5.1 Filtering by Sampling Rather Than Sorting

R := random sample of r edges from E
F := MST(R)    // Wlog assume that F spans V
L := ∅    // “light edges” with respect to R
foreach e ∈ E do    // Filter
    C := the unique cycle in {e} ∪ F
    if e is not heaviest in C then
        L := L ∪ {e}
return MST(L ∪ F)
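The filter can be sketched in Python as follows. This is only an illustration of the logic: the heaviest edge on the tree path is found by a naive depth-first search (O(n) per edge), not by the interval-maxima structure the slides use later, and edges whose endpoints are not connected in F are simply kept. The helper names `kruskal`, `filter_mst`, and `path_max` are assumptions.

```python
import random


def kruskal(n, edges):
    """Minimum spanning forest of (u, v, c) edges on nodes 0..n-1."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    out = []
    for u, v, c in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            out.append((u, v, c))
            parent[ru] = rv
    return out


def filter_mst(n, edges, r, seed=0):
    rng = random.Random(seed)
    R = rng.sample(edges, min(r, len(edges)))      # random sample of r edges
    F = kruskal(n, R)
    adj = [[] for _ in range(n)]
    for u, v, c in F:
        adj[u].append((v, c))
        adj[v].append((u, c))

    def path_max(u, v):
        """Heaviest weight on the F-path from u to v, or None if F does not connect them."""
        stack = [(u, -1, float("-inf"))]
        while stack:
            x, par, best = stack.pop()
            if x == v:
                return best
            for y, c in adj[x]:
                if y != par:
                    stack.append((y, x, max(best, c)))
        return None

    L = []                                          # "light" edges with respect to R
    for u, v, c in edges:
        heaviest = path_max(u, v)
        if heaviest is None or c < heaviest:        # e is not heaviest on its cycle
            L.append((u, v, c))
    return kruskal(n, L + F)
```

The final call works on L ∪ F only, which by the lemma on the next slide is expected to be much smaller than E for a suitable sample size r.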


5.1.1 Analysis

[Chan 98, KKK 95]

Observation: e ∈ L only if e ∈ MST(R ∪ {e}).

(Otherwise e could replace some heavier edge in F.)

Lemma 4. $E[|L \cup F|] \le \frac{mn}{r}$


MST Verification by Interval Maxima

Number the nodes by the order they were added to the MST by Prim’s algorithm.

$w_i$ = weight of the edge that inserted node i.

Largest weight on path(u,v) = $\max\{w_j : u < j \le v\}$.

[Figure: example tree and the array of weights $w_i$ in insertion order.]


Interval Maxima

Preprocessing: build n log n size array PreSuf.

[Figure: example array with PreSuf levels 0–3 and a query for i = 2, j = 6.]

To find max a[i..j]:

Find the level of the LCA: $\ell = \lfloor \log_2(i \oplus j) \rfloor$.

Return max(PreSuf[ℓ][i], PreSuf[ℓ][j]).

Example: 2 ⊕ 6 = 010 ⊕ 110 = 100 ⇒ ℓ = 2
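One way to realize the PreSuf table is sketched below in Python. The exact layout is an assumption inferred from the query rule: at level ℓ the array is cut into aligned blocks of size 2^(ℓ+1); the left half of each block stores suffix maxima (towards the middle), the right half stores prefix maxima (away from the middle). The function names are hypothetical.

```python
def build_presuf(a):
    """Returns presuf, where presuf[l][k] is the max of a between k and the middle
    of k's aligned block of size 2**(l+1)."""
    n = len(a)
    levels = max(1, (n - 1).bit_length())
    presuf = []
    for l in range(levels):
        half = 1 << l
        row = list(a)
        # mid is the first index of the right half of each block
        for mid in range(half, n + half, 2 * half):
            for k in range(min(mid, n) - 2, mid - half - 1, -1):   # suffix maxima, left half
                row[k] = max(row[k], row[k + 1])
            for k in range(mid + 1, min(mid + half, n)):           # prefix maxima, right half
                row[k] = max(row[k], row[k - 1])
        presuf.append(row)
    return presuf


def range_max(presuf, a, i, j):
    """max(a[i..j]) for i <= j, using two table lookups."""
    if i == j:
        return a[i]
    l = (i ^ j).bit_length() - 1      # floor(log2(i xor j)): level of the LCA
    return max(presuf[l][i], presuf[l][j])


a = [98, 15, 65, 75, 34, 77, 41, 62, 80]
presuf = build_presuf(a)
```

The highest bit in which i and j differ identifies the smallest aligned block that has i in its left half and j in its right half, so the two stored values together cover exactly a[i..j].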


A Simple Filter Based Algorithm

Choose $r = \sqrt{mn}$.

We get expected time

$T_{\mathrm{Prim}}(\sqrt{mn}) + O(n \log n + m) + T_{\mathrm{Prim}}\!\left(\frac{mn}{\sqrt{mn}}\right) = O(n \log n + m)$

The constant factor in front of the m is very small.


Results

10 000 nodes, SUN-Fire-15000, 900 MHz UltraSPARC-III+

[Figure: Worst Case Graph. Time per edge [ns] (0–600) versus edge density (0.1–1); curves: Prim, Filter.]

Results

10 000 nodes, SUN-Fire-15000, 900 MHz UltraSPARC-III+

[Figure: Random Graph. Time per edge [ns] (0–600) versus edge density (0.1–1); curves: Prim, Filter.]

Results

10 000 nodes, NEC SX-5 Vector Machine, “worst case”

[Figure: time per edge [ns] (0–1400) versus edge density (0.1–1); curves: Prim (Scalar), I-Max (Scalar), Prim (Vectorized), I-Max (Vectorized).]

Edge Contraction

Let {u,v} denote an MST edge.

Eliminate v:

forall (w,v) ∈ E do
    E := E \ {(w,v)} ∪ {(w,u)}    // but remember original terminals

[Figure: example contraction; the relinked edge of weight 7 is annotated “was {2,3}”.]


Borůvka’s Node Reduction Algorithm

For each node find the lightest incident edge.

Include these edges into the MST (cut property)

and contract them.

Time O(m)

At least halves the number of remaining nodes
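Repeating this node reduction until a single node remains yields a complete MST algorithm. The Python sketch below does that; instead of physically contracting edges it keeps a component label per node, and it breaks weight ties by edge index so that the chosen edges cannot form a cycle. Both choices, and the name `boruvka_mst`, are assumptions of this sketch.

```python
def boruvka_mst(n, edges):
    """edges: list of (u, v, c) on nodes 0..n-1. Returns minimum spanning forest edges."""
    comp = list(range(n))              # comp[v]: label of the contracted node containing v
    mst = []
    while True:
        best = {}                      # contracted node -> (weight, edge index) of lightest edge
        for idx, (u, v, c) in enumerate(edges):
            cu, cv = comp[u], comp[v]
            if cu != cv:
                for x in (cu, cv):
                    if x not in best or (c, idx) < best[x]:
                        best[x] = (c, idx)
        if not best:                   # no edge leaves any component: done
            return mst

        parent = list(range(n))        # small union-find over the current labels

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for c, idx in set(best.values()):      # each selected edge once
            u, v, _ = edges[idx]
            a, b = find(comp[u]), find(comp[v])
            if a != b:
                parent[a] = b                  # contract the edge
                mst.append((u, v, c))
        comp = [find(x) for x in comp]


tree = boruvka_mst(4, [(0, 1, 5), (1, 2, 7), (2, 3, 2), (0, 3, 9)])
```

Each round scans all edges once, and since every contracted node selects an edge, the number of nodes at least halves per round.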


5.2 Simpler and Faster Node Reduction

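for i := n downto n′+1 do
    pick a random node v
    find the lightest edge (u,v) out of v and output it
    contract (u,v)

$E[\mathrm{degree}(v)] \le 2m/i$

$\sum_{n' < i \le n} \frac{2m}{i} = 2m\left(\sum_{0 < i \le n} \frac{1}{i} - \sum_{0 < i \le n'} \frac{1}{i}\right) \approx 2m(\ln n - \ln n') = 2m \ln\frac{n}{n'}$

[Figure: example run on the four-node graph; the steps output the edges {2,3} and {1,2}, and relinked edges are annotated with their original endpoints, e.g., 9 (was {1,4}) and 7 (was {2,3}).]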


5.3 Randomized Linear Time Algorithm

1. Factor 8 node reduction (3× Borůvka or sweep algorithm). O(m+n).

2. R ⇐ m/2 random edges. O(m+n).

3. F ⇐ MST(R) [recursively].

4. Find light edges L (edge reduction). O(m+n).
$E[|L|] \le \frac{m \cdot n/8}{m/2} = n/4$.

5. T ⇐ MST(L ∪ F) [recursively].

$T(n,m) \le T(n/8, m/2) + T(n/8, n/4) + c(n+m)$

$T(n,m) \le 2c(n+m)$ fulfills this recurrence.
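To see why the bound fulfills the recurrence, one can substitute it into the right-hand side. The following short check is added for illustration and is not part of the original slide:

```latex
\begin{align*}
T(n/8, m/2) + T(n/8, n/4) + c(n+m)
  &\le 2c\left(\tfrac{n}{8} + \tfrac{m}{2}\right)
     + 2c\left(\tfrac{n}{8} + \tfrac{n}{4}\right) + c(n+m) \\
  &= c\left(\tfrac{n}{4} + m\right) + c\,\tfrac{3n}{4} + c(n+m) \\
  &= 2c(n+m).
\end{align*}
```

The factor 8 in the node reduction is what makes the two recursive calls together cost no more than the non-recursive work c(n+m).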


5.4 External MSTs

Semiexternal Algorithms

Assume n≤M−2B:

run Kruskal’s algorithm using external sorting


Streaming MSTs

If M is somewhat larger still, we can even do it with m/B I/Os:

T := ∅    // current approximation of MST
while there are any unprocessed edges do
    load any Θ(M) unprocessed edges E′
    T := MST(T ∪ E′)    // for any internal MST alg.

Corollary: we can do it with linear expected internal work

Disadvantages compared to Kruskal:

Slower in practice

Smaller max. n
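The streaming loop above can be sketched in Python as follows. The `chunk_size` parameter stands in for Θ(M), and a plain in-memory Kruskal is used as the internal MST algorithm; both are assumptions for illustration, and no actual external-memory I/O is modeled.

```python
def kruskal(n, edges):
    """Minimum spanning forest of (u, v, c) edges on nodes 0..n-1."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    out = []
    for u, v, c in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            out.append((u, v, c))
            parent[ru] = rv
    return out


def streaming_mst(n, edge_stream, chunk_size):
    """Reads the edges once, keeping only the current forest (at most n-1 edges) and one chunk."""
    tree = []                                   # current approximation of the MST
    chunk = []
    for e in edge_stream:
        chunk.append(e)
        if len(chunk) == chunk_size:
            tree = kruskal(n, tree + chunk)     # T := MST(T ∪ E′)
            chunk = []
    if chunk:
        tree = kruskal(n, tree + chunk)
    return tree


stream = iter([(0, 3, 9), (1, 2, 7), (0, 1, 5), (2, 3, 2)])
tree = streaming_mst(4, stream, chunk_size=2)
```

Correctness rests on the cycle property: an edge discarded in one round is the heaviest on some cycle and can never be needed later.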


General External MST

while n > M−2B do
    perform some node reduction
use semi-external Kruskal

[Figure: node reduction shrinks n until n′ < M.]

Theory: O(sort(m)) expected I/Os by externalizing the linear time algorithm

(i.e., node reduction + edge reduction).


External Implementation I: Sweeping

π : random permutation V → V
sort edges (u,v) by max(π(u),π(v))
for i := n downto n′+1 do
    pick the node v with π(v) = i
    find the lightest edge (u,v) out of v and output it
    contract (u,v)

[Figure: sweep line over the nodes; one edge of the current node v is output and its other edges are relinked to u.]

Problem: how to implement relinking?


Relinking Using Priority Queues

Q : priority queue    // Order: max node, then min edge weight
foreach ({u,v}, c) ∈ E do Q.insert(({π(u),π(v)}, c, {u,v}))
current := n+1
loop
    ({u,v}, c, {u0,v0}) := Q.deleteMin()
    if current ≠ max{u,v} then
        if current = M+1 then return
        output {u0,v0}, c
        current := max{u,v}
        connect := min{u,v}
    else Q.insert(({min{u,v}, connect}, c, {u0,v0}))

≈ $\mathrm{sort}(10\,m \ln\frac{n}{M})$ I/Os with opt. priority queues [Sanders 00]

Problem: Compute bound


Sweeping with linear internal work

Assume $m = O(M^2/B)$

k = Θ(M/B) external buckets with n/k nodes each

M nodes for last “semiexternal” bucket

split current bucket into internal buckets for each node

Sweeping:

Scan current internal bucket twice:

1. Find minimum

2. Relink

New external bucket: scan and put in internal buckets

Large degree nodes: move to semiexternal bucket

[Figure: sequence of external buckets, the current bucket split into internal buckets, and the semiexternal bucket.]


Experiments

Instances from “classical” MST study [Moret Shapiro 1994]

sparse random graphs

random geometric graphs

grids

O(sort(m)) I/Os for planar graphs by removing parallel edges!

Other instances are rather dense or designed to fool specific algorithms.

[Figure: test machine: 2× Xeon (4 threads), Intel E7500 chipset, 1 GB DDR RAM, PCI buses with disk controllers and channels, 8 × 80 GB disks.]

m ≈ 2n

[Figure: t/m [ms] (1–6) versus m/1 000 000 (5–2560, logarithmic); legend: Kruskal, Prim, random, geometric, grid.]

m ≈ 4n

[Figure: t/m [ms] (1–6) versus m/1 000 000 (5–2560, logarithmic); legend: Kruskal, Prim, random, geometric.]

m ≈ 8n

[Figure: t/m [ms] (1–6) versus m/1 000 000 (5–2560, logarithmic); legend: Kruskal, Prim, random, geometric.]

MST Summary

Edge reduction helps for very dense, “hard” graphs

A fast and simple node reduction algorithm

4× fewer I/Os than previous algorithms

Refined semiexternal MST, use as base case

Simple pseudo random permutations (no I/Os)

A fast implementation

Experiments with huge graphs (up to $n = 4 \cdot 10^9$ nodes)

External MST is feasible


Result

Easy to implement externally

n′ = M: semiexternal Kruskal algorithm

In total $O(\mathrm{sort}(m \ln\frac{n}{M}))$ expected I/Os

For realistic inputs at least 4× better than previously known algorithms

Implementation in <stxxl>; graphs of up to 96 GByte run “overnight”


Conclusions

Even fundamental, “simple” algorithmic problems

still raise interesting questions

Implementation and experiments are important and were

neglected by parts of the algorithms community

Theory is an (at least) equally important, essential component

of the algorithm design process


Open Problems

New experiments for (improved) Kruskal versus

Jarník-Prim

Realistic (huge) inputs

Parallel and/or external algorithms

A practical linear time Algorithm (Hagerup 2010?)

Implementations for other graph problems


More Algorithm Engineering on Graphs

Count triangles in very large graphs. Interesting as a

measure of clusteredness. (Cooperation with AG Wagner)

External BFS, SSSP (among others, Vitaly Osipov)

Maximum flows: Is the theoretically best algorithm any

good? (Yes and no)

Approximate max. weighted matching

Graph partitioning KaHiP

Route planning


Maximal Flows

Theory: $O(m\Lambda \log(n^2/m) \log U)$ binary blocking flow algorithm with $\Lambda = \min\{m^{1/2}, n^{2/3}\}$ [Goldberg-Rao-97].

Problem: best case ≈ worst case

[Hagerup Sanders Träff WAE 98]:

Implementable generalization

best case ≪ worst case

best algorithms for some “difficult” instances


More On Experimental Methodology

Scientific Method:

An experiment needs a possible outcome that falsifies a hypothesis

Reproducible

– keep data/code for at least 10 years

– clear and detailed description in papers / TRs

– share instances and code

[Figure: the algorithm engineering cycle: realistic models, design, analysis (deduction, performance guarantees), implementation, experiments (induction, falsifiable hypotheses), together with algorithm libraries, application engineering, applications, and benchmark inputs.]

Quality Criteria

Beat the state of the art, globally – (not your own toy codes

or the toy codes used in your community!)

Clearly demonstrate this !

– both codes use same data ideally from accepted

benchmarks (not just your favorite data!)

– comparable machines or fair (conservative) scaling

– Avoid incomparable results like: “Yeah, we are worse but twice as fast”

– real world data wherever possible

– as many different inputs as possible

– it’s fine if you are better just on some (important) inputs


Not Here but Important

describing the setup

finding sources of measurement errors

reducing measurement errors (averaging, median, unloaded machine, …)

measurements in the creative phase of experimental

algorithmics.


The Starting Point

(Several) Algorithm(s)

A few quantities to be measured: time, space, solution quality, comparisons, cache faults, … There may also be measurement errors.

An unlimited number of potential inputs: condense to a few characteristic ones (size, |V|, |E|, … or problem instances from applications)

Usually there is not a lack but an abundance of data, unlike in many other sciences


The Process

Waterfall model?

1. Design

2. Measurement

3. Interpretation

Perhaps the paper should at least look like that.


The Process

Eventually stop asking questions (advisors/referees, listen!)

build measurement tools

automate (re)measurements

Choice of experiments driven by risk and opportunity

Distinguish modes:

explorative: many different parameter settings, interactive, short turnaround times

consolidating: many large instances, standardized measurement conditions, batch mode, many machines


Of Risks and Opportunities

Example: Hypothesis = my algorithm is the best

big risk: untried main competitor

small risk: tuning of a subroutine that takes 20 % of the time.

big opportunity: use algorithm for a new application

new input instances


Presenting Data from Experiments in Algorithmics

Restrictions

black and white: easy and cheap printing

2D (stay tuned)

no animation

no realism desired


Basic Principles

Minimize nondata ink

(form follows function, not a beauty contest, …)

Letter size ≈ surrounding text

Avoid clutter and overwhelming complexity

Avoid boredom (too little data per m²).

Make the conclusions evident


Tables

+ easy

− easy overuse

+ accurate values (≠ 3D)

+ more compact than bar chart

+ good for unrelated instances (e.g. solution quality)

− boring

− no visual processing

rule of thumb that “tables usually outperform a graph for small

data sets of 20 numbers or less” [Tufte 83]

Curves in main paper, tables in appendix?


2D Figures

default: x = input size, y = f (execution time)

x Axis

Choose unit to eliminate a parameter?

[Figure: improvement min(T*_1, T*_∞)/T*_* (1–1.8) versus k/t0 (10–10^6, logarithmic); curves: P=64, P=1024, P=16384.]

length k fractional tree broadcasting. latency t0 + k

x Axis

logarithmic scale?

[Figure: improvement min(T*_1, T*_∞)/T*_* (1–1.8) versus k/t0 (10–10^6, logarithmic); curves: P=64, P=1024, P=16384.]

yes if x range is wide

x Axis

logarithmic scale, powers of two (or √2)

[Figure: (T(deleteMin) + T(insert))/log N [ns] (0–150) versus N (1024 to 2^23, logarithmic); curves: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap.]

with tic marks, (plus a few small ones)

gnuplot

set xlabel "N"

set ylabel "(time per operation)/log N [ns]"

set xtics (256, 1024, 4096, 16384, 65536, "2^18" 262144,

set size 0.66, 0.33

set logscale x 2

set data style linespoints

set key left

set terminal postscript portrait enhanced 10

set output "r10000timenew.eps"

plot [1024:10000000][0:220]\

"h2r10000new.log" using 1:3 title "bottom up binary heap"

"h4r10000new.log" using 1:3 title "bottom up aligned 4-ary

"knr10000new.log" using 1:3 title "sequence heap" with linespoints

Data File

256 703.125 87.8906

512 729.167 81.0185

1024 768.229 76.8229

2048 830.078 75.4616

4096 846.354 70.5295

8192 878.906 67.6082

16384 915.527 65.3948

32768 925.7 61.7133

65536 946.045 59.1278

131072 971.476 57.1457

262144 1009.62 56.0902

524288 1035.69 54.51

1048576 1055.08 52.7541

2097152 1113.73 53.0349

4194304 1150.29 52.2859

8388608 1172.62 50.9836
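The numbers suggest that the third column is the second column divided by log₂ N, which matches the y-axis label "(time per operation)/log N" and the `using 1:3` clause in the gnuplot script. That reading is inferred from the data rather than stated on the slide. A small Python sketch of the normalization, with a hypothetical helper name:

```python
import math

# A few (N, time per operation [ns]) rows copied from the data file above.
rows = [(256, 703.125), (1024, 768.229), (8388608, 1172.62)]


def per_log_n(n, time_ns):
    """Time per operation divided by log2(N), the quantity plotted on the y axis."""
    return time_ns / math.log2(n)


normalized = [(n, round(per_log_n(n, t), 4)) for n, t in rows]
```

Dividing by the theoretically expected growth factor is what makes a curve that follows the theory appear roughly horizontal.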

x Axis

linear scale for ratios or small ranges (#processor, …)

[Figure: time per edge [ns] (0–600) versus edge density (0.1–1, linear); curves: Prim, Filter.]

x Axis

An exotic scale: arrival rate 1 − ε of saturation point

[Figure: average delay (1–8) versus 1/ε (2–20); curves: nonredundant, mirror, ring shortest queue, ring with matching, shortest queue.]

y Axis

Avoid log scale! Scale such that theory gives ≈ horizontal lines

[Figure: (T(deleteMin) + T(insert))/log N [ns] (0–150) versus N (1024 to 2^23, logarithmic); curves: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap.]

but give easy interpretation of the scaling function

y Axis

give units

[Figure: (T(deleteMin) + T(insert))/log N [ns] (0 to 150) versus N (1024 to 2^23); curves: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap]

y Axis

start from 0 if this does not waste too much space

[Figure: (T(deleteMin) + T(insert))/log N [ns] (0 to 150) versus N (1024 to 2^23); curves: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap]

you may assume readers to be out of Kindergarten

y Axis

clip outclassed algorithms

[Figure: average delay (1 to 8) versus 1/ε (2 to 20); curves: nonredundant, mirror, ring shortest queue, ring with matching, shortest queue]

y Axis

vertical size: weighted average of the slants of the line segments in the figure should be about 45° [Cleveland 94]

[Figure: improvement min(T*_1, T*_∞)/T*_* (1 to 1.8) versus k/t0 (10 to 10^6); curves: P=64, P=1024, P=16384]
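The 45° rule can be turned into a small computation: choose the height/width ratio of the plotting area so that the weighted mean orientation of the line segments is 45°. The sketch below is one possible reading, not necessarily the exact procedure in [Cleveland 94]. It assumes that segments are weighted by their drawn length and that only the absolute slant matters; the function name `bank_to_45` is hypothetical.

```python
import math

def bank_to_45(xs, ys, lo=1e-6, hi=1e6, iterations=200):
    """Height/width ratio at which the length-weighted mean orientation
    of the segments between consecutive points is 45 degrees.

    Data are normalized to the unit square first, so the result describes
    the plotting area, not the data units.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    if x_range == 0 or y_range == 0:
        raise ValueError("need a nonzero range on both axes")
    # Absolute, range-normalized extents of every segment.
    segments = [
        (abs(x1 - x0) / x_range, abs(y1 - y0) / y_range)
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    ]

    def mean_orientation(aspect):
        total_length = 0.0
        weighted = 0.0
        for dx, dy in segments:
            length = math.hypot(dx, aspect * dy)
            weighted += math.atan2(aspect * dy, dx) * length
            total_length += length
        return weighted / total_length

    # Bisection on a log scale: the mean orientation tends to 0 for very
    # flat plots and to 90 degrees for very tall ones.
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if mean_orientation(mid) < math.pi / 4:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# A symmetric "V" spanning the whole y-range twice should be drawn
# about half as high as wide.
print(bank_to_45([0, 1, 2], [0, 1, 0]))
```

For the "V" example each segment covers half the x-range and the full y-range, so the orientation is atan(2 · aspect) and the result is approximately 0.5.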

y Axis

graph a bit wider than high, e.g., golden ratio [Tufte 83]

[Figure: improvement min(T*_1, T*_∞)/T*_* (1 to 1.8) versus k/t0 (10 to 10^6); curves: P=64, P=1024, P=16384]


Multiple Curves

+ high information density

+ better than 3D (reading off values)

− Easily overdone

≤ 7 smooth curves

Reducing the Number of Curves

use ratios

[Figure: improvement min(T*_1, T*_∞)/T*_* (1 to 1.8) versus k/t0 (10 to 10^6); curves: P=64, P=1024, P=16384]
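Plotting a ratio replaces several absolute curves by a single one. The following minimal sketch computes such an improvement ratio, the better of two baselines divided by the candidate, in the spirit of the y-axis label of the figure. The function name `improvement` and all numbers are hypothetical.

```python
def improvement(baseline_a, baseline_b, candidate):
    """Per-x ratio min(baseline_a, baseline_b) / candidate.

    Values above 1 mean the candidate beats the better of the two baselines.
    """
    if not (len(baseline_a) == len(baseline_b) == len(candidate)):
        raise ValueError("all series must cover the same x-values")
    return [min(a, b) / c for a, b, c in zip(baseline_a, baseline_b, candidate)]

# Hypothetical running times of three algorithms at four x-values.
t_first = [10.0, 20.0, 40.0, 80.0]
t_second = [30.0, 30.0, 30.0, 30.0]
t_new = [8.0, 16.0, 20.0, 24.0]

print(improvement(t_first, t_second, t_new))  # [1.25, 1.25, 1.5, 1.25]
```

Three curves per parameter setting collapse into one, which leaves room for several settings (such as different P) in the same graph.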


Reducing the Number of Curves

omit curves

outclassed algorithms (for case shown)

equivalent algorithms (for case shown)
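Whether an algorithm is outclassed is a judgment call, but a simple filter can propose candidates. The sketch below flags curves that are worse than the best curve by at least a fixed factor at every x-value. The threshold of 2, the name `outclassed`, and the sample data are assumptions for illustration only.

```python
def outclassed(curves, factor=2.0):
    """Names of curves that are at least `factor` times worse than the best
    curve at every x-value (smaller y is better).

    `curves` maps a name to a list of y-values sampled at the same x-values.
    """
    if not curves:
        return []
    n_points = len(next(iter(curves.values())))
    best = [min(ys[i] for ys in curves.values()) for i in range(n_points)]
    return [
        name
        for name, ys in curves.items()
        if all(y >= factor * b for y, b in zip(ys, best))
    ]

# Hypothetical average delays of three algorithms at three x-values.
delays = {
    "a": [1.0, 2.0, 3.0],
    "b": [1.5, 2.5, 2.5],
    "c": [5.0, 6.0, 7.0],
}
print(outclassed(delays))  # ['c']
```

A curve flagged this way can be dropped from the graph and mentioned in the text instead.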

Reducing the Number of Curves

split into two graphs

[Figure: average delay (1 to 8) versus 1/ε (2 to 20); curves: nonredundant, mirror, ring shortest queue, ring with matching, shortest queue]

Reducing the Number of Curves

split into two graphs

[Figure: average delay (1 to 2) versus 1/ε (2 to 20); curves: shortest queue, hybrid, lazy sharing, matching]

Keeping Curves apart: log y scale

[Figure: time per edge [ns] (0 to 1400) versus edge density (0.1 to 1); curves: Prim (Scalar), I-Max (Scalar), Prim (Vectorized), I-Max (Vectorized)]

Keeping Curves apart: smoothing

[Figure: average delay (1 to 8) versus 1/ε (2 to 20); curves: nonredundant, mirror, ring shortest queue, ring with matching, shortest queue]

Keys

[Figure: average delay (1 to 8) versus 1/ε (2 to 20); curves: nonredundant, mirror, ring shortest queue, ring with matching, shortest queue]

same order as curves

Keys

place in white space

[Figure: average delay (1 to 2) versus 1/ε (2 to 20); curves: shortest queue, hybrid, lazy sharing, matching]

consistent in different figures

Deadly Sins

1. forgetting to explain the axes

2. connecting unrelated points by lines

3. mindless use/overinterpretation of double-log plots

4. cryptic abbreviations

5. microscopic lettering

6. excessive complexity

7. pie charts

Arranging Instances

bar charts

stack components of execution time

careful with shading

[Figure: stacked bar chart; components from bottom to top: preprocessing, phase 1, phase 2, postprocessing]

Arranging Instances

scatter plots

[Figure: scatter plot of n / active set size (1 to 1000) versus n/m (1 to 1000), both axes logarithmic]

Measurements and Connections

straight lines between points do not imply a claim of linear interpolation

this is different with higher-order curves

showing no points implies an even stronger claim; good for very dense, smooth measurements

Grids and Ticks

avoid grids or make them light gray

usually round numbers for tick marks!

sometimes plot important values on the axis
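Round tick values can be chosen automatically. A common heuristic, sketched below, restricts the tick spacing to 1, 2 or 5 times a power of ten. This is a general technique and not something prescribed by these slides; the names `nice_step` and `round_ticks` and the default of at most six intervals are assumptions.

```python
import math

def nice_step(span, max_intervals=6):
    """Smallest step of the form 1, 2 or 5 times a power of ten that
    covers `span` with at most `max_intervals` intervals."""
    if span <= 0:
        raise ValueError("span must be positive")
    raw = span / max_intervals
    power = 10.0 ** math.floor(math.log10(raw))
    for multiple in (1, 2, 5, 10):
        if multiple * power >= raw:
            return multiple * power

def round_ticks(lo, hi, max_intervals=6):
    """Tick positions at multiples of a nice step that enclose [lo, hi]."""
    step = nice_step(hi - lo, max_intervals)
    first = math.floor(lo / step)
    last = math.ceil(hi / step)
    return [k * step for k in range(first, last + 1)]

print(round_ticks(0, 150))  # [0.0, 50.0, 100.0, 150.0]
```

Important values that are not round, such as a theoretical bound, can still be added to the axis by hand.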

[Figure: measurement (0 to 20) versus n (32 to 32768), compared with the curves log n + log ln n + 1 and log n]

errors may not be of statistical nature!


3D

− you cannot read off absolute values

− interesting parts may be hidden

− only one surface

+ good impression of shape


Caption

what is displayed

how has the data been obtained

surrounding text has more.

Check List

Should the experimental setup from the exploratory phase be redesigned to increase conciseness or accuracy?

What parameters should be varied? What variables should be measured? How are parameters chosen that cannot be varied?

Can tables be converted into curves, bar charts, scatter plots or any other useful graphics?

Should tables be added in an appendix or on a web page?

Should a 3D-plot be replaced by collections of 2D-curves?

Can we reduce the number of curves to be displayed?

How many figures are needed?

Scale the x-axis to make y-values independent of some parameters?

Should the x-axis have a logarithmic scale? If so, do the x-values used for measuring have the same basis as the tick marks?

Should the x-axis be transformed to magnify interesting subranges?

Is the range of x-values adequate?

Do we have measurements for the right x-values, i.e., nowhere too dense or too sparse?

Should the y-axis be transformed to make the interesting part of the data more visible?

Should the y-axis have a logarithmic scale?

Is it misleading to start the y-range at the smallest measured value?

Clip the range of y-values to exclude useless parts of curves?

Can we use banking to 45°?

Are all curves sufficiently well separated?

Can noise be reduced using more accurate measurements?

Are error bars needed? If so, what should they indicate? Remember that measurement errors are usually not random variables.

Use points to indicate for which x-values actual data is available.

Connect points belonging to the same curve.

Only use splines for connecting points if interpolation is sensible.

Do not connect points belonging to unrelated problem instances.

Use different point and line styles for different curves.

Use the same styles for corresponding curves in different graphs.

Place labels defining point and line styles in the right order and without concealing the curves.

Captions should make figures self-contained.

Give enough information to make experiments reproducible.
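One checklist item asks whether the measured x-values share the basis of the tick marks on a logarithmic axis. A minimal sketch of that idea follows, assuming base 2 and the range 2^8 to 2^23 seen in the data file; the helper name `powers` is hypothetical.

```python
def powers(base, lo_exp, hi_exp, step=1):
    """base**lo_exp, ..., base**hi_exp, taking every `step`-th exponent."""
    return [base ** e for e in range(lo_exp, hi_exp + 1, step)]

# Measure at every power of two from 2^8 to 2^23 ...
x_values = powers(2, 8, 23)
# ... and label every second power, so each tick is also a measured point.
ticks = powers(2, 10, 22, step=2)

print(len(x_values), ticks[:3])
```

Because both lists come from the same base, every tick coincides with a measurement and values can be read off without interpolation.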
