Trip Report FINAL MEETING AND SUMMER SCHOOL OF DFG PRIORITY PROGRAM ALGORITHM ENGINEERING

Upload: rafe-greene

Post on 17-Jan-2016


DFG PP 1307: Algorithm Engineering

• 28 research projects
• 267 publications
• 17 software projects, e.g.:
  ◦ Multi-Core STL (MCSTL) – now gcc parallel mode
  ◦ STL for Extra Large Datasets (STXXL)

2014-10-27

DFG Priority Program: nationwide funding program over 6 years for up to 30 individual projects


Recap: Algorithm Engineering

1. Realistic models: hardware and problem
2. Design: efficient, implementable algorithms
3. Analyze: beyond worst-case
4. Implement: with hardware peculiarities in mind
5. Experiment: repeatable, thorough interpretation

“The distance between theory and practice is closer in theory than in practice”
[Y. Matias (Google) in his invited talk at ESA ’12]


Final Meeting (17.09.2014)

9 talks, covering a wide range of topics
◦ route planning in road and public transport networks
◦ graph clustering and partitioning
◦ data compression
◦ linear and mixed integer optimization
◦ sequence analysis

No Indico used; slides only partially available


Summer School (18.-19.09.2014)

Two days of lectures and hands-on sessions
◦ data compression (lecture only)
◦ linear and mixed integer optimization
◦ network analysis: graph clustering and partitioning
◦ shortest paths algorithms (lecture only)

About 30 PhD students
Lots of discussion among students and lecturers


Selected Topics


Network Analysis

Networks are everywhere
◦ Computer networks
◦ Social networks
◦ …


Network Analysis

Network analysis mainly concerned with complex networks
◦ Small diameter
◦ Varying degree distribution
◦ Lots of triangles
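The three properties named above can be measured directly on a small graph. The following is a minimal sketch in plain Python, assuming an undirected, connected graph stored as a dictionary of neighbour sets; the toy graph and all function names are made up for illustration and are not taken from the talks.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Largest hop distance between any two vertices (graph assumed connected)."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

def degree_histogram(adj):
    """Map degree -> number of vertices with that degree."""
    hist = {}
    for neighbours in adj.values():
        hist[len(neighbours)] = hist.get(len(neighbours), 0) + 1
    return hist

def count_triangles(adj):
    """Number of triangles; each one is counted once via u < v < w."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total

# Hypothetical toy graph: a hub (0) connected to 1..4, plus edges 1-2 and 3-4.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

print(diameter(adj), degree_histogram(adj), count_triangles(adj))
```

The all-pairs BFS used for the diameter is quadratic in the graph size, so this only illustrates the definitions; it is not how a toolkit would handle large networks.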


Network Analysis

GRAPH CLUSTERING
◦ Find (non-overlapping) internally dense, externally sparse subgraphs
◦ Unknown: number of subgraphs, their size
◦ Goals / Applications:
  o Uncover community structure (analysis, ...)
  o Prepartition network (distributed storage, ...)

GRAPH PARTITIONING
◦ Partition vertex set into k (nearly) equally sized blocks
◦ Objective functions aim at small interfaces
◦ Applications:
  o Numerical simulations
  o Route planning
  o Distributed graph algorithms
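For graph partitioning, "small interfaces" is commonly measured as the edge cut, and "(nearly) equally sized" as an imbalance factor. A minimal sketch of both measures, assuming an unweighted graph given as an edge list and a partition given as a vertex-to-block dictionary (the grid example and the names are hypothetical):

```python
def edge_cut(edges, block):
    """Number of edges whose endpoints lie in different blocks."""
    return sum(1 for u, v in edges if block[u] != block[v])

def imbalance(block, k):
    """Largest block size divided by the ideal size n / k (1.0 = perfectly balanced)."""
    sizes = [0] * k
    for b in block.values():
        sizes[b] += 1
    return max(sizes) / (len(block) / k)

# Hypothetical example: a 2 x 4 grid graph, vertices numbered row by row.
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7),
         (0, 4), (1, 5), (2, 6), (3, 7)]
left_right = {0: 0, 1: 0, 4: 0, 5: 0, 2: 1, 3: 1, 6: 1, 7: 1}
top_bottom = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}

for name, block in (("left/right", left_right), ("top/bottom", top_bottom)):
    print(name, "cut =", edge_cut(edges, block), "imbalance =", imbalance(block, 2))
```

Both partitions are perfectly balanced, but the left/right split cuts fewer edges, which is what a partitioner would prefer under this objective.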


Network Analysis

GRAPH CLUSTERING
Algorithms:
◦ Label propagation algorithm
◦ Louvain greedy method
Many different metrics:
◦ Conductance
◦ Expansion
◦ Modularity
◦ …

GRAPH PARTITIONING
Algorithms:
◦ Size-constrained label propagation
◦ Diffusion-based partitioning
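Of the clustering metrics listed, modularity is the one the Louvain method optimizes. A minimal sketch of the standard definition for an unweighted, undirected graph, Q = Σ_c [ m_c / m − (deg_c / 2m)² ], where m_c is the number of edges inside community c and deg_c the sum of its vertex degrees. The example graph and the function name are assumptions for illustration:

```python
def modularity(edges, community):
    """Modularity Q of a clustering of an unweighted, undirected graph."""
    m = len(edges)
    internal = {}    # community -> number of edges inside it
    degree_sum = {}  # community -> sum of the degrees of its vertices
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical example: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
everything = {v: "A" for v in range(6)}

print(modularity(edges, two_triangles))  # 5/14, about 0.357
print(modularity(edges, everything))     # 0.0
```

Putting every vertex into one community always gives Q = 0, so the positive value for the two-triangle clustering reflects that it is denser inside than expected by degree alone.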


Network Analysis

NetworKit:
◦ Toolkit developed during the project for network analysis – C++ with Python bindings
◦ Includes wide range of tools for graph analysis
◦ Excellent IPython notebook-based tutorial
◦ Includes algorithms proposed for evolving networks
◦ Analyze changing social networks – e.g. ITI email graph

Interest for CERN:
◦ Community detection on the grid → planning of file transfers
◦ Track reconstruction → ongoing work


Shortest Paths and Routing

Problem: find shortest path between s and t in weighted graph G

Algorithms:
◦ Dijkstra’s algorithm too slow for large graphs
◦ Manifold speedup techniques [survey]
  o A*: search with Euclidean bounds (classic)
  o ALT: A* search with landmarks, preprocessing computes distances to landmarks
  o Contraction Hierarchies: introduce shortcuts between “important” vertices of the graph
  o Hub Labeling: every vertex stores distance to several hubs, covering the graph
◦ Most techniques rely on (more or less) expensive pre-computations
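To make the ALT idea concrete: preprocessing runs Dijkstra once per landmark, and each query is an A* search whose lower bound comes from the triangle inequality, |d(L, v) − d(L, t)| ≤ d(v, t). The following is a minimal sketch for an undirected, connected graph; the grid graph, its weights, the landmark choice and the function names are all assumptions for illustration.

```python
import heapq

def dijkstra(adj, source):
    """Distances from source to all reachable vertices."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def alt_query(adj, landmark_dist, s, t):
    """A* from s to t with landmark bounds; returns (distance, settled vertices)."""
    def h(v):
        # Lower bound on d(v, t) from the triangle inequality, best over all landmarks.
        return max(abs(d[v] - d[t]) for d in landmark_dist)

    dist = {s: 0}
    heap = [(h(s), s)]
    settled = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == t:
            return dist[u], len(settled)
        for v, w in adj[u]:
            nd = dist[u] + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd + h(v), v))
    return float("inf"), len(settled)

# Hypothetical 6 x 6 grid with arbitrary positive edge weights.
n = 6
adj = {(r, c): [] for r in range(n) for c in range(n)}
for r in range(n):
    for c in range(n):
        for dr, dc in ((0, 1), (1, 0)):
            rr, cc = r + dr, c + dc
            if rr < n and cc < n:
                w = 1 + (r * 7 + c * 3) % 4
                adj[(r, c)].append(((rr, cc), w))
                adj[(rr, cc)].append(((r, c), w))

landmarks = [(0, 0), (n - 1, n - 1)]                    # assumed landmark choice
landmark_dist = [dijkstra(adj, L) for L in landmarks]   # preprocessing step

print(alt_query(adj, landmark_dist, (0, 0), (5, 5)))
```

How much the search space shrinks depends strongly on where the landmarks sit relative to the query, which is why landmark selection is part of the preprocessing in practice.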


Shortest Paths and Routing

Problem: User-defined cost functions render pre-computations futile

Solution: Three-stage processing [Delling et al. 2013]
1. Metric-independent pre-processing (≈ hr)
   Recursively partition graph
   Generate arcs between entry and exit nodes to neighboring partitions
2. Metric-dependent pre-processing (≈ s)
   Compute metric between all shortcut arcs
3. Query (≈ μs)
   Find shortest path in contracted graph and unpack it in original one
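The separation into three stages can be illustrated with a strongly simplified, single-level sketch. It is not the implementation from the cited work: the partition is given by hand instead of being computed recursively, path unpacking is omitted, and all function names and the toy grid are hypothetical. What it does show is that stage 1 never looks at weights, stage 2 is rerun per metric, and stage 3 only searches the overlay plus the two cells containing s and t.

```python
import heapq

def dijkstra(adj, source, allowed=None):
    """Distances from source; if `allowed` is given, never leave that vertex set."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj.get(u, ()):
            if allowed is not None and v not in allowed:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def preprocess_topology(edges, cell):
    """Stage 1 (metric-independent): cut edges, boundary vertices and members per cell."""
    cut = [(u, v) for u, v in edges if cell[u] != cell[v]]
    boundary, members = {}, {}
    for u, v in cut:
        boundary.setdefault(cell[u], set()).add(u)
        boundary.setdefault(cell[v], set()).add(v)
    for v, c in cell.items():
        members.setdefault(c, set()).add(v)
    return cut, boundary, members

def customize(edges, weight, cut, boundary, members):
    """Stage 2 (metric-dependent): shortcut lengths between boundary vertices of a cell."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append((v, weight[(u, v)]))
        adj.setdefault(v, []).append((u, weight[(u, v)]))
    overlay = {}
    for c, verts in boundary.items():
        for b in verts:
            inside = dijkstra(adj, b, allowed=members[c])
            for b2 in verts:
                if b2 != b and b2 in inside:
                    overlay.setdefault(b, []).append((b2, inside[b2]))
    for u, v in cut:  # cut edges connect the cells in the overlay
        overlay.setdefault(u, []).append((v, weight[(u, v)]))
        overlay.setdefault(v, []).append((u, weight[(u, v)]))
    return adj, overlay

def query(adj, overlay, cell, members, s, t):
    """Stage 3: search the overlay plus the original edges inside the cells of s and t."""
    local = members[cell[s]] | members[cell[t]]
    graph = {u: list(nbrs) for u, nbrs in overlay.items()}
    for u in local:
        graph.setdefault(u, []).extend((v, w) for v, w in adj[u] if v in local)
    return dijkstra(graph, s).get(t, float("inf"))

# Hypothetical 2 x 6 grid, split by hand into three cells of two columns each.
cols = 6
edges = [((r, c), (r, c + 1)) for r in range(2) for c in range(cols - 1)]
edges += [((0, c), (1, c)) for c in range(cols)]
cell = {(r, c): c // 2 for r in range(2) for c in range(cols)}

cut, boundary, members = preprocess_topology(edges, cell)   # once per graph

# Two hypothetical user-defined metrics on the same topology.
distance = {e: 1 for e in edges}
travel_time = {e: 1 + (3 * i) % 5 for i, e in enumerate(edges)}

results = {}
for name, weight in (("distance", distance), ("travel_time", travel_time)):
    adj, overlay = customize(edges, weight, cut, boundary, members)  # once per metric
    results[name] = query(adj, overlay, cell, members, (0, 0), (1, 5))
print(results)
```

Switching from one metric to the other only repeats `customize`, which is the point of keeping the expensive partitioning step independent of the cost function.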


Shortest Paths and Routing

Routing in public transport networks is a much harder problem
◦ Inherent time-dependence
◦ Solved using (potentially huge!) event-activity networks

Interest for CERN:
◦ Grid tiers already define contraction hierarchy
  → examine actual data flows for missing/misplaced hubs


Data Compression

Requirements:
◦ Compressed space
◦ Decompression time
◦ Compression time is not much of an issue

Compressors on dataset MINGW (1 GB):

Compressor   Compressed space (MB)   Decompression time (secs)
Gzip         344                     5.5
Lzma         188                     8.3
Snappy       461                     0.9

Trade-off between compressed space and decompression time

“Snappy is widely used inside Google, in everything from BigTable and MapReduce …”

Problem: compress once, decompress many times
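The same kind of comparison can be reproduced on a small scale with the Python standard library. This is only a sketch: `zlib` stands in for Gzip (both use DEFLATE), Snappy is left out because it is not part of the standard library, and the input is a made-up repetitive byte string rather than the MINGW dataset, so the numbers will not match the table.

```python
import lzma
import time
import zlib

def measure(name, compress, decompress, data):
    """Return (name, compressed size in bytes, decompression time in seconds)."""
    packed = compress(data)
    start = time.perf_counter()
    restored = decompress(packed)
    elapsed = time.perf_counter() - start
    assert restored == data  # lossless round trip
    return name, len(packed), elapsed

# Hypothetical input: repetitive text standing in for a real dataset.
data = b"algorithm engineering bridges theory and practice. " * 20000

rows = [
    measure("zlib (DEFLATE, as in gzip)", zlib.compress, zlib.decompress, data),
    measure("lzma", lzma.compress, lzma.decompress, data),
]
for name, size, seconds in rows:
    print(f"{name:28s} {size:10d} bytes {seconds * 1e3:8.2f} ms")
```

Only decompression is timed, in line with the compress-once, decompress-many-times setting.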


Data Compression

Reminder: Lempel-Ziv compression

Text: a a c a a c a b c a a d a a a a c
The prefix a a c a a c a b has already been compressed; the rest is parsed as
<6,3> (c a a), <0,d> (d), <3,2> (a a), <11,3> (a a c)

Greedy approach only optimal if every pair takes constant space
◦ but variable number of bits required for distances → non-optimal
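A greedy Lempel-Ziv parse always takes the longest possible copy at the current position. The sketch below uses <distance, length> pairs for copies and <0, char> for literals, as on the slide; breaking ties in favour of the smaller distance is an assumption, chosen because it reproduces the pairs shown for the 17-character example string.

```python
def greedy_lz_parse(text):
    """Greedy LZ77 parse into (distance, length) copies and (0, char) literals."""
    phrases, i = [], 0
    while i < len(text):
        best_len, best_dist = 0, 0
        for dist in range(1, i + 1):
            length = 0
            # Overlapping copies are allowed: the source may run into the phrase itself.
            while i + length < len(text) and text[i + length - dist] == text[i + length]:
                length += 1
            if length > best_len:  # strict comparison: ties keep the smaller distance
                best_len, best_dist = length, dist
        if best_len == 0:
            phrases.append((0, text[i]))
            i += 1
        else:
            phrases.append((best_dist, best_len))
            i += best_len
    return phrases

def decode(phrases):
    """Rebuild the text from the phrase list."""
    out = []
    for a, b in phrases:
        if a == 0:
            out.append(b)
        else:
            for _ in range(b):
                out.append(out[-a])
    return "".join(out)

text = "aacaacabcaadaaaac"
phrases = greedy_lz_parse(text)
print(phrases)
```

The quadratic match search is for readability only; practical compressors find matches with hash tables or suffix structures.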

Bit-optimal LZ parsing [Ferragina et al. 2013]
◦ Solve shortest path problem on DAG describing possible compression pairs
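The shortest-path view can be sketched as follows: node i stands for "the first i characters are encoded", every possible phrase starting at position i is an edge to a later node, and the edge weight is the number of bits that phrase costs. Since all edges point forward, one left-to-right pass finds the cheapest parse. The cost model (a flag bit plus 8 bits per literal, Elias gamma codes for copies) is an assumption for illustration, and the naive enumeration of all edges is far less efficient than the cited algorithm.

```python
def gamma_bits(x):
    """Bits of the Elias gamma code for an integer x >= 1."""
    return 2 * (x.bit_length() - 1) + 1

def phrase_cost(phrase):
    """Hypothetical cost model: 9 bits per literal, 1 flag bit + two gamma codes per copy."""
    a, b = phrase
    return 9 if a == 0 else 1 + gamma_bits(a) + gamma_bits(b)

def bit_optimal_parse(text):
    """Shortest path from node 0 to node n in the DAG of all possible phrases."""
    n = len(text)
    best = [float("inf")] * (n + 1)  # best[i] = fewest bits to encode text[:i]
    back = [None] * (n + 1)          # back[j] = (i, phrase) of the last edge used
    best[0] = 0
    for i in range(n):               # nodes 0..n are already in topological order
        edges = [(i + 1, (0, text[i]))]                     # literal edge
        for dist in range(1, i + 1):
            length = 0
            while i + length < n and text[i + length - dist] == text[i + length]:
                length += 1
                edges.append((i + length, (dist, length)))  # every match prefix is an edge
        for j, phrase in edges:
            cost = best[i] + phrase_cost(phrase)
            if cost < best[j]:
                best[j], back[j] = cost, (i, phrase)
    phrases, j = [], n
    while j > 0:
        j, phrase = back[j]
        phrases.append(phrase)
    return best[n], phrases[::-1]

def decode(phrases):
    """Rebuild the text from (distance, length) copies and (0, char) literals."""
    out = []
    for a, b in phrases:
        if a == 0:
            out.append(b)
        else:
            for _ in range(b):
                out.append(out[-a])
    return "".join(out)

bits, phrases = bit_optimal_parse("aacaacabcaadaaaac")
print(bits, phrases)
```

Because every greedy parse is one particular path in this DAG, the shortest path can never cost more bits than the greedy result under the same cost model.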


Data Compression

Bi-criteria Compression [Farruggia et al. 2014]:
◦ Space and decompression time as edge weights in the DAG
◦ Fix space constraint, search for lowest decompression time and vice versa


Data Compression

Different approach to compression: Burrows-Wheeler Transform [introduction]
◦ Yields smaller compression size but longer decompression time
◦ Construction of BWT closely related to suffix-array construction
◦ Allows decompression of any substring

FM index [Ferragina and Manzini 2000]
◦ Uses BWT and auxiliary data structures to answer count and locate queries on compressed text
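A small sketch of the transform and of a count query may help here. The BWT is built naively from sorted rotations, and counting uses backward search over the transformed string only. A real FM index would replace the naive `occ` scan with precomputed rank structures and would build the BWT via a suffix array; the function names and the "banana" example are illustrative assumptions.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations; text must end with a unique '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last):
    """Rebuild the text by repeatedly prepending the last column and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    return next(row for row in table if row.endswith("$"))

def count(last, pattern):
    """Backward search: number of occurrences of pattern, using only the BWT."""
    # first[c] = number of characters in the text that are smaller than c.
    first, total = {}, 0
    for c in sorted(set(last)):
        first[c] = total
        total += last.count(c)

    def occ(c, i):
        # Rank query: occurrences of c in last[:i] (naive scan in this sketch).
        return last[:i].count(c)

    lo, hi = 0, len(last)  # current interval [lo, hi) of sorted suffixes
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo, hi = first[c] + occ(c, lo), first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

transformed = bwt("banana$")
print(transformed)                  # annb$aa
print(inverse_bwt(transformed))     # banana$
print(count(transformed, "ana"))    # 2
```

The transformed string groups equal characters together ("nn", "aa"), which is what makes it compress well, while the backward search shows that pattern counting never needs the original text.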

Interest for CERN:
◦ Compression of ROOT files + access of individual entries
◦ Compression of and search in dictionaries


Miscellaneous

Linear programming
◦ Disproof of the Hirsch conjecture poses a threat to the simplex method → still works well in practice
◦ Anecdote: interior point method patented by AT&T → patent circumvented by polar transformation of the problem and usage of a barrier method

SeqAn
◦ Package for analysis of (genome) sequences
◦ Developers face similar problems as HEP:
  bridge the gap between computer science and real-world problems

External memory algorithms
◦ Flow computations for massive LiDAR terrain data sets
◦ General trick of time forward processing to reduce I/O


Conclusions
◦ Final meeting gave good overview of broad activity in DFG PP 1307 “Algorithm Engineering”
◦ Summer school expanded on four focus topics of the PP
◦ Similar research continues in DFG PP 1736 “Algorithms for Big Data”
  o Funding period 2013-2019
  o Currently 16 projects covering graph analysis, energy efficient scheduling, search and text indexing, genome assembly, …
◦ Most projects concerned with computer science problems
◦ Computational biology problems present in both PPs

HEP community needs to explore how to exploit this resource of expertise and funding