Trip Report
FINAL MEETING AND SUMMER SCHOOL OF
DFG PRIORITY PROGRAM
ALGORITHM ENGINEERING
TRIP REPORT: ALGORITHM ENGINEERING 2
DFG PP 1307: Algorithm Engineering
• 28 research projects
• 267 publications
• 17 software projects, e.g.:
• Multi-Core STL (MCSTL) – now gcc parallel mode
• STL for Extra Large Datasets (STXXL)
2014-10-27
DFG Priority Program: nationwide funding program over 6 years for up to 30 individual projects
Recap: Algorithm Engineering
1. Realistic models: hardware and problem
2. Design: efficient, implementable algorithms
3. Analyze: beyond worst-case
4. Implement: with hardware peculiarities in mind
5. Experiment: repeatable, thorough interpretation
“The distance between theory and practice is closer in theory than in practice”
[Y. Matias (Google) in his invited talk at ESA '12]
Final Meeting (17.09.2014)
9 talks, covering a wide range of topics:
◦ route planning in road and public transport networks
◦ graph clustering and partitioning
◦ data compression
◦ linear and mixed integer optimization
◦ sequence analysis
No Indico used; slides only partially available
Summer School (18.-19.09.2014)
Two days of lectures and hands-on sessions:
◦ data compression (lecture only)
◦ linear and mixed integer optimization
◦ network analysis: graph clustering and partitioning
◦ shortest-path algorithms (lecture only)
About 30 PhD students
Lots of discussion among students and lecturers
Selected Topics
Network Analysis
Networks are everywhere:
◦ Computer networks
◦ Social networks
◦ …
Network Analysis
Network analysis is mainly concerned with complex networks:
◦ Small diameter
◦ Varying degree distribution
◦ Lots of triangles
Network Analysis
GRAPH CLUSTERING
◦ Find (non-overlapping) internally dense, externally sparse subgraphs
◦ Unknown: number of subgraphs, their size
◦ Goals / Applications:
  o Uncover community structure (analysis, ...)
  o Prepartition network (distributed storage, ...)
GRAPH PARTITIONING
◦ Partition vertex set into k (nearly) equally sized blocks
◦ Objective functions aim at small interfaces
◦ Applications:
  o Numerical simulations
  o Route planning
  o Distributed graph algorithms
Network Analysis
GRAPH CLUSTERING
Algorithms:
◦ Label propagation algorithm
◦ Louvain greedy method
Many different metrics:
◦ Conductance
◦ Expansion
◦ Modularity
◦ …
GRAPH PARTITIONING
Algorithms:
◦ Size-constrained label propagation
◦ Diffusion-based partitioning
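To make the label propagation idea concrete, here is a minimal sketch in plain Python. It is not the implementation presented at the meeting: the function name `label_propagation`, the adjacency-dict input format, and the tie-breaking rule are assumptions chosen for illustration. Every node starts in its own cluster and repeatedly adopts the label that is most frequent among its neighbours until nothing changes.

```python
import random
from collections import Counter


def label_propagation(adj, max_rounds=100, seed=0):
    """Cluster an undirected graph given as {node: set_of_neighbours}.

    Returns a dict mapping each node to a cluster label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every node starts in its own cluster
    nodes = list(adj)
    for _ in range(max_rounds):
        rng.shuffle(nodes)  # visit nodes in random order each round
        changed = False
        for v in nodes:
            if not adj[v]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            # Keep the current label if it is already among the most frequent.
            if counts.get(labels[v], 0) == best:
                continue
            candidates = sorted(l for l, c in counts.items() if c == best)
            labels[v] = rng.choice(candidates)  # break ties at random
            changed = True
        if not changed:
            break  # stable labelling reached
    return labels


if __name__ == "__main__":
    # Two disjoint triangles: each one should collapse to a single label.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
             3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
    print(label_propagation(graph))
```

The size-constrained variant listed under graph partitioning would additionally refuse a label whose block is already full; that check is left out of this sketch.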
Network Analysis
NetworKit:
◦ Toolkit for network analysis developed during the project; C++ with Python bindings
◦ Includes a wide range of tools for graph analysis
◦ Excellent IPython notebook-based tutorial
◦ Includes algorithms proposed for evolving networks
◦ Analyze changing social networks, e.g. ITI email graph
Interest for CERN:
◦ Community detection on the grid: planning of file transfers
◦ Track reconstruction: ongoing work
Shortest Paths and Routing
Problem: find the shortest path between s and t in a weighted graph G
Algorithms:
◦ Dijkstra’s algorithm: too slow for large graphs
◦ Manifold speedup techniques [survey]:
  o A*: search with Euclidean bounds (classic)
  o ALT: A* search with landmarks; preprocessing computes distances to landmarks
  o Contraction Hierarchies: introduce shortcuts between “important” vertices of the graph
  o Hub Labeling: every vertex stores distances to several hubs, covering the graph
◦ Most techniques rely on (more or less) expensive pre-computations
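As a baseline for the speedup techniques listed above, the following sketch shows Dijkstra's algorithm with an optional lower-bound function, which turns it into A*. The function name `shortest_path`, the adjacency-list format, and the `heuristic` parameter are assumptions for illustration. The heuristic is assumed to be consistent (it never overestimates and obeys the triangle inequality), as Euclidean bounds and landmark bounds are.

```python
import heapq


def shortest_path(graph, s, t, heuristic=lambda v: 0):
    """Return (distance, path) from s to t, or (inf, []) if t is unreachable.

    graph: {u: [(v, weight), ...]} with non-negative weights.
    heuristic: consistent lower bound on the distance from a node to t;
    the default of 0 gives plain Dijkstra.
    """
    dist = {s: 0}
    parent = {}
    heap = [(heuristic(s), s)]  # priority = distance so far + lower bound
    settled = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale queue entry
        settled.add(u)
        if u == t:
            break  # target settled, its distance is final
        for v, w in graph.get(u, []):
            nd = dist[u] + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd + heuristic(v), v))
    if t not in settled:
        return float("inf"), []
    path = [t]
    while path[-1] != s:
        path.append(parent[path[-1]])
    return dist[t], path[::-1]


if __name__ == "__main__":
    g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 1)]}
    print(shortest_path(g, "s", "t"))
```

A tighter heuristic lets the search settle fewer nodes before reaching t; ALT obtains such bounds from precomputed landmark distances, which is the pre-computation cost mentioned in the last bullet.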
Shortest Paths and Routing
Problem: user-defined cost functions render pre-computations futile
Solution: three-stage processing [Delling et al. 2013]
1. Metric-independent pre-processing (≈ hr)
   Recursively partition the graph
   Generate arcs between entry and exit nodes to neighboring partitions
2. Metric-dependent pre-processing (≈ s)
   Compute metric between all shortcut arcs
3. Query (≈ μs)
   Find shortest path in the contracted graph and unpack it in the original one
Shortest Paths and Routing
Routing in public transport networks is a much harder problem:
◦ Inherent time-dependence
◦ Solved using (potentially huge!) event-activity networks
Interest for CERN:
◦ Grid tiers already define a contraction hierarchy
◦ Examine actual data flows for missing/misplaced hubs
Data Compression
Requirements:
◦ Compressed space
◦ Decompression time
◦ Compression time is not much of an issue
Trade-off, compressors on dataset MINGW (1 GB):
Compressor   Compressed space (MB)   Decompression time (secs)
Gzip         344                     5.5
Lzma         188                     8.3
Snappy       461                     0.9
“Snappy is widely used inside Google, in everything from BigTable and MapReduce …”
Problem: compress once, decompress many times
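The space versus decompression-time trade-off can be reproduced on any data set with Python's standard library. The sketch below is an illustration only and does not reproduce the MINGW measurements: the helper names `measure` and `compare` are assumptions, `zlib` stands in for Gzip (both use DEFLATE), and Snappy is left out because it is not part of the standard library.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Compress once, then time a single decompression of the result."""
    packed = compress(data)
    start = time.perf_counter()
    restored = decompress(packed)
    elapsed = time.perf_counter() - start
    assert restored == data  # round trip must be lossless
    return {"name": name,
            "compressed_bytes": len(packed),
            "decompress_secs": elapsed}


def compare(data):
    """Return one result row per compressor for the given bytes object."""
    return [
        measure("zlib (DEFLATE, as in gzip)", zlib.compress, zlib.decompress, data),
        measure("lzma", lzma.compress, lzma.decompress, data),
    ]


if __name__ == "__main__":
    sample = b"algorithm engineering " * 50_000
    for row in compare(sample):
        print(row)
```

Timings from a single run on a small sample are noisy; for a meaningful comparison one would use a large, realistic file and repeat the decompression several times.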
Data Compression
Reminder: Lempel-Ziv compression
[Figure: LZ parsing of the text a a c a a c a b c a a d a a a; the prefix is marked “This part has already been compressed”; pairs shown: <6,3>, <0,d>, <3,2>, <11,3>]
Greedy approach is only optimal if every pair takes constant space
◦ but a variable number of bits is required for distances, so greedy is non-optimal
Bit-optimal LZ parsing [Ferragina et al. 2013]
◦ Solve a shortest path problem on the DAG describing possible compression pairs
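The DAG formulation can be illustrated with a naive sketch: text positions are nodes, every literal or copy that leads from position i to position j is an edge, and the edge weight is the number of bits needed to encode it. The bit costs below (a flag bit plus Elias gamma codes for distance and length, 9 bits per literal) are assumptions chosen for illustration, and so are all function names. This brute-force version enumerates every edge and is far too slow for real inputs; it shows only the shortest-path idea, not the algorithm of the cited paper.

```python
LITERAL_COST = 9  # assumed: 1 flag bit + 8 bits for the character


def gamma_bits(x):
    """Length in bits of the Elias gamma code of a positive integer."""
    return 2 * x.bit_length() - 1


def copy_cost(distance, length):
    """Assumed cost of a <distance, length> pair: flag bit + two gamma codes."""
    return 1 + gamma_bits(distance) + gamma_bits(length)


def max_match(text, i, d):
    """Longest match at position i copying from distance d (overlap allowed)."""
    length = 0
    while i + length < len(text) and text[i + length - d] == text[i + length]:
        length += 1
    return length


def optimal_parse(text):
    """Shortest path over positions 0..n; edges only go forward, so one pass suffices."""
    n = len(text)
    best = [float("inf")] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0
    for i in range(n):
        if best[i] + LITERAL_COST < best[i + 1]:  # literal edge i -> i+1
            best[i + 1] = best[i] + LITERAL_COST
            back[i + 1] = (i, ("lit", text[i]))
        for d in range(1, i + 1):  # copy edges i -> i+length
            for length in range(1, max_match(text, i, d) + 1):
                cost = best[i] + copy_cost(d, length)
                if cost < best[i + length]:
                    best[i + length] = cost
                    back[i + length] = (i, ("copy", d, length))
    phrases, pos = [], n
    while pos > 0:  # walk the shortest path backwards
        pos, phrase = back[pos]
        phrases.append(phrase)
    return best[n], phrases[::-1]


def greedy_parse(text):
    """Classic greedy rule: always take the longest available match."""
    i, cost, phrases = 0, 0, []
    while i < len(text):
        best_len, best_d = 0, 0
        for d in range(1, i + 1):
            m = max_match(text, i, d)
            if m > best_len:
                best_len, best_d = m, d
        if best_len:
            phrases.append(("copy", best_d, best_len))
            cost += copy_cost(best_d, best_len)
            i += best_len
        else:
            phrases.append(("lit", text[i]))
            cost += LITERAL_COST
            i += 1
    return cost, phrases


def decode(phrases):
    """Rebuild the text from a list of literal and copy phrases."""
    out = []
    for p in phrases:
        if p[0] == "lit":
            out.append(p[1])
        else:
            for _ in range(p[2]):
                out.append(out[-p[1]])
    return "".join(out)
```

Because the greedy parse is itself one path through the same DAG, the shortest path can never cost more bits than the greedy result under the same cost model.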
Data Compression
Bi-criteria compression [Farruggia et al. 2014]:
◦ Space and decompression time as edge weights in the DAG
◦ Fix a space constraint and search for the lowest decompression time, and vice versa
Data Compression
Different approach to compression: Burrows-Wheeler Transform [introduction]
Data Compression
Different approach to compression: Burrows-Wheeler Transform
◦ Yields smaller compressed size but longer decompression time
◦ Construction of the BWT is closely related to suffix-array construction
◦ Allows decompression of any substring
FM index [Ferragina and Manzini 2000]
◦ Uses the BWT and auxiliary data structures to answer count and locate queries on compressed text
Interest for CERN:
◦ Compression of ROOT files + access to individual entries
◦ Compression of and search in dictionaries
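A small sketch may help to see how the BWT and FM-index count queries fit together. It is a teaching version under stated assumptions: the function names are hypothetical, the transform sorts all rotations explicitly (real implementations build a suffix array instead), and the occurrence counts are computed by scanning the string, where a real FM index would use sampled rank structures.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via explicitly sorted rotations (naive)."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)  # last column


def inverse_bwt(last, sentinel="$"):
    """Recover the text from the last column alone."""
    n = len(last)
    # Stable sort: order[j] is the position in `last` of the character
    # that appears at position j of the (sorted) first column.
    order = sorted(range(n), key=lambda i: last[i])
    row = last.index(sentinel)  # row holding the unrotated text
    out = []
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return "".join(out)[:-1]  # drop the sentinel


def count(last, pattern):
    """Number of occurrences of pattern, by backward search on the BWT."""
    # first[c] = number of characters in the text that sort before c
    first, total = {}, 0
    for c in sorted(set(last)):
        first[c] = total
        total += last.count(c)
    lo, hi = 0, len(last)  # current range of matching rows
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + last[:lo].count(c)
        hi = first[c] + last[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
    print(count(transformed, "ana")) # 2
```

The transformed string groups equal characters that share a following context, which is what makes it compress well; the count query never touches the original text, only the last column and the per-character counts.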
Miscellaneous
Linear programming
◦ Disproof of the Hirsch conjecture poses a threat to the simplex method, which still works well in practice
◦ Anecdote: interior point method patented by AT&T; the patent was circumvented by a polar transformation of the problem and use of a barrier method
SeqAn
◦ Package for analysis of (genome) sequences
◦ Developers face similar problems as HEP: bridging the gap between computer science and real-world problems
External memory algorithms
◦ Flow computations for massive LiDAR terrain data sets
◦ General trick of time-forward processing to reduce I/O
Conclusions
◦ Final meeting gave a good overview of the broad activity in DFG PP 1307 “Algorithm Engineering”
◦ Summer school expanded on four focus topics of the PP
◦ Similar research continues in DFG PP 1736 “Algorithms for Big Data”
  o Funding period 2013-2019
  o Currently 16 projects covering graph analysis, energy-efficient scheduling, search and text indexing, genome assembly, …
  o Most projects concerned with computer science problems
  o Computational biology problems present in both PPs
HEP community needs to explore how to exploit this resource of expertise and funding