
Microsoft Proprietary

High Productivity Computing

Large-scale Knowledge Discovery: Co-evolving Algorithms and Mechanisms

Steve Reinhardt

Principal Architect, Microsoft

Prof. John Gilbert, UCSB

Dr. Viral Shah, UCSB


Context for Knowledge Discovery

From Debbie Gracio and Ian Gorton, PNNL Data Intensive Computing Initiative


Knowledge Discovery (KD) Definition

• Data-intensive computing: when the acquisition and movement of input data is a primary limitation on feasibility or performance

• Simple data mining: searching for exceptional values on elemental measures (e.g., heat, #transactions)

• Knowledge discovery: searching for exceptional values on associative/social measures (e.g., most between, belonging to greatest number of valuable reactions)
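
The distinction between an elemental measure and an associative measure can be illustrated with a small sketch. The graph, the node names, and the `transactions` values below are invented for illustration only; the betweenness routine is a plain-Python version of Brandes' algorithm for unweighted, undirected graphs, not code from KDT or any other tool named in this talk.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: [neighbors]}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in bc.items()}

# Hypothetical data: two triangles joined through a quiet bridge node "x".
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["c", "d"],
    "d": ["x", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
transactions = {"a": 120, "b": 40, "c": 35, "x": 5, "d": 30, "e": 45, "f": 60}

# Simple data mining: exceptional value on an elemental measure.
busiest = max(transactions, key=transactions.get)

# Knowledge discovery: exceptional value on an associative measure.
bc = betweenness(adj)
most_between = max(bc, key=bc.get)
```

In this toy data the busiest node by transaction count is `a`, while the node that lies on the most shortest paths is the low-traffic bridge `x`, which an elemental search would not surface.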


Today’s Biggest Obstacle in the KD Field

• Lack of fast feedback between domain experts and infrastructure/tool developers about good, usable, scalable KD software platforms

• Need to accelerate the rate of learning about both good KD algorithms and good KD infrastructure

Domain experts want:
• Good infrastructure that works
• … and scales greatly and runs fast
• Flexibility to develop/tweak algorithms to suit their needs
• Algorithms with strong math basis
But don’t know
• The best approach or algorithms

Infrastructure developers want:
• Clear audience for what they develop
• Architecture that copes with client, cluster, cloud, GPU, and huge data
But don’t know
• The best approach

Need to get good (not perfect) scalable platforms in use to co-evolve towards best approaches and algorithms


Candidate Approaches

Ad hoc
Description: Build each algorithm from ground up
Example: Metis
Pros
• Fast on single node, since highly tailored
Cons
• Unclear math basis
• Development is time-consuming, since no common kernels
• Scaling is hard
• Poor use of local memory hierarchy
Notes
• Not at domain-expert level

“Visitor”
Description: Tailor actions at key points in graph traversal
Example: Boost Graph Library, Pregel
Pros
• Fast, since tailored
• Extensible to out-of-memory formats (Pregel)
Cons
• Unclear math basis
• Each algorithm may need to cope with high cluster latency
• Poor use of local memory hierarchy
Notes
• Not at domain-expert level

Sparse-matrix-based
Description: Cast graphs as sparse matrices and use sparse linear algebraic operations
Example: KDT
Pros
• Proven math basis
• Built-in tolerance for high cluster latency
• Good use of local memory hierarchy
• Extensible to out-of-memory formats
Cons
• Mind-bender without good graph API on top
Notes
• Graph layer at domain-expert level
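
To make the sparse-matrix-based approach concrete, here is a minimal sketch of breadth-first search written as repeated sparse matrix-vector products over the Boolean (OR, AND) semiring. It is a pure-Python stand-in: the dict-of-sets adjacency structure, `spmv_bool`, and `bfs_levels` are illustrative names assumed here, not the KDT API.

```python
def spmv_bool(A, x):
    """y = A^T x over the (OR, AND) semiring.

    A is a sparse adjacency matrix stored as {row: set(cols)};
    x is a sparse Boolean vector stored as the set of its nonzero indices.
    """
    y = set()
    for i in x:
        y |= A.get(i, set())
    return y

def bfs_levels(A, source):
    """Return {vertex: BFS depth} for every vertex reachable from source."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # One matrix-vector product expands the whole frontier at once;
        # already-visited vertices are masked out of the result.
        frontier = {v for v in spmv_bool(A, frontier) if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels

# Hypothetical 5-vertex directed graph: 0 -> {1, 2}, 1 -> 3, 2 -> 3, 3 -> 4.
A = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}, 4: set()}
levels = bfs_levels(A, 0)
```

The traversal logic lives entirely in one bulk kernel (`spmv_bool`), which is the property that lets a platform swap in a distributed or out-of-memory implementation without touching the algorithm.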


KDT Layers: Enable overloading with various technologies

[Figure: KDT layer stack]

• Graph algorithms: Betweenness Centrality, Community Detection, Elementary Mode Analysis, Barycentric Clustering, All Pairs Shortest Path, All Pairs Shortest Path (Cray XMT), …

• kdt.: Parallel/distributed operations (constructors, SpGEMM, SpMV, SpAdd, SpGEMM on semi-rings, I/O), either in-memory (Star-P) or out-of-memory (DryadLINQ-based)

• scipy.: Local SpGEMM, Local SpGEMM (GPU), Local SpRef/SpAsgn, Local SpMV, Local SpAdd, Local SpGEMM on semi-rings, Local I/O, Local constructors
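
The "SpGEMM on semi-rings" kernel is what lets one matrix-multiply routine serve several graph algorithms. As a hedged sketch, the code below computes all-pairs shortest paths by repeatedly squaring a sparse distance matrix over the (min, +) semiring. The dict-of-dicts storage and the function names are assumptions made for illustration and do not reflect the actual kdt or scipy interfaces; the sketch also assumes the graph has no negative-weight cycles.

```python
import math

def spgemm_min_plus(A, B):
    """C = A (x) B over the (min, +) semiring.

    Matrices are stored sparsely as {row: {col: value}};
    a missing entry stands for +infinity (no path).
    """
    C = {}
    for i, row in A.items():
        out = {}
        for k, a_ik in row.items():
            for j, b_kj in B.get(k, {}).items():
                cand = a_ik + b_kj          # "multiply" is +
                if cand < out.get(j, math.inf):
                    out[j] = cand           # "add" is min
        C[i] = out
    return C

def all_pairs_shortest_path(W, n):
    """Shortest-path lengths for vertices 0..n-1 given sparse edge weights W."""
    # A zero diagonal lets a path "stay put", so D covers paths of <= 1 edge.
    D = {i: dict(W.get(i, {})) for i in range(n)}
    for i in range(n):
        D[i][i] = 0
    hops = 1
    while hops < n - 1:
        D = spgemm_min_plus(D, D)  # squaring doubles the hop count covered
        hops *= 2
    return D

# Hypothetical weighted digraph: 0->1 (4), 0->2 (1), 2->1 (2), 1->3 (5).
W = {0: {1: 4, 2: 1}, 1: {3: 5}, 2: {1: 2}}
D = all_pairs_shortest_path(W, 4)
```

Replacing (min, +) with (OR, AND) in the same loop nest yields reachability instead of distances, which is the kind of overloading the layer diagram points at.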


DryadLINQ: Query + Plan + Parallel Execution

• Dryad
– Distributed-memory coarse-grain run-time
– Generalized MapReduce
– Using computational vertices and communication channels to form a dataflow execution graph

• LINQ (Language INtegrated Query)
– A query-style language interface to Dryad
– Typical relational operators (e.g., Select, Join, GroupBy)

• Scaling for histogram example
– Input data 10.2TB, using 1,800 cluster nodes, 43,171 execution-graph vertices spawning 11,072 processes, creating 33GB output data in 11.5 minutes of execution

[Figure: Dryad architecture. Labels: job manager, sched, control plane, data plane (files, TCP, FIFO, network), cluster nodes NS and PD, vertices V.]
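
The histogram example follows a Select, then GroupBy-and-count pattern executed as a two-stage dataflow graph. The sketch below mimics that shape in a single Python process; it is not DryadLINQ code, and the function names, the bucket width, and the sample partitions are all assumptions for illustration. On a real cluster each `local_histogram` call would be a separate vertex and the partial counts would travel over channels.

```python
from collections import Counter
from functools import reduce

def select_bucket(record, width):
    """'Select' step: project a numeric record onto its histogram bucket."""
    return (record // width) * width

def local_histogram(partition, width):
    """First-stage vertex: GroupBy bucket and count within one partition."""
    return Counter(select_bucket(r, width) for r in partition)

def merge_histograms(partials):
    """Second-stage vertex: combine the partial counts from all partitions."""
    return reduce(lambda a, b: a + b, partials, Counter())

def histogram(partitions, width):
    # Each partition is processed independently, so this list comprehension
    # is the part a distributed run-time could execute in parallel.
    partials = [local_histogram(p, width) for p in partitions]
    return dict(merge_histograms(partials))

# Hypothetical input, already split into three partitions.
partitions = [[1, 12, 25], [3, 14, 29], [11, 18]]
result = histogram(partitions, width=10)
```

Only the small per-partition counters cross the stage boundary, which is consistent with the slide's figures of a large input reduced to a much smaller output.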


Star-P Bridges Scientists to HPCs

Star-P enables domain experts to use parallel, big-memory systems via productivity languages (e.g., the M language of MATLAB)

Knowledge discovery scaling with Star-P
• Kernels to 55B edges between 5B vertices, on 128 cores (consuming 4TB memory)
• Compact applications to 1B edges on 256 cores


Next Steps

• Get prototypes available for early experience and feedback
– in-memory and out-of-memory targets of KDT
– with graph layer
– likely exposed via Python library interface


© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista, Windows 7, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
