Python Blaze Overview

Post on 27-Jan-2015

Category:

Technology

DESCRIPTION

Blaze is a next-generation NumPy-style library that provides out-of-core access to data too large to fit in memory. This talk summarizes its features.

TRANSCRIPT

Next Generation NumPy!

blaze.pydata.org

Blaze / NumPy

Out of Core, Distributed and Optimized

NumPy

NumPy Array

shape

Blaze: Different kinds of Arrays

Indexable

NDTable (Record Type): Deferred or Concrete

NDArray (Primitive Type): Deferred or Concrete

Blaze Deferred Arrays

Expression graph for A + B*C:

    +
   / \
  A   *
     / \
    B   C

• Symbolic objects which build a graph
• Represent deferred computation

A deferred array is usually what you are holding when you have a Blaze array
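
The deferred idea can be sketched in a few lines of plain Python. This is only an illustration of building a graph for A + B*C and evaluating it later; the `Deferred` class and its methods are hypothetical names, not the Blaze API.

```python
import operator


class Deferred:
    """Node in a symbolic expression graph; nothing is computed on construction."""

    def __init__(self, op=None, args=(), value=None, name=None):
        self.op, self.args, self.value, self.name = op, args, value, name

    def __add__(self, other):
        return Deferred(operator.add, (self, other), name="+")

    def __mul__(self, other):
        return Deferred(operator.mul, (self, other), name="*")

    def evaluate(self):
        # Leaves hold data; interior nodes apply their operator element-wise
        # to the evaluated children.
        if self.op is None:
            return self.value
        left, right = (arg.evaluate() for arg in self.args)
        return [self.op(x, y) for x, y in zip(left, right)]

    def __repr__(self):
        if self.op is None:
            return self.name
        return f"({self.args[0]!r} {self.name} {self.args[1]!r})"


A = Deferred(value=[1, 2, 3], name="A")
B = Deferred(value=[4, 5, 6], name="B")
C = Deferred(value=[7, 8, 9], name="C")

expr = A + B * C          # builds the graph only; no arithmetic yet
print(expr)               # (A + (B * C))
result = expr.evaluate()  # the computation happens here
print(result)             # [29, 42, 57]
```

Because the graph exists before any data is touched, an evaluator is free to decide how to run it, for example chunk by chunk.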

Deferred evaluation allows handling large arrays

Large arrays can be handled out-of-core by streaming chunks through memory.
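
A minimal sketch of chunked, out-of-core evaluation of A + B*C using only the standard library. The file names, the chunk size, and the helper functions are assumptions made for illustration; Blaze itself is not used here.

```python
import os
import tempfile
from array import array

CHUNK = 4  # elements per chunk; tiny here so the loop runs several times


def write_doubles(path, values):
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def read_chunks(path, chunk=CHUNK):
    """Yield successive arrays of at most `chunk` doubles from a file."""
    itemsize = array("d").itemsize
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk * itemsize)
            if not raw:
                break
            block = array("d")
            block.frombytes(raw)
            yield block


def stream_a_plus_b_times_c(path_a, path_b, path_c, path_out):
    # Only one chunk per operand is resident in memory at any time.
    with open(path_out, "wb") as out:
        chunks = zip(read_chunks(path_a), read_chunks(path_b), read_chunks(path_c))
        for a, b, c in chunks:
            array("d", (x + y * z for x, y, z in zip(a, b, c))).tofile(out)


tmp = tempfile.mkdtemp()
paths = {name: os.path.join(tmp, name + ".bin") for name in "ABCR"}
n = 10
write_doubles(paths["A"], range(n))
write_doubles(paths["B"], [2.0] * n)
write_doubles(paths["C"], [0.5] * n)
stream_a_plus_b_times_c(paths["A"], paths["B"], paths["C"], paths["R"])

streamed = [x for block in read_chunks(paths["R"]) for x in block]
print(streamed)
```

The result is written back to disk chunk by chunk, so neither the inputs nor the output ever need to fit in memory at once.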

Blaze Concrete Array

Data Descriptor: Where are the bytes? (URLs and indexes; multiple URLs can comprise an array)

DataShape: What do the bytes mean? (extensible type system which includes shape)

MetaData Dictionary: labels, provenance, etc.

URLs Provide Bytes

Memory-Like: arbitrarily sliced, random seeks

File-Like: deal with in chunks, random seeks

Stream-Like: deal with in chunks, sequential seeks
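
The three access styles can be illustrated with standard-library objects. This is a rough analogy under the assumption that a memory map stands in for memory-like bytes, an ordinary file for file-like bytes, and a generator for stream-like bytes; none of it is Blaze code.

```python
import mmap
import os
import tempfile

payload = bytes(range(16))
path = os.path.join(tempfile.mkdtemp(), "bytes.bin")
with open(path, "wb") as f:
    f.write(payload)

# Memory-like: the whole buffer is addressable, so arbitrary slices work.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        memory_like = mm[2:14:4]  # strided slice: bytes 2, 6, 10

# File-like: seek to any position, then read a chunk.
with open(path, "rb") as f:
    f.seek(8)
    file_like = f.read(4)  # bytes 8..11


# Stream-like: chunks arrive in order and cannot be revisited.
def stream(chunks):
    for chunk in chunks:
        yield chunk


stream_like = list(stream(payload[i:i + 4] for i in range(0, 16, 4)))

print(memory_like, file_like, len(stream_like))
```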

Blaze Data Container

ByteProvider

Data Buffer, Index Operation

NumPy, BLZ Persistent Format

RDBMS

Data Descriptor Protocol

CSV, Data Stream

Indexes

Contiguous / Strided

Chunked / Tiled

Opaque Element-only

NumPy-Like

Opaque Iterator-access

Special Access

Indexes allow for many orderings

DataShape Type System

• A data description language
• A super-set of NumPy’s dtype
• Provides more flexibility

Shape DType

DataShape

Allows for all kinds of containers

Advanced Types

type Point = { x : int; y : int }
type Space = { a : Point; b : Point }
5, 10, Space

type SquareMatrix T = N, N, T

type IntMatrix N = N, N, int32

Parametrized Types

Alias Types
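
To make the record types above concrete, the Point and Space declarations and the "5, 10, Space" shape can be mimicked with `ctypes` from the standard library. This is an analogy rather than DataShape itself, and it assumes `int` means a 32-bit integer.

```python
import ctypes


class Point(ctypes.Structure):
    # type Point = { x : int; y : int }, assuming 32-bit ints
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


class Space(ctypes.Structure):
    # type Space = { a : Point; b : Point }
    _fields_ = [("a", Point), ("b", Point)]


# 5, 10, Space: a 5 x 10 grid whose elements are Space records
Grid = (Space * 10) * 5
grid = Grid()          # zero-initialized
grid[4][9].b.y = 7     # address one field of one record

print(ctypes.sizeof(Point), ctypes.sizeof(Space), ctypes.sizeof(Grid))  # 8 16 800
print(len(grid), len(grid[0]), grid[4][9].b.y)                           # 5 10 7
```

The shape (5, 10) and the element type (Space) together describe both the layout and the meaning of the 800 bytes, which is the role the slides give to DataShape.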

Advanced Shapes

{1,2,4,2,1}, int32 could represent [[1], [1,2], [1,3,2,9], [3,2], [3]]
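
A small check of the ragged shape above, written as a sketch in plain Python. The function name `matches_ragged_shape` is hypothetical and only demonstrates how the per-row lengths {1,2,4,2,1} relate to the nested data.

```python
def matches_ragged_shape(rows, lengths):
    """True if rows[i] has exactly lengths[i] elements for every i."""
    return len(rows) == len(lengths) and all(
        len(row) == n for row, n in zip(rows, lengths)
    )


lengths = (1, 2, 4, 2, 1)
data = [[1], [1, 2], [1, 3, 2, 9], [3, 2], [3]]

print(matches_ragged_shape(data, lengths))  # True
print(sum(lengths))                         # 10 int32 elements in total
```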

Execution Model

• Graphs dispatch to specialized library code that is “registered with the system” based on the type and metadata of the array (Blaze modules)

• Many operations can be compiled to machine code with LLVM
  • BLIR (simple typed expression syntax)
  • Numba (Python compiler)
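
The "registered with the system" idea can be sketched as a registry keyed on the operation and the argument types. The names `REGISTRY`, `register`, and `dispatch` are assumptions for illustration only and do not reflect how Blaze modules are actually implemented.

```python
REGISTRY = {}


def register(op, *types):
    """Decorator that registers a kernel for an operation and argument types."""
    def decorator(func):
        REGISTRY[(op, types)] = func
        return func
    return decorator


def dispatch(op, *args):
    # Pick the kernel from the concrete types of the arguments.
    key = (op, tuple(type(a) for a in args))
    try:
        kernel = REGISTRY[key]
    except KeyError:
        raise TypeError(f"no kernel registered for {op} on {key[1]}") from None
    return kernel(*args)


@register("add", list, list)
def add_lists(a, b):
    return [x + y for x, y in zip(a, b)]


@register("add", bytes, bytes)
def add_bytes(a, b):
    # Element-wise add modulo 256 on raw byte buffers
    return bytes((x + y) % 256 for x, y in zip(a, b))


out_lists = dispatch("add", [1, 2], [10, 20])
out_bytes = dispatch("add", b"\x01\xff", b"\x01\x01")
print(out_lists, out_bytes)
```

A graph node such as "+" would call `dispatch` with its evaluated operands, so the same expression can run on different back ends depending on where the data lives.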

Blaze Agents

Data sources: MongoDB, Vertica, HDFS, CSV Directory

Each data source is served by its own Blaze Agent.

Code Graph with Blaze Arrays

Code

Data

“I think you should be more explicit here in step two.”

How?

Out-of-core calculations

NumFOCUS

Num(Py) Foundation for Open Code for Usable Science

http://www.numfocus.org

BLZ persistence

BLZ layout at a glance

Blosc format (Chunk): Header | Offset 0 | Offset 1 | Offset 2 | ----- | Block 0 | Block 1 | Block 2

Bloscpack (BLP) format (Super-Chunk): Header | Offset 0 | Offset 1 | Offset 2 | ----- | Chunk 0 | Chunk 1 | Chunk 2

Blaze (BLZ) format (Dataset): root, containing meta and data; data holds __0__.blp, __1__.blp
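
The header, offsets, and chunks layout can be sketched as a toy container. This is not the real Blosc or Bloscpack byte format: the magic bytes and field widths are invented for illustration, and `zlib` stands in for the Blosc compressor.

```python
import struct
import zlib

MAGIC = b"DEMO"  # hypothetical magic bytes, not the Bloscpack magic


def pack(chunks):
    """Serialize chunks as: magic, chunk count, offset table, compressed chunks."""
    compressed = [zlib.compress(c) for c in chunks]
    header = MAGIC + struct.pack("<I", len(compressed))
    table_size = 8 * len(compressed)
    offsets, position = [], len(header) + table_size
    for blob in compressed:
        offsets.append(position)
        position += len(blob)
    table = struct.pack(f"<{len(offsets)}Q", *offsets)
    return header + table + b"".join(compressed)


def read_chunk(buf, index):
    """Decompress one chunk by following its offset, without touching the others."""
    if buf[:4] != MAGIC:
        raise ValueError("not a demo container")
    (count,) = struct.unpack_from("<I", buf, 4)
    offsets = struct.unpack_from(f"<{count}Q", buf, 8)
    start = offsets[index]
    end = offsets[index + 1] if index + 1 < count else len(buf)
    return zlib.decompress(buf[start:end])


container = pack([b"a" * 100, b"b" * 200, b"c" * 300])
middle = read_chunk(container, 1)
print(len(container), len(middle))
```

The offset table is what makes random access to a single chunk cheap: a reader seeks directly to the chunk it needs instead of decompressing everything before it.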
