the analysis of data from small volumes and simple algorithms to large data and complex systems —...

Data analysis: from small volumes and simple algorithms to big data and complex systems. Dmitry Spodarets


Post on 25-Jan-2017


TRANSCRIPT

Page 1: The analysis of data from small volumes and simple algorithms to large data and complex systems — Dmitry Spodarets (FlyElephant, Tech Stage)

Data analysis: from small volumes and simple algorithms to big data and complex systems

Dmitry Spodarets

Page 2:

About me

• Lecturer at ONPU (Odessa National Polytechnic University), Department of System Software

• Founder of FlyElephant and GeeksLab.

Page 3:

FlyElephant

Platform for scientific computing and data management

Page 4:

Data

Page 5:

Algorithms

Page 6:

Infrastructure

Page 7:

Data, Algorithms

Infrastructure

Page 8:

How much is "a lot of data"?

Data

Page 9:

~30 PB / day

~10 PB / year

LSST

~15 PB / year
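To put these rates in perspective, a daily or yearly volume can be converted into an average sustained throughput. The sketch below is only illustrative: it assumes decimal petabytes (1 PB = 10^15 bytes) and a 365-day year, neither of which is stated on the slide, and the helper name `sustained_rate_gb_per_s` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PB = 10**15  # assumption: decimal petabyte, not 2**50 bytes
GB = 10**9


def sustained_rate_gb_per_s(petabytes, days):
    """Average throughput in GB/s needed to move `petabytes` within `days`."""
    return petabytes * PB / (days * SECONDS_PER_DAY) / GB


# The three rates quoted on the slide, expressed as (label, petabytes, days).
slide_rates = [
    ("~30 PB / day", 30, 1),
    ("~10 PB / year", 10, 365),
    ("~15 PB / year", 15, 365),
]

for label, petabytes, days in slide_rates:
    rate = sustained_rate_gb_per_s(petabytes, days)
    print(f"{label}: about {rate:.2f} GB/s sustained")
```

Under these assumptions the daily figure corresponds to hundreds of gigabytes per second, while the yearly figures correspond to less than one gigabyte per second.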

Page 10:

Page 11:

Infrastructure

Page 12:

Data, Algorithms

Infrastructure

Page 13:

Scenarios

Page 14:

Simple data and simple algorithms

Page 15:

Lots of data and complex algorithms

Page 16:

Big data

Page 17:

Combining

Page 18:

Simple data and simple algorithms

Algorithms

- Linear search
- Matrix multiplication
- Shortest-path search
- …

Data, Infrastructure
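The three algorithms named on this slide fit comfortably on a single machine. A minimal pure-Python sketch of each follows. The function names are hypothetical, and Dijkstra's algorithm is one possible choice for the shortest-path search; the slide does not name a specific method.

```python
import heapq


def linear_search(items, target):
    """Return the index of the first element equal to target, or -1."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def matmul(a, b):
    """Naive O(n^3) product of two matrices given as lists of rows."""
    columns_of_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns_of_b]
            for row in a]


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm on {node: {neighbor: weight}}, weights >= 0.

    Returns the length of the shortest path, or None if target is unreachable.
    """
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return None


# Made-up example inputs, for illustration only.
print(linear_search([4, 8, 15, 16], 15))
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
print(shortest_path_length({"a": {"b": 1, "c": 4}, "b": {"c": 2}, "c": {}}, "a", "c"))
```

In practice the matrix product would usually be delegated to an optimized linear algebra library rather than written by hand.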

Page 19:

Libraries and tools

• Eigen: eigen.tuxfamily.org

• Intel MKL: software.intel.com/en-us/intel-mkl

• SciPy: www.scipy.org

• ND4J: nd4j.org

• MATLAB: www.mathworks.com

• Scilab: www.scilab.org

• Julia: julialang.org

• Octave: octave.org

Page 20:

Lots of data and complex algorithms

Algorithms

- Data Mining
- Machine Learning
- Computer Vision
- …

Data, Infrastructure

MPI, OpenMP…

Page 21:

Page 22:

Page 23:

Page 24:

Message Passing Interface (MPI)

Page 25:

OpenMP

Page 26:

CUDA

Page 27:

Intel Xeon Phi

Page 28:

Big data

Data, Infrastructure

NoSQL, MapReduce, Hadoop, Spark…

Page 29:

NoSQL

• Key-value stores: Berkeley DB, MemcacheDB, Redis, Amazon DynamoDB.

• Column-family stores: HBase, Apache Cassandra, Apache Accumulo, Hypertable, SimpleDB (amazon.com)…

• Document-oriented databases: MongoDB, CouchDB, Couchbase, MarkLogic, eXist…

• Graph databases: Neo4j, OrientDB, AllegroGraph, InfiniteGraph…
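The four families differ mainly in how a record is shaped and addressed. The sketch below models each one with plain Python structures to show the difference. It does not use the API of any product listed above, and all keys, names and values are made up for illustration.

```python
# Key-value store: an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "Ann", "city": "Odessa"}'}

# Column-family store: row key -> column family -> column -> value.
column_family = {
    "user:42": {
        "profile": {"name": "Ann", "city": "Odessa"},
        "stats": {"logins": 7},
    }
}

# Document store: nested, self-describing documents that can be
# filtered by the fields they contain.
documents = [
    {"_id": 42, "name": "Ann", "address": {"city": "Odessa"}},
    {"_id": 43, "name": "Bob", "address": {"city": "Kyiv"}},
]

# Graph database: nodes plus typed edges between them.
nodes = {42: {"name": "Ann"}, 43: {"name": "Bob"}}
edges = [(42, "FOLLOWS", 43)]


def documents_in_city(city):
    """Query documents by a nested field, as a document store would allow."""
    return [doc["_id"] for doc in documents if doc["address"]["city"] == city]


def followers_of(node_id):
    """Traverse incoming FOLLOWS edges, as a graph database would allow."""
    return [src for src, relation, dst in edges
            if relation == "FOLLOWS" and dst == node_id]


print(documents_in_city("Odessa"))
print(followers_of(43))
```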

Page 30:

MapReduce

A distributed computing model

• Map step: preprocessing of the input data.

• Reduce step: folding (aggregating) the intermediate results to form the solution to the problem.
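The two steps above can be sketched as a single-process word count. This only illustrates the programming model and is not Hadoop or Spark code: the function names are hypothetical, and the `shuffle` helper stands in for the grouping that a real framework performs between the two steps across machines.

```python
from collections import defaultdict


def map_step(document):
    """Map: turn one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Reduce: fold all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce sequentially over a list of strings."""
    pairs = [pair for document in documents for pair in map_step(document)]
    return dict(reduce_step(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big systems"]))
```

Because each `map_step` call depends only on its own record and each `reduce_step` call only on one key, both steps can be distributed across many workers.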

Page 31:

Hadoop and Spark

Page 32:

https://aws.amazon.com/ru/elasticmapreduce/

http://azure.microsoft.com/ru-ru/services/hdinsight/

Page 33:

Combining

Page 34:

Page 35:

Data analysis with FlyElephant

Already available

C++ OpenMP

Page 36:

Data analysis with FlyElephant

Coming in the next release

MPI

R Python

Java

Page 37:

http://flyelephant.net/

http://flyelephant.net/beta/

Page 38:

Q&A

Dmitry Spodarets [email protected]