Fast Data Platforms - Hadoop User Group (Italy)
TRANSCRIPT
![Page 1: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/1.jpg)
Fast Data Platforms @HUG_Italy Meetup (17/4/2015)
@andrea_gioia
![Page 2: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/2.jpg)
A bit of history
VoltDB and Fast Data
Using VoltDB in an Enterprise Data Platform
![Page 3: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/3.jpg)
A bit of history
VoltDB and Fast Data
Using VoltDB in an Enterprise Data Platform
![Page 4: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/4.jpg)
PHASE 1: ONE SIZE FITS ALL
![Page 5: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/5.jpg)
PHASE 2: OLAP vs OLTP
![Page 6: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/6.jpg)
PHASE 2: DATA ARCHITECTURE
![Page 7: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/7.jpg)
…BUT VOLUMES GROW QUICKLY
![Page 8: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/8.jpg)
PROBLEM: VERTICAL SCALABILITY ONLY
![Page 9: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/9.jpg)
SOLUTION: QUEUES + SHARDING
![Page 10: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/10.jpg)
SOLUTION: QUEUES + SHARDING
Partition-1, Partition-2, Partition-3, Partition-4, Partition-5, Partition-6
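The sharding idea on this slide can be sketched in a few lines. This is only an illustration, assuming hash-based routing of record keys; the function names `partition_for` and `route` and the choice of MD5 as the hash are hypothetical and not taken from the talk.

```python
import hashlib

NUM_PARTITIONS = 6  # assumption: matches the six partitions named on the slide


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Return a 1-based partition number for a record key.

    A stable hash is used (not Python's built-in hash(), which is salted
    per process) so the same key always lands on the same partition.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions + 1


def route(keys):
    """Group keys by the partition they hash to."""
    partitions = {p: [] for p in range(1, NUM_PARTITIONS + 1)}
    for key in keys:
        partitions[partition_for(key)].append(key)
    return partitions


if __name__ == "__main__":
    for partition, members in route(f"customer-{i}" for i in range(20)).items():
        print(f"Partition-{partition}: {members}")
```

With this kind of routing the application itself decides where each record lives, which is why cluster management ends up in application code.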
![Page 11: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/11.jpg)
…BUT VOLUMES GROW QUICKLY
![Page 12: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/12.jpg)
PROBLEMS
1. Fault handling
2. Cluster management at the application level
3. Massive recomputation
![Page 13: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/13.jpg)
PHASE 3: HADOOP 1.0
Components
1. Distributed data (HDFS)
2. Distributed computation (MapReduce)
Advantages
1. Hides the complexity of cluster management
2. Minimizes data movement
3. Scales horizontally on commodity hardware
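As a rough illustration of the MapReduce model, here is a single-process word count in Python. It is a sketch of the programming model only, not of Hadoop's API: the names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and a real cluster would run the map and reduce steps in parallel on the nodes that hold the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "fast data"]))
```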
![Page 14: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/14.jpg)
PHASE 3: ARCHITECTURE
![Page 15: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/15.jpg)
PHASE 3: DATA LAKE
Characteristics
1. All data at the finest level of detail (Volume)
2. Structured and unstructured data (Variety)
3. Data added as soon as it becomes available (Velocity)
4. Data that can be processed in a distributed fashion (Value)
![Page 16: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/16.jpg)
DATA LAKE != DWH
![Page 17: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/17.jpg)
PROBLEM: BIG BUT NOT FAST
COLLECT, EXPLORE, ANALYZE, ACT
RESULTS
1. Discovery
2. Querying
3. Optimization
![Page 18: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/18.jpg)
PHASE 4: SQL on HADOOP
![Page 19: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/19.jpg)
PHASE 4: ARCHITECTURE
![Page 20: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/20.jpg)
PROBLEM: FAST BUT NOT FAST ENOUGH
![Page 21: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/21.jpg)
…BECAUSE DATA GROWS IN BOTH VOLUME AND VELOCITY
![Page 22: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/22.jpg)
PHASE 5: SPECIALIZATION
![Page 23: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/23.jpg)
PHASE 5: LAMBDA ARCHITECTURE
Merged View (QUERY)
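The merged view at query time can be sketched as follows. This is a minimal example, assuming both layers expose per-key counts as plain dictionaries; `merged_view` and the sample data are hypothetical, and real deployments would query a batch serving store and a speed-layer store instead.

```python
from collections import Counter


def merged_view(batch_view, realtime_view):
    """Combine the batch view with the real-time view at query time.

    The batch view covers everything up to the last batch run; the
    real-time view covers only the events that arrived after it.
    """
    merged = Counter(batch_view)
    merged.update(realtime_view)  # adds counts key by key
    return dict(merged)


if __name__ == "__main__":
    batch = {"page-a": 10, "page-b": 5}   # precomputed by the batch layer
    recent = {"page-a": 2, "page-c": 1}   # maintained by the fast layer
    print(merged_view(batch, recent))
```

Note that the merge lives in application code, and the counting logic has to exist in both layers.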
![Page 24: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/24.jpg)
LAMBDA ARCHITECTURE: PAIN POINTS
Issues
1. Duplicated computation logic
2. View integration performed at the application level
3. Many software components to manage
4. Many hardware components exposed to possible faults
5. Fast layer speed limited by the storage system used to hold its state
![Page 25: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/25.jpg)
SIMPLIFIED FAST LAYER
![Page 26: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/26.jpg)
A bit of history
VoltDB and Fast Data
Using VoltDB in an Enterprise Data Platform
![Page 27: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/27.jpg)
WHAT IS IT?
VoltDB is a database that is…
1. In-memory
2. Partitioned
3. Single-threaded
4. Distributed
5. ACID compliant
![Page 28: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/28.jpg)
WHO NEEDS IT
All applications that need to process large amounts of data reliably and quickly (fast data)
Key requirements for these applications are…
1. Very high throughput
2. Scalability
3. Reliability
4. High availability
![Page 29: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/29.jpg)
WHO DOES NOT NEED IT
All applications that need to store and compare large amounts of historical data spread across multiple tables (DWH and BI)
![Page 30: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/30.jpg)
DATA PARTITIONING
![Page 31: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/31.jpg)
DATA REPLICATION
![Page 32: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/32.jpg)
DISTRIBUTED PROCESSING
![Page 33: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/33.jpg)
HIGH AVAILABILITY
Guaranteed by…
1. Partition replication (K-safety)
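A K-safety value of K means each partition is kept in K + 1 copies, so the cluster can lose up to K nodes without losing data. The sketch below illustrates the idea with a simple round-robin placement; the placement scheme and the names `place_replicas` and `survives` are assumptions made for illustration, not VoltDB's actual algorithm.

```python
def place_replicas(num_partitions, num_nodes, k):
    """Assign each partition to k + 1 distinct nodes (round-robin)."""
    if k + 1 > num_nodes:
        raise ValueError("need at least k + 1 nodes to hold k + 1 copies")
    return {
        partition: [(partition + i) % num_nodes for i in range(k + 1)]
        for partition in range(num_partitions)
    }


def survives(placement, failed_nodes):
    """True if every partition still has at least one live copy."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in nodes)
        for nodes in placement.values()
    )


if __name__ == "__main__":
    placement = place_replicas(num_partitions=6, num_nodes=3, k=1)
    print(placement)
    print("one node down:", survives(placement, [0]))
    print("two nodes down:", survives(placement, [0, 1]))
```

With K = 1 on three nodes, any single node can fail, while losing two nodes at once leaves some partition without a live copy.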
![Page 34: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/34.jpg)
DURABILITY
Guaranteed by…
1. Periodic snapshots
2. Command logging (synchronous or asynchronous)
3. Replication (business continuity)
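How snapshots and command logging work together on recovery can be sketched as below. This is a toy model, assuming state is a dictionary of counters and commands are simple increments; the `TinyStore` class is hypothetical, and in-memory lists stand in for what would be files on disk.

```python
class TinyStore:
    """Toy in-memory store with a snapshot and a command log."""

    def __init__(self):
        self.state = {}        # live in-memory data
        self.snapshot = {}     # last snapshot (stands in for a file on disk)
        self.command_log = []  # commands executed since the last snapshot

    def execute(self, key, delta):
        """Log the command first, then apply it to the in-memory state."""
        self.command_log.append((key, delta))
        self.state[key] = self.state.get(key, 0) + delta

    def take_snapshot(self):
        """Copy the current state and truncate the log it makes redundant."""
        self.snapshot = dict(self.state)
        self.command_log.clear()

    def recover(self):
        """Rebuild state: load the snapshot, then replay the logged commands."""
        state = dict(self.snapshot)
        for key, delta in self.command_log:
            state[key] = state.get(key, 0) + delta
        return state


if __name__ == "__main__":
    store = TinyStore()
    store.execute("a", 1)
    store.take_snapshot()
    store.execute("a", 2)
    store.execute("b", 5)
    store.state = {}  # simulate losing the in-memory data
    print(store.recover())
```

The log records the commands rather than the resulting data, so replaying them on top of the last snapshot reproduces the state at the time of the failure.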
![Page 35: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/35.jpg)
A bit of history
VoltDB and Fast Data
Using VoltDB in an Enterprise Data Platform
![Page 36: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/36.jpg)
DATA PLATFORM 1
![Page 37: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/37.jpg)
DATA PLATFORM 2
![Page 38: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/38.jpg)
DATA PLATFORM 2
![Page 39: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/39.jpg)
DATA PLATFORM 2
![Page 40: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/40.jpg)
DATA PLATFORM 2
APP APP
![Page 41: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/41.jpg)
THANK YOU!
![Page 42: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/42.jpg)
QUESTIONS?
![Page 43: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/43.jpg)
BIBLIOGRAPHY
1. How to beat the CAP theorem (Nathan Marz)
2. Questioning the Lambda Architecture (Jay Kreps)
3. The Log: What every software engineer should know about real-time data's unifying abstraction (Jay Kreps)
4. Polyglot Persistence (Martin Fowler)
5. Fast Data and the New Enterprise Data Architecture (Scott Jarr)
6. Simplifying the (complex) Lambda architecture (John Piekos)
![Page 44: Fast data platforms - Hadoop User Group (Italy)](https://reader030.vdocument.in/reader030/viewer/2022032421/55a85f751a28ab794c8b46c2/html5/thumbnails/44.jpg)
@quantycabi
www.quantyca.it