cap1250-fast data meets big data.pdf

Upload: kinankazuki104

Post on 14-Apr-2018


  • 7/27/2019 CAP1250-Fast Data Meets Big Data.pdf

    1/41

    Fast Data Meets Big Data

    Jags Ramnarayan, VMware, Inc. -- Chief Architect, GemFire/SQLFire

    Mike Stolz, VMware, Inc. -- Global Senior Staff Architect

    APP-CAP1250

    #vmworldapps


    Disclaimer

    This session may contain product features that are currently under development.

    This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.

    Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

    Technical feasibility and market demand will affect final delivery.

    Pricing and packaging for any new technologies or features discussed or presented have not been determined.


    What's Common?



    Fast Data Meets Big Data

    Big Data allows you to find opportunities you didn't know you had

    Fast Data allows you to respond to opportunities before they are gone


    Fast Data Meets Big Data

    Working together, they enable entirely new business models


    The Database is Being Stretched

    Big Data: Petabytes vs. Gigabytes; Democratize BI

    Fast Data: Low latency expectations; Horizontal scale

    Flexible Data: Multi-structured data; Developer productivity

    Cloud Delivery: Virtualized; Offered as-a-Service


    Need a Horizontally Scalable, Elastic Data Management Solution

    Elastic: add/remove data servers dynamically

    Grow or shrink dynamically with no interruption of service or data loss


    Tiered Data Strategy


    Looking Back: Traditional Batch Analytics

    [Diagram labels: Business Events; CEP/BAM (vendor proprietary); Polling UI; OLTP Query/Update; Online DB; OLTP Data; RDBMS (structured); Validate; Enrich; Transform; Batch Delay; Delayed Batch Processing; Other Data; Data Warehouse (structured); Analytic Queries]


    Pipeline with Hadoop (Log Analytics)

    Stages: ACQUIRE, TRANSFORM, ANALYZE

    [Diagram labels: Logs, raw data in files; Sequential process to filter, extract, transform; HDFS, MapReduce; Analytics; SQL DB; MPP DB; Batch SQL; Visualization]


    The Real-time Pipeline

    Stages: ACQUIRE, PROCESS IN REAL TIME, BATCH ANALYZE

    [Diagram labels: Stream Data (Social, SaaS, Transactional); Filter, enrich, correlate; Batch/single events?; Raw Data in batches; HDFS, MapReduce; SQL DB; MPP DB; Low latency Online DB; Filtered events; Derived Insight; Online Apps]


    New Architecture for Real-time (Custom)

    [Diagram labels: Streams; BATCH ANALYZE; SQL DB; MPP DB; HDFS, MapReduce]

    In-Memory Data Grids: buffer data, process events, in-memory map-reduce (VMware GemFire, SQLFire, Oracle Coherence, etc.)

    Stream Processing: derive insight with continuous event processing (Apache S4, Storm, Esper, StreamBase, GemFire)


    What Principles Drive This Architecture?

    Very low latency ingest, highly scalable (write scalability)

    Support structured as well as unstructured data

    Real-time processing cannot throttle incoming stream(s)

    Highly parallelizable with minimum IO (network and disk)

    Be elastic

    Cannot lose events; otherwise the derived value is questionable

    Post-processing of raw and derived events (batch analytics)


    What Principles Drive This Architecture?

    STAGE to SCALE

    Staged Event-Driven Architecture
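The "stage to scale" idea can be sketched in plain Java as a chain of stages, each with its own queue and worker thread, so that a slow stage buffers events instead of throttling ingest. This is only an illustration under assumptions: the class name `StagedPipeline`, the `END` sentinel, and the filter/enrich steps are hypothetical and are not taken from GemFire or from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical illustration of a staged event-driven pipeline; not a GemFire API.
class StagedPipeline {
    private static final String END = "__END__"; // sentinel that shuts a stage down

    // One stage: take from 'in', apply 'step', forward non-null results to 'out'.
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out,
                        Function<String, String> step) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String event = in.take();
                    if (event.equals(END)) {
                        out.put(END); // propagate shutdown downstream
                        return;
                    }
                    String result = step.apply(event);
                    if (result != null) {
                        out.put(result); // null means the event was filtered out
                    }
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        return worker;
    }

    // Pushes events through acquire -> filter -> enrich and returns the survivors.
    static List<String> run(List<String> events) {
        // Unbounded queues buffer bursts so the producer is never blocked.
        BlockingQueue<String> acquired = new LinkedBlockingQueue<>();
        BlockingQueue<String> filtered = new LinkedBlockingQueue<>();
        BlockingQueue<String> enriched = new LinkedBlockingQueue<>();

        Thread filter = stage(acquired, filtered, ev -> ev.startsWith("trade") ? ev : null);
        Thread enrich = stage(filtered, enriched, ev -> ev + "|enriched");
        try {
            for (String event : events) {
                acquired.put(event);
            }
            acquired.put(END);
            filter.join();
            enrich.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        List<String> results = new ArrayList<>();
        enriched.drainTo(results);
        results.remove(END); // drop the sentinel
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("trade:1", "log:2", "trade:3")));
    }
}
```

Each stage here uses a single thread, which keeps event order; a real deployment would more likely give each stage its own thread pool and size it independently.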


    Acquire, Transform, Filter

    (Fast Ingest)


    In-memory Data Grid Concepts

    Distributed, memory-oriented key-value store

    Queryable, indexable, and transactional

    Distributed namespace of Maps (key-value)

    Called Regions (GemFire, Hibernate), Cache (Oracle), etc.

    2 key storage models: Replication, Partitioning

    Handle thousands of concurrent connections

    Replicated Region: synchronous replication for slow-changing data

    Partitioned Region: partition for large data or highly transactional data, with a redundant copy
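The two storage models can be contrasted with a small in-process sketch. It assumes a fixed list of in-memory "nodes" and a simple hash-modulo placement; `RegionSketch` and its methods are hypothetical names for illustration and do not reflect how GemFire actually assigns buckets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of replicated vs. partitioned storage; not the GemFire API.
class RegionSketch {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    RegionSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Primary owner of a key in the partitioned model (assumed hash-modulo placement).
    int primaryFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    // Partitioned put: one primary copy plus one redundant copy on the next node.
    void putPartitioned(String key, String value) {
        int primary = primaryFor(key);
        int backup = (primary + 1) % nodes.size();
        nodes.get(primary).put(key, value);
        nodes.get(backup).put(key, value);
    }

    // Replicated put: every node holds every entry.
    void putReplicated(String key, String value) {
        for (Map<String, String> node : nodes) {
            node.put(key, value);
        }
    }

    // Read that falls back to the redundant copy when the primary node has failed.
    String get(String key, int failedNode) {
        int primary = primaryFor(key);
        int owner = (primary == failedNode) ? (primary + 1) % nodes.size() : primary;
        return nodes.get(owner).get(key);
    }

    // Number of nodes that currently hold the key.
    int copies(String key) {
        int count = 0;
        for (Map<String, String> node : nodes) {
            if (node.containsKey(key)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        RegionSketch region = new RegionSketch(4);
        region.putPartitioned("trade-42", "IBM");
        region.putReplicated("currency-USD", "1.0");
        System.out.println(region.copies("trade-42") + " vs " + region.copies("currency-USD"));
    }
}
```

With at least two nodes, a partitioned entry occupies two nodes regardless of cluster size, while a replicated entry occupies all of them, which is why replication suits small, slow-changing reference data.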


    Acquire, Cache, Transform, etc.

    High ingest partitioned buffering

    Expiry based on TTL, idleTime

    Windows: count, heap size, LRU eviction

    Works with rigid or flexible schema (JSON, Objects, SQL)

    Cache frequently used DB data for transform, massaging

    Partitioned listeners for filtering, event transform, etc.
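Count-based windows with LRU eviction and TTL expiry can be approximated with the standard `java.util.LinkedHashMap` in access order. The sketch below is an assumption-laden illustration: `ExpiringBuffer` is a hypothetical class, time is passed in explicitly so the behavior is deterministic, and idle-time expiry and heap-size windows are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a bounded event buffer with LRU eviction and TTL expiry.
class ExpiringBuffer<K, V> {
    // Value plus the time it was written, used for the TTL check.
    private static final class Stamped<T> {
        final T value;
        final long writtenAt;

        Stamped(T value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    private final long ttlMillis;
    private final LinkedHashMap<K, Stamped<V>> map;

    ExpiringBuffer(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder=true keeps the least recently used entry at the head.
        this.map = new LinkedHashMap<K, Stamped<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Stamped<V>> eldest) {
                return size() > maxEntries; // count-based window
            }
        };
    }

    void put(K key, V value, long nowMillis) {
        map.put(key, new Stamped<>(value, nowMillis));
    }

    // Returns the value, or null if it is absent or older than the TTL.
    V get(K key, long nowMillis) {
        Stamped<V> entry = map.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis - entry.writtenAt >= ttlMillis) {
            map.remove(key); // expired: drop it lazily on read
            return null;
        }
        return entry.value;
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        ExpiringBuffer<String, String> buffer = new ExpiringBuffer<>(2, 100);
        buffer.put("a", "1", 0);
        buffer.put("b", "2", 0);
        buffer.get("a", 10);      // touch "a" so "b" becomes least recently used
        buffer.put("c", "3", 20); // evicts "b"
        System.out.println(buffer.get("b", 20));
    }
}
```

Expiry here is lazy (checked on read); a data grid would typically also expire entries in the background.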



    Continuously Available


    Complex Stream Processing


    Continuous Filtering Using CQs

    When data changes, subscribers are reliably pushed async events

    All related data is accessible at memory speeds

    [Diagram labels: Streams; Distributed Processing]

    Apps subscribe to streams using queries:

    Select * from Tweets where tweetCount > 10 and userId in (X,Y,Z)

    CQ: Continuous Queries
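The semantics of a continuous query can be illustrated without any grid product: a subscriber registers a predicate once, and every later data change that matches is pushed to it. The sketch below mirrors the Tweets query above using plain Java predicates; `ContinuousQuerySketch`, `Tweet`, and `register` are hypothetical names and this is not the GemFire CQ API (which is query-string based and delivers events asynchronously).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch of continuous-query semantics; not the GemFire CQ API.
class ContinuousQuerySketch {
    static final class Tweet {
        final String userId;
        final int tweetCount;

        Tweet(String userId, int tweetCount) {
            this.userId = userId;
            this.tweetCount = tweetCount;
        }
    }

    // A registered query: the WHERE clause plus the subscriber to notify.
    private static final class Subscription {
        final Predicate<Tweet> where;
        final Consumer<Tweet> listener;

        Subscription(Predicate<Tweet> where, Consumer<Tweet> listener) {
            this.where = where;
            this.listener = listener;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    void register(Predicate<Tweet> where, Consumer<Tweet> listener) {
        subscriptions.add(new Subscription(where, listener));
    }

    // Each data change is evaluated against all registered queries and pushed on a match.
    void put(Tweet tweet) {
        for (Subscription s : subscriptions) {
            if (s.where.test(tweet)) {
                s.listener.accept(tweet);
            }
        }
    }

    public static void main(String[] args) {
        ContinuousQuerySketch tweets = new ContinuousQuerySketch();
        Set<String> users = new HashSet<>(Arrays.asList("X", "Y", "Z"));
        // Equivalent of: where tweetCount > 10 and userId in (X,Y,Z)
        tweets.register(t -> t.tweetCount > 10 && users.contains(t.userId),
                        t -> System.out.println("pushed: " + t.userId));
        tweets.put(new Tweet("X", 11));
        tweets.put(new Tweet("Q", 50));
    }
}
```

The push here is synchronous for simplicity; the slide describes asynchronous, reliable delivery, which would add a per-subscriber queue between `put` and the listener.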


    Correlate, Joins with Data


    Accessing Historical Data in Real-time

    Correlations/Joins with History

    Option 1) Keep history in memory

    Option 2) Keep in MPP DB (Greenplum DB)

    Option 3) In HDFS


    Real-time Aggregations with Parallel Processing


    Map-reduce for Real Time

    Hadoop M/R is for sequential batch processing and not for real time

    Java stored procedure:

    @DistributedFunction(regionName="trades")
    public List AnalyzeTrades(@FilterKey Set months, String portfolio) {
        ...
    }

    Parallel, data-aware function execution
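Data-aware execution means the function is routed only to the partitions that hold the filter keys, runs there in parallel, and the partial results are reduced. The sketch below imitates that with a `Map` keyed by month standing in for the partitioned "trades" region and a parallel stream standing in for the grid members; `DataAwareExecution`, `Trade`, and the summing logic are assumptions for illustration, since the slide elides the body of `AnalyzeTrades`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of parallel, data-aware function execution; not GemFire code.
class DataAwareExecution {
    static final class Trade {
        final String month;
        final String portfolio;
        final double amount;

        Trade(String month, String portfolio, double amount) {
            this.month = month;
            this.portfolio = portfolio;
            this.amount = amount;
        }
    }

    // 'partitions' stands in for a region partitioned by month.
    // Only partitions named in 'months' (the filter keys) are visited.
    static double analyzeTrades(Map<String, List<Trade>> partitions,
                                Set<String> months, String portfolio) {
        return months.parallelStream()
                .filter(partitions::containsKey)          // route to owning partitions only
                .mapToDouble(month -> partitions.get(month).stream()
                        .filter(t -> t.portfolio.equals(portfolio))
                        .mapToDouble(t -> t.amount)
                        .sum())                           // "map": local partial sum
                .sum();                                   // "reduce": combine partials
    }

    public static void main(String[] args) {
        Map<String, List<Trade>> partitions = new HashMap<>();
        partitions.put("2012-01", Arrays.asList(new Trade("2012-01", "P1", 100.0)));
        partitions.put("2012-02", Arrays.asList(new Trade("2012-02", "P1", 50.0)));
        Set<String> months = new HashSet<>(Arrays.asList("2012-01", "2012-02"));
        System.out.println(analyzeTrades(partitions, months, "P1"));
    }
}
```

The point of the pattern is that the function moves to the data, so only small partial results cross the network rather than the trades themselves.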


    Ingestion to Hadoop, MPP DB


    Staged Pipeline: Bringing It All Together

    Use Spring Integration to orchestrate the pipeline

    Patterns: pub-sub, splitters, routers, transformers, etc.


    Distribution of Analytic Results


    Multi-Site Capability

    Single Cluster Spanning Data Centers

    [Diagram labels: Data Fabric Node (repeated)]

    Active Everywhere


    Multi-Site Capability

    Active Everywhere

    Asynchronous, Fault Tolerant, Bi-Directional WAN Gateway


    Global Data Distribution

    Distribute

    GemFire can keep clusters that are distributed around the world synchronized in real time and can operate reliably in disconnected, intermittent, and low-bandwidth network environments.


    Bringing It All Together: What Would It Look Like?

    Existing technologies working together
