apache nifi 1.0 in nutshell

Post on 07-Jan-2017

Apache NiFi 1.0 in Nutshell

Koji Kawamura – Software Engineer

Arti Wadhwani – Technical Support Engineer

2016 October 27

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Agenda

What’s NiFi

NiFi 1.0 Enhancements

NiFi on the edge

Common issues

What’s Next?

A Brief History

2006: NiagaraFiles (NiFi) was first conceived at the National Security Agency (NSA).

November 2014: NiFi is donated to the Apache Software Foundation (ASF) through NSA’s Technology Transfer Program and enters ASF’s incubator.

July 2015: NiFi reaches ASF top-level project status.

What’s Apache NiFi?

“NiFi is like digging irrigation ditches as the water flows, rather than building out a sprinkler system in advance.”

https://mail-archives.apache.org/mod_mbox/nifi-users/201604.mbox/%3C2FCCBD60-0A79-42F1-9F9B-A121591C826E@apache.org%3E

NiFi is a tool for Data Flow Management

Simplistic View of Dataflows: Easy, Definitive

Dataflow: Acquire Data, Process and Analyze Data, Store Data

Realistic View of Dataflows: Complex, Convoluted

Dataflow: Acquire Data (several), Process and Analyze Data, Store Data (several)

NiFi 1.0 has 170+ Processors, 30% Increase from NiFi 0.7

Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch

HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, Email, HTML, Image, AMQP, MQTT

All Apache project logos are trademarks of the ASF and the respective projects.

Deeper Ecosystem Integration – New Processors

Processor: Description

Publish/ConsumeKafka: Two NARs, with Kafka 0.9 and 0.10 client libraries, respectively
JoltTransformJSON: Manipulate JSON data on the fly, with a preview function
GenerateTableFetch: Incremental fetch and parallel fetch against source table partitions
PutHiveQL: Ingest to Hive tables
SelectHiveQL: Select from Hive tables
PutHiveStreaming: Ingest streaming data to Hive, leveraging the Hive Streaming API
ConvertAvroToORC: Format conversion, Avro to ORC
Publish/ConsumeMQTT: MQTT is a popular protocol in the IoT world
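
To make "incremental fetch + parallel fetch" concrete, here is a minimal Python sketch of the idea. It is not the processor's code: the function name, its parameters, and the exact SQL shape are assumptions made for illustration. It remembers the largest value already seen in one column, then splits the newer rows into pages that independent workers could fetch.

```python
def generate_fetch_queries(table, max_value_column, last_max, new_row_count, page_size):
    """Return one paged SELECT per chunk of rows added since last_max.

    Illustrative only: the SQL shape is an assumption, not the statements
    the real processor emits.
    """
    page_count = (new_row_count + page_size - 1) // page_size  # ceiling division
    return [
        f"SELECT * FROM {table} WHERE {max_value_column} > {last_max} "
        f"ORDER BY {max_value_column} LIMIT {page_size} OFFSET {page * page_size}"
        for page in range(page_count)
    ]
```

Each returned statement is independent of the others, which is what lets a downstream step run them in parallel.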

Data Movement Management

SOURCES, REGIONAL INFRASTRUCTURE, CORE INFRASTRUCTURE

Constrained, High-Latency, Localized Context

Hybrid (Cloud/On-Premise), Low-Latency, Global Context

Hortonworks DataFlow (HDF)

SOURCES, REGIONAL INFRASTRUCTURE, CORE INFRASTRUCTURE

Constrained, High-latency, Localized context

Hybrid (cloud/on-premises), Low-latency, Global context

Detailed Breakdown of Requirements

Flow Management

Req 1: Acquire data from the cloud instances of various wearable devices.

Req 2: Move data from customer cloud instances to the on-premise instance.

Req 3: Perform intelligent routing and filtering of data. The routing and filtering rules will often change at run-time.

Req 4: Deliver the data to various downstream systems. New downstream apps will keep appearing, and data should be fed to each one when it comes online.

Req 5: Parse the device data into a standardized format that downstream systems can understand.

Req 6: Enrich the data with contextual information, including patient/customer info (age, gender, etc.).

Stream Processing & Analytics

Req 7: Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.

Req 8: Run an outlier detection model on the incoming streaming heart rate. If the score is above a certain threshold, alert on the heart rate.
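
Req 7 and Req 8 can be sketched in a few lines of Python. This is a hedged illustration, not what the deck's stream processing layer runs: the record layout (`resting_hr`), the function names, and the rolling z-score used as the "outlier detection model" are all assumptions.

```python
from statistics import mean, pstdev


def threshold_alerts(readings, threshold):
    """Req 7 style rule: keep every reading whose resting heart rate exceeds threshold."""
    return [r for r in readings if r["resting_hr"] > threshold]


def zscore_outliers(values, window, max_score):
    """Req 8 style model (assumed): score each value against the preceding window."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        spread = pstdev(history)
        if spread == 0:
            continue  # flat history, no meaningful score
        score = abs(values[i] - mean(history)) / spread
        if score > max_score:
            alerts.append((i, values[i]))
    return alerts
```

The first function is a fixed rule, the second scores each new value against recent history, which matches the split between "recognize the pattern" and "run a model" in the two requirements.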

NiFi 1.0: Modernized UI

Modernized UI – Complete Interface Redesign

Connect Components to design your data flow

Components, from left to right:

Component: What for?

Processor: Purpose-built processing unit, e.g. GetXXX, PutXXX
Input Port: Receiving data endpoint between Process Groups (local/remote)
Output Port: Exposing data endpoint between Process Groups (local/remote)
Process Group: Must have, to design a well-structured data flow
Remote Process Group: Enables data transfer between NiFi deployments via Site-to-Site
Funnel: Bundles multiple relationships into one
Template: Shares part of a data flow
Label: Useful for visually grouping processors and adding descriptions

Data Provenance

NiFi 1.0: Multitenant Authorization

NiFi 0.x - Authorization Model

Previously had role-based authorization
– Dataflow Manager (DFM)
– Monitor
– Provenance
– Admin
– Proxy
– NiFi

Limitation: all-or-nothing model
– DFM can change everything, Monitor can change nothing
– Can’t give a user the ability to modify/view only certain components
– Would require standing up multiple NiFi instances

NiFi 1.0 - Authorization Model

NiFi 1.0 introduces a new delegated authorization model

Authorize each request based on user identity, action, and resource
– Example for user1 modifying properties on processor1:
• User Identity: user1
• Action: WRITE
• Resource: processor1 (uuid)

If the authorizer says the resource is not found, the parent is checked; if the parent isn’t found, the parent’s parent is checked, and so on.
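
The lookup described above can be sketched as a walk up the component hierarchy. This is a minimal Python model under stated assumptions, not NiFi's authorizer API: the `policies` and `parents` dictionaries and the `authorize` function are hypothetical names chosen for the example.

```python
def authorize(policies, parents, identity, action, resource):
    """Return True if identity may perform action on resource.

    policies maps (resource, action) -> set of allowed identities.
    parents maps resource -> parent resource (the root has no entry).
    """
    current = resource
    while current is not None:
        allowed = policies.get((current, action))
        if allowed is not None:         # a policy is defined at this level
            return identity in allowed
        current = parents.get(current)  # not found: fall back to the parent
    return False                        # no policy anywhere up the chain
```

For example, with `parents = {"processor1": "group1", "group1": "root"}` and a WRITE policy defined only on `group1`, a WRITE request against `processor1` is decided by the `group1` policy.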

NiFi 1.0 – NiFi Managed Authorizer vs. External Authorizer

Managed Authorizer
– File-based persistence
• Could be extended to other persistence mechanisms
– NiFi UI to manage policies
– NiFi controls authorization logic

External Authorizer
– Ranger integration
– Ranger UI to manage policies
– Ranger controls authorization logic

NiFi 1.0 – Managing Users

Clicking the new user icon allows the admin to create Users and Groups
– Individual Users can be grouped
– Groups can be assigned members

Clicking the edit user icon allows the admin to update a specific User/Group

NiFi 1.0 – UI Overview

Users Icon in Global Menu used to access Users/Groups

Lock Icon in Global Menu used to access Global policies

NiFi 1.0 – UI Overview

Lock Icon in palette used to access policies for currently selected component

Selection Context

NiFi 1.0 – Overriding Component Policies

Components inherit policies from the closest ancestor Process Group with policies defined

View/Modify policies handled independently

Click Override to define a new policy, then add Users and Groups

New Users and Groups override the inherited policies (whitelisting)

NiFi 1.0 - Multi-Tenancy Example

Create a Group for Team 1 and a Group for Team 2

Give Team 1 view & modify for Process Group 1

Give Team 2 view & modify for Process Group 2

A user from Team 1 would see:

Can’t see the name of the group and can’t right-click to configure the group, but can enter the group

NiFi 1.0 – Revisions

Revision per component

Supports concurrent editing of different components without need for refreshing
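
One way to picture per-component revisions is optimistic locking, where each component carries its own version counter. The Python sketch below is an assumed model for illustration only: the `Component` class, its `update` method, and the `StaleRevision` error are hypothetical names, not NiFi's REST API.

```python
class StaleRevision(Exception):
    """Raised when a client edits a component using an outdated version."""


class Component:
    def __init__(self, name):
        self.name = name
        self.version = 0      # revision is tracked per component, not per flow
        self.properties = {}

    def update(self, client_version, changes):
        """Apply changes only if the client saw the latest version of this component."""
        if client_version != self.version:
            raise StaleRevision(f"{self.name} is at version {self.version}")
        self.properties.update(changes)
        self.version += 1
        return self.version
```

Because each counter is independent, two users editing different components never invalidate each other; only a second edit to the same component with an old version is rejected.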

NiFi 1.0: Zero Master Clustering

NiFi 0.x: NCM (NiFi Cluster Manager)

Cluster: NCM, Node1, Node2 (one node is Primary)

External Data Source: Chunks go to the nodes; distribution mechanism depends on data source

Web UI: Interact with NCM

Other NiFi: Site-to-Site, get topology from NCM, then transfer data p2p

NiFi 1.0: ZMC (Zero Master Clustering)

Cluster: Node1, Node2, Node3, plus ZooKeeper

External Data Source: Chunks go to the nodes; distribution mechanism depends on data source

Web UI: Interact with any node

Other NiFi: Site-to-Site, get topology from one of the nodes, then transfer data p2p

ZooKeeper elects the Cluster Coordinator and the Primary node

Any node can fail
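
The "any node can fail" property follows from electing roles rather than fixing them. The toy Python model below is not a ZooKeeper client and does not reflect how NiFi talks to ZooKeeper; the `ToyElection` class and its rule that the earliest live candidate holds the role are simplifying assumptions.

```python
class ToyElection:
    """In-memory stand-in for one election; this is not a ZooKeeper client."""

    def __init__(self):
        self._candidates = []  # assumption: earliest live candidate holds the role

    def join(self, node):
        self._candidates.append(node)

    def fail(self, node):
        self._candidates.remove(node)

    def holder(self):
        return self._candidates[0] if self._candidates else None


# Two separate elections, one per role, as on the slide.
coordinator = ToyElection()
primary = ToyElection()
for node in ("Node1", "Node2", "Node3"):
    coordinator.join(node)
    primary.join(node)
```

When the current holder fails, the role simply moves to another live node, so no single machine plays the part the NCM played in 0.x.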

NiFi 1.0: And More!

Foundational Work for SDLC

Deterministic template export
– Deterministic ordering, template XML file
– Version control of the template
– Collaborative SDLC effort

Variable registry
– Phase one implementation
– In-memory variable registry
– The same key referenced in a template, mapped to different environment-specific values
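
The variable registry idea, one key in the template and a different value per environment, can be sketched in Python as follows. The `resolve` function, the `db.host` key, and the example host names are hypothetical; only the `${key}` reference style is taken from NiFi's Expression Language.

```python
import re

_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(text, registry):
    """Replace each ${key} in text with its registry value; unknown keys stay as written."""
    return _REFERENCE.sub(lambda m: registry.get(m.group(1), m.group(0)), text)


# The same template property, resolved against two in-memory registries.
template_value = "jdbc:mysql://${db.host}:3306/app"
dev_registry = {"db.host": "dev-db.example.com"}
prod_registry = {"db.host": "prod-db.example.com"}
```

Calling `resolve(template_value, dev_registry)` and `resolve(template_value, prod_registry)` yields two different connection strings from one unchanged template, which is what lets the template itself stay under version control.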

Site-to-Site Refactoring – S2S HTTP(S) Protocol through Proxy Server

JVM (NiFi): REST API, Framework, Proc, CS, Report Task, Extension API, S2S API

JVM: S2S Client Libraries

Socket protocol: TCP

HDF 2.0: HTTP(S) protocol

HTTP proxy

Edge Intelligence with Apache MiNiFi

Key Features

Guaranteed delivery

Data buffering
‒ Backpressure
‒ Pressure release

Prioritized queuing

Flow specific QoS
‒ Latency vs. throughput
‒ Loss tolerance

Data provenance

Recovery / recording a rolling log of fine-grained history

Designed for extension

Different from Apache NiFi
‒ Design and Deploy
‒ Warm re-deploys
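
Backpressure and prioritized queuing can be illustrated with a small bounded priority queue. This Python sketch is an assumption-laden model, not MiNiFi's implementation: the `BoundedPriorityQueue` class and its `offer`/`poll` methods are hypothetical names, and lower numbers are assumed to mean higher priority.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def offer(self, priority, item):
        """Return False (backpressure) when the queue is full instead of accepting the item."""
        if len(self._heap) >= self.capacity:
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Return the highest-priority item, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A producer that gets `False` back has to slow down or buffer upstream, while consumers always receive the most urgent item first, regardless of arrival order.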

NiFi vs. MiNiFi Java Processor, Smaller Footprint ~40 MB

MiNiFi: NiFi Framework, Components

NiFi: NiFi Framework, User Interface, Components

Common issues

HBase connection issues (ClassNotFoundException)

NiFi SSL issues

ExecuteSQL processor issues

NiFi content repository full

PutKafka/GetKafka issues

Issues after enabling Kerberos

OutOfMemory issues

Interesting Issues/Use Cases

TBD (need to add 2-3 interesting issues/use cases)

Best Practices

Debug Logging in case of Processor issues

NiFi Site-to-Site Practices

Core Properties tuning

JVM tuning

Understanding health via NiFi UI

What’s Next

NiFi

Framework extension
– Distributed data durability (HA data)
– Configuration management flows (SDLC)

Enhanced User Experience
– Template/Extension Registry
– Variable Registry

Deeper ecosystem integration

MiNiFi

Central Command and Control

Native Agent (GA)

https://cwiki.apache.org/confluence/display/NIFI/Product+requirements

Search for “NiFi product requirements”!

Thank You
