Apache NiFi and MiNiFi: Edge to Core Meetup

Apache NiFi and MiNiFi: Edge to Core
Andy LoPresto - @yolopey, Apache NiFi PMC
DataWorks Summit 2017 - Sydney, 19 Sep 2017

Upload: dataworks-summit

Post on 21-Jan-2018


TRANSCRIPT


© Hortonworks Inc. 2011–2016. All Rights Reserved

Agenda
⬢ What is dataflow and what are the challenges?
⬢ Apache NiFi
⬢ IoT Challenges
⬢ Apache MiNiFi
⬢ Exploration
⬢ Community


Gauging Audience Familiarity With NiFi

“What’s a NeeFee?”
No experience with dataflow
No experience with NiFi

“I can pick this up pretty quickly”
Some experience with dataflow
Some experience with NiFi

“I refactored the Ambari integration endpoint to allow for mutual authentication TLS during my coffee break”
Forgotten more about NiFi than most of us will ever know

Let’s Connect A to B
Producers, A.K.A. Things: Anything AND Everything
Internet!
Consumers
• User
• Storage
• System
• … More Things

Moving data effectively is hard
Standards: http://xkcd.com/927/

Why is moving data effectively hard?
⬢ Standards
⬢ Formats
⬢ “Exactly Once” Delivery
⬢ Protocols
⬢ Veracity of Information
⬢ Validity of Information
⬢ Ensuring Security
⬢ Overcoming Security
⬢ Compliance
⬢ Schemas
⬢ Consumers Change
⬢ Credential Management
⬢ “That [person|team|group]”
⬢ Network*
⬢ “Exactly Once” Delivery

Connecting A to B to C
Easy enough with Bash scripts, Ruby/Python/Groovy, etc.
[Diagram labels: Log files; SQL; Big Data]

Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Let’s consider the needs of a courier service
[Diagram labels: Physical Store; Gateway Server; Mobile Devices; Registers; Server Cluster; Distribution Center; Core Data Center at HQ; Server Cluster; On Delivery Routes; Trucks; Deliverers]
Icon credits: Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/ ; Deliverer: Rigo Peter, https://thenounproject.com/rigo/ ; Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/ ; Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/

Great! I am collecting all this data! Let’s use it!
Finding our needles in the haystack
[Diagram labels: Physical Store; Gateway Server; Mobile Devices; Registers; Server Cluster; Distribution Center; Kafka; Core Data Center at HQ; Server Cluster; Others; Storm/Spark/Flink/Apex; Kafka; Storm/Spark/Flink/Apex; On Delivery Routes; Trucks; Deliverers]

Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Raise your hand if you want to maintain Python scripts for the rest of your life

NiFi is based on Flow Based Programming (FBP)

FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
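
To make the FBP vocabulary concrete, here is a minimal Python sketch of two "black boxes" joined by a bounded buffer. It is only an illustration of the concepts in the table, not NiFi's API: the `Connection` class, `run_producer`, `run_processor`, and the `max_queued` threshold are all hypothetical names chosen for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Bounded buffer between two processors (hypothetical, not NiFi's API)."""
    max_queued: int
    _items: deque = field(default_factory=deque)

    def offer(self, item) -> bool:
        """Enqueue the item unless the back-pressure threshold is reached."""
        if len(self._items) >= self.max_queued:
            return False
        self._items.append(item)
        return True

    def poll(self):
        """Dequeue and return the oldest item."""
        return self._items.popleft()

    def __len__(self) -> int:
        return len(self._items)


def run_producer(source: list, out: Connection) -> int:
    """Move items from source into the connection until it pushes back."""
    moved = 0
    while source and out.offer(source[0]):
        source.pop(0)
        moved += 1
    return moved


def run_processor(inp: Connection, transform, sink: list) -> int:
    """Drain the connection, applying transform to every item."""
    processed = 0
    while len(inp) > 0:
        sink.append(transform(inp.poll()))
        processed += 1
    return processed


conn = Connection(max_queued=2)
pending = ["a", "b", "c"]
results = []
run_producer(pending, conn)              # stops early: the buffer holds only 2
run_processor(conn, str.upper, results)  # downstream catches up
run_producer(pending, conn)              # the remaining item now fits
```

The producer and the processor never call each other directly; the queue lets them run at different rates, which is the role the table assigns to a Connection.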

Apache NiFi Key Features
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery / recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable, multi-tenant security
• Designed for extension
• Clustering
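
As a rough illustration of prioritized queuing, the sketch below orders queued items by a caller-supplied priority function and keeps arrival order among equal priorities. The `PrioritizedQueue` class and the `priority` attribute are assumptions made for this example; NiFi's own prioritizers are configured on connections and are not shown here.

```python
import heapq
import itertools


class PrioritizedQueue:
    """Queue that releases the lowest priority value first (hypothetical sketch)."""

    def __init__(self, priority_of):
        self._priority_of = priority_of
        self._heap = []
        # Monotonic counter: breaks ties in arrival order and keeps the heap
        # from ever comparing the queued items themselves.
        self._arrival = itertools.count()

    def put(self, flowfile) -> None:
        entry = (self._priority_of(flowfile), next(self._arrival), flowfile)
        heapq.heappush(self._heap, entry)

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


# Items are plain dicts standing in for flowfile attributes.
queue = PrioritizedQueue(lambda ff: int(ff["priority"]))
queue.put({"filename": "bulk.csv", "priority": "5"})
queue.put({"filename": "alert.json", "priority": "1"})
queue.put({"filename": "bulk2.csv", "priority": "5"})
release_order = [queue.get()["filename"] for _ in range(3)]
```

Here the alert leaves the queue first even though it arrived second, while the two bulk files keep their original relative order.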

FlowFiles are like HTTP data

HTTP Data
Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Content:
Hello world!

FlowFile
Header:
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content:
Binary Content*
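
The header/content analogy can be sketched in a few lines of Python. This is an assumed, simplified model for illustration only: the `FlowFile` dataclass and `to_http_like` helper are hypothetical and do not mirror NiFi's Java classes; the attribute values are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FlowFile:
    """Attributes play the role of headers; content is an opaque byte payload."""
    attributes: Dict[str, str] = field(default_factory=dict)
    content: bytes = b""

    def with_attribute(self, key: str, value: str) -> "FlowFile":
        """Return a copy with one more attribute; the original is not modified."""
        return FlowFile({**self.attributes, key: value}, self.content)


def to_http_like(flowfile: FlowFile) -> bytes:
    """Render attributes as header lines, then a blank line, then the content."""
    header = "".join(f"{k}: {v}\r\n" for k, v in flowfile.attributes.items())
    return header.encode("utf-8") + b"\r\n" + flowfile.content


ff = FlowFile({"filename": "15650246997242", "path": "./"}, b"Hello world!")
tagged = ff.with_attribute("source", "edge-device")  # "source" is a made-up attribute
rendered = to_http_like(ff)
```

Keeping the attribute map separate from the payload means routing decisions can be made on the attributes without reading the content.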

User Interface
Less of this… …more of this

Data Provenance
▪ Constrained
▪ High-latency
▪ Localized context

▪ Hybrid – cloud/on-premises
▪ Low-latency
▪ Global context

Origin – attribution
Replay – recovery
Evolution of topologies
Long retention
Types of Lineage
• Event
• Configuration

Deeper Ecosystem Integration: 220+ Processors
[Diagram labels, actions: Hash; Extract; Merge; Duplicate; Scan; GeoEnrich; Replace; Convert; Split; Translate; RouteContent; RouteContext; RouteText; ControlRate; DistributeLoad; GenerateTableFetch; JoltTransformJSON; PrioritizedDelivery; Encrypt; Tail; Evaluate; Execute; Fetch]
[Diagram labels, formats and protocols: HTTP; Syslog; Email; HTML; Image; HL7; FTP; UDP; XML; SFTP; AMQP; WebSocket]
All Apache project logos are trademarks of the ASF and the respective projects.

Edge Challenges
⬢ Limited computing capability
⬢ Limited power/network
⬢ Restricted software library/platform availability
⬢ No UI
⬢ Physically inaccessible
⬢ Not frequently updated
⬢ Competing standards/protocols
⬢ Scalability
⬢ Privacy & Security

Recent Examples
⬢ When the Mirai attack has its own Wikipedia page, that’s not good

NiFi Solves Everything*
⬢ Runs on JVM
⬢ Provides UI for flow design & monitoring
⬢ Security built-in
  ⬢ TLS, authn/authz, encrypted data
⬢ Handles practically any format/protocol

NiFi for IoT
⬢ NiFi supports AMQP, MQTT, UDP, TCP, HTTP(S), CEF, JMS, (S)FTP, AWS IoT
⬢ With a little pruning, NiFi can run on a Raspberry Pi

Example: Sensor Readings via RP3B
⬢ Tim Spann
⬢ Sense HAT sensor attachment
  ⬢ Temp, humidity, pressure
  ⬢ 8x8 LED display
⬢ Python Flask server reading sensor and pushing to MQTT
⬢ NiFi consuming MQTT
https://community.hortonworks.com/articles/55839/reading-sensor-data-from-remote-sensors-on-raspber.html
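
The device-side half of this example might look roughly like the sketch below. It is not the code from the linked article: the Flask server is left out, the topic name and JSON field names are invented, and the sensor reader and MQTT publisher are passed in as plain callables so the sketch runs without a Sense HAT or a broker. On a real device the `publish` callable would wrap an MQTT client library.

```python
import json
import time


def build_reading(temp_c, humidity_pct, pressure_mbar, ts):
    """Package one set of sensor values as a JSON-serializable dict."""
    return {
        "ts": ts,
        "temperature_c": temp_c,
        "humidity_pct": humidity_pct,
        "pressure_mbar": pressure_mbar,
    }


def publish_reading(read_sensor, publish, topic="sensors/sensehat", now=time.time):
    """Read the sensor once and hand the JSON payload to the publisher.

    read_sensor: callable returning (temp_c, humidity_pct, pressure_mbar)
    publish:     callable taking (topic, payload); stands in for an MQTT client
    """
    temp_c, humidity_pct, pressure_mbar = read_sensor()
    payload = json.dumps(
        build_reading(temp_c, humidity_pct, pressure_mbar, now()),
        sort_keys=True,
    )
    publish(topic, payload)
    return payload


# Stand-ins for the hardware and the broker, for demonstration only.
sent = []
payload = publish_reading(
    read_sensor=lambda: (21.5, 40.0, 1013.2),
    publish=lambda topic, body: sent.append((topic, body)),
    now=lambda: 0,
)
```

On the NiFi side, a flow subscribed to the same topic would receive each payload as the content of one flowfile.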

So Why Do We Need A Different Solution?
⬢ NiFi is designed to “own the box”
⬢ NiFi 0.7.x started up in about 10-15 minutes on RP3 (593 MB)
⬢ NiFi 1.x started up in about 30 minutes on RP3 (760 MB)
  ⬢ 33 new processors
  ⬢ Rewrite for multitenant authorization
  ⬢ Complete UI overhaul

Apache NiFi Subproject: MiNiFi
⬢ Get the key parts of NiFi close to where data begins and provide bidirectional communication
⬢ NiFi lives in the data center: give it an enterprise server or a cluster of them
⬢ MiNiFi lives as close as possible to where data is born and is a guest on that device or system
  ⬢ IoT
  ⬢ Connected car
  ⬢ Legacy hardware

Why build MiNiFi?
⬢ NiFi is big
  ⬢ 1.3.0 release is 933 MB compressed
  ⬢ Can be modified to run in restricted environments, but requires manual surgery
  ⬢ Provides UI, provenance query, etc.
  ⬢ Runs on dedicated machines/clusters (“owns the box”)
⬢ MiNiFi lives at the edge
  ⬢ No UI
  ⬢ 0.1.0 Java binary is 45 MB, C++ binary is 746 KB
  ⬢ “Good guest”

How Does MiNiFi Interact With NiFi?
⬢ NiFi
  ⬢ Design flows
  ⬢ Aggregate data from many sources
  ⬢ Perform routing/analysis/SEP
⬢ MiNiFi
  ⬢ Receive flows
  ⬢ Collect data
  ⬢ Send for processing

Let’s Add Dimensionality
⬢ We’ve been imagining EDGE to CORE as a bi-directional linear system
⬢ Let’s expand that to the real world

Flavors of MiNiFi
⬢ MiNiFi Java (v0.2.0)
  ⬢ Modified version of NiFi
  ⬢ No UI
  ⬢ YAML configuration
  ⬢ Reduced processor count
    ⬢ 110 by default, more available with additional NARs
⬢ MiNiFi C++ (v0.2.0)
  ⬢ Written from scratch
  ⬢ 10 processors by default
  ⬢ Bi-directional site-to-site & provenance data

NiFi vs MiNiFi Java Processes
[Diagram: MiNiFi = NiFi Framework + Components; NiFi = NiFi Framework + User Interface + Components]

NiFi Java Processes
[Diagram labels: Bootstrap; NiFi; UI; bootstrap.conf; nifi.properties; flow.xml.gz; arrows labeled "reads", "reads & modifies", "starts"]

MiNiFi Java Processes
[Diagram labels: MiNiFi; Bootstrap; Configuration Change Notifier(s); bootstrap.conf; nifi.properties; flow.xml.gz; config.yml; arrows labeled "reads", "starts", "transforms ... into"]

What does MiNiFi provide?
⬢ Data tagging / provenance
⬢ Governance from edge (geopolitical restrictions)
⬢ Security (encryption, certificate-based authentication)
⬢ Low latency (immediate reactions & decision-making)
[Image labels: Connected Car Reference Platform Box; Tuner + DSRC Card; Connectivity Card]

MiNiFi on a Connected Car
[Diagram labels: Collection; Comprehension; Processing/Synthesis; CAN Bus; Gateway; MCU (×3); Ethernet / Ethernet AVB; Local Interconnect Network; Yet to be established protocol; Listen CAN; Listen Ethernet; Listen LIN; Listen <>; Parse CAN; Parse Ethernet; Parse LIN; Parse <>; Route; Transmit; Execute; Prioritize; Filter]

MiNiFi Exfil
⬢ Site-to-Site
  ⬢ NiFi protocol
  ⬢ Two implementations
    ⬢ Raw socket
    ⬢ HTTP(S) (Java only)
  ⬢ Secured with mutual authentication TLS
⬢ HTTP(S), (S)FTP, JMS, Syslog, File, Email, Process (Java only)

Advanced Topics
⬢ New features in Apache NiFi 1.2.0 & 1.3.0
⬢ New features in Apache MiNiFi Java 0.2.0 & C++ 0.2.0
⬢ New subproject Apache NiFi Registry

New in 1.2.0/1.3.0
⬢ Record Parsing
⬢ Encrypted Provenance Repository

Record Parsing
⬢ Previously, data had to be divided into individual flowfiles to perform work
⬢ CSV output with 50k lines would need to be split, operated on, re-merged
⬢ 1 + 50k + 50k + 1 flowfiles ≈ 100k flowfiles

Record Parsing
⬢ Now flowfile content can contain many “record” elements
⬢ Read and write with *Reader and *Writer Controller Services
⬢ Perform lookups, routing, conversion, SQL queries, validation, and more…
⬢ 1 + 1 flowfiles = 2 flowfiles
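
The flowfile arithmetic, and the idea of operating on many records inside one piece of content, can be sketched as follows. This is plain Python with the standard `csv` module standing in for a reader/writer pair; the function names are hypothetical and nothing here is NiFi's record API.

```python
import csv
import io


def flowfiles_split_merge(lines: int) -> int:
    """One input, one flowfile per split line, one per processed line, one merged output."""
    return 1 + lines + lines + 1


def flowfiles_record_based() -> int:
    """One input and one output, no matter how many records they contain."""
    return 1 + 1


def transform_records(content: str, transform) -> str:
    """Read every CSV record from one content string, transform it, and write one output."""
    reader = csv.DictReader(io.StringIO(content))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in reader:
        writer.writerow(transform(record))
    return out.getvalue()


split_count = flowfiles_split_merge(50_000)   # 100,002, i.e. roughly 100k
record_count = flowfiles_record_based()       # 2

warmer = transform_records(
    "id,temp\n1,20\n2,30\n",
    lambda record: {**record, "temp": str(int(record["temp"]) + 1)},
)
```

The per-record work is the same in both approaches; what changes is how many separate objects the framework has to track while doing it.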

Encrypted Provenance Repository
⬢ Every provenance event record is encrypted with AES G/CM before being persisted to disk
⬢ Decrypted on deserialization for retrieval/query
⬢ Random access via offset seek
⬢ Handles key migration & rotation

MiNiFi Java 0.2.0
⬢ Upgrading of core component dependencies to NiFi 1.2.0
⬢ Initial command and control server capabilities
⬢ Increased support for NiFi features in configuration YAML inclusive of:
  ⬢ Support for HTTP Site to Site Proxy Properties
  ⬢ Controller Services
  ⬢ Binding site to site to a specific network interface

MiNiFi C++ 0.2.0
⬢ Incorporation of Catch testing framework and Google linting for code quality and enhanced test coverage
⬢ Providing support for reporting tasks and an initial implementation of Site to Site Provenance reporting
⬢ New Processors inclusive of PutFile, ListenHTTP

MiNiFi Feature Proposals
⬢ Flow Versioning
⬢ Develop flows for class of MiNiFi instances
⬢ Command & Control (C2) API (in Java master)
⬢ File Change Ingestor
⬢ Rest API Ingestor
⬢ Pull HTTP Ingestor

Apache NiFi Registry
⬢ “… complementary application that provides a central location for storage and management of shared resources across one or more instances of NiFi and/or MiNiFi.”

Apache NiFi Registry - Flow Registry
⬢ Flow registry stores & manages versioned flow definitions
⬢ Integrated with NiFi to allow save/retrieve/upgrade operations from canvas
⬢ Admin of users, groups, and policies

Why NiFi & MiNiFi?
⬢ Moving data is multifaceted in its challenges and these are present in different contexts at varying scopes
  – Inter vs intra, domestically, internationally
⬢ Provide common tooling and extensions that are needed but be flexible for extension
  – Leverage existing libraries and expansive Java ecosystem for functionality
  – Allow organizations to integrate with their existing infrastructure
⬢ Empower folks managing your infrastructure to make changes and reason about issues that are occurring
  – Data Provenance to show context and data’s journey
  – User Interface/Experience a key component

Healthy Community

Learn more and join us

Apache NiFi site https://nifi.apache.org

Subproject MiNiFi site https://nifi.apache.org/minifi/

Subscribe to and collaborate at [email protected] [email protected]

Submit Ideas or Issues https://issues.apache.org/jira/browse/NIFI

Follow us on Twitter @apachenifi


Learn and share at Birds of a Feather
IOT, STREAMING & DATAFLOW
Thursday September 21, 6:00 pm, C4.6

Thank You
I’m sticking around for discussions/questions
@yolopey / @[email protected]:70ECB3E598A65A3FD3C4BACE3C6EF65B2F7DEF69