stream processing in go

Stream Processing In Go, Khosrow Afroozeh and Sunil Sayyaparaju

Post on 15-Jan-2017


  • 1

    Stream Processing In Go

    Khosrow Afroozeh, Sunil Sayyaparaju

  • 2

    Streams are the Norm

    Need for Business Analytics generates endless streams of data

    Horizontal Scaling adds to the number of streams

    Stream variety is on the rise

    Streams need to be composed and co-processed

  • 3

    Stream

    Arrays, Slices, Channels, Buffers, Files, Database Queries...

  • 4

    Stream Elements

    No Generics In Go, so stream elements are boxed objects:

    interface{}

    There is no type-safety for generic stream processing.

    Not a big deal really; schemaless data sources return interfaces anyway.

    It can be easily managed by runtime type-checking in the first step of the pipeline.
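    As an illustration of runtime type-checking in the first pipeline step, here is a minimal sketch. The names toInt64 and sumInts are hypothetical and not from the talk; the idea is only that boxed values are validated once up front, so later steps work on a known type.

```go
package main

import "fmt"

// toInt64 is a hypothetical first pipeline step: it normalizes a boxed
// value to int64 and reports whether the value had a supported type.
func toInt64(v interface{}) (int64, bool) {
	switch n := v.(type) {
	case int:
		return int64(n), true
	case int32:
		return int64(n), true
	case int64:
		return n, true
	default:
		// Strings, nil, and any other type are rejected here.
		return 0, false
	}
}

// sumInts type-checks every element once, then works on typed values only.
func sumInts(stream []interface{}) (sum int64, skipped int) {
	for _, v := range stream {
		n, ok := toInt64(v)
		if !ok {
			skipped++
			continue
		}
		sum += n
	}
	return sum, skipped
}

func main() {
	stream := []interface{}{1, int64(2), "three", int32(4), nil}
	sum, skipped := sumInts(stream)
	fmt.Println(sum, skipped) // prints "7 2"
}
```

    Whether unsupported values are skipped, as here, or reported as errors is a design choice that depends on the data source.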

  • 5

    Classic Collections

  • 6

    Traditional Compositions 1

    stream 1, stream 2

    stream1.Join(stream2).Filter(...)

    API Interface Problem

  • 7

    Traditional Compositions 2

    stream 1, stream 2

    Join(stream1, stream2)

    stream3

    Filter(stream3, ...)

    Lots of Gophers Needed for Pipelining, Signature Problem Still Unsolved
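    To make the "lots of gophers" point concrete, here is a rough sketch of the function-style composition built on channels. The helper source and the exact signatures of Join and Filter are assumptions for illustration, not the speakers' code. Each stage runs in its own goroutine and hands values to the next stage over an unbuffered channel.

```go
package main

import (
	"fmt"
	"sync"
)

// source (hypothetical helper) emits the given values from its own goroutine.
func source(vals ...interface{}) <-chan interface{} {
	out := make(chan interface{})
	go func() {
		defer close(out)
		for _, v := range vals {
			out <- v
		}
	}()
	return out
}

// Join merges two streams into one; the output order is not deterministic.
func Join(a, b <-chan interface{}) <-chan interface{} {
	out := make(chan interface{})
	var wg sync.WaitGroup
	forward := func(in <-chan interface{}) {
		defer wg.Done()
		for v := range in {
			out <- v
		}
	}
	wg.Add(2)
	go forward(a)
	go forward(b)
	// Close the merged stream once both inputs are drained.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// Filter forwards only the values for which f returns true.
func Filter(in <-chan interface{}, f func(interface{}) bool) <-chan interface{} {
	out := make(chan interface{})
	go func() {
		defer close(out)
		for v := range in {
			if f(v) {
				out <- v
			}
		}
	}()
	return out
}

func main() {
	stream3 := Join(source(1, 2, 3), source(4, 5, 6))
	even := Filter(stream3, func(v interface{}) bool { return v.(int)%2 == 0 })

	sum := 0
	for v := range even {
		sum += v.(int)
	}
	fmt.Println(sum) // prints "12"
}
```

    This small pipeline already starts six goroutines (two sources, two forwarders, one closer, one filter), and every value crosses several channel synchronizations before it reaches the consumer.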

  • 8

    Problem

    Don't want to code¹ unless absolutely necessary

    Don't want to repeat ourselves. More code leads to more maintenance and testing

    ¹ Not on company hours at least! YMMV.

  • 9

    Abstraction Goals

    Data processing should be decoupled from data structures.

    Compositions should happen on data, not data structures.

    Note: denotes type. This is not valid Go code.

    Note: f and m are functions, e.g.:

    f(value interface{}) bool
    m(value interface{}) interface{}
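    For concreteness, the two signatures can be written as Go function types. This is only a sketch: the type names Predicate and Mapper and the functions isEven and square are hypothetical, chosen for illustration.

```go
package main

import "fmt"

// Predicate matches the signature of f: it decides whether a value passes.
type Predicate func(value interface{}) bool

// Mapper matches the signature of m: it transforms one value into another.
type Mapper func(value interface{}) interface{}

// isEven is an example f. Non-int values never pass.
func isEven(value interface{}) bool {
	n, ok := value.(int)
	return ok && n%2 == 0
}

// square is an example m. Non-int values are returned unchanged.
func square(value interface{}) interface{} {
	n, ok := value.(int)
	if !ok {
		return value
	}
	return n * n
}

func main() {
	var f Predicate = isEven
	var m Mapper = square

	for _, v := range []interface{}{1, 2, 3, 4} {
		if f(v) {
			fmt.Println(m(v)) // prints 4, then 16
		}
	}
}
```

    Neither function knows whether its input came from a slice, a channel, or a database query, which is the decoupling the goals above ask for.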

  • 10

    Abstraction Goals Contd

    Data should not be transported during transformation, unless necessary.

  • 11

    Transducers¹

    ¹ Idea inspired by Clojure. Fair enough, they got inspired by channels ;)

  • 12

    Transducers Impl.

  • 13

    Reducer

    Responsible for chaining of the pipeline:

    stream → t1 → t2 → tn → reducer → result
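    The slide's own implementation is not part of this transcript, so the following is only one possible sketch of such a chain, in the Clojure-style formulation where a transducer wraps a reducer. The names Reducer, Transducer, Filter, Map, Compose and Transduce are assumptions made for illustration.

```go
package main

import "fmt"

// Reducer folds one value into an accumulator.
type Reducer func(acc, value interface{}) interface{}

// Transducer wraps a reducer with one more processing step.
type Transducer func(next Reducer) Reducer

// Filter passes a value to the next step only if f accepts it.
func Filter(f func(value interface{}) bool) Transducer {
	return func(next Reducer) Reducer {
		return func(acc, value interface{}) interface{} {
			if f(value) {
				return next(acc, value)
			}
			return acc
		}
	}
}

// Map transforms a value with m before passing it on.
func Map(m func(value interface{}) interface{}) Transducer {
	return func(next Reducer) Reducer {
		return func(acc, value interface{}) interface{} {
			return next(acc, m(value))
		}
	}
}

// Compose chains t1..tn so that t1 sees each value first.
func Compose(ts ...Transducer) Transducer {
	return func(r Reducer) Reducer {
		for i := len(ts) - 1; i >= 0; i-- {
			r = ts[i](r)
		}
		return r
	}
}

// Transduce runs stream -> t -> reducer and returns the result.
func Transduce(stream []interface{}, t Transducer, r Reducer, init interface{}) interface{} {
	step := t(r)
	acc := init
	for _, v := range stream {
		acc = step(acc, v)
	}
	return acc
}

func main() {
	even := func(v interface{}) bool { return v.(int)%2 == 0 }
	sq := func(v interface{}) interface{} { return v.(int) * v.(int) }
	add := func(acc, v interface{}) interface{} { return acc.(int) + v.(int) }

	stream := []interface{}{1, 2, 3, 4, 5, 6}
	result := Transduce(stream, Compose(Filter(even), Map(sq)), add, 0)
	fmt.Println(result) // prints "56" (4 + 16 + 36)
}
```

    In this sketch the whole chain runs as nested function calls inside a single loop, with no intermediate collections or channels between the steps.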

  • 14

    Transducers Impl. Example

  • 15

    Transduction

    Flush is used when some function in the chain would like to eject the operation.

    When all the data in the stream has been processed or a flush has been requested, the Complete() method is called to capture the state of the stateful reducers.

    The functions in the chain call each other:

    f, m => m(f(val))
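    A possible reading of the flush and Complete() behaviour is sketched below. Only the name Complete() comes from the slide; the StatefulReducer interface, its Step method, and the take and sum types are hypothetical. Here a step signals a flush through its return value, and Complete() is called exactly once when the stream ends or a flush is requested.

```go
package main

import "fmt"

// StatefulReducer is a hypothetical interface for one link in the chain.
type StatefulReducer interface {
	// Step consumes one value and returns true to request a flush.
	Step(value interface{}) (flush bool)
	// Complete is called once at the end and returns the captured state.
	Complete() interface{}
}

// sum is a stateful terminal reducer that keeps a running total.
type sum struct{ total int }

func (s *sum) Step(value interface{}) bool {
	s.total += value.(int)
	return false // never asks to stop
}

func (s *sum) Complete() interface{} { return s.total }

// take forwards at most n values to next, then requests a flush.
type take struct {
	n    int
	next StatefulReducer
}

func (t *take) Step(value interface{}) bool {
	if t.n <= 0 {
		return true
	}
	t.n--
	// Flush if the downstream step asks for it or the quota is used up.
	return t.next.Step(value) || t.n == 0
}

func (t *take) Complete() interface{} { return t.next.Complete() }

// Transduce feeds the stream into r until it is exhausted or a flush is
// requested, then calls Complete to collect the result.
func Transduce(stream []interface{}, r StatefulReducer) (result interface{}, consumed int) {
	for _, v := range stream {
		consumed++
		if r.Step(v) {
			break
		}
	}
	return r.Complete(), consumed
}

func main() {
	stream := []interface{}{10, 20, 30, 40, 50}
	result, consumed := Transduce(stream, &take{n: 2, next: &sum{}})
	fmt.Println(result, consumed) // prints "30 2"
}
```

    Because the flush stops the loop after two elements, the remaining three values are never read from the stream.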

  • 16

    Example

  • 17

    Observations

    Cons

    No compile-time type safety
    Tricky to parallelize

    Pros

    Fewer goroutines for long pipelines
    Fewer synchronizations for channels
    Potentially uses less memory
    Decoupled processing logic from data structures
    Better compositions
    More readable

  • 18

    Thank You

    Khosrow Afroozeh: @parshua [email protected]

    Sunil Sayyaparaju: [email protected]
