Collectors in the Wild @JosePaumard

Upload: jose-paumard

Posted on 21-Jan-2018


Page 1: Collectors in the Wild

Collectors in the Wild @JosePaumard

Page 2: Collectors in the Wild

Collectors?

Why should we be interested in collectors?

▪ They are part of the Stream API

▪ And kind of left aside…

Page 3: Collectors in the Wild

Collectors?

YouTube:

▪ Stream tutorials ~700k

▪ Collectors tutorials < 5k

Page 4: Collectors in the Wild

Collectors?

Why should we be interested in collectors?

▪ They are part of the Stream API

▪ And kind of left aside…

And it’s a pity because it is a very powerful API

Page 5: Collectors in the Wild

@JosePaumard

Microsoft Virtual Academy

Page 6: Collectors in the Wild

Questions? #ColJ8

Page 7: Collectors in the Wild

movies.stream().flatMap(movie -> movie.actors().stream()).collect(

Collectors.groupingBy(Function.identity(), Collectors.counting()

)).entrySet().stream().max(Map.Entry.comparingByValue()).get();

Page 8: Collectors in the Wild

movies.stream().collect(

Collectors.groupingBy(movie -> movie.releaseYear(),

Collector.of(() -> new HashMap<Actor, AtomicLong>(), (map, movie) -> {

movie.actors().forEach(actor -> map.computeIfAbsent(actor, a -> new AtomicLong()).incrementAndGet()

) ;},(map1, map2) -> {

map2.entrySet().stream().forEach(entry -> map1.computeIfAbsent(entry.getKey(), a -> new AtomicLong()).addAndGet(entry.getValue().get())

) ;return map1 ;

}, new Collector.Characteristics [] {

Collector.Characteristics.CONCURRENT.CONCURRENT}

))

).entrySet().stream().collect(

Collectors.toMap(entry5 -> entry5.getKey(),entry5 -> entry5.getValue()

.entrySet().stream()

.max(Map.Entry.comparingByValue(Comparator.comparing(l -> l.get())))

.get())

).entrySet().stream().max(Comparator.comparing(entry -> entry.getValue().getValue().get())).get();

Page 9: Collectors in the Wild

Do not give bugs a place to hide!

Brian Goetz

Page 10: Collectors in the Wild

Collectors?

Why should we be interested in collectors?

▪ They are part of the Stream API

▪ And kind of left aside…

And it’s a pity because it is a very powerful API

▪ Even if we can also write unreadable code with it!

Page 11: Collectors in the Wild

Agenda

Quick overview of streams

About collectors

Extending existing collectors

Making a collector readable

Creating new collectors

Composing Collectors

Page 12: Collectors in the Wild

A Few Words on Streams

Page 13: Collectors in the Wild

About Streams

A Stream:

▪ Is an object that connects to a source

▪ Has intermediate & terminal operations

▪ Some of the terminal operations can be collectors

▪ A collector can take more collectors as parameters

Page 14: Collectors in the Wild

A Stream is…

An object that connects to a source of data and watches them flow

There is no data « in » a stream, unlike a collection

Page 15: Collectors in the Wild

About Streams

On a stream:

▪ Any operation can be modeled with a collector

▪ Why is it interesting?

stream.collect(collector);
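To make the claim concrete, here is a small sketch (assuming Java 9+ for `List.of`; the class and method names are illustrative) where three familiar terminal operations are rewritten as collectors from `java.util.stream.Collectors`:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Three terminal operations expressed as collectors.
class OperationsAsCollectors {

    // Same result as strings.stream().count()
    static long count(List<String> strings) {
        return strings.stream().collect(Collectors.counting());
    }

    // Same result as strings.stream().max(Comparator.comparingInt(String::length))
    static Optional<String> longest(List<String> strings) {
        return strings.stream()
                .collect(Collectors.maxBy(Comparator.comparingInt(String::length)));
    }

    // Same result as strings.stream().mapToInt(String::length).sum()
    static int totalLength(List<String> strings) {
        return strings.stream().collect(Collectors.summingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> strings = List.of("one", "two", "three");
        System.out.println(count(strings));         // 3
        System.out.println(longest(strings).get()); // three
        System.out.println(totalLength(strings));   // 11
    }
}
```

The interesting part is that a collector is an object: it can be stored, passed around, and given to another collector as a downstream.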

Page 16: Collectors in the Wild

Intermediate Operations

Page 17: Collectors in the Wild


1st operation: mapping = changing the type

Page 18: Collectors in the Wild


2nd operation: filtering = removing some objects

Page 19: Collectors in the Wild

3rd operation: flattening


Page 21: Collectors in the Wild

Map, Filter, FlatMap

Three operations that do not need any buffer to work

Not the case of all the operations…
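The three buffer-free operations can be sketched on plain lists of strings (Java 9+ assumed for `List.of`; names are illustrative). Each element goes through the pipeline on its own, without waiting for the others:

```java
import java.util.List;
import java.util.stream.Collectors;

class StatelessOperations {

    // mapping: changes the type, String -> Integer
    static List<Integer> lengths(List<String> words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }

    // filtering: removes some elements, the type stays the same
    static List<String> shortWords(List<String> words) {
        return words.stream().filter(w -> w.length() == 3).collect(Collectors.toList());
    }

    // flattening: each element becomes a stream, and these streams are merged
    static List<String> flatten(List<List<String>> groups) {
        return groups.stream().flatMap(List::stream).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("one", "two", "three");
        System.out.println(lengths(words));    // [3, 3, 5]
        System.out.println(shortWords(words)); // [one, two]
        System.out.println(flatten(List.of(List.of("a", "b"), List.of("c")))); // [a, b, c]
    }
}
```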

Page 22: Collectors in the Wild

Sorting elements using a comparator

The stream needs to see all the elements before beginning to transmit them

Page 23: Collectors in the Wild


Distinct

The Stream needs to remember all the elements it has already seen, to decide whether to transmit each new one or not

Page 24: Collectors in the Wild

Distinct, sorted

Both operations need a buffer to store the elements from the source

Page 25: Collectors in the Wild

Intermediate Operations

2 categories:

- Stateless operations = do not need to remember anything

- Stateful operations = do need a buffer
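A minimal sketch of the two stateful operations (Java 9+ assumed; names are illustrative). `sorted()` must have seen every element before emitting the first one, and `distinct()` keeps track of the elements already seen:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class StatefulOperations {

    // sorted(): all the elements are buffered before the first one is emitted.
    // On an ordered stream the sort is stable: "one" stays before "two".
    static List<String> sortedByLength(List<String> words) {
        return words.stream()
                .sorted(Comparator.comparingInt(String::length))
                .collect(Collectors.toList());
    }

    // distinct(): remembers the elements seen so far to drop the duplicates
    static List<String> distinct(List<String> words) {
        return words.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("three", "one", "two", "one");
        System.out.println(sortedByLength(words)); // [one, two, one, three]
        System.out.println(distinct(words));       // [three, one, two]
    }
}
```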

Page 26: Collectors in the Wild

Limit and Skip

Two methods that rely on the order of the elements:

- Limit = keeps the first n elements

- Skip = skips the first n elements

Needs to keep track of the index of the elements and to process them in order
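A common use of this pair is paging. The sketch below (names are illustrative) reads one page from the infinite stream 1, 2, 3, …; `limit()` is also what makes that infinite stream finite:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LimitAndSkip {

    // Returns the page of the given index (0-based) from the stream 1, 2, 3, ...
    static List<Integer> page(int pageIndex, int pageSize) {
        return Stream.iterate(1, i -> i + 1)
                .skip((long) pageIndex * pageSize) // drops the first elements
                .limit(pageSize)                   // keeps the next pageSize ones
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(page(0, 3)); // [1, 2, 3]
        System.out.println(page(1, 3)); // [4, 5, 6]
    }
}
```

Because both operations depend on the position of the elements, they can be costly on ordered parallel streams.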

Page 27: Collectors in the Wild

Terminal Operations

Page 28: Collectors in the Wild

Intermediate vs Terminal

Only a terminal operation triggers the consumption of the data from the source

movies.stream()
    .filter(movie -> movie.releaseYear() == 2007)
    .flatMap(movie -> movie.actors().stream())
    .map(actor -> actor.toString());

Page 29: Collectors in the Wild

Intermediate vs Terminal

Only a terminal operation triggers the consumption of the data from the source

movies.stream()
    .filter(movie -> movie.releaseYear() == 2007)
    .flatMap(movie -> movie.actors().stream())
    .map(actor -> actor.toString())
    .forEach(name -> System.out.println(name));
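This laziness can be observed with `peek()`, a method that exists mainly for debugging. In this sketch (the titles are sample strings, not the deck's data) nothing is pulled from the source until a terminal operation is added:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyPipeline {

    // Builds a filter + map pipeline and records, through peek(),
    // every element actually pulled from the source.
    static List<String> pulledElements(List<String> titles, boolean addTerminalOperation) {
        List<String> pulled = new ArrayList<>();
        Stream<String> pipeline = titles.stream()
                .peek(pulled::add)                   // runs only when data flows
                .filter(title -> title.length() > 3)
                .map(String::toUpperCase);           // intermediate operations only
        if (addTerminalOperation) {
            pipeline.forEach(System.out::println);   // the terminal operation starts the flow
        }
        return pulled;
    }

    public static void main(String[] args) {
        List<String> titles = List.of("Up", "Ratatouille");
        System.out.println(pulledElements(titles, false)); // []
        System.out.println(pulledElements(titles, true));  // prints RATATOUILLE, then [Up, Ratatouille]
    }
}
```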

Page 31: Collectors in the Wild

Terminal Operations

First batch:

- forEach

- count

- max, min

- reduce

- toArray

Will consume all the data

Page 33: Collectors in the Wild

Terminal Operations

Second Batch:

- allMatch

- anyMatch

- noneMatch

- findFirst

- findAny

Do not need to consume all the data = short-circuit operations
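Short-circuiting is what allows these operations to terminate on an infinite stream, where `count()` or `forEach()` would never return. A sketch with illustrative names:

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class ShortCircuit {

    // anyMatch stops as soon as the predicate is true
    static int elementsPulledByAnyMatch() {
        AtomicInteger pulled = new AtomicInteger();
        Stream.iterate(1, i -> i + 1)
                .peek(i -> pulled.incrementAndGet())
                .anyMatch(i -> i > 10);
        return pulled.get(); // the values 1 to 11 have been pulled
    }

    // findFirst stops on the first element that passes the filter
    static Optional<Integer> firstMultipleOf7Above50() {
        return Stream.iterate(1, i -> i + 1)
                .filter(i -> i > 50 && i % 7 == 0)
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(elementsPulledByAnyMatch());      // 11
        System.out.println(firstMultipleOf7Above50().get()); // 56
    }
}
```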

Page 34: Collectors in the Wild

Terminal Operations

Special cases:

- max

- min

- reduce (the version without an identity element)

Return an Optional (to handle empty streams)

https://www.youtube.com/watch?v=Ej0sss6cq14 @StuartMarks
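A quick sketch of why an Optional is needed (Java 9+ assumed; names are illustrative): on an empty stream there is no max and no sum to return, unless `reduce()` is given an identity element:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class EmptyStreams {

    static Optional<Integer> max(List<Integer> values) {
        return values.stream().max(Comparator.naturalOrder());
    }

    // reduce without an identity element: nothing to return on an empty stream
    static Optional<Integer> sum(List<Integer> values) {
        return values.stream().reduce(Integer::sum);
    }

    // reduce with an identity element: the identity is returned on an empty stream
    static int sumWithIdentity(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(max(List.of()));             // Optional.empty
        System.out.println(sum(List.of(3, 1, 2)));      // Optional[6]
        System.out.println(sumWithIdentity(List.of())); // 0
    }
}
```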

Page 35: Collectors in the Wild

A First Collector

And then there is collect!

The most commonly seen:

Takes a collector as a parameter

List<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(Collectors.toList());

Page 36: Collectors in the Wild

A First Collector (bis)

And then there is collect!

The most commonly seen:

Takes a collector as a parameter

Set<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(Collectors.toSet());

Page 37: Collectors in the Wild

A Second Collector

And then there is collect!

Maybe less well known:

Takes a collector as a parameter

String names = authors.stream()

.map(Author::getName)

.collect(Collectors.joining(", "));
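`joining()` only accepts streams of `CharSequence`, hence the `map(Author::getName)` step. A sketch on plain strings (Java 9+ assumed) also shows the overload that adds a prefix and a suffix:

```java
import java.util.List;
import java.util.stream.Collectors;

class JoiningExamples {

    static String withSeparator(List<String> names) {
        return names.stream().collect(Collectors.joining(", "));
    }

    // Prefix and suffix are added even when the stream is empty
    static String withBrackets(List<String> names) {
        return names.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> names = List.of("Gent", "Walsh");
        System.out.println(withSeparator(names));    // Gent, Walsh
        System.out.println(withBrackets(names));     // [Gent, Walsh]
        System.out.println(withBrackets(List.of())); // []
    }
}
```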

Page 38: Collectors in the Wild

Demo Time

Page 39: Collectors in the Wild

A Third Collector

Creating a Map

Map<Integer, List<String>> result = strings.stream()

.filter(s -> !s.isEmpty())

.collect(Collectors.groupingBy(

s -> s.length())

);

Page 40: Collectors in the Wild

groupingBy(String::length) applied to one, two, three, four, five, six, seven, eight, nine, ten builds a Map<Integer, List<String>>:

3 → one, two, six, ten

4 → four, five, nine

5 → three, seven, eight

Page 41: Collectors in the Wild

groupingBy(String::length, downstream) applied to one, two, three, four, five, six, seven, eight, nine, ten: each group is then processed as .stream().collect(downstream)

3 → one, two, six, ten → .stream().collect(downstream)

4 → four, five, nine → .stream().collect(downstream)

5 → three, seven, eight → .stream().collect(downstream)

Page 42: Collectors in the Wild

groupingBy(String::length, Collectors.counting()) applied to one, two, three, four, five, six, seven, eight, nine, ten builds a Map<Integer, Long>:

3 → 4L

4 → 3L

5 → 3L

Page 43: Collectors in the Wild

A Third Collector (bis)

Creating a Map

Map<Integer, Long> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(Collectors.groupingBy(
        s -> s.length(), Collectors.counting())
    );
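The two groupingBy variants above can be run on the ten words of the previous slides. A sketch (Java 9+ assumed; class and method names are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingByLength {

    static final List<String> WORDS = List.of(
            "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");

    // groupingBy alone: the downstream collector defaults to toList()
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    // groupingBy with counting() as the downstream collector
    static Map<Integer, Long> countByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(byLength(WORDS).get(3)); // [one, two, six, ten]
        System.out.println(countByLength(WORDS));   // lengths 3, 4, 5 mapped to 4, 3, 3
    }
}
```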

Page 44: Collectors in the Wild

Demo Time

Page 45: Collectors in the Wild

A Collector that Counts

Number of articles per author

Page 46: Collectors in the Wild

Gent & Walsh, Beyond NP: The QSAT Phase Transition

Gent & Hoos & Prosser & Walsh, Morphing: Combining…

flatMap(Article::getAuthors) turns the two articles A1 and A2 into a stream of authors: Gent, Walsh, Gent, Hoos, Prosser, Walsh

Page 47: Collectors in the Wild

Gent & Walsh, Beyond NP: The QSAT Phase Transition

Gent & Hoos & Prosser & Walsh, Morphing: Combining…

flatMap(Article::getAuthors) → Gent, Walsh, Gent, Hoos, Prosser, Walsh

groupingBy(identity(), counting()) → Gent 2L, Walsh 2L, Hoos 1L, Prosser 1L
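The same counting can be sketched end to end. The `Article` record below is a hypothetical model (the deck's `Article` class is not shown), it requires Java 16+, and its `authors()` method returns a list rather than a stream:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ArticlesPerAuthor {

    // Hypothetical model of an article
    record Article(String title, List<String> authors) {}

    static Map<String, Long> articlesPerAuthor(List<Article> articles) {
        return articles.stream()
                .flatMap(article -> article.authors().stream()) // Gent, Walsh, Gent, ...
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Article> articles = List.of(
                new Article("Beyond NP", List.of("Gent", "Walsh")),
                new Article("Morphing", List.of("Gent", "Hoos", "Prosser", "Walsh")));
        System.out.println(articlesPerAuthor(articles).get("Gent")); // 2
    }
}
```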

Page 48: Collectors in the Wild

Demo Time

Page 49: Collectors in the Wild

Supply, Accumulate and Combine

Page 50: Collectors in the Wild

Creating Lists

A closer look at that code:

List<String> result = strings.stream()

.filter(s -> !s.isEmpty())

.collect(Collectors.toList());

Page 51: Collectors in the Wild

stream → collector: 1) build the list, 2) add the elements one by one → ArrayList

Page 52: Collectors in the Wild

Creating Lists

1) Building the list: supplier

2) Adding an element to that list: accumulator

Supplier<List<E>> supplier = () -> new ArrayList<>();

BiConsumer<List<E>, E> accumulator = (list, e) -> list.add(e);

Page 53: Collectors in the Wild

In parallel

Each CPU (CPU 1, CPU 2) processes its own part of the Stream with its own Collector: 1) build a list, 2) add the elements one by one, 3) merge the lists

Page 54: Collectors in the Wild

Creating Lists

1) Building the list: supplier

2) Adding an element to that list: accumulator

3) Combining two lists

Supplier<List<E>> supplier = ArrayList::new;

BiConsumer<List<E>, E> accumulator = List::add;

BiConsumer<List<E>, List<E>> combiner = List::addAll;
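These three functions are enough to build a real `Collector` with `Collector.of()`. One detail: `Collector.of()` expects the combiner as a `BinaryOperator` that returns the merged container, whereas the three-argument `collect()` takes a `BiConsumer`. A sketch (the `toArrayList` name is illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ListCollector {

    static <E> Collector<E, List<E>, List<E>> toArrayList() {
        Supplier<List<E>> supplier = ArrayList::new;          // 1) build the list
        BiConsumer<List<E>, E> accumulator = List::add;       // 2) add one element
        BinaryOperator<List<E>> combiner = (left, right) -> { // 3) merge two lists
            left.addAll(right);
            return left;
        };
        return Collector.of(supplier, accumulator, combiner);
    }

    public static void main(String[] args) {
        // In parallel, each thread fills its own list, then the lists are merged
        List<Integer> result = IntStream.rangeClosed(1, 1_000).boxed()
                .parallel()
                .collect(toArrayList());
        List<Integer> expected = IntStream.rangeClosed(1, 1_000).boxed()
                .collect(Collectors.toList());
        System.out.println(result.equals(expected)); // true: the encounter order is kept
    }
}
```

Since the collector is not declared UNORDERED and the combiner appends the right list to the left one, the parallel result keeps the order of the source.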

Page 55: Collectors in the Wild

Creating Lists

So we have:

List<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(ArrayList::new, List::add, List::addAll);

Page 56: Collectors in the Wild

Creating Lists

So we have:

List<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(ArrayList::new, Collection::add, Collection::addAll);

Page 57: Collectors in the Wild

Creating Sets

Almost the same:

Set<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(HashSet::new, Collection::add, Collection::addAll);

Page 58: Collectors in the Wild

String Concatenation

Now we need to create a String by concatenating the elements using a separator:

« one, two, six »

Works with Streams of Strings

Page 59: Collectors in the Wild

String Concatenation

Let us collect

strings.stream().filter(s -> s.length() == 3).collect(() -> new String(),

(finalString, s) -> finalString.concat(s), (s1, s2) -> s1.concat(s2));

Page 61: Collectors in the Wild

String Concatenation

Let us collect

strings.stream().filter(s -> s.length() == 3).collect(() -> new StringBuilder(),

(sb, s) -> sb.append(s), (sb1, sb2) -> sb1.append(sb2));

Page 62: Collectors in the Wild

String Concatenation

Let us collect

strings.stream().filter(s -> s.length() == 3).collect(StringBuilder::new,

StringBuilder::append, StringBuilder::append);

Page 63: Collectors in the Wild

String Concatenation

Let us collect

StringBuilder stringBuilder = strings.stream()

.filter(s -> s.length() == 3)

.collect(StringBuilder::new,StringBuilder::append, StringBuilder::append);

Page 64: Collectors in the Wild

String Concatenation

Let us collect

String string = strings.stream()

.filter(s -> s.length() == 3)

.collect(StringBuilder::new,StringBuilder::append, StringBuilder::append)

.toString();
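Why the first attempt with `new String()` fails, and the `StringBuilder` version works, can be checked with a sketch (names are illustrative). `String` is immutable: `concat()` returns a new String that the accumulator throws away, so the container stays empty. Note that neither version adds a separator; that is what `joining(", ")` does:

```java
import java.util.List;

class ConcatenationWithCollect {

    // Does not work: the result of concat() is discarded by the BiConsumer
    static String withString(List<String> strings) {
        return strings.stream()
                .filter(s -> s.length() == 3)
                .collect(() -> new String(),
                        (finalString, s) -> finalString.concat(s),
                        (s1, s2) -> s1.concat(s2));
    }

    // Works: StringBuilder is a mutable container, append() modifies it in place
    static String withStringBuilder(List<String> strings) {
        return strings.stream()
                .filter(s -> s.length() == 3)
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        List<String> strings = List.of("one", "two", "three", "four", "five", "six");
        System.out.println("[" + withString(strings) + "]"); // []
        System.out.println(withStringBuilder(strings));      // onetwosix
    }
}
```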

Page 66: Collectors in the Wild

A Collector is…

3 + 1 Operations

- Supplier: creates the mutable container

- Accumulator: adds an element to the container

- Combiner: merges two containers (in parallel)

- Finisher, which can be the identity function

Page 67: Collectors in the Wild

Collecting and Then

And we have a collector for that!

strings.stream().filter(s -> s.length() == 3).collect(

Collectors.collectingAndThen(collector, finisher // Function

));
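Two small finishers, as a sketch (Java 9+ assumed; names are illustrative): one wraps the collected list as unmodifiable, the other transforms the joined String. In both cases the finisher is applied once, after the whole collection:

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class CollectingAndThenExamples {

    // Finisher: wraps the resulting list as an unmodifiable list
    static List<String> shortWords(List<String> strings) {
        return strings.stream()
                .filter(s -> s.length() == 3)
                .collect(Collectors.collectingAndThen(
                        Collectors.toList(), Collections::unmodifiableList));
    }

    // Finisher: transforms the joined String
    static String shoutedShortWords(List<String> strings) {
        return strings.stream()
                .filter(s -> s.length() == 3)
                .collect(Collectors.collectingAndThen(
                        Collectors.joining(", "), String::toUpperCase));
    }

    public static void main(String[] args) {
        List<String> strings = List.of("one", "two", "three", "four", "five", "six");
        System.out.println(shortWords(strings));        // [one, two, six]
        System.out.println(shoutedShortWords(strings)); // ONE, TWO, SIX
    }
}
```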

Page 68: Collectors in the Wild

Demo Time

Page 71: Collectors in the Wild

Map<Long, List<Entry<Integer, Long>>>, for instance 7634L → [{2004, 7634L}]

Entry<Integer, Long> -> Integer = mapping

Function<Entry<Integer, Long>, Integer> mapper = entry -> entry.getKey();

Collectors.mapping(mapper, toList());
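Here is a sketch of that inversion, from a map year → count to a map count → list of years. The counts are made-up values, not the deck's data. The entries are grouped by value, and `mapping()` keeps only the key of each entry:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class InvertWithMapping {

    static Map<Long, List<Integer>> yearsByCount(Map<Integer, Long> countPerYear) {
        return countPerYear.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,                                    // group by count
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList()))); // keep the year
    }

    public static void main(String[] args) {
        // A TreeMap so that the years come in ascending order
        Map<Integer, Long> countPerYear = new TreeMap<>(Map.of(2004, 5L, 2005, 3L, 2006, 5L));
        System.out.println(yearsByCount(countPerYear).get(5L)); // [2004, 2006]
    }
}
```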

Page 72: Collectors in the Wild

Demo Time

Page 73: Collectors in the Wild

Collect toMap

Useful for remapping maps

Do not generate duplicate keys! (toMap throws an IllegalStateException on a duplicate key, unless a merge function is given)

map.entrySet().stream().collect(

Collectors.toMap(entry -> entry.getKey(), entry -> // create a new value

));
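A sketch of the three situations (Java 9+ assumed; names are illustrative): remapping the values of a map, failing on a duplicate key, and resolving duplicates with a merge function:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapExamples {

    // Remapping a map: same keys, new values
    static Map<String, Integer> doubled(Map<String, Integer> map) {
        return map.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue() * 2));
    }

    // Two words of the same length produce a duplicate key
    static boolean failsOnDuplicateKey(List<String> words) {
        try {
            words.stream().collect(Collectors.toMap(String::length, Function.identity()));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // A merge function tells toMap how to combine the values of a duplicate key
    static Map<Integer, String> mergedOnDuplicateKey(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(String::length, Function.identity(),
                        (first, second) -> first + ", " + second));
    }

    public static void main(String[] args) {
        List<String> words = List.of("one", "two", "three");
        System.out.println(doubled(Map.of("a", 1)));           // {a=2}
        System.out.println(failsOnDuplicateKey(words));         // true
        System.out.println(mergedOnDuplicateKey(words).get(3)); // one, two
    }
}
```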

Page 74: Collectors in the Wild

Custom Collectors:

1) Filter, Flat Map

2) Joins

3) Composition

Coffee break!

Page 75: Collectors in the Wild

About Types

Page 77: Collectors in the Wild

The Collector Interface

public interface Collector<T, A, R> {
    public Supplier<A> supplier();          // A: mutable container
    public BiConsumer<A, T> accumulator();  // T: processed elements
    public BinaryOperator<A> combiner();    // Often the type returned
    public Function<A, R> finisher();       // Final touch
    public Set<Characteristics> characteristics();
}

Page 78: Collectors in the Wild

Type of a Collector

In a nutshell:

- T: type of the elements of the stream

- A: type of the mutable container

- R: type of the final container

We often have A = R, in which case the finisher may be the identity function
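To see a case where A ≠ R, here is a sketch of a hand-written collector that implements the interface directly. It is similar in spirit to `Collectors.joining`, but it is an illustration, not the JDK's code: the elements are Strings (T), the mutable container is a `java.util.StringJoiner` (A), and the finisher turns it into the final String (R):

```java
import java.util.Collections;
import java.util.Set;
import java.util.StringJoiner;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

// T = String (elements), A = StringJoiner (mutable container), R = String (result)
class JoiningCollector implements Collector<String, StringJoiner, String> {

    private final String separator;

    JoiningCollector(String separator) {
        this.separator = separator;
    }

    @Override
    public Supplier<StringJoiner> supplier() {
        return () -> new StringJoiner(separator);
    }

    @Override
    public BiConsumer<StringJoiner, String> accumulator() {
        return StringJoiner::add;
    }

    @Override
    public BinaryOperator<StringJoiner> combiner() {
        return StringJoiner::merge; // appends the content of the right joiner
    }

    @Override
    public Function<StringJoiner, String> finisher() {
        return StringJoiner::toString; // A != R: not the identity function
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet(); // ordered, not concurrent, no identity finish
    }

    public static void main(String[] args) {
        System.out.println(Stream.of("one", "two", "six")
                .collect(new JoiningCollector(", "))); // one, two, six
    }
}
```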

Page 79: Collectors in the Wild

groupingBy(String::length) applied to one, two, three, four, five, six, seven, eight, nine, ten:

3 → one, two, six, ten

4 → four, five, nine

5 → three, seven, eight

Page 80: Collectors in the Wild

one, two, three, four, five, six, seven, eight, nine, ten

Collector<String, ?, Map<Integer, List<String>>> c = groupingBy(String::length);

3 → one, two, six, ten

4 → four, five, nine

5 → three, seven, eight

Page 83: Collectors in the Wild

one, two, three, four, five, six, seven, eight, nine, ten

Collector<String, ?, Map<Integer, List<String>>> c = groupingBy(String::length, ?);

3 → one, two, six, ten

4 → four, five, nine

5 → three, seven, eight

Page 84: Collectors in the Wild

one, two, three, four, five, six, seven, eight, nine, ten

Collector<String, ?, Map<Integer, List<String>>> c = groupingBy(String::length, Collector<String, ?, >);

3 → one, two, six, ten

4 → four, five, nine

5 → three, seven, eight

Page 85: Collectors in the Wild

one, two, three, four, five, six, seven, eight, nine, ten

Collector<String, ?, Map<Integer, Value>> c = groupingBy(String::length, Collector<String, ?, Value>);

counting() : Collector<T, ?, Long>

3 → 4L

4 → 3L

5 → 3L

Page 86: Collectors in the Wild

Intermediate Operations

Page 87: Collectors in the Wild

Intermediate Collectors

Back to the mapping collector

This collector takes a downstream collector

stream.collect(mapping(function, downstream));

Page 88: Collectors in the Wild

Intermediate Collectors

The mapping Collector provides an intermediate operation

stream.collect(mapping(function, downstream));

Page 89: Collectors in the Wild

Intermediate Collectors

The mapping Collector provides an intermediate operation

Why is it interesting?

To create downstream collectors!

So what about integrating all our stream processing as a collector?

stream.collect(mapping(function, downstream));

Page 90: Collectors in the Wild

Intermediate Collectors

If collectors can map, why wouldn't they filter, or flatMap?

…in fact they can, since Java 9 ☺

Page 91: Collectors in the Wild

Intermediate Collectors

The filtering Collector also provides an intermediate operation

We have a Stream<T>

So predicate is a Predicate<T>

Downstream is a Collector<T, ?, R>

stream.collect(mapping(function, downstream));

stream.collect(filtering(predicate, downstream));

Page 92: Collectors in the Wild

Intermediate Collectors

The flatMapping Collector also provides an intermediate operation

We have a Stream<T>

So flatMapper is a Function<T, Stream<TT>>

And downstream is a Collector<TT, ?, R>

stream.collect(mapping(function, downstream));

stream.collect(flatMapping(flatMapper, downstream));
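Both collectors, used as downstreams of groupingBy on the ten words, in a sketch that requires Java 9+ (names are illustrative). Note that with `filtering()` as a downstream, a group with no matching word is kept with an empty list, whereas a `filter()` before `groupingBy` would remove the group entirely:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class FilteringAndFlatMapping {

    static final List<String> WORDS = List.of(
            "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");

    // filtering(): keeps the words containing an "o", group by group
    static Map<Integer, List<String>> wordsWithAnO(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(
                String::length,
                Collectors.filtering((String w) -> w.contains("o"), Collectors.toList())));
    }

    // flatMapping(): each word becomes the stream of its letters
    static Map<Integer, Set<Character>> lettersByLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(
                String::length,
                Collectors.flatMapping((String w) -> w.chars().mapToObj(c -> (char) c),
                        Collectors.toSet())));
    }

    public static void main(String[] args) {
        System.out.println(wordsWithAnO(WORDS).get(3));           // [one, two]
        System.out.println(wordsWithAnO(WORDS).get(5));           // []
        System.out.println(lettersByLength(WORDS).get(3).size()); // 8
    }
}
```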

Page 93: Collectors in the Wild

Demo Time

Page 94: Collectors in the Wild

Characteristics

Three characteristics for the collectors:

- IDENTITY_FINISH: the finisher is the identity function

- UNORDERED: the collector does not preserve the order of the elements

- CONCURRENT: the accumulator can be called from several threads at the same time on the same mutable container
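The characteristics of the collectors of the JDK can be read with `characteristics()`. In this sketch (the helper name is illustrative), `toList()` declares IDENTITY_FINISH, `toSet()` also declares UNORDERED, `joining()` declares no IDENTITY_FINISH since its container is not a String, and `groupingByConcurrent()` declares CONCURRENT. The exact sets may vary between JDK versions, so the program only prints them:

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Collectors;

class CharacteristicsExamples {

    static Set<Characteristics> of(Collector<?, ?, ?> collector) {
        return collector.characteristics();
    }

    public static void main(String[] args) {
        System.out.println("toList:   " + of(Collectors.toList()));
        System.out.println("toSet:    " + of(Collectors.toSet()));
        System.out.println("joining:  " + of(Collectors.joining()));
        System.out.println("grouping: " + of(Collectors.groupingByConcurrent(String::length)));
    }
}
```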

Page 95: Collectors in the Wild

Handling Empty Optionals

Two things:

- Make an Optional a Stream

- Remove the empty Streams with flatMap

Map<K, Optional<V>>          // with empty Optionals...
-> Map<K, Stream<V>>         // with empty Streams
-> Stream<Map.Entry<K, V>>   // the empty ones are gone
-> Map<K, V>                 // using toMap
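A sketch of these steps, requiring Java 9+ for `Optional.stream()`, `Map.entry()` and `filtering()` (names are illustrative). Empty Optionals appear naturally when a downstream collector sees an empty substream: here, no word of length 5 contains an "o", so `maxBy` returns an empty Optional for that group:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class RemoveEmptyOptionals {

    static final List<String> WORDS = List.of(
            "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");

    // Greatest word containing an "o", per length: empty for the length 5
    static Map<Integer, Optional<String>> withEmptyOptionals(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(
                String::length,
                Collectors.filtering((String w) -> w.contains("o"),
                        Collectors.maxBy(Comparator.<String>naturalOrder()))));
    }

    // Each Optional becomes a Stream, flatMap removes the empty ones,
    // and toMap rebuilds a Map<K, V>
    static Map<Integer, String> withoutEmptyOptionals(Map<Integer, Optional<String>> map) {
        return map.entrySet().stream()
                .flatMap(entry -> entry.getValue().stream()
                        .map(value -> Map.entry(entry.getKey(), value)))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<Integer, Optional<String>> withEmpty = withEmptyOptionals(WORDS);
        System.out.println(withEmpty.get(5));                       // Optional.empty
        System.out.println(withoutEmptyOptionals(withEmpty).get(3)); // two
    }
}
```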

Page 96: Collectors in the Wild

Joins

1) The authors that published the most together

2) The authors that published the most together in a year

StreamsUtils to the rescue!

Page 97: Collectors in the Wild

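Gent & Walsh, Beyond NP: The QSAT Phase Transition

Gent & Hoos & Prosser & Walsh, Morphing: Combining…

Pairs of authors of the first article (Gent, Walsh): {Gent, Walsh}

Pairs of authors of the second article (Gent, Hoos, Prosser, Walsh): {Gent, Hoos} {Gent, Prosser} {Gent, Walsh} {Hoos, Prosser} {Hoos, Walsh} {Prosser, Walsh}

flatMap() merges all these pairs into a single stream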
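The deck relies on the author's StreamsUtils library for this step; the sketch below does not use it and builds the pairs with the JDK only. The `Pair` record is a hypothetical type (Java 16+). The authors are sorted first, so that {Gent, Walsh} and {Walsh, Gent} become the same pair:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class CoAuthors {

    // Hypothetical pair type, first < second in alphabetical order
    record Pair(String first, String second) {}

    // All the pairs {a(i), a(j)} with i < j, for one article
    static Stream<Pair> pairs(List<String> authors) {
        List<String> sorted = authors.stream().sorted().collect(Collectors.toList());
        return IntStream.range(0, sorted.size()).boxed()
                .flatMap(i -> IntStream.range(i + 1, sorted.size())
                        .mapToObj(j -> new Pair(sorted.get(i), sorted.get(j))));
    }

    // Number of articles written together, for each pair of authors
    static Map<Pair, Long> articlesPerPair(List<List<String>> authorsPerArticle) {
        return authorsPerArticle.stream()
                .flatMap(CoAuthors::pairs)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<Pair, Long> counts = articlesPerPair(List.of(
                List.of("Gent", "Walsh"),
                List.of("Gent", "Hoos", "Prosser", "Walsh")));
        System.out.println(counts.get(new Pair("Gent", "Walsh"))); // 2
        System.out.println(counts.size());                         // 6
    }
}
```

The pair with the highest count then answers the first question; grouping the articles by year beforehand would be one way to approach the second.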

Page 98: Collectors in the Wild

Demo Time

Page 99: Collectors in the Wild

Application

What is interesting about modeling a processing as a collector?

We can reuse this collector as a downstream collector for other processing pipelines

Page 100: Collectors in the Wild

What About Readability?

Creating composable Collectors

Page 101: Collectors in the Wild

Demo Time

Page 102: Collectors in the Wild

Dealing with Issues

The main issue is the empty stream

A whole stream may have elements

But when we build a histogram, a given substream may become empty…

Page 103: Collectors in the Wild

Conclusion

Page 104: Collectors in the Wild

API Collector

A very rich API indeed

Quite complex…

One needs to have a very precise idea of the data processing pipeline

Can be extended!

Page 105: Collectors in the Wild

API Collector

A collector can model a whole processing

Once it is written, it can be passed as a downstream to another processing pipeline

Can be made composable to improve readability

https://github.com/JosePaumard

Page 106: Collectors in the Wild

Thank you for your attention!

Page 107: Collectors in the Wild

Questions?

@JosePaumard

https://github.com/JosePaumard

https://www.slideshare.net/jpaumard

https://www.youtube.com/user/JPaumard