TPL Dataflow – why and for whom?
TRANSCRIPT
Control your data the right way
TPL Dataflow crash course
Mikhail Veselov
Moscow
20 April 2015
History overview
§ TPL Dataflow is another abstraction level
§ Reactive Extensions & flow sync
§ TBB Flow Graph analog
§ AAL → CCR → TDF
TPL Dataflow – crash course Introduction
[Diagram: Data, Action, Recycle, Cache, Thread Pool, Threads]
Main idea - why Dataflow?
§ Define your application dataflow
§ Async I/O and CPU-oriented code, high-throughput & low-latency
§ Random, unstructured data (compare with Parallel and PLINQ)
§ Rx IObservable<T> support
§ Easy start: ActionBlock<T> & BufferBlock<T>
Example: a simple GZIP compression schema
[Diagram: FIFO queue → Action]
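The schema above can be sketched in a few lines. This is only a minimal illustration, assuming the System.Threading.Tasks.Dataflow package is referenced; the class name GzipPipelineSketch and the helper Compress are hypothetical and not taken from the slides.

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch: BufferBlock is the FIFO queue, ActionBlock is the Action.
public static class GzipPipelineSketch
{
    // Compresses one buffer with GZipStream and returns the compressed bytes.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            // ToArray still works after the stream has been closed by GZipStream.
            return output.ToArray();
        }
    }

    // Posts every chunk to the queue and returns how many chunks were compressed.
    public static async Task<int> RunAsync(byte[][] chunks)
    {
        var compressed = new ConcurrentQueue<byte[]>();
        var queue = new BufferBlock<byte[]>();
        var action = new ActionBlock<byte[]>(chunk => compressed.Enqueue(Compress(chunk)));

        // Completion of the queue flows on to the action block.
        queue.LinkTo(action, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var chunk in chunks) queue.Post(chunk);
        queue.Complete();          // no more input
        await action.Completion;   // wait until every chunk has been processed
        return compressed.Count;
    }
}
```

Calling GzipPipelineSketch.RunAsync with an array of buffers returns once all of them have passed through the action block.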
Interfaces - IDataflowBlock
public interface IDataflowBlock
{
    void Complete();
    void Fault(Exception error);
    Task Completion { get; }
}

// Propagate completion or a fault from one block to the next by hand
block.Completion.ContinueWith(t =>
{
    if (t.IsFaulted) ((IDataflowBlock)nextBlock).Fault(t.Exception);
    else nextBlock.Complete();
});
TPL Dataflow – crash course Fundamental Interfaces
§ IDataflowBlock – base interface, no abstract implementation
§ Completion is a Task, so the usual Task methods apply:
– Task.WhenAll
– ContinueWith
§ AggregateException carries the previous block's fault
§ CancellationToken
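As a hedged illustration of the points above, the sketch below lets the library propagate completion through the PropagateCompletion link option instead of a manual ContinueWith, and then unwraps the fault of the first block from the second block's Completion task. The class name CompletionSketch and the "bad item" message are made up for the example.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch: a fault in the first block surfaces on the second block.
public static class CompletionSketch
{
    public static async Task<string> RunAsync()
    {
        var first = new TransformBlock<int, int>(x =>
        {
            if (x == 2) throw new InvalidOperationException("bad item");
            return x;
        });
        var second = new ActionBlock<int>(x => { });

        // Complete() and Fault() are forwarded to the linked target.
        first.LinkTo(second, new DataflowLinkOptions { PropagateCompletion = true });

        first.Post(1);
        first.Post(2);
        first.Complete();

        try
        {
            await second.Completion;
            return "completed";
        }
        catch (Exception e)
        {
            // Each block wraps the incoming fault, so flatten before inspecting it.
            var aggregate = e as AggregateException;
            var root = aggregate != null ? aggregate.Flatten().InnerException : e;
            return root.Message;
        }
    }
}
```

Flattening matters because every block in the chain may add one more AggregateException layer around the original error.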
Interfaces - ITargetBlock
public interface ITargetBlock<in TInput> : IDataflowBlock
{
    DataflowMessageStatus OfferMessage(
        DataflowMessageHeader messageHeader,
        TInput messageValue,
        ISourceBlock<TInput> source,
        bool consumeToAccept);
}
§ DataflowMessageStatus:
– Accepted – OK, I'll handle it
– Declined – No, take it back
– Postponed – Maybe; please call back later :)
– NotAvailable – Tried to consume, with no luck
– DecliningPermanently – No, and don't call me anymore :(
§ bool consumeToAccept:
– When true, the target must call ConsumeMessage on the source synchronously before accepting
Interfaces - ISourceBlock
public interface ISourceBlock<out TOutput> : IDataflowBlock
{
    IDisposable LinkTo(ITargetBlock<TOutput> target, DataflowLinkOptions linkOptions);
    bool ReserveMessage( // prepare
        DataflowMessageHeader messageHeader,
        ITargetBlock<TOutput> target);
    TOutput ConsumeMessage( // commit
        DataflowMessageHeader messageHeader,
        ITargetBlock<TOutput> target,
        out bool messageConsumed);
    void ReleaseReservation( // rollback
        DataflowMessageHeader messageHeader,
        ITargetBlock<TOutput> target);
}
§ 2-phase commit protocol (a.k.a. transaction)
Advanced Interfaces
§ IPropagatorBlock<TInput, TOutput>
public interface IPropagatorBlock<in TInput, out TOutput> :
    ITargetBlock<TInput>, ISourceBlock<TOutput> { }
§ 1-by-1 linking vs 1-by-n linking
§ IReceivableSourceBlock<TOutput>
public interface IReceivableSourceBlock<TOutput> : ISourceBlock<TOutput>
{
    bool TryReceive(Predicate<TOutput> filter, out TOutput item);
    bool TryReceiveAll(out IList<TOutput> items);
}
§ Easier data processing
Pure Buffering Blocks
§ BufferBlock<T>
– FIFO queue
– Producer/Consumer
BufferBlock<FacebookDTO> dataToProcess = new BufferBlock<FacebookDTO>();
dataToProcess.Post(newRepost);            // synchronous offer, returns bool
await dataToProcess.SendAsync(newVideo);  // asynchronous offer, can be postponed
await dataToProcess.SendAsync(newLike);
TPL Dataflow – crash course Built-in Dataflow Blocks
Pure Buffering Blocks
§ BroadcastBlock<T>
– Holds only the current value; each new message overwrites it
– No receivers: the message is dropped
var bb = new BroadcastBlock<ImageDTO>(i => i);
var saveToDisk = new ActionBlock<ImageDTO>(item => item.Image.Save(item.Path));
// ActionBlock takes ExecutionDataflowBlockOptions; run the UI update on the UI thread
var showInUi = new ActionBlock<ImageDTO>(
    item => imagePanel.AddImage(item.Image),
    new ExecutionDataflowBlockOptions
    {
        TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext()
    });
bb.LinkTo(saveToDisk);
bb.LinkTo(showInUi);
Pure Buffering Blocks
§ WriteOnceBlock<T>
– Singleton
var writeOnce = new WriteOnceBlock<Lazy<Task<T>>>(i => i);
writeOnce.Post(new Lazy<Task<T>>(() => Task.Run(amadeusConnectionFactory)));
var lazyValue = await writeOnce.ReceiveAsync();
// Awaiting the Task<T> already yields the connection; no .Result is needed
var connection = await lazyValue.Value;
Executor Blocks
§ ActionBlock<TInput>
var chooser = new ActionBlock<PostDTO>(post => { Process(post); });
// Executor blocks are configured with ExecutionDataflowBlockOptions
var threeMessageAtOnce = new ExecutionDataflowBlockOptions
{
    BoundedCapacity = 3,
    TaskScheduler = TaskScheduler.Current
};
var threePerTask = new ExecutionDataflowBlockOptions { MaxMessagesPerTask = 3 };
Executor Blocks
§ TransformBlock<TInput, TOutput>
– Output queue preserves the input order
// The delegate may return Task<byte[]>; the block still outputs byte[]
var gzipper = new TransformBlock<byte[], byte[]>(b => Task.Run(() => Compress(b)));
var RSAEncryptor = new TransformBlock<byte[], byte[]>(z => RSA(z));
gzipper.LinkTo(RSAEncryptor);
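The ordering guarantee can be checked with a small sketch. It assumes the default options, under which a TransformBlock emits results in input order even when MaxDegreeOfParallelism lets later items finish first. The class name OrderingSketch and the delays are illustrative only.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch: later inputs finish sooner, yet output keeps input order.
public static class OrderingSketch
{
    public static async Task<int[]> RunAsync()
    {
        var square = new TransformBlock<int, int>(async x =>
        {
            await Task.Delay(50 - x * 10);   // item 4 waits 10 ms, item 1 waits 40 ms
            return x * x;
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        for (int i = 1; i <= 4; i++) square.Post(i);
        square.Complete();

        // Drain the output queue until the block reports that nothing is left.
        var results = new List<int>();
        while (await square.OutputAvailableAsync())
            results.Add(await square.ReceiveAsync());
        return results.ToArray();
    }
}
```

The block buffers finished results internally until every earlier item has been emitted, which is why parallelism does not reorder the output.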
Executor Blocks
§ TransformManyBlock<TInput, TOutput>
– Produces zero or more items per input message
– The delegate can return a Task
// .SelectMany() analog
var tagCloudAggregator = new TransformManyBlock<TagFromPost[], TagFromPost>(
    arrayOfTags => arrayOfTags);
var filteringTags = new TransformManyBlock<TagFromPost, TagFromPost>(
    async tag => await filter(tag) ? new[] { tag } : Enumerable.Empty<TagFromPost>());
tagCloudAggregator.LinkTo(filteringTags);
// provide info to UI
IList<TagFromPost> tagDataSource;
if (filteringTags.TryReceiveAll(out tagDataSource))
    tagCloudControl.Show(tagDataSource);
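A self-contained variant of the same idea, with plain strings instead of the slide's TagFromPost type: one block splits lines into words (many outputs per input) and a second keeps only hashtags (zero or one output per input). The class name TransformManySketch and the sample data are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch: SelectMany-style splitting followed by filtering.
public static class TransformManySketch
{
    public static async Task<List<string>> RunAsync(IEnumerable<string> lines)
    {
        // One line in, many words out.
        var splitter = new TransformManyBlock<string, string>(
            line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

        // One word in, zero or one tag out.
        var tagsOnly = new TransformManyBlock<string, string>(
            word => word.StartsWith("#") ? new[] { word } : Enumerable.Empty<string>());

        splitter.LinkTo(tagsOnly, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var line in lines) splitter.Post(line);
        splitter.Complete();

        var tags = new List<string>();
        while (await tagsOnly.OutputAvailableAsync())
            tags.Add(await tagsOnly.ReceiveAsync());
        return tags;
    }
}
```

Returning an empty sequence is how a TransformManyBlock drops a message without a separate filtering block.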
Executor Blocks
§ NullTarget<TInput>
– Recycle bin: accepts every message and discards it
Joining Blocks
§ BatchBlock<T>
– accumulate and run
// Variant 1: time-based batches, flushed every second
var batch = new BatchBlock<T>(batchSize: Int32.MaxValue);
new Timer(delegate { batch.TriggerBatch(); }).Change(1000, 1000);
// Variant 2: size-based batches of 100 items
var batch = new BatchBlock<T>(batchSize: 100);
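A small sketch of size-based batching, assuming the documented behaviour that completing a BatchBlock flushes the remaining items as a final, smaller batch. The class name BatchSketch is hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch: reports the size of every batch a BatchBlock emits.
public static class BatchSketch
{
    public static async Task<List<int>> BatchSizesAsync(int itemCount, int batchSize)
    {
        var batch = new BatchBlock<int>(batchSize);

        for (int i = 0; i < itemCount; i++) batch.Post(i);
        batch.Complete();   // flushes the incomplete last batch, if any

        var sizes = new List<int>();
        while (await batch.OutputAvailableAsync())
            sizes.Add((await batch.ReceiveAsync()).Length);
        return sizes;
    }
}
```

With seven items and a batch size of three, the block emits two full batches and then one batch holding the single leftover item.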
Joining Blocks
§ JoinBlock<T1, T2, …>
– make a Tuple
– Starvation problem
// Throttling: at most 10 SyntaxTree resources are in use at any time
var throttle = new JoinBlock<SyntaxTree, Request>();
for (int i = 0; i < 10; ++i) throttle.Target1.Post(new SyntaxTree());
var processor = new TransformBlock<Tuple<SyntaxTree, Request>, SyntaxTree>(pair =>
{
    var request = pair.Item2;
    var resource = pair.Item1;
    request.ProcessWith(resource);
    return resource;
});
throttle.LinkTo(processor);
processor.LinkTo(throttle.Target1);   // return the resource to the pool
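The pairing behaviour on its own can be sketched as follows. The example assumes the default greedy mode and uses made-up int and string inputs; JoinSketch is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch: a JoinBlock pairs the n-th item of each target into a Tuple.
public static class JoinSketch
{
    public static async Task<List<string>> RunAsync()
    {
        var join = new JoinBlock<int, string>();

        // Items can arrive on the two targets in any interleaving.
        join.Target1.Post(1);
        join.Target1.Post(2);
        join.Target2.Post("a");
        join.Target2.Post("b");

        var pairs = new List<string>();
        for (int i = 0; i < 2; i++)
        {
            var pair = await join.ReceiveAsync();
            pairs.Add(pair.Item1 + pair.Item2);
        }
        return pairs;
    }
}
```

A tuple is produced only when both targets have an item, so a slow producer on one side holds back the other side; that is the starvation risk the slide mentions.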
Joining Blocks
§ BatchedJoinBlock<T1, T2, …>
– accumulate Tuples and run
try
{
    batchedJoin.Target1.Post(DoWork());
    batchedJoin.Target2.Post(default(T2));
}
catch (Exception e)
{
    batchedJoin.Target2.Post(e);
    batchedJoin.Target1.Post(default(T1));
}
// Item1 – results from Target1
// Item2 – results from Target2
await batchedJoin.ReceiveAsync();
Configuration Options – TPL support
§ TaskScheduler & SynchronizationContext
– TaskScheduler.Default is the default
– TaskScheduler.Current is not the default
– ConcurrentExclusiveSchedulerPair
§ MaxDegreeOfParallelism
– Set through ExecutionDataflowBlockOptions
– By default a block processes one message at a time (no concurrency)
§ CancellationToken
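The effect of MaxDegreeOfParallelism can be observed with a rough sketch that counts how many delegates run at the same moment. The class name ParallelismSketch, the item count and the delay are arbitrary choices for the illustration.

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch: measures the peak number of concurrently running delegates.
public static class ParallelismSketch
{
    public static async Task<int> PeakConcurrencyAsync(int maxDegreeOfParallelism)
    {
        var gate = new object();
        int running = 0, peak = 0;

        var block = new ActionBlock<int>(async item =>
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            await Task.Delay(50);          // simulated work
            lock (gate) { running--; }
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism });

        for (int i = 0; i < 8; i++) block.Post(i);
        block.Complete();
        await block.Completion;
        return peak;
    }
}
```

With the default value of 1 the peak never exceeds one; raising the limit allows up to that many messages in flight at once.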
TPL Dataflow – crash course Advanced topics
Configuration Options – load balancing
§ MaxMessagesPerTask
– Blocks try to minimize the number of Tasks
§ MaxNumberOfGroups
– Grouping blocks complete automatically after producing that many groups
§ Greedy
– Controls how batching and joining blocks gather messages
§ BoundedCapacity
– load balancing
– queue size
var taskSchedulerPair = new ConcurrentExclusiveSchedulerPair();
var readerActions =
    from checkBox in new[] { checkBox1, checkBox2, checkBox3 }
    select new ActionBlock<int>(milliseconds =>
    {
        toggleCheckBox.Post(checkBox);
        Thread.Sleep(milliseconds);
        toggleCheckBox.Post(checkBox);
    },
    new ExecutionDataflowBlockOptions
    {
        TaskScheduler = taskSchedulerPair.ConcurrentScheduler
    });
Static extension methods – data processing
§ Choose
– Multiple sources for an action
– Only one message will be processed
§ OutputAvailableAsync
– Analog of the Stack.Peek operation
§ Post/SendAsync
– Data propagation is always asynchronous
– A message can be postponed with SendAsync, but not with Post
§ Receive(Async)
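The difference between Post and SendAsync shows up as soon as a block has a BoundedCapacity. The sketch below is a minimal illustration with a one-slot BufferBlock; the class name PostVsSendSketch and the returned summary string are made up for the example.

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch: Post is declined by a full block, SendAsync is postponed.
public static class PostVsSendSketch
{
    public static async Task<string> RunAsync()
    {
        var buffer = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 1 });

        bool firstPost = buffer.Post(1);             // accepted, the only slot is now taken
        bool secondPost = buffer.Post(2);            // declined at once, the item is lost
        Task<bool> pending = buffer.SendAsync(3);    // postponed until a slot frees up
        bool completedEarly = pending.IsCompleted;   // still waiting at this point

        int first = buffer.Receive();                // frees the slot
        bool sent = await pending;                   // the postponed message is now accepted
        int second = await buffer.ReceiveAsync();

        return string.Join(",", firstPost, secondPost, completedEarly, first, sent, second);
    }
}
```

This is why SendAsync is the usual choice for producers that must not drop data when the consumer falls behind.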
Static extension methods – outer API
§ Encapsulate (propagator block factory method)
§ LinkTo
– Filter the message propagation
– Link options
– Do not confuse with the ISourceBlock<T> method
§ AsObservable / AsObserver
– Rx extension support, no holy war here
new DataflowLinkOptions
{
    MaxMessages = 1,
    Append = false,
    PropagateCompletion = true
}
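Filtering with the LinkTo extension method can be sketched like this: the first link takes only messages that match a predicate, and a second, unfiltered link picks up whatever the first one declined. The class name LinkFilterSketch and the even/odd split are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch: routes even numbers to one block and the rest to another.
public static class LinkFilterSketch
{
    public static async Task<string> RunAsync()
    {
        var evens = new List<int>();
        var odds = new List<int>();

        var source = new BufferBlock<int>();
        var evenBlock = new ActionBlock<int>(x => evens.Add(x));
        var oddBlock = new ActionBlock<int>(x => odds.Add(x));

        var options = new DataflowLinkOptions { PropagateCompletion = true };
        source.LinkTo(evenBlock, options, x => x % 2 == 0);  // filtered link, offered first
        source.LinkTo(oddBlock, options);                    // catches what the filter declined

        for (int i = 1; i <= 5; i++) source.Post(i);
        source.Complete();
        await Task.WhenAll(evenBlock.Completion, oddBlock.Completion);

        return string.Join(",", evens) + "|" + string.Join(",", odds);
    }
}
```

Without the second link, a message rejected by every filter would stay at the head of the source and block everything behind it, so a catch-all target is a common safeguard.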
Deep inside
§ Implement your own block
§ Advanced debug info with DebuggerDisplayAttribute
§ Chapter 4 of Concurrency in C# Cookbook by Stephen Cleary
Gathering it up
Gains
§ Thread safety
§ Structured dataflow
§ Async, TPL-oriented
§ Rx extension support
§ Easy migration of CCR-oriented code
Losses
§ Another abstraction layer
§ Hard to debug when the dataflow is complicated
§ Too many generics
When to use
§ You have random data that must be ordered
§ CPU and I/O operations
§ You can parallelize work
TPL Dataflow – crash course Ending
I am at your disposal in case of any questions or doubts
Mikhail Veselov
Moscow
+7 911 951 42 98