Friday, November 17, 2017

Taming the stragglers in Google Cloud Dataflow

I'm currently benchmarking Flink against Google Cloud Dataflow using the same Apache Beam pipeline for quantitative analytics. One thing I've observed with Flink is high tail latency on some shards: a few stragglers keep running long after the rest have finished.

Google Cloud Dataflow can optimise away stragglers in large jobs using "Dynamic Work Rebalancing", which splits the unprocessed remainder of a slow shard and hands it to idle workers. As far as I know, Flink is currently unable to perform similar optimisations.
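
To make the effect concrete, here is a toy simulation of why splitting a straggler's remaining work shortens a job. It is only a sketch under my own assumptions, not how Dataflow actually implements rebalancing: the function names are made up for illustration, every unit of work costs one tick, and an idle worker simply takes half of whatever the busiest worker has left.

```python
def static_makespan(shards):
    """Ticks to finish when each worker keeps its own shard to the end."""
    return max(shards)


def rebalanced_makespan(shards):
    """Ticks to finish when idle workers take half of the busiest worker's
    remaining work (a toy model, not Dataflow's real algorithm)."""
    remaining = list(shards)
    ticks = 0
    while any(r > 0 for r in remaining):
        # Rebalance: each idle worker splits the largest remaining shard.
        for i in range(len(remaining)):
            if remaining[i] == 0:
                busiest = max(range(len(remaining)), key=remaining.__getitem__)
                if remaining[busiest] > 1:
                    stolen = remaining[busiest] // 2
                    remaining[busiest] -= stolen
                    remaining[i] = stolen
        # Process: every worker that has work completes one unit this tick.
        remaining = [max(r - 1, 0) for r in remaining]
        ticks += 1
    return ticks


if __name__ == "__main__":
    shards = [10, 10, 10, 100]  # one straggler shard, hypothetical sizes
    print("static:    ", static_makespan(shards))
    print("rebalanced:", rebalanced_makespan(shards))
```

With static shards the job takes as long as its slowest shard, however many workers sit idle. With splitting, the finish time moves towards the total work divided by the number of workers, which is the best any scheduler could do in this model.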