
Resilient Datacenter Load Balancing in the Wild
Hong Zhang¹, Junxue Zhang¹, Wei Bai¹, Kai Chen¹, Mosharaf Chowdhury²

Background
Datacenter networks are multi-rooted trees (e.g., Fat-tree, leaf-spine) with multiple paths between each pair of end hosts. Precise load balancing is required, but it is difficult because datacenters are filled with uncertainties.
(Switch and server icon source: CONGA [SIGCOMM’14])

Uncertainties in Datacenter Networks: Traffic Dynamics
Congestion can arise quickly anywhere in the network.

Uncertainties in Datacenter Networks: Asymmetries
Asymmetries arise from link cuts and heterogeneous devices.
(Figure: spine and leaf switches connected by a mix of 40G and 10G links, with one link cut.)

Uncertainties in Datacenter Networks: Switch Failures
Packet blackholes: a switch deterministically drops packets that match certain patterns.
Silent random packet drops: a switch randomly drops packets at a high rate.
Both are forms of ‘gray failure’.

Uncertainties in Datacenter Networks: Summary
Uncertainties include traffic dynamics, asymmetries (e.g., link cuts), and switch failures (e.g., gray failures).
How can we load balance traffic effectively and appropriately?

Sensing and Reacting to Uncertainties
Sensing uncertainties: efficiently sense congestion and failures.
Reacting to uncertainties: appropriately split traffic among parallel paths in response to uncertainties.
Prior art has important drawbacks in both.

Sensing Uncertainties: Current Practice
Sensing congestion:
Congestion-oblivious (poor under asymmetry): ECMP, RPS [INFOCOM’09], DRB [CoNEXT’13], Presto [SIGCOMM’15]
Congestion-aware, switch-based (requires advanced hardware): CONGA [SIGCOMM’14], HULA [SOSR’16], DRILL* [SIGCOMM’17]
Congestion-aware, end host-based (limited visibility): CLOVE-ECN [HotNets’16]
Sensing failures: most current solutions do not sense failures.

Problem of Being Failure-ignorant
(Figure, animated over three slides: leaf L0's path table toward destination leaf L1, listing the paths via spines S0 and S1, while S1 silently drops packets at random.)
A failure-ignorant scheme can perform even worse than ECMP under failures.

Reacting to Uncertainties: Current Practice
Problem of flowlet switching (CONGA [SIGCOMM’14], CLOVE [HotNets’16], LetFlow [NSDI’17]): a flow can switch paths only at a flowlet gap, which is passive and conservative in order to preserve packet order.

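Reacting to Uncertainties: Current Practice
Problem of flowlet switching, by example: between leaves L0 and L1, flows A and B use path P1 while flows C and D use path P2.
Ideal: once flows A and B finish on P1, flow C is rerouted from P2 to P1, and then flows C and D finish.
CONGA (flowlet switching) with DCTCP: after flows A and B finish, flow C cannot find a flowlet gap, so flows C and D keep sharing P2 until they finish.
Flowlet switching cannot always react to uncertainties in a timely manner.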
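The flowlet rule behind this example can be sketched as below. This is a minimal illustration, not CONGA's or LetFlow's actual implementation: `FlowletSwitch`, the `gap_us` timeout, and the `best_path` argument (standing in for whatever path the congestion-aware logic prefers) are all hypothetical. It shows why a continuously sending flow like C never gets a chance to move.

```python
class FlowletSwitch:
    """Toy flowlet switch: a flow may change path only after an idle gap."""

    def __init__(self, gap_us):
        self.gap_us = gap_us      # hypothetical flowlet timeout, in microseconds
        self.last_seen = {}       # flow id -> arrival time of its previous packet
        self.path = {}            # flow id -> path currently pinned for the flow

    def route(self, flow_id, now_us, best_path):
        """Return the path for a packet of `flow_id` arriving at `now_us`.

        `best_path` stands in for whatever path the congestion-aware logic
        currently prefers; it is only adopted at a flowlet boundary.
        """
        last = self.last_seen.get(flow_id)
        if last is None or now_us - last > self.gap_us:
            # Idle gap exceeded: a new flowlet may take a new path
            # without reordering packets already in flight.
            self.path[flow_id] = best_path
        self.last_seen[flow_id] = now_us
        return self.path[flow_id]


sw = FlowletSwitch(gap_us=500)

# Flow C sends a packet every 10 us; P1 becomes the better path at t = 500 us.
paths_c = [sw.route("C", t, "P2" if t < 500 else "P1") for t in range(0, 1000, 10)]
print(set(paths_c))  # {'P2'}: C never pauses long enough to move

# A bursty flow with a 700 us pause does get moved.
print(sw.route("D", 0, "P2"), sw.route("D", 700, "P1"))  # P2 P1
```

The gap check is what preserves packet order, and it is also what makes the scheme passive: rerouting depends on the traffic pattern rather than on how much rerouting would help.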

Reacting to Uncertainties: Current Practice
Problem of vigorous rerouting: packet reordering and congestion mismatch.
What is congestion mismatch? Congestion control adjusts the sending rate based on congestion on the current path. With vigorous rerouting, the congestion states of different paths are mixed together, so congestion on one path may mistakenly be used to adjust the rate on another path.

Reacting to Uncertainties: Current Practice
Example of congestion mismatch with flowcells (Presto [SIGCOMM’15]), i.e., fixed-size data units sprayed across paths.
(Figure, animated over three slides: flow A, using DCTCP, goes from leaf L0 to leaf L1; its flowcells alternate between a 10G path via spine S0 and a 1G path via spine S1.)
Starting with a high sending rate, the rate keeps increasing and the 1G path suffers severe queue build-up.
Starting with a low sending rate, congestion on the 1G path reduces the rate greatly, and the flow cannot fully utilize the 10Gbps path.
Congestion mismatch leads to performance loss.
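A toy simulation makes the mismatch concrete. It assumes one shared rate controller (a simplified AIMD rule standing in for DCTCP, not DCTCP itself), flowcells that strictly alternate between a 10G and a 1G path, and ECN-like feedback whenever the path that carried the last flowcell has a standing queue. The function `simulate` and its parameters are hypothetical.

```python
def simulate(rounds, start_rate, caps=(10.0, 1.0), ai=1.0):
    """Toy congestion-mismatch model (assumed, simplified AIMD controller).

    One flow, one shared rate; flowcells alternate between path 0 (fast)
    and path 1 (slow) with capacities `caps` in Gbps. Returns the rates used
    on the fast path and the slow path's backlog after each of its rounds.
    """
    backlog = [0.0, 0.0]
    rate = start_rate
    fast_rates, slow_backlog = [], []
    for i in range(rounds):
        p = i % 2                      # spray: alternate paths every flowcell
        # Queue grows by the excess over capacity, drains by the spare capacity.
        backlog[p] = max(0.0, backlog[p] + rate - caps[p])
        if p == 0:
            fast_rates.append(rate)
        else:
            slow_backlog.append(backlog[p])
        # Feedback from *this* path drives the *shared* rate: the mismatch.
        if backlog[p] > 0:
            rate *= 0.5                # multiplicative decrease on a mark
        else:
            rate += ai                 # additive increase otherwise
    return fast_rates, slow_backlog


fast, slow = simulate(rounds=40, start_rate=8.0)
print(max(fast))            # 8.0: the fast path is never driven near 10G
print(round(fast[-1], 3))   # 1.0: the rate settles far below 10G
print(slow[-1] > slow[0])   # True: the 1G queue keeps building
```

Even in this crude model both symptoms from the slides appear at once: the 10G path is underused because of marks from the 1G path, while every flowcell sent on the 1G path still exceeds its capacity.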

Q: Can we design a resilient load balancing scheme that gracefully handles all these uncertainties?
Comprehensiveness: effectively detect congestion and failures.
Timeliness: quickly react to uncertainties.
Transport-friendliness: limit the impact of reordering and congestion mismatch.
Deployability: implementable with commodity hardware.
A: Hermes

Hermes in One Slide
End host-based: no hardware or kernel modification.
(Figure: in each end host's hypervisor, a sensing module senses congestion and failures from network traffic and active probes, and feeds a (re)routing module that decides when and where to reroute.)
Comprehensive sensing: leverages transport-layer signals and events, plus active probing at small cost.
Timely yet cautious rerouting: explicitly considers both the cost and the gain of rerouting.

Comprehensive Sensing
Idea 1: leverage transport-level signals and events.
Sensing congestion: ECN and RTT are widely used in congestion control and are directly observable.
Sensing failures: frequent timeouts indicate a packet blackhole, and frequent retransmissions indicate random packet drops; such paths are marked as failed.
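A minimal sketch of how these signals could be turned into a per-path verdict. The thresholds, the `PathStats` fields, and the rule that ignores retransmissions accompanied by heavy ECN marking are all assumptions for illustration, not values or logic taken from Hermes.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    ecn_fraction: float   # fraction of recent packets carrying ECN marks
    rtt_us: float         # smoothed RTT on this path, in microseconds
    timeouts: int         # retransmission timeouts seen in the window
    retx_fraction: float  # fraction of packets retransmitted in the window


# Hypothetical thresholds, chosen only for illustration.
ECN_THRESH = 0.4
RTT_THRESH_US = 300.0
TIMEOUT_THRESH = 3
RETX_THRESH = 0.01


def classify(s: PathStats) -> str:
    """Label a path from transport-level signals and events."""
    # Frequent timeouts: packets vanish entirely, as in a blackhole.
    if s.timeouts >= TIMEOUT_THRESH:
        return "blackhole"
    # Frequent retransmissions without heavy ECN marking (assumption:
    # losses with heavy marking are more likely congestion drops).
    if s.retx_fraction > RETX_THRESH and s.ecn_fraction < ECN_THRESH:
        return "random_drop"
    # Otherwise fall back to congestion signals.
    if s.ecn_fraction >= ECN_THRESH or s.rtt_us >= RTT_THRESH_US:
        return "congested"
    return "good"


print(classify(PathStats(0.0, 100.0, 5, 0.0)))    # blackhole
print(classify(PathStats(0.05, 120.0, 0, 0.03)))  # random_drop
print(classify(PathStats(0.6, 500.0, 0, 0.0)))    # congested
```

Checking failures before congestion matters: a path with random drops may carry little traffic and therefore look uncongested by ECN and RTT alone.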

Idea 2: improve visibility via active probing.
Baseline: probe all paths for all end-host pairs.
Power of two choices: probe 2 random paths plus the previous best path.
This sacrifices some visibility for much smaller probing overhead.
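The power-of-two-choices probing rule can be sketched as follows. `probe_targets`, `pick_best`, and the RTT table are hypothetical; a real prober would send probe packets and measure replies, and excluding the previous best path from the random draw is an assumption made here so that three distinct paths are probed.

```python
import random


def probe_targets(paths, prev_best, rng, k=2):
    """Choose k random paths plus the previous best path to probe."""
    others = [p for p in paths if p != prev_best]
    chosen = rng.sample(others, min(k, len(others)))
    return chosen + ([prev_best] if prev_best is not None else [])


def pick_best(probed_rtts):
    """Return the probed path with the lowest measured RTT."""
    return min(probed_rtts, key=probed_rtts.get)


rng = random.Random(42)
paths = [f"S{i}" for i in range(8)]                      # e.g., 8 spines
fake_rtt = {p: 100 + 10 * i for i, p in enumerate(paths)}  # hypothetical measurements

best = "S0"
targets = probe_targets(paths, best, rng)
best = pick_best({p: fake_rtt[p] for p in targets})
print(len(targets), "probes instead of", len(paths))     # 3 probes instead of 8
```

Keeping the previous best path in every round means a good path is never lost, while the two random choices let a better path be discovered over time.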

Timely yet Cautious Rerouting
When to reroute? Flowlet switching is too conservative for timely reaction; vigorous switching is too aggressive to be transport-friendly.
Can we achieve a better trade-off by explicitly considering both the cost and the gain of rerouting?
A new angle: utility-based rerouting, i.e., reroute when it is likely to be beneficial, weighing final performance against intermediate consequences. Both are estimated from path conditions and flow status obtained through comprehensive sensing.

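Timely yet Cautious Rerouting
A simplified cost-benefit assessment for rerouting.
(Figure: sending rate vs. time for a flow with a given remaining size. Without rerouting, the flow keeps rate R1 and finishes at T1. With rerouting, its rate first drops to 0.5R1 and then rises to R2 on the new path, finishing at T2.)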

Motivation for timely rerouting: rerouting can be beneficial even with packet reordering. Reroute immediately as long as rerouting is likely to reduce flow completion time, which enables quick reaction to uncertainties.
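The assessment can be written out under an assumed model: with remaining size S, staying finishes at T1 = S / R1; rerouting sends at 0.5R1 for a hypothetical recovery period d and then at R2, so T2 = d + (S - 0.5R1·d) / R2 (or S / (0.5R1) if the flow finishes during the dip). The period d and the function names are illustrative, not from the slides. Units: sizes in Mb and rates in Gbps give times in ms.

```python
def completion_time_stay(remaining, r1):
    """T1: keep sending the remaining data at the current rate r1."""
    return remaining / r1


def completion_time_reroute(remaining, r1, r2, dip):
    """T2 under an assumed model: 0.5*r1 for `dip` time units, then r2."""
    slow_rate = 0.5 * r1
    if remaining <= slow_rate * dip:
        return remaining / slow_rate      # flow finishes during the dip
    return dip + (remaining - slow_rate * dip) / r2


def should_reroute(remaining, r1, r2, dip):
    """Reroute only if it is expected to shorten flow completion time."""
    return completion_time_reroute(remaining, r1, r2, dip) < completion_time_stay(remaining, r1)


# 100 Mb left at 1 Gbps, new path offers 4 Gbps, 10 ms penalty:
print(completion_time_stay(100, 1), completion_time_reroute(100, 1, 4, 10))  # 100.0 33.75
print(should_reroute(100, 1, 4, 10))  # True
print(should_reroute(2, 1, 4, 10))    # False: too little data to recoup the dip
```

The second call previews why small remaining sizes are poor rerouting candidates: the flow spends its whole life paying the rerouting cost.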

Heuristics for cautious rerouting
(Figure: the same cost-benefit assessment, showing how an estimation error in R2 can change the outcome of rerouting.)
Heuristic 1: reroute only if the new path is notably better in terms of ECN and RTT.

Heuristic 2: avoid rerouting flows with a small remaining size, since they have little time left to recover the cost of rerouting.

Heuristic 3: avoid rerouting flows with a high sending rate R1.
Together, these heuristics limit the impact of congestion mismatch and packet reordering.
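The three heuristics can be combined into a single guard, sketched below. Every margin and threshold here (`ECN_MARGIN`, `RTT_MARGIN_US`, `MIN_REMAINING_BYTES`, `MAX_RATE_FRACTION`) is a hypothetical knob for illustration; the slides do not give values, and requiring both ECN and RTT to improve is one reading of "notably better in terms of ECN and RTT".

```python
from dataclasses import dataclass


@dataclass
class PathView:
    ecn_fraction: float  # fraction of ECN-marked packets on the path
    rtt_us: float        # RTT on the path, in microseconds


# Hypothetical knobs, not values from the slides.
ECN_MARGIN = 0.05            # new path must be this much less marked...
RTT_MARGIN_US = 50.0         # ...and this much faster in RTT
MIN_REMAINING_BYTES = 100_000
MAX_RATE_FRACTION = 0.3      # of the link capacity


def cautious_reroute(cur, new, remaining_bytes, rate_gbps, link_gbps):
    # Heuristic 1: the new path must be notably better, absorbing estimation error.
    notably_better = (new.ecn_fraction <= cur.ecn_fraction - ECN_MARGIN
                      and new.rtt_us <= cur.rtt_us - RTT_MARGIN_US)
    # Heuristic 2: small flows cannot amortize the cost of rerouting.
    big_enough = remaining_bytes >= MIN_REMAINING_BYTES
    # Heuristic 3: high-rate flows are already doing well and would
    # carry a large rate onto the new path.
    not_too_fast = rate_gbps <= MAX_RATE_FRACTION * link_gbps
    return notably_better and big_enough and not_too_fast


cur, new = PathView(0.5, 400.0), PathView(0.1, 100.0)
print(cautious_reroute(cur, new, 1_000_000, 1.0, 10.0))  # True
print(cautious_reroute(cur, new, 1_000, 1.0, 10.0))      # False: too little left
print(cautious_reroute(cur, new, 1_000_000, 5.0, 10.0))  # False: rate too high
```

In a full scheme this guard would sit alongside the cost-benefit estimate: the estimate says when rerouting helps, and the guard keeps estimation error, reordering, and congestion mismatch from turning a predicted gain into a loss.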

Evaluation Settings
Workloads: web-search and data-mining. (Figure: CDF of flow size in bytes, from 10^1 to 10^9, for both workloads.)
Transport protocol: DCTCP.
Testbed evaluations: 12 servers and 4 switches in a 2×2 leaf-spine with a 3:2 oversubscription ratio.
Large-scale simulations: 128 servers and 16 switches in an 8×8 leaf-spine with a 2:1 oversubscription ratio.

Evaluation Results: Hermes under the baseline topology (8×8 leaf-spine)
(Figure: FCT (ms) vs. load (30% to 90%) for ECMP, CONGA, and Hermes, one panel per workload.)
Web-search workload: Hermes outperforms ECMP by up to 55% and is within 17% of CONGA; the switch-based solution has better visibility into congestion.
Data-mining workload: Hermes is 29% better than ECMP at high load and slightly outperforms CONGA (by up to 4%); this workload is more heavy-tailed and less bursty, which makes it harder to create flowlet gaps.

Evaluation Results: Hermes under the asymmetric case (data-mining workload)
Setup: the capacity of 20% of randomly selected leaf-to-spine links is reduced from 10Gbps to 2Gbps.
(Figure: FCT normalized to Hermes, 0.5 to 1.5, vs. load (30% to 90%) for CLOVE-ECN, CONGA, Presto*, LetFlow, and Hermes.)
Presto* (weighted): congestion-oblivious, thus not efficient against asymmetry.
LetFlow and CLOVE-ECN: Hermes has better visibility and reacts more promptly.
CONGA: Hermes resolves collisions of large flows on the 2Gbps links more promptly.

Evaluation Results: Hermes under switch failures
(Figure, two panels of FCT vs. load (30% to 70%) for ECMP, Presto*, CONGA, LetFlow, and Hermes: silent random packet drops, FCT in ms; packet blackhole, FCT in ms on a log scale.)
Hermes outperforms the other schemes by over 32%.

Conclusion
Datacenters are filled with uncertainties.
Hermes: a resilient load balancing scheme that gracefully handles these uncertainties.
Readily deployable at end hosts.
Sensing: congestion- and failure-aware, with improved visibility.
Reacting: timely and cautious rerouting.

Thank You!