OpenNF: Enabling Innovation in Network Function Control Aaron Gember-Jacobson, Chaithan Prakash, Raajay Viswanathan, Robert Grandl, Junaid Khalid, Sourav Das, Aditya Akella 1

Network functions (NFs): perform sophisticated stateful actions on packets/flows. Examples: WAN optimizer, caching proxy, intrusion detection system (IDS) 2

NF trends. NFV: dynamically allocate NF instances (Xen/KVM). SDN: dynamically reroute flows. Together: dynamic reallocation of packet processing across NFs such as a WAN optimizer, caching proxy, or intrusion detection system (IDS) 3

Example: elastic NF scaling. Goals: 1. Satisfy performance SLAs 2. Minimize operating costs 3. Accurately monitor traffic [Figure labels: CPU, packet loss] 4

Problem: NFV + SDN is insufficient (example: elastic NF scaling). To simultaneously 1. Satisfy performance SLAs 2. Minimize operating costs 3. Accurately monitor traffic. Cannot effectively implement new services or abstractions! 5

Why NFV + SDN falls short. Goals: 1. SLAs 2. Cost 3. Accuracy (packet loss SLA: 1%). Options considered: reroute new flows [Stratos, arXiv:1305.0209]; reroute existing flows [SIMPLE, SIGCOMM '13]; wait for flows to die [Stratos, arXiv:1305.0209] 6

SLAs + cost + accuracy: What do we need? Quickly move, copy, or share internal NF state alongside updates to network forwarding state. Guarantees: loss-free, order-preserving, ... Also applies to other scenarios 7

Outline Motivation and requirements Challenges OpenNF architecture – State export/import – State operations – Guarantees Evaluation 8

Challenges 1. Supporting many NFs with minimal changes 2. Dealing with race conditions [Figure labels: packet, state, route update] 3. Bounding overhead 9

OpenNF overview. Control Application: move/copy/share state. OpenNF Controller: NF State Manager (export/import state to and from NFs) and Flow Manager 10

NF state taxonomy. State created or updated by an NF applies to either a single flow or a collection of flows. Per-flow state: Connection, TcpAnalyzer, HttpAnalyzer. Multi-flow state: ConnCount. All-flows state: Statistics 11

NF API: export/import state. Functions: get, put, delete. Each call takes a scope (per-flow, multi-flow, or all-flows) and a filter. No need to expose/change internal state organization! 12
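A minimal Python sketch of the export/import idea, assuming a toy NF whose internal state is a private dictionary. The names `ToyNF`, `Flow`, and `matches`, the filter-as-dict convention, and the restriction to per-flow scope are all illustrative assumptions, not OpenNF's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow identifier; real NFs use richer header fields."""
    src: str
    dst_port: int


def matches(flow, flt):
    """True if every field named in the filter equals the flow's value."""
    return all(getattr(flow, key) == value for key, value in flt.items())


class ToyNF:
    """Toy NF: callers see only get/put/delete, never the internal layout."""

    def __init__(self, name):
        self.name = name
        self._conns = {}  # internal per-flow state: Flow -> state chunk

    def _check(self, scope):
        if scope != "per":  # this sketch handles per-flow scope only
            raise NotImplementedError(scope)

    def get(self, scope, flt):
        """Export (flow, chunk) pairs whose flow matches the filter."""
        self._check(scope)
        return [(f, c) for f, c in self._conns.items() if matches(f, flt)]

    def put(self, scope, flow, chunk):
        """Import one chunk of state for the given flow."""
        self._check(scope)
        self._conns[flow] = chunk

    def delete(self, scope, flt):
        """Remove state whose flow matches the filter."""
        self._check(scope)
        for flow in [f for f in self._conns if matches(f, flt)]:
            del self._conns[flow]
```

Because callers only pass a scope and a filter, the NF is free to reorganize `_conns` without affecting the controller side.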

Control operations: move. The control application calls move(port 80, Bro1, Bro2). The NF State Manager issues get(per, port 80) on Bro1, which returns [Chunk1] and [Chunk2], then del(per, port 80) on Bro1, then put(per, Chunk1) and put(per, Chunk2) on Bro2. The Flow Manager then issues forward(port 80, Bro2). Also provide copy and share 13
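The move sequence on this slide can be sketched in a few lines of Python. This is a simplified illustration with no guarantees (no events, no buffering); `ToyNF`, the `routes` dictionary standing in for switch forwarding state, and the predicate-style filter are assumptions made for the example.

```python
class ToyNF:
    """Toy NF holding per-flow state chunks keyed by a flow tuple."""

    def __init__(self, name):
        self.name = name
        self.state = {}  # (client, dst_port) -> state chunk

    def get(self, pred):
        return {f: c for f, c in self.state.items() if pred(f)}

    def delete(self, pred):
        for flow in [f for f in self.state if pred(f)]:
            del self.state[flow]

    def put(self, flow, chunk):
        self.state[flow] = chunk


def move(pred, src, dst, routes):
    """Move matching per-flow state from src to dst, then reroute flows."""
    chunks = src.get(pred)              # get(per, filter) on the source
    src.delete(pred)                    # del(per, filter) on the source
    for flow, chunk in chunks.items():  # put(per, chunk) on the destination
        dst.put(flow, chunk)
    for flow in list(routes):           # forward(filter, destination)
        if pred(flow):
            routes[flow] = dst.name
```

For example, `move(lambda f: f[1] == 80, bro1, bro2, routes)` transfers only the port-80 chunks and leaves other flows on `bro1`.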

Lost updates during move. [Figure: during move(red, Bro1, Bro2), packets R1-R3 arrive around the state transfer; annotations: "Missing state" and "Missing updates" (detectMHR)] Loss-free: All state updates should be reflected in the transferred state, and all packets should be processed. Split/Merge [NSDI '13]: pause traffic, buffer packets; packets in-transit when buffering starts are dropped 14

NF API: observe/prevent updates using events. Only need to change an NF's receive packet function! 15

Use events for loss-free move 1. enableEvents(red,drop) on Bro1 2. get/delete on Bro1 3. Buffer events at controller 4. put on Bro2 5. Flush packets in events to Bro2 6. Update forwarding 16
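The six steps above can be simulated with a small Python sketch. It assumes per-flow state is just a packet counter, that an NF with events enabled hands matching packets to the controller instead of processing them, and that `in_flight` models packets that still reach the source during the transfer. All class and parameter names are hypothetical.

```python
class ToyNF:
    """Toy NF whose per-flow state is a packet counter."""

    def __init__(self, name):
        self.name = name
        self.counts = {}          # flow -> packets processed
        self.event_filter = None  # predicate selecting flows that raise events
        self.event_sink = None    # controller-side list receiving events

    def enable_events(self, pred, sink):
        self.event_filter, self.event_sink = pred, sink

    def receive(self, flow, pkt):
        # Only the receive function changes: matching packets become events
        # and are dropped locally (the "drop" action).
        if self.event_filter and self.event_filter(flow):
            self.event_sink.append((flow, pkt))
            return
        self.counts[flow] = self.counts.get(flow, 0) + 1


def loss_free_move(pred, src, dst, routes, in_flight=()):
    events = []
    src.enable_events(pred, events)                      # 1. enableEvents
    chunks = {f: c for f, c in src.counts.items() if pred(f)}
    for flow in chunks:                                  # 2. get/delete on src
        del src.counts[flow]
    for flow, pkt in in_flight:                          # 3. late packets reach
        src.receive(flow, pkt)                           #    src; events buffered
    for flow, count in chunks.items():                   # 4. put on dst
        dst.counts[flow] = count
    for flow, pkt in events:                             # 5. flush events to dst
        dst.receive(flow, pkt)
    for flow in list(routes):                            # 6. update forwarding
        if pred(flow):
            routes[flow] = dst.name
```

The point of the sketch is that packets arriving at the source after its state was exported are not lost: they are replayed at the destination, so the moved counter reflects every packet.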

Re-ordering of packets. False positives from Bro's weird script. [Figure: while the controller flushes its buffer to Bro2 (step 5) and requests the forwarding update (step 6), packets forwarded directly by the switch can reach Bro2 ahead of the flushed ones] Order-preserving: All packets should be processed in the order they were forwarded by the switch 17

OpenNF: SLAs + cost + accuracy 1. Dealing with diversity: export/import state based on its association with flows 2. Dealing with race conditions: events; lock-step forwarding updates 18

Implementation. Controller (3.8K lines of Java). Communication library (2.6K lines of C). Modified NFs (3-8% increase in code): Bro IDS, iptables, Squid Cache, PRADS 19

Overall benefits for elastic scaling Bro IDS processing 10K pkts/sec – At 180 sec: move HTTP flows (489) to new IDS – At 360 sec: move back to old IDS SLAs: 260ms to move (loss-free) Accuracy: same log entries as using one IDS – VM replication: incorrect log entries Cost: scale down after state is moved – Stratos: scale down delayed 25 minutes [arXiv:1305.0209] 20

Evaluation: state export/import Serialization/deserialization costs dominate Cost grows with state complexity 21

Evaluation: operations. PRADS asset detector processing 5K pkts/sec; move per-flow state for 500 flows. [Figure: move time (ms) and per-packet latency increase (ms; average and maximum) for move variants labeled NG, LF, OP, PL, ER; bar labels 686 and 462. Annotations: "Packets dropped!"; "Bro: 5% of alerts missed!"; 881 packets in events; 838 pkts in events; 1120 pkts buffered] Operations are efficient, but guarantees come at a cost! 22

Conclusion. Dynamic reallocation of packet processing enables new services. Realizing SLAs + cost + accuracy requires quick, safe control of internal NF state. OpenNF provides flexible and efficient control with few NF modifications. http://opennf.cs.wisc.edu 23

Backup Related work Copy and share Order-preserving move Bounding overhead Example control application Evaluation: controller scalability Evaluation: importance of guarantees Evaluation: benefits of granular control 24

Existing approaches. Virtual machine replication – Unneeded state => incorrect actions – Cannot combine => limited reallocation. Split/Merge [NSDI'13] – State allocations and accesses occur via library – Addresses a specific problem => limited suitability – Packets may be dropped or re-ordered => wrong NF behavior 25

Copy and share operations. Used when multiple instances need some state. Copy – no or eventual consistency – Once, periodically, based on events, etc. Share – strong or strict consistency – Events are raised for all packets – Events are released one at a time – State is copied before releasing the next event. Measured cost: Copy (multi-flow): 111ms; Share (strong): 13ms/packet 26
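A rough Python sketch of the share behavior described above: events are released one at a time, and the updated state is copied to the other instances before the next event is released. The `Inst` class, the connection-counter state, and the `(target, host)` event format are assumptions for illustration only.

```python
class Inst:
    """Toy NF instance with multi-flow state: connections per host."""

    def __init__(self, name):
        self.name = name
        self.conn_count = {}  # host -> connection count

    def process(self, host):
        self.conn_count[host] = self.conn_count.get(host, 0) + 1


def share_strong(events, instances):
    """Release events one at a time; sync state before the next release."""
    for target, host in events:
        inst = instances[target]
        inst.process(host)                  # release a single event
        for other in instances.values():    # copy state before the next one
            if other is not inst:
                other.conn_count = dict(inst.conn_count)
```

Serializing every packet through this loop is what makes sharing costly per packet, which is consistent with the per-packet cost quoted on the slide.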

Order-preserving move: flush packets in events to Inst2; enableEvents(blue,buffer) on Inst2; forwarding update: send to Inst1 & controller; wait for packet from switch (remember last); forwarding update: send to Inst2; wait for event for last packet from Inst2; release buffer of packets on Inst2 27
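The core idea behind the buffering step can be illustrated with a deliberately simplified Python sketch. It omits the two-phase forwarding update and only shows why holding directly forwarded packets at the destination restores order. The `Dst` class and the `from_controller` flag (marking packets relayed by the controller so they bypass the buffer) are assumptions of this example.

```python
class Dst:
    """Toy destination instance that can hold packets sent by the switch."""

    def __init__(self):
        self.processed = []   # packets in the order they were processed
        self.held = []        # packets buffered while events are enabled
        self.buffering = False

    def enable_buffering(self):
        self.buffering = True

    def receive(self, pkt, from_controller=False):
        # Packets relayed by the controller are processed immediately;
        # packets arriving straight from the switch wait in the buffer.
        if self.buffering and not from_controller:
            self.held.append(pkt)
        else:
            self.processed.append(pkt)

    def release(self):
        """Stop buffering and process held packets in arrival order."""
        self.buffering = False
        self.processed.extend(self.held)
        self.held.clear()
```

If B3 arrives from the switch before the controller has relayed B1 and B2, an instance without buffering processes B3 first, while a buffering instance processes B1, B2, B3 once `release()` is called.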

Bounding overhead. Applications decide (based on NF & objectives): 1. Granularity of operations (filter; scope: per, multi, all) 2. Guarantees desired (none, LF, LF+OP) 28

Example app: elastic NF scaling (Bro scripts: scan.bro, vulnerable.bro, weird.bro) 29
movePrefix(prefix, oldInst, newInst):
    copy(oldInst, newInst, {nw_src: prefix}, multi)
    move(oldInst, newInst, {nw_src: prefix}, per, LF+OP)
    while (true):
        sleep(60)
        copy(oldInst, newInst, {nw_src: prefix}, multi)
        copy(newInst, oldInst, {nw_src: prefix}, multi)
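The control application on this slide could be written in Python roughly as follows. This is a sketch against a hypothetical controller object: `RecordingController` only logs calls, and the `rounds`, `period`, and `sleep` parameters are additions so the otherwise endless loop can be bounded and tested.

```python
import time


class RecordingController:
    """Stand-in controller that records requested operations."""

    def __init__(self):
        self.calls = []

    def copy(self, src, dst, flt, scope):
        self.calls.append(("copy", src, dst, scope))

    def move(self, src, dst, flt, scope, guarantees=()):
        self.calls.append(("move", src, dst, scope, tuple(guarantees)))


def move_prefix(ctrl, prefix, old_inst, new_inst,
                rounds=None, period=60, sleep=time.sleep):
    flt = {"nw_src": prefix}
    # Seed the new instance with multi-flow state, then move per-flow state
    # with loss-free and order-preserving guarantees.
    ctrl.copy(old_inst, new_inst, flt, "multi")
    ctrl.move(old_inst, new_inst, flt, "per", guarantees=("LF", "OP"))
    done = 0
    while rounds is None or done < rounds:   # rounds=None loops forever
        sleep(period)
        # Periodically exchange multi-flow state in both directions.
        ctrl.copy(old_inst, new_inst, flt, "multi")
        ctrl.copy(new_inst, old_inst, flt, "multi")
        done += 1
```

Passing `rounds=1` and a no-op `sleep` runs one synchronization cycle, which is convenient for checking the call order without waiting.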

Evaluation: controller scalability Improve scalability with P2P state transfers 30

Evaluation: importance of guarantees
Bro1 processing malicious trace @ 1K pkts/sec
After 14K packets: move active flows to Bro2
Alert                 Baseline  NF   LF   LF+OP
Incorrect file type   26        25   24   26
MHR Match             31        28   27   31
MD5                   116       111  106  116
Total                 173       164  157  173

Evaluation: benefits of granular control
HTTP requests from 2 clients (40 unique URLs)
Initially: both go to Squid1
20s later: reassign Client1 to Squid2
              Hits @ Squid1  Hits @ Squid2  State transferred
Ignore        117            Crash!         0 MB
Copy-client   117            39             4 MB
Copy-all      117            50             54 MB