Preview only show first 10 pages with watermark. For full document please download

Efficient Constraint Monitoring Using Adaptive Thresholds

   EMBED

  • Rating

  • Date

    August 2018
  • Size

    978.7KB
  • Views

    1,426
  • Categories


Share

Transcript

Efficient Constraint Monitoring Using Adaptive Thresholds Srinivas Kashyap, IBM T. J. Watson Research Center Jeyashankar Ramamirtham, Netcore Solutions Rajeev Rastogi, Yahoo! Labs Bangalore Pushpraj Shukla, Univ of Texas at Austin Talk Outline •Motivation •Constraint monitoring architecture •Existing approaches •Problem formulation •Markov-based algorithm •Reactive algorithm •Experimental results •Conclusions Constraint Monitoring Problem • Detect violation of distributed SUM constraints – Distributed Triggers [Jain et al. 04] X1 Constraint: X1+…+Xn  T Detect TT X1+…+Xm Sites: X1 Xn time Variables Applications: Network Monitoring Alert when sum of link delays along a Voice over IP path exceeds 200msec Network Operations Center (NOC) Identify all destinations that receive more than 2GB of traffic from the monitored network in a day, and report their transfer totals Monitor the volume of remote login (telnet, ssh, ftp etc.) requests received by hosts within the organization that originate from the external hosts. Source 10.1.0.2 18.6.7.1 13.9.4.3 15.2.2.9 12.4.3.8 10.5.1.3 11.1.0.6 19.7.1.2 Destination 16.2.3.7 12.4.0.3 11.6.8.2 17.1.2.1 14.8.7.4 13.0.0.1 10.3.4.5 16.5.5.8 Duration 12 16 15 19 26 27 32 18 Bytes 20K 24K 20K 40K 58K 100K 300K 80K Example NetFlow IP Session Data Protocol http http http http http ftp ftp ftp Constraint Monitoring Architecture Coordinator Sites Variables: X1 Local thresholds: T1 Xn Tn T  T i i • At site j: – if Xj>Tj: send alarm to coordinator (with Xj value) • At coordinator: – if Xj + i j Ti  T : poll Xi values to check if constraint is violated (global poll) Existing Approaches: Zero Slack • Local thresholds satisfy: iTi  T • Drawback: every alarm  global poll ( X j  i j Ti  T ) Existing Approaches: Zero Slack • Static local thresholds [Jain et al. 04] T Ti  n • Dynamic local thresholds [Sharfman et al. 06] – Thresholds reset each time alarm generated Slack S  T - iX i S Ti  X i  n Non-zero Slack [Dilman and Raz 01] • Threshold setting with slack: T  iTi  0 Slack = 30 • Slack leads to fewer global polls Non-zero Slack: Key Questions • How to set local threshold values so that constraint violations can be detected with minimal communication overhead? – – T T i i i i too low  too many local alarms too high too many global polls • How to adapt thresholds for changing data distributions? Communication Cost Model • Define Yi = Xi if Xi >Ti = Ti otherwise • Coordinator’s SUM estimate Y = i Yi • Probability of local alarm Pl(i) = Pr[Xi > Ti] • Probability of global poll Pg = Pr[Y > T] • Local alarm is O(1) messages, global poll is O(n) messages • Expected cost = n * Pg + i Pl(i) Problem Formulation • Given threshold T and variables Xi at sites, select local thresholds Ti such that the cost n * Pg + i Pl(i) is minimized. Key Challenge Computing P  Pr[ Y  T] g i i • Depends on Ti values at all sites • Optimal value computation requires enumerating all Ti value combinations • Each site maintains histogram: Hi(v) = Pr[Xi = v]  Ti • Then P(i)  1- v 0 H(v) l i – Can be computed locally for specific Ti value Markov-based Algorithm • Key idea: Use Markov’s inequality to decompose Pg into components that can be computed locally Pg  E[i Yi ] E[Y ]   i i T T n * E[Yi ] Cost  (i  P(i)) l T E[Yi ]  v 0 Ti * H(v)  v T 1 v * H(v) i i Ti T i • Each site can independently determine Ti value that minimizes its contribution to the total cost Drawback of the Markov Algorithm • Markov’s inequality over-estimates global poll probability Pg – computed thresholds Ti’ lower than optimal Reactive Algorithm • Key idea: Use local alarms and global polls to adjust Ti values l • Let λ i  P(i) at thresholds Ti’ computed by Markov Pg 1 • On local alarm: With probability , set Ti  α * Ti λi Ti • On global poll: With probability λ i , set Ti  α Analysis # of local alarms • At stable state, λ i  # of global polls • Since Markov inequality over-estimates Pg, at Ti’ # of local alarms λi  # of global polls • So thresholds Ti will converge to values > Ti’ Experimental Results • Real-life datasets – Netflow traces from Abilene network: 73 million packets across 11 routers – Link traces from NLANR: 21 million packets • Distributed constraint: Total amount of traffic flowing into network across ingress links  T • Schemes considered – Geometric [Sharfman et al. 06], Markov-based, Reactive Communication Savings (Abilene Dataset) Breakup of Message Overhead (Abilene Dataset) Geometric Reactive Markov Effect of scale (NLANR Dataset) Summary • Reactive algorithm for setting local thresholds in non-zero slack setting – Uses on Markov’s inequality to simplify global poll probability estimation – Adjusts thresholds in response to local alarm and global poll events – Adapts to changing data distributions • Reactive algorithm incurs 60% less communication overhead compared to the state-of-the-art zero slack scheme