Sunday, August 23, 2015

Paper Review #5: Understanding BGP Misconfiguration

The paper presents a quantitative study of BGP misconfiguration: how often it occurs, its possible causes, its overall impact on Internet connectivity, and ways to prevent it. The Border Gateway Protocol (BGP), the Internet’s inter-domain routing protocol, is crucial to the overall reliability of the Internet. Misconfigurations in BGP may result in excessive routing load, connectivity disruption, and policy violations.

The authors analyzed BGP updates over a period of 21 days from 23 different vantage points across a diverse set of ISPs, and validated the results by emailing the operators involved in the incidents. Two globally visible kinds of BGP misconfiguration were considered: origin misconfiguration, in which an AS accidentally injects a prefix into the global BGP tables (caused by initialization bugs, reliance on upstream filtering, old configuration, redistribution, communities, hijacks, forgotten filters, and incorrect summaries), and export misconfiguration, in which an AS exports a route in violation of its policy (caused by prefix-based configuration and bad ACLs or route maps). These causes are categorized into slips (errors in the execution of a correct plan) and mistakes (errors in the plan itself, i.e. design mistakes).
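To make the two categories concrete, here is a minimal Python sketch of how an observed announcement might be sorted into one or the other. It is only an illustration, not the authors' method: the `Announcement` type, the `EXPECTED_ORIGINS` and `RELATIONSHIP` tables, and the `classify` function are all hypothetical, and the AS numbers and prefix come from the ranges reserved for documentation. It assumes the observer already knows which AS should originate each prefix and what the business relationships between neighboring ASes are, which in practice is the hard part.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Announcement:
    prefix: str
    as_path: tuple  # ASNs as seen by the observer; the origin AS is last

# Hypothetical ground truth: which ASes are expected to originate a prefix.
EXPECTED_ORIGINS = {"192.0.2.0/24": {64496}}

# Hypothetical relationships: RELATIONSHIP[(a, b)] is what b is to a.
RELATIONSHIP = {
    (64500, 64510): "provider",
    (64500, 64511): "provider",
    (64500, 64496): "customer",
}

def classify(ann):
    """Return 'origin', 'export', or 'ok' for one announcement."""
    # Origin check: the prefix is announced by an AS that should not originate it.
    expected = EXPECTED_ORIGINS.get(ann.prefix)
    if expected is not None and ann.as_path[-1] not in expected:
        return "origin"

    # Export check: a transit AS passed a route learned from a provider or
    # peer on to another provider or peer, which usually violates its policy.
    path = ann.as_path
    for i in range(1, len(path) - 1):
        exported_to = RELATIONSHIP.get((path[i], path[i - 1]))
        learned_from = RELATIONSHIP.get((path[i], path[i + 1]))
        if learned_from in ("provider", "peer") and exported_to in ("provider", "peer"):
            return "export"
    return "ok"
```

For example, `classify(Announcement("192.0.2.0/24", (64510, 64500, 64497)))` would be labeled an origin problem under these assumed tables, because AS 64497 is not an expected origin for that prefix.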

Based on their study, the authors found that the Internet’s connectivity is surprisingly robust to most misconfigurations: connectivity is affected by only 4% of the misconfigured announcements, or 13% of the misconfiguration incidents. However, the effect on routing load is quite significant.

To reduce the Internet’s vulnerability to accidental errors, the authors proposed solutions such as improved user interface design, a high-level configuration language, database consistency, and deployment of protocol extensions (S-BGP).

Since the primary goal is to minimize human error in large distributed systems, I think redesigning the system to limit the need for operator interaction would likely help avoid these misconfigurations. Automated monitoring could also be added.
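As a rough idea of what such automated monitoring might look like, the Python sketch below flags announcements that are withdrawn shortly after they appear. This is only one possible heuristic, and it is my assumption rather than something described in this review: the `ShortLivedRouteMonitor` class, its method names, and the one-day threshold are all hypothetical, and a short-lived route is only a candidate for a misconfiguration, not proof of one.

```python
from datetime import datetime, timedelta

class ShortLivedRouteMonitor:
    """Flags (prefix, origin AS) pairs withdrawn soon after being announced."""

    def __init__(self, threshold=timedelta(days=1)):
        self.threshold = threshold  # assumed cutoff for "short-lived"
        self._announced_at = {}     # (prefix, origin_as) -> first-seen time
        self.flagged = []           # (prefix, origin_as, lifetime) tuples

    def announce(self, prefix, origin_as, when):
        # Keep the earliest time we saw this route; repeats do not reset it.
        self._announced_at.setdefault((prefix, origin_as), when)

    def withdraw(self, prefix, origin_as, when):
        started = self._announced_at.pop((prefix, origin_as), None)
        if started is not None and when - started < self.threshold:
            self.flagged.append((prefix, origin_as, when - started))

monitor = ShortLivedRouteMonitor()
t0 = datetime(2015, 8, 23, 12, 0)
monitor.announce("192.0.2.0/24", 64497, t0)
monitor.withdraw("192.0.2.0/24", 64497, t0 + timedelta(hours=2))    # flagged
monitor.announce("198.51.100.0/24", 64496, t0)
monitor.withdraw("198.51.100.0/24", 64496, t0 + timedelta(days=3))  # not flagged
```

A flagged route would still need a human (or a second check against expected origins) to confirm that it was really an error, so this reduces the operator's workload rather than removing the operator.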


Reference: 

R. Mahajan, D. Wetherall, and T. Anderson, “Understanding BGP Misconfiguration,” August 2002.

1 comment:

  1. A very concise and precise summary of the paper! It is noteworthy that the review mentions the experimental conditions (i.e. time frame, duration, data gathering, etc.), as these give the reader an overview of how the experiment was conducted and how the paper reached its conclusions and produced its results.

    I also agree that limiting human/operator interaction might help minimize errors since, according to the paper, many of the errors were caused by operators, either through lack of configuration knowledge or through errors in execution. :) Maybe with this, the operator's time and effort can be channeled into setting up the template/framework on which future systems could be built. Of course, regular check-ups by a real human might still be necessary, especially in the first few weeks or months of this automation. :)

    Thank you so much, Fatima! Really enjoyed reading this review. :D
