Thursday, November 5, 2009

DNS Performance and the Effectiveness of Caching

[paper]

As mentioned in the previous paper, DNS servers aggressively cache address data. The previous paper also suggests that negative caching could give a performance boost. This paper questions and tests the effectiveness of both of these mechanisms using DNS and associated TCP traffic from MIT CSAIL and KAIST, circa 2000 and 2001.

Background: Clients run "stub resolvers" that do little more than forward lookups to a local caching name server; that server acts as a proxy, answering from its cache when it can and querying other servers when it doesn't already have the data. (Stub resolvers send "recursive queries" to the caching server, which in turn issues "iterative queries" to authoritative servers.)
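
To make the recursive/iterative distinction concrete, here is a minimal sketch using the third-party dnspython library (my choice, not something from the paper; the resolver and root-server addresses are only examples). A stub resolver leaves the RD (recursion desired) bit set and lets its caching server do the work; the caching server clears RD and walks the hierarchy itself, collecting referrals.

# Recursive vs. iterative queries, sketched with dnspython.
import dns.flags
import dns.message
import dns.query

LOCAL_RESOLVER = "192.168.1.1"   # hypothetical local caching name server
ROOT_SERVER = "198.41.0.4"       # a.root-servers.net

# Recursive query: RD is set by default, so the caching server resolves the
# name on our behalf and returns the final answer (from cache if it can).
recursive_q = dns.message.make_query("www.example.com", "A")
recursive_resp = dns.query.udp(recursive_q, LOCAL_RESOLVER, timeout=2)
print("answer:", recursive_resp.answer)

# Iterative query: clear RD, so the server answers only from its own data.
# A root server just refers us to the .com servers (NS records in AUTHORITY).
iterative_q = dns.message.make_query("www.example.com", "A")
iterative_q.flags &= ~dns.flags.RD
referral = dns.query.udp(iterative_q, ROOT_SERVER, timeout=2)
print("referral:", referral.authority)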

They collected outgoing DNS queries, incoming DNS responses, and TCP session start and end packets. Notably, they cannot observe DNS lookups answered from caches inside their network (e.g., on the client itself). They removed all TCP sessions that weren't preceded by a DNS lookup; I assume this removes the effect of client-cached addresses. They also removed all DNS A-record lookups that weren't followed by a TCP connection; I don't understand why, or how this skews their findings, and it bothers me.
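
Here is how I picture that filtering step, as a hypothetical sketch (the record formats and function are mine, not the paper's): keep a TCP connection only if some earlier DNS answer handed out its destination address, and keep an A-record lookup only if a kept TCP connection to one of its answers follows.

from collections import defaultdict

def filter_trace(dns_responses, tcp_syns):
    """dns_responses: list of (timestamp, name, [ip, ...]) A-record answers.
       tcp_syns: list of (timestamp, dst_ip) TCP connection starts.
       Returns (kept_tcp, kept_dns)."""
    # Index answer timestamps by the IP addresses they returned.
    answer_times = defaultdict(list)
    for ts, _name, ips in dns_responses:
        for ip in ips:
            answer_times[ip].append(ts)

    # Keep a TCP connection only if a DNS answer for its destination preceded
    # it; connections whose address came from a client-side cache are dropped.
    kept_tcp = [(ts, ip) for ts, ip in tcp_syns
                if any(t <= ts for t in answer_times[ip])]

    # Keep an A-record lookup only if a kept TCP connection to one of the
    # addresses it returned follows it.
    syn_times = defaultdict(list)
    for ts, ip in kept_tcp:
        syn_times[ip].append(ts)
    kept_dns = [(ts, name, ips) for ts, name, ips in dns_responses
                if any(t >= ts for ip in ips for t in syn_times[ip])]
    return kept_tcp, kept_dns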


Their results:

-- About 80% of DNS lookups don't require a referral; this means the first NS contacted has the answer.
-- The total number of DNS query packets is much higher than the number of lookups, meaning that many lookups require query retransmission. Despite the retransmissions, about 20% of clients never got any answer (not even an error). They suggest it is better to give up sooner than to keep retransmitting, and to let the application figure out what else to do (a bounded-retry sketch follows this list). Between 12% and 19% of their lookups involved no retransmissions at all. Aggressive retransmission also generates a lot of traffic: 63% of all the DNS query packets they saw belonged to lookups that never received a response!
-- 13% of client lookups result in errors (most say the name does not exist).
-- The distributions of both successful and failed DNS requests are heavy-tailed. In one set of traces, 10% of names account for 68% of answers, while the remaining 32% of answers are spread thinly across a long tail of names.
-- They do believe that NS-record caching is important. Only about 20% of responses came from a root server, because cached NS records let the other lookups skip the root entirely; if NS TTLs were short enough that this caching stopped helping, root server load would increase by roughly 5x.
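
To illustrate the "give up sooner" suggestion from the retransmission bullet above, here is a sketch of a bounded-retry lookup, again with dnspython; the retry count and timeouts are my guesses rather than values from the paper.

import dns.exception
import dns.message
import dns.query

def lookup(name, server, retries=3, timeout=1.0):
    query = dns.message.make_query(name, "A")
    for attempt in range(retries):
        try:
            # Back off a little on each retry instead of hammering the server.
            return dns.query.udp(query, server, timeout=timeout * (2 ** attempt))
        except dns.exception.Timeout:
            continue
    # After a few tries, stop retransmitting and let the application decide
    # what to do next (try another server, report an error, etc.).
    return None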

Fact to remember: Address (A) records store IP addresses, and NS records say which name server is authoritative for a domain.

---------------

Class Notes:

-- Top sites are accessed a lot but have low TTLs; many other sites are accessed infrequently. Reminds me of the 80:20 rule.
-- Key point: stop retransmitting so aggressively -- if multiple attempts get no response, give up, because retrying isn't helping.

