Wednesday, November 18, 2009

Cutting the Electric Bill for Internet-Scale Systems

This is a REALLY cool paper!

Their idea is to cut a company's electric bill by servicing traffic at data centers with low energy prices. This might mean servicing a host in Boston from a data center in Virginia (rather than NY) if VA is cheaper than NY at that time. "Preferred data centers" would be recalculated every hour based on that hour's regional energy prices; by shifting traffic toward cheap regions, the system's total electricity cost is minimized. Looking at energy prices across regions, they note that not all preferences can be pre-determined: sometimes static routing suffices (always prefer one region over another, e.g., Chicago vs. Virginia), but other times dynamic routing is preferable (e.g., Boston vs. NY). They use a real, large Akamai traffic data set to examine the usefulness of their idea and find that energy savings could be up to 40% under ideal conditions.
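
To make the mechanism concrete, here's a minimal sketch (in Python) of the hourly recalculation, assuming made-up regional prices, a simple latency cutoff, and none of the real routing machinery; every name and number below is hypothetical:

    # Hypothetical inputs: this hour's wholesale energy price ($/MWh) per
    # region, and the client's estimated latency (ms) to each data center.
    hourly_price = {"virginia": 31.0, "new_york": 55.0, "chicago": 38.0}
    latency_ms = {"virginia": 28.0, "new_york": 9.0, "chicago": 40.0}

    LATENCY_BUDGET_MS = 50.0  # assumed tolerance before rerouting is ruled out

    def preferred_dc(prices, latencies, budget_ms):
        # Pick the cheapest data center whose latency fits the budget.
        eligible = [dc for dc in prices if latencies[dc] <= budget_ms]
        return min(eligible, key=lambda dc: prices[dc])

    # Recomputed once per hour as regional prices change.
    print(preferred_dc(hourly_price, latency_ms, LATENCY_BUDGET_MS))  # virginia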

Their idea has four caveats:

(1) Energy elasticity refers to how closely a server's power draw tracks its load. If idle and active machines consume identical power, this plan doesn't work at all; if an idle machine uses zero power, you'd have ideal conditions. The reality is somewhere in the middle: Google's elasticity rate is 65%. With perfect elasticity, they find 40% savings; with Google's elasticity rate, they only see 5% savings (the toy power model after this list illustrates why).

(2) This could increase latency, which might not be tolerable for some applications. Looking at normal Akamai traffic (Akamai is a latency-sensitive CDN), they see that geo-locality is not always adhered to anyway; perhaps geographic proximity does not actually correspond to performance. I would be interested in seeing more work on this point.

(3) Lengthening the routing path could have the side effect of incurring extra energy expenditures along the way. However, they say this is not a big problem because routers' energy use is not proportional to traffic.

(4) The "95/5 rule" of bandwidth billing: usage is sampled over the month, the top 5% of samples are discarded, and the customer is billed at the 95th-percentile rate; pushing that percentile past the committed rate means paying substantially more for the overage (the percentile sketch after this list shows the mechanics). This constrains their scheme: they can't shift traffic to one data center beyond that point, because the extra bandwidth would cost more than the energy saved. If the 95/5 rule is adhered to, perfect elasticity only yields 15% savings; without the 95/5 rule, you see their 40% max savings figure. Note that with both the 95/5 rule and 65% elasticity, Google only sees a 2% savings (which is still $1M+).
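
To see why elasticity dominates the savings (caveat 1), here's a toy linear power model; my own reading of "65% elasticity" is that an idle machine draws 65% of peak power, which is an assumption:

    def cluster_power(peak_kw, utilization, idle_fraction):
        # Linear model: a fixed idle floor plus a load-proportional part.
        return peak_kw * (idle_fraction + (1.0 - idle_fraction) * utilization)

    peak_kw = 100.0

    # Perfectly elastic (idle_fraction = 0): draining a DC to 10% utilization
    # cuts its draw to 10 kW, so chasing cheap power pays off.
    print(cluster_power(peak_kw, 0.10, 0.0))   # 10.0

    # With a 65% idle floor, the same drained DC still burns 68.5 kW, so
    # moving its load to a cheaper region barely reduces the total bill.
    print(cluster_power(peak_kw, 0.10, 0.65))  # 68.5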
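
And here's a rough sketch of 95th-percentile billing (caveat 4), assuming the common 5-minute sampling convention; real contracts vary:

    import math

    def billable_rate(samples_mbps):
        # Drop the top 5% of samples; bill at the highest remaining rate.
        ordered = sorted(samples_mbps)
        idx = math.ceil(0.95 * len(ordered)) - 1
        return ordered[idx]

    # A 30-day month of 5-minute samples: steady ~400 Mbps plus brief spikes.
    samples = [400.0] * 8500 + [900.0] * 140
    print(billable_rate(samples))  # 400.0 -- spikes fall in the discarded 5%

The paper's constraint is the flip side of this: once rerouted traffic pushes a destination DC's 95th percentile above its committed rate, the overage charges swamp the energy savings.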

One question I have is whether this system would be self-affecting. That is to say: if a large company like Google actually moved its computation to its cheaper DCs, would power costs in those regions increase? (Conversely, if all the DCs in Silicon Valley moved their servers to Oklahoma, would Silicon Valley prices go down?)

------------------

Notes -

Google's numbers should be higher now. Their machines used to be single-app, but now they are multi-app. Regardless, utilization is likely under 60%; above 60%, you see an increase in latency because of competition for resources.

2 comments:

  1. The paper makes some sense, but much of the art of the DC is to figure out how to run them at high levels of utilization. The opportunities for load balancing are not so clear. That stated, they did work with Bruce Maggs from Akamai, so there is value in the approach.

  2. I was wondering the same thing (i.e., if large companies moved to the cheaper power regions, would power there still be lower cost?). With more demand, it makes sense that prices would go up, but it's possible that factors other than the amount of power usage affect the cost of power in a given region.
