Monday, September 3, 2007

Viability of DNS failover

In spring 2006, my site was plagued by recurrent hardware problems that caused serious downtime. At the time, the site was hosted on a single dedicated server and I had no failover strategy whatsoever, so when a hard disk failed, I could expect a few days of downtime.

At the beginning of the summer, I got fed up and started investigating possible solutions to this problem and, after some experimentation, finally settled on DNS failover. Here are the results of the experimentation, originally posted on WebHosting Talk:

I run a site with about 1,000,000 unique visitors per month, and recent server failures made me decide to get a failover server to minimize downtime. My goal wasn't to get 99.999% uptime but to be back up after a failure in a "reasonable" amount of time. After evaluating several solutions, I decided to go with DNS failover. Here's how the setup works:

1) the site's domain points to the main server with a very low TTL (time to live)
2) the failover server replicates data from the main server
3) when the main server goes down, the domain is changed to point to the failover server
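
Step 3 can be automated with a small watchdog. What follows is only a minimal sketch of that idea, not the setup I actually ran: the FailoverMonitor class, the three-failure threshold, and the switch_dns callback are all hypothetical, and switch_dns stands in for whatever mechanism your DNS provider offers for repointing the record.

```python
import socket
from typing import Callable


def is_reachable(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverMonitor:
    """Trigger a DNS switch after several consecutive failed probes.

    probe      -- callable returning True when the main server is healthy
    switch_dns -- hypothetical callable that repoints the domain to the
                  failover server (provider-specific, not shown here)
    """

    def __init__(self, probe: Callable[[], bool],
                 switch_dns: Callable[[], None],
                 max_failures: int = 3) -> None:
        self.probe = probe
        self.switch_dns = switch_dns
        self.max_failures = max_failures
        self.failures = 0
        self.failed_over = False

    def tick(self) -> bool:
        """Run one probe; return True if this tick triggered the failover."""
        if self.failed_over:
            return False  # already switched, nothing more to do
        if self.probe():
            self.failures = 0  # a single success resets the counter
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.switch_dns()
            self.failed_over = True
            return True
        return False
```

In practice you would call tick() on a timer, for example once a minute, with probe set to something like lambda: is_reachable("main.example.com") (a placeholder hostname). Requiring several consecutive failures avoids switching on a single dropped packet.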

The drawback is the DNS propagation time: some DNS servers don't honor the TTL, and there is also some caching on the user's machine and in the browser. I looked for empirical data to gauge the extent of the problem but couldn't find any, so I decided to set up my own experiment:

I start with the domain pointing to the main server with a TTL of 1800 seconds (half an hour). I then change it to point to the failover server, which simply port-forwards to the main server. On the main server, I periodically compute the percentage of requests coming from the failover server, which gives me the percentage of users for whom the DNS change has propagated.
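
The measurement itself is simple to reproduce. Here is a rough sketch, assuming the main server's access log has already been parsed into (timestamp, remote IP) pairs and that forwarded requests show up with the failover server's address as the remote IP. The function names and the 192.0.2.10 address in the usage note are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def bucket_start(ts: datetime, minutes: int = 5) -> datetime:
    """Round a timestamp down to the start of its N-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def propagation_by_bucket(requests: Iterable[Tuple[datetime, str]],
                          failover_ip: str,
                          minutes: int = 5) -> Dict[datetime, float]:
    """Return, per time bucket, the percentage of requests that arrived
    via the failover server (i.e. from clients that see the new DNS record)."""
    totals: Dict[datetime, int] = defaultdict(int)
    forwarded: Dict[datetime, int] = defaultdict(int)
    for ts, remote_ip in requests:
        bucket = bucket_start(ts, minutes)
        totals[bucket] += 1
        if remote_ip == failover_ip:
            forwarded[bucket] += 1
    return {b: 100.0 * forwarded[b] / totals[b] for b in sorted(totals)}
```

For example, propagation_by_bucket(parsed_log, "192.0.2.10") would give one percentage per 5-minute window. Note that this counts requests rather than distinct visitors, so heavy users weigh more than occasional ones.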

I made the DNS change at exactly 16:04 on 06/21/06, and here is the percentage of users for whom it had propagated over time:

06/21/06 16:00 0 %
06/21/06 16:05 3 %
06/21/06 16:10 20 %
06/21/06 16:15 37 %
06/21/06 16:20 59 %
06/21/06 16:25 69 %
06/21/06 16:30 76 %
06/21/06 16:35 80 %
06/21/06 16:40 86 %
06/21/06 16:45 90 %
06/21/06 16:50 91 %
06/21/06 16:55 92 %
06/21/06 17:00 93 %
06/21/06 17:05 94 %
06/21/06 17:10 94 %
06/21/06 17:15 95 %
06/21/06 17:35 95 %
06/21/06 17:40 96 %
06/21/06 17:45 97 %
06/22/06 10:40 99 %

So even after 18 hours, a small percentage of users are still going to the old server, so DNS failover is clearly not a 99.999% uptime solution. However, since the change has propagated to more than 90% of users within the first hour, the solution works well enough for me.
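
To put numbers on "within the first hour", here is a small sketch that reads the measurements from the table and reports how long after the change each threshold was first observed. The helper names are made up, and because the readings are 5 minutes apart, the results are upper bounds rather than exact crossing times.

```python
from datetime import datetime
from typing import List, Optional, Tuple

CHANGE_TIME = datetime(2006, 6, 21, 16, 4)  # when the DNS record was changed

# Measurements copied from the table: date, time, percent propagated.
MEASUREMENTS = """\
06/21/06 16:00 0
06/21/06 16:05 3
06/21/06 16:10 20
06/21/06 16:15 37
06/21/06 16:20 59
06/21/06 16:25 69
06/21/06 16:30 76
06/21/06 16:35 80
06/21/06 16:40 86
06/21/06 16:45 90
06/21/06 16:50 91
06/21/06 16:55 92
06/21/06 17:00 93
06/21/06 17:05 94
06/21/06 17:10 94
06/21/06 17:15 95
06/21/06 17:35 95
06/21/06 17:40 96
06/21/06 17:45 97
06/22/06 10:40 99
"""


def parse(text: str) -> List[Tuple[datetime, int]]:
    """Turn 'MM/DD/YY HH:MM PCT' lines into (timestamp, percent) pairs."""
    rows = []
    for line in text.splitlines():
        date_s, time_s, pct_s = line.split()
        ts = datetime.strptime(f"{date_s} {time_s}", "%m/%d/%y %H:%M")
        rows.append((ts, int(pct_s)))
    return rows


def minutes_until(rows: List[Tuple[datetime, int]],
                  threshold: int) -> Optional[int]:
    """Minutes from the DNS change to the first reading at or above
    threshold, or None if no reading reached it."""
    for ts, pct in rows:
        if pct >= threshold:
            return int((ts - CHANGE_TIME).total_seconds() // 60)
    return None


rows = parse(MEASUREMENTS)
print(minutes_until(rows, 90))  # 41
print(minutes_until(rows, 99))  # 1116
```

So the 90% mark was reached at most 41 minutes after the change, while the 99% reading came 1116 minutes (about 18.6 hours) later. With an 1800-second TTL, the slow tail beyond the first half hour is where caches that ignore or outlive the TTL show up.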

