Outage Earlier Today
On Monday, May 11th, 2009, we were down between 5:00pm and 6:15pm EST. The trouble began when the process list on one of our DB servers built up and began affecting the web servers. For about 10 minutes, starting at 5:10pm, the DB server recovered and the site was responsive, so we thought the problem was fixed. Shortly after, we found that the server could not fully recover, so we began the reboot and crash recovery process. This, combined with rebuilding the cache, took the remaining time.
We have a few leads on what could have caused this downtime. We’ll be investigating and hope to come up with a concrete answer. We apologize to everyone who was affected by this outage.