Today, the good people of Major League Baseball suffered what looked like a denial-of-service attack which kept them from selling tickets to (at least) the World Series games in Denver (at Coors Field). This "attack" started at the same time as ticket sales began - 1000 MDT.
Amusingly enough, this apparent denial-of-service attack was probably caused by customers. This year's World Series promises to be a good one, between the venerable Boston Red Sox[1] and the unbelievably hot and exceptional Colorado Rockies[2] (Go Rockies!), who are in the World Series for the first time ever, and who have played some absolutely amazing baseball in recent weeks.
As a result of this first-of-a-kind opportunity, many Coloradans stayed home from work, or took a "break" from work to order tickets all at once. The server infrastructure couldn't stand up to the load, and no one got enough packets through to be able to order any tickets.
Since I was one of those trying to order tickets, and this was clearly a lack of availability in a critical time, I did a tiny bit of investigation. It appears that they had about 15 servers in their mix. The ticket sales are being managed by Paciolan[3].
In an event like this, it is vital that they have both a load-balancing methodology and a load-shedding methodology. It appears that they had both. However, the symptoms suggest that their load-shedding methodology was insufficient for this incredible load. Some of us Coloradans may be fickle Rockies fans, but we're sure loyal when they're hot - especially when our other teams aren't going anywhere! Unfortunately, the Rockies were too hot and their fans too loyal for Paciolan's load-shedding infrastructure.
Here's what I can see about their infrastructure from the outside:
- They have external web servers which put you in a holding pattern, and try and get through to the inner sanctum of web servers once a minute.
- If you get in to the inner web servers, you can then order tickets, presumably without a heavy overload, since the inner infrastructure limits the number of simultaneous users.
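The two-tier arrangement in the list above can be sketched as a simple admission gate. This is only a toy model of what I could see from the outside, not Paciolan's actual design: the class name `AdmissionGate`, the `max_active` limit, and the idea that waiting users are admitted in arrival order are all my assumptions.

```python
from collections import deque


class AdmissionGate:
    """Hypothetical model: an outer waiting room in front of
    inner servers that allow only a fixed number of active users."""

    def __init__(self, max_active):
        self.max_active = max_active  # assumed inner-tier capacity
        self.active = set()           # users currently ordering tickets
        self.waiting = deque()        # users held by the outer servers

    def arrive(self, user):
        """A new user lands on the outer servers and is put on hold."""
        self.waiting.append(user)

    def retry_tick(self):
        """Run once per retry interval (about a minute in the real system).
        Admit waiting users only while inner slots are free."""
        admitted = []
        while self.waiting and len(self.active) < self.max_active:
            user = self.waiting.popleft()
            self.active.add(user)
            admitted.append(user)
        return admitted

    def finish(self, user):
        """A user completes an order and frees an inner slot."""
        self.active.discard(user)


gate = AdmissionGate(max_active=2)
for fan in ["a", "b", "c"]:
    gate.arrive(fan)
first_wave = gate.retry_tick()   # only two fans fit
gate.finish("a")
second_wave = gate.retry_tick()  # the freed slot goes to the third fan
```

Note that the model protects the inner servers only. Every waiting user still has to reach the outer tier on each retry, so the outer tier and its network links see the full offered load.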
However, there is an Achilles' heel here - which both the loyal and the fickle Rockies' fans ran immediately into. You have to have enough network bandwidth to allow people access to the outer infrastructure. If you don't, various bad things can happen - your load balancers can crash, your routers can crash. I'm not a networking expert, but if the offered load is an order of magnitude or two higher than the incoming infrastructure can support, most packets won't get through. If most don't get through, then it doesn't matter how good the load shedding methods are, or how robust your servers are. Customers can't buy your product. OOPS!
Paciolan claimed that they were the victims of a real DDOS attack, and that they measured 8.5 million hits in an hour. Quite honestly, that doesn't sound that high to me. Personally, I had 4 browsers going at once. 8.5 million hits an hour is only 2361 hits/second, which is less than 142K hits/minute. Since each waiting browser retries about once a minute, that is roughly 142K browsers. In my judgment, between people like me, school kids, scalpers, etc. 142K people isn't very many. If they were all running 3 or 4 browsers, it would only take about 35K people - which is basically nothing. Also, when you look at network bandwidth, ethernet links, no matter how fast they are, can only support a relatively small number of packets/second maximum because of minimum times required between packets. IIRC, that number is something like 1000 packets/sec. If they only had a single gigabit link to their infrastructure, 142K people would be 2 orders of magnitude larger than that - which would put us in the right ball park (pun intended) for the behavior that was observed.
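For anyone who wants to check the arithmetic above, here is a small sketch. The 8.5 million figure is the one Paciolan reported; the one-retry-per-minute rate and the browsers-per-person counts are my assumptions, and the function name `people_needed` is just made up for this example.

```python
HITS_PER_HOUR = 8_500_000  # figure reported by Paciolan

hits_per_second = HITS_PER_HOUR / 3600
hits_per_minute = HITS_PER_HOUR / 60


def people_needed(hits_per_minute, browsers_per_person, retries_per_minute=1):
    """Estimate how many people could generate the observed hit rate,
    assuming each open browser retries a fixed number of times per minute."""
    return hits_per_minute / (browsers_per_person * retries_per_minute)


print(round(hits_per_second))                    # 2361
print(round(hits_per_minute))                    # 141667
print(round(people_needed(hits_per_minute, 4)))  # 35417
print(round(people_needed(hits_per_minute, 3)))  # 47222
```

So somewhere between 35K and 47K people with a few browsers each would be enough to produce the reported number, without any attacker involved.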
One can look at the fact that they almost certainly had only a single site to take this load as a single point of failure. The networking infrastructure to that site failed. Exactly how it failed, I can't say. The fact that it failed is indisputable.
The good news for the Rockies is that because it's the World Series, eventually, somehow, those tickets will get sold. But, in the meantime, they appear to have dumped Paciolan for this event[4]. This is unfortunate from my perspective, since I'm in the UK at the moment, and can't exactly run down to Coors Field to wait in line to buy tickets. Oh well. I guess it's not really all about me, eh? :-D
But, this will take longer, and has already aggravated customers a great deal. The Rockies will probably survive this error - after all that's not their specialty - they're baseball players. But, it will take Paciolan a while to live this down. They underestimated the load, maybe they bid too low for selling the tickets, maybe it was done with too little lead time, whatever. In the end, this is their failure. This will have cost them a good bit of reputation. If they had succeeded, they would probably have the World Series business and maybe other sports for several years to come. Under the circumstances, this will no doubt be a tremendous opportunity cost - the cost of lost opportunities.
Although I can't quantify the size of their opportunity loss, it is a clear illustration of how lack of availability translates directly into the bottom line of companies - even if it's future revenues. To be fair to Paciolan, I believe that this is the first time anyone has attempted to sell World Series tickets only by the web. Hopefully, it won't be the last.
IBM has done some very high-profile sports web sites in the past, and I can tell you from the people that I've worked with who worked on those, that they required lots of money, incredible planning, and always at least three geographic sites with separate networking infrastructure. Fortunately, so far, IBM has not suffered any embarrassing failures of this magnitude.
Maybe you're saying "I don't sell World Series tickets, so this doesn't apply to me". Probably you don't sell World Series tickets. But you probably do something vital for your company's future health. If you take this catastrophe to heart, maybe you can avoid the same embarrassing and expensive fate as Paciolan.
If anyone from Paciolan reads this article, I'm sure our readership would love to hear what actually went wrong, and how, in your opinion, it might have been avoided. Of course, feel free to correct the things I guessed at as well!
Even more than usual, I commend my disclaimer page to you. I don't speak for anyone but myself. Not my employer, the Boston Red Sox, the outstanding Colorado Rockies, nor the unlucky/unfortunate Paciolan.
See also my follow-up posting [5] on this subject.
References
[1] http://boston.redsox.mlb.com/index.jsp?c_id=bos
[2] http://colorado.rockies.mlb.com/index.jsp?c_id=col
[3] http://www.paciolan.com/
[4] http://colorado.rockies.mlb.com/news/press_releases/press_release.jsp?ymd=20071022&content_id=2276226&vkey=pr_col&fext=.jsp&c_id=col
[5] http://techthoughts.typepad.com/managing_computers/2007/10/bad-application.html
This is the most thoughtful analysis I've seen about the ticketing mess. Paciolan doesn't yet look ready to play in the big leagues.
Doug
Posted by: Doug Moran | 24 October 2007 at 12:03
It appears they had another significant failure last night, as USC football tickets went on sale. The USC site, and at least 2 other Paciolan sites I tried were completely non-responsive for at least an hour. I contacted the company for comment; so far they have not responded.
Posted by: NextPage | 21 July 2008 at 10:29
HA is a big deal - and is more than just guarding against server crashes (important as that is). Probably they can't respond in a very elastic way to demand. Sounds like a good opportunity for someone ;-).
Posted by: Alan R. | 21 July 2008 at 13:22