Even if you aren’t a user of Skype, the internet telephone service, you probably won’t have missed seeing or hearing about the total service outage during Thursday and Friday last week.
Service came back up over the weekend and today things seem very much back to normal.
Clearly for the average 6-9 million or so individuals and businesses typically using Skype at any one time (including me), it was a major problem suddenly being cut off from your telephone and instant messaging services.
Speculation abounded as to the cause, including theories about cyber attacks. Skype themselves posted on the Heartbeat status blog that the cause was to do with “a deficiency in an algorithm within Skype networking software.”
Today, a fuller picture emerged as to what happened:
On Thursday, 16th August 2007, the Skype peer-to-peer network became unstable and suffered a critical disruption. The disruption was initiated by a massive restart of our user’s computers across the globe within a very short timeframe as they re-booted after receiving a routine software update.
The abnormally high number of restarts affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.
I don’t know about you, but I find this explanation quite confusing.
What “routine software update,” I wonder? I didn’t get a Skype software update (but I did get one today as a new version of that software for Windows was released on August 17).
Maybe they mean Windows updates – quite a few security and other updates were released by Microsoft last week. In which case, was it Windows users whose computers restarted, which then affected the rest of the peer network?
Such technical questions, combined with the technical explanation from Skype, won’t help most people who aren’t tech-inclined really understand why the outage happened, even if they do get a sense of what happened.
Some people have accused Skype and parent eBay of insufficient communication in a time of crisis.
Firstly, I don’t think this counts as a crisis. Two days’ outage, and now service is restored. Secondly, there was plenty of ongoing communication via Heartbeat as to what was happening and, importantly, what Skype was doing about it.
That was the most effective means of communication at the time – regular updates via an appropriate channel, which was picked up and further communicated (or commented on) throughout the mainstream media, online and print, and the blogosphere.
It wasn’t the moment for the C-suite to be issuing statements.
Perhaps that moment is now. The immediate technical problem is fixed, which leaves an immediate business problem to address – confidence in the service and the company’s reputation.
This is part of a bigger picture, too. According to the FT:
[…] the prolonged outage drew widespread comments from users and industry analysts, who cautioned that it highlighted the dangers, particularly for business users, of relying on a single service provider – especially one using relatively untested voice over internet protocol (VoIP) technology.
Industry experts believe the problems at Skype were specific to its service. However, some analysts speculated that Skype’s problems, coupled with the legal problems of Vonage, the leading US independent VoIP provider, and the failure of SunRocket, another internet telephony company, could undermine confidence in internet protocol communications as a whole and encourage users to return to more traditional telecoms carriers.
It’s a fragile thing, confidence.
I’m pretty sure the “routine software update” was a Windows patch delivered automatically through the Windows Update Service.
That’s what some people are saying, Shel.
I’d like to know what Microsoft will say. It’s a pretty serious accusation to lay the initial blame with Microsoft. Many ingredients here for a major public kerfuffle.
I hope we get some clarity on exactly what the cause was rather than speculative media and blog interpretations of the Skype Heartbeat post.
Just taken another look at the Heartbeat post that explains things.
Update to the text which now explicitly lays the blame on Windows Update:
I think this story is only just getting going.
It’s a bloody good thing you and Shel weren’t trying to do a live show on Thursday!
Skype interacts with the BIOS on Windows computers, which means it’s tied in at a very deep level, so it’s possible quite unexpected things could interfere with it.
You’re right, Sallie, that hadn’t occurred to me re last Thursday’s show!
We devoted quite a bit of time in today’s show to a discussion on this Skype issue from the communication perspective.