Strongspace's 10-Day Crash Highlights Web Storage Risks
For the last 10 days, Sausalito, CA-based online document and storage hosting company Joyent has struggled to get its secure online document collaboration service, Strongspace, back online.
During the outage, the company's clients had no access to their hosted documents--leaving some IT pros to wonder whether the features of such online collaboration services are worth the risks.
A Bad Week Gets Worse
It all started Saturday, Jan. 12, when the company's two online repositories--BingoDisk, for data storage, and Strongspace--went down owing to problems with the Sun Fire X4500 server that hosts them.
Joyent CEO David Young started a Jan. 16 post announcing the crash by joking that it "was not the week to stop drinking."
"We got bit by a massive ZFS bug. That's the long and the short of it," he wrote in a follow-up post. "The good news is we can unravel the corruption. The bad news, given the fact that Strongspace and BingoDisk ran on a Thumper (aka SunFire X4500) (48 500GiB drives), was that we have to use other Thumpers to stage the uncoding of the ZFS mess. Moving so much data around to decode the ZFS corruption has taken time."
And it did. While BingoDisk went back online Friday, problems with Strongspace continued. On Sunday, the company sent an e-mail to customers stating that the service was back up, only to send another e-mail today saying that it was back down.
Further "up" and "down" notifications followed on the company's Web site and blogs, along with estimates of when the service would be restored.
The service was down around 3 p.m. PST Monday, then appeared to be back up a short time later. A post made late Monday said that the servers were back up and being watched "closely."
"I'll put it back into production for a period of 24 hours and we'll watch it closely," a company tech stated in the latest post. "The hope will be that ... Tuesday night we can do a clean shutdown and be beyond this silliness. Fingers crossed."
Joyent did not respond to our request for verification of the service's current status and comment on the situation by press time.
While Joyent has repeatedly assured customers that no data has been lost--and has praised ZFS highly for its role in helping fix the problem--the length of the outage, combined with the up-and-down nature of the restoration, appears to have shaken some customers.
"While I appreciate the hard work you guys are doing to get everything back online, I'm starting to find it unacceptable that we've unable to use the service for over a week," wrote one user on the company's blog. "We rely on the service for client data transfers that are critical to our business. When our non-technical clients ask why there seems to be no redundancy built into a service that is likely used by many for business critical purposes, I find myself with no explanation."
"What a mess. Please, just fix it or simply admit that you cannot. This has been going on for ONE WEEK," wrote another.
Another blog poster wrote that while he's relatively happy with how Joyent
has communicated about the problems, "the false starts are unfortunate...in
a situation like this, given everything that's happened, I would expect them
to fully test things (and double check them) before claiming that things are
back to normal."
An IT executive and Joyent customer we talked with, who asked not to be named, said that the whole experience is "just unacceptable," leaving him to question whether his company has a future with online document hosting--whether from Joyent or others.
"How could they allow this to happen?" he asked. "This is really bad--you can't even access your account, let alone the documents."
While his company is not hosting any current projects on the service, it did
a few months ago, and he said if this outage had happened then, "We'd be
dead in the water."
"Obviously, it changes my initial view that this was a very secure, high-availability
solution for remote storage," he stated, adding that the situation is a
game-changer for any IT professional looking at this or similar solutions: "[IT
professionals are] going to have to...to treat [hosted services] much like you
would internal storage, and have a backup and recovery plan, because it's clear
these types of vendors face internally the same IT issues [as] internal IT departments."
Young posted that customers will receive "generous" compensation for the outages; details have yet to be released.
In the meantime, customers located near Joyent's office may want to cruise
by: The company has a standing offer
to give any customer who drops in a Joyent T-shirt plus "a fine whiskey,
a glass of Pernod, or for you lightweights ... a good old bottle of water."
Considering what's happened during the last week, it shouldn't be too hard
to convince Joyent to give you a double--or to find someone to join you.
Becky Nagel is executive editor, Web Initiatives for the 1105 Redmond Media Group and the editor of Redmondmag.com.