SAN FRANCISCO -- The week before its major user conference, Salesforce.com suffered a lengthy outage of its Sandbox environment, bringing new questions about how customers should handle outages.
Outages, or unscheduled downtime, are nothing new for Software as a Service (SaaS) companies. The week before, Microsoft's CRM Online and Office 365 services went down for five hours on Aug. 15. Indeed, Salesforce.com itself has set up a site, trust.salesforce.com, to monitor unscheduled downtime and inform its user base about it.
In fact, the trust.salesforce.com site outlined the issue that affected its CS3 server from Aug. 23 to Aug. 29. The issue, according to the site, was:
Detail: On August 22, 2011 beginning at 15:28 UTC and ending at 16:03 UTC, the salesforce.com Technology Team resolved a performance degradation issue affecting AP0/NA1/NA2/NA5/NA6 and the CS0/CS1/CS3/CS12 Sandbox instances. During this period, customers on the affected instances may have experienced intermittent login failures while trying to access the salesforce.com services. We apologize for any inconvenience this may have caused you.
Root cause: The problem was caused by a disk failure on our primary load balancer that caused the device performance to degrade and intermittently pass traffic. The salesforce.com Technology Team initiated a manual failover of the load balancer to the standby device which immediately resolved the problem.
Actions to prevent future Incidents:
The salesforce.com Technology Team:
- is implementing improvements to the monitoring infrastructure to identify this type of issue in real-time.
- is working with our vendors on the root cause of this issue
"It was annoying," said one customer who asked not to be named. "I was in a business meeting. I don't like surprises."
While Salesforce.com clearly articulated the issue and in fact created trust.salesforce.com in the wake of an outage several years ago, last week's incident raised questions concerning Service Level Agreements (SLAs) between Salesforce.com and its customers.
Several customers attending the show this week were unsure whether the Sandbox environment affected uptime guarantees laid out in their SLA. Some SLAs provide for significant discounts or refunds in the event of lengthy downtime. Whether the Sandbox is covered in the same way as production environments was a mystery for some.
Another customer, who was on a different server and unaffected by the downtime, didn't know whether it affected his company's SLA with Salesforce.com, but the prospect had him concerned.
"I actually made a note to talk to my AE about it, to make sure it doesn't happen on our production environment," said the customer, who also asked not to be named.
For Chicago-based Devry Inc., the outage held back the development organization that was working on some routine maintenance and enhancements. The fact that it was just the testing environment made the downtime more palatable, said John Cunningham, senior manager for CRM architecture.
The Sandbox environment does not affect Devry's SLA, he said.
Yet it's a bigger issue for some, particularly companies building their own applications on top of Force.com, according to Liz Herbert, principal analyst at Cambridge, Mass.-based Forrester Research. Many companies are unsure how the SLAs apply to the Sandbox. Prospective customers and those coming up for renewals should take this into account.
"Clients should specify if uptime SLA includes the sandbox environment," Herbert said.
Salesforce.com had not responded to a request for comment at the time this story was posted.