Director’s Blog: 10 Ways to eliminate the causes of unplanned downtime in your data centre operations

Downtime during data centre operations is a nightmare.

But I don’t have to tell you that… We’ve all seen it first-hand.

And the costs are exorbitant. Not only are there repair/maintenance costs, but there’s also the potential for a huge loss of revenue. We’ve spoken before about how the average data centre downtime incident costs over £400k.

But that’s just an average. In 2013, Google had a power outage that reportedly cost over half a million dollars… for just five minutes of data centre downtime. Amazon estimates that whilst its .com domain is down, it loses $1,104 per second. And that’s not even factoring in the potential for damage to reputation, lapses in communication, and loss of vital data.
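To put those figures on a common scale, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above and assumes, purely for illustration, that losses scale linearly with outage duration; the function and variable names are my own.

```python
# Figures quoted in the article; the linear scaling is an assumption.
AMAZON_LOSS_PER_SECOND_USD = 1_104
GOOGLE_OUTAGE_COST_USD = 500_000  # "over half a million dollars", treated as a lower bound
GOOGLE_OUTAGE_MINUTES = 5


def downtime_cost(loss_per_second: float, seconds: float) -> float:
    """Estimated loss for an outage, assuming a constant loss rate."""
    return loss_per_second * seconds


amazon_per_minute = downtime_cost(AMAZON_LOSS_PER_SECOND_USD, 60)
amazon_per_hour = downtime_cost(AMAZON_LOSS_PER_SECOND_USD, 60 * 60)
google_per_minute = GOOGLE_OUTAGE_COST_USD / GOOGLE_OUTAGE_MINUTES

print(f"Amazon, per minute: ${amazon_per_minute:,.0f}")
print(f"Amazon, per hour:   ${amazon_per_hour:,.0f}")
print(f"Google, per minute: at least ${google_per_minute:,.0f}")
```

On those assumptions, a single hour offline at Amazon’s quoted rate comes to just under $4 million.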

Clearly, this is something we want to avoid.

The causes, though, range from the routine to the ridiculous. The number of scenarios that could potentially force data centre downtime is nearly endless.

The majority, however, can be summarised into these big four:

1) Failures (Equipment, systems, UPS power supplies, utilities, backup generators, etc.)
2) Weather
3) Human error
4) Cybercrime

So without further ado, here are 10 ways to eliminate these causes of unplanned downtime during your data centre operations.

1. Plan Your Data Centre Downtime

A no-brainer, perhaps, but one which should be kept in the forefront of our minds.

The regular scheduling of planned data centre downtime for UPS maintenance, pre-emptive repairs, and upgrades is paramount to avoiding unplanned downtime caused by equipment failures.

When performed properly, these periods of downtime are completely invisible to your clients and the outside world. They may be time-consuming and expensive to schedule regularly, but the cost is a pittance compared to that of unplanned data centre downtime.

2. Don’t Skimp (Remember the Costs!)

Now, more than ever, is a time for investment.

Due largely to the growth in enterprise cloud computing and M2M connectivity, not to mention content-heavy applications, Cisco predicts that global IP traffic will reach 1 zettabyte this year.

That’s a billion terabytes.

And it gets better. Cisco also predicts that by 2017 that figure will reach 1.4 zettabytes. 40% growth in just a couple of years.

It’s safe to say that most data centre infrastructure will be growing in the coming years and that new data centres will be popping up all over the place. But whilst that happens, I’d also like to venture a prediction: unplanned data centre downtime will increase.

While rushing to meet the ever-increasing demand for server space, some companies will not take the time and care necessary to ensure near-flawless uptime. I highly recommend that you avoid this potential trend: train your staff thoroughly, invest in high-quality UPS monitoring systems, and do everything you possibly can to avoid unplanned downtime.

On that note, I’ll leave you with the words of Lee Brathwaite, Vice President of Real Estate for Verizon, a company renowned for its exceptional uptime statistics.

“In a business where an hour of downtime can result in millions of dollars of lost revenue, it is critical that customers are confident that the networks they depend on are continuously up and running”

3. Watch Out for Squirrels…

If you followed the earlier link to datacenterknowledge.com, you will no doubt have read that in 2010 squirrels took down half of Yahoo’s Santa Clara data centre infrastructure. In fact, it’s not that uncommon: a 2011 study found that 17% of Level 3 Communications’ cable damage was due to squirrel chewing.

Perhaps even more bizarrely, in 2010 Google reported that aerial fibre links to its $600m Oregon data centre operations were “regularly shot down by hunters”.

But I’m not here to talk to you about squirrels. Or hunters.

The takeaway from this is that it’s vital to assess all potential threats to your uptime and take action against them. You can’t necessarily predict every eventuality, but if you really look for them you’ll find a lot more threats than are immediately apparent.

And you’ll be surprised how taking preventative measures against one risk can help protect you against a plethora of others you never considered.

Google solved their hunting problem by moving their fibre links underground. A simple step, but one which also protects against damage from other sources, including weather, vehicles, and, of course, squirrels.

4. Implement a Well Designed Uninterruptible Power Supply

You didn’t think you’d get through a whole article without me talking about UPS power supplies, did you?

Data centre UPS systems are essential to avoiding unplanned downtime, and as I’ve written previously they don’t just protect against power outages. Power problems come in a variety of forms, and they can cause significant damage to your critical UPS power systems.

And that’s why just any old UPS power supply won’t do, even if it’s one designed for data centre applications. You need a UPS system designed for you.

The same is true of your backup generator. Ideally, your diesel generator and UPS suppliers will have a history of working together (or indeed be one and the same), as the switchover between these vital UPS power systems is essential to avoiding unplanned downtime.

Backup generators and UPS systems vary tremendously in size and function, and experienced UPS suppliers know that every uninterruptible power supply must be tailored to the specific needs of the client.

That’s the way we operate at KOHLER Uninterruptible Power Ltd, and you shouldn’t accept anything less.

5. Have a Backup for Your Backup

Unfortunately, there is such a thing as a UPS power outage. And it’s not pretty.

With an old-fashioned online UPS system, you could potentially have a situation where UPS power loss resulted in unplanned data centre downtime even though mains power was present.

Thankfully, in modern UPS power systems, this isn’t the case. In the last article, where we discussed the major components of modern uninterruptible power supplies, I talked briefly about the static switch. This UPS power supply component enables switchover to mains/bypass power in the event of a UPS power outage.

This is simply another example of the value of investing in modern, high quality, tailored UPS power systems. In an ideal world, any single system or piece of equipment should be able to fail without causing unplanned data centre downtime.

That might not always be possible, but it’s a good point to aim for.

6. Look After Your UPS Batteries

I’ve written about the importance of UPS battery maintenance before, but I really can’t stress it enough. If your uninterruptible power supplies are vital to your overall uptime (and they are), UPS battery maintenance becomes paramount as well.

Of course, in a large data centre design, you’re not going to be running on UPS battery power for long. Even a 500 kVA UPS system is only designed to support critical loads for long enough to switch over to a backup power supply.

But those precious minutes between mains power and backup power are a scary time. If everything goes to plan, the event will be invisible to the outside world. If there’s a fault with your UPS batteries, you’re going to experience unplanned downtime.

So look after them.

7. Don’t Rely on People

OK, this is a little facetious. You can’t avoid relying on people altogether.

But studies cite human error as the cause of between 50% and 80% of all unplanned power outages in data centre operations. Whatever the precise figure may be, it’s time to sit up and take notice.

There are any number of electronic systems and devices designed to remove some of the human element from data centre operations. Tracking everything from environmental conditions (heat, humidity, etc.) to the clock speed of individual servers, these innovations can identify potential issues before they become failures.
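As a rough idea of what that kind of automated checking looks like at its simplest, here is a minimal sketch of a threshold check on environmental readings. The reading names and limits are hypothetical values chosen for illustration; they are not taken from any standard or from any particular monitoring product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


# Hypothetical limits for illustration only; set real ones from your own
# equipment specifications and site policy.
LIMITS = {
    "inlet_temp_c": Limit(18.0, 27.0),
    "relative_humidity_pct": Limit(20.0, 80.0),
}


def out_of_range(readings: dict[str, float]) -> list[str]:
    """Return one alert message per reading that falls outside its limit."""
    alerts = []
    for name, value in readings.items():
        limit = LIMITS.get(name)
        if limit is None:
            continue  # no limit configured for this reading
        if not (limit.low <= value <= limit.high):
            alerts.append(f"{name}={value} outside {limit.low}-{limit.high}")
    return alerts


for alert in out_of_range({"inlet_temp_c": 31.5, "relative_humidity_pct": 45.0}):
    print("ALERT:", alert)
```

The point is less the code than the principle: a machine checks every reading, every time, without getting tired or distracted.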

The KUP PowerNSURE UPS system is an excellent example of how you can cut a huge part of the human element out of the UPS battery maintenance process, whilst also drastically reducing the chances of unforeseen data centre downtime.

Remember what I said earlier about skimping? Some of these UPS power systems may be on the expensive side, but I’ll wager the costs of unforeseen downtime would be far higher in the long run.

8. Consider Outsourcing

The decision of whether to perform a function in-house or not is usually informed by costs.

Can we justify the personnel and training costs necessary to meet this function, or should we simply outsource it?

But I’d like to venture a second question into the discussion: Is this function essential to maintaining our uptime?

Sure, you have technically minded people on site most of the time. But are they specialists in server maintenance? How about UPS systems? Utilities? Environmental factors?

It’s certainly possible to train your people to perform these functions, but when it’s just one small part of their duties they’re never going to be experts. When you outsource vital functions to a specialist, you’re giving yourself the best possible chance of avoiding the human error factor.

9. Defend Your Digital Fortress

It’s a chilling thought.

Your UPS systems are in perfect health. You’re regularly cycling your servers, monitoring your UPS batteries, and you’re expertly maintaining your systems.

But even though you’ve done everything right, you’re still at risk.

Cybercrime accounts for a relatively small proportion of unplanned data centre downtime incidents, but it can be very, very expensive to resolve. Not only is there the potential for major damage to your UPS power systems, but there’s also the potential for loss of sensitive information.

Just like every other element of your data centre design, you can’t skimp if you want to maintain your uptime. I would strongly advise that you consider uptime to be your primary concern, above minimising costs and above improving energy efficiency.

I won’t pretend to be an expert in digital security, but I know enough to say with certainty that this is not an area in which you’ll want to be lacking.

And of course, we also shouldn’t forget about physical security. Even in this day and age, there are occasionally high profile instances of physical security breaches!

Cybercrime is becoming more prevalent every year. As a data centre, choosing whether or not to invest in state-of-the-art security should be an easy decision.

10. Get a Health MOT

An outside perspective is often useful.

Even if you’ve done your due diligence and implemented high-quality, modern UPS systems, I’d still recommend that you have them thoroughly reviewed by an external source.

When it comes to your uninterruptible power supplies, we offer a free site survey service to ensure your UPS power systems are properly designed and installed to protect you against unplanned data centre downtime. We’re experts in everything necessary to maintain a constant supply of clean, incident-free power to your data centre infrastructure.

Just get in touch, and we’ll do the rest.

And who knows, if you take all these suggestions on board, maybe you’ll hit that elusive ‘six nines’ (99.9999%) availability target next year!
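For a sense of just how elusive that target is, here is a small sketch that converts an availability percentage into the downtime it allows per year. It assumes a 365-day year; the function name is my own.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Downtime per year permitted by a given availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999, 99.9999):
    seconds = allowed_downtime_seconds(target)
    print(f"{target}% availability allows about {seconds:,.1f} seconds of downtime a year")
```

At six nines, the budget works out to roughly 31.5 seconds of downtime across the entire year.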

 

Follow us on LinkedIn for regular industry articles and company information. For more information regarding any of our UPS power products or services, you can get in touch with KOHLER Uninterruptible Power via our contact page or call us on 0800 731 3269.
