Virtualisation is a business continuity answer to the vulnerabilities and foibles of physical servers. By spreading applications horizontally across stacks of computing power, an organisation can keep a service running even if one stack goes down, because the same application elsewhere picks up the slack. In principle, that’s fine – as long as IT administrators remember they’re dealing with virtual machines and manage them correctly. War stories of catastrophes and near misses caused by faulty perceptions and handling of virtualisation grow daily. The following can help you preserve business continuity and avoid the need for disaster recovery.
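That failover principle – route requests to whichever replica of the application is still healthy – can be sketched in a few lines of Python. The host names and health check here are purely illustrative, not taken from any virtualisation product:

```python
def first_healthy(replicas, is_healthy):
    """Return the first replica that passes its health check,
    or None if every stack is down."""
    for replica in replicas:
        if is_healthy(replica):
            return replica
    return None

# Hypothetical stacks running copies of the same application.
replicas = ["vm-host-a", "vm-host-b", "vm-host-c"]
down = {"vm-host-a"}  # suppose host A has failed

print(first_healthy(replicas, lambda r: r not in down))  # vm-host-b
```

Real load balancers add health-check probes, timeouts and weighting, but the continuity logic is the same: the service survives as long as at least one stack does.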
The main challenges in properly implementing business continuity management in an organisation can be expressed in four words: engagement, understanding, appropriateness and assumptions. In other words: senior management needs to be involved and committed to BCM; business continuity managers need to understand the essentials about IT operations; BCM processes need to link business objectives to operational realities; and any assumptions in BC planning need to be closely scrutinised. If this sounds like IT governance, you’re right. IT governance gives some good hints about how to make business continuity a practical, valued reality.
Historically, vendor solutions for disaster recovery have been created for on-site use by individual enterprises. The client company concerned was the sole owner of the user data involved, and disaster recovery could be implemented without having to worry about anybody else. The cloud computing model changes that situation. It’s possible to use cloud services to have your own dedicated servers and instances of applications, or to share physical space but still have your own application (as in multi-instance setups). However, multi-tenancy (perhaps the defining feature of cloud architectures) makes the application of disaster recovery solutions rather more delicate.
Agile project methodologies have their roots in the software industry, but the overall principle of staying close to market requirements can be applied in any sector. When risk management becomes difficult because of uncertainties like the weather or the economy, short agile cycles encourage a focus on objectives. This may make more sense than detailed planning that tries to put everything in place for the mid to long term. Efficiency and business continuity can be improved, on condition that communications remain open and productive with all stakeholders. So with these advantages, why don’t all organisations and projects jump on the agile bandwagon?
The ‘not invented here’ syndrome was something that forward-looking corporations set out to beat about 20 years ago. If a different product or service could be more cost-effectively bought in rather than being designed and manufactured in-house, then it was bought in. The challenge was to overcome misplaced pride and internal turf wars, where being asked to give up control over development could be construed as an attack on credibility, status or both. Some departments resisted by refusing to work with something that was ‘not invented here’. Now, Disaster Recovery as a Service (DRaaS) may be plagued with a similar issue, where companies cannot look outside what they already have – but for a different reason.
Traditional data backup happens once every so often – once an hour, once a day, once a week, for example, depending on the recovery requirements associated with the data. It’s typically the recovery point objective or RPO that determines the frequency of the backup. If you cannot afford to lose more than the last 30 minutes’ worth of data, then your RPO will be 30 minutes and backups will happen at least every half-hour. Continuous replication, on the other hand, changes the model by backing up your data every time you make a change. But what does that do to RPO, disk space requirements and network capacity (assuming you’re backing up to storage in a different physical location)?
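The arithmetic linking RPO to backup frequency can be made concrete with a short Python sketch. The function names are invented for illustration, not from any backup product:

```python
from datetime import timedelta

def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """With scheduled backups, a failure just before the next run
    loses everything written since the last one - i.e. up to one
    full interval's worth of data."""
    return backup_interval

def satisfies_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """A schedule meets the RPO only if its worst-case loss
    does not exceed the tolerated loss."""
    return worst_case_data_loss(backup_interval) <= rpo

rpo = timedelta(minutes=30)
print(satisfies_rpo(timedelta(minutes=30), rpo))  # True
print(satisfies_rpo(timedelta(hours=1), rpo))     # False: hourly loses too much
```

Continuous replication effectively shrinks the interval to a single change, driving the worst-case loss towards zero – which is exactly why it raises the disk space and network questions above.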
Ensuring employee safety by rapidly disseminating the right information, and keeping communication lines open in a time of crisis, are both priorities for businesses. Traditional solutions for this have relied on the manual ‘call tree’ or ‘phone tree’. Key employees are contacted first to inform them of whatever situation or crisis has arisen, with remaining staff to be contacted as soon as possible afterwards. However, even for smaller organisations of, say, 100 people, the manual call tree rapidly demonstrates its limitations. For larger enterprises, there is no doubt – a better solution is required.
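The call-tree structure itself is easy to automate. A minimal sketch, assuming a hypothetical staff hierarchy and a stand-in notify function (a real system would send SMS, email or voice calls and track acknowledgements):

```python
from collections import deque

# Hypothetical call tree: each key employee is responsible
# for the people listed under them.
CALL_TREE = {
    "crisis-lead": ["ops-manager", "hr-manager"],
    "ops-manager": ["ops-1", "ops-2"],
    "hr-manager": ["hr-1"],
}

def notify(employee: str, message: str) -> None:
    print(f"Notifying {employee}: {message}")  # stand-in for SMS/email/voice

def cascade(root: str, message: str) -> list:
    """Breadth-first fan-out: key employees are contacted first,
    remaining staff as soon as possible afterwards."""
    order, queue = [], deque([root])
    while queue:
        person = queue.popleft()
        notify(person, message)
        order.append(person)
        queue.extend(CALL_TREE.get(person, []))
    return order

cascade("crisis-lead", "Building closed; work from home today.")
```

Even this toy version shows why automation wins: the fan-out is instantaneous and nobody’s place in the tree depends on a human remembering to pick up the phone.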
If you’ve already experienced a distributed denial of service attack, you may have simply seen it as an attempt to cripple a company or organisation by blocking connections to its servers. Indeed, that’s what DDoS is designed to do. Hackers use a multitude of computers, many without their owners’ knowledge, to generate more traffic than a server can cope with. Legitimate users are unable to connect to the server or experience very poor performance (slow connections). However, a DDoS attack is often more than a stand-alone act of cyber aggression. Organisations experiencing this kind of attack should be on the lookout for other risks too.
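The traffic-flood mechanism can be illustrated with a toy per-source rate check. The window and threshold below are arbitrary, and real DDoS mitigation happens at network scale rather than in application code, but the sketch shows the basic heuristic: too many requests from one source in too short a window:

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10
MAX_REQUESTS_PER_WINDOW = 100  # arbitrary illustrative threshold

_requests = defaultdict(deque)  # source IP -> recent request timestamps

def is_flooding(source_ip, now):
    """Record one request from source_ip at time `now` (seconds) and
    report whether it has exceeded the per-window threshold."""
    q = _requests[source_ip]
    q.append(now)
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()  # drop timestamps outside the sliding window
    return len(q) > MAX_REQUESTS_PER_WINDOW

# A single legitimate request is nowhere near the threshold.
print(is_flooding("203.0.113.5", now=0.0))  # False
```

Note that this only flags one noisy source; the point of a *distributed* attack is that each bot can stay under any per-source threshold while the aggregate still overwhelms the server.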
No news is good news, or so the saying goes. But when equipment malfunctions and services are interrupted, no news can mean intense frustration for customers and end-users. In today’s quality and satisfaction-oriented business world, you might think that major corporations had understood the importance of good crisis communication. And to be fair, many now make efforts to keep customers informed of the causes of business interruption, the solutions being put in place, and the estimated time when normal service will be resumed. That’s what makes behaviour around a recent outage by one of the top IT and cloud service vendors so hard to fathom.
Business continuity problems often carry their own penalty in the form of lost revenue, customer churn and reputational damage. In some cases, outages also mean stiff fines that go beyond the penalties that are part of any service level agreement. Thus, SingTel, the Singaporean telecommunications company, received a S$6 million fine (about US$4.81 million) from the ICT regulator in Singapore for a breakdown in service in October 2013. The disruption affected government agencies and financial institutions and had an impact on 270,000 subscribers. But what is really behind fining a company whose business continuity fails like this?
Could it happen? With the growing popularity of cloud computing services and the increasing dependence of companies and operations on them, it’s clear that online services need at least a minimum of safeguarding and protection. But aren’t cloud services supposed to be distributed, redundant and robust enough to protect themselves? After all, that’s what many enterprises rely on when they choose the cloud for data storage, backup, applications and databases. The number of high-profile outages suggests that assumption may not be as valid as either vendors or customers would like. A case in point was the recent unavailability of the Adobe CS cloud service and the resulting paralysis of a major media activity in the UK.
Considered by some to be obsolescent, obsolete or virtually flat-lining, tape backup is still around. Even new hard drive technology and solid state storage cannot match its price point per terabyte stored. Now IBM and Fujifilm have pushed the envelope even further with a new tape cartridge that can hold 154 terabytes of data. By comparison, the last time market leader Seagate discussed progress on hard drives in 2012, its objective was for a 6 terabyte 3.5-inch desktop drive, with ‘eventually’ a 60 terabyte version. Does this mean tape has once again been snatched from the jaws of death – or could it be (gasp) that tape is simply better for volume storage?
When hospitals moved from film-based hardcopy systems to electronic images, they began to generate large amounts of data held on PACS – Picture Archiving and Communications Systems. Hospitals use various ‘modalities’ to scan patients, including Computed Tomography, Magnetic Resonance Imaging and Ultrasound systems. These modalities must regularly (and frequently) upload the scanned images to the PACS, where they can be stored, sequenced for retrieval and made available for remote diagnosis. However, a PACS is often a potential single point of failure with inevitable downtime – which is where the DR lessons start.
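One classic mitigation for that single point of failure is store-and-forward on the modality side: buffer images locally and retry when the PACS is reachable again, so an outage delays archiving instead of losing studies. A minimal sketch – the uploader interface here is invented for illustration, not a DICOM API:

```python
from collections import deque

class StoreAndForwardUploader:
    """Queue scanned images locally; flush them to the PACS in order
    whenever it is up, so a PACS outage never drops a study."""

    def __init__(self, send_to_pacs):
        self._send = send_to_pacs  # callable; raises ConnectionError on failure
        self._queue = deque()

    def upload(self, image):
        self._queue.append(image)
        self.flush()

    def flush(self):
        """Try to drain the queue; return how many images were sent."""
        sent = 0
        while self._queue:
            try:
                self._send(self._queue[0])
            except ConnectionError:
                break  # PACS down: keep remaining images queued
            self._queue.popleft()
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._queue)
```

The design choice worth noting is that an image is only removed from the queue *after* a successful send, which preserves ordering and guarantees at-least-once delivery across outages.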
For some organisations, it’s an explicit legal requirement. For others, it’s the consequence of prevailing laws and regulatory structures. The mandatory requirement defined by the Australian Government for its agencies sets the tone: “Agencies must establish a business continuity management (BCM) program to provide for the continued availability of critical services and assets, and of other services and assets when warranted by a threat and risk assessment.” And for the rest? There’s a strong argument to be made that business continuity management is no longer a choice for any enterprise – and that an obligation for BCM is a good thing anyway.
Server virtualisation, that sophisticated solution for stacking several virtual servers on one physical machine, may mean some sticky times for certain organisations. The underlying idea is attractive: with virtualisation, you can increase operational resilience and efficiency. The bottlenecks arrive when virtualisation either gets out of hand, putting a strain on I/O capability, or when IT staff bump up against a conceptual barrier that blocks additional deployment.