Posts Tagged ‘Maximum Tolerable Outage’
What’s your maximum tolerable outage?
Monday, October 10th, 2011
If the air conditioning breaks down in a hospital administration department at the height of summer, productivity starts to drop as the temperature rises. It becomes harder to stay focused on the task at hand, people get crabbier on the telephone with patients and suppliers, and the “go the extra mile” motivation your organisation normally prides itself on wanes significantly. Maximum tolerable outage? A day or so, perhaps. On the other hand, if the air conditioning stops functioning in your data centre, then your servers may stop functioning too. Maximum tolerable outage? For vital medical systems, perhaps one minute – if that.
Offsite Backup Tape Archiving for Disaster Recovery
Monday, September 26th, 2011
If tape backup is an essential component of your disaster recovery strategy, then offsite tape archiving will often be as well. One of the classic tape backup risks is leaving the tapes onsite, where any disaster that wipes out your systems will do the same to your tapes. Basic disaster recovery strategy dictates that tapes need to be stored in a physically separate location. In that case, who is responsible for transporting them offsite? How are they stored in the offsite archive? Who will bring them back onsite if disaster strikes, and how quickly?
What type of Business Continuity Recovery Site do you need?
Monday, January 11th, 2010
The Recovery site is sometimes also referred to as the Alternate Site, Standby Site or Fallback Site.
Recovery sites can function purely as a standby data centre for your IT systems or they can be for business recovery as well, with desks, phones, desktop computers, meeting rooms and other facilities.
The data centre equipment and the business recovery seats can be dedicated, meaning reserved for your use only, or shared, meaning first come, first served in the event of a disaster. This is why, for a ‘shared’ facility, the ratio of clients to equipment is important, as is the provider’s formula for how many clients from a given geographical area they subscribe to it.
One key decision when determining the most effective Business Continuity Strategy for an organization is the maximum readiness level of the recovery site (cold, warm, hot) that is required.
A cold recovery site is a facility that already has in place the environmental infrastructure required to recover critical business functions or information systems, but does not have any pre-installed computer hardware, telecommunications equipment, communication lines, etc. This scenario has the longest lead time to restoring live services because the equipment must be provisioned and set up after the event.
A warm recovery site is a facility equipped with some hardware, communications interfaces, and electrical and environmental conditioning, but which can only go live after additional provisioning, software installation or customization, and the restoration of a database backup into the environment.
A hot recovery site is a facility that already has in place the computer, telecommunications, and environmental infrastructure required to recover critical business functions or information systems. Typically the organization’s data is synchronized to the hot site so that it can be switched across into live operation in a very short time, almost instantaneously in some instances. Because the data is mirrored to the hot site in real time or very frequently, the level of data loss in this scenario is usually minimal.
How do you determine which type of recovery site is right for you?
The Maximum Tolerable Outage for each of your business functions, which comes out of your Business Impact Analysis, tells you when the systems need to be up and running again. The Recovery Point Objective, or the amount of acceptable data loss, will help to inform these requirements as well. The right balance needs to be struck between the cost of the recovery solution and the cost of data loss, delays and downtime if you had to wait days or weeks to recover the systems.
This is why a holistic, comprehensive Business Impact Analysis, involving the right business stakeholders and sponsored by Executive management, is essential in order to determine the business continuity recovery strategy for your organization.
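The balancing act described above can be sketched as a simple selection: from the site types on offer, keep those that meet both the MTO and the RPO, then take the cheapest. This is only an illustration. The `SiteOption` fields and every figure in `OPTIONS` are hypothetical placeholders, not industry benchmarks; real recovery times and costs come from your own provider quotes and Business Impact Analysis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteOption:
    name: str
    recovery_hours: float   # assumed time for this site type to go live
    data_loss_hours: float  # assumed worst-case age of the restored data
    annual_cost: float      # assumed yearly cost of the arrangement


def cheapest_site(options: List[SiteOption],
                  mto_hours: float,
                  rpo_hours: float) -> Optional[SiteOption]:
    """Return the lowest-cost option that meets both MTO and RPO, or None."""
    feasible = [
        o for o in options
        if o.recovery_hours <= mto_hours and o.data_loss_hours <= rpo_hours
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o.annual_cost)


# Hypothetical figures, for illustration only.
OPTIONS = [
    SiteOption("cold", recovery_hours=240.0, data_loss_hours=24.0, annual_cost=20_000.0),
    SiteOption("warm", recovery_hours=48.0, data_loss_hours=24.0, annual_cost=80_000.0),
    SiteOption("hot", recovery_hours=1.0, data_loss_hours=0.25, annual_cost=300_000.0),
]

if __name__ == "__main__":
    choice = cheapest_site(OPTIONS, mto_hours=72.0, rpo_hours=24.0)
    print(choice.name if choice else "no option meets the requirements")
```

With these placeholder numbers, a 72 hour MTO and 24 hour RPO are met by both the warm and the hot site, so the cheaper warm site is chosen. A result of `None` means no option on the list satisfies the requirements, which is a signal to revisit either the options or the requirements.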
Business Continuity Terminology – What’s the difference between MTO, RTO and RPO?
Sunday, January 3rd, 2010
A common query that we come across in business continuity consulting is, ‘what is the difference between MTO, RTO and RPO?’
MTO is the Maximum Tolerable Outage
The Maximum Tolerable Outage for a critical business process represents the maximum amount of time that an organization can survive without the business process in any form (manual or automated). Defining the MTO for a process gives you the deadline for when this process must be up and running in some form or another.
The BCI describes MTO as ‘At what point in time do you need to either recover your business process, or invoke contingency procedures to prevent you from meeting your business objectives/targets.’
RTO is the Recovery Time Objective
The Recovery Time Objective is the target time for recovery, measured from the time the disaster is declared (not the time of the actual incident) to when the critical process or system is available to users.
RPO is the Recovery Point Objective
The Recovery Point Objective describes the maximum age of the data you are prepared to restore in the event of a disaster. For example, if your RPO is 6 hours, you want to be able to restore systems to the state they were in no more than 6 hours before the disaster. This dictates your backup requirements: in this example, you must be making data backups at least every 6 hours. Any data created after the most recent backup (up to 6 hours’ worth) will be lost and will need to be recreated during your recovery process (if possible).
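The RPO arithmetic in the example can be checked with a few lines of Python. This is a minimal sketch that assumes backups complete instantly and are always restorable; the function names are made up for illustration.

```python
from datetime import datetime, timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """A backup schedule can satisfy the RPO only if backups are taken
    at least as often as the RPO (assumes instant, restorable backups)."""
    return backup_interval <= rpo


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Everything created between the last backup and the disaster is lost."""
    return disaster - last_backup


if __name__ == "__main__":
    rpo = timedelta(hours=6)
    print(meets_rpo(timedelta(hours=6), rpo))  # backups every 6 hours
    print(meets_rpo(timedelta(hours=8), rpo))  # backups every 8 hours
    print(data_loss_window(datetime(2010, 1, 3, 12, 0),
                           datetime(2010, 1, 3, 17, 30)))
```

In practice a backup takes time to run and to verify, so the interval usually needs to be somewhat shorter than the RPO to leave a margin.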
Workarounds and the backlog effect
Friday, November 27th, 2009
A workaround is an alternative process used to replace the normal ‘business-as-usual’ process or IT system, which may be unavailable during a business disruption. When determining the Maximum Tolerable Outage (MTO) for a business function, whether or not there are manual, paper-based workarounds is a factor that can help work out how long you can afford to be offline from your IT systems, and possibly allow you to implement a lower cost ‘warm’ or ‘cold’ solution instead of a ‘hot’ one.
These workaround procedures define the interim tasks to keep the process going whilst the IT systems or other resources are being recovered.
When considering how long a process can operate manually, one area to beware of is the backlog effect. If the volume of incoming work remains constant after the incident but the rate of processing is slower because it is manual, unprocessed work accumulates as a backlog. This backlog keeps growing for as long as you are not processing at full capacity. For each process there comes a point when, no matter how much overtime you throw at it, catching up is very costly or impossible.
It is important to consider what this threshold may be for your process and what the absolute maximum period of time is that the process can operate manually and still feasibly recover. It is wise to allow some contingency between the MTO you select (when the process needs to be recovered by) and your absolute maximum time operating manually to ensure that you have some breathing space in case something goes wrong with the recovery efforts.
As a result, how long your area will be able to function using manual workaround procedures should be revisited during your area’s BIA updates and tested as part of your business continuity exercise program.
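The backlog effect described above can be put into rough numbers. The sketch below assumes a constant arrival rate, a constant (slower) manual processing rate during the outage, and a fixed post-recovery capacity that includes overtime. All function names and figures are hypothetical, and real workloads are rarely this steady.

```python
from typing import Optional


def backlog_after_outage(arrivals_per_day: float,
                         manual_per_day: float,
                         outage_days: float) -> float:
    """Work items left unprocessed after running manually for outage_days."""
    return max(0.0, (arrivals_per_day - manual_per_day) * outage_days)


def days_to_clear(backlog: float,
                  arrivals_per_day: float,
                  recovered_per_day: float) -> Optional[float]:
    """Days needed to clear the backlog once systems are back.
    Returns None if capacity never exceeds new arrivals (no catch-up)."""
    spare = recovered_per_day - arrivals_per_day
    if spare <= 0:
        return None
    return backlog / spare


def max_manual_days(arrivals_per_day: float,
                    manual_per_day: float,
                    recovered_per_day: float,
                    max_catch_up_days: float) -> float:
    """Longest manual period whose backlog can still be cleared
    within max_catch_up_days after recovery."""
    growth = arrivals_per_day - manual_per_day
    spare = recovered_per_day - arrivals_per_day
    if growth <= 0:
        return float("inf")  # manual processing keeps up; no backlog forms
    if spare <= 0:
        return 0.0           # any backlog at all can never be cleared
    return spare * max_catch_up_days / growth


if __name__ == "__main__":
    # Hypothetical: 100 items arrive daily, 60 can be handled manually,
    # and 120 can be handled per day after recovery (with overtime).
    backlog = backlog_after_outage(100, 60, 5)
    print(backlog, days_to_clear(backlog, 100, 120))
    print(max_manual_days(100, 60, 120, max_catch_up_days=10))
```

With these figures, five days of manual operation leaves a backlog of 200 items that takes ten days of overtime to clear. If ten days of catch-up is the most the business can bear, five days is the absolute maximum manual period, so the MTO you select should sit comfortably below it.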


