Archive for November, 2009

Workarounds and the backlog effect

Friday, November 27th, 2009

A workaround is an alternative process used to replace the normal ‘business-as-usual’ process or IT system, which may be unavailable during a business disruption. When determining the Maximum Tolerable Outage (MTO) for a business function, the availability of manual, paper-based workarounds helps work out how long you can afford to be offline from your IT systems. It may also allow you to implement a lower cost ‘warm’ or ‘cold’ solution instead of a ‘hot’ one.

These workaround procedures define the interim tasks to keep the process going whilst the IT systems or other resources are being recovered.

When considering how long a process can operate manually, one area to beware of is the backlog effect. During an incident, if the volume of work remains constant but the rate of processing is slower because it is manual, unprocessed work accumulates as a backlog. This backlog keeps growing for as long as you are not processing at full capacity. For each process there comes a point when, no matter how much overtime you throw at it, catching up is very costly or impossible.

It is important to consider what this threshold may be for your process and what the absolute maximum period of time is that the process can operate manually and still feasibly recover. It is wise to allow some contingency between the MTO you select (when the process needs to be recovered by) and your absolute maximum time operating manually to ensure that you have some breathing space in case something goes wrong with the recovery efforts.
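The backlog effect and the threshold described above can be put into rough numbers. The following is a minimal sketch only, assuming constant daily volumes and processing rates; the function names and all of the figures are hypothetical and are not taken from any particular BIA.

```python
def backlog_after(days_manual, arrivals_per_day, manual_rate_per_day):
    """Items waiting after operating manually for days_manual days."""
    # Work only piles up when arrivals outpace manual processing.
    shortfall = max(arrivals_per_day - manual_rate_per_day, 0)
    return shortfall * days_manual


def catch_up_days(backlog, arrivals_per_day, recovered_rate_per_day):
    """Days needed to clear the backlog once systems are recovered.

    Returns None when the recovered rate (including any overtime) leaves
    no spare capacity over new arrivals, i.e. catching up is impossible.
    """
    surplus = recovered_rate_per_day - arrivals_per_day
    if surplus <= 0:
        return None
    return backlog / surplus


def max_manual_days(arrivals_per_day, manual_rate_per_day,
                    recovered_rate_per_day, catch_up_limit_days):
    """Longest manual period whose backlog can be cleared within the limit."""
    shortfall = arrivals_per_day - manual_rate_per_day
    surplus = recovered_rate_per_day - arrivals_per_day
    if shortfall <= 0:
        return float("inf")  # manual processing keeps up; no backlog forms
    if surplus <= 0:
        return 0.0           # any backlog at all can never be cleared
    return catch_up_limit_days * surplus / shortfall


# Hypothetical figures: 100 items arrive per day, 40 per day can be handled
# manually, and 120 per day can be handled after recovery with overtime.
backlog = backlog_after(5, 100, 40)
print(backlog, catch_up_days(backlog, 100, 120))
print(max_manual_days(100, 40, 120, catch_up_limit_days=10))
```

With these illustrative figures, five days of manual operation takes fifteen days of overtime to clear. If the business can only sustain ten days of catch-up, the absolute maximum manual period is a little over three days, so an MTO of two days would leave some breathing space.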

As a result, how long your area can function using manual workaround procedures should be revisited during your area’s BIA updates and tested as part of your business continuity exercise program.

OpsCentre – Business Continuity Consulting

Choosing a Business Continuity Recovery Site

Wednesday, November 25th, 2009

If an organization experiences a ‘denial of access’ or ‘loss of premises’ due to incidents such as extended power outage, flood or fire, an alternate location for critical business processes and staff needs to be established.

An Alternate Site is the premises to which a business unit may transfer its operations in the event of a business continuity incident. This is sometimes also known by the name Fallback Site or Recovery Site.

A number of different options can be used as an Alternate Site, depending on the organization’s overall BCP strategy, recovery time frame requirements, budget and so on. These are:

Commercial Recovery Site
In most capital cities there are organizations that provide both dedicated and shared recovery seats and some provide IT recovery infrastructure as well. Annual leasing fees are paid based on the number and type of seats required as well as for any IT equipment, storage of your IT equipment and other related services.  

Internal Property Assets
Sometimes organizations have other property assets that are vacant, underutilized or house lower priority business functions. These could be designated as an Alternate Site for a higher priority business function should the BCP need to be invoked. This is why it is important to have a clear prioritization of your business functions from the BIA: it ensures that lower priority business functions are vacated in the event of a significant business disruption so that operations of a higher criticality can continue. It is also vital to have a displacement plan in place for the regular staff of the Alternate Site so everyone knows where they are going. Other considerations when planning how to use the displaced Alternate Site are transport, parking, seating, security access and IT requirements.

Telecommuting
Often staff are already geared up to telecommute, and this offers a low cost solution that suits many business functions. However, there still needs to be a clear plan for which business functions are expected to telecommute, and those staff need the resources, such as IT equipment and remote access, to do their jobs.

Vacant seats or displaced seats at a partner/third party organization
On some occasions a partner/third party organization has capacity to house additional staff should the need arise. This may be a reciprocal arrangement. If an organization needs to rely on this type of arrangement, it should be formalized and reviewed on a regular basis to ensure the seats can be made available when needed and to outline any commercial terms.

Commercial Serviced Offices
A commercial serviced office will certainly have the meeting rooms, seating and internet access required to get many people up and running initially. However, as this is a first-come, first-served arrangement, it is not recommended that this be relied upon as the sole recovery site for critical functions. If the serviced office is likely to be subject to increased demand from other organizations affected by the incident, you may not be able to get in as expected. It is still a useful contingency to have the contact details for some serviced offices, both near your office and geographically separate from it. Hotels are another option, as they will typically have a business centre and meeting rooms.

In all instances it is best practice to maintain geographic distance between your primary site and your Alternate Site(s) in case there is a widespread incident affecting the general area of your primary site, for example, a large power outage. If your Alternate Site is too close, it may be affected as well. 

Whichever type of Alternate Site is selected, it is vital to include it in your regular Business Continuity and IT Disaster Recovery testing exercises to build staff familiarity and to ensure that the site can be activated and can function as you planned.

Is an outdated business continuity plan worse than none at all?

Friday, November 20th, 2009

This is a debatable point, but acting upon an outdated strategy may well mean time, money and energy misspent recovering something that is incorrect or no longer needed.

Change is inevitable … A plan can easily get out of date as staff turn over, business units are created or decommissioned, IT systems are changed, removed or added, the risks affecting the business change, or the priorities of the business shift.

Given the resources typically spent to get a BCP in place in the first instance, it makes good sense to undertake some regular maintenance to ensure it stays current. The longer this is put off, the greater the chance that the whole thing will need to be re-visited down the track.

Maintaining the BCP needn’t be hard but it has to be assigned as someone’s specific responsibility and priority.

Nominating a BCP Manager or Co-ordinator is the first step. It is their responsibility to maintain the overview of all of the planning documents and resources in the organization and to ensure they are kept up to date, even if they are delegating tasks to others.

Ensure the BCP Manager is empowered by Senior Management in this role, making sure the stakeholders that may need to be involved know this is an important task they will be asked to participate in.

Determine a frequency for updates that is realistic and achievable and stick to it. Set review dates ahead of time, put them in stakeholders’ diaries and schedule review meetings well in advance if necessary.

Include BCP and IT DR considerations in the ‘impact analysis’ for all new projects, not just IT projects but business projects as well. This may mean adding a section to the organization’s Business Case and IT Change Request templates. New projects should be considered in the light of their impact on existing strategies and business continuity provisions. New IT systems should have their IT Disaster Recovery provisions planned for within their business case and implementation projects, if necessary, so that the new systems are not left without sufficient coverage.

Not all organizations are able to invest in a full-time BCP Manager so instead the responsibility gets tacked on to someone’s existing role, with varying degrees of success.

Business Continuity Management

Business Continuity Testing Isn’t a Pass or Fail Exercise

Wednesday, November 18th, 2009

Business Continuity Plans (BCPs) need to be regularly tested and updated to ensure accuracy and effective recovery in the event of a disruption.

Testing (sometimes referred to as Exercising) shouldn’t be viewed as a Pass or Fail exercise, as every test is an opportunity to find potential problems with your plan and to rectify them.

We view testing as an opportunity to continually evolve not only the strategy and plan documents themselves but also the competence of the key staff members involved. A regular BCP test will help to embed the skills required to effectively manage a business interruption.

Once you’ve mastered the basics of testing your business continuity plan, put your strategy to the test by seeing if it can hold up in a variety of circumstances. It also helps to engage the interest of the business continuity team members if the test scenarios are dynamic and evolving.

Some ideas on how to keep your testing fresh and continually improving are to try some different ‘incidents’ such as a catastrophic loss of premises, extended power failure resulting in denial of access to building or loss of significant staff due to a pandemic. 

Another way to put your plan through its paces is to throw in a complication such as the unavailability or loss of a key recovery staff member, e.g. the IT Recovery Co-ordinator or the Command Team Leader. Other complications could include mobile phone telecommunications being unavailable.

A robust business continuity plan and the business continuity team should be able to respond to these challenges.

Most importantly, to ensure you are getting value out of your testing process, ensure someone is assigned the responsibility for noting down issues and action items for rectification arising from the test and for following up to make sure they are completed.