jim.shamlin.com

5: Testing Your Plan

(EN: This chapter also appears to be about testing, and I'm not clear on the difference from the previous chapter.)

What Steps Do I Need to Follow?

The author differentiates between first-stage tests that are conducted in a test environment and final-stage tests that are conducted as the system is incrementally moved to the "live service delivery" (production) environment.

The author notes that organizations are very squeamish about making any changes to the production environment because, presumably, it's working properly as it is and any new element introduces risk. However, systems are never static, and changes must be accommodated.

He categorizes changes into three kinds: an isolated solution (does not replace or interact with existing systems); replacement (when the new solution is implemented, the old is removed); and integration (the new solution is implemented and it will interact with existing systems). The degree of risk is self-evident.

Complete First-Stage Test

The author stresses the need for extensive testing of a new system, and implies that you should not rely on its similarity to previous systems or assume that it will function the same way in any regard. Generally, the procedure is to perform a test, and if a failure results, to make adjustments and re-test, repeating until the test succeeds.

(EN: This is very methodical, but I wonder if it misses something - specifically, if an "adjustment" to pass a test might create a condition that would cause a previous test to fail, but the previous test has already been checked off as completed. I suppose you have to break through recursion at some point, and stop going back.)
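One way to address the concern in the note above is to re-run every test after each adjustment, rather than checking tests off one at a time. The following is a minimal sketch of that idea, not the author's procedure. The function name, the dictionary-based "system," and the memory and cache figures are all hypothetical, chosen only to show an adjustment for one test breaking another.

```python
from typing import Callable, Dict, List

# Hypothetical shapes: a test inspects the system and returns True on a pass;
# an adjustment changes the system in place.
Test = Callable[[dict], bool]
Adjust = Callable[[dict], None]


def retest_until_clean(system: dict, tests: Dict[str, Test],
                       adjustments: Dict[str, Adjust],
                       max_rounds: int = 10) -> List[str]:
    """Run the full suite, adjust for the first failure, then run it all again.

    Returns the names of the tests that needed an adjustment, in order.
    max_rounds is the point at which someone decides to stop going back.
    """
    applied: List[str] = []
    for _ in range(max_rounds):
        failing = [name for name, test in tests.items() if not test(system)]
        if not failing:
            return applied
        adjustments[failing[0]](system)
        applied.append(failing[0])
    raise RuntimeError(f"suite still failing after {max_rounds} rounds")


# Illustrative data: adding a cache to pass the speed test pushes memory
# use over its limit, so a test that passed earlier now fails.
system = {"cache_mb": 0, "heap_mb": 512, "limit_mb": 512}
tests = {
    "memory_ok": lambda s: s["cache_mb"] + s["heap_mb"] <= s["limit_mb"],
    "speed_ok": lambda s: s["cache_mb"] >= 128,
}
adjustments = {
    "memory_ok": lambda s: s.update(heap_mb=s["limit_mb"] - s["cache_mb"]),
    "speed_ok": lambda s: s.update(cache_mb=128),
}
applied = retest_until_clean(system, tests, adjustments)
print(applied)
```

In this run the memory test passes at first, fails after the speed adjustment, and is caught only because the whole suite is repeated. The round limit is an arbitrary cut-off for the recursion.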

(EN: Another problem the author doesn't seem to address is the "don't question success" mentality - before you can say "success" you should not only have a solution that makes the system work, but also understand why it broke in the first place, to be certain the real problem has been addressed and that the later success wasn't merely happenstance.)

There is also the concept of a "reassessment" before go-live. Especially if a project has been underway for a long period of time, the business conditions may have changed, and in spite of all the expense and effort, it may no longer make sense to move forward to installation. It is also important, after testing is done, to re-evaluate the capabilities of the system to ensure it is delivering as promised and that there are no side-effects that detract from (or negate) the primary value of the project.

Complete Final-Stage Test

A new solution can be loaded all at once, or it may be done in pieces. Whatever the case, the procedure is to load a batch of elements, test them, and then monitor them, to ensure that they are stable and functional in the production environment.

The process is often referred to as "release management," which orchestrates the activities necessary to move the code and data to the production environment in a coordinated manner. Generally, the progress is slow, and the release managers are prepared to withdraw or "back out" the new application if any of the test conditions fail, to protect the rest of the production environment.
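The load, test, and back-out cycle described above can be sketched in a few lines. This is an illustration under stated assumptions, not a real release-management tool: the function name, the callbacks, and the element names are hypothetical, and "production" is just a list.

```python
from typing import Callable, List, Sequence


def staged_release(batches: Sequence[Sequence[str]],
                   deploy: Callable[[str], None],
                   withdraw: Callable[[str], None],
                   healthy: Callable[[], bool]) -> bool:
    """Load one batch at a time; back out everything if a check fails."""
    loaded: List[str] = []
    for batch in batches:
        for element in batch:
            deploy(element)
            loaded.append(element)
        if not healthy():
            # Withdraw in reverse order so later elements come out first.
            for element in reversed(loaded):
                withdraw(element)
            return False
    return True


# Illustrative run: the second batch trips the health check.
production: List[str] = []
ok = staged_release(
    batches=[["schema", "api"], ["reports"]],
    deploy=production.append,
    withdraw=production.remove,
    healthy=lambda: "reports" not in production,
)
print(ok, production)
```

Here the failed check after the second batch withdraws all three elements, leaving the stand-in production list as it was before the release began. A real release would also need the monitoring period that follows a successful load, which this sketch leaves out.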

Following the loading of the application, there is generally a period of close scrutiny and monitoring of the application in the production environment, until such time as the novelty has worn off.

Meeting Contractual Obligations

An application is not successful if it merely limps along - it must perform to a specific level of quality, generally referred to as a service level agreement (SLA) or objective (SLO), such as the amount of traffic it can stand, speed of response, etc.
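To make "a specific level of quality" concrete, the sketch below checks a set of request samples against two objectives: response time and availability. The thresholds, the sample data, and the choice of a 95th-percentile measure are assumptions for illustration and do not come from the book.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Slo:
    max_p95_ms: float        # hypothetical response-time objective
    min_availability: float  # hypothetical share of requests that must succeed


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate(samples: Sequence[Tuple[float, bool]], slo: Slo) -> Dict[str, object]:
    """Each sample is (response time in ms, whether the request succeeded)."""
    p95 = percentile([ms for ms, _ in samples], 95)
    availability = sum(1 for _, ok in samples if ok) / len(samples)
    met = p95 <= slo.max_p95_ms and availability >= slo.min_availability
    return {"p95_ms": p95, "availability": availability, "met": met}


# Illustrative samples: nine quick successes and one slow failure.
samples = [(120, True), (130, True), (110, True), (900, False), (125, True),
           (140, True), (115, True), (135, True), (128, True), (122, True)]
report = evaluate(samples, Slo(max_p95_ms=500, min_availability=0.99))
print(report)
```

With only ten samples, the single slow failure is enough to miss both objectives, which is one reason such measures are normally taken over a much larger window.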

When working with vendors, the SLO is generally guaranteed by the vendor for a certain length of time (they don't want to be on the hook forever) and is dependent on certain conditions (if you downgrade your internet connection or database server, the vendor is freed of the obligations that depend on them). These agreements are generally couched in contractual language (as they often come with a financial remedy to be paid in the event of failure) that reduces or eliminates penalties for any event outside the vendor's control.
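The effect of an exclusion clause on a financial remedy can be shown with a small calculation. Every figure here (the downtime allowance, the fee, the credit percentage) is invented for illustration, and the flat-credit rule is only one of many ways a contract might define the remedy.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Outage:
    minutes: int
    vendor_at_fault: bool  # False for events the contract excludes


def service_credit_cents(outages: Sequence[Outage], allowed_minutes: int,
                         monthly_fee_cents: int, credit_percent: int) -> int:
    """Flat credit if chargeable downtime exceeds the allowance, else nothing."""
    chargeable = sum(o.minutes for o in outages if o.vendor_at_fault)
    if chargeable <= allowed_minutes:
        return 0
    return monthly_fee_cents * credit_percent // 100


# Hypothetical month: 43 minutes of downtime allowed, 10% credit on a 1,000.00 fee.
excluded = [Outage(30, True), Outage(120, False)]   # long outage was the customer's network
charged = [Outage(30, True), Outage(120, True)]     # same outages, both the vendor's fault
print(service_credit_cents(excluded, 43, 100_000, 10))
print(service_credit_cents(charged, 43, 100_000, 10))
```

The same two outages produce no credit in the first case and a credit of 100.00 in the second. The only difference is whether the longer outage falls under the exclusion.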

"Service" also refers to the service provided by the vendor, in addition to the performance of the system: how quickly they respond to an outage or service request is a critical component of the contract.

A fair amount of information is provided on granular issues - but fundamentally, you should carefully consider what SLAs you expect a vendor to guarantee, define their obligation to you in case of failure, and be prepared to accept clauses that exempt the vendor from liability for circumstances beyond their control or prediction.