
Setting up an Internal Cloud for Patch/Fix Testing

by Tom Henderson (TomHenderson) on 25-05-2011 10:20 AM

For some organizations, “Patch Tuesday” has become Assault Tuesday, with the attendant overtime. Microsoft certainly isn’t the only culprit; other companies, too, can cause massive problems with working platforms through their update services. Indeed, the best-laid patches proffered by Oracle, Novell, Red Hat, and a long list of others have caused dependency problems or even smoldering craters where data centers once lived.

The complex nature of working platforms means that interdependent applications must all work together in harmony to keep systems running even if they are from highly competing vendors. IT staff are often dedicated to testing and deploying patches to end users.

Making a patch/fix platform used to mean keeping a duplicate or pilot set of hardware. Each needed strongly documented tracking of software assets, revision levels, configuration file backups, as well as pages of online and double-backed-up notes to be used when disaster struck after a patch or fix.

The meanings of alpha, beta, prerelease, pilot, and production releases vary enough among software organizations and developer groups that the terms blur. Because so many components interact, patch/fix testing often delayed the introduction of patches into production systems, and the delay itself often caused enormous problems.

An unpatched zero-day security flaw might leave an organization’s website so vulnerable that it needs to be taken down until a patch or fix can be introduced. A line-of-business database application might have to be stopped because of data integrity issues. Dependencies in e-mail systems might prevent spam filters or audit software from functioning correctly. The ability to rapidly pilot and test changes became highly desirable, if not mandatory, for production processes.

The Evolution of Pilot Platforms

In the bad old days, your IT staff needed a representative atomic sample of each platform to be piloted. If a patch arrived, the platform was brought up to the “speed” of the production platform in as many realistic ways as could be accomplished. Sadly, some platforms couldn’t be easily emulated. Maybe traffic couldn’t be easily emulated. Perhaps it wasn’t possible to get a database equivalent in size to a production system’s instances. Compromises were (and often still are) made to emulate as many characteristics as possible to test the patch and fix in the hopes that the production system wouldn’t be destabilized.

Virtual machine concepts were a godsend for patch/fix testing because many of the systems could be copied using physical-to-virtual (P2V) techniques that produced a near carbon copy of a production system. The new virtualized copy could then be brought up as a virtual machine, patched and fixed according to instructions, then checked to see if dependencies or configurations had changed enough to cause a problem.

Virtualization platform makers such as VMware and Citrix advanced the concept of “Lab Manager” software specifically to aid in the use of pilot platforms. These products have library management functions that allow comparatively advanced methods of storing test platforms and restoring them for future testing.

The Lab Manager concept permitted groups of VMs (formerly discrete server processes) to be brought alive, modified, tested, and then frozen together again via the snapshot or VM-image storing processes for later revival. In this process, the dependencies are known; via iterative snapshots, processes can be rolled back to known points to find introductions of effects, bugs, or successful relationship states among service processes in this pilot environment.
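
The rollback idea above can be sketched as a binary search over an ordered snapshot history. This is only an illustration: the snapshot names are made up, and the `is_healthy` callback is a stand-in for reverting the VM group to a snapshot and running a real test suite. It also assumes that once a fault is introduced, it persists in every later snapshot.

```python
from typing import Callable, Optional, Sequence


def first_bad_snapshot(
    snapshots: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the earliest snapshot that fails the health check, or None.

    Assumes snapshots are ordered oldest to newest and that a fault,
    once introduced, is present in every later snapshot.
    """
    lo, hi = 0, len(snapshots)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_healthy(snapshots[mid]):
            lo = mid + 1  # fault, if any, was introduced later
        else:
            hi = mid      # fault is at mid or earlier
    return snapshots[lo] if lo < len(snapshots) else None


# Hypothetical history; the lambda pretends the fault arrived with "db-patch".
history = ["baseline", "os-patch", "db-patch", "mail-patch", "av-update"]
broken_from = history.index("db-patch")
print(first_bad_snapshot(history, lambda s: history.index(s) < broken_from))
# prints "db-patch"
```

With five snapshots this needs at most three health checks rather than five, which matters when each check means reviving a whole group of VMs.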

Fast Forward: The Cloud

Conceptually, the cloud is an aggregation of resources, usually hardware and raw operating systems of a known state that can be rapidly provisioned for a job. (I don’t consider Software as a Service in this definition; those are simply hosted applications, meant for persistent use.) Once the purpose is finished, cloud resources are “disposed of,” meaning they’re restored to their initial form for the next job. Often, the jobs are non-persistent, job-control-like processes. Cloud resources, however, are perfect places to bring up library images of production/pilot images for patch/fix dependency checks.
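
As a rough sketch of the provision-and-dispose cycle just described, the toy pool below hands out a host loaded with a library image and restores it to a clean state when the job ends. The `ResourcePool` class, host names, and image name are all hypothetical; a real cloud manager would do this through its own provisioning interface.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


class ResourcePool:
    """A toy pool of hosts that are handed out clean and returned clean."""

    def __init__(self, hosts: List[str]) -> None:
        self._free: List[str] = list(hosts)
        self.state: Dict[str, str] = {h: "clean" for h in hosts}

    @contextmanager
    def lease(self, library_image: str) -> Iterator[str]:
        """Load a library image onto a free host for the duration of a job."""
        if not self._free:
            raise RuntimeError("no free hosts in the pool")
        host = self._free.pop()
        self.state[host] = library_image
        try:
            yield host
        finally:
            # Dispose: restore the host to its initial form, even if the
            # test run raised an exception.
            self.state[host] = "clean"
            self._free.append(host)


pool = ResourcePool(["host-a", "host-b"])
with pool.lease("mail-pilot-r3") as host:
    print(host, "is running", pool.state[host])
print(pool.state)  # every host is back to "clean"
```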

Indeed, Citrix has brought forth the Lab Manager component of XenServer, Citrix’s virtualization platform, to repurpose data center assets in just this way. Lab Manager builds rapidly deployable groups of organizational assets and relationships; some use this for development piloting, others for IaaS, but it’s also just what the doctor ordered for patch/fix testing of complex platforms and dependencies.

The rationale for using cloud resources depends on their availability, which, in cloud asset management, ought to be good. After all, the cloud is supposed to be designed with immediate availability in mind.

Building patch/test cloud platforms requires understanding the relationships of the platforms to be tested, their storage and I/O needs, and the working relationships among the services that participate in a system. Some of those services also depend on current production resources, such as DNS, routing, or other production platforms.

As an example, bringing up a pilot e-mail system to test a patch often requires putting up an artificial wall around the applications so that they don’t try to assert themselves as production and usurp real, working e-mail. VM deployment skills, such as isolating networks behind hypervisor NAT, may be required. Your IT staff may need to emulate other resources, such as the aforementioned DNS server, or even an Active Directory global catalog server.
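
One way to sanity-check that wall before powering anything on is to confirm that no pilot VM has a network interface attached outside the sandbox. The sketch below assumes you can export each VM's network attachments from your hypervisor's inventory; the `PilotVM` type and every VM and network name here are illustrative, not taken from any real product.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class PilotVM:
    name: str
    networks: Tuple[str, ...]  # virtual networks the VM's NICs attach to


def exposed_vms(vms: List[PilotVM], isolated: Set[str]) -> Dict[str, List[str]]:
    """Map each VM to any networks it touches that are not sandbox-only."""
    leaks: Dict[str, List[str]] = {}
    for vm in vms:
        outside = [net for net in vm.networks if net not in isolated]
        if outside:
            leaks[vm.name] = outside
    return leaks


# Hypothetical inventory for a pilot e-mail system and its helpers.
sandbox_only = {"pilot-nat", "pilot-storage"}
pilot = [
    PilotVM("pilot-mail", ("pilot-nat",)),
    PilotVM("pilot-dns", ("pilot-nat",)),
    PilotVM("pilot-gc", ("pilot-nat", "corp-lan")),  # misconfigured NIC
]
print(exposed_vms(pilot, sandbox_only))  # {'pilot-gc': ['corp-lan']}
```

An empty result means every pilot VM sits entirely behind the sandbox networks; anything else is worth fixing before the pilot mail server starts answering for production.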

Often, these components can be obtained via snapshots of production systems. Those snapshots are then reconfigured in sandbox environments (walled away from the rest of the network and SAN) to prevent disruption to the pilot or production platforms. Once your staff vets a configuration, it can be “frozen” for further testing purposes in the same sandbox at a later date.

Approaching Cloud Testing

Internal cloud resources are often offered for many purposes, even to civilians (non-IT users) for common jobs such as data warehouse analysis. Civilians might never know that they’re clicking on cloud VM resources to accomplish a mission, and so much the better. However, patch/fix mandates should shield these processes from civilians. One organization listed a patch server, and a user dutifully downloaded it and corrupted it.

For similar reasons, groups of analysts involved in testing patch/fix and configuration changes also realize that the nature of emulating large platforms can occasionally drag resources to their knees. The use of a qualified list of target host platforms for testing is advised, so that mercurial resource consumption doesn’t become the source of complaints.

Testing may require internal clouds, as external clouds may violate organizational security policy, regulatory authority, or data privacy needs. Therefore, many production environments that need testing must be tested locally rather than in the public cloud, even if the public cloud offers encryption and security vetting.

Once the cloud resources needed for testing have been defined and tested themselves, many analysts develop libraries of groups of server dependencies into revisions of specific production systems. These groups of VMs can be redeployed into test for subsequent patch/fix/configuration testing as groups, within a cloud. Once performance is vetted, the servers are once again saved as a group and given a revision component, so that a selection within the cloud resources menu allows the subsequently revised platform to be re-initialized on command. Revision control may also be a part of organizational audit and compliance requirements, and so saving clouds using revision control trees may help satisfy audit and documentation needs.
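
The library-of-revisions idea can be sketched as a small data structure: each save of a VM group gets the next revision number and a note, and any revision can be looked up later for redeployment or audit. The `GroupLibrary` class and the image names are hypothetical; Lab Manager-type products keep this bookkeeping in their own catalogs.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class GroupRevision:
    revision: int
    images: Tuple[str, ...]  # VM images saved together as one group
    note: str


@dataclass
class GroupLibrary:
    _history: Dict[str, List[GroupRevision]] = field(default_factory=dict)

    def save(self, group: str, images: Iterable[str], note: str) -> GroupRevision:
        """Store the images as the next revision of the named group."""
        revisions = self._history.setdefault(group, [])
        rev = GroupRevision(len(revisions) + 1, tuple(images), note)
        revisions.append(rev)
        return rev

    def get(self, group: str, revision: Optional[int] = None) -> GroupRevision:
        """Return a specific revision, or the latest when none is given."""
        revisions = self._history[group]
        if revision is None:
            return revisions[-1]
        if not 1 <= revision <= len(revisions):
            raise KeyError(f"{group} has no revision {revision}")
        return revisions[revision - 1]

    def audit_log(self, group: str) -> List[str]:
        """One line per revision, oldest first, for documentation needs."""
        return [f"r{r.revision}: {r.note}" for r in self._history[group]]


library = GroupLibrary()
library.save("crm", ["crm-web.img", "crm-db.img"], "P2V copy of production")
library.save("crm", ["crm-web.img", "crm-db-patched.img"], "database patch vetted")
print(library.get("crm").images)
print(library.audit_log("crm"))
```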

Specific Combo Limitations

Virtualization disassociates the hardware platform from the hosted operating system combinations, along with their configurations, drivers, and other connectivity or state elements. However, some patches and fixes relate to how operating systems talk to hardware, and therefore are hypervisor-specific and perhaps hardware-specific.

Because of this, patches to hardware drivers, or to drivers that sit between the operating system and the hardware, must be tested where the hardware and hypervisor match what you use in a production platform. This tends to narrow cloud choices, which are based on virtualization. To overcome this difficulty, some organizations, via choices allowed in Lab Manager-type software, emulate the widest common denominator of hardware platforms with virtualization choices.
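
A minimal sketch of that narrowing step: given a profile of the production host, keep only the test hosts whose hypervisor and relevant hardware match. The `HostProfile` fields and all the values below are assumptions chosen for illustration; which attributes matter depends on the driver being patched.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class HostProfile:
    name: str
    hypervisor: str
    hypervisor_version: str
    nic_model: str
    storage_controller: str


# Attributes compared by default; an illustrative choice, not a standard.
DEFAULT_FIELDS = ("hypervisor", "hypervisor_version", "nic_model", "storage_controller")


def matching_hosts(
    production: HostProfile,
    candidates: Sequence[HostProfile],
    fields: Sequence[str] = DEFAULT_FIELDS,
) -> List[HostProfile]:
    """Return candidates that equal production on every listed field."""
    return [
        host
        for host in candidates
        if all(getattr(host, f) == getattr(production, f) for f in fields)
    ]


# Hypothetical inventory.
prod = HostProfile("prod-01", "hv-x", "5.6", "nic-a", "raid-b")
test_hosts = [
    HostProfile("lab-01", "hv-x", "5.6", "nic-a", "raid-b"),
    HostProfile("lab-02", "hv-x", "5.5", "nic-a", "raid-b"),
    HostProfile("lab-03", "hv-y", "5.6", "nic-a", "raid-b"),
]
print([h.name for h in matching_hosts(prod, test_hosts)])  # ['lab-01']
```

Passing a shorter `fields` list, such as only `("hypervisor",)`, loosens the match for patches that do not touch hardware drivers.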

Your IT staff can also test user platforms based on Windows and Linux to allow rapid software dependency checking. Apple is missing from this mix, although limited patch/fix testing can be accomplished via “bare metal” features of Parallels and VMware Xserve or Apple Server Edition testing.

Each production platform can be captured as a set of virtual machines, which in turn can usually be emulated in cloud platforms that are sandboxed so as not to disturb or cannibalize production systems. Using versioning techniques, analysts create “slabs” of pilot platforms that emulate (and in many ways actually are) production platforms for purposes of patch and fix testing. Allocating testing resources, partitioning these resources from civilians, and documenting everything provides an astute, rapidly deployable, systematic approach to keeping patches and fixes from becoming crashes and burns.


The HP Input Output site is sponsored by HP and features articles and content from HP and third-party contributors. Third-party articles and content, while paid for by HP, do not necessarily represent the views and opinions of HP. HP does not endorse this content and is not responsible for its accuracy, availability and quality.
