Overview
There are many technology systems in which increased capability brings increased complexity and, with it, increased responsibility. A simple example is an automated traffic light at a multi-lane, four-way intersection. The power of the traffic light lies in its ability to route vehicles efficiently through the intersection. It does this with a complex algorithm fed with predictions (or real-time data) such as the time of day, inbound traffic flow on each lane, outbound traffic capacity, and turning percentages. With this power comes “under the covers” complexity that we never see as drivers. With this power also comes responsibility. What happens if the electricity goes out at the intersection? What if the structure holding the lights fails? How is the system monitored continually? Most of us have witnessed the backup plans for a traffic light: it falls back to a blinking red (which reduces its performance but keeps traffic flowing), or it may ultimately be replaced by a police officer (who may match or exceed its performance but is impractical in the long term).
As server processing in the information technology industry moves to virtual platforms, we, as integrators and administrators, face the same issues of complexity and responsibility. We face two new issues:
- Virtualization is much more complex and deserves respect
- Additional administration and monitoring are required
We need to take a moment to review the new responsibilities that come with a virtual platform: backups, replication, snapshot management, monitoring, and updates. I have left performance out of this discussion; we must first focus on keeping our new infrastructure stable.
Backups and Snapshots
Backups in a virtual environment can be handled differently. Rather than performing traditional filesystem backups, we can now back up the entire virtual machine. However, there are some details that are important to understand. We have become accustomed to “agents” in traditional backup products, which use various technologies to keep service-style data applications online during the backup process. These agents are available for most leading service applications, such as Oracle, Microsoft SQL Server, Microsoft Exchange, Lotus Notes, and others. If the backup task moves to the virtual layer, we need to make sure that our backup method creates “clean” backups by communicating with the guest operating system (and possibly its applications) so that data is in a consistent state when it is captured. Otherwise, we need to make sure that each virtual machine (and its applications) can withstand a “crash consistent” backup, which captures the disk as if the power had been pulled. In addition, backups can now produce “runnable” virtual machine copies as well as “restorable” ones. This newsletter article cannot cover all of the options for performing backups in a virtual environment; however, deploying the appropriate strategy requires a thorough understanding of the new technology.
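The choice described above can be reduced to a small decision rule. The Python sketch below is a minimal illustration, not any backup product's logic: the `VirtualMachine` fields `guest_quiesce_available` and `tolerates_crash_consistent` are hypothetical stand-ins for “the backup tool can coordinate with the guest OS” and “a test restore has shown this VM recovers cleanly from a crash-consistent image.”

```python
from dataclasses import dataclass
from enum import Enum


class BackupMode(Enum):
    APPLICATION_CONSISTENT = "application-consistent"  # guest OS/apps prepared first
    CRASH_CONSISTENT = "crash-consistent"              # like pulling the power plug
    NOT_SAFE = "not safe"                              # needs a different strategy


@dataclass
class VirtualMachine:
    name: str
    guest_quiesce_available: bool     # hypothetical: backup tool can talk to the guest OS
    tolerates_crash_consistent: bool  # hypothetical: verified by a test restore


def choose_backup_mode(vm: VirtualMachine) -> BackupMode:
    """Pick the cleanest backup mode this VM supports."""
    if vm.guest_quiesce_available:
        return BackupMode.APPLICATION_CONSISTENT
    if vm.tolerates_crash_consistent:
        return BackupMode.CRASH_CONSISTENT
    return BackupMode.NOT_SAFE


if __name__ == "__main__":
    fleet = [
        VirtualMachine("sql01", guest_quiesce_available=True, tolerates_crash_consistent=False),
        VirtualMachine("web01", guest_quiesce_available=False, tolerates_crash_consistent=True),
        VirtualMachine("mail01", guest_quiesce_available=False, tolerates_crash_consistent=False),
    ]
    for vm in fleet:
        # Prints: sql01: application-consistent / web01: crash-consistent / mail01: not safe
        print(f"{vm.name}: {choose_backup_mode(vm).value}")
```

The third outcome is the one to watch for: a VM that can be neither coordinated with nor trusted to recover from a crash-consistent image needs another approach, such as a traditional in-guest agent.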
Because a virtual machine is encapsulated in a single file or a small set of files, a new opportunity exists: making a replica of the virtual machine. If the virtual machine is turned “off”, a replica can be created by simply copying its files. If the machine is turned “on”, a snapshot can be used to copy the VM while it runs. Once the snapshot is taken, all disk changes go to the snapshot file while the base disk file remains static. The base file is then copied, either fully or differentially, to a target location, producing an exact replica of the VM as it was at the moment the snapshot was taken. When the copy is complete, the snapshot is merged into the base file to return the VM disk to a single file. Of course, replicating a machine is subject to some of the same consistency issues as backups.
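The snapshot, copy, and merge sequence can be modeled with a toy disk made of blocks. This Python sketch is a simplification under stated assumptions: `VirtualDisk`, `replicate`, and the `writes_during_copy` parameter are hypothetical names, a Python list stands in for the base disk file, a dictionary stands in for the snapshot file, and the differential copy compares blocks directly rather than using the change tracking a real product would rely on.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VirtualDisk:
    """Toy model of a VM disk: a list of blocks plus an optional snapshot delta."""
    base: list[str]
    delta: dict[int, str] | None = None  # block index -> data written while a snapshot exists

    def take_snapshot(self) -> None:
        # From now on writes land in the delta and the base stays static.
        self.delta = {}

    def write(self, index: int, data: str) -> None:
        if self.delta is not None:
            self.delta[index] = data
        else:
            self.base[index] = data

    def read(self, index: int) -> str:
        # The running VM always sees its newest data: the delta wins over the base.
        if self.delta is not None and index in self.delta:
            return self.delta[index]
        return self.base[index]

    def merge_snapshot(self) -> None:
        # Fold the delta back into the base so the disk is a single file again.
        for index, data in (self.delta or {}).items():
            self.base[index] = data
        self.delta = None


def replicate(
    disk: VirtualDisk,
    replica: list[str] | None,
    writes_during_copy: dict[int, str] | None = None,
) -> tuple[list[str], int]:
    """Snapshot, copy the static base (fully or differentially), then merge.

    Returns the new replica and the number of blocks copied.
    """
    disk.take_snapshot()
    try:
        # Simulate the VM staying online: these writes go to the delta, not the base.
        for index, data in (writes_during_copy or {}).items():
            disk.write(index, data)

        if replica is None or len(replica) != len(disk.base):
            return list(disk.base), len(disk.base)  # full copy

        new_replica = list(replica)
        copied = 0
        for i, block in enumerate(disk.base):  # differential copy: changed blocks only
            if new_replica[i] != block:
                new_replica[i] = block
                copied += 1
        return new_replica, copied
    finally:
        disk.merge_snapshot()  # runs after the copy, whether it succeeded or not


if __name__ == "__main__":
    disk = VirtualDisk(base=["a", "b", "c", "d"])
    replica, copied = replicate(disk, None, writes_during_copy={1: "B"})
    print(replica, copied)  # ['a', 'b', 'c', 'd'] 4  (point-in-time copy)
    print(disk.base)        # ['a', 'B', 'c', 'd']    (snapshot merged back)
    replica, copied = replicate(disk, replica)
    print(replica, copied)  # ['a', 'B', 'c', 'd'] 1  (differential pass)
```

The two passes show both properties described above: the write made during the first copy lands in the snapshot, so the replica reflects the moment the snapshot was taken, and the second, differential pass copies only the one block that changed.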
Snapshot Management
Snapshot management is likely the most significant increase in complexity that comes with virtualization. A snapshot is a file that tracks all of the changes made to another file after a point in time. The snapshot can then be used to provide access to the file as it existed at that point. Snapshots are used in a virtual environment for backups, replication, cloning, and migration. Snapshots are also a great tool for reducing risk. For example, before upgrading an application server from version 3 of an application to version 4, you can take a snapshot, perform the upgrade, and then test the results. If the upgrade fails or testing is unsuccessful, you can revert to the snapshot, returning the server to its state at the time the snapshot was taken and undoing the upgrade attempt. The cost of a snapshot is that it keeps growing for as long as it exists. Even if data is deleted from the disk inside the virtual machine, the snapshot does not shrink; it continues to grow, because it tracks changes rather than just storing data, and a deletion is itself a change. If snapshots are not removed, they can grow beyond a manageable size. Therefore, a strategy is needed to manage “snaps” and to ensure that they do not exist in places where they are not monitored.
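One piece of such a strategy is a regular audit that flags snapshots that have lived too long or grown too large. The Python sketch below assumes a hypothetical `SnapshotRecord` filled from whatever inventory your hypervisor's management tools provide; the three-day age limit and the 25% size ratio are illustrative thresholds, not recommendations from any vendor.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SnapshotRecord:
    """One snapshot as reported by some inventory source (fields are hypothetical)."""
    vm: str
    name: str
    created: datetime
    size_gb: float       # current size of the snapshot (delta) file
    base_disk_gb: float  # size of the disk it tracks


def audit_snapshots(
    snapshots: list[SnapshotRecord],
    now: datetime,
    max_age: timedelta = timedelta(days=3),  # illustrative threshold
    max_size_ratio: float = 0.25,            # illustrative threshold
) -> list[str]:
    """Return one warning per snapshot that is too old or too large."""
    warnings = []
    for snap in snapshots:
        age = now - snap.created
        ratio = snap.size_gb / snap.base_disk_gb
        problems = []
        if age > max_age:
            problems.append(f"age {age.days} days")
        if ratio > max_size_ratio:
            problems.append(f"size {ratio:.0%} of base disk")
        if problems:
            warnings.append(f"{snap.vm}/{snap.name}: " + ", ".join(problems))
    return warnings


if __name__ == "__main__":
    now = datetime(2024, 1, 10)
    snaps = [
        SnapshotRecord("app01", "pre-v4-upgrade", datetime(2024, 1, 9), 2.0, 100.0),
        SnapshotRecord("sql01", "before-patch", datetime(2023, 12, 1), 60.0, 200.0),
    ]
    for warning in audit_snapshots(snaps, now):
        print(warning)  # sql01/before-patch: age 40 days, size 30% of base disk
```

Run on a schedule, a report like this catches the forgotten pre-upgrade snapshot before it fills a datastore; both an age limit and a size limit are checked because a busy VM can outgrow a size limit long before it reaches an age limit.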
Management and Maintenance
When an operating system such as Microsoft Windows controls the hardware directly, it is a conduit for information about that hardware. If a disk in a mirror set fails, the RAID controller service running on Windows generates an event log entry that can be captured, or it raises its own alert through another running service. Now consider the same situation in a virtual environment. The RAID controller is controlled by the virtual layer (the hypervisor), and the VM's operating system is aware only of the virtual disk it has been allotted. If a disk in a RAID 1 mirror fails, the Windows operating system knows nothing about it; it continues to run as if everything were fully functional. The virtual layer does know about it; however, it must be set up to notify the responsible group that an error exists. Our traditional operating system alerts are no longer enough. We need to deploy an additional layer of monitoring and alerting at the hypervisor level to track the health of the hardware.
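Hypervisor-level alerting of this kind amounts to polling hardware status and notifying someone when it changes. In this minimal Python sketch, `check_hardware`, component names such as `raid1/disk1`, and the status strings are hypothetical; how the status is actually collected depends on the hypervisor and its management tools, and `notify` could be any function that sends an e-mail or opens a ticket.

```python
from __future__ import annotations

from typing import Callable

HEALTHY = "ok"       # hypothetical status string for a working component
MISSING = "missing"  # recorded when a component stops appearing in the report


def check_hardware(
    current: dict[str, str],
    previous: dict[str, str],
    notify: Callable[[str], None],
) -> dict[str, str]:
    """Compare this poll's hardware status with the last one and notify on changes.

    `current` maps a component name to the status reported at the hypervisor
    level. The returned dict should be passed back in as `previous` next time.
    """
    state: dict[str, str] = {}
    for component in sorted(set(current) | set(previous)):
        status = current.get(component, MISSING)
        before = previous.get(component, HEALTHY)  # new components are assumed healthy
        if status != before:
            if status == HEALTHY:
                notify(f"RECOVERED: {component} is back to {HEALTHY}")
            else:
                notify(f"ALERT: {component} changed from {before} to {status}")
        state[component] = status
    return state


if __name__ == "__main__":
    alerts: list[str] = []
    state = check_hardware({"raid1/disk0": "ok", "raid1/disk1": "ok"}, {}, alerts.append)
    state = check_hardware({"raid1/disk0": "ok", "raid1/disk1": "failed"}, state, alerts.append)
    state = check_hardware({"raid1/disk0": "ok", "raid1/disk1": "failed"}, state, alerts.append)
    print(alerts)  # ['ALERT: raid1/disk1 changed from ok to failed']
```

Alerting on changes rather than on every poll keeps a failed disk from producing a new message every few minutes, and a component that disappears from the report entirely is flagged as `missing` instead of being silently ignored.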
Most system users are familiar with notifications that updates are available for their operating system. In Windows these appear in the system tray. On the Mac they are typically a pop-up. On a Linux desktop they often appear in the upper right of the screen. Where are they for the hypervisor layer? Since we do not interact with the hypervisor layer in the same way, how do we notice that updates are available? Different hypervisors use different methods. Most hypervisors can be managed by a central service that can notify a group when updates exist. However, once again, this requires additional setup that we, as administrators, have not had to address in the past.
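Whatever mechanism a given hypervisor provides, the underlying check is a version comparison across hosts. This Python sketch assumes a hypothetical inventory mapping host names to installed hypervisor versions and a latest-release string obtained from the vendor; the host names and version numbers are made up for illustration.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "7.0.3" into (7, 0, 3) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def hosts_needing_updates(installed: dict[str, str], latest_available: str) -> list[str]:
    """Return a message for every host whose hypervisor is behind the latest release."""
    latest = parse_version(latest_available)
    messages = []
    for host, version in sorted(installed.items()):
        if parse_version(version) < latest:
            messages.append(f"{host}: {version} installed, {latest_available} available")
    return messages


if __name__ == "__main__":
    inventory = {"host-a": "7.0.3", "host-b": "7.0.10"}  # hypothetical hosts and versions
    for message in hosts_needing_updates(inventory, "7.0.10"):
        print(message)  # host-a: 7.0.3 installed, 7.0.10 available
```

Parsing versions into integer tuples is the important design choice: compared as plain strings, "7.0.10" sorts before "7.0.3", which would hide a host that is behind.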
We must remember that the virtual layer, or hypervisor, is in fact an operating system itself. It is an additional layer to understand and manage. It offers system integrators tremendous flexibility, redundancy, and scalability, at the price of added complexity and responsibility. Ignore its complexity and it can crumble unexpectedly; master it and it is a revolutionary step forward.