A few practical strategies have emerged for managing dynamic virtual machines (VMs) in a cloud environment:
Push: A central facility completes the provisioning of new VMs by connecting as root and running scripts remotely. Similarly, it pushes updates and patches to running VM instances. Infrastructure management tools use various schemes to protect root credentials from hackers.
Pull: Each VM template includes configuration management tools and an initialization script that runs on startup. The script connects to the servers that manage the cloud environment to register the instance. Running instances are responsible for checking for available updates and patches and keeping themselves up to date.
Immutable instances: The bootstrapping process ensures that new VM instances are fully configured and provisioned before they are activated on the network. Once running, an instance is never modified. When updates or patches are required, new instances are built from sources and/or templates as appropriate, and the old instances are destroyed. Cloud infrastructure management tooling makes this process invisible to applications, which appear to be “always on” from an external perspective.
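To make the immutable-instance idea concrete, here is a minimal Python sketch of replacing instances rather than patching them. Everything in it is hypothetical: `Instance`, `Pool`, `launch`, `healthy`, and `roll_out` are illustrative names for an in-memory model, not any cloud provider's API, and the health check simply passes. The frozen dataclass stands in for “never modified once running.”

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical in-memory model; a real cloud exposes these
# operations through its own API.
_ids = count(1)


@dataclass(frozen=True)  # frozen: an instance is never modified after creation
class Instance:
    instance_id: int
    template_version: str


@dataclass
class Pool:
    """Instances currently receiving client traffic."""
    active: list = field(default_factory=list)
    destroyed: list = field(default_factory=list)


def launch(template_version: str) -> Instance:
    """Build a fully configured instance from a versioned template."""
    return Instance(next(_ids), template_version)


def healthy(instance: Instance) -> bool:
    """Placeholder health check; assumed to always pass in this sketch."""
    return True


def roll_out(pool: Pool, new_version: str) -> None:
    """Replace every active instance instead of updating it in place."""
    old = list(pool.active)
    replacements = [launch(new_version) for _ in old]
    if not all(healthy(i) for i in replacements):
        raise RuntimeError("new instances failed health checks; keeping old ones")
    # Switch traffic to the new instances, then destroy the old ones.
    pool.active = replacements
    pool.destroyed.extend(old)


pool = Pool([launch("v1"), launch("v1")])
roll_out(pool, "v2")
```

In a real environment the traffic switch and the destruction of old instances would be handled by the cloud's own tooling; the sketch only shows the order of operations: build, verify, switch, destroy.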
These descriptions are very high level, and many details vary from one setup to another. In general, the strategies of push, pull, and immutable instances represent a spectrum running from less secure and harder to scale to more secure and easier to scale. You could dispute that ordering by diving into the implementation details of particular cloud environments and configuration products; it’s only a generalization.
Masterless Configuration Management
An approach known as masterless configuration management simplifies the environment, facilitates scaling, and improves security. Instead of relying on a master server that pushes updates or serves as the touch point for pulled updates, configuration and provisioning are run offline. To support a push strategy, scripts can push updates to servers, run either manually or as a timed batch process. To support a pull strategy, updates can be packaged with each platform’s package manager, such as yum, apt, or Chocolatey, and running instances can then pull them from a secure repository using standard mechanisms.
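For the pull variant, an instance needs a way to decide which packaged updates to fetch. The sketch below assumes the instance already has two maps, package name to installed version and package name to the version published in the repository index, and uses a simplified dotted-numeric version comparison. Real package managers such as yum, apt, and Chocolatey apply their own, richer version rules; `parse_version` and `pending_updates` are hypothetical helpers, not part of any of them.

```python
def parse_version(v: str) -> tuple:
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically.

    Assumes purely numeric, dot-separated versions (a simplification).
    """
    return tuple(int(part) for part in v.split("."))


def pending_updates(installed: dict, repo_index: dict) -> dict:
    """Return {package: newer_version} for packages the instance should pull.

    installed:  {package: version currently on this instance}
    repo_index: {package: latest version in the secure repository}
    """
    updates = {}
    for name, current in installed.items():
        available = repo_index.get(name)
        if available and parse_version(available) > parse_version(current):
            updates[name] = available
    return updates
```

For example, `pending_updates({"app-config": "1.9.0"}, {"app-config": "1.10.0"})` returns `{"app-config": "1.10.0"}`; comparing tuples rather than strings is what makes 1.10.0 rank above 1.9.0.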
Challenge: Configuration Drift
When many instances of a given server definition are running, changes may be made to the configuration of individual instances. Before long, there’s no assurance that all the servers of a given type have the same configuration, which complicates the work of supporting them. The longer instances stay up and running, the greater the chance that one will be modified, typically when someone changes a server’s configuration manually instead of going through the usual delivery pipeline. This phenomenon is called configuration drift.
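Drift is easy to detect once a baseline exists. The following sketch, using hypothetical setting names and an assumed dictionary representation of each instance’s configuration, compares every instance against the baseline and reports only the settings that differ. Settings missing from an instance show up with an actual value of `None`; settings present on an instance but absent from the baseline are ignored, which is a deliberate simplification.

```python
def detect_drift(baseline: dict, instances: dict) -> dict:
    """Report, per instance, the settings that differ from the baseline.

    baseline:  {setting: expected_value}
    instances: {instance_id: {setting: actual_value}}
    Returns {instance_id: {setting: (expected, actual)}} for drifted
    instances only; instances that match the baseline are omitted.
    """
    report = {}
    for instance_id, actual in instances.items():
        diffs = {
            key: (expected, actual.get(key))
            for key, expected in baseline.items()
            if actual.get(key) != expected
        }
        if diffs:
            report[instance_id] = diffs
    return report


baseline = {"max_connections": 100, "tls": "1.2"}
fleet = {
    "web-1": {"max_connections": 100, "tls": "1.2"},
    "web-2": {"max_connections": 250, "tls": "1.2"},  # changed by hand
}
```

Here `detect_drift(baseline, fleet)` flags only `web-2`, reporting `{"max_connections": (100, 250)}`.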
Challenge: Security
Unfortunately, we live in a world where bad actors are continually trying to break into systems. Many are well funded, some are government supported, and they do nothing all day but attack computer networks. Often they have more security skills than anyone we might hire to defend our network. There is no practical way to ensure that no one will be able to hack into our systems, and no guarantee that we will detect an attack and deal with it before damage is done.
For strategies that update running VM instances (push or pull), continuous synchronization has emerged as a way to mitigate the two key challenges of configuration drift and security. On a schedule, scripts run against each running instance to re-apply its configuration settings. The scripts are idempotent: running them more than once produces the same result as running them once. Any setting that has changed since the last run is overwritten, and settings that are already correct are left untouched.
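Configuration management tools generally provide idempotent operations out of the box. As a stand-in, the sketch below assumes a simple `key=value` configuration file format and hypothetical `ensure_setting` and `converge` helpers. Each managed key is forced to its desired value, so a drifted value is overwritten, a missing one is appended, and running `converge` on its own output changes nothing.

```python
def ensure_setting(config_text: str, key: str, value: str) -> str:
    """Return config text in which `key` is set to `value` exactly once.

    Existing lines for `key` are replaced; if none exist, one is appended.
    Applying it again to its own output changes nothing (idempotence).
    """
    desired = f"{key}={value}"
    kept, seen = [], False
    for line in config_text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            if not seen:
                kept.append(desired)  # overwrite a drifted (or correct) value
                seen = True
            # any later duplicate lines for the key are dropped
        else:
            kept.append(line)  # unmanaged lines are left as they are
    if not seen:
        kept.append(desired)
    return "\n".join(kept) + "\n"


def converge(config_text: str, desired: dict) -> str:
    """Re-apply every managed setting; intended to run on a schedule."""
    for key, value in desired.items():
        config_text = ensure_setting(config_text, key, value)
    return config_text


drifted = "port=8080\nlog_level=debug\n"
desired = {"log_level": "info", "tls": "on"}
once = converge(drifted, desired)
```

Here `once` is `"port=8080\nlog_level=info\ntls=on\n"`, and `converge(once, desired)` returns the same text. Lines for keys outside the desired set are never touched; that is what makes repeated runs safe, but it also means synchronization only corrects what it manages.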
Continuous synchronization mitigates configuration drift by nudging every configuration back to its intended state every hour or two, although some servers may be temporarily misconfigured between runs.
The same strategy also mitigates the risk of malware injected into a running VM instance, because reconfiguring each instance may overwrite or delete the malware even when system administrators are unaware it is present. It is not a reliable fail-safe, however: the scripts reset only the settings they manage, so malware that lives outside that scope survives.
Immutable Server Strategy
Masterless configuration management can simplify a push or pull strategy somewhat, but the immutable server strategy makes things simpler still by eliminating the need for synchronization altogether. Cloud environments are designed to create and destroy VMs on the fly in a way that is invisible to applications. The environment switches client traffic between instances so that in-flight transactions complete before their instance is destroyed. It also consolidates log output so that application-related issues can be tracked down regardless of how the VMs supporting that application have come and gone over time.
These features were originally intended to support the elastic cloud, in which the number of servers in a given category, supporting a given application, can increase or decrease dynamically based on load. To make that happen seamlessly and invisibly to applications, the environment must be able to start up and shut down VM instances while the application is active, without losing any log data or updates to application files and databases, and without affecting end users.
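The text doesn’t specify how the environment decides how many servers to run. One simple approach, shown here as an assumption rather than any particular provider’s algorithm, divides the current load by the capacity of a single instance, rounds up, and clamps the result to a configured range. The function name, parameter names, and the default bounds of 2 and 20 instances are hypothetical.

```python
import math


def desired_instance_count(total_load: float, capacity_per_instance: float,
                           min_instances: int = 2, max_instances: int = 20) -> int:
    """Number of instances needed for the current load, clamped to a range.

    Assumes load and capacity use the same unit (e.g. requests per second)
    and that each instance can handle `capacity_per_instance` of it.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(total_load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))
```

With these defaults, a load of 950 requests per second at 100 per instance yields 10 instances, while a load of 50 still yields the floor of 2 so the application stays available. Each change in the result means the environment starts or stops instances while the application keeps running.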
Thanks to these capabilities, it’s feasible to replace VM instances rather than keep them alive for the long term and apply updates. This has led to the so-called phoenix server strategy. Named for the mythical bird that rises from its own ashes, phoenix servers are routinely destroyed and rebuilt on a schedule even when there are no updates or patches to be applied.
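A phoenix schedule needs some rule for choosing which instances to burn down next. This sketch assumes a maximum instance age and a batch limit so that only a few instances are replaced at a time, keeping capacity up; `due_for_rebuild`, `max_age`, and `max_batch` are hypothetical names. Nothing in it checks whether updates are pending: age alone triggers the rebuild.

```python
from datetime import datetime, timedelta, timezone


def due_for_rebuild(launch_times: dict, now: datetime, max_age: timedelta,
                    max_batch: int = 1) -> list:
    """Pick the oldest instances whose age exceeds max_age.

    launch_times: {instance_id: launch datetime}
    max_batch limits how many instances are replaced in one pass.
    """
    expired = [
        (launched, instance_id)
        for instance_id, launched in launch_times.items()
        if now - launched > max_age
    ]
    expired.sort()  # oldest launch time first
    return [instance_id for _, instance_id in expired[:max_batch]]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
fleet = {
    "a": now - timedelta(days=9),
    "b": now - timedelta(days=2),
    "c": now - timedelta(days=8),
}
```

With a seven-day limit, `due_for_rebuild(fleet, now, timedelta(days=7))` selects `["a"]`; raising `max_batch` to 5 selects `["a", "c"]`, oldest first.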
An immutable instance can’t be modified, so servers managed this way are not subject to configuration drift. The problem is solved because it can no longer arise.
Malware may be injected into running instances, but it will not live long before the phoenix burns itself up and rises again, reborn with nothing installed except what is in the virus-scanned template under version control. Thus, immutable phoenix servers provide pretty good security protection. Beware of a false sense of security, however: professional cyber-criminals and cyber-warriors have nothing else to do with their time but find ways to break into your systems, while you have many other things to do during the day. You’ll never foil them completely, but you can take steps to make their task less convenient.
For me, the security issue alone is reason enough to choose the most resilient strategy available. And yet immutable and phoenix servers have not been widely adopted. Most system administrators seem to retain the “old school” mindset about servers: they try to keep each instance alive as long as possible, sometimes for years of continuous uptime. Back in the day this was a reasonable strategy, given the cost of downtime and the delay in restarting failed systems. In a contemporary cloud environment, the same strategy yields little value; if anything, it increases risk, cost, and difficulty.