Unit Testing Shell Scripts:
Part Four

WRITTEN BY Dave Nicolette


So far, in this series of posts on unit testing shell scripts, we’ve covered several tools for unit testing scripts written in common *nix shell languages like bash, korn, and zsh. The same concepts and methods apply to any shell.

The tools covered so far are general-purpose test frameworks for shell languages on *nix systems: shunit2, BATS, zunit, bash-spec, and korn-spec. There are also unit testing frameworks designed to support specific use cases, like Pester (for Powershell), t-rexx (for Rexx), ChefSpec (for Chef recipes), and rspec-puppet (for Puppet).

In this fourth installment, we’ll take a look at Pester, the unit testing framework for Powershell.

What’s Powershell?

Powershell is a Windows-oriented scripting language tailored for system administration tasks. It comes preinstalled with Microsoft Windows® 10 and can be installed on earlier versions of Windows. It can also be installed on Arch- and Debian-based Linux distros, whether Linux is running standalone (as a host or guest OS) or on a Windows host under the Windows Subsystem for Linux.

I expect Powershell to prove useful for system administrators who need to manage Windows instances. The reason it is supported on Linux may be to help a system administrator using a Linux system to manage remote Windows instances. (I don’t have any inside information about that; it just seems a reasonable supposition.) You can execute Powershell commands and scripts against remote machines. I do not expect Powershell to supplant *nix shell languages for working with platforms other than Windows.

A Word on Documentation

As per normal, the official Microsoft documentation can be hard to navigate. Many people turn to resources like StackOverflow to learn how to use any given tool to perform any given task. Powershell is currently under active development, and things change rapidly. Advice and examples you find outside the official Microsoft world are likely to be out of date. You’ll get all kinds of contradictory advice and non-working sample scripts from “random” resources. Fair warning.

What’s Pester?

Pester is a Powershell module that provides most of the functionality we expect from a good unit testing framework or library. It’s preinstalled on Windows 10, although the Pester project site recommends upgrading to the latest version before starting to use it. I will second that recommendation, as I encountered difficulty running the sample Get-Planet.Tests.ps1 script from the project site on Windows 10, even though it ran fine on Linux. Upgrading fixed the problem.
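For reference, here is a minimal sketch of the upgrade, assuming the machine can reach the PowerShell Gallery and that installing for the current user only is acceptable. The flags follow what the Pester project’s installation notes describe for Windows 10; check them against the current notes before relying on them.

```powershell
# List the Pester versions currently available on this machine.
Get-Module -ListAvailable -Name Pester | Select-Object Name, Version

# Install the newest release from the PowerShell Gallery for the current user.
# -Force installs it side by side with the copy that ships with Windows 10.
# -SkipPublisherCheck is reportedly needed there because the built-in copy
# is signed by a different publisher than the Gallery releases.
Install-Module -Name Pester -Force -SkipPublisherCheck -Scope CurrentUser
```

Newer major versions of Pester have changed some of the syntax used in posts like this one. If an example misbehaves after upgrading, Install-Module’s -RequiredVersion parameter can pin a specific release instead.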

Our Test Subject

We’ve been using Vivek Gite’s sample script for checking disk usage and optionally sending an email notification. That’s a standard *nix shell script that will not run on Windows. To keep things consistent, we ought to create a Powershell script that performs equivalent tasks.

In the previous installments, we started with an existing script and wrote unit test cases to cover it. Here we have an opportunity to use a unit testing framework to test-drive a new script.

This may be an unfamiliar approach for readers who don’t come from an application development background. It’s one of the development skills that system administrators and infrastructure engineers are picking up from the software engineering world as devops gains ground in the industry.

I don’t want to give the impression that test-driven development (TDD) is unique to Pester and Powershell. We can use the same approach to develop scripts in any language using any unit testing framework or library. It just so happens we need to write a new Powershell script right now.

The first step in test-driving a solution is to think about what we need the code to do. Recall that Vivek’s script checks the disk usage and issues an email notification if the utilization is 90% or higher. Our previous test scripts contain two cases:

  • Ensure no notification is sent when disk utilization is 89%
  • Ensure a notification is sent when disk utilization is 90%

For our immediate purposes, we can consider those our “requirements.”

The next step is to write a piece of code that acts as a client to the new script and asserts the result we expect for the particular preconditions we set up. The precondition for the first case is that disk utilization stands at 89%. The “piece of code” in this case will be a Pester function.

Pester uses the behavioral-style syntax for expressing test cases and asserting results, with function names like Describe, Context, and It. These names will be familiar to people who have worked with another behavioral-style framework, such as RSpec. They follow the usual pattern of taking two arguments: a string that describes the intent of the function, and a closure or block containing the test code. Pester also uses one of the two common behavioral-style assertion patterns: Should. (The other one is expect; you saw it in the previous installment about bash-spec.)
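Before turning to the disk usage script, it may help to see the syntax in isolation. The following is only a sketch with a made-up subject (formatting a percentage), and it assumes Pester 4 or later, where assertions are written as Should -Be. Pester 3 used the form Should Be, without the dash.

```powershell
# A self-contained Pester sketch: no external script is needed.
Describe 'String formatting' {
    Context 'when a percentage is rendered' {
        It 'appends a percent sign' {
            # The format operator fills {0} with the value on the right.
            $rendered = '{0}%' -f 89
            $rendered | Should -Be '89%'
        }
    }
}
```

Each function takes a description string followed by a script block, and the Should assertion receives the actual value through the pipeline.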

Here’s a sample of some Pester code to get us started:

$here = (Split-Path -Parent $MyInvocation.MyCommand.Path)
. $here\Disk-Usage.ps1

Describe 'Disk-Usage' {
    Context 'disk usage is below threshold' {
        It 'issues no notification when disk usage is 89%' {
            Mock Get-WmiObject { [PSCustomObject]@{ Size = 100; FreeSpace = 11 }}
            Mock Send-MailMessage
            NotifyWhenThresholdExceeded -Threshold 90.0
            Assert-MockCalled Send-MailMessage -Times 0
        }
    }
}

One thing to notice here is that the behavioral-style syntax makes the test case pretty easy to follow. Tools that help people understand the intent of code are very beneficial, as the time of software professionals is the single largest cost in any enterprise IT environment. The value increases in direct proportion to the number of test cases and test scripts we have to maintain.

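The dot command means to source an external script; this is the same as with *nix shells. What’s different is that we have to excavate the name of the directory containing the test script from wherever Windows has buried it, rather than assuming ‘.\’ points to it by default. (The ‘.\’ prefix refers to the current working directory, which is not necessarily where the test script lives.)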
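To see dot-sourcing on its own, here is a small sketch that writes a throwaway script to a temporary directory and then sources it. The file name Helper.ps1 and the function Get-Greeting are made up for illustration.

```powershell
# Create a throwaway directory and a script that defines one function.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$helper = Join-Path $dir 'Helper.ps1'
Set-Content -Path $helper -Value 'function Get-Greeting { "hello from helper" }'

# Dot-source the script: its functions are defined in the current scope,
# so they remain callable after the script itself has finished.
. $helper
Get-Greeting

# Clean up the temporary files; the function stays defined in this session.
Remove-Item -Recurse -Force $dir
```

Inside a script file running under PowerShell 3.0 or later, the automatic variable $PSScriptRoot holds the directory of that script, which offers a shorter alternative to the Split-Path expression used in the test script.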

As we’re using TDD, our actual script file doesn’t exist yet. We’re assuming the name of it will be Disk-Usage.ps1, and that it will reside in the same directory as the test script. We know that file doesn’t exist. That’s okay; it’s part of the TDD process.

The initial test script shows the use of the Describe, Context, and It functions for illustration, even though we don’t strictly need a Context block at this stage. RealWorld® test scripts can be substantially more complicated than this example, and the value of the Context keyword will be more apparent.

We’re making some assumptions to get started. We’ll refine the test script as we proceed. For now, we’re assuming the Disk-Usage.ps1 script will have a function named NotifyWhenThresholdExceeded that takes a parameter named -Threshold of type double (the type is suggested by the decimal point in the parameter value).

We’re also assuming the NotifyWhenThresholdExceeded function will call the Powershell cmdlet Send-MailMessage when it wants to issue a notification. We’re using Pester’s built-in mocking support to define a mock of Send-MailMessage. That will prevent a real email from being generated, as well as giving us a mechanism to check whether the code under test called the mock. This is useful here because the NotifyWhenThresholdExceeded function does not return a value.

The other mock, Get-WmiObject, will return values for Size and FreeSpace that would normally be returned from a Powershell command that reports the total space and free space on a volume, like this:

Get-WmiObject Win32_LogicalDisk -Filter "DeviceID='C:'"

So, we’re assuming the Disk-Usage.ps1 script will call that Powershell command to look up the disk usage information. Of course, that code doesn’t exist yet.
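It may not be obvious why Size = 100 and FreeSpace = 11 stand for 89% utilization. The sketch below works through the arithmetic, assuming utilization is computed as used space divided by total space. The helper name Get-PercentUsed is hypothetical and is not part of the script we are building.

```powershell
# Hypothetical helper: converts Size and FreeSpace into percent used.
function Get-PercentUsed {
    param(
        [Parameter(Mandatory = $true)] $Disk
    )
    # Used space is whatever is not free.
    $used = $Disk.Size - $Disk.FreeSpace
    return ($used / $Disk.Size) * 100
}

# The same shapes the mocks return in the test script.
$below = [PSCustomObject]@{ Size = 100; FreeSpace = 11 }
$atThreshold = [PSCustomObject]@{ Size = 100; FreeSpace = 10 }

Get-PercentUsed -Disk $below        # 89
Get-PercentUsed -Disk $atThreshold  # 90
```

Choosing a Size of 100 keeps the mock values easy to read: the free space is simply 100 minus the utilization we want to simulate.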

Functionally, all of that is the same thing we’ve been doing in our other test scripts. Here we’re using facilities of Powershell and Pester to do it.

Now we’re ready to start the TDD cycle: red, green, refactor. We run the test script expecting it to fail because we haven’t defined a function named NotifyWhenThresholdExceeded. If it fails for any other reason, it means we haven’t set up the test case properly. When I tried it, I got these error messages:

. : The term 'C:\Users\[stuff]\Disk-Usage.ps1' 
    is not recognized as the name of a cmdlet, function, script file,
    or operable program [...] Disk-Usage.Tests.ps1:2, char:3

CommandNotFoundException: The term 'NotifyWhenThresholdExceeded'
    is not recognized as the name of a cmdlet, function, script file,
    or operable program [...] Disk-Usage.Tests.ps1: line 9

That’s what we expected, so we have “red for the right reason.” There’s no such file as Disk-Usage.ps1, and there’s no such function as NotifyWhenThresholdExceeded. Yet.

The next step in the TDD cycle is to make the test case “green”; in other words, make it pass. The minimum code to make it pass is a function declaration. Let’s create a file, Disk-Usage.ps1, with the following contents:

function NotifyWhenThresholdExceeded ([double]$Threshold = 90.0)
{

}

This time, the test script Disk-Usage.Tests.ps1 runs without error. We’ve reached the “green” step in the TDD cycle.

A note for those new to TDD: Clearly, this implementation is not correct. It doesn’t do anything at all. It isn’t “deciding” not to issue a notification. This is normal for work-in-progress using TDD. The way to drive out the rest of the functionality is by defining additional unit test cases that “force” us to write the necessary code.

Before proceeding to the next “red” step we need to do the “refactor” step, the third step in the TDD cycle. We look for opportunities to simplify and clean up the code. At the moment, there’s nothing to do, as we’ve only just begun. It’s still a good habit to examine the code and explicitly decide not to refactor, rather than to assume refactoring isn’t necessary.

Next, we need to check the case where the disk usage is at the threshold of 90%. We’ll add another It function to our test script, resulting in:

$here = (Split-Path -Parent $MyInvocation.MyCommand.Path)
. $here\Disk-Usage.ps1

Describe 'Disk-Usage' {
    Context 'disk usage is below threshold' {
        It 'issues no notification when disk usage is 89%' {
            Mock Get-WmiObject { [PSCustomObject]@{ Size = 100; FreeSpace = 11 }}
            Mock Send-MailMessage
            NotifyWhenThresholdExceeded -Threshold 90.0
            Assert-MockCalled Send-MailMessage -Times 0
        }
    }
    Context 'disk usage is at or above threshold' {
        It 'issues an email notification when disk usage is 90%' {
            Mock Get-WmiObject { [PSCustomObject]@{ Size = 100; FreeSpace = 10 }}
            Mock Send-MailMessage
            NotifyWhenThresholdExceeded -Threshold 90.0
            Assert-MockCalled Send-MailMessage -Times 1
        }
    }
}

The second test case is very similar to the first. We set up the Get-WmiObject mock to represent 90% disk usage, and we assert the Send-MailMessage mock is called once. When we run the test script, we want to see the first case pass and the second case fail for the right reason.

When I ran it at this point, I got the following output:

Context disk usage is below threshold
  [+] issues no notification when disk usage is 89% 55ms

Context disk usage is at or above threshold
  [-] issues an email notification when disk usage is 90% 77ms
  Expected Send-MailMessage to be called at least 1 times but was called 0 times

This is where we expected to be. Now we need to add some logic to the Disk-Usage.ps1 script to make both test cases pass. Here’s a crude implementation:

function NotifyWhenThresholdExceeded ([double]$Threshold = 90.0)
{
    $diskusage = Get-WmiObject Win32_LogicalDisk -Filter "DeviceID='C:'"
    $size = $diskusage.Size
    $used = $size - $diskusage.FreeSpace
    $percentused = ($used / $size) * 100
    If ($percentused -ge $Threshold)
    {
        # The trailing backtick continues the command onto the next line.
        Send-MailMessage -From "sender" -To "recipient" `
            -Subject "Oh, no!" -Body "Running out of disk space!"
    }
}

This time we get:

Context disk usage is below threshold
  [+] issues no notification when disk usage is 89% 43ms

Context disk usage is at or above threshold
  [+] issues an email notification when disk usage is 90% 50ms

And that’s about it.

Code Coverage

Pester can calculate code coverage metrics, much like a testing framework for application programming languages. To get coverage metrics for the Disk-Usage.Tests.ps1 script, we would execute Pester this way:

Invoke-Pester .\Disk-Usage.Tests.ps1 -CodeCoverage .\Disk-Usage.ps1

This results in the output shown above followed by:

Tests completed in 1.04s
Tests Passed: 2, Failed: 0, Skipped: 0, Pending: 0, Inconclusive: 0

Code coverage report:
Covered 100.00% of 7 analyzed commands in 1 file.

Of course, that sort of information is far more useful when you have a larger number of test cases and files. It’s a useful feature that isn’t included in most test frameworks for shell languages.

Some Caveats

Readers familiar with basic software testing techniques may be squirming in their seats. The two test cases are hardly sufficient to provide robust detection of potential errors. For instance, we have not validated the behavior of the code when an invalid or empty parameter is passed, or when the caller depends on the default value of Threshold.
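As one illustration of what those extra cases would exercise, here is a sketch of a parameter declaration that rejects out-of-range thresholds and falls back to a default. The function Test-ThresholdReached and the 0 to 100 range are assumptions made for this example; they are not part of the script developed above.

```powershell
# Hypothetical helper, not part of Disk-Usage.ps1 as written above.
function Test-ThresholdReached {
    param(
        [Parameter(Mandatory = $true)]
        [double]$PercentUsed,

        # Reject thresholds outside 0-100 when the function is called.
        [ValidateRange(0.0, 100.0)]
        [double]$Threshold = 90.0
    )
    return $PercentUsed -ge $Threshold
}

Test-ThresholdReached -PercentUsed 89                # False: relies on the default threshold
Test-ThresholdReached -PercentUsed 90                # True
Test-ThresholdReached -PercentUsed 50 -Threshold 40  # True

# An out-of-range threshold fails parameter validation before the body runs.
try {
    Test-ThresholdReached -PercentUsed 50 -Threshold 150
} catch {
    'Rejected invalid threshold'
}
```

Each of these calls corresponds to a test case we have not written yet: the default value, an explicit value, and an invalid value.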

Furthermore, the “below threshold” case could pass for any number of reasons other than the Disk-Usage.ps1 script intentionally “deciding” not to issue a notification. The risk of false positives is high.

One more thing: The way we coded our mock logic for Send-MailMessage, the test case would pass if the code under test called Send-MailMessage more than once. We really want to verify it is called exactly once. So, there is room for improvement.
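Pester’s Assert-MockCalled accepts an -Exactly switch for this purpose. The sketch below calls the mock directly, standing in for the code under test, so that it is self-contained. It assumes a Pester version in which Assert-MockCalled is available, as in the examples above.

```powershell
Describe 'Exact call counts' {
    It 'passes only when the mock is called exactly once' {
        # Replace the real cmdlet so no email is sent.
        Mock Send-MailMessage

        # Stand-in for the code under test: one call to the mocked cmdlet.
        Send-MailMessage -From 'sender' -To 'recipient' -Subject 'test'

        # Without -Exactly, -Times 1 means "at least once".
        # With -Exactly, a second call would make this assertion fail.
        Assert-MockCalled Send-MailMessage -Times 1 -Exactly
    }
}
```

In our test script, the equivalent change would be to add -Exactly to the assertion in the “at or above threshold” case.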

Bear in mind the purpose of this post is to show how you can use the Pester module with Powershell to write executable unit tests. It isn’t meant to be a comprehensive testing exercise. From this starting point, you could flesh out the test suite as appropriate for a professional environment. The simplest approach would be to continue repeating the TDD cycle, adding whatever conditions you can think of, until you’re satisfied with the quality and coverage of the suite. At least, that’s how we usually do it in the RealWorld®.

What’s Next?

We’ve explored a handful of unit test frameworks for general-purpose shell scripts on *nix platforms, and one such framework for Powershell. Next, we’ll look at two special-purpose unit testing frameworks for the server provisioning tools, Chef and Puppet.
